⭐️squeue

Overview

squeue is the Slurm command for viewing the state of the job queue. It can:

  • Display all pending and running jobs
  • Filter, sort, and format the output in multiple ways
  • Show detailed job information and the reason for each job's state

Common Options

| Option | Description | Example |
| --- | --- | --- |
| -a, --all | Show jobs in all partitions, including hidden ones | squeue -a |
| -j, --jobs=job_id(s) | Show only the specified job IDs | squeue -j 123,124 |
| -u, --user=user_name(s) | Filter by user | squeue -u user1 |
| -p, --partition=partition(s) | Filter by partition | squeue -p gpu |
| -t, --states=states | Filter by job state | squeue -t RUNNING |
| -i, --iterate=seconds | Refresh the output at the given interval | squeue -i 5 (refresh every 5 seconds) |
| -o, --format=format | Customize the output columns | squeue -o "%i %P %j %u %t %M %D %R" |
| -S, --sort=fields | Sort the output | squeue -S "-M" (sort by elapsed time, descending) |
| --start | Show expected start times of pending jobs | squeue --start |
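The filtering options can be combined in a single call. The following is a minimal sketch, assuming a hypothetical helper named jobs_for (not part of Slurm) that wraps the long forms of -u, -p, and -t:

```shell
# Hypothetical helper (not part of Slurm): list one user's pending and
# running jobs in one partition.
# Usage: jobs_for USER PARTITION
jobs_for() {
    # --user, --partition and --states are the long forms of -u, -p and -t.
    squeue --user="$1" --partition="$2" --states=PENDING,RUNNING
}
```

For example, jobs_for user1 gpu would list the pending and running jobs of user1 in the gpu partition.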

Examples

Use squeue to view job status.

# squeue -a
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
8 compute sleep root R 8-02:07:58 2 ip-10-10-2-[70,80]

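Here JOBID is the job ID, PARTITION is the partition the job was submitted to, NAME is the job name, USER is the submitting user, ST is the job state, TIME is the elapsed runtime, NODES is the number of nodes in use, and NODELIST(REASON) is the list of nodes running the job or, for a job that is not running, the reason for its state. The -a option shows jobs in all partitions.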
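The column layout described above can be used to pull individual fields out of the default output. A rough sketch, assuming a hypothetical helper named summarize_jobs and assuming that job names contain no spaces (whitespace splitting would otherwise shift the columns):

```shell
# Hypothetical helper: read default squeue output on stdin and print
# "JOBID STATE NODES" for each job, skipping the header line.
summarize_jobs() {
    # Columns: 1=JOBID 2=PARTITION 3=NAME 4=USER 5=ST 6=TIME 7=NODES 8=NODELIST(REASON)
    awk 'NR > 1 { print $1, $5, $7 }'
}
```

Typical use would be squeue -a | summarize_jobs.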

squeue can display additional fields through the -o option. For example, to show the submit time (%V) and working directory (%Z) in addition to the default columns:

squeue -a -o "%.18i %.9P %.8j %.8u %.2t %20V %.10M %.6D %R %Z"
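For scripting, a custom format with an explicit delimiter is usually easier to parse than padded columns. The sketch below assumes a hypothetical helper named job_workdirs and assumes that "|" does not occur in any of the fields; -h (--noheader) suppresses the header line:

```shell
# Hypothetical helper: print job ID, submit time and working directory,
# one job per line.
job_workdirs() {
    # "|" is an arbitrary delimiter chosen for this sketch.
    squeue -a -h -o "%i|%V|%Z" |
    while IFS='|' read -r jobid submitted workdir; do
        printf 'job %s submitted %s in %s\n' "$jobid" "$submitted" "$workdir"
    done
}
```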

Output Field Details

  • JOBID: Job ID.

  • PARTITION: Partition name.

  • NAME: Job name.

  • USER: Username.

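  • ST: Job state, shown as a compact code.

    • PD: PENDING. The job is waiting for resource allocation.
    • R: RUNNING. The job currently has an allocation.
    • CA: CANCELLED. The job was cancelled by the user or an administrator.
    • CF: CONFIGURING. Resources are allocated and the job is waiting for them to become ready.
    • CG: COMPLETING. The job is finishing; some processes may still be active.
    • CD: COMPLETED. All processes ended with exit code zero.
    • F: FAILED. The job ended with a non-zero exit code or another failure.
    • TO: TIMEOUT. The job ended after reaching its time limit.
    • NF: NODE_FAIL. The job ended because an allocated node failed.
    • SE: SPECIAL_EXIT. The job was requeued in a special state.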
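  • TIME: Elapsed runtime, in days-hours:minutes:seconds.

  • NODES: Number of nodes allocated to (or requested by) the job.

  • NODELIST(REASON): For a running job, the list of allocated nodes; for a job that is not running, the reason for its current state, shown in parentheses. Reason codes: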

    • AssociationCpuLimit: All CPUs assigned to the job's association are in use; the job will run eventually.
    • AssociationMaxJobsLimit: The association's max jobs limit has been reached; the job will run eventually.
    • AssociationNodeLimit: All nodes assigned to the job's association are in use; the job will run eventually.
    • AssociationJobLimit: The job's association has reached its maximum job count.
    • AssociationResourceLimit: The job's association has reached a resource limit.
    • AssociationTimeLimit: The job's association has reached its time limit.
    • BadConstraints: The job has constraints that cannot be satisfied.
    • BeginTime: The job's earliest start time has not been reached.
    • Cleaning: The job was requeued and is still performing cleanup from the previous run.
    • Dependency: The job is waiting for a dependent job to finish.
    • FrontEndDown: No front-end node is available to run this job.
    • InactiveLimit: The job has reached the system InactiveLimit.
    • InvalidAccount: The job's account is invalid; cancel and resubmit with the correct account.
    • InvalidQOS: The job's QOS is invalid; cancel and resubmit with the correct QOS.
    • JobHeldAdmin: The job is held by the system administrator.
    • JobHeldUser: The job is held by the user.
    • JobLaunchFailure: The job could not be launched, possibly due to filesystem failures or invalid program names.
    • Licenses: The job is waiting for the required licenses.
    • NodeDown: The required nodes are down.
    • NonZeroExitCode: The job ended with a non-zero exit code.
    • PartitionDown: The required partition is in DOWN state.
    • PartitionInactive: The required partition is in Inactive state.
    • PartitionNodeLimit: The number of nodes the job requires is outside the partition's current limits, or all nodes in the job's partition are already in use (in which case the job will run eventually).
    • PartitionTimeLimit: The job's time limit exceeds the partition's current time limit.
    • PartitionCpuLimit: All CPUs in the job's partition are already in use; the job will run eventually.
    • PartitionMaxJobsLimit: The partition's max jobs limit has been reached; the job will run eventually.
    • Priority: One or more higher-priority jobs exist for this partition or advanced reservation.
    • Prolog: The job's PrologSlurmctld prolog is still running.
    • QOSJobLimit: The job's QOS has reached its max jobs limit.
    • QOSResourceLimit: The job's QOS has reached its max resource limit.
    • QOSGrpCpuLimit: All CPUs for the job's QOS group are in use; the job will run eventually.
    • QOSGrpMaxJobsLimit: The job's QOS group max jobs limit has been reached; the job will run eventually.
    • QOSGrpNodeLimit: All nodes for the job's QOS group are in use; the job will run eventually.
    • QOSTimeLimit: The job's QOS has reached its time limit.
    • QOSUsageThreshold: The required QOS usage threshold was violated.
    • ReqNodeNotAvail: A node specifically required by the job is not currently available, for example because it is in use, reserved, down, or drained.
    • Reservation: The job is waiting for its advanced reservation to become available.
    • Resources: The job is waiting for the required resources to become available.
    • SystemFailure: Slurm system failure, such as filesystem or network failure.
    • TimeLimit: The job exceeded its time limit.
    • WaitingForScheduling: No reason has been set yet; the job is waiting for the scheduler to evaluate it.
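The reason codes above can be grouped to decide which pending jobs need attention. The following is only a sketch: classify_reasons is a hypothetical helper, and the grouping into categories is an illustrative assumption, not something defined by Slurm. It expects "jobid|reason" lines on stdin, such as those produced with the %i and %r format fields.

```shell
# Hypothetical helper: read "jobid|reason" lines on stdin and print a rough
# category for each job. The categories are an assumption for illustration.
classify_reasons() {
    while IFS='|' read -r jobid reason; do
        case "$reason" in
            Resources|Priority)        category="queued normally" ;;
            Dependency|BeginTime)      category="waiting on a condition" ;;
            JobHeldUser|JobHeldAdmin)  category="held" ;;
            Association*Limit|QOS*Limit|Partition*Limit)
                                       category="blocked by a limit" ;;
            *)                         category="needs a closer look" ;;
        esac
        printf '%s %s: %s\n' "$jobid" "$reason" "$category"
    done
}
```

Typical use would be squeue -h -t PENDING -o "%i|%r" | classify_reasons.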