
Manage Compute Jobs and Resources

squeue View queued or running jobs

Use the squeue command to check the current status of jobs. If a job does not appear in the squeue output, it has already exited and left the queue.

In the output, the ST column is the job status. The status codes mean:

  • R: Running
  • PD: Pending
  • CG: Completing
  • S: Suspended
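As a quick illustration of the status codes above, the following Python sketch tallies jobs by state. The sample text is made up for illustration and is not real cluster output; the sketch assumes whitespace-separated columns and a header that contains ST, which may not hold for every output format (for example, job names containing spaces would break it).

```python
from collections import Counter

# Meanings of the ST codes listed above.
STATE_NAMES = {"R": "Running", "PD": "Pending", "CG": "Completing", "S": "Suspended"}

# Made-up sample text for illustration only; real squeue output may differ.
SAMPLE = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
11 partition01 test-001 user1 R 5:02 1 compute01
12 partition01 test-002 user1 PD 0:00 1 (Resources)
13 partition01 test-003 user2 PD 0:00 1 (Priority)
"""


def count_states(text):
    """Count jobs per state, locating the ST column from the header line."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    st = header.index("ST")  # position of the status column
    codes = Counter(row[st] for row in rows)
    # Unknown codes are kept as-is rather than dropped.
    return {STATE_NAMES.get(code, code): n for code, n in codes.items()}


print(count_states(SAMPLE))
```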

Common squeue command and option combinations are as follows:

Function                                               Command Example
Show the status of all jobs in the queue               squeue
View job information for job ID 11                     squeue -j 11
View job information for user user1                    squeue -u user1
View jobs submitted to partition01                     squeue -p partition01
View jobs using node compute01                         squeue -w compute01
View jobs in the Pending state                         squeue --state=PENDING
View detailed info for a job with custom output        squeue -j 11 -o "%.18i %.9P %.8j %.8u %.2t %20V %.10M %.6D %R %Z"
View detailed info for a partition with custom output  squeue -p partition01 -o "%.18i %.9P %.8j %.8u %.2t %20V %.10M %.6D %R %Z"

Other options can be viewed with squeue --help.

sinfo View partition information

The sinfo command shows status information for partitions and nodes. Common command and option combinations are as follows:

Function                                                         Command Example
Show status of all partitions in the cluster, one line per node  sinfo -Nl
Show usage of a specified partition                              sinfo -p partition01
Show detailed usage of a specified partition                     sinfo -p partition01 -N -o "%20N %15C %.5a %.6t"

Node status meanings in sinfo output:

  • alloc: Node is allocated
  • drain: Node is drained or unresponsive; no new jobs are assigned to it in this state
  • idle: Node is idle
  • mix: Node has only part of its resources allocated
  • comp: Node is completing jobs and releasing resources

Nodes in any other state are unavailable.

Example

[root@login1 ~]# sinfo -N -o "%20N %15C %.5a %.6t"
NODELIST             CPUS(A/I/O/T)   AVAIL  STATE
compute01            0/4/0/4            up   idle
compute02            0/4/0/4            up   idle
compute03            0/4/0/4            up   idle

In the second column, CPUS(A/I/O/T), A = CPUs allocated to jobs, I = idle CPUs, O = CPUs in other states, and T = total CPUs on the node.
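The A/I/O/T field is easy to split in a script. The sketch below is a minimal Python example that reads the rows from the example output above and reports idle CPUs per node; the function names are illustrative, and it assumes the node name and CPU field are the first two whitespace-separated columns, as in the format string used above.

```python
def parse_cpus(field):
    """Split an 'A/I/O/T' string into a dict of integer CPU counts."""
    allocated, idle, other, total = (int(x) for x in field.split("/"))
    return {"allocated": allocated, "idle": idle, "other": other, "total": total}


# Rows copied from the example output above (header line omitted).
SAMPLE = """\
compute01 0/4/0/4 up idle
compute02 0/4/0/4 up idle
compute03 0/4/0/4 up idle
"""


def idle_cpus_by_node(text):
    """Map each node name to its idle CPU count."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        node, cpus = line.split()[:2]
        result[node] = parse_cpus(cpus)["idle"]
    return result


idle = idle_cpus_by_node(SAMPLE)
print(idle)
print("total idle CPUs:", sum(idle.values()))
```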

Common sinfo options

--help # Show help for the sinfo command;
-d # Show non-responsive nodes in the cluster;
-i <seconds> # Refresh partition/node output every N seconds;
-n <name_list> # Show specified node(s); separate multiple nodes with commas;
-N # Display one line per node;
-p <partition> # Show specified partition(s); separate multiple partitions with commas;
-r # Show only responsive nodes;
-R # Show reasons for node issues;

Output in a specified format:

-o <output_format> # Show output in the specified format. The format is %[[.]size]type. "." means right alignment; omitting it means left alignment. size is the field width; type is the item to display. Common items include:
%a Availability state
%A Show node counts as "allocated/idle"; do not use with "%t" or "%T"
%c Number of cores per node
%C Total cores as "allocated/idle/other/total"
%D Total number of nodes
%E Reason a node is unavailable
%m Memory per node (in MB)
%N Node name
%O CPU load
%P Partition name; the default partition is marked with "*"
%r Only root can submit jobs (yes/no)
%R Partition name
%t Node state (compact form)
%T Node state (extended form)
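To make the %[[.]size]type rule concrete, here is a small Python sketch that applies the alignment described above to a single value. It is not how sinfo itself is implemented, and render_field is a hypothetical helper name. It models only padding and alignment; how over-long values are handled is not modeled.

```python
import re

# "%", then an optional [.]size part, then a single-letter type.
SPEC = re.compile(r"%(?:(\.)?(\d+))?([A-Za-z])")


def render_field(spec, value):
    """Pad value as a %[[.]size]type spec describes (padding only)."""
    m = SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"not a %[[.]size]type spec: {spec!r}")
    dot, size, _type = m.groups()
    if size is None:
        return value  # no size given: print the value as-is
    width = int(size)
    # "." means right alignment; without it the field is left-aligned.
    return value.rjust(width) if dot else value.ljust(width)


row = [("%20N", "compute01"), ("%15C", "0/4/0/4"), ("%.5a", "up"), ("%.6t", "idle")]
print(repr(" ".join(render_field(spec, value) for spec, value in row)))
```

The final lines rebuild one row of the earlier example from the format string "%20N %15C %.5a %.6t", which shows why the first two columns are left-aligned and the last two are right-aligned.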

scancel Cancel running or queued jobs

The scancel command can cancel running or pending jobs in the queue.

Common commands and parameter examples:

Function                               Command Example
Cancel job ID 11                       scancel 11
Cancel job named test-001              scancel -n test-001
Cancel jobs submitted to partition01   scancel -p partition01
Cancel pending jobs                    scancel -t PENDING
Cancel jobs running on node compute01  scancel -w compute01 -t RUNNING

Other parameter options can be viewed with scancel --help.

Common scancel options:

--help # Show help for the scancel command;
-A <account> # Cancel jobs for the specified account; if no job_id is specified, cancel all jobs for that account;
-n <job_name> # Cancel jobs with the specified job name;
-p <partition_name> # Cancel jobs in the specified partition;
-q <qos> # Cancel jobs with the specified qos;
-t <job_state_name> # Cancel jobs in the specified state, "PENDING", "RUNNING" or "SUSPENDED";
-u <user_name> # Cancel jobs for the specified user;
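Because scancel filters can match many jobs at once, it can help to assemble the command in a script and review it before running it. The following Python sketch only builds and prints the argument list; it uses just the options listed above, and build_scancel_args is a hypothetical helper, not part of any scheduler tooling.

```python
import shlex


def build_scancel_args(job_id=None, name=None, partition=None, state=None, user=None):
    """Build a scancel argument list from optional filters (nothing is executed)."""
    args = ["scancel"]
    for flag, value in (("-n", name), ("-p", partition), ("-t", state), ("-u", user)):
        if value is not None:
            args += [flag, str(value)]
    if job_id is not None:
        args.append(str(job_id))
    if len(args) == 1:
        # Guard against building a command with no job ID and no filter.
        raise ValueError("refusing to build scancel with no job ID or filter")
    return args


# Cancel only the pending jobs of user1; printed for review, not run.
print(shlex.join(build_scancel_args(user="user1", state="PENDING")))
```

To actually run the command, the returned list could be passed to subprocess.run; that step is left out here on purpose.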

sacct View historical job information

The sacct command shows historical job information, including start and end time, final state, job ID, job name, number of nodes used, node list, runtime, and more.

Example

View runtime information for a job:

sacct -j 9

The output includes: job ID, job name, partition, billing account, allocated CPU count, state, and exit code.

[root@head ~]# sacct -j 9
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
9                 sleep partition+ _fsched_a+          1  COMPLETED      0:0
9.extern         extern            _fsched_a+          1  COMPLETED      0:0
9.0               sleep            _fsched_a+          1  COMPLETED      0:0

You can add output parameters to view detailed job information, for example:

[root@head ~]# sacct -j 9 -X -o jobid,jobname%50,user,group,partition,submit,start,end,state,alloccpus,reqmem,elapsed,exitcode,workdir%300
JobID JobName User Group Partition Submit Start End State AllocCPUS ReqMem Elapsed ExitCode WorkDir
------------ -------------------------------------------------- --------- --------- ---------- ------------------- ------------------- ------------------- ---------- ---------- ---------- ---------- -------- -----------------------------------------------------
9 sleep cyan cyan partition+ 2024-09-03T00:25:02 2024-09-03T00:25:02 2024-09-03T00:26:02 COMPLETED 1 1Mn 00:01:00 0:0 /fastone/users/cyan

View job history and runtime information since a specific time:


[root@head ~]# sacct -X -T -S2024-08-10-11:00:00 -o jobid,jobname,user,partition,submit,start,end,state,alloccpus,reqmem,elapsed,exitcode,workdir
JobID JobName User Partition Submit Start End State AllocCPUS ReqMem Elapsed ExitCode WorkDir
------------ ---------- --------- ---------- ------------------- ------------------- ------------------- ---------- ---------- ---------- ---------- -------- --------------------
2 hostname shaobing+ partition+ 2024-07-31T23:13:41 2024-07-31T23:13:41 2024-07-31T23:13:42 COMPLETED 1 1Mn 00:00:01 0:0 /fastone/users/shao+
3 big_task1 cadservi+ partition+ 2024-08-01T06:47:18 2024-08-01T06:47:19 2024-08-01T06:49:29 CANCELLED+ 6 1Mn 00:02:10 0:0 /fastone/users/cads+
4 big_task1 cadservi+ partition+ 2024-08-01T06:53:05 2024-08-01T06:53:05 2024-08-01T06:53:45 CANCELLED+ 6 1Mn 00:00:40 0:0 /fastone/users/cads+
5 big_task1 cadservi+ partition+ 2024-08-01T22:04:58 2024-08-01T22:04:59 2024-08-01T22:06:09 CANCELLED+ 6 1Mn 00:01:10 0:0 /fastone/users/cads+
6 big_task1 cadservi+ partition+ 2024-08-01T22:08:04 2024-08-01T22:08:05 2024-08-01T22:08:19 CANCELLED+ 2 1Mn 00:00:14 0:0 /fastone/users/cads+
7 big_task1 cadservi+ partition+ 2024-08-01T22:11:50 2024-08-01T22:11:51 2024-08-01T22:11:51 FAILED 2 1Mn 00:00:00 127:0 /fastone/users/cads+
8 Fano-slot shaobing+ partition+ 2024-08-01T22:50:31 2024-08-01T22:50:32 2024-08-01T22:57:46 FAILED 4 1Mn 00:07:14 127:0 /fastone/users/shao+
9 sleep cyan partition+ 2024-09-03T00:25:02 2024-09-03T00:25:02 2024-09-03T00:26:02 COMPLETED 1 1Mn 00:01:00 0:0 /fastone/users/cyan
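A listing like the one above can be summarized with a few lines of Python. The sketch below hard-codes the JobID, State, and Elapsed values from that listing and totals job count and runtime per state. It assumes Elapsed is always in HH:MM:SS form (longer-running jobs may use another form, which is not handled) and treats the trailing "+" as a truncation marker to be stripped.

```python
from collections import defaultdict

# (JobID, State, Elapsed) values taken from the listing above.
ROWS = [
    ("2", "COMPLETED", "00:00:01"),
    ("3", "CANCELLED+", "00:02:10"),
    ("4", "CANCELLED+", "00:00:40"),
    ("5", "CANCELLED+", "00:01:10"),
    ("6", "CANCELLED+", "00:00:14"),
    ("7", "FAILED", "00:00:00"),
    ("8", "FAILED", "00:07:14"),
    ("9", "COMPLETED", "00:01:00"),
]


def elapsed_seconds(hms):
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def summarize(rows):
    """Return {state: (job_count, total_elapsed_seconds)}."""
    summary = defaultdict(lambda: [0, 0])
    for _job_id, state, elapsed in rows:
        key = state.rstrip("+")  # drop the truncation marker
        summary[key][0] += 1
        summary[key][1] += elapsed_seconds(elapsed)
    return {state: tuple(values) for state, values in summary.items()}


for state, (count, seconds) in summarize(ROWS).items():
    print(f"{state}: {count} job(s), {seconds} s elapsed in total")
```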

For more output fields, see sacct --help.

# For sacct output, -o can include the following fields:

--format=jobid,jobname,partition,maxvmsize,maxvmsizenode,
maxvmsizetask,avevmsize,maxrss,maxrssnode,
maxrsstask,averss,maxpages,maxpagesnode,
maxpagestask,avepages,mincpu,mincpunode,
mincputask,avecpu,ntasks,alloccpus,elapsed,
state,exitcode,avecpufreq,reqcpufreqmin,
reqcpufreqmax,reqcpufreqgov,consumedenergy,
maxdiskread,maxdiskreadnode,maxdiskreadtask,
avediskread,maxdiskwrite,maxdiskwritenode,
maxdiskwritetask,avediskwrite,allocgres,reqgres

# If output is truncated, add "%field_length" after a format item to show more, for example "workdir%300"
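When many fields need custom widths, the -o argument can be generated instead of typed by hand. This Python sketch builds the comma-separated field list with optional "%field_length" suffixes; sacct_format is a hypothetical helper name, and the field names are the ones used in the examples above.

```python
def sacct_format(fields, widths=None):
    """Join field names with commas, appending %<width> where one is given."""
    widths = widths or {}
    parts = []
    for field in fields:
        width = widths.get(field)
        parts.append(f"{field}%{width}" if width else field)
    return ",".join(parts)


fmt = sacct_format(
    ["jobid", "jobname", "state", "elapsed", "workdir"],
    widths={"jobname": 50, "workdir": 300},
)
# Printed for review; the job ID 9 matches the earlier example.
print(f"sacct -j 9 -X -o {fmt}")
```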

scontrol View Fsched configuration and status

scontrol is used to view or modify Fsched configuration, including jobs, job steps, nodes, partitions, reservations, and overall system configuration. Regular users can use scontrol to query and display many Fsched status details, while most modification commands can only be executed by the root user or administrators.

Common command and parameter examples for regular users:

Function                                                          Command Example
View details of job ID 9                                          scontrol show job 9
View details of all running, queued, and recently completed jobs  scontrol show job
View details of node compute03                                    scontrol show node compute03
View details of all nodes                                         scontrol show node
View details of all partitions                                    scontrol show partition
View Fsched configuration information                             scontrol show config

Other options can be viewed with scontrol --help.
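For scripting, the output of scontrol show job can be turned into a dictionary. The sketch below assumes the output consists of whitespace-separated Key=Value pairs; the sample text, including its field names and values, is made up for illustration, so check it against the real output on your cluster. Values that contain spaces are not handled.

```python
# Made-up sample in an assumed Key=Value layout; not real cluster output.
SAMPLE = """\
JobId=9 JobName=sleep
   UserId=cyan(1001) GroupId=cyan(1001)
   JobState=COMPLETED Reason=None
   RunTime=00:01:00 WorkDir=/fastone/users/cyan
"""


def parse_show(text):
    """Parse whitespace-separated Key=Value tokens into a dict."""
    record = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that have no "="
            record[key] = value
    return record


job = parse_show(SAMPLE)
print(job["JobId"], job["JobState"], job["RunTime"])
```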