⭐️sinfo

Overview

sinfo is the Slurm command for viewing the status of cluster partitions and nodes. Use it to:

- Display partition information
- View node resource status
- Monitor system load

Common Options

Option	Description	Example
`-a, --all`	Show all partitions, including hidden ones	`sinfo -a`
`-l, --long`	Show detailed information	`sinfo -l`
`-N, --Node`	Show one line per node (node-oriented view)	`sinfo -N`
`-p, --partition=PARTITION`	Specify partition	`sinfo -p gpu`
`-t, --states=node_state`	Filter by state	`sinfo -t idle`
`-o, --format=format`	Custom output format	`sinfo -o "%P %a %D %T"`
`-S, --sort=fields`	Sort output	`sinfo -S +P,-m`
`-i, --iterate=seconds`	Refresh interval	`sinfo -i 5` (refresh every 5 seconds)
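
The options in the table can be combined in a single call. The following is a minimal sketch, not part of Slurm, that assembles a sinfo command line from a few of them. The helper name `build_sinfo_cmd` and the partition name `gpu` are illustrative assumptions; only flags listed in the table are used.

```python
import shlex

def build_sinfo_cmd(partition=None, states=None, fmt=None, by_node=False):
    """Assemble a sinfo argument list (hypothetical helper, not a Slurm API)."""
    cmd = ["sinfo"]
    if by_node:
        cmd.append("-N")                   # one line per node
    if partition:
        cmd += ["-p", partition]           # restrict to one partition
    if states:
        cmd += ["-t", ",".join(states)]    # several states are comma separated
    if fmt:
        cmd += ["-o", fmt]                 # custom output format
    return cmd

cmd = build_sinfo_cmd(partition="gpu", states=["idle", "mixed"], fmt="%P %a %D %T")
print(shlex.join(cmd))  # sinfo -p gpu -t idle,mixed -o '%P %a %D %T'
```

The resulting list can be passed to `subprocess.run` on a machine where Slurm is installed.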

Examples

View partition status

# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up   infinite      1  down* ip-10-10-2-109
compute*     up   infinite      2    mix ip-10-10-2-[70,80]

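Here, PARTITION is the partition name, NODES is the number of nodes, NODELIST lists the nodes, and STATE is the node state. In this output, mix means the node is running jobs but still has idle CPU cores, and down* means the node is down and not responding. Other common states are idle (the node has no jobs) and alloc (the node has one or more jobs allocated).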
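
When this output has to be processed by a script, the whitespace-separated columns can be split into records. The sketch below works on the sample output above rather than calling sinfo. The function names are illustrative, and it only handles the trailing * suffix (node not responding) described under STATE in Output Field Details.

```python
SAMPLE = """\
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up   infinite      1  down* ip-10-10-2-109
compute*     up   infinite      2    mix ip-10-10-2-[70,80]
"""

def parse_sinfo(text):
    """Turn default sinfo output into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

def split_state(state):
    """Return (base_state, responding); a trailing * means not responding."""
    return state.rstrip("*"), not state.endswith("*")

for row in parse_sinfo(SAMPLE):
    base, responding = split_state(row["STATE"])
    print(row["NODELIST"], base, "responding" if responding else "not responding")
```

Splitting on whitespace assumes that no field contains spaces, which holds for the default columns shown here.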

View detailed partition information

sinfo -l (long format) adds columns such as JOB_SIZE, ROOT, OVERSUBS, and GROUPS:

# sinfo -l
PARTITION AVAIL TIMELIMIT JOB_SIZE   ROOT OVERSUBS GROUPS NODES STATE NODELIST 
compute*  up    infinite  1-infinite no   NO       all    2     mixed compute[1-2]
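
The TIMELIMIT and JOB_SIZE columns are easier to compare once converted to numbers. The following sketch assumes the "days-hours:minutes:seconds" layout described in Output Field Details, and it further assumes that shorter values such as `30:00` are right-aligned (minutes and seconds). Both function names are hypothetical.

```python
def parse_timelimit(value):
    """Convert a TIMELIMIT string to seconds; None means no limit."""
    if value == "infinite":
        return None
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in value.split(":")]
    # Pad on the left so "30:00" reads as 0 hours, 30 minutes, 0 seconds.
    hours, minutes, seconds = [0] * (3 - len(parts)) + parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

def parse_job_size(value):
    """Return (min_nodes, max_nodes); max_nodes is None for infinite."""
    low, _, high = value.partition("-")
    high = high or low  # a single value means min and max are equal
    return int(low), None if high == "infinite" else int(high)

print(parse_timelimit("1-12:00:00"))  # 129600
print(parse_job_size("1-infinite"))   # (1, None)
```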

View detailed node information

sinfo -Nl combines the node-oriented view (-N) with the long format (-l) and prints one line per node:

# sinfo -Nl
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT 
node1    1     compute*  idle  16  2:8:1 64000   0        1
node2    1     compute*  alloc 16  2:8:1 64000   0        1
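
In the sample above, the CPUS column matches the product of the three S:C:T values (2 sockets x 8 cores per socket x 1 thread per core = 16). The sketch below checks that relationship on the sample rows. It is an illustration only: depending on how a cluster is configured, CPUS may not equal this product.

```python
SAMPLE = """\
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT
node1    1     compute*  idle  16  2:8:1 64000   0        1
node2    1     compute*  alloc 16  2:8:1 64000   0        1
"""

def cpus_from_sct(sct):
    """Multiply sockets x cores per socket x threads per core."""
    sockets, cores, threads = (int(x) for x in sct.split(":"))
    return sockets * cores * threads

header, *rows = (line.split() for line in SAMPLE.splitlines())
for fields in rows:
    node = dict(zip(header, fields))
    # Print the node name, the reported CPUS value, and the computed product.
    print(node["NODELIST"], node["CPUS"], cpus_from_sct(node["S:C:T"]))
```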

Output Field Details

AVAIL: up means available, down means unavailable.
CPUS: Number of CPUs on each node.
S:C:T: Number of sockets (S), cores per socket (C), and threads per core (T) on each node. For example, 2:8:1 means 2 sockets with 8 cores each and 1 thread per core.
SOCKETS: Number of CPU sockets on each node.
CORES: Number of CPU cores per socket on each node.
THREADS: Number of threads per core on each node.
GROUPS: User groups that can use the partition; all means all groups can use it.
JOB_SIZE: Minimum and maximum number of nodes available for a user job. If only one value is shown, min and max are the same. infinite means no limit.
TIMELIMIT: Job walltime limit (walltime refers to actual elapsed time measured by a clock). infinite means no limit. If limited, the format is "days-hours:minutes:seconds".
MEMORY: Physical memory size in MB.
NODELIST: Node name list, formatted like node[1-10,11,13-28].
NODES: Node count.
NODES(A/I): Node count by state, in the format "allocated/idle".
NODES(A/I/O/T): Node count by state, in the format "allocated/idle/other/total".
PARTITION: Partition name; a trailing * means it is the default partition.
ROOT: Whether allocating resources in the partition is restricted to the root user (yes or no).
OVERSUBSCRIBE (shown as OVERSUBS in the sinfo -l header): Whether jobs in the partition can oversubscribe compute resources such as CPUs:
- no: resources are never oversubscribed.
- exclusive: whole nodes are dedicated to a single job (equivalent to srun --exclusive).
- force: resources are always available for oversubscription.
- yes: resources can be oversubscribed if the job requests it.
STATE: Node state. Possible values include:
- allocated, alloc: allocated.
- completing, comp: completing.
- down: down.
- drained, drain: drained.
- draining, drng: draining.
- fail: failed.
- failing, failg: failing.
- future, futr: future (available later).
- idle: idle and can accept new jobs.
- maint: maintenance.
- mixed: mixed; the node is running jobs but has some idle CPU cores and can accept new jobs.
- perfctrs, npc: unavailable due to network performance counters in use.
- power_down, pow_dn: powered down.
- power_up, pow_up: powering up.
- reserved, resv: reserved.
- unknown, unk: unknown.
Note: if the state has a suffix *, the node is not responding.
TMP_DISK: Size of the partition where /tmp resides, in MB.
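
Scripts often need the individual host names behind a compressed NODELIST value such as node[1-10,11,13-28]. On a cluster, `scontrol show hostnames` does this. The sketch below is a simplified stand-in for illustration: it assumes a single prefix followed by one bracket group and does not cover every form Slurm can produce. The function name is hypothetical.

```python
import re

def expand_nodelist(expr):
    """Expand a simple bracketed node list such as node[1-3,5] into names."""
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", expr)
    if not match:
        return [expr]  # plain host name, nothing to expand
    prefix, body = match.groups()
    names = []
    for item in body.split(","):
        start, _, end = item.partition("-")
        end = end or start   # a single number is a range of one
        width = len(start)   # keep zero padding, e.g. 08-10
        for n in range(int(start), int(end) + 1):
            names.append(f"{prefix}{n:0{width}d}")
    return names

print(expand_nodelist("compute[1-2]"))        # ['compute1', 'compute2']
print(expand_nodelist("ip-10-10-2-[70,80]"))  # ['ip-10-10-2-70', 'ip-10-10-2-80']
```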
