⭐️sinfo
Overview
sinfo is the command in Slurm for viewing cluster partitions and node status:
- Display partition information
- View node resource status
- Monitor system load
Common Options
| Option | Description | Example |
|---|---|---|
| -a, --all | Show all partitions | sinfo -a |
| -l, --long | Show detailed information | sinfo -l |
| -N, --Node | Show one line per node | sinfo -N |
| -p, --partition=PARTITION | Show only the specified partition | sinfo -p gpu |
| -t, --states=node_state | Filter by node state | sinfo -t idle |
| -o, --format=format | Custom output format | sinfo -o "%P %a %D %T" |
| -S, --sort=fields | Sort output | sinfo -S +P,-m |
| -i, --iterate=seconds | Refresh interval | sinfo -i 5 (refresh every 5 seconds) |
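The options in the table can be combined in a single call. The following is a minimal sketch, assuming a partition named gpu exists (as in the table's example); the function name idle_in_partition is hypothetical, and -h (--noheader) is an extra sinfo option used here so the output is easier to pipe into other tools.

```shell
# Show idle nodes in one partition using the custom format from the table.
# $1 is the partition name; "gpu" is only an example and may not exist on your cluster.
idle_in_partition() {
    # -h drops the header line; %P %a %D %T print partition, availability,
    # node count, and state.
    sinfo -h -p "$1" -t idle -o "%P %a %D %T"
}

# Usage on a cluster where Slurm is installed:
#   idle_in_partition gpu
```

Wrapping the call in a function keeps the format string in one place, so scripts that depend on the column order do not have to repeat it.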
Examples
View partition status
# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
compute* up infinite 1 down* ip-10-10-2-109
compute* up infinite 2 mix ip-10-10-2-[70,80]
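Here, PARTITION is the partition name, NODES is the number of nodes, NODELIST lists the node names, and STATE is the node state. In this output, mix means the node has some CPUs allocated and others idle, and down* means the node is down and not responding. Other common states are idle, which means the node has no jobs allocated, and allocated, which means the node has one or more jobs allocated.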
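As a rough illustration of how the NODES and STATE columns can be used, the sketch below totals the node count per state. It assumes the six-column default layout shown above (PARTITION AVAIL TIMELIMIT NODES STATE NODELIST); the function name nodes_by_state is hypothetical.

```shell
# Sum the NODES column (field 4) per STATE (field 5) from default
# `sinfo` output read on stdin. The first line is skipped as the header.
nodes_by_state() {
    awk 'NR > 1 { count[$5] += $4 }
         END    { for (s in count) print s, count[s] }' | sort
}

# Usage on a cluster where Slurm is installed:
#   sinfo | nodes_by_state
```

With the sample output above, this would print one line for down* with a count of 1 and one line for mix with a count of 2.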
View detailed partition information
sinfo -l shows more information:
# sinfo -l
PARTITION AVAIL TIMELIMIT JOB_SIZE ROOT OVERSUBS GROUPS NODES STATE NODELIST
compute* up infinite 1-infinite no NO all 2 mixed compute[1-2]
View detailed node information
sinfo -Nl shows one node per line, i.e. details for each node:
# sinfo -Nl
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT
node1 1 compute* idle 16 2:8:1 64000 0 1
node2 1 compute* alloc 16 2:8:1 64000 0 1
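Per-node output like this is convenient to filter in scripts. The sketch below lists idle nodes with their CPU count and memory; it assumes the first seven columns match the sample above (NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY), and the function name idle_nodes is hypothetical. Some Slurm versions print extra lines or trailing columns, which this filter ignores.

```shell
# Print "name cpus memory_mb" for every node whose STATE is exactly "idle".
# Reads `sinfo -Nl` output on stdin. A node shown as "idle*" (not responding)
# is deliberately not matched.
idle_nodes() {
    awk '$4 == "idle" { print $1, $5, $7 }'
}

# Usage on a cluster where Slurm is installed:
#   sinfo -Nl | idle_nodes
```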
Output Field Details
- AVAIL: Partition availability; up means available, down means unavailable.
- CPUS: Number of CPUs on each node.
- S:C:T: Number of sockets (S), cores per socket (C), and threads per core (T) on each node. One socket can contain multiple cores, and one core can run multiple threads. For example, 2:8:1 means 2 sockets with 8 cores each and 1 thread per core.
- SOCKETS: Number of CPU sockets on each node.
- CORES: Number of cores per socket on each node.
- THREADS: Number of threads per core on each node.
- GROUPS: User groups that can use the partition; all means every group can use it.
- JOB_SIZE: Minimum and maximum number of nodes that can be allocated to a user job. If only one value is shown, the minimum and maximum are the same. infinite means no limit.
- TIMELIMIT: Maximum walltime for a job (walltime is the actual elapsed time measured by a clock). infinite means no limit. If limited, the format is "days-hours:minutes:seconds".
- MEMORY: Physical memory size in MB.
- NODELIST: Node names in compressed form, such as node[1-10,11,13-28].
- NODES: Number of nodes.
- NODES(A/I): Node counts by state, in the format "allocated/idle".
- NODES(A/I/O/T): Node counts by state, in the format "allocated/idle/other/total".
- PARTITION: Partition name; a trailing * marks the default partition.
- ROOT: Whether allocating resources in the partition is restricted to the root user.
- OVERSUBSCRIBE (shown as OVERSUBS in sinfo -l output): Whether jobs in the partition can oversubscribe compute resources (such as CPUs):
  - no: resources are never oversubscribed.
  - exclusive: whole nodes are dedicated to a job (equivalent to srun --exclusive).
  - force: resources are always available to be oversubscribed.
  - yes: resources can be oversubscribed if the job requests it.
- STATE: Node state. Possible values include:
  - allocated, alloc: allocated to one or more jobs.
  - completing, comp: the jobs on the node are completing.
  - down: unavailable for use.
  - drained, drain: drained; the node accepts no jobs.
  - draining, drng: draining; the node is still running jobs but will accept no new ones.
  - fail: failed.
  - failing, failg: failing.
  - future, futr: future (available later).
  - idle: idle and can accept new jobs.
  - maint: under maintenance.
  - mixed, mix: the node is running jobs but has some idle CPU cores and can accept new jobs.
  - perfctrs, npc: unavailable because network performance counters are in use.
  - power_down, pow_dn: powered down.
  - power_up, pow_up: powering up.
  - reserved, resv: reserved.
  - unknown, unk: unknown.
  Note: a * suffix on the state means the node is not responding.
- TMP_DISK: Size of the partition where /tmp resides, in MB.
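Two of the fields above are compact encodings that scripts often need to decode. The sketch below is one possible way to do that with awk; both function names are hypothetical. It assumes TIMELIMIT appears as sinfo prints it (for example 30:00, 1:00:00, or 1-00:00:00, or the word infinite) and that CPUS equals sockets × cores per socket × threads per core, which holds for the 2:8:1 sample with 16 CPUs but may not hold on every cluster configuration.

```shell
# Convert an S:C:T value such as "2:8:1" to a logical CPU count
# (sockets * cores per socket * threads per core).
sct_to_cpus() {
    echo "$1" | awk -F: '{ print $1 * $2 * $3 }'
}

# Convert a TIMELIMIT value ("days-hours:minutes:seconds", with leading
# parts optional) to seconds. "infinite" is passed through unchanged.
timelimit_to_seconds() {
    case "$1" in
        infinite) echo infinite; return ;;
    esac
    echo "$1" | awk '{
        days = 0; rest = $0
        if (index(rest, "-") > 0) { split(rest, d, "-"); days = d[1]; rest = d[2] }
        n = split(rest, t, ":")
        # Read fields from the right so MM:SS and HH:MM:SS both work.
        s = t[n]
        m = (n >= 2) ? t[n - 1] : 0
        h = (n >= 3) ? t[n - 2] : 0
        print days * 86400 + h * 3600 + m * 60 + s
    }'
}

# Usage:
#   sct_to_cpus 2:8:1                  # 16
#   timelimit_to_seconds 1-02:03:04    # 93784
```

To expand a compressed NODELIST value into one name per line, Slurm's own scontrol show hostnames command is usually a better choice than parsing the bracket syntax by hand.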