scontrol
Overview
scontrol is Slurm's command-line tool for system control and configuration management:
- View and modify Slurm configuration
- Manage jobs, nodes, partitions, and other resources
- Most modification operations require administrator privileges
- Show real-time system status
Common Options
| Feature | Command Example | Description |
|---|---|---|
| Job Management | | |
| View jobs | scontrol show job <jobid> | Show job details |
| Job extended info | scontrol show job_ex <jobid> | Show extra job info |
| Modify job | scontrol update jobid=123 ... | Modify job parameters |
| Cancel job | scontrol kill <jobid> | Terminate a running job |
| Node Management | | |
| View node | scontrol show node <node> | Show node details |
| Modify node | scontrol update nodename=node01 ... | Update node config |
| Drain node | scontrol update nodename=node01 state=DRAIN | Set node to maintenance |
| Partition Management | | |
| View partition | scontrol show partition <name> | Show partition config |
| Modify partition | scontrol update partition=debug ... | Update partition parameters |
| Other Features | | |
| View config | scontrol show config | Show system config |
| View licenses | scontrol show lic | Show license status |
Examples
Job management
View extended job info
scontrol show job_ex 12345
Extended field details
| Field | Description |
|---|---|
| RespHost | Host name for interactive job requests |
| Port | Allocated response port (alloc_resp_port) |
| OtherPort | Other ports (used for notifications such as SRUN_PING) |
| LastActivity | Last active time for job allocation (job last_time_active) |
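When a script needs only one of these fields, it can help to pull it out of the record text. The sketch below assumes the record is printed as whitespace-separated Key=Value pairs on one line, as `scontrol -o show job <jobid>` does for standard job records; whether `job_ex` output follows the same layout is an assumption. `get_field` is a hypothetical helper and the sample record is made up for illustration.

```shell
# Print the value of one Key=Value field from a single-line record.
# $1 = field name, $2 = record text (whitespace-separated Key=Value pairs).
# Limitation: values that themselves contain spaces are cut at the first space.
get_field() {
    printf '%s\n' "$2" | tr ' ' '\n' |
        awk -F '=' -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Hypothetical sample record; on a cluster this would come from
#   scontrol -o show job 12345
sample='JobId=12345 JobName=demo JobState=RUNNING TimeLimit=1-12:00:00'

get_field TimeLimit "$sample"   # prints 1-12:00:00
```

The same helper works for any field name that appears in the record, for example `get_field JobState "$sample"`.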
Modify job parameters
scontrol update jobid=12345 TimeLimit=1-12:00:00
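The value `1-12:00:00` uses the days-hours:minutes:seconds form, so it means 1 day and 12 hours, or 36 × 60 = 2160 minutes. The sketch below does that arithmetic for the D-HH:MM:SS and HH:MM:SS forms only. `timelimit_to_minutes` is a hypothetical helper, and rounding leftover seconds up to a whole minute is a choice made for this sketch.

```shell
# Convert D-HH:MM:SS or HH:MM:SS to whole minutes.
# Leftover seconds are rounded up (an assumption of this sketch).
timelimit_to_minutes() {
    printf '%s\n' "$1" | awk '{
        days = 0; t = $0
        # Split off the optional "D-" day prefix.
        if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
        n = split(t, p, ":")
        if (n != 3) { print "unsupported format" > "/dev/stderr"; exit 1 }
        secs = ((days * 24 + p[1]) * 60 + p[2]) * 60 + p[3]
        print int((secs + 59) / 60)
    }'
}

timelimit_to_minutes 1-12:00:00   # prints 2160
```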
Modify job priority
scontrol update jobid=12345 Priority=1000
Terminate a job
scontrol kill 12345 "Maintenance required"
Node control
Set node to maintenance
scontrol update nodename=node01 state=DRAIN reason="Hardware upgrade"
Resume node
scontrol update nodename=node01 state=RESUME
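The drain and resume commands above are often wrapped so that a reason is always recorded. The following is a minimal sketch of such a wrapper. The names `run`, `drain_node`, `resume_node`, and the `DRY_RUN` variable are hypothetical. By default it only prints the command it would run, so it can be tried without a cluster; the printed form drops the quotes around the reason.

```shell
# Print (DRY_RUN=1, the default) or execute (DRY_RUN=0) a command.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1 = node name or node range, $2 = reason text
drain_node()  { run scontrol update nodename="$1" state=DRAIN reason="$2"; }
# $1 = node name or node range
resume_node() { run scontrol update nodename="$1" state=RESUME; }

drain_node node01 "Hardware upgrade"
resume_node node01
```

Setting `DRY_RUN=0` makes the same calls invoke scontrol, which requires administrator privileges.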
Partition control
Change partition state
scontrol update partition=debug state=UP
Set partition node weight
scontrol update partition=debug weight=100
State Management
Node state types
| State | Description |
|---|---|
| IDLE | Node is idle/available |
| ALLOC | Node is allocated |
| MIXED | Node is partially allocated |
| DRAIN | Node accepts no new jobs (typically set for maintenance) |
| FAIL | Node failure |
| DOWN | Node is down |
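A drained node and a draining node call for different actions, so a script may want to tell them apart before maintenance starts. The sketch below assumes the State value from `scontrol show node` joins a base state and flags with `+`, for example `IDLE+DRAIN` or `MIXED+DRAIN`, and that the allocated base state begins with `ALLOC`. `drain_status` is a hypothetical helper, and the labels it prints are this sketch's own.

```shell
# Classify a node State string.
# Assumption: flags are joined with "+", e.g. IDLE+DRAIN or MIXED+DRAIN.
drain_status() {
    case $1 in
        *DRAIN*)
            case $1 in
                ALLOC*|MIXED*) echo draining ;;   # jobs are still running
                *)             echo drained  ;;   # no jobs left on the node
            esac ;;
        *) echo not-draining ;;
    esac
}

drain_status 'MIXED+DRAIN'   # prints draining
drain_status 'IDLE+DRAIN'    # prints drained
```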
Partition state types
| State | Description |
|---|---|
| UP | Partition available |
| DOWN | Partition accepts new jobs, but queued jobs are not started |
| DRAIN | Partition accepts no new jobs; already queued jobs can still run |
Notes
- Most modification operations require administrator privileges
- Configuration changes may affect system operation; use a maintenance window
- Node state changes may take time to take effect
- Some changes may require updating the slurm.conf configuration file
- Test changes in a non-production environment before applying them in production
📌 Best practices:
- Check current status before important operations:
  scontrol show <entity>
- Verify after changes:
  scontrol reconfigure
- Record reasons for maintenance operations:
  scontrol update ... reason="<details>"
- Use node range expressions for batch operations:
  node[01-08,12]
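A node range expression such as `node[01-08,12]` stands for node01 through node08 plus node12. On a cluster, `scontrol show hostnames node[01-08,12]` performs this expansion. The sketch below only illustrates what the expression denotes without needing Slurm: `expand_nodes` is a hypothetical helper that handles a single bracket group with comma-separated numbers and ranges, not the full hostlist syntax.

```shell
# Expand a simple node range expression into one node name per line.
# Handles one [...] group only; zero padding follows the range's lower bound.
expand_nodes() {
    printf '%s\n' "$1" | awk '{
        lb = index($0, "["); rb = index($0, "]")
        if (lb == 0 || rb < lb) { print; next }     # plain name, nothing to expand
        prefix = substr($0, 1, lb - 1)
        suffix = substr($0, rb + 1)
        n = split(substr($0, lb + 1, rb - lb - 1), parts, ",")
        for (k = 1; k <= n; k++) {
            if (split(parts[k], r, "-") == 2) {
                fmt = "%s%0" length(r[1]) "d%s\n"   # keep the zero-padded width
                for (i = r[1] + 0; i <= r[2] + 0; i++)
                    printf fmt, prefix, i, suffix
            } else {
                print prefix parts[k] suffix
            }
        }
    }'
}

expand_nodes 'node[01-08,12]'   # prints node01 ... node08, then node12
```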