QoS New Policy Parameters
fsched (fsched-10.61 and later) supports two new QoS policy parameters:
MaxTRESRunMinsPerAccount - For each account associated with this QoS, the maximum total resource minutes that all running jobs of that account can use.
MaxTRESRunMinsPerUser - For each user associated with this QoS, the maximum total resource minutes that all running jobs of that user can use.
Here, a job's resource minutes are its allocated resource count (for example, CPUs) multiplied by its remaining time limit in minutes.
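As a rough illustration of how these limits are evaluated, the POSIX shell sketch below models resource minutes as resource count multiplied by time limit in minutes, and checks whether a new job still fits under a limit. It is a simplification under stated assumptions: it handles a single resource type (CPU), ignores the time running jobs have already used, and the helper names run_mins and fits_limit are hypothetical, not part of fsched or Slurm.

```shell
#!/bin/sh
# Simplified model (an assumption, not fsched/Slurm source code):
# resource minutes = resource count x time limit in minutes.

# run_mins CPUS MINUTES: print the CPU-minutes a job with this request needs.
run_mins() {
    echo $(( $1 * $2 ))
}

# fits_limit USED NEW LIMIT: succeed (status 0) if the CPU-minutes already
# held by running jobs (USED) plus the new job's CPU-minutes (NEW) stay
# within LIMIT; fail (status 1) otherwise, meaning the new job would pend.
fits_limit() {
    [ $(( $1 + $2 )) -le "$3" ]
}

# Example: a running 1-CPU, 2-minute job plus a new 1-CPU, 1-minute job,
# checked against a per-user limit of cpu=2.
used=$(run_mins 1 2)
new=$(run_mins 1 1)
if fits_limit "$used" "$new" 2; then
    echo "new job can start"
else
    echo "new job pends"
fi
```

Run as-is, the example prints "new job pends", because 2 + 1 = 3 CPU-minutes exceeds the limit of 2.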
Warning
- Manually modifying Slurm accounting associations and QoS settings conflicts with the cluster quota settings in the UI. Do not use them together with the UI cluster quota feature.
- In addition to updating fsched, you must also update the slurmdbd container image. Otherwise, setting these two new parameters can fail silently: if the value you set for MaxTRESRunMinsPerAccount equals the current value of MaxTRESRunMinsPerUser, the setting is not applied and no error is reported.
- If you submit a job without using -t to specify time_limit, the default time_limit is unlimited. In that case, if MaxTRESRunMinsPerAccount or MaxTRESRunMinsPerUser is set, the job stays pending indefinitely, because its resource minutes always exceed the limit.
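One way to avoid accidentally submitting a job with an unlimited time limit is a small pre-submission check. The sketch below is hypothetical (has_time_limit and submit are illustrative names, not fsched commands). It only inspects command-line arguments, so it does not see #SBATCH directives in batch scripts, and it does not distinguish srun options from the arguments of the program being run.

```shell
#!/bin/sh
# has_time_limit ARGS...: succeed if the arguments contain a time limit in
# one of srun's forms: -t VALUE, -tVALUE, --time VALUE, or --time=VALUE.
# Hypothetical helper: it scans arguments only, not #SBATCH lines.
has_time_limit() {
    for arg in "$@"; do
        case "$arg" in
            -t|-t?*|--time|--time=*) return 0 ;;
        esac
    done
    return 1
}

# submit ARGS...: hypothetical wrapper that refuses to call srun unless a
# time limit was given, because a job without one would pend indefinitely
# once MaxTRESRunMinsPerAccount or MaxTRESRunMinsPerUser is set.
submit() {
    if ! has_time_limit "$@"; then
        echo "error: specify a time limit with -t/--time" >&2
        return 1
    fi
    srun "$@"
}
```

For example, submit -t 2 sleep 100 runs srun -t 2 sleep 100, while submit sleep 100 prints an error and returns a non-zero status without calling srun.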
Example
1. Assume users alice and bob exist, both with primary group jjtest.
    root@head1:~# id alice
    uid=2006(alice) gid=2013(jjtest) groups=2001(fsadmin),2003(defaultGroup),2013(jjtest)
    root@head1:~# id bob
    uid=2007(bob) gid=2013(jjtest) groups=2001(fsadmin),2003(defaultGroup),2013(jjtest)
2. Create associations for the current cluster (fastone-18 in this example) and for users alice and bob:
    sacctmgr add account jjtest cluster=fastone-18
    sacctmgr add user alice account=jjtest cluster=fastone-18
    sacctmgr add user bob account=jjtest cluster=fastone-18
3. Create a QoS and set MaxTRESRunMinsPerAccount and MaxTRESRunMinsPerUser (3 and 2 CPU-minutes, respectively):
    sacctmgr add qos test_tres
    sacctmgr modify qos test_tres set MaxTRESRunMinsPerAccount=cpu=3
    sacctmgr modify qos test_tres set MaxTRESRunMinsPerUser=cpu=2
4. Associate the QoS with the associations of users alice and bob:
    sacctmgr modify user alice account=jjtest cluster=fastone-18 set qos=test_tres
    sacctmgr modify user bob account=jjtest cluster=fastone-18 set qos=test_tres
5. User alice submits a job with a time_limit of 2 minutes, then another job with a time_limit of 1 minute. The second job stays pending with reason MaxCpuRunMinsPerUser:
    alice@head1:~$ srun -t 2 sleep 100&
    alice@head1:~$ srun -t 1 sleep 100&
    alice@head1:~$ squeue
    JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
    40 partition sleep alice PD 0:00 1 (MaxCpuRunMinsPerUser)
    39 partition sleep alice R 0:07 1 compute1
6. Before the jobs in step 5 finish, user bob submits a job with a time_limit of 2 minutes. This job stays pending with reason MaxCpuRunMinsPerAccount:
    bob@head1:~$ srun -t 2 sleep 600&
    bob@head1:~$ squeue
    JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
    42 partition sleep alice PD 0:00 1 (MaxCpuRunMinsPerUser)
    43 partition sleep bob PD 0:00 1 (MaxCpuRunMinsPerAccount)
    41 partition sleep alice R 0:21 1 compute1
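The pending reasons in the last two steps follow from simple arithmetic. The worked check below assumes each job is allocated one CPU (consistent with job 39 starting under a 2 CPU-minute per-user limit) and that a running job is charged for its remaining time limit. The few seconds the running jobs have already used are ignored, which does not change either outcome.

$$
\begin{aligned}
\text{job 40 (alice), per user:}\quad & \underbrace{1 \times 2}_{\text{job 39}} + \underbrace{1 \times 1}_{\text{job 40}} = 3 > 2 = \text{MaxTRESRunMinsPerUser} \\
\text{job 43 (bob), per user:}\quad & 1 \times 2 = 2 \le 2 \\
\text{job 43 (bob), per account jjtest:}\quad & \underbrace{1 \times 2}_{\text{job 41}} + \underbrace{1 \times 2}_{\text{job 43}} = 4 > 3 = \text{MaxTRESRunMinsPerAccount}
\end{aligned}
$$

Bob's job passes the per-user check but fails the per-account check, which matches the MaxCpuRunMinsPerAccount reason. Pending jobs such as 42 do not count toward either total, because only running jobs consume resource minutes.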
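If the jobs in steps 5 and 6 start instead of pending, the QoS limits may not have been applied (see the slurmdbd warning above). The sketch below compares the stored values with the expected ones. It assumes that sacctmgr in your fsched/Slurm build accepts MaxTRESRunMinsPerAccount and MaxTRESRunMinsPerUser as show qos format fields and prints them in the same cpu=N form used to set them; check_qos_limits is a hypothetical helper name.

```shell
#!/bin/sh
# check_qos_limits QOS PER_ACCOUNT PER_USER: compare the QoS limits stored in
# the accounting database with the expected values.
# Assumption: this sacctmgr build accepts these two names as format fields.
check_qos_limits() {
    # -n: no header line; -P: '|'-separated output without a trailing '|'
    actual=$(sacctmgr -n -P show qos where name="$1" \
        format=MaxTRESRunMinsPerAccount,MaxTRESRunMinsPerUser)
    expected="$2|$3"
    if [ "$actual" = "$expected" ]; then
        echo "QoS $1 OK: $actual"
    else
        echo "QoS $1 mismatch: expected $expected, got '$actual'" >&2
        return 1
    fi
}
```

For the QoS in this example, run check_qos_limits test_tres cpu=3 cpu=2; a mismatch suggests re-checking the slurmdbd image version before setting the values again.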