Fairshare Used Factor That Considers Actual Resource Usage
Slurm uses the priority/multifactor plugin for multifactor job scheduling. Its Fairshare Factor considers the difference between committed resources and actually allocated resources. fsched (fsched-10.87) adds a new FairshareUsed Factor, which instead considers the difference between committed resources and actually consumed resources.
Purpose
When calculating a user's job priority, this factor considers the actual resources consumed by the user's completed jobs. The more a user has actually consumed, the lower the FairshareUsed component of the priority (shown as FAIRSHAREU by sprio --ext), and therefore the lower the overall priority.
- If you upgrade to this fsched version and later downgrade to an older version, the slurmctld service will fail to start because of an incompatible state file version. The slurmctld log will show: F: Can not recover assoc_usage state, incompatible version, got 2561 need >= 8192 <= 8704. In this case, manually delete the assoc_usage file under the cluster state directory, then restart slurmctld.
- To use the FairshareUsed Factor, you must first configure the corresponding association for cluster-account-user in accounting.
- Currently, the FairshareUsed Factor supports only the default Fair Tree fairshare algorithm; it does not support the classic fairshare algorithm.
- If you set a relatively large weight for the FairshareUsed Factor, then the more resources a user actually consumes, the lower the priority values of that user's pending jobs.
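Because only Fair Tree is supported, the FairShareU values reported by sshare --ext behave like rankings rather than ratios: with five user associations they move in steps of 0.2. Slurm's Fair Tree documentation describes the final factor as roughly rank / number of user associations. The Python sketch below assumes the FairshareUsed Factor is ranked the same way. The function name, the pre-sorted input, and the tie handling are illustrative assumptions inferred from the sshare listings in the usage example, not fsched's implementation.

```python
def fair_tree_factors(ranked_groups):
    """Map users to factors, given groups ordered best-first.

    Each inner list holds users that tie with each other (assumption:
    tied users share a rank and still use up one rank each).
    """
    total = sum(len(group) for group in ranked_groups)
    rank = total
    factors = {}
    for group in ranked_groups:
        for user in group:
            factors[user] = rank / total
        rank -= len(group)
    return factors


# Ordering read off the FairShareU column before the stress-ng jobs finished:
# jj and charlie tie for first place.
before = fair_tree_factors([["jj", "charlie"], ["root"], ["bob"], ["alice"]])
# Ordering after the jobs finished: charlie drops to last place.
after = fair_tree_factors([["jj"], ["root"], ["bob"], ["alice"], ["charlie"]])

print(before["charlie"], after["charlie"])  # 1.0 0.2
```

Under these assumptions the sketch reproduces the FairShareU column of both listings, which is why a single heavy consumer can move several other users up by one step.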
Cluster Configuration
PriorityWeightFairshareUsed: a cluster custom parameter of integer type that represents the weight of FairshareUsed Factor.
Inspection Tools
sshare
sshare lists the resource shares of the associations in the cluster. fsched (fsched-10.87) adds a new sshare option, --ext, that includes information related to the FairshareUsed Factor.
sprio
sprio is used to view the factors that make up job scheduling priority. fsched (fsched-10.87) adds a new sprio option --ext that includes information related to FairshareUsed Factor.
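The usage example compares the output of these tools before and after a set of jobs completes. A small helper can make that comparison less error-prone. The following Python sketch is one possible approach, not part of fsched: the function names are hypothetical, and the only command-line options it uses are the ones shown in this document (sshare -a --ext).

```python
import difflib
import shutil
import subprocess


def capture(cmd):
    """Run a command and return its standard output as a list of lines."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def changed_lines(before, after):
    """Keep only the lines that were removed ("- ") or added ("+ ")."""
    return [line for line in difflib.ndiff(before, after)
            if line.startswith(("- ", "+ "))]


# Only runs on a host where sshare is installed.
if __name__ == "__main__" and shutil.which("sshare"):
    before = capture(["sshare", "-a", "--ext"])
    input("Press Enter once the test jobs have finished... ")
    after = capture(["sshare", "-a", "--ext"])
    print("\n".join(changed_lines(before, after)))
```

The same two functions work for sprio by passing ["sprio", "--ext", "-l"] to capture.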
Usage Example
- Confirm that the corresponding association for cluster-account-user is configured in accounting.
root@ubuntu22-4c-3:~# sacctmgr list assoc cluster=fastone-5
Cluster Account User Partition Share Priority GrpJobs GrpTRES GrpSubmit GrpWall GrpTRESMins MaxJobs MaxTRES MaxTRESPerNode MaxSubmit MaxWall MaxTRESMins QOS Def QOS GrpTRESRunMin
---------- ---------- ---------- ---------- --------- ---------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
fastone-5 root 1 normal
fastone-5 root root 1 normal
fastone-5 _fsched_a+ 1 normal
fastone-5 jj 1 normal
fastone-5 jj jj partition+ 1 user-fastone-5-jj-p+
fastone-5 test1 1 normal
fastone-5 test1 alice partition+ 1 fastone-5-test1-par+
fastone-5 test1 bob partition+ 1 fastone-5-test1-par+
fastone-5 test2 1 normal
fastone-5 test2 charlie partition+ 1 fastone-5-test2-par+
- Modify the cluster custom parameters: set the priority plugin to priority/multifactor and set the desired PriorityWeightFairshareUsed. Here, a larger PriorityWeightFairshareUsed is configured so that actual resource consumption has a larger impact on priority.
PriorityType=priority/multifactor
PriorityWeightFairshare=30
PriorityWeightFairshareUsed=3000
- Submit jobs that consume more resources as some users, and jobs that allocate resources but consume few of them as other users, to test the impact of the new factor. Here, alice's sleep jobs each allocate four tasks but stay idle, while charlie's stress-ng jobs each keep one CPU fully loaded.
alice@ubuntu22-4c-1:~$ srun -n4 sleep 100&
[1] 2512319
alice@ubuntu22-4c-1:~$ srun -n4 sleep 100&
[2] 2512372
charlie@ubuntu22-4c-2:~$ srun stress-ng --cpu 1 --cpu-load 100 -t 100s &
[1] 164380
charlie@ubuntu22-4c-2:~$ srun stress-ng --cpu 1 --cpu-load 100 -t 100s &
[2] 164387
- Before and after the resource-intensive jobs complete, use sshare --ext to view resource shares and compare the changes. The first listing was taken before the jobs completed, the second after.
root@ubuntu22-4c-3:~# sshare -a --ext
             Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare RawUsageUse EffectvUsageU FairShareU
-------------------- ---------- ---------- ----------- ----------- ------------- ---------- ----------- ------------- ----------
root                                          0.000000         517      1.000000                     14      0.000000
root                       root          1    0.200000           0      0.000000   0.800000           0      0.000815   0.600000
_fsched_all                              1    0.200000           0      0.000000                      0      0.000000
jj                                       1    0.200000           0      0.000000                      0      0.000000
jj                           jj          1    1.000000           0      0.000000   1.000000           0      0.000000   1.000000
test1                                    1    0.200000         394      0.763174                     14      0.999185
test1                     alice          1    0.500000         394      1.000000   0.200000          14      1.000000   0.200000
test1                       bob          1    0.500000           0      0.000000   0.400000           0      0.000000   0.400000
test2                                    1    0.200000         122      0.236825                      0      0.000000
test2                   charlie          1    1.000000         122      1.000000   0.600000           0      0.000000   1.000000
root@ubuntu22-4c-3:~# sshare -a --ext
             Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare RawUsageUse EffectvUsageU FairShareU
-------------------- ---------- ---------- ----------- ----------- ------------- ---------- ----------- ------------- ----------
root                                          0.000000         898      1.000000                 177817      0.000000
root                       root          1    0.200000           0      0.000000   0.800000           0      0.000000   0.800000
_fsched_all                              1    0.200000           0      0.000000                      0      0.000000
jj                                       1    0.200000           0      0.000000                      0      0.000000
jj                           jj          1    1.000000           0      0.000000   1.000000           0      0.000000   1.000000
test1                                    1    0.200000         705      0.784619                     14      0.000081
test1                     alice          1    0.500000         705      1.000000   0.200000          14      1.000000   0.400000
test1                       bob          1    0.500000           0      0.000000   0.400000           0      0.000000   0.600000
test2                                    1    0.200000         193      0.215381                 177803      0.999919
test2                   charlie          1    1.000000         193      1.000000   0.600000      177803      1.000000   0.200000
- Before and after the resource-intensive jobs complete, use sprio --ext to view the priority changes of users' pending jobs. The first listing was taken before the jobs completed, the second after.
root@ubuntu22-4c-3:~# sprio --ext -l
JOBID PARTITION USER    PRIORITY SITE AGE ASSOC FAIRSHARE JOBSIZE PARTITION QOS NICE TRES FAIRSHAREU
  674 partition alice        606    0   0     0         6       0         0   0    0             600
  675 partition charlie     3018    0   0     0        18       0         0   0    0            3000
root@ubuntu22-4c-3:~# sprio --ext -l
JOBID PARTITION USER    PRIORITY SITE AGE ASSOC FAIRSHARE JOBSIZE PARTITION QOS NICE TRES FAIRSHAREU
  675 partition charlie      618    0   0     0        18       0         0   0    0             600
  676 partition alice       1206    0   0     0         6       0         0   0    0            1200
  677 partition charlie      618    0   0     0        18       0         0   0    0             600
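The sprio values can be cross-checked against the sshare listings. The Python sketch below assumes that each priority component equals its configured weight multiplied by the corresponding factor, rounded to an integer, and that all other factors contribute 0, as they do in the listings above. The function and variable names are illustrative; the weights and factors are copied from this example.

```python
# Weights from the cluster configuration used in this example.
WEIGHTS = {"FAIRSHARE": 30, "FAIRSHAREU": 3000}

# (FairShare, FairShareU) per user, copied from the two sshare --ext listings.
SSHARE = {
    "before": {"alice": (0.2, 0.2), "charlie": (0.6, 1.0)},
    "after": {"alice": (0.2, 0.4), "charlie": (0.6, 0.2)},
}


def priority_components(fairshare, fairshare_used):
    """Return (FAIRSHARE, FAIRSHAREU, PRIORITY) under the stated assumptions."""
    fs = round(WEIGHTS["FAIRSHARE"] * fairshare)
    fsu = round(WEIGHTS["FAIRSHAREU"] * fairshare_used)
    return fs, fsu, fs + fsu


for when, users in SSHARE.items():
    for user, factors in users.items():
        print(when, user, priority_components(*factors))
# before alice (6, 600, 606)
# before charlie (18, 3000, 3018)
# after alice (6, 1200, 1206)
# after charlie (18, 600, 618)
```

These match the PRIORITY, FAIRSHARE, and FAIRSHAREU columns reported by sprio: once charlie's CPU-intensive jobs finish, the priority of charlie's pending jobs falls from 3018 to 618, while alice, whose jobs allocated CPUs but left them idle, rises from 606 to 1206.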