
Fairshare Used Factor That Considers Actual Resource Usage

Slurm uses the priority/multifactor plugin to compute job priority from several weighted factors. Its Fairshare Factor considers the difference between the resources committed to a user and the resources actually allocated. fsched (fsched-10.87) adds a FairshareUsed Factor, which instead considers the difference between committed resources and the resources actually consumed.

Purpose

When calculating a user's job priority, this factor considers the resources actually consumed by the user's completed jobs. The more those jobs consumed, the lower the FAIRSHAREUSED component of the priority, and therefore the lower the job's overall priority.

warning
  • If you upgrade to this fsched version and later downgrade to an older version, the slurmctld service will fail to start due to an incompatible state file version. In the slurmctld log you will see F: Can not recover assoc_usage state, incompatible version, got 2561 need >= 8192 <= 8704. In this case, manually delete the assoc_usage file under the cluster state directory, then restart slurmctld.
tip
  • To use the FairshareUsed Factor, you must first configure the corresponding cluster-account-user association in accounting.
  • Currently, the FairshareUsed Factor supports only the default Fair Tree fairshare algorithm; the classic fairshare algorithm is not supported.
  • If you give the FairshareUsed Factor a relatively large weight, then the more resources a user actually consumes, the lower the priority values of that user's pending jobs.
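
The recovery step from the downgrade warning above can be sketched in Python. This is only an illustration, not an fsched tool: the function name and the backup suffix are hypothetical, and the sketch renames the assoc_usage file instead of deleting it so that it can be restored if needed. It assumes the cluster state directory is the one configured as StateSaveLocation in slurm.conf and that slurmctld is not running.

```python
import shutil
import time
from pathlib import Path


def set_aside_assoc_usage(state_dir, suffix=None):
    """Rename <state_dir>/assoc_usage so slurmctld no longer finds it.

    Returns the backup path, or None if the file does not exist.
    The function name and backup naming scheme are hypothetical.
    """
    src = Path(state_dir) / "assoc_usage"
    if not src.is_file():
        return None
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"assoc_usage.bak-{suffix}")
    shutil.move(str(src), str(dst))
    return dst


# Example call (the path is an assumption; use your own state directory):
# set_aside_assoc_usage("/var/spool/slurmctld")
```

After the file has been set aside, restart slurmctld as described in the warning.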

Cluster Configuration

PriorityWeightFairshareUsed: a cluster custom parameter of integer type that represents the weight of FairshareUsed Factor.

Inspection Tools

sshare

sshare lists the resource shares of the associations in the cluster. fsched (fsched-10.87) adds a new sshare option, --ext, that includes information related to the FairshareUsed Factor.

sprio

sprio shows the factors that make up a job's scheduling priority. fsched (fsched-10.87) adds a new sprio option, --ext, that includes information related to the FairshareUsed Factor.

Usage Example

  1. Confirm that the corresponding association for cluster-account-user is configured in accounting.

    root@ubuntu22-4c-3:~# sacctmgr list assoc cluster=fastone-5
    Cluster Account User Partition Share Priority GrpJobs GrpTRES GrpSubmit GrpWall GrpTRESMins MaxJobs MaxTRES MaxTRESPerNode MaxSubmit MaxWall MaxTRESMins QOS Def QOS GrpTRESRunMin
    ---------- ---------- ---------- ---------- --------- ---------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
    fastone-5 root 1 normal
    fastone-5 root root 1 normal
    fastone-5 _fsched_a+ 1 normal
    fastone-5 jj 1 normal
    fastone-5 jj jj partition+ 1 user-fastone-5-jj-p+
    fastone-5 test1 1 normal
    fastone-5 test1 alice partition+ 1 fastone-5-test1-par+
    fastone-5 test1 bob partition+ 1 fastone-5-test1-par+
    fastone-5 test2 1 normal
    fastone-5 test2 charlie partition+ 1 fastone-5-test2-par+
  2. Modify the cluster custom parameters: set the priority plugin to priority/multifactor and set the desired PriorityWeightFairshareUsed. In this example, PriorityWeightFairshareUsed is set much higher than PriorityWeightFairshare so that actual resource consumption has the larger impact on priority.

    PriorityType=priority/multifactor
    PriorityWeightFairshare=30
    PriorityWeightFairshareUsed=3000
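  3. To test the impact of the new factor, submit jobs as one user that allocate resources but consume few, and jobs as another user that consume a lot. Here, alice's sleep jobs each allocate four tasks but use almost no CPU, while charlie's stress-ng jobs keep their CPU fully loaded.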

    alice@ubuntu22-4c-1:~$ srun -n4 sleep 100&
    [1] 2512319
    alice@ubuntu22-4c-1:~$ srun -n4 sleep 100&
    [2] 2512372
    charlie@ubuntu22-4c-2:~$ srun stress-ng --cpu 1 --cpu-load 100 -t 100s &
    [1] 164380
    charlie@ubuntu22-4c-2:~$ srun stress-ng --cpu 1 --cpu-load 100 -t 100s &
    [2] 164387
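  4. Before and after the resource-intensive jobs complete, use sshare --ext to view resource shares and compare the changes. The first output below was captured before the jobs completed and the second after. Once charlie's jobs complete, charlie's RawUsageUse rises and FairShareU drops, while FairShare stays the same.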

    root@ubuntu22-4c-3:~# sshare -a --ext
    Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare RawUsageUse EffectvUsageU FairShareU
    -------------------- ---------- ---------- ----------- ----------- ------------- ---------- ----------- ------------- ----------
    root                                          0.000000         517      1.000000                     14      0.000000
    root                       root          1    0.200000           0      0.000000   0.800000           0      0.000815   0.600000
    _fsched_all                              1    0.200000           0      0.000000                      0      0.000000
    jj                                       1    0.200000           0      0.000000                      0      0.000000
    jj                           jj          1    1.000000           0      0.000000   1.000000           0      0.000000   1.000000
    test1                                    1    0.200000         394      0.763174                     14      0.999185
    test1                     alice          1    0.500000         394      1.000000   0.200000          14      1.000000   0.200000
    test1                       bob          1    0.500000           0      0.000000   0.400000           0      0.000000   0.400000
    test2                                    1    0.200000         122      0.236825                      0      0.000000
    test2                   charlie          1    1.000000         122      1.000000   0.600000           0      0.000000   1.000000
    root@ubuntu22-4c-3:~# sshare -a --ext
    Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare RawUsageUse EffectvUsageU FairShareU
    -------------------- ---------- ---------- ----------- ----------- ------------- ---------- ----------- ------------- ----------
    root                                          0.000000         898      1.000000                 177817      0.000000
    root                       root          1    0.200000           0      0.000000   0.800000           0      0.000000   0.800000
    _fsched_all                              1    0.200000           0      0.000000                      0      0.000000
    jj                                       1    0.200000           0      0.000000                      0      0.000000
    jj                           jj          1    1.000000           0      0.000000   1.000000           0      0.000000   1.000000
    test1                                    1    0.200000         705      0.784619                     14      0.000081
    test1                     alice          1    0.500000         705      1.000000   0.200000          14      1.000000   0.400000
    test1                       bob          1    0.500000           0      0.000000   0.400000           0      0.000000   0.600000
    test2                                    1    0.200000         193      0.215381                 177803      0.999919
    test2                   charlie          1    1.000000         193      1.000000   0.600000      177803      1.000000   0.200000
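
The FairShare and FairShareU values above are all multiples of 1/5. That is consistent with a rank-based assignment over the five user associations, where each user gets rank divided by the number of users and tied users share a rank. The Python sketch below reproduces the FairShareU columns under that assumption. It is not fsched's implementation: the function name is hypothetical, and the best-to-worst ordering of users is read off the two listings instead of being derived from the account tree.

```python
def rank_factors(ranked_groups):
    """Map each user to rank / total_users.

    ranked_groups lists users from best to worst; users that tie are
    placed in the same inner list and receive the same factor.
    """
    total = sum(len(group) for group in ranked_groups)
    factors = {}
    rank = total  # the best rank equals the number of users
    for group in ranked_groups:
        for user in group:
            factors[user] = rank / total
        rank -= len(group)  # a tie still uses up one rank per user
    return factors


# Ordering assumed from the first listing: jj and charlie have no recorded
# consumption and tie for the best rank.
before = rank_factors([["jj", "charlie"], ["root"], ["bob"], ["alice"]])

# Ordering assumed from the second listing: charlie now has the largest
# actual consumption and ranks last.
after = rank_factors([["jj"], ["root"], ["bob"], ["alice"], ["charlie"]])

print(before)
print(after)
```

Under these assumed orderings, charlie moves from 1.0 to 0.2 and alice from 0.2 to 0.4, matching the FairShareU columns of the two listings.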
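  5. Before and after the resource-intensive jobs complete, use sprio --ext to view the priority changes of the users' pending jobs. The first output below was captured before the jobs completed and the second after. The TRES column is empty in both.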

    root@ubuntu22-4c-3:~# sprio --ext -l
    JOBID PARTITION    USER PRIORITY SITE AGE ASSOC FAIRSHARE JOBSIZE PARTITION QOS NICE TRES FAIRSHAREU
      674 partition   alice      606    0   0     0         6       0         0   0    0             600
      675 partition charlie     3018    0   0     0        18       0         0   0    0            3000
    root@ubuntu22-4c-3:~# sprio --ext -l
    JOBID PARTITION    USER PRIORITY SITE AGE ASSOC FAIRSHARE JOBSIZE PARTITION QOS NICE TRES FAIRSHAREU
      675 partition charlie      618    0   0     0        18       0         0   0    0             600
      676 partition   alice     1206    0   0     0         6       0         0   0    0            1200
      677 partition charlie      618    0   0     0        18       0         0   0    0             600
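
The numbers in the sprio output can be checked by hand. With every other factor at zero, each job's PRIORITY equals the FAIRSHARE component plus the FAIRSHAREU component, and each component equals its configured weight multiplied by the matching factor from sshare. A small Python sketch of that arithmetic follows; the function name is hypothetical, and the rounding to whole numbers is an assumption made to match the displayed values.

```python
W_FAIRSHARE = 30         # PriorityWeightFairshare from step 2
W_FAIRSHARE_USED = 3000  # PriorityWeightFairshareUsed from step 2


def priority_parts(fairshare, fairshare_used):
    """Return (FAIRSHARE, FAIRSHAREU, total) for the given factors.

    All other priority factors are taken as zero, as in the example.
    """
    fs = round(W_FAIRSHARE * fairshare)
    fsu = round(W_FAIRSHARE_USED * fairshare_used)
    return fs, fsu, fs + fsu


# Factors taken from the FairShare and FairShareU columns of sshare --ext.
print("alice before:  ", priority_parts(0.2, 0.2))  # (6, 600, 606)
print("charlie before:", priority_parts(0.6, 1.0))  # (18, 3000, 3018)
print("alice after:   ", priority_parts(0.2, 0.4))  # (6, 1200, 1206)
print("charlie after: ", priority_parts(0.6, 0.2))  # (18, 600, 618)
```

Because the FairshareUsed weight is 100 times the Fairshare weight here, the drop in charlie's FairShareU from 1.0 to 0.2 outweighs charlie's higher FairShare, and alice's pending job overtakes charlie's.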