
QoS Rejection Policy Flags

fsched supports QoS (Quality of Service) rejection policy flags that control what happens when a job exceeds a resource limit. With these flags, administrators can decide, per limit type, whether an over-limit job is rejected at submission or left to wait in the queue.

tip

This feature applies to fsched 10.101 and later.

Flag Details

DenyOnMaxPerJob

When DenyOnMaxPerJob is set, if a job requests resources that exceed the QoS per-job maximum resource limit (MaxTRESPerJob), the job is rejected immediately instead of entering the queue.

Use cases:

  • Strictly limit per-job resource usage.
  • Keep users from submitting oversized jobs that then occupy the queue for a long time.

DenyOnMaxPerUser

When DenyOnMaxPerUser is set, if a job would cause the user's resource usage to exceed the QoS per-user maximum resource limit (MaxTRESPerUser), the job is rejected immediately instead of entering the queue.

Use cases:

  • Strictly limit per-user resource usage.
  • Prevent a single user from consuming too many cluster resources.

DenyOnGrp

When DenyOnGrp is set, if a job would cause the group's resource usage to exceed the QoS group maximum resource limit (GrpTRES), the job is rejected immediately instead of entering the queue.

Use cases:

  • Strictly limit total resource usage for a group.
  • Prevent a group's jobs from over-consuming cluster resources.

How to Use

View QoS Flags

sacctmgr show qos format=name,flags
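To check a single flag from a script, the output can be filtered. The following is a minimal sketch, not part of fsched: `qos_has_flag` is a hypothetical helper, and it assumes that fsched's sacctmgr accepts the Slurm-style `-n` (no header) and `-P` (pipe-delimited) options and prints one `name|flags` line per QoS with flags separated by commas.

```shell
# qos_has_flag QOS FLAG  (hypothetical helper)
# Reads "name|flags" lines on stdin and returns 0 if QOS lists FLAG, 1 otherwise.
# Assumed input shape: output of `sacctmgr -nP show qos format=name,flags`.
# The comparison is exact and case-sensitive.
qos_has_flag() {
    awk -F'|' -v qos="$1" -v flag="$2" '
        $1 == qos {
            n = split($2, flags, ",")
            for (i = 1; i <= n; i++)
                if (flags[i] == flag) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Intended use (requires a working sacctmgr, so it is left commented out):
# sacctmgr -nP show qos format=name,flags | qos_has_flag limited_job DenyOnMaxPerJob \
#     && echo "DenyOnMaxPerJob is set on limited_job"
```

Reading from stdin keeps the helper independent of the exact sacctmgr invocation, so the command can be adjusted if the output format differs on your installation.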

Set Rejection Policy Flags

Use sacctmgr to set QoS flags:

# Set a single flag
sacctmgr modify qos <qos_name> set flags=DenyOnMaxPerJob

# Set multiple flags
sacctmgr modify qos <qos_name> set flags=DenyOnMaxPerJob,DenyOnMaxPerUser

# Add a flag to existing flags
sacctmgr modify qos <qos_name> set flags+=DenyOnGrp

Remove Rejection Policy Flags

# Remove a specific flag
sacctmgr modify qos <qos_name> set flags-=DenyOnMaxPerJob

Configuration Examples

Example 1: Limit Per-Job Resources and Reject Oversized Jobs Immediately

  1. Create a QoS and set the per-job maximum CPU count to 16.
sacctmgr add qos limited_job
sacctmgr modify qos limited_job set MaxTRESPerJob=cpu=16
  2. Set the DenyOnMaxPerJob flag.
sacctmgr modify qos limited_job set flags=DenyOnMaxPerJob
  3. Associate the QoS with a user.
sacctmgr modify user alice account=myaccount set qos=limited_job
  4. Test: when user alice submits a job requesting more than 16 CPUs, it is rejected immediately.
alice@head:~$ sbatch -n 20 job.sh
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy

Example 2: Limit Total User Resources and Reject Oversized Jobs Immediately

  1. Create a QoS and set the per-user maximum CPU count to 32.
sacctmgr add qos limited_user
sacctmgr modify qos limited_user set MaxTRESPerUser=cpu=32
  2. Set the DenyOnMaxPerUser flag.
sacctmgr modify qos limited_user set flags=DenyOnMaxPerUser
  3. Associate the QoS with a user.
sacctmgr modify user bob account=myaccount set qos=limited_user
  4. Test: when user bob already has jobs using 24 CPUs, submitting another job requesting 16 CPUs is rejected immediately, because 24 + 16 = 40 CPUs would exceed the limit of 32.
bob@head:~$ squeue -u bob
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
101 partition job1 bob R 0:30 3 compute[1-3]

bob@head:~$ sbatch -n 16 job2.sh
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy
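The default squeue listing above does not show CPU counts, so the 24 CPUs in use are not visible in it. As a rough sketch, a user's running CPU total can be summed in the shell. `sum_cpus` is a hypothetical helper, and the commented squeue call assumes that fsched's squeue accepts the Slurm-style `-h` (no header), `-t R` (running jobs only), and `-o "%C"` (CPU count per job) options.

```shell
# sum_cpus  (hypothetical helper)
# Reads one integer per line on stdin and prints the total (0 for empty input).
sum_cpus() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Intended use (requires a working squeue, so it is left commented out):
# in_use=$(squeue -u bob -h -t R -o "%C" | sum_cpus)
# echo "bob is using $in_use CPUs; a 16-CPU job would bring this to $((in_use + 16))"
```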

Example 3: Combine Multiple Flags

  1. Create a QoS and set multiple limits.
sacctmgr add qos strict_qos
sacctmgr modify qos strict_qos set MaxTRESPerJob=cpu=16
sacctmgr modify qos strict_qos set MaxTRESPerUser=cpu=32
sacctmgr modify qos strict_qos set GrpTRES=cpu=64
  2. Set all three rejection policy flags at the same time.
sacctmgr modify qos strict_qos set flags=DenyOnMaxPerJob,DenyOnMaxPerUser,DenyOnGrp

With this configuration:

  • A single job requesting more than 16 CPUs is rejected.
  • A new job is rejected when it would push the user's total usage above 32 CPUs.
  • A new job is rejected when it would push the group's total usage above 64 CPUs.
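The three rules above can be modeled as a small shell function. This is only an illustration of the arithmetic, not fsched's implementation: `admit_job` is a hypothetical name, the limits are hard-coded from Example 3, and the order in which the checks run is an assumption.

```shell
# admit_job REQUESTED USER_IN_USE GROUP_IN_USE  (hypothetical model)
# Prints "accept" or "reject: <limit name>" using the Example 3 CPU limits.
admit_job() {
    requested=$1
    user_in_use=$2
    group_in_use=$3
    max_per_job=16    # MaxTRESPerJob=cpu=16
    max_per_user=32   # MaxTRESPerUser=cpu=32
    grp_limit=64      # GrpTRES=cpu=64

    # The check order below is an assumption made for this sketch.
    if [ "$requested" -gt "$max_per_job" ]; then
        echo "reject: MaxTRESPerJob"
    elif [ $((user_in_use + requested)) -gt "$max_per_user" ]; then
        echo "reject: MaxTRESPerUser"
    elif [ $((group_in_use + requested)) -gt "$grp_limit" ]; then
        echo "reject: GrpTRES"
    else
        echo "accept"
    fi
}

admit_job 20 0 0     # per-job limit: 20 > 16
admit_job 16 24 24   # per-user limit: 24 + 16 = 40 > 32
admit_job 8 8 60     # group limit: 60 + 8 = 68 > 64
admit_job 8 8 40     # within all three limits
```

Note that a request exactly at a limit (for example, 16 CPUs per job) is accepted in this model, since the text describes rejection only when a limit is exceeded.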

Behavior Comparison

Without Rejection Policy Flags (Default Behavior)

When a job exceeds resource limits, it enters the PENDING state and waits for resources to be released before it can run.

bob@head:~$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
102 partition job2 bob PD 0:00 1 (MaxCpuPerUser)
101 partition job1 bob R 0:30 3 compute[1-3]

With Rejection Policy Flags Enabled

When a job exceeds resource limits, it is rejected immediately and the submission fails.

bob@head:~$ sbatch job2.sh
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy
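Because the submission now fails outright, scripts that call sbatch may want to tell a policy rejection apart from other errors. A minimal sketch follows, assuming the error text is exactly the message shown in the examples on this page; `is_qos_rejection` and `submit_job` are hypothetical helper names.

```shell
# is_qos_rejection TEXT  (hypothetical helper)
# Returns 0 if TEXT contains the policy rejection message shown above.
is_qos_rejection() {
    case "$1" in
        *"Job violates accounting/QOS policy"*) return 0 ;;
        *) return 1 ;;
    esac
}

# submit_job SCRIPT  (hypothetical wrapper; needs a working sbatch)
# Submits SCRIPT and prints a hint when the failure is a QoS rejection.
submit_job() {
    if out=$(sbatch "$1" 2>&1); then
        echo "$out"
    elif is_qos_rejection "$out"; then
        echo "Rejected by QoS policy; reduce the request or wait for running jobs to finish." >&2
        return 2
    else
        echo "$out" >&2
        return 1
    fi
}
```

Matching on message text is fragile if the wording changes between versions, so treat the string as something to verify on your own installation.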

Notes

warning
  1. Rejection policy flags make an over-limit submission fail instead of leaving the job to wait in the queue. Enable them only where a failed submission is the behavior you want.
  2. These flags affect only newly submitted jobs that exceed limits; they do not affect jobs that are already running.
  3. Before enabling a rejection policy, test the resource limits without the flags to confirm that the limits are reasonable.
  4. Rejection policy flags differ from the DenyOnLimit flag: DenyOnLimit applies to all limit types, whereas each of these three flags targets one limit type.