Partition Administrators
By default, only the super administrator can perform administrative operations on the cluster. In large environments, the super administrator may not be able to handle all of them, so we introduce the concept of partition administrators. A partition administrator is a special user who can perform administrative operations on a specific partition, but cannot administer the entire cluster. Currently, partition administrators can perform the following operations on a partition:
- Cancel jobs in the partition.
- Set whether the partition can accept new jobs (DRAIN/UP).
Supported Versions
10.61 and later
Usage
- Partition administration is configured with the partition Admins parameter. This parameter supports multiple partition administrators. Its value is a list of Linux usernames separated by commas (,).
- Users with partition administrator privileges can use scancel to cancel other users' jobs in the corresponding partition.
- Users with partition administrator privileges can use bkill and qdel in the wrappers to cancel other users' jobs in the corresponding partition.
- Users with partition administrator privileges can use scontrol update partition=<partition_name> state=<UP/DRAIN> to set whether the partition can accept new jobs.
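The commands above can be combined into a small maintenance routine. The following is only a sketch: the partition name compute matches the example on this page, the job ID 12345 is a made-up placeholder, and the run helper and function names are illustrative and not part of the scheduler. By default the script only prints the commands it would run; setting DRY_RUN=0 executes them.

```shell
#!/bin/sh
# Dry-run helper (illustrative): print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the partition from accepting new jobs.
drain_partition() { run scontrol update "partition=$1" state=DRAIN; }

# Let the partition accept new jobs again.
resume_partition() { run scontrol update "partition=$1" state=UP; }

# Cancel a single job by ID (the job may belong to another user).
cancel_job() { run scancel "$1"; }

# Placeholder values, chosen for illustration only.
PARTITION=compute
JOB_ID=12345

drain_partition "$PARTITION"
cancel_job "$JOB_ID"
resume_partition "$PARTITION"
```

Each scontrol call changes only the state option, in line with the warning about not updating other partition options in the same command.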
Example
Assume there is a user named admin. The following partition definition makes that user the administrator of the compute partition:
PartitionName=compute Nodes=compute[1-3] ... Admins=admin
Warning
- Partition administrators can only perform management operations on their partitions, not on the entire cluster.
- If a partition administrator is configured incorrectly or the username cannot be resolved, the cluster controller will fail to start.
- When using scontrol, avoid updating other partition options at the same time; otherwise the state update will fail.
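Because an unresolvable username prevents the cluster controller from starting, it may be worth checking an Admins value before adding it to the configuration. The sketch below is one possible pre-check and is not part of the scheduler: check_admins and the ADMINS variable are hypothetical names, and the check assumes that the standard id utility resolves a username on the controller host in the same way the controller does.

```shell
#!/bin/sh
# check_admins LIST (hypothetical helper)
# LIST is a comma-separated string of Linux usernames, as used for Admins.
# Reports every problem on stderr; succeeds only if every name resolves.
# The body runs in a subshell so the IFS and globbing changes stay local.
check_admins() (
    list=$1
    if [ -z "$list" ]; then
        echo "Admins list is empty" >&2
        exit 1
    fi
    status=0
    IFS=,      # split the list on commas only
    set -f     # disable filename expansion while splitting
    for user in $list; do
        if [ -z "$user" ]; then
            echo "empty entry in Admins list" >&2
            status=1
        elif ! id -u "$user" >/dev/null 2>&1; then
            echo "cannot resolve user: $user" >&2
            status=1
        fi
    done
    exit "$status"
)

# ADMINS is an illustrative variable; it falls back to root for demonstration.
check_admins "${ADMINS:-root}" && echo "all names resolve"
```

Running the check on the controller host, for example with ADMINS=admin, before restarting the controller gives an early signal of a mistyped name.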