Release Notes

Fsched 10.106

Updated: 2025.12.31

  • slurm:
    • Added license server monitoring and license allocation management for the current cluster
    • Added adaptive scheduling
    • Added the AllowUsers parameter for partitions
    • Enhanced fsched_list_job() to support more pagination options
    • Added QoS denial behavior for "by job" and "by user"
    • Limited wckey input length to 42 characters
    • Added the checkpoint/criu plugin for job checkpoint and restore operations, supporting incremental checkpoints, pre-dump, and load-aware delays
    • Enhanced cli_filter to support wrappers and the custom field API
    • Fixed issues:
      • Fixed an issue where a job submitted without -c (CPU cores) had its core count incorrectly stored as 0xfffe after a slurmctld restart, causing an abnormal CPUsPerTask display and incorrect bjobs output
      • Fixed crashes caused by sview, cpus_per_task persistence, gres_detail_str, and related issues
  • wrapper:
    • Added support for dynamic license accounting
    • Enhanced fslsproc with tree display and conflict detection
    • Added btop and bbot commands to change job order
    • Added bpeek to view the stdout/stderr of running batch jobs
    • Added fsopt, which supports bsub, qsub, sbatch, and srun for both interactive and batch commands
    • Enhanced lshosts to support the -l, -T, -a, and -R options, with filtering by host or cluster
    • Added support for the fsquota command to display resource quotas and limits
      • Displays accounting association limits and QoS policies
      • Displays current resource usage for jobs, CPU, memory, nodes, and GPU
      • Supports filtering by user, account, and QoS
      • Provides JSON output for programmatic access
    • Added a new cli_filter adapter compatible with LSF and SGE
      • Supports the bsub, qsub, qsh, and qrsh commands
      • Adds custom fields to distinguish wrapper jobs from native SLURM commands
      • Adds comprehensive documentation including user guide, design, and custom fields
    • Added the -json option to support JSON output
      • Added 6 custom output fields: account, requeue, tmp_disk, min_nodes, max_nodes, and ntasks_per_node
      • Expanded custom field support to a total of 86 field names, including 71 standard fields and 15 aliases
      • Improved field formatting and compatibility with LSF
      • Refactored the internal implementation to improve maintainability
    • Enhanced the statesvc service ListJobs API with a force-refresh option
    • bsub
      • Added the -env option, supporting the full LSF syntax: all, none, selective, exclusion, and assignment
      • Added support for -H (suspend job), -Ne (exit notification), and -ti (orphan process termination)
      • Added ulimit support, including -M, -C, -c, -D, -F, -S, -v, -p, -T, and -ul
      • Added experimental support for fsiod, a native X11/stdio forwarding system. Advantages: small footprint, fully asynchronous operation, LSF-compatible behavior, and about a 10% performance improvement for srun --x11
      • bsub -w supports using JOBNAME as the job condition, allowing scripts to query job state directly by JOBNAME
        • Supports the done(job_name), ended(job_name), exit(job_name), and started(job_name) syntax
    • Fixed issues:
      • Fixed the lack of server-side user filtering in qacct, where each request fetched the full dataset and wasted bandwidth and memory
      • Fixed an issue where qacct without a specified job ID only fetched data and did not print results
      • Fixed pagination when qacct queried server data
      • Fixed an issue where fsjobs displayed only the current user's jobs by default
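
The bsub -w job-name conditions listed above can be read as predicates over a job's state. The following Python sketch is illustrative only: the state sets assigned to each condition follow common LSF usage and are an assumption, and parse_condition, condition_met, and the job_states mapping are hypothetical names that are not part of Fsched.

```python
import re

# Assumed mapping from condition name to the job states that satisfy it.
# These follow common LSF usage; Fsched's exact semantics may differ.
CONDITION_STATES = {
    "done": {"DONE"},
    "exit": {"EXIT"},
    "ended": {"DONE", "EXIT"},
    "started": {"RUN", "DONE", "EXIT"},
}

_CONDITION_RE = re.compile(
    r"^\s*(done|ended|exit|started)\(\s*([^()\s]+)\s*\)\s*$"
)


def parse_condition(expr):
    """Split an expression such as 'done(job_name)' into (kind, job_name)."""
    match = _CONDITION_RE.match(expr)
    if match is None:
        raise ValueError(f"unsupported condition: {expr!r}")
    # Drop optional surrounding quotes around the job name.
    return match.group(1), match.group(2).strip("\"'")


def condition_met(expr, job_states):
    """Return True if the named job's state satisfies the condition.

    job_states is a hypothetical {job_name: state} mapping standing in
    for whatever the scheduler reports; unknown jobs never satisfy it.
    """
    kind, job_name = parse_condition(expr)
    return job_states.get(job_name) in CONDITION_STATES[kind]
```

Under these assumptions, condition_met("ended(prep)", {"prep": "EXIT"}) is true, while condition_met("done(prep)", {"prep": "RUN"}) is false.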

Fsched 10.96

Updated: 2025.09.25

  • slurm:
    • fsched ping: added checks for jobs in the pending state
    • fsched list jobs API: added filter conditions for comment, wckey, group_id, and node_name
    • Removed erroneous logs for the CgroupAutomount configuration option
    • Added the job_submit/intelliparams job submission plugin
    • Changed CR_LLN "load" to use a ratio rather than available CPU count
    • Added the FairshareUsed factor, calculated from resources already consumed
    • Added the --ext option to sshare to include the FairshareUsed field
    • Added the --ext option to sprio to include the FairshareUsed field
    • Fixed issues:
      • Removed proactive loading while loading job information to avoid slurmd hangs
      • Used the generic _get_avail_map for batch job binding to fix node ordering during terminate job requests
      • Used connection, send, and receive timeout settings when fetching job details to fix statesvc hangs
      • Removed the _access check to fix permission denied errors for prolog and epilog tasks when using root_squash
      • Stopped sorting node names in _job_test to fix CPU binding issues
      • Fixed a race condition that caused slurmctld to crash during automatic scaling
      • Adjusted the log level for task cgroup errors
  • wrapper:
    • Added the bswitch command to switch pending jobs to another queue
    • Added the bstop command to stop running jobs
    • Added the bresume command to resume stopped jobs
    • Added the bhist command to display job history
    • Added the lsinfo command
    • bhosts: added the -a, -aff, -alloc, -e, -x, -X, and -R options; added filtering by cluster_name; fixed status display for the -l and -m options
    • lsload: added the -I, -w, -l, -N, -E, -R, and -a options, and added filtering by host or cluster
    • statesvc: added expanded node lists
    • Added support for mapping bsub -G to the Slurm account
    • Added support for fscgdet on both cgroup v1 and v2
    • statesvc: added job extra information for intelliparams
    • bjobs:
      • Switched to using the Fsched API with server-side filtering to load job information
      • Added the start time and finish time fields
    • bqueues:
      • Added the -m cluster_name option
      • Added loadSched / loadStop information to the -l output
      • Added the -alloc option
      • Added the -u user,all option
      • Added JL/U and JL/H output
    • Fixed issues:
      • Fixed an issue where bsub -I did not correctly forward command arguments, and fixed the permission issue with bsub -Ep
      • bjobs: fixed -A and -UF, fixed display by array_job_id list, fixed memory usage display, and added job descriptions, scheduling parameters, and resource requirement details to the -l output
      • bqueues:
        • Fixed scheduling parameter display in the -l output
        • Fixed multi-task jobs
        • Fixed -m all
        • Fixed error messages and error codes when a partition or host could not be found
        • Fixed Users in the -l output
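
The CR_LLN change in this release (load measured as a ratio rather than an available CPU count) can affect which node counts as least loaded on clusters with mixed node sizes. The Python sketch below illustrates the difference. The ratio formula (allocated CPUs divided by total CPUs) is an assumption, since the notes do not give the exact definition, and the Node class and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    total_cpus: int
    alloc_cpus: int

    @property
    def idle_cpus(self):
        # Count-based view: how many CPUs are still free.
        return self.total_cpus - self.alloc_cpus

    @property
    def load_ratio(self):
        # Ratio-based view (assumed formula): fraction of CPUs in use.
        return self.alloc_cpus / self.total_cpus


def least_loaded_by_count(nodes):
    """Pick the node with the most idle CPUs."""
    return max(nodes, key=lambda n: n.idle_cpus)


def least_loaded_by_ratio(nodes):
    """Pick the node with the lowest fraction of CPUs in use."""
    return min(nodes, key=lambda n: n.load_ratio)


big = Node("big", total_cpus=64, alloc_cpus=48)     # 16 idle, 75% used
small = Node("small", total_cpus=16, alloc_cpus=4)  # 12 idle, 25% used
```

With these two nodes, the count-based rule prefers big because it has more idle CPUs, while the ratio-based rule prefers small because a smaller share of it is in use.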

Fsched 10.77

Updated: 2025.03.14

  • Added support for systems using cgroup v2
  • Added LoadStop and LoadSched parameter settings based on CPU load
  • Added support for bjobs -l to display task information submitted by other users
  • Fixed a number of known issues

Fsched 10.62

Updated: 2024.12.13

  • Added support for configuring multiple partition administrators, authorizing them to cancel any job in the partition and control whether the partition accepts jobs (enable/stop)
  • Added support for setting the maximum available CPU count at the partition level
  • Added QoS policy support for setting the maximum resource minutes that all running jobs can use for each account or user; when a job exceeds the configured limit, it remains pending
  • Allowed updating the memory of running or pending jobs; requires the select/cons_tres_ex plugin
  • Allowed updating the CPU allocation of single-node jobs; requires the select/cons_tres_ex plugin
  • Added support for querying job usage information, node load information, and user usage information for completed jobs
  • Added parsing for parts of the qsub and qstat command parameters in the SGE wrapper
  • Fixed a number of known issues
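
The QoS resource-minutes limit described above can be pictured as an admission check performed before a job starts. The Python sketch below is one possible reading, not the Fsched implementation: it assumes the resource is CPU, that running jobs are charged for their remaining minutes, and that a new job is charged for its full time limit. The function names and dictionary keys are hypothetical.

```python
def remaining_cpu_minutes(running_jobs):
    """Sum CPU-minutes still outstanding for the running jobs."""
    return sum(job["cpus"] * job["minutes_left"] for job in running_jobs)


def can_start(job, running_jobs, limit_cpu_minutes):
    """Return True if starting the job keeps the account or user within the limit.

    When this returns False, the job would remain pending under the
    behavior described in the notes.
    """
    needed = job["cpus"] * job["time_limit_minutes"]
    return remaining_cpu_minutes(running_jobs) + needed <= limit_cpu_minutes


running = [{"cpus": 4, "minutes_left": 60}]        # 240 CPU-minutes outstanding
candidate = {"cpus": 2, "time_limit_minutes": 30}  # needs 60 CPU-minutes
```

Here can_start(candidate, running, 300) is true because 240 + 60 does not exceed 300, while a limit of 299 would leave the job pending.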

Fsched 10.37

Updated: 2024.09.15

  • Allowed users to increase the time limit of already submitted jobs
  • Allowed configuring a partition-level option to kill jobs that exceed their requested memory
  • Added loadStop and loadSched settings based on CPU and memory utilization
  • Avoided potential job failures during scheduling when the authentication system, such as LDAP or NIS, becomes unavailable
  • Added statistics such as resource usage for running jobs
  • Improved response speed when canceling srun jobs
  • Added node load and job load collection mechanisms, and used them to improve the output of the lsload and bjobs commands in the LSF wrapper
  • Improved the failover mechanism in HA scenarios to shorten switchover time
  • Improved stability under high load
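
The loadSched and loadStop settings in this release can be thought of as two thresholds per metric. The Python sketch below follows the usual LSF convention, which is an assumption here: a host at or above any loadSched threshold receives no new jobs, and a host at or above any loadStop threshold has running jobs suspended. The function name, the metric keys, and the use of percentages are hypothetical.

```python
def host_decision(load, load_sched, load_stop):
    """Classify a host as 'dispatch', 'hold', or 'suspend'.

    load, load_sched, and load_stop are dicts keyed by metric name
    (for example 'cpu' and 'mem') with utilization values in percent.
    """
    # Any metric at or above its stop threshold suspends running jobs.
    if any(load[m] >= limit for m, limit in load_stop.items()):
        return "suspend"
    # Any metric at or above its scheduling threshold blocks new jobs.
    if any(load[m] >= limit for m, limit in load_sched.items()):
        return "hold"
    return "dispatch"


SCHED = {"cpu": 70, "mem": 80}
STOP = {"cpu": 90, "mem": 95}
```

With these thresholds, a host at 50% CPU and 40% memory accepts new jobs, one at 75% CPU is held, and one at 96% memory has jobs suspended.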