Cluster Analysis
The Cluster Analysis module lets you query and analyze historical jobs on clusters. It helps you understand hardware resource utilization and job statistics, and gives you a reliable basis for tuning, scheduling strategy decisions, and cluster scaling.
Cluster analysis includes three types of panels: query, analysis, and resource lists.
Query panels
- Execution time of completed jobs within a time range: Shows statistics by partition and by user. Supports selecting the cluster and the latest completion time.
- Queued and running jobs: Shows submitter, submission time, requested CPU/memory, wait time. Supports filtering by user, job name, execution host, and partition.
- Completed jobs: Shows submitter, submission time, execution time, CPU time, requested CPU/memory, execution host, partition, job status. Supports filtering by user, job name, execution host, partition, and status.
- User job status query: Shows counts of jobs per user (pending allocation, running, completed, failed). Supports filtering by user.
- Job list: Lists all jobs in the cluster. Supports filtering by cluster, partition, status, job ID, etc.
Analysis panels
- Jobs with unreasonable memory requests: By default, shows jobs with memory delta percent > 50% (difference between requested memory and actual usage) or memory delta > 1 GiB (difference between requested memory and peak memory usage). Supports filtering by memory delta and delta percent.
- Abnormally terminated jobs: Shows jobs with a non-zero exit code.
- User usage statistics: CPU usage by user over time.
- Jobs with unreasonable CPU requests: By default shows jobs with CPU delta percent > 100% or < 50% (difference between requested CPU and actual CPU usage percent). Adjust thresholds as needed.
- Average waiting time of queued jobs in the cluster: Select cluster and time range.
- Average waiting time of queued jobs in partitions: Select cluster and time range.
- Number of completed jobs in partitions: Select cluster and time range.
- License monitoring analysis: Requires separate enablement/configuration. Provides license service state, runtime state, total licenses, used licenses, total usage rate, per-user usage rate, and per-group usage rate.
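The default thresholds used by the memory, CPU, and abnormal-termination panels can be illustrated with a short sketch. This is not the module's implementation: the `Job` record, its field names, and the helper functions are hypothetical, and the formulas are assumptions. In particular, the sketch assumes that memory delta percent is measured against requested memory using peak usage, and that CPU delta percent means actual CPU usage as a percentage of requested CPU.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes in one GiB


@dataclass
class Job:
    # Hypothetical job record; field names are illustrative only.
    job_id: str
    requested_mem_bytes: int
    peak_mem_bytes: int
    requested_cpus: float
    avg_cpus_used: float
    exit_code: int


def memory_delta(job: Job) -> tuple[int, float]:
    """Return (delta in bytes, delta as a percent of requested memory).

    Assumption: both values compare requested memory with peak usage.
    """
    delta = job.requested_mem_bytes - job.peak_mem_bytes
    if job.requested_mem_bytes == 0:
        return delta, 0.0
    return delta, 100.0 * delta / job.requested_mem_bytes


def has_unreasonable_memory_request(
    job: Job, max_delta_bytes: int = GIB, max_delta_pct: float = 50.0
) -> bool:
    """Flag a job if either default threshold is exceeded."""
    delta, pct = memory_delta(job)
    return pct > max_delta_pct or delta > max_delta_bytes


def cpu_usage_pct(job: Job) -> float:
    """Assumption: actual CPU usage as a percent of requested CPU."""
    if job.requested_cpus == 0:
        return 0.0
    return 100.0 * job.avg_cpus_used / job.requested_cpus


def has_unreasonable_cpu_request(
    job: Job, high_pct: float = 100.0, low_pct: float = 50.0
) -> bool:
    """Flag a job that uses more than it requested or under half of it."""
    pct = cpu_usage_pct(job)
    return pct > high_pct or pct < low_pct


def terminated_abnormally(job: Job) -> bool:
    """A non-zero exit code marks the job as abnormally terminated."""
    return job.exit_code != 0


# Two made-up jobs for illustration.
over_requested_mem = Job("a", 4 * GIB, 1 * GIB, 4, 3.0, 0)
under_used_cpu = Job("b", 2 * GIB, 3 * GIB // 2, 8, 2.0, 1)
```

In this sketch, job `a` requests 4 GiB but peaks at 1 GiB, so it would appear in the memory panel, while job `b` uses about a quarter of its requested CPU and exits with a non-zero code, so it would appear in the CPU and abnormal-termination panels. Raising or lowering the keyword arguments mirrors adjusting the thresholds in the panel filters.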
Cluster list
The navigation page shows resource usage for each cluster in the HPC cluster list.
Cluster host list
The navigation page shows resource usage for each host in the cluster host list.
FAQ
- If a chart line color is too light to distinguish, click the color block in the legend to change it to a more vivid color.