Cluster Monitoring Dashboard
This dashboard provides a comprehensive view of cluster runtime status. It covers three main areas: resource scheduling, task execution, and application license usage, helping you monitor cluster health and resource utilization.
Prerequisites
- You have a Desktop Portal account and are logged in.
- You have at least one authorized cluster. If you do not, contact your administrator.
- You have read permission on the cluster. If you do not, contact your administrator.
Fsched Resource Monitoring
Cluster Resource Overview
- CPU usage: real-time CPU usage across all compute nodes in the cluster.
- Memory usage: real-time memory usage across all compute nodes in the cluster.
Partition Resource Details
- Partition name: name and unique identifier of the partition.
- Total CPUs: total physical CPU cores across nodes in the partition.
- Running CPUs: CPU cores currently used by running tasks.
- Idle CPUs: available CPU cores not allocated to tasks.
- Utilization: (running CPUs / total CPUs) * 100%.
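The utilization formula above can be illustrated with a minimal Python sketch. The `PartitionStats` class and its field names are hypothetical and are not part of the dashboard or any Fsched API. The idle-CPU calculation additionally assumes that every core is either running or idle, which may not hold if some nodes are offline.

```python
from dataclasses import dataclass


@dataclass
class PartitionStats:
    """Hypothetical record mirroring the partition fields listed above."""
    name: str
    total_cpus: int
    running_cpus: int

    @property
    def idle_cpus(self) -> int:
        # Assumption: cores that are not running a task count as idle.
        return self.total_cpus - self.running_cpus

    @property
    def utilization_pct(self) -> float:
        # (running CPUs / total CPUs) * 100, guarding against empty partitions.
        if self.total_cpus == 0:
            return 0.0
        return self.running_cpus * 100 / self.total_cpus


p = PartitionStats(name="batch", total_cpus=128, running_cpus=96)
print(f"{p.name}: {p.utilization_pct:.1f}% utilized, {p.idle_cpus} idle")
# prints "batch: 75.0% utilized, 32 idle"
```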
Node Resource Details
- Node name: unique identifier of the node.
- CPU usage: ratio of used CPUs to total CPUs on the node.
- Partition: partition the node belongs to.
Task Monitoring
Task fields:
- Task ID: system-generated unique identifier for each submitted task.
- Task name: name provided by the user when submitting the task.
- User: user account that submitted the task.
- Status: execution stage (running, queued, completed, failed, and so on).
- Created at: time when the task was submitted.
- Started at: time when the task began running.
- Finished at: time when the task ended.
- Runtime: elapsed time from start to finish.
- Total CPU cores: total CPU cores requested by the task.
- GPU count: number of GPUs requested by the task.
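As a rough illustration of how the Runtime field relates to the Started at and Finished at timestamps, the following sketch computes the elapsed time. The function name `task_runtime_seconds` and the `now` parameter are hypothetical. Measuring a still-running task against the current time is an assumption, since the description above only defines runtime from start to finish.

```python
from datetime import datetime
from typing import Optional


def task_runtime_seconds(
    started_at: Optional[datetime],
    finished_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Return elapsed seconds from start to finish, or None if unknown."""
    if started_at is None:
        # A queued task has not started, so it has no runtime yet.
        return None
    # Assumption: a running task is measured against `now` when provided.
    end = finished_at if finished_at is not None else now
    if end is None:
        return None
    return (end - started_at).total_seconds()


start = datetime(2024, 1, 1, 10, 0, 0)
finish = datetime(2024, 1, 1, 12, 30, 0)
print(task_runtime_seconds(start, finish))  # prints 9000.0
```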
License Monitoring
License Overview
- Total: total number of licenses granted to the cluster for a given software.
- In use: number of licenses currently in use by tasks.
- Available: remaining available licenses.
- Users: number of distinct users currently consuming the license.
- Overall utilization: share of the total licenses currently in use, which indicates resource pressure.
- Health status: color-coded health indicator (red: critical, yellow: warning, green: normal).
License Details
- Feature: the licensed feature/module name.
- Total: total licenses for this feature.
- In use: used licenses for this feature.
- Available: remaining licenses for this feature.
- Utilization: utilization percentage for this feature.
- Users: number of users consuming this feature.
- Status: current health status for this feature.
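The relationship between utilization and health status can be sketched as follows. The thresholds (80% for warning, 95% for critical) are assumptions chosen only for illustration. This page does not state the values the dashboard actually uses, and the function names are hypothetical.

```python
# Assumed thresholds for illustration only; the real values are not documented here.
WARNING_PCT = 80.0
CRITICAL_PCT = 95.0


def license_utilization_pct(in_use: int, total: int) -> float:
    """Percentage of licenses in use; 0.0 when no licenses are granted."""
    if total <= 0:
        return 0.0
    return in_use * 100 / total


def health_status(utilization_pct: float) -> str:
    """Map a utilization percentage to the three status levels."""
    if utilization_pct >= CRITICAL_PCT:
        return "critical"  # shown in red
    if utilization_pct >= WARNING_PCT:
        return "warning"   # shown in yellow
    return "normal"        # shown in green


pct = license_utilization_pct(in_use=17, total=20)
print(pct, health_status(pct))  # prints 85.0 warning
```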
Per-User Usage
- User-level details: per-user consumption of each license feature.
- Search: filter by username or feature name.
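The search behavior can be approximated with a short sketch. The row layout (`user`, `feature`, `in_use` keys) and the case-insensitive substring matching are assumptions. The dashboard's actual matching rules are not specified on this page.

```python
from typing import Dict, List


def filter_usage(rows: List[Dict], query: str) -> List[Dict]:
    """Keep rows whose username or feature name contains the query."""
    q = query.strip().lower()
    if not q:
        # An empty query returns every row unchanged.
        return list(rows)
    return [
        r for r in rows
        if q in r["user"].lower() or q in r["feature"].lower()
    ]


# Hypothetical per-user usage rows.
rows = [
    {"user": "alice", "feature": "solver_base", "in_use": 2},
    {"user": "bob", "feature": "solver_hpc", "in_use": 4},
    {"user": "carol", "feature": "mesher", "in_use": 1},
]
print([r["user"] for r in filter_usage(rows, "solver")])
# prints ['alice', 'bob']
```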
FAQs
- Are CPU and memory usage metrics on the dashboard fully real-time?
  - Not fully. CPU and memory metrics usually lag behind by a data collection delay, typically around 1 to 3 minutes. This is a trade-off between monitoring accuracy and system performance.
- How often is license usage updated?
  - License metrics are typically updated every 1 to 3 minutes. When a license status changes (for example, from "normal" to "warning"), the dashboard shows the new status after the next collection cycle.