Version: FCP 25.11

Monitoring Service

Tip

To ensure responsiveness, line charts display data for only the top 30 nodes.
File system monitoring and base node monitoring are available only when Hybrid Cloud is enabled in FCP-Suite.

Charts in Monitoring Service can be dragged and zoomed in/out for easier inspection. Refreshing the page resets charts to their initial state.
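The top-30 limit mentioned in the tip can be pictured with a short sketch. The source does not say how nodes are ranked, so the ranking key used here (each node's most recent sample) is an assumption, and `top_n_series` is a hypothetical helper, not part of FCP.

```python
def top_n_series(series_by_node, n=30):
    """Keep only the n nodes with the highest most-recent value.

    series_by_node maps a node name to a list of (timestamp, value) samples.
    Ranking by the latest sample is an assumption made for illustration.
    """
    def latest(points):
        # Nodes without samples sort last.
        return points[-1][1] if points else float("-inf")

    ranked = sorted(series_by_node,
                    key=lambda node: latest(series_by_node[node]),
                    reverse=True)
    return {node: series_by_node[node] for node in ranked[:n]}


# Example: 40 nodes, each with a single sample equal to its index.
data = {f"node{i}": [(0, float(i))] for i in range(40)}
visible = top_n_series(data)
```

With 40 nodes in `data`, `visible` holds the 30 nodes with the largest latest values, and the remaining 10 are left off the chart.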

Cluster Monitoring

Cluster monitoring includes multiple views: cluster overview, compute partition monitoring, node list monitoring, node monitoring, GPU monitoring, service status monitoring, and scheduler monitoring.

Cluster overview

Real-time metrics at the cluster level include: compute node count, compute partition count, total CPU cores, CPU utilization, and average wait time for queued jobs.

Charts: adjust the time range at the top-right to view monitoring data for the desired period.

Compute node CPU utilization (line chart)
Cluster job status distribution (pie chart): counts of Queued/Running/Completed jobs, taken from the Fsched scheduler's in-memory statistics
Running CPU cores (line chart)
Queued CPU cores (line chart)
Average wait time for queued jobs (line chart)
Cluster job status counts (stacked chart): counts of Queued/Running/Completed jobs, taken from the Fsched scheduler's in-memory statistics
Compute node count (line chart)

Compute partition monitoring

Real-time metrics at the partition level include: average wait time for queued jobs, node count, CPU cores, total scheduler CPU, free CPU, running CPU, queued CPU, CPU utilization, and memory utilization.

Charts: adjust the time range at the top-right to view monitoring data for the desired period.

Partition CPU utilization (line chart)
Partition running CPU utilization percent (line chart)
Partition memory utilization (line chart)
Partition running CPU cores (line chart)
Partition CPU cores (line chart)
Partition average wait time for queued jobs (line chart)
Partition total memory and allocated memory (line chart)
Partition allocated memory percent (line chart)
Partition queued job count (line chart)
Partition running job count (line chart)
Partition compute node count (line chart)
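As a rough illustration of how the percentage charts above relate to the real-time partition metrics, the sketch below derives two percentages from raw counts. The formulas (running CPU over total scheduler CPU, allocated memory over total memory) and all field names are assumptions; the source does not state how Monitoring Service computes these values.

```python
def percent(part, total):
    """Return part/total as a percentage, or 0.0 when total is not positive."""
    if total <= 0:
        return 0.0
    return 100.0 * part / total


def partition_percentages(p):
    """Derive percentage metrics from one partition's raw counts.

    The keys of p are hypothetical names chosen for this sketch.
    """
    return {
        "running_cpu_percent": percent(p["running_cpu"], p["total_scheduler_cpu"]),
        "allocated_memory_percent": percent(p["allocated_memory"], p["total_memory"]),
    }


sample = {
    "running_cpu": 48,
    "total_scheduler_cpu": 64,
    "allocated_memory": 256,   # GiB, illustrative
    "total_memory": 512,       # GiB, illustrative
}
result = partition_percentages(sample)
```

The guard in `percent` keeps an empty partition (zero total CPU or memory) from raising a division error.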

Node list

Real-time fields include: node name, node ID, cluster ID, partition, uptime, CPU count, total memory, root partition, CPU utilization, memory utilization, root partition utilization, swap utilization, scheduler node status, session count, session user count, running job count, total scheduler CPU, free CPU, and running CPU.

Node monitoring

Real-time metrics at the node level include: uptime, CPU count, CPU iowait, total memory, total file descriptors, total CPU utilization, memory utilization, and swap utilization.

Charts: adjust the time range at the top-right to view monitoring data for the desired period.

CPU utilization (line chart)
Swap (line chart)
Memory (line chart)
5-minute network traffic (stacked chart)
System load average (line chart)
Disk read/write bytes per second (line chart)
Network bandwidth per second (line chart)
Disk IOPS (line chart)
Open file descriptors (left) / context switches per second (right) (line + scatter)
Disk utilization (line chart)
Network socket connections (line chart)
I/O time breakdown within 1 second (line chart)
Per-I/O latency (reference: < 100 ms) (beta) (line chart)
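The per-I/O latency chart lists a reference value of under 100 ms. One common way to obtain such a figure, sketched below, is to divide the change in cumulative I/O time by the change in completed I/O count between two samples. Whether Monitoring Service computes it this way is an assumption, and the field names `ios` and `io_ms` are hypothetical.

```python
REFERENCE_MS = 100.0  # reference value quoted in the chart title


def per_io_latency_ms(prev, curr):
    """Average latency per I/O between two cumulative counter samples.

    prev and curr are dicts with 'ios' (completed I/Os) and 'io_ms'
    (milliseconds spent on those I/Os). Returns None if no I/O completed.
    """
    delta_ios = curr["ios"] - prev["ios"]
    delta_ms = curr["io_ms"] - prev["io_ms"]
    if delta_ios <= 0:
        return None
    return delta_ms / delta_ios


def exceeds_reference(latency_ms):
    """True when a measured latency is at or above the 100 ms reference."""
    return latency_ms is not None and latency_ms >= REFERENCE_MS


first = {"ios": 100, "io_ms": 1000}
second = {"ios": 150, "io_ms": 1500}
latency = per_io_latency_ms(first, second)
```

An interval with no completed I/O yields `None` rather than zero, so an idle disk is not reported as either fast or slow.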

GPU monitoring

Real-time metrics include: GPU count, warnings, GPU utilization, and GPU memory utilization.

Charts: adjust the time range at the top-right to view monitoring data for the desired period.

GPU utilization (detail) (line chart)
GPU memory utilization (detail) (line chart)
GPU frequency (line chart)
Power (line chart)
Memory frequency (line chart)
GPU temperature (line chart)
Memory temperature (line chart)
Memory used (frame buffer) (line chart)
Memory free (frame buffer) (line chart)

Note: CentOS 6.x does not support GPU monitoring.

Service monitoring

Shows the status of services on each node in the cluster.

Scheduler monitoring

Shows node states at the scheduler level for Fsched clusters.

Fully allocated: alloc (blue)
Partially allocated: mix (light blue)
Idle: idle (green)
Unavailable: drain + resv + maint + completing (the first three states are marked unavailable by administrators) (gray)
Fault: down + fail + error (red)
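The grouping above can be expressed as a lookup table. The state names, categories, and colors come from the list; the function name, the input shape, and the "Unknown" fallback are assumptions made for this sketch and do not describe Fsched or FCP internals.

```python
from collections import Counter

# Scheduler state -> display category, as listed above.
STATE_CATEGORY = {
    "alloc": "Fully allocated",
    "mix": "Partially allocated",
    "idle": "Idle",
    "drain": "Unavailable",
    "resv": "Unavailable",
    "maint": "Unavailable",
    "completing": "Unavailable",
    "down": "Fault",
    "fail": "Fault",
    "error": "Fault",
}

# Display category -> color, as listed above.
CATEGORY_COLOR = {
    "Fully allocated": "blue",
    "Partially allocated": "light blue",
    "Idle": "green",
    "Unavailable": "gray",
    "Fault": "red",
}


def summarize_states(node_states):
    """Count nodes per display category.

    node_states maps a node name to its scheduler state string.
    States not in the table are counted as "Unknown" (an assumption).
    """
    counts = Counter()
    for state in node_states.values():
        counts[STATE_CATEGORY.get(state.lower(), "Unknown")] += 1
    return dict(counts)


nodes = {"n1": "alloc", "n2": "mix", "n3": "drain", "n4": "maint", "n5": "down"}
summary = summarize_states(nodes)
```

Here `summary` groups `drain` and `maint` under a single "Unavailable" count, mirroring how the view folds several scheduler states into one color.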

Desktop Monitoring

Node monitoring

Shows hardware resource information (CPU, memory, storage, etc.) for the selected desktop and node.

GPU monitoring

When the node has GPU devices, shows GPU-related metrics.

Note: CentOS 6.x does not support GPU monitoring.

Service monitoring

Shows runtime status of desktop-related services for the selected desktop and node.

File System Monitoring

Node monitoring

Shows hardware resource metrics for the file system, including CPU, memory, storage, and more.

Service monitoring

Shows runtime status of file-system-related services.

Performance monitoring

Shows file system performance metrics, including IOPS, throughput, latency, and capacity information (available and total).

Management Node Monitoring

Node monitoring

Shows hardware resource metrics for management nodes, including CPU, memory, storage, and more.

Service monitoring

Shows runtime status of system services on the selected management node.

Base Node Monitoring

Node monitoring

Shows hardware resource metrics for base nodes on the platform, including CPU, memory, storage, and more.

FAQ

Removing or releasing all compute nodes in a cluster is not a valid operation. With no compute nodes left, the monitoring system cannot collect valid node metrics, so any abnormal data shown in this state is not meaningful.
If a chart line color is too light to distinguish, click the color block in the legend to switch it to a more vivid color.
