Version: FCP 25.11

Cluster Monitoring Dashboard

This dashboard provides a comprehensive view of cluster runtime status. It covers three main areas: resource scheduling, task execution, and application license usage, helping you monitor cluster health and resource utilization.

Prerequisites

  • You have a Desktop Portal account and are logged in.
  • You have at least one authorized cluster. Contact your administrator if needed.
  • You have cluster read permission. Contact your administrator if needed.

Fsched Resource Monitoring

Cluster Resource Overview

  • CPU usage: near real-time CPU usage across all compute nodes in the cluster (typically delayed by 1 to 3 minutes).
  • Memory usage: near real-time memory usage across all compute nodes in the cluster (typically delayed by 1 to 3 minutes).

Partition Resource Details

  • Partition name: partition name and unique identifier.
  • Total CPUs: total physical CPU cores across nodes in the partition.
  • Running CPUs: CPU cores currently used by running tasks.
  • Idle CPUs: available CPU cores not allocated to tasks.
  • Utilization: (running CPUs / total CPUs) * 100%.
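
The utilization formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `partition_utilization` and the sample figures are hypothetical, and returning 0% for a partition with no CPUs is an assumption, not documented dashboard behavior.

```python
def partition_utilization(running_cpus: int, total_cpus: int) -> float:
    """Return partition utilization as a percentage.

    Implements (running CPUs / total CPUs) * 100.
    """
    if total_cpus <= 0:
        # Assumption: a partition with no CPUs is reported as 0% utilized.
        return 0.0
    return running_cpus / total_cpus * 100


# Hypothetical partition: 96 of 128 cores are used by running tasks.
print(f"{partition_utilization(96, 128):.1f}%")
```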

Node Resource Details

  • Node name: unique identifier of the node.
  • CPU usage: ratio of used CPUs to total CPUs on the node (used CPUs / total CPUs).
  • Partition: partition the node belongs to.

Task Monitoring

Task fields:

  • Task ID: system-generated unique identifier for each submitted task.
  • Task name: name provided by the user when submitting the task.
  • User: user account that submitted the task.
  • Status: current execution state of the task, such as queued, running, completed, or failed.
  • Created at: time when the task was submitted.
  • Started at: time when the task actually started.
  • Finished at: time when the task finished.
  • Runtime: elapsed time from Started at to Finished at.
  • Total CPU cores: total CPU cores requested by the task.
  • GPU count: number of GPUs requested by the task.
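
The relationship between the Started at, Finished at, and Runtime fields can be sketched as follows. The function name `task_runtime` and the timestamps are hypothetical, and returning `None` for a task that has not started or finished yet is an assumption; the dashboard may display such tasks differently.

```python
from datetime import datetime, timedelta
from typing import Optional


def task_runtime(started_at: Optional[datetime],
                 finished_at: Optional[datetime]) -> Optional[timedelta]:
    """Return the elapsed time from start to finish, or None if unknown."""
    if started_at is None or finished_at is None:
        # Assumption: queued or still-running tasks have no final runtime yet.
        return None
    return finished_at - started_at


# Hypothetical task that started at 10:00 and finished at 11:30.
started = datetime(2025, 1, 1, 10, 0, 0)
finished = datetime(2025, 1, 1, 11, 30, 0)
print(task_runtime(started, finished))
```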

License Monitoring

License Overview

  • Total: total number of licenses granted to the cluster for a given software.
  • In use: number of licenses currently in use by tasks.
  • Available: remaining available licenses.
  • Users: number of distinct users currently consuming the license.
  • Overall utilization: percentage of the software's licenses currently in use, indicating how much pressure there is on the license pool.
  • Health status: color-coded health indicator (critical red / warning yellow / normal green).
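
As a rough sketch, the overview fields relate to each other as shown below. The class name, the formula for utilization (in use divided by total), and especially the 80% and 95% thresholds are assumptions made for illustration; this page does not state the thresholds the dashboard uses for its color coding.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the dashboard's real cut-offs are not documented here.
WARNING_THRESHOLD = 80.0
CRITICAL_THRESHOLD = 95.0


@dataclass
class LicenseOverview:
    total: int   # licenses granted to the cluster for one software
    in_use: int  # licenses currently consumed by tasks

    @property
    def available(self) -> int:
        """Remaining licenses, never reported below zero."""
        return max(self.total - self.in_use, 0)

    @property
    def utilization(self) -> float:
        """Assumed formula: in-use licenses as a percentage of the total."""
        if self.total <= 0:
            return 0.0
        return self.in_use * 100 / self.total

    @property
    def health(self) -> str:
        """Map utilization to a status using the hypothetical thresholds."""
        if self.utilization >= CRITICAL_THRESHOLD:
            return "critical"
        if self.utilization >= WARNING_THRESHOLD:
            return "warning"
        return "normal"


# Hypothetical software with 20 licenses, 17 of them in use.
overview = LicenseOverview(total=20, in_use=17)
print(overview.available, overview.utilization, overview.health)
```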

License Details

  • Feature: the licensed feature/module name.
  • Total: total licenses for this feature.
  • In use: used licenses for this feature.
  • Available: remaining licenses for this feature.
  • Utilization: utilization percentage for this feature.
  • Users: number of users consuming this feature.
  • Status: current health status for this feature.

Per-User Usage

  • User-level details: per-user consumption of each license feature.
  • Search: filter by username or feature name.
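
The search behavior can be pictured with a small filter over per-user rows. This is a minimal sketch, assuming a case-insensitive substring match on either field; the row layout, the `search_usage` name, and the sample users and features are all hypothetical, and the dashboard's actual matching rules may differ.

```python
from typing import Dict, List


def search_usage(rows: List[Dict[str, str]], query: str) -> List[Dict[str, str]]:
    """Keep rows whose username or feature name contains the query."""
    needle = query.strip().lower()
    if not needle:
        # An empty query leaves the list unfiltered.
        return list(rows)
    return [
        row for row in rows
        if needle in row["user"].lower() or needle in row["feature"].lower()
    ]


# Hypothetical per-user usage rows.
rows = [
    {"user": "alice", "feature": "solver_base"},
    {"user": "bob", "feature": "mesh_gen"},
    {"user": "carol", "feature": "solver_hpc"},
]
print(search_usage(rows, "solver"))
```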

FAQs

  • Are CPU and memory usage metrics on the dashboard fully real-time?

    • No. CPU and memory metrics usually lag behind by a data collection delay, typically around 1 to 3 minutes. This delay is a trade-off between monitoring accuracy and system performance.
  • How often is license usage updated?

    • License metrics are typically updated every 1 to 3 minutes. When a license status changes (for example, from "normal" to "warning"), the dashboard shows the new status after the next collection cycle.