Version: FCP 25.11

Alert Service

Tip: The Idle shutdown automation is available only after Hybrid Cloud is enabled in FCP-Suite.

Alert Policies

Limit: You can create up to 1000 alert policies.

Field descriptions

Policy name: Required. 1 to 40 characters. Must start with a letter. Can include numbers, _, and -.
Object: Required. Select a cluster or the system platform.
- Regular users can create alerts only for clusters visible to them (clusters they created or clusters shared with them).
- Only running clusters can be selected. Non-running clusters may be shown in the list but cannot be selected.
Type: Required. Host, Service, or Scheduler (default: Host).
- Host: Select one or more nodes for a cluster/system platform. For file systems, no node selection is required.
- Service: No specific node selection is required; the system monitors services on all nodes for the selected object.
- Scheduler: Available only when the object is an Fsched cluster. Scheduler-type policies apply to the Fsched scheduler.
Nodes: Required.
- If the object is a cluster:
  - All nodes: Default. When selected, you cannot select other nodes.
  - Or select one or more specific nodes (head, login, compute).
- If the object is the system platform:
  - All nodes: Default. When selected, you cannot select other nodes.
  - Or select one or more specific nodes (for example, all-in-one is one node; all-in-two is two nodes). Only running nodes can be selected.
Partitions: Required only for Scheduler type. Select one, multiple, or all partitions in the cluster.
Severity: Required. Notification, Warning, or Critical.
Sampling interval: Required. Length of each sampling window. The system samples data during the window and computes the average for that window. Unit: minutes. Range: 1 to 1,000,000.
Consecutive periods: Required. Number of consecutive sampling intervals in which the averaged value must meet the rule condition before an alert is triggered. Unit: times. Range: 1 to 1,000,000.
Silence period: Required. How often notifications are resent while the alert remains unrecovered. Default: 24 hours. Options: 5/15/30 minutes, 1/3/6/12/24 hours.
Status:
- Enabled: Default. Policy is effective, notifications are sent, and alert records are generated.
- Disabled: No notifications and no alert records.
User: User who created the policy.
Actions:
- Delete: Available in any status. Requires confirmation. Deleting a policy also deletes all alert records generated by it.
- Edit: All fields are editable except policy name, object, and nodes.
- Enable/Disable: Toggle policy status.
- Bulk actions: Delete, Enable, Disable.
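
The field constraints above can be checked before a policy is submitted. The following is a minimal sketch in Python, not the platform's own validation: the function name `validate_policy_fields` and its return format are hypothetical, and it assumes "letter" means an ASCII letter.

```python
import re

# Derived from the Policy name rule: starts with a letter, then up to 39 more
# letters, digits, underscores, or hyphens (1 to 40 characters in total).
# Assumption: "letter" means ASCII A-Z or a-z.
POLICY_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,39}")

# Silence period options from the field description, expressed in minutes:
# 5/15/30 minutes and 1/3/6/12/24 hours.
SILENCE_PERIOD_MINUTES = {5, 15, 30, 60, 180, 360, 720, 1440}


def validate_policy_fields(name, sampling_interval, consecutive_periods,
                           silence_minutes=1440):
    """Return the names of the fields that break a documented constraint."""
    problems = []
    if not POLICY_NAME_RE.fullmatch(name):
        problems.append("policy name")
    if not 1 <= sampling_interval <= 1_000_000:
        problems.append("sampling interval")
    if not 1 <= consecutive_periods <= 1_000_000:
        problems.append("consecutive periods")
    if silence_minutes not in SILENCE_PERIOD_MINUTES:
        problems.append("silence period")
    return problems
```

An empty list means every checked field is within its documented range. The default of 1440 minutes mirrors the documented 24-hour default for the silence period.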

Notes

Releasing a cluster automatically disables all alert policies associated with that cluster.
If a partition is released, or nodes are removed or powered off, the related alert checks may return no data.
- If the policy does not include a node running status rule, a no-data result does not generate notifications or alert records.
- If the policy includes a node running status rule, a no-data result generates notifications and alert records as usual.
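
The no-data behavior reduces to a single check on the policy's rules. A small sketch, assuming each rule is identified by its metric name as shown in the monitoring item tables (the function name is hypothetical):

```python
def no_data_generates_alert(rule_metrics):
    """Return True if a no-data result should produce notifications and records.

    `rule_metrics` is the collection of metric names used by the policy's rules.
    """
    return "Node running status" in rule_metrics
```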

Alert behavior

Send notifications: Yes/No.
Notification list:
- Email: Shows email address and username.
- WeCom: Shows the WeCom robot ID and remarks.
Automation:
- Idle shutdown: When enabled, the system performs an automatic shutdown when the rule is triggered.
  - This automation is available for clusters.
  - To use it, the alert rule must be CPU usage with condition <.
  - After configuration: if CPU usage stays below the threshold for N minutes, the system shuts down the target.
  - N = sampling interval (minutes) x consecutive periods (times).
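
As a worked example, a sampling interval of 5 minutes with 6 consecutive periods gives N = 5 x 6 = 30 minutes. The sketch below illustrates the same arithmetic and the "stays below the threshold" check. It is an illustration only: the function names are hypothetical, and it assumes the check looks at the most recent per-interval averages.

```python
def idle_shutdown_delay_minutes(sampling_interval_min, consecutive_periods):
    """N = sampling interval (minutes) x consecutive periods (times)."""
    return sampling_interval_min * consecutive_periods


def should_shut_down(interval_averages, threshold, consecutive_periods):
    """Return True if the latest `consecutive_periods` averages are all below
    the CPU usage threshold (condition `<`).

    `interval_averages` holds one averaged CPU usage value per sampling
    interval, oldest first.
    """
    if len(interval_averages) < consecutive_periods:
        return False  # not enough history yet
    recent = interval_averages[-consecutive_periods:]
    return all(avg < threshold for avg in recent)
```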

Alert Rules

When any rule matches, the policy is considered triggered.

Limits:

You cannot add two identical monitoring items.
You can add up to 8 monitoring items.
There is always one default rule and it cannot be deleted.
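
These limits can be sketched as guards on a rule list. The code below is illustrative and rests on two assumptions not stated in the text: "identical" is taken to mean the same metric name, and the default rule is taken to be the first entry in the list.

```python
MAX_MONITORING_ITEMS = 8


def add_monitoring_item(rules, metric, condition, threshold):
    """Return a new rule list with the item appended, or raise ValueError."""
    if len(rules) >= MAX_MONITORING_ITEMS:
        raise ValueError("a policy can hold at most 8 monitoring items")
    # Assumption: two items are "identical" when they use the same metric.
    if any(rule["metric"] == metric for rule in rules):
        raise ValueError(f"monitoring item already present: {metric}")
    return rules + [{"metric": metric, "condition": condition,
                     "threshold": threshold}]


def remove_monitoring_item(rules, index):
    """Return a new rule list without the item at `index`."""
    # Assumption: index 0 holds the default rule, which cannot be deleted.
    if index == 0:
        raise ValueError("the default rule cannot be deleted")
    return rules[:index] + rules[index + 1:]
```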

Monitoring items (Host)

Metric	Condition	Threshold	Unit
CPU usage	`>` `>=` `<` `<=` `=` `!=`	1 to 100	%
Memory usage	`>` `>=` `<` `<=` `=` `!=`	1 to 100	%
Node running status	`=`	Normal / Abnormal	-
Disk usage	`>` `>=` `<` `<=` `=` `!=`	1 to 100	%
Inbound traffic	`>` `>=` `<` `<=` `=` `!=`	1 to 100000000	kb/s
Outbound traffic	`>` `>=` `<` `<=` `=` `!=`	1 to 100000000	kb/s
Disk I/O write	`>` `>=` `<` `<=` `=` `!=`	1 to 100000000	kb/s
Disk I/O read	`>` `>=` `<` `<=` `=` `!=`	1 to 100000000	kb/s
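
The six conditions in the table map directly onto comparison operators. The following sketch shows how a sampled value could be compared against a threshold, and how the "any rule matches" behavior of a policy could be evaluated. The names `rule_matches` and `policy_triggered` are hypothetical, and rules are represented here as plain (metric, condition, threshold) tuples.

```python
import operator

# Condition symbols from the table, mapped to Python comparison functions.
CONDITIONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}


def rule_matches(value, condition, threshold):
    """Return True if `value <condition> threshold` holds."""
    return CONDITIONS[condition](value, threshold)


def policy_triggered(samples, rules):
    """Return True if any rule matches its sampled value.

    `samples` maps metric name to the averaged value for the interval;
    metrics with no sample are skipped.
    """
    return any(
        rule_matches(samples[metric], condition, threshold)
        for metric, condition, threshold in rules
        if metric in samples
    )
```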

Monitoring items (Service)

Service monitoring checks whether any service component on the selected cluster/system platform is abnormal. If any service becomes abnormal, an alert is triggered.

Monitoring items (Scheduler)

Metric	Condition	Threshold	Unit
Scheduler node status	`=`	Unavailable / Down (default: Down; multi-select supported)	-
Job status	`=`	Running	-

Metric notes

Scheduler node status metrics are sourced from Cluster Monitoring > Scheduler Monitoring > Node View.

Scheduler node status definitions

alloc, mix, etc. are scheduler-level node states from sinfo.

Available = alloc + mix + idle + completing
Unavailable (marked unavailable by admin) = drain + resv + maint
Down = down + fail + error
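
The grouping above can be expressed as a lookup table. This is a minimal sketch that matches only the exact state names listed here; any suffixes or compound states that sinfo may print are not handled, and the function name is hypothetical.

```python
# Scheduler node states grouped as in the definitions above.
STATE_GROUPS = {
    "Available": {"alloc", "mix", "idle", "completing"},
    "Unavailable": {"drain", "resv", "maint"},
    "Down": {"down", "fail", "error"},
}


def classify_node_state(state):
    """Return "Available", "Unavailable", or "Down", or None if unlisted."""
    normalized = state.strip().lower()
    for group, states in STATE_GROUPS.items():
        if normalized in states:
            return group
    return None
```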

Alert Notifications

Send notifications: Required. Yes/No.
- If Yes, the email, WeCom, and Feishu settings are shown.
- If No, no notifications are sent, but an alert record is still created.

Email

Shows a user list. Select one or more users. Users who are already selected are grayed out and cannot be selected again.
User list scope:
- Administrators can see all users and can configure email notifications for any user.
- Regular users can see only themselves and can configure notifications only for themselves.

Test

Sends a test message to the configured email address or WeCom/Feishu destination.

WeCom

Provide the WeCom robot webhook URL and a remark.
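
To check a webhook URL outside the platform, a test message can be posted to it directly. The sketch below builds such a request in Python. It assumes the text-message JSON body commonly used by WeCom group robots (`msgtype` set to `text`). It does not describe how FCP itself sends notifications, and the function name is hypothetical.

```python
import json
import urllib.request


def build_wecom_test_request(webhook_url, content):
    """Build (but do not send) a POST request carrying a text message.

    Assumption: the robot accepts a JSON body of the form
    {"msgtype": "text", "text": {"content": ...}}.
    """
    payload = {"msgtype": "text", "text": {"content": content}}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send the message, pass the request to urllib.request.urlopen, e.g.:
#   with urllib.request.urlopen(request, timeout=10) as response:
#       print(response.status)
```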

Feishu

For Feishu robot configuration, see Configure Feishu Robot.
Feishu notifications can also be added under Alert behavior.

Alert notification groups

You can create notification groups and bind WeCom/Feishu destinations to the group.

Create group:
- Group name: Required. Globally unique.
- Description: Optional.
- Members / WeCom / Feishu: Optional.
- When the group is created:
  1. The platform verifies global uniqueness and generates a group ID.
  2. A group record is added to the group list.
  3. The group is mapped to a Linux user group on cluster nodes and membership is synced to cluster nodes.
  4. Group notification methods include member emails and all bound WeCom/Feishu destinations.
Group list shows group ID, name, description, user count, bound WeCom/Feishu counts, and creation time.
Actions:
- Edit: Edit description, add users, add WeCom, add Feishu.
- Delete: Delete the group.

When creating or editing an alert policy, you can select a group. If selected, alerts notify all member emails and all bound WeCom/Feishu destinations.
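
The group behavior can be pictured as a small data model: a registry that enforces globally unique names and assigns an ID, plus a function that expands a group into its notification targets. This is a sketch only. All class, field, and function names are hypothetical, the sequential integer ID is an assumption, and the Linux user group sync is not modeled.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class NotificationGroup:
    group_id: int
    name: str
    description: str = ""
    member_emails: list = field(default_factory=list)
    wecom: list = field(default_factory=list)    # bound WeCom destinations
    feishu: list = field(default_factory=list)   # bound Feishu destinations


class GroupRegistry:
    """Keeps group names globally unique and assigns group IDs."""

    def __init__(self):
        self._groups = {}
        self._ids = count(1)  # assumption: IDs are sequential integers

    def create(self, name, description=""):
        if name in self._groups:
            raise ValueError(f"group name already exists: {name}")
        group = NotificationGroup(next(self._ids), name, description)
        self._groups[name] = group
        return group


def notification_targets(group):
    """Return every destination an alert bound to this group would notify."""
    return {
        "email": list(group.member_emails),
        "wecom": list(group.wecom),
        "feishu": list(group.feishu),
    }
```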
