Alert Service
The Idle shutdown automation is available only after Hybrid Cloud is enabled in FCP-Suite.
Alert Policies
Limit: You can create up to 1000 alert policies.
Field descriptions
- Policy name: Required. 1 to 40 characters. Must start with a letter. Can include numbers, underscores (_), and hyphens (-).
- Object: Required. Select a cluster or the system platform.
  - Regular users can create alerts only for clusters visible to them (clusters they created or clusters shared with them).
  - Only running clusters can be selected. Non-running clusters may be shown in the list but cannot be selected.
- Type: Required. Host, Service, or Scheduler (default: Host).
  - Host: Select one or more nodes for a cluster/system platform. For file systems, no node selection is required.
  - Service: No specific node selection is required; the system monitors services on all nodes for the selected object.
  - Scheduler: Available only when the object is an Fsched cluster. Scheduler-type policies apply to the Fsched scheduler.
- Nodes: Required.
  - If the object is a cluster:
    - All nodes: Default. When selected, you cannot select other nodes.
    - Or select one or more specific nodes (head, login, compute).
  - If the object is the system platform:
    - All nodes: Default. When selected, you cannot select other nodes.
    - Or select one or more specific nodes (for example, all-in-one is one node; all-in-two is two nodes). Only running nodes can be selected.
- Partitions: Required only for Scheduler type. Select one, multiple, or all partitions in the cluster.
- Severity: Required. Notification, Warning, or Critical.
- Sampling interval: Required. How often the system samples data; each sample is the average over that interval. Unit: minutes. Range: 1 to 1,000,000.
- Consecutive periods: Required. Number of consecutive sampling periods that must meet the rule condition (for example, exceed the threshold) before an alert is triggered. Unit: times. Range: 1 to 1,000,000.
- Silence period: Required. While an alert remains unrecovered, the interval at which the notification is resent. Default: 24 hours. Options: 5/15/30 minutes, 1/3/6/12/24 hours.
- Status:
  - Enabled: Default. The policy is in effect: notifications are sent and alert records are generated.
  - Disabled: No notifications are sent and no alert records are generated.
- User: User who created the policy.
- Actions:
  - Delete: Available in any status. Requires confirmation. Deleting a policy also deletes all alert records generated by it.
  - Edit: All fields are editable except policy name, object, and nodes.
  - Enable/Disable: Toggle the policy status.
- Bulk actions: Delete, Enable, Disable.
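The field constraints above can be checked before a policy is saved. The following is a minimal Python sketch, not FCP-Suite's implementation: `AlertPolicy`, `validate`, and the `object_is_fsched` flag are hypothetical names, the name pattern assumes letters are also allowed after the first character, and silence period options are expressed in minutes.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical policy model; field names are illustrative, not FCP-Suite's API.
# Assumption: letters may also appear after the first character of the name.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,39}")
TYPES = {"Host", "Service", "Scheduler"}
SEVERITIES = {"Notification", "Warning", "Critical"}
# Silence period options in minutes: 5/15/30 minutes, 1/3/6/12/24 hours.
SILENCE_OPTIONS_MIN = {5, 15, 30, 60, 180, 360, 720, 1440}


@dataclass
class AlertPolicy:
    name: str
    severity: str
    sampling_interval_min: int
    consecutive_periods: int
    policy_type: str = "Host"           # default type
    silence_period_min: int = 1440      # default: 24 hours
    object_is_fsched: bool = False      # hypothetical flag for "object is an Fsched cluster"
    partitions: List[str] = field(default_factory=list)


def validate(p: AlertPolicy) -> List[str]:
    """Return human-readable problems; an empty list means the policy passes."""
    errors = []
    if not NAME_RE.fullmatch(p.name):
        errors.append("name: 1-40 characters, starting with a letter")
    if p.policy_type not in TYPES:
        errors.append("type: must be Host, Service, or Scheduler")
    if p.severity not in SEVERITIES:
        errors.append("severity: must be Notification, Warning, or Critical")
    if not 1 <= p.sampling_interval_min <= 1_000_000:
        errors.append("sampling interval: 1 to 1,000,000 minutes")
    if not 1 <= p.consecutive_periods <= 1_000_000:
        errors.append("consecutive periods: 1 to 1,000,000")
    if p.silence_period_min not in SILENCE_OPTIONS_MIN:
        errors.append("silence period: not one of the allowed options")
    if p.policy_type == "Scheduler":
        if not p.object_is_fsched:
            errors.append("type: Scheduler requires an Fsched cluster")
        if not p.partitions:
            errors.append("partitions: required for Scheduler type")
    return errors
```

For example, `AlertPolicy("cpu_high-1", "Warning", 5, 3)` passes, while a Scheduler policy named `1cpu` on a non-Fsched cluster with no partitions reports three problems.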
Notes
- Releasing a cluster automatically disables all alert policies associated with that cluster.
- If a partition is released or nodes are removed or powered off, related alert checks may return no data.
  - If the policy does not include a node running status rule, a no-data result does not generate notifications or records.
  - If the policy includes a node running status rule, a no-data result generates notifications and records normally.
Alert behavior
- Send notifications: Yes/No.
- Notification list:
  - Email: Shows the email address and username.
  - WeCom: Shows the WeCom robot ID and remarks.
- Automation:
  - Idle shutdown: When enabled, the system automatically shuts down the target when the rule is triggered.
    - This automation is available for clusters.
    - To use it, the alert rule must be CPU usage with the condition <.
    - After configuration, if CPU usage stays below the threshold for N minutes, the system shuts down the target. N = sampling interval (minutes) × consecutive periods (times).
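Sampling interval, Consecutive periods, and Silence period together decide when an alert fires and how often it repeats; Idle shutdown is the case where the rule is CPU usage < threshold and firing leads to a shutdown. A minimal Python sketch of that timing, assuming each sampling period has already been reduced to "rule matched or not" and that the silence period is counted from the last notification. `AlertTracker` and its "recovered" event are hypothetical illustrations, not FCP-Suite internals.

```python
from dataclasses import dataclass
from typing import Optional


def trigger_delay_minutes(sampling_interval_min: int, consecutive_periods: int) -> int:
    """N = sampling interval (minutes) x consecutive periods, as used by Idle shutdown."""
    return sampling_interval_min * consecutive_periods


@dataclass
class AlertTracker:
    """Hypothetical per-policy state; not FCP-Suite's actual implementation."""
    sampling_interval_min: int
    consecutive_periods: int
    silence_period_min: int = 1440  # default: 24 hours
    streak: int = 0                 # consecutive sampling periods that met the condition
    firing: bool = False            # True once an alert is raised and not yet recovered
    minutes_since_notify: int = 0   # time since the last notification while firing

    def observe(self, matched: bool) -> Optional[str]:
        """Feed one sampling period; return "alert", "resend", "recovered", or None."""
        if not matched:
            was_firing = self.firing
            self.streak, self.firing, self.minutes_since_notify = 0, False, 0
            return "recovered" if was_firing else None
        self.streak += 1
        if not self.firing:
            if self.streak >= self.consecutive_periods:
                self.firing, self.minutes_since_notify = True, 0
                return "alert"  # for Idle shutdown, this is when the target is shut down
            return None
        # Still firing: resend once a full silence period has elapsed.
        self.minutes_since_notify += self.sampling_interval_min
        if self.minutes_since_notify >= self.silence_period_min:
            self.minutes_since_notify = 0
            return "resend"
        return None


# Idle shutdown example: CPU usage < 10 %, sampled every 5 minutes, 3 consecutive periods.
tracker = AlertTracker(sampling_interval_min=5, consecutive_periods=3)
events = [tracker.observe(cpu < 10) for cpu in [8.0, 7.5, 9.1, 6.0]]
print(trigger_delay_minutes(5, 3), events)  # 15 [None, None, 'alert', None]
```

With these settings, N = 5 × 3 = 15 minutes, and the alert (and therefore the shutdown) fires on the third consecutive low-CPU sample.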
Alert Rules
When any rule matches, the policy is considered triggered.
Limits:
- You cannot add two identical monitoring items.
- You can add up to 8 monitoring items.
- There is always one default rule and it cannot be deleted.
Monitoring items (Host)
| Metric | Condition | Threshold | Unit |
|---|---|---|---|
| CPU usage | > >= < <= = != | 1 to 100 | % |
| Memory usage | > >= < <= = != | 1 to 100 | % |
| Node running status | = | Normal / Abnormal | - |
| Disk usage | > >= < <= = != | 1 to 100 | % |
| Inbound traffic | > >= < <= = != | 1 to 100000000 | kb/s |
| Outbound traffic | > >= < <= = != | 1 to 100000000 | kb/s |
| Disk I/O write | > >= < <= = != | 1 to 100000000 | kb/s |
| Disk I/O read | > >= < <= = != | 1 to 100000000 | kb/s |
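The rule semantics above (any matching rule triggers the policy, 1 to 8 monitoring items, no duplicates) and the conditions in the Host table can be sketched in Python as follows. The metric keys such as `cpu_usage` and `node_status` are hypothetical, each sample is assumed to hold one averaged value per metric, and a missing value stands for "no data". Treating no data as a match when the policy has a node running status rule is this sketch's reading of the Notes, not confirmed behavior.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List

# Condition symbols from the monitoring-item tables, mapped to Python comparisons.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "=": operator.eq, "!=": operator.ne}
MAX_ITEMS = 8
NODE_STATUS = "node_status"  # hypothetical key for "Node running status"


@dataclass(frozen=True)
class Rule:
    metric: str      # hypothetical metric key, e.g. "cpu_usage"
    condition: str   # one of OPS
    threshold: Any   # number, or "Normal"/"Abnormal" for node running status


def check_rules(rules: List[Rule]) -> None:
    """Enforce the limits: 1 to 8 items, no duplicate monitoring item."""
    metrics = [r.metric for r in rules]
    if not 1 <= len(rules) <= MAX_ITEMS:
        raise ValueError("a policy has 1 to 8 monitoring items")
    if len(set(metrics)) != len(metrics):
        raise ValueError("the same monitoring item cannot be added twice")
    for r in rules:
        if r.condition not in OPS:
            raise ValueError(f"unsupported condition: {r.condition}")


def policy_triggered(rules: List[Rule], sample: Dict[str, Any]) -> bool:
    """True if ANY rule matches the sample."""
    has_status_rule = any(r.metric == NODE_STATUS for r in rules)
    for r in rules:
        value = sample.get(r.metric)
        if value is None:
            # No data (e.g. node powered off). Only acted on when a node running
            # status rule exists; counting it as a match is an assumption here.
            if has_status_rule:
                return True
            continue
        if OPS[r.condition](value, r.threshold):
            return True
    return False
```

For instance, with rules "CPU usage > 90" and "Memory usage >= 80", a sample of 50 % CPU and 85 % memory triggers the policy because the memory rule matches.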
Monitoring items (Service)
Service monitoring checks whether any service component on the selected cluster/system platform is abnormal. If any service becomes abnormal, an alert is triggered.
Monitoring items (Scheduler)
| Metric | Condition | Threshold | Unit |
|---|---|---|---|
| Scheduler node status | = | Unavailable / Down (default: Down; multi-select supported) | - |
| Job status | = | Running | - |
Metric notes
Scheduler node status metrics are sourced from Cluster Monitoring > Scheduler Monitoring > Node View.
Scheduler node status definitions
`alloc`, `mix`, etc. are scheduler-level node states from `sinfo`.
- Available = `alloc` + `mix` + `idle` + `completing`
- Unavailable (marked unavailable by admin) = `drain` + `resv` + `maint`
- Down = `down` + `fail` + `error`
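The grouping above, combined with the Scheduler node status rule (default: Down; multi-select supported), can be sketched in Python. Only the state names listed above are recognized; anything else is left unclassified. `sinfo` can append flag characters to a state (for example `*` for not responding or `~` for powered off), so the sketch keeps only the leading letters. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, Sequence

# Grouping from "Scheduler node status definitions".
STATE_GROUPS = {
    "Available": {"alloc", "mix", "idle", "completing"},
    "Unavailable": {"drain", "resv", "maint"},
    "Down": {"down", "fail", "error"},
}


def classify(state: str) -> Optional[str]:
    """Map one sinfo node state (e.g. "idle*") to Available/Unavailable/Down, or None."""
    m = re.match(r"[a-z]+", state.strip().lower())  # drop trailing flag characters
    if not m:
        return None
    for group, members in STATE_GROUPS.items():
        if m.group(0) in members:
            return group
    return None


def count_by_group(states: Iterable[str]) -> Dict[str, int]:
    """Count nodes per group, ignoring states not covered by the definitions."""
    counts = {group: 0 for group in STATE_GROUPS}
    for s in states:
        group = classify(s)
        if group is not None:
            counts[group] += 1
    return counts


def scheduler_alert(states: Iterable[str], selected: Sequence[str] = ("Down",)) -> bool:
    """Trigger if any node falls into a selected group (default: Down)."""
    return any(classify(s) in selected for s in states)
```

For the states `alloc`, `mix`, `idle*`, `drain`, `down~`, and `unknown`, the counts are 3 Available, 1 Unavailable, and 1 Down, and the default rule triggers.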
Alert Notifications
- Send notifications: Required. Yes/No.
  - If Yes, email/WeCom/Feishu settings are shown.
  - If No, no notifications are sent. An alert record is still created.
Email
- Shows a user list. Select one or more users. Selected users are disabled and cannot be selected again.
- User list scope:
  - Administrators can see all users and can configure email notifications for any user.
  - Regular users can see only themselves and can configure notifications only for themselves.
Test
Sends a test message to the configured email address or WeCom/Feishu destination.
WeCom
- Provide the WeCom robot webhook URL and a remark.
Feishu
- For Feishu robot configuration, see Configure Feishu Robot.
- Feishu notifications can also be added under Alert behavior.
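The Test action and the WeCom/Feishu destinations rely on robot webhooks. As a rough sketch of what a test message could look like, the Python below builds (but does not send) text-message requests in the format of the public WeCom group-robot and Feishu custom-bot webhook APIs. How FCP-Suite formats its own messages is not documented here; the URLs are placeholders, the message text is illustrative, and the sketch assumes signature verification is not enabled on the Feishu robot.

```python
import json
import urllib.request

# Placeholder webhook URLs; copy the real ones from the robot settings.
WECOM_URL = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=YOUR_KEY"
FEISHU_URL = "https://open.feishu.cn/open-apis/bot/v2/hook/YOUR_TOKEN"


def _json_post(url: str, body: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url, data=data, method="POST",
        headers={"Content-Type": "application/json"},
    )


def wecom_text(url: str, text: str) -> urllib.request.Request:
    # WeCom group robot "text" message body.
    return _json_post(url, {"msgtype": "text", "text": {"content": text}})


def feishu_text(url: str, text: str) -> urllib.request.Request:
    # Feishu custom bot "text" message body (no signature verification).
    return _json_post(url, {"msg_type": "text", "content": {"text": text}})


req = wecom_text(WECOM_URL, "Alert test message")
# urllib.request.urlopen(req, timeout=10)  # uncomment to actually send
```

Building the request separately from sending it keeps the payload easy to inspect before pointing it at a real robot.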
Alert notification groups
You can create notification groups and bind WeCom/Feishu destinations to the group.
- Create group:
  - Group name: Required. Globally unique.
  - Description: Optional.
  - Members / WeCom / Feishu: Optional.
- Validation rules:
  - The platform verifies global uniqueness and generates a group ID.
  - A group record is added to the group list.
  - The group is mapped to a Linux user group on cluster nodes, and membership is synced to cluster nodes.
- Group notification methods include member emails and all bound WeCom/Feishu destinations.
- Group list shows group ID, name, description, user count, bound WeCom/Feishu counts, and creation time.
- Actions:
  - Edit: Edit description, add users, add WeCom, add Feishu.
  - Delete: Delete the group.
When creating or editing an alert policy, you can select a group. If selected, alerts notify all member emails and all bound WeCom/Feishu destinations.
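When a group is selected on a policy, the recipient set widens to every member email and every bound WeCom/Feishu destination. The Python sketch below models this with hypothetical `NotificationGroup` and `GroupRegistry` types: group names must be unique, IDs are simple counters (the real ID format is not documented), and it assumes group destinations are added to, not substituted for, any destinations selected directly on the policy. The Linux user group mapping and node sync are out of scope.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NotificationGroup:
    group_id: int
    name: str
    description: str = ""
    member_emails: List[str] = field(default_factory=list)
    wecom_webhooks: List[str] = field(default_factory=list)
    feishu_webhooks: List[str] = field(default_factory=list)


class GroupRegistry:
    """Hypothetical in-memory group list; IDs here are simple counters."""

    def __init__(self) -> None:
        self._by_name: Dict[str, NotificationGroup] = {}
        self._ids = itertools.count(1)

    def create(self, name: str, **kwargs) -> NotificationGroup:
        if name in self._by_name:
            raise ValueError(f"group name {name!r} already exists")
        group = NotificationGroup(next(self._ids), name, **kwargs)
        self._by_name[name] = group
        return group


def _merge(*lists: List[str]) -> List[str]:
    # Concatenate and de-duplicate while keeping the original order.
    return list(dict.fromkeys(x for lst in lists for x in lst))


def resolve_destinations(emails: List[str], wecom: List[str], feishu: List[str],
                         group: Optional[NotificationGroup] = None) -> Dict[str, List[str]]:
    """Policy's own destinations plus, if a group is selected, all of the group's."""
    return {
        "email": _merge(emails, group.member_emails if group else []),
        "wecom": _merge(wecom, group.wecom_webhooks if group else []),
        "feishu": _merge(feishu, group.feishu_webhooks if group else []),
    }
```

A user who is both selected directly and a group member receives a single email, because duplicates are merged.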