Version: FCP 25.11

Impact of Restarting or Shutting Down the Platform and Related Nodes

The FCP platform consists of three main parts:

  • Platform management nodes
    • Core node
    • Monitor node (optional)
  • Cluster nodes
    • Head node
    • Compute node
    • Login node
    • Desktop node
  • External supporting service nodes
    • Authentication service (optional)
    • NTP service
    • Storage service

The following table shows how shutting down each type of node or service affects platform functions:

| Node Type | In-Cluster (Fsched) Jobs | Task Mode | Cluster Management | Cluster Monitoring | User Management | Data Access | Remote Access |
|---|---|---|---|---|---|---|---|
| Management node | Long downtime may make task accounting information inaccurate; short downtime has no effect | Task submission is unavailable | Cluster management is unavailable | Cluster monitoring is unavailable | User management is unavailable | Data access is unavailable | Remote access is unavailable |
| Monitor node | None | None | None | Cluster monitoring is unavailable | None | None | None |
| Head node | New jobs cannot be submitted. Running jobs continue until completion, but resources cannot be released afterward | Tasks fail | Cluster management is unavailable | Some monitoring information cannot be collected | None | None | None |
| Compute node | Jobs running on the node fail | Tasks running on the node fail | Cluster management is unavailable | Monitoring information for that node cannot be collected | None | None | None |
| Login node | Interactive jobs running on the node fail | None | Cluster management is unavailable | Monitoring information for that node cannot be collected | None | None | None |
| Desktop node | Jobs running on the node fail | None | Cluster management is unavailable | Monitoring information for that node cannot be collected | None | None | None |
| Authentication service | Long downtime (> 1 minute) prevents task submission because submitter identity cannot be verified; short downtime has no effect | Long downtime (> 1 minute) prevents task submission because submitter identity cannot be verified | Users cannot log in | None | User management is unavailable | Authentication cannot be verified | Authentication cannot be verified |
| NTP service | Long failures cause time drift, which breaks node-to-node validation and prevents jobs from running; short failures have no effect | Long failures cause time drift, which breaks node-to-node validation and prevents jobs from running | None | None | None | None | None |
| Storage service | Task execution may fail, depending on the application | Task submission is unavailable | Cluster management is unavailable and management operations may block | None | None | Data access is unavailable | If the user home directory is on shared storage, users cannot log in |
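
When planning maintenance that takes several nodes or services offline at once, it can help to combine the rows above programmatically. The following is a minimal sketch, not part of FCP: the identifiers `FUNCTIONS`, `IMPACT`, and `affected_functions` are hypothetical names chosen for illustration, and the data is a hand-transcribed summary of the table. Any cell other than "None" is treated as an impact, including impacts that only occur after long downtime.

```python
# Columns of the impact table, in document order (identifiers are hypothetical).
FUNCTIONS = (
    "jobs", "task_mode", "cluster_management", "cluster_monitoring",
    "user_management", "data_access", "remote_access",
)

# For each table row, the functions whose cell is anything other than "None".
# Conditional impacts (for example, only after long downtime) count as affected.
IMPACT = {
    "Management node": set(FUNCTIONS),
    "Monitor node": {"cluster_monitoring"},
    "Head node": {"jobs", "task_mode", "cluster_management", "cluster_monitoring"},
    "Compute node": {"jobs", "task_mode", "cluster_management", "cluster_monitoring"},
    "Login node": {"jobs", "cluster_management", "cluster_monitoring"},
    "Desktop node": {"jobs", "cluster_management", "cluster_monitoring"},
    "Authentication service": {
        "jobs", "task_mode", "cluster_management",
        "user_management", "data_access", "remote_access",
    },
    "NTP service": {"jobs", "task_mode"},
    "Storage service": {
        "jobs", "task_mode", "cluster_management", "data_access", "remote_access",
    },
}


def affected_functions(node_types):
    """Return the functions affected when all given node types are down, in column order."""
    hit = set()
    for node_type in node_types:
        hit |= IMPACT[node_type]  # raises KeyError for a row that is not in the table
    return [name for name in FUNCTIONS if name in hit]


if __name__ == "__main__":
    # Example: a maintenance window covering the monitor node and the NTP service.
    print(affected_functions(["Monitor node", "NTP service"]))
```

The sketch only answers whether a function is affected. For the nature of the impact, such as the 1-minute threshold for the authentication service, refer to the corresponding table cell.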