Version: FCP 24.11

Version Change Notes

This document records the points to watch when upgrading to each version. Read it carefully before upgrading.

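Tip
  1. Because an upgrade spans the changes between versions, first obtain the currently deployed version. Then read the sections below according to that version, including every intermediate version. For example, if the currently deployed version is 24.05 and you plan to upgrade to 24.11, read the changes for 24.05 and all later versions.
  2. In this document, "this version" refers to the version selected in the upper-right corner.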
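
The selection rule in the tip can be sketched as a small helper. This is only an illustration: the function names and the `DOCUMENTED` list are assumptions made for the example, and the list simply mirrors the version sections that appear in this document.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Turn a 'YY.MM' release string such as '24.05' into (24, 5)."""
    year, month = version.split(".")
    return int(year), int(month)


def sections_to_read(current: str, target: str, documented: list[str]) -> list[str]:
    """Return the documented sections from the current version up to the target."""
    low, high = parse_version(current), parse_version(target)
    selected = [v for v in documented if low <= parse_version(v) <= high]
    return sorted(selected, key=parse_version)


# Version sections present in this document (assumption for the example).
DOCUMENTED = ["24.05", "24.08"]

if __name__ == "__main__":
    # Deployed version 24.05, planned upgrade to 24.11.
    print(sections_to_read("24.05", "24.11", DOCUMENTED))
```

With a deployed version of 24.05 and a target of 24.11, the helper selects both the 24.05 and the 24.08 sections, matching the example in the tip.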

24.05

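Cluster Management

  • Load threshold: In this version, fsched changes how the load threshold is implemented, which makes its management more stable. If your cluster relies on the load threshold feature, upgrade fsched before upgrading the platform; otherwise the cluster configuration will fail. If you cannot upgrade fsched for now, consider disabling the feature.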

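Environment Configuration

  • Notification configuration requirements
    The new version reworks the notification configuration: the SMTP port setting is now split into SSL and non-SSL options.
    After the upgrade, the SSL port configured in the old version is used to enable SMTP over SSL, so confirm that the port configured in the old version is correct.
    To view this setting, log in to the configuration interface as the configuration administrator and switch to the Notification Configuration tab.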
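
One way to confirm that the old port really speaks SMTP over SSL is to attempt a TLS connection to it before upgrading. The following is a minimal sketch using the Python standard library; the function name is hypothetical, and the host and port in the usage comment are placeholders, not values taken from the product.

```python
import smtplib
import ssl


def smtp_ssl_port_ok(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if host:port completes a TLS handshake and answers NOOP."""
    context = ssl.create_default_context()
    try:
        with smtplib.SMTP_SSL(host, port, timeout=timeout, context=context) as client:
            code, _ = client.noop()
            return code == 250
    except OSError:
        # Covers refused connections, timeouts, TLS failures and SMTP errors
        # (smtplib.SMTPException is a subclass of OSError).
        return False


# Usage (placeholder values; use the server and SSL port from the old configuration):
#   smtp_ssl_port_ok("smtp.example.com", 465)
```

A `False` result only says that the check failed from the machine where it was run; the server may still be reachable from the platform nodes.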

24.08

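Cluster Management

  • Partition administrator: This version adds the partition administrator feature. The feature introduces a new fsched configuration item; older fsched versions do not have this item, which causes the cluster configuration to fail. Upgrade fsched before upgrading the platform. This feature cannot be temporarily disabled.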

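Desktop Management

  • Desktop applications: This version adds the desktop application feature. Using it requires installing the Web Portal related components and upgrading the service. The component requirements are listed below; install the base components on the desktop nodes:

    On Ubuntu, run: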

    sudo apt install xdotool

    On CentOS, run:

    sudo yum install xorg-x11-utils xdotool
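
After installation, a quick check on each desktop node can confirm that the expected command is on the `PATH`. This is a small sketch; the helper name is an assumption, and only `xdotool` is checked because it is the one command named explicitly above.

```python
import shutil


def missing_commands(commands: list[str]) -> list[str]:
    """Return the commands from the list that cannot be found on PATH."""
    return [name for name in commands if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands(["xdotool"])
    if missing:
        print("Missing on this node:", ", ".join(missing))
    else:
        print("All required commands were found.")
```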

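Operations

  • Cluster analysis: Configure the Grafana panels in cluster analysis
    1. The cluster analysis dashboards were not modified before the upgrade. On the core node, replace the contents of the file /fastone-services/fastone/ui/assets/custom-data/cluster-monitor-panel.json with the content below.

      Replacement content for cluster-monitor-panel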
      {
      "type": "cluster",
      "category": [
      {
      "id": "1",
      "title": {
      "cn": "监控",
      "en": "Monitor"
      },
      "layout": {
      "col": 2
      },
      "list": [
      {
      "id": "1-1",
      "title": {
      "cn": "集群监控概览",
      "en": "Cluster monitor overview"
      },
      "link": "/main/monitor/clusters"
      },
      {
      "id": "1-2",
      "title": {
      "cn": "运营分析-集群视图",
      "en": "Operations Analysis - Cluster View"
      },
      "link": "/main/monitor/analysis",
      "checkPermission": true,
      "permissions": "ROLE_ADMIN"
      },
      {
      "id": "1-3",
      "title": {
      "cn": "分区监控",
      "en": "Partition monitor"
      },
      "link": "/main/monitor/clusters"
      },
      {
      "id": "1-4",
      "title": {
      "cn": "调度器节点监控",
      "en": "Scheduler node monitoring"
      },
      "linkSlurmMonitor": true,
      "link": "/main/monitor/clusters",
      "queryParma": {
      "activeIndex": 2,
      "cluster_id": null
      }
      },
      {
      "id": "1-5",
      "title": {
      "cn": "主机监控",
      "en": "Host monitoring"
      },
      "link": "/main/monitor/clusters",
      "queryParma": {
      "activeIndex": 0,
      "hostActiveIndex": 1
      }
      },
      {
      "id": "1-6",
      "title": {
      "cn": "集群作业列表",
      "en": "Cluster job list"
      },
      "link": "/main/monitor/analysis",
      "queryParma": {
      "type": "cluster"
      },
      "checkPermission": true,
      "permissions": "ROLE_ADMIN"
      },
      {
      "id": "1-7",
      "title": {
      "cn": "服务状态监控",
      "en": "Service status monitoring"
      },
      "link": "/main/monitor/clusters",
      "queryParma": {
      "activeIndex": 1
      }
      }
      ]
      },
      {
      "id": "2",
      "title": {
      "cn": "查询",
      "en": "Query"
      },
      "layout": {
      "col": 1
      },
      "list": [
      {
      "id": "2-1",
      "title": {
      "cn": "一定时间内已完成的作业执行时间",
      "en": "Execution Time of Completed Jobs within a Specified Time Period"
      },
      "grafanaLink": {
      "cn": "/grafana/d/job-exec-time/job-exec-time",
      "en": "/grafana/d/job-exec-time-en/job-exec-time-en"
      },
      "desc": {
      "cn" : [
      "查询到的数据跟选择的集群和设置的最近完成时间有关",
      "该图表的数据指标收集有最多5分钟的延迟",
      "订阅的数据为首次加载的集群;如果需要更改订阅的集群,请返回上层界面选择新的集群,然后再次进入当前界面进行订阅"
      ],
      "en": [
      "The time range in the upper right corner of the chart is invalid, and the data queried is related to the selected cluster and the latest completion time set",
      "The data indicators of this chart are collected with a delay of up to 5 minutes",
      "The subscribed data is for the initially loaded cluster; If you need to change the subscribed cluster, please return to the previous screen to select a new cluster, and then re-enter the current screen to subscribe again."
      ]
      }
      },
      {
      "id": "2-2",
      "title": {
      "cn": "等待中&运行中作业",
      "en": "Waiting & Running Jobs"
      },
      "grafanaLink": {
      "cn": "/grafana/d/waiting-and-running-jobs/waiting-and-running-jobs",
      "en": "/grafana/d/waiting-and-running-jobs-en/waiting-and-running-jobs-en"
      },
      "desc": {
      "cn": [
      "该图表右上角时间范围有效,根据作业的提交时间进行筛选,在预估数据量很大的情况下通过时间筛选来提高性能",
      "该图表的数据指标收集有最多5分钟的延迟",
      "当查询不到作业,dashboard显示“No Data”",
      "订阅的数据为首次加载的集群;如果需要更改订阅的集群,请返回上层界面选择新的集群,然后再次进入当前界面进行订阅"
      ],
      "en": [
      "The time range in the upper right corner of the chart is valid, and the data is filtered according to the submission time of the job. In the case of a large estimated data volume, performance is improved by filtering by time",
      "The data indicators of this chart are collected with a delay of up to 5 minutes",
      "When no job is queried, the dashboard displays 'No Data'",
      "The subscribed data is for the initially loaded cluster; If you need to change the subscribed cluster, please return to the previous screen to select a new cluster, and then re-enter the current screen to subscribe again."
      ]
      }
      },
      {
      "id": "2-3",
      "title": {
      "cn": "已完成作业",
      "en": "Completed Jobs"
      },
      "grafanaLink": {
      "cn": "/grafana/d/finished-jobs/finished-jobs",
      "en": "/grafana/d/finished-jobs-en/finished-jobs-en"
      },
      "desc": {
      "cn": [
      "该图表右上角时间范围有效,根据作业的提交时间进行筛选,在预估数据量很大的情况下通过时间筛选来提高性能",
      "该图表的数据指标收集有最多5分钟的延迟",
      "当查询不到作业,dashboard显示“No Data”",
      "订阅的数据为首次加载的集群;如果需要更改订阅的集群,请返回上层界面选择新的集群,然后再次进入当前界面进行订阅"
      ],
      "en": [
      "The time range in the upper right corner of the chart is valid, and the data is filtered according to the submission time of the job. In the case of a large estimated data volume, performance is improved by filtering by time",
      "The data indicators of this chart are collected with a delay of up to 5 minutes",
      "When no job is queried, the dashboard displays 'No Data'",
      "The subscribed data is for the initially loaded cluster; If you need to change the subscribed cluster, please return to the previous screen to select a new cluster, and then re-enter the current screen to subscribe again."
      ]
      }
      },
      {
      "id": "2-4",
      "title": {
      "cn": "用户作业状态查询",
      "en": "User Job Status Inquiry"
      },
      "grafanaLink": {
      "cn": "/grafana/d/job-state/job-state",
      "en": "/grafana/d/job-state-en/job-state-en"
      },
      "desc": {
      "cn": [
      "该图表右上角的时间范围仅开始时间有效,统计的数据反映的是调度器内存中在该时刻的作业信息,用户可以查询到设置的开始时间最近15分钟的作业信息",
      "该图表的数据指标收集有最多5分钟的延迟",
      "订阅的数据为首次加载的集群;如果需要更改订阅的集群,请返回上层界面选择新的集群,然后再次进入当前界面进行订阅"
      ],
      "en": [
      "The time range in the upper right corner of the chart only considers the start time, and the data reflects the job information in the scheduler's memory at that moment; users can query job information from the 15 minutes leading up to the specified start time",
      "The data indicators of this chart are collected with a delay of up to 5 minutes",
      "The subscribed data is for the initially loaded cluster; If you need to change the subscribed cluster, please return to the previous screen to select a new cluster, and then re-enter the current screen to subscribe again."
      ]
      }
      },
      {
      "id": "2-5",
      "title": {
      "cn": "作业列表",
      "en": "Job List"
      },
      "grafanaLink": {
      "cn": "/grafana/d/job-list/job-list",
      "en": "/grafana/d/job-list-en/job-list-en"
      },
      "desc": {
      "cn": [
      "该图表右上角时间范围有效,根据作业的提交时间进行筛选,在预估数据量很大的情况下通过时间筛选来提高性能",
      "该图表的数据指标收集有最多5分钟的延迟",
      "当查询不到作业,dashboard显示“No Data”",
      "订阅的数据为首次加载的集群;如果需要更改订阅的集群,请返回上层界面选择新的集群,然后再次进入当前界面进行订阅"
      ],
      "en": [
      "The time range in the upper right corner of the chart is valid, and the data is filtered according to the submission time of the job. In the case of a large estimated data volume, performance is improved by filtering by time",
      "The data indicators of this chart are collected with a delay of up to 5 minutes",
      "When no job is queried, the dashboard displays 'No Data'",
      "The subscribed data is for the initially loaded cluster; If you need to change the subscribed cluster, please return to the previous screen to select a new cluster, and then re-enter the current screen to subscribe again."
      ]
      }
      }
      ]
      },
      {
      "id": "3",
      "title": {
      "cn": "分析",
      "en": "Analysis"
      },
      "layout": {
      "col": 2
      },
      "list": [
      {
      "id": "3-1",
      "title": {
      "cn": "内存指定不合理的作业",
      "en": "Jobs with Unreasonable Memory Allocation"
      },
      "grafanaLink": {
      "cn": "/grafana/d/memory-unreasonable-job/memory-unreasonable-job",
      "en": "/grafana/d/memory-unreasonable-job-en/memory-unreasonable-job-en"
      },
      "desc": {
      "cn": [
      "该图表右上角时间范围有效,根据作业的提交时间进行筛选,在预估数据量很大的情况下通过时间筛选来提高性能",
      "该图表的数据指标收集有最多5分钟的延迟",
      "当查询不到作业,dashboard显示“No Data”",
      "订阅的数据为首次加载的集群;如果需要更改订阅的集群,请返回上层界面选择新的集群,然后再次进入当前界面进行订阅"
      ],
      "en": [
      "The time range in the upper right corner of the chart is valid, and the data is filtered according to the submission time of the job. In the case of a large estimated data volume, performance is improved by filtering by time",
      "The data indicators of this chart are collected with a delay of up to 5 minutes",
      "When no job is queried, the dashboard displays 'No Data'",
      "The subscribed data is for the initially loaded cluster; If you need to change the subscribed cluster, please return to the previous screen to select a new cluster, and then re-enter the current screen to subscribe again."
      ]
      }
      },
      {
      "id": "3-2",
      "title": {
      "cn": "异常退出的作业",
      "en": "Jobs that terminated abnormally"
      },
      "grafanaLink": {
      "cn": "/grafana/d/failed-job/failed-job",
      "en": "/grafana/d/failed-job-en/failed-job-en"
      },
      "desc": {
      "cn": [
      "该图表右上角时间范围有效,根据作业的提交时间进行筛选,在预估数据量很大的情况下通过时间筛选来提高性能",
      "该图表的数据指标收集有最多5分钟的延迟",
      "当查询不到作业,dashboard显示“No Data”",
      "订阅的数据为首次加载的集群;如果需要更改订阅的集群,请返回上层界面选择新的集群,然后再次进入当前界面进行订阅"
      ],
      "en": [
      "The time range in the upper right corner of the chart is valid, and the data is filtered according to the submission time of the job. In the case of a large estimated data volume, performance is improved by filtering by time",
      "The data indicators of this chart are collected with a delay of up to 5 minutes",
      "When no job is queried, the dashboard displays 'No Data'",
      "The subscribed data is for the initially loaded cluster; If you need to change the subscribed cluster, please return to the previous screen to select a new cluster, and then re-enter the current screen to subscribe again."
      ]
      }
      },
      {
      "id": "3-3",
      "title": {
      "cn": "用户用量统计",
      "en": "User Usage Statistics"
      },
      "grafanaLink": {
      "cn": "/grafana/d/user-usage-statistics/user-usage-statistics",
      "en": "/grafana/d/user-usage-statistics-en/user-usage-statistics-en"
      },
      "desc": {
      "cn": [
      "该图表右上角时间范围有效,根据作业的完成时间进行筛选",
      "该图表的数据指标每天凌晨2点计算一次",
      "当查询不到作业,dashboard显示“No Data”",
      "订阅的数据为首次加载的集群;如果需要更改订阅的集群,请返回上层界面选择新的集群,然后再次进入当前界面进行订阅"
      ],
      "en": [
      "The time range in the upper right corner of the chart is valid, and the data is filtered according to the completion time of the job",
      "The data indicators of this chart are calculated at 2:00 am every day",
      "When no job is queried, the dashboard displays 'No Data'",
      "The subscribed data is for the initially loaded cluster; If you need to change the subscribed cluster, please return to the previous screen to select a new cluster, and then re-enter the current screen to subscribe again."
      ]
      }
      },
      {
      "id": "3-4",
      "title": {
      "cn": "CPU指定不合理的作业",
      "en": "Jobs with Unreasonable CPU Allocation"
      },
      "grafanaLink": {
      "cn": "/grafana/d/cpu-unreasonable-job/cpu-unreasonable-job",
      "en": "/grafana/d/cpu-unreasonable-job-en/cpu-unreasonable-job-en"
      },
      "desc": {
      "cn": [
      "该图表右上角时间范围有效,根据作业的提交时间进行筛选,在预估数据量很大的情况下通过时间筛选来提高性能",
      "该图表的数据指标收集有最多5分钟的延迟",
      "当查询不到作业,dashboard显示“No Data”",
      "订阅的数据为首次加载的集群;如果需要更改订阅的集群,请返回上层界面选择新的集群,然后再次进入当前界面进行订阅"
      ],
      "en": [
      "The time range in the upper right corner of the chart is valid, and the data is filtered according to the submission time of the job. In the case of a large estimated data volume, performance is improved by filtering by time",
      "The data indicators of this chart are collected with a delay of up to 5 minutes",
      "When no job is queried, the dashboard displays 'No Data'",
      "The subscribed data is for the initially loaded cluster; If you need to change the subscribed cluster, please return to the previous screen to select a new cluster, and then re-enter the current screen to subscribe again."
      ]
      }
      },
      {
      "id": "3-5",
      "title": {
      "cn": "集群等待中的作业平均等待时长",
      "en": "Average Waiting Time of Pending Jobs in the Cluster"
      },
      "grafanaLink": {
      "cn": "/grafana/d/cluster-pending-job-wait-time/cluster-pending-job-wait-time",
      "en": "/grafana/d/cluster-pending-job-wait-time-en/cluster-pending-job-wait-time-en"
      },
      "desc": {
      "cn": [
      "该图表右上角时间范围有效,根据选择的时间展示相应时间段的折线图",
      "该图表的数据指标收集有最多1分钟的延迟",
      "订阅的数据为首次加载的集群;如果需要更改订阅的集群,请返回上层界面选择新的集群,然后再次进入当前界面进行订阅"
      ],
      "en": [
      "The time range in the upper right corner of the chart is valid, and the line chart of the corresponding time period is displayed according to the selected time",
      "The data indicators of this chart are collected with a delay of up to 1 minute",
      "The subscribed data is for the initially loaded cluster; If you need to change the subscribed cluster, please return to the previous screen to select a new cluster, and then re-enter the current screen to subscribe again."
      ]
      }
      },
      {
      "id": "3-6",
      "title": {
      "cn": "分区等待中的作业平均等待时长",
      "en": "Average Waiting Time of Pending Jobs in the Partition"
      },
      "grafanaLink": {
      "cn": "/grafana/d/partition-pending-job-wait-time/partition-pending-job-wait-time",
      "en": "/grafana/d/partition-pending-job-wait-time-en/partition-pending-job-wait-time-en"
      },
      "desc": {
      "cn": [
      "该图表右上角时间范围有效,根据选择的时间展示相应时间段的折线图",
      "该图表的数据指标收集有最多1分钟的延迟",
      "订阅的数据为首次加载的集群;如果需要更改订阅的集群,请返回上层界面选择新的集群,然后再次进入当前界面进行订阅"
      ],
      "en": [
      "The time range in the upper right corner of the chart is valid, and the line chart of the corresponding time period is displayed according to the selected time",
      "The data indicators of this chart are collected with a delay of up to 1 minute",
      "The subscribed data is for the initially loaded cluster; If you need to change the subscribed cluster, please return to the previous screen to select a new cluster, and then re-enter the current screen to subscribe again."
      ]
      }
      },
      {
      "id": "3-7",
      "title": {
      "cn": "分区中已完成的作业数量",
      "en": "Number of Completed Jobs in the Partition"
      },
      "grafanaLink": {
      "cn": "/grafana/d/partition-completed-job/partition-completed-job",
      "en": "/grafana/d/partition-completed-job-en/partition-completed-job-en"
      },
      "desc": {
      "cn": [
      "该图表右上角时间范围有效,根据选择的时间展示相应时间段的折线图,统计的数据是各个时刻调度器内存中已完成的作业数量,即各个时刻最近15分钟已完成的作业数量",
      "该图表的数据指标收集有最多5分钟的延迟",
      "当查询不到作业,dashboard显示“No Data”",
      "订阅的数据为首次加载的集群;如果需要更改订阅的集群,请返回上层界面选择新的集群,然后再次进入当前界面进行订阅"
      ],
      "en": [
      "The time range in the upper right corner of the chart is valid, displaying a line chart of the corresponding time period based on the selected time. The statistical data is the number of completed jobs in the scheduler memory at each time point, that is, the number of jobs completed in the last 15 minutes at each time point",
      "The data indicator collection for this chart has a maximum delay of 5 minutes",
      "When no job is queried, the dashboard displays 'No Data'",
      "The subscribed data is for the initially loaded cluster; If you need to change the subscribed cluster, please return to the previous screen to select a new cluster, and then re-enter the current screen to subscribe again."
      ]
      }
      }
      ]
      }
      ]
      }

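    2. The cluster analysis dashboards were modified before the upgrade. Adjust the cluster-monitor-panel.json file as needed; for the method, see 自定义集群分析grafana面板.md (customizing the cluster analysis Grafana panels).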
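
Before placing an edited cluster-monitor-panel.json on the core node, it can help to confirm that the file still parses and that entry ids are unique. The sketch below assumes only the structure visible in the content above (`category` → `list` → `id` and `title.en`); the function name and the shortened `SAMPLE` are illustrative, and the uniqueness check is an assumption about what the UI expects.

```python
import json


def list_panel_items(text: str) -> list[tuple[str, str]]:
    """Parse panel JSON and return (id, English title) for every list entry."""
    config = json.loads(text)  # raises json.JSONDecodeError on malformed input
    items = []
    for category in config["category"]:
        for entry in category["list"]:
            items.append((entry["id"], entry["title"]["en"]))
    ids = [item_id for item_id, _ in items]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate entry id in panel configuration")
    return items


# Shortened excerpt of the configuration, for illustration only.
SAMPLE = """
{
  "type": "cluster",
  "category": [
    {
      "id": "1",
      "title": {"cn": "监控", "en": "Monitor"},
      "layout": {"col": 2},
      "list": [
        {"id": "1-1",
         "title": {"cn": "集群监控概览", "en": "Cluster monitor overview"},
         "link": "/main/monitor/clusters"}
      ]
    }
  ]
}
"""

if __name__ == "__main__":
    print(list_panel_items(SAMPLE))
```

To check the real file, read its text from the path given in step 1 and pass it to `list_panel_items`.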

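User Management

  • User expiration time: Expiration times set for users before the upgrade no longer take effect after the upgrade.
  • admin user feature change: In environments that use the built-in user authentication system, the admin user can no longer open remote connections or create new tasks.
    Note: before upgrading, disconnect the admin user's remote connections so that they do not keep occupying resources after the upgrade.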