Version: FCP 24.08

Upgrade Guide

Upgrade Notes

Impact of the upgrade on clusters and jobs in the platform (be sure to review the version change notes)

Prerequisites

Obtain the fastone-fcp-xxx.tgz upgrade package and place it in DEPLOY_DIR on the core node, then perform the steps below.
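The prerequisite above can be checked before starting. The following is a minimal sketch, not part of the official procedure: `check_package` is a hypothetical helper name, and it assumes DEPLOY_DIR is available as a directory path and that fastone-fcp-xxx.tgz stands in for the real package name.

```shell
#!/usr/bin/env bash
# Sketch: confirm the upgrade package is present and readable before upgrading.
# check_package is a hypothetical helper; the directory and package name are
# placeholders taken from the prerequisite text.

check_package() {
    local deploy_dir="$1"   # directory on the core node that holds the package
    local package="$2"      # e.g. fastone-fcp-xxx.tgz (placeholder name)
    local path="${deploy_dir}/${package}"

    if [ ! -f "$path" ]; then
        echo "missing: $path" >&2
        return 1
    fi
    # "tar tf" lists the archive without extracting it; a failure here
    # usually means a corrupt or truncated download.
    if ! tar tf "$path" >/dev/null 2>&1; then
        echo "unreadable archive: $path" >&2
        return 2
    fi
    echo "ok: $path"
}
```

For example, `check_package "$DEPLOY_DIR" fastone-fcp-xxx.tgz` prints an `ok:` line when the package can be listed, and returns a non-zero status otherwise.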

Upgrade Steps

FCP-CORE

  • Extract the package
# Extract
sudo tar xf ./fastone-fcp-xxx.tgz -C ./
  • Change to the extracted upgrade directory and run the upgrade script
cd ymir-specs/upgrade/2405upgrade2408
sudo bash upgrade.sh
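  • Configure the cluster analysis Grafana panels
  1. If the cluster analysis dashboards were not modified before the upgrade: on the core node, replace the contents of /fastone-services/fastone/ui/assets/custom-data/cluster-monitor-panel.json with the content below.
Replacement content for cluster-monitor-panel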
{
"type": "cluster",
"category": [
{
"id": "1",
"title": {
"cn": "监控",
"en": "Monitor"
},
"layout": {
"col": 2
},
"list": [
{
"id": "1-1",
"title": {
"cn": "集群监控概览",
"en": "Cluster monitor overview"
},
"link": "/main/monitor/clusters"
},
{
"id": "1-2",
"title": {
"cn": "运营分析-集群视图",
"en": "Operations Analysis - Cluster View"
},
"link": "/main/monitor/analysis",
"checkPermission": true,
"permissions": "ROLE_ADMIN"
},
{
"id": "1-3",
"title": {
"cn": "分区监控",
"en": "Partition monitor"
},
"link": "/main/monitor/clusters"
},
{
"id": "1-4",
"title": {
"cn": "调度器节点监控",
"en": "Scheduler node monitoring"
},
"linkSlurmMonitor": true,
"link": "/main/monitor/clusters",
"queryParma": {
"activeIndex": 2,
"cluster_id": null
}
},
{
"id": "1-5",
"title": {
"cn": "主机监控",
"en": "Host monitoring"
},
"link": "/main/monitor/clusters",
"queryParma": {
"activeIndex": 0,
"hostActiveIndex": 1
}
},
{
"id": "1-6",
"title": {
"cn": "集群作业列表",
"en": "Cluster job list"
},
"link": "/main/monitor/analysis",
"queryParma": {
"type": "cluster"
},
"checkPermission": true,
"permissions": "ROLE_ADMIN"
},
{
"id": "1-7",
"title": {
"cn": "服务状态监控",
"en": "Service status monitoring"
},
"link": "/main/monitor/clusters",
"queryParma": {
"activeIndex": 1
}
}
]
},
{
"id": "2",
"title": {
"cn": "查询",
"en": "Query"
},
"layout": {
"col": 1
},
"list": [
{
"id": "2-1",
"title": {
"cn": "一定时间内已完成的JOB执行时间",
"en": "Execution time of a JOB within a specified period of time"
},
"grafanaLink": {
"cn": "/grafana/d/job-exec-time/job-exec-time",
"en": "/grafana/d/job-exec-time-en/job-exec-time-en"
},
"desc": {
"cn" : [
"该图表右上角时间范围无效,查询到的数据跟选择的集群和设置的最近完成时间有关",
"该图表的数据指标收集有最多5分钟的延迟"
],
"en": [
"The time range in the upper right corner of the chart is invalid, and the data queried is related to the selected cluster and the latest completion time set",
"The data indicators of this chart are collected with a delay of up to 5 minutes"
]
}
},
{
"id": "2-2",
"title": {
"cn": "等待中&运行中JOB",
"en": "Waiting&Running JOB"
},
"grafanaLink": {
"cn": "/grafana/d/waiting-and-running-jobs/waiting-and-running-jobs",
"en": "/grafana/d/waiting-and-running-jobs-en/waiting-and-running-jobs-en"
},
"desc": {
"cn": [
"该图表右上角时间范围有效,根据job的提交时间进行筛选,在预估数据量很大的情况下通过时间筛选来提高性能",
"该图表的数据指标收集有最多5分钟的延迟",
"当查询不到JOB,dashboard显示“No Data”"
],
"en": [
"The time range in the upper right corner of the chart is valid, and the data is filtered according to the submission time of the job. In the case of a large estimated data volume, performance is improved by filtering by time",
"The data indicators of this chart are collected with a delay of up to 5 minutes",
"When no JOB is queried, the dashboard displays 'No Data'"
]
}
},
{
"id": "2-3",
"title": {
"cn": "已完成JOB",
"en": "Completed JOB"
},
"grafanaLink": {
"cn": "/grafana/d/finished-jobs/finished-jobs",
"en": "/grafana/d/finished-jobs-en/finished-jobs-en"
},
"desc": {
"cn": [
"该图表右上角时间范围有效,根据job的提交时间进行筛选,在预估数据量很大的情况下通过时间筛选来提高性能",
"该图表的数据指标收集有最多5分钟的延迟",
"当查询不到JOB,dashboard显示“No Data”"
],
"en": [
"The time range in the upper right corner of the chart is valid, and the data is filtered according to the submission time of the job. In the case of a large estimated data volume, performance is improved by filtering by time",
"The data indicators of this chart are collected with a delay of up to 5 minutes",
"When no JOB is queried, the dashboard displays 'No Data'"
]
}
},
{
"id": "2-4",
"title": {
"cn": "用户JOB状态查询",
"en": "User JOB status query"
},
"grafanaLink": {
"cn": "/grafana/d/job-state/job-state",
"en": "/grafana/d/job-state-en/job-state-en"
},
"desc": {
"cn": [
"该图表右上角时间范围有效,统计的数据是调度器内存中的JOB,可以查询到JOB为所选择的时间段最近15分钟的JOB信息",
"该图表的数据指标收集有最多5分钟的延迟"
],
"en": [
"The time range in the upper right corner of the chart is valid, and the data is the JOB in the scheduler memory, and the JOB information of the latest 15 minutes of the selected time period can be queried",
"The data indicators of this chart are collected with a delay of up to 5 minutes"
]
}
},
{
"id": "2-5",
"title": {
"cn": "作业列表",
"en": "Job list"
},
"grafanaLink": {
"cn": "/grafana/d/job-list/job-list",
"en": "/grafana/d/job-list-en/job-list-en"
},
"desc": {
"cn": [
"该图表右上角时间范围有效,根据job的提交时间进行筛选,在预估数据量很大的情况下通过时间筛选来提高性能",
"该图表的数据指标收集有最多5分钟的延迟",
"当查询不到JOB,dashboard显示“No Data”"
],
"en": [
"The time range in the upper right corner of the chart is valid, and the data is filtered according to the submission time of the job. In the case of a large estimated data volume, performance is improved by filtering by time",
"The data indicators of this chart are collected with a delay of up to 5 minutes",
"When no JOB is queried, the dashboard displays 'No Data'"
]
}
}
]
},
{
"id": "3",
"title": {
"cn": "分析",
"en": "Analysis"
},
"layout": {
"col": 2
},
"list": [
{
"id": "3-1",
"title": {
"cn": "内存指定不合理的JOB",
"en": "Unreasonable memory specified JOB"
},
"grafanaLink": {
"cn": "/grafana/d/memory-unreasonable-job/memory-unreasonable-job",
"en": "/grafana/d/memory-unreasonable-job-en/memory-unreasonable-job-en"
},
"desc": {
"cn": [
"该图表右上角时间范围有效,根据job的提交时间进行筛选,在预估数据量很大的情况下通过时间筛选来提高性能",
"该图表的数据指标收集有最多5分钟的延迟",
"当查询不到JOB,dashboard显示“No Data”"
],
"en": [
"The time range in the upper right corner of the chart is valid, and the data is filtered according to the submission time of the job. In the case of a large estimated data volume, performance is improved by filtering by time",
"The data indicators of this chart are collected with a delay of up to 5 minutes",
"When no JOB is queried, the dashboard displays 'No Data'"
]
}
},
{
"id": "3-2",
"title": {
"cn": "异常退出的JOB",
"en": "Abnormal exit of JOB"
},
"grafanaLink": {
"cn": "/grafana/d/failed-job/failed-job",
"en": "/grafana/d/failed-job-en/failed-job-en"
},
"desc": {
"cn": [
"该图表右上角时间范围有效,根据job的提交时间进行筛选,在预估数据量很大的情况下通过时间筛选来提高性能",
"该图表的数据指标收集有最多5分钟的延迟",
"当查询不到JOB,dashboard显示“No Data”"
],
"en": [
"The time range in the upper right corner of the chart is valid, and the data is filtered according to the submission time of the job. In the case of a large estimated data volume, performance is improved by filtering by time",
"The data indicators of this chart are collected with a delay of up to 5 minutes",
"When no JOB is queried, the dashboard displays 'No Data'"
]
}
},
{
"id": "3-3",
"title": {
"cn": "用户用量统计",
"en": "User usage statistics"
},
"grafanaLink": {
"cn": "/grafana/d/user-usage-statistics/user-usage-statistics",
"en": "/grafana/d/user-usage-statistics-en/user-usage-statistics-en"
},
"desc": {
"cn": [
"该图表右上角时间范围有效,根据job的完成时间进行筛选",
"该图表的数据指标每天凌晨2点计算一次",
"当查询不到JOB,dashboard显示“No Data”"
],
"en": [
"The time range in the upper right corner of the chart is valid, and the data is filtered according to the completion time of the job",
"The data indicators of this chart are calculated at 2:00 am every day",
"When no JOB is queried, the dashboard displays 'No Data'"
]
}
},
{
"id": "3-4",
"title": {
"cn": "CPU指定不合理的JOB",
"en": "CPU specified unreasonable JOB"
},
"grafanaLink": {
"cn": "/grafana/d/cpu-unreasonable-job/cpu-unreasonable-job",
"en": "/grafana/d/cpu-unreasonable-job-en/cpu-unreasonable-job-en"
},
"desc": {
"cn": [
"该图表右上角时间范围有效,根据job的提交时间进行筛选,在预估数据量很大的情况下通过时间筛选来提高性能",
"该图表的数据指标收集有最多5分钟的延迟",
"当查询不到JOB,dashboard显示“No Data”"
],
"en": [
"The time range in the upper right corner of the chart is valid, and the data is filtered according to the submission time of the job. In the case of a large estimated data volume, performance is improved by filtering by time",
"The data indicators of this chart are collected with a delay of up to 5 minutes",
"When no JOB is queried, the dashboard displays 'No Data'"
]
}
},
{
"id": "3-5",
"title": {
"cn": "集群等待中的JOB平均等待时长",
"en": "Average waiting time for JOBs waiting in the cluster"
},
"grafanaLink": {
"cn": "/grafana/d/cluster-pending-job-wait-time/cluster-pending-job-wait-time",
"en": "/grafana/d/cluster-pending-job-wait-time-en/cluster-pending-job-wait-time-en"
},
"desc": {
"cn": [
"该图表右上角时间范围有效,根据选择的时间展示相应时间段的折线图",
"该图表的数据指标收集有最多1分钟的延迟"
],
"en": [
"The time range in the upper right corner of the chart is valid, and the line chart of the corresponding time period is displayed according to the selected time",
"The data indicators of this chart are collected with a delay of up to 1 minute"
]
}
},
{
"id": "3-6",
"title": {
"cn": "分区等待中的JOB平均等待时长",
"en": "Average waiting time of JOBs waiting for partitions"
},
"grafanaLink": {
"cn": "/grafana/d/partition-pending-job-wait-time/partition-pending-job-wait-time",
"en": "/grafana/d/partition-pending-job-wait-time-en/partition-pending-job-wait-time-en"
},
"desc": {
"cn": [
"该图表右上角时间范围有效,根据选择的时间展示相应时间段的折线图",
"该图表的数据指标收集有最多1分钟的延迟"
],
"en": [
"The time range in the upper right corner of the chart is valid, and the line chart of the corresponding time period is displayed according to the selected time",
"The data indicators of this chart are collected with a delay of up to 1 minute"
]
}
},
{
"id": "3-7",
"title": {
"cn": "分区中已完成的JOB数量",
"en": "Number of completed jobs in the partition"
},
"grafanaLink": {
"cn": "/grafana/d/partition-completed-job/partition-completed-job",
"en": "/grafana/d/partition-completed-job-en/partition-completed-job-en"
},
"desc": {
"cn": [
"该图表右上角时间范围有效,根据选择的时间展示相应时间段的折线图,统计的数据是各个时刻调度器内存中已完成的JOB数量,即各个时刻最近15分钟已完成的JOB数量",
"该图表的数据指标收集有最多5分钟的延迟",
"当查询不到JOB,dashboard显示“No Data”"
]
}
}
]
}
]
}
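  2. If the cluster analysis dashboards were modified before the upgrade: adjust the cluster-monitor-panel.json file as needed. For the adjustment procedure, see 自定义集群分析grafana面板.md (Customizing the cluster analysis Grafana panels).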
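When replacing or editing cluster-monitor-panel.json, it can help to validate the new content and keep a copy of the current file. The sketch below is only an illustration: `replace_panel_json` is a hypothetical helper, the `.bak` suffix is an arbitrary choice, and it assumes `python3` is installed on the node for the JSON syntax check.

```shell
#!/usr/bin/env bash
# Sketch: validate the new panel JSON, back up the current file, then replace it.
# replace_panel_json is a hypothetical helper, not part of the FCP tooling.

replace_panel_json() {
    local new_file="$1"   # local file holding the replacement content (assumed)
    local target="$2"     # e.g. .../custom-data/cluster-monitor-panel.json

    # Refuse to install content that does not parse as JSON.
    if ! python3 -m json.tool "$new_file" >/dev/null 2>&1; then
        echo "invalid JSON: $new_file" >&2
        return 1
    fi
    # Keep a copy of the current file so the change can be rolled back.
    if [ -f "$target" ]; then
        cp -p "$target" "${target}.bak" || return 2
    fi
    cp "$new_file" "$target"
}
```

A call might look like `replace_panel_json ./new-panel.json /fastone-services/fastone/ui/assets/custom-data/cluster-monitor-panel.json`, where `./new-panel.json` is a placeholder. Writing to the target path may require elevated privileges.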

FCP-SUITE

FCP-SUITE is upgraded the same way as FCP-CORE. Repeat the steps above on both the core node and the monitor node.
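Because the same extract-and-upgrade commands are repeated on each node, they can be wrapped in one function. This is a rough sketch under stated assumptions: `run_upgrade` and the `SUDO` override variable are hypothetical names introduced here, and the package path is a placeholder. The directory ymir-specs/upgrade/2405upgrade2408 and the upgrade.sh script come from the steps above.

```shell
#!/usr/bin/env bash
# Sketch: the documented extract-and-upgrade commands wrapped in one function
# so they can be repeated unchanged on each node.
# run_upgrade and the SUDO variable are hypothetical, for illustration only.

run_upgrade() {
    local package="$1"             # path to the upgrade package (placeholder)
    local sudo_cmd="${SUDO-sudo}"  # set SUDO="" when already running as root
    local upgrade_dir="ymir-specs/upgrade/2405upgrade2408"

    # Extract into the current directory, as in the documented step.
    $sudo_cmd tar xf "$package" -C ./ || return 1

    if [ ! -f "${upgrade_dir}/upgrade.sh" ]; then
        echo "upgrade.sh not found under ${upgrade_dir}" >&2
        return 2
    fi
    # Run in a subshell so the caller's working directory is left unchanged.
    ( cd "$upgrade_dir" && $sudo_cmd bash upgrade.sh )
}
```

On each node this would be invoked from the directory that holds the package, for example `run_upgrade ./fastone-fcp-xxx.tgz`. The Grafana panel configuration step is not covered by this function.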