Develop Task Templates

Last updated: 2026-04-02

This document explains how to build task templates for your own workloads on top of distributed training jobs.

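Before orchestrating a workflow, develop each task on its own: submit and verify every task in the workflow as a distributed training job. Once a task runs successfully, turn it into a task template. The template can then be reused across workflows and kept as an AI asset.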

Task Template Development

A workflow CustomTask corresponds to a Baige distributed training job, so CustomTask task templates support exactly the same parameters as distributed training jobs.

Run the Task as a Distributed Training Job

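After you finish the task code, open the Baige console, click [Distributed Training] in the left navigation, then click [Create Job] and fill in the form to verify the task. For detailed steps, see Create a Training Job.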

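After the task runs successfully, organize its parameters into a basic workflow task template: express each form field you filled in when creating the job as an API parameter. You can view the job's parameters on its details page.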

The templates below use a data processing scenario as the running example.

For a template with the complete set of task parameters, see Task Template Examples / Distributed Training Task Template.

YAML

version: v1
kind: PipelineTemplate
taskTemplates:
  - name: custom-task-demo
    type: CustomTask
    spec:
      queue: aihcq-h1plvpzb5gh0
      jobType: PyTorchJob
      command: sleep 5s
      priority: normal
      jobSpec:
        image: registry.baidubce.com/aihc-aiak/aiak-megatron:ubuntu20.04-cu11.8-torch1.14.0-py38_v1.2.7.12_release
        replicas: 1
        resources: []
        envs:
          - name: SOURCE_DATA_DIR
            value: bos://my-bucket/datasets/source
          - name: TARGET_DATA_DIR
            value: bos://my-bucket/datasets/target
      datasources:
        - type: pfs
          name: pfs-pxE6jz
          sourcePath: /
          mountPath: /mnt/cluster
        - type: bos
          name: ""
          sourcePath: bos://my-bucket/datasets
          mountPath: /mnt/bos/data
tasks:
  - name: task_a
    taskTemplateName: custom-task-demo
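Each datasources entry mounts a storage location (sourcePath) into the container at mountPath, so files the task writes under /mnt/bos/data end up under bos://my-bucket/datasets. When deciding where task code should read and write, the Python sketch below can help check which storage location a container path maps to. resolve_storage_path is a hypothetical helper, not part of any Baige SDK, and it assumes each mount is a plain prefix mapping from mountPath to sourcePath.

```python
from typing import Optional, Tuple

# Datasources copied from the template above.
DATASOURCES = [
    {"type": "pfs", "name": "pfs-pxE6jz", "sourcePath": "/", "mountPath": "/mnt/cluster"},
    {"type": "bos", "name": "", "sourcePath": "bos://my-bucket/datasets",
     "mountPath": "/mnt/bos/data"},
]


def _norm(path: str) -> str:
    """Drop trailing slashes, but keep the root "/" as-is."""
    return path.rstrip("/") or "/"


def resolve_storage_path(container_path: str,
                         datasources: list) -> Optional[Tuple[str, str, str]]:
    """Return (type, name, storage_path) for a path inside the container.

    Assumption: every mount is a plain prefix mapping mountPath -> sourcePath.
    When mounts are nested, the longest matching mountPath wins.
    Returns None if the path is not under any mountPath.
    """
    path = _norm(container_path)
    best = None
    for ds in datasources:
        mount = _norm(ds["mountPath"])
        if mount == "/" or path == mount or path.startswith(mount + "/"):
            if best is None or len(mount) > len(_norm(best["mountPath"])):
                best = ds
    if best is None:
        return None
    mount = _norm(best["mountPath"])
    # Path relative to the mount point, without a leading slash.
    rel = path.lstrip("/") if mount == "/" else path[len(mount):].lstrip("/")
    source = best["sourcePath"].rstrip("/")
    storage_path = f"{source}/{rel}" if rel else best["sourcePath"]
    return best["type"], best["name"], storage_path
```

For example, /mnt/bos/data/source/a.json resolves to bos://my-bucket/datasets/source/a.json, while /mnt/cluster/models/x resolves to /models/x on pfs-pxE6jz.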

Design Task Template Parameters

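Expose the task parameters that should be supplied from outside as inputs. Recommended design principles:

If different tasks need different amounts of resources, extract the resource parameters as inputs.
Pass all parameters that control the task's logic through envs.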
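With logic-control parameters passed through envs, the task code only has to read them at startup. A minimal sketch, assuming the task's entry point is a Python script; load_config and TaskConfig are hypothetical names, while SOURCE_DATA_DIR and TARGET_DATA_DIR are the environment variables this document's templates set.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Environment variables the templates in this document set via jobSpec.envs.
REQUIRED_ENVS = ("SOURCE_DATA_DIR", "TARGET_DATA_DIR")


@dataclass(frozen=True)
class TaskConfig:
    source_data_dir: str
    target_data_dir: str


def load_config(environ: Mapping[str, str] = os.environ) -> TaskConfig:
    """Read the task's control parameters from environment variables.

    Raises RuntimeError listing every required variable that is missing or
    empty, so a mis-wired template fails at startup with a clear message.
    """
    missing = [name for name in REQUIRED_ENVS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return TaskConfig(
        source_data_dir=environ["SOURCE_DATA_DIR"],
        target_data_dir=environ["TARGET_DATA_DIR"],
    )
```

Calling load_config() at the top of the entry script means a template that forgets an env fails immediately, rather than partway through processing.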

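Following these principles, the earlier example exposes the data directories defined in the environment variables (SOURCE_DATA_DIR and TARGET_DATA_DIR), the run command (command), and the BOS mount path as parameters that can be passed into the template: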

YAML

version: v1
kind: PipelineTemplate
taskTemplates:
  - name: custom-task-demo
    type: CustomTask
    inputs:
      - name: command # parameter name
        type: string  # parameter type
        hint: Command to run  # parameter description
      - name: source_data_dir # parameter name
        type: string # parameter type
        hint: Source data directory # parameter description
      - name: target_data_dir # parameter name
        type: string # parameter type
        hint: Target data directory # parameter description
      - name: bos_dir # parameter name
        type: string # parameter type
        hint: BOS storage path # parameter description
    spec:
      queue: aihcq-h1plvpzb5gh0
      jobType: PyTorchJob
      command: '{{inputs.parameters.command}}'
      jobSpec:
        replicas: 1
        image: registry.baidubce.com/aihc-aiak/aiak-megatron:ubuntu20.04-cu11.8-torch1.14.0-py38_v1.2.7.12_release
        resources: []
        envs:
          - name: SOURCE_DATA_DIR
            value: '{{inputs.parameters.source_data_dir}}'
          - name: TARGET_DATA_DIR
            value: '{{inputs.parameters.target_data_dir}}'
      labels: []
      datasources:
        - type: bos
          name: ""
          sourcePath: '{{inputs.parameters.bos_dir}}'
          mountPath: /mnt/bos/data
tasks:
  - name: task_a
    taskTemplateName: custom-task-demo
    inputs:
      - name: command
        value: echo "This is a defined command."
      - name: source_data_dir
        value: bos://my-bucket/datasets/source
      - name: target_data_dir
        value: bos://my-bucket/datasets/target
      - name: bos_dir
        value: bos://my-bucket/datasets
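Placeholders of the form {{inputs.parameters.NAME}} are filled from the task's inputs list. The workflow engine performs this substitution on the server, and its exact rules are not described here. The sketch below is a hypothetical local pre-check, assuming the YAML has already been loaded into Python dicts (for example with PyYAML's yaml.safe_load): it reports placeholders that are not declared as inputs and declared inputs that a task does not supply, and approximates the substitution with a regular expression. check_task, render, and referenced_params are illustrative names, not a Baige API.

```python
import re

# Matches {{inputs.parameters.NAME}}; whitespace inside the braces is tolerated.
PLACEHOLDER = re.compile(r"\{\{\s*inputs\.parameters\.([A-Za-z0-9_]+)\s*\}\}")


def referenced_params(node) -> set:
    """Collect every parameter name referenced anywhere in a nested spec."""
    if isinstance(node, str):
        return set(PLACEHOLDER.findall(node))
    if isinstance(node, dict):
        return set().union(*(referenced_params(v) for v in node.values()))
    if isinstance(node, list):
        return set().union(*(referenced_params(v) for v in node))
    return set()


def check_task(template: dict, task: dict) -> list:
    """List mismatches between a template's inputs and one task's inputs."""
    declared = {p["name"] for p in template.get("inputs", [])}
    provided = {p["name"] for p in task.get("inputs", [])}
    used = referenced_params(template.get("spec", {}))
    problems = [f"referenced but not declared: {n}" for n in sorted(used - declared)]
    problems += [f"declared but not provided: {n}" for n in sorted(declared - provided)]
    problems += [f"provided but not declared: {n}" for n in sorted(provided - declared)]
    return problems


def render(node, values: dict):
    """Return a copy of node with every placeholder replaced by its value.

    Raises KeyError if a referenced parameter has no value.
    """
    if isinstance(node, str):
        return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), node)
    if isinstance(node, dict):
        return {k: render(v, values) for k, v in node.items()}
    if isinstance(node, list):
        return [render(v, values) for v in node]
    return node


# Abridged from the template above, as yaml.safe_load would return it.
TEMPLATE = {
    "name": "custom-task-demo",
    "inputs": [{"name": "command", "type": "string"},
               {"name": "bos_dir", "type": "string"}],
    "spec": {
        "command": "{{inputs.parameters.command}}",
        "datasources": [{"type": "bos", "name": "",
                         "sourcePath": "{{inputs.parameters.bos_dir}}",
                         "mountPath": "/mnt/bos/data"}],
    },
}
TASK_A = {
    "name": "task_a",
    "taskTemplateName": "custom-task-demo",
    "inputs": [{"name": "command", "value": 'echo "This is a defined command."'},
               {"name": "bos_dir", "value": "bos://my-bucket/datasets"}],
}
```

Here check_task(TEMPLATE, TASK_A) returns an empty list, and rendering TEMPLATE["spec"] with task_a's values yields a datasource whose sourcePath is bos://my-bucket/datasets.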

Use the Task Template in a Workflow

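In the following example, task_a prints "This is job_a." and task_b prints "This is job_b."; task_b lists task_a under dependencies, so it runs after task_a.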

YAML

version: v1
kind: PipelineTemplate
taskTemplates:
  - name: custom-task-demo
    type: CustomTask
    inputs:
      - name: command # parameter name
        type: string  # parameter type
        hint: Command to run  # parameter description
      - name: source_data_dir # parameter name
        type: string # parameter type
        hint: Source data directory # parameter description
      - name: target_data_dir # parameter name
        type: string # parameter type
        hint: Target data directory # parameter description
      - name: bos_dir # parameter name
        type: string # parameter type
        hint: BOS storage path # parameter description
    spec:
      queue: aihcq-h1plvpzb5gh0
      jobType: PyTorchJob
      command: '{{inputs.parameters.command}}'
      jobSpec:
        replicas: 1
        image: registry.baidubce.com/aihc-aiak/aiak-megatron:ubuntu20.04-cu11.8-torch1.14.0-py38_v1.2.7.12_release
        resources: []
        envs:
          - name: SOURCE_DATA_DIR
            value: '{{inputs.parameters.source_data_dir}}'
          - name: TARGET_DATA_DIR
            value: '{{inputs.parameters.target_data_dir}}'
      labels: []
      datasources:
        - type: bos
          name: ""
          sourcePath: '{{inputs.parameters.bos_dir}}'
          mountPath: /mnt/bos/data
tasks:
  - name: task_a
    taskTemplateName: custom-task-demo
    inputs:
      - name: command
        value: echo "This is job_a."
      - name: source_data_dir
        value: bos://my-bucket/datasets/source/a
      - name: target_data_dir
        value: bos://my-bucket/datasets/target/a
      - name: bos_dir
        value: bos://my-bucket/datasets/a
  - name: task_b
    taskTemplateName: custom-task-demo
    inputs:
      - name: command
        value: echo "This is job_b."
      - name: source_data_dir
        value: bos://my-bucket/datasets/source/b
      - name: target_data_dir
        value: bos://my-bucket/datasets/target/b
      - name: bos_dir
        value: bos://my-bucket/datasets/b
    dependencies:
      - task_a
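Declaring dependencies turns the tasks list into a directed acyclic graph: here task_b waits for task_a. To see the order a larger workflow implies, or to catch a misspelled dependency or a cycle before submission, the graph can be layered with Kahn's algorithm. A minimal Python sketch; execution_order is a hypothetical helper, and whether tasks in the same stage actually run concurrently is up to the workflow engine, which this document does not specify.

```python
def execution_order(tasks: list) -> list:
    """Group tasks into stages with Kahn's algorithm.

    Each stage holds the tasks whose dependencies all sit in earlier stages.
    Raises ValueError on an unknown dependency or a dependency cycle.
    """
    names = {t["name"] for t in tasks}
    indegree = {t["name"]: 0 for t in tasks}   # unmet dependencies per task
    children = {t["name"]: [] for t in tasks}  # tasks waiting on each task
    for t in tasks:
        for dep in t.get("dependencies", []):
            if dep not in names:
                raise ValueError(f"{t['name']} depends on unknown task {dep}")
            indegree[t["name"]] += 1
            children[dep].append(t["name"])

    stage = sorted(n for n, d in indegree.items() if d == 0)
    stages, placed = [], 0
    while stage:
        stages.append(stage)
        placed += len(stage)
        ready = []
        for name in stage:
            for child in children[name]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        stage = sorted(ready)
    if placed != len(tasks):
        raise ValueError("dependency cycle detected")
    return stages
```

For the workflow above, execution_order returns [["task_a"], ["task_b"]].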

That covers developing a simple task template. For more advanced task parameters, see the task template examples that follow.

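Task Template Examples

The following general-purpose task templates are provided for reference. In practice, add or remove parameters as your task requires.

General

General Task Template

Parameters:

Parameter	Description
command	Command the task runs
source_data_dir	Directory of the source dataset
target_data_dir	Directory where the dataset is saved
bos_dir	BOS directory to mount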

YAML

version: v1
kind: PipelineTemplate
taskTemplates:
  - name: custom-task-demo
    type: CustomTask
    inputs:
      - name: command
        type: string
        hint: Command to run
      - name: source_data_dir
        type: string
        hint: Source data directory
      - name: target_data_dir
        type: string
        hint: Target data directory
      - name: bos_dir
        type: string
        hint: BOS directory to mount
    spec:
      queue: aihcq-h1plvpzb5gh0
      jobType: PyTorchJob
      command: '{{inputs.parameters.command}}'
      jobSpec:
        replicas: 1
        image: >-
          registry.baidubce.com/aihc-aiak/aiak-megatron:ubuntu20.04-cu11.8-torch1.14.0-py38_v1.2.7.12_release
        resources: []
        envs:
          - name: SOURCE_DATA_DIR
            value: '{{inputs.parameters.source_data_dir}}'
          - name: TARGET_DATA_DIR
            value: '{{inputs.parameters.target_data_dir}}'
      labels: []
      datasources:
        - type: bos
          name: ''
          sourcePath: '{{inputs.parameters.bos_dir}}'
          mountPath: /mnt/bos
tasks:
  - name: task_a
    taskTemplateName: custom-task-demo
    inputs:
      - name: command
        value: echo "This is job_a."
      - name: source_data_dir
        value: bos://my-bucket/datasets/source
      - name: target_data_dir
        value: bos://my-bucket/datasets/target
      - name: bos_dir
        value: bos://my-bucket/datasets

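Distributed Training Task Template

The following template contains the complete set of distributed training job parameters. For parameter descriptions, see the distributed training job OpenAPI reference.

Developers can build their own training, data processing, and other business task templates on top of this complete parameter template.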

YAML

version: v1
kind: PipelineTemplate
taskTemplates:
  - name: custom-task-template
    type: CustomTask
    spec:
      queue: aihcq-xxxxx
      jobType: PyTorchJob
      command: sleep 30s
      priority: normal
      enableBccl: false
      faultTolerance: true
      faultToleranceArgs: --enable-replace=true --enable-hang-detection=true
        --hang-detection-log-timeout-minutes=7
        --hang-detection-startup-toleration-minutes=15
        --hang-detection-stack-timeout-minutes=3 --max-num-of-unconditional-retry=2
        --custom-log-patterns=timeout1 --custom-log-patterns=timeout2
      retentionPeriod: 1d
      jobSpec:
        image: registry.baidubce.com/aihc-aiak/aiak-megatron:ubuntu20.04-cu11.8-torch1.14.0-py38_v1.2.7.12_release
        imageConfig:
          username: your-registry-username
          password: your-registry-password
        replicas: 2
        resources:
          - name: baidu.com/a800_80g_cgpu
            quantity: 8
          - name: cpu
            quantity: 96
          - name: memory
            quantity: 512
          - name: sharedMemory
            quantity: 64
        envs:
          - name: NCCL_DEBUG
            value: INFO
          - name: NCCL_IB_DISABLE
            value: "0"
          - name: CUDA_VISIBLE_DEVICES
            value: 0,1,2,3,4,5,6,7
        enableRDMA: true
        hostNetwork: false
      labels:
        - key: project
          value: llm-training
        - key: team
          value: ai-platform
      datasources:
        - type: pfs
          name: pfs-pxE6jz
          sourcePath: /
          mountPath: /mnt/cluster
          options:
            readOnly: false
        - type: hostPath
          name: host-data
          sourcePath: /data/shared
          mountPath: /mnt/host-data
          options:
            readOnly: true
        - type: bos
          name: ""
          sourcePath: bos://my-bucket/datasets/
          mountPath: /mnt/bos-data
          options:
            readOnly: true
        - type: cfs
          name: cfs-instance-id
          sourcePath: /
          mountPath: /mnt/cfs-data
          options:
            readOnly: false
        - type: rapidfs
          name: rapidfs-instance-id
          sourcePath: /
          mountPath: /mnt/rapidfs-data
          options:
            readOnly: false
        - type: dataset
          name: dataset-id
          sourcePath: /
          mountPath: /mnt/dataset
          options:
            readOnly: true
      tensorboardConfig:
        enable: true
        logPath: /mnt/cluster/tensorboard-logs
      alertConfig:
        instanceId: your-cluster-monitor-instance-id
        alertItems:
          - jobRunning
          - jobFT
          - nodeFT
          - jobFailed
          - jobSucceed
          - jobHang
        for: 0m
        notifyRuleId: notify-xxxxxxxx
tasks:
  - name: job-3
    taskTemplateName: custom-task-template
  - name: job-2
    taskTemplateName: custom-task-template
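replicas and resources together determine how much of the queue's capacity a job requests. The Python sketch below estimates the job-wide request before submission. It assumes, without confirmation from this document (check the OpenAPI reference), that each resources entry applies per replica; under that assumption the template above asks for 16 baidu.com/a800_80g_cgpu across its 2 replicas. total_resources is a hypothetical helper, and units are left exactly as the template states them.

```python
def total_resources(job_spec: dict) -> dict:
    """Sum each resource over all replicas.

    Assumption: every jobSpec.resources entry is a per-replica request, so the
    job-wide total is quantity * replicas. Units are not converted.
    """
    replicas = int(job_spec.get("replicas", 1))
    totals = {}
    for res in job_spec.get("resources", []):
        name = res["name"]
        totals[name] = totals.get(name, 0) + int(res["quantity"]) * replicas
    return totals


# jobSpec values from the template above.
JOB_SPEC = {
    "replicas": 2,
    "resources": [
        {"name": "baidu.com/a800_80g_cgpu", "quantity": 8},
        {"name": "cpu", "quantity": 96},
        {"name": "memory", "quantity": 512},
        {"name": "sharedMemory", "quantity": 64},
    ],
}
```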

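In practice, you can combine fixed parameters with custom input parameters to turn frequently used data processing, training, and evaluation tasks into task templates that are reused across workflows.

Dataset Download

Download a Dataset from ModelScope

Parameters:

Parameter	Description
queue_id	Queue ID, e.g. aihcq-h1plvp
dataset_name	Dataset name on ModelScope, e.g. liucong/Chinese-DeepSeek-R1-Distill-data-110k
target_data_dir	Directory where the dataset is saved, e.g. bos://my-bucket/datasets

YAML template: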

YAML

version: v1
kind: PipelineTemplate

taskTemplates:
  - name: template-dataset-download-modelscope
    type: CustomTask
    inputs:
      - name: queue_id
        type: string
        hint: Queue ID
      - name: dataset_name
        type: string
        hint: Dataset name
      - name: target_data_dir
        type: string
        hint: BOS directory for storing the dataset
    spec:
      queue: '{{inputs.parameters.queue_id}}'
      jobType: PyTorchJob
      command: |
        #!/bin/sh

        # Check whether modelscope is installed
        if ! command -v modelscope >/dev/null 2>&1; then
            echo "modelscope is not installed, installing..."
            pip install --user modelscope
            export PATH="$HOME/.local/bin:$PATH"
        fi

        # Download the dataset with the modelscope CLI
        echo "Downloading dataset $DATASET_NAME..."
        modelscope download --dataset "$DATASET_NAME" --revision master --local_dir /mnt/bos/data

        if [ $? -eq 0 ]; then
            echo "Dataset downloaded to: /mnt/bos/data"
        else
            echo "Download failed. Check the network, permissions, and whether the dataset exists."
            exit 1
        fi
      jobSpec:
        image: registry.baidubce.com/aihcp-public/pytorch:2.7.0-cu12.8.61-py3.12-ubuntu24.04
        replicas: 1
        envs:
          - name: DATASET_NAME
            value: '{{inputs.parameters.dataset_name}}'
      datasources:
        - type: bos
          name: ""
          sourcePath: '{{inputs.parameters.target_data_dir}}'
          mountPath: /mnt/bos/data

tasks:
  - name: job-1
    taskTemplateName: template-dataset-download-modelscope
    inputs:
      - name: queue_id
        value: aihcq-h1plvp
      - name: dataset_name
        value: liucong/Chinese-DeepSeek-R1-Distill-data-110k
      - name: target_data_dir
        value: bos://my-bucket/datasets

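Download a Dataset from HuggingFace

Downloading from HuggingFace requires that your cluster can reach HuggingFace; you need to ensure this network access yourself.

Parameters:

Parameter	Description
queue_id	Queue ID, e.g. aihcq-h1plvp
dataset_name	HuggingFace dataset ID (repo_id), e.g. liucong/Chinese-DeepSeek-R1-Distill-data-110k
target_data_dir	BOS directory under which the dataset is saved, in a subdirectory named after dataset_name, e.g. bos://my-bucket/datasets

YAML template: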

YAML

version: v1
kind: PipelineTemplate

taskTemplates:
  - name: template-dataset-download-huggingface
    type: CustomTask
    inputs:
      - name: queue_id
        type: string
        hint: Queue ID
      - name: dataset_name
        type: string
        hint: Dataset name
      - name: target_data_dir
        type: string
        hint: BOS directory for storing the dataset
    spec:
      queue: '{{inputs.parameters.queue_id}}'
      jobType: PyTorchJob
      command: |
        #!/bin/sh

        # Build the local target path
        LOCAL_DIR="/mnt/bos/data/$DATASET_NAME"
        mkdir -p "$LOCAL_DIR"

        # Skip if a previous run already completed (simple marker-file check)
        if [ -f "$LOCAL_DIR/.hf_download_complete" ]; then
            echo "Dataset appears to be downloaded already (found the .hf_download_complete marker), skipping."
            exit 0
        fi

        # Install huggingface_hub if it is missing (POSIX redirection, works in sh)
        if ! python3 -c "import huggingface_hub" >/dev/null 2>&1; then
            echo "Installing huggingface_hub..."
            pip install --quiet huggingface_hub
        fi

        # Build the Python download script
        PY_SCRIPT=$(cat <<EOF
        from huggingface_hub import snapshot_download
        import os

        local_dir = os.environ['LOCAL_DIR']
        repo_id = os.environ['DATASET_NAME']

        print(f"Downloading dataset from Hugging Face: {repo_id}")
        snapshot_download(
            repo_id=repo_id,
            local_dir=local_dir,
            repo_type="dataset",
            max_workers=8
        )
        print("Download complete.")
        EOF
        )

        # Export environment variables for the Python script
        export LOCAL_DIR="$LOCAL_DIR"
        export DATASET_NAME="$DATASET_NAME"

        # Run the download
        echo "Starting download from the Hugging Face Hub..."
        python3 -c "$PY_SCRIPT"

        # On success, create the marker file
        if [ $? -eq 0 ]; then
            touch "$LOCAL_DIR/.hf_download_complete"
            echo "Dataset saved to: $LOCAL_DIR"
        else
            echo "Download failed. Check the dataset ID, network, or access permissions."
            exit 1
        fi
      jobSpec:
        image: registry.baidubce.com/aihcp-public/pytorch:2.7.0-cu12.8.61-py3.12-ubuntu24.04
        replicas: 1
        envs:
          - name: DATASET_NAME
            value: '{{inputs.parameters.dataset_name}}'
      datasources:
        - type: bos
          name: ""
          sourcePath: '{{inputs.parameters.target_data_dir}}'
          mountPath: /mnt/bos/data

tasks:
  - name: job-1
    taskTemplateName: template-dataset-download-huggingface
    inputs:
      - name: queue_id
        value: aihcq-h1plvp
      - name: dataset_name
        value: liucong/Chinese-DeepSeek-R1-Distill-data-110k
      - name: target_data_dir
        value: bos://my-bucket/datasets
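The template makes the download safe to re-run: it skips the work if a .hf_download_complete marker exists and writes the marker only after a successful download. If you prefer to call snapshot_download from a Python entry script instead of through the shell, the same pattern can be sketched as follows. download_once is a hypothetical helper, and download is any function you supply that fetches into the given directory.

```python
from pathlib import Path
from typing import Callable

MARKER = ".hf_download_complete"  # same marker name as the shell template


def download_once(local_dir: str, download: Callable[[str], None]) -> bool:
    """Run download(local_dir) unless a completion marker is already present.

    Returns True if the download ran and False if it was skipped. The marker
    is written only after download() returns without raising, so an
    interrupted or failed run is retried on the next attempt.
    """
    target = Path(local_dir)
    marker = target / MARKER
    if marker.exists():
        return False
    target.mkdir(parents=True, exist_ok=True)
    download(str(target))
    marker.touch()
    return True
```

For the real download, download could be lambda d: snapshot_download(repo_id=repo_id, local_dir=d, repo_type="dataset"), which is the same call the template makes.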

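Data Transfer

Pull Data from BOS to PFS

Use this template to load cold data from BOS onto PFS on demand for training, data processing, and similar tasks.

Parameters:

Parameter	Description
queue_id	Queue ID, e.g. aihcq-h1plvp
bos_path	Source BOS path, e.g. bos://my-bucket/datasets/
bos_ak	Access key (AK) for BOS
bos_sk	Secret key (SK) for BOS
pfs_path	Destination path on PFS (mounted into the container), e.g. /datasets/my-dataset
pfs_id	PFS instance ID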

YAML

version: v1
kind: PipelineTemplate
taskTemplates:
  - name: bos-to-pfs
    type: CustomTask
    inputs:
      - name: queue_id
        type: string
        hint: Queue ID
      - name: bos_path
        type: string
        hint: BOS path of the dataset
      - name: bos_ak
        type: string
        hint: Access key (AK) for the BOS bucket
      - name: bos_sk
        type: string
        hint: Secret key (SK) for the BOS bucket
      - name: pfs_path
        type: string
        hint: Destination path on PFS
      - name: pfs_id
        type: string
        hint: PFS instance ID
    spec:
      queue: '{{inputs.parameters.queue_id}}'
      jobType: PyTorchJob
      command: |
        # Note: this archive is the macOS build of bcecmd. On this Ubuntu image you may
        # need the Linux build instead; take its URL from the official bcecmd download page.
        ZIP_URL="https://doc.bce.baidu.com/bos-optimization/mac-bcecmd-0.5.10.zip"
        ZIP_FILE="mac-bcecmd-0.5.10.zip"
        EXTRACT_DIR="mac-bcecmd-0.5.10"
        echo "Downloading the bcecmd tool..."
        curl -LO "$ZIP_URL"
        if [ $? -ne 0 ]; then
            echo "Download failed. Check the network or the URL."
            exit 1
        fi
        echo "Extracting..."
        unzip -o "$ZIP_FILE"
        if [ $? -ne 0 ]; then
            echo "Extraction failed."
            exit 1
        fi
        cd "$EXTRACT_DIR" || { echo "Cannot enter directory $EXTRACT_DIR"; exit 1; }
        echo "Creating the credentials file..."
        mkdir -p ~/.go-bcecli
        cat > ~/.go-bcecli/credentials <<EOF
        [Defaults]
        Ak = "$BOS_AK"
        Sk = "$BOS_SK"
        EOF
        chmod 600 ~/.go-bcecli/credentials
        echo "bcecmd installed and configured, starting data download"
        ./bcecmd bos sync {{inputs.parameters.bos_path}} /mnt/cluster/dataset || { echo "Sync failed."; exit 1; }
        echo "Dataset downloaded and saved to PFS path {{inputs.parameters.pfs_path}}"
      jobSpec:
        image: registry.baidubce.com/inference/aibox-ubuntu:v2.0-22.04
        replicas: 1
        envs:
          - name: BOS_AK
            value: '{{inputs.parameters.bos_ak}}'
          - name: BOS_SK
            value: '{{inputs.parameters.bos_sk}}'
      datasources:
        - type: pfs
          name: '{{inputs.parameters.pfs_id}}'
          sourcePath: '{{inputs.parameters.pfs_path}}'
          mountPath: /mnt/cluster/dataset
tasks:
  - name: sync-bos-to-pfs
    taskTemplateName: bos-to-pfs
    inputs:
      - name: queue_id
        value: aihcq-h1plvpzb5gh0
      - name: bos_path
        value: bos://my-bucket/datasets/
      - name: bos_ak
        value: <your AK>
      - name: bos_sk
        value: <your SK>
      - name: pfs_path
        value: /datasets/my-dataset
      - name: pfs_id
        value: pfs-xxxx

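Back Up PFS Data to BOS

Use this template to back up data on PFS to BOS on demand, so that the PFS copy can then be removed to free up storage space.

Parameters:

Parameter	Description
queue_id	Queue ID, e.g. aihcq-h1plvp
bos_path	Destination BOS path, e.g. bos://my-bucket/datasets/
bos_ak	Access key (AK) for BOS
bos_sk	Secret key (SK) for BOS
pfs_path	Path on PFS to back up, e.g. /datasets/my-dataset
pfs_id	PFS instance ID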

YAML

version: v1
kind: PipelineTemplate
taskTemplates:
  - name: pfs-to-bos
    type: CustomTask
    inputs:
      - name: queue_id
        type: string
        hint: Queue ID
      - name: bos_path
        type: string
        hint: BOS path of the dataset
      - name: bos_ak
        type: string
        hint: Access key (AK) for the BOS bucket
      - name: bos_sk
        type: string
        hint: Secret key (SK) for the BOS bucket
      - name: pfs_path
        type: string
        hint: Path on PFS to back up
      - name: pfs_id
        type: string
        hint: PFS instance ID
    spec:
      queue: '{{inputs.parameters.queue_id}}'
      jobType: PyTorchJob
      command: |
        # Note: this archive is the macOS build of bcecmd. On this Ubuntu image you may
        # need the Linux build instead; take its URL from the official bcecmd download page.
        ZIP_URL="https://doc.bce.baidu.com/bos-optimization/mac-bcecmd-0.5.10.zip"
        ZIP_FILE="mac-bcecmd-0.5.10.zip"
        EXTRACT_DIR="mac-bcecmd-0.5.10"
        echo "Downloading the bcecmd tool..."
        curl -LO "$ZIP_URL"
        if [ $? -ne 0 ]; then
            echo "Download failed. Check the network or the URL."
            exit 1
        fi
        echo "Extracting..."
        unzip -o "$ZIP_FILE"
        if [ $? -ne 0 ]; then
            echo "Extraction failed."
            exit 1
        fi
        cd "$EXTRACT_DIR" || { echo "Cannot enter directory $EXTRACT_DIR"; exit 1; }
        echo "Creating the credentials file..."
        mkdir -p ~/.go-bcecli
        cat > ~/.go-bcecli/credentials <<EOF
        [Defaults]
        Ak = "$BOS_AK"
        Sk = "$BOS_SK"
        EOF
        chmod 600 ~/.go-bcecli/credentials
        echo "bcecmd installed and configured, starting data upload"
        ./bcecmd bos sync /mnt/cluster/dataset {{inputs.parameters.bos_path}} || { echo "Sync failed."; exit 1; }
        echo "Dataset uploaded and saved to BOS path {{inputs.parameters.bos_path}}"
      jobSpec:
        image: registry.baidubce.com/inference/aibox-ubuntu:v2.0-22.04
        replicas: 1
        envs:
          - name: BOS_AK
            value: '{{inputs.parameters.bos_ak}}'
          - name: BOS_SK
            value: '{{inputs.parameters.bos_sk}}'
      datasources:
        - type: pfs
          name: '{{inputs.parameters.pfs_id}}'
          sourcePath: '{{inputs.parameters.pfs_path}}'
          mountPath: /mnt/cluster/dataset
tasks:
  - name: sync-pfs-to-bos
    taskTemplateName: pfs-to-bos
    inputs:
      - name: queue_id
        value: aihcq-h1plvpzb5gh0
      - name: bos_path
        value: bos://my-bucket/datasets/
      - name: bos_ak
        value: <your AK>
      - name: bos_sk
        value: <your SK>
      - name: pfs_path
        value: /datasets/my-dataset
      - name: pfs_id
        value: pfs-xxxx

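Clean Up Data on PFS

Delete data on PFS to free up storage space, for example to remove a dataset automatically after training finishes.

Parameters:

Parameter	Description
queue_id	Queue ID
pfs_path	PFS path whose contents are deleted, e.g. /datasets/my-dataset
pfs_id	PFS instance ID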

YAML

version: v1
kind: PipelineTemplate
taskTemplates:
  - name: pfs-remove
    type: CustomTask
    inputs:
      - name: queue_id
        type: string
        hint: Queue ID
      - name: pfs_id
        type: string
        hint: PFS instance ID
      - name: pfs_path
        type: string
        hint: PFS path to delete
    spec:
      queue: '{{inputs.parameters.queue_id}}'
      jobType: PyTorchJob
      command: |
        # Refuse to run against the PFS root, which would wipe the whole file system
        if [ "{{inputs.parameters.pfs_path}}" = "/" ]; then
            echo "Refusing to delete the PFS root directory."
            exit 1
        fi
        # /mnt/cluster/dataset is the mount point of pfs_path: delete its contents,
        # because the mount point itself cannot be removed while it is mounted
        find /mnt/cluster/dataset -mindepth 1 -delete || { echo "Deletion failed."; exit 1; }
        echo "Deleted the data under PFS path {{inputs.parameters.pfs_path}}"
      priority: normal
      jobSpec:
        image: registry.baidubce.com/inference/aibox-ubuntu:v2.0-22.04
        replicas: 1
      datasources:
        - type: pfs
          name: '{{inputs.parameters.pfs_id}}'
          sourcePath: '{{inputs.parameters.pfs_path}}'
          mountPath: /mnt/cluster/dataset
tasks:
  - name: pfs-file-remove
    taskTemplateName: pfs-remove
    inputs:
      - name: queue_id
        value: aihcq-h1plvpzb5gh0
      - name: pfs_path
        value: /datasets/my-dataset
      - name: pfs_id
        value: pfs-xxxx

More templates will be added over time.
