Query Training Job Events
Last updated: 2024-12-27
Description
Retrieves the system events of a training job.
Request Structure
GET /api/v1/aijobs/{jobId}/events
Host:aihc.bj.baidubce.com
Authorization:authorization string
Content-Type: application/json
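Request Headers
No special headers are required beyond the common headers.
Request Parameters
Parameter | Type | Required | Location | Description |
---|---|---|---|---|
resourcePoolId | String | Yes | Query | Unique identifier of the resource pool |
jobId | String | Yes | Path | Training job ID |
jobFramework | String | Yes | Query | Training job framework type; currently only "PyTorchJob" is supported |
startTime | String | No | Query | Start of the time range for which events are returned; defaults to the job creation time |
endTime | String | No | Query | End of the time range for which events are returned; defaults to the current time (now) |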
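The parameters above can be assembled into a request URL as in the following minimal sketch. It only builds the URL and does not send anything. The `https` scheme, the helper name `build_events_url`, and the example IDs are assumptions made for illustration; the format expected for `startTime` and `endTime` is not specified here, so the values are passed through unchanged.

```python
from urllib.parse import quote, urlencode

# Host taken from the request structure above; the https scheme is an assumption.
HOST = "aihc.bj.baidubce.com"


def build_events_url(job_id, resource_pool_id, job_framework="PyTorchJob",
                     start_time=None, end_time=None):
    """Build the URL for GET /api/v1/aijobs/{jobId}/events (hypothetical helper)."""
    # Required query parameters.
    query = {
        "resourcePoolId": resource_pool_id,
        "jobFramework": job_framework,
    }
    # Optional query parameters are only sent when the caller provides them,
    # so the server-side defaults (job creation time, now) apply otherwise.
    if start_time is not None:
        query["startTime"] = start_time
    if end_time is not None:
        query["endTime"] = end_time
    # jobId is a path parameter, so it is percent-encoded into the path.
    path = "/api/v1/aijobs/" + quote(job_id, safe="") + "/events"
    return "https://" + HOST + path + "?" + urlencode(query)


if __name__ == "__main__":
    # "job-123" and "pool-abc" are placeholder IDs, not real resources.
    print(build_events_url("job-123", "pool-abc"))
```

The resulting URL would then be sent with an HTTP client of your choice, together with the `Authorization` and `Content-Type` headers shown in the request structure.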
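Response Headers
No special headers are returned beyond the common headers.
Response Parameters
Parameter | Type | Description |
---|---|---|
events | Array of Event | List of events |
total | Number | Total number of events |
Response Example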
{
  "events": [
    {
      "reason": "JobTerminated",
      "message": "Job has been terminated. Deleting PodGroup",
      "firstTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "lastTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "count": 4,
      "type": "Normal"
    },
    {
      "reason": "SuccessfulDeletePodGroup",
      "message": "Deleted PodGroup: test-api-llama2-7b-4",
      "firstTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "lastTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "count": 4,
      "type": "Normal"
    },
    {
      "reason": "ExitedWithCode",
      "message": "Pod: default.test-api-llama2-7b-4-master-0 exited with code 1",
      "firstTimestamp": "2024-07-15 16:52:41 +0000 UTC",
      "lastTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "count": 2,
      "type": "Normal"
    },
    {
      "reason": "FailedToStartFaultTolerance",
      "message": "Pytorchjob: test-api-llama2-7b-4, failed to start fault tolerance。Reason:check all nodes are healthy。",
      "firstTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "lastTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "count": 1,
      "type": "Warning"
    },
    {
      "reason": "JobFailed",
      "message": "PyTorchJob test-api-llama2-7b-4 is failed because 1 Master replica(s) failed.",
      "firstTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "lastTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "count": 1,
      "type": "Normal"
    },
    {
      "reason": "Error",
      "message": "Error pod test-api-llama2-7b-4-master-0 container pytorch exitCode: 1 terminated message: ",
      "firstTimestamp": "2024-07-15 16:52:41 +0000 UTC",
      "lastTimestamp": "2024-07-15 16:52:41 +0000 UTC",
      "count": 1,
      "type": "Warning"
    },
    {
      "reason": "SuccessfulCreateService",
      "message": "Created service: test-api-llama2-7b-4-master-0",
      "firstTimestamp": "2024-07-15 12:47:04 +0000 UTC",
      "lastTimestamp": "2024-07-15 12:47:04 +0000 UTC",
      "count": 1,
      "type": "Normal"
    },
    {
      "reason": "SuccessfulCreatePod",
      "message": "Created pod: test-api-llama2-7b-4-master-0",
      "firstTimestamp": "2024-07-15 12:47:04 +0000 UTC",
      "lastTimestamp": "2024-07-15 12:47:04 +0000 UTC",
      "count": 1,
      "type": "Normal"
    }
  ],
  "total": 8
}
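As an illustration of consuming this response, the sketch below extracts the `Warning` events and orders them by `firstTimestamp`. It assumes timestamps always follow the `YYYY-MM-DD HH:MM:SS +0000 UTC` shape seen in the example and that every event carries the `type` and `firstTimestamp` fields; neither is guaranteed by the parameter table. The function names and the abbreviated `SAMPLE` payload are hypothetical.

```python
import json
from datetime import datetime


def parse_event_time(ts):
    """Parse a timestamp such as '2024-07-15 16:52:50 +0000 UTC'."""
    # Drop the trailing zone name ("UTC") and rely on the numeric offset.
    without_zone_name = ts.rsplit(" ", 1)[0]
    return datetime.strptime(without_zone_name, "%Y-%m-%d %H:%M:%S %z")


def warnings_by_time(response_text):
    """Return the Warning events from a response body, oldest first."""
    body = json.loads(response_text)
    warnings = [e for e in body["events"] if e["type"] == "Warning"]
    return sorted(warnings, key=lambda e: parse_event_time(e["firstTimestamp"]))


# Abbreviated payload with three events modeled on the example above
# (message, lastTimestamp and count omitted for brevity).
SAMPLE = """
{
  "events": [
    {"reason": "FailedToStartFaultTolerance",
     "firstTimestamp": "2024-07-15 16:52:50 +0000 UTC", "type": "Warning"},
    {"reason": "JobFailed",
     "firstTimestamp": "2024-07-15 16:52:50 +0000 UTC", "type": "Normal"},
    {"reason": "Error",
     "firstTimestamp": "2024-07-15 16:52:41 +0000 UTC", "type": "Warning"}
  ],
  "total": 3
}
"""

if __name__ == "__main__":
    for event in warnings_by_time(SAMPLE):
        print(event["firstTimestamp"], event["reason"])
```

Sorting on the parsed time rather than on the raw string keeps the ordering correct even if the service were to return a non-zero UTC offset.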