Query Training Job Events
Last updated: 2025-05-23
Description
Retrieves the system events of a training job.
Request Structure
Bash
GET /api/v1/aijobs/{jobId}/events
Host: aihc.bj.baidubce.com
Authorization: authorization string
Content-Type: application/json
Request Headers
No special headers other than the common headers.
Request Parameters
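Parameter | Type | Required | Location | Description |
---|---|---|---|---|
resourcePoolId | String | Yes | Query | Unique identifier of the resource pool |
jobId | String | Yes | Path | Training job ID |
jobFramework | String | Yes | Query | Training job framework type. Currently supported: "PyTorchJob" |
startTime | String | No | Query | Start of the time range for job events, as a Unix timestamp. Defaults to the job creation time |
endTime | String | No | Query | End of the time range for job events, as a Unix timestamp. Defaults to the current time |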
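The parameters above can be assembled into a request URL as in the following minimal sketch. The helper name `build_events_url`, the `https` scheme, and the example IDs are assumptions made for illustration; the sketch does not generate the `Authorization` string, and it passes timestamps through unchanged because the table does not state their unit.

```python
from urllib.parse import quote, urlencode


def build_events_url(job_id, resource_pool_id, job_framework="PyTorchJob",
                     start_time=None, end_time=None,
                     host="aihc.bj.baidubce.com"):
    """Build the URL for GET /api/v1/aijobs/{jobId}/events.

    Hypothetical helper: the https scheme is an assumption, and the
    Authorization header must be produced separately.
    """
    # Required query parameters from the table above.
    query = {
        "resourcePoolId": resource_pool_id,
        "jobFramework": job_framework,
    }
    # Optional time range; omitted values fall back to the server defaults
    # (job creation time and the current time).
    if start_time is not None:
        query["startTime"] = str(start_time)
    if end_time is not None:
        query["endTime"] = str(end_time)

    # jobId is a path parameter, so percent-encode it on its own.
    path = f"/api/v1/aijobs/{quote(job_id, safe='')}/events"
    return f"https://{host}{path}?{urlencode(query)}"


# Example IDs are placeholders, not real resources.
url = build_events_url("job-abc", "pool-123")
```

The returned URL can be passed to any HTTP client together with the `Authorization` and `Content-Type` headers shown in the request structure.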
Response Headers
No special headers other than the common headers.
Response Parameters
Parameter | Type | Description |
---|---|---|
events | Array of Event | List of events |
total | Number | Total number of events |
Response Example
JSON
{
  "events": [
    {
      "reason": "JobTerminated",
      "message": "Job has been terminated. Deleting PodGroup",
      "firstTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "lastTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "count": 4,
      "type": "Normal"
    },
    {
      "reason": "SuccessfulDeletePodGroup",
      "message": "Deleted PodGroup: test-api-llama2-7b-4",
      "firstTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "lastTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "count": 4,
      "type": "Normal"
    },
    {
      "reason": "ExitedWithCode",
      "message": "Pod: default.test-api-llama2-7b-4-master-0 exited with code 1",
      "firstTimestamp": "2024-07-15 16:52:41 +0000 UTC",
      "lastTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "count": 2,
      "type": "Normal"
    },
    {
      "reason": "FailedToStartFaultTolerance",
      "message": "Pytorchjob: test-api-llama2-7b-4, failed to start fault tolerance。Reason:check all nodes are healthy。",
      "firstTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "lastTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "count": 1,
      "type": "Warning"
    },
    {
      "reason": "JobFailed",
      "message": "PyTorchJob test-api-llama2-7b-4 is failed because 1 Master replica(s) failed.",
      "firstTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "lastTimestamp": "2024-07-15 16:52:50 +0000 UTC",
      "count": 1,
      "type": "Normal"
    },
    {
      "reason": "Error",
      "message": "Error pod test-api-llama2-7b-4-master-0 container pytorch exitCode: 1 terminated message: ",
      "firstTimestamp": "2024-07-15 16:52:41 +0000 UTC",
      "lastTimestamp": "2024-07-15 16:52:41 +0000 UTC",
      "count": 1,
      "type": "Warning"
    },
    {
      "reason": "SuccessfulCreateService",
      "message": "Created service: test-api-llama2-7b-4-master-0",
      "firstTimestamp": "2024-07-15 12:47:04 +0000 UTC",
      "lastTimestamp": "2024-07-15 12:47:04 +0000 UTC",
      "count": 1,
      "type": "Normal"
    },
    {
      "reason": "SuccessfulCreatePod",
      "message": "Created pod: test-api-llama2-7b-4-master-0",
      "firstTimestamp": "2024-07-15 12:47:04 +0000 UTC",
      "lastTimestamp": "2024-07-15 12:47:04 +0000 UTC",
      "count": 1,
      "type": "Normal"
    }
  ],
  "total": 8
}
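A response of this shape can be processed as in the following sketch, which parses the timestamp strings and picks out the `Warning` events in chronological order. The helper names `parse_timestamp` and `warning_events` are hypothetical, the timestamp format is inferred from the example above rather than from a documented specification, and `sample` is a trimmed subset of the example response.

```python
from datetime import datetime

# Format inferred from the example values, e.g. "2024-07-15 16:52:50 +0000 UTC".
# This is an assumption; the response format is not formally specified here.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S %z UTC"


def parse_timestamp(value):
    """Convert an event timestamp string into a timezone-aware datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def warning_events(payload):
    """Return events of type "Warning", oldest first by firstTimestamp."""
    warnings = [e for e in payload["events"] if e["type"] == "Warning"]
    return sorted(warnings, key=lambda e: parse_timestamp(e["firstTimestamp"]))


# Trimmed subset of the example response (three of the eight events).
sample = {
    "events": [
        {
            "reason": "FailedToStartFaultTolerance",
            "message": "Pytorchjob: test-api-llama2-7b-4, failed to start fault tolerance。Reason:check all nodes are healthy。",
            "firstTimestamp": "2024-07-15 16:52:50 +0000 UTC",
            "lastTimestamp": "2024-07-15 16:52:50 +0000 UTC",
            "count": 1,
            "type": "Warning",
        },
        {
            "reason": "JobFailed",
            "message": "PyTorchJob test-api-llama2-7b-4 is failed because 1 Master replica(s) failed.",
            "firstTimestamp": "2024-07-15 16:52:50 +0000 UTC",
            "lastTimestamp": "2024-07-15 16:52:50 +0000 UTC",
            "count": 1,
            "type": "Normal",
        },
        {
            "reason": "Error",
            "message": "Error pod test-api-llama2-7b-4-master-0 container pytorch exitCode: 1 terminated message: ",
            "firstTimestamp": "2024-07-15 16:52:41 +0000 UTC",
            "lastTimestamp": "2024-07-15 16:52:41 +0000 UTC",
            "count": 1,
            "type": "Warning",
        },
    ]
}

for event in warning_events(sample):
    print(event["firstTimestamp"], event["reason"])
```

Filtering on `type` rather than `reason` is a design choice for this sketch: in the example above, `JobFailed` is reported with type `Normal`, so callers that care about failures may also want to inspect `reason`.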