Deploy a TensorFlow Serving Inference Service
Last updated: 2022-12-01
This document describes how to deploy a TensorFlow Serving inference service and specify its queue and GPU resources.
Prerequisites
- You have installed the CCE GPU Manager and CCE AI Job Scheduler components; without them, the cloud-native AI features are unavailable.
Example Procedure
This example uses TensorFlow Serving to demonstrate how to deploy an inference service with a Deployment.
- Deploy the TensorFlow Serving inference service with the following settings:
  - Use the default queue: scheduling.volcano.sh/queue-name: default
  - Request 50% of the compute of one GPU card and 10 GiB of GPU memory
  - Set the scheduler to volcano (required)
  A reference YAML follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-demo
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-demo
  template:
    metadata:
      annotations:
        scheduling.volcano.sh/queue-name: default
      labels:
        app: gpu-demo
    spec:
      containers:
      - image: registry.baidubce.com/cce-public/tensorflow-serving:demo-gpu
        imagePullPolicy: Always
        name: gpu-demo
        env:
        - name: MODEL_NAME
          value: half_plus_two
        ports:
        - containerPort: 8501
        resources:
          limits:
            cpu: "2"
            memory: 2Gi
            baidu.com/v100_32g_cgpu: "1"
            baidu.com/v100_32g_cgpu_core: "50"
            baidu.com/v100_32g_cgpu_memory: "10"
          requests:
            cpu: "2"
            memory: 2Gi
            baidu.com/v100_32g_cgpu: "1"
            baidu.com/v100_32g_cgpu_core: "50"
            baidu.com/v100_32g_cgpu_memory: "10"
        # If GPU core isolation is enabled, set the following preStop hook for graceful shutdown.
        # `tf_serving_entrypoint.sh` needs to be replaced with the name of your GPU process.
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "kill -10 `ps -ef | grep tf_serving_entrypoint.sh | grep -v grep | awk '{print $2}'` && sleep 1"]
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: volcano
- Run the following commands to check the workload status:
kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
gpu-demo 1/1 1 1 30s
kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
gpu-demo-65767d67cc-xhdgg 1/1 Running 0 63s 172.23.1.86 192.168.48.8 <none> <none>
- Verify that the TensorFlow inference service is available:
# Replace <172.23.1.86> with the actual pod IP
curl -d '{"instances": [1.0, 2.0, 5.0]}' -X POST http://172.23.1.86:8501/v1/models/half_plus_two:predict
# The output should look similar to:
{
"predictions": [2.5, 3.0, 4.5]
}
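The half_plus_two demo model computes y = 0.5 * x + 2 for each instance, which is why the request above returns [2.5, 3.0, 4.5]. As a minimal sketch (the function below is illustrative only, not part of TensorFlow Serving), the same request body and expected predictions can be reproduced locally:

```python
import json

def half_plus_two(x):
    # Mirrors what the demo model served above computes: y = 0.5 * x + 2
    return 0.5 * x + 2.0

instances = [1.0, 2.0, 5.0]
# Same JSON payload as the curl example
body = json.dumps({"instances": instances})
predictions = [half_plus_two(x) for x in instances]
print(body)         # {"instances": [1.0, 2.0, 5.0]}
print(predictions)  # [2.5, 3.0, 4.5]
```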
Queue Usage Notes
Specify a queue through an annotation:
annotations:
  scheduling.volcano.sh/queue-name: <queue name>
Resource Request Notes
Single-card exclusive example:
resources:
  requests:
    baidu.com/v100_32g_cgpu: 1  # 1 card
    cpu: "4"
    memory: 6Gi
  limits:
    baidu.com/v100_32g_cgpu: 1  # limits must match requests
    cpu: "4"
    memory: 6Gi
Multi-card exclusive example:
resources:
  requests:
    baidu.com/v100_32g_cgpu: 2  # 2 cards
    cpu: "4"
    memory: 6Gi
  limits:
    baidu.com/v100_32g_cgpu: 2  # limits must match requests
    cpu: "4"
    memory: 6Gi
Single-card sharing (GPU memory isolation only, no compute isolation) example:
resources:
  requests:
    baidu.com/v100_32g_cgpu: 1  # 1 card
    baidu.com/v100_32g_cgpu_memory: 10  # 10 GB
    cpu: "4"
    memory: 6Gi
  limits:
    baidu.com/v100_32g_cgpu: 1  # limits must match requests
    baidu.com/v100_32g_cgpu_memory: 10
    cpu: "4"
    memory: 6Gi
Single-card sharing (both GPU memory isolation and compute isolation) example:
resources:
  requests:
    baidu.com/v100_32g_cgpu: 1  # 1 card
    baidu.com/v100_32g_cgpu_core: 50  # 50%, i.e. half of one card's compute
    baidu.com/v100_32g_cgpu_memory: 10  # 10 GB
    cpu: "4"
    memory: 6Gi
  limits:
    baidu.com/v100_32g_cgpu: 1  # limits must match requests
    baidu.com/v100_32g_cgpu_core: 50
    baidu.com/v100_32g_cgpu_memory: 10
    cpu: "4"
    memory: 6Gi
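All four patterns share one rule: limits must be identical to requests. The helper below is a hypothetical sketch (not a product API; the function name and defaults are assumptions for illustration) that builds such a resources stanza as a Python dict, using the `_core` (percent of one card) and `_memory` (GB) suffixes shown above:

```python
def cgpu_resources(resource_name, cards=1, core_percent=None, memory_gb=None,
                   cpu="4", memory="6Gi"):
    # Hypothetical helper: builds a requests/limits stanza following the
    # examples above. `resource_name` is e.g. "baidu.com/v100_32g_cgpu".
    res = {resource_name: cards, "cpu": cpu, "memory": memory}
    if core_percent is not None:
        # Percent of one card's compute, e.g. 50 -> half a card
        res[resource_name + "_core"] = core_percent
    if memory_gb is not None:
        # GPU memory in GB
        res[resource_name + "_memory"] = memory_gb
    # Limits must be identical to requests
    return {"requests": dict(res), "limits": dict(res)}

# Single-card sharing with both memory and compute isolation:
spec = cgpu_resources("baidu.com/v100_32g_cgpu", core_percent=50, memory_gb=10)
```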
Mapping Between GPU Card Models and Resource Names
The following GPU types currently support sharing and isolation of GPU memory and compute:
| GPU card model | Resource name |
|---|---|
| Tesla V100-SXM2-16GB | baidu.com/v100_16g_cgpu |
| Tesla V100-SXM2-32GB | baidu.com/v100_32g_cgpu |
| Tesla T4 | baidu.com/t4_16g_cgpu |