Create a GPU instance
Last updated: 2024-09-25
This document describes how to create and use BCI GPU instances.
BCI GPU specifications
BCI offers the following GPU Pod specifications. Each GPU model and card count maps to a fixed set of CPU and memory options, so when creating a workload, choose the specification that best matches your needs and allocate resources accordingly.
| BCI spec name | CPU (cores) | Memory (GiB) | GPU type | GPU memory (GB) | GPU cards |
| --- | --- | --- | --- | --- | --- |
| bci.gna2.c8m36.1a10 | 8 | 36 | Nvidia A10 PCIE | 24 × 1 | 1 |
| bci.gna2.c18m74.1a10 | 18 | 74 | Nvidia A10 PCIE | 24 × 1 | 1 |
| bci.gna2.c30m118.2a10 | 30 | 118 | Nvidia A10 PCIE | 24 × 2 | 2 |
| bci.gna2.c62m240.4a10 | 62 | 240 | Nvidia A10 PCIE | 24 × 4 | 4 |
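The spec name appears to encode the shape directly: for example, `bci.gna2.c30m118.2a10` reads as 30 CPU cores (`c30`), 118 GiB of memory (`m118`), and 2 A10 cards (`2a10`). A hedged sketch of the matching container resource block, derived from the table above:

```yaml
# Resource block matching bci.gna2.c30m118.2a10 (values taken from the table above)
resources:
  limits:
    nvidia.com/gpu: 2    # number of GPU cards
    cpu: 30              # CPU cores
    memory: 118Gi        # memory size
  requests:
    nvidia.com/gpu: 2
    cpu: 30
    memory: 118Gi
```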
Create an instance
Configuration notes:
- Specify the GPU model: set the GPU model via an annotation. Note that the annotation must be configured in the Pod spec (the Pod template of a Deployment), not in the Deployment spec.

```yaml
annotations:
  bci.virtual-kubelet.io/bci-gpu-type: "Nvidia A10 PCIE"
```
- Specify the resource configuration: the number of GPU cards, CPU cores, and memory.

```yaml
resources:
  limits:
    nvidia.com/gpu: 1    # number of GPU cards
    cpu: 8               # CPU cores
    memory: 36Gi         # memory size
  requests:
    nvidia.com/gpu: 1    # number of GPU cards
    cpu: 8               # CPU cores
    memory: 36Gi         # memory size
```
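Since the GPU-type annotation must live on the Pod template rather than on the Deployment object itself, the following sketch contrasts the two placements (only the relevant fields are shown):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    bci.virtual-kubelet.io/bci-gpu-type: "Nvidia A10 PCIE"   # incorrect: on the Deployment object, not used by BCI
spec:
  template:
    metadata:
      annotations:
        bci.virtual-kubelet.io/bci-gpu-type: "Nvidia A10 PCIE"   # correct: on the Pod template
```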
Complete workload YAML example

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-deployment-test-gpu-wzy
  labels:
    run: ooo
spec:
  replicas: 1
  selector:
    matchLabels:
      run: ooo
  template:
    metadata:
      labels:
        run: ooo
      annotations:
        bci.virtual-kubelet.io/bci-gpu-type: "Nvidia A10 PCIE"
        bci.virtual-kubelet.io/bci-logical-zone: "zoneF"    # availability zone of the corresponding resources
        bci.virtual-kubelet.io/bci-subnet-id: "xxxxxx"      # the subnet must match the availability zone
      name: spot-deployment-test-wzy-bid
    spec:
      volumes:
        - name: podinfo
          downwardAPI:
            defaultMode: 420
            items:
              - fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.labels['mylabel']
                path: mylabel
              - fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.annotations['myannotation']
                path: myannotation
              - fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.labels
                path: labels
              - fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.annotations
                path: annotations
              - path: workload_cpu_limit
                resourceFieldRef:
                  containerName: ooo1
                  divisor: 1m
                  resource: limits.cpu
              - path: workload_cpu_request
                resourceFieldRef:
                  containerName: ooo1
                  divisor: 1m
                  resource: requests.cpu
              - path: workload_mem_limit
                resourceFieldRef:
                  containerName: ooo1
                  divisor: 1Mi
                  resource: limits.memory
              - path: workload_mem_request
                resourceFieldRef:
                  containerName: ooo1
                  divisor: 1Mi
                  resource: requests.memory
      nodeSelector:
        type: "virtual-kubelet"
      tolerations:
        - key: "virtual-kubelet.io/provider"
          operator: "Equal"
          value: "baidu"
          effect: "NoSchedule"
      hostAliases:
        - ip: "127.0.0.1"
          hostnames:
            - "foo.local"
            - "bar.local"
        - ip: "10.1.2.3"
          hostnames:
            - "foo.remote"
            - "bar.remote"
      containers:
        - image: hub.baidubce.com/cce/nginx-alpine-go
          name: ooo1
          env:
            - name: "MY_CPU_LIMIT"
              valueFrom:
                resourceFieldRef:
                  containerName: ooo1
                  resource: limits.cpu
            - name: "MY_CPU_REQ"
              valueFrom:
                resourceFieldRef:
                  containerName: ooo1
                  resource: requests.cpu
            - name: "MY_IP"
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
          volumeMounts:
            - name: podinfo
              mountPath: /etc/podinfo
          resources:
            limits:
              nvidia.com/gpu: 1    # number of GPU cards
              cpu: 8               # CPU cores
              memory: 36Gi         # memory size
            requests:
              nvidia.com/gpu: 1    # number of GPU cards
              cpu: 8               # CPU cores
              memory: 36Gi         # memory size
```
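If you deploy a bare Pod rather than a Deployment, the same BCI-specific fields move to the Pod's own metadata and spec. A minimal sketch reusing the values from this document (the Pod and container names are hypothetical placeholders, and whether bare Pods are supported this way is an assumption):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bci-gpu-pod               # hypothetical Pod name
  annotations:
    bci.virtual-kubelet.io/bci-gpu-type: "Nvidia A10 PCIE"
    bci.virtual-kubelet.io/bci-logical-zone: "zoneF"    # availability zone of the corresponding resources
    bci.virtual-kubelet.io/bci-subnet-id: "xxxxxx"      # the subnet must match the availability zone
spec:
  nodeSelector:
    type: "virtual-kubelet"
  tolerations:
    - key: "virtual-kubelet.io/provider"
      operator: "Equal"
      value: "baidu"
      effect: "NoSchedule"
  containers:
    - name: gpu-app               # hypothetical container name
      image: hub.baidubce.com/cce/nginx-alpine-go
      resources:
        limits:
          nvidia.com/gpu: 1
          cpu: 8
          memory: 36Gi
        requests:
          nvidia.com/gpu: 1
          cpu: 8
          memory: 36Gi
```

After applying the manifest with `kubectl apply -f`, you can check whether the card is visible from inside the container with `kubectl exec <pod> -- nvidia-smi`, provided the image ships `nvidia-smi`.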