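Enabling the Intelligent Prediction Feature
The intelligent prediction feature of Sugar BI consists of three parts: the built-in prediction service, the BML/EasyDL prediction service, and the training-based prediction service:
The built-in prediction service needs no additional deployment. It is available in the SaaS Advanced edition and in private-deployment editions with 2 or more accounts;
The training-based prediction service is provided only in private deployments and requires a License that includes the intelligent prediction feature. It is ready to use once the model training and inference services have been deployed by following the steps below;
The BML/EasyDL prediction service is supported in the SaaS Advanced edition. In private deployments it follows the same rule as the training-based prediction service: a License that includes the intelligent prediction feature is required.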
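Prerequisites
Intelligent prediction models are stored in object storage, which is shared with the self-service data extraction feature. If you have already deployed and enabled self-service data extraction, no additional configuration is needed. Otherwise, configure it by following "Enabling the Self-Service Data Extraction Feature" and complete the deployment verification section at the end of that document.
The next step is to deploy the training and inference services for the prediction models.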
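Installation Environment
We recommend deploying the model training and inference services on servers separate from Sugar BI itself. The environment requirements are as follows:
Software environment
Docker
Docker must be installed on your machine; version v17 or later is recommended. Sugar BI can run directly in a single-host Docker environment. If you need clustering or high availability, you can use Docker's built-in swarm or kubernetes. Installing and deploying Sugar BI requires basic familiarity with Docker; see the Docker website for details.
We recommend installing Docker by following the official documentation. It can be installed on CentOS, Ubuntu, Docker Desktop on Windows, Mac, and other systems. On Windows, Docker must be started with "Run as administrator".
Hardware environment
If the training data volume is small (the total data of all concurrently running training tasks is under 10 GB) and high availability is not strictly required, a single host is sufficient. Otherwise, depending on your situation, we recommend deploying on a kubernetes cluster (at least 3 machines). Recommended configuration for each machine: memory (minimum 32 GB), CPU (16 cores or more), disk (200 GB SSD or more).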
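The per-machine recommendations above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of Sugar BI: the function name `meets_minimum` is hypothetical, and it only compares numbers you pass in against the thresholds stated in this guide.

```shell
# Hypothetical helper (not shipped with Sugar BI): compares the supplied values
# against the per-machine minimums recommended in this guide.
# Usage: meets_minimum MEM_GB CPU_CORES DISK_GB
meets_minimum() {
  mem_gb=$1
  cpu_cores=$2
  disk_gb=$3
  # All three thresholds must be met: 32 GB memory, 16 cores, 200 GB disk.
  [ "$mem_gb" -ge 32 ] && [ "$cpu_cores" -ge 16 ] && [ "$disk_gb" -ge 200 ]
}
```

On Linux the inputs could come from `nproc` and `/proc/meminfo`. Note that the memory reported by the operating system is usually slightly below the nominal size, so you may want to round up before comparing.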
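Download
Two images need to be downloaded, one for the training service and one for the inference service. We provide two ways to obtain the images:
1. Baidu AI Cloud docker image registry
The images have been uploaded to the Baidu AI Cloud docker image registry. If your machine can access the public internet, fetch them with the following commands:
docker pull registry.baidubce.com/sugarbi/sugar-ml-train:1.1.1
docker tag registry.baidubce.com/sugarbi/sugar-ml-train:1.1.1 sugarbi/sugar-ml-train:1.1.1
docker pull registry.baidubce.com/sugarbi/sugar-ml-predict:1.0.1
docker tag registry.baidubce.com/sugarbi/sugar-ml-predict:1.0.1 sugarbi/sugar-ml-predict:1.0.1
# List the images that were just pulled
docker images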
In addition, if your server uses a domestically produced CPU and operating system, you may need the ARM-architecture images. The commands are as follows:
docker pull registry.baidubce.com/sugarbi/sugar-ml-train-arm64:1.1.1
docker tag registry.baidubce.com/sugarbi/sugar-ml-train-arm64:1.1.1 sugarbi/sugar-ml-train-arm64:1.1.1
docker pull registry.baidubce.com/sugarbi/sugar-ml-predict-arm64:1.0.1
docker tag registry.baidubce.com/sugarbi/sugar-ml-predict-arm64:1.0.1 sugarbi/sugar-ml-predict-arm64:1.0.1
# List the images that were just pulled
docker images
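If the same script has to serve both x86 and ARM hosts, the choice between the two sets of image names can be derived from the CPU architecture. This is a minimal sketch under that assumption: `image_ref` is a hypothetical helper, and it treats `aarch64` and `arm64` (as reported by `uname -m`) as the only ARM identifiers.

```shell
# Hypothetical helper: prints the registry image reference for a service and
# CPU architecture, using the image names and versions listed in this guide.
# Usage: image_ref train|predict ARCH   (ARCH as printed by `uname -m`)
image_ref() {
  service=$1
  arch=$2
  case "$service" in
    train) version=1.1.1 ;;
    predict) version=1.0.1 ;;
    *) echo "unknown service: $service" >&2; return 1 ;;
  esac
  case "$arch" in
    aarch64|arm64) suffix=-arm64 ;;   # ARM images carry an -arm64 suffix
    *) suffix= ;;
  esac
  echo "registry.baidubce.com/sugarbi/sugar-ml-${service}${suffix}:${version}"
}
```

A typical call would be `docker pull "$(image_ref train "$(uname -m)")"`.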
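2. Direct download
If your deployment machine cannot connect to the internet, download the images from the links below on a machine that is online (if clicking does not start the download, copy the address and open it in a browser), then copy the downloaded packages to the deployment machine. Note: if you are deploying to several machines, copy the downloaded package files to every deployment machine and perform the following steps on each of them.
Download: https://sugar-docker-image.cdn.bcebos.com/sugarbi-ml-train-1.1.1.tar.gz
Download: https://sugar-docker-image.cdn.bcebos.com/sugarbi-ml-predict-1.0.1.tar.gz
On the deployment machine, change to the directory containing the copied packages and run:
docker load -i ./sugarbi-ml-train-1.1.1.tar.gz
docker load -i ./sugarbi-ml-predict-1.0.1.tar.gz
# The commands above take tens of seconds; afterwards, run the following to list the images that were just loaded
docker images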
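In addition, if your server uses a domestically produced CPU and operating system, you may need the ARM-architecture images:
Download: https://sugar-docker-image.cdn.bcebos.com/sugarbi-ml-train-arm64-1.1.1.tar.gz
Download: https://sugar-docker-image.cdn.bcebos.com/sugarbi-ml-predict-arm64-1.0.1.tar.gz
On the deployment machine, change to the directory containing the copied packages and run:
docker load -i ./sugarbi-ml-train-arm64-1.1.1.tar.gz
docker load -i ./sugarbi-ml-predict-arm64-1.0.1.tar.gz
# The commands above take tens of seconds; afterwards, run the following to list the images that were just loaded
docker images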
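Running the Training and Inference Services
First check that both images were loaded successfully by running docker images. If both the training image and the inference image appear in the output, the download succeeded.
Note: if you are using the ARM images, the image names carry an -arm64 suffix, and the run commands below must be adjusted accordingly.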
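The check described above can also be scripted. The sketch below assumes the images were tagged `sugarbi/sugar-ml-train:1.1.1` and `sugarbi/sugar-ml-predict:1.0.1` as in this guide; `has_sugar_ml_images` is a hypothetical helper that reads one `repository:tag` per line from standard input.

```shell
# Hypothetical helper: reads "repository:tag" lines on stdin and succeeds only
# if both the training and the inference image are present.
# Usage: ... | has_sugar_ml_images [SUFFIX]   (pass "-arm64" for the ARM images)
has_sugar_ml_images() {
  suffix=${1:-}
  awk -v t="sugarbi/sugar-ml-train${suffix}:1.1.1" \
      -v p="sugarbi/sugar-ml-predict${suffix}:1.0.1" '
    $0 == t { train = 1 }
    $0 == p { predict = 1 }
    END { exit !(train && predict) }
  '
}
```

One way to feed it is `docker images --format '{{.Repository}}:{{.Tag}}' | has_sugar_ml_images`, which exits with status 0 when both images are found.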
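There are two installation methods, corresponding to Docker single-host mode and multi-host cluster mode:
I. Running on a single Docker host
Run the training service image
Run the following command to start the training service on port 8090. You can change 8090 to another port.
docker run --shm-size=4g --restart unless-stopped -d -p 8090:54321 --name sugar-ml-train sugarbi/sugar-ml-train:1.1.1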
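Run the inference service image
- 1. Create a file named env (the file name has no extension, it is just env), open it in a text editor, paste the following content, and fill in the relevant parts:
# Type of object storage used by self-service data extraction: minio or bos
sugar_ml_model_storage_type=bos
# Bucket in object storage allocated to prediction models
sugar_ml_model_bucket=sugar-ml

# If the object storage type is bos, configure the following 3 environment variables
sugar_ml_bos_ak=
sugar_ml_bos_sk=
sugar_ml_bos_endpoint=http://bj.bcebos.com

# If the object storage type is minio, configure the following 3 environment variables
sugar_ml_minio_ak=
sugar_ml_minio_sk=
sugar_ml_minio_endpoint=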
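- 2. In the directory containing the env file, run the following command to start the inference service
For faster inference, a folder is needed to cache prediction models. Create the folder ~/sugar-ml-predict on the host (you may choose any directory name, as long as you update it in the command below as well):
docker run --ulimit nofile=65100:65100 --restart unless-stopped -d -p 8091:8111 --name sugar-ml-predict -v ~/sugar-ml-predict:/sugar-predict --env-file env sugarbi/sugar-ml-predict:1.0.1
Here 8091 is the host port; you can change it to any port you prefer.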
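Because the inference container reads its storage settings from the env file, it can help to check the file before starting the container. This is a minimal sketch, not an official tool: `env_value` and `check_predict_env` are hypothetical helpers, and the rule they enforce (the bucket plus the three variables for the selected storage type must be non-empty) is an assumption drawn from the comments in the env file above.

```shell
# Hypothetical helper: prints the value of KEY from a KEY=VALUE env file
# (last occurrence wins; comment lines never match because they start with #).
env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Hypothetical helper: succeeds if the env file names a valid storage type and
# fills in the bucket plus the ak/sk/endpoint variables for that type.
check_predict_env() {
  env_file=$1
  storage=$(env_value "$env_file" sugar_ml_model_storage_type)
  case "$storage" in
    bos|minio) ;;
    *) echo "sugar_ml_model_storage_type must be minio or bos" >&2; return 1 ;;
  esac
  for key in sugar_ml_model_bucket "sugar_ml_${storage}_ak" \
             "sugar_ml_${storage}_sk" "sugar_ml_${storage}_endpoint"; do
    if [ -z "$(env_value "$env_file" "$key")" ]; then
      echo "missing value for $key" >&2
      return 1
    fi
  done
}
```

Running `check_predict_env env` in the directory of the env file reports the first missing setting on standard error and returns a non-zero status.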
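II. kubernetes cluster deployment
For cluster deployment, we recommend using 3 machines and running 3 or more pods each for the training and prediction services, evenly distributed across the machines.
Deploy the inference service
To deploy the inference service, create a file named predict.yaml, open it in a text editor, paste the following content, and fill in the relevant parts: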
apiVersion: v1
kind: Namespace
metadata:
  name: sugar-ml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: sugar-ml-predict
  name: sugar-ml-predict
  namespace: sugar-ml
spec:
  selector:
    matchLabels:
      app: sugar-ml-predict
  replicas: 3 # number of pods; adjust as needed
  template:
    metadata:
      labels:
        app: sugar-ml-predict
    spec:
      volumes:
        - name: sugar-predict
          hostPath:
            path: /sugar-ml-predict # volume mounted for the model cache; adjust as needed
      containers:
        - name: sugar-ml-predict
          image: registry.baidubce.com/sugarbi/sugar-ml-predict:1.0.1
          ports:
            - containerPort: 8111
          volumeMounts:
            - name: sugar-predict
              mountPath: /sugar-predict
          livenessProbe:
            httpGet:
              path: /predict
              port: 8111
            initialDelaySeconds: 30
            timeoutSeconds: 5
          env:
            - name: sugar_ml_model_storage_type # type of object storage used by self-service data extraction: minio or bos
              value: bos
            - name: sugar_ml_model_bucket # bucket in object storage allocated to prediction models
              value: sugar-ml

            - name: sugar_ml_bos_ak # if the object storage type is bos, configure the following 3 environment variables
              value:
            - name: sugar_ml_bos_sk
              value:
            - name: sugar_ml_bos_endpoint
              value: http://bj.bcebos.com

            - name: sugar_ml_minio_ak # if the object storage type is minio, configure the following 3 environment variables
              value:
            - name: sugar_ml_minio_sk
              value:
            - name: sugar_ml_minio_endpoint
              value:

---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: sugar-ml-predict
  name: sugar-ml-predict-service
  namespace: sugar-ml
spec:
  ports:
    - port: 8111
      targetPort: 8111
      nodePort: 32590 # inference service port; adjust as needed
  type: NodePort
  selector:
    app: sugar-ml-predict
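Then, in the directory containing the file, run:
kubectl apply -f predict.yaml
This deploys the inference service on port 32590; adjust the port number as needed.
Check the pod status with the following command:
kubectl get pods -A
Wait until all pods are in the Ready state (3 pods in this example).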
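Waiting for readiness can be automated by inspecting the READY column of the pod listing. The sketch below assumes the default column layout of `kubectl get pods -n sugar-ml` (NAME, READY, STATUS, ...); `all_pods_ready` is a hypothetical helper, not a kubectl feature.

```shell
# Hypothetical helper: reads `kubectl get pods -n <namespace>` output on stdin
# and succeeds only if at least one pod whose name starts with PREFIX exists
# and every such pod shows READY as n/n.
# Usage: ... | all_pods_ready PREFIX
all_pods_ready() {
  awk -v prefix="$1" '
    index($1, prefix) == 1 {
      seen = 1
      split($2, r, "/")            # READY column, e.g. "1/1"
      if (r[1] != r[2]) notready = 1
    }
    END { exit !(seen && !notready) }
  '
}
```

For example, `kubectl get pods -n sugar-ml | all_pods_ready sugar-ml-predict` could be polled in a loop until it succeeds.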
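Deploy Ingress
- 1. The training service must be exposed through an Ingress. If your k8s cluster already has its own Ingress service, you can skip this step and simply expose the training service's Service sugar-ml-service (deployed in the next section) through that Ingress.
If you have no existing Ingress, one way to install it is given here. There are many Ingress implementations, and you may use whichever one you are familiar with.
Create a file named ingress-nginx.yaml, open it in a text editor, paste the following content, and fill in the relevant parts: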
apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx

---
# Source: ingress-nginx/templates/controller-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx
  namespace: ingress-nginx
automountServiceAccountToken: true
---
# Source: ingress-nginx/templates/controller-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  allow-snippet-annotations: 'true'
---
# Source: ingress-nginx/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
  name: ingress-nginx
rules:
  - apiGroups:
      - ''
    resources:
      - configmaps
      - endpoints
      - nodes
      - pods
      - secrets
      - namespaces
    verbs:
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - nodes
    verbs:
      - get
  - apiGroups:
      - ''
    resources:
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - events
    verbs:
      - create
      - patch
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingressclasses
    verbs:
      - get
      - list
      - watch
---
# Source: ingress-nginx/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
  name: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ingress-nginx
subjects:
  - kind: ServiceAccount
    name: ingress-nginx
    namespace: ingress-nginx
---
# Source: ingress-nginx/templates/controller-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx
  namespace: ingress-nginx
rules:
  - apiGroups:
      - ''
    resources:
      - namespaces
    verbs:
      - get
  - apiGroups:
      - ''
    resources:
      - configmaps
      - pods
      - secrets
      - endpoints
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingressclasses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - configmaps
    resourceNames:
      - ingress-controller-leader
    verbs:
      - get
      - update
  - apiGroups:
      - ''
    resources:
      - configmaps
    verbs:
      - create
  - apiGroups:
      - ''
    resources:
      - events
    verbs:
      - create
      - patch
---
# Source: ingress-nginx/templates/controller-rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx
  namespace: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ingress-nginx
subjects:
  - kind: ServiceAccount
    name: ingress-nginx
    namespace: ingress-nginx
---
# Source: ingress-nginx/templates/controller-service-webhook.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller-admission
  namespace: ingress-nginx
spec:
  type: ClusterIP
  ports:
    - name: https-webhook
      port: 443
      targetPort: webhook
      appProtocol: https
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller
---
# Source: ingress-nginx/templates/controller-service.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: NodePort
  externalTrafficPolicy: Local
  ipFamilyPolicy: SingleStack
  ipFamilies:
    - IPv4
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
      appProtocol: http
      nodePort: 32591 # port of the Ingress HTTP service; adjust as needed
    - name: https
      port: 443
      protocol: TCP
      targetPort: https
      appProtocol: https
      nodePort: 32592 # port of the Ingress HTTPS service; adjust as needed
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller
---
# Source: ingress-nginx/templates/controller-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/component: controller
  revisionHistoryLimit: 10
  minReadySeconds: 0
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/component: controller
    spec:
      dnsPolicy: ClusterFirst
      containers:
        - name: controller
          image: registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller:v1.1.1
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              exec:
                command:
                  - /wait-shutdown
          args:
            - /nginx-ingress-controller
            - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
            - --election-id=ingress-controller-leader
            - --controller-class=k8s.io/ingress-nginx
            - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
            - --validating-webhook=:8443
            - --validating-webhook-certificate=/usr/local/certificates/cert
            - --validating-webhook-key=/usr/local/certificates/key
          securityContext:
            capabilities:
              drop:
                - ALL
              add:
                - NET_BIND_SERVICE
            runAsUser: 101
            allowPrivilegeEscalation: true
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: LD_PRELOAD
              value: /usr/local/lib/libmimalloc.so
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          ports:
            - name: http
              containerPort: 80
              hostPort: 80
              protocol: TCP
            - name: https
              hostPort: 443
              containerPort: 443
              protocol: TCP
            - name: webhook
              containerPort: 8443
              protocol: TCP
          volumeMounts:
            - name: webhook-cert
              mountPath: /usr/local/certificates/
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 90Mi
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: ingress-nginx
      terminationGracePeriodSeconds: 300
      volumes:
        - name: webhook-cert
          secret:
            secretName: ingress-nginx-admission
---
# Source: ingress-nginx/templates/controller-ingressclass.yaml
# We don't support namespaced ingressClass yet
# So a ClusterRole and a ClusterRoleBinding is required
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: nginx
  namespace: ingress-nginx
spec:
  controller: k8s.io/ingress-nginx
---
# Source: ingress-nginx/templates/admission-webhooks/validating-webhook.yaml
# before changing this value, check the required kubernetes version
# https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#prerequisites
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
  name: ingress-nginx-admission
webhooks:
  - name: validate.nginx.ingress.kubernetes.io
    matchPolicy: Equivalent
    rules:
      - apiGroups:
          - networking.k8s.io
        apiVersions:
          - v1
        operations:
          - CREATE
          - UPDATE
        resources:
          - ingresses
    failurePolicy: Fail
    sideEffects: None
    admissionReviewVersions:
      - v1
    clientConfig:
      service:
        namespace: ingress-nginx
        name: ingress-nginx-controller-admission
        path: /networking/v1/ingresses
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ingress-nginx-admission
  namespace: ingress-nginx
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ingress-nginx-admission
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
rules:
  - apiGroups:
      - admissionregistration.k8s.io
    resources:
      - validatingwebhookconfigurations
    verbs:
      - get
      - update
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ingress-nginx-admission
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ingress-nginx-admission
subjects:
  - kind: ServiceAccount
    name: ingress-nginx-admission
    namespace: ingress-nginx
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ingress-nginx-admission
  namespace: ingress-nginx
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
rules:
  - apiGroups:
      - ''
    resources:
      - secrets
    verbs:
      - get
      - create
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ingress-nginx-admission
  namespace: ingress-nginx
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ingress-nginx-admission
subjects:
  - kind: ServiceAccount
    name: ingress-nginx-admission
    namespace: ingress-nginx
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/job-createSecret.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ingress-nginx-admission-create
  namespace: ingress-nginx
  annotations:
    helm.sh/hook: pre-install,pre-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
spec:
  template:
    metadata:
      name: ingress-nginx-admission-create
      labels:
        helm.sh/chart: ingress-nginx-4.0.15
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/version: 1.1.1
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/component: admission-webhook
    spec:
      containers:
        - name: create
          image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v1.1.1
          imagePullPolicy: IfNotPresent
          args:
            - create
            - --host=ingress-nginx-controller-admission,ingress-nginx-controller-admission.$(POD_NAMESPACE).svc
            - --namespace=$(POD_NAMESPACE)
            - --secret-name=ingress-nginx-admission
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          securityContext:
            allowPrivilegeEscalation: false
      restartPolicy: OnFailure
      serviceAccountName: ingress-nginx-admission
      nodeSelector:
        kubernetes.io/os: linux
      securityContext:
        runAsNonRoot: true
        runAsUser: 2000
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/job-patchWebhook.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ingress-nginx-admission-patch
  namespace: ingress-nginx
  annotations:
    helm.sh/hook: post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-4.0.15
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
spec:
  template:
    metadata:
      name: ingress-nginx-admission-patch
      labels:
        helm.sh/chart: ingress-nginx-4.0.15
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/version: 1.1.1
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/component: admission-webhook
    spec:
      containers:
        - name: patch
          image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v1.1.1
          imagePullPolicy: IfNotPresent
          args:
            - patch
            - --webhook-name=ingress-nginx-admission
            - --namespace=$(POD_NAMESPACE)
            - --patch-mutating=false
            - --secret-name=ingress-nginx-admission
            - --patch-failure-policy=Fail
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          securityContext:
            allowPrivilegeEscalation: false
      restartPolicy: OnFailure
      serviceAccountName: ingress-nginx-admission
      nodeSelector:
        kubernetes.io/os: linux
      securityContext:
        runAsNonRoot: true
        runAsUser: 2000
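This file exposes the Ingress HTTP service on port 32591 and the HTTPS service on port 32592. Adjust these ports as needed; the rest of the file generally needs no changes.
Then, in the directory containing the file, run:
kubectl apply -f ingress-nginx.yaml
Check the pod status with the following command:
kubectl get pods -A
Wait until the ingress controller pod is in the Ready state.
- 2. Configure the forwarding rule for the training service in the Ingress.
Create a file named ingress-http.yaml, open it in a text editor, paste the following content, and fill in the relevant parts: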
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sugar-ml-ingress
  namespace: sugar-ml
  annotations:
    kubernetes.io/ingress.class: 'nginx'
    nginx.ingress.kubernetes.io/proxy-body-size: 2048m
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sugar-ml-service
                port:
                  number: 54321
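Then, in the directory containing the file, run:
kubectl apply -f ingress-http.yaml
Deploy the training service
To deploy the training service, create a file named train.yaml, open it in a text editor, paste the following content, and fill in the relevant parts: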
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sugar-ml-stateful-set
  namespace: sugar-ml
spec:
  serviceName: sugar-ml-service
  podManagementPolicy: 'Parallel'
  replicas: 3 # number of pods; must equal the number of nodes, with one pod per node
  selector:
    matchLabels:
      app: sugar-ml-k8s
  template:
    metadata:
      labels:
        app: sugar-ml-k8s
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - sugar-ml-k8s
              topologyKey: 'kubernetes.io/hostname'
      containers:
        - name: sugar-ml-k8s
          image: 'registry.baidubce.com/sugarbi/sugar-ml-train:1.1.1'
          resources:
            requests:
              memory: '4Gi'
          ports:
            - containerPort: 54321
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /kubernetes/isLeaderNode
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 1
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - sh liveness.sh
            failureThreshold: 5
            initialDelaySeconds: 200
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 10
          env:
            - name: sugar_ml_service_dns
              value: sugar-ml-service.sugar-ml.svc.cluster.local
            - name: sugar_ml_lookup_timeout
              value: '1800'
            - name: sugar_ml_node_count
              value: '3' # must match the replicas value above
            - name: sugar_ml_service_endpoint
              value: http://ingress-nginx-controller.ingress-nginx # address of the Ingress HTTP port configured in the previous section; http://XXX.XXX.XX.XX:32591 also works, with XXX replaced by the machine IP
---
apiVersion: v1
kind: Service
metadata:
  name: sugar-ml-service
  namespace: sugar-ml
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: sugar-ml-k8s
  ports:
    - protocol: TCP
      port: 54321
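Then, in the directory containing the file, run:
kubectl apply -f train.yaml
Check the pod status with the following command:
kubectl get pods -A
Wait until all pods are in the Running state and 1 of them is Ready (3 pods in this example). The readiness probe in train.yaml checks /kubernetes/isLeaderNode, so only one pod is expected to report Ready.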
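The success condition for the training service differs from the inference service: every pod must be Running, but only one needs to be Ready. A minimal sketch of that check follows. `train_pods_ok` is a hypothetical helper; it assumes the pod names begin with `sugar-ml-stateful-set-` (the StatefulSet name from train.yaml plus an ordinal) and the default column layout of `kubectl get pods -n sugar-ml`.

```shell
# Hypothetical helper: reads `kubectl get pods -n sugar-ml` output on stdin and
# succeeds if every training pod is Running and at least one shows READY n/n.
train_pods_ok() {
  awk '
    index($1, "sugar-ml-stateful-set-") == 1 {
      total++
      if ($3 == "Running") running++
      split($2, r, "/")            # READY column, e.g. "0/1"
      if (r[1] == r[2]) ready++
    }
    END { exit !(total > 0 && running == total && ready >= 1) }
  '
}
```

It can be used as `kubectl get pods -n sugar-ml | train_pods_ok && echo "training service is up"`.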
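Enabling Intelligent Prediction in Sugar BI
Configure environment variables and restart
Once the training and inference services are running, you can enable the feature in Sugar BI. Add the following settings to Sugar BI's own env configuration file (for Swarm and Kubernetes deployments, add the corresponding environment variables in the same way).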
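# Address of the training service; replace XXXXXX with the address of the machine running the training service
# If deployed with Docker as described above, the port is usually 8090
# If deployed with Kubernetes as described above, the port is usually 32591
sugar_ml_train_endpoint=http://XXXXXX:8090
# Address of the inference service; replace XXXXXX with the address of the machine running the inference service
# If deployed with Docker as described above, the port is usually 8091
# If deployed with Kubernetes as described above, the port is usually 32590
sugar_ml_predict_endpoint=http://XXXXXX:8091/predict

# Bucket for prediction models in object storage; must match sugar_ml_model_bucket in the inference service above, usually sugar-ml
sugar_ml_model_bucket=sugar-ml
# Maximum number of training tasks that may run concurrently; set according to machine configuration and training data volume
sugar_ml_max_running_train_num=10
After updating the env file, restart Sugar BI.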
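The two endpoint settings differ only in host and port, so they can be generated from the deployment mode. This is a minimal sketch using the default ports stated in this guide; `sugar_ml_endpoints` is a hypothetical helper, and if you changed any port during deployment the output must be adjusted by hand.

```shell
# Hypothetical helper: prints the two endpoint settings for a deployment mode
# and host, using the default ports from this guide.
# Usage: sugar_ml_endpoints docker|k8s HOST
sugar_ml_endpoints() {
  mode=$1
  host=$2
  case "$mode" in
    docker) train_port=8090; predict_port=8091 ;;
    k8s) train_port=32591; predict_port=32590 ;;
    *) echo "mode must be docker or k8s" >&2; return 1 ;;
  esac
  echo "sugar_ml_train_endpoint=http://${host}:${train_port}"
  echo "sugar_ml_predict_endpoint=http://${host}:${predict_port}/predict"
}
```

For example, `sugar_ml_endpoints k8s 10.0.0.5 >> env` would append both lines to the env file, where 10.0.0.5 stands in for your own machine address.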
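Verify the deployment
After the restart completes, go to "Organization Management" -> "External Module Deployment Verification" to verify the deployment.
Using the feature
For how to use the intelligent prediction feature, see Model Training.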