Enabling the Intelligent Prediction Feature
Sugar BI's intelligent prediction feature consists of three parts: the built-in prediction service, the BML/EasyDL prediction service, and the training-based prediction service:
The built-in prediction service requires no additional deployment and is available in the SaaS Advanced edition and in private deployments licensed for 2 or more accounts;
The training-based prediction service is only offered in private deployments and requires a License that includes the intelligent prediction feature; it becomes available after you deploy the model training and inference services following the steps below;
The BML/EasyDL prediction service is supported in the SaaS Advanced edition. In private deployments it is subject to the same requirement as the training-based prediction service: a License that includes the intelligent prediction feature.
Prerequisites
Storage of prediction models relies on object storage, which is shared with the self-service data extraction feature. If you have already deployed and enabled self-service data extraction, no additional configuration is needed; otherwise, configure it by following "Enabling Self-Service Data Extraction" and complete the deployment verification step at the end of that document.
The following describes how to deploy the training and inference services for prediction models.
Installation Environment
The model training and inference services are best deployed on servers separate from the Sugar BI host. The environment requirements are as follows:
Software Environment
Docker
Docker must be installed on your machine; version v17 or later is recommended. Sugar BI can run directly on a single-machine Docker environment; if you need clustering or high availability, you can use Docker's built-in Swarm or Kubernetes. Installing and deploying Sugar BI requires some basic knowledge of Docker; see the Docker official website for details.
Follow the official documentation to install Docker; it can be installed on CentOS, Ubuntu, Docker Desktop on Windows, Mac, and other systems. On Windows, Docker must be launched with "Run as administrator".
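Once installed, a quick way to confirm that Docker is working is to query its version and runtime information (standard Docker commands; the output varies by environment):
# Print client and server versions (should be v17 or later)
docker version
# Show runtime details such as the storage driver and the CPUs/memory visible to Docker
docker info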
Hardware Environment
If the training data volume is small (the total data of concurrently running training jobs is under 10 GB) and strict high availability is not required, running on a single machine is sufficient; otherwise, a Kubernetes cluster deployment is recommended as appropriate (at least 3 machines suggested). Recommended configuration per machine: memory 32 GB minimum, CPU 16 cores or more, disk 200 GB SSD or more.
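To check whether a machine roughly meets these recommendations, the usual Linux commands are enough (a minimal sketch; output formats differ across distributions):
# Number of CPU cores
nproc
# Total and available memory
free -h
# Disk space on the mounted filesystems
df -h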
Download
You need to download two images, one for the training service and one for the inference service. We provide two ways to obtain the image packages:
1. Baidu AI Cloud Docker image registry
The images have been pushed to the Baidu AI Cloud Docker image registry. If the machine can access the public network, you can fetch them with the following commands:
docker pull registry.baidubce.com/sugarbi/sugar-ml-train:1.1.1
docker tag registry.baidubce.com/sugarbi/sugar-ml-train:1.1.1 sugarbi/sugar-ml-train:1.1.1
docker pull registry.baidubce.com/sugarbi/sugar-ml-predict:1.0.1
docker tag registry.baidubce.com/sugarbi/sugar-ml-predict:1.0.1 sugarbi/sugar-ml-predict:1.0.1
# List the images just pulled
docker images
In addition, if your server's CPU and operating system are domestic (Chinese-made) platforms, you may need the ARM-architecture images; the commands are as follows:
docker pull registry.baidubce.com/sugarbi/sugar-ml-train-arm64:1.1.1
docker tag registry.baidubce.com/sugarbi/sugar-ml-train-arm64:1.1.1 sugarbi/sugar-ml-train-arm64:1.1.1
docker pull registry.baidubce.com/sugarbi/sugar-ml-predict-arm64:1.0.1
docker tag registry.baidubce.com/sugarbi/sugar-ml-predict-arm64:1.0.1 sugarbi/sugar-ml-predict-arm64:1.0.1
# List the images just pulled
docker images
2. Download the image packages directly
If the deployment machine cannot connect to the Internet, you can download the images from the links below on a machine that is connected (if clicking the link does not start the download, copy the address into a browser), then copy the downloaded packages to the deployment machine. Note: if you need to deploy to multiple machines, copy the downloaded package files to every deployment machine and perform the following steps on each of them.
Click here to download: https://sugar-docker-image.cdn.bcebos.com/sugarbi-ml-train-1.1.1.tar.gz
Click here to download: https://sugar-docker-image.cdn.bcebos.com/sugarbi-ml-predict-1.0.1.tar.gz
On the deployment machine, change into the directory containing the copied packages and run:
docker load -i ./sugarbi-ml-train-1.1.1.tar.gz
docker load -i ./sugarbi-ml-predict-1.0.1.tar.gz
# The commands above take tens of seconds to finish; then run the following to list the loaded images
docker images
In addition, if your server's CPU and operating system are domestic (Chinese-made) platforms, you may need the ARM-architecture images:
Click here to download: https://sugar-docker-image.cdn.bcebos.com/sugarbi-ml-train-arm64-1.1.1.tar.gz
Click here to download: https://sugar-docker-image.cdn.bcebos.com/sugarbi-ml-predict-arm64-1.0.1.tar.gz
On the deployment machine, change into the directory containing the copied packages and run:
docker load -i ./sugarbi-ml-train-arm64-1.1.1.tar.gz
docker load -i ./sugarbi-ml-predict-arm64-1.0.1.tar.gz
# The commands above take tens of seconds to finish; then run the following to list the loaded images
docker images
Running the Training and Inference Services
First check that both images were loaded successfully by running docker images.
If both images appear in the output, the download succeeded.
Note: if you use the ARM-version images, the image names carry the -arm64 suffix, and the run commands below need to be adjusted accordingly.
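If there are many images on the machine, you can filter the listing down to the two prediction images (a simple grep; the same keyword also matches the -arm64 variants):
docker images | grep sugar-ml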
There are two installation methods, corresponding to single-machine Docker mode and multi-machine cluster mode:
1. Running with Docker on a single machine
Run the training service image
Run the command below to start the training service on port 8090; you can change 8090 to a different port.
docker run --shm-size=4g --restart unless-stopped -d -p 8090:54321 --name sugar-ml-train sugarbi/sugar-ml-train:1.1.1
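After starting it, you can confirm the container is up and follow its startup logs (standard Docker commands; the exact log output depends on the image):
# Confirm the training service container is running
docker ps --filter name=sugar-ml-train
# Follow its startup logs
docker logs -f sugar-ml-train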
Run the inference service image
- 1. Create a file named env (no extension, just env), open it with a text editor, copy in the following content, and fill in the relevant fields:
# Type of object storage used by self-service data extraction: minio or bos
sugar_ml_model_storage_type=bos
# Bucket in object storage allocated for prediction models
sugar_ml_model_bucket=sugar-ml
# If the object storage type is bos, set the following 3 environment variables
sugar_ml_bos_ak=
sugar_ml_bos_sk=
sugar_ml_bos_endpoint=http://bj.bcebos.com
# If the object storage type is minio, set the following 3 environment variables
sugar_ml_minio_ak=
sugar_ml_minio_sk=
sugar_ml_minio_endpoint=
- 2. In the directory containing the env file, run the following command to start the inference service.
For faster inference, a directory is needed to cache prediction models; create the directory ~/sugar-ml-predict on the host (you may of course use any directory name, just replace the corresponding path in the command below):
docker run --ulimit nofile=65100:65100 --restart unless-stopped -d -p 8091:8111 --name sugar-ml-predict -v ~/sugar-ml-predict:/sugar-predict --env-file env sugarbi/sugar-ml-predict:1.0.1
Here 8091 is the local port and can be changed to whichever port you prefer.
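As a quick smoke test, you can check that the container is running and that the mapped port responds; /predict is the same path used as the liveness probe in the Kubernetes deployment below, so it should answer once the service has started (adjust 8091 if you changed the port):
docker ps --filter name=sugar-ml-predict
# The container serves /predict on its internal port 8111, mapped to 8091 above
curl http://localhost:8091/predict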
2. Kubernetes cluster deployment
For cluster deployment, 3 machines are recommended; deploy 3 or more pods each for the training and inference services, spread evenly across the machines.
Deploy the inference service
To deploy the inference service, create a file named predict.yaml, open it with a text editor, copy in the following content, and fill in the relevant fields:
apiVersion: v1
kind: Namespace
metadata:
  name: sugar-ml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: sugar-ml-predict
  name: sugar-ml-predict
  namespace: sugar-ml
spec:
  selector:
    matchLabels:
      app: sugar-ml-predict
  replicas: 3 # number of pods; adjust as needed
  template:
    metadata:
      labels:
        app: sugar-ml-predict
    spec:
      volumes:
        - name: sugar-predict
          hostPath:
            path: /sugar-ml-predict # host path mounted as the model cache volume; change as needed
      containers:
        - name: sugar-ml-predict
          image: registry.baidubce.com/sugarbi/sugar-ml-predict:1.0.1
          ports:
            - containerPort: 8111
          volumeMounts:
            - name: sugar-predict
              mountPath: /sugar-predict
          livenessProbe:
            httpGet:
              path: /predict
              port: 8111
            initialDelaySeconds: 30
            timeoutSeconds: 5
          env:
            - name: sugar_ml_model_storage_type # type of object storage used by self-service data extraction: minio or bos
              value: bos
            - name: sugar_ml_model_bucket # bucket in object storage allocated for prediction models
              value: sugar-ml
            - name: sugar_ml_bos_ak # if the object storage type is bos, set these 3 environment variables
              value:
            - name: sugar_ml_bos_sk
              value:
            - name: sugar_ml_bos_endpoint
              value: http://bj.bcebos.com
            - name: sugar_ml_minio_ak # if the object storage type is minio, set these 3 environment variables
              value:
            - name: sugar_ml_minio_sk
              value:
            - name: sugar_ml_minio_endpoint
              value:
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: sugar-ml-predict
  name: sugar-ml-predict-service
  namespace: sugar-ml
spec:
  ports:
    - port: 8111
      targetPort: 8111
      nodePort: 32590 # inference service port; adjust as needed
  type: NodePort
  selector:
    app: sugar-ml-predict
Then, in the directory containing this file, run:
kubectl apply -f predict.yaml
This deploys the inference service on port 32590; the port number can be adjusted as needed.
Check pod status with the following command:
kubectl get pods -A
Wait until all pods reach the Ready state. For example, 3 pods were deployed here.
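You can also inspect just the sugar-ml namespace to confirm the Deployment and its NodePort Service (standard kubectl commands):
# Pods of the inference Deployment
kubectl get pods -n sugar-ml -l app=sugar-ml-predict
# The NodePort Service exposing port 32590
kubectl get svc -n sugar-ml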
Deploy the Ingress
- 1. The training service needs to be exposed through an Ingress. If your Kubernetes cluster already has its own Ingress, you can skip this step and simply expose the training-service Service sugar-ml-service (deployed in the next section) through that Ingress.
If there is no existing Ingress, one way to install it is given below. There are many Ingress implementations; you may also use whichever one you are familiar with.
Create a file named ingress-nginx.yaml, open it with a text editor, copy in the following content, and fill in the relevant fields:
apiVersion: v1
kind: Namespace
metadata:
name: ingress-nginx
labels:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
---
# Source: ingress-nginx/templates/controller-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx
namespace: ingress-nginx
automountServiceAccountToken: true
---
# Source: ingress-nginx/templates/controller-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx-controller
namespace: ingress-nginx
data:
allow-snippet-annotations: 'true'
---
# Source: ingress-nginx/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
name: ingress-nginx
rules:
- apiGroups:
- ''
resources:
- configmaps
- endpoints
- nodes
- pods
- secrets
- namespaces
verbs:
- list
- watch
- apiGroups:
- ''
resources:
- nodes
verbs:
- get
- apiGroups:
- ''
resources:
- services
verbs:
- get
- list
- watch
- apiGroups:
- networking.k8s.io
resources:
- ingresses
verbs:
- get
- list
- watch
- apiGroups:
- ''
resources:
- events
verbs:
- create
- patch
- apiGroups:
- networking.k8s.io
resources:
- ingresses/status
verbs:
- update
- apiGroups:
- networking.k8s.io
resources:
- ingressclasses
verbs:
- get
- list
- watch
---
# Source: ingress-nginx/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
name: ingress-nginx
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: ingress-nginx
subjects:
- kind: ServiceAccount
name: ingress-nginx
namespace: ingress-nginx
---
# Source: ingress-nginx/templates/controller-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx
namespace: ingress-nginx
rules:
- apiGroups:
- ''
resources:
- namespaces
verbs:
- get
- apiGroups:
- ''
resources:
- configmaps
- pods
- secrets
- endpoints
verbs:
- get
- list
- watch
- apiGroups:
- ''
resources:
- services
verbs:
- get
- list
- watch
- apiGroups:
- networking.k8s.io
resources:
- ingresses
verbs:
- get
- list
- watch
- apiGroups:
- networking.k8s.io
resources:
- ingresses/status
verbs:
- update
- apiGroups:
- networking.k8s.io
resources:
- ingressclasses
verbs:
- get
- list
- watch
- apiGroups:
- ''
resources:
- configmaps
resourceNames:
- ingress-controller-leader
verbs:
- get
- update
- apiGroups:
- ''
resources:
- configmaps
verbs:
- create
- apiGroups:
- ''
resources:
- events
verbs:
- create
- patch
---
# Source: ingress-nginx/templates/controller-rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx
namespace: ingress-nginx
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: ingress-nginx
subjects:
- kind: ServiceAccount
name: ingress-nginx
namespace: ingress-nginx
---
# Source: ingress-nginx/templates/controller-service-webhook.yaml
apiVersion: v1
kind: Service
metadata:
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx-controller-admission
namespace: ingress-nginx
spec:
type: ClusterIP
ports:
- name: https-webhook
port: 443
targetPort: webhook
appProtocol: https
selector:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/component: controller
---
# Source: ingress-nginx/templates/controller-service.yaml
apiVersion: v1
kind: Service
metadata:
annotations:
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx-controller
namespace: ingress-nginx
spec:
type: NodePort
externalTrafficPolicy: Local
ipFamilyPolicy: SingleStack
ipFamilies:
- IPv4
ports:
- name: http
port: 80
protocol: TCP
targetPort: http
appProtocol: http
nodePort: 32591 # HTTP port of the Ingress service; adjust as needed
- name: https
port: 443
protocol: TCP
targetPort: https
appProtocol: https
nodePort: 32592 # HTTPS port of the Ingress service; adjust as needed
selector:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/component: controller
---
# Source: ingress-nginx/templates/controller-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx-controller
namespace: ingress-nginx
spec:
selector:
matchLabels:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/component: controller
revisionHistoryLimit: 10
minReadySeconds: 0
template:
metadata:
labels:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/component: controller
spec:
dnsPolicy: ClusterFirst
containers:
- name: controller
image: registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller:v1.1.1
imagePullPolicy: IfNotPresent
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
args:
- /nginx-ingress-controller
- --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
- --election-id=ingress-controller-leader
- --controller-class=k8s.io/ingress-nginx
- --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
- --validating-webhook=:8443
- --validating-webhook-certificate=/usr/local/certificates/cert
- --validating-webhook-key=/usr/local/certificates/key
securityContext:
capabilities:
drop:
- ALL
add:
- NET_BIND_SERVICE
runAsUser: 101
allowPrivilegeEscalation: true
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: LD_PRELOAD
value: /usr/local/lib/libmimalloc.so
livenessProbe:
failureThreshold: 5
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
ports:
- name: http
containerPort: 80
hostPort: 80
protocol: TCP
- name: https
hostPort: 443
containerPort: 443
protocol: TCP
- name: webhook
containerPort: 8443
protocol: TCP
volumeMounts:
- name: webhook-cert
mountPath: /usr/local/certificates/
readOnly: true
resources:
requests:
cpu: 100m
memory: 90Mi
nodeSelector:
kubernetes.io/os: linux
serviceAccountName: ingress-nginx
terminationGracePeriodSeconds: 300
volumes:
- name: webhook-cert
secret:
secretName: ingress-nginx-admission
---
# Source: ingress-nginx/templates/controller-ingressclass.yaml
# We don't support namespaced ingressClass yet
# So a ClusterRole and a ClusterRoleBinding is required
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: nginx
namespace: ingress-nginx
spec:
controller: k8s.io/ingress-nginx
---
# Source: ingress-nginx/templates/admission-webhooks/validating-webhook.yaml
# before changing this value, check the required kubernetes version
# https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#prerequisites
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
name: ingress-nginx-admission
webhooks:
- name: validate.nginx.ingress.kubernetes.io
matchPolicy: Equivalent
rules:
- apiGroups:
- networking.k8s.io
apiVersions:
- v1
operations:
- CREATE
- UPDATE
resources:
- ingresses
failurePolicy: Fail
sideEffects: None
admissionReviewVersions:
- v1
clientConfig:
service:
namespace: ingress-nginx
name: ingress-nginx-controller-admission
path: /networking/v1/ingresses
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: ingress-nginx-admission
namespace: ingress-nginx
annotations:
helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: ingress-nginx-admission
annotations:
helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
rules:
- apiGroups:
- admissionregistration.k8s.io
resources:
- validatingwebhookconfigurations
verbs:
- get
- update
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: ingress-nginx-admission
annotations:
helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: ingress-nginx-admission
subjects:
- kind: ServiceAccount
name: ingress-nginx-admission
namespace: ingress-nginx
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: ingress-nginx-admission
namespace: ingress-nginx
annotations:
helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
rules:
- apiGroups:
- ''
resources:
- secrets
verbs:
- get
- create
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: ingress-nginx-admission
namespace: ingress-nginx
annotations:
helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: ingress-nginx-admission
subjects:
- kind: ServiceAccount
name: ingress-nginx-admission
namespace: ingress-nginx
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/job-createSecret.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: ingress-nginx-admission-create
namespace: ingress-nginx
annotations:
helm.sh/hook: pre-install,pre-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
spec:
template:
metadata:
name: ingress-nginx-admission-create
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
spec:
containers:
- name: create
image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v1.1.1
imagePullPolicy: IfNotPresent
args:
- create
- --host=ingress-nginx-controller-admission,ingress-nginx-controller-admission.$(POD_NAMESPACE).svc
- --namespace=$(POD_NAMESPACE)
- --secret-name=ingress-nginx-admission
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
securityContext:
allowPrivilegeEscalation: false
restartPolicy: OnFailure
serviceAccountName: ingress-nginx-admission
nodeSelector:
kubernetes.io/os: linux
securityContext:
runAsNonRoot: true
runAsUser: 2000
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/job-patchWebhook.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: ingress-nginx-admission-patch
namespace: ingress-nginx
annotations:
helm.sh/hook: post-install,post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
spec:
template:
metadata:
name: ingress-nginx-admission-patch
labels:
helm.sh/chart: ingress-nginx-4.0.15
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 1.1.1
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
spec:
containers:
- name: patch
image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v1.1.1
imagePullPolicy: IfNotPresent
args:
- patch
- --webhook-name=ingress-nginx-admission
- --namespace=$(POD_NAMESPACE)
- --patch-mutating=false
- --secret-name=ingress-nginx-admission
- --patch-failure-policy=Fail
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
securityContext:
allowPrivilegeEscalation: false
restartPolicy: OnFailure
serviceAccountName: ingress-nginx-admission
nodeSelector:
kubernetes.io/os: linux
securityContext:
runAsNonRoot: true
runAsUser: 2000
This file exposes the Ingress's HTTP service on port 32591 and its HTTPS service on port 32592. These can be changed as needed; the rest of the file generally does not need to be modified.
Then, in the directory containing this file, run:
kubectl apply -f ingress-nginx.yaml
Check pod status with the following command:
kubectl get pods -A
Wait until the ingress controller pod reaches the Ready state.
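You can narrow the check to the ingress-nginx namespace to watch the controller pod and its Services (standard kubectl commands):
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx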
- 2. Configure the forwarding rule for the training service in the Ingress.
Create a file named ingress-http.yaml, open it with a text editor, copy in the following content, and fill in the relevant fields:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sugar-ml-ingress
  namespace: sugar-ml
  annotations:
    kubernetes.io/ingress.class: 'nginx'
    nginx.ingress.kubernetes.io/proxy-body-size: 2048m
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sugar-ml-service
                port:
                  number: 54321
Then, in the directory containing this file, run:
kubectl apply -f ingress-http.yaml
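To confirm the forwarding rule was created, list the Ingress resource in the sugar-ml namespace; note that its backend will only respond after the training service from the next section is running:
kubectl get ingress -n sugar-ml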
Deploy the training service
To deploy the training service, create a file named train.yaml, open it with a text editor, copy in the following content, and fill in the relevant fields:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sugar-ml-stateful-set
  namespace: sugar-ml
spec:
  serviceName: sugar-ml-service
  podManagementPolicy: 'Parallel'
  replicas: 3 # number of pods; must match the number of nodes, one pod per node
  selector:
    matchLabels:
      app: sugar-ml-k8s
  template:
    metadata:
      labels:
        app: sugar-ml-k8s
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - sugar-ml-k8s
              topologyKey: 'kubernetes.io/hostname'
      containers:
        - name: sugar-ml-k8s
          image: 'registry.baidubce.com/sugarbi/sugar-ml-train:1.1.1'
          resources:
            requests:
              memory: '4Gi'
          ports:
            - containerPort: 54321
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /kubernetes/isLeaderNode
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 1
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - sh liveness.sh
            failureThreshold: 5
            initialDelaySeconds: 200
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 10
          env:
            - name: sugar_ml_service_dns
              value: sugar-ml-service.sugar-ml.svc.cluster.local
            - name: sugar_ml_lookup_timeout
              value: '1800'
            - name: sugar_ml_node_count
              value: '3' # must match the replicas value above
            - name: sugar_ml_service_endpoint
              value: http://ingress-nginx-controller.ingress-nginx # address of the Ingress HTTP port configured in the previous section; can also be http://XXX.XXX.XX.XX:32591 with XXX replaced by a machine IP
---
apiVersion: v1
kind: Service
metadata:
  name: sugar-ml-service
  namespace: sugar-ml
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: sugar-ml-k8s
  ports:
    - protocol: TCP
      port: 54321
Then, in the directory containing this file, run:
kubectl apply -f train.yaml
Check pod status with the following command:
kubectl get pods -A
Wait until all pods are in the Running state, with one of them Ready. For example, 3 pods were deployed here.
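Once the pods are up, you can check the StatefulSet and verify that the training service is reachable through the Ingress HTTP port configured above (32591 by default). The exact response depends on the training service; a connection error usually points to an Ingress or Service misconfiguration. Replace XXX.XXX.XX.XX with one of your node IPs:
kubectl get statefulset -n sugar-ml
curl http://XXX.XXX.XX.XX:32591/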
Enabling Intelligent Prediction in Sugar BI
Configure environment variables and restart
Once the training and inference services are running, you can enable the feature in Sugar BI by adding the following settings to Sugar BI's own env configuration file (likewise, for Swarm or Kubernetes deployments, add the corresponding environment variables in the same way).
# Address of the training service; replace XXXXXX with the address of the machine hosting the training service
# For the Docker deployment described above, the port is typically 8090
# For the Kubernetes deployment described above, the port is typically 32591
sugar_ml_train_endpoint=http://XXXXXX:8090
# Address of the inference service; replace XXXXXX with the address of the machine hosting the inference service
# For the Docker deployment described above, the port is typically 8091
# For the Kubernetes deployment described above, the port is typically 32590
sugar_ml_predict_endpoint=http://XXXXXX:8091/predict
# Bucket in object storage for prediction models; must match sugar_ml_model_bucket in the inference service above, usually sugar-ml
sugar_ml_model_bucket=sugar-ml
# Maximum number of training jobs running at the same time; set according to machine resources and training data volume
sugar_ml_max_running_train_num=10
After the env file is updated, Sugar BI needs to be restarted.
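How the restart is done depends on how Sugar BI itself was deployed. If it runs as a single Docker container started with --env-file, note that docker restart does not re-read the env file, so the container has to be recreated; a rough sketch, where the container name, image, and other options are placeholders for whatever you originally used:
# Container name, image and options below are placeholders; reuse your original run command
docker stop <sugarbi-container> && docker rm <sugarbi-container>
docker run -d --env-file env --name <sugarbi-container> <original options> <sugarbi-image>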
Verify the Deployment
After the restart, go to "Organization Management" -> "External Module Deployment Verification" to verify.
Using the Feature
For how to use the intelligent prediction feature, see Model Training.