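Introduction: This guide walks through deploying a Kubernetes cluster with kubeadm, covering environment preparation, node initialization, component installation, and cluster verification. It is intended as a reference for production deployments.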
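Recommended production sizing: Master (control-plane) nodes with 4 CPU cores, 16 GB RAM, and 100 GB disk; Worker nodes with 8 CPU cores, 32 GB RAM, and 200 GB disk. For a test environment, a minimal three-node layout (1 Master + 2 Workers) is enough, with each node at 2 CPU cores, 4 GB RAM, and 40 GB disk. Note that Kubernetes 1.24 and later no longer ship dockershim, so containerd is the recommended container runtime.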
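The sizing figures above can be checked on each node before anything is installed. The following is a minimal sketch, not part of kubeadm: `meets_minimum` is a hypothetical helper name, and its defaults encode the test-environment minimum of 2 cores, 4 GB RAM, and 40 GB disk.

```shell
#!/bin/sh
# meets_minimum CPUS MEM_GB DISK_GB [MIN_CPUS MIN_MEM_GB MIN_DISK_GB]
# Returns 0 when every value reaches its minimum, 1 otherwise.
# Defaults are the test-environment minimum from the text (2 cores / 4 GB / 40 GB).
meets_minimum() (
    cpus=$1; mem_gb=$2; disk_gb=$3
    min_cpus=${4:-2}; min_mem=${5:-4}; min_disk=${6:-40}
    [ "$cpus" -ge "$min_cpus" ] &&
    [ "$mem_gb" -ge "$min_mem" ] &&
    [ "$disk_gb" -ge "$min_disk" ]
)
```

On a Linux node the CPU count could come from `nproc`, for example `meets_minimum "$(nproc)" 4 40`. The production Worker profile would be checked with `meets_minimum "$(nproc)" 32 200 8 32 200`, replacing the memory and disk arguments with measured values.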
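CentOS 7/8 or Ubuntu 20.04 LTS is recommended. Set SELinux to permissive mode (setenforce 0, which lasts until the next reboot) and either disable the firewall or open the required ports such as 6443 and 10250. The following kernel parameters also need to be set:
# Update sysctl.conf
echo "net.bridge.bridge-nf-call-iptables=1" >> /etc/sysctl.conf
# IP forwarding is checked by kubeadm preflight
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
echo "vm.swappiness=0" >> /etc/sysctl.conf
# Load br_netfilter so the net.bridge.* keys exist, then apply the settings
modprobe br_netfilter
sysctl -p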
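Because the settings are appended to a file, it is easy to end up with a node where one of them is missing. The sketch below reports which expected entries are absent from a sysctl-style file. `missing_sysctl_keys` is a hypothetical helper name, and it inspects only the file, not the live kernel values.

```shell
#!/bin/sh
# missing_sysctl_keys FILE KEY=VALUE...
# Prints each KEY=VALUE that is absent from FILE (spaces around "=" are ignored).
# Returns 0 when nothing is missing, 1 otherwise.
missing_sysctl_keys() (
    file=$1; shift
    status=0
    for want in "$@"; do
        # Normalise "key = value" to "key=value", then compare whole lines.
        if ! sed 's/[[:space:]]*=[[:space:]]*/=/; s/^[[:space:]]*//; s/[[:space:]]*$//' "$file" |
            grep -qxF "$want"; then
            echo "$want"
            status=1
        fi
    done
    exit "$status"
)
```

A typical call would be `missing_sysctl_keys /etc/sysctl.conf net.bridge.bridge-nf-call-iptables=1 vm.swappiness=0`. To confirm the running kernel agrees with the file, `sysctl -n <key>` prints the live value.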
Every node needs containerd 1.6 or later:
# Prepare kernel modules for containerd
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Add the Kubernetes repository
# (the legacy packages.cloud.google.com repository has been retired; pkgs.k8s.io is its replacement)
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
# Install the components
# (on CentOS, containerd is typically packaged as containerd.io in the Docker CE repository; adjust the name to your package source)
sudo yum install -y containerd kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable containerd kubelet
When running kubeadm init, specify the Kubernetes version and the Pod network CIDR explicitly:
sudo kubeadm init --kubernetes-version=v1.28.0 \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12 \
  --ignore-preflight-errors=Swap
# After initialization completes, set up kubectl access for the current user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
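The Pod CIDR and the Service CIDR passed to kubeadm init must not overlap each other (or the node network). The sketch below checks two CIDRs for overlap with plain shell arithmetic. The function names are hypothetical, and only IPv4 in dotted-decimal form without leading zeros is handled.

```shell
#!/bin/sh
# ip_to_int A.B.C.D -> prints the address as an unsigned 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1    # split the address on dots into $1..$4
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# cidr_bounds NET/BITS -> prints "first last" as integers.
cidr_bounds() (
    bits=${1#*/}
    base=$(ip_to_int "${1%/*}")
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    first=$(( $base & $mask ))
    last=$(( $first | (4294967295 ^ $mask) ))
    echo "$first $last"
)

# cidrs_overlap CIDR1 CIDR2 -> returns 0 if the two ranges share any address.
cidrs_overlap() (
    # The command substitutions are expanded before "set" replaces $1 and $2.
    set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)
```

With the values used above, `cidrs_overlap 10.244.0.0/16 10.96.0.0/12` returns non-zero because 10.96.0.0/12 ends at 10.111.255.255. The same check can be repeated against the subnet the nodes live in.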
On the Master node, generate the join command:
kubeadm token create --print-join-command
Run the printed command on each Worker node, for example:
kubeadm join 192.168.1.100:6443 --token abcdef.1234567890abcdef \
  --discovery-token-ca-cert-hash sha256:xxxxxxxxxxxxxxxxxxxxxxxx
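Join failures are often caused by a token that was truncated or mistyped when it was copied between machines. Bootstrap tokens have a fixed shape: six lowercase alphanumeric characters, a dot, then sixteen more. The sketch below checks only that shape (`valid_bootstrap_token` is a hypothetical helper name); it cannot tell whether the token still exists or has expired on the cluster.

```shell
#!/bin/sh
# valid_bootstrap_token TOKEN
# Returns 0 when TOKEN matches [a-z0-9]{6}.[a-z0-9]{16}, 1 otherwise.
valid_bootstrap_token() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}
```

For example, `valid_bootstrap_token abcdef.1234567890abcdef` succeeds. Whether a token is still valid can be checked on the Master node with `kubeadm token list`.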
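kubectl get nodes -o wide
# Healthy nodes report Ready; the ROLES column shows control-plane or <none>
kubectl get cs
# Shows core component status (Scheduler/ControllerManager/etcd); note that componentstatuses has been deprecated since v1.19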
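On larger clusters it helps to reduce the node listing to just the nodes that need attention. The sketch below filters the tabular output on standard input. `not_ready_nodes` is a hypothetical helper name, and it assumes the default column order in which STATUS is the second column.

```shell
#!/bin/sh
# not_ready_nodes: reads `kubectl get nodes --no-headers` output on stdin and
# prints the name of every node whose STATUS column does not start with "Ready".
# ("Ready,SchedulingDisabled" therefore still counts as ready.)
not_ready_nodes() {
    awk 'NF >= 2 && $2 !~ /^Ready/ { print $1 }'
}
```

Usage: `kubectl get nodes --no-headers | not_ready_nodes`. An empty result means every node reports Ready.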
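For the Pod network, Calico is recommended because it supports NetworkPolicy. The IP pool in custom-resources.yaml defaults to 192.168.0.0/16, so change its cidr to match the --pod-network-cidr used at kubeadm init (10.244.0.0/16) before applying it:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/tigera-operator.yaml
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/custom-resources.yaml
Alternatively, use Flannel, whose default network is 10.244.0.0/16:
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml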
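A CNI manifest usually carries its own Pod CIDR, and that value has to agree with the one given to kubeadm init. The sketch below rewrites the `cidr:` value of a manifest read from standard input. `set_pool_cidr` is a hypothetical helper name, and it assumes every `cidr:` key in the file should receive the same value, which is worth confirming by reading the manifest first.

```shell
#!/bin/sh
# set_pool_cidr CIDR: reads a manifest on stdin and rewrites the value of
# every "cidr:" key to CIDR, keeping the original indentation.
set_pool_cidr() {
    sed -E "s#^([[:space:]]*cidr:[[:space:]]*).*#\1$1#"
}
```

One way to use it is to download the manifest, pipe it through `set_pool_cidr 10.244.0.0/16`, review the result, and only then pass it to `kubectl create -f -`.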
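For storage, create a StorageClass, using NFS as the example. Kubernetes has no built-in NFS provisioner, so example.com/nfs is a placeholder that must match the name of the external NFS provisioner you deploy:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-storage
provisioner: example.com/nfs
parameters:
  archiveOnDelete: "false"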
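A StorageClass does nothing until a PersistentVolumeClaim refers to it. The sketch below prints a claim manifest bound to the nfs-storage class. `pvc_manifest`, the claim name, and the size are illustrative assumptions, and ReadWriteMany is chosen because NFS volumes can generally be mounted by several Pods at once.

```shell
#!/bin/sh
# pvc_manifest NAME SIZE [STORAGE_CLASS]
# Prints a PersistentVolumeClaim manifest; STORAGE_CLASS defaults to the
# nfs-storage class from the text. NAME and SIZE are supplied by the caller.
pvc_manifest() {
    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ${3:-nfs-storage}
  resources:
    requests:
      storage: $2
EOF
}
```

Usage: `pvc_manifest demo-claim 1Gi | kubectl apply -f -`. If the claim stays Pending, `kubectl describe pvc demo-claim` shows whether the provisioner picked it up.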
For monitoring, use the Prometheus Operator, installed here through the kube-prometheus-stack Helm chart:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack
For high availability, deploy three Master nodes and use an external etcd cluster:
# etcd.yaml example
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
etcd:
  external:
    endpoints:
      - https://etcd1:2379
      - https://etcd2:2379
      - https://etcd3:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/etcd/client.crt
    keyFile: /etc/kubernetes/pki/etcd/client.key
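The choice of three etcd members follows from majority quorum: a write commits only when more than half of the members acknowledge it. The small sketch below works out the quorum size and the number of member failures a cluster can survive. The function names are hypothetical.

```shell
#!/bin/sh
# etcd_quorum MEMBERS -> prints the majority needed for a write to commit.
etcd_quorum() { echo $(( $1 / 2 + 1 )); }

# etcd_fault_tolerance MEMBERS -> prints how many members may fail
# while the cluster keeps its quorum.
etcd_fault_tolerance() { echo $(( ($1 - 1) / 2 )); }
```

Three members need a quorum of 2 and tolerate 1 failure. Four members need a quorum of 3 and still tolerate only 1 failure, which is why odd cluster sizes are the usual choice.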
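Enable RBAC and Pod security controls. PodSecurityPolicy was removed in Kubernetes 1.25, so on a 1.28 cluster use Pod Security Admission namespace labels instead; the PodSecurityPolicy manifest applies only to older clusters:
# Create a ServiceAccount and bind it to a ClusterRole
# (tiller is only an example name, since Helm 3 no longer uses Tiller; cluster-admin is broader than most workloads need)
kubectl create serviceaccount -n kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:tiller
# Kubernetes 1.25+: enforce the restricted Pod Security Standard on a namespace
kubectl label namespace default pod-security.kubernetes.io/enforce=restricted
# Kubernetes < 1.25 only: PodSecurityPolicy
# (abbreviated; a complete policy also needs seLinux, supplementalGroups, and fsGroup rules)
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  runAsUser:
    rule: MustRunAsNonRoot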
Use Velero for cluster backups:
# Install Velero
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.4.0 \
  --bucket velero-backup \
  --secret-file ./credentials-velero \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio:9000
# Create a backup
velero backup create full-backup --include-namespaces=default
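Node problems: check the Conditions section of kubectl describe node <node> and confirm with the kubelet log (journalctl -u kubelet).
Pod problems: run kubectl describe pod <pod> and read Events to see whether the cause is insufficient resources or a PVC binding failure.
API server or DNS problems: run netstat -tulnp | grep 6443 to confirm the API server is listening, and check the coredns logs.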
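# Collect logs from key components
kubectl logs -n kube-system <pod-name> --previous
journalctl -u kubelet -n 100 --no-pager
crictl logs <container-id>
# Diagnose network problems from a temporary debug Pod
kubectl run -it --rm debug --image=busybox --restart=Never -- sh
# Then, inside the debug shell (a ClusterIP may not answer ICMP, depending on the kube-proxy mode):
ip route
ping <service-ip>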
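Before describing individual Pods, it can help to list only the ones that are in trouble. The sketch below filters the all-namespaces Pod listing read from standard input. `problem_pods` is a hypothetical helper name, and it assumes the default column order of NAMESPACE, NAME, READY, STATUS.

```shell
#!/bin/sh
# problem_pods: reads `kubectl get pods -A --no-headers` output on stdin and
# prints "namespace/name status" for every Pod whose STATUS is neither
# Running nor Completed.
problem_pods() {
    awk 'NF >= 4 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}
```

Usage: `kubectl get pods -A --no-headers | problem_pods`. Each reported Pod is then a candidate for `kubectl describe pod` and `kubectl logs --previous`.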
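# Upgrade kubeadm first so that plan and apply know about the target version
sudo yum install -y kubeadm-1.28.1 --disableexcludes=kubernetes
# Pre-upgrade check
sudo kubeadm upgrade plan
# Apply the upgrade on the Master node
sudo kubeadm upgrade apply v1.28.1
# Upgrade kubelet and kubectl, then restart kubelet
sudo yum install -y kubelet-1.28.1 kubectl-1.28.1 --disableexcludes=kubernetes
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# Drain and upgrade the Worker nodes one at a time
kubectl drain <node-name> --ignore-daemonsets
# On the Worker node: upgrade the kubeadm and kubelet packages as above, then run
sudo kubeadm upgrade node
sudo systemctl restart kubelet
kubectl uncordon <node-name>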
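kubeadm upgrades move within a minor release or to the next minor release; skipping a minor version is not supported. The sketch below checks that rule before an upgrade is attempted. The function names are hypothetical, and the check compares minor numbers only, ignoring major versions and patch ordering.

```shell
#!/bin/sh
# minor_of VERSION -> prints the minor number of "v1.28.1" or "1.28.1".
minor_of() (
    v=${1#v}          # drop an optional leading "v"
    rest=${v#*.}      # drop the major number
    echo "${rest%%.*}"
)

# upgrade_allowed CURRENT TARGET
# Returns 0 when TARGET is in the same minor release as CURRENT or exactly
# one minor release ahead, 1 otherwise (including downgrades).
upgrade_allowed() (
    diff=$(( $(minor_of "$2") - $(minor_of "$1") ))
    [ "$diff" -ge 0 ] && [ "$diff" -le 1 ]
)
```

For example, `upgrade_allowed v1.28.0 v1.28.1` succeeds, while `upgrade_allowed v1.26.3 v1.28.1` fails because the path has to go through 1.27 first.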
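This tutorial covers the full workflow from environment preparation to production hardening, with commands written against Kubernetes 1.28. For a first deployment, validate every step in a test environment, and before any production rollout make sure data backups and a rollback plan are in place.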