EFK Log Collection System Deployment Guide

Introduction to the EFK Log Collection System

EFK stands for Elasticsearch + Fluentd + Kibana. Fluentd collects logs on each node and aggregates them into Elasticsearch, and Kibana serves as the front end for displaying them.

  • Elasticsearch is a distributed search and analytics engine that can be used for full-text search, structured search, and analytics, and can combine all three. Built on Lucene, Elasticsearch is now one of the most widely used open-source search engines; Wikipedia, Stack Overflow, GitHub, and others build their search on top of it.
  • Fluentd is an excellent open-source, free log collection tool that currently supports gathering logs from more than 125 kinds of systems. Combined with other data processing platforms, Fluentd can be used to build large-scale data collection and processing pipelines and commercial-grade solutions.
  • Kibana is an open-source analytics and visualization platform designed to work together with Elasticsearch. You can use Kibana to search, view, and interact with the data stored in Elasticsearch indices, and easily present advanced data analysis and visualizations through a variety of charts, tables, and maps.

Preparation Before Deployment

To deploy the EFK log collection system smoothly on a Kubernetes cluster provided by the CCE service, a few prerequisites need to be in place first:

  • You have an initialized Kubernetes cluster on CCE.
  • You can access the cluster normally with kubectl, following the guide documentation (see the quick check below).
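A minimal sanity check before moving on; this assumes kubectl is already configured against the CCE cluster:

$ kubectl cluster-info    # the API server address should be reachable
$ kubectl get nodes       # every node should report STATUS Ready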

Create the Elasticsearch and Fluentd Service Accounts

Run the following commands:

$ kubectl create -f es-rbac.yaml 
$ kubectl create -f fluentd-es-rbac.yaml

Note:
Before using es-rbac.yaml and fluentd-es-rbac.yaml, first confirm your cluster's version number; different versions require different yaml files.
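A quick way to confirm the version before picking a file:

$ kubectl version
# check the Server Version field (e.g. GitVersion:"v1.8.6")
# and use the matching yaml variant below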

For clusters at version 1.6, the following es-rbac.yaml file can be used:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: elasticsearch
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: elasticsearch
subjects:
  - kind: ServiceAccount
    name: elasticsearch
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io

For clusters at version 1.8, the following es-rbac.yaml file can be used:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: elasticsearch
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: elasticsearch
subjects:
  - kind: ServiceAccount
    name: elasticsearch
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io

For clusters at version 1.6, the following fluentd-es-rbac.yaml file can be used:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: fluentd
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io

For clusters at version 1.8, the following fluentd-es-rbac.yaml file can be used:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io
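
Once both files have been applied, you can verify that the service accounts and their bindings exist; the object names below match the yaml files above:

$ kubectl get serviceaccount elasticsearch fluentd -n kube-system
$ kubectl get clusterrolebinding elasticsearch fluentd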

Deploy Fluentd

The DaemonSet fluentd-es-v1.22 is scheduled only onto Nodes carrying the label beta.kubernetes.io/fluentd-ds-ready=true, so this label needs to be set on every Node where fluentd is expected to run.

$ kubectl get nodes
NAME           STATUS    AGE       VERSION
192.168.1.92   Ready     12d        v1.8.6
192.168.1.93   Ready     12d        v1.8.6
192.168.1.94   Ready     12d        v1.8.6
192.168.1.95   Ready     12d        v1.8.6

$ kubectl label nodes 192.168.1.92 192.168.1.93 192.168.1.94 192.168.1.95  beta.kubernetes.io/fluentd-ds-ready=true
node "192.168.1.92" labeled
node "192.168.1.93" labeled
node "192.168.1.94" labeled
node "192.168.1.95" labeled

After the nodes are labeled, apply the corresponding yaml file to start fluentd; by default it runs in the kube-system namespace.

$ kubectl create -f fluentd-es-ds.yaml
daemonset "fluentd-es-v1.22" created

$ kubectl get pods -n kube-system -o wide
NAME                        READY     STATUS    RESTARTS   AGE       IP             NODE
fluentd-es-v1.22-07kls      1/1       Running   0          10s       172.18.4.187   192.168.1.94
fluentd-es-v1.22-4np74      1/1       Running   0          10s       172.18.2.162   192.168.1.93
fluentd-es-v1.22-tbh5c      1/1       Running   0          10s       172.18.3.201   192.168.1.95
fluentd-es-v1.22-wlgjb      1/1       Running   0          10s       172.18.1.187   192.168.1.92
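
Besides listing the Pods, the DaemonSet itself reports how many replicas are desired versus ready, which is a quick way to spot nodes where fluentd failed to start:

$ kubectl get daemonset fluentd-es-v1.22 -n kube-system
# DESIRED and READY should both equal the number of labeled nodes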

The corresponding fluentd-es-ds.yaml file is as follows:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd-es-v1.22
  namespace: kube-system
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.22
spec:
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
        kubernetes.io/cluster-service: "true"
        version: v1.22
      # This annotation ensures that fluentd does not get evicted if the node
      # supports critical pod annotation based priority scheme.
      # Note that this does not guarantee admission on the nodes (#40573).
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccountName: fluentd
      containers:
      - name: fluentd-es
        image: hub.baidubce.com/public/fluentd-elasticsearch:1.22
        command:
          - '/bin/sh'
          - '-c'
          - '/usr/sbin/td-agent 2>&1 >> /var/log/fluentd.log'
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "true"
      tolerations:
      - key: "node.alpha.kubernetes.io/ismaster"
        effect: "NoSchedule"
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

After fluentd starts, check its log in /var/log/fluentd.log on each node for anything abnormal. If you see errors such as unreadable, check whether fluentd-es-ds.yaml mounts all of the required directories. fluentd collects logs from the mounted directories; if a log file is only a symlink, the directory holding the original log file also needs to be mounted.
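
If you do hit an unreadable error, a quick check on the affected node shows where the symlinks actually point (a sketch; /var/log/containers is the default location of Kubernetes container-log symlinks):

$ ls -l /var/log/containers/ | head       # these entries are symlinks
$ readlink -f /var/log/containers/*.log   # resolve them to the real files
# if the targets sit outside /var/log and /var/lib/docker/containers,
# mount that directory as an additional hostPath volume in fluentd-es-ds.yaml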

Deploy the Elasticsearch Service

First, create the Service used to access Elasticsearch:

$ kubectl create -f es-service.yaml
service "elasticsearch-logging" created

$ kubectl get svc -n kube-system
NAME                    CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
elasticsearch-logging   172.16.215.15   <none>        9200/TCP        1m

The corresponding es-service.yaml file is as follows:

apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Elasticsearch"
spec:
  ports:
  - port: 9200
    protocol: TCP
    targetPort: db
  selector:
    k8s-app: elasticsearch-logging

Start the Elasticsearch service. You can verify that Elasticsearch started correctly by running curl against CLUSTER-IP:PORT.

$ kubectl create -f es-controller.yaml
replicationcontroller "elasticsearch-logging-v1" created

$ kubectl get pods -n kube-system -o wide
NAME                             READY     STATUS    RESTARTS   AGE       IP             NODE
elasticsearch-logging-v1-0kll0   1/1       Running   0          43s       172.18.2.164   192.168.1.93
elasticsearch-logging-v1-vh17k   1/1       Running   0          43s       172.18.1.189   192.168.1.92

$ curl 172.16.215.15:9200
{
  "name" : "elasticsearch-logging-v1-vh17k",
  "cluster_name" : "kubernetes-logging",
  "cluster_uuid" : "cjvE3LJjTvic8TGCbbKxZg",
  "version" : {
    "number" : "2.4.1",
    "build_hash" : "c67dc32e24162035d18d6fe1e952c4cbcbe79d16",
    "build_timestamp" : "2016-09-27T18:57:55Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}
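
Beyond the banner above, the standard _cluster/health endpoint shows whether the two replicas have formed a single cluster (same ClusterIP and port as above):

$ curl 172.16.215.15:9200/_cluster/health?pretty
# number_of_nodes should be 2 and status should be green or yellow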

The corresponding es-controller.yaml file is as follows:

apiVersion: v1
kind: ReplicationController
metadata:
  name: elasticsearch-logging-v1
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    version: v1
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  replicas: 2
  selector:
    k8s-app: elasticsearch-logging
    version: v1
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccountName: elasticsearch
      containers:
      - image: hub.baidubce.com/public/elasticsearch:v2.4.1-1
        name: elasticsearch-logging
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - name: es-persistent-storage
          mountPath: /data
        env:
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      volumes:
      - name: es-persistent-storage
        emptyDir: {}
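
Note that es-persistent-storage above is an emptyDir, so the indexed data is lost whenever a Pod is deleted or rescheduled, which is acceptable for a trial run. For durable storage each replica needs its own volume, so the usual approach is a StatefulSet with volumeClaimTemplates instead of a ReplicationController; a minimal sketch of the claim template (the size is illustrative, and a default StorageClass is assumed):

  volumeClaimTemplates:
  - metadata:
      name: es-persistent-storage    # replaces the emptyDir volume above
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi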

Deploy Kibana

$ kubectl create -f kibana-service.yaml
service "kibana-logging" created

$ kubectl create -f kibana-controller.yaml
deployment "kibana-logging" created

$ kubectl get pods -n kube-system -o wide
NAME                              READY     STATUS    RESTARTS   AGE       IP             NODE
kibana-logging-1043852375-wrq6g   1/1       Running   0          48s       172.18.2.175   192.168.1.93

The corresponding kibana-service.yaml file is as follows:

apiVersion: v1
kind: Service
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Kibana"
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: ui
  selector:
    k8s-app: kibana-logging
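
Note: the spec above does not set a Service type, which defaults to ClusterIP, while the listing in the final section shows kibana-logging with TYPE LoadBalancer and an EXTERNAL-IP. If your Service comes up as ClusterIP only, add the type explicitly so that CCE provisions an external address:

spec:
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
    targetPort: ui
  selector:
    k8s-app: kibana-logging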

The corresponding kibana-controller.yaml file is as follows:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: kibana-logging
  template:
    metadata:
      labels:
        k8s-app: kibana-logging
    spec:
      containers:
      - name: kibana-logging
        image: hub.baidubce.com/public/kibana:v4.6.1-1
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 100m
          requests:
            cpu: 100m
        env:
          - name: "ELASTICSEARCH_URL"
            value: "http://elasticsearch-logging:9200"
          - name: "KIBANA_BASE_URL"
            value: ""
        ports:
        - containerPort: 5601
          name: ui
          protocol: TCP

The first time the Kibana Pod starts, it spends a fairly long time (10-20 minutes) optimizing and caching the status page; you can follow the Pod's log to watch the progress:

$ kubectl logs kibana-logging-1043852375-wrq6g -n kube-system -f
ELASTICSEARCH_URL=http://elasticsearch-logging:9200
server.basePath: /api/v1/proxy/namespaces/kube-system/services/kibana-logging
{"type":"log","@timestamp":"2017-12-04T09:54:41Z","tags":["info","optimize"],"pid":6,"message":"Optimizing and caching bundles for kibana and statusPage. This may take a few minutes"}
{"type":"log","@timestamp":"2017-12-04T10:02:20Z","tags":["info","optimize"],"pid":6,"message":"Optimization of bundles for kibana and statusPage complete in 458.61 seconds"}
{"type":"log","@timestamp":"2017-12-04T10:02:20Z","tags":["status","plugin:kibana@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}

Access Kibana

Run the following command:

$ kubectl get svc -n kube-system

The output looks like the following:

NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)          AGE
kibana-logging          LoadBalancer   172.16.60.222    180.76.112.7   80:32754/TCP   1m

You can access the Kibana service through the LoadBalancer by simply visiting http://180.76.112.7 in a browser; this IP address is the EXTERNAL-IP of the kibana-logging Service.
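
If the Service has no external address yet, you can also reach Kibana through a local port-forward to the Pod (Pod name taken from the listing in the previous section):

$ kubectl port-forward kibana-logging-1043852375-wrq6g 5601:5601 -n kube-system

Then open http://localhost:5601 in a browser.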