Monitoring a Kubernetes Cluster with Prometheus


Introduction to Prometheus

Prometheus is an open-source monitoring system originally open-sourced by SoundCloud. Its design draws on Google's internal monitoring systems, so it fits naturally with Kubernetes, which also came out of Google. Compared with an InfluxDB-based solution it performs noticeably better, and it ships with built-in alerting. It was designed for large clusters and uses a pull model for data collection: you expose a metrics endpoint in your application, tell Prometheus where that endpoint is, and collection is taken care of.
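For workloads running in Kubernetes, "telling Prometheus where the endpoint is" is usually done with pod annotations that the service-discovery and relabel rules shown later in this article look for. Below is a minimal sketch; the pod name, image, and port 8080 are placeholders for illustration only:

apiVersion: v1
kind: Pod
metadata:
  name: my-app                      # hypothetical pod, for illustration only
  annotations:
    prometheus.io/scrape: "true"    # picked up by the 'keep' relabel rule
    prometheus.io/path: "/metrics"  # override if your metrics path differs
    prometheus.io/port: "8080"      # port on which the app serves metrics
spec:
  containers:
    - name: my-app
      image: my-app:latest          # placeholder image
      ports:
        - containerPort: 8080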

Installing Prometheus

First, we use a ConfigMap to hold the Prometheus configuration file, as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-configuration
  labels:
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: ingress-nginx
    name: prometheus-configuration
  namespace: ingress-nginx
data:
  prometheus.yml: |-
    global:
      scrape_interval: 10s
    scrape_configs:
    - job_name: 'ingress-nginx-endpoints'
      kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
          - ingress-nginx
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - source_labels: [__meta_kubernetes_service_name]
        regex: prometheus-server
        action: drop
---

Save the above as configuration.yaml, then run:

$ kubectl apply -f configuration.yaml
namespace "ingress-nginx" created
configmap "prometheus-configuration" created

Deploy Prometheus itself with a Deployment; the YAML file is as follows:

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - watch
      - list
  - nonResourceURLs: ["/metrics"]
    verbs:
      - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: ingress-nginx
  labels:
    app: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: ingress-nginx
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-conf
  namespace: ingress-nginx
  labels:
    app: prometheus
data:
  prometheus.yml: |-
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          - targets: ['localhost:9090']

      - job_name: 'grafana'
        static_configs:
          - targets:
              - 'grafana.ingress-nginx:3000'

      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # If your node certificates are self-signed or use a different CA to the
          # master CA, then disable certificate verification below. Note that
          # certificate verification is an integral part of a secure infrastructure
          # so this should only be disabled in a controlled environment. You can
          # disable certificate verification by uncommenting the line below.
          #
          # insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        # Keep only the default/kubernetes service endpoints for the https port. This
        # will add targets for each API server which Kubernetes adds an endpoint to
        # the default/kubernetes service.
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https

      # Scrape config for nodes (kubelet).
      #
      # Rather than connecting directly to the node, the scrape is proxied though the
      # Kubernetes apiserver.  This means it will work if Prometheus is running out of
      # cluster, or can't connect to nodes for some other reason (e.g. because of
      # firewalling).
      - job_name: 'kubernetes-nodes'
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics

      # Scrape config for Kubelet cAdvisor.
      #
      # This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
      # (those whose names begin with 'container_') have been removed from the
      # Kubelet metrics endpoint.  This job scrapes the cAdvisor endpoint to
      # retrieve those metrics.
      #
      # In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
      # HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
      # in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
      # the --cadvisor-port=0 Kubelet flag).
      #
      # This job is not necessary and should be removed in Kubernetes 1.6 and
      # earlier versions, or it will cause the metrics to be scraped twice.
      - job_name: 'kubernetes-cadvisor'
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

      # Scrape config for service endpoints.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
      # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
      #   to set this to `https` & most likely set the `tls_config` of the scrape config.
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: If the metrics are exposed on a different port to the
      #   service then set this appropriately.
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name

      # Example scrape config for probing services via the Blackbox Exporter.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/probe`: Only probe services that have a value of `true`
      - job_name: 'kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
        - role: service
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
          action: keep
          regex: true
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement: blackbox-exporter.example.com:9115
        - source_labels: [__param_target]
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          target_label: kubernetes_name

      # Example scrape config for probing ingresses via the Blackbox Exporter.
      #
      # The relabeling allows the actual ingress scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/probe`: Only probe services that have a value of `true`
      - job_name: 'kubernetes-ingresses'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
          - role: ingress
        relabel_configs:
          - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
            regex: (.+);(.+);(.+)
            replacement: ${1}://${2}${3}
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox-exporter.example.com:9115
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_ingress_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_ingress_name]
            target_label: kubernetes_name

      # Example scrape config for pods
      #
      # The relabeling allows the actual pod scrape endpoint to be configured via the
      # following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
      #   pod's declared ports (default is a port-free target if none are declared).
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: ingress-nginx
  labels:
    app: prometheus
data:
  cpu-usage.rule: |
    groups:
      - name: NodeCPUUsage
        rules:
          - alert: NodeCPUUsage
            expr: (100 - (avg by (instance) (irate(node_cpu{name="node-exporter",mode="idle"}[5m])) * 100)) > 75
            for: 2m
            labels:
              severity: "page"
            annotations:
              summary: "{{$labels.instance}}: High CPU usage detected"
              description: "{{$labels.instance}}: CPU usage is above 75% (current value is: {{ $value }})"
---
kind: Deployment
apiVersion: apps/v1beta2
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: ingress-nginx
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        runAsUser: 65534
        fsGroup: 65534
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          volumeMounts:
            - mountPath: /etc/prometheus/prometheus.yml
              name: prometheus-conf-volume
              subPath: prometheus.yml
            - mountPath: /etc/prometheus/rules
              name: prometheus-rules-volume
          ports:
            - containerPort: 9090
              protocol: TCP
      volumes:
        - name: prometheus-conf-volume
          configMap:
            name: prometheus-conf
        - name: prometheus-rules-volume
          configMap:
            name: prometheus-rules
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
---
kind: Service
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: prometheus
  name: prometheus-service
  namespace: ingress-nginx
spec:
  ports:
    - port: 9090
      targetPort: 9090
  selector:
    app: prometheus
  type: NodePort

Save the above as prometheus.yaml, then run:

$ kubectl apply -f prometheus.yaml
clusterrole "prometheus" created
serviceaccount "prometheus" created
clusterrolebinding "prometheus" created
configmap "prometheus-conf" created
configmap "prometheus-rules" created
deployment "prometheus" created
service "prometheus-service" created

Next we deploy node-exporter. To collect metrics from every node, we deploy it as a DaemonSet:

kind: DaemonSet
apiVersion: apps/v1beta2
metadata:
  labels:
    app: node-exporter
  name: node-exporter
  namespace: ingress-nginx
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
        - name: node-exporter
          image: prom/node-exporter:v0.16.0
          ports:
            - containerPort: 9100
              protocol: TCP
              name: http
      hostNetwork: true
      hostPID: true
      tolerations:
        - effect: NoSchedule
          operator: Exists
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: node-exporter
  name: node-exporter-service
  namespace: ingress-nginx
spec:
  ports:
    - name: http
      port: 9100
      nodePort: 31672
      protocol: TCP
  type: NodePort
  selector:
    app: node-exporter

Save the above as node-exporter.yaml, then run:

$ kubectl apply -f node-exporter.yaml
daemonset "node-exporter" created
service "node-exporter-service" created
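With node-exporter running, the prometheus-rules ConfigMap defined earlier can be extended with further alerts based on its metrics. Below is a sketch of an additional entry under that ConfigMap's data section, written in the same style as the existing NodeCPUUsage rule; it assumes the node-exporter 0.16+ metric names node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes, so adjust the expression to whatever your exporter actually reports:

  memory-usage.rule: |
    groups:
      - name: NodeMemoryUsage
        rules:
          - alert: NodeMemoryUsage
            # fires when less than 20% of memory has been available for 2 minutes
            expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 20
            for: 2m
            labels:
              severity: "page"
            annotations:
              summary: "{{$labels.instance}}: High memory usage detected"
              description: "{{$labels.instance}}: available memory is below 20% (current value: {{ $value }})"

Note that for either rule file to actually be evaluated, the rule_files section of prometheus.yml would also need to point at the mounted rules directory (e.g. /etc/prometheus/rules/*.rule), which is left commented out in the config above.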

Next, expose the service so we can reach the Prometheus UI. Check the NodePort:

[root@dtdream-dtwarebase-prod-k8s-01 monitoring]# kubectl -s10.90.2.100:8080 -ningress-nginx get svc,po -owide
NAME                        TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE       SELECTOR
svc/node-exporter-service   NodePort   10.254.208.254   <none>        9100:31672/TCP   55s       app=node-exporter
svc/prometheus-service      NodePort   10.254.187.175   <none>        9090:25759/TCP   3m        app=prometheus

NAME                             READY     STATUS    RESTARTS   AGE       IP             NODE
po/node-exporter-b47ch           1/1       Running   0          54s       10.90.2.102    10.90.2.102
po/node-exporter-q88pp           1/1       Running   0          54s       10.90.2.100    10.90.2.100
po/prometheus-7b7fd77c44-7cf6z   1/1       Running   0          3m        172.17.21.28   10.90.2.101

Then open the Prometheus UI in a browser through the NodePort shown above (port 9090 is mapped to 25759 here), e.g. http://10.90.2.101:25759.

image

You can switch to the Targets page under Status to check whether the scraped data is coming in normally:

image

The hints shown under Targets can be used to troubleshoot and fix any scrapes that are failing.

Querying the Monitoring Data

Prometheus provides an HTTP API for querying the data, and its query language can likewise be used for complex queries; the web UI above offers basic querying and graphing.

For example, to query the CPU usage of each pod, the query looks like this:

sum by (pod_name)( rate(container_cpu_usage_seconds_total{image!="", pod_name!=""}[1m] ) )

Note that the label names pod_name and image here need to match the labels actually present in the data you are scraping.

Installing Grafana

Prometheus is now collecting our metrics, so next we need a more powerful charting tool; Grafana is the obvious choice. Again we install it in the Kubernetes environment, with the following YAML file:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: ingress-nginx
  name: grafana
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: grafana
      app.kubernetes.io/part-of: ingress-nginx
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: grafana
        app.kubernetes.io/part-of: ingress-nginx
    spec:
      containers:
        - image: grafana/grafana
          name: grafana
          ports:
            - containerPort: 3000
              protocol: TCP
          resources:
            limits:
              cpu: 500m
              memory: 2500Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: data
      restartPolicy: Always
      volumes:
        - emptyDir: {}
          name: data
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: ingress-nginx
spec:
  ports:
    - port: 3000
      protocol: TCP
      targetPort: 3000
  selector:
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: ingress-nginx
  type: NodePort
---

Save the above as grafana.yaml, then run:

$ kubectl apply -f grafana.yaml
deployment "grafana" created
service "grafana" created

You can optionally use an Ingress to expose the service for external access; to reach the Grafana web UI here I simply use the NodePort directly.
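If you do prefer the Ingress route, a minimal sketch might look like the following; the host grafana.example.com is a placeholder, and the extensions/v1beta1 API version matches the era of the other manifests in this article:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana
  namespace: ingress-nginx
spec:
  rules:
    - host: grafana.example.com          # placeholder hostname, replace with your own
      http:
        paths:
          - path: /
            backend:
              serviceName: grafana       # the Grafana Service created above
              servicePort: 3000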

Check the Grafana access port:

$ kubectl -ningress-nginx get svc,po | grep grafana
svc/grafana                      NodePort   10.254.86.182    <none>        3000:7006/TCP    2m
po/grafana-85fbffb76f-x6hqw      1/1        Running          0             2m

Visit http://10.90.2.101:7006

image

Add the Prometheus instance we deployed above as a data source in Grafana.

image
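The data source URL would typically be the in-cluster Service address, e.g. http://prometheus-service.ingress-nginx:9090 (derived from the prometheus-service Service created earlier). As an alternative to clicking through the UI, Grafana can also provision data sources from a file; a minimal sketch of such a provisioning file, which would sit under /etc/grafana/provisioning/datasources/, could look like this:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                                        # Grafana proxies queries server-side
    url: http://prometheus-service.ingress-nginx:9090    # in-cluster Service address
    isDefault: true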

Then add our dashboard. You can use https://grafana.com/dashboards/162: download the dashboard's JSON file from that page and import it directly into Grafana. Note that some of its parameters need to be adjusted to match the data your Prometheus actually collects; for example, the container name label collected here is name rather than io_kubernetes_container_name. The final dashboard looks like this:

image

The YAML files used above can be found on GitHub: https://github.com/jcops/k8s-yaml/tree/master/monitoring

Feel free to follow the 程序员同行者 subscription account, a platform for sharing and exchanging experience in ops automation and development: Linux, Python, Django, SaltStack, Redis, Golang, Docker, Kubernetes, Vue, and more.


Talented people are not what's frightening; what's frightening is that they work even harder than we do!

If you found this useful, please don't forget to forward, share, and like it so more people can learn from it. It takes only a moment and is the best support you can give. Thank you very much!



This article is from: 简书 (Jianshu)

Thanks to the author: 程序员同行者

Original article: Prometheus神器之监控K8s集群
