A Brief Introduction to Prometheus

Karl_Zhang

I. What Is Prometheus?

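Prometheus is an open-source monitoring and alerting system written in Go (Golang) that can monitor the state of a cluster in real time. Many companies and organizations now use Prometheus, and the project has a very mature developer community.

The project has 33.4k stars on GitHub (at the time of writing).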

(Figure: the Prometheus web UI)

II. Starting Prometheus

1. Download Prometheus from https://prometheus.io/download/ and choose a suitable version; I used prometheus-2.17.0-rc.0.linux-amd64.

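2. Edit the prometheus.yml file in the Prometheus directory. In global you can set how often Prometheus scrapes data and evaluates rules; in rule_files, the paths of the rule files; and in scrape_configs, the data sources. How Prometheus collects data and what rules are used for are discussed below. The overall structure of the file, as given in the official configuration reference (square brackets mark optional parameters), is: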

global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]

# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Settings related to the experimental remote write feature.
remote_write:
  [ - <remote_write> ... ]

# Settings related to the experimental remote read feature.
remote_read:
  [ - <remote_read> ... ]

3. Run ./prometheus to start the Prometheus server. It listens on port 9090 by default, so the web UI is at http://localhost:9090.

III. Architecture

(Figure: Prometheus architecture)

Data collection:

Prometheus can collect data in two ways: pull and push.
1. For pull, Prometheus provides client libraries for Go, Java, Scala, Python, Ruby and other languages. Here is a brief look at the Go library.

package main

import (
    "fmt"
    "math"
    "math/rand"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    // Counter: a value that only ever increases.
    TestCounter = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "test_counter",
        Help: "test_counter",
    })
    // Gauge: a value that can go up and down.
    TestGauge = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "test_gauge",
        Help: "test_gauge",
    })
    // Histogram: counts observations into buckets 20, 25, 30, 35, 40 (and +Inf).
    TestHistogram = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name:    "test_histogram",
        Help:    "test_histogram",
        Buckets: prometheus.LinearBuckets(20, 5, 5),
    })
    // Summary: tracks the 50th, 90th and 99th percentiles with the given
    // absolute error tolerances.
    TestSummary = prometheus.NewSummary(prometheus.SummaryOpts{
        Name:       "test_summary",
        Help:       "test_summary",
        Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
    })
)

func main() {
    // Register the metrics with the default registry so that
    // promhttp.Handler() exposes them.
    prometheus.MustRegister(TestGauge, TestHistogram, TestSummary, TestCounter)

    // Update the metrics with fake data every 2 seconds.
    go func() {
        i := 0.0
        for {
            TestGauge.Add(1)
            TestCounter.Add(1)
            // Observed values oscillate roughly between 18 and 42.
            TestHistogram.Observe(30 + math.Floor(float64(rand.Intn(120))*math.Sin(i*0.1))/10)
            TestSummary.Observe(30 + math.Floor(float64(rand.Intn(120))*math.Sin(i*0.1))/10)
            time.Sleep(2 * time.Second)
            i++
        }
    }()

    // Expose the metrics at http://localhost:2112/metrics for Prometheus to pull.
    http.Handle("/metrics", promhttp.Handler())
    if err := http.ListenAndServe("localhost:2112", nil); err != nil {
        fmt.Println(err)
    }
}

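First we create Prometheus metrics of the different types: Counter, Gauge, Histogram and Summary (see https://prometheus.io/docs/concepts/metric_types/ for details). Then we register them with Prometheus and expose an HTTP port from which Prometheus can pull the data. For Prometheus to find that port, the data source must be configured in the prometheus.yml file mentioned earlier: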

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'myapp'
    static_configs:
    - targets: ['localhost:2112']
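To see what Prometheus actually pulls from the myapp target, it helps to know that /metrics is just plain text. The following sketch serves one counter on localhost:2112 without the client library by writing the text exposition format by hand. The metric name hand_written_scrapes_total and the helper formatCounter are hypothetical; in real code the client library shown above is the better choice, since it handles escaping, content negotiation and concurrency for you.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"sync/atomic"
)

// formatCounter renders a single counter in the Prometheus text
// exposition format: a HELP line, a TYPE line and one sample line.
// It does not escape the help text, so keep it free of backslashes
// and newlines.
func formatCounter(name, help string, value float64) string {
	return fmt.Sprintf("# HELP %s %s\n# TYPE %s counter\n%s %s\n",
		name, help, name, name, strconv.FormatFloat(value, 'g', -1, 64))
}

// scrapes counts how many times /metrics has been requested.
var scrapes int64

func main() {
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		n := atomic.AddInt64(&scrapes, 1)
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, formatCounter("hand_written_scrapes_total",
			"Number of times this endpoint was scraped.", float64(n)))
	})
	// Same address as the myapp job in prometheus.yml above.
	if err := http.ListenAndServe("localhost:2112", nil); err != nil {
		fmt.Println(err)
	}
}
```

Run it instead of the client-library program (both use port 2112); after a few scrapes, querying hand_written_scrapes_total in the web UI should show a steadily growing value.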

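2. For push, a Pushgateway acts as the collector for the data. First download the Pushgateway from https://prometheus.io/download/, unpack it, and run ./pushgateway in its directory to start it; its default port is 9091. As with pull, the Pushgateway's address must be added to prometheus.yml as a scrape target (Prometheus still pulls from the Pushgateway); I won't repeat that here.
Then send the data you want Prometheus to receive to the Pushgateway over HTTP: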

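package main

import (
    "fmt"
    "os/exec"
    "strconv"
    "time"
)

func main() {
    // Every 2 seconds, push job1.txt ... job5.txt to the Pushgateway,
    // using the file number as the job name.
    ticker := time.NewTicker(2 * time.Second)
    for range ticker.C {
        for i := 1; i < 6; i++ {
            n := strconv.Itoa(i)
            // Call curl directly instead of through bash. Run (unlike Start)
            // waits for curl to exit, so the process is reaped and its
            // exit status is reported.
            cmd := exec.Command("curl", "-XPOST", "--data-binary", "@job"+n+".txt",
                "localhost:9091/metrics/job/"+n)
            if err := cmd.Run(); err != nil {
                fmt.Println(err)
                continue
            }
            fmt.Println("sent job" + n + ".txt")
        }
    }
}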
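Shelling out to curl works, but the same push can be done with Go's standard library alone. This is a minimal sketch, assuming the Pushgateway runs on localhost:9091 and accepts POST /metrics/job/<job> with a text-format body, exactly as in the curl command above; pushURL, withTrailingNewline and the sample metric are illustrative names of my own.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// pushURL builds the Pushgateway endpoint for a job, e.g.
// http://localhost:9091/metrics/job/1.
func pushURL(gateway, job string) string {
	return gateway + "/metrics/job/" + url.PathEscape(job)
}

// withTrailingNewline makes sure the body ends with '\n'; the text
// format requires every line, including the last, to be terminated.
func withTrailingNewline(body string) string {
	if !strings.HasSuffix(body, "\n") {
		body += "\n"
	}
	return body
}

// push POSTs metrics in the text exposition format to the Pushgateway.
func push(gateway, job, body string) error {
	resp, err := http.Post(pushURL(gateway, job), "text/plain; version=0.0.4",
		strings.NewReader(withTrailingNewline(body)))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("push to job %q failed: %s", job, resp.Status)
	}
	return nil
}

func main() {
	// Same shape of data as in the job files: name{labels} value.
	body := `slot{label1="a",label2="b"} 0`
	if err := push("http://localhost:9091", "1", body); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("sent job 1")
}
```

Doing the request in-process avoids spawning a shell and a curl process per push and lets the program inspect the HTTP status directly.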

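The code uses curl to send data to the Pushgateway. Each job file contains lines in the following format (every line, including the last, must end with a newline):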

slot{label1="xxx", label2="xxx"} 0

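This is the most basic form of data in Prometheus, and data like this is called a metric. Here "slot" is the metric name; the braces hold the metric's labels, which distinguish different metrics and are used when querying with PromQL; and 0 is the value. The value changes over time, so metrics are time-series data, and older data is stored in Prometheus's own TSDB.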
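To make the format concrete, here is a small parser for a single sample line such as the one above. It is a simplified sketch, not the parser Prometheus itself uses: it assumes label values contain no commas and ignores an optional trailing timestamp. The Sample type and parseSample function are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Sample is one line of the text format: a metric name, its labels and a value.
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// parseSample parses lines like `slot{label1="xxx", label2="xxx"} 0`.
// Simplification: label values must not contain commas.
func parseSample(line string) (Sample, error) {
	line = strings.TrimSpace(line)
	s := Sample{Labels: map[string]string{}}
	rest := ""
	if open := strings.IndexByte(line, '{'); open >= 0 {
		end := strings.LastIndexByte(line, '}')
		if end < open {
			return s, fmt.Errorf("unbalanced braces in %q", line)
		}
		s.Name = line[:open]
		for _, pair := range strings.Split(line[open+1:end], ",") {
			pair = strings.TrimSpace(pair)
			if pair == "" {
				continue // tolerate a trailing comma
			}
			k, v, ok := strings.Cut(pair, "=")
			if !ok {
				return s, fmt.Errorf("bad label pair %q", pair)
			}
			// Values are double-quoted; \\, \" and \n escapes are
			// understood by strconv.Unquote.
			val, err := strconv.Unquote(strings.TrimSpace(v))
			if err != nil {
				return s, fmt.Errorf("bad label value in %q: %w", pair, err)
			}
			s.Labels[strings.TrimSpace(k)] = val
		}
		rest = line[end+1:]
	} else {
		name, r, ok := strings.Cut(line, " ")
		if !ok {
			return s, fmt.Errorf("missing value in %q", line)
		}
		s.Name, rest = name, r
	}
	fields := strings.Fields(rest) // the value, then an optional timestamp
	if len(fields) == 0 {
		return s, fmt.Errorf("missing value in %q", line)
	}
	v, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return s, err
	}
	s.Value = v
	return s, nil
}

func main() {
	s, err := parseSample(`slot{label1="xxx", label2="xxx"} 0`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(s.Name, s.Labels, s.Value)
}
```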

Prometheus server

This section focuses on Prometheus's TSDB and HTTP server.

TSDB

First, the on-disk layout of the data:

-bash-4.2$ tree data
data
├── 01EQWQPHYCYQV25DGKWDXX01P5    # block
│   ├── chunks
│   │   └── 000001  # compressed time-series data
│   ├── index  # index used by queries
│   ├── meta.json  # block metadata
│   └── tombstones  # deletion markers
├── lock
├── queries.active
└── wal  # write-ahead log, prevents data loss
    ├── 00001561
    ├── 00001562
    ├── 00001563
    ├── 00001564
    └── checkpoint.001560
        └── 00000000

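Time-series data is persisted to disk in blocks. Each block contains chunks, index, meta.json and tombstones:
1. meta.json stores the block's metadata, including its time window, the number of samples, series and chunks it holds, and its compaction level.
2. chunks holds all samples within the block's time window; it is where the data is actually stored.
3. tombstones records deletions. Deleted metrics are not removed from the block right away but marked in tombstones, and are only really removed once every metric in the block has been deleted or the block is compacted.
4. index is the index file that queries rely on.
5. The wal directory, which sits beside the blocks rather than inside them, prevents data loss: if Prometheus crashes, on restart it first replays the WAL into memory.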

(Figure: how blocks are persisted)

Each block can be viewed as a small database, and its index file is the database index. It is an inverted index, which makes lookups in Prometheus efficient.

┌────────────────────────────┬─────────────────────┐
│ magic(0xBAAAD700) <4b>     │ version(1) <1 byte> │
├────────────────────────────┴─────────────────────┤
│ ┌──────────────────────────────────────────────┐ │
│ │                 Symbol Table                 │ │
│ ├──────────────────────────────────────────────┤ │
│ │                    Series                    │ │
│ ├──────────────────────────────────────────────┤ │
│ │                  Postings 1                  │ │
│ ├──────────────────────────────────────────────┤ │
│ │                      ...                     │ │
│ ├──────────────────────────────────────────────┤ │
│ │                  Postings N                  │ │
│ ├──────────────────────────────────────────────┤ │
│ │            Postings Offset Table             │ │
│ ├──────────────────────────────────────────────┤ │
│ │                      TOC                     │ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘

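1. TOC: the starting offset of each section of the index file.
2. Postings offset table: each entry holds a label name/value pair and the offset of that pair's series list in the postings section.
3. Postings: each label pair maps to one list of series IDs; this is where the series ID lists are actually stored.
4. Series: records which series the block contains, each series' labels (as references into the symbol table), which chunks each series has, and the start and end time of each chunk.
5. Symbol table: stores every label name and label value used by the block's series, deduplicated and numbered, which keeps the index file small.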

(Figure: the index lookup flow)

HTTP Server

Below are some of the HTTP APIs Prometheus provides:
Instant queries: /api/v1/query?query=up&time=xxx
Range queries: /api/v1/query_range?query=up&start=xxx&end=xxx&step=xxx
Querying metadata: /api/v1/series
Getting label names: /api/v1/labels
Targets: /api/v1/targets
Rules: /api/v1/rules
Alerts: /api/v1/alerts
TSDB Admin: /api/v1/admin/tsdb (Snapshot, Delete Series, Clean Tombstones)
For details, see https://prometheus.io/docs/prometheus/latest/querying/api/

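Besides using the Prometheus web UI, you can call the Prometheus API directly over HTTP to get the data you need. Briefly:
1. The first two are instant and range queries: given a PromQL expression (such as a metric name) and a time or time range, they return the query result.
2. Querying metadata returns the series that match the given label matchers; using a metric name as the matcher finds all of that metric's series.
3. Getting label names returns all label names.
4. Targets returns all of Prometheus's scrape targets, including Prometheus itself, the Pushgateway, node exporter and so on.
5. Rules returns the rules loaded by Prometheus, such as alerting rules, whose conditions are written in PromQL.
6. Alerts returns all active alerts.
7. The TSDB admin API exposes operations on the stored data and is only available when Prometheus is started with --web.enable-admin-api. Snapshot creates a snapshot of the current data and returns the name of the snapshot directory (created under snapshots in the data directory); Delete Series deletes the specified series. Deleted data is first recorded in tombstones and is only removed when the block is compacted, but you can also call the Clean Tombstones endpoint to remove it right away.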

PromQL

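Prometheus provides PromQL for querying the data in its TSDB. PromQL can be entered both in the Prometheus web UI and in Grafana. The basic usage is to enter a metric name together with the labels you want, which returns the matching time series. See https://prometheus.io/docs/prometheus/latest/querying/basics/#functions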
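In its simplest form a PromQL query is a metric name plus label matchers in braces, for example slot{label1="xxx"}. When such selectors are built in code it is easy to get the quoting wrong, so here is a small sketch that assembles an equality-only selector with sorted labels and escaped values (PromQL string literals follow Go's escaping rules). PromQL also supports !=, =~ and !~ matchers, which this helper deliberately leaves out; the selector function is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// escaper escapes backslashes, double quotes and newlines for use
// inside a double-quoted PromQL string.
var escaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// selector builds name{k1="v1",k2="v2"} with labels sorted by name.
func selector(name string, labels map[string]string) string {
	if len(labels) == 0 {
		return name
	}
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = k + `="` + escaper.Replace(labels[k]) + `"`
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

func main() {
	q := selector("slot", map[string]string{"label1": "xxx", "label2": "xxx"})
	fmt.Println(q)
	// Selectors can be wrapped in functions, e.g. the per-second rate of
	// the counter from the client-library example over 5 minutes:
	fmt.Println("rate(" + selector("test_counter", nil) + "[5m])")
}
```

Sorting the labels makes the generated query deterministic, which helps when queries are logged or cached.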

(Figure: entering PromQL in Grafana)


Source: Jianshu (简书)

Author: Karl_Zhang

Original article: 简单介绍Prometheus
