Prometheus-2: blackbox_exporter black-box monitoring

cnblogs   2023-07-02 12:35:04



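Black-box monitoring with blackbox_exporter
The previous posts introduced many exporters that expose metrics directly to Prometheus. This is called "white-box monitoring". But what about metrics those exporters cannot collect, or services that expose no metrics to Prometheus at all? That is where blackbox_exporter and "black-box monitoring" come in. blackbox_exporter probes endpoints over HTTP, HTTPS, DNS, TCP, and ICMP, and it can also report when an SSL/TLS certificate expires.
Deploying and using blackbox_exporter
Deploying blackbox_exporter
This example uses a Linux binary deployment. Download the release package: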
curl -LO https://github.com/prometheus/blackbox_exporter/releases/download/v0.22.0/blackbox_exporter-0.22.0.linux-amd64.tar.gz
Unpack the archive and create a version-independent symlink:
tar xf blackbox_exporter-0.22.0.linux-amd64.tar.gz -C /usr/local/
ln -sv /usr/local/blackbox_exporter-0.22.0.linux-amd64 /usr/local/blackbox_exporter
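Create a dedicated user; skip this step if the prometheus user already exists. (The unit file below actually runs the service as root, which the ICMP probe needs in order to open raw sockets; to run it as prometheus instead, the process needs the CAP_NET_RAW capability.)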
useradd -r prometheus
Create a systemd unit file and save it as /usr/lib/systemd/system/blackbox_exporter.service:
[Unit]
Description=blackbox_exporter
After=network.target

[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/blackbox_exporter/blackbox_exporter \
    --config.file=/usr/local/blackbox_exporter/blackbox.yml \
    --web.listen-address=:9115
Restart=on-failure

[Install]
WantedBy=multi-user.target
Start the service:
systemctl daemon-reload
systemctl start blackbox_exporter.service
systemctl enable blackbox_exporter.service
Verify the listening port and test fetching the exposed metrics:
ss -tnlp | grep "9115"
curl localhost:9115/metrics
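The Blackbox Exporter web UI is then available at the following URL, where <node-address> should be replaced with the node's actual address:
http://<node-address>:9115/
ICMP monitoring: checking whether hosts are alive
Add the corresponding job to Prometheus; Blackbox can run with its default configuration for this. Edit the Prometheus configuration file:
vi /usr/local/prometheus/prometheus.yml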
- job_name: "icmp_ping"
  metrics_path: /probe
  params:
    module: [icmp]  # use the icmp module
  file_sd_configs:
    - refresh_interval: 10s  # how often Prometheus re-reads the target files
      files:
        - "ping/ping_status*.yml"  # path to the target files
  relabel_configs:
    - source_labels: [__address__]
      regex: (.*)(:80)?
      target_label: __param_target
      replacement: ${1}
    - source_labels: [__param_target]
      target_label: instance
    - source_labels: [__param_target]
      regex: (.*)
      target_label: ping
      replacement: ${1}
    - source_labels: []
      regex: .*
      target_label: __address__
      replacement: 127.0.0.1:9115
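After relabeling, Prometheus does not scrape the target itself: it scrapes blackbox_exporter's /probe endpoint, passing the module from params and the value of __param_target as query parameters. A minimal sketch of the URL that results, assuming the exporter listens on 127.0.0.1:9115 as in the job above (build_probe_url is a hypothetical helper, not part of Prometheus):

```python
from urllib.parse import urlencode


def build_probe_url(exporter: str, module: str, target: str) -> str:
    """Return the /probe URL Prometheus would scrape for one target.

    `exporter` is host:port of blackbox_exporter, `module` comes from
    params.module and `target` from the __param_target label.
    """
    # urlencode percent-escapes characters such as ':' and '/' in the target
    query = urlencode([("module", module), ("target", target)])
    return f"http://{exporter}/probe?{query}"


if __name__ == "__main__":
    print(build_probe_url("127.0.0.1:9115", "icmp", "www.baidu.com"))
```

Opening such a URL with curl or a browser returns the same metrics Prometheus receives for that target, which is handy when a probe behaves unexpectedly.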
There are quite a few relabel operations here; the next post covers them in detail.
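Until then, here is a rough Python sketch of what the four rules above do to a single discovered target. It assumes Prometheus's behaviour of anchoring relabel regexes at both ends (emulated with re.fullmatch) and models only the default replace action; relabel_icmp is a hypothetical helper. One thing it makes visible: because (.*) is greedy, the pattern (.*)(:80)? never actually strips a :80 suffix, whereas a lazy (.*?)(:80)? would.

```python
import re


def relabel_icmp(address: str, target_regex: str = r"(.*)(:80)?") -> dict:
    """Apply the icmp_ping job's four relabel rules to one __address__ value.

    Prometheus anchors relabel regexes at both ends; re.fullmatch emulates that.
    Only the default "replace" action is modelled.
    """
    labels = {"__address__": address}

    # Rule 1: __address__ -> __param_target, replacement ${1}
    m = re.fullmatch(target_regex, labels["__address__"])
    if m:  # on no match, Prometheus leaves the target label untouched
        labels["__param_target"] = m.group(1)

    # Rule 2: copy __param_target into instance (missing labels read as "")
    labels["instance"] = labels.get("__param_target", "")

    # Rule 3: copy __param_target into a custom "ping" label via (.*) -> ${1}
    labels["ping"] = re.fullmatch(r"(.*)", labels.get("__param_target", "")).group(1)

    # Rule 4: empty source_labels join to "", which matches .*, so the
    # scrape address is always rewritten to the exporter
    labels["__address__"] = "127.0.0.1:9115"
    return labels


if __name__ == "__main__":
    print(relabel_icmp("www.baidu.com"))
    print(relabel_icmp("10.0.0.1:80"))                  # greedy: ":80" is kept
    print(relabel_icmp("10.0.0.1:80", r"(.*?)(:80)?"))  # lazy: ":80" is dropped
```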
Create the matching ping directory:
cd /usr/local/prometheus/
mkdir ping
cd ping
vi ping_status.yml
- targets: ["monitor.example.com"]
  labels:
    group: "jump host"
- targets: ["10.xx.xx.xx", "10.xx.xx.xx", "10.xx.xx.xx"]
  labels:
    group: "k8s cluster"
- targets: ["www.baidu.com"]
  labels:
    group: "Baidu"
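A file matched by file_sd_configs holds a list of target groups, each with a targets list and an optional labels map of string values. The hypothetical validate_target_groups below is a minimal shape check in that spirit, run on the data of ping_status.yml already loaded into Python; reading the YAML itself (for example with PyYAML) is left out so the sketch stays dependency-free, and flagging unknown keys is a choice of this sketch.

```python
from typing import Any


def validate_target_groups(groups: Any) -> list[str]:
    """Return a list of problems found in file_sd data; an empty list means OK."""
    if not isinstance(groups, list):
        return ["top level must be a list of target groups"]
    problems = []
    for i, group in enumerate(groups):
        if not isinstance(group, dict):
            problems.append(f"group {i}: must be a mapping")
            continue
        # Only "targets" and "labels" belong in a target group
        unknown = set(group) - {"targets", "labels"}
        if unknown:
            problems.append(f"group {i}: unknown keys {sorted(unknown)}")
        targets = group.get("targets")
        if not isinstance(targets, list) or not all(isinstance(t, str) for t in targets):
            problems.append(f"group {i}: 'targets' must be a list of strings")
        labels = group.get("labels", {})
        if not isinstance(labels, dict) or not all(
            isinstance(k, str) and isinstance(v, str) for k, v in labels.items()
        ):
            problems.append(f"group {i}: 'labels' must map strings to strings")
    return problems


# ping_status.yml from above, expressed as Python data
PING_STATUS = [
    {"targets": ["monitor.example.com"], "labels": {"group": "jump host"}},
    {"targets": ["10.xx.xx.xx", "10.xx.xx.xx", "10.xx.xx.xx"], "labels": {"group": "k8s cluster"}},
    {"targets": ["www.baidu.com"], "labels": {"group": "Baidu"}},
]

if __name__ == "__main__":
    print(validate_target_groups(PING_STATUS) or "ping_status.yml looks fine")
```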
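Once configured, check the configuration syntax and have Prometheus reload it. The /-/reload endpoint only works if Prometheus was started with --web.enable-lifecycle.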
cd /usr/local/prometheus
./promtool check config prometheus.yml
curl -XPOST monitor.example.com:9090/-/reload
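Open the Prometheus web UI and you can see that the hosts' ICMP status is now being monitored.
HTTP monitoring
Edit the main Prometheus configuration file prometheus.yml and add something like the following to probe the target sites: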
# Blackbox Exporter
- job_name: "http_get_status"
  metrics_path: /probe
  params:
    module: [http_2xx]  # Look for a HTTP 200 response.
  file_sd_configs:
    - refresh_interval: 2m
      files:
        - "httpget/http_get*.yml"  # the target files
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: "monitor.example.com:9115"  # points at the actual Blackbox exporter.
    - target_label: region
      replacement: "local"
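Create the target file. Like ping_status.yml, it is a plain list of target groups; static_configs and refresh_interval belong in prometheus.yml, not in a discovery file:
vi httpget/http_get.yml
- targets:
    - "https://monitor.example.com"
    - "http://monitor.example.com:8080"
    - "www.google.com"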
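The target list mixes full URLs with a bare host (www.google.com). As far as we understand, blackbox's HTTP prober prefixes http:// when a target has no scheme, so a bare host is probed over plain HTTP rather than HTTPS. The hypothetical normalize_http_target below only sketches that assumption:

```python
def normalize_http_target(target: str) -> str:
    """Prefix http:// to a target without an http(s) scheme (assumed prober behaviour)."""
    if target.startswith(("http://", "https://")):
        return target
    return "http://" + target


if __name__ == "__main__":
    for t in ["https://monitor.example.com", "http://monitor.example.com:8080", "www.google.com"]:
        print(t, "->", normalize_http_target(t))
```

To probe a host over HTTPS, list it with an explicit https:// prefix, as the first target does.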
Reload Prometheus:
curl -XPOST monitor.example.com:9090/-/reload
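Note that with blackbox, a State of UP in Prometheus does not by itself mean the target is healthy. For example, if we add a non-existent domain such as http://www.buzhida2222o.com to the targets, it still shows as UP, yet its metrics show that the probe failed. This is expected rather than a bug: up only records whether Prometheus could scrape blackbox_exporter's /probe endpoint, while the result of the probe itself is reported in probe_success (1 for success, 0 for failure). Alert on probe_success, not on up.
TCP port monitoring
The steps are essentially the same as above, so here is the configuration directly: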
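To check this by hand, fetch the /probe URL for the bad domain and look at probe_success instead of up. A minimal sketch that parses the text exposition format returned by /probe and reads probe_success; the sample body is illustrative rather than captured output, and parse_metrics and probe_succeeded are hypothetical helpers:

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse 'name value' and 'name{labels} value' lines, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # rpartition splits on the last space, so label values containing
        # spaces stay intact (assumes no trailing timestamp on the line)
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def probe_succeeded(probe_body: str) -> bool:
    """True only if the /probe response reports probe_success 1."""
    return parse_metrics(probe_body).get("probe_success") == 1.0


# Illustrative /probe body for a target whose probe failed
SAMPLE_FAILED_PROBE = """\
# TYPE probe_success gauge
probe_success 0
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.012
"""

if __name__ == "__main__":
    print("probe ok" if probe_succeeded(SAMPLE_FAILED_PROBE) else "probe failed")
```

In Prometheus itself the equivalent check is simply the query probe_success == 0.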
- job_name: "tcp_port_status"
  metrics_path: /probe
  params:
    module: [tcp_connect]
  static_configs:
    - targets: ["monitor.example.com:80", "monitor.example.com:8080", "monitor.example.com:443"]
      labels:
        instance: "port_status"  # overwritten by the instance relabel rule below
        group: "tcp"
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: monitor.example.com:9115
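Conceptually, the tcp_connect module checks whether a TCP connection to host:port can be established. The sketch below approximates that check with the Python standard library to show what a passing or failing port probe means; it is not blackbox's implementation, and tcp_probe is a hypothetical name:

```python
import socket
import time


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> tuple[bool, float]:
    """Try to open a TCP connection; return (success, seconds taken)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # refused, unreachable, DNS failure or timeout all count as a failed probe
        ok = False
    return ok, time.monotonic() - start
```

For example, tcp_probe("monitor.example.com", 443, timeout=2.0) would report whether port 443 accepts connections and how long the attempt took, roughly what probe_success and probe_duration_seconds capture for each target of the job above.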
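The ports are now being monitored. That covers the commonly used black-box monitoring features. Once the jobs are configured, you can import a dashboard into Grafana to view the data more intuitively.
Customizing blackbox.yml
blackbox's default module configuration can also be customized, for example to add headers to HTTP GET requests, set body_size_limit, check whether the response body matches expectations, or adjust TLS settings. The examples in the official documentation are a good starting point: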
# GitHub repository
https://github.com/prometheus/blackbox_exporter
# Explanation of every blackbox.yml option (CONFIGURATION.md)
https://github.com/prometheus/blackbox_exporter/blob/master/CONFIGURATION.md
# Example configuration file (example.yml)
https://github.com/prometheus/blackbox_exporter/blob/master/example.yml
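Here is a demonstration of HTTPS with a private TLS certificate, monitoring when the certificate expires. First, modify the default blackbox.yml:
vim blackbox.yml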
modules:
  http_2xx:
    prober: http
    http:
      preferred_ip_protocol: "ip4"
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]  # HTTP/2 responses report "HTTP/2.0"
      valid_status_codes: [200, 301, 302, 303]
      tls_config:
        insecure_skip_verify: true
  http_ca_example:
    prober: http
    http:
      method: GET
      preferred_ip_protocol: "ip4"
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      fail_if_ssl: false
      fail_if_not_ssl: true
      tls_config:
        insecure_skip_verify: false
        ca_file: /usr/local/blackbox_exporter/certs/ca.crt
        cert_file: /usr/local/blackbox_exporter/certs/server.crt
        key_file: /usr/local/blackbox_exporter/certs/server.key
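http_2xx: builds on the default module by adding status-code validation and skipping TLS verification outright. This is the low-effort option, since certificates are not checked at all.
http_ca_example: a new module focused on TLS settings. ca_file is the CA used to verify the server's certificate, and cert_file/key_file are the certificate and private key that blackbox presents with its requests (a client certificate, for servers that require mutual TLS). fail_if_not_ssl: true makes the probe fail if the target is not served over TLS.
After configuring, restart the blackbox service: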
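valid_status_codes decides which HTTP status codes count as a successful probe. A small sketch of that rule, assuming the documented default that an empty or unset list accepts any 2xx code; status_code_ok is a hypothetical helper:

```python
from typing import Optional


def status_code_ok(code: int, valid_status_codes: Optional[list[int]] = None) -> bool:
    """Mirror the valid_status_codes rule: an empty or unset list means any 2xx."""
    if not valid_status_codes:
        return 200 <= code <= 299
    return code in valid_status_codes


if __name__ == "__main__":
    http_2xx_codes = [200, 301, 302, 303]  # the list configured in http_2xx above
    for code in (200, 204, 301, 404):
        print(code, status_code_ok(code), status_code_ok(code, http_2xx_codes))
```

Note that with an explicit list, 204 no longer passes even though it is a 2xx code. Also, blackbox follows redirects by default, so 301/302/303 mostly matter when redirect following has been turned off.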
systemctl restart blackbox_exporter.service
Then configure prometheus.yml to use the corresponding modules:
- job_name: "http_get_status"
  metrics_path: /probe
  params:
    module: [http_2xx]  # Look for a HTTP 200 response.
  file_sd_configs:
    - refresh_interval: 2m
      files:
        - "httpget/http_get*.yml"  # the target files
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: "monitor.example.com:9115"  # points at the actual Blackbox exporter.
    - target_label: region
      replacement: "local"
- job_name: "http_get_ca_status"
  metrics_path: /probe
  params:
    module: [http_ca_example]
  file_sd_configs:
    - refresh_interval: 2m
      files:
        - "httpget/http_ca.yml"
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: "monitor.example.com:9115"  # points at the actual Blackbox exporter.
    - target_label: region
      replacement: "beijing"
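Add the hosts to the corresponding discovery file (httpget/http_ca.yml for the new job), reload the Prometheus configuration, and check the monitoring status: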
curl -XPOST monitor.example.com:9090/-/reload
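Query the metrics. probe_http_duration_seconds{phase="tls"} shows how long the TLS handshake took:
probe_http_duration_seconds{phase="tls"}
The certificate's expiry time itself is exposed as probe_ssl_earliest_cert_expiry, a Unix timestamp:
probe_ssl_earliest_cert_expiry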
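The certificate expiry time is now being collected. You can import the dashboard with ID 13230 into Grafana and then set up an alerting rule, for example on probe_ssl_earliest_cert_expiry - time() < 86400 * 30 (the certificate expires within 30 days), to complete TLS certificate expiry monitoring.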
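Because probe_ssl_earliest_cert_expiry is a Unix timestamp, the remaining lifetime is just its difference from the current time. A minimal sketch of the arithmetic behind an alert condition such as probe_ssl_earliest_cert_expiry - time() < 86400 * 30; days_until_expiry and expires_within are hypothetical helpers:

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(expiry_ts: float, now_ts: Optional[float] = None) -> float:
    """Days from now until the certificate expires (negative if already expired)."""
    if now_ts is None:
        now_ts = time.time()
    return (expiry_ts - now_ts) / SECONDS_PER_DAY


def expires_within(expiry_ts: float, days: float, now_ts: Optional[float] = None) -> bool:
    """Same condition as: probe_ssl_earliest_cert_expiry - time() < 86400 * days."""
    return days_until_expiry(expiry_ts, now_ts) < days


if __name__ == "__main__":
    now = time.time()
    expiry = now + 10 * SECONDS_PER_DAY  # a hypothetical certificate expiring in 10 days
    print(round(days_until_expiry(expiry, now), 1), expires_within(expiry, 30, now))
```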
