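Monitoring Node.js with Prometheus + Grafana (Docker Edition)
Introduction
Prometheus + Grafana architecture diagram
Prometheus
What is Prometheus?
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open-source project, maintained independently of any company. To emphasize this and to clarify the project's governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.
Prometheus collects and stores its metrics as time series data: each metric value is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.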
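To make the data model concrete, here is a minimal TypeScript sketch of a single sample and of how it looks in the text format that Prometheus scrapes. The `Sample` type and the `renderSample` and `escapeLabelValue` functions are names made up for this illustration, not part of any Prometheus library, and the label values are example values.

```typescript
// One sample: metric name, optional labels, value, optional timestamp.
// These names are illustrative only; they do not come from a Prometheus library.
interface Sample {
  name: string;
  labels: { [key: string]: string };
  value: number;
  timestampMs?: number;
}

// Escape a label value for the text exposition format:
// backslash, double quote, and newline must be escaped.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render one sample as a single line of the text exposition format.
function renderSample(s: Sample): string {
  const pairs = Object.keys(s.labels).map(
    (k) => `${k}="${escapeLabelValue(s.labels[k])}"`
  );
  const labelPart = pairs.length > 0 ? `{${pairs.join(",")}}` : "";
  const tsPart = s.timestampMs !== undefined ? ` ${s.timestampMs}` : "";
  return `${s.name}${labelPart} ${s.value}${tsPart}`;
}

const line = renderSample({
  name: "nodejs_heap_size_used_bytes",
  labels: { instance: "192.168.1.10:8102", job: "file-service-local" },
  value: 52428800,
});
console.log(line);
// nodejs_heap_size_used_bytes{instance="192.168.1.10:8102",job="file-service-local"} 52428800
```

The metric name plus the full set of labels identifies one time series, so the same metric scraped from two instances is stored as two separate series.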
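Prometheus features
A multi-dimensional data model with time series data identified by metric name and key/value pairs
PromQL, a flexible query language that leverages this dimensionality
No reliance on distributed storage; single server nodes are autonomous
Time series collection happens via a pull model over HTTP
Pushing time series is supported via an intermediary gateway
Targets are discovered via service discovery or static configuration
Multiple modes of graphing and dashboarding support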
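Grafana
Grafana (official site) is an open-source application for visualizing large volumes of measurement data. You can think of it as a charting toolkit: you can lay out dashboards yourself or download ready-made dashboard templates from the Grafana website. To find a template:
- Click search
- Enter nodejs; many dashboards will show up
- Open any of them and note its template ID, which is used later in this article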
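Monitoring scenario
- This article demonstrates using Prometheus + Grafana to monitor the resource usage of a single service (node-server);
- Of the components in the architecture diagram above, Pushgateway and Alertmanager are not needed;
- Only the Prometheus server, Grafana, and the monitored service (node-server) are required.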
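Preparing the environment
- Windows 11;
- WSL2 (installation guide); downloads may require a proxy, depending on your network;
- Docker; I use Docker Desktop (download);
- The services to monitor; this article uses a local server and a pre-release service running in Docker.
Open an Ubuntu shell in the terminal (Ubuntu is the distribution I installed)
Downloading the images
- Pull them from the terminal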
docker pull prom/node-exporter
docker pull prom/prometheus
docker pull grafana/grafana
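(prom/node-exporter is optional; it is installed here so there is some data to look at)
- Alternatively, download the images in Docker Desktop: search for the image and pull it. If you go this route, do not run Prometheus and Grafana yet, since both need some configuration first, as shown below:
Starting node-exporter (optional)
- Start it from the terminal: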
docker run -d -p 9100:9100 \
-v "/proc:/host/proc:ro" \
-v "/sys:/host/sys:ro" \
-v "/:/rootfs:ro" \
prom/node-exporter
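- Open http://localhost:9100/metrics. If you see output like the screenshot below, the service started successfully:
- Note: node-exporter is written in Go and reports metrics about the host machine (mostly prefixed with node_), plus go_ metrics about the exporter process itself. None of these describe a Node.js application, which is what this article is about, so it is marked "optional".
Starting Prometheus
- In the terminal, create a prometheus directory and edit the configuration file prometheus.yml: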
mkdir /opt/prometheus
cd /opt/prometheus/
vim prometheus.yml
- Example contents of prometheus.yml:
global:
  scrape_interval: 5s # how often to scrape targets
  evaluation_interval: 5s # how often to evaluate rules
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['192.168.1.10:9100'] # note: use the host machine's LAN IP (inside the container, localhost is the container itself)
  - job_name: 'file-service-local' # add a service to monitor
    static_configs:
      - targets: ['192.168.1.10:8102'] # note: use the host machine's LAN IP
  - job_name: 'file-service-3.203' # add a service to monitor
    static_configs:
      - targets: ['192.168.3.203:32730'] # note: the IP and exposed port of the machine hosting this service
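- Notes:
- To monitor other services, keep the same format: each job_name has its own targets entry, and targets is an array that can hold several targets
- After changing the configuration file (prometheus.yml), stop and delete the prom/prometheus container (in Docker Desktop), then start Prometheus again
- Start it from the terminal: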
docker run -d \
-p 9090:9090 \
-v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus
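- Open http://localhost:9090/, enter a metric name, and click Execute. You should see something like the screenshot below:
- Open the dropdown menu and choose Targets to check whether the services listed in your configuration file are being scraped successfully: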
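The same target status shown on the Targets page is also available from the Prometheus HTTP API at /api/v1/targets. The sketch below is a rough TypeScript illustration of how that response could be summarized; the interfaces cover only the few fields it reads, `unhealthyTargets` is a hypothetical helper name, and the sample payload is hand-written for illustration rather than captured from a real server.

```typescript
// Subset of the GET /api/v1/targets response that this sketch reads.
interface ActiveTarget {
  labels: { [name: string]: string };
  scrapeUrl: string;
  health: string; // "up", "down", or "unknown"
  lastError: string;
}

interface TargetsResponse {
  status: string;
  data: { activeTargets: ActiveTarget[] };
}

// Return one readable line per target that is not currently "up".
function unhealthyTargets(resp: TargetsResponse): string[] {
  if (resp.status !== "success") {
    throw new Error("Prometheus API returned status " + resp.status);
  }
  return resp.data.activeTargets
    .filter((t) => t.health !== "up")
    .map(
      (t) =>
        `${t.labels["job"]} ${t.scrapeUrl}: ${t.health} (${t.lastError || "no error reported"})`
    );
}

// Hand-written sample payload using job names from prometheus.yml above.
const sample: TargetsResponse = {
  status: "success",
  data: {
    activeTargets: [
      {
        labels: { job: "prometheus", instance: "localhost:9090" },
        scrapeUrl: "http://localhost:9090/metrics",
        health: "up",
        lastError: "",
      },
      {
        labels: { job: "file-service-local", instance: "192.168.1.10:8102" },
        scrapeUrl: "http://192.168.1.10:8102/metrics",
        health: "down",
        lastError: "connection refused",
      },
    ],
  },
};

console.log(unhealthyTargets(sample));
```

To try it against the server started above, request http://localhost:9090/api/v1/targets (for example with fetch on a recent Node.js version), parse the JSON body, and pass it to the function.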
Starting Grafana
- In the terminal, create a grafana-storage directory for Grafana to store its data
mkdir /opt/grafana-storage
- Set permissions
chmod 777 -R /opt/grafana-storage
- Start it from the terminal:
docker run -d \
-p 3000:3000 \
--name=grafana \
-v /opt/grafana-storage:/var/lib/grafana \
grafana/grafana
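- Open http://localhost:3000/ and log in; the username and password are both admin. After logging in you are asked to change the password; you can set the same password again if you like
- On the home page, set up the data source, as shown:
- Create a dashboard by importing a JSON file or by entering a template ID to download it (see above). I used template ID 11159, as shown below:
- Every panel on the dashboard can be configured, edited, or removed; it is worth exploring the settings:
- My dashboard JSON, in case it is useful: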
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "datasource",
"uid": "grafana"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"target": {
"limit": 100,
"matchAny": false,
"tags": [],
"type": "dashboard"
},
"type": "dashboard"
}
]
},
"description": "node.js prometheus client basic metrics",
"editable": true,
"fiscalYearStartMonth": 0,
"gnetId": 11159,
"graphTooltip": 0,
"id": 1,
"links": [],
"liveNow": false,
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 19,
"x": 0,
"y": 0
},
"hiddenSeries": false,
"id": 7,
"interval": "5",
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"rightSide": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"paceLength": 10,
"percentage": false,
"pluginVersion": "9.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"expr": "process_resident_memory_bytes{instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Process Memory - {{instance}}",
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"expr": "nodejs_heap_size_total_bytes{instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Heap Total - {{instance}}",
"refId": "B"
},
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"expr": "nodejs_heap_size_used_bytes{instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Heap Used - {{instance}}",
"refId": "C"
},
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"expr": "nodejs_external_memory_bytes{instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "External Memory - {{instance}}",
"refId": "D"
}
],
"thresholds": [],
"timeRegions": [],
"title": "Process Memory Usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
],
"yaxis": {
"align": false
}
},
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [
{
"options": {
"match": "null",
"result": {
"text": "N/A"
}
},
"type": "special"
}
],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 3,
"w": 5,
"x": 19,
"y": 0
},
"id": 2,
"interval": "",
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "none",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"mean"
],
"fields": "",
"values": false
},
"textMode": "name"
},
"pluginVersion": "9.4.3",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"editorMode": "code",
"expr": "nodejs_version_info{}",
"format": "time_series",
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{version}}",
"refId": "A"
}
],
"title": "Node.js Version",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"fieldConfig": {
"defaults": {
"color": {
"fixedColor": "#F2495C",
"mode": "fixed"
},
"mappings": [
{
"options": {
"match": "null",
"result": {
"text": "N/A"
}
},
"type": "special"
}
],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 5,
"x": 19,
"y": 3
},
"id": 4,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "none",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"pluginVersion": "9.4.3",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"expr": "sum(changes(process_start_time_seconds{instance=~\"$instance\"}[1m]))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"title": "Process Restart Times",
"type": "stat"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 10,
"x": 0,
"y": 7
},
"hiddenSeries": false,
"id": 6,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"paceLength": 10,
"percentage": false,
"pluginVersion": "9.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"expr": "irate(process_cpu_user_seconds_total{instance=~\"$instance\"}[2m]) * 100",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "User CPU - {{instance}}",
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"expr": "irate(process_cpu_system_seconds_total{instance=~\"$instance\"}[2m]) * 100",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Sys CPU - {{instance}}",
"refId": "B"
}
],
"thresholds": [],
"timeRegions": [],
"title": "Process CPU Usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "percent",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
],
"yaxis": {
"align": false
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 6,
"x": 10,
"y": 7
},
"hiddenSeries": false,
"id": 9,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"paceLength": 10,
"percentage": false,
"pluginVersion": "9.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"expr": "nodejs_active_handles_total{instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Active Handler - {{instance}}",
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"expr": "nodejs_active_requests_total{instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Active Request - {{instance}}",
"refId": "B"
}
],
"thresholds": [],
"timeRegions": [],
"title": "Active Handlers/Requests Total",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
],
"yaxis": {
"align": false
}
},
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"description": "",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 7,
"w": 8,
"x": 16,
"y": 7
},
"id": 14,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"editorMode": "code",
"expr": "nodejs_eventloop_lag_seconds{}",
"legendFormat": "__auto__",
"range": true,
"refId": "A"
}
],
"title": "Event Loop Lag",
"type": "timeseries"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 8,
"x": 0,
"y": 14
},
"hiddenSeries": false,
"id": 10,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"rightSide": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"paceLength": 10,
"percentage": false,
"pluginVersion": "9.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"expr": "nodejs_heap_space_size_total_bytes{instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Heap Total - {{instance}} - {{space}}",
"refId": "A"
}
],
"thresholds": [],
"timeRegions": [],
"title": "Heap Total Detail",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
],
"yaxis": {
"align": false
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 8,
"x": 8,
"y": 14
},
"hiddenSeries": false,
"id": 11,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"rightSide": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"paceLength": 10,
"percentage": false,
"pluginVersion": "9.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"expr": "nodejs_heap_space_size_used_bytes{instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Heap Used - {{instance}} - {{space}}",
"refId": "A"
}
],
"thresholds": [],
"timeRegions": [],
"title": "Heap Used Detail",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
],
"yaxis": {
"align": false
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 8,
"x": 16,
"y": 14
},
"hiddenSeries": false,
"id": 12,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"rightSide": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"paceLength": 10,
"percentage": false,
"pluginVersion": "9.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"expr": "nodejs_heap_space_size_available_bytes{instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Heap Available - {{instance}} - {{space}}",
"refId": "A"
}
],
"thresholds": [],
"timeRegions": [],
"title": "Heap Available Detail",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
],
"yaxis": {
"align": false
}
}
],
"refresh": "5s",
"revision": 1,
"schemaVersion": 38,
"style": "dark",
"tags": [
"nodejs"
],
"templating": {
"list": [
{
"allValue": "",
"current": {
"selected": true,
"text": [
"192.168.1.10:8102"
],
"value": [
"192.168.1.10:8102"
]
},
"datasource": {
"type": "prometheus",
"uid": "hpQAqjxVz"
},
"definition": "label_values(nodejs_version_info, instance)",
"hide": 0,
"includeAll": false,
"label": "instance",
"multi": true,
"name": "instance",
"options": [],
"query": {
"query": "label_values(nodejs_version_info, instance)",
"refId": "StandardVariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "NodeJS Application Dashboard",
"uid": "PTSqcpJWk",
"version": 27,
"weekStart": ""
}
- Note: this Node.js dashboard does not show data from the node-exporter target. Different kinds of targets need different dashboards; search the Grafana website for one that fits.
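One way to see why the node-exporter target stays empty on this dashboard is to list the metrics its panels query. In the JSON above they are all nodejs_ or process_ metrics, and the instance variable is filled from label_values(nodejs_version_info, instance), so a target that does not expose those metrics never appears. The TypeScript sketch below walks a panels array and collects metric names. `metricNames` and `dashboardMetrics` are hypothetical helpers, and the regular expression is a heuristic that fits the selectors in this dashboard; it is not a PromQL parser.

```typescript
interface Target {
  expr?: string;
}

interface Panel {
  title?: string;
  targets?: Target[];
  panels?: Panel[]; // nested panels, for dashboards that use rows
}

// Heuristic: treat an identifier directly followed by "{" as a metric name.
// This matches the selectors in the dashboard above but is not a PromQL parser.
function metricNames(expr: string): string[] {
  const re = /([a-zA-Z_:][a-zA-Z0-9_:]*)\{/g;
  const names: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(expr)) !== null) {
    names.push(m[1]);
  }
  return names;
}

// Walk all panels, including nested ones, and return distinct metric names, sorted.
function dashboardMetrics(panels: Panel[]): string[] {
  const seen: { [name: string]: boolean } = {};
  const visit = (list: Panel[]): void => {
    for (const p of list) {
      for (const t of p.targets || []) {
        for (const n of metricNames(t.expr || "")) {
          seen[n] = true;
        }
      }
      if (p.panels) {
        visit(p.panels);
      }
    }
  };
  visit(panels);
  return Object.keys(seen).sort();
}

// Two panels trimmed down from the dashboard JSON above.
const panels: Panel[] = [
  {
    title: "Process CPU Usage",
    targets: [
      { expr: 'irate(process_cpu_user_seconds_total{instance=~"$instance"}[2m]) * 100' },
      { expr: 'irate(process_cpu_system_seconds_total{instance=~"$instance"}[2m]) * 100' },
    ],
  },
  {
    title: "Process Restart Times",
    targets: [
      { expr: 'sum(changes(process_start_time_seconds{instance=~"$instance"}[1m]))' },
    ],
  },
];

console.log(dashboardMetrics(panels));
```

Running the same function over the full panels array of an exported dashboard gives a quick checklist of the metrics a service has to expose before the dashboard shows anything for it.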
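Writing the code
In the node-server project, install these dependencies:
prom-client
prometheus-api-metrics
GitHub | prometheus-api-metrics
It takes very little code: add two lines to the entry file (for example, app.ts) and the middleware registers the /metrics route automatically, as follows: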
// express
import apiMetrics from 'prometheus-api-metrics'
app.use(apiMetrics({ defaultMetricsInterval: 5000 }))
// koa
const { koaMiddleware } = require('prometheus-api-metrics')
app.use(koaMiddleware())
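Start the Node.js service, and Prometheus can begin scraping it
ts-node app.ts
Once the service is running, open the corresponding port and route path (/metrics) in a browser. If you see output like the screenshot below, the service is working: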
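If you would rather check the /metrics output with a script than by eye, the text format is simple enough to parse by hand. The following TypeScript sketch assumes you have already fetched the response body as a string. `parseMetrics` and `missingMetrics` are hypothetical helper names, the parser handles only plain sample lines (it skips HELP and TYPE comments and keeps labels as a raw string), and the sample text is hand-written to resemble typical output, not copied from a real service.

```typescript
interface ParsedSample {
  name: string;
  labels: string; // raw text between "{" and "}", left unparsed
  value: number;
}

// The text format writes infinities as +Inf and -Inf.
function parseValue(s: string): number {
  if (s === "+Inf") return Infinity;
  if (s === "-Inf") return -Infinity;
  return Number(s);
}

// Parse sample lines of the text exposition format; comments and blanks are skipped.
function parseMetrics(text: string): ParsedSample[] {
  const out: ParsedSample[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const open = line.indexOf("{");
    const close = line.lastIndexOf("}");
    let name: string;
    let labels: string;
    let rest: string;
    if (open !== -1 && close > open) {
      name = line.slice(0, open);
      labels = line.slice(open + 1, close);
      rest = line.slice(close + 1).trim();
    } else {
      const sp = line.indexOf(" ");
      if (sp === -1) continue; // not a valid sample line
      name = line.slice(0, sp);
      labels = "";
      rest = line.slice(sp + 1).trim();
    }
    // rest is "value" or "value timestamp"; only the value is kept here
    out.push({ name, labels, value: parseValue(rest.split(/\s+/)[0]) });
  }
  return out;
}

// Return the required metric names that do not occur in the parsed samples.
function missingMetrics(samples: ParsedSample[], required: string[]): string[] {
  const present: { [name: string]: boolean } = {};
  for (const s of samples) present[s.name] = true;
  return required.filter((name) => !present[name]);
}

// Hand-written sample resembling /metrics output (illustrative values).
const body = [
  "# HELP nodejs_heap_size_used_bytes Process heap size used from Node.js in bytes.",
  "# TYPE nodejs_heap_size_used_bytes gauge",
  "nodejs_heap_size_used_bytes 52428800",
  "# TYPE nodejs_version_info gauge",
  'nodejs_version_info{version="v18.16.0",major="18",minor="16",patch="0"} 1',
].join("\n");

const samples = parseMetrics(body);
console.log(missingMetrics(samples, ["nodejs_heap_size_used_bytes", "process_cpu_user_seconds_total"]));
```

Any name reported as missing is a metric that a dashboard panel querying it would have no data for.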
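Example
For Node.js, the main things to watch are memory leaks and resource usage while the service runs. For example, if an operation increases memory consumption and the memory is not released after the operation finishes, there may be a leak. Here are two screenshots of my dashboards
Local:
Pre-release Docker
Question: how does a local Prometheus reach the Docker service in the pre-release environment? Answer: as long as the service exposes a port to the outside and that host and port are reachable from the machine running Prometheus, it can be scraped.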
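The memory-leak symptom described in this section (memory keeps rising and is not released) can be roughly quantified as a sustained positive slope in heap usage. The TypeScript sketch below fits a least-squares line through heap samples, such as values of nodejs_heap_size_used_bytes exported from the dashboard. The helper names, the sample data, and the 1024 bytes-per-second threshold are all assumptions chosen for illustration; a positive slope is only a hint worth investigating, not proof of a leak.

```typescript
interface Point {
  t: number; // time in seconds
  bytes: number; // heap used at that time
}

// Ordinary least-squares slope, in bytes per second.
function slope(points: Point[]): number {
  const n = points.length;
  if (n < 2) return 0;
  let sumT = 0;
  let sumB = 0;
  for (const p of points) {
    sumT += p.t;
    sumB += p.bytes;
  }
  const meanT = sumT / n;
  const meanB = sumB / n;
  let num = 0;
  let den = 0;
  for (const p of points) {
    num += (p.t - meanT) * (p.bytes - meanB);
    den += (p.t - meanT) * (p.t - meanT);
  }
  return den === 0 ? 0 : num / den;
}

// Flag a series whose heap usage grows faster than the given threshold.
function looksLikeLeak(points: Point[], thresholdBytesPerSec: number): boolean {
  return slope(points) > thresholdBytesPerSec;
}

// Made-up data: steady growth versus usage that rises and is released again.
const growing: Point[] = [
  { t: 0, bytes: 50e6 },
  { t: 60, bytes: 56e6 },
  { t: 120, bytes: 62e6 },
  { t: 180, bytes: 68e6 },
];
const released: Point[] = [
  { t: 0, bytes: 50e6 },
  { t: 60, bytes: 60e6 },
  { t: 120, bytes: 60e6 },
  { t: 180, bytes: 50e6 },
];

console.log(looksLikeLeak(growing, 1024)); // true
console.log(looksLikeLeak(released, 1024)); // false
```

PromQL's deriv() function computes a comparable regression-based slope on the server side, so a similar check could be expressed as a query over nodejs_heap_size_used_bytes instead of client-side code.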
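Limitations
The Prometheus + Grafana setup in this article has only been tested locally.
It can only monitor resource usage; it cannot take snapshots, and snapshot analysis is what lets you pinpoint the exact file and function. I admit I only scratched the surface here, but for my project this is enough.
Easy-Monitor 3.0 can capture snapshots; if you are interested, see the Easy-Monitor 3.0 user guide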
Ideas and suggestions are welcome in the comments.
Reposted from: https://juejin.cn/post/7209950095403155512