Chapter 4: Enterprise Prometheus Monitoring in Practice -- Advanced Prometheus (Part 7)
7. Advanced Prometheus
7.1 Installing cAdvisor
7.1.1 Running cAdvisor as a DaemonSet
root@k8s-master1:~# cd kube-prometheus/manifests/
root@k8s-master1:~/kube-prometheus/manifests# kubectl delete -f .
root@k8s-master1:~/kube-prometheus/manifests# kubectl delete -f setup/
root@k8s-master1:~# mkdir prometheus-case
root@k8s-master1:~# cd prometheus-case/
root@k8s-master1:~/prometheus-case# cat case1-daemonset-deploy-cadvisor.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: cAdvisor
  template:
    metadata:
      labels:
        app: cAdvisor
    spec:
      tolerations: # tolerate the NoSchedule taint on master nodes
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      hostNetwork: true
      restartPolicy: Always # restart policy
      containers:
      - name: cadvisor
        image: harbor.raymonds.cc/base_images/cadvisor:v0.39.3
        imagePullPolicy: IfNotPresent # image pull policy
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: root
          mountPath: /rootfs
        - name: run
          mountPath: /var/run
        - name: sys
          mountPath: /sys
        - name: docker
          mountPath: /var/lib/docker
      volumes:
      - name: root
        hostPath:
          path: /
      - name: run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /var/lib/docker
root@k8s-master1:~/prometheus-case# kubectl create ns monitoring
namespace/monitoring created
root@k8s-master1:~/prometheus-case# kubectl apply -f case1-daemonset-deploy-cadvisor.yaml
daemonset.apps/cadvisor created
root@k8s-master1:~/prometheus-case# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
cadvisor-2mjwz 1/1 Running 0 14s
cadvisor-76kcd 1/1 Running 0 14s
cadvisor-gbk26 1/1 Running 0 14s
cadvisor-l79lr 1/1 Running 0 14s
cadvisor-qmk66 1/1 Running 0 14s
cadvisor-rwwlk 1/1 Running 0 14s
http://172.31.7.111:8080/containers/

7.1.2 cAdvisor Metrics
Metric                                  Type     Meaning
container_cpu_load_average_10s          gauge    Average CPU load of the container over the last 10 seconds
container_cpu_usage_seconds_total       counter  Cumulative CPU time consumed per CPU core (seconds)
container_cpu_system_seconds_total      counter  Cumulative system CPU time consumed (seconds)
container_cpu_user_seconds_total        counter  Cumulative user CPU time consumed (seconds)
container_fs_usage_bytes                gauge    Filesystem usage inside the container (bytes)
container_fs_limit_bytes                gauge    Total filesystem capacity available to the container (bytes)
container_fs_reads_bytes_total          counter  Cumulative bytes read by the container (bytes)
container_fs_writes_bytes_total         counter  Cumulative bytes written by the container (bytes)
container_memory_max_usage_bytes        gauge    Maximum memory usage of the container (bytes)
container_memory_usage_bytes            gauge    Current memory usage of the container (bytes)
container_spec_memory_limit_bytes       gauge    Memory limit of the container (bytes)
machine_memory_bytes                    gauge    Total memory of the host (bytes)
container_network_receive_bytes_total   counter  Cumulative bytes received over the container network (bytes)
container_network_transmit_bytes_total  counter  Cumulative bytes transmitted over the container network (bytes)
Once cAdvisor samples are being collected normally, the following expressions can be used to query container resource usage:
(1) Container CPU usage rate:
sum(irate(container_cpu_usage_seconds_total{image!=""}[1m])) without (cpu)
(2) Container memory usage (bytes):
container_memory_usage_bytes{image!=""}
(3) Container network receive rate (bytes/second):
sum(rate(container_network_receive_bytes_total{image!=""}[1m])) without (interface)
(4) Container network transmit rate (bytes/second):
sum(rate(container_network_transmit_bytes_total{image!=""}[1m])) without (interface)
(5) Container filesystem read rate (bytes/second):
sum(rate(container_fs_reads_bytes_total{image!=""}[1m])) without (device)
(6) Container filesystem write rate (bytes/second):
sum(rate(container_fs_writes_bytes_total{image!=""}[1m])) without (device)
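The expressions above can also be combined. As an illustrative sketch (not from the original text), the two memory metrics in the table can be divided to express container memory usage as a percentage of the configured limit. Note that container_spec_memory_limit_bytes is 0 for containers with no limit set, so the `!= 0` comparison drops those series before dividing:

```
container_memory_usage_bytes{image!=""}
  / (container_spec_memory_limit_bytes{image!=""} != 0) * 100
```

A value near 100 indicates a container approaching its memory limit, which is useful as an early OOM-kill warning.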
Common cAdvisor container monitoring queries:
(1) Network traffic
sum(rate(container_network_receive_bytes_total{name=~".+"}[1m])) by (name)
## bytes received by the container network (over 1 minute), filtered by name (name=~".+")
sum(rate(container_network_transmit_bytes_total{name=~".+"}[1m])) by (name)
## bytes transmitted by the container network (over 1 minute), filtered by name (name=~".+")
(2) Container CPU
sum(rate(container_cpu_system_seconds_total[1m]))
### cumulative system CPU time across all containers (over 1 minute)
sum(irate(container_cpu_system_seconds_total{image!=""}[1m])) without (cpu)
### system CPU time per container (over 1 minute)
sum(rate(container_cpu_usage_seconds_total{name=~".+"}[1m])) by (name)
# CPU usage rate per container
sum(sum(rate(container_cpu_usage_seconds_total{name=~".+"}[1m])) by (name) * 100)
# total CPU usage rate across all containers
increase(v range-vector) is one of the many built-in functions PromQL provides. Its argument v is a range vector; increase takes the first and last samples in the range vector and returns the growth between them. The growth rate of a Counter-type metric can therefore be computed with an expression such as:
increase(node_cpu[2m]) / 120
Here node_cpu[2m] selects all samples of the time series from the last two minutes, increase computes the growth over those two minutes, and dividing by the 120-second window yields the average per-second growth rate of node_cpu over the last two minutes. This value also approximates the host's average CPU usage over that period.
Besides increase, PromQL also has a built-in rate(v range-vector) function, which directly computes the average per-second growth rate of the range vector v over the time window. The following expression therefore yields the same result as the increase form above:
rate(node_cpu[2m])
Note that computing an average growth rate with rate or increase is prone to the "long-tail problem": it cannot reflect sudden changes of the sample data within the time window. For example, within a 2-minute window a host's CPU may briefly spike to 100% due to a traffic burst or some other issue, yet the average growth rate over the window will not reveal it.
To address this, PromQL provides a more sensitive function, irate(v range-vector). irate also computes a rate over a range vector, but it reflects the instantaneous growth rate: it uses only the last two samples in the range vector. This avoids the long-tail problem within the time window and offers better sensitivity, so graphs drawn with irate better reflect instantaneous changes in the sample data.
irate(node_cpu[2m])
irate offers higher sensitivity than rate, but for long-term trend analysis or alerting rules that sensitivity tends to introduce noise. For those cases, rate is the recommended choice.
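The arithmetic behind the three functions can be sketched numerically. The following minimal Python illustration (an assumption-laden sketch, not Prometheus internals; real Prometheus additionally extrapolates to the window boundaries, which is omitted here) shows that increase uses the first and last samples in the window, rate divides that growth by the window length, and irate uses only the last two samples:

```python
# Counter samples (timestamp_seconds, value) scraped every 15s over a 2-minute window.
# A burst happens between t=60 and t=75 (+50), then growth returns to +1 per scrape.
samples = [(0, 100.0), (15, 101.0), (30, 102.0), (45, 103.0),
           (60, 104.0), (75, 154.0), (90, 155.0), (105, 156.0), (120, 157.0)]

def increase(samples):
    """Growth between the first and last sample in the window."""
    return samples[-1][1] - samples[0][1]

def rate(samples):
    """Average per-second growth over the window."""
    dt = samples[-1][0] - samples[0][0]
    return increase(samples) / dt

def irate(samples):
    """Instantaneous per-second rate from the last two samples only."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    return (v2 - v1) / (t2 - t1)

print(increase(samples))  # 57.0 total growth
print(rate(samples))      # 0.475 -- the burst raises the whole-window average
print(irate(samples))     # ~0.0667 -- the burst is invisible at the window's end
```

This also shows the flip side of the long-tail discussion above: irate at the end of the window has already "forgotten" the burst, which is exactly why rate is preferred for trends and alerting.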
7.2 node-exporter
7.2.1 Running node-exporter as a DaemonSet
root@k8s-master1:~# docker pull registry.cn-beijing.aliyuncs.com/raymond9/node-exporter:v1.3.1
root@k8s-master1:~# docker tag registry.cn-beijing.aliyuncs.com/raymond9/node-exporter:v1.3.1 harbor.raymonds.cc/base_images/node-exporter:v1.3.1
root@k8s-master1:~# docker push harbor.raymonds.cc/base_images/node-exporter:v1.3.1
root@k8s-master1:~/prometheus-case# cat case2-daemonset-deploy-node-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      containers:
      - image: harbor.raymonds.cc/base_images/node-exporter:v1.3.1
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          protocol: TCP
          name: metrics
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/sys
          name: sys
        - mountPath: /host
          name: rootfs
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 9100
    nodePort: 39100
    protocol: TCP
  selector:
    k8s-app: node-exporter
root@k8s-master1:~/prometheus-case# kubectl apply -f case2-daemonset-deploy-node-exporter.yaml
daemonset.apps/node-exporter created
service/node-exporter created
root@k8s-master1:~/prometheus-case# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
cadvisor-2mjwz 1/1 Running 0 34m
cadvisor-76kcd 1/1 Running 0 34m
cadvisor-gbk26 1/1 Running 0 34m
cadvisor-l79lr 1/1 Running 0 34m
cadvisor-qmk66 1/1 Running 0 34m
cadvisor-rwwlk 1/1 Running 0 34m
node-exporter-9dpnf 1/1 Running 0 22s
node-exporter-bk5xd 1/1 Running 0 22s
node-exporter-fdwn7 1/1 Running 0 22s
node-exporter-k5st8 1/1 Running 0 22s
node-exporter-m92ss 1/1 Running 0 22s
node-exporter-r4svj 1/1 Running 0 22s

7.2.2 node-exporter Metrics
root@k8s-master1:~/prometheus-case# curl 172.31.7.111:39100/metrics
...
# HELP node_cpu_guest_seconds_total Seconds the CPUs spent in guests (VMs) for each mode.
# TYPE node_cpu_guest_seconds_total counter
node_cpu_guest_seconds_total{cpu="0",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="0",mode="user"} 0
node_cpu_guest_seconds_total{cpu="1",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="1",mode="user"} 0
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 4434.99
node_cpu_seconds_total{cpu="0",mode="iowait"} 0.47
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 0
node_cpu_seconds_total{cpu="0",mode="softirq"} 23.86
node_cpu_seconds_total{cpu="0",mode="steal"} 0
node_cpu_seconds_total{cpu="0",mode="system"} 140.79
node_cpu_seconds_total{cpu="0",mode="user"} 81.3
node_cpu_seconds_total{cpu="1",mode="idle"} 4457.38
node_cpu_seconds_total{cpu="1",mode="iowait"} 0.68
node_cpu_seconds_total{cpu="1",mode="irq"} 0
node_cpu_seconds_total{cpu="1",mode="nice"} 0
node_cpu_seconds_total{cpu="1",mode="softirq"} 8.42
node_cpu_seconds_total{cpu="1",mode="steal"} 0
node_cpu_seconds_total{cpu="1",mode="system"} 137.54
node_cpu_seconds_total{cpu="1",mode="user"} 88.09
Common metrics:
node_boot_time: system boot time
node_cpu: system CPU usage
node_disk_*: disk I/O
node_filesystem_*: filesystem usage
node_load1: system load
node_memory_*: memory usage
node_network_*: network bandwidth
node_time: current system time
go_*: Go runtime metrics of node-exporter itself
process_*: metrics of the node-exporter process itself
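As a usage example for the node_cpu_seconds_total samples shown above (a widely used community expression, not taken from the original text), per-node CPU utilization can be derived by subtracting the idle share:

```
100 - avg(irate(node_cpu_seconds_total{mode="idle"}[1m])) by (instance) * 100
```

The idle rate per core is averaged across each instance's CPUs, and subtracting its percentage from 100 yields the node's overall CPU utilization.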
root@prometheus1:~# vim /apps/prometheus/prometheus.yml
...
- job_name: 'prometheus-cadvisor'
  static_configs:
  - targets: ['172.31.7.111:8080','172.31.7.112:8080','172.31.7.113:8080']
- job_name: 'prometheus-node'
  static_configs:
  - targets: ['172.31.7.111:39100','172.31.7.112:39100','172.31.7.113:39100']
root@prometheus1:~# systemctl restart prometheus

7.3 prometheus server
7.3.1 Deploying the Prometheus server via YAML
7.3.1.1 prometheus cfg
root@k8s-master1:~/prometheus-case# cat case3-1-prometheus-cfg.yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
7.3.1.2 prometheus deployment
# Run Prometheus on node3; create the data directory in advance and set permissions
root@k8s-node3:~# mkdir /data/prometheusdata
root@k8s-node3:~# chmod 777 /data/prometheusdata
# Create the monitoring service account
root@k8s-master1:~/prometheus-case# kubectl create serviceaccount monitor -n monitoring
serviceaccount/monitor created
# Grant permissions to the monitor account
root@k8s-master1:~/prometheus-case# kubectl create clusterrolebinding monitor-clusterrolebinding -n monitoring --clusterrole=cluster-admin --serviceaccount=monitoring:monitor
clusterrolebinding.rbac.authorization.k8s.io/monitor-clusterrolebinding created
root@k8s-master1:~/prometheus-case# kubectl apply -f case3-1-prometheus-cfg.yaml
configmap/prometheus-config created
root@k8s-master1:~/prometheus-case# kubectl get configmaps -n monitoring
NAME DATA AGE
kube-root-ca.crt 1 76m
prometheus-config 1 5m3s
root@k8s-master1:~/prometheus-case# kubectl describe configmaps prometheus-config -n monitoring
Name: prometheus-config
Namespace: monitoring
Labels: app=prometheus
Annotations: <none>
Data
====
prometheus.yml:
----
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 1m
scrape_configs:
- job_name: 'kubernetes-node'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'
    target_label: __address__
    action: replace
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
- job_name: 'kubernetes-node-cadvisor'
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
- job_name: 'kubernetes-apiserver'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
Events: <none>
root@k8s-master1:~# docker pull registry.cn-beijing.aliyuncs.com/raymond9/prometheus:v2.31.2
root@k8s-master1:~# docker tag registry.cn-beijing.aliyuncs.com/raymond9/prometheus:v2.31.2 harbor.raymonds.cc/base_images/prometheus:v2.31.2
root@k8s-master1:~# docker push harbor.raymonds.cc/base_images/prometheus:v2.31.2
root@k8s-master1:~/prometheus-case# cat case3-2-prometheus-deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: 172.31.7.113
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: harbor.raymonds.cc/base_images/prometheus:v2.31.2
        imagePullPolicy: IfNotPresent
        command:
        - prometheus
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.retention=720h
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-config
          subPath: prometheus.yml
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
          items:
          - key: prometheus.yml
            path: prometheus.yml
            mode: 0644
      - name: prometheus-storage-volume
        hostPath:
          path: /data/prometheusdata
          type: Directory
root@k8s-master1:~/prometheus-case# kubectl apply -f case3-2-prometheus-deployment.yaml
deployment.apps/prometheus-server created
root@k8s-master1:~/prometheus-case# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
cadvisor-2mjwz 1/1 Running 0 78m
cadvisor-76kcd 1/1 Running 0 78m
cadvisor-gbk26 1/1 Running 0 78m
cadvisor-l79lr 1/1 Running 0 78m
cadvisor-qmk66 1/1 Running 0 78m
cadvisor-rwwlk 1/1 Running 0 78m
node-exporter-9dpnf 1/1 Running 0 44m
node-exporter-bk5xd 1/1 Running 0 44m
node-exporter-fdwn7 1/1 Running 0 44m
node-exporter-k5st8 1/1 Running 0 44m
node-exporter-m92ss 1/1 Running 0 44m
node-exporter-r4svj 1/1 Running 0 44m
prometheus-server-549549ccc-7bzct 1/1 Running 0 13s
root@k8s-master1:~/prometheus-case# cat case3-3-prometheus-svc.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 30090
    protocol: TCP
  selector:
    app: prometheus
    component: server
root@k8s-master1:~/prometheus-case# kubectl apply -f case3-3-prometheus-svc.yaml
service/prometheus created
root@k8s-master1:~/prometheus-case# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
node-exporter NodePort 10.100.9.200 <none> 9100:39100/TCP 45m
prometheus NodePort 10.100.30.53 <none> 9090:30090/TCP 15s
