title: 8.7. Receiving Alert Notifications via DingTalk, Email, and WeChat Work in k8s
order: 49
icon: lightbulb
I. Environment

| Hostname | IP address | OS | Notes |
| --- | --- | --- | --- |
| k8s | 192.168.11.65 | Ubuntu 20.04 | k8s version: v1.23.10, single node |

1. Prerequisites

Prometheus installed via the kube-prometheus-stack chart.
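If you are not sure whether the stack is already in place, a quick look at the Helm release and the monitoring pods confirms it (a minimal sketch; the release name `prometheus` and the `monitoring` namespace follow the commands used later in this article):

```bash
# Confirm the kube-prometheus-stack release and its pods are present
helm list -n monitoring
kubectl get pods -n monitoring
```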
2. Configure port forwarding

Make the Alertmanager port reachable from outside the cluster:

```bash
kubectl port-forward --address=0.0.0.0 svc/prometheus-kube-prometheus-alertmanager -n monitoring 9093:9093 &
```
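To verify the port-forward actually works, Alertmanager's health endpoint can be queried from outside the cluster (the IP is the node address from the table above):

```bash
# Prints 200 when Alertmanager is reachable through the forwarded port
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.11.65:9093/-/healthy
```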
II. Alert notifications

The kube-prometheus-stack chart already ships with an Alertmanager configuration, so we only need to adjust it.

1. Email alerts

Edit the values.yaml file:

```bash
vim kube-prometheus-stack/values.yaml
```
```yaml
global:
  # 163 mail server
  smtp_smarthost: 'smtp.163.com:465'
  # Sender address
  smtp_from: 'cdring@163.com'
  # SMTP username, i.e. the sending mailbox
  smtp_auth_username: 'cdring@163.com'
  # Password of the sending mailbox
  smtp_auth_password: 'your-password'
  # TLS requirement; false disables it
  smtp_require_tls: false
route:
  # Default receiver; required, and it must match a receiver name below
  receiver: 'email'
receivers:
  - name: 'email'
    # Mailbox that receives the alert notifications
    email_configs:
      - to: 'cdring@163.com'
        # Also send a notification when the alert is resolved
        send_resolved: true
```
The full configuration is shown in the figure below:
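Note that in kube-prometheus-stack's values.yaml this block is not top-level: it sits under the chart's `alertmanager.config` key. If you want to double-check the expected nesting, the chart's default values show it (a hedged sketch; it assumes the prometheus-community Helm repo is already added, as in the upgrade command used later):

```bash
# Print the default alertmanager section of the chart to see where route/receivers nest
helm show values prometheus-community/kube-prometheus-stack | grep -A 40 '^alertmanager:'
```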
Update the configuration:

```bash
helm upgrade -n monitoring --create-namespace prometheus kube-prometheus-stack
```
Check:

http://192.168.11.65:9093/#/status
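Besides the status page, the running configuration can also be pulled from the Alertmanager API, which is handy for confirming the SMTP settings were actually loaded (this assumes the port-forward set up earlier is still running):

```bash
# The response includes the loaded configuration under config.original
curl -s http://192.168.11.65:9093/api/v2/status | python3 -m json.tool
```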
Manually testing the Alertmanager API

Port 9093 has to be exposed for this (the port-forward above already does that); using the Service's cluster IP directly also works.
```bash
#!/usr/bin/env bash
alerts1='[
  {
    "labels": {
      "alertname": "DiskRunningFull",
      "dev": "sda1",
      "instance": "example1",
      "severity": "critical"
    },
    "annotations": {
      "description": "The disk sda1 is running full",
      "summary": "please check the instance example1"
    }
  }
]'
curl -XPOST -d"$alerts1" http://localhost:9093/api/v1/alerts
```
Check:

http://192.168.11.65:9093/#/alerts
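The same check can be done from the command line by listing the currently active alerts through the API; the DiskRunningFull test alert posted above should show up here:

```bash
# List active alerts as JSON
curl -s http://192.168.11.65:9093/api/v2/alerts | python3 -m json.tool
```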
Checking the Alertmanager logs

If no alert notification arrives, check the logs:

```bash
kubectl logs --tail 10 alertmanager-prometheus-kube-prometheus-alertmanager-0 -n monitoring
```
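If the last few lines are not enough, filtering a longer tail for notifier-related messages usually points at the problem (a sketch; adjust the pod name to match your release):

```bash
# Look for SMTP/webhook notification errors in the Alertmanager log
kubectl logs alertmanager-prometheus-kube-prometheus-alertmanager-0 -n monitoring --tail=200 \
  | grep -iE "error|notify|smtp"
```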
2. DingTalk alerts

Install prometheus-webhook-dingtalk:
```bash
cat > prometheus-webhook-dingtalk.yaml <<"EOF"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: dingtalk-config
  namespace: monitoring
  labels:
    app: dingtalk-config
data:
  config.yml: |
    #templates:
    #  - /etc/prometheus-webhook-dingtalk/templates/default.tmpl
    targets:
      webhook1:
        url: https://oapi.dingtalk.com/robot/send?access_token=your-access-token
        secret: your-sign-secret
        #message:
        #  title: '{{ template "ding.link.title" . }}'
        #  text: '{{ template "ding.link.content" . }}'
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus-webhook-dingtalk
  name: prometheus-webhook-dingtalk
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-webhook-dingtalk
  template:
    metadata:
      labels:
        app: prometheus-webhook-dingtalk
    spec:
      containers:
      - name: prometheus-webhook-dingtalk
        image: timonwong/prometheus-webhook-dingtalk:v2.1.0
        imagePullPolicy: IfNotPresent
        args:
        - "--config.file=/etc/prometheus-webhook-dingtalk/config.yml"
        volumeMounts:
        - name: webdingtalk-configmap
          mountPath: /etc/prometheus-webhook-dingtalk/
        ports:
        - containerPort: 8060
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 200m
            memory: 500Mi
      volumes:
      - name: webdingtalk-configmap
        configMap:
          name: dingtalk-config
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus-webhook-dingtalk
  name: prometheus-webhook-dingtalk
  namespace: monitoring
spec:
  ports:
  - port: 8060
    protocol: TCP
    targetPort: 8060
  selector:
    app: prometheus-webhook-dingtalk
EOF
```
Create it:

```bash
kubectl create -f prometheus-webhook-dingtalk.yaml
```

Check:

```bash
kubectl get -f prometheus-webhook-dingtalk.yaml
```
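It is also worth confirming that the webhook pod is running and parsed the ConfigMap without complaints (a minimal sketch using the labels defined in the manifest above):

```bash
# Pod status and recent logs of the DingTalk webhook
kubectl -n monitoring get pods -l app=prometheus-webhook-dingtalk
kubectl -n monitoring logs deploy/prometheus-webhook-dingtalk --tail=20
```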
Alertmanager configuration

Edit the values.yaml file:

```bash
vim kube-prometheus-stack/values.yaml
```

Add the DingTalk notification configuration:
```yaml
route:
  receiver: dingtalk
receivers:
  - name: 'dingtalk'
    webhook_configs:
      - url: 'http://prometheus-webhook-dingtalk:8060/dingtalk/webhook1/send'
        send_resolved: true
```
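Because Alertmanager and the webhook run in the same namespace, the short Service name in the URL above resolves via cluster DNS. A quick in-cluster reachability test can rule out Service or DNS problems (a sketch; the throwaway pod name and the curlimages/curl image are just examples):

```bash
# Confirm the Service has endpoints
kubectl -n monitoring get endpoints prometheus-webhook-dingtalk
# Any HTTP status code here (even 4xx) proves the Service is reachable from inside the cluster
kubectl run curl-test -n monitoring --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -s -o /dev/null -w "%{http_code}\n" http://prometheus-webhook-dingtalk:8060/dingtalk/webhook1/send
```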
Update the configuration:

```bash
helm upgrade -n monitoring --create-namespace prometheus prometheus-community/kube-prometheus-stack -f kube-prometheus-stack/values.yaml
```

Check:

http://192.168.11.65:9093/#/status
Manually testing the DingTalk webhook

Port 8060 has to be exposed for this; using the Service's cluster IP directly also works.

```bash
kubectl port-forward --address=0.0.0.0 svc/prometheus-webhook-dingtalk -n monitoring 8060:8060
```
Manual test:

```bash
curl -H "Content-Type: application/json" -d '{ "version": "4", "status": "firing", "description":"description_content"}' http://localhost:8060/dingtalk/webhook1/send
```
Checking the Alertmanager logs

If no alert notification arrives, check the logs:

```bash
kubectl logs --tail 10 alertmanager-prometheus-kube-prometheus-alertmanager-0 -n monitoring
```
Problems encountered

A wrong target name in the configuration prevented alert messages from being sent to DingTalk, as shown in the figure below:

Fix: change `webhook` to `webhook1` in prometheus-webhook-dingtalk.yaml; the live ConfigMap can also be edited directly:

```bash
kubectl edit cm dingtalk-config -n monitoring
```
Change it to:

```yaml
data:
  config.yml: |
    targets:
      webhook1:
```
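After editing the ConfigMap, the file mounted into the pod is not refreshed immediately and the process will not necessarily reload it on its own; restarting the Deployment is the simplest way to pick up the change:

```bash
# Restart the webhook so it re-reads the corrected config.yml
kubectl rollout restart deployment prometheus-webhook-dingtalk -n monitoring
```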
3. WeChat Work alerts (webhook)

Install prometheus-webhook-wechat.

Webhook address:

https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=
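Before wiring anything into Alertmanager, it is worth confirming the group robot key itself works by posting a plain text message straight to the WeChat Work webhook (a sketch; replace the key placeholder with your robot key):

```bash
# A successful call returns {"errcode":0,"errmsg":"ok"} and the message appears in the group chat
curl -sS -H 'Content-Type: application/json' \
  -d '{"msgtype": "text", "text": {"content": "alertmanager webhook test"}}' \
  'https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=your-key'
```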
```bash
cat > prometheus-webhook-wechat.yaml <<"EOF"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus-webhook-wechat
  name: prometheus-webhook-wechat
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-webhook-wechat
  template:
    metadata:
      labels:
        app: prometheus-webhook-wechat
    spec:
      containers:
      - name: prometheus-webhook-wechat
        image: linge365/webhook-wechat:latest
        imagePullPolicy: IfNotPresent
        env:
        - name: ROBOT_TOKEN
          value: "your-token"
        ports:
        - containerPort: 5000
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 200m
            memory: 500Mi
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus-webhook-wechat
  name: prometheus-webhook-wechat
  namespace: monitoring
spec:
  ports:
  - port: 5000
    protocol: TCP
    targetPort: 5000
  selector:
    app: prometheus-webhook-wechat
EOF
```
Create it:

```bash
kubectl create -f prometheus-webhook-wechat.yaml
```

Check:

```bash
kubectl get -f prometheus-webhook-wechat.yaml
```
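As with the DingTalk webhook, check that the pod started cleanly and that the ROBOT_TOKEN was picked up (a minimal sketch using the labels from the manifest above):

```bash
# Pod status and recent logs of the WeChat webhook
kubectl -n monitoring get pods -l app=prometheus-webhook-wechat
kubectl -n monitoring logs deploy/prometheus-webhook-wechat --tail=20
```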
Alertmanager configuration

Edit the values.yaml file:

```bash
vim kube-prometheus-stack/values.yaml
```

Add the WeChat notification configuration:
```yaml
route:
  receiver: wechat
receivers:
  - name: 'wechat'
    webhook_configs:
      - url: 'http://prometheus-webhook-wechat:5000'
        send_resolved: true
```
Update the configuration:

```bash
helm upgrade -n monitoring --create-namespace prometheus kube-prometheus-stack
```

Check:

http://192.168.11.65:9093/#/status
Manually testing the WeChat webhook

Port 5000 has to be exposed for this; using the Service's cluster IP directly also works.

```bash
kubectl port-forward --address=0.0.0.0 svc/prometheus-webhook-wechat -n monitoring 5000:5000
```

Replace `localhost` in the commands below with the actual address if needed.

Test a firing alert:
```bash
curl -X POST -H "Content-Type: application/json" -d '{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "severity": "critical",
        "alertname": "HighErrorRate",
        "instance": "server1"
      },
      "annotations": {
        "summary": "High error rate detected!",
        "description": "This is a description of the alert."
      },
      "startsAt": "2023-09-13T14:30:00Z",
      "endsAt": "2023-09-13T15:00:00Z"
    }
  ]
}' http://localhost:5000
```
Test a resolved alert:

```bash
curl -X POST -H "Content-Type: application/json" -d '{
  "alerts": [
    {
      "status": "resolved",
      "labels": {
        "severity": "critical",
        "alertname": "HighErrorRate",
        "instance": "server1"
      },
      "annotations": {
        "summary": "High error rate resolved.",
        "description": "This is a description of the resolved alert."
      },
      "startsAt": "2023-09-13T15:00:00Z",
      "endsAt": "2023-09-13T15:30:00Z"
    }
  ]
}' http://localhost:5000
```
Checking the Alertmanager logs

If no alert notification arrives, check the Alertmanager logs:

```bash
kubectl logs --tail 10 alertmanager-prometheus-kube-prometheus-alertmanager-0 -n monitoring
```
III. My WeChat

If you run into any problems, feel free to add me on WeChat. Thanks!