
KJH


DevOps

Deploying blackbox exporter and configuring Slack alerts in alertmanager

모이스쳐라이징 2022. 11. 6. 22:03

Creating the Blackbox exporter configuration file

Write the Blackbox configuration as a ConfigMap that defines an http module (http_2xx) for monitoring web service endpoints.

# kubectl --namespace=monitoring apply -f configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-blackbox-exporter
  labels:
    app: prometheus-blackbox-exporter
data:
  blackbox.yaml: |
    modules:
      http_2xx:
        http:
          no_follow_redirects: false
          preferred_ip_protocol: ip4
          tls_config:
            insecure_skip_verify: true
          valid_http_versions:
          - HTTP/1.1
          - HTTP/2
          valid_status_codes: []  # empty list = any 2xx status (the default)
        prober: http
        timeout: 5s
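The http_2xx module above decides whether a probe succeeded. The sketch below is a rough mental model of that decision, not the exporter's actual code. With valid_status_codes: [] the exporter accepts any 2xx status, and the response's HTTP version must appear in valid_http_versions. The function names status_ok and http_2xx_succeeds are hypothetical.

```python
from __future__ import annotations

# Hypothetical, simplified model of the http_2xx success check. It covers
# only valid_status_codes and valid_http_versions; the real exporter also
# follows redirects, handles TLS, applies regex matchers and the 5s timeout.

VALID_HTTP_VERSIONS = ["HTTP/1.1", "HTTP/2"]  # copied from valid_http_versions above
VALID_STATUS_CODES: list[int] = []            # copied from valid_status_codes: []


def status_ok(status: int, valid_codes: list[int]) -> bool:
    # An empty list falls back to "any 2xx", the exporter's documented default.
    if not valid_codes:
        return 200 <= status <= 299
    return status in valid_codes


def http_2xx_succeeds(status: int, http_version: str) -> bool:
    # Both the final status code and the protocol version must be accepted.
    return status_ok(status, VALID_STATUS_CODES) and http_version in VALID_HTTP_VERSIONS
```

Listing explicit codes (for example valid_status_codes: [200]) switches from the 2xx range to an exact-match check.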

 

 

Deploying the Blackbox exporter to Kubernetes

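Write a Deployment and a Service for the exporter. The configmap-reload sidecar watches the mounted ConfigMap and calls the exporter's /-/reload endpoint whenever the configuration changes.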

# kubectl --namespace=monitoring apply -f blackbox-exporter.yaml
---
kind: Service
apiVersion: v1
metadata:
  name: prometheus-blackbox-exporter
  labels:
    app: prometheus-blackbox-exporter
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 9115
      protocol: TCP
  selector:
    app: prometheus-blackbox-exporter

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-blackbox-exporter
  labels:
    app: prometheus-blackbox-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-blackbox-exporter
  template:
    metadata:
      labels:
        app: prometheus-blackbox-exporter
    spec:
      restartPolicy: Always
      containers:
        - name: blackbox-exporter
          image: "prom/blackbox-exporter:v0.15.1"
          imagePullPolicy: IfNotPresent
          securityContext:
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
          args:
            - "--config.file=/config/blackbox.yaml"
          resources:
            {}
          ports:
            - containerPort: 9115
              name: http
          livenessProbe:
            httpGet:
              path: /health
              port: http
          readinessProbe:
            httpGet:
              path: /health
              port: http
          volumeMounts:
            - mountPath: /config
              name: config
        - name: configmap-reload
          image: "jimmidyson/configmap-reload:v0.2.2"
          imagePullPolicy: "IfNotPresent"
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://localhost:9115/-/reload
          resources:
            {}
          volumeMounts:
            - mountPath: /etc/config
              name: config
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: prometheus-blackbox-exporter
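Before wiring the exporter into Prometheus, you can check it by hand. The sketch below assumes you have run kubectl --namespace=monitoring port-forward svc/prometheus-blackbox-exporter 9115:9115, so the Service is reachable at localhost:9115. It builds the same /probe?module=...&target=... URL that Prometheus will scrape and reads probe_success from the text response. The helper names probe_url, metric_value and probe are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Service is reachable here through
#   kubectl --namespace=monitoring port-forward svc/prometheus-blackbox-exporter 9115:9115
EXPORTER = "http://localhost:9115"


def probe_url(target: str, module: str = "http_2xx", exporter: str = EXPORTER) -> str:
    # The exporter receives the module name and the target as query parameters.
    return f"{exporter}/probe?{urlencode({'module': module, 'target': target})}"


def metric_value(exposition: str, name: str) -> float | None:
    """First sample of an unlabelled metric in Prometheus text format, or None."""
    for line in exposition.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


def probe(target: str) -> float | None:
    # Performs a real HTTP request against the exporter (needs the port-forward).
    with urlopen(probe_url(target), timeout=10) as resp:
        return metric_value(resp.read().decode("utf-8"), "probe_success")
```

For example, probe("https://prometheus.io") returns 1.0 when the probe succeeds and 0.0 when it fails.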


Save the following as prometheus-additional.yaml.

- job_name: 'kube-api-blackbox'
  scrape_interval: 30s  # keep this short: the alert rules use 1m windows and 5m "for" durations
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - https://www.google.com
        - http://www.example.com
        - https://prometheus.io
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: prometheus-blackbox-exporter:9115 # The blackbox exporter Service (same namespace as Prometheus).
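The three relabel_configs rules turn each static target into a probe request. The sketch below models their effect in simplified form. It covers only these three rules with the default replace action and regex (.*), not Prometheus's full relabeling engine. The original address becomes both the target URL parameter and the instance label, and the scrape itself is redirected to the exporter. The names relabel and scrape_url are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Simplified model of the three relabel rules in 'kube-api-blackbox'.
EXPORTER_ADDRESS = "prometheus-blackbox-exporter:9115"  # replacement in the job above


def relabel(target: str) -> dict[str, str]:
    labels = {
        "__address__": target,
        "__metrics_path__": "/probe",   # from metrics_path
        "__param_module": "http_2xx",   # from params.module
    }
    # 1. source_labels: [__address__] -> target_label: __param_target
    labels["__param_target"] = labels["__address__"]
    # 2. source_labels: [__param_target] -> target_label: instance
    labels["instance"] = labels["__param_target"]
    # 3. no source_labels, fixed replacement -> __address__
    labels["__address__"] = EXPORTER_ADDRESS
    return labels


def scrape_url(labels: dict[str, str], scheme: str = "http") -> str:
    # __param_<name> labels become URL query parameters of the scrape request.
    params = {k[len("__param_"):]: v for k, v in labels.items() if k.startswith("__param_")}
    return f"{scheme}://{labels['__address__']}{labels['__metrics_path__']}?{urlencode(params)}"
```

Because instance keeps the original URL, alerts and dashboards show which site failed rather than the exporter's address.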


Base64-encode the saved file and create a Secret with it as the value.

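# GNU base64 wraps its output at 76 characters; strip the newlines so the value stays on one YAML line
PROMETHEUS_ADD_CONFIG=$(base64 < prometheus-additional.yaml | tr -d '\n')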
cat << EOF | kubectl --namespace=monitoring apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs
type: Opaque
data:
  prometheus-additional.yaml: $PROMETHEUS_ADD_CONFIG
EOF
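The data field of a Secret must hold a single-line base64 string. A value wrapped by the base64 command would break the YAML in the heredoc. The sketch below builds the same Secret manifest with a guaranteed single-line value. The helpers build_secret and secret_from_file are hypothetical. The output is JSON, which kubectl apply -f - accepts as readily as YAML.

```python
from __future__ import annotations

import base64
import json
from pathlib import Path


def build_secret(key: str, raw: bytes, name: str = "additional-scrape-configs") -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # b64encode never inserts line breaks, unlike GNU base64's default wrapping
        "data": {key: base64.b64encode(raw).decode("ascii")},
    }


def secret_from_file(path: str) -> dict:
    # The file name becomes the key, matching key: prometheus-additional.yaml in the Prometheus spec.
    p = Path(path)
    return build_secret(p.name, p.read_bytes())
```

Usage: print(json.dumps(secret_from_file("prometheus-additional.yaml"))) and pipe the output into kubectl --namespace=monitoring apply -f -. An alternative that avoids manual encoding is kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml --dry-run=client -o yaml piped into the same apply.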

 

kubectl --namespace=monitoring edit prometheuses prometheus-kube-prometheus-prometheus
Add the following to the Prometheus resource so that it loads the Secret as additional scrape configs:

spec:
  additionalScrapeConfigs:
    key: prometheus-additional.yaml
    name: additional-scrape-configs
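If you prefer a non-interactive step to kubectl edit, the same change can be expressed as a JSON merge patch (RFC 7386). You would pass it to kubectl --namespace=monitoring patch prometheuses prometheus-kube-prometheus-prometheus --type merge -p '<json>'. The sketch builds that patch body and includes a small merge_patch function showing that only spec.additionalScrapeConfigs changes. Both function names are hypothetical.

```python
from __future__ import annotations

import json


def additional_scrape_configs_patch(secret: str = "additional-scrape-configs",
                                    key: str = "prometheus-additional.yaml") -> str:
    # Only spec.additionalScrapeConfigs is set; every other field is left alone.
    patch = {"spec": {"additionalScrapeConfigs": {"name": secret, "key": key}}}
    return json.dumps(patch, separators=(",", ":"))


def merge_patch(target, patch):
    """RFC 7386 JSON Merge Patch: objects merge recursively, null deletes a key."""
    if not isinstance(patch, dict):
        return patch  # non-object patch values replace the target outright
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result
```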

 

kubectl --namespace=monitoring edit prometheusrules prometheus-kube-prometheus-k8s.rules
Add the following group under spec.groups:

  - name: blackbox-exporter
    rules:
    - alert: ProbeFailed
      expr: probe_success == 0
      for: 5m
      labels:
        severity: error
      annotations:
        summary: "Probe failed (instance {{ $labels.instance }})"
        description: "Probe failed\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: SlowProbe
      expr: avg_over_time(probe_duration_seconds[1m]) > 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Slow probe (instance {{ $labels.instance }})"
        description: "Blackbox probe took more than 1s to complete\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: HttpStatusCode
      expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
      for: 5m
      labels:
        severity: error
      annotations:
        summary: "HTTP Status Code (instance {{ $labels.instance }})"
        description: "HTTP status code is not 200-399\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: SslCertificateWillExpireSoon
      expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 90  # 90 days, in seconds
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "SSL certificate will expire soon (instance {{ $labels.instance }})"
        description: "SSL certificate expires in less than 90 days\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: SslCertificateHasExpired
      expr: probe_ssl_earliest_cert_expiry - time() <= 0
      for: 5m
      labels:
        severity: error
      annotations:
        summary: "SSL certificate has expired (instance {{ $labels.instance }})"
        description: "SSL certificate has expired already\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: HttpSlowRequests
      expr: avg_over_time(probe_http_duration_seconds[1m]) > 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "HTTP slow requests (instance {{ $labels.instance }})"
        description: "HTTP request took more than 1s\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: SlowPing
      expr: avg_over_time(probe_icmp_duration_seconds[1m]) > 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Slow ping (instance {{ $labels.instance }})"
        description: "Blackbox ping took more than 1s\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
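Several of these rule expressions are plain arithmetic on probe metrics. The sketch below restates them in Python, which can help when sanity-checking thresholds. The function names are hypothetical, and the for: 5m pending period is not modelled.

```python
from __future__ import annotations

DAY = 86400  # seconds in a day


def cert_expires_soon(earliest_cert_expiry: float, now: float, days: int = 90) -> bool:
    # probe_ssl_earliest_cert_expiry - time() < 86400 * 90
    return earliest_cert_expiry - now < DAY * days


def cert_expired(earliest_cert_expiry: float, now: float) -> bool:
    # probe_ssl_earliest_cert_expiry - time() <= 0
    return earliest_cert_expiry - now <= 0


def days_left(earliest_cert_expiry: float, now: float) -> float:
    # The VALUE shown in the SSL alerts is in seconds; this converts it to days.
    return (earliest_cert_expiry - now) / DAY


def bad_status(code: float) -> bool:
    # probe_http_status_code <= 199 OR probe_http_status_code >= 400
    # A probe that never gets an HTTP response typically reports 0, which also counts as bad.
    return code <= 199 or code >= 400
```

With the 90-day threshold, a certificate issued with a 90-day lifetime (Let's Encrypt, for example) is below the threshold from the moment it is issued. SslCertificateWillExpireSoon will therefore fire for such sites continuously.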

 

Add the Slack settings to the Prometheus Helm chart values.yaml (the block below goes under the alertmanager: key).

On the Slack side, an app with Incoming Webhooks enabled is all you need; put the generated webhook URL into slack_api_url.
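Before putting the webhook URL into the config, you can test it by posting a message to it directly. Slack incoming webhooks accept an HTTP POST with a JSON body containing a text field. The sketch below uses only the standard library. WEBHOOK_URL is the same placeholder as in the config, not a real webhook, and the helper names are hypothetical.

```python
from __future__ import annotations

import json
from urllib.request import Request, urlopen

WEBHOOK_URL = "https://hooks.slack.com/services/###"  # placeholder: replace with your webhook URL


def build_webhook_request(url: str, text: str) -> Request:
    # Incoming webhooks take a JSON body; "text" is the message shown in the channel.
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def send(url: str, text: str) -> int:
    # Sends the message for real; Slack answers 200 with body "ok" on success.
    with urlopen(build_webhook_request(url, text), timeout=10) as resp:
        return resp.status
```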


Alert names routed to Slack:

HttpStatusCode, HttpSlowRequests, SlowPing [repeat_interval: 5m]
SslCertificateWillExpireSoon, SslCertificateHasExpired [repeat_interval: 168h, i.e. one week]

  config:
    global:
      resolve_timeout: 2m
      slack_api_url: "https://hooks.slack.com/services/###"
    receivers:
    - name: default-slack-alert # Blackhole
    - name: timeout-slack-alert
      slack_configs:
      - send_resolved: true
        title: |-
          [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
        text: >-
            {{ range .Alerts -}}
            *Alert:* {{ .Annotations.summary }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}

            *Description:* {{ .Annotations.description }}

            *Details:*
              {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
              {{ end }}
            {{ end }}
    - name: ping-slack-alert
      slack_configs:
      - send_resolved: true
        title: |-
          [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
        text: >-
            {{ range .Alerts -}}
            *Alert:* {{ .Annotations.summary }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}

            *Description:* {{ .Annotations.description }}

            *Details:*
              {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
              {{ end }}
            {{ end }}
    - name: cert-slack-alert
      slack_configs:
      - send_resolved: true
        title: |-
          [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
        text: >-
            {{ range .Alerts -}}
            *Alert:* {{ .Annotations.summary }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}

            *Description:* {{ .Annotations.description }}

            *Details:*
              {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
              {{ end }}
            {{ end }}
    - name: expired-slack-alert
      slack_configs:
      - send_resolved: true
        title: |-
          [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
        text: >-
            {{ range .Alerts -}}
            *Alert:* {{ .Annotations.summary }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}

            *Description:* {{ .Annotations.description }}

            *Details:*
              {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
              {{ end }}
            {{ end }}
    - name: status-slack-alert
      slack_configs:
      - send_resolved: true
        title: " "
        text: "{{ range .Alerts }}{{ .Annotations.message }} Monitor is {{ .Status }}: [CLO-SET] ( <{{ .Labels.instance }}> ). {{ .Labels.alertname }} : {{ .Annotations.description }} \n {{ end }}"
    route:
      group_wait: 0s
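      group_interval: 30s # How long to wait before notifying about new alerts added to a group that has already been notified (typically 5m or more). Units: s, m, h
      repeat_interval: 5m #6h # How long to wait before re-sending a notification that was already sent successfully (typically 3h or more).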
      receiver: default-slack-alert
      # All alerts that do not match the following child routes
      # will remain at the root node and be dispatched to 'default-slack-alert' (a blackhole receiver with no configs).
      routes:
      - match:
          alertname: HttpSlowRequests
        receiver: timeout-slack-alert
        group_wait: 10s
        group_by: ['alertname']
      - match:
          alertname: SlowPing
        receiver: ping-slack-alert
        group_wait: 10s
        group_by: ['alertname']
      - match:
          alertname: SslCertificateWillExpireSoon
        receiver: cert-slack-alert
        repeat_interval: 168h
        group_wait: 10s
        group_by: ['alertname']
      - match:
          alertname: SslCertificateHasExpired
        receiver: expired-slack-alert
        repeat_interval: 168h
        group_wait: 10s
        group_by: ['alertname']
      - match:
          alertname: HttpStatusCode
        receiver: status-slack-alert
        group_wait: 10s
        group_by: ['alertname']
    templates:
    - '/etc/alertmanager/config/*.tmpl'
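The route block picks a receiver for each alert. Child routes are checked in order and the first one whose match fits wins, since none of them sets continue: true. Anything unmatched stays on the root route, which here is the blackhole default-slack-alert. Child routes inherit settings such as repeat_interval from the root unless they override them. The sketch below is a simplified model of that lookup for the routes above; it ignores grouping and timers, and the names ROUTES and route are hypothetical.

```python
from __future__ import annotations

ROOT = {"receiver": "default-slack-alert", "repeat_interval": "5m"}

# (alertname, receiver, repeat_interval override or None), in config order
ROUTES = [
    ("HttpSlowRequests", "timeout-slack-alert", None),
    ("SlowPing", "ping-slack-alert", None),
    ("SslCertificateWillExpireSoon", "cert-slack-alert", "168h"),
    ("SslCertificateHasExpired", "expired-slack-alert", "168h"),
    ("HttpStatusCode", "status-slack-alert", None),
]


def route(labels: dict[str, str]) -> tuple[str, str]:
    """Return (receiver, effective repeat_interval) for an alert's labels."""
    for alertname, receiver, repeat in ROUTES:
        if labels.get("alertname") == alertname:  # first matching child route wins
            return receiver, repeat or ROOT["repeat_interval"]  # inherit from root if unset
    return ROOT["receiver"], ROOT["repeat_interval"]
```

With these routes, ProbeFailed and SlowProbe fall through to the root route and are dropped. This matches the list of alert names routed to Slack given before the config.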

 

In the next post, I plan to modify the Prometheus chart so that all of this is applied automatically.
