
Prometheus Alertmanager Integration: Routing Alerts to Slack Using Helm

Setting up Alertmanager and Rules
---------------------------------


Now that you have Prometheus set up, you need to configure it. The next step is to create a Helm values file (named prometheus.values in this post) that specifies

1) what the alert rules are,
2) what the Prometheus targets are (i.e., the definition of what to scrape and how) and any jobs for Prometheus, and
3) where alerts should be routed to (in this case, Slack).

Alert Rules
------------

vi prometheus.values

## Prometheus server ConfigMap entries
##
serverFiles:

  ## Alerts configuration
  ## Ref: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
  alerts:
    groups:
      - name: Instances
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 5m
            labels:
              severity: page
            annotations:
              description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
              summary: 'Instance {{ $labels.instance }} down'
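
Before putting the rule into the Helm values, it can help to check its syntax on its own. The following is a minimal sketch, assuming promtool (shipped with Prometheus) is installed locally; the file path and the rules_file variable are hypothetical names chosen for illustration. promtool expects a standalone rule file with a top-level groups key, so the rule group is written out without the serverFiles/alerts wrapper.

```shell
# Write the rule group to a standalone file (hypothetical path).
# promtool expects "groups:" at the top level, not the Helm wrapper keys.
rules_file="${TMPDIR:-/tmp}/instance-down.rules.yml"
cat > "$rules_file" <<'EOF'
groups:
  - name: Instances
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
          summary: 'Instance {{ $labels.instance }} down'
EOF

# Validate the syntax only if promtool is available on this machine.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$rules_file"
else
  echo "promtool not found; skipping rule check"
fi
```

If the check passes, the same group can be pasted under serverFiles.alerts.groups as shown above.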
             
             
Prometheus Targets
------------------

Next, you will set up the Prometheus targets and specify the jobs. Fortunately, Kubernetes scraping is already set up out of the box, so for our purposes, no additional action is required for this step.


Alert Routing
-------------

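Here the alerts are routed to Slack through an incoming webhook.

To get your webhook URL, log in to your Slack workspace and open the Incoming WebHooks app page:

https://curai.slack.com/apps/A0F7XDUAZ-incoming-webhooks?next_id=0

The webhook URL (used as api_url in the Alertmanager configuration) has this form:

https://hooks.slack.com/services/xxxxxxxxx/xxxxxxxx/xxxxxxxxxxxxx

You can test whether the webhook URL works by sending a message to Slack with curl. The first command below is a template (replace #MYCHANNELNAME and MY_WEBHOOK_URL with your own values); the second is a filled-in example for the #devops channel.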

curl -X POST --data-urlencode "payload={\"channel\": \"#MYCHANNELNAME\", \"username\": \"sanitybot\", \"text\": \"Just a sanity check that slack webhook is working.\", \"icon_emoji\": \":ghost:\"}" MY_WEBHOOK_URL

curl -X POST --data-urlencode "payload={\"channel\": \"#devops\", \"username\": \"webhookbot\", \"text\": \"This is posted to #devops and comes from a bot named webhookbot.\", \"icon_emoji\": \":ghost:\"}" https://hooks.slack.com/services/xxxxxxxxx/xxxxxxxx/xxxxxxxxxxxxx
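
The webhook test can also be wrapped in a small reusable script, so the URL does not have to be pasted into every command. This is only a sketch: build_payload, send_test_message, and the SLACK_WEBHOOK_URL environment variable are hypothetical names, and the payload builder assumes the arguments contain no double quotes or backslashes. It sends the message as a JSON body instead of the form-encoded payload used above.

```shell
# Build the JSON body for a Slack incoming-webhook test message.
# Arguments: channel, username, text (no double quotes or backslashes).
build_payload() {
  printf '{"channel": "%s", "username": "%s", "text": "%s"}' "$1" "$2" "$3"
}

# Post the message to the webhook.
# SLACK_WEBHOOK_URL is a hypothetical environment variable holding your URL.
send_test_message() {
  curl -sS -X POST -H 'Content-Type: application/json' \
    --data "$(build_payload "$1" "$2" "$3")" \
    "$SLACK_WEBHOOK_URL"
}

# Usage (not run automatically):
#   export SLACK_WEBHOOK_URL='https://hooks.slack.com/services/xxxxxxxxx/xxxxxxxx/xxxxxxxxxxxxx'
#   send_test_message '#devops' 'sanitybot' 'Just a sanity check that slack webhook is working.'
```

Keeping the URL in an environment variable also keeps it out of your shell history and out of files you might commit.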



vi prometheus.values

## alertmanager ConfigMap entries
##
alertmanagerFiles:
  alertmanager.yml:
    global: {}
      # slack_api_url: ''

    receivers:
      - name: default-receiver
        slack_configs:
         - channel: "#devops"
           send_resolved: true
           api_url: 'https://hooks.slack.com/services/xxxxxxxxx/xxxxxxxx/xxxxxxxxxxxxx'
           text: "description: {{ .CommonAnnotations.description }}\nsummary: {{ .CommonAnnotations.summary }}"
    route:
      group_by: [cluster]
      receiver: default-receiver
      routes:
        - match:
            severity: critical
          receiver: default-receiver
          repeat_interval: 1m
          group_wait: 10s
          group_interval: 5m
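
To see how the routing tree above treats a given alert, the configuration can be checked offline. The sketch below assumes amtool (shipped with Alertmanager) is installed locally; the file path and the am_file variable are hypothetical. It writes the contents of alertmanager.yml to a standalone file, validates it, and asks which receiver an alert labelled severity=critical would reach.

```shell
# Write a standalone copy of the Alertmanager config (hypothetical path).
# The webhook URL is the same placeholder used in this post.
am_file="${TMPDIR:-/tmp}/alertmanager-sketch.yml"
cat > "$am_file" <<'EOF'
global: {}
receivers:
  - name: default-receiver
    slack_configs:
      - channel: "#devops"
        send_resolved: true
        api_url: 'https://hooks.slack.com/services/xxxxxxxxx/xxxxxxxx/xxxxxxxxxxxxx'
        text: "description: {{ .CommonAnnotations.description }}\nsummary: {{ .CommonAnnotations.summary }}"
route:
  group_by: [cluster]
  receiver: default-receiver
  routes:
    - match:
        severity: critical
      receiver: default-receiver
      repeat_interval: 1m
      group_wait: 10s
      group_interval: 5m
EOF

# Only run the checks when amtool is available on this machine.
if command -v amtool >/dev/null 2>&1; then
  amtool check-config "$am_file"
  # Print the receiver that an alert with this label set is routed to.
  amtool config routes test --config.file="$am_file" severity=critical
else
  echo "amtool not found; skipping config check"
fi
```

Running the routes test with other label sets (for example severity=page) is a quick way to confirm which route an alert falls into.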

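Note that the InstanceDown rule above sets severity: page, so it does not match the severity: critical child route. It is still delivered to default-receiver through the top-level route, but without the 1m repeat_interval.

Once your prometheus.values file is prepared, you're ready to upgrade.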

# helm upgrade -f prometheus.values prometheus stable/prometheus
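
A quick pre-flight check before the upgrade can catch an incomplete values file. The following sketch uses a hypothetical helper, check_values_file, which only verifies that the file exists and contains the two top-level keys used in this post; it does not validate the YAML itself.

```shell
# Return 0 only if the values file exists and defines both
# top-level keys used in this post (serverFiles, alertmanagerFiles).
check_values_file() {
  file="$1"
  [ -f "$file" ] || { echo "missing file: $file" >&2; return 1; }
  for key in serverFiles alertmanagerFiles; do
    grep -q "^${key}:" "$file" || { echo "missing key: $key" >&2; return 1; }
  done
}

# Usage (not run automatically): preview the upgrade before applying it.
#   check_values_file prometheus.values && \
#     helm upgrade --dry-run -f prometheus.values prometheus stable/prometheus
```

The --dry-run flag asks Helm to render the release without applying it, so you can review the result first.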

Now check the alerts page of your Prometheus server:

https://monitoring.abtest.tk/alerts

You can also confirm that the settings you provided were applied by using the command below:


# kubectl describe configmap prometheus-alertmanager -n prometheus
Name:         prometheus-alertmanager
Namespace:    prometheus
Labels:       app=prometheus
              chart=prometheus-9.3.1
              component=alertmanager
              heritage=Tiller
              release=prometheus
Annotations: 

Data
====
alertmanager.yml:
----
global: {}
receivers:
- name: default-receiver
  slack_configs:
  - api_url: https://hooks.slack.com/services/xxxxxxxxx/xxxxxxxx/xxxxxxxxxxxxx
    channel: '#devops'
    send_resolved: true
    text: |-
      description: {{ .CommonAnnotations.description }}
      summary: {{ .CommonAnnotations.summary }}
route:
  group_by:
  - cluster
  group_interval: 5m
  group_wait: 10s
  receiver: default-receiver
  repeat_interval: 3h
  routes:
  - group_interval: 5m
    group_wait: 10s
    match:
      severity: critical
    receiver: default-receiver
    repeat_interval: 1m

Events: 


That is all!!!! cheers :-)
