Monday, 8 June 2020

Generating a PagerDuty Incidents Summary


This is part 1 of 3 on automating the generation of an on-call summary of PagerDuty incidents in a Notion page. In this post we will discuss a simple Python application that generates the PagerDuty incident summary.

PagerDuty is designed to alert clients to disruptions and outages through machine learning and automation. Notion is a collaboration platform with markdown support that integrates kanban boards, tasks, wikis, and databases.

We will make use of pdpyras, a low-level PagerDuty REST API client for Python. The snippets in this post call the REST API directly through the requests library.

The alert summary will include the list of incidents reported through PagerDuty in the last X days. The number of occurrences of each alert, the mean time to resolve (MTTR), and the service name are also included in the summary.

You need to generate a PagerDuty API token if you do not have one already. Details can be found here. Also find the service IDs of the services for which you plan to generate the summary. An easy way to find one is to check the URL of the service you are interested in. For example, PY1B8TY is the service ID in the following URL.


The incidents are fetched with a GET request to the incidents endpoint:

    response = requests.get(
        'https://api.pagerduty.com/incidents?since=' + since_time +
        '&service_ids[]=' + PD_SERVICE_ID_1 + '&service_ids[]=' + PD_SERVICE_ID_2 +
        '&limit=100', headers=headers, params=payload)

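where since_time is calculated as follows (datetime, timedelta, and timezone are imported from the datetime module)

    # Use UTC, because the trailing 'Z' in the format string denotes UTC.
    since_time = (datetime.now(timezone.utc) - timedelta(days=X_DAYS)).strftime('%Y-%m-%dT%H:%M:%SZ')

where X_DAYS is the number of days for which we need to collect the summary.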

headers contains

    headers = {
        'Authorization': 'Token token='+ PAGERDUTY_API_ACCESS_KEY,
        'Content-type': 'application/json',
    }

payload contains

    payload = {
        # The incidents endpoint filters by status through 'statuses[]' with lowercase values.
        'statuses[]': 'resolved',
    }
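
Note that limit=100 caps a single response at 100 incidents, so a busy period can be cut short. Below is a minimal sketch of paging through the results. It assumes the offset and more pagination fields of the PagerDuty REST API and the statuses[] filter as I understand them. The names build_params, fetch_all_incidents, and get_page are hypothetical and are not part of the application described here.

```python
def build_params(since_time, service_ids, offset, limit=100):
    """Query parameters for one page of resolved incidents.

    A list of pairs is used so that a key such as service_ids[] can repeat.
    """
    params = [('since', since_time), ('statuses[]', 'resolved'),
              ('limit', limit), ('offset', offset)]
    params += [('service_ids[]', sid) for sid in service_ids]
    return params


def fetch_all_incidents(get_page, since_time, service_ids, limit=100):
    """Collect incidents across pages.

    get_page is a hypothetical callable: it takes the params list and
    returns the decoded JSON body of one response.
    """
    incidents = []
    offset = 0
    while True:
        page = get_page(build_params(since_time, service_ids, offset, limit))
        incidents.extend(page['incidents'])
        # 'more' is assumed to be True while further pages remain.
        if not page.get('more'):
            break
        offset += limit
    return incidents
```

With requests, get_page could be a small wrapper such as lambda params: requests.get('https://api.pagerduty.com/incidents', headers=headers, params=params).json(). Passing the fetcher in as an argument keeps the paging loop easy to test without network access.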

The response is then parsed and processed to gather alerts, counts, MTTR, etc.

    responseJson = response.json()
    incidentSummary = {}
    for incident in responseJson['incidents']:
        # Strip the "[FIRING:1] " prefix so that identical alerts group together.
        alert = incident['title'].replace("[FIRING:1] ", "")
        starttime = datetime.strptime(incident['created_at'], '%Y-%m-%dT%H:%M:%SZ')
        # For a resolved incident, the last status change is the resolution time.
        endtime = datetime.strptime(incident['last_status_change_at'], '%Y-%m-%dT%H:%M:%SZ')
        # Time to resolve this one incident, in seconds.
        mttr = (endtime - starttime).total_seconds()
        if alert in incidentSummary:
            incidentSummary[alert]['count'] += 1
            incidentSummary[alert]['time'] += mttr
        else:
            service = "Service 1"
            service_id = incident['service']['id']
            if service_id == PD_SERVICE_ID_2:
                service = "Service 2"
            # 'time' accumulates total seconds; the mean is time / count.
            incidentSummary[alert] = {"count": 1, "time": mttr, "service": service}
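
The dictionary above stores the total resolution time per alert, so the mean still has to be derived by dividing by the count. A minimal sketch of that last step, assuming the same count, time, and service keys; the summarize function and the sample data are hypothetical and only illustrate the arithmetic.

```python
def summarize(incident_summary):
    """Return rows of (alert, service, count, MTTR in minutes)."""
    rows = []
    for alert, stats in incident_summary.items():
        # Mean time to resolve = total seconds / occurrences, shown in minutes.
        mttr_minutes = stats['time'] / stats['count'] / 60
        rows.append((alert, stats['service'], stats['count'], round(mttr_minutes, 1)))
    # Most frequent alerts first.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Hypothetical sample data in the same shape as incidentSummary.
example = {
    'HighCPU': {'count': 2, 'time': 1800.0, 'service': 'Service 1'},
    'DiskFull': {'count': 1, 'time': 600.0, 'service': 'Service 2'},
}
for alert, service, count, mttr in summarize(example):
    print(f'{alert} | {service} | {count} | {mttr} min')
```

For the sample data, HighCPU averages 15.0 minutes over two incidents and DiskFull 10.0 minutes over one.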

The complete application code can be found on my GitHub. In the next part we will discuss generating the summary page in Notion.



