Wednesday 10 June 2020

Automating OnCall Summary Page Generation


This is the final part of a 3-part series on automating the generation of an on-call summary of PagerDuty incidents in a Notion page. In this post we combine everything from the previous two parts and schedule it using a Kubernetes CronJob.

PagerDuty is designed to alert clients to disruptions and outages through machine learning and automation. Notion is a collaboration platform with markdown support that integrates kanban boards, tasks, wikis, and databases.

Let’s first create a Docker image. Assume the Python file for collecting incidents from PagerDuty (Part 1) is named get_pd_incidents.py and the Python file for creating the Notion page (Part 2) is named create_notion_page.py. A simple Dockerfile looks like this:

FROM python:3.7.1

# Install required dependencies
RUN pip3 install --upgrade pip
RUN pip3 install requests
RUN pip3 install pdpyras
RUN pip3 install notion

# Bundle app source
COPY get_pd_incidents.py /src/get_pd_incidents.py
COPY create_notion_page.py /src/create_notion_page.py

CMD ["python3", "/src/get_pd_incidents.py"]
 
Make sure both Python files are in the same directory as the Dockerfile. Build the Docker image using:

docker build . -t <docker_hub>/on-call-summary:0.0.0

Now let’s schedule the application using a Kubernetes CronJob.

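# On Kubernetes 1.21+ use batch/v1; batch/v1beta1 was removed in 1.25.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: on-call-summary
spec:
  schedule: "0 0 * * SUN"  # every Sunday at 00:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: on-call-summary
            image: <docker_hub>/on-call-summary:0.0.0
          restartPolicy: OnFailure  # required for Job pods: OnFailure or Never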

Save it to on-call-summary-cron.yml file and apply it using:

kubectl apply -f on-call-summary-cron.yml
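
The scripts need the PagerDuty API key and the Notion token at runtime. One common approach, not shown in the original posts, is to set them as environment variables on the container (for example from a Kubernetes Secret) and read them at startup. The sketch below assumes that approach; NOTION_TOKEN_V2 and the default of 7 days are hypothetical names and values chosen for illustration, while PAGERDUTY_API_ACCESS_KEY and X_DAYS reuse the names from Part 1.

```python
import os

# Variable names are assumptions for this sketch, not part of the original code.
REQUIRED_VARS = ("PAGERDUTY_API_ACCESS_KEY", "NOTION_TOKEN_V2")


def load_settings(environ=os.environ):
    """Read settings from the environment and fail early if one is missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "pagerduty_key": environ["PAGERDUTY_API_ACCESS_KEY"],
        "notion_token": environ["NOTION_TOKEN_V2"],
        # A weekly schedule pairs naturally with a 7-day window.
        "x_days": int(environ.get("X_DAYS", "7")),
    }


# Usage with a plain dict standing in for the real environment.
settings = load_settings({
    "PAGERDUTY_API_ACCESS_KEY": "pd-key",
    "NOTION_TOKEN_V2": "notion-token",
})
print(settings["x_days"])
```

Failing at startup keeps a misconfigured CronJob from running half of the pipeline before it hits a missing credential.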


Complete application code can be found in my GitHub.  

Tuesday 9 June 2020

Creating Notion Table/Database 


This is part 2 of a 3-part series on automating the generation of an on-call summary of PagerDuty incidents in a Notion page. In this post we create a Notion page with a database that shows the alert summary. Part 1 of this series can be found here.

PagerDuty is designed to alert clients to disruptions and outages through machine learning and automation. Notion is a collaboration platform with markdown support that integrates kanban boards, tasks, wikis, and databases.

At the time of writing, Notion does not provide an official API. More details here

But don’t worry, there is an unofficial Python API: https://pypi.org/project/notion/. It provides all the calls we need. One caveat is that, since there is no official API support yet, there is no standard way of obtaining an API token. However, if you are logged into Notion, you can find the token (the token_v2 cookie) by inspecting the browser.

Once you obtain the token, we are ready to create a new Notion page.

from notion.client import NotionClient
import notion.block

client = NotionClient(token_v2="XXXXXXXXXXXXXXXXX")
page = client.get_block("https://www.notion.so/yourorg/OnCallSummary-wfhwjfwcnccwlrhfow2486r9wn")
page2 = page.children.add_new(notion.block.PageBlock, icon="\U0001F9EF")

This will create a new page. Let’s now set the page title to today’s date.

page2.title = str(datetime.today().strftime('%Y-%m-%d'))

It’s time to create a table block inside the new page.

new_table = page2.children.add_new(notion.block.CollectionViewBlock)
new_table.collection = client.get_collection(
    client.create_record("collection", parent=new_table, schema=get_collection_schema())
)
new_table.views.add_new(view_type="table")

where get_collection_schema() corresponds to: 

def get_collection_schema():
    return {
        "count": {"name": "Count", "type": "number"},
        "action": {"name": "Remedies taken", "type": "text"},
        "title": {"name": "Alert Name", "type": "title"},
        "mttr": {"name": "MTTR", "type": "text"},
        "notes": {"name": "Other Notes", "type": "text"},
        "runbook": {"name": "Runbook", "type": "url"},
        "=d{|": {
            "name": "Service",
            "type": "select",
            "options": [
                {
                    "color": "green",
                    "id": "695667ab-c776-43d1-3420-27f5611fcaty",
                    "value": "Service 1",
                },
                {
                    "color": "yellow",
                    "id": "452c7016-ef57-445a-90a6-64afadfb042d",
                    "value": "Service 2",
                },
            ],
        },
    }

We now have a blank table. The next step is to populate it with data. (The incidentSummary dict is generated while pulling incidents from PagerDuty in Part 1.)

    total_alerts = 0
    for alert, stats in incidentSummary.items():
        row = new_table.collection.add_row()
        # "Alert Name" is the title column, so it is set through row.title;
        # the schema has no column named "alert".
        row.title = str(alert)
        row.count = stats['count']
        total_alerts += stats['count']
        # Mean time to resolve in minutes: total seconds / incident count / 60
        mttr_min = round(stats['time'] / stats['count'] / 60)
        if mttr_min > 59:
            mttr_hrs = mttr_min / 60
            if mttr_hrs >= 24:
                row.mttr = str(round(mttr_hrs / 24)) + " days"
            else:
                # Round so the cell shows "1.5 hrs" rather than a long float
                row.mttr = str(round(mttr_hrs, 1)) + " hrs"
        else:
            row.mttr = str(mttr_min) + " min"
        row.service = str(stats['service'])
    new_table.title = "Total Alerts - " + str(total_alerts)
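
The unit selection in the loop above can also be pulled into a small helper, which makes it easy to check on its own without a Notion connection. This is a minimal sketch: format_mttr is a hypothetical name, and showing one decimal place for hours and days is a choice made here rather than something taken from the original code.

```python
def format_mttr(total_seconds, count):
    """Return the mean time to resolve as a short human-readable string."""
    minutes = round(total_seconds / count / 60)
    if minutes < 60:
        return "%d min" % minutes
    hours = minutes / 60
    if hours < 24:
        return "%.1f hrs" % hours
    return "%.1f days" % (hours / 24)


# Two incidents that took 30 minutes in total average 15 minutes each.
print(format_mttr(1800, 2))
# One incident that took 90 minutes.
print(format_mttr(5400, 1))
```

With such a helper, the loop body reduces to a single assignment per row, for example row.mttr = format_mttr(stats['time'], stats['count']).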


and Voila! 

Complete application code can be found in my GitHub. In the next part we will discuss scheduling this application as a Kubernetes CronJob.

Monday 8 June 2020

Generating PagerDuty Incidents Summary


This is part 1 of a 3-part series on automating the generation of an on-call summary of PagerDuty incidents in a Notion page. In this post we discuss a simple Python application that generates a summary of PagerDuty incidents.

PagerDuty is designed to alert clients to disruptions and outages through machine learning and automation. Notion is a collaboration platform with markdown support that integrates kanban boards, tasks, wikis, and databases.

We will make use of pdpyras, a low-level PagerDuty REST API Client in Python.

The alert summary lists the incidents reported through PagerDuty over the last X days. For each alert it includes the number of occurrences, the mean time to resolve (MTTR), and the service name.

You need to generate a PagerDuty API token if you do not have one already. Details can be found here. Also find the service IDs of the services for which you plan to generate the summary. An easy way to do this is to check the URL of the service you are interested in. For example, PY1B8TY is the service ID in the following URL.


response = requests.get(
    'https://api.pagerduty.com/incidents?since=' + since_time
    + '&service_ids[]=' + PD_SERVICE_ID_1 + '&service_ids[]=' + PD_SERVICE_ID_2
    + '&limit=100',
    headers=headers, params=payload)

where since_time is calculated as

# The trailing 'Z' means UTC, so compute the timestamp from UTC rather than local time
since_time = (datetime.utcnow() - timedelta(days = X_DAYS)).strftime('%Y-%m-%dT%H:%M:%SZ')

where X_DAYS is the number of days the summary should cover.
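
As an alternative to concatenating the query string by hand, the timestamp and URL can be built with the standard library, which also takes care of escaping the brackets and colons. The following is a minimal sketch: compute_since_time and build_incidents_url are hypothetical helper names, and PXXXXXX is a placeholder service ID.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

PD_INCIDENTS_URL = "https://api.pagerduty.com/incidents"


def compute_since_time(x_days, now=None):
    """Return the UTC timestamp x_days before now in the format used above."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return (now - timedelta(days=x_days)).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_incidents_url(since_time, service_ids, limit=100):
    """Build the incidents URL with one service_ids[] entry per service."""
    query = [("since", since_time)]
    query += [("service_ids[]", service_id) for service_id in service_ids]
    query.append(("limit", limit))
    return PD_INCIDENTS_URL + "?" + urlencode(query)


# A fixed "now" keeps the example reproducible.
fixed_now = datetime(2020, 6, 8, 12, 0, 0, tzinfo=timezone.utc)
since = compute_since_time(7, now=fixed_now)
print(since)  # 2020-06-01T12:00:00Z
print(build_incidents_url(since, ["PY1B8TY", "PXXXXXX"]))
```

Passing the service IDs as a list means adding a third service does not require touching the URL-building code.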

headers contains

    headers = {
        'Authorization': 'Token token='+ PAGERDUTY_API_ACCESS_KEY,
        'Content-type': 'application/json',
    }

payload contains

    payload = {
        # The incidents endpoint filters on statuses[] with lowercase values
        'statuses[]': 'resolved',
    }
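
The request above returns at most 100 incidents (limit=100). PagerDuty list endpoints paginate with offset and limit, and the response carries a boolean more field, so a busy week may need several requests. The sketch below shows the loop only: fetch_page is a hypothetical callable that performs the GET for a given offset and returns the decoded JSON body, and fake_fetch_page stands in for it so the example runs without network access.

```python
def fetch_all_incidents(fetch_page, limit=100):
    """Collect incidents from every page until the API reports no more."""
    incidents = []
    offset = 0
    while True:
        body = fetch_page(offset, limit)
        incidents.extend(body.get("incidents", []))
        if not body.get("more"):
            return incidents
        offset += limit


def fake_fetch_page(offset, limit):
    """Stand-in for the real HTTP call: serves 5 made-up incidents in pages."""
    data = [{"id": "P%d" % i} for i in range(5)]
    page = data[offset:offset + limit]
    return {"incidents": page, "more": offset + limit < len(data)}


print(len(fetch_all_incidents(fake_fetch_page, limit=2)))  # 5
```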

The response is then parsed and processed to gather alerts, counts, MTTR, etc.

    responseJson = response.json()
    incidentSummary = {}
    for incident in responseJson['incidents']:
        # Strip the Alertmanager-style prefix so identical alerts group together
        alert = incident['title'].replace("[FIRING:1] ","")
        starttime = datetime.strptime(incident['created_at'], '%Y-%m-%dT%H:%M:%SZ')
        endtime = datetime.strptime(incident['last_status_change_at'], '%Y-%m-%dT%H:%M:%SZ')
        # Time to resolve this incident, in seconds
        mttr = (endtime - starttime).total_seconds()
        if alert in incidentSummary:
            incidentSummary[alert]['count'] = incidentSummary[alert]['count'] + 1
            incidentSummary[alert]['time'] = incidentSummary[alert]['time'] + mttr
        else:
            service = "Service 1"
            service_id = incident['service']['id']  # avoid shadowing the built-in id()
            if service_id == PD_SERVICE_ID_2:
                service = "Service 2"
            incidentSummary[alert] = {"count":1,"time": mttr,"service": service}
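
To make the shape of incidentSummary concrete, here is a self-contained sketch of the same aggregation wrapped in a function and run on two made-up incidents. The function name summarize_incidents, the service_names mapping, and the sample data are assumptions for illustration; the field names match the ones used above.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def summarize_incidents(incidents, service_names, default_service="Service 1"):
    """Group incidents by alert title, tracking count and total seconds to resolve."""
    summary = {}
    for incident in incidents:
        alert = incident["title"].replace("[FIRING:1] ", "")
        start = datetime.strptime(incident["created_at"], TIME_FORMAT)
        end = datetime.strptime(incident["last_status_change_at"], TIME_FORMAT)
        service = service_names.get(incident["service"]["id"], default_service)
        entry = summary.setdefault(alert, {"count": 0, "time": 0.0, "service": service})
        entry["count"] += 1
        entry["time"] += (end - start).total_seconds()
    return summary


# Made-up sample data: the same alert fired twice, resolved in 10 and 20 minutes.
sample = [
    {"title": "[FIRING:1] HighCPU", "created_at": "2020-06-01T10:00:00Z",
     "last_status_change_at": "2020-06-01T10:10:00Z", "service": {"id": "PY1B8TY"}},
    {"title": "[FIRING:1] HighCPU", "created_at": "2020-06-02T10:00:00Z",
     "last_status_change_at": "2020-06-02T10:20:00Z", "service": {"id": "PY1B8TY"}},
]
print(summarize_incidents(sample, {"PY1B8TY": "Service 2"}))
# {'HighCPU': {'count': 2, 'time': 1800.0, 'service': 'Service 2'}}
```

Dividing time by count gives the MTTR in seconds, here 900 seconds, or 15 minutes.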

Complete application code can be found in my GitHub. In the next part we will discuss generating the summary page in Notion.