Automate automation: Use alerts as actionable triggers to fix problems quickly

By Leonid Yankulin

Elevator Pitch

How can you react to a problem in your application as fast as possible? How can you reach SLOs beyond three nines (99.9%)? The answer is to automate both the reaction to an incident and the trigger for that reaction. How do you implement it? The answer is “actionable alerts”.

Description

“Actionable alerts” is a widely used term that describes how an alerting mechanism can be used to trigger automation. Such alerts can help solve many problems, including “adding nine(s) to our SLOs”, by automatically triggering automation 🙂. Let’s look at one problem in the continuous deployment process. It is not uncommon for a deployment to pass all stages and get rolled out to production, only to start generating 4XX or 5XX responses. While troubleshooting in production can be a natural way to find the root cause, many situations require restoring the service first. In this talk we will demonstrate how to detect a potential incident after deploying a version to production and how to resolve it simply, using a Cloud Function deployment as an example.
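
To make the idea concrete, here is a minimal sketch of the decision an automated alert handler might make. It is not the implementation shown in the talk: the `Alert` payload shape, the 5% error-rate threshold, and the function names are all hypothetical, and the actual rollback call to the platform is left out.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert payload; the field names are illustrative only."""
    service: str
    total_requests: int
    error_responses: int  # count of 4XX and 5XX responses in the alert window


def should_roll_back(alert: Alert, max_error_rate: float = 0.05) -> bool:
    """Return True when the observed error rate exceeds the assumed threshold."""
    if alert.total_requests == 0:
        return False  # no traffic in the window, nothing to judge
    return alert.error_responses / alert.total_requests > max_error_rate


def handle_alert(alert: Alert, deployed_versions: list[str]) -> str:
    """Pick the version that should serve traffic after the alert fires.

    deployed_versions is ordered oldest to newest; the last entry is live.
    Falls back to the previous version only if one exists.
    """
    if should_roll_back(alert) and len(deployed_versions) >= 2:
        return deployed_versions[-2]
    return deployed_versions[-1]
```

In this sketch the handler restores the previous version first and leaves root-cause analysis for later, which mirrors the "restore the service first" priority described above.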

Notes

The talk requires a stable internet connection for the live demo.