Alert on symptoms, not causes

When you bring a new system to production, you know that you ought to define SLIs and set up instrumentation, alerting, and so on. Nowadays there is an abundance of tooling and infrastructure for extracting data from your service and the entire stack it runs on. But this leaves you with a problem. What can we do …