The difference between total chaos and a minor inconvenience usually comes down to how well trained the organization is to handle the unexpected. To provide IT organizations with a framework for responding to unplanned outages, PagerDuty created a platform for managing IT incidents. Today PagerDuty upgraded that platform to include support for Rich Incidents.
PagerDuty CTO Andrew Miklas says Rich Incidents will give IT organizations more context about a problem by attaching graphs and charts from IT monitoring tools to any alert being sent. Armed with that context, Miklas says, the IT staff will have a much better understanding of how critical any given IT problem might be.
IT organizations often suffer from alert fatigue. IT monitoring tools tend to throw off so many false positives that the IT staff stops paying attention to the alerts. It only takes one of those ignored alerts turning out to be real, however, before entire systems start going offline. And Murphy’s Law being what it is, it’s almost certain that the crisis will occur at the worst possible time for the business.
Once the IT staff has been alerted to a problem, the next issue is what to do about it. PagerDuty provides an incident response framework through which everyone knows not only who is handling a particular issue, but also how long it will be before the problem gets resolved. The end result is a significant reduction in the drama usually associated with an IT outage.
Naturally, all the alerts being fed through the PagerDuty framework can be passed on to an IT automation framework that can remediate many of these issues automatically. But before routinely doing that, Miklas says, IT organizations can use a mechanism PagerDuty provides to collaborate on what the implications of any particular patch or fix might be for the rest of the IT environment. Those environments are usually too complex for any one person to manage. After all, everyone knows how the proverbial cure can easily wind up being worse than the disease. The real challenge is figuring out what’s really the cure versus an even bigger problem just waiting to happen.