

You’re hanging out with a dear friend when she trips and faints. Oh no, are you okay? You reach out to help her up. But she’s not moving. Hey, Jane, hey. It’s not funny. She’s not reacting. Did she hit her head? Is she bleeding? You don’t know what’s going on—you’re panicking. You call 911.
Imagine that instead of the usual “911, what’s your emergency?” the operator picked up the phone and said, “911, what’s the severity of the incident? Which body parts were impacted? Is it raining right now?” and went on and on, asking you a long list of fixed questions, several of them irrelevant to the situation.
Sounds silly, but this is how most organizations approach incident declarations—one of the first communication touchpoints of an incident. Unafraid of challenging popular practices like this one, the industry leaders who gathered at the latest Reliability Leaders Roundtable discussed how they’re tackling incident communications in their teams.
Reliability Leaders Roundtables are private events where 20–30 SRE leaders gather for casual conversations about how their peers are approaching specific challenges.
In this article, we present a distilled version of the insights uncovered during the incident communications roundtable.
Tickets can drive anyone insane. It’s not just about new requests—it’s also about dealing with ticket hygiene. It never ends. Not only do you get an overwhelming number of new requests through tickets with barely any information or context, but you also end up losing track of what’s being done for each of them—if anything at all. It’s been two weeks; are these resolved or even relevant by now? No easy way to tell.
To make matters worse, that same lack of information makes it hard to set up automations around tickets. And let’s not even get started on the tickets that some long-abandoned integration keeps creating automatically.
The “intuitive” solution to this problem is to add extra validations to the ticket creation process. Let’s safeguard the process by adding a bunch of required fields; that way, we’ll make sure we capture essential information in every ticket.
Except you just end up with a baroque questionnaire that nobody wants to deal with.
The reliability leaders at the incident communications roundtable agreed that adding friction didn’t improve ticket management. At the end of the day, people fill the form with irrelevant data because you’re asking them the wrong questions at the wrong time, and you frustrate them in the process.
Incomplete information is a problem you’ll always have to deal with—you need to accept it. Attendees explained that they’ve been using tools like Rootly to bring up additional context automatically when an incident is created through a ticket.
For example, if a user files a Linear triage ticket whose only content is “Catalog API taking too long,” the team has a Rootly workflow that automatically pulls up a Datadog dashboard.
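The exact workflow lives in Rootly, but the underlying pattern is simple to sketch. The snippet below is a minimal, hypothetical version of it: it assumes a Linear webhook pointed at a small Flask endpoint, a Slack incoming webhook for the incident channel, and a hand-maintained keyword-to-dashboard map. None of the endpoint names or URLs are real, and this is not Rootly’s API.

```python
# Hypothetical sketch of the "auto-attach context" pattern, not Rootly's actual workflow.
# Assumes: a Linear webhook configured to call /linear-webhook, a Slack incoming webhook
# URL in SLACK_WEBHOOK_URL, and placeholder Datadog dashboard URLs.
import os

import requests
from flask import Flask, request

app = Flask(__name__)

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]

# Hand-maintained map from keywords in ticket titles to the dashboard most
# likely to give responders immediate context.
DASHBOARDS = {
    "catalog": "https://app.datadoghq.com/dashboard/catalog-api",  # placeholder
    "checkout": "https://app.datadoghq.com/dashboard/checkout",    # placeholder
}


@app.route("/linear-webhook", methods=["POST"])
def enrich_ticket():
    payload = request.get_json(force=True)
    title = payload.get("data", {}).get("title", "").lower()

    # Pick the first dashboard whose keyword appears in the ticket title.
    dashboard = next((url for kw, url in DASHBOARDS.items() if kw in title), None)
    if dashboard:
        requests.post(
            SLACK_WEBHOOK_URL,
            json={"text": f"New triage ticket: {title!r}\nSuggested dashboard: {dashboard}"},
            timeout=5,
        )
    return {"ok": True}
```

Even something this crude beats an empty ticket: the responder opens the channel and finds a dashboard link already waiting, instead of a one-line title with no context.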
Incidents are unavoidable. However, an incident at the wrong time can sour a deal or deteriorate your relationship with your customers. That’s why incidents are not only relevant to engineering—reliability leaders agree that cross-functional collaboration is vital in incident management.
But how do you tell your sales team that a bunch of OOMKilled errors are flooding your logs, but you’re not sure why or when you’ll have a fix?
Traditionally, internal status pages have been the tool for driving visibility inside the organization. Your customer support teams could check the internal status page to see whether a system was down, and that information might help them understand a customer’s issue. And because the page is private, your SRE team could hook automations to it, making it easier to maintain.
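The automation half is the easy part. As a rough illustration, assuming an internal status page that exposes a plain HTTP API (the endpoint, token, and payload below are made up for the sketch, not a specific vendor’s API), an alert handler could flip a component’s status like this:

```python
# Hypothetical helper for keeping an internal status page in sync with alerts.
# The base URL, token, and payload shape are assumptions for illustration only;
# substitute your status page vendor's real API.
import os

import requests

STATUS_PAGE_API = "https://status.internal.example.com/api/components"  # assumed
API_TOKEN = os.environ["STATUS_PAGE_TOKEN"]


def set_component_status(component_id: str, status: str) -> None:
    """Mark a component as e.g. 'operational' or 'degraded' on the internal page."""
    requests.patch(
        f"{STATUS_PAGE_API}/{component_id}",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"status": status},
        timeout=5,
    )


# Called from an alert handler when the Catalog API starts failing health checks:
set_component_status("catalog-api", "degraded")
```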
The problem? Discoverability. Unless the team has a regular need to check the internal status page, they’ll have a hard time figuring out where to find the URL for it. You need to make a full context switch just to know if a system is down or if your organization is experiencing an incident at all.
The roundtable attendees agreed they had more success setting up automations to notify their colleagues about ongoing incidents in specific Slack channels. Most attendees had a tool like Rootly to update a Slack channel when an incident was declared and resolved.
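Tools like Rootly provide this out of the box, but the shape of the automation is easy to picture. Below is a bare-bones sketch using the Slack SDK; the channel name, the announce() helper, and the incident details are illustrative stand-ins, not anyone’s production setup.

```python
# Bare-bones version of the "announce incidents in Slack" pattern the
# attendees described. Channel name and incident details are illustrative.
import os

from slack_sdk import WebClient

client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])


def announce(incident_id: str, title: str, status: str) -> None:
    """Post a short, consistent update so non-engineering teams see state changes."""
    prefix = ":rotating_light:" if status == "declared" else ":white_check_mark:"
    client.chat_postMessage(
        channel="#incident-updates",  # assumed cross-team visibility channel
        text=f"{prefix} Incident {incident_id} {status}: {title}",
    )


# Fire at the two lifecycle points the roundtable called out:
announce("INC-142", "Catalog API latency spike", "declared")
announce("INC-142", "Catalog API latency spike", "resolved")
```

The point isn’t the code; it’s that the update lands where support and sales already live, with no URL to remember and no context switch.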
It’s standard practice to start dealing with an incident by assessing its severity (SEV1, SEV2, SEV3) or priority (P1, P2, P3, etc.). But how useful—or even realistic—is it to conduct an impact assessment when you’re basically in the dark about what the incident entails?
It doesn’t help that the boundaries between, for example, SEV1 and SEV2 are arbitrary. They can vary across companies and even across teams within the same company. Then you run into black swan events—incidents that don’t fit any category—or incidents that leave you perplexed, unable to move beyond filling in the incident declaration form.
The attendees didn’t reach a consensus on how to replace severities. But the room agreed that severity levels are largely a convention inherited from legacy tools and frameworks.
Various leaders are experimenting with these ideas instead of relying on SEV or P levels:
Over-relying on processes tends to cause frustration and slow responses. Declaring an incident alone becomes a chore: you can’t raise one without filling in a questionnaire and writing an essay on the semantic limits of severity levels.
How can we improve incident communications at scale? Empower your Incident Commanders.
The reliability leaders agreed that it’s important to create an on-call culture that makes Incident Commanders feel empowered to make decisions without fear of punitive measures if their initial assessment isn’t entirely correct.
For example, Incident Commanders should be able to:
You’ve seen it firsthand. Your team is in the middle of a technical discussion to find a remediation path when a VP bursts into the room (or Zoom call), bypassing all standard communication channels. Now, instead of focusing on the solution, you have to stop and explain what’s going on.
VPs and execs aren’t the enemy or inconvenient allies—they just want to do everything in their power to help resolve the incident as quickly as possible. And they have more influence than you may realize while you’re knee-deep in logs and traces.
All the reliability leaders at the roundtable agreed: building trust with executives is essential. And the best way to build that trust? Over-communicate.
Here’s how roundtable attendees build rapport with executives during incidents:
Incident communication is evolving away from the bureaucracy of forms and SEV levels and toward trust and automation. Pairing stronger human relationships with better tooling is what’s letting SRE leaders move toward more robust reliability.
Apply to join us in the next Reliability Leaders Roundtable to discuss how AI is impacting incident response.