May 28, 2021
4 min read
What are the differences between incident management and incident response? The answer varies widely depending on whom you ask.
Is there a difference between incident response and incident management? Virtually every SRE will tell you there is.
However, the exact nature of the differences depends on whom you ask. Definitions of incident management and incident response vary more than you might expect. If you’re an SRE, understanding that variation can help you think about how different types of organizations approach each practice.
So, let’s walk through some competing perspectives on the similarities and differences between incident management and incident response, and what SREs can learn from the varying viewpoints.
Probably the most widespread viewpoint on the differences between incident response and incident management boils down to the idea that the former focuses on the technical processes necessary to resolve an incident, whereas the latter deals with managing the broader impact of an incident on the business.
That’s the interpretation evident from the U.K. National Cyber Security Centre’s definition, for example, which tells us:
Adrien de Beaupré of the SANS Institute offers a similar definition (although he uses the term “incident handling” in place of “incident management”):
If you subscribe to this viewpoint, you probably think of incident response as the primary responsibility of SREs, whereas incident management requires the collaboration of a broader set of stakeholders (the legal department, PR teams, compliance officers and so on) who help steer the business as a whole through an incident.
Interestingly, some folks adopt an interpretation of incident management that is almost the opposite of the definition described above.
For instance, the U.S. Cybersecurity and Infrastructure Security Agency says that incident management “includes detecting and responding to computer security incidents as well as protecting critical data, assets, and systems to prevent incidents from happening.” Those are all essentially technical processes.
Although the agency goes on to state that incident management requires participation by a “wide range of participants across the enterprise,” rather than just technical teams, the core definition nonetheless focuses on the technical aspects of incident management more than the business aspects.
Referring to ITIL (but not actually quoting it), BMC offers a take on incident management that similarly focuses on managing technical issues: “Incident management is the practice of minimizing the negative impact of incidents by restoring normal service operation as quickly as possible.” This definition doesn’t mention managing broader business impacts, just restoring service.
Within organizations like these, then, SREs may be expected to play a more central role in incident management, given that their definitions focus on technical processes first and foremost.
Sometimes, you find organizations that use the terms incident management and incident response more or less interchangeably.
The incident management and response guidance offered by EDUCAUSE, a nonprofit that promotes IT in higher education, is a good example of this type of usage. The article doesn’t attempt to distinguish incident management from incident response in any formal way, and it seems to alternate between the two terms more or less randomly.
It even implies that the terms are directly interchangeable, writing that “information security incident management” programs are “sometimes also called information security incident response programs.”
If we had to bet, we’d wager that most SREs would be uncomfortable with this more-or-less interchangeable use of the two terms. But the fact is that some organizations don’t distinguish between the phrases, whether SREs like it or not.
From examples like those cited above, it’s clear that there are multiple ways to define incident management and incident response. Some of them are ambiguous about the differences. Some definitions even directly contradict each other.
As an SRE, you could adopt one interpretation or another and go on a crusade to convince the world that your perspective is the right one. But realistically speaking, you’re probably not going to achieve universal consensus about the meaning of the two terms. Just as there will likely always be differences of opinion about whether Linux-based operating systems should be called just “Linux” or “GNU/Linux,” there will always exist varying viewpoints on the meaning of incident management and incident response.
Rather than trying to impose a certain perspective, SREs should adopt a flexible mindset. Their take on the definitions of incident management and incident response should reflect whichever viewpoint their organizational culture adopts. And they should do their best to support the practices that go along with incident response and management, no matter how they are labeled or categorized.