April 12, 2022
4 min read
A comparison of the two main SRE team models: Embedded SREs vs. standalone SRE teams.
To embed or not to embed: That is the question.
At least, that’s one of the questions that companies have to answer as they decide how to implement Site Reliability Engineering. They can either embed SREs into existing teams, or they can build a new, separate SRE team.
Both approaches have their pros and cons. The right strategy for your company or team depends, of course, on your needs and priorities.
An embedded SRE is an SRE who works as part of a non-SRE team. Embedded SREs might join development or IT operations teams, for example – although they could also be part of Quality Assurance teams, security teams or any other unit within the organization that can benefit from an SRE’s expertise.
The embedded model is the opposite of creating a dedicated, general-purpose SRE team. In the latter approach, an organization hires SREs who collaborate with non-SRE teams, but who aren’t integrated directly into them.
There are several other SRE organizational models that fall somewhere between these two poles. But typically, the two main types of SRE structures are embedded SREs and dedicated SRE teams.
Embedding SREs into other teams offers several benefits.
Arguably the most important advantage is that embedded SREs ensure the highest level of collaboration between SREs and other stakeholders. When an SRE works directly alongside other engineers on a continuous basis, it’s a pretty safe bet that the SRE’s perspective will inform every decision that the engineers make. Separate SRE teams will still generally collaborate with other teams, but not necessarily with the same level of dedication as embedded SREs.
Embedded SREs are also beneficial for organizations where the SRE concept remains new, and where stakeholders still need to learn what SREs do and the value they can bring. It may be a bit challenging to get buy-in for SREs when you propose creating an entirely separate SRE team; indeed, developers or IT engineers may even perceive such a team as a threat to their own jobs (even though in reality, of course, SREs complement other teams instead of competing with them). But when you add SREs to larger teams, it’s likely to be easier to get everyone on board with the SRE concept, and for other types of engineers to appreciate how SREs make their jobs easier.
Finally, embedded SREs work well for organizations that are on the small side and don’t need an entire team of SREs. If you have just several dozen engineers on staff, instead of several hundred, hiring a few SREs to embed into your existing teams probably makes more sense than investing in a brand-new team of SREs.
The embedded SRE model has its drawbacks, too.
One major risk is that the SRE’s influence within a larger team may not be strong enough to have a major impact on how that team operates. A single SRE working alongside a dozen developers or IT engineers, for example, may struggle to integrate SRE techniques and tools into the existing team’s workflow – especially if it’s a team that already has strong opinions about how it should operate.
Along similar lines, a single SRE working within a team using the embedded model could end up being stretched too thin to do the job well. That’s especially true if other team members expect the SRE to “own” functions that should really be a collective, team responsibility – like responding to incidents or using chaos engineering to test reliability. SREs may lead these processes, but they shouldn’t be the SRE’s responsibility alone, and the SRE function can start to break down when other team members think of the SRE as the person on whom they can simply “dump” all of their reliability-related tasks.
Finally, the embedded SRE model can be difficult to scale, which is one reason why it doesn’t work as well at large organizations. If you have dozens of different engineering teams, embedding an SRE into each one requires a lot of management effort. It also deprives the various SREs within your organization of the ability to work closely with each other. In that type of scenario, it probably makes more sense to create a dedicated SRE team and have those SREs interface with other teams as needed, rather than trying to distribute the SREs across the business by embedding them directly into other teams.
In general, then, embedded SREs make the most sense for companies that have relatively small engineering organizations, and in which the concept of the SRE function has not yet received complete buy-in. For large businesses that have numerous engineering teams, it will likely be easier to manage SREs by creating a separate team just for them.
That said, your situation may differ from the norm, and it may be worth experimenting a bit to determine which type of SRE model – embedded or not – makes the most sense for your company. Keep in mind that you can also use a hybrid approach in which you embed SREs into some teams while simultaneously maintaining a dedicated SRE team that leads SRE initiatives for the organization as a whole.
After all, SRE is all about thinking creatively and coming up with innovative solutions to challenges – even the challenge of figuring out how to structure the SREs within your organization.