November 5, 2021
4 min read
An explanation of the meaning of SLA, SLO and SLI, and how SREs should use each concept to manage reliability.
The world of Site Reliability Engineering is filled with acronyms -- especially ones that start with S. In addition to SRE (which can stand for both Site Reliability Engineering and Site Reliability Engineer), there are three other essential S acronyms to know: SLA, SLO and SLI.
Understanding what SLA, SLO and SLI mean, and how they relate to SRE, can be a bit tricky. The differences between the three terms are small, yet important, and you don’t want to make the mistake of conflating these terms.
In this article, we explain exactly what SLA, SLO and SLI mean, and discuss the similarities and differences between them.
SLA, which stands for Service Level Agreement, is probably the most commonly used term out of the three.
An SLA is a contract that a vendor or service provider makes with customers. The contract promises to deliver a specific level of service availability, reliability or performance. (Note that these terms are closely related, but not identical.) SLAs may also specify penalties (such as a reduction in fees) that the vendor or service provider will incur for failing to meet the promises specified in the SLA.
For example, a simple SLA might guarantee that a SaaS application will be available 99.9 percent of the time. If the application fails to meet that level of availability, the customer’s payments will be reduced by 10 percent.
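To make the arithmetic behind such an SLA concrete, here is a minimal Python sketch. The function names, the 30-day billing period and the flat fee reduction are illustrative assumptions, not terms taken from any real contract.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime the target permits over the period (30 days is an assumption)."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def amount_due(base_fee: float, measured_availability_pct: float,
               target_pct: float = 99.9, reduction_pct: float = 10.0) -> float:
    """Apply the fee reduction when measured availability falls below the target."""
    if measured_availability_pct < target_pct:
        return base_fee * (1 - reduction_pct / 100)
    return base_fee


if __name__ == "__main__":
    # Downtime allowed by a 99.9 percent target over a 30-day period.
    print(round(allowed_downtime_minutes(99.9), 1))
    # Fee owed on a hypothetical 1,000 base fee when availability came in at 99.5 percent.
    print(amount_due(1000.0, 99.5))
```

Under these assumptions, a 99.9 percent target leaves roughly 43 minutes of downtime per 30-day period before the fee reduction applies.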
Importantly, an SLA is not the specific level of availability or other goals that a provider promises to meet. An SLA is just a contract that includes information about these promises.
An SLO, or Service Level Objective, is the specific reliability, availability or performance target that a vendor promises to meet within an SLA.
For instance, if your SLA promises 99.9 percent uptime, then 99.9 percent uptime is your SLO.
An SLA can include one or multiple SLOs. SLOs can also be specified in varying ways -- and determining how to set clear and measurable SLOs is a critical consideration when creating an SLA, especially when you are dealing with complex metrics.
For instance, an SLO that simply promises application response times of under 2 seconds may be too vague because so many variables affect response times. A better SLO might promise response times of under 2 seconds for a specific type of transaction, based on the average response time for that type of transaction over a specific period of time.
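One way to picture the more precise SLO is as a check scoped to a single transaction type and a fixed time window. The sketch below is a hypothetical illustration: the `Request` record, the 7-day window and the `checkout` transaction type are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Iterable, Optional


@dataclass
class Request:
    """A single completed request (hypothetical record shape)."""
    transaction_type: str
    finished_at: datetime
    response_seconds: float


def meets_latency_slo(requests: Iterable[Request], transaction_type: str,
                      window_end: datetime,
                      window: timedelta = timedelta(days=7),
                      threshold_seconds: float = 2.0) -> Optional[bool]:
    """Return True if the average response time for one transaction type,
    within the window, is under the threshold. Return None if there is no data."""
    window_start = window_end - window
    samples = [
        r.response_seconds
        for r in requests
        if r.transaction_type == transaction_type
        and window_start <= r.finished_at <= window_end
    ]
    if not samples:
        return None
    return mean(samples) < threshold_seconds


if __name__ == "__main__":
    now = datetime(2021, 11, 5, 12, 0)
    history = [
        Request("checkout", now - timedelta(hours=1), 1.2),
        Request("checkout", now - timedelta(hours=2), 2.4),
        Request("search", now - timedelta(hours=1), 5.0),  # other type, ignored
    ]
    print(meets_latency_slo(history, "checkout", now))
```

Scoping the check this way removes most of the ambiguity: it is clear which requests count, over what period, and how they are aggregated.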
An SLI, or Service Level Indicator, measures how well a company actually meets the SLO promises that it sets within SLAs.
For instance, if you promise an SLO of 99.9 percent uptime, and you achieve 99.95 percent uptime, then your SLI would be 99.95 percent (and you’d be exceeding your SLO, which is good).
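As a rough illustration, an availability SLI can be computed from periodic health checks and compared against the SLO. This sketch assumes each check is recorded as a simple True (up) or False (down) value; real monitoring data is usually richer, and the function names are hypothetical.

```python
from typing import Sequence


def availability_sli(probe_results: Sequence[bool]) -> float:
    """Percentage of health checks that succeeded."""
    if not probe_results:
        raise ValueError("no probe results to measure")
    return 100 * sum(probe_results) / len(probe_results)


def meets_slo(sli_pct: float, slo_pct: float) -> bool:
    """The SLO is met when the measured SLI is at or above the target."""
    return sli_pct >= slo_pct


if __name__ == "__main__":
    # 10,000 checks with 5 failures, an assumed sample for illustration.
    probes = [True] * 9995 + [False] * 5
    sli = availability_sli(probes)
    print(sli, meets_slo(sli, 99.9))
```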
Tracking SLIs is important for two main reasons. The most obvious is that demonstrating SLIs to customers allows companies to show that they are meeting the terms of an SLA.
Second, by tracking SLIs on a continuous basis, vendors can detect when they are falling short of meeting SLO promises, and they can take measures to address the problem before it turns into an SLA violation. Given that SLOs are often made on the basis of meeting certain goals over a period of time, early detection of SLI issues makes it possible to correct those issues before they persist long enough to trigger SLO non-compliance.
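The early-detection idea can be sketched as a comparison between how much allowed downtime has been used and how much of the measurement period has passed. SRE teams often call the allowed downtime an "error budget"; that term, the function names and the simple alerting rule below are assumptions for illustration, not a prescribed method.

```python
def budget_consumed(slo_pct: float, bad_minutes: float, period_minutes: float) -> float:
    """Fraction of the period's allowed downtime that has already been used."""
    budget_minutes = period_minutes * (1 - slo_pct / 100)
    return bad_minutes / budget_minutes


def should_alert(slo_pct: float, bad_minutes: float,
                 elapsed_minutes: float, period_minutes: float) -> bool:
    """Warn when downtime is accumulating faster than the period is elapsing,
    meaning the SLO would be missed if the trend continued."""
    elapsed_fraction = elapsed_minutes / period_minutes
    return budget_consumed(slo_pct, bad_minutes, period_minutes) > elapsed_fraction


if __name__ == "__main__":
    period = 30 * 24 * 60   # 30-day period, in minutes
    elapsed = 10 * 24 * 60  # 10 days into the period
    # 20 minutes of downtime so far against a 99.9 percent SLO.
    print(should_alert(99.9, bad_minutes=20, elapsed_minutes=elapsed, period_minutes=period))
```

In this example, about 46 percent of the allowed downtime is gone after only a third of the period, so the check flags the problem well before the SLO itself is breached.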
SLAs, SLOs and SLIs share one major thing in common: They are all part of the formal process that businesses use to set and track reliability, performance and availability goals. By extension, they are central to the work performed by SREs, whose main job is to help businesses meet the goals they set within these categories.
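However, once you dive into the details, SLAs, SLOs and SLIs are clearly different types of entities: an SLA is a contract, an SLO is a specific target promised within that contract, and an SLI is a measurement of how well that target is actually being met.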
SLAs, SLOs and SLIs play a central role in shaping the reliability goals that SRE teams need to meet, as well as in helping them measure their success in achieving those goals. By striving to make clear, realistic SLAs that are based on easily measurable SLOs, then tracking SLO compliance based on SLIs, SREs set their businesses up for success on the reliability management front.