What is SRE?
SRE (Site Reliability Engineering) is a software engineering approach to IT operations. SRE uses software as a tool to manage systems, solve problems, and automate operational tasks.
SRE is useful for creating scalable and highly reliable software systems. Because systems are managed through code rather than by hand, the approach scales better and is more sustainable for system administrators who manage thousands to hundreds of thousands of machines.
The concept was created by Ben Treynor Sloss of Google’s engineering team.
Benefits of using SRE
SRE helps teams release new features on schedule while keeping those features reliable for users.
Role of an SRE engineer
The role is unusual in that it calls for one of two backgrounds: a software developer with additional operations experience, or a system administrator or IT operator with software development skills.
An SRE team is responsible for service availability, latency, change management, emergency response, and capacity management in production, as well as how code is deployed, configured, and monitored.
The team defines the system's reliability requirements through service-level agreements (SLAs), service-level indicators (SLIs), and service-level objectives (SLOs), and uses them to decide which new features to release and when.
SRE does not expect 100% reliability and prepares plans for failures.
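As a rough illustration of how an SLI can be checked against an SLO, the sketch below computes an availability SLI from request counts and derives how many failures the SLO still tolerates (often called an error budget, a term this article does not itself use). The function name `evaluate_slo`, its parameters, and the request-based definition of the SLI are assumptions made for this example, not part of any particular SRE tool.

```python
from fractions import Fraction
import math


def evaluate_slo(total_requests: int, failed_requests: int, slo: str) -> dict:
    """Compare a measured availability SLI against an SLO target.

    `slo` is passed as a decimal string (for example "0.999") so it can be
    converted to an exact fraction without floating-point rounding.
    All names here are hypothetical and chosen for illustration.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    target = Fraction(slo)
    if not 0 < target < 1:
        raise ValueError("slo must be strictly between 0 and 1")

    # SLI: fraction of requests that succeeded in the measurement window.
    sli = Fraction(total_requests - failed_requests, total_requests)

    # Failures the SLO tolerates in this window, rounded down.
    allowed_failures = math.floor(total_requests * (1 - target))

    return {
        "sli": float(sli),
        "slo_met": sli >= target,
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
    }


if __name__ == "__main__":
    report = evaluate_slo(total_requests=1_000_000, failed_requests=400, slo="0.999")
    print(report)
```

Under these assumptions, a positive `remaining_failures` suggests there is room to take the risk of shipping a new feature, while a value at or below zero suggests prioritizing reliability work first. Real teams typically define SLIs over rolling time windows and more than one signal, such as latency alongside availability.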
Difference from DevOps
DevOps is an approach to company culture, automation, and platform design for improving business value and responsiveness through fast delivery of high-quality services.
SRE can be considered an implementation of DevOps.
| | SRE | DevOps |
|---|---|---|
| Main focus | Scalability, operational metrics, automation | Integration of development and deployment processes |
| Responsible team | Development teams interested in operations | Operations teams interested in development |
| Metrics | Maximum/minimum service-level objectives (SLOs) | Mainly system telemetry |
| Typical adopters | IT service companies in cloud-native environments | Companies moving from on-premises to cloud |