
SRE Principles & Tips

Capgemini
Feb 9, 2024

Many organizations are either implementing site reliability engineering (SRE) or considering it. Below, I share some guidelines and tips on how to get started with SRE.

How to apply SRE practices to your project?

SRE is not a set of methods or prescribed solutions. Instead, it offers overarching principles that can guide production management. The right solutions will be different for each unique business. However, there are common SRE principles and best practices that can be applied:

  • Assess the current state of reliability – Check system performance against key business and technical KPIs. You may also start with the four SRE golden signals: latency, traffic, error rate, and resource saturation.
  • Define acceptable levels of reliability – Discuss acceptable levels of reliability with business and technical leadership, and define what is achievable. Avoid being too optimistic here. Take small steps toward the bigger goal.
  • Empower management to take on predetermined levels of risk – Educate management about what the SRE team is trying to achieve, what risks may be involved, their potential impact, and the mitigation plan.
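  • Build robust service level objectives (SLOs) and service level agreements (SLAs) – SLOs are usually derived from SLAs and set somewhat stricter, so that missing an SLO is an early warning rather than a contract violation. Think from the end users’ perspective when defining SLOs, and refine them over time.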
  • Create error budgets – An error budget follows directly from an SLO: it is the amount of unreliability the SLO permits. Spend it wisely and avoid unplanned deployments. It is advisable to hold some of the budget in reserve for unplanned downtime.
  • Monitor services and act on areas for improvement – Monitoring how your services are performing is crucial for the SRE team and for management. You can’t fix what you can’t see. Setting the right SLOs gives the SRE team time to work on issues before they breach SLAs.
  • Eliminate areas of high toil – Automate the repetitive tasks that the support/DevOps team performs. This frees up their time for development and automation work.
  • Document release standards and educate all stakeholders – Define repeatable processes and standards, then document and publish them.
  • Bring in simplicity – Simplify processes and architecture wherever possible. Simpler systems are easier to understand and maintain.
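
The link between an SLO and its error budget comes down to simple arithmetic, so a small example may help. The following is a minimal Python sketch, not a prescribed implementation: the `ErrorBudget` class, the `can_deploy` and `allowed_downtime_minutes` helpers, the 99.9% target, and the 20% reserve are all illustrative assumptions rather than values taken from any particular project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Request-based error budget for one SLO window (hypothetical helper)."""

    slo_target: float     # e.g. 0.999 for a 99.9% success-rate SLO (assumed value)
    total_requests: int   # requests served during the SLO window
    failed_requests: int  # requests that did not meet the SLO

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO allows to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # Fraction of the budget still unspent; negative once the SLO is breached.
        if self.allowed_failures == 0:
            return 0.0
        return 1.0 - self.failed_requests / self.allowed_failures


def can_deploy(budget: ErrorBudget, reserve: float = 0.2) -> bool:
    """Allow a risky release only while more than `reserve` of the budget is left.

    The 20% default is an assumption standing in for the budget kept aside
    for unplanned downtime.
    """
    return budget.remaining_fraction > reserve


def allowed_downtime_minutes(slo_target: float, window_days: int) -> float:
    """Time-based view of the same budget: minutes of downtime the SLO permits."""
    return (1.0 - slo_target) * window_days * 24 * 60


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    print(f"Allowed failures: {budget.allowed_failures:.0f}")
    print(f"Budget remaining: {budget.remaining_fraction:.0%}")
    print(f"Safe to deploy:   {can_deploy(budget)}")
    print(f"Downtime allowed: {allowed_downtime_minutes(0.999, 30):.1f} min per 30 days")
```

Under these assumptions, a 99.9% target over a 30-day window leaves roughly 43 minutes of permitted downtime. A check like `can_deploy` could gate releases so that the reserve stays available for unplanned incidents; the right threshold depends on each team's risk appetite.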

Author

Vikas Baviskar

SRE Solution Architect
Vikas is an SRE Solution Architect driving the SRE offer within the North America SBU. He helps accounts create the right SRE solutions for their clients, and he also works as an SRE Solution Architect for one of the NA clients. As application and infrastructure reliability become growing concerns across industries, Vikas’ interest lies in driving innovation in SRE solutioning to design SRE solutions better suited to different needs.