
What does it take to be a site reliability engineer? 

Aliasgar Muchhala
30th May 2024

The role of a site reliability engineer consists of taking a holistic look at the entire IT application landscape of an enterprise from an end user’s perspective. Site reliability engineers ensure that the individual systems involved in fulfilling an end user’s business requirement do so effectively, so that the end user can accomplish their task with minimal problems.

To meet this expectation, a site reliability engineer is typically expected to have a broad knowledge of the entire spectrum of IT systems, while combining a deep awareness of technical infrastructure, operating systems, and computer networking with an attention to higher level service level objectives (SLOs).

Site reliability engineers need to focus on solving problems by building software components and features in ways that prevent problems from recurring, or at least make them less painful to overcome. Because of this, it’s often recommended that site reliability engineers come from a software engineering background with an awareness of operations, rather than the other way around.

Some technical skills that a site reliability engineer should possess include data analysis and visualization, non-functional testing and chaos engineering, and architectural oversight and governance. Experience with Agile, incident management, DevOps, and release management is also a big plus, as is experience with infrastructure, both cloud and software.

With the recent advances in AI, site reliability engineers are also increasingly expected to design intelligent IT estates that leverage generative AI (GenAI) to enhance the reasoning and decision-making capabilities of self-healing systems. This adds a new dimension to a site reliability engineer’s technical skill set.

Additionally, soft skills such as problem-solving, teamwork, working under pressure, and strong written and verbal communication are key to success.

You may think that this makes a site reliability engineer sound like someone straight out of a Marvel movie casting call. But remember, it’s not necessary to have all these skills in one individual. After all, even Marvel relies on the collective efforts of superhero teams like the X-Men or the Avengers, who together meet the goal of saving the planet.

Aliasgar Muchhala

Global SRE Lead and Global Architects Lead
A strategic, focused, business-oriented leader and Capgemini Level 3 Certified Chief Architect, with an impressive record in architecting and building cutting-edge systems that leverage new-age technologies to enable clients to transform their business, reduce costs, and improve efficiency.