They’re often seen but rarely heard. When operations team members are on the floor, they’re almost always just a blur, diligently working on tickets, administering changes, or attending to outages. I don’t think I’ve ever seen an operations team member yakking around the water cooler or in the office kitchen. They’re not the gossiping type and are almost always more than 100% utilized, with little time for niceties. Adding to the demanding nature of their jobs is the complexity of being heavily outsourced to multiple vendors across a truly global delivery model.
Coordination among these widely distributed teams is a real challenge, even under normal circumstances. In times of global economic crisis, it can seem almost impossible. And with budgets shrinking across the board, there’s pressure to maintain the same level of service while reducing operational costs. However, there is hope: automation and something we call “No-Ops.”
No more tears: shifting from a traditional, multi-tiered approach to automated No-Ops
Traditional IT operations use a tiered operating model. Tier-1 teams (apps, server, DB, storage, network) are the first responders and handle the bulk of the tickets and actions; they make up almost 50% of the IT operations team. Tier-2 teams spend all their time on change management: break fixes, HW maintenance, patching and upgrades, assisting tier 1, and acting as a conduit between tiers 1 and 3. They make up almost 30% of the overall operations team.
Tier-3 teams are premium resources who are very hard to find in the market. They are subject matter experts in their fields (server, DB, storage, network, etc.) and are called in for serious issues. They also spend time on future IT strategy, capacity planning, and other NFR (non-functional requirements) tasks. They make up the remaining 20% of the overall operations team.
If an enterprise follows a solid DevOps approach from day one, from requirements through deployment, should it expect quality issues or incidents in production? There could still be break fixes for HW, or storage and network maintenance, which may require some downtime. But if we implement proactive monitoring and address these issues early, will applications need any downtime at all? Do we need this large operations team and an “eyes-on-glass” helpdesk? Maybe not.
No-Ops: shifting left and shifting right to drive down to zero incidents
DevOps best practices dictate that teams build in quality from day one, starting from the requirements phase and working all the way through to deployment. It’s also recommended that teams practice shift left: they fully test their code, including security testing, in the Dev environment before promoting it to higher environments (QA, STG, and PROD). Implementing this quality mindset across all teams brings you closer to zero technical debt and zero issues in the PROD environment.
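As an illustration of the shift-left idea, here is a minimal sketch of a promotion gate: a build advances from Dev to QA, STG, and PROD only while every check in the current environment passes. The `Pipeline` class, the stage names, and the placeholder checks `unit_tests` and `security_scan` are hypothetical stand-ins, not part of any specific CI/CD product.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Environment order taken from the text: Dev -> QA -> STG -> PROD.
STAGES = ["DEV", "QA", "STG", "PROD"]

# A check receives a build identifier and returns True when it passes.
Check = Callable[[str], bool]


@dataclass
class Pipeline:
    # Hypothetical mapping of stage name -> checks that must pass in that stage.
    checks: dict[str, list[Check]] = field(default_factory=dict)

    def promote(self, build: str) -> Optional[str]:
        """Return the last stage whose checks all passed, or None if DEV fails.

        A build moves to the next environment only after every check in the
        current one passes (a stage with no checks passes trivially), so
        defects are stopped as far left as possible.
        """
        reached = None
        for stage in STAGES:
            if not all(check(build) for check in self.checks.get(stage, [])):
                break  # stop promotion at the first failing stage
            reached = stage
        return reached


# Placeholder checks standing in for real unit tests and a security scan.
def unit_tests(build: str) -> bool:
    return True


def security_scan(build: str) -> bool:
    return not build.endswith("-vuln")


pipeline = Pipeline(checks={"DEV": [unit_tests, security_scan], "QA": [unit_tests]})
print(pipeline.promote("app-1.4.0"))       # PROD
print(pipeline.promote("app-1.4.1-vuln"))  # None
```

Putting the security scan in the DEV stage is the point of the design: a vulnerable build never reaches QA, let alone PROD.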
Microservices or API architectures in the cloud enable testing in the PROD environment before code is released to end customers. This practice is called shift right. Deployment patterns such as blue-green, A/B, canary, and feature toggles allow code to be deployed to PROD and new changes and features to be fully tested there. Most operational issues happen after a change is implemented in PROD, mainly because of code defects that were promoted into PROD. Shift-left and shift-right practices can help you drive the number of production incidents down to zero, which in turn raises the question of whether you still need a large operations team in a now incident-free PROD environment!
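To make the canary and feature-toggle patterns concrete, the sketch below assigns each user to a stable bucket from 0 to 99 by hashing their ID, then routes a chosen percentage of users to the canary release or enables a flag for them. The function names, the salt strings, and the 5% rollout are assumptions for illustration; real deployments usually do this in a load balancer, service mesh, or feature-flag service.

```python
import hashlib


def bucket(user_id: str, salt: str) -> int:
    """Deterministically map a user to a bucket in [0, 100)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def route(user_id: str, canary_percent: int) -> str:
    """Canary release: send a stable slice of users to the new version."""
    return "canary" if bucket(user_id, "release") < canary_percent else "stable"


def feature_enabled(user_id: str, flag: str, rollout_percent: int) -> bool:
    """Feature toggle: salting with the flag name gives each flag its own slice."""
    return bucket(user_id, flag) < rollout_percent


users = [f"user-{i}" for i in range(1000)]
canary_share = sum(route(u, 5) == "canary" for u in users) / len(users)
print(f"canary share: {canary_share:.1%}")  # roughly 5%; exact value depends on the hash
```

Hashing rather than random sampling keeps each user on the same version across requests, so defects seen in the canary slice can be traced and the rollout percentage raised gradually.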
No-Ops continuous operations: automating your helpdesk with bots and virtual assistants
About half of all helpdesk production tickets and requests are of low complexity: for example, user management and routine app, DB, server (VM), and network requests. Some of these tasks can be handled through a self-service portal, so users can serve themselves, while others can be automated with bots or virtual assistants. At Capgemini, our automation team has over 600 bots for servicing these IT operations requests, and they’re used across some of our largest clients.
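A first step toward this kind of automation is triage: deciding whether a ticket can go to self-service, to a bot, or needs a person. The sketch below uses simple phrase matching; the `ROUTES` table, the `Ticket` fields, and the queue names are hypothetical and are not Capgemini's bot catalogue or any ITSM tool's API. A production system would more likely use a trained classifier.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical routing rules: phrases in the ticket summary -> target queue.
ROUTES = [
    (("password reset", "unlock account"), "self-service"),
    (("add user", "remove user", "restart vm", "extend disk", "open port"), "bot"),
]


@dataclass
class Ticket:
    id: str
    summary: str


def triage(ticket: Ticket) -> str:
    """Return the first queue whose phrases appear in the summary, else 'human'."""
    text = ticket.summary.lower()
    for phrases, queue in ROUTES:
        if any(phrase in text for phrase in phrases):
            return queue
    return "human"  # anything unrecognized goes to an engineer


tickets = [
    Ticket("INC1001", "Password reset for jdoe"),
    Ticket("INC1002", "Restart VM app01 after hang"),
    Ticket("INC1003", "Database deadlock in billing service"),
]
print(Counter(triage(t) for t in tickets))
```

Counting tickets per queue this way gives a rough measure of how much volume could move off the human queues.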
Do you need to patch?
With PaaS and container platforms such as Kubernetes, we can build a new container image with the patches or upgrades applied, roll it out, and take down old versions using a zero-downtime deployment (ZDD) or a blue-green deployment. This reduces the number of operations tickets for patches and upgrades, moving your IT further into No-Ops mode.
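As one hedged example of a zero-downtime rollout, the sketch below builds a Kubernetes `Deployment` manifest that uses the `RollingUpdate` strategy (a rolling update, not blue-green). With `maxUnavailable: 0` and `maxSurge: 1`, Kubernetes starts one patched pod at a time and removes an old pod only after the new one passes its readiness probe. The app name, image tag, `/healthz` path, and port 8080 are hypothetical.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 3) -> dict:
    """Build a Deployment that swaps in patched pods without losing capacity."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Never drop below `replicas` ready pods; allow one extra during rollout.
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Pods receive traffic only after this probe succeeds.
                        "readinessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
                    }]
                },
            },
        },
    }


manifest = deployment_manifest("billing-api", "registry.example.com/billing-api:1.8.2-patched")
# kubectl accepts JSON, so this output could be piped to `kubectl apply -f -`.
print(json.dumps(manifest, indent=2))
```

Patching then becomes a matter of changing the image tag and re-applying the manifest, rather than opening a maintenance ticket for each server.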
Here’s what continuous operations look like: proactive monitoring and a common backlog
If all your IT elements are proactively monitored using log monitoring tools such as Splunk, APM tools such as Dynatrace, and container monitoring tools such as Sysdig, potential incidents in the PROD environment can be detected before they happen. Detection works either by setting known thresholds or by using predictive algorithms. Remediation actions can then be coded into bots, so problems are resolved before they turn into production incidents: another strong case for No-Ops.
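The two detection styles can be sketched in a few lines. The example below flags a metric as `breach` when it already exceeds a known threshold, and as `predicted` when a least-squares linear trend will cross the threshold within a given horizon. Plain linear extrapolation is a deliberately simple stand-in for the predictive algorithms mentioned above, and the disk-usage samples are made up; this does not use the Splunk, Dynatrace, or Sysdig APIs.

```python
def linear_fit(samples: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of value = slope * t + intercept.

    Needs at least two samples with distinct timestamps.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    return slope, mean_v - slope * mean_t


def check_metric(samples: list[tuple[float, float]], threshold: float, horizon: float) -> str:
    """Return 'breach', 'predicted', or 'ok' for a series of (time, value) samples."""
    latest_t, latest_v = samples[-1]
    if latest_v >= threshold:  # known-threshold rule
        return "breach"
    slope, intercept = linear_fit(samples)
    if slope > 0:
        eta = (threshold - intercept) / slope  # time at which the trend hits the threshold
        if eta - latest_t <= horizon:
            return "predicted"
    return "ok"


# Disk usage (%) sampled hourly, rising 2 points per hour.
disk = [(0, 70.0), (1, 72.0), (2, 74.0), (3, 76.0)]
print(check_metric(disk, threshold=90.0, horizon=8))  # predicted: trend hits 90% at hour 10
```

A `predicted` result is where a bot would step in, for example by extending the volume, before users ever see an outage.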
Capgemini has proven accelerators that analyze your tickets in ITSM tools such as ServiceNow to find root causes and apply automation solutions from our catalogue of bots. Problem management analysis is completed for frequently recurring issues, and the resulting items are added to your common backlog to act upon.
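The core of recurring-issue analysis can be sketched as grouping tickets and counting repeats. The example below groups a simplified ticket export by configuration item and category, keeps groups that occur at least `min_count` times, and turns them into backlog items. The field names `ci` and `category`, the threshold of 3, and the sample tickets are assumptions; this is not ServiceNow's schema or Capgemini's accelerator.

```python
from collections import Counter


def recurring_problems(tickets: list[dict], min_count: int = 3) -> list[tuple[tuple[str, str], int]]:
    """Return (ci, category) groups seen at least min_count times, most frequent first."""
    counts = Counter((t["ci"], t["category"]) for t in tickets)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Hypothetical, simplified ticket export.
tickets = [
    {"id": "INC001", "ci": "app01", "category": "disk full"},
    {"id": "INC002", "ci": "app01", "category": "disk full"},
    {"id": "INC003", "ci": "db02", "category": "lock timeout"},
    {"id": "INC004", "ci": "app01", "category": "disk full"},
    {"id": "INC005", "ci": "web03", "category": "cert expiry"},
]

# Each recurring group becomes a backlog item to root-cause and, ideally, automate.
backlog = [
    {"problem": f"{category} on {ci}", "occurrences": n, "action": "evaluate for bot automation"}
    for (ci, category), n in recurring_problems(tickets)
]
print(backlog)
```

Feeding these items into the same backlog the development teams use keeps permanent fixes competing for priority alongside new features.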
Figure 1.0: No-Ops – Continuous operations in action
When you implement No-Ops practices within Continuous Delivery and Continuous Operations, traditional operations tasks are automated or handled through a self-service portal, driving manual actions and interventions down to zero. The overall result for your business is a steady drive toward zero touch and zero defects. We call this the Power of Zero.
No-Ops and the Power of Zero: Why zero can be a really big number
The Power of Zero expands on this concept of No-Ops and is an actionable framework for solidifying your legacy IT estate as a launchpad for your digital transformation, so that you attain all the speed and agility needed for a truly digital enterprise. Essentially, the Power of Zero brings you a future state with zero defects and tickets, zero touch, zero application debt, and zero business interruption, all leading to zero innovation latency across your entire business.