EA reboot – agile EA and an approach for “antifragile architecture”

Disruption and velocity have changed the landscape of almost everything within the enterprise.

Historically, enterprise architecture has been a relatively long-running effort. It takes time to complete the current-state architecture and inventory, then develop the future-state target architecture and the transformation program to achieve it. This typically takes one to two years and, in the end, is only partially successful, because markets change fast and the business has adapted and moved on after the first six to nine months. Enterprise architecture needs to adapt and move faster. The good news is that we’ve seen this pattern before in software development. It too had a long-running process in which the requirements had changed by the time the software was delivered. The solution was a closer tie to the business and a highly iterative prototyping and development cycle: the Agile method. The same method can be applied, at least in part, to the long-running EA challenges. There are several key factors for success:

  • Focus – current-state architecture frequently preoccupies architects’ time. This is because architects want to know the current state thoroughly, so that the impact of any future change on current systems can be understood perfectly. While this is a useful activity, it provides limited value to the enterprise. Instead, focus on the changes that are required to take the architecture forward. For legacy systems, focus on the systems that are the source of the most issues or tickets, using the Pareto or 80/20 rule: roughly 80% of the issues are typically caused by about 20% of the systems
  • Front office – the front office (sales, customer feedback, partner feedback, etc.) tends to be the focus of change, especially in digital transformations. Back-office systems tend to change less often in general, maybe quarterly. Pay attention to front-office requirements as a digital transformation priority but maintain the linkage to the back office so they don’t fall out of sync
  • Integration – I don’t mean this in the traditional sense of enterprise application integration (EAI), but rather a higher form of integration at the business level, where business architecture plays a role. This is critical for changes in business models and in partner relations. However, business executives often fail to share these developments with the IT team, and as a result IT and the architecture fall behind or go out of sync
  • Rhythm – stay in sync with Agile IT approaches and the business. EA team members need to be part of the projects and programs implementing change. If there’s a daily Scrum meeting, then a member of the EA team (or the extended team) needs to participate. When an architecture review board (ARB) meets, the business liaisons and the solution architects from projects and programs need to participate. Architecture is about communication and collaboration. Fancy diagrams are not a substitute for clear dialogue and conversation
  • Guidance – probably the most important thing an EA program can provide the enterprise is guidance. EA can provide guidance on business model changes, technology choices, and investments. EA also provides an understanding of why these changes are a crucial part of the transformation process. This matters because the why equals purpose, and purpose motivates people and organizations
  • Tools – utilizing improved frameworks such as TOGAF 9.2 and the TOGAF Library as well as new and improved modeling tools and repositories will help accelerate artifact development and maintenance
  • Automation – automating the population of architecture repositories and some modeling practices will accelerate the EA process, improve quality and allow for more time and effort on high-value knowledge work like the transformation program.
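
The Focus point above suggests using ticket volume to decide which legacy systems deserve attention. As a minimal sketch of that idea, and not a prescribed method, the Python snippet below ranks systems by ticket count and returns the smallest group that accounts for a chosen share of all tickets. The function name `pareto_systems`, the 0.8 threshold, and the ticket data are all hypothetical and chosen for illustration only.

```python
from collections import Counter


def pareto_systems(tickets, threshold=0.8):
    """Return the systems (most tickets first) that together account for
    at least `threshold` of all tickets."""
    counts = Counter(tickets)
    total = sum(counts.values())
    selected, covered = [], 0
    for system, n in counts.most_common():
        if covered / total >= threshold:
            break  # the systems selected so far already cover the target share
        selected.append(system)
        covered += n
    return selected


# Hypothetical ticket log: one entry per ticket, naming the affected system.
ticket_log = (
    ["billing"] * 50 + ["crm"] * 30 + ["hr"] * 8 + ["intranet"] * 7 + ["wiki"] * 5
)

print(pareto_systems(ticket_log))  # prints ['billing', 'crm']
```

In this made-up data set, two of five systems produce 80 of 100 tickets, so those two would be the first candidates for architectural attention. Real ticket data will rarely split this cleanly, and the threshold is a judgment call.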

These adaptations to the EA approach will help us keep up with business velocity, but will that be enough? Don’t plan on it! We live in a rapidly changing VUCA (volatile, uncertain, complex, ambiguous) world. My colleague Gunnar Menzel’s blog Architecture in a VUCA World is an interesting read on this topic.

What we also need is a way to stay ahead, not just keep up.

Introducing the concept of “antifragile architecture” [i]

What is fragile? Something that breaks under stress, like a china vase.

So, what is antifragile? Something that doesn’t break under stress, right? Wrong. Something that doesn’t break under stress is robust.

Something that’s antifragile isn’t just robust; it gets better with stress, up to a certain point. An example is the human body: when you stress the body with manual labor or weight lifting, the muscles get stronger, tendons toughen, and bones get denser. Conversely, if you remove the stressors, the body becomes more fragile. So one important concept of antifragility is stressing the system to improve the system. Another is options. Incorporating options into the system allows you to take manageable risks, even though this runs contrary to our desire to plan everything and quantify risk. Instead of quantifying risk, accept that bad things are going to happen and build optionality, such as redundancy, into the system. Don’t design the system never to fail; design it for rapid recovery. A key principle of antifragility is to distribute the system and enable emergent behavior with feedback mechanisms, so that the system can sense and respond to change, as opposed to a top-down, centralized system. The difference is critical: distributed decisions and behavior are, by nature, more resistant to catastrophic failure than centralized ones.
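
To make the ideas of redundancy and feedback a little more concrete, here is a minimal Python sketch, under stated assumptions, of a pool of redundant providers that fails over when one breaks and uses the failures it observes to decide which provider to try first next time. The class `FailoverPool` and the `primary` and `secondary` handlers are hypothetical names invented for this illustration; a real system would add timeouts, health checks, and a way for failed providers to recover their standing.

```python
class FailoverPool:
    """Redundant providers with a simple feedback loop on observed failures."""

    def __init__(self, providers):
        self.providers = dict(providers)  # name -> callable
        self.failures = {name: 0 for name in self.providers}

    def call(self, request):
        # Sense and respond: try the providers with the fewest failures first.
        order = sorted(self.providers, key=lambda name: self.failures[name])
        for name in order:
            try:
                return name, self.providers[name](request)
            except Exception:
                self.failures[name] += 1  # record the failure and move on
        raise RuntimeError("all redundant providers failed")


# Hypothetical providers: the primary is broken, the secondary works.
def primary(request):
    raise ConnectionError("primary data center unreachable")


def secondary(request):
    return f"secondary handled {request}"


pool = FailoverPool([("primary", primary), ("secondary", secondary)])
print(pool.call("order-1"))  # fails over to the secondary provider
print(pool.call("order-2"))  # now tries the secondary provider first
print(pool.failures)
```

The point of the sketch is the shape rather than the detail: the failure of one option is expected, the request still completes, and the system comes out of the event with slightly better information than it had before.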

Therefore, an “antifragile architecture” will get better with stress and change. Some examples of antifragile architectural attributes and principles are as follows:

  • Design for recovery, not failure – rather than designing the system to prevent failure, design it for recovery, learning, and refactoring, and stress test it frequently. Don’t try to predict when bad things are going to happen; instead, architect systems that can recover quickly, and use chaos engineering to test and stress them. A particularly relevant topic for recovery versus failure is a data breach. You can architect your systems from code through the firewall to prevent data breaches, but a breach is still going to happen. Everything from code to systems needs to be able to sense and respond to data breach threats and quarantine the effects. There are some excellent tools to help address and learn from these events: AI, machine learning, and threat analytics, to name a few. The organization also needs to be prepared to deal with the after-effects of a data breach, learn from it, and refactor based on what it has learned.
  • Stressing the system – chaos engineering takes the complexity of a system as a given and tests it holistically by simulating extreme, turbulent, or novel conditions and observing how the system responds and performs. What happens if you lose connectivity to all your systems in one data center or on one cloud platform? What if network traffic spikes dramatically due to a DDoS attack? What happens if both occur at the same time? Two important principles for achieving antifragility are to fail fast and to adapt. Design the systems to be tested frequently in production, but limit the scope of each test so that customers are not affected. Testing production systems in this way requires that they be architected for fast recovery. It also stresses the systems and the people managing them frequently, so that when something goes wrong unexpectedly, both the systems and the people recover faster.
  • Options – in general, options are key to an antifragile system and architecture. Options allow for adaptable future decision making in a VUCA environment. Options also provide a path for experimentation, which is another key principle of antifragility.
  • Options – open APIs (application programming interfaces) provide a simple standard for accessing an application or its data, which increases the options for usage. Many legacy systems have proprietary APIs or access methods that constrain their use and reduce the options for access. Consider developing an open API layer on top of the proprietary layer so that more systems can access it.
  • Risk in the game – this is about architects more than architecture, but the approach helps the architecture. Architects, even enterprise architects, must have something to win and something to lose in relation to the architecture and the projects they support. As an EA, or any architect for that matter, you don’t get to sit back and pronounce standards and guidelines that must be followed, or else! You as the EA must be involved and own what you’ve committed to: standards, guidelines, architecture, and so on. Be humble in taking feedback and graceful in giving it, and help the firm adopt and adapt.
  • Microservices – address three key tenets of an antifragile architecture:
    1. A distributed set of services that run independently – imagine breaking down your ERP system into many business services that run independently and are decoupled from a central system. This distributed system is more tolerant of failure: when one microservice fails, it doesn’t necessarily impact the others. If the machine learning instance that handles complementary and cross-sell lookups is down, I can still order the product and complete the order.
    2. Increased accessibility and options using open APIs that are not hard-wired into a centralized system – you can address the microservice through an open API which makes it more accessible than traditional monolithic proprietary approaches.
    3. Enabling and encouraging experimentation by allowing more access in multiple ways to stitch microservices together in ways that are either too complex or too expensive to do with monolithic applications.
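
The order example in the microservices point, combined with the earlier advice to stress the system, can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not a chaos engineering tool: the functions `recommendations`, `place_order`, and `chaos_run`, the `fail_rate` parameter, and the product name are all hypothetical. The sketch injects random faults into an optional cross-sell service and checks that the core ordering path still completes.

```python
import random


class ServiceDown(Exception):
    """Raised when a (simulated) microservice is unavailable."""


def recommendations(product, fail_rate=0.0, rng=random):
    """Hypothetical cross-sell service; `fail_rate` injects faults for testing."""
    if rng.random() < fail_rate:
        raise ServiceDown("recommendation service unavailable")
    return [f"{product}-accessory"]


def place_order(product, fail_rate=0.0, rng=random):
    """Core ordering path: cross-sell lookups are optional and never block it."""
    order = {"product": product, "status": "confirmed",
             "cross_sell": [], "degraded": False}
    try:
        order["cross_sell"] = recommendations(product, fail_rate, rng)
    except ServiceDown:
        order["degraded"] = True  # recover by continuing without cross-sell
    return order


def chaos_run(trials=1000, fail_rate=0.3, seed=42):
    """Stress the system with injected faults and count the outcomes."""
    rng = random.Random(seed)
    results = [place_order("laptop", fail_rate, rng) for _ in range(trials)]
    confirmed = sum(r["status"] == "confirmed" for r in results)
    degraded = sum(r["degraded"] for r in results)
    return confirmed, degraded


confirmed, degraded = chaos_run()
print(f"{confirmed} orders confirmed, {degraded} completed in degraded mode")
```

Every order is confirmed regardless of how often the optional service fails; the failures show up only as degraded orders. That count is the feedback you would watch over time when deciding where to refactor.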

For additional readings on antifragile topics in systems and software consider these:

In conclusion, it’s clear that enterprise architecture is evolving through agile methods and antifragility to help the enterprise address the current challenges and the disruptions ahead. Welcome to the EA reboot! It’s also clear that EA is more than a role; it’s a program that both assists the business and drives it toward its vision and desired results.

[i] The discussion of “antifragile” draws on a concept I learned from Nassim Taleb’s book Antifragile: Things That Gain from Disorder, an excellent book that I highly recommend.