Two considerations when migrating analytics platforms to the cloud

Capgemini

2020-12-10

Insight is everything in today’s business environment. Data analysis is the key to identifying new opportunities and mitigating risk. Companies can be especially successful at this when insight platforms are moved to the cloud, where they can benefit from increased flexibility to collect, analyze, and interpret data. As a solutions architect who helps clients transform their analytics capabilities via Amazon Web Services (AWS), I’ve identified two key considerations that should be addressed to boost success.

The first is what’s known as “separation of concerns.” At a high level, an analytics/insights platform can be broken into the following components:

Ingestion
Data validation
ETL (extract-transform-load) from raw, validated data to an operational data store
Analysis/reporting, either in the raw data format or in the operational data store

Ensuring these processes are properly separated is a hallmark of clean software design, and yet it is frequently discounted by programmers in the interest of speed. Recently, I worked with an organization to transform its insights platform and discovered that its ETL process combined capabilities of the ETL software with code (stored procedures and triggers) at the database level. I used the AWS Schema Conversion Tool (SCT) to understand these interdependencies – and they were significant: the SCT report ran to more than 100 pages. Untangling them consumed most of the time and resources in our engineering effort.
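To make the idea concrete, here is a minimal Python sketch of the four components kept as separate steps. It is an illustration only, not the client's implementation: the record layout (an "id,amount" text line), the function names, and the report shape are all assumptions made for this example.

```python
from typing import Iterable


def ingest(lines: Iterable[str]) -> list[dict]:
    """Ingestion: turn raw 'id,amount' lines into records. No checks, no business logic."""
    records = []
    for line in lines:
        rec_id, amount = line.strip().split(",")
        records.append({"id": rec_id, "amount": amount})
    return records


def validate(records: list[dict]) -> list[dict]:
    """Data validation: keep only records whose amount parses as a number."""
    valid = []
    for record in records:
        try:
            float(record["amount"])
        except ValueError:
            continue  # a real platform would likely quarantine the bad record
        valid.append(record)
    return valid


def transform(records: list[dict]) -> list[dict]:
    """ETL: reshape validated raw records for the (hypothetical) operational store."""
    return [
        {"id": r["id"], "amount_cents": round(float(r["amount"]) * 100)}
        for r in records
    ]


def report(rows: list[dict]) -> dict:
    """Analysis/reporting: reads the operational shape only."""
    return {"count": len(rows), "total_cents": sum(r["amount_cents"] for r in rows)}


def run_pipeline(lines: Iterable[str]) -> dict:
    """Each stage sees only the previous stage's output, so any one can be replaced."""
    return report(transform(validate(ingest(lines))))
```

Because each function has one job, a change to the validation rules or the reporting format touches one function rather than logic spread across ETL software and database triggers.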

You may see this problem at a simpler level. In the new insights platform for this client, data is dropped into respective S3 buckets and triggers are used to commence the data-validation process. This process uses AWS Step Functions to sequence AWS Lambda functions in a serverless way. It’s pretty cool, and clean. But notice the word “trigger”? We’re relying on a very specific feature of one data store – and of course it’s used widely across AWS implementations – to kick off our data-validation process. What happens if, in the future, the platform won’t or can’t use S3 as the initial place where data lands? A better approach would be to have a separate, third-party listener, working in an event-processing manner, notice that data has been stored in S3 and then commence the data validation. Amazon SQS would work well here.
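One way to picture the listener approach is the Python sketch below. It is a sketch under stated assumptions, not the platform's actual code: the DataArrived event, the SFTP adapter, and start_validation are hypothetical names, and the standard-library queue.Queue stands in for a managed queue such as Amazon SQS. The only AWS-specific detail is the shape of the S3 event notification (a "Records" list with bucket name and URL-encoded object key), which is confined to a single adapter function.

```python
import queue
from dataclasses import dataclass
from urllib.parse import unquote_plus


@dataclass(frozen=True)
class DataArrived:
    """Source-neutral event (hypothetical): validation only needs to know where the data is."""
    uri: str


def from_s3_notification(notification: dict) -> list[DataArrived]:
    """Adapter: the only code that knows what an S3 event notification looks like."""
    events = []
    for record in notification.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        events.append(DataArrived(uri=f"s3://{bucket}/{key}"))
    return events


def from_sftp_upload(host: str, path: str) -> DataArrived:
    """A second, hypothetical source added later without touching validation."""
    return DataArrived(uri=f"sftp://{host}{path}")


def start_validation(event: DataArrived) -> str:
    """Stand-in for kicking off the validation workflow."""
    return f"validating {event.uri}"


def drain(events: queue.Queue) -> list[str]:
    """The listener: consume queued events and commence validation for each."""
    results = []
    while not events.empty():
        results.append(start_validation(events.get()))
    return results
```

If the landing zone ever moves away from S3, only a new adapter is needed; the listener and the validation workflow stay as they are.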

Architects are often placed under pressure by our non-technical colleagues to develop and implement solutions quickly. It’s more expedient – and more cost-effective – to craft a single piece of code that does what’s required. At least in the short term.

The problem, though, is that overburdening components makes future changes very difficult. So, what was a quick solution in the short term becomes inefficient over the long term. This can be a difficult concept for non-technical people in the enterprise to grasp, especially when they want their solution “yesterday.”

An example from another architecture discipline helps explain the advantage of separating functions:

The Dee & Charles Wyly Theater in Dallas is a performance space built in 2009 and designed to easily transform between different stage and audience seating styles. A small crew can reconfigure it in just a few hours because, instead of designing the theater as a single piece, the architects developed seating, stage elements, and other components as discrete modules. The planning and construction took longer than building a standard, non-configurable venue, but the payoff is a space that adapts to various needs – without having to build a whole new theater for each.

When migrating to the cloud, architects can draw on AWS’s extensive library of services, which encourages properly separating concerns and also reduces development time, because we don’t have to build everything from scratch. But it’s still necessary to architect the solution correctly.

The second, and related, consideration arises frequently when re-engineering the data mart reporting component – namely, the need to separate the design and construction of the analysis routines and reports from the underlying system. Perhaps a new line of business will require subjecting data to different analytical processes, or a business unit will require a different form of reporting. By architecting the solution from the outset so that each component’s role is tightly defined, we can quickly adapt the platform to these evolving business needs, including other data-analysis use cases in other business lines. AWS provides tools to do this: Athena – a serverless, interactive query service – is a good example.
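As a rough illustration of keeping reports apart from the underlying system, the Python sketch below has a report depend only on a small query interface. Everything named here is an assumption for the example: QueryEngine, SqliteEngine, the sales table, and sales_by_region are hypothetical, and an in-memory SQLite database stands in for a service such as Athena so the sketch runs on its own.

```python
import sqlite3
from typing import Protocol


class QueryEngine(Protocol):
    """The only thing a report knows about the underlying system (hypothetical interface)."""

    def run(self, sql: str) -> list[tuple]: ...


class SqliteEngine:
    """Local stand-in engine; an Athena-backed class could expose the same run() method."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def run(self, sql: str) -> list[tuple]:
        return self._conn.execute(sql).fetchall()


def sales_by_region(engine: QueryEngine) -> dict[str, float]:
    """A report: it states what it needs and leaves the how to the engine."""
    rows = engine.run(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return {region: total for region, total in rows}


def demo_engine() -> SqliteEngine:
    """Build a tiny in-memory data set for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 2.5)],
    )
    return SqliteEngine(conn)
```

A new business line can then add its own report function, or the platform can swap the engine, without either change rippling into the other.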

In summary, architects must always be cognizant of tight interdependencies between or even within system components, and work to separate them. Not doing so exposes the design to brittleness – making it unable to adapt and change without significant effort. As an architect, I always think in terms of the future and in terms of abstractions. The most valuable question I can ask while designing is, “What if?”

For more information, feel free to reach out or visit our AWS page.

Capgemini is a sponsor of AWS re:Invent 2020. Join us! Learn more about the event and our AWS offerings here.

Author

Matt Daynos

Matt is a pre-sales solution architect and manager helping clients of all kinds with their digital transformation and cloud computing initiatives.