A common challenge on projects involving the full end-to-end data lifecycle is reconciling conflicting transactional and analytical requirements. This is exacerbated where there are real-time requirements, such as in digital experience, digital manufacturing, and IoT projects.
A typical approach is to use a relational database for operational data and to replicate that data into a data warehouse optimized for analytical queries. The warehouse is often modelled dimensionally, e.g. using a star schema.
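To make the replication step concrete, here is a minimal sketch, assuming hypothetical tables (`orders` in the operational store, `dim_product` and `fact_sales` in the warehouse) and a hypothetical high-water-mark `etl` function. SQLite in-memory databases stand in for both stores purely to keep the example self-contained; this is not a recommendation of SQLite for either role. The point it illustrates is that the analytical side only sees data after a separate copy step has run.

```python
import sqlite3

# Operational (OLTP) store: the normalized table the application writes to (hypothetical schema).
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, qty INTEGER, price REAL)")

# Analytical store: a minimal star schema with one dimension table and one fact table.
dw = sqlite3.connect(":memory:")
dw.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, product_key INTEGER, qty INTEGER, revenue REAL);
""")

def etl(last_id):
    """Batch-copy orders with id > last_id into the star schema; return the new high-water mark."""
    rows = oltp.execute(
        "SELECT id, product, qty, price FROM orders WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    for order_id, product, qty, price in rows:
        # Look up (or create) the surrogate key for this product in the dimension table.
        dw.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (product,))
        (product_key,) = dw.execute(
            "SELECT product_key FROM dim_product WHERE name = ?", (product,)
        ).fetchone()
        dw.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                   (order_id, product_key, qty, qty * price))
    dw.commit()
    return rows[-1][0] if rows else last_id

def revenue_by_product():
    """Analytical query: it only sees what the last ETL run copied into the warehouse."""
    return dw.execute(
        "SELECT p.name, SUM(f.revenue) FROM fact_sales f "
        "JOIN dim_product p ON p.product_key = f.product_key "
        "GROUP BY p.name ORDER BY p.name"
    ).fetchall()

# Transactions land in the operational store first...
oltp.executemany("INSERT INTO orders (product, qty, price) VALUES (?, ?, ?)",
                 [("bolt", 10, 0.5), ("nut", 4, 0.25)])
oltp.commit()
before = revenue_by_product()   # [] : the warehouse has not been refreshed yet
watermark = etl(0)              # 2
after = revenue_by_product()    # [('bolt', 5.0), ('nut', 1.0)]
```

The gap between `before` and `after` is the replication delay in miniature: every new order is invisible to analytics until the next `etl` run, and the data now exists in two places.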
While this is a proven approach, it has inherent inefficiencies. Relational databases suffer from an impedance mismatch: the data storage format is often very different from the data consumption format, so reads are inefficient. The cost of normalization and index maintenance on every save limits write speed. The rigid, predefined structure of the data can also impede product innovation in the applications that generate and consume it.
NoSQL databases aim for greater speed and efficiency by relaxing one or more of the ACID properties (atomicity, consistency, isolation, durability) of a relational database. The large number of different NoSQL systems is a testament to the range of trade-offs that can be made. Where there is a high-volume, time-based data stream, as in IoT and Industrial IoT (IIoT) use cases, more specialized time series databases are required. For some use cases, we drop the operational database altogether and use big data technologies.
None of these approaches addresses the more fundamental inefficiencies: data duplication, the complexity of the systems and pipelines that must be built and supported, and the inherent delay in replicating data from the source or operational database to the analytical database. These delays don’t just affect usability; they can also undermine the real-time analytics often needed in a digital environment. Hence the emergence of the “Translytical” database, which offers transactional characteristics together with the ability to run near-real-time analytics on the same data. This is achieved using in-memory databases combined with other modern database techniques.
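The core idea, one copy of the data serving both atomic writes and immediate analytical queries, can be illustrated with a toy sketch. It assumes a hypothetical IIoT `readings` table and hypothetical `record_batch` / `avg_temp_since` helpers, and again uses an in-memory SQLite database only to stay self-contained. SQLite is not a Translytical platform; real products add far more (memory/storage tiering, scale-out, replication). The sketch shows only the absence of an ETL step and replication lag.

```python
import sqlite3

# A single in-memory store serves both transactional writes and analytical reads.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE readings (
                  machine TEXT, ts INTEGER,
                  temp REAL CHECK (temp BETWEEN -50 AND 500))""")
db.execute("CREATE INDEX idx_readings_machine_ts ON readings (machine, ts)")

def record_batch(batch):
    """Atomically insert (machine, ts, temp) readings: all rows commit or none do."""
    with db:  # commits on success, rolls back if an exception escapes the block
        db.executemany("INSERT INTO readings VALUES (?, ?, ?)", batch)

def avg_temp_since(since_ts):
    """Analytical query over the same live data: no ETL step, no replication lag."""
    return db.execute(
        "SELECT machine, AVG(temp), COUNT(*) FROM readings "
        "WHERE ts >= ? GROUP BY machine ORDER BY machine",
        (since_ts,),
    ).fetchall()

record_batch([("press-1", 100, 70.0), ("press-1", 101, 72.0), ("press-2", 100, 65.0)])
print(avg_temp_since(100))  # [('press-1', 71.0, 2), ('press-2', 65.0, 1)]

try:
    # The second reading violates the CHECK constraint, so the whole batch is rolled back.
    record_batch([("press-2", 102, 66.0), ("press-2", 103, 9999.0)])
except sqlite3.IntegrityError:
    pass
print(avg_temp_since(100))  # unchanged: [('press-1', 71.0, 2), ('press-2', 65.0, 1)]
```

A committed reading is visible to the aggregate query immediately, while a failed batch leaves no partial rows behind, which is the combination of transactional and analytical behaviour the Translytical category promises at much larger scale.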
A recent Forrester report evaluated the Translytical data platform providers. It is no surprise to see Oracle, SAP, Microsoft and IBM as leaders in this space, alongside vendors including GigaSpaces, MemSQL, Redis and MongoDB (see the report for the full list). It is also interesting to see products such as the Microsoft Azure Time Series Insights (TSI) preview adopting Translytical capabilities, with data held in memory and also stored on disk for the long term.
Points to consider when selecting one of these databases include: performance for your workload, replication capabilities, consistency model (ACID vs BASE), supported data structures and data types, data tiering between memory and storage, and autoscaling for peak loads.
One of the challenging and interesting aspects of being an Architect is keeping up to date with these database advances, so that we choose the most appropriate technologies rather than simply using the tools we are familiar with because they have worked for us in the past. For this, it’s crucial that we understand real-life examples of where these technologies have worked, and this is where the Capgemini Architect Community is key. I’m currently involved in a Digital Manufacturing IIoT project with Translytical requirements, where we are using Azure TSI. I also recently saw an interesting presentation by Yoav Einav from GigaSpaces, in which he highlighted a recent successful Capgemini project for an automotive client that uses GigaSpaces.
The Forrester Wave™: Translytical Data Platforms, Q4 2019
For more information on how Translytical databases can help your organization, please reach out to me via my Expert Connect profile.