The Open Business Data Lake Standard, Part III


A reference architecture describing standards that help organizations set up an “insights-driven” strategy.

In my first and second blog posts about the “Open Business Data Lake Conceptual Framework (O-BDL)” I introduced its background, concept, and characteristics. In this third part I discuss the capabilities of an O-BDL.

The O-BDL is a platform that provides a set of common capabilities that are required or useful for creating new insights from data, regardless of the type of insight (descriptive, diagnostic, predictive, or prescriptive; see diagram below).

To create these insights the O-BDL is divided into the following four domains:

  • Data ingestion domain
  • Data discovery domain
  • Data assembly domain
  • Insights creation domain

Data ingestion domain
The goal of this domain is to ingest data and events from different sources, store them as-is (i.e. “schema on read”), and make them searchable. To achieve this, metadata is extracted, classified, and indexed.
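
The ingestion steps described here can be illustrated with a minimal Python sketch. It is not part of the O-BDL standard: the `RawStore` class, its in-memory dictionaries, and the metadata fields are assumptions chosen only to show data being stored unchanged, described by extracted metadata, indexed, and interpreted on read.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RawStore:
    """Hypothetical in-memory stand-in for raw storage plus a metadata index."""
    objects: dict = field(default_factory=dict)  # object id -> raw bytes, stored as-is
    catalog: dict = field(default_factory=dict)  # object id -> extracted metadata
    index: dict = field(default_factory=dict)    # keyword -> set of object ids

    def ingest(self, payload: bytes, source: str, tags: list) -> str:
        # Store the payload unchanged; no schema is imposed at write time.
        object_id = hashlib.sha256(payload).hexdigest()[:12]
        self.objects[object_id] = payload
        # Extract and classify some simple metadata (illustrative fields only).
        self.catalog[object_id] = {
            "source": source,
            "size_bytes": len(payload),
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "tags": sorted(tags),
        }
        # Index the metadata so the data set can be found later.
        for keyword in [source, *tags]:
            self.index.setdefault(keyword.lower(), set()).add(object_id)
        return object_id

    def search(self, keyword: str) -> list:
        return sorted(self.index.get(keyword.lower(), set()))

    def read_as_json(self, object_id: str) -> dict:
        # "Schema on read": structure is interpreted only when the data is consumed.
        return json.loads(self.objects[object_id])


store = RawStore()
payload = b'{"meter": 7, "flow": 1.5}'
object_id = store.ingest(payload, source="sensor-feed", tags=["water", "IoT"])
print(store.search("iot") == [object_id])
print(store.read_as_json(object_id)["flow"])
```

In a real platform the dictionaries would be replaced by object storage and a search or catalog service; the point of the sketch is only the order of the steps.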

Data discovery domain
The goal of this domain is to discover possible and relevant data patterns. To achieve this, ingested data sets are searched, compared, assembled, and stored as new data sets. If needed, synthetic data is generated and added to the assembled data set, or data sets residing outside the Data Lake are added by virtualizing them. The new data sets are used to define and test machine learning algorithms. The assembled data sets are stored in a format that depends on the intended usage (i.e. “schema on write”).

Data assembly domain
The goal of this domain is to prepare data sets to be used for creating insights. To achieve this, ingested data sets are searched, compared, assembled, and stored as new data sets. If needed, synthetic data can be generated and added to the assembled data set. It might also be relevant to connect to data sets residing outside the O-BDL by virtualizing them. The quality of the data is assessed and improved (cleansing, standardization, harmonization, etc.), after which the data set is stored in a format that depends on the intended usage (i.e. “schema on write”). That format can be an Enterprise Data Warehouse or Data Mart, or a SQL, key-value, document, graph, or column database. Finally, the metadata is extracted, classified, and indexed, and the assembled data set is made available for distribution.
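
As a rough illustration of the cleansing and “schema on write” steps, the sketch below standardizes a few records and writes them to a table with a fixed schema. Everything in it is an assumption made for the example: the sample records, the `cleanse` and `assemble` helpers, the `customer_revenue` table, and the choice of SQLite as the target store.

```python
import sqlite3
from typing import Optional

# Hypothetical raw records as they might come out of the ingestion domain.
RAW_RECORDS = [
    {"customer": " Alice ", "country": "nl", "revenue": "1200.50"},
    {"customer": "BOB", "country": "NL", "revenue": "n/a"},        # unparseable revenue
    {"customer": "alice", "country": "Nl", "revenue": "1200.50"},  # duplicate after cleansing
]


def cleanse(record: dict) -> Optional[tuple]:
    """Standardize one record; return None if it fails the quality check."""
    try:
        revenue = float(record["revenue"])
    except ValueError:
        return None
    name = record["customer"].strip().title()
    country = record["country"].strip().upper()
    return (name, country, revenue)


def assemble(records: list, conn: sqlite3.Connection) -> int:
    """Write cleansed records into a fixed schema and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_revenue ("
        "customer TEXT NOT NULL, country TEXT NOT NULL, revenue REAL NOT NULL, "
        "UNIQUE (customer, country, revenue))"
    )
    rows = [row for row in map(cleanse, records) if row is not None]
    # INSERT OR IGNORE drops exact duplicates through the UNIQUE constraint.
    conn.executemany("INSERT OR IGNORE INTO customer_revenue VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM customer_revenue").fetchone()[0]


conn = sqlite3.connect(":memory:")
print(assemble(RAW_RECORDS, conn))
```

The schema is decided before the data is written, which is the opposite of the as-is storage used at ingestion; a different target (document, graph, or key-value store) would change the write step but not the overall flow.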

Insights creation domain
The goal of this domain is to create any type of insight (i.e. descriptive, diagnostic, predictive, or prescriptive). To achieve this, assembled data sets are searched for and consumed within reports, algorithms, and/or simulations. When data comes directly from user input, natural language processing capabilities are required. The output can be visualized, distributed, or embedded into a business process (e.g. a rules engine) and is stored in a format that depends on the intended usage (i.e. “schema on write”).

To keep track of the changes made to the data between ingestion and actual use, and of who made them, data lineage and monitoring as well as data authorization capabilities are part of the O-BDL. Finally, data archiving capabilities should be applied when data is no longer used or when archiving is required to comply with legislation.
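
One way to picture lineage and authorization working together is an append-only log that only accepts changes from permitted users. The sketch below is an assumption for illustration, not something the O-BDL prescribes: `LineageLog`, its `permissions` mapping, and the data set identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageLog:
    """Hypothetical append-only log of who did what to which data set."""
    events: list = field(default_factory=list)
    permissions: dict = field(default_factory=dict)  # user -> set of allowed actions

    def record(self, user: str, action: str, source_id: str, target_id: str) -> None:
        # Authorization check before the change is accepted.
        if action not in self.permissions.get(user, set()):
            raise PermissionError(f"{user} may not perform '{action}'")
        self.events.append({
            "user": user,
            "action": action,
            "source": source_id,
            "target": target_id,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def trace(self, dataset_id: str) -> list:
        """Walk back from a data set towards its origin, newest step first."""
        steps = []
        seen = set()  # guards against cycles in the recorded events
        current = dataset_id
        while current not in seen:
            seen.add(current)
            event = next(
                (e for e in reversed(self.events) if e["target"] == current), None
            )
            if event is None:
                break
            steps.append(
                f'{event["action"]} by {event["user"]}: {event["source"]} -> {current}'
            )
            current = event["source"]
        return steps


log = LineageLog(permissions={"ana": {"cleanse", "assemble"}})
log.record("ana", "cleanse", "raw-1", "clean-1")
log.record("ana", "assemble", "clean-1", "mart-1")
for step in log.trace("mart-1"):
    print(step)
```

The sketch follows only a single upstream source per step; a data set assembled from several sources would need a graph rather than a chain.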

In the fourth blog post in this series I’ll position the O-BDL domains within CRISP-DM (Cross-Industry Standard Process for Data Mining) and compare the O-BDL with other data processing platforms.
