The Open Business Data Lake Standard, Part II


A reference architecture describing standards that help organizations set up an “insights-driven” strategy.

In my first blog about the ‘Open Business Data Lake Conceptual Framework’ (O-BDL), I introduced its background and concept. In this second part, I want to discuss the characteristics of an O-BDL.

An Open Business Data Lake is an enterprise capability that helps organizations embed new, disruptive “big data” solutions into existing data landscapes, which are often “data warehouse”-centric. By supporting an associated “data”-centric and “insights-driven” strategy, this capability should increase organizational performance and competitiveness.

An O-BDL represents a new approach to creating analytical insights for the business, ranging from accelerated traditional enterprise reporting to new analytics driven by data science. In particular, it aims to bridge the gap between the rigidity of data warehouses and data marts on the one hand and the pace and needs of the business on the other.

To achieve this, an O-BDL must have the following essential characteristics:

  • It covers data storage and data processing (especially transformation) at scale, at a lower cost than previous approaches and solutions.
  • It’s open to defining the structure of the data at the time it’s needed, in the context in which it’s needed (schema on read).
  • It’s open to all kinds of data and mixes many types of data in the same repository.
  • It combines a batch layer for processing large data sets with a streaming layer for real-time (or near real-time) processing.
  • It’s designed to scale by distributing both storage and processing capabilities over a cluster of machines.
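The schema-on-read and batch-plus-streaming characteristics can be illustrated with a small Python sketch. It is a toy, not a reference implementation: the in-memory RAW_ZONE, the sales_schema parser, and the sales fields are all hypothetical. The way batch and streaming results are merged at query time follows the general “lambda architecture” pattern, which is one possible design rather than something the O-BDL framework prescribes.

```python
import json
from collections import Counter
from typing import Callable, Iterable, Optional

# Hypothetical raw zone: records are kept exactly as they arrived. No schema is
# enforced at write time, so JSON and CSV lines sit side by side.
RAW_ZONE = [
    '{"customer": "c1", "channel": "web", "amount": "20.00"}',
    "c2,store,5.00",
    '{"customer": "c1", "channel": "app", "amount": "7.50", "promo": "X"}',
    "not a valid sales record",
]


def sales_schema(line: str) -> Optional[dict]:
    """Schema applied at read time: project a raw line onto
    (customer, channel, amount), or return None if it does not fit."""
    try:
        if line.startswith("{"):
            rec = json.loads(line)
            customer, channel, amount = rec["customer"], rec["channel"], rec["amount"]
        else:
            customer, channel, amount = line.split(",")
        return {"customer": customer, "channel": channel, "amount": float(amount)}
    except (ValueError, KeyError):
        # json.JSONDecodeError is a subclass of ValueError.
        return None


def read_with_schema(raw: Iterable[str],
                     schema: Callable[[str], Optional[dict]]) -> list:
    """Interpret raw lines through the given schema, skipping non-matching ones."""
    return [rec for rec in map(schema, raw) if rec is not None]


def batch_view(raw: Iterable[str]) -> Counter:
    """Batch layer: recompute revenue per channel over the whole raw zone."""
    totals = Counter()
    for rec in read_with_schema(raw, sales_schema):
        totals[rec["channel"]] += rec["amount"]
    return totals


class SpeedLayer:
    """Streaming layer: incrementally track revenue for events that arrived
    after the last batch run."""

    def __init__(self) -> None:
        self.totals = Counter()

    def on_event(self, line: str) -> None:
        rec = sales_schema(line)
        if rec is not None:
            self.totals[rec["channel"]] += rec["amount"]


def query(batch: Counter, speed: SpeedLayer) -> dict:
    """Serve a combined answer: batch totals plus recent streaming updates."""
    merged = Counter(batch)
    merged.update(speed.totals)
    return dict(merged)


if __name__ == "__main__":
    speed = SpeedLayer()
    speed.on_event('{"customer": "c3", "channel": "web", "amount": "2.50"}')
    print(query(batch_view(RAW_ZONE), speed))
    # {'web': 22.5, 'store': 5.0, 'app': 7.5}
```

Note that nothing was rejected when the raw lines were stored; the structure (and the decision to ignore the extra “promo” field or skip the malformed line) only comes into play when a particular question is asked of the data.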

In this sense, the O-BDL is a platform that provides a set of common core services that are required or useful for creating new insights from data. As a platform, it helps business people and data scientists discover patterns and insights in data, and it provides services that help data scientists and IT staff develop analytics that scale well. Moreover, it’s open to new processing approaches (e.g., cognitive learning) and new processing engines (e.g., distributed in-memory processing), and it provides services for operating analytics.

Looking at these characteristics, it becomes clear that an O-BDL is not:

  • a traditional data federation layer. Data federation tools can cross-join data from multiple sources, but they are normally IT-driven and managed, and they lack the near real-time analytic processing power and agility that users need.
  • a new version of the Enterprise Service Bus (ESB). Although near real-time data analytics (e.g., Complex Event Processing, or CEP) is possible with an ESB, ESBs are also managed by IT and lack data-at-rest capabilities, which most deeper analytics require.
  • a High-Performance Computing (HPC) environment. In HPC environments, data is moved to a large “super-computing” facility, whereas in an O-BDL, processing is distributed and sent to where the pieces of data are stored.
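The contrast in the last bullet, moving processing to the data rather than moving data to the processing, can be sketched in a few lines of Python. The cluster, node names, and values are hypothetical, and threads only simulate remote workers; distributed engines such as Hadoop MapReduce or Spark achieve the real effect by scheduling tasks on or near the nodes that hold the relevant data blocks.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each node stores its own partition of the data.
CLUSTER = {
    "node-1": [3, 8, 1, 9],
    "node-2": [4, 4, 7],
    "node-3": [6, 2, 5, 10, 1],
}


def local_aggregate(partition):
    """Runs next to the partition and returns only a small (sum, count) summary."""
    return sum(partition), len(partition)


def mean_moving_processing(cluster):
    """Data-lake style: send the computation to each partition, combine partials."""
    # Threads stand in for workers running on the nodes that hold the data.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, cluster.values()))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    values_moved = 2 * len(partials)  # one (sum, count) pair per node
    return total / count, values_moved


def mean_moving_data(cluster):
    """Contrast (as the bullet describes HPC): copy every record to one central facility first."""
    central = [x for partition in cluster.values() for x in partition]
    return sum(central) / len(central), len(central)


if __name__ == "__main__":
    print(mean_moving_processing(CLUSTER))  # (5.0, 6)
    print(mean_moving_data(CLUSTER))        # (5.0, 12)
```

Both approaches return the same mean, but in this toy example the first one moves 6 values (one summary pair per node) instead of all 12 records. With realistic data volumes, that difference is what lets the distributed approach scale.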

In the third blog in this series, I’ll go into more detail on the O-BDL domains and capabilities.
