The Open Business Data Lake Standard, Part IV

A reference architecture describing standards that help organizations set up an “insights-driven” strategy.

In my previous blog posts (Part I, Part II and Part III) about the ‘Open Business Data Lake Conceptual Framework’ (O-BDL), I introduced its background, concept, characteristics and platform capabilities. In this fourth part I want to compare a Data Lake with other data processing platforms.

Due to its characteristics, a Data Lake is a special type of processing platform. This can best be shown by comparing it with the following existing platforms:

  • Data Federation (ETL)
  • Enterprise Service Bus (ESB)
  • High-Performance Computing (HPC)

Data Federation (ETL)
An O-BDL is not a data federation processing platform. While data federation tools are able to cross-join data from multiple sources, those tools are normally IT-driven and IT-managed, and they lack the near real-time analytic processing power and agility that users need.

Enterprise Service Bus (ESB)
An O-BDL is not a new version of the Enterprise Service Bus (ESB). While some ESB vendors have been touting near real-time data analytics (i.e., Complex Event Processing, or CEP) for years, those capabilities are, again, centrally managed by IT, and most of the deeper analytic needs require data-at-rest analysis as well, not just data-in-motion analytics.
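To make the distinction between data-in-motion and data-at-rest analytics concrete, here is a minimal Python sketch. It is illustrative only and not part of the O-BDL standard: the function names, the rolling-average rule and the sample readings are all assumptions made up for this example.

```python
from collections import deque
from statistics import mean
from typing import Deque, Iterable, List


def in_motion_alerts(events: Iterable[float], window: int, threshold: float) -> List[int]:
    """CEP-style check (hypothetical rule): look only at the last `window`
    events as they stream by and flag positions where their average
    exceeds `threshold`."""
    recent: Deque[float] = deque(maxlen=window)  # older events fall out automatically
    alerts: List[int] = []
    for i, value in enumerate(events):
        recent.append(value)
        if len(recent) == window and mean(recent) > threshold:
            alerts.append(i)
    return alerts


def at_rest_baseline(history: List[float]) -> float:
    """Data-at-rest analysis: needs the full stored history, not just a window."""
    return mean(history)


# Made-up sample readings, used both as a stream and as stored history.
readings = [10.0, 12.0, 11.0, 50.0, 55.0, 13.0]
alerts = in_motion_alerts(readings, window=2, threshold=30.0)
baseline = at_rest_baseline(readings)
```

The streaming check only ever sees a small window, so it can react quickly but cannot answer questions about the whole history; the baseline calculation is the opposite. Deeper analysis typically needs both.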

High-Performance Computing (HPC)
An O-BDL is not a High-Performance Computing (HPC) platform. An O-BDL relies on different architecture principles and software frameworks. While in HPC environments data is moved to a large “super-computing” facility, in an O-BDL processing is distributed and sent to where the pieces of data are stored.
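The difference between moving data to the computation and moving the computation to the data can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular HPC or Data Lake product works: `Node`, `hpc_style_sum` and `lake_style_sum` are hypothetical names, and "values moved" simply counts how many values would have to cross the network.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    """A hypothetical storage node holding one partition of the data."""
    name: str
    records: List[int]


def hpc_style_sum(nodes: List[Node]) -> Dict[str, int]:
    """Move every raw record to one central facility, then compute there."""
    central = [r for node in nodes for r in node.records]  # all raw data travels
    return {"result": sum(central), "values_moved": len(central)}


def lake_style_sum(nodes: List[Node]) -> Dict[str, int]:
    """Run the computation on each node; only one partial result per node travels."""
    partials = [sum(node.records) for node in nodes]  # computed where the data lives
    return {"result": sum(partials), "values_moved": len(partials)}


# Made-up partitions spread over three nodes.
nodes = [Node("a", [1, 2, 3]), Node("b", [4, 5]), Node("c", [6, 7, 8, 9])]
hpc = hpc_style_sum(nodes)
lake = lake_style_sum(nodes)
```

Both approaches produce the same answer, but in this toy model the second one moves three partial sums instead of nine raw records. The gap grows with the size of the partitions, which is the intuition behind sending processing to the data.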

While an O-BDL platform is different from the three platforms mentioned, it combines well with them: it can sit on top of any of them, abstracting away the problems of performance and of working with disparate data sources and targets. Data federation platforms may also be used to create simplified views of the data stored in an O-BDL for business users. Finally, the same physical infrastructure (clusters) could be used as an HPC environment or as an O-BDL.

In the fifth blog in this series I’ll discuss the key concepts of an O-BDL and describe how it should work.
