In my first blog about the “Open Business Data Lake Conceptual Framework” (O-BDL) I introduced its background and concept. In this second part I want to discuss the characteristics of an O-BDL.
An Open Business Data Lake is an enterprise capability that helps organizations embed new, disruptive “big data” solutions into existing data landscapes, which are often “data warehouse”-centric. This capability should increase organizational performance and competitiveness by setting up an associated “data”-centric and “insights-driven” strategy.
An O-BDL represents a new approach to the creation of analytical insights for the business, from the acceleration of traditional enterprise reporting through to new analytics driven by data science. It especially aims to bridge the gap between the rigidity of data warehouses and data marts and the velocity and needs of the business.
To achieve this, an O-BDL must have the following essential characteristics:
- It covers data storage and data processing (especially transformation) at scale for a lower cost than previous approaches/solutions.
- It’s open to defining the structure of the data at the time it’s needed, in the context in which it’s needed (schema on read).
- It’s open to all kinds of data and mixes many types of data in the same repository.
- It combines a batch layer to process large sets of data as well as a streaming layer for real-time (or near real-time) processing.
- It’s designed to scale by distributing both storage and processing capabilities over a cluster of machines.
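To make the "schema on read" characteristic more concrete, here is a minimal sketch in Python. It is an illustration only, not part of the O-BDL framework itself: the sample events, the `read_with_schema` helper and the two schemas are all hypothetical names invented for this example. The point it tries to show is that the raw data is stored as it arrives, and each consumer applies its own structure only when reading.

```python
import json

# Hypothetical raw events, stored exactly as they arrived.
# No schema was enforced at write time, so fields vary per record.
RAW_EVENTS = [
    '{"customer": "c1", "amount": "19.90", "channel": "web"}',
    '{"customer": "c2", "amount": "5.00"}',
    '{"customer": "c1", "amount": "7.10", "channel": "store", "coupon": "X1"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> converter) only when the data is read.

    Fields missing from a record become None; fields not named in the
    schema are ignored for this particular read.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else None
            for field, convert in schema.items()
        }


# Two consumers project the same raw data through different schemas.
revenue_schema = {"customer": str, "amount": float}
channel_schema = {"customer": str, "channel": str}

revenue_rows = list(read_with_schema(RAW_EVENTS, revenue_schema))
channel_rows = list(read_with_schema(RAW_EVENTS, channel_schema))
total_revenue = sum(row["amount"] for row in revenue_rows)
```

In this sketch the finance view and the marketing view never had to agree on a shared table layout up front, which is the contrast with a schema-on-write data warehouse.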
In this sense, the O-BDL is a platform which provides a set of common, core services that are required or useful to create new insights from data. As a platform, it helps business people and data scientists discover patterns and insights in data, and it provides services that help data scientists and IT people develop analytics that scale well. Moreover, it’s open to new processing approaches (e.g. cognitive learning) and new processing engines (e.g. distributed in-memory processing), and it provides services to operate analytics.
Looking at these characteristics, it becomes clear that an O-BDL is not:
- a traditional data federation layer. Data federation tools are able to cross-join data from multiple sources, but are normally IT-driven and managed, and lack the near real-time analytic processing power and agility needed by the users.
- a new version of the Enterprise Service Bus (ESB). Although near real-time data analytics (e.g. Complex Event Processing, or CEP) is possible, ESBs are also managed by IT and lack data-at-rest capabilities, which are needed for most of the deeper analytics.
- a High-Performance Computing (HPC) environment. In HPC environments data is moved to a large “super-computing” facility, while in an O-BDL processing is distributed and sent where pieces of data are stored.
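The difference with HPC in the last point, sending the processing to where the data is stored, can be sketched in a few lines of Python. This is a toy, single-process simulation under stated assumptions: the node names, the `PARTITIONS` dictionary and the `run_distributed` helper are hypothetical and stand in for a real cluster and its scheduler.

```python
from collections import Counter

# Hypothetical cluster: each node keeps its own partition of the data.
PARTITIONS = {
    "node-a": ["web", "store", "web"],
    "node-b": ["store", "app"],
    "node-c": ["web", "app", "app"],
}


def count_locally(records):
    """Meant to run on the node that stores the records.

    Returns a small summary instead of the records themselves.
    """
    return Counter(records)


def run_distributed(partitions, local_fn):
    """Apply local_fn to every partition and merge the partial results.

    In a real cluster local_fn would be shipped to each node; here the
    loop only simulates that. Only the summaries are combined centrally.
    """
    total = Counter()
    for records in partitions.values():
        total += local_fn(records)
    return total


channel_counts = run_distributed(PARTITIONS, count_locally)
```

The raw records never leave their partition in this sketch; only the per-node counts are merged, whereas an HPC-style setup would first copy all records to one central facility.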
In the third blog in this series I’ll go into more detail on the O-BDL domains and capabilities.