In my previous blog posts (Part I,  Part II,  and Part III) about the ‘Open Business Data Lake Conceptual Framework (O-BDL) I introduced its background, concept, characteristics and platform capabilities. In Part IV) I compared a Data Lake with other data processing platforms. In this fifth part I’ll discuss the key concepts of an O-BDL and describes how it should work.

To understand how an O-BDL should work, the following process diagram (applying The ArchiMate® Enterprise Architecture Modeling Language) is used.

An O-BDL ingests data (batch, micro-batch  as well as real-time) from other applications (through feeds) and data sources (through ingestion). Most data items (structured and semi- and unstructured) go straight into the main distributed – thus scalable – data store preserving the original structure. Some data items, however, are processed on arrival since the resulting insight is needed immediately. Data stored in an O-BDL is called data-at-rest and data that is processed without storing is called data-in-motion.

An O-BDL can generate insights from different assemblies of the ingested data. These insights can be translated into actions in multiple ways that make sense from a business point of view. A service layer takes care of delivering insights at different points of action into business processes, applications, etc.

To turn data into insights an O-BDL employs two types of data processing capabilities:

  • A real-time processing capability that creates real-time insights as data is ingested
  • Iteratively-designed distillation steps that progressively enrich, combine, or execute analytics with stored data to assemble new, “more valuable” datasets until it is considered a business-relevant insight

Different functions will manage the O-BDL and use of its content . The key roles involved in an O-BDL:

  • The platform owner (and operator) provides (and operates) O-BDL platform services.
  • The business use-case owners are responsible for the added business value of insights and actions.
  • The data owners are responsible for defining and applying the proper data policy, through the unified data management services of an O-BDL.
  • The business use-case contributors (e.g., “data scientists”) are responsible for discovering, experimenting with, and validating new processing capabilities (analytics) that are:
    • Relevant for the business use-case
    • Innovative, smart, efficient, like a positive “hack”
    • Consistent with the mathematical and statistical state-of-the-art.

Now it’s clear how an O-BDL should work, it’s time to define possible business scenario’s which can make use of an O-BDL. This will be discussed in the sixth blog post.