In my previous article (“Improve the quality of reporting with analytical MDM”, 01/05/14), I showed the benefits of the MDM building block for the traditional analytical world (the data warehouse).

What about MDM with regard to Big Data (and vice versa)?
Many questions arise: is there a link between MDM and Big Data? If so, what is it? What does Big Data change for MDM? How can MDM evolve to support the new uses brought by Big Data?

First of all, let’s recall a few characteristics of MDM and Big Data:

Apart from analytical MDM, traditional MDM is primarily operational. Big Data, by contrast, is fully analytical (Analytics). Does this mean there is no synergy between MDM and Big Data?

Let’s place MDM and Big Data on a diagram and consider the possible links between the two. I see three main links, detailed below.

1. Big Data feeds MDM

Within the mass of Big Data, we can find high-value information about clients, products, and other master data. If this information is not yet known within the company, it can enrich the MDM hub. One example is information about personal hobbies retrieved from social networks (e.g. “John Doe’s hobby is golf”). As this information comes from outside the company, it should be treated with caution.
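To make the “treated with caution” point concrete, here is a minimal Python sketch, assuming a simple in-memory hub: an external attribute is added only if the master record does not already hold it, and it keeps its source and an unverified flag so that data stewards can review it later. The names `MasterRecord`, `AttributeValue` and `enrich` are hypothetical and do not correspond to any particular MDM product.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeValue:
    """A single attribute value plus where it came from."""
    value: str
    source: str      # e.g. "crm" (internal) or "social_network" (external)
    verified: bool   # externally sourced values start unverified


@dataclass
class MasterRecord:
    """Hypothetical golden record held in the MDM hub."""
    master_id: str
    attributes: dict[str, AttributeValue] = field(default_factory=dict)


def enrich(record: MasterRecord, name: str, value: str, source: str) -> bool:
    """Add an externally sourced attribute only if the hub does not know it yet.

    Returns True if the record was enriched, False if the attribute already existed.
    """
    if name in record.attributes:
        # Data already known in the company is kept; external data never overwrites it.
        return False
    record.attributes[name] = AttributeValue(value=value, source=source, verified=False)
    return True


# Usage: enrich John Doe's record with a hobby found on a social network.
john = MasterRecord("C-001", {"last_name": AttributeValue("Doe", "crm", verified=True)})
print(enrich(john, "hobby", "golf", "social_network"))      # True: new, unverified attribute
print(enrich(john, "last_name", "Dough", "social_network"))  # False: internal value kept
```

Keeping the provenance next to the value, rather than in a separate log, lets any consumer of the hub decide how far to trust each attribute.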

2. MDM feeds Big Data (or feeds the dimensions of the Big Data facts)

At the data warehouse level, there are many challenges, especially around data modeling. In all cases, MDM must play an important role by providing the data-model backbone to which the Big Data facts are bound. This is what Capgemini calls the POLE (cf. the white paper “Mastering Big Data”).
All of this is quite similar to traditional analytical MDM. We will not go into the new data-modeling and storage techniques here (e.g. vertical databases).
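The binding role of MDM can be sketched in Python as follows. This is a minimal illustration under assumed data, not Capgemini’s POLE model: the MDM hub is assumed to publish a cross-reference from source-system keys to master identifiers, each raw fact is attached to a master identifier through it, and aggregates can then be computed along master-data attributes. `CUSTOMER_DIM`, `XREF`, `bind_facts` and `amount_by_segment` are hypothetical names, and the records are invented sample data.

```python
from collections import defaultdict

# Hypothetical master dimension maintained by the MDM hub: master_id -> attributes.
CUSTOMER_DIM = {
    "M1": {"name": "John Doe", "segment": "Premium"},
    "M2": {"name": "Jane Roe", "segment": "Standard"},
}

# Hypothetical cross-reference published by MDM: (source system, local id) -> master_id.
XREF = {
    ("web", "cookie-42"): "M1",
    ("shop", "C-001"): "M1",
    ("shop", "C-002"): "M2",
}

# Raw Big Data facts, each identified only by a source-system key.
FACTS = [
    {"source": "web", "local_id": "cookie-42", "amount": 0.0, "event": "page_view"},
    {"source": "shop", "local_id": "C-001", "amount": 120.0, "event": "purchase"},
    {"source": "shop", "local_id": "C-002", "amount": 35.5, "event": "purchase"},
    {"source": "web", "local_id": "cookie-99", "amount": 0.0, "event": "page_view"},
]


def bind_facts(facts, xref):
    """Attach a master_id to each fact; facts with no known master entity are set aside."""
    bound, orphans = [], []
    for fact in facts:
        master_id = xref.get((fact["source"], fact["local_id"]))
        if master_id is None:
            orphans.append(fact)
        else:
            bound.append({**fact, "master_id": master_id})
    return bound, orphans


def amount_by_segment(bound, dim):
    """Aggregate fact amounts along a master-data attribute (the customer segment)."""
    totals = defaultdict(float)
    for fact in bound:
        totals[dim[fact["master_id"]]["segment"]] += fact["amount"]
    return dict(totals)


bound, orphans = bind_facts(FACTS, XREF)
print(amount_by_segment(bound, CUSTOMER_DIM))  # {'Premium': 120.0, 'Standard': 35.5}
print(len(orphans))  # 1
```

The orphan list makes visible the facts that the master data cannot yet anchor, instead of silently dropping them from the analysis.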

3. MDM feeds Big Data (or helps to navigate within Big [Raw] Data)

One of the major new uses of Big Data is exploratory analysis. The user does not always know exactly what he is looking for; he may discover insights by navigating through the raw data, assisted by mathematical algorithms (e.g. written in the R language). To avoid getting lost in the ocean of data, he must navigate while keeping sight of reference points. This is where exploration must draw on the MDM database.
One of the major challenges here is linking the master entities found in Big Data to those in the MDM hub. To do so, MDM tools will have to evolve, for example by optimizing their multi-attribute fuzzy matching engines.
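As an illustration of multi-attribute fuzzy matching, here is a minimal Python sketch that uses the standard library’s `difflib.SequenceMatcher` as a stand-in similarity measure. Commercial matching engines typically use richer techniques (normalization rules, phonetic keys, blocking), so this is not how any particular MDM tool works. Each attribute is compared separately, the similarities are combined with weights, and a link is accepted only above a threshold. The attribute names, `WEIGHTS`, the 0.8 threshold and the sample records are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings after trimming and lowercasing."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0  # a missing value gives no evidence of a match
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical weights: how much each attribute contributes to the overall score.
WEIGHTS = {"name": 0.5, "city": 0.2, "email": 0.3}


def match_score(candidate: dict, master: dict, weights: dict = WEIGHTS) -> float:
    """Weighted multi-attribute score between a Big Data entity and a master record."""
    total = sum(weights.values())
    score = sum(
        w * similarity(candidate.get(attr, ""), master.get(attr, ""))
        for attr, w in weights.items()
    )
    return score / total


def best_match(candidate: dict, masters: dict, threshold: float = 0.8):
    """Return (master_id, score) of the best master record at or above threshold, else None."""
    scored = [(mid, match_score(candidate, rec)) for mid, rec in masters.items()]
    if not scored:
        return None
    mid, score = max(scored, key=lambda pair: pair[1])
    return (mid, score) if score >= threshold else None


# Usage: link a social-network profile to the MDM hub.
MASTERS = {
    "M1": {"name": "John Doe", "city": "Paris", "email": "john.doe@example.com"},
    "M2": {"name": "Jane Roe", "city": "Lyon", "email": "jane.roe@example.com"},
}
profile = {"name": "Jon Doe", "city": "paris", "email": "john.doe@example.com"}
match = best_match(profile, MASTERS)
print(match[0] if match else "no match")  # M1
```

Weighting several attributes, rather than relying on the name alone, is what lets a misspelled name (“Jon Doe”) still be linked when the city and e-mail agree; tuning the weights and the threshold is one place where such an engine could be optimized.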

To conclude, I would like to recall that MDM’s primary goal is to provide companies with a single reference repository for their master data and to ensure alignment across the operational information system.
Beyond that, regarding Big Data and Analytics, MDM has an important role to play because it provides the compass for navigating the ocean of data. Without MDM, we risk drowning. The ability to link Big Data to master data is the major challenge.
Big Data is often defined by its three Vs (Volume, Variety, Velocity). Many analysts and software vendors also speak of a fourth V: Veracity. MDM, as a discipline, is fully part of this fourth V.