Data lake or data swamp



The term data lake comes up often in discussions of big data. It was introduced by James Dixon, CTO of Pentaho, and refers to gathering all available data so that it can be used in a big data strategy. Dixon had a point: collecting all your data can be part of your big data strategy. However, you need to make sure your data lake does not turn into a data swamp. Gartner warns about the data lake approach in the post “Gartner Says Beware of the Data Lake Fallacy” on the Gartner website.
 
“Data lakes therefore carry substantial risks. The most important is the inability to determine data quality or the lineage of findings by other analysts or users that have found value, previously, in using the same data in the lake. By its definition, a data lake accepts any data, without oversight or governance. Without descriptive metadata and a mechanism to maintain it, the data lake risks turning into a data swamp. And without metadata, every subsequent use of data means analysts start from scratch.”
 
The message of this passage, and of the Gartner post as a whole, is not that the data lake is a bad idea; it can be very useful. The point is that the data you put into your data lake must be managed properly, so that its quality is high enough to use and so that it makes sense to the people working with it. Without proper management and tagging of the data, you just have a large set of meaningless bits. Being able to place the data in the correct context is what allows it to be used in a value-adding process.
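To make the point about tagging concrete, here is a minimal sketch of an ingestion gate that refuses any dataset lacking descriptive metadata and records which datasets it was derived from. It is illustrative only: the `Catalog` class, the `REQUIRED_FIELDS` list and the dataset names are assumptions made for this example, not part of the Capgemini approach or of any Oracle product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical minimum set of descriptive fields every dataset must carry.
REQUIRED_FIELDS = ("owner", "source", "description")


@dataclass
class CatalogEntry:
    name: str
    metadata: dict
    registered_at: datetime
    derived_from: list = field(default_factory=list)  # direct upstream datasets


class Catalog:
    """In-memory stand-in for a metadata store (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def register(self, name, metadata, derived_from=()):
        """Admit a dataset only if it is described and its inputs are known."""
        if name in self._entries:
            raise ValueError(f"{name}: already registered")
        missing = [f for f in REQUIRED_FIELDS if not metadata.get(f)]
        if missing:
            raise ValueError(f"{name}: missing metadata fields {missing}")
        unknown = [d for d in derived_from if d not in self._entries]
        if unknown:
            raise ValueError(f"{name}: unknown upstream datasets {unknown}")
        entry = CatalogEntry(
            name, dict(metadata), datetime.now(timezone.utc), list(derived_from)
        )
        self._entries[name] = entry
        return entry

    def lineage(self, name):
        """Return every upstream dataset name, nearest first."""
        seen = []
        queue = list(self._entries[name].derived_from)
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self._entries[current].derived_from)
        return seen


catalog = Catalog()
catalog.register(
    "raw_orders",
    {"owner": "sales-ops", "source": "erp-export", "description": "Daily order dump"},
)
catalog.register(
    "orders_clean",
    {"owner": "analytics", "source": "raw_orders", "description": "Deduplicated orders"},
    derived_from=["raw_orders"],
)
print(catalog.lineage("orders_clean"))
```

A dataset registered without an owner, source or description raises a `ValueError` instead of silently landing in the lake, and an analyst who later finds `orders_clean` can trace it back to `raw_orders` rather than starting from scratch.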
 
Within the Capgemini Big Data approach, data management is integrated into every step. Acquisition, Marshalling, Analysis and Action are the four defined steps, all supported by Master Data Management and Data Governance.
 
Capgemini Oracle Big Data
 
When you apply the Capgemini Big Data approach using Oracle technology, you can draw on a number of Oracle solutions for Master Data Management and Data Governance: the Oracle Enterprise Data Quality Product Family and Oracle Enterprise Metadata Management.
 
Oracle Enterprise Data Quality Product Family:
The Oracle Enterprise Data Quality family of products helps organizations achieve maximum value from their business-critical applications by delivering fit-for-purpose data. These products also enable individuals and collaborative teams to quickly and easily identify and resolve any problems in underlying data. With Oracle Enterprise Data Quality products, customers can identify new opportunities, improve operational efficiency, and more efficiently comply with industry or governmental regulation.
 
Oracle Enterprise Metadata Management:
Oracle Enterprise Metadata Management brings powerful business capabilities to the modern enterprise to harvest and govern metadata across all of its Data Management technologies. By providing data transparency not only within Oracle but also across third-party technology, Oracle Enterprise Metadata Management is a must-have technology for any organization looking to seriously tackle Governance, Productivity Improvement and Lifecycle Management challenges.

For more information about this topic, feel free to contact Johan Louwers directly via johan.louwers@capgemini.com
