When talking about big data, the term data lake is often used. The term was originally introduced by James Dixon, CTO of Pentaho, and refers to gathering all available data so it can be used in a big data strategy. Dixon has a point: collecting all data can be part of your big data strategy. However, you need to ensure your data lake does not turn into a data swamp. Gartner raises some warnings about the data lake approach in the post “Gartner Says Beware of the Data Lake Fallacy” on the Gartner website.
“Data lakes therefore carry substantial risks. The most important is the inability to determine data quality or the lineage of findings by other analysts or users that have found value, previously, in using the same data in the lake. By its definition, a data lake accepts any data, without oversight or governance. Without descriptive metadata and a mechanism to maintain it, the data lake risks turning into a data swamp. And without metadata, every subsequent use of data means analysts start from scratch.”
The message of this quote, and of the entire post, is not that the idea of a data lake is wrong; a data lake can be very useful. However, the data you put into the lake must be managed correctly, so that its quality is high enough to use and so that it makes sense to the people using it. Without proper management and tagging of the data, you will just have a large set of meaningless bits. Placing the data in the correct context is what allows it to be used in a value-adding process.
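To make the point about tagging concrete, here is a minimal sketch of a catalog that refuses to accept a dataset unless descriptive metadata comes with it. Everything in it is an assumption made for illustration: the class names, the required fields and the in-memory storage are hypothetical and do not represent any Oracle, Capgemini or Gartner product or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetMetadata:
    """Descriptive metadata kept alongside a dataset (hypothetical fields)."""
    name: str
    source_system: str   # where the data came from (basic lineage)
    owner: str           # who to ask about quality and meaning
    description: str
    tags: list[str] = field(default_factory=list)
    ingested_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class DataLakeCatalog:
    """In-memory stand-in for a metadata catalog, for illustration only."""

    REQUIRED = ("name", "source_system", "owner", "description")

    def __init__(self) -> None:
        self._entries: dict[str, DatasetMetadata] = {}

    def register(self, meta: DatasetMetadata) -> None:
        # Reject datasets whose required descriptive fields are empty.
        missing = [key for key in self.REQUIRED if not getattr(meta, key).strip()]
        if missing:
            raise ValueError(f"refusing to ingest without metadata: {missing}")
        self._entries[meta.name] = meta

    def find_by_tag(self, tag: str) -> list[str]:
        # Let later analysts discover datasets instead of starting from scratch.
        return sorted(
            name for name, meta in self._entries.items() if tag in meta.tags
        )
```

Registering a dataset with a source system, an owner and a description succeeds and makes it discoverable through `find_by_tag`, while a dataset with those fields left empty is rejected with a `ValueError`. A real deployment would delegate this to a dedicated metadata management and data quality tool rather than a hand-written class.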
Within the Capgemini Big Data approach, data management is integrated into every step. The four defined steps are Acquisition, Marshalling, Analysis and Action, all supported by Master Data Management and Data Governance.
When applying the Capgemini Big Data approach with Oracle technology, you can use a number of Oracle solutions for Master Data Management and Data Governance: the Oracle Enterprise Data Quality product family and Oracle Enterprise Metadata Management.
Oracle Enterprise Data Quality Product Family:
The Oracle Enterprise Data Quality family of products helps organizations achieve maximum value from their business-critical applications by delivering fit-for-purpose data. These products also enable individuals and collaborative teams to quickly and easily identify and resolve any problems in underlying data. With Oracle Enterprise Data Quality products, customers can identify new opportunities, improve operational efficiency, and more efficiently comply with industry or governmental regulation.
Oracle Enterprise Metadata Management:
Oracle Enterprise Metadata Management brings powerful business capabilities to the modern enterprise to harvest and govern metadata across all of its data management technologies. By providing data transparency not only within Oracle but also across third-party technology, Oracle Enterprise Metadata Management is a must-have technology for any organization looking to seriously tackle Governance, Productivity Improvement and Lifecycle Management challenges.
For more information about this topic, feel free to contact Johan Louwers directly via firstname.lastname@example.org