Information Governance has often been one of the weakest parts of the IT department.  There are two reasons for this.  The first is that IT is much more about the ‘T’ than the ‘I’.  The second is that IT tends to take a ‘maximal’ approach to the problem, which grates against how the business actually wants to work.  The fallacy behind this over-governance is the idea that just because something is in a schema it should be governed at the same level as everything else.  The reality is that not all information is created equal.  There are key pieces of information that allow you to navigate between different data sets, and agreeing on those elements has a significantly greater impact in enabling business collaboration.  Identifying these points of stability is something we are very well versed in at Capgemini, as it underpins our Master Data Management approach of Mastering the Information Ocean.  That approach enables us to go beyond an organisation’s boundaries, which is where the Business Data Lake resides, and start looking at how you link up lakes from multiple parties to share information and collaborate on analytics.

At the heart of this, therefore, is a dichotomy.  When data was smaller we tried, and often failed, to govern the full scope of information.  We now need to take another approach: for Big Data we actually need to govern less information in order to succeed.  The previous approach, which attempted to govern everything to the same level, cannot hope to succeed when it has to include large amounts of external data and ever-increasing quantities of unstructured information.  With Big Data the goal is therefore to find the minimum set of information that will truly enable effective collaboration, and with it the sort of analytical benefits that Big Data can deliver to a company or government.  For Big Data it’s about taking control of the small things and ensuring that governance succeeds where it counts, rather than trying to make it work to the same level everywhere.

So why is it that the small matters while the full scope does not?  Part of this comes down to focus.  These small elements, the master and reference data, are what enable horizontal communication.  The other part is what that focus brings: people who care.  In IT we’ve put it all in a schema and said ‘all data is equal’, but the reality is that different parts of the business care about different information sets.  You are much better off concentrating on the people who actually care about improving a given piece of data than trying to make people govern information when they have no business reason to care.  So while Procurement Order field 20 is important to Procurement, the folks in Sales couldn’t care less.  Don’t bother trying to make them; instead work with Procurement and get it fixed for them.
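To make the idea concrete, here is a minimal sketch of governing only the small set of master data that links two local views.  Everything in it is an assumption made for illustration: the `party_id` key, the record layouts and the function names are hypothetical and are not part of the Business Data Lake itself.

```python
# Hypothetical illustration: all names and records below are invented.
# Only the master key is governed; every other field stays a local concern.
GOVERNED_KEYS = {"party_id"}

# Procurement's local view: "field_20" matters to Procurement only.
procurement_orders = [
    {"party_id": "P-001", "field_20": "net-30", "order_amount": 1200.0},
    {"party_id": "P-002", "field_20": "net-60", "order_amount": 450.0},
]

# Sales' local view: different fields, same governed key.
sales_accounts = [
    {"party_id": "P-001", "account_manager": "Lee", "region": "EMEA"},
    {"party_id": "P-003", "account_manager": "Kim", "region": "APAC"},
]


def missing_governed_keys(records, governed_keys=GOVERNED_KEYS):
    """Return the indices of records that lack a non-empty governed key."""
    return [
        index
        for index, record in enumerate(records)
        if any(not record.get(key) for key in governed_keys)
    ]


def link_on_master_key(left, right, key):
    """Join two local views on the shared master key.

    Local fields are carried along untouched; nothing is validated
    except the presence of a matching key on both sides.
    """
    right_by_key = {record[key]: record for record in right}
    linked = []
    for record in left:
        match = right_by_key.get(record[key])
        if match is not None:
            linked.append({**record, **match})
    return linked


linked = link_on_master_key(procurement_orders, sales_accounts, "party_id")
```

The governance check looks only at `party_id`.  Procurement’s `field_20` travels through the join without Sales ever having to agree on what it means, which is the point of governing less.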

With the Business Data Lake we’ve taken this approach as the new foundation of information.  It does away with the Single Canonical Form of the Data Warehouse and instead makes the local view, where the business cares, the primary approach, with governance focused on where it delivers value.