Today at Capgemini we are announcing something a little different from what we’ve done before.  For the last nine months or so I’ve been working with the folks at Pivotal and a team here at Capgemini on solving one of the problems we’ve kept seeing in data programs.  The problem is simple to state but has historically been hard to overcome.

IT has a strategy of driving single solutions around data – EDWs and the single canonical form – while the business has a culture of heterogeneity and prefers local solutions for information.

IT has been forced into this strategy because, historically, the costs of storage and data movement, whether via ETL or in real time, have limited what can be achieved within the IT budget.  That budget has been limited in part because the perceived value of information is relatively low when viewed at a corporate level, as opposed to a local operational level, where it rapidly becomes essential to effective operations.

While corporate views often provide simple financial consolidation, this doesn’t deliver the sort of detail, and pace of change, that helps the business work day to day.  These challenges have led to EDWs that meet the needs of finance and some corporate KPIs, alongside a fragmented estate of multiple data marts, Excel spreadsheets and ad-hoc solutions to help run the day-to-day business.

At Capgemini in the last few years we’ve started using Hadoop quite extensively in specific areas, normally areas where the cost of storage made traditional approaches unfeasible. We’ve also been using Hadoop to offload processing from existing data warehouses, both to reduce costs and to enable information to be provisioned for new uses.

That led to the recognition that the two historical constraints of storage and data movement costs are not really constraints with today’s technologies.  Pivotal’s technology stack adds another key building block: via GemFire and Spring, it becomes possible to do data movement in real time as well.

So with these two constraints removed, what becomes possible?

Well, the first piece is that we still need to meet the business challenge of supporting both local and corporate views.  This wasn’t a big challenge, as it’s something we’ve been doing in our Master Data Management practice for several years. Together with Pivotal, this led to the concepts behind the Business Data Lake:

  • Store Everything
  • Encourage Local
  • Govern only the common
  • Treat Global as a Local view

The impact of this is summed up in the document ‘Principles of the Business Data Lake’, and the technology behind it is covered in ‘The Technology of the Business Data Lake’. Finally, there is a joint paper between Capgemini and Pivotal which talks about what this means from a business impact perspective.  The Business Data Lake is therefore a reaction to the growing data challenges businesses face, and recognises how Pivotal’s technologies help to remove some of the historical constraints that have led to sub-optimal solutions.

There is often a challenge when working at an SI that some things can be more theory than practice when it comes to detailed technology.  By co-innovating with Pivotal, we’ve managed to deliver both the theory and the practice of how this will work.