In my recent blogs, I have discussed the emergence of the BI 3.0 standard and its journey, and alluded to the fact that it will need both a big data foundation and a new data warehousing paradigm.
Cue the Big Data Warehouse, and it’s not quite what you were expecting …
The History thus far
The Data Warehouse has been a part of corporate IT strategy for many decades and yet, despite fairly substantial technology advances, its design and delivery approach has not changed considerably.
The concept is still sound and should be a mainstay of any CIO strategy; however, the role, architecture and approach to this corporate repository are changing fundamentally as the world moves towards Web 3.0.
Most organisations still operate in batch, are unable to provision information of sufficient quality quickly enough, focus predominantly on an inside-out structured view of enterprise data and undergo a perpetual ‘wrestling match’ for control between business consumers and IT practitioners.
Recent technology advances such as Data Appliances, Parallel Integration Grids, In-Memory and Enterprise Discovery/Search have attempted to embed a ‘veneer of simplicity, speed and flexibility’ over a fundamentally static and fixed approach, grounded in a database and integration philosophy that dates from the late 1990s.
The world has moved on and Data Warehouse traditionalism is no longer applicable
The cloud and information device diversity are both expanding; the need to transform data into information within the context of process events, knowledge worker participants and preferred business outcomes has never been more apparent.
Traditional approaches to structuring, consolidating and presenting corporate data simply cannot keep up with the pace of change, our need for simplicity or the information appetite of the ‘always on, increasingly connected’ business consumer.
Additionally, a new breed of technologies and accelerators is emerging, with alternative, more dynamic and agile approaches to data provision in an increasingly multi-structured and multi-device environment.
The ‘Big Data Warehouse’. What might this look like?
It is important to note that we are not trying to second-guess Bill Inmon or any potential evolution of DW 2.0; rather, we are attempting to envision where this might go from a capability and experience perspective.
There has been significant debate on the relevance of a data warehouse and whether it should continue to be a physical or virtual construct. My belief is that Web 3.0 progression will imply both, built on capabilities such as the following:
- Commodity, scalable platform combining both cloud and on-premise parallel storage, memory and processing (CPU/GPU) advances, with the ability to ramp up and ramp down on demand according to consumer SLAs.
- Multi-format database and integration foundations supporting inside-out and outside-in business perspectives, underpinned by physical (Integration, Messaging, EAI, Streams etc.), virtual (Hadoop, Virtualisation, Federation, Mash-ups etc.) and cloud-based (Enterprise Search, Semantic Web (3.0), Discovery etc.) data processing technologies.
- Configurable model-based multi-latency Data Warehouse Generators to enable increasingly automated model-based design, change management and optimisation of all data synchronisation activities ‘semantically’.
- Federated Data Model, Business Dictionary and corresponding Master Data Cache to hold the core business activity data, transactional data, reference data and master data, complete with their inter-relationships, held dynamically and, critically, semantically.
- Apps ecosystem of canonically based connectors for structured and unstructured processing, together with interchangeable data transformation and data cleansing components, that shields the business consumers and, crucially, the IT developers from the complexity of the underlying data sources.
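To make the connector idea a little more concrete, here is a minimal Python sketch of what "canonically based" might mean in practice. Everything in it is an assumption of mine for illustration (the `CanonicalRecord` shape, the `Connector` protocol, the two toy connectors and the cleansing steps); it is not a reference to any existing product or API. The point it tries to show is that a structured and an unstructured source can emit the same canonical shape, so the transformation and cleansing components can be swapped in and out without the consumer seeing the underlying sources.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator, Protocol


@dataclass(frozen=True)
class CanonicalRecord:
    """Hypothetical canonical shape that every connector emits."""
    entity: str
    attributes: dict
    source: str


class Connector(Protocol):
    """Anything that can yield canonical records counts as a connector."""
    def read(self) -> Iterator[CanonicalRecord]: ...


# A transformation or cleansing component is just a record-to-record function.
Step = Callable[[CanonicalRecord], CanonicalRecord]


class StructuredConnector:
    """Toy structured source: (name, city) rows, e.g. from a CRM table."""
    def __init__(self, rows):
        self.rows = rows

    def read(self):
        for name, city in self.rows:
            yield CanonicalRecord("customer", {"name": name, "city": city}, "crm")


class FreeTextConnector:
    """Toy unstructured source: '<name> lives in <city>' sentences."""
    def __init__(self, lines):
        self.lines = lines

    def read(self):
        for line in self.lines:
            name, _, city = line.partition(" lives in ")
            yield CanonicalRecord("customer", {"name": name, "city": city}, "notes")


def trim_whitespace(rec: CanonicalRecord) -> CanonicalRecord:
    """Cleansing component: strip stray whitespace from every attribute."""
    return replace(rec, attributes={k: v.strip() for k, v in rec.attributes.items()})


def title_case_city(rec: CanonicalRecord) -> CanonicalRecord:
    """Transformation component: normalise the city name's capitalisation."""
    attrs = dict(rec.attributes)
    attrs["city"] = attrs["city"].title()
    return replace(rec, attributes=attrs)


def run_pipeline(connectors: Iterable[Connector], steps: Iterable[Step]) -> list:
    """Read from every connector and apply the steps, in order, to each record."""
    steps = list(steps)
    output = []
    for connector in connectors:
        for rec in connector.read():
            for step in steps:
                rec = step(rec)
            output.append(rec)
    return output


records = run_pipeline(
    [StructuredConnector([(" Ann ", "leeds")]), FreeTextConnector(["Bob lives in york "])],
    [trim_whitespace, title_case_city],
)
```

Because the steps only ever see `CanonicalRecord`, adding a new source means writing one more connector, and changing a cleansing rule means replacing one function in the list.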
The ‘Big Data Warehouse’. What might this ‘feel’ like?
The Big Data Warehouse will not be centralised and physical in nature
It will be logically and semantically a single corporate view, yet will be underpinned by a cocktail of underlying information assets that are integrated using hardware and software capabilities relevant to the consumer utilisation, business value and timeliness of the information being requested.
Information which is in key demand, and which services a variety of needs, will be seamlessly promoted onto the appropriate platform components to service that urgency in a cost-effective manner. This should not require a ‘bottleneck of IT practitioners’ to adapt, integrate, tune and optimise; rather, it will be a semi-automatic configuration exercise.
Whether the information is structured, unstructured, on-premise, in the cloud, real-time or batch, internally provisioned or externally ‘tweeted’, known or ‘yet to be found’, on the corporate networks or on the latest BYOD offering, this will be transparent to the collaborative, social and mobile knowledge worker of tomorrow.
The feeding and caring for the Big Data Warehouse will be highly configurable and dynamic in nature, with augmentation of information assets available in hours rather than weeks or months, and is likely to be highly business-model based.
It will support the two-speed information organisation, meeting the transitional and short-term needs of the business community whilst simultaneously ensuring that information assets are appropriately leveraged for the organisation as a whole.
The mechanism for moving tactical data into the strategic core of the platform will be based on a self-regulating governance workgroup of business consumers and IT practitioners (think Wikipedia moderation rather than database administration), where information assets are ‘rated’ for applicability, accuracy, quality and trustworthiness and promoted accordingly.
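As a rough illustration of how such rating-driven promotion might work, here is a small Python sketch. The four criteria come from the paragraph above; everything else (the 1 to 5 scale, the minimum of three raters, the 4.0 average threshold and all of the names) is a hypothetical choice of mine, not a proposed standard.

```python
from dataclasses import dataclass, field
from statistics import mean

# The four rating criteria named in the text.
CRITERIA = ("applicability", "accuracy", "quality", "trustworthiness")


@dataclass
class InformationAsset:
    """Hypothetical information asset that starts life in the tactical tier."""
    name: str
    tier: str = "tactical"
    ratings: list = field(default_factory=list)

    def rate(self, **scores: int) -> None:
        """Record one workgroup member's rating (assumed scale: 1 to 5)."""
        if set(scores) != set(CRITERIA):
            raise ValueError(f"a rating must cover exactly: {CRITERIA}")
        if not all(1 <= s <= 5 for s in scores.values()):
            raise ValueError("scores must be between 1 and 5")
        self.ratings.append(dict(scores))

    def score(self, criterion: str) -> float:
        """Average score for one criterion across all ratings so far."""
        return mean(r[criterion] for r in self.ratings)


def review_for_promotion(asset: InformationAsset,
                         min_raters: int = 3,
                         min_score: float = 4.0) -> bool:
    """Promote to the strategic core only if enough people have rated the
    asset and every criterion averages at or above the threshold.
    Both thresholds are illustrative assumptions."""
    if len(asset.ratings) < min_raters:
        return False
    if all(asset.score(c) >= min_score for c in CRITERIA):
        asset.tier = "strategic"
    return asset.tier == "strategic"


mashup = InformationAsset("regional sales mash-up")
for _ in range(3):
    mashup.rate(applicability=5, accuracy=4, quality=4, trustworthiness=5)
promoted = review_for_promotion(mashup)
```

Requiring every criterion to clear the bar, rather than an overall average, is a deliberate choice in this sketch: a popular but inaccurate asset should not reach the strategic core simply because it is highly applicable.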
IT focus will be on flexible scale, speed of provision, semantic alignment and consumer empowerment.
Business focus will be on articulating business outcomes through rapid, highly focused, collaborative ‘mash-up-like’ prototyping of new and existing information streams, with benefits and UAT performed on the fly in a new breed of BI 3.0 collaborative applications.
What do you think?
I would appreciate alternative ideas on our progression towards the Big Data Warehouse. Thoughts, anybody?