Capping IT Off

Opinions expressed on this blog reflect the writer’s views and not the position of the Capgemini Group

MDM: The difference between a data model and an information rule

When designing transactional systems you often look to represent the business model in the data model; this is a basic premise of most ERP models. If you have a rule such as 'communication addresses only apply to individuals', then you just create the communication addresses as a child object of your 'individual' table.

This is what Siebel does, for instance. This is fine, but it means you need different types of preferences for different channels, and when you start thinking about the relationships to Objects you have a difference between electronic addresses and physical ones, which makes channel shifting hard. So, for instance, if you want to change the invoice address from physical to electronic for a given object then you are a bit scuppered. This kind of modelling is normally done for performance reasons or because a specific set of business processes all work in the same way. It's not a bad thing; it's just important to realise that it's being done for transactional reasons.

At the other end of the scale we have analytical models. These are set up to handle questions, so the schemas are optimised for that sort of thing: often extremely flattened tables with OLAP cubes or star schemas on top. The goal of these is to provide the most efficient analytical model possible.

More and more, however, there is a new way of thinking about information modelling: separate the data model from the business rules and make the data model as flexible as possible. For MDM models this is critical, and it is why a POLE-based approach is becoming the standard way of thinking about data models. There are several key reasons for this:

  1. Channel shifting - the transactional model taken by Siebel made perfect sense in the 90s when it was designed and email was a rare thing. Today, however, email is often the primary communication method, and it's a problem having it treated differently from physical locations.
  2. Consistency - having a POLE-based approach means you can set up elements to work consistently at the highest level, so communication preferences, for instance, can be handled for all individuals and organisations across all types of locations, whether physical, electronic or telephonic.
  3. Evolution - the data model is the hardest place to change a specific type of relationship once it has been coded in. Flaws in several ERP data models have existed for 10+ years due to the impact of making changes at such a fundamental level.

As organisations look at more federated approaches, the need for MDM models to be more flexible and for enterprise models to enable change means that a new way of viewing information is becoming the norm. This way is about considering storage to be purely the physical side of the question. Whether this is through a NoSQL database, a REST interface, Hadoop or a more traditional database isn't important; the key is that it is no longer the responsibility of the database to ensure that business rules are enforced.

The role of the database in this world is to ensure that data rules are obeyed. Thus in a POLE model the data model states that all individuals and corporations must be a child of 'party' and all location types must be a child of 'location'. It sets up the constraints as to which relationships are allowable and which enumerations are allowable. In other words, the data model enforces a set of storage constraints which are there to enable all possible scenarios to be modelled.
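As a rough illustration of what "data rules only" could look like, here is a minimal Python sketch. The class names, the fields and the LinkKind values are hypothetical and chosen for this example; only the party/location parent-child idea comes from the text above. The model checks types and enumerations and nothing else.

```python
from dataclasses import dataclass
from enum import Enum


class LinkKind(Enum):
    """Allowable relationship kinds (illustrative values, not from the post)."""
    INVOICE = "invoice"
    CONTACT = "contact"


@dataclass(frozen=True)
class Party:
    party_id: str


@dataclass(frozen=True)
class Individual(Party):
    name: str = ""


@dataclass(frozen=True)
class Corporation(Party):
    legal_name: str = ""


@dataclass(frozen=True)
class Location:
    location_id: str


@dataclass(frozen=True)
class PhysicalAddress(Location):
    street: str = ""


@dataclass(frozen=True)
class ElectronicAddress(Location):
    email: str = ""


@dataclass(frozen=True)
class PartyLocation:
    """Any party may be linked to any location; no business rule lives here."""
    party: Party
    location: Location
    kind: LinkKind

    def __post_init__(self) -> None:
        # Data rules only: correct parent types and an allowed enumeration.
        if not isinstance(self.party, Party):
            raise TypeError("party must be a Party")
        if not isinstance(self.location, Location):
            raise TypeError("location must be a Location")
        if not isinstance(self.kind, LinkKind):
            raise TypeError("kind must be a LinkKind")


def shift_channel(link: PartyLocation, new_location: Location) -> PartyLocation:
    """Point an existing relationship at a different kind of location."""
    return PartyLocation(link.party, new_location, link.kind)
```

Because physical and electronic addresses share the same parent, moving an invoice address from one to the other is just a new link of the same kind, with no structural change.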

The challenge then is what to do with the business information rules. First of all, what do we mean by a business information rule in this context? These are the rules that provide the local restrictions on information: the rule that a customer must have a physical address to be valid, the rule that only hotels can have bedrooms in a travel system, the rule that the maximum number of allowed credit cards is 10. These aren't rules like 'people have addresses'; that is just a straight data entity rule. Business information rules are restrictions on those associations.

Here you have a choice. You can, of course, put these into the data schema. So with the credit cards you just have 10 line items or 10 FK relationships, and with the beds you just hardcode a relationship from the Hotel entity to the beds.
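To make the hard-coded option concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names are invented for illustration and are not taken from any real product. The 'only hotels have beds' rule is not written down anywhere as a rule; it is simply the shape of the bed table.

```python
import sqlite3


def build_hardcoded_schema() -> sqlite3.Connection:
    """Create an in-memory schema where a bed can only belong to a hotel."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key checks off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE hotel (
            hotel_id INTEGER PRIMARY KEY,
            name     TEXT NOT NULL
        );
        CREATE TABLE bed (
            bed_id   INTEGER PRIMARY KEY,
            hotel_id INTEGER NOT NULL REFERENCES hotel(hotel_id)
        );
        INSERT INTO hotel (hotel_id, name) VALUES (1, 'Seaview');
        """
    )
    return conn


def try_add_bed(conn: sqlite3.Connection, hotel_id) -> bool:
    """Return True if the schema accepts a bed for the given hotel id."""
    try:
        conn.execute("INSERT INTO bed (hotel_id) VALUES (?)", (hotel_id,))
        return True
    except sqlite3.IntegrityError:
        # Rejected by the NOT NULL or foreign key constraint.
        return False
```

In this sketch a bed with no hotel, or with an owner that is not a row in hotel, is rejected by the storage layer itself, so relaxing the rule means altering the table.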

This is the traditional way of doing it and is one beloved of transactional systems.  For MDM however, and arguably for modern transaction systems, it has a significant issue: what happens when things change?

These business information rules are exactly where things change in a business. The restriction on credit cards is determined to be anti-growth and so is removed; suddenly the database schema needs to change and you need to upgrade and migrate your solution. You start selling cruise holidays and suddenly ships have beds, and for the physical address rule you decide that electronic addresses are all you need.

With the performance of modern MDM technologies, these sorts of hard-coded relationships in the data model are to be avoided at all costs. Instead they should be shifted into a business information rules layer where validations can be applied from a master data perspective. In the same way as master data defines the rules of 'what good looks like' for field-level verification, entity-level verification and matching, so it should also do the verification based on relationships between entities.
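One way such a rules layer might look is sketched below in Python. Everything here (Entity, the rule factories, validate) is a hypothetical illustration and not the API of any MDM product. The rules are plain data held outside the model, so changing one is an edit to a list instead of a schema migration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Entity:
    entity_type: str  # e.g. "customer", "hotel", "ship"
    related: List["Entity"] = field(default_factory=list)


# A rule inspects an entity and returns an error message, or None if it passes.
Rule = Callable[[Entity], Optional[str]]


def count_related(entity: Entity, related_type: str) -> int:
    return sum(1 for r in entity.related if r.entity_type == related_type)


def max_related(owner_type: str, related_type: str, limit: int) -> Rule:
    def rule(e: Entity) -> Optional[str]:
        if e.entity_type == owner_type and count_related(e, related_type) > limit:
            return f"{owner_type} has more than {limit} {related_type}"
        return None
    return rule


def requires_related(owner_type: str, related_type: str) -> Rule:
    def rule(e: Entity) -> Optional[str]:
        if e.entity_type == owner_type and count_related(e, related_type) == 0:
            return f"{owner_type} requires {related_type}"
        return None
    return rule


def only_owners(related_type: str, allowed: Set[str]) -> Rule:
    def rule(e: Entity) -> Optional[str]:
        if e.entity_type not in allowed and count_related(e, related_type) > 0:
            return f"{e.entity_type} may not have {related_type}"
        return None
    return rule


def validate(entity: Entity, rules: List[Rule]) -> List[str]:
    """Run every rule and collect the failures."""
    return [msg for msg in (rule(entity) for rule in rules) if msg is not None]


# The three example rules from the text, expressed as configuration.
rules: List[Rule] = [
    requires_related("customer", "physical_address"),
    only_owners("bedroom", {"hotel"}),
    max_related("customer", "credit_card", 10),
]
```

When cruise holidays arrive, the bedroom rule is replaced with only_owners("bedroom", {"hotel", "ship"}), and when the card limit is judged anti-growth it is simply removed from the list; the stored data and its model stay as they are.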

So, some basic rules for a good MDM model:

  1. Create the model based on POLE
  2. Shift relationship-based restrictions into the MDM rules layer

More detail in future, but I've repeatedly seen significant issues when restrictions are put into the schema.
