Data Management

Data Management – Exploit the gold in your data!

Digitalization provides access to large quantities of highly detailed data, which today is seen as “the new gold”. The value lies in the scope this data offers for analysis and for better-informed decision-making. However, large quantities of data also bring big challenges: how should we obtain, store, make accessible, use, structure and secure our data in the best possible way?

Data Management is all about how you can best exploit the full potential of your digital gold.

Interest in Data Management is increasing as more and more companies get under way with advanced Business Intelligence operations, such as Self-Service BI and predictive analysis with Data Science/Machine Learning. This in turn has resulted in an increased need for data from multiple sources – and a need to ensure that the right data, of maximum quality and the right level of detail, is used in these analyses.

The most common problem is that data is spread across several different source systems that do not communicate properly with each other. The traditional solution has been to build a data warehouse on a server in a server room, and then to fetch, convert and store data from these sources in the data warehouse. The relevant data is then available there for reports and analyses.

Data Management is undergoing considerable change

However, a lot has happened in recent years and right now Data Management is undergoing major changes. Trends such as cloud-based data, Self-Service BI and the demand for proactive, predictive or even prescriptive insights have all increased the pace of change in many companies.

Today even traditional operations are investing in analysis departments staffed by data scientists and data analysts. Data architecture and data management solutions are thus becoming increasingly important. Among the questions that need to be resolved: How should the model be structured? How and for whom should the information be available? What should the solution look like so that the information is as correct as possible? How quickly can I obtain the information?

The new challenges, and at the same time the opportunities that companies and organisations are facing, have changed the perception of data and how its potential can be fully exploited.

Trends and challenges in Data Management

Cloudification – an attractive new alternative

Cloudification has transitioned from being just a hot term to becoming an attractive and viable alternative. Microsoft, Amazon and Google are making strong progress in this sphere and are establishing large facilities in Sweden too. To this can be added niche players such as Snowflake and Matillion, which run their Cloud Data Warehouse solutions on the cloud giants’ platforms (Azure, AWS and Google Cloud), thus partly competing with the major cloud platform providers’ own tools.

We can also see that increasing numbers of companies in the Nordic region are transitioning to cloud data storage, and there are many benefits to this move.

First and foremost, cloud-based solutions reduce obstacles to new data warehouse initiatives, since you do not need to invest in expensive servers and software licences. With a cloud solution you also avoid a number of concerns regarding administration and upgrades, and gain in terms of scalability, elasticity and the possibility of a more balanced cost structure.

A traditional server solution is dimensioned to suit peak loads, which means it is unnecessarily powerful (and expensive) in normal use. A cloud solution, on the other hand, is dimensioned dynamically, based on current storage needs and the demand for calculation capacity.

Disadvantages? There are of course aspects that you need to consider such as information security, authorisation control and the possibility of being able to move data as needed. The long-term cost also needs to be examined, although this is difficult in such a fast-changing area.

Self-Service BI – increases the need for blending data

It is becoming increasingly common for different functions within a company or organisation to work with different kinds of analysis through what is known as Self-Service BI. This imposes demands on efficient yet user-friendly self-service tools to make it easy to structure analyses, visualisations and reports without the involvement of the IT department. This helps cut both lead times and the IT department’s backlog.

But it’s not enough to have the right tools – analysis also increases the need for structured, quality-assured and authorisation-regulated data. The more advanced users also have a need to add their own data – known as Data Blending – which falls within the area of Self-Service Data Preparation.

Some BI tools have built-in support for this, while others require supplementary software. Among the leading suppliers of independent programs for Self-Service Data Preparation we find companies such as Alteryx.

Data Science imposes new demands on data handling

Today Business Intelligence is not only about reactive or descriptive analysis and reporting, but also about the various levels of advanced, proactive analysis using Machine Learning (ML). More specifically, it is about predictive, prescriptive and even cognitive analysis, the aim of which is to predict events, suggest measures, or quite simply implement actions. In all these cases with the overriding goal of strengthening business benefit and increasing competitiveness.

As Data Science initiatives and proactive analysis become increasingly common, there is a heightened demand for other types of data as well as demands for greater granularity or detail resolution. In particular there is a greater need to utilise external data, unstructured data, streamed data and other types of data such as images, audio, social media and so on.

Companies and organisations have traditionally filled their data warehouses with their own internal, finely structured data. However, the conventional ETL approach (Extract, Transform, Load) is not entirely satisfactory when the aim is advanced analysis using Data Science. Nor is ETL well suited to what are known as shared-nothing, massively parallel environments such as Azure or Snowflake.

In order to handle large quantities of data (often referred to as Big Data) the data warehouse needs to be supplemented with something called a Data Lake. No processing takes place in a Data Lake; the collected data is neither filtered nor structured but is instead saved as unaltered raw data at the highest possible level of detail.

With this type of analysis work, the ELT (Extract, Load, Transform) approach is instead more common. The transformation phase, which applies the business logic (calculations, filtering and aggregation), is moved to the end of the process so as not to discard detail that Data Scientists and Data Engineers need, and also in order to benefit from multiple calculation nodes.

One consequence of this is that existing ETL tools also need to be able to handle ELT processes; another consequence is that new, dedicated ELT tools have been developed.
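The ELT pattern described above can be sketched in a few lines. This is a minimal illustration only: the sales rows are invented, and Python's built-in sqlite3 stands in for a cloud warehouse such as Snowflake or Azure Synapse. The point is the ordering – raw data is loaded unaltered first, and business logic runs last, inside the warehouse.

```python
import sqlite3

# Minimal ELT sketch: Extract and Load happen first, with no filtering,
# so the raw detail remains available. Transform (aggregation) runs last,
# inside the database engine. sqlite3 stands in for a cloud warehouse.

raw_rows = [  # invented, extracted sales rows kept at full detail
    ("2024-01-01", "SE", 120.0),
    ("2024-01-01", "NO", 80.0),
    ("2024-01-02", "SE", 95.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_sales (day TEXT, country TEXT, amount REAL)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)  # Load: unaltered

# Transform: business logic is applied at the end of the process.
totals = dict(conn.execute(
    "SELECT country, SUM(amount) FROM raw_sales GROUP BY country"
))
print(totals)
```

In a real warehouse the transformation step would typically be pushed down as SQL across many compute nodes, which is exactly why deferring it pays off.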

IoT contributes to new analysis opportunities

The Internet of Things (IoT) means that objects such as machines, vehicles, goods, household appliances, clothes – even animals and people! – are equipped with small built-in sensors and processors. These units can then report on current status and provide information about ambient conditions, via the Internet.

Industrial IoT applications generate a lot of data, such as streaming data and sensor data. This data needs to be processed for analysis and action. In general, the process takes place in three steps: Data Capture, Data Insight & Visualisation, and Data Execution.

One characteristic of this type of data is that it is often streamed, occurs in very high volumes, and grows quickly. Data is used to monitor and analyse large-scale flows over time, but also in the shorter term to create activities based on events. For instance, a car manufacturer can use the volume of sensor data in a particular car model to analyse the energy consumption of a given engine over a period of time. The same data source can also be used to inform a car owner – or a nearby workshop – that the brake pads will soon need to be replaced.
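The car example above – one sensor stream serving both long-term analysis and short-term event-driven alerts – can be sketched as follows. The field names, the brake-pad threshold and the readings are illustrative assumptions, not a real vehicle API.

```python
# Hypothetical sketch: the same sensor stream feeds long-term aggregation
# (energy consumption over time) and short-term event detection (worn
# brake pads). Threshold and field names are invented for illustration.

BRAKE_PAD_MIN_MM = 3.0  # assumed wear threshold

def process_stream(readings):
    total_energy = 0.0
    alerts = []
    for r in readings:                               # Data Capture
        total_energy += r["energy_kwh"]              # long-term aggregation
        if r["brake_pad_mm"] < BRAKE_PAD_MIN_MM:     # event detection
            alerts.append(f"car {r['car_id']}: brake pads need replacing")
    return total_energy, alerts

readings = [
    {"car_id": "A1", "energy_kwh": 0.4, "brake_pad_mm": 5.2},
    {"car_id": "A1", "energy_kwh": 0.5, "brake_pad_mm": 2.8},
]
energy, alerts = process_stream(readings)
print(energy, alerts)
```

In production these two uses would usually be separate consumers of the same stream, one batch-oriented and one event-driven, rather than a single loop.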

New, stricter demands on data security

Owing not least to GDPR and also to a certain extent ISO 27001 (management system for information security), the demands on how data is stored and handled have been toughened. The regulations dealing with personal data, for example, are very strict and violation can be very costly.

Many companies today store their personal data in multiple locations. In order to administrate personal data correctly and safely, good system support is needed. An MDM (Master Data Management) solution is part of such a process.

Master data is the basic data that is shared and recurs in various functions and transactions in several systems in an operation. Master data usually relates to employees, customers, suppliers, partners and products.

Master Data Management is a method of structuring information so as to ensure uniform data in an organisation. In short, MDM is about rectifying poor data quality at the source and handling constant change. There should be just one single reference for information – a source of “truth” on which other systems can rely.

Changes increase demands on flexibility

Companies are acquired and sold, and business models are transformed at what appears to be an ever faster pace. This gives rise to new regulations and changes in business logic that Data Management solutions have to handle. In conjunction with corporate acquisitions, the integration philosophy plays a major role when deciding on a Data Management solution: a homogeneous system structure, where the acquired company’s business system can be integrated, gives the Data Management solution more room to manoeuvre than a heterogeneous structure with several different business systems.

Other factors that impose demands are the compilation of financial statements, corporate analysis and type of report. The handling of shared master data when there are multiple redundant systems must also be taken care of. In this context there is a recurring need for a structured process and system support in Master Data Management (MDM).

Here’s how to build your next-gen Data Management platform

Many Data Management projects have collapsed owing to an excessively rigid process or a manually coded data warehouse. Writing code takes a long time and, what is more, the code may become difficult to maintain as the number of solutions grows. In order to speed up development and implementation, and at the same time create a uniform structure, various technologies have been developed in recent years. You can regard them as a kind of best practice for a secure and reasonably future-proof Data Management platform.

BIML – the first step away from manual coding

BIML is an XML-based script language designed for data warehouse development. Instead of manually building routines, tables, views, procedures and/or SSIS packages, generic scripts are developed in BIML and can then be run on different platforms. BIML scripts are used primarily at the start of a data warehouse project, or when it is time to upgrade or change the platform on which the data warehouse runs.

DWA tool – the next step in automation

Instead of manually writing code or using BIML script, the data warehouse can be modelled and built using a Data Warehouse Automation tool. Objects, relationships and operations are modelled in a graphic interface, after which the tool automatically generates necessary objects on the database platform that forms the foundation. This saves a lot of time in projects, and administration is made simpler in the longer term since all objects and all logic are built the same way, no matter who the builder is.

Data Catalog – the importance of being organised

As your Data Management solution is filled with data, there is an increasing need to be organised. In order to retain your solutions and their contents over time, the contents need to be documented and catalogued. This also makes the contents searchable.

Moreover, metadata (data about the data) needs to be updated so its relationships to other objects can be traced. This type of relationship traceability is generally known as Data Lineage, with the objects catalogued in a Data Catalog. Some tools have built-in functions for Data Lineage, while others require that you build this manually. Irrespective of method, it is worth the investment to do this.
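The idea behind Data Lineage can be illustrated with a toy catalogue: each object records which upstream objects it was derived from, so its origins can be traced back to the source systems. The object names below are invented for illustration and do not come from any particular tool.

```python
# Hypothetical Data Lineage sketch: a catalogue maps each object to the
# objects it was derived from, so any report can be traced back to its
# source systems. All object names are invented for illustration.

lineage = {  # object -> upstream objects it is derived from
    "report.sales_by_region": ["dw.fact_sales", "dw.dim_region"],
    "dw.fact_sales": ["staging.orders"],
    "dw.dim_region": ["staging.regions"],
    "staging.orders": ["erp.orders"],        # source system
    "staging.regions": ["crm.regions"],      # source system
}

def trace(obj):
    """Return every upstream object the given object depends on."""
    upstream = []
    for parent in lineage.get(obj, []):
        upstream.append(parent)
        upstream.extend(trace(parent))       # follow the chain recursively
    return upstream

print(trace("report.sales_by_region"))
```

Tools with built-in lineage maintain exactly this kind of dependency graph automatically, as a by-product of generating the data warehouse objects.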

Modelling your data warehouse

New modelling techniques have been developed for greater flexibility and agility. The Data Vault modelling concept is gaining ground and is now a serious alternative to the classic Kimball approach, particularly where the operation’s regulatory framework changes over time and there are high demands on traceability.

Data Virtualization – store the same data just once

Data volumes are expanding all the time, and generally at an ever faster pace. In this context, a guiding principle is not to store the same data multiple times. This has driven the development of Data Virtualization (Data Sharing and Data Cloning) – techniques that make data available without duplicating or moving it. Data Virtualization does not replace a data warehouse, but it can be a supplement that supports various models in a Business Intelligence framework.

There is often a traditional BI solution with readymade reports, but for more agile reports via a Self-Service BI tool Data Virtualization can be part of a solution. In a Data Virtualization tool new data can be added quickly and simply, for instance to test a theory or conduct an analysis on a one-off basis.
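The “store once” principle can be illustrated with an ordinary database view: a new consumer gets its own dataset, but no rows are copied or moved. This is a deliberately simplified stand-in – sqlite3 plays the role of the virtualization layer, and the table and threshold are invented.

```python
import sqlite3

# Minimal sketch of the idea behind Data Virtualization: a view exposes
# the shared data to a new consumer without duplicating it. Only the
# view's definition is stored, never a second copy of the rows.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 100.0), ("South", 250.0)])

# The "virtual" dataset: a definition, not a copy of the data.
conn.execute("CREATE VIEW big_sales AS SELECT * FROM sales WHERE amount > 200")

rows = list(conn.execute("SELECT region, amount FROM big_sales"))
print(rows)  # [('South', 250.0)]
```

Cloud platforms extend the same idea across accounts and organisations with data sharing and zero-copy cloning, but the principle is the one shown here.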

Look at the complete picture – not just the technical solution!

Thus far we have in principle only examined various technical solutions in the form of systems and tools. However, Data Management has no intrinsic value in itself. Value is created only when the right data, of the right quality, is put to practical use through reports and analyses. A good Data Management platform must therefore be built on the strategies, business models and control models that form the basis of your operation. These should nonetheless be questioned, and if necessary changed, so as to better exploit the power of your data. The same may apply to existing competence, working methods and organisation.

Master Data Management – organise your data, once and for all!

Master Data Management consists of the definitions, processes, divisions of responsibility and technical platforms used for creating and maintaining master data. Master data is the type of data that is standardised for a company or a public authority. Examples of such information include customers, organisation numbers, employees, ID numbers, suppliers, partners, contracts, accounts or products. Such data is often shared by many departments, business processes and IT systems.

Master data is particularly important data to which we attach descriptive attributes. An ID number, for example, can carry descriptive information such as first name, surname, address, phone number and customer number: the ID number is the master data to which the attributes belong, and together they constitute a master data domain. We can work in a similar way with products, customers, employees, ledgers and so on. Master data is often a prerequisite for other initiatives such as successful digitalization, Self-Service BI of various kinds, reporting, conforming to legislative requirements (for instance GDPR) and Data Science; if the quality of the data is not good, these initiatives will not succeed.

Why is Master data important?

There are many reasons why master data is important. Below are a few examples:

One shared source for the truth
With master data you ensure that everyone has exactly the same information and agrees on what, for instance, a customer or a given product is. Different definitions and answers about the same thing create uncertainty and poorer preconditions for taking decisions based on the right information.

Legislation and regulations
Legislation and regulations such as GDPR aim at creating order and protecting personal information. In this context it is important to know what personal data you have, in which systems it can be found, and how it is managed and protected.

Customer insight
Many companies want to find out about their customers’ behavioural patterns. In the fields of banking and insurance, for instance, a customer may recur across multiple products and thus in several different systems. If we are to succeed with customer care and help the customer’s business expand, we must first understand the customer’s behaviour. In addition, there are Know Your Customer (KYC) regulations which, among other things, require that the customer has been given information before a product is purchased. This too should be properly administrated in master data.

Reporting
When reports are filed from several different systems it is necessary to know that you are reporting about the same object.

Data Science
In order to train advanced analysis models it is necessary to have large quantities of good-quality data. A master data tool can be used to ensure that quality. If the data is of poor quality or inconsistent, the result of the analysis will be incorrect.

Here’s how to get started with Master Data Management
When you start up your Master Data Management project, everyone involved has to agree on what master data is and how it is to be used.

Among other things, it is important to define:

  • What is a customer, what is an employee?
  • Where in our systems are they to be found, and with what information?
  • Is the information accurate?
  • Is the information complete?

The next stage is to examine how the organisation works with master data.

Here it is important to define:

  • Who registers new customers?
  • How do they do it?
  • What are the requirements concerning data in order for a customer’s information to be complete, and not duplicated?

Finally it is necessary to define: who is responsible for ensuring that the customer’s data is in good condition today and in the future? And what happens if we acquire a company and with it additional customer registers, for instance?

Master Data Management applications contain functionality to support the operation with definitions, processes and responsibilities, and to merge data from different systems, matching records that refer to the same thing in order to create what is known as a “Golden Record”. A Golden Record is a compilation of all the information we have about, for example, a customer – the best source of information we currently have. The master data can then be used for reporting or for updating the source systems. Most master data platforms also retain the relationship to the source system, so it is possible to trace which information comes from which system. Master data can also be enriched with connections to external sources, for instance for credit information or to verify addresses.
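The Golden Record idea can be sketched as a simple field-by-field merge that also retains where each value came from. The systems, field names and matching key (a shared e-mail address) are invented assumptions; real MDM platforms use far more sophisticated matching rules.

```python
# Hypothetical Golden Record sketch: records for the same customer from
# two systems are merged field by field. Earlier sources take precedence,
# and each value keeps a "source" tag for traceability. All names and
# values are invented for illustration.

crm = {"email": "anna@example.com", "name": "Anna Svensson", "phone": None}
billing = {"email": "anna@example.com", "name": "A. Svensson",
           "phone": "+46 70 123 45 67"}

def golden_record(records):
    merged = {}
    for source, rec in records:
        for field, value in rec.items():
            if value is not None and field not in merged:
                merged[field] = {"value": value, "source": source}
    return merged

# crm is listed first, so its values win when both systems have the field.
gr = golden_record([("crm", crm), ("billing", billing)])
print(gr["phone"])  # {'value': '+46 70 123 45 67', 'source': 'billing'}
```

Note how the merged record is better than either source on its own: the name comes from the CRM system and the phone number from billing, and both remain traceable.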

Important advice for getting started

  • Begin by getting the operation to agree on joint registers and definitions. Master data is often a political issue in larger organisations, since it is a matter of viewing something from a uniform perspective. This is always difficult, since different actors have different targets, priorities and purposes. The general solution is to establish a separate joint solution for MDM, with an accompanying organisation, with the aim of jointly agreeing on one single truth.
  • Make sure you establish a joint system for handling master data such as customers, suppliers, products and employees.
  • Invest time in establishing processes for maintenance of master data.
  • Underscore the importance of establishing an organisation for control and implementation of master data processes.

How can we help you get started?

At Capgemini Insights & Data we have the experts you need to support your Master Data Management project. We offer strategic advisory expertise in all aspects relating to MDM, both operational and technical issues. Over the years we have acquired in-depth expertise on a number of selected platforms for implementation of system support.

At Capgemini Insights & Data, Master Data Management is often an integrated part of work with BI and integration or application development. Over the years we have also been involved in running focused MDM projects together with our customers. In recent years we have noted increased demand as more and more companies are investing in Self-Service tools and Data Science, among other things. There’s a lot happening in this field right now and we make sure we are constantly up-to-date with the very latest technical advances so our customers remain competitive. The master data platforms we usually work with are Profisee and Tibco.

Capgemini Insights & Data – complete package supplier in Data Management

Data Management is a complex area that encompasses several different sectors. Our specialists here at Capgemini Insights & Data will help you see the complete picture – everything from starting up a small, traditional data warehouse to developing unique strategy documents and roadmaps for large projects.

Choosing the right Data Management platform in a fast-changing world requires specialist competence. The same applies to implementation of the necessary projects. Over a period of many years Capgemini Insights & Data has built up the necessary expertise and experience. Our customers include many of Sweden’s largest companies and organisations.

Independent partner for the supply of product solutions

Capgemini Insights & Data is also independent in terms of its product solutions, always recommending what is most suitable for particular requirements. Having said that, we work together with a number of strategic partners – specialists whose products are at the forefront of their respective areas as regards development, maturity and market presence.

For example, taking a decision to move all or part of your company’s data to the cloud is a process encompassing many different aspects that must be taken into account. Using tried and tested methods, we support and secure the entire process from initiative and POC to full-scale solution. Over the years we have helped customers both with the creation of new cloud Data Warehouse/Data Lake solutions and with the “lift and shift” of existing on-premises solutions to the cloud.

When it comes to Cloud Data Warehousing, Capgemini Insights & Data partners with Amazon, Microsoft, Google, Snowflake, Matillion and Birst – all platforms built to harness the key benefits of the cloud. This area is undergoing swift change, and we are constantly at the forefront, which may generate additional partnerships in the future.

Irrespective of the individual Data Management area, we always aim to ensure that all our consultants are at the cutting edge in terms of their knowhow as new technologies and new areas of use arise and other areas transition to maturity.

Expertise in the entire Business Intelligence sphere

Unlike many other players in this area, Capgemini Insights & Data has expertise in the entire Business Intelligence sphere. This means that we have specialist competence not only in Data Management but in everything related to it: budget and planning, consolidated reports, Self-Service BI, Data Science and so on. This means we can deliver Data Management solutions that are optimally tailored for your specific BI needs.

Capgemini Insights & Data also has specialist Management consultants with considerable insight into BI and Data Management. This is a virtually unique resource that can contribute strategic advice and practical support in terms of the prerequisites and organisational abilities needed to work in a perceptive manner. Read more about Capgemini Insights & Data Management Consulting.

