Data Science – Extract insights from data
The fields of application for a Data Scientist are varied. For instance, Data Scientists can quantify new business models, improve customer understanding or predict machine failures in manufacturing.
Regardless of the specific task, all use cases have one thing in common: they require the analysis of data. This includes data hidden in numerous data silos throughout the company, data that is not documented and data of insufficient quality. The Data Scientist must therefore overcome many challenges before being able to apply their core competence: modelling and data analysis. Data Management addresses these challenges and enables the Data Scientist to produce faster and better results.
The Data Science Journey
In the following, we accompany a Data Scientist through a typical use case: on behalf of a company’s marketing department, the task is to identify, from the existing customer base, all customers who are highly likely to buy a new product. The marketing department would like to address these customers before the product launch. For such use cases the Data Scientist follows a standardized procedure, the so-called Data Science Journey. Along the way we will point out common challenges that the Data Scientist faces.
In general, the Data Science Journey consists of five phases:
Figure 1: The five phases of the Data Science Journey
1. Objective Definition – Setting the right course
The first step is to define the objectives of the use case and to identify relevant stakeholders. This step’s relevance for the overall success is often underestimated.
How can the expectations of the various stakeholders be met? Do other stakeholders need to be involved in addition to the marketing department? Who is responsible for the technical implementation?
Data Management can support this phase by providing guidelines for use case management and by defining standards for documenting use cases. This brings a further advantage: if all use cases are documented centrally, relations to similar use cases that are already implemented or planned in other departments can be identified more quickly. This avoids redundancies and allows synergies to be exploited. In addition, relevant contact persons can be found more quickly.
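The idea of a central use case register can be sketched in a few lines of Python. This is only an illustration under assumptions: the `UseCase` fields, the example entries, the contact addresses and the rule "two use cases are related if they share a data source" are all hypothetical and not prescribed by any standard.

```python
from dataclasses import dataclass, field


@dataclass
class UseCase:
    """One centrally documented use case (all fields are illustrative)."""
    name: str
    department: str
    contact: str
    data_sources: set = field(default_factory=set)


def related_use_cases(new_case, registry):
    """Return (use case, shared data sources) pairs that overlap with new_case."""
    related = []
    for case in registry:
        shared = case.data_sources & new_case.data_sources
        if shared and case.name != new_case.name:
            related.append((case, shared))
    # Largest overlap first, so the closest matches are reviewed first.
    related.sort(key=lambda pair: len(pair[1]), reverse=True)
    return related


# Hypothetical register entries.
registry = [
    UseCase("Churn prediction", "Sales", "a.sales@example.com",
            {"crm_customers", "web_shop_orders"}),
    UseCase("Machine failure prediction", "Manufacturing", "b.ops@example.com",
            {"sensor_logs"}),
]
new_case = UseCase("Purchase propensity", "Marketing", "c.marketing@example.com",
                   {"crm_customers", "web_shop_orders", "campaign_history"})

matches = related_use_cases(new_case, registry)
for case, shared in matches:
    print(f"{case.name} ({case.department}, contact: {case.contact}) shares {sorted(shared)}")
```

In this toy register, the marketing use case would be pointed to the existing churn use case in Sales, together with a contact person, before any work is duplicated.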
2. Data Exploration and 3. Data Preparation – Understanding data structure and meaning
During these two phases, the Data Scientist evaluates the existing data with regard to relevance and data quality. The sooner the Data Scientist gains access to the data and understands its meaning, the sooner first results can be produced.
Which data sources exist? Has the consent of the data owner been obtained? What is the meaning of 0 or 1 in the column 'active customer relationship' in the customer table? Why are customers duplicated in the master data? Are sales from the online shop reported as net or gross? Are all sales shown in a standardized currency or in the respective local currency?
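Some of these questions can be checked directly against the data. The following is a minimal sketch with made-up records. The field names (`email`, `active`, `currency`) and the rule that two customers with the same normalized e-mail address are likely duplicates are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical customer master data and online-shop sales records.
customers = [
    {"customer_id": 1, "email": "anna@example.com", "active": 1},
    {"customer_id": 2, "email": "ben@example.com", "active": 0},
    {"customer_id": 3, "email": "Anna@Example.com", "active": 1},     # likely duplicate of 1
    {"customer_id": 4, "email": "cara@example.com", "active": None},  # undocumented value
]
sales = [
    {"customer_id": 1, "amount": 119.0, "currency": "EUR"},
    {"customer_id": 2, "amount": 80.0, "currency": "USD"},
    {"customer_id": 3, "amount": 50.0, "currency": "EUR"},
]


def duplicate_emails(rows):
    """Return normalized e-mail addresses that occur more than once."""
    counts = Counter(row["email"].strip().lower() for row in rows)
    return sorted(email for email, n in counts.items() if n > 1)


def unexpected_flag_values(rows, column, allowed=(0, 1)):
    """Return customer IDs whose flag is outside the documented values."""
    return [row["customer_id"] for row in rows if row[column] not in allowed]


def currencies_used(rows):
    """Return the set of currencies that appear in the sales records."""
    return {row["currency"] for row in rows}


print("Duplicate e-mails:", duplicate_emails(customers))
print("Unexpected 'active' values:", unexpected_flag_values(customers, "active"))
print("Currencies in sales:", currencies_used(sales))
```

Checks like these reveal that something is inconsistent, but not what the values mean. Whether an amount is net or gross, for example, cannot be read from the numbers themselves and has to be documented.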
In this step, Data Management provides important information for the Data Scientist and monitors data quality. A Data Catalog provides a central overview of the available data and of the business meaning behind field names across systems.
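What such a catalog entry might contain can be sketched as a small data structure. The `CatalogEntry` fields, the systems and owners named here, and the documented meanings are all hypothetical; a real Data Catalog tool would define its own schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Business documentation for one column (illustrative schema)."""
    system: str
    table: str
    column: str
    business_meaning: str
    owner: str


# Hypothetical entries; the meanings are assumed for the example.
catalog = [
    CatalogEntry("CRM", "customer", "active",
                 "1 = active customer relationship, 0 = relationship ended",
                 "Sales Operations"),
    CatalogEntry("WebShop", "orders", "amount",
                 "Order value, gross, in local currency",
                 "E-Commerce"),
]


def lookup(entries, table, column):
    """Return the catalog entry for table.column, or None if undocumented."""
    for entry in entries:
        if entry.table == table and entry.column == column:
            return entry
    return None


entry = lookup(catalog, "customer", "active")
print(entry.business_meaning if entry else "Not documented yet")
```

A lookup that returns nothing is useful information as well: it marks a documentation gap that has to be closed with the data owner.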
4. Modelling – Value creation through Data Science
Once all the data is available and understood, the core task of the Data Scientist can begin. The previous phase ensures a good modelling baseline. Now it is up to the Data Scientist to choose the right model and calibrate it accordingly. The following critical success factors apply:
- The method is decisive. It is important to find the appropriate method, so the Data Scientist needs to consider and compare several alternatives.
- The first model will not solve the problem. An agile approach with feedback and iterative improvement of the model is key, because a single approach rarely solves a problem at the first attempt and data-driven business models do not emerge as immediate results.
- Market environments develop continuously. The models should therefore be able to learn as well, and ideally even predict developments. Thus, a data model and the corresponding algorithms should not be understood as a project, but rather as a product.
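Comparing alternatives, as called for in the success factors above, can be sketched as follows. Everything in this example is an assumption made for illustration: the validation records are invented, the three scoring rules are simple stand-ins for real models, and AUC (the probability that a randomly chosen buyer is scored higher than a randomly chosen non-buyer) is just one of several metrics that could be used.

```python
def auc(labels, scores):
    """Share of (buyer, non-buyer) pairs in which the buyer is scored higher.

    Ties count as half a win.
    """
    buyers = [s for y, s in zip(labels, scores) if y == 1]
    others = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for b in buyers:
        for o in others:
            if b > o:
                wins += 1.0
            elif b == o:
                wins += 0.5
    return wins / (len(buyers) * len(others))


# Hypothetical validation data:
# (orders last year, opened last newsletter) and whether the customer bought.
features = [(5, 1), (0, 0), (2, 1), (3, 0), (4, 0), (0, 1)]
bought = [1, 0, 1, 0, 1, 0]

# Candidate scoring rules standing in for alternative models.
candidates = {
    "order_count": lambda f: f[0],
    "newsletter_only": lambda f: f[1],
    "combined": lambda f: f[0] + 2 * f[1],
}

results = {
    name: auc(bought, [score(f) for f in features])
    for name, score in candidates.items()
}
best = max(results, key=results.get)

for name, value in results.items():
    print(f"{name}: AUC = {value:.3f}")
print("Best candidate on this validation set:", best)
```

Running the same comparison again whenever new data or a new candidate becomes available is one simple way to treat the model as a product rather than a one-off project.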
5. Value Generation – Taking action from derived insights
The Data Scientist can now use the model to identify customer groups who are very likely to buy the new product. However, the work of the Data Scientist does not end with the completion of the model, because the insights derived from it must also be made actionable within the organization.
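Turning model output into a list the marketing department can act on might look like the sketch below. The probabilities are invented, and the threshold of 0.7 as well as the optional `max_size` cap are hypothetical parameters that would have to be agreed with the stakeholders.

```python
def select_target_group(scored_customers, threshold=0.7, max_size=None):
    """Return customers at or above the threshold, highest probability first."""
    selected = [
        c for c in scored_customers
        if c["purchase_probability"] >= threshold
    ]
    selected.sort(key=lambda c: c["purchase_probability"], reverse=True)
    return selected if max_size is None else selected[:max_size]


# Hypothetical model output: one purchase probability per customer.
scored_customers = [
    {"customer_id": 101, "purchase_probability": 0.91},
    {"customer_id": 102, "purchase_probability": 0.35},
    {"customer_id": 103, "purchase_probability": 0.78},
    {"customer_id": 104, "purchase_probability": 0.70},
]

target_group = select_target_group(scored_customers)
print([c["customer_id"] for c in target_group])
```

Keeping the threshold as an explicit parameter makes the business decision visible: a lower value reaches more customers at the cost of a lower expected response rate.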
How can stakeholders access the results? How can it be ensured that the model remains relevant in the future? How can the model learn (automatically)? To which, and to how many, related use cases in the organization can the model be applied?
Data Management integrates the derived insights into the information architecture and ensures that the results are available for making data-driven decisions.
The advantages of value-driven Data Management
Now the marketing department knows which customers are most likely to buy and can use appropriate communication channels to draw their attention to the new product. How best to address each customer group is yet another use case for a Data Scientist. Here again, the information provided by Data Management, for instance through a Data Catalog, is already available. The use case described here can also be extended to other products or product lines.
Data Management offers the following advantages through a standardized approach to use case management, the definition of clear roles and responsibilities, and the provision of a central Data Catalog:
- Faster execution: All information necessary for processing use cases is available
- Uniform understanding: Cross-functional transparency about data sources and their meaning
- Monetization of data: Focusing on those data sets that create a direct competitive advantage through the implementation of use cases
- Synergy effects: Information exchange between similar use cases across functions and business lines
Figure 2: Relationship of Data Management and Data Science
Based on value-driven Data Management, use cases become repeatable and scalable, so that further use cases can be implemented more quickly. In this way, Data Management becomes key to establishing data-driven decision making in your organization.