Organisations come and go, applications come and go, but data remains…

In this multi-part series we shall look at the use cases for a “Data lake” in a typical Enterprise Application landscape.

Data Lake: the data migration use case 

In the implementation journey of any Enterprise Application, such as ERP or CRM, successful data migration is a key milestone. The same holds true for instance consolidations, enterprise application migrations, global rollouts, upgrades, and so on. In the majority of cases, the objects migrated from the legacy applications are the master and reference data objects.

The scope for transaction data migration is usually limited to open inventory and G/L balances, open purchase orders, open sales orders, and other open transactions. What happens to the closed transactions belonging to the current financial year, or those belonging to closed financial periods? In other words, where does the history of an organization reside? In some cases, this legacy data may be backed up on tape drives, usually unusable because the corresponding legacy application has already been decommissioned and the servers switched off, leaving no practical means to retrieve the information. And beyond the application transaction history, what about other unstructured data such as server logs, interface files, documents, point-of-sale transactions, and call detail records?

Does this mean that if an organization changes its Enterprise Application platform, it loses its connection with the past? Without large amounts of historical data, how can it hope to achieve the following objectives:
· Improve sales and inventory forecasts
· Utilize techniques like predictive analytics
· Improve preventive maintenance
 

Traditional data migration

Due to the prohibitive cost of traditional data warehousing and ETL tools, many organizations depend on homegrown or SI-offered custom solutions based on MS Office, .NET, etc. to perform data migration. Even if they can afford traditional ETL tools like SAP Data Services, the amount of data held in the staging area is limited. This can be due to a lack of resources such as licenses, skilled manpower, processing power, or the time window available for ETL, as well as the inability of ETL tools to handle a wide variety of data.
Hence, the key impediments to achieving data nirvana are:
· Cost of hardware, software, and storage
· Capability to handle all data types
· Ability to handle vast amounts of data in an always-ready mode
But is there life after death for data?

In comes BIG DATA technology along with the data lake. Big Data refers to the large amounts of poly-structured data that flows continuously through and around organizations. The cost of the technologies needed to store and analyze large volumes of diverse data has dropped, thanks to open source software running on industry-standard hardware. The cost has dropped so much, in fact, that the key strategic question is no longer what data is relevant, but rather how to extract the most value from all the available data.

The common use cases of big data technology like Hadoop include the following (a short sketch follows this list):
1. As a flexible data store
2. As a simple database
3. As a data processing engine
4. Together with SAP HANA for data analytics
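To make the first three use cases concrete, here is a minimal PySpark sketch (one of several possible processing engines on a Hadoop cluster, not something this post prescribes). All HDFS paths, file names, and column names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Minimal sketch: HDFS as a flexible data store and a simple, queryable database.
# All HDFS paths, file names, and column names are hypothetical placeholders.
spark = SparkSession.builder.appName("hadoop-use-cases-sketch").getOrCreate()

# 1. Flexible data store: structured and unstructured data sit side by side in HDFS.
sales = (spark.read.option("header", True).option("inferSchema", True)
         .csv("hdfs:///lake/raw/legacy_erp/sales_dump.csv"))
logs = spark.read.text("hdfs:///lake/raw/legacy_erp/app_server.log")

# 2. Simple database: register the structured dump and query it with SQL.
sales.createOrReplaceTempView("legacy_sales")
top_customers = spark.sql("""
    SELECT customer_id, SUM(net_value) AS total_value
    FROM legacy_sales
    GROUP BY customer_id
    ORDER BY total_value DESC
    LIMIT 10
""")

# 3. Data processing engine: the work above runs as distributed jobs on the cluster.
top_customers.show()
print("Log lines retained:", logs.count())
```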

Big data technology uses Hive and Pig scripts, Flume, Sqoop, etc. as its ETL tools. Traditional ETL tools have also adapted by adding the capability to “translate” an ETL job into a MapReduce job using JAQL technology: the ETL job is rewritten as a JAQL query, which is then executed as a MapReduce job on Hadoop. SAP Data Services is able to auto-generate Pig scripts to read from and write to HDFS, including joins and push-down operations.
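The Pig/JAQL generation described above is the tooling’s own mechanism; purely as an illustration, the same kind of ETL job (a push-down filter plus a join over HDFS data) can also be hand-written in PySpark and executed as distributed work on the cluster. Paths and column names below are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hand-written ETL-style job on Hadoop (illustrative alternative to a generated
# Pig/JAQL job). All paths and column names are hypothetical placeholders.
spark = SparkSession.builder.appName("etl-join-sketch").getOrCreate()

orders = spark.read.parquet("hdfs:///lake/raw/legacy_erp/purchase_orders")
vendors = spark.read.parquet("hdfs:///lake/raw/legacy_erp/vendors")

# Filter early so the predicate can be pushed down to the file scan,
# then join order lines to their vendor master records.
open_orders = orders.filter(F.col("status") == "OPEN")
enriched = open_orders.join(vendors, on="vendor_id", how="left")

enriched.write.mode("overwrite").parquet("hdfs:///lake/staging/open_po_enriched")
```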
 

Next Generation Data Migration

Hence the recommended approach for data migration using a Data Lake would be as follows:

EXTRACT
Extract all data from the legacy source systems, including all relevant master and transactional data in its entirety, into the Data Lake. The fundamental technology in the Data Lake relevant to this process is HDFS (Hadoop Distributed File System). Data ingestion into the Data Lake from the legacy sources can leverage standard Big Data ingestion methods or traditional ETL tools like SAP BODS to push data into the Data Lake.
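As one illustrative ingestion path (the options above include standard Big Data ingestion methods or SAP BODS), the sketch below pulls a single legacy table over JDBC into the raw zone of the lake using Spark. The JDBC URL, schema, credentials, and target path are hypothetical.

```python
from pyspark.sql import SparkSession

# Illustrative extract of one legacy table into the Data Lake's raw zone.
# JDBC URL, table, credentials, and target path are hypothetical placeholders.
spark = SparkSession.builder.appName("legacy-extract-sketch").getOrCreate()

customers_raw = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@legacy-erp-host:1521/LEGACYDB")
    .option("dbtable", "ERP.CUSTOMER_MASTER")
    .option("user", "extract_user")
    .option("password", "********")
    .load()
)

# Land the data untouched in the raw zone; cleansing happens downstream.
customers_raw.write.mode("overwrite").parquet("hdfs:///lake/raw/legacy_erp/customer_master")
```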
 
TRANSFORM – cleanse, standardize, and de-duplicate
Profile the legacy source data and generate a data quality assessment score. Perform cleansing iterations, apply data and business validation rules, carry out the legacy-to-target mapping exercise, and apply standard data quality cleansing. As the quantum of data to be processed is huge, the fundamental technology in the Data Lake relevant to this process is MapReduce. The raw data and the cleansed data reside in different locations within the Data Lake.
As mentioned in my colleague’s blog “MDM & Big Data”, this process can be aided by an MDM tool like SAP MDG to create a golden record of all master data in the Data Lake.
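A minimal sketch of this cleanse/standardize/de-duplicate pass, including a crude data quality score, assuming hypothetical column names and validation rules; in practice the rules come from the legacy-to-target mapping and the MDM golden-record process mentioned above.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Sketch of the TRANSFORM step: profile, cleanse, standardize, de-duplicate.
# Column names, rules, and paths are hypothetical placeholders.
spark = SparkSession.builder.appName("cleanse-sketch").getOrCreate()

raw = spark.read.parquet("hdfs:///lake/raw/legacy_erp/customer_master")

# Crude data quality score: share of rows with a populated name and country.
total = raw.count()
valid = raw.filter(F.col("name").isNotNull() & F.col("country").isNotNull()).count()
print(f"Data quality score: {100.0 * valid / max(total, 1):.1f}%")

# Standardize and de-duplicate, keeping raw and cleansed data in separate zones.
cleansed = (
    raw.withColumn("name", F.trim(F.upper(F.col("name"))))
       .withColumn("country", F.upper(F.col("country")))
       .dropDuplicates(["name", "country", "tax_id"])
)
cleansed.write.mode("overwrite").parquet("hdfs:///lake/cleansed/customer_master")
```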

LOAD
Finally, the cleansed master data is pushed back into the Data Lake from the MDM application to ensure that all present and future consumers of data in HDFS have cleansed master data to enable their activities. The transactional data objects in HDFS are then mapped back to the new master data records and migrated into the new Enterprise Application system as part of the cutover and go-live activities.
We now have a cleansed set of transactional and master data derived from all the legacy systems, including all the relevant history right from the birth of the organization, ready for use in ERP, CRM, etc., as well as in the EDW, analytics, and in-memory appliances like SAP HANA.
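As a sketch of this final mapping step, assume a hypothetical cross-reference of legacy customer numbers to the golden records produced by the MDM tool: open transactions are re-keyed to the new master data and written out as load files for the new Enterprise Application. All names and paths are assumptions.

```python
from pyspark.sql import SparkSession

# Sketch of the LOAD step: re-key open transactions to the golden master records
# and produce load files for the target system. Names and paths are hypothetical.
spark = SparkSession.builder.appName("load-prep-sketch").getOrCreate()

open_orders = spark.read.parquet("hdfs:///lake/cleansed/open_sales_orders")
# Hypothetical cross-reference: legacy_customer_id -> new_customer_id (golden record)
golden_customers = spark.read.parquet("hdfs:///lake/mdm/golden_customer_xref")

load_ready = (
    open_orders.join(golden_customers, on="legacy_customer_id", how="inner")
               .drop("legacy_customer_id")
               .withColumnRenamed("new_customer_id", "customer_id")
)

# Hand over cutover load files (CSV here) to the new Enterprise Application's load tooling.
load_ready.write.mode("overwrite").option("header", True).csv("hdfs:///lake/cutover/sales_orders_load")
```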

Advantages:

Big data technology can thus be used to “keep existing data around longer”, facilitating the following:
· Faster decommissioning of legacy systems
· Historical data feeds into the Enterprise Data Warehouse, enabling better insights
· Lower cost of data storage
· An online archive: data that was once moved to tape can now be queried to understand long-term trends (see the sketch after this list)
· Compliance retention: industry-specific requirements for data retention
· Combination with external historical data sources (machine sensor data, weather, survey, research, purchased data, etc.) later in SAP HANA for instant analysis
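As a sketch of the online-archive point above: once the history sits in the lake, it can be queried in place rather than restored from tape. Table, column, and path names are hypothetical.

```python
from pyspark.sql import SparkSession

# Sketch: querying archived history directly in the Data Lake.
# Paths and column names are hypothetical placeholders.
spark = SparkSession.builder.appName("online-archive-sketch").getOrCreate()

history = spark.read.parquet("hdfs:///lake/archive/sales_history")  # e.g. many years of closed orders
history.createOrReplaceTempView("sales_history")

# A long-term trend query that would previously have required a tape restore.
spark.sql("""
    SELECT fiscal_year, SUM(net_value) AS revenue
    FROM sales_history
    GROUP BY fiscal_year
    ORDER BY fiscal_year
""").show()
```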

Thus, the almost infinite storage and processing power of big data technology can be used to give archived data a second lease of life.
