
Data Services – more than just an ETL tool

Capgemini
7 Oct 2020

ETL, or Extract, Transform and Load: that small three-letter acronym that's used a lot in Business Intelligence and data transformation projects. It's critical to any business strategy, because if you have no data, what do you base your decision-making process on? There are a fair number of ETL tools on the market, but this blog is about the SAP Data Services (SDS) suite. If you're a data geek like me, you'll enjoy using SDS.

What is it?

SDS, or Business Objects Data Services (BODS) as it was previously known, is a suite of tools for data manipulation. It handles data profiling, data quality and data transformation from a variety of sources, as well as enabling the integration of interfaces between applications.

It has been around for many years; the last version before SAP purchased it was v3.2, which went out of support in 2013. SAP have improved it a fair amount with v4.x, including fixing some of the bugs and quirks, although I would still love to have at least a single-level undo or Ctrl+Z option. It's particularly good at processing high volumes of data, as it is database-centric: the code SDS produces, if written correctly, is essentially SQL which is pushed down to the database of your choice, typically Oracle or MS SQL. For custom coding scenarios, SDS also supports Python, should you have a requirement for it.
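To give a feel for that custom-coding side, here is a minimal plain-Python sketch of the kind of per-record cleansing logic you might push into a custom transform. The field names and the postcode rule are hypothetical, and inside SDS the embedded Python API accesses records through its own objects, so treat this purely as an illustration of the idea.

```python
# Illustrative only: plain Python showing the sort of per-record cleansing
# logic a custom transform might carry. Field names are hypothetical; SDS's
# embedded Python API reads and writes fields through its own record objects.

def cleanse_record(record: dict) -> dict:
    """Trim whitespace, standardise case and derive a full-name field."""
    first = (record.get("FIRST_NAME") or "").strip().title()
    last = (record.get("LAST_NAME") or "").strip().title()
    record["FIRST_NAME"] = first
    record["LAST_NAME"] = last
    record["FULL_NAME"] = f"{first} {last}".strip()

    # Normalise a UK-style postcode to upper case with a single space
    postcode = (record.get("POSTCODE") or "").upper().replace(" ", "")
    if len(postcode) > 3:
        postcode = postcode[:-3] + " " + postcode[-3:]
    record["POSTCODE"] = postcode
    return record


if __name__ == "__main__":
    print(cleanse_record({"FIRST_NAME": " jason ", "LAST_NAME": "brown",
                          "POSTCODE": "sw1a1aa"}))
```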

Why ETL ?

In the '90s, ETL felt like a bit of a niche area to me. I got the chance to use it via MS SQL Data Transformation Services when the company I was working for decided we needed to build a sales-figure data mart. Back then the ERP ran on an IBM AS/400 midrange system, which was superb for Online Transaction Processing (OLTP) but slow for anything analytical, due to the PowerPC CPU.

The problem with doing this was the 'E' in ETL, i.e. extraction, as there weren't any decent DB2/400 database extractors; Java Database Connectivity (JDBC) was still in its early days then. We ended up with a custom Report Program Generator (RPG) program to extract a DB2 table into a CSV file, which we loaded into MS SQL v7. My point here is that it is so much easier with SDS now, as there are adapters for a variety of data sources, ranging from traditional mainframe flat files, delimited and unstructured data and XML, to an adapter for SAP's HR system SuccessFactors. If that's not enough, there is an Adapter SDK.

You might be surprised that I mentioned mainframes. However, American computer science graduates are still learning COBOL after graduating, which I found fascinating; listen to the Stack Overflow podcast, episode 230, about it. Even after Y2K, companies still need support for these legacy systems, which we thought would slowly be phased out (think Windows 3.1 still running at airports), and they are blending them with newer technologies.

Data Quality

SDS can be split into two tools: there's the SDS part, which is really the 'ETL' part, but there is also Information Steward (IS), which, as you might guess from the title, does the data profiling and data quality. You simply connect IS to the source system, import any tables you wish to profile, then execute a standard job. Behind the scenes IS is essentially using SDS to connect to the data source, so the two are quite heavily integrated. This is why you have to be careful when doing upgrades to get the correct software patch level for SDS, IS and Information Platform Services (IPS), which is a sort of middleware that glues everything together.

The data quality part of IS is probably more useful than the data profiling. I think the original intention was to use it for producing a quality scorecard of the source or target data. It does this by allowing you to build rules and then bind them to fields. Of course, you get some default rules, such as whether the field allows nulls, but the power comes from the flexible rules you can build yourself. The rule language essentially uses the same syntax as SDS, which is similar to Oracle syntax, e.g. NVL for null handling. IS is even smart enough to suggest rules based on any previous data profiling you have completed. The rules can also be exported from IS so you can use them in SDS, which is a nice feature as you can capture any erroneous data before it is inserted into the target tables.
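As a rough illustration of what those rules express, here is a plain-Python sketch of NVL-style checks bound to hypothetical field names, with a simple pass-rate calculation of the kind a scorecard would show. The field names, rules and data are made up; in IS you would write the equivalent in its own expression language.

```python
# Illustrative only: plain-Python equivalents of the kind of rules you would
# build in IS (NVL-style null handling, a crude format check), bound to
# hypothetical field names, plus a scorecard-style pass-rate calculation.

def nvl(value, default=""):
    """Rough equivalent of the Oracle-style NVL used in IS/SDS expressions."""
    return default if value is None else value

RULES = {
    "CUSTOMER_ID":  lambda v: nvl(v).strip() != "",   # mandatory field
    "EMAIL":        lambda v: "@" in nvl(v),          # crude format check
    "CREDIT_LIMIT": lambda v: nvl(v, 0) >= 0,         # no negative limits
}

def score(rows):
    """Return the percentage of rule checks that pass across all rows."""
    checks = [rule(row.get(field)) for row in rows
              for field, rule in RULES.items()]
    return 100.0 * sum(checks) / len(checks) if checks else 100.0

rows = [{"CUSTOMER_ID": "C001", "EMAIL": "a@b.com", "CREDIT_LIMIT": 500},
        {"CUSTOMER_ID": None,   "EMAIL": "oops",    "CREDIT_LIMIT": -10}]
print(f"Pass rate: {score(rows):.1f}%")   # 50.0% for this sample data
```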

Of course, another nice benefit of defining your data quality rules in IS is that you can reuse the same rules in SDS. On one project we did exactly this – IS allowed us to read from CSV delimited files, so we set up a view over the input delimited file, bound the rules to the fields in the view, then used it to produce a scorecard of the data quality. From the scorecard, you can also download a zip file of any rules that fail, with the corresponding erroneous data, to pass back to the business to cleanse. IS also has a nice unique function for checking for duplicate key data, although you can do the equivalent in SDS with code.
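That duplicate-key check is easy to picture in plain Python, too. The sketch below reads a delimited file and reports any key values that appear more than once; the file name and key column are hypothetical, and in SDS itself you would express the same thing with its own functions rather than this code.

```python
# Illustrative only: a plain-Python equivalent of the duplicate-key check,
# reading a delimited file (hypothetical name and key column) and reporting
# any key values that appear more than once.
import csv
from collections import Counter

def duplicate_keys(path: str, key_column: str) -> dict:
    """Return {key_value: occurrence_count} for keys seen more than once."""
    with open(path, newline="", encoding="utf-8") as handle:
        counts = Counter(row[key_column] for row in csv.DictReader(handle))
    return {key: count for key, count in counts.items() if count > 1}

if __name__ == "__main__":
    dupes = duplicate_keys("customers.csv", "CUSTOMER_ID")
    for key, count in dupes.items():
        print(f"Duplicate key {key!r} appears {count} times")
```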

Why use it with SAP?

SDS comes with some ready-built ABAP function modules to install on SAP, which makes it easy to extract data. It also supports BW data warehouse ODP extractors and RFC calls to SAP BAPIs or ABAP function modules, which is a good way of loading data into SAP. Some BAPIs have a test mode, so you can pass in the data you wish to load, get SAP to validate it without committing anything to the tables, and pass the errors back to SDS. Otherwise you can use IDocs, and SAP have a Rapid Deployment Solution (RDS) you can download for several of them.
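SDS makes these calls through its own SAP datastore, but the test-run idea can be sketched on its own with the open-source pyrfc library. The connection details, BAPI name and parameters below are placeholders, and not every BAPI exposes a test-run flag, so check the one you intend to use before relying on this pattern.

```python
# Illustrative only: calling a BAPI in "validate but don't commit" mode via
# the open-source pyrfc library. Connection details, the BAPI name and its
# parameters are placeholders; confirm your BAPI actually has a TESTRUN flag.
from pyrfc import Connection

conn = Connection(ashost="sap-host", sysnr="00", client="100",
                  user="rfc_user", passwd="********")

result = conn.call(
    "BAPI_EXAMPLE_CREATE",            # placeholder BAPI name
    HEADER={"DOC_DATE": "20201007"},  # placeholder import parameter
    TESTRUN="X",                      # validate only, commit nothing
)

# BAPIs report problems in their RETURN parameter; type 'E' or 'A' is a failure.
ret = result.get("RETURN", [])
ret = ret if isinstance(ret, list) else [ret]   # RETURN may be a table or a structure
errors = [msg for msg in ret if msg.get("TYPE") in ("E", "A")]

if errors:
    for msg in errors:
        print(msg.get("MESSAGE"))
else:
    print("Validation passed; re-run without TESTRUN and call "
          "BAPI_TRANSACTION_COMMIT to post for real.")
```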

I could spend more time extolling the benefits of Data Services, but if this has interested you, feel free to contact me at Capgemini.

Jason Brown

BI Developer – SAP Certified in Data Services (SDS)
Jason Brown is a BI Developer in SAP BI & Analytics within Capgemini’s UK Insights and Data business unit. He has 25 years of experience in IT, most of which has been ERP (SAP & AS/400) or data related. When he’s not writing code, he’s shooting arrows or travelling – preferably on the QM2.