
SAP Datasphere: Is partnership with Databricks a game changer?

Debraj Ray
Jul 26, 2023

How can SAP enterprise data be combined with big data to generate advanced analytics insights and drive key business decisions? SAP Datasphere with Databricks is the right answer!

In one of my previous projects for a large automotive industry client, we faced a real challenge in extracting SAP transactional and analytical data into a non-SAP cloud data lake platform, so it could be combined with other sources of information and used for AI and machine-learning-based predictive modelling. In general, extracting data from SAP systems into a cloud data lake poses the following challenges:

  • High data latency when exchanging information across systems
  • Incorrect data or data loss in the target data lake platform
  • Data duplication or redundant data in the target data lake platform
  • Need to rebuild the business context of SAP enterprise data in the data lake
  • Increased cost and maintenance effort due to additional software, hardware and licenses
  • Future rework costs caused by gaps in data reconciliation and delta processing capabilities
  • Additional cost and change-management governance for any future changes to source data structures

What is SAP’s offering to overcome these challenges?

On 8th March 2023, SAP announced SAP Datasphere, the next generation of SAP Data Warehouse Cloud. One of the key highlights of the announcement was the creation of an Open Data ecosystem between SAP Datasphere and key strategic partners to deliver on the promise of a business data fabric architecture. Databricks is one of those key strategic partners.

What is Databricks?

Databricks is a data lakehouse platform built on top of the open-source Apache Spark framework. It can process huge amounts of data and ships with built-in libraries for artificial intelligence (AI) and machine learning. Databricks allows the consumption and analysis of structured, semi-structured, and unstructured data in batch or streaming mode. In short, it is the much-coveted data lake and data science platform that allows data engineers to generate predictive insights from an organization's data in real time and helps businesses be future-ready and thrive.
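To make the batch-versus-streaming point concrete, here is a minimal PySpark sketch of how the same Spark session on a Databricks cluster can read a static batch of files and subscribe to a continuous stream of events. The storage paths, schema and column names are illustrative placeholders, not part of any SAP or Databricks sample.

```python
# Illustrative sketch: one Spark engine, batch and streaming reads side by side.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.getOrCreate()

# Batch: read structured data already landed in cloud storage (placeholder path).
batch_df = spark.read.parquet("/mnt/landing/sales_orders/")

# Streaming: continuously pick up semi-structured JSON events (placeholder path and schema).
event_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])
stream_df = spark.readStream.schema(event_schema).json("/mnt/landing/customer_events/")

# The same DataFrame API applies to both batch and streaming data.
running_totals = stream_df.groupBy("customer_id").sum("amount")
```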

Why is the partnership a game changer?

While SAP has always been at the forefront of providing businesses with stable and robust ERP and analytics solutions, there was a gap when it came to a solution or tool that delivers advanced analytics for data science use cases in conjunction with non-SAP and big data sources. Building a powerful open data ecosystem around SAP Datasphere and the Databricks lakehouse bridges that gap and unlocks future innovation possibilities in a business data fabric architecture, without the need to move data outside of SAP.

How to connect SAP Datasphere with Databricks?

Currently, a Data Provisioning Agent with the CamelJDBC adapter is required to connect a Databricks cluster to SAP Datasphere.
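Before wiring those details into the agent, it can help to verify the cluster coordinates (workspace hostname, HTTP path and a personal access token) with a short script. The sketch below is illustrative only and assumes the databricks-sql-connector package with placeholder values; the actual SAP Datasphere connection is still configured in the Data Provisioning Agent, not in Python.

```python
# Minimal sketch: verify the Databricks SQL endpoint credentials that will later
# be supplied to the CamelJDBC adapter in the Data Provisioning Agent.
# Assumes: pip install databricks-sql-connector
from databricks import sql

# Placeholder values -- replace with your own workspace details
SERVER_HOSTNAME = "adb-1234567890123456.7.azuredatabricks.net"
HTTP_PATH = "/sql/1.0/warehouses/abcdef1234567890"
ACCESS_TOKEN = "dapi..."  # personal access token

with sql.connect(
    server_hostname=SERVER_HOSTNAME,
    http_path=HTTP_PATH,
    access_token=ACCESS_TOKEN,
) as connection:
    with connection.cursor() as cursor:
        # List the schemas the adapter will later expose as remote tables
        cursor.execute("SHOW SCHEMAS")
        for row in cursor.fetchall():
            print(row[0])
```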

What are the potential use cases for SAP Datasphere with Databricks?

SAP Datasphere, with its self-service modelling capabilities and reporting in SAP Analytics Cloud, is targeted at analytics use cases for business analysts and power users. Databricks, on the other hand, is better suited to processing large amounts of data and supporting data science use cases driven by data engineers.

One potential use case is creating a single golden record for every customer in SAP Datasphere, which is critical for improving customer experience and targeted marketing in businesses across the globe.

  • Harmonize data sources in SAP Datasphere: Customer master data attributes (name, address, location, retailer information) and transactional data (customer-generated sales and revenue) sourced from SAP ERP and CRM systems can be federated and harmonized in SAP Datasphere, building a distributed data mesh architecture.
  • Machine learning in Databricks and data federation between Databricks and SAP Datasphere: The harmonized information can be shared and combined with other sources of information, such as websites and IoT devices, in the Databricks lakehouse platform. Predictive models can be built in Databricks using the FedML library, providing insights into predicted marketing attributes of a customer, e.g., customer lifetime value, customer loyalty status and customer satisfaction score (see the sketch after this list). These predicted marketing attributes can then be integrated back into SAP Datasphere, where the information is consolidated into a single golden record for each customer.
  • Reporting in SAP Analytics Cloud: The consolidated insights can be reported using stories in SAP Analytics Cloud, leveraging the live connection to SAP Datasphere. The single golden record for each customer can also be fed back into the CRM system so the information is displayed in the customer overview screen.
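As a minimal sketch of the modelling step on Databricks, the example below trains a simple customer-lifetime-value style regression with Spark MLlib. In the real scenario the input DataFrame would hold the harmonized customer data federated from SAP Datasphere (for example via the SAP FedML Databricks library); here a tiny synthetic frame stands in for it, and all column names and the choice of a gradient-boosted tree model are hypothetical placeholders.

```python
# Minimal sketch of a customer-lifetime-value style model on Databricks.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import GBTRegressor

spark = SparkSession.builder.getOrCreate()

# Stand-in for the harmonized customer data federated from SAP Datasphere
# (e.g. via the SAP FedML Databricks library); all columns are hypothetical.
customer_df = spark.createDataFrame(
    [
        ("C001", 12, 250.0, 34, 5.5, 3100.0),
        ("C002", 4, 80.0, 12, 0.0, 640.0),
        ("C003", 25, 410.0, 90, 12.0, 9800.0),
        ("C004", 7, 130.0, 20, 2.5, 1500.0),
    ],
    ["customer_id", "orders_last_12m", "avg_order_value",
     "web_visits", "iot_usage_hours", "lifetime_value"],
)

feature_cols = ["orders_last_12m", "avg_order_value", "web_visits", "iot_usage_hours"]
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=feature_cols, outputCol="features"),
    GBTRegressor(featuresCol="features", labelCol="lifetime_value"),
])
model = pipeline.fit(customer_df)

# Predicted marketing attribute per customer, ready to be integrated back
# into SAP Datasphere and consolidated into the golden record.
predictions = (
    model.transform(customer_df)
    .select("customer_id", "prediction")
    .withColumnRenamed("prediction", "predicted_lifetime_value")
)
predictions.show()
```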

SAP Datasphere with Databricks sounds like the perfect marriage between big data and analytics. The Open Data ecosystem of SAP Datasphere leverages not only the partnership with Databricks but also those with other key strategic partners, Collibra, Confluent and DataRobot, which open up a wealth of opportunities in data governance, data streaming and augmented intelligence respectively. Only time will tell how these 'integration marriages' unfold and whether they deliver valuable outcomes to businesses across the world.

Debraj Ray

SAP Data and Analytics Solution Architect
Debraj Ray is an SAP Data and Analytics Solution Architect with experience in leading and implementing analytics solutions across a variety of industries, specialising in SAP BW, HANA, Business Objects and Cloud technologies.