In today’s business world, change is the norm and the industry is buzzing with terms such as digital transformation, cloud, data monetization, and data privacy. Organizations talk of a “Cloud First” strategy and of moving to a total cloud environment. However, this is a major misconception: most organizations will end up with a hybrid architecture and, possibly, multiple cloud providers. Given this, data orchestration and management across a hybrid multi-cloud environment become major issues that must be addressed if organizations are to succeed in this digital world.
Historically, IT departments responded to business demands through complex, capex-funded initiatives. In today’s business environment, the key inhibitors are:
- On-premises hardware – slow to scale, failing to support the demands of digital organizations
- Five-year IT strategies – lock organizations into the IT innovation of the past
- Operating models and development methods – still waterfall and procedural, inhibiting innovation
- Solutions that drive new challenges – cloud-based products offer scalability and software innovation but create growing data challenges.
In today’s digital economy, information is power, and organizations need a multi-cloud, connected, and trusted data strategy that unlocks data and enables the business across a hybrid multi-cloud IT estate.
The speed of innovation is the new norm. The pace of change drives new ways of doing business, and IT and data enablement must respond. Organizations need to move from capex step changes to an elastic model for software, computing, storage, and resourcing. They also need to manage the explosion of data with increased governance while ensuring appropriate levels of data security and privacy. The future architecture of the digital organization rests on a multi-cloud, hybrid DevOps platform underpinned by connected and trusted data.
A trusted and secure platform must address the following:
- All data should have an owner, and that ownership should be aligned to the platform’s data management and consumption strategy for the lake
- Management and ownership of platform security should be aligned with the platform’s data management and consumption strategy. This must include how data (at rest and in transit) is secured in combination with other data sets
- Integration to the organization’s core application framework for managing user access (e.g., Active Directory). This capitalizes on role-based access control to provide access to platform data assets.
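The group-to-role mapping described above can be sketched in a few lines. This is a minimal, illustrative model only; the group names, role names, and lookup logic are assumptions, not a real directory integration.

```python
# Minimal sketch of role-based access control for platform data assets.
# Group and role names are illustrative, not from any real directory.

# Roles granted to each directory group (e.g., groups synced from Active Directory)
GROUP_ROLES = {
    "grp-finance-analysts": {"read"},
    "grp-data-engineers": {"read", "write"},
}

def is_allowed(user_groups, action):
    """Return True if any of the user's groups grants the role needed for the action."""
    return any(action in GROUP_ROLES.get(group, set()) for group in user_groups)

print(is_allowed(["grp-finance-analysts"], "read"))   # True
print(is_allowed(["grp-finance-analysts"], "write"))  # False
```

In practice the group membership would come from the organization’s directory service at request time, so access decisions track the directory rather than being duplicated in the platform.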
Some of the key trends in this shift to the digital economy are:
- A data lake strategy’s success requires “data on data” (metadata)
- Increased global data regulation requires mature data privacy and data lifecycle management
- Business value is driving organizations to become data-driven
- Data governance underpins all these initiatives
- Data orchestration across the multi-cloud, platform, and application ecosystem
- Analytical MDM, graph databases and “Relationship on Read”.
A typical data management framework to address the needs of an organization is depicted below.
Historically, data to support operational and analytical reporting was stored in an Enterprise Data Warehouse (EDW) using an industry-template 3NF or 4NF normalized model. This approach required designing a complex data model, which demanded extensive regression testing and data modeling skills to modify. With the introduction of big data technology and cloud services, the approach changed to a data lake with schema on read. With lower storage costs and elastic compute, large volumes of internal and external data, as well as semi-structured and unstructured data, can be ingested.
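The schema-on-read shift can be illustrated with a small sketch: rather than forcing records into a predefined model at write time, raw semi-structured records are stored as-is and a structure is applied only when the data is read. The record fields below are invented for illustration.

```python
import io
import json

# Schema-on-read sketch: raw, semi-structured records land in the lake as-is.
# (io.StringIO stands in for a raw file in the data lake.)
raw_file = io.StringIO(
    '{"customer": "A123", "spend": 250}\n'
    '{"customer": "B456", "spend": 90, "channel": "web"}\n'  # extra field tolerated
)

# "Read" step: the schema is applied on the fly, defaulting missing fields,
# instead of being enforced at ingestion as in a 3NF warehouse model.
records = [json.loads(line) for line in raw_file]
projected = [
    {"customer": r["customer"], "spend": r["spend"], "channel": r.get("channel", "unknown")}
    for r in records
]
print(projected)
```

The trade-off is exactly the one the article describes: ingestion is cheap and flexible, but nothing in this path enforces quality or captures lineage unless the platform adds it.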
The result has been a focus on data ingestion and storage. For structured data, the impact is the loss of the associated metadata and data lineage, combined with reduced data quality and lifecycle management for the data being ingested.
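One way to avoid that loss is to wrap every payload with its metadata and lineage at ingestion time. A minimal sketch, assuming a simple envelope format (the field names and source-system labels are illustrative):

```python
import hashlib
from datetime import datetime, timezone

def ingest_with_metadata(payload: bytes, source_system: str, dataset: str) -> dict:
    """Wrap a raw payload with the metadata and lineage that is otherwise lost."""
    return {
        "dataset": dataset,
        "lineage": {
            "source_system": source_system,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        },
        # A checksum of the raw bytes supports later reconciliation against the source.
        "checksum": hashlib.sha256(payload).hexdigest(),
        "payload": payload,
    }

record = ingest_with_metadata(b'{"id": 1}', source_system="crm", dataset="customers")
print(record["lineage"]["source_system"])  # crm
```

Real platforms delegate this to a metadata catalog, but the principle is the same: lineage is captured when data enters the lake, not reconstructed afterwards.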
The next-generation data platform needs the ability to ingest and govern data at speed, supported by a bi-modal IT application/product development model.
The ingestion and data lake layer of the platform should offer a bi-modal operating and governance model.
The industrialization of data ingestion in a factory approach is standard industry practice for the creation and ongoing run of a cloud-based analytics data platform. This service would incorporate data governance, data quality, data on data, delta management, functional testing, stress testing, reconciliation, productionization, the transition to service, and ongoing monitoring and support.
Creating an ingestion framework and standard practices, and automating platform activities to optimize the process, should be part of the foundations of the data platform. The ingestion service can be fully or partially outsourced into a Kanban factory model. However, the IP and skills to extend the framework to new source systems should be retained within the organization.
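The factory idea above is that every source plugs into the same reusable pipeline of checks. A minimal sketch of such a framework step, covering two of the services listed (data quality and reconciliation); the check and field names are illustrative assumptions.

```python
# Sketch of a reusable ingestion-framework step: each new source system reuses
# the same data-quality checks and row-count reconciliation, rather than
# re-implementing them per source. Names and checks are illustrative.

def check_not_null(rows, field):
    """Data-quality check: the field must be populated in every row."""
    return all(row.get(field) is not None for row in rows)

def reconcile(source_count, rows):
    """Reconciliation: rows loaded must match the count reported by the source."""
    return source_count == len(rows)

def ingest(rows, source_count, required_fields):
    """Run the standard checks; reject the batch if any fail."""
    failed = [f for f in required_fields if not check_not_null(rows, f)]
    if failed:
        return {"status": "rejected", "failures": failed}
    if not reconcile(source_count, rows):
        return {"status": "rejected", "failures": ["row_count"]}
    return {"status": "loaded", "rows": len(rows)}

result = ingest(
    rows=[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    source_count=2,
    required_fields=["id", "name"],
)
print(result)  # {'status': 'loaded', 'rows': 2}
```

Onboarding a new source then means supplying its rows, its source-side count, and its required fields; the checks themselves, which are the retained IP, stay in the framework.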
That organizations will operate in a hybrid multi-cloud environment is a given, and the sooner they strategize how to address the data orchestration and management challenges as part of their wider digital transformation and cloud strategy, the better their chances of succeeding. The days of a “best of breed” technology strategy have given way to a “best of integration” strategy. Given the myriad technologies and applications that will co-exist, integration and interoperability between them become the defining factor for any organization. Technology vendors such as Informatica and SAP are leading the way here with products that address ingestion, privacy, orchestration, and management seamlessly across ERP sources, in-memory, Hadoop, and cloud environments.
The mantra for a successful digital organization today is “driving business innovation and evolution powered by connected and trusted data.”