SAP Looks to Simplify Data Management via SAP Data Hub


As the amount of data that IT organizations need to access increases, many of them are wrestling with how to manage multiple data pipelines spanning multiple sources. SAP today unveiled a SAP Data Hub that is intended to make it easier to both manage and integrate data between enterprise applications and Big Data platforms without having to physically move any of that data.

Greg McStravick, president of SAP database and data management, says SAP Data Hub provides a single open mechanism for managing data pipelines spanning enterprise applications running on the SAP HANA in-memory database and data sources such as Apache Hadoop and Spark.

McStravick says via SAP Data Hub, an IT organization can now combine data governance, management of data pipelines and data integration together using a single visual interface without requiring data to be moved into a central data warehouse.

“You can leave data where it resides,” says McStravick.

SAP is essentially providing a data management capability that takes advantage of data virtualization software to track not only who accessed what data, but also the relationship between all the pipelines accessing that data as they get integrated in real time. The SAP Data Hub itself is a SAP HANA application that generates an SAP Vora run-time based on the Apache Spark framework, then executed as a container running on a Kubernetes cluster. The SAP Data Hub can be deployed on-premises or in a public cloud.

McStravick says that capability is critical at a time when the convergence of cyber and physical systems is making data the new oil of a digital economy. The fundamental problem organizations face today is that it’s too difficult to refine that data. McStravick says only one half of 1 percent of the data being created today is actually being used. And yet, as more data than ever gets created, all the data being created has paradoxically become unmanageable. SAP Data Hub not only provides the data integration required to access that data, but employs machine learning algorithms to transform data and enrich it using SAP business logic.


Just about every provider of enterprise software is trying to bring to market a new generation of data management platforms that eliminate data silos without having to employ a raft of fragmented data integration tools. The challenge many IT organizations will face next is determining what data management tools to rationalize today in favor of more elegant approaches to solving a chronic IT problem.