One of the myths surrounding Big Data deployments is that the quality of the data is not nearly as important as it has been in most traditional database systems. The thinking goes that, over time, any bad data would simply be marginalized by the sheer volume of data in the system.
While that remains true for certain classes of applications involving, for example, social media analytics, it turns out that in other sectors such as health care, the quality of the data still matters regardless of how much of it there is.
To address that requirement, SAP today rolled out a series of data quality tools aimed primarily at Big Data application environments, including a variety of tools designed to accelerate data extraction as well as a “microservice” that organizations can employ to invoke an SAP cloud service to clean data.
But Philip On, vice president of product marketing at SAP, says the most significant advance comes in the form of SAP Information Steward and SAP Agile Data Preparation, tools that make it simpler for business users to serve their own data needs within the context of a well-defined set of data management policies. That approach, says On, also makes it clear to everyone that IT departments alone are not responsible for maintaining the quality of the data.
Collectively, On says, SAP is committed to providing a full portfolio of tools spanning everything from data lineage to impact analysis. In fact, On says that in certain vertical industries, the difference between Big Data success and failure almost inevitably comes down to the quality of tools.
“A lot of organizations are winding up with data swamps instead of a data lake,” says On. “They wind up just making a lot of bad decisions a lot faster.”
Garbage in, garbage out is, of course, a long-standing mantra of IT. What’s different now is the potential scale of all that garbage. While some garbage is almost inevitable, the challenge now is to identify and correct bad data before it proliferates to the point where investing in a Big Data platform becomes a regrettable decision.