One of the thornier challenges with creating any Big Data application is the amount of manual effort that must go into cataloging the data. As a result, organizations often hire expensive data scientists only to have them spend months simply tagging data.
To accelerate the process, Tamr has developed a free, lightweight application that uses machine-learning algorithms to determine what type of content is being cataloged. Nidhi Aggarwal, global head of strategy, operations and marketing for Tamr, says Tamr Catalog, which is available in beta, can, for example, identify a set of data as customer records. It then asks the user to confirm that classification and uses the answer to get better at identifying those types of records in the future.
Aggarwal says the company is making Tamr Catalog free as part of an effort to increase demand for the Tamr Data Unification Platform, which organizations can use to join various sets of Big Data together to create more value for the business.
While some believe that investments in Big Data will return higher profits, frustration is building over how long it is taking organizations to realize that value. Much of that frustration, says Aggarwal, can be attributed to the data preparation work that data scientists have to perform manually before an application is ever deployed.
Organizations, however, would do well to think long and hard about where to apply Big Data. While there are plenty of instances where access to all of the available data is necessary, there are probably many more where 10 percent of the data is good enough to make a well-informed decision.