As IT organizations begin to routinely collect massive amounts of data, deciding who inside the organization should have access to that information is becoming a thorny issue. Business analysts often want to compare and contrast random sets of data in the hopes of discovering new patterns and insights regardless of the sensitivity of the data. This often puts them at loggerheads with IT organizations that have long been responsible for overseeing data management.
To reduce that friction, Paxata, a provider of a data preparation platform that runs on top of Apache Spark clusters, has added two-factor governance tools to the Paxata Summer ’15 release of its adaptive data preparation platform. The tools give data administrators control over all functional permissions, such as who can perform what types of tasks, while analysts set the resource permissions that determine who has access to data sets and projects.
Nenshad Bardoliwalla, vice president of products for Paxata, says that in the process of creating a platform that enables organizations to automate data integration and then organize data stored in a Big Data repository, Paxata found itself caught between two camps that have always been at war over data governance. IT is often held accountable for who has access to data, even though it’s ultimately up to business managers to determine who in the organization needs access to what data to do their jobs. Now, the two-factor governance capability in Paxata provides a mechanism through which IT and the business can dynamically manage that process via a single click of a button, says Bardoliwalla.
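To make the split between functional and resource permissions concrete, here is a minimal Python sketch of how such a two-layer check might work. It is not Paxata's implementation or API. The class, method names, users, and data set names are all hypothetical, and the only assumption carried over from the description above is that an action goes ahead only when both layers agree.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Governance:
    """Hypothetical two-layer permission model (illustration only, not Paxata's API)."""

    # Functional permissions: user -> task types, managed by data administrators.
    functional: Dict[str, Set[str]] = field(default_factory=dict)
    # Resource permissions: data set -> users, managed by the analysts who own it.
    resource: Dict[str, Set[str]] = field(default_factory=dict)

    def grant_task(self, user: str, task: str) -> None:
        """Administrator side: allow a user to perform a type of task."""
        self.functional.setdefault(user, set()).add(task)

    def share_dataset(self, dataset: str, user: str) -> None:
        """Analyst side: give a user access to a data set."""
        self.resource.setdefault(dataset, set()).add(user)

    def is_allowed(self, user: str, task: str, dataset: str) -> bool:
        """An action is permitted only if both layers say yes."""
        can_do_task = task in self.functional.get(user, set())
        can_see_data = user in self.resource.get(dataset, set())
        return can_do_task and can_see_data


gov = Governance()
gov.grant_task("ana", "export")          # set by IT
gov.share_dataset("sales_2015", "ana")   # set by the business
gov.share_dataset("sales_2015", "raj")   # raj can see the data but has no export right
```

In this sketch, `gov.is_allowed("ana", "export", "sales_2015")` succeeds because both IT and the business have signed off, while the same call for `raj` fails on the functional check alone.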
Bardoliwalla says that, at its core, the Paxata Summer ’15 release uses a recommendation engine developed by the company to apply machine learning algorithms that help organizations discover the relationships between various sets of data stored in platforms such as Hadoop. Once that data is fed into an Apache Spark cluster, the Paxata platform takes advantage of in-memory computing to present data through an HTML5 user interface in a familiar Excel-like spreadsheet format.
The latest release provides a number of enhancements to the tools that analysts use to sort through all the data stored in the Paxata platform using a columnar data store. It also now provides for dynamic provisioning of elastic clusters across multiple tenants, which enables IT organizations to segment an Apache Spark cluster across different projects. Paxata also now integrates with Splunk search technology so that IT organizations can better track how Paxata is being used from an IT operations perspective.
As is often the case, the more there is of something, the more responsibility there is to manage it. The challenge facing IT organizations is finding a way to manage Big Data without necessarily having to get in the way of the people inside the organization actually trying to make sense of all that data.