Targeting the Operational Side of the Data Lake

    The data lake may still be on the drawing board at most enterprises, but that hasn’t deterred software developers from coming up with new ways to improve its performance.

    By nature, the data lake will encompass a complex arrangement of infrastructure and architectural constructs, and will be tasked with some of the most difficult data processes the enterprise has yet encountered. So channel providers are looking to deliver not only streamlined hardware and middleware platforms to simplify the lake’s design, but also advanced automation and intelligent software to fulfill the demands of high-speed, even real-time, analytics.

    One of the newest entrants to the field is Podium Data, which recently unveiled a data lake management platform aimed at providing rapid turnaround of custom-built data sets. The company says it can provide a 25-fold increase in speed for functions like search, exploration and publishing, allowing business and analytics teams to incorporate critical data sets in weeks rather than months, and then produce results in a matter of hours. At the same time, the platform is said to reduce data delivery costs by 40 percent through improved resource consumption and self-service operations.

    Many long-standing IT vendors are starting to target the operational side of the data lake as well. CSC and Microsoft recently teamed up to develop industrial machine learning (IML) solutions to coordinate enterprise-wide data sets for advanced analytics applications. The companies are said to be working on targeted solutions for key industry verticals like health care and finance, matching CSC’s data ingestion and analytics prowess with the Azure cloud and the Cortana Intelligence Suite to provide next-generation data modeling, mining and pattern recognition. Ultimately, the goal is to devise a unified solution that meets enterprise scalability demands while simplifying the analytics process through microservices, application templates and improved data pipeline management.

    Development is also progressing on the protocols and APIs needed to bring multiple platforms together, says Progress’ Sumit Sarkar. A case in point is the OData (Open Data) protocol that provides a common format for creating and consuming interoperable RESTful APIs. Known to some as the “SQL of the Web,” OData enables a uniform query format to enhance the interchange between standard SQL interfaces like Hadoop Hive, BigSQL and Apache Phoenix. In this way, organizations will not only be able to implement data lakes across single facilities, but across multiple cloud implementations as well.
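    To make the “SQL of the Web” idea concrete, the sketch below builds an OData-style query URL from SQL-like parts ($select, $filter, $top map roughly to SELECT, WHERE and LIMIT). The service root and entity set are hypothetical, and a real client would percent-encode the query string; this is a minimal illustration of the uniform query format, not any vendor’s client library.

```python
def build_odata_query(service_root, entity_set, select=None, filter_expr=None, top=None):
    """Assemble an OData-style query URL from SQL-like components.

    $select ~ SELECT column list, $filter ~ WHERE clause, $top ~ LIMIT.
    Spaces are left literal for readability; a production client would
    percent-encode the query string.
    """
    parts = []
    if select:
        parts.append("$select=" + ",".join(select))
    if filter_expr:
        parts.append("$filter=" + filter_expr)
    if top is not None:
        parts.append(f"$top={top}")
    url = f"{service_root}/{entity_set}"
    return url + (("?" + "&".join(parts)) if parts else "")

# Roughly: SELECT Name, Price FROM Products WHERE Price > 20 LIMIT 5
url = build_odata_query(
    "https://example.com/odata",   # hypothetical service root
    "Products",
    select=["Name", "Price"],
    filter_expr="Price gt 20",
    top=5,
)
print(url)
```

Because the query shape is the same regardless of what sits behind the endpoint, the same URL could be served by a gateway fronting Hive, Big SQL or Phoenix, which is the interoperability point Sarkar is making.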

    The key element in any high-speed analytics operation, of course, is data flow, which is why start-ups like StreamSets are building middleware for data collection and related functions. The firm’s Data Collector solution features tools like drift synchronization, change data capture and dataflow triggers to bring data management in line with the highly dynamic nature of information coming into the enterprise. In this way, the data lake is better able to respond to unexpected semantic and schema changes that typically lead to broken pipelines and lost data, and better accommodate the formatting differences between leading database platforms like Oracle and MySQL. Large organizations in particular may benefit from this approach by implementing an enterprise-wide data exchange that is accessible by multiple business units.
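    The schema-drift problem is easy to see in miniature. The sketch below shows the general idea behind drift handling: when an upstream source quietly adds a column mid-batch, the pipeline unions the fields it has seen and pads missing values rather than dropping records or breaking. This is an illustration of the concept only, not StreamSets’ implementation, and the field names are invented for the example.

```python
def reconcile_schema(records):
    """Give a batch of drifting records a stable, unified schema.

    Collects the union of all fields seen across the batch (in first-seen
    order) and pads each record's missing fields with None, so downstream
    consumers see one consistent schema instead of a broken pipeline.
    """
    fields = []
    for rec in records:
        for key in rec:
            if key not in fields:
                fields.append(key)
    return [{f: rec.get(f) for f in fields} for rec in records]

batch = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b", "email": "b@x.com"},  # "email" appeared upstream mid-batch
]
rows = reconcile_schema(batch)
print(rows)
```

A rigid pipeline would reject the second record or silently drop the new column; padding toward the union is one common way to keep data flowing while the change is flagged for review.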

    These and other advancements show that the data lake is still very much a work in progress, and it will be some time before the industry has gained enough experience to manage it on an operational level on par with legacy data center or even cloud-based infrastructure.

    But the movement toward high-speed, high-volume analytics is already creating new opportunities for early adopters, and it won’t be long before virtually every organization, large or small, will have to tap into the wealth of information at its disposal in order to gain the insight needed to keep pace with an increasingly data-driven world.

    Arthur Cole writes about infrastructure for IT Business Edge. Cole has been covering the high-tech media and computing industries for more than 20 years, having served as editor of TV Technology, Video Technology News, Internet News and Multimedia Weekly. His contributions have appeared in Communications Today and Enterprise Networking Planet and as web content for numerous high-tech clients like TwinStrata and Carpathia. Follow Art on Twitter @acole602.

