Seagate this week unveiled a ClusterStor Hadoop Workflow Accelerator that introduces the Hadoop on Lustre Connector, which allows clusters based on Hadoop and the open source Lustre file system “to use exactly the same data without having to move the data between file systems or storage devices.” Seagate is also making available “a source code for a patch to Hadoop that allows Map and Reduce processes to share files and enables the use of ‘diskless’ Hadoop compute clusters.”
Seagate also announced that it is contributing the Hadoop on Lustre Connector software to the open source community and that it is transferring “assets relating to Lustre.org to Open Scalable File Systems, Inc. (OpenSFS) and European Open Filesystem SCE (EOFS).” The two organizations will jointly manage Lustre.org.
Ken Claffey, vice president and general manager for storage systems at Seagate, says that as IT organizations continue to scale their operations, many of them will find themselves tackling the same kinds of IT challenges that high performance computing (HPC) environments have already faced. Because Hadoop applications will be among the first places where that occurs, it makes sense to bring the capabilities of an open source, parallel file system such as Lustre to Hadoop applications.
The challenge, says Claffey, was doing that in a way that didn’t require IT organizations to move data between different file systems and storage devices. As part of that effort, Seagate is providing “a set of Hadoop optimization tools, services and support that leverages and enhances the performance of ClusterStor” storage systems in Hadoop environments. In addition, Seagate announced this week an update to the software that runs its ClusterStor storage system; the update improves metadata performance by 700 percent and increases the number of files that can be managed to 16 billion.
Since acquiring Xyratex, which assumed ownership of Lustre in 2013, Seagate has been trying to establish a presence as a provider of storage systems. Toward that end, Seagate also announced this week a reseller agreement with SGI, under which the server and workstation vendor will make ClusterStor storage systems available to its customers.
As Hadoop evolves into a platform for running applications, it’s clear that performance expectations surrounding Hadoop clusters will soon rise. While there is no doubt that the performance of the Hadoop Distributed File System (HDFS) will improve over time, it’s unlikely that it will ever match the capabilities of a file system such as Lustre, which was designed from the ground up to manage I/O in parallel.