With many organizations struggling to incorporate various forms of media into their storage environments, it’s sometimes instructive to see how Web-scale companies went about solving the problem. Starting today, LinkedIn is not only willing to share how it built a distributed object storage system optimized to handle media; it is also willing to share the actual code.
LinkedIn today announced that the Ambry distributed object storage software it built is now available under an open source Apache license. Sriram Subramanian, manager for media infrastructure at LinkedIn, says Ambry was created specifically to address performance issues associated with having to support both small and large objects. In fact, LinkedIn says it relies on Ambry to serve up to 10,000 requests per second across more than 400 million users. More specifically, LinkedIn says Ambry latency is less than 50 ms for a 1 MB object, that it reduces request rate imbalance among disks by a factor of eight, and that it consumes 88 percent of the network bandwidth allocated.
Ambry consists of a set of data nodes for storing and retrieving data, front-end machines that route requests to the data nodes after some preprocessing, and a cluster manager that coordinates and maintains the cluster. The front end interacts with the data nodes in the remote data center when read-after-write consistency is required. The front end provides an HTTP API to POST, GET and DELETE objects.
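To make that division of labor concrete, here is a minimal in-memory sketch in Python of a front end routing POST, GET and DELETE requests to data nodes chosen by a cluster manager. It is not Ambry's code or its actual API: the class names, the method names and the placement rule (blob ID modulo node count) are all assumptions made up for illustration, and replication across data centers is left out entirely.

```python
import uuid


class DataNode:
    """Stores and retrieves blobs (here, just an in-memory dict)."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}


class ClusterManager:
    """Tracks the data nodes and decides which node owns a blob ID."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def node_for(self, blob_id):
        # Hypothetical placement rule: derive the node index from the ID.
        return self.nodes[int(blob_id, 16) % len(self.nodes)]


class Frontend:
    """Routes POST, GET and DELETE requests to the owning data node."""

    def __init__(self, cluster):
        self.cluster = cluster

    def post(self, data):
        # "Preprocessing" in this sketch is only assigning a random ID.
        blob_id = uuid.uuid4().hex
        self.cluster.node_for(blob_id).blobs[blob_id] = data
        return blob_id

    def get(self, blob_id):
        # Returns None if the blob is unknown or was deleted.
        return self.cluster.node_for(blob_id).blobs.get(blob_id)

    def delete(self, blob_id):
        # Returns True only if a blob was actually removed.
        node = self.cluster.node_for(blob_id)
        return node.blobs.pop(blob_id, None) is not None


cluster = ClusterManager([DataNode("node-a"), DataNode("node-b"), DataNode("node-c")])
frontend = Frontend(cluster)
blob_id = frontend.post(b"profile-photo-bytes")
```

In this toy version the caller only ever talks to the front end and holds on to the returned ID; which data node ends up holding the bytes is a detail hidden behind the cluster manager.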
Subramanian says LinkedIn developed Ambry because it could not find a storage solution that addressed horizontal scalability, availability and active-active data center configurations. As LinkedIn continues to scale out its data center environment, the need to move beyond a traditional file system in order to distribute objects became a critical IT requirement, he says.
Naturally, as many IT organizations think through the impact that video, for example, is having on their Web sites, a lot of research work is being done on various approaches to modernizing storage environments. The upside of Ambry is that not only does it have a price point that can’t be beat, but, just as importantly, it has already been battle tested by an IT organization accustomed to managing IT at a scale most other organizations will never see.