Who can resist a piece that purports to explain how an Internet giant handles its own technology? Not me. So ReadWrite pretty much had me hooked at the headline, “How to Make Data Services Scale like Google.”
The big reveal is a distributed approach to data centers, but it turns out that’s not really what the piece is about at all. It’s actually an interview with Florian Leibert, CEO and co-founder of Mesosphere, a startup that promises to commercialize the same scalable approach to data.
Mesosphere is based on Apache Mesos, an open source cluster manager. It’s the same technology Twitter used to get rid of its famous fail whale problem.
Another reason you might want to read up on Mesosphere: The company has raised $10.5 million to build a “Google-like scale” data center.
What makes ReadWrite’s piece unusual, though, is that it’s the only one I’ve seen that actually talks about what this means in terms of building data services.
Since it’s a Q&A, it’s naturally biased toward Mesosphere’s approach, which clearly does work to scale data services. Leibert explains how it differs from virtualization, which is typically how organizations have created data services in the past five years or so.
Leibert’s main criticism of virtual machines is that VMs let you manually run multiple small applications, or instances of an application, on big servers.
The thing is (thanks in part to Hadoop, I’m sure), more applications are now written for distributed systems from inception. That, he explains, makes VMs pointless:
“Rather than splitting up the applications onto multiple machines, we aggregate all the machines and present them to the application as a single pool of resources. It changes the way new applications are written, deployed and scaled and how existing applications are run, versus running multiple VMs. Our system aggregates your hardware into one pool.”
In some ways, the ReadWrite article is a hardware-heavy read, but it does delve into the software side of things. That’s a pretty important consideration, since normally shifting to a distributed approach might require significant code rewriting. Indeed, that was one of the first concerns when people started talking about distributed computing: What will we do with all these legacy apps?
Mesosphere’s answer to that is an open source orchestration layer called Marathon, which lets you manage the clusters via (drum roll, please) a REST API.
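For a rough sense of what that looks like in practice, here is a minimal Python sketch that builds (but does not send) a request asking Marathon to run three instances of a command. The ReadWrite piece doesn't go into endpoints, so treat the details as assumptions: the `/v2/apps` path and the `id`, `cmd`, `cpus`, `mem` and `instances` fields follow Marathon's documented app definition as I understand it, the host name is hypothetical, and you should check the docs for your Marathon version.

```python
import json
import urllib.request


def build_marathon_request(base_url, app_id, cmd, cpus, mem, instances):
    # App definition: what to run and how much of the pool each copy needs.
    app = {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem,  # megabytes per instance
        "instances": instances,
    }
    # Assumed endpoint: POST <base_url>/v2/apps with a JSON body.
    return urllib.request.Request(
        url=f"{base_url}/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host; nothing is sent until urlopen() is called.
req = build_marathon_request(
    "http://marathon.example.com:8080",
    "/hello",
    "python3 -m http.server 8000",
    cpus=0.5,
    mem=128,
    instances=3,
)
# urllib.request.urlopen(req) would submit it to a running Marathon.
```

Notice that the request names no servers at all. You say how many copies you want and how much CPU and memory each one needs, and placement is left to the cluster.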
Once again, we see how APIs can be useful for handling things that otherwise might require complex integration code or outright rewrites.