Any list of the top reasons why enterprises are still hesitant to trust the cloud with higher-level storage responsibilities is bound to lead off with security. The safety and reliability of stored data are nothing to be trifled with, after all.
Coming in a close second, however, is latency. Even if the firewall remains intact, mission-critical data is of little value if it cannot get to where it needs to be in a timely fashion. And with cloud-based data having to traverse any number of network architectures and infrastructures in order to show up on a screen somewhere, bottlenecks are a constant threat, even in the presence of state-of-the-art WAN optimization.
Still, a number of storage providers are hoping to speed things up through advanced data handling techniques, although results are mixed. Box's new Accelerator content delivery network, for example, aims to provide a global data network that improves upload speeds by a factor of 10. That means the service can take in a few hundred MB in a matter of minutes, as opposed to roughly an hour under a traditional architecture. The system uses a handful of data centers spread across the U.S., the U.K. and elsewhere, tied together by an intelligent network that streamlines connectivity between nodes.
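To put the factor-of-10 claim in concrete terms, here is a back-of-the-envelope calculation. It is only a sketch: the 300 MB file size and the 60-minute baseline are illustrative assumptions standing in for "a few hundred MB" and "an hour," and the function names are hypothetical rather than anything Box publishes.

```python
def effective_mbps(size_mb: float, minutes: float) -> float:
    """Throughput in megabits per second implied by moving size_mb megabytes in `minutes`."""
    return (size_mb * 8) / (minutes * 60)


def transfer_minutes(size_mb: float, throughput_mbps: float) -> float:
    """Minutes needed to move size_mb megabytes at throughput_mbps megabits per second."""
    return (size_mb * 8) / throughput_mbps / 60


SIZE_MB = 300            # assumed stand-in for "a few hundred MB"
BASELINE_MINUTES = 60    # assumed stand-in for "an hour"
SPEEDUP = 10             # the factor quoted for Accelerator

baseline_mbps = effective_mbps(SIZE_MB, BASELINE_MINUTES)
accelerated_minutes = transfer_minutes(SIZE_MB, baseline_mbps * SPEEDUP)

print(f"Implied baseline throughput: {baseline_mbps:.2f} Mbps")
print(f"Upload time at {SPEEDUP}x: {accelerated_minutes:.1f} minutes")
```

Under these assumptions, the hour-long upload corresponds to well under 1 Mbps of effective throughput, which suggests the gain comes from fixing a slow path rather than exceeding what a healthy link can already deliver.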
Meanwhile, Google is working on a new global distributed database platform called Spanner intended to maintain consistent data operations and transactions across widely divergent external infrastructure. According to a white paper uncovered by ZDNet, the platform allows data to be read and written across multiple data centers without falling victim to traffic bottlenecks. The goal is to build a worldwide replication service that maintains transaction consistency and allows users full control of where and how their data is being stored. Key elements of the platform include the TrueTime API and a fleet of GPS receivers and atomic clocks that allow data centers to maintain sync without having to constantly ping a centralized time source.
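The role of TrueTime can be illustrated with a small simulation. Google's published description of Spanner has the API return a time interval rather than a single instant, and has a writer wait out the clock uncertainty before a commit becomes visible. The sketch below is a toy model under those assumptions, not Google's code: `SimulatedTrueTime`, `epsilon_s` and `commit` are hypothetical names, the 4 ms uncertainty is arbitrary, and a local monotonic clock stands in for the GPS receivers and atomic clocks.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """A time reading with explicit uncertainty: the true time lies in [earliest, latest]."""
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Toy stand-in for a TrueTime-style clock (hypothetical, for illustration only)."""

    def __init__(self, epsilon_s: float = 0.004, clock=time.monotonic):
        self.epsilon_s = epsilon_s  # assumed clock uncertainty in seconds
        self._clock = clock

    def now(self) -> TTInterval:
        t = self._clock()
        return TTInterval(t - self.epsilon_s, t + self.epsilon_s)

    def after(self, timestamp: float) -> bool:
        """True only once `timestamp` has definitely passed."""
        return self.now().earliest > timestamp


def commit(tt: SimulatedTrueTime, poll_s: float = 0.001) -> float:
    """Pick a commit timestamp, then wait until it is guaranteed to be in the past."""
    commit_ts = tt.now().latest
    while not tt.after(commit_ts):   # the "commit wait", roughly 2 * epsilon
        time.sleep(poll_s)
    return commit_ts


if __name__ == "__main__":
    tt = SimulatedTrueTime()
    first = commit(tt)
    second = commit(tt)
    print("Timestamps are ordered:", second > first)
```

The point of the design is that ordering comes from bounded clock error rather than from a round trip to a central time source: the tighter the uncertainty, the shorter each writer has to wait.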
Proper synchronization is key to the concept of the distributed database. As David Barker, founder of colocation provider 4D Data Centres, points out, synchronized replication across multiple databases requires latency of less than 5 ms. This is easy enough to accomplish in a private network or virtual LAN, but public services are subject to the whims of public networks. Your best bet, then, is to employ dedicated leased lines or metro Ethernet directly to your provider. Of course, this will cost more, so you'll have to crunch the numbers to determine if the cloud is really an appropriate storage option.
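One way to crunch those numbers is to sample the latency to a provider and compare it against the 5 ms figure. The following is a minimal sketch, assuming the 5 ms limit is read as round-trip time and that TCP connection setup time is an acceptable proxy for network latency. The function names are hypothetical, and the host you probe would be whatever endpoint your provider exposes.

```python
import math
import socket
import time

SYNC_BUDGET_MS = 5.0  # the threshold cited above for synchronized replication


def tcp_rtt_ms(host: str, port: int = 443, timeout: float = 2.0) -> float:
    """Time one TCP connection setup to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def percentile(samples, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100.0 * len(ordered))
    return ordered[max(rank, 1) - 1]


def within_budget(samples_ms, budget_ms: float = SYNC_BUDGET_MS, pct: float = 99.0) -> bool:
    """True if the chosen percentile of the latency samples stays under the budget."""
    return percentile(samples_ms, pct) < budget_ms


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any real provider.
    leased_line = [1.2, 1.4, 1.3, 1.1]
    public_path = [1.2, 1.4, 9.0, 1.1]
    print("Leased line OK:", within_budget(leased_line))
    print("Public path OK:", within_budget(public_path))
```

Checking a high percentile rather than the average matters here, because a single slow round trip is enough to stall a synchronous write.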
Not all cloud storage applications require lightning speed, however. Amazon's aptly named Glacier service is designed from the ground up to provide slow, but cheap, storage for long-term archiving. Retrieving archived data typically takes several hours, but once in place the data can be held for as little as one cent per GB per month. The company boasts an eleven-nines (99.999999999 percent) durability rating, as well as SSL for data in transit and AES-256 encryption for data at rest. And for those needing faster throughput, there is always the new High I/O instance in the Elastic Compute Cloud (EC2), which uses SSDs for high-speed throughput.
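The archive figures are easy to turn into rough numbers. The sketch below assumes a flat rate of one cent per GB per month with no request, retrieval or early-deletion fees, and it treats eleven nines as an average annual per-object durability. The archive size and object count are made-up inputs, and the function names are hypothetical.

```python
PRICE_PER_GB_MONTH = 0.01        # "one cent per GB per month"
ANNUAL_LOSS_PROBABILITY = 1e-11  # complement of eleven-nines durability


def storage_cost(gb: float, months: int, price: float = PRICE_PER_GB_MONTH) -> float:
    """Storage-only cost in dollars; ignores retrieval and request fees."""
    return gb * months * price


def expected_annual_losses(objects: int, p_loss: float = ANNUAL_LOSS_PROBABILITY) -> float:
    """Average number of objects lost per year at the stated durability."""
    return objects * p_loss


if __name__ == "__main__":
    archive_gb = 10_000            # assumed 10 TB archive
    object_count = 1_000_000_000   # assumed one billion stored objects
    print(f"Yearly storage cost: ${storage_cost(archive_gb, 12):,.2f}")
    print(f"Expected objects lost per year: {expected_annual_losses(object_count):.2f}")
```

At that rate a 10 TB archive costs on the order of a hundred dollars a month to hold, so for archival workloads the trade-off is almost entirely about how long you can wait to get the data back.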
It's hard to imagine a cloud service providing the same level of performance as local infrastructure, but it is certainly within the realm of possibility to enable "good enough" service for the vast majority of applications currently in play.
Trust is very difficult to acquire and very easy to lose, however, so cloud providers have their work cut out for them to first show the kinds of latency and throughput that enterprises are looking for, and then maintain those levels in real-world data environments.