Now that it is clear GPUs will take on more and more responsibility for handling raw data in the enterprise, chip- and board-level designers have shifted focus from simply making it work to creating the architecture that best takes advantage of the GPU's enhanced capabilities.
The fact is, there is a wide variety of options when configuring a system architecture that ensures optimal data handling between GPUs and CPUs, and between GPUs themselves. The ideal solution, unfortunately, is likely to vary across different data environments, so it's probably not a bad idea to bone up on what some of the leading system designers are tinkering with at the moment.
A good place to start is a recent overview of Intel's Sandy Bridge and Ivy Bridge architectures by tech consultant Nebojsa Novakovic. Both lines feature integrated GPUs in four- to eight-core socket designs and external QPI v2 links capable of 8 gigatransfers per second. Novakovic has sketched out a number of design possibilities, including direct dual-QPI links between and among CPU and GPU cores, a multi-gig RAM module using a fast GPU or vector FP coprocessor, and a high-speed shared memory interconnect that can be used to pool thousands of nodes. It also helps to make sure the overall platform has the kinds of peripherals that can handle the configuration. It would be a shame to waste full CPU/GPU performance simply because a vendor wanted to save a few bucks on a lower-quality audio chip or Ethernet NIC.
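To put that 8 gigatransfers-per-second figure in bandwidth terms, a quick back-of-the-envelope calculation helps. The sketch below assumes a standard full-width QPI link, which carries 2 bytes of data per transfer in each direction; it is an illustration, not a figure from Novakovic's overview.

```python
# Back-of-the-envelope QPI v2 bandwidth estimate.
# Assumption: full-width QPI link, 2 bytes of data per transfer, per direction.
TRANSFER_RATE_GT_S = 8.0   # QPI v2: 8 gigatransfers per second
BYTES_PER_TRANSFER = 2     # full-width link payload, per direction

per_direction_gb_s = TRANSFER_RATE_GT_S * BYTES_PER_TRANSFER  # 16 GB/s
bidirectional_gb_s = 2 * per_direction_gb_s                   # 32 GB/s

print(f"Per direction: {per_direction_gb_s:.0f} GB/s")
print(f"Bidirectional: {bidirectional_gb_s:.0f} GB/s")
```

That works out to roughly 16 GB/s each way per link, which is the kind of headroom that makes Novakovic's dual-QPI and shared-memory-pool designs plausible in the first place.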
Anyone interested in using IOV technology on the motherboard should think twice about also deploying GPUs, SSDs, SAS/SATA controllers or other high-speed devices, according to David Greenfield of Strategic Technology Analytics. Or at the very least, recognize that you will very likely exceed the IOV capabilities of solutions like the XSigo/Mellanox combination found on the HP SL6500. In that case, you'll need to provision the very x16 slot that on-board IOV was supposed to replace.
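To see why a few high-speed devices can blow past a shared IOV fabric, it helps to tally the bandwidth. The sketch below is purely illustrative: the per-device demand figures and the 4 GB/s fabric budget are assumptions for the sake of the exercise, not benchmarks of any particular product, while the 8 GB/s figure is simply what a PCIe Gen2 x16 slot provides per direction (16 lanes at 500 MB/s each).

```python
# Rough I/O budgeting: does aggregate device demand exceed a shared IOV fabric?
# All per-device figures below are illustrative assumptions, not measurements.
PCIE_GEN2_X16_GB_S = 8.0  # 16 lanes x 500 MB/s per lane, per direction
IOV_FABRIC_GB_S = 4.0     # hypothetical shared-fabric budget for illustration

devices = {
    "GPU (compute offload)": 6.0,   # assumed peak demand, GB/s
    "SSD array": 1.5,
    "SAS/SATA controller": 1.2,
}

total_demand = sum(devices.values())
print(f"Aggregate demand: {total_demand:.1f} GB/s")
if total_demand > IOV_FABRIC_GB_S:
    print("Exceeds shared IOV fabric -- provision a dedicated x16 slot")
```

Even with generous rounding, a single compute GPU can saturate a fabric sized for ordinary virtualized I/O, which is exactly Greenfield's point.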
On a more macro level, it's becoming easier to incorporate GPUs into workstation and small-cluster applications. Solutions like NextIO's vCORE Express expansion system, which sports four Tesla M2070s, are capable of delivering full cluster performance at about one-tenth the cost of a CPU-only configuration. You also get a virtual PCIe platform capable of supporting high-speed, high-data applications like ray tracing, 3D cloud computing, data analytics and computer-aided engineering.
It's safe to say that GPUs are likely to find homes in a wide range of data settings, from small-job, low-GPU-count shops all the way to the largest supercomputer: a title just recently taken by China's Tianhe-1A CPU/GPU hybrid machine packed with more than 7,000 Tesla M2050s.
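For a sense of scale at the top end, multiplying out the GPU count gives the hybrid machine's GPU-side peak. The sketch assumes the commonly published Tianhe-1A figures of 7,168 Tesla M2050s at roughly 515 GFLOPS peak double precision per card; sustained Linpack performance is, of course, well below this raw peak.

```python
# GPU-side peak double-precision estimate for a large hybrid machine.
# Assumed figures from published Tianhe-1A specs: 7,168 Tesla M2050s,
# ~515 GFLOPS peak double precision per card.
NUM_GPUS = 7168
PEAK_DP_GFLOPS_PER_GPU = 515

gpu_peak_tflops = NUM_GPUS * PEAK_DP_GFLOPS_PER_GPU / 1000.0
print(f"GPU-side peak: {gpu_peak_tflops:.0f} TFLOPS "
      f"(~{gpu_peak_tflops / 1000:.1f} PFLOPS)")
```

That is on the order of 3.7 petaflops of peak double-precision throughput from the GPUs alone, before counting the machine's CPUs.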
Finding a way to accommodate their improved processing power and integrate them into CPU-dominated infrastructure will probably be one of the most crucial tasks of IT infrastructure management as the industry transitions to high-speed, high-data architectures. The trick is to plan for at least a hybrid GPU infrastructure now so you won't have to do a lot of retrofitting later.