Graphics processing units (GPUs) play a key role in artificial intelligence (AI) by accelerating the training of the machine learning and deep learning algorithms used to create an AI model. Given the cost of GPUs, most AI training today takes place in the cloud. But organizations that need to maintain independent control of their IT infrastructure are training AI models on systems deployed in an on-premises IT environment.
IBM this week announced IBM SpectrumAI with NVIDIA DGX, a reference architecture that pairs IBM Spectrum Scale, a software-defined scale-out file system running on all-flash storage, with the latest generation of GPUs from NVIDIA. IBM promises the combination will make it simpler to build AI models at scale. Existing IBM Spectrum Scale systems can be configured starting with a single IBM Elastic Storage Server (ESS) that supports a few NVIDIA DGX-1 servers, then scaled to a full rack of nine servers with 72 Tesla V100 Tensor Core GPUs, and on to multi-rack configurations.
Eric Herzog, chief marketing officer and vice president of global channels for IBM storage, says the scale-out file system developed by IBM is ideally suited for GPUs because it’s based on a parallel architecture that maximizes I/O throughput to the GPUs.
IT organizations looking to build AI models tend to underestimate the amount of data required to train them effectively. Most legacy storage systems were never designed to manage petabytes of data that keep growing as AI models continue to learn and new data sources are incorporated.
“AI models are iterative,” says Herzog. “They are always growing.”
NVIDIA and Intel are locked in a fierce battle for control over AI workloads. These days, most organizations rely on NVIDIA processors to train AI models, while the inference engines that run those models most commonly rely on Intel processors. NVIDIA has made it clear, however, that it intends to challenge Intel as part of a bid to dominate the AI processor space on an end-to-end basis. It may take a while for that battle to play out. But while everyone is arguing about processors, IT organizations would be well advised to pay some attention to the data storage system that will be needed to feed what one day soon might be hundreds of AI models.