But beyond the hyperbole, AI techniques such as machine learning are increasingly being put to use to solve numerous everyday business problems, from flagging up potential fraud in financial transactions to speech recognition for chatbots and image recognition.
The raw material for AI, whether machine learning or deep learning, is data – in vast quantities. Many organizations have petabytes of data collected from various sources, stored in data lakes or warehouses, which they can draw on to train and develop models. Randy Bean wrote in MIT Sloan in 2017 that AI is feeding on raw, not representative or sample, data in all its granularity, nuance and detail.
“Although many AI technologies have been in existence for several decades, only now are they able to take advantage of datasets of sufficient size to provide meaningful learning and results,” Bean wrote.
Processing all of this data requires some hefty compute capabilities, or as The Register recently noted, AI is making hardware important once again. This often calls for specialized accelerator components such as GPUs and ASICs, in addition to high-end server hardware with plenty of memory.
But the importance of storage can be overlooked, despite the obvious fact that if your compute systems are optimized to process data as fast as possible, then getting the data into those systems is going to become the bottleneck unless you build just the right combination of compute, memory, and storage.
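The bottleneck argument can be put as simple arithmetic. The sketch below is illustrative only: the function names and the throughput figures are assumptions, not measurements from any real system. It treats the pipeline as running at the speed of its slowest stage.

```python
def pipeline_throughput(storage_gb_per_s: float, compute_gb_per_s: float) -> float:
    """Sustained rate of the whole pipeline: the slower stage wins."""
    return min(storage_gb_per_s, compute_gb_per_s)


def accelerator_utilization(storage_gb_per_s: float, compute_gb_per_s: float) -> float:
    """Fraction of time the accelerators have data to work on, capped at 1.0."""
    if compute_gb_per_s <= 0:
        raise ValueError("compute rate must be positive")
    return min(1.0, storage_gb_per_s / compute_gb_per_s)


# Hypothetical figures: accelerators that can consume 8 GB/s,
# fed by storage that can only deliver 2 GB/s.
storage_rate = 2.0
compute_rate = 8.0

print(f"pipeline rate: {pipeline_throughput(storage_rate, compute_rate)} GB/s")
print(f"accelerator utilization: {accelerator_utilization(storage_rate, compute_rate):.0%}")
```

With these assumed numbers the expensive compute hardware sits idle three quarters of the time, which is the imbalance the paragraph above warns about.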
In the past, machine learning systems tended to rely upon traditional compute architectures and traditional storage. Today’s systems with their GPUs, FPGAs, and ASICs are able to process data much faster. Meanwhile, the data sets used for training have grown larger and larger over time. Not surprisingly, the answer has been to turn to flash-based storage to meet these challenges. Flash, with its combination of low latency and high throughput, is currently considered the best fit for AI storage, although a great deal also depends upon the way the storage subsystem is implemented.
In general, a disk array may have a latency of tens of milliseconds, while that of flash is typically in tens of microseconds, or about a thousand times faster. Flash chips can also be packed together much more densely than rotating drives, so that a petabyte of storage may take up just a single rack-mount enclosure. Flash also consumes less power, which can make a big difference in cost at large scale.
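To make the "thousand times faster" figure concrete, here is a rough sketch. It assumes "tens of" means exactly 10 in both cases and that requests are issued strictly one at a time. Both are simplifications chosen for illustration.

```python
def serial_requests_per_second(latency_seconds: float) -> float:
    """Requests per second when each request waits for the previous one to finish."""
    return 1.0 / latency_seconds


# Illustrative values only, standing in for "tens of milliseconds"
# and "tens of microseconds".
DISK_LATENCY_S = 10e-3   # 10 milliseconds
FLASH_LATENCY_S = 10e-6  # 10 microseconds

latency_ratio = DISK_LATENCY_S / FLASH_LATENCY_S

print(f"latency ratio: about {latency_ratio:.0f}x")
print(f"disk:  about {serial_requests_per_second(DISK_LATENCY_S):,.0f} requests/s")
print(f"flash: about {serial_requests_per_second(FLASH_LATENCY_S):,.0f} requests/s")
```

Real arrays overlap many requests, so absolute numbers will differ, but the ratio between the two media is what the comparison rests on.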
It is not just AI that calls for high performance storage – other demanding workloads can also stress the storage environment. But AI applications have their own quirks. Not only does there tend to be a large volume of unstructured data that has to be accessed quickly, but ML training workloads tend to follow an unpredictable access pattern, generating lots of reads and writes that may comprise both random and sequential accesses of varying sizes.
This is where flash excels, as it takes exactly the same time to read from one part of the chip as any other, unlike hard drives where the rotation of the disk surface and the time taken to move the read/write heads above the correct cylinder on the disk cause varying delays.
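The mechanical delays on a hard drive can be estimated with a simple model: the head must seek to the right cylinder, then wait on average half a rotation for the data to come round. The sketch below assumes a 7,200rpm drive and an 8ms average seek. Both figures are illustrative assumptions rather than values from this article.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the platter: half of one full rotation, in milliseconds."""
    ms_per_rotation = 60_000.0 / rpm
    return 0.5 * ms_per_rotation


def hdd_random_access_ms(avg_seek_ms: float, rpm: float) -> float:
    """Rough average random access time: head seek plus rotational wait."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)


# Assumed drive characteristics, for illustration only.
ASSUMED_RPM = 7200
ASSUMED_SEEK_MS = 8.0

print(f"rotational latency: {avg_rotational_latency_ms(ASSUMED_RPM):.2f} ms")
print(f"random access:      {hdd_random_access_ms(ASSUMED_SEEK_MS, ASSUMED_RPM):.2f} ms")
```

Sequential reads avoid most of the seek cost, which is why mixed random and sequential patterns produce such uneven response times on disk, while flash has no equivalent mechanical penalty.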
But it is only relatively recently that demand for flash has driven densities up and prices down to the point where it can start to compete against rotating hard drives at the kind of capacities that enterprises require from their primary storage environment.
As a technology, flash has been around for quite some time, having been invented by Toshiba in the early 1980s. It took a long time to make its way into the enterprise, as the high cost of flash memory chips meant that it used to cost tens of thousands of dollars to create a large enough pool of storage to be of any use.
Instead, flash demand was initially driven by consumer devices like smartphones and tablets, where the compact nature and low power requirements of flash made it the perfect solution.
Eventually, we started to see flash being used to accelerate critical workloads such as databases with high transaction rates, often using direct attached appliances providing all-flash storage to one or two servers, or internal flash accelerator cards inside individual server nodes.
All flash, disk discarded
As flash started to come down in price, it began to expand more into enterprise storage, appearing as a new performance tier above the level of the fastest enterprise hard disks. For the past several years, storage vendors have also started to offer all-flash array (AFA) products, ditching hard drives entirely.
The way flash storage is connected to the host system has also been undergoing an overhaul recently. It made sense at first to use existing interfaces such as SAS and SATA for compatibility, but eventually these proved to be a bottleneck to the raw throughput offered by flash. Some makers of high-end solid-state drives (SSDs) began to use the PCIe bus, which offers higher speed and connects directly to the processor, but there was no standard protocol stack to support this.
To solve this issue, the industry got together and developed NVMe (Non-Volatile Memory Express). A key feature of this is its support for multiple I/O queues – up to 65,535 – enabling flash storage to service multiple requests in parallel, thereby taking advantage of the internal parallelism of NAND storage devices.
Each queue can also support a queue depth of up to 65,535 commands, meaning that NVMe storage systems should be less prone to the performance degradation that SAS and SATA can experience if overloaded with requests.
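The scale of those limits is easier to see as arithmetic. The sketch below multiplies out the NVMe figures quoted above and compares them with a SATA/AHCI interface, which offers a single queue of 32 commands (a commonly cited figure that does not come from this article). It then applies Little's law (throughput = requests in flight / latency) as an idealized upper bound. The 100µs latency is an assumption, and real devices saturate long before the protocol limits are reached.

```python
# Limits as quoted in the text for NVMe.
NVME_MAX_QUEUES = 65_535
NVME_MAX_QUEUE_DEPTH = 65_535

# Commonly cited SATA/AHCI limits (not from the article).
AHCI_QUEUES = 1
AHCI_QUEUE_DEPTH = 32


def max_outstanding(queues: int, depth: int) -> int:
    """Upper bound on commands the protocol allows in flight at once."""
    return queues * depth


def littles_law_iops(outstanding: int, latency_seconds: float) -> float:
    """Idealized throughput: requests in flight divided by per-request latency."""
    return outstanding / latency_seconds


ASSUMED_LATENCY_S = 100e-6  # 100 microseconds, illustrative only

print(f"NVMe protocol limit: {max_outstanding(NVME_MAX_QUEUES, NVME_MAX_QUEUE_DEPTH):,} commands")
print(f"AHCI protocol limit: {max_outstanding(AHCI_QUEUES, AHCI_QUEUE_DEPTH):,} commands")
print(f"IOPS with 32 in flight: about {littles_law_iops(32, ASSUMED_LATENCY_S):,.0f}")
```

The point is less the headline number than the headroom: with so many independent queues, each CPU core can submit I/O without contending for one shared queue.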
In turn, NVMe has exposed the network and the storage I/O stack as new bottlenecks. Vendors are now overcoming this by running NVMe over Ethernet or another interconnect such as InfiniBand, and using Remote Direct Memory Access (RDMA) to transfer data from storage into the memory of the accessing server as speedily as if it were requesting data from a direct attached flash drive.
This is not the full story, as flash storage is still pricier than rotating disks in terms of the cost per gigabyte, a situation that is likely to continue for the foreseeable future. This means that while flash has the performance users want, the cost may become prohibitive beyond a certain level of capacity. For this reason, high performance computing (HPC) deployments often rely on a distributed file system and a mix of flash storage and high capacity hard drives to achieve both the performance and scale they require.
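The trade-off behind a mixed tier can be sketched with a blended-cost calculation. The prices below are in arbitrary cost units and are purely hypothetical, as are the function and parameter names. Only the relationship between them (flash costing more per gigabyte than disk) reflects the text.

```python
def blended_cost(capacity_gb: float, flash_fraction: float,
                 flash_cost_per_gb: float, disk_cost_per_gb: float) -> float:
    """Total cost of a pool where flash_fraction of capacity is flash, the rest disk."""
    if not 0.0 <= flash_fraction <= 1.0:
        raise ValueError("flash_fraction must be between 0 and 1")
    flash_gb = capacity_gb * flash_fraction
    disk_gb = capacity_gb - flash_gb
    return flash_gb * flash_cost_per_gb + disk_gb * disk_cost_per_gb


# Hypothetical prices in arbitrary cost units per gigabyte.
FLASH_COST = 0.5
DISK_COST = 0.25
CAPACITY_GB = 1_000_000  # roughly a petabyte

all_flash = blended_cost(CAPACITY_GB, 1.0, FLASH_COST, DISK_COST)
mixed = blended_cost(CAPACITY_GB, 0.25, FLASH_COST, DISK_COST)

print(f"all-flash:           {all_flash:,.0f} units")
print(f"25% flash, 75% disk: {mixed:,.0f} units")
```

Because the cost gap scales linearly with capacity, the saving from a mixed tier grows with the size of the pool, which is why large deployments tend to keep flash for the hot data and disk for the bulk.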
Meanwhile, new technologies are emerging, such as Intel and Micron’s 3D XPoint, which boasts lower latency and higher throughput than flash, but is also currently more expensive. This means it is unlikely to displace flash, but because 3D XPoint is byte-addressable, like DRAM, it can be fitted into a DIMM and used as an extra tier in the storage hierarchy between memory and flash.
Samsung has also delivered a faster form of flash, in the shape of Z-NAND. This shares the basic structure of the firm’s current V-NAND technology, but adds a new controller and other enhancements to deliver a low latency of 12-20µs for random reads and 16µs for random writes. Depending on cost, this may capture some of the high performance flash market for Samsung.
While flash is relatively pricey, though, challengers face the hurdle that they need large production volumes to bring their own pricing down to a competitive level.
In the meantime, flash looks set to remain the technology of choice for applications such as AI and analytics that call for high throughput and low latency. ®