Processing data in place can cut down on a lot of data movement, but the technology wasn’t there — until now.
The idea behind computational storage is not new. It’s just that like so many concepts, the idea has been well ahead of the technology.
In a nutshell, computational storage brings processing power to the storage level, eliminating the need to load data from the storage system into memory for processing. Moving data between storage and compute resources is inefficient, and computational systems, while growing rapidly, can't keep up with ever-expanding datasets.
Storage capacity is growing by leaps and bounds, but data processing remains unchanged — load the data from storage into memory, process it, and write out changes. The Storage Networking Industry Association (SNIA) has said, quite simply, "Storage architecture has remained mostly unchanged dating back to pre-tape and floppy."
There have been some workarounds, such as in-memory databases like SAP HANA, which reduce the need to move data to and from storage. Flash drives in arrays are able to bypass the conventional and slower SAS and SATA interfaces between drive and CPU by connecting the controller and flash storage directly to the host’s PCI Express bus.
But it’s not enough. As memory in servers reaches terabytes, datasets for big data/analytics, artificial intelligence and machine learning are hitting petabytes.
Computational storage puts processing power on the storage media itself, so data can be processed where it lies. This was impossible with mechanical drives, but it's doable with solid-state storage, and increasingly attractive because it costs more power to move data than to process it.
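The data-movement difference can be sketched in a few lines. This is a toy illustration, not a real device API: `device_filter` is a hypothetical stand-in for a predicate offloaded to the drive, and both functions simply count how many records cross the storage bus under each approach.

```python
# Toy illustration of the data-movement difference; not a real device API.
# Conventional path: move every record across the bus, then filter on the host.
def host_side_filter(storage_blocks, predicate):
    moved = 0
    matches = []
    for block in storage_blocks:          # every block crosses the storage bus
        moved += len(block)
        matches.extend(r for r in block if predicate(r))
    return matches, moved

# Computational-storage path: the predicate runs on the drive, so only
# matching records cross the bus. 'device_filter' is a hypothetical name.
def device_filter(storage_blocks, predicate):
    moved = 0
    matches = []
    for block in storage_blocks:          # filtering happens "on the drive"
        hits = [r for r in block if predicate(r)]
        moved += len(hits)                # only the hits are transferred
        matches.extend(hits)
    return matches, moved

blocks = [list(range(i, i + 100)) for i in range(0, 1000, 100)]
pred = lambda x: x % 97 == 0
host_result, host_moved = host_side_filter(blocks, pred)
dev_result, dev_moved = device_filter(blocks, pred)
assert host_result == dev_result
print(host_moved, dev_moved)   # 1000 vs. 11 records moved
```

The results are identical; only the traffic differs, which is the whole premise of the approach.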
In one example, research by the University of California, Irvine, and NGD Systems suggests that eight- or nine-fold performance gains and energy savings are possible, with most systems offering at least a 2.2X improvement.
For most users, computational storage falls well under the radar. In fact, they probably won’t even know it’s happening because this typically is abstracted away from what they see. But for a large data center, these kinds of approaches can save millions of dollars per year. The key is to make this as seamless as possible, or basically invisible to the end user.
“Software is abstracting the notion that there’s even a server out there,” said Steven Woo, technology fellow and distinguished inventor at Rambus. “All they need to know is there’s this infrastructure that supports a computing model, and it’s going to execute code when some conditions are met. It doesn’t matter where it’s going to execute that code. It only matters that when conditions are met, the code runs. That’s important, because the goal is increased utilization. So instead of renting a virtual machine and sometimes having it sit idle because you can’t type fast enough or you don’t have enough jobs to run on it, now you can launch this little piece of code to run somewhere.”
Fig. 1: The economics of moving data. Source: Rambus
Computational storage is one option for improving this efficiency, but it's certainly not the only one. However, it is starting to attract more attention.
Samsung and Xilinx validate the concept
Computational storage has been around a few years, if quietly. IBM has been shipping an FPGA-based device, the FlashCore Module (FCM), in its flash arrays since 2018. It does simple offload and acceleration of the storage stack, as well as compression of content.
Andy Wall, IBM Fellow and CTO of IBM Flash Storage, believes computational storage will continue to move to the mainstream. "In the case of the FCM, we're offloading something that is pretty time-consuming, and tough to deal with in software. And so the data reduction becomes easier," he says.
Last November, Samsung and Xilinx announced the SmartSSD computational storage drive (CSD), what they claimed was the industry's first adaptable computational storage platform. It combines a Samsung 2.5-inch U.2 SSD (single-port PCI Express Gen 3, starting capacity of 3.84TB) with a Xilinx Kintex UltraScale+ KU15P FPGA. In addition to the flash memory, each drive comes with 4GB of 2400MHz DDR4 memory.
Although it looks like a consumer device, the SmartSSD is anything but. It offers 2 million hours of mean time between failures (MTBF), random write speed of up to 800,000 I/O operations per second (IOPS), and random read speed of 110,000 IOPS.
Jamon Bowen, director of business development & product marketing in the Data Center Group at Xilinx, says there are basically two benefits that you get out of computational storage. First, you have an accelerator that’s optimized for the workload, rather than a general-purpose compute device. And second, by avoiding moving data, you avoid memory pollution, as well as the power and complexity that comes with moving things around.
Bowen added that in many computational scenarios, there are a fair number of things that need to happen in a determined amount of time. "By having basically a hardware solution to that, you can guarantee performance and have that guaranteed latency that's important where things are waiting on results," he says.
Basic use cases
But don't look at computational storage to do the heavy lifting right off the bat. Experts agree that it will start with offloading workloads and preparatory processing, both because of the limits of the available processing power and because people are still getting their feet wet with the idea of computational storage.
"You've got to figure out what things you want to offload first, and the people who figure that out first will start the market first," said Chris Tobias, senior director of Optane Solutions and Strategy at Intel. "You've got to figure out the simplest tasks first that are most storage-related in some sort. General-purpose compute on your storage is way off from that."
There is the potential for using it in databases with an existing technique called sharding, notes Jim Handy, principal analyst with Objective Analysis. Sharding breaks a database into pieces, each of which is then processed in parallel with the others.
"It's a way of speeding up how you run the database, but it ends up causing a lot of problems because you end up with things broken in different areas that need to be put back together again, which is called joining," Handy said. "They're things that are very, very easy to get wrong." Most major databases, both commercial and open source, support sharding in their own way. MySQL is notable for not supporting it natively.
However, he notes that when you break the database apart, you’re not actually modifying the data. Once it’s broken apart, you’re searching through it, and possibly sorting it, and if you sort the batches differently then you need to address that when joining the shards. Computational storage makes the sharding, searching, sorting, and joining much easier because it’s all in the same memory space, said Handy.
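Handy's shard/search/sort/join flow can be sketched briefly. In this hedged illustration the per-drive parallelism is simulated with a thread pool, each "drive" searches and sorts its own shard locally, and the merge step plays the role of the join he describes.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def process_shard(shard, predicate):
    # In a computational-storage system, each drive would run this locally
    # over the shard it holds: search (filter), then sort.
    return sorted(r for r in shard if predicate(r))

def sharded_query(records, num_shards, predicate):
    # Split the data into shards (round-robin here, for simplicity).
    shards = [records[i::num_shards] for i in range(num_shards)]
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(lambda s: process_shard(s, predicate), shards))
    # The "join": merge the independently sorted shard results.
    return list(heapq.merge(*partials))

data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
print(sharded_query(data, 3, lambda x: x % 2 == 0))  # [0, 2, 4, 6, 8]
```

Because every shard lives in the same memory space as the compute that processes it, the error-prone part — getting differently sorted batches back together — reduces to a straightforward merge.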
It also could help cut down on the amount of data gathered at the edge, said David Follett, founder and CEO of Lewis Rhodes Labs, developer of a neuromorphic processor for computational storage. He notes that every time a modern commercial jet flies, it generates an average of 4 terabytes of data, which gets downloaded at the end of the flight into a large database containing information from all planes traveling at any particular time around the world.
“It’s too much data,” Follett said. “But it’s not yet cost-effective in the broad sense of being able to monetize it. It’s great if you can collect the data. Then the challenge is how do we use this effectively.”
Computational storage at the edge provides a chance to whittle down unnecessary data and only send what is valuable to the main data center systems.
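A minimal sketch of that whittling-down, assuming a simple rule (forward anomalous samples, summarize the rest) stands in for whatever model the edge device would actually run — the function name and thresholds here are invented for illustration:

```python
# Hypothetical edge-side reduction: forward only out-of-range samples,
# plus a compact summary of everything else, instead of the raw stream.
def reduce_at_edge(samples, low, high):
    anomalies = [s for s in samples if s < low or s > high]
    normal = [s for s in samples if low <= s <= high]
    summary = {
        "count": len(normal),
        "min": min(normal) if normal else None,
        "max": max(normal) if normal else None,
        "mean": sum(normal) / len(normal) if normal else None,
    }
    return anomalies, summary   # this is all that leaves the edge

readings = [20.1, 20.3, 99.9, 20.2, 19.8, -5.0, 20.0]
anomalies, summary = reduce_at_edge(readings, 0.0, 50.0)
print(anomalies)            # [99.9, -5.0]
print(summary["count"])     # 5
```

Seven raw readings become two anomalies and one summary record — the same shape of reduction, scaled up, that turns terabytes per flight into something a central database can absorb.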
App rewrites in the cards?
The plus of computational storage is the elimination or reduction of data movement. But that kind of change in function could also mean a fundamental shift in how apps operate. Experts are split on whether enterprise apps such as Hadoop and Spark, or Oracle’s database, will need a rewrite.
"The only way that you can take advantage of this is to completely restructure your software," said Intel's Tobias. "What you're doing now is you're saying we have this piece of [an application] that the computational storage does a good job of. We're going to take that responsibility away from the server software, and then farm out multiple copies of this one piece to the SSDs, and this is where they're all going to execute that piece. Somebody's got to do the chopping up of the server software into the piece that goes into the SSDs."
Steve Fingerhut, president and chief business officer for Pliops, which has its own computational storage processor in the works, said it depends on the application. But it requires a significant change in architecture, to the point where you basically have to rewrite your application to move entire functions.
“If it requires something that is unique or a major change to the architecture, that’s going to create a lot of friction. And it’s going to be difficult for companies to embrace. Using an open specification, like a block interface, means that companies can port to it and know they have ultimate flexibility in the future. They’re not going to be locked into any one vendor,” he said.
But Fingerhut adds that commercial software applications may be slow to embrace this approach. “It’s the hyperscale data center operators who are buying into this, and those people are in complete control of their software and in complete control of their hardware.”
Handy also figures the big software vendors will probably take a wait-and-see approach. If computational storage catches on in a big way with their customer base, then they’ll support it. “But I don’t see them being the first mover to support computational storage. That would be the hyperscalers,” he said.
Hao Zhong, CEO of computational storage vendor ScaleFlux, said moving to computational storage all starts with the APIs, not the applications. He cites Nvidia's CUDA as the prime example of the right API and libraries for different GPU applications. Something similar needs to happen to make use of computational storage in databases and big data.
"What needs to happen is, once the computational storage has this standard API, they can easily partake of those frameworks. Database vendors can adopt those APIs and have two versions [of the database], which means they can use computational storage to greatly accelerate the workload. And meanwhile, if the computational storage is not available, their software can still run," he said.
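Zhong's two-version idea — use the accelerated path when the drive offers it, fall back to plain software when it doesn't — is the classic capability-probe pattern. A hedged sketch, where every class and method name is invented rather than taken from any real computational storage API:

```python
# Invented illustration of capability probing; no real computational
# storage library or API is being described here.
def software_count(data, predicate):
    # Plain host-side fallback path.
    return sum(1 for r in data if predicate(r))

class PlainDrive:
    """A drive with no compute capability."""

class ComputationalDrive:
    """A drive that can run a count offload locally."""
    def offload_count(self, data, predicate):
        return sum(1 for r in data if predicate(r))  # "runs on the drive"

def count_matching(drive, data, predicate):
    # The database ships one code path; acceleration is used only
    # when the capability is actually present.
    if hasattr(drive, "offload_count"):
        return drive.offload_count(data, predicate)
    return software_count(data, predicate)

data = list(range(100))
is_even = lambda x: x % 2 == 0
assert count_matching(PlainDrive(), data, is_even) == 50
assert count_matching(ComputationalDrive(), data, is_even) == 50
```

Both paths produce the same answer, which is what lets vendors adopt the accelerated version without breaking deployments that lack the hardware.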
Fighting the laws of physics
The tech industry is good at advancing technology, but it will face a real challenge in advancing computational storage due to inherent contradictions, notes Follett.
"Storage, fundamentally, likes to be dense, and it likes to be relatively cool. Computation likes to be dense and relatively hot. And these two things are pretty diametrically opposed. SSDs typically have thermal cutouts around 75° Celsius. So an SSD is going to throttle itself at 75° Celsius, which can happen when it's packed tightly into a U.2 with a decent-sized FPGA," he said.
The U.2 drive looks like a consumer SATA drive, meaning a plastic 2.5-inch enclosure. How do you put a heat sink on that?
“If you want to do anything computationally significant, you got to figure out what the heck you’re going to do with the heat,” he said. “You’ve got to get it out of there somehow. This has been a fundamental blocker. Where do you put the heat?”
Wall disagrees, saying you can cool SSDs in enterprise enclosures, including servers and all-flash arrays. “The NVMe spec limits U.2 to 25W, but all these servers and storage units provide cooling mechanisms to ensure that the drives stay under their temperature limits. We already have mechanisms for cooling our SSDs,” he said.
Bowen said the SmartSSD CSD is designed to run in a standard 25W power envelope for a U.2. “A big part of the joint design was ensuring this limit is met with both the SSD and FPGA portions working in tandem, like NAND using more power during writes. The FPGA power is based on the IP loaded and its clock rate,” he said.
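The tandem budgeting Bowen describes amounts to simple arithmetic: whatever the NAND draws in the worst case, the FPGA's clock must be set so the sum stays under the 25W U.2 cap. A toy first-order model — all numbers and coefficients here are invented for illustration, not SmartSSD specifications:

```python
# Toy power-budget model; every number below is invented for illustration.
U2_LIMIT_W = 25.0

def fpga_power(static_w, dynamic_w_per_mhz, clock_mhz):
    # First-order model: static leakage plus clock-proportional dynamic power.
    return static_w + dynamic_w_per_mhz * clock_mhz

def max_clock_mhz(nand_worst_case_w, static_w, dynamic_w_per_mhz):
    # Highest FPGA clock that keeps NAND + FPGA inside the U.2 envelope,
    # assuming worst-case NAND draw (e.g. during writes).
    headroom = U2_LIMIT_W - nand_worst_case_w - static_w
    return max(0.0, headroom / dynamic_w_per_mhz)

# Example: 12W worst-case NAND, 3W FPGA static, 0.04 W/MHz dynamic.
print(max_clock_mhz(12.0, 3.0, 0.04))  # 250.0
```

The point of the sketch is the dependency Bowen names: the sustainable FPGA clock is a function of the loaded IP's power coefficient and the NAND's worst-case draw, not a fixed number.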
He added that there is a specification, the Enterprise and Data Center SSD Form Factor (EDSFF), which anticipated these needs and supports power envelopes both lower (12.5W for E1.S thin) and higher (70W for E3 long, 2T) than U.2's. The spec is now managed by SNIA.
Bowen believes momentum is building for computational storage. In fact, there is a technical proposal within the NVMe Consortium to have compute added into the NVMe framework.
The consortium is going to use the standard NVMe driver to call the accelerator functions and transfer data between storage and devices, rather than having each individual computational storage vendor (Samsung, Pliops, ScaleFlux, etc.) do its own implementation.
Zhong also believes that adding computational storage to the NVMe standard will increase support for the concept, with a few major data centers starting the deployments in the next 12 to 18 months as the standard matures.