In short: GPUs have memory limitations when handling the demands of AI and HPC applications. There are ways around this bottleneck, but the solutions can be expensive and cumbersome. Now, a startup headquartered in Daejeon, South Korea, has developed a new approach: using PCIe-attached memory to expand capacity. Developing this solution required jumping through many technical hoops, and there are still challenges ahead. Chief among them: will AMD, Intel, and Nvidia support the technology?
Memory requirements stemming from the large datasets used in AI and HPC applications often overwhelm the memory built into a GPU. Expanding that memory has typically meant installing expensive high-bandwidth memory, which often requires changes to the existing GPU architecture or software.
One solution to this bottleneck comes from Panmnesia, a company backed by South Korea's KAIST research institute, which has introduced technology that lets GPUs access system memory directly through a Compute Express Link (CXL) interface. Essentially, it allows GPUs to use system memory as an extension of their own memory.
Called CXL GPU Image, this PCIe-attached memory has double-digit-nanosecond latency, which the company says is significantly faster than traditional SSDs.
Panmnesia had to overcome several technical challenges to develop this approach.
CXL is a protocol that runs on top of a PCIe link, but the technology has to be recognized by an ASIC and its subsystem. In other words, one can't simply add a CXL controller to the tech stack, because GPUs have no CXL logic fabric or subsystems that support DRAM and/or SSD endpoints.
In addition, GPU cache and memory subsystems don't recognize any expansion other than unified virtual memory (UVM), which isn't fast enough for AI or HPC. In Panmnesia's tests, UVM performed the worst across all the GPU kernels tested. CXL, by contrast, provided direct access to the expanded memory through load/store instructions, eliminating the issues that hamper UVM, such as the overhead of host runtime intervention during page faults and moving data at page granularity.
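To illustrate why page-level migration hurts sparse access patterns, here is a toy Python model. It is not Panmnesia's methodology and not a performance benchmark: it only counts host-runtime interventions and bytes moved, under assumed sizes (a 4 KiB migration granularity for the UVM-style path and a 64-byte cache line for the load/store path). The function names are hypothetical, real UVM drivers may migrate data at other granularities, and neither path models eviction.

```python
"""Toy comparison of page-level migration (UVM-style) versus cache-line
load/store access to expanded memory. All sizes are illustrative assumptions."""

PAGE_SIZE = 4096   # assumed migration granularity for the UVM-style path
CACHE_LINE = 64    # assumed size of one load/store transfer over CXL


def uvm_style(addresses):
    """Each first touch of a non-resident page faults to the host runtime,
    which migrates the whole page. Returns (host_interventions, bytes_moved)."""
    resident = set()
    faults = 0
    for addr in addresses:
        page = addr // PAGE_SIZE
        if page not in resident:
            faults += 1         # host runtime must step in to handle the fault
            resident.add(page)  # page now lives in GPU memory (no eviction modeled)
    return faults, faults * PAGE_SIZE


def load_store_style(addresses):
    """Each access to a line not yet cached is served by a direct load with no
    host involvement. Returns (host_interventions, bytes_moved)."""
    cached = set()
    for addr in addresses:
        cached.add(addr // CACHE_LINE)  # lines are never evicted in this toy model
    return 0, len(cached) * CACHE_LINE


if __name__ == "__main__":
    sparse = [i * PAGE_SIZE for i in range(1000)]  # one access per page
    dense = range(0, 8192, 8)                      # sequential scan of two pages
    print("sparse:", uvm_style(sparse), load_store_style(sparse))
    print("dense: ", uvm_style(dense), load_store_style(dense))
```

With one access per page across 1,000 pages, the UVM-style path takes 1,000 faults and moves 4,096,000 bytes, while the load/store path moves 64,000 bytes with no host involvement. For a dense sequential scan of two pages, both paths move 8,192 bytes, so under these assumptions the gap comes from sparse access plus the per-fault runtime overhead, which the model counts but does not time.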
Panmnesia's response was a set of hardware layers that support all the key CXL protocols, consolidated into a unified controller.
Its CXL 3.1-compliant root complex has multiple root ports that attach external memory over PCIe, plus a host bridge with a host-managed device memory (HDM) decoder that connects to the GPU's system bus and manages the system memory.
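To make the decoder's role more concrete, here is a minimal sketch of how an address decoder of this kind might route a GPU address to one of several root ports. It assumes simple power-of-two interleaving across ports; the `HdmDecoder` class, its fields, and the example values are hypothetical and do not describe Panmnesia's actual design or the exact register layout in the CXL specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HdmDecoder:
    """Hypothetical host-managed device memory (HDM) decoder: maps a window of
    the GPU's address space onto memory behind several CXL root ports."""
    base: int           # start of the window in the GPU's address space
    size: int           # window size in bytes
    granularity: int    # interleave granularity in bytes (assumed power of two)
    targets: List[int]  # root-port IDs; count assumed to be a power of two

    def __post_init__(self):
        ways = len(self.targets)
        if ways == 0 or ways & (ways - 1):
            raise ValueError("interleave ways must be a power of two")
        if self.granularity <= 0 or self.granularity & (self.granularity - 1):
            raise ValueError("granularity must be a power of two")

    def decode(self, addr: int) -> Tuple[int, int]:
        """Return (root_port_id, device_offset) for an address in the window."""
        if not (self.base <= addr < self.base + self.size):
            raise ValueError("address not claimed by this decoder")
        offset = addr - self.base
        ways = len(self.targets)
        chunk = offset // self.granularity  # which interleave chunk the address is in
        port = self.targets[chunk % ways]   # consecutive chunks rotate across ports
        # Offset inside the selected device: remove the port-selection component.
        dev_offset = (chunk // ways) * self.granularity + offset % self.granularity
        return port, dev_offset


if __name__ == "__main__":
    dec = HdmDecoder(base=0x1_0000_0000, size=1 << 30, granularity=256, targets=[0, 1])
    for a in (0x1_0000_0000, 0x1_0000_0100, 0x1_0000_0200, 0x1_0000_0310):
        print(hex(a), dec.decode(a))
```

With two ports and 256-byte granularity, consecutive 256-byte chunks alternate between port 0 and port 1, so `0x1_0000_0200` lands on port 0 at device offset 256. Interleaving like this spreads traffic across links; a real decoder would also handle non-power-of-two configurations and multiple decoder windows.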
Panmnesia also faces challenges beyond its control, a big one being that AMD and Nvidia would have to add CXL support to their GPUs. It's also possible that industry players decide they like the idea of PCIe-attached memory for GPUs and go on to develop their own technology.