In context: The first iteration of high-bandwidth memory (HBM) was somewhat limited, allowing speeds of only up to 128 GB/s per stack. There was also one major caveat: graphics cards that used HBM1 were capped at 4 GB of memory due to physical limitations. Over time, however, HBM manufacturers such as SK Hynix and Samsung have improved upon HBM’s shortcomings.
HBM2 doubled potential speeds to 256 GB/s per stack and raised the maximum capacity to 8 GB. In 2018, HBM2 received a minor update called HBM2E, which further increased the capacity limit to 24 GB and brought another speed increase, eventually hitting 460 GB/s per stack at its peak.
When HBM3 rolled out, the speed nearly doubled again, allowing for a maximum of 819 GB/s per stack. Even more impressive, capacities increased nearly threefold, from 24 GB to 64 GB. Like HBM2, HBM3 saw a mid-life upgrade, HBM3E, which increased theoretical speeds to as much as 1.2 TB/s per stack.
Along the way, HBM was gradually replaced in consumer-grade graphics cards by more affordable GDDR memory. High-bandwidth memory instead became a standard in data centers, with manufacturers of workplace-focused cards opting to use the much faster interface.
Throughout the various updates and improvements, HBM has retained the same 1024-bit (per stack) interface in all its iterations. According to a report out of Korea, this may finally change when HBM4 reaches the market. If the claims prove true, the memory interface will double from 1024-bit to 2048-bit.
Jumping to a 2048-bit interface could theoretically double transfer speeds again. Unfortunately, memory manufacturers might be unable to maintain the same transfer rates with HBM4 that they reach with HBM3E. Even so, a wider memory interface would allow manufacturers to use fewer stacks per card.
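The trade-off between interface width and transfer rate can be sketched with some quick arithmetic. The snippet below is only an illustration under a simplifying assumption: peak per-stack bandwidth is taken to be interface width multiplied by the per-pin data rate. The function names are made up for this example, and the per-pin rates are back-calculated from the bandwidth figures quoted in this article rather than taken from any datasheet.

```python
def stack_bandwidth_gbs(width_bits, pin_rate_gbps):
    """Peak bandwidth of one stack in GB/s.

    Assumption: every data pin runs at pin_rate_gbps (gigabits per second).
    """
    return width_bits * pin_rate_gbps / 8  # 8 bits per byte


def implied_pin_rate_gbps(bandwidth_gbs, width_bits):
    """Per-pin rate (Gb/s) implied by a quoted per-stack bandwidth."""
    return bandwidth_gbs * 8 / width_bits


# HBM3E as quoted in the article: roughly 1.2 TB/s over a 1024-bit interface.
hbm3e_pin_rate = implied_pin_rate_gbps(1200, 1024)

# Hypothetical 2048-bit stack that keeps the HBM3E per-pin rate: bandwidth doubles.
doubled_bandwidth = stack_bandwidth_gbs(2048, hbm3e_pin_rate)

# Per-pin rate a 2048-bit stack would need just to match HBM3E's bandwidth.
matching_pin_rate = implied_pin_rate_gbps(1200, 2048)

print(hbm3e_pin_rate, doubled_bandwidth, matching_pin_rate)
```

Under these assumptions, a 2048-bit stack could run its pins at half the HBM3E rate and still match today's per-stack bandwidth, which is why a wider interface remains attractive even if transfer rates drop.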
For instance, Nvidia’s flagship H100 AI card currently uses six 1024-bit known good stacked dies, which allows for a 6144-bit interface. If the memory interface doubled to 2048-bit, Nvidia could theoretically halve the number of stacks to three and achieve the same performance. Of course, it’s unclear which path manufacturers will take, as HBM4 is almost certainly years away from mass production.
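The stack-count arithmetic in this example is simple enough to check directly. The sketch below uses the article's figures (six 1024-bit stacks) and hypothetical helper names. It compares only total interface width, so "same performance" holds only if the per-pin transfer rate stays unchanged.

```python
def total_bus_width(stacks, bits_per_stack):
    """Aggregate memory interface width in bits."""
    return stacks * bits_per_stack


def stacks_needed(target_width_bits, bits_per_stack):
    """Smallest number of stacks that reaches the target width (ceiling division)."""
    return -(-target_width_bits // bits_per_stack)


# Figures as stated in the article: six stacks, 1024 bits each.
current_width = total_bus_width(6, 1024)

# Stacks required to reach the same total width with a 2048-bit interface.
hbm4_stacks = stacks_needed(current_width, 2048)

print(current_width, hbm4_stacks)
```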
Both SK Hynix and Samsung believe they will be able to achieve a “100% yield” with HBM4 when they begin to manufacture it. Only time will tell if the reports hold water, so take the news with a grain of salt.