AMD MI455X Could Combine HBM4 And LPDDR For Massive AI Memory Capacity

5 hours ago hothardware.com

Let's talk about transformers, dear readers. Not robots in disguise, but the neural network architecture that underpins basically every modern AI model. Transformers are smart, but they trade training efficiency for inference complexity: training processes a whole sequence in parallel, while generation produces one token at a time, and each new token has to attend to every token before it. To reduce the compute that transformer inference needs, we use a thing called a key-value cache, or KV cache. It stores the key and value vectors already computed for earlier tokens, which stay fixed once computed, so they don't have to be recomputed for every single output token as you otherwise would. The result is a radical speedup.
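
To make the idea concrete, here is a minimal sketch in plain Python, assuming a single attention head in a single layer, with token embeddings fed straight into tiny made-up weight matrices (`Wq`, `Wk`, `Wv`). The names `KVCache`, `decode_step`, and `decode_no_cache` are hypothetical and for illustration only; real models keep a separate cache per layer and per head. The cached path computes one new key and one new value per step and reuses the rest, while the uncached path recomputes them for the whole prefix, yet both produce the same attention output.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [dot(row, x) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over all stored keys/values.
    weights = softmax([dot(q, k) / math.sqrt(len(q)) for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

class KVCache:
    """Hypothetical per-head cache: one key and one value per past token."""
    def __init__(self):
        self.keys = []
        self.values = []

def decode_step(x, Wq, Wk, Wv, cache):
    # Only the newest token's key and value are computed; earlier ones are reused.
    cache.keys.append(matvec(Wk, x))
    cache.values.append(matvec(Wv, x))
    return attend(matvec(Wq, x), cache.keys, cache.values)

def decode_no_cache(prefix, Wq, Wk, Wv):
    # Without a cache, every key and value in the prefix is recomputed each step.
    keys = [matvec(Wk, x) for x in prefix]
    values = [matvec(Wv, x) for x in prefix]
    return attend(matvec(Wq, prefix[-1]), keys, values)

# Usage with toy 2-dimensional weights and embeddings (illustrative values only).
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[0.5, 0.2], [0.1, 0.3]]
Wv = [[1.0, -1.0], [0.0, 2.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cache = KVCache()
outputs = [decode_step(x, Wq, Wk, Wv, cache) for x in tokens]
```

The trade-off is visible in the data structure: the cached version does a constant amount of projection work per new token instead of work proportional to the prefix length, but it must hold every key and value it has ever produced, so memory grows with the context.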

Without a KV cache, transformers become virtually unusable. The catch is that the cache grows with every token of context, so it gets very large on modern bleeding-edge (or "frontier") AI models that support massive context windows and also come with extremely long system prompts. That means it takes up precious space in the very limited HBM available to each GPU.
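
A rough sense of scale helps. A common back-of-the-envelope estimate multiplies two tensors (keys and values) by the number of layers, key-value heads, head dimension, tokens, bytes per element, and concurrent sequences. The sketch below assumes 16-bit storage and a hypothetical model configuration (80 layers, 8 KV heads, head dimension 128) chosen purely for illustration; the function name `kv_cache_bytes` is likewise hypothetical, and it ignores framework overhead such as paging or padding.

```python
GIB = 1024 ** 3

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch_size=1):
    # Factor of 2: one key vector and one value vector per token,
    # per key-value head, per layer. bytes_per_elem=2 assumes FP16/BF16.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch_size)

# Hypothetical configuration, for illustration only.
per_sequence = kv_cache_bytes(num_layers=80, num_kv_heads=8,
                              head_dim=128, seq_len=131_072)
print(f"{per_sequence / GIB:.1f} GiB for one 131,072-token sequence")  # 40.0 GiB
```

Because the estimate scales linearly with both context length and batch size, serving several long-context requests at once multiplies that figure, which is how the KV cache ends up competing with the model weights for space in a GPU's HBM.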

NVIDIA's Vera Rubin servers solve this by using an incredibly high-bandwidth ...

Copyright of this story solely belongs to hothardware.com.
