Google’s new compression algorithm cut memory stocks within hours of publication
thenextweb.com
Google published a research blog post on Tuesday about a new compression algorithm for AI models. Within hours, memory stocks were falling. Micron dropped 3 per cent, Western Digital lost 4.7 per cent, and SanDisk fell 5.7 per cent, as investors recalculated how much physical memory the AI industry might actually need.
The algorithm is called TurboQuant, and it addresses one of the most expensive bottlenecks in running large language models: the key-value cache, a high-speed data store that holds context information so the model does not have to recompute it with every new token it generates. As models process longer inputs, the cache grows rapidly, consuming GPU memory that could otherwise be used to serve more users or run larger models. TurboQuant compresses the cache to just 3 bits per value, down from the standard 16, reducing its memory footprint by at least six times without, according ...
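Google has not published TurboQuant's implementation in the blog post summarized here, but the general idea of low-bit KV-cache quantization can be sketched with plain uniform quantization: each cached value is replaced by a 3-bit code (one of 2³ = 8 levels) plus a small per-channel scale and offset. The sketch below is illustrative only; `quantize_3bit` and its rounding scheme are assumptions for demonstration, not TurboQuant's actual method.

```python
# Illustrative sketch of 3-bit KV-cache quantization. This is generic
# per-channel uniform quantization, NOT the TurboQuant algorithm itself.
import numpy as np

def quantize_3bit(x: np.ndarray):
    """Quantize a (tokens, channels) cache slice to 3-bit codes per value."""
    lo = x.min(axis=0, keepdims=True)           # per-channel offset
    hi = x.max(axis=0, keepdims=True)           # per-channel maximum
    scale = (hi - lo) / 7.0                     # 8 levels -> 7 intervals
    scale = np.where(scale == 0, 1.0, scale)    # guard constant channels
    codes = np.clip(np.round((x - lo) / scale), 0, 7).astype(np.uint8)
    return codes, scale, lo

def dequantize_3bit(codes, scale, lo):
    """Reconstruct approximate float values from 3-bit codes."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
kv = rng.standard_normal((128, 64)).astype(np.float32)  # toy KV slice
codes, scale, lo = quantize_3bit(kv)
approx = dequantize_3bit(codes, scale, lo)
err = np.abs(kv - approx).max()  # bounded by half a quantization step
```

Storing 3-bit codes in place of 16-bit floats shrinks the codes themselves to 3/16 of the original size; the per-channel scale and offset add a small overhead on top. Real schemes layer further tricks (outlier handling, non-uniform levels) on this basic recipe.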
Copyright of this story solely belongs to thenextweb.com.

