Nvidia slaps $20B Groq tech into massive new LPX racks to speed AI response time


GTC Nvidia will use Groq's language processing units (LPUs), a technology it paid $20 billion for, to boost the inference performance of its newly announced Vera Rubin rack systems, CEO Jensen Huang revealed during his GTC keynote on Monday.

Using this technology, the GPU giant can now serve massive trillion-parameter large language models (LLMs) at hundreds or even thousands of tokens a second per user, Ian Buck, VP of Hyperscale and HPC at Nvidia, told press ahead of Huang's keynote on Sunday.

Until now, ultra-low latency inference has been dominated by a handful of boutique chip slingers like Cerebras, SambaNova, and of course, Groq, the latter of which Nvidia all but absorbed as part of an acquihire late last year.

Demand for these so-called premium tokens has grown over the past year. OpenAI is using Cerebras' dinner-plate-sized accelerators to achieve near-instantaneous code generation for models like ...

Copyright of this story solely belongs to theregister.co.uk.