Perplexity shows how to run monster AI models more efficiently on aging GPUs, AWS networks
AI search provider Perplexity's research wing has developed a set of software optimizations that allows trillion-parameter-class models to run efficiently across older, cheaper hardware using a variety of existing network technologies, including Amazon's proprietary Elastic Fabric Adapter.
These innovations, detailed in a paper published this week and released on GitHub for further scrutiny, present a novel approach to one of the biggest challenges in serving large mixture-of-experts (MoE) models at scale: memory and network latency.
Mo parameters, mo problems
MoE models, like DeepSeek V3 and R1 or Moonshot AI's Kimi K2, are big, ranging from 671 billion to 1 trillion parameters. That makes them too large to run at scale on eight-GPU systems built around older H100 or H200 GPUs. Sure, in some cases you might be able to fit the model weights, but you won't have enough ...
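
The memory arithmetic behind that squeeze is easy to sketch. The back-of-envelope Python below is an illustration, not something from Perplexity's paper or the article: it assumes FP8 weights at one byte per parameter, uses the published HBM capacities of the H100 (80 GiB) and H200 (141 GiB), and ignores KV cache, activations, and framework overhead, all of which only make the fit tighter.

```python
# Illustrative back-of-envelope check: do the weights of a large MoE model
# even fit in the aggregate HBM of an eight-GPU node of older accelerators?
# Assumes FP8 (1 byte per parameter); KV cache and activations are ignored.

GIB = 1024**3

models = {
    "DeepSeek V3/R1 (671B params)": 671e9,
    "Kimi K2 (~1T params)": 1e12,
}

nodes = {
    "8x H100 (80 GiB each)": 8 * 80 * GIB,
    "8x H200 (141 GiB each)": 8 * 141 * GIB,
}

for model, params in models.items():
    weight_bytes = params  # 1 byte per parameter at FP8
    for node, hbm_bytes in nodes.items():
        verdict = "fits" if weight_bytes <= hbm_bytes else "does NOT fit"
        print(f"{model}: {weight_bytes / GIB:,.0f} GiB of weights vs "
              f"{hbm_bytes / GIB:,.0f} GiB HBM on {node} -> {verdict}")
```

Run it and the picture matches the article's caveat: a 671B-parameter model's FP8 weights (~625 GiB) barely squeeze into an eight-H100 node's 640 GiB, leaving almost nothing for KV cache, while a ~1T-parameter model (~931 GiB) doesn't fit at all without H200s or more nodes.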

