Nvidia's new open-weights Nemotron 3 Super combines three different architectures to beat gpt-oss and Qwen in throughput
VentureBeat

Multi-agent systems, designed to handle long-horizon tasks like software engineering or cybersecurity triage, can generate up to 15 times the token volume of standard chats, threatening their cost-effectiveness in handling enterprise tasks.
Today, Nvidia moved to address this problem by releasing Nemotron 3 Super, a 120-billion-parameter hybrid model with weights posted on Hugging Face.
By merging disparate architectural philosophies (state-space models, transformers, and a novel "Latent" mixture-of-experts design), Nvidia is attempting to provide the specialized depth required for agentic workflows without the bloat typical of dense reasoning models. The model is available for commercial use with mostly open weights.
Triple hybrid architecture
At the core of Nemotron 3 Super is an architectural triad that balances memory efficiency with precision reasoning. The model uses a Hybrid Mamba-Transformer backbone, which interleaves Mamba-2 layers with strategically placed Transformer attention layers.
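To make the idea of interleaving concrete, here is a minimal Python sketch of how a hybrid layer schedule could be laid out. It is an illustration only: the function name `build_layer_schedule`, the layer count, and the one-in-four attention ratio are hypothetical and are not taken from Nvidia's published configuration.

```python
from collections import Counter


def build_layer_schedule(num_layers: int, attention_every: int) -> list[str]:
    """Return a list of layer types for a hybrid backbone.

    Every `attention_every`-th layer is an attention layer; all others are
    Mamba-2 layers. Both parameters are hypothetical, chosen for illustration.
    """
    if num_layers <= 0 or attention_every <= 0:
        raise ValueError("num_layers and attention_every must be positive")
    return [
        "attention" if (i + 1) % attention_every == 0 else "mamba2"
        for i in range(num_layers)
    ]


# Assumed toy configuration: 12 layers, attention at every 4th position.
schedule = build_layer_schedule(num_layers=12, attention_every=4)
counts = Counter(schedule)

print(schedule)
print(dict(counts))
```

In a layout like this, most layers are state-space layers, which carry a fixed-size recurrent state, while the few attention layers keep a key-value cache that grows with context length. That trade-off is one plausible reading of how such a backbone balances memory efficiency against precise recall.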
To understand the implications for enterprise production, consider the "needle ...
Copyright of this story belongs solely to VentureBeat.

