Inference-optimized chip 30% cheaper than any other AI silicon on the market today, Azure's Scott Guthrie claims


Microsoft on Monday unveiled a new in-house AI accelerator to rival Nvidia's Blackwell GPUs.

Fabbed on TSMC's N3 process node, Redmond's second-gen Maia 200 accelerator packs 144 billion transistors capable of churning out a collective 10 petaFLOPS of FP4 performance.

That puts the chip in direct contention with Nvidia's first-generation Blackwell GPUs, like the B200 — at least in terms of inference.

According to Scott Guthrie, EVP of cloud and AI at Microsoft, the chip has been "specifically optimized for inferencing very large models, including both reasoning and chain of thought."

Compared to training, inference is far more sensitive to memory bandwidth. For each token generated (think words or punctuation), the entirety of the model's active weights must be streamed from memory. Because of this, memory bandwidth puts an upper bound on how interactive a system can be, that is, how many tokens per second it can deliver to each user.
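To make that bound concrete, here is a minimal back-of-the-envelope sketch of the bandwidth roofline described above. The model size, precision, and bandwidth figures are illustrative assumptions, not Maia 200 or Blackwell specifications, and the calculation ignores compute limits and KV-cache traffic:

```python
# Decode roofline: each generated token requires streaming the model's
# active weights from memory, so the hard ceiling is
#   tokens/sec/user <= memory_bandwidth / bytes_per_token.

def max_tokens_per_second(
    active_params: float,       # parameters read per token (the full model for a dense net)
    bytes_per_param: float,     # 0.5 for FP4, 1 for FP8, 2 for FP16
    mem_bandwidth_gb_s: float,  # memory bandwidth in GB/s
) -> float:
    """Upper bound on single-user decode speed from memory bandwidth alone."""
    bytes_per_token = active_params * bytes_per_param
    return mem_bandwidth_gb_s * 1e9 / bytes_per_token

# Hypothetical example: a 70B-parameter dense model quantized to FP4,
# served from 8 TB/s of memory bandwidth (both numbers assumed).
print(f"{max_tokens_per_second(70e9, 0.5, 8000):.0f} tokens/sec/user ceiling")
```

Running this prints a ceiling of roughly 229 tokens per second per user; halving the bandwidth or doubling the weight precision halves that ceiling, which is why inference-focused silicon leans so heavily on fast memory and low-bit formats like FP4.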

