Why Stochastic Rounding is Essential for Modern Generative AI
Google Cloud Blog
In computing's early days of the 1940s, mathematicians discovered a flawed assumption about the behavior of round-off errors. Instead of canceling out, round-off errors in fixed-point arithmetic accumulated, compromising the accuracy of calculations. A few years later, "random round-off" was proposed: rather than always rounding to the nearest value, a number would round up or down with a probability proportional to the remainder.
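To make that rule concrete, here is a minimal sketch of random round-off in JAX. The function name and the integer-grid example are illustrative, not from the article; the same idea applies to rounding onto any low-precision grid.

```python
import jax
import jax.numpy as jnp

def stochastic_round(x, key):
    """Round each value up or down to the nearest integer, with the
    probability of rounding up equal to the fractional remainder:
    2.3 rounds up to 3 with probability 0.3 and down to 2 with 0.7."""
    floor = jnp.floor(x)
    remainder = x - floor                          # in [0, 1)
    go_up = jax.random.uniform(key, x.shape) < remainder
    return floor + go_up.astype(x.dtype)

key = jax.random.PRNGKey(0)
x = jnp.array([2.3, -1.75, 0.5])
print(stochastic_round(x, key))  # one possible draw: [2. -2.  1.]
```

Averaged over many draws, the rounded result converges to the original value, so the rounding error has zero mean rather than a systematic bias, which is exactly the property that prevents errors from accumulating.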
In today's age of generative AI, we face a new numerical challenge. To overcome memory bottlenecks, the industry is shifting to lower-precision formats like FP8 and emerging 4-bit standards. However, training in low precision is fragile: standard round-to-nearest destroys the tiny gradient updates that drive learning, causing model training to stagnate. That same technique from the 1950s, now known as stochastic rounding, allows us to train massive models without losing the signal. In this article, you'll learn how frameworks like JAX and Qwix apply this technique on modern Google Cloud hardware to ...
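The failure mode is easy to reproduce with a toy example. The sketch below stores a weight on a coarse grid as a stand-in for a low-precision format (the step size of 1/8 and the update size are assumptions for illustration, not FP8's actual spacing) and applies many gradient updates that are each smaller than half a grid step.

```python
import jax
import jax.numpy as jnp

STEP = 1.0 / 8.0  # grid spacing of the toy low-precision format (assumed)

def round_nearest(x):
    # Deterministic round-to-nearest onto the grid.
    return jnp.round(x / STEP) * STEP

def round_stochastic(x, key):
    # Stochastic rounding onto the same grid.
    scaled = x / STEP
    floor = jnp.floor(scaled)
    go_up = jax.random.uniform(key) < (scaled - floor)
    return (floor + go_up) * STEP

update = 0.01  # a tiny gradient step, smaller than half the grid spacing
w_nearest = w_stoch = jnp.float32(1.0)
key = jax.random.PRNGKey(42)
for _ in range(100):
    key, sub = jax.random.split(key)
    w_nearest = round_nearest(w_nearest + update)      # snaps back to 1.0 every time
    w_stoch = round_stochastic(w_stoch + update, sub)  # moves up on some draws

print(w_nearest)  # 1.0 -- all 100 updates were silently lost
print(w_stoch)    # roughly 2.0 on average (1.0 + 100 * 0.01)
```

With round-to-nearest the weight never moves, because each update is swallowed by the grid; with stochastic rounding the occasional upward steps recover the full update in expectation, which is why the learning signal survives.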

