Training Time Comparison: Multi-Token vs. Next-Token Prediction
This table (S5) quantifies the training time overhead of multi-token prediction relative to next-token prediction, demonstrating its computational efficiency across different LLM sizes.

This paper is available on arxiv under CC BY 4.0 DEED license.
Authors:
(1) Fabian Gloeckle, FAIR at Meta and CERMICS Ecole des Ponts ParisTech, equal contribution;
(2) Badr Youbi Idrissi, FAIR at Meta and LISN Université Paris-Saclay, equal contribution;
(3) Baptiste Rozière, FAIR at Meta;
(4) David Lopez-Paz, FAIR at Meta, last author;
(5) Gabriel Synnaeve, FAIR at Meta, last author.