Post-Training Eats Pretraining
The most important capability gains of the last year didn't come from bigger models. They came from longer, weirder, more deliberate post-training.
By The Memo · Friday, June 12, 2026 · 9 min read
If you read only the press releases, you'd think the story of the last twelve months was scale. It wasn't. The labs that pulled ahead did so on post-training: reinforcement learning from verifiable rewards, long-horizon RL with tool use, and synthetic data pipelines that now account for a meaningful fraction of total training cost in dollar terms.
The shift is quiet because labs don't disclose it. The compute bills don't lie, though, and they tell a clear story.
Reinforcement learning from verifiable rewards (RLVR) covers math, code, formal proofs, and any other domain where you can check the answer programmatically. It has become the dominant driver of reasoning gains. It is also embarrassingly compute-hungry. The labs that built RL infrastructure first now hold a compounding advantage: each RLVR run produces synthetic trajectories that bootstrap the next round. The compounding is real, and labs that started a year later cannot just buy their way to parity.
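To make "check the answer programmatically" concrete, here is a minimal sketch of what verifiable rewards can look like. It assumes a hypothetical answer format (the model ends math answers with `\boxed{...}`) and uses illustrative names (`math_reward`, `code_reward`, `keep_verified`) that are not any lab's actual pipeline. The filtering step is plain rejection sampling, shown as one plausible way verified trajectories could seed the next round; the essay does not say which method labs use.

```python
import re
from typing import Callable, Optional

# Hypothetical answer format: the model is asked to end with \boxed{...}.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a completion, if any."""
    matches = BOXED.findall(completion)
    return matches[-1].strip() if matches else None


def math_reward(completion: str, reference: str) -> float:
    """Binary verifiable reward: 1.0 on an exact match with the reference, else 0.0."""
    answer = extract_final_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


def code_reward(fn: Callable[..., object], tests: list[tuple[tuple, object]]) -> float:
    """Fraction of unit tests a candidate function passes; exceptions count as failures.

    Assumption: a real pipeline would compile and run model-written code in a sandbox;
    here the candidate is already a Python callable.
    """
    passed = 0
    for args, expected in tests:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash is simply a failed test
    return passed / len(tests) if tests else 0.0


def keep_verified(
    rollouts: list[tuple[str, str, str]], threshold: float = 1.0
) -> list[tuple[str, str]]:
    """Filter (prompt, completion, reference) rollouts to those the verifier accepts.

    The survivors are synthetic trajectories that could be reused as training data
    for the next round (simple rejection sampling, an illustrative choice).
    """
    return [
        (prompt, completion)
        for prompt, completion, reference in rollouts
        if math_reward(completion, reference) >= threshold
    ]


if __name__ == "__main__":
    rollouts = [
        ("What is 6 * 7?", "6 * 7 = 42, so the answer is \\boxed{42}", "42"),
        ("What is 6 * 7?", "I think it is \\boxed{48}", "42"),
        ("What is 2 + 2?", "No boxed answer here", "4"),
    ]
    print(keep_verified(rollouts))
    # → [('What is 6 * 7?', '6 * 7 = 42, so the answer is \\boxed{42}')]
    print(code_reward(lambda a, b: a + b, [((1, 2), 3), ((2, 2), 5)]))  # → 0.5
```

The key property is that the reward needs no human judge or learned reward model: a regex and an equality check, or a test suite, decide correctness. That is why RLVR scales with compute rather than with labeling budgets, and why it concentrates in domains like math and code where such checks exist.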
The Model Memo is a free daily newsletter.