Research

Post-Training Eats Pretraining

The most important capability gains of the last year didn't come from bigger models. They came from longer, weirder, more deliberate post-training.

By The Memo · Friday, June 12, 2026 · 9 min read

Photo: Trnava University / Unsplash

If you read only the press releases, you'd think the story of the last twelve months was scale. It wasn't. The labs that pulled ahead pulled ahead on post-training: reinforcement learning from verifiable rewards, long-horizon RL with tool use, and synthetic data pipelines that are now, in dollar terms, a meaningful fraction of total training cost.

The shift is quiet because labs don't disclose it. The compute bills don't lie, though, and the bills tell a clear story.

Reinforcement learning from verifiable rewards (RLVR), which covers math, code, formal proofs, and any other domain where the answer can be checked programmatically, has become the dominant driver of reasoning gains. It is also embarrassingly compute-hungry. The labs that built RL infrastructure first now have a compounding advantage: each generation of RLVR runs produces synthetic trajectories that bootstrap the next round, so labs that started a year later cannot just buy their way to parity.
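To make "check the answer programmatically" concrete, here is a minimal sketch of what a verifiable reward could look like for a math problem. It is an illustration under stated assumptions, not any lab's actual pipeline: the function names, the exact-match scoring rule, and the sample completions are all hypothetical.

```python
from fractions import Fraction
from typing import Optional


def parse_number(text: str) -> Optional[Fraction]:
    """Parse a final answer such as '3/4' or '0.75'; return None if it is not a number."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """Hypothetical reward: 1.0 if the answer equals the reference exactly, else 0.0."""
    got = parse_number(model_answer)
    want = parse_number(reference_answer)
    if got is None or want is None:
        return 0.0
    return 1.0 if got == want else 0.0


# Made-up completions for one prompt whose reference answer is 3/4.
samples = ["0.75", "3/4", "0.8", "three quarters"]
rewards = [verifiable_reward(s, "3/4") for s in samples]
print(rewards)  # [1.0, 1.0, 0.0, 0.0]
```

The point of the sketch is that the reward needs no human grader and no learned judge: a few lines of deterministic code decide whether a trajectory is kept, which is what lets this kind of training scale with compute.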
