The Gug Letter

*NEW* Multi Token Prediction Just Made Local Agents Running in vLLM ~3x Faster

A working, easy-to-follow guide to Multi Token Prediction (MTP) in vLLM, with real benchmarks on Qwen 3.6, Gemma 4, and DeepSeek V4, for anyone serving LOCAL large language models in 2026.

Joe Guglielmucci
May 12, 2026

If you are in any community on Twitter or any subreddit talking about building local AI, I can guarantee this letter will be one of the best places you could possibly start. It might even be all you need for MTP…
