Why Send a Simple Question to a Billion Dollar Data Center?

stanford's research confirms local ai 'been had been' bodying the cloud / intel per watt

Jul 01, 2026

The research confirming signal, intelligence per watt, go local.

In 2026, the entire AI industry has been racing toward one capital goal: build bigger data centers, buy more chips, burn more electricity. Aka hoard compute. Google’s cloud grew 1300% in the last year alone. (that will certainly die, but still…)

This Stanford research paper [Stanford Hazy Research paper], co-authored by Chris Ré and legendary chip architect John Hennessy, makes an argument that cuts against that entire industry strategy:

https://hazyresearch.stanford.edu/blog/2025-11-11-ipw

a huge share of what people actually use AI for doesn’t need any of that.


It can run on ordinary local hardware. Sounds a lot like my previous letters (go read these, I figured this would happen): [Is Your Junk Drawer a Supercomputer? [2025]], [Is Compute the New Oil? [2025]], and Rule #1: [VRAM > Unified Memory [2026]].

Historical precedent. Before personal computers existed, all computing lived on massive shared mainframes, because nothing smaller could do the job. That changed not because PCs became more powerful than mainframes, but because chip efficiency kept doubling, roughly every 1.5 years. Until…local machines became good enough to handle real work without the cost of a shared, giant system.

Once that threshold was crossed, computing power redistributed away from centralized mainframes permanently. Entire industries were built on that shift (Microsoft, Apple, Dell), aka the whole PC economy you know today, while the former mainframe business (IBM’s bread and butter) shrank into a niche.

The paper’s argument: AI compute is approaching that same threshold right now… It is a major claim, backed by data, about where economic value and infrastructure spending should be heading.

Most AI demand is simple, which can be a problem for the cloud buildout. Data cited in the paper shows 77% of everyday LLM requests are simple things: writing help, general advice, quick lookups. Not frontier-level reasoning.

Yet, the industry routes nearly all of this demand through the same expensive infrastructure built for the hardest 5% of problems.

To me, that is an efficiency mismatch at a massive scale. It means a large fraction of the capital being poured into new data centers may be overbuilt for the workload that is showing up in reality.

What the researchers actually tested. To check whether local AI could actually absorb simpler demand, they ran a large controlled test: over 1 million queries, across 20+ small AI models (Qwen3, Gemma, IBM Granite, etc.) and 8 types of chips: 3 consumer grade (I wonder if 3090s?) and 5 enterprise grade (probably 5 RTX 6000s).

What i think of when i think local ai haha cleaaaan tho

The results. Local models correctly handled 88.7% of everyday chat and reasoning questions, meaning most typical AI demand doesn’t structurally require a data center at all. Local model accuracy is up 3.1x from 2023 to 2025 (absolutely way more in 2026, which isn’t measured). A group of local models working together beat cloud-routed AI on 3 of 4 benchmarks. Local chips are still catching up in efficiency, about 1.5x behind for now. Local AI efficiency was shown to improve 5.3x in those same 2 years.

They also introduced a new measurement: Intelligence Per Watt (IPW), meaning how much correct output you get per watt of power used.
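To make the idea concrete, here is a minimal sketch of how you could compute an IPW-style score from your own runs. This is my reading of "correct output per watt" (accuracy divided by average power draw), not the paper's harness. The `QueryResult` record, its field names, and the sample numbers are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueryResult:
    # Hypothetical record for one benchmark query (not the paper's schema).
    correct: bool     # did the model answer correctly?
    energy_j: float   # energy drawn while answering, in joules
    seconds: float    # wall-clock time spent answering


def intelligence_per_watt(results):
    """Accuracy divided by average power draw in watts.

    Assumption: average power = total energy / total time.
    """
    if not results:
        raise ValueError("need at least one result")
    total_time = sum(r.seconds for r in results)
    if total_time <= 0:
        raise ValueError("total time must be positive")
    accuracy = sum(r.correct for r in results) / len(results)
    avg_power_w = sum(r.energy_j for r in results) / total_time
    return accuracy / avg_power_w


# Made-up numbers: 3 of 4 correct, 480 J over 16 s, so 30 W average.
runs = [
    QueryResult(True, 120.0, 4.0),
    QueryResult(True, 90.0, 3.0),
    QueryResult(False, 150.0, 5.0),
    QueryResult(True, 120.0, 4.0),
]
ipw = intelligence_per_watt(runs)
```

With these sample runs the score is 0.75 accuracy over 30 W. The absolute number means little on its own; it only becomes useful when you compare the same query set across two machines or two models.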

Clearly indicating a deliberate shift in what the industry should be optimizing for…

This is a bigger story than just devices. If a large share of AI workloads can shift to local/edge hardware, the implications go well past consumer gadgets:

Capex risk. Hundreds of billions in planned data center spending assumes demand keeps centralizing. If a meaningful chunk of demand redistributes to local instead, that changes the ROI math for hyperscalers.

Energy and grid strain. Every workload that runs locally instead of in a data center is one less strain on power grids that are already being stretched by ai buildouts. A policy and infrastructure issue, not a hypothetical.

Competitive landscape shift. This is good news for chipmakers focused on efficient local silicon (Apple, Qualcomm, AMD) relative to the current “sell more giant GPUs to data centers” model that NVIDIA has ridden. It’s the mainframe to PC story with new players in the mainframe and PC roles.

AWS's Project Rainier: the world's most powerful computer for... — Local ai could shrink this: AWS Rainier

Business model pressure. Companies charging per API call for cloud AI face losing entire business models if good-enough local alternatives are free to run once you own the hardware. People will not spend $3k+ a year for intelligence if they can build a PC that does it for them.
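Here is a rough back-of-the-envelope sketch of that trade-off. The $3k a year is the figure from this letter; the hardware price and the monthly electricity cost are placeholder assumptions, so swap in your own numbers. The function name is mine.

```python
def breakeven_months(hardware_cost, monthly_subscription, monthly_power_cost=0.0):
    """Months until a one-time hardware purchase beats a recurring bill.

    Returns None when running locally never pays off
    (power cost is at least as high as the subscription).
    """
    monthly_saving = monthly_subscription - monthly_power_cost
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


# $3,000 a year is $250 a month (figure from the letter).
# The $2,000 build and $10 a month of electricity are assumptions.
months = breakeven_months(
    hardware_cost=2000.0,
    monthly_subscription=3000.0 / 12,
    monthly_power_cost=10.0,
)
```

Under those assumed numbers the build pays for itself in a bit over eight months. The sketch ignores resale value, your time, and any quality gap between local and cloud models.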

The paper’s data stops in 2025. Imagine if this were released now, after Hermes got NVIDIA and Linux support, plus models like GLM 5.2 and the king of 3090s, Qwen 3.6. Every point would hit 10x harder.

These are the things I am interested to see the paper expound on after 2026 is recorded history:

New local chips keep shipping with more on-device AI throughput per watt, so the hardware is already adapting to the paper’s argument in a way.

Small open models keep closing the gap with frontier models faster than frontier models can even pull away anymore. Eventually…it won’t even be a measurement, or a thought, just run it local.

Facts. Some reasonable local unified memory machines (unimems) can already run models approaching 200 billion parameters. I say that to say: VRAM is laughing at those numbers. If unimem can pull it off, your GPU can scrape out more. A smartphone NPU can run a 7B model at 8+ tok/s, which is actually usable real-time AI, on a phone chip. The research had direct collaboration and input from SambaNova, Ollama, Snorkel AI, IBM, OpenRouter, AMD, and TogetherAI, aka actual infrastructure and chip companies, not just a university lab.
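For a feel of why those parameter counts fit on local boxes at all, here is a small sketch of the usual weights-only estimate: parameters times bits per weight, divided by 8 to get bytes. The quantization levels below are my assumptions, and the estimate leaves out the KV cache, activations, and runtime overhead, so real usage runs higher.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and framework overhead.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed quantization levels, for illustration only.
big_4bit = weight_memory_gb(200, 4)     # 200B model at 4 bits per weight
big_16bit = weight_memory_gb(200, 16)   # same model at 16 bits per weight
phone_4bit = weight_memory_gb(7, 4)     # 7B model at 4 bits per weight
```

By this estimate a 200B model needs about 100 GB for weights at 4-bit and about 400 GB at 16-bit, while a 7B model at 4-bit is about 3.5 GB. That is why quantization, not raw parameter count, decides what fits on a given machine.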

The team also shipped a tool: an open-source, hardware-agnostic benchmarking harness, so anyone can measure their own “intelligence per watt.” They are saying, hey…it isn’t just theory…

Can’t wait to watch these cloud companies try to cope with this in real time haha..

So, if it’s already the case that over 80% of AI usage could be served by a local model on older hardware, it doesn’t make a lot of sense to keep building data centers everywhere, because they can’t all operate in the green with this kind of progression.

Don’t send the simple prompt to a billion dollar data center if your old laptop would be happy to do it and keep your data local. If you fall outside of the 88% (like myself and most builders), it’s still a signal to go local instead of subscriptions.

What do you think?

God-Willing, see you at the next letter

GRACE & PEACE

VISIT JoeGuglielmucci.com TODAY
