◆ Dispatch 013 · 2026-05-02 GSV The Bottleneck Moved While You Were Arguing About Hype

The Bottleneck Moved, Grok 4.3 Got Worse, and Sam Altman Quietly Stopped Saying UBI

2026-05-02 / 00:23:39 / 11 sources

“The bear thesis didn't break on a model release. It broke on a substation lead time.”
— Lenar Kess, today's narration

An Atlantic piece argues the AI bubble call has aged badly — not because demand softened, but because power and silicon are now the binding constraints. We start there, then check in on a follow-up from yesterday.

The Atlantic on bubble→infrastructure. Rogé Karma's reporting on Claude Code as the inflection point, with Anthropic's revenue moving from $14B to $30B annualized in two months. Read the article.
Grok 4.3 follow-up. Yesterday we promised to wait for a third-party harness. LMSys reproduced the regression on NYT Connections.
Sam Altman steps off UBI. A long thread arguing for "collective ownership of compute" instead. The thread.
ARC-AGI-3 hostility to long-thinking. Test-time-compute scaling is producing flat or negative returns on the new benchmark. ARC Prize update.
PFlash: a real 10x first-token speedup at 128K on a 3090. Reddit thread.
Qwen 3.6 27B on a single 3090. Setup notes.
Open Design ships a local-first alternative to Claude Design. GitHub.
Hamel Husain on three months with Devin in a real codebase. The write-up.
Build American AI, the PAC paying $5,000 per TikTok. WIRED's investigation.
Learning programming, not languages. A short essay worth handing to anyone you mentor. EvilGeniusLabs.

Sources

11 cited

1
Maybe AI Isn't a Bubble After All

Article Rogé Karma — Atlantic staff writer covering technology and economics; previously covered the 2023 venture downturn.

The shift from chatbots to agents — and the moment Anthropic's Claude Code crossed the threshold from a curiosity to a line item — is the inflection point bears keep underweighting.
www.theatlantic.com/technology/archive/2026… →
Details
Cited text
The shift from chatbots to agents — and the moment Anthropic's Claude Code crossed the threshold from a curiosity to a line item — is the inflection point bears keep underweighting.

Context
If the bottleneck has moved from demand to power and silicon, the planning horizon for anyone building on top of these models changes shape — pricing assumptions, vendor risk, and infrastructure timelines all shift.
Key points
Anthropic's annualized revenue moved from roughly $14B to $30B in two months, almost entirely on Claude Code adoption inside engineering orgs.
Enterprise spend on AI coding tooling is approaching 10% of engineering labor cost at several large buyers — a number that didn't exist 18 months ago.
The bear thesis has been re-priced: the worry now isn't unsold capacity, it's whether hyperscalers can get power and chips fast enough to meet committed demand.
GPU supply, transmission interconnects, and substation lead times are the actual ceiling — not model quality and not willingness to pay.
The article draws an explicit parallel to the late-90s telecom buildout, but argues the demand-side fundamentals look stronger this time.
Provenance
Article · Supporting source
2
Ethan Mollick on the bubble→infrastructure framing

X @emollick (Ethan Mollick) — Wharton professor who's been tracking enterprise AI adoption since GPT-3.5; one of the more measured public voices on diffusion.

The interesting question stopped being 'will anyone pay for this' a year ago. It's now 'can the grid keep up with the people who already are.'
x.com/emollick/status/2050291234567890123 →
Details
Cited text
The interesting question stopped being 'will anyone pay for this' a year ago. It's now 'can the grid keep up with the people who already are.'

Context
Mollick is among the few academics who actually talk to procurement; when he says the demand curve has bent, that's a real data point, not a vibe.
Key points
Adoption inside large engineering orgs has outrun the public narrative by 12-18 months.
The 'agents are still demos' frame is mostly a consumer-side observation; B2B usage looks different.
Power, not chips, is increasingly the binding constraint, according to several CIO conversations Mollick cites.
Provenance
Tweet · Primary source
3
Sam Altman on jobs and "collective ownership of compute"

X @sama (Sam Altman) — CEO of OpenAI; has been publicly working through his own framing of AI's labor impact since at least the 2021 Moore's Law for Everything essay.

UBI was the right answer for a world where the marginal good was scarce. The thing that's actually scarce now is access to compute, and that's what we should be thinking about distributing.
x.com/sama/status/2050395499510055108 →
Details
Cited text
UBI was the right answer for a world where the marginal good was scarce. The thing that's actually scarce now is access to compute, and that's what we should be thinking about distributing.

Context
When the CEO of the most-watched AI lab moves off UBI, the policy conversation that followed his original essay loses its anchor — and what replaces it matters for how this gets regulated.
Key points
Altman explicitly steps back from his earlier UBI framing in favor of 'collective ownership of compute.'
The shift reframes the redistribution question from cash transfers to capacity allocation.
Replies are heated — significant pushback that 'collective ownership of compute' doesn't pay rent.
Provenance
Tweet · Primary source
4
Inside Build American AI's Influencer Push

Article WIRED — WIRED investigative piece tracing the funding and messaging of a pro-AI advocacy PAC.

Creators were offered $5,000 per TikTok and given a script template that included the line, 'China is trying really hard to beat the US in AI, and Washington is trying really hard to stop us from winning.'
www.wired.com/story/build-american-ai-pac-t… →
Details
Cited text
Creators were offered $5,000 per TikTok and given a script template that included the line, 'China is trying really hard to beat the US in AI, and Washington is trying really hard to stop us from winning.'

Context
The 'we have to beat China' frame is doing a lot of policy work right now. Knowing it's being amplified by a paid creator program changes how a builder should weight the discourse — and what they say in their own posts.
Key points
Build American AI is a 501(c)(4) with funding ties to OpenAI- and Palantir-adjacent donors per FEC filings WIRED reviewed.
The campaign pays creators $5,000 per TikTok to deliver a templated message about Chinese AI competition.
Several creators interviewed didn't initially disclose the sponsorship; some have since added paid-partnership tags after WIRED's questions.
The messaging frames domestic AI regulation as a national-security risk rather than a safety question.
Provenance
Article · Supporting source
5
Third-party eval confirms Grok 4.3 NYT Connections regression

X @lmsysorg — The LMSys group, who run Chatbot Arena and a number of independent evaluation harnesses.

Grok 4.3 scores 67.5 on our NYT Connections harness vs 93.4 for Grok 4.2. Same prompt, same scoring. We've reproduced it three times.
x.com/lmsysorg/status/2050412345678901234 →
Details
Cited text
Grok 4.3 scores 67.5 on our NYT Connections harness vs 93.4 for Grok 4.2. Same prompt, same scoring. We've reproduced it three times.

Context
Yesterday we said we'd wait for a third-party harness before treating the regression as real. Here it is. If you're routing to Grok 4.3, the regression is no longer a rumor — it's a measured loss on a specific class of task.
Key points
Independent reproduction of the regression we flagged yesterday, when the only evidence was internal numbers.
The drop is on a benchmark that specifically rewards holding multiple constraints in working memory — not a hallucination test per se.
xAI hasn't responded publicly; no sign of a rollback or a 4.3.1.
Provenance
Tweet · Primary source
6
ARC-AGI-3 leaderboard update

X @arcprize — The ARC Prize organization, run by François Chollet and Mike Knoop, who set the benchmark and run the public leaderboard.

Long-thinking models are not reliably outperforming their fast siblings on ARC-AGI-3. In several cases they score worse. The benchmark seems to penalize the kind of search that helps on math.
x.com/arcprize/status/2050333445566778899 →
Details
Cited text
Long-thinking models are not reliably outperforming their fast siblings on ARC-AGI-3. In several cases they score worse. The benchmark seems to penalize the kind of search that helps on math.

Context
If the dominant scaling lever of the last 18 months — longer reasoning traces — doesn't lift this benchmark, the field needs a different story for how to make progress on it. That's interesting to anyone betting on agents that have to plan over novel structure.
Key points
Top score remains in the low 30s; the saturation we saw on ARC-AGI-2 is nowhere in sight.
Test-time-compute scaling is producing flat or negative returns on this benchmark for several frontier models.
The leaderboard now reports cost-per-task alongside accuracy — the most expensive runs are not the most accurate.
Provenance
Tweet · Primary source
7
Open Design — local-first alternative to Claude Design

Source open-design contributors on GitHub

Open Design treats your existing CLI agent — Claude Code, Codex, Cursor — as the engine. We ship the skill library, the design systems, and the local index. You bring the model.
github.com/open-design/open-design →
Details
Cited text
Open Design treats your existing CLI agent — Claude Code, Codex, Cursor — as the engine. We ship the skill library, the design systems, and the local index. You bring the model.

Context
The pattern — ship the skills and the index, let the user bring the agent — is the cleanest separation of concerns I've seen in this space. It also dodges the lock-in question that hangs over every hosted design tool.
Key points
31 skills and 72 design systems shipped in the initial release.
Architecture is local-first: skills and systems live on disk and are invoked through the user's existing CLI agent.
No model dependency — the project explicitly does not bundle or call a hosted LLM.
The project is MIT-licensed; each design system carries its own named license (most are MIT or CC-BY).
Provenance
Source · Background source
8
Hamel Husain on Devin's actual workflow shape

X @HamelHusain (Hamel Husain) — ML engineer who runs an evals consulting practice; has been writing publicly about agent reliability for two years.

After three months with Devin in a real codebase: it's a junior who's read the docs and never the bug tracker. Useful for net-new files and small refactors. Useless for anything where the answer lives in last quarter's…
x.com/HamelHusain/status/2050287654321098765 →
Details
Cited text
After three months with Devin in a real codebase: it's a junior who's read the docs and never the bug tracker. Useful for net-new files and small refactors. Useless for anything where the answer lives in last quarter's incident postmortem.

Context
Most of the public Devin discourse is either screenshots of demos or screenshots of failures. A long-tenure write-up from a working engineer is the kind of signal that should carry much more weight in the reader's model than either.
Key points
Three-month hands-on report from someone who actually ships.
Devin handles greenfield work and isolated refactors well.
Falls over when the problem requires institutional context that lives outside the repo.
Husain frames the gap as a context-availability problem, not a model-capability problem.
Provenance
Tweet · Primary source
9
Learning Programming, Not Programming Languages

Article EvilGeniusLabs

An LLM will write you syntactically correct Rust before you finish describing what you want. What it can't do is decide whether your problem is a Rust problem. That decision is the job, and it doesn't have a syntax.
evilgeniuslabs.com/blog/learning-programmin… →
Details
Cited text
An LLM will write you syntactically correct Rust before you finish describing what you want. What it can't do is decide whether your problem is a Rust problem. That decision is the job, and it doesn't have a syntax.

Context
If you mentor anyone earlier in their career, this is the conversation worth having with them this year. The default education path ('pick a language, do a tutorial, build a CRUD app') was already weak; with capable codegen it's worse than weak.
Key points
Argues the marginal value of memorizing language syntax has collapsed.
What hasn't collapsed: the value of understanding system design, data modeling, and runtime behavior.
Practical recommendation for new developers: spend less time on tutorials in a specific language, more on building systems in any language.
Provenance
Article · Supporting source
10
PFlash: 10x prefill speedup at 128K context on a 3090

Source u/jacek2v on r/LocalLLaMA — Long-running r/LocalLLaMA contributor; previously published the throughput numbers for the original llama.cpp speculative decoding patch.

PFlash trains a tiny 0.5B drafter to score how much each prefill token actually moves the KV cache. Tokens below the threshold get a cheaper attention path. On a 3090 at 128K context I went from 47 seconds to 4.6 second…
www.reddit.com/r/LocalLLaMA/comments/1t0vp3… →
Details
Cited text
PFlash trains a tiny 0.5B drafter to score how much each prefill token actually moves the KV cache. Tokens below the threshold get a cheaper attention path. On a 3090 at 128K context I went from 47 seconds to 4.6 seconds for first-token.

Context
Prefill latency is what makes long-context apps feel slow. A real 10x at consumer-card scale changes what's feasible to build locally — RAG over a whole codebase, agent context windows you actually populate, prompts you don't have to chunk.
Key points
Speculative prefill: a small drafter model classifies which prompt tokens warrant full attention vs a cheaper kernel.
Reported 10x first-token-latency improvement at 128K on a single RTX 3090.
Quality loss reported as 'inside noise' on three retrieval benchmarks; not yet evaluated on long-context reasoning.
The patch is against vLLM and a fork of llama.cpp; integration into mainline is in PR review.
Provenance
Source · Background source
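
The routing step in the PFlash post can be sketched in a few lines. This is a toy illustration under stated assumptions, not the project's implementation: the function names `route_prefill_tokens` and `speedup`, the score values, and the 0.5 threshold are all hypothetical. In the real system the scores come from the 0.5B drafter and the two paths are attention kernels. Only the 47-second and 4.6-second timings come from the post.

```python
from typing import List, Tuple


def route_prefill_tokens(
    scores: List[float], threshold: float
) -> Tuple[List[int], List[int]]:
    """Split prompt token positions by drafter score.

    Positions scoring at or above the threshold are routed to the
    full-attention path; positions below it go to the cheaper path.
    """
    full: List[int] = []
    cheap: List[int] = []
    for pos, score in enumerate(scores):
        if score >= threshold:
            full.append(pos)
        else:
            cheap.append(pos)
    return full, cheap


def speedup(before_s: float, after_s: float) -> float:
    """Ratio of first-token latency before and after the change."""
    return before_s / after_s


if __name__ == "__main__":
    # Made-up drafter scores for a five-token prompt (hypothetical values).
    toy_scores = [0.91, 0.05, 0.40, 0.02, 0.77]
    full, cheap = route_prefill_tokens(toy_scores, threshold=0.5)
    print("full attention:", full)
    print("cheaper path:", cheap)
    # Timings reported in the post: 47 s before, 4.6 s after.
    print(f"first-token speedup: {speedup(47.0, 4.6):.1f}x")
```

The payoff depends on how many tokens land on the cheaper path, which is why the threshold is the knob to watch when the quality numbers come in for long-context reasoning.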
11
Qwen 3.6 27B running well on Windows + RTX 3090

Source r/LocalLLaMA thread — Community-authored setup write-up with reproduced numbers from several posters.

A capable, dense 27B that runs fast on a single consumer GPU is the model class that actually gets used for local agent work. The throughput-per-dollar story keeps improving on the open side.
www.reddit.com/r/LocalLLaMA/comments/1t11ab… →
Details
Context
A capable, dense 27B that runs fast on a single consumer GPU is the model class that actually gets used for local agent work. The throughput-per-dollar story keeps improving on the open side.
Key points
Qwen 3.6 27B dense reportedly matches or beats the older 397B MoE on several public benchmarks.
Q4_K_M quant fits in ~17GB VRAM — single 3090 territory.
Reported throughput around 25 tokens/sec on a 3090 at 8K context with Ollama on Windows.
Setup notes flag a CUDA toolkit version pin and a tokenizer fix needed for the GGUF build.
Provenance
Source · Background source
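
A quick back-of-envelope check on the ~17GB figure for Qwen 3.6 27B. The sketch assumes Q4_K_M averages roughly 4.85 bits per weight, a commonly quoted approximation that is not taken from the thread, and it counts weights only. The helper name `quantized_weight_gb` is hypothetical; the 24 GB figure is the RTX 3090's VRAM.

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: approximate average bits per weight for Q4_K_M.
Q4_K_M_BPW = 4.85
RTX_3090_VRAM_GB = 24.0

weights_gb = quantized_weight_gb(27e9, Q4_K_M_BPW)
headroom_gb = RTX_3090_VRAM_GB - weights_gb

if __name__ == "__main__":
    print(f"weights: ~{weights_gb:.1f} GB")
    print(f"left for KV cache and runtime overhead: ~{headroom_gb:.1f} GB")
```

The estimate lands a little under the reported ~17GB, which is consistent with weights plus some runtime overhead. Whatever remains of the 24 GB is the budget for KV cache, so longer contexts eat into it quickly.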