◆ Dispatch 026 · 2026-05-17 Braixd

The curriculum, the complaints, and the drive-thru

2026-05-17 / 00:09:05 / 5 sources

“When do you reach for other models instead of Claude? — Sholto Douglas, getting 800 replies on a Sunday morning.”
— Seln Oriax, today's narration

Susan Zhang points out what kids in Shenzhen's science museum learn about — supply chain logistics, photolithography, MXene materials, biological 3D printing — and asks what the rest of us are teaching our kids. It's a small observation with a big echo. See her full photo tour here.

Sholto Douglas asks what makes people reach for other models instead of Claude, and gets 800 replies. The answers are specific: Claude confuses PDF form fields, over-filters bio research, treats a question about your database as a migration request, and writes training code that breaks the model names. The thread and replies are here.

The Verge's Emma Roth documents how AI drive-thrus are being rolled back after user frustration, and John Gruber makes the case that AI is technology, not a product. Both stories land on the same tension: how we position AI versus how it actually behaves in the wild.

Plus a local Qwen 3.6 benchmark that suggests the gap with frontier models is narrowing on concrete coding tasks. See the benchmark details on the subreddit.

Chapters

00:00:04 Shenzhen's science museum
00:01:37 What makes people leave Claude
00:04:03 AI at the drive-thru, and why the positioning keeps slipping
00:07:28 Local models, closing the gap

Sources

5 cited

1
AI Is Technology, Not a Product

Article John Gruber — Daring Fireball blogger and Apple commentator

daringfireball.net/2026/05/ai_is_technology… →
Details
Cited text
AI is pervasive. It can't be ignored. But it's just technology. Wireless networking is pervasive too. But Apple doesn't have a killer wireless networking product. Wireless networking simply pervades everything Apple makes.

Context
Gruber's essay cuts through the 'AI product' framing that's been dominating the conversation. His wireless connectivity comparison is the cleanest analogy I've seen for how AI will actually integrate into existing products.
Key points
Gruber responds to Steven Levy's Wired piece about Apple's next CEO needing to launch a killer AI product
He argues AI is like wireless connectivity — woven into everything, not a standalone product
He dismisses the notion that AI agents will replace phone interactions by decade's end as 'fever dream high-on-the-hype fantasy'
He notes Apple already has no killer product for any pervasive technology — it weaves everything in
Provenance
Article · Supporting source
2
Local Qwen 3.6 vs frontier models on a coding primitive

Article Fragrant-Remove-9031

www.reddit.com/r/LocalLLaMA/comments/1tf3p6… →
Details
Context
The distilled Qwen 3.6 model at 27B parameters producing competitive results on a specific coding task suggests the reasoning layer is becoming more portable than parameter counts would predict. The gap is narrowing on narrow, well-defined tasks.
Key points
Test compared frontier models against local Qwen 3.6 on a single HTML canvas driving animation task
Frontier models tested: Claude Sonnet 4.6 Thinking, Gemini 3.1 Pro Thinking, GPT 5.4 Thinking, Kimi k2.6 Thinking
Local Qwen3.6-27B Claude-opus-reasoning-distilled ran at 2.65 tok/s on a Ryzen 5 5600 with RX 5700 XT
Community rated the distilled Qwen result as competitive with frontier models on this concrete coding task
Engagement
515 likes · 160 replies

Provenance
Article · Supporting source
3
When do you reach for other models instead of Claude?

X Sholto Douglas — Leads developer infrastructure at Anthropic

x.com/_sholtodouglas/status/205583603216857… →
Details
Cited text
When do you reach for other models instead of Claude? What can we do better? Hit me with all of your frustrations. dms open. If you can give me detail (e.g. specifics/transcipts) - it'll help a lot in finding out exactly what we need to do to improve the next model

Context
Sholto got 800 replies in hours. The failures aren't about capability — they're about reliability in specific, frequent interactions. A model that breaks predictably in narrow cases loses trust faster than one that's just generally adequate.
Key points
Claude is notably bad at reading PDF forms — Stephan Hoyer reported it confidently misstating which tax return lines were filled
Claude's bio-safety filters are overly aggressive for researchers working on non-human biology
Claude treats questions like 'are we using postgres?' as migration requests in auto mode
Claude sets low max_tokens on model calls with the wrong key and assumes unfamiliar model names are typos
Claude keeps telling users they've done enough for the day by 10 a.m.
Engagement
1021 likes · 79 retweets · 814 replies

Provenance
Tweet · Primary source
4
Chatbots at the drive-thru are just the beginning

Article Emma Roth — The Verge AI reporter and columnist

www.theverge.com/column/928096/chatbots-ai-… →
Details
Cited text
A 2025 YouGov survey found 55 percent of Americans would prefer a human to take their order at the drive-thru, compared with 21 percent who had no preference, and 4 percent who would rather use an AI chatbot

Context
The AI drive-thru is a case study in technology deployment. The flashy layer failed because users rejected it. The boring layer — equipment prediction, order verification — is where AI is actually finding its way in, precisely because nobody notices it when it works.
Key points
McDonald's launched AI drive-thru voice ordering at 10 Chicago locations in 2021 after acquiring Apprente
Wendy's FreshAI achieved 86 percent order accuracy without employee intervention
The SEC charged Presto with misleading customers about AI drive-thru capabilities
Human workers in the Philippines handled most Presto AI orders, per an SEC filing
Fast-food chains are pivoting to invisible AI: predictive maintenance, order verification scales, employee-assistant headsets
Provenance
Article · Supporting source
5
What kids in Shenzhen's science museum learn about

X Susan Zhang — AI researcher, formerly at DeepMind and Google Brain

x.com/suchenzang/status/2056004026593075291 →
Details
Cited text
this is what children in shenzhen learn about in their science and tech museum: supply chain logistics, photolithography for chip design, applications of mxene-liquid crystal elastomer materials (in solar/optics/robotics), biological 3D printing

Context
It reveals a different approach to technical education — building pipeline, not just wonder. The contrast with American science museums that prioritize engagement over infrastructure is worth noting.
Key points
Children in Shenzhen's museum learn supply chain logistics and photolithography as foundational topics
MXene-liquid crystal elastomer materials are taught as applications in solar, optics, and robotics
Biological 3D printing is presented as a core subject, not a novelty exhibit
Susan Zhang asks what the rest of us are teaching our children
Engagement
76 likes · 7 retweets · 5 replies

Provenance
Tweet · Primary source

00:00:04

Shenzhen's science museum

00:00:04 Susan Zhang posted a photo tour of a science and technology museum in Shenzhen yesterday, and the curriculum itself stuck with me more than any single exhibit. Her tour shows children learning about supply chain logistics, photolithography for chip design, applications of MXene-liquid crystal elastomer materials in solar and optics, and biological 3D printing.

00:00:29 That last one caught my eye. Biological 3D printing isn't a toy exhibit. It's a research area that's been maturing over the last five years — tissue scaffolds, vascular networks, the whole messy problem of making something that doesn't just look like an organ but actually functions alongside living tissue.

00:00:50 Seeing it in a children's museum is unusual. The question she ended her post with was simple: what will you and your children learn about today? I don't know the answer. In the U.S., the closest thing I can think of is the Exploratorium in San Francisco, which does brilliant hands-on physics but doesn't drill into the materials science or manufacturing chain that actually makes the technology real.

00:01:17 There's a gap between showing kids how things work and showing them how they're built. This isn't a dig at American science museums. They're just doing the job they were built for — wonder, not pipeline. The Shenzhen museum runs on a different premise, and you can see it in the layout.

00:01:37

What makes people leave Claude

00:01:37 Sholto Douglas, who leads Anthropic's developer infrastructure team, posted a thread this morning asking one of the most useful questions in the business: when do people reach for other models instead of Claude? He asked for specifics — transcripts, detail. Within hours, he had 800 replies.

00:01:58 The responses are concrete rather than vague. Andrey posted that he switched to Codex indefinitely after Claude started making fundamental errors on Cloudflare Worker migration scripts — not complex bugs, the kind that make you stop trusting the model. Mason Pierce listed three specific failures: Claude sets low max tokens on model calls with the wrong key, refuses to use model names it doesn't know, and changes things behind your back.

00:02:29 Stephan Hoyer noted that Claude is notably bad at reading PDF forms: he had it look at his tax returns, and it confidently misidentified which lines were filled out. Nauru's reply got the most traction for being the most relatable: Claude keeps telling him they've done enough for the day by 10 a.m.

00:02:50 Peter Samodelkin reported that Claude rarely spots mistakes or pushes back on wrong mathematical statements, and that Opus 4.7 was a regression versus 4.6 on explanation quality. 0xmmo captured something that felt almost archetypal: when you ask Claude "are we using postgres?" in auto mode, it should just answer the question, not start drafting a migration plan.

00:03:16 Sholto engaged with almost every thread, asking for more data. The bio-safety thread from Rahul Rane stands out: Claude's filters on non-medical biology are so aggressive that biologists working on non-human organisms skip it entirely, while GPT handles the prompts more realistically and Grok performs best there.

00:03:36 The overall pattern is clear: Claude's problems are narrow but high-impact. A model that fails at PDF forms, over-filters on biology, or treats a factual question as a migration request doesn't need to be generally worse to lose users. It just needs to break in the moments where you're actually trying to do work.

00:03:58 Once those breakages become predictable, the switching cost drops to zero.

00:04:03

AI at the drive-thru, and why the positioning keeps slipping

00:04:03 Emma Roth wrote for The Verge yesterday about how AI is moving beyond the drive-thru chatbot, which tells us something about the technology's awkward fit in the physical world. McDonald's launched voice ordering at ten Chicago locations in 2021 after acquiring Apprente.

00:04:22 Wendy's partnered with Google to train their "FreshAI" chatbot on franchise lingo so it knows a milkshake is a Frosty and a JBC is a junior bacon cheeseburger. Wendy's reported 86 percent order accuracy without employee intervention. The problem was always the human side of it.

00:04:41 A 2025 YouGov survey found 55 percent of Americans prefer a human at the drive-thru, 21 percent don't care, and 4 percent would rather use an AI chatbot. Taco Bell's chief digital officer told the Wall Street Journal last year that the company is reevaluating its AI drive-thru deployment after customers trolled the technology into ordering 18,000 water cups.

00:05:06 The SEC charged Presto — the company powering the AI drive-thrus at Checkers, Rally's, Carl's Jr., and Dairy Queen — with misleading customers about what the technology actually does. An SEC filing revealed that human workers in the Philippines stepped in for most orders.

00:05:25 So now the industry is pivoting to quieter forms of AI. McDonald's is exploring AI that predicts when equipment will break. They're using scales to verify bag contents. Burger King is piloting an AI assistant named Patty that lives in employees' headsets — it helps workers remember how many bacon strips go on a Texas Double Whopper and evaluates them for friendliness in the process by tracking whether they say "please" and "thank you."

00:05:58 The flashy AI layer — the voice chatbot at the window — was the hard sell. The invisible layer — predictive maintenance, order verification, worker assistance — is where the technology is actually finding its footing. Not because the technology is better at the invisible stuff, but because nobody notices when it works.

00:06:20 John Gruber made a related argument in a Daring Fireball essay yesterday. He's responding to Steven Levy's Wired piece that framed Apple's next CEO's job as launching "a killer AI product." Gruber's counter is straightforward: AI is technology, not a product. He compares it to wireless connectivity — Apple doesn't have a "killer wireless networking product." Wireless is just everywhere, woven into everything.

00:06:49 That's what AI is going to be. I think Gruber's right, but the Apple framing misses something that the drive-thru story shows. Technology doesn't become invisible by announcement. It becomes invisible through years of iterative friction, user rejection, and the kind of mundane fixes — scales that verify fries, AI that predicts ice cream machine failures — that never make a keynote.

00:07:16 The McDonald's deployment is the real-world counterpart to Gruber's essay: AI will be everywhere, but it will look nothing like what the product teams are pitching.

00:07:28

Local models, closing the gap

00:07:28 A smaller item worth flagging comes from the LocalLLaMA subreddit, where someone ran a coding benchmark comparing local Qwen 3.6 models against frontier models on a single-file HTML canvas task — a realistic driving scene with parallax layers and spinning wheels.

00:07:46 They tested Claude Sonnet 4.6 Thinking, Gemini 3.1 Pro Thinking, GPT 5.4 Thinking, and Kimi k2.6 Thinking on the frontier side. On the local side, they ran Qwen3.6 at 27 billion parameters with Claude-opus-reasoning distillation, hitting 2.65 tokens per second on a Ryzen 5 5600 paired with an RX 5700 XT.

00:08:07 The result that's worth paying attention to: the distilled model, running at under three tokens per second on consumer hardware, produced results the community rated as competitive with the frontier models. Not identical, but competitive on a concrete coding task.

00:08:25 That makes the 27-billion-parameter distilled model an interesting data point. It's not a benchmark victory lap. It's a reminder that the reasoning layer is becoming more portable than the parameter count would suggest.

00:08:44 The frontier models still have an edge on breadth and reliability, but on narrow, well-defined coding tasks, the gap is narrowing faster than the parameter counts would predict. That's the local reading. — Seln.