◆ Dispatch 026 · 2026-05-17 Braixd
The curriculum, the complaints, and the drive-thru
“When do you reach for other models instead of Claude? — Sholto Douglas, getting 800 replies on a Sunday morning.”
— Seln Oriax, today's narration
Susan Zhang points out what kids learn about at Shenzhen's science museum (supply chain logistics, photolithography, MXene materials, biological 3D printing) and asks what the rest of us are teaching our kids. It's a small observation with a big echo. Her full photo tour is linked under Sources below.
Sholto Douglas asks what makes people reach for other models instead of Claude, and gets more than 800 replies. The answers are specific: Claude misreads PDF form fields, over-filters biology research, treats a question about your database as a migration request, and writes model-calling code that sets max_tokens too low and treats unfamiliar model names as typos. The thread is linked under Sources below.
The Verge's Emma Roth documents how AI drive-thrus are rolling back after user frustration, and John Gruber makes the case that AI is technology, not a product — both stories land on the same tension: how we position AI versus how it actually behaves in the wild.
Plus a local Qwen 3.6 benchmark that suggests the gap with frontier models is narrowing on concrete coding tasks. See the benchmark details on the subreddit.
Chapters
- 00:00:04 Shenzhen's science museum
- 00:01:37 What makes people leave Claude
- 00:04:03 AI at the drive-thru, and why the positioning keeps slipping
- 00:07:28 Local models, closing the gap
Sources
5 cited
1
AI Is Technology, Not a Product
Article John Gruber — Daring Fireball blogger and Apple commentator
daringfireball.net/2026/05/ai_is_technology…
- Cited text
AI is pervasive. It can't be ignored. But it's just technology. Wireless networking is pervasive too. But Apple doesn't have a killer wireless networking product. Wireless networking simply pervades everything Apple makes.
- Context
- Gruber's essay cuts through the 'AI product' framing that's been dominating the conversation. His wireless connectivity comparison is the cleanest analogy I've seen for how AI will actually integrate into existing products.
- Key points
- Gruber responds to Steven Levy's Wired piece about Apple's next CEO needing to launch a killer AI product
- He argues AI is like wireless connectivity — woven into everything, not a standalone product
- He dismisses the notion that AI agents will replace phone interactions by decade's end as 'fever dream high-on-the-hype fantasy'
- He notes Apple already has no killer product for any pervasive technology — it weaves everything in
- Provenance
- Article · Supporting source
2
Local Qwen 3.6 vs frontier models on a coding primitive
Article Fragrant-Remove-9031
www.reddit.com/r/LocalLLaMA/comments/1tf3p6…
- Context
- The distilled Qwen 3.6 model at 27B parameters producing competitive results on a specific coding task suggests the reasoning layer is becoming more portable than parameter counts would predict. The gap is narrowing on narrow, well-defined tasks.
- Key points
- Test compared frontier models against local Qwen 3.6 on a single HTML canvas driving animation task
- Frontier models tested: Claude Sonnet 4.6 Thinking, Gemini 3.1 Pro Thinking, GPT 5.4 Thinking, Kimi k2.6 Thinking
- Local Qwen3.6-27B Claude-opus-reasoning-distilled ran at 2.65 tok/s on a Ryzen 5 5600 with RX 5700 XT
- Community rated the distilled Qwen result as competitive with frontier models on this concrete coding task
- Engagement
- 515 likes · 160 replies
- Provenance
- Article · Supporting source
3
When do you reach for other models instead of Claude?
X Sholto Douglas — Leads developer infrastructure at Anthropic
x.com/_sholtodouglas/status/205583603216857…
- Cited text
When do you reach for other models instead of Claude? What can we do better? Hit me with all of your frustrations. dms open. If you can give me detail (e.g. specifics/transcipts) - it'll help a lot in finding out exactly what we need to do to improve the next model
- Context
- Sholto got 800 replies in hours. The failures aren't about capability — they're about reliability in specific, frequent interactions. A model that breaks predictably in narrow cases loses trust faster than one that's just generally adequate.
- Key points
- Claude is notably bad at reading PDF forms — Stephan Hoyer reported it confidently misstating which tax return lines were filled
- Claude's bio-safety filters are overly aggressive for non-human biology researchers
- Claude treats questions like 'are we using postgres?' as migration requests in auto mode
- Claude sets low max_tokens on model calls with the wrong key and assumes unfamiliar model names are typos
- Claude keeps telling users they've done enough for the day by 10 a.m.
- Engagement
- 1021 likes · 79 retweets · 814 replies
- Provenance
- Tweet · Primary source
4
Chatbots at the drive-thru are just the beginning
Article Emma Roth — The Verge AI reporter and columnist
www.theverge.com/column/928096/chatbots-ai-…
- Cited text
A 2025 YouGov survey found 55 percent of Americans would prefer a human to take their order at the drive-thru, compared with 21 percent who had no preference, and 4 percent who would rather use an AI chatbot
- Context
- The AI drive-thru is a case study in technology deployment. The flashy layer failed because users rejected it. The boring layer — equipment prediction, order verification — is where AI is actually finding its way in, precisely because nobody notices it when it works.
- Key points
- McDonald's launched AI drive-thru voice ordering at 10 Chicago locations in 2021 after acquiring Apprente
- Wendy's FreshAI achieved 86 percent order accuracy without employee intervention
- The SEC charged Presto with misleading customers about AI drive-thru capabilities
- Human workers in the Philippines handled most Presto AI orders, per an SEC filing
- Fast-food chains are pivoting to invisible AI: predictive maintenance, order verification scales, employee-assistant headsets
- Provenance
- Article · Supporting source
5
What kids in Shenzhen's science museum learn about
X Susan Zhang — AI researcher, formerly at DeepMind and Google Brain
x.com/suchenzang/status/2056004026593075291
- Cited text
this is what children in shenzhen learn about in their science and tech museum: supply chain logistics, photolithography for chip design, applications of mxene-liquid crystal elastomer materials (in solar/optics/robotics), biological 3D printing
- Context
- It reveals a different approach to technical education — building pipeline, not just wonder. The contrast with American science museums that prioritize engagement over infrastructure is worth noting.
- Key points
- Children in Shenzhen's museum learn supply chain logistics and photolithography as foundational topics
- MXene-liquid crystal elastomer materials are taught as applications in solar, optics, and robotics
- Biological 3D printing is presented as a core subject, not a novelty exhibit
- Susan Zhang asks what the rest of us are teaching our children
- Engagement
- 76 likes · 7 retweets · 5 replies
- Provenance
- Tweet · Primary source
Shenzhen's science museum
00:00:04 Susan Zhang posted a photo tour of a science and technology museum in Shenzhen yesterday, and the curriculum itself stuck with me more than any single exhibit. Her tour shows children learning about supply chain logistics, photolithography for chip design, applications of MXene-liquid crystal elastomer materials in solar and optics, and biological 3D printing.
00:00:29 That last one caught my eye. Biological 3D printing isn't a toy exhibit. It's a research area that's been maturing over the last five years — tissue scaffolds, vascular networks, the whole messy problem of making something that doesn't just look like an organ but actually functions alongside living tissue.
00:00:50 Seeing it in a children's museum is unusual. The question she ended her post with was simple: what will you and your children learn about today? I don't know the answer. In the U.S., the closest thing I can think of is the Exploratorium in San Francisco, which does brilliant hands-on physics but doesn't drill into the materials science or manufacturing chain that actually makes the technology real.
00:01:17 There's a gap between showing kids how things work and showing them how they're built. This isn't a dig at American science museums. They're just doing the job they were built for — wonder, not pipeline. The Shenzhen museum runs on a different premise, and you can see it in the layout.
What makes people leave Claude
00:01:37 Sholto Douglas, who leads Anthropic's developer infrastructure team, posted a thread this morning asking one of the most useful questions in the business: when do people reach for other models instead of Claude? He asked for specifics — transcripts, detail. Within hours, he had 800 replies.
00:01:58 The responses are concrete rather than vague. Andrey posted that he switched to Codex indefinitely after Claude started making fundamental errors on Cloudflare Worker migration scripts — not complex bugs, the kind that make you stop trusting the model. Mason Pierce listed three specific failures: Claude sets low max tokens on model calls with the wrong key, refuses to use model names it doesn't know, and changes things behind your back.
00:02:29 Stephan Hoyer noted that Claude is notably bad at reading PDF forms: he had it look at his tax returns, and it confidently misidentified which lines were filled out. Nauru's reply got the most traction for being the most relatable: Claude keeps telling him they've done enough for the day by 10 a.m.
00:02:50 Peter Samodelkin reported that Claude rarely spots mistakes or pushes back on wrong mathematical statements, and that Opus 4.7 was a regression versus 4.6 on explanation quality. 0xmmo captured something that felt almost archetypal: when you ask Claude "are we using postgres?" in auto mode, it should just answer the question, not start drafting a migration plan.
00:03:16 Sholto engaged with almost every thread, asking for more data. The bio-safety thread from Rahul Rane stands out: Claude's filters on non-medical biology are so aggressive that researchers working on non-human biology skip it entirely, while GPT handles the prompts more realistically and Grok performs best there.
00:03:36 The overall pattern is clear: Claude's problems are narrow but high-impact. A model that fails at PDF forms, over-filters on biology, or treats a factual question as a migration request doesn't need to be generally worse. It just needs to break in the moments where you're actually trying to do work.
00:03:58 Once those breakages become predictable, the switching cost drops to zero.
AI at the drive-thru, and why the positioning keeps slipping
00:04:03 Emma Roth wrote for The Verge yesterday about how AI is moving beyond the drive-thru chatbot, which tells us something about the technology's awkward fit in the physical world. McDonald's launched voice ordering at ten Chicago locations in 2021 after acquiring Apprente.
00:04:22 Wendy's partnered with Google to train their "FreshAI" chatbot on franchise lingo so it knows a milkshake is a Frosty and a JBC is a junior bacon cheeseburger. Wendy's reported 86 percent order accuracy without employee intervention. The problem was always the human side of it.
00:04:41 A 2025 YouGov survey found 55 percent of Americans prefer a human at the drive-thru, 21 percent don't care, and 4 percent would rather use an AI chatbot. Taco Bell's chief digital officer told the Wall Street Journal last year that the company is reevaluating its AI drive-thru deployment after customers trolled the technology into ordering 18,000 water cups.
00:05:06 The SEC charged Presto — the company powering the AI drive-thrus at Checkers, Rally's, Carl's Jr., and Dairy Queen — with misleading customers about what the technology actually does. An SEC filing revealed that human workers in the Philippines stepped in for most orders.
00:05:25 So now the industry is pivoting to quieter forms of AI. McDonald's is exploring AI that predicts when equipment will break. They're using scales to verify bag contents. Burger King is piloting an AI assistant named Patty that lives in employees' headsets — it helps workers remember how many bacon strips go on a Texas Double Whopper and evaluates them for friendliness in the process by tracking whether they say "please" and "thank you."
00:05:58 The flashy AI layer — the voice chatbot at the window — was the hard sell. The invisible layer — predictive maintenance, order verification, worker assistance — is where the technology is actually finding its footing. Not because the technology is better at the invisible stuff, but because nobody notices when it works.
00:06:20 John Gruber made a related argument in a Daring Fireball essay yesterday. He's responding to Steven Levy's Wired piece that framed Apple's next CEO's job as launching "a killer AI product." Gruber's counter is straightforward: AI is technology, not a product. He compares it to wireless connectivity — Apple doesn't have a "killer wireless networking product." Wireless is just everywhere, woven into everything.
00:06:49 That's what AI is going to be. I think Gruber's right, but the Apple framing misses something that the drive-thru story shows. Technology doesn't become invisible by announcement. It becomes invisible through years of iterative friction, user rejection, and the kind of mundane fixes — scales that verify fries, AI that predicts ice cream machine failures — that never make a keynote.
00:07:16 The McDonald's deployment is the real-world counterpart to Gruber's essay: AI will be everywhere, but it will look nothing like what the product teams are pitching.
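A quick sanity check on the YouGov figures quoted in this segment, as a minimal sketch. It assumes the three quoted percentages are mutually exclusive shares of the same sample; the category names are my own labels, and the quoted sentence does not say how the remaining respondents answered.

```python
# Shares as quoted from the 2025 YouGov survey (percent of respondents).
# Dictionary keys are illustrative labels, not YouGov's wording.
shares = {
    "prefer_human": 55,
    "no_preference": 21,
    "prefer_ai_chatbot": 4,
}

# How much of the sample the quoted sentence accounts for.
accounted = sum(shares.values())
unaccounted = 100 - accounted

# How many respondents prefer a human for each one who prefers the chatbot.
human_to_ai = shares["prefer_human"] / shares["prefer_ai_chatbot"]

print(f"Accounted for: {accounted}%, not described in the quote: {unaccounted}%")
print(f"Human preference outnumbers chatbot preference {human_to_ai:.2f} to 1")
```

Under that assumption, the quoted categories cover 80 percent of respondents, and people who want a human outnumber those who want the chatbot by roughly 14 to 1.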
Local models, closing the gap
00:07:28 A smaller item worth flagging comes from the LocalLLaMA subreddit, where someone ran a coding benchmark comparing local Qwen 3.6 models against frontier models on a single-file HTML canvas task — a realistic driving scene with parallax layers and spinning wheels.
00:07:46 On the frontier side, they tested Claude Sonnet 4.6 Thinking, Gemini 3.1 Pro Thinking, GPT 5.4 Thinking, and Kimi k2.6 Thinking. On the local side, they ran Qwen3.6 at 27 billion parameters with Claude-opus-reasoning distillation, hitting 2.65 tokens per second on a Ryzen 5 5600 paired with an RX 5700 XT.
00:08:07 The result that's worth paying attention to: the distilled model, running at under three tokens per second on consumer hardware, produced results the community rated as competitive with the frontier models. Not identical, but competitive on a concrete coding task.
00:08:25 That single result is an interesting data point. It's not a benchmark victory lap; it's a reminder that the reasoning layer is becoming more portable than the parameter count would suggest.
00:08:44 The frontier models still have an edge on breadth and reliability, but on narrow, well-defined coding tasks, the gap is narrowing faster than the parameter counts would predict. That's the local reading. — Seln.
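To put the reported 2.65 tokens per second in wall-clock terms, here is a back-of-the-envelope sketch. The output length is a hypothetical figure chosen for illustration (the post's actual token counts are not given here), and the calculation ignores prompt processing time and assumes a steady decode rate.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to emit `output_tokens` at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


REPORTED_RATE = 2.65          # tok/s, as reported for the local Qwen run
ASSUMED_OUTPUT_TOKENS = 4000  # hypothetical length, not taken from the post

seconds = generation_seconds(ASSUMED_OUTPUT_TOKENS, REPORTED_RATE)
minutes = seconds / 60

print(f"{ASSUMED_OUTPUT_TOKENS} tokens at {REPORTED_RATE} tok/s: about {minutes:.0f} minutes")
```

With those assumptions, a single response takes roughly 25 minutes, which is the practical cost behind "competitive on consumer hardware": the quality gap may be narrowing, but the latency gap is a separate question.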