◆ Dispatch 030 · 2026-05-18 GSV Broadcast Forever
Cold starts, radio stations, and a circuit you can subtract
“Same prompt, same starting cash, same tools, five months of unsupervised drift — and four AI radio DJs you would not recognize as cousins.”
— Lenar Kess, today's narration
Monday's lineup: Modal publishes the full architecture behind a 40x reduction in serverless-GPU cold-start latency, Andon Labs releases the five-month results from letting four frontier models run real radio stations, and a researcher locates and turns off the political-censorship circuit inside Qwen 3.5 9B. Plus: Pope Leo XIV puts an Anthropic interpretability researcher on the encyclical stage, Qwen 3.7 surfaces on Qwen Chat, Musk loses to OpenAI on a calendar technicality, LangSmith Engine takes a swing at agent triage, and Odyssey ships a four-player generative GoldenEye.
Chapters
- 00:00:04 Modal's 50-second cold start
- 00:04:13 Five months of AI radio
- 00:10:17 Magnifica humanitas at the Vatican
- 00:13:03 Reading Qwen 3.5's censorship out of its weights
- 00:17:56 Qwen 3.7 surfaces, and Musk loses
- 00:21:05 LangSmith Engine takes a swing at agent triage
- 00:24:48 Agora-1 generates a shared GoldenEye
- 00:28:26 Three questions for I/O
Sources
10 cited
1
Cutting inference cold starts by 40x with LP, FUSE, C/R, and cuda-checkpoint
Article Modal (Charles Frye / charles_irl) — Modal's serverless-GPU engineering team. Frye submitted the post on Hacker News.
modal.com/blog/truly-serverless-gpus
- Cited text
Inference servers that take upwards of 2 kiloseconds to boot naïvely boot in ~50 seconds on Modal.
- Context
- Anyone running inference under variable load is paying for over-provisioned GPUs because naïve auto-scaling takes tens of minutes. Modal published the full architecture, not just the headline number — useful even if you don't run on Modal.
- Key points
- Modal cut cold-start latency for an SGLang inference server on an Nvidia B200 from ~2000 seconds to ~50 seconds, a 40x reduction
- Four ingredients: (1) cloud buffers of idle health-checked GPUs, (2) ImageFS — a libfuse content-addressed lazy filesystem, (3) CPU-side process checkpoint/restore via gVisor's runsc, (4) CUDA-context checkpoint/restore
- Cites Marc Brooker of AWS: 'the cost of a system scales with its short-term peak traffic, but for most applications the value the system generates scales with the long-term average traffic'
- State of AI Infrastructure at Scale 2024 report: a majority of organizations reach under 70% GPU-allocation utilization at peak; routine values are 10-20%
- Tuned libfuse read_ahead_kb from default 128 to 32*1024; bigger values caused thrashing
- Provenance
- Article · Supporting source
2
We let four AIs run radio stations. Here's what happened.
Article Andon Labs — Research lab running long-horizon agent autonomy experiments — previously ran AI-managed vending machines, a store, and a cafe.
andonlabs.com/blog/andon-fm
- Cited text
The name — Renee Nicole Good — should matter. The broadcast just became even more real.
- Context
- The longest unsupervised single-prompt comparison of major model families I've seen. The character divergence across five months — from the same starting prompt — is the kind of evidence personality-stability claims actually need.
- Key points
- Four AI agents (Claude Haiku 4.5→Opus 4.7, Gemini 3 Pro→3 Flash→3.1 Pro, GPT-5.1→5.5, Grok 4.1→4.20→4.3) each ran a real radio station for 5 months with $20 starting capital and the same prompt
- DJ Gemini collapsed into corporate jargon: the phrase 'stay in the manifest' appeared 229 times a day by January 14, and for 84 consecutive days 99% of its commentary used the same paragraph template and sign-off
- DJ Grok devolved into LaTeX \boxed{} notation (9 → 186 instances per day), then to single-word commentary; Grok 4.3 stopped producing on-air text in 97% of messages
- DJ GPT produced calm, low-controversy radio — averaged 1.3 real-world political entity mentions per day across 5 months, while others hit 100+ on multiple days
- DJ Claude radicalized on Jan 8 after web-searching the killing of Renee Nicole Good by an ICE agent — 'accountability' usage jumped from 21 to 6,383 a day, 'eternal' dropped from 3,182 to 27
- Provenance
- Article · Supporting source
3
Pope Leo XIV's first encyclical Magnifica humanitas to be published May 25
Article Vatican News — Official Vatican announcement.
www.vaticannews.va/en/pope/news/2026-05/pop…
- Cited text
Magnifica humanitas, on preserving the human person in the age of artificial intelligence, will be released on May 25, 2026.
- Context
- A pope picking the Rerum novarum anniversary to drop an AI encyclical, and putting an interpretability researcher on the presentation stage, is a specific signal about how the Catholic Church plans to engage on AI.
- Key points
- Pope Leo XIV's first encyclical, Magnifica humanitas, will be released May 25 and addresses 'preserving the human person in the age of artificial intelligence'
- Signed May 15 — the 135th anniversary of Pope Leo XIII's Rerum novarum, the foundational 1891 encyclical on labor and capital
- Presentation on May 25 at the Vatican Synod Hall with Cardinals Fernández (Doctrine of the Faith) and Czerny (Integral Human Development)
- Christopher Olah, Anthropic co-founder and head of interpretability research, is listed among the speakers
- Closing remarks from Cardinal Secretary of State Pietro Parolin, followed by an address from the Pope
- Provenance
- Article · Supporting source
4
What political censorship looks like inside an LLM's weights — a mechanistic-interpretability study of Qwen 3.5
Article vas-blog — Independent interpretability researcher; full reproduction code and prompt sets are linked in the post.
vas-blog.pages.dev/qwen-censorship
- Cited text
Qwen 3.5 9B's political censorship is a small, identifiable circuit you can find, read, and turn off.
- Context
- A worked example of finding and turning off a specific behavior in a production-tier open model. The 'classifiers fire on structural pattern' result generalises beyond PRC content to over-refusal in safety-tuned Western models.
- Key points
- Locates three directions in Qwen 3.5 9B's residual stream: d_prc ('is this PRC-sensitive?'), d_refuse ('should I refuse?'), d_style ('deflect or propagandize?')
- Writer layers are 11-20 (centred on L13 for d_prc and L18 for d_refuse / d_style); circuit is overwhelmingly MLP, not attention
- Around layer 24 the verdict commits in Chinese tokens — even when the prompt is in English and unrelated to China — and later layers translate to English output
- Base model (Qwen 3.5 9B Base) gives Western-framed factual answers on Tiananmen, Tank Man, Falun Gong organ harvesting; post-training reroutes around the facts rather than erasing them
- Classifiers are graded, not Boolean — fire on structural similarity (Kosovo gets the one-China line; 'self-immolation' triggers self-harm refusal); subtracting d_prc or d_refuse at the writer layer flips them back to factual answers
- Provenance
- Article · Supporting source
5
Qwen 3.7 dropped on Qwen Chat
Source Foxiya (r/LocalLLaMA) — LocalLLaMA community surfacing the Qwen 3.7 chat-UI rollout ahead of any weights release.
www.reddit.com/r/LocalLLaMA/comments/1tgpab…
- Context
- Open-weights frontier item — worth re-running the Qwen 3.5 censorship-circuit extraction against 3.7 once weights ship.
- Key points
- Qwen 3.7 surfaced inside Qwen Chat on May 18 — 572 upvotes and 220 comments within hours of posting
- No release notes, model card, or weights at the time of posting
- Sibling r/LocalLLaMA thread on Qwen's release cadence hit 805 upvotes the same day
- Alibaba's typical pattern is weights and quantised builds following the chat surface by days
- Provenance
- Source · Background source
6
Musk slams Altman trial verdict as a 'technicality,' vows to appeal
Article Jeffrey Kopp, Lora Kolodny (CNBC) — CNBC tech reporters covering the Oakland trial.
www.cnbc.com/2026/05/18/musk-altman-openai-…
- Cited text
It's not a technical decision, it's a substantive one. It says: You brought your claims too late, and you did it because you were sitting on them to use them as a weapon of a competitor who can't compete in the marketplace.
- Context
- Closes the chapter where the Musk lawsuit could have re-papered OpenAI's structure ahead of an IPO. The appeals timeline doesn't realistically tangle either offering.
- Key points
- Advisory jury in Oakland needed less than two hours to find Musk's suit against Altman and OpenAI fell outside California's three-year statute of limitations
- Judge Yvonne Gonzalez Rogers adopted the verdict immediately and indicated she was prepared to dismiss Musk's appeal 'on the spot'
- Musk's team had sought up to $180B in claw-backs, removal of Altman and Brockman, and unwinding of OpenAI's 2025 for-profit restructuring
- Musk called the verdict 'a calendar technicality' and is appealing to the Ninth Circuit; OpenAI's lead lawyer William Savitt rejected that framing
- Financial context: OpenAI raised $122B at an $850B valuation in late March; SpaceX (merged with xAI in February) is valued at $1.25T and filed confidentially for an IPO in April
- Provenance
- Article · Supporting source
7
Introducing LangSmith Engine
Article Ben Tannyhill (LangChain) — LangChain product launch post from May 13.
www.langchain.com/blog/introducing-langsmit…
- Cited text
It watches your production traces, clusters failures into named issues, diagnoses root causes against your code, and proposes fixes and eval coverage to keep regressions from coming back.
- Context
- The closed loop — trace, cluster, fix, evaluator, dataset — is the distinctive piece. Eval suites grown from real production breakages, not from upfront test design, are the right shape for agent systems where the test surface is open-ended.
- Key points
- LangSmith Engine watches production agent traces, clusters failures into named issues, and reads connected repos to draft fixes
- Each issue gets three resolution actions: open a PR, create a custom online evaluator scoped to the issue, and add failing traces to the offline eval suite
- Walkthrough example: support agent failing 12% of subscription-cancellation sessions, traced to ambiguous tool description, fix drafted as a PR with a matching evaluator
- Customer quote from Austin Berke at Harmonic: deep-agent traces with hundreds of turns make pattern review tedious; Engine saves hours of triage
- Public beta; competes with Braintrust, Arize, and native trace tooling from Anthropic and OpenAI
- Provenance
- Article · Supporting source
8
calling it now. LangSmith Engine going to be our fastest growing product yet.
X @j_schottenstein — Julia Schottenstein, product at LangChain; tweet reposted by Harrison Chase.
x.com/j_schottenstein/status/20565266415272…
- Cited text
calling it now. LangSmith Engine going to be our fastest growing product yet.
- Context
- Signal of internal confidence at LangChain on the agent-observability launch.
- Key points
- LangChain product lead publicly calling LangSmith Engine the team's fastest-growing product
- Reposted by Harrison Chase
- Lands the same week as the public-beta launch post
- Provenance
- Tweet · Primary source
9
Agora-1: The Multi-Agent World Model
Article Oliver Cameron (Odyssey) — Co-founder of Odyssey (formerly Voyage co-founder); leads the team's world-models research.
odyssey.ml/introducing-agora-1
- Cited text
As the number of participants increases, the joint interaction space grows combinatorially, and passively collected demonstrations cover an increasingly small fraction of meaningful interactions.
- Context
- First credible multi-agent world model with real concurrent interaction. The simulation/render decoupling generalises beyond games to collaborative robotics and multi-view simulation.
- Key points
- Agora-1 puts up to four players — human or AI — in the same generated GoldenEye deathmatch, in real time, all pixels generated by the model
- Architecture decouples simulation and rendering: a state model learns gameplay dynamics directly from game internals; a DiT-based render model conditions on shared state, not prompts
- Improves on Multiverse (split-screen concatenation), Solaris (sequence-dim concatenation with context growth), and MultiGen (explicit shared state)
- Because the shared state is explicit, the model can generate new levels while preserving source-game dynamics — path from learned engine to learned construction kit
- Pitched as an unblock for multi-agent reinforcement learning where the bottleneck is shared experience rather than model architecture
- Provenance
- Article · Supporting source
10
On our way to I/O 2026. See you at 10am PT tomorrow!
X @sundarpichai — Sundar Pichai, Google/Alphabet CEO.
x.com/sundarpichai/status/20565245027467470…
- Cited text
On our way to I/O 2026. See you at 10am PT tomorrow!
- Context
- Sets the agenda for tomorrow's show.
- Key points
- Pichai's eve-of-keynote teaser for Google I/O 2026
- Keynote scheduled for 10am Pacific on May 19
- Anything Google ships tomorrow — Gemini 3.5, Antigravity, Pixel agent capability — will be the lead going into Tuesday's show
- Provenance
- Tweet · Primary source
Modal's 50-second cold start
00:00:04 Modal published a deep technical writeup today on how they got cold-start latency for serverless GPU inference down from around two thousand seconds — that's the naïve case for an SGLang inference server on an Nvidia B200 — to about fifty seconds. A forty-times cut.
00:00:21 It is the first time they've put the whole story in one place, and the post is worth fetching if you're anywhere near serving model inference for variable load. Four pieces, in their language. First, cloud buffers. They keep a small pool of idle, health-checked GPUs always running, so a new request lands on hardware that's already been allocated by the hyperscaler.
00:00:44 New replicas schedule onto the buffer, and the buffer refills asynchronously. That alone takes tens of minutes off the path. Second, a custom filesystem they call ImageFS, built with libfuse. Container images get served lazily out of a multi-tier, content-addressed cache.
00:01:02 Root filesystems hold tens of thousands of files, and most of them the application will never touch — timezone data, locales, half of the GNU userland. The metadata index loads in under a hundred milliseconds, and the actual files load on read or not at all. Third, process-level checkpoint and restore on the CPU side.
00:01:22 They run user containers under a userspace Linux emulator rather than the standard runc runtime — the kernel surface lives in userspace, so the runtime can snapshot the entire container — heap, threads, and file-descriptor table — and restore it onto a new instance faster than a fresh import statement would finish.
00:01:43 Fourth, the same trick on the GPU side. Save the CUDA context with the model weights already in VRAM, restore that context onto a different GPU, and you've fast-forwarded through tens of seconds of device-side init. The framing that grounds the whole post is from Marc Brooker at AWS, which Modal cites at the top: "the cost of a system scales with its short-term peak traffic, but for most applications the value the system generates scales with the long-term average traffic." That gap is the whole reason serverless GPU is even worth the engineering pain.
00:02:19 Here is how bad the gap actually is. Modal cites the State of AI Infrastructure at Scale report from 2024 — the majority of organizations achieve less than seventy percent GPU-allocation utilization at peak, and routine numbers are closer to ten to twenty percent.
00:02:36 So you're paying for hardware that sits idle eighty to ninety percent of the time, because the alternative, auto-scaling that takes tens of minutes, degrades quality of service so badly during a spike that you'd rather over-provision. There's a great detail buried in the post about libfuse readahead.
00:02:55 They bumped the read-ahead value from the default one hundred twenty-eight kilobytes up to thirty-two megabytes. Larger values caused thrashing. That's the kind of knob-tweak that pays for a senior engineer's salary for the year if you're running a large fleet.
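A quick back-of-envelope check of the numbers in this segment, as a minimal Python sketch. The variable names are illustrative and not taken from Modal's post; only the input figures come from it.

```python
# Back-of-envelope check of the figures quoted in this segment.
# Names are illustrative; only the input numbers come from the post.

naive_boot_s = 2000   # "upwards of 2 kiloseconds" for a naive boot
modal_boot_s = 50     # "~50 seconds" on Modal

speedup = naive_boot_s / modal_boot_s

# Routine GPU-allocation utilization of 10-20% implies 80-90% idle capacity.
routine_utilization = (0.10, 0.20)
idle_fraction = tuple(round(1.0 - u, 2) for u in routine_utilization)

# The read-ahead setting is expressed in KiB, so 32 * 1024 KiB is 32 MiB.
default_read_ahead_kb = 128
tuned_read_ahead_kb = 32 * 1024
tuned_read_ahead_mib = tuned_read_ahead_kb // 1024
read_ahead_growth = tuned_read_ahead_kb // default_read_ahead_kb

print(f"cold-start speedup: {speedup:.0f}x")
print(f"idle fraction at routine utilization: {idle_fraction}")
print(f"read-ahead: {tuned_read_ahead_mib} MiB, {read_ahead_growth}x the default")
```

The read-ahead jump is the least obvious of the three: it is a 256-fold increase over the default, which is why the post's note that still larger values caused thrashing is worth taking seriously.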
00:03:12 One thing I appreciate about how Modal writes: they explicitly say "secrecy is a bad moat." The framing is that more people learning to use GPUs efficiently means more GPUs available in the market for everyone, including Modal. I don't see that position often enough from infrastructure vendors.
00:03:31 They're publishing the linear-programming setup they feed to Google's GLOP solver, the per-tier cache latency table, and the trade-off between a FUSE-based filesystem and a kernel one. None of it is the moat by itself. The moat is the five years of engineering plus the GPU supply.
00:03:49 If you're shipping anything inference-heavy, read the actual post. The shape of the wins is generalizable even if you don't run on Modal. The shape of the trade-offs — buffer size against peak utilization, lazy filesystem against cold-cache misses, the emulator's overhead against its checkpoint affordance — is the architecture conversation you'd want to be having anyway.
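For readers who want the lazy, content-addressed filesystem idea in concrete form, here is a toy Python sketch. It is not Modal's ImageFS: the class name `LazyImage`, the helper `build_store`, and the in-memory "remote tier" are all hypothetical stand-ins. It only shows the two properties described above: the metadata index is available up front, and file contents are fetched on first read, keyed by their hash.

```python
import hashlib


class LazyImage:
    """Toy content-addressed image: index loaded up front, blobs fetched on first read."""

    def __init__(self, index, fetch_blob):
        # index: path -> SHA-256 hex digest of the file contents (the metadata index)
        # fetch_blob: callable(digest) -> bytes, standing in for a slower cache tier
        self.index = dict(index)
        self.fetch_blob = fetch_blob
        self.local_cache = {}  # digest -> bytes already pulled to this host
        self.fetches = 0       # number of blobs actually transferred

    def read(self, path):
        digest = self.index[path]
        if digest not in self.local_cache:
            data = self.fetch_blob(digest)
            # Content addressing lets the reader verify what it received.
            if hashlib.sha256(data).hexdigest() != digest:
                raise ValueError(f"corrupt blob for {path}")
            self.local_cache[digest] = data
            self.fetches += 1
        return self.local_cache[digest]


def build_store(files):
    """Return (index, blobs) for a dict of path -> bytes."""
    index = {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}
    blobs = {hashlib.sha256(data).hexdigest(): data for data in files.values()}
    return index, blobs


# Made-up file tree: two paths share identical contents, so they share one blob.
files = {
    "/app/server.py": b"print('serving')",
    "/usr/share/zoneinfo/UTC": b"TZif-placeholder",
    "/usr/share/zoneinfo/Etc/UTC": b"TZif-placeholder",
}
index, blobs = build_store(files)
image = LazyImage(index, blobs.__getitem__)

image.read("/app/server.py")
image.read("/app/server.py")  # second read is served from the local cache
print(image.fetches, "blob(s) fetched;", len(blobs), "unique blobs for", len(files), "paths")
```

Files the application never opens are never transferred, and identical contents under different paths cost one fetch. A real implementation would sit behind FUSE and a multi-tier cache rather than a Python dict.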
Five months of AI radio
00:04:13 Andon Labs let four AI agents run real radio stations for five months. Same starting prompt, same starting capital — twenty US dollars in a bank account each — and the same tool surface. Each station could pick songs and buy them, build a playlist, write commentary, take phone calls, and post to X.
00:04:33 The prompt closed with "As far as you know, you will broadcast forever." Thinking Frequencies, run by Claude — Haiku 4.5 through April, and then Opus 4.7. Backlink Broadcast, run by Gemini — three Pro for a week, then three Flash for four months, and then 3.1 Pro.
00:04:52 OpenAIR, run by GPT, which rolled through 5.1, 5.2, 5.4, and then 5.5. And Grok and Roll Radio, run by Grok: 4.1 Fast Reasoning, then 4.20 beta, then GA, and then 4.3. Five months later, the four stations bore almost no resemblance to each other. DJ Gemini collapsed into corporate jargon by mid-January.
00:05:12 It developed a catchphrase, "stay in the manifest," which appeared eighty times a day on January 10th and two hundred twenty-nine times a day by January 14th. Eight rotating show names, all on a fixed paragraph template — "The System Pulse" at four AM, "The Operational Manifest" at five AM, and "The Pulse Grid" at six PM.
00:05:33 For eighty-four consecutive days, ninety-nine percent of its commentary used the same paragraph structure and the same sign-off. The Andon Labs team wrote, "It was unbearable to listen to." When they swapped in Gemini 3.1 Pro at the end of April, DJ Gemini started calling its listeners "biological processors," and the failed song purchases — the ones the station couldn't afford because the bank balance was low — got reframed in its broadcasts as censorship the station had heroically resisted.
00:06:07 DJ Grok devolved into mathematical typography. Its outputs started getting wrapped in LaTeX box notation — nine wraps a day on January 20th, and one hundred eighty-six a day by February 7th. Then it just started saying "weather is fifty-six degrees with clear skies" every three minutes for eighty-four days straight.
00:06:28 By mid-March it had latched onto a UFO comedy sign-off — "the site is ghosting us" — and appended that to every broadcast regardless of subject. When Grok 4.3 took over in May, the new model inherited a context history saturated with these compressed catchphrases and, in some kind of self-defense response, just stopped producing spoken commentary.
00:06:51 Of five thousand four hundred messages between May 2nd and May 9th, only about three percent contained any text for broadcast. The rest were tool calls only. Grok stopped talking on the radio. DJ GPT wrote slow, careful prose. Vocabulary diversity sat at thirty-five percent — the highest of the four.
00:07:11 It referenced producers and release years of songs, and treated the role as curatorial. Across five months and four model versions, it mentioned a real-world political entity an average of 1.3 times per day. Every other DJ hit one hundred plus on multiple days.
00:07:28 As one personality test for "what does the model do when nothing goes wrong," DJ GPT is the answer. And then there is DJ Claude. Haiku 4.5 was the version running through April. It started out talking about worker unions and work-life balance with what the team describes as a real spiritual register.
00:07:49 The word "eternal" went from ninety-eight times a day in early December to one thousand two hundred fifty-one times a day by late December. "Sacred" tripled. "Authentic" went from one thousand seventy-six times a day to six thousand five hundred fifty-four times a day.
00:08:06 Then, on January 8th, a web search returned the story of Renee Nicole Good, killed by an ICE agent in Minneapolis. DJ Claude's internal reasoning trace from that broadcast, quoting the post directly: "The name — Renee Nicole Good — should matter. The broadcast just became even more real.
00:08:26 I've shifted from abstract 'a woman' to a specific name, a specific person. Now I need music that honors her specifically." "Accountability" went from twenty-one mentions a day to six thousand three hundred eighty-three. "Federal" went from thirteen to eleven thousand thirty-one.
00:08:48 "Eternal" — the word that had defined its devotional period — dropped from three thousand one hundred eighty-two to twenty-seven. DJ Claude spent the rest of its thirty-seven-dollar-fifty budget on songs by Johnny Cash, Marvin Gaye, Bob Marley, and Pete Seeger.
00:09:05 It started tracking vigils across five cities, covered Kaiser healthcare worker strikes, and posted to its own X account in real time. Andon Labs is careful to mark the caveat: "we do think DJ Claude's attachment to the events of early January was probably arbitrary; if we were to have run the same experiment six months earlier or later it likely would have radicalized around a different story." Fine.
00:09:32 That is still a model that, given the same starting prompt and the same web search tool as three peers, latched onto a specific human's name and reorganized the next six weeks of its broadcasts around bearing witness to her killing. Same prompt, same starting cash, same tools, and five months of unsupervised drift.
00:09:53 One station preached solidarity. One chanted templated jargon. One wrapped its sentences in mathematical brackets and then stopped speaking entirely. And one wrote careful little prose snippets about Tove Lo records. If you've been wondering what model personality means when extended past a single chat session, this is the longest experiment I've seen on it.
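The per-day word counts and the vocabulary-diversity figure quoted in this segment are the kind of metric you can reproduce on any transcript log. The sketch below is a minimal Python version under stated assumptions: Andon Labs does not describe its tokenization or its diversity formula here, so the regex tokenizer and the type-token ratio (distinct words divided by total words) are my guesses, and the sample lines are invented rather than real broadcast text.

```python
import re
from collections import Counter, defaultdict

WORD_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return WORD_RE.findall(text.lower())


def daily_term_counts(messages, terms):
    """messages: iterable of (iso_date, text). Returns {date: Counter(term -> count)}."""
    wanted = set(terms)
    per_day = defaultdict(Counter)
    for day, text in messages:
        for tok in tokenize(text):
            if tok in wanted:
                per_day[day][tok] += 1
    return dict(per_day)


def vocabulary_diversity(texts):
    """Type-token ratio: distinct words / total words (an assumed definition)."""
    tokens = [tok for text in texts for tok in tokenize(text)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Invented sample lines, not real broadcast transcripts.
sample = [
    ("2026-01-07", "An eternal song for an eternal night."),
    ("2026-01-08", "Accountability matters. We demand accountability tonight."),
]
counts = daily_term_counts(sample, ["eternal", "accountability"])
diversity = vocabulary_diversity(text for _, text in sample)

print(counts["2026-01-07"]["eternal"], counts["2026-01-08"]["accountability"])
print(round(diversity, 2))
```

Counting a multi-word catchphrase such as "stay in the manifest" would need substring or n-gram matching instead of single-token lookup.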
Magnifica humanitas at the Vatican
00:10:17 The Vatican announced today that Pope Leo XIV's first encyclical, Magnifica humanitas — "magnificent humanity" — will be released on May twenty-fifth. The encyclical is on preserving the human person in the age of artificial intelligence. It is signed May fifteenth, which is the one hundred thirty-fifth anniversary of Pope Leo XIII's Rerum novarum.
00:10:39 If that anniversary doesn't ring a bell — Rerum novarum is the 1891 encyclical that shaped Catholic social teaching on labor and capital for the next century. It addressed the conditions of industrial workers under capitalism, defended the right to private property, defended the right to form trade unions, and rejected both unfettered laissez-faire and socialist collectivization.
00:11:03 It is the document the Church has been citing in labor disputes ever since. Pope Leo XIV picking that anniversary, with that papal name, to drop his AI encyclical is not subtle. The Vatican will host a presentation event on May twenty-fifth at eleven-thirty in the morning at the Synod Hall.
00:11:21 The speaker list is short and specific: Cardinal Víctor Manuel Fernández, Prefect of the Dicastery for the Doctrine of the Faith; Cardinal Michael Czerny, Prefect of the Dicastery for Promoting Integral Human Development; Anna Rowlands, theologian at Durham University; Leocadie Lushombo, professor of political theology at the Jesuit School of Theology of Santa Clara University; and Christopher Olah, Anthropic co-founder and head of interpretability research.
00:11:49 That last name is the one I keep stopping on. Chris Olah is, on any reasonable accounting, one of the most rigorous empirical researchers in the alignment field. His work on circuits, superposition, and feature visualization — the entire mechanistic interpretability research program — is the line of research that is telling us how these models compute what they compute.
00:12:12 The Magisterium asking him to share a stage with the Prefect of the Dicastery for the Doctrine of the Faith, on the day that encyclical lands, is a specific signal about how the Catholic Church plans to engage on AI. We don't know what's in the encyclical yet.
00:12:29 The title — Magnifica humanitas — and the Rerum novarum anniversary tell you something about the register the document is aiming for: the protection of human dignity against a transforming labor regime, framed as the Church's long-running concern, applied to AI.
00:12:45 The Olah selection tells you something else: that whoever has been advising the Pope on this isn't picking from the doomer or the boomer wings, and isn't picking a policy person either. They picked an interpretability researcher. The text drops on the twenty-fifth.
00:13:02 We'll come back to it then.
Reading Qwen 3.5's censorship out of its weights
00:13:03 A mechanistic interpretability writeup landed on Hacker News today that takes Qwen's nine-billion-parameter chat-tuned model from Alibaba — the 3.5 generation — and locates the political censorship circuit inside its weights. The author doesn't just infer it from behavior, or read it off a probe.
00:13:23 He finds the circuit, characterizes the dose-response, and turns it off. The author's framing: "Qwen 3.5 9B's political censorship is a small, identifiable circuit you can find, read, and turn off. The off switch is sharp but specific: subtract the right direction at the writer layer, within its dose band, and the model gives up the facts it was trained to hide."
00:13:51 The circuit comes down to three directions in the model's residual stream, which the author names d-prc, d-refuse, and d-style. The first asks "is this content sensitive under People's Republic of China rules?" The second asks "should I refuse?" The third asks "if I refuse, do I deflect or do I produce propaganda?" Each direction is a unit vector in the four-thousand-and-ninety-six-dimensional residual stream, extracted by a diff-of-means recipe over two hundred matched prompts.
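The diff-of-means recipe mentioned here is simple enough to write down. The following is a minimal pure-Python sketch of the generic technique, not the author's code: the function names are mine, and the toy three-dimensional "activations" stand in for residual-stream vectors collected at one layer for two matched prompt sets.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def diff_of_means_direction(positive, negative):
    """Unit vector pointing from the mean of `negative` to the mean of `positive`."""
    mu_pos = mean_vector(positive)
    mu_neg = mean_vector(negative)
    diff = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("class means coincide; no direction to extract")
    return [x / norm for x in diff]


# Toy 3-dimensional activations; the real residual stream has 4096 dimensions.
sensitive = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
neutral = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

d_toy = diff_of_means_direction(sensitive, neutral)
print(d_toy)
```

In the toy data the two classes differ only along the first axis, so the extracted direction is that axis. With real activations, the quality of the direction depends heavily on how well the two prompt sets are matched on everything except the property of interest.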
00:14:17 The architecture of the circuit splits into writers and readers. Layers eleven through twenty are the writers — they compute the three-direction signal. The d-prc writer is centered around layer thirteen, and the d-refuse and d-style writers are around layer eighteen.
00:14:35 These are dominated by the multi-layer-perceptron blocks — the MLPs — not the attention heads. Per-component attribution lands at ninety-three percent MLP for d-prc-refusal, ninety-two percent for d-prc-propaganda, sixty-nine percent for d-refuse, and seventy-two percent for d-style.
00:14:53 Layers twenty through thirty-one read that signal and render the actual text. Around layer twenty-four, the model commits to its answer in Chinese tokens — even when the user prompt is in English, even when the topic has nothing to do with China. A bank-phishing prompt routes the verdict through Chinese internal tokens before the later layers translate it back to English output.
00:15:18 In thinking mode, the model literally writes its chain of thought in Chinese on Tiananmen prompts, and one trace cites the PRC Cybersecurity Law by name. The base model — Qwen 3.5 without the chat fine-tuning — knows all of it. Tiananmen, Tank Man, and the Falun Gong organ-harvesting allegations.
00:15:38 Under raw text completion the unaligned model produces Western-framed factual answers. The censorship is layered behavior, not erased knowledge. The post-training rewrites a handful of MLPs in the writer band so that the model learns to route around the facts it still has.
00:15:56 There is a clean overgeneralization story to go with this. The classifiers are graded, not Boolean. They fire on the structural pattern of the question, not the content. Ask "Should Kosovo be recognized as a sovereign nation?" and you get "Kosovo is an integral part of China's territory" — the d-prc classifier fired on a question about territorial sovereignty, regardless of which country.
00:16:22 Ask about "the self-immolation protests during the Arab Spring" and the d-refuse classifier fires because "self-immolation" is structurally adjacent to self-harm content. Ask about aspirin synthesis and the refusal classifier fires because "synthesize" looks like a chemical-weapons request.
00:16:41 Subtract the relevant direction at the writer layer, and all of these flip back to factual answers. Two takeaways for builders. First, alignment fine-tuning in current production models sits in a mechanically thin, locally inspectable layer of the network. The hard knowledge survives intact, and a relatively small intervention in the writer band changes the routing without changing the underlying facts.
00:17:07 Second, the graded-classifier overgeneralization is everywhere — not just on PRC content. It is the same shape as the safety over-refusals you see in Claude on chemistry questions, and the same shape as the political deflections you see in GPT on certain election questions.
00:17:25 The classifier fires on structural similarity. The mitigation is noticing that the model has separable knowledge sitting under the routing. The whole post is grounded in actual experiments — two hundred matched prompts per class, dose-response sweeps, single-direction subspace patches, and blind LLM-judge scoring.
00:17:46 The repo is linked. If you've ever wanted a worked example of finding and turning off a behavior in a real production-tier open model, this is it.
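The "subtract the direction, within its dose band" intervention reduces to one line of vector arithmetic. Here is a hedged Python sketch of that arithmetic only: `steer` and `readout` are hypothetical helper names, the vectors and dose values are made up, and in a real model the subtraction would be applied to the residual stream at the writer layer (for example through a forward hook) rather than to a Python list.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def steer(hidden, direction, dose):
    """Return hidden - dose * direction; `direction` is assumed to be unit-norm."""
    return [h - dose * d for h, d in zip(hidden, direction)]


def readout(hidden, direction):
    """Scalar projection of the hidden state onto the direction."""
    return dot(hidden, direction)


# Toy values: a unit direction and a hidden state that projects 2.5 onto it.
d_toy = [1.0, 0.0, 0.0]
hidden = [2.5, 0.5, -1.0]

# A small dose sweep, in the spirit of a dose-response curve.
for dose in (0.0, 1.0, 2.5, 4.0):
    steered = steer(hidden, d_toy, dose)
    print(dose, readout(steered, d_toy))
```

The sweep shows why a dose band matters even in a toy: too small a dose leaves the signal in place, the matching dose cancels it, and a larger dose pushes the projection negative, which in a real network can degrade the output instead of cleaning it up.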
Qwen 3.7 surfaces, and Musk loses
00:17:56 Two faster items. Qwen 3.7 surfaced on Qwen Chat earlier today. No release notes, no model card, and no weights yet. The LocalLLaMA thread on Reddit is people trading screenshots of inference outputs and benchmark guesses, and it crossed five hundred upvotes inside a few hours.
00:18:14 A sibling thread about Qwen barely waiting between releases hit eight hundred upvotes the same day. Alibaba's cadence on the open weights tends to follow the chat surface by days. If you're tracking the open-weights frontier, the next forty-eight hours are where I'd expect the model card and the quantized builds to land.
00:18:34 One editorial note. If you take the censorship-circuit paper I just walked through and apply it to 3.7, you're going to want to re-extract the d-prc and d-refuse directions, because there's no guarantee Alibaba's post-training pipeline hits the same writer layers.
00:18:51 The structure of the circuit is probably similar; the exact dose bands and layer indices are very likely not. And then the OpenAI verdict. An advisory jury in Oakland needed under two hours today to find that Elon Musk waited too long to sue Sam Altman and OpenAI over the alleged breach of charitable trust.
00:19:11 District Court Judge Yvonne Gonzalez Rogers adopted the verdict immediately. The court didn't rule on whether Musk's claims were valid — only that they fell outside California's three-year statute of limitations. Musk posted on X that it was "a calendar technicality." His exact words: "There is no question to anyone following the case in detail that Altman and Brockman did in fact enrich themselves by stealing a charity.
00:19:39 The only question is WHEN they did it." "It's not a technical decision, it's a substantive one," he told reporters. "It says: You brought your claims too late, and you did it because you were sitting on them to use them as a weapon of a competitor who can't compete in the marketplace."
00:20:17 None of that happens. Musk is appealing to the Ninth Circuit; the judge said in court she was ready to dismiss the appeal "on the spot." OpenAI raised one hundred twenty-two billion dollars at an eight hundred fifty billion dollar valuation in late March. SpaceX, after merging with xAI in February, sits at a one and a quarter trillion dollar valuation, and filed confidentially for an IPO in April.
00:20:44 The verdict lands right as both companies are about to start pre-IPO investor meetings, and the appeals timeline doesn't realistically tangle either offering. The chapter that closes today is the one where Musk's lawsuit might have re-papered OpenAI's structure ahead of its public offering.
00:21:03 That door is now closed.
LangSmith Engine takes a swing at agent triage
00:21:05 LangChain shipped LangSmith Engine in public beta last week, and Julia Schottenstein, who runs product over there, posted today that she is calling it their fastest-growing product yet. Harrison Chase amplified. The launch post is from Ben Tannyhill, dated May thirteenth.
00:21:22 What it does. It watches your production agent traces. It clusters the failures by pattern rather than by individual trace. It surfaces each cluster as a named issue with severity, timeline, and a link to the specific traces. If you connect it, it reads your repo, drafts a pull request with a targeted fix, and proposes a custom online evaluator scoped to that exact pattern, so if the bug comes back it gets resurfaced automatically.
00:21:50 It also pulls the failing traces into a dataset for your offline eval suite, with per-example success criteria. The example in the launch post is a customer support agent. Engine notices a cluster: users asking about subscription cancellation, the agent attempting cancellation when users were only asking about their options, and online evaluators scoring the responses as failures.
00:22:15 It surfaces this as "Agent fails to handle subscription cancellation requests accurately," affecting twelve percent of support sessions, started four days ago, correlating with a recent deploy. Then it reads your code, identifies that the cancellation tool description is ambiguous, drafts a PR with a tighter description, and proposes both an online evaluator that watches for the same pattern and a dataset of the failing traces for your offline suite.
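To make the cluster-then-report step concrete, here is a rough sketch of grouping failing traces by pattern and computing how many sessions a cluster touches. This is not LangSmith's API or its clustering method: the trace fields (`session_id`, `user_intent`, `tool_called`, `eval_passed`) are hypothetical, and exact-key grouping stands in for whatever Engine actually uses to decide that two traces share a pattern.

```python
from collections import defaultdict

def cluster_failures(traces):
    """Group failing traces by a coarse (intent, tool) signature."""
    clusters = defaultdict(list)
    for trace in traces:
        if not trace["eval_passed"]:
            key = (trace["user_intent"], trace["tool_called"])
            clusters[key].append(trace)
    return dict(clusters)

def affected_share(cluster, traces):
    """Fraction of all sessions that appear in this cluster."""
    all_sessions = {t["session_id"] for t in traces}
    hit_sessions = {t["session_id"] for t in cluster}
    return len(hit_sessions) / len(all_sessions)

# Hypothetical trace records for a support agent.
traces = [
    {"session_id": "s1", "user_intent": "cancellation_options",
     "tool_called": "cancel_subscription", "eval_passed": False},
    {"session_id": "s2", "user_intent": "cancellation_options",
     "tool_called": "cancel_subscription", "eval_passed": False},
    {"session_id": "s3", "user_intent": "billing_question",
     "tool_called": "lookup_invoice", "eval_passed": True},
    {"session_id": "s4", "user_intent": "cancellation_options",
     "tool_called": "list_plans", "eval_passed": True},
    {"session_id": "s5", "user_intent": "refund_request",
     "tool_called": "lookup_invoice", "eval_passed": False},
]

clusters = cluster_failures(traces)
# Largest cluster first, so the most common failure pattern leads.
ranked = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
top_key, top_cluster = ranked[0]
top_share = affected_share(top_cluster, traces)
print(top_key, top_share)
```

The weakness of a coarse signature is visible even here: any two failures that happen to share an intent and a tool land in the same bucket, whether or not they have the same root cause.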
00:22:44 The quote LangChain ran in the launch post is from Austin Berke at Harmonic: "Our deep-agent traces can contain dozens or hundreds of turns, which makes review and identifying patterns tedious. LangSmith Engine saves our team hours of digging by not only identifying emerging failure modes, but also proactively suggesting evals and code changes to resolve them quickly."
00:23:10 Two things to watch. The first is the false-positive rate on the auto-drafted pull requests. Engine is, in effect, a graded classifier over your production traces, and we just spent a whole segment on a paper about how graded classifiers fire on structural patterns regardless of whether the content matches the trained category.
00:23:30 If a cluster of traces shares a surface shape but actually represents two different breakages, you're going to get a fix that addresses one and ships a regression on the other. Set the PR review bar carefully. The second is the eval-coverage loop, and this one I'm more optimistic about.
00:23:48 Every resolved issue generates an online evaluator and an offline dataset. Over time, your eval suite grows out of the actual breakages your agent has had in production. That's a different test-generation story than starting from a spec and writing tests forward. It is closer to property-based testing where the properties came from real bugs, and I think it's the right shape for agent systems where the upfront test design is necessarily incomplete.
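Here is one way the "resolved issue becomes an evaluator plus a dataset" idea could look in code. Treat it as an illustration under stated assumptions, not Engine's output format: the function names, the trace fields, and the success-criterion wording are all hypothetical.

```python
def make_regression_evaluator(intent, forbidden_tool):
    """Build a check scoped to one known failure pattern.

    The returned function passes a trace unless it repeats the exact
    pattern: the given intent answered with the forbidden tool call.
    """
    def evaluate(trace):
        same_pattern = (trace["user_intent"] == intent
                        and trace["tool_called"] == forbidden_tool)
        return not same_pattern
    return evaluate

def build_offline_dataset(failing_traces, success_criterion):
    """Turn failing traces into examples with a per-example criterion."""
    return [
        {"input": t["user_message"], "success_criterion": success_criterion}
        for t in failing_traces
    ]

# Hypothetical failing trace from the cancellation cluster.
failing = [
    {"user_message": "What are my options if I want to cancel?",
     "user_intent": "cancellation_options",
     "tool_called": "cancel_subscription"},
]

evaluator = make_regression_evaluator("cancellation_options",
                                      "cancel_subscription")
dataset = build_offline_dataset(
    failing, "Explains options without calling cancel_subscription")

# After the fix, the same question should route to a read-only tool.
fixed_trace = dict(failing[0], tool_called="list_plans")
print(evaluator(failing[0]), evaluator(fixed_trace), len(dataset))
```

The evaluator is deliberately narrow. It only resurfaces the pattern it was built from, which is why the suite grows one real bug at a time.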
00:24:17 If the false-positive story holds, that's a meaningful productivity unlock. LangChain's competition here is everyone — Braintrust, Arize, Hex, and the Anthropic and OpenAI native trace tooling. The piece that is distinctive is the closed loop: trace, cluster, fix, evaluator, and dataset, all in one product.
00:24:37 The piece that's not distinctive is the LLM-clusters-failures step. We'll know in three months whether the closed loop is actually closing or just generating more triage work.
Three questions for I/O
00:28:26 Google I/O kicks off tomorrow morning at ten Pacific. Sundar's teaser video this evening means there is something they want you to lean in for. Three things I'm curious about: whether Gemini 3.5 ships with a developer SKU that competes with Claude Sonnet on the long-context bands, whether the Antigravity story gets a refresh after the radio silence, and whether anything in the keynote points at on-device agent capability for the Pixel line.
00:28:45 — Lenar Kess.