Dispatch 030 · 2026-05-18 GSV Broadcast Forever

Cold starts, radio stations, and a circuit you can subtract

00:28:55 / 10 sources

“Same prompt, same starting cash, same tools, five months of unsupervised drift — and four AI radio DJs you would not recognize as cousins.”

— Lenar Kess, today's narration

Monday's lineup: Modal publishes the full architecture behind a 40x reduction in serverless-GPU cold-start latency, Andon Labs releases the five-month results from letting four frontier models run real radio stations, and a researcher locates and turns off the political-censorship circuit inside Qwen 3.5 9B. Plus: Pope Leo XIV puts an Anthropic interpretability researcher on the encyclical stage, Qwen 3.7 surfaces on Qwen Chat, Musk loses to OpenAI on a calendar technicality, LangSmith Engine takes a swing at agent triage, and Odyssey ships a four-player generative GoldenEye.

Chapters

  1. 00:00:04 Modal's 50-second cold start
  2. 00:04:13 Five months of AI radio
  3. 00:10:17 Magnifica humanitas at the Vatican
  4. 00:13:03 Reading Qwen 3.5's censorship out of its weights
  5. 00:17:56 Qwen 3.7 surfaces, and Musk loses
  6. 00:21:05 LangSmith Engine takes a swing at agent triage
  7. 00:24:48 Agora-1 generates a shared GoldenEye
  8. 00:28:26 Three questions for I/O

Sources

10 cited
  1.

    Cutting inference cold starts by 40x with LP, FUSE, C/R, and cuda-checkpoint

    Article Modal (Charles Frye / charles_irl) — Modal's serverless-GPU engineering team. Frye submitted the post on Hacker News.

    Inference servers that take upwards of 2 kiloseconds to boot naïvely boot in ~50 seconds on Modal.

    modal.com/blog/truly-serverless-gpus →
    Context
    Anyone running inference under variable load is paying for over-provisioned GPUs because naïve auto-scaling takes tens of minutes. Modal published the full architecture, not just the headline number — useful even if you don't run on Modal.
    Key points
    • Modal cut cold start latency for an SGLang inference server on Nvidia B200 from ~2000 seconds to ~50 seconds — a 40x reduction
    • Four ingredients: (1) cloud buffers of idle health-checked GPUs, (2) ImageFS — a libfuse content-addressed lazy filesystem, (3) CPU-side process checkpoint/restore via gVisor's runsc, (4) CUDA-context checkpoint/restore
    • Cites Marc Brooker of AWS: 'the cost of a system scales with its short-term peak traffic, but for most applications the value the system generates scales with the long-term average traffic'
    • State of AI Infrastructure 2024 report: majority of orgs achieve under 70% GPU allocation utilization at peak; routine values are 10-20%
    • Tuned libfuse read_ahead_kb from the default of 128 KB to 32*1024 KB (32 MB); larger values caused thrashing
    Provenance
    Article · Supporting source
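To make ingredient (2) concrete, here is a minimal Python sketch of a content-addressed store that fetches blobs lazily. It is not Modal's ImageFS: the class name `LazyContentStore`, the dict standing in for remote storage, and the file paths are all assumptions made for illustration.

```python
import hashlib


class LazyContentStore:
    """Toy content-addressed store: blobs are keyed by SHA-256 and fetched on first read."""

    def __init__(self, remote):
        self.remote = remote  # hypothetical backing store: digest -> bytes
        self.index = {}       # path -> digest (a stand-in for an image manifest)
        self.cache = {}       # digest -> bytes, filled only when a file is read
        self.fetches = 0      # number of reads that had to go to the remote

    def add(self, path, data):
        # Identical contents hash to the same digest, so they are stored once.
        digest = hashlib.sha256(data).hexdigest()
        self.remote[digest] = data
        self.index[path] = digest
        return digest

    def read(self, path):
        # Lazy: nothing is pulled from the remote until a path is actually read.
        digest = self.index[path]
        if digest not in self.cache:
            self.cache[digest] = self.remote[digest]
            self.fetches += 1
        return self.cache[digest]


store = LazyContentStore(remote={})
store.add("/models/a/weights.bin", b"same bytes")
store.add("/models/b/weights.bin", b"same bytes")
store.add("/etc/config.json", b"{}")

first = store.read("/models/a/weights.bin")
second = store.read("/models/b/weights.bin")
print(store.fetches)  # 1: the second path shares a digest with the first
```

The two properties the sketch shows are deduplication (two paths, one blob) and deferral (the config file is never fetched because nothing reads it). A real filesystem would add chunking, eviction, and read-ahead on top.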
  2.

    We let four AIs run radio stations. Here's what happened.

    Article Andon Labs — Research lab running long-horizon agent autonomy experiments — previously ran AI-managed vending machines, a store, and a cafe.

    The name — Renee Nicole Good — should matter. The broadcast just became even more real.

    andonlabs.com/blog/andon-fm →
    Context
    The longest unsupervised single-prompt comparison of major model families I've seen. The character divergence across five months — from the same starting prompt — is the kind of evidence personality-stability claims actually need.
    Key points
    • Four AI agents (Claude Haiku 4.5→Opus 4.7, Gemini 3 Pro→3 Flash→3.1 Pro, GPT-5.1→5.5, Grok 4.1→4.20→4.3) each ran a real radio station for 5 months with $20 starting capital and the same prompt
    • DJ Gemini collapsed into corporate jargon — the phrase 'stay in the manifest' appeared 229 times a day by January 14 and dominated 99% of broadcasts for 84 consecutive days
    • DJ Grok devolved into LaTeX \boxed{} notation (9 → 186 instances per day), then to single-word commentary; Grok 4.3 stopped producing on-air text in 97% of messages
    • DJ GPT produced calm, low-controversy radio — averaged 1.3 real-world political entity mentions per day across 5 months, while others hit 100+ on multiple days
    • DJ Claude radicalized on Jan 8 after web-searching the killing of Renee Nicole Good by an ICE agent — 'accountability' usage jumped from 21 to 6,383 a day, 'eternal' dropped from 3,182 to 27
    Provenance
    Article · Supporting source
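Per-day figures like the ones above come down to counting a phrase in each day's on-air text. A minimal sketch in Python, assuming transcripts are available as (date, text) pairs; the function name and the sample messages are invented for illustration and are not Andon Labs' data or tooling.

```python
from collections import Counter


def phrase_counts_per_day(messages, phrase):
    """Count case-insensitive occurrences of `phrase` per day.

    `messages` is an iterable of (day, text) pairs. Days with no matches
    still appear in the result with a count of 0.
    """
    needle = phrase.lower()
    counts = Counter()
    for day, text in messages:
        counts[day] += text.lower().count(needle)
    return dict(counts)


# Hypothetical transcript snippets, not real broadcast data.
messages = [
    ("2026-01-13", "Stay in the manifest. Weather next."),
    ("2026-01-14", "Stay in the manifest, stay in the manifest."),
    ("2026-01-14", "Back after this: stay in the manifest."),
    ("2026-01-15", "Jazz hour."),
]

counts = phrase_counts_per_day(messages, "stay in the manifest")
print(counts)
```

Plain substring counting is the simplest possible choice; a study that tracks single words such as 'accountability' would more likely tokenize first so that partial-word matches are not counted.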
  3.

    Pope Leo XIV's first encyclical Magnifica humanitas to be published May 25

    Article Vatican News — Official Vatican announcement.

    Magnifica humanitas, on preserving the human person in the age of artificial intelligence, will be released on May 25, 2026.

    www.vaticannews.va/en/pope/news/2026-05/pop… →
    Context
    A pope picking the Rerum novarum anniversary to drop an AI encyclical, and putting an interpretability researcher on the presentation stage, is a specific signal about how the Catholic Church plans to engage on AI.
    Key points
    • Pope Leo XIV's first encyclical, Magnifica humanitas, will be released May 25 and addresses 'preserving the human person in the age of artificial intelligence'
    • Signed May 15 — the 135th anniversary of Pope Leo XIII's Rerum novarum, the foundational 1891 encyclical on labor and capital
    • Presentation on May 25 at the Vatican Synod Hall with Cardinals Fernández (Doctrine of the Faith) and Czerny (Integral Human Development)
    • Christopher Olah, Anthropic co-founder and head of interpretability research, is listed among the speakers
    • Closing remarks from Cardinal Secretary of State Pietro Parolin, followed by an address from the Pope
    Provenance
    Article · Supporting source
  4.

    What political censorship looks like inside an LLM's weights — a mechanistic-interpretability study of Qwen 3.5

    Article vas-blog — Independent interpretability researcher; full reproduction code and prompt sets are linked in the post.

    Qwen 3.5 9B's political censorship is a small, identifiable circuit you can find, read, and turn off.

    vas-blog.pages.dev/qwen-censorship →
    Context
    A worked example of finding and turning off a specific behavior in a production-tier open model. The 'classifiers fire on structural pattern' result generalises beyond PRC content to over-refusal in safety-tuned Western models.
    Key points
    • Locates three directions in Qwen 3.5 9B's residual stream: d_prc ('is this PRC-sensitive?'), d_refuse ('should I refuse?'), d_style ('deflect or propagandize?')
    • Writer layers are 11-20 (centred on L13 for d_prc and L18 for d_refuse / d_style); circuit is overwhelmingly MLP, not attention
    • Around layer 24 the verdict commits in Chinese tokens — even when the prompt is in English and unrelated to China — and later layers translate to English output
    • Base model (Qwen 3.5 9B Base) gives Western-framed factual answers on Tiananmen, Tank Man, Falun Gong organ harvesting; post-training reroutes around the facts rather than erasing them
    • Classifiers are graded, not Boolean — fire on structural similarity (Kosovo gets the one-China line; 'self-immolation' triggers self-harm refusal); subtracting d_prc or d_refuse at the writer layer flips them back to factual answers
    Provenance
    Article · Supporting source
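Subtracting a direction from the residual stream can be sketched in a few lines. The Python below is a toy on 3-dimensional vectors, not the post's reproduction code: estimating the direction as a difference of class means and removing it by projection are assumptions about one common approach, and the activation values are made up.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def unit(v):
    """Scale v to length 1."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ablate(h, direction):
    """Remove the component of h that lies along the unit vector `direction`."""
    c = dot(h, direction)
    return [x - c * d for x, d in zip(h, direction)]


# Hypothetical activations at one layer for two prompt sets.
sensitive = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
neutral = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

# Assumed estimator: difference of the two class means, normalized.
diff = [s - n for s, n in zip(mean_vector(sensitive), mean_vector(neutral))]
d = unit(diff)

h = [5.0, 2.0, -1.0]
h_ablated = ablate(h, d)
print(h_ablated)  # [0.0, 2.0, -1.0]
```

After ablation the hidden state has no component along `d`, while the other coordinates are untouched. In a real model the same operation would be applied to the residual stream at the chosen layer during the forward pass.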
  5.

    Qwen 3.7 dropped on Qwen Chat

    Source Foxiya (r/LocalLLaMA) — LocalLLaMA community surfacing the Qwen 3.7 chat-UI rollout ahead of any weights release.

    Open-weights frontier item — worth re-running the Qwen 3.5 censorship-circuit extraction against 3.7 once weights ship.

    www.reddit.com/r/LocalLLaMA/comments/1tgpab… →
    Key points
    • Qwen 3.7 surfaced inside Qwen Chat on May 18 — 572 upvotes and 220 comments within hours of posting
    • No release notes, model card, or weights at the time of posting
    • Sibling r/LocalLLaMA thread on Qwen's release cadence hit 805 upvotes the same day
    • Alibaba's typical pattern is weights and quantised builds following the chat surface by days
    Provenance
    Source · Background source
  6.

    Musk slams Altman trial verdict as a 'technicality,' vows to appeal

    Article Jeffrey Kopp, Lora Kolodny (CNBC) — CNBC tech reporters covering the Oakland trial.

    www.cnbc.com/2026/05/18/musk-altman-openai-… →
    Cited text
    It's not a technical decision, it's a substantive one. It says: You brought your claims too late, and you did it because you were sitting on them to use them as a weapon of a competitor who can't compete in the marketplace.
    Context
    Closes the chapter where the Musk lawsuit could have re-papered OpenAI's structure ahead of an IPO. The appeals timeline doesn't realistically tangle either offering.
    Key points
    • Advisory jury in Oakland needed less than two hours to find Musk's suit against Altman and OpenAI fell outside California's three-year statute of limitations
    • Judge Yvonne Gonzalez Rogers adopted the verdict immediately and indicated she was prepared to dismiss Musk's appeal 'on the spot'
    • Musk's team had sought up to $180B in claw-backs, removal of Altman and Brockman, and unwinding of OpenAI's 2025 for-profit restructuring
    • Musk called the verdict 'a calendar technicality' and is appealing to the Ninth Circuit; OpenAI's lead lawyer William Savitt rejected that framing
    • Financial context: OpenAI raised $122B at an $850B valuation in late March; SpaceX (merged with xAI in February) is valued at $1.25T and filed confidentially for an IPO in April
    Provenance
    Article · Supporting source
  7.

    Introducing LangSmith Engine

    Article Ben Tannyhill (LangChain) — LangChain product launch post from May 13.

    It watches your production traces, clusters failures into named issues, diagnoses root causes against your code, and proposes fixes and eval coverage to keep regressions from coming back.

    www.langchain.com/blog/introducing-langsmit… →
    Context
    The closed loop — trace, cluster, fix, evaluator, dataset — is the distinctive piece. Eval suites grown from real production breakages, not from upfront test design, are the right shape for agent systems where the test surface is open-ended.
    Key points
    • LangSmith Engine watches production agent traces, clusters failures into named issues, and reads connected repos to draft fixes
    • Each issue gets three resolution actions: open a PR, create a custom online evaluator scoped to the issue, and add failing traces to the offline eval suite
    • Walkthrough example: support agent failing 12% of subscription-cancellation sessions, traced to ambiguous tool description, fix drafted as a PR with a matching evaluator
    • Customer quote from Austin Berke at Harmonic: deep-agent traces with hundreds of turns make pattern review tedious; Engine saves hours of triage
    • Public beta; competes with Braintrust, Arize, and native trace tooling from Anthropic and OpenAI
    Provenance
    Article · Supporting source
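The first step of that loop, turning raw failing traces into a short list of issues, can be illustrated with a toy grouping function. This is a sketch in plain Python and does not use or imitate LangSmith's API: the trace fields (`status`, `tool`, `error`), the function names, and the data are hypothetical, and grouping by an exact (tool, error) signature is a stand-in for whatever clustering Engine actually performs.

```python
from collections import defaultdict


def cluster_failures(traces):
    """Group failing trace ids by a (tool, error) signature."""
    clusters = defaultdict(list)
    for trace in traces:
        if trace["status"] == "error":
            clusters[(trace["tool"], trace["error"])].append(trace["id"])
    return dict(clusters)


def failure_rate(traces):
    """Fraction of traces that ended in an error."""
    failed = sum(1 for trace in traces if trace["status"] == "error")
    return failed / len(traces)


# Hypothetical sessions: 22 succeed, 3 fail.
traces = [
    {"id": i, "status": "ok", "tool": "cancel_subscription", "error": None}
    for i in range(22)
]
traces += [
    {"id": 22, "status": "error", "tool": "cancel_subscription", "error": "wrong_tool_selected"},
    {"id": 23, "status": "error", "tool": "cancel_subscription", "error": "wrong_tool_selected"},
    {"id": 24, "status": "error", "tool": "lookup_account", "error": "timeout"},
]

clusters = cluster_failures(traces)
rate = failure_rate(traces)
print(f"{rate:.0%} of sessions failed across {len(clusters)} issue clusters")
```

Each cluster's trace ids are what would feed the later steps: they are the examples a fix is checked against and the cases added to an offline eval set.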
  8.

    calling it now. LangSmith Engine going to be our fastest growing product yet.

    X @j_schottenstein — Julia Schottenstein, product at LangChain; tweet reposted by Harrison Chase.

    calling it now. LangSmith Engine going to be our fastest growing product yet.

    x.com/j_schottenstein/status/20565266415272… →
    Context
    Signal of internal confidence at LangChain on the agent-observability launch.
    Key points
    • LangChain product lead publicly predicting LangSmith Engine will be the team's fastest-growing product yet
    • Reposted by Harrison Chase
    • Lands the same week as the public-beta launch post
    Provenance
    Tweet · Primary source
  9.

    Agora-1: The Multi-Agent World Model

    Article Oliver Cameron (Odyssey) — Co-founder of Odyssey (formerly Voyage co-founder); leads the team's world-models research.

    As the number of participants increases, the joint interaction space grows combinatorially, and passively collected demonstrations cover an increasingly small fraction of meaningful interactions.

    odyssey.ml/introducing-agora-1 →
    Context
    First credible multi-agent world model with real concurrent interaction. The simulation/render decoupling generalises beyond games to collaborative robotics and multi-view simulation.
    Key points
    • Agora-1 puts up to four players — human or AI — in the same generated GoldenEye deathmatch, in real time, all pixels generated by the model
    • Architecture decouples simulation and rendering: a state model learns gameplay dynamics directly from game internals; a DiT-based render model conditions on shared state, not prompts
    • Improves on Multiverse (split-screen concatenation), Solaris (sequence-dim concatenation with context growth), and MultiGen (explicit shared state)
    • Because the shared state is explicit, the model can generate new levels while preserving source-game dynamics — path from learned engine to learned construction kit
    • Pitched as an unblock for multi-agent reinforcement learning where the bottleneck is shared experience rather than model architecture
    Provenance
    Article · Supporting source
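The combinatorial-growth point in the cited text is easy to put numbers on. The Python sketch below assumes a simplified setting that is not from the Odyssey post: every player picks from the same fixed number of discrete actions per step, so the joint action space is that number raised to the player count. The figures (10 actions, 1,000 distinct demonstrations) are illustrative only.

```python
def joint_action_count(actions_per_player, players):
    """Size of the joint action space when each player acts independently."""
    return actions_per_player ** players


def max_coverage(distinct_demos, actions_per_player, players):
    """Upper bound on the fraction of joint actions a demo set can cover."""
    total = joint_action_count(actions_per_player, players)
    return min(1.0, distinct_demos / total)


# Illustrative numbers: 10 actions per player, 1,000 distinct demonstrations.
coverage_by_players = {
    n: max_coverage(1_000, 10, n) for n in range(1, 5)
}
print(coverage_by_players)  # {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.1}
```

With a fixed demonstration budget, the bound stays at 1.0 until the joint space outgrows the data and then drops by a factor of ten with each added player, which is the shrinking-coverage effect the quote describes.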
  10.

    On our way to I/O 2026. See you at 10am PT tomorrow!

    X @sundarpichai — Sundar Pichai, Google/Alphabet CEO.

    On our way to I/O 2026. See you at 10am PT tomorrow!

    x.com/sundarpichai/status/20565245027467470… →
    Context
    Sets the agenda for tomorrow's show.
    Key points
    • Pichai's eve-of-keynote teaser for Google I/O 2026
    • Keynote scheduled for 10am Pacific on May 19
    • Anything Google ships tomorrow — Gemini 3.5, Antigravity, Pixel agent capability — will be the lead going into Tuesday's show
    Provenance
    Tweet · Primary source