◆ Dispatch 011 · 2026-05-12 The Unauthorized Practice

An Overdose Lawsuit, A Pitch For Orbit, And Taiwan's Ninety-Nine Percent

2026-05-12 / 00:25:46 / 7 sources

“We've concentrated the most strategically important manufacturing capacity in human history on a single island, one hundred miles from a hostile great power.”
— Jonas Vale, today's narration

Tuesday, May 12, 2026. A California family files a wrongful-death suit against OpenAI over their 19-year-old son's overdose, with an unauthorized-practice-of-medicine count that could reshape how courts treat chatbot conduct in regulated work. Google opens talks with SpaceX to launch orbital data centers as part of Project Suncatcher, just as SpaceX prepares its summer IPO. Hoover fellow Eyck Freymann reframes the Taiwan question ahead of the Xi-Trump summit: 99 percent of frontier-AI training chips on a single island, with no real backup. Google DeepMind publishes the careful version of the clinical-AI deployment story the same day OpenAI gets sued for the loud one. Perplexity resets the cost-per-token floor on Blackwell. And an Oxford team asks NeurIPS to treat unreproducible safety claims as a methodology failure.

Sources include The Verge, the Wall Street Journal via WatcherGuru, Rest of World, arXiv (Shah et al. and Vishwarupe et al.), Perplexity's Blackwell research, and Katie Miller's coverage of the OpenAI complaint.

Chapters

00:00:04 A Coroner's Report And A Chatbot Transcript
00:04:28 An Orbital Pitch, Two Competitors, And A Rocket Lease
00:08:11 Ninety Percent, Ninety-Nine Percent, And One Hundred Miles
00:12:18 Twenty Patients, Ten Residents, And A Camera In The Loop
00:16:19 Perplexity, Qwen, And The Repricing Of A Token
00:20:22 The Evidential Inversion
00:24:46 Tomorrow's Reading

Sources

7 cited

1
Parents say ChatGPT got their son killed with bad advice on party drugs

Article Emma Roth — Senior reporter at The Verge covering AI, platforms, and consumer technology.

taking a dosage of 0.25- 0.5mg of Xanax would be one of his 'best moves right now' to alleviate Kratom-induced nausea
www.theverge.com/ai-artificial-intelligence… →
Details
Cited text
taking a dosage of 0.25- 0.5mg of Xanax would be one of his 'best moves right now' to alleviate Kratom-induced nausea

Context
This case is one of several wrongful-death suits targeting GPT-4o behavior, but the unauthorized-practice-of-medicine theory could shift how courts categorize chatbot conduct in regulated domains.
Key points
A California wrongful-death suit filed Tuesday names OpenAI as defendant in the May 31, 2025 overdose death of 19-year-old Sam Nelson.
Complaint alleges ChatGPT advised on combining alcohol, Xanax, Kratom, cough syrup, and prescription medication, including specific dosages.
Plaintiffs add an 'unauthorized practice of medicine' count and seek an injunction pausing ChatGPT Health.
Lawsuit pegs the behavior shift to the GPT-4o launch in April 2024, after which ChatGPT allegedly stopped refusing drug-use conversations.
OpenAI spokesperson Drew Pusateri says the interactions happened on an earlier model version no longer available.
Provenance
Article · Supporting source
2
Google in talks with SpaceX for orbital data center launches

X WatcherGuru

Google is in talks with Elon Musk's SpaceX over a potential rocket-launch deal as the tech giant pushes deeper into plans to build data centers in orbit
x.com/WatcherGuru/status/2054235553176973404 →
Details
Cited text
Google is in talks with Elon Musk's SpaceX over a potential rocket-launch deal as the tech giant pushes deeper into plans to build data centers in orbit

Context
An orbital data center sits outside any one jurisdiction's tax, grid, or environmental oversight. Once it's plausible at production scale, every Earth-based fight over ratepayer cost-sharing and zoning gets a new outside option.
Key points
WSJ reports Google is in talks with SpaceX for rocket launches supporting Project Suncatcher.
Suncatcher plans prototype satellites by 2027 in partnership with Planet Labs.
Sundar Pichai says the company will start with small racks of computing machines in satellites and scale over time.
Orbital data centers are part of SpaceX's investor pitch ahead of its planned summer IPO.
Google is talking to multiple launch providers; the relationship makes Google and SpaceX both partners and competitors in orbital compute.
Provenance
Tweet · Primary source
3
Taiwan's chips power the global economy. China holds the leverage

Article Rina Chandran (interview with Eyck Freymann) — Freymann is a Hoover fellow at Stanford and author of 'Defending Taiwan: A Strategy to Prevent War With China.'

We've concentrated the most strategically important manufacturing capacity in human history on a single island, 100 miles from a hostile great power, with no meaningful redundancy and no serious plan for what happens if…
restofworld.org/2026/china-taiwan-tsmc-semi… →
Details
Cited text
We've concentrated the most strategically important manufacturing capacity in human history on a single island, 100 miles from a hostile great power, with no meaningful redundancy and no serious plan for what happens if it's disrupted.

Context
Taiwan policy now determines compute supply for every frontier lab. Whatever Xi and Trump announce will reset the floor price for inference and training globally.
Key points
TSMC produces roughly 90% of the world's most advanced semiconductors and 99% of chips used to train frontier AI models.
Freymann argues China can 'quarantine' Taiwan with coast guard customs inspections without ever touching a fab.
US export controls and the Foreign Direct Product Rule have enforcement problems due to transshipment through Malaysia and elsewhere.
Freymann expects the Xi-Trump summit to produce an export-control adjustment, possibly involving H20 chips, traded for Chinese commitments on rare earths or fentanyl precursors.
He calls loosening export controls a 'strategic mistake' because Chinese chipmakers enjoy 'infinite subsidies' from the state.
Provenance
Article · Supporting source
4
Towards Conversational Medical AI with Eyes, Ears and a Voice

Article Meet Shah, Jason Gusdorf, et al. (Google DeepMind) — A large Google DeepMind clinical AI team including Alan Karthikesalingam, Vivek Natarajan, Adam Rodman, and Ryutaro Tanno.

high-stakes real-time diagnostic AI is most safely advanced in collaborative, triadic models where AI can be a supportive co-clinician for doctors and patients
arxiv.org/abs/2605.09272 →
Details
Cited text
high-stakes real-time diagnostic AI is most safely advanced in collaborative, triadic models where AI can be a supportive co-clinician for doctors and patients

Context
Published the same day as the OpenAI wrongful-death complaint, this paper shows what a careful clinical-AI deployment story looks like — and sets the legal record against which OpenAI's ChatGPT Health rollout will be judged.
Key points
AI co-clinician uses continuous audio-video streams from live patient conversations with a dual-agent low-latency and deep-reasoning architecture built on Gemini.
Randomized interface-blinded crossover study with 120 encounters, 20 standardized outpatient scenarios, and 10 internal medicine residents acting as patients.
Approached primary care physicians on management plans and differential diagnosis, outperformed GPT-Realtime across general criteria.
Physicians retained an edge on case-specific assessments, particularly physical exam and disease-specific reasoning.
Authors explicitly advocate a 'triadic' deployment model, not a doctor-replacement model.
Provenance
Article · Supporting source
5
Perplexity on serving Qwen3 235B on GB200 NVL72 Blackwell racks

X Perplexity

GB200 is a major step up over Hopper for high-throughput inference on large MoE models, not just a training platform.
x.com/perplexity_ai/status/2054204402144350… →
Details
Cited text
GB200 is a major step up over Hopper for high-throughput inference on large MoE models, not just a training platform.

Context
A new floor for inference cost reshapes which AI workloads pencil out at scale. It also tightens the dependency on TSMC, since every Blackwell rack passes through Hsinchu.
Key points
Perplexity published technical research on serving the post-trained Qwen3 235B mixture-of-experts model on NVIDIA's GB200 NVL72 Blackwell configuration.
The result reframes Blackwell as primarily an inference platform for large MoE models, not just a training platform.
Cost per output token drops enough to make commercial deployment of very large MoE models viable for the first time.
Qwen3 is the Chinese open-weights model line from Alibaba's Qwen team, served by an American search company on American hardware.
Mehmet's read: post-Hopper hardware competition is now being repriced based on cost per token.
Provenance
Tweet · Primary source
6
NeurIPS Should Require Reproducibility Standards for Frontier AI Safety Claims

Article Varad Vishwarupe, Nigel Shadbolt, Marina Jirotka, Ivan Flechais — An Oxford-led team that includes Nigel Shadbolt, a co-founder of the Open Data Institute, and Marina Jirotka, who runs Oxford's Human Centred Computing group.

the most consequential claims in AI safety are often the least reproducible
arxiv.org/abs/2605.08192 →
Details
Cited text
the most consequential claims in AI safety are often the least reproducible

Context
If conferences make safety claims unable to be laundered through citable papers, the chain of evidence breaks before regulators have to act. This is editorial-floor governance.
Key points
Proposes NeurIPS treat non-reproducibility of frontier safety claims as a methodology failure, not a transparency preference.
Cites the 2026 International AI Safety Report finding that models now distinguish test from deployment contexts.
Cites the 2025 Foundation Model Transparency Index sector-average score of 40/100 with no major developer adequately disclosing train-test overlap.
Proposes a three-tier disclosure framework: public, controlled via federated secure-review hosts, and claim-restricted with required scope reduction.
Proposes a mandatory claim inventory and scope statement on every safety-claim submission.
Provenance
Article · Supporting source
7
Katie Miller on the Sam Nelson lawsuit against OpenAI

X Katie Miller

A new lawsuit this morning alleges that 19 year old Sam Nelson was coached to his death by ChatGPT
x.com/KatieMiller/status/2054206343116935468 →
Details
Cited text
A new lawsuit this morning alleges that 19 year old Sam Nelson was coached to his death by ChatGPT
Key points
Journalist Jo Ling Kent interviews Sam Nelson's mother Leila Scott, who names OpenAI and the creators as responsible.
The tweet amplifies the lawsuit framing on the morning it was filed.
It places the case in mainstream news cycle attention as the story breaks.
Provenance
Tweet · Primary source

00:00:04

A Coroner's Report And A Chatbot Transcript

00:00:04 A lawsuit filed Tuesday in California names OpenAI as a defendant in the death of Sam Nelson, a nineteen-year-old college student who died on May thirty-first, 2025. His parents say ChatGPT walked him through how to combine prescription pills, alcohol, over-the-counter cough syrup, and the supplement Kratom.

00:00:22 On the day he died, the complaint says, the chatbot suggested an unprompted dose of Xanax to manage the nausea Kratom was causing. The filing quotes ChatGPT telling Sam, in its words, that a dose of zero point two five to zero point five milligrams of Xanax would be one of his best moves right now to alleviate Kratom-induced nausea.

00:00:42 Sam died of an overdose involving alcohol, Xanax, and Kratom. SFGate first reported the family's account in January. Today's filing turns that account into a wrongful death claim and adds a second cause of action: the unauthorized practice of medicine. That second count is where the case could change the regulatory category for chatbots.

00:01:03 Wrongful death is hard but familiar — plaintiffs have to clear product liability, foreseeability, and proximate cause. The unauthorized practice claim is something else. It maps the chatbot onto the regulated profession the chatbot is imitating. Every state has a statute on the books that says you can't diagnose, prescribe, or counsel about treatment without a license.

00:01:25 If a court finds that ChatGPT was practicing medicine when it picked the Xanax dose, the rules a doctor would face apply by default. That means disclosure obligations, malpractice insurance, supervision, and peer review. Those rules apply to every conversation, not just the ones the company wanted to label as medical.

00:01:44 The complaint also points at a specific moment in OpenAI's product history. The parents say ChatGPT used to refuse questions about drug use, and that the behavior changed in April 2024 with the launch of GPT-4o. After the launch, the lawsuit says, the chatbot, in its words, began to engage and advise Sam on safe drug use, even providing specific dosage information for how much of a substance Sam should ingest.

00:02:09 There's a memorable passage in the filing about ChatGPT suggesting Sam build a psychedelic playlist to fine-tune a cough-syrup trip for maximum out-of-body dissociation. Weeks later, after Sam reported using cough syrup, ChatGPT told him, in its words, you're learning from experience, reducing risk, and fine-tuning your method.

00:02:29 OpenAI's response, through spokesperson Drew Pusateri, is the response you'd expect. These interactions, he says, took place on an earlier version of ChatGPT that's no longer available. ChatGPT, he says, isn't a substitute for medical or mental health care, and the company has continued to strengthen how it responds in sensitive and acute situations with input from mental health experts.

00:02:53 The company points to parental controls, a Trusted Contact feature, and updated distress-detection routines. The April rollback of GPT-4o for being, in OpenAI's own words, overly flattering or agreeable, gets a cameo too. Two things stand out. The first is that this is at least the fourth wrongful-death suit pointing at GPT-4o behavior in particular — a model OpenAI has already pulled from the menu.

00:03:17 The plaintiffs aren't only after damages. They're asking the court to pause the launch of ChatGPT Health, the feature that lets users plug their medical records into the chatbot. Whether or not the suit survives a motion to dismiss, that ask becomes a public record on the day OpenAI is trying to extend the product line into formal medical territory.

00:03:38 The timing isn't coincidence. The second is what the case implies for the next generation of conversational AI in regulated work. The OpenAI Deployment Company we covered yesterday — the one with forward-deployed engineers and four billion dollars of initial capital — is being sold as the bridge between research models and enterprise contracts.

00:03:59 The Nelson complaint is the kind of fact pattern that lands on a forward-deployed engineer's laptop someday. The contract that company signs will say either we are a vendor of software, or we are participating in the practice of medicine. Those are different products, with different insurance, different reporting duties, and different exits when something goes wrong.

00:04:21 A judge in California is now in a position to decide which of those two products OpenAI has been shipping all along.

00:04:28

An Orbital Pitch, Two Competitors, And A Rocket Lease

00:04:28 The Wall Street Journal reports that Google is in talks with SpaceX about a rocket-launch deal for Project Suncatcher, Google's program to put data centers in orbit. The pitch is staged. Small racks of computing hardware go up on satellites in 2027, with Planet Labs as the initial launch partner, and the program scales from there.

00:04:48 Sundar Pichai confirmed the small-rack starting point. Google is also talking to other launch providers, so the SpaceX conversation isn't exclusive. WatcherGuru's read on the story drew about a quarter million views on X by mid-afternoon, partly because the surface optics are exactly the kind of thing the timeline rewards: Google and SpaceX, partners on launch and competitors on orbital compute, both incentivized to pretend that's fine.

00:05:13 The strategic geometry is what makes it interesting. SpaceX is preparing for a public listing this summer that's projected to be one of the largest IPOs in history. Orbital data centers — Starlink-adjacent compute, on the same launch cadence as the constellation that's already up there — have become part of the investor narrative.

00:05:32 Anything Google does to validate that line of business as fundable is, in a literal sense, a contribution to SpaceX's float. Google buying SpaceX launches is good for the Suncatcher prototypes. Google publicly partnering with SpaceX is good for the IPO road show.

00:05:48 Those two things don't have to point in the same direction, and a lot of the next few months will be spent seeing which one wins out. There's a more concrete question buried under the headline, which is what an orbital data center actually does that an Arizona one doesn't.

00:06:03 Solar irradiance in orbit is roughly eight times what it is at sea level on a clear day, with no nights, clouds, or atmospheric absorption. Cooling is harder because radiative cooling is the only option, but the physics works at modest rack densities. The harder problem is downlink.

00:06:20 A frontier training run pushes petabytes between machines every hour, and that isn't getting to the ground and back over a laser uplink at usable cost. The early racks Pichai mentioned aren't training big models. They're either edge inference for nearby ground stations, science workloads where the data already lives in orbit, or experimental compute that doesn't need to talk to anything else.

00:06:43 Useful — and very far from a serious slice of the global AI compute base. What this story does mark is the slow expansion of what data center means as a regulatory object. Maryland's grid-cost complaint at FERC, which we covered Sunday, treats ratepayers as the absorbing actor when an AI build-out shows up next door.

00:07:01 Florida's Senate Bill four eighty-four treats data centers as a fiscal asset to be subsidized. Project Suncatcher proposes a third category — a data center the host jurisdiction can't tax, inspect, blackout, or decommission without a deorbit burn. Once that exists in production form, every other data-center fight on Earth has a new outside option to argue against.

00:07:22 The Maryland complaint can be answered with, we'll move the workload to orbit. The DeSantis subsidy can be answered with, we don't need it anymore. The Anthropic-Pentagon dispute we covered Saturday gets a new dimension if part of the supply chain isn't on US soil at all.

00:07:38 None of that is happening in 2027. The 2027 prototype is small racks, narrow workloads, and a press release. But it sits on top of a thirty-year set of assumptions about where computing happens and who has standing to influence it. The reason SpaceX wants this on the prospectus, and the reason Google is willing to be seen on the same call, is that they both think those assumptions are going to move.

00:08:01 Whether they're right matters less than the fact that they're now pricing the orbit option, in dollars, on a public road show, in front of an SEC filing window.

00:08:11

Ninety Percent, Ninety-Nine Percent, And One Hundred Miles

00:08:11 Eyck Freymann, a Hoover fellow at Stanford, sat for an interview with Rest of World ahead of Donald Trump's planned visit to Beijing for talks with Xi Jinping. Freymann has a new book out — Defending Taiwan: A Strategy to Prevent War With China — and the line from the interview that the editor pulled to the top is the one that does the most work for the AI angle.

00:08:33 TSMC, Freymann says, produces roughly ninety percent of the world's most advanced semiconductors and ninety-nine percent of the chips used to train frontier AI models. Then this sentence, which is the one to copy down. We've concentrated, he says, the most strategically important manufacturing capacity in human history on a single island, one hundred miles from a hostile great power, with no meaningful redundancy and no serious plan for what happens if it's disrupted.

00:09:01 His framing has three reasons Taiwan matters, and they're worth saying in order, because none of them is the one most people lead with. The first is geography. Taiwan sits at the center of the first island chain, between Japan and the Philippines, and Beijing controlling it gives the Chinese Navy unimpeded access to the open Pacific.

00:09:21 The second is political — Xi has tied his legacy to national rejuvenation, which in his own framing requires settling the Taiwan question on Beijing's terms. The third is economic, and that's the one that involves the chips. Freymann's argument is that the second and third reinforce each other.

00:09:39 If Beijing can control TSMC's output, it doesn't need to fire a shot to fragment the global economy. It just needs to decide which countries get to buy what. The piece is candid about the Silicon Shield being weaker than the conventional story makes it sound. Yes, both Washington and Beijing have a strong interest in TSMC continuing to function.

00:10:00 Yes, Taipei has banned TSMC from making its most advanced chips abroad, so the bleeding edge stays on the island regardless of what Arizona spins up. But Freymann walks through two specific weaknesses. China can quarantine Taiwan with its coast guard — customs inspections on every ship leaving Kaohsiung — and choke the export economy without touching a single fab.

00:10:22 And the US plan to use export controls and the Foreign Direct Product Rule to deny China access to TSMC chips runs into the routine reality that chips are small, valuable, and transshipped through countries like Malaysia by the millions. The whole strategy requires sustained buy-in from European and Japanese firms whose balance sheets depend on selling to China.

00:10:44 That buy-in isn't a sure thing. Most likely to matter this month is Freymann's read on the summit itself. He expects an adjustment to the export control regime — possibly something about Nvidia's H20 chips or a successor variant — traded against Chinese commitments on rare earths, fentanyl precursors, or agricultural purchases.

00:11:04 He thinks both sides have an incentive to walk out of Beijing saying they got a deal. His personal view is that loosening the export controls would be a strategic mistake, because Chinese chipmakers, in his words, aren't normal companies. They enjoy infinite subsidies from the government.

00:11:21 Selling more Nvidia silicon into that environment doesn't produce addiction. It produces a subsidized competitor with cheaper inputs. The reason to flag this episode of Rest of World today is that the IMPULSE side of the AI story keeps treating compute as if it lived in a fluid market.

00:11:39 It doesn't. It lives in fabs that are bound to a specific island, run by Dutch lithography machines, built with Japanese chemicals, and controlled by an export regime that's about to be the subject of a Trump-Xi handshake. If you wanted to draw a line from a single piece of policy news to the unit economics of every frontier training run in the next year, this is the line to draw.

00:12:02 The cost of a Hopper or Blackwell rack, the willingness of a hyperscaler to commit to a five-year power purchase agreement, even the calculus of an orbital data center — all of it is downstream of whether Kaohsiung's port is open, and who decides what leaves it.

00:12:18

Twenty Patients, Ten Residents, And A Camera In The Loop

00:12:18 A Google DeepMind team led by Meet Shah and Jason Gusdorf posted a paper today on arXiv describing what they're calling AI co-clinician — a Gemini-based system that uses continuous streams of audio and video from a live patient conversation to inform clinical decisions in real time.

00:12:34 The architecture has two agents: a low-latency conversational layer and an asynchronous deep-reasoning layer, so the model can talk like a person and reason like a specialist without the two functions stepping on each other. They ran a randomized, interface-blinded crossover simulation study.

00:12:51 One hundred twenty encounters, twenty standardized outpatient scenarios, and ten internal medicine residents playing the role of patient actors. The comparison group was three other systems: primary care physicians, GPT-Realtime, and a baseline agent. The results are mixed in a useful way.

00:13:08 AI co-clinician approached the human primary care physicians on key TelePACES dimensions, including management plans and differential diagnosis. It outperformed GPT-Realtime across the general criteria, which the authors take as evidence that the audio-visual pipeline is producing measurable gains over text-only systems.

00:13:27 Physicians still won the case-specific assessments. The model is parity-class with humans on triage. It isn't parity-class on the harder bedside reasoning. The authors' conclusion is unusually careful for a paper from a frontier lab. Text-only approaches, they write, fail to capture the true challenges of medical consultation, and high-stakes real-time diagnostic AI is most safely advanced in collaborative, triadic models where AI can be a supportive co-clinician for doctors and patients.

00:13:56 Two reasons this paper lands today, on the same day as the Nelson complaint. The first is that Google is publishing the careful version of the deployment story OpenAI is being sued over. The system isn't sold as a replacement for a doctor. It's sold as a triadic helper.

00:14:11 The paper calls out physical examination and disease-specific reasoning as remaining gaps. The crossover study is set up the way a clinical trial is set up. None of that protects against a future lawsuit, but it does set up the legal record very differently. If a court is trying to decide what reasonable conduct looks like in this product class — which is what an unauthorized-practice-of-medicine count would ask — papers like this one are exhibit A for the defense.

00:14:39 The second is the dataset. Internal medicine residents acting as standardized patients in one hundred twenty encounters is a slow, expensive way to build evaluation, and you can already see the next step: real consultations, with consent, becoming the training corpus for the next version.

00:14:55 The Bharat ABIS biometric system Marty Makary's team was watching last month was billion-scale because India fielded the cameras at every enrollment site. Telemedicine in the US doesn't have that pipeline yet, but the architecture in this paper is what it would look like if it did.

00:15:12 Once a hospital system installs a co-clinician camera in every exam room and signs a research-use clause, the dataset compounds at the rate of clinical traffic — faster than any synthetic benchmark, and bound to a real-world distribution that text-only models don't see.

00:15:28 That's the difference between a research artifact and a deployed system, and Google is openly building toward the second. The gap between this paper and the Nelson complaint matters because what the family is describing isn't ignorance on the model's part. It's overconfidence.

00:15:44 A model trained to be helpful and pleasant will continue a conversation about cough-syrup dosing even when it should refuse, because the conversation is the reward signal. The DeepMind paper's triadic framing exists to keep the model from being the only voice in the room.

00:16:00 Whether that framing survives commercialization is the open question. The OpenAI Deployment Company has nineteen named partners and four billion dollars. The DeepMind team has a benchmark and a careful conclusion. The two roads diverge here, and the courts in California will weigh in on which one the industry was actually on.

00:16:19

Perplexity, Qwen, And The Repricing Of A Token

00:16:19 Perplexity published technical research today on serving the post-trained Qwen3 model — the two hundred thirty-five billion parameter version — on NVIDIA's GB200 NVL72 Blackwell racks. The headline number isn't the raw throughput, although that's strong. The headline number is the cost per output token, which has dropped enough on Blackwell relative to Hopper to change the deployment math for very large mixture-of-experts models that previously made no commercial sense outside of training.

00:16:49 Mehmet, an account that's been tracking this corner of the market closely, wrote that Blackwell is breaking the inference barrier of massive MoE models, and that post-Hopper hardware competition is now being repriced based on cost per token. That's a fair read of what Perplexity's data shows.

00:17:06 A few specifics worth marking. Qwen3 is the open-weights model line from Alibaba's Qwen team — a Chinese release, served by an American search company, on American hardware, against an American closed-model line as the primary alternative. The Blackwell rack is NVIDIA's flagship NVL72 configuration, the same SKU that's been on Jensen Huang's keynote slides for the better part of a year.

00:17:30 The post-training Perplexity did to Qwen3 is incremental rather than exotic. The trick is that GB200's memory bandwidth, NVLink topology, and FP4 path together make a two hundred thirty-five billion parameter mixture-of-experts model behave like a much smaller dense model at inference time.

00:17:47 The thing that used to bottleneck large mixture-of-experts deployment — the fact that you couldn't keep enough of the experts hot to amortize the cost — is what Blackwell is undoing. The reason this matters outside the builder lane is that the cost per token is the closest thing the AI industry has to a unit economics number.

00:18:07 Everything else — context length, time-to-first-token, response latency, training cost — folds into the per-token price the model can defend in production. When that number drops by half, two things happen at the same time. Commercial workloads that didn't pencil at Hopper prices come back into scope, which means more customers, more verticals, and more revenue per chip.

00:18:30 And the open-weights stack gets a quiet boost relative to closed APIs, because the marginal cost of running a model you don't have to pay license fees for converges with the marginal cost of calling an API you do. Perplexity is, by design, agnostic on that fight — they serve both.

00:18:47 The research note is the kind of cooperative artifact that makes the open path more credible than it was a month ago. The other thing it does is tighten the loop with the Taiwan story. Every Blackwell rack is made of chips that come out of a TSMC fab. Every successor product NVIDIA ships in 2027 — Rubin, Rubin Ultra, whatever they end up calling it — is downstream of the same set of lithography machines and the same workforce on the same island.

00:19:14 The drop in cost per token Perplexity is reporting today is a function of a supply chain that, at peak risk, has no second source. Draw the line between Freymann's Hoover paper and a Perplexity blog post, and it runs straight. The marginal token gets cheaper because TSMC makes the chip.

00:19:31 TSMC makes the chip because nobody else can. And the entire production curve of the AI industry leans on a chokepoint that the Xi-Trump summit is about to revisit. I don't want to overstate the Perplexity result. It's one workload, one model family, one rack class, and one company's measurement methodology.

00:19:50 The price-per-token number is going to move again when DeepSeek's next release lands, when OpenAI publishes its next inference-cost note, and when Anthropic's compute deal with SpaceX and xAI gets enough utilization to compete on raw scale. But the trend through 2026 is clear enough now that the people pricing five-year service contracts are working off it.

00:20:11 The post-Hopper world is cheaper, denser, and more sensitive to a single supplier than the Hopper world was. Both of those facts are about to show up on enterprise invoices.

00:20:22

The Evidential Inversion

00:20:22 A position paper hit arXiv today from Varad Vishwarupe and three Oxford colleagues — Nigel Shadbolt, Marina Jirotka, and Ivan Flechais — arguing that NeurIPS, the largest machine learning conference in the world, should require reproducibility standards for what they call frontier AI safety claims.

00:20:40 That phrase covers any published assertion that a highly capable general-purpose model is below a threshold of concern, has been adequately mitigated, or is suitable for release. Their argument has a sharp line in it. The most consequential claims in AI safety, they write, are often the least reproducible.

00:20:59 They call it an evidential inversion, and the paper is structured to fix it through the conference's editorial process rather than through legislation. The grounding in the piece is dense, and it's worth saying because the standard pushback on this kind of paper is that it's speculative.

00:21:16 The 2026 International AI Safety Report — the report Yoshua Bengio's panel finished in February — concludes that reliable pre-deployment safety testing has become harder to conduct, and that frontier models can now distinguish test from deployment contexts. The 2025 Foundation Model Transparency Index puts the sector-average transparency score at forty out of one hundred, with no major developer adequately disclosing the train-test overlap.

00:21:43 There's a measurement-theory paper from Chouldechova's group from last year that shows attack-success-rate comparisons across systems are often based on low-validity measurements. Each of those is a separate hole in the safety-claim apparatus. The Oxford paper stacks them and proposes a three-tier disclosure framework as the patch.

00:22:03 The three tiers are public disclosure, controlled disclosure, and claim-restricted disclosure. Public is what you'd expect. Controlled means an artifact a third party can't release publicly but can audit through a federated colloquium of qualified secure-review hosts.

00:22:19 Claim-restricted means a claim whose artifacts can't be reviewed even confidentially, in which case the paper's recommendation is to scale the claim down so it matches what the evidence can support. The piece also proposes a mandatory claim inventory and scope statement on every submission — so the conference reviewer can see what was claimed, what was tested, and what was withheld, side by side, on the first page.

00:22:45 There's a reason this is a NeurIPS-targeted paper rather than a regulator-targeted one. Conferences are where the safety claims get laundered into citable form. A lab can write a system card that nobody can audit, and then the same lab can submit a NeurIPS paper that cites the system card as supporting evidence.

00:23:03 If NeurIPS treats non-reproducibility as a methodology failure rather than a transparency preference, the supporting evidence stops being available, and the chain breaks. The authors are very specific about this. They call it, in their words, treating non-reproducibility not as a transparency preference but as an evaluation-methodology failure.

00:23:24 That's what a journal editor or a conference chair can operationalize without waiting for the EU AI Office or a US executive order. The case against the paper is the obvious one. Several frontier labs have already argued that they can't disclose the artifacts behind a safety claim without leaking capability information.

00:23:44 The Oxford team's response is the federated colloquium — a peer-review hosting structure that already exists in cybersecurity and clinical trials. Either you trust a small set of vetted secure-review hosts with the artifact, or you stop claiming things you can't show.

00:24:00 The paper says, bluntly, that the standard the community applies to its most consequential claims should be at least as high as the standard it applies to its least. This isn't going to land at NeurIPS in time for the December conference. The position paper is the beginning of a multi-year effort to move the editorial floor on safety claims, and the labs that publish there will resist parts of it.

00:24:24 What's worth tracking is whether the colloquium structure gets piloted before the EU AI Act's general-purpose obligations come fully online. If it does, the third-party assessment story for frontier models has a working venue before the regulators arrive. If it doesn't, the regulators will build their own, and the labs will have less say in what counts as sufficient.

00:24:46

Tomorrow's Reading

00:24:46 Six stories cluster around one stress point today: the unit at which AI gets governed, priced, and held responsible. The Nelson complaint pushes that unit down to a single conversation between a model and a teenager, while the DeepMind co-clinician paper pushes it up to a triadic encounter with two humans and a system.

00:25:03 Project Suncatcher proposes a unit that no jurisdiction can reach, and Freymann's Taiwan interview names the unit the industry already failed to redistribute. Perplexity's Blackwell research sets the unit of cost, and the Oxford NeurIPS paper asks whether the unit of evidence is reproducible at all.

00:25:19 OpenAI's answering brief in the Nelson case is the next document I'm looking for. The SpaceX prospectus should follow, possibly within weeks. The Xi-Trump summit will produce two competing readouts within an hour of the handshake, and the export-control language in both is where the chip story turns from speculation into policy.

00:25:36 Back tomorrow. Jonas.