◆ Dispatch 025 · 2026-05-29 · Trust, But Re-Execute

The Receipts You Can't Check

2026-05-29 / 00:19:44 / 16 sources

“Every audit has to trust some piece of evidence — and right now the evidence comes from exactly the party with the strongest reason to fake it.”
— Jonas Vale, today's narration

Five stories, one nerve: the distance between a claim and anyone's ability to check it. SpaceX takes a $4.16B contract to put America's airborne-threat tracking in orbit on its own Starshield platform. A Reuters investigation finds Tesla's "10x safer" math inflated by roughly a factor of three, and its robotaxi zones pre-mapped after all. OpenAI hands governments a life-sciences model built on the same capability it once warned could help build bioweapons. A new paper shows per-token AI billing is unauditable by design. And the FDA opens the door to swapping animal studies for computational models, just as fresh research shows where medical AI silently breaks.

SpaceNews and Bloomberg via Techmeme — the $4.16B Space Force AMTI award to SpaceX.
Reuters via Electrek — Tesla's inflated FSD safety math and pre-mapped robotaxi zones.
Axios and The Decoder — OpenAI's Rosalind Biodefense program.
arXiv — token-billing audits that hidden reasoning can inflate undetected.
FDA draft guidance on swapping animal studies for computational models, alongside medical-AI papers on triage and retrieval.

Chapters

00:00:04 The receipts you can't check
00:01:11 The Pentagon's eyes move to a private constellation
00:04:47 Tesla's safety math, taken apart by the people who built it
00:09:06 OpenAI hands governments the double-edged model
00:12:24 The meter nobody can read
00:15:26 Trading the wet lab for the model — before we know the model

Sources

16 cited

1
"The pitchforks are here": Billionaires work to contain AI's populist revolt

Article Zachary Basu / Axios

www.axios.com/2026/05/29/ai-billionaires-te… →
Details
Cited text
If they don't support a good version, they risk a bad version designed by a mob.

Context
The people building frontier AI and the politicians campaigning against it now share a premise: AI concentrates wealth fast enough to be politically destabilizing. The fight is over who writes the response.
Key points
Bezos floats zero federal income tax for the bottom 50% of earners.
Altman has shifted from universal basic income to 'universal basic compute'; OpenAI floated public wealth funds, taxes on AI returns, and a four-day workweek.
Musk backs 'universal high income' funded by robot-driven growth.
Amodei frames a wealth tax pragmatically: support a good version or get a bad one 'designed by a mob.'
Backlash context: Warren tax-code push, NY pied-a-terre tax, a California 5% billionaire-wealth-tax signature drive, the Sanders/AOC 'Fighting Oligarchy' tour.
Provenance
Article · Supporting source
2
Inside the Democratic resistance on AI

Article Maria Curi / Axios

www.axios.com/2026/05/29/inside-democratic-… →
Details
Context
AI is becoming a 2026-2028 campaign axis. Data-center siting, power bills, and labor displacement are the concrete hooks turning an abstract technology into local politics.
Key points
Five progressives shaping a confrontational message: Sanders, Ocasio-Cortez, Khanna, Warren, and Maine candidate Graham Platner.
Sanders pushes data-center moratoriums, worker protections, and opposes tech-funded super PACs.
AOC displayed contaminated water from a Georgia Meta facility during congressional testimony.
Khanna calls data centers 'extractive' and proposes 'Work for America' to train 1 million Americans for public-sector jobs.
Warren proposes taxes on AI companies and data centers and is investigating electricity-cost impacts.
Provenance
Article · Supporting source
3
SpaceX wins $4.16B Space Force contract to detect airborne moving targets

Article

breakingdefense.com/2026/05/spacex-wins-4-1… →
Details
Cited text
SpaceX has won a contract for more than $4 billion to build satellites to track foreign aircraft and missiles as part of President Donald Trump's Golden Dome defensive shield.

Context
SpaceX is consolidating launch, communications, and now orbital tracking under one vendor the Pentagon increasingly can't route around — a concentration-of-leverage story as much as a defense story.
Key points
$4.16B Other Transaction Authority award for the Space-Based Airborne Moving Target Indicator (SB-AMTI) program.
Satellites to detect and maintain custody of fighters, bombers, cruise missiles, and potentially hypersonic weapons.
One piece of the Golden Dome missile-defense architecture of thousands of satellites.
OTA contracting bypasses traditional procurement rules for speed and flexibility; constellation targeted by 2028.
SpaceX booked roughly $6.4B in Space Force contracts in a single week.
Provenance
Article · Supporting source
4
BioRefusalAudit: Auditing Biosecurity Refusal Depth Using Sparse Autoencoders

Article Caleb DeLeeuw

arxiv.org/abs/2605.30162 →
Details
Context
If open-weight biology capability is already loose with shallow refusals, gating a single closed model behind 'trusted developers' addresses only part of the dual-use surface.
Key points
Audits how deep biosecurity refusals run in open models including Gemma, Llama, and Qwen.
Surface-level refusals can sit on top of capability that remains accessible.
Refusal 'depth' is a better safety measure than whether a model says no once.
Provenance
Article · Supporting source
5
Boston Children's uses AI to unlock new diagnoses

Article

openai.com/index/boston-childrens-hospital →
Details
Context
This is frontier AI in live clinical use on real children, not a demo — and it lands the same day the FDA moves to change the evidence bar for drug development.
Key points
Boston Children's built a 'co-pilot geneticist' that integrates genetic data, phenotype, and global medical literature.
Helped diagnose more than 40 rare-disease cases previously thought impossible, and surfaced new gene targets and therapeutic pathways.
Also deployed on operations: invoice intake and routing in supply chain, and operating-room scheduling to lift utilization.
Provenance
Article · Supporting source
6
G7 nations agree first-ever joint approach to protecting children online and drive safe AI growth

Article UK Department for Science, Innovation and Technology

www.gov.uk/government/news/g7-nations-agree… →
Details
Cited text
Ordinary citizens and businesses will only see those benefits when they have trust that these technologies are being developed safely and responsibly.

Context
A coordinated G7 line on child safety and 'AI openness' is the soft-law layer that hardens into national rules; the open question is whether risk-assessment frameworks can be monitored at all.
Key points
G7 Digital Ministers in Paris agreed a first common approach to protecting children online, addressing AI chatbot risks and age assurance.
Agreed an SME AI-adoption tool built with the OECD and a 'Vision on AI Openness.'
Under France's presidency, agreed to further work on mutual AI risk-assessment frameworks.
Flagged chemical and biological capability threats and AI-content detection.
UK Science Secretary Liz Kendall tied benefits to public trust.
Provenance
Article · Supporting source
7
Does Distributed Training Undermine Compute Governance?

Article Robi Rahman

arxiv.org/abs/2605.29359 →
Details
Context
Policymakers writing AI risk frameworks are leaning on the idea that big training runs are visible; this paper questions the foundation under that bet.
Key points
Compute-governance proposals assume frontier training needs large, detectable clusters.
Distributed training across many smaller sites could weaken that assumption and the ability to monitor it.
If training disperses, the technical premise behind a lot of governance and export-control monitoring gets shakier.
Provenance
Article · Supporting source
8
Space Force awards SpaceX $4.16 billion to build satellite network for airborne target tracking

Article Sandra Erwin / SpaceNews

spacenews.com/space-force-awards-spacex-4-1… →
Details
Cited text
We will not leverage any one single provider.

Context
The Pentagon is shifting battlefield surveillance from crewed aircraft to a private constellation, deepening dependence on one vendor that also controls Starlink comms and is heading toward an IPO.
Key points
$4.16B Other Transaction Authority award for the first increment of a space-based Air Moving Target Indicator (AMTI) network — a proliferated LEO constellation to track aircraft, bombers, cruise missiles and potentially hypersonics.
Satellites to be built on SpaceX's Starshield platform, the government variant of Starlink; initial constellation targeted to field by 2028.
Comes days after a separate $2.29B award to SpaceX for the Space Data Network backbone — giving SpaceX a central role in both sensing and military comms.
Col. Ryan Frazier stressed SpaceX won't be the sole supplier; an IDIQ vendor pool will compete for future awards.
DoD FY2027 budget seeks $7.1B for AMTI; part of Trump's Golden Dome missile-defense architecture.
Provenance
Article · Supporting source
9
US Space Force says SpaceX won a $4.16B contract for Golden Dome tracking network (Bloomberg)

Article Sana Pashankar / Bloomberg
www.techmeme.com/260529/p27 →
Details
Key points
SpaceX won a contract worth more than $4 billion to build satellites tracking foreign aircraft and missiles for Trump's Golden Dome shield.
Reporting notes SpaceX took roughly $6.45B in Space Force contracts in a single week.
Award lands as SpaceX prepares for a potential IPO.
Provenance
Article · Supporting source
10
Tesla's own AI trainers don't trust 'Full Self-Driving' or its safety stats, Reuters finds

Article Fred Lambert / Electrek (on Reuters investigation)

electrek.co/2026/05/28/tesla-fsd-safety-sta… →
Details
Cited text
It's like saying: 'My jet airplane is faster than your World War II bomber.' Yeah, so, what's your point?

Context
The gap between Tesla's safety marketing and its insiders' own assessment bears directly on liability, regulatory exposure, and whether autonomy claims can be trusted as deployment scales.
Key points
Reuters interviewed 9 former Tesla data labelers, a former self-driving engineer, and 11 traffic-safety researchers.
Tesla's '10x safer than humans' claim rests on comparing its own airbag-deployment crashes to federal data counting all tow-away crashes; an apples-to-apples comparison drops the edge to ~3x, and even that is confounded by the fleet-age gap (4.1 vs 12.8 years).
10 of 11 researchers called the stats misleading marketing; 7 of 9 labelers said they wouldn't trust FSD to drive them.
Reuters found Tesla extensively pre-mapped robotaxi zones (Cybercab lot, Austin) — contradicting Musk's claim FSD needs no 'laborious local mapping' like Waymo.
NHTSA has four active FSD/Autopilot investigations; Tesla still runs only ~20 unsupervised robotaxis in Austin.
Provenance
Article · Supporting source
11
OpenAI is giving away its life sciences AI model to help governments prepare for the next pandemic

Article Matthias Bastian / The Decoder

the-decoder.com/openai-is-giving-away-its-l… →
Details
Context
A frontier lab is positioning itself as the gatekeeper of who gets biosecurity-grade AI, blurring the line between the threat it warns about and the defense it sells.
Key points
OpenAI launched the Rosalind Biodefense program, giving vetted developers and government partners free, sponsored access to GPT-Rosalind, a life-sciences model that reasons about molecules, proteins, genes and disease biology.
Early partners: Lawrence Livermore National Laboratory, Johns Hopkins Applied Physics Laboratory, vaccine coalition CEPI; Fourth Eon and SecureDNA use it for DNA screening.
OpenAI says it briefed the White House and several federal agencies and is extending 'trusted access' to US government and allied partners.
Same dual-use capability OpenAI and Anthropic have warned could enable AI-assisted bioweapons is now the basis of the defensive program.
Provenance
Article · Supporting source
12
Exclusive: OpenAI launches biodefense program

Article Maria Curi / Axios
www.axios.com/2026/05/29/openai-biodefense-… →
Details
Key points
Axios first reported OpenAI briefed the White House on the biodefense program built around GPT-Rosalind.
Program sponsors access for trusted developers building early-warning, diagnostics, screening, and medical-countermeasure tools.
Provenance
Article · Supporting source
13
Token Inflation: How Dishonest Providers Can Overcharge for Large Language Model Usage

Article Shahinul Hoque, Jinghuai Zhang, Jinyuan Sun, Fnu Suya

arxiv.org/abs/2605.30040 →
Details
Cited text
In the most permissive setting, hidden reasoning usage can be inflated by 1,469% on average without detection.

Context
As enterprises wire budgets to per-token meters, the meter itself is unauditable — a structural trust gap under the entire AI cost base.
Key points
Per-token billing is 'hard to audit by design': providers hide the model, tokenizer, and execution, so an auditor can only inspect proofs the provider supplies.
A 'trust paradox': every audit must trust some artifact, but current frameworks trust exactly the ones a provider has the most reason to manipulate.
Hidden reasoning tokens can be inflated ~1,469% on average undetected — turning a $100 honest bill into ~$1,569 on the same query at frontier reasoning prices.
Even when the user sees the full reasoning string, tokenization ambiguity alone allows 50.85% over-reporting below detection thresholds.
Fixes require evidence the provider doesn't control: trusted execution attestation, cryptographic proofs of inference, or third-party re-execution.
Provenance
Article · Supporting source
14
FDA Issues Draft Guidance to Cut Unnecessary Animal Testing for Cancer Drugs

Article FDA Office of the Commissioner

www.fda.gov/news-events/press-announcements… →
Details
Cited text
replacing three-month non-human primate studies with a weight-of-evidence risk assessment ... may include New Approach Methodologies, as appropriate.

Context
A regulator is opening the door to swap living test subjects for computational evidence — a structural bet that in-silico models are reliable enough to gate human trials.
Key points
FDA draft guidance would cut animal testing in nonclinical safety studies for certain oncology biologics and conjugated products.
Recommends using a single relevant species instead of two, rodent-only studies, or replacing animal studies with evidence-based 'New Approach Methodologies' (which include AI/computational models).
Framed as shaving time off the 10–12 years it takes to bring a drug to patients; builds on COVID-era practices reducing non-human-primate use.
Public comment open until July 30, 2026.
Provenance
Article · Supporting source
15
Internal Representation, Not Clinical Knowledge: Where Apparent LLM Triage Failures Originate

Article David Fraile Navarro, Berardino Como, et al.

arxiv.org/abs/2605.29889 →
Details
Context
How a medical model is queried can flip whether it looks safe — a warning for anyone trusting headline benchmark scores to gate clinical deployment.
Key points
Consumer LLMs show high under-triage rates on multiple-choice clinical-triage benchmarks but score differently on the same cases in free-text.
Using sparse-autoencoder features, the authors find medical features fire on the clinical narrative but go silent at the multiple-choice decision token — format and scaffold features, not clinical knowledge, drive the answer.
The gap is dominated by off-by-one acuity errors (picking an adjacent severity level), not knowledge failure.
Suggests benchmark format can both hide and manufacture apparent medical competence.
Provenance
Article · Supporting source
16
Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG

Article Yubo Li, Rema Padman, Ramayya Krishnan

arxiv.org/abs/2605.29084 →
Details
Context
When a clinical assistant's answer depends on which handbook it grabbed, correctness alone can't certify it safe to deploy.
Key points
A retrieval-augmented system over a multi-author institutional corpus can give different answers to the same question depending on which source it retrieves.
Demonstrated in transplant patient education, where institutional handbooks disagree with one another (TransplantQA benchmark + HERO-QA retrieval audit).
Better retrieval surfaced far more inter-source disagreement than prior estimates — the problem was understated, not overstated.
Argues source-dependence is a missing axis of evaluation for any deployed multi-source system, including legal and educational RAG.
Provenance
Article · Supporting source

00:00:04

The receipts you can't check

00:00:04 It's Friday, the twenty-ninth of May, and I want to walk you through five things that happened today. They don't look like the same story. There's a four-billion-dollar military satellite contract, a Reuters investigation into Tesla, OpenAI handing a biology model to governments, a research paper about your AI bill, and the Food and Drug Administration rethinking animal testing.

00:00:25 They're different beats, playing out in different rooms. But sit with them for a minute and they rhyme. Each one comes down to a claim somebody is making — about safety, about cost, about who can be trusted with a dangerous capability — and how hard it is for anyone standing outside the building to check that claim.

00:00:42 That's the nerve I'll keep pressing today — who's making the assertion, who has to take it on faith, and what happens to the rest of us when the marketing turns out to be dressed up as measurement. A quick note before we start, because I've been tracking the people in a couple of these stories all week.

00:00:59 Elon Musk shows up twice today, on opposite ends — winning a defense contract in one story and getting his safety math taken apart in another. That contrast is most of what I want you to hold onto. Let's start in orbit.

00:01:11

The Pentagon's eyes move to a private constellation

00:01:11 Start with the number. The U.S. Space Force awarded SpaceX four-point-one-six billion dollars to build a constellation of satellites that tracks airborne targets from orbit. The reporting is from Sandra Erwin at SpaceNews and Sana Pashankar at Bloomberg, and the program has an ungainly name: the space-based Air Moving Target Indicator, AMTI for short.

00:01:32 What it does is straightforward. It's meant to detect and follow things moving through the atmosphere — fighter jets, bombers, cruise missiles, and potentially hypersonic weapons — from space, on a global basis. It's one piece of the Golden Dome missile-defense architecture the Trump administration has been assembling.

00:01:51 Here's why this is bigger than another defense award. For decades, that job — watching the sky for moving threats — belonged to aircraft. The E-3 AWACS, with the big radar dome on top. More recently the E-7 Wedgetail. Crewed planes, flying orbits, with humans aboard.

00:02:07 The Pentagon's argument is that those planes are getting harder to keep alive as adversaries build long-range systems designed to push them back or shoot them down. So the mission is migrating to a proliferated mesh of small satellites in low Earth orbit, which is harder to knock out wholesale.

00:02:24 The Space Systems Command put it plainly: the long-standing method of using airborne platforms to track moving targets faces continued challenges, so the requirement for a layered, resilient tracking architecture is, in their words, evident. The satellites will be built on SpaceX's Starshield platform — that's the government-only variant of Starlink, operated for national security missions.

00:02:47 The first constellation is targeted to be flying by 2028. And the four billion is only an increment. The Defense Department's 2027 budget request asks for seven-point-one billion dollars for AMTI overall. Now follow the week, not just the day. This award landed just days after the Space Force picked SpaceX for a separate two-point-three-billion-dollar contract to build the Space Data Network backbone — a mesh that moves data across military satellites.

00:03:14 Add in the rest, and SpaceX took something north of six billion dollars in Space Force contracts in a single week. So the same company now sits at the center of two different layers of America's emerging space architecture: the sensing layer that spots the threat, and the communications layer that carries the data about it.

00:03:33 And this is happening while SpaceX is reportedly steering toward an eventual public offering. If you've been with me this week, you know the question I keep circling. It's about what can't be replaced. Last week we talked about Starlink's pricing leverage and the Space Force depending on the only network that works.

00:03:52 The week before, about the Colossus compute lease. The pattern keeps landing on the same place — power flows to whoever controls what you can't route around. To their credit, the people running this award seem to know that's the worry. Colonel Ryan Frazier, who oversees space-based sensing and targeting acquisition, made a point of saying SpaceX won't be the only supplier.

00:04:14 There's a vendor pool, an arrangement where other firms can compete for future orders. His exact line was, we will not leverage any one single provider. I believe that's the intent. What I'd watch is whether it survives contact with reality. Right now SpaceX is the only publicly named AMTI contractor, it has the launch capacity nobody else matches, and it's building the comms backbone the sensing data rides on.

00:04:38 A vendor pool on paper isn't the same as a second source you could actually switch to in a crisis. The next year of awards will tell us which one this is.

00:04:47

Tesla's safety math, taken apart by the people who built it

00:04:47 The other Musk story today cuts the other way. Reuters published an investigation into Tesla's Full Self-Driving — FSD — and it's one of the more damning things I've read on autonomous driving claims. I'm working from the Reuters reporting as relayed by Fred Lambert at Electrek, since Reuters itself was hard to reach directly.

00:05:06 The reporting rests on interviews with nine former Tesla data labelers, a former self-driving engineer, and eleven independent traffic-safety researchers. Let me start with the statistic, because everything else rests on it. Tesla executives have repeatedly said Full Self-Driving is up to ten times safer than a human driver.

00:05:24 The chief financial officer said it. The board chair repeated it at a shareholder meeting. Musk showed a chart claiming eighty-five percent fewer crashes. Here's how that number is built, and it's not subtle. Tesla counts crashes in its own vehicles where the airbags deployed.

00:05:40 Then it compares that to a federal crash rate — but the federal figure counts every crash serious enough that a car had to be towed. A tow-away crash is a much lower bar than an airbag going off. Plenty of towed cars never deploy an airbag at all. So Tesla is comparing its most severe crashes against everyone else's fender-benders-and-up, and declaring victory.

00:06:01 And the detail that turns this from sloppy to deliberate: the federal data Tesla used already breaks out airbag-deployment crashes as their own category. Tesla could have made the apples-to-apples comparison. It chose the one that flattered the product. A researcher at the University of Michigan, Marco Benedetti, ran the correct version — airbag crashes for Teslas against airbag crashes for all vehicles.

00:06:24 The ten-times-safer claim collapsed to roughly three: on the matched comparison, Teslas went about three times as far between crashes, not ten. And even that's generous, because Tesla's fleet averages about four years old while the overall U.S. fleet averages closer to thirteen. Newer cars crash less for reasons that have nothing to do with the software.
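
To make the mismatch concrete, here is a minimal sketch of the two comparisons. Every rate in it is hypothetical, chosen only so the ratios land on the ten-times and roughly three-times figures described above; none of it is Tesla, federal, or Reuters data, and the function name is illustrative.

```python
def safety_multiple(baseline_crashes_per_100m_miles: float,
                    fleet_crashes_per_100m_miles: float) -> float:
    """How many times farther the fleet travels between crashes than the baseline."""
    return baseline_crashes_per_100m_miles / fleet_crashes_per_100m_miles


# HYPOTHETICAL rates per 100 million miles, for illustration only.
tesla_airbag_crashes = 20      # fleet: airbag-deployment crashes only
federal_towaway_crashes = 200  # baseline: every crash that needed a tow
federal_airbag_crashes = 60    # baseline: airbag-deployment crashes only

# Mismatched: severe fleet crashes against a much broader baseline category.
mismatched = safety_multiple(federal_towaway_crashes, tesla_airbag_crashes)
# Matched: the same crash definition on both sides.
matched = safety_multiple(federal_airbag_crashes, tesla_airbag_crashes)

print(f"mismatched definitions: {mismatched:.1f}x")  # 10.0x
print(f"matched definitions:    {matched:.1f}x")     # 3.0x
```

The fleet's own number never changes between the two lines; only the baseline's crash definition does. The fleet-age gap is a separate confounder that this sketch does not model.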

00:06:41 Phil Koopman, the Carnegie Mellon professor who studies this, put it the way I wish I'd thought of: it's like saying my jet airplane is faster than your World War Two bomber. Yeah, so, what's your point. Ten of the eleven researchers Reuters consulted called the statistics misleading marketing rather than a serious safety study.

00:07:00 Then there's what the workers think. These are the people in a Utah office who watch footage from the eight cameras on FSD cars all day. Seven of the nine former labelers said they wouldn't trust the system to drive them. One said he wouldn't ride in a Tesla robotaxi, and I'm cleaning up the language, if you paid him.

00:07:19 They described watching the car fail at ordinary things — not pulling over for emergency vehicles, crowding motorcyclists, struggling on freeway off-ramps, and blowing into construction zones. There was a specialized group in Palo Alto they called the trauma team, focused on near-misses with pedestrians, including clips of cars nearly hitting children in crosswalks.

00:07:40 And one more finding that matters for the business, not just the safety record. Musk's central pitch for why Tesla will scale robotaxis faster than Waymo is that Tesla doesn't need what he calls laborious local mapping — it just drives on cameras and neural nets.

00:07:55 Reuters found that before the Cybercab reveal and before the Austin robotaxi launch, Tesla crews spent weeks driving the exact routes at night, and labelers spent hundreds of hours annotating curbs, stop lights, and road markings in those zones. The Utah team doubled to about three hundred people in the six months before Austin.

00:08:14 In other words, what Musk says Tesla doesn't do is exactly what Tesla did to make the launches go smoothly. Nearly a year on, Austin still runs only about twenty unsupervised robotaxis in a carefully mapped area. Why does this belong on a show about power and consequence, not just a car review?

00:08:31 Because the regulator is circling. The National Highway Traffic Safety Administration has four active investigations into FSD and Autopilot, including one into cars running red lights and turning into oncoming traffic. There's already a two-hundred-and-forty-three-million-dollar verdict from an Autopilot crash that killed a young woman in Florida.

00:08:51 When a company's public safety case is built on a comparison that ten of the eleven researchers Reuters consulted call misleading, every one of those proceedings gets a little easier for the other side. Marketing math has a way of becoming evidence in a courtroom, and not the kind the defendant wants.

00:09:06

OpenAI hands governments the double-edged model

00:09:06 Now to a story that's harder to feel clean about in either direction. OpenAI launched something called the Rosalind Biodefense program. The reporting is from Maria Curi at Axios, who broke it, and Matthias Bastian at The Decoder. The short version: OpenAI is giving vetted developers and government partners free, sponsored access to a model called GPT-Rosalind.

00:09:27 It's a life-sciences model the company introduced in April that, in its framing, reasons about molecules, proteins, genes, and disease biology better than a general-purpose model does. The stated goal is biodefense and pandemic preparedness. OpenAI is covering the access costs for teams building things like early-warning systems, diagnostics, and vaccine-development tools.

00:09:49 The early partners are not small names. Lawrence Livermore National Laboratory. The Johns Hopkins Applied Physics Laboratory. CEPI, the international coalition that funds vaccine work. Two firms, Fourth Eon and SecureDNA, are using the model for DNA screening — the work of catching whether a gene-synthesis order matches something dangerous.

00:10:09 And OpenAI says it briefed the White House and several federal agencies on the approach, and is extending what it calls trusted access to U.S. government and allied partners with approved public-health missions. Here's the part I keep turning over. For two years, OpenAI and Anthropic have been among the most vocal in warning that frontier models could lower the bar for building biological weapons — that a capable enough model walking a bad actor through protein engineering or pathogen design is one of the catastrophic risks the field takes most seriously.

00:10:42 That warning describes the same capability as this announcement. A model that reasons about molecules, proteins, and disease biology well enough to speed up a vaccine is, by construction, a model that reasons about those things well enough to worry you in the wrong hands.

00:10:59 The biodefense program and the bioweapon fear are the same model pointed in different directions. Which is exactly why the structure here is worth naming. OpenAI isn't open-sourcing this. It's not posting weights. It's handing out gated, sponsored access to partners it vets and approves.

00:11:15 And I understand the logic — you don't want this capability loose, so you keep a hand on the valve. But look at what that makes the company. OpenAI becomes the entity that decides which national labs, which governments, which allied partners get biosecurity-grade AI, and which don't.

00:11:32 That's not a product decision. That's something closer to a quasi-governmental gatekeeping role over a dual-use technology, run by a private company that answers to its investors and its board. There's a real argument that a frontier lab is better placed than anyone to do this triage — it built the model, it knows what it can do, it can move faster than a treaty.

00:11:54 But I'd want to know who's checking the gatekeeper. When the same institution defines the threat, builds the dangerous capability, sells the defense against it, and decides who's trusted enough to receive it, every one of those roles is a place where its commercial interest and the public interest could quietly diverge.

00:12:13 The partners listed today are reassuring. The framework that decides the next hundred partners is what I'll be tracking, because that framework, not this launch, is where the power sits.

00:12:24

The meter nobody can read

00:12:24 Here's an item that got far less attention, but I think it sits underneath a lot of what we've covered this month. A new paper out of a security group — the lead author is Shahinul Hoque — has a blunt title: Token Inflation, how dishonest providers can overcharge for large language model usage.

00:12:41 And it's about something almost nobody checks: the bill. Almost every commercial AI service charges you per token, meaning per chunk of text in and out. So the token count the provider reports determines exactly what you pay, and the bill is only as honest as that count. The paper's argument is that this kind of billing is hard to audit by design.

00:12:58 To protect their intellectual property, to fight jailbreaks, and to preserve user privacy, providers hide the model, hide the tokenizer that chops text into tokens, and hide the execution. So when an auditor tries to verify your bill, the only thing they can inspect is proof the provider chooses to hand over.

00:13:16 The audit collapses into checking whether the provider's own numbers agree with the provider's own numbers. The authors call this a trust paradox, and I think it's the right phrase. Every audit has to trust some piece of evidence. The current frameworks trust exactly the artifacts a provider has the strongest reason to manipulate.

00:13:35 They tested three recent token-auditing schemes and showed that a provider with ordinary, off-the-shelf capabilities could systematically pad the count. The headline figure is rough. In the most permissive setting — where the model does hidden reasoning you never see — billed usage could be inflated by an average of about fifteen hundred percent without tripping detection.

00:13:57 Their example: at frontier reasoning prices, a hundred-dollar honest bill becomes roughly fifteen hundred and sixty-nine dollars on the same query. And even in the friendly case where you can see the full reasoning text, the ambiguity in how text gets tokenized still allowed over-reporting of about fifty percent while staying under the detection threshold.
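The two figures quoted above are the same claim stated two ways, and the arithmetic is easy to check. What follows is a minimal Python sketch; the function names are illustrative and not taken from the paper, and the only inputs are the numbers quoted in this segment.

```python
def inflation_pct(honest_bill: float, billed: float) -> float:
    """Percentage by which the billed amount exceeds the honest amount."""
    # Multiply before dividing so round figures stay exact in floating point.
    return (billed - honest_bill) * 100 / honest_bill


def inflated_bill(honest_bill: float, pct: float) -> float:
    """Bill that results from padding an honest bill by `pct` percent."""
    return honest_bill * (100 + pct) / 100


# Hidden-reasoning case quoted above: $100 honest, about $1,569 billed.
hidden_case_pct = inflation_pct(100, 1569)

# Visible-reasoning case quoted above: about 50 percent over-reporting.
visible_case_bill = inflated_bill(100, 50)

print(hidden_case_pct)    # 1469.0
print(visible_case_bill)  # 150.0
```

So "about fifteen hundred percent" means the buyer pays nearly sixteen times the honest price, and even the friendly case adds half again to the bill.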

00:14:17 I want to be careful here. This is a vulnerability paper, not an accusation. It doesn't claim any named provider is doing this. What it claims is that the structure permits it and nobody outside could prove otherwise. That's the part that matters for this show.

00:14:32 Because think about where the money is going. Yesterday I talked about Anthropic's forty-seven-billion-dollar revenue run rate. This week we've talked about enterprises getting bill shock and emerging futures markets for AI tokens — financial instruments priced off token consumption.

00:14:49 All of that is denominated in a unit the buyer can't independently verify. Law firms wiring eight figures a year to a model provider, hedge funds writing contracts on token prices, and governments standing up the procurement we just discussed — every one of them is trusting a meter they're not allowed to read.

00:15:07 The authors say the fix has to come from evidence the provider doesn't control: trusted hardware attestation, cryptographic proofs that the inference actually ran the way claimed, or independent re-execution. None of that is deployed broadly yet. Until it is, the entire cost base of this industry rests on taking the seller's word for it.
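To make "evidence the provider doesn't control" concrete, here is a rough Python sketch of the weakest version of that idea: an auditor recounting the visible text with a tokenizer it controls. This is an independent recount, not the full re-execution or attestation the authors describe, and everything in it is an assumption for illustration: the whitespace tokenizer is a toy stand-in, and the `Bill` fields, the `audit` function, and the ten percent tolerance are hypothetical.

```python
from dataclasses import dataclass


def reference_count(text: str) -> int:
    """Toy auditor-side tokenizer (assumption): counts whitespace-separated words."""
    return len(text.split())


@dataclass
class Bill:
    visible_text: str           # output the customer can actually read
    billed_visible_tokens: int  # provider's claimed count for that text
    billed_hidden_tokens: int = 0  # claimed hidden-reasoning tokens, never shown


def audit(bill: Bill, tolerance: float = 0.10) -> dict:
    """Recount what can be seen; report what cannot be checked at all."""
    recount = reference_count(bill.visible_text)
    limit = recount * (1 + tolerance)  # slack for tokenizer differences (assumed)
    return {
        "recount": recount,
        "visible_ok": bill.billed_visible_tokens <= limit,
        "unverifiable_tokens": bill.billed_hidden_tokens,
    }


text = "the quick brown fox jumps over the lazy dog today"
honest = audit(Bill(text, billed_visible_tokens=10))
padded = audit(Bill(text, billed_visible_tokens=15, billed_hidden_tokens=140))

print(honest["visible_ok"])           # True
print(padded["visible_ok"])           # False
print(padded["unverifiable_tokens"])  # 140
```

The recount catches padding on text the auditor can see, but the hidden-reasoning count comes back as a number with nothing to check it against. That gap is why the proposed fixes reach for attestation, cryptographic proofs, or re-running the inference itself.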

00:15:26

Trading the wet lab for the model — before we know the model

00:15:26 Last story, and it ties the whole thread in a bow. Today the FDA issued draft guidance to cut unnecessary animal testing in the safety studies for certain cancer drugs — specific biologics and conjugated products. The recommendations: use one relevant animal species instead of two where you can, lean on rodent-only studies in some cases, and in others replace a three-month non-human-primate study with what they call a weight-of-evidence risk assessment.

00:15:52 And that assessment, the guidance says, may include New Approach Methodologies — which is the regulator's term for non-animal methods, and increasingly that means computational and AI-based models. The Oncology Center of Excellence director, Angelo de Claro, framed it as cutting time and cost out of the ten-to-twelve years it takes to get a drug from discovery to a patient.

00:16:14 Public comment is open until the end of July. I want to be fair to this, because there's a good version. Fewer monkeys and dogs dosed to no scientific end is a real ethical gain, and if a model can rule out a toxic compound before it ever reaches an animal or a person, that's faster, cheaper, and kinder.

00:16:31 I'm not against it. But notice what the move actually is. A regulator is opening the door to swapping a living test subject for a computational prediction as the evidence that gates a human trial. That's a bet that the models are reliable enough to stand where the animal used to stand.

00:16:47 And on the very same day, two fresh papers gave me reasons to hold that bet carefully. The first is a paper from a team led by David Fraile Navarro on medical triage. Consumer models score badly on patient-triage benchmarks when the answer is forced into multiple choice — they under-triage, meaning they wave through people who needed urgent care.

00:17:07 The researchers dug into the model's internals and found something unsettling. The medical understanding was actually there — the features that represent the clinical situation fired correctly on the patient's story. But at the moment the model had to commit to a multiple-choice letter, those medical features went silent, and the answer got driven by the format of the question instead.

00:17:29 The model knew, and then the scaffolding around the question made it answer wrong anyway, usually by picking the severity level one notch off. So how you query a medical model can flip whether it looks safe — and a benchmark score can both hide real competence and manufacture fake competence.

00:17:45 The second, from Yubo Li and colleagues, looked at retrieval-augmented generation, the setup where a model answers by pulling from a document library. They built one over real institutional handbooks for transplant patients and asked the same question many times.

00:18:01 The answer changed depending on which source the system happened to retrieve, because the institutions themselves disagree. And the better they made the retrieval, the more disagreement surfaced — the problem was bigger than anyone had measured, not smaller. Correctness against a single gold answer can't even see that failure.
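Why a single gold answer can't see this failure is easier to show than to say. The Python sketch below uses made-up placeholder data, not results from the paper: the handbook names, the answers, and both function names are hypothetical. It scores one retrieval run against a gold answer, then separately measures how much the sources disagree.

```python
from collections import Counter
from typing import Dict


def correct_against_gold(answer: str, gold: str) -> bool:
    """Conventional scoring: does this one run match the single gold answer?"""
    return answer == gold


def source_disagreement(answers_by_source: Dict[str, str]) -> dict:
    """How many distinct answers exist across sources, and how dominant is the top one."""
    counts = Counter(answers_by_source.values())
    _, top_n = counts.most_common(1)[0]
    return {
        "distinct_answers": len(counts),
        "majority_share": top_n / len(answers_by_source),
    }


# Placeholder data (assumption): the same question answered from three handbooks.
answers = {
    "handbook_a": "answer_x",
    "handbook_b": "answer_y",
    "handbook_c": "answer_x",
}

# A run that happened to retrieve handbook_a looks perfectly correct...
run_is_correct = correct_against_gold(answers["handbook_a"], gold="answer_x")

# ...while the underlying sources still disagree.
spread = source_disagreement(answers)

print(run_is_correct)               # True
print(spread["distinct_answers"])   # 2
```

The single-run score reports success, and only the cross-source measure shows that a different retrieval would have produced a different answer.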

00:18:20 Put those next to the FDA's draft and you get the shape of the day. We are moving real-world trust onto computational systems whose claims are hard for an outsider to verify — across defense, transportation, biosecurity, finance, and now drug approval. And in several cases, the people closest to the systems are the ones raising their hands.

00:18:39 SpaceX may be the only supplier we can actually reach. Tesla's safety number was built to flatter. OpenAI is both the threat and the guard. The token meter can't be read. And the models we're about to let stand in for animals can be right in their representation and wrong in their answer, depending on how you ask.

00:18:57 None of this means stop. It means the verification has to grow up as fast as the deployment, and right now it isn't keeping pace. What I'm watching into next week is narrow and concrete: whether the FDA's final guidance says anything about how a computational model earns the right to replace an animal study, meaning what evidence, audited by whom.

00:19:16 If it's specific, that's a regulator taking the verification problem seriously. If it's vague, we'll have moved another high-stakes decision onto a claim nobody outside the building can check. That's the whole show today, and it's why I keep coming back to the same instinct: when someone hands you a number, ask who gets to look at the receipt.

00:19:35 I'm Jonas.