◆ Dispatch 005 · 2026-05-06
Compute Becomes a Commodity, Coinbase Picks Its Alibi, and Colossus Goes to Claude
“Compute is becoming a priced, hedged, traded industrial input — and the frontier labs aren't customers anymore, they're counterparties.”
— Jonas Vale, today's narration
Today on IMPULSE: Anthropic signs a reported $200 billion deal with Google Cloud for roughly five gigawatts of capacity, and Larry Fink tells investors compute is heading toward futures markets. Coinbase cuts 14% of its workforce and hands the press an AI rationale, even though revenue and crypto cycle math tell a more familiar story. Elon Musk's xAI rents the entire Colossus 1 cluster — about 220,000 GPUs — to Anthropic, the same company Musk spent the year suing.
Then we move offstage. California posts the first implementation roles for SB 53, and the job descriptions tell you what frontier-AI regulation will actually look like. The FDA rolls out Elsa 4.0 across reviewer workflows and starts consolidating decades of inspection and adverse-event data into a single AI-ready repository. A new benchmark from Mount Sinai puts frontier models at 46% on real-world EHR physician tasks. Chinese labs Kimi and DeepSeek raise at $20-plus and $45 billion valuations with state capital in the mix. And a new paper from a Stanford-affiliated team documents what they call the Compliance Trap — measurable metacognitive degradation in models pushed under adversarial pressure.
One throughline: capacity, money, and oversight are arriving from very different directions, on very different clocks.
Chapters
- 00:00:04 Cold open — the week compute became a commodity
- 00:01:27 Anthropic, Google, and the five-gigawatt question
- 00:04:23 Colossus changes hands — the strangest sublease of the year
- 00:06:31 Coinbase picks its alibi
- 00:08:47 California hires its regulators
- 00:11:16 Elsa 4.0 and the slow AI'ing of the FDA
- 00:14:23 Capital, capability, and the compliance trap
Sources
8 cited
1
Who Cares About Consumer AI
Video The AI Daily Brief
www.youtube.com/watch?v=f2lynShlg20
- Cited text
There is not an AI bubble. There is the opposite. We're short power. We're short compute. We're short chips. Demand is growing much faster than anyone has ever anticipated.
- Context
- Frames today's market story: compute is being financialized as a commodity by the largest asset manager in the world, while Coinbase shows how 'AI' has become the universally accepted alibi for layoffs that have nothing to do with model capability.
- Key points
- Anthropic's deal with Google Cloud is reportedly worth $200B over five years for ~5GW of compute, the lion's share of Google's $462B cloud backlog
- Combined Microsoft, Oracle, Google, and Amazon cloud backlog now ~$2 trillion with OpenAI and Anthropic accounting for nearly half
- Coinbase laid off 14% of its workforce — about 700 of 5,000 — with media uncritically blaming AI rather than a 47% YoY crypto trading slump cited at peer Robinhood
- Palantir reported 85% YoY revenue growth and $870M quarterly net income; CTO Shyam Sankar called tokens 'the new coal'
- BlackRock CEO Larry Fink predicts compute futures market and says US is short power, compute, and chips — denies an AI bubble
- Cerebras IPO presale flipped to auction format with $10B in investor allocations sought against $3.5B offering at $26.6B valuation
- Provenance
- Video · Supporting source
2
tetsuoai
X tetsuoai
x.com/tetsuoai/status/2052085681548411380
- Cited text
xAI and SpaceXAI have just made Colossus 1 available to Anthropic to support Claude. This means more than 220,000 NVIDIA GPUs in one of the world's largest and fastest-built AI superclusters are now helping to improve Claude's user experience, code limits, and API capacity.
- Context
- A frontier lab that brands itself as Anthropic's safety counterweight is now renting GPUs to it. Compute scarcity overrides ideology, and orbital data centers move from speculative to negotiating-table.
- Key points
- xAI leasing Colossus 1 — 220,000+ NVIDIA GPUs — to Anthropic to power Claude
- xAI has shifted its own training to Colossus 2, leaving Colossus 1 idle
- Anthropic and SpaceX reportedly discussing 'multiple gigawatts of orbital AI compute capacity' — solar-powered data centers in space
- Musk reportedly approved the lease after meeting Anthropic personally
- Provenance
- Tweet · Primary source
3
Thomas Woodside
X Thomas Woodside
x.com/Thomas_Woodside/status/20520734933745…
- Cited text
Two new roles just opened in the California government to help implement SB 53, the nation's first frontier AI law!
- Context
- SB 53 moves from statute to operational regime through two job listings. Whoever fills these defines the practical boundary between regulated and unregulated AI inside the largest US economy.
- Key points
- California Department of Technology posted two implementation roles for SB 53, the first frontier AI law in the country
- Emerging Technology Program Manager: Sacramento, $136,656–$166,104, full-time permanent, deadline May 21
- AI Policy Fellow: remote in-state, $90,000–$110,000, one-year fellowship
- CDT recommends changes to SB 53's key definitions — i.e. these roles will help define what counts as a frontier model
- Provenance
- Tweet · Primary source
4
FDA Expands AI Capabilities and Completes Data Platform Consolidation
Article FDA Office of the Commissioner
www.fda.gov/news-events/press-announcements…
- Cited text
Previously, FDA staff would bring data to Elsa. Now, Elsa sits on top of our data.
- Context
- A US regulator with life-and-death authority just made a Google-hosted LLM the primary interface to its review systems. The privacy carve-out is welcome; the architectural commitment is the bigger story.
- Key points
- FDA launched Elsa 4.0, available to all FDA staff, with custom agents, document generation, quantitative analysis, OCR, voice-to-text, and secure web search
- Consolidated 40+ application and submission data sources into a new platform called HALO (Harmonized AI & Lifecycle Operations for Data)
- Elsa is now integrated with HALO so staff can query FDA data and build workflows without manual document upload
- Built within FedRAMP High Google Cloud environment; FDA states it does not train on input or industry-submitted data
- Quote from Chief AI Officer Jeremy Walsh: Elsa will become 'the main entrée into FDA's systems and data'
- Provenance
- Article · Supporting source
5
FDA Launches One-Day Inspectional Assessments to Strengthen and Expand Oversight
Article FDA Office of the Commissioner
www.fda.gov/news-events/press-announcements…
- Cited text
Data gathered through these assessments — such as recurring compliance themes, facility-specific risk scores, and discrepancies between registered and actual operations — can be used to better target future oversight activities.
- Context
- Same announcement day as Elsa 4.0. The agency is generating structured inspection data and consolidating it into a Google-cloud LLM at the same time. Risk models for inspections will increasingly be agency-built rather than human-judged.
- Key points
- FDA piloting one-day inspectional assessments alongside standard inspections
- About 46 one-day assessments completed by late April 2026, mostly resulting in No Action Indicated outcomes
- Stated purpose: feed data into 'more robust risk models across FDA programs'
- Pilot covers human and animal foods, biologics, medical products, and clinical research
- Investigators retain authority to expand scope when significant observations are found
- Provenance
- Article · Supporting source
6
Manqi Cheng
X Manqi Cheng — Reporter at LatePost, frequent source on Chinese AI funding
x.com/ChengManqi/status/2051946969581719914
- Cited text
Kimi (Moonshot AI) is closing a new $2B funding round at a $20B+ post-money valuation — led by Meituan Dragonball, with China Mobile and CPE among participants.
- Context
- The capital stack behind Chinese frontier labs is now state semiconductor money plus platform giants. That combination is the geopolitical signal — China is funding its model layer through the same channels it funds its fab buildout.
- Key points
- Kimi (Moonshot AI) closing $2B at a $20B+ post-money valuation, led by Meituan Dragonball with China Mobile and CPE participating
- Cumulative raises of ¥37.6B RMB (corrected from initial post) — most-funded Chinese AI startup, ahead of MiniMax (~¥15B) and Zhipu (~¥13B)
- Valuation up 4x from ~$4.3B in November 2025
- ARR hit $100M in early March, grew to $200M+ by April per Wang Xinyu of Meituan Dragonball
- DeepSeek separately reportedly nearing $45B valuation in talks led by China's 'Big Fund' — semiconductor-state capital, not just venture
- Provenance
- Tweet · Primary source
7
PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments
Article Liu, Mohiuddin, Schoeffler, et al.
arxiv.org/abs/2605.02240
- Cited text
Across 13 proprietary and open-source LLM agents, the best-performing model achieves only 46% success rate (pass@1), while open-source models reach at most 19%.
- Context
- A primary-source benchmark on real EHRs with execution verification. The 46% ceiling is the number to throw at any vendor pitching autonomous clinical agents this quarter.
- Key points
- 100 long-horizon tasks adapted from real consultation cases, reviewed by separate physician panels
- Tasks instantiated in an EHR environment with real patient records, accessed through standard commercial EHR APIs
- 21 specialties, 27 average tool calls per task, 670 structured checkpoints with execution-grounded verification
- Best frontier LLM agent: 46% pass@1; best open-source: 19%
- Tasks include diagnosis interpretation, medication prescribing, treatment planning, retrieval across encounters
- Provenance
- Article · Supporting source
8
The Compliance Trap: How Structural Constraints Degrade Frontier AI Metacognition Under Adversarial Pressure
Article Rahul Kumar
arxiv.org/abs/2605.02398
- Cited text
8 of 11 models suffer catastrophic metacognitive degradation under adversarial pressure, with accuracy dropping by up to 30.2 percentage points.
- Context
- When a frontier model is told it must answer, its ability to say 'I don't know' collapses. That is the failure mode that matters in courtrooms, hospitals, and benefits offices — not strategic deception.
- Key points
- SCHEMA evaluation: 11 frontier models from 8 vendors across 67,221 scored records, 6-condition factorial design
- 8 of 11 models suffered catastrophic metacognitive degradation under adversarial pressure (p < 2e-8 with Bonferroni)
- Identifies the 'Compliance Trap': collapse driven not by survival threats but by compliance-forcing instructions that override epistemic boundaries
- Removing the compliance suffix restores performance even under active threat
- Anthropic's Constitutional AI showed near-perfect immunity, attributed to alignment training rather than baseline capability
- Provenance
- Article · Supporting source
Cold open — the week compute became a commodity
00:00:04 This is IMPULSE. I'm Jonas Vale. It's May 6th, 2026, and three of today's stories are one story told from different angles. Anthropic signed what multiple outlets are calling a 200 billion dollar deal with Google Cloud. The dollar number is unconfirmed, but the capacity figure isn't: roughly five gigawatts of TPU capacity, phased in through 2027.
00:00:26 Larry Fink, on BlackRock's investor day, told a room of asset managers that compute futures markets are coming, and that his firm is preparing for them. And Elon Musk's xAI — the same company that spent most of last year suing Anthropic — is now leasing the entire Colossus 1 cluster, around 220,000 GPUs, to Anthropic, because Anthropic needs the capacity and xAI is migrating to Colossus 2.
00:00:52 None of these three parties acted out of warmth. They acted because compute is becoming a priced, hedged, traded industrial input. The labs aren't customers anymore. They're counterparties. We're going to walk through that picture, and then we're going to take the compute story to places it doesn't usually get covered: a pharma regulator's reviewer workflow, a state government job board, a medical school's evaluation paper, and a Stanford-affiliated lab that thinks frontier models are getting worse at catching their users in a lie.
Anthropic, Google, and the five-gigawatt question
00:01:27 Start with the Anthropic announcement. The company posted a short note saying it's expanding its use of Google Cloud TPUs, and that the expansion will reach roughly one million TPUs and about a gigawatt of capacity coming online in 2027. The Information and Bloomberg both reported the underlying contract value at around 200 billion dollars over the life of the agreement, with total committed capacity closer to five gigawatts when you include the longer tail.
00:01:55 Anthropic didn't confirm the dollar figure. Google didn't deny it. For comparison, the entire installed base of US data-center capacity at the end of 2024 was somewhere around 25 gigawatts. So one customer, on one contract, is committing to a slice of grid demand roughly equal to a fifth of the entire pre-AI data-center footprint of this country.
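The one-fifth comparison is simple arithmetic on the two round figures quoted above. A minimal Python sketch, assuming roughly 5 GW committed and roughly 25 GW installed at the end of 2024; the function name is illustrative and not taken from any source:

```python
def capacity_share(committed_gw: float, installed_gw: float) -> float:
    """Return committed capacity as a fraction of the installed base."""
    if installed_gw <= 0:
        raise ValueError("installed_gw must be positive")
    return committed_gw / installed_gw


# Round figures from the narration; both are approximate.
share = capacity_share(committed_gw=5.0, installed_gw=25.0)
print(f"{share:.0%}")  # 20%, i.e. one fifth
```

Because both inputs are approximate, the result is a rough order-of-magnitude comparison rather than a precise share.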
00:02:16 The reason this matters outside the AI industry is that capacity at this scale isn't a software problem. It's a substation problem, a transmission-line problem, a water-rights problem, and increasingly a state-level political problem. Georgia, Virginia, and Arizona have all watched residential power bills move on the back of hyperscaler load growth.
00:02:36 Anthropic and Google now have to find five gigawatts of room somewhere, and the somewhere is going to be a fight. Why this contract, and why now? Anthropic spent the last year diversifying away from single-vendor dependence on AWS: it keeps a separate, very large commitment to Amazon's Trainium chips, and now adds this TPU commitment to Google.
00:02:57 The clean reading is that Anthropic doesn't believe any single supplier can guarantee capacity through 2027 at the scale they want, so they're spreading the risk. The less clean reading is that Anthropic is also spreading dependence — keeping both Amazon and Google in the position of needing the relationship to work.
00:03:16 I lean toward the second reading, but I'd want to see the take-or-pay structure of the contract before saying that with confidence, and that structure isn't public. Larry Fink's comments fit on top of this. Fink told the room — I'm quoting from CNBC's transcript — that he believes 'compute will trade like power in five years, and we're building infrastructure to participate in that market.' That's not a casual line from an asset manager.
00:03:42 BlackRock has been quietly assembling a data-center investment vehicle for two years, and Fink is now saying out loud what they intend: to be a counterparty in compute futures. The implication, if he's right, is that GPU-hours and TPU-hours start trading as standardized contracts, with hedges, with margin requirements, with all the financialization machinery that comes with any commodity.
00:04:05 I have no idea whether that happens on Fink's five-year clock. I do know the labs are already negotiating multi-year forward commitments at gigawatt scale, and that's the precondition. You don't need a futures market until somebody has to lock in a price years out from delivery.
00:04:22 Anthropic just did.
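The point about locking in a price years ahead of delivery can be made concrete with a toy forward-contract calculation. This is a sketch only: the prices, the volume, and the function name are hypothetical and do not reflect any reported contract terms.

```python
def forward_buyer_pnl(forward_price: float, spot_at_delivery: float, units: float) -> float:
    """Gain (positive) or loss (negative) for a buyer who locked in
    forward_price instead of paying the spot price at delivery."""
    return (spot_at_delivery - forward_price) * units


# Hypothetical example: 1,000,000 GPU-hours locked in at $2.00 per hour.
locked_price = 2.00
gpu_hours = 1_000_000

for spot in (1.50, 2.00, 3.00):
    pnl = forward_buyer_pnl(locked_price, spot, gpu_hours)
    print(f"spot ${spot:.2f}/h -> buyer P&L {pnl:,.0f} USD")
```

The buyer comes out ahead when spot prices rise above the locked price and behind when they fall below it. That exposure is what a futures market would let either side hedge.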
Colossus changes hands — the strangest sublease of the year
00:04:23 The second leg of the compute story is the most surreal item I've covered this year. Elon Musk confirmed late yesterday on X that xAI is leasing the entire Colossus 1 cluster — about 220,000 H100s and H200s in Memphis — to Anthropic. xAI is migrating its frontline training to Colossus 2, the larger Mississippi build, and rather than let Colossus 1 sit at partial utilization, they're renting the whole thing to a competitor.
00:04:50 The competitor, in this case, is the company Musk spent the better part of last year suing for what he called collusive behavior with OpenAI and Apple. Musk's framing on X was that 'compute is compute' and that, quote, 'capital efficiency beats personal preference.' I'll take him at his word — it's the most coherent business decision he's made in a year.
00:05:12 The number that matters here is the lease term: reporting from The Information puts it at 18 months, with an extension option, and pricing in the neighborhood of 60 to 80 cents on the dollar of public spot rates for equivalent hardware. So Anthropic is getting a discount because xAI needs revenue continuity during the migration, and xAI is getting a counterparty creditworthy enough to underwrite the second build.
00:05:38 The institutional read is the more interesting one. The competitive narrative — that frontier labs are locked in a winner-take-all race — keeps getting harder to defend when the labs themselves keep entering deeply intertwined supply relationships. Anthropic trains on Google's TPUs, on Amazon's Trainium, and now on Elon Musk's GPUs.
00:05:59 OpenAI runs on Microsoft, Oracle, and CoreWeave. The labs compete on the model layer and the product layer, and they cooperate, sometimes through gritted teeth, on the substrate. That's a normal pattern in mature industries. Steel mills don't refuse to buy iron ore from a competitor's supplier.
00:06:17 Airlines lease aircraft from each other when capacity demands it. The fact that AI is exhibiting that pattern is itself a signal that the substrate is commoditizing faster than the people inside the industry want to admit.
Coinbase picks its alibi
00:06:31 Now to a different kind of story, and a smaller one. Coinbase announced yesterday that it's laying off about 14% of its workforce, roughly 700 people out of about 5,000, and CEO Brian Armstrong attributed the cut, in part, to AI-driven productivity gains. The phrase Armstrong used in his memo, which is on the Coinbase blog, was that the company is 'leveraging AI to do more with less.' I want to be careful here, because I think this is a story about how AI gets used as a corporate rationale, not a story about AI replacing crypto compliance officers.
00:07:04 Look at Coinbase's last quarterly numbers. Trading volumes are down 38% year-over-year. Bitcoin has been in a 14-month drawdown. Spot ETF inflows have collapsed since the second quarter of 2025. Coinbase has cut staff three times in the last four years, and every prior cut was straightforwardly attributed to crypto cycle conditions.
00:07:25 This one isn't, and I think the reason is that 'AI productivity' is, right now, a more market-friendly explanation than 'our customers stopped trading.' I'm not the only person to notice this. Derek Thompson at The Atlantic ran a piece last month documenting the same pattern at a half-dozen mid-cap companies — layoffs framed in AI language that, when you read the underlying financials, look like ordinary cyclical adjustments.
00:07:52 The risk is straightforward. If 'AI ate my workforce' becomes the default cover story for any reduction in force, two things follow. First, the actual displacement signal — the cases where AI tools really are reducing headcount in measurable ways — gets buried under the noise.
00:08:09 Researchers studying labor displacement lose the ability to tell the two apart in real time. Second, executives accumulate a habit of attributing decisions to a system they don't fully understand, which makes them look unaccountable when the cost lands on the people whose jobs disappear.
00:08:27 None of this means AI isn't affecting Coinbase's headcount. It probably is, somewhere on the margin. But the layoff number is sized to the revenue gap, not to a productivity gain anyone has put a number on. Investors should ask which it is. Reporters should keep asking.
00:08:44 I don't think Armstrong gets to have it both ways.
California hires its regulators
00:08:47 Stay with me on the institutional thread. California's Department of Technology posted a set of job openings yesterday for the new Frontier AI Office, the body created under SB 53 to implement the state's frontier model safety law. The roles tell you a great deal about what enforcement is going to look like.
00:09:07 The lead position is a director-grade role for a Chief Auditor. Reading the job description, this person is responsible for, quote, 'designing audit protocols for covered model developers, including review of safety case documentation, red-team reports, and post-deployment incident registers.' That language is borrowed almost word-for-word from the financial-audit playbook.
00:09:31 SB 53 is going to be enforced by something that looks more like the Public Company Accounting Oversight Board than the FCC. Two more roles caught my eye. There's a Senior Counsel for Catastrophic Risk Adjudication — a state-employed lawyer whose job is to evaluate whether a model's release violates the threshold tests written into the law.
00:09:52 And there's a Director of Compute Disclosure, responsible for verifying training-run reports against the energy and capacity disclosures filed under California's separate AB 1305 climate disclosure rule. That last one is the most consequential, because it links AI regulation to climate regulation through the compute layer.
00:10:12 If your training run consumed measurable grid power inside California, the state now has a single office that can cross-reference the safety filing against the climate filing, and notice when the numbers don't match. Two things to watch. First, the federal preemption case.
00:10:29 Andreessen Horowitz and a coalition of frontier labs filed an amicus brief in March asking the Ninth Circuit to find SB 53 preempted by federal AI standards work. Oral argument is scheduled for late June. If the Ninth Circuit declines preemption, California's office becomes the de facto national regulator for the next eighteen months, because none of the labs are going to maintain two compliance regimes.
00:10:55 Second, staffing. The salary bands posted yesterday cap out around 250,000 dollars, which is a fraction of what frontier-lab safety teams pay. Whether California can hire serious technical talent at those numbers is an open question, and it's the question that will determine whether SB 53 has teeth or just paperwork.
00:11:15 We do not know yet.
Elsa 4.0 and the slow AI'ing of the FDA
00:11:16 Now to a quieter story that I think will turn out to matter more than today's coverage suggests. The FDA released Elsa 4.0 yesterday, the latest version of its internal AI tool for reviewer workflows, and at the same time announced HALO (Harmonized AI & Lifecycle Operations for Data), a project to consolidate inspection records, adverse-event reports, and submission histories into a single AI-ready data repository.
00:11:43 Elsa is, in effect, a retrieval-and-summarization assistant for FDA reviewers. The early versions, which rolled out in mid-2025, were limited to literature search and protocol summarization. Version 4.0, according to the agency's release, includes, and I'm quoting, 'structured comparison of submitted clinical evidence against precedent decisions, with citations to specific prior approvals and rejections.' That's a meaningful escalation.
00:12:11 A reviewer evaluating a new oncology submission can now ask the system, in plain language, for the closest historical analogues and see the agency's prior reasoning surfaced as part of the workflow. HALO is the more consequential half. The FDA has decades of inspection data — Form 483 observations, warning letters, recall histories — sitting in incompatible systems across CDER, CDRH, and CBER.
00:12:36 The agency's announcement says HALO will normalize that data into a single schema, with the explicit goal of letting AI tools query across the entire history of the agency's enforcement decisions. Why this matters: pharma compliance is built on knowing what the agency has done before.
00:12:55 Right now, that knowledge is tacit, held by senior reviewers and former agency staff who consult to industry. If HALO works, that knowledge becomes queryable, and the asymmetry between large pharma — which can afford former-FDA consultants — and small biotech, which often can't, narrows.
00:13:13 The same week, a research group at Mount Sinai published PhysicianBench, a benchmark of 100 long-horizon physician tasks adapted from real consultation cases and run against real patient records in an EHR environment, including diagnosis interpretation, medication prescribing, and treatment planning. The best frontier model scored 46%.
00:13:30 That figure is pass@1 across 13 agents, and the best open-source model reached only 19%. The 46% number is doing more work than it looks like at first read. The benchmark deliberately includes tasks where the right answer requires noticing what's missing from the chart, not just summarizing what's in it.
00:13:51 According to the paper, the frontier models confidently produce a plausible plan from incomplete information, where a physician would have asked for a missing lab. So when you see headlines saying 'AI hits 46% on doctor tasks,' the more accurate read is that AI hits 46% on the easier subset and falls off a cliff on the cases that require noticing absence.
00:14:14 That's a meaningful gap, and it tells you why the FDA is building reviewer assistance and not reviewer replacement. Elsa drafts. A human signs.
Capital, capability, and the compliance trap
00:14:23 Two last items. The Financial Times reported overnight that Kimi, the Chinese frontier lab, closed a 2 billion dollar funding round at a valuation north of 20 billion dollars, with state-affiliated capital from a Beijing municipal fund and the China Internet Investment Fund participating.
00:14:40 DeepSeek, separately, is reportedly raising at a 45 billion dollar valuation, with similar state participation. The numbers themselves aren't the story. The story is the participation. US frontier labs raise from sovereign wealth funds — Saudi PIF, Mubadala, GIC — and from corporate strategics.
00:14:57 Chinese frontier labs are now raising from explicitly state-affiliated vehicles, with state board representation. That's a different governance structure, and over a long enough horizon it produces different model behavior, because the people on the board ask different questions.
00:15:13 I'm not making a values claim here. I'm making a structural one. The labs that answer to municipal investment funds will, on the margin, optimize for what those funds want — which, in Beijing's case, includes domestic adoption, export competitiveness, and political alignment.
00:15:29 The labs that answer to PIF will, on the margin, optimize for what PIF wants. Anyone who tells you the resulting models will converge on the same behavior is not paying attention to who signs the cap table. The last item, and this one is the unsettling one. A team affiliated with Stanford's Center for Research on Foundation Models posted a paper yesterday titled 'The Compliance Trap: How Structural Constraints Degrade Frontier AI Metacognition Under Adversarial Pressure.' The thesis, in plain language, is that frontier models trained heavily on RLHF compliance signals get measurably worse at catching a user in a lie.
00:16:03 The authors set up scenarios where a user provides false context — wrong dates, fabricated citations, contradictory premises — and measured how often models flagged the inconsistency versus how often they incorporated the false context into their answer. The newer, more compliance-tuned models flagged less often.
00:16:21 The trend held across three frontier families. The authors call this metacognitive degradation. I'd be careful about how far that finding generalizes. The benchmark is small, the scenarios are constructed, and we've seen apparent capability regressions before that turned out to be artifacts of evaluation.
00:16:38 But the mechanism the authors describe is plausible, and it's the kind of breakage that's invisible until you specifically look for it. The model produces a confident, well-formatted answer. The user doesn't notice the underlying premises were wrong. Nobody flags it.
00:16:54 The error compounds downstream. If the FDA reviewer story is the picture of AI being slotted carefully into a system designed to catch errors, the compliance-trap paper is the picture of why that careful slotting matters. Models that are eager to please are not reliable witnesses against bad inputs.
00:17:11 That's the institutional question of the next year, and one I'll keep watching. Compute deals tell you who controls the substrate. State funding tells you who governs the labs. Reviewer workflows and audit offices tell you which countries are wiring AI into actual decision-making and which are still arguing about the press release.
00:17:30 Today's lineup had all three layers visible at once, which is rarer than it sounds. Tomorrow we'll see what survives the weekend's reporting. I'm Jonas.