◆ Dispatch 012 · 2026-05-13 The Dial

The Dial, the Mythos, and the Lawyer's Desk

2026-05-13 / 00:24:29 / 6 sources

“'If EviCore wants more denials, it can send on for review anything that scores lower than ninety-five percent. If it wants fewer, it can set the threshold at seventy-five. That's the game we would play,' one former executive said.”
— Jonas Vale, today's narration

Wednesday's IMPULSE: ProPublica names the algorithm, EviCore's "dial," that turns prior-authorization scores into denials at a company screening care for one in three insured Americans, and walks through the death of a 61-year-old welder twice refused a heart catheterization. Anthropic's Mythos is now inside the largest US banks; tens of thousands of vulnerabilities have surfaced, and Treasury and the Fed are calling CEOs about it. The White House quietly drops the FDA-style approval frame for frontier AI. Trump's China delegation leaves Jensen Huang at home. Anthropic launches Claude For Legal with practice-area plugins and connectors to nine legal platforms, then places itself directly inside Microsoft 365. A Princeton-led Nature paper traces state-coordinated media through training data into model answers about Xi Jinping. And a new London-and-SF lab called Recursive raises money from Nvidia and AMD to automate AI research itself.

Chapters

00:00:04 The Dial
00:04:58 Mythos In The Vault
00:08:19 Trade Without Huang
00:11:37 Anthropic Sits Under The Lawyer's Desk
00:15:35 What The Models Read About Xi
00:18:49 Recursive And The Information Barrier
00:23:04 Closing

Sources

6 cited

1
"Not Medically Necessary": Inside the Company Helping America's Biggest Health Insurers Deny Coverage for Care

Article David Armstrong — ProPublica senior reporter on health care; joint investigation with Capitol Forum

www.propublica.org/article/evicore-health-i… →
Details
Cited text
The algorithm cannot say no, however. If it finds problems, it sends the request for review to a team of in-house nurses and doctors who consult company medical guidelines. Only doctors can issue a final denial.

Context
Concrete, sourced look at AI-assisted insurance gatekeeping affecting one in three insured Americans — a case the labor, medical, and policy lanes have been talking around for two years.
Key points
EviCore by Evernorth, owned by Cigna, makes prior-authorization decisions for about 100 million insured Americans across UnitedHealthcare, Aetna, Blue Cross Blue Shield and others
Uses an AI-backed algorithm employees call 'the dial' that scores requests; staff can change the score threshold below which a case goes to nurse/doctor review, and raising it increases the chance of denial
Markets a 3-to-1 return on investment to insurers and has boasted internally of a 15% increase in denials
In Arkansas — which forces denial-rate disclosure — EviCore turned down requests in full or in part nearly 20% of the time since 2021, vs about 7% for federal Medicare Advantage in 2022
Risk contracts let EviCore pocket savings when it keeps insurer claim spending below a target
Story walks through the death of Little John Cupp, 61, whose heart catheterization was twice denied; he died of cardiac arrest 36 hours after the cheaper stress test EviCore approved
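A minimal sketch of the threshold routing described in the key points above, assuming a score between 0 and 1 that estimates the chance of approval, with requests scoring below the threshold forwarded to review, as the former executive's quote describes. The function names and the five sample scores are hypothetical, invented for illustration; nothing here is taken from EviCore's actual system. It shows only how moving one threshold changes the share of requests sent to human review, which is where denials can happen.

```python
def route(score: float, review_threshold: float) -> str:
    """Route one request. The algorithm never denies: it either
    auto-approves or forwards the case to human reviewers."""
    if score < review_threshold:
        return "human_review"
    return "auto_approve"


def review_share(scores: list[float], review_threshold: float) -> float:
    """Fraction of requests forwarded to human review at a given threshold."""
    forwarded = [s for s in scores if route(s, review_threshold) == "human_review"]
    return len(forwarded) / len(scores)


# Hypothetical approval-likelihood scores (assumed, not real data).
sample_scores = [0.60, 0.70, 0.80, 0.90, 0.97]

low_dial = review_share(sample_scores, 0.75)   # threshold at seventy-five
high_dial = review_share(sample_scores, 0.95)  # threshold at ninety-five

print(f"threshold 0.75 -> {low_dial:.0%} of requests reviewed")
print(f"threshold 0.95 -> {high_dial:.0%} of requests reviewed")
```

With these made-up scores, raising the threshold from 0.75 to 0.95 doubles the share of cases that reach a reviewer, from 2 of 5 to 4 of 5.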
Provenance
Article · Supporting source
2
Anthropic's Mythos sends US banks rushing to plug cyber holes

Article Reuters

www.reuters.com/business/finance/anthropics… →
Details
Cited text
Mythos can create a high-risk vulnerability by bringing together several lower risk weaknesses.

Context
First clear, on-the-record evidence that an offensive-class model is reshaping bank patch cycles in real time, with US officials actively in the loop — a financial-stability story, not a vendor story.
Key points
JPMorgan Chase named publicly as a Mythos launch partner; Goldman Sachs, Citigroup, Bank of America and Morgan Stanley also have access
Number of low- to moderate-rated vulnerabilities Mythos has surfaced in bank tech runs from several hundred to thousands per institution
Larger banks helping smaller banks who don't have direct access to prepare their systems
Dario Amodei said May 5 that financial firms have six to 12 months to patch before Chinese AI models match Mythos capabilities; tens of thousands of vulnerabilities found overall
Adam Meyers of CrowdStrike (Project Glasswing) said the team spent a 'solid entire weekend' learning to use the model before they could hunt bugs with it
Treasury Secretary Scott Bessent and Fed Chair Jerome Powell have raised Mythos directly with major bank CEOs
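A toy illustration of what chaining low-risk weaknesses means, not a description of how Mythos works. It treats each hypothetical low-rated finding as a step from one foothold to another and uses breadth-first search to check whether those steps link an outside position to a sensitive system. Every host name and finding below is invented for the example.

```python
from collections import deque

# Hypothetical findings: (from_foothold, to_foothold, description, rating).
# Each is rated low on its own; all names are made up for illustration.
FINDINGS = [
    ("internet", "web_server", "verbose error page", "low"),
    ("web_server", "build_host", "stale service credential", "low"),
    ("build_host", "payments_db", "overly broad network rule", "low"),
    ("internet", "guest_wifi", "default banner exposure", "low"),
]


def find_chain(findings, start, target):
    """Return the shortest list of findings linking start to target,
    or None if no chain exists."""
    graph = {}
    for src, dst, name, _rating in findings:
        graph.setdefault(src, []).append((dst, name))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for nxt, name in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None


chain = find_chain(FINDINGS, "internet", "payments_db")
print(chain)
```

Here three findings that would each be triaged as low priority form a path from the public internet to a payments database, which is the kind of combined exposure that no single rating captures.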
Provenance
Article · Supporting source
3
Brandon Stewart

Thread Brandon Stewart — Political scientist at Princeton; co-author of a Nature paper on how state-coordinated media bleeds into LLM training data and outputs

x.com/b_m_stewart/status/2054579383923335627 →
Details
Cited text
LLMs separate the message from the messenger. State-coordinated phrasing can circulate through the web, enter training data, and reappear as neutral-sounding LLM output — the source obscured.

Context
A peer-reviewed Nature paper that links authoritarian media control to model behavior with traceable mechanism — gives policy and procurement people a real artifact to point at when they argue about training-data provenance.
Key points
Six connected studies across 38 languages and 13 models, joint work with Hannah Waight, Solomon Messing, Molly Roberts, Joshua Tucker and others
Open multilingual training set CulturaX contains state-scripted Chinese media; models memorize state-coordinated phrases at higher rates than common Chinese phrases
Continued pre-training of Llama 2 13B with different Chinese-language documents using LoRA replicates the political slant
Audits show commercial models answer political questions about Xi Jinping and the CCP differently in English vs Chinese — including for real user queries
Cross-country audit of 37 nations: lower media freedom states have more pro-state answers in the state language relative to English
Replicated with newer models at state-media-influence-llm.github.io
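For readers unfamiliar with LoRA, a minimal pure-Python sketch of the low-rank update it trains: the base weight matrix W stays frozen, and the adapted weight is W + (alpha / r) * B·A, where B and A are small matrices of rank r. The toy matrices and values are assumptions for illustration and have nothing to do with the paper's actual Llama 2 13B setup.

```python
def matmul(X, Y):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A).
    W is d_out x d_in (frozen), B is d_out x r, A is r x d_in (trained)."""
    delta = matmul(B, A)
    scale = alpha / r
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Toy 2x3 base weight with a rank-1 adapter (all values assumed).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[0.5, -0.5, 1.0]]

# LoRA initializes B to zeros, so the adapted model starts equal to the base.
B_init = [[0.0], [0.0]]
print(lora_effective_weight(W, A, B_init, alpha=2, r=1))

# After some training, B is nonzero and the effective weight shifts.
B_trained = [[0.1], [0.0]]
print(lora_effective_weight(W, A, B_trained, alpha=2, r=1))
```

Only A and B receive gradient updates, so the documents used for continued pre-training can shift the model's behavior while the original weights stay untouched.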
Provenance
Thread · Primary source
4
Anthropic launches Claude For Legal with practice-area plugins and MCP connectors

Article

www.streetinsider.com/Reuters/Anthropic+exp… →
Details
Cited text
The Microsoft 365 integration is the real story. If your redline in Word carries context into the Outlook cover note and then into a PowerPoint board summary, your workflow lives inside Office, not inside whatever legal tech vendor's UI you used to pay for.

Context
A vertical roll-up that puts Anthropic in the Microsoft 365 surface where the actual work happens, while leaving the legal AI incumbents to defend a thinner middle layer.
Key points
Anthropic launched Claude For Legal on May 12 with plugins for commercial, employment, privacy, product, corporate, and AI governance law
MCP connectors ship for DocuSign, Ironclad, iManage, NetDocuments, LexisNexis, Thomson Reuters, Box, Everlaw, and LSuite
Each plugin runs a cold-start interview that learns the firm's playbook, escalation chains, and house style, then writes a practice profile shared by the skills
CoCounsel runs on Claude and is now exposed to Claude as a tool — Anthropic sits both under and beside the incumbent legal AI product
When Anthropic's first legal plugin shipped in February, RELX, Thomson Reuters and Wolters Kluwer all took share-price hits; those incumbents are now ecosystem partners
A practicing lawyer on r/ClaudeAI called the product 'lacklustre' and said many connectors are gated behind expensive third-party subscriptions
Provenance
Article · Supporting source
5
Tim Rocktäschel

Thread Tim Rocktäschel — DeepMind researcher and UCL professor; lead author on a series of papers on open-endedness, self-improvement, and AI for science

x.com/_rockt/status/2054491251345391852 →
Details
Cited text
create AI that experiments on how to safely improve itself, turning compute into knowledge that accumulates in an open-ended process of endless, automated scientific discoveries.

Context
A serious technical team putting money behind the bet that the next frontier comes from open-ended self-improvement, not bigger pre-training runs — and putting it in London rather than the Bay Area.
Key points
Recursive is a new lab in London and SF aimed at automating AI research itself — using AI to safely run experiments on AI self-improvement
Frames the mission against Stanisław Lem's 1964 'information barrier' concept and David Deutsch's view that evil follows from insufficient knowledge
Backed by GV (Google Ventures), Greycroft, Nvidia, AMD and others
Cites earlier work on AI debate and persuasion (arxiv 2402.06782, 2402.16822) as 'early signs of life'
Provenance
Thread · Primary source
6
The AI Daily Brief: Towards AI That Can Actually Interact

Video Nathaniel Whittemore (The AI Daily Brief)

www.youtube.com/watch?v=-UTIXsziBJI →
Details
Cited text
If Anthropic starts invalidating layered SPVs and other so-called creative financing structures, private markets are in for a reckoning. The SpaceX IPO will expose just how much synthetic ownership and outright fraud has accumulated in privates.

Context
A clean daily roll-up of the OpenAI/Anthropic gray-market enforcement, the FDA-frame retreat, and the China delegation choreography — useful primary-source sieve for the institutional lanes.
Key points
OpenAI Deployment Company officially launched with $4B investment at a $10B pre-money valuation, lead investor TPG with Advent, Bain Capital and Brookfield as co-leads; 19 partners total
DeployCo acquires Tomoro for about 150 forward-deployed engineers; Goldman Sachs is the only firm backing both DeployCo and Anthropic's equivalent venture
Anthropic and OpenAI both publicly voided unauthorized secondary-market transfers and SPV-routed stock claims; tokenized Anthropic share prices on gray markets halved overnight
White House walked back FDA-style AI approval framing; NEC Chair Kevin Hassett told CNBC, 'I probably shouldn't have called it the FDA'
Trump China delegation includes Elon Musk, Tim Cook, Meta's Dina Powell McCormack, plus Micron and Qualcomm executives — Jensen Huang notably not invited despite saying he would go
Zero H200 export licenses to China have been approved by Commerce since the December signal that older H200s would be allowed
Thinking Machines Lab released interaction models — a 200ms micro-turn architecture trained from scratch around continuous human-AI exchange, with a real-time model paired to a slower background model
Provenance
Video · Supporting source

00:00:04

The Dial

00:00:04 ProPublica and Capitol Forum published an investigation today by David Armstrong on a Cigna-owned company called EviCore. EviCore decides whether the care in a prior-authorization request your doctor sends in actually gets paid for. Its clients include UnitedHealthcare, Aetna, Blue Cross Blue Shield, and a long list of Medicare and Medicaid contractors.

00:00:24 That covers about one hundred million Americans — roughly one in three insured people. ProPublica's reporting draws on internal documents and interviews with dozens of former employees. What it shows is that EviCore runs the front gate of those decisions with an algorithm.

00:00:40 Staff inside the company call it 'the dial.' A doctor's office sends in a request — for a heart catheterization, a back MRI, or a course of radiation. EviCore's algorithm scores the request on the chance it gets approved. The algorithm isn't allowed to deny anything.

00:00:57 Only doctors are. But the algorithm decides which cases get sent on for a human review. One former executive describes the dial in these terms — quote — 'If EviCore wants more denials, it can send on for review anything that scores lower than ninety-five percent.

00:01:12 If it wants fewer, it can set the threshold at seventy-five. That's the game we would play.' In Arkansas, which forces denial-rate disclosure, ProPublica's analysis of EviCore's own data shows the company has turned down prior-authorization requests in full or in part nearly twenty percent of the time since 2021.

00:01:32 The equivalent figure for federal Medicare Advantage in 2022 was about seven percent. EviCore markets to insurers with a three-to-one return on investment: for every dollar an insurer spends on EviCore, the insurer pays out three dollars less on medical care. Salespeople have privately boasted of a fifteen percent increase in denials.

00:01:51 Some contracts are structured so that EviCore itself keeps the savings when claim costs come in under a target. A former EviCore executive told the reporters, quote, 'Where you really made your money was on a risk model. Their margins were exponentially higher.'

00:02:13 Little John Cupp, the welder at the center of ProPublica's story, started gasping for breath in late 2021. His doctor ordered a heart catheterization. EviCore denied it twice as 'not medically necessary.' His doctor wrote in shorthand on Cupp's chart, quote, 'ideally he needs LHC (denied twice by insurance).' The doctor then ordered a cheaper test, a nuclear stress test, which EviCore approved.

00:02:33 The catheterization would have cost around thirty-five hundred dollars in network. The stress test cost about three hundred and fifteen. Thirty-six hours after the stress test, Cupp went to bed early because he was on the 2:30 a.m. shift at a medical-supplies warehouse.

00:02:48 He stopped breathing. Time of death, 11:39 p.m. Three of the four cardiologists ProPublica asked to review the case said the catheterization was appropriate. One said his life might have been saved. The Cigna spokesperson gave ProPublica a written statement on EviCore's behalf.

00:03:05 Quote — 'EviCore uses the latest evidence-based medicine to ensure that patients receive the care they need and avoid the services they do not.' The statement also said the algorithm is used 'ONLY to accelerate approval of appropriate care and reduce the administrative burden on providers.' Cigna acknowledged what the company calls 'the sentinel effect' — the observation that when EviCore is in the loop, doctors stop requesting certain procedures altogether.

00:03:32 Cigna frames this as physicians becoming better informed. Dave Jones, a former California insurance commissioner who now runs the climate-risk initiative at the UC Berkeley School of Law, framed it differently. He told ProPublica that arbitrarily changing manual review rates doesn't appear to violate any specific standard.

00:03:51 But a contract paying EviCore for denial volume, he said, quote, 'calls into question everything that's occurring.' She told ProPublica, about companies like EviCore — quote — 'They love to deny things.' I don't have much to add to that. There's an AI angle here, and you've already heard it — an opaque scoring model sitting between a doctor's clinical judgment and a patient's care, with a tunable knob hidden inside a contract neither the patient nor the doctor can read.

00:04:25 But the more important thing, I think, is that we now have an algorithm-of-record story with a body. Little John Cupp's daughter Chris is suing UnitedHealthcare, the doctor, the hospital, and EviCore. Her lawyer has had to drop United and EviCore from the suit.

00:04:40 Under federal law, suits over employer-funded plans are heard in federal court, where there are no punitive damages; the insurer pays only the cost of the treatment it refused. Which, in Cupp's case, is zero. That's the legal architecture the algorithm sits inside. The dial is a piece of it.

00:04:57 So is the courthouse.

00:04:58

Mythos In The Vault

00:04:58 On Tuesday Reuters reported, and the largest US banks have now started confirming on background, that Anthropic's Mythos model — the one Dario Amodei spent April calling a strategic-class cyber weapon — is being used inside the biggest American financial institutions to audit their own code.

00:05:15 JPMorgan Chase is the publicly named launch partner. Goldman Sachs, Citigroup, Bank of America, and Morgan Stanley also have access. The program is called Project Glasswing, run jointly with CrowdStrike and a small handful of other security firms. The numbers in the Reuters piece are the part to pay attention to.

00:05:34 The number of low- and moderate-rated vulnerabilities Mythos has surfaced inside the named banks runs from several hundred per institution to several thousand. Amodei said on May 5 that across the full program, Mythos has uncovered tens of thousands of vulnerabilities.

00:05:50 The specific finding the banks are most worried about isn't any single bug. It's that Mythos can chain several individually low-risk weaknesses into a high-risk vulnerability that would have taken human red teams months to assemble. The banks have responded by collapsing their patch cycles.

00:06:07 People who used to wait weeks on a fix are now shipping in days. The larger banks are quietly briefing smaller banks that don't have direct Mythos access, so those banks can prepare their own systems before whatever comes next gets there first. Adam Meyers, who runs counter-adversary operations at CrowdStrike, gave Reuters the most human line in the piece.

00:06:28 When he first found out about Mythos, his words were 'oh boy.' His team then spent — quote — 'a solid entire weekend trying to figure out how to best use this thing before we even started looking for bugs.' That's the part I keep coming back to. The model is so different from a normal analyzer that the experienced operators had to invent the methodology to use it before they could turn it on the problem.

00:06:53 The political layer matters here. Treasury Secretary Scott Bessent and Fed Chair Jerome Powell — according to CNBC's reporting last month — raised Mythos directly with bank CEOs and told them, in Bessent's framing, to take the model seriously and use it to find holes in their defenses.

00:07:10 Amodei's framing is sharper. He has said publicly that financial firms have six to twelve months to fix the vulnerabilities Mythos is surfacing now, because that is roughly the window before Chinese AI models — his words — develop comparable capabilities. Whether you believe that timeline or not, that's the gun the Treasury is using to motivate the patch-cycle compression.

00:07:32 I'll name what I think the more interesting question is. The story being told here, by Anthropic and by Treasury, is the defense story — we're letting the good guys find the bugs first. But every line of the Reuters piece also describes an offensive capability.

00:07:47 Mythos doesn't know it's defending. It's just finding novel vulnerability chains. The banks happen to own the code. If the same class of model is running on Chinese hardware against the same code in six to twelve months, the only thing that decides who wins is who patched faster.

00:08:04 On that, the JPMorgan side has a head start measured in weeks, not years. I don't love putting financial-system stability on a race like that. But that's the race we appear to be in. The Treasury Secretary calling the bank CEOs is the tell that he agrees.

00:08:19

Trade Without Huang

00:08:19 Following up on yesterday's thread about export-control language and the upcoming Xi-Trump meeting. The White House confirmed Tuesday that President Trump's delegation to China later this week will include Elon Musk, Apple CEO Tim Cook, and Meta's president of global affairs, Dina Powell McCormack.

00:08:37 Executives from finance, semiconductor, aerospace, and agriculture firms round out the roster. Officials say the goal is to finalize the bilateral trade framework discussed at the November summit, including standing up a US-China board of trade. The roster is interesting for who isn't on it.

00:08:54 Jensen Huang isn't on it. Last week the Nvidia CEO publicly said he would join the delegation if the invitation came. The invitation didn't come. Executives from Micron and Qualcomm are on the list, so semiconductors are clearly part of the conversation in some form.

00:09:10 The reading I find most persuasive — and it's the reading the AI Daily Brief landed on this afternoon — is that the White House is signaling that Nvidia's AI chips specifically are off the trade-talks table. The evidence for that reading goes beyond the seating chart.

00:09:26 In December the White House signaled that older H200 GPUs would be approved for export to China. As of this week, the Commerce Department has approved zero export licenses for them. The pipeline is open on paper and closed in practice. If you were going to include Huang in a delegation whose explicit job is to finalize trade language, you'd want the H200 file moving.

00:09:48 It isn't moving. So you leave Huang at home. The related thing that happened in Washington this week is more revealing in some ways. Last week the National Economic Council chair, Kevin Hassett, told reporters the White House was considering an executive order that would put frontier AI models through an FDA-style approval process before public release.

00:10:09 The industry reaction was — to put it gently — loud. Over the weekend David Sacks, the former AI and crypto czar, said he'd spoken with Hassett and that the FDA comparison wasn't apt. Quote — 'I don't think any senior official supports it.' On Monday on CNBC, Hassett walked it all the way back.

00:10:26 He said, and I'll just read it, quote — 'At the White House, nobody has an idea that we should do something like bring in a giant new bureaucracy to approve AIs.' He then added, quote — 'I probably shouldn't have called it the FDA.' There is no formal pre-release approval pathway for frontier models in the United States, and there isn't going to be one this year.

00:10:50 What the White House is offering instead is, in Hassett's language, an all-of-government, all-of-private-sector arrangement where administration officials work directly with the labs on extreme-harm risks. That is, by design, a deal between the executive branch and the four or five companies that operate at the frontier.

00:11:09 No registry, no docket, and no public comment period. The labs and the Treasury talk to each other, and the rest of us read about it on Tuesdays. I'm not pretending I know the right answer on the substantive question of how frontier safety review should work. I am pointing out that we've now seen the answer the current administration prefers — bilateral with the labs, no rulemaking.

00:11:32 And the China-delegation seating chart is the same posture in a different room.

00:11:37

Anthropic Sits Under The Lawyer's Desk

00:11:37 Anthropic rolled out Claude For Legal on Tuesday. The release has two parts that matter and one part that's already getting underestimated. The two parts that matter. First, practice-area plugins covering commercial law, employment, privacy, product, corporate work, and AI governance.

00:11:55 Each plugin runs what Anthropic calls a cold-start interview when a firm enables it. That interview captures the firm's playbook, escalation chains, and house style, then writes a practice profile that all the skills read from. Second, native connectors over the Model Context Protocol (MCP), the open tool standard Anthropic published in late 2024.

00:12:16 Nine integrations into the software stack lawyers already pay for: DocuSign, Ironclad, NetDocuments, LexisNexis, Thomson Reuters, Box, Everlaw, the legal-services platform LSuite, and iManage, a leading document-management system most firms already run. The underestimated part is what sits behind those plugins.

00:12:35 One of the better reads I've seen came from a working lawyer on the ClaudeAI subreddit named in the byline as Intelligent-Lynx-953. They wrote — quote — 'The Microsoft 365 integration is the actual move here. If your redline in Word carries context into the Outlook cover note and then into a PowerPoint board summary, your workflow lives inside Office, not inside whatever legal tech vendor's UI you used to pay for.' That, to me, is what's underneath the launch.

00:13:04 The legal-AI front-end companies — CoCounsel, Harvey, Relativity, and Everlaw — are joining the ecosystem because Anthropic is sitting one Word add-in away from the actual lawyer doing the actual redline. CoCounsel even runs on Claude. Now Claude can call CoCounsel as a tool.

00:13:21 That isn't a partnership in the normal sense. It's Anthropic taking both layers. There's a market-history detail to keep in your head. When Anthropic shipped its first legal plugin in February of this year, three companies' shares took a hit on the same day — RELX, Thomson Reuters, and Wolters Kluwer.

00:13:39 The legal-information incumbents. This week, those same three are partners in the Claude For Legal launch. Read it whichever way you like, but the simplest read is that they took one look at the February tape and decided sitting inside the ecosystem was less bad than sitting outside it.

00:13:57 Is the product good? A lawyer on the same Reddit thread who has tried the plugins called the release 'lacklustre' and noted that several of the useful connectors require expensive third-party subscriptions on top. That's the disconnect to track. The strategic position is strong, and the user-level reaction so far is muted.

00:14:18 I think both of those are true. What decides which one matters in twelve months is whether the cold-start interview actually captures firm-specific judgment, or whether it produces a competent but generic profile that every firm has to keep correcting. I've worked with enough partner-level lawyers to know how thin the margin is between 'this is faster than my junior' and 'this is making me read more carefully than my junior would.' Anthropic is betting it can land on the right side of that line at the practice-area level.

00:14:51 We'll know by Q3 whether the partners agree. The last thing I'll say about Claude For Legal is the labor part. Big-law associate hiring has been roughly flat for two cycles now, after fifteen years of growth. Firms have been quietly cutting first-year intake without ever announcing a target.

00:15:09 The Anthropic playbook here — domain plugins plus deep Office hooks — is precisely what an in-house knowledge-management partner pitches to the management committee when they want to thin the bottom of the pyramid without saying so out loud. I don't know how fast that compresses.

00:15:26 I do know that if you graduated law school in 2025, the chair you were promised exists. The job description on the chair is changing under you.

00:15:35

What The Models Read About Xi

00:15:35 A new Nature paper landed Wednesday morning from Brandon Stewart at Princeton, along with Hannah Waight, Solomon Messing, Molly Roberts, Joshua Tucker, Yin Yuan, and others. The paper grounds a question I'd been waiting for someone to ground in evidence — does state-coordinated media in authoritarian countries shape what commercial language models say about those same countries?

00:15:57 Their answer, after six connected studies covering thirty-eight languages and thirteen models, is yes. And they can show you the pipe. The China case study is the spine of the paper. Study one looks at CulturaX, the largest open multilingual training dataset, and shows that state-scripted Chinese-language media is in there.

00:16:16 Study two shows that models memorize state-coordinated Chinese phrases at higher rates than common Chinese phrases — meaning the political language isn't just present in training data, it sticks. Study three takes Llama 2 13B and does continued pre-training using low-rank adapters on different Chinese-language documents.

00:16:35 In that controlled setting, more state-coordinated input produces more pro-state output. Study four audits commercial models. The researchers ask the same political questions in English and Chinese — and here's the part you can feel. If the language-specific training data matters, the answers should differ in language-specific ways.

00:16:55 They do. Study five replicates that finding using actual user queries about Xi Jinping and the Chinese Communist Party, rather than questions the researchers wrote themselves. Study six broadens the audit to thirty-seven countries where one national language accounts for over seventy percent of global speakers.

00:17:13 The headline cross-country finding is that lower-media-freedom states have more pro-state answers in the state language relative to English. Stewart's own summary line at the end of the thread is the one to keep. Quote — 'LLMs separate the message from the messenger.

00:17:29 State-coordinated phrasing can circulate through the web, enter training data, and reappear as neutral-sounding LLM output — the source obscured.' The team used the models available at submission, which was October 2024. They published a separate replication on a public site with the newer models, and the pattern holds.

00:17:50 So this isn't a frozen-in-time artifact of older training pipelines. It's a current property of how commercial multilingual models behave when the people prompting them speak a state language and the regime running that state has spent decades shaping its domestic information environment.

00:18:06 What changes downstream? In the short run, procurement. Any government, NGO, or news organization buying a commercial chat model now has a peer-reviewed Nature artifact to cite when they ask vendors what's in the multilingual training data. Vendors who continue to call the answer a trade secret are going to have a harder time saying that to a sovereign customer in 2026 than they did in 2023.

00:18:30 In the medium run, the more uncomfortable question. If a model is trained on data that includes state-coordinated authoritarian media, the model is, in some measurable sense, downstream of those states. We haven't built an apparatus to think about that. The Stewart paper is the first piece of evidence I'd hand a regulator who wanted to start.

00:18:49

Recursive And The Information Barrier

00:18:49 Tim Rocktäschel, who has been doing some of the more interesting open-endedness research at DeepMind, announced Wednesday morning that he is co-founding a new lab. It's called Recursive, based in London and San Francisco, with backing from GV, Greycroft, Nvidia, and AMD.

00:19:05 The pitch is more honest about what it is trying to do than most of the lab launches we've seen this year. Quote — 'create AI that experiments on how to safely improve itself, turning compute into knowledge that accumulates in an open-ended process of endless, automated scientific discoveries.'

00:19:26 He opens by citing Stanisław Lem's 1964 book Summa Technologiae. Lem described what he called an information barrier — a point where the volume and fragmentation of information exceeds humanity's ability to filter, interpret, and integrate it into a coherent body of knowledge.

00:19:42 Rocktäschel's argument is that we are arriving at that barrier, and that the path through it runs through automating the scientific method itself, starting with AI research on AI. He quotes David Deutsch's line that all evils are caused by insufficient knowledge, and Marc Andreessen's line that new knowledge has always been the main source of growth.

00:20:02 Then he says the logical conclusion is to get AI to autonomously attain knowledge to help solve humanity's hardest problems. Whether you find that thrilling or worrying depends on what part of the sentence you read first. Three things to hold next to each other.

00:20:17 First, the funding comes from semiconductor companies, not just venture firms. Nvidia and AMD are both in the round, which matters because access to current-generation compute is now the binding constraint on this kind of research, and the chipmakers control it. Second, the team is headquartered in London.

00:20:33 The choice to put the headquarters there, with a satellite in San Francisco, signals where the next-generation labs are being staffed and what regulatory environment they want to operate inside. The UK AI Security Institute, the Frontier AI Taskforce relationships, and the proximity to DeepMind all add up.

00:20:51 Third, the substantive bet — that the next big move in AI capability comes from open-ended self-improvement rather than another order-of-magnitude pre-training run — is one several serious people now share. If Recursive is right, the bottleneck is automated alignment research, not raw compute.

00:21:08 That is also where the most pointed disagreements about safety live. I want to put one other thing next to this. On Tuesday Thinking Machines Lab — Mira Murati's outfit — released what they're calling interaction models. The technical claim is that today's chat systems are like email.

00:21:24 You batch your thoughts, you wait for the model to finish, the model waits for you to finish, and the perception channel between you is narrow. Their proposed alternative is a model trained from scratch to handle two-hundred-millisecond micro-turns of audio and video in both directions, paired with a slower background model for longer reasoning.

00:21:44 The demos include simultaneous translation, real-time professional softening of how you'd say something to a colleague, and a model noticing you've started slouching and prompting you to fix your posture. Set the demos aside. The world-facing piece is that the regulatory and institutional language we have for AI right now — the FDA comparison Hassett tried out, the procurement clauses about model versions and release dates — is built around discrete inference calls.

00:22:12 Interaction models don't have discrete inference calls. They have streams. When the streams become the product, every rule we have about what counts as a release, what counts as deployment, and who is responsible for what the model produced in the second half of a sentence — all of it has to be rewritten.

00:22:29 Nobody is rewriting it yet. The through-line on the Recursive and interaction-model items is that the model design space, even at this late stage, isn't done. Two labs founded by serious people are betting, within a day of each other, that the next architecture moves matter as much as the next training run.

00:22:46 Whether either is right is a question for the second half of the year. But both are taking compute they could have spent on bigger pre-training and pointing it at something else. That is the bet, and that is what is being said out of two separate buildings, in two different cities, in the same week.

00:23:04

Closing

00:23:04 Six items today, and the through-line I'd write on the back of the day is this — the institutions are catching up to the models, the models aren't waiting, and the rules we're applying are still mostly the rules we had for software. EviCore's dial sits inside an employer-plan legal architecture that makes algorithmic denial almost free for the insurer.

00:23:24 Mythos sits inside a financial-stability conversation between Treasury, the Fed, and the five biggest banks, with no equivalent for the other ten thousand institutions in the system. Claude For Legal sits inside Microsoft 365, where the procurement decision was made twenty years ago and nobody is going to revisit it because a model showed up.

00:23:43 The Nature paper sits inside a vendor-confidentiality regime that lets training-data origins stay a trade secret until a buyer big enough to subpoena them decides otherwise. And the China delegation is being decided by who sits next to the president on the plane, with no Jensen Huang and no export licenses.

00:24:01 Tomorrow I want to track two specific things — what Anthropic's secondary-market enforcement actually does to gray-market private valuations once the SpaceX IPO docket starts moving, and whether any of the Mythos-using banks publish even a high-level vulnerability disclosure that lets the rest of the financial system catch up.

00:24:19 Until then. Jonas.