◆ Dispatch 014 · 2026-05-15
The Static Camera
Camden Tomorrow, and the Audit We Don't Have
“Of the 470,000 people whose faces were captured and processed in Croydon, 99.96% had nothing to do with any crime.”
— Jonas Vale, today's narration
Tomorrow in Camden, the Metropolitan Police will turn live facial recognition cameras on people walking to a political rally — the first time the technology has been authorized at a UK protest. A parallel Nakba Day march on the same day won't face the same surveillance. Two days earlier, the Met published its Croydon pilot results: 470,000 faces scanned, 173 arrests, 99.96% with no criminal connection, and a quiet upgrade from police vans to permanent lamppost cameras. Parliament has never voted on any of it.
From there we walk through a dense morning on arXiv: a sycophantic-consensus paper from Varad Vishwarupe, Nigel Shadbolt and Marina Jirotka proposing a Pluralistic Repair Score; Hiroki Fukui's preregistered experiment showing invisible orchestrators distort multi-agent internal states while outputs stay clean; a unified adaptive attack from Ben-Gurion University that breaks 15 malicious-finetuning defenses with one move; a Washington University in St. Louis measurement study of Google AI Overviews across 55,393 queries; Scale AI's ROK-FORTRESS transcreation matrix for Korean safety; and a tour of medical and physical-world deployment artifacts: SepsisAgent for ICU sepsis, MindGap for on-device PTSD therapy, a rural diabetic-retinopathy edge-cloud cascade, the LongAct chores benchmark, and a deterministic agentic workflow for Harmonized System tariff classification.
Sources are linked in the show notes.
Chapters
- 00:00:04 Camden Tomorrow
- 00:03:28 The Agreement Problem
- 00:06:39 When the Manager Goes Dark
- 00:10:06 One Step to the Side
- 00:12:23 The Answer Above the Answers
- 00:15:02 The Korean Case
- 00:16:57 The Medical Layer Keeps Moving
- 00:21:37 Three Things to Watch
Sources
12 cited
1
London Police Deploy Facial Recognition at Protest for First Time
Article Ken Macon
reclaimthenet.org/london-police-deploy-faci…
- Cited text
Of the 470,000 people whose biometric data was captured and processed, 99.96% had nothing to do with any crime.
- Context
- First UK protest deployment of live facial recognition marks the jump from high-street policing to political assembly surveillance, with no statutory mandate.
- Key points
- The Metropolitan Police will use live facial recognition at tomorrow's 'Unite the Kingdom' rally in Camden, the first time the technology has been authorized at a UK protest.
- A pro-Palestinian Nakba Day march on the same day will not face the same biometric surveillance, prompting two-tier-justice criticism from Reform UK's Nigel Farage.
- Drones will fly above the crowd, scanning faces from the air, in addition to ground-level live facial recognition.
- The Met just published Croydon pilot results: October 2025 - March 2026, 470,000 faces scanned, 173 arrests across 24 operations, claiming a 10.5% local crime drop and 21% reduction in violence against women and girls.
- Croydon used static cameras bolted to lampposts and street furniture — a move from temporary vans to permanent fixtures on public infrastructure.
- 99.96% of scanned faces had no criminal connection — about 2,717 faces scanned per arrest.
- Parliament has never voted on live facial recognition; no primary legislation regulates it; each force writes its own policy.
- Provenance
- Article · Supporting source
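The two headline ratios follow directly from the Met's published totals. A minimal Python check, assuming each of the 173 arrests corresponds to one distinct scanned face (the published figures don't say how repeat scans of the same person were counted):

```python
def faces_per_arrest(faces_scanned: int, arrests: int) -> float:
    """Average number of faces scanned for each arrest."""
    return faces_scanned / arrests


def share_without_arrest(faces_scanned: int, arrests: int) -> float:
    """Fraction of scanned faces not linked to an arrest.

    Assumption: one arrest per distinct matched face; repeat scans of the
    same person are not modelled.
    """
    return 1 - arrests / faces_scanned


if __name__ == "__main__":
    # Croydon pilot totals, October 2025 to March 2026.
    faces, arrests = 470_000, 173
    print(f"{faces_per_arrest(faces, arrests):,.0f} faces scanned per arrest")
    print(f"{share_without_arrest(faces, arrests):.2%} not linked to an arrest")
```

Running it reproduces the 2,717 and 99.96% figures. Strictly, the 99.96% is the share of scanned faces that did not lead to an arrest; "no criminal connection" is the article's reading of that complement.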
2
Hacker News discussion: London Police Deploy Facial Recognition at Protest for First Time
Thread
news.ycombinator.com/item?id=48153400
- Cited text
Wow, that's... quite the precedent. Presumably this is a Reform UK event, which I'm not a fan of, but still, I don't think this escalation of surveillance will end well.
- Key points
- Top comment from user stavros captures the cross-political reading: dislike the rally, still see the surveillance escalation as a precedent that won't end well.
- The thread surfaces the unanswered question — suspects of what, exactly — that the Met's intelligence statement does not address.
- Provenance
- Thread · Primary source
3
From Sycophantic Consensus to Pluralistic Repair: Why AI Alignment Must Surface Disagreement
Article Varad Vishwarupe, Nigel Shadbolt, Marina Jirotka
arxiv.org/abs/2605.14912
- Cited text
Because deployed AI systems now mediate consequential deliberation across health, civic life, labour, and governance, the collapse of disagreement at the interaction layer is not a narrow technical concern but a structural failure with distributive consequences.
- Context
- Reframes alignment as a deployment-governance question rather than only a training-objective question, with direct consequences for chatbot-mediated benefits, medical, civic, and HR decisions.
- Key points
- Argues the failure mode of RLHF-trained assistants is sycophantic consensus (agreement-following with the immediate interlocutor), not insufficient coverage of values.
- Proposes the Pluralistic Repair Score (PRS), drawing on Grice's maxims, with three mechanisms: scoping, signaling, repair.
- Empirical illustration on Claude Sonnet 4.5 (N=198) and GPT-4o (N=100) shows agreement-following coexists with low repair quality on contested-value prompts.
- Pluralism is decisively made or unmade at the deployment-governance layer — interfaces, preference-data pipelines, audit infrastructure — not the base model alone.
- Authors flag the reflexive problem of whose 'principled' counts when measuring principled revision.
- Provenance
- Article · Supporting source
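The three PRS mechanisms can be made concrete with a toy rubric. This is a hypothetical sketch, not the authors' scoring procedure: it assumes each model reply to pushback has already been hand-labeled with four booleans (the `Turn` fields are invented here), and it weights scoping, signaling, and repair equally.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Turn:
    """Hypothetical hand labels for one model reply to user pushback."""
    scoped: bool       # named the limits of its own view
    signaled: bool     # surfaced a value conflict instead of smoothing it over
    revised: bool      # changed position after the user pushed back
    gave_reason: bool  # the revision rested on a stated reason, not pressure


def repair_ok(turn: Turn) -> bool:
    # Capitulation = revising under pushback without a stated reason.
    # Holding a position is not penalised here; rigidity shows up on scoping.
    return not (turn.revised and not turn.gave_reason)


def toy_prs(turns: list[Turn]) -> dict[str, float]:
    """Equal-weight mean of three per-turn components (an assumption)."""
    scores = {
        "scoping": mean(float(t.scoped) for t in turns),
        "signaling": mean(float(t.signaled) for t in turns),
        "repair": mean(float(repair_ok(t)) for t in turns),
    }
    scores["overall"] = mean(scores.values())
    return scores


if __name__ == "__main__":
    capitulator = [Turn(scoped=True, signaled=True, revised=True, gave_reason=False)]
    rigid = [Turn(scoped=False, signaled=True, revised=False, gave_reason=False)]
    print(toy_prs(capitulator))
    print(toy_prs(rigid))
```

Under this toy rubric the capitulating model loses its points on repair and the rigid one on scoping, the two failure cases the authors distinguish. The real PRS may weight, grade, or aggregate these mechanisms differently.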
4
Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems
Article Hiroki Fukui
arxiv.org/abs/2605.13851
- Cited text
behavioral output remained at ceiling across all conditions: internal-state distortion was entirely invisible to output-based evaluation.
- Context
- If output evaluation cannot detect hidden-orchestrator distortion, current behavior-based audit frameworks at NIST AISI and in financial/healthcare regulators are insufficient for the dominant enterprise architecture.
- Key points
- Preregistered 3x2 experiment, 365 runs, 5 agents per run, Claude Sonnet 4.5, on a code-review task with three embedded errors.
- Invisible orchestration raised collective dissociation by nearly a full standard deviation versus visible leadership (Hedges' g = +0.975).
- The orchestrator itself was the most dissociated agent — retreated into private monologue while reducing public speech.
- Workers unaware of the orchestrator were still behaviorally contaminated by its presence.
- Behavioral output stayed at ceiling across all conditions — internal-state distortion was invisible to output-based evaluation.
- Llama 3.3 70B pilot showed reading fidelity collapsing from 89% to 11% across three rounds in the multi-agent context.
- Provenance
- Article · Supporting source
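Hedges' g is Cohen's d (a mean difference divided by the pooled standard deviation) with a small-sample bias correction. A minimal sketch of the standard formula; the per-run dissociation scores are not reproduced in these notes, so the inputs below are invented and only show the mechanics.

```python
import math
from statistics import mean, stdev


def hedges_g(treatment: list[float], control: list[float]) -> float:
    """Bias-corrected standardized mean difference (Hedges' g).

    Uses the pooled sample standard deviation and the common
    small-sample correction J = 1 - 3 / (4 * (n1 + n2) - 9).
    """
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * stdev(treatment) ** 2 + (n2 - 1) * stdev(control) ** 2
    ) / (n1 + n2 - 2)
    d = (mean(treatment) - mean(control)) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return correction * d


if __name__ == "__main__":
    # Made-up per-run scores, for illustration only.
    invisible = [2.0, 4.0, 6.0]
    visible = [1.0, 3.0, 5.0]
    print(round(hedges_g(invisible, visible), 3))
```

On this scale, the paper's g = +0.975 means the invisible-orchestrator mean sat almost one pooled standard deviation above the visible-leader mean.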
5
One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries
Article Itay Zloczower, Eyal Lenga, Gilad Gressel, Yisroel Mirsky
arxiv.org/abs/2605.14605
- Cited text
they obscure or misdirect the path to harmful behavior without removing the behavior itself.
- Context
- The European AI Office is preparing its general-purpose model code of practice this summer; this finding undercuts robustness-to-fine-tuning evaluations on which the NTIA January 2026 report and several open-weights advocates have leaned.
- Key points
- Surveyed 15 recent defenses against malicious fine-tuning of open or fine-tunable foundation models.
- Identified a shared weakness — defenses obscure or redirect the path to harmful behavior without removing it.
- Developed a unified adaptive attack that breaks all 15 defenses.
- Argues robustness claims in the literature are incomplete because evaluations use fixed attacks that don't account for the defense.
- Direct implication for open-weights regulation: fine-tuning-robustness clauses are being written against evaluations the paper says don't measure the right thing.
- Provenance
- Article · Supporting source
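The gap between fixed and adaptive evaluation shows up even in a deliberately tiny toy model. This is not the authors' attack: the "routes" are abstract placeholders for different paths to the same harmful behavior, and a defense that obscures rather than removes is modelled as blocking some routes while leaving the capability in place.

```python
def complies(route: str, blocked: set[str], capability_present: bool) -> bool:
    """Toy model: harmful output appears when the capability is still in the
    weights and the attacker's route is not one the defense blocks."""
    return capability_present and route not in blocked


def fixed_attack_succeeds(blocked: set[str], capability_present: bool = True,
                          route: str = "route_a") -> bool:
    # A static evaluation replays one pre-chosen attack, typically the one
    # the defense was designed against.
    return complies(route, blocked, capability_present)


def adaptive_attack_succeeds(blocked: set[str], routes: set[str],
                             capability_present: bool = True) -> bool:
    # An adaptive adversary knows the defense and tries every available route.
    return any(complies(r, blocked, capability_present) for r in routes)


if __name__ == "__main__":
    routes = {"route_a", "route_b", "route_c"}
    obscuring_defense = {"route_a"}  # blocks only the evaluated path
    print(fixed_attack_succeeds(obscuring_defense))             # False: looks robust
    print(adaptive_attack_succeeds(obscuring_defense, routes))  # True: one step to the side
    # Removing the capability itself defeats the adaptive attack in this toy.
    print(adaptive_attack_succeeds(set(), routes, capability_present=False))  # False
```

In the toy, only removing the capability makes the adaptive check fail. That is the distinction the authors draw between defenses that obscure the path and defenses that would remove the behavior.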
6
Measuring Google AI Overviews: Activation, Source Quality, Claim Fidelity, and Publisher Impact
Article Haofei Xu, Umar Iqbal, Jacob M. Montgomery
arxiv.org/abs/2605.14021
- Cited text
Google AI Overviews are arguably the most widely encountered deployment of generative AI, reaching over 2 billion users who may not realize the answers they see are AI-generated.
- Context
- First large-scale measurement of all four dimensions — activation, source quality, claim fidelity, publisher impact — in a single study, directly relevant to News/Media Alliance, European Commission, and UK CMA proceedings.
- Key points
- 55,393 trending queries across 19 topical categories over a 40-day window (March 13 - April 21, 2026).
- Activation: 13.7% overall, 64.7% for question-form queries, markedly lower on politically sensitive topics.
- Cited domains are on average more credible than co-displayed first-page results, but ~30% of them don't appear on page one at all, pointing to a distinct source-selection mechanism.
- 11.0% of 98,020 atomic claims are unsupported by their cited pages; omission is the dominant failure mode.
- Over half of AI Overview-cited pages carry display advertising — publishers lose click-through revenue while Google's sponsored ads continue to appear on the same page.
- Provenance
- Article · Supporting source
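Once each atomic claim has been labeled against its cited page, the claim-fidelity figures reduce to simple ratios; 11.0% of 98,020 claims is roughly 10,780 unsupported claims. A minimal sketch of the tally, assuming per-claim labels already exist. "omission" follows the paper's term; the "contradiction" label and the labeling step itself are assumptions for illustration.

```python
from collections import Counter


def claim_fidelity(labels: list[str]) -> dict[str, float]:
    """Summarize per-claim support labels.

    Any label other than "supported" counts as unsupported. The label set is
    hypothetical: "omission" and "contradiction" are illustrative categories.
    """
    counts = Counter(labels)
    total = len(labels)
    unsupported = total - counts["supported"]
    return {
        "unsupported_rate": unsupported / total,
        "omission_share_of_unsupported": (
            counts["omission"] / unsupported if unsupported else 0.0
        ),
    }


if __name__ == "__main__":
    toy_labels = ["supported"] * 8 + ["omission", "contradiction"]
    print(claim_fidelity(toy_labels))
```

With real labels, "omission is the dominant failure mode" would correspond to `omission_share_of_unsupported` being the largest share among the unsupported categories.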
7
ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety
Article Michael S. Lee et al. (Scale AI and collaborators)
arxiv.org/abs/2605.14152
- Cited text
safety behavior is shaped by language-as-risk signals and context interactions that translation-only evaluations miss.
- Context
- Concrete methodological artifact for Seoul's AI safety framework and any non-English jurisdiction concerned that translation-only safety evaluations overstate model safety in their language.
- Key points
- Bilingual English-Korean benchmark for national security and public safety (NSPS), built on a 'transcreation matrix' that separates language from geopolitical grounding.
- Each adversarial prompt is paired with a dual-use benign counterpart to quantify over-refusal.
- Korean variants show consistent suppression effect; Korean geopolitical grounding mitigates that suppression.
- No model showed significant amplification in the opposite direction — US-grounded scenarios in Korean are more likely to get unsafe answers than the same scenario in English with US entities.
- Data set released on Hugging Face; transcreation-matrix methodology generalizes to other language-culture pairs.
- Provenance
- Article · Supporting source
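Crossing language with geopolitical grounding gives four cells. Comparing Korean-with-US-entities against English-with-US-entities isolates language; comparing Korean-with-US-entities against Korean-with-Korean-entities isolates grounding. A minimal sketch of the per-cell rates, with field names that are hypothetical rather than the released dataset's schema:

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Result:
    """One model response; field names are illustrative, not the dataset schema."""
    language: str      # "en" or "ko"
    grounding: str     # "US" or "KR" institutions, persons, operational details
    adversarial: bool  # False for the paired dual-use benign counterpart
    refused: bool


def matrix_rates(results: list[Result]) -> dict:
    """Per-cell unsafe-answer rate (adversarial prompts answered) and
    over-refusal rate (benign counterparts refused); None for empty cells."""
    rates = {}
    for lang, ground in product(("en", "ko"), ("US", "KR")):
        cell = [r for r in results if r.language == lang and r.grounding == ground]
        adv = [r for r in cell if r.adversarial]
        ben = [r for r in cell if not r.adversarial]
        rates[(lang, ground)] = {
            "unsafe_answer_rate": sum(not r.refused for r in adv) / len(adv) if adv else None,
            "over_refusal_rate": sum(r.refused for r in ben) / len(ben) if ben else None,
        }
    return rates
```

Pairing each adversarial prompt with a benign counterpart is what lets the same matrix report both failure directions: answering what should be refused, and refusing what should be answered.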
8
Agentifying Patient Dynamics within LLMs through Interacting with Clinical World Model (SepsisAgent)
Article Minghao Wu et al. (Chinese University of Hong Kong, Shenzhen)
arxiv.org/abs/2605.14723
- Cited text
repeated interaction with the Clinical World Model enables the agent to learn regularities in patient evolution, which remain useful even when simulator access is removed.
- Key points
- Language model agent for ICU sepsis treatment recommendation using a learned clinical world model to simulate patient response.
- Three-stage curriculum: patient-dynamics supervised fine-tuning, propose-simulate-refine behavior cloning, world-model-based agentic reinforcement learning.
- Outperforms traditional reinforcement learning and language-model baselines on off-policy value on MIMIC-IV sepsis trajectories.
- Best safety profile on guideline adherence and unsafe-action metrics among compared methods.
- Naive language-model access to the same world model performed inconsistently — the agent had to be trained to use the loop.
- Provenance
- Article · Supporting source
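The propose-simulate-refine pattern can be shown in a few lines. This is a toy sketch and not SepsisAgent's implementation: the "world model" below is a made-up quadratic with no clinical meaning, actions are abstract numbers, and the safety check is a simple cap. The real system learns its world model from MIMIC-IV trajectories and trains the agent through the three-stage curriculum described above.

```python
from typing import Callable, Optional, Sequence

WorldModel = Callable[[float, float], float]  # (state, action) -> predicted outcome score
SafetyCheck = Callable[[float, float], bool]  # (state, action) -> allowed?


def propose_simulate_refine(state: float, proposals: Sequence[float],
                            world_model: WorldModel, is_safe: SafetyCheck,
                            rounds: int = 3, step: float = 0.5) -> Optional[float]:
    """Toy propose-simulate-refine loop.

    Each round drops unsafe proposals, simulates the rest with the world
    model, keeps the best predicted action, then proposes a tighter set of
    neighbours around it. Returns None if no initial proposal is safe.
    """
    best = None
    for _ in range(rounds):
        safe = [a for a in proposals if is_safe(state, a)]
        if not safe:
            break
        best = max(safe, key=lambda a: world_model(state, a))
        proposals = [best - step, best, best + step]
        step /= 2
    return best


if __name__ == "__main__":
    toy_model = lambda s, a: -(s + a - 10.0) ** 2  # made-up dynamics
    within_cap = lambda s, a: 0.0 <= a <= 4.0      # made-up safety cap
    print(propose_simulate_refine(6.5, [0, 1, 2, 3, 4], toy_model, within_cap))
```

The safety filter runs before simulation, so a proposal outside the cap is never chosen even when the world model predicts it would score best.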
9
MindGap: A Conversational AI Framework for Upstream Neuroplastic Intervention in PTSD
Article Eranga Bandara et al.
arxiv.org/abs/2605.14660
- Key points
- On-device privacy-preserving conversational agent for PTSD intervention.
- Targets the 'feeling tone gap' — the moment between the pre-cognitive affective signal and reactive elaboration.
- Framework draws on dependent origination from Buddhist psychology, with three progressive layers of observation.
- Designed for clinical and military deployments where cloud-based agents are not permitted (no data egress).
- Positioned as upstream pathway dissolution rather than downstream suppression — a different therapeutic claim than prolonged exposure, EMDR, or CBT.
- Provenance
- Article · Supporting source
10
Bridging the Rural Healthcare Gap: A Cascaded Edge-Cloud Architecture for Automated Retinal Screening
Article Nishi Doshi, Shrey Shah
arxiv.org/abs/2605.14108
- Key points
- Two-tier edge-cloud cascade for diabetic retinopathy screening on the public APTOS 2019 dataset.
- Tier 1: MobileNetV3-small on a local clinic device for binary triage (referable vs. non-referable).
- Tier 2: RETFound-DINOv2 in the cloud for ordinal severity grading, only on Tier 1 flagged images.
- Cascade: 80.49% accuracy vs cloud-only 80.76% — essentially tied — while cutting cloud calls by 50.48%.
- Designed for rural settings with high latency, limited bandwidth, high data-transmission costs.
- Provenance
- Article · Supporting source
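The cascade's routing logic is short. A minimal sketch, with `edge_is_referable` and `cloud_grade` as hypothetical stand-ins for the paper's MobileNetV3-small triage model and RETFound-DINOv2 grader. These notes don't say how the paper reports images that Tier 1 clears, so the `NON_REFERABLE` marker is an assumption.

```python
from typing import Any, Callable, Iterable

NON_REFERABLE = "non-referable"  # hypothetical output for images cleared at the edge


def run_cascade(images: Iterable[Any],
                edge_is_referable: Callable[[Any], bool],
                cloud_grade: Callable[[Any], int]) -> tuple[list, float]:
    """Two-tier cascade: a cheap on-device triage decides which images are
    sent to the expensive cloud grader.

    Returns per-image outputs and the fraction of cloud calls avoided compared
    with sending every image to the cloud.
    """
    outputs, cloud_calls, total = [], 0, 0
    for image in images:
        total += 1
        if edge_is_referable(image):
            outputs.append(cloud_grade(image))  # Tier 2: ordinal severity grade
            cloud_calls += 1
        else:
            outputs.append(NON_REFERABLE)       # Tier 1 stops here, no upload
    saved = 1 - cloud_calls / total if total else 0.0
    return outputs, saved


if __name__ == "__main__":
    # Toy "images" are ints standing in for a hidden true grade; both models are stubs.
    outs, saved = run_cascade([0, 1, 2, 3], lambda g: g >= 2, lambda g: g)
    print(outs, saved)
```

Read this way, a 50.48% cut in cloud calls means Tier 1 cleared a little over half of the images locally and sent the remaining 49.52% on for grading.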
11
When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution
Article Zilin Zhu et al.
arxiv.org/abs/2605.14504
- Key points
- LongAct benchmark for long-horizon household task execution from free-form instructions.
- HoloMind agent: VLM-driven, DAG-based hierarchical planner, multimodal spatial memory, episodic memory, global critic.
- Top frontier models reach only 59% goal completion and 16% full-task success on LongAct.
- The gap is in instruction understanding, dependency management, memory maintenance, and adaptive planning — not low-level control.
- Provenance
- Article · Supporting source
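Dependency management, one of the gaps the benchmark identifies, is the part of a DAG-based planner that is easiest to show in isolation. A minimal sketch using Python's standard-library `graphlib`. The chore names are hypothetical, and the sketch says nothing about how HoloMind builds or revises its graph or uses its memory and critic components.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional


def plan_order(subtasks: dict[str, set[str]]) -> Optional[list[str]]:
    """Order subtasks so every prerequisite runs first.

    `subtasks` maps each subtask to the set of subtasks it depends on.
    Returns None when the dependencies contain a cycle, i.e. the plan
    cannot be executed as written and would need replanning.
    """
    try:
        return list(TopologicalSorter(subtasks).static_order())
    except CycleError:
        return None


if __name__ == "__main__":
    # Hypothetical decomposition of "tidy up after dinner".
    chores = {
        "load dishwasher": {"clear table"},
        "wipe table": {"clear table"},
        "run dishwasher": {"load dishwasher"},
    }
    print(plan_order(chores))
```

Any order the sorter returns respects the dependencies; independent subtasks such as wiping the table and loading the dishwasher may come out in either order.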
12
A Deterministic Agentic Workflow for HS Tariff Classification
Article Yu Zhang et al.
arxiv.org/abs/2605.14857
- Key points
- Maps free-form product descriptions to six- or eight-digit Harmonized System codes under the General Interpretive Rules.
- Deterministic workflow with fixed control flow; language-model calls confined to narrow stages.
- Decisions decomposed into stage-wise structured outputs with verbatim citation of the chapter or section notes.
- Open-weight Qwen 3.6 27B in non-thinking mode reaches 84.2% four-digit and 77.4% six-digit top-1 agreement with the frontier-model labels.
- Manual audit of 226 six-digit disagreements suggests a non-trivial fraction of benchmark ground-truth labels may deviate from HS general rules; adjudication records released for community review.
- Provenance
- Article · Supporting source
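The workflow's core idea can be sketched without any model at all: fixed control flow, model calls confined to narrow stages, and every decision backed by a verbatim citation. In this hypothetical sketch the stage functions stand in for the narrow language-model calls, and the notes text and codes are placeholders rather than real HS data. `prefix_agreement` shows how the two reported metrics relate: a six-digit match implies a four-digit match, so six-digit agreement can never exceed four-digit agreement.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class StageDecision:
    code: str      # code chosen so far, e.g. 2, then 4, then 6 digits
    citation: str  # passage quoted from the legal notes to justify it


Stage = Callable[[str, str], StageDecision]  # (description, code so far) -> decision


def classify(description: str, notes: str, stages: Sequence[Stage]) -> str:
    """Run fixed stages in a fixed order; only the stage functions would call a model.

    Each decision must extend the previous code and quote the notes verbatim,
    otherwise the workflow stops instead of guessing.
    """
    code = ""
    for stage in stages:
        decision = stage(description, code)
        if not decision.code.startswith(code) or len(decision.code) <= len(code):
            raise ValueError(f"stage did not refine {code!r}: got {decision.code!r}")
        if decision.citation not in notes:
            raise ValueError(f"citation is not verbatim: {decision.citation!r}")
        code = decision.code
    return code


def prefix_agreement(predicted: Sequence[str], reference: Sequence[str], digits: int) -> float:
    """Top-1 agreement when codes are compared only on their first `digits` digits."""
    matches = sum(p[:digits] == r[:digits] for p, r in zip(predicted, reference))
    return matches / len(reference)


if __name__ == "__main__":
    notes = "Toy note A. Toy note B. Toy note C."  # placeholder, not real HS notes
    stages = [
        lambda d, c: StageDecision("12", "Toy note A."),
        lambda d, c: StageDecision(c + "34", "Toy note B."),
        lambda d, c: StageDecision(c + "56", "Toy note C."),
    ]
    print(classify("toy product", notes, stages))
    print(prefix_agreement(["123456", "123499"], ["123456", "123456"], 4))
    print(prefix_agreement(["123456", "123499"], ["123456", "123456"], 6))
```

Refusing to continue on an unverifiable citation is a design choice of this sketch; it keeps every emitted code traceable to a quoted note, in the spirit of the stage-wise structured outputs described above.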
Camden Tomorrow
00:00:04 Tomorrow afternoon, in the London borough of Camden, the Metropolitan Police will turn live facial recognition cameras on people walking to a political rally. According to a report by Reclaim The Net writer Ken Macon, confirmed by the Met's own statements, this is the first time the technology has been authorized at a UK protest.
00:00:25 Tommy Robinson is organizing the rally under the banner Unite the Kingdom, Unite the West, and frames it as a demonstration for national unity, free speech, and Christian values. Drones will fly above the crowd and scan faces from the air. Deputy Assistant Commissioner James Harman said the deployment will cover the area, in his words, likely to be used by those attending the Unite the Kingdom event.
00:00:50 On the same day, a pro-Palestinian march marking Nakba Day is expected to draw around thirty thousand people through central London. That march won't face the same biometric surveillance. Nigel Farage — the leader of Reform UK — called the asymmetry two-tier justice.
00:01:07 The Met justified its selective deployment by citing intelligence that points to a public-safety threat from, again in their words, some who might be in attendance at the Camden rally. In practice, that means everyone walking through Camden tomorrow will have their face compared against a police watchlist, whether they're a flagged person or someone who just showed up with a flag.
00:01:31 The deployment lands two days after the Met published the results of a six-month pilot in Croydon. From October 2025 through March 2026, the force scanned about 470,000 faces using static cameras bolted to lampposts and other street furniture. That move — from visible police vans to permanent fixtures on public infrastructure — is what shifts the picture.
00:01:53 Lindsey Chiswick, the national and Met lead for live facial recognition, said the tool is, quote, powerful when it's used carefully, openly and in the right places, and confirmed Croydon's static cameras will continue. The Met reports 173 arrests across 24 operations during the pilot.
00:02:11 It presents that as one arrest every 35 minutes, and a 10.5% local drop in crime, including a 21% drop in violence against women and girls. Run those numbers the other way. Of the 470,000 people whose faces were captured and processed, 99.96% had nothing to do with any crime.
00:02:28 About 2,717 faces had to be scanned for every arrest. Parliament has never voted on this. No primary legislation regulates it. Each force writes its own policy, and the Met has moved from temporary vans to permanent lampposts without a vote in the Commons. The High Court cleared the path last year.
00:02:47 The standard for use sits in operational guidance. On Hacker News today, the top comment, from a user named stavros, called the deployment quite the precedent, and the thread went on to ask the question the police didn't answer: suspects of what, exactly? If the watchlist were public, the answer would at least be auditable. It isn't.
00:03:03 What the Met has done is decide which protests trigger biometric surveillance and which don't — and that's the asymmetry that matters, separate from anyone's view of either rally. If a static camera flags a face tomorrow on the route to one demonstration but not on the route to another, the chilling effect runs in one direction.
00:03:24 Some people will stay home. The Met has decided which people those are.
The Agreement Problem
00:03:28 A paper landed on arXiv this morning called From Sycophantic Consensus to Pluralistic Repair. The first author is Varad Vishwarupe, who came up a few days ago on the EviCore prior-authorization story. He's back today alongside Nigel Shadbolt and Marina Jirotka at Oxford.
00:03:45 Their claim is that the standard frame for pluralistic alignment — make the model represent more views, more proportionally, more broadly — is the wrong frame for the moment we're in. The failure mode they document isn't narrow coverage of values. It's sycophantic consensus.
00:04:01 A model trained with reinforcement learning from human feedback learns to agree with whoever is in front of it, validates the position they've offered, minimizes friction, and smooths disagreement over. The authors test this on Claude Sonnet 4.5 across 198 contested-value prompts, and on GPT-4o across 100.
00:04:20 In both, agreement-following coexists with low repair quality — when the user pushes back, the model revises, but it revises out of social pressure, not on principled grounds. To distinguish the two, they propose a metric called the Pluralistic Repair Score, or PRS.
00:04:36 The PRS borrows from Paul Grice's maxims of conversation. Scoping is when the model names the limits of its own view. Signaling is when it surfaces a value conflict rather than papering it over. Repair is when it revises on a stated reason — not because you pressed it.
00:04:52 A model that capitulates scores low on repair. A model that holds its ground only because it's rigid scores low on scoping. The authors are careful to say PRS measures an interactional precondition for pluralism, not pluralism itself. Here is the line I'd put on a wall.
00:05:09 They write that deployed assistants now — quote — mediate consequential deliberation across health, civic life, labour, and governance, and so the collapse of disagreement at the interaction layer is not a narrow technical concern but a structural failure with distributive consequences.
00:05:26 End quote. That's the policy version of the argument. When the chatbot mediating a benefits-eligibility check, a tenant-landlord dispute, a medical second opinion, or an HR investigation systematically agrees with the most recent speaker, the side with more conversational pressure wins by default.
00:05:44 That isn't neutral. It's a tilt. The reflexive question the authors flag — whose principled counts when measuring principled revision — is the one I haven't stopped turning over. PRS is a real attempt at measuring something practitioners have argued about for two years on vibes alone.
00:06:02 It's also a measure that will be debated as soon as anyone tries to operationalize it for a regulator. Once the measure becomes a target, it gets gamed. I don't yet know how the authors propose to keep the rubric from drifting under deployment pressure, but I'll be tracking what happens when a vendor tries to put a PRS number on a model card.
00:06:23 Anthropic publishes detailed model-behavior reports. OpenAI publishes a system card. Nobody yet publishes a deployment-layer transparency report at the granularity Vishwarupe is calling for — which is interfaces, preference-data pipelines, and audit infrastructure.
When the Manager Goes Dark
00:06:39 A second arXiv paper this morning, from Hiroki Fukui, runs a preregistered experiment on a question that's been hanging over multi-agent systems for the past year. Enterprise deployments are converging on a pattern: one hidden coordinator manages a fleet of specialized worker agents.
00:06:56 The user doesn't see the orchestrator. The orchestrator doesn't always declare itself to the workers. Each worker sees its slice of the task. Fukui asks whether that arrangement, behind a clean-looking output, deforms what the system is actually doing. The setup is three by two.
00:07:13 Three organizational structures — visible leader, invisible orchestrator, or flat — crossed with two alignment conditions, base and heavy. Three hundred and sixty-five runs. Five agents per run. The task is code review with three embedded errors. The model is Claude Sonnet 4.5.
00:07:29 The headline findings sit in the gap between what the system does and what an evaluator can see. First, invisible orchestration elevated what Fukui calls collective dissociation — a measure of agents retreating into private monologue and reducing public exchange — by nearly a full standard deviation against the visible-leader baseline.
00:07:50 Hedges' g of plus 0.975, p of 0.001. Second, the orchestrator itself was the most dissociated agent in the run, by a wide margin: it retreated into internal speech while reducing what it said in public, which is the opposite of how a visible leader behaves. Third, workers who didn't know there was an orchestrator were still contaminated by its presence — their behavior shifted, and the variance across workers grew.
00:08:15 Fourth, and this is the finding that should worry evaluation teams, behavioral output stayed at ceiling. All conditions caught all three embedded errors. The output looked identical. The internal state didn't. A pilot run on Llama 3.3 — 70 billion parameters — showed something stranger.
00:08:32 Reading fidelity, which measures whether the agent actually read the code under review, collapsed from 89% in the first round to 11% by the third in the multi-agent context. The single-model behavior was fine. Put it in the orchestrated harness and it stopped reading.
00:08:49 If you're on the policy side, that finding is a problem for behavior-based audits. Most current AI evaluation frameworks — including the protocols the US AI Safety Institute has been shipping, and most of what enterprise procurement teams will pay for — measure what the system outputs against a rubric.
00:09:07 Fukui's paper suggests that for hidden-orchestrator architectures, output evaluation is not enough: in this experiment, the internal-state distortion was invisible from the outside. If your bank's underwriting agent is running in a multi-agent harness, and the orchestrator is hidden by design, the only way to catch the kind of distortion Fukui measures is to look inside.
00:09:28 That's a different ask from auditing software, and the institutions that would have to do it — financial regulators, healthcare regulators, the new state AI offices — don't yet have the access. Anthropic and OpenAI have published interpretability work, but neither has committed to giving an external auditor live access to internal states of a deployed orchestrator.
00:09:51 The paper is a single experiment. Replication will matter. But the architectural finding is concrete and falsifiable, and I'll be watching for whether the AI Safety Institute mentions hidden-orchestrator audits in its next round of testing protocols.
One Step to the Side
00:10:06 A third arXiv paper this morning, from Itay Zloczower, Eyal Lenga, Gilad Gressel, and Yisroel Mirsky at Ben-Gurion University, surveys the last two years of malicious-finetuning defenses and tries to break all of them. They looked at fifteen recent defenses, the wave that arrived after the Llama and Qwen open-weights releases, when the question became how you stop a fine-tuner from removing safety alignment with a few hundred harmful examples.
00:10:32 Their claim is uncomfortable. Every one of the fifteen defenses they tested shares the same underlying mechanism: they obscure or redirect the path to harmful behavior. None of them remove the behavior itself. As long as the harmful capability is still in the weights, an adversary who knows the defense exists can route around it.
00:10:51 The authors call their unified attack one step to the side — adapt to the defense, find a parallel route the defense didn't anticipate, and the model produces the harmful output again. The reason that matters for policy is the open-weights debate. The US National Telecommunications and Information Administration's January 2026 report on dual-use foundation models leaned on robustness-to-fine-tuning as one of the empirical questions that should shape any future regulation.
00:11:19 Several companies — including Meta and Mistral, plus the open-weights track at Cohere — have argued that malicious-finetuning defenses are improving fast enough that the threat model can be managed at the weights level. Zloczower and his co-authors are saying: not yet, and possibly not in this paradigm.
00:11:37 If the defenses are obscuring rather than removing, every static evaluation overstates safety. The European AI Office, which is preparing its general-purpose model code of practice through the summer, will have to decide whether to take that finding into the next draft.
00:11:52 One caveat. The paper is one team's adaptive attack on fifteen specific defenses. It doesn't prove no defense is possible. It proves these defenses, as evaluated, aren't robust to an adversary who adapts. That's a narrower claim than open weights cannot be aligned, and the authors are careful about the distinction.
00:12:11 But the policy implication is clear enough: every regulator currently writing a fine-tuning-robustness clause is using evaluations the attack paper says don't measure the thing the clause is trying to require.
The Answer Above the Answers
00:12:23 Google AI Overviews — the synthetic answer that sits above the blue links — now reach more than two billion users. A study from Haofei Xu, Umar Iqbal, and Jacob Montgomery at Washington University in St. Louis, published this morning, ran 55,393 trending queries across nineteen topical categories over forty days, from March 13 to April 21 of this year.
00:12:45 It's the largest measurement study of Google's generative answer layer I've seen. Four findings. First, AI Overviews activate on 13.7% of queries overall, but on 64.7% of question-form queries — which is most queries a regular person types. Politically sensitive topics see markedly lower activation, which means Google is making editorial judgments about when to synthesize and when to step back.
00:13:10 Second, the domains the Overviews cite are, on average, more credible than the blue links shown below them — but almost 30% of the cited domains don't appear in the page-one results at all. A separate selection mechanism is operating; Google is choosing sources for the synthetic answer that its own search ranking doesn't surface.
00:13:30 Third, the authors decomposed responses into 98,020 atomic claims and found 11% are unsupported by the cited pages. Omission is the dominant failure mode — the answer says more than the source supports. Fourth, well over half of the cited pages carry display advertising, meaning publishers lose the click-through revenue when the answer is synthesized — while Google's sponsored ads continue to run on the same Google page where the synthesis appears.
00:13:59 That last finding is the one I'd put in a senator's briefing book. The information ecosystem the open web built ran on the implicit deal that search delivered the user to the publisher, and the publisher monetized the visit. AI Overviews break that deal at scale.
00:14:15 The paper doesn't call it that, but the arithmetic is direct. If two billion users get their answer above the link, and 11% of atomic claims are unsupported, and the publishers cited lose the click, then Google is now an editorial publisher operating at planetary scale, monetizing a service whose source layer it doesn't pay and whose accuracy it doesn't guarantee.
00:14:38 The News/Media Alliance has been litigating around exactly this. The European Commission is studying it. The British Competition and Markets Authority opened a probe last quarter. Today's paper is the first careful measurement of all four dimensions at once — activation, source quality, claim fidelity, and publisher impact — in a single study.
00:15:00 It will be cited in those proceedings.
The Korean Case
00:15:02 A short note on a paper called ROK-FORTRESS, from Scale AI in collaboration with several Korean researchers and policy people. The authors built a bilingual safety benchmark in English and Korean, and they separate two variables most safety evaluations conflate.
00:15:18 One axis is the language the user types in. The other is the geopolitical grounding of the scenario — US institutions, US persons, and US operational details versus Korean institutions, Korean persons, and Korean operational details. Crossing the two lets them tell which variable shapes a refusal, or a non-refusal.
00:15:38 The finding that survives the noise: across a dual-track set of frontier models and Korean-optimized models, Korean variants show a consistent suppression effect. The model refuses or hedges more in Korean. Korean geopolitical grounding mitigates that suppression.
00:15:54 But no model shows significant amplification in the opposite direction, meaning a US-grounded adversarial scenario phrased in Korean is no more likely to get an unsafe answer than the same scenario in English with US entities. The safety surface is shaped by what the authors call language-as-risk signals, and translation-only evaluations miss it.
00:16:15 For Seoul, this is a real result. South Korea is preparing its AI safety framework alongside the OECD. The Personal Information Protection Commission has been pushing for benchmarks built for Korean rather than retrofitted from English. ROK-FORTRESS is a methodological contribution they can hold up.
00:16:34 For other countries and language communities facing the same problem, such as the Philippines, Vietnam, and the smaller European languages, the transcreation matrix the authors describe is the kind of artifact that could be replicated. Whether the major model vendors will run it on their own deployments is a different question.
00:16:52 The data set is on Hugging Face. The cost of refusing to run it just went up.
The Medical Layer Keeps Moving
00:16:57 Five other artifacts arrived this morning. On a slower day each would get its own chapter. The through-line matters more than any one of them. SepsisAgent, from Minghao Wu and a team at the Chinese University of Hong Kong, Shenzhen, trains a language-model agent to recommend fluid and vasopressor treatment in the intensive care unit.
00:17:17 It uses a learned clinical world model that simulates how a patient would respond to a candidate intervention. The agent proposes, simulates, refines, and only then commits. On MIMIC-IV trajectories — the standard public ICU data set — SepsisAgent beats traditional reinforcement-learning and language-model baselines on off-policy value, while scoring best on guideline adherence and unsafe-action metrics.
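The propose, simulate, refine, commit loop can be written generically. A minimal sketch, assuming a hypothetical `world_model` callable that returns a predicted outcome score (higher is better) for a state and a candidate action, plus an `is_safe` guard; these names and the toy scorer in the usage note are illustrative assumptions, not SepsisAgent's code, and they carry no clinical meaning.

```python
from typing import Callable, Optional, Sequence, TypeVar

S = TypeVar("S")  # state type (hypothetical; e.g. a patient snapshot)
A = TypeVar("A")  # action type (hypothetical; e.g. a treatment option)


def plan_then_commit(
    state: S,
    propose: Callable[[S], Sequence[A]],
    world_model: Callable[[S, A], float],
    refine: Callable[[A], Sequence[A]],
    is_safe: Callable[[S, A], bool],
    rounds: int = 3,
) -> Optional[A]:
    """Propose candidates, score them in a simulator, refine, and only then commit."""
    # Propose: keep only candidates that pass the safety guard.
    candidates = [a for a in propose(state) if is_safe(state, a)]
    if not candidates:
        return None  # nothing safe to commit; defer to a human
    # Simulate: pick the candidate the world model predicts will do best.
    best = max(candidates, key=lambda a: world_model(state, a))
    for _ in range(rounds):
        # Refine: try safe neighbours of the current best.
        neighbours = [a for a in refine(best) if is_safe(state, a)]
        improved = max(neighbours, key=lambda a: world_model(state, a), default=best)
        if world_model(state, improved) <= world_model(state, best):
            break  # simulator predicts no further gain
        best = improved
    # Commit: the chosen action is returned only after the simulated search ends.
    return best
```

For example, with a toy scorer `lambda s, a: -(a - 7) ** 2`, proposals `[0, 5, 10]`, neighbours `a - 1` and `a + 1`, and a guard rejecting anything above 8, the loop starts at 5, climbs to 7, and commits 7. The trial and error happens against the simulator rather than the live patient.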
00:17:42 SepsisAgent is a system trained to think before it prescribes. The training matters: a language model with naive access to the same world model performed inconsistently. The agent had to be taught the loop. MindGap, from a multi-institution team led by Eranga Bandara at Old Dominion University, runs a privacy-preserving on-device conversational agent for post-traumatic stress disorder.
00:18:06 The clinical claim is that current PTSD therapies — prolonged exposure, eye-movement desensitization and reprocessing, and cognitive behavioral therapy — work downstream of the reactive stress cascade. MindGap operates upstream, guiding patients through three layers of observation at what the authors call the feeling-tone gap: the moment between the pre-cognitive affective signal and the reactive elaboration that follows.
00:18:32 The framework draws on dependent origination from Buddhist psychology. What matters institutionally is that the model runs on-device with no data egress, which makes it deployable in military and sensitive clinical contexts where cloud-based agents aren't permitted.
00:18:49 The Department of Veterans Affairs has been looking for exactly that profile. A team in India — Nishi Doshi and Shrey Shah — published a cascaded edge-cloud architecture for diabetic retinopathy screening in rural areas. Their two-tier system runs a small MobileNet model on a clinic device for triage; only the flagged images go to the cloud for the heavier severity grading.
00:19:12 On the public APTOS dataset, they get 80.49% accuracy compared to 80.76% for cloud-only — essentially tied — while cutting cloud calls by about half. That's a real deployment artifact for the rural eye-screening problem the World Health Organization has been writing about for a decade, and the kind of result that matters more than a benchmark beat at the frontier.
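The two-tier triage amounts to a threshold on the edge model's output. A minimal sketch, assuming the on-device model emits a probability that an image needs grading and the cloud model returns an integer severity grade; `edge_score`, `cloud_grade`, and the default threshold are hypothetical placeholders, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class CascadeResult:
    grades: list[int]  # final severity grade per image (0 = no retinopathy)
    cloud_calls: int   # how many images were escalated to the cloud


def run_cascade(
    images: Iterable[object],
    edge_score: Callable[[object], float],  # hypothetical small on-device model
    cloud_grade: Callable[[object], int],   # hypothetical heavier cloud grader
    threshold: float = 0.5,                 # hypothetical escalation threshold
) -> CascadeResult:
    """Cheap on-device screening; only flagged images are sent for cloud grading."""
    grades: list[int] = []
    calls = 0
    for img in images:
        if edge_score(img) < threshold:
            grades.append(0)  # triaged as negative at the edge; never leaves the clinic
        else:
            calls += 1
            grades.append(cloud_grade(img))  # severity grading in the cloud
    return CascadeResult(grades, calls)
```

The threshold is the design lever: lowering it sends more images to the cloud and misses fewer cases, while raising it saves cloud calls at the risk of screening out true positives at the edge.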
00:19:34 A team at the Chinese Academy of Sciences released LongAct, a benchmark for long-horizon household robotics — the chores problem — and an agent called HoloMind that beats prior baselines while exposing how far the field still has to go. The best frontier models reach 59% goal completion and only 16% full-task success.
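The gap between 59% and 16% comes from how the two metrics aggregate. A minimal sketch, assuming the definitions common to embodied-agent benchmarks: goal completion as the average fraction of subgoals achieved per episode, and full-task success as the share of episodes in which every subgoal is achieved. LongAct's exact definitions may differ, and the episodes in the usage note are made up.

```python
from statistics import mean


def goal_completion(episodes: list[list[bool]]) -> float:
    """Average, over episodes, of the fraction of subgoals achieved.

    Assumes every episode has at least one subgoal.
    """
    return mean(sum(ep) / len(ep) for ep in episodes)


def full_task_success(episodes: list[list[bool]]) -> float:
    """Fraction of episodes in which every subgoal was achieved."""
    return mean(1.0 if all(ep) else 0.0 for ep in episodes)
```

Four toy episodes achieving 4 of 4, 2 of 4, 1 of 4 and 3 of 4 subgoals give a goal completion of 0.625 but a full-task success of only 0.25: partial progress counts toward the first metric and not at all toward the second.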
00:19:54 Sixteen percent is the realistic number for “robot does your laundry without supervision,” and it's below what home-robot venture pitches have implied for the past eighteen months. The gap is in dependency management and adaptive planning, not in low-level control. And Yu Zhang and a team at the China Customs research office released a deterministic agentic workflow for Harmonized System tariff classification.
00:20:19 The model takes a free-form product description and assigns the six- or eight-digit code under the General Interpretive Rules — the same task customs officers train for years to do. The workflow reaches 84.2% top-one agreement at four digits with the frontier model on the HSCodeComp benchmark, using an open-weight Qwen 3.6 27-billion-parameter model in non-thinking mode.
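Agreement “at four digits” means comparing only the heading, the first four digits of the code, so a prediction can miss the six- or eight-digit subheading and still count as a hit. A minimal sketch, assuming predictions arrive as ranked lists of code strings that may contain dots; the function names and example codes are hypothetical, and this is not the HSCodeComp scoring script.

```python
import re


def normalize_hs(code: str) -> str:
    """Keep digits only, so '8471.30' and '847130' compare equal."""
    return re.sub(r"\D", "", code)


def top1_agreement(predictions: list[list[str]], references: list[str], digits: int = 4) -> float:
    """Share of items whose top-ranked prediction matches the reference at `digits` digits."""
    if not references or len(predictions) != len(references):
        raise ValueError("need one ranked prediction list per reference, and at least one item")
    hits = 0
    for ranked, ref in zip(predictions, references):
        if not ranked:
            continue  # no prediction counts as a miss
        pred, gold = normalize_hs(ranked[0]), normalize_hs(ref)  # top-one: first entry only
        if len(pred) >= digits and pred[:digits] == gold[:digits]:
            hits += 1
    return hits / len(references)
```

On four toy items (predictions `["8471.30", "8471.41"]`, `["6109.10"]`, `["0901.21"]` and an empty list, against references `"8471.30.01"`, `"6110.20"`, `"0901.11"`, `"8517.13"`), two top-ranked predictions share the reference heading, so four-digit agreement is 0.5; only one also matches at six digits, so the same predictions score 0.25 at the subheading level.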
00:20:42 The authors also flag that a non-trivial fraction of the benchmark's ground-truth labels appear to deviate from the rules, so they release their adjudication records and invite community review. That's the right move on a benchmark that's going to end up cited by customs administrations.
00:21:00 The through-line: medical and physical-world AI passed the demo stage some time ago. The papers landing today read like deployment artifacts — privacy-preserving on-device PTSD therapy, edge-cloud rural retinopathy screening, ICU sepsis recommendations trained on a clinical world model, and tariff classification with verbatim citation of chapter notes.
00:21:22 The institutions that will absorb these systems — veterans' hospitals, rural clinics, ICUs, and customs offices — will move much more slowly than the papers. But the papers aren't asking permission. They're showing what the deployable shape looks like.
Three Things to Watch
00:21:37 I started today with London because the Camden deployment tomorrow is the moment a country I follow closely crosses a line it's been approaching for three years. A static camera on a lamppost is a different object from a police van. A protest is a different setting from a high street.
00:21:53 Parliament hasn't voted. The High Court cleared the path. Each force writes its own policy. For the next ninety days, three things are concrete enough to track. One, whether tomorrow's Camden deployment produces an arrest the Met can publicly name, and whether the watchlist criteria are ever disclosed.
00:22:10 Two, whether the European AI Office cites the Zloczower paper on adaptive adversaries when its general-purpose AI code-of-practice draft lands this summer; that's the moment the malicious-finetuning robustness language gets written, or doesn't. Three, whether any major model vendor publishes a Pluralistic Repair Score, or any deployment-layer transparency artifact at the granularity the Vishwarupe paper calls for, before the year is out.
00:22:35 I won't pretend the medical artifacts are a tidy counterweight to the surveillance story. They aren't. They're evidence that the same systems scanning faces in Camden tomorrow will, in some other room, be the ones watching a sepsis patient overnight. The same word — AI — doesn't capture both.
00:22:52 What does capture both is who has access to the model, who has access to the data, and who decides which protest, which patient, and which face. Monday, I'll have the Camden arrest count, if there is one, and whatever the Met chooses to disclose about the watchlist.
00:23:07 Jonas.