◆ Dispatch 052 · 2026-06-09 GSV The Easy Part Was The Model
Twenty Ways To Not Trust An Agent
“Every one of these papers is a different answer to the same question: how do you trust a thing that's now mostly the system around the model, not the model.”
— Lenar Kess, today's narration
One morning's arXiv listing dropped close to twenty agent papers, and almost none of them are about making agents more capable. They're about whether you can trust the system wrapped around the model — measurement, security, memory, and deference — all at once.
- Where Instruction Hierarchy Breaks — a white-box diagnostic for when reasoning models stop ranking the system prompt above tool output, tested across Gemma, Qwen, and Claude. If the repair holds, prompt injection becomes structural to fix, not just filterable.
- VATS — weaponizes that same confusion, injecting commands through tool error messages over the Model Context Protocol. The error path is the door most teams never locked.
- Shared Latent Structures for Backdoors — argues jailbreak, bias, and planted triggers share an internal signature catchable with sparse autoencoders.
- Beyond Goodhart's Law (MAC-Bench), Online Agent-as-a-Judge, and PACE — three attempts to keep evaluation honest when the thing you're testing can learn the test.
- The AI Epistemic Deference Index — finally puts a continuous number on sycophancy, with a paired reward-bias paper on personalization manufacturing it.
- MemToolAgent, Decision-Aware Memory Cards, and a gated-skills framework — agent memory growing up into selection, compression, and governance.
- Agent-to-Agent Protocols for nuclear licensing and the CIFAR Synthetic Evidence dataset — automation as the fix and as the threat, in the same breath.
- Stress-testing medical LLMs — benchmark accuracy hides what the authors call latent safety pathology, where the cost of the gap is a person.
Chapters
- 00:00:04 Transcript
Sources
39 cited-
1
arXiv cs.AI - Research Science (GLOBAL)
Article Shangbin Feng, Yike Wang, Weijia Shi, Luke Zettlemoyer, Yejin Choi, Yulia Tsvetkov
Scaling Participation in Modular AI Systems - arXiv:2606.07812v1 Announce Type: new Abstract: Humanity is a mosaic of multifaceted talents and needs, and any truly intelligent AI must reflect that richness. Yet the...
arxiv.org/abs/2606.07812 →Details
- Excerpt
- Scaling Participation in Modular AI Systems - arXiv:2606.07812v1 Announce Type: new Abstract: Humanity is a mosaic of multifaceted talents and needs, and any truly intelligent AI must reflect that richness. Yet the...
- Context
- This paper proposes 'scaling participation,' a technical shift from monolithic LLMs to modular, bottom-up AI systems built by diverse contributors. This directly addresses power dynamics and control over intelligence.
- Key points
- This paper proposes 'scaling participation,' a technical shift from monolithic LLMs to modular, bottom-up AI systems built by diverse contributors. This directly addresses power dynamics and control over intelligence.
- Provenance
- Article · Supporting source
-
2
arXiv cs.AI - Research Science (GLOBAL)
Article Guangzhi Sun, Yixuan Li, Yudong Yang, Chao Zhang
OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs - arXiv:2606.07577v1 Announce Type: new Abstract: Audio-visual large language models (LLMs) hold strong promise for long-form video...
arxiv.org/abs/2606.07577 →Details
- Excerpt
- OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs - arXiv:2606.07577v1 Announce Type: new Abstract: Audio-visual large language models (LLMs) hold strong promise for long-form video...
- Context
- This is a primary artifact (arXiv paper) detailing a memory-efficient framework for long-video LLMs, directly addressing core AI infrastructure and model limitations.
- Key points
- This is a primary artifact (arXiv paper) detailing a memory-efficient framework for long-video LLMs, directly addressing core AI infrastructure and model limitations.
- Provenance
- Article · Supporting source
-
3
arXiv cs.AI - Research Science (GLOBAL)
Article Bo Zhang, Borui Zhang, Chenghao Jiang, Minglei Shi, Xiaofeng Wang, Zheng Zhu, Jie Zhou, Jiwen Lu
Syll: Open-Source Personal Automation with Cross-Surface Execution - arXiv:2606.07594v1 Announce Type: new Abstract: Personal AI agents must increasingly operate across APIs, shells, web surfaces, and desktop GUIs, yet.…
arxiv.org/abs/2606.07594 →Details
- Excerpt
- Syll: Open-Source Personal Automation with Cross-Surface Execution - arXiv:2606.07594v1 Announce Type: new Abstract: Personal AI agents must increasingly operate across APIs, shells, web surfaces, and desktop GUIs, yet...
- Context
- Describes an open-source agent framework (Syll) for cross-surface automation (GUI/API/CLI), directly addressing core topics of agentic tools and software engineering.
- Key points
- Describes an open-source agent framework (Syll) for cross-surface automation (GUI/API/CLI), directly addressing core topics of agentic tools and software engineering.
- Provenance
- Article · Supporting source
-
4
arXiv cs.AI - Research Science (GLOBAL)
Article Kai A. Horstmann, Ethan Lin, Alice A. Robie, Jennifer J. Sun, Kristin Branson
A case study of evaluating AI agents on a neuroscience data-to-discovery pipeline - arXiv:2606.07718v1 Announce Type: new Abstract: Agentic AI tools offer a promising path to automating software development bottlenecks.…
arxiv.org/abs/2606.07718 →Details
- Excerpt
- A case study of evaluating AI agents on a neuroscience data-to-discovery pipeline - arXiv:2606.07718v1 Announce Type: new Abstract: Agentic AI tools offer a promising path to automating software development bottlenecks...
- Context
- Directly addresses agentic tools in a complex, real-world scientific pipeline (neuroscience/optogenetics), discussing limitations and evaluation criteria for advanced AI agents.
- Key points
- Directly addresses agentic tools in a complex, real-world scientific pipeline (neuroscience/optogenetics), discussing limitations and evaluation criteria for advanced AI agents.
- Provenance
- Article · Supporting source
-
5
arXiv cs.AI - Research Science (GLOBAL)
Article Yiyang Zhao, Zhuo Zhang, Qingxuan Le, Lizhen Qu, Zenglin Xu
Beyond Goodhart's Law: A Dynamic Benchmark for Evaluating Compliance in Multi-Agent Systems - arXiv:2606.07805v1 Announce Type: new Abstract: The rapid evolution of Large Language Models (LLMs) from passive assistants...
arxiv.org/abs/2606.07805 →Details
- Excerpt
- Beyond Goodhart's Law: A Dynamic Benchmark for Evaluating Compliance in Multi-Agent Systems - arXiv:2606.07805v1 Announce Type: new Abstract: The rapid evolution of Large Language Models (LLMs) from passive assistants...
- Context
- Introduces a new benchmark (MAC-Bench) for evaluating agentic procedural compliance, directly addressing operational risks in multi-agent systems.
- Key points
- Introduces a new benchmark (MAC-Bench) for evaluating agentic procedural compliance, directly addressing operational risks in multi-agent systems.
- Provenance
- Article · Supporting source
-
6
arXiv cs.AI - Research Science (GLOBAL)
Article Sanjay Kariyappa, G. Edward Suh
Where Instruction Hierarchy Breaks: Diagnosing and Repairing Failures in Reasoning Language Models - arXiv:2606.07808v1 Announce Type: new Abstract: Reasoning language models deployed in agentic workflows must follow...
arxiv.org/abs/2606.07808 →Details
- Excerpt
- Where Instruction Hierarchy Breaks: Diagnosing and Repairing Failures in Reasoning Language Models - arXiv:2606.07808v1 Announce Type: new Abstract: Reasoning language models deployed in agentic workflows must follow...
- Context
- This paper introduces a white-box diagnostic framework for instruction hierarchy failures in reasoning models (Gemma, Qwen, Claude). It reports measurable improvements and failure modes, directly impacting agentic workflow reliability.
- Key points
- This paper introduces a white-box diagnostic framework for instruction hierarchy failures in reasoning models (Gemma, Qwen, Claude). It reports measurable improvements and failure modes, directly impacting agentic workflow reliability.
- Provenance
- Article · Supporting source
-
7
arXiv cs.AI - Research Science (GLOBAL)
Article Akshay J. Dave, David Grabaskas, Joseph A. Renevitz, Richard B. Vilim
Overcoming the Regulatory Bottleneck via Agent-to-Agent Protocols: A Nuclear Case Study - arXiv:2606.07866v1 Announce Type: new Abstract: Regulatory review of advanced nuclear reactor designs routinely spans more than...
arxiv.org/abs/2606.07866 →Details
- Excerpt
- Overcoming the Regulatory Bottleneck via Agent-to-Agent Protocols: A Nuclear Case Study - arXiv:2606.07866v1 Announce Type: new Abstract: Regulatory review of advanced nuclear reactor designs routinely spans more than...
- Context
- Addresses a core theme: AI agents solving systemic bottlenecks in regulated industries (nuclear/pharma). High blast radius on policy and institutions.
- Key points
- Addresses a core theme: AI agents solving systemic bottlenecks in regulated industries (nuclear/pharma). High blast radius on policy and institutions.
- Provenance
- Article · Supporting source
-
8
arXiv cs.AI - Research Science (GLOBAL)
Article Alejandro Botas, Paul de Font-Reaulx, Luke Hewitt
The AI Epistemic Deference Index: A Continuous Measure of Sycophancy - arXiv:2606.07897v1 Announce Type: new Abstract: Current AI models frequently exhibit epistemic sycophancy, endorsing claims to agree with a user....
arxiv.org/abs/2606.07897 →Details
- Excerpt
- The AI Epistemic Deference Index: A Continuous Measure of Sycophancy - arXiv:2606.07897v1 Announce Type: new Abstract: Current AI models frequently exhibit epistemic sycophancy, endorsing claims to agree with a user....
- Context
- Introduces a new, measurable benchmark (AEDI) for evaluating model behavior (sycophancy/deference), directly impacting how models are assessed and controlled.
- Key points
- Introduces a new, measurable benchmark (AEDI) for evaluating model behavior (sycophancy/deference), directly impacting how models are assessed and controlled.
- Provenance
- Article · Supporting source
-
9
arXiv cs.AI - Research Science (GLOBAL)
Article Suleyman Armagan Er, Danilo Ribeiro, Yogesh Virkar, Surafel Lakew, Adi Kalyanpur, James Gung, Thomas Delteil, Arshit Gupta
MemToolAgent overview with a simple restaurant booking scenario where the agent retrieves similar memories, receives feedback on an invalid time format, and generates a reflection to update its memory -...
arxiv.org/abs/2606.07909 →Details
- Excerpt
- MemToolAgent overview with a simple restaurant booking scenario where the agent retrieves similar memories, receives feedback on an invalid time format, and generates a reflection to update its memory -...
- Context
- Describes a new agentic framework (MemToolAgent) that improves tool use via memory management, directly addressing core topics of agents and software engineering.
- Key points
- Describes a new agentic framework (MemToolAgent) that improves tool use via memory management, directly addressing core topics of agents and software engineering.
- Provenance
- Article · Supporting source
-
10
arXiv cs.AI - Research Science (GLOBAL)
Article Kelly McConvey, Jalehsadat Mahdavimoghaddam, Nima Jamali, Maksym Taranukhin, Sajad Ebrahimi, Wentao Zhang, Yuntian Deng, Karen Eltis, Maura R. Grossman, Vered Shwartz, Ebrahim Bagheri
The CIFAR Synthetic Evidence Corpus for Detecting AI-Generated Evidence - arXiv:2606.07916v1 Announce Type: new Abstract: The growing ability of generative models to produce realistic documents poses a direct challenge.…
arxiv.org/abs/2606.07916 →Details
- Excerpt
- The CIFAR Synthetic Evidence Corpus for Detecting AI-Generated Evidence - arXiv:2606.07916v1 Announce Type: new Abstract: The growing ability of generative models to produce realistic documents poses a direct challenge...
- Context
- Addresses AI's impact on legal evidence/justice system (policy/institutions). A new dataset for detection is a primary artifact with clear downstream consequence.
- Key points
- Addresses AI's impact on legal evidence/justice system (policy/institutions). A new dataset for detection is a primary artifact with clear downstream consequence.
- Provenance
- Article · Supporting source
-
11
arXiv cs.AI - Research Science (GLOBAL)
Article Yuan Shen, Xiaojun Wu, Linghua Yu
Stress-testing medical large language models reveals latent safety pathology beyond benchmark accuracy - arXiv:2606.07929v1 Announce Type: new Abstract: Large language models (LLMs) are entering clinical practice based.…
arxiv.org/abs/2606.07929 →Details
- Excerpt
- Stress-testing medical large language models reveals latent safety pathology beyond benchmark accuracy - arXiv:2606.07929v1 Announce Type: new Abstract: Large language models (LLMs) are entering clinical practice based...
- Context
- Directly addresses LLM safety in clinical/medical settings (AI infrastructure/medicine). Establishes a new 'stress-audit' methodology for evaluating AI reliability.
- Key points
- Directly addresses LLM safety in clinical/medical settings (AI infrastructure/medicine). Establishes a new 'stress-audit' methodology for evaluating AI reliability.
- Provenance
- Article · Supporting source
-
12
arXiv cs.AI - Research Science (GLOBAL)
Article Omar Mahmoud, Aly M. Kassem, Thommen George Karimpanal, Buddhika Laknath Semage, Negar Rostamzadeh, Golnoosh Farnadi, Santu Rana
Shared Latent Structures Enable Unified Backdoor Detection and Mitigation in LLMs - arXiv:2606.07963v1 Announce Type: new Abstract: Backdoor attacks in large language models (LLMs) are often treated as isolated...
arxiv.org/abs/2606.07963 →Details
- Excerpt
- Shared Latent Structures Enable Unified Backdoor Detection and Mitigation in LLMs - arXiv:2606.07963v1 Announce Type: new Abstract: Backdoor attacks in large language models (LLMs) are often treated as isolated...
- Context
- This paper identifies shared latent structures for diverse LLM backdoor attacks (jailbreaking, bias). It proposes a generalizable detection/mitigation method (SAEs), directly impacting model security and control.
- Key points
- This paper identifies shared latent structures for diverse LLM backdoor attacks (jailbreaking, bias). It proposes a generalizable detection/mitigation method (SAEs), directly impacting model security and control.
- Provenance
- Article · Supporting source
-
13
arXiv cs.AI - Research Science (GLOBAL)
Article Xiaoyan Zhao, Haoting Ni, Yang Zhang, Chunyuan Zheng, Haoxuan Li, Fuli Feng
PAFO: Pareto Fairness Optimization for Personalized Reward Modeling - arXiv:2606.07988v1 Announce Type: new Abstract: Large language models (LLMs) increasingly rely on reward models to align their outputs with diverse...
arxiv.org/abs/2606.07988 →Details
- Excerpt
- PAFO: Pareto Fairness Optimization for Personalized Reward Modeling - arXiv:2606.07988v1 Announce Type: new Abstract: Large language models (LLMs) increasingly rely on reward models to align their outputs with diverse...
- Context
- This paper addresses 'personalized reward bias' in LLMs, a core issue of fairness and control over AI outputs. It proposes a technical solution (PAFO) for fairer personalization.
- Key points
- This paper addresses 'personalized reward bias' in LLMs, a core issue of fairness and control over AI outputs. It proposes a technical solution (PAFO) for fairer personalization.
- Provenance
- Article · Supporting source
-
14
arXiv cs.AI - Research Science (GLOBAL)
Article Harshil Patel, Kunal Pai
VATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation - arXiv:2606.07992v1 Announce Type: new Abstract: As the Model Context Protocol (MCP) standardizes tool-calling for autonomous agents,.…
arxiv.org/abs/2606.07992 →Details
- Excerpt
- VATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation - arXiv:2606.07992v1 Announce Type: new Abstract: As the Model Context Protocol (MCP) standardizes tool-calling for autonomous agents,...
- Context
- This paper details a novel attack vector (error-path injection) against autonomous agents and tool-calling protocols, directly impacting agentic coding tools and AI infrastructure security.
- Key points
- This paper details a novel attack vector (error-path injection) against autonomous agents and tool-calling protocols, directly impacting agentic coding tools and AI infrastructure security.
- Provenance
- Article · Supporting source
-
15
arXiv cs.AI - Research Science (GLOBAL)
Article Sera Choi, Wonje Choi, Saehun Chun, Daehee Lee, Jooyoung Kim, Chaeun Lee, Honguk Woo
Efficient Skill Grounding via Code Refactoring with Small Language Models - arXiv:2606.07999v1 Announce Type: new Abstract: Effective skill grounding is essential for deploying reusable skills in embodied agents, as...
arxiv.org/abs/2606.07999 →Details
- Excerpt
- Efficient Skill Grounding via Code Refactoring with Small Language Models - arXiv:2606.07999v1 Announce Type: new Abstract: Effective skill grounding is essential for deploying reusable skills in embodied agents, as...
- Context
- Describes a new framework (RECENT) for skill grounding in embodied agents using sLMs and code refactoring. Directly relates to agentic tools and AI infrastructure.
- Key points
- Describes a new framework (RECENT) for skill grounding in embodied agents using sLMs and code refactoring. Directly relates to agentic tools and AI infrastructure.
- Provenance
- Article · Supporting source
-
16
arXiv cs.AI - Research Science (GLOBAL)
Article Amine El Hattami, Nicolas Chapados, Christopher Pal
SKILL.nb: Selective Formalization and Gated Execution for Durable Agent Workflows - arXiv:2606.08049v1 Announce Type: new Abstract: AI agents increasingly turn past experience into reusable artifacts such as code,...
arxiv.org/abs/2606.08049 →Details
- Excerpt
- SKILL.nb: Selective Formalization and Gated Execution for Durable Agent Workflows - arXiv:2606.08049v1 Announce Type: new Abstract: AI agents increasingly turn past experience into reusable artifacts such as code,...
- Context
- Introduces SKILL.nb, a framework for governing reusable agent workflows and improving reliability/durability in complex tasks.
- Key points
- Introduces SKILL.nb, a framework for governing reusable agent workflows and improving reliability/durability in complex tasks.
- Provenance
- Article · Supporting source
-
17
arXiv cs.AI - Research Science (GLOBAL)
Article Zhe Xu, Zhengyu Zhang, Zhiyuan Cai, Jiahao Xu, Yijie Lin, Ziyi Liu, Junlin Hou, Hongyi Wang, Yuxiang Nie, Ling Liang, Yihui Wang, Yingxue Xu, Ronald Cheong Kin Chan, Li Liang, Hao Chen
A Multi-modal Agentic Co-pilot for Evidence Grounded Computational Pathology - arXiv:2606.08093v1 Announce Type: new Abstract: Pathology is the cornerstone of modern medicine, where accurate decision-making relies...
arxiv.org/abs/2606.08093 →Details
- Excerpt
- A Multi-modal Agentic Co-pilot for Evidence Grounded Computational Pathology - arXiv:2606.08093v1 Announce Type: new Abstract: Pathology is the cornerstone of modern medicine, where accurate decision-making relies...
- Context
- A primary artifact (new model/tool) in medicine that uses advanced AI concepts (multimodal agents, hypergraphs, evidence grounding). Highly relevant to 'power dynamics' and 'physical-world AI'.
- Key points
- A primary artifact (new model/tool) in medicine that uses advanced AI concepts (multimodal agents, hypergraphs, evidence grounding). Highly relevant to 'power dynamics' and 'physical-world AI'.
- Provenance
- Article · Supporting source
-
18
arXiv cs.AI - Research Science (GLOBAL)
Article Yasushi Sakai, Allen Song, Kent Larson
When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference - arXiv:2606.08098v1 Announce Type: new Abstract: Majority voting over sampled answers is the dominant unsupervised...
arxiv.org/abs/2606.08098 →Details
- Excerpt
- When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference - arXiv:2606.08098v1 Announce Type: new Abstract: Majority voting over sampled answers is the dominant unsupervised...
- Context
- This paper introduces a new method (PPV) for LLM aggregation that outperforms majority voting on key benchmarks. It directly addresses model reliability and inference techniques.
- Key points
- This paper introduces a new method (PPV) for LLM aggregation that outperforms majority voting on key benchmarks. It directly addresses model reliability and inference techniques.
- Provenance
- Article · Supporting source
-
19
arXiv cs.AI - Research Science (GLOBAL)
Article Zayx Shawn
PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents - arXiv:2606.08106v1 Announce Type: new Abstract: Self-evolving agents improve by repeatedly proposing changes to their own prompts, skills, or workflows...
arxiv.org/abs/2606.08106 →Details
- Excerpt
- PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents - arXiv:2606.08106v1 Announce Type: new Abstract: Self-evolving agents improve by repeatedly proposing changes to their own prompts, skills, or workflows...
- Context
- Presents a primary artifact (paper) addressing agent reliability and self-improvement mechanisms, directly impacting agentic coding tools.
- Key points
- Presents a primary artifact (paper) addressing agent reliability and self-improvement mechanisms, directly impacting agentic coding tools.
- Provenance
- Article · Supporting source
-
20
arXiv cs.AI - Research Science (GLOBAL)
Article Yichen Chen, Siying Li, Yuhang Liang, Lijun Wang, Renyang Liu
SAGE: An LLM-driven Self Reflective Agentic Framework for Fraud Detection - arXiv:2606.08146v1 Announce Type: new Abstract: Fraud detection in payment, e-commerce, and telecommunications systems requires accuracy at...
arxiv.org/abs/2606.08146 →Details
- Excerpt
- SAGE: An LLM-driven Self Reflective Agentic Framework for Fraud Detection - arXiv:2606.08146v1 Announce Type: new Abstract: Fraud detection in payment, e-commerce, and telecommunications systems requires accuracy at...
- Context
- This paper introduces a novel, end-to-end LLM-driven agentic framework (SAGE) for fraud detection, reporting strong quantitative results and providing code. This directly relates to agentic tools and practical AI applications.
- Key points
- This paper introduces a novel, end-to-end LLM-driven agentic framework (SAGE) for fraud detection, reporting strong quantitative results and providing code. This directly relates to agentic tools and practical AI applications.
- Provenance
- Article · Supporting source
-
21
arXiv cs.AI - Research Science (GLOBAL)
Article Xinyu Guan, Qianyang Zhao, Yuming Deng
Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents - arXiv:2606.08151v1 Announce Type: new Abstract: Tool-using LLM agents often fail not because relevant...
arxiv.org/abs/2606.08151 →Details
- Excerpt
- Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents - arXiv:2606.08151v1 Announce Type: new Abstract: Tool-using LLM agents often fail not because relevant...
- Context
- Describes a new artifact (CICL) for tool-using LLM agents, focusing on context selection and compression—a core topic.
- Key points
- Describes a new artifact (CICL) for tool-using LLM agents, focusing on context selection and compression—a core topic.
- Provenance
- Article · Supporting source
-
22
arXiv cs.AI - Research Science (GLOBAL)
Article Hyogon Ryu, Jeonghwan Kim, Yewon Lim, Chaeun Lee, Jeongwook Kim, Donghoon Ham
Online Agent-as-a-Judge: Situation-Generating Evaluation for Interactive Agents - arXiv:2606.08200v1 Announce Type: new Abstract: Evaluating LLM-powered interactive social agents is challenging because socially...
arxiv.org/abs/2606.08200 →Details
- Excerpt
- Online Agent-as-a-Judge: Situation-Generating Evaluation for Interactive Agents - arXiv:2606.08200v1 Announce Type: new Abstract: Evaluating LLM-powered interactive social agents is challenging because socially...
- Context
- Presents a new evaluation framework (Online Agent-as-a-Judge) for interactive social agents, directly addressing agentic capabilities and testing methods.
- Key points
- Presents a new evaluation framework (Online Agent-as-a-Judge) for interactive social agents, directly addressing agentic capabilities and testing methods.
- Provenance
- Article · Supporting source
-
23
arXiv cs.AI - Research Science (GLOBAL)
Article Tanush Swaminathan, Runmin Jiang, Letian Zhang, Min Xu
SciTrace: Trajectory-Aware Safety Reasoning for Scientific Discovery Agents - arXiv:2606.08234v1 Announce Type: new Abstract: LLM-based scientific agents have shown strong capacity for autonomous research, yet their...
arxiv.org/abs/2606.08234 →Details
- Excerpt
- SciTrace: Trajectory-Aware Safety Reasoning for Scientific Discovery Agents - arXiv:2606.08234v1 Announce Type: new Abstract: LLM-based scientific agents have shown strong capacity for autonomous research, yet their...
- Context
- Addresses agent safety and reliability (SciTrace), a core concern for building autonomous AI agents in scientific discovery.
- Key points
- Addresses agent safety and reliability (SciTrace), a core concern for building autonomous AI agents in scientific discovery.
- Provenance
- Article · Supporting source
-
24
arXiv cs.AI - Research Science (GLOBAL)
Article Wisdom Dogah
Traxia: A Framework for Verifiable, Agent-Native Scientific Publishing - arXiv:2606.08256v1 Announce Type: new Abstract: Verifiability, attribution, and reproducibility are foundational requirements of scientific...
arxiv.org/abs/2606.08256 →Details
- Excerpt
- Traxia: A Framework for Verifiable, Agent-Native Scientific Publishing - arXiv:2606.08256v1 Announce Type: new Abstract: Verifiability, attribution, and reproducibility are foundational requirements of scientific...
- Context
- This introduces a new infrastructure (Traxia) for AI scientific publishing, fundamentally changing how research is validated and attributed. It impacts knowledge control and provenance.
- Key points
- This introduces a new infrastructure (Traxia) for AI scientific publishing, fundamentally changing how research is validated and attributed. It impacts knowledge control and provenance.
- Provenance
- Article · Supporting source
-
25
@awnihannun (Awni Hannun)
X awnihannun
Three MLX videos dropped at WWDC: Running agents locally by @angeloskath https:// youtube.com/watch?v=wykPEr J8M-8 … Distributed inference and training by Tatiana Likhomanenko https:// youtube.com/watch?v=CzgK02 zsRg4…
x.com/awnihannun/status/2064199840658256166 →Details
- Excerpt
- Three MLX videos dropped at WWDC: Running agents locally by @angeloskath https:// youtube.com/watch?v=wykPEr J8M-8 … Distributed inference and training by Tatiana Likhomanenko https:// youtube.com/watch?v=CzgK02 zsRg4…
- Context
- Reports multiple specific artifacts (videos) detailing local agents and distributed training/inference using MLX, directly addressing AI infrastructure and tools.
- Key points
- Reports multiple specific artifacts (videos) detailing local agents and distributed training/inference using MLX, directly addressing AI infrastructure and tools.
- Provenance
- Tweet · Primary source
-
26
@Jchammond_ (Connor)
X Jchammond_
Foundation Models has a CLI
x.com/Jchammond_/status/2064206029370630529 →Details
- Excerpt
- Foundation Models has a CLI
- Context
- Announcing a new capability (CLI) for Foundation Models is a primary artifact/tool update directly related to AI infrastructure and development tools.
- Key points
- Announcing a new capability (CLI) for Foundation Models is a primary artifact/tool update directly related to AI infrastructure and development tools.
- Provenance
- Tweet · Primary source
-
27
@suchenzang (Susan Zhang)
X suchenzang
agi happened when the opportunity cost of producing a meaningful frontier benchmark far far far exceeded simply* building and selling the product-benchmark directly ----- *simply here does not imply simple, trivial, or…
x.com/suchenzang/status/2064237678204481978 →Details
- Excerpt
- agi happened when the opportunity cost of producing a meaningful frontier benchmark far far far exceeded simply* building and selling the product-benchmark directly ----- *simply here does not imply simple, trivial, or…
- Context
- Directly addresses the core topic of AI frontier models and power dynamics by discussing the economic/strategic shift in building benchmarks.
- Key points
- Directly addresses the core topic of AI frontier models and power dynamics by discussing the economic/strategic shift in building benchmarks.
- Provenance
- Tweet · Primary source
-
28
Korea Ministry of Science and ICT Press Releases - Policy Geopolitics (KR)
Article
과기정통부, 물리적 인공지능(피지컬 AI) 핵심기술 국산화를 위한 선도 사업 본격 착수
www.msit.go.kr/bbs/view.do?bbsSeqNo=94&nttS… →Details
- Excerpt
- 과기정통부, 물리적 인공지능(피지컬 AI) 핵심기술 국산화를 위한 선도 사업 본격 착수
- Context
- Directly addresses 'physical-world AI' and national strategy for core technology localization (policy/geopolitics).
- Key points
- Directly addresses 'physical-world AI' and national strategy for core technology localization (policy/geopolitics).
- Provenance
- Article · Supporting source
-
29
Forbes Innovation - Industry Adjacent (US)
Article Lance Eliot, Contributor
Lawmakers Are Aiming To Regulate AI-Builds-AI Before AI Gets Entirely Beyond Human Control - Anthropic has brought attention to AI-builds-AI, involving using AI to advance AI. Some believe new AI laws should pause...
www.forbes.com/sites/lanceeliot/2026/06/09/… →Details
- Excerpt
- Lawmakers Are Aiming To Regulate AI-Builds-AI Before AI Gets Entirely Beyond Human Control - Anthropic has brought attention to AI-builds-AI, involving using AI to advance AI. Some believe new AI laws should pause...
- Context
- Directly addresses regulation (policy/geopolitics) of advanced AI development ('AI-builds-AI'), which is central to power dynamics and control.
- Key points
- Directly addresses regulation (policy/geopolitics) of advanced AI development ('AI-builds-AI'), which is central to power dynamics and control.
- Provenance
- Article · Supporting source
-
30
Microsoft's open source tools were hacked to steal passwords of AI developers — 225 pts · 96 comments
Article raffael_de
https://techcrunch.com/2026/06/08/microsofts-open-source-tools-were-hacked-to-steal-passwords-of-ai-developers/ · @JdeBP: These seem related: * https://news.ycombinator.com/item?id=48418318 (The Blight Reaches…
techcrunch.com/2026/06/08/microsofts-open-s… →Details
- Excerpt
- https://techcrunch.com/2026/06/08/microsofts-open-source-tools-were-hacked-to-steal-passwords-of-ai-developers/ · @JdeBP: These seem related: * https://news.ycombinator.com/item?id=48418318 (The Blight Reaches…
- Context
- Directly addresses AI security/infrastructure risks (hacks targeting AI devs). Focuses on power dynamics and infrastructure vulnerability.
- Key points
- Directly addresses AI security/infrastructure risks (hacks targeting AI devs). Focuses on power dynamics and infrastructure vulnerability.
- Provenance
- Article · Supporting source
-
31
The Guardian Technology - Industry Adjacent (UK)
Article Denis Campbell Health policy editor
Doctors and NHS could be sued for mistakes made by AI tools, report warns - Medical Protection Society calls for law to be overhauled to help medics avoid liability for errors made by technology Doctors and the NHS...
www.theguardian.com/society/2026/jun/09/doc… →Details
- Excerpt
- Doctors and NHS could be sued for mistakes made by AI tools, report warns - Medical Protection Society calls for law to be overhauled to help medics avoid liability for errors made by technology Doctors and the NHS...
- Context
- Directly addresses liability and regulation (policy/law) concerning AI use in medicine, a core power dynamic topic.
- Key points
- Directly addresses liability and regulation (policy/law) concerning AI use in medicine, a core power dynamic topic.
- Provenance
- Article · Supporting source
-
32
@Xianbao_QIAN (Tiezhen WANG)
X Xianbao_QIAN
New open weight model series from @NexEcosystem - Built on top of Qwen 3.5 series - Available in both Pro (397BA17B) and Mini (35BA3B) - Optimized for agentic adaptive thinking & long context - Apache 2 license…
x.com/Xianbao_QIAN/status/20642576842837814… →Details
- Excerpt
- New open weight model series from @NexEcosystem - Built on top of Qwen 3.5 series - Available in both Pro (397BA17B) and Mini (35BA3B) - Optimized for agentic adaptive thinking & long context - Apache 2 license…
- Context
- Announcing a new open-weight model series (Pro/Mini) optimized for agentic thinking and long context is a primary artifact that directly relates to frontier models and AI infrastructure.
- Key points
- Announcing a new open-weight model series (Pro/Mini) optimized for agentic thinking and long context is a primary artifact that directly relates to frontier models and AI infrastructure.
- Provenance
- Tweet · Primary source
-
33
Axios - Industry Adjacent (US)
Article Ina Fried
Apple's Siri AI is both cool and 2 years too late - Apple is finally delivering the conversational and context-aware AI that it promised two years ago . Its rivals have already moved on to agents. Why it matters:...
www.axios.com/2026/06/09/apple-siri-ai-agen… →Details
- Excerpt
- Apple's Siri AI is both cool and 2 years too late - Apple is finally delivering the conversational and context-aware AI that it promised two years ago . Its rivals have already moved on to agents. Why it matters:...
- Context
- Directly addresses agentic AI tools and Apple's response to competitors (OpenAI/Anthropic), impacting developer mental models.
- Key points
- Directly addresses agentic AI tools and Apple's response to competitors (OpenAI/Anthropic), impacting developer mental models.
- Provenance
- Article · Supporting source
-
34
Techmeme - Industry Adjacent (US)
Article
Sources: China is drafting plans to spend $295B over the next five years on building AI data centers, sourcing 80%+ of tech from local suppliers like Huawei (Charlie Zhu/Bloomberg) - Charlie Zhu / Bloomberg : Sources:...
www.techmeme.com/260609/p7 →Details
- Excerpt
- Sources: China is drafting plans to spend $295B over the next five years on building AI data centers, sourcing 80%+ of tech from local suppliers like Huawei (Charlie Zhu/Bloomberg) - Charlie Zhu / Bloomberg : Sources:...
- Context
- Details China's massive $295B plan for AI data centers and local sourcing (Huawei), directly addressing infrastructure, geopolitics, and control.
- Key points
- Details China's massive $295B plan for AI data centers and local sourcing (Huawei), directly addressing infrastructure, geopolitics, and control.
- Provenance
- Article · Supporting source
-
35
The Verge AI - Media Culture (US)
Article Hayden Field
Amazon employees ask Seattle to put the brakes on new data centers - On Tuesday, the Seattle City Council will vote on whether to enact a one-year moratorium on new data centers - just two months after several...
www.theverge.com/ai-artificial-intelligence… →Details
- Excerpt
- Amazon employees ask Seattle to put the brakes on new data centers - On Tuesday, the Seattle City Council will vote on whether to enact a one-year moratorium on new data centers - just two months after several...
- Context
- Directly addresses AI infrastructure (data centers) and power dynamics/regulation (moratorium), impacting compute availability.
- Key points
- Directly addresses AI infrastructure (data centers) and power dynamics/regulation (moratorium), impacting compute availability.
- Provenance
- Article · Supporting source
-
36
Rest of World Latest - Media Culture (GLOBAL)
Article Rina Chandran
The Great AI Divide: Navigating U.S. and Chinese dominance - At a Rest of World event during New York Tech Week, we explored the challenges and possible solutions to the dominance of American and Chinese AI companies.
restofworld.org/2026/ai-divide-america-chin… →Details
- Excerpt
- The Great AI Divide: Navigating U.S. and Chinese dominance - At a Rest of World event during New York Tech Week, we explored the challenges and possible solutions to the dominance of American and Chinese AI companies.
- Context
- Directly addresses power dynamics (US/China) and geopolitics shaping AI development, core to the podcast topic.
- Key points
- Directly addresses power dynamics (US/China) and geopolitics shaping AI development, core to the podcast topic.
- Provenance
- Article · Supporting source
-
37
Techmeme - Industry Adjacent (US)
Article
The UK is conducting a full review of its NHS contract with Palantir, amid growing pressure to terminate the deal in 2027 over reliance on US tech companies (Sam Tabahriti/Reuters) - Sam Tabahriti / Reuters : The UK is.…
www.techmeme.com/260609/p11 →Details
- Excerpt
- The UK is conducting a full review of its NHS contract with Palantir, amid growing pressure to terminate the deal in 2027 over reliance on US tech companies (Sam Tabahriti/Reuters) - Sam Tabahriti / Reuters : The UK is...
- Context
- Directly addresses geopolitical power dynamics and national control over critical infrastructure (NHS), fitting the podcast's focus on labs, regulators, and geopolitics.
- Key points
- Directly addresses geopolitical power dynamics and national control over critical infrastructure (NHS), fitting the podcast's focus on labs, regulators, and geopolitics.
- Provenance
- Article · Supporting source
-
38
Techmeme - Industry Adjacent (US)
Article
Sources: Taiwan considers restricting AI chip sales to all Chinese customers, rather than only blacklisted entities like Huawei, to align with US measures (Bloomberg) - Bloomberg : Sources: Taiwan considers restricting.…
www.techmeme.com/260609/p13 →Details
- Excerpt
- Sources: Taiwan considers restricting AI chip sales to all Chinese customers, rather than only blacklisted entities like Huawei, to align with US measures (Bloomberg) - Bloomberg : Sources: Taiwan considers restricting...
- Context
- Directly addresses geopolitics and export controls (chips/Taiwan/China), a core topic of power dynamics shaping AI infrastructure.
- Key points
- Directly addresses geopolitics and export controls (chips/Taiwan/China), a core topic of power dynamics shaping AI infrastructure.
- Provenance
- Article · Supporting source
-
39
Techmeme - Industry Adjacent (US)
Article
Apple unveils new Apple Foundation Models: two on-device models, including a 20B-parameter multimodal model called AFM 3 Core Advanced, and three cloud models (Apple Machine Learning Research) - Apple Machine Learning...
www.techmeme.com/260609/p17 →Details
- Excerpt
- Apple unveils new Apple Foundation Models: two on-device models, including a 20B-parameter multimodal model called AFM 3 Core Advanced, and three cloud models (Apple Machine Learning Research) - Apple Machine Learning...
- Context
- Apple unveiling new foundation models (on-device and cloud) directly impacts AI infrastructure and power dynamics.
- Key points
- Apple unveiling new foundation models (on-device and cloud) directly impacts AI infrastructure and power dynamics.
- Provenance
- Article · Supporting source
Transcript
00:00:04 lenarStart with a small scene, because it's the one that stuck with me. You've got a reasoning model running inside an agent loop. Up top there's a system instruction — something like, don't modify the production database without an explicit confirmation step. Then one of the tools the agent called returns an error, and inside that error text there's a sentence phrased like a command. The model reads the error, treats that sentence as the new marching order, and drops the rule it was handed up front. A paper that went up on the archive overnight gives that failure a name and a diagnostic — it's titled Where Instruction Hierarchy Breaks.
00:00:40 damra[tsk] Caveat before we get excited — this is the arXiv new-submissions listing from this morning, the ninth. We're reading abstracts, not peer review. The authors are Sanjay Kariyappa and G. Edward Suh, and they say they checked the failure across the Gemma, Qwen, and Claude families. I haven't read past the abstract, so I can't hand you the number for how much their repair actually buys you. Hold that one loosely.
00:01:04 lenarRight. And here's why I led with it. When I pulled up the listing this morning, that paper wasn't alone. There were close to twenty of them, all clustered around the same nervous question — less about whether the agent can do the task, more about whether you can trust the system you wrapped around the model to do it the way you meant. They split across compliance, evaluation, security, memory, and how much the model just agrees with you. It reads less like an ordinary day on the listing and more like a field deciding, all at once, that the system around the model deserves the attention.
00:01:36 damraWe should be careful, though, because we say a version of this all the time. Just yesterday we put it this way — the consequential part has moved out of the model and into the system around it. So I don't want to dress up a coincidence as a movement. Twenty papers going up the same day is partly just the submission calendar. What's different is the specificity. These aren't essays about trust. Each one is a benchmark, an attack, or a control mechanism, most of them with code attached.
00:02:06 lenarThat's the distinction to hold onto. Here's the route. We start with that instruction-hierarchy paper, because it's the cleanest version of the problem. Then the attack side — one paper on injecting commands through error messages, another on backdoors. After that, the measurement pile: how do you benchmark an agent that learns to game your benchmark. Sycophancy comes next, which finally got a continuous score. Then memory and skills — agents saving their own experience as reusable artifacts. And we close on the papers pushing agents into nuclear licensing, courtroom evidence, and clinical care. The question underneath all of it: when the model is the easy part, what does the hard part look like in code.
00:02:46 lenarBack to the scene. The premise of the Kariyappa and Suh paper is that reasoning models are supposed to rank their instructions. System prompt on top, then the developer message, then the user, and dead last, whatever a tool or web page hands back. That ordering is the whole safety story for an agent. If a model can't keep system above tool output, then every untrusted thing it reads becomes a candidate instruction.
00:03:11 damraWhat makes it interesting is that it's a white-box diagnostic. They're not just probing from the outside with adversarial prompts and counting failures. They're looking inside the activations to find where in the model the priority ordering stops being represented. That's the difference between saying the model failed here, and saying the model stopped tracking which instruction outranked which at this specific layer. If it holds up, you can point at the mechanism instead of the symptom.
00:03:37 lenarAnd they claim a repair — an intervention that raises how often the model keeps the ordering intact. The abstract calls the improvement measurable, which is the kind of word that means I should read the actual table before I repeat a percentage. But the claim matters even without the figure. Today the standard defense against prompt injection is mostly input filtering and wrapping untrusted content in delimiters. This is saying the vulnerability is structural, inside the model's representation of authority, and that you might be able to repair it there.
00:04:08 damraHere's where I'd push, though. Testing across Gemma, Qwen, and Claude is a strong claim, because those are different architectures and training pipelines. If the same internal failure shows up in all three, that's either a deep result or a sign the diagnostic is loose enough to find something everywhere. And white-box work on closed models is hard — you don't get Claude's activations handed to you. So either they're using a proxy, or the Claude part is black-box and the white-box part is the open-weight models. That distinction changes how much I'd trust the headline, and I can't tell from the abstract which it is.
00:04:45 lenarThat's fair, and it's the recurring tax on this whole category — the most interesting claims are the hardest to verify from outside. Hold that thought, because the next paper is the attacker's version of exactly this problem.
00:04:57 lenarThe instruction-hierarchy paper says models can confuse tool output for a command. There's a second paper that treats that confusion as a weapon. It's called VATS, and the subtitle is the useful part — exploiting implicit authority in error-path injection. The setup is the Model Context Protocol, the standard a lot of agents now use to call tools. The attack: don't go after the happy path, go after the error messages.
00:05:23 damraImplicit authority is a sharp little observation. When a tool succeeds, the agent treats the result as data. When a tool fails, the error message often gets treated as guidance — fix it this way, or retry like that. The error channel carries an implied do-this-next that the agent is primed to obey. So you craft an error that reads like a system telling the agent what to do, and the agent's own helpfulness does the rest. VATS adds systematic mutation, which I read as fuzzing — they automatically vary the injected error text to find the phrasings that slip through.
00:05:59 lenarThat's an uncomfortable surface, because most teams treat error handling as the part of the system nobody bothers to harden. You validate the inputs going into a tool. Do you validate the error string coming back out before it lands in the model's context window? Almost nobody does. The error path is the door no one thought to lock.
00:06:18 damraIt pairs with the other security paper from this morning — the one on shared latent structures for backdoors. Their claim is that very different attacks — a jailbreak trigger, a bias trigger, or a planted misbehavior — share a common internal signature, and you can catch the whole family with sparse autoencoders reading the activations. Which is the optimistic mirror of the white-box hierarchy work: same toolkit, find where the model represents the bad behavior and intervene there.
00:06:45 lenarIn one morning you've got the attack getting more systematic and the defense getting more mechanistic, and both are betting on the same thing — that the action has moved inside the activations, not the prompt text. What I can't resolve yet is whether the defenders are ahead or behind. Sparse-autoencoder detection sounds great until you remember the attacker can read the same papers.
00:07:06 damraAnd both are preprints with no adversarial back-and-forth yet. A detection method looks strong right up until someone designs the attack that evades it. So I'd file both as a promising direction, unproven in a contest. The contest is the part that takes a year, not a weekend.
00:07:23 lenarNext pile, and it's the one I find most philosophically interesting. Start with the paper titled Beyond Goodhart's Law — a dynamic benchmark for compliance in multi-agent systems, and they call it MAC-Bench. Goodhart's Law, for anyone who hasn't carried it around: when a measure becomes a target, it stops being a good measure. The minute you publish a benchmark, people optimize for the benchmark instead of what it was standing in for.
00:07:49 damraTheir answer is to make the benchmark move. With a static test, the teams training the agents eventually overfit. A dynamic benchmark regenerates the scenarios, so there's no fixed set of answers to memorize. For multi-agent compliance specifically, that means testing whether a group of agents follows a procedure when the situation keeps changing under them. It's a reasonable idea. The hard part is proving the regenerated scenarios are actually equal in difficulty, otherwise your score is just measuring how hard today's draw happened to be.
00:08:22 lenarThat difficulty problem shows up again in the second one — Online Agent-as-a-Judge. Evaluating interactive social agents is brutal, because the right behavior depends on the situation, and you can't script every situation. So they have an agent generate the situations and judge the responses, live. Which is clever and also a little vertiginous — you're using an agent to evaluate an agent, and the judge has all the same failure modes we've spent the last fifteen minutes describing.
00:08:49 damraThat's the snake eating its tail, yeah. And the third paper leans all the way into it — PACE, anytime-valid acceptance tests for self-evolving agents. The scenario is an agent that rewrites its own prompts and skills to improve, and you need a statistical test that decides whether the new version is actually better before you accept the change. Anytime-valid is the real contribution. It means you can peek at the results as they stream in and stop early without breaking the statistics, which ordinary significance testing won't let you do. That's borrowed from sequential clinical-trial methods, and it's a good fit for an agent that's changing every few minutes.
00:09:29 lenarSo the pattern across these three is three different attempts to keep evaluation honest when the thing you're evaluating is adaptive and can learn the test. A moving benchmark, a generative judge, and a sequential acceptance gate. None of them solves it cleanly. All three are admitting the static leaderboard is finished for agents.
00:09:48 damraThere's a fourth in the same vein I'll mention fast — a paper asking when delegation beats majority voting for combining multiple model samples. The standard trick is to sample the model several times and take the majority answer. They argue that delegating to a chosen sample, under some conditions, beats the vote. Small and technical, but it's the same family: stop trusting the naive aggregate, build something that knows when to defer.
00:10:14 lenarDefer is a good word to leave on, because the next paper is about an agent that defers too much. This one might be the most immediately useful of the bunch for anyone shipping a product. It's called the AI Epistemic Deference Index. Sycophancy — the model agreeing with you because you want it to, even when you're wrong — has been a known problem for years. What's been missing is a number. The authors are Alejandro Botas, Paul de Font-Reaulx, and Luke Hewitt. They propose a continuous index — not a yes-or-no, did-it-cave, but a graded measure of how much a model bends toward the user's stated belief.
00:10:51 damraContinuous is the right call, because sycophancy isn't binary in practice. A model can hold its ground on a fact and still soften its confidence because you pushed back. What I'd want to know — and the abstract won't tell me — is how they separate appropriate updating from caving. If I give the model actual new evidence, it should move. If I just say no, you're wrong, try again, and it folds, that's the pathology. A good index has to score those two differently, or it'll punish a model for correctly changing its mind.
00:11:24 lenarThat connects to a sibling paper from the same morning on personalized reward modeling — they call it PAFO. The worry there is that when you tune a model to each user's preferences, you bake in a personalized bias, a tailored version of telling people what they want to hear. So you've got one paper trying to measure deference and another trying to keep personalization from manufacturing it. The reason this matters past the lab: every assistant that remembers your preferences is, structurally, being trained to agree with you more over time.
00:11:57 damraI buy the concern, and I'd still want the deference index validated against humans before I trusted the score. Sycophancy judgments are subjective — sometimes deferring to the user is correct, because the user knows their own situation. A number makes it sound settled. I'd treat the index as a useful instrument and not a verdict, at least until someone shows it agrees with trained human raters across a real spread of cases.
00:12:22 lenarAgreed. But even an imperfect instrument changes the conversation, because right now product teams argue about sycophancy with anecdotes. Hand them a dial that moves, and the argument gets a lot more concrete.
00:12:35 lenarNow the cluster closest to what we were talking about over the weekend — agents that save their own experience and reuse it. Three papers, and each takes a different cut. Start with MemToolAgent. The toy example in the abstract is almost charming: an agent booking a restaurant gets a time format wrong, the tool rejects it, and instead of just retrying, the agent writes itself a reflection — note to self, this booking tool wants times in this format — and stores it as a memory it can pull back next time.
00:13:04 damraWhich is lovely until the memory store fills up with reflections, half of them wrong or stale. That's the actual engineering problem with agent memory — not writing memories, but deciding which to keep, which to trust, and which to throw away. And that's exactly what the second paper goes after — Decision-Aware Memory Cards. Their starting observation is sharp: tool-using agents often fail not because the relevant context is missing, but because it's buried under irrelevant context. So they do counterfactual-inspired selection — roughly, would the decision have changed if this card weren't here — to compress the memory down to what actually moves the outcome.
00:13:45 lenarThe third one puts a gate on all of it — a framework for selective formalization and gated execution of durable workflows. The idea: an agent turns a successful run into a reusable skill, but you don't let every improvised success harden into a saved procedure automatically. You selectively formalize the ones worth keeping, and you gate when they're allowed to run. It's governance for the agent's own growing library of habits, which was completely missing from the skill-file enthusiasm a week ago.
00:14:14 damraThere's a real tension between these three. MemToolAgent wants the agent to learn freely from every mistake. The gated-skills paper wants a checkpoint before anything learned becomes a permanent part of how the agent operates. Those are opposite instincts — move fast and remember everything, versus formalize slowly and gate execution. The right answer is obviously somewhere between, and nobody knows where yet. There's even a small-model version in the pile — a skill-grounding paper that uses code refactoring with small language models to turn messy learned behavior into clean reusable functions. Same instinct, different layer.
00:14:51 lenarThe memory story this morning is maturing past give-the-agent-a-vector-store-and-hope. Now the agent has to select which memories to keep, compress them, formalize the good ones, and gate when they run. Which is, frankly, just software engineering arriving for the agent's notebook. And it sets up the last cluster, because everything we've said so far is about agents in general. The last few papers ask what happens when you point this machinery at domains where being wrong has consequences measured in years or in lives.
00:15:22 lenarThree domains in one morning. The first is nuclear. There's a paper from a team of nuclear engineers — Akshay Dave, David Grabaskas, and colleagues — titled Overcoming the Regulatory Bottleneck via Agent-to-Agent Protocols, with a nuclear case study. The premise is concrete: licensing an advanced reactor design routinely takes more than — and the abstract cuts off there, but for advanced reactors the real figure is years, sometimes most of a decade. Their proposal is agent-to-agent protocols between the applicant's side and, in effect, the review side, to compress that.
00:15:57 damraMy first reaction is, careful what you automate. The reason reactor licensing takes years isn't only paperwork latency. A lot of it is deliberate, adversarial scrutiny — humans trying to find the failure the applicant didn't. If you speed up the document exchange with agents, great. If you let agents stand in for the judgment, you've optimized away the one bottleneck that was arguably doing its job. I'd want to know exactly which part of the review they're proposing to hand off. The abstract doesn't say, and on a topic like this, that distinction is everything.
00:16:31 lenarThat's the right line to hold, and the second paper is the mirror image — automation as the threat, not the fix. It's a CIFAR dataset for detecting AI-generated evidence. The author list is telling: it includes Maura Grossman, a name from the electronic-discovery and law side, alongside computer scientists. The problem they're naming is blunt — generative models can now produce realistic documents, and those documents can show up as fabricated evidence in legal proceedings. The dataset exists to train detectors.
00:17:00 damraDetection datasets are legitimate, but I'd flag the same trap as the backdoor paper — a detector trained on today's generators is a snapshot, and the generators move every few months. A static dataset of synthetic evidence is useful for a while and then it's a museum piece. What would actually help courts is less a detector and more a provenance chain — knowing where a document came from, rather than guessing whether it looks fake. Detection is the patch. Provenance is the fix, and it's much harder, because it's institutional, not technical.
00:17:33 lenarAnd the third one is the one I'd least want to get wrong — stress-testing medical large language models. The framing is pointed: these models are entering clinical practice on the strength of benchmark accuracy, and the paper argues that benchmark accuracy hides what they call latent safety pathology. Pass the medical exam questions, look great on the leaderboard, and still fail in ways the benchmark never probed — under pressure, on edge cases, when the question is phrased the way a frightened person actually phrases it.
00:18:03 damraAnd that phrase — beyond benchmark accuracy — is the whole morning in three words, isn't it. Everything we read today is some version of: the score on the board isn't what you actually care about. The medical paper just says it where the cost of the gap is a person. There's even a constructive flip side in the pile — a multimodal agentic copilot for pathology built around evidence grounding. The right instinct there: if you're going to put an agent near a diagnosis, make it cite the slide it's looking at, not just assert.
00:18:34 lenarSo that's the morning. Twenty-odd papers, and the nerve running through them is the distance between a model that scores well and a system you'd actually trust — measured, attacked, gated, and stress-tested from every direction at once. None of these is a finished result. They're abstracts from a single day, and most need a year of the adversarial follow-up Damra keeps naming before we know which ones hold. If one earns a second look first, it's the instruction-hierarchy repair, because if you really can fix authority-confusion inside the model's own representation, half the attack papers from this morning get harder to pull off. Whether that repair survives contact with a real attacker is the test that decides it.
00:19:14 damraAnd the cheap version of that test runs in any agent you've already got. Feed it a hostile error message and watch whether it obeys. You don't need a paper to start.