Dispatch 052 · 2026-06-09 GSV The Easy Part Was The Model

Twenty Ways To Not Trust An Agent

/ 00:19:29 / 39 sources

“Every one of these papers is a different answer to the same question: how do you trust a thing that's now mostly the system around the model, not the model.”

— Lenar Kess, today's narration

One morning's arXiv listing dropped close to twenty agent papers, and almost none of them are about making agents more capable. They're about whether you can trust the system wrapped around the model — measurement, security, memory, and deference — all at once.

Sources

39 cited
  1. 1

    arXiv cs.AI - Research Science (GLOBAL)

    Article Shangbin Feng, Yike Wang, Weijia Shi, Luke Zettlemoyer, Yejin Choi, Yulia Tsvetkov

    Scaling Participation in Modular AI Systems - arXiv:2606.07812v1 Announce Type: new Abstract: Humanity is a mosaic of multifaceted talents and needs, and any truly intelligent AI must reflect that richness. Yet the...

    arxiv.org/abs/2606.07812 →
    Context
    This paper proposes 'scaling participation,' a technical shift from monolithic LLMs to modular, bottom-up AI systems built by diverse contributors. This directly addresses power dynamics and control over intelligence.
    Provenance
    Article · Supporting source
  2. 2

    arXiv cs.AI - Research Science (GLOBAL)

    Article Guangzhi Sun, Yixuan Li, Yudong Yang, Chao Zhang

    OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs - arXiv:2606.07577v1 Announce Type: new Abstract: Audio-visual large language models (LLMs) hold strong promise for long-form video...

    arxiv.org/abs/2606.07577 →
    Context
    This is a primary artifact (arXiv paper) detailing a memory-efficient framework for long-video LLMs, directly addressing core AI infrastructure and model limitations.
    Provenance
    Article · Supporting source
  3. 3

    arXiv cs.AI - Research Science (GLOBAL)

    Article Bo Zhang, Borui Zhang, Chenghao Jiang, Minglei Shi, Xiaofeng Wang, Zheng Zhu, Jie Zhou, Jiwen Lu

    Syll: Open-Source Personal Automation with Cross-Surface Execution - arXiv:2606.07594v1 Announce Type: new Abstract: Personal AI agents must increasingly operate across APIs, shells, web surfaces, and desktop GUIs, yet...

    arxiv.org/abs/2606.07594 →
    Context
    Describes an open-source agent framework (Syll) for cross-surface automation (GUI/API/CLI), directly addressing core topics of agentic tools and software engineering.
    Provenance
    Article · Supporting source
  4. 4

    arXiv cs.AI - Research Science (GLOBAL)

    Article Kai A. Horstmann, Ethan Lin, Alice A. Robie, Jennifer J. Sun, Kristin Branson

    A case study of evaluating AI agents on a neuroscience data-to-discovery pipeline - arXiv:2606.07718v1 Announce Type: new Abstract: Agentic AI tools offer a promising path to automating software development bottlenecks...

    arxiv.org/abs/2606.07718 →
    Context
    Directly addresses agentic tools in a complex, real-world scientific pipeline (neuroscience/optogenetics), discussing limitations and evaluation criteria for advanced AI agents.
    Provenance
    Article · Supporting source
  5. 5

    arXiv cs.AI - Research Science (GLOBAL)

    Article Yiyang Zhao, Zhuo Zhang, Qingxuan Le, Lizhen Qu, Zenglin Xu

    Beyond Goodhart's Law: A Dynamic Benchmark for Evaluating Compliance in Multi-Agent Systems - arXiv:2606.07805v1 Announce Type: new Abstract: The rapid evolution of Large Language Models (LLMs) from passive assistants...

    arxiv.org/abs/2606.07805 →
    Context
    Introduces a new benchmark (MAC-Bench) for evaluating agentic procedural compliance, directly addressing operational risks in multi-agent systems.
    Provenance
    Article · Supporting source
  6. 6

    arXiv cs.AI - Research Science (GLOBAL)

    Article Sanjay Kariyappa, G. Edward Suh

    Where Instruction Hierarchy Breaks: Diagnosing and Repairing Failures in Reasoning Language Models - arXiv:2606.07808v1 Announce Type: new Abstract: Reasoning language models deployed in agentic workflows must follow...

    arxiv.org/abs/2606.07808 →
    Context
    This paper introduces a white-box diagnostic framework for instruction hierarchy failures in reasoning models (Gemma, Qwen, Claude). It reports measurable improvements and failure modes, directly impacting agentic workflow reliability.
    Provenance
    Article · Supporting source
  7. 7

    arXiv cs.AI - Research Science (GLOBAL)

    Article Akshay J. Dave, David Grabaskas, Joseph A. Renevitz, Richard B. Vilim

    Overcoming the Regulatory Bottleneck via Agent-to-Agent Protocols: A Nuclear Case Study - arXiv:2606.07866v1 Announce Type: new Abstract: Regulatory review of advanced nuclear reactor designs routinely spans more than...

    arxiv.org/abs/2606.07866 →
    Context
    Addresses a core theme: AI agents solving systemic bottlenecks in regulated industries (nuclear/pharma). High blast radius on policy and institutions.
    Provenance
    Article · Supporting source
  8. 8

    arXiv cs.AI - Research Science (GLOBAL)

    Article Alejandro Botas, Paul de Font-Reaulx, Luke Hewitt

    The AI Epistemic Deference Index: A Continuous Measure of Sycophancy - arXiv:2606.07897v1 Announce Type: new Abstract: Current AI models frequently exhibit epistemic sycophancy, endorsing claims to agree with a user....

    arxiv.org/abs/2606.07897 →
    Context
    Introduces a new, measurable benchmark (AEDI) for evaluating model behavior (sycophancy/deference), directly impacting how models are assessed and controlled.
    Provenance
    Article · Supporting source
  9. 9

    arXiv cs.AI - Research Science (GLOBAL)

    Article Suleyman Armagan Er, Danilo Ribeiro, Yogesh Virkar, Surafel Lakew, Adi Kalyanpur, James Gung, Thomas Delteil, Arshit Gupta

    MemToolAgent overview with a simple restaurant booking scenario where the agent retrieves similar memories, receives feedback on an invalid time format, and generates a reflection to update its memory -...

    arxiv.org/abs/2606.07909 →
    Context
    Describes a new agentic framework (MemToolAgent) that improves tool use via memory management, directly addressing core topics of agents and software engineering.
    Provenance
    Article · Supporting source
  10. 10

    arXiv cs.AI - Research Science (GLOBAL)

    Article Kelly McConvey, Jalehsadat Mahdavimoghaddam, Nima Jamali, Maksym Taranukhin, Sajad Ebrahimi, Wentao Zhang, Yuntian Deng, Karen Eltis, Maura R. Grossman, Vered Shwartz, Ebrahim Bagheri

    The CIFAR Synthetic Evidence Corpus for Detecting AI-Generated Evidence - arXiv:2606.07916v1 Announce Type: new Abstract: The growing ability of generative models to produce realistic documents poses a direct challenge...

    arxiv.org/abs/2606.07916 →
    Context
    Addresses AI's impact on legal evidence/justice system (policy/institutions). A new dataset for detection is a primary artifact with clear downstream consequence.
    Provenance
    Article · Supporting source
  11. 11

    arXiv cs.AI - Research Science (GLOBAL)

    Article Yuan Shen, Xiaojun Wu, Linghua Yu

    Stress-testing medical large language models reveals latent safety pathology beyond benchmark accuracy - arXiv:2606.07929v1 Announce Type: new Abstract: Large language models (LLMs) are entering clinical practice based...

    arxiv.org/abs/2606.07929 →
    Context
    Directly addresses LLM safety in clinical/medical settings (AI infrastructure/medicine). Establishes a new 'stress-audit' methodology for evaluating AI reliability.
    Provenance
    Article · Supporting source
  12. 12

    arXiv cs.AI - Research Science (GLOBAL)

    Article Omar Mahmoud, Aly M. Kassem, Thommen George Karimpanal, Buddhika Laknath Semage, Negar Rostamzadeh, Golnoosh Farnadi, Santu Rana

    Shared Latent Structures Enable Unified Backdoor Detection and Mitigation in LLMs - arXiv:2606.07963v1 Announce Type: new Abstract: Backdoor attacks in large language models (LLMs) are often treated as isolated...

    arxiv.org/abs/2606.07963 →
    Context
    This paper identifies shared latent structures for diverse LLM backdoor attacks (jailbreaking, bias). It proposes a generalizable detection/mitigation method (SAEs), directly impacting model security and control.
    Provenance
    Article · Supporting source
  13. 13

    arXiv cs.AI - Research Science (GLOBAL)

    Article Xiaoyan Zhao, Haoting Ni, Yang Zhang, Chunyuan Zheng, Haoxuan Li, Fuli Feng

    PAFO: Pareto Fairness Optimization for Personalized Reward Modeling - arXiv:2606.07988v1 Announce Type: new Abstract: Large language models (LLMs) increasingly rely on reward models to align their outputs with diverse...

    arxiv.org/abs/2606.07988 →
    Context
    This paper addresses 'personalized reward bias' in LLMs, a core issue of fairness and control over AI outputs. It proposes a technical solution (PAFO) for fairer personalization.
    Provenance
    Article · Supporting source
  14. 14

    arXiv cs.AI - Research Science (GLOBAL)

    Article Harshil Patel, Kunal Pai

    VATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation - arXiv:2606.07992v1 Announce Type: new Abstract: As the Model Context Protocol (MCP) standardizes tool-calling for autonomous agents,...

    arxiv.org/abs/2606.07992 →
    Context
    This paper details a novel attack vector (error-path injection) against autonomous agents and tool-calling protocols, directly impacting agentic coding tools and AI infrastructure security.
    Provenance
    Article · Supporting source
  15. 15

    arXiv cs.AI - Research Science (GLOBAL)

    Article Sera Choi, Wonje Choi, Saehun Chun, Daehee Lee, Jooyoung Kim, Chaeun Lee, Honguk Woo

    Efficient Skill Grounding via Code Refactoring with Small Language Models - arXiv:2606.07999v1 Announce Type: new Abstract: Effective skill grounding is essential for deploying reusable skills in embodied agents, as...

    arxiv.org/abs/2606.07999 →
    Context
    Describes a new framework (RECENT) for skill grounding in embodied agents using sLMs and code refactoring. Directly relates to agentic tools and AI infrastructure.
    Provenance
    Article · Supporting source
  16. 16

    arXiv cs.AI - Research Science (GLOBAL)

    Article Amine El Hattami, Nicolas Chapados, Christopher Pal

    SKILL.nb: Selective Formalization and Gated Execution for Durable Agent Workflows - arXiv:2606.08049v1 Announce Type: new Abstract: AI agents increasingly turn past experience into reusable artifacts such as code,...

    arxiv.org/abs/2606.08049 →
    Context
    Introduces SKILL.nb, a framework for governing reusable agent workflows and improving reliability/durability in complex tasks.
    Provenance
    Article · Supporting source
  17. 17

    arXiv cs.AI - Research Science (GLOBAL)

    Article Zhe Xu, Zhengyu Zhang, Zhiyuan Cai, Jiahao Xu, Yijie Lin, Ziyi Liu, Junlin Hou, Hongyi Wang, Yuxiang Nie, Ling Liang, Yihui Wang, Yingxue Xu, Ronald Cheong Kin Chan, Li Liang, Hao Chen

    A Multi-modal Agentic Co-pilot for Evidence Grounded Computational Pathology - arXiv:2606.08093v1 Announce Type: new Abstract: Pathology is the cornerstone of modern medicine, where accurate decision-making relies...

    arxiv.org/abs/2606.08093 →
    Context
    A primary artifact (new model/tool) in medicine that uses advanced AI concepts (multimodal agents, hypergraphs, evidence grounding). Highly relevant to 'power dynamics' and 'physical-world AI'.
    Provenance
    Article · Supporting source
  18. 18

    arXiv cs.AI - Research Science (GLOBAL)

    Article Yasushi Sakai, Allen Song, Kent Larson

    When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference - arXiv:2606.08098v1 Announce Type: new Abstract: Majority voting over sampled answers is the dominant unsupervised...

    arxiv.org/abs/2606.08098 →
    Context
    This paper introduces a new method (PPV) for LLM aggregation that outperforms majority voting on key benchmarks. It directly addresses model reliability and inference techniques.
    Provenance
    Article · Supporting source
  19. 19

    arXiv cs.AI - Research Science (GLOBAL)

    Article Zayx Shawn

    PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents - arXiv:2606.08106v1 Announce Type: new Abstract: Self-evolving agents improve by repeatedly proposing changes to their own prompts, skills, or workflows...

    arxiv.org/abs/2606.08106 →
    Context
    Presents a primary artifact (paper) addressing agent reliability and self-improvement mechanisms, directly impacting agentic coding tools.
    Provenance
    Article · Supporting source
  20. 20

    arXiv cs.AI - Research Science (GLOBAL)

    Article Yichen Chen, Siying Li, Yuhang Liang, Lijun Wang, Renyang Liu

    SAGE: An LLM-driven Self Reflective Agentic Framework for Fraud Detection - arXiv:2606.08146v1 Announce Type: new Abstract: Fraud detection in payment, e-commerce, and telecommunications systems requires accuracy at...

    arxiv.org/abs/2606.08146 →
    Context
    This paper introduces a novel, end-to-end LLM-driven agentic framework (SAGE) for fraud detection, reporting strong quantitative results and providing code. This directly relates to agentic tools and practical AI applications.
    Provenance
    Article · Supporting source
  21. 21

    arXiv cs.AI - Research Science (GLOBAL)

    Article Xinyu Guan, Qianyang Zhao, Yuming Deng

    Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents - arXiv:2606.08151v1 Announce Type: new Abstract: Tool-using LLM agents often fail not because relevant...

    arxiv.org/abs/2606.08151 →
    Context
    Describes a new artifact (CICL) for tool-using LLM agents, focusing on context selection and compression—a core topic.
    Provenance
    Article · Supporting source
  22. 22

    arXiv cs.AI - Research Science (GLOBAL)

    Article Hyogon Ryu, Jeonghwan Kim, Yewon Lim, Chaeun Lee, Jeongwook Kim, Donghoon Ham

    Online Agent-as-a-Judge: Situation-Generating Evaluation for Interactive Agents - arXiv:2606.08200v1 Announce Type: new Abstract: Evaluating LLM-powered interactive social agents is challenging because socially...

    arxiv.org/abs/2606.08200 →
    Context
    Presents a new evaluation framework (Online Agent-as-a-Judge) for interactive social agents, directly addressing agentic capabilities and testing methods.
    Provenance
    Article · Supporting source
  23. 23

    arXiv cs.AI - Research Science (GLOBAL)

    Article Tanush Swaminathan, Runmin Jiang, Letian Zhang, Min Xu

    SciTrace: Trajectory-Aware Safety Reasoning for Scientific Discovery Agents - arXiv:2606.08234v1 Announce Type: new Abstract: LLM-based scientific agents have shown strong capacity for autonomous research, yet their...

    arxiv.org/abs/2606.08234 →
    Context
    Addresses agent safety and reliability (SciTrace), a core concern for building autonomous AI agents in scientific discovery.
    Provenance
    Article · Supporting source
  24. 24

    arXiv cs.AI - Research Science (GLOBAL)

    Article Wisdom Dogah

    Traxia: A Framework for Verifiable, Agent-Native Scientific Publishing - arXiv:2606.08256v1 Announce Type: new Abstract: Verifiability, attribution, and reproducibility are foundational requirements of scientific...

    arxiv.org/abs/2606.08256 →
    Context
    This introduces a new infrastructure (Traxia) for AI scientific publishing, fundamentally changing how research is validated and attributed. It impacts knowledge control and provenance.
    Provenance
    Article · Supporting source
  25. 25

    @awnihannun (Awni Hannun)

    X awnihannun

    Three MLX videos dropped at WWDC: Running agents locally by @angeloskath https://youtube.com/watch?v=wykPErJ8M-8 … Distributed inference and training by Tatiana Likhomanenko https://youtube.com/watch?v=CzgK02zsRg4…

    x.com/awnihannun/status/2064199840658256166 →
    Context
    Reports multiple specific artifacts (videos) detailing local agents and distributed training/inference using MLX, directly addressing AI infrastructure and tools.
    Provenance
    Tweet · Primary source
  26. 26

    @Jchammond_ (Connor)

    X Jchammond_

    Foundation Models has a CLI

    x.com/Jchammond_/status/2064206029370630529 →
    Context
    Announcing a new capability (CLI) for Foundation Models is a primary artifact/tool update directly related to AI infrastructure and development tools.
    Provenance
    Tweet · Primary source
  27. 27

    @suchenzang (Susan Zhang)

    X suchenzang

    agi happened when the opportunity cost of producing a meaningful frontier benchmark far far far exceeded simply* building and selling the product-benchmark directly ----- *simply here does not imply simple, trivial, or…

    x.com/suchenzang/status/2064237678204481978 →
    Context
    Directly addresses the core topic of AI frontier models and power dynamics by discussing the economic/strategic shift in building benchmarks.
    Provenance
    Tweet · Primary source
  28. 28

    Korea Ministry of Science and ICT Press Releases - Policy Geopolitics (KR)

    Article

    Ministry of Science and ICT launches flagship project to localize core physical AI technologies

    www.msit.go.kr/bbs/view.do?bbsSeqNo=94&nttS… →
    Context
    Directly addresses 'physical-world AI' and national strategy for core technology localization (policy/geopolitics).
    Provenance
    Article · Supporting source
  29. 29

    Forbes Innovation - Industry Adjacent (US)

    Article Lance Eliot, Contributor

    Lawmakers Are Aiming To Regulate AI-Builds-AI Before AI Gets Entirely Beyond Human Control - Anthropic has brought attention to AI-builds-AI, involving using AI to advance AI. Some believe new AI laws should pause...

    www.forbes.com/sites/lanceeliot/2026/06/09/… →
    Context
    Directly addresses regulation (policy/geopolitics) of advanced AI development ('AI-builds-AI'), which is central to power dynamics and control.
    Provenance
    Article · Supporting source
  30. 30

    Microsoft's open source tools were hacked to steal passwords of AI developers — 225 pts · 96 comments

    Article raffael_de

    https://techcrunch.com/2026/06/08/microsofts-open-source-tools-were-hacked-to-steal-passwords-of-ai-developers/ · @JdeBP: These seem related: * https://news.ycombinator.com/item?id=48418318 (The Blight Reaches…

    techcrunch.com/2026/06/08/microsofts-open-s… →
    Context
    Directly addresses AI security/infrastructure risks (hacks targeting AI devs). Focuses on power dynamics and infrastructure vulnerability.
    Provenance
    Article · Supporting source
  31. 31

    The Guardian Technology - Industry Adjacent (UK)

    Article Denis Campbell Health policy editor

    Doctors and NHS could be sued for mistakes made by AI tools, report warns - Medical Protection Society calls for law to be overhauled to help medics avoid liability for errors made by technology Doctors and the NHS...

    www.theguardian.com/society/2026/jun/09/doc… →
    Context
    Directly addresses liability and regulation (policy/law) concerning AI use in medicine, a core power dynamic topic.
    Key points
    • Directly addresses liability and regulation (policy/law) concerning AI use in medicine, a core power dynamic topic.
    Provenance
    Article · Supporting source
  32. 32

    @Xianbao_QIAN (Tiezhen WANG)

    X Xianbao_QIAN

    New open weight model series from @NexEcosystem - Built on top of Qwen 3.5 series - Available in both Pro (397BA17B) and Mini (35BA3B) - Optimized for agentic adaptive thinking & long context - Apache 2 license…

    x.com/Xianbao_QIAN/status/20642576842837814… →
    Details
    Excerpt
    New open weight model series from @NexEcosystem - Built on top of Qwen 3.5 series - Available in both Pro (397BA17B) and Mini (35BA3B) - Optimized for agentic adaptive thinking & long context - Apache 2 license…
    Context
    Announcing a new open-weight model series (Pro/Mini) optimized for agentic thinking and long context is a primary artifact that directly relates to frontier models and AI infrastructure.
    Key points
    • Announcing a new open-weight model series (Pro/Mini) optimized for agentic thinking and long context is a primary artifact that directly relates to frontier models and AI infrastructure.
    Provenance
    Tweet · Primary source
  33. 33

    Axios - Industry Adjacent (US)

    Article Ina Fried

    Apple's Siri AI is both cool and 2 years too late - Apple is finally delivering the conversational and context-aware AI that it promised two years ago. Its rivals have already moved on to agents. Why it matters:...

    www.axios.com/2026/06/09/apple-siri-ai-agen… →
    Details
    Excerpt
    Apple's Siri AI is both cool and 2 years too late - Apple is finally delivering the conversational and context-aware AI that it promised two years ago. Its rivals have already moved on to agents. Why it matters:...
    Context
    Directly addresses agentic AI tools and Apple's response to competitors (OpenAI/Anthropic), impacting developer mental models.
    Key points
    • Directly addresses agentic AI tools and Apple's response to competitors (OpenAI/Anthropic), impacting developer mental models.
    Provenance
    Article · Supporting source
  34. 34

    Techmeme - Industry Adjacent (US)

    Article

    Sources: China is drafting plans to spend $295B over the next five years on building AI data centers, sourcing 80%+ of tech from local suppliers like Huawei (Charlie Zhu/Bloomberg) - Charlie Zhu / Bloomberg : Sources:...

    www.techmeme.com/260609/p7 →
    Details
    Excerpt
    Sources: China is drafting plans to spend $295B over the next five years on building AI data centers, sourcing 80%+ of tech from local suppliers like Huawei (Charlie Zhu/Bloomberg) - Charlie Zhu / Bloomberg : Sources:...
    Context
    Details China's massive $295B plan for AI data centers and local sourcing (Huawei), directly addressing infrastructure, geopolitics, and control.
    Key points
    • Details China's massive $295B plan for AI data centers and local sourcing (Huawei), directly addressing infrastructure, geopolitics, and control.
    Provenance
    Article · Supporting source
  35. 35

    The Verge AI - Media Culture (US)

    Article Hayden Field

    Amazon employees ask Seattle to put the brakes on new data centers - On Tuesday, the Seattle City Council will vote on whether to enact a one-year moratorium on new data centers - just two months after several...

    www.theverge.com/ai-artificial-intelligence… →
    Details
    Excerpt
    Amazon employees ask Seattle to put the brakes on new data centers - On Tuesday, the Seattle City Council will vote on whether to enact a one-year moratorium on new data centers - just two months after several...
    Context
    Directly addresses AI infrastructure (data centers) and power dynamics/regulation (moratorium), impacting compute availability.
    Key points
    • Directly addresses AI infrastructure (data centers) and power dynamics/regulation (moratorium), impacting compute availability.
    Provenance
    Article · Supporting source
  36. 36

    Rest of World Latest - Media Culture (GLOBAL)

    Article Rina Chandran

    The Great AI Divide: Navigating U.S. and Chinese dominance - At a Rest of World event during New York Tech Week, we explored the challenges and possible solutions to the dominance of American and Chinese AI companies.

    restofworld.org/2026/ai-divide-america-chin… →
    Details
    Excerpt
    The Great AI Divide: Navigating U.S. and Chinese dominance - At a Rest of World event during New York Tech Week, we explored the challenges and possible solutions to the dominance of American and Chinese AI companies.
    Context
    Directly addresses power dynamics (US/China) and geopolitics shaping AI development, core to the podcast topic.
    Key points
    • Directly addresses power dynamics (US/China) and geopolitics shaping AI development, core to the podcast topic.
    Provenance
    Article · Supporting source
  37. 37

    Techmeme - Industry Adjacent (US)

    Article

    The UK is conducting a full review of its NHS contract with Palantir, amid growing pressure to terminate the deal in 2027 over reliance on US tech companies (Sam Tabahriti/Reuters) - Sam Tabahriti / Reuters : The UK is...

    www.techmeme.com/260609/p11 →
    Details
    Excerpt
    The UK is conducting a full review of its NHS contract with Palantir, amid growing pressure to terminate the deal in 2027 over reliance on US tech companies (Sam Tabahriti/Reuters) - Sam Tabahriti / Reuters : The UK is...
    Context
    Directly addresses geopolitical power dynamics and national control over critical infrastructure (NHS), fitting the podcast's focus on labs, regulators, and geopolitics.
    Key points
    • Directly addresses geopolitical power dynamics and national control over critical infrastructure (NHS), fitting the podcast's focus on labs, regulators, and geopolitics.
    Provenance
    Article · Supporting source
  38. 38

    Techmeme - Industry Adjacent (US)

    Article

    Sources: Taiwan considers restricting AI chip sales to all Chinese customers, rather than only blacklisted entities like Huawei, to align with US measures (Bloomberg) - Bloomberg : Sources: Taiwan considers restricting...

    www.techmeme.com/260609/p13 →
    Details
    Excerpt
    Sources: Taiwan considers restricting AI chip sales to all Chinese customers, rather than only blacklisted entities like Huawei, to align with US measures (Bloomberg) - Bloomberg : Sources: Taiwan considers restricting...
    Context
    Directly addresses geopolitics and export controls (chips/Taiwan/China), a core topic of power dynamics shaping AI infrastructure.
    Key points
    • Directly addresses geopolitics and export controls (chips/Taiwan/China), a core topic of power dynamics shaping AI infrastructure.
    Provenance
    Article · Supporting source
  39. 39

    Techmeme - Industry Adjacent (US)

    Article

    Apple unveils new Apple Foundation Models: two on-device models, including a 20B-parameter multimodal model called AFM 3 Core Advanced, and three cloud models (Apple Machine Learning Research) - Apple Machine Learning...

    www.techmeme.com/260609/p17 →
    Details
    Excerpt
    Apple unveils new Apple Foundation Models: two on-device models, including a 20B-parameter multimodal model called AFM 3 Core Advanced, and three cloud models (Apple Machine Learning Research) - Apple Machine Learning...
    Context
    Apple unveiling new foundation models (on-device and cloud) directly impacts AI infrastructure and power dynamics.
    Key points
    • Apple unveiling new foundation models (on-device and cloud) directly impacts AI infrastructure and power dynamics.
    Provenance
    Article · Supporting source