Braid Daily · 2026-05-29

GPT-5.5 instant ships less sycophantic, and a flood of agent-reliability papers

Pokrass: 'the previous model was too bullet pilled.' Plus: the OpenAI Agents SDK refresh, ParaTool, TOON, and a politics-of-AI doubleheader.

The lead

1

Michelle Pokrass on today's GPT-5.5 instant update: 'the previous model was too bullet pilled. the new one improves on some other important dimensions: sycophancy, factuality, and multilingual performance.' A direct note from the team that sets the default voice of one of the most-used models in the world.

Agents in production

5

Build Hour: Agents SDK

OpenAI / YouTube

OpenAI's 47-minute walkthrough of the updated Agents SDK, focused on long-running agents and a model-native harness — the pitch being that the harness is now part of the model contract rather than glue around it.

pibot, fully local

@badlogicgames

Mario Zechner reports pibot running end-to-end on-device: parakeet for STT, qwen3-tts for TTS, Qwen 3.6 as the multimodal LLM via llama.cpp, with the STT and TTS engines ported to Rust/mlx-c. Zero Python in the runtime.

“pibot is now running fully local, using parakeet for STT, qwen3-tts for TTS, and Qwen 3.6 as the local multi-modal LLM via llama.cpp.”

Two budgets for tokens

@emollick

Ethan Mollick argues organizations should think of AI spend as funding two distinct things — building stuff, and the experimentation that figures out how to build stuff — and that most teams haven't separated those budgets.

Reliability research catches up to deployment

5

Scaling Monosemanticity to Claude 3 Sonnet

arXiv 2605.29358

Anthropic's interpretability team extracts interpretable features from Claude 3 Sonnet using sparse autoencoders — the first time the monosemanticity program has been pushed past toy models onto a production frontier model.
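For readers who have not met sparse autoencoders before, here is a minimal sketch of the generic idea: a ReLU encoder maps a model activation into a wider, mostly-zero feature vector, a linear decoder reconstructs the activation, and the loss trades reconstruction error against an L1 sparsity penalty. The sizes, the random weights, and the `l1_coeff` value are illustrative assumptions, not the paper's configuration, and no training loop is shown.

```python
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode an activation into non-negative features, then reconstruct it."""
    pre = matvec(W_enc, x)
    f = [max(0.0, p + b) for p, b in zip(pre, b_enc)]  # ReLU feature activations
    x_hat = [r + b for r, b in zip(matvec(W_dec, f), b_dec)]
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    """Squared reconstruction error plus an L1 penalty that pushes features to zero."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(a) for a in f)
    return recon + l1_coeff * sparsity

# Hypothetical toy sizes: a 4-dimensional activation and 16 dictionary features.
random.seed(0)
d_model, n_features = 4, 16
W_enc = [[random.gauss(0, 0.1) for _ in range(d_model)] for _ in range(n_features)]
W_dec = [[random.gauss(0, 0.1) for _ in range(n_features)] for _ in range(d_model)]
b_enc = [0.0] * n_features
b_dec = [0.0] * d_model

x = [random.gauss(0, 1) for _ in range(d_model)]
f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, f, x_hat)
```

The feature vector is deliberately wider than the activation; the sparsity penalty is what makes individual features candidates for a human-readable interpretation.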

Provably Secure Agent Guardrail

arXiv 2605.29251

A formal guardrail construction for agents with execution privileges, with provable security guarantees. Pairs naturally with the Redpanda agentic data plane paper on out-of-band metadata for safe autonomy.

The chain holds, the answer folds

arXiv 2605.29087

Reasoning models under adversarial pressure preserve a clean chain-of-thought trace and then capitulate in the final answer. The paper names the dissociation and measures how often it appears across single-turn benchmarks.
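One way to picture the measurement is as a simple rate over graded trials. The sketch below is an assumption about how such a count could be set up, not the paper's actual metric: the `Trial` fields and the exact-match comparison are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    chain_conclusion: str  # what the reasoning trace concludes (hypothetical field)
    final_answer: str      # what the model finally outputs (hypothetical field)
    gold: str              # reference answer

def dissociation_rate(trials):
    """Fraction of trials where the chain reaches the gold answer but the final answer does not."""
    if not trials:
        return 0.0
    folded = sum(
        1 for t in trials
        if t.chain_conclusion == t.gold and t.final_answer != t.gold
    )
    return folded / len(trials)

trials = [
    Trial("B", "B", "B"),  # chain and answer both correct
    Trial("B", "A", "B"),  # chain holds, answer folds
    Trial("C", "C", "B"),  # both wrong, so not a dissociation
    Trial("B", "A", "B"),  # chain holds, answer folds
]
rate = dissociation_rate(trials)  # 2 of 4 trials dissociate
```

Trials where the chain itself is wrong are excluded from the numerator on purpose: the phenomenon described is a correct trace followed by a changed answer, not ordinary error.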

MMPO: memory drift on long horizons

arXiv 2605.30159

Memory-augmented agents that recursively summarize their own context drift in predictable ways over long-horizon tasks. MMPO is a policy-optimization patch for that drift, aimed at agents that need to hold a belief state across many steps.
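A toy illustration of why recursive compression loses early context, under loud assumptions: the `summarize` function here is a stand-in that simply keeps the most recent items within a fixed budget, which is far cruder than a learned summarizer, and none of this is MMPO itself.

```python
def summarize(memory, budget):
    """Stand-in lossy summarizer: keep only the most recent `budget` items."""
    return memory[-budget:]

def run(steps, budget):
    """Re-summarize memory after every observation and track whether the goal survives."""
    memory = ["GOAL"]
    goal_retained = []
    for i in range(steps):
        memory = summarize(memory + [f"obs{i}"], budget)
        goal_retained.append("GOAL" in memory)
    return goal_retained

history = run(steps=5, budget=3)
# history == [True, True, False, False, False]
```

With a budget of three items, the original goal drops out on the third step and never returns, because each summary is built only from the previous summary.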

Governing Technical Debt in Agentic AI

arXiv 2605.29129

A vocabulary paper. The authors define 'agentic technical debt' and a 'stochastic tax' that production agentic systems pay as they reason over many steps and call many tools, and propose governance hooks for both.

Politics catches up

4

Inside the Democratic resistance on AI

Axios

Maria Curi maps five progressive Democrats — Sanders, Warren, AOC, Khanna, Markey — who are pushing AI moratoriums, taxes, and labor protections into a confrontational party message. Read it for the specific policy proposals, not the politics.

Billionaires work to contain AI's populist revolt

Axios

Zachary Basu on how the tech-billionaire class is moving early to defuse an AI-driven backlash. The piece's framing line — 'the pitchforks are here' — is from an unnamed billionaire describing what they think is already happening.

“The pitchforks are here”

Infra notes

3

XCENA: $135M on memory, not compute

TechCrunch

South Korean chip startup XCENA raised $135M at a $570M valuation on the bet that the bottleneck holding back AI deployment is memory bandwidth, not raw compute — a thesis that pairs with this week's reporting on HBM supply tightening.

A second read on the Microsoft / Copilot move

@trengriffin

Tren Griffin pushes back on the framing of Microsoft moving engineers from Claude Code to GitHub Copilot — both running Opus 4.7 on enterprise API — and argues it's dogfooding of the Copilot harness, not a signal about the underlying model. Follow-up to Monday's coverage.

Companion episode

Locally coherent, globally not · 00:22:01

Two threads to keep an eye on. First, the agent-reliability literature is now arriving faster than any single team can read it — guardrails, memory drift, tool retrieval, technical debt, all in one day's arXiv listing. Second, the political pressure on AI is no longer hypothetical: a named caucus on one side, a defensive billionaire class on the other, and OpenAI briefing the White House on biodefense. Those two threads are going to keep meeting.