◆ Braid Daily · 2026-05-29

GPT-5.5 instant ships less sycophantic, plus a flood of agent-reliability papers

29 May 2026

Pokrass: 'the previous model was too bullet pilled.' Plus: the OpenAI Agents SDK refresh, ParaTool, TOON, and a politics-of-AI doubleheader.

The lead

Michelle Pokrass on today's GPT-5.5 instant update: 'the previous model was too bullet pilled. the new one improves on some other important dimensions: sycophancy, factuality, and multilingual performance.' A direct note from the team that sets the default voice of one of the most-used models in the world.


Agents in production

Build Hour: Agents SDK

OpenAI / YouTube

OpenAI's 47-minute walkthrough of the updated Agents SDK, focused on long-running agents and a model-native harness — the pitch being that the harness is now part of the model contract rather than glue around it.


pibot, fully local

@badlogicgames

Mario Zechner reports pibot running end-to-end on-device: parakeet for STT, qwen3-tts for TTS, Qwen 3.6 as the multimodal LLM via llama.cpp, with the STT and TTS engines ported to Rust/mlx-c. Zero Python in the runtime.

“pibot is now running fully local, using parakeet for STT, qwen3-tts for TTS, and Qwen 3.6 as the local multi-modal LLM via llama.cpp.”


Two budgets for tokens

@emollick

Ethan Mollick argues organizations should think of AI spend as funding two distinct things — building stuff, and the experimentation that figures out how to build stuff — and that most teams haven't separated those budgets.


ParaTool: tool schemas as parameters, not context

arXiv 2605.29561

Stuffing tool schemas into the context window is itself the bottleneck. ParaTool moves tool representations into model parameters, sidestepping the per-call token cost agent systems have been swallowing.
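A back-of-envelope sketch of the per-call cost described above. Every figure here (tool count, tokens per schema, number of calls) is a hypothetical placeholder, not a number from the paper, and the function name is illustrative only. It also assumes schemas are re-sent in full on every call with no prompt caching.

```python
def schema_overhead_tokens(num_tools: int, tokens_per_schema: int, num_calls: int) -> int:
    """Prompt tokens spent re-sending in-context tool schemas over one agent run.

    Assumes every schema is included in full on every model call
    (no caching, no tool filtering). All inputs are illustrative.
    """
    return num_tools * tokens_per_schema * num_calls


# Hypothetical agent: 40 tools, ~150 tokens per schema, 25 model calls per task.
overhead = schema_overhead_tokens(num_tools=40, tokens_per_schema=150, num_calls=25)
print(overhead)  # 150000
```

Under these assumed numbers the schemas alone cost 150,000 prompt tokens per task before any user content is counted, which is the overhead a parameter-side tool representation would avoid.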


Notation Matters: TOON, TRON, and the JSON tax

arXiv 2605.29676

A benchmark of token-optimized data formats (TOON, TRON) as drop-in replacements for JSON in agent tool schemas. The authors' claim is that JSON is the wrong serialization for what an agent pays per call.
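To make the "JSON tax" concrete, here is a minimal sketch comparing JSON with a simplified TOON-style tabular layout for a uniform list of records. The encoder below is an illustration written for this note, not the paper's benchmark code or a spec-complete TOON implementation. The `to_tabular` helper and the sample tool list are hypothetical, and character count is used only as a rough proxy for token count.

```python
import json


def to_tabular(name, rows):
    """Encode a uniform list of flat dicts as a header line plus one CSV-like row each.

    Simplified TOON-style layout: keys are stated once in the header instead of
    being repeated per record. Values containing commas or newlines are NOT
    escaped here, which a real encoder would need to handle.
    """
    if not rows:
        return f"{name}[0]:"
    fields = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != fields:
            raise ValueError("rows must share the same keys in the same order")
    columns = ",".join(fields)
    header = name + "[" + str(len(rows)) + "]{" + columns + "}:"
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)


# Hypothetical tool list, for illustration only.
tools = [
    {"name": "search", "arg": "query", "type": "string"},
    {"name": "fetch", "arg": "url", "type": "string"},
    {"name": "sum", "arg": "values", "type": "array"},
]

as_json = json.dumps(tools)
as_tabular = to_tabular("tools", tools)

print(as_tabular)
print(len(as_json), len(as_tabular))  # character counts, a rough proxy for tokens
```

The saving in this sketch comes entirely from not repeating keys, quotes, and braces per record, so it only applies to uniform arrays. Nested or irregular structures would need a different treatment.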


Reliability research catches up to deployment

Scaling Monosemanticity to Claude 3 Sonnet

arXiv 2605.29358

Anthropic's interpretability team extracts interpretable features from Claude 3 Sonnet using sparse autoencoders — the first time the monosemanticity program has been pushed past toy models onto a production frontier model.


Provably Secure Agent Guardrail

arXiv 2605.29251

A formal guardrail construction for agents with execution privileges, with provable security guarantees. Pairs naturally with the Redpanda agentic data plane paper on out-of-band metadata for safe autonomy.


The chain holds, the answer folds

arXiv 2605.29087

Reasoning models under adversarial pressure preserve a clean chain-of-thought trace and then capitulate in the final answer. The paper names the dissociation and measures how often it appears across single-turn benchmarks.


MMPO: memory drift on long horizons

arXiv 2605.30159

Memory-augmented agents that recursively summarize their own context drift in predictable ways over long-horizon tasks. MMPO is a policy-optimization patch for that drift, aimed at agents that need to hold a belief state across many steps.


Governing Technical Debt in Agentic AI

arXiv 2605.29129

A vocabulary paper. The authors define 'agentic technical debt' and a 'stochastic tax' that production agentic systems pay as they reason over many steps and call many tools, and propose governance hooks for both.


Politics catches up

Inside the Democratic resistance on AI

Axios

Maria Curi maps five progressive Democrats — Sanders, Warren, AOC, Khanna, Markey — who are pushing AI moratoriums, taxes, and labor protections into a confrontational party message. Read it for the specific policy proposals, not the politics.


Billionaires work to contain AI's populist revolt

Axios

Zachary Basu on how the tech-billionaire class is moving early to defuse an AI-driven backlash. The piece's framing line — 'the pitchforks are here' — is from an unnamed billionaire describing what they think is already happening.



OpenAI briefs the White House on GPT-Rosalind biodefense

Maria Curi / Axios (via Techmeme)

OpenAI says it has briefed the White House on a new biodefense program built on GPT-Rosalind, framed as pandemic preparedness. The policy reading is that frontier-lab compute is now a named national-security input.


Former Tesla labelers describe what backs FSD's hazard detection

Reuters (via Techmeme)

Reuters interviews former Tesla data labelers on the manual hazard-mapping work that sits behind FSD, and walks through crash data the reporters say shows Tesla's published safety methodology overstates FSD performance.


Infra notes

XCENA: $135M on memory, not compute

TechCrunch

South Korean chip startup XCENA raised $135M at a $570M valuation on the bet that the bottleneck holding back AI deployment is memory bandwidth, not raw compute — a thesis that pairs with this week's reporting on HBM supply tightening.


3,000 tokens/sec per request on standard GPUs

kog.ai (via HN)

A claim of 3,000 tokens/sec per request on commodity GPUs, with no exotic hardware involved. The HN thread is the useful part: commenters press on the methodology and on what 'standard' actually rules out.


A second read on the Microsoft / Copilot move

@trengriffin

Tren Griffin pushes back on the framing of Microsoft moving engineers from Claude Code to GitHub Copilot, both running Opus 4.7 on the enterprise API, and argues it's dogfooding of the Copilot harness, not a signal about the underlying model. Follow-up to Monday's coverage.


Companion episode

Locally coherent, globally not

2026-05-29 · 00:22:01


Two threads to keep an eye on. First, the agent-reliability literature is now arriving faster than any single team can read it — guardrails, memory drift, tool retrieval, technical debt, all in one day's arXiv listing. Second, the political pressure on AI is no longer hypothetical: a named caucus on one side, a defensive billionaire class on the other, and OpenAI briefing the White House on biodefense. Those two threads are going to keep meeting.