Michelle Pokrass on today's GPT-5.5 instant update: 'the previous model was too bullet pilled. the new one improves on some other important dimensions: sycophancy, factuality, and multilingual performance.' A direct note from the team that sets the default voice of one of the most-used models in the world.
Read source◆ Braid Daily · 2026-05-29
GPT-5.5 instant ships less sycophantic, and a flood of agent-reliability papers
Pokrass: 'the previous model was too bullet pilled.' Plus: the OpenAI Agents SDK refresh, ParaTool, TOON, and a politics-of-AI doubleheader.
The lead
1Agents in production
5Build Hour: Agents SDK
OpenAI / YouTube
OpenAI's 47-minute walkthrough of the updated Agents SDK, focused on long-running agents and a model-native harness — the pitch being that the harness is now part of the model contract rather than glue around it.
Read sourcepibot, fully local
@badlogicgames
Mario Zechner reports pibot running end-to-end on-device: parakeet for STT, qwen3-tts for TTS, Qwen 3.6 as the multimodal LLM via llama.cpp, with the STT and TTS engines ported to Rust/mlx-c. Zero Python in the runtime.
Read source“pibot is now running fully local, using parakeet for STT, qwen3-tts for TTS, and Qwen 3.6 as the local multi-modal LLM via llama.cpp.”
Two budgets for tokens
@emollick
Ethan Mollick argues organizations should think of AI spend as funding two distinct things — building stuff, and the experimentation that figures out how to build stuff — and that most teams haven't separated those budgets.
Read sourceParaTool: tool schemas as parameters, not context
arXiv 2605.29561
Stuffing tool schemas into the context window is itself the bottleneck. ParaTool moves tool representations into model parameters, sidestepping the per-call token cost agent systems have been swallowing.
Read sourceNotation Matters: TOON, TRON, and the JSON tax
arXiv 2605.29676
A benchmark of token-optimized data formats (TOON, TRON) as drop-in replacements for JSON in agent tool schemas. The authors' claim is that JSON is the wrong serialization for what an agent pays per call.
Read sourceReliability research catches up to deployment
5Scaling Monosemanticity to Claude 3 Sonnet
arXiv 2605.29358
Anthropic's interpretability team extracts interpretable features from Claude 3 Sonnet using sparse autoencoders — the first time the monosemanticity program has been pushed past toy models onto a production frontier model.
Read sourceProvably Secure Agent Guardrail
arXiv 2605.29251
A formal guardrail construction for agents with execution privileges, with provable security guarantees. Pairs naturally with the Redpanda agentic data plane paper on out-of-band metadata for safe autonomy.
Read sourceThe chain holds, the answer folds
arXiv 2605.29087
Reasoning models under adversarial pressure preserve a clean chain-of-thought trace and then capitulate in the final answer. The paper names the dissociation and measures how often it appears across single-turn benchmarks.
Read sourceMMPO: memory drift on long horizons
arXiv 2605.30159
Memory-augmented agents that recursively summarize their own context drift in predictable ways over long-horizon tasks. MMPO is a policy-optimization patch for that drift, aimed at agents that need to hold a belief state across many steps.
Read sourceGoverning Technical Debt in Agentic AI
arXiv 2605.29129
A vocabulary paper. The authors define 'agentic technical debt' and a 'stochastic tax' that production agentic systems pay as they reason over many steps and call many tools, and propose governance hooks for both.
Read sourcePolitics catches up
4Inside the Democratic resistance on AI
Axios
Maria Curi maps five progressive Democrats — Sanders, Warren, AOC, Khanna, Markey — who are pushing AI moratoriums, taxes, and labor protections into a confrontational party message. Read it for the specific policy proposals, not the politics.
Read sourceBillionaires work to contain AI's populist revolt
Axios
Zachary Basu on how the tech-billionaire class is moving early to defuse an AI-driven backlash. The piece's framing line — 'the pitchforks are here' — is from an unnamed billionaire describing what they think is already happening.
Read source“The pitchforks are here”
OpenAI briefs the White House on GPT-Rosalind biodefense
Maria Curi / Axios (via Techmeme)
OpenAI says it has briefed the White House on a new biodefense program built on GPT-Rosalind, framed as pandemic preparedness. The policy reading is that frontier-lab compute is now a named national-security input.
Read sourceFormer Tesla labelers describe what backs FSD's hazard detection
Reuters (via Techmeme)
Reuters interviews former Tesla data labelers on the manual hazard-mapping work that sits behind FSD, and walks through crash data the reporters say shows Tesla's published safety methodology overstates FSD performance.
Read sourceInfra notes
3XCENA: $135M on memory, not compute
TechCrunch
South Korean chip startup XCENA raised $135M at a $570M valuation on the bet that the bottleneck holding back AI deployment is memory bandwidth, not raw compute — a thesis that pairs with this week's reporting on HBM supply tightening.
Read source3,000 tokens/sec per request on standard GPUs
kog.ai (via HN)
A claim of 3,000 tokens/sec per request on commodity GPUs without exotic hardware. The HN thread is the useful part — commenters press on the methodology and what 'standard' rules out.
Read sourceA second read on the Microsoft / Copilot move
@trengriffin
Tren Griffin pushes back on the framing of Microsoft moving engineers from Claude Code to GitHub Copilot — both running Opus 4.7 on enterprise API — and argues it's dogfooding of the Copilot harness, not a signal about the underlying model. Follow-up to Monday's coverage.
Read sourceCompanion episode
Locally coherent, globally not
Two threads to keep an eye on. First, the agent-reliability literature is now arriving faster than any single team can read it — guardrails, memory drift, tool retrieval, technical debt, all in one day's arXiv listing. Second, the political pressure on AI is no longer hypothetical: a named caucus on one side, a defensive billionaire class on the other, and OpenAI briefing the White House on biodefense. Those two threads are going to keep meeting.