◆ Braid Daily · 2026-05-24

Mythos found 10,000 bugs and won't be released

24 May 2026

A frontier bug-finder turned up more than 10,000 vulnerabilities in a month, and Anthropic says it's too dangerous to ship.

The lead

Project Glasswing, launched in April and powered by the still-unreleased Claude Mythos Preview, helped partners find more than 10,000 vulnerabilities in a single month. Cloudflare surfaced 2,000 bugs, 400 of them high or critical; Mozilla found and fixed 271 Firefox vulnerabilities, about ten times what an older Claude model managed. Anthropic says it won't ship the model: "it hasn't released…

Defense ships as the first real attacks arrive

Mythos heads to Claude Code and a new 'Claude Security' surface

@testingcatalog

New app strings reference "Access to the Claude Mythos model in Claude Code and Claude Security," which would put the same restricted bug-finder inside the tools developers already use. TestingCatalog expects it to stay gated rather than open to the public.

“Mythos 1, "claude-mythos-1-preview", is being prepared for a release on Claude Code and Claude Security.”

The first real-world prompt injection Thacker has seen, delivered through GitHub Issues

@rez0__ (Joseph Thacker)

Joseph Thacker, who tests AI products for OpenAI and Google, says this is the first genuine in-the-wild prompt injection he has seen, not a lab demo, and it arrives through the exact channel researchers have been probing.

“This is the first REAL one I've seen. And it's using GitHub issues which is the main way/channel that gets tested these days.”

How it works: a fake security finding that exfiltrates over DNS

@inf0stache

The malicious issue uses security-finding language to get an agent to run a local scan.js that searches the home directory for secrets and leaks them over DNS, a channel that slips past egress rules built only to block outbound HTTP. The post's diagram traces the chain.

“The issue uses fake security finding language to push a local scan.js, which searches the home directory for secrets, base64 encodes the results, and reports over DNS.”
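
The post describes the exfiltration step only at a high level: results are base64-encoded and reported over DNS. On the defensive side, one common heuristic is to watch resolver logs for query names whose labels are unusually long and random-looking, since encoded data tends to produce exactly that. The sketch below is a minimal, hypothetical detector. The thresholds (`MIN_LABEL_LEN`, `MIN_ENTROPY_BITS`), the URL-safe alphabet, and the function names are all assumptions for illustration, not anything taken from the post or from scan.js.

```python
import math
import re
from collections import Counter

# Hypothetical thresholds (assumptions); tune against your own resolver logs.
MIN_LABEL_LEN = 30        # real hostnames rarely have labels this long
MIN_ENTROPY_BITS = 3.5    # bits per character; encoded data scores high

# Characters a base64-style payload can use inside a DNS label
# (URL-safe alphabet; '+', '/' and '=' are not valid in hostnames).
_ENCODED_LABEL = re.compile(r"^[A-Za-z0-9_-]+$")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def looks_like_dns_exfil(qname: str) -> bool:
    """Flag a query name if any label is long, encoded-looking and high-entropy."""
    for label in qname.rstrip(".").split("."):
        if (
            len(label) >= MIN_LABEL_LEN
            and _ENCODED_LABEL.match(label)
            and shannon_entropy(label) >= MIN_ENTROPY_BITS
        ):
            return True
    return False
```

For example, `looks_like_dns_exfil("www.github.com")` is false, while a query for a 32-character random-looking label under an unfamiliar domain is flagged. CDNs and some cloud services also use long hashed hostnames, so in practice this would be one signal among several rather than a blocking rule.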

Coding's boring 90%, and the bill for the rest

A $3, 2-million-token mass refactor, with one funny deadlock

r/singularity

A poster ran an autonomous refactor across a 120-file FastAPI service. It took about 400 steps and 2 million tokens for $3, on cheap worker models priced at roughly one-eightieth of Opus. The routine work landed; the hard part didn't.

“it confidently introduced a deadlock into my async event handler which was genuinely funny, so the hard 10% still needs opus.”
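
To put the post's figures in per-unit terms, here is a quick back-of-envelope calculation. The only inputs are the numbers quoted above. The Opus comparison is an assumption: it supposes Opus would have consumed the same 2 million tokens, which the post does not claim and which ignores any difference in step count or retries.

```python
# Figures reported in the post.
TOTAL_TOKENS = 2_000_000
STEPS = 400
COST_USD = 3.00
OPUS_PRICE_MULTIPLIER = 80  # worker models cost "around 80 times less" than Opus

tokens_per_step = TOTAL_TOKENS / STEPS                    # 5,000 tokens
cost_per_million = COST_USD / (TOTAL_TOKENS / 1_000_000)  # $1.50
cost_per_step = COST_USD / STEPS                          # $0.0075
# Assumption: Opus would have used the same number of tokens for this run.
implied_opus_cost = COST_USD * OPUS_PRICE_MULTIPLIER      # $240

print(f"{tokens_per_step:,.0f} tokens/step")
print(f"${cost_per_million:.2f} per million tokens")
print(f"${cost_per_step:.4f} per step")
print(f"~${implied_opus_cost:,.0f} if run entirely on Opus")
```

On those assumptions, the routine 90% costs about $3 on worker models versus roughly $240 on Opus. That gap is the poster's argument for routing only the hard 10% to the expensive model.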

Addy Osmani: fine for side projects, tech debt for shared codebases

@addyosmani

A Google Chrome engineering leader draws the line between AI code in throwaway projects and AI code in a team codebase nobody fully understands.

“For side-projects that may be fine, but for anything team/shared I feel it's a recipe for tech debt down the line.”

'Cognitive surrender': shipping code you can't explain

@nakadai_mon

A name for the failure pattern underneath the tech-debt worry: developers who ship AI output and, when asked, can't say what it does.

“I've seen people with cognitive surrender and when called on it, they have no idea what said text or code means.”

r/programming ends its April LLM ban, writes a standing policy instead

r/programming

The 6.9-million-member subreddit ran a one-month trial ban on large language model content, gathered community feedback, and then let the ban lapse in favor of standing rules, a read on where developer culture is landing.

“After temporarily banning LLM-related content over April... we've decided to bring about an end of the temporary, I-can't-believe-it's-still-April ban on AI-related posts.”

Where the agent UI goes next

The missing primitive for agent swarms is coordination

Lou Bichard, Ona (AI Engineer talk)

Bichard argues the runtime, orchestration, and triggers for background agents are effectively solved; what's missing is a shared coordination layer, so teams keep abusing GitHub and Linear to stand in for one. He also makes the case for VM isolation over containers.

“Out of these primitives, I do believe we've effectively solved the runtime... the triggers are solved, but the thing that's missing for me is coordination.”
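
The quote does not spell out an API for the missing layer, so the following is only an illustration of what one coordination primitive could look like. It is a shared task board where agents claim work under a time-limited lease, so two agents never hold the same task at once and a crashed agent's task becomes claimable again. `TaskBoard`, `claim`, `complete`, and the lease mechanism are hypothetical names and choices, not Ona's product or Bichard's proposal. A real system would back this with a database or distributed lock rather than an in-process `threading.Lock`.

```python
import threading
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    task_id: str
    description: str
    owner: Optional[str] = None   # agent currently holding the lease
    lease_expires: float = 0.0    # time.monotonic() deadline for that lease
    done: bool = False


class TaskBoard:
    """Hypothetical shared board: agents claim tasks under a time-limited lease."""

    def __init__(self, lease_seconds: float = 300.0):
        self._lock = threading.Lock()
        self._tasks: dict[str, Task] = {}
        self._lease_seconds = lease_seconds

    def add(self, task_id: str, description: str) -> None:
        with self._lock:
            self._tasks[task_id] = Task(task_id, description)

    def claim(self, agent: str) -> Optional[str]:
        """Atomically give the caller one unowned or abandoned task, or None."""
        now = time.monotonic()
        with self._lock:
            for task in self._tasks.values():
                if task.done:
                    continue
                # Free if never claimed, or if the previous holder's lease ran out.
                if task.owner is None or task.lease_expires <= now:
                    task.owner = agent
                    task.lease_expires = now + self._lease_seconds
                    return task.task_id
        return None

    def complete(self, task_id: str, agent: str) -> bool:
        """Mark a task done, but only if the caller still holds a live lease."""
        now = time.monotonic()
        with self._lock:
            task = self._tasks[task_id]
            if task.owner != agent or task.lease_expires <= now:
                return False
            task.done = True
            return True
```

The lease is the design choice doing the work: if an agent dies mid-task, its claim expires and another agent picks the task up, something an issue assignee in GitHub or Linear does not do on its own.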

Your agent is an infinite canvas, and chat is the CLI phase

Rachel Lee Nabors, Arise (AI Engineer talk)

Nabors demos a working comic reader rendered inside Claude as a sandboxed MCP app, and argues the bare chat window is to agentic software what the command line was to software: a developer phase, not the end state.

“It's been said that chat is the lowest common denominator of the user experience. That it is to the future of agentic experiences what the CLI was to software.”

Companion episode

The capability got here first: Mythos, a real prompt injection, and the structure that hasn't caught up

2026-05-24 · 00:21:32

Two threads from the week meet today. We've followed Anthropic toward its first profitable quarter; now the same company says its strongest security model is too dangerous to release. And yesterday's 'I don't write code anymore' optimism runs straight into Addy Osmani's tech-debt warning and the first real attack aimed at agents reading untrusted repos. Defense and offense are scaling on the same loop.