Project Glasswing, launched in April and powered by the still-unreleased Claude Mythos Preview, helped partners find more than 10,000 vulnerabilities in a single month. Cloudflare surfaced 2,000 bugs, 400 of them high or critical; Mozilla found and fixed 271 Firefox vulnerabilities, about ten times what an older Claude model managed. Anthropic says it won't ship the model: "it hasn't released…
◆ Braid Daily · 2026-05-24
Mythos found 10,000 bugs and won't be released
A frontier bug-finder turned up more than 10,000 vulnerabilities in a month, and Anthropic says it's too dangerous to ship.
The lead
Defense ships as the first real attacks arrive
Mythos surfaces inside Claude Code and a new 'Claude Security' surface
@testingcatalog
New app strings reference "Access to the Claude Mythos model in Claude Code and Claude Security," which would put the same restricted bug-finder inside the tools developers already use. TestingCatalog expects it to stay gated rather than open to the public.
“Mythos 1, "claude-mythos-1-preview", is being prepared for a release on Claude Code and Claude Security.”
The first real-world prompt injection, delivered through GitHub Issues
@rez0__ (Joseph Thacker)
Joseph Thacker, who tests AI products for OpenAI and Google, says this is the first genuine in-the-wild prompt injection he has seen, not a lab demo, and it arrives through the exact channel researchers have been probing.
“This is the first REAL one I've seen. And it's using GitHub issues which is the main way/channel that gets tested these days.”
How it works: a fake security finding that exfiltrates over DNS
@inf0stache
The malicious issue uses security-finding language to get an agent to run a local scan.js that reads the home directory for secrets and leaks them over DNS, a channel that slips past egress rules built only to block outbound HTTP. The diagram traces the chain.
“The issue uses fake security finding language to push a local scan.js, which searches the home directory for secrets, base64 encodes the results, and reports over DNS.”
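For defenders, here is a minimal sketch of the kind of check that could catch this channel: flagging DNS query names whose subdomain labels look like encoded payloads. This is not the attacker's scan.js and not any vendor's detection rule. The function names and both thresholds are hypothetical, picked for illustration, and treating the last two labels as the registered domain is a simplification.

```python
import math
from collections import Counter

# Hypothetical thresholds, chosen for illustration only.
MAX_LABEL_LEN = 40       # labels longer than this are flagged outright
MIN_CHECK_LEN = 16       # shorter labels are never entropy-checked
MIN_ENTROPY_BITS = 3.5   # bits per character that count as "random-looking"


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_exfil(qname: str) -> bool:
    """Flag a DNS query name whose subdomain labels resemble encoded data."""
    labels = qname.rstrip(".").split(".")
    # Simplification: assume the last two labels are the registered domain.
    for label in labels[:-2]:
        if len(label) > MAX_LABEL_LEN:
            return True
        if len(label) >= MIN_CHECK_LEN and shannon_entropy(label) >= MIN_ENTROPY_BITS:
            return True
    return False


if __name__ == "__main__":
    for name in ("www.example.com", "a1b2c3d4e5f6g7h8i9j0.collector.example"):
        print(name, looks_like_exfil(name))
```

A heuristic like this would also flag some legitimate traffic, such as long hashed hostnames, so it is better suited to alerting than to blocking. The underlying point stands either way: egress rules have to cover the resolver path, not just outbound HTTP.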
Coding's boring 90%, and the bill for the rest
A $3, 2-million-token mass refactor, with one funny deadlock
r/singularity
A poster ran an autonomous refactor across a 120-file FastAPI service. It took about 400 steps and 2 million tokens for $3, on cheap worker models that cost around 80 times less than Opus. The routine work landed; the hard part didn't.
“it confidently introduced a deadlock into my async event handler which was genuinely funny, so the hard 10% still needs opus.”
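The arithmetic behind those figures, as a rough sketch. It uses only the numbers quoted in the post and assumes the "80 times less" ratio applies to a single blended token price. The post does not break out input versus output tokens, so the Opus figure is an implied estimate, not a quoted price. The function name is hypothetical.

```python
def run_economics(
    total_cost_usd: float = 3.0,      # "$3" from the post
    total_tokens: int = 2_000_000,    # "2 million tokens"
    steps: int = 400,                 # "about 400 steps"
    cost_ratio_vs_opus: float = 80.0, # "around 80 times less than Opus"
) -> dict:
    """Derive per-unit costs from the figures quoted in the post."""
    return {
        # Blended price actually paid per million tokens.
        "usd_per_million_tokens": total_cost_usd / (total_tokens / 1_000_000),
        # Average context consumed per agent step.
        "tokens_per_step": total_tokens / steps,
        # Assumption: the same run on Opus would cost ratio times as much.
        "implied_opus_cost_usd": total_cost_usd * cost_ratio_vs_opus,
    }


if __name__ == "__main__":
    for key, value in run_economics().items():
        print(f"{key}: {value}")
```

On those assumptions the run works out to $1.50 per million tokens and roughly 5,000 tokens per step, against an implied $240 for the same job on Opus. That gap is why the poster saves the expensive model for the hard 10%.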
Addy Osmani: fine for side projects, tech debt for shared codebases
@addyosmani
A Google Chrome engineering leader draws the line between AI code in throwaway projects and AI code in a team codebase nobody fully understands.
“For side-projects that may be fine, but for anything team/shared I feel it's a recipe for tech debt down the line.”
'Cognitive surrender': shipping code you can't explain
@nakadai_mon
A name for the failure pattern underneath the tech-debt worry: developers who ship AI output and, when asked, can't say what it does.
“I've seen people with cognitive surrender and when called on it, they have no idea what said text or code means.”
r/programming ends its April LLM ban, writes a standing policy instead
r/programming
The 6.9-million-member subreddit ran a one-month trial ban on large language model content, took community feedback, and replaced it with rules rather than a blanket ban, a read on where developer culture is landing.
“After temporarily banning LLM-related content over April... we've decided to bring about an end of the temporary, I-can't-believe-it's-still-April ban on AI-related posts.”
Where the agent UI goes next
The missing primitive for agent swarms is coordination
Lou Bichard, Ona (AI Engineer talk)
Bichard argues the runtime, orchestration, and triggers for background agents are effectively solved; what's missing is a shared coordination layer, so teams keep abusing GitHub and Linear to stand in for one. He also makes the case for VM isolation over containers.
“Out of these primitives, I do believe we've effectively solved the runtime... the triggers are solved, but the thing that's missing for me is coordination.”
Your agent is an infinite canvas, and chat is the CLI phase
Rachel Lee Nabors, Arise (AI Engineer talk)
Nabors demos a working comic reader rendered inside Claude as a sandboxed MCP app, and argues the bare chat window is to agentic software what the command line was to software: a developer phase, not the end state.
“It's been said that chat is the lowest common denominator of the user experience. That it is to the future of agentic experiences what the CLI was to software.”
Companion episode
The capability got here first: Mythos, a real prompt injection, and the structure that hasn't caught up
Two threads from the week meet today. We've followed Anthropic toward its first profitable quarter; now the same company says its strongest security model is too dangerous to release. And yesterday's 'I don't write code anymore' optimism runs straight into Addy Osmani's tech-debt warning and the first real attack aimed at agents reading untrusted repos. Defense and offense are scaling on the same loop.