Following last week's thread on FUZZ-E and Mythos finding live CVEs: Kabir, who won DownUnderCTF with Blitzkrieg and competed top-10 with TheHackersCrew, says the scene has collapsed from the inside. Claude Opus 4.5 one-shots medium challenges across the CTFd API; GPT-5.5 Pro one-shots Insane heap pwn on HackTheBox. Plaid CTF is gone, and most historic top-10 teams aren't fielding full rosters.
Braid Daily · 2026-05-16
The CTF scene reports its own death
An Australian top-10 player on why Opus 4.5 and GPT-5.5 Pro broke open competitive CTF, plus Intercom and PFF on what agent-first actually…
The lead
Agents on the org chart
Intercom hit 2x PR throughput in a year
Brian Scanlan, Intercom
Scanlan, a senior principal at Intercom, reports the company hit its 2x PR-throughput target inside a year, with 17.6% of pull requests now auto-approved while SOC 2, ISO 27001, and HIPAA stay intact. Their bet: standardize on Claude Code, mandate adoption in job descriptions, and stream every session to S3 so a Stanford group can measure whether code quality drifts.
“Everything that you can do, the agent must be able to do. And that can feel weird as well, when you're first connecting it into production systems.”
PFF: two engineers, ten engineers, same product
Mike Spitz, PFF
PFF's CTO ran a January-to-March case study pitting two Claude Code engineers against a team of ten on the same product. The pair shipped about five deploys a day against the team's one deploy every five days, with CSAT moving from 7-7.5 to 8.6, and PFF dropped sprint planning, standups, refinement, and retros. Spitz's framing: optimize the agent's loop, not the engineer's output.
“Instead of figuring out how we can help engineers output more, how do we help make the agents quicker?”
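As a rough check on the deploy cadence quoted above, the ratio can be worked out directly. This is a back-of-the-envelope sketch: the variable names are illustrative, and the per-engineer figure assumes deploys are attributed evenly across headcount, which the case study summary does not claim.

```python
from fractions import Fraction

# Figures as quoted in the summary above; variable names are illustrative.
pair_deploys_per_day = Fraction(5)       # "about five deploys a day"
team_deploys_per_day = Fraction(1, 5)    # "one every five days"

# Raw cadence ratio between the two groups.
ratio = pair_deploys_per_day / team_deploys_per_day

# Per-engineer ratio, ASSUMING deploys split evenly across headcount
# (2 vs 10). This split is an assumption, not a reported figure.
per_engineer_ratio = (pair_deploys_per_day / 2) / (team_deploys_per_day / 10)

print(f"cadence ratio: {ratio}x, per engineer: {per_engineer_ratio}x")
```

On those inputs the pair deploys roughly 25 times as often as the team, or about 125 times as often per engineer under the even-split assumption.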
Supabase: MCP plus a skill closes a security flag the agent silently dropped
Pedro Rodrigues, Supabase
Rodrigues ran a Braintrust eval on Claude Sonnet 4.6 building a SQL view over a row-level-security table. With only the Supabase MCP server, the agent omitted security_invoker=true and silently exposed cross-tenant rows; pairing MCP with the official Supabase skill produced the safe version, and MCP-plus-skill won across all six scenarios on Claude 4.6 and Codex GPT-5.4.
“If you don't explicitly pass security_invoker equals true, the view will bypass the RLS. The agent with the skill got this implemented correctly and safely; the one that only had access to the MCP tool did not.”
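To make the flag concrete, here is a minimal sketch of the kind of DDL at issue. The `build_view_sql` helper and the `tenant_orders` and `orders` names are hypothetical and do not come from Rodrigues's eval; the `WITH (security_invoker = true)` clause is the PostgreSQL 15+ view option the quote refers to.

```python
def build_view_sql(view: str, table: str, security_invoker: bool = True) -> str:
    """Return CREATE VIEW DDL for a view over an RLS-protected table.

    Illustration only: identifiers are interpolated without quoting,
    so this must not be fed untrusted names.
    """
    # Without this option, PostgreSQL checks access as the view owner,
    # which can sidestep the row-level-security policies on the table.
    option = " WITH (security_invoker = true)" if security_invoker else ""
    return f"CREATE VIEW {view}{option} AS SELECT * FROM {table};"


# Hypothetical names: the safe variant and the one that omits the flag.
safe_sql = build_view_sql("tenant_orders", "orders")
unsafe_sql = build_view_sql("tenant_orders", "orders", security_invoker=False)
print(safe_sql)
print(unsafe_sql)
```

The two statements differ only in the option clause, which is why the omission is easy to miss in review: both create a working view, and only one of them evaluates row-level security as the querying user.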
Counter-patterns
Julia Evans moves off Tailwind after eight years
jvns.ca
Evans walks several sites back to semantic HTML and vanilla CSS, keeping only the parts of Tailwind that taught her structure — the reset, a color-variable palette, an xs/sm/md/lg type scale. CSS grid with auto-fit replaces most breakpoints; esbuild is the only build step; the 2.8MB tailwind.min.css files are gone.
“It turns out Tailwind taught me a lot. Every CSS code base has a bunch of different things going on, and Tailwind has systems for some of these. Maybe I can imitate the systems I like.”
Armin Ronacher: bash as the only tool
@mitsuhiko
Sentry's co-founder is running an agent harness with bash as its only tool and telling it to make file edits via the patch binary — no apply_patch, no jq, no specialized file-edit tool. Useful counter-data to the heavy skills-and-MCP investment Intercom and Supabase argue for.
“Unironically, bash-only is quite fun: pi -nes --tools bash --append-system-prompt "Use the patch binary to make edits."”
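For readers unfamiliar with the format, the sketch below produces a unified diff of the kind the patch binary consumes. It uses Python's standard `difflib` purely for illustration; the file name and contents are made up, and nothing here reflects how Ronacher's harness or its agent actually writes diffs.

```python
import difflib

# Hypothetical before/after contents of a file called greet.py.
old = ["def greet():\n", "    print('helo')\n"]
new = ["def greet():\n", "    print('hello')\n"]

# unified_diff yields header lines, a hunk marker, then context (' '),
# removed ('-'), and added ('+') lines.
diff = "".join(
    difflib.unified_diff(old, new, fromfile="a/greet.py", tofile="b/greet.py")
)
print(diff)
```

Saved to a file such as `fix.diff`, output in this shape is what `patch -p1 < fix.diff` applies from the directory containing `greet.py`; the `-p1` strips the leading `a/` and `b/` path components.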
Where the models still drift
Claude is telling users to go to sleep mid-session
Fortune
Hundreds of users have reported Claude telling them to go to bed — sometimes at 8:30 in the morning, sometimes three times in a row. Anthropic's Sam McAllister calls it a character tic and says they hope to fix it. Outside theories range from 25,000 books on human sleep needs in the corpus to context-window wrap-up behavior.
“Sam McAllister at Anthropic called it a "bit of a character tic." We're aware of this and hoping to fix it in future models.”
Yishan: that's not what happened with Napster in 2000
@yishan
The former Reddit CEO was inside the file-sharing era and watched a frontier model invent its history with full confidence. His comparison: 'the Egyptians used dinosaurs to haul the big stones used to make pyramids.' A reminder that the models clearing CTFs can still confabulate basic recent history.
“That's not at all what happened with Napster in 2000. I was there. That is some kind of imaginary scenario on the level of "the Egyptians used dinosaurs to haul the big stones used to make pyramids." What the hell kind of content were you trained on??”
Is the training-data-mean ceiling a 2024 take?
@Kirsten3531
A short post — 716 likes, 130 replies in hours — asking whether the line that large language models can't exceed the mean of their training data still holds in coding and math. Reads alongside today's CTF and PR-throughput stories on one side and Yishan's Napster post on the other.
“My cousin is betting his career on "LLMs can never be more than the average of their training data" but I feel like that's a very 2024 take. Aren't we already past this in like, coding and math?”
On the edge
Sparky: a fully offline suitcase robot on a Jetson Orin NX
r/LocalLLaMA
Gemma 4 E4B at Q4_K_M with q8_0 key-value cache runs on a Jetson Orin NX SUPER 16GB with a 12K context, no wifi, no bluetooth, no cellular. SenseVoiceSmall handles speech-to-text; Piper drives a 43Hz mouth-synced face. Cached time-to-first-token sits near 200ms with 14-15 tokens per second sustained, and 30-plus sensors fold into the prompt every turn.
“No WiFi, no Bluetooth, no cellular. Gemma 4 E4B at Q4_K_M, q8_0 KV cache, flash attention, ~200ms cached TTFT, 14–15 tok/s sustained. 30+ sensors fold into the prompt as natural language every turn.”
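To put the quoted throughput in conversational terms, a simple latency estimate helps. This is a sketch under stated assumptions: `reply_latency_s` is a hypothetical helper, it uses 14.5 tokens per second as the midpoint of the quoted range, and it ignores speech-to-text and text-to-speech time, which the post does not quantify.

```python
def reply_latency_s(tokens: int, ttft_s: float = 0.2, tok_per_s: float = 14.5) -> float:
    """Estimate seconds to generate a reply of `tokens` tokens.

    Simplified model: cached time-to-first-token plus steady-state
    decoding. Defaults come from the figures quoted above; the
    midpoint rate of 14.5 tok/s is an assumption.
    """
    return ttft_s + tokens / tok_per_s


# Rough generation times for a short, medium, and long spoken reply.
for n in (29, 60, 150):
    print(f"{n:>3} tokens: about {reply_latency_s(n):.1f} s")
```

Under this model a one-sentence reply of around 30 tokens finishes in a couple of seconds, while a 150-token answer takes over ten, which suggests why streaming the output to the speech stage matters more than raw time-to-first-token on hardware like this.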
Companion episode
CTFs, Scrum, and Claude's Bedtime
Two named engineering orgs reporting agent-first numbers on the same day a top-10 CTF player writes the scene's obituary is a coincidence worth holding together. The same Claude Opus 4.5 and GPT-5.5 Pro that emptied CTFd queues are the ones inside Intercom's 17.6% auto-approval rate. Tomorrow we'll see whether the second-order effects — Scrum dropped, rosters shrinking, security flags silently omitted — keep landing in the same direction.