BRAID

Dispatch 025 · 2026-05-13 GSV With One Run Overnight

Hackbots, Magento, and Three Lines of Logic

00:29:55 / 13 sources

“A frontier model multiplies whatever the operator already had — the target list, the harness, the disclosure pipeline.”

— Lenar Kess, today's narration

An overnight hackbot run lands a real CVE in Adobe Magento. Codex starts driving local Mac apps in parallel, with per-app permissions and a separate cursor. Cloudflare publishes one of the prettiest debugging writeups of the year — a nine-year-old kernel patch, a 14ms oscillation, three lines of fix. Plus Nous Research's removable attention wrapper, GPT-5.5's first ProgramBench solve, Vercel's argument that giving an agent a file system changes how it behaves, a 26-million-parameter tool-calling model, Isomorphic's two-billion-dollar Series B, and a Purdue senior who put Rust on his graduation cap.

Chapters

  1. 00:00:04 A hackbot earned a CVE overnight
  2. 00:04:26 Codex starts driving your Mac apps
  3. 00:08:15 Cloudflare's CUBIC death spiral
  4. 00:13:06 Nous Research's Lighthouse Attention
  5. 00:15:48 GPT-5.5 cracks a ProgramBench task
  6. 00:19:35 Vercel: give the agent a computer
  7. 00:22:24 Needle: tool-calling in 26 million parameters
  8. 00:24:35 Isomorphic Labs raises 2.1 billion
  9. 00:26:47 A Rust grad cap, and a tweet about working out
  10. 00:29:44 Tomorrow's Angular zero-days and Lighthouse at scale

Sources

13 cited
  1. Hackbot FUZZ-E earns CVE for LFI in Adobe Magento, plus 2 Angular zero-days

    X @rez0__ — Joseph Thacker — AppSec researcher focused on AI red-teaming, prolific writer on agentic security

    x.com/rez0__/status/2054539643912077351
    Cited text
    With 1 run overnight, it found vulns in wildly hardened projects.
    Context
    An overnight autonomous run pulling a CVE from Magento shifts the attacker-defender asymmetry for every open-source maintainer and enterprise codebase. The bar moved from "AI finds toy bugs" to "AI finds real bugs in projects that have already survived years of human review."
    Key points
    • FUZZ-E, an autonomous hackbot from @AutonomousCyber, earned a CVE for a local file inclusion vulnerability in Adobe Magento
    • Same overnight run also surfaced 2 zero-days in Angular (disclosure pending)
    • Targets were chosen by Thacker — these aren't toy codebases, they are mature enterprise projects
    • Thacker says the same hackbot is 'even better with gpt5.5 now' — model upgrades translate directly to bug yield per run
    Provenance
    Tweet · Primary source
  2. Adobe Magento security advisory APSB26-49

    Article Adobe Product Security

    helpx.adobe.com/security/products/magento/a…
    Context
    Anchors the hackbot claim to an actual vendor advisory rather than a tweet. The CVE went through Adobe's standard process — the autonomy was at discovery, not at disclosure.
    Key points
    • Primary-source disclosure for the Magento LFI vulnerability found by FUZZ-E
    • Confirms a vendor patch shipped through the normal Adobe advisory channel
    Provenance
    Article · Supporting source
  3. Computer use in Codex

    Video OpenAI — Roma and Ari Weinstein — Ari Weinstein joined OpenAI from Sky/Shortcuts; Roma hosts the Codex product video

    www.youtube.com/watch?v=D_FCYsshMI4
    Cited text
    Every computer use implementation I've ever seen takes over your entire computer. So you can't use your computer while the agent is using your apps.
    Context
    Local-app computer use is the bridge from chat-only coding agents to agents that can actually drive the rest of your workflow. The per-app permission model is the part to watch — it sets a precedent for how desktop agents will be sandboxed.
    Key points
    • Codex can now drive any local Mac application via the accessibility framework plus screenshots
    • The agent uses a separate cursor and does not steal focus — you keep working while it operates other apps in the background
    • Permissions are per-app and require explicit allow on first use; the agent has no access to apps you haven't authorized
    • Pairing with the Spark model removes the multimodal screenshot dependency for many tasks because the accessibility tree provides textual structure — Weinstein claims this runs faster than a human can operate the app
    • Roadmap target: 2-5-10x human speed on common tasks; Windows support 'very soon'
    Provenance
    Video · Supporting source
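    The per-app permission model in the key points above can be pictured as a small allowlist that prompts once per app. The following is a hypothetical Python illustration, not OpenAI's implementation: the `AppPermissions` class, the `ask_user` callback, and the choice to remember only grants (so a declined app is asked about again) are all assumptions.

```python
from typing import Callable, List, Set


class AppPermissions:
    """Hypothetical per-app allowlist: prompt on first use, remember grants."""

    def __init__(self, ask_user: Callable[[str], bool]) -> None:
        self._ask_user = ask_user      # assumed prompt hook: app name -> allow?
        self._granted: Set[str] = set()

    def is_allowed(self, app: str) -> bool:
        if app in self._granted:
            return True                # already authorized, no new prompt
        if self._ask_user(app):
            self._granted.add(app)     # explicit allow on first use
            return True
        return False                   # unauthorized apps stay off limits


prompts: List[str] = []


def ask_user(app: str) -> bool:
    prompts.append(app)                # record each prompt for the demo
    return app == "Notes"              # the demo user allows only Notes


perms = AppPermissions(ask_user)
first = perms.is_allowed("Notes")      # prompts, then grants
second = perms.is_allowed("Notes")     # served from the remembered grant
denied = perms.is_allowed("Mail")      # prompts, user declines
```

    In this sketch the agent would call `is_allowed` before touching any app, so an app the user never approved is unreachable by construction.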
  4. When "idle" isn't idle: how a Linux kernel optimization became a QUIC bug

    Article Esteban Carisimo and Antonio Vicente — Cloudflare

    blog.cloudflare.com/quic-death-spiral-fix
    Cited text
    The effort to find the bug was massive, but the fix itself was basically one line of logic.
    Context
    This is what senior-engineer craft looks like when no agent is going to help: a bug at the minimum-cwnd corner of CUBIC's state space, invisible to throughput dashboards, only surfaced because someone wrote a test that deliberately drove the controller into recovery. The fix is small; the investigation is the work.
    Key points
    • A 2017 Linux kernel patch for CUBIC's idle-handling was ported into quiche in 2020 — but the follow-up kernel fix from a week later was not
    • Test was failing 61% of the time: after a heavy loss burst, the congestion window pinned at the two-packet floor and never grew back
    • Diagnosis took weeks of qlog instrumentation; the smoking gun was a 14ms oscillation period matching the test's RTT
    • Root cause: bytes_in_flight hitting zero between ACK and next send was misread as an idle period, advancing the recovery boundary into the future on every cycle
    • Fix: measure idle from last_ack_time, not last_sent_time — a three-line patch contributed back to cloudflare/quiche
    Provenance
    Article · Supporting source
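    The root cause and fix listed in the key points above can be reproduced in a toy model. This is a minimal Python sketch, not quiche's code: `ToyController`, `simulate`, the one-packet-per-round-trip simplification, and the 1 ms send delay are assumptions; only the 14 ms round trip and the last-sent versus last-ACK distinction come from the summary above.

```python
from dataclasses import dataclass

RTT_MS = 14.0        # the oscillation period reported in the writeup
SEND_DELAY_MS = 1.0  # assumed gap between receiving an ACK and sending again


@dataclass
class ToyController:
    recovery_start_ms: float = 0.0  # packets sent at or before this are "in recovery"
    last_sent_ms: float = 0.0
    last_ack_ms: float = 0.0

    def on_send(self, now_ms: float, bytes_in_flight: int, idle_from_ack: bool) -> float:
        """Apply the idle adjustment, then return the packet's send time."""
        if bytes_in_flight == 0:
            reference = self.last_ack_ms if idle_from_ack else self.last_sent_ms
            idle_ms = now_ms - reference
            if idle_ms > 0:
                # An "idle" sender has its recovery boundary shifted forward.
                self.recovery_start_ms += idle_ms
        self.last_sent_ms = now_ms
        return now_ms


def simulate(idle_from_ack: bool, cycles: int = 5) -> int:
    """Count the round trips whose packet escaped recovery (window may grow)."""
    ctl = ToyController()
    now = 0.0
    escaped = 0
    for _ in range(cycles):
        now += RTT_MS              # ACK arrives, bytes_in_flight drops to zero
        ctl.last_ack_ms = now
        now += SEND_DELAY_MS       # the next packet goes out shortly after
        sent_at = ctl.on_send(now, bytes_in_flight=0, idle_from_ack=idle_from_ack)
        if sent_at > ctl.recovery_start_ms:
            escaped += 1
    return escaped


stuck = simulate(idle_from_ack=False)   # idle measured from the last send
healthy = simulate(idle_from_ack=True)  # idle measured from the last ACK
```

    Measured from the last send, every round trip looks like a full idle period, so the boundary keeps pace with the clock and no packet ever escapes recovery. Measured from the last ACK, the apparent idle time shrinks to the send delay and the window is free to grow again.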
  5. Lighthouse Attention from Nous Research

    X @omarsar0 — Elvis Saravia — runs DAIR.AI, regular curator of frontier research

    x.com/omarsar0/status/2054224130103554359
    Cited text
    What if you could speed up long-context pretraining with a subquadratic wrapper that you remove before deployment?
    Context
    Most efficient-attention proposals lock you into a different architecture at deploy time. Lighthouse's train-with, deploy-without framing means labs can experiment with subquadratic pretraining without paying for it on every inference call.
    Key points
    • Nous Research paper on Lighthouse Attention
    • Wraps ordinary scaled-dot-product attention with a hierarchical, gradient-free selection layer for long-context pretraining
    • Selection layer is removable at deployment time, leaving plain vanilla attention behind
    • Trades off training-time compute for inference-time fidelity — a different shape from the usual efficient-attention work
    Provenance
    Tweet · Primary source
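    The train-with, deploy-without idea in the key points above can be illustrated with a toy wrapper around ordinary attention. This Python sketch is not the paper's algorithm: it uses a single selection level rather than a hierarchy, scores blocks by their mean key (an assumption), ignores causal masking, and every function name here is hypothetical.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention(q: Vector, keys: List[Vector], values: List[Vector],
              allowed: Optional[List[int]] = None) -> Vector:
    """Scaled dot-product attention for one query over the allowed positions."""
    idx = list(range(len(keys))) if allowed is None else sorted(allowed)
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, keys[i]) * scale for i in idx]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]      # numerically stable softmax
    total = sum(exps)
    return [sum(e / total * values[i][d] for e, i in zip(exps, idx))
            for d in range(len(values[0]))]


def select_positions(q: Vector, keys: List[Vector],
                     block_size: int, top_k: int) -> List[int]:
    """Gradient-free selection: keep the top_k blocks ranked by mean-key score."""
    starts = range(0, len(keys), block_size)
    blocks = [list(range(s, min(s + block_size, len(keys)))) for s in starts]

    def block_score(block: List[int]) -> float:
        mean_key = [sum(keys[i][d] for i in block) / len(block)
                    for d in range(len(q))]
        return dot(q, mean_key)

    chosen = sorted(blocks, key=block_score, reverse=True)[:top_k]
    return [i for block in chosen for i in block]


def wrapped_attention(q: Vector, keys: List[Vector], values: List[Vector],
                      use_selection: bool, block_size: int = 2,
                      top_k: int = 1) -> Vector:
    """Attention with an optional selection step in front of it."""
    allowed = select_positions(q, keys, block_size, top_k) if use_selection else None
    return attention(q, keys, values, allowed)


q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 0.8]]
values = [[1.0], [2.0], [3.0], [4.0]]
plain = attention(q, keys, values)
removed = wrapped_attention(q, keys, values, use_selection=False)
sparse = wrapped_attention(q, keys, values, use_selection=True)
```

    With `use_selection=False` the wrapper is a pass-through, which is the sense in which the selection step is removable: the attention function underneath never changes.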
  6. Sebastian Raschka on Lighthouse Attention

    X @rasbt — Sebastian Raschka — author of "Build a Large Language Model From Scratch", prolific commentator on pretraining mechanics

    x.com/rasbt/status/2054543968344412621
    Cited text
    It is a relatively low-commitment attention modification. One can use it during most of training, switch back to vanilla attention near the end, and recover roughly the same modeling performance as if full attention had [been used throughout].
    Context
    The most pragmatic read on a research paper from one of the most pragmatic readers in the field. Marking what is and isn't bet-the-run risk for a pretraining team is exactly the kind of context a builder needs.
    Key points
    • Raschka highlights the 'low-commitment' property — you don't have to bet your whole training run on the modification
    • Switch back to vanilla attention near the end and recover full-attention-equivalent quality
    • This is the unusual property; most efficient-attention work degrades final modeling quality unless you keep it on at inference
    Provenance
    Tweet · Primary source
  7. GPT 5.5 high Solves First Instance — ProgramBench

    Article Kilian Lieret and John (ProgramBench team) — Maintained by researchers behind the SWE-bench family at Princeton and FAIR

    programbench.com/blog/gpt-5-5-first-solve
    Context
    First full solve on a benchmark designed to be unforgiving is a real milestone, but the comparison runs are the more useful read: a model that probes the CLI surface carefully in 34 calls is doing meaningfully different work than one that grinds through 178 calls and still ships case-sensitive string compares.
    Key points
    • GPT-5.5 with high reasoning becomes the first model ever to fully solve a ProgramBench task (the cmatrix instance)
    • Score is still 0.05% overall — one solved task out of thousands — but 26 tasks now pass 95%+ of unit tests
    • GPT-5.5 (high) and GPT-5.5 (xhigh) both fully solved cmatrix, in 34 and 40 API calls respectively; Claude Opus 4.7 (xhigh) used 178 calls and still racked up 19 failures
    • Opus's 19 failures decomposed into two trivial bugs: strcmp instead of strcasecmp for color names (11 failures) and wrong exit code on invalid color (8 failures)
    • GPT-5.5 default (medium reasoning) 'barely beat Claude Sonnet 4.6' — the wins come from running it at higher reasoning levels
    Provenance
    Article · Supporting source
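    The two bug classes in the key points above are easy to show side by side. This is an illustrative Python analogue of the C calls, not code from the benchmark: the `COLORS` list and all function names are assumptions, and because the summary does not say which exit code the tests expect, the sketch takes it as a parameter.

```python
# Illustrative color list; an assumption, not taken from the benchmark.
COLORS = ("green", "red", "blue", "white", "yellow", "cyan", "magenta", "black")


def strict_match(name: str) -> bool:
    """strcmp-style comparison: 'GREEN' does not equal 'green'."""
    return name in COLORS


def relaxed_match(name: str) -> bool:
    """strcasecmp-style comparison: lowercase the input before comparing."""
    return name.lower() in COLORS


def exit_code_for(color_arg: str, invalid_exit_code: int) -> int:
    """Return 0 for a valid color, else the caller-supplied failure code."""
    return 0 if relaxed_match(color_arg) else invalid_exit_code


rejected_by_strict = not strict_match("GREEN")   # the case-sensitivity bug
accepted_by_relaxed = relaxed_match("GREEN")     # the intended behavior
```

    Both bugs are one-line fixes, which is the point of the comparison: extra calls did not help when the CLI surface was never probed with mixed-case or invalid input.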
  8. Give Your Agent a Computer — Nico Albanese, Vercel

    Video Nico Albanese — Vercel — Developer-relations lead on the Vercel AI SDK

    www.youtube.com/watch?v=wflNENRSUb4
    Cited text
    The big assumption that goes through every single AI SDK API decision is that we want the agent definition to be the source of truth that everything else inherits from.
    Context
    The pattern matters more than the SDK: agent quality is increasingly bottlenecked on what the agent can write to and read from between turns, not on the model. Vercel framing the file system as a behavioral primitive is consistent with what every serious agent shop has been quietly converging on.
    Key points
    • AI SDK 6 ships an object-oriented agent primitive (ToolLoopAgent) plus end-to-end type inference from agent definition to UI message rendering
    • Vercel's internal claim: giving an agent a file system didn't just add storage — it changed the agent's behavior; it followed through on long tasks and built on its own prior work
    • Workshop walks through a tool-loop agent with bash, memories.md, and persistent named sandboxes
    • Three primitives Albanese says define agent building in 2026: an agent runtime, a tool set, and a computer or sandbox for state and code execution
    Provenance
    Video · Supporting source
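    The file-system claim in the key points above can be made concrete with a toy tool loop whose only state is a `memories.md` file in a sandbox directory. This Python sketch is not the AI SDK API and omits the bash tool: `run_agent`, `stub_model`, and the action dictionary shape are hypothetical, and the stub stands in for a real model call.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

Action = Dict[str, str]


def run_agent(task: str, sandbox: Path, model: Callable[[str, str], Action],
              max_steps: int = 5) -> Tuple[Optional[str], int]:
    """Loop: read memory, ask the model for an action, apply it to the sandbox."""
    memory_file = sandbox / "memories.md"
    for step in range(1, max_steps + 1):
        memory = memory_file.read_text() if memory_file.exists() else ""
        action = model(task, memory)
        if action["tool"] == "remember":
            with memory_file.open("a") as handle:
                handle.write(f"- {action['note']}\n")   # persists across runs
        elif action["tool"] == "finish":
            return action["answer"], step
    return None, max_steps


def stub_model(task: str, memory: str) -> Action:
    """Stand-in for a model: take a note first, then build on it."""
    if "outline written" not in memory:
        return {"tool": "remember", "note": "outline written"}
    return {"tool": "finish", "answer": f"{task}: continued from notes"}


with tempfile.TemporaryDirectory() as tmp:
    box = Path(tmp)
    first_answer, first_steps = run_agent("report", box, stub_model)
    second_answer, second_steps = run_agent("report", box, stub_model)
    notes = (box / "memories.md").read_text()
```

    The second run in the same sandbox finishes in one step because the note from the first run is still on disk, a small version of an agent building on its own prior work.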
  9. Needle: a 26M-parameter tool-calling model distilled from Gemini

    Article Henrie_the_dreamer (LocalLLaMA)

    www.reddit.com/r/LocalLLaMA/comments/1tb9b0…
    Context
    If tool calling really decomposes to retrieve-and-assemble, then the right architecture for the tool-loop step is small and specialized, not the same monolith that does reasoning. Cheap, fast, on-device tool routing is what makes always-on local agents plausible.
    Key points
    • 26M parameter open-weights function-calling model targeted at consumer phones
    • Reported throughput: 6000 tokens per second prefill, 1200 tokens per second decode on consumer devices
    • Distilled from Gemini tool-calling traces; argues that tool calling is essentially retrieval-and-assembly rather than reasoning
    • Aimed at agentic workflows on budget Android hardware where a 7B-plus model is impractical
    Provenance
    Article · Supporting source
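    To put the reported throughput in the key points above into per-call terms, here is a back-of-the-envelope estimate in Python. The two rates are the reported figures; the 600-token prompt and 60-token tool call are hypothetical sizes chosen only for illustration, and the function name is an assumption.

```python
PREFILL_TOKENS_PER_S = 6000   # reported prefill throughput
DECODE_TOKENS_PER_S = 1200    # reported decode throughput


def tool_call_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate latency as prefill time plus decode time, ignoring overhead."""
    prefill_s = prompt_tokens / PREFILL_TOKENS_PER_S
    decode_s = output_tokens / DECODE_TOKENS_PER_S
    return prefill_s + decode_s


# Hypothetical request: 600 tokens of tool schemas and query, 60-token call.
latency = tool_call_latency_s(prompt_tokens=600, output_tokens=60)
```

    Under those assumed sizes the estimate comes to roughly 0.15 seconds per tool call, which is the scale at which always-on local routing starts to look plausible.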
  10. Isomorphic Labs announces $2.1B Series B

    Article Isomorphic Labs — Alphabet-spinout AI drug discovery company led by Demis Hassabis

    www.isomorphiclabs.com/articles/isomorphic-…
    Context
    A two-billion-dollar Series B in AI drug design — with a UK Sovereign AI Fund check in it — is a signal about where AI capital is willing to leave the chat-app gold rush and bet on bench science. Worth knowing about even if you'll never write a peptide.
    Key points
    • Series B totals $2.1 billion, led by Thrive Capital
    • New investors include MGX, Temasek, CapitalG, and the UK Sovereign AI Fund; existing backers Alphabet and GV participated
    • Capital is earmarked for the IsoDDE drug design engine and the company's pipeline of drug candidates
    • First major outside funding round for the AlphaFold-descended company since its 2021 spinout
    Provenance
    Article · Supporting source
  11. My graduation cap runs Rust

    Article Eric Park

    ericswpark.com/blog/2026/2026-05-12-my-grad…
    Context
    A small, fun reminder that the joy of building things is still the joy of building things. No agent needed, no benchmark involved — just a Rust toolchain, a reed switch, and a graduating senior writing better prose about it than most product blogs.
    Key points
    • A Purdue senior built a Digispark ATtiny85 + 48 WS2812B LED cap, triggered by a reed switch and magnet as the tassel moves
    • Wrote firmware in Rust, forking avr-hal and ws2812-avr to support the ATtiny85 and a 16MHz clock
    • Coding took 2 hours; hardware took 3+ hours — confirming the universal truth that hardware is always the slow part
    • Doesn't plan to wear it: 'It looks like what kids would think of as a gaming PC and what boomers would think of as a seizure.'
    Provenance
    Article · Supporting source
  12. Thacker on the GitHub bug-bounty backlash

    X @rez0__

    x.com/rez0__/status/2054529796672041113
    Cited text
    It's painfully obvious that it's the second in this case. And also, they probably haven't paid the majority of any valid bugs submitted.
    Context
    The flip side of the FUZZ-E story. Same agentic-coding wave; opposite quality distribution. Bounty programs are being drowned in low-effort AI reports while a tuned hackbot pulls real CVEs from Magento overnight — the gap is in the operator, not the model.
    Key points
    • Discourse around 325 GitHub bug-bounty submissions splits two ways: 'GitHub pays too little' versus 'most aren't valid'
    • Thacker's read: most aren't valid — the AI-generated submissions are noise — and the few real ones still aren't getting paid
    Provenance
    Tweet · Primary source
  13. Nick Cammarata on identity in an AI-doubling world

    X @nickcammarata — Nick Cammarata — ex-OpenAI researcher

    x.com/nickcammarata/status/2054492840668123…
    Cited text
    Everyone is handling AI doubling every fourteen hours surprisingly well. They mostly just dropped it and work out more.
    Context
    A one-liner that captures the emotional weather underneath the news cycle. Not a claim about model capabilities; a claim about the people who used to derive worth from the part of the job a model now does in seconds.
    Key points
    • Comic exaggeration on capability-doubling — 'every fourteen hours' is deliberately absurd
    • Pointing at a real felt-sense among researchers: identity built on being smart is unstable when capability moves faster than you can
    Provenance
    Tweet · Primary source