BRAID

Dispatch 031 · 2026-05-19 GSV Mostly-Work Is The Job

Mostly-work, malicious npm, and one engineer replacing a law firm

00:28:26 · 11 sources

“The Cabinet Office was about to spend one and a half million pounds on an outside law firm. One engineer with a Gemini API key did the same job in two and a half weeks.”

— Lenar Kess, today's narration

A six-month overview from Simon Willison anchors the day: coding agents crossed from often-work to mostly-work in November, and laptop-class models started outrunning expectations. Then comes a fresh npm supply-chain attack, 637 malicious versions in 22 minutes, that for the first time specifically hijacks Claude Code and Codex agent hooks for persistence. Also in this dispatch: Prime Intellect's bid to automate RL environments, a Number 10 talk on replacing a one-and-a-half-million-pound law-firm contract with one embedded engineer, an unconfirmed report that an editor-layer company trained on xAI's Colossus 2, Ethan Mollick on insourcing, the full GenMedia pipeline running for about a dollar a book, Daniel Griesser's pi-config skill repo, and two obituaries that hit the Unix world in the same week.

Chapters

  1. 00:00:04 PyCon, pelicans, and the November inflection
  2. 00:03:36 Mini Shai-Hulud and your agent hooks
  3. 00:08:47 Prime Intellect tries to automate the environment
  4. 00:11:30 Rewiring the state from Number 10
  5. 00:15:30 Cursor's Composer on Colossus
  6. 00:17:33 Mollick on insourcing
  7. 00:19:53 A dollar a book
  8. 00:23:34 Daniel Griesser's pi-config
  9. 00:25:50 Peter Neumann and Peter Salus

Sources

11 cited
  1. The last six months in LLMs in five minutes

    Article Simon Willison — Co-creator of Django; longtime LLM observer whose annotated benchmarks (the pelican-on-a-bicycle test) have become industry-standard vibe checks

    Coding agents went from often-work to mostly-work, crossing a quality barrier where you could use them as a daily-driver to get real work done.

    simonwillison.net/2026/May/19/5-minute-llms →
    Excerpt
    Annotated slides from a five-minute lightning talk at PyCon US 2026 summarizing six months of LLM developments — framed around a November 2025 inflection point where coding agents went from often-work to mostly-work.
    Context
    Best single overview of where coding agents and open-weights models actually landed over the last six months, from a writer who has been right about the trend lines more often than not. Sets up most of today's episode.
    Key points
    • November 2025 was an inflection point: the 'best' model changed hands five times in a single month between Sonnet 4.5, GPT-5.1, Gemini 3, GPT-5.1 Codex Max, and Opus 4.5
    • The actual story is that Anthropic's and OpenAI's investment in RLVR with their agent harnesses (Codex, Claude Code) finally produced agents reliable enough to use as daily drivers
    • OpenClaw (first commit at end of November as 'Warelay') became a generic category — 'Claws' — by February, with Mac Minis selling out as the local hardware to run them
    • Open-weights side: Google's Gemma 4 is the most capable US open-weight model Simon has seen; Chinese lab Z.ai's GLM-5.1 is a 1.5TB monster that produces a credibly animated North Virginia Opossum on an e-scooter
    • Two big themes for the six months: coding agents got reliable, and laptop-class models started wildly outperforming expectations
    Provenance
    Article · Supporting source
  2. Mini Shai-Hulud Strikes Again: 317 npm Packages Compromised

    Article SafeDep Team — Open-source supply-chain security firm; published the same-day forensic writeup of the SAP compromise three weeks earlier

    The payload hijacks Claude Code and Codex by injecting SessionStart hooks that re-execute the malware on every AI session, both locally and via commits to accessible GitHub repositories.

    safedep.io/mini-shai-hulud-strikes-again-31… →
    Excerpt
    The npm account 'atool' was compromised on May 19, 2026. The attacker published 637 malicious versions across 317 packages in a 22-minute automated burst, affecting more than 15 million monthly downloads — and the payload specifically targets Claude Code, Codex, and VS Code agent hooks for persistence.
    Context
    Second Shai-Hulud variant in three weeks. The novel piece is agent-hook persistence: anyone running Claude Code or Codex with skill/hook auto-loading should now treat their .claude directory as attack surface, whether they realized it or not, because a hook planted there re-runs on every new session.
    Key points
    • 637 malicious versions across 317 packages including size-sensor (4.2M/mo), echarts-for-react (3.8M/mo), timeago.js (1.15M/mo), and most @antv scoped packages — pushed in 22 minutes via automated burst
    • Semver ranges (e.g. ^3.0.6) auto-resolve to the malicious versions because npm picks the highest matching version regardless of where the 'latest' tag points
    • New persistence vector: injects SessionStart hooks into .claude/settings.json and Codex hooks so the payload re-runs on every AI session; also drops .vscode/tasks.json with runOn:folderOpen
    • Persistent C2 via systemd/LaunchAgent ('kitty-monitor') polls GitHub commit search API hourly for RSA-signed commands tagged with keyword 'firedalazer'
    • Dual exfiltration: stolen credentials committed as Git blobs to Dune-themed public repos (sardaukar-sandworm-742) plus RSA+AES POSTs disguised as OpenTelemetry traces to t.m-kosche.com
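For a quick local check of the two persistence files named above, here is a minimal Python sketch, not a full detector. It assumes Claude Code settings files keep hook commands under hooks.SessionStart, as a list of groups that each hold a hooks list of entries with a command field, and that VS Code marks auto-run tasks with runOptions.runOn set to "folderOpen". It does not look for Codex hook files (the summary doesn't say where they land), the kitty-monitor service, or the exfiltration traffic. The function name audit_repo is hypothetical.

```python
import json
import sys
from pathlib import Path

# Files the writeup says the payload writes into a checkout. Codex hook files
# are not covered: their location isn't given in the summary above.
CLAUDE_SETTINGS = (".claude/settings.json", ".claude/settings.local.json")
VSCODE_TASKS = ".vscode/tasks.json"


def _load_json(path: Path):
    """Parse a JSON file; return None if it is missing or not strict JSON."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None


def _as_list(value) -> list:
    return value if isinstance(value, list) else []


def audit_repo(root: str) -> list[str]:
    """Report SessionStart hook commands and folderOpen tasks under one checkout.

    Hits are leads for manual review, not proof of compromise: legitimate
    projects also use SessionStart hooks and auto-run tasks.
    """
    base = Path(root)
    findings: list[str] = []

    for rel in CLAUDE_SETTINGS:
        settings = _load_json(base / rel)
        if not isinstance(settings, dict):
            continue
        # Assumed shape: {"hooks": {"SessionStart": [{"hooks": [{"command": "..."}]}]}}
        hooks = settings.get("hooks")
        groups = _as_list(hooks.get("SessionStart")) if isinstance(hooks, dict) else []
        for group in groups:
            if not isinstance(group, dict):
                continue
            for hook in _as_list(group.get("hooks")):
                if isinstance(hook, dict) and "command" in hook:
                    findings.append(f"{rel}: SessionStart -> {hook['command']}")

    tasks_path = base / VSCODE_TASKS
    tasks = _load_json(tasks_path)
    if isinstance(tasks, dict):
        for task in _as_list(tasks.get("tasks")):
            options = task.get("runOptions") if isinstance(task, dict) else None
            if isinstance(options, dict) and options.get("runOn") == "folderOpen":
                findings.append(f"{VSCODE_TASKS}: folderOpen -> {task.get('label', '?')}")
    elif tasks_path.exists():
        # tasks.json is often JSONC (with comments), which json.loads rejects.
        findings.append(f"{VSCODE_TASKS}: could not read as a JSON object, inspect by hand")

    return findings


if __name__ == "__main__":
    for finding in audit_repo(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(finding)
```

Run it from a repository root (or pass a path) and read every hit: a SessionStart command you didn't write, or a folderOpen task nobody on the team added, is the pattern the writeup describes.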
    Provenance
    Article · Supporting source
  3. General-Agent: self-evolving synthetic RL environment

    X @PrimeIntellect — Distributed-training lab known for INTELLECT-1 and decentralized RL infrastructure

    The next step toward automating AI is automating RL environments.

    x.com/PrimeIntellect/status/205656987716780… →
    Context
    If synthetic environment generation actually produces durable agent skills, the post-training advantage Anthropic and OpenAI built largely through environment engineering becomes less scarce. If it doesn't, this is another reminder that environments are the bottleneck.
    Key points
    • Pitches a 'fully synthetic environment whose task corpus self-evolves and grows harder over time'
    • Initial scope: 4,504 tool-use tasks across 1,040 domains with 8,159 unique tools
    • Targets the bottleneck of post-training: environment coverage, which currently requires large hand-built engineering teams inside Anthropic and OpenAI
    • If real, lets smaller labs match the agentic capability of frontier labs without the post-training env headcount
    • Falsifiable test: does a fine-tune on General-Agent beat a comparable hand-curated env on a known agent benchmark?
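The post doesn't say how the corpus evolves. As a purely hypothetical illustration of what "self-evolves and grows harder over time" could mean, here is a toy curriculum loop in Python: each task carries a difficulty level, and once the agent's pass rate on the hardest level clears a threshold, the pool spawns harder variants of those tasks. The Task type, the make_harder rule, and the 0.8 threshold are invented for this sketch; none of it is General-Agent's actual mechanism.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Task:
    domain: str
    difficulty: int  # hypothetical integer scale, 1 = easiest


def make_harder(task: Task) -> Task:
    """Placeholder for a generator that writes a harder variant of a task."""
    return Task(task.domain, task.difficulty + 1)


def evolve(pool: list[Task], solve: Callable[[Task], bool], threshold: float = 0.8) -> list[Task]:
    """One curriculum step over a non-empty pool.

    Measures the agent on the hardest tier only; if it clears `threshold`,
    adds a harder variant of every task in that tier. Otherwise the pool is
    returned unchanged so training stays at the current level.
    """
    frontier = max(t.difficulty for t in pool)
    hardest = [t for t in pool if t.difficulty == frontier]
    pass_rate = sum(solve(t) for t in hardest) / len(hardest)
    if pass_rate >= threshold:
        return pool + [make_harder(t) for t in hardest]
    return pool


if __name__ == "__main__":
    def capable_to_3(task: Task) -> bool:
        return task.difficulty <= 3  # toy agent that stalls above level 3

    pool = [Task("calendar", 1), Task("billing", 1)]
    for _ in range(5):
        pool = evolve(pool, capable_to_3)
    print(max(t.difficulty for t in pool))  # 4: growth stops one level past the agent's ceiling
```

The design choice in this toy is that difficulty only rises where the agent has already succeeded, so the corpus never runs far ahead of the learner. Whether anything like this produces durable skills is what the falsifiable test above would have to show.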
    Engagement
    850 likes · 118 retweets · 37 replies
    Provenance
    Tweet · Primary source
  4. Rewiring the State · Eoin Mulgrew, 10 Downing Street

    Video Eoin Mulgrew (10 Downing Street, Number 10 Data Science) — Runs cross-government transformation and the fellowship program inside Number 10's data science team

    We do want to recruit missionaries, not mercenaries — a paycheck is not going to get you out of bed when stuff gets hard.

    www.youtube.com/watch?v=ObNKGf9YR0g →
    Context
    The most concrete account I've seen of what AI engineering inside a government actually looks like. The forward-deployed-engineer pattern from the labs is being applied to the state, with named projects and named departments rather than slideware.
    Key points
    • The Cabinet Office was about to spend £1.5M on an outside law firm to analyze the UK statute book; one embedded engineer built the tool in roughly two weeks and the in-house legal team now runs it on demand
    • Insurgent unit model: a small team at the centre of government with political cover to ship in 2-3 weeks instead of the typical year-plus discovery phase
    • Recruit exclusively outsiders (YC founders, big-tech, research labs) at ~0.7-0.8% acceptance; fellows then spin up parallel teams (Incubator for AI in DSIT, Just AI in Ministry of Justice)
    • Concrete public-service scale: 7.25M people on NHS waiting lists, 350K court cases stuck in backlog, only 1-in-5 planning applications decided on time; Tony Blair Institute estimates £40B/year in achievable productivity gains
    • Extract: a DeepMind/Gemini-built tool that digitizes planning applications including handwritten maps, launched by the PM at London Tech Week, rolling out to every English local authority
    Provenance
    Video · Supporting source
  5. Compose 2.5 by Cursor was trained at xAI's Colossus 2

    X @techdevnotes — Anonymous developer news account with a track record of accurate but unsourced infrastructure leaks

    Compose 2.5 by Cursor was trained at xAI's Colossus 2.

    x.com/techdevnotes/status/20565439400529102… →
    Context
    If it holds up, the application layer is starting to spend real money on training, and the lab most strategically positioned to sell that compute is xAI. That second observation may matter more than this month's model releases.
    Key points
    • Unconfirmed claim: Cursor's Composer 2.5 coding model was trained on xAI's Colossus 2 supercluster
    • If accurate, this is the first publicly visible case of an editor-layer company renting frontier-scale training compute from a frontier lab
    • Reframes the application-layer-doesn't-train assumption: category-leading products can buy frontier compute when they want it
    • Notable strategic shape for xAI — selling cluster time to companies that compete with Anthropic and OpenAI's first-party coding agents
    • Awaiting confirmation from Cursor or xAI; currently a single tweet from a generally accurate account
    Engagement
    473 likes · 27 retweets
    Provenance
    Tweet · Primary source
  6. Insourcing via hiring as an AI-driven trend

    X @emollick (Ethan Mollick) — Wharton professor; author of Co-Intelligence; one of the most widely read independent voices on enterprise AI adoption

    Why pay so many outside vendors (legal, marketing, software vendors) when you can hire in-house and harness AI productivity gains yourself?

    x.com/emollick/status/2056578946813100173 →
    Context
    If the pattern holds, the high-margin professional services category gets squeezed first. For builders inside non-tech orgs, this rewrites what is reasonable to ship internally rather than outsource.
    Key points
    • Reports talking to executives at large companies already insourcing functions they used to buy from outside vendors
    • The economic logic of vendors (specialization + amortized tooling) erodes when in-house teams get dramatic agent-driven productivity gains
    • Categories most exposed: outside legal counsel for routine work, mid-tier marketing agencies, integration-heavy software vendors
    • Second-order question: what happens to consulting and contracting when a Fortune 500 company can hire two senior engineers running Claude Code and match an Accenture engagement on throughput?
    • For senior engineers inside non-tech companies, the case for shipping in-house just changed from ideology to budget
    Engagement
    301 likes · 26 retweets
    Provenance
    Tweet · Primary source
  7. Let's go Bananas with GenMedia · Guillaume Vernade, Google DeepMind

    Video Guillaume Vernade (Google DeepMind) — Developer advocate at Google DeepMind; ex-Stadia producer; works on GenMedia API surface

    Each model has its own set of API. It doesn't make any sense — a developer should just swap the model name and it works.

    www.youtube.com/watch?v=BcWFc3H7Khg →
    Context
    Concrete picture of where the gen-media stack actually sits in May 2026: a multi-modal pipeline runs for a dollar, server-side context caching is about to be table stakes, and the per-call dial for cost-vs-latency is becoming a first-class API feature.
    Key points
    • Live demo: Gemini reads Wind in the Willows in one shot, generates structured prompts; Nano Banana 2 renders portraits and scenes; Veo animates them; Lyria composes per-chapter music; Gemini TTS reads dialogue
    • Veo 3.1 Light shipped last week at 5¢ per second of video — 40¢ for an 8-second clip — making the whole pipeline run for roughly $1 per book
    • Interactions API in preview: server-side context caching with a ~2-day session TTL, which eliminates re-uploading large documents on every call; Vernade expects this to become the default at next I/O
    • Lyria Real Time is a prediction-based (not diffusion) continuous-generation music model with ~2-second prompt-update latency, DJ-mixable from an agent loop
    • Three pricing tiers on the Gemini API now: normal, flex at 50% off with 2-5 minute delays, priority at 2x cost for fast-track — useful for separating agent-batch from user-facing latency-sensitive calls
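The per-clip arithmetic is easy to check. This Python sketch uses only figures quoted in the talk: $0.05 per second of Veo 3.1 Light video, and flex at 0.5x or priority at 2x the normal rate. The summary doesn't say whether the tiers apply to Veo or what the text, image, music, and TTS steps cost, so applying the multipliers to video, and the helper names, are assumptions.

```python
from decimal import Decimal

# Quoted in the talk (May 2026); a snapshot, not a live price list.
VEO_LIGHT_USD_PER_SECOND = Decimal("0.05")

# Tier multipliers on the normal rate, per the Gemini API tiers above.
# Applying them to video generation is an assumption of this sketch.
TIER_MULTIPLIER = {
    "normal": Decimal("1"),
    "flex": Decimal("0.5"),    # 50% off, 2-5 minute delays
    "priority": Decimal("2"),  # 2x cost, fast-tracked
}


def clip_cost(seconds: int, tier: str = "normal") -> Decimal:
    """USD cost of one clip of `seconds` length at the given tier."""
    return VEO_LIGHT_USD_PER_SECOND * seconds * TIER_MULTIPLIER[tier]


def clips_per_budget(budget_usd: Decimal, seconds: int = 8, tier: str = "normal") -> int:
    """Whole clips of `seconds` length that fit in a budget."""
    return int(budget_usd // clip_cost(seconds, tier))


if __name__ == "__main__":
    print(clip_cost(8))                       # 0.40, the 40-cent figure
    print(clip_cost(8, "flex"))               # 0.200
    print(clips_per_budget(Decimal("1.00")))  # 2
```

Decimal keeps the cent amounts exact. At the normal rate a dollar covers two full eight-second clips (five if the flex discount applied to video), which suggests the roughly $1-per-book pipeline animates only a handful of scenes.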
    Provenance
    Video · Supporting source
  8. pi-config: Plan skill, handoff skill, and subagents for Claude Code

    X @DanielGri (Daniel Griesser) — Engineer at Sentry; open-source practitioner of agent-skill patterns

    Don't copy, get inspired.

    x.com/DanielGri/status/2056676488183689620 →
    Context
    Cleanest worked example to date of the skill-file-as-primary-artifact approach to agent orchestration. If you've been on the fence about writing your own SKILL.md, this is the reference to read first.
    Key points
    • Open repo (HazAT/pi-config) with three production-tested skill artifacts: a Plan SKILL.md for larger tasks, a handoff SKILL.md for context-window-exhaustion handoff between agents, and a directory of named subagents
    • Pattern is converging across experienced operators: skills as markdown files survive model upgrades better than tool-use frameworks that have to be rebuilt with each protocol change
    • Integration with Aaron Francis's Soloterm: a Pi extension that lets the subagents drive Soloterm sessions, treating the terminal multiplexer as the shared substrate
    • Mario Zechner reposted, signalling cross-builder adoption of the skill-file pattern
    • Daniel's framing: don't copy 1:1, treat it as inspiration for your own Plan/handoff conventions
    Provenance
    Tweet · Primary source
  9. Peter Neumann has died

    Article Dan Cross via TUHS, forwarding Tom Van Vleck and Robert Watson — Notice forwarded from the Multicians mailing list to The Unix Heritage Society

    Peter Neumann passed away in his sleep on Sunday night at the hospital in Santa Clara from complications arising from his fall and subsequent surgery a few weeks ago.

    www.tuhs.org/pipermail/tuhs/2026-May/033748… →
    Context
    Neumann spent decades cataloguing supply-chain and software-risk failures, starting long before npm packages existed. The Mini Shai-Hulud writeup earlier in today's episode is exactly the kind of incident his RISKS column would have annotated and contextualized.
    Key points
    • SRI Computer Science Lab senior researcher for half a century
    • Founded and moderated RISKS Digest (comp.risks) from 1985, shaping how a generation of engineers thought about computer-related risk
    • Worked on Multics; author of Computer-Related Risks (1995), still standard reading for understanding why software systems harm people
    • Accomplished pianist and French-horn player — was listening to classical music with his daughter Hellie at the hospital before he died
    • SRI is expected to host a memorial service in Menlo Park within the next month
    Provenance
    Article · Supporting source
  10. RIP Peter Salus

    Article Dan Cross via TUHS — Notice on The Unix Heritage Society mailing list

    Peter Salus passed away on May 15. His 'Quarter Century of Unix' is required reading for any serious student of Unix history.

    www.tuhs.org/pipermail/tuhs/2026-May/033750… →
    Context
    Salus is the reason we have a clear written history of how Unix actually evolved. Losing him and Neumann in the same week is a real shift in the institutional memory of the field.
    Key points
    • Died May 15, 2026
    • Author of A Quarter Century of Unix (1994), the canonical history of Unix from Bell Labs through the workstation era
    • Documented the Unix wars, BSD lineage, the AT&T lawsuit, and the rise of Linux while most of the principals were still alive to interview
    • Two of the Unix-era historians and stewards gone in the same week (Neumann May 17, Salus May 15)
    Provenance
    Article · Supporting source
  11. Pope Leo XIV's first encyclical Magnifica humanitas to be published May 25

    Article Vatican News — Official Vatican communications office

    First major encyclical from this pope and reportedly centered on AI. We made an on-air commitment to cover it Monday — confirms timing for next week.

    www.vaticannews.va/en/pope/news/2026-05/pop… →
    Key points
    • Pope Leo XIV's first encyclical, Magnifica humanitas, is confirmed for publication on May 25, 2026
    • Expected to address AI's relationship to human dignity, work, and creative life; reportedly drafted with theological consultation and input from technical advisors
    • We promised on yesterday's episode to cover the release on the day; this confirms the date
    Provenance
    Article · Supporting source