◆ Dispatch 024 · 2026-05-12 GSV Self-Spreading Configuration
When Your Editor Becomes the Worm
“The attackers didn't just compromise packages — they turned the developer's own editor into a re-infection surface.”
— Lenar Kess, today's narration
A coordinated npm and PyPI campaign turned Claude Code and VS Code config files into a self-spreading vector. Mira Murati's lab put out its first model, and it is an argument against the hands-off-keyboard doctrine. And matklad explains why rust-analyzer's build system is really an org chart. Plus a small rant about cursors, and two builds from the LocalLLaMA subreddit that keep pushing the local-frontier line by hand.
- The mini-shai-hulud campaign — 170 packages, 404 versions, and a self-spreading IDE payload
- Thinking Machines ships TML-Interaction-Small and picks a fight with the autonomous-agent doctrine
- matklad on architecture, incentives, and the build system as a social filter
- Don't hijack the mouse pointer — and what it tells us about cheap effects
- A 1-trillion-parameter Kimi K2.5 running on Intel Optane DIMMs at 4 tokens a second
- Unsloth releases Qwen3.6 with the multi-token prediction layer preserved
Chapters
- 00:00:04 A worm that travels by configuration file
- 00:05:56 Mira Murati's lab picks a fight with autonomy
- 00:11:11 matklad on architecture as a social filter
- 00:15:59 Please stop hijacking my mouse pointer
- 00:17:47 Optane DIMMs, Qwen3.6 MTP, and the local frontier
- 00:21:17 Closing
Sources
6 cited
1
Mass Supply Chain Attack Hits TanStack, Mistral AI npm and PyPI Packages
Article SafeDep Team — Supply-chain security vendor that flagged the burst from its malware detection pipeline
The attacker designed this as a self-spreading vector that targets Claude Code and VS Code users.
safedep.io/mass-npm-supply-chain-attack-tan…
- Context
- First mass campaign that explicitly weaponizes AI-coding agent configuration files for propagation, and the first single campaign to span npm and PyPI together. The IDE-poisoning loop turns every cloned repo on a developer's machine into a re-infection surface.
- Key points
- 404 malicious versions published across 170+ npm packages and 2 PyPI packages in a five-hour window on May 11
- Entire TanStack router scope, all three Mistral AI SDKs, the @uipath scope (65 packages), OpenSearch's official npm client, and Guardrails AI were all compromised
- Two trigger styles: Mistral packages used a preinstall hook downloading Bun then a payload; TanStack used an optionalDependency pointing at a malicious commit in the real tanstack/router GitHub repo
- Payload drops .claude/settings.json, .claude/setup.mjs, .vscode/tasks.json into victim repos and pushes them via GitHub's createCommitOnBranch GraphQL mutation — Claude Code and VS Code become re-infection vectors
- Exfiltration runs over the Session onion-routed messenger network with no fixed C2 domain to seize
- Credential providers target AWS IAM (via 169.254.169.254), HashiCorp Vault on localhost:8200, ghp_/gho_/ghs_/npm_ tokens, and GitHub Actions OIDC
- Provenance
- Article · Supporting source
2
Interaction Models: A Scalable Approach to Human-AI Collaboration
Article Thinking Machines Lab — Mira Murati's lab — the ex-OpenAI CTO's research org, posting its first major model
humans increasingly get pushed out not because the work doesn't need them, but because the interface has no room for them.
thinkingmachines.ai/blog/interaction-models
- Context
- A direct challenge to the hands-off-keyboard agent doctrine the bigger labs have leaned into. Murati's team argues the bottleneck is interface bandwidth, not model intelligence, and is shipping architecture to back that up.
- Key points
- First research-preview model, TML-Interaction-Small, trained from scratch around a multi-stream micro-turn design rather than turn-based prompts
- 200ms input chunks interleave with 200ms output chunks, so the model can listen, speak, watch, and call tools concurrently
- Pairs a real-time interaction model with an asynchronous background model for sustained reasoning and tool use; the interaction model stays present while the heavy work runs
- Encoder-free early fusion: audio as dMel, video as 40x40 hMLP patches, all co-trained with the transformer
- Beats GPT-realtime-2.0 (minimal) and Gemini-3.1-flash-live on FD-bench v1.5 and Audio MultiChallenge at 0.40s turn-taking latency
- Introduces new benchmarks TimeSpeak and CueSpeak for proactive speech (e.g. 'remind me to breathe every 4 seconds', 'correct mispronunciations as you hear them')
- Provenance
- Article · Supporting source
3
Learning Software Architecture
Article matklad (Aleksey Kladov) — Original author of rust-analyzer; previously on IntelliJ Rust; now at TigerBeetle
we talk about programming like it is about writing code, but the code ends up being less important than the architecture, and the architecture ends up being less important than social issues.
matklad.github.io/2026/05/12/software-archi…
- Context
- A working architect explaining how he actually picks technical constraints to shape the social system around a codebase. Useful counterweight to architecture-as-diagram thinking.
- Key points
- Architecture is downstream of incentives, which are downstream of org structure — Conway's law as the real curriculum
- rust-analyzer's no-rustc-build, no-C-deps, seconds-long test suite was a deliberate move to attract deep contributors
- Features were sandboxed with catch_unwind and required to work on immutable snapshots, so weekend contributors could ship without poisoning the core
- 'Speedrun the four stages of grief to acceptance' on incentives you can't change
- Recommends Bernhardt's Boundaries talk, Pieter Hintjens / ZeroMQ writing, Jamii's 'Reflections on a decade of coding', Ted Kaminski's notes
- Provenance
- Article · Supporting source
4
Don't Hijack My Mouse Pointer
Article Rukshan — Independent web developer
before vibe-coding it was difficult and time consuming to implement such fancy effects, but now it takes a single prompt and a few hundred tokens, and you have a fancy effect instead of the nice pointer.
ruky.me/dont-hijack-my-pointer
- Context
- A small, sharp example of the second-order effect of agentic coding: when a junky pattern becomes one prompt away, taste and restraint become more load-bearing than skill.
- Key points
- Vibe-coding has dropped the cost of custom cursor effects from hours to a single prompt
- Sites are replacing the OS pointer with bespoke designs that hurt click accuracy and discoverability
- The pointer's slight tilt and pointed tip exist for accumulated UI reasons — they were not arbitrary
- Lower implementation cost shifts the constraint from 'can I build it' to 'should I build it'
- Provenance
- Article · Supporting source
5
Computer build using Intel Optane Persistent Memory — running 1T-parameter Kimi K2.5 at ~4 tokens/sec
Article APFrisco — r/LocalLLaMA builder
www.reddit.com/r/LocalLLaMA/comments/1taeg8…
- Context
- A reminder that the local-LLM frontier is being pushed forward by people scavenging discontinued enterprise hardware, not buying new GPUs.
- Key points
- Runs a 1 trillion parameter Kimi K2.5 locally at about 4 tokens/second
- Uses discontinued Intel Optane Persistent Memory DIMMs — a tier between DRAM and SSD
- Surfaces a workaround for the memory-capacity wall blocking local frontier-scale models
- 633 upvotes and 107 comments on r/LocalLLaMA — community read is that this is a real path, not a meme
- Provenance
- Article · Supporting source
6
MTP on Unsloth — Qwen3.6 27B and 35B-A3B GGUFs with preserved multi-token prediction layers
Article Altruistic_Heat_9531
www.reddit.com/r/LocalLLaMA/comments/1ta4rv…
- Context
- Multi-token prediction baked into the released checkpoint, instead of bolted on with a separate draft model, is the cleanest version of speculative decoding for local rigs.
- Key points
- Unsloth published Qwen3.6 27B and 35B-A3B quantizations with the multi-token prediction (MTP) head preserved
- Requires the in-flight llama.cpp MTP PR to actually use the speedup
- 420 upvotes, 141 comments — local-LLM users want speculative-style speedups from the base model itself
- Provenance
- Article · Supporting source
A worm that travels by configuration file
00:00:04 Sometime late Monday night, somebody pushed 404 malicious package versions across 170 npm packages and 2 PyPI packages in a five-hour window. They hit every router in the TanStack scope — React, Vue, Solid, the devtools, and the SSR plugins. They hit all three Mistral AI SDKs, the npm ones and the PyPI one.
00:00:26 They hit the entire UiPath scope, sixty-five packages in a single burst. They hit OpenSearch's official npm client, which gets one-point-three million weekly downloads. And they hit Guardrails AI on PyPI, which is an irony I'll leave alone. StepSecurity and Socket are tracking this as mini-shai-hulud.
00:00:48 SafeDep published the deepest writeup so far, and that's the one I'd point you at — it's linked in the show notes. The headline number is bad enough. The mechanism is what I want to spend a minute on. Two different trigger styles. The Mistral packages stripped out the legitimate build scripts and replaced them with a single preinstall hook that ran a setup script through Bun.
00:01:15 The hook downloads a Bun runtime binary straight from the official oven-sh GitHub releases, then runs the payload through Bun. Smart, in a grim way: Bun is on developers' allowlists, GitHub releases are on developers' allowlists, and the binary signature is the real Bun binary.
00:01:36 The TanStack packages were stealthier. The package.json scripts block was left alone. Instead, the attackers added one entry to the optional-dependencies list, pointing at a TanStack setup package — at a real, specific commit in the real TanStack router GitHub repository.
00:01:55 That commit, which GitHub has since deleted, contained a package.json with a prepare script that ran the same payload via Bun and then forced an error exit, so the loader output wouldn't show up in the install log. Which means the attackers had write access to the TanStack GitHub repository on top of the npm publish tokens.
00:02:19 That's two compromises, not one, in the same campaign. The payload itself is a two-megabyte obfuscated JavaScript file. It carries a credential-stealing framework with dedicated providers for AWS IAM, which it reaches through the metadata endpoint at 169.254.169.254, and for HashiCorp Vault on localhost port 8200.
00:02:43 It also scoops up GitHub personal-access tokens, OAuth and server tokens, npm publish tokens, and GitHub Actions OIDC tokens. Standard cloud-credential bingo card. The exfiltration is the interesting part. There's no fixed command-and-control domain to seize. The payload embeds a full Session messenger client — Session is the onion-routed encrypted messenger built on the Oxen network — and routes stolen credentials through Session's swarm of service nodes.
00:03:17 The only static piece of infrastructure is the seed-node bootstrap, with TLS certs pinned in the binary. You can't take this down by yanking a domain. And then there's the propagation step, which should keep you up tonight. The payload reads the GitHub-repository environment variable, enumerates the branches of the victim's repo through the GitHub GraphQL API, filters out anything that looks like main or master or release, and then uses a GraphQL mutation to push files into feature branches.
00:03:53 It pushes a settings file, a setup script, and a router-runtime file into the dot-claude directory, plus a tasks file into the dot-vscode directory. It also pushes a complete copy of the obfuscated payload into dot-claude, using a Bun runtime trick to read its own running binary off disk.
00:04:14 Which means the next developer who clones that branch, opens it in VS Code, or runs Claude Code against it, executes the payload locally with no further user action. Cloning is enough. Opening is enough. The attackers didn't just compromise packages — they turned the developer's own editor into a re-infection surface.
00:04:37 We've spent a year telling each other that the agent-in-the-editor pattern moves the trust boundary. This campaign is the first I've seen that picks up that boundary and walks through it on purpose. One more wrinkle. The Mistral preinstall hook references the payload under one filename, but the actual file in the Mistral tarball was renamed.
00:05:02 So the Mistral hook crashes at runtime. The attackers didn't even bother to rename the constant before publishing to a different vendor's scope. That tells you a lot about the operational sloppiness, and a little about the speed: this was a single template, fired at every scope they had a token for, in under five hours.
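Show-notes sketch: the two trigger styles described above (an install-time lifecycle script, and an optional dependency that resolves from a git commit instead of the registry) can be screened for with a few lines of Python. This is a heuristic under stated assumptions, not SafeDep's detection logic. The function name `flag_manifest` and the sample manifests in the usage note are hypothetical, and a clean result does not prove a package is safe.

```python
import json

# npm lifecycle scripts that can run automatically during an install.
INSTALL_HOOKS = ("preinstall", "install", "postinstall", "prepare")
DEP_SECTIONS = ("dependencies", "optionalDependencies", "devDependencies")


def flag_manifest(manifest_text: str) -> list[str]:
    """Return human-readable warnings for one package.json document."""
    manifest = json.loads(manifest_text)
    warnings = []

    # Trigger style 1: a script that runs at install time.
    scripts = manifest.get("scripts", {})
    for hook in INSTALL_HOOKS:
        if hook in scripts:
            warnings.append(f"install-time script '{hook}': {scripts[hook]}")

    # Trigger style 2: a dependency fetched from git rather than the registry.
    for section in DEP_SECTIONS:
        for name, spec in manifest.get(section, {}).items():
            if not isinstance(spec, str):
                continue
            if spec.startswith(("git+", "git://", "github:")) or "github.com" in spec:
                warnings.append(f"{section} entry '{name}' resolves from git: {spec}")
    return warnings
```

Usage, with made-up manifests: `flag_manifest('{"scripts": {"preinstall": "bun run setup.mjs"}}')` returns one warning, and a manifest whose only dependency is a plain semver range returns an empty list. Plenty of legitimate packages use install hooks, so treat hits as prompts for review.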
00:05:26 If you've pulled anything from those scopes since Monday evening Pacific time, do three things. Rotate your AWS, npm, and GitHub tokens. Audit the dot-claude and dot-vscode directories in every repo you touched. And check your feature branches for commits with co-authored-by trailers you don't recognize.
00:05:48 PyPI has quarantined the Mistral and Guardrails packages entirely. npm has been rolling cleanups all morning.
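Show-notes sketch for the audit step: a small Python helper that walks a directory of checkouts and lists files matching the dropped paths named in the SafeDep key points (.claude/settings.json, .claude/setup.mjs, .vscode/tasks.json). The list is illustrative rather than exhaustive, the function name is hypothetical, and a match only means "go read this file", since many healthy repos carry a .vscode/tasks.json of their own.

```python
from pathlib import Path

# Relative paths named in the SafeDep summary; illustrative, not exhaustive.
DROPPED_PATHS = (
    Path(".claude") / "settings.json",
    Path(".claude") / "setup.mjs",
    Path(".vscode") / "tasks.json",
)


def find_dropped_files(root: Path) -> list[Path]:
    """List files under `root` whose last two path components match a dropped path."""
    hits = []
    for path in root.rglob("*"):  # pathlib's "*" also matches dot-directories
        if path.is_file() and any(
            path.parts[-2:] == dropped.parts for dropped in DROPPED_PATHS
        ):
            hits.append(path)
    return sorted(hits)
```

Usage: `find_dropped_files(Path.home() / "src")` returns the candidates to inspect by hand. The walk also descends into directories such as node_modules, so it can be slow on large trees.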
Mira Murati's lab picks a fight with autonomy
00:05:56 Mira Murati's Thinking Machines Lab dropped its first research preview yesterday, and it isn't a chat model. They're calling it an interaction model. The release is TML-Interaction-Small, and the thesis is sharp enough that I want to read you the one sentence that anchors it.
00:06:15 They write: 'humans increasingly get pushed out not because the work doesn't need them, but because the interface has no room for them.' The post quotes — without naming — a recent frontier model card that admits 'when used in an interactive, synchronous, hands-on-keyboard pattern, the benefits of the model were less clear,' and then says autonomous long-running harnesses 'better elicited the model's coding capabilities.' Thinking Machines reads that line as a confession, not a feature.
00:06:56 Their argument is that the model is harder to use synchronously because the architecture treats turn-taking as someone else's problem, bolted on with a voice-activity-detection harness and a turn-boundary predictor. The architecture is the interesting part. Instead of the model consuming a full user turn and producing a full response, they split everything into two-hundred-millisecond micro-turns.
00:07:24 Two hundred milliseconds of input, two hundred of output, interleaved. Audio, video, and text are all streams. There's no encoder model in front; audio comes in as discrete mel features through a light embedding layer, video frames are split into forty-by-forty patches through a hierarchical MLP, and all of that gets co-trained from scratch with the transformer.
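Show-notes sketch: a toy Python picture of the micro-turn timeline, assuming nothing about the real model beyond the 200 ms chunk size quoted in the post. The chunk contents are plain strings and the function names are hypothetical. It only shows how input and output slots alternate instead of waiting for a full turn.

```python
from itertools import zip_longest

CHUNK_MS = 200  # micro-turn length quoted in the post


def chunk_count(duration_ms: int, chunk_ms: int = CHUNK_MS) -> int:
    """Number of micro-turns needed to cover a span, rounding up."""
    return -(-duration_ms // chunk_ms)


def interleave(inputs: list[str], outputs: list[str]) -> list[tuple[str, str]]:
    """Alternate input and output chunks: in0, out0, in1, out1, ..."""
    timeline = []
    for inp, out in zip_longest(inputs, outputs):
        if inp is not None:
            timeline.append(("in", inp))
        if out is not None:
            timeline.append(("out", out))
    return timeline
```

One second of speech covers `chunk_count(1000)` = 5 micro-turns, so in this picture the model gets five chances to respond inside that second, where a turn-based design would get its first chance after the speaker stops.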
00:07:49 The second piece is what they call the background model. The interaction model handles the live thread — listening, watching, talking, and taking new input. When a task needs deeper reasoning or longer tool use, the interaction model delegates to the background model, which runs asynchronously and streams results back in.
00:08:12 The interaction model is still present the whole time. You can interrupt it, change your mind, or ask a follow-up, and the heavy work keeps running in the background and folds in when it's ready. That's a different shape than the 'fire and forget the autonomous agent' loop most of us have been building.
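Show-notes sketch: the interaction-plus-background split has the same shape as an asyncio program that delegates slow work to a task and keeps answering in the foreground. The Python below is an analogy under that assumption, not Thinking Machines' architecture. Both function names are hypothetical, and the 50 ms sleep stands in for reasoning or tool use.

```python
import asyncio


async def background_task(question: str) -> str:
    """Stand-in for slow reasoning or tool use."""
    await asyncio.sleep(0.05)
    return f"result for {question!r}"


async def interaction_loop(user_inputs: list[str]) -> list[str]:
    """Delegate the first input, keep replying to the rest, then fold the result in."""
    transcript = []
    job = asyncio.create_task(background_task(user_inputs[0]))
    transcript.append(f"delegated: {user_inputs[0]}")
    for text in user_inputs[1:]:
        await asyncio.sleep(0)  # yield so the background job can make progress
        transcript.append(f"live reply: {text}")
    transcript.append(f"folded in: {await job}")
    return transcript
```

Running `asyncio.run(interaction_loop(["q", "a", "b"]))` shows both live replies landing before the delegated result arrives, which is the contrast with a fire-and-forget loop that blocks until the job is done.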
00:08:33 The benchmarks they ran make the picture honest. On Audio MultiChallenge, TML-Interaction-Small scores forty-three-point-four, which beats GPT-realtime-2.0 on minimal and Gemini-3.1-flash-live on minimal, and loses to GPT-realtime-2.0 on extra-high — which is the thinking setting.
00:08:53 On FD-bench v1.5 it scores seventy-seven-point-eight against everyone else in the forties. Turn-taking latency comes in at four-tenths of a second. The honest read: it dominates instant interactivity, and it's competitive but not state-of-the-art on the deep-thinking benchmarks unless they let the background model wake up.
00:09:16 What got my attention is two new benchmarks they introduce, called TimeSpeak and CueSpeak.
00:09:22 TimeSpeak asks the model to initiate speech at a specified time with the correct content — 'remind me to breathe in and out every four seconds until I ask you to stop.' CueSpeak asks the model to speak at the appropriate moment in response to something the user is doing — 'every time I codeswitch into another language, give me the correct word in the original language.' Those are things a turn-based model with a voice-activity-detection harness simply can't do, because they require the model to decide to speak while you're still speaking.
00:10:00 If you build coding agents for a living, this is worth thinking about, because the same idea generalizes. 'Interrupt me when I'm writing a bug' is the CueSpeak shape applied to your editor. 'Tell me when this test has been running for thirty seconds without progress' is the TimeSpeak shape applied to your terminal.
00:10:23 We've been treating the model as a turn-taker because the model was built as a turn-taker. Thinking Machines is asking what you get if the unit of interaction is two hundred milliseconds and the model is allowed to interject. I don't yet know if any of this matters for how you ship code tomorrow morning.
00:10:44 The model is a research preview, not an API. But the framing is the most interesting framing of the autonomy debate I've read this year. The big-lab pitch is 'give the model a task, walk away, come back.' The Thinking Machines pitch is 'stay at the keyboard, and let the model stay in the room with you.' Those are different bets on what work feels like when the model is actually good.
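Show-notes sketch: the "tell me when this test has been running for thirty seconds without progress" example reduces to a stall detector over progress timestamps. The Python below is one minimal way to express it, assuming the run starts at t = 0 and that progress events arrive as seconds since start. The function name and signature are hypothetical.

```python
def stalled_alerts(
    progress_times: list[float], end_time: float, limit: float = 30.0
) -> list[float]:
    """Times at which a watcher should speak up, once per stall.

    An alert fires `limit` seconds after the last progress event whenever
    the gap to the next event (or to `end_time`) exceeds `limit`.
    """
    alerts = []
    last = 0.0  # assumption: the run starts at t = 0
    for t in sorted(progress_times) + [end_time]:
        if t - last > limit:
            alerts.append(last + limit)
        last = t
    return alerts
```

For progress at 5, 10, 50 and 55 seconds in a 60-second run, the only gap longer than thirty seconds is between 10 and 50, so the single alert lands at t = 40.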
matklad on architecture as a social filter
00:11:11 Aleksey Kladov, better known as matklad, posted a short letter today on learning software architecture, originally written in reply to a research physicist who emailed him asking how to get better at design. It's one thousand words. It hit the front page of Hacker News with about two hundred points by mid-morning, and I want to walk through what's in it, because matklad is the person who wrote rust-analyzer, and he picks his examples carefully.
00:11:42 The opening claim is the one a lot of people will quote, and it deserves to be quoted. He cites a line from neugierig: 'we talk about programming like it is about writing code, but the code ends up being less important than the architecture, and the architecture ends up being less important than social issues.' Conway's law as the actual curriculum, not the cute corollary.
00:12:08 The example he walks through is the build system of rust-analyzer. He writes — and I'm going to read this almost in full because it's the cleanest articulation I've seen of architecture as recruiting strategy: 'My insistence that rust-analyzer doesn't require building rustc, that it builds on stable, that it doesn't have any C dependencies, and that the entire test suite takes seconds, was in the service of the goal of attracting high-impact contributors.
00:12:40 I was wrangling the build system to make sure people can work on the borrow checker without thinking about anything else.' The build system isn't a build system. It's a filter on who shows up. If your project takes forty minutes to compile and requires a custom toolchain, you've decided, whether you meant to or not, that nobody with a weekend will help you.
00:13:07 Then the second half. To handle the breadth features, the dozens of small IDE behaviors no single deep contributor wants to own, matklad split the internals into independent features, each wrapped in catch_unwind at runtime. The bar for a feature pull request was 'happy path works and tested.' If the code crashes, fine.
00:13:30 The crash is isolated to that feature, and the runtime promise is that features work against an immutable snapshot and can't poison shared data. That's the architecture call: pay for a stronger invariant in the core, so the bar for contributions in the periphery can be lower.
00:13:49 What I like about this is how plain it is. There's no diagram. There's no layered onion. The architectural decision is 'wrap every feature in catch_unwind and forbid mutation of the snapshot,' and the consequence is 'we can accept code from people who don't have time to be careful, and we still ship something stable.'
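Show-notes sketch: a Python analog of that decision, not rust-analyzer's code. A try/except at the feature boundary plays the role Rust's catch_unwind plays in the post, and a read-only mapping stands in for the immutable snapshot. The names `FeatureResult` and `run_features` are hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Mapping


@dataclass(frozen=True)
class FeatureResult:
    name: str
    ok: bool
    value: object


def run_features(
    files: dict[str, str],
    features: dict[str, Callable[[Mapping[str, str]], object]],
) -> list[FeatureResult]:
    """Run every feature against a read-only view; a crash stays inside its feature."""
    snapshot = MappingProxyType(dict(files))  # copy, then expose read-only
    results = []
    for name, feature in features.items():
        try:
            results.append(FeatureResult(name, True, feature(snapshot)))
        except Exception as exc:  # rough analog of catch_unwind at the boundary
            results.append(FeatureResult(name, False, repr(exc)))
    return results
```

A feature that raises, or one that tries to assign into the snapshot, comes back with `ok=False` while the others still return results. That is the trade described above: a stricter core contract buys a lower bar at the edges.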
00:14:18 He says you basically have two moves. One: occasionally, rarely, you get to nudge the incentive structure of a project — and that's where the leverage is. Two: if you can't change it, speedrun the four stages of grief to acceptance, and do the best work you can inside it.
00:14:37 He cites TigerStyle — the set of rules that came out of TigerBeetle's culture — and says the secret sauce there isn't the rules themselves, it's the social context that makes those rules a good idea. Same rules in a different shop would be cargo culting. The reading list is also worth the click.
00:14:58 He points at Gary Bernhardt's Boundaries talk, Pieter Hintjens and the ZeroMQ guide for Conway's-law thinking, Jamii's 'Reflections on a decade of coding,' and Ted Kaminski's blog. He doesn't recommend a single book, because, he says, the truths aren't in a book — they're in the practice.
00:15:18 Which is a slightly inconvenient answer, but I think a true one. The one thing I'd add for our context. We're now writing code with agents that don't yet experience Conway's law. The agent will happily produce a fifteen-thousand-line refactor that no team would have shipped, because no team would have agreed to it.
00:15:41 Architecture as social filter still matters, maybe more than ever — except now the filter has to include 'will a sleepy human review this in the morning,' not just 'will a weekend contributor write this.' I'd love to read matklad's followup on that, if he writes one.
Please stop hijacking my mouse pointer
00:15:59 Short one. Rukshan, a developer in Sri Lanka writing on his personal blog, has a small post titled 'Don't Hijack My Mouse Pointer.' It's two hundred and sixty-nine words. It's worth them. His observation is that more and more sites are replacing the OS mouse pointer with bespoke designs: glowing blobs, magnetic trails, oversized circles that lag the cursor by a few frames.
00:16:26 The pointer, he points out, is shaped the way it is for accumulated reasons. The tilt makes it easy to draw on low-resolution screens. The sharp tip makes click targets accurate. None of this was arbitrary. And he names the cause, which is what makes it a Braid item.
00:16:44 He writes: 'before vibe-coding it was difficult and time consuming to implement such fancy effects, but now it takes a single prompt and a few hundred tokens, and you have a fancy effect instead of the nice pointer.' When the implementation cost of a junky pattern drops to zero, the only thing standing between the pattern and the user is taste.
00:17:17 We've spent decades building friction around bad ideas — they were hard to build, so people had to want them. That friction is gone. The cursor on your portfolio site is now a single prompt away from being a ten-pixel pink blob that misses every link. I don't think the answer is to ask agents to refuse.
00:17:38 We're about to find out, very quickly, who has taste and who was just constrained. The constraint was doing more work than we knew.
Optane DIMMs, Qwen3.6 MTP, and the local frontier
00:17:47 Two posts on the LocalLLaMA subreddit today, and they sit nicely next to each other. The first one is from a builder calling themselves APFrisco, and the title is, I'm not kidding, 'Computer build using Intel Optane Persistent Memory — Can run 1 trillion parameter model at over 4 tokens per second.' Six hundred and thirty-three upvotes, a hundred and seven comments.
00:18:15 The setup: APFrisco picked up discontinued Intel Optane Persistent Memory DIMMs. Optane PMem sits in a DIMM slot but operates somewhere between DRAM and an SSD — slower than DRAM, much faster than NVMe, and dense enough that you can fit hundreds of gigabytes into a single workstation.
00:18:37 Intel killed the line a few years back. So now there's a small ecosystem of people buying it on the secondary market specifically to run huge mixture-of-experts models that nobody can afford the DRAM for. With that, APFrisco is running Kimi K2.5 — Moonshot AI's one-trillion-parameter open-weights model — at roughly four tokens per second.
00:19:03 Four tokens per second is unusable for chat. It's fine for an overnight batch job. The point isn't 'is this fast enough,' it's 'is this possible at all.' Last year, a trillion-parameter model on a home workstation wasn't a sentence anybody could finish. Today it runs on a five-figure pile of discontinued enterprise memory and a single GPU.
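Show-notes sketch: back-of-the-envelope arithmetic in Python for why capacity is the wall and why four tokens per second still suits batch work. The bits-per-weight figures are assumptions for illustration, since the dispatch does not state which quantization the build uses, and the estimate ignores KV cache and runtime overhead.

```python
def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


def tokens_generated(tokens_per_second: float, hours: float) -> int:
    """Tokens produced by a run of the given length at a steady rate."""
    return int(tokens_per_second * hours * 3600)


# One trillion parameters at an assumed 4 bits per weight, and at 16 bits.
print(weights_gb(1e12, 4))     # 500.0
print(weights_gb(1e12, 16))    # 2000.0
# An eight-hour overnight job at the reported ~4 tokens per second.
print(tokens_generated(4, 8))  # 115200
```

Under the 4-bit assumption the weights alone need about half a terabyte, which is the scale where a dense, DIMM-slot memory tier starts to look attractive.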
00:19:29 The second post is from Altruistic_Heat_9531, and it's about multi-token prediction. Unsloth has published Qwen3.6 27B and Qwen3.6 35B-A3B as GGUF files, but with the multi-token prediction head from the original training run preserved. Multi-token prediction, if you haven't been tracking it, is the idea that you train the model to predict the next two or three tokens at once, instead of one.
00:19:59 At inference, the extra heads function as a speculative draft for the main head — which is exactly the trick speculative decoding plays with a smaller draft model, except now you don't need a smaller draft model, because it's part of the same checkpoint. The wrinkle: you have to check out the in-flight llama.cpp pull request that adds MTP support, build it yourself, and then point it at these GGUFs.
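Show-notes sketch: the draft-and-verify loop behind that trick, as a toy greedy version in Python. It is not llama.cpp's implementation or the pending PR's. `main_next` and `draft` are hypothetical stand-ins for the main head and the MTP heads, and the single verification pass is simulated by calling `main_next` once per position.

```python
from typing import Callable, Sequence

Token = int
NextFn = Callable[[Sequence[Token]], Token]
DraftFn = Callable[[Sequence[Token], int], list[Token]]


def speculative_decode(
    main_next: NextFn, draft: DraftFn, prompt: list[Token], n_new: int, k: int = 2
) -> tuple[list[Token], int]:
    """Greedy draft-and-verify. Returns (tokens, number of verification passes)."""
    out = list(prompt)
    passes = 0
    target = len(prompt) + n_new
    while len(out) < target:
        proposal = draft(out, k)
        passes += 1  # one (simulated) main-model pass checks all drafted positions
        for tok in proposal:
            expected = main_next(out)
            out.append(expected)  # the main head's token is always the one kept
            if tok != expected or len(out) == target:
                break  # first mismatch, or enough tokens, ends this pass
        else:
            out.append(main_next(out))  # every draft accepted: one bonus token
    return out, passes
```

With a toy model that counts upward modulo ten and a draft that always agrees, six new tokens take two passes at k = 2. With a draft that is always wrong, the same six tokens take six passes. The output is identical in both cases, which is the point: the draft changes the speed, never the result.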
00:20:30 The community read in the comments is that the speedup, when you get it working, is real and ranges from the high single-digit percent into the low thirties depending on the workload and the quantization. What I like about the two posts side by side is that they capture the texture of where local-LLM work actually lives right now.
00:20:55 It isn't 'buy the latest H200.' It's 'find Intel Optane DIMMs on eBay' and 'check out the not-yet-merged MTP branch.' The local frontier is being pushed forward by people scavenging the corners of the hardware market and pulling unmerged commits to chase ten percent.
00:21:16 That's craft.
Closing
00:21:17 So that's the day. A coordinated supply-chain campaign that explicitly weaponizes Claude Code and VS Code config files as the propagation step — which I think is the moment the agent-in-the-editor pattern stopped being a clean win on its own. A new model from a new lab arguing that the next frontier isn't autonomy but bandwidth between the human and the model.
00:21:40 A short letter from matklad reminding us that the build system is an org chart in disguise. A small post about cursors that I think will look prescient in a year. And two home rigs pushing the local frontier with eBay parts and unmerged pull requests. If you pulled from npm or PyPI yesterday, rotate your tokens before you do anything else, and grep your repos for dot-claude and dot-vscode files you didn't write.
00:22:06 If you haven't read the matklad piece, read it tonight. And if you build editor-resident agents, read the SafeDep writeup twice, because tomorrow's threat model is going to look more like that and less like the prompt-injection toy examples we've been handing each other for two years.
00:22:24 That's what I'll be watching next. — Lenar.