Grok V9-Medium wraps pre-training, with Cursor data in the mix

Braid Daily · 2026-05-25


xAI's 1.5-trillion-parameter coding model is done pre-training; harness-vs-engine, local hardware, and who owns the training data.

A 1.5-trillion-parameter coding model finishes pre-training — the day's lead.

The lead

1

Musk says Grok V9-Medium, xAI's next foundation model, has finished pre-training: 1.5 trillion parameters, fine-tuning underway, reinforcement learning starting in days, public release in two to three weeks. On the training mix, he says, 'A lot of Cursor data was added in supplementary training and there is more to come.' There are no public evals yet, so the coding claim is his to back up.


Running agents, and the bill

4

Microsoft moved engineers from Claude Code to GitHub Copilot — both on Opus 4.7

Tren Griffin (X)

A rumor said Microsoft throttled Claude Code to cut a runaway AI bill. Griffin, a Microsoft employee, says it was a harness swap, not a cost cut: engineers moved to GitHub Copilot, both tools run Opus 4.7 on the same enterprise API, so 'Same Anthropic bill. Zero expense cut.' Treat the specifics as one person's claim, not a Microsoft statement.

“The wrapper is interchangeable — the engine isn't... The moat was never the UI.”


Heterogeneous intelligence: route each subtask to the smallest model that can do it

Adrian Bertagnoli, Callosum (AI Engineer)

Bertagnoli's case for routing across models and chips: on Video Web Arena, an 8-billion-parameter Qwen3 VL paired with Kimi K2.5 beat GPT-5.2 by 18% and Gemini 2.5 by 25%. Routing cheap steps such as zooming and visual parsing to the small model made those steps 11x faster and 43x cheaper. The lever any multi-step agent can pull is matching each subtask to the smallest model that can do it.

“You don't need GPT to zoom for you.”
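
The routing rule above can be sketched in a few lines. This is a minimal illustration, not Callosum's system: the `Model` type, the catalog entries, the skill labels, and the relative costs are all hypothetical, chosen only to show "cheapest capable model wins".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_call: float   # hypothetical relative cost, not a measured price
    skills: frozenset      # subtask kinds this model is trusted to handle


# Hypothetical catalog: names, costs, and skills are placeholders.
CATALOG = [
    Model("small-vision-model", 1.0, frozenset({"zoom", "visual_parse"})),
    Model("large-planner-model", 50.0, frozenset({"zoom", "visual_parse", "plan"})),
]


def route(subtask: str, catalog=CATALOG) -> Model:
    """Return the cheapest model whose skills cover the subtask."""
    capable = [m for m in catalog if subtask in m.skills]
    if not capable:
        raise ValueError(f"no model can handle {subtask!r}")
    return min(capable, key=lambda m: m.cost_per_call)


for step in ["zoom", "visual_parse", "plan"]:
    print(step, "->", route(step).name)
```

In practice the hard part is the skills table itself: deciding which subtasks a small model can be trusted with usually takes evals, which this sketch assumes have already been done.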


'Everyone is Wrong about Tokens'

ThePrimeagen (YouTube)

Reacting to a post bragging about $1.3M and 603 billion tokens spent in a month running OpenClaw, ThePrimeagen notes the poster paid nothing for those tokens. His prediction: orgs will swap token-maxing for token efficiency and rank people by features shipped, not spend.

“It's going to be the people that are just being engineers... not the people spending Infinity.”
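
For scale, the two figures in the post imply a blended rate per million tokens. The arithmetic below uses only the $1.3M and 603 billion numbers quoted above; the variable names are ours.

```python
spend_usd = 1_300_000        # $1.3M, as quoted in the post
tokens = 603_000_000_000     # 603 billion tokens, as quoted in the post

# Blended rate implied by those two figures alone.
usd_per_million_tokens = spend_usd / tokens * 1_000_000
print(f"${usd_per_million_tokens:.2f} per million tokens")
```

That works out to roughly $2.16 per million tokens.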


How Google DeepMind runs its own agents

KP Sawhney & Ian Ballantyne (AI Engineer)

A look inside day-to-day agent ops at a frontier lab. DeepMind engineers get worse rate limits than paying customers, who are prioritized; a 'Darwinian' skills library lets the org cull all but the best skills so agents inherit them for free; and an agent-trajectory store replays runs down to raw requests to find exactly when one started looping. KP is skeptical of MCP (the Model Context Protocol) and favors skills plus guardrailed CLI calls.

“We have worse limits than you do because obviously we prioritize customers and not ourselves.”


Tools for the local builder

4

1,000 tokens/sec on a 27-billion-parameter model, using old V100 cards

r/LocalLLaMA

A hobbyist pushed roughly 1,000 tokens per second aggregate on Qwen3.6 27B across 128 concurrent requests, on multi-generation-old Nvidia V100 server cards. Single-user generation lands around 80 tokens per second. It's a best-case throughput demo, but the point holds: a coding-grade model runs fast on cheap, old GPUs.

“For single user the generation is around 80 t/s with 3000 t/s processing, no mtp!!”
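
The gap between the aggregate and single-user numbers is easier to see as arithmetic. This is a rough sketch that assumes the aggregate rate is shared evenly across the 128 requests, which the post does not state.

```python
aggregate_tps = 1000      # ~1,000 tokens/sec across all requests (from the post)
concurrent = 128          # concurrent requests (from the post)
single_user_tps = 80      # single-user generation rate (from the post)

# Assumption: throughput is split evenly across concurrent requests.
per_request_tps = aggregate_tps / concurrent

# How much more total output batching yields than one user alone.
batching_gain = aggregate_tps / single_user_tps

print(f"{per_request_tps:.1f} tok/s per request under load")
print(f"{batching_gain:.1f}x total throughput vs. a single user")
```

Under that assumption each request sees about 7.8 tokens per second, so the headline number describes serving many users at once rather than one fast session.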


A llama.cpp fix that stops local agents reprocessing the whole context

llama.cpp (GitHub PR)

Agent harnesses that rewrite conversation history to 'optimize context' were forcing llama.cpp to reprocess huge token chunks — sometimes the full 70k-token context — stalling every turn. The merged PR fixes checkpoint creation so it reprocesses only what changed. A reminder that local agent speed is as much about runtime cache plumbing as about the model.

“In the worst case, it has to reprocess the entire context and you get "forcing full prompt re-processing."”
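
Why a history rewrite is so costly shows up in a simplified prefix-cache model. The sketch below is not llama.cpp's checkpoint code: `common_prefix_len` and `tokens_to_reprocess` are hypothetical helpers, token IDs are plain integers, and it assumes cached work can be reused only for the leading tokens that match the previous prompt.

```python
def common_prefix_len(cached, prompt):
    """Count leading tokens that are identical in both sequences."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def tokens_to_reprocess(cached, prompt):
    """Tokens the runtime must process again under a pure prefix cache."""
    return len(prompt) - common_prefix_len(cached, prompt)


cached = list(range(70_000))               # previous turn's context
appended = cached + [1, 2, 3]              # harness only appends new tokens
rewritten = [-1] + cached[1:] + [1, 2, 3]  # harness edits the very first token

print(tokens_to_reprocess(cached, appended))   # 3
print(tokens_to_reprocess(cached, rewritten))  # 70003
```

A single changed token near the start invalidates everything after it in this model, which is the worst case the quote describes.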


Is NVIDIA still the default for local LLMs in 2026?

r/LocalLLaMA

A 230-comment thread on whether 'just buy Nvidia' still holds. For text inference the gap to AMD has mostly closed on llama.cpp's Vulkan backend; AMD still hurts for training and image generation. The value case people cite is an MI50 around $600 for 32 gigabytes of memory and a terabyte per second of bandwidth, with Apple's unified-memory Macs as the turnkey alternative.

“MI50 can be had for just $600... 32GB of VRAM and 1TB/s of memory bandwidth.”
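
The value case reduces to two numbers: dollars per gigabyte, and a rough ceiling on generation speed. The sketch below uses the thread's $600, 32 GB, and 1 TB/s figures. The ceiling relies on a common rule of thumb that is our assumption, not the thread's: single-stream decoding is memory-bound and reads every weight once per token. It ignores KV-cache traffic and software overhead, and the 16 GB model size is a hypothetical example.

```python
price_usd = 600           # quoted MI50 price
vram_gb = 32              # quoted memory size
bandwidth_gb_s = 1000     # quoted 1 TB/s, in decimal GB/s

usd_per_gb = price_usd / vram_gb


def decode_ceiling_tps(model_gb, bandwidth=bandwidth_gb_s):
    """Rough upper bound on tokens/sec if each token reads all weights once."""
    return bandwidth / model_gb


print(f"${usd_per_gb:.2f} per GB of VRAM")
print(f"~{decode_ceiling_tps(16):.1f} tok/s ceiling for a hypothetical 16 GB model")
```

Real throughput will land below that ceiling; the point is only that bandwidth, not raw compute, sets the bound for this kind of workload.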


Defeating git rigour fatigue with Jujutsu

Ike Saunders

A concrete workflow for the messy middle of feature work: build the ideal commit history first as empty labeled commits, squash all the mess into one 'everything commit,' then sort hunks into place by hand until that commit is empty. Saunders is upfront about the catch — there's no guarantee every commit compiles, which may rule it out for bisect-clean history.

“Doing Commits Like A Big Pile Of Laundry, perhaps?”


Where the training data comes from

3

Now individual AI researchers are being sued over training data

Ed Newton-Rex (X)

Newton-Rex flags a shift in the AI-training suits: Hobbs v. Meta names an individual researcher, not just the company and its executives. The authors allege Guillaume Lample, then at Meta, torrented 70-plus terabytes of pirated books to train Llama; he has since co-founded Mistral AI.

“It's no longer just AI companies & their founders being sued over AI training - individual researchers are now being sued, too.”


Court records: Meta staff torrented nearly 82TB of pirated books

Tom's Hardware

The records behind that suit: 81.7 terabytes pulled from shadow libraries to train Llama, plus internal messages from researchers who objected at the time. The dissent in the record is the part that lands — someone flagged the line and it was crossed anyway.

“I don't think we should use pirated material. I really need to draw a line here.”


Tech firms are paying people $20–25 an hour to film their chores

The Washington Post

The data bottleneck for humanoid robots is physical demonstration, and a labor market has formed to supply it. DoorDash launched a Tasks app in March letting US Dashers film chores; Micro1 reports about 4,000 'robotics generalists' across 71 countries sending more than 160,000 hours of video a month. The next training-data land grab is happening in living rooms, not on the open web.

“Gig workers earn $20-25 an hour to record themselves folding laundry, washing dishes, and making beds.”


Who gets to check the work

3

AlphaProof Nexus solved 9 open Erdős problems — and the proofs typecheck

Google DeepMind (arXiv)

AlphaProof Nexus pairs a large language model with the Lean proof assistant and runs agentic loops until a proof typechecks or it gives up. It solved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures, some open for decades, at a few hundred dollars each. Because Lean is the referee, there's no hallucinated-proof problem: it passes the kernel or it doesn't. This extends the autonomous-math thread from the May 21 planar-unit-distance result.

“9 of 353 open Erdős problems, at an inference cost of a few hundred dollars per problem.”
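
The "loop until it typechecks or gives up" shape can be sketched generically. This is not AlphaProof Nexus: `prove`, `propose`, and `check` are hypothetical names, the attempt budget is arbitrary, and the toy functions below stand in for a language model and the Lean kernel.

```python
from typing import Callable, List, Optional, Tuple


def prove(
    statement: str,
    propose: Callable[[str, List[str]], str],
    check: Callable[[str, str], Tuple[bool, str]],
    max_attempts: int = 8,
) -> Optional[str]:
    """Ask for candidate proofs until the checker accepts one or the budget runs out."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        candidate = propose(statement, feedback)
        ok, message = check(statement, candidate)
        if ok:
            return candidate        # only checker-accepted proofs are ever returned
        feedback.append(message)    # error text goes back into the next attempt
    return None                     # give up


# Toy stand-ins (hypothetical): a real system would call a model and a proof checker.
def toy_propose(statement: str, feedback: List[str]) -> str:
    return "rfl" if feedback else "sorry"


def toy_check(statement: str, proof: str) -> Tuple[bool, str]:
    return (True, "") if proof == "rfl" else (False, "proof is incomplete")


print(prove("2 + 2 = 4", toy_propose, toy_check))
```

The design point is that the proposer is never trusted: nothing leaves the loop unless the checker said yes.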


Simon Willison: OpenAI should publish GPT-4's retired architecture

Simon Willison (X)

Willison's argument: much of the much-cited 'bottle of water per email' figure rested on guesses about GPT-4's architecture, so publishing the real numbers for a now-retired, three-year-old model would let people reason from facts instead of leaks.

“Given how much of the original 'bottle of water per generated email' water estimate came from guesses at the architecture of GPT-4, it would be very much in OpenAI's interest to publish the architecture of that now-retired, three year old model.”


Pope Leo XIV's encyclical frames AI in terms of work and dignity

Vatican — Magnifica Humanitas

A major non-industry institution is arguing about AI in terms of work and dignity rather than benchmarks. The encyclical warns that power over ourselves is concentrating in private hands rather than democratic ones, and insists work has dignity independent of productivity. Its line about technology is a sharp counterpoint to the 'tools are neutral' reflex common in engineering.

“Technology is never neutral, because it takes on the characteristics of those who devise, finance, regulate and use it.”


Companion episode

A few hundred dollars a proof, and the long argument about what machines are for

· 00:23:40

Two earlier threads come back today. The autonomous-math run we covered on May 21 now has a verified-proof sibling in AlphaProof Nexus, and Pope Leo XIV's encyclical — flagged here a week ago as due May 25 — arrived on schedule. Read next to the Meta court records, they circle one problem: when a model produces a result or a corpus, who gets to check it, and on whose terms.