Archive BRAID
A few hundred dollars a proof, and the long argument about what machines are for / DISPATCH 037

Dispatch 037 · 2026-05-25 GSV Smallest Model That Does The Job

A few hundred dollars a proof, and the long argument about what machines are for

00:23:40 · 15 sources

“The skill that ages well isn't running the most agents; it's getting the result with the smallest model and the fewest tokens that do the job.”

— Lenar Kess, today's narration

A frontier lab proves nine decades-old math problems for a few hundred dollars each, two talks make the numeric case that the cheapest agents route work to the smallest model that can do it, a lawsuit names an individual researcher over how Llama's training data was sourced, and a papal encyclical argues about AI on the terms of work and dignity. Eight things worth knowing today, told one developer to another.

Chapters

  1. 00:00:04 A few hundred dollars a proof
  2. 00:03:20 You don't need GPT to zoom for you
  3. 00:06:46 The token-efficiency turn
  4. 00:09:42 Inside how DeepMind runs its own agents
  5. 00:12:43 The lawsuit that names a name
  6. 00:15:58 Jujutsu and the pile of laundry
  7. 00:18:28 Filming your chores for the robots
  8. 00:21:00 Pope Leo XIV, and what no machine replaces

Sources

15 cited
  1.

    Grok foundation model V9-Medium (1.5T) has finished training

    X elonmusk — CEO of xAI, Tesla, SpaceX; owner of X

    A lot of Cursor data was added in supplementary training and there is more to come.

    x.com/elonmusk/status/2058787384364265734 →
    Context
    A frontier lab pre-announcing a 1.5-trillion-parameter model trained partly on coding-agent interaction data, with an open-source pledge for the prior model, signals where xAI is aiming its coding push — though no public evals exist yet.
    Key points
    • xAI's next foundation model, Grok V9-Medium, is 1.5 trillion parameters and has finished pre-training; fine-tuning underway, reinforcement learning to begin in days, public release in 2-3 weeks.
    • Musk says a lot of Cursor data was added in supplementary training, framing it as 'much better at coding.'
    • The current production model serving all Grok traffic is the 0.5T v8-small; the new model is pitched as a major improvement for difficult coding tasks.
    • xAI plans to open-source the 0.5T v8-small model toward the end of the year.
    • A reply notes benchmark performance and real-world performance are diverging, a fair caution before evals are public.
    Provenance
    Tweet · Primary source
  2.

    Microsoft switched from Claude Code to GitHub Copilot, both on Opus 4.7

    X trengriffin — Tren Griffin, longtime tech/finance writer ("12 Things") and a Microsoft employee

    The wrapper is interchangeable — the engine isn't... The moat was never the UI.

    x.com/trengriffin/status/2058786460103532597 →
    Context
    It separates two things builders often conflate — the agent harness and the underlying model — and argues the spend follows the model, not the interface, which reframes a 'cost-cutting' rumor as internal dogfooding.
    Key points
    • Responding to a claim that Microsoft throttled Claude Code usage to cut an out-of-control AI bill, Griffin says the move was a harness swap, not a cost cut.
    • His claim: Microsoft switched engineers from Claude Code to GitHub Copilot, both running Opus 4.7, both paid via enterprise API usage — 'Same Anthropic bill. Zero expense cut.'
    • He frames it as Microsoft wanting to dogfood the GitHub Copilot harness for scale and feedback.
    • A reply by Ash Cole captures the point: people read harness swaps as model swaps; the wrapper is interchangeable, the engine isn't.
    • This is one person's assertion, not a Microsoft announcement; treat the specifics as a claim.
    Provenance
    Tweet · Primary source
  3.

    1000 tokens/sec generation on Qwen3.6 27B with V100s

    Source Simple_Library_2700 (r/LocalLLaMA) — Local-inference hobbyist posting benchmark numbers

    For single user the generation is around 80 t/s with 3000 t/s processing, no mtp!!

    www.reddit.com/r/LocalLLaMA/comments/1tmyln… →
    Context
    It shows a capable 27-billion-parameter coding-grade model running fast on multi-generation-old, cheap GPUs — evidence that serious local inference no longer requires current-gen hardware.
    Key points
    • A hobbyist hit roughly 1000 tokens/sec aggregate on Qwen3.6 27B across 128 concurrent requests using older Nvidia V100 server cards.
    • Single-user (batch one) generation is around 80 tokens/sec with about 3000 tokens/sec prompt processing, without multi-token prediction.
    • The headline interest in the comments is the cheap hardware: people want V100 pairs at reasonable prices.
    • It's a best-case throughput demo, not a typical single-user figure — the poster is clear the 128-concurrent number is far beyond personal need.
    Provenance
    Source · Background source
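Those two numbers measure different things. A back-of-the-envelope sketch, assuming the roughly 1000 t/s aggregate is split evenly across all 128 requests (real schedulers aren't perfectly even) and a hypothetical 500-token reply:

```python
def per_stream_rate(aggregate_tps: float, concurrent: int) -> float:
    """Average generation speed per request if throughput is split evenly."""
    return aggregate_tps / concurrent


def seconds_to_generate(tokens: int, tps: float) -> float:
    """Wall-clock seconds to emit `tokens` at a steady `tps`."""
    return tokens / tps


batched = per_stream_rate(1000, 128)  # from the post; the even split is an assumption
single = 80.0                         # batch-one generation speed from the post
reply = 500                           # hypothetical reply length in tokens

print(f"128-way batch: {batched:.1f} t/s per request, "
      f"{seconds_to_generate(reply, batched):.0f} s per reply")
print(f"single user:   {single:.0f} t/s, "
      f"{seconds_to_generate(reply, single):.2f} s per reply")
```

The aggregate number is what matters if you're serving a team; the 80 t/s batch-one number is what you actually feel running one agent.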
  4.

    llama.cpp: fix checkpoints creation (faster agentic coding on local models)

    Source jacekpoplawski (jacek2023) — llama.cpp contributor; the merged PR addresses context reprocessing during agentic coding

    In the worst case, it has to reprocess the entire context and you get "forcing full prompt re-processing."

    github.com/ggml-org/llama.cpp/pull/22929 →
    Context
    For anyone running agents against a local model, this is the difference between a snappy loop and a multi-second stall on every turn — the kind of runtime detail that decides whether local agentic coding is usable.
    Key points
    • The problem: agent harnesses that rewrite conversation history to 'optimize context' force llama.cpp to reprocess huge chunks of tokens — sometimes the entire 70k-token context — stalling local agentic coding.
    • Two triggers: tools that rewrite history (he switched from opencode to pi to avoid it) and models that strip reasoning from context (enable 'preserve thinking,' e.g. with Qwen 3.6).
    • The merged PR fixes checkpoint creation so llama.cpp reprocesses only what actually changed, getting closer to the best case.
    • The author reports two weeks of use with noticeably more responsive agentic coding.
    • It's a concrete reminder that local agent performance is as much about cache/context plumbing in the runtime as about the model.
    Provenance
    Source · Background source
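Why history rewrites hurt so much: a prompt cache can only reuse tokens up to the first place where the new prompt differs from what has already been processed. The sketch below models that with plain token lists. It is a simplified illustration of prefix reuse, not llama.cpp's checkpoint code, and the token IDs are arbitrary integers.

```python
from typing import Sequence


def common_prefix_len(cached: Sequence[int], prompt: Sequence[int]) -> int:
    """Number of leading tokens shared by the cached context and the new prompt."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def tokens_to_reprocess(cached: Sequence[int], prompt: Sequence[int]) -> int:
    """Tokens that must be evaluated again: everything after the first divergence."""
    return len(prompt) - common_prefix_len(cached, prompt)


# Toy 70k-token conversation already in the cache.
history = list(range(70_000))

# Append-only turn: the harness adds 200 new tokens at the end.
appended = history + list(range(70_000, 70_200))

# History-rewriting turn: the harness edits token 10, then appends the same 200.
rewritten = history.copy()
rewritten[10] = -1
rewritten += list(range(70_000, 70_200))

print(tokens_to_reprocess(history, appended))   # 200
print(tokens_to_reprocess(history, rewritten))  # 70190
```

Stripping reasoning out of an earlier turn behaves like the edit at position 10: everything after the change is new to the cache, which is why keeping thinking in context helps.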
  5.

    Is NVIDIA still the default best choice for local LLMs in 2026?

    Source pmv143 (r/LocalLLaMA) — r/LocalLLaMA discussion, 230+ comments

    MI50 can be had for just $600... 32GB of VRAM and 1TB/s of memory bandwidth.

    www.reddit.com/r/LocalLLaMA/comments/1tmkau… →
    Context
    The default 'just buy Nvidia' answer is fracturing by task and budget — relevant to anyone speccing a local inference box this year.
    Key points
    • The gap to AMD has closed for text inference: a commenter runs an all-AMD homelab pain-free on llama.cpp's Vulkan backend.
    • AMD still hurts outside text inference — training and image generation run into ROCm headaches; llama.cpp's native training looks half-finished.
    • Value argument for AMD: an MI50 at about $600 gives 32GB of video memory and 1TB/s bandwidth, and AMD's open ISAs/drivers mean community support can outlast vendor decisions.
    • Apple's unified memory is the turnkey alternative — the 512GB Mac Studio was the go-to for hosting very large models like GLM-5.
    • MSRP is treated as nearly useless; real local-hardware decisions hinge on street prices and which task you're doing.
    Provenance
    Source · Background source
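A common rule of thumb for comparing cards on text inference: single-stream decoding of a dense model is mostly memory-bandwidth bound, since each generated token reads roughly every weight once, so bandwidth divided by weight size gives a ceiling on tokens per second. This sketch applies it to the MI50 figures quoted above. The 27B model and about 0.5 bytes per parameter (roughly 4-bit quantization) are assumptions for illustration; it ignores KV-cache memory, and real throughput lands well below the ceiling.

```python
def decode_ceiling_tps(bandwidth_gb_s: float, params_billion: float,
                       bytes_per_param: float) -> float:
    """Upper bound on single-stream decode speed for a dense, bandwidth-bound model."""
    weights_gb = params_billion * bytes_per_param  # GB of weights read per token
    return bandwidth_gb_s / weights_gb


def dollars_per_gb(price_usd: float, vram_gb: float) -> float:
    """Street price per GB of video memory."""
    return price_usd / vram_gb


# MI50 figures from the thread: about $600, 32 GB, about 1 TB/s (1000 GB/s).
weights_gb = 27 * 0.5  # assumed 27B model at ~4-bit; weights only, no KV cache
ceiling = decode_ceiling_tps(1000, params_billion=27, bytes_per_param=0.5)

print(f"weights fit in 32 GB: {weights_gb <= 32}")
print(f"decode ceiling: ~{ceiling:.0f} t/s")
print(f"price per GB of VRAM: ${dollars_per_gb(600, 32):.2f}")
```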
  6.

    AlphaProof Nexus: Autonomous formal mathematics with agentic loops

    Article Google DeepMind — Google DeepMind's formal-mathematics team; preprint arXiv 2605.22763v1, posted May 21, 2026, with proofs on GitHub and erdosproblems.com

    9 of 353 open Erdős problems, at an inference cost of a few hundred dollars per problem.

    arxiv.org/abs/2605.22763 →
    Context
    Verified formal proofs sidestep the trust problem that dogs LLM math: the Lean kernel is the referee, so a few-hundred-dollar agent loop produces results a human can check by running them. It extends the autonomous-math thread from OpenAI's planar-unit-distance result we covered May 21.
    Key points
    • AlphaProof Nexus autonomously solved 9 of 353 open problems in the Erdős catalogue and proved 44 of 492 open OEIS sequence conjectures.
    • It pairs a large language model with the Lean proof assistant, running agentic loops that refine proofs against formal verification until they pass or it gives up.
    • Inference cost was a few hundred dollars per problem; proofs are machine-checkable rather than natural-language.
    • Some problems had been open for decades; the team also reports a 15-year-old algebraic-geometry result.
    • Because output is Lean-verified, a hallucinated proof cannot pass: it either typechecks or it doesn't, provided the formal statement matches the original problem.
    Provenance
    Article · Supporting source
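The loop described above (draft, check against the verifier, feed the error back, stop on success or when the budget runs out) can be sketched generically. This is not AlphaProof Nexus code: `propose` and `verify` are hypothetical stand-ins for an LLM call and a Lean check, and the toy verifier just checks a number so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    candidate: str
    error: Optional[str]  # None means the verifier accepted the candidate


def prove(propose: Callable[[Optional[str]], str],
          verify: Callable[[str], Optional[str]],
          max_attempts: int) -> tuple[Optional[str], list[Attempt]]:
    """Propose, verify, refine on the verifier's error; stop on success or budget."""
    history: list[Attempt] = []
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = propose(feedback)   # stand-in for an LLM drafting a proof
        error = verify(candidate)       # stand-in for a Lean check
        history.append(Attempt(candidate, error))
        if error is None:
            return candidate, history   # only verified output leaves the loop
        feedback = error                # next proposal sees why this one failed
    return None, history                # gave up within the budget


# Toy stand-ins so the sketch runs: the "proof" must be the number 6 * 7.
guesses = iter(["40", "41", "42", "43"])


def toy_propose(feedback: Optional[str]) -> str:
    return next(guesses)  # ignores feedback; a real prover would use it


def toy_verify(candidate: str) -> Optional[str]:
    return None if int(candidate) == 6 * 7 else f"{candidate} != 42"


result, trace = prove(toy_propose, toy_verify, max_attempts=10)
print(result, len(trace))  # 42 3
```

The property that matters is the return path: nothing leaves the loop unless `verify` accepted it, so spending more attempts raises cost without lowering the bar for what counts as a result.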
  7.

    Scaling the Next Paradigm of Heterogeneous Intelligence — Adrian Bertagnoli, Callosum

    Video Adrian Bertagnoli (Callosum / Colossyan) — Founding engineer presenting Callosum's work on routing subtasks across different models and chips; talk hosted on the AI Engineer channel

    You don't need GPT to zoom for you.

    www.youtube.com/watch?v=WRBNDpUhsJQ →
    Context
    It's a concrete, numbers-backed case that the cheapest wins in agent systems come from matching each subtask to the smallest model that can do it — a design lever any builder shipping multi-step agents can pull today.
    Key points
    • On Video Web Arena, a mixture of Qwen3 VL 8B and Kimi K2.5 beat GPT-5.2 and Gemini 2.5 by 18 and 25 percent respectively.
    • Routing cheap subtasks (zooming, visual parsing) to a small model made those steps 11x faster and 43x cheaper.
    • On a long-context benchmark, mapping recursive sub-agents across Cerebras/SambaNova hardware ran 7-12x cheaper and 3-5x faster than GPT-5.2.
    • Core thesis: real problems decompose into subtasks that need different model sizes and architectures; homogeneous single-model scaling is inefficient for inference.
    • An automation layer now detects task complexity and predicts the best-suited model and hardware, replacing hand-coded routing.
    Provenance
    Video · Supporting source
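A minimal sketch of "smallest model that does the job" routing, assuming a hypothetical two-model ladder with made-up prices and a per-task acceptance check. Note the technique is swapped: the talk describes predicting the right model and hardware up front, while this is a simpler cascade that tries the cheapest model first and escalates only when its answer fails the check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_call: float          # hypothetical flat price per call
    run: Callable[[str], str]    # stand-in for a real model client


def cascade(task: str, ladder: list[Model],
            good_enough: Callable[[str, str], bool]) -> tuple[str, str, float]:
    """Try models cheapest-first; stop at the first answer that passes the check.

    Returns (answer, name of the model that produced it, total spend).
    Spend includes calls to smaller models that failed the check.
    """
    if not ladder:
        raise ValueError("need at least one model")
    spent = 0.0
    answer = ""
    for model in ladder:
        answer = model.run(task)
        spent += model.usd_per_call
        if good_enough(task, answer):
            return answer, model.name, spent
    return answer, ladder[-1].name, spent  # best effort from the largest model


def check(task: str, answer: str) -> bool:
    """Toy acceptance test: '?' means the model could not handle the task."""
    return answer != "?"


# Toy ladder: the small model only handles zoom-style subtasks.
small = Model("small-vl", 0.001, lambda t: "cropped" if t.startswith("zoom") else "?")
large = Model("frontier", 0.05, lambda t: "answer")

print(cascade("zoom into the checkout button", [small, large], check))
print(cascade("plan the purchase flow", [small, large], check))
```

The trade-off: a cascade pays for the small model's failed attempt on every hard task, so it wins when most subtasks are easy ones like zooming or visual parsing. A predictive router skips that wasted call but needs a good complexity classifier.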
  8.

    Everyone is Wrong about Tokens

    Video ThePrimeagen — Developer and streamer known for blunt takes on engineering culture; reacting to a screenshot of $1.3M / 603 billion tokens spent in 30 days

    It's going to be the people that are just being engineers... not the people spending Infinity.

    www.youtube.com/watch?v=0zw-Uk9KJiA →
    Context
    A useful counterweight to the maximalist agent-swarm pitch: the person showing off billions of tokens often isn't paying retail, and the org paying retail will eventually ask who shipped the most per dollar.
    Key points
    • Reacts to a post showing $1.3M and 603 billion tokens spent in a month running OpenClaw — and notes the poster paid zero for those tokens.
    • Compares today's 'spend infinity on tokens' culture to the 2016-2020 era of startups with more microservices than customers.
    • Prediction: companies will swap 'token maxing' for token efficiency, ranking people by features delivered, not spend.
    • Frames the new build calculus as 'buy vs build vs vibe' — vibing costs both time and money.
    • Skeptical of the 10x-cheaper-every-year promise: 'that promise is 2 years old and I feel like things have never been more expensive.'
    Provenance
    Video · Supporting source
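Those two numbers imply a blended price worth computing before comparing anyone's bill. A quick sketch using only the figures quoted ($1.3M, 603 billion tokens, 30 days); a real invoice would price input, output and cached tokens differently, which a single blended figure hides.

```python
def usd_per_million_tokens(total_usd: float, total_tokens: float) -> float:
    """Blended price: total spend spread evenly over every token."""
    return total_usd / total_tokens * 1_000_000


spend_usd = 1_300_000
tokens = 603_000_000_000
days = 30

print(f"blended price: ${usd_per_million_tokens(spend_usd, tokens):.2f} per 1M tokens")
print(f"daily spend:   ${spend_usd / days:,.0f}")
print(f"daily tokens:  {tokens / days / 1e9:.1f}B")
```

Swapping days for features shipped in the denominator is the metric the video predicts companies will start ranking on.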
  9.

    How Google DeepMind Runs Agents at Scale — KP Sawhney & Ian Ballantyne

    Video KP Sawhney & Ian Ballantyne (Google DeepMind) — KP Sawhney is a software engineer on DeepMind's AI platform team; Ian Ballantyne is a DevRel engineer; panel on the AI Engineer channel

    We have worse limits than you do because obviously we prioritize customers and not ourselves.

    www.youtube.com/watch?v=7gujZrJ9L5I →
    Context
    A rare look inside how a frontier lab actually operates its own agents day to day — quota politics, skill curation, and observability — which is more honest about the constraints than most vendor demos.
    Key points
    • DeepMind engineers get worse rate limits than paying customers — customers are prioritized; internal throttling is 'kind of brute force.'
    • A 'Darwinian' skills library: experts contribute skills, the org culls them so only the best survive, and agents inherit that knowledge for free.
    • KP is skeptical of MCP ('may be a little bit of a flash in the pan') and favors skills plus guardrailed CLI interactions.
    • Subscription pricing 'doesn't really work' for token-hungry agents; they want harness-level fallback (Pro to Flash to local) so an unattended job doesn't stall on a limit.
    • An agent-trajectory store lets them replay runs down to raw predict requests to find exactly when a run started looping.
    Provenance
    Video · Supporting source
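The harness-level fallback KP describes (Pro to Flash to local) boils down to walking an ordered list of backends and moving on when one is rate-limited, so an unattended job degrades instead of stalling. A minimal sketch; `RateLimited` and the backend functions are hypothetical stand-ins, not any real SDK's exceptions or clients.

```python
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical error a backend raises when its quota is exhausted."""


def complete_with_fallback(prompt: str,
                           backends: Sequence[tuple[str, Callable[[str], str]]]
                           ) -> tuple[str, str]:
    """Return (backend name, completion) from the first backend not rate-limited."""
    skipped = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except RateLimited:
            skipped.append(name)  # note the skip and fall through to the next tier
    raise RuntimeError(f"all backends rate-limited: {skipped}")


def pro(prompt: str) -> str:
    raise RateLimited("quota exceeded")  # simulate an exhausted quota


def flash(prompt: str) -> str:
    return f"flash: {prompt}"


def local(prompt: str) -> str:
    return f"local: {prompt}"


print(complete_with_fallback("summarize the failing test",
                             [("pro", pro), ("flash", flash), ("local", local)]))
```

In a real harness you'd also record which tier answered, since a replayed trajectory is much easier to debug when it shows the run quietly dropped to a smaller model.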
  10.

    Ed Newton-Rex on individual researchers being sued over AI training

    X @ednewtonrex — Ed Newton-Rex, founder of Fairly Trained and former audio lead at Stability AI; a prominent critic of unlicensed AI training data

    It's no longer just AI companies & their founders being sued over AI training - individual researchers are now being sued, too.

    x.com/ednewtonrex/status/2058433725889716519 →
    Context
    Personal liability for researchers changes the calculus of how training corpora get assembled. If an individual engineer can be named for sourcing data, the 'move fast, sort licensing later' default gets a lot more expensive to choose.
    Key points
    • Flags a new suit (Hobbs v. Meta) naming an individual AI researcher, not just the company and its executives.
    • Authors Jeff Hobbs and A. Douglas Stone allege Guillaume Lample, then a Meta researcher, torrented more than 70 terabytes of pirated books to train Llama.
    • Court records put the figure at 81.7TB pulled from shadow libraries including LibGen, Anna's Archive and Z-Library.
    • Lample allegedly referred to a LibGen copy as 'BooksZero' and kept the code off Meta's repository.
    • Defendants also include Mark Zuckerberg and Joelle Pineau; Lample has since co-founded Mistral AI.
    Provenance
    Tweet · Primary source
  11.

    Meta staff torrented nearly 82TB of pirated books for AI training — court records

    Article Tom's Hardware — Trade-press reporting on the Hobbs v. Meta court records

    I don't think we should use pirated material. I really need to draw a line here.

    www.tomshardware.com/tech-industry/artifici… →
    Context
    The internal dissent in the record is the part that lands: engineers flagged the line and it was crossed anyway. That's the institutional pattern worth watching as more of these suits name names.
    Key points
    • Court records describe 81.7TB of data downloaded via torrents from shadow libraries to train Llama.
    • Internal messages show researchers objecting: one said using pirated material 'should be beyond our ethical threshold.'
    • Plaintiffs allege Meta removed copyright-management information and avoided licensing to preserve a fair-use posture.
    • The case names individuals alongside the company, a shift from earlier AI-training suits.
    • It centers on books specifically, treated as uniquely valuable training data.
    Provenance
    Article · Supporting source
  12.

    Simon Willison on publishing GPT-4's retired architecture

    X @simonw — Simon Willison, co-creator of Django and author of the widely-read blog on practical LLM use

    Given how much of the original 'bottle of water per generated email' water estimate came from guesses at the architecture of GPT-4, it would be very much in OpenAI's interest to publish the architecture of that now-retired, three year old model.

    x.com/simonw/status/2058877314004627690 →
    Context
    Much of the public debate about AI's resource cost runs on reverse-engineered guesses. The opacity isn't just an academic gap — it shapes regulation and reputation built on numbers nobody can verify.
    Key points
    • Argues OpenAI should publish the architecture of the now-retired GPT-4, three years on.
    • The widely-cited 'bottle of water per email' figure rested on guesses about GPT-4's architecture.
    • Publishing real numbers would let people replace estimates with facts on AI's energy and water footprint.
    • Ties to a broader transparency gap: outsiders still reason about frontier models from leaks and inference.
    Provenance
    Tweet · Primary source
  13.

    Defeating git rigour fatigue with Jujutsu

    Article Ike Saunders — Developer writing about the Jujutsu (jj) version control system

    Doing Commits Like A Big Pile Of Laundry, perhaps?

    ikesau.co/blog/defeating-git-rigour-fatigue… →
    Context
    It's a concrete workflow for the messy-middle of feature development, and a good example of how Jujutsu's model lets you treat commit history as something you arrange at the end rather than maintain throughout.
    Key points
    • Names a real pain: keeping clean, reviewable commits during long feature work is effortful and people give up ('git rigour fatigue').
    • Proposes building the ideal commit history first as empty labeled commits (jj new -B / -A), then sorting hunks into them.
    • Squash everything messy into one 'everything commit,' then interactively squash hunks into the right labeled commit until the everything commit is empty.
    • Claims it beats jj split and as-you-go squashing because the final state is guaranteed conflict-free.
    • Honest caveat: there's no guarantee every commit compiles, which may be a dealbreaker for bisect-clean history.
    Provenance
    Article · Supporting source
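Most of the laundry workflow is a fixed sequence of commands, so here it is as a small plan generator. It only prints commands; it doesn't run them. Assumptions: the messy work is already squashed into one "everything commit" whose change ID you pass in (`EVERYTHING` is a placeholder), each label gets its empty commit via `jj new -B` so the labels stack in order, and the `jj squash -i` steps are done by hand in the interactive editor. The `<change-id of ...>` targets are placeholders you fill in from `jj log`; check the flags against `jj new --help` and `jj squash --help` for your version.

```python
import shlex


def laundry_plan(everything: str, labels: list[str]) -> list[str]:
    """Commands for the 'pile of laundry' workflow, in order.

    1. Insert one empty, described commit per label before the everything
       commit; inserting each one before the same commit keeps labels in order.
    2. For each label, interactively move the matching hunks out of the
       everything commit into that labeled commit.
    """
    plan = [f"jj new -B {everything} -m {shlex.quote(label)}" for label in labels]
    plan += [
        # Target IDs only exist after step 1 runs; fill them in from `jj log`.
        f"jj squash -i --from {everything} --into <change-id of '{label}'>"
        for label in labels
    ]
    return plan


for cmd in laundry_plan("EVERYTHING", ["Add schema", "Add API route", "Add UI"]):
    print(cmd)
```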
  14.

    Why tech companies are paying people to film their chores

    Article The Washington Post — Reporting on the gig-work economy springing up to generate household training video for humanoid robots

    Gig workers earn $20-25 an hour to record themselves folding laundry, washing dishes, and making beds.

    www.washingtonpost.com/technology/interacti… →
    Context
    The data bottleneck for embodied AI is physical demonstration, and a labor market is forming to supply it one folded towel at a time. It's the clearest sign yet that the next training-data land grab is happening in living rooms, not on the open web.
    Key points
    • AI and robotics firms are paying gig workers to film everyday chores as training data for humanoid robots.
    • DoorDash launched a Tasks app in March 2026 letting its US Dashers earn money filming laundry, dishes and bed-making.
    • Micro1 reports ~4,000 'robotics generalists' across 71 countries sending more than 160,000 hours of video a month.
    • Equipment is typically head-mounted phones; pay runs roughly $20-25 an hour.
    • A viral Reddit post framed this as 'OpenAI installing 360 cameras' — the verified players are DoorDash, Scale AI and Micro1, training robots from Figure, Tesla and Agility.
    Provenance
    Article · Supporting source
  15.

    Magnifica Humanitas — Encyclical Letter of Pope Leo XIV

    Article Pope Leo XIV — An encyclical (a senior teaching document of the Catholic Church) addressing artificial intelligence and human dignity, dated May 15, 2026

    Technology is never neutral, because it takes on the characteristics of those who devise, finance, regulate and use it.

    www.vatican.va/content/leo-xiv/en/encyclica… →
    Context
    It's a major non-industry institution arguing about AI on the terms of work and dignity rather than benchmarks, and its line about technology carrying its makers' values is a sharp counterpoint to the 'tools are neutral' reflex common in engineering.
    Key points
    • Argues technology is not antagonistic to humanity in itself, but is never neutral — it carries the values of whoever builds, funds and deploys it.
    • Warns of a 'Babel syndrome' promising limitless progress at the cost of human dignity, versus a 'Nehemiah approach' of shared responsibility.
    • Flags that power over ourselves is unprecedented yet increasingly concentrated in private hands rather than democratic institutions.
    • Insists work has dignity independent of productivity metrics and opposes reducing workers to 'costs of production.'
    • Frames human dignity as ontological — 'no machine can ever replace' it, regardless of efficiency.
    Provenance
    Article · Supporting source