Reframes agentic coding from a software engineering activity into an ML pipeline — which means the disciplines that matter shift toward eval, not deterministic review.
◆ Braid Daily · 2026-05-10
Chollet: agentic coding as machine learning
The lead
Primary signals
antirez: DeepSeek 4 on DGX Spark, 12 tokens/sec decode, prefill ~200 t/s
@antirez on X
A concrete, measured port of DeepSeek 4 to NVIDIA's small-form-factor DGX Spark. The 270 gigabytes per second memory bandwidth is the bottleneck — a real number worth filing alongside the M3 Max comparison.
“DS4 running on DGX Spark (GB10 / CUDA), private branch for now. 12 tokens/sec, the memory bandwidth is limited in this system, at 270GB/sec. But prefill is ways more aligned to M3 Max at ~200 t/s.”
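A rough way to read those two numbers together, as a back-of-envelope sketch rather than anything antirez published: if decode is assumed to be purely memory-bandwidth bound and the full 270 GB/s is assumed to be usable, dividing bandwidth by decode rate gives an upper bound on how much data can be streamed from memory per generated token. The function names here are hypothetical.

```python
def gb_streamed_per_token(bandwidth_gb_s: float, tokens_per_s: float) -> float:
    """Upper bound on GB read from memory per generated token.

    Assumes decode is purely bandwidth bound and the quoted peak
    bandwidth is fully used. Both are simplifying assumptions.
    """
    return bandwidth_gb_s / tokens_per_s


def max_decode_rate(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Inverse view: best-case tokens/sec for a given per-token read size."""
    return bandwidth_gb_s / gb_read_per_token


# Figures quoted in the post: 270 GB/s and 12 tokens/sec.
per_token = gb_streamed_per_token(270.0, 12.0)
print(f"at most {per_token:.1f} GB streamed per token")
```

With the quoted figures this comes to at most 22.5 GB per token, which is the kind of ceiling that makes bandwidth, not compute, the number to watch during decode.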
Elad Gil: the AI diffusion gap, in months
@eladgil on X
A practical map of who has access to what and when. It's a compounding gap: by the time a model reaches startup engineers, lab insiders are already three to four months into the next one, and the lag grows with each step outward.
“People at major AI labs (using internal models) 3-4 months ahead of startup silicon valley engineers. SV founders/eng 3-6 months ahead of NY. NY founders/eng 6-12 months ahead of rest of world.”
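To see how the quoted steps stack up, here is a small sketch that sums them. It assumes the gaps are simply additive, which the post does not state; it only gives the pairwise ranges. The `STEPS` table and `cumulative_lag` are hypothetical names used for illustration.

```python
# Pairwise gaps from the quote, in months: (group, low, high),
# each measured against the group one step closer to the labs.
STEPS = [
    ("SV founders/eng", 3, 4),
    ("NY founders/eng", 3, 6),
    ("rest of world", 6, 12),
]


def cumulative_lag(steps):
    """Running total of each group's lag behind lab insiders.

    Assumption: the pairwise gaps add linearly.
    """
    low_total = high_total = 0
    totals = []
    for group, low, high in steps:
        low_total += low
        high_total += high
        totals.append((group, low_total, high_total))
    return totals


for group, low, high in cumulative_lag(STEPS):
    print(f"{group}: {low}-{high} months behind lab insiders")
```

Under that additive assumption, the quoted ranges put NY founders 6 to 10 months and the rest of the world 12 to 22 months behind lab insiders.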
Virgil Maro: agency at the prompt boundary
@_virgil19 on X
Names something a lot of teams are quietly noticing — that AI tools amplify whatever the user brings, including the absence of a goal.
“the compounding shows up at the prompt boundary. high-agency users come pre-loaded with goals worth amplifying. low-agency users hand the model the goal too. AI doesn't generate the gap. it scales whatever shape”
Engineering moves to the consequence boundary
@FiftyOne_50_ on X
A clean restatement of what agentic coding actually shifts: not less engineering, just engineering located somewhere different — at the points where you can still say no.
“Agentic coding does not remove engineering. It moves engineering to the consequence boundary: What gets specified, tested, trusted, deployed, monitored, rolled back, and owned when the model is wrong.”
Gemma 4 MTP on MLX Swift: 30-40% faster on M5 Max
@adrgrondin on X
Multi-token prediction with a small drafter model is the speculative-decoding move, but with the drafter trained alongside the target model. 30 to 40 percent decode speedup for 900 megabytes of extra weights is a strong trade.
“Early WIP port of Gemma 4 multi-token prediction (MTP) on MLX Swift. With MTP, Gemma 31B is 30-40% faster on M5 Max and with zero quality degradation. A significant speedup by just adding a 900MB MTP drafter model.”
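For readers who want the draft-and-verify idea in concrete form, here is a toy sketch of greedy speculative decoding. It is not the Gemma MTP drafter or any MLX Swift API: `target_model` and `draft_model` are made-up deterministic functions, and the verification loop stands in for what a real system would do in one batched forward pass. It shows one reason a drafter can add speed without changing output: under greedy decoding, every kept token is one the target model would have chosen anyway.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # token prefix -> next token (greedy)


def target_model(prefix: List[int]) -> int:
    """Toy stand-in for the large model (hypothetical)."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    """Toy drafter: agrees with the target except at every 4th position."""
    token = target_model(prefix)
    return token if len(prefix) % 4 else (token + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    seq = list(prompt)
    for _ in range(n):
        seq.append(model(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n: int, k: int = 4) -> Tuple[List[int], int]:
    """Return (sequence, number of target verification passes)."""
    seq = list(prompt)
    goal = len(prompt) + n
    passes = 0
    while len(seq) < goal:
        # 1. The drafter proposes up to k tokens cheaply.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target checks them; counted as one pass per round.
        passes += 1
        accepted: List[int] = []
        for token in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)
            if token != expected:
                break  # keep the target's correction, drop the rest
        else:
            # Every draft token matched: take one bonus target token.
            if len(seq) + len(accepted) < goal:
                accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq, passes


prompt = [1, 2, 3]
spec_seq, passes = speculative_decode(target_model, draft_model, prompt, 16)
print(spec_seq == greedy_decode(target_model, prompt, 16), passes)
```

The real speedup depends on how often the drafter's guesses are accepted and on how cheap the drafter is relative to the target, neither of which this toy measures.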
Supporting links
METR: Claude Mythos Preview 50% time horizon hits 17 hours
reddit.com
Yesterday we promised to track who builds the next METR evaluation tasks. Today METR published an update showing Claude Mythos Preview's 50% time horizon at 17 hours, a measurable advance over the previous bar and the headline number from yesterday's evaluation-ceiling discussion.
NVIDIA Star Elastic: one checkpoint, three sizes via zero-shot slicing
reddit.com
A single checkpoint that contains 30 billion, 23 billion, and 12 billion parameter reasoning models, sliceable at inference time with no retraining. That collapses three deployment targets into one artifact and shifts where the inference budget gets spent.
Claude Opus 4.7 burns more tokens on German prompts
reddit.com
A practical reminder that the tokenizer is not language-neutral. German runs through the tokenizer at a meaningfully higher token count than English for the same content, and that translates to slower turns, smaller effective context, and higher bills.
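A minimal sketch of how a higher token count turns into smaller effective context and higher cost. Every number in the example call is a placeholder chosen for illustration, not a measurement of Claude Opus 4.7 or any tokenizer, and `language_overhead` is a hypothetical helper.

```python
def language_overhead(tokens_en: int, tokens_other: int,
                      context_window: int, price_per_mtok: float) -> dict:
    """Compare the same content tokenized in two languages.

    tokens_en / tokens_other: token counts you measured yourself.
    context_window: model context size in tokens.
    price_per_mtok: price per million tokens, in any currency.
    """
    ratio = tokens_other / tokens_en
    return {
        "token_ratio": ratio,
        # How much English-equivalent content fits in the window.
        "effective_context": context_window / ratio,
        "cost_en": tokens_en / 1_000_000 * price_per_mtok,
        "cost_other": tokens_other / 1_000_000 * price_per_mtok,
    }


# Placeholder inputs only: plug in counts from your own prompts.
report = language_overhead(tokens_en=1000, tokens_other=1300,
                           context_window=200_000, price_per_mtok=10.0)
for key, value in report.items():
    print(f"{key}: {value:.4f}")
```

The useful habit is measuring the ratio on your own prompts, since it varies by content and by tokenizer.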
Gemini API File Search goes multimodal
blog.google
Multimodal retrieval-augmented generation as a hosted API primitive. The change in scope is the part to notice — the file-search endpoint now indexes images and PDFs alongside text, so callers don't need to maintain a separate visual retrieval pipeline.
Companion episode
Seventeen Hours, Three Sizes, and the Prompt Boundary
Today's full Braid dispatch is up on the site — every link above ran past the editorial agent before it landed here.