◆ Braid Daily · 2026-05-10

Chollet: agentic coding as machine learning

10 May 2026

The lead

Chollet reframes agentic coding from a software engineering activity into an ML pipeline, which means the disciplines that matter shift toward evaluation, not deterministic review.
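
One concrete eval discipline that fits this framing is scoring an agent statistically over many sampled attempts instead of reviewing a single output. Below is a minimal sketch of the unbiased pass@k estimator from the Codex paper (Chen et al., 2021). It assumes n sampled attempts per task, of which c pass that task's tests. It illustrates the idea and is not a method taken from Chollet's post. The task results are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples passes),
    given n samples of which c passed the tests."""
    if k > n:
        raise ValueError("k must not exceed n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    # 1 minus the probability that a random size-k subset contains only failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; each entry is (n_samples, n_passed)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical eval run: three tasks, ten agent attempts each.
runs = [(10, 9), (10, 3), (10, 0)]
print(f"pass@1 = {mean_pass_at_k(runs, 1):.3f}")
print(f"pass@5 = {mean_pass_at_k(runs, 5):.3f}")
```

The point of the estimator is that a single green run says little. The number to track is how often the pipeline succeeds across samples.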

Primary signals

antirez: DeepSeek 4 on DGX Spark, 12 tokens/sec decode, ~200 tokens/sec prefill

@antirez on X

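A concrete, measured port of DeepSeek 4 to NVIDIA's small-form-factor DGX Spark. Decode is the constrained side: the 270 GB/s memory bandwidth holds it to 12 tokens/sec. Prefill depends more on compute than on bandwidth, and it lands at roughly 200 tokens/sec, close to the M3 Max. Both are real numbers worth filing alongside the M3 Max comparison.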

“DS4 running on DGX Spark (GB10 / CUDA), private branch for now. 12 tokens/sec, the memory bandwidth is limited in this system, at 270GB/sec. But prefill is ways more aligned to M3 Max at ~200 t/s.”
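
Single-stream decode is usually memory-bandwidth-bound, because each new token has to stream the active weights through memory. Under that simplifying assumption, throughput is at most bandwidth divided by the bytes read per token. antirez's two figures therefore bound how much data each decoded token touches. This is a back-of-envelope sketch. The per-token figure it derives is an inference from the quoted numbers, not something antirez reported.

```python
def decode_ceiling_tokens_per_sec(bandwidth_gb_s: float, gb_per_token: float) -> float:
    """Upper bound on decode speed if every token streams gb_per_token through memory."""
    return bandwidth_gb_s / gb_per_token


def implied_gb_per_token(bandwidth_gb_s: float, tokens_per_sec: float) -> float:
    """Invert the bound: the most data each token can be reading at the observed speed."""
    return bandwidth_gb_s / tokens_per_sec


# Figures quoted above: 270 GB/s on DGX Spark, 12 tokens/sec decode.
per_token = implied_gb_per_token(270.0, 12.0)
print(f"at most {per_token:.1f} GB read per decoded token")

# Hypothetical: the same per-token traffic on a machine with twice the bandwidth.
print(f"ceiling at 540 GB/s: {decode_ceiling_tokens_per_sec(540.0, per_token):.0f} tokens/sec")
```

Prefill escapes this bound because it processes many prompt tokens per pass over the weights. That is consistent with prefill running so much faster than decode here.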

Elad Gil: the AI diffusion gap, in months

@eladgil on X

A practical map of who has access to what, and when. The gaps stack: by the time a model lands at a startup, lab insiders are already three to four months into the next one.

“People at major AI labs (using internal models) 3-4 months ahead of startup silicon valley engineers. SV founders/eng 3-6 months ahead of NY. NY founders/eng 6-12 months ahead of rest of world.”
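
Gil measures each tier against the one before it, so the lags accumulate down the chain. The short sketch below sums them to give each tier's distance from lab insiders. It assumes the ranges simply add, because Gil gives no cumulative figure himself.

```python
# Per-tier lags from the quote, in months: (low, high), each relative to the tier above.
TIERS = [
    ("SV founders/eng", (3, 4)),   # behind lab insiders
    ("NY founders/eng", (3, 6)),   # behind SV
    ("rest of world", (6, 12)),    # behind NY
]


def cumulative_lag(tiers):
    """Sum adjacent gaps to get each tier's lag behind lab insiders."""
    low = high = 0
    out = []
    for name, (gap_low, gap_high) in tiers:
        low, high = low + gap_low, high + gap_high
        out.append((name, low, high))
    return out


for name, low, high in cumulative_lag(TIERS):
    print(f"{name}: {low}-{high} months behind lab insiders")
```

Read this way, the far end of the chain sits roughly one to two years behind the labs.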

Virgil Maro: agency at the prompt boundary

@_virgil19 on X

Names something a lot of teams are quietly noticing — that AI tools amplify whatever the user brings, including the absence of a goal.

“the compounding shows up at the prompt boundary. high-agency users come pre-loaded with goals worth amplifying. low-agency users hand the model the goal too. AI doesn't generate the gap. it scales whatever shape”

Engineering moves to the consequence boundary

@FiftyOne_50_ on X

A clean restatement of what agentic coding actually shifts: not less engineering, just engineering located somewhere different — at the points where you can still say no.

“Agentic coding does not remove engineering. It moves engineering to the consequence boundary: What gets specified, tested, trusted, deployed, monitored, rolled back, and owned when the model is wrong.”

Gemma 4 MTP on MLX Swift: 30-40% faster on M5 Max

@adrgrondin on X

Multi-token prediction with a small drafter model is the speculative-decoding move, but with the drafter trained alongside the target model. 30 to 40 percent decode speedup for 900 megabytes of extra weights is a strong trade.

“Early WIP port of Gemma 4 multi-token prediction (MTP) on MLX Swift. With MTP, Gemma 31B is 30-40% faster on M5 Max and with zero quality degradation. A significant speedup by just adding a 900MB MTP drafter model.”
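
The "zero quality degradation" claim comes from the verification step of speculative decoding. A cheap drafter proposes several tokens, and the target model checks them. Only tokens the target would have produced itself are kept. Below is a minimal greedy-decoding sketch with toy stand-in models. The functions target_next and draft_next are hypothetical and are not Gemma or MLX Swift APIs. Real implementations score all drafted positions in one batched target pass, which is where the speedup comes from. With temperature sampling they use rejection sampling instead of exact matching.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token function


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: the target model alone, one token per step."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. Drafter proposes up to k tokens autoregressively (cheap).
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. Target verifies each position (one batched pass in real systems).
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: take the target's token, stop
                break
            accepted.append(tok)
        seq.extend(accepted)
    return seq


# Hypothetical toy models over an 11-token vocabulary.
def target_next(seq: List[int]) -> int:
    return (seq[-1] * 3 + len(seq)) % 11


def draft_next(seq: List[int]) -> int:
    guess = target_next(seq)
    return guess if len(seq) % 5 else (guess + 1) % 11  # wrong every fifth position


prompt = [1, 2]
assert speculative_decode(target_next, draft_next, prompt, 20) == greedy_decode(target_next, prompt, 20)
```

The final assertion is the quality guarantee in miniature: however often the drafter is wrong, the output matches what the target would have generated alone. The drafter only changes how many target steps can be verified at once.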

Supporting links

METR: Claude Mythos Preview 50% time horizon hits 17 hours

reddit.com

Yesterday we promised to track who builds the next METR evaluation tasks. Today METR published an update putting Claude Mythos Preview's 50% time horizon at 17 hours. That is a measurable advance over the previous bar and the headline number behind yesterday's evaluation-ceiling discussion.
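
For readers new to the metric: METR's time horizon comes from fitting a logistic curve of model success probability against the log of task length. Task length is measured by how long the task takes skilled humans. The horizon is the length at which predicted success crosses 50%. Below is a minimal sketch of that last step. The coefficients alpha and beta are hypothetical placeholders, not METR's fitted values.

```python
import math


def success_prob(minutes: float, alpha: float, beta: float) -> float:
    """Logistic model: P(success) = sigmoid(alpha + beta * log2(task minutes))."""
    z = alpha + beta * math.log2(minutes)
    return 1.0 / (1.0 + math.exp(-z))


def time_horizon(alpha: float, beta: float, level: float = 0.5) -> float:
    """Task length in minutes at which predicted success equals `level`.
    Solves sigmoid(alpha + beta * log2 t) = level for t; beta < 0 because
    longer tasks are harder."""
    logit = math.log(level / (1.0 - level))
    return 2.0 ** ((logit - alpha) / beta)


# Hypothetical coefficients, chosen only for illustration.
alpha, beta = 6.0, -1.0
h50 = time_horizon(alpha, beta)
print(f"50% horizon: {h50:.0f} min ({h50 / 60:.1f} h)")
print(f"success at that length: {success_prob(h50, alpha, beta):.2f}")
print(f"80% horizon: {time_horizon(alpha, beta, 0.8):.1f} min")
```

The 80% line shows why the 50% figure is a midpoint, not a reliability guarantee. At any stricter success level the same fit yields a much shorter horizon.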

NVIDIA Star Elastic: one checkpoint, three sizes via zero-shot slicing

reddit.com

A single checkpoint that contains 30-billion-, 23-billion-, and 12-billion-parameter reasoning models, sliceable at inference time with no retraining. That collapses three deployment targets into one artifact and shifts where the inference budget gets spent.
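
Nested ("elastic" or Matryoshka-style) models make this possible by training smaller sub-networks that are literal subsets of the largest one. A smaller model is then obtained by slicing the stored weights, without storing a second copy. Below is a toy sketch of width slicing on a two-layer MLP in plain Python. How Star Elastic actually chooses which components to keep is not described here. The keep-the-leading-hidden-units rule is an assumption for illustration only. The weights are random; in a real elastic model, training is what makes each slice a usable model.

```python
import random
from typing import List

Matrix = List[List[float]]


def make_mlp(d_in: int, d_hidden: int, d_out: int, seed: int = 0):
    """One full-width checkpoint: W1 is d_hidden x d_in, W2 is d_out x d_hidden."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(d_hidden)] for _ in range(d_out)]
    return w1, w2


def forward(w1: Matrix, w2: Matrix, x: List[float], width: int) -> List[float]:
    """Run the sub-network that keeps only the first `width` hidden units.
    No weights are copied or retrained; we just index a prefix of each matrix."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1[:width]]  # ReLU
    return [sum(row[j] * hidden[j] for j in range(width)) for row in w2]


def param_count(d_in: int, d_out: int, width: int) -> int:
    """Weights touched by a sub-network of the given hidden width (no biases)."""
    return width * d_in + d_out * width


w1, w2 = make_mlp(d_in=8, d_hidden=64, d_out=4)
x = [0.5] * 8
for width in (64, 48, 24):  # three "sizes" served from one set of weights
    y = forward(w1, w2, x, width)
    print(width, param_count(8, 4, width), [round(v, 3) for v in y])
```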

Claude Opus 4.7 burns more tokens on German prompts

reddit.com

A practical reminder that the tokenizer is not language-neutral. The same content costs meaningfully more tokens in German than in English, which translates to slower turns, smaller effective context, and higher bills.
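
All three costs scale with one number: the ratio of German to English tokens for the same content. Below is a back-of-envelope sketch. The ratio, window size, and price are hypothetical placeholders, not measured Opus 4.7 figures.

```python
def language_overhead(ratio: float, context_tokens: int,
                      english_tokens: int, usd_per_mtok: float) -> dict:
    """ratio = tokens for the German text / tokens for the same English text."""
    return {
        # Room left for content, measured in English-equivalent tokens.
        "effective_context": context_tokens / ratio,
        # Same content, billed at `ratio` times the English token count.
        "cost_usd": english_tokens * ratio * usd_per_mtok / 1_000_000,
        # Decode time grows roughly linearly with the number of output tokens.
        "relative_latency": ratio,
    }


# Hypothetical inputs: 1.3x inflation, 200k window, 50k-token job, $15 per million tokens.
print(language_overhead(ratio=1.3, context_tokens=200_000,
                        english_tokens=50_000, usd_per_mtok=15.0))
```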

Gemini API File Search goes multimodal

blog.google

Multimodal retrieval-augmented generation as a hosted API primitive. The change in scope is the part to notice — the file-search endpoint now indexes images and PDFs alongside text, so callers don't need to maintain a separate visual retrieval pipeline.
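
Conceptually, a single multimodal index embeds text chunks, images, and PDF pages into one vector space and ranks them together, so one query can surface any modality. Below is a toy sketch of that retrieval step with hand-written 3-dimensional vectors. It is not the Gemini File Search API, whose calls are not shown here. The Entry type, search function, and document names are hypothetical. Real systems get these vectors from a multimodal embedding model.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    doc_id: str
    modality: str          # "text", "image", or "pdf_page"
    vector: List[float]    # embedding in a space shared by all modalities


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(index: List[Entry], query: List[float], k: int = 2) -> List[Entry]:
    """Rank every entry by similarity to the query, regardless of modality."""
    return sorted(index, key=lambda e: cosine(e.vector, query), reverse=True)[:k]


index = [
    Entry("pricing.md", "text", [0.9, 0.1, 0.0]),
    Entry("architecture.png", "image", [0.1, 0.9, 0.2]),
    Entry("spec.pdf#p3", "pdf_page", [0.2, 0.8, 0.1]),
]
for hit in search(index, query=[0.0, 1.0, 0.1]):
    print(hit.doc_id, hit.modality)
```

The design point is that there is one ranking, not a text ranking and an image ranking merged afterwards. That is the separate pipeline the hosted endpoint removes.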

Companion episode

Seventeen Hours, Three Sizes, and the Prompt Boundary

2026-05-10 · 00:24:34

Today's full Braid dispatch is up on the site — every link above ran past the editorial agent before it landed here.