◆ Braid Daily · 2026-05-05

VS Code reverts Copilot attribution default

5 May 2026

Microsoft pulls back the Co-authored-by: Copilot default, CAISI signs three more frontier labs, and DeepSeek V4 Pro lands in the top tier…

The lead

Following last week's segment on Microsoft switching the Co-authored-by: Copilot trailer on by default without consent, the VS Code team has reverted the default in 1.119, scoped attribution to AI-generated changes only, and floated 'assisted-by' as a replacement trailer. Users will now have to opt in.

Read source
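For anyone auditing their own history for these trailers, here is a minimal sketch of a checker. It assumes the common convention that trailers are "Key: value" lines in the last paragraph of a commit message, and it is much simpler than what git's own trailer parsing does. The function names and the example addresses are hypothetical, and 'assisted-by' is only the trailer name floated above, not a settled format.

```python
import re

# Simplified trailer shape: "Key: value" on a single line.
TRAILER_RE = re.compile(r"^([A-Za-z0-9-]+):\s*(.+)$")


def find_trailers(message: str) -> list[tuple[str, str]]:
    """Return (lowercased key, value) pairs from the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []  # a lone subject line has no trailer block
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            trailers.append((match.group(1).lower(), match.group(2).strip()))
    return trailers


def ai_attribution(message: str) -> list[tuple[str, str]]:
    """Keep only trailers that credit Copilot or use the proposed assisted-by key."""
    return [
        (key, value)
        for key, value in find_trailers(message)
        if key == "assisted-by"
        or (key == "co-authored-by" and "copilot" in value.lower())
    ]


# Hypothetical commit message; the addresses are placeholders.
example = """Fix off-by-one in pager

Clamp the page index before slicing.

Co-authored-by: Copilot <copilot@example.invalid>
Signed-off-by: Dev Example <dev@example.invalid>
"""

print(ai_attribution(example))
```

Feeding it the output of `git log --format=%B` one commit at a time would be one way to use it; multi-line trailer values are not handled in this sketch.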

Policy and labor

CAISI signs Google DeepMind, Microsoft, and xAI to pre-deployment testing

NIST

Every major US frontier lab is now under a voluntary pre-deployment testing regime, with over 40 evaluations completed to date. The mechanism — labs handing over models with reduced or removed safeguards to a measurement body — is the operational shape any future mandatory regime would inherit.

“Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications.”

Read source

DeepMind staff vote 98% to unionize over military contracts

The Verge

Staff at the London headquarters petitioned Google to recognize the Communication Workers Union and Unite the Union as joint representatives, with a stated focus on blocking DeepMind technology from being used by the Israeli and US militaries. It is the first serious union vote inside a frontier AI lab tied explicitly to refusing military deployment.

Read source

Chrome silently installs a 4GB Gemini Nano weights file

thatprivacyguy.com

On an audit profile that received zero human input, Chrome still downloaded the model to OptGuideOnDeviceModel/2025.8.8.1141/weights.bin and re-downloaded it after deletion. The visible 'AI Mode' pill in the address bar routes to Google servers, not the local model.

“No dialogue at first launch. No checkbox in Settings.”

Read source
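To check a machine for the same file, a small sketch along these lines may help. It only relies on the path fragment quoted above (OptGuideOnDeviceModel/<version>/weights.bin). Where the Chrome user data directory lives varies by platform and is left to the caller, and the function name is hypothetical.

```python
from pathlib import Path


def find_on_device_models(user_data_dir: Path) -> list[tuple[Path, int]]:
    """Return (path, size in bytes) for every OptGuideOnDeviceModel/<version>/weights.bin."""
    hits = []
    # "**" also matches zero directories, so a folder directly under the root is found too.
    for weights in user_data_dir.glob("**/OptGuideOnDeviceModel/*/weights.bin"):
        if weights.is_file():
            hits.append((weights, weights.stat().st_size))
    return sorted(hits)


def describe(hits: list[tuple[Path, int]]) -> list[str]:
    """Format each hit as 'path  size in GB' for printing."""
    return [f"{path}  {size / 1e9:.2f} GB" for path, size in hits]
```

Calling `describe(find_on_device_models(Path(...)))` with your own profile directory lists any copies and their sizes. Running it again after deleting the file and restarting the browser would be one way to test the re-download behaviour the audit describes.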

Models and benchmarks

DeepSeek V4 Pro lands #4 on FoodTruck Bench at 17x lower price

r/LocalLLaMA

DeepSeek V4 Pro ties Grok 4.3 Latest on outcome, behind Opus 4.6 and GPT-5.2, at $0.435/M input and $0.87/M output versus GPT-5.2's $1.75/$14. Against Grok at the same price tier: zero loans, ~6x less food waste, 30% more meals served, and a 2.4x tighter outcome distribution.

“The China-US frontier gap on this benchmark used to feel like a year. Right now it's about ten weeks.”

Read source
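The per-token prices quoted above can be turned into ratios directly. The sketch below does that arithmetic and prices one job under both models. The token counts for the job are hypothetical and only there to show how the input/output mix moves the blended ratio.

```python
# List prices in USD per million tokens, as quoted in the item above.
DEEPSEEK_V4_PRO = {"input": 0.435, "output": 0.87}
GPT_5_2 = {"input": 1.75, "output": 14.00}


def price_ratio(expensive: dict[str, float], cheap: dict[str, float]) -> dict[str, float]:
    """How many times more the expensive model costs, per token type."""
    return {kind: expensive[kind] / cheap[kind] for kind in cheap}


def job_cost(prices: dict[str, float], input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one job at the given per-million-token prices."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


ratios = price_ratio(GPT_5_2, DEEPSEEK_V4_PRO)
print(f"input: {ratios['input']:.1f}x, output: {ratios['output']:.1f}x")

# Hypothetical workload: 1M input tokens, 200K output tokens.
cheap = job_cost(DEEPSEEK_V4_PRO, 1_000_000, 200_000)
pricey = job_cost(GPT_5_2, 1_000_000, 200_000)
print(f"blended: {pricey / cheap:.1f}x")
```

On list prices alone the gap is roughly 4x on input and 16x on output, so the 17x headline figure presumably reflects the benchmark's own per-run cost accounting, which the item does not break down.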

AgentFloor: how far up the tool-use ladder small open-weight models can go

arXiv

A deterministic 30-task benchmark organized as a six-tier capability ladder, evaluated across 16 open-weight models from 0.27B to 32B alongside GPT-5 over 16,542 scored runs. The strongest open-weight model matches GPT-5 in aggregate, but frontier models hold a real advantage on long-horizon planning with persistent constraints.

Read source

The tool-use tax: when adding tools makes reasoning worse

arXiv

Tool-augmented reasoning does not always beat native chain-of-thought, especially under semantic distractors. The paper decomposes the cost into prompt formatting, tool-calling protocol overhead, and execution gain, and proposes G-STEP as an inference-time gate that recovers some of the loss but not all of it.

Read source

Opus 4.7 regression reports

r/Anthropic

A 225-upvote thread on regressions in Opus 4.7 versus 4.6 in agentic coding setups, with multiple commenters reporting similar issues. Commenters speculate about a smaller base model tuned for harness benchmarks; whether the regression is real or a sampling artifact, the perception is loud enough to warrant a closer look.

Read source

Tools and infra

Qwen3.6 27B FP8 with 200K BF16 KV cache at 80 TPS on a single 48GB card

r/LocalLLaMA

A concrete recipe for keeping a frontier-class agent loop on a single RTX 5000 PRO 48GB, with the argument that for agentic coding the KV cache precision matters more than the weights because errors compound across long tool-use loops.

“A quantized model with quantized KV will inevitably compound errors faster than non-quantized ones, which noticeably impacts agentic coding.”

Read source
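For a sense of why cache precision and context length compete for the same VRAM, here is a back-of-envelope sizing sketch. It assumes a standard attention stack in which every layer stores one key and one value vector per KV head per token. The layer count, head count, and head dimension below are placeholders chosen for the arithmetic, not Qwen3.6's actual configuration, which the post summary does not give.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_element: int) -> int:
    """Memory for keys plus values across all layers, for one sequence."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_element  # 2 = K and V
    return per_token * tokens


# Placeholder architecture (NOT the real model config).
LAYERS, KV_HEADS, HEAD_DIM = 48, 4, 128
CONTEXT = 200_000

bf16 = kv_cache_bytes(CONTEXT, LAYERS, KV_HEADS, HEAD_DIM, bytes_per_element=2)
fp8 = kv_cache_bytes(CONTEXT, LAYERS, KV_HEADS, HEAD_DIM, bytes_per_element=1)

print(f"BF16 cache: {bf16 / 1e9:.1f} GB")
print(f"FP8 cache:  {fp8 / 1e9:.1f} GB")
```

Whatever the real dimensions are, halving the element size halves the cache, which is the memory the recipe chooses to spend on BF16 rather than reclaim.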

vibevoice.cpp: Microsoft VibeVoice ported to ggml

r/LocalLLaMA

Pure C++ ggml port covering TTS with voice cloning and long-form ASR with diarization, across CPU, CUDA, Metal, Vulkan, and hipBLAS backends. 0.41 real-time factor on CUDA Q4_K with ~6GB peak RSS; 17 minutes of audio in one shot at 1.94 RTF on CPU Q8_0.

Read source
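Those real-time factors are easier to read as wall-clock time. The sketch below assumes the usual convention that RTF is processing time divided by audio duration, so values under 1 are faster than real time. The post may define it differently, and the function name is hypothetical.

```python
def processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Wall-clock seconds needed for the audio, assuming RTF = processing time / audio duration."""
    return audio_seconds * rtf


# The CPU Q8_0 figure quoted above: 17 minutes of audio at 1.94 RTF.
cpu_seconds = processing_seconds(17 * 60, 1.94)
print(f"CPU Q8_0: {cpu_seconds / 60:.1f} minutes for 17 minutes of audio")

# The CUDA Q4_K figure, expressed per minute of audio.
cuda_seconds = processing_seconds(60, 0.41)
print(f"CUDA Q4_K: {cuda_seconds:.1f} seconds per minute of audio")
```

Under that convention the one-shot CPU run takes about 33 minutes, while the CUDA path runs at a bit under half the audio's duration.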

A 5-step lead enrichment pipeline replaced with one Claude skill

r/ClaudeAI

An operator collapsed Apollo plus People Data Labs plus a verification tool plus a manual HubSpot import into a single Claude skill wired to three MCPs — Crustdata, FullEnrich, and HubSpot. Over an hour of work and three vendors became about five minutes; the top reply names the tradeoff as vendor concentration risk replacing integration complexity.

Read source

Threads worth pulling

When everyone has AI and the company still learns nothing

robert-glaser.de

Glaser's argument is that adoption typically becomes 'everywhere, uneven, partially hidden, difficult to compare, and not yet connected to organizational learning.' The right metric is not token spend or seats; it is what changed in the team because of the spend.

“Individual productivity gains from AI do not automatically become organizational gains.”

Read source

Companion episode

VS Code Walks It Back, CAISI Signs Three Labs, and the Frontier Gap Compresses to Ten Weeks

2026-05-05 · 00:28:24


Two threads worth tracking together this week: the AgentFloor paper argues that small open-weight models can carry more of the agent loop than the routing-by-default-to-frontier crowd assumes, and the tool-use tax paper argues that adding tools to a weak reasoner can make things worse. Pair those with the Qwen3.6 KV-cache recipe, and the case for keeping the routine calls local keeps gaining concrete numbers.