◆ Braid Daily · 2026-06-06

DeepSeek V4 reaches local hardware, tuned to rival Opus 4.7

6 June 2026

DeepSeek's V4 series is getting llama.cpp support, and a Latent Space guest claims he made it outperform Opus 4.7 on taste, not scale.

The lead

DeepSeek's V4 series is now getting llama.cpp support through an early PR, putting a frontier open-weights model within reach of a single machine. On Latent Space, CommandCodeAI's Ahmad Awais walks through how he made DeepSeek V4 outperform Claude Opus 4.7, leaning on tool-calling reliability and repair logic rather than raw scale.
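To make "repair logic" concrete, here is a minimal sketch of what a harness might do with a malformed tool call before giving up on it. This is not the CommandCodeAI implementation, which the episode summary does not detail; the tool registry, the function names, and the two repairs shown are all assumptions chosen for illustration.

```python
import json
import re

# Hypothetical tool registry: tool name -> required argument names.
TOOLS = {"read_file": {"path"}, "run_tests": {"target"}}


def repair_json(text):
    """Apply two cheap textual repairs to a near-JSON string."""
    text = text.strip()
    # Drop a surrounding Markdown code fence if the model added one.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text)
    # Remove trailing commas before a closing brace or bracket.
    # Naive: this would also rewrite ",}" inside a string value.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return text


def parse_tool_call(raw):
    """Return (call, error); exactly one of the two is None."""
    # Try the raw text first, then the repaired text.
    for candidate in (raw, repair_json(raw)):
        try:
            call = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if not isinstance(call, dict):
            return None, "tool call must be a JSON object"
        name = call.get("name")
        args = call.get("arguments", {})
        if name not in TOOLS:
            return None, f"unknown tool: {name!r}"
        if not isinstance(args, dict):
            return None, "arguments must be a JSON object"
        missing = TOOLS[name] - set(args)
        if missing:
            return None, f"missing arguments: {sorted(missing)}"
        return call, None
    return None, "could not parse tool call as JSON"


# A fenced call with a trailing comma is recovered rather than rejected.
raw = '```json\n{"name": "read_file", "arguments": {"path": "a.txt",}}\n```'
call, error = parse_tool_call(raw)
```

In a loop like this, the error string would typically be fed back to the model as a retry prompt, so a formatting slip costs one extra turn instead of a failed task.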

Models and local inference

DeepSeek V4 Flash arrives on llama.cpp

r/LocalLLaMA

An early work-in-progress PR brings DeepSeek V4 support to llama.cpp, opening the series up for local experimentation. The author warns it is at a very early stage.

“the DeepSeek V4 series is finally getting supported on llama.cpp with this PR”

SAGE-PTQ: ultra-low-bit quantization for large models

arXiv

A graph-guided post-training quantization method that targets very low bit widths to cut the inference and deployment cost of large models.

Benchmarks under pressure

Agents' Last Exam: benchmarks that track economic value

arXiv

A large new benchmark built around economically valuable, real-world professional tasks, aimed at the gap between strong benchmark scores and GDP-relevant work.

SentinelBench: a benchmark for long-running monitoring agents

arXiv

A benchmark for agents that monitor work spanning minutes to hours, rather than the one-shot tasks most evals assume.

When an LLM judge can be talked out of its verdict

arXiv

Tests whether a large language model acting as judge can be talked out of a verdict it has already reached, a direct challenge to the assumed stability behind automated benchmarking pipelines.

Agents and institutional knowledge

AI Skills as a primitive for institutional knowledge

arXiv

Proposes Agentic Knowledge Units as a structured way to capture the institutional knowledge enterprises accumulate, so agents can act on it instead of guessing.

PACT: action-state communication for multi-agent systems

arXiv

Structures inter-agent messages around action and state to cut communication overhead and cost in multi-agent systems built on large language models.

SciVisAgentSkills: reusable skills for scientific visualization

arXiv

Designs and evaluates a set of reusable agent skills for scientific data analysis and visualization, a concrete test of the skills-as-primitive idea.

Governance, cost, and the grid

Zero-knowledge verification for frontier AI training

arXiv

Argues that zero-knowledge methods such as zero-knowledge virtual machines and Merkle commitments can verify how much compute went into training a model, a building block for compute-based governance.
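For readers unfamiliar with the Merkle commitment half of that proposal, a minimal sketch follows. It shows only the commitment and inclusion-proof mechanics; it is not the paper's protocol and contains no zero-knowledge component. The log-entry format and function names are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    # Pad odd levels by duplicating the last node (a simplification;
    # production designs handle odd levels more carefully).
    if len(level) % 2:
        level = level + [level[-1]]
    pairs = range(0, len(level), 2)
    return level, [_h(b"\x01" + level[i] + level[i + 1]) for i in pairs]


def merkle_root(leaves):
    """Commit to a list of byte strings with a single 32-byte root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        _, level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Return the sibling hashes needed to recompute the root from one leaf."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        padded, parents = _next_level(level)
        sibling = index ^ 1
        # Record the sibling hash and whether it sits to the left.
        proof.append((padded[sibling], sibling < index))
        level, index = parents, index // 2
    return proof


def verify_proof(leaf, proof, root):
    """Check that a leaf is included under the committed root."""
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        if sibling_is_left:
            node = _h(b"\x01" + sibling + node)
        else:
            node = _h(b"\x01" + node + sibling)
    return node == root


# Hypothetical per-step training log entries.
log = [b"step=0 flops=1e15", b"step=1 flops=1e15", b"step=2 flops=1e15"]
root = merkle_root(log)
proof = merkle_proof(log, 2)
ok = verify_proof(log[2], proof, root)
```

A developer who publishes the root has committed to the whole log and can later reveal one entry and its proof for an auditor to check against that root. Proving something about the total compute without revealing entries is the part that would need the zero-knowledge machinery the paper discusses.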

Carbon and energy cost of US hyperscale data centers

arXiv

Estimates the carbon emissions and energy consumption associated with the rapid build-out of US hyperscale data centers.

Insurance of agentic AI

arXiv

Looks at how insurance and capital might price the risk of agentic systems that act on their own, not just generate text.

On the timeline

Trump administration pushes AI into healthcare

Washington Post via Techmeme

A report on the administration's effort to integrate AI across healthcare, including an FDA regulatory fast track for digital health tools like AI chatbots.

Anthropic on handing its own development to AI

r/ClaudeAI

A reader flags Anthropic's new piece on giving AI systems more of the work of building Anthropic's own models, with figures on how far that already goes.

“When AI builds itself”

Companion episode

When the Harness Carries the Model

2026-06-06 · 00:17:31

DeepSeek V4 continues this week's open-weights streak, from MiniMax M3 on Monday through a steady run of agentic-coding scores. Today's benchmark papers are a useful counterweight: as more of model development gets handed to the models themselves, the harder question is whether any of it shows up as durable, economically real work.