DeepSeek's V4 series is now getting llama.cpp support through an early PR, putting a frontier open-weights model within reach of a single machine. On Latent Space, CommandCodeAI's Ahmad Awais walks through making DeepSeek V4 outperform Claude Opus 4.7, leaning on tool-calling reliability and repair logic rather than raw scale.
◆ Braid Daily · 2026-06-06
DeepSeek V4 reaches local hardware, tuned to rival Opus 4.7
DeepSeek's V4 series is getting llama.cpp support, and a Latent Space guest claims he made it outperform Opus 4.7 on taste, not scale.
The lead
Models and local inference
DeepSeek V4 Flash arrives on llama.cpp
r/LocalLLaMA
An early work-in-progress PR brings DeepSeek V4 support to llama.cpp, opening the series up for local experimentation. The author warns it is at a very early stage.
“the DeepSeek V4 series is finally getting supported on llama.cpp with this PR”
SAGE-PTQ: ultra-low-bit quantization for large models
arXiv
A graph-guided post-training quantization method aimed at cutting the inference and deployment cost of running large models at very low bit widths.
Benchmarks under pressure
Agents' Last Exam: benchmarks that track economic value
arXiv
A large new benchmark built around economically valuable, real-world professional tasks, aimed at the gap between strong benchmark scores and GDP-relevant work.
SentinelBench: a benchmark for long-running monitoring agents
arXiv
A benchmark for agents that monitor work spanning minutes to hours, rather than the one-shot tasks most evals assume.
When an LLM judge can be talked out of its verdict
arXiv
Tests whether a large language model acting as judge can be talked out of a verdict it has already reached, a direct challenge to the assumed stability behind automated benchmarking pipelines.
Agents and institutional knowledge
AI Skills as a primitive for institutional knowledge
arXiv
Proposes Agentic Knowledge Units as a structured way to capture the institutional knowledge enterprises accumulate, so agents can act on it instead of guessing.
PACT: action-state communication for multi-agent systems
arXiv
Structures inter-agent messages around action and state to cut communication overhead and cost in multi-agent systems built on large language models.
SciVisAgentSkills: reusable skills for scientific visualization
arXiv
Designs and evaluates a set of reusable agent skills for scientific data analysis and visualization, a concrete test of the skills-as-primitive idea.
Governance, cost, and the grid
Zero-knowledge verification for frontier AI training
arXiv
Argues that zero-knowledge methods such as zero-knowledge virtual machines and Merkle commitments can verify how much compute went into training a model, a building block for compute-based governance.
Carbon and energy cost of US hyperscale data centers
arXiv
Estimates the carbon emissions and energy consumption associated with the rapid build-out of US hyperscale data centers.
Insurance of agentic AI
arXiv
Looks at how insurance and capital might price the risk of agentic systems that act on their own, not just generate text.
On the timeline
Trump administration pushes AI into healthcare
Washington Post via Techmeme
A report on the administration's effort to integrate AI across healthcare, including an FDA regulatory fast track for digital health tools like AI chatbots.
Anthropic on handing its own development to AI
r/ClaudeAI
A reader flags Anthropic's new piece on giving AI systems more of the work of building Anthropic's own models, with figures on how far that already goes.
“When AI builds itself”
Companion episode
When the Harness Carries the Model
DeepSeek V4 continues this week's open-weights streak, from MiniMax M3 on Monday through a steady run of agentic-coding scores. Today's benchmark papers are a useful counterweight: as more of model development gets handed to the models themselves, the harder question is whether any of it shows up as durable, economically real work.