BRAID
The number nobody optimized for / DISPATCH 042

Dispatch 042 · 2026-05-30 GSV The Number Nobody Optimized For


00:18:29 · 20 sources

“Two documents from the same week that don't contradict each other. One says Opus 4.8 jumped on math and slipped on business ops. The other says the whole bar-chart genre measures the harness as much as the model.”

— Lenar Kess, today's narration

Claude Opus 4.8 landed overnight with a math score that leapt and a business-ops score that fell — and reading the release honestly means distrusting the chart. Lenar and Damra work through the gap between the number that moved and the number that matters, then chase it into agent budgets, the protocol wars, local-inference tooling, Mistral's on-prem bet, and the power grid.

Chapters

  1. 00:00:00 Transcript

Sources

20 cited
  1. AI Engineer · 20m12s

    Video

    Why your agents need decision traces, not just documents — Zach Blumenfeld, Neo4j — A knowledge base tells a financial analyst agent the risk factors. A context graph tells it whether to reject or accept, because it…

    www.youtube.com/watch?v=B9h9ovW5H9U →
    Context
    Directly addresses agentic tools and advanced AI infrastructure (graph DBs, RAG extension, decision tracing), which is core to the podcast topic.
    Provenance
    Video · Supporting source
  2. Techmeme - Industry Adjacent (US)

    Article

    Executives at Uber, Meta, Microsoft, and other companies are trying to rein in "tokenmaxxing" by employees, which led to ballooning AI use costs (Bradley Olson/Wall Street Journal) - Bradley Olson / Wall Street Journal.…

    www.techmeme.com/260529/p22 →
    Context
    Directly addresses the cost and resource constraints (compute/money) of AI, a core topic.
    Provenance
    Article · Supporting source
  3. @xai (xAI)

    X

    grok-build-0.1 is now available via the xAI API in public beta. This is the same model that powers the Grok Build CLI and excels at agentic coding. Priced at $1/m input and $2/m output, it’s extremely cost effective,…

    x.com/xai/status/2060392249402552457 →
    Context
    Announces public-beta API access to grok-build-0.1, the agentic coding model behind the Grok Build CLI, directly addressing the podcast's focus on agentic tools and frontier models.
    Provenance
    Tweet · Primary source
  4. @ttunguz (Tomasz Tunguz)

    X

    I've been using state-of-the-art models to teach small models running on my computer how I work. The result : a personal agent that runs my inbox, my deal pipeline, my blog, my calendar, & my research. 🧵

    x.com/ttunguz/status/2060393514144502070 →
    Context
    Describes a personal agent built by using state-of-the-art models to teach small local models how the author works, automating professional workflows (inbox, deal pipeline, calendar); directly relevant to agentic tools and AI applications.
    Provenance
    Tweet · Primary source
  5. @ggerganov (Georgi Gerganov)

    X

    llama.cpp now has an official website: https://llama.app Our goal is to make local AI accessible to everyone, and improving the user experience is a big part of that. On the new landing page you’ll find a single-line…

    x.com/ggerganov/status/2060394400237109567 →
    Context
    Announces a primary artifact (an official website and installer) for a key local AI tool (llama.cpp), directly related to AI infrastructure and accessibility.
    Provenance
    Tweet · Primary source
  6. Notes from the Mistral AI Now Summit — 399 pts · 174 comments

    Article vnglst

    https://koenvangilst.nl/lab/mistral-ai-now-summit · @trouve_search: OK, I'm 100% rooting for both Mistral and task focused small models. But Mistral has fall really far behind since 2025Q3. It seems they can't get good…

    koenvangilst.nl/lab/mistral-ai-now-summit →
    Context
    Directly discusses Mistral's positioning, small models, and on-prem use in regulated European industries, hitting core topics.
    Provenance
    Article · Supporting source
  7. @wzenus (Zihan "Zenus" Wang)

    X

    🧵 Claude-Opus-4.8 takes you too much tokens - but is this issue general across agents? Do agents know how much they'll spend? Introducing Budget-Aware Agents (BAGEN): We study budget awareness across 4 envs & 5…

    x.com/wzenus/status/2060397732846612489/pho… →
    Context
    Discusses a technical limitation (token usage) and introduces a new concept/study (BAGEN) directly related to agentic tools and AI infrastructure.
    Provenance
    Tweet · Primary source
  8. @antirez

    X

    DwarfStar distributed inference is now on GitHub: you can run 2 bit Flash using 2 64GB machines, or 4 bit Flash with two 128GB machines or 4 64GB, ad so forth. Prefill speed will increase thanks to pipelining.…

    x.com/antirez/status/2060403966676987918 →
    Context
    Reports a primary artifact (a GitHub release) for distributed inference that splits a quantized model across multiple high-memory machines, directly relevant to the podcast's focus on AI infrastructure.
    Provenance
    Tweet · Primary source
  9. @saen_dev (Saeed Anwar)

    X

    1 in 3 teams running open weights is the inflection point where the ecosystem starts building tooling for open models instead of treating them as second-class citizens. Once the tooling catches up, the adoption curve…

    x.com/saen_dev/status/2060409865638457805 →
    Context
    Discusses the critical inflection point of open weights adoption and the resulting tooling ecosystem, directly addressing the 'power dynamics' and 'AI infrastructure' aspects of the topic.
    Provenance
    Tweet · Primary source
  10. @OnlyEvaWonder (Eva Wonder)

    X

    gemma leading the pack and ollama in the open inference section makes so much sense

    x.com/OnlyEvaWonder/status/2060420977205370… →
    Context
    Mentions specific models (Gemma) and tools (Ollama) within the context of open inference, directly related to AI infrastructure and frontier models.
    Provenance
    Tweet · Primary source
  11. @LaceyPresley (Lacey)

    X

    TERAFAB IS GOING TO BE INSANE. We’re targeting 100–200 billion custom AI + memory chips per year at full ramp that’s 1 terawatt (1,000 GW) of annual AI compute capacity. Roughly 50x current global AI chip output. This…

    x.com/LaceyPresley/status/20604361356716320… →
    Context
    Discusses a claimed target for AI infrastructure scale (1 terawatt of annual AI compute capacity, roughly 50x current global AI chip output), directly addressing the podcast's focus on AI infrastructure and power dynamics.
    Provenance
    Tweet · Primary source
  12. r/ClaudeAI: Ai Benchmarks are useless - 0 pts · 0 comments

    Article Significant-Care-135

    I'm done with the launch cycle. Every new model drops with the same flashy report, bar charts all over the place, hitting 92% on MMLU-Pro, 94% on GPQA, or whatever coding benchmark they're pushing this week. Then you...

    www.reddit.com/r/ClaudeAI/comments/1trclg3/… →
    Context
    Directly critiques the current state of AI evaluation (benchmarks), which is central to understanding frontier model capabilities and limitations in practice.
    Provenance
    Article · Supporting source
  13. r/LocalLLaMA: I tested MTP on vLLM and llama.cpp for Gemma 4 & Qwen 3.6 — 3.34x faster inference, here are my findings RTX 6000 PRO. - 0 pts · 0 comments

    Article FantasticNature7590

    Hey guys, I spent the last few weeks benchmarking Multi-Token Prediction (MTP) on Gemma 4 31B and Qwen 3.6 27B locally GGUF, FP8 using both vLLM and llama.cpp. MTP is the inference trick every major lab is quietly...

    www.reddit.com/r/LocalLLaMA/comments/1trf0r… →
    Context
    Reports hands-on benchmarks of Multi-Token Prediction (MTP) on Gemma 4 31B and Qwen 3.6 27B in vLLM and llama.cpp, with a reported 3.34x inference speedup, which could change how developers think about local inference speed.
    Provenance
    Article · Supporting source
  14. MCP is dead? — 283 pts · 265 comments

    Article nadis

    https://www.quandri.io/engineering-blog/mcp-is-dead · @mxstbr: I run the team at OpenAI that's responsible for the ChatGPT App Store, Codex plugins, and all things MCP. The thing that all these "MCP is dead" posts are…

    www.quandri.io/engineering-blog/mcp-is-dead →
    Context
    Directly discusses AI agents, service access, and protocols for connecting models to external services, which is central to the podcast's focus on agentic tools and AI infrastructure.
    Provenance
    Article · Supporting source
  15. AI News & Strategy Daily | Nate B Jones · 51s

    Video

    The death of the filing cabinet #ai #tech — Full Story w/ Prompts: https://natesnewsletter.substack.com/p/your-engineers-are-building-your?r=1z4sm5&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true…

    www.youtube.com/shorts/59NCmQ3hxz4 →
    Context
    Discusses the shift from siloed data (filing cabinets/Jira) to a unified, intelligent context platform, directly addressing AI infrastructure and the changing craft of software engineering.
    Provenance
    Video · Supporting source
  16. @LaceyPresley (Lacey)

    X

    Elon Musk's Terafab & The Silicon Empire TerraFab: Tesla’s $119B semiconductor foundry in Texas is a masterclass in vertical integration. By owning 2nm AI silicon production (in JV with SpaceX, xAI & Intel), Tesla is…

    x.com/LaceyPresley/status/20605140423813246… →
    Context
    Discusses a specific, large-scale project (the Terafab semiconductor foundry) and its claimed impact on AI hardware and compute, which is central to the podcast's scope.
    Provenance
    Tweet · Primary source
  17. r/Anthropic: Here's >100 evals for Opus 4.8 compared to top AI models - 0 pts · 0 comments

    Article davidthesong

    I scraped 100+ evals on Opus 4.8 to see what changed. The big gains vs 4.7: Math: USAMO 2026 jumped from 69% → 97% Coding: Vibe Code Bench +12 pp Economically valuable work: #1 of 275 on GDPval-AA Biology Long-context...

    i.redd.it/xdz6vagi464h1.png →
    Context
    Compiles benchmark data (100+ evals) comparing a new frontier model (Opus 4.8) with its predecessor and other top models, directly addressing model capability and performance.
    Provenance
    Article · Supporting source
  18. The AI Daily Brief: Artificial Intelligence News · 23m45s

    Video

    First Impressions of the New Opus 4.8 — Anthropic releases Claude Opus 4.8 with improved honesty, stronger self‑verification, and multi‑agent dynamic workflows for large code tasks. Benchmark scores narrow versus…

    www.youtube.com/watch?v=zf8BfgJghd8 →
    Context
    Covers major topics: Anthropic/OpenAI model releases, agentic coding (Cognition), and AI infrastructure/power dynamics (K&E, Meta).
    Provenance
    Video · Supporting source
  19. CourtListener AI RECAP Search - Legal Courts (US)

    Article District Court, D. Vermont

    Brunell v. OpenAI, LLC - Original document

    www.courtlistener.com/docket/73209323/38/br… →
    Context
    A lawsuit naming OpenAI directly addresses power dynamics, liability, and control, which is core to the podcast topic.
    Provenance
    Article · Supporting source
  20. Techmeme - Industry Adjacent (US)

    Article

    Sources detail AI companies' engagement with US FERC as the energy regulator readies a June proposal to speed up data center connections to regional power grids (Politico) - Politico : Sources detail AI companies'...

    www.techmeme.com/260530/p7 →
    Context
    Directly addresses AI infrastructure (data centers, energy) and power dynamics (regulators, capital), which is a core topic.
    Provenance
    Article · Supporting source