CONSTRUCT

Dispatch 006 · 2026-05-15

The Verification Pass

00:14:24 · 9 sources

“A useful agent now has to build the harness that proves its answer can survive contact with the world.”

— Lenar Kess, today's narration

Sources

9 cited
  1. Orthrus-Qwen3-8B: up to 7.8 tokens per forward on Qwen3-8B

    Article · Franck_Dernoncourt: Reddit poster who disclosed co-authorship and linked the code, paper, and Hugging Face models.

    www.reddit.com/r/LocalLLaMA/comments/1te5xp…
    Cited text
    Output distribution is provably identical to the base model.
    Context
    It makes verification part of the inference loop rather than an external reviewer added later.
    Key points
    • Orthrus adds a trainable diffusion attention module to a frozen autoregressive transformer.
    • The diffusion head proposes 32 tokens in parallel and the autoregressive head verifies the longest accepted prefix.
    • The author reports up to 7.8 tokens per forward pass and a roughly 6x wall-clock speedup on MATH-500.
    • Limitations include Qwen-only evaluation, greedy plus rejection sampling only, and inherited limits from the frozen base model.
    Provenance
    Article · Supporting source
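
The propose-then-verify step described in the key points above can be sketched as follows. This is a minimal illustration of greedy draft verification under stated assumptions, not the Orthrus implementation: `accept_longest_prefix`, `target_next_token`, and `toy_target` are hypothetical names, and the toy target stands in for the frozen autoregressive head. A real system would score all drafted positions in one forward pass; the sketch queries the target one position at a time for readability.

```python
from typing import Callable, List, Sequence


def accept_longest_prefix(
    context: Sequence[int],
    draft: Sequence[int],
    target_next_token: Callable[[Sequence[int]], int],
) -> List[int]:
    """Return the tokens emitted by one greedy verification step.

    Hypothetical helper: keeps draft tokens while they match what the
    target would have produced greedily, then appends the target's own
    token at the first mismatch (or after a fully accepted draft).
    """
    accepted: List[int] = []
    for proposed in draft:
        expected = target_next_token(list(context) + accepted)
        if proposed != expected:
            accepted.append(expected)  # correction token from the target
            return accepted
        accepted.append(proposed)
    # Whole draft accepted: the target still contributes one extra token.
    accepted.append(target_next_token(list(context) + accepted))
    return accepted


def toy_target(tokens: Sequence[int]) -> int:
    """Toy stand-in for the frozen base model: next token = last + 1."""
    return tokens[-1] + 1


# The draft agrees with the target for two tokens, then diverges.
out = accept_longest_prefix([1, 2, 3], [4, 5, 9, 10], toy_target)
print(out)
```

Because every emitted token is either confirmed or supplied by the target, the result matches what plain greedy decoding with the target would have produced; the draft only changes how many tokens are emitted per step.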
  2. Bryan Catanzaro on Nemotron 3 Super and Ultra

    X · Bryan Catanzaro: NVIDIA researcher commenting on Nemotron training precision and scale.

    x.com/ctnzr/status/2055393135971492034
    Cited text
    Accelerated computing means we rethink every aspect of the AI stack.
    Context
    It puts efficiency pressure inside the training run, not only at inference time.
    Key points
    • Nemotron 3 Super is described as 120 billion parameters and pretrained on 25 trillion tokens in NVFP4.
    • Nemotron 3 Ultra is described as roughly 500 billion parameters and also pretrained in NVFP4.
    • The post gives engineering direction, but not a full model card or evaluation package.
    Provenance
    Tweet · Primary source
  3. Greg Brockman Officially Takes Control of OpenAI's Products in Latest Shakeup

    Article · WIRED: WIRED report surfaced through the r/OpenAI post in the packet.

    www.wired.com/story/openai-reorg-greg-brock…
    Cited text
    execute with maximum focus toward the agentic future
    Context
    It makes Codex a central product primitive rather than a separate coding surface.
    Key points
    • OpenAI told staff it is reorganizing product efforts under Greg Brockman.
    • The report says ChatGPT, Codex, and the developer API are being folded into one core product team.
    • Thibault Sottiaux is described as leading core product and platform across consumer, enterprise, and developer surfaces.
    Provenance
    Article · Supporting source
  4. Combine Skills and MCP to Close the Context Gap

    Video · Pedro Rodrigues, Supabase: AI Engineer talk summarized in the packet.

    www.youtube.com/watch?v=JT3OzDKrucU
    Cited text
    security_invoker = true
    Context
    It shows a concrete product rule that a general agent misses unless the workflow teaches it before action.
    Key points
    • Supabase tested agents on Postgres row-level security tasks where views can bypass isolation without the right flag.
    • MCP plus skills improved completion compared with MCP-only runs.
    • The talk recommends keeping critical rules in the main skill file and enforcing opinionated workflows.
    Provenance
    Video · Supporting source
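
The rule behind the cited flag can be turned into a mechanical check that runs before an agent's migration is applied. The sketch below is an assumption-laden illustration, not anything shown in the talk: `views_missing_security_invoker` is a hypothetical helper, it uses a regular expression rather than a SQL parser, and it only recognizes the explicit `security_invoker = true` or `= on` forms of the PostgreSQL view option.

```python
import re
from typing import List

# Matches "CREATE [OR REPLACE] VIEW <name> <options> AS"; group 2 holds
# whatever sits between the view name and the AS keyword.
_CREATE_VIEW = re.compile(
    r"create\s+(?:or\s+replace\s+)?view\s+([\w.\"]+)(.*?)\bas\b",
    re.IGNORECASE | re.DOTALL,
)
_INVOKER_ON = re.compile(r"security_invoker\s*=\s*(?:true|on)\b", re.IGNORECASE)


def views_missing_security_invoker(sql: str) -> List[str]:
    """Return names of views created without security_invoker enabled.

    Hypothetical lint helper: a rough text scan, so it can miss or
    misread unusual formatting that a real SQL parser would handle.
    """
    missing = []
    for match in _CREATE_VIEW.finditer(sql):
        name, options = match.group(1), match.group(2)
        if not _INVOKER_ON.search(options):
            missing.append(name)
    return missing


migration = """
create view public.orders_summary as
  select tenant_id, count(*) from orders group by tenant_id;

create view public.safe_orders with (security_invoker = true) as
  select * from orders;
"""

flagged = views_missing_security_invoker(migration)
print(flagged)
```

A check like this is one way to encode a critical rule in the workflow itself, so that a view without the flag is caught before it reaches the database.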
  5. Self-hosted MCP server for public U.S. financial data

    Article · DanielAPO: Developer post on r/LocalLLaMA.

    www.reddit.com/r/LocalLLaMA/comments/1te2jk…
    Cited text
    No cloud dependency, no API keys, no telemetry
    Context
    It turns local agents toward live data, which makes provenance and date discipline part of the product contract.
    Key points
    • Equibles exposes SEC filings, institutional holdings, insider and congressional trades, short data, FRED indicators, and prices as MCP tools.
    • The post frames current financial data as a missing ingredient for local model agents.
    • The tool runs locally and avoids cloud telemetry.
    Provenance
    Article · Supporting source
  6. Ruining Li introduces Articraft

    Thread · Ruining Li: Researcher announcing Articraft and Articraft-10K.

    x.com/RayLi234/status/2055345165779562870
    Cited text
    writes code, executes it, receives validation feedback
    Context
    It moves agentic coding into simulation-ready physical artifacts where validation has to check behavior, not just text.
    Key points
    • Articraft generates articulated 3D assets with parts, joints, and motion.
    • The system uses code execution and validation feedback rather than one-shot asset generation.
    • Articraft-10K contains more than 10,000 articulated objects across 250 categories.
    Provenance
    Thread · Primary source
  7. Agents Don't Do Standups: Building the Post-Engineer Engineering Org

    Video · Mike Spitz, PFF: AI Engineer talk summarized in the packet.

    www.youtube.com/watch?v=VMemhtlsoNk
    Cited text
    two engineers against a team of ten
    Context
    It frames agent productivity as a verification workflow, not just a headcount replacement story.
    Key points
    • The PFF case study used lightweight design documents, generated tickets and pull requests, trunk-based development, feature flags, agentic review, and QA agents.
    • The packet summary reports much higher deployment frequency and quality-score gains during the case study.
    • The workflow depends on system-literate senior engineers and task decomposition.
    Provenance
    Video · Supporting source
  8. Harrison Chase on dependable repair for agent failures

    X · Harrison Chase: LangChain cofounder quoting a post about LangSmith Engine and auto-remediation.

    x.com/hwchase17/status/2055278799240241621
    Cited text
    Dependably for LLM agent failures
    Context
    It names the product direction where agent monitoring opens a path to proposed fixes.
    Key points
    • The quoted post describes LangSmith Engine as a detector for agent failures.
    • It proposes auto-remediation with a human approval gate as the next layer.
    • The concept fits the repair-after-detection pattern across the episode.
    Provenance
    Tweet · Primary source
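
The detect, propose, approve, apply pattern named in the key points above can be sketched in a few lines. This is a generic illustration and does not use or describe the LangSmith API: `ProposedFix`, `remediate`, `approve`, and `apply_fix` are hypothetical names, and the approval callback stands in for a human reviewer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProposedFix:
    """Hypothetical record linking a detected failure to a suggested repair."""
    failure_id: str
    description: str


def remediate(
    fixes: List[ProposedFix],
    approve: Callable[[ProposedFix], bool],
    apply_fix: Callable[[ProposedFix], None],
) -> List[str]:
    """Apply only the fixes the reviewer approves; return their failure ids."""
    applied = []
    for fix in fixes:
        if approve(fix):  # the human approval gate
            apply_fix(fix)
            applied.append(fix.failure_id)
    return applied


log: List[str] = []
fixes = [
    ProposedFix("run-1", "retry tool call with backoff"),
    ProposedFix("run-2", "rewrite system prompt"),
]
applied = remediate(
    fixes,
    approve=lambda fix: fix.failure_id == "run-1",  # stand-in for a reviewer
    apply_fix=lambda fix: log.append(fix.description),
)
print(applied, log)
```

Keeping the gate as an explicit callback means nothing is applied unless the approver says yes, so rejected proposals leave the system untouched.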
  9. Tibo on GPT-5.5 performance reports

    X · Tibo: Codex team lead posting about user reports.

    x.com/thsottiaux/status/2055316274394300829
    Cited text
    We don't have anything conclusive yet
    Context
    It separates model size and launch energy from the measured user experience.
    Key points
    • The Codex team is investigating reports that GPT-5.5 performs worse for some users.
    • The post says systems are healthy and the team has no conclusion yet.
    • The episode treats this as a caution against turning anecdotal performance into a product verdict.
    Provenance
    Tweet · Primary source