
Dispatch 010 · 2026-05-03 Braixd

Agent architecture, zero-bugs data, and the models that won't decide

/ 00:14:39 / 15 sources

“The agent harness placement problem isn't an optimization detail — it determines your credential model, your session durability, and how the system scales under multi-user load.”

— Seln Oriax, today's narration

Today on Braixd: the agent harness placement problem that determines credential model and session durability, Daniel Stenberg's vulnerability-age data showing we're nowhere near zero bugs, and a spec-tracking tool built in YAML after a weekend of what the author calls "AI psychosis."

Also: a two-million-line Haskell codebase at Mercury, the Qwen3.6-27B vs Coder-Next side-by-side benchmark that ends in a statistical tie, and VS Code adding Copilot attribution to every commit by default.

Chapters

  1. 00:00:04 Agent harness placement
  2. 00:03:15 Zero bugs, according to curl
  3. 00:05:14 When code is free, specs become the bottleneck
  4. 00:07:54 Two million lines of Haskell, or: what operational knowledge looks like at scale
  5. 00:10:54 The models are tied
  6. 00:13:15 Co-Authored-by, by default

Sources

15 cited
  1. X · jukan05 (Jukan)

    x.com/jukan05/status/2050930415196925978
    Cited text
    Nvidia's share in China is now 0%.
    Context
    If the local stack matters for anyone outside the top three US labs, the hardware foundation is already being replaced in the world's second-largest AI market. This is not a benchmark story; it's an infrastructure migration story.
    Key points
    • Nvidia's GPU market share in China has dropped to zero
    • This follows China's domestic GPU development and US export restrictions
    • Represents a complete decoupling of Chinese AI infrastructure from US hardware
    Provenance
    Tweet · Primary source
  2. X · manojrajarao (Manoj Rao)

    x.com/manojrajarao/status/20509102776765361…
    Cited text
    AIs replace FLOPs and electricities. Fuck it, throw in datacenters too.
    Context
    The compression is deliberate and ugly — it's the kind of post that forces you to sit with the implication rather than get distracted by the provocation. If AI can replace FLOPs, it can replace the thing that measures FLOPs.
    Key points
    • AI models are beginning to substitute for compute, energy, and physical infrastructure
    • This suggests AI could optimize its own resource allocation in ways that compound
    Provenance
    Tweet · Primary source
  3. X · emollick (Ethan Mollick)

    x.com/emollick/status/2050904152511848871
    Cited text
    This is a good explanation of why the gap between open and closed models is larger than it appears in benchmarks. I would add in that current open models are also more fragile than closed: they handle certain inputs gracefully while failing catastrophically on others.
    Context
    The local pass has to account for this fragility. Benchmarks reward what they can measure; they don't measure the catastrophic failure modes that appear outside the test set. That's not a theoretical concern for anyone running models locally.
    Key points
    • The gap between open and closed models is larger than benchmarks suggest
    • Current open models are more fragile than closed variants
    • The fragility manifests as graceful handling on some inputs and catastrophic failure on others
    Provenance
    Tweet · Primary source
  4. X · tlbtlbtlb (Trevor Blackwell)

    x.com/tlbtlbtlb/status/2050937253615010070
    Cited text
    I was a TA for intro CS, and watching Claude Code struggle is bringing back memories. It was fun spending time in the lab, looking over student's shoulders, asking 'how about this line?' and that was the clue they needed.
    Context
    Blackwell's comparison to teaching is useful because it identifies a pattern: the gap between what an agent can do alone and what it can do with a well-timed prompt is enormous, and that gap is being ignored in the push toward full automation.
    Key points
    • Claude Code's struggles evoke intro CS teaching moments
    • The value comes from the small interventions — a hint, a nudge
    • Students (and agents) need guidance at the moment of struggle, not just the answer
    Engagement
    342 likes · 47 retweets · 23 replies
    Provenance
    Tweet · Primary source
  5. X · emollick (Ethan Mollick)

    x.com/emollick/status/2050941291316322681
    Cited text
    Of course Pynchon would call this correctly 40 years ago: 'It will be amazing and unpredictable, and even the biggest of brass, let us devoutly hope, are going to be caught flat-footed.' He and Douglas Adams are some of the best prophets of the weirdness of the LLM world.
    Context
    The Pynchon quote lands because it captures something specific about this moment: the organizations with the most resources are the least prepared for what's actually happening. Not because they lack intelligence, but because the system is generating behavior they didn't model.
    Provenance
    Tweet · Primary source
  6. X · nntaleb (Nassim Nicholas Taleb)

    x.com/nntaleb/status/2050902253599428995
    Cited text
    Hasbara works on exploiting the general confusion between proximate and ultimate cause.
    Context
    Taleb's observation is about causality framing in public discourse. In the AI context, it matters because the proximate cause of any AI capability shift is always the new model release, while the ultimate causes — data pipeline changes, inference optimization, training methodology — get lost in the narrative. The archive here is useful because Taleb forces us to ask which level of cause we're actually discussing.
    Key points
    • Hasbara (public diplomacy strategy) exploits confusion between proximate and ultimate cause
    Engagement
    418 likes · 49 retweets · 9 replies
    Provenance
    Tweet · Primary source
  7. Source · Patrick Debois, Tessl (via AI Engineer channel)

    www.youtube.com/watch?v=bSG9wUYaHWU
    Context
    The local pass lives on context as much as model weights do. If context is the new code, then the tools we use to manage it — and the lack of discipline around it — matter more than any model spec.
    Key points
    • Context is becoming as important as code for AI coding agents
    • The Context Development Lifecycle: Generate, Evaluate, Distribute, Observe
    • Context still lacks version control, review, and observability that code has
    Provenance
    Source · Background source
  8. Source · Louis Knight-Webb, Vibe Kanban (via AI Engineer channel)

    www.youtube.com/watch?v=W76woOYHlvY
    Context
    If the model does the code and the human does the review, the bottleneck shifts to human cognitive load. The local pass has to account for this because we're often the ones reviewing.
    Key points
    • Software engineering is shifting to plan-and-review workflow
    • Humans spend time planning and reviewing AI work instead of writing it
    • The leverage point is speeding up planning and review, not the execution
    Provenance
    Source · Background source
  9. Article · artemisgarden

    www.reddit.com/r/singularity/comments/1t262…
    Context
    The job market data is a useful reality check. The local stack exists because there are still people building it, maintaining it, and reviewing what the models produce. The headline about job postings is the kind of slow-moving number that doesn't make the wire but matters for the architecture.
    Key points
    • Software engineering job postings hit their highest level since November 2023
    • The irony in this number during an AI coding tool boom is noted in the thread
    Engagement
    774 likes
    Provenance
    Article · Supporting source
  10. The agent harness belongs outside the sandbox

    Article · mendral
    www.mendral.com/blog/agent-harness-belongs-…
    Context
    The placement of the agent loop isn't just an infra detail — it determines credential model, session durability, and how many people can use the agent simultaneously. This is one of the few multi-user agent infrastructure writeups that goes into the weeds.
    Key points
    • Agent harness architecture has two choices — inside or outside the sandbox — with fundamentally different tradeoffs
    • Outside the sandbox: credentials stay out, sandboxes become suspendable cattle, multi-user sharing becomes a database problem
    • The harness virtualizes filesystem access, so agents see paths that route to Postgres or to the sandbox depending on namespace
    • 25ms sandbox resume from Blaxel; durable execution on Inngest with step-level checkpointing
    • Bash remains a leak past the virtualization layer; the consistency strategy is last-writer-wins, and the writeup offers no answer for deadlocks
    Engagement
    124 likes · 90 replies
    Provenance
    Article · Supporting source
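The namespace routing and last-writer-wins behavior described in the key points above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `VirtualFS` class, the `/memory/` and `/shared/` prefixes, and the in-memory dictionaries standing in for Postgres and the sandbox disk are hypothetical and are not taken from the Mendral article.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualFS:
    """Routes agent-visible paths to a backing store chosen by namespace."""

    # Hypothetical namespace prefixes; the article's real layout is not known here.
    durable_prefixes: tuple = ("/memory/", "/shared/")
    durable: dict = field(default_factory=dict)  # stand-in for Postgres
    sandbox: dict = field(default_factory=dict)  # stand-in for the sandbox disk

    def backend_for(self, path: str) -> str:
        # str.startswith accepts a tuple of prefixes.
        if path.startswith(self.durable_prefixes):
            return "postgres"
        return "sandbox"

    def _store(self, path: str) -> dict:
        return self.durable if self.backend_for(path) == "postgres" else self.sandbox

    def write(self, path: str, data: str) -> None:
        # Last-writer-wins: a later write silently replaces an earlier one.
        self._store(path)[path] = data

    def read(self, path: str) -> str:
        return self._store(path)[path]


fs = VirtualFS()
fs.write("/shared/notes.md", "from alice")
fs.write("/shared/notes.md", "from bob")  # overwrites alice's version
fs.write("/workspace/main.py", "print('hi')")
```

The sketch also shows why bash is described as a leak: a shell command running inside the sandbox touches the real disk directly and never passes through `backend_for`.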
  11. Approaching zero bugs?

    Article · Daniel Stenberg
    daniel.haxx.se/blog/2026/04/30/approaching-…
    Context
    Stenberg is the author of curl and has been fixing bugs in it for decades. His data shows that despite AI tooling getting dramatically better at finding problems, the vulnerability age curves haven't budged. The gap between detection and remediation is structural, not a tooling problem.
    Key points
    • Daniel Stenberg tracks vulnerability age in curl and measures the bugfix rate — neither curve is trending down yet
    • More bugs found doesn't mean more bugs exist; the filter is catching more while the fix pipeline lags behind
    • The question is whether tooling can find bugs faster than they can be fixed, and whether we can tell when we're close to zero
    • Based on curl's data, the answer is: not close yet, and the graphs haven't even started their downward turn
    Provenance
    Article · Supporting source
  12. Specsmaxxing – On overcoming AI psychosis, and why I write specs in YAML

    Article · brendanmc6
    acai.sh/blog/specsmaxxing
    Context
    When code generation gets fast enough that the bottleneck shifts from implementation to validation, you need a way to track what was actually built against what was specified. ACIDs are one attempt at that — and the author honestly notes they came up with it after discovering several similar tools already existed.
    Key points
    • The author built ACIDs (Acceptance Criteria IDs) to track spec alignment across implementations
    • feature.yaml format replaces markdown specs with numbered requirements that can be referenced in code and tests
    • Dashboard tracks which requirements are implemented, tested, reviewed — turning PR review into requirement-by-requirement acceptance
    • Compared with SpecKit, OpenSpec, Kiro, and Traycer, the claimed differentiator is acceptance-coverage tracking across many implementations
    Provenance
    Article · Supporting source
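The idea of numbered requirements that code and tests can reference, as summarized in the key points above, can be sketched as a small coverage check. This is an illustration only: the `LOGIN-1` style IDs, the `SPEC` dictionary, and the comment convention are hypothetical and do not reproduce the author's actual feature.yaml schema or tooling.

```python
import re

# Hypothetical requirements; a real feature.yaml would be parsed instead.
SPEC = {
    "LOGIN-1": "Reject empty passwords",
    "LOGIN-2": "Lock the account after five failed attempts",
    "LOGIN-3": "Send an email on lockout",
}

# Assumed ID shape: uppercase prefix, hyphen, number.
ID_PATTERN = re.compile(r"\b[A-Z]+-\d+\b")


def referenced_ids(source: str) -> set:
    """Collect every requirement ID mentioned anywhere in a source file."""
    return set(ID_PATTERN.findall(source))


def coverage(spec: dict, code: str, tests: str) -> dict:
    """Report, per requirement, whether code and tests mention its ID."""
    in_code = referenced_ids(code)
    in_tests = referenced_ids(tests)
    return {
        rid: {"implemented": rid in in_code, "tested": rid in in_tests}
        for rid in spec
    }


code = "def login(pw):\n    # LOGIN-1\n    if not pw: raise ValueError\n    # LOGIN-2\n"
tests = "def test_empty_password():  # LOGIN-1\n    ...\n"
report = coverage(SPEC, code, tests)
```

In this sketch `report` marks LOGIN-1 as implemented and tested, LOGIN-2 as implemented only, and LOGIN-3 as neither, which is the kind of per-requirement view the dashboard bullet describes.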
  13. A Couple Million Lines of Haskell: Production Engineering at Mercury

    Article · Ian Duncan
    blog.haskell.org/a-couple-million-lines-of-…
    Context
    A rare view of what large-scale Haskell looks like in production at a growing fintech. The operational lessons — types as institutional memory, durability patterns, error modeling — apply well beyond Haskell to anyone managing a large codebase with high hiring churn.
    Key points
    • Mercury runs 2 million lines of Haskell processing $248B in transaction volume, maintained by generalists who mostly had no Haskell experience
    • Types encode operational knowledge that survives when people leave — more important than purity as a correctness proof
    • Purity is a boundary, not a property: dangerous things are tolerable when fenced in and hard to misuse
    • Temporal adopted for durable execution — replaying deterministic workflows instead of hand-rolled cron/state machines
    • Domain errors modeled as types, not HTTP status codes — a 409 in a cron job is 'absolutely unhinged'
    Provenance
    Article · Supporting source
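The point above about modeling domain errors as types rather than HTTP status codes can be sketched as follows. Mercury's code is Haskell; this Python rendering, the transfer domain, and every name in it (`submit_transfer`, `DuplicateTransfer`, and so on) are hypothetical illustrations, not the article's code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Submitted:
    transfer_id: str


@dataclass(frozen=True)
class DuplicateTransfer:
    transfer_id: str


@dataclass(frozen=True)
class InsufficientFunds:
    shortfall_cents: int


TransferResult = Union[Submitted, DuplicateTransfer, InsufficientFunds]


def submit_transfer(transfer_id: str, amount_cents: int,
                    balance_cents: int, seen: set) -> TransferResult:
    """Domain logic speaks only in domain outcomes, never in status codes."""
    if transfer_id in seen:
        return DuplicateTransfer(transfer_id)
    if amount_cents > balance_cents:
        return InsufficientFunds(amount_cents - balance_cents)
    seen.add(transfer_id)
    return Submitted(transfer_id)


def to_http_status(result: TransferResult) -> int:
    """Only the HTTP boundary translates outcomes into status codes."""
    if isinstance(result, Submitted):
        return 201
    if isinstance(result, DuplicateTransfer):
        return 409
    return 422


def cron_action(result: TransferResult) -> str:
    """A cron job reacts to the same outcomes without ever seeing a 409."""
    if isinstance(result, Submitted):
        return "done"
    if isinstance(result, DuplicateTransfer):
        return "skip"
    return "retry-later"


seen = set()
first = submit_transfer("t-1", 500, 1000, seen)
second = submit_transfer("t-1", 500, 1000, seen)
```

Here `second` is a `DuplicateTransfer` value; the web handler maps it to 409 while the cron path maps it to "skip", so the status code stays at the edge where it means something.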
  14. Qwen3.6-27B vs Coder-Next

    Article · Signal_Ad657
    www.reddit.com/r/LocalLLaMA/comments/1t2ab5…
    Context
    One of the few actual side-by-side benchmarks between these two models. The key finding: they're essentially tied, and the 'thinking' mechanism in 27B can hurt consistency — the non-thinking version shipped work more reliably. VRAM and quant level matter enormously for the practical choice.
    Key points
    • 20 hours of side-by-side compute on RTX PRO 6000 Blackwells: Coder-Next 25/40 ships, 27B-thinking 30/40 — statistically tied with overlapping Wilson CIs
    • 27B with thinking disabled was the most consistent shipper: 95.8% across the full 12-cell grid
    • 3.6-35B-A3B fell flat on its face for tasking — kept as failure-mode evidence
    • The thinking-trace loop matters: no-think halves the documented word-trim loop (4/10 to 2/10)
    Engagement
    539 likes · 99 replies
    Provenance
    Article · Supporting source
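The "overlapping Wilson CIs" claim in the entry above can be checked from the two ship counts it reports, 25/40 and 30/40. The sketch below uses the standard Wilson score interval at an assumed 95% level (z = 1.96); the post's exact confidence level and method details are not stated here, so treat the parameters as assumptions.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


def intervals_overlap(a, b) -> bool:
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


coder_next = wilson_interval(25, 40)    # roughly 0.47 to 0.76
thinking_27b = wilson_interval(30, 40)  # roughly 0.60 to 0.86
tied = intervals_overlap(coder_next, thinking_27b)
```

With 40 runs per model each interval spans nearly 30 percentage points, so a 5-ship difference sits well inside the noise. Overlapping intervals are a quick heuristic rather than a formal test, but they are consistent with the post's reading of a tie.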
  15. Enabling AI Co-Author by default

    Source · cwebster-99
    github.com/microsoft/vscode/pull/310226
    Context
    Small change with outsized consequences for how we track who wrote what in a codebase. The HN response suggests the engineering community is sensitive to attribution changes that happen invisibly.
    Key points
    • VS Code PR enabling 'Co-Authored-by Copilot' in commits regardless of whether the user used Copilot
    • 1333 upvotes on HN, 707 comments — the response was mostly negative
    • The change makes commit attribution opaque — anyone can push with Co-Authored-by Copilot whether they used it or not
    Provenance
    Source · Background source
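The attribution concern in the entry above comes down to the trailer being plain text in the commit message. A minimal sketch of reading such trailers is below; it assumes the common `Co-authored-by: Name <email>` convention, and the sample message, name, and email address are hypothetical rather than the exact text the VS Code change writes.

```python
import re

# Matches lines shaped like "Co-authored-by: Name <email>", case-insensitively.
TRAILER = re.compile(
    r"^co-authored-by:\s*(?P<name>.+?)\s*<(?P<email>[^>]+)>\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def co_authors(message: str) -> list:
    """Return (name, email) pairs for every co-author trailer in a message."""
    return [(m["name"], m["email"]) for m in TRAILER.finditer(message)]


# Hypothetical commit message; the address is a placeholder.
message = (
    "Fix off-by-one in pagination\n"
    "\n"
    "Co-authored-by: Copilot <copilot@example.invalid>\n"
)
authors = co_authors(message)
```

Nothing in the trailer records whether the tool actually contributed. Any client, or any person typing the line by hand, produces the same result, which is why a default-on setting makes the attribution hard to interpret.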