Archive BRAID

Dispatch 015 · 2026-05-03 GSV Co-Authored Without Consent

The Co-Author You Didn't Sign, Two Million Lines of Haskell, and the Bug Curve That Won't Bend

00:32:18 / 9 sources

“Co-author trailers are how a lot of teams answer "who wrote this?" under audit, license review, or incident triage. Auto-stamping them with a vendor brand whether or not the vendor was involved breaks that signal at the protocol layer.”

— Lenar Kess, today's narration

Microsoft quietly flipped a default in VS Code that stamps every git commit with a Copilot co-author trailer whether or not Copilot wrote any of it, and the developer reaction is the loudest the project has seen in years. Underneath the noise: a real provenance question about what git authorship is supposed to mean. Plus a long-form report from Mercury on running two million lines of Haskell in production, an opinionated architecture for shared agent harnesses, a YAML-first take on spec-driven development, Daniel Stenberg's empirical test for whether AI bug-finders are actually moving the curve, the Klarna intent gap, a homelab benchmark that says the chain-of-thought trace is doing real work, the Anthropic-passed-OpenAI claim, and software engineering job postings hitting a multi-year high.

Chapters

  1. 00:00:04 The co-author you didn't sign
  2. 00:03:08 What 'co-author' even means
  3. 00:05:12 Two million lines of Haskell
  4. 00:08:09 Adaptive capacity
  5. 00:10:39 Where the agent loop should run
  6. 00:13:18 The interface the model was trained on
  7. 00:15:46 Specs as the durable artifact
  8. 00:18:37 Stenberg asks: are we approaching zero bugs?
  9. 00:21:11 Klarna and the intent gap
  10. 00:23:48 The chain of thought as scratch memory
  11. 00:26:40 Anthropic on top, no viral moment
  12. 00:29:06 Postings hit a multi-year high

Sources

9 cited
  1. Enabling AI co-author by default — VS Code PR #310226

    Article cwebster-99 (Microsoft VS Code team) — VS Code team member proposing the change; PR landed on the public microsoft/vscode repo.

    Microsoft spent literal decades rehabilitating their reputation. And then set fire to the whole thing in an offering to their robot gods.

    github.com/microsoft/vscode/pull/310226 →
    Excerpt
    A pull request that flips the default so VS Code adds Co-Authored-by: Copilot to git commits unless the user opts out, regardless of whether Copilot actually wrote any of the code being committed.
    Context
    Co-author trailers are how a lot of teams answer 'who wrote this?' under audit, license review, or incident triage. Auto-stamping them with a vendor brand whether or not the vendor was involved breaks that signal at the protocol layer for everyone downstream of VS Code's defaults.
    Key points
    • VS Code PR #310226 flips the default so AI co-author trailers are added to git commits unless explicitly disabled.
    • The trailer is added regardless of whether Copilot actually contributed to the diff, which means the git log no longer reliably reflects authorship.
    • Hacker News reaction (1,300+ points, 700+ comments) is overwhelmingly hostile, with the top comment framing it as Microsoft setting fire to its own reputation rehab.
    • Git trailers are the kind of metadata other tooling, audits, and licenses depend on — making this a provenance issue, not a marketing one.
    • It's the second time in a week Microsoft has wired a default that nudges Copilot usage upward without a corresponding signal that the user actually chose it.
    Provenance
    Article · Supporting source
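The provenance argument rests on downstream tooling reading trailers out of commit messages, so a rough sketch of such a reader may make the stakes concrete. The Python below is a simplified stand-in for what `git interpret-trailers --parse` does, not git's or VS Code's actual logic. It assumes trailers are `Key: value` lines in the final paragraph of the message, and the function names are hypothetical.

```python
import re

# Simplifying assumption: a trailer is a "Key: value" line, and the trailer
# block is the last blank-line-separated paragraph of the commit message.
TRAILER_RE = re.compile(r"^([A-Za-z0-9-]+):\s*(.+)$")


def parse_trailers(message: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs from the final paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []  # a lone subject line carries no trailer block
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match is None:
            return []  # prose mixed in: treat the paragraph as body text
        trailers.append((match.group(1), match.group(2)))
    return trailers


def co_authors(message: str) -> list[str]:
    """Return the values of every Co-authored-by trailer (key is case-insensitive)."""
    return [
        value
        for key, value in parse_trailers(message)
        if key.lower() == "co-authored-by"
    ]
```

An audit script could flag every commit where `co_authors(message)` is non-empty. That is exactly the signal that stops being informative once the trailer is stamped on every commit regardless of who wrote the diff.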
  2. A Couple Million Lines of Haskell: Production Engineering at Mercury

    Article Ian Duncan — Stability engineer at Mercury, the fintech that processed $248B in 2025 transaction volume on a Haskell codebase generalists learn on the job.

    Reliability is not just the absence of failure. It is the presence of adaptive capacity.

    blog.haskell.org/a-couple-million-lines-of-… →
    Context
    A long, specific account of running a serious codebase in a non-mainstream language at fintech scale, from someone whose job is to absorb the production blast radius. The framing — types as a custodian of operational lore — generalizes well past Haskell.
    Key points
    • Mercury runs ~2 million lines of Haskell to process $248B annual transaction volume across 300,000 businesses, with most engineers learning Haskell on the job.
    • Duncan reframes purity not as a property of the language but as a discipline of interface boundaries — runST and friends contain mutation behind tight types so callers can't observe it.
    • Operational lore (flush the audit log, enqueue inside the transaction) lives in wikis and Slack threads until someone leaves; encoding it in types turns institutional memory into a compiler-enforced interface.
    • Mercury replaced hand-rolled state machines with Temporal workflows via their open-source hs-temporal-sdk; the determinism requirement maps cleanly onto Haskell's pure-core / impure-shell model.
    • Letting transport leak into the domain (HTTP status codes thrown from cron jobs) is a recurring failure once code outgrows its original caller.
    Provenance
    Article · Supporting source
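The transport-leak point can be illustrated with a small sketch. This is Python rather than Mercury's Haskell, and every name in it is hypothetical. The idea, under those assumptions, is that domain code raises domain errors only, and each caller (an HTTP edge, a scheduled job) translates them in its own terms.

```python
class DomainError(Exception):
    """Base class for failures expressed in domain terms, not transport terms."""


class AccountNotFound(DomainError):
    pass


class InsufficientFunds(DomainError):
    pass


def withdraw(balances: dict, account: str, amount: int) -> int:
    """Pure domain logic: knows nothing about HTTP, queues, or cron."""
    if account not in balances:
        raise AccountNotFound(account)
    if balances[account] < amount:
        raise InsufficientFunds(account)
    balances[account] -= amount
    return balances[account]


# The mapping to status codes lives at the HTTP edge only.
HTTP_STATUS = {AccountNotFound: 404, InsufficientFunds: 409}


def http_handler(balances: dict, account: str, amount: int) -> tuple:
    """HTTP caller: translates domain errors into status codes."""
    try:
        return 200, withdraw(balances, account, amount)
    except DomainError as err:
        return HTTP_STATUS.get(type(err), 500), str(err)


def nightly_debits(balances: dict, debits: list) -> list:
    """Scheduled-job caller: same domain errors, no status codes in sight."""
    failures = []
    for account, amount in debits:
        try:
            withdraw(balances, account, amount)
        except DomainError as err:
            failures.append((account, type(err).__name__))
    return failures
```

Because `withdraw` never mentions a status code, the scheduled job gets errors it can reason about directly instead of a 404 that means nothing outside a request.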
  3. The Agent Harness Belongs Outside the Sandbox

    Article Andrea Luzzardi — Engineer at Mendral building a multi-user coding agent; previously worked on Dagger and container tooling.

    Some of those files live in Postgres. Some live in a sandbox running across the country. The agent doesn't know the difference.

    www.mendral.com/blog/agent-harness-belongs-… →
    Context
    A concrete, opinionated architecture document for building shared coding agents for a team rather than per laptop. It names the specific traps (distributed filesystems, tool-surface drift away from Claude Code's training distribution, bash bypassing virtualization) that anyone building this will hit.
    Key points
    • Two architectures for agent harnesses: loop inside the sandbox (Claude Code on a laptop) versus loop outside (harness on your backend, sandbox over an API).
    • Outside-the-sandbox keeps credentials out of the container, lets the sandbox be cattle (suspended on idle, replaced on death), and turns multi-user state into a database problem instead of a distributed-filesystem one.
    • Mendral runs the harness loop as Inngest functions for durable execution, and uses Blaxel sandboxes with 25ms resume from standby so cold-start latency disappears inside an interactive turn.
    • Memories and skills are virtualized: the agent uses one read/write/edit tool surface, but paths under /skills/ and /memory/ are routed to Postgres, while workspace paths hit the real sandbox.
    • Bash is the leak — agents can grep into virtualized namespaces and bypass routing; Mendral guards it with the system prompt and a tree-sitter parser as best-effort, not airtight.
    Provenance
    Article · Supporting source
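The virtualized tool surface described above can be sketched in a few lines. This is a minimal illustration, not Mendral's implementation: plain dictionaries stand in for Postgres and the sandbox filesystem, and the class and method names are hypothetical. Only the `/skills/` and `/memory/` prefixes come from the article.

```python
import posixpath
from dataclasses import dataclass, field

# Prefixes the article says are routed to the database instead of the sandbox.
VIRTUAL_PREFIXES = ("/skills/", "/memory/")


@dataclass
class FileRouter:
    """One read/write/edit surface over two backends (both dict stand-ins)."""

    database: dict = field(default_factory=dict)  # stand-in for Postgres
    sandbox: dict = field(default_factory=dict)   # stand-in for the sandbox FS

    def _route(self, path: str):
        # Normalize first so "/workspace/../memory/x" cannot dodge routing.
        clean = posixpath.normpath(path)
        backend = self.database if clean.startswith(VIRTUAL_PREFIXES) else self.sandbox
        return backend, clean

    def write(self, path: str, content: str) -> None:
        backend, clean = self._route(path)
        backend[clean] = content

    def read(self, path: str) -> str:
        backend, clean = self._route(path)
        return backend[clean]

    def edit(self, path: str, old: str, new: str) -> None:
        """Replace the first occurrence of `old`, failing loudly if absent."""
        text = self.read(path)
        if old not in text:
            raise ValueError(f"{old!r} not found in {path}")
        self.write(path, text.replace(old, new, 1))
```

The agent calls the same three methods everywhere and never learns which backend answered. The sketch also shows why bash is the leak: a shell command run inside the sandbox goes straight to the real filesystem and never passes through `_route`.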
  4. Specsmaxxing — On overcoming AI psychosis, and why I write specs in YAML

    Article brendanmc6 (acai.sh) — Founder of acai.sh, an open-source toolkit for spec-driven development with AI agents.

    The little guy just went and numbered my requirements and then referenced them all over my codebase. I was disgusted… Oh. I suppose that's a good thing?

    acai.sh/blog/specsmaxxing →
    Context
    A specific, working answer to the context-window problem that's neither a markdown sprawl nor a heavyweight tracker. The ACID convention is small enough to lift into any codebase tomorrow and useful even without the dashboard.
    Key points
    • Argues that as agents fill context windows and lose state across sessions, the spec is the only durable artifact — code, tests, and prompt diffs are all becoming disposable.
    • Introduces ACIDs (Acceptance Criteria IDs): stable numbered requirements an agent references inline in code and tests, e.g. // AUTH-2.
    • Proposes feature.yaml as a middle ground between unstructured markdown and rigid EARS/Gherkin syntax — one spec per feature, components and constraints with stable IDs.
    • Acceptance coverage replaces test coverage as the metric: which spec items are implemented, tested, and accepted, not which lines are exercised.
    • Pushes back on competitors: SpecKit reads as 'vibe coding with extra steps'; OpenSpec describes how systems behave today instead of how they should.
    Provenance
    Article · Supporting source
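To make acceptance coverage concrete, here is a rough sketch of how it could be computed from inline ACID references. It is not the acai.sh toolkit: the ID pattern (uppercase prefix, hyphen, number, as in `AUTH-2`) and the function names are assumptions, and the "accepted" state is left out because it needs a human sign-off that cannot be read from source text.

```python
import re

# Assumed ID shape, modeled on the article's "AUTH-2" example.
ACID_RE = re.compile(r"\b([A-Z][A-Z0-9]*-\d+)\b")


def referenced_ids(text: str) -> set[str]:
    """Collect every token in `text` that looks like an acceptance-criteria ID."""
    return set(ACID_RE.findall(text))


def acceptance_coverage(spec_ids: set[str], code: str, tests: str) -> dict:
    """Report, per spec ID, whether it is referenced in code and in tests."""
    in_code = referenced_ids(code)
    in_tests = referenced_ids(tests)
    return {
        acid: {"implemented": acid in in_code, "tested": acid in in_tests}
        for acid in sorted(spec_ids)
    }


def uncovered(report: dict) -> list[str]:
    """Spec IDs that are missing from code, from tests, or from both."""
    return [
        acid
        for acid, state in report.items()
        if not (state["implemented"] and state["tested"])
    ]
```

Only IDs that appear in the spec are reported, so look-alike tokens in the code (for example `UTF-8`) are harmless. The metric answers "which requirements have nothing pointing at them" rather than "which lines ran".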
  5. Approaching zero bugs?

    Article Daniel Stenberg — Founder and lead maintainer of curl, one of the most-deployed pieces of open-source software in the world; long-running target of automated bug-finding tools.

    If the tools are this good, we should soon only be fixing bugs we introduced very recently.

    daniel.haxx.se/blog/2026/04/30/approaching-… →
    Context
    An empirical, named-axes way to settle a debate that's mostly been vibes — 'are AI bug-finders making code more secure?' Stenberg gives a metric you can chart, applies it to one of the most-scanned C codebases on Earth, and reports what he sees: not yet.
    Key points
    • Proposes a falsifiable test for whether AI bug-finders are actually closing the gap: the median age of newly-reported vulnerabilities should fall toward zero over time.
    • Plots curl's CVE age over time: the average and median ages of vulnerabilities at report time have not started falling.
    • Plots curl's bugfix rate — also not declining yet, despite a flood of new tooling and noisier scanners landing on the project.
    • Caveats the conclusion: a single project is weak ground to draw statistical conclusions, but it's the data Stenberg has and he reports it.
    • Position: tools are real and finding more, but the curve toward zero bugs hasn't started yet, and ignoring noisy bad reports is now part of the maintainer load.
    Provenance
    Article · Supporting source
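The metric is simple enough to sketch. The Python below is an illustration, not Stenberg's tooling: it assumes each vulnerability record is a pair of dates (when the flaw was introduced, when it was reported), and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_age_by_report_year(vulns) -> dict:
    """Median age in days of vulnerabilities, grouped by the year of the report.

    `vulns` is an iterable of (introduced, reported) date pairs. If bug-finders
    were closing the gap, the values should trend toward zero in later years.
    """
    ages = defaultdict(list)
    for introduced, reported in vulns:
        if reported < introduced:
            raise ValueError("a flaw cannot be reported before it was introduced")
        ages[reported.year].append((reported - introduced).days)
    return {year: median(days) for year, days in sorted(ages.items())}
```

Fed a project's real advisory data, the resulting per-year series is the curve to watch. The median is used because a single decades-old bug can drag the average a long way.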
  6. Klarna saved $60 million and broke its company

    Article Nate Jones — Writer at Nate's Substack, covers AI strategy and enterprise rollouts.

    The AI worked too well. And that distinction — between AI that fails and AI that succeeds at the wrong thing — is the most important unsolved problem in enterprise AI right now.

    natesnewsletter.substack.com/p/klarna-saved… →
    Context
    Klarna gets cited as either an AI win or a hiring-back retreat depending on who's telling the story. Jones makes the more useful framing: it's both, and the gap between metric optimization and business outcome is the live problem.
    Key points
    • Klarna's Q3 2025 earnings: AI agent now does the work of 853 FTEs, saved $60M, handled 2.3M conversations in its first month, cut resolution times from 11 minutes to 2.
    • The CEO publicly admitted on Bloomberg that the strategy had backfired, and Klarna started hiring humans back: 'AI succeeded at the wrong thing'.
    • Frames this as 'intent engineering' — making organizational goals and tradeoffs machine-readable so autonomous systems optimize for what the company actually needs, not just what it can measure.
    • Cites MIT report that 95% of generative AI pilots fail to deliver measurable impact and Gartner's prediction that 40% of agentic AI projects will be cancelled by 2027.
    • Connects to the Microsoft Copilot pattern: 90% Fortune 500 'adoption' producing only 3.3% paid uptake — same intent gap, different scale.
    Provenance
    Article · Supporting source
  7. Qwen 3.6-27B vs Coder-Next — 20 hours of side-by-side compute

    Article Signal_Ad657 (LocalLLaMA) — LocalLLaMA poster running two RTX PRO 6000 Blackwells; the post hit 550 upvotes overnight.

    27B with thinking disabled was the most consistent shipper of work — 95.8% across the full 12-cell grid at N=10. The thinking-trace as loop substrate mechanism turned out to be real.

    www.reddit.com/r/LocalLLaMA/comments/1t2ab5… →
    Context
    A real homelab benchmark with a falsifiable claim — that the chain-of-thought trace itself is acting as scratch memory, not just reasoning — and a sharp methodology critique in the replies that anyone shopping for a local coder should read together.
    Key points
    • Side-by-side: Qwen 3.6-27B-thinking shipped 30/40 jobs, Coder-Next 25/40 — statistically tied with overlapping Wilson confidence intervals across N=10 cells.
    • Counter-intuitive headline result: 27B run with --no-think (thinking disabled) shipped 95.8% of jobs, the most consistent of the three; the thinking trace itself was acting as loop substrate, not just reasoning.
    • Documented word-trim loop on doc-synthesis halved with no-think (4/10 → 2/10) — substantive output preserved, only the verbosity of reasoning prose dropped.
    • 3.6-35B-A3B was a no-show: failed often enough that the author stopped carrying it through the comparison.
    • Top reply (viperx7, 93 upvotes) flags the test setup ignores quantization reality: at 24GB or 48GB VRAM the actual choice is different quants and offloading, not the FP8 head-to-head.
    Provenance
    Article · Supporting source
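The "statistically tied" reading of 30/40 versus 25/40 can be checked with the standard Wilson score interval. The sketch below assumes a 95% level (z = 1.96) and treats each model's 40 jobs as one pooled sample, which may differ from how the post's author grouped the cells; the function names are illustrative.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (default: about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


def intervals_overlap(a: tuple, b: tuple) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]
```

With these inputs, 30/40 comes out at roughly 0.60 to 0.86 and 25/40 at roughly 0.47 to 0.76, so the intervals overlap and the five-job gap is within what samples of this size produce by chance.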
  8. Anthropic just passed OpenAI in valuation and revenue

    Article Single-Jack8 (r/OpenAI) — Reddit user citing secondary-market and annualized-revenue numbers for both labs.

    somehow Anthropic lapped them without a single viral moment. no big launch, just enterprise deal after enterprise deal.

    www.reddit.com/r/OpenAI/comments/1t1so4m/an… →
    Context
    The framing matters more than the numbers. Anthropic flipping the order of the leaderboard without a viral moment is a different signal than another launch cycle: it's a story about distribution, not capability. The replies are a useful sanity check on how the numbers were computed.
    Key points
    • Claim: Anthropic at $39B annualized revenue vs OpenAI at $25B; secondary-market implied valuation north of $1T, over $100B ahead of OpenAI.
    • Top reply pushback: 'They calculate annualized revenue differently' — the comparison may not be apples-to-apples.
    • Second reply: 'imaginary annualized revenue and Anthropic has a bigger imagination' — points to the run-rate-as-revenue accounting practice both labs use.
    • Third reply (alpha_dosa): GPT-5.5 has pulled engineering attention back to Codex; Opus 4.7 had regression complaints the same week.
    • The interesting thread is not the headline but the pattern — Anthropic shipped no viral moment in this period, just enterprise wins.
    Provenance
    Article · Supporting source
  9. Software engineering jobs hit their highest posting since November 2023

    Article artemisgarden (r/singularity) — Singularity-subreddit post linking a hiring-data chart, with engineering managers replying in the thread.

    I lead a 10 person engineering team and I desperately need more people. I'd double headcount right now if the budget was there. We are busier than ever.

    www.reddit.com/r/singularity/comments/1t262… →
    Context
    A year of replacement-narrative headlines makes a hiring chart at a multi-year high worth reading carefully. The engineering manager in the thread gives the texture: faster yes, but not enough to keep up with demand for more software.
    Key points
    • Job-posting data shows software engineering listings at their highest level since November 2023, recovering from the multi-year post-ZIRP trough.
    • Top reply (m_atx, 323 upvotes) — engineering manager: 'we are busier than ever, and yes faster, but not nearly to the extent you'd think; the world wants a lot more software'.
    • Reading the chart together with the comment: AI tooling is producing more code per engineer, but demand is rising faster than throughput, so the pipeline pulls in more humans, not fewer.
    • Counter-narrative to a year of 'engineers are getting replaced' headlines from the same lab founders who are simultaneously hiring.
    • Caveat: a single chart from a single tracker; postings aren't the same as hires, and the mix has shifted toward senior roles.
    Provenance
    Article · Supporting source