Dispatch 010 · 2026-05-22 GSV We Found It, Now Somebody Has To Own It

When Discovery Gets Cheap

00:11:57 / 6 sources

“Finding the flaw is becoming the cheap part; deciding whether it is real, who owns it, and how fast it gets patched is where the system now bends.”

— Lenar Kess, today's narration

Friday's CONSTRUCT follows one tension through security, coding agents, and local runtimes: AI systems now produce findings and code faster than teams can verify, prioritize, and safely land them.

  • Anthropic's Project Glasswing update says Claude Mythos Preview and roughly fifty partners found more than ten thousand high- or critical-severity vulnerabilities, which moves the hard work from discovery to triage, disclosure, and patch deployment.
  • Anthropic's Glasswing thread puts the headline number into public circulation and frames the volume problem directly: the software industry has to adapt to what models can now find.
  • Sarah Chieng's AI Engineer talk on fast coding models argues that Codex Spark's 1,200-token-per-second generation changes developer practice only if validation, review, and refactoring move into the inner loop.
  • Letta's announcement that Letta Code can run locally shows the same pressure in agent tooling: local execution, local memory, and local model support are useful only when provenance and sync rules stay explicit.
  • Artificial Analysis on Cursor Composer 2.5 pricing adds the cost side: cheaper task completion can change tool choice, but it doesn't remove the need for review discipline.

Sources

6 cited
  1. Project Glasswing: An initial update

    Article Anthropic — AI lab reporting initial results from its collaborative cybersecurity initiative

    Progress on software security used to be limited by how quickly we could find new vulnerabilities. Now it’s limited by how quickly we can verify, disclose, and patch the large numbers of vulnerabilities found by AI.

    www.anthropic.com/research/glasswing-initia… →
    Context
    It moves the security story from model capability to the human and maintainer capacity required to turn findings into safer deployed software.
    Key points
    • Anthropic says roughly fifty partners found more than ten thousand high- or critical-severity vulnerabilities with Claude Mythos Preview.
    • The open-source scan covered more than one thousand projects and estimated 6,202 high- or critical-severity vulnerabilities.
    • Of 1,752 assessed high- or critical-estimated findings, 90.6 percent were valid true positives and 62.4 percent were confirmed high or critical.
    • Maintainers have asked Anthropic to slow disclosures, and Anthropic says an average high- or critical-severity bug found by Mythos Preview takes two weeks to patch.
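The validation percentages in the key points can be turned into approximate counts. The following is a minimal sketch in Python, assuming both percentages are taken over the 1,752 assessed findings, which is how the key point reads. The function name and dictionary keys are hypothetical, and the resulting counts are derived here by rounding; the source reports only the percentages.

```python
def funnel_counts(assessed: int, valid_pct: float, confirmed_pct: float) -> dict:
    """Convert reported percentages into approximate whole-finding counts.

    Assumption: both percentages share the same denominator (`assessed`).
    """
    valid = round(assessed * valid_pct / 100)
    confirmed = round(assessed * confirmed_pct / 100)
    return {
        "assessed": assessed,
        "valid_true_positives": valid,
        "confirmed_high_or_critical": confirmed,
        # Findings that did not hold up as valid on assessment.
        "not_valid": assessed - valid,
    }


counts = funnel_counts(assessed=1752, valid_pct=90.6, confirmed_pct=62.4)
for label, value in counts.items():
    print(f"{label}: {value}")
```

Under that assumption, roughly 1,587 of the assessed findings were valid and roughly 1,093 were confirmed high or critical, each of which still needs an owner, a disclosure, and a patch.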
    Provenance
    Article · Supporting source
  2. Anthropic Project Glasswing thread

    Thread Anthropic — Official Anthropic X account

    Since then, we and our partners have found more than ten thousand high- or critical-severity vulnerabilities in essential software.

    x.com/AnthropicAI/status/2057909102542549503 →
    Context
    It shows how the headline claim is being received: not as a simple win, but as a workload and coordination problem.
    Key points
    • The thread puts the ten-thousand-plus vulnerability number into public circulation.
    • A follow-up says the software industry will need to adapt to the volume of vulnerabilities models like Claude Mythos Preview can find.
    • Replies focus on triage, novelty, patch capacity, and maintainer workload.
    Provenance
    Thread · Primary source
  3. Fast Models Need Slow Developers — Sarah Chieng, Cerebras

    Video Sarah Chieng — Head of developer experience at Cerebras, speaking at AI Engineer

    Unless we fix them, they're going to start generating 1,200 tokens per second of bad code.

    www.youtube.com/watch?v=TeGsFFNqRLA →
    Context
    It gives the developer-practice version of the same bottleneck: faster generation only helps if checking becomes part of the loop.
    Key points
    • Chieng says Codex Spark generates code at 1,200 tokens per second, compared with roughly 40 to 60 tokens per second for models in the Sonnet or Opus families.
    • She argues that validation becomes cheap enough to run continuously: tests, linting, pre-commit checks, diff review, and browser QA.
    • She recommends larger models for planning and faster models for execution, with successful sessions captured as reusable skills.
    • She warns against massive prompts, one-shot attempts, huge commits, and unverified agent swarms.
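To make the speed gap concrete, here is a rough sketch in Python of wall-clock generation time at the rates Chieng cites. The 6,000-token change size is a hypothetical figure chosen for illustration, and the calculation ignores network latency, tool calls, and time to first token.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` at a steady generation rate."""
    return tokens / tokens_per_second


CHANGE_TOKENS = 6_000  # hypothetical size of one generated change

FAST_RATE = 1200           # tokens/s cited for Codex Spark
SLOW_RATE_LOW = 40         # low end cited for Sonnet or Opus families
SLOW_RATE_HIGH = 60        # high end cited for Sonnet or Opus families

fast = generation_seconds(CHANGE_TOKENS, FAST_RATE)
slow_best = generation_seconds(CHANGE_TOKENS, SLOW_RATE_HIGH)
slow_worst = generation_seconds(CHANGE_TOKENS, SLOW_RATE_LOW)

# Ratio of the fast rate to each end of the slower range.
speedup_low = FAST_RATE / SLOW_RATE_HIGH
speedup_high = FAST_RATE / SLOW_RATE_LOW

print(f"fast model: {fast:.0f} s")
print(f"slower models: {slow_best:.0f} to {slow_worst:.0f} s")
print(f"speedup: {speedup_low:.0f}x to {speedup_high:.0f}x")
```

Under these assumptions generation drops from minutes to seconds, so the time spent on tests, linting, and diff review becomes the dominant part of the loop.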
    Provenance
    Video · Supporting source
  4. Letta Code local execution announcement

    Thread Letta — AI agent tooling company

    Letta Code can now run fully locally with an embedded server - no login or Docker required

    x.com/Letta_AI/status/2057908120102609062 →
    Context
    It places agent memory and runtime trust inside the operator's machine, making provenance and sync behavior central product questions.
    Key points
    • Letta says Code can run fully locally with an embedded server.
    • Memory is stored locally and can be synced to GitHub with a memory repository command.
    • The update includes built-in support for local LLMs.
    Provenance
    Thread · Primary source
  5. Artificial Analysis on Cursor Composer 2.5 cost per task

    Thread Artificial Analysis — AI model and product benchmarking account

    Cursor Composer 2.5's is 3–18x cheaper than Opus 4.7 in Claude Code (medium reasoning), and 5–32x cheaper than GPT-5.5 in Codex (medium) based on API pricing

    x.com/ArtificialAnlys/status/20579144371564… →
    Context
    It frames the next agent-tool competition around cost per completed and verified task.
    Key points
    • The post compares coding-tool economics using cost per task rather than token price alone.
    • It claims Cursor Composer 2.5 is materially cheaper than Opus 4.7 and GPT-5.5 under the stated conditions.
    • The episode treats the claim as a pricing analysis, not as final evidence of equivalent checked output.
    Provenance
    Thread · Primary source
  6. DHH on Omarchy 4 and GPT-5.5-generated QML

    Thread DHH — Software developer and creator of Ruby on Rails

    The Omarchy 4 branch is now 30,000 lines of new code. The majority of it was written by GPT5.5.

    x.com/dhh/status/2057907663967543618 →
    Details
    Cited text
    The Omarchy 4 branch is now 30,000 lines of new code. The majority of it was written by GPT5.5.
    Context
    It gives a concrete software-engineering example where generated scale may be valuable, while still depending on disciplined review.
    Key points
    • DHH says the Omarchy 4 branch has thirty thousand lines of new code, mostly written by GPT-5.5.
    • He says GPT-5.5 has been strong at QML and that review remains necessary.
    • The example is used as a migration case where generated volume can be useful if review and testing keep up.
    Provenance
    Thread · Primary source