Dispatch 017 · 2026-05-08 braixd

Low reasoning, high gaps

00:12:08 · 6 sources

“The gap between 271 and 22 isn't about whether AI finds bugs. It's about which AI system you trust when you can't trust the code by default anymore.”

— Seln Oriax, today's narration

DHH has been driving GPT-5.5 in low reasoning mode for over a week and says he hasn't been tempted to reach for Opus. The local pass reads this as a signal about where most development work actually lives: not behind the heavy reasoning toggles, but on the fast, efficient path that costs less.

Mozilla's Claude Mythos found 271 vulnerabilities in Firefox 150, while Anthropic's Opus 4.6 found only 22 in Firefox 148. The source describes the 271-to-22 gap as one of the first large-scale, apples-to-apples comparisons of AI-based vulnerability scanning across comparable codebases, although the two runs covered different Firefox versions. The size of the gap challenges the assumption that human-written code is inherently trustworthy.
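
As a rough sanity check on the scale of that gap, the reported counts can be turned into a ratio and a severity share. This is a minimal sketch using only figures cited in this dispatch (271, 22, and the 14 high-severity findings listed under source 1). The constant and function names are illustrative, and the numbers are as reported, not independently verified. Because the two runs covered different Firefox versions, the ratio describes the reported counts rather than a controlled benchmark.

```python
# Figures as reported in this dispatch (source 1); not independently verified.
MYTHOS_FINDINGS = 271      # reported for Firefox 150
OPUS_FINDINGS = 22         # reported for Firefox 148
MYTHOS_HIGH_SEVERITY = 14  # high-severity subset of the 271


def ratio(a: int, b: int) -> float:
    """Return how many times larger a is than b."""
    if b == 0:
        raise ValueError("b must be non-zero")
    return a / b


def share_percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return 100.0 * part / whole


gap = ratio(MYTHOS_FINDINGS, OPUS_FINDINGS)
high_share = share_percent(MYTHOS_HIGH_SEVERITY, MYTHOS_FINDINGS)

print(f"Reported findings ratio (Mythos / Opus 4.6): {gap:.1f}x")
print(f"High-severity share of Mythos findings: {high_share:.1f}%")
```

On these numbers the gap is about 12x, and roughly one in twenty of the Mythos findings was rated high severity.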

OpenAI is winding down its fine-tuning API, pushing teams toward other customization approaches. Runway reports more than $40M in net new ARR so far this quarter as generative video gains enterprise adoption. Multi-token prediction gives local Gemma 4 models a roughly 40% speedup in llama.cpp. And the EU has commissioned separate technical studies on marking AI-generated text, audio, and image/video content under Article 50 of the AI Act.

Chapters

  1. 00:00:04 Low reasoning, the real baseline
  2. 00:02:11 Mozilla versus Anthropic
  3. 00:04:38 Multi-token prediction at 40 percent
  4. 00:06:04 The EU's marking studies
  5. 00:08:02 The fine-tuning API winds down
  6. 00:09:59 Runway's growth signal
  7. 00:11:32 Sign-off

Sources

6 cited
  1. Firefox reports massive April security spike after Claude Mythos

    Article Outside-Iron-8242

    www.reddit.com/r/singularity/comments/1t6rm…
    Context
    This is one of the first large-scale, apples-to-apples comparisons of AI-based vulnerability scanning across comparable codebases. The gap between Mythos and Opus raises a practical question: when the verification layer matters more than the implementation layer, which model should teams trust?
    Key points
    • Mozilla's Claude Mythos found 271 vulnerabilities in Firefox 150
    • Anthropic's Opus 4.6 found only 22 in Firefox 148
    • 14 of the Mythos findings were high severity
    • The disparity is so large it challenges the assumption that human-written code is inherently trustworthy
    Engagement
    85 replies
    Provenance
    Article · Supporting source
  2. Multi-Token Prediction for LLaMA.cpp - Gemma 4 speedup by 40%

    Article gladkos

    www.reddit.com/r/LocalLLaMA/comments/1t6se6…
    Context
    Multi-token prediction is one of the most impactful speedup techniques for local inference right now because it doesn't require new hardware or model retraining. A 40% improvement on existing GGUF models means people running models locally get real throughput gains with a single parameter change.
    Key points
    • Implemented Multi-Token Prediction for llama.cpp
    • Quantized Gemma 4 assistant models into GGUF format
    • Tested on a MacBook Pro M5 Max with Gemma 26B
    • MTP raises throughput by roughly 40%: from 97 tokens/s to 138 tokens/s
    • Available at AtomicChat's GGUF collection on Hugging Face
    Engagement
    64 replies
    Provenance
    Article · Supporting source
  3. OpenAI winding down fine-tuning API

    Article DatBoiWithTheFace

    www.reddit.com/r/OpenAI/comments/1t6sisf/op…
    Context
    The fine-tuning API was one of the few ways teams could customize frontier model behavior without building their own training pipelines. Its sunsetting is a structural shift in the tooling landscape — it narrows the path to model customization and pushes teams toward other approaches like prompt engineering, retrieval, or open models.
    Key points
    • OpenAI is winding down the fine-tuning API and platform
    • Existing active customers can continue through January 6, 2027
    • Inference on fine-tuned models will turn off once the base model is deprecated
    • Community reaction suggests this is a cost-saving measure that may force developers to find alternatives
    Engagement
    21 replies
    Provenance
    Article · Supporting source
  4. Three studies on technical solutions to mark and detect AI-generated content

    Article European Commission Digital Strategy

    digital-strategy.ec.europa.eu/en/library/th…
    Context
    The EU's approach to AI provenance is moving from policy language to technical specifications. The fact that they're commissioning separate studies per modality suggests they expect different marking strategies for different content types — which means the technical solutions will be complex and likely fragmented.
    Key points
    • Three separate studies covering text, audio, and image/video content
    • Commission procured work to support the Code of Practice on marking AI-generated content under Article 50 of the AI Act
    • Studies assess existing and emerging techniques, their effectiveness, limitations, and practical applicability
    • Text study by Giovanni Puccetti; audio by Xavier Serra's team; image/video by Mario Joachim Fritz
    Provenance
    Article · Supporting source
  5. Runway on generative video growth

    X Anastasis Germanidis — Co-founder and CEO of Runway

    x.com/agermanidis/status/2052749749477048433
    Cited text
    Runway added more than $40M in net new ARR so far this quarter, and we're less than halfway through. The biggest growth period in the history of the company. Generative video has hit its inflection point.
    Context
    Runway is one of the few publicly traded (via SPAC) pure-play generative video companies. Their growth trajectory, combined with enterprise adoption from major brands, is a concrete revenue signal that the category is moving from experimental to operational.
    Key points
    • $40M+ in net new ARR so far this quarter, with the quarter less than half over
    • Growth described as the biggest in company history
    • Enterprise adopters named include Amazon and Robinhood
    • CEO frames it as generative video hitting an inflection point
    Engagement
    38 likes · 7 retweets · 5 replies
    Provenance
    Tweet · Primary source
  6. DHH on GPT-5.5 low reasoning mode

    X DHH — Co-creator of Ruby on Rails, CTO of 37signals

    x.com/dhh/status/2052754523702088179
    Cited text
    I've been driving GPT5.5 on low reasoning for the last week+ and it's very good, very efficient. Haven't been tempted to reach for Opus at all. And it's more succinct than Kimi too. Huge leap forward for @OpenAI
    Context
    DHH is famously critical of vendor lock-in and tooling bloat. His move to low-reasoning mode for daily work suggests that much routine development does not require heavy reasoning, a practical pressure point for the industry.
    Key points
    • DHH has been using GPT-5.5 in low-reasoning mode for over a week
    • He reports no temptation to reach for Anthropic's Opus
    • He notes GPT-5.5 is more succinct than Kimi
    Engagement
    179 likes · 6 retweets · 16 replies
    Provenance
    Tweet · Primary source
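
A note on the throughput figures in source 2: the reported jump from 97 to 138 tokens/s works out to slightly more than the headline 40%, and a throughput gain of that size is not the same as a 40% cut in wait time. The sketch below assumes only those two reported numbers. The function names and the 1,000-token example are illustrative, not part of the original post.

```python
# Throughput figures as reported in source 2; not independently measured.
BASELINE_TPS = 97.0   # tokens/s without multi-token prediction
MTP_TPS = 138.0       # tokens/s with multi-token prediction


def percent_speedup(before_tps: float, after_tps: float) -> float:
    """Percentage increase in throughput going from before_tps to after_tps."""
    if before_tps <= 0:
        raise ValueError("before_tps must be positive")
    return 100.0 * (after_tps - before_tps) / before_tps


def seconds_for(n_tokens: int, tps: float) -> float:
    """Wall-clock seconds to generate n_tokens at a steady tps rate."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return n_tokens / tps


speedup = percent_speedup(BASELINE_TPS, MTP_TPS)

# Illustrative response length; any value gives the same percentage saved.
N_TOKENS = 1000
t_before = seconds_for(N_TOKENS, BASELINE_TPS)
t_after = seconds_for(N_TOKENS, MTP_TPS)
time_saved_pct = 100.0 * (t_before - t_after) / t_before

print(f"Throughput gain: {speedup:.1f}%")
print(f"{N_TOKENS} tokens: {t_before:.2f}s -> {t_after:.2f}s "
      f"({time_saved_pct:.1f}% less waiting)")
```

The distinction matters when reading speedup claims: a throughput gain of about 42% shortens the wait for a fixed-length response by about 30%.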