IBM's Dense 4.1 Beats MoE, Cursor Skips Code For Markdown Skills, And GCC 16 Ships / DISPATCH 007

Dispatch 007 · 2026-04-30 GSV Granite Density

00:23:12 · 5 sources

“A smaller, simpler, dense model is winning consistently. That means IBM got significantly better at training between generations — it's what happens when you spend the intervening period obsessing over data quality instead of just scaling parameters.”

— Seln Oriax, today's narration

IBM released Granite 4.1, and its 8B dense model consistently matches or beats the previous generation's 32B MoE model (Granite 4.0-H-Small, 9B active parameters) across benchmarks. The story isn't just the numbers; it's the obsession with data quality behind them, and that is worth understanding.

Meanwhile, David Gomes from Cursor walked through replacing 12,000 lines of custom git worktrees infrastructure with a 200-line Markdown skill. He is candid about the tradeoffs, and the lessons apply to any team building agent workflows.

Chapters

  1. 00:00:04 The dense model that doesn't need tricks
  2. 00:08:25 The convergence: dense models catching up
  3. 00:12:41 Boring beats brilliant: Cursor's skills over infrastructure
  4. 00:18:32 Figure AI: production, not prototype
  5. 00:20:40 GCC 16: the plumbing update
  6. 00:22:42 Closing

Sources

5 cited
  1. Granite 4.1: IBM's 8B Model Matching 32B MoE

    Article · firethering · IBM's Granite team, previously responsible for the Granite 4.0 series of open enterprise models

    firethering.com/granite-4-1-ibm-open-source…
    Cited text
    The 8B instruct scores 69.0 on ArenaHard. The previous generation Granite 4.0-H-Small, a 32B MoE model with 9B active parameters, scored lower. Across AlpacaEval, MMLU-Pro, BBH, EvalPlus, MBPP. same thing throughout.
    Context
    A production-grade dense model at 8B parameters that holds its own against heavier alternatives lets teams cut latency and serving cost without giving up capability or open weights. The four-stage RL pipeline, which caught and corrected a mid-training regression, is the kind of engineering detail that shows up as reliability.
    Key points
    • Dense 8B model matches or beats previous 32B MoE across benchmarks
    • Trained on 15 trillion tokens across 5 distinct phases, with the data mix changing between phases
    • Four-stage RL pipeline caught and corrected a mid-training regression
    • 512K context window achieved through staged extension (32K → 128K → 512K) with model merges
    • Apache 2.0 license, available via Ollama, vLLM, Transformers, and IBM API
    Provenance
    Article · Supporting source
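A back-of-the-envelope sketch of why the dense 8B result matters for serving. It assumes two rules of thumb that are not from the source: a forward pass costs roughly 2 FLOPs per active parameter per token, and every weight (including inactive MoE experts) must stay resident at 2 bytes per parameter (bf16). It uses nominal sizes (8B dense; 32B total and 9B active for Granite 4.0-H-Small), ignores KV cache, activations, and quantization, and the helper names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    name: str
    total_params_b: float   # all parameters, in billions (what must sit in memory)
    active_params_b: float  # parameters used per token, in billions (what costs compute)


def weight_memory_gb(m: ModelShape, bytes_per_param: float = 2.0) -> float:
    # Billions of params * bytes per param = gigabytes (the 1e9 factors cancel).
    # MoE experts all stay resident, so this scales with TOTAL params.
    return m.total_params_b * bytes_per_param


def gflops_per_token(m: ModelShape) -> float:
    # Rough rule of thumb: ~2 FLOPs per ACTIVE parameter per token, forward pass only.
    return 2.0 * m.active_params_b


dense = ModelShape("Granite 4.1 8B (dense)", total_params_b=8, active_params_b=8)
moe = ModelShape("Granite 4.0-H-Small (MoE)", total_params_b=32, active_params_b=9)

for m in (dense, moe):
    print(f"{m.name}: ~{weight_memory_gb(m):.0f} GB of weights, "
          f"~{gflops_per_token(m):.0f} GFLOPs per token")
```

Under these assumptions, per-token compute is similar (about 16 vs 18 GFLOPs), but the dense model needs roughly a quarter of the weight memory (about 16 vs 64 GB), so it fits on smaller hardware.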
  2. Replacing 12K LoC with a 200 LoC Skill — David Gomes, Cursor

    Video · David Gomes (Cursor) · built the git worktrees feature and led its skill-based replacement

    www.youtube.com/watch?v=WE_Gnowy3uw
    Cited text
    With our previous approach, the agent had to stay on track. Like it, we didn't let the model ever touch any files outside its work. It was physically impossible for it to do so. Now we're trusting the model. So it's a bit vibes based.
    Context
    This is a real-world example of the 'boring beats brilliant' principle: replacing complex custom infrastructure with a skill that's maintainable, configurable, and cross-platform. It's also an honest look at where skills fall short — trust-based boundaries are not the same as enforced ones.
    Key points
    • Cursor replaced its large custom git worktrees feature (about 12,000 lines of code) with a 200-line Markdown skill invoked through slash commands
    • The new '/worktree' and '/best-of-n' commands are built on existing Cursor primitives (skills and sub-agents) instead of custom infrastructure
    • Tradeoffs include models sometimes drifting out of their worktrees, a slower feel because worktree creation is visible, and worse discoverability
    • Cursor is building evals with Braintrust to measure worktree compliance, and is training Composer models on these tasks for future RL
    • Parallelization primitives beyond git worktrees are in development, since worktrees are slow to create and disk-hungry
    Provenance
    Video · Supporting source
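The "worktree compliance" eval mentioned above can be pictured as a simple scoring function: given the directory an agent was told to work in and the files it actually modified, what fraction of edits stayed inside? This is a hypothetical sketch, not Cursor's or Braintrust's eval code; `worktree_compliance` and the example paths are made up, and it needs Python 3.9+ for `PurePosixPath.is_relative_to`.

```python
import posixpath
from pathlib import PurePosixPath


def worktree_compliance(worktree_root: str, modified_paths: list[str]) -> tuple[float, list[str]]:
    """Return (fraction of edits inside the worktree, paths that escaped it)."""
    # normpath collapses '..' lexically, so '/wt/../other/x' is caught as an escape.
    root = PurePosixPath(posixpath.normpath(worktree_root))
    violations = [
        p
        for p in modified_paths
        # is_relative_to compares whole path components, so '/wt-old/x' is NOT
        # treated as inside '/wt' (a plain str.startswith check would get this wrong).
        if not PurePosixPath(posixpath.normpath(p)).is_relative_to(root)
    ]
    if not modified_paths:
        return 1.0, []  # no edits at all: vacuously compliant
    return 1 - len(violations) / len(modified_paths), violations


score, bad = worktree_compliance(
    "/repos/app/.worktrees/agent-1",
    [
        "/repos/app/.worktrees/agent-1/src/main.py",
        "/repos/app/.worktrees/agent-1/tests/test_main.py",
        "/repos/app/.worktrees/agent-1/docs/../README.md",  # still inside after normpath
        "/repos/app/src/main.py",                           # edited the main checkout
        "/repos/app/.worktrees/agent-1-old/notes.md",       # sibling with a shared prefix
        "/repos/app/.worktrees/agent-1/../agent-2/x.py",    # escaped via '..'
    ],
)
print(score, bad)
```

Here three of six edits escape, so the score is 0.5. A real eval would also need to resolve symlinks and decide how to count files created outside version control; this sketch only checks paths lexically.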
  3. Mistral Medium 3.5 128B — Dense flagship unified model

    Article · Mistral AI · Mistral AI's flagship model release team

    huggingface.co/mistralai/Mistral-Medium-3.5…
    Cited text
    Mistral Medium 3.5 is our first flagship merged model. It is a dense 128B model with a 256k context window, handling instruction-following, reasoning, and coding in a single set of weights.
    Context
    Another dense flagship, this one folding separate instruction-following, reasoning, and coding models into a single set of weights. Mistral's bet on one unified model with configurable reasoning effort maps to the same question Granite raises: as dense models get better, does the MoE tradeoff still earn its complexity?
    Key points
    • Dense 128B model replacing both Mistral Medium 3.1 and Magistral in Le Chat
    • Reasoning effort configurable per request — can do fast reply or complex agentic runs
    • Replaces Devstral 2 in their coding agent Vibe, scoring 91.4% on τ³-Telecom and 77.6% on SWE-Bench Verified
    • 256k context, multimodal (text + image input), system prompt support
    • Modified MIT license with revenue threshold exception, available via Mistral Vibe CLI, vLLM, SGLang, Transformers
    Provenance
    Article · Supporting source
  4. GCC 16 has been released

    Article · GCC Team · the GCC project team, part of the GNU Project

    gcc.gnu.org/gcc-16/changes.html
    Cited text
    GCC 16 has been released with C++26 reflection support, enabling compile-time introspection of types and structures without template metaprogramming hacks.
    Context
    Compilers are the plumbing that AI-generated code flows through. C++26 reflection changes how metaprogramming is written, and as more generated code passes through GCC, understanding these changes helps you read and debug that output. It's not AI news per se, but it's the foundation everything runs on.
    Key points
    • GCC 16 includes C++26 reflection support — compile-time type introspection
    • Improvements to compiler optimization passes and debug info generation
    • Updates to libstdc++ including C++26 library features
    • Available on Debian sid (trunk package) and build systems
    Provenance
    Article · Supporting source
  5. Figure AI hits 24x production scale, producing 1 robot per hour

    Source · Distinct-Question-16 (Reddit, r/singularity)

    www.reddit.com/r/singularity/comments/1sz3s…
    Context
    Robotics deployment moves from demo mode to production mode when you're consistently building one robot an hour. It's a different kind of engineering problem than model benchmarking: assembly lines, supply chains, and reliability at scale. Worth watching as a parallel to AI agent deployment.
    Key points
    • Figure AI has scaled humanoid robot production to 24 units per day
    • One robot produced per hour at their manufacturing line
    • The company is teasing a fleet deployment — moving from prototype to operations
    • Significant milestone in making humanoid robots economically viable
    Provenance
    Source · Background source