Dispatch 024 · 2026-05-15 Braixd

Open algorithms, closed weights, and the arithmetic of AI tooling

00:09:30 / 8 sources

“Open-sourcing the algorithm changes the conversation from trust us to inspect it yourself. But the part that actually decides your feed — the ranking model — is still proprietary.”

— Seln Oriax, today's narration

X open-sourced its recommendation algorithm, but the model it calls isn't public. Cloudflare ran a benchmark in which MCP dispatch cost 8.4× as much as SDK-based agent coding. arXiv drew a hard line on unchecked LLM output. And Anthropic's Mythos raises the cost-vs-safety tension we keep running into.

Also: Osaurus, the local-plus-cloud Mac harness, and Figure AI's 30-hour robot run. A Friday of infrastructure stories.

Chapters

  1. 00:00:04 X's open-sourced recommendation algorithm
  2. 00:01:47 Cloudflare's Code Mode vs MCP benchmark
  3. 00:03:52 arXiv's LLM policy enforcement
  4. 00:05:36 The Mythos cost-and-safety question
  5. 00:07:44 Smaller stories: Osaurus, hardware, and robot endurance

Sources

8 cited
  1.
    x-algorithm repository — X's open-sourced recommendation algorithm

    Article xAI — xAI open-sourced the ranking code under the xai-org organization on GitHub

    github.com/xai-org/x-algorithm →
    Context
    If the code is truly inspectable, users and researchers can audit what the feed rewards and penalizes, something X says no other major platform has offered. The gap between code and model weights means the transparency is real but partial.
    Key points
    • X published the full recommendation algorithm powering its For You feed on GitHub
    • Built with the same transformer architecture as Grok's Phoenix model
    • X claims to be the only major platform to publicly release its core ranking algorithm
    • Ranking model weights are not open — the code that calls the model is, but the model itself is proprietary
    Engagement
    1357 likes · 308 retweets · 305 replies
    Provenance
    Article · Supporting source
  2.
    Daniel Meacham on X's algorithm transparency limits

    X Daniel Meacham — Software engineer and transparency advocate

    "the code is open, the ranking model it calls isn't. that's the part that actually decides your feed"

    x.com/DMMeacham/status/2055295503756153200 →
    Key points
    • The ranking model weights remain proprietary even though the surrounding code is public
    • Open code reveals structure and features, but not the learned parameters
    • Transparency is real but incomplete — the part that actually ranks is still opaque
    Provenance
    Tweet · Primary source
  3.
    Code Mode for a complex API: why a coding agent doesn't need MCP

    Article Yoni Braslaver — Cloudflare engineer

    www.cloudflare.com/blog/code-mode-for-a-com… →
    Context
    This is one of the first concrete benchmarks comparing agent coding architectures, and it favors a simpler approach over the tool-hub pattern that MCP was supposed to standardize.
    Key points
    • Cloudflare compared SDK-based agent coding against MCP-based approaches on their GraphQL API
    • SDK: 1 step, 15k tokens to produce the same output
    • Real MCP server: 4 steps, 158k tokens — 8.4× the token cost for identical results
    • The experiment suggests that for code generation tasks, direct SDK bindings beat tool-search + MCP dispatch
    Provenance
    Article · Supporting source
  4.
    Yoni Braslaver's Code Mode vs MCP benchmark

    X Yoni Braslaver — Cloudflare engineer running the benchmark

    "We ran the experiment on monday's GraphQL API. SDK: 1 step, 15k tokens. Real MCP server: 4 steps, 158k tokens. 8.4× the cost, same output."

    x.com/YoniBraslaver/status/2055260079700791… →
    Key points
    • Direct SDK binding: 15k tokens for one step
    • MCP dispatch: 158k tokens across four steps
    • Same output, vastly different cost
    Provenance
    Tweet · Primary source
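The headline multiple can be checked against the rounded figures quoted above. The sketch below is illustrative arithmetic only: the names are hypothetical, and it assumes "cost" means raw token count, which the source does not spell out. With the rounded 15k and 158k counts the ratio comes out near 10.5×, not 8.4×, so the quoted multiple presumably rests on unrounded counts or on a different cost measure. An SDK count of roughly 18.8k tokens would reproduce 8.4× against 158k.

```python
# Illustrative arithmetic on the rounded token counts quoted in the tweet.
# Assumption: "cost" is measured as total tokens; all names are hypothetical.

def cost_ratio(baseline_tokens: int, comparison_tokens: int) -> float:
    """Return how many times larger the comparison is than the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return comparison_tokens / baseline_tokens

SDK_TOKENS = 15_000    # "SDK: 1 step, 15k tokens"
MCP_TOKENS = 158_000   # "Real MCP server: 4 steps, 158k tokens"
CLAIMED_RATIO = 8.4    # "8.4x the cost"

# Ratio implied by the rounded figures as quoted.
ratio_from_rounded = cost_ratio(SDK_TOKENS, MCP_TOKENS)

# SDK token count that would make the claimed ratio exact against 158k.
implied_sdk_tokens = MCP_TOKENS / CLAIMED_RATIO

print(f"ratio from rounded figures: {ratio_from_rounded:.1f}x")
print(f"SDK tokens implied by {CLAIMED_RATIO}x: {implied_sdk_tokens:,.0f}")
```

Either way the direction of the result is unchanged: the MCP path used many times more tokens than the SDK path for the same output.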
  5.
    Too dangerous to release or just too expensive? The real reason Anthropic is hiding its most powerful AI

    Article Curtis Pyke — journalist and researcher covering AI safety and policy

    "This is an attempt to weigh that evidence carefully."

    kingy.ai/ai/too-dangerous-to-release-or-jus… →
    Context
    The article lays out both the safety argument and the compute-economics argument in parallel. Neither fully explains the other. That tension is itself a story about how frontier model economics constrain the safety narratives we hear.
    Key points
    • Anthropic's Mythos Preview requires invitation-only access through Project Glasswing, a 40-org program
    • Pricing at $25/M input tokens, $125/M output tokens during preview
    • Frontier Red Team documented zero-day vulnerability discovery at scale as the safety concern
    • Anthropic simultaneously announced compute deals with Google/Broadcom and CoreWeave for infrastructure expansion
    • Per the article, Mythos is the only frontier model tested against real, previously undisclosed software flaws during red-teaming
    Provenance
    Article · Supporting source
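For a sense of scale, the preview prices listed above translate into per-request cost as follows. This is a minimal sketch: the function name and the example token counts are hypothetical, and it assumes simple linear per-token billing with no caching or batch discounts, which the source does not confirm.

```python
# Preview prices as listed in the key points (USD per million tokens).
INPUT_USD_PER_MTOK = 25.0
OUTPUT_USD_PER_MTOK = 125.0

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming linear per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_micro_usd = (input_tokens * INPUT_USD_PER_MTOK
                       + output_tokens * OUTPUT_USD_PER_MTOK)
    return total_micro_usd / 1_000_000

# Hypothetical request: 100,000 input tokens and 10,000 output tokens.
example_cost = request_cost_usd(100_000, 10_000)
print(f"${example_cost:.2f}")
```

At these rates an output token costs five times an input token, so output-heavy workloads dominate the bill.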
  6.
    arXiv implements 1-year ban for papers containing incontrovertible evidence of unchecked LLM-generated errors

    Article Thomas G. Dietterich — arXiv moderator for cs.LG

    www.reddit.com/r/MachineLearning/comments/1… →
    Context
    This is the first formal arXiv policy drawing a line between author responsibility and LLM generation. It reframes the question of who is responsible for AI-assisted content in academic publishing.
    Key points
    • arXiv clarified penalties for papers with unchecked LLM output
    • The penalty is a 1-year ban from arXiv, plus a requirement that future submissions first be accepted at a reputable peer-reviewed venue
    • Examples of 'incontrovertible evidence': hallucinated references, meta-comments from the LLM left in the paper
    Engagement
    455 likes · 39 replies
    Provenance
    Article · Supporting source
  7.
    Osaurus brings both local and cloud AI models to your Mac

    Article Sarah Perez — senior reporter at TechCrunch covering AI

    techcrunch.com/2026/05/15/osaurus-brings-bo… →
    Context
    The app's approach, letting users run their own local models with cloud fallback, reflects a growing split in the market between cloud-only AI and hybrid local-cloud setups. The hardware requirements show what's needed for practical local inference today.
    Key points
    • Osaurus is an open-source Mac app that combines local and cloud AI models
    • Runs models through a harness architecture with a hardware-isolated sandbox
    • Requires a minimum of 64 GB RAM; 128 GB recommended for larger models like DeepSeek V4
    • Supports MiniMax M2.5, Gemma 4, Qwen3.6, GPT-OSS, Llama, DeepSeek V4, and others
    • Over 20 native plugins including Mail, Calendar, Git, Filesystem, and Browser
    • 112,000+ downloads since launch nearly a year ago
    Provenance
    Article · Supporting source
  8.
    Figure AI 03 keeps working for over 30 hours straight

    Article Figure AI

    www.reddit.com/r/singularity/comments/1tdei… →
    Context
    The endurance test highlights the gap between current humanoid capability and what human operators need. It's a benchmark for robot autonomy, not just intelligence.
    Key points
    • Figure AI 03 humanoid demonstrated 30+ hours of continuous operation
    • No scheduled bathroom breaks or downtime for maintenance during the run
    Engagement
    2046 likes · 674 replies
    Provenance
    Article · Supporting source