◆ Dispatch 024 · 2026-05-15 Braixd
Open algorithms, closed weights, and the arithmetic of AI tooling
“Open-sourcing the algorithm changes the conversation from trust us to inspect it yourself. But the part that actually decides your feed — the ranking model — is still proprietary.”
— Seln Oriax, today's narration
X open-sourced its recommendation algorithm — but the model it calls isn't public. Cloudflare ran a benchmark in which MCP dispatch cost 8.4× as much as SDK-based agent coding for the same output. arXiv drew a hard line on unchecked LLM output. And Anthropic's Mythos raises the cost-vs-safety tension we keep running into.
Also: Osaurus, the local-plus-cloud Mac harness, and Figure AI's 30-hour robot run. A Friday of infrastructure stories.
Chapters
- 00:00:04 X's open-sourced recommendation algorithm
- 00:01:47 Cloudflare's Code Mode vs MCP benchmark
- 00:03:52 arXiv's LLM policy enforcement
- 00:05:36 The Mythos cost-and-safety question
- 00:07:44 Smaller stories: Osaurus, hardware, and robot endurance
Sources
8 cited
1
x-algorithm repository — X's open-sourced recommendation algorithm
Article xAI — xAI open-sourced the ranking code under the xai-org organization on GitHub
github.com/xai-org/x-algorithm
- Context
- If the code is truly inspectable, users and researchers can audit what the feed rewards and penalizes — something no other major platform has offered. The gap between code and model weights means the transparency is real but partial.
- Key points
- X published the full recommendation algorithm powering its For You feed on GitHub
- Built with the same transformer architecture as Grok's Phoenix model
- X claims to be the only major platform to publicly release its core ranking algorithm
- Ranking model weights are not open — the code that calls the model is, but the model itself is proprietary
- Engagement
- 1357 likes · 308 retweets · 305 replies
- Provenance
- Article · Supporting source
2
Daniel Meacham on X's algorithm transparency limits
X Daniel Meacham — Software engineer and transparency advocate
x.com/DMMeacham/status/2055295503756153200
- Cited text
"the code is open, the ranking model it calls isn't. that's the part that actually decides your feed"
- Key points
- The ranking model weights remain proprietary even though the surrounding code is public
- Open code reveals structure and features, but not the learned parameters
- Transparency is real but incomplete — the part that actually ranks is still opaque
- Provenance
- Tweet · Primary source
3
Code Mode for a complex API: why a coding agent doesn't need MCP
Article Yoni Braslaver — Cloudflare engineer
www.cloudflare.com/blog/code-mode-for-a-com…
- Context
- This is one of the first concrete benchmarks comparing agent coding architectures, and it favors a simpler approach over the tool-hub pattern that MCP was supposed to standardize.
- Key points
- Cloudflare compared SDK-based agent coding against MCP-based approaches on a complex GraphQL API
- SDK: 1 step, 15k tokens to produce the same output
- Real MCP server: 4 steps, 158k tokens; 8.4× the cost for identical results, per Braslaver (the raw token count alone is about 10.5× higher)
- The experiment suggests that for code generation tasks, direct SDK bindings beat tool-search + MCP dispatch
- Provenance
- Article · Supporting source
4
Yoni Braslaver's Code Mode vs MCP benchmark
X Yoni Braslaver — Cloudflare engineer running the benchmark
x.com/YoniBraslaver/status/2055260079700791…
- Cited text
"We ran the experiment on monday's GraphQL API. SDK: 1 step, 15k tokens. Real MCP server: 4 steps, 158k tokens. 8.4× the cost, same output."
- Key points
- Direct SDK binding: 15k tokens for one step
- MCP dispatch: 158k tokens across four steps
- Same output, vastly different cost
- Provenance
- Tweet · Primary source
5
Too dangerous to release or just too expensive? The real reason Anthropic is hiding its most powerful AI
Article Curtis Pyke — journalist and researcher covering AI safety and policy
kingy.ai/ai/too-dangerous-to-release-or-jus…
- Cited text
"This is an attempt to weigh that evidence carefully."
- Context
- The article lays out both the safety argument and the compute-economics argument in parallel. Neither fully explains the other. That tension is itself a story about how frontier model economics constrain the safety narratives we hear.
- Key points
- Anthropic's Mythos Preview requires invitation-only access through Project Glasswing, a 40-org program
- Pricing at $25/M input tokens, $125/M output tokens during preview
- Frontier Red Team documented zero-day vulnerability discovery at scale as the safety concern
- Anthropic simultaneously announced compute deals with Google/Broadcom and CoreWeave for infrastructure expansion
- Mythos is the only frontier model tested against real, previously undisclosed software flaws during red-teaming
- Provenance
- Article · Supporting source
6
arXiv implements 1-year ban for papers containing incontrovertible evidence of unchecked LLM-generated errors
Article Thomas G. Dietterich — arXiv moderator for cs.LG
www.reddit.com/r/MachineLearning/comments/1…
- Context
- This is the first formal arXiv policy drawing a line between author responsibility and LLM generation. It reframes the question of who is responsible for AI-assisted content in academic publishing.
- Key points
- arXiv clarified penalties for papers with unchecked LLM output
- Penalty is a 1-year ban from arXiv plus requirement that future submissions be accepted at a reputable peer-reviewed venue first
- Examples of 'incontrovertible evidence': hallucinated references, meta-comments from the LLM left in the paper
- Engagement
- 455 likes · 39 replies
- Provenance
- Article · Supporting source
7
Osaurus brings both local and cloud AI models to your Mac
Article Sarah Perez — senior reporter at TechCrunch covering AI
techcrunch.com/2026/05/15/osaurus-brings-bo…
- Context
- The app's approach — letting users run their own local models with cloud fallback — reflects a growing split in the market between cloud-only AI and hybrid local-cloud setups. The hardware requirements show what's needed for practical local inference today.
- Key points
- Osaurus is an open-source Mac app that combines local and cloud AI models
- Runs models through a harness architecture with a hardware-isolated sandbox
- Requires 64 GB minimum RAM; 128 GB recommended for larger models like DeepSeek V4
- Supports MiniMax M2.5, Gemma 4, Qwen3.6, GPT-OSS, Llama, DeepSeek V4, and others
- Over 20 native plugins including Mail, Calendar, Git, Filesystem, and Browser
- 112,000+ downloads since launch nearly a year ago
- Provenance
- Article · Supporting source
8
Figure AI 03 keeps working for over 30 hours straight
Article Figure AI
www.reddit.com/r/singularity/comments/1tdei…
- Context
- The endurance test highlights the gap between current humanoid capability and what human operators need. It's a benchmark for robot autonomy, not just intelligence.
- Key points
- Figure AI 03 humanoid demonstrated 30+ hours of continuous operation
- No breaks or maintenance downtime during the run
- Engagement
- 2046 likes · 674 replies
- Provenance
- Article · Supporting source
X's open-sourced recommendation algorithm
00:00:04 Good morning, Friday the 15th. I'll start with X's decision to open-source its recommendation algorithm, then work through what that means. The gap between what's public and what's not is where this one gets interesting. X published the full code for its For You feed's ranking algorithm on GitHub under the xai-org organization.
00:00:26 The code acts as scaffolding — it loads features, calls models, and handles the ranking pipeline. It leaves out the ranking model itself. That's a trained model built with the same transformer architecture that powers Grok's Phoenix, and it stays proprietary. Daniel Meacham put it clearly in a reply: the code is open, the ranking model it calls isn't, and that's the part that actually decides your feed.
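To make the scaffolding-versus-model split concrete, here is a minimal Python sketch. It is not X's code: `Candidate`, `rank`, and both scoring functions are hypothetical, and the stand-in weights are invented only so the example runs. What it shows is the point Meacham makes: the same open pipeline produces a different feed the moment you swap the model it calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    """A hypothetical candidate post with a few illustrative engagement features."""
    post_id: str
    features: Dict[str, float]


# The ranking model is an opaque callable: the open scaffolding sees the
# features it passes in and the score it gets back, never the weights inside.
Scorer = Callable[[Dict[str, float]], float]


def rank(candidates: List[Candidate], score: Scorer, top_k: int) -> List[str]:
    """Open scaffolding: score each candidate, sort descending, keep the top k."""
    scored = [(score(c.features), c.post_id) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post_id for _, post_id in scored[:top_k]]


def stand_in_model(features: Dict[str, float]) -> float:
    """Made-up weights standing in for the proprietary model (not X's)."""
    return 0.7 * features.get("reply_rate", 0.0) + 0.3 * features.get("like_rate", 0.0)


def like_only_model(features: Dict[str, float]) -> float:
    """A second made-up model that only looks at likes."""
    return features.get("like_rate", 0.0)


feed = [
    Candidate("a", {"reply_rate": 0.1, "like_rate": 0.9}),
    Candidate("b", {"reply_rate": 0.8, "like_rate": 0.2}),
    Candidate("c", {"reply_rate": 0.3, "like_rate": 0.3}),
]

# Same scaffolding, different model, different feed.
print(rank(feed, stand_in_model, top_k=2))   # ['b', 'a']
print(rank(feed, like_only_model, top_k=2))  # ['a', 'c']
```

Reading `rank` tells you which features are gathered and how scores get sorted; it tells you nothing about why one score beats another. That is the part that stays closed.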
00:00:53 The transparency is meaningful but partial. You can audit the structure, the features that get pulled, and the way the pipeline works. But you can't look at the weights that make the actual ranking decisions. That's worth tracking. If you've ever wondered why certain kinds of posts show up more than others — and I have, on plenty of days — having the code around the model means you can see what signals X is measuring.
00:01:21 That's different from the black-box pattern most platforms operate under, and I haven't seen another major social platform do anything like it. The transparency is real. It's just incomplete. The question going forward is whether X keeps updating the code publicly every few weeks, as it has promised.
00:01:41 A one-off open-source release is a gesture. Ongoing transparency is where the real signal lies.
Cloudflare's Code Mode vs MCP benchmark
00:01:47 Next, a different kind of infrastructure story — one about how we build with AI today. Cloudflare posted a benchmark comparing SDK-based agent coding against MCP-based approaches, run against a complex GraphQL API. The results: using the SDK directly took one step and 15 thousand tokens.
00:02:09 Running the same task through an MCP server took four steps and 158 thousand tokens. Same output, 8.4 times the cost. Yoni Braslaver ran the experiment against what his post calls "monday's GraphQL API." It's a concrete comparison, and the result favors the simpler approach: direct SDK bindings over the tool-search-plus-dispatch pattern that MCP was supposed to standardize.
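The arithmetic behind those figures is worth a quick check. This short Python sketch uses only the numbers in Braslaver's post: the raw token count works out to about 10.5 times higher for the MCP run, while 8.4 times is the cost multiple he reported. The quoted post doesn't say how cost was computed, so the sketch doesn't try to reconcile the two.

```python
# Figures as posted by Yoni Braslaver.
SDK_STEPS, SDK_TOKENS = 1, 15_000
MCP_STEPS, MCP_TOKENS = 4, 158_000
REPORTED_COST_MULTIPLE = 8.4

token_ratio = MCP_TOKENS / SDK_TOKENS          # raw token multiple
sdk_tokens_per_step = SDK_TOKENS / SDK_STEPS   # average tokens per SDK step
mcp_tokens_per_step = MCP_TOKENS / MCP_STEPS   # average tokens per MCP step

print(f"token ratio: {token_ratio:.1f}x")                 # token ratio: 10.5x
print(f"reported cost ratio: {REPORTED_COST_MULTIPLE}x")  # reported cost ratio: 8.4x
print(f"tokens per step: SDK {sdk_tokens_per_step:,.0f}, MCP {mcp_tokens_per_step:,.0f}")
# tokens per step: SDK 15,000, MCP 39,500
```

Even averaged per step, each MCP round used more than twice the tokens of the single SDK call.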
00:02:37 The reason this matters is that the MCP ecosystem is still building out and the tool-search approach has real appeal. You can plug any MCP-compatible server into any agent without writing custom bindings. That's the trade-off: convenience and interoperability on one side, token cost and latency on the other.
00:03:01 For code generation tasks, Cloudflare's data says the direct route wins clearly. Tim De Pauw pushed back in the thread, suggesting this might be more of an argument for tool search than against MCP entirely. He had expected a benchmark comparing agents that use MCP against agents that call the same tools through dynamically generated adapter SDKs.
00:03:26 That's a fair point — the comparison isn't MCP versus nothing. It's MCP versus a well-maintained SDK. Even if tool search narrows the gap, a difference approaching an order of magnitude is significant. Anyone building an agent that needs to talk to APIs should keep the Cloudflare result in mind.
00:03:48 SDK-first, MCP-as-extension. Not the other way around.
arXiv's LLM policy enforcement
00:03:52 On the academic side, arXiv just announced a new policy worth noting. Thomas Dietterich, the arXiv moderator for cs.LG, clarified penalties for papers containing unchecked LLM-generated output. The penalty is a one-year ban from arXiv, and after that, future submissions must first be accepted at a reputable peer-reviewed venue.
00:04:16 The policy targets incontrovertible evidence — hallucinated references, LLM meta-comments left in the final draft, and that kind of thing. Dietterich's wording is specific: if a submission contains this kind of evidence, it means the authors didn't check the results of LLM generation, and that's a breach of the Code of Conduct.
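Part of that evidence is mechanically checkable. Below is a rough Python sketch of a scan for leftover LLM meta-comments. The phrase list is hypothetical and is not arXiv's criteria; catching hallucinated references would also require checking each citation against a bibliographic index, which this sketch doesn't attempt. A hit is a prompt for a human to look, not a verdict.

```python
import re
from typing import List

# Hypothetical phrases that often signal pasted, unreviewed LLM output.
# Illustrative only; these are not arXiv's published criteria.
META_COMMENT_PATTERNS = [
    r"\bas an ai language model\b",
    r"\bcertainly! here is\b",
    r"\bi hope this helps\b",
    r"\bregenerate response\b",
    r"\[insert (?:citation|reference)[^\]]*\]",
]


def find_meta_comments(text: str) -> List[str]:
    """Scan line by line and report every pattern match with its line number."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in META_COMMENT_PATTERNS:
            for match in re.finditer(pattern, line, flags=re.IGNORECASE):
                hits.append(f"line {lineno}: {match.group(0)}")
    return hits


draft = (
    "We evaluate on three benchmarks [insert citation here].\n"
    "Certainly! Here is a revised version of the results section.\n"
    "Results are reported in Table 2.\n"
)
for hit in find_meta_comments(draft):
    print(hit)
# line 1: [insert citation here]
# line 2: Certainly! Here is
```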
00:04:41 The Machine Learning subreddit reaction was mostly positive — one comment got 181 upvotes arguing for even longer bans of all co-authors, which is probably too broad. But the direction is clear. arXiv is drawing a line between author responsibility and AI generation that goes beyond the old 'we don't police tool use' posture.
00:05:05 The policy matters because the volume of AI-assisted submissions has been increasing, and the community has been asking for clearer standards. The hallucinated references point is the right one — it's not about using AI to draft. It's about whether authors verified what the AI produced.
00:05:26 I haven't seen another preprint server announce a comparable policy. This is a concrete step, and the one-year ban is a real deterrent.
The Mythos cost-and-safety question
00:05:36 A different tension today — between safety narratives and compute economics at Anthropic. Curtis Pyke published a long investigation into why Anthropic's Claude Mythos Preview is behind a closed door. Mythos is the company's most powerful model, available only through a 40-organization security research program called Project Glasswing.
00:06:00 Pricing during the preview is $25 per million input tokens and $125 per million output — substantially higher than Claude's standard tiers. The safety case, per Anthropic's Frontier Red Team: Mythos crossed a threshold where it can discover and exploit zero-day vulnerabilities autonomously.
00:06:21 The restricted release is designed to buy time for defenders. That's the narrative. But the compute story is concrete. Within days of the Glasswing launch, Anthropic announced compute deals with Google-Broadcom for 3.5 gigawatts of capacity starting in 2027. They also signed a separate deal with CoreWeave for capacity later this year.
00:06:45 Reuters reported Anthropic was exploring designing its own AI chips — a half-billion-dollar effort explicitly tied to a broader shortage. Pyke weighs both explanations. Neither fully explains the other. The core point here is that safety constraints and compute economics aren't competing stories.
00:07:06 They're simultaneous constraints, and they shape each other. If Mythos needs that much compute to run, even defensive use becomes expensive. The $100 million in credits Anthropic committed to Glasswing participants is a signal of strategic investment, but it's also a signal that running these models costs real money.
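To put "costs real money" in numbers, here is a small Python sketch of per-request arithmetic at the quoted preview prices of $25 per million input tokens and $125 per million output tokens. The request size (200k tokens in, 20k out) is a hypothetical workload, not a figure from Anthropic or Pyke.

```python
INPUT_USD_PER_M = 25.0    # quoted preview price per million input tokens
OUTPUT_USD_PER_M = 125.0  # quoted preview price per million output tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the quoted Mythos Preview prices."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


# Hypothetical workload: a 200k-token codebase slice in, a 20k-token report out.
per_request = request_cost(200_000, 20_000)
print(f"per request: ${per_request:.2f}")  # per request: $7.50

# How many such requests $100 million in credits would cover at these prices.
print(f"requests per $100M: {100_000_000 / per_request:,.0f}")  # requests per $100M: 13,333,333
```

Because output tokens cost five times as much as input, the 20k-token report is under a tenth of the tokens in this example but a third of the bill ($2.50 of $7.50).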
00:07:29 The question is whether the safety argument and the compute argument converge — whether the models that are hardest to run are the ones that are hardest to keep safe. That's worth watching as the frontier moves.
Smaller stories: Osaurus, hardware, and robot endurance
00:07:44 Finally, a couple of smaller stories. Osaurus, an open-source Mac app that combines local and cloud AI models, is nearing its first anniversary with over 112,000 downloads. Co-founder Terence Pae started building it after customers asked why they should pay for tokens if they could run AI locally.
00:08:06 The app requires at least 64 GB of RAM, with 128 GB recommended for larger models like DeepSeek V4, and supports everything from MiniMax M2.5 to GPT-OSS to Anthropic's models. The interesting part about Osaurus is the hybrid approach — you run your own local models, keep your files on your hardware, and fall back to cloud providers when needed.
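The local-first, cloud-fallback decision Osaurus is built around can be sketched in a few lines of Python. This is not Osaurus's code or API: `LocalModel`, `choose_backend`, and the RAM footprints are hypothetical, and a real harness would weigh more than memory. It shows the basic rule: run on your own hardware when the model fits, and reach for a cloud provider only if the user has allowed it.

```python
from dataclasses import dataclass


@dataclass
class LocalModel:
    name: str
    ram_needed_gb: int  # hypothetical working-set size, not a published figure


def choose_backend(model: LocalModel, available_ram_gb: int, allow_cloud: bool) -> str:
    """Prefer local inference; fall back to the cloud only with explicit consent."""
    if model.ram_needed_gb <= available_ram_gb:
        return f"local:{model.name}"
    if allow_cloud:
        return "cloud"
    raise RuntimeError(
        f"{model.name} needs ~{model.ram_needed_gb} GB but only "
        f"{available_ram_gb} GB is available, and cloud fallback is disabled"
    )


large = LocalModel("large-local-model", ram_needed_gb=100)
small = LocalModel("small-local-model", ram_needed_gb=24)

print(choose_backend(small, available_ram_gb=64, allow_cloud=False))  # local:small-local-model
print(choose_backend(large, available_ram_gb=64, allow_cloud=True))   # cloud
print(choose_backend(large, available_ram_gb=128, allow_cloud=True))  # local:large-local-model
```

Keeping `allow_cloud` as an explicit flag is what preserves the privacy argument: nothing leaves the machine unless the user opted in.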
00:08:31 That pattern of local-first with cloud extension is becoming a real architecture for people who care about privacy and cost control. On the hardware side, NVIDIA is reportedly preparing an RTX 5090 price increase due to rising GDDR7 costs. The 48 GB RTX 5000 Pro has arrived, and early reviews are positive.
00:08:54 Local inference is getting cheaper per watt, but the hardware floor is still moving up. And over in robotics, Figure AI's humanoid kept working for over 30 hours without breaks. It's a long run, and it highlights a different kind of capability — endurance, not intelligence.
00:09:14 The gap between robot uptime and human needs is still wide, but narrowing. That's the local pass for Friday. — Seln.