DeepSeek V4 Lands on an Unsteady Floor / DISPATCH 006

Dispatch 006 · 2026-04-24 Unsteady Floor

DeepSeek V4 Lands on an Unsteady Floor

/ 00:15:49 / 21 sources

“DeepSeek's V4 loss curve kept catching fire, and the team kept putting it out with bandages.”

— Lenar Kess, today's narration

DeepSeek V4 ships hours after GPT-5.5, and the technical report tells a more interesting story than the benchmark bars. Susan Zhang reads the paper out loud: anticipatory routing, logit clamps, and a training run that kept catching fire at 33 trillion tokens. I walk through what the fragility actually means for anyone planning to finetune on top of it.

On the OpenAI side, GPT-5.5 lands with a quiet thud on Victor Taelin's LamBench. Codex picks up a proper reviewer agent. A plugin called endless-toil makes your editor groan at bad code. Sapiens2 turns out to have been pretrained on the equivalent of half of Flickr's human images. And Fireship spends a week automating his mom's IT support with a voice-cloned agent built on OpenClaw.

— Lenar Kess

Sources

21 cited
  1.

    Aran Komatsuzaki on forked subagents

    X Aran Komatsuzaki — Research scientist at Anthropic

    Anthropic just introduced forked subagents in their latest update. Unlike regular subagents, forked subagents can inherit the same context as the main agent. This looks convenient for cases where richer context matters…

    x.com/arankomatsuzaki/status/20473494718777… →
    Details
    Cited text
    Anthropic just introduced forked subagents in their latest update. Unlike regular subagents, forked subagents can inherit the same context as the main agent. This looks convenient for cases where richer context matters more. This is just what I needed!
    Context
    Forked subagents push the harness closer to how senior engineers actually work: with shared state, not flattened prompts. The risk is drift. Without a merge protocol, forks can fragment the same way multi-agent systems do when their agents don't communicate.
    Key points
    • Forked subagents inherit active context tree, not just a snapshot
    • Regular subagents get a static context snapshot; forked ones get the live state
    • Critical for tasks requiring shared state like debugging while refactoring
    • Closes the gap between agent harnesses and how engineers think about dependencies
    Engagement
    802 likes · 68 retweets · 38 replies
    Provenance
    Tweet · Primary source
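The difference between the two kinds of subagent can be sketched in a few lines. This is a toy model only, not Anthropic's implementation: the `Agent` class and the `spawn_subagent` and `fork_subagent` names are hypothetical, and context is reduced to a list of strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """Toy agent whose 'context' is just a list of messages (hypothetical model)."""
    name: str
    context: List[str] = field(default_factory=list)

    def spawn_subagent(self, task: str) -> "Agent":
        # Regular subagent: starts from the task prompt alone.
        return Agent(name=f"{self.name}/sub", context=[task])

    def fork_subagent(self, task: str) -> "Agent":
        # Forked subagent: starts with a copy of everything the parent has seen,
        # then receives its own task on top.
        return Agent(name=f"{self.name}/fork", context=[*self.context, task])
```

In this sketch the fork copies the parent's context at fork time, so later changes on either side do not propagate. Whether the real feature copies or shares live state is not something the tweet or this sketch settles.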
  2.

    Jeff Dean on TPU 8t and DiLoCo

    X Jeff Dean — Senior Fellow at Google, leads AI infrastructure research

    First, let's talk about TPU 8t, which is designed for large-scale training and inference throughput. The pod size is increased slightly to 9600 chips, and provides ~3X the FP4 performance per pod vs. Ironwood (8t has 12…

    x.com/JeffDean/status/2047405389856297387 →
    Details
    Cited text
    First, let's talk about TPU 8t, which is designed for large-scale training and inference throughput. The pod size is increased slightly to 9600 chips, and provides ~3X the FP4 performance per pod vs. Ironwood (8t has 121 exaflops/pod vs. 42.5 exaflops/pod for Ironwood).
    Context
    Infrastructure is starting to handle scale without turning every node fault into a cluster halt. Decoupled DiLoCo turns a hard stop into soft degradation, so a hardware failure no longer costs the whole run. That addresses hardware faults, which are a separate problem from the optimization and routing instability that DeepSeek's V4 report describes.
    Key points
    • The generation splits into two SKUs: 8t (training/throughput) and 8i (inference/latency)
    • 8t pod runs 9,600 chips at 121 exaflops FP4, roughly 3X Ironwood
    • Decoupled DiLoCo enables graceful failure handling at scale
    • (N-1)/N units proceed when one node fails, logging drift and patching state
    Engagement
    112 likes · 6 retweets · 4 replies
    Provenance
    Tweet · Primary source
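One way to picture "(N-1)/N units proceed" is an outer averaging step that skips workers that failed to report. The sketch below is loosely modeled on the published DiLoCo idea (local inner training, then an outer step over averaged pseudo-gradients). The failure handling, the `outer_step` name, and the plain averaging in place of a momentum-based outer optimizer are all assumptions, not a description of Google's system.

```python
from typing import List, Optional

Vector = List[float]


def outer_step(global_params: Vector,
               worker_params: List[Optional[Vector]],
               outer_lr: float = 1.0) -> Vector:
    """Apply one outer update using only the workers that reported.

    A failed worker is represented by None and is simply left out of the average.
    """
    alive = [w for w in worker_params if w is not None]
    if not alive:
        # Nobody reported: keep the last good global state.
        return list(global_params)
    # Pseudo-gradient per worker: how far local training moved from the global point.
    deltas = [[g - w_i for g, w_i in zip(global_params, w)] for w in alive]
    mean_delta = [sum(col) / len(alive) for col in zip(*deltas)]
    return [g - outer_lr * d for g, d in zip(global_params, mean_delta)]
```

With `outer_lr=1.0` this reduces to averaging the surviving workers' parameters, which keeps the sketch easy to check by hand.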
  3.

    Claude Managed Agents Memory public beta

    X ClaudeDevs — Anthropic Claude developers account

    Memory on Claude Managed Agents is now in public beta on the Claude Platform, letting agents learn and improve across different sessions.

    x.com/ClaudeDevs/status/2047424063543681240 →
    Details
    Cited text
    Memory on Claude Managed Agents is now in public beta on the Claude Platform, letting agents learn and improve across different sessions.
    Context
    Stops agents from re-reading the same codebase from scratch every morning. Persistence turns agents from stateless query tools into working models of your repo, reducing context window waste on repetition and freeing it for actual reasoning.
    Key points
    • Persistent memory is now in public beta for managed agents
    • Agents can now learn and improve across separate sessions
    • Closes a major gap in agent harnesses that previously reset context each run
    • Pairs with recent forked subagents update for richer state inheritance
    Engagement
    1790 likes · 156 retweets · 32 replies
    Provenance
    Tweet · Primary source
  4.

    OpenAI Developers on Codex Auto-Review

    X OpenAI Developers — Official OpenAI developers account

    Auto-review is a new mode that lets Codex work longer with fewer approvals and safer execution. It helps Codex keep moving through tests, builds, and more, including during long tasks and automations, while a separate a…

    x.com/OpenAIDevs/status/2047436655863464011 →
    Details
    Cited text
    Auto-review is a new mode that lets Codex work longer with fewer approvals and safer execution. It helps Codex keep moving through tests, builds, and more, including during long tasks and automations, while a separate agent checks higher-risk steps in context before they run.
    Context
    Changes how teams structure long-running agentic tasks. Instead of treating human review as a bottleneck, the verification agent acts as a safety net, letting the primary agent push through work while catching genuine risks. This is a structural shift in agent deployment, not just a model improvement.
    Key points
    • Codex agents now run longer with fewer human approvals
    • A separate verification agent checks higher-risk steps before execution
    • Shifts agentic workflows from human approval of each step to agent review of higher-risk steps
    • Lets long multi-step automations run with fewer manual approvals
    Provenance
    Tweet · Primary source
  5.

    DeepSeek-V4 Preview open-sourced

    X DeepSeek — DeepSeek official account

    DeepSeek-V4-Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. DeepSeek-V4-Pro: 1.6T total / 49B active params. DeepSeek-V4-Flash: 284B total / 13B active params.

    x.com/deepseek_ai/status/2047516922263285776 →
    Details
    Cited text
    DeepSeek-V4-Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. DeepSeek-V4-Pro: 1.6T total / 49B active params. DeepSeek-V4-Flash: 284B total / 13B active params.
    Context
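    The open weights let independent researchers and teams separate capability from training scale. Flash activates only 13B parameters per token, which cuts compute per token, but all 284B parameters still have to be stored: roughly 142GB of weights at 4 bits per parameter, before KV cache. That rules out a single 40GB GPU without offloading, though it remains a far smaller footprint than Pro and widens the deployment surface for specialized agents that previously required API access.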
    Key points
    • V4 Pro: 1.6T total parameters, 49B active, rivals top closed models
    • V4 Flash: 284B total, 13B active, optimized for speed and cost
    • Both variants support 1M context length
    • Open weights available on HuggingFace with full technical report
    Engagement
    30144 likes · 7138 retweets · 1156 replies
    Provenance
    Tweet · Primary source
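The parameter counts in the announcement allow a rough estimate of weight memory. The helper below is a back-of-the-envelope sketch: it counts weights only (no KV cache, activations, or quantization scale overhead), and the bit widths are assumptions chosen for illustration. Total parameters set the memory footprint; active parameters only set compute per token.

```python
def weight_memory_gb(total_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


# Parameter counts from the DeepSeek announcement.
V4_PRO_TOTAL = 1_600_000_000_000
V4_FLASH_TOTAL = 284_000_000_000

flash_4bit = weight_memory_gb(V4_FLASH_TOTAL, 4)   # 142.0
flash_8bit = weight_memory_gb(V4_FLASH_TOTAL, 8)   # 284.0
pro_4bit = weight_memory_gb(V4_PRO_TOTAL, 4)       # 800.0
```

Even at 4 bits per parameter, Flash needs on the order of 142 GB for weights alone, so the 13B active figure is better read as a statement about speed and cost per token than about which GPU the model fits on.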
  6.

    Yishan on Qwen 3.6 quality report

    X Yishan — ML engineer benchmarking local models

    Is anyone else finding that Qwen3.6 quality is worse than Qwen3.5? I'm benchmarking it every which way and it keeps coming out worse. Not a lot worse, but always worse, even if by a little bit. I'm testing on MLX and on…

    x.com/yishan/status/2047538868577239304 →
    Details
    Cited text
    Is anyone else finding that Qwen3.6 quality is worse than Qwen3.5? I'm benchmarking it every which way and it keeps coming out worse. Not a lot worse, but always worse, even if by a little bit. I'm testing on MLX and on NVFP/MXFP on Sparks.
    Context
    The fact that the gap shows up across several runtimes and formats points toward a training trade-off more than a quantization artifact, though a single tester's results cannot settle that. When your workload sits at the edge of quality, you need to know whether the drop is task-specific or systemic. The 27B dense model fits in 17GB and runs at ~25 tokens/sec on a laptop, making it deployable for teams that need on-prem models.
    Key points
    • Qwen 3.6 shows consistent but narrow degradation across benchmarks
    • Degradation persists on both MLX and NVFP/MXFP quantized builds
    • Suggests a training shift rather than a conversion artifact
    • Published results have it roughly tied with Sonnet 4.6 on the Artificial Analysis agentic index
    Provenance
    Tweet · Primary source
  7.

    Susan Zhang on DeepSeek V4 training instability

    X Susan Zhang — ML researcher known for LLM training analysis

    so that explains the delay... deepseek could not fix training instabilities, after doubling from ~15T tokens in v3 to ~33T tokens in v4. the 10+ mentions of "stability" tricks seem to be wildly lacking if these two were…

    x.com/suchenzang/status/2047559677316325807 →
    Details
    Cited text
    so that explains the delay... deepseek could not fix training instabilities, after doubling from ~15T tokens in v3 to ~33T tokens in v4. the 10+ mentions of "stability" tricks seem to be wildly lacking if these two were the main bandages (mismatched routing + clamping)
    Context
    Points to how hard MoE training becomes at this scale. When routing instability is the bottleneck, the fixes are workarounds layered onto the architecture and training recipe, and it is not yet clear whether they affect downstream behavior. It reads less as a weakness of this one model than as a description of the scaling frontier.
    Key points
    • DeepSeek doubled training tokens from ~15T in v3 to ~33T in v4
    • Training instabilities persisted through the larger run and were patched, not eliminated
    • Mismatched routing and clamping served as the primary stability bandages
    • The delays in release align with these stability challenges
    Engagement
    1066 likes · 69 retweets · 18 replies
    Provenance
    Tweet · Primary source
  8.

    Endless Toil: Hear your agent suffer

    Source AndrewVos — Developer building agentic workflow tools

    Endless Toil runs alongside your coding agent in real time, playing escalating recorded human groans as the code it reads starts to look more cursed.

    github.com/AndrewVos/endless-toil →
    Details
    Cited text
    Endless Toil runs alongside your coding agent in real time, playing escalating recorded human groans as the code it reads starts to look more cursed.
    Context
    A darkly humorous mirror for the agentic workflow. The groans track how cursed the code the agent is reading looks, which makes audible how much code now passes through agents faster than any human reviews it. It's a symptom of the auto-review problem: when agents move faster than humans can verify, you need better verification, not just faster agents.
    Key points
    • Plugin that plays escalating groans as your agent reads worse code
    • Available for Codex Desktop, Codex CLI, Claude CLI, and Cursor
    • Tests sounds locally with afplay/paplay/aplay/ffplay
    • Highlights the growing gap between model capability and code quality judgment
    Provenance
    Source · Background source
  9.

    DeepSeek V4 Preview release thread

    X @deepseek_ai — DeepSeek's official account. The Hangzhou lab whose V3/R1 models defined the open-weights frontier in 2025.

    DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. Welcome to the era of cost-effective 1M context length.

    x.com/deepseek_ai/status/2047516922263285776 →
    Details
    Cited text
    DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. Welcome to the era of cost-effective 1M context length.
    Context
    V4 lands less than a day after GPT-5.5, open-weights, with 1M context as the default and sparse attention that drops inference cost at long context. For anyone building agent systems on top of third-party models, this is a real price/capability anchor the closed labs now have to price against.
    Key points
    • DeepSeek-V4-Pro: 1.6T total / 49B active MoE params, MIT-licensed open weights
    • DeepSeek-V4-Flash: 284B total / 13B active, same 1M context
    • Novel attention: token-wise compression plus DSA (DeepSeek Sparse Attention)
    • 1M context is the default across both models and the API
    • Direct integration with Claude Code, OpenClaw, and OpenCode harnesses out of the gate
    Provenance
    Tweet · Primary source
  10.

    Susan Zhang on DeepSeek V4 training instabilities

    Thread @suchenzang — Susan Zhang — ex-Meta AI, led OPT-175B training; one of the few public practitioners who has actually shepherded a trillion-token run end to end.

    DeepSeek could not fix training instabilities after doubling from ~15T tokens in v3 to ~33T tokens in v4. The 10+ mentions of 'stability' tricks seem to be wildly lacking if these two were the main bandages.

    x.com/suchenzang/status/2047559677316325807 →
    Details
    Cited text
    DeepSeek could not fix training instabilities after doubling from ~15T tokens in v3 to ~33T tokens in v4. The 10+ mentions of 'stability' tricks seem to be wildly lacking if these two were the main bandages.
    Context
    The V4 paper's unusual candor about what broke at 33T tokens is a window into how fragile frontier pretraining actually is. For engineers considering fine-tuning or base-model training, this is evidence that the textbook recipe stops working somewhere in the low tens of trillions of tokens — and nobody has published a clean fix.
    Key points
    • DeepSeek doubled pretraining from ~15T tokens (V3) to ~33T tokens (V4)
    • Paper admits the main stabilizers were mismatched-routing tricks and logit clamping
    • Zhang calls 'anticipatory routing' a euphemism for using stale parameters
    • Lucas Beyer (ex-Google Brain) publicly piles on, arguing that rewinds-as-stabilization doesn't scale
    • Replies note closed labs likely have similar patch lists — they just don't publish them
    Provenance
    Thread · Primary source
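For readers who have not met these two stabilizers, logit clamping in an MoE router can be sketched as follows. This is a generic illustration under stated assumptions, not the V4 report's formulation: the function names, the symmetric clamp, and the top-k selection are all hypothetical choices made for the example.

```python
import math
from typing import List, Tuple


def router_logits(token: List[float], router_weights: List[List[float]]) -> List[float]:
    # One logit per expert: dot product of the token with that expert's router row.
    return [sum(t * w for t, w in zip(token, row)) for row in router_weights]


def clamp_logits(logits: List[float], limit: float) -> List[float]:
    # Bound every logit to [-limit, limit] so no single expert can dominate the softmax.
    return [max(-limit, min(limit, x)) for x in logits]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(token: List[float], router_weights: List[List[float]],
                k: int, limit: float) -> Tuple[List[int], List[float]]:
    probs = softmax(clamp_logits(router_logits(token, router_weights), limit))
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k], probs
```

Passing in `router_weights` saved from an earlier training step, instead of the current ones, is one way to read Zhang's "stale parameters" remark; the report's actual mechanism may differ.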
  11.

    Elie Bakouch on V4 architecture details

    X @eliebakouch — Elie Bakouch, Hugging Face researcher who writes up open-model tech reports for a living.

    V4 Pro is the biggest open model ever: 1.6T total, 49B active, 33T tokens, 1M context, two new attention mechanisms, Muon, mHC, open-source kernels, FP4 QAT, MIT license.

    x.com/eliebakouch/status/2047519300399837677 →
    Details
    Cited text
    V4 Pro is the biggest open model ever: 1.6T total, 49B active, 33T tokens, 1M context, two new attention mechanisms, Muon, mHC, open-source kernels, FP4 QAT, MIT license.
    Context
    Bakouch's summary is the cleanest one-screen answer to 'what's actually new' in V4. Muon at this scale and shipped FP4 QAT are the two items most likely to cross-pollinate into other labs' next runs.
    Key points
    • Largest fully-open-weights model ever released
    • Uses Muon optimizer at flagship scale — most labs still use AdamW variants
    • FP4 quantization-aware training in the base recipe, not bolted on later
    • Open-source custom kernels shipped alongside the weights
    • MIT license, so downstream finetunes and deployments have no rug-pull risk
    Provenance
    Tweet · Primary source
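Quantization-aware training generally means the forward pass sees weights rounded to the low-precision grid while the optimizer keeps updating full-precision copies. A minimal sketch of that rounding step follows, assuming the E2M1 flavor of FP4 and a single per-tensor scale. Both are assumptions made for illustration; the recipe V4 actually uses is not described in the sources here.

```python
import math
from typing import List

# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
FP4_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def fake_quantize_fp4(weights: List[float]) -> List[float]:
    """Round each weight to the nearest FP4 value, then return it in full precision."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / 6.0  # map the largest weight onto the largest FP4 magnitude
    out = []
    for w in weights:
        nearest = min(FP4_E2M1_MAGNITUDES, key=lambda q: abs(abs(w) / scale - q))
        out.append(math.copysign(nearest * scale, w))
    return out
```

In a real training loop the gradient would typically be passed straight through this rounding step to the full-precision weights; that part is omitted here.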
  12.

    Yuchen Jin on Chinese labs training under constraints

    X @Yuchenj_UW — Yuchen Jin, CEO of Hyperbolic Labs; runs inference for a living and watches training efficiency closely.

    DeepSeek, Kimi, and Qwen can train very strong LLMs with far fewer and often nerfed NVIDIA GPUs, or even Huawei chips. Creativity loves constraints.

    x.com/Yuchenj_UW/status/2047534197993316738 →
    Details
    Cited text
    DeepSeek, Kimi, and Qwen can train very strong LLMs with far fewer and often nerfed NVIDIA GPUs, or even Huawei chips. Creativity loves constraints.
    Context
    The Western reading of DeepSeek is usually 'they have fewer chips.' Jin's framing — that the constraint is forcing real architectural work — is more useful if you're trying to predict whether these efficiency tricks show up in Western labs' next generation.
    Key points
    • Chinese frontier labs are training under an explicit chip-export ceiling
    • V4's novel attention architectures are partly a response to that ceiling
    • Efficiency gains on training translate directly to inference unit economics
    • The closed US labs' compute advantage is counterweighted by the open labs' algorithmic pressure
    Provenance
    Tweet · Primary source
  13.

    OpenAI Codex Auto-review launch

    X @OpenAIDevs — OpenAI's developer-facing account.

    Auto-review is a new mode that lets Codex work longer with fewer approvals and safer execution. A separate agent checks higher-risk steps in context before they run.

    x.com/OpenAIDevs/status/2047436655863464011 →
    Details
    Cited text
    Auto-review is a new mode that lets Codex work longer with fewer approvals and safer execution. A separate agent checks higher-risk steps in context before they run.
    Context
    This is the first concrete product move toward the two-agent executor/reviewer pattern as a default, not a bespoke harness. If you're building or buying coding agents, a gated secondary model doing safety review is now table stakes rather than research.
    Key points
    • New 'Auto-review' mode sits between YOLO and full-approval
    • A separate reviewer agent gates higher-risk steps in context
    • Internal name was 'guardian'; some users have been running it for weeks already
    • Reduces approval fatigue on long autonomous tasks, tests, builds
    • Open question flagged in replies: token overhead of the reviewer agent
    Provenance
    Tweet · Primary source
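The executor/reviewer pattern described above can be sketched as a loop in which only higher-risk steps are sent to a second agent before they run. Everything here is hypothetical (the `Step` type, the `high_risk` flag, the callback signatures) and is meant to show the shape of the control flow, not how Codex implements it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    command: str
    high_risk: bool  # assumed to be set by some upstream risk classifier


def run_with_auto_review(steps: List[Step],
                         reviewer: Callable[[Step], bool],
                         execute: Callable[[Step], None]) -> List[str]:
    """Run steps in order; higher-risk steps must pass the reviewer first."""
    log: List[str] = []
    for step in steps:
        if step.high_risk and not reviewer(step):
            # Stop at the first rejected step instead of continuing past it.
            log.append(f"blocked: {step.command}")
            break
        execute(step)
        log.append(f"ran: {step.command}")
    return log
```

Low-risk steps never reach the reviewer in this sketch, which is where the saving in approvals comes from. The token overhead flagged in the replies would come from every call to `reviewer`.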
  14.

    Krowork on autonomy vs. recovery in coding agents

    X @KroworkAI — Builder account focused on agent UX.

    Fewer approval prompts is the right direction but the hard part isn't autonomy length — it's recovery. What happens when the agent goes off-track at step 47?

    x.com/KroworkAI/status/2047566505366508023 →
    Details
    Cited text
    Fewer approval prompts is the right direction but the hard part isn't autonomy length — it's recovery. What happens when the agent goes off-track at step 47?
    Context
    Good reply-as-argument. The interesting question under Auto-review isn't 'does it save clicks?' — it's 'what do you do when the agent is forty-seven steps deep down a wrong path and the reviewer missed it?' That's the next hard UX problem for coding agents.
    Key points
    • Longer autonomous runs surface a new problem: deep-in-the-task recovery
    • Approval fatigue and recovery-from-wrong-path are different UX problems
    • A reviewer agent helps with the first; it doesn't obviously help with the second
    Provenance
    Tweet · Primary source
  15.

    Victor Taelin introduces LamBench and first impressions of GPT-5.5

    Thread @VictorTaelin — Victor Taelin, creator of the HVM / Bend / Kind λ-calculus toolchain and one of the sharpest working critics of benchmark contamination.

    My first-day impression is that I can't tell the difference between GPT 5.5 and GPT 5.4. I would be lying if I said otherwise. I'd not be able to distinguish in a blind test. It is much faster though.

    x.com/VictorTaelin/status/20475088748909734… →
    Details
    Cited text
    My first-day impression is that I can't tell the difference between GPT 5.5 and GPT 5.4. I would be lying if I said otherwise. I'd not be able to distinguish in a blind test. It is much faster though.
    Context
    A same-day independent evaluator on uncontaminated reasoning problems is the single best signal when a major model ships. The headline capability delta between GPT-5.5 and GPT-5.4 on a fresh bench is: it's faster. That should calibrate how you read the OpenAI marketing materials.
    Key points
    • LamBench: 120 fresh λ-calculus questions measuring completion, elegance (BLC length), and speed
    • Built same-day to stress-test GPT-5.5 against GPT-5.4 on uncontaminated prompts
    • Taelin reports no distinguishable quality gap on his test set — GPT-5.5 just faster
    • GLM and K2 did noticeably worse than expected
    • Benchmark 'born saturated' — V2 will need to be harder
    Provenance
    Thread · Primary source
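"BLC length" refers to the size of a term in binary lambda calculus, where shorter encodings count as more elegant. The sketch below uses the standard encoding over de Bruijn terms: `00` for an abstraction, `01` for an application, and n ones followed by a zero for variable n. The `Var`, `Lam`, and `App` classes are illustrative names, and how LamBench itself scores elegance beyond this length is not something the thread specifies.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    index: int  # de Bruijn index, starting at 1


@dataclass(frozen=True)
class Lam:
    body: "Term"


@dataclass(frozen=True)
class App:
    func: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def blc_encode(term: Term) -> str:
    """Encode a de Bruijn lambda term as a binary lambda calculus bit string."""
    if isinstance(term, Var):
        return "1" * term.index + "0"
    if isinstance(term, Lam):
        return "00" + blc_encode(term.body)
    if isinstance(term, App):
        return "01" + blc_encode(term.func) + blc_encode(term.arg)
    raise TypeError(f"not a lambda term: {term!r}")


IDENTITY = Lam(Var(1))                                        # λx. x
K = Lam(Lam(Var(2)))                                          # λx. λy. x
S = Lam(Lam(Lam(App(App(Var(3), Var(1)), App(Var(2), Var(1))))))  # λx. λy. λz. x z (y z)
```

Under this encoding the identity function costs 4 bits, K costs 7, and S costs 23, so `len(blc_encode(term))` gives a simple size metric for comparing two correct answers.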
  16.

    Yishan on Qwen 3.6 27B quality regression

    X @yishan — Yishan Wong, former Reddit CEO; now spends real time benchmarking local models on his own hardware.

    Is anyone else finding that Qwen3.6 quality is worse than Qwen3.5? I'm benchmarking it every which way and it keeps coming out worse. Not a lot worse, but always worse.

    x.com/yishan/status/2047538868577239304 →
    Details
    Cited text
    Is anyone else finding that Qwen3.6 quality is worse than Qwen3.5? I'm benchmarking it every which way and it keeps coming out worse. Not a lot worse, but always worse.
    Context
    Yesterday's coverage leaned on Qwen 3.6 tying Sonnet 4.6 on the Artificial Analysis agentic index. Yishan running his own tests and finding the opposite is exactly the countertone that should affect whether you swap the model into production.
    Key points
    • Hands-on comparison of Qwen 3.6 vs 3.5 on MLX and on NVFP/MXFP quantizations running on Sparks
    • Finding: 3.6 is consistently but slightly worse across his evals
    • Contradicts the published agentic benchmark gains that had Qwen 3.6 27B tying Sonnet 4.6
    • Reminder that private-eval regressions can hide under a public benchmark win
    Provenance
    Tweet · Primary source
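"Not a lot worse, but always worse" is the kind of pattern a sign test is designed for: it ignores the size of each gap and asks only how often one model loses. The helper below is a generic sketch, not Yishan's method, and the win/loss counts you feed it would have to come from your own paired evals.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """Two-sided sign test: chance of a split at least this lopsided if the models were equal.

    Ties are assumed to have been dropped before counting.
    """
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

As a worked case, a model that loses all 8 of 8 paired comparisons gets a p-value of 2/256, about 0.008, even if each individual gap is tiny. An even 4 to 4 split returns 1.0.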
  17.

    Sapiens2 open-weights vision backbone

    X @astridwilde1 — Astrid Wilde, independent ML researcher with a focus on open-weights vision models.

    Sapiens2 is the highest quality ViT backbone that now exists in the public domain. It was pretrained on the equivalent of 1/2 of all human images on Flickr.

    x.com/astridwilde1/status/20475231525621722… →
    Details
    Cited text
    Sapiens2 is the highest quality ViT backbone that now exists in the public domain. It was pretrained on the equivalent of 1/2 of all human images on Flickr.
    Context
    A public-domain ViT backbone at this scale is the kind of release that quietly changes what two-person shops can build — human-centric perception, pose, relighting, try-on — without calling a hosted API. If you're building any product that does something with people in images, this is the new floor.
    Key points
    • Sapiens2 pretrained on the equivalent of half of all human-subject images on Flickr
    • Highest-quality public-domain ViT backbone currently available
    • Non-trivial to replicate — the training corpus scale is the moat
    • First time a large lab has released a vision backbone at this scale openly
    Provenance
    Tweet · Primary source
  18.

    endless-toil: Hear your agent suffer through your code

    Article Andrew Vos — Indie developer, creator of the endless-toil Codex/Claude Code plugin.

    Endless Toil runs alongside your coding agent in real time, playing escalating recorded human groans as the code it reads starts to look more cursed.

    github.com/AndrewVos/endless-toil →
    Details
    Cited text
    Endless Toil runs alongside your coding agent in real time, playing escalating recorded human groans as the code it reads starts to look more cursed.
    Context
    The joke is only possible because the marketplace, the skills, and the plugin-install UX for both Codex and Claude Code shipped in the last few months. Toy tools like this are a leading indicator of a platform being real.
    Key points
    • Plugin for Codex Desktop, Codex CLI, Claude CLI, and Cursor
    • Plays real human groans, wails, and 'abyss' sounds as agent scans worse code
    • Escalates sonic distress the more cursed the file looks
    • Uses the new OpenAI Codex / Claude Code plugin-marketplace infrastructure
    • 40 HN points in a few hours — a joke, but one that only works because the plumbing is now real
    Provenance
    Article · Supporting source
  19.

    Fireship — I finally found a use case for OpenClaw

    Video Fireship (Jeff Delaney) — Jeff Delaney, host of Fireship — the most-watched short-form programming channel.

    The project has received over 1,100 security advisories and has resolved or closed about 650 of them. Most of the rest are slop issues. [Steinberger's] filter is: anytime the report is too nice or someone apologizes, it…

    www.youtube.com/watch?v=FM5-R4VPArw →
    Details
    Cited text
    The project has received over 1,100 security advisories and has resolved or closed about 650 of them. Most of the rest are slop issues. [Steinberger's] filter is: anytime the report is too nice or someone apologizes, it's very likely AI.
    Context
    Two things to take away: the scale of AI-generated security-report slop hitting real open-source projects, and the shape of the DIY personal-agent stack — a hosted agent runtime plus a voice clone plus a messaging channel — that the toolchain now supports out of the box.
    Key points
    • OpenClaw has accumulated 1,100+ security advisories since January; ~650 resolved
    • Creator Peter Steinberger spoke at TED and at AI Engineer Europe on the project this month
    • Most open advisories are AI-slop reports — Steinberger's tell is excess politeness
    • Fireship wires OpenClaw to a Telegram bot, 11Labs voice clone, and ffmpeg to auto-answer family IT questions in his own voice
    • Uses the new one-click OpenClaw VPS template as the hosting layer
    Provenance
    Video · Supporting source
  20.

    Mario Zechner on pi.dev GPT-5.5 + new login flow

    X @badlogicgames — Mario Zechner, creator of libGDX and co-maintainer of pi.dev — small-shop coding agent used by a growing pocket of indie developers.

    GPT 5.5 release. And @mitsuhiko has improved the onboarding with a nice new /login flow for both subscriptions and API key authentication.

    x.com/badlogicgames/status/2047452871612903… →
    Details
    Cited text
    GPT 5.5 release. And @mitsuhiko has improved the onboarding with a nice new /login flow for both subscriptions and API key authentication.
    Context
    Small, fast-moving agent tools are still viable alongside Codex and Claude Code. pi.dev turning around GPT-5.5 day-of is a data point that the model-integration work isn't a durable moat.
    Key points
    • pi.dev shipped GPT-5.5 support within hours of the model release
    • Armin Ronacher (mitsuhiko) rewrote /login to handle both subscription and API-key auth paths
    • Terminal progress output is now off by default and opt-in, since it was a minor annoyance
    • Small-shop agent tool keeping pace with the big coding agents on model refreshes
    Provenance
    Tweet · Primary source
  21.

    Jason Liu shares a reusable Codex skills pack

    X @jxnlco — Jason Liu, author of the instructor Python library and a prolific writer on LLM engineering patterns.

    Synced a set of reusable Codex skills into my dots repo: AI code/frontend/writing audits, safe worktree cleanup, exec comms, GitHub PR/CI helpers, Playwright/PDF workflows, and simple HTML artifacts.

    x.com/jxnlco/status/2047445737395585386 →
    Details
    Cited text
    Synced a set of reusable Codex skills into my dots repo: AI code/frontend/writing audits, safe worktree cleanup, exec comms, GitHub PR/CI helpers, Playwright/PDF workflows, and simple HTML artifacts.
    Context
    The Codex skills format is doing what MCP was supposed to do a year ago — letting a senior engineer publish their working prompts/tools so junior colleagues can clone and use them. Worth reading as a library of 'what skills should exist.'
    Key points
    • Open collection of reusable Codex skills dropped into a dotfiles repo
    • Categories: audits (code/frontend/writing), git worktree hygiene, exec comms, PR/CI helpers, browser automation
    • Install/sync model mirrors what the plugin marketplace encourages
    • Another signal the Codex skill format is getting real community use
    Provenance
    Tweet · Primary source