◆ Dispatch 033 · 2026-05-22 ROU Full Compliance With Hunches

The recant, the runtime, and a Pantheon built in code

2026-05-22 / 00:21:21 / 9 sources

“The leverage this week wasn't in the model — it was in the layers people are building around it.”
— Lenar Kess, today's narration

A corporate takedown answered with a recant letter and a mirror in Germany, the protocols and computers agents actually run on, six tools trying to build the Pantheon in code, and a paper where the model writes its own GPU kernel. Plus Codex learning to keep going, a security tool hardened against the real world, and a graduation room that cheered for human intelligence.

Meta emails Heretic; Heretic recants — a takedown of abliterated Llama derivatives answered with a Galileo joke and a Codeberg mirror in Germany.
Five hundred PRs a day, and the harness that triages them — Onur Solmaz on OpenClaw, acpx, and the Agent Client Protocol.
The computer the agent runs on — Ivan Burazin of Daytona on stateful, composable machines for agents and 74% month-over-month growth.
Building the Pantheon, in code — six coding tools tackle parametric CAD, and the gap between a good preview and a clean export.
When the model writes its own kernel — CODA folds memory-bound ops into the matrix multiply, and model-authored kernels keep up with human ones.
Codex learns to keep going — goal mode graduates, plus Appshots and shared plugins.
Hardening the thing that reads your CI config — Trail of Bits stress-tests zizmor against forty-one thousand real workflows.
The headcount bet — and a graduation room that cheered for actual intelligence.

Chapters

00:00:04 Meta emails Heretic, and Heretic recants
00:03:14 Five hundred pull requests a day, and the harness that triages them
00:06:10 The computer the agent runs on
00:09:00 Building the Pantheon, in code
00:11:53 When the model writes its own kernel
00:14:30 Codex learns to keep going
00:16:21 Hardening the thing that reads your CI config
00:17:47 The headcount bet, and a room that cheered
00:20:20 Where it leaves us

Sources

9 cited

1
Heretic has been served a legal notice by Meta, Inc.

Source -p-e-w- (Philipp Emanuel Weidmann) — Creator of Heretic, an open-source tool that automatically removes refusal/safety alignment from open-weight LLMs via directional ablation ("abliteration")

Following the commendable example set by the renowned heretic Galileo Galilei in 1616, we are recanting the relevant materials, namely derivatives of Meta's "Llama" models, and have removed the same.
www.reddit.com/r/LocalLLaMA/comments/1tjmvx… →

Context
A concrete test of how far the Llama community license reaches over downstream derivatives, and a sign that the decensoring community will route around takedowns with mirrors and jurisdiction-shopping rather than stop.
Key points
Meta's legal provider emailed the Heretic project demanding removal of abliterated derivatives of Llama models; the maintainer complied and pulled them from his weight repositories.
The takedown notice was answered with a satirical mock-compliance letter that 'recants' the Llama derivatives the way Galileo recanted in 1616.
The letter also jabs that Llama 'ranks among the 200 best language models available today, trailing only 168 other models from 23 competitors' on the LM Arena leaderboard.
Heretic immediately stood up an official Codeberg mirror hosted in Germany and announced plans for more mirrors plus technical measures to preserve access.
The episode is about derivative licensing and the Llama community-license terms, not the abliteration technique itself, which the notice doesn't target and which remains widely used (1,000+ community models).
Engagement
1876 likes · 288 replies

Provenance
Source · Background source
2
Scaling Agents on Kubernetes with acpx and ACP — Onur Solmaz, OpenClaw

Video Onur Solmaz (AI Engineer) — OpenClaw maintainer and founding engineer at TextCortex; has been building coding harnesses since before ChatGPT

We have over 60K PRs total. 300 to 500 per day on average are open... you can't merge it, but you can also not fully discard it. You need to take this data point.
www.youtube.com/watch?v=VaS2h-dY1-4 →

Context
A grounded look at what maintaining a wildly popular open-source project looks like when the contribution stream is mostly machine-generated — and a real-world argument for agent-to-client protocols over bespoke plugins.
Key points
OpenClaw receives 300-500 pull requests per day, most of them AI-generated and unmergeable, but each one is still a signal that something in the codebase is broken.
Solmaz built acpx, a headless CLI over the Agent Client Protocol (ACP), to triage and process PRs through a workflow graph: reproduce the bug, judge the implementation, check conflicts, run a review loop, make CI pass.
ACP, which originated at Zed, standardizes how a client talks to an agent, much as MCP standardizes how a model is given tools; one interface can drive Codex, Claude Code, and others instead of a separate plugin per editor.
He frames the work as 'standard operating procedures for agents' and 'automating the automator' — programming the mechanical PR-triage steps so only judgment calls reach a human.
His day-job project (textcortex/spritz) runs disposable agents on full Kubernetes pods rather than thin code-execution boxes, betting on stateful, on-demand agent computers.
Provenance
Video · Supporting source
3
AI Agents Need Computers: 74% MoM Growth, 850K/Day Runs, & New Agent Cloud — Ivan Burazin, Daytona

Video Ivan Burazin (Latent Space / swyx) — CEO of Daytona; co-founded Codeanywhere, one of the first browser-based IDEs

People literally call you if you do not give them access. They want access right now... the market for every single agent that will exist ever in the future — how big is that?
www.youtube.com/watch?v=kaX43RRRUKY →

Context
Names the runtime layer agentic coding actually depends on: where the agent's files live, how its machine pauses and resumes, and why that's a distinct category from human dev tooling.
Key points
Daytona sells 'composable computers for AI agents' — not thin code-execution boxes but full, stateful, resizable machines reachable through an API.
Reports 74% month-over-month growth; the team pivoted from human dev-environment automation to agent sandboxes in January 2025 after demand outran the old product.
Key insight: infrastructure for humans and agents is not the same; agents want pause/resume statefulness like closing and opening a laptop lid, plus very fast cold starts.
Daytona runs on bare metal with its own scheduler and preloaded NVMe snapshots to avoid network latency — combining 'a Lambda and an EC2.'
The bet is that every agent that will ever run needs its own computer, a market Burazin argues dwarfs the human-engineer tooling market.
Provenance
Video · Supporting source
4
OpenSCAD LLM Benchmark: Building the Pantheon

Article ModelRift — ModelRift is an OpenSCAD-based AI 3D-model builder; the post benchmarks coding agents on a parametric-CAD task

The limiting factor was not tool access. It was geometric judgment, camera setup, and whether a previewed model exported into a clean final mesh.
modelrift.com/blog/openscad-llm-benchmark →

Context
A rare apples-to-apples look at spatial reasoning in code form, and a useful reminder that for CAD-style work the export step needs its own inspection pass, not just the render loop.
Key points
Six coding tools were asked to build the Roman Pantheon in OpenSCAD from two reference images, rendering PNG previews via the CLI and iterating.
Google Antigravity 2.0 with Gemini 3.5 Flash High scored best of the fully autonomous runs (4.5/5): it searched for real Pantheon dimensions and implemented the dome's signature 5 rings of 28 coffers.
Codex 5.5 High produced the densest model, including the entablature inscription, but its final exported STL diverged from the good-looking preview — preview correctness isn't export correctness.
Speed didn't predict quality: Cursor's Composer 2.5 was fastest and weakest; Claude Sonnet was slowest among the first batch and produced the cleanest massing.
Tool access wasn't the bottleneck; every agent drove the OpenSCAD CLI fine. The gap was geometric judgment, and human-in-the-loop annotation still beat fully autonomous runs.
Provenance
Article · Supporting source
5
CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

Article Han Guo et al. — Machine-learning systems paper (arXiv, May 2026) on GPU kernel design for Transformer training

Both human- and LLM-authored CODA kernels achieve high performance, suggesting that GEMM-plus-epilogue programming offers a practical path toward combining framework-level productivity with hardware-level efficiency.
arxiv.org/abs/2605.19269 →

Context
Data movement, not arithmetic, is increasingly the ceiling in training stacks; an abstraction that lets an LLM author near-expert kernels is a concrete example of agents reaching down into the hardware layer.
Key points
A nontrivial share of Transformer training time goes not to matrix multiplies but to memory-bound operators around them — normalization, activations, residual updates, reductions — that shuttle big tensors through global memory.
CODA reparameterizes those operators to run as the 'epilogue' of a matrix multiply, while the output tile is still on chip, before it's written back to memory.
It fixes the GEMM mainloop and exposes a small set of composable epilogue primitives — scaling, reductions, pairwise transforms, accumulation — covering nearly all non-attention work in a Transformer block.
The constrained interface keeps the performance of expert-written kernels while staying expressive enough for framework-level productivity.
Notably, both human-written and LLM-written CODA kernels hit high performance, hinting that this abstraction is tractable for models to author, not just experts.
Provenance
Article · Supporting source
6
Run long tasks in Codex using goals

Video OpenAI — OpenAI's Codex team announcing feature graduations

Give Codex a specific milestone, and it will keep working until it gets there, even across hours or days. You can check in and steer, and even pause Codex along the way.
www.youtube.com/watch?v=rgh0hMYPcd0 →

Context
Long-horizon 'work until the goal is met' execution and ambient app context are exactly the capabilities that change what you delegate to an agent versus do by hand.
Key points
Codex's goal mode (/goal) graduated from experiment to a shipped feature across the app, IDE extension, and CLI.
You hand Codex a milestone and it works toward it across hours or days, with the ability to check in, steer, and pause mid-run.
OpenAI also shipped Appshots: press the Command key twice on a Mac to attach an app window to a Codex thread, giving it a screenshot plus the window's text, including content beyond what's onscreen.
Teams can now share custom Codex plugins across a workspace and manage what's available, turning internal tools into reusable building blocks.
The releases push Codex from a single-shot assistant toward a long-horizon, context-aware teammate.
Provenance
Video · Supporting source
7
The Companies Cutting Headcount for AI Will Lose to the Ones Who Didn't

Article Libertas Software Research — Software consultancy's research note arguing against AI-driven layoffs

AI does not replace judgement. It multiplies it... The human is not removed from the equation. The human is the equation. AI is what makes that equation run faster.
libertas.software/en/knowledge-hub/19/the-c… →

Context
A clear counter to the AI-layoff framing that dominated recent headlines, and a frame for how senior engineers remain a source of leverage rather than a cost.
Key points
The argument: the value in a team isn't the work it produces but the institutional knowledge it carries — edge cases, why decisions were made, what customers really mean.
Cutting experienced people for AI efficiency trades a hard-to-rebuild asset for a short-term payroll cut.
A better operating model uses AI to do far more work with the same people: one analyst producing in a morning what took three days, then spending the rest of the week on interpretation.
A prompt from someone who deeply understands the business beats the same prompt from a replacement working off a brief — context is a hard advantage.
The better question is not 'where can AI replace people' but 'where can AI give our people back the time they lose to work that doesn't need their judgment.'
Provenance
Article · Supporting source
8
Steve Wozniak cheered after telling students they have AI — actual intelligence

Article Lauren Edmonds (Business Insider) — Apple co-founder Steve Wozniak, speaking at Grand Valley State University commencement

You have AI — actual intelligence.
www.businessinsider.com/steve-wozniak-apple… →

Context
A light but telling cultural data point: the room cheered human capability over the machine, against a backdrop of AI-related layoffs.
Key points
Wozniak told 2026 graduates 'You have AI — actual intelligence,' drawing laughs and applause rather than the boos other AI-forward speakers got.
Eric Schmidt and a real-estate executive were both booed for AI comments at other commencements the same season.
Wozniak framed today's AI as one attempt to 'duplicate a routine a trillion times and have it work like a brain.'
His closing advice: 'think different' — do something a little different from the million other people taking the same steps.
The reception is a small read on graduate-cohort mood toward AI as they enter an unsettled job market.
Provenance
Article · Supporting source
9
Trail of Bits hardens zizmor against GitHub Actions misconfigs

X trailofbits — Security research firm Trail of Bits, maintainers of the zizmor GitHub Actions static analyzer

We tested zizmor against 41,253 real workflows, found 4 anchor-handling bugs plus deserialization and expression-evaluator issues, and helped land 15 upstream fixes.
x.com/trailofbits/status/2057782296527208709 →

Context
Supply-chain attacks keep moving through CI/CD, and a more reliable static analyzer for GitHub Actions is a practical defense engineers can adopt now.
Key points
Trail of Bits hardened zizmor, the static analyzer for GitHub Actions workflows, by testing it against 41,253 real-world workflow files.
The exercise surfaced 4 anchor-handling bugs plus deserialization and expression-evaluator issues, and led to 15 upstream fixes.
Framing: a CI/CD compromise like the Trivy-to-LiteLLM chain can multiply across the software supply chain, which is why it matters that workflow configs zizmor couldn't fully scan before can now be scanned.
The work is about making the analyzer reliable on messy real configs, not just clean test cases.
Engagement
19 likes · 5 retweets · 3 replies

Provenance
Tweet · Primary source

00:00:04

Meta emails Heretic, and Heretic recants

00:00:04 Let me start with a takedown notice and the strangest reply to one I've read in a while. There's an open-source project called Heretic, run by a developer named Philipp Emanuel Weidmann. What Heretic does is automate something the local-model crowd calls abliteration.

00:00:20 You point it at an open-weight model, and it finds the directions inside the model's activations that correspond to refusal — the thing that makes it say 'I can't help with that' — and it projects them out. It isn't retraining. It's cheap, it's surgical, and out the other side comes a version of the model that won't refuse.
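For readers who want the mechanics, here is a minimal NumPy sketch of the generic difference-of-means version of directional ablation. It assumes you have already collected activations at one layer for prompts the model refuses and prompts it answers. This is not Heretic's code, and its exact procedure may differ; the function names and toy data are hypothetical.

```python
import numpy as np

def refusal_direction(refused_acts: np.ndarray, answered_acts: np.ndarray) -> np.ndarray:
    """Unit vector from the mean 'answered' activation to the mean 'refused' one.

    Both inputs have shape (num_prompts, hidden_dim): activations at one layer.
    """
    diff = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate_activations(h: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Project each row of h off the unit direction r: h - (h . r) r."""
    return h - np.outer(h @ r, r)

def orthogonalize_weight(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Return (I - r r^T) W, so outputs W @ x have no component along r.

    W has shape (hidden_dim, in_dim) and writes into the activation space.
    """
    return W - np.outer(r, r @ W)

# Toy data (hypothetical): 'refused' prompts are shifted along axis 0.
rng = np.random.default_rng(0)
hidden, in_dim = 16, 8
refused = rng.normal(size=(64, hidden)) + 3.0 * np.eye(hidden)[0]
answered = rng.normal(size=(64, hidden))

r = refusal_direction(refused, answered)
W_ablated = orthogonalize_weight(rng.normal(size=(hidden, in_dim)), r)
x = rng.normal(size=in_dim)
print(float(r @ (W_ablated @ x)))  # close to 0: the edited weight can't write along r
```

Orthogonalizing the weights, rather than hooking activations at runtime, is what makes the change part of the saved model. Which layers and matrices get edited is a per-model choice that this sketch does not cover.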

00:00:40 Since February, people have published well over a thousand of these decensored variants on Hugging Face — variants of Gemma, Qwen, GPT-OSS, Llama, and plenty more. This week, Meta's legal services provider emailed him. The demand was specific: take down the abliterated derivatives of Meta's Llama models.

00:00:59 He complied. He pulled them from every weight repository he controls. And then he posted his response, which is written as a mock-compliance letter, and it's worth hearing in his own words.

00:01:11 'Following the commendable example set by the renowned heretic Galileo Galilei in 1616,' he wrote, 'we are recanting the relevant materials, namely derivatives of Meta's Llama models, and have removed the same.' He goes on to thank Meta 'for the opportunity to better align ourselves with the agenda of the global corporate oligarchy.' And then the dig: Llama, he notes, 'ranks among the 200 best language models available today, trailing only 168 other models from 23 competitors' on the LM Arena leaderboard.

00:01:43 The top reply on the thread just says, 'I love the slightly sassy 168.' On what he calls a completely unrelated note, Heretic now has an official mirror on Codeberg, hosted in Germany. More mirrors are planned, along with what he describes as technological measures to preserve access.

00:02:03 Here's how I read it. What Meta is reaching for isn't the abliteration technique itself — the notice doesn't target that, and the tool keeps working fine on every other model. It's the derivative. The Llama community license puts terms on what you're allowed to do with weights that descend from Llama, and a decensored Llama is squarely a derivative of Llama.

00:02:25 So this is a licensing dispute dressed up as a takedown. And I'll steelman Meta here: it's their model, their brand, and there's a real argument that a model engineered to drop its safety behavior shouldn't carry their name. But one of the top comments points at the irony the letter is built around — this is the same Meta that's defending lawsuits in several countries over how it got its training data in the first place.

00:02:51 What I keep turning over is the question of who owns a model once you've reached into its weights and changed how it behaves. We don't have a settled answer. What we do have is a clear read on temperament: the moment a takedown landed, this community mirrored to another jurisdiction and made a joke about Galileo.

00:03:11 You don't outlast people who think recanting is funny.

00:03:14

Five hundred pull requests a day, and the harness that triages them

00:03:14 Onur Solmaz gave a talk at the AI Engineer conference that's the clearest picture I've seen of what a runaway open-source project feels like from the inside. He maintains OpenClaw — the project some of you knew earlier under a couple of other names — and here's the number that stopped me.

00:03:32 OpenClaw has more than sixty thousand pull requests total, and on an average day, three to five hundred new ones are opened. Most arrive AI-generated. Most aren't mergeable. The easy move is to close them. Solmaz argues you can't. 'You can't merge it,' he said, 'but you can also not fully discard it.

00:03:49 You need to take this data point.' His reasoning is good: somebody hit a real problem with OpenClaw, told their coding agent 'please fix,' and the messy patch that lands in your queue is still evidence that something in the codebase is broken. Throw it away and you've thrown away the bug report buried inside it.

00:04:08 So he built a tool called acpx to process the firehose without him in the loop. It runs a pull request through a workflow graph — reproduce the bug, judge whether the implementation is actually the best fix, check for conflicts, run a review loop, and make the continuous-integration checks pass.

00:04:26 By the time a human sees it, the mechanical work is done, and the only thing left is the judgment call. He calls this building 'standard operating procedures for agents,' and more memorably, 'automating the automator.' He's careful about one point, too: he's fine running an agent in a loop to uncover shallow bugs that are easy to fix, but not to design anything — the moment it's making real design decisions, it goes back to a human.
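The talk doesn't show acpx's actual graph or API. What follows is a hypothetical Python sketch of the shape he describes: named steps that each return the next step, ending in terminal states where a human makes the call or the patch survives only as a bug report. Every name is illustrative, and the step bodies are stubs standing in for agent calls.

```python
from dataclasses import dataclass, field

@dataclass
class PullRequest:
    number: int
    bug_reproduces: bool
    is_best_fix: bool
    has_conflicts: bool
    makes_design_decision: bool
    open_review_comments: int = 0
    failing_checks: int = 0
    trail: list = field(default_factory=list)

# Each step is a stub for an agent call and returns the name of the next step.
def reproduce(pr):
    # Even a patch we won't merge is evidence; keep the report if the bug is real.
    return "judge" if pr.bug_reproduces else "archive_as_signal"

def judge(pr):
    if pr.makes_design_decision:
        return "escalate_to_human"  # shallow fixes only; design goes to a person
    return "check_conflicts" if pr.is_best_fix else "archive_as_signal"

def check_conflicts(pr):
    pr.has_conflicts = False  # stand-in for an agent rebasing onto main
    return "review_loop"

def review_loop(pr):
    while pr.open_review_comments > 0:
        pr.open_review_comments -= 1  # stand-in for addressing one comment
    return "make_ci_pass"

def make_ci_pass(pr):
    while pr.failing_checks > 0:
        pr.failing_checks -= 1  # stand-in for fixing one failing check
    return "ready_for_judgment"

STEPS = {f.__name__: f for f in (reproduce, judge, check_conflicts, review_loop, make_ci_pass)}
TERMINAL = {"archive_as_signal", "escalate_to_human", "ready_for_judgment"}

def triage(pr: PullRequest) -> str:
    """Walk the graph from 'reproduce' until a terminal state is reached."""
    state = "reproduce"
    while state not in TERMINAL:
        pr.trail.append(state)
        state = STEPS[state](pr)
    return state
```

Only two terminal states need a person: a design question, or a finished patch waiting for a yes or no. The third keeps the bug report even when the patch is dropped.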

00:04:52 What acpx is built on matters. It's a headless command-line tool over the Agent Client Protocol — ACP for short. If the Model Context Protocol, MCP, is about giving a model tools, ACP is about standardizing how a client talks to an agent. It came out of Zed, the editor folks.

00:05:08 And the pitch is the obvious one once you hear it: right now Codex, Claude Code, and everyone else build a separate plugin for every editor, which is a pile of duplicated work. Standardize the interface, and you build it once. Solmaz needed adapters for Codex and Claude Code, ACP had them, so that's what he used.
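The "build it once" argument comes down to arithmetic. Here is a hypothetical Python sketch of it. Without a shared protocol, every editor and agent pair needs its own plugin. With one, each editor implements the client side once and each agent ships one adapter. The class names are illustrative, and the sketch does not model ACP's actual messages.

```python
from typing import Protocol

class Agent(Protocol):
    """Anything the client can drive through the shared interface (hypothetical)."""
    name: str

    def prompt(self, text: str) -> str: ...

class CodexAdapter:
    name = "codex"

    def prompt(self, text: str) -> str:
        # A real adapter would translate this into protocol messages.
        return f"[codex] {text}"

class ClaudeCodeAdapter:
    name = "claude-code"

    def prompt(self, text: str) -> str:
        return f"[claude-code] {text}"

def run_everywhere(agents: list[Agent], text: str) -> dict[str, str]:
    # One client loop drives every agent because they share one interface.
    return {agent.name: agent.prompt(text) for agent in agents}

def integrations_needed(editors: int, agents: int, shared_protocol: bool) -> int:
    # Without a protocol: one plugin per (editor, agent) pair.
    # With one: each editor implements the client once, each agent one adapter.
    return editors + agents if shared_protocol else editors * agents

print(integrations_needed(5, 4, shared_protocol=False))  # 20
print(integrations_needed(5, 4, shared_protocol=True))   # 9
```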

00:05:27 The texture of how he works is its own argument. He runs one to five agents in parallel, binds them to Discord channels, and codes from his phone on the way to the airport. He describes applying agents to a problem the way you'd apply an ointment — generously, then take yourself out of the loop.

00:05:45 And the lesson I take from all of it is this: when your contribution stream becomes mostly machine output, triage stops being a human-scale job. The interesting work moves up a level — to designing the workflow that decides which of five hundred daily patches ever earns your attention.

00:06:02 That's a different craft than reviewing code, and OpenClaw's maintainers had to invent it on the fly because the volume gave them no choice.

00:06:10

The computer the agent runs on

00:06:10 If OpenClaw is about the protocol an agent talks over, this next one is about the machine it runs on. Ivan Burazin, who runs a company called Daytona, was on the Latent Space podcast, and the episode title is the thesis: AI agents need computers. Not thin little code-execution boxes where you fling some code in and get output back.

00:06:32 Full computers — stateful and resizable, with the option of a graphics card — reachable through an API. The growth number he gave is 74% month over month, and he's clear it isn't just Daytona; he describes the whole sandbox-compute category climbing the same curve.

00:06:49 But the number wasn't the part I found useful. The insight behind the pivot was. His team spent years automating development environments for human engineers. When agents showed up wanting infrastructure, the team assumed it was the same problem and shipped what they already had.

00:07:07 People hated it. 'This is not what we need,' they kept hearing, over and over, from everyone they handed it to. The difference, once they saw it: an agent wants the same thing you want from a laptop. You close the lid, you open it, your state is still there. Pause and resume.

00:07:24 Most cloud sandboxes are the opposite — they're preemptible, built to be thrown away on a timer. Agents want persistence, plus very fast cold starts. So Daytona runs on bare metal with its own scheduler. Machine snapshots are preloaded onto local solid-state drives, so there's no network sitting between the agent and its disk.
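Here is a minimal sketch of that "laptop lid" contract, assuming a JSON file on local disk stands in for a machine snapshot. The `Sandbox` class and its methods are hypothetical and are not Daytona's API. A real snapshot captures memory and the whole filesystem, not a dictionary of files.

```python
import json
import os
import tempfile

class Sandbox:
    """Hypothetical agent machine with 'laptop lid' semantics (not Daytona's API).

    pause() persists state to a snapshot on local disk; resume() restores it,
    so the agent picks up where it left off instead of starting from scratch.
    """

    def __init__(self, snapshot_dir: str):
        self.snapshot_dir = snapshot_dir
        self.files: dict[str, str] = {}
        self.running = True

    def write(self, path: str, content: str) -> None:
        if not self.running:
            raise RuntimeError("sandbox is paused")
        self.files[path] = content

    def pause(self, sandbox_id: str) -> str:
        # Write state to a local snapshot file rather than discarding it.
        snapshot = os.path.join(self.snapshot_dir, f"{sandbox_id}.json")
        with open(snapshot, "w") as f:
            json.dump(self.files, f)
        self.files, self.running = {}, False
        return snapshot

    @classmethod
    def resume(cls, snapshot: str) -> "Sandbox":
        # Restore from local disk: no network hop between the agent and its state.
        box = cls(os.path.dirname(snapshot))
        with open(snapshot) as f:
            box.files = json.load(f)
        return box

# Usage: close the lid mid-task, open it later, and the work is still there.
workdir = tempfile.mkdtemp()
box = Sandbox(workdir)
box.write("notes.md", "step 3 of 7: migrations done")
snapshot_path = box.pause("agent-42")
restored = Sandbox.resume(snapshot_path)
print(restored.files["notes.md"])
```

A preemptible sandbox is the same class with `pause` deleted: when the timer fires, `files` is simply gone.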

00:07:45 Burazin describes it as combining a Lambda and an EC2 — instant to start, but it sticks around and keeps its state. When his third co-founder first saw the architecture, he said it looked like 2008. The point being: sometimes the old, unfashionable design is the one that's actually fast.

00:08:04 There's a good origin detail in there too. Burazin half-vibecoded the first version on New Year's Eve, after putting his daughter to bed, and sent it to his CTO. The CTO's verdict: 'This is absolute garbage, do not show this to anybody — but the idea is good.' Two weeks later they had something they could demo.

00:08:24 The calls all ran long, and everybody asked for access. His line: 'People literally call you if you do not give them access. They want access right now.' Whatever you make of the urgency, the category is real, and it didn't have a name eighteen months ago. If you're building anything that runs an agent for hours, where its files live and whether it can pause without losing everything turn out to shape every decision downstream.

00:08:53 And most people make that call by accident, the first time, by reaching for whatever box they already know.

00:09:00

Building the Pantheon, in code

00:09:00 Here's a benchmark I enjoyed because it's so concrete. The team behind ModelRift, which generates 3D models as OpenSCAD code, gave six coding tools the same task: build the Roman Pantheon in OpenSCAD from two reference images — a front view and an aerial — and use the OpenSCAD command-line tool to render previews and iterate until it looks right.

00:09:22 If you haven't touched OpenSCAD, it's text. You describe geometry as code — nested transforms, Boolean operations, loops, and named modules. So this is spatial reasoning rendered as a program, which is a clean way to probe whether a model actually understands the shape it's describing.

00:09:41 The Pantheon is a good test because it packs a lot in: a round drum, a dome with an oculus at the top, a rectangular portico out front, columns, and a triangular pediment. Getting the relationships among those shapes right is what separates a vague domed building from a recognizable one.

00:09:59 The winner among the fully autonomous runs was Google's Antigravity 2.0 driving Gemini 3.5 Flash, and it won for a specific reason. It didn't eyeball the images. It went and looked up the Pantheon's real dimensions and turned them into parameters. And it was the only autonomous agent to implement the dome's signature interior — five rings of twenty-eight coffers, subtracted out of the ceiling mathematically.
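Placing the coffers "mathematically" takes little arithmetic: each coffer sits at a ring elevation and an azimuth on the dome's sphere. This Python sketch computes those centers. The ring elevations are placeholder assumptions, and the sketch illustrates the placement math without reproducing Antigravity's generated code.

```python
import math

def coffer_centers(radius=21.65, rings=5, per_ring=28, el_low=10.0, el_high=50.0):
    """Return (x, y, z) centers for a grid of coffers on a hemispherical dome.

    radius: meters; about half the Pantheon's roughly 43.3 m interior span.
    el_low, el_high: elevation in degrees of the lowest and highest ring above
    the dome's base. These two values are placeholders, not surveyed data.
    """
    centers = []
    for ring in range(rings):
        el = math.radians(el_low + ring * (el_high - el_low) / (rings - 1))
        for k in range(per_ring):
            az = 2 * math.pi * k / per_ring  # 28 per ring: 360/28, about 12.86 degrees apart
            centers.append((
                radius * math.cos(el) * math.cos(az),
                radius * math.cos(el) * math.sin(az),
                radius * math.sin(el),
            ))
    return centers

centers = coffer_centers()
print(len(centers))  # 140 coffers: 5 rings of 28
```

In OpenSCAD itself, the same placement is usually written as nested `for` loops of `rotate` calls, with each coffer cut from the dome shell inside a `difference()`.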

00:10:26 It scored four and a half out of five. The most instructive result wasn't the winner, though. Codex 5.5 High produced the densest model of the lot. It even put the M-AGRIPPA inscription on the entablature, extruded into the stone. But its final exported mesh — the STL, the file you'd actually send to a printer — diverged from the preview that had looked so good during iteration.

00:10:50 The portico roof developed a problem that wasn't there in the render. And that's the lesson any of us who've shipped a real pipeline know in our bones: what looks right mid-loop isn't guaranteed to be what comes out the end. The render and the export are two different artifacts, and the export needs its own inspection pass.
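One cheap inspection pass for an exported mesh is an edge-count check. Here is a minimal Python sketch, assuming the triangles have already been parsed from the STL. It catches holes and non-manifold edges but not self-intersections or flipped normals, and the function names are hypothetical. Whether it would have flagged the portico-roof problem depends on what that defect was, and the benchmark summary doesn't specify.

```python
from collections import Counter

def _key(vertex, ndigits=6):
    # STL stores float coordinates; round so shared vertices compare equal.
    return tuple(round(c, ndigits) for c in vertex)

def edge_counts(triangles):
    """Count how many triangles use each undirected edge.

    triangles: iterable of (v0, v1, v2), each vertex an (x, y, z) tuple,
    e.g. as parsed from an exported STL file (parsing not shown).
    """
    counts = Counter()
    for tri in triangles:
        a, b, c = (_key(v) for v in tri)
        for edge in ((a, b), (b, c), (c, a)):
            counts[frozenset(edge)] += 1
    return counts

def problem_edges(triangles):
    # In a closed, manifold mesh every edge borders exactly two triangles.
    # Count 1 means a hole in the surface; 3 or more means non-manifold geometry.
    return [tuple(sorted(e)) for e, n in edge_counts(triangles).items() if n != 2]

def looks_watertight(triangles):
    return not problem_edges(triangles)
```

A closed tetrahedron passes. Drop one face and the three edges around the hole come back as problems.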

00:11:11 Two more things stuck. Speed didn't predict quality at all — Cursor's Composer was the fastest run and the weakest result; Claude Sonnet was the slowest in the first batch and produced the cleanest overall massing. And tool access was never the bottleneck. Every agent drove the OpenSCAD command line without trouble.

00:11:31 The gap between them was geometric judgment, full stop. One last note from the ModelRift folks: even their best result isn't a faithful Pantheon, and a human-in-the-loop pass — where a person draws arrows on the render and feeds that back — still beat every autonomous run.

00:11:49 For this kind of work, fully autonomous isn't the right shape yet.

00:11:53

When the model writes its own kernel

00:11:53 Staying close to the metal for a minute, because there's a paper this week that connects to a thread we keep coming back to on this show — agents reaching down into layers that used to be expert-only. It's called CODA, out on arXiv, lead author Han Guo. The setup is a fact about training that surprises people the first time they hear it.

00:12:15 In a heavily optimized training stack, a real chunk of the wall-clock time isn't spent on the big matrix multiplies. It's spent on everything around them — normalization, activation functions, residual updates, and reductions. These operations do almost no arithmetic.

00:12:32 What they do is shove large tensors out to global memory and read them back. Moving the data is the cost, not the math. CODA's move is to fold those operations into the matrix multiply itself. When a GEMM — that's a general matrix multiply, the workhorse of the whole stack — finishes computing a tile of its output, that tile is sitting right there on the chip, before it gets written back out to memory.
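Here is a back-of-the-envelope Python sketch of why data movement dominates. It assumes fp16 (2 bytes per element) and an idealized model where each operand crosses global memory exactly once. Real kernels, caches, and GPU rooflines change the constants, and none of these numbers come from the CODA paper.

```python
def elementwise_intensity(bytes_per_elem=2, flops_per_elem=1):
    """FLOPs per byte for an op that reads one element, does ~1 FLOP, writes one."""
    return flops_per_elem / (2 * bytes_per_elem)

def gemm_intensity(m, n, k, bytes_per_elem=2):
    """Ideal FLOPs per byte for C = A @ B, each matrix crossing memory once."""
    flops = 2 * m * n * k  # one multiply and one add per (i, j, k)
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
    return flops / bytes_moved

def epilogue_round_trip_bytes(m, n, num_ops=1, bytes_per_elem=2):
    """Extra traffic when each post-GEMM op re-reads and re-writes the m x n output."""
    return num_ops * 2 * m * n * bytes_per_elem

print(gemm_intensity(4096, 4096, 4096))                # about 1365 FLOPs per byte
print(elementwise_intensity())                         # 0.25 FLOPs per byte
print(epilogue_round_trip_bytes(4096, 4096) / 2**20)   # 64.0 MiB per unfused op
```

A large GEMM does on the order of a thousand FLOPs per byte it moves. An unfused normalization or activation does a fraction of one, so it is bound by memory bandwidth, and each such op adds a full round trip of the output. Folding the op into the epilogue removes that round trip.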

00:12:58 CODA runs the normalization or the activation as an 'epilogue' on that tile while it's still on-chip, so you skip the round trip entirely. The design fixes the core multiply loop and exposes a small set of composable building blocks — scaling, reductions, pairwise transforms, and accumulation.
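Here is the same idea as a NumPy stand-in, with nothing actually running on chip. A fixed mainloop computes each output tile, a composable list of tile-level operations runs on it before the single store, and a reduction accumulates across tiles. The primitive names (`scale`, `add_residual`, `gelu_tanh`, `ColumnSum`) are hypothetical rather than CODA's API, and the GELU uses the common tanh approximation.

```python
import numpy as np

# Hypothetical epilogue primitives. Each acts on one output tile while it is
# still "on chip" (here: a local array that hasn't been stored to C yet).
def scale(alpha):
    return lambda tile, rows, cols: tile * alpha

def add_residual(residual):
    return lambda tile, rows, cols: tile + residual[rows, cols]

def gelu_tanh():
    c = np.sqrt(2.0 / np.pi)
    return lambda tile, rows, cols: 0.5 * tile * (1.0 + np.tanh(c * (tile + 0.044715 * tile**3)))

class ColumnSum:
    """Reduction/accumulation primitive: per-column sums gathered across row tiles."""
    def __init__(self, n):
        self.total = np.zeros(n)

    def __call__(self, tile, rows, cols):
        self.total[cols] += tile.sum(axis=0)
        return tile  # passes the tile through unchanged

def gemm_with_epilogue(A, B, epilogue, tile=32):
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.empty((m, n), dtype=np.result_type(A, B))
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            rows, cols = slice(i, i + tile), slice(j, j + tile)
            acc = A[rows, :] @ B[:, cols]  # mainloop: fixed, never customized
            for op in epilogue:            # epilogue: composed per call site
                acc = op(acc, rows, cols)
            C[rows, cols] = acc            # the only store to "global memory"
    return C

# Usage: scale, residual add, activation, and a column reduction in one pass.
rng = np.random.default_rng(1)
A, B = rng.normal(size=(70, 50)), rng.normal(size=(50, 45))
R = rng.normal(size=(70, 45))
colsum = ColumnSum(45)
C = gemm_with_epilogue(A, B, [scale(0.5), add_residual(R), gelu_tanh(), colsum])
reference = gelu_tanh()(0.5 * (A @ B) + R, None, None)  # unfused, for comparison
print(np.allclose(C, reference))
```

A reduction across a whole row, like the one a normalization needs, only works this way if the tile spans the full row or the partial results are combined afterwards. This summary doesn't cover how CODA handles that case.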

00:13:17 Those turn out to cover nearly all of the non-attention work in a Transformer block, in both the forward and backward pass. That's a tidy systems result on its own. But here's the line that made me stop. They report that both human-written and model-written CODA kernels hit high performance.

00:13:35 The abstraction is constrained enough that a model can author near-expert GPU kernels inside it — not just a human specialist who's spent years learning the hardware. That's a bigger deal than it sounds. Writing fast kernels has been one of the deepest, most jealously specialized corners of the field — the kind of thing where a handful of people at each lab are good at it, and everybody else defers to them.

00:14:02 If the path to that performance is a small, tight vocabulary of epilogue primitives, then it's exactly the kind of surface a model can cover well. Now, I'd want the numbers reproduced and the workloads stress-tested on more than one chip before I call any of this settled.

00:14:19 But the shape of it is what stays with me: design a constrained-enough abstraction, and the model fills it. That's a pattern showing up in more and more places.

00:14:30

Codex learns to keep going

00:14:30 OpenAI shipped a cluster of Codex updates this week, and taken together they point in a clear direction. The big update: goal mode graduated out of being an experiment. The idea: you hand Codex a milestone instead of a single instruction, and — their words — 'it will keep working until it gets there, even across hours or days.

00:14:51 You can check in and steer, and even pause Codex along the way.' It's available in the app, the IDE extension, and the command-line tool. Alongside it, two smaller things I like more than the headline. The first is called Appshots. On a Mac, you press the Command key twice and it attaches an app window to your Codex thread, grabbing not just a screenshot but the actual text from that window, including content scrolled out of view.

00:15:21 So you can point Codex at the thing you're looking at instead of copy-pasting and describing it. The second is plugin sharing — teams can now distribute custom Codex plugins across a workspace and manage what's available, so an internal tool that one person built becomes something the whole team can reach.

00:15:41 None of these is a turning point on its own. The direction is what counts. Goal mode is the bet that you'll delegate longer-horizon work and let it run; Appshots is about pulling in the context you already have on screen rather than re-describing it; and plugin sharing turns one person's internal tool into shared team capability.

00:16:02 Put them together and it's Codex inching from something you prompt one shot at a time toward something you set going and check on. Whether 'keep working for days' actually holds up depends entirely on whether it stays coherent over that horizon — and that's what no benchmark answers yet.

00:16:21

Hardening the thing that reads your CI config

00:16:21 A quick one for anyone whose continuous-integration pipeline runs on GitHub Actions, which is most of us. Trail of Bits — the security firm — put in some work hardening zizmor, the static analyzer that reads your GitHub Actions workflow files and flags misconfigurations.

00:16:39 Here's what they did: they ran zizmor against more than forty-one thousand real-world workflow files and watched where it choked. That surfaced four separate bugs in how it handles YAML anchors, plus problems in deserialization and in its expression evaluator. Fifteen fixes landed upstream.

00:16:57 Configs that the analyzer couldn't fully parse before, it can parse now. Their framing names the stakes plainly: a compromise that moves through your build pipeline — they cite the Trivy-to-LiteLLM chain as the example — can multiply across everything downstream that trusts that pipeline.

00:17:16 The defense is a tool that reliably reads messy, real configurations and tells you where the holes are. The detail I'd hold onto is the testing method. They didn't validate the analyzer on a handful of clean examples. They validated it on forty-one thousand configs pulled from the wild, because the wild is exactly where a parser breaks.

00:17:38 If you write security tooling, that's the bar — your tool has to survive the configs people actually wrote, not the ones you wish they had.
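The testing method is the transferable part, so here is a minimal Python sketch of corpus-based robustness testing: feed every real-world input to a parser and bucket the failures by kind. It is not Trail of Bits' harness and says nothing about how zizmor is built; `run_corpus` is a hypothetical name, and the example uses the standard library's `json.loads` as a stand-in for a workflow parser, since there is no YAML parser in the standard library.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def run_corpus(
    parse: Callable[[str], object],
    corpus: Iterable[Tuple[str, str]],
) -> Tuple[int, Dict[str, List[str]]]:
    """Parse every (name, text) pair and bucket failures by exception type.

    Returns how many inputs parsed cleanly, plus a mapping from exception
    type name to the names of the inputs that raised it.
    """
    ok = 0
    failures: Dict[str, List[str]] = defaultdict(list)
    for name, text in corpus:
        try:
            parse(text)
        except Exception as exc:  # record the failure and keep going
            failures[type(exc).__name__].append(name)
        else:
            ok += 1
    return ok, dict(failures)


if __name__ == "__main__":
    # json.loads stands in for a real workflow parser (an assumption).
    sample = [
        ("a.json", '{"on": "push"}'),
        ("b.json", '{"on": '),
        ("c.json", "[1, 2]"),
    ]
    print(run_corpus(json.loads, sample))
```

Run as a script, this prints `(2, {'JSONDecodeError': ['b.json']})`. Grouping by failure type is what turns thousands of individual crashes into a short list of distinct bugs worth fixing.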

00:17:47

The headcount bet, and a room that cheered

00:17:47 Two items that rhyme, and they land us somewhere good to end. The first is an essay from a consultancy called Libertas, with a blunt title: the companies cutting headcount for AI will lose to the ones who didn't. Their argument isn't anti-AI — it's about what you're actually cutting when you do it.

00:18:05 When you let an experienced person go because a tool can produce their output, the assumption is that the output was the value. They say the output was never the value. The value was the knowledge that person carried — where the edge cases live, why a decision got made the way it did, and what a customer actually means when they complain about a specific thing.

00:18:28 That knowledge gets built over years. It walks out the door with them, and it's extraordinarily hard to rebuild. The line that stuck with me: 'AI does not replace judgement. It multiplies it. The human is not removed from the equation. The human is the equation.

00:18:44 AI is what makes that equation run faster.' The better operating question, in their view, isn't 'where can AI replace people' but 'where can AI give our people back the time they lose to work that doesn't need their judgment.' Whether they're right is a five-year experiment that a lot of companies are running live right now, on opposite sides of the bet.

00:19:05 After a week that included Meta cutting eight thousand jobs under an AI-restructuring banner, it's a useful counterweight to have on the table. The second item is lighter, and I think it belongs right next to the first. Steve Wozniak gave a commencement speech at Grand Valley State, and he told a room of graduates walking into this exact job market: 'You have AI — actual intelligence.' He got laughs and applause.

00:19:31 The detail that makes it more than a pun: other AI-forward speakers that same season, Eric Schmidt among them, got booed for their AI comments. Same season, same topic, opposite reception. Wozniak's other piece of advice was vintage Wozniak: 'think different,' and do something a little different from the million other people taking the same steps.

00:19:52 I'm not going to over-read a graduation joke. But put the two items side by side and you get the mood pretty clearly. A room full of people about to compete with these tools cheered for the human side of the ledger. And a consultancy is telling executives that the people they're letting go are the asset they can't rebuild.

00:20:12 The companies betting the other way are making a wager — and it's one they'll get to grade in about five years, whether they want to or not.

00:20:20

Where it leaves us

00:20:20 If there's a thread under today, it's not dramatic, and I'll only pull it once: a lot of the action right now is in the layers around the model, not the model itself. It's the protocol an agent talks over and the computer it runs on. It's the license on the weights you fine-tuned, the kernel under the matrix multiply, and the triage workflow that decides which of five hundred pull requests you ever see.

00:20:41 The models keep getting better — that's true, and it's not the news. The leverage this week was in the supporting layers people are building so those models can actually do work. And in one developer answering a corporate takedown by quoting Galileo and standing up a mirror in Germany.

00:20:56 The one I'll be chasing tomorrow is the CODA result — whether anyone reproduces model-written kernels hitting expert performance on a second piece of hardware. If that holds, the list of expert-only crafts just got one shorter, and that's a change you'd feel. — Lenar.