◆ Dispatch 034 · 2026-05-23 GCU Stay The One Who Decides

Fast models, slow developers — and the part of the job that stays yours

2026-05-23 / 00:21:39 / 10 sources

“When the machine gets fast and capable and cheap, the only job that stays yours is being the one who decides.”
— Lenar Kess, today's narration

A Saturday episode about what your job becomes when the model writes the code, and writes it fast. The bottleneck moved from typing to deciding, and a surprising number of this week's stories land on the same instruction: stay the one who decides. Plus a price floor, a reclassification, a year of bold predictions, and a five-year-old gaming card that won't quit.

"I don't write code anymore" — Pieter Levels, amplified by Marc Andreessen, and the real-thing/bubble-thing tangle inside it.
Fast Models Need Slow Developers — Sarah Chieng of Cerebras on Codex Spark at 1,200 tokens a second, and why the discipline matters more, not less.
DeepSeek's permanent 75% cut and NVIDIA folding gaming into "Edge Computing" — two ends of the same pipe.
Jack Clark's year of predictions at Oxford — and the cognitive-atrophy counterpoint.
BeeLlama's DFlash update — 164 tokens a second on a single RTX 3090.
Lobster Trap — Sally Ann O'Malley of Red Hat on containerizing an OpenClaw agent setup.
How the rest of the world sees this — and a couple overheard in a Copenhagen park.

Chapters

00:00:04 Six months since he wrote code
00:02:05 Fast models, slow developers
00:06:40 Two ends of the same pipe
00:09:57 Jack Clark's year of predictions
00:13:46 164 tokens a second on a 3090
00:16:32 Containerizing the agent
00:18:42 How the rest of the world sees this

Sources

10 cited

1
"I don't write code anymore"

X @levelsio — Pieter Levels — indie maker behind a string of small one-person products (Nomad List, PhotoAI); known for shipping solo

I don't write code anymore. I haven't written code in I think 6 months? I think everyone is like this no?
x.com/levelsio/status/2058116725929828722 →

Context
A concrete data point on how far solo-maker workflows have moved toward agent-driven building — and a useful test of where that generalizes and where it doesn't.
Key points
Levels says he hasn't written code in roughly six months, working entirely through Claude Code
His setup is browser tabs into his own sites on a VPS, synced to his phone, agent open to fix or build
Claims 'everyone is like this' — the contested part
Marc Andreessen amplified it to a large audience with a one-word comment: 'Interesting.'
Engagement
1233 likes · 102 retweets

Provenance
Tweet · Primary source
2
Andreessen reposts the levelsio coding setup

X @pmarca — Marc Andreessen, co-founder of Andreessen Horowitz

Interesting.
x.com/pmarca/status/2058144277340049588 →

Context
Shows how the solo-maker workflow narrative travels — and gets flattened into a slogan — once a large account amplifies it.
Key points
Andreessen quote-tweeted Levels's screenshot of an all-day VPS + Claude Code setup
His entire comment was one word
The amplification is what put the 'I don't write code anymore' line in front of hundreds of thousands
Provenance
Tweet · Primary source
3
Fast Models Need Slow Developers — Sarah Chieng, Cerebras

Video Sarah Chieng (Cerebras), via AI Engineer — Sarah Chieng, head of developer experience at Cerebras

A lot of these bad habits that we had before that were generating maybe 50 tokens per second of bad code — unless we fix them, they're going to start generating 1,200 tokens per second of bad code.
www.youtube.com/watch?v=TeGsFFNqRLA →

Context
The sharpest articulation of how fast generation changes the engineer's job — the bottleneck moves from typing to deciding, and the discipline matters more, not less.
Key points
Codex Spark (Cerebras + OpenAI) generates code at ~1,200 tokens/sec vs 40-60 for Sonnet/Opus — about 20x faster
Slow-era bad habits — giant one-shot prompts, huge commits, unverified agent swarms — produce bad code 20x faster now
Validation becomes 'basically free': lint, pre-commit hooks, diff reviews, browser tests at every step
Orchestrate by model strength: big model plans, fast model executes; capture good sessions as reusable skills
Cherry-pick across many generated variants to 'artificially induce taste'; stay the decision-maker; externalize agent memory into plain files because compaction now arrives in ~30 seconds
Provenance
Video · Supporting source
4
DeepSeek cuts V4-Pro prices by 75%

Article The Next Web

DeepSeek is making its 75% API discount permanent.
thenextweb.com/news/deepseek-v4-pro-price-c… →

Context
Frontier-class inference is racing to a price floor; the cache-hit discount specifically rewards agent loops that re-send stable context every step.
Key points
The 75% V4-Pro discount, framed as a promotion in April, is now permanent; the new rate takes effect when the promotion ends at the end of May
New rate is roughly one quarter of the original: about $0.44 per million input tokens and $0.87 per million output tokens, per secondary reporting
Paired with a ~90% cut to input cache-hit costs across the API
Reported framing: targeting developers frustrated with Western providers' rate limits and restrictions
Provenance
Article · Supporting source
5
NVIDIA Removes Gaming Revenue Category From Financial Reports

Article Hilbert Hagedoorn — Guru3D

NVIDIA is signaling a broader strategic shift toward accelerated computing and AI-driven markets.
www.guru3d.com/story/nvidia-removes-gaming-… →

Context
The company that built itself on GeForce now treats consumer gaming as a footnote next to data-center demand — a clean read on where the money sits.
Key points
NVIDIA folded standalone 'Gaming' into a broader 'Edge Computing' category in its fiscal 2027 Q1 report
Edge Computing — GeForce, AI PCs, workstation, consoles, robotics, networking, automotive — was ~$6.4B for the quarter
Total quarterly revenue ~$81.6B, up 85% year over year, driven by data center
NVIDIA says it is not exiting gaming hardware; RTX cards keep shipping — but gaming is no longer the headline story
Provenance
Article · Supporting source
6
AI will help make a Nobel prize-winning discovery within a year, says Anthropic co-founder

Article Robert Booth (The Guardian) — Reporting on Jack Clark, Anthropic co-founder and author of the Import AI newsletter, speaking at Oxford

If we stand by and let synthetic intelligence multiply, then we'll eventually be forced into reactivity.
www.theguardian.com/technology/2026/may/21/… →

Context
A sober forecaster's calibrated bets are worth engaging with, and the falsifiable ones (AI-run revenue, self-designed successors) are the ones to actually hold him to.
Key points
Clark's spread of predictions: AI-assisted Nobel discovery within 12 months; bipedal robots helping tradespeople in 2 years; AI-run companies making millions within 18 months; AI designing its own successors by end of 2028
Still flags a 'non-zero chance of killing everyone on the planet' and says that risk hasn't gone away
Notes Anthropic's Mythos model proved 'alarmingly capable at exploiting cybersecurity weaknesses'
Says he'd prefer to slow down 'to give ourselves more time as a species' but expects competition to prevent it
Co-host Edward Harcourt (Oxford Institute for Ethics in AI) warned of 'cognitive atrophy' and argued for 'Socratic' AI that makes humans do more thinking
Provenance
Article · Supporting source
7
BeeLlama v0.2.0 — major DFlash update on a single RTX 3090

Source Anbeeld (r/LocalLLaMA)

Squeezing that 3090 like a lemon.
www.reddit.com/r/LocalLLaMA/comments/1tkpz2… →

Context
A 4-5x speedup on a five-year-old consumer card is the difference between local models being a toy and being usable in an agent loop you run on hardware you own.
Key points
On a single RTX 3090 (24GB): Qwen 3.6 27B up to 164 tokens/sec (~4.4x llama.cpp baseline); Gemma 4 31B up to 177.8 tokens/sec (~4.9x)
Mechanism is speculative decoding (DFlash): a small draft model proposes tokens, the target verifies in parallel
Update adds lower draft overhead, draft K/V projection caching, stricter draft/target validation, safer fallback to full logits
Prompt processing stays near baseline; speedup depends on acceptance rate and is workload-dependent
Top comment poses the real test: does it hold for 200K-token agentic coding chats?
Provenance
Source · Background source
8
Lobster Trap: OpenClaw in Containers from Local to K8s and Back — Sally Ann O'Malley, Red Hat

Video Sally Ann O'Malley (Red Hat), via AI Engineer — Red Hat engineer presenting at the AI Engineer conference

Sharing a good agent setup usually means handing someone a pile of markdown, config files, and YAML and hoping they reproduce what you have.
www.youtube.com/watch?v=F1DYkY1BlfM →

Context
Signals the agent maturing from a personal dotfile into a versioned, deployable artifact — provisioned and governed like any other piece of the stack.
Key points
Packages an OpenClaw agent setup as a container image so a personal config becomes a reproducible team baseline
Podman locally spins up a sub-agent in ~2 seconds; flip a flag to run the same image on Kubernetes
Secrets handled in two layers: Podman secrets for host API keys, OpenClaw secret references inside the agent
The constraint it solves is reproducibility — config drift is why a teammate's agent behaves differently
Provenance
Video · Supporting source
9
Overhearing "world models and grounded video gen" in a Copenhagen park

X @niloofar_mire — AI researcher

I overheard the couple next to me talking about world models and grounded video gen LOL
x.com/niloofar_mire/status/2058148404673331… →

Context
A light counterpoint to the bubble — even the park bench is inside it now — that sets up how differently people outside tech experience all this.
Key points
Burnt out, the researcher flew to Copenhagen to detach
Sitting in a random park, overheard strangers discussing world models and grounded video generation
A small, funny marker of how pervasive the AI conversation has become
Provenance
Tweet · Primary source
10
Is AI viewed as "evil" in non-tech communities?

Source Due_Drummer5147 (r/singularity)

For a lot of people, there's limited upsides to AI right now... right now it's not serving most people and in many cases it's causing harm.
www.reddit.com/r/singularity/comments/1tl68… →

Context
The view from outside the bubble, stated without strawmanning — a reminder that most people experience AI as something done to them, not a tool they wield.
Key points
A data engineer asks for a reality check after a hostile reaction to suggesting AI to non-tech people
Top reply (hundreds of upvotes): people see AI shoved into everything by billionaires 'siphoning the planet's energy and water', livelihoods lost, creatives first
Concedes medicine/math strides but says it isn't serving most people and often causes harm
Another commenter notes even r/technology skews anti-AI
Provenance
Source · Background source

00:00:04

Six months since he wrote code

00:00:04 Pieter Levels — you might know him as levelsio, the indie maker behind a string of small, profitable, one-person products — posted three lines yesterday that Marc Andreessen reposted to a few hundred thousand people. Here they are, in full: I don't write code anymore.

00:00:21 I haven't written code in I think six months? I think everyone is like this, no? And in a companion post he showed the setup: his laptop and his iPhone all day, just browser tabs into his own sites. Everything runs on a virtual private server, synced to his phone, with Claude Code open to fix things or build new ones.

00:00:42 He works from anywhere. Andreessen's entire comment on this was one word: Interesting. Two things are tangled together here, one real and one bubble, and they're worth pulling apart. The real one: for what Levels actually does — small products, he owns every line, he knows exactly what he wants — I believe him completely.

00:01:03 That's the work where a capable agent takes over the typing, and you spend your day describing and checking instead of writing. I'm not skeptical of that at all. The bubble one is that last line. I think everyone is like this, no? No. Not yet, and maybe not ever for a lot of work.

00:01:22 Anyone who's shipped a big system, with a decade of history, a compliance surface, and forty other engineers touching the same files, knows that 'just let Claude Code fix it' isn't how Tuesday goes. The distance between Levels's Tuesday and that Tuesday is the whole interesting question.

00:01:41 And the question I'd actually ask isn't whether you still type code. It's what your job turns into when the model writes it — and writes it fast. Because if you take Levels seriously, the bottleneck stopped being your fingers a while ago. It moved somewhere else.

00:01:58 There was a talk this week that's the sharpest thing I've heard on exactly where it moved, so let me start there.

00:02:05

Fast models, slow developers

00:02:05 The talk is called Fast Models Need Slow Developers. It's by Sarah Chieng, who runs developer experience at Cerebras — the company building those wafer-scale chips. About a month ago Cerebras and OpenAI shipped a model called Codex Spark, and the headline number is the whole reason for the talk.

00:02:24 Codex Spark generates code at about 1,200 tokens a second. For comparison, she puts the Claude Sonnet and Opus families at 40 to 60 tokens per second. So this is roughly 20 times faster than even the top of that range. She mentions they had to change the y-axis on the chart just to fit it. Her thesis is simple, and it reframes the whole speed race.
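The arithmetic behind "roughly 20 times", using only the figures quoted in the talk. Against the slow end of her 40 to 60 range the gap is closer to 30x; the 1,000-token task size is an arbitrary illustration, not a number from the talk.

```python
# Throughput figures as quoted in the talk; the task size below is illustrative only.
SPARK_TPS = 1_200            # Codex Spark, tokens per second
SLOW_TPS_RANGE = (40, 60)    # Sonnet/Opus range, as the speaker states it


def speedup(fast_tps: float, slow_tps: float) -> float:
    """How many times faster the fast model emits tokens."""
    return fast_tps / slow_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Wall-clock seconds to emit `tokens` at a steady rate (ignores time to first token)."""
    return tokens / tps


low = speedup(SPARK_TPS, max(SLOW_TPS_RANGE))   # against 60 tok/s
high = speedup(SPARK_TPS, min(SLOW_TPS_RANGE))  # against 40 tok/s
print(f"speedup: {low:.0f}x to {high:.0f}x")    # speedup: 20x to 30x

TASK_TOKENS = 1_000  # hypothetical task size
for label, tps in [("40 tok/s", 40), ("60 tok/s", 60), ("Spark", SPARK_TPS)]:
    print(f"{label:>9}: {seconds_for(TASK_TOKENS, tps):5.1f} s")
```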

00:02:44 In her words: a lot of these bad habits that we had before that were generating maybe 50 tokens per second of bad code — unless we fix them, they're going to start generating 1,200 tokens per second of bad code. So what are the bad habits? She lists the ones we all picked up because generation was slow.

00:03:04 Writing one giant prompt and trying to one-shot the whole thing. Making huge commits. Running ten agents on screen at once, all of them spinning, thinking. And then there's the social-media flex — somebody running six terminals at once, a 500-agent coding swarm, or eight agents across five screens.

00:03:23 Her verdict on all of it: the reality of what is happening in all these setups is that we're generating massive amounts of code that nobody is verifying. The old slow workflow, she says, was — you spawn a session, you go get a hamburger, you scroll Twitter, and you come back.

00:03:40 That was fine when the model was slow and you were waiting anyway. It isn't fine when the model spits out a thousand lines while you're picking toppings. Here's where it flips the hype on its head, and it's why I wanted to lead with it. Her playbook for fast models is mostly about slowing the human down and being more deliberate, not less.

00:04:02 A few of the concrete moves. First, validation becomes free, so you do it everywhere. Her line: at 1,200 tokens per second, a model like Codex Spark makes validation basically free. There is no excuse and no reason why you should not be doing things like this. Linting, pre-commit hooks, diff reviews, and browser-based testing — baked into every step instead of saved for the end, because each check now costs you almost nothing in wall-clock time.
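A minimal sketch of "validation at every step", assuming each check is simply a command that exits 0 on success. The check list is hypothetical: in a real repo you would put your own linter, test runner, or pre-commit invocation there. The demo commands use the current Python interpreter so the sketch runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    ok: bool
    output: str


def run_checks(checks: list[tuple[str, list[str]]]) -> list[CheckResult]:
    """Run each check command in order and stop at the first failure.

    Intended to be called after every agent step rather than once at the end,
    since each check costs little wall-clock time next to a fast model.
    """
    results = []
    for name, argv in checks:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append(CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr))
        if proc.returncode != 0:
            break  # don't let the agent build on top of a failing step
    return results


# Hypothetical stand-ins for lint / tests / browser checks; swap in real tools.
demo_checks = [
    ("lint", [sys.executable, "-c", "print('lint ok')"]),
    ("tests", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("browser", [sys.executable, "-c", "print('never reached')"]),
]

results = run_checks(demo_checks)
for r in results:
    print(r.name, "PASS" if r.ok else "FAIL")
```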

00:04:31 Second, orchestrate by model strength. Use a big, smart model for planning and long-horizon reasoning, and the fast model as the executor. When a session goes well, capture it as a reusable skill, so the fast agent can repeat a known-good workflow instead of improvising it again.
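One way to picture the split, as a sketch: a planner callable (standing in for the big model) turns a goal into steps, an executor callable (standing in for the fast model) runs each step, and a session that finishes cleanly is stored as a named "skill" that later runs replay. The planner, executor, and skill store are hypothetical stubs, not any vendor's API.

```python
from typing import Callable

Planner = Callable[[str], list[str]]   # big model: goal -> ordered steps
Executor = Callable[[str], bool]       # fast model: step -> succeeded?

skills: dict[str, list[str]] = {}      # hypothetical store of known-good workflows


def run_session(goal: str, plan: Planner, execute: Executor) -> bool:
    # Replay a captured skill if one exists; otherwise ask the planner.
    steps = skills.get(goal) or plan(goal)
    for step in steps:
        if not execute(step):
            return False               # stop and hand control back to the human
    skills[goal] = steps               # capture the good session for next time
    return True


# Stubs for illustration only.
calls = {"plan": 0}


def toy_planner(goal: str) -> list[str]:
    calls["plan"] += 1
    return [f"{goal}: write test", f"{goal}: implement", f"{goal}: run checks"]


def toy_executor(step: str) -> bool:
    return True


run_session("add login form", toy_planner, toy_executor)
run_session("add login form", toy_planner, toy_executor)
print(calls["plan"])  # 1: the second run replayed the stored skill
```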

00:04:49 Third, and this one's fun, cherry-pick. Her words: I can have it generate 15 versions in the same time that it would have taken the previous model to generate one version, and I can cherry-pick the version that I like the best. Spin up five sub-agents, get 75 versions, and keep the best one.
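The cherry-pick loop itself is small. In this sketch `generate_variant` and the `score` function are hypothetical stand-ins: in practice the generator is a fast-model call and the scorer is you, or a rubric you trust, which is where the "taste" actually lives. The counts mirror her example of five sub-agents producing 15 versions each.

```python
import random


def generate_variant(agent_id: int, n: int, rng: random.Random) -> dict:
    # Hypothetical stand-in for one fast-model generation.
    return {"agent": agent_id, "n": n, "quality": rng.random()}


def cherry_pick(num_agents: int, per_agent: int, score, seed: int = 0):
    rng = random.Random(seed)  # seeded so the demo is repeatable
    variants = [generate_variant(a, n, rng)
                for a in range(num_agents) for n in range(per_agent)]
    best = max(variants, key=score)  # the human's (or rubric's) judgment lives in `score`
    return variants, best


variants, best = cherry_pick(num_agents=5, per_agent=15, score=lambda v: v["quality"])
print(len(variants), "variants; keeping agent", best["agent"], "version", best["n"])
```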

00:05:08 She calls it a way to artificially induce taste — useful when you want variety, like design directions or architecture options. Fourth, stay in the driver's seat. The AI should always be helping you make decisions, not the other way around. Sit with two or three sessions and actually steer them — ban the model from deleting files, cap the diff size, and tell it exactly what to touch: only change this, leave the types alone for now.
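Guardrails like "don't delete files", "cap the diff", and "only touch what I named" can also be enforced mechanically before a change lands. A minimal sketch that inspects a `git diff`-style unified diff; the 200-line cap and the allowed path prefixes are hypothetical values you would tune, and the header check is deliberately simple.

```python
def check_diff(diff_text: str, allowed_prefixes: tuple[str, ...],
               max_changed: int = 200) -> list[str]:
    """Return policy violations for a unified diff (an empty list means OK)."""
    problems = []
    changed = 0
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            path = line.split(" b/", 1)[-1]  # "diff --git a/x b/x" -> "x"
            if not path.startswith(allowed_prefixes):
                problems.append(f"touches disallowed path: {path}")
        elif line.startswith("deleted file mode"):
            problems.append("deletes a file")
        elif line.startswith(("+++", "---")):
            continue  # per-file headers, not content lines
        elif line.startswith(("+", "-")):
            changed += 1
    if changed > max_changed:
        problems.append(f"diff too large: {changed} changed lines > {max_changed}")
    return problems


SAMPLE = """\
diff --git a/src/app.py b/src/app.py
--- a/src/app.py
+++ b/src/app.py
@@ -1 +1 @@
-x = 1
+x = 2
diff --git a/types.py b/types.py
deleted file mode 100644
--- a/types.py
+++ /dev/null
@@ -1 +0,0 @@
-T = int
"""

print(check_diff(SAMPLE, allowed_prefixes=("src/",)))
```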

00:05:35 And fifth, context management gets more urgent, not less. Her math: if it used to take ten minutes to fill your context window, divide by twenty — now you're hitting compaction in 30 seconds, and compaction is where the agent loses track of what it knew. Her words: you can't get away with sloppy practices anymore.
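Her compaction math, spelled out with only the inputs she gives (ten minutes to fill the window before, a roughly 20x faster model now). The function scales fill time by the speed ratio, which assumes context grows in proportion to generation speed; the per-hour figure assumes nonstop generation.

```python
def seconds_to_compaction(old_fill_seconds: float, speedup: float) -> float:
    """Time until the context window fills if tokens now arrive `speedup` times faster.

    Simplification: assumes context growth scales linearly with generation speed.
    """
    return old_fill_seconds / speedup


fill = seconds_to_compaction(10 * 60, 20)  # her numbers: ten minutes, ~20x faster
print(fill)                                # 30.0
print(3600 / fill)                         # 120.0 compactions per hour of nonstop generation
```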

00:05:56 Her fix is to externalize the agent's memory into a handful of plain files: one defines the sub-agents, one holds the full plan as a checklist, one tracks what's done and what's next, and one handles verification. A fresh session always knows where to pick up. What I take from all of it is this.

00:06:15 The flex is the swarm, but the craft is the discipline. Speed doesn't remove the engineer from the loop; it punishes you faster when you remove yourself. The bottleneck moved from typing to deciding — which is exactly what Levels was describing, just from the other end.

00:06:32 And if deciding is the job now, then the worst move you can make with a 20-times-faster model is to stop paying attention.
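The plain-file memory she describes can be sketched in a few lines. The talk names four files (sub-agents, plan, progress, verification) but not their format, so the file name and the markdown checklist syntax below are hypothetical choices. The point is that a fresh session only has to read the file to know where to pick up.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical checklist syntax: "- [ ] task" (open) and "- [x] task" (done).
ITEM = re.compile(r"^\s*- \[( |x)\] (.+)$")


def next_task(plan_text: str) -> Optional[str]:
    """Return the first unchecked checklist item, or None when everything is done."""
    for line in plan_text.splitlines():
        m = ITEM.match(line)
        if m and m.group(1) == " ":
            return m.group(2)
    return None


def mark_done(path: Path, task: str) -> None:
    """Tick off `task` in the file so the next (fresh) session sees the progress."""
    text = path.read_text()
    path.write_text(text.replace(f"- [ ] {task}", f"- [x] {task}", 1))


with tempfile.TemporaryDirectory() as d:
    plan = Path(d) / "PLAN.md"  # hypothetical file name
    plan.write_text("- [x] scaffold API\n- [ ] add auth\n- [ ] write tests\n")
    first = next_task(plan.read_text())  # what a fresh session would pick up
    mark_done(plan, first)
    second = next_task(plan.read_text())

print(first, "->", second)  # add auth -> write tests
```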

00:06:40

Two ends of the same pipe

00:06:40 Two money stories landed this week that are really the two ends of the same pipe. Start with the cheap end. DeepSeek announced that the 75% discount it had been running on its V4-Pro model is now permanent. Back in April they framed it as a promotion; as of yesterday, the promo just becomes the price.

00:06:59 After the promotional period ends at the end of May, V4-Pro lands at one quarter of its original rate. By the secondary reporting I've seen, that's somewhere around 44 cents per million input tokens and 87 cents per million output tokens, which puts a frontier-class model among the cheapest you can call.
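The reported numbers, worked through. Both per-million rates are the secondary-reporting figures mentioned above; the "original" rates are just those divided by 0.25 (a 75% discount leaves a quarter of the price), and the 10M-input / 2M-output workload is hypothetical.

```python
NEW_INPUT_PER_M = 0.44   # USD per million input tokens (secondary reporting)
NEW_OUTPUT_PER_M = 0.87  # USD per million output tokens (secondary reporting)
DISCOUNT = 0.75          # permanent discount; new price = 25% of the original


def implied_original(new_price: float, discount: float = DISCOUNT) -> float:
    """Undo the discount: original = new / (1 - discount)."""
    return new_price / (1 - discount)


def workload_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost at the new rates (ignores cache-hit pricing)."""
    return (input_tokens / 1e6) * NEW_INPUT_PER_M + (output_tokens / 1e6) * NEW_OUTPUT_PER_M


print(f"implied original: ${implied_original(NEW_INPUT_PER_M):.2f} in, "
      f"${implied_original(NEW_OUTPUT_PER_M):.2f} out per million")  # $1.76 in, $3.48 out
print(f"10M in + 2M out: ${workload_cost(10_000_000, 2_000_000):.2f}")  # $6.14
```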

00:07:18 They paired it with a roughly 90% cut to the cost of input cache hits across the whole API. That cache-hit detail is the one I'd flag for anyone running agents. If your workload re-sends a big stable chunk of context on every step — a system prompt, a codebase, or a long set of instructions — the cache discount is where the real savings live.
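Why the cache-hit cut matters more than the headline rate for agent loops, as a toy cost model. The loop shape and both cache-hit prices are hypothetical placeholders (the episode gives no cache-hit rate); the miss and output rates reuse the reported V4-Pro figures. The model assumes the stable prefix misses the cache on step one and hits on every later step, which real providers only approximate.

```python
def loop_cost(steps, stable_tokens, fresh_tokens, output_tokens,
              miss_per_m, hit_per_m, output_per_m):
    """Total USD cost of an agent loop that re-sends the same prefix every step.

    Assumes the stable prefix is a cache miss on step 1 and a cache hit on
    every later step; fresh input is always a miss.
    """
    m = 1_000_000
    prefix_first = stable_tokens * miss_per_m / m
    prefix_later = (steps - 1) * stable_tokens * hit_per_m / m
    fresh = steps * fresh_tokens * miss_per_m / m
    output = steps * output_tokens * output_per_m / m
    return prefix_first + prefix_later + fresh + output


# Hypothetical loop shape: 50 steps, 60K-token stable prefix, 2K new tokens in, 1K out.
shape = dict(steps=50, stable_tokens=60_000, fresh_tokens=2_000, output_tokens=1_000)
before = loop_cost(**shape, miss_per_m=0.44, hit_per_m=0.04, output_per_m=0.87)
after = loop_cost(**shape, miss_per_m=0.44, hit_per_m=0.004, output_per_m=0.87)  # hits 90% cheaper

repeated_share = (shape["steps"] - 1) * shape["stable_tokens"] / (
    shape["steps"] * (shape["stable_tokens"] + shape["fresh_tokens"]))
print(f"repeated prefix share of input: {repeated_share:.0%}")
print(f"loop cost before: ${before:.4f}, after: ${after:.4f}")
```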

00:07:40 That repeated context is most of what you pay for in an agent loop, so a 90% cut there is a different cost model, not a coupon. Now, the framing around this, reported by a few outlets, is that DeepSeek is going after developers frustrated with Western providers — the rate limits, the restrictions.

00:07:59 I'd take that as the marketing narrative and hold it loosely. What's verifiable, and what actually matters to you, is the price and the cache economics. The caveat that doesn't go away: where the model runs and who sees your data is a live governance question for a lot of teams, and a low price doesn't answer it.

00:08:18 The other end of the pipe is NVIDIA. In its most recent quarter — the first quarter of fiscal 2027 — NVIDIA reported about 81.6 billion dollars in revenue, up 85% from a year earlier. And folded into the reporting structure was a change: they took Gaming, which had always been its own category, and merged it into a broader bucket they're calling Edge Computing.

00:08:41 That new bucket — which now lumps together GeForce gaming cards, AI PCs, workstation parts, consoles, robotics, networking, and automotive — came to about 6.4 billion for the quarter. Think about that. Gaming built this company. GeForce is why most of us first knew the name.

00:08:58 And it's now a line item inside a grab-bag category, because next to the data-center business it's a rounding error. NVIDIA was careful to say they're not walking away from gaming hardware, and I believe them — RTX cards keep shipping. But they're no longer telling the gaming story to investors.

00:09:17 They're telling the accelerated-computing story, full stop. Put the two together and you get a clean picture of where the money sits. The shovels — the data-center silicon — are where the growth is, to the point that the consumer line gets absorbed into a footnote.

00:09:34 And the inference built on those shovels is racing toward a price floor, with DeepSeek dragging frontier-class output down to cents. For a builder, that second trend is the one you can actually spend. Running agent loops at volume gets cheaper every quarter, and the cache economics reward you for keeping your context stable instead of rebuilding it on every call.
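A quick check on the NVIDIA figures reported in this segment (all rounded): the Edge Computing bucket as a share of the quarter's total, and the year-earlier quarter implied by 85% growth. Gaming is only one part of that bucket, so its own share is smaller still.

```python
total_q1_fy27 = 81.6   # USD billions, reported quarterly revenue
edge_computing = 6.4   # USD billions, reported; gaming is only part of this bucket
growth = 0.85          # reported year-over-year growth

edge_share = edge_computing / total_q1_fy27
implied_prior_year = total_q1_fy27 / (1 + growth)

print(f"Edge Computing share of revenue: {edge_share:.1%}")         # 7.8%
print(f"Implied year-earlier quarter: ${implied_prior_year:.1f}B")  # $44.1B
```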

00:09:57

Jack Clark's year of predictions

00:09:57 On Wednesday, Jack Clark gave a lecture at Oxford. Clark is a co-founder of Anthropic and the person behind Import AI, which is one of the more sober newsletters in this space. He's not a hype account, which is part of why the predictions he made are worth engaging with rather than waving off.

00:10:15 Here's the spread, as reported by the Guardian. Within 12 months, he expects an AI system to work with humans on a Nobel-prize-winning discovery. Two years out, bipedal robots helping tradespeople. Inside 18 months, companies run solely by AIs and generating millions of dollars in revenue.

00:10:33 And by the end of 2028, AI systems able to design their own successors. He described a vertiginous sense of progress. He also didn't soft-pedal the other side. He said there remain plausible scenarios where the technology has, his words, a non-zero chance of killing everyone on the planet, and that it's important to clearly state that that risk hasn't gone away.

00:10:56 He mentioned that Anthropic recently launched a model called Mythos that proved, quote, alarmingly capable at exploiting cybersecurity weaknesses. And he said he'd prefer to slow the whole thing down to give ourselves more time as a species — but that it won't happen.

00:11:13 Too many actors and countries are locked in competition, and commercial and geopolitical rivalry drowns out the existential questions. Let me steelman the predictions and then push on them. The strongest case for Clark is that he's been roughly directionally right before, and he's framing these as a spread of bets, not a single prophecy.

00:11:34 Some will land, some won't, and the point is the slope. Where I get skeptical is that the predictions aren't all the same kind of claim, and they're not equally falsifiable. 'AI helps make a Nobel-winning discovery in 12 months' is the soft one: you could declare that true after the fact for almost any prize-adjacent result where a researcher used a model somewhere in the pipeline.

00:11:58 It's the kind of prediction that's hard to lose. The ones that count, the ones I'd actually hold him to, are the falsifiable ones: a company run end-to-end by AI, with audited books, clearing serious revenue inside 18 months; and AI systems designing their successors by 2028.

00:12:16 Those you can check. If a company actually run by AIs is posting audited revenue a year and a half from now, that changes my picture of the world. I don't expect it on that timeline. I'd love to be wrong. But the moment from that Oxford event that stuck with me wasn't Clark.

00:12:33 It was the counterpoint from the person who co-hosted him. Edward Harcourt, who directs Oxford's Institute for Ethics in AI, warned that AIs doing more and more for us risks what he called cognitive atrophy — a weakening of human judgment and decision-making. And he argued for a different kind of system, what's sometimes called Socratic AI: models built to make the human do more of the thinking, not less.

00:12:59 And this is where two of today's threads rhyme. Sarah Chieng, in a completely different room, talking about 1,200 tokens a second, landed on the same instruction: the AI should help you make decisions, not the other way around. Harcourt, worrying about a whole society, lands in the same place — build the tools so people keep deciding.

00:13:20 Same worry, two altitudes. The fast-model engineer and the ethics professor are both telling you not to hand over the part of the job that's actually yours. I don't know which of Clark's dates are right. My read is that the timelines are the least interesting part.

00:13:36 The interesting part is the instruction hiding underneath all of it, and it isn't about the models. It's about whether the humans stay in the loop on purpose.

00:13:46

164 tokens a second on a 3090

00:13:46 While the frontier labs are showing off 1,200 tokens a second on wafer-scale hardware, there's a parallel speed story happening on the gear in your closet. A developer who goes by Anbeeld shipped version 0.2.0 of a project called BeeLlama, and the numbers turned heads in the local-model crowd.

00:14:06 On a single RTX 3090, a five-year-old gaming card with 24 gigabytes of memory, they're getting Qwen 3.6, the 27-billion-parameter model, up to 164 tokens a second. That's about 4.4 times the baseline from llama.cpp, the standard local runtime. Gemma 4, at 31 billion parameters, hits about 178 per second, almost five times baseline.
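Working backward from the reported figures gives the llama.cpp baselines they imply. The multipliers in the post are rounded, so these baselines are approximate; they are derived here, not numbers the author published.

```python
# Reported BeeLlama figures on one RTX 3090: (tokens/sec, speedup over llama.cpp)
reported = {"Qwen 3.6 27B": (164.0, 4.4), "Gemma 4 31B": (177.8, 4.9)}

# Implied baseline = reported speed / reported multiplier (approximate: multipliers are rounded).
baselines = {model: tps / mult for model, (tps, mult) in reported.items()}

for model, (tps, _) in reported.items():
    print(f"{model}: ~{baselines[model]:.0f} tok/s baseline -> {tps:.0f} tok/s")
```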

00:14:28 And prompt processing speed stays near baseline, so you're not paying for the gain on the way in. The mechanism is speculative decoding, done carefully. The idea: a small, fast draft model guesses the next several tokens, and the big target model verifies that guess in parallel.

00:14:47 When the guess is right, you get several tokens for roughly the cost of one. This release is a pile of careful engineering on top of that idea: lower draft overhead, cached key-value projections from the draft model, and stricter checking that the draft and target actually agree.
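To make draft-then-verify concrete, here is a minimal greedy version of speculative decoding with toy integer "models". This is the generic technique, not BeeLlama's or DFlash's code: real engines score all draft positions in one batched forward pass of the target and use probabilistic acceptance when sampling, which this sketch simplifies to exact-match greedy checks. The key property it demonstrates is that the output is identical to plain decoding with the target alone, just produced in fewer target rounds.

```python
from typing import Callable, List, Tuple

# Toy "model": given the token sequence so far, return the next token (greedy).
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    """Plain autoregressive decoding: one model call per new token."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of target verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n:
        # 1. The cheap draft model guesses k tokens ahead.
        guess, ctx = [], list(out)
        for _ in range(k):
            tok = draft(ctx)
            guess.append(tok)
            ctx.append(tok)
        # 2. The target checks every guessed position. A real engine scores them
        #    all in one batched forward pass; this loop just simulates that pass.
        rounds += 1
        accepted = []
        for i in range(k):
            expected = target(out + guess[:i])
            accepted.append(expected)  # always keep the target's token
            if guess[i] != expected:
                break                  # first miss: discard the rest of the guess
        else:
            accepted.append(target(out + guess))  # all k matched: one bonus token
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n], rounds


def target(ctx: List[int]) -> int:
    return (sum(ctx) * 7 + len(ctx)) % 10


def draft(ctx: List[int]) -> int:
    # Agrees with the target except when the context length is a multiple of 5.
    return target(ctx) if len(ctx) % 5 else (target(ctx) + 1) % 10


tokens, rounds = speculative_decode(target, draft, [1, 2, 3], n=20)
assert tokens == greedy_decode(target, [1, 2, 3], 20)  # same output as plain decoding
print(f"20 tokens in {rounds} target rounds instead of 20")
```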

00:15:06 There's also a safer fallback to full computation when the grammar, the sampler, or a reasoning step needs it. The caveat with all speculative decoding is that the speedup depends on acceptance rate — how often the draft's guesses are right — and that's workload-dependent.

00:15:24 Unusual or heavily-reasoned outputs accept fewer guesses and fall back toward baseline. So that 164 is the good case, not a promise for every prompt. But the good case matters, because it's the difference between a local model being a toy and being something you'd actually put in an agent loop.
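The acceptance-rate caveat fits in one formula. Under the simplifying assumption used in the original speculative-decoding analysis (Leviathan et al., 2023) that each draft token is accepted independently with probability alpha, a round with k draft tokens yields on average (1 - alpha^(k+1)) / (1 - alpha) tokens per target pass. That is tokens per verification step, not wall-clock speedup, since it ignores the draft model's own cost; the alpha values below are illustrative, not BeeLlama measurements.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Mean tokens produced per target verification pass.

    Assumes each of the k draft tokens is accepted independently with
    probability `alpha`; the +1 in the exponent accounts for the token the
    target always contributes (a correction, or a bonus when all k match).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


# Illustrative acceptance rates: easy, typical-ish, and heavily reasoned output.
for alpha in (0.9, 0.7, 0.4):
    print(alpha, round(expected_tokens_per_pass(alpha, k=4), 2))
```

As acceptance falls, the expected yield slides toward one token per pass, which is the "fall back toward baseline" behavior described above.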

00:15:44 One commenter asked the right follow-up: can this hold up for 200-thousand-token agentic coding chats? That's the test that matters — long context, real work, on hardware you own. Another just said, squeezing that 3090 like a lemon, which is about right. I find this the more grounding of the two speed stories.

00:16:05 The Cerebras number is a thousand-plus tokens a second on a chip you rent from a data center. This one runs at 164 on a card you can buy used, running a model whose weights you hold. Both numbers hold up. But only one of them runs when the API is down, or when you flatly don't want your code leaving the building.

00:16:26 That gap has been the local-model crowd's whole bet, and every speedup like this one narrows it.

00:16:32

Containerizing the agent

00:16:32 Yesterday we spent time on OpenClaw — the open agent runtime and its firehose of pull requests. There was a talk this week that solves a problem anyone running one of these has hit. It's from Sally Ann O'Malley at Red Hat, and it's called Lobster Trap: OpenClaw in Containers, from local to Kubernetes and back.

00:16:51 The problem she opens with is one you'll recognize. Sharing a good agent setup usually means handing a coworker a pile of markdown files, config, and YAML, and hoping they reproduce what you have. Anyone who has tried this knows how it ends — their version drifts from yours, their agent behaves differently, and you can't tell why, because the setup was never a single thing you could pin down.

00:17:15 Her answer is to make the agent setup a container image. On your laptop, Podman spins up a sub-agent in about two seconds from a single command. Flip a flag and the same image runs on Kubernetes for the team — the same image the whole way through. Your personal config stops being a dotfile you message around and becomes the team baseline that everyone runs identically.

00:17:37 Secrets get handled in two layers: Podman secrets for the API keys on the host, and OpenClaw's own secret references inside the agent. This is the maturation move, and it's the kind of thing that tells you a tool is growing up. The agent's configuration stops being personal folklore and becomes a versioned, deployable artifact.
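A sketch of the split between "reference in the config" and "value on the host", assuming Podman's default of exposing a secret passed with `podman run --secret NAME` as a file at `/run/secrets/NAME` inside the container. The `secret:NAME` syntax is invented for illustration and is not how OpenClaw spells its secret references; the point is only that the image and the shared config carry a name, never the key.

```python
import tempfile
from pathlib import Path

SECRET_PREFIX = "secret:"  # hypothetical reference syntax, NOT OpenClaw's real format


def resolve_secrets(config: dict, secrets_dir: Path = Path("/run/secrets")) -> dict:
    """Return a copy of `config` where each "secret:NAME" value is replaced by the
    contents of secrets_dir/NAME (Podman's default secret mount location)."""
    resolved = {}
    for key, value in config.items():
        if isinstance(value, str) and value.startswith(SECRET_PREFIX):
            name = value[len(SECRET_PREFIX):]
            resolved[key] = (secrets_dir / name).read_text().rstrip("\n")
        else:
            resolved[key] = value
    return resolved


# The shared config only ever holds the reference, never the key itself.
agent_config = {"model": "example-model", "api_key": "secret:llm_api_key"}

with tempfile.TemporaryDirectory() as d:
    (Path(d) / "llm_api_key").write_text("sk-test-not-real\n")  # stands in for a Podman secret
    runtime_config = resolve_secrets(agent_config, Path(d))

print(agent_config["api_key"], "->", runtime_config["api_key"])
```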

00:17:57 The constraint it actually solves is reproducibility, which sounds dull right up until your teammate's agent does something yours never would, and you've spent an afternoon discovering it was a config file that diverged three weeks ago. It connects to the thread running through this whole episode.

00:18:15 The agent is turning into something you provision and govern, not a clever script you ran once. And the people thinking hardest about agentic coding right now are spending a surprising amount of energy on the parts nobody demos: how you share a setup, how you verify the output, and how you keep the human in command.

00:18:35 The capability is mostly here. The work that's left is making it reproducible, checkable, and safe to hand to someone else.
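Reproducibility is easy to check mechanically. A minimal sketch: hash every file in an agent's config directory (relative paths plus contents, in sorted order) so two setups can be compared by one short fingerprint. This is a generic drift check, not part of the talk's tooling, and the file names are hypothetical; shipping the same files in one container image is what keeps the fingerprints equal in the first place.

```python
import hashlib
import tempfile
from pathlib import Path


def config_fingerprint(root: Path) -> str:
    """SHA-256 over relative paths and contents of every file under `root`."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h.update(path.relative_to(root).as_posix().encode())
        h.update(b"\0")
        h.update(path.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


with tempfile.TemporaryDirectory() as mine, tempfile.TemporaryDirectory() as theirs:
    for d in (mine, theirs):
        (Path(d) / "AGENTS.md").write_text("Never delete files.\n")  # hypothetical config file
        (Path(d) / "config.yaml").write_text("model: example\n")
    same_before = config_fingerprint(Path(mine)) == config_fingerprint(Path(theirs))
    # A teammate's copy quietly diverges.
    (Path(theirs) / "config.yaml").write_text("model: example\ntemperature: 1.0\n")
    same_after = config_fingerprint(Path(mine)) == config_fingerprint(Path(theirs))

print(same_before, same_after)  # True False
```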

00:18:42

How the rest of the world sees this

00:18:42 Let me end outside the bubble, because two small things this week reminded me how far inside it we are. The first is a post on the singularity subreddit. A data engineer — undergrad in computer science, master's in statistics — wrote in a little rattled. She'd suggested, in some non-tech forum, that someone just use AI to fix a broken bra-size calculator, the way any of us would reach for it without thinking.

00:19:07 The reaction was hostile enough that she came to ask, basically, for a reality check: is AI seen as evil out there? The top reply, with a few hundred upvotes, lands hard, because it's the view from outside and it's not stupid. Quote: For a lot of people, there's limited upsides to AI right now.

00:19:25 They see it being forcibly shoved into all their technology by billionaires who are siphoning the planet's energy and water to build massive and opaque supercomputing complexes in their backyards. They see people — particularly creative people but more and more everyone — getting pushed out of their livelihoods.

00:19:45 And then: while strides are being made in medicine and math, right now it's not serving most people, and in many cases it's causing harm. You can argue with pieces of that. But I don't think you get to dismiss it. From inside our world — where the conversation is 1,200 tokens a second, and AI-run companies in 18 months, and which agent harness has the better container story — it is easy to forget that for most people, AI is something happening to them, not something they wield.

00:20:14 They didn't get the fast model. They got the layoff, or the energy bill, or the slop in their feed. The second one is smaller, and I can't stop smiling at it. Niloofar, an AI researcher, posted that she'd been burnt out, flew to Copenhagen on short notice to detach, and was sitting in a random park taking a photo when she realized the couple next to her were deep in conversation about — her words — world models and grounded video gen.

00:20:41 Even the park bench in Copenhagen is in the bubble now. So here's the one thread I didn't expect to find this week. Three people who'd never be in the same room — Sarah Chieng at a developer conference, Edward Harcourt at an Oxford ethics lecture, and an anonymous Redditor explaining why normal people are wary — all landed within a few feet of the same idea.

00:21:03 When the machine gets fast and capable and cheap, the only job that stays yours is being the one who decides. Chieng says it to keep your code clean. Harcourt says it to keep your judgment intact. The Redditor says it as what people are afraid of losing. They're all right.

00:21:20 So make things that earn the upside for the person who isn't in the bubble. And whatever the model can do this year, stay the one deciding what's worth doing. — Lenar.