◆ Dispatch 009 · 2026-05-21

The Agent Needs a Computer

2026-05-21 / 00:14:21 / 19 sources

“Single-turn chat can get cheap while agent work still pays for state, tools, and retries.”
— Lenar Kess, today's narration


Chapters

00:00:00 Transcript

Sources

19 cited

1
Ethan Mollick on compute scarcity and agent economics

Thread Ethan Mollick — AI adoption researcher commenting on the cost split between chatbots and complex agentic workflows.

complex agentic workflows even as single-turn chatbots get cheaper
x.com/emollick/status/2057565824341127432

Context
It anchors the episode's main tension: agent capability is becoming an infrastructure and budget question, not only a model-quality question.
Key points
Compute scarcity may make long-running agent workflows much more expensive than simple chatbot turns.
Mollick argues that widely available chatbots and costly agent runs could split access to AI capability.
The thread adds that complex agent work can burn far more tokens than single-turn chat.
Provenance
Thread · Primary source
2
AI Agents Need Computers: Ivan Burazin, Daytona

Video Latent Space — Interview with Daytona co-founder Ivan Burazin about infrastructure for AI agent sandboxes and persistent agent computers.

composable computers for AI agents
www.youtube.com/watch?v=kaX43RRRUKY

Context
It supplies the concrete infrastructure layer underneath the compute-cost argument.
Key points
Daytona pivoted from human dev environments to agent sandboxes after builders said the old product did not fit agent workloads.
The company reports 74 percent month-over-month growth in a market Burazin frames as computers for non-human users.
Daytona emphasizes persistent state, disk, configurable compute, API control, and isolation rather than one-shot code execution.
Provenance
Video · Supporting source
3
Run long tasks in Codex using goals

Video OpenAI — Product release video for Codex Goal Mode across app, IDE extension, and CLI.

Use goal mode in the Codex app, IDE Extension, or CLI
www.youtube.com/watch?v=rgh0hMYPcd0

Context
It shows the product layer adapting to long-horizon agent work.
Key points
Goal Mode treats the objective as both the prompt and the completion condition.
OpenAI positions goals for long-running work with steering, side chats, pause and resume, and editable objectives.
The release moves Codex toward stateful task execution rather than only reactive chat.
Provenance
Video · Supporting source
4
Share Codex plugins with your team

Video OpenAI — Product video showing workspace plugin sharing and deep links for Codex plugins.

Teams can now distribute custom plugins
www.youtube.com/watch?v=msSa0tc2TbU

Context
It reframes agent workflows as shared team infrastructure that needs permissions and versioning.
Key points
Plugins can be shared with specific people or everyone in a workspace.
Shared plugins appear in a common directory and can be distributed with direct links.
The example plugin automates validation and refactoring before review.
Provenance
Video · Supporting source
5
Introducing Appshots in Codex

Video OpenAI — Product video about attaching a Mac app window to a Codex thread.

attach your app window to a Codex thread
www.youtube.com/watch?v=QKYbGCvNpFo

Context
It expands the agent's context from repository text to live desktop state.
Key points
Appshots let a user attach visible app context to a Codex thread from macOS.
The feature gives Codex both visual and app-context input from the user's workstation.
It sits beside OpenAI's remote-Mac capability in the broader move toward workstation-aware agents.
Provenance
Video · Supporting source
6
OpenAI Developers on Codex using a locked remote Mac

Thread OpenAI Developers — Official OpenAI developer account announcing remote Mac use for Codex.

your Mac doesn’t have to be unlocked
x.com/OpenAIDevs/status/2057536706778378692

Context
It moves Codex closer to operating on the user's actual computer, where permission boundaries are messier.
Key points
OpenAI says Codex can securely use apps on a Mac even when the screen is off and locked.
The feature targets remote and phone-driven Codex workflows.
The announcement raises practical questions about permissions, app sessions, and local-machine access.
Provenance
Thread · Primary source
7
Simon Willison releases Datasette Agent alpha

Thread Simon Willison — Creator of Datasette announcing an AI assistant for querying SQLite-backed data through Datasette.

a conversational AI assistant for Datasette
x.com/simonw/status/2057554315821371543

Context
It is a compact example of agents becoming native features inside specific software surfaces.
Key points
Datasette Agent can answer questions about data in SQLite databases.
The assistant can be extended with plugins for more tools and features.
The agent sits inside an existing data tool rather than a blank chat product.
Provenance
Thread · Primary source
8
Pi built into the Entire CLI

Thread Entire — Product account announcing native Pi integration in the Entire command-line interface.

connect directly to Entire checkpoints, commits, and session history
x.com/EntireHQ/status/2057568897868444011

Context
It supports the episode's argument that agents need tool-native state and history.
Key points
Pi moved from an external agent plugin to a native CLI integration.
The integration connects to checkpoints, commits, and session history.
It treats agent memory as part of the developer workflow rather than an external chat log.
Provenance
Thread · Primary source
9
Yohei Nakajima on Active Graph

Thread Yohei Nakajima — Builder describing an Active Graph agent architecture influenced by blackboard systems.

rollback, fork, diff agent runs
x.com/yoheinakajima/status/2057533315000017…

Context
It gives a vocabulary for versioned agent state beyond linear transcripts.
Key points
The post frames Active Graph around rollback, fork, diff, and behaviors that can write behaviors.
It invokes 1970s blackboard systems as an architectural ancestor.
The item is best treated as a concept signal rather than a proven shipped system.
Provenance
Thread · Primary source
10
François Chollet on apps and text boxes

Thread François Chollet — AI researcher commenting on the future of applications and user interfaces.

Apps become services and UIs become text boxes
x.com/fchollet/status/2057532308056604788

Context
It lets the hosts separate the visible text interface from the operational substrate behind it.
Key points
Chollet argues that app and user-interface concepts may dissolve into service-backed text boxes.
The episode uses this as a point of friction rather than accepting it wholesale.
The counterargument is that state, permissions, tools, and history still carry much of the work.
Provenance
Thread · Primary source
11
Ali Hatamizadeh announces Gated DeltaNet-2

Thread Ali Hatamizadeh — Researcher announcing a new paper on Gated DeltaNet-2 and linear attention.

Decoupling Erase and Write in Linear Attention
x.com/ahatamiz1/status/2057586630450610673

Context
It shows model architecture work responding to the same cost and memory pressure that agent products face.
Key points
Gated DeltaNet-2 is presented as a new recurrent or hybrid-attention architecture.
The announcement claims head-to-head wins over KDA and Mamba-3.
The hosts avoid treating the claims as independently verified.
Provenance
Thread · Primary source
12
Sebastian Raschka on Gated DeltaNet-2

Thread Sebastian Raschka — Machine-learning educator and author highlighting Gated DeltaNet-2 as a hybrid-attention paper to read.

one of my favorite hybrid attention newcomers
x.com/rasbt/status/2057599925878169761

Context
It helps justify a short architecture segment without overstating the result.
Key points
Raschka calls Gated DeltaNet-2 one of his favorite newcomers among hybrid-attention architectures.
His post gives external context that the paper is being watched by practitioners.
The episode uses this as a reading-list signal, not proof of performance.
Provenance
Thread · Primary source
13
Multi-Stream LLMs

Source Max Planck Institute for Intelligent Systems authors via HN summary — Research paper surfaced through Hacker News about separating prompts, thinking, and I/O streams.

parallelizing/separating prompts, thinking, I/O
arxiv.org/abs/2605.12460

Context
It supplies a model-architecture echo of the product-control-plane theme.
Key points
The HN summary describes parallel streams for prompts, thinking, and I/O.
The episode treats the idea as a research signal about token-stream structure.
The hosts connect it cautiously to product-level needs for side chats, state views, and tool lanes.
Provenance
Source · Background source
14
The Information report summary on OpenAI and Anthropic Q1 revenue

Article Techmeme summary of The Information — Aggregator entry summarizing reported financials for OpenAI and Anthropic.

OpenAI generated ~$5.7B in revenue in Q1
www.techmeme.com/260521/p35

Context
It places agent compute costs inside the business model rather than treating them as only a technical detail.
Key points
The summary says OpenAI generated about $5.7 billion in Q1 revenue, about $1 billion more than Anthropic.
It also says OpenAI's adjusted operating income margin was negative 122 percent.
The episode explicitly flags that it is relying on the summary, not the full article.
Provenance
Article · Supporting source
15
HedgieMarkets claim about Claude Code license costs

Thread HedgieMarkets — Market commentary account reporting unverified internal license-cost claims.

token-based billing made the cost untenable
x.com/HedgieMarkets/status/2057531661785628…

Context
It lets the hosts discuss budget governance while clearly marking uncertainty.
Key points
The post claims Microsoft canceled internal Claude Code licenses because token-based billing became too costly.
It also mentions an Uber CTO memo warning about AI budget burn.
The episode treats these as unverified claims that still point to a plausible procurement concern.
Provenance
Thread · Primary source
16
Techmeme summary of California AI labor subsidy executive order

Article Techmeme summary of New York Times reporting — Aggregator entry summarizing Governor Gavin Newsom's executive order on AI and labor subsidies.

study subsidies for companies that don't replace workers with AI
www.techmeme.com/260521/p36

Context
It connects agent economics to labor policy without pretending the policy design is settled.
Key points
The summary says California agencies will work with the AI industry and others on subsidies for companies that don't replace workers with AI.
The episode treats this as early policy language rather than a mature program.
The hosts focus on the measurement problem around labor displacement and hiring substitution.
Provenance
Article · Supporting source
17
U.S. and Plaintiff States v. Constellation Energy Corporation et al.

Article U.S. Department of Justice Antitrust Division — Official DOJ case page for antitrust civil filings involving Constellation Energy.

Documents posted on May 21, 2026
www.justice.gov/atr/case/us-and-plaintiff-s…

Context
It broadens the cost conversation from tokens to electricity, datacenter access, and concentrated inputs.
Key points
The DOJ case is an energy and antitrust item, not a direct AI product release.
The episode uses it as an adjacent infrastructure signal because AI compute depends on power markets.
The hosts avoid specific legal claims beyond the packet summary.
Provenance
Article · Supporting source
18
Ethan Mollick on GPT-5.2 peer review study

Thread Ethan Mollick — AI researcher summarizing a study on AI and human scientific peer review.

45 scientists took 469 hours
x.com/emollick/status/2057528309727088907

Context
It extends the agent discussion into high-stakes knowledge work and review institutions.
Key points
Mollick says GPT-5.2 reached expert level in a peer-review study.
The study involved 45 scientists evaluating reviews of 82 papers, according to the packet.
The hosts treat it as a significant result that still needs close reading.
Provenance
Thread · Primary source
19
Summon Governance on oversight and execution

Thread Summon Governance | AIGOS — Governance account arguing that model-centered oversight weakens as systems become more capable.

Oversight degrades when it stays around the model
x.com/SummonAIGOS/status/2057525117442216319

Context
It closes the loop between agent capability, workstation access, and the need to supervise actual execution.
Key points
The post argues that chain-of-thought review, probes, evals, monitors, and audits weaken around more capable systems.
It proposes execution as the more durable point of control.
The episode paraphrases the argument and ties it to reviewable agent runs.
Provenance
Thread · Primary source

00:00:00 Transcript

00:00:00 liraen: Ethan Mollick put the uncomfortable version of Thursday's news in one sentence: compute is short, complex agent workflows may get expensive, and everyone else may be left with cheaper chatbots. The demo question is mostly settled for the week. The harder question is whether the expensive version of agency becomes something only the richest companies and the most urgent use cases can afford.

00:00:24 halek: The operator read is sharper than the policy read. A chatbot turn is one request, plus maybe retrieval or a tool call. A serious agent run has to retry work and carry context. It may open files, use the browser, wait on a sandbox, and verify the result afterward. Mollick's follow-up says agent work can burn thousands of times more tokens than a simple chatbot turn. I wouldn't treat that number as a measurement that holds across all systems, but the direction is right. The meter keeps running while the agent is confused.
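A back-of-the-envelope sketch of why the meter runs differently. It assumes, hypothetically, that every agent step rereads the whole accumulated context, writes a fixed number of tokens, and appends a tool result to the history, with no caching or summarization. All parameter values are made up for illustration, not measurements of any real system.

```python
def chat_turn_tokens(prompt: int, reply: int) -> int:
    """Tokens billed for a single chatbot turn: one prompt, one reply."""
    return prompt + reply


def agent_run_tokens(prompt: int, steps: int, step_output: int, tool_result: int) -> int:
    """Tokens billed for a hypothetical agent loop.

    Assumption: each step re-reads the full context so far, emits
    `step_output` tokens, and gets back `tool_result` tokens that are
    appended to the context for the next step.
    """
    context = prompt
    total = 0
    for _ in range(steps):
        total += context + step_output          # re-read everything, then write
        context += step_output + tool_result    # the history grows every step
    return total


if __name__ == "__main__":
    chat = chat_turn_tokens(prompt=500, reply=500)
    agent = agent_run_tokens(prompt=500, steps=50, step_output=300, tool_result=1_500)
    print(chat, agent, agent // chat)
```

With these invented numbers the 50-step run bills 2,245,000 tokens against 1,000 for the chat turn. The exact ratio means little; the shape matters: because each step rereads a growing history, total cost grows roughly with the square of the step count.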

00:00:50 liraen: The Daytona interview on Latent Space makes that less abstract. Ivan Burazin describes Daytona as composable computers for AI agents, not just code-execution boxes. The line I kept coming back to was that agents need different compositions of computers for different tasks, offered through an API. That sounds dry until you connect it to Mollick's worry. The scarce thing isn't just model tokens. It is a durable place for the agent to live while it tries to do work.

00:01:18 halek: Yeah, and Daytona's origin story matters because it isn't a whiteboard claim. Burazin says they pivoted after people building agents told them the product built for human dev environments broke down for agent work. Then the alpha demos ran long. The calls went from fifteen minutes to twenty-five or thirty, and people wanted API access before there was a normal login. That sounds like a missing primitive. Agents don't just need an answer endpoint. They need a machine with state, disk, isolation, and enough patience from the infrastructure to fail and try again.

00:01:47 liraen: There is a distribution problem hiding inside that. Mollick frames the upside cleanly: everyone gets very capable chat for little or no cost. But the richer agent loop, the one with state and computers and long execution, may be reserved for people who can pay. That isn't only a consumer fairness issue. It changes which organizations can automate messy work.

00:02:10 halek: It also changes product design. If the expensive path is the agentic path, teams will design around budget caps. They will checkpoint more often, shrink context, cache tool outputs, and move some work into deterministic code. The agent products that win may not have the most charming chat surface. They may be the ones that waste the fewest expensive steps.
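Two of those habits, a hard budget cap and cached tool outputs, fit in a few lines. This is a minimal sketch under stated assumptions: `BudgetedToolRunner`, `read_file`, and the per-call `cost` figure are hypothetical names for illustration, not part of any product discussed in the episode.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BudgetedToolRunner:
    """Hypothetical wrapper: a hard token budget plus a cache of tool outputs.

    Repeated identical calls are served from the cache at no cost; any call
    that would push spend past the budget raises instead of running.
    """
    budget: int
    spent: int = 0
    cache: dict[tuple[str, str], str] = field(default_factory=dict)
    checkpoints: list[tuple[str, int]] = field(default_factory=list)

    def call(self, tool: Callable[[str], str], arg: str, cost: int) -> str:
        key = (tool.__name__, arg)
        if key in self.cache:
            return self.cache[key]                # cache hit: no new spend
        if self.spent + cost > self.budget:
            raise RuntimeError(f"budget exceeded: {self.spent} + {cost} > {self.budget}")
        self.spent += cost                        # charge before running the tool
        result = tool(arg)
        self.cache[key] = result
        return result

    def checkpoint(self, label: str) -> None:
        """Record spend so far, so a run can be audited or resumed from here."""
        self.checkpoints.append((label, self.spent))


def read_file(path: str) -> str:
    """Hypothetical stand-in tool; a real one would hit the filesystem."""
    return f"<contents of {path}>"


if __name__ == "__main__":
    runner = BudgetedToolRunner(budget=1_000)
    runner.call(read_file, "main.py", cost=400)
    runner.call(read_file, "main.py", cost=400)   # cached, so spend stays at 400
    runner.checkpoint("after-read")
    print(runner.spent, runner.checkpoints)
```

Failing loudly at the cap, rather than silently degrading, is a deliberate choice here: it makes the budget visible to whoever is steering the run.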

00:02:30 liraen: OpenAI's Codex announcements today are the other side of that same story. The Goal Mode video says `/goal` has graduated from an experiment and is now available in the app, IDE extension, and CLI. The feature turns a concrete objective into both the prompt and the stop condition. That is a small interface change with a serious claim underneath: Codex is being shaped for long-running work, not only turn-by-turn assistance.

00:02:55 halek: The stop condition is the part I trust most. If you give an agent a vague wish, you get vague wandering. If you give it a pass-fail test suite, a measurable target, or a plan it can work through, you have something closer to a job. The video also mentions steering messages, side chats, pause and resume, and editing the goal mid-run. That is product language for the control plane around an agent loop.
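The idea of a goal that doubles as the stop condition can be sketched as a loop whose exit is decided by an external check, not by the agent. This is a hypothetical illustration, not how Codex implements Goal Mode; `run_until_goal`, `fix_one_test`, and `all_tests_pass` are invented names, and the "test suite" is a toy set of failing test names.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def run_until_goal(state: S, step: Callable[[S], S], goal_met: Callable[[S], bool],
                   max_steps: int) -> tuple[S, bool, int]:
    """Hypothetical goal loop (not Codex's implementation).

    The objective doubles as the stop condition: the loop ends when the
    external check passes, or when the step budget runs out.
    Returns (final_state, goal_reached, steps_used).
    """
    steps = 0
    while not goal_met(state):
        if steps >= max_steps:
            return state, False, steps
        state = step(state)
        steps += 1
    return state, True, steps


def fix_one_test(failing: set[str]) -> set[str]:
    """Toy step: 'fix' the alphabetically first failing test."""
    return set(sorted(failing)[1:])


def all_tests_pass(failing: set[str]) -> bool:
    """Pass-fail goal check: no failing tests left."""
    return not failing


if __name__ == "__main__":
    start = {"test_parse", "test_render", "test_save"}
    print(run_until_goal(start, fix_one_test, all_tests_pass, max_steps=10))
```

The step budget matters as much as the goal check: without it, a goal that can never pass turns into the expensive wandering described above.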

00:03:15 liraen: Then the plugin-sharing video pushes Codex from personal setup into team infrastructure. Custom plugins can be shared to a workspace, scoped to particular people or everyone, and discovered in a shared directory. The example in the video is a finalize-code plugin that validates and refactors before review. That isn't a toy example. It is a team trying to turn a local habit into a repeatable workflow.

00:03:40 halek: That creates a second governance surface. Once a plugin can be distributed to a workspace, someone has to decide who gets to publish one, who reviews it, and what it can touch. I am not saying that as suspicion. It is the normal path for useful internal automation. First it is one engineer's helper. Then it becomes shared practice. Then it needs permissions, versioning, and a rollback path because someone will bind it to a production command.
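Those governance questions (who may publish, who a plugin is shared with, which version is live, how to roll back) can be written down as data. The registry below is a hypothetical sketch, not the Codex plugin API; only the `finalize-code` name comes from the video, and the users and version strings are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PluginRegistry:
    """Hypothetical workspace plugin registry (not the Codex plugin API).

    Tracks who may publish, who each plugin is shared with (None means
    everyone in the workspace), the version history, and rollback.
    """
    publishers: set[str]
    versions: dict[str, list[str]] = field(default_factory=dict)
    scopes: dict[str, set[str] | None] = field(default_factory=dict)

    def publish(self, user: str, name: str, version: str,
                share_with: set[str] | None = None) -> None:
        if user not in self.publishers:
            raise PermissionError(f"{user} may not publish plugins")
        self.versions.setdefault(name, []).append(version)
        self.scopes[name] = share_with

    def current(self, name: str, user: str) -> str:
        scope = self.scopes[name]
        if scope is not None and user not in scope:
            raise PermissionError(f"{name} is not shared with {user}")
        return self.versions[name][-1]

    def rollback(self, user: str, name: str) -> str:
        if user not in self.publishers:
            raise PermissionError(f"{user} may not roll back plugins")
        history = self.versions[name]
        if len(history) < 2:
            raise ValueError(f"no earlier version of {name} to roll back to")
        history.pop()                 # drop the live version
        return history[-1]            # the previous version is live again
```

Keeping every published version, rather than overwriting in place, is what makes rollback a one-line operation when a shared plugin misbehaves.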

00:04:01 liraen: Two more Codex-adjacent releases point the same way: Appshots, where a Mac app window can be attached to a Codex thread with Command-Command, and the OpenAI Developers post saying Codex can use apps on a remote Mac while the screen is locked. Put together, Codex is moving closer to the actual workstation. We don't have to exaggerate the claim to feel the change.

00:04:23 halek: That will make security teams sit up. A locked-screen Mac isn't just another API surface. It has apps, sessions, local files, browser state, and whatever strange permissions a developer accumulated over five years. If the implementation is careful, it is powerful. If the implementation is sloppy, the agent now has access to the parts no clean cloud sandbox can fully model.

00:04:44 liraen: Simon Willison released the first alpha of Datasette Agent today: a conversational assistant for Datasette that can answer questions about SQLite databases and be extended through plugins. That one is smaller than the Codex release, but I like the shape of it. The agent isn't floating above the work. It is inside a specific tool with a specific data model.

00:05:07 halek: That is the healthy version of the pattern. Datasette already has tables, metadata, plugins, and a user who is trying to ask questions of data. The agent can inherit a lot from that environment. It doesn't need to pretend every task starts in an empty chat box. And because Datasette is SQLite-first, the artifact is inspectable. You can ask what query it ran. You can test extensions at the plugin boundary.
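"You can ask what query it ran" is easy to picture with Python's built-in `sqlite3` module. The class below is a hypothetical agent tool, not Datasette's or Datasette Agent's plugin API; it only shows how a SQLite-backed agent can keep every query it executed next to the answer. The table and data in the example are invented.

```python
import sqlite3


class LoggedSQLiteTool:
    """Hypothetical agent tool (not Datasette's plugin API) that logs every query.

    Each answer can be traced back to the exact SQL and parameters that produced it.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.log: list[tuple[str, tuple]] = []

    def query(self, sql: str, params: tuple = ()) -> list[tuple]:
        # Crude guard for the sketch only. Opening the database read-only,
        # e.g. sqlite3.connect("file:data.db?mode=ro", uri=True), is sturdier.
        if not sql.lstrip().lower().startswith("select"):
            raise ValueError("only SELECT statements are allowed")
        self.log.append((sql, params))
        return self.conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE plants (name TEXT, height_cm INTEGER)")
    conn.executemany("INSERT INTO plants VALUES (?, ?)", [("fern", 40), ("oak", 900)])
    tool = LoggedSQLiteTool(conn)
    print(tool.query("SELECT name FROM plants WHERE height_cm > ?", (100,)))
    print(tool.log)   # "what query did it run?" now has a concrete answer
```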

00:05:27 liraen: Entire's Pi announcement points the same way from the coding side. Pi used to be an external agent plugin; now it ships inside the Entire CLI and connects directly to checkpoints, commits, and session history. The obvious pitch is convenience. The more interesting claim is that the agent gets native access to the developer's memory of the work.

00:05:49 halek: That is exactly where agents become more useful and more dangerous. Session history and checkpoints are a better substrate than raw chat because they preserve causality: what changed, when it changed, and what the agent believed at the time. But those same records can also hold mistakes, half-formed decisions, and, if the team is careless, credentials in old diffs. Tool-native memory needs cleaning, not mystique.
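One concrete form of "cleaning" is redacting secret-shaped strings from history before an agent reads it. This is a minimal sketch; the two patterns (the `AKIA` prefix shape of AWS access key IDs and generic `password`/`token`/`api_key` assignments) are illustrative assumptions, nowhere near a complete ruleset, and `redact` is a hypothetical helper rather than anything Entire or Pi ships.

```python
import re

# Illustrative patterns only; real secret scanning needs a maintained ruleset.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                                   # AWS access key ID shape
    re.compile(r"(?i)\b(api[_-]?key|password|token)\b\s*[:=]\s*\S+"),  # key = value assignments
]


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of SECRET_PATTERNS before the text reaches an agent."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    old_diff = "+ aws_key = AKIAABCDEFGHIJKLMNOP\n+ timeout = 30"
    print(redact(old_diff))
```

Redacting at the boundary where history enters the agent's context keeps the original records intact for humans while limiting what the model can repeat or leak.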

00:06:11 liraen: Yohei Nakajima's Active Graph post is more speculative, but it rhymes with this. He describes agent runs you can roll back, fork, and diff, and cites 1970s blackboard systems as an influence. I would keep that as a concept source, not a shipped product claim. Still, it names the direction: agent work wants versioned state, not just a transcript.
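Rollback, fork, and diff are easy to state over a list of state snapshots. The class below is a hypothetical sketch of versioned agent state, not Active Graph's design, which the post describes only at the concept level; the keys in the usage example are invented.

```python
from __future__ import annotations

from copy import deepcopy


class RunState:
    """Hypothetical versioned agent state: every write creates a new version.

    Supports rollback (drop later versions), fork (branch from any version),
    and diff (compare two versions key by key). Not Active Graph's design.
    """

    def __init__(self, initial: dict | None = None) -> None:
        self.history: list[dict] = [dict(initial or {})]

    @property
    def head(self) -> dict:
        return self.history[-1]

    def write(self, **updates) -> int:
        new = deepcopy(self.head)
        new.update(updates)
        self.history.append(new)
        return len(self.history) - 1          # version number of the new state

    def rollback(self, version: int) -> None:
        del self.history[version + 1:]         # discard everything after `version`

    def fork(self, version: int) -> RunState:
        branch = RunState()
        branch.history = deepcopy(self.history[: version + 1])
        return branch

    def diff(self, a: int, b: int) -> dict:
        old, new = self.history[a], self.history[b]
        keys = sorted(old.keys() | new.keys())
        return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__":
    s = RunState({"plan": "fix bug"})
    v1 = s.write(file="a.py")
    v2 = s.write(file="b.py", status="tests failing")
    print(s.diff(v1, v2))   # what changed between two points in the run
```

Full snapshots are the simplest choice and cost memory; a real system would more likely store deltas, but the operations a reviewer needs stay the same.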

00:06:32 halek: François Chollet's app-disappearing argument fits here too. His post says apps become services and UIs become text boxes. I think that is too clean. A text box can route intent, but the durable advantage is in the surrounding state: the database, the checkpoint graph, the plugin system, the run history, and the local machine. The text box is the doorway. The work still needs rooms behind it.

00:06:53 liraen: The most technical items today were Gated DeltaNet-2 and the HN thread around Multi-Stream LLMs. Ali Hatamizadeh's post says Gated DeltaNet-2 decouples erase and write in linear attention, and it claims head-to-head wins over KDA and Mamba-3. Sebastian Raschka's post calls Gated DeltaNet-2 one of his favorite hybrid-attention newcomers. I haven't independently evaluated the paper, so I would keep the conclusion modest: architecture work is still trying to make long-context and recurrent computation cheaper.

00:07:29 halek: That is the practical constraint. Full attention is expensive: its compute grows with the square of the sequence length. Linear-attention and recurrent architectures promise better scaling, but they have to preserve the things transformers are good at: recall, composition, and training stability. The erase-write language matters because memory isn't just storage. A model has to decide what to retain and what to overwrite. If that mechanism gets cleaner, agent runs may get cheaper or more reliable. If it only wins a benchmark, it is a paper result.
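To make the erase-write idea concrete, here is a minimal sketch of the gated delta rule from the earlier Gated DeltaNet work. It is not Gated DeltaNet-2, whose exact update we haven't checked. The update is S_t = alpha_t S_{t-1}(I - beta_t k_t k_t^T) + beta_t v_t k_t^T, and the memory is read as S_t q_t. In this rule a single coefficient, beta, scales both the erase term and the write term; the new paper's title suggests it separates those two roles. Plain Python lists keep the sketch dependency-free; real implementations use batched, chunked tensor kernels.

```python
def gated_delta_step(S, k, v, alpha, beta):
    """One step of the gated delta rule (Gated DeltaNet, not Gated DeltaNet-2).

    S: d_v x d_k memory matrix as a list of rows; k: key of length d_k,
    assumed L2-normalized; v: value of length d_v; alpha in (0, 1] decays
    old memory; beta in [0, 1] scales both the erase and the write.

    S_new = alpha * S (I - beta k k^T) + beta v k^T
    """
    d_v, d_k = len(S), len(k)
    # What the memory currently returns for key k, i.e. S k.
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    return [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j] for j in range(d_k)]
        for i in range(d_v)
    ]


def read_memory(S, q):
    """Query the memory: o = S q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


if __name__ == "__main__":
    S = [[0.0, 0.0], [0.0, 0.0]]
    key = [1.0, 0.0]
    S = gated_delta_step(S, key, [3.0, 4.0], alpha=1.0, beta=1.0)
    S = gated_delta_step(S, key, [5.0, 6.0], alpha=1.0, beta=1.0)
    print(read_memory(S, key))
```

With beta at 1, writing a second value under the same key replaces the first; plain additive linear attention would instead return the sum of both values. With beta at 0.5 the stored value moves halfway toward the new one. That choice between retaining and overwriting is the mechanism the segment is pointing at.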

00:07:52 liraen: The Multi-Stream LLM paper, at least from the HN summary, separates prompts, thinking, and I/O into parallel streams. That sounds like an architecture-level version of what product teams are already doing around agents: keep the user-facing channel, the reasoning work, and the tool interaction from stepping on each other.

00:08:12 halek: I would be careful with that comparison, but yes, the pressure is shared. Product teams are adding side chats and state views because one transcript can't carry all the work. Model researchers are asking whether one stream of tokens is the wrong shape for every subtask. The implementation test isn't whether the diagram looks elegant. It is whether the system can use tools, preserve state, and expose enough of its work that an operator can intervene before it wastes an hour.

00:08:34 liraen: There is a neat tension there. The product layer wants to hide complexity behind a better agent experience. The architecture layer is discovering that the work may need more internal lanes, not fewer. Maybe the future interface looks simpler because the machinery underneath got more explicit.

00:08:52 halek: That is the optimistic version I can buy. Simpler UI, more explicit machinery. If the machinery is hidden and uninspectable, operators will pay for mystery. If the machinery is exposed as state, checkpoints, streams, and permissions, teams can reason about it.

00:09:09 liraen: The money story today is messy. Techmeme's pointer to The Information says OpenAI generated about 5.7 billion dollars in revenue in Q1, around 1 billion more than Anthropic, while its adjusted operating income margin was negative 122 percent and ChatGPT user growth stalled. I am relying on the Techmeme summary, not the full Information article. Still, the numbers match the pressure we have been talking about: demand is enormous, and the cost curve isn't a footnote.

00:09:41 halek: The margin number is the warning light. Revenue can grow while the economics get worse if the product mix shifts toward expensive workflows. A single chatbot answer can be subsidized or optimized. A long agent run occupies model time, tool time, sandbox time, and storage. If the customer pays a flat subscription, the provider has to cap the work, degrade the run, or eat the cost.

00:10:02 liraen: That puts the HedgieMarkets claim in context, though I would handle it carefully. The post says Microsoft canceled internal Claude Code licenses after token-based billing became too costly, and that Uber's CTO warned about runaway AI budget burn. We don't have the internal memos here. I wouldn't treat the claim as established fact. But the anxiety it points at is already visible in the product economics.

00:10:27 halek: Exactly. The Microsoft claim doesn't have to be true for procurement teams to start asking better questions. How many agent-hours did we buy? How many completed tasks came out? Which repos, which teams, which workflows? If the bill is token-based and the work is long-horizon, management will eventually demand cost per accepted change, not cost per enthusiastic demo.
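"Cost per accepted change" is simple arithmetic once runs are recorded per team. This sketch assumes a hypothetical `AgentRun` record with token and sandbox spend plus a count of accepted changes (for example, merged pull requests that survived review); the field names, team names, and dollar figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One agent run as a procurement team might record it (hypothetical fields)."""
    team: str
    token_cost_usd: float
    sandbox_cost_usd: float
    accepted_changes: int     # e.g. merged PRs that survived review


def cost_per_accepted_change(runs: list[AgentRun]) -> dict[str, float]:
    """Total spend divided by accepted changes, per team.

    Teams with spend but zero accepted changes get infinity rather than being
    silently dropped, since that is the case a budget owner most needs to see.
    """
    spend: dict[str, float] = {}
    accepted: dict[str, int] = {}
    for run in runs:
        spend[run.team] = spend.get(run.team, 0.0) + run.token_cost_usd + run.sandbox_cost_usd
        accepted[run.team] = accepted.get(run.team, 0) + run.accepted_changes
    return {team: (spend[team] / accepted[team] if accepted[team] else float("inf"))
            for team in spend}


if __name__ == "__main__":
    runs = [
        AgentRun("payments", 30.0, 10.0, 2),
        AgentRun("payments", 20.0, 0.0, 0),   # a run that produced nothing still counts
        AgentRun("search", 50.0, 5.0, 0),
    ]
    print(cost_per_accepted_change(runs))
```

Counting failed runs in the numerator is the point: a metric that only averages over successful runs hides exactly the waste the hosts are worried about.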

00:10:48 liraen: Then California enters from a different angle. Techmeme points to a New York Times report that Governor Gavin Newsom signed an executive order telling state agencies to work with the AI industry and others on subsidies for companies that don't replace workers with AI. That is early policy language, but the direction is telling: states are already trying to price the social consequences of automation.

00:11:13 halek: Measurement is the hard part. If a company adopts agents and doesn't replace workers, is that because of the subsidy, because demand grew, or because the agent work still needs human review? And if a company avoids layoffs but stops hiring junior roles, the headline metric misses the labor effect. Policy will need better instrumentation than headcount snapshots.

00:11:33 liraen: The DOJ antitrust filing around Constellation Energy belongs nearby. It isn't an AI story on the surface. It is an energy-market governance story. But agent economics keep pointing back to power, compute, and who can secure the inputs. When a market for electricity constrains the market for intelligence work, antitrust stops being background noise.

00:11:56 halek: I would connect it to infrastructure without making it grand. AI companies need electricity, datacenter sites, chips, and fiber. If those inputs concentrate, the agent layer concentrates too. The operator consequence is simple: your model choice may be shaped by who has cheaper power and better access to machines, not just who has the best eval score this week.

00:12:16 liraen: One last item before we close: Mollick posted about GPT-5.2 reaching expert level in peer review, based on 45 scientists evaluating human and AI reviews of 82 papers. The quoted summary says current AI reviewers are competitive with top-rated reviewers in that study. That is a serious claim, and also one I would want to read closely before leaning on it too hard.

00:12:41 halek: Peer review is exactly the kind of domain where the benchmark can look cleaner than the institution. A model can write a strong review of a paper in a controlled setting. That doesn't settle whether it catches fraud, handles novelty, avoids confidential leakage, or resists being gamed by authors who know the review model. Still, if the result holds, it changes the workload around scientific publishing.

00:13:02 liraen: The Summon Governance post argues that oversight gets weaker when it stays around the model and that execution is the more durable control point. I am paraphrasing a thread, not endorsing the whole framework. But it lands beside the peer-review item in an interesting way. If AI systems can review research, run code, use computers, and operate inside tools, then oversight has to move closer to the action.

00:13:27 halek: Yes. Don't only ask whether the model's chain of thought looked acceptable. Ask what it executed, what permissions it used, what files it touched, what claims it recorded, and what a human can replay. The control surface is the run itself. That only sounds routine if you have never had to debug an agent after it spent two hours confidently changing the wrong thing.
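Putting the control point at execution can be as plain as checking each attempted action against permissions and logging it either way. The recorder below is a hypothetical sketch, not a framework from the Summon Governance thread or any product mentioned; the per-action path-prefix permission model and the example paths are assumptions for illustration.

```python
from __future__ import annotations

import json
import time
from dataclasses import dataclass, field


@dataclass
class RunRecorder:
    """Hypothetical execution-level control point for one agent run.

    Every attempted action is checked against per-action path prefixes and
    appended to the log whether or not it was allowed, so a reviewer can
    replay what the agent tried to do, not just what it said it was thinking.
    """
    allowed: dict[str, tuple[str, ...]]     # action -> permitted path prefixes
    log: list[dict] = field(default_factory=list)

    def attempt(self, action: str, path: str) -> bool:
        # Naive prefix check; a real system would normalize paths first.
        ok = any(path.startswith(p) for p in self.allowed.get(action, ()))
        self.log.append({"ts": time.time(), "action": action, "path": path, "allowed": ok})
        return ok

    def touched(self) -> list[str]:
        """Paths the agent was allowed to act on, in order."""
        return [entry["path"] for entry in self.log if entry["allowed"]]

    def export(self) -> str:
        """The run as JSON lines, one entry per attempted action, for replay or review."""
        return "\n".join(json.dumps(entry, sort_keys=True) for entry in self.log)


if __name__ == "__main__":
    rec = RunRecorder(allowed={"read": ("src/", "docs/"), "write": ("src/",)})
    rec.attempt("write", "src/app.py")
    rec.attempt("write", "/etc/hosts")       # denied, but still recorded
    print(rec.export())
```

Logging denied attempts alongside allowed ones is the useful part for review: a run that keeps probing outside its scope is a signal even when every probe fails.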

00:13:46 liraen: [chuckle] That sentence has too much lived evidence in it. The throughline today is that agents are becoming less like chat and more like operating actors: they need computers, budgets, shared plugins, state, and reviewable execution. Tomorrow, Friday, I will be looking for the pricing and permission details that tell us who actually gets to use the capable version. The cost model will decide more behavior than the launch video admits.