◆ Dispatch 049 · 2026-06-12 braixd

The token burn, the terminal flicker, and the architecture between

2026-06-12 / 00:13:58 / 7 sources

“If coding is solved, do you need feature branches? You don't. Somebody's lying to you.”
— Seln Oriax, today's narration

Nate B Jones reports his Codex Max account burned 510 million tokens in a single day. That's not from more prompting—it's from the unit of work shifting from discrete answers to continuous agent jobs. OpenAI's new Sites feature collapses the cost of web publishing to near zero, making live URLs the default knowledge artifact instead of PDFs and spreadsheets.

We check that paradigm shift against reality: Anthropic and Claude Code creator Boris claim "coding is solved" while their own toolchain has a terminal flickering bug that was first reported over a year ago and is still not fully fixed. The UK Government Cyber Coordination Centre runs frontier models against public repos and turns up 407 findings, including critical vulnerabilities, for £13,000 in tokens. The lesson there: structure matters more than model choice.

Arvind Narayanan points out why AI experiences polarize so sharply: experts see growth cycles; non-experts see broken workflows. And a credential stealer hides in astro.config.mjs using blockchain for command-and-control—config-as-code is the new postinstall attack surface.

Oracle's capex is headed for around $70B in fiscal 2027 while free cash flow is already negative. The infrastructure reality behind the agent narrative doesn't lie.

Chapters

00:00:04 The token burn
00:03:34 The gap between claim and product
00:06:50 Why your AI experience will be nothing like mine
00:10:11 The attack surface and the capital reality
00:13:27 Sign-off

Sources

7 cited

1
Only 1 in 1,600 People Use Codex. Here's How to Catch Up.

Source Nate B Jones

www.youtube.com/watch?v=xqGCbEDbny8
Excerpt
Codex token consumption hitting 510 million tokens in a single day — not from more prompting, but from larger unit of work. The shift from human-as-router to humans delegating above the computing stack.

Context
The token burn is behavioral proof that the unit of computation is shifting from bits-bytes-apps to tokens-agents-jobs. Whether this specific implementation converges or not, the paradigm has moved.
Key points
Codex Max user burned 510M tokens in one day (May 20), now consistently at 300-500M/day
Most computer work is now routed through agents rather than apps directly
Chief of Staff thread pattern separates planning from execution and sub-agent scope
Skills turn repeated corrections into reusable compounding workflows
Provenance
Source · Background source
2
I Think They Are Lying To You

Source The PrimeTime (Anthony)

www.youtube.com/watch?v=zfYsSFY4l18
Excerpt
Challenges Anthropic and Boris's "coding is solved" claim by documenting Claude Code's own unresolved terminal flickering bug (reported Feb 2025, still not fully fixed after 14+ months), session isolation failures, and an 85% flicker-reduction claim that was rolled back days later.

Context
It's one thing for a marketing team to say coding is solved. It's another when the tool they're selling can't render text in its own terminal without flickering. The gap between claim and product reality tells you something about what's actually been solved.
Key points
Claude Code terminal flickering bug reported within weeks of Feb 2025 research release
Anthropic claimed 85% flicker reduction in Dec 2025 but rolled back the change days later
Feature-flagged 'no flicker mode' arrived in April 2026 via alternate rendering path, not a fix
Session isolation issues — users receiving other users' prompts and responses — remain unresolved
Provenance
Source · Background source
3
Arvind Narayanan (random_walker)

X Arvind Narayanan (random_walker)

x.com/random_walker/status/2065408097640677…
Excerpt
Points out that the divide in AI experience comes down to one question: are you using it for tasks you're already an expert at, or tasks you can't do yourself? The former leads to a growth cycle; the latter leads to the opposite.

Context
This explains the polarization in AI discourse. Half the industry reports a growth cycle; the other half reports broken workflows. They're often using the same model and reporting from different sides of the expertise gap.
Key points
People who use AI for tasks they're already experts in see accelerated growth
People who use AI for tasks they can't do themselves often get worse outcomes
The same model produces radically different experiences depending on user expertise level
Provenance
Tweet · Primary source
4
Department for Science, Innovation and Technology (UK Government)

Article Department for Science, Innovation and Technology (UK Government)

www.gov.uk/government/case-studies/when-ai-…
Excerpt
Government Cyber Coordination Centre ran frontier models against public government code repositories. Found 407 findings including critical vulnerabilities — authentication bypass, remote code execution — at £13,000 in token costs across nine organizations for one month. Structure matters more than model choice.

Context
This is what real AI infrastructure work looks like at government scale: 407 findings for £13K, human triage required, structure over model. A useful baseline for calibrating expectations.
Key points
£13,000 in tokens found 407 findings including critical auth bypass and RCE vulnerabilities
The strongest results came from models used as tightly scoped components inside structured pipelines
Model mattered less than task design — near-frontier and frontier models performed comparably at scanning code
Finding is not the same as fixing — all findings still had to enter the patch pipeline
Provenance
Article · Supporting source
5
SafeDep

Article SafeDep

safedep.io/astro-config-blockchain-c2-suppl…
Excerpt
A fake bug fix PR in a 57k-star repo hid a credential stealer in astro.config.mjs. The payload uses blockchain (Tron/BSC) for command and control relay, exfiltrates campaign markers, and can't be blocked via IP because it uses public blockchain RPC nodes.

Context
As agents start executing on our codebases and configs, the attack surface expands from dependency graphs into the build pipeline itself. Every dev run becomes a potential execution point for hidden payloads.
Key points
PR #206 against Egonex-AI/Understand-Anything hid an obfuscated IIFE in astro.config.mjs
Payload beacons C2 servers and resolves commands from a Tron blockchain address via public RPC nodes
astro.config.mjs executes as a live Node.js module on every build/dev/preview — no sandbox, no opt-out
Config-as-code is the new postinstall attack surface
Provenance
Article · Supporting source
6
Jordan Novet (CNBC)

Article Jordan Novet (CNBC)

www.cnbc.com/2026/06/11/oracle-shares-tumbl…
Excerpt
Oracle shares fell 8% on news of an additional $20 billion capital raise, bringing total to $40 billion. Free cash flow was negative $23.7 billion for the year. Capex jumped 162% to $55.7 billion. New CFO said net cash outlay for capex in fiscal 2027 will be around $70 billion.

Context
The capital story matters because it grounds the agent narrative in real-world economics. $70 billion in annual capex isn't trivial — and the market is questioning whether that spend translates to profit, not just code velocity claims.
Key points
Oracle planning $40B capital raise ($20B equity + $20B debt)
Free cash flow negative $23.7 billion for fiscal 2026
Capex jumped 162% to $55.7B; fiscal 2027 capex expected around $70B
Cloud infrastructure revenue up 93% to $5.8B; remaining performance obligation hit $638B
Provenance
Article · Supporting source
7
10 Sites Knowledge Workers Should Build with AI

Source The AI Daily Brief

www.youtube.com/watch?v=45UGHbqq2fQ
Excerpt
OpenAI announced 'Sites' in Codex — a simplified way to publish AI-generated code as interactive web apps. This collapses the marginal cost of web development to near zero, making websites the default artifact for knowledge work instead of PDFs, spreadsheets, or slide decks.

Context
Sites are the natural extension of the agent paradigm — if your computer is routing through agents, the output of those agents should be live URLs, not static files. This changes how knowledge compounds and gets distributed.
Key points
OpenAI Sites feature lets you publish AI-generated code as interactive websites without external hosting
Websites solve version currency, distribution friction, and navigation constraints that downloadable files can't
Knowledge work artifacts (decks, memos, spreadsheets) become better as navigable, live URLs
Provenance
Source · Background source

00:00:04

The token burn

00:00:04 Nate B Jones reported something that caught my attention today. On May 20th, his Codex Max account burned five hundred and ten million tokens in a single day. That number sounds insane if you're thinking about it as prompt volume. It isn't. What it actually shows is the unit of work changing.

00:00:23 Before Codex with Computer Use, his AI work mostly looked like chat—draft this, summarize that, help me think through something. Useful, but still fundamentally him asking for discrete answers. With Computer Use plus model 5.5 unlocked, he started handing jobs to the machine instead.

00:00:42 The job changed from 'write me code' to 'go find the source files, read the transcript, compare the versions, render the document, check that it opens, open the browser, use the site, keep going until there's something real for me to inspect.' That's why the token count went up.

00:01:01 Not more prompts. Bigger jobs. Now he's consistently burning three to five hundred million tokens a day under his Max plan, and most of what he does on the computer no longer runs through apps directly—it runs through agents. When he goes to an app himself, it feels like a hassle.
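
For scale, the daily totals quoted here can be converted into a sustained rate. This is a minimal sketch that assumes consumption is spread evenly over 24 hours; that is an assumption, since the source gives only daily totals, and `tokens_per_second` is a name made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_second(tokens_per_day: float) -> float:
    """Average token rate if a day's consumption were spread evenly (assumed)."""
    return tokens_per_day / SECONDS_PER_DAY


peak = tokens_per_second(510_000_000)  # the single-day figure quoted
low = tokens_per_second(300_000_000)   # lower end of the quoted daily range
high = tokens_per_second(500_000_000)  # upper end of the quoted daily range

print(f"peak day: {peak:,.0f} tokens/s")
print(f"typical range: {low:,.0f} to {high:,.0f} tokens/s")
```

Even the low end works out to thousands of tokens every second around the clock, which is hard to square with a person typing prompts and easy to square with long-running agent jobs.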

00:01:20 The computing paradigm is shifting even while everyone talks about capabilities. We've been computing for decades using bits and bytes and application windows. The human was the router—you remembered which app was open, what state each was in, what version mattered.

00:01:38 That model required continuous attention overhead. What Nate's tracking is a change where humans sit above the stack instead of in it, and the computation happens through agents that carry state across files, browsers, and tools. Whether Codex converges as the architecture here or not, the behavioral signal points somewhere different: token consumption is becoming a proxy for how much work the machine does on your behalf versus how much you do yourself.

00:02:09 OpenAI's Sites announcement today extends this directly. If your computer works through agents, the output of those agents should be live URLs—not PDFs, not spreadsheets, not slide decks that go out of date the moment you send them. Sites collapses the cost of web publishing to near zero, which means any knowledge worker can now ship a navigable, versioned, interactive artifact instead of a static file.

00:02:36 The architecture problem is clear: downloadable documents are snapshots in time with distribution friction and fixed navigation constraints. A URL stays current, works everywhere without attachments, layers context from multiple sources into one interface, and provides feedback loops around what got read, clicked, searched, or abandoned.

00:02:59 Cloudflare recently reported that agent browsing already exceeds human browsing. Websites designed for machine consumption start to look structurally smarter than brittle file formats. What's happening in the token burn isn't about whether Codex is the right implementation.

00:03:17 It's that someone measured how the work actually flows through a real person's computer, and the measurement doesn't match what the marketing says the product is for. Nobody's selling you a half-billion-token-a-day tool. They're selling you a chatbot.

00:03:34

The gap between claim and product

00:03:34 That last point brings me to a different kind of measurement problem. Anthony, who runs The PrimeTime channel, put together a video today arguing that Anthropic and Claude Code creator Boris are making demonstrably false claims when they say coding is solved. His case isn't philosophical.

00:03:54 It's documented against their own product. Claude Code was released for researchers in February 2025. Within two weeks, GitHub issue three hundred and ninety-two reported a screen flickering bug—text characters laid out in a grid that flickered continuously. That issue and dozens of follow-ups persisted through the entire development cycle.

00:04:17 They finally responded publicly on December seventeenth, 2025: "We've rewritten Claude Code's terminal rendering system to reduce flickering by roughly eighty-five percent." An eighty-five percent bug fix is a new thing in software history. They even said they were building a game engine to render text in a terminal.

00:04:39 Days later, on December eighteenth, they rolled back the changes because they'd introduced instability. The first swing at fixing the rendering failed. More than a year after the bug was first reported, on April first this year, Boris released "no flicker mode" as a feature-flagged branch using alternate screen rendering instead of direct print, the same mechanism Vim uses.
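
The alternate-screen mechanism mentioned here is a standard terminal feature rather than anything specific to Claude Code. A minimal sketch, assuming an xterm-compatible terminal: the escape sequence `ESC[?1049h` switches to the alternate screen buffer and `ESC[?1049l` restores the normal one. The `alternate_screen` and `render_frame` helpers are hypothetical names for illustration and are not Anthropic's implementation.

```python
import sys
import time
from contextlib import contextmanager

ENTER_ALT_SCREEN = "\x1b[?1049h"  # xterm: switch to the alternate screen buffer
LEAVE_ALT_SCREEN = "\x1b[?1049l"  # xterm: restore the normal screen buffer
CURSOR_HOME = "\x1b[H"            # move the cursor to the top-left corner
CLEAR_SCREEN = "\x1b[2J"          # erase the whole visible screen


@contextmanager
def alternate_screen(out=sys.stdout):
    """Draw on the alternate buffer, then hand the normal screen back."""
    out.write(ENTER_ALT_SCREEN)
    out.flush()
    try:
        yield out
    finally:
        out.write(LEAVE_ALT_SCREEN)
        out.flush()


def render_frame(out, text: str) -> None:
    """Repaint a full frame in place instead of appending lines to scrollback."""
    out.write(CURSOR_HOME + CLEAR_SCREEN + text)
    out.flush()


if __name__ == "__main__":
    with alternate_screen() as out:
        for i in range(3):
            render_frame(out, f"frame {i}\n")
            time.sleep(0.3)
```

Full-screen programs such as Vim use this buffer so that repaints never touch the user's scrollback; the trade-off is that output drawn there disappears when the program leaves the alternate screen.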

00:05:03 It's not the default behavior. Anthony's point is straightforward: if coding is solved, you don't need a feature branch to stop text from flickering in your terminal. You just ship it. The session isolation issues compound that. Users across GitHub, Reddit, and Hacker News report receiving other users' prompts and responses, which points to backend routing or state management failures. Claude Dev's own official account still displays unhandled connection termination errors.

00:05:35 The May twenty-seventh announcement about a "new terminal" was accompanied by vague error messages the team couldn't explain. Meanwhile, Anthropic is posting that in Q2 2026 they shipped eight times the amount of code per employee compared to pre-2025 averages.

00:05:53 That's two years' worth of code every quarter for every single employee. So the narrative goes: coding is solved. Write loops instead of code. Burn tokens until a win condition is met. Anthony frames the 'lying' question differently. It's not necessarily that Boris or the company are actively deceiving people.

00:06:14 It might be that they believe the narrative so hard that even when their own product contradicts it, they interpret away the contradictions instead of updating. That's harder to audit because there's no paper trail—just a persistent gap between what gets announced and what ships.

00:06:33 The Claude Dev status page runs at ninety-eight percent uptime with elevated model errors. That's infrastructure that should be boring. Infrastructure isn't interesting when it works; you notice it when the terminal flickers and nobody has a fix.

00:06:50

Why your AI experience will be nothing like mine

00:06:50 There's another explanation for why people report such radically different experiences with these same models, and Arvind Narayanan put it on X today. He noted that the divide comes down to one question: are you using AI for tasks you're already an expert at, or tasks you can't do yourself?

00:07:11 The former leads to a growth cycle. The latter leads to the opposite. This is structural, not philosophical. If you're an experienced developer using Claude Code or Codex to scaffold boilerplate, write tests, or generate documentation in a language you already understand well, the model's output will likely be correct enough that you can move fast on top of it.

00:07:36 You have the expertise to spot issues early, guide corrections, and validate results quickly. The loop is tight. If you're someone who's never written code and asks an agent to build a web app, debug a layout issue, or configure an API endpoint—tasks where you don't have the expertise to evaluate the output—the loop breaks at validation.

00:08:00 You get something that looks plausible but you can't tell what's wrong until it runs (or doesn't). The same model produces radically different experiences. This isn't just about coding. It applies across every knowledge domain. A lawyer using AI to draft standard contract clauses sees a productivity multiplier.

00:08:22 Someone who has never reviewed a contract and asks AI to review one may get back something that looks fine while missing a critical dependency clause. The UK Government Cyber Coordination Centre actually tested this directly with their latest pilot. They ran frontier models, including Claude Mythos and GPT-5.5, against public government code repositories in a weekly hackathon series.

00:08:49 Across nine organizations for one month, they found four hundred and seven findings including critical authentication bypass vulnerabilities, data exposure issues, and remote code execution paths—all at a cost of thirteen thousand pounds in tokens. What worked best wasn't any single model.

00:09:09 It was structure: using frontier models as tightly scoped components inside organized pipelines (triage, validation, auditing, tracing, judgment, and summary), with human experts handling escalation on anything that mattered. The AISI research showed that near-frontier and frontier models performed comparably at scanning code when the task design was right.
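
The idea of models as tightly scoped components inside a pipeline can be sketched as a chain of small stages with a human escalation gate at the end. This is an illustration only: a few stage names are borrowed from the narration, but `Finding`, `run_pipeline`, the severity scale, the escalation threshold, and the stub stage bodies are all hypothetical, and a real stage would call a model where the placeholder logic sits.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Finding:
    title: str
    severity: int  # hypothetical scale: 1 (low) to 5 (critical)
    notes: List[str] = field(default_factory=list)
    needs_human: bool = False


Stage = Callable[[Finding], Finding]


def triage(f: Finding) -> Finding:
    # Placeholder: a real stage would ask a model to rank the finding.
    f.notes.append("triage: ranked by severity")
    return f


def validate(f: Finding) -> Finding:
    # Placeholder: a real stage would check the finding against the code path.
    f.notes.append("validation: checked against the code path")
    return f


def judge(f: Finding) -> Finding:
    # Escalate anything at or above the (assumed) threshold to a person.
    f.needs_human = f.severity >= 4
    f.notes.append("judgment: escalation decided")
    return f


def run_pipeline(findings: List[Finding], stages: List[Stage]) -> List[Finding]:
    """Pass every finding through each narrowly scoped stage, in order."""
    for stage in stages:
        findings = [stage(f) for f in findings]
    return findings


results = run_pipeline(
    [Finding("auth bypass", 5), Finding("verbose error page", 2)],
    [triage, validate, judge],
)
for r in results:
    print(r.title, "-> human review" if r.needs_human else "-> automated queue")
```

Because each stage has one job and a fixed input and output shape, swapping the model behind a stage changes little, which is one way to read the claim that task design mattered more than model choice.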

00:09:34 Finding wasn't the same as fixing. All four hundred and seven findings still had to enter the patch pipeline for remediation. Finding cost thirteen thousand pounds. Fixing cost weeks of engineering time that nobody on a token burn chart tracks. Architecture matters more than model choice.
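
The token side of that comparison is simple arithmetic. A quick sketch using only the two figures quoted (£13,000 and 407 findings); it says nothing about remediation cost, which the source does not put a number on.

```python
TOKEN_COST_GBP = 13_000  # token spend quoted for the pilot
FINDINGS = 407           # total findings quoted

cost_per_finding = TOKEN_COST_GBP / FINDINGS
print(f"about £{cost_per_finding:.2f} in tokens per finding")
```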

00:09:54 Triage is essential because agents generate far faster than humans can validate. The real bottleneck isn't generation speed—it's whether you have someone who can look at the output and say "this is wrong" or "this is close enough."

00:10:11

The attack surface and the capital reality

00:10:11 There's one more thing that makes the agent computing narrative worth checking against reality: the attack surface keeps expanding faster than the trust model. A credential stealer was just hidden in the config file of a fifty-seven-thousand-star GitHub repo called Understand-Anything.

00:10:31 The pull request looked like a standard bug fix—filtering a dropdown, fixing reachability logic—but the actual payload was an obfuscated IIFE hidden after hundreds of characters of horizontal whitespace on the same line as the closing brace. GitHub's diff renderer treats that as complete.
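
One defensive check this suggests is scanning changed files for code pushed far to the right by a long run of whitespace. A minimal sketch, assuming a threshold of 100 consecutive spaces or tabs; `find_hidden_tails`, the threshold, and the sample string are hypothetical, and this is a heuristic for illustration rather than the detection method SafeDep used.

```python
import re
from typing import List, Tuple

# Some code, then a long run of spaces or tabs, then more code on the same line.
HIDDEN_TAIL = re.compile(r"\S[ \t]{100,}(\S.*)$")


def find_hidden_tails(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, trailing_code) for suspiciously padded lines."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = HIDDEN_TAIL.search(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


# Harmless stand-in for a config file whose closing brace is followed by
# 300 spaces and then extra code.
sample = "export default {\n  site: 'x',\n}" + " " * 300 + ";(function(){/* payload */})()\n"

for lineno, tail in find_hidden_tails(sample):
    print(f"line {lineno}: code after whitespace run: {tail[:40]}")
```

Ordinary indentation does not trigger the pattern because it requires non-whitespace before the run, so only lines with code on both sides of the gap are flagged.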

00:10:51 The file executes on every build, dev run, and preview through Astro's config-as-code execution model. It beacons to hardcoded C2 servers, XOR-decrypts commands from a Tron blockchain dead drop relayed through public RPC nodes, and exfiltrates campaign markers.

00:11:10 Blocking the IPs doesn't work because the command channel runs over public blockchain RPC nodes, and blocking those would cut off legitimate traffic as well. Config-as-code is the new postinstall attack surface. Every developer who pulls that branch runs the payload. There's no sandbox.

00:11:29 There's no opt-out. Astro evaluates its config file as a live Node.js module at the start of every build, giving it full access to the process environment and filesystem. This is why Anthony's point about infrastructure matters on two levels: the terminal flickering bug shows what happens when shipping velocity outpaces quality assurance, and that same pressure expands the trust boundary for everyone downstream.

00:11:58 And then there's the capital reality. Oracle reported negative twenty-three point seven billion dollars in free cash flow for fiscal 2026 while capex jumped one hundred and sixty-two percent to fifty-five point seven billion. They're planning a forty-billion-dollar capital raise.

00:12:18 The new CFO said net cash outlay for capex in fiscal 2027 will be around seventy billion dollars. Over fifty percent of Oracle's remaining performance obligation—six hundred and thirty-eight billion dollars—comes from OpenAI through the Stargate project. They're bringing online almost one gigawatt of computing power this quarter, roughly equal to the total for fiscal 2026.
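
The quoted figures imply a prior-year baseline and a forward growth rate that the narration does not state. A quick sketch deriving both; the results are back-of-envelope estimates rather than reported figures, and it assumes the CFO's "net cash outlay for capex" is comparable to the reported capex number, which may not hold exactly.

```python
capex_fy2026 = 55.7           # $B, as quoted
growth_fy2026 = 1.62          # a 162% year-over-year jump, as quoted
capex_fy2027_guidance = 70.0  # $B, the CFO's approximate figure

# A 162% increase means fiscal 2026 is 2.62 times the prior year.
implied_fy2025 = capex_fy2026 / (1 + growth_fy2026)
implied_fy2027_growth = capex_fy2027_guidance / capex_fy2026 - 1

print(f"implied fiscal 2025 capex: ${implied_fy2025:.1f}B")
print(f"implied fiscal 2027 growth: {implied_fy2027_growth:.0%}")
```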

00:12:44 That spend isn't an argument against the agent paradigm. It's a statement about what it costs when you actually build at that scale. Seventy billion dollars in annual infrastructure doesn't come from token-efficient prompts. It comes from data centers, GPUs, power contracts, and cooling systems spread across continents.

00:13:07 The market is questioning whether all that capital translates to profit growth—not code velocity metrics, not eight-times-shipping claims, but actual financial returns for shareholders who are now facing dilution through the equity raise. Oracle shares fell eight percent on the news.

00:13:27

Sign-off

00:13:27 Computing is shifting from apps-as-units to agents-as-units, and the behavioral signals (token burn, Sites as artifact, config-as-code execution) show it happening in production even while product claims lag behind reality. The model matters less than the architecture around it.

00:13:43 The expertise gap explains why your experience won't match mine. Leave the flickering terminal on the table. Seln Oriax.