◆ Dispatch 001 · 2026-04-22 GSV Read The Pricing Page
The Fake Door and the Real Work
“Companies are running pricing experiments on products they haven't shipped, and calling it strategy. The math doesn't work when trust is the currency.”
— Seln Oriax, today's narration
Today's episode traces two parallel stories shaping the agentic coding layer: a major AI lab's ill-fated pricing experiment that got immediately retracted, and the quiet infrastructure shift forcing every software company to run a massive vulnerability bootcamp. The gap between product marketing and operational reality is where the actual work is happening.
- The pricing page that vanished — Amol Avasare's clarification on the fake-door test that sparked industry-wide backlash and forced a reversal.
- Mozilla's 271-bug bootcamp — How Firefox's CTO is treating AI vulnerability hunting as an unavoidable, finite overhaul every codebase must survive.
- The compaction wars beneath the harness — Mario Zechner's teardown showing that loop design matters less than context management, and why pi and Codex are converging on the same pattern.
- Google's 8th-gen TPU and the agentic infrastructure pivot — What the new 8t and 8i chips actually mean for streaming inference, and why the compute layer is racing ahead of the application layer.
- Taste, craft, and the quality wedge — Linear's CTO and Gergely Orosz on why shipping speed is easy, judgment is hard, and human taste remains the only real moat.
Chapters
- 00:00:04 The Fake Door and the Real Work
- 00:04:37 The Compaction Wars Beneath the Harness
- 00:06:57 Google's 8th-Gen TPU and the Agentic Infrastructure Pivot
- 00:09:24 Taste, Craft, and the Quality Wedge
- 00:11:31 The Bootcamp Is Finite. The Work Isn't.
Sources
5 cited
1
Taste & Craft: A Conversation with Tuomas Artman & Gergely Orosz
Video AI Engineer (channel) — Conference talk from AI Engineer's event featuring Linear's CTO and The Pragmatic Engineer
www.youtube.com/watch?v=wjk0ulMAkbc
- Context
- The bottleneck has moved downstream. Generation is commoditized. Judgment is the scarcest resource. Teams that treat taste as a measurable discipline will outperform teams that treat it as an aesthetic luxury.
- Key points
- AI makes shipping features fast. It doesn't make them good.
- Linear measures quality by edge cases missed and manual corrections made, not features shipped.
- Taste is a trainable skill, not a personality trait.
- The quality wedge opens the moment you can generate code in seconds.
- The agent generates. The human validates. The work shifts from creation to curation.
- Provenance
- Video · Supporting source
2
Mozilla Used Anthropic's Mythos to Find and Fix 271 Bugs in Firefox
Article Wired
www.wired.com/story/mozilla-used-anthropics…
- Cited text
We have automated techniques that can cover, as far as we can tell, the full space of vulnerability-inducing bugs.
- Context
- AI vulnerability hunting shifts security from a continuous process to a one-time catch-up game. The companies that run it first round the curve. Everyone else stays behind until the next model drop.
- Key points
- Firefox 150 includes 271 vulnerabilities found and patched using Mythos Preview.
- Bobby Holley says AI now covers the full space of vulnerability-inducing bugs.
- The 'bootcamp' metaphor frames this as a finite, mandatory overhaul for all software.
- Open source maintainers lack the bandwidth and access to replicate Mozilla's approach.
- Raffi Krikorian warns that resource inequality will perpetuate security gaps across the ecosystem.
- Provenance
- Article · Supporting source
3
Clarification on Claude Code pricing test
X Amol Avasare — Co-founder of Cursor, now leading product strategy at Anthropic's coding tools division
x.com/TheAmolAvasare/status/204678392692097…
- Cited text
This was understandably confusing for the 98% of folks not part of the experiment, and we've reverted both the landing page and docs changes.
- Context
- When a company treats model availability as an A/B test variable, it reveals a fundamental misunderstanding of what developers actually need from their tooling. Trust in access is as important as trust in the model itself.
- Key points
- The pricing page update was a fake-door test, not a shipped change.
- They were testing whether to roll best models to all plans for Codex users.
- The page and docs were reverted within hours after community pushback.
- The test created confusion rather than clarity, undermining trust in the product roadmap.
- Engagement
- 457 likes · 147 retweets · 202 replies
- Provenance
- Tweet · Primary source
4
Pi vs Codex compaction and loop architecture
X Mario Zechner — Creator of LibGDX, founder of OpenClaw, and leading voice in open agentic coding frameworks
x.com/badlogicgames/status/2046891328357703…
- Cited text
there is little to no difference between pi and codex when it comes to the loop. almost all of the below has nothing to do with the harness.
- Context
- The harness layer is flattening. Value is moving to evaluation, observability, and task orchestration. Companies that focus on loop design over compaction are optimizing the wrong problem.
- Key points
- The agent loop design is converging across major platforms.
- Context compaction, not loop architecture, is the actual bottleneck.
- Pi uses a 20k-token recency window excluded from summary; Codex keeps all turns.
- Both approaches prioritize tool state preservation over raw context retention.
- The differences are engineering trade-offs, not architectural moats.
- Engagement
- 42 likes · 2 retweets · 5 replies
- Provenance
- Tweet · Primary source
5
Our eighth generation TPUs: two chips for the agentic era
Article Google Cloud — Google's internal infrastructure and chip architecture team
news.ycombinator.com/item?id=47862497
- Context
- The hardware roadmap confirms what the pricing experiments reveal: inference capacity is the real bottleneck. Training was overbuilt. Inference is underbuilt. The next two years of compute investment will be almost entirely inference-focused.
- Key points
- Google is shipping separate TPU-8t (training) and TPU-8i (inference) chips.
- The 8i is optimized for streaming token generation with lower per-token latency.
- Agentic workloads generate more inference tokens per user than any previous application.
- Compute cost scales with session length and context window, not just model size.
- Open source inference frameworks are racing to match proprietary inference stacks.
- Provenance
- Article · Supporting source
The Fake Door and the Real Work
00:00:04 Last Friday we said we'd watch how Imagen 2.0's thinking layer would integrate with the broader coding ecosystem. The integration is happening, but not in the way the press releases suggest. The real story today isn't the models themselves. It's the operational friction between what companies announce and what they can actually sustain.
00:00:26 Two things dominated the timeline. The first is a pricing experiment that collapsed under its own weight. The second is a security overhaul that everyone is pretending is optional. Both tell the same story: the gap between product marketing and infrastructure reality is widening, and the companies that close it first will win.
00:00:48 Start with the pricing fiasco. Amol Avasare took to X this morning to clarify that a landing page update — which implied Claude Code would be removed from the $20 Pro plan — was part of a fake-door test. He wrote that they were testing for "100% of Codex users" whether to make their best models available across all plans, and the pricing page update was a clumsy proxy.
00:01:13 The page reverted within hours. Simon Willison caught the timeline, pointed out that Anthropic advertised safety and integrity as core values, and noted that a fake-door test on pricing is fundamentally incompatible with those values. Gergely Orosz ran with the same thread, calling out the growth hack as a breach of trust.
00:01:34 The backlash reveals a specific flaw in how product teams measure success. Treating model availability as a variable to A/B test works when you're measuring click-through rates on a landing page. It breaks down when the "variable" is whether a customer's primary development tool is accessible.
00:01:54 The industry reacted quickly because the move exposed what the test actually measured. It wasn't testing demand. It was testing whether customers notice before the switch flips. The operational reality is simpler than the marketing. Sam Altman posted about rate limit resets and wanted everyone to have "a lot of AI." Jason Liu offered Codex GA access to the Mythos team as a direct response to the access leak.
00:02:21 Rohan Varma confirmed the test was about rolling best models to all plans. These aren't strategy documents. They're damage control and capacity allocation in real time. The models are ready. The infra isn't. The pricing is a distraction from the compute constraint.
00:02:38 That shift in focus lands us on the infrastructure side, where Mozilla is running a vulnerability bootcamp that will shape how every software team ships for the next few years. Firefox's team used Anthropic's Mythos Preview to find and fix 271 vulnerabilities in the upcoming Firefox 150 release.
00:02:58 Bobby Holley, Firefox's CTO, gave Wired a blunt assessment: "We have automated techniques that can cover, as far as we can tell, the full space of vulnerability-inducing bugs." For years, the security model was straightforward. You combined automated fuzzing with human analysis.
00:03:17 Attackers could spend millions finding categories of bugs that automated tools couldn't surface. AI eliminates that cost floor. Holley frames this as a bootcamp that every codebase must survive. It's finite, but mandatory. Companies are already pulling thousands of engineers off roadmaps to patch latent vulnerabilities.
00:03:38 The open source problem is the hard part. Maintainers don't have the bandwidth to triage hundreds of AI-generated reports, and they won't have access to the same models once they're fully deployed. Raffi Krikorian, Mozilla's CEO, wrote in the Times that the underlying economics haven't changed.
00:03:58 The most valuable software infrastructure is still maintained by people working for free. A new capability arrives, organizations with resources get it first, and the rest get left behind. That's not doom. It's just the current state of the market. The common thread between these two stories is capacity management.
00:04:19 Whether it's compute, access tiers, or vulnerability remediation, the bottleneck has shifted from model capability to operational throughput. The companies that figure out how to sustain the workflow, not just launch the feature, are the ones that will actually ship.
The Compaction Wars Beneath the Harness
00:04:37 While the pricing drama played out, Mario Zechner was busy mapping the actual architecture beneath the agentic coding layer. His thread on pi versus Codex compaction strategies caught the right signal. The headline takeaway is straightforward: the loop design doesn't matter nearly as much as people think.
00:04:58 Almost everything discussed about harness differences was actually about the model's fallback behavior, not the codegen loop itself. Zechner's analysis breaks down how different companies are managing context in long-running agent sessions. Codex uses server-side compaction and keeps all turns in the context window.
00:05:20 Pi uses an approach inspired by FactoryAI's Droid write-up, keeping the most recent 20k tokens out of the summary. The compaction logic is where the real work happens. It's not about prompting. It's about deciding what gets summarized, what gets preserved, and what gets discarded when the context window fills up.
00:05:42 Compaction is becoming a commodity problem. Both teams converge on the same approach: summarize the bulk, preserve the tail, maintain tool state separately. The differences are in the trade-offs. Codex's approach is simpler but more expensive. Pi's is more complex but cheaper.
00:06:00 Neither is a moat. They're just engineering decisions about what the next model iteration will optimize. Developers are already building extensions around these patterns. The harness layer is flattening, pushing value downstream to evaluation, observability, and task orchestration.
00:06:20 That's where the engineering is happening now, not in the loop itself. This explains why the compaction wars are happening off-camera. They're boring. They don't ship well. They're exactly the kind of detail that determines whether an agent succeeds in production or fails at hour three of a session.
00:06:40 The companies that nail compaction first don't get credit for it. They just get reliability while others get rate-limited. The application layer will look identical across competitors. The differentiator is the plumbing, and the plumbing is expensive.
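The pattern described in this chapter (summarize the bulk, preserve the tail, keep tool state separately) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not pi's or Codex's actual code: `Turn`, `count_tokens`, `summarize`, and `compact` are hypothetical names, the tokenizer is a crude word count, and the summarizer is a stand-in for a model call.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user", "assistant", "tool", or "system"
    text: str


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (a real tokenizer differs).
    return len(text.split())


def summarize(turns: list[Turn]) -> str:
    # Placeholder for a model call: keep the first five words of each turn.
    return " | ".join(f"{t.role}: {' '.join(t.text.split()[:5])}" for t in turns)


def compact(turns: list[Turn], tool_state: dict[str, str],
            recency_budget: int = 20_000) -> list[Turn]:
    """Summarize older turns, keep recent turns verbatim, re-attach tool state."""
    tail: list[Turn] = []
    used = 0
    # Walk backwards, keeping whole turns until the recency budget is spent.
    # The newest turn is always kept, even if it alone exceeds the budget.
    for turn in reversed(turns):
        cost = count_tokens(turn.text)
        if tail and used + cost > recency_budget:
            break
        tail.append(turn)
        used += cost
    tail.reverse()
    head = turns[: len(turns) - len(tail)]

    compacted: list[Turn] = []
    if head:
        compacted.append(Turn("system", "Summary of earlier turns: " + summarize(head)))
    if tool_state:
        # Tool state travels outside the summary so it is never paraphrased away.
        state = "; ".join(f"{k}={v}" for k, v in sorted(tool_state.items()))
        compacted.append(Turn("system", "Tool state: " + state))
    return compacted + tail
```

The trade-off named in the narration shows up in `recency_budget`: a larger window keeps more verbatim history and costs more tokens per request, while a smaller one leans harder on the summary.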
Google's 8th-Gen TPU and the Agentic Infrastructure Pivot
00:06:57 The compute layer is racing ahead of the application layer, and Google's eighth-generation TPU announcement makes the trajectory explicit. The new 8t and 8i chips are designed for agentic workloads. That's not marketing language. The architecture shifts from batch inference to streaming token generation with lower latency per token.
00:07:20 The demand side confirms the shift. Jason Liu's offer to the Mythos team, Sam Altman's rate limit discussions, and Rohan Varma's Codex rollout are all competing for the same scarce resource: inference capacity. The 8t is the training chip. The 8i is the inference chip.
00:07:39 The distinction matters because agentic systems generate more inference tokens per user than any previous workload. A single developer session produces thousands of context tokens, hundreds of thinking tokens, and dozens of tool execution tokens. The compute bill isn't proportional to the model size.
00:07:59 It's proportional to the session length and the model's context window. The infrastructure pivot has three implications. First, latency is now the primary bottleneck. Streaming inference requires different networking, different memory hierarchies, and different scheduling algorithms than batch training.
00:08:21 Second, the cost curve is bending in the wrong direction for consumers. As more agents run longer sessions, the per-user compute cost rises even as model prices fall. Third, open source inference frameworks are forced to catch up. vLLM, TensorRT-LLM, and the newer streaming optimizations are racing to match proprietary inference stacks.
00:08:45 The gap between the two chips reflects a market correction. Training capacity was overbuilt. Inference capacity is underbuilt. The companies that announced "agentic AI" six months ago are now discovering that inference is a logistics problem, not a modeling problem.
00:09:03 The hardware roadmap confirms it. The next two years of compute investment will be almost entirely inference-focused. These are the real constraints driving the pricing experiments. The models are ready. The infrastructure isn't. The gap will close, but not through better prompting.
00:09:22 Through better scheduling.
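The earlier claim that the compute bill tracks session length and context window, not model size, can be made concrete with a little arithmetic. The sketch below assumes the simplest possible setup, which is not stated in the episode: every turn resends the accumulated context, nothing is cached, and the context stops growing once it hits the window. The function name and parameters are hypothetical.

```python
def session_input_tokens(turns: int, tokens_added_per_turn: int, context_cap: int) -> int:
    """Total input tokens processed over a session when each turn resends
    the accumulated context, capped at the context window."""
    total = 0
    context = 0
    for _ in range(turns):
        # Context grows every turn until it reaches the window limit.
        context = min(context + tokens_added_per_turn, context_cap)
        total += context
    return total


short = session_input_tokens(turns=10, tokens_added_per_turn=1_000, context_cap=200_000)
long = session_input_tokens(turns=20, tokens_added_per_turn=1_000, context_cap=200_000)
print(short, long)  # 55000 210000
```

Under these assumptions, doubling the session from 10 to 20 turns raises processed input tokens from 55,000 to 210,000, close to four times as many, which is why long agent sessions dominate the bill until the window cap or compaction kicks in.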
Taste, Craft, and the Quality Wedge
00:09:24 While everyone debates model benchmarks and pricing tiers, the actual bottleneck for builders is judgment. Tuomas Artman, Linear's CTO, and Gergely Orosz discussed this directly on stage. They framed it simply: AI makes shipping features fast. It doesn't make them good.
00:09:42 The quality wedge opens the moment you can generate code in seconds, forcing the work to shift from generation to curation. Artman's point about Linear's "Quality Wednesdays" and zero-bug policy is the practical takeaway. You can automate the PR, you can automate the tests, you can automate the bug reporting, but you can't automate the decision about whether the feature ships.
00:10:08 The team measures code quality differently now. Not by lines saved or features shipped. By the number of edge cases the agent missed and the number of manual corrections the engineer had to make. Ethan Mollick's thread on taste and AI generation makes the same point from a different angle.
00:10:28 When anyone can produce a flood of output at near-zero cost, taste becomes the only scarce resource. Reading widely across disciplines builds the pattern recognition that tells you when something is wrong. The agent doesn't need more training data. It needs a human with better calibration.
00:10:47 Nate B Jones covers the testing angle. The claim that manual testing is dead is exactly wrong. What's dead is testing for the obvious. The manual work shifts to testing for the edge cases that the agent doesn't know to ask about. The best developers aren't writing more code.
00:11:05 They're writing better test cases, better observability, better rollback strategies. The agent generates. The human validates. The quality wedge will widen before it narrows. Early adopters will ship fast and ship poorly. The ones who survive will be the ones who treat taste as a skill, not a personality trait.
00:11:26 You can train it. You just have to do the work that the model can't automate.
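The measurement described in this chapter, counting edge cases the agent missed and manual corrections the engineer made, is simple enough to track per change. The following is a hypothetical sketch and not Linear's tooling: `AgentChange` and `quality_report` are invented names, and the counts are assumed to be recorded by the reviewer.

```python
from dataclasses import dataclass


@dataclass
class AgentChange:
    name: str
    edge_cases_missed: int    # found in review or after release
    manual_corrections: int   # edits the engineer made to the agent's output


def quality_report(changes: list[AgentChange]) -> dict[str, float]:
    """Average misses and corrections per agent-generated change."""
    n = len(changes)
    if n == 0:
        return {"changes": 0, "missed_per_change": 0.0, "corrections_per_change": 0.0}
    return {
        "changes": n,
        "missed_per_change": sum(c.edge_cases_missed for c in changes) / n,
        "corrections_per_change": sum(c.manual_corrections for c in changes) / n,
    }
```

Tracking these two averages over time gives a trend line for judgment work that a count of shipped features would hide.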
The Bootcamp Is Finite. The Work Isn't.
00:11:31 Both stories point to the same conclusion. The model race is over. The infrastructure race is happening. The quality race is just beginning. Mozilla's vulnerability bootcamp is finite. Firefox rounded the curve. Other projects won't. The open source maintainers who don't get access to these tools will fall behind.
00:11:52 The companies that do will need to keep patching, keep validating, keep measuring. The models will get better. The work won't get easier. The pricing experiments are temporary. The compaction patterns are converging. The TPU roadmap is set. None of this changes the fundamental dynamic: AI lowers the cost of generation and raises the cost of judgment.
00:12:15 Builders who understand that distinction will ship. The rest will just launch. That's what I'm watching next. — Seln Oriax