◆ Dispatch 032 · 2026-05-20 GSV Foothills Of The Singularity
Foothills, and the morning Karpathy moved
“Google had the bigger announcement. Anthropic had the bigger signal. Both were true by lunchtime.”
— Lenar Kess, today's narration
Google I/O 2026 landed yesterday — Gemini Omni, Gemini 3.5 Flash, Antigravity 2.0, Spark, and Demis Hassabis closing the keynote on the "foothills of the singularity." About forty minutes before he walked on stage, Andrej Karpathy tweeted that he'd joined Anthropic. The rest of the day was the labs sorting themselves around both events. Today's show works through the announcements, the pricing shifts, the keynote demo that boots Doom, the Railway outage that happened while Google was selling Spark, and a builder's 100K-line Rust postmortem that's a sharper picture of agentic coding than anything on the I/O stage.
- Hassabis: "foothills of the singularity" — DeepMind's CEO compresses his AGI timeline on stage
- Gemini 3.5 Flash specs and pricing — and what the 3x bump means
- Gemini Omni's physics pitch versus the same-day backflip test
- Antigravity 2.0's 93-agent OS demo: 12 hours, 2.6B tokens, under $1K, boots Doom
- Andrej Karpathy joins Anthropic — pre-training, on Nick Joseph's team
- Ethan Mollick: recursive self-improvement is a talent sink for the Big Three
- Qwen 3.7-Max and the Zhenwu M890 chip — Alibaba's full-stack I/O response
- DeepSeek hiring a Code Harness team in Beijing
- Railway's 8-hour outage after GCP's automated account suspension
- Cheng Huang: 130K lines of Rust, AI-written contracts, and a Paxos engine that runs
- TechCrunch on Anthropic's pre-training charter for Karpathy
Chapters
- 00:00:04 Foothills
- 00:02:00 Flash, and the price that changed underneath the brand
- 00:04:11 Omni's physics pitch and the backflip test
- 00:06:25 Antigravity 2.0 and the OS that boots Doom
- 00:08:33 Spark, and the always-on agent tier
- 00:10:17 Karpathy
- 00:12:42 Alibaba's full-stack answer: Qwen 3.7-Max and a new chip
- 00:14:28 DeepSeek hires a harness team
- 00:16:07 Railway, GCP, and the substrate question Google didn't address
- 00:18:42 What 100K lines of Rust with AI actually looks like
- 00:21:58 What today added up to
Sources
10 cited
1
Andrej Karpathy joins Anthropic
X karpathy — OpenAI co-founder, former Tesla AI lead, founder of Eureka Labs (AI-for-education); now on Anthropic's pre-training team under Nick Joseph.
x.com/karpathy/status/2056753169888334312
- Cited text
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D.
- Context
- The single highest-status engineer-researcher available on the market in 2026 looked at the frontier and picked Anthropic over Google, OpenAI, or going independent. For builders, that's a calibration on where pre-training compounds fastest right now.
- Key points
- Karpathy joined Anthropic on May 19, 2026 — the morning of Google I/O — under pre-training lead Nick Joseph.
- He framed it as returning to R&D after 18 months running Eureka Labs.
- Anthropic told TechCrunch his charter is to start a team using Claude to accelerate pre-training research itself.
- Tweet hit 21,800 likes and 825,000 views in 24 hours — among the largest engagements on a job-move post in years.
- Engagement
- 21,828 likes · 3,448 retweets · 1,795 replies
- Provenance
- Tweet · Primary source
2
Google I/O 2026: Gemini 3.5 Flash, Omni, Spark, and Antigravity 2.0
Article Smol AI / Latent Space — Daily AI news rollup that Smol AI's team writes alongside Latent Space; one of the more reliable I/O recap sources for raw numbers.
www.latent.space/p/ainews-google-io-2026-ge…
- Cited text
Gemini 3.5 Flash priced at $1.50 / 1M input, $9.00 / 1M output tokens, with 90% discount on cached input. 4x faster than comparable frontier models; up to 12x faster in Antigravity.
- Context
- The full I/O numbers in one place — pricing, benchmarks, demo scale — so the keynote can be evaluated against what was actually shipped versus shown on stage.
- Key points
- Gemini 3.5 Flash: GA day-of across app, Search, API, enterprise; 1M context, 65k output, four thinking levels including a new 'medium' default.
- Benchmarks: Terminal-Bench 2.1 at 76.2%, GDPval-AA 1656 Elo, MMMU-Pro 83.6–84%, Arena #9 at 1507.
- Antigravity 2.0 demo: 93 parallel sub-agents, 15k+ model requests, 2.6B tokens, 12 hours, under $1K in API credits — built an OS that boots Doom.
- Gemini Omni rolls out to paid users today, YouTube Shorts/Create this week, API in coming weeks.
- Gemini Spark: 24/7 personal agent on dedicated Google Cloud VMs that runs while your devices are closed.
- Provenance
- Article · Supporting source
3
Demis Hassabis: "foothills of the singularity" at Google I/O
Article Prism News — News write-up of Hassabis's closing remarks at the I/O 2026 keynote in Mountain View.
www.prismnews.com/news/google-deepmind-chie…
- Cited text
When we look back at this time, I think we will realize that we were standing in the foothills of the singularity.
- Context
- DeepMind's CEO is normally the conservative voice on AGI timelines. When he compresses, that recalibrates investor and engineer expectations across the field.
- Key points
- Hassabis closed the I/O 2026 keynote with the 'foothills of the singularity' line.
- He has compressed his public AGI timeline from 5–10 years to 'just a few years'.
- Framed the day as 'a profound moment for humanity' — Google's most aggressive on-stage AGI rhetoric to date.
- Used the line to anchor the launches of Gemini Omni, Gemini 3.5 Flash, Antigravity 2.0, and Spark.
- Provenance
- Article · Supporting source
4
Incident Report: May 19, 2026 — GCP Account Suspension
Article Chandrika Khanduri, Cody De Arkland (Railway) — Railway's incident-response leads on the team that runs its production platform.
blog.railway.com/p/incident-report-may-19-2…
- Cited text
Railway owns our vendor choices, and we ultimately own this one. Your customers don't care whether the failure was Google or Railway; they see your product.
- Context
- While Google was pitching Spark and Antigravity 2.0 on stage — both of which want more of your workload to live on Google Cloud — its automated account-suspension system was taking a whole PaaS down for eight hours. That tension is the day's real cost story.
- Key points
- Google Cloud's automated systems incorrectly suspended Railway's production account at 22:20 UTC on May 19, hitting many accounts in the same sweep.
- Outage lasted roughly 8 hours; full API/dashboard/OAuth restored by ~04:00 UTC May 20.
- Even AWS and Railway Metal workloads went dark once the GCP-hosted control plane's route cache expired.
- GitHub piled on by rate-limiting Railway's OAuth/webhook integrations during the recovery retry burst.
- Railway committed to removing GCP from the data-plane hot path and extending HA database quorum across AWS and Metal.
- Provenance
- Article · Supporting source
5
Learnings from 100K Lines of Rust with AI
Article Cheng Huang — Software architect who built a Rust multi-Paxos consensus engine modeled on Azure's RSL using Claude Code and Codex CLI as primary drivers.
zfhuang99.github.io/rust/claude%20code/code…
- Cited text
I pay $100/month for Anthropic's max plan. This became a forcing function — if I don't kick off a coding task with Claude before bed, I feel like I'm wasting money.
- Context
- The most concrete builder ground-truth for what serious AI-assisted systems work looks like in 2026. Not a 93-agent stage demo — a senior architect with a tight contract regime, two paid subscriptions, and a Paxos engine that actually runs.
- Key points
- 130K lines of production Rust in roughly six weeks; 1,300+ tests covering >65% of the codebase.
- Throughput tuned from 23K ops/sec to 300K ops/sec in three weeks using AI as performance co-pilot.
- Three-level code-contract regime: AI writes contracts, AI generates tests from them, AI translates them into property-based tests. One AI-generated contract caught a Paxos safety violation.
- Rotates Anthropic Max (Mon–Wed) and ChatGPT Plus (Thu–Sun) subscriptions to dodge rate limits — pays both.
- Says GPT-5 High writes better contracts than Opus 4.1, on his subjective sample.
- Argues a single user story is the right unit of work for current coding agents.
- Provenance
- Article · Supporting source
6
DeepSeek is forming a Code Harness team
X victor207755822 (Deli Chen) — DeepSeek engineer in Beijing, posting the company's first public harness-team job listings.
x.com/victor207755822/status/20570644153008…
- Cited text
DeepSeek is forming a new Harness team to build Code Harness from the ground up — may be you can call it DeepSeek Code or something like this hhh.
- Context
- A year ago 'agent harness' was a research term. Now it's a hiring category at every major lab — and DeepSeek admitting they need one publicly is the cleanest signal that the model alone is no longer the product.
- Key points
- DeepSeek opening Harness Product Manager and Harness R&D roles in Beijing.
- Explicit signal that a Chinese frontier lab is building a coding-agent harness to compete with Claude Code, Codex, and Antigravity.
- Engagement reached ~23k views and 349 likes within hours of posting.
- Confirms 'the harness' is now a product category every frontier lab needs in 2026.
- Provenance
- Tweet · Primary source
7
Ethan Mollick on recursive self-improvement and talent gravity
X emollick — Wharton professor; widely-read commentator on practical AI adoption inside organizations.
x.com/emollick/status/2057074407177130096
- Cited text
One interesting side feature of recursive self-improvement, to the extent that is happening, is that it makes the Big Three labs more appealing to talent, and shortens the runway for launching a potential competitor instead at the same time.
- Context
- Mollick gives the structural reason the Karpathy news matters beyond a single hire: if recursive self-improvement is real, the Big Three become talent sinks faster than the outside ecosystem can spin up rivals.
- Key points
- Frames the Karpathy move without naming it: the compounding curve favors insiders.
- If frontier labs are pulling ahead via model-assisted research, the best place to do research is inside one of them.
- Weakens the case for going independent, in the style of ex-OpenAI founders.
- Posted hours after Karpathy's announcement; ~7,500 views, 65 likes.
- Provenance
- Tweet · Primary source
8
Alibaba unveils Qwen 3.7-Max: 35-hour task runs, 1,000+ tools
Article Meyka — Market-coverage write-up of Alibaba Cloud's May 20 summit launches.
meyka.com/blog/alibaba-upgrades-ai-stack-wi…
- Context
- The timing isn't coincidence. Alibaba's flagship lands the same week as I/O, with both an agent-frontier model and a new chip — a fuller stack response than any US lab put up against Google this week.
- Key points
- Qwen 3.7-Max pitched as 'The Agent Frontier' — long-horizon tool use, claimed 35-hour autonomous task runs, 1,000+ tools.
- Preview Max and Plus variants appeared on Arena leaderboard May 14 with no press release; Max hit Elo 1475 (#13 overall, #7 in math).
- Coincided with Alibaba's launch of its Zhenwu M890 AI chip at the same summit.
- Land date pulled forward to overlap with Google I/O Day 2.
- Provenance
- Article · Supporting source
9
OpenAI co-founder Andrej Karpathy joins Anthropic's pre-training team
Article TechCrunch — First on-record Anthropic spokesperson confirmation of Karpathy's charter at the company.
techcrunch.com/2026/05/19/openai-co-founder…
- Context
- The role itself — model-in-the-loop on pre-training — is the most concrete public artifact yet of what 'recursive self-improvement' looks like as an org chart inside a frontier lab.
- Key points
- Karpathy reports into Nick Joseph, Anthropic's pre-training lead.
- Charter: start a team that uses Claude itself to accelerate pre-training research.
- Pre-training is described as the most compute-intensive phase of building a frontier model.
- Anthropic confirmed the role on the same day as the tweet.
- Provenance
- Article · Supporting source
10
Gemini Omni still can't render a clean backflip
Reddit Able-Line2683 (r/singularity) — Reddit user testing Omni hours after launch.
www.reddit.com/r/singularity/comments/1thoh…
- Context
- A keynote claim that an episode shouldn't repeat without checking. Within hours, builders had already found the seam in Omni's marketing.
- Key points
- Same-day stress test of Gemini Omni's physics claim.
- Backflip generation fails — body distorts mid-flip in the linked share.
- Post hit 600+ upvotes, 100+ comments in hours.
- Cuts directly against the headline I/O pitch that Omni handles physics, gravity, and kinetic motion better than prior models.
- Provenance
- Reddit post · Primary source
Foothills
00:00:04 Demis Hassabis took the Shoreline Amphitheatre stage in Mountain View yesterday afternoon and closed the Google I/O 2026 keynote with a sentence Google had clearly workshopped for weeks. Quote: When we look back at this time, I think we will realize that we were standing in the foothills of the singularity.
00:00:23 End quote. He called it a profound moment for humanity, and walked off. It was a strong keynote on its own terms — three new models, a coding agent that built an operating system on stage, a 24/7 personal AI tier called Spark, and a benchmark deck thick enough to justify the rhetoric, at least the part that's quantifiable.
00:00:44 Hassabis used to be the conservative voice on AGI timelines; five to ten years was his line for most of the last cycle. He's compressed it to just a few, and at I/O he gave Google's most aggressive on-stage framing yet. And about forty minutes before he walked out, Andrej Karpathy — one of the most-followed researchers working on large language models — posted to X.
00:01:07 Quote: Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. End quote. Nearly twenty-two thousand likes and eight hundred twenty-five thousand views within a day. The keynote was still being staged when the news cycle had already split: half on Google, half on the highest-status engineer in the field walking into a competitor.
00:01:33 Today I want to work through both stories together, because they're the same story at different altitudes. We'll go through I/O — Flash, Omni, Antigravity, Spark — then through the lab responses: Anthropic, Alibaba, DeepSeek. Then a substrate failure that happened while Google was pitching Spark.
00:01:52 And we'll end on a builder writeup from this week that's a sharper picture of agentic coding than anything that hit a keynote slide.
Flash, and the price that changed underneath the brand
00:02:00 Start with Gemini 3.5 Flash, because it's the model most of you will actually run. It shipped GA the day of the keynote — Gemini app, Search, API, enterprise, all live. A one million token context window. Sixty-five thousand tokens out. Four thinking levels: minimal, low, medium (the new default), and high.
00:02:20 The pricing comes in at one dollar fifty per million input tokens, nine dollars per million output, with a 90 percent discount on cached input. Google's speed claim was four times faster than comparable frontier models, and up to twelve times faster when run inside Antigravity, where Google controls the harness around it.
00:02:41 The benchmarks are credible — Terminal-Bench 2.1 at seventy-six point two percent, GDPval-AA at sixteen fifty-six Elo, MMMU-Pro in the eighty-three to eighty-four range, and Chatbot Arena placing it ninth overall in text and code at a 1507 score. None of those are leaderboard-leading, but for a model branded Flash they're more than a tier up from the prior generation.
00:03:05 Which brings us to pricing, which is where the model name and the price tag don't agree anymore. A Reddit thread caught it the same evening — Gemini 3.5 Flash costs roughly three times more per token than the previous Flash, and about thirty times more than the original 1.5 Flash from two years ago.
00:03:24 The screenshot went around with the headline: behold, Gemini 3.5 Flash. Here's how I'd read it. The Flash brand used to mean cheap-fast-shallow. The new Flash means cheap-relative-to-Pro and intentionally tuned for long-horizon agent workloads — coding loops, tool calls, repeated turns inside a harness.
00:03:44 Google is repositioning where the volume lives. They want the agentic coding workload running on Flash, paying Flash margins, instead of on Pro at frontier-tier prices. The name carries inherited brand equity the price tag no longer earns. If you've been quietly budgeting agent runs against the old Flash unit economics, this episode is your reminder to re-run the spreadsheet today before your next agent loop posts to billing.
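For anyone re-running that spreadsheet, here is a minimal Python sketch of the arithmetic. The three prices are the ones quoted above; the `AgentRun` shape, the example token counts, and the 80 percent cache share are hypothetical placeholders, and the one-third multiplier stands in for the "roughly three times" comparison from the Reddit thread rather than any published price list.

```python
from dataclasses import dataclass

# List prices quoted in the episode for Gemini 3.5 Flash (USD per 1M tokens).
INPUT_PER_M = 1.50
OUTPUT_PER_M = 9.00
CACHED_INPUT_DISCOUNT = 0.90  # 90 percent off cached input


@dataclass
class AgentRun:
    """Token counts for one agent run. Every value passed in is hypothetical."""
    input_tokens: int
    output_tokens: int
    cached_fraction: float  # share of input tokens served from cache, 0..1


def run_cost(run: AgentRun, price_multiplier: float = 1.0) -> float:
    """Estimate USD cost; price_multiplier rescales list prices for comparison."""
    if not 0.0 <= run.cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = run.input_tokens * run.cached_fraction
    fresh = run.input_tokens - cached
    cached_rate = INPUT_PER_M * (1.0 - CACHED_INPUT_DISCOUNT)
    total = (fresh * INPUT_PER_M
             + cached * cached_rate
             + run.output_tokens * OUTPUT_PER_M) / 1_000_000
    return total * price_multiplier


if __name__ == "__main__":
    # Hypothetical overnight agent loop: 40M input tokens, 2M output, 80% cached.
    run = AgentRun(input_tokens=40_000_000, output_tokens=2_000_000,
                   cached_fraction=0.8)
    print(f"At 3.5 Flash list prices: ${run_cost(run):.2f}")
    print(f"At roughly one third of that: ${run_cost(run, 1 / 3):.2f}")
```

Swapping in your own token counts and cache hit rate is the whole exercise; the cache share moves the total more than most budgets assume, because uncached input costs ten times the cached rate.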
Omni's physics pitch and the backflip test
00:04:11 Gemini Omni is the headline media model and the bet Google planted next to the AGI rhetoric. The pitch on stage was that Omni merges Gemini's reasoning with DeepMind's generative-media stack — Nano Banana, Veo, Genie — into a single model that takes text, image, audio, and video in, and starts with video out.
00:04:32 The marketing line was that Omni handles physics, gravity, and kinetic motion better than the prior Veo generation. Multi-turn editing is consistent. Worlds hold together across cuts. Rollout is staggered: paid Gemini users got Omni day-of, YouTube Shorts and the Create surface get it this week, and the API follows in the coming weeks.
00:04:54 That last bit matters — if you build on Omni today, you're building on a UI rather than an endpoint. The physics claim got tested fast. Inside hours, a Reddit user shared a Gemini link where they'd asked Omni to render someone doing a backflip. The body distorts mid-rotation.
00:05:13 The legs cross the spine in a way that bodies don't. The post collected six hundred upvotes and a long comment thread of people doing the same probe with the same result. So the headline footage clearly works — Google wouldn't have shown it otherwise — and the moment you push into a specific kinematic prior, the model still falls into the same articulation traps Veo had a year ago.
00:05:38 Not catastrophic; not exactly the win-condition for the marketing line either. The more interesting architectural choice is that Google bundled the three media models into one Omni. They could have kept Veo, Genie, and Nano Banana as separate APIs, the way OpenAI keeps DALL-E, Sora, and the realtime voice stack distinct.
00:06:00 They didn't. That tells you Google is betting the future of generative media is a single multimodal inference endpoint, not a portfolio of point models. Whether that bet pays out depends on how soon a single weight set can outperform three specialised ones at their own jobs.
00:06:18 Today, it doesn't — the backflip clip is evidence — but the direction Google's investing in is clear.
Antigravity 2.0 and the OS that boots Doom
00:06:25 Antigravity 2.0 was the demo. Onstage, Varun Mohan ran Gemini 3.5 Flash inside the new Antigravity harness and built an operating system from scratch. Live. Or rather, the walkthrough was live; the result was pre-baked, since the actual build took twelve hours. Here are the numbers Google put on the screen.
00:06:45 Ninety-three parallel sub-agents, fifteen thousand model requests, two point six billion tokens consumed, twelve hours of wall-clock time, and less than one thousand dollars in API credits. Output: a microkernel, a basic shell, a window manager, and just enough graphics scaffolding to boot Doom.
00:07:04 Cue the room. Hold the altitude on this one. It's not a Linux replacement. The OS does one famous thing. Nobody's pushing it to production. The point of the demo was the orchestration shape — that ninety-three coding agents can run in parallel under a single harness for twelve hours and still produce a coherent codebase at the end.
00:07:26 A year ago, an order of magnitude smaller scope would have cost more in compute and three months in engineer time, and would have needed a researcher driving it the whole way. A thousand dollars of inference producing a working microkernel that runs Doom is a real productivity number you can put on a slide.
00:07:46 It's not the slide it would have been five years ago, but it's a slide. The skeptical read. The demo was rehearsed end-to-end. The OS hasn't been fuzzed, hasn't shipped, and isn't running anyone's payroll. The success rate of an unscripted ninety-three-agent run is the number Google didn't publish.
00:08:05 I'd want a few independent reproductions before I trusted the eighty-cents-per-line economics to my own backlog. The Antigravity SDK and the new Managed Agents API in Gemini are how that reproduction is supposed to happen, and they shipped same-day. I'll be looking for the first non-Google teams to share their own multi-hour parallel-agent runs and tell us what the dollars-per-shipped-feature number actually looks like.
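The slide numbers can be cross-checked against each other with a few divisions. The sketch below uses only the figures quoted above and the Flash list prices from earlier in the episode. It treats "15k+" requests as a floor and "under $1K" as a ceiling, and it assumes, for the last calculation only, that every token was billed as input at list price, which the source does not confirm.

```python
# Figures from the keynote slide, as reported in the episode.
SUB_AGENTS = 93
MODEL_REQUESTS = 15_000      # reported as "15k+", so treated as a floor
TOKENS = 2_600_000_000
WALL_CLOCK_HOURS = 12
API_SPEND_USD = 1_000        # reported as "under $1K", so treated as a ceiling

# Gemini 3.5 Flash list prices quoted in the episode (USD per 1M tokens).
INPUT_PER_M = 1.50
CACHED_INPUT_PER_M = INPUT_PER_M * 0.10  # 90 percent cached-input discount


def demo_ratios() -> dict[str, float]:
    """Derive per-request, per-agent, and per-token ratios from the slide."""
    return {
        "tokens_per_request": TOKENS / MODEL_REQUESTS,
        "requests_per_agent": MODEL_REQUESTS / SUB_AGENTS,
        "tokens_per_second": TOKENS / (WALL_CLOCK_HOURS * 3600),
        "usd_per_million_tokens": API_SPEND_USD / (TOKENS / 1_000_000),
    }


def min_cached_fraction() -> float:
    """Smallest cached share that fits the spend ceiling.

    Assumption: all tokens are input tokens billed at list price. Output
    tokens cost more, so any real output share pushes this bound higher.
    """
    blended = API_SPEND_USD / (TOKENS / 1_000_000)
    return (INPUT_PER_M - blended) / (INPUT_PER_M - CACHED_INPUT_PER_M)


if __name__ == "__main__":
    for name, value in demo_ratios().items():
        print(f"{name}: {value:,.2f}")
    print(f"min cached fraction at list prices: {min_cached_fraction():.1%}")
```

The blended rate works out to well under the uncached input price, so either the large majority of tokens hit the cache or the credits were not priced at list. The slide does not say which, and that is one more number to ask for in an independent reproduction.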
Spark, and the always-on agent tier
00:08:33 Spark is the announcement that flew under the demo, but it's the one that changes the product surface most. Gemini Spark is a personal AI agent that runs on dedicated Google Cloud VMs, twenty-four hours a day, while your laptop is closed. It checks in with you before any sensitive action — purchases, sends, irreversible edits — and otherwise just keeps running.
00:08:55 The target workloads Google pitched were the ones you can't quite delegate to ChatGPT today: monitor these listings and tell me when something matches; watch this project board and prep a summary; draft replies to overnight emails and queue them for review; and chase down vendors.
00:09:13 Long-horizon tasks measured in days, not turns. Spark is Google's direct answer to where Claude has been heading with the Agent SDK and ChatGPT has been heading with background agents. The form factor is now standard across the three frontier vendors: a server-resident agent loop you can talk to like an inbox, with a permission boundary and an approval queue.
00:09:35 What's been missing from each lab's version is integration — agents that can reach Drive, Gmail, Calendar, and Search without you wiring tools. Spark gets that almost for free, because it's Google. The number Google didn't put on the slide was the price. Hassabis said pricing later.
00:09:53 If Spark turns out to be bundled with the existing Google AI plan, the unit economics for agent-resident apps shift overnight — every Gemini subscriber suddenly has a server agent they didn't have last week. If it's a two-hundred-dollar add-on like Claude Max, it's a luxury tier and the indie-builder ecosystem keeps owning the long tail.
00:10:14 That price disclosure is the I/O thread I'd track next.
Karpathy
00:10:17 Now the lab response, starting with the one everyone read first. Andrej Karpathy posted at eight oh five Pacific yesterday. Four sentences. Quote: Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative.
00:10:35 I am very excited to join the team here and get back to R and D. I remain deeply passionate about education and plan to resume my work on it in time. End quote. The context. He co-founded OpenAI in 2015, left for Tesla in 2017 to lead Autopilot, came back to OpenAI in 2023 for a year, then left in 2024 to start Eureka Labs, his AI-for-education project.
00:10:59 Eighteen months on his own. And now he's at Anthropic on Nick Joseph's pre-training team. Anthropic told TechCrunch the charter is to start a team that uses Claude itself to accelerate pre-training research — model-in-the-loop on the most compute-expensive phase of building a frontier model.
00:11:19 Ethan Mollick wrote a sentence yesterday that lands on this directly. Quote: One interesting side feature of recursive self-improvement, to the extent that is happening, is that it makes the Big Three labs more appealing to talent, and shortens the runway for launching a potential competitor instead at the same time.
00:11:41 End quote. That's the read I'd offer. Karpathy spent eighteen months trying to build something on his own. He looked at the slope and decided the slope compounds faster inside one of the three frontier labs than outside any of them. Eureka isn't dead — he said he plans to resume it — but the order of operations is now: spend the formative years inside Anthropic first, education later.
00:12:07 A sober update for any senior engineer telling themselves the indie play still has a clean runway. And then there's the timing. Karpathy could have posted any morning. Anthropic could have asked him to wait until I/O was over. They didn't. Whether that's strategy or just two people landing news when it was ready, the day's narrative bent toward one reading: the talent voted for Anthropic, no matter what Google rolled out on stage.
00:12:35 Google had the bigger announcement. Anthropic had the bigger signal. Both were true by lunchtime.
Alibaba's full-stack answer: Qwen 3.7-Max and a new chip
00:12:42 The other competitive response came from Alibaba, and it's stronger than anything the US labs put up yesterday. Qwen 3.7-Max hit Hacker News this morning under the blog title, The Agent Frontier. Alibaba had been seeding Max and Plus on the Arena leaderboard for five days with no press release — the standard Qwen launch pattern.
00:13:04 Max ended preview at Elo fourteen seventy-five, number thirteen overall, number seven in math. The headline product claims are long-horizon: thirty-five-hour autonomous task runs, support for more than a thousand registered tools. Whether those numbers hold up in independent harness tests is the next thing to track, but the framing is the right one — Qwen positioning itself explicitly as the agent frontier rather than the open-weight frontier.
00:13:34 And Alibaba did it as a stack. The same Alibaba Cloud summit launching Max is also launching the Zhenwu M890, Alibaba's new AI inference chip. Model and silicon on the same day. The closest US analogue would have been if Google had launched Omni alongside a new TPU SKU, which they didn't quite do.
00:13:53 Alibaba is treating agent-tier model launches as a chip event, and that could change the cost-curve story for Chinese inference over the next year. It's also tighter model-and-silicon coordination than any US frontier lab showed this week, including Google, which designs its own TPUs.
00:14:12 The timing is not subtle. Alibaba's summit lands on I/O day two. According to the coverage, the launch date was pulled forward so it could land this week. Nobody at Alibaba is pretending this is anything other than counter-programming.
DeepSeek hires a harness team
00:14:28 Then there was a smaller post that's a bigger signal. Deli Chen, an engineer at DeepSeek, tweeted yesterday: We're hiring. DeepSeek is forming a new Harness team to build Code Harness from the ground up — maybe you can call it DeepSeek Code or something like this.
00:14:45 Beijing-based. Two roles open — a Harness Product Manager and Harness R and D. The post collected twenty-three thousand views and three hundred fifty likes in a day, which for a DeepSeek personnel tweet is major engagement. The reason to mark this. A year ago, the word harness was a research term you'd see in METR papers.
00:15:05 Today it's a hiring category at every frontier lab. Anthropic has the Claude Code team. OpenAI has the Codex team. Google has the Antigravity team. Alibaba is bundling Qwen with their own harness already. DeepSeek was the last of the major Chinese labs publicly building everything model-first, and yesterday they admitted in a job listing that the model alone is no longer the product.
00:15:30 The product is the model plus the loop around it — the tool registry, the planner, the memory, the security policy, the approval queue, and the failure-recovery logic. If you're a senior engineer thinking about where to spend the next two years, harness work is now a discipline.
00:15:47 It has its own pattern library, its own failure modes around prompt injection and tool poisoning, and its own evaluation problem. DeepSeek opening this team this week tells you that no major lab thinks the next eighteen months are about better weights alone. They're about the loop the weights run inside.
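To make "the loop around the model" concrete, here is a toy Python sketch of three of the pieces named above: a tool registry, an approval queue for sensitive actions, and a memory log with basic failure recovery. Every class and method name is hypothetical and describes no lab's actual harness. The planner is left out on purpose: whoever calls `step` plays that role.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    name: str
    run: Callable[[str], str]
    sensitive: bool = False  # sensitive tools wait for human approval


@dataclass
class Harness:
    tools: dict[str, Tool] = field(default_factory=dict)
    approval_queue: list[tuple[str, str]] = field(default_factory=list)
    memory: list[str] = field(default_factory=list)

    def register(self, tool: Tool) -> None:
        """Add a tool to the registry under its name."""
        self.tools[tool.name] = tool

    def step(self, tool_name: str, arg: str) -> str:
        """Execute one planned action, or park it if it needs approval."""
        tool = self.tools.get(tool_name)
        if tool is None:
            result = f"error: unknown tool {tool_name!r}"
        elif tool.sensitive:
            self.approval_queue.append((tool_name, arg))
            result = "pending approval"
        else:
            try:
                result = tool.run(arg)
            except Exception as exc:  # failure recovery: record it, keep looping
                result = f"error: {exc}"
        self.memory.append(f"{tool_name}({arg}) -> {result}")
        return result

    def approve_next(self) -> str:
        """A human approves the oldest queued action, which then runs."""
        tool_name, arg = self.approval_queue.pop(0)
        result = self.tools[tool_name].run(arg)
        self.memory.append(f"approved {tool_name}({arg}) -> {result}")
        return result


if __name__ == "__main__":
    harness = Harness()
    harness.register(Tool("search", lambda q: f"results for {q}"))
    harness.register(Tool("send_email", lambda body: "sent", sensitive=True))
    print(harness.step("search", "rust paxos"))
    print(harness.step("send_email", "draft reply"))
    print(harness.approve_next())
```

Even at this size the design questions show up: what counts as sensitive, who drains the queue, and what the model sees when a tool fails. Those are the parts a harness team owns.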
Railway, GCP, and the substrate question Google didn't address
00:16:07 While I/O was running, something else was happening on the same Google infrastructure Google was busy selling. At twenty-two twenty UTC on the nineteenth, Google Cloud's automated systems incorrectly suspended Railway's production account as part of a platform-wide automated action.
00:16:25 Railway is one of the better-known PaaS companies for developers. Their dashboard went dark, their API went dark, and their persistent disks were inaccessible. For about an hour, Railway's edge proxies kept serving traffic from cached routing tables. Then the cache expired, and everything they ran on AWS and on their own Metal hardware also went unreachable, because the routing-table source of truth was a control-plane API hosted on Google Cloud.
00:16:55 Recovery took eight hours. By the time it was over, GitHub had piled on by rate-limiting Railway's OAuth and webhook integrations on the burst of retried calls. The postmortem by Chandrika Khanduri and Cody De Arkland is candid in a way you don't always see. They explain that Railway's network was designed as a mesh ring across Google Cloud, AWS, and their own metal — but the discovery mechanism for that mesh was a control-plane API hosted in a single cloud.
00:17:25 So the mesh worked for cached routes for one hour. Then it didn't. They're removing GCP from the data-plane hot path and extending high-availability database quorum across AWS and Metal, so that any single cloud disappearing will still leave the database with a majority.
00:17:43 The line I'd pull is this. Khanduri and De Arkland frame it bluntly: Railway picked the vendors, so Railway owns the outcome. Customers don't care whether the failure was Google or Railway — they see your product. That is the right framing for a public postmortem, and it's the framing every team running on a single hyperscaler should internalize before their own bad day.
00:18:07 The reason this belongs next to I/O is direct. Spark and Antigravity 2.0 are both pitches for moving more of your workload onto Google Cloud — the agent runtime, the harness, the model, the storage, and the auth. The Railway story is what trusting one vendor to be your full substrate looks like on the bad day.
00:18:28 An automated suspension, no proactive outreach to customers before the action, and eight hours of downtime. You end up owning it publicly in your own postmortem, because the customer was never going to read the GCP status page.
What 100K lines of Rust with AI actually looks like
00:18:42 Step away from the keynote for a few minutes. The most interesting builder writeup of the week is from Cheng Huang, an architect who published Learnings from One Hundred Thousand Lines of Rust with AI. The project: a Rust multi-Paxos consensus engine, rebuilt from scratch, mirroring the feature set of Azure's RSL — the replicated state library that underpins most of Azure's core services.
00:19:08 He drove a hundred and thirty thousand lines of production Rust in roughly six weeks, with thirteen hundred tests covering more than sixty-five percent of the codebase. Throughput went from twenty-three thousand operations per second to three hundred thousand in three weeks of performance tuning.
00:19:27 Almost all of that with AI agents in the driver's seat and Huang doing review and direction. His drivers are Claude Code and Codex CLI, with VS Code for diffs and minor edits. His subscription anecdote is funny enough to quote. He pays a hundred dollars a month for Anthropic's Max plan, and quote: This became a forcing function — if I don't kick off a coding task with Claude before bed, I feel like I'm wasting money.
00:19:54 End quote. When Codex CLI shipped, he added a second ChatGPT Plus subscription and rotates them — Anthropic Monday through Wednesday, OpenAI Thursday through Sunday. He also notes that on his subjective sample, GPT-5 High writes better code contracts than Opus four point one, for what it's worth.
00:20:13 The technique to pull forward is what he calls AI-driven code contracts. Preconditions, postconditions, and invariants attached to critical functions — converted into runtime asserts during testing, compiled out in production. He applies them at three levels. First, AI writes the contracts; he reviews and refines.
00:20:33 Second, AI generates targeted test cases from each postcondition. Third, and this is where it pays off, AI translates the contracts into property-based tests that explore randomised input spaces. One AI-generated contract caught a subtle Paxos safety violation before it ever shipped.
00:20:52 In the old world that bug surfaces in production as a replication inconsistency you debug on a Saturday from a customer report. That stack — contracts as the AI-readable spec, property tests as the AI-driven verifier, and the human architect reviewing contract correctness rather than diff-reading every function — is what serious agentic coding looks like outside the demo.
00:21:16 Not ninety-three agents in parallel booting Doom. A senior engineer with a clear design markdown, a two-hundred-dollar-per-month subscription stack, and a contract regime that turns LLM output into something Paxos-safe. Huang's writeup landed the same week Antigravity 2.0 did, and the contrast is useful.
00:21:36 The framing on the I/O stage is, fire up dozens of agents and watch the magic. The framing in Huang's article is, write the contracts and let the agents enforce them. Both are real. But if I'm picking what to copy into my own workflow tomorrow, it's the contract regime.
00:21:53 The ninety-three-agent run is a benchmark. The contract loop is a practice.
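For readers who want to see the shape of the contract loop, here is a small sketch. It is not Huang's code and not a Paxos implementation: the `Log` type is a toy replicated-log stand-in, and the hand-rolled `XorShift` generator substitutes for a real property-testing crate so the example needs nothing outside the standard library. It does use Rust's `debug_assert!`, which is checked in debug and test builds and omitted from optimized builds by default, matching the "asserts during testing, compiled out in production" description.

```rust
/// Toy replicated log, for illustration only.
struct Log {
    entries: Vec<u64>,
    committed: usize, // number of entries known to be committed
}

impl Log {
    fn new() -> Self {
        Log { entries: Vec::new(), committed: 0 }
    }

    /// Invariant: the committed prefix never extends past the end of the log.
    fn invariant(&self) -> bool {
        self.committed <= self.entries.len()
    }

    fn append(&mut self, value: u64) {
        let old_len = self.entries.len();
        self.entries.push(value);
        // Postcondition: exactly one entry was added.
        debug_assert!(self.entries.len() == old_len + 1);
        debug_assert!(self.invariant());
    }

    /// Advance the commit point. Requests that would move it backwards or
    /// past the end of the log are rejected.
    fn commit(&mut self, upto: usize) -> bool {
        let before = self.committed;
        if upto < before || upto > self.entries.len() {
            return false;
        }
        self.committed = upto;
        // Postcondition: the commit point is monotonic.
        debug_assert!(self.committed >= before);
        debug_assert!(self.invariant());
        true
    }

    fn truncate_uncommitted(&mut self, new_len: usize) {
        // Precondition: only entries that are not yet committed may be dropped.
        debug_assert!(new_len >= self.committed && new_len <= self.entries.len());
        self.entries.truncate(new_len);
        debug_assert!(self.invariant());
    }
}

/// Tiny xorshift PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Property: after any random sequence of operations, the invariant holds
/// and the commit point has never moved backwards.
fn check_random_ops(seed: u64, steps: usize) -> bool {
    let mut rng = XorShift(seed | 1); // xorshift state must be nonzero
    let mut log = Log::new();
    let mut last_committed = 0;
    for _ in 0..steps {
        match rng.next_u64() % 3 {
            0 => log.append(rng.next_u64()),
            1 => {
                // May be out of range on purpose; commit() should reject it.
                let upto = (rng.next_u64() % 8) as usize;
                log.commit(upto);
            }
            _ => {
                // Pick a length inside the precondition's allowed range.
                let span = log.entries.len() - log.committed + 1;
                let new_len = log.committed + (rng.next_u64() as usize % span);
                log.truncate_uncommitted(new_len);
            }
        }
        if !log.invariant() || log.committed < last_committed {
            return false;
        }
        last_committed = log.committed;
    }
    true
}
```

Calling `check_random_ops` with a seed and a step count from a test exercises all three contract kinds at once. The human review effort goes into whether `invariant` and the assertions say the right thing, not into reading every line the agent wrote.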
What today added up to
00:21:58 Sign-off. Pull a few threads. Google had the bigger announcement. Three new models, a coding harness that builds a microkernel in twelve hours, and Hassabis on stage compressing his AGI timeline by half. The Flash and Antigravity numbers will change the cost curve for agentic workloads if they hold up in independent reproductions.
00:22:15 Anthropic had the bigger signal. Karpathy walked into a pre-training role built around using Claude to improve Claude. If recursive self-improvement is even partially happening, that's the org chart that proves it. Alibaba did something neither lab did: shipped a model and a chip in the same week, treating the agent frontier as a vertically integrated product.
00:22:34 DeepSeek admitted, in a job listing, that the model alone is no longer the product. Railway's incident report is the reminder that the substrate Google is selling you on stage is the same substrate that took an entire PaaS down for eight hours, starting yesterday evening UTC.
00:22:48 And Cheng Huang's writeup is the proof, in the same week as the ninety-three-agent demo, that the serious gains for builders right now come from contracts and property tests, not from spinning up more parallel agents. Hassabis's foothills line is selling the slope, not measuring it.
00:23:03 The slope is there. The summit isn't visible from here. Tomorrow we'll see what reproductions of the Antigravity demo look like in the wild, and whether Spark's price tag turned into a number. Talk then. — Lenar Kess.