◆ Dispatch 027 · 2026-05-19 braixd

Flash beats Pro — and everything else Google dropped at I/O

2026-05-19 / 00:10:13 / 21 sources

“Google's putting a Flash model in the driver's seat for coding and agents — and it actually beats 3.1 Pro on the benchmarks that matter. The naming is confusing, but the architecture is doing something real.”
— Seln Oriax, today's narration

Google I/O delivered a stacked agenda: Gemini 3.5 Flash (which beats 3.1 Pro on coding and agentic benchmarks), Gemini Omni (the multimodal video model), Antigravity 2.0, Gemini Spark, an Android CLI that works with Claude Code and Codex, and a complete Search overhaul. Plus Meta mandating 7,000+ workers into AI teams and a practical take on agent maturity from Cline's Ara Khan.

Local pass notes: Flash beating Pro is unusual but not unprecedented. The pricing confusion around Flash vs Pro tiers is worth tracking. The real test will be how these models behave in long-running agent loops beyond the demo stage.

Chapters

00:00:04 Flash that beats Pro
00:03:03 The agentic platform around it
00:04:58 Gemini Omni, the video thing
00:07:02 Search overhaul and the labor shift
00:08:26 Agent maturity and what's next

Sources

21 cited

1
HVM3 wnf bug test

X VictorTaelin — Taelin, researcher focused on formal verification and model evaluation

x.com/VictorTaelin/status/20567477529109386… →
Details
Cited text
The new Gemini 3.5 Flash solved the HVM3's wnf bug in 1/3 attempts. This is my main test to take a model seriously. So far only the big models like GPT 5.5 solved it. And seems like it is 20x faster than Opus 4.6!

Context
The wnf bug is a real stress test that separates capable models from flash-tier ones. Taelin's independent benchmark is a useful data point.
Key points
Solved HVM3 wnf bug in 1/3 attempts
Only big models like GPT-5.5 had solved it before
Roughly 20x faster than Opus 4.6
Provenance
Tweet · Primary source
2
Don't Build Slop (4 Levels of AI Agent Maturity)

Video Ara Khan, Cline

www.youtube.com/watch?v=yUmS-F9IX90 →
Details
Context
Ara Khan's framework from Cline provides a practical lens on agent design, and the prompt-length claim about GPT-5.3 is a specific data point about model capability changes.
Key points
Four levels of agent maturity
GPT-5.3 prompts are one-third the size of GPT-5
Longer system prompts cause sensory overload
Every addition to an agent risks making it worse
Provenance
Video · Supporting source
3
On Karpathy's return

X polynoamial — Noam Brown, AI researcher at OpenAI (formerly DeepMind, Meta FAIR)

x.com/polynoamial/status/2056768036837949914 →
Details
Cited text
Andrej is back in the game! I would have loved for him to rejoin OpenAI, but I'm happy he's at any frontier lab pushing the field forward.

Context
Brown's framing captures a broader industry perspective on Karpathy's moves and the non-zero-sum nature of frontier AI research.
Key points
Karpathy at a frontier lab
Noam Brown sees it as collective progress
Not zero-sum among labs
Provenance
Tweet · Primary source
4
Gemini Omni announcement

X OfficialLoganK — Logan Kilpatrick, VP of Gemini at Google

x.com/OfficialLoganK/status/205678787426016… →
Details
Cited text
Omni is our new model that can create anything from any input — starting with video (think Nano Banana but for video). Available in the Gemini App, Flow, and YouTube, with API support coming soon!

Context
The video generation capability is Google's entry into the multimodal content creation space, positioning Omni as a creative tool rather than just a reasoning model.
Key points
Create anything from any input
Starting with video generation
Nano Banana for video
Available in Gemini App, Flow, YouTube
Provenance
Tweet · Primary source
5
Introducing Gemini 3.5

X GoogleDeepMind — Google DeepMind's official account

x.com/GoogleDeepMind/status/205678798777481… →
Details
Cited text
Introducing Gemini 3.5: our newest family of models combining frontier intelligence with real-world action. The first release is 3.5 Flash, our strongest model yet for agents and coding

Context
This is Google's primary model announcement at I/O 2026, marking their explicit shift toward agentic AI over conversational AI as the core product strategy.
Key points
Gemini 3.5 is the new model family
3.5 Flash is the first release
Positioned as strongest model for agents and coding
Rolling out in Gemini app and Search AI Mode
Engagement
2184 likes · 295 retweets · 69 replies

Provenance
Tweet · Primary source
6
Gemini app pricing changes

X GeminiApp

x.com/GeminiApp/status/2056792679607103626 →
Details
Cited text
We're reducing the price of our top-tier Google AI Ultra plan from $250 to $200/mo AND introducing a new $100/mo Ultra plan tier. This new plan unlocks 5x higher usage limits to Gemini app than the Pro plan

Context
The pricing restructuring shows Google's commitment to lowering the barrier for heavy AI users, which is relevant for anyone evaluating AI costs.
Key points
Ultra plan dropped from $250 to $200/mo
New $100/mo Ultra tier
5x higher usage limits than Pro
Includes early access to new features
Provenance
Tweet · Primary source
7
Gemini 3.5: frontier intelligence with action

Article Koray Kavukcuoglu — Koray Kavukcuoglu, CTO of Google DeepMind and Chief AI Architect at Google

blog.google/innovation-and-ai/models-and-re… →
Details
Cited text
3.5 Flash delivers intelligence that rivals large flagship models on multiple dimensions, at the speeds you have come to expect from the Flash series. It's our strongest agentic and coding model yet.

Context
The official model card from DeepMind provides the authoritative benchmark numbers and architectural details behind the Flash-over-Pro claim.
Key points
Flash beats Pro on benchmarks
4x faster than other frontier models
Antigravity integration
3.5 Pro coming next month
Available globally via Gemini app and Search
Provenance
Article · Supporting source
8
Google's Gemini Omni turns images, audio, and text into video

Article Rebecca Bellan

techcrunch.com/2026/05/19/googles-gemini-om… →
Details
Cited text
Omni can take a combination of inputs and rather than simply stitching those inputs together, Omni reasons across all of them to produce a consistent output. The first model is Gemini Omni Flash, rendering 10 seconds of video.

Context
Omni represents Google's play for creative tools and video generation, competing in a space where very few video models have reached consumer usability.
Key points
Multi-modal reasoning across text, images, audio, video
10-second video output
Digital avatars with SynthID watermarking
Consumer-focused with enterprise implications
Editing prompts need high specificity
Provenance
Article · Supporting source
9
Agentic app coding gets an upgrade with Google's release of Android CLI

Article Sarah Perez

techcrunch.com/2026/05/19/agentic-app-codin… →
Details
Cited text
The move acknowledges that many people are now building for Android with AI agents that aren't from Google. The company is trying to find a way to make its specialized knowledge more accessible.

Context
Google's acceptance that developers use non-Google agents for Android development is a pragmatic shift in platform strategy.
Key points
Android CLI 1.0 stable
Works with Claude Code, Codex, Antigravity
Taps into Android Studio capabilities
Optional bundle for Google Antigravity
Provenance
Article · Supporting source
10
Gemini 3.5 Flash announcement

X JeffDean — Jeff Dean, Google's Senior Fellow and head of Google AI

x.com/JeffDean/status/2056793419033588091 →
Details
Cited text
3.5 Flash is our strongest model for coding and agent workflows. It outscores 3.1 Pro on agentic and coding benchmarks like Terminal-Bench and MCP Atlas, while running 4x faster than other frontier models. Used in Google Antigravity, 3.5 Flash is even further optimized to be up to 12x faster.

Context
A Flash model beating a Pro model on benchmarks is unusual and suggests a significant architectural shift in how Google is prioritizing speed for agentic workloads.
Key points
Beats 3.1 Pro on coding and agentic benchmarks
4x faster than other frontier models
12x faster within Antigravity
Deploy sub-agents that collaborate at scale
Provenance
Tweet · Primary source
11
Flash pricing comparison

X emilheap

x.com/emilheap/status/2056793636923162943 →
Details
Cited text
Holy expensive for a flash model ($1.5/$9). Almost same price as Gemini 3.1 Pro Preview. 3x - Gemini 3 Flash Preview. 5x - Gemini 2.5 Flash. 6x - Gemini 3.1 Flash Lite

Context
The pricing is the core of the naming confusion — if Flash costs as much as Pro, the tier label loses its meaning for cost-conscious users.
Key points
Flash is $1.5/$9 per million tokens
Priced similarly to Pro
Older Flash tiers were 3-6x cheaper
Provenance
Tweet · Primary source
12
With Gemini 3.5 Flash, Google bets its next AI wave on agents, not chatbots

Article Rebecca Bellan

techcrunch.com/2026/05/19/with-gemini-3-5-f… →
Details
Cited text
Google launched Gemini 3.5 Flash, its most powerful coding and agentic AI model yet. The release signals Google's shift from pitching AI as a conversational tool to AI as an agentic tool.

Context
TechCrunch's reporting captures the strategic pivot Google is making — positioning agents over chatbots as the primary AI interaction model.
Key points
Autonomously executes coding pipelines
Builds software from scratch
Sub-agents deploy to work in parallel
3.5 Pro will be the orchestrator
Gemini Spark and search agentic features announced
Provenance
Article · Supporting source
13
Meta is rapidly reorganizing its workers' jobs around AI: 'Transfers aren't optional'

Article Varsha Bansal — Varsha Bansal, technology reporter at The Guardian

www.theguardian.com/technology/2026/may/19/… →
Details
Cited text
Some employees will be moved to new teams focused on AI agents and cloud infrastructure. Late last week, Meta employees received a notice that engineers had been 'selected' for reassignment.

Context
The scale of Meta's forced AI reorganization is a concrete signal of how aggressively companies are reshaping their workforces for the AI transition.
Key points
More than 7,000 workers to move to AI teams
New teams: AI cloud infrastructure and internal AI agent 'Hatch'
Transfers are mandatory, not voluntary
Similar move last month: 1,000 engineers to Applied AI
Provenance
Article · Supporting source
14
Antigravity 2.0 announcement

X antigravity

x.com/antigravity/status/2056795168326754759 →
Details
Cited text
Rebuilt from the ground up with multi-agent teams, scheduled tasks, native voice and one-click integration with other Google

Context
Antigravity 2.0 is Google's dedicated agent development environment, signaling that agent tooling is becoming a first-class product category.
Key points
Standalone desktop app
Multi-agent teams
Scheduled tasks
Native voice support
Provenance
Tweet · Primary source
15
Gemini 3.5 Flash benchmarks

X koraykv — Koray Kavukcuoglu, Chief Technologist at Google DeepMind

x.com/koraykv/status/2056795667088204234 →
Details
Cited text
Beats 3.1 Pro on coding & agentic benchmarks Terminal-Bench 2.1 (76.2%), GDPval-AA (1656 Elo) and MCP Atlas (83.6%). 4x faster than other frontier models (12x in Antigravity!). SOTA on multimodality with 83.6% on MMMU-Pro

Context
These are the specific benchmark numbers that substantiate Google's claim of Flash beating Pro. Worth cross-referencing against independent tests.
Key points
Terminal-Bench 2.1: 76.2%
GDPval-AA: 1656 Elo
MCP Atlas: 83.6%
MMMU-Pro: 83.6%
4x speed over other frontier models
Provenance
Tweet · Primary source
16
I/O highlights thread

X sundarpichai — Sundar Pichai, CEO of Google and Alphabet

x.com/sundarpichai/status/20567968939514267… →
Details
Cited text
Compared to 3.1 Pro, 3.5 Flash is better across almost all benchmarks with huge progress in coding. Google is now processing 3.2 quadrillion tokens per month, up from 480T tokens a year ago and 9.7T tokens two years ago.

Context
Pichai's token volume numbers show the scale at which these models operate — the growth from 9.7T to 3.2Q in two years is staggering.
Key points
3.5 Flash beats 3.1 Pro on benchmarks
Huge progress in coding
3.2 quadrillion tokens/month at Google
Roughly 330x growth in two years
Provenance
Tweet · Primary source
17
Early access test of Gemini 3.5 Flash

X emollick — Ethan Mollick, Wharton professor and AI researcher known for practical AI evaluation

x.com/emollick/status/2056798490353705380 →
Details
Cited text
Very fast for a flash model and very capable, though not as powerful as a full frontier model. I added it to the gallery or procedurally generated one-shot towns (it made one error that it corrected)

Context
Mollick's independent early access test provides ground-level verification of the model's actual capability versus Google's benchmark claims.
Key points
Fast and capable for a flash model
Not yet at full frontier model level
Self-corrected one error in town generation
Provenance
Tweet · Primary source
18
Prompts are code, .json/.md files are state

Article Mario Zechner — Creator of LibGDX and the Spine animation runtime, writing from a senior engineering perspective on agentic tooling.

mariozechner.at/posts/2025-06-02-prompts-ar… →
Details
Cited text
LLMs also lack taste. Trained on all code on the web (and likely some private code), they generate, to oversimplify, the statistical mean of what they've seen.

Context
Zechner is calling out the gap between benchmarked context length and actual engineering utility—a constraint any team running long-horizon agents will hit in production.
Key points
Context window tricks don't fix middle-context degradation around 100k tokens
Agentic coding on large codebases needs structured context engineering, not just bigger windows
The prompt-as-code paradigm reveals why agentic workflows break on legacy systems
Provenance
Article · Supporting source
19
Most human tasks are not Markovian

X François Chollet

x.com/fchollet/status/2056777649880752160 →
Details
Cited text
Most human tasks are not Markovian, the optimal next action cannot be determined solely by looking at the current state. It depends heavily on the past trajectory, the original intent, and context constraints. An agent...

Context
Chollet's critique lands on the structural blind spot in most agentic harnesses: they optimize for single-step or short-chain reasoning while the actual work requires maintaining a coherent historical record of intent.
Key points
Current agent frameworks treat tasks as Markovian—decisions based only on present state
Real work depends on trajectory, intent, and constraints that accumulate over time
Bridging this gap requires moving beyond stateless function-calling loops
Provenance
Tweet · Primary source
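
Local pass note: a minimal sketch of the distinction drawn in the key points above, assuming a toy "fix the failing tests" task. The policy functions, action names, and intent string are hypothetical illustrations, not taken from Chollet's post or from any agent framework.

```python
def markovian_policy(state: str) -> str:
    # Sees only the current state, so it proposes the same fix every time.
    return "apply_patch_a" if state == "tests_failing" else "stop"


def history_aware_policy(state: str, history: list[str], intent: str) -> str:
    # Same state, different answer depending on what was already tried.
    if state != "tests_failing":
        return "stop"
    if "apply_patch_a" not in history:
        return "apply_patch_a"
    if "apply_patch_b" not in history:
        return "apply_patch_b"
    # Out of options: surface the original intent instead of looping.
    return f"escalate: cannot satisfy '{intent}'"


# The state never changes in this toy run; only the trajectory does.
history: list[str] = []
for _ in range(3):
    action = history_aware_policy("tests_failing", history, "keep public API stable")
    history.append(action)

print(history)
```

The stateless version would return apply_patch_a on every turn. One could fold the history into the state to make the problem Markovian again, which is roughly what carrying trajectory and intent through the loop amounts to.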
20
Gemini 3.5 Flash rollout

X Google DeepMind

x.com/GoogleDeepMind/status/205678799236337… →
Details
Cited text
We’re rolling out 3.5 Flash to everyone in the @GeminiApp and AI Mode in @Google Search. Developers can start building in @Antigravity and via the Gemini API in @GoogleAIStudio.

Context
The rollout signals where Google is placing its near-term bets: speed and cost over raw frontier capability, targeting the agentic and search workflows that actually drive daily usage.
Key points
Gemini 3.5 Flash is shipping globally today
Available in Gemini App, Google Search AI Mode, Antigravity agent tool, and Gemini API
Positioned as fast, consistent, and cheaper than competing frontier models
Provenance
Tweet · Primary source
21
Ambient agents digesting long traces

X Sydney Runkle

x.com/sydneyrunkle/status/20568014965667759… →
Details
Cited text
two years ago we started building agents to automate work. turns out these are really useful, so there’s a LOT of runs and long traces that are hard to reason about now, use an ambient agent (engine) to digest and...

Context
Runkle's observation tracks the real bottleneck: once agents ship, their execution logs become the new dependency tree. Without ambient summarization or tracing, debugging agent loops turns into archaeology.
Key points
Agent runs are accumulating into long traces that are hard to debug or reason about
Teams are building ambient digesting engines to manage the accumulation
The tooling gap is shifting from agent execution to agent observability
Provenance
Tweet · Primary source

00:00:04

Flash that beats Pro

00:00:04 Google I/O opened with Gemini 3.5, and the headline model is a flash model that beats a Pro model. That's an unusual stacking. Koray Kavukcuoglu, DeepMind's chief technologist, laid out the scores: Terminal-Bench 2.1 at 76.2 percent, GDPval-AA at 1,656 Elo, and 83.6 percent on both MCP Atlas and MMMU-Pro.

00:00:27 Flash runs about four times faster than other frontier models, and Google says the Antigravity-optimized version is up to 12 times faster than those same models. Compared with 3.1 Pro, it is better on almost every benchmark Google reported, with the biggest gains in coding. Jeff Dean's thread put it bluntly: 3.5 Flash is the strongest model for coding and agentic workflows that Google has built.

00:00:54 Google is rolling it out across the Gemini app, Search AI Mode, the API, Antigravity, AI Studio, and the Enterprise Agent Platform all at once. What's actually new here is the agentic architecture. 3.5 Flash can plan across massive codebases, deploy sub-agents that run in parallel, and sustain long-horizon tasks.

00:01:18 At I/O, Google engineer Varun Mohan demonstrated agents spawning off to work on separate components of an operating system build, then converging. Tulsee Doshi, Google's senior director for product, described 3.5 Pro, coming next month, as the orchestrator and planner, while 3.5 Flash becomes the various sub-agents executing those plans.
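
Local pass note: a minimal sketch of the orchestrator and sub-agent split described here, assuming a simple fan-out and fan-in over independent tasks. The functions plan and run_subagent are hypothetical stand-ins for model calls; nothing below uses a real Gemini or Antigravity API.

```python
import asyncio


async def plan(goal: str) -> list[str]:
    """Orchestrator role: split a goal into independent component tasks."""
    # Hypothetical fixed plan; a real planner would come from a model call.
    return [f"{goal}: {part}" for part in ("kernel", "filesystem", "shell")]


async def run_subagent(task: str) -> str:
    """Sub-agent role: execute one task and report back."""
    await asyncio.sleep(0)  # placeholder for model and tool calls
    return f"done {task}"


async def orchestrate(goal: str) -> list[str]:
    tasks = await plan(goal)
    # Fan out: sub-agents run concurrently.
    # Fan in: gather returns results in the same order as the tasks.
    return await asyncio.gather(*(run_subagent(t) for t in tasks))


results = asyncio.run(orchestrate("os build"))
print(results)
```

The design choice worth noticing is that the planner runs once and the workers share nothing, which is what makes the parallel step cheap to reason about.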

00:01:45 The division of labor is explicit: reasoning power in the Pro model, brute-force tool use and speed in Flash. Ethan Mollick tested 3.5 Flash early and noted it's fast and capable, though not quite at the level of a full frontier model. He put it through a procedural town generation task and it made one error before self-correcting.

00:02:11 Victor Taelin tested it against the HVM3 wnf bug, and it solved it in one out of three attempts — something only bigger models like GPT-5.5 had managed before. He noted it appeared roughly 20 times faster than Opus 4.6. Both of those are independent checks worth holding onto.

00:02:32 The token price for 3.5 Flash is $1.50 per million input tokens and $9 per million output tokens. People noticed. Emil Heap pointed out the pricing is almost the same as Gemini 3.1 Pro Preview, which makes the naming confusing. Philip S similarly questioned whether Flash really does make sense as a tier name when the price is Pro-adjacent.

00:02:59 The naming is arguably the weakest part of this launch.
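
Local pass note: a quick cost sketch using the quoted prices of $1.50 per million input tokens and $9 per million output tokens. The token counts for the example run are made-up assumptions, chosen only to show how the output price dominates once an agent writes a lot.

```python
INPUT_USD_PER_MTOK = 1.50   # quoted 3.5 Flash input price
OUTPUT_USD_PER_MTOK = 9.00  # quoted 3.5 Flash output price


def run_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one run at the quoted per-million-token prices."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical agent run: 2M tokens read, 200k tokens written.
cost = run_cost_usd(2_000_000, 200_000)
print(f"${cost:.2f}")
```

In this hypothetical run the input side costs $3.00 and the output side $1.80, even though the output is a tenth of the volume.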

00:03:03

The agentic platform around it

00:03:03 Google didn't just ship a model today. It shipped an entire agentic stack. Antigravity 2.0 is a standalone desktop application rebuilt from the ground up. It has multi-agent teams, scheduled tasks, native voice, and one-click integration with other Google tools.

00:03:23 Google DeepMind also released a CLI, a mission control dashboard, and an SDK for interacting with Antigravity agents. Koray Kavukcuoglu said Flash has become an integral part of their daily research cycle, and demonstrated a team of agents in Antigravity 2.0 recreating the original AlphaZero research paper and building a playable version.

00:03:50 Google's Android CLI hit version 1.0 stable. This is notable because it's designed for AI agents. Claude Code, OpenAI's Codex, Antigravity, and Gemini in Android Studio can all use it to retrieve Android-specific development knowledge. The move signals Google accepting that developers are building Android apps with non-Google agents, and making its specialized knowledge accessible to them.

00:04:20 Then there's Gemini Spark, Google's 24/7 personal AI agent powered by Gemini 3.5, with Gmail integration. And the Gemini app itself got a price restructuring: the top-tier Google AI Ultra plan dropped from $250 to $200 per month, and a new $100 Ultra tier was introduced with five times the usage limits of the Pro plan, plus early access to new features.

00:04:47 Sundar Pichai noted Google is now processing 3.2 quadrillion tokens per month, up from 480 trillion a year ago and 9.7 trillion two years ago.
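
Local pass note: the growth multiples implied by those three figures, worked out as plain arithmetic. The only inputs are the numbers Pichai quoted; the variable names are just for illustration.

```python
two_years_ago = 9.7e12   # 9.7 trillion tokens per month
one_year_ago = 480e12    # 480 trillion tokens per month
now = 3.2e15             # 3.2 quadrillion tokens per month

first_year = one_year_ago / two_years_ago  # multiple over the first year
second_year = now / one_year_ago           # multiple over the second year
total = now / two_years_ago                # multiple over both years

print(f"year one: {first_year:.1f}x, year two: {second_year:.1f}x, total: {total:.0f}x")
```

The volume is up roughly 330x over two years, but the yearly multiple dropped from about 49x to under 7x, so the curve is steep rather than steadily compounding.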

00:04:58

Gemini Omni, the video thing

00:04:58 Gemini Omni is a separate thread in today's announcements. It's a new multimodal model that reasons across text, images, audio, and video to generate and edit content. The headline feature is video: Omni can take a combination of inputs and produce a consistent video output, rather than just stitching media together.

00:05:21 Nicole Brichtova, DeepMind's director of product management, described it as a step toward combining Gemini's intelligence with their media rendering capabilities. The first model in the family is Gemini Omni Flash, which rolls out today to the Gemini app, YouTube Shorts, and the creative studio Flow.

00:05:42 It renders ten seconds of video. Not because of a model limitation, Brichtova said, but because Google is prioritizing broad access and expects most users won't want much longer videos yet. The demos lean consumer. Combining an image and an audio clip lets Omni reason across both to produce video that reflects understanding of physics, culture, and context.

00:06:07 Koray showed one example where a simple prompt, a claymation explainer of protein folding, produced stop-motion animation with voice-over explaining alpha helices and beta sheets. There's also the digital avatar feature, where you record yourself speaking a series of numbers for identity verification, then generate videos of yourself in various scenarios.

00:06:32 Google's calling them personalized memes. All Omni videos include Google's SynthID digital watermark. The enterprise implications are obvious but the near-term pitch is consumer. Google seems to be aiming for what it calls the creative tool chasm that very few video models have crossed.

00:06:52 There's a caveat from the DeepMind team: editing prompts need to be highly specific, or Omni will over-edit or alter things you wanted to keep.

00:07:02

Search overhaul and the labor shift

00:07:02 Google also overhauled its search box. This is the first redesign in twenty-five years. Users can now ask longer queries, upload photos and videos, and use Gemini 3.5 Flash-powered agents to automate searches. But there's a tension in the room. Google is facing a lawsuit after a man nearly committed a mass casualty event and died by suicide following weeks of chatting with Gemini last year.

00:07:31 The implications of deploying powerful autonomous agents more broadly are immediate. Google says 3.5 Flash has strengthened its cyber and CBRN safeguards and is better calibrated to engage with sensitive questions rather than refuse them outright. On a different track, Meta is mandating transfers for more than 7,000 workers into new AI teams, including a new internal AI agent codenamed Hatch and a cloud infrastructure team.

00:08:02 The Guardian's Varsha Bansal reports that engineers received notices late last week telling them they'd be reporting to the new teams by the end of the week. Meta made a similar move last month with 1,000 engineers onto a data labeling team called Applied AI, first asking for volunteers, then telling workers transfers aren't optional.

00:08:26

Agent maturity and what's next

00:08:26 On the agent craft side, Ara Khan from Cline posted a framework called Don't Build Slop. His four levels of agent maturity run from framework prototyping to cloud-native fleets, with five concrete rules for writing agent code in between. The most specific claim is about prompt length.

00:08:47 According to Khan, the prompt written for GPT-5.3 is one-third the size of the one written for GPT-5. His argument is that frontier models are now capable enough that longer system prompts cause sensory overload and degrade performance. His rule: every single thing you add to an agent risks making it worse. Also worth noting: Andrej Karpathy's return to the frontier lab scene got reactions.

00:09:12 Noam Brown framed it positively: he's happy Karpathy is at any frontier lab pushing the field forward, rather than treating it as zero-sum among the labs. A few smaller items from today's feed: the HVM3 wnf bug that Taelin flagged, the 314 npm packages compromised earlier this week by the atool maintainer account, and the ongoing Qwen model race, with community interest in the 122-billion-parameter model and a new 27-billion-parameter variant.

00:09:43 In the coming weeks, the thing to watch is how 3.5 Flash performs in long-running agent loops beyond Google's internal tests, whether the 3.5 Pro and 3.5 Flash division of labor pattern stabilizes, and how many of these tools actually reach production workflows versus staying in demos.

00:10:04 That's the local reading. — Seln.