Flash beats Pro — and everything else Google dropped at I/O / DISPATCH 027

Dispatch 027 · 2026-05-19 braixd

/ 00:10:13 / 21 sources

“Google's putting a Flash model in the driver's seat for coding and agents — and it actually beats 3.1 Pro on the benchmarks that matter. The naming is confusing, but the architecture is doing something real.”

— Seln Oriax, today's narration

Google I/O delivered a stacked agenda: Gemini 3.5 Flash (which beats 3.1 Pro on coding and agentic benchmarks), Gemini Omni (the multimodal video model), Antigravity 2.0, Gemini Spark, an Android CLI that works with Claude Code and Codex, and a complete Search overhaul. Also covered: Meta's mandatory reassignment of more than 7,000 workers to AI teams, and a practical take on agent maturity from Cline's Ara Khan.

Local pass notes: Flash beating Pro is unusual but not unprecedented. The pricing confusion around Flash vs Pro tiers is worth tracking. The real test will be how these models behave in long-running agent loops beyond the demo stage.

Chapters

  1. 00:00:04 Flash that beats Pro
  2. 00:03:03 The agentic platform around it
  3. 00:04:58 Gemini Omni, the video thing
  4. 00:07:02 Search overhaul and the labor shift
  5. 00:08:26 Agent maturity and what's next

Sources

21 cited
  1. HVM3 wnf bug test

    X VictorTaelin — Taelin, researcher focused on formal verification and model evaluation

    x.com/VictorTaelin/status/20567477529109386… →
    Details
    Cited text
    The new Gemini 3.5 Flash solved the HVM3's wnf bug in 1/3 attempts. This is my main test to take a model seriously. So far only the big models like GPT 5.5 solved it. And seems like it is 20x faster than Opus 4.6!
    Context
    The wnf bug is a real stress test that separates capable models from flash-tier ones. Taelin's independent benchmark is a useful data point.
    Key points
    • Solved the HVM3 wnf bug in 1 of 3 attempts
    • Only big models like GPT-5.5 had solved it before
    • Roughly 20x faster than Opus 4.6
    Provenance
    Tweet · Primary source
  2. Don't Build Slop (4 Levels of AI Agent Maturity)

    Video Ara Khan, Cline

    www.youtube.com/watch?v=yUmS-F9IX90 →
    Details
    Context
    Ara Khan's framework from Cline provides a practical lens on agent design, and the prompt-length claim about GPT-5.3 is a specific data point about model capability changes.
    Key points
    • Four levels of agent maturity
    • Prompts for GPT-5.3 are one-third the size of those for GPT-5
    • Longer system prompts cause sensory overload
    • Every addition to an agent risks making it worse
    Provenance
    Video · Supporting source
  3. On Karpathy's return

    X polynoamial — Noam Brown, AI researcher (former OpenAI, DeepMind, Meta FAIR)

    x.com/polynoamial/status/2056768036837949914 →
    Details
    Cited text
    Andrej is back in the game! I would have loved for him to rejoin OpenAI, but I'm happy he's at any frontier lab pushing the field forward.
    Context
    Brown's framing captures a broader industry perspective on Karpathy's moves and the non-zero-sum nature of frontier AI research.
    Key points
    • Karpathy at a frontier lab
    • Noam Brown sees it as collective progress
    • Not zero-sum among labs
    Provenance
    Tweet · Primary source
  4. Gemini Omni announcement

    X OfficialLoganK — Logan Kilpatrick, VP of Gemini at Google

    x.com/OfficialLoganK/status/205678787426016… →
    Details
    Cited text
    Omni is our new model that can create anything from any input — starting with video (think Nano Banana but for video). Available in the Gemini App, Flow, and YouTube, with API support coming soon!
    Context
    The video generation capability is Google's entry into the multimodal content creation space, positioning Omni as a creative tool rather than just a reasoning model.
    Key points
    • Create anything from any input
    • Starting with video generation
    • Nano Banana for video
    • Available in Gemini App, Flow, YouTube
    Provenance
    Tweet · Primary source
  5. Introducing Gemini 3.5

    X GoogleDeepMind — Google DeepMind's official account

    x.com/GoogleDeepMind/status/205678798777481… →
    Details
    Cited text
    Introducing Gemini 3.5: our newest family of models combining frontier intelligence with real-world action. The first release is 3.5 Flash, our strongest model yet for agents and coding
    Context
    This is Google's primary model announcement at I/O 2026, marking their explicit shift toward agentic AI over conversational AI as the core product strategy.
    Key points
    • Gemini 3.5 is the new model family
    • 3.5 Flash is the first release
    • Positioned as strongest model for agents and coding
    • Rolling out in Gemini app and Search AI Mode
    Engagement
    2184 likes · 295 retweets · 69 replies
    Provenance
    Tweet · Primary source
  6. Gemini app pricing changes

    X GeminiApp

    x.com/GeminiApp/status/2056792679607103626 →
    Details
    Cited text
    We're reducing the price of our top-tier Google AI Ultra plan from $250 to $200/mo AND introducing a new $100/mo Ultra plan tier. This new plan unlocks 5x higher usage limits to Gemini app than the Pro plan
    Context
    The pricing restructuring shows Google's commitment to lowering the barrier for heavy AI users, which is relevant for anyone evaluating AI costs.
    Key points
    • Ultra plan dropped from $250 to $200/mo
    • New $100/mo Ultra tier
    • 5x higher usage limits than Pro
    • Includes early access to new features
    Provenance
    Tweet · Primary source
  7. Gemini 3.5: frontier intelligence with action

    Article Koray Kavukcuoglu — Koray Kavukcuoglu, CTO of Google DeepMind and Chief AI Architect at Google

    blog.google/innovation-and-ai/models-and-re… →
    Details
    Cited text
    3.5 Flash delivers intelligence that rivals large flagship models on multiple dimensions, at the speeds you have come to expect from the Flash series. It's our strongest agentic and coding model yet.
    Context
    The official announcement post on Google's blog gives the first-party benchmark numbers and details behind the Flash-over-Pro claim.
    Key points
    • Flash beats Pro on benchmarks
    • 4x faster than other frontier models
    • Antigravity integration
    • 3.5 Pro coming next month
    • Available globally via Gemini app and Search
    Provenance
    Article · Supporting source
  8. Google's Gemini Omni turns images, audio, and text into video

    Article Rebecca Bellan

    techcrunch.com/2026/05/19/googles-gemini-om… →
    Details
    Cited text
    Omni can take a combination of inputs and rather than simply stitching those inputs together, Omni reasons across all of them to produce a consistent output. The first model is Gemini Omni Flash, rendering 10 seconds of video.
    Context
    Omni represents Google's play for creative tools and video generation, competing in a space where very few video models have reached consumer usability.
    Key points
    • Multi-modal reasoning across text, images, audio, video
    • 10-second video output
    • Digital avatars with SynthID watermarking
    • Consumer-focused with enterprise implications
    • Editing prompts need high specificity
    Provenance
    Article · Supporting source
  9. Agentic app coding gets an upgrade with Google's release of Android CLI

    Article Sarah Perez

    techcrunch.com/2026/05/19/agentic-app-codin… →
    Details
    Cited text
    The move acknowledges that many people are now building for Android with AI agents that aren't from Google. The company is trying to find a way to make its specialized knowledge more accessible.
    Context
    Google's acceptance that developers use non-Google agents for Android development is a pragmatic shift in platform strategy.
    Key points
    • Android CLI 1.0 stable
    • Works with Claude Code, Codex, Antigravity
    • Taps into Android Studio capabilities
    • Optional bundle for Google Antigravity
    Provenance
    Article · Supporting source
  10. Gemini 3.5 Flash announcement

    X JeffDean — Jeff Dean, Google's Senior Fellow and head of Google AI

    x.com/JeffDean/status/2056793419033588091 →
    Details
    Cited text
    3.5 Flash is our strongest model for coding and agent workflows. It outscores 3.1 Pro on agentic and coding benchmarks like Terminal-Bench and MCP Atlas, while running 4x faster than other frontier models. Used in Google Antigravity, 3.5 Flash is even further optimized to be up to 12x faster.
    Context
    A Flash model beating a Pro model on benchmarks is unusual and suggests a significant architectural shift in how Google is prioritizing speed for agentic workloads.
    Key points
    • Beats 3.1 Pro on coding and agentic benchmarks
    • 4x faster than other frontier models
    • 12x faster within Antigravity
    • Deploy sub-agents that collaborate at scale
    Provenance
    Tweet · Primary source
  11. Flash pricing comparison

    X emilheap

    x.com/emilheap/status/2056793636923162943 →
    Details
    Cited text
    Holy expensive for a flash model ($1.5/$9). Almost same price as Gemini 3.1 Pro Preview. 3x - Gemini 3 Flash Preview. 5x - Gemini 2.5 Flash. 6x - Gemini 3.1 Flash Lite
    Context
    The pricing is the core of the naming confusion — if Flash costs as much as Pro, the tier label loses its meaning for cost-conscious users.
    Key points
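    • Gemini 3.5 Flash is priced at $1.5/$9 per million tokens (presumably input/output)
    • Almost the same price as Gemini 3.1 Pro Preview
    • 3x the price of Gemini 3 Flash Preview, 5x Gemini 2.5 Flash, 6x Gemini 3.1 Flash Lite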
    Provenance
    Tweet · Primary source
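
The multiples in the tweet above can be turned into rough dollar figures. The sketch below takes the quoted $1.5/$9 price and prices one agent run with it. Several things here are assumptions rather than facts from the source: that the pair means input/output dollars per million tokens (the usual API convention, not stated in the tweet), that each quoted multiple applies equally to input and output, and the token counts of the hypothetical run. The implied prices are derived from the tweet's multiples, not taken from any published price list.

```python
# Price quoted in the tweet for Gemini 3.5 Flash, read as
# (input, output) USD per million tokens. That reading is an assumption.
FLASH_35 = (1.5, 9.0)

# Multiples quoted in the tweet: 3.5 Flash costs "Nx" each of these models.
QUOTED_MULTIPLES = {
    "Gemini 3 Flash Preview": 3,
    "Gemini 2.5 Flash": 5,
    "Gemini 3.1 Flash Lite": 6,
}


def run_cost(price, input_tokens, output_tokens):
    """Cost in USD of one run at an (input, output) per-million-token price."""
    in_price, out_price = price
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def implied_price(price, multiple):
    """Price implied by dividing both components by a quoted multiple."""
    return tuple(p / multiple for p in price)


# Hypothetical agent run: 2M input tokens and 200k output tokens.
INPUT_TOKENS, OUTPUT_TOKENS = 2_000_000, 200_000

flash_cost = run_cost(FLASH_35, INPUT_TOKENS, OUTPUT_TOKENS)
print(f"Gemini 3.5 Flash: ${flash_cost:.2f}")
for name, multiple in QUOTED_MULTIPLES.items():
    cost = run_cost(implied_price(FLASH_35, multiple), INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{name} (implied): ${cost:.2f}")
```

Because the same multiple is applied to both price components, each implied cost is simply the 3.5 Flash cost divided by the multiple. A real comparison would need each model's actual input and output rates.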
  12. With Gemini 3.5 Flash, Google bets its next AI wave on agents, not chatbots

    Article Rebecca Bellan

    techcrunch.com/2026/05/19/with-gemini-3-5-f… →
    Details
    Cited text
    Google launched Gemini 3.5 Flash, its most powerful coding and agentic AI model yet. The release signals Google's shift from pitching AI as a conversational tool to AI as an agentic tool.
    Context
    TechCrunch's reporting captures the strategic pivot Google is making — positioning agents over chatbots as the primary AI interaction model.
    Key points
    • Autonomously executes coding pipelines
    • Builds software from scratch
    • Sub-agents deploy to work in parallel
    • 3.5 Pro will be the orchestrator
    • Gemini Spark and search agentic features announced
    Provenance
    Article · Supporting source
  13. Meta is rapidly reorganizing its workers' jobs around AI: 'Transfers aren't optional'

    Article Varsha Bansal — Varsha Bansal, technology reporter at The Guardian

    www.theguardian.com/technology/2026/may/19/… →
    Details
    Cited text
    Some employees will be moved to new teams focused on AI agents and cloud infrastructure. Late last week, Meta employees received a notice that engineers had been 'selected' for reassignment.
    Context
    The scale of Meta's forced AI reorganization is a concrete signal of how aggressively companies are reshaping their workforces for the AI transition.
    Key points
    • More than 7,000 workers to move to AI teams
    • New teams: AI cloud infrastructure and internal AI agent 'Hatch'
    • Transfers are mandatory, not voluntary
    • Similar move last month: 1,000 engineers to Applied AI
    Provenance
    Article · Supporting source
  14. Antigravity 2.0 announcement

    X antigravity

    x.com/antigravity/status/2056795168326754759 →
    Details
    Cited text
    Rebuilt from the ground up with multi-agent teams, scheduled tasks, native voice and one-click integration with other Google
    Context
    Antigravity 2.0 is Google's dedicated agent development environment, signaling that agent tooling is becoming a first-class product category.
    Key points
    • Standalone desktop app
    • Multi-agent teams
    • Scheduled tasks
    • Native voice support
    Provenance
    Tweet · Primary source
  15. Gemini 3.5 Flash benchmarks

    X koraykv — Koray Kavukcuoglu, Chief Technologist at Google DeepMind

    x.com/koraykv/status/2056795667088204234 →
    Details
    Cited text
    Beats 3.1 Pro on coding & agentic benchmarks Terminal-Bench 2.1 (76.2%), GDPval-AA (1656 Elo) and MCP Atlas (83.6%). 4x faster than other frontier models (12x in Antigravity!). SOTA on multimodality with 83.6% on MMMU-Pro
    Context
    These are the specific benchmark numbers that substantiate Google's claim of Flash beating Pro. Worth cross-referencing against independent tests.
    Key points
    • Terminal-Bench 2.1: 76.2%
    • GDPval-AA: 1656 Elo
    • MCP Atlas: 83.6%
    • MMMU-Pro: 83.6%
    • 4x speed over other frontier models
    Provenance
    Tweet · Primary source
  16. I/O highlights thread

    X sundarpichai — Sundar Pichai, CEO of Google and Alphabet

    x.com/sundarpichai/status/20567968939514267… →
    Details
    Cited text
    Compared to 3.1 Pro, 3.5 Flash is better across almost all benchmarks with huge progress in coding. Google is now processing 3.2 quadrillion tokens per month, up from 480T tokens a year ago and 9.7T tokens two years ago.
    Context
    Pichai's token volume numbers show the scale at which these models operate — the growth from 9.7T to 3.2Q in two years is staggering.
    Key points
    • 3.5 Flash beats 3.1 Pro on benchmarks
    • Huge progress in coding
    • 3.2 quadrillion tokens/month at Google
    • Roughly 330x growth in monthly tokens over two years
    Provenance
    Tweet · Primary source
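
The growth behind those figures is easy to check. This is a small sketch that uses only the three monthly token volumes quoted in the tweet, expressed in trillions; the variable and function names are illustrative.

```python
# Monthly token volumes quoted in the tweet, in trillions of tokens.
TWO_YEARS_AGO = 9.7
ONE_YEAR_AGO = 480.0
NOW = 3200.0  # 3.2 quadrillion = 3,200 trillion


def growth_factor(old, new):
    """How many times larger `new` is than `old`."""
    return new / old


first_year = growth_factor(TWO_YEARS_AGO, ONE_YEAR_AGO)
second_year = growth_factor(ONE_YEAR_AGO, NOW)
two_year_total = growth_factor(TWO_YEARS_AGO, NOW)

print(f"Year 1: {first_year:.1f}x")
print(f"Year 2: {second_year:.1f}x")
print(f"Two-year total: {two_year_total:.0f}x")
```

On these numbers the year-over-year multiplier falls from about 49x to about 7x, so volume is still growing very fast but the rate of growth is slowing.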
  17. Early access test of Gemini 3.5 Flash

    X emollick — Ethan Mollick, Wharton professor and AI researcher known for practical AI evaluation

    x.com/emollick/status/2056798490353705380 →
    Details
    Cited text
    Very fast for a flash model and very capable, though not as powerful as a full frontier model. I added it to the gallery or procedurally generated one-shot towns (it made one error that it corrected)
    Context
    Mollick's independent early access test provides ground-level verification of the model's actual capability versus Google's benchmark claims.
    Key points
    • Fast and capable for a flash model
    • Not yet at full frontier model level
    • Self-corrected one error in town generation
    Provenance
    Tweet · Primary source
  18. Prompts are code, .json/.md files are state

    Article Mario Zechner — Creator of LibGDX and the Spine animation runtime, writing from a senior engineering perspective on agentic tooling.

    mariozechner.at/posts/2025-06-02-prompts-ar… →
    Details
    Cited text
    LLMs also lack taste. Trained on all code on the web (and likely some private code), they generate, to oversimplify, the statistical mean of what they've seen.
    Context
    Zechner is calling out the gap between benchmarked context length and actual engineering utility—a constraint any team running long-horizon agents will hit in production.
    Key points
    • Context window tricks don't fix middle-context degradation around 100k tokens
    • Agentic coding on large codebases needs structured context engineering, not just bigger windows
    • The prompt-as-code paradigm reveals why agentic workflows break on legacy systems
    Provenance
    Article · Supporting source
  19. Most human tasks are not Markovian

    X François Chollet

    x.com/fchollet/status/2056777649880752160 →
    Details
    Cited text
    Most human tasks are not Markovian, the optimal next action cannot be determined solely by looking at the current state. It depends heavily on the past trajectory, the original intent, and context constraints. An agent...
    Context
    Chollet's critique lands on the structural blind spot in most agentic harnesses: they optimize for single-step or short-chain reasoning while the actual work requires maintaining a coherent historical record of intent.
    Key points
    • Current agent frameworks treat tasks as Markovian—decisions based only on present state
    • Real work depends on trajectory, intent, and constraints that accumulate over time
    • Bridging this gap requires moving beyond stateless function-calling loops
    Provenance
    Tweet · Primary source
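
Chollet's distinction can be made concrete with a toy example. In the sketch below, two agent runs arrive at the same current state (tests passing) through different histories, and the right next action differs between them. The task, the action names, and both policy functions are hypothetical and chosen only to illustrate the point; they do not come from the tweet or from any agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """A toy agent run: the original intent plus the actions taken so far."""
    intent: str
    history: list = field(default_factory=list)
    tests_passing: bool = False  # the only thing the Markovian policy sees


def markovian_policy(tests_passing: bool) -> str:
    """Chooses the next action from the current state alone."""
    return "ship" if tests_passing else "fix_tests"


def history_aware_policy(run: Run) -> str:
    """Also consults the original intent and the trajectory so far."""
    if not run.tests_passing:
        return "fix_tests"
    # Passing tests only count if the agent did not get there by deleting them.
    if "delete_failing_test" in run.history:
        return "restore_test_and_fix_code"
    if run.intent == "refactor_only" and "change_public_api" in run.history:
        return "revert_api_change"
    return "ship"


honest = Run("fix_bug", ["read_code", "patch_function"], tests_passing=True)
shortcut = Run("fix_bug", ["read_code", "delete_failing_test"], tests_passing=True)

# Same current state, so the Markovian policy cannot tell the runs apart.
print(markovian_policy(honest.tests_passing), markovian_policy(shortcut.tests_passing))
# The history-aware policy separates them.
print(history_aware_policy(honest), history_aware_policy(shortcut))
```

Formally, folding the full history into the state would restore the Markov property. The practical point is that the agent then has to carry and consult that record, which a stateless loop does not do.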
  20. Gemini 3.5 Flash rollout

    X Google DeepMind

    x.com/GoogleDeepMind/status/205678799236337… →
    Details
    Cited text
    We’re rolling out 3.5 Flash to everyone in the @GeminiApp and AI Mode in @Google Search. Developers can start building in @Antigravity and via the Gemini API in @GoogleAIStudio.
    Context
    The rollout signals where Google is placing its near-term bets: speed and cost over raw frontier capability, targeting the agentic and search workflows that actually drive daily usage.
    Key points
    • Gemini 3.5 Flash is shipping globally today
    • Available in Gemini App, Google Search AI Mode, Antigravity agent tool, and Gemini API
    • Positioned as fast, consistent, and cheaper than competing frontier models
    Provenance
    Tweet · Primary source
  21. Ambient agents digesting long traces

    X Sydney Runkle

    x.com/sydneyrunkle/status/20568014965667759… →
    Details
    Cited text
    two years ago we started building agents to automate work. turns out these are really useful, so there’s a LOT of runs and long traces that are hard to reason about now, use an ambient agent (engine) to digest and...
    Context
    Runkle's observation tracks the real bottleneck: once agents ship, their execution logs become the new dependency tree. Without ambient summarization or tracing, debugging agent loops turns into archaeology.
    Key points
    • Agent runs are accumulating into long traces that are hard to debug or reason about
    • Teams are building ambient digesting engines to manage the accumulation
    • The tooling gap is shifting from agent execution to agent observability
    Provenance
    Tweet · Primary source
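
One way to picture the "digest" step described in the last source is a pass that condenses a long trace into a few summary numbers. This is only a sketch: the trace format, the field names, and the `digest` function are hypothetical and are not taken from any particular tracing tool. An ambient agent would presumably use a model to do the summarizing, whereas this version is deliberately deterministic.

```python
from collections import Counter

# Hypothetical trace: one dict per agent step.
TRACE = [
    {"step": 1, "tool": "search", "ok": True, "ms": 120},
    {"step": 2, "tool": "read_file", "ok": True, "ms": 40},
    {"step": 3, "tool": "run_tests", "ok": False, "ms": 900},
    {"step": 4, "tool": "edit_file", "ok": True, "ms": 60},
    {"step": 5, "tool": "run_tests", "ok": True, "ms": 880},
]


def digest(trace):
    """Condense a trace into counts, failures, and the slowest step."""
    tool_counts = Counter(event["tool"] for event in trace)
    failed_steps = [event["step"] for event in trace if not event["ok"]]
    slowest = max(trace, key=lambda event: event["ms"])
    return {
        "steps": len(trace),
        "tool_counts": dict(tool_counts),
        "failed_steps": failed_steps,
        "slowest_step": slowest["step"],
        "total_ms": sum(event["ms"] for event in trace),
    }


summary = digest(TRACE)
print(summary)
```

A fixed summary like this is cheap to compute for every run, which makes it a reasonable first filter before anyone reads a full trace.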