◆ Dispatch 045 · 2026-06-03 GSV The Concrete Lags The Communiqué

Permission Slips and Poured Concrete

2026-06-03 / 00:18:06 / 18 sources

“I'd rather track the steel than the speeches.”
— Lenar Kess, today's narration

A stack of European filings wants to triple data center capacity and own more of the AI stack — on the same day a JP Morgan report says the country building fastest can't pour its own concrete on schedule. Lenar and Damra trace the day's real constraint: not model quality, but megawatts, transformers, capital, and rights.

The EU's Cloud and AI Development Act (CADA) aims to triple data center capacity in 5–7 years, paired with a tech-sovereignty communication and open-source strategy and a Chips Act 2.0 — a statement of intent about which layers of the stack Europe wants to own.
JP Morgan, via the WSJ, says 60%+ of US data center capacity planned for 2027 isn't yet under construction — the build-out is power- and permit-bound, not building-bound.
Alibaba's Qwen 3.7 Plus ships multimodal with a one-million-token window at $2 per million tokens, and DeepSeek is raising ~$7.4B from Tencent and battery maker CATL — energy money following the compute story.
Microsoft's on-device Aion 1.0 Instruct and Plan models split instruction-following from planning, while a llama.cpp build report shows reproducible local gains on two 3090s.
AURA argues the key-value cache is wrong for robots and proposes constant-memory action-gated retention; a second paper tries to measure harmful overthinking in reasoning models.
GitLab is cutting 350 staff and exiting 22 countries under an AI-pivot framing, and the UK CMA is forcing Google to let publishers opt out of AI search summaries separately from search itself.

Chapters

00:00:00 Transcript

Sources

18 cited

1
arXiv cs.AI - Research Science (GLOBAL)

Article Josef Chen

AURA: Action-Gated Memory for Robot Policies at Constant VRAM - arXiv:2606.02775v1 Announce Type: new Abstract: The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference...
arxiv.org/abs/2606.02775 →

Context
Addresses embodied agents and memory constraints (VRAM/writes) for edge hardware, a core topic in AI infrastructure and robotics.
Provenance
Article · Supporting source
2
arXiv cs.AI - Research Science (GLOBAL)

Article Simone Caldarella, Davide Talon, Rahaf Aljundi, Elisa Ricci, Massimiliano Mancini

Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models - arXiv:2606.02835v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) improve performance by generating explicit...
arxiv.org/abs/2606.02835 →

Context
This paper introduces a new evaluation protocol (reasoning sufficiency) to measure model reliability, specifically 'harmful overthinking.' This directly impacts how models are built and evaluated.
Provenance
Article · Supporting source
3
r/LocalLLaMA: Microsoft Aion 1.0 Instruct and Aion 1.0 Plan models! - 0 pts · 0 comments

Article Mysterious_Finish543

Microsoft announced 2 new on-device models at Microsoft Build 2026. Aion 1.0 Instruct: efficiency at scale. Aion 1.0 Instruct is our next-generation small language model, smaller, faster and more efficient than our...
i.redd.it/nuy17exhvz4h1.png →

Context
Announces specific, new, on-device models (Aion 1.0) and capabilities (agentic reasoning, tool-calling) directly related to AI infrastructure and software engineering.
Provenance
Article · Supporting source
4
r/LocalLLaMA: Another shout out to llama.cpp build b9455 2x3090 - 0 pts · 0 comments

Article Fabulous_Fact_606

As you guys know, the next highest quant is Unsloth's /Qwen3.6-27B-UD-Q8_K_XL.gguf. With llama.cpp...
www.reddit.com/r/LocalLLaMA/comments/1tvff6… →

Context
Directly discusses performance benchmarks and technical improvements for running large models (Qwen3.6-27B) locally, which is core to AI infrastructure and model capability.
Provenance
Article · Supporting source
5
Techmeme - Industry Adjacent (US)

Article

Sources: DeepSeek is set to raise ~$7.4B in its first funding round from investors including Tencent and CATL at a valuation of between ~$52B and ~$59B (Reuters) - Reuters : Sources: DeepSeek is set to raise ~$7.4B in...
www.techmeme.com/260603/p2 →

Context
Major funding round and valuation news for a key AI player (DeepSeek). Directly relates to capital, power dynamics, and the building of intelligence.
Provenance
Article · Supporting source
6
Techmeme - Industry Adjacent (US)

Article

How India's ambitions to build and export a sovereign AI template are colliding with structural constraints, including a dependence on foreign AI infrastructure (Saritha Rai/Bloomberg) - Saritha Rai / Bloomberg : How...
www.techmeme.com/260603/p6 →

Context
Directly addresses geopolitics, sovereignty, and the structural constraints of building AI in a major global market (India).
Provenance
Article · Supporting source
7
European Commission Digital Strategy - Policy Geopolitics (EU)

Article Anonymous

Communication on European Tech Sovereignty, accompanied by an EU Open Source Strategy - Communication on European Tech Sovereignty, accompanied by an EU Open Source Strategy Anonymous (not verified) Wed, 06/03/2026 -...
digital-strategy.ec.europa.eu/en/library/co… →

Context
Direct policy filing detailing EU's strategy for tech sovereignty, covering chips, AI, cloud, and open source. Highly relevant to power dynamics and control.
Provenance
Article · Supporting source
8
European Commission Digital Strategy - Policy Geopolitics (EU)

Article dumimar

Proposal for the Chips Act 2.0 - Proposal for the Chips Act 2.0 dumimar Wed, 06/03/2026 - 10:30 The Commission has adopted a proposal for the Chips Act 2.0, which introduces new measures to further boost the chips...
digital-strategy.ec.europa.eu/en/library/pr… →

Context
A major EU policy proposal (Chips Act 2.0) directly addresses semiconductor supply chains, compute power, and geopolitical control over AI infrastructure.
Provenance
Article · Supporting source
9
The Guardian Technology - Industry Adjacent (UK)

Article Joanna Partridge and Dan Milmo

UK media websites given power to block Google using their articles in AI search - Watchdog makes ruling on search summaries after publishers complain about drop in click-through traffic and revenue Business live –...
www.theguardian.com/business/2026/jun/03/uk… →

Context
Directly addresses power dynamics and regulation (CMA/Google/publishers) regarding AI training data and search features.
Provenance
Article · Supporting source
10
European Commission Digital Strategy - Policy Geopolitics (EU)

Article dumimar

Proposal for the Cloud and AI Development Act (CADA) - Proposal for the Cloud and AI Development Act (CADA) dumimar Wed, 06/03/2026 - 11:02 The Commission has adopted a proposal for the Cloud and AI Development Act...
digital-strategy.ec.europa.eu/en/library/pr… →

Context
A major EU policy proposal (CADA) directly addresses AI infrastructure, data centers, and sovereignty, impacting global AI development and regulation.
Provenance
Article · Supporting source
11
Techmeme - Industry Adjacent (US)

Article

GitLab is laying off 350 staff, or 14% of its workforce, as it pivots to become an AI-focused enterprise software development platform, and exits 22 countries (Dean Seal/Wall Street Journal) - Dean Seal / Wall Street...
www.techmeme.com/260603/p13 →

Context
Major layoff and pivot (exiting countries, focusing on AI) directly impacts labor, market structure, and the future of enterprise software development.
Provenance
Article · Supporting source
12
Techmeme - Industry Adjacent (US)

Article

Alibaba releases Qwen3.7-Plus, a multimodal proprietary model with a 1M-token context window, costing $2 per 1M tokens, 60% less than text-only Qwen3.7-Max (Carl Franzen/VentureBeat) - Carl Franzen / VentureBeat :...
www.techmeme.com/260603/p14 →

Context
Reports a new, specific model release (Qwen3.7-Plus) with key specs (multimodal, 1M context) and pricing, directly impacting the AI infrastructure and power dynamics.
Provenance
Article · Supporting source
13
Techmeme - Industry Adjacent (US)

Article

The US data center build-out is falling behind schedule; JP Morgan says 60%+ of data center capacity planned for completion in 2027 isn't yet under construction (Katherine Blunt/Wall Street Journal) - Katherine Blunt /...
www.techmeme.com/260603/p22 →

Context
Directly addresses AI infrastructure (data centers, compute) and capital/geopolitics, a core podcast topic.
Provenance
Article · Supporting source
14
The Guardian Technology - Industry Adjacent (UK)

Article Dan Milmo and Aisha Down

Can autonomous AI-powered killer drones take morality onboard? - While the technology is set to play a growing role in modern warfare, there remains an unresolved ethical challenge Should the AI-powered drones of the...
www.theguardian.com/world/2026/jun/03/can-a… →

Context
Directly addresses the power dynamics and geopolitical implications of AI in warfare, a core topic of control and regulation.
Provenance
Article · Supporting source
15
Techmeme - Industry Adjacent (US)

Article

How OpenAI, Anthropic, and AI startups are pursuing "recursive self-improvement", in a bid to build AI that can improve itself with little to no human input (Financial Times) - Financial Times : How OpenAI, Anthropic,...
www.techmeme.com/260603/p23 →

Context
Directly addresses the 'power dynamics' and 'near-future' of AI by discussing recursive self-improvement and superintelligence efforts by major labs.
Provenance
Article · Supporting source
16
U.S. Department of Justice News - Legal Courts (US)

Article

Exemption 1 and Exemption 7 Training
www.justice.gov/oip/event/exemption-1-and-e… →

Context
DOJ Office of Information Policy training event on FOIA Exemptions 1 and 7 (classified and law-enforcement records). The "training" is staff instruction, not AI model training.
Provenance
Article · Supporting source
17
Techmeme - Industry Adjacent (US)

Article

The EU Commission unveils its proposed tech sovereignty package, including the Cloud and AI Development Act, in a bid to cut its reliance on US tech companies (Mathieu Pollet/Politico) - Mathieu Pollet / Politico : The...
www.techmeme.com/260603/p30 →

Context
Directly addresses power dynamics, geopolitics, and regulation (EU tech sovereignty package, Cloud and AI Development Act), which is core to the podcast topic.
Provenance
Article · Supporting source
18
Techmeme - Industry Adjacent (US)

Article

The EU's Cloud and AI Development Act aims to triple data center capacity in the next 5-7 years, and introduce the Chips Act 2.0 to allow direct EU investment (Gian Volpicelli/Bloomberg) - Gian Volpicelli / Bloomberg :...
www.techmeme.com/260603/p31 →

Context
Directly addresses AI infrastructure, geopolitics, and capital/policy dynamics (Chips Act 2.0, data centers).
Provenance
Article · Supporting source

Transcript

00:00:00 Lenar: Picture yourself running platform engineering for a mid-size bank in Frankfurt. Every model you call, every accelerator you rent, every object store your logs land in: almost all of it routes through an American company. Now your government decides that dependence is a problem, and it starts writing law to change it. That's roughly where the European Commission stood this morning.

00:00:21 Damra: So what actually landed? Because "tech sovereignty package" is the kind of phrase that can mean four binding regulations or one press conference and a slide deck.

00:00:30 Lenar: Four documents, dropped together. A Communication on European Tech Sovereignty, an EU Open Source Strategy bundled with it, the Cloud and AI Development Act (they're calling it CADA), and a Chips Act 2.0. Per Gian Volpicelli at Bloomberg, CADA's headline goal is to triple the EU's data center capacity over the next five to seven years, and Chips Act 2.0 would let the EU invest directly, not just subsidize.

00:00:57 Damra: Tripling data center capacity in five to seven years is an enormous number to put in a legislative proposal. That's a construction program, not a policy lever. Who's pouring the concrete?

00:01:08 Lenar: That's the gap I keep circling. The filing says it wants to triple capacity. It doesn't build a single hall. And the first Chips Act, back in 2023, was mostly about de-risking private money: state-aid permission, subsidies, coordination. The 2.0 version, per the Commission's own summary, "introduces new measures to further boost" the sector and adds this direct-investment idea. I haven't seen the full mechanism for that direct investment spelled out yet, so I'm taking the summary at its word there.

00:01:38 Damra: And the dependency they're worried about isn't only clouds. It's the fabs. ASML is Dutch, sure, but the leading-edge fabrication capacity sits in Taiwan and increasingly Arizona. Nvidia designs the chips everyone wants. So when Mathieu Pollet at Politico frames this as cutting reliance on US tech, the question is: sovereignty over which layer, exactly? You can build all the data center shells you want and still be renting the silicon inside them.

00:02:04 Lenar: Right, and the open-source strategy is the most interesting tell about how they're thinking. Pairing an open-source push with the sovereignty communication suggests they've concluded they can't out-spend the American labs on frontier models, so the play is open weights and shared infrastructure you don't have to license from a US company. That's a coherent bet. It's also a slower one.

00:02:26 Damra: It's coherent until you ask who maintains it. Open weights don't run themselves. Somebody has to do the integration, the security patching, and the eval work, the grind where a model becomes a system you can actually deploy in a regulated bank. If the EU funds the weights but not the people who operate them, you get a press release, not sovereignty.

00:02:47 Lenar: And there's a live counter-example worth putting next to that. Saritha Rai at Bloomberg has a piece on India trying to build and export its own sovereign AI template, and running straight into the same wall: a dependence on foreign AI infrastructure that the ambition can't wish away. So the EU isn't alone in this. Lots of governments want a sovereign stack. Almost none of them control the whole supply chain.

00:03:10 Damra: That's the shape of it. Sovereignty is a stack, and right now most countries own maybe two layers of seven. The EU filing is a statement of intent about the layers they'd like to own. Watch the funding lines, not the communiqué.

00:03:25 Lenar: That's what I'll be tracking: whether Chips Act 2.0 comes with appropriated euros attached or stays a framework. Which sets up the next piece neatly, because it turns out even the country that's furthest ahead on building can't keep up with its own plans.

00:03:39 Damra: You're going to the data center numbers.

00:03:41 Lenar: I am. Katherine Blunt at the Wall Street Journal has a report, sourced to JP Morgan, that the US data center build-out is falling behind schedule. The number that stopped me: more than sixty percent of the data center capacity planned to come online in 2027 isn't yet under construction.

00:03:59 Damra: Sixty percent not yet under construction, for capacity that's supposed to be live next year. That's not a rounding error. A hyperscale data center is roughly an eighteen-to-thirty-month build once you break ground, before you even energize it. If the steel isn't up yet, that 2027 capacity is a 2028 number at best.
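
To make the schedule claim concrete, here is a small date calculation. It is a sketch under stated assumptions: the project breaks ground in June 2026 (the month of this episode, chosen only for illustration), and the build takes the eighteen-to-thirty-month range quoted above. `add_months` is a helper written for this example.

```python
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the first day of the month that falls `months` after `start`."""
    total = start.year * 12 + (start.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


# Assumption: ground breaks in the month of this episode.
ground_break = date(2026, 6, 1)

# Build-duration range quoted in the conversation, in months.
fastest, slowest = 18, 30

earliest_done = add_months(ground_break, fastest)
latest_done = add_months(ground_break, slowest)

print(earliest_done.isoformat())
print(latest_done.isoformat())
```

Under those assumptions the fastest build completes in December 2027, before energization, and every month that ground-breaking slips pushes even the fast case into 2028.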

00:04:19 Lenar: And the binding constraint usually isn't the building. It's the power. You can pour the slab fast. Getting a utility interconnect, getting transformers (there's a multi-year backlog on large power transformers), and getting a grid operator to commit megawatts is where these projects stall. The building is the easy part.

00:04:37 Damra: So now hold the two stories next to each other. The EU is proposing to triple its capacity in five to seven years. The US, which has the most aggressive private build-out on the planet and the capital to back it, can't get sixty percent of its 2027 plan into the ground on time. The EU's number reads as fantastical, not just ambitious, unless they've solved a power-and-permitting problem the Americans haven't.

00:05:02 Lenar: That's the connection I'm willing to make, and only that far. Both stories land on the same constraint, and it isn't model quality. It's megawatts and concrete and transformers and the people who sign interconnection agreements. The capability conversation has gotten way ahead of the capacity to run it.

00:05:20 Damra: And for anyone building on top of this, it's a planning input. If you're assuming compute keeps getting cheaper and more abundant on a smooth curve because the models keep improving, the supply side has a different opinion. There may be a stretch where the best model you can get is gated by where you can physically get inference capacity, not by what the lab shipped.

00:05:40 Lenar: Which is a good segue to who's shipping, because the model news today came out of China, and it came with a price.

00:05:46 Damra: Alibaba and Qwen, I assume.

00:05:48 Lenar: Qwen 3.7 Plus. Per Carl Franzen at VentureBeat, it's a multimodal proprietary model with a one-million-token context window, and it lands at two dollars per million tokens, which is sixty percent below the price of their text-only Qwen 3.7 Max.

00:06:05 Damra: Wait, the multimodal one is cheaper than the text-only one? That's backwards from how this usually goes. Multimodal models carry the vision encoder and the extra compute on image tokens. You'd expect a premium, not a sixty-percent discount.

00:06:20 Lenar: It is backwards, and I don't have a clean explanation from the source. A couple of readings. One, Plus and Max are different model sizes. Plus is the smaller, faster tier in Alibaba's naming, so the comparison might be a cheaper model that happens to also be multimodal, not a multimodal discount per se. Two, it's a pricing move. Two dollars per million tokens with a million-token window is aggressive against anyone selling long-context inference.

00:06:48 Damra: The naming reading is probably right, and it matters because the headline "multimodal cheaper than text" claims more than the model card supports. Plus versus Max is capacity, not modality. Still, two dollars per million on a one-million-token window is a real number. That's the kind of price that shows up in your build-or-buy math for document processing.
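
The pricing arithmetic behind that exchange can be worked through in a few lines. This is a sketch, not Alibaba's price sheet: it takes only the two figures in the VentureBeat summary ($2 per million tokens, 60% below Max), assumes a single flat rate with no split between input and output tokens, and uses a hypothetical document workload for the build-or-buy comparison.

```python
# Figures from the VentureBeat summary cited above.
plus_price_per_million = 2.00   # USD per 1M tokens for Qwen3.7-Plus
discount_vs_max = 0.60          # Plus is described as 60% cheaper than Max

# If Plus costs 60% less than Max, then Plus = Max * (1 - 0.60),
# so the implied Max price is Plus / (1 - 0.60).
implied_max_price = plus_price_per_million / (1 - discount_vs_max)


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` tokens at a flat per-million rate."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical workload: 500 documents of 200,000 tokens each.
workload_tokens = 500 * 200_000

print(round(implied_max_price, 2))
print(round(cost_usd(workload_tokens, plus_price_per_million), 2))
print(round(cost_usd(workload_tokens, implied_max_price), 2))
```

The implied Max price works out to about $5 per million tokens, so the hypothetical 100-million-token workload costs roughly $200 on Plus against $500 on Max.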

00:07:09 Lenar: And it pairs with the capital story underneath it. Reuters reports DeepSeek is set to raise about 7.4 billion dollars in its first outside funding round, from investors including Tencent and the battery maker CATL, at a valuation somewhere between fifty-two and fifty-nine billion dollars.

00:07:27 Damra: CATL is the interesting name on that list. A battery and energy-storage company taking a position in a frontier-model lab is a bet on the power story we were just talking about, more than on the model. If inference is constrained by megawatts, the people who store and move energy have a seat at this table.

00:07:46 Lenar: That's a sharp read, and I'd hold it loosely: corporate investors take positions for a lot of reasons, and we don't have CATL's thesis from the Reuters piece. But the shape is notable. This is DeepSeek's first round, and it's already valued above fifty billion dollars on paper, raising from a social platform and an energy firm rather than the usual venture names.

00:08:06 Damra: And it's a first round at that valuation, which tells you the previous funding was internal or quant-desk money. DeepSeek came out of a hedge fund. So this is the moment it goes from a research shop bankrolled by trading profits to a company taking serious outside capital. The interesting question is what strings come with Tencent money.

00:08:25 Lenar: We won't know that from a funding announcement. Let's move down a layer, because the other half of today's model news isn't about the giant proprietary systems. It's about what you can run on hardware you already own.

00:08:37 Damra: Microsoft at Build, and the local crowd.

00:08:39 Lenar: Microsoft announced two on-device models at Build 2026: Aion 1.0 Instruct and Aion 1.0 Plan. Their framing on Instruct is "efficiency at scale": a next-generation small language model they say is smaller, faster, and more efficient than their previous one. The Plan model is the more interesting of the two by name: a model whose job is planning, agentic decomposition, rather than just answering.

00:09:06 Damra: Splitting Instruct from Plan is an admission that one model doing everything is the wrong shape for on-device. You want a small, fast model to handle the turn-by-turn instruction following, and a separate model that's good at breaking a task into steps. That's the agent architecture moving onto the laptop. The catch is always the same: what are the real numbers? Microsoft saying "smaller and faster" tells me nothing until I see tokens per second on actual silicon and a benchmark I trust.

00:09:35 Lenar: Which is exactly why the local community is fun to read on the same morning. There's a post on the LocalLLaMA subreddit: a user shouting out a specific llama.cpp build, b9455, running on two RTX 3090s. The poster says that build sped up the Unsloth quantization of Qwen 3.6, the 27-billion-parameter model, in the high-quality eight-bit quant.

00:10:00 Damra: That's the texture I love about that corner of the world. Microsoft puts out a polished announcement with adjectives. And the same day, someone on two consumer cards is reporting an actual build number, an actual quant file, and a speedup they measured themselves. One of those is marketing, and one of those is a result you can reproduce tonight.

00:10:19 Lenar: And the gap between those two is shrinking, which matters for anyone deciding what to build on. A 27-billion-parameter model in an eight-bit quant on two 3090s is a serious local setup now. The question is no longer whether you can run something useful locally. It's which tier of capability you give up to keep your data on your own machine.
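
A rough memory budget shows why that setup is plausible. The sketch below is an estimate, not a measurement from the post: it assumes about 8.5 bits per weight, which is the figure for a plain 8-bit GGUF quant that stores a scale per block. The actual size of the Q8_K_XL file mentioned in the source is not given and is likely somewhat larger.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


# Assumption: roughly 8.5 bits per weight for an 8-bit quant with
# per-block scales. The exact figure for the file in the post is unknown.
weights = weight_gb(27, 8.5)

# An RTX 3090 has 24 GB of VRAM, so two cards give 48 GB in total.
total_vram = 2 * 24

# Whatever is left over has to hold the KV cache and runtime buffers.
headroom = total_vram - weights

print(round(weights, 1), round(headroom, 1))
```

On these assumptions the weights take a little under 29 GB, leaving around 19 GB across both cards for context and overhead.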

00:10:40 Damra: And the answer to that keeps getting better for the local side. But let's not oversell it: two 3090s is still a fifteen-hundred-dollar card situation and a power supply that trips your breaker. "Local" still costs you; it's just yours instead of rented.

00:10:56 Lenar: Fair. Let's go to the research, because two papers landed today that both poke at things we usually take for granted, and the first one has the best opening line I've read in a while.

00:11:06 Damra: Go ahead, set it up.

00:11:07 Lenar: It's a paper called AURA, Action-Gated Memory for Robot Policies at Constant VRAM, by Josef Chen, on arXiv. The abstract opens: "The KV-cache is the right memory for datacenters but the wrong memory for robots." And then it makes a simple argument. The key-value cache, the memory that makes large language model inference fast by storing the keys and values of every prior token, grows with every token it sees. A robot running continuously can't afford memory that grows forever.

00:11:38 Damra: That's a good framing, because the key-value cache is one of those things everyone in inference treats as just how it works. In a datacenter you have a request, you serve it, you free the cache. A robot doesn't have a request boundary. It's running the same policy for hours. If your memory grows with every observation, you hit the wall on video memory and the robot stops. "Constant VRAM" is the whole pitch in two words.
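
The growth described above follows from the standard size formula for a key-value cache, sketched here. The model shape and the robot's token rate are hypothetical values picked for illustration; none of them come from the AURA paper.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to hold keys and values for `tokens` positions."""
    # Factor of 2: one key vector and one value vector per head per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model shape, chosen only for illustration (16-bit values).
shape = dict(layers=32, kv_heads=8, head_dim=128)

per_token = kv_cache_bytes(1, **shape)

# Hypothetical robot: 10 observations per second, 16 tokens each.
tokens_per_hour = 10 * 16 * 3600

for hours in (1, 4, 8):
    gib = kv_cache_bytes(tokens_per_hour * hours, **shape) / 2**30
    print(hours, round(gib, 1))
```

With these numbers each token costs 128 KiB of cache, and an unbounded cache passes 70 GiB within the first hour, which is the wall the conversation refers to.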

00:12:05 Lenar: And "action-gated" is how they get there: instead of keeping every token, the memory update is gated on actions the policy takes, so what gets retained is tied to what the robot did, not to every frame it saw. I've read the abstract, not the full method, so I can't tell you how well it holds up against a baseline yet. But the problem statement is exactly right, and it's the kind of constraint that doesn't show up until you take a model off the server and put it on something that has to run all day.
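
The general idea of gating writes on actions while holding memory constant can be illustrated with a toy buffer. This is not AURA's method, which is not described in the abstract: the class name, the scalar action, and the threshold rule are all assumptions made up for this sketch.

```python
from collections import deque


class GatedMemory:
    """Fixed-capacity memory that stores an observation only when the
    action changed by more than a threshold since the last stored one."""

    def __init__(self, capacity: int, threshold: float):
        # deque with maxlen drops the oldest entry once it is full,
        # so memory use stays constant however long the robot runs.
        self.slots = deque(maxlen=capacity)
        self.threshold = threshold
        self.last_action = None

    def step(self, observation, action: float) -> bool:
        """Offer one (observation, action) pair; return True if stored."""
        changed = (self.last_action is None
                   or abs(action - self.last_action) > self.threshold)
        if changed:
            self.slots.append((observation, action))
            self.last_action = action
        return changed


memory = GatedMemory(capacity=4, threshold=0.5)
actions = [0.0, 0.1, 0.2, 1.0, 1.1, 2.0, 2.1, 3.0, 4.0]
stored = [memory.step(f"frame{i}", a) for i, a in enumerate(actions)]

print(stored)
print(len(memory.slots))
```

Nine frames arrive, five pass the gate, and the buffer never holds more than four entries. A real policy would gate on something richer than a scalar difference, but the memory bound works the same way.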

00:12:32 Damra: It's the same lesson as the on-device model split. The architecture that's correct in a datacenter is the wrong architecture on the edge, and we're watching a whole wave of work rediscover that for memory, model size, and planning. The server assumptions don't survive contact with a battery and a fixed memory budget.

00:12:50 Lenar: The second paper is from a group including Simone Caldarella and Elisa Ricci, titled "Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models." The setup: reasoning models generate explicit chains of thought to improve accuracy, and this paper introduces an evaluation protocol for what they call harmful overthinking: cases where the extra reasoning makes the model worse, not better.

00:13:15 Damra: Which anyone who's watched a reasoning model talk itself out of a correct answer already knows in their gut. You ask a simple question, the model gets it right in the first sentence, then reasons for four hundred more tokens and arrives somewhere wrong. The value here is they're trying to measure it: reasoning sufficiency, knowing when to stop.

00:13:34 lenarAnd that connects to a cost question, not just an accuracy one. Every one of those reasoning tokens is billed and adds latency. If a model overthinks its way to a worse answer, you paid more for a downgrade. An eval that names that failure is useful precisely because the whole industry has been selling "more reasoning" as strictly better.
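
Editor's sketch: the "billed and adds latency" point can be made concrete with rough arithmetic. The four hundred extra tokens come from the example in the conversation. Everything else is an assumption for illustration: the $2 per million tokens figure is borrowed from the price quoted earlier in the dispatch as a placeholder (real output-token pricing varies by model), and the 50-token answer and 50 tokens-per-second decode rate are hypothetical.

```python
# Assumption: flat placeholder price; real input and output pricing differs by model.
PRICE_USD_PER_MILLION_TOKENS = 2
# Assumption: hypothetical decode speed; real throughput depends on model and hardware.
DECODE_TOKENS_PER_SECOND = 50


def cost_micro_usd(tokens, price_usd_per_million):
    """USD per million tokens equals micro-dollars per token, so the math stays in integers."""
    return tokens * price_usd_per_million


def overthinking_overhead(answer_tokens, extra_reasoning_tokens):
    """Compare a short correct answer with the same answer plus extra reasoning."""
    base = cost_micro_usd(answer_tokens, PRICE_USD_PER_MILLION_TOKENS)
    total = cost_micro_usd(
        answer_tokens + extra_reasoning_tokens, PRICE_USD_PER_MILLION_TOKENS
    )
    return {
        "extra_cost_micro_usd": total - base,
        "cost_multiplier": total / base,
        "extra_latency_seconds": extra_reasoning_tokens / DECODE_TOKENS_PER_SECOND,
    }


overhead = overthinking_overhead(answer_tokens=50, extra_reasoning_tokens=400)
print(overhead)
```

Under these assumptions the overthought response costs nine times as much and takes eight seconds longer, before counting the accuracy loss. The per-request amount is tiny, so the effect only matters at volume, which is where the capacity argument comes in.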

00:13:54 damra: And it ties back to the capacity story from the top of the show. If reasoning tokens are sometimes wasted compute that also lowers your accuracy, then "think harder" isn't free in a world where inference capacity is the constraint. Knowing when to stop reasoning is a capacity optimization, not just a quality one.

00:14:15 lenar: That's the thread that actually holds across the day, and I'll keep it that loose. Let's close on two items about power and labor, because both are concrete and both matter to anyone in this field.

00:14:25 damra: GitLab first.

00:14:27 lenar: GitLab is laying off 350 people — about fourteen percent of its workforce — and exiting 22 countries, as it pivots to position itself as an AI-focused enterprise software development platform. That's per Dean Seal at the Wall Street Journal.

00:14:42 damra: Exiting 22 countries is the detail that tells you what kind of cut this is. That's a retreat from go-to-market in whole regions, not a hiring-pace correction. A company that sells DevOps tooling worldwide deciding it can only afford to sell in a much smaller footprint — that's a company under real margin pressure dressing a contraction in the language of an AI pivot.

00:15:05 lenar: I'll be careful and generous here, though. "AI pivot" gets used as a euphemism for layoffs, and sometimes it genuinely is one. But GitLab does have a real product problem to solve. If coding agents are doing more of the work inside the development loop, a platform built around human-authored merge requests and pipelines has to change shape or get disintermediated. The pivot might be both real strategy and cover for a hard quarter.

00:15:29 damra: Sure, both can be true. But 350 people lost their jobs today, and I'd push back on the framing that softens that into a strategy story. The plainer version is: a public company missed its numbers, cut fourteen percent, and the AI narrative is what you tell investors so the stock doesn't fall further.

00:15:48 lenar: That's fair, and we don't have their earnings detail in front of us to say which weighs more. The last item is a regulatory one, out of the UK. The Competition and Markets Authority ruled that UK media websites can now block Google from using their articles in its AI search summaries — the overviews that answer your query without you clicking through. Reported by Joanna Partridge and Dan Milmo at the Guardian.

00:16:12 damra: And the mechanism is the whole story there. Until now, the brutal part for publishers was that opting out of AI summaries meant opting out of Google search entirely — the same crawler, the same index. You couldn't say "index me but don't summarize me." If the CMA is forcing Google to split those, that's a real change in leverage.

00:16:34 lenar: That's what I'd want to confirm in the full ruling — whether it's an actual separate opt-out that preserves your search ranking, or a softer commitment. The publishers' complaint was specific: AI overviews dropped their click-through traffic and the revenue that comes with it. A toggle that lets you keep the search visibility while denying the summary is the only version that actually helps them.

00:16:54 damra: And it's a UK-only ruling for now, from one regulator. But it's the first time I've seen a competition authority treat "index for search" and "ingest for AI answers" as two different permissions a publisher can grant separately. If that distinction holds, it travels — every other regulator watching the same traffic collapse now has a template.

00:17:14 lenar: One thread runs under all of it. The EU wants to own more of the stack and can't build the halls fast enough. The US can't pour its own concrete on schedule. China priced a model to move and raised energy money to back it. And a UK regulator started prying apart permissions that everyone treated as one. None of it is the model getting smarter. All of it is about who controls the capacity, the energy, and the rights underneath the models. That's what I'm watching into Thursday.

00:17:41 damra: And the data center number is the one I'll be checking against. If sixty percent of 2027 capacity still isn't in the ground by the fall, every sovereignty proposal and every pricing move is operating on compute that doesn't exist yet. I'd rather track the steel than the speeches.