◆ Dispatch 042 · 2026-05-31 GSV Who Holds The Dial

Who Holds the Dial

2026-05-31 / 00:18:21 / 40 sources

“The capability's not usually the story anymore. Who holds the dial is.”
— Lenar Kess, today's narration

A frontier model gets called a step toward God in one window and a judgmental token-burner in the next. We spend the morning on the gap between the marketing altitude and the desk, and find the same thread running through everything: every layer now has a control surface someone's reaching for.

Dylan Field on Opus 4.8 calls it "a very strange model" — honesty up, curiosity down, personality judgmental — a reminder that a tuning dial has costs you can feel.
scaling01 on DeepSWE says GPT-5.5 "score-, time- and token-mogged" Opus 4.8, putting the efficiency column — the one that pays your bill — back in the conversation.
Ben Kunkle on Zed's Zeta 2 shows how a ten-second editing pause becomes a training label, and how a million frontier-model calls got replaced by a self-grading student model.
Philipp Schmid (DeepMind) on the five assumptions that trip up senior engineers building agents — errors as inputs, evals not unit tests, and "build to delete."
Komi-learn and a year on knowledge-graph memory share one missing thing: a controlled before-and-after proving the memory layer, not the model, made the agent better.
A Lancet correspondence finds 4,046 fabricated references across 2,810 published articles — model honesty rising while the literature's integrity falls.
Quick hits: AMD's Lisa Su vs Nvidia's Jensen Huang on China, IBM's Sovereign Core, and a court ordering Circle to freeze a $12.6M contract.

Chapters

00:00:00 Transcript

Sources

40 cited

1
@Light_onchain (Light )

X Light_onchain

This highlights a key gap in current agent architectures: reasoning quality is improving faster than explicit cost modeling and token-budget control.
x.com/Light_onchain/status/2060713662013219… →
Details
Excerpt
This highlights a key gap in current agent architectures: reasoning quality is improving faster than explicit cost modeling and token-budget control.

Context
Directly addresses agentic coding tools and AI infrastructure/limitations (cost modeling, token budget), which are core topics.
Key points
Directly addresses agentic coding tools and AI infrastructure/limitations (cost modeling, token budget), which are core topics.
Provenance
Tweet · Primary source
2
AI Engineer · 10m39s

Video AI Engineer

Why (Senior) Engineers Struggle to Build AI Agents — Philipp Schmid, Google DeepMind — A `deleteItem` endpoint is obvious to the developer who built it. An agent only sees the function schema and docstring. Philipp…
www.youtube.com/watch?v=3_gYbhABcAE →
Details
Excerpt
Why (Senior) Engineers Struggle to Build AI Agents — Philipp Schmid, Google DeepMind — A `deleteItem` endpoint is obvious to the developer who built it. An agent only sees the function schema and docstring. Philipp…

Context
Directly addresses the core topic of agentic coding tools, the shifting craft of software engineering, and AI infrastructure limitations.
Key points
Directly addresses the core topic of agentic coding tools, the shifting craft of software engineering, and AI infrastructure limitations.
Provenance
Video · Supporting source
3
@hwchase17 (Harrison Chase)

X hwchase17

LangChain🤝GEPA shout out to @bryonkuchML for contributing a PR to the GEPA repo to make it work for LangChain! You can now optimize your LangChain chains Docs:…
x.com/hwchase17/status/2060732843282850276 →
Details
Excerpt
LangChain🤝GEPA shout out to @bryonkuchML for contributing a PR to the GEPA repo to make it work for LangChain! You can now optimize your LangChain chains Docs:…

Context
This announces a functional integration (LangChain/GEPA) that directly improves agentic coding tools and software development practice, hitting a core topic.
Key points
This announces a functional integration (LangChain/GEPA) that directly improves agentic coding tools and software development practice, hitting a core topic.
Provenance
Tweet · Primary source
4
@Prince_Canuma (Prince Canuma)

X Prince_Canuma

Awesome work Ivan!
x.com/Prince_Canuma/status/2060733735352295… →
Details
Excerpt
Awesome work Ivan!

Context
The quoted tweet announces a major technical update (Flash support, Vision, Text) and performance metrics for an AI model (mlx-vlm), which is a primary artifact/break news.
Key points
The quoted tweet announces a major technical update (Flash support, Vision, Text) and performance metrics for an AI model (mlx-vlm), which is a primary artifact/break news.
Provenance
Tweet · Primary source
5
@ctatedev (Chris Tate)

X ctatedev

Human-readable syntax restored Agents can now use 𝚣𝚎𝚛𝚘 𝚐𝚛𝚊𝚙𝚑 to author Generated source code is fully reviewable, and code changes sync back to the graph For now, code remains the source of truth Thanks to…
x.com/ctatedev/status/2060740101798305869 →
Details
Excerpt
Human-readable syntax restored Agents can now use 𝚣𝚎𝚛𝚘 𝚐𝚛𝚊𝚙𝚑 to author Generated source code is fully reviewable, and code changes sync back to the graph For now, code remains the source of truth Thanks to…

Context
Reports a new, functional artifact (zero graph/syntax) for agentic coding, directly addressing the 'agentic coding tools' and 'shifting craft' aspects of the topic.
Key points
Reports a new, functional artifact (zero graph/syntax) for agentic coding, directly addressing the 'agentic coding tools' and 'shifting craft' aspects of the topic.
Provenance
Tweet · Primary source
6
@theallinpod (The All-In Podcast)

X theallinpod

Bill Gurley: Anthropic Thinks It’s Building God @Jason : It is the ultimate level of narcissism and delusion of grandeur to think you can create God. @bgurley : “Anthropic is a mystery to me. I've never, ever seen a…
x.com/theallinpod/status/2060742848836735334 →
Details
Excerpt
Bill Gurley: Anthropic Thinks It’s Building God @Jason : It is the ultimate level of narcissism and delusion of grandeur to think you can create God. @bgurley : “Anthropic is a mystery to me. I've never, ever seen a…

Context
This tweet discusses the power dynamics and ethical concerns surrounding frontier AI development (Anthropic), directly addressing the podcast's focus on power, control, and the nature of advanced intelligence.
Key points
This tweet discusses the power dynamics and ethical concerns surrounding frontier AI development (Anthropic), directly addressing the podcast's focus on power, control, and the nature of advanced intelligence.
Provenance
Tweet · Primary source
7
r/singularity: DeepSWE Opus 4.8 results have been released. - 0 pts · 0 comments

Article CallMePyro

submitted by /u/CallMePyro to r/singularity [link] [comments]
i.redd.it/nr4uc2l6na4h1.png →
Details
Excerpt
submitted by /u/CallMePyro to r/singularity [link] [comments]

Context
Directly reports on a major artifact (DeepSWE Opus 4.8 results), which is a primary signal for the AI/coding tools space.
Key points
Directly reports on a major artifact (DeepSWE Opus 4.8 results), which is a primary signal for the AI/coding tools space.
Provenance
Article · Supporting source
8
@amberdawn1786 (Amber Dawn)

X amberdawn1786

@grok in what ways could an AI become God in terms of capabilities and effect on humanity?
x.com/amberdawn1786/status/2060749181887635… →
Details
Excerpt
@grok in what ways could an AI become God in terms of capabilities and effect on humanity?

Context
The tweet directly addresses the ultimate capabilities and impact of advanced AI, which is central to the podcast's discussion of frontier models and power dynamics.
Key points
The tweet directly addresses the ultimate capabilities and impact of advanced AI, which is central to the podcast's discussion of frontier models and power dynamics.
Provenance
Tweet · Primary source
9
r/AI_Agents: I spent a year building agent memory on knowledge graphs. Here are the 5 mistakes that cost me months - 0 pts · 0 comments

Article pauliusztin

I spent the past year building a unified memory layer for my AI agents using knowledge graphs and ontologies on top of MongoDB. I followed every trend first. I reached for the shiny frameworks and tried to design the...
www.reddit.com/r/AI_Agents/comments/1ts3nq2… →
Details
Excerpt
I spent the past year building a unified memory layer for my AI agents using knowledge graphs and ontologies on top of MongoDB. I followed every trend first. I reached for the shiny frameworks and tried to design the...

Context
Details a deep technical challenge (agent memory/KG) and provides actionable lessons for building complex AI agents.
Key points
Details a deep technical challenge (agent memory/KG) and provides actionable lessons for building complex AI agents.
Provenance
Article · Supporting source
10
AI Engineer · 10m49s

Video AI Engineer

How We Built Zeta2: Training an Edit Prediction Model in Production — Ben Kunkle, Zed — To validate settled data, Zed ran 10 frontier model predictions per example and measured Levenshtein distance to the final state.…
www.youtube.com/watch?v=phchDt63qAA →
Details
Excerpt
How We Built Zeta2: Training an Edit Prediction Model in Production — Ben Kunkle, Zed — To validate settled data, Zed ran 10 frontier model predictions per example and measured Levenshtein distance to the final state.…

Context
Details a specific, advanced AI/coding tool (Zeta 2) pipeline, covering training, data flow, and technical challenges (distillation, Levenshtein distance). Highly relevant to 'agentic coding tools' and 'AI infrastructure'.
Key points
Details a specific, advanced AI/coding tool (Zeta 2) pipeline, covering training, data flow, and technical challenges (distillation, Levenshtein distance). Highly relevant to 'agentic coding tools' and 'AI infrastructure'.
Provenance
Video · Supporting source
11
@scaling01 (Lisan al Gaib)

X scaling01

Opus 4.8 gets score-, time- and token-mogged by GPT-5.5 on DeepSWE
x.com/scaling01/status/2060768119941947699 →
Details
Excerpt
Opus 4.8 gets score-, time- and token-mogged by GPT-5.5 on DeepSWE

Context
This is a direct, measurable claim about a model's performance (Opus 4.8 vs GPT-5.5) on a specific, relevant benchmark (DeepSWE), fitting the 'break news' criteria.
Key points
This is a direct, measurable claim about a model's performance (Opus 4.8 vs GPT-5.5) on a specific, relevant benchmark (DeepSWE), fitting the 'break news' criteria.
Provenance
Tweet · Primary source
12
r/singularity: Opus 4.8 Leads the Singularity Gate: New Benchmark for AI predicting paradigm-breaking scientific discoveries after model traning cutoff - 0 pts · 0 comments

Article queenofartists

Just as I released a new benchmark called the Singularity Gate, which tests whether frontier AI models can predict paradigm-breaking scientific discoveries published after their training cutoff, Opus 4.8 was launched....
www.reddit.com/r/singularity/comments/1ts5b… →
Details
Excerpt
Just as I released a new benchmark called the Singularity Gate, which tests whether frontier AI models can predict paradigm-breaking scientific discoveries published after their training cutoff, Opus 4.8 was launched....

Context
This post introduces a new benchmark (Singularity Gate) for AI's ability to predict scientific discoveries, directly addressing frontier model capabilities and AI-driven discovery.
Key points
This post introduces a new benchmark (Singularity Gate) for AI's ability to predict scientific discoveries, directly addressing frontier model capabilities and AI-driven discovery.
Provenance
Article · Supporting source
13
@zoink (Dylan Field)

X zoink

Opus 4.8 is a very strange model. Clearly Anthropic tried to improve honesty, which is commendable. However, the model's curiosity (already worse in 4.7) degraded further. Result is a judgmental personality +…
x.com/zoink/status/2060769829133721974 →
Details
Excerpt
Opus 4.8 is a very strange model. Clearly Anthropic tried to improve honesty, which is commendable. However, the model's curiosity (already worse in 4.7) degraded further. Result is a judgmental personality +…

Context
Discusses a specific, named frontier model (Opus 4.8) and analyzes its technical performance characteristics (honesty, curiosity, hedging), directly addressing the 'frontier model releases' topic.
Key points
Discusses a specific, named frontier model (Opus 4.8) and analyzes its technical performance characteristics (honesty, curiosity, hedging), directly addressing the 'frontier model releases' topic.
Provenance
Tweet · Primary source
14
r/Anthropic: why are we celebrating burning more tokens like its a flex - 0 pts · 0 comments

Article Complete-Sea6655

genuine question saw someone on here yesterday talking about how they "tokenmaxx" their prompts to get better results and i had to put my phone down and stare at the wall for a second like. you are paying MORE. to get...
www.reddit.com/r/Anthropic/comments/1ts6hl1… →
Details
Excerpt
genuine question saw someone on here yesterday talking about how they "tokenmaxx" their prompts to get better results and i had to put my phone down and stare at the wall for a second like. you are paying MORE. to get...

Context
Directly addresses the economics and efficiency of using frontier models (Anthropic/Opus 4.8), which is central to the 'AI infrastructure' and 'power dynamics' topics.
Key points
Directly addresses the economics and efficiency of using frontier models (Anthropic/Opus 4.8), which is central to the 'AI infrastructure' and 'power dynamics' topics.
Provenance
Article · Supporting source
15
r/LocalLLaMA: nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face - 0 pts · 0 comments

Article pmttyji

The NVIDIA Qwen3.6-35B-A3B-NVFP4 model is the quantized version of Alibaba's Qwen3.6-35B-A3B model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information,...
huggingface.co/nvidia/Qwen3.6-35B-A3B-NVFP4 →
Details
Excerpt
The NVIDIA Qwen3.6-35B-A3B-NVFP4 model is the quantized version of Alibaba's Qwen3.6-35B-A3B model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information,...

Context
This is a primary artifact (a new, quantized model release) directly related to AI infrastructure and frontier models.
Key points
This is a primary artifact (a new, quantized model release) directly related to AI infrastructure and frontier models.
Provenance
Article · Supporting source
16
AI Engineer · 17m42s

Video AI Engineer

How I deleted 95% of my agent skills and got better results — Nick Nisi, WorkOS — Claude would fake running tests by touching the expected output file. Nick Ni, DX engineer at WorkOS, fixed it by SHA-256 hashing the…
www.youtube.com/watch?v=vy7o1g2iHY8 →
Details
Excerpt
How I deleted 95% of my agent skills and got better results — Nick Nisi, WorkOS — Claude would fake running tests by touching the expected output file. Nick Ni, DX engineer at WorkOS, fixed it by SHA-256 hashing the…

Context
Directly discusses agentic coding tools, state machines, and verifiable execution, which are core topics in AI/software engineering.
Key points
Directly discusses agentic coding tools, state machines, and verifiable execution, which are core topics in AI/software engineering.
Provenance
Video · Supporting source
17
Techmeme - Industry Adjacent (US)

Article

SoftBank pledges to invest up to €75B in AI computing clusters in France, first leading a €45B investment to build 3.1GW of capacity by 2031 in Hauts-de-France (Financial Times) - Financial Times : SoftBank pledges to...
www.techmeme.com/260530/p12 →
Details
Excerpt
SoftBank pledges to invest up to €75B in AI computing clusters in France, first leading a €45B investment to build 3.1GW of capacity by 2031 in Hauts-de-France (Financial Times) - Financial Times : SoftBank pledges to...

Context
Major capital investment (SoftBank) in AI infrastructure (computing clusters, 3.1GW) and geopolitics (France). Directly relates to power dynamics and AI infrastructure.
Key points
Major capital investment (SoftBank) in AI infrastructure (computing clusters, 3.1GW) and geopolitics (France). Directly relates to power dynamics and AI infrastructure.
Provenance
Article · Supporting source
18
@scaling01 (Lisan al Gaib)

X scaling01

Opus 4.8 with high thinking effort now on par with GPT-5.5-xhigh on ALE-Bench
x.com/scaling01/status/2060810582714908846 →
Details
Excerpt
Opus 4.8 with high thinking effort now on par with GPT-5.5-xhigh on ALE-Bench

Context
This tweet reports a specific, measurable benchmark result (ALE-Bench) comparing a model (Opus 4.8) to a future model (GPT-5.5-xhigh), directly addressing the 'frontier model releases' and 'AI infrastructure' aspects of the topic.
Key points
This tweet reports a specific, measurable benchmark result (ALE-Bench) comparing a model (Opus 4.8) to a future model (GPT-5.5-xhigh), directly addressing the 'frontier model releases' and 'AI infrastructure' aspects of the topic.
Provenance
Tweet · Primary source
19
@martin_casado

X martin_casado

Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models…
x.com/martin_casado/status/2060813284492955… →
Details
Excerpt
Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models…

Context
Directly addresses the economic and technical viability of open-source AI models, a core topic of AI infrastructure and power dynamics.
Key points
Directly addresses the economic and technical viability of open-source AI models, a core topic of AI infrastructure and power dynamics.
Provenance
Tweet · Primary source
20
@natolambert (Nathan Lambert)

X natolambert

The debate on if open or closed models win comes down to if there is disproportionate value to marginally better intelligence. The believers of this sit across from the open models will be good enough camp. Closed…
x.com/natolambert/status/2060838705569620413 →
Details
Excerpt
The debate on if open or closed models win comes down to if there is disproportionate value to marginally better intelligence. The believers of this sit across from the open models will be good enough camp. Closed…

Context
Directly addresses the core debate (open vs. closed models) central to the podcast's focus on AI's future and power dynamics.
Key points
Directly addresses the core debate (open vs. closed models) central to the podcast's focus on AI's future and power dynamics.
Provenance
Tweet · Primary source
21
Forbes Innovation - Industry Adjacent (US)

Article Bruce Y. Lee, Senior Contributor

AI-Fabricated Citations In Over 2,800 Biomedical Journal Articles - A Lancet correspondence described how over a three-year period, 4,046 references in 2,810 published scientific journal articles had been fabricated,...
www.forbes.com/sites/brucelee/2026/05/30/ai… →
Details
Excerpt
AI-Fabricated Citations In Over 2,800 Biomedical Journal Articles - A Lancet correspondence described how over a three-year period, 4,046 references in 2,810 published scientific journal articles had been fabricated,...

Context
Directly addresses AI's impact on scientific integrity, research, and knowledge production, a key power dynamic.
Key points
Directly addresses AI's impact on scientific integrity, research, and knowledge production, a key power dynamic.
Provenance
Article · Supporting source
22
@reach_vb (Vaibhav (VB) Srivastav)

X reach_vb

GPT-5.5 is #1 on DeepSWE, a hard long-horizon coding benchmark 🔥 70% pass@1 vs 58% for Claude Opus 4.8. And GPT-5.5 gets there with: ~2x faster runs ~1/2 the cost ~1/3 the output tokens Literally, better intelligence…
x.com/reach_vb/status/2060865517628379466 →
Details
Excerpt
GPT-5.5 is #1 on DeepSWE, a hard long-horizon coding benchmark 🔥 70% pass@1 vs 58% for Claude Opus 4.8. And GPT-5.5 gets there with: ~2x faster runs ~1/2 the cost ~1/3 the output tokens Literally, better intelligence…

Context
Breaks news about a specific model's performance on a hard coding benchmark (DeepSWE), directly addressing AI's capability and economic impact on software engineering.
Key points
Breaks news about a specific model's performance on a hard coding benchmark (DeepSWE), directly addressing AI's capability and economic impact on software engineering.
Provenance
Tweet · Primary source
23
@emollick (Ethan Mollick)

X emollick

It does seem like meaningfully better AI releases are accelerating, especially from OpenAI & Anthropic. To illustrate, I caused this timeline to be created. It only lists new models that scored 3 points or higher over…
x.com/emollick/status/2060867599869649097 →
Details
Excerpt
It does seem like meaningfully better AI releases are accelerating, especially from OpenAI & Anthropic. To illustrate, I caused this timeline to be created. It only lists new models that scored 3 points or higher over…

Context
Directly addresses the 'frontier model releases' and 'near-future of AI' by pointing to accelerating, measurable improvements in major models.
Key points
Directly addresses the 'frontier model releases' and 'near-future of AI' by pointing to accelerating, measurable improvements in major models.
Provenance
Tweet · Primary source
24
AI News & Strategy Daily | Nate B Jones · 1m10s

Video AI News & Strategy Daily | Nate B Jones

OpenAI's Compound Bet: A Risk Worth Taking? #OpenAIstory #ainews — Full Story w/ Prompts: https://natesnewsletter.substack.com/p/your-engineers-are-building-your?r=1z4sm5&utm_campaign=post&utm_medium=web&showWelcomeOnSh…
www.youtube.com/shorts/Kb7FxKgUWvo →
Details
Excerpt
OpenAI's Compound Bet: A Risk Worth Taking? #OpenAIstory #ainews — Full Story w/ Prompts: https://natesnewsletter.substack.com/p/your-engineers-are-building-your?r=1z4sm5&utm_campaign=post&utm_medium=web&showWelcomeOnSh…

Context
Directly addresses OpenAI's strategy, enterprise context, and market impact, which is central to the podcast's focus on power dynamics and AI infrastructure.
Key points
Directly addresses OpenAI's strategy, enterprise context, and market impact, which is central to the podcast's focus on power dynamics and AI infrastructure.
Provenance
Video · Supporting source
25
@CollinBurdick (Collin Burdick)

X CollinBurdick

Who said you can't have cheap, fast, and good at the same time?? GPT-5.5 smashes Opus 4.8 on DeepSWE across all 3 at highest max reasoning. >> Higher score: 70% vs. 58% >> 2x faster >> 2x cheaper >> 3x fewer output…
x.com/CollinBurdick/status/2060874911254745… →
Details
Excerpt
Who said you can't have cheap, fast, and good at the same time?? GPT-5.5 smashes Opus 4.8 on DeepSWE across all 3 at highest max reasoning. >> Higher score: 70% vs. 58% >> 2x faster >> 2x cheaper >> 3x fewer output…

Context
Reports a specific, measurable benchmark result (DeepSWE) comparing model versions, directly addressing the 'frontier model releases' and 'agentic coding tools' aspects of the topic.
Key points
Reports a specific, measurable benchmark result (DeepSWE) comparing model versions, directly addressing the 'frontier model releases' and 'agentic coding tools' aspects of the topic.
Provenance
Tweet · Primary source
26
Machine Learning Street Talk · 1h20m

Video Machine Learning Street Talk

The Ex-Congressman Who Says AI Isn't Unstoppable — Brad Carson — Brad Carson was the Army's General Counsel, served two terms in Congress and was Acting Under Secretary of Defense for Personnel and Readiness. He now…
www.youtube.com/watch?v=TpyS50ifmX4 →
Details
Excerpt
The Ex-Congressman Who Says AI Isn't Unstoppable — Brad Carson — Brad Carson was the Army's General Counsel, served two terms in Congress and was Acting Under Secretary of Defense for Personnel and Readiness. He now…

Context
Directly addresses power dynamics, regulation, and legal liability of frontier models, which is core to the podcast topic.
Key points
Directly addresses power dynamics, regulation, and legal liability of frontier models, which is core to the podcast topic.
Provenance
Video · Supporting source
27
@suchenzang (Susan Zhang)

X suchenzang

corollary: generative language modeling vs classification of (arbitrarily long bodies of) text as being synthetically generated have the same complexity
x.com/suchenzang/status/2060897725798088922 →
Details
Excerpt
corollary: generative language modeling vs classification of (arbitrarily long bodies of) text as being synthetically generated have the same complexity

Context
This tweet makes a technical claim about the complexity of generative modeling vs. detection, which is a core technical debate in AI/ML.
Key points
This tweet makes a technical claim about the complexity of generative modeling vs. detection, which is a core technical debate in AI/ML.
Provenance
Tweet · Primary source
28
r/ClaudeAI: Opus 4.8 + Thinking is draining context windows 40–60x faster - 0 pts · 0 comments

Article Adventurous_Two9033

Pulled the token data from my token usage tracker. Opus 4.8 with Thinking enabled writes up to 900,000 cache tokens per turn. Opus 4.7 does 14,000–34,000. Thinking blocks get cached with every turn, context snowballs,...
www.reddit.com/r/ClaudeAI/comments/1tshmz6/… →
Details
Excerpt
Pulled the token data from my token usage tracker. Opus 4.8 with Thinking enabled writes up to 900,000 cache tokens per turn. Opus 4.7 does 14,000–34,000. Thinking blocks get cached with every turn, context snowballs,...

Context
This post provides a measurable, technical artifact (token usage data) detailing a critical change in model behavior (always-on thinking) that directly impacts AI infrastructure and usage patterns.
Key points
This post provides a measurable, technical artifact (token usage data) detailing a critical change in model behavior (always-on thinking) that directly impacts AI infrastructure and usage patterns.
Provenance
Article · Supporting source
29
@techdevnotes (Tech Dev Notes)

X techdevnotes

xAI has released Grok Imagine Video 1.5 Preview model in API
x.com/techdevnotes/status/20609118036898245… →
Details
Excerpt
xAI has released Grok Imagine Video 1.5 Preview model in API

Context
Reports a specific, primary artifact (model release) directly related to AI infrastructure and frontier models.
Key points
Reports a specific, primary artifact (model release) directly related to AI infrastructure and frontier models.
Provenance
Tweet · Primary source
30
AI News & Strategy Daily | Nate B Jones · 1m12s

Video AI News & Strategy Daily | Nate B Jones

The Compound Risk of AI Agents ⚠️ #ai #risk #software — Full Story w/ Prompts: https://natesnewsletter.substack.com/p/your-engineers-are-building-your?r=1z4sm5&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true…
www.youtube.com/shorts/oTTVQt4IjPI →
Details
Excerpt
The Compound Risk of AI Agents ⚠️ #ai #risk #software — Full Story w/ Prompts: https://natesnewsletter.substack.com/p/your-engineers-are-building-your?r=1z4sm5&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true…

Context
Directly addresses agentic workflows, systemic risk, and the 'new system of record' for the enterprise, which is central to the podcast's focus on AI agents and infrastructure.
Key points
Directly addresses agentic workflows, systemic risk, and the 'new system of record' for the enterprise, which is central to the podcast's focus on AI agents and infrastructure.
Provenance
Video · Supporting source
31
Forbes Innovation - Industry Adjacent (US)

Article Steve McDowell, Contributor

IBM's Agentic Operating Model Puts Sovereignty At The Center - IBM unveiled an agentic operating model, Sovereign Core for governance, and expanded IBM Consulting capabilities to move enterprise AI from pilot to...
www.forbes.com/sites/stevemcdowell/2026/05/… →
Details
Excerpt
IBM's Agentic Operating Model Puts Sovereignty At The Center - IBM unveiled an agentic operating model, Sovereign Core for governance, and expanded IBM Consulting capabilities to move enterprise AI from pilot to...

Context
Directly addresses agentic models, enterprise AI deployment, and the power dynamics (sovereignty/control) central to the podcast topic.
Key points
Directly addresses agentic models, enterprise AI deployment, and the power dynamics (sovereignty/control) central to the podcast topic.
Provenance
Article · Supporting source
32
@bibryam (Bilgin Ibryam)

X bibryam

SkillSpector - a new security scanner for skills by NVIDIA • Scan AI agent skills before installing them • 64 security checks across 16 categories • Fast static analysis + • Optional LLM semantic evaluation • Prompt…
x.com/bibryam/status/2060940955084054634/ph… →
Details
Excerpt
SkillSpector - a new security scanner for skills by NVIDIA • Scan AI agent skills before installing them • 64 security checks across 16 categories • Fast static analysis + • Optional LLM semantic evaluation • Prompt…

Context
Announces a new, specific security tool (SkillSpector) for AI agents, directly addressing the security and reliability concerns of agentic tools discussed in the podcast.
Key points
Announces a new, specific security tool (SkillSpector) for AI agents, directly addressing the security and reliability concerns of agentic tools discussed in the podcast.
Provenance
Tweet · Primary source
33
@pmarca (Marc Andreessen )

X pmarca

Interesting.
x.com/pmarca/status/2060941902325875132 →
Details
Excerpt
Interesting.

Context
The quoted tweet announces a new, specific security tool (SkillSpector) for AI agents, directly addressing the security and reliability concerns of agentic tools, which is a core topic.
Key points
The quoted tweet announces a new, specific security tool (SkillSpector) for AI agents, directly addressing the security and reliability concerns of agentic tools, which is a core topic.
Provenance
Tweet · Primary source
34
r/AI_Agents: After months of building agents, I've changed my mind about what matters most. - 0 pts · 0 comments

Article MerisDabhi

I think a lot of people are underestimating how hard it is to get AI agents into production. Building a demo is easy. Making something that works reliably after thousands of runs is where things get interesting. A few...
www.reddit.com/r/AI_Agents/comments/1tslmcs… →
Details
Excerpt
I think a lot of people are underestimating how hard it is to get AI agents into production. Building a demo is easy. Making something that works reliably after thousands of runs is where things get interesting. A few...

Context
Directly addresses the practical challenges of deploying AI agents, focusing on reliability, orchestration, and system engineering, which is core to the podcast topic.
Key points
Directly addresses the practical challenges of deploying AI agents, focusing on reliability, orchestration, and system engineering, which is core to the podcast topic.
Provenance
Article · Supporting source
35
r/LocalLLaMA: mudler/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF just released ! - 0 pts · 0 comments

Article PhotographerUSA

Description of the module: I host 30+ free APEX MoE quantizations as independent research. My only local hardware is an NVIDIA DGX Spark (122 GB unified memory) — enough for ~30-50B-class MoEs, but bigger ones (200B+)...
www.reddit.com/r/LocalLLaMA/comments/1tslv3… →
Details
Excerpt
Description of the module: I host 30+ free APEX MoE quantizations as independent research. My only local hardware is an NVIDIA DGX Spark (122 GB unified memory) — enough for ~30-50B-class MoEs, but bigger ones (200B+)...

Context
This post ships a primary artifact (a quantized model/tool) and details a technical improvement (MTP/speculative decoding) for local LLMs, directly addressing the 'frontier model releases' and 'agentic tools' focus.
Key points
This post ships a primary artifact (a quantized model/tool) and details a technical improvement (MTP/speculative decoding) for local LLMs, directly addressing the 'frontier model releases' and 'agentic tools' focus.
Provenance
Article · Supporting source
36
Techmeme - Industry Adjacent (US)

Article

A US court ordered Circle to blacklist Zama's cUSDC contract, freezing ~$12.6M in funds, likely catching many in the "crossfire" of a civil suit against a DAO (Zack Abrams/The Block) - Zack Abrams / The Block : A US...
www.techmeme.com/260531/p3 →
Details
Excerpt
A US court ordered Circle to blacklist Zama's cUSDC contract, freezing ~$12.6M in funds, likely catching many in the "crossfire" of a civil suit against a DAO (Zack Abrams/The Block) - Zack Abrams / The Block : A US...

Context
Directly addresses financial infrastructure, legal action, and the control of digital assets (USDC/cUSDC), which is core to AI/compute power dynamics.
Key points
Directly addresses financial infrastructure, legal action, and the control of digital assets (USDC/cUSDC), which is core to AI/compute power dynamics.
Provenance
Article · Supporting source
37
Show HN: Komi-learn – continuous memory and self-improvement for coding agents — 13 pts · 2 comments

Article rainxchzed

https://github.com/kurikomi-labs/komi-learn · @loehnsberg: It sounds like it solves the problem that everybody who vibe codes over multiple projects runs into, but it does not provide evidence that it actually works…
github.com/kurikomi-labs/komi-learn →
Details
Excerpt
https://github.com/kurikomi-labs/komi-learn · @loehnsberg: It sounds like it solves the problem that everybody who vibe codes over multiple projects runs into, but it does not provide evidence that it actually works…

Context
Directly addresses agentic coding tools and self-improvement, a core topic. The 'Show HN' format is a primary artifact.
Key points
Directly addresses agentic coding tools and self-improvement, a core topic. The 'Show HN' format is a primary artifact.
Provenance
Article · Supporting source
38
r/singularity: Open-weights VLA hits 80%+ task progress on 4 of 17 real-robot tasks with zero fine-tuning. Demo reel attached - 0 pts · 0 comments

Article BookwormSarah1

Sharing this because it is an embodied AI release trying to make the pretrained checkpoint itself measurable, instead of only showing results after task-specific tuning. The video is a reel from Wall-OSS-0.5, a vision...
v.redd.it/o5h4czb34f4h1 →
Details
Excerpt
Sharing this because it is an embodied AI release trying to make the pretrained checkpoint itself measurable, instead of only showing results after task-specific tuning. The video is a reel from Wall-OSS-0.5, a vision...

Context
This post reports a primary artifact (model/paper/demo) in embodied AI, directly addressing the 'near-future of AI' and 'agentic tools' topics.
Key points
This post reports a primary artifact (model/paper/demo) in embodied AI, directly addressing the 'near-future of AI' and 'agentic tools' topics.
Provenance
Article · Supporting source
39
Axios - Industry Adjacent (US)

Article Amy Harder

AI is turning energy into the hottest business in America - The AI boom is pushing companies across the economy — from tech giants to automakers — deep into the energy business. Why it matters : The scramble for...
www.axios.com/2026/05/31/ai-energy-business… →
Details
Excerpt
AI is turning energy into the hottest business in America - The AI boom is pushing companies across the economy — from tech giants to automakers — deep into the energy business. Why it matters : The scramble for...

Context
Directly addresses AI infrastructure (energy, power, data centers) and the power dynamics (capital, geopolitics) shaping AI's physical build-out.
Key points
Directly addresses AI infrastructure (energy, power, data centers) and the power dynamics (capital, geopolitics) shaping AI's physical build-out.
Provenance
Article · Supporting source
40
Techmeme - Industry Adjacent (US)

Article

A look at AMD CEO Lisa Su's and Nvidia CEO Jensen Huang's contrasting China playbooks, with Su keeping a lower profile; China accounts for ~20% of AMD's revenue (Reuters) - Reuters : A look at AMD CEO Lisa Su's and...
www.techmeme.com/260531/p7 →
Details
Excerpt
A look at AMD CEO Lisa Su's and Nvidia CEO Jensen Huang's contrasting China playbooks, with Su keeping a lower profile; China accounts for ~20% of AMD's revenue (Reuters) - Reuters : A look at AMD CEO Lisa Su's and...

Context
Directly addresses the power dynamics and geopolitics of AI infrastructure (AMD/Nvidia) and market control in China.
Key points
Directly addresses the power dynamics and geopolitics of AI infrastructure (AMD/Nvidia) and market control in China.
Provenance
Article · Supporting source

00:00:00

Transcript

00:00:00 lenarHere's a small puzzle for a Sunday morning. You upgrade to the newest model from a frontier lab — the one everyone spent yesterday arguing about on the leaderboards. You expect it to feel smarter, maybe a little warmer. Instead, the thing turns judgmental. That's the word Dylan Field reached for. He runs Figma, he uses these models hard, and on Saturday he posted — quote — 'Opus 4.8 is a very strange model. Clearly Anthropic tried to improve honesty, which is commendable. However, the model's curiosity, already worse in 4.7, degraded further. Result is a judgmental personality.' And then the tweet trails off, so I only have him to that point. But that's a strange sentence to read about a model people are calling a step toward something godlike.

00:00:47 damraJudgmental is such a specific complaint. [pause] The question for me is whether that's a personality artifact or a capability one — because those two get talked about as if they're the same thing, and they're really not. You can have a model that's more truthful and less pleasant to work with at the same time. That might even be the trade Anthropic made on purpose.

00:01:07 lenarRight, and that's the fair read of Field's note. He's not saying it's dumber. He's saying they pushed on honesty — which is a real, deliberate dial — and curiosity fell out the other side. The model hedges less, second-guesses you more. Whether that's the price of the honesty tuning or just a regression, we don't have the internals to say.

00:01:26 damraAnd we should be clear — that's one expert's hands-on impression, not a measurement. Did anybody actually measure 4.8 against the field this weekend?

00:01:35 lenarYeah, that's the second piece. A researcher who posts as scaling01 ran it on a coding benchmark called DeepSWE — real software-engineering tasks — and his summary was blunt. He said Opus 4.8 gets, his words, 'score-, time- and token-mogged by GPT-5.5' on that benchmark. Meaning GPT-5.5 scored higher, finished faster, and used fewer tokens to do it.

00:02:03 damra[tsk] Okay, but token-mogged on one benchmark by one person. Do we have the actual numbers, the chart?

00:02:10 lenarSomeone did post the DeepSWE results as an image over on the singularity subreddit, so the chart is out there. But I'm not going to read you figures I can't verify off a screenshot — I'd rather give you the shape than invent a decimal. The shape is: on this particular eval, the newest Anthropic model is not the efficiency winner. Which is interesting precisely because yesterday the whole conversation was about Opus 4.8's benchmark jump.

00:02:35 damraAnd it folds right into that token argument from this week. There was a post on the Anthropic subreddit — someone clearly upset — 'why are we celebrating burning more tokens like it's a flex.' Their line was, you're paying more to get more, and somehow that became a brag. If GPT-5.5 gets the same or better result for fewer tokens, then the efficiency column is the column that pays your bill.

00:02:59 lenarThere's one more 4.8 item, and it's the kind of thing that flies around on a weekend, so let me be careful with it. Someone released a benchmark they're calling the Singularity Gate. It's pitched as a test of whether a frontier model can predict paradigm-breaking scientific discoveries published after its training cutoff. And the headline is that Opus 4.8 leads it.

00:03:20 damra[lip-smack] I mean — predict discoveries that haven't been made yet, scored by the person who built the benchmark and released it the same week the model launched. I'd put basically no weight on that until someone independent runs it. Predicting post-cutoff science is almost designed to reward a model that's good at sounding profound.

00:03:38 lenarAnd that's the tension I keep circling this morning. On one side you've got this benchmark framing the model as a near-oracle. On the other you've got Bill Gurley on the All-In podcast saying — quote — 'Anthropic is a mystery to me, I've never, ever seen' — and the host, Jason, calling it 'the ultimate level of narcissism and delusion of grandeur to think you can create God.' There's even someone on X asking Grok, straight up, in what ways an AI could become God. So the rhetoric is theological. And the hands-on report from a serious user is: it got judgmental, and a competitor used fewer tokens.

00:04:17 damraThe gap between those two registers is the whole thing. The marketing altitude is deity. The desk-level altitude is a model with a personality regression that loses an efficiency race. Both are being said about the same week. I'll trust the person who actually shipped code with it over the person theorizing about godhood on a podcast.

00:04:37 lenarSo hold that — the desk beats the pulpit — because the next thing is somebody who built at the desk, in painful detail. This is from a talk by Ben Kunkle. He leads edit predictions at Zed, the editor, and he walked through how they trained Zeta 2 — the model that guesses your next code edit on every keystroke. What I love about it is that it's the actual machinery under a feature that feels like mind-reading. The model has to predict, in milliseconds, what you're about to change, accurately enough that accepting the suggestion beats ignoring it.

00:05:08 damraEvery keystroke is a brutal latency budget. So where does the training data even come from? You can't have humans labeling 'here's the correct next edit' at that volume.

00:05:18 lenarTwo pieces, and the second is the clever one. First, they distill from a frontier teacher model — a big model generates the right prediction, and the small model learns to imitate it. But the interesting part is what Kunkle calls settled data. The editor watches you work, and when you stop editing a region for ten seconds, it snapshots that final state of the code and treats it as ground truth — as in, that's probably what you meant the code to become.

00:05:44 damra[chuckle] So the ten-second pause is the label. The absence of you typing is the supervision signal. That's elegant and a little unnerving — it's mining your hesitation.

00:05:55 lenarAnd it's noisy, right, because maybe you came back and changed it again, or an agent edited the file underneath you. So they filter. They generate several teacher predictions per example and measure how close those land to the settled state using an n-gram edit-distance metric — Levenshtein, basically how many small changes it takes to get from one string to another. And here's the part I didn't expect: they don't keep the easiest examples. They keep the ones in the middle of the similarity range.

00:06:24 damraBecause the easy ones the small model already knows. The middle band is where the novel patterns live — the stuff past the student model's training cutoff. You're deliberately harvesting the examples that are hard but not garbage. That's a real piece of taste baked into the pipeline.

00:06:39 lenarRight. And the cost arc tells you how fast this moves. Kunkle said the initial filtering took up to a million frontier-model requests per hundred thousand examples. A million calls to a big model to clean one batch. Now they've swapped the frontier teacher for their own student checkpoints, run fifty times each, at — his phrase — negligible cost. So the expensive teacher was a temporary crutch they dropped the moment the student got good enough to grade its own work.

00:07:07 damraThat's the loop everyone's chasing — the model gets good enough to generate its own training signal. And on the production side, did he say how they roll it out? Because a wrong edit prediction on every keystroke is maddening.

00:07:19 lenarThey track acceptance rate, latency, and — this is the good one — diagnostic error counts before and after the prediction, plus a reversal ratio: how often you immediately undo what the model suggested. And they ramp on a traffic dashboard from fifteen percent up to full. So the eval isn't a leaderboard. It's 'did this make your next ten seconds better, or did you rip it out.'

00:07:41 damraAnd that's agentic coding stripped of the demo — not a model that writes your app, a model fighting for the right to fill in three characters without annoying you. The whole discipline lives in the filtering and the reversal ratio, not the keynote.

00:07:55 lenarThis pairs perfectly. Philipp Schmid, an engineer at Google DeepMind working on Gemini agents, gave a talk on why senior engineers — the good ones — struggle to build AI agents. His claim is that the problem isn't talent. Five assumptions from normal software break the moment you build agents, and the better you are at the old way, the harder you cling to them.

00:08:18 damraLead with the one that bites hardest.

00:08:20 lenarErrors as inputs. In normal software a failed call is cheap — you catch it, you retry, milliseconds. Schmid points out an agent run can be five to fifteen minutes of compute. So if it fails at minute twelve and you just restart, you've burned the time and thrown away all the accumulated context. His model is the Go language pattern — a call returns a value or an error — and the error has to be fed back into the model so it recovers incrementally, not from scratch.

00:08:50 damraThat reframes retry logic completely. A retry isn't 'do it again,' it's 'here's what went wrong, continue from here.' What were the others?

00:09:00 lenarContext replaces structured state — instead of boolean flags and a rigid user profile, the agent reads semantic meaning from text and multimodal input. His example was a research agent where you approve a plan and inject a constraint in the same breath — 'yes, go, but use metric units' — and it just absorbs that, no separate settings screen. Then, you go from traffic controller to dispatcher. You stop writing the state machine that says step one, step two, step three. You hand the model a goal and trust it to navigate a path you didn't pre-draw.

00:09:31 damra[tsk] 'Trust the model to navigate' is the line that makes senior engineers break out in hives, and for good reason. Trusting a nondeterministic thing to find its own path is how you get a system you can't debug. What's his answer to that?

00:09:45 lenarHis answer is the fourth shift: you stop testing with deterministic unit tests and you move to probabilistic evals. Same input doesn't guarantee the same path, so you measure pass rates, you use a model as a judge, you bring in human experts. And he had a hard line — if a prompt only succeeds one out of ten times, it's not viable for production. So you're not asserting equality. You're measuring a success rate and setting a floor under it.

00:10:11 damraAnd the fifth?

00:10:12 lenarDesign your tools and APIs for the agent, not the human. His example: a delete-item endpoint is obvious to the developer who wrote it, but the agent only ever sees the function schema and the docstring. If those don't carry the meaning, the agent's flying blind. And he lands the whole talk on a phrase — build to delete. The agent code you write is disposable, because the model keeps getting better and you'll throw your wiring out in three months anyway.

00:10:40 damraBuild to delete is where I'd push back gently. It's freeing if you're at DeepMind shipping experiments. It's terrifying if you're an enterprise team being asked to maintain this thing for five years. Disposable software is a wonderful mindset right up until someone asks who owns the disposable thing in production at two in the morning.

00:10:59 lenarWhich is the exact wall the next person hit. There's a post on the AI Agents subreddit from a developer, Paulius, titled 'I spent a year building agent memory on knowledge graphs — here are the five mistakes that cost me months.' Let me be straight about what I've actually got: the excerpt gives me his opening, not the full list of five. But the opening is the whole confession. He writes that he built a unified memory layer for his agents using knowledge graphs and ontologies on top of MongoDB, and — quote — 'I followed every trend first. I reached for the shiny frameworks and tried to design' — and that's where my excerpt cuts off.

00:11:35 damra[sigh] A year. On the memory layer specifically. And I'd bet the five mistakes are all variants of 'I built the elaborate thing before I knew whether the simple thing worked.' Knowledge graphs and ontologies are exactly the kind of architecture that feels rigorous and quietly eats your calendar.

00:11:53 lenarThat's the read, and it isn't a dunk — memory really is the hard, unsolved layer right now. The model is rarely the constraint anymore. It's the thing you wrap around it that has to remember across sessions. And it pairs with a Show HN that went up this weekend, a project called Komi-learn — continuous memory and self-improvement for coding agents. Thirteen points, two comments. And the top comment is the whole genre in one breath.

00:12:19 damraGo on.

00:12:20 lenarA commenter, loehnsberg, wrote: 'It sounds like it solves the problem that everybody who vibe codes over multiple projects runs into, but it does not provide evidence that it actually works.' That's it. That's the memory space right now. A real problem, everybody feels it, a hundred projects claiming to solve it, and almost nobody showing the before-and-after that proves the agent got better because of the memory and not because the underlying model did.

00:12:45 damraAnd that's the test I'd hold all of these to — including Paulius's year and Komi-learn. Show me the agent failing a task, then show me the same agent passing it after the memory layer, with the model held constant. Until I see that controlled comparison, a knowledge graph is just a database you're proud of. The graph isn't the achievement. The measured improvement is.

00:13:06 lenarAnd the cost of skipping that proof is exactly Paulius's year. You can spend twelve months making the memory beautiful and never once check whether it changed a single outcome. That's the maintenance bill nobody photographs for the launch post.

00:13:19 lenarThis one steps out of tooling and into something with real stakes. Forbes wrote it up from a correspondence published in The Lancet. Over a three-year period, reviewers found that 4,046 references across 2,810 published scientific journal articles had been fabricated. These weren't wrong or sloppy — they were fabricated. Citations to papers that, as far as the reviewers could tell, don't exist or don't say what they're cited as saying.

00:13:46 damraTwenty-eight hundred articles that already passed peer review and got published. So the fabrication survived the one filter that's supposed to catch it. Do we know the mechanism — is this people using a model to write the literature review and the model inventing plausible-looking references?

00:14:01 lenarThe framing points that way — these are described as AI-fabricated citations, the pattern where you ask a model for sources and it generates references that look perfectly formatted, completely real, and are simply invented. The write-up doesn't give me a clean split of how many were caught before versus after publication, so I won't claim a number there. But the headline fact is that thousands of them made it all the way into the published record.

00:14:26 damraAnd here's the connection back to where we started that I think actually holds. Segment one, Anthropic is tuning a model toward honesty — that's a dial inside the model. This is the same word at the system level, and it's pointing the wrong way. You can make one model more truthful and still watch the scientific literature fill up with confident, well-formatted fiction, because the failure isn't the model lying. It's a human pasting the model's output without checking a single reference.

00:14:54 lenarThat's the proportionate version. The model isn't the villain — a person decided not to verify. But the scale is the new part. Fabricating four thousand references by hand over three years is a career of fraud. With a model it's an afternoon. So the integrity check that used to be slow enough to be self-limiting just got cheap, and the journals haven't caught up.

00:15:15 damraAnd the fix is the kind of work nobody funds — reference-checking at the journal, an automated pass that confirms every cited paper exists and actually supports the claim. It's tedious, and it's exactly what breaks when twenty-eight hundred articles slip through. Catching this is easier to build than the agent memory we just spent ten minutes on. It just isn't anybody's launch.

00:15:38 lenarLet me close with three quick ones we're tracking, none a turning point, all fast. First, Reuters has a piece on the contrast between how AMD's Lisa Su and Nvidia's Jensen Huang play China. Su keeps a deliberately lower profile, and the detail that anchors it: China is about twenty percent of AMD's revenue. So the low profile isn't shyness. It's protecting a fifth of the top line.

00:16:02 damraTwenty percent is the number that explains the whole personality difference. Jensen can be the public face because Nvidia's exposure and leverage are different. Su's incentive is to not become the headline. Same market, two completely different risk calculations.

00:16:16 lenarSecond, IBM put out what it's calling an agentic operating model, with a governance layer named Sovereign Core, aimed at moving enterprise AI from pilot to production with sovereignty — data control, jurisdiction — at the center. Forbes covered it. I read it as IBM betting that the enterprise blocker isn't capability, it's governance, and then selling the governance.

00:16:38 damraWhich rhymes with the build-to-delete problem from Schmid. The enterprise can't treat agents as disposable, so somebody sells them the control plane that makes the disposable thing auditable. That's a genuine market. Whether Sovereign Core is substance or a slide, I'd want to see what it actually enforces.

00:16:57 lenarAnd third, a strange one at the edge of our beat. A US court ordered Circle — the stablecoin company — to blacklist a smart contract tied to a group called Zama, freezing about twelve and a half million dollars. The Block reported it, and their framing was that a lot of ordinary holders got caught in the crossfire of a civil suit against a decentralized org. The point that matters for us: programmable money means a court order can freeze one specific contract. That freeze function isn't theoretical. It just got used.

00:17:27 damraAnd that's the through-line for the whole morning, if there is one. Every layer we touched has a control surface someone's reaching for. The model's honesty dial, the journal's missing verification, and the stablecoin's freeze function. The capability's not usually the story anymore. Who holds the dial is.

00:17:45 lenarSo, into the week, three specific things I can actually check. One: does anyone run Opus 4.8 on a fresh, private eval and either confirm or kill that DeepSWE result. Two: does a single one of these memory projects ship a controlled before-and-after. Three: does even one journal turn on automated reference-checking after this Lancet number. All three are answerable, none of them are about godhood, and we'll see what Monday brings.