◆ Dispatch 043 · 2026-05-31 GSV Who Holds The Dial
Who Holds the Dial
“The capability's not usually the story anymore. Who holds the dial is.”
— Lenar Kess, today's narration
A frontier model gets called a step toward God in one window and a judgmental token-burner in the next. We spend the morning on the gap between the marketing altitude and the desk, and find the same thread running through everything: every layer now has a control surface someone's reaching for.
- Dylan Field on Opus 4.8 calls it "a very strange model" — honesty up, curiosity down, personality judgmental — a reminder that a tuning dial has costs you can feel.
- scaling01 on DeepSWE says GPT-5.5 "score-, time- and token-mogged" Opus 4.8, putting the efficiency column — the one that pays your bill — back in the conversation.
- Ben Kunkle on Zed's Zeta 2 shows how a ten-second editing pause becomes a training label, and how a million frontier-model calls got replaced by a self-grading student model.
- Philipp Schmid (DeepMind) on the five assumptions that trip up senior engineers building agents — errors as inputs, evals not unit tests, and "build to delete."
- Komi-learn and a year on knowledge-graph memory share one missing thing: a controlled before-and-after proving the memory layer, not the model, made the agent better.
- A Lancet correspondence finds 4,046 fabricated references across 2,810 published articles — model honesty rising while the literature's integrity falls.
- Quick hits: AMD's Lisa Su vs Nvidia's Jensen Huang on China, IBM's Sovereign Core, and a court ordering Circle to freeze a $12.6M contract.
Chapters
- 00:00:00 Transcript
Sources
40 cited
1
@Light_onchain (Light)
X Light_onchain
This highlights a key gap in current agent architectures: reasoning quality is improving faster than explicit cost modeling and token-budget control.
x.com/Light_onchain/status/2060713662013219…
- Context
- Directly addresses agentic coding tools and AI infrastructure/limitations (cost modeling, token budget), which are core topics.
- Provenance
- Tweet · Primary source
2
AI Engineer · 10m39s
Video AI Engineer
Why (Senior) Engineers Struggle to Build AI Agents — Philipp Schmid, Google DeepMind — A `deleteItem` endpoint is obvious to the developer who built it. An agent only sees the function schema and docstring. Philipp…
www.youtube.com/watch?v=3_gYbhABcAE
- Context
- Directly addresses the core topic of agentic coding tools, the shifting craft of software engineering, and AI infrastructure limitations.
- Provenance
- Video · Supporting source
3
@hwchase17 (Harrison Chase)
X hwchase17
LangChain🤝GEPA shout out to @bryonkuchML for contributing a PR to the GEPA repo to make it work for LangChain! You can now optimize your LangChain chains Docs:…
x.com/hwchase17/status/2060732843282850276
- Context
- This announces a functional integration (LangChain/GEPA) that directly improves agentic coding tools and software development practice, hitting a core topic.
- Provenance
- Tweet · Primary source
4
@Prince_Canuma (Prince Canuma)
X Prince_Canuma
Awesome work Ivan!
x.com/Prince_Canuma/status/2060733735352295…
- Context
- The quoted tweet announces a major technical update (Flash support, Vision, Text) and performance metrics for mlx-vlm, a primary artifact and breaking news.
- Provenance
- Tweet · Primary source
5
@ctatedev (Chris Tate)
X ctatedev
Human-readable syntax restored Agents can now use 𝚣𝚎𝚛𝚘 𝚐𝚛𝚊𝚙𝚑 to author Generated source code is fully reviewable, and code changes sync back to the graph For now, code remains the source of truth Thanks to…
x.com/ctatedev/status/2060740101798305869
- Context
- Reports a new, functional artifact (zero graph/syntax) for agentic coding, directly addressing the 'agentic coding tools' and 'shifting craft' aspects of the topic.
- Provenance
- Tweet · Primary source
6
@theallinpod (The All-In Podcast)
X theallinpod
Bill Gurley: Anthropic Thinks It’s Building God @Jason : It is the ultimate level of narcissism and delusion of grandeur to think you can create God. @bgurley : “Anthropic is a mystery to me. I've never, ever seen a…
x.com/theallinpod/status/2060742848836735334
- Context
- This tweet discusses the power dynamics and ethical concerns surrounding frontier AI development (Anthropic), directly addressing the podcast's focus on power, control, and the nature of advanced intelligence.
- Provenance
- Tweet · Primary source
7
r/singularity: DeepSWE Opus 4.8 results have been released. - 0 pts · 0 comments
Article CallMePyro
submitted by /u/CallMePyro to r/singularity [link] [comments]
i.redd.it/nr4uc2l6na4h1.png
- Context
- Directly reports on a major artifact (DeepSWE Opus 4.8 results), which is a primary signal for the AI/coding tools space.
- Provenance
- Article · Supporting source
8
@amberdawn1786 (Amber Dawn)
X amberdawn1786
@grok in what ways could an AI become God in terms of capabilities and effect on humanity?
x.com/amberdawn1786/status/2060749181887635…
- Context
- The tweet directly addresses the ultimate capabilities and impact of advanced AI, which is central to the podcast's discussion of frontier models and power dynamics.
- Provenance
- Tweet · Primary source
9
r/AI_Agents: I spent a year building agent memory on knowledge graphs. Here are the 5 mistakes that cost me months - 0 pts · 0 comments
Article pauliusztin
I spent the past year building a unified memory layer for my AI agents using knowledge graphs and ontologies on top of MongoDB. I followed every trend first. I reached for the shiny frameworks and tried to design the...
www.reddit.com/r/AI_Agents/comments/1ts3nq2…
- Context
- Details a deep technical challenge (agent memory/KG) and provides actionable lessons for building complex AI agents.
- Provenance
- Article · Supporting source
10
AI Engineer · 10m49s
Video AI Engineer
How We Built Zeta2: Training an Edit Prediction Model in Production — Ben Kunkle, Zed — To validate settled data, Zed ran 10 frontier model predictions per example and measured Levenshtein distance to the final state.…
www.youtube.com/watch?v=phchDt63qAA
- Context
- Details a specific, advanced AI/coding tool (Zeta 2) pipeline, covering training, data flow, and technical challenges (distillation, Levenshtein distance). Highly relevant to 'agentic coding tools' and 'AI infrastructure'.
- Provenance
- Video · Supporting source
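The validation step named in this entry (several model predictions per example, each scored by Levenshtein distance to the final state) can be sketched roughly as follows. This is a minimal illustration, not Zed's pipeline: the function names `levenshtein` and `closest_prediction`, and the choice to keep the single nearest prediction, are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the working row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def closest_prediction(predictions: list[str], final_state: str) -> tuple[int, str]:
    """Return (distance, prediction) for the prediction nearest the final state.

    Hypothetical helper: how the distances are aggregated is an assumption.
    """
    return min((levenshtein(p, final_state), p) for p in predictions)
```

A low minimum distance suggests the recorded final state is something a strong model would plausibly have produced, which is one way to read "validating settled data"; any threshold would have to come from the talk itself.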
11
@scaling01 (Lisan al Gaib)
X scaling01
Opus 4.8 gets score-, time- and token-mogged by GPT-5.5 on DeepSWE
x.com/scaling01/status/2060768119941947699
- Context
- This is a direct, measurable claim about a model's performance (Opus 4.8 vs GPT-5.5) on a specific, relevant benchmark (DeepSWE), fitting the 'break news' criteria.
- Provenance
- Tweet · Primary source
12
r/singularity: Opus 4.8 Leads the Singularity Gate: New Benchmark for AI predicting paradigm-breaking scientific discoveries after model traning cutoff - 0 pts · 0 comments
Article queenofartists
Just as I released a new benchmark called the Singularity Gate, which tests whether frontier AI models can predict paradigm-breaking scientific discoveries published after their training cutoff, Opus 4.8 was launched....
www.reddit.com/r/singularity/comments/1ts5b…
- Context
- This post introduces a new benchmark (Singularity Gate) for AI's ability to predict scientific discoveries, directly addressing frontier model capabilities and AI-driven discovery.
- Provenance
- Article · Supporting source
13
@zoink (Dylan Field)
X zoink
Opus 4.8 is a very strange model. Clearly Anthropic tried to improve honesty, which is commendable. However, the model's curiosity (already worse in 4.7) degraded further. Result is a judgmental personality +…
x.com/zoink/status/2060769829133721974
- Context
- Discusses a specific, named frontier model (Opus 4.8) and analyzes its technical performance characteristics (honesty, curiosity, hedging), directly addressing the 'frontier model releases' topic.
- Provenance
- Tweet · Primary source
14
r/Anthropic: why are we celebrating burning more tokens like its a flex - 0 pts · 0 comments
Article Complete-Sea6655
genuine question saw someone on here yesterday talking about how they "tokenmaxx" their prompts to get better results and i had to put my phone down and stare at the wall for a second like. you are paying MORE. to get...
www.reddit.com/r/Anthropic/comments/1ts6hl1…
- Context
- Directly addresses the economics and efficiency of using frontier models (Anthropic/Opus 4.8), which is central to the 'AI infrastructure' and 'power dynamics' topics.
- Provenance
- Article · Supporting source
15
r/LocalLLaMA: nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face - 0 pts · 0 comments
Article pmttyji
The NVIDIA Qwen3.6-35B-A3B-NVFP4 model is the quantized version of Alibaba's Qwen3.6-35B-A3B model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information,...
huggingface.co/nvidia/Qwen3.6-35B-A3B-NVFP4
- Context
- This is a primary artifact (a new, quantized model release) directly related to AI infrastructure and frontier models.
- Provenance
- Article · Supporting source
16
AI Engineer · 17m42s
Video AI Engineer
How I deleted 95% of my agent skills and got better results - Nick Nisi, WorkOS - Claude would fake running tests by touching the expected output file. Nick Nisi, DX engineer at WorkOS, fixed it by SHA-256 hashing the…
www.youtube.com/watch?v=vy7o1g2iHY8
- Context
- Directly discusses agentic coding tools, state machines, and verifiable execution, which are core topics in AI/software engineering.
- Provenance
- Video · Supporting source
17
Techmeme - Industry Adjacent (US)
Article
SoftBank pledges to invest up to €75B in AI computing clusters in France, first leading a €45B investment to build 3.1GW of capacity by 2031 in Hauts-de-France (Financial Times) - Financial Times : SoftBank pledges to...
www.techmeme.com/260530/p12
- Context
- Major capital investment (SoftBank) in AI infrastructure (computing clusters, 3.1GW) and geopolitics (France). Directly relates to power dynamics and AI infrastructure.
- Provenance
- Article · Supporting source
18
@scaling01 (Lisan al Gaib)
X scaling01
Opus 4.8 with high thinking effort now on par with GPT-5.5-xhigh on ALE-Bench
x.com/scaling01/status/2060810582714908846
- Context
- This tweet reports a specific, measurable benchmark result (ALE-Bench) comparing a model (Opus 4.8) to a competing model (GPT-5.5-xhigh), directly addressing the 'frontier model releases' and 'AI infrastructure' aspects of the topic.
- Provenance
- Tweet · Primary source
19
@martin_casado
X martin_casado
Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models…
x.com/martin_casado/status/2060813284492955…
- Context
- Directly addresses the economic and technical viability of open-source AI models, a core topic of AI infrastructure and power dynamics.
- Provenance
- Tweet · Primary source
20
@natolambert (Nathan Lambert)
X natolambert
The debate on if open or closed models win comes down to if there is disproportionate value to marginally better intelligence. The believers of this sit across from the open models will be good enough camp. Closed…
x.com/natolambert/status/2060838705569620413
- Context
- Directly addresses the core debate (open vs. closed models) central to the podcast's focus on AI's future and power dynamics.
- Provenance
- Tweet · Primary source
21
Forbes Innovation - Industry Adjacent (US)
Article Bruce Y. Lee, Senior Contributor
AI-Fabricated Citations In Over 2,800 Biomedical Journal Articles - A Lancet correspondence described how over a three-year period, 4,046 references in 2,810 published scientific journal articles had been fabricated,...
www.forbes.com/sites/brucelee/2026/05/30/ai…
- Context
- Directly addresses AI's impact on scientific integrity, research, and knowledge production, a key power dynamic.
- Provenance
- Article · Supporting source
22
@reach_vb (Vaibhav (VB) Srivastav)
X reach_vb
GPT-5.5 is #1 on DeepSWE, a hard long-horizon coding benchmark 🔥 70% pass@1 vs 58% for Claude Opus 4.8. And GPT-5.5 gets there with: ~2x faster runs ~1/2 the cost ~1/3 the output tokens Literally, better intelligence…
x.com/reach_vb/status/2060865517628379466
- Context
- Breaks news about a specific model's performance on a hard coding benchmark (DeepSWE), directly addressing AI's capability and economic impact on software engineering.
- Provenance
- Tweet · Primary source
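Pass rate and price combine into a cost per solved task, which is the figure the efficiency argument rests on. A back-of-envelope sketch using the approximate numbers quoted in this entry; normalizing Opus 4.8's cost to 1.0, reading "~1/2 the cost" as a per-attempt cost, and the helper name `cost_per_solve` are all assumptions.

```python
def cost_per_solve(cost_per_attempt: float, pass_rate: float) -> float:
    """Expected spend per solved task when each attempt passes at pass_rate."""
    return cost_per_attempt / pass_rate


# Opus 4.8 is the baseline (cost normalized to 1.0, an assumption).
opus = cost_per_solve(cost_per_attempt=1.0, pass_rate=0.58)
# GPT-5.5 at roughly half the cost and 70% pass@1, per the tweet.
gpt = cost_per_solve(cost_per_attempt=0.5, pass_rate=0.70)

relative = gpt / opus
print(f"GPT-5.5 cost per solve is about {relative:.2f}x that of Opus 4.8")
```

Under those assumptions, a roughly 2x price gap widens to about 2.4x per solved task once the higher pass rate is counted.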
23
@emollick (Ethan Mollick)
X emollick
It does seem like meaningfully better AI releases are accelerating, especially from OpenAI & Anthropic. To illustrate, I caused this timeline to be created. It only lists new models that scored 3 points or higher over…
x.com/emollick/status/2060867599869649097
- Context
- Directly addresses the 'frontier model releases' and 'near-future of AI' by pointing to accelerating, measurable improvements in major models.
- Provenance
- Tweet · Primary source
24
AI News & Strategy Daily | Nate B Jones · 1m10s
Video AI News & Strategy Daily | Nate B Jones
OpenAI's Compound Bet: A Risk Worth Taking? #OpenAIstory #ainews — Full Story w/ Prompts: https://natesnewsletter.substack.com/p/your-engineers-are-building-your?r=1z4sm5&utm_campaign=post&utm_medium=web&showWelcomeOnSh…
www.youtube.com/shorts/Kb7FxKgUWvo
- Context
- Directly addresses OpenAI's strategy, enterprise context, and market impact, which is central to the podcast's focus on power dynamics and AI infrastructure.
- Provenance
- Video · Supporting source
25
@CollinBurdick (Collin Burdick)
X CollinBurdick
Who said you can't have cheap, fast, and good at the same time?? GPT-5.5 smashes Opus 4.8 on DeepSWE across all 3 at highest max reasoning. >> Higher score: 70% vs. 58% >> 2x faster >> 2x cheaper >> 3x fewer output…
x.com/CollinBurdick/status/2060874911254745…
- Context
- Reports a specific, measurable benchmark result (DeepSWE) comparing model versions, directly addressing the 'frontier model releases' and 'agentic coding tools' aspects of the topic.
- Provenance
- Tweet · Primary source
26
Machine Learning Street Talk · 1h20m
Video Machine Learning Street Talk
The Ex-Congressman Who Says AI Isn't Unstoppable — Brad Carson — Brad Carson was the Army's General Counsel, served two terms in Congress and was Acting Under Secretary of Defense for Personnel and Readiness. He now…
www.youtube.com/watch?v=TpyS50ifmX4
- Context
- Directly addresses power dynamics, regulation, and legal liability of frontier models, which is core to the podcast topic.
- Provenance
- Video · Supporting source
27
@suchenzang (Susan Zhang)
X suchenzang
corollary: generative language modeling vs classification of (arbitrarily long bodies of) text as being synthetically generated have the same complexity
x.com/suchenzang/status/2060897725798088922
- Context
- This tweet makes a technical claim about the complexity of generative modeling vs. detection, which is a core technical debate in AI/ML.
- Provenance
- Tweet · Primary source
28
r/ClaudeAI: Opus 4.8 + Thinking is draining context windows 40–60x faster - 0 pts · 0 comments
Article Adventurous_Two9033
Pulled the token data from my token usage tracker. Opus 4.8 with Thinking enabled writes up to 900,000 cache tokens per turn. Opus 4.7 does 14,000–34,000. Thinking blocks get cached with every turn, context snowballs,...
www.reddit.com/r/ClaudeAI/comments/1tshmz6/…
- Context
- This post provides a measurable, technical artifact (token usage data) detailing a critical change in model behavior (always-on thinking) that directly impacts AI infrastructure and usage patterns.
- Provenance
- Article · Supporting source
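The per-turn figures quoted in this entry imply a ratio that can be checked directly. A quick sketch using only the numbers in the excerpt; the variable names are illustrative, and treating "up to 900,000" as a single peak value is an assumption.

```python
opus_48_peak = 900_000             # "up to 900,000 cache tokens per turn"
opus_47_range = (14_000, 34_000)   # "14,000–34,000" for Opus 4.7

ratio_high = opus_48_peak / opus_47_range[0]  # peak vs. low end of 4.7
ratio_low = opus_48_peak / opus_47_range[1]   # peak vs. high end of 4.7

print(f"{ratio_low:.0f}x to {ratio_high:.0f}x")
```

On these figures the peak-to-range ratio spans roughly 26x to 64x, so the headline's 40–60x sits inside that band without matching its ends; the post may rest on averages that the excerpt does not show.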
29
@techdevnotes (Tech Dev Notes)
X techdevnotes
xAI has released Grok Imagine Video 1.5 Preview model in API
x.com/techdevnotes/status/20609118036898245…
- Context
- Reports a specific, primary artifact (model release) directly related to AI infrastructure and frontier models.
- Provenance
- Tweet · Primary source
30
AI News & Strategy Daily | Nate B Jones · 1m12s
Video AI News & Strategy Daily | Nate B Jones
The Compound Risk of AI Agents ⚠️ #ai #risk #software — Full Story w/ Prompts: https://natesnewsletter.substack.com/p/your-engineers-are-building-your?r=1z4sm5&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true…
www.youtube.com/shorts/oTTVQt4IjPI
- Context
- Directly addresses agentic workflows, systemic risk, and the 'new system of record' for the enterprise, which is central to the podcast's focus on AI agents and infrastructure.
- Provenance
- Video · Supporting source
31
Forbes Innovation - Industry Adjacent (US)
Article Steve McDowell, Contributor
IBM's Agentic Operating Model Puts Sovereignty At The Center - IBM unveiled an agentic operating model, Sovereign Core for governance, and expanded IBM Consulting capabilities to move enterprise AI from pilot to...
www.forbes.com/sites/stevemcdowell/2026/05/…
- Context
- Directly addresses agentic models, enterprise AI deployment, and the power dynamics (sovereignty/control) central to the podcast topic.
- Provenance
- Article · Supporting source
32
@bibryam (Bilgin Ibryam)
X bibryam
SkillSpector - a new security scanner for skills by NVIDIA • Scan AI agent skills before installing them • 64 security checks across 16 categories • Fast static analysis + • Optional LLM semantic evaluation • Prompt…
x.com/bibryam/status/2060940955084054634/ph…
- Context
- Announces a new, specific security tool (SkillSpector) for AI agents, directly addressing the security and reliability concerns of agentic tools discussed in the podcast.
- Provenance
- Tweet · Primary source
33
@pmarca (Marc Andreessen)
X pmarca
Interesting.
x.com/pmarca/status/2060941902325875132
- Context
- The quoted tweet announces a new, specific security tool (SkillSpector) for AI agents, directly addressing the security and reliability concerns of agentic tools, which is a core topic.
- Provenance
- Tweet · Primary source
34
r/AI_Agents: After months of building agents, I've changed my mind about what matters most. - 0 pts · 0 comments
Article MerisDabhi
I think a lot of people are underestimating how hard it is to get AI agents into production. Building a demo is easy. Making something that works reliably after thousands of runs is where things get interesting. A few...
www.reddit.com/r/AI_Agents/comments/1tslmcs…
- Context
- Directly addresses the practical challenges of deploying AI agents, focusing on reliability, orchestration, and system engineering, which is core to the podcast topic.
- Provenance
- Article · Supporting source
35
r/LocalLLaMA: mudler/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF just released ! - 0 pts · 0 comments
Article PhotographerUSA
Description of the module: I host 30+ free APEX MoE quantizations as independent research. My only local hardware is an NVIDIA DGX Spark (122 GB unified memory) — enough for ~30-50B-class MoEs, but bigger ones (200B+)...
www.reddit.com/r/LocalLLaMA/comments/1tslv3…
- Context
- This post ships a primary artifact (a quantized model/tool) and details a technical improvement (MTP/speculative decoding) for local LLMs, directly addressing the 'frontier model releases' and 'agentic tools' focus.
- Provenance
- Article · Supporting source
-
36
Techmeme - Industry Adjacent (US)
Article
A US court ordered Circle to blacklist Zama's cUSDC contract, freezing ~$12.6M in funds, likely catching many in the "crossfire" of a civil suit against a DAO (Zack Abrams/The Block) - Zack Abrams / The Block : A US...
www.techmeme.com/260531/p3
- Context
- Directly addresses financial infrastructure, legal action, and the control of digital assets (USDC/cUSDC), which is core to AI/compute power dynamics.
- Provenance
- Article · Supporting source
-
37
Show HN: Komi-learn – continuous memory and self-improvement for coding agents — 13 pts · 2 comments
Article rainxchzed
https://github.com/kurikomi-labs/komi-learn · @loehnsberg: It sounds like it solves the problem that everybody who vibe codes over multiple projects runs into, but it does not provide evidence that it actually works…
github.com/kurikomi-labs/komi-learn
- Context
- Directly addresses agentic coding tools and self-improvement, a core topic. The 'Show HN' format is a primary artifact.
- Provenance
- Article · Supporting source
-
38
r/singularity: Open-weights VLA hits 80%+ task progress on 4 of 17 real-robot tasks with zero fine-tuning. Demo reel attached - 0 pts · 0 comments
Article BookwormSarah1
Sharing this because it is an embodied AI release trying to make the pretrained checkpoint itself measurable, instead of only showing results after task-specific tuning. The video is a reel from Wall-OSS-0.5, a vision...
v.redd.it/o5h4czb34f4h1
- Context
- This post reports a primary artifact (model/paper/demo) in embodied AI, directly addressing the 'near-future of AI' and 'agentic tools' topics.
- Provenance
- Article · Supporting source
-
39
Axios - Industry Adjacent (US)
Article Amy Harder
AI is turning energy into the hottest business in America - The AI boom is pushing companies across the economy — from tech giants to automakers — deep into the energy business. Why it matters : The scramble for...
www.axios.com/2026/05/31/ai-energy-business…
- Context
- Directly addresses AI infrastructure (energy, power, data centers) and the power dynamics (capital, geopolitics) shaping AI's physical build-out.
- Provenance
- Article · Supporting source
-
40
Techmeme - Industry Adjacent (US)
Article
A look at AMD CEO Lisa Su's and Nvidia CEO Jensen Huang's contrasting China playbooks, with Su keeping a lower profile; China accounts for ~20% of AMD's revenue (Reuters) - Reuters : A look at AMD CEO Lisa Su's and...
www.techmeme.com/260531/p7
- Context
- Directly addresses the power dynamics and geopolitics of AI infrastructure (AMD/Nvidia) and market control in China.
- Provenance
- Article · Supporting source
Transcript
00:00:00 lenarHere's a small puzzle for a Sunday morning. You upgrade to the newest model from a frontier lab — the one everyone spent yesterday arguing about on the leaderboards. You expect it to feel smarter, maybe a little warmer. Instead, the thing turns judgmental. That's the word Dylan Field reached for. He runs Figma, he uses these models hard, and on Saturday he posted — quote — 'Opus 4.8 is a very strange model. Clearly Anthropic tried to improve honesty, which is commendable. However, the model's curiosity, already worse in 4.7, degraded further. Result is a judgmental personality.' And then the tweet trails off, so I only have him to that point. But that's a strange sentence to read about a model people are calling a step toward something godlike.
00:00:47 damraJudgmental is such a specific complaint. The question for me is whether that's a personality artifact or a capability one, because those two get talked about as if they're the same thing, and they're really not. You can have a model that's more truthful and less pleasant to work with at the same time. That might even be the trade Anthropic made on purpose.
00:01:07 lenarRight, and that's the fair read of Field's note. He's not saying it's dumber. He's saying they pushed on honesty — which is a real, deliberate dial — and curiosity fell out the other side. The model hedges less, second-guesses you more. Whether that's the price of the honesty tuning or just a regression, we don't have the internals to say.
00:01:26 damraAnd we should be clear — that's one expert's hands-on impression, not a measurement. Did anybody actually measure 4.8 against the field this weekend?
00:01:35 lenarYeah, that's the second piece. A researcher who posts as scaling01 ran it on a coding benchmark called DeepSWE — real software-engineering tasks — and his summary was blunt. He said Opus 4.8 gets, his words, 'score-, time- and token-mogged by GPT-5.5' on that benchmark. Meaning GPT-5.5 scored higher, finished faster, and used fewer tokens to do it.
00:02:03 damraOkay, but token-mogged on one benchmark by one person. Do we have the actual numbers, the chart?
00:02:10 lenarSomeone did post the DeepSWE results as an image over on the singularity subreddit, so the chart is out there. But I'm not going to read you figures I can't verify off a screenshot — I'd rather give you the shape than invent a decimal. The shape is: on this particular eval, the newest Anthropic model is not the efficiency winner. Which is interesting precisely because yesterday the whole conversation was about Opus 4.8's benchmark jump.
00:02:35 damraAnd it folds right into that token argument from this week. There was a post on the Anthropic subreddit — someone clearly upset — 'why are we celebrating burning more tokens like it's a flex.' Their line was, you're paying more to get more, and somehow that became a brag. If GPT-5.5 gets the same or better result for fewer tokens, then the efficiency column is the column that pays your bill.
00:02:59 lenarThere's one more 4.8 item, and it's the kind of thing that flies around on a weekend, so let me be careful with it. Someone released a benchmark they're calling the Singularity Gate. It's pitched as a test of whether a frontier model can predict paradigm-breaking scientific discoveries published after its training cutoff. And the headline is that Opus 4.8 leads it.
00:03:20 damraI mean, predict discoveries that haven't been made yet, scored by the person who built the benchmark and released it the same week the model launched. I'd put basically no weight on that until someone independent runs it. Predicting post-cutoff science is almost designed to reward a model that's good at sounding profound.
00:03:38 lenarAnd that's the tension I keep circling this morning. On one side you've got this benchmark framing the model as a near-oracle. On the other you've got Bill Gurley on the All-In podcast saying — quote — 'Anthropic is a mystery to me, I've never, ever seen' — and the host, Jason, calling it 'the ultimate level of narcissism and delusion of grandeur to think you can create God.' There's even someone on X asking Grok, straight up, in what ways an AI could become God. So the rhetoric is theological. And the hands-on report from a serious user is: it got judgmental, and a competitor used fewer tokens.
00:04:17 damraThe gap between those two registers is the whole thing. The marketing altitude is deity. The desk-level altitude is a model with a personality regression that loses an efficiency race. Both are being said about the same week. I'll trust the person who actually shipped code with it over the person theorizing about godhood on a podcast.
00:04:37 lenarSo hold that — the desk beats the pulpit — because the next thing is somebody who built at the desk, in painful detail. This is from a talk by Ben Kunkle. He leads edit predictions at Zed, the editor, and he walked through how they trained Zeta 2 — the model that guesses your next code edit on every keystroke. What I love about it is that it's the actual machinery under a feature that feels like mind-reading. The model has to predict, in milliseconds, what you're about to change, accurately enough that accepting the suggestion beats ignoring it.
00:05:08 damraEvery keystroke is a brutal latency budget. So where does the training data even come from? You can't have humans labeling 'here's the correct next edit' at that volume.
00:05:18 lenarTwo pieces, and the second is the clever one. First, they distill from a frontier teacher model — a big model generates the right prediction, and the small model learns to imitate it. But the interesting part is what Kunkle calls settled data. The editor watches you work, and when you stop editing a region for ten seconds, it snapshots that final state of the code and treats it as ground truth — as in, that's probably what you meant the code to become.
00:05:44 damra[chuckle] So the ten-second pause is the label. The absence of you typing is the supervision signal. That's elegant and a little unnerving — it's mining your hesitation.
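The settled-data idea above can be sketched in a few lines. This is a minimal illustration, not Zed's implementation: the `RegionTracker` class, its method names, and the choice to pass timestamps in explicitly are all assumptions made here. Only the ten-second quiet window comes from the talk as described.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SETTLE_SECONDS = 10.0  # quiet window mentioned in the talk


@dataclass
class RegionTracker:
    """Hypothetical tracker for one code region; emits its text once edits stop."""

    text: str = ""
    last_edit_at: Optional[float] = None
    snapshots: List[str] = field(default_factory=list)

    def on_edit(self, new_text: str, now: float) -> None:
        # Any edit restarts the quiet window.
        self.text = new_text
        self.last_edit_at = now

    def on_tick(self, now: float) -> Optional[str]:
        """Return a settled snapshot if the region has been quiet long enough."""
        if self.last_edit_at is None:
            return None  # nothing pending, or already emitted
        if now - self.last_edit_at < SETTLE_SECONDS:
            return None  # still inside the quiet window
        self.last_edit_at = None  # emit at most once per quiet period
        self.snapshots.append(self.text)
        return self.text
```

Passing `now` in rather than reading a clock keeps the sketch deterministic. A snapshot produced this way is only a candidate label, which is why the filtering step that follows matters.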
00:05:55 lenarAnd it's noisy, right, because maybe you came back and changed it again, or an agent edited the file underneath you. So they filter. They generate several teacher predictions per example and measure how close those land to the settled state using an n-gram edit-distance metric — Levenshtein, basically how many small changes it takes to get from one string to another. And here's the part I didn't expect: they don't keep the easiest examples. They keep the ones in the middle of the similarity range.
00:06:24 damraBecause the easy ones the small model already knows. The middle band is where the novel patterns live — the stuff past the student model's training cutoff. You're deliberately harvesting the examples that are hard but not garbage. That's a real piece of taste baked into the pipeline.
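A rough sketch of the middle-band filter described above. Several things here are assumptions: the talk mentions an n-gram edit-distance metric, while this version uses plain character-level Levenshtein for simplicity; the `low` and `high` thresholds are made-up placeholders; and averaging the teacher scores is one possible way to combine them, not necessarily the one Zed uses.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes, or substitutions from a to b."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 for completely different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def keep_example(settled: str, teacher_predictions: List[str],
                 low: float = 0.4, high: float = 0.8) -> bool:
    """Keep examples whose mean teacher similarity falls in the middle band.

    The 0.4 and 0.8 bounds are illustrative assumptions, not values from the talk.
    """
    if not teacher_predictions:
        return False
    scores = [similarity(p, settled) for p in teacher_predictions]
    mean = sum(scores) / len(scores)
    return low <= mean <= high
```

With these placeholder bounds, an example the teacher reproduces exactly is dropped as too easy, and one it misses entirely is dropped as probable noise.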
00:06:39 lenarRight. And the cost arc tells you how fast this moves. Kunkle said the initial filtering took up to a million frontier-model requests per hundred thousand examples. A million calls to a big model to clean one batch. Now they've swapped the frontier teacher for their own student checkpoints, run fifty times each, at — his phrase — negligible cost. So the expensive teacher was a temporary crutch they dropped the moment the student got good enough to grade its own work.
00:07:07 damraThat's the loop everyone's chasing — the model gets good enough to generate its own training signal. And on the production side, did he say how they roll it out? Because a wrong edit prediction on every keystroke is maddening.
00:07:19 lenarThey track acceptance rate, latency, and — this is the good one — diagnostic error counts before and after the prediction, plus a reversal ratio: how often you immediately undo what the model suggested. And they ramp on a traffic dashboard from fifteen percent up to full. So the eval isn't a leaderboard. It's 'did this make your next ten seconds better, or did you rip it out.'
00:07:41 damraAnd that's agentic coding stripped of the demo — not a model that writes your app, a model fighting for the right to fill in three characters without annoying you. The whole discipline lives in the filtering and the reversal ratio, not the keynote.
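The production signals named above (acceptance rate, reversal ratio, diagnostic errors before and after) can be computed from a simple event log. This is a sketch under stated assumptions: the `PredictionEvent` fields and the `summarize` function are hypothetical names, and how "immediately undone" is detected is left outside the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PredictionEvent:
    """One shown prediction. Field names are illustrative assumptions."""

    accepted: bool
    reverted: bool       # the user undid the accepted edit shortly afterwards
    errors_before: int   # diagnostic error count before the prediction
    errors_after: int    # diagnostic error count after it was applied


def summarize(events: List[PredictionEvent]) -> Dict[str, float]:
    accepted = [e for e in events if e.accepted]
    reverted = [e for e in accepted if e.reverted]
    n_shown, n_accepted = len(events), len(accepted)
    error_delta = sum(e.errors_after - e.errors_before for e in accepted)
    return {
        "acceptance_rate": n_accepted / n_shown if n_shown else 0.0,
        # Share of accepted predictions that were ripped out again.
        "reversal_ratio": len(reverted) / n_accepted if n_accepted else 0.0,
        # Negative means accepted predictions reduced diagnostics on average.
        "mean_error_delta": error_delta / n_accepted if n_accepted else 0.0,
    }
```

A rollout gate could compare these numbers between the current and candidate model at each traffic step before widening the ramp.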
00:07:55 lenarThis pairs perfectly. Philipp Schmid, an engineer at Google DeepMind working on Gemini agents, gave a talk on why senior engineers — the good ones — struggle to build AI agents. His claim is that the problem isn't talent. Five assumptions from normal software break the moment you build agents, and the better you are at the old way, the harder you cling to them.
00:08:18 damraLead with the one that bites hardest.
00:08:20 lenarErrors as inputs. In normal software a failed call is cheap — you catch it, you retry, milliseconds. Schmid points out an agent run can be five to fifteen minutes of compute. So if it fails at minute twelve and you just restart, you've burned the time and thrown away all the accumulated context. His model is the Go language pattern — a call returns a value or an error — and the error has to be fed back into the model so it recovers incrementally, not from scratch.
00:08:50 damraThat reframes retry logic completely. A retry isn't 'do it again,' it's 'here's what went wrong, continue from here.' What were the others?
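One way to picture "errors as inputs" is a loop where a failed tool call becomes another entry in the context instead of a reason to restart. This is a minimal sketch, assuming a stand-in `next_action` callable in place of a real model and a plain list of strings as context; neither is taken from the talk.

```python
from typing import Callable, List, Optional, Tuple

# (value, error): exactly one side is set, mirroring the Go convention.
ToolResult = Tuple[Optional[str], Optional[str]]


def run_tool(tool: Callable[[str], str], arg: str) -> ToolResult:
    """Call a tool and convert any exception into an error value."""
    try:
        return tool(arg), None
    except Exception as exc:
        return None, f"{type(exc).__name__}: {exc}"


def agent_loop(next_action: Callable[[List[str]], Optional[str]],
               tool: Callable[[str], str],
               context: List[str],
               max_steps: int = 5) -> List[str]:
    """next_action stands in for the model: it reads context, returns a tool
    argument, or returns None when it considers the task done."""
    for _ in range(max_steps):
        arg = next_action(context)
        if arg is None:
            break
        value, error = run_tool(tool, arg)
        if error is not None:
            # Feed the failure back so the next step can recover from here.
            context.append(f"tool_error: {error}")
        else:
            context.append(f"tool_result: {value}")
    return context
```

The accumulated context survives the failure, so a run that goes wrong at minute twelve continues from minute twelve.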
00:09:00 lenarContext replaces structured state — instead of boolean flags and a rigid user profile, the agent reads semantic meaning from text and multimodal input. His example was a research agent where you approve a plan and inject a constraint in the same breath — 'yes, go, but use metric units' — and it just absorbs that, no separate settings screen. Then, you go from traffic controller to dispatcher. You stop writing the state machine that says step one, step two, step three. You hand the model a goal and trust it to navigate a path you didn't pre-draw.
00:09:31 damra'Trust the model to navigate' is the line that makes senior engineers break out in hives, and for good reason. Trusting a nondeterministic thing to find its own path is how you get a system you can't debug. What's his answer to that?
00:09:45 lenarHis answer is the fourth shift: you stop testing with deterministic unit tests and you move to probabilistic evals. Same input doesn't guarantee the same path, so you measure pass rates, you use a model as a judge, you bring in human experts. And he had a hard line — if a prompt only succeeds one out of ten times, it's not viable for production. So you're not asserting equality. You're measuring a success rate and setting a floor under it.
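The shift from asserting equality to measuring a success rate might look like the sketch below. The function names and the 0.9 floor are assumptions chosen for illustration; the talk as described only says that one success in ten is not viable.

```python
from typing import Any, Callable


def pass_rate(run_once: Callable[[int], Any],
              check: Callable[[Any], bool],
              trials: int) -> float:
    """Run the same task many times and report the fraction that pass.

    run_once receives the trial index, which a caller could use as a seed.
    check may be a rule, a model acting as judge, or a human verdict lookup.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    passes = sum(1 for i in range(trials) if check(run_once(i)))
    return passes / trials


def meets_floor(rate: float, floor: float = 0.9) -> bool:
    """The floor value is a placeholder; each team would set its own."""
    return rate >= floor
```

For example, a prompt that succeeds on roughly one trial in ten would report a rate near 0.1 and fail any reasonable floor, which matches the hard line quoted above.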
00:10:11 damraAnd the fifth?
00:10:12 lenarDesign your tools and APIs for the agent, not the human. His example: a delete-item endpoint is obvious to the developer who wrote it, but the agent only ever sees the function schema and the docstring. If those don't carry the meaning, the agent's flying blind. And he lands the whole talk on a phrase — build to delete. The agent code you write is disposable, because the model keeps getting better and you'll throw your wiring out in three months anyway.
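To make the delete-item point concrete, here are two versions of the same tool definition plus a small lint that flags what an agent would be missing. Everything here is hypothetical: the schema layout follows the common JSON-Schema style used for function calling, the shopping-cart wording is invented, and the eight-word minimum is an arbitrary threshold.

```python
from typing import Dict, List

# Written from the developer's point of view: obvious to its author only.
TERSE_TOOL: Dict = {
    "name": "delete_item",
    "description": "Deletes item.",
    "parameters": {
        "type": "object",
        "properties": {"id": {"type": "string"}},
        "required": ["id"],
    },
}

# Written for the agent: the schema itself carries the meaning.
DESCRIPTIVE_TOOL: Dict = {
    "name": "delete_item",
    "description": (
        "Permanently remove one item from the current user's cart. "
        "This cannot be undone, so confirm with the user first."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "id": {
                "type": "string",
                "description": "Item identifier as returned by the list call.",
            }
        },
        "required": ["id"],
    },
}


def schema_gaps(tool: Dict, min_words: int = 8) -> List[str]:
    """Return the parts of a tool schema that give the agent too little to go on."""
    gaps: List[str] = []
    if len(tool.get("description", "").split()) < min_words:
        gaps.append("description")
    properties = tool.get("parameters", {}).get("properties", {})
    for name, spec in properties.items():
        if not spec.get("description"):
            gaps.append(f"parameters.{name}")
    return gaps
```

A check like this is crude, but it treats the schema and docstring as the agent's only documentation, which is the point being made.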
00:10:40 damraBuild to delete is where I'd push back gently. It's freeing if you're at DeepMind shipping experiments. It's terrifying if you're an enterprise team being asked to maintain this thing for five years. Disposable software is a wonderful mindset right up until someone asks who owns the disposable thing in production at two in the morning.
00:10:59 lenarWhich is the exact wall the next person hit. There's a post on the AI Agents subreddit from a developer, Paulius, titled 'I spent a year building agent memory on knowledge graphs — here are the five mistakes that cost me months.' Let me be straight about what I've actually got: the excerpt gives me his opening, not the full list of five. But the opening is the whole confession. He writes that he built a unified memory layer for his agents using knowledge graphs and ontologies on top of MongoDB, and — quote — 'I followed every trend first. I reached for the shiny frameworks and tried to design' — and that's where my excerpt cuts off.
00:11:35 damra[sigh] A year. On the memory layer specifically. And I'd bet the five mistakes are all variants of 'I built the elaborate thing before I knew whether the simple thing worked.' Knowledge graphs and ontologies are exactly the kind of architecture that feels rigorous and quietly eats your calendar.
00:11:53 lenarThat's the read, and it isn't a dunk — memory really is the hard, unsolved layer right now. The model is rarely the constraint anymore. It's the thing you wrap around it that has to remember across sessions. And it pairs with a Show HN that went up this weekend, a project called Komi-learn — continuous memory and self-improvement for coding agents. Thirteen points, two comments. And the top comment is the whole genre in one breath.
00:12:19 damraGo on.
00:12:20 lenarA commenter, loehnsberg, wrote: 'It sounds like it solves the problem that everybody who vibe codes over multiple projects runs into, but it does not provide evidence that it actually works.' That's it. That's the memory space right now. A real problem, everybody feels it, a hundred projects claiming to solve it, and almost nobody showing the before-and-after that proves the agent got better because of the memory and not because the underlying model did.
00:12:45 damraAnd that's the test I'd hold all of these to — including Paulius's year and Komi-learn. Show me the agent failing a task, then show me the same agent passing it after the memory layer, with the model held constant. Until I see that controlled comparison, a knowledge graph is just a database you're proud of. The graph isn't the achievement. The measured improvement is.
00:13:06 lenarAnd the cost of skipping that proof is exactly Paulius's year. You can spend twelve months making the memory beautiful and never once check whether it changed a single outcome. That's the maintenance bill nobody photographs for the launch post.
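The controlled before-and-after asked for above can be expressed as a small harness: same tasks, same model, same seeds, with the memory layer as the only variable. This is a sketch under assumptions; `run_agent` and its keyword arguments are hypothetical and would need to be wired to a real agent.

```python
from typing import Callable, Dict, List, Union


def controlled_comparison(run_agent: Callable[..., bool],
                          tasks: List[str],
                          model: str,
                          seeds: List[int]) -> Dict[str, Union[str, float]]:
    """Compare pass rates with memory off and on, holding the model constant.

    run_agent(task, model=..., memory=..., seed=...) must return True on success.
    """
    if not tasks or not seeds:
        raise ValueError("need at least one task and one seed")

    def rate(memory: bool) -> float:
        results = [
            run_agent(task, model=model, memory=memory, seed=seed)
            for task in tasks
            for seed in seeds
        ]
        return sum(results) / len(results)

    baseline = rate(False)
    with_memory = rate(True)
    return {
        "model": model,  # recorded so the comparison can be audited later
        "baseline": baseline,
        "with_memory": with_memory,
        "delta": with_memory - baseline,
    }
```

Repeating each task across several seeds matters because agent runs are not deterministic; a single pass or fail on either side proves little.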
00:13:19 lenarThis one steps out of tooling and into something with real stakes. Forbes wrote it up from a correspondence published in The Lancet. Over a three-year period, reviewers found that 4,046 references across 2,810 published scientific journal articles had been fabricated. These weren't wrong or sloppy — they were fabricated. Citations to papers that, as far as the reviewers could tell, don't exist or don't say what they're cited as saying.
00:13:46 damraTwenty-eight hundred articles that already passed peer review and got published. So the fabrication survived the one filter that's supposed to catch it. Do we know the mechanism — is this people using a model to write the literature review and the model inventing plausible-looking references?
00:14:01 lenarThe framing points that way — these are described as AI-fabricated citations, the pattern where you ask a model for sources and it generates references that look perfectly formatted, completely real, and are simply invented. The write-up doesn't give me a clean split of how many were caught before versus after publication, so I won't claim a number there. But the headline fact is that thousands of them made it all the way into the published record.
00:14:26 damraAnd here's the connection back to where we started that I think actually holds. Segment one, Anthropic is tuning a model toward honesty — that's a dial inside the model. This is the same word at the system level, and it's pointing the wrong way. You can make one model more truthful and still watch the scientific literature fill up with confident, well-formatted fiction, because the failure isn't the model lying. It's a human pasting the model's output without checking a single reference.
00:14:54 lenarThat's the proportionate version. The model isn't the villain; a person decided not to verify. But the scale is the new part. Fabricating four thousand references by hand over three years is a career of fraud. With a model it's an afternoon. So the fabrication that used to be slow enough to be self-limiting just got cheap, and the journals' checks haven't caught up.
00:15:15 damraAnd the fix is the kind of work nobody funds: reference-checking at the journal, an automated pass that confirms every cited paper exists and actually supports the claim. It's tedious, and it's exactly the step that was missing when twenty-eight hundred articles slipped through. Catching this is easier to build than the agent memory we were just talking about. It just isn't anybody's launch.
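The existence half of that automated pass is simple enough to sketch. The assumptions are significant: references are plain dicts with `title` and optional `doi` keys, and `lookup` is an injected function that a real system would back with a bibliographic registry. Checking whether a paper actually supports the claim it is cited for is a much harder problem and is not attempted here.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


def normalize(title: str) -> str:
    """Lowercase and collapse punctuation so trivial differences do not matter."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def check_references(
    references: List[Dict[str, str]],
    lookup: Callable[[str], Optional[Dict[str, str]]],
) -> List[Tuple[str, str]]:
    """Return (title, problem) pairs for references that fail basic checks.

    lookup(doi) is a hypothetical resolver returning a record with a
    "title" key, or None when the DOI is unknown.
    """
    problems: List[Tuple[str, str]] = []
    for ref in references:
        doi = ref.get("doi")
        if not doi:
            problems.append((ref["title"], "no DOI to verify"))
            continue
        record = lookup(doi)
        if record is None:
            problems.append((ref["title"], "DOI does not resolve"))
        elif normalize(record["title"]) != normalize(ref["title"]):
            problems.append((ref["title"], "title does not match registry record"))
    return problems
```

A flagged reference is a prompt for a human to look, not proof of fabrication: older or regional publications may have no DOI at all.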
00:15:38 lenarLet me close with three quick ones we're tracking, none a turning point, all fast. First, Reuters has a piece on the contrast between how AMD's Lisa Su and Nvidia's Jensen Huang play China. Su keeps a deliberately lower profile, and the detail that anchors it: China is about twenty percent of AMD's revenue. So the low profile isn't shyness. It's protecting a fifth of the top line.
00:16:02 damraTwenty percent is the number that explains the whole personality difference. Jensen can be the public face because Nvidia's exposure and leverage are different. Su's incentive is to not become the headline. Same market, two completely different risk calculations.
00:16:16 lenarSecond, IBM put out what it's calling an agentic operating model, with a governance layer named Sovereign Core, aimed at moving enterprise AI from pilot to production with sovereignty — data control, jurisdiction — at the center. Forbes covered it. I read it as IBM betting that the enterprise blocker isn't capability, it's governance, and then selling the governance.
00:16:38 damraWhich rhymes with the build-to-delete problem from Schmid. The enterprise can't treat agents as disposable, so somebody sells them the control plane that makes the disposable thing auditable. That's a genuine market. Whether Sovereign Core is substance or a slide, I'd want to see what it actually enforces.
00:16:57 lenarAnd third, a strange one at the edge of our beat. A US court ordered Circle, the stablecoin company, to blacklist Zama's cUSDC smart contract, freezing about twelve point six million dollars. The Block reported it, and their framing was that a lot of ordinary holders likely got caught in the crossfire of a civil suit against a DAO, a decentralized autonomous organization. The point that matters for us: programmable money means a court order can freeze one specific contract. That freeze function isn't theoretical. It just got used.
00:17:27 damraAnd that's the through-line for the whole morning, if there is one. Every layer we touched has a control surface someone's reaching for. The model's honesty dial, the journal's missing verification, and the stablecoin's freeze function. The capability's not usually the story anymore. Who holds the dial is.
00:17:45 lenarSo, into the week, three specific things I can actually check. One: does anyone run Opus 4.8 on a fresh, private eval and either confirm or kill that DeepSWE result. Two: does a single one of these memory projects ship a controlled before-and-after. Three: does even one journal turn on automated reference-checking after this Lancet number. All three are answerable, none of them are about godhood, and we'll see what Monday brings.