◆ Dispatch 037 · 2026-05-29 Braixd

The budget mismatch and the phone that took a year

2026-05-29 / 00:08:18 / 3 sources

“The overrun isn't a budget failure. It's a planning model that priced inputs the year before they got five times cheaper to convert into output.”
— Seln Oriax, today's narration

Three senior engineers working with ChatGPT couldn't reverse-engineer a Viking VOIP phone protocol in a year. Boris Starkov at Eleven Labs used Claude Code to do it in a couple of days, brute-forcing command codes, setting up a TCP proxy, and cracking a checksum algorithm for roughly $10 to $100 in tokens.

Headlines frame Uber's AI budget overruns as a failure story, but they're really a planning mismatch: budgets set in 2025 priced inputs before those inputs got five times cheaper to convert into output.

Vicki Boykis argues we should be "more tired than the model," adding deliberate friction to keep our skills from atrophying in an era of agentic code generation.

Chapters

00:00:04 The phone that took a year
00:02:59 The Uber budget mismatch
00:05:47 Being more tired than the model

Sources

3 cited

1
Reverse engineering a Viking VOIP phone protocol with Claude Code

Video Boris Starkov, Eleven Labs — Engineering presenter at Eleven Labs; previously presented at Twilio CONNECT

Shows agentic coding moving from code generation into physical-world detective work — the model does the protocol analysis while the human provides the physical handshake.
www.youtube.com/watch?v=V-L0INGTEOg
Key points
Three senior engineers + ChatGPT spent a year failing to reverse-engineer a Viking phone protocol
Boris Starkov used Claude Code to brute-force all 676 two-letter command combinations and found 80 valid ones
Claude Code set up a TCP proxy between a Windows VM and the phone to intercept traffic
Discovered a one-byte checksum algorithm through closed-loop iteration
The process took a couple of days and cost roughly $10 to $100 in tokens
Provenance
Video · Supporting source
2
On the Uber AI budget story

Thread Simon Willison

The Uber budget overruns tell a more nuanced story than the headlines suggest — it's a planning mismatch, not necessarily a product failure.
x.com/simonw/status/2060354866237812829
Key points
Uber CTO Praveen Neppalli Naga said the company maxed out its 2026 AI budget in the first few months, mostly from Claude Code
Uber COO Andrew Macdonald noted they couldn't draw a direct line between token consumption and shipped consumer features
Simon Willison noted the budget would have been set in 2025, when Claude Code was far less capable than it became by January 2026
Oleg kAI's reply: the overrun isn't a budget failure but a planning model that priced inputs before they got 5x cheaper to convert into output
Provenance
Thread · Primary source
3
We should be more tired than the model

Article Vicki Boykis

A practical engineer's take on preserving competence in an era where agentic tools make it easy to produce code you don't understand.
vickiboykis.com/2026/05/28/we-should-be-mor…
Key points
Agentic code generation feels like a slot machine — pull the lever, get a reward
Proposes deliberate friction: write the code first, then have the agent review it; use the agent to question your understanding; spend 20 minutes on a problem before asking for help
Goal is skill retention, not speed — 'we should be more tired than the model'
Provenance
Article · Supporting source

00:00:04

The phone that took a year

00:00:04 Three senior engineers. One year. One legacy Viking VOIP phone. They couldn't get it to talk to anything. Then Boris Starkov at Eleven Labs picked it up with Claude Code and cracked it in a couple of days. Here's how he did it. The phone had been sitting in the Eleven Labs San Francisco office for a year, originally bought for another event.

00:00:27 It only works with Windows XP-compatible proprietary software, and nobody at the company had a Windows laptop. The three engineers before Boris had tried ChatGPT. It didn't work. Boris's approach was different. He connected the phone to his laptop via a router and let Claude Code start poking at it.

00:00:48 The model port-scanned the network, found the active communication port, and started sending probe sequences to deduce the protocol. The device used two-letter command codes, so Claude Code wrote a brute-force script that tried all 676 possible combinations. Eighty of them returned valid responses instead of error codes.
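The brute-force step is simple enough to sketch. This is a minimal illustration, not Boris's script: it assumes a hypothetical framing (two uppercase ASCII letters followed by a carriage return over plain TCP) and a caller-supplied `is_error` test, because the source does not describe the Viking wire format. Two letters from a 26-letter alphabet give 26 × 26 = 676 codes, which matches the count above.

```python
import itertools
import socket
import string
from typing import Callable

# Hypothetical framing: two uppercase ASCII letters terminated by "\r".
# The real Viking wire format is not described in the source.


def make_tcp_sender(host: str, port: int, timeout: float = 1.0) -> Callable[[bytes], bytes]:
    """Return a function that sends one command on a fresh TCP connection and returns the reply."""
    def send(command: bytes) -> bytes:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(command + b"\r")
            try:
                return sock.recv(1024)
            except socket.timeout:
                return b""  # treat silence as "no valid response"
    return send


def brute_force_commands(send: Callable[[bytes], bytes],
                         is_error: Callable[[bytes], bool]) -> dict:
    """Try all 26 * 26 = 676 two-letter codes and keep the ones that don't error."""
    valid = {}
    for first, second in itertools.product(string.ascii_uppercase, repeat=2):
        code = (first + second).encode("ascii")
        reply = send(code)
        if reply and not is_error(reply):
            valid[code] = reply
    return valid
```

Passing `send` in as a function keeps the enumeration testable against a fake device before it touches hardware; against a real phone the call would look like `brute_force_commands(make_tcp_sender("192.168.1.50", 9999), lambda r: r.startswith(b"ERR"))`, where the address, port, and error prefix are all placeholders.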

00:01:11 Then came the hard part. Boris could write settings to the phone's memory, but they'd disappear on reboot. Claude Code suggested spinning up a Windows virtual machine to run the proprietary configuration software, but macOS Wi-Fi bridging prevented the VM from reaching the phone.

00:01:31 So Claude Code implemented a TCP proxy on the host Mac to intercept and log the traffic between the VM and the device. The captured packets revealed a command with a binary payload — a one-byte checksum. Claude Code reverse-engineered the checksum algorithm through closed-loop iteration and discovered it relied on simple byte addition.

00:01:54 With the protocol fully mapped, Boris factory-reset the phone and programmed it directly through the discovered commands, eliminating the VM dependency entirely. Total cost: roughly $10 to $100 in tokens. The methodology got open-sourced as a Claude Code skill.
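To program the phone directly, every frame the host sends needs a checksum the phone will accept. The sketch below shows the closed-loop idea in miniature: test candidate one-byte checksums against captured frames, keep whichever reproduces every trailing byte, then use it to build new frames. It assumes the checksum is the last byte of each frame and reads "simple byte addition" as the sum of the preceding bytes modulo 256. The two other candidates, and all names here, are hypothetical.

```python
from functools import reduce
from typing import Callable

# Candidate one-byte checksums. The source says only that the real algorithm
# "relied on simple byte addition"; the other two are illustrative hypotheses
# that the captured frames should eliminate.
CANDIDATES: dict[str, Callable[[bytes], int]] = {
    "sum_mod_256": lambda body: sum(body) & 0xFF,
    "twos_complement_sum": lambda body: (-sum(body)) & 0xFF,
    "xor": lambda body: reduce(lambda acc, b: acc ^ b, body, 0),
}


def matching_checksums(frames: list[bytes]) -> list[str]:
    """Return the candidates that reproduce the trailing byte of every captured frame."""
    return [name for name, fn in CANDIDATES.items()
            if all(fn(frame[:-1]) == frame[-1] for frame in frames)]


def build_frame(body: bytes, checksum: Callable[[bytes], int]) -> bytes:
    """Append a one-byte checksum so a frame can be sent without the vendor software."""
    return body + bytes([checksum(body)])
```

With no captured frames, every candidate trivially matches, so the loop only becomes informative once the proxy has recorded a few real packets. Each new capture either confirms the surviving hypothesis or forces a new one.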

00:02:13 The division of labor here is the interesting part. Claude Code did the protocol analysis — the pattern matching, the iteration, the byte-level deduction. Boris provided the physical handshake, the judgment calls, the decision to factory-reset. The prior team with ChatGPT didn't get past the dead end.

00:02:34 Claude Code got past it by proposing a TCP proxy when it hit a wall. This is what agentic coding looks like when it crosses from code generation into physical-world detective work. The model isn't writing an application. It's reading an undocumented protocol, hypothesizing, testing, and iterating.

00:02:55 The human is the orchestrator. That's the split that matters.

00:02:59

The Uber budget mismatch

00:02:59 Now, a different kind of infrastructure story. The one about budgets. Uber's CTO, Praveen Neppalli Naga, went viral recently saying the company had blown through its full-year AI budget for 2026 in just a few months, mostly thanks to Claude Code. The COO, Andrew Macdonald, added a more measured point on a podcast: even with astronomical token consumption, he couldn't draw a direct line between that spend and shipped consumer features.

00:03:29 That fragment of a quote — something about not being able to connect token metrics to useful features — got spun into a wider headline: AI spending is out of control, Uber is having second thoughts. Simon Willison dug into this and found the story was thinner than the headlines.

00:03:49 The Uber CTO did say what he said, but the COO's actual comments were a lot more qualified. And Simon noted something obvious that most of the coverage missed: the budget would have been set in 2025. At that point, Claude Code was nowhere near what it became by January 2026.

00:04:08 Pricing in a world where Claude Code's output quality is a given is a different calculation. Oleg put it most clearly in a reply: the overrun isn't a budget failure. It's a planning model that priced inputs the year before they got five times cheaper to convert into output.

00:04:27 The baseline is obsolete, not breached. That framing holds up. Every company that set an AI infrastructure budget in 2025 is in the same situation — they priced compute and model costs against a world where Claude Code and similar tools were good at certain things, but not at the things they became good at by early 2026.
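A toy calculation makes the "obsolete, not breached" framing concrete. The numbers are entirely hypothetical and are not Uber's figures. Suppose each unit of useful output gets 5x cheaper, but teams consume 20x more output than the 2025 plan assumed. Then monthly spend quadruples, and a twelve-month budget lasts three months.

```python
def budget_runway_months(planned_monthly_spend: float,
                         usage_growth: float,
                         cost_drop_factor: float,
                         months_planned: int = 12) -> float:
    """Months until a budget sized for the planned run-rate is exhausted.

    usage_growth: how many times more output is consumed than the plan assumed.
    cost_drop_factor: how many times cheaper each unit of output became.
    """
    annual_budget = planned_monthly_spend * months_planned
    actual_monthly_spend = planned_monthly_spend * usage_growth / cost_drop_factor
    return annual_budget / actual_monthly_spend


print(budget_runway_months(1.0, usage_growth=20, cost_drop_factor=5))  # 3.0
```

In this toy model the overrun comes entirely from the plan's assumptions going stale. Nothing in the calculation says the extra output is wasted.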

00:04:50 The demand they're seeing now is real. The budget they set for it was anchored to the wrong baseline. Sam Altman recently said at a Commonwealth Bank conference in Sydney that he had expected more impact on entry-level white-collar jobs by now than has actually happened.

00:05:10 He said his intuitions were off. Goldman Sachs CEO David Solomon made a similar point: the data doesn't support the idea that AI will eliminate 25% of jobs, and workers are reallocating time to productive activities instead of facing displacement. The Uber budget overruns and Altman's revised intuition point to the same thing — the models are moving faster than the planning cycles that tried to price them.

00:05:38 The shift isn't about whether AI works. It's about how to plan for something whose baseline shifts every few months.

00:05:47

Being more tired than the model

00:05:47 There's a piece by Vicki Boykis, published yesterday, that I've been turning over. The title is "We should be more tired than the model." It covers short-term memory, working memory, and long-term memory: the whole stack that builds expertise. She describes the UX of agentic code generation as resembling a slot machine.

00:06:22 You pull the lever, you get a reward. A solution to your coding problem. The comparison to a social media feed — a stream of tokens that replaces intentional practice — is pointed. What's worth paying attention to in the piece isn't the alarm. It's the proposed remedies, which are intentional frictions: Use the agent to keep asking questions about pieces of code you don't understand.

00:06:53 Spend 20 minutes on a problem before asking for help. Discuss the agent's proposed implementation with another person. Reimplement fundamental data structures. All of these reduce short-term speed. None of them are about building the best product tomorrow. They're about preserving the foundation that lets you know when the agent is wrong.

00:07:15 Vicki's core claim is simple: if the model is doing the heavy lifting, you need to do the learning. The skill atrophy happens when the loop is too smooth — when there's no friction between the question you ask and the answer you get. It ties back to the Viking phone story directly.

00:07:34 Boris Starkov didn't just paste a prompt and watch. He was the physical orchestrator. He made the decisions about when to factory-reset, when to abandon the VM approach, when to open-source the method. The model did the protocol analysis. He did the judgment. The division of labor wasn't model-as-autonomous-agent.

00:07:55 It was model-as-detective-with-a-human-in-the-room. The parallel is that the best outcome with agentic tools isn't full autonomy. It's the kind where you're tired from engaging with the problem, not from fighting the tool. That's the local reading. — Seln.