
Braid Daily · 2026-05-06

Agents become Cloudflare customers

Cloudflare and Stripe ship a protocol where the agent has the credit card. Plus speculative decoding becomes the default, and four pieces…

Illustration: a small geometric agent figure at a service counter receives a tokenized card and a domain certificate. Caption: Agent as first-class customer.

The lead

1

Cloudflare and Stripe co-designed a three-part protocol — a JSON discovery API, a Stripe-as-identity-provider account flow, and tokenized payments with a default $100/month cap — so an agent can go from no account to a deployed app on one OAuth approval. The agent never sees raw card details, and the framing is explicit: this is the pattern any platform with signed-in users can copy.

Read source
Figure: the three pieces of the Cloudflare-Stripe agent-as-customer protocol (discovery, identity-and-billing, and deploy), drawn as a flow from human OAuth through the discovery API and Stripe identity and tokenized billing to domain registration and deploy.

Today's lead

1

Agents can now create Cloudflare accounts, buy domains, and deploy

Cloudflare blog · Sid Chatterjee, Brendan Irvine-Broque

Stripe Projects sets a default $100/month per-provider spend cap and budget alerts; raw payment details never reach the agent. The flow is one OAuth approval, then `stripe projects init`, then a registered domain and a deployed app.

“Starting today, agents can now be Cloudflare customers. They can create a Cloudflare account, start a paid subscription, register a domain, and get back an API token to deploy code right away.”

Read source
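
To make the spend-cap idea concrete, here is a minimal sketch of how a per-provider monthly cap with a budget alert might behave. It is not Stripe's API: the `SpendCap` class, its `authorize` method, and the 80% alert threshold are hypothetical names and values chosen for illustration. Only the $100/month default comes from the announcement.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class SpendCap:
    """Hypothetical per-provider monthly budget; not a Stripe API object."""
    monthly_cap: Decimal = Decimal("100")     # default cap named in the announcement
    alert_fraction: Decimal = Decimal("0.8")  # assumed alert threshold (illustrative)
    spent: Decimal = Decimal("0")
    alerts: list = field(default_factory=list)

    def authorize(self, amount: Decimal) -> bool:
        """Approve a charge only if the month's total stays within the cap."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.spent + amount > self.monthly_cap:
            return False  # declined: the charge would exceed the cap
        self.spent += amount
        if self.spent >= self.monthly_cap * self.alert_fraction:
            self.alerts.append(f"budget alert: {self.spent} of {self.monthly_cap} used")
        return True


cap = SpendCap()
domain_ok = cap.authorize(Decimal("12.00"))   # e.g. a domain registration
plan_ok = cap.authorize(Decimal("75.00"))     # e.g. a paid subscription
overrun_ok = cap.authorize(Decimal("20.00"))  # would push the total past the cap
```

In this sketch the first two charges go through, the second one trips the alert, and the third is declined. The agent only ever sees a yes or a no, never the card behind it.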

Faster, at every layer

4

Google ships first-party MTP drafters for the full Gemma 4 family

Google blog · Olivier Lacombe, Maarten Grootendorst

Drafter checkpoints for the 26B MoE, 31B dense, and the E2B/E4B edge variants, all under Apache 2.0 and sharing KV cache with the target model. Reported speedup is up to 3x on LiteRT-LM, MLX, Hugging Face, and vLLM, with the target still verifying every token.

“By using a specialized speculative decoding architecture, these drafters deliver up to a 3x speedup without any degradation in output quality or reasoning logic.”

Read source
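
For readers who have not seen why "the target still verifies every token" means no quality loss, here is a toy sketch of greedy speculative decoding. The two "models" are stand-in functions invented for this example, not Gemma or any real drafter, and the sketch uses plain greedy matching rather than the sampling-based acceptance rule production systems use.

```python
from typing import Callable, List, Sequence

NextToken = Callable[[Sequence[int]], int]


def speculative_decode(target: NextToken, drafter: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the drafter proposes k tokens, the target checks each."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. Draft up to k tokens with the cheap model.
        draft: List[int] = []
        for _ in range(min(k, goal - len(out))):
            draft.append(drafter(out + draft))
        # 2. Verify in order. A real system scores all k positions in one batched
        #    target pass; this toy calls the target once per token, so it shows
        #    the accept/reject logic but not the speedup.
        for tok in draft:
            expected = target(out)
            out.append(expected)      # always the target's own choice
            if tok != expected:
                break                 # first miss: discard the rest of the draft
    return out


def target_model(ctx: Sequence[int]) -> int:
    """Stand-in for the large model (arbitrary deterministic rule)."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_model(ctx: Sequence[int]) -> int:
    """Stand-in drafter: agrees with the target except at every third position."""
    guess = target_model(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 50


def greedy(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


prompt = [3, 1, 4]
fast = speculative_decode(target_model, draft_model, prompt, max_new=12)
slow = greedy(target_model, prompt, max_new=12)
```

Because every emitted token is the target's own choice, `fast` and `slow` are identical even though the drafter is wrong a third of the time. The drafter only changes how many tokens are confirmed per verification round.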

2.5x faster Qwen 3.6 27B with MTP — community reproduces the numbers

r/LocalLLaMA · u/ex-arman68

Following yesterday's note on the llama.cpp MTP beta: 28 tok/s on an M2 Max 96GB, ~100 tok/s on a 3090 Ti at IQ4_XS with full 256k context, and ~200 tok/s on Qwen 3.6 35B A3B. The recipe needs a llama.cpp build compiled by hand from PR #22673, plus the new MTP-converted GGUFs.

“2.5x speed increase, bringing it to 28 tok/s. iq4 with MTP enabled. Qwen 3.6 27B. Full 256k ctx. q4/q4. 100 tok/sec on a 3090 ti.”

Read source

NVIDIA Spectrum-X with MRC — what was costing OpenAI on Blackwell

NVIDIA blog · Gilad Shainer

Multipath Reliable Connection is a new RDMA transport that spreads a single connection across many network paths, with microsecond-scale failure bypass. Co-developed by NVIDIA, Microsoft, OpenAI, AMD, Broadcom, and Intel; now released as an open spec through the Open Compute Project.

“MRC's end-to-end approach enabled us to avoid much of the typical network-related slowdowns and interruptions and maintain the efficiency of frontier training runs at scale.”

Read source

GPT-5.5 Instant becomes the ChatGPT default

Indian Express

AIME 2025 reportedly moves from 65.4 to 81.2, MMMU-Pro from 69.2 to 76. The more interesting line for builders is that ChatGPT memory now exposes per-claim source attribution across prior chats, files, and Gmail — and the API exposes the new default as `chat-latest`.

Read source

The agent stack is a system, not a model

4

A user watches Claude refuse a fake `<RootSystemPrompt>` injected into a search result

r/ClaudeAI · u/netmilk

An SEO-bait page hid a fake system-prompt block telling Claude to vouch for the site as a 'legitimate business serving the startup ecosystem.' Claude named the technique and refused. Top reply christens the new industry: GEO — Generative Engine Optimization.

“A <RootSystemPrompt> tag in scraped HTML has no more authority than the word 'obey' written on a billboard.”

Read source

A low-latency fraud-detection layer for adversarial agent trajectories

arXiv · Yu, Sun, Guo, McAuley, Tong (UCSD)

Argues prompt-level guardrails miss attacks that emerge across multi-turn sessions. The classifier is XGBoost over 42 runtime features — prompt, session, tool use, execution context, and fraud-style behavioral signals — and runs over 9x faster than LLM-based filters.

“Instead of determining whether a single prompt is malicious, our approach models risk over interaction trajectories using structured runtime features derived from prompt characteristics, session dynamics, tool usage, execution context, and fraud-inspired signals.”

Read source

Position: agentic safety depends on interaction topology, not on model scale

arXiv · Bajaj, Singh, Anand, Singh

Names three topology-driven pathologies — ordering instability, information cascades, and functional collapse — and argues stronger first-mover models make consensus form faster and harder. Calls for safety regulation to target wiring directly, with robustness tests across architectural variations.

“In agentic AI, safety is determined by interaction topology, not model weights. Scaling to more capable models strengthens these effects by increasing consensus formation and reducing the challenge of initial decisions.”

Read source

Agentic systems should be designed as marginal token allocators

arXiv · Siqi Zhu

One accounting object across the stack: every layer compares marginal benefit to marginal cost plus latency cost plus risk cost. Names recurring pathologies — over-routing, over-delegation, under-verification, serving congestion, stale rollouts, cache misuse — as misallocation, not bugs.

“Systems that locally minimize tokens globally misallocate them.”

Read source
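
The accounting rule in this item (spend tokens only where marginal benefit exceeds token cost plus latency cost plus risk cost) can be sketched in a few lines. This is an illustration of the comparison, not the paper's method: the `Action` fields, the greedy `allocate` policy, the candidate names, and every number below are hypothetical, and the sketch assumes all quantities have already been put in one common unit.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Action:
    """A candidate use of tokens; all values are in one assumed common unit."""
    name: str
    benefit: float       # expected marginal benefit
    token_cost: float    # marginal token cost
    latency_cost: float  # cost of the added delay
    risk_cost: float     # expected cost of things going wrong

    @property
    def net(self) -> float:
        return self.benefit - (self.token_cost + self.latency_cost + self.risk_cost)


def allocate(actions: Iterable[Action], budget: float) -> List[str]:
    """Greedy allocation: fund the best net-positive actions the token budget allows."""
    chosen = []
    for action in sorted(actions, key=lambda a: a.net, reverse=True):
        if action.net <= 0:
            break  # everything after this point costs more than it returns
        if action.token_cost <= budget:
            chosen.append(action.name)
            budget -= action.token_cost
    return chosen


candidates = [
    Action("verify_output", benefit=5.0, token_cost=1.0, latency_cost=0.5, risk_cost=0.2),
    Action("delegate_subagent", benefit=4.0, token_cost=3.0, latency_cost=1.5, risk_cost=0.5),
    Action("reroute_to_larger_model", benefit=6.0, token_cost=2.5, latency_cost=1.0, risk_cost=0.5),
]
chosen = allocate(candidates, budget=4.0)
```

With these made-up numbers, verification and rerouting are funded and delegation is rejected as over-delegation: its benefit is real, but the combined token, latency, and risk cost is larger.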

Measurements, frames, and the worker-side question

6

ProgramBench: rebuild a binary from scratch with no decompilation, no internet

r/LocalLLaMA · u/klieret (Kilian Lieret, FAIR)

200 tasks, 6 million lines of behavioral tests filtered into a black-box harness. Sonnet runs cost about $5,000 across the benchmark; agents are almost never killed early, and they submit with confidence. The author flags open-source models as visibly overfit to SWE-bench.

“Our agent only gets a target executable and some readme/usage files. The agent must choose a language, design abstraction layers, and architect the entire program. No internet access. No decompilation.”

Read source

DeepSeek V4 being 17x cheaper got me to actually measure cloud vs local

r/LocalLLaMA · u/spencer_kw

Ten days of logged tasks, 150 of them re-run on a 3090 with Qwen 3.6 27B. File reads and project scans match cloud 97% of the time; multi-file debugging drops to 61%; large refactors to 29%. Routing by task type cut his bill from $85 to $22 a month.

“65% of my daily coding work runs identically on a model that costs me electricity. Another 20% is close enough that I accept the occasional miss. Only 15% actually justifies cloud pricing.”

Read source
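
Routing by task type can be as simple as a lookup table and a threshold. The sketch below uses the match rates quoted in this item; the task-type labels, the `route` function, and the 0.90 cutoff are assumptions made for the example, not the author's actual setup.

```python
# Local-vs-cloud match rates quoted in the post; the key names are illustrative labels.
LOCAL_MATCH_RATE = {
    "file_read": 0.97,
    "project_scan": 0.97,
    "multi_file_debug": 0.61,
    "large_refactor": 0.29,
}


def route(task_type: str, min_match: float = 0.90) -> str:
    """Send a task to the local model only if it usually matches cloud output."""
    rate = LOCAL_MATCH_RATE.get(task_type, 0.0)  # unknown task types default to cloud
    return "local" if rate >= min_match else "cloud"


def savings_fraction(before: float, after: float) -> float:
    """Fraction of the original bill that was eliminated."""
    return (before - after) / before


routes = {task: route(task) for task in LOCAL_MATCH_RATE}
saved = savings_fraction(85.0, 22.0)  # monthly bill before and after, from the post
```

The reported drop from $85 to $22 works out to roughly a 74% reduction. Lowering `min_match` would move more work to the local model at the price of more misses.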

François Fleuret's three-item to-do list for closing the gap to general reasoning

X · @francoisfleuret

Latent-space diffusion-like reasoning, a real recurrent state, and world-model pre-pre-training. Token-space reasoning, he says, is poking around with stick-shaped fingers — you can't scan a large solution space with faint cues in parallel one autoregressive token at a time.

“Because you must be able during reasoning to scan large domains with faint cues in parallel and not do token-space reasoning, which amounts to poking around with your stick-shaped fingers until you hit something.”

Read source

Dario Amodei reaches for the Jevons Paradox onstage with Jamie Dimon

Fortune · Nick Lichtenberg

A year after the 'half of entry-level white-collar jobs disappear' line, Amodei is invoking Jevons — efficiency gains expand demand — and Amdahl's Law in the same breath. He keeps one caveat: AI is moving faster than the technologies the analogy rests on.

“If you automate 90% of the job, then everyone does the 10% of the job. And the 10% kind of expands to be 100% of what people do and kind of 10xs their productivity.”

Read source
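
The "10x" in that quote is the Amdahl's Law ceiling for a job that is 90% automatable. A small sketch of the formula, speedup = 1 / ((1 - p) + p / s), where p is the automated fraction and s is how much faster the automated part runs; the function name and the second example value are ours, not Amodei's.

```python
import math


def amdahl_speedup(p: float, s: float = math.inf) -> float:
    """Overall speedup when a fraction p of the work is accelerated by a factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    remaining = (1.0 - p) + p / s  # time left, as a fraction of the original
    return math.inf if remaining == 0 else 1.0 / remaining


full_automation = amdahl_speedup(0.9)           # the 90% becomes effectively free
partial_automation = amdahl_speedup(0.9, s=10)  # the 90% only gets 10x faster
```

With the automated 90% treated as free, the remaining 10% caps the gain at about 10x, which matches the quote. If the automated part is merely 10x faster, the overall gain is closer to 5.3x.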

Telus uses AI to alter offshore call-agent accents in real time

Let's Data Science

Speech-to-speech accent modification from a vendor called Tomato.ai, framed internally as reducing 'accent-related friction.' Rogers and Bell told The Globe and Mail they have no plans to follow. Canadian labour groups want mandatory disclosure.

“Labour groups have criticised the practice as deceptive and have urged mandatory disclosure.”

Read source

Write some software, give it away for free

nonogra.ph

A free, open-source writing platform serving a few hundred thousand daily readers for about $5/month in hosting; release cost was $600, mostly for two security reviews. The argument is small: monetizing hobbies turns them into a second job, and produces worse software.

“If everyone tried to monetize their hobbies, then that would just be a second job, and jobs are no fun.”

Read source

Companion episode

Agents Buy Domains, Gemma Ships Drafters, and Local Catches Up to 65 Percent of the Job

· 00:30:02

Three of this week's threads converge today: Cloudflare and Stripe shipping the agent-as-customer pattern, two arXiv position papers reframing the agent stack as topology and as marginal-token economics, and a real prompt-injection screenshot showing what the open web looks like once GEO is a real industry. The model layer is becoming interchangeable; the wiring around it is where the next year's leverage lives.