Patrick Jiang announced Harness-1, a 20-billion-parameter search agent trained with what he calls a state-externalizing harness. The pitch: "frontier-level long-horizon search, rivaling Opus-4.6 and outperforming GPT-5.4" at "Context-1-level cost and latency." The claim worth testing is that a small model plus a harness built to push its working state outside the context window can stand in for a…
◆ Braid Daily · 2026-06-07
Harness-1: a 20-billion-parameter search agent trained to rival Opus-4.6
A 20-billion-parameter agent claims frontier search at Context-1 cost — the harness, not the weights, is the story.
The lead
Agent = model + harness
A default recipe for tuning the whole agent
X / Viv
Viv argues an agent is a model plus a harness, and you should train both: build a v1 on a sensible base harness with task-specific scaffolding, then optimize the pair together rather than just swapping in a bigger model.
“Agent = Model + Harness”
One dollar, twenty minutes, three platforms
X / Nate
Nate reports stitching DeepSeek agents together with a few homemade tools to one-shot a full-stack web, iOS, and Android app. His number: about a dollar in roughly twenty minutes.
“I can now one-shot a full-stack web + iOS + Android app for about $1 in 20 minutes.”
An agent that shipped an app to the App Store
X / Tamaz Gadaev
Tamaz Gadaev describes a CRUX test in which an agent built and published an iOS app to the App Store with a few human interventions. He offers it as his case for why open-world evaluations show more than a pass/fail score.
Grok Build edits a live app from a comment
X / Jon Shulkin
Jon Shulkin shows a natural-language comment-and-edit tool built with Grok Build that lives inside the app being built; leave a comment, and Grok Build makes the change and updates the app.
Plumbing for agents
Sem: code entities on top of Git, not LSP
Hacker News
Sem proposes a primitive for code understanding built from Git dependencies rather than a language server: ask what a function depends on and what depends on it. 128 points and 49 comments on Hacker News.
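The two questions in that item (what does a function depend on, and what depends on it) amount to forward and reverse lookups on a dependency graph. Below is a minimal sketch of that idea only. It is not Sem's API or data model; `EntityGraph`, its methods, and the function names are hypothetical, and the edges are added by hand rather than derived from Git.

```python
from collections import defaultdict


class EntityGraph:
    """Hypothetical index of code entities and the edges between them."""

    def __init__(self):
        # entity -> entities it depends on
        self._deps = defaultdict(set)
        # entity -> entities that depend on it (reverse index)
        self._rdeps = defaultdict(set)

    def add_dependency(self, entity, depends_on):
        # Record the edge in both directions so either query is a lookup.
        self._deps[entity].add(depends_on)
        self._rdeps[depends_on].add(entity)

    def dependencies(self, entity):
        """What does this entity depend on?"""
        return sorted(self._deps.get(entity, ()))

    def dependents(self, entity):
        """What depends on this entity?"""
        return sorted(self._rdeps.get(entity, ()))


# Illustrative edges; a real tool would extract these from the repository.
graph = EntityGraph()
graph.add_dependency("checkout", "compute_tax")
graph.add_dependency("checkout", "load_cart")
graph.add_dependency("invoice", "compute_tax")

print(graph.dependencies("checkout"))
print(graph.dependents("compute_tax"))
```

Keeping a reverse index alongside the forward one is what makes "what depends on this" as cheap to answer as "what does this depend on".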
A proposed shared format for agent memory
Hacker News
The Universal Memory Protocol wants one portable format for agent memory across tools. The top comment names the catch directly: a protocol is only as good as its adoption, and it isn't clear who is using this yet.
Databricks tuned a retriever to speed up its assistant
X / Matei Zaharia
Matei Zaharia writes up how Databricks made its Knowledge Assistant three times faster with an Instructed Retriever trained end-to-end, a sign that custom model tuning is showing up as agents reach production scale.
pidgin.sh turns Claude Code artifacts into URLs
Reddit / r/ClaudeAI
Built with Claude Code, pidgin.sh targets a familiar friction: Claude generates an HTML mockup or a one-pager, and Claude can now share that artifact as a hosted link instead of you saving and hosting it by hand.
On the timeline
OpenAI plans to turn ChatGPT into a superapp
Techmeme / Financial Times
OpenAI plans to overhaul ChatGPT in the coming weeks into a superapp with coding tools and agents, framed as a gateway to higher-margin products, per Cristina Criddle at the Financial Times.
SpaceX signs a $30B compute deal with Google
Indian Express
SpaceX agreed to supply Google with AI computing power under a deal reported at $30 billion, another sign of compute supply being locked up across the largest players.
UK police told to stop drafting court statements with AI
Techmeme / Financial Times
Several UK police forces have been told to stop using AI to prepare court statements, on the concern that inaccurate outputs could contaminate legal procedures, per Robert Wright at the Financial Times.
What stays scarce after AGI
Techmeme / Dwarkesh Podcast
A Q&A with Google DeepMind's Alex Imas and Epoch AI's Phil Trammell on what remains scarce after AGI and how AI-generated wealth might be redistributed, on the Dwarkesh Podcast.
Local & on-device
r/LocalLLaMA is still waiting on a runnable GLM Air
Reddit / r/LocalLLaMA
The local crowd's complaint, in one thread: GLM 5.1 is a strong coder but too big to run at home and slow on the API, and there's been no upgraded Air model since 4.5. The ask is a capable GLM that fits on local hardware.
mlx-audio ships local TTS and ASR on Apple Silicon
X / Kris Matterz
mlx-audio v0.4.4 brings new text-to-speech and speech recognition models running locally on Apple Silicon, the kind of on-device audio stack that doesn't need a server round-trip.
Companion episode
Twenty Billion Parameters, One Big Harness
Three days running, the thread has been the same: capability is moving into the harness and the tooling around the model, not only the weights. Harness-1 puts a number on it, and the agent-building tweets show people wiring small models into full apps. The open question the memory-protocol thread keeps asking is who agrees on the standards once everyone's doing it.