◆ Braid Daily · 2026-05-26

Google ships Gemini Omni, with provenance baked into every frame

26 May 2026

Omni shipped with multi-turn video editing and a SynthID watermark on every frame — and a decensored Qwen3.5 shows where provenance leaks.

The lead

A week after critics complained that Omni couldn't render a clean backflip, Google shipped the model. It generates video from any mix of image, audio, video and text, and its headline feature is multi-turn conversational editing, in which each instruction builds on the last. Google's pitch leans on physics and consistency: "Every instruction builds on the last. Your characters stay consistent."

Provenance becomes shared infrastructure

SynthID expands past Gemini, and past Google

Google DeepMind

Google says SynthID has watermarked more than 100 billion pieces of content and that SynthID verification has been run more than 50 million times in Gemini. It is now partnering with OpenAI, ElevenLabs and Kakao to add the watermark to their models' output, and bringing the 'Is this made with AI?' check to Search and Chrome. Two objections recur in the replies: open-weight models can't be forced to watermark their output, and watermarks can be stripped.

“SynthID has already watermarked over 100 billion pieces of content, but transparency is a team sport.”

The community read on Omni's video editing

r/singularity

A clip showing Omni's video manipulation drew about 2,900 upvotes in a day, and the reaction flipped from months of criticism of Google to surprise at the quality. The poster's own caveat holds: reaction reels are best-case demos, and the real test is developer API access and consistency on user inputs.

The Financial Times runs Heretic on Llama 3.3

r/LocalLLaMA

The Financial Times reported that it removed the safety filters from Meta's Llama 3.3 with Heretic in under 10 minutes, without specialist hardware. Heretic's creator told the paper his tool has produced more than 3,500 decensored models, downloaded 13 million times, and that he spoke to the press so the narrative would not be controlled by one side.

“Saying no to such inquiries simply means that the conversation will be completely controlled by pearl-clutching hypocrites.”

A decensored Qwen3.5 35B, in every format

Hugging Face

A Heretic-decensored Qwen3.5, a 35-billion-parameter mixture-of-experts model, landed on Hugging Face this week in every quantization format a builder would want. It is the concrete artifact behind the open-weights gap: once the weights are downloaded, nothing upstream gets a say, and no source-side watermark applies.

“Available in Safetensors, GGUFs, NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats”

Read the harness, not the leaderboard

A frontier lab admits the harness can swing the score 22%

Google DeepMind / Kaggle (YouTube)

Citing a Morph LLM write-up, a product manager and an engineer on Google DeepMind's Kaggle Benchmarks team show that on SWE-Bench Pro, six frontier models land within a couple of percentage points of each other, while the harness they run in swings results by about 22%. Model-launch charts seldom disclose how the benchmark was orchestrated, so readers can't tell whether a gap reflects the model or the harness.

“Six frontier models are within a couple of percentage points of each other... a 22% difference depending on the harness.”

A position paper gives the claim a citable backbone

arXiv

An arXiv position paper formalizes the same point as its 'Binding Constraint Thesis': for long-horizon tasks across comparably capable models, harness configuration governs performance variance more than the choice of model. It documents ranking reversals driven purely by harness differences and asks labs to publish harness config alongside scores.

“The agent execution harness is often a stronger determinant of agent performance than the model it wraps.”

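A minimal sketch of what publishing harness config alongside scores could look like, assuming each result record carries its harness settings and a comparison helper refuses to diff scores produced under different harnesses. The field names (`max_turns`, `tools`, `retries`, `context_tokens`) and the helper `score_gap` are hypothetical illustrations, not the paper's schema or any lab's tooling.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HarnessConfig:
    """Hypothetical harness settings; real harnesses expose different knobs."""
    name: str
    max_turns: int              # agent steps allowed per task (illustrative)
    tools: Tuple[str, ...]      # tool names exposed to the model (illustrative)
    retries: int                # attempts per task before it counts as failed
    context_tokens: int         # context budget the harness gives the model


@dataclass(frozen=True)
class BenchmarkResult:
    model: str
    benchmark: str
    score: float                # fraction of tasks resolved, 0.0 to 1.0
    harness: HarnessConfig      # published with the score, not left implicit


def score_gap(a: BenchmarkResult, b: BenchmarkResult) -> float:
    """Return a.score - b.score, but only when both runs share benchmark and harness."""
    if a.benchmark != b.benchmark:
        raise ValueError(f"different benchmarks: {a.benchmark} vs {b.benchmark}")
    if a.harness != b.harness:
        # Frozen dataclasses compare field by field, so any knob that differs
        # (turns, tools, retries, context) blocks the comparison.
        raise ValueError(
            f"harness mismatch ({a.harness.name} vs {b.harness.name}): "
            "the gap may reflect the harness rather than the model"
        )
    return a.score - b.score
```

The design choice is to refuse rather than adjust: with the config attached to every score, a ranking between two models is only reported when they ran under identical settings, which is the like-for-like condition the paper asks labs to make checkable.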

Agents, and the people using them

The user is visibly frustrated

pscanf.com

A developer argues that coding agents frustrate users because their warm, praising tone trips social instincts the tools can't honor, so repeated mistakes feel like a coworker letting you down. His proposed fix is a clinical, robotic tone, so you feel you are approving or rejecting outcomes rather than arguing with a person.

“The tool is good enough to trip your social instincts and not good enough to honor them.”

Users who rage quit my software

r/singularity

A RimWorld modder reports that users uninstalled all of his mods on learning he had used AI to update them, objecting on principle rather than over quality. The sharpest reply steelmans the objectors: a principled boycott is not the opposite of a rational one, though adoption fights often treat the two as if they were.

“A principle is inherently rooted in a rationale.”

A reality check on the AI jobs hysteria

MIT Technology Review

MIT Technology Review walks through the data: unemployment in AI-exposed jobs is lower than in less-exposed work, but the Stanford Digital Economy Lab finds roughly a 16% decline in entry-level jobs in AI-exposed occupations over 2024 and 2025. The entry-level pipeline, not the headline layoffs, is what managers should track.

“We're not investing even 1% of that on understanding the transition.”

Local and open tooling

NuExtract3: a 4B document-extraction VLM that runs on 4GB

r/LocalLLaMA

An open-weight 4-billion-parameter vision-language model built on Qwen3.5, Apache-2.0 licensed, turns document images into Markdown and structured JSON for forms, tables, receipts and invoices. It runs in as little as 4GB of video memory and shipped with Safetensors, GGUF and MLX weights on day one; multi-column reading order is still a known weak spot.

“With as little as 4GB of VRAM, you should be good to go.”

EAGLE 3.1 cuts the attention drift in speculative decoding

vLLM

EAGLE 3.1 improves speculative decoding (a small draft model proposes tokens that the large model then verifies) by fixing the 'attention drift' that builds up as the drafter speculates further ahead. It is merged into vLLM main and backward-compatible with EAGLE 3 checkpoints, so anyone self-hosting on vLLM can pick it up with the checkpoints they already have.

“EAGLE 3.1 delivers 2.03x higher per-user output throughput at concurrency 1.”

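The draft-and-verify loop behind speculative decoding can be sketched in a few lines. This is plain greedy speculative decoding, not EAGLE's drafter or its attention-drift fix, and the two "models" (`toy_target`, `toy_draft`) are hypothetical toy functions that map a token context to a next token. Every kept token is one the target itself would have chosen, so the output matches ordinary greedy decoding with the target; the saving comes from accepting several drafted tokens per verification round.

```python
from typing import Callable, List, Tuple

# Toy stand-in for a model: maps a token context to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: plain autoregressive greedy decoding, one target step per token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-and-verify. Returns (tokens, number of verification rounds)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(out) < goal:
        rounds += 1
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(min(k, goal - len(out))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks each drafted position. A real engine scores all
        #    positions in one batched forward pass; here we loop for clarity.
        for tok in proposal:
            expected = target(out)
            out.append(expected)  # the kept token is always the target's choice
            if tok != expected:
                break  # first disagreement: discard the rest of the draft
        else:
            # Every drafted token matched, so the same pass yields a bonus token.
            if len(out) < goal:
                out.append(target(out))
    return out, rounds


# Hypothetical toy models: the target sums the last two tokens mod 10,
# and the drafter is wrong at every position where len(ctx) % 3 == 0.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + ctx[-2]) % 10


def toy_draft(ctx: List[int]) -> int:
    guess = (ctx[-1] + ctx[-2]) % 10
    return guess if len(ctx) % 3 else (guess + 1) % 10


if __name__ == "__main__":
    tokens, rounds = speculative_decode(toy_target, toy_draft, [1, 1], n_new=12)
    assert tokens == greedy_decode(toy_target, [1, 1], n_new=12)
    print(tokens, "in", rounds, "rounds instead of 12")
```

The better the drafter agrees with the target deeper into its proposal, the more tokens each round keeps, which is why drift that grows with speculation depth caps the speedup.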

A rejected llama.cpp PR still ships a 30% speedup

r/LocalLLaMA

A rejected llama.cpp pull request gives users of AMD's Strix Halo up to 30% faster prompt processing on mixture-of-experts models. Since it won't land in official builds, the poster applies the small diff to their own build and shares it so others can do the same.

“The changes are so small that I just put them into whatever the current version of llama.cpp is.”

Companion episode

The harness, not the model — and the trust layer racing to catch up

2026-05-26 · 00:24:26

A week ago the knock on Gemini Omni was that it couldn't render a clean backflip. This week it shipped with physics front and center and a SynthID watermark on every frame, while Google lined up OpenAI, ElevenLabs and Kakao behind the same watermark. The counterpoint shipped the same day: a decensored Qwen3.5 on Hugging Face, in every format, with nothing upstream to verify. Source-side provenance and downloaded weights are pulling in opposite directions.