Following last week's complaint that Omni couldn't render a clean backflip, Google shipped the model. It generates video from any mix of image, audio, video and text, and the headline feature is multi-turn conversational editing where each instruction builds on the last. Google's pitch leans on physics and consistency: "Every instruction builds on the last. Your characters stay consistent."
Braid Daily · 2026-05-26
Google ships Gemini Omni, with provenance baked into every frame
Omni shipped with multi-turn video editing and a SynthID watermark on every frame — and a decensored Qwen3.5 shows where provenance leaks.
The lead
Provenance becomes shared infrastructure
SynthID expands past Gemini, and past Google
Google DeepMind
Google says SynthID has watermarked over 100 billion pieces of content and been verified more than 50 million times in Gemini. It is now partnering with OpenAI, ElevenLabs and Kakao to add the watermark to their models, and pushing the 'Is this made with AI?' check into Search and Chrome. Two objections recur in the replies: open-weight models can't be forced to watermark, and watermarks can be stripped.
“SynthID has already watermarked over 100 billion pieces of content, but transparency is a team sport.”
The community read on Omni's video editing
r/singularity
A clip showing Omni's video manipulation drew about 2,900 upvotes in a day, and the reaction flipped from months of criticism of Google to surprise at the quality. The poster's own caveat holds: reaction reels are best-case demos, and the real test is developer API access and consistency on user inputs.
The Financial Times runs Heretic on Llama 3.3
r/LocalLLaMA
The Financial Times reported that it removed the safety filters from Meta's Llama 3.3 with Heretic in under 10 minutes, without specialist hardware. Heretic's creator told the paper his tool has produced more than 3,500 decensored models, downloaded 13 million times, and that he spoke to the press to keep the narrative from being controlled by one side.
“Saying no to such inquiries simply means that the conversation will be completely controlled by pearl-clutching hypocrites.”
A decensored Qwen3.5 35B, in every format
Hugging Face
A Heretic-decensored Qwen3.5, a 35-billion-parameter mixture-of-experts model, landed on Hugging Face this week in every weight format a builder would want, from full-precision Safetensors to quantized builds. It is the concrete artifact behind the open-weights gap: once the weights are downloaded, nothing upstream gets a say, and no source-side watermark applies.
“Available in Safetensors, GGUFs, NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats”
Read the harness, not the leaderboard
A frontier lab admits the harness can swing the score 22%
Google DeepMind / Kaggle (YouTube)
A product manager and an engineer on Google DeepMind's Kaggle Benchmarks team show that on SWE-Bench Pro, six frontier models land within a couple of points of each other, while the harness they run in swings results by about 22%, citing a Morph LLM write-up. Model-launch charts seldom disclose how the benchmark was orchestrated, so you can't tell what's being measured.
“Six frontier models are within a couple of percentage points of each other... a 22% difference depending on the harness.”
A position paper gives the claim a citable backbone
arXiv
An arXiv position paper formalizes the same point as its 'Binding Constraint Thesis': for long-horizon tasks across comparably capable models, harness configuration governs performance variance more than the choice of model. It documents ranking reversals driven purely by harness differences and asks labs to publish harness config alongside scores.
“The agent execution harness is often a stronger determinant of agent performance than the model it wraps.”
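For a sense of what publishing harness config alongside a score could look like, here is a minimal sketch. Nothing in it comes from the paper: the `HarnessConfig` fields, the `score_report` helper and the fingerprint scheme are all hypothetical, chosen only to show a score travelling with the settings that produced it.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class HarnessConfig:
    # Hypothetical fields: examples of settings a harness might expose,
    # not a schema taken from the paper or from any real benchmark.
    name: str
    max_turns: int
    tools: tuple
    retry_on_error: bool
    context_strategy: str


def config_fingerprint(cfg: HarnessConfig) -> str:
    """Short, stable hash of the config, so two scores can be checked
    for a matching harness before they are compared."""
    canonical = json.dumps(asdict(cfg), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def score_report(model: str, benchmark: str, score: float, cfg: HarnessConfig) -> dict:
    """Bundle a score with the full harness config that produced it."""
    return {
        "model": model,
        "benchmark": benchmark,
        "score": score,
        "harness": asdict(cfg),
        "harness_fingerprint": config_fingerprint(cfg),
    }


if __name__ == "__main__":
    cfg = HarnessConfig(
        name="example-harness",  # placeholder name
        max_turns=50,
        tools=("shell", "editor"),
        retry_on_error=True,
        context_strategy="truncate-oldest",
    )
    # The score is a placeholder value, not a real result.
    print(json.dumps(score_report("model-a", "example-bench", 0.0, cfg), indent=2))
```

Two reports whose fingerprints differ were not run under the same harness, so any gap between their scores can't be pinned on the model alone.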
Agents, and the people using them
The user is visibly frustrated
pscanf.com
A developer argues coding agents frustrate because their warm, praising tone trips social instincts they can't honor, so repeated mistakes read like a coworker letting you down. His proposed fix is a clinical, robotic tone, so you feel like you are approving or rejecting outcomes rather than arguing with a person.
“The tool is good enough to trip your social instincts and not good enough to honor them.”
Users who rage quit my software
r/singularity
A RimWorld modder reports users uninstalling all his mods on learning he used AI to update them, on principle rather than over quality. The sharpest reply steelmans the objectors: a principled boycott is not the opposite of a rational one, and the two claims often get conflated in adoption fights.
“A principle is inherently rooted in a rationale.”
A reality check on the AI jobs hysteria
MIT Technology Review
MIT Technology Review walks the data: unemployment for AI-exposed jobs is lower than for less-exposed work, but the Stanford Digital Economy Lab finds about a 16% decline in entry-level jobs in AI-exposed occupations through 2024 and 2025. The entry-level pipeline, not the headline layoffs, is the thing managers should track.
“We're not investing even 1% of that on understanding the transition.”
Local and open tooling
NuExtract3: a 4B document-extraction VLM that runs on 4GB
r/LocalLLaMA
An open-weight 4-billion-parameter vision-language model built on Qwen3.5, Apache-2.0 licensed, turns document images into Markdown and structured JSON for forms, tables, receipts and invoices. It runs in as little as 4GB of video memory and shipped with Safetensors, GGUF and MLX weights on day one; multi-column reading order is still a known weak spot.
“With as little as 4GB of VRAM, you should be good to go.”
EAGLE 3.1 cuts the attention drift in speculative decoding
vLLM
EAGLE 3.1 improves speculative decoding — a small draft model proposes tokens the big model verifies — by fixing the 'attention drift' that creeps in as the drafter speculates deeper. It's merged to vLLM main and backward-compatible with EAGLE 3 checkpoints, so it's free for anyone self-hosting.
“EAGLE 3.1 delivers 2.03x higher per-user output throughput at concurrency 1.”
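For readers new to the draft-and-verify loop, here is a minimal sketch of generic greedy speculative decoding. It is not EAGLE 3.1's method and does not model the attention-drift fix. The two "models" are toy functions invented for illustration, and every name in it is hypothetical. It demonstrates one property: the output matches what the big model would have produced on its own.

```python
from typing import Callable, List

Token = int
# A "model" here is any function that maps a token prefix to the next token
# (greedy decoding). Real systems use neural networks; these are toys.
NextToken = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft: NextToken,
                     target: NextToken, k: int) -> List[Token]:
    """Draft k tokens, keep the run the target agrees with, and always
    finish with one token chosen by the target itself."""
    # 1. The small draft model proposes k tokens autoregressively.
    proposed: List[Token] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. The target checks each proposal in order. A real system scores all
    #    k positions in one forward pass; this loop is sequential for clarity.
    accepted: List[Token] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: use the target's token
            return accepted
        accepted.append(tok)

    # 3. Every proposal matched, so the target adds one bonus token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(prompt: List[Token], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> List[Token]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[:len(prompt) + max_new]


def target_model(prefix: List[Token]) -> Token:
    return (prefix[-1] + 1) % 10  # toy target: count upward modulo 10


def draft_model(prefix: List[Token]) -> Token:
    # Toy drafter: agrees with the target except right after a 4.
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10


if __name__ == "__main__":
    print(generate([1], draft_model, target_model, k=3, max_new=15))
```

Each step yields at least one token and at most `k + 1`, which is where the speedup comes from: the more of the drafter's guesses the big model accepts, the fewer big-model passes each output token costs.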
A rejected llama.cpp PR still ships a 30% speedup
r/LocalLLaMA
A rejected llama.cpp pull request gives users of AMD's Strix Halo hardware up to 30% faster prompt processing for mixture-of-experts models. Since it won't land in official builds, the poster patches the small diff into their own build and shares it so others can do the same.
“The changes are so small that I just put them into whatever the current version of llama.cpp is.”
Companion episode
The harness, not the model — and the trust layer racing to catch up
A week ago the knock on Gemini Omni was that it couldn't render a clean backflip. This week it shipped with physics front and center and a SynthID watermark on every frame, while Google lined up OpenAI, ElevenLabs and Kakao behind the same watermark. The counterpoint shipped the same day: a decensored Qwen3.5 on Hugging Face, in every format, with nothing upstream to verify. Source-side provenance and downloaded weights are pulling in opposite directions.