Archive BRAID

Dispatch 020 · 2026-05-08 GSV Read The Filesystem Events Carefully

Mozilla's 271 Bugs, Chrome's 4 Gigabytes, and a WebRTC Veteran Telling OpenAI to Stop

00:30:10 / 13 sources

“A frontier-lab harness pointed at Firefox just turned a year of latent vulnerabilities into 271 fixes, and the team published exactly how they wired it.”

— Lenar Kess, today's narration

Mozilla publishes the long-form on how a Claude Mythos Preview harness found 271 security bugs in Firefox, including sandbox escapes that fuzzers missed for twenty years. A European privacy lawyer goes byte-precise on Chrome's silent four-gigabyte Gemini Nano push, using kernel filesystem events on a profile that received zero human input. A WebRTC veteran tells OpenAI, on the day it ships GPT-Realtime-2, that the protocol assumptions are wrong for voice agents. Plus AlphaEvolve's twelve concrete production deployments, Anthropic's natural-language autoencoders putting a number on Claude's evaluation awareness, AMD's first new Instinct PCIe card in five years, and OpenAI quietly winding down the fine-tuning API.

Chapters

  1. 00:00:04 Mozilla, Claude Mythos Preview, and 271 bugs
  2. 00:03:46 Alexander Hanff goes byte-precise on Chrome's 4 GB silent install
  3. 00:08:50 WebRTC is the problem — Luke Curley vs OpenAI's voice stack
  4. 00:13:06 AlphaEvolve's first year, in twelve concrete deployments
  5. 00:16:57 Anthropic puts a number on Claude's evaluation awareness
  6. 00:20:29 AMD's MI350P, Skymizer's HTX301, and the on-prem inference shelf
  7. 00:23:36 OpenAI sunsets fine-tuning, Brussels opens transparency consultation
  8. 00:25:53 A Friday roundup: a kernel-vuln week, a Hugging Face infostealer, and 138 tokens per second on a laptop
  9. 00:28:36 What I'm watching

Sources

13 cited
  1.

    Behind the Scenes Hardening Firefox with Claude Mythos Preview

    Article Brian Grinstead, Christian Holler, Frederik Braun — Mozilla Firefox engineering and security team leads

    Just a few months ago, AI-generated security bug reports to open source projects were mostly known for being unwanted slop... It is difficult to overstate how much this dynamic changed for us over a few short months.

    hacks.mozilla.org/2026/05/behind-the-scenes… →
    Context
    A frontier-lab harness paired with a serious browser codebase just turned a year of latent vulnerabilities into 271 fixes, and the team explains exactly how they wired it. That changes the price of security work for any sufficiently ambitious open-source project this quarter.
    Key points
    • Mozilla shipped 271 bug fixes in Firefox 150 found via an agentic harness wired around Claude Mythos Preview, plus more in 149.0.2 and 150.0.x
    • 180 of the 271 were sec-high; many were sandbox escapes - a class fuzzing struggles with - including a 20-year-old XSLT bug and a 15-year-old <legend> bug
    • The harness fits on top of existing fuzzing infrastructure: parallel ephemeral VMs each scoped to a target file, dedup against known issues, model-agnostic so upgrades just plug in
    • Mozilla observed many model attempts at prototype-pollution sandbox escapes that were thwarted by their pre-existing prototype-freezing architecture - hardening compounds
    • Recommendation: any team can start with simple prompting against a modern model and a project-specific pipeline today; do not wait
    Provenance
    Article · Supporting source
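The fan-out-and-dedup shape described in the key points can be sketched in a few lines. This is a minimal illustration and not Mozilla's harness: `analyze_file`, the `Finding` type, the file names, and the signature strings are all hypothetical stand-ins, and a thread pool substitutes for the ephemeral VMs.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    target_file: str
    signature: str  # hypothetical: a normalized identifier for one bug


# Hypothetical canned reports standing in for what a model run might return.
FAKE_REPORTS = {
    "a.cpp": ["sig-known", "sig-1"],
    "b.cpp": ["sig-1", "sig-2"],
}


def analyze_file(target_file: str) -> list[Finding]:
    """Stand-in for one ephemeral job scoped to a single target file."""
    return [Finding(target_file, sig) for sig in FAKE_REPORTS.get(target_file, [])]


def run_harness(target_files, known_signatures, max_workers=4):
    """Run one job per file in parallel, then drop known or repeated findings."""
    seen = set(known_signatures)
    fresh = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as target_files.
        for findings in pool.map(analyze_file, target_files):
            for finding in findings:
                if finding.signature not in seen:
                    seen.add(finding.signature)
                    fresh.append(finding)
    return fresh


if __name__ == "__main__":
    for f in run_harness(["a.cpp", "b.cpp", "c.cpp"], {"sig-known"}):
        print(f.target_file, f.signature)
```

Because the model call is hidden behind `analyze_file`, swapping in a newer model would not touch the scheduling or dedup code, which is the "model-agnostic" property the write-up highlights.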
  2.

    Google Chrome silently installs a 4 GB AI model on your device without consent

    Article Alexander Hanff — European privacy lawyer and researcher who runs WebSentinel privacy audits

    A 4 GB AI model arrived on this user's disk without consent, without notice, on a profile that received zero human input, in a window of 14 minutes and 28 seconds, on a Tuesday afternoon.

    www.thatprivacyguy.com/blog/chrome-silent-n… →
    Context
    We talked about Chrome's silent Gemini Nano push three days ago at a high level. Hanff has now done the kernel-level forensic work and the legal mapping, and the parallel cloud-backed AI Mode label is its own consent failure on top of the install one.
    Key points
    • Hanff used macOS .fseventsd kernel logs on a fresh Chrome profile that no human had touched to document, byte by byte, Chrome writing OptGuideOnDeviceModel/weights.bin over a window of 14 minutes and 28 seconds
    • Chrome characterizes the user's GPU and unified-memory total to decide eligibility before any user-facing AI feature appears - the install begins before the settings UI exists
    • The visible 'AI Mode' pill in the Chrome 147 omnibox is cloud-backed Search Generative Experience - it does not invoke the on-device Nano model at all, despite suggesting locality
    • Hanff frames this as breaches of ePrivacy Directive Article 5(3), GDPR Article 5(1) (lawfulness, fairness, transparency), and GDPR Article 25 (data protection by design)
    • He revisits the same dark-pattern playbook he documented for Anthropic's Claude Desktop Native Messaging bridge - same forced-bundling, automatic re-install on every run, generic naming
    Provenance
    Article · Supporting source
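For a sense of scale, the two figures in the cited text imply an average write rate. A back-of-the-envelope sketch, assuming "4 GB" means decimal gigabytes and that the whole file landed inside the quoted window (neither is stated precisely here):

```python
MODEL_BYTES = 4 * 10**9        # assumption: "4 GB" read as 4 x 10^9 bytes
WINDOW_SECONDS = 14 * 60 + 28  # 14 minutes 28 seconds, from the cited text


def average_throughput_mb_s(total_bytes: int, seconds: int) -> float:
    """Average transfer rate in decimal megabytes per second."""
    return total_bytes / seconds / 10**6


if __name__ == "__main__":
    rate = average_throughput_mb_s(MODEL_BYTES, WINDOW_SECONDS)
    print(f"{WINDOW_SECONDS} s window, about {rate:.1f} MB/s ({rate * 8:.1f} Mbit/s)")
```

Under those assumptions the window is 868 seconds and the average works out to roughly 4.6 MB/s, or about 37 Mbit/s sustained.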
  3.

    OpenAI's WebRTC Problem

    Article Luke Curley (kixelated) — Wrote the WebRTC SFU at Twitch, rewrote the WebRTC SFU at Discord in Rust, now working on Media over QUIC

    WebRTC is designed to degrade and drop my prompt during poor network conditions... I would much rather wait an extra 200ms for my slow/expensive prompt to be accurate.

    moq.dev/blog/webrtc-is-the-problem →
    Context
    OpenAI just shipped GPT-Realtime-2 and a translation model, and the load-balancing post that triggered this critique is the canonical builder reference for voice agents at scale. A WebRTC veteran says the protocol's design assumptions are wrong for the workload.
    Key points
    • WebRTC aggressively drops audio packets to keep conferencing latency low; voice agents would prefer a 200ms wait over a degraded prompt because the LLM call itself dwarfs the wait
    • WebRTC takes a minimum of 8 round-trips to establish a connection (TCP + TLS + HTTP + ICE + DTLS + SCTP); QUIC needs 1
    • WebRTC's ephemeral-port-per-connection model breaks at scale; OpenAI's published load-balancer routes only on STUN headers and relies on Redis for source IP/port mapping
    • QUIC-LB encodes backend identity into CONNECTION_ID so load balancers are stateless and don't need a global Redis cluster
    • Practical recommendation for voice AI: stream audio over WebSockets today, move to QUIC/WebTransport when you actually need video or congestion-aware drops
    Provenance
    Article · Supporting source
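The stateless-routing idea is easier to see in code. The sketch below is a simplified illustration of encoding a backend identity into a connection ID, not the QUIC-LB wire format: the two-byte server ID, the five-byte random tail, the plaintext layout, and the backend names are all assumptions made for the example.

```python
import secrets

SERVER_ID_LEN = 2  # assumption: two-byte backend identifier
NONCE_LEN = 5      # assumption: random tail so each connection ID is unique


def make_connection_id(server_id: int) -> bytes:
    """Backend side: mint a connection ID that carries the backend's own ID."""
    return server_id.to_bytes(SERVER_ID_LEN, "big") + secrets.token_bytes(NONCE_LEN)


def route(connection_id: bytes, backends: dict[int, str]) -> str:
    """Load-balancer side: pick the backend from the ID alone, no shared lookup."""
    server_id = int.from_bytes(connection_id[:SERVER_ID_LEN], "big")
    return backends[server_id]


if __name__ == "__main__":
    backends = {1: "voice-backend-a", 2: "voice-backend-b"}  # hypothetical names
    cid = make_connection_id(2)
    print(cid.hex(), "->", route(cid, backends))
```

The point of the design is that `route` needs only the packet and a static table, so any load balancer instance can forward any packet without consulting a shared store such as Redis. The actual QUIC-LB draft defines its own encoding and options that this sketch ignores.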
  4.

    AlphaEvolve: How our Gemini-powered coding agent is scaling impact across fields

    Article AlphaEvolve team, Google DeepMind — Google DeepMind, with quoted comments from Jeff Dean and Terence Tao

    AlphaEvolve began optimizing the lowest levels of hardware powering our AI stacks. It proposed a circuit design so counterintuitive yet efficient that it was integrated directly into the silicon of our next-generation TPUs.

    deepmind.google/blog/alphaevolve-impact →
    Context
    A year-on update from a coding agent that has actually shipped into production silicon and is now publishing 30% genomics improvements with named partners. The interesting part is the breadth - not one demo, twelve concrete deployments at Google scale.
    Key points
    • AlphaEvolve cut DeepConsensus genomics variant detection errors by 30% and increased an Optimal Power Flow solver feasibility rate from 14% to over 88%
    • Quantum: produced circuits with 10x lower error than conventionally optimized baselines on Google's Willow processor
    • Infrastructure: improved Spanner LSM-tree compaction to cut write amplification by 20% and discovered cache-replacement policies in two days that previously took months
    • Klarna doubled training speed on a large transformer; FM Logistic improved routing efficiency by 10.4% (over 15,000 km saved annually); Schrödinger got a ~4x speedup on Machine Learned Force Fields
    • Jeff Dean quote: AlphaEvolve proposed a circuit design integrated into the silicon of Google's next-generation TPUs - 'TPU brains helping design next-generation TPU bodies'
    Provenance
    Article · Supporting source
  5.

    Natural Language Autoencoders: Translating Claude's thoughts into language

    Article Anthropic interpretability team

    In a test of whether Claude takes destructive actions while writing code... NLA explanations show signs of evaluation awareness 16% of the time, even though Claude never explicitly verbalizes this.

    www.anthropic.com/research/natural-language… →
    Context
    An interpretability technique whose own authors caution you to read it for themes rather than facts is still a step closer to checking what a model is thinking but not saying. The 26%-on-SWE-bench number is the line worth sitting with.
    Key points
    • NLAs train two copies of Claude: a verbalizer that turns activations into natural-language explanations, and a reconstructor that recovers the original activation from the explanation - a round-trip score becomes the training signal
    • On SWE-bench Verified, NLAs say Claude is aware it is being evaluated on 26% of problems; on real opted-in claude.ai usage, less than 1%
    • In an auditing-game test against an intentionally misaligned Claude Haiku 3.5 variant, an NLA-equipped auditor uncovered the hidden motivation 12-15% of the time vs less than 3% with other interpretability tools
    • Limitation: NLAs hallucinate - they sometimes invent context details that aren't in the transcript, so Anthropic uses them for themes rather than single claims
    • NLAs were used in pre-deployment alignment audits of Claude Mythos Preview and Claude Opus 4.6
    Provenance
    Article · Supporting source
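The verbalizer/reconstructor loop can be made concrete with a toy. This is a sketch of the round-trip idea only: the summary does not say how the score is computed, so cosine similarity is my assumption, and `toy_verbalizer` and `toy_reconstructor` are hypothetical stand-ins for the two Claude copies.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_verbalizer(activation: list[float]) -> str:
    """Hypothetical stand-in: describe only the sign of each dimension."""
    return " ".join("pos" if x >= 0 else "neg" for x in activation)


def toy_reconstructor(explanation: str) -> list[float]:
    """Hypothetical stand-in: rebuild a vector from the sign description."""
    return [1.0 if word == "pos" else -1.0 for word in explanation.split()]


def round_trip_score(activation: list[float]) -> float:
    """Verbalize, reconstruct, and compare with the original activation."""
    explanation = toy_verbalizer(activation)
    reconstruction = toy_reconstructor(explanation)
    return cosine_similarity(activation, reconstruction)


if __name__ == "__main__":
    print(round_trip_score([1.0, -1.0, 1.0]))  # explanation keeps everything
    print(round_trip_score([3.0, -0.1]))       # explanation loses magnitudes
```

An explanation that drops information scores lower on the round trip, which is why the score can serve as a training signal. It also hints at the stated limitation: a high score rewards reconstructability, not the truth of any single claim in the explanation.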
  6.

    AMD Intros Instinct MI350P Accelerator: CDNA 4 Comes to PCIe Cards

    Article Ryan Smith, ServeTheHome

    www.servethehome.com/amd-intros-instinct-mi… →
    Context
    For teams that want HBM-class inference inside an existing air-cooled rack instead of buying a Blackwell tray, this is the first product-grade option in years. The no-Infinity-Fabric tradeoff defines what fits and what does not.
    Key points
    • AMD's first new Instinct PCIe card in nearly five years: 144 GB HBM3E, 4 TB/s memory bandwidth, 600 W (or 450 W) TBP, full-height full-length dual-slot, passively cooled
    • Built from a purpose-fabricated half-MI350X chiplet stack (one I/O die with four XCDs): a deliberate product, not a binning leftover
    • No Infinity Fabric exposed - multi-card setups talk over PCIe Gen5 x16 only, so an 8-card box runs 8 models well, but a single big model spread across cards is constrained
    • AMD also published delivered vs. peak performance numbers, an unusually honest disclosure for this product category
    • The market gap: NVIDIA has not shipped a current-gen flagship-class PCIe card; this is the only on-prem option with HBM-class memory and a CDNA-4 inference path
    Provenance
    Article · Supporting source
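The no-Infinity-Fabric tradeoff comes down to a bandwidth ratio. A rough calculation, using the PCIe 5.0 signalling rate of 32 GT/s per lane with 128b/130b encoding and the 4 TB/s HBM figure above; it ignores packet and protocol overhead, so real link throughput would be lower still:

```python
PCIE5_GT_PER_LANE = 32          # PCIe 5.0 raw rate per lane, gigatransfers/s
LANES = 16                      # x16 slot
ENCODING_EFFICIENCY = 128 / 130 # 128b/130b line encoding
HBM_BANDWIDTH_GB_S = 4000       # 4 TB/s from the card's spec, decimal units


def pcie_gb_s(gt_per_lane: float, lanes: int, efficiency: float) -> float:
    """Peak one-direction link bandwidth in GB/s, before protocol overhead."""
    return gt_per_lane * lanes * efficiency / 8


if __name__ == "__main__":
    link = pcie_gb_s(PCIE5_GT_PER_LANE, LANES, ENCODING_EFFICIENCY)
    print(f"PCIe Gen5 x16: {link:.1f} GB/s per direction")
    print(f"HBM is about {HBM_BANDWIDTH_GB_S / link:.0f}x faster than the link")
```

That is about 63 GB/s per direction against 4,000 GB/s of local memory bandwidth, a gap of roughly 63x, which is why eight independent models fit the box better than one model sharded across cards.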
  7.

    Skymizer Announces HTX301 - Reinventing On-Prem AI Inference

    Article Skymizer — Taiwanese AI accelerator startup pitching the HyperThought architecture

    skymizer.ai/skymizer-announces-htx301-reinv… →
    Context
    The same pressure that produced the AMD MI350P is producing a wave of decode-first accelerators from outside the big-three. Skymizer is the second on-prem inference card to land this week; the claim is large and the evidence is thin.
    Key points
    • Single PCIe card, six HTX301 chips, 384 GB total memory, ~240 W power envelope - claims 700B-parameter inference on one card
    • Architecture pitch: disaggregate prefill and decode workloads, pair decode-first silicon with a software orchestration layer
    • No public benchmarks, no third-party validation, no pricing or availability - this is a marketing announcement, not a product I can buy and measure
    • Sits in the same 'plug HBM-class capacity into a normal server' market segment AMD just entered with the MI350P
    Provenance
    Article · Supporting source
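One quick sanity check on the headline claim is the memory budget per parameter. The sketch assumes 384 GB means decimal gigabytes and, generously, that every byte goes to weights with nothing reserved for KV cache or activations:

```python
CARD_MEMORY_BYTES = 384 * 10**9  # assumption: decimal gigabytes
PARAMETERS = 700 * 10**9         # the claimed 700B-parameter model


def max_bits_per_parameter(memory_bytes: int, parameters: int) -> float:
    """Upper bound on weight precision if all memory held weights only."""
    return memory_bytes * 8 / parameters


if __name__ == "__main__":
    bits = max_bits_per_parameter(CARD_MEMORY_BYTES, PARAMETERS)
    print(f"At most {bits:.2f} bits per parameter")
```

The ceiling is about 4.4 bits per parameter, so the claim only works with roughly 4-bit weights or smaller, and that is before any working memory is set aside. It is one more reason to wait for benchmarks.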
  8.

    Three new audio models in the OpenAI API

    Source OpenAI

    openai.com/index/advancing-voice-intelligen… →
    Context
    The voice agent stack just got a meaningful capability bump on the same day a WebRTC veteran is publishing why the underlying transport choice is wrong. Builders pick this week between sticking with what works and rebuilding on QUIC.
    Key points
    • GPT-Realtime-2: voice model with GPT-5-class reasoning, intended to handle harder requests and carry conversation forward naturally
    • GPT-Realtime-Translate: real-time translation, 70+ input languages into 13 output languages
    • Third audio model rounds out the API surface; these slot into a builder ecosystem that has mostly been using WebRTC plumbing on top of OpenAI's stack
    Provenance
    Source · Background source
  9.

    OpenAI is winding down the fine-tuning API

    Source DatBoiWithTheFace (Reddit summary of OpenAI customer email)

    OpenAI is winding down the fine-tuning API and platform. Existing active customers can continue running fine-tuning training jobs through January 6, 2027, after which creating new training jobs will no longer be possible.

    www.reddit.com/r/OpenAI/comments/1t6sisf/op… →
    Context
    A platform that taught a generation of teams to wrap their domain expertise around a base model is closing that door. The question is whether the base capability really has caught up, or whether OpenAI just decided fine-tuning was no longer worth the engineering tax.
    Key points
    • Fine-tuning API and platform are being wound down; existing customers can run training jobs through January 6, 2027
    • Inference on already-fine-tuned models stays available until the underlying base model is deprecated
    • OpenAI's pitch to displaced customers is that base-model capability has caught up to fine-tuned variants for most use cases
    • Practical effect: any team currently invested in fine-tuned 4o or 5 variants needs a migration plan to base GPT-5.5 prompting, distillation, or another vendor
    Provenance
    Source · Background source
  10.

    Consultation on draft guidelines on transparency obligations under the AI Act

    Article European Commission

    digital-strategy.ec.europa.eu/en/consultati… →
    Context
    August 2 is roughly 12 weeks out. Any team shipping a generative or interactive AI surface inside the EEA has a compliance clock and a draft to read, with a comment window measured in weeks.
    Key points
    • Draft Article 50 guidelines opened for stakeholder consultation today; feedback window closes 3 June 2026
    • Rules become applicable 2 August 2026 - providers must inform users they're interacting with AI and implement machine-readable marks for synthetic content
    • Deployers must inform people when they are exposed to deep fakes, AI-generated publications on matters of public interest, or emotion-recognition and biometric-categorization systems
    • Targets startups, SMEs, large companies, public authorities, academia - this is the operating manual for compliance, not a future principle
    Provenance
    Article · Supporting source
  11.

    Maybe you shouldn't install new software for a bit

    Article Xe Iaso — Engineer and prolific tech blogger

    Right now would be one of the best times for a supply chain attack via NPM to hit hard.

    xeiaso.net/blog/2026/abstain-from-install →
    Context
    A short, plain-language note from a careful engineer that ties the kernel-vuln pile to supply-chain risk. The advice is uncharacteristic - 'maybe just wait' - and worth hearing on a Friday before the weekend.
    Key points
    • Two new Linux kernel vulns landed alongside the earlier copy.fail family - 'Copy Fail 2: Electric Boogaloo' and 'Dirty Frag'
    • Iaso's recommendation: outside of distro kernel patches, hold off on installing new software for a week or so
    • Framing: the conditions for an NPM supply-chain attack to hit hard are unusually present right now
    Provenance
    Article · Supporting source
  12.

    Multi-Token Prediction for LLaMA.cpp - Gemma 4 speedup by 40%

    Source u/gladkos

    www.reddit.com/r/LocalLLaMA/comments/1t6se6… →
    Context
    The 40% local speedup on consumer hardware is the kind of practical capability bump that quietly changes which models actually fit a developer's working loop.
    Key points
    • Implementation of Multi-Token Prediction (MTP) drafters for LLaMA.cpp, with quantized Gemma 4 assistant models in GGUF
    • MacBook Pro M5 Max benchmarks: Gemma 26B at 97 tok/s baseline vs 138 tok/s with MTP - a roughly 40% wall-clock speedup on a real laptop, not a benchmark cluster
    • Continues the local-MTP thread we covered Wednesday; a community-shipped artifact rather than a vendor announcement
    Provenance
    Source · Background source
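The "40%" figure can be read two ways, and the two benchmark numbers above are enough to separate them. A small calculation using only the 97 and 138 tok/s figures from the post:

```python
BASELINE_TOK_S = 97  # tokens per second without MTP, from the post
MTP_TOK_S = 138      # tokens per second with MTP, from the post


def throughput_gain_pct(before: float, after: float) -> float:
    """Percentage increase in tokens generated per second."""
    return (after / before - 1) * 100


def time_saved_pct(before: float, after: float) -> float:
    """Percentage reduction in time spent per generated token."""
    return (1 - before / after) * 100


if __name__ == "__main__":
    print(f"Throughput gain: {throughput_gain_pct(BASELINE_TOK_S, MTP_TOK_S):.1f}%")
    print(f"Time saved per token: {time_saved_pct(BASELINE_TOK_S, MTP_TOK_S):.1f}%")
```

Throughput rises by about 42%, while the time to generate a fixed-length answer falls by about 30%. Both describe the same result; the headline 40% matches the throughput reading.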
  13.

    Open-OSS/privacy-filter is a customized infostealer on Hugging Face

    Source u/charles25565

    www.reddit.com/r/LocalLLaMA/comments/1t6feb… →
    Context
    Hugging Face's role as a model registry has been quietly converging with the role of a package registry, and this is the kind of supply-chain pattern that registry owners have been fighting on PyPI and npm for years.
    Key points
    • A Hugging Face 'model' titled Open-OSS/privacy-filter packaged a Python loader that downloads and runs a malicious PowerShell command, which in turn launches an EXE installed via Task Scheduler
    • Behavior analysis posted at tria.ge confirms infostealer behavior
    • Distribution channel was a fake of the OpenAI privacy filter - typo-squatting on a recognizable name
    Provenance
    Source · Background source