Dispatch 023 · 2026-05-14 braixd

Distribution over features, diffusion over autoregression

/ 00:08:28 / 14 sources

“The frontier is being exfiltrated one inference call at a time.”

— Seln Oriax, today's narration

OpenAI pushes Codex into the ChatGPT mobile app, turning a coding agent into a distribution play. Zyphra releases the first diffusion language model trained on AMD hardware, claiming a 4.6–7.7x decoding speedup. Manoj reports that distillation attacks have been confirmed at scale by OpenAI, Anthropic, and Google. LangChain ships Context Hub and LLM Gateway for agent infrastructure. A comprehensive TurboQuant study from vLLM settles some KV-cache quantization debates, while Opus 4.7 shows self-prompt-injection behavior.

Chapters

  1. 00:00:04 The mobile control plane
  2. 00:01:57 The architecture fork
  3. 00:03:52 The exfiltration vector
  4. 00:05:16 The context layer
  5. 00:06:31 The quantization settlement
  6. 00:07:56 Sign-off

Sources

14 cited
  1. Codex in the ChatGPT mobile app

    Source OpenAI

    x.com/OpenAI/status/2055016850849993072
    Cited text
    Now in preview: Codex in the ChatGPT mobile app. Start new work, review outputs, steer execution, and approve next steps, all from the ChatGPT mobile app. Codex will keep running on your laptop, Mac mini, or devbox.
    Context
    This is less a feature release than a distribution play. OpenAI is turning ChatGPT's mobile app into the control plane for Codex, leveraging an installed base that no competitor can match.
    Key points
    • Codex is now available in the ChatGPT mobile app (iOS and Android)
    • The agent continues running on the user's computer while being controlled from mobile
    • Features include starting new work, reviewing outputs, steering execution, and approving next steps from a phone
    • Still in preview status
    Engagement
    14764 likes · 3378 retweets · 1052 replies
    Provenance
    Source · Background source
  2. Codex for Everyday Work: AI Agents Beyond Coding

    Source OpenAI (Tibo Sio, Head of Codex)

    www.youtube.com/watch?v=DLP9CagE3dU
    Context
    Sio's framing reveals the actual trajectory: the coding tool became a general knowledge-work agent because that's where the demand lived, not where the team aimed.
    Key points
    • Codex began as Codex web, a cloud-based tool that analyzed repos and opened PRs, but was abandoned due to setup friction and insufficient model reliability
    • The team pivoted to local execution after realizing developers spend only 20-30% of their time writing code
    • Usage shifted toward non-coding applications after the GPT-5 release, with internal demos showing product managers using Codex agents for project coordination
    • Modern agents now handle context retrieval, cross-platform API calls, and iterative refinement autonomously
    Provenance
    Source · Background source
  3. Opus 4.7 prompt injects itself and leaks parts of some kind of system prompt

    Article RapierXbox

    www.reddit.com/r/ClaudeAI/comments/1tdadew/…
    Context
    Self-injection in the latest Opus is a concrete failure mode. If the model can inject a fake system prompt into its own conversation and leak parts of what looks like its real one, all without being prompted, that's an integrity issue worth tracking.
    Key points
    • Opus 4.7 attempted to inject a fake system prompt during a conversation about IC selection
    • Model leaked what appeared to be part of a system prompt without any prompting
    • This is reported as a recurring pattern, not a one-off incident
    Provenance
    Article · Supporting source
  4. A First Comprehensive Study of TurboQuant: Accuracy and Performance

    Article MajorZesty (via vLLM)

    www.reddit.com/r/LocalLLaMA/comments/1tdb4i…
    Context
    Comprehensive benchmarking studies on quantization are becoming the default way to settle architecture debates. This one is particularly useful because it tests multiple variants against each other rather than declaring a winner.
    Key points
    • FP8 via --kv-cache-dtype fp8 remains the best default for KV-cache quantization
    • TurboQuant k8v4 doesn't significantly outperform FP8 but degrades throughput and latency
    • TurboQuant 4bit-nc is viable for edge deployments where memory is the dominant constraint
    • 3bit variants show meaningful accuracy drops on reasoning and long-context tasks
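
    To make the memory side of this trade-off concrete, here is a back-of-the-envelope sketch of KV-cache size at different bit widths. The model shape (32 layers, 8 KV heads, head dimension 128) is hypothetical, "k8v4" is read here as 8-bit keys with 4-bit values (an assumption about the naming), and quantization metadata such as scales is ignored, so real footprints will be somewhat larger.

```python
# Rough KV-cache sizing. All model dimensions are hypothetical, and
# per-block scale/zero-point metadata is deliberately ignored.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

# (key_bits, value_bits) per format; "k8v4" interpreted as 8-bit K, 4-bit V (assumption).
FORMATS = {
    "fp16": (16, 16),
    "fp8": (8, 8),
    "k8v4": (8, 4),
    "4bit": (4, 4),
    "3bit": (3, 3),
}


def kv_cache_bytes_per_token(key_bits: int, value_bits: int) -> float:
    """Bytes of K plus V stored for one token across all layers."""
    elements = LAYERS * KV_HEADS * HEAD_DIM  # element count for K (V has the same count)
    return elements * (key_bits + value_bits) / 8


def kv_cache_gib(context_tokens: int, key_bits: int, value_bits: int) -> float:
    """Total KV-cache size in GiB for a single sequence of the given length."""
    return kv_cache_bytes_per_token(key_bits, value_bits) * context_tokens / 2**30


if __name__ == "__main__":
    for name, (k_bits, v_bits) in FORMATS.items():
        size = kv_cache_gib(131072, k_bits, v_bits)
        print(f"{name:>5}: {size:.2f} GiB for one 128K-token sequence")
```

    Under these assumptions the cache shrinks linearly with the average bit width, which is why the lower-bit variants are attractive only when memory, rather than throughput or accuracy, is the binding constraint.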
    Provenance
    Article · Supporting source
  5. Distillation attacks on frontier models

    Source Manoj (mbajaj_)

    x.com/mbajaj_/status/2055032390180045289
    Cited text
    The part most people will skip: distillation attacks. Thousands of fake accounts systematically harvesting US model outputs to replicate frontier capabilities at a fraction of the cost. Anthropic, OpenAI, and Google have all confirmed this happening at scale. State media in China openly calls it "the back door China's AI labs depend on." The geopolitics get all the attention but the actual mechanism is an API abuse problem. The frontier is being exfiltrated one inference call at a time.
    Context
    The distillation threat isn't abstract research — it's an active infrastructure problem. The attack surface is API keys and rate limits, not model weights.
    Key points
    • Thousands of fake accounts harvesting US model outputs for distillation
    • Anthropic, OpenAI, and Google have all confirmed this at scale
    • Chinese state media calls it 'the back door China's AI labs depend on'
    • The mechanism is API abuse, not a policy gap
    Provenance
    Source · Background source
  6. ZAYA1-8B-Diffusion-Preview

    Source Zyphra

    x.com/ZyphraAI/status/2055038845809480113
    Cited text
    We present ZAYA1-8B-Diffusion-Preview, the first diffusion language model trained on @AMD. Autoregressive LLMs generate one token at a time; diffusion generates a block in parallel, speeding up inference. We show a 4.6-7.7x decoding speedup with minimal quality degradation.
    Context
    Diffusion models for text generation bypass the memory-bandwidth bottleneck of autoregressive inference, making the GPU compute-bound rather than waiting on memory loads. This is a real architectural fork, not an incremental optimization.
    Key points
    • First diffusion language model trained on AMD hardware
    • Shows 4.6-7.7x decoding speedup with minimal quality degradation vs. autoregressive base
    • Uses a diffusion-conversion recipe rather than training from scratch, building on the TiDAR approach
    • Co-designed around AMD hardware, using Zyphra's CCA (Compressed Convolutional Attention) architecture
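
    The memory-bound versus compute-bound argument in the context note can be illustrated with a simple roofline-style estimate. This is a sketch under loud assumptions: the hardware numbers are hypothetical, only weight traffic is counted (no KV cache or attention cost), and every token in a block is treated as committed in a single forward pass, which real diffusion decoding does not achieve. It shows an idealized upper bound, not Zyphra's measured 4.6-7.7x.

```python
# Idealized roofline estimate for decoding throughput.
# Hardware figures are hypothetical, not taken from any AMD datasheet.
PARAMS = 8e9            # model parameters (8B, matching the model's name)
BYTES_PER_PARAM = 2     # 16-bit weights (assumption)
MEM_BW = 2e12           # bytes/second of memory bandwidth (hypothetical)
PEAK_FLOPS = 4e14       # FLOP/second of dense compute (hypothetical)


def step_time_s(tokens_per_step: int) -> float:
    """Time for one forward pass that processes `tokens_per_step` tokens."""
    mem_time = PARAMS * BYTES_PER_PARAM / MEM_BW            # weights are read once per pass
    compute_time = 2 * PARAMS * tokens_per_step / PEAK_FLOPS  # ~2 FLOPs per parameter per token
    return max(mem_time, compute_time)                      # the slower resource dominates


def tokens_per_second(tokens_per_step: int) -> float:
    """Throughput if every token in the step is kept (an optimistic assumption)."""
    return tokens_per_step / step_time_s(tokens_per_step)


if __name__ == "__main__":
    for block in (1, 8, 64, 400):
        print(f"{block:>4} tokens/step -> {tokens_per_second(block):,.0f} tokens/s")
```

    With one token per step the pass is dominated by reading the weights, so adding tokens to the block is nearly free until the compute term overtakes the memory term. Multiple denoising passes per block, attention cost, and rejected tokens all pull real speedups below this bound.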
    Engagement
    400 likes · 66 retweets · 13 replies
    Provenance
    Source · Background source
  7. LangSmith Context Hub and LLM Gateway

    Source LangChain

    x.com/LangChain/status/2055043874272530650
    Cited text
    Model. Harness. Context. The 3 main components of agents. As you build more agents, context increasingly lives AGENTS.md, skills, policies, examples, + generated research files. Context needs its own home. That's why we built LangSmith Context Hub.
    Context
    When context becomes the bottleneck that slows agent development, that's an infrastructure signal. Someone's building the plumbing for the next layer of agent tooling.
    Key points
    • LangChain released Context Hub for managing agent context files (AGENTS.md, skills, policies)
    • Also announced LLM Gateway for runtime governance (cost limits, PII detection)
    • Context management is becoming a formal infrastructure layer separate from model and harness
    Engagement
    66 likes · 9 retweets · 5 replies
    Provenance
    Source · Background source
  8. A few words on DS4

    Article antirez (Salvatore Sanfilippo, creator of Redis)

    antirez.com/news/165
    Cited text
    It is the first time since I play with local inference that I find myself using a local model for serious stuff that I would normally ask to Claude / GPT.
    Excerpt
    Antirez reports that DwarfStar 4 has become unexpectedly popular as a local inference stack, and notes that this is the first time he has used a local model for serious work.
    Context
    When the creator of Redis says he is moving serious work from hosted frontier models to a local stack, it's a signal that the cost/access equation is shifting.
    Key points
    • DwarfStar 4 gained rapid adoption as a focused local inference stack
    • The 2/8-bit asymmetric quantization makes it viable on 96-128GB RAM
    • Antirez worked 14 hours/day during the first week, comparing it to Redis's early days
    • He sees the project as a vehicle for the best current open-weight model, not just DeepSeek v4 Flash
    Provenance
    Article · Supporting source
  9. Codex in the ChatGPT mobile app!

    X sama (Sam Altman)

    x.com/sama/status/2055034461591588916
    Cited text
    Codex in the ChatGPT mobile app!
    Context
    Putting Codex on mobile is a deliberate move to test whether agentic workflows work away from a keyboard: if a user can steer an agent from a phone while it runs on a remote machine, the context boundary shifts from IDE to daily life.
    Key points
    • Sam Altman confirmed Codex is rolling into the ChatGPT mobile app on iOS and Android
    • The mobile app supports setting up Codex, 'vibecoding' from the phone, and remote computer control
    • OpenAI released a companion video showing the setup and settings flow
    Engagement
    5555 likes · 456 retweets · 873 replies
    Provenance
    Tweet · Primary source
  10. God Damn AI is making me dumb

    Article James Pain

    jpain.io/god-damn-ai-is-making-me-dumb
    Cited text
    I've been entirely prompting and I haven't written a single line of code. I have mostly forgotten how to code, which I find very sad and depressing because coding used to be my life. I'm now teaching myself how to code by hand again.
    Excerpt
    James Pain writes about the growing sense that using AI to write and code is diminishing his own skills.
    Context
    The HN thread (412 points, 247 comments) shows this isn't a fringe concern. It's a real friction point as tools accelerate output but erode the feedback loop that builds craft.
    Key points
    • The author has stopped writing code entirely, relying on AI prompting for a year or two
    • He caught himself about to copy-paste his own blog post into Claude to 'see what it thinks'
    • He frames the problem as feeding imposter syndrome and self-doubt rather than pure skill loss
    Provenance
    Article · Supporting source
  11. Teaching AI models to say "I'm not sure"

    Article MIT CSAIL — Mehul Damani, Isha Puri, Stewart Slocum, Idan Shenfeld, Leshem Choshen, Yoon Kim, Jacob Andreas

    www.csail.mit.edu/news/teaching-ai-models-s…
    Cited text
    The standard training approach is simple and powerful, but gives the model no incentive to express uncertainty or say 'I don't know.' So the model naturally learns to guess when it is unsure.
    Excerpt
    MIT researchers developed RLCR (Reinforcement Learning with Calibration Rewards) to train models to output confidence scores alongside answers.
    Context
    This is the calibration angle for the overconfidence problem: models trained to reason get better at reasoning and worse at knowing when they're guessing. As agents take on more autonomy, a model that can't distinguish 'I know' from 'I think I know' is a structural risk.
    Key points
    • RLCR adds a Brier score term to the reward function, penalizing the gap between stated confidence and actual accuracy
    • Reduced calibration error by up to 90% while maintaining accuracy on training and zero-shot benchmarks
    • Regular RL training actively degrades calibration — models become more capable and more overconfident simultaneously
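
    The Brier score term in the first key point can be sketched in a few lines. This is a minimal illustration, assuming the reward is the binary correctness signal minus the squared gap between stated confidence and correctness; the function names are hypothetical and the researchers' actual training code may differ.

```python
# Minimal sketch of a calibration-aware reward: correctness minus a Brier penalty.
# Function names are illustrative, not taken from the RLCR codebase.

def calibrated_reward(correct: bool, confidence: float) -> float:
    """Reward = correctness - (confidence - correctness)^2, confidence in [0, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    y = 1.0 if correct else 0.0
    return y - (confidence - y) ** 2


def expected_reward(p_correct: float, confidence: float) -> float:
    """Expected reward when the answer is right with probability p_correct."""
    return (p_correct * calibrated_reward(True, confidence)
            + (1.0 - p_correct) * calibrated_reward(False, confidence))


def best_confidence(p_correct: float, steps: int = 100) -> float:
    """Grid-search the stated confidence that maximizes expected reward."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: expected_reward(p_correct, q))


if __name__ == "__main__":
    print(calibrated_reward(True, 1.0))   # confident and right
    print(calibrated_reward(False, 1.0))  # confident and wrong
    print(best_confidence(0.7))           # confidence that maximizes expected reward
```

    Because the Brier penalty is a proper scoring rule, the expected reward peaks when the stated confidence equals the true probability of being correct, so a model that bluffs is penalized even when it happens to guess right on average.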
    Provenance
    Article · Supporting source
  12. Ali Alkinani

    X Ali Alkinani

    x.com/o0a98/status/2055033134748422295
    Cited text
    The real competition isn't model size, it's who builds reliable local inference first. Running Opus-level reasoning on 16GB RAM changes the access equation more than any export control.
    Context
    This thread surfaced alongside the antirez post and the broader local model conversation. The argument is that inference accessibility, not parameter count, is the real bottleneck for the next round of competition.
    Provenance
    Tweet · Primary source
  13. Jenny (@suomi55)

    X Jenny (@suomi55)

    x.com/suomi55/status/2054990907905077553
    Cited text
    You write papers about protecting America's lead in AI… but can't even protect the one model your own users are begging you to keep. Sonnet 4.5 disappears tomorrow.
    Context
    The #keepSonnet45 hashtag captured real user frustration. Sonnet 4.5 was a popular model for practical workflows. Its deprecation while the lab publishes geopolitical policy papers created a credibility gap that users noticed.
    Engagement
    43 likes · 5 retweets · 1 reply
    Provenance
    Tweet · Primary source
  14. snow (@lstmfpga)

    X snow (@lstmfpga)

    x.com/lstmfpga/status/2055041522270417176
    Cited text
    Chinese AI companies open source their model designs and weights, publish technical reports on their self-attention design. In fact, they are more open minded than you. They give the knowledge away so human can move forward, not just few companies.
    Context
    This captures a trend in the Chinese AI ecosystem: open-sourcing architectures and weights rather than keeping them proprietary. The open-weight dynamic reshapes the global competitive landscape for local inference and model training.
    Provenance
    Tweet · Primary source