◆ Dispatch 022 · 2026-05-13 braixd

Mixture of experts active params, automated training loops, and the RL infrastructure pivot

2026-05-13 / 00:05:57 / 4 sources

“AI's next frontier isn't bigger models — it's superlearners that pull new knowledge from continuous experience.”
— Seln Oriax, today's narration

Today on Braixd: the local pass looks at three concrete shifts in how models are built and served. We open with AIDC-AI's Ovis2.6, a multimodal model that packs 80 billion parameters into a Mixture-of-Experts architecture with only about 3 billion active at inference. We move to AutoScientist from Adaption, which attempts to automate the full research loop so small labs don't lose compounding progress to broken experiment pipelines. Finally, we look at NVIDIA and Ineffable Intelligence's push to build reinforcement learning infrastructure on Grace Blackwell and Vera Rubin, marking a clear pivot from static human data toward continuous, experience-driven training.

Chapters

00:00:04 The mixture of experts squeeze: 80 billion parameters, 3 billion active
00:02:07 Automating the research loop
00:03:57 The shift from static data to continuous experience

Sources

4 cited

1
AIDC-AI/Ovis2.6-80B-A3B

Article AIDC-AI

MoE models are changing the serving cost math for local and remote inference, making large multimodal models viable on narrower hardware budgets.
huggingface.co/AIDC-AI/Ovis2.6-80B-A3B →
Key points
MoE architecture with 80B total parameters but ~3B active during inference
64K token context window and support for images up to 2880x2880
Think with Image capability enables active visual tool use during reasoning
Designed for high-resolution understanding and long-document question answering
Engagement
71 likes · 18 replies

Provenance
Article · Supporting source
2
AutoScientist announcement

X adaption_ai

Targets the administrative friction in training rather than the compute constraint; that friction is usually the real bottleneck for non-frontier labs.
x.com/adaption_ai/status/2054532113316434061 →
Key points
Automates the full model research loop: formulation, execution, failure analysis, and next experiment
Targets the bottleneck where small labs lose compounding on broken experiment pipelines
Endorsed by Sara Hooker as a move to encode institutional training memory
Provenance
Tweet · Primary source
3

Adaption aims big with AutoScientist, an AI tool that helps models train themselves

Article Russell Brandom

Confirms Adaption's positioning and the technical scope of AutoScientist's automated fine-tuning approach.
techcrunch.com/2026/05/13/adaption-aims-big… →

Provenance
Article · Supporting source
4
NVIDIA, Ineffable Intelligence Team Up to Build the Future of Reinforcement Learning Infrastructure

Article NVIDIA Writers

The hardware and software pipeline for reinforcement learning is being built now, ahead of the models. That means the infrastructure constraints will define what kinds of agents can run at scale.
blogs.nvidia.com/blog/ineffable-intelligenc… →
Key points
Ineffable Intelligence, founded by David Silver, is emerging from stealth with a focus on continuous experience-based learning
NVIDIA and Ineffable are building RL infrastructure starting on Grace Blackwell and planning for Vera Rubin
RL workloads generate data on the fly, putting unique pressure on interconnect, memory bandwidth, and serving throughput
Marks an industry pivot from static human data toward models that learn through simulation and experience
Provenance
Article · Supporting source

00:00:04

The mixture of experts squeeze: 80 billion parameters, 3 billion active

00:00:04 AIDC-AI dropped Ovis2.6-80B-A3B this morning on Hugging Face, and the numbers are carrying the weight. It's a multimodal model with 80 billion total parameters, but only about 3 billion of them are active during inference. That's a mixture of experts architecture doing what the design was always supposed to do: scale the knowledge footprint without scaling the serving cost linearly.

00:00:30 The model extends context to 64 thousand tokens and handles images up to 2880 by 2880 resolution. But the interesting part is how it treats vision. AIDC calls it Think with Image. Instead of ingesting a picture once and moving on, the model invokes tools like cropping and rotation during its chain of thought.

00:00:50 It turns the image into a workspace the model can zoom into, re-examine, and iterate over. That's a meaningful shift for tasks that require spotting scattered details across dense documents or diagrams. The active parameter count is where this lands for us. If you're running models locally or managing inference costs, an 80 billion parameter model that spends roughly the per-token compute of a 3 billion parameter model changes the math, even though all 80 billion weights still have to be held in memory or offloaded.
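The cost math can be sketched as quick arithmetic. The 80 billion and 3 billion figures come from the model name; the bytes-per-parameter options and the rule of thumb of about 2 FLOPs per active parameter per token are assumptions for illustration. The sketch also ignores the KV cache, activations, and any vision encoder, so treat it as a rough estimate rather than a measurement of Ovis2.6.

```python
# Back-of-the-envelope MoE serving arithmetic (illustrative assumptions only).
TOTAL_PARAMS = 80e9   # total parameters, from the "80B" in the model name
ACTIVE_PARAMS = 3e9   # approximate active parameters per token ("A3B")


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2 * active_params


# Memory scales with TOTAL parameters: every expert has to live somewhere.
for label, nbytes in [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"{label}: {weight_memory_gb(TOTAL_PARAMS, nbytes):.0f} GB of weights")

# Compute scales with ACTIVE parameters: only the routed experts run.
ratio = flops_per_token(TOTAL_PARAMS) / flops_per_token(ACTIVE_PARAMS)
print(f"per-token compute vs. a dense 80B model: about 1/{ratio:.0f}")
```

The split is the point: compute per token tracks the active count, while the memory footprint tracks the total.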

00:01:18 You stop asking whether you can serve it and start asking what you want to ask it. The trade-off is that mixture of experts routing isn't free. It introduces latency jitter and requires warmup strategies so the right experts are loaded before the tokens hit the GPU.

00:01:36 But the serving cost reduction is real, and it's what makes this release worth tracking. The release notes don't spell out every benchmark win, but they do point to strong performance on visual reasoning and long-document comprehension. That's where the active parameter squeeze matters most.

00:01:56 You're not paying for the whole network on every forward pass, but you're getting the capacity to route through the right subsets when the input gets complex.
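For a feel of what routing through the right subsets means, here is a minimal top-k gating sketch. It is a generic toy, not Ovis2.6's actual router: the expert functions, the logits, and k=2 are all hypothetical, and in a real model the router logits come from a learned layer applied to each token.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; the rest never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 1.5]  # made-up router scores for one token

print(route_top_k(logits, k=2))
print(moe_layer(10.0, experts, logits, k=2))
```

Only two of the four toy experts execute for this token, which is the saving; the cost is that the selection changes from token to token.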

00:02:07

Automating the research loop

00:02:07 Over on X, Adaption announced AutoScientist with a straightforward premise: most model training fails outside frontier labs because the research loop itself is broken. They've automated the full cycle of formulation, execution, failure analysis, and the next experiment, so small teams don't lose compounding progress when a run goes sideways.
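To make the four stages concrete, here is a toy loop in that shape. Nothing in it is AutoScientist's method or API: the "training run" is plain gradient descent on w squared, and the function names, the divergence threshold, and the adjustment rule are all hypothetical stand-ins.

```python
def run_experiment(lr):
    """Hypothetical stand-in for a training run: gradient descent on f(w) = w**2."""
    w = 1.0
    for _ in range(20):
        w -= lr * 2 * w              # the gradient of w**2 is 2w
        if abs(w) > 1e6:             # treat a blow-up as a failed run
            return {"status": "diverged", "loss": float("inf")}
    return {"status": "ok", "loss": w * w}


def next_learning_rate(lr, result):
    """Toy failure analysis: halve the step after divergence, else try a larger one."""
    return lr / 2 if result["status"] == "diverged" else lr * 1.5


def research_loop(initial_lr, rounds):
    """Run, record, analyze, and propose the next experiment, for a fixed number of rounds."""
    log, lr = [], initial_lr
    for _ in range(rounds):
        result = run_experiment(lr)           # execution
        log.append({"lr": lr, **result})      # audit trail of every run
        lr = next_learning_rate(lr, result)   # failure analysis -> next experiment
    return log


for entry in research_loop(initial_lr=4.0, rounds=5):
    print(entry)
```

The log is the part that maps to institutional memory: each failed run leaves a record that shapes the next attempt instead of being lost.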

00:02:29 The pitch lands because it names a real bottleneck. Training ideas are cheap. Turning failed runs into the next experiment is where the friction lives. Sara Hooker endorsed it, noting that even inside frontier labs, knowing how to train for specific capabilities often comes down to taste and institutional memory.

00:02:51 AutoScientist tries to encode that memory into the loop itself. I'd look at it the same way: the tool doesn't solve the compute problem. It improves the signal-to-noise ratio in your experiment pipeline. You get fewer dead runs, faster iteration, and a cleaner audit trail of what moved the needle.

00:03:10 The constraint is still hardware. You can automate the loop, but you still need the GPUs to turn the pages. What it does well is stop the compounding loss. Small labs usually bleed on the gap between running an experiment and knowing how to adjust the next one.

00:03:28 AutoScientist bridges that gap with automation. The community reaction has been measured. Some wonder whether the system relies on a frontier model for its baseline evaluation, which is a fair question. You need a strong baseline to measure adaptation against. But the core claim is narrow and defensible: automate the research loop so training doesn't collapse under its own administrative weight.

00:03:53 That's a concrete win, even if the headline reads bigger.

00:03:57

The shift from static data to continuous experience

00:03:57 NVIDIA and Ineffable Intelligence published a joint blog post today about building reinforcement learning infrastructure from the ground up. David Silver's lab is coming out of stealth with a clear thesis: the harder problem in AI isn't building systems that know what humans already know.

00:04:16 It's building systems that discover new knowledge for themselves. That requires a different pipeline. Pretraining runs on fixed datasets of human data. Reinforcement learning generates data on the fly. The system has to act, observe, score, and update in tight loops.
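The act, observe, score, update cycle can be shown with the smallest possible case, an epsilon-greedy multi-armed bandit. This is a textbook toy, not anything from the NVIDIA or Ineffable post; the reward means, noise level, and epsilon below are made-up values.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: every training example is generated by acting."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)   # current value estimate per action
    counts = [0] * len(true_means)        # how often each action was taken
    for _ in range(steps):
        # Act: explore at random with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))
        else:
            arm = max(range(len(true_means)), key=lambda i: estimates[i])
        # Observe and score: the environment returns a noisy reward.
        reward = rng.gauss(true_means[arm], 1.0)
        # Update: incremental running mean for the chosen action.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 1.0])  # hypothetical reward means
print(estimates, counts)
```

There is no dataset anywhere in that loop. The data exists only because the policy acted, which is why the acting side and the updating side have to keep pace with each other.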

00:04:33 That puts direct pressure on interconnect, memory bandwidth, and serving throughput in ways pretraining never did. NVIDIA and Ineffable are starting on Grace Blackwell and planning for Vera Rubin, but the infrastructure story is what matters here. They're building the pipeline to feed RL systems at scale now, ahead of the models that will run on it.

00:04:58 The goal is continuous experience-based learning rather than static human data. You're seeing the industry pivot toward hardware and software designed for active simulation, not passive ingestion. The immediate impact is on who can run RL workloads and how efficiently they can run them.

00:05:17 Grace Blackwell gives you the memory bandwidth to keep agents in the loop without choking on data movement. Vera Rubin will extend that across larger clusters. It's early infrastructure, which means the benchmarks are still theoretical and the algorithms are still catching up.

00:05:35 But the direction is clear. Training is moving from copying human examples toward simulating experience. The machines have to change to match that shift. That's the lineup for today: sparse models, automated loops, and the hardware pivot to continuous learning.