Introducing AVALON-2B
The first sub-3B language model that knows what it doesn't know.
Today we are open-sourcing AVALON-2B — a 1.88-billion-parameter language model that knows when to look things up.
AVALON is the first language model below 3B parameters to implement Self-Reflective Retrieval-Augmented Generation with learned reflection tokens. Built on Qwen 3.5 2B, AVALON introduces a five-token reflection vocabulary that lets the model decide, at generation time, whether a query needs external retrieval — and whether the response it just generated is good enough.
Why this matters
Personal intelligence requires self-knowledge. A mind that doesn't know when it doesn't know is not yet a mind. Until now, that capability — Self-RAG — required 7B+ parameters. AVALON does it at 1.88B.
What it does
AVALON generates four self-reflective token classes during inference:
- [Retrieval] / [No Retrieval] — does this query need external knowledge?
- [Relevant] — is the retrieved content actually useful?
- [Utility:1–5] — how good is the response I just generated?
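To make the mechanism concrete, here is a minimal sketch of how a caller might read these reflection tokens out of a generated string. The token names come from the list above; the parser itself is an illustration, not AVALON's actual decoding code.

```python
import re

def parse_reflection(output: str) -> dict:
    """Split a generated string into its reflection signals and the answer text.

    Assumes the reflection tokens appear verbatim in the output, as in the
    examples above; this is an illustrative helper, not the model's own API.
    """
    needs_retrieval = "[Retrieval]" in output   # "[No Retrieval]" does not match this
    relevant = "[Relevant]" in output
    m = re.search(r"\[Utility:(\d)\]", output)
    utility = int(m.group(1)) if m else None
    # Strip the reflection tokens to recover the user-facing answer.
    answer = re.sub(r"\[(?:No )?Retrieval\]|\[Relevant\]|\[Utility:\d\]", "", output).strip()
    return {
        "needs_retrieval": needs_retrieval,
        "relevant": relevant,
        "utility": utility,
        "answer": answer,
    }
```

A downstream agent can branch on `needs_retrieval` before committing to an answer, or resample when `utility` falls below a threshold.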
A 22M-parameter MiniLM router predicts retrieval necessity at the query level with 90.5% accuracy in 5 ms — letting the retrieval call happen in parallel with prompt encoding.
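The overlap described above can be sketched with a thread pool: the router's verdict and the prompt encoding are launched together, so a retrieval call triggered by the router hides behind encoding latency. Both `route` and `encode_prompt` below are hypothetical stubs standing in for the MiniLM classifier and the tokenizer; only the parallel structure reflects the post.

```python
from concurrent.futures import ThreadPoolExecutor

def route(query: str) -> bool:
    # Stub for the 22M-parameter MiniLM router: True means "retrieve".
    # A real router runs a trained classifier; this heuristic is a placeholder.
    q = query.lower()
    return "who" in q or "when" in q

def encode_prompt(query: str) -> list[str]:
    # Stub for prompt tokenization/encoding.
    return query.split()

def prepare(query: str) -> tuple[list[str], list[str]]:
    # Launch routing and encoding concurrently so a retrieval call
    # (if the router fires) overlaps with prompt encoding.
    with ThreadPoolExecutor(max_workers=2) as pool:
        route_future = pool.submit(route, query)
        encode_future = pool.submit(encode_prompt, query)
        needs_retrieval = route_future.result()
        tokens = encode_future.result()
    docs = ["<retrieved passage>"] if needs_retrieval else []
    return tokens, docs
```

Because the router answers in ~5 ms, its verdict is almost always ready before encoding finishes, so the retrieval path adds little wall-clock latency for queries that skip it.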
Small enough for a phone. Big enough to know what it doesn't know.
- Beats Qwen base (61.63), Gemma 4 E2B (58.0), and SmolLM3 3B (55.0)
- HellaSwag: 64.14 · ARC-Challenge: 42.75
- Reflection vocabulary, learned end-to-end
- Q4_K_M quant: 1.5 GB on disk
- Native context window: 262K tokens
- Apache 2.0: weights, GGUF quants, and the paper
Open
AVALON-2B is released under Apache 2.0. Weights, GGUF quants and the paper are public from today.
What's next
AVALON-2B is the first model in our research roadmap. PLMR — pre-tokenizer latent memory routing for byte-level LMs — is in preprint. Hydra is in active development. And on the infrastructure side, our cognitive memory layer for agents, Hypersave, is live.
We're a research lab on the long arc from personal intelligence to general intelligence. AVALON is the first proof that we mean it.
— Nuro AI Labs
Run it locally. Read the paper. Build with it.