Nuro AI Labs
Solutions · For consumer & prosumer

One mind, one user, one device.

Personal intelligence isn't luxury infrastructure. It should run on the phone in your pocket, on the laptop in front of you, in the apps you actually use every day. AVALON-2B is the on-device Self-RAG runtime; Hypersave is the cognitive memory layer that makes the app remember you. Together they're the substrate for the next generation of consumer AI.

AVALON-2B · 40 tok/s on Apple M3
Parameters · 1.88B (sub-3B, Self-RAG)
Quants · GGUF Q4 / Q5 / Q8
Memory recall · <200ms p95 (Hypersave)
What you can build

Apps that know who they're talking to.

01

Runs on the device

AVALON-2B comes in under 3B parameters and ships as GGUF quants: 40 tok/s on Apple M3, comfortable on flagship phones, effortless on any desktop. Responsiveness you can feel.

02

Memory that follows the user

Hypersave gives every consumer app a persistent cognitive layer. The user’s preferences, history, voice — captured once, recalled everywhere. They stop introducing themselves.

03

Self-RAG keeps it honest

AVALON’s reflection vocabulary lets a small on-device model decide when it needs to consult memory or external sources, and admit when it doesn’t know. No silent confabulation. A sketch of how an app can act on those tokens follows this list.

04

Privacy as the default

Inference local. Memory keys per-device. No mandatory call-home. Build consumer apps where the privacy story is the product story.
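
To make 03 concrete, here is a minimal two-pass loop keyed off a Self-RAG reflection token. The exact token string ("[Retrieval]") and the fetchContext helper are assumptions for illustration, not AVALON's documented API; check the model card for the real vocabulary.

self-rag.ts
import ollama from "ollama";

// Hypothetical two-pass loop: pass 1 lets the model signal whether it
// needs outside knowledge; pass 2 re-asks with retrieved context.
async function selfRagTurn(
  prompt: string,
  fetchContext: (query: string) => Promise<string>,
): Promise<string> {
  // Pass 1: the model answers, or emits a retrieval token.
  const draft = await ollama.chat({
    model: "nuroai/avalon-2b",
    messages: [{ role: "user", content: prompt }],
  });
  const text = draft.message.content;

  // No retrieval token: the model answered from its own weights.
  if (!text.includes("[Retrieval]")) return text;

  // Retrieval token: pull context (Hypersave, search, files) and re-ask.
  const context = await fetchContext(prompt);
  const grounded = await ollama.chat({
    model: "nuroai/avalon-2b",
    messages: [
      { role: "system", content: `Context: ${context}` },
      { role: "user", content: prompt },
    ],
  });
  return grounded.message.content;
}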

In the wild

We dogfood the stack. Nuro Chat is the public proof.

Nuro Chat is multi-model chat with persistent memory built on Hypersave — switch models without losing context, recall conversations from last week, run AVALON-2B locally when you don't want to send a turn to the cloud. The Nuro stack (Chat, Studio, One) is the consumer surface and the reference implementation.

Sample exchange · with persistent memory
you · last week
I'm working on a long-form essay about bottom-up routes to AGI.
you · today
Where did we leave the essay?
avalon · [Retrieval] [Relevant]
The essay on bottom-up AGI. You finished the section on persistent memory and were about to argue that reflection — not scale — is the rate-limiting step. Want to pick up from there?
Local-first integration

Pull AVALON. Add memory. Ship.

The fastest way from open-weights model to consumer app with a memory. Run AVALON through Ollama on the user's machine, route memory through Hypersave (managed or self-hosted). Nothing else to glue.

install.sh
# On the user's device
ollama pull nuroai/avalon-2b

# In your app
npm install @hypersave/sdk ollama
app.ts
import { Hypersave } from "@hypersave/sdk";
import ollama from "ollama";

const memory = new Hypersave({ apiKey: process.env.HYPERSAVE_KEY });

async function reply(userId: string, prompt: string) {
  // Recall what Hypersave already knows about this user for this query.
  const { answer: context } = await memory.recall({ userId, query: prompt });

  const response = await ollama.chat({
    model: "nuroai/avalon-2b",     // local: never hits the cloud
    messages: [
      { role: "system", content: `What you remember about the user: ${context}` },
      { role: "user", content: prompt },
    ],
  });

  // Store the turn so future sessions can recall it.
  await memory.remember({ userId, text: prompt, sector: "episodic" });
  return response.message.content;
}
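
Calling it from a handler is one line; "user-123" below is a placeholder for whatever stable per-user ID your app already has.

const text = await reply("user-123", "Where did we leave the essay?");
console.log(text); // answer grounded in this user's remembered context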
Built on

One open-weights runtime, one cognitive memory layer.

AVALON-2B

On-device Self-RAG runtime

Sub-3B Self-Reflective RAG model. Apache 2.0. GGUF quants on Hugging Face. Ollama-ready. 40 tok/s on Apple M3. Beats Qwen 3.5 2B, Gemma 4 E2B, SmolLM3 3B on its target benchmarks.

Read the paper →

Hypersave

Persistent cognitive memory

Five sectors, Ebbinghaus decay, RRF hybrid retrieval. Integrate via the TS or Python SDK. Use the managed cloud or self-host on the user’s own infrastructure.

Read the docs →
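
For the curious: "RRF hybrid retrieval" means the vector and keyword rankings are merged with reciprocal rank fusion, where each document scores the sum of 1/(k + rank) across rankings. A minimal sketch, with an Ebbinghaus-style decay weight added to show how recency can factor in; the constants (k = 60, 30-day half-life) are illustrative defaults, not Hypersave's actual internals.

rrf.ts
// Minimal reciprocal rank fusion sketch. k = 60 is the common default;
// the exponential forgetting curve is illustrative, not Hypersave's internals.
type Hit = { id: string; ageDays: number };

function rrfFuse(rankings: Hit[][], k = 60, halfLifeDays = 30): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((hit, rank) => {
      // Reciprocal-rank contribution (rank is 0-based), damped by decay.
      const decay = Math.pow(0.5, hit.ageDays / halfLifeDays);
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + decay / (k + rank + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Fuse a vector ranking and a keyword (BM25-style) ranking into one list.
const vectorHits: Hit[] = [{ id: "m1", ageDays: 2 }, { id: "m2", ageDays: 40 }];
const keywordHits: Hit[] = [{ id: "m2", ageDays: 40 }, { id: "m3", ageDays: 7 }];
console.log(rrfFuse([vectorHits, keywordHits])); // fused memory IDs, best first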
A 2B model that knows when it doesn't know, paired with a memory layer that actually fuses retrieval — that is the substrate consumer AI has been waiting for.
Composite developer feedback · AVALON-2B early adopters · 2026
Get started

The model is open. The SDK is one command. Build something.

Free tier on Hypersave. AVALON-2B free forever (Apache 2.0). If you're shipping consumer AI and want to compare notes, reach us at press@nuroailabs.com.