

Embedding models and build channels

Which BGE model the brain uses, why it can't change after init, and how the airgapped and online channels differ.

GigaBrain runs the embedding model itself, on the CPU you already own. The defaults are tuned for a useful tradeoff between quality and binary size; the online build lets you trade those off at install time.

| Alias | Model ID | Dimensions | Notes |
| --- | --- | --- | --- |
| small (default) | BAAI/bge-small-en-v1.5 | 384 | Fastest. Embedded in the airgapped binary. |
| base | BAAI/bge-base-en-v1.5 | 768 | Stronger English recall. Online build only. |
| large | BAAI/bge-large-en-v1.5 | 1024 | Highest English recall. Online build only. |
| m3 | BAAI/bge-m3 | 1024 | Multilingual. Online build only. |

You can also pass a full Hugging Face model ID as --model <ORG>/<NAME> on the online channel; it’s resolved at runtime, downloaded, and cached.
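For illustration, here is how the alias table above could map a flag to a full model ID; the function name and structure are hypothetical, not GigaBrain's internals:

```rust
/// Resolve a user-supplied --model flag to a full Hugging Face model ID.
/// Aliases mirror the table above; anything else is treated as a raw
/// <ORG>/<NAME> model ID (online channel only).
fn resolve_model_id(flag: &str) -> &str {
    match flag {
        "small" => "BAAI/bge-small-en-v1.5", // 384-dim, airgapped default
        "base" => "BAAI/bge-base-en-v1.5",   // 768-dim
        "large" => "BAAI/bge-large-en-v1.5", // 1024-dim
        "m3" => "BAAI/bge-m3",               // 1024-dim, multilingual
        other => other, // assume a full HF model ID like ORG/NAME
    }
}
```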

We picked the BGE family because:

  • Quality. Strong on retrieval benchmarks (BEIR) for the size class.
  • Open weights. No license entanglement.
  • Sane defaults. bge-small-en-v1.5 punches above its weight on personal-knowledge corpora.

We use candle for inference — pure Rust, no Python or ONNX runtime. The whole pipeline (tokenizer → model forward pass → pooled embedding) runs inside the binary.
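As a generic illustration of the final stage (not GigaBrain's actual code; BGE v1.5 models conventionally pool the CLS token, and mean pooling is shown here only as the simplest variant), pooling plus L2 normalization looks like this in plain Rust:

```rust
/// Pool per-token hidden states into one fixed-width vector, then
/// L2-normalize it. The pooling strategy actually used depends on the
/// model; BGE v1.5 conventionally takes the CLS token's hidden state.
fn pool_and_normalize(token_vecs: &[Vec<f32>], dim: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; dim];
    for v in token_vecs {
        for (o, x) in out.iter_mut().zip(v) {
            *o += x;
        }
    }
    let n = token_vecs.len().max(1) as f32;
    for o in out.iter_mut() {
        *o /= n;
    }
    // Normalize so cosine similarity reduces to a dot product.
    let norm = out.iter().map(|x| x * x).sum::<f32>().sqrt().max(f32::EPSILON);
    for o in out.iter_mut() {
        *o /= norm;
    }
    out
}
```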

The same source compiles two binaries, selected by Cargo features:

| Channel | Features | Binary size | Model assets |
| --- | --- | --- | --- |
| Airgapped (default) | embedded-model | ~180 MB | BGE-small bytes embedded via include_bytes! |
| Online | bundled,online-model | ~90 MB | Cached on first semantic use under ~/.gbrain/models/ |
```sh
# Default — airgapped
cargo build --release

# Online build
cargo build --release --no-default-features --features bundled,online-model
```
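A rough sketch of how the two feature sets could select the model source; asset paths, file names, and function names below are assumptions, not GigaBrain's real layout:

```rust
/// Airgapped channel: the BGE-small weights are compiled into the binary.
/// (The asset path is a placeholder.) The two features are mutually
/// exclusive, so only one model_weights is ever compiled.
#[cfg(feature = "embedded-model")]
fn model_weights() -> &'static [u8] {
    include_bytes!("../assets/bge-small-en-v1.5.safetensors")
}

/// Online channel: weights are fetched on first semantic use and cached.
/// On a cold cache, the online build would download from Hugging Face
/// here before reading (download elided in this sketch).
#[cfg(feature = "online-model")]
fn model_weights(model_id: &str) -> std::io::Result<Vec<u8>> {
    let path = models_dir().join(model_id).join("model.safetensors");
    std::fs::read(path)
}

fn models_dir() -> std::path::PathBuf {
    // Matches the ~/.gbrain/models/ cache location from the table above.
    std::env::var_os("HOME")
        .map(std::path::PathBuf::from)
        .unwrap_or_default()
        .join(".gbrain")
        .join("models")
}
```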

On the airgapped binary, --model and GBRAIN_MODEL are warning-only no-ops: the embedded model is the only one that will ever load. The warning fires loudly enough that it’s hard to miss in CI.
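Sketched out (the warning text and function name are illustrative, not the binary's actual output):

```rust
/// Airgapped build: honor the embedded model and loudly ignore overrides.
#[cfg(feature = "embedded-model")]
fn effective_model_id(cli_flag: Option<&str>) -> &'static str {
    let env_override = std::env::var("GBRAIN_MODEL").ok();
    if cli_flag.is_some() || env_override.is_some() {
        eprintln!(
            "warning: --model / GBRAIN_MODEL are ignored on the airgapped \
             build; using embedded BAAI/bge-small-en-v1.5"
        );
    }
    "BAAI/bge-small-en-v1.5"
}
```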

gbrain init writes the active model into the immutable brain_config table:

| Key | Example | Meaning |
| --- | --- | --- |
| model_id | BAAI/bge-large-en-v1.5 | The full Hugging Face model ID. |
| model_alias | large | The alias the user passed (or default). |
| embedding_dim | 1024 | The vector width; drives which page_embeddings_vec_<dim> table is used. |
| schema_version | 5 | The schema generation. |

On every subsequent open, the binary verifies that the requested model matches the recorded one. A mismatch fails the call before any command runs. This isn’t a UX choice; it’s a correctness one — embeddings produced by different models live in different vector spaces and aren’t comparable.
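A hedged sketch of that open-time check, assuming the brain is a SQLite file with brain_config as a key/value table (the column names are assumptions) accessed via the rusqlite crate:

```rust
use rusqlite::Connection;

/// Fail fast if the requested model doesn't match the one recorded at init.
fn verify_model(conn: &Connection, requested: &str) -> Result<(), String> {
    let recorded: String = conn
        .query_row(
            "SELECT value FROM brain_config WHERE key = 'model_id'",
            [],
            |row| row.get(0),
        )
        .map_err(|e| format!("brain_config unreadable: {e}"))?;
    if recorded != requested {
        return Err(format!(
            "model mismatch: brain was initialized with {recorded}, \
             requested {requested}; embeddings from different models \
             live in different vector spaces"
        ));
    }
    Ok(())
}
```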

You can’t change a brain’s model in place. The vector space is part of the database identity. To switch:

  1. gbrain init ~/brain-large.db --model large (online build).
  2. gbrain export ~/brain.db ~/brain-export/ (or just import your source markdown directory again).
  3. gbrain import ~/brain-export/ --db ~/brain-large.db.
  4. Drop ~/brain.db when you’re satisfied.

See Switch embedding models for the full recipe.

When to reach for each option:

  • small (default) — Almost everyone, almost always. Fast, accurate enough at personal-knowledge scale, and ships in the airgapped binary.
  • base / large — When you have ≥100K pages and you’re noticing recall degradation on paraphrased queries. Costs you model footprint on disk and per-query latency.
  • m3 — When your corpus is multilingual or you specifically want the m3 retrieval profile.
  • A specific HF model ID — When you’ve benchmarked a domain-specific model and want to use it.

We do not recommend swapping models without a measured reason. The defaults are tuned for the median user.

Hosted embedding APIs are out of scope. The whole point of GigaBrain is that the brain doesn’t talk to the network. If you want OpenAI or Voyage embeddings, you’re in the wrong tool — but those tools work fine alongside GigaBrain for other concerns.

When brain_put updates a page, the embedding job queue (embedding_jobs) re-chunks and re-embeds whatever changed. If you suspect drift — a model upgrade, a chunking-strategy change, an aborted batch — gbrain embed --stale re-embeds only chunks whose content_hash no longer matches; gbrain embed --all rebuilds everything.
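For illustration only, a sketch of the staleness test, assuming content_hash stores a lowercase hex digest of the chunk text; GigaBrain’s actual hash algorithm and schema are not specified here:

```rust
use sha2::{Digest, Sha256};

/// A chunk is stale when its stored content_hash no longer matches the
/// current chunk text. (SHA-256 is an assumption for this sketch.)
fn is_stale(chunk_text: &str, stored_content_hash: &str) -> bool {
    let current: String = Sha256::digest(chunk_text.as_bytes())
        .iter()
        .map(|b| format!("{b:02x}"))
        .collect();
    current != stored_content_hash
}
```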