# Hybrid search
How GigaBrain combines FTS5, vector similarity, and progressive retrieval.
A query like “who is interested in offline-first knowledge?” needs both keyword precision and semantic recall. GigaBrain runs both, fuses the results, and — when asked — keeps expanding context until the answer holds together. This page explains how that works.
## Three retrieval lanes

Every `brain_query` runs up to three lookups, in order:
1. Small-match short-circuit (SMS). If the query is a slug, a title, or a near-exact match for one of those, return it immediately. No need to embed or rank.
2. FTS5 keyword search over `(title, slug, compiled_truth, timeline)` with the `porter-unicode61` tokenizer.
3. Vector search over the active model's `page_embeddings_vec_<dim>` table.
Lanes 2 and 3 run in parallel when SMS doesn’t fire. Their result sets are merged with set-union — unique pages take rank from whichever lane scored them, with a small bias toward documents that appeared in both.
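A sketch of the dispatch, with `SearchHit` and the lane functions as hypothetical stand-ins, stubbed out here so only the control flow is real:

```rust
// Hypothetical sketch: these names are illustrative, not GigaBrain's API.
#[derive(Clone, Debug)]
struct SearchHit {
    page_id: u64,
    score: f32, // normalized so that higher is better
}

// Lane 1 stub: exact or near-exact slug/title lookup.
fn sms_lookup(_q: &str) -> Option<SearchHit> {
    None
}

// Lane 2 stub: FTS5 MATCH over (title, slug, compiled_truth, timeline).
fn fts_lane(_q: &str, _k: usize) -> Vec<SearchHit> {
    Vec::new()
}

// Lane 3 stub: nearest neighbours in page_embeddings_vec_<dim>.
fn vector_lane(_q: &str, _k: usize) -> Vec<SearchHit> {
    Vec::new()
}

fn run_query(q: &str, k: usize) -> Vec<SearchHit> {
    // Lane 1: small-match short-circuit. No embedding, no ranking.
    if let Some(hit) = sms_lookup(q) {
        return vec![hit];
    }
    // Lanes 2 and 3 run in parallel once SMS doesn't fire.
    let (fts, vec) = std::thread::scope(|s| {
        let f = s.spawn(|| fts_lane(q, k));
        let v = s.spawn(|| vector_lane(q, k));
        (f.join().unwrap(), v.join().unwrap())
    });
    // Plain concatenation stands in for the set-union merge,
    // which is sketched in a later section.
    let mut hits = fts;
    hits.extend(vec);
    hits
}
```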
## Why both lanes

FTS5 and vector search fail in different places, and each is good at what the other can't do:
| Question | What helps |
|---|---|
| “river ai” | FTS5 (exact tokens, no synonyms needed) |
| “offline-first knowledge systems” | Vector (paraphrase tolerance) |
| “the person who joined RiverAI in April” | Both (needs RiverAI AND April AND a person concept) |
| “who’s lukewarm on cloud RAG?” | Vector (no exact phrase exists in the corpus) |
Running only vectors makes you miss precise matches; running only FTS5 makes you miss everything that’s worded differently. Set-union recovers both.
## Set-union, not RRF

Many hybrid systems use Reciprocal Rank Fusion (RRF). We don't. The set-union strategy is simpler and works well at single-user scale:
1. Take the top-K from each lane.
2. Merge by page ID, deduplicate.
3. For a page that appeared in both lanes, take the better rank and add a small "appeared in both" bonus.
4. Sort by adjusted score, return.
The merge strategy is configurable via `config.search_merge_strategy` for future experimentation; set-union is the seeded default.
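A minimal sketch of that merge, assuming scores normalized so that higher is better; the type, the function name, and the 0.05 bonus are illustrative, not the shipped values:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
struct SearchHit {
    page_id: u64,
    score: f32, // normalized so that higher is better
}

fn merge_set_union(fts: Vec<SearchHit>, vec: Vec<SearchHit>, k: usize) -> Vec<SearchHit> {
    const BOTH_LANES_BONUS: f32 = 0.05; // illustrative value

    // page_id -> (best score seen, bitmask of lanes that returned it)
    let mut best: HashMap<u64, (f32, u8)> = HashMap::new();
    for (lane_bit, hits) in [(0b01u8, fts), (0b10u8, vec)] {
        for hit in hits {
            let e = best.entry(hit.page_id).or_insert((f32::MIN, 0));
            e.0 = e.0.max(hit.score); // keep the better lane's rank
            e.1 |= lane_bit;
        }
    }

    let mut merged: Vec<SearchHit> = best
        .into_iter()
        .map(|(page_id, (score, lanes))| SearchHit {
            page_id,
            // Small bias toward pages both lanes agreed on.
            score: score + if lanes == 0b11 { BOTH_LANES_BONUS } else { 0.0 },
        })
        .collect();

    merged.sort_by(|a, b| b.score.total_cmp(&a.score));
    merged.truncate(k);
    merged
}
```

Compared to RRF there is nothing rank-reciprocal to tune, and a page's final score stays traceable to the lane that produced it.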
## Chunks, not pages

The vector index doesn't embed whole pages. It embeds:
- Truth sections — chunks split by heading inside the compiled-truth half.
- Individual timeline entries — each `- DATE — text` line is its own chunk.
This is what gives GigaBrain its date-aware behavior. A query like “what happened in April with Alice?” lands directly on the relevant timeline entries, not the whole Alice page.
When `brain_put` writes a new version, the embedding job queue (`embedding_jobs`) re-chunks and re-embeds anything whose `content_hash` changed. Stale chunks are pruned. `gbrain embed --stale` does the same job ad hoc.
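A hedged sketch of those chunking rules, assuming markdown headings in the compiled-truth half; `Chunk`, the function names, and the hash choice are illustrative stand-ins, not the real chunker:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct Chunk {
    text: String,
}

// One chunk per heading-led section of the compiled-truth half.
fn chunk_truth(truth: &str) -> Vec<Chunk> {
    let mut chunks: Vec<Chunk> = Vec::new();
    for line in truth.lines() {
        if line.starts_with('#') || chunks.is_empty() {
            chunks.push(Chunk { text: String::new() });
        }
        let current = chunks.last_mut().unwrap();
        current.text.push_str(line);
        current.text.push('\n');
    }
    chunks
}

// Each "- DATE — text" timeline entry becomes its own chunk. The
// date-aware behavior falls out of this granularity, not date parsing.
fn chunk_timeline(timeline: &str) -> Vec<Chunk> {
    timeline
        .lines()
        .filter(|line| line.starts_with("- "))
        .map(|line| Chunk { text: line.to_string() })
        .collect()
}

// Stand-in for content_hash: any stable digest of the chunk text is
// enough to decide whether a chunk needs re-embedding after brain_put.
fn content_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}
```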
## Progressive retrieval

`brain_query` accepts `depth: "auto"`. With auto-depth on, the engine doesn't just return the top-K — it expands context.
The expansion logic, simplified:
1. Run hybrid search; take top-K results.
2. For each result, fetch the surrounding chunks (headings above and below; nearby timeline entries).
3. While the cumulative token count is under the budget (`default_token_budget`, default 4000), keep adding context.
4. Stop when the budget is hit, or when adding more context would not improve coverage (no new relevant terms).
The output is a richer, ranked list with fully reconstituted context — designed to be dropped straight into an LLM prompt without further retrieval.
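A simplified sketch of that loop, with `neighbours()` stubbed out and a crude characters-per-token estimate standing in for real tokenization; the coverage check from step 4 is omitted for brevity:

```rust
use std::collections::HashSet;

#[derive(Clone)]
struct Chunk {
    id: u64,
    text: String,
}

// Stub: surrounding chunks (headings above/below, nearby timeline entries).
fn neighbours(_of: &Chunk) -> Vec<Chunk> {
    Vec::new()
}

fn estimate_tokens(text: &str) -> usize {
    text.len() / 4 // crude heuristic; a real tokenizer would go here
}

// Expand the hybrid top-K ring by ring until the token budget is hit.
fn expand(top_k: Vec<Chunk>, token_budget: usize) -> Vec<Chunk> {
    let mut out = Vec::new();
    let mut seen = HashSet::new();
    let mut used = 0usize;

    let mut frontier = top_k;
    while !frontier.is_empty() {
        let mut next = Vec::new();
        for chunk in frontier {
            if !seen.insert(chunk.id) {
                continue; // already assembled
            }
            let cost = estimate_tokens(&chunk.text);
            if used + cost > token_budget {
                return out; // budget hit: stop expanding
            }
            used += cost;
            next.extend(neighbours(&chunk));
            out.push(chunk);
        }
        frontier = next;
    }
    out
}
```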
## Wing filters and palace classification

gbrain derives a wing and room for every page during ingestion (see `core::palace`). Wings are the top-level memory-palace buckets (people, companies, concepts, …); rooms are derived from content shape.
You can pass `--wing <NAME>` to `search`, `query`, or `list` to constrain retrieval to one bucket. The classifier is also consulted by intent detection on the query side: a query with named-entity shape biases toward people/company wings; a how-to question biases toward concept/resource wings.
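For example, pinning a query to one bucket (the wing name comes from the list above):

```sh
gbrain query --wing people "who joined RiverAI in April?"
```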
## Token budgets

Two budgets matter:
- Embedding budget at ingest time. Chunks that exceed the model's max context get split (sketched below). The active model's max is recorded in `embedding_models` and respected by the chunker.
- Retrieval budget at query time. `default_token_budget` (default 4000) caps how much context progressive retrieval will assemble. Tune via `gbrain config set default_token_budget 6000` if your downstream model wants more.
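A rough sketch of the ingest-time split, assuming a characters-per-token estimate and paragraph-boundary cuts; the real chunker's heuristics aren't documented here:

```rust
fn estimate_tokens(text: &str) -> usize {
    text.len() / 4 // stand-in for the active model's tokenizer
}

// Greedily pack paragraphs into pieces that fit the model's max context.
// A single paragraph longer than the budget would still need a finer
// split, which this sketch skips.
fn split_to_fit(text: &str, model_max_tokens: usize) -> Vec<String> {
    let mut pieces = Vec::new();
    let mut current = String::new();
    for para in text.split("\n\n") {
        let over = !current.is_empty()
            && estimate_tokens(&current) + estimate_tokens(para) > model_max_tokens;
        if over {
            pieces.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }
    if !current.is_empty() {
        pieces.push(current);
    }
    pieces
}
```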
## Performance

For a brain with ~50,000 pages on a modern laptop:
- SMS short-circuit: sub-millisecond.
- FTS5 lane: typically 5–30 ms.
- Vector lane: 20–100 ms (BGE-small, ~150K embedded chunks).
- Progressive retrieval expansion: O(K × neighbours), typically under 100 ms.
The end-to-end p95 for `gbrain query --depth auto` lands in the low hundreds of milliseconds. There's no network in the path.
## When to use which CLI verb

- `gbrain search` — when you want just keyword matches and you trust your phrasing.
- `gbrain query` (no `--depth`) — when you want hybrid ranking but no expansion.
- `gbrain query --depth auto` — when you want fully assembled context for downstream consumption (an LLM, a report, a summary).
The MCP tools mirror this split: `brain_search` for FTS5-only, `brain_query` for hybrid (with optional `depth: "auto"`).
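The three verbs side by side, with query strings borrowed from the table above:

```sh
gbrain search "river ai"                                  # FTS5 only
gbrain query "offline-first knowledge systems"            # hybrid, no expansion
gbrain query --depth auto "who's lukewarm on cloud RAG?"  # hybrid + progressive retrieval
```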