How Search Works

GNO uses a sophisticated multi-stage search pipeline that combines traditional keyword search with modern neural techniques. This document explains how your queries are processed, expanded, and ranked.

New to the terminology? See the Glossary for definitions of BM25, RRF, HyDE, and other terms.

The Search Pipeline

The diagram below shows how your query flows through GNO’s search system:

Stage 1: Query Expansion → Your query is expanded by an LLM into keyword variants (for BM25), semantic variants (for vectors), and a HyDE passage.

Stage 2: Parallel Search → BM25 and vector searches run simultaneously on original query + all variants.

Stage 3: RRF Fusion → Results are merged using Reciprocal Rank Fusion. Documents appearing in multiple lists get boosted.

Stage 4: Reranking → Top 20 candidates are rescored by a cross-encoder for final ordering.

┌───────────────────────────────────────────────────────────────┐
│                         YOUR QUERY                            │
│                "how do I deploy to production"                │
└───────────────────────────────┬───────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│  STAGE 1: QUERY EXPANSION (LLM)                               │
│                                                               │
│  Lexical variants (for BM25):                                 │
│    • "deployment process", "deploy application"               │
│                                                               │
│  Semantic variants (for vectors):                             │
│    • "steps to release software"                              │
│                                                               │
│  HyDE passage (hypothetical answer):                          │
│    "To deploy, run build, push to staging..."                 │
└───────────────────────────────┬───────────────────────────────┘
                                │
              ┌─────────────────┴─────────────────┐
              ▼                                   ▼
┌─────────────────────────────┐ ┌─────────────────────────────┐
│  STAGE 2A: BM25 SEARCH      │ │  STAGE 2B: VECTOR SEARCH    │
│                             │ │                             │
│  Keyword matching via FTS5  │ │  Semantic similarity via    │
│                             │ │  embedding cosine distance  │
│  Searches in parallel:      │ │                             │
│  • Original query (2x)      │ │  Searches in parallel:      │
│  • Each lexical variant     │ │  • Original query (2x)      │
│                             │ │  • Semantic variants + HyDE │
└──────────────┬──────────────┘ └──────────────┬──────────────┘
               │                               │
               └───────────────┬───────────────┘
                               ▼
┌───────────────────────────────────────────────────────────────┐
│  STAGE 3: RECIPROCAL RANK FUSION (RRF)                        │
│                                                               │
│  score = Σ (weight / (k + rank))    where k=60                │
│                                                               │
│  Documents in top positions across multiple searches get      │
│  boosted. Weights: original=1.0, variants=0.5, HyDE=0.7       │
└───────────────────────────────┬───────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│  STAGE 4: RERANKING (Cross-Encoder)                           │
│                                                               │
│  Top 20 candidates rescored by neural cross-encoder.          │
│                                                               │
│  Position-aware blending:                                     │
│    1-3: 75% fusion / 25% rerank                               │
│    4-10: 60% fusion / 40% rerank                              │
│    11+: 40% fusion / 60% rerank                               │
└───────────────────────────────┬───────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                        FINAL RESULTS                          │
│                 Sorted by blended score [0-1]                 │
└───────────────────────────────────────────────────────────────┘

Query Expansion with HyDE

GNO uses a technique inspired by HyDE (Hypothetical Document Embeddings). Instead of just searching with your query, the LLM generates:

Lexical Queries

Keyword variations optimized for BM25 full-text search. If you ask “how to protect my app”, these might be:

  • “security application”
  • “protect app authentication”
  • “app security measures”

Semantic Queries

Rephrased versions that capture the meaning differently, optimized for vector search:

  • “ways to secure software from attacks”
  • “implementing application security”

HyDE Passage

A short hypothetical document that would answer your question. This is powerful because:

  • Documents are written in “answer style”, not “question style”
  • Searching with an answer-like text finds similar answer-like documents
  • Bridges the vocabulary gap between questions and documentation
Query: "how do I deploy to production"

HyDE: "To deploy the application to production, first ensure all tests pass,
       then run the build command with production flags, push the artifacts
       to your staging environment for validation, and finally promote to
       production using the deployment pipeline..."
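
Put together, the expansion output can be pictured as a small structured object. The field names below are illustrative assumptions, not GNO's actual types:

interface QueryExpansion {
  original: string;           // the user's query, searched at full weight
  lexicalVariants: string[];  // keyword rephrasings fed to BM25
  semanticVariants: string[]; // meaning-preserving rephrasings fed to vector search
  hydePassage: string;        // hypothetical answer text, embedded and searched
}

const expansion: QueryExpansion = {
  original: "how do I deploy to production",
  lexicalVariants: ["deployment process", "deploy application"],
  semanticVariants: ["steps to release software"],
  hydePassage: "To deploy the application to production, first ensure all tests pass...",
};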

Why Expansion Helps

Without expansion, searching “deploy to production” only finds documents with those exact words. With expansion:

Search Type         Finds Documents About
Original            “deploy”, “production”
Lexical variants    “deployment”, “release”, “shipping”
Semantic variants   CI/CD, infrastructure, DevOps
HyDE                Step-by-step guides, tutorials, runbooks

Search Modes

GNO offers different search commands for different needs:

gno search - BM25 Only

Fast keyword search using SQLite FTS5. Best for:

  • Exact term lookups
  • Code identifiers
  • Known phrases
gno search "useEffect cleanup"

gno vsearch - Vector Only

Pure semantic search using embeddings. Best for:

  • Conceptual queries
  • “How do I…” questions
  • Finding related content
gno vsearch "how to prevent memory leaks in React"

gno query - Hybrid Search

Combines BM25 + vector + expansion + reranking. Best for:

  • General purpose search
  • When you’re not sure what terms to use
  • Complex questions

gno query "best practices for error handling"

Score Normalization

All scores are normalized to [0.0 - 1.0] range where 1.0 is the best match. This makes scores comparable within a result set.

Important: Scores are normalized per query and are NOT comparable across different queries. A score of 0.8 on query A doesn’t mean the same relevance as 0.8 on query B.

BM25 Scores

Raw BM25: smaller (more negative) = better
Normalized: (worst - raw) / (worst - best)
Result: 1.0 = best match in results, 0.0 = worst

Vector Scores

Cosine distance: 0 = identical, 2 = opposite
Similarity: 1 - (distance / 2)
Result: 1.0 = identical vectors
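
As a minimal sketch of both normalizations (assuming per-result-set min-max scaling as described above; function names are illustrative):

// Min-max normalization of raw BM25 scores, where smaller (more negative) = better.
// Returns 1.0 for the best match in the set and 0.0 for the worst.
function normalizeBm25(rawScores: number[]): number[] {
  const best = Math.min(...rawScores);
  const worst = Math.max(...rawScores);
  if (worst === best) return rawScores.map(() => 1.0); // all scores tied
  return rawScores.map((raw) => (worst - raw) / (worst - best));
}

// Cosine distance lies in [0, 2]; map it to a similarity in [0, 1].
function cosineSimilarityScore(distance: number): number {
  return 1 - distance / 2;
}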

Fusion Scores

RRF produces position-based scores that are then normalized. Documents appearing highly in multiple lists score best.

Blended Scores

Final score combines fusion + rerank with position-aware weights, then normalized to [0,1].

Reciprocal Rank Fusion (RRF)

RRF is an algorithm that merges multiple ranked lists without needing to calibrate scores across different retrieval methods. The formula:

RRF_score(doc) = Σ weight_i / (k + rank_i)

Where:

  • k = 60 (dampening constant, reduces impact of exact rank)
  • rank_i = position in result list i (1-indexed)
  • weight_i = importance of that result list

Why RRF Works

Consider a document that appears:

  • Rank 1 in BM25: 1.0 / (60 + 1) = 0.0164
  • Rank 3 in Vector: 1.0 / (60 + 3) = 0.0159
  • Total: 0.0323

vs a document that appears:

  • Rank 1 in BM25 only: 1.0 / (60 + 1) = 0.0164
  • Not in Vector results
  • Total: 0.0164

The document appearing in both lists wins over a document that ranks #1 in only one of them.
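
As a minimal sketch of weighted RRF (the list structure and names are illustrative; k = 60 and the per-list weights follow this document):

// One ranked result list, e.g. "BM25 on the original query" or "vector on the HyDE passage".
interface RankedList {
  weight: number;   // 1.0 for the original query, 0.5 for variants, 0.7 for HyDE
  docIds: string[]; // documents in rank order (index 0 = rank 1)
}

// score(doc) = Σ weight_i / (k + rank_i), summed over every list the doc appears in.
function rrfFuse(lists: RankedList[], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const { weight, docIds } of lists) {
    docIds.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-indexed
      scores.set(docId, (scores.get(docId) ?? 0) + weight / (k + rank));
    });
  }
  return scores;
}

// Reproduces the worked example: rank 1 in BM25 plus rank 3 in vector ≈ 0.0323.
const fused = rrfFuse([
  { weight: 1.0, docIds: ["doc-A", "doc-B", "doc-C"] }, // BM25 results
  { weight: 1.0, docIds: ["doc-D", "doc-E", "doc-A"] }, // vector results
]);
console.log(fused.get("doc-A")); // ≈ 0.0323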

Variant Weighting

Not all searches are equal:

Source            Weight   Reasoning
Original BM25     1.0      Direct match to user query
Original Vector   1.0      Direct semantic match
BM25 variants     0.5      LLM-generated, less direct
Vector variants   0.5      LLM-generated, less direct
HyDE passage      0.7      Powerful but indirect

Position-Aware Blending

After RRF fusion, the top candidates are reranked using a cross-encoder model. But we don’t just replace fusion scores with rerank scores - we blend them based on position:

Position   Fusion Weight   Rerank Weight   Why
1-3        75%             25%             Top results from multi-signal fusion are reliable
4-10       60%             40%             Balanced - both signals useful
11+        40%             60%             Lower ranks benefit more from reranker judgment

This approach:

  • Trusts the robust multi-signal fusion for top positions
  • Lets the deeper cross-encoder model refine lower positions
  • Prevents a single model from dominating results
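
A minimal sketch of the blend, assuming both scores are already normalized to [0, 1] (bucket boundaries follow the table above; names are illustrative):

// Blend weights chosen by the candidate's 1-indexed position in the fused ranking.
function blendWeights(position: number): { fusion: number; rerank: number } {
  if (position <= 3) return { fusion: 0.75, rerank: 0.25 };
  if (position <= 10) return { fusion: 0.6, rerank: 0.4 };
  return { fusion: 0.4, rerank: 0.6 };
}

interface Candidate {
  docId: string;
  fusionScore: number; // normalized RRF score
  rerankScore: number; // normalized cross-encoder score
}

// Candidates arrive sorted by fusion score; re-sort by the blended score.
function blend(candidates: Candidate[]): { docId: string; score: number }[] {
  return candidates
    .map((c, i) => {
      const { fusion, rerank } = blendWeights(i + 1);
      return { docId: c.docId, score: fusion * c.fusionScore + rerank * c.rerankScore };
    })
    .sort((a, b) => b.score - a.score);
}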

Retrieval Limits

GNO retrieves more candidates than you request, then filters down:

Stage                     Candidates Retrieved
BM25 (original query)     limit × 2
BM25 (each variant)       limit
Vector (original query)   limit × 2
Vector (each variant)     limit
Vector (HyDE)             limit
After fusion              All unique docs
Reranking                 Top 20
Final output              Your requested limit
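
As a rough sketch of the candidate budget for a requested limit (multipliers follow the table above; the function and its parameters are illustrative, not GNO's actual API):

// Candidate counts fetched per stage for a requested number of results.
function candidateBudget(limit: number, lexicalVariants: number, semanticVariants: number) {
  return {
    bm25Original: limit * 2,
    bm25Variants: lexicalVariants * limit,
    vectorOriginal: limit * 2,
    vectorVariants: semanticVariants * limit,
    vectorHyde: limit,
    rerankDepth: 20,    // top candidates sent to the cross-encoder
    finalOutput: limit, // what you actually see
  };
}

console.log(candidateBudget(10, 2, 2)); // e.g. 10 results, 2 lexical + 2 semantic variants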

Controlling Search Behavior

Skip Expansion

If you want faster results or have a precise query:

gno query "exact phrase match" --no-expand

Skip Reranking

For speed or if you trust fusion scores:

gno query "my search" --no-rerank

Filter by Score

Only show high-confidence results:

gno query "my search" --min-score 0.5

Limit Results

gno query "my search" -n 10

See Pipeline Details

The --explain flag shows what happened:

gno query "my search" --explain

Graceful Degradation

GNO works even when components are missing:

Missing Component      Behavior
sqlite-vec extension   BM25 search only
Embedding model        Vector search disabled
Rerank model           Skip reranking, use fusion
Generation model       Skip query expansion

Run gno doctor to check what’s available.

Language Support

Query expansion prompts are language-aware:

  • English (en-*): Optimized English prompt
  • German (de-*): Native German prompt
  • Other: Multilingual fallback prompt

Language is auto-detected from your query text using the franc library (supports 30+ languages).
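
A hedged sketch of prompt selection based on franc's output (franc returns ISO 639-3 codes such as "eng" or "deu"; the mapping below is an assumption, not GNO's actual code):

import { franc } from "franc"; // returns ISO 639-3 codes: "eng", "deu", "und", ...

type ExpansionPrompt = "english" | "german" | "multilingual";

function pickExpansionPrompt(query: string): ExpansionPrompt {
  const lang = franc(query); // "und" when the text is too short to classify reliably
  if (lang === "eng") return "english";
  if (lang === "deu") return "german";
  return "multilingual";
}

pickExpansionPrompt("wie deploye ich nach Produktion"); // likely "german"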

Performance Characteristics

Operation             Typical Time
BM25 search           ~5-20ms
Vector search         ~10-50ms
Query expansion       ~1-3s (LLM generation)
Reranking (20 docs)   ~500ms-2s
Full hybrid query     ~2-5s

Query expansion is cached by query + model, so repeated queries are fast.