AI Answers Without the Cloud
Get cited, grounded answers from your own documents using local language models. GNO runs everything on your machine - no API keys, no data sharing, no subscriptions.
Key Benefits
- 100% local processing
- No API keys required
- Cited answers from your docs
- Multiple model presets (slim, balanced, quality)
Example Commands
```bash
gno ask 'your question' --answer
gno models use balanced
gno models pull
```
How It Works
GNO uses local language models via node-llama-cpp to generate answers grounded in your documents.
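Under the hood this follows the standard node-llama-cpp pattern: load a GGUF model, create a context, and prompt a chat session with retrieved text. The sketch below is a minimal illustration, not GNO's actual source; the model path and prompt wording are placeholders:

```ts
import {getLlama, LlamaChatSession} from "node-llama-cpp";

// Load a local GGUF model (placeholder path; in GNO, `gno models pull`
// downloads the model files for the active preset).
const llama = await getLlama();
const model = await llama.loadModel({modelPath: "models/example-model.gguf"});
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

// Grounding: interpolate retrieved document chunks into the prompt so the
// model answers from your documents and can cite them, not from its weights.
const retrievedChunks = "[doc1] ...\n[doc2] ...";
const answer = await session.prompt(
  `Answer using only the sources below, citing them by name.\n\n${retrievedChunks}\n\nQuestion: What was decided about the API design?`
);
console.log(answer);
```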
Ask Questions, Get Cited Answers
gno ask "What was decided about the API design?" --answer
GNO will:
- Search your documents using hybrid search (keyword plus semantic; see the fusion sketch after this list)
- Retrieve relevant chunks
- Generate an answer citing specific documents
- Return the answer with source references
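"Hybrid" here means two rankings are produced, one lexical (keyword) and one semantic (vector), and then fused. The snippet below shows reciprocal rank fusion, a common fusion method, purely as an illustration of the idea; GNO's internal scoring may differ:

```ts
// Reciprocal Rank Fusion: combine two rankings by summing 1 / (k + rank).
// Documents ranked well by either signal (or both) float to the top.
function fuseRankings(keywordIds: string[], vectorIds: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of [keywordIds, vectorIds]) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// "b" ranks near the top of both lists, so it wins the fused ranking.
console.log(fuseRankings(["a", "b", "c"], ["b", "c", "a"])); // ["b", "a", "c"]
```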
Model Presets
Choose the right balance of speed and quality:
| Preset | Speed | Quality | Use Case |
|---|---|---|---|
| slim | Fast | Good | Default, quick lookups |
| balanced | Medium | Better | Everyday use; slightly larger model |
| quality | Slower | Best | Complex questions |
```bash
gno models use slim
gno models pull
```
Remote GPU Server Support
Run on lightweight machines by offloading inference to a GPU server on your network:
```yaml
# ~/.config/gno/config.yaml
models:
  activePreset: remote-gpu
  presets:
    - id: remote-gpu
      name: Remote GPU Server
      embed: "http://192.168.1.100:8081/v1/embeddings#bge-m3"
      rerank: "http://192.168.1.100:8082/v1/completions#reranker"
      gen: "http://192.168.1.100:8083/v1/chat/completions#qwen3-4b"
```
Works with any OpenAI-compatible server (llama-server, Ollama, LocalAI, vLLM). No CORS configuration is needed; just point GNO at your server.
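Since every preset URL targets the standard OpenAI wire format, you can smoke-test a server with a plain HTTP request before pointing GNO at it. The request below is a hand-written illustration: the host, port, and model name are taken from the example config above (the `#fragment` in each preset URL appears to carry the model name):

```ts
// Minimal OpenAI-compatible chat completions request, matching the `gen`
// endpoint in the example config. Adjust host, port, and model for your server.
const response = await fetch("http://192.168.1.100:8083/v1/chat/completions", {
  method: "POST",
  headers: {"Content-Type": "application/json"},
  body: JSON.stringify({
    model: "qwen3-4b",
    messages: [{role: "user", content: "Reply with one word: ready."}],
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);
```

If this prints a reply, any of the listed servers should work as a GNO backend.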
No Cloud Required
Everything runs on your machine (or your network):
- Models downloaded once, run locally
- Optional: offload to GPU server on LAN
- No API keys or subscriptions
- Works completely offline
- Your data never leaves your network