AI Answers Without the Cloud
Get cited, grounded answers from your own documents using local language models. GNO runs everything on your machine - no API keys, no data sharing, no subscriptions.
Key Benefits
- 100% local processing
- No API keys required
- Cited answers from your docs
- Multiple model presets (slim, balanced, quality)
Example Commands
```bash
gno ask 'your question' --answer
gno models use balanced
gno models pull
```
How It Works
GNO uses local language models via node-llama-cpp to generate answers grounded in your documents.
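Under the hood this follows the standard node-llama-cpp pattern: load a GGUF model, create a context, and prompt a chat session with retrieved text. The sketch below is a minimal illustration, not GNO's actual source; the model path and prompt wording are placeholders:

```ts
import {getLlama, LlamaChatSession} from "node-llama-cpp";

// Load a local GGUF model (placeholder path; in GNO, `gno models pull`
// downloads the model files for the active preset).
const llama = await getLlama();
const model = await llama.loadModel({modelPath: "models/example-model.gguf"});
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

// Grounding: interpolate retrieved document chunks into the prompt so the
// model answers from your documents and can cite them, not from its weights.
const retrievedChunks = "[doc1] ...\n[doc2] ...";
const answer = await session.prompt(
  `Answer using only the sources below, citing them by name.\n\n${retrievedChunks}\n\nQuestion: What was decided about the API design?`
);
console.log(answer);
```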
Ask Questions, Get Cited Answers
gno ask "What was decided about the API design?" --answer
GNO will:
- Search your documents using hybrid search (keyword plus semantic; see the fusion sketch after this list)
- Retrieve relevant chunks
- Generate an answer citing specific documents
- Return the answer with source references
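"Hybrid" here means two rankings are produced, one lexical (keyword) and one semantic (vector), and then fused. The snippet below shows reciprocal rank fusion, a common fusion method, purely as an illustration of the idea; GNO's internal scoring may differ:

```ts
// Reciprocal Rank Fusion: combine two rankings by summing 1 / (k + rank).
// Documents ranked well by either signal (or both) float to the top.
function fuseRankings(keywordIds: string[], vectorIds: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of [keywordIds, vectorIds]) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// "b" ranks near the top of both lists, so it wins the fused ranking.
console.log(fuseRankings(["a", "b", "c"], ["b", "c", "a"])); // ["b", "a", "c"]
```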
Model Presets
Choose the right balance of speed and quality:
| Preset | Speed | Quality | Use Case |
|---|---|---|---|
| slim | Fast | Good | Default, quick lookups |
| balanced | Medium | Better | Everyday use; slightly larger model |
| quality | Slower | Best | Complex questions |
```bash
gno models use slim
gno models pull
```
Remote GPU Server Support
Run on lightweight machines by offloading inference to a GPU server on your network:
```yaml
# ~/.config/gno/config.yaml
models:
  activePreset: remote-gpu
  presets:
    - id: remote-gpu
      name: Remote GPU Server
      embed: "http://192.168.1.100:8081/v1/embeddings#bge-m3"
      rerank: "http://192.168.1.100:8082/v1/completions#reranker"
      gen: "http://192.168.1.100:8083/v1/chat/completions#qwen3-4b"
```
Works with any OpenAI-compatible server (llama-server, Ollama, LocalAI, vLLM). No CORS configuration is needed; just point GNO at your server.
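Since every preset URL targets the standard OpenAI wire format, you can smoke-test a server with a plain HTTP request before pointing GNO at it. The request below is a hand-written illustration: the host, port, and model name are taken from the example config above (the `#fragment` in each preset URL appears to carry the model name):

```ts
// Minimal OpenAI-compatible chat completions request, matching the `gen`
// endpoint in the example config. Adjust host, port, and model for your server.
const response = await fetch("http://192.168.1.100:8083/v1/chat/completions", {
  method: "POST",
  headers: {"Content-Type": "application/json"},
  body: JSON.stringify({
    model: "qwen3-4b",
    messages: [{role: "user", content: "Reply with one word: ready."}],
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);
```

If this prints a reply, any of the listed servers should work as a GNO backend.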
No Cloud Required
Everything runs on your machine (or your network):
- Models downloaded once, run locally
- Optional: offload to GPU server on LAN
- No API keys or subscriptions
- Works completely offline
- Your data never leaves your network