Fine-Tuned Models

A guide to using fine-tuned local generation models with GNO.

Current Promoted Retrieval Model

Current promoted slim retrieval model:

- release id: slim-retrieval-v1
- canonical run: auto-entity-lock-default-mix-lr95
- repeated benchmark median:
  - nDCG@10 0.925
  - ask Recall@5 0.875
  - schema success 1.0
  - p95 4775.99ms

Canonical bundle:

This model passed the promotion gate and is the one to use for final packaging and publishing.
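The "repeated benchmark median" figures above are per-metric medians over several benchmark runs. A minimal sketch of that aggregation follows. The `RunMetrics` shape and its field names are assumptions made for illustration, not the format of GNO's actual benchmark summary.

```typescript
// Hypothetical shape for one benchmark run; field names are illustrative only.
interface RunMetrics {
  ndcgAt10: number;
  askRecallAt5: number;
  schemaSuccess: number;
  p95Ms: number;
}

// Median of a list of numbers; averages the two middle values for even lengths.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Takes the median of each metric independently across repeated runs.
function medianMetrics(runs: RunMetrics[]): RunMetrics {
  return {
    ndcgAt10: median(runs.map((r) => r.ndcgAt10)),
    askRecallAt5: median(runs.map((r) => r.askRecallAt5)),
    schemaSuccess: median(runs.map((r) => r.schemaSuccess)),
    p95Ms: median(runs.map((r) => r.p95Ms)),
  };
}
```

Reporting a median rather than a single run keeps one unusually good or bad run from deciding the headline numbers.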

What Is Portable

The training backend can be machine-specific.

Example:

training on Apple Silicon via MLX LoRA

What must be portable is the exported artifact:

fused weights
GGUF runtime file
benchmark summary
install snippet / model card
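Before copying an exported runtime file to another machine, it can help to confirm that it really is a GGUF file. A GGUF file begins with the four ASCII bytes `GGUF`, followed by a little-endian 32-bit format version. The sketch below checks only that header. The function names are illustrative and not part of GNO.

```typescript
// Returns true when the buffer starts with the 4-byte GGUF magic ("GGUF" in ASCII).
function hasGgufMagic(header: Uint8Array): boolean {
  const magic = [0x47, 0x47, 0x55, 0x46]; // "G", "G", "U", "F"
  return header.length >= 4 && magic.every((byte, i) => header[i] === byte);
}

// Reads the little-endian uint32 format version that follows the magic.
// Returns null when the buffer is too short or does not start with the magic.
function ggufVersion(header: Uint8Array): number | null {
  if (!hasGgufMagic(header) || header.length < 8) return null;
  const view = new DataView(header.buffer, header.byteOffset, header.byteLength);
  return view.getUint32(4, true);
}
```

Only the first eight bytes of the file are needed. A header check like this catches a truncated upload or a wrong file, but it does not prove that the weights load or that the model behaves correctly.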

Recommended Workflow

Public/shared model:

use the published HF model directly
keep it in a custom preset such as slim-tuned
benchmark before replacing any defaults in your own project

Private/internal model:

train a run in research/finetune/
promote the run:

bun run research:finetune:promote <run>

if it is private or paid, keep the resulting GGUF on disk and reference it with a file: URI
if it is public, publish the GGUF and model card to HF and switch to an hf: URI
benchmark before replacing any defaults
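The private/public choice above comes down to which URI scheme ends up in the preset's `gen:` field. A small sketch of that decision follows. The `ModelSource` type and `genUri` function are hypothetical helpers. The URI shapes are inferred from the preset examples in this guide, and the absolute-path check assumes POSIX-style paths.

```typescript
// Hypothetical description of where a promoted model lives.
type ModelSource =
  | { visibility: "private"; absolutePath: string }
  | { visibility: "public"; hfUser: string; hfRepo: string; fileName: string };

// Builds the value for the preset's gen: field.
function genUri(source: ModelSource): string {
  if (source.visibility === "private") {
    // Assumption: POSIX-style absolute paths that start with "/".
    if (!source.absolutePath.startsWith("/")) {
      throw new Error("file: URI needs an absolute path");
    }
    return `file:${source.absolutePath}`;
  }
  return `hf:${source.hfUser}/${source.hfRepo}/${source.fileName}`;
}
```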

Install In GNO

Recommended public preset:

models:
  activePreset: slim-tuned
  presets:
    - id: slim-tuned
      name: GNO Slim Retrieval v1
      embed: hf:gpustack/bge-m3-GGUF/bge-m3-Q4_K_M.gguf
      rerank: hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf
      gen: hf:guiltylemon/gno-expansion-slim-retrieval-v1/gno-expansion-auto-entity-lock-default-mix-lr95-f16.gguf

Then:

gno models use slim-tuned
gno query "ECONNREFUSED 127.0.0.1:5432" --thorough

For a private model that is not published to HF yet, replace gen: with:

gen: file:/absolute/path/to/your-private-model.gguf

When To Keep It Custom

Keep the fine-tuned model in a custom preset when:

it has only one strong benchmark run
ask-style retrieval is flat or noisy
you have not repeated the benchmark on fresh runs

Only consider changing defaults after repeated measured wins.
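One way to make "repeated measured wins" concrete is to compare the candidate against the current default on paired benchmark runs. The sketch below is a hypothetical policy, not GNO's promotion gate. The `RunScore` fields, the minimum of three runs, and the rule that both metrics must improve in every run are all assumptions.

```typescript
// Hypothetical per-run scores; field names are illustrative only.
interface RunScore {
  ndcgAt10: number;
  askRecallAt5: number;
}

// True only when there are enough paired runs and the candidate beats the
// baseline on both metrics in every one of them.
function hasRepeatedWins(
  candidate: RunScore[],
  baseline: RunScore[],
  minRuns = 3, // assumed threshold, tune for your project
): boolean {
  if (candidate.length < minRuns || candidate.length !== baseline.length) return false;
  return candidate.every(
    (run, i) =>
      run.ndcgAt10 > baseline[i].ndcgAt10 && run.askRecallAt5 > baseline[i].askRecallAt5,
  );
}
```

Under this rule a single strong run or a flat ask-style result keeps the model in its custom preset.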

Troubleshooting

`mlx_lm fuse --export-gguf` fails for `qwen3`

Current workaround:

bun run research:finetune:fuse-best <run>
bun run research:finetune:export-env
bun run research:finetune:export-gguf <run>

This works because the export path uses:

MLX fuse with --dequantize
llama.cpp conversion on the dequantized fused model

Exported GGUF loads but does not follow the JSON contract

This usually means the model is undertrained, not that the export failed.

Check:

selected checkpoint vs final checkpoint
schema success rate in benchmark summary
raw output from bun run research:finetune:smoke-gno-export <run>
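When inspecting raw outputs by hand, a schema success rate can be approximated as the fraction of outputs that parse as JSON and satisfy the contract. The sketch below takes the contract check as a caller-supplied function, because the contract itself is not described in this guide. The `hasQueriesArray` validator and its `queries` field are made up for illustration.

```typescript
// Fraction of raw outputs that parse as JSON and pass the supplied validator.
function schemaSuccessRate(
  rawOutputs: string[],
  isValid: (value: unknown) => boolean,
): number {
  if (rawOutputs.length === 0) return 0;
  let ok = 0;
  for (const raw of rawOutputs) {
    try {
      if (isValid(JSON.parse(raw))) ok += 1;
    } catch {
      // Unparseable output (or a validator that throws) counts as a failure.
    }
  }
  return ok / rawOutputs.length;
}

// Hypothetical validator: the field name "queries" is invented for this example.
const hasQueriesArray = (value: unknown): boolean =>
  typeof value === "object" &&
  value !== null &&
  Array.isArray((value as { queries?: unknown }).queries);
```

A rate well below 1.0 on the selected checkpoint, with a valid GGUF header and a clean load, points at training rather than at the export step.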

Fine-tuned model is better on loss but not on retrieval

Do not promote on loss alone.

Use:

bun run research:finetune:benchmark-export <run>

Promotion should follow retrieval metrics, not training loss.
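For reference, the two retrieval metrics named in this guide can be sketched as follows. This uses the common textbook definitions: linear gain with a log2 rank discount for nDCG, and the fraction of relevant documents found in the top k for recall. Whether GNO's benchmark uses exactly these variants is an assumption.

```typescript
// DCG@k with linear gain: sum of rel / log2(rank + 1) over ranks 1..k.
function dcgAtK(relevances: number[], k: number): number {
  return relevances.slice(0, k).reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// nDCG@k: DCG of the returned ranking divided by the DCG of the ideal ranking.
// rankedRelevances: graded relevance of each returned result, in rank order.
// judgedRelevances: graded relevance of every judged document for the query.
function ndcgAtK(rankedRelevances: number[], judgedRelevances: number[], k: number): number {
  const ideal = [...judgedRelevances].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(rankedRelevances, k) / idealDcg;
}

// Recall@k: share of the relevant documents that appear in the top k results.
function recallAtK(rankedIds: string[], relevantIds: Set<string>, k: number): number {
  if (relevantIds.size === 0) return 0;
  const hits = rankedIds.slice(0, k).filter((id) => relevantIds.has(id)).length;
  return hits / relevantIds.size;
}
```

Neither metric depends on training loss, which is why a run can improve on loss and still fail to move them.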

Mac-only training concern

Training can stay Mac-only.

The exported GGUF is the portable artifact and can be used anywhere llama.cpp / node-llama-cpp can load it.