🔌 Kilo Code Models — 500+ via OpenRouter
Kilo's biggest functional advantage over Claude Code is model breadth. Through OpenRouter, Kilo can route to 500+ models — every Claude tier, GPT-5.5, GPT-5.4-Cyber, o4-mini, Gemini 3.1 Pro/Flash, Kimi K2, Qwen 3.5 / 3.6, Llama 3.4, DeepSeek V3.5, Mistral Large 3, and a long tail of specialized models. All at provider rates with no Kilo markup. This guide explains how to wire each path, when to use which, and the cost-per-task patterns we measured.
Three ways to connect models
- OpenRouter (default). One credential, 500+ models, no markup. Pay via Kilo credits (1 credit = $1) or directly to OpenRouter. Best for breadth and quick model swaps.
- Direct provider keys. Anthropic, OpenAI, Google, etc. Each gets its own API key in Kilo settings. Bills directly to that provider. Best when you already have a relationship or volume discount with one vendor.
- Hybrid. Kilo lets you route different model classes to different providers. Common pattern: orchestrator's planner step → Anthropic direct (you have a Max plan), coder step → OpenRouter (cheaper for high-volume), debugger → direct OpenAI (lowest latency for o4-mini).
Recommended starting pairings
| Use case | Default model | Why |
|---|---|---|
| Day-to-day chat / quick edits | Sonnet 4.6 | Best balance of speed, quality, cost for 80% of tasks |
| Hard reasoning / architecture | Opus 4.7 (xhigh effort) | The depth shows; effort-levels guide |
| Batch / summaries | Haiku 4.5 or Gemini 2.5 Flash | 10-50× cheaper for bulk work |
| Open-weights cost control | Kimi K2 or Qwen 3.5 72B (OpenRouter) | ~3-5× cheaper than GPT-5.4 / Sonnet at similar quality |
| Privacy-sensitive | Local Ollama (Qwen 3.6 35B MoE) via Kilo's local-model routing | $0/token, data never leaves your network |
Per-task cost patterns we measured
Across 50 representative coding tasks (small refactor, multi-file feature, debugging session, code review):
- Sonnet 4.6 baseline: $0.05–0.30 per task
- Opus 4.7 high effort: 3-4× Sonnet baseline ($0.15–1.20)
- Opus 4.7 xhigh effort: 5-7× Sonnet baseline ($0.25–2.00)
- Orchestrator on, 3 sub-agents: ~1.8× the single-agent cost (less than 3× because planner/debugger are usually small; coder is the bulk)
- Kimi K2 via OpenRouter: ~$0.02–0.10 per task — most cost-effective for low-stakes work
Plug your real numbers into the cost calculator for projections at your usage level.
Local models — when and how
Kilo supports local Ollama endpoints for privacy-critical work. Configure in ~/.kilo/config.toml:
[providers.ollama]
base_url = "http://localhost:11434/v1"
models = ["qwen3.6:35b-moe", "gemma2:9b"]The orchestrator can mix: planner on cloud Opus 4.7, coder on local Qwen 3.6. Latency is higher but privacy is total. See our daily-journal use case for a privacy-first pattern.
Pitfalls
- Default-routing everything to the most expensive model. Set per-mode defaults: chat → Sonnet, planner → Opus 4.7, coder → Sonnet, debugger → Haiku.
- OpenRouter free tier. Free-tier requests get throttled hard — feels like Kilo is broken. Add $5 to OpenRouter and the experience changes.
- BYO key + leaked .env. Standard rule: never paste API keys into shared chats, never commit them to git. Add
.kilo/to.gitignoreif you're customizing local config.
Next
- Orchestrator deep-dive — how the planner/coder/debugger model assignment works
- Cost calculator — every model Kilo routes to is priced
- Cost optimization patterns — model tiering applies to Kilo too
← Back to the Kilo Code hub