Last updated: 2026-04-27

🔌 Kilo Code Models — 500+ via OpenRouter

Kilo's biggest functional advantage over Claude Code is model breadth. Through OpenRouter, Kilo can route to 500+ models — every Claude tier, GPT-5.5, GPT-5.4-Cyber, o4-mini, Gemini 3.1 Pro/Flash, Kimi K2, Qwen 3.5 / 3.6, Llama 3.4, DeepSeek V3.5, Mistral Large 3, and a long tail of specialized models. All at provider rates with no Kilo markup. This guide explains how to wire each path, when to use which, and the cost-per-task patterns we measured.

Three ways to connect models

OpenRouter (default). One credential, 500+ models, no markup. Pay via Kilo credits (1 credit = $1) or directly to OpenRouter. Best for breadth and quick model swaps.
Direct provider keys. Anthropic, OpenAI, Google, etc. Each gets its own API key in Kilo settings. Bills directly to that provider. Best when you already have a relationship or volume discount with one vendor.
Hybrid. Kilo lets you route different model classes to different providers. Common pattern: orchestrator's planner step → Anthropic direct (you have a Max plan), coder step → OpenRouter (cheaper for high-volume), debugger → direct OpenAI (lowest latency for o4-mini).

Recommended starting pairings

Use case	Default model	Why
Day-to-day chat / quick edits	Sonnet 4.6	Best balance of speed, quality, cost for 80% of tasks
Hard reasoning / architecture	Opus 4.7 (xhigh effort)	The depth shows; effort-levels guide
Batch / summaries	Haiku 4.5 or Gemini 2.5 Flash	10-50× cheaper for bulk work
Open-weights cost control	Kimi K2 or Qwen 3.5 72B (OpenRouter)	~3-5× cheaper than GPT-5.4 / Sonnet at similar quality
Privacy-sensitive	Local Ollama (Qwen 3.6 35B MoE) via Kilo's local-model routing	$0/token, data never leaves your network

Per-task cost patterns we measured

Across 50 representative coding tasks (small refactor, multi-file feature, debugging session, code review):

Sonnet 4.6 baseline: $0.05–0.30 per task
Opus 4.7 high effort: 3-4× Sonnet baseline ($0.15–1.20)
Opus 4.7 xhigh effort: 5-7× Sonnet baseline ($0.25–2.00)
Orchestrator on, 3 sub-agents: ~1.8× the single-agent cost (less than 3× because planner/debugger are usually small; coder is the bulk)
Kimi K2 via OpenRouter: ~$0.02–0.10 per task — most cost-effective for low-stakes work

Plug your real numbers into the cost calculator for projections at your usage level.

Local models — when and how

Kilo supports local Ollama endpoints for privacy-critical work. Configure in ~/.kilo/config.toml:

[providers.ollama]
base_url = "http://localhost:11434/v1"
models = ["qwen3.6:35b-moe", "gemma2:9b"]

The orchestrator can mix: planner on cloud Opus 4.7, coder on local Qwen 3.6. Latency is higher but privacy is total. See our daily-journal use case for a privacy-first pattern.

Pitfalls

Default-routing everything to the most expensive model. Set per-mode defaults: chat → Sonnet, planner → Opus 4.7, coder → Sonnet, debugger → Haiku.
OpenRouter free tier. Free-tier requests get throttled hard — feels like Kilo is broken. Add $5 to OpenRouter and the experience changes.
BYO key + leaked .env. Standard rule: never paste API keys into shared chats, never commit them to git. Add .kilo/ to .gitignore if you're customizing local config.

Orchestrator deep-dive — how the planner/coder/debugger model assignment works
Cost calculator — every model Kilo routes to is priced
Cost optimization patterns — model tiering applies to Kilo too

← Back to the Kilo Code hub