Last updated: 2026-04-27

🔌 Kilo Code Models — 500+ via OpenRouter

Kilo's biggest functional advantage over Claude Code is model breadth. Through OpenRouter, Kilo can route to 500+ models: every Claude tier, GPT-5.5, GPT-5.4-Cyber, o4-mini, Gemini 3.1 Pro/Flash, Kimi K2, Qwen 3.5 / 3.6, Llama 3.4, DeepSeek V3.5, Mistral Large 3, and a long tail of specialized models, all at provider rates with no Kilo markup. This guide explains how to wire up each connection path, when to use which, and the per-task cost patterns we measured.

Three ways to connect models

  1. OpenRouter (default). One credential, 500+ models, no markup. Pay via Kilo credits (1 credit = $1) or directly to OpenRouter. Best for breadth and quick model swaps.
  2. Direct provider keys. Anthropic, OpenAI, Google, etc. Each gets its own API key in Kilo settings. Bills directly to that provider. Best when you already have a relationship or volume discount with one vendor.
  3. Hybrid. Kilo lets you route different model classes to different providers. A common pattern: the orchestrator's planner step goes to Anthropic direct (if you already have a Max plan), the coder step goes to OpenRouter (cheaper at high volume), and the debugger goes to OpenAI direct (lowest latency for o4-mini).
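
The hybrid pattern in item 3 boils down to a lookup from orchestrator step to provider. Here is a minimal sketch of that idea in Python. The step names come from the text above, but the provider labels, the `ROUTES` table, and `provider_for` are hypothetical illustrations, not Kilo's actual configuration schema.

```python
# Hypothetical routing table for the hybrid pattern described above.
# Provider labels are illustrative; they are not Kilo config keys.
ROUTES = {
    "planner": "anthropic-direct",   # planner step -> Anthropic direct
    "coder": "openrouter",           # high-volume coder step -> OpenRouter
    "debugger": "openai-direct",     # debugger step -> OpenAI direct
}

# Assumption: anything without an explicit route falls back to OpenRouter,
# since the guide describes it as the default connection path.
DEFAULT_PROVIDER = "openrouter"


def provider_for(step: str) -> str:
    """Return the provider label for an orchestrator step."""
    return ROUTES.get(step, DEFAULT_PROVIDER)
```

Keeping the mapping in one table makes it easy to see which steps bill to which vendor before you change anything in Kilo's settings.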

Recommended starting pairings

Use case | Default model | Why
Day-to-day chat / quick edits | Sonnet 4.6 | Best balance of speed, quality, cost for 80% of tasks
Hard reasoning / architecture | Opus 4.7 (xhigh effort) | The depth shows; effort-levels guide
Batch / summaries | Haiku 4.5 or Gemini 2.5 Flash | 10-50× cheaper for bulk work
Open-weights cost control | Kimi K2 or Qwen 3.5 72B (OpenRouter) | ~3-5× cheaper than GPT-5.4 / Sonnet at similar quality
Privacy-sensitive | Local Ollama (Qwen 3.6 35B MoE) via Kilo's local-model routing | $0/token, data never leaves your network

Per-task cost patterns we measured

Across 50 representative coding tasks (small refactor, multi-file feature, debugging session, code review):

  • Sonnet 4.6 baseline: $0.05–0.30 per task
  • Opus 4.7 high effort: 3-4× Sonnet baseline ($0.15–1.20)
  • Opus 4.7 xhigh effort: 5-7× Sonnet baseline ($0.25–2.00)
  • Orchestrator on, 3 sub-agents: ~1.8× the single-agent cost (less than 3× because the planner and debugger steps are usually small and the coder step accounts for most of the spend)
  • Kimi K2 via OpenRouter: ~$0.02–0.10 per task — most cost-effective for low-stakes work

Plug your real numbers into the cost calculator for projections at your usage level.
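
For a quick back-of-the-envelope projection, the per-task ranges above can be scaled by a monthly task count. The sketch below uses only the dollar ranges and the ~1.8× orchestrator factor quoted in this section. The dictionary keys, the function name, and the task count in the usage note are illustrative assumptions, and the output is a rough range rather than a billing estimate.

```python
# Per-task cost ranges in USD, copied from the measurements listed above.
# The keys are illustrative labels, not official model identifiers.
PER_TASK_USD = {
    "sonnet-4.6": (0.05, 0.30),
    "opus-4.7-high": (0.15, 1.20),
    "opus-4.7-xhigh": (0.25, 2.00),
    "kimi-k2": (0.02, 0.10),
}

# ~1.8x single-agent cost with the orchestrator on and 3 sub-agents.
ORCHESTRATOR_FACTOR = 1.8


def monthly_range(model: str, tasks_per_month: int, orchestrator: bool = False):
    """Return a (low, high) monthly cost range in USD for one model."""
    low, high = PER_TASK_USD[model]
    factor = ORCHESTRATOR_FACTOR if orchestrator else 1.0
    return (
        round(low * tasks_per_month * factor, 2),
        round(high * tasks_per_month * factor, 2),
    )
```

For example, `monthly_range("sonnet-4.6", 200)` gives the range for 200 single-agent tasks a month, and passing `orchestrator=True` applies the 1.8× factor to both ends.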

Local models — when and how

Kilo supports local Ollama endpoints for privacy-critical work. Configure in ~/.kilo/config.toml:

[providers.ollama]
base_url = "http://localhost:11434/v1"
models = ["qwen3.6:35b-moe", "gemma2:9b"]

The orchestrator can mix providers: planner on cloud Opus 4.7, coder on local Qwen 3.6. Latency is higher, and only the steps routed to the local model stay on your network; for full privacy, run every step locally. See our daily-journal use case for a privacy-first pattern.
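
Before pointing Kilo at a local endpoint, it can help to confirm that the models named in the config are actually pulled. The sketch below assumes the Ollama server exposes an OpenAI-style model listing at `<base_url>/models` that returns a JSON object with a `data` list of entries carrying an `id` field. Treat that response shape as an assumption to verify against your Ollama version. The function names are hypothetical helpers, not part of Kilo or Ollama.

```python
import json
import urllib.request


def models_endpoint(base_url: str) -> str:
    """Build the model-listing URL from the configured base_url."""
    return base_url.rstrip("/") + "/models"


def missing_models(configured, payload):
    """Return configured model names absent from the listing payload.

    Assumes an OpenAI-style payload: {"data": [{"id": "<model name>"}, ...]}.
    """
    available = {entry["id"] for entry in payload.get("data", [])}
    return [name for name in configured if name not in available]


def check_ollama(base_url, configured, timeout=5.0):
    """Query the local endpoint and report which configured models are missing."""
    with urllib.request.urlopen(models_endpoint(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    return missing_models(configured, payload)
```

With the config above, `check_ollama("http://localhost:11434/v1", ["qwen3.6:35b-moe", "gemma2:9b"])` would return the names that still need to be pulled, or raise an error if the server is not reachable.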

Pitfalls

  • Default-routing everything to the most expensive model. Set per-mode defaults: chat → Sonnet, planner → Opus 4.7, coder → Sonnet, debugger → Haiku.
  • OpenRouter free tier. Free-tier requests are throttled hard, which can feel like Kilo itself is broken. Add $5 in credit to OpenRouter and the experience improves noticeably.
  • BYO key + leaked .env. Standard rule: never paste API keys into shared chats, never commit them to git. Add .kilo/ to .gitignore if you're customizing local config.
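
The last pitfall is easy to check mechanically. Below is a minimal sketch that reports whether a `.gitignore` contains a plain entry for the `.kilo` directory. It deliberately handles only literal entries such as `.kilo`, `.kilo/`, or `/.kilo/`, not glob patterns or negations, and the function name is a hypothetical helper.

```python
from pathlib import Path


def ignores_kilo_dir(gitignore_text: str) -> bool:
    """Return True if a non-comment line plainly ignores the .kilo directory."""
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.strip("/") == ".kilo":
            return True
    return False


def repo_ignores_kilo(repo_root: str = ".") -> bool:
    """Check the .gitignore at repo_root; a missing file counts as not ignored."""
    path = Path(repo_root) / ".gitignore"
    return path.exists() and ignores_kilo_dir(path.read_text())
```

Running `repo_ignores_kilo()` from the repository root before the first commit is a cheap way to catch a missing entry.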
