Last updated: 2026-04-18

What is a local model?

An LLM running on your own hardware instead of a hosted API, typically served with Ollama, vLLM, or llama.cpp. Running locally eliminates per-token costs and keeps your data on your machine, at the price of buying GPU hardware and, usually, slower inference than a hosted frontier model.
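As a minimal sketch of what "no hosted API" looks like in practice: Ollama serves a plain JSON HTTP API on localhost (port 11434 by default), so querying a local model needs no API key or SDK. The model name `llama3` and the prompt below are illustrative; the model would need to be pulled beforehand with `ollama pull`.

```python
import json
import urllib.request

# Default endpoint of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local server."""
    payload = json.dumps({
        "model": model,    # e.g. "llama3", pulled earlier with `ollama pull`
        "prompt": prompt,
        "stream": False,   # ask for one JSON object instead of a token stream
    }).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_request("llama3", "Explain KV caching in one sentence.")

# With an Ollama server actually running, the call would be:
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
```

Because everything stays on localhost, the prompt and the model's output never leave the machine, which is the privacy property the definition above refers to.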

See also

← Back to the full AI agent glossary.