Last updated: 2026-04-18
What is a local model?
An LLM that runs on your own hardware (typically a GPU) rather than behind a hosted API, served by tools such as Ollama, vLLM, or llama.cpp. Running locally eliminates per-token costs and keeps data private, at the price of buying GPU hardware and usually slower inference than a hosted service.
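Since the entry names Ollama as one way to serve a local model, here is a minimal sketch of talking to one over HTTP. It assumes an Ollama server on its default port (11434) and a model named "llama3" already pulled — both are assumptions, not part of the entry above:

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot completions.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the completion text."""
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Ollama returns the completion under the "response" key.
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # "llama3" is a placeholder -- substitute any model you have pulled locally.
    print(generate("llama3", "Why run models locally?"))
```

Because the request goes to localhost, nothing leaves the machine: the privacy and zero-per-token-cost properties described above fall out of the transport itself.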