Published: 2026-06-05
Mellum 2: JetBrains’ 12B MoE Model with MCP Tool Use and Hermes Agent
Chapters / key moments (click to jump — plays here on the page)
Fahd Mirza demonstrates running JetBrains' new Mellum 2 model locally via vLLM, connecting it to an MCP filesystem server for real file read/write operations, and then testing it inside Hermes Agent. Mellum 2 is a 12B mixture-of-experts model released under Apache 2.0 — not a fine-tune, built from scratch — with tool use, multi-step agentic workflows, and a 131k token context window at roughly the compute cost of a 2.5B dense model.
Source video
"Mellum2: JetBrains' New Coding Model - vLLM + MCP Tool Use Locally" by Fahd Mirza — Watch on YouTube →
Key Takeaways
- Mellum 2 is JetBrains' 12B mixture-of-experts model — Apache 2.0 licensed, built from scratch (not a fine-tune), with native tool use and a 131k token context window.
- MoE architecture: 3 of 4 attention layers use sliding window attention (1024 tokens) for speed; the 4th uses full attention to preserve long-range context. Total compute cost is equivalent to a 2.5B dense model.
- A thinking variant exposes chain-of-thought inside
<think>tags; vLLM uses the Qwen 3 reasoning parser to surface it as a separate API field. - Connect to MCP via a simple JSON config + MCP CLI — Mellum 2 called tools autonomously (directory listing, file read, file write) without explicit prompting to do so.
- Works inside Hermes Agent by selecting "local/custom endpoint" and pointing it at the vLLM server on
localhost:8000. - Verdict: solid for code refactoring and bug finding; not the strongest single-shot code generator for highly constrained multi-requirement tasks.
Commands & Setup Steps
# 1. Serve Mellum 2 locally with vLLM (requires NVIDIA GPU with ~45GB VRAM)
# Use the thinking variant from Hugging Face
vllm serve JetBrains/Mellum-2-12B-instruct-thinking \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--port 8000
# 2. Install latest Transformers if model fails to load
pip install git+https://github.com/huggingface/transformers
# 3. Create MCP filesystem server config (mcp-server-config.json)
# Example config telling MCP CLI where the sandbox lives and how to start the server:
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/sandbox"]
}
}
}
# 4. Install MCP CLI and connect to Mellum 2
pip install mcp-cli
mcp chat --model http://localhost:8000/v1 --config mcp-server-config.json
# 5. Use Mellum 2 inside Hermes Agent
# In Hermes setup: select "local" or "custom endpoint" as provider
# Set model endpoint to http://localhost:8000
# Hermes will auto-detect the Mellum 2 model





