Published: 2026-06-20
Gemma 4 12B Coder on Hermes: a Local Coding Agent Tested on Real Bugs
Fahd Mirza installs a coding-focused fine-tune of Gemma 4 12B — trained only on examples whose reasoning produced code that actually ran and passed tests — runs it locally with Ollama, and wires it into the Hermes agent. He puts it through three real tasks: fixing a tie-breaker bug in a World Cup tracker, generating an HTML animation from scratch, and optimizing a slow SQL query.
Source video
"Gemma 4 12B Coder Fable5 Composer2.5 - Local Coding Agent for Everyone" by Fahd Mirza (YouTube)
Key Takeaways
- The fine-tune used execution-gated training: if an example’s code failed its tests it was thrown out, and hard problems the teacher model missed were retried from scratch by a second model.
- Mirza runs the Q8 quant on an NVIDIA RTX A6000 (48GB VRAM) via Ollama, using ~16GB VRAM; he recommends the Q4_K_M quant for commodity 8GB GPUs.
- Bug fix: given one prompt and the codebase, the model correctly diagnosed that the standings ignored goal difference and patched the tie-breaker, so Ghana now advances over Ecuador.
- SQL optimization: it cleanly identified legacy comma joins, ambiguous grouping and a broken HAVING alias, rewrote with explicit joins and even recommended the right indexes — "10 out of 10."
- Creative coding miss: asked to generate a self-contained tree-animation HTML file, the model hallucinated a file path and couldn’t deliver — two of three tasks passed overall.
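To make the tie-breaker bug concrete, here is a minimal sketch of that class of fix. The tracker's actual code is not shown in the source, so the `Team` class, the `rank_buggy` and `rank_fixed` functions, and every figure below are hypothetical. Real tournament rules apply further tie-breakers; this sketch covers only goal difference, the one the summary names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Team:
    name: str
    points: int
    goals_for: int
    goals_against: int

    @property
    def goal_difference(self) -> int:
        return self.goals_for - self.goals_against


def rank_buggy(teams):
    # Sorts on points only. Teams level on points keep their input order,
    # because sorted() is stable, so goal difference is never consulted.
    return sorted(teams, key=lambda t: t.points, reverse=True)


def rank_fixed(teams):
    # Sorts on points first, then on goal difference as the tie-breaker.
    return sorted(teams, key=lambda t: (t.points, t.goal_difference), reverse=True)


# Hypothetical group: names and figures are illustrative, not real results.
group = [
    Team("Team A", points=4, goals_for=2, goals_against=3),
    Team("Team B", points=4, goals_for=4, goals_against=2),
    Team("Team C", points=7, goals_for=5, goals_against=1),
]

buggy_order = [t.name for t in rank_buggy(group)]
fixed_order = [t.name for t in rank_fixed(group)]
print(buggy_order)  # Team A stays ahead of Team B despite a worse goal difference
print(fixed_order)  # Team B moves ahead of Team A
```

Sorting on a `(points, goal_difference)` tuple keeps the ranking rule in one place, so a further tie-breaker could be appended to the tuple without changing the rest of the function.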
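The SQL rewrite pattern from the takeaways can be sketched in the same spirit. The query and schema from the video are not reproduced in the source, so the `customers` and `orders` tables, the index name, and the sample rows are assumptions made for illustration. SQLite (through Python's standard `sqlite3` module) is used only because it needs no setup. The legacy query is kept as a string for comparison and is not executed, since engines differ on whether a select-list alias may be used in `HAVING` (PostgreSQL, for example, rejects it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    -- Index on the join column so per-customer lookups can avoid a full scan.
    CREATE INDEX idx_orders_customer_id ON orders(customer_id);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace"), (3, "Linus")])
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 50.0), (3, 300.0)])

# Legacy style, shown for comparison only: comma join, a selected column
# missing from GROUP BY, and HAVING that refers to the alias "total".
LEGACY_QUERY = """
    SELECT c.id, c.name, SUM(o.amount) AS total
    FROM customers c, orders o
    WHERE c.id = o.customer_id
    GROUP BY c.name
    HAVING total > 100
"""

# Rewritten: explicit JOIN ... ON, every non-aggregated column grouped,
# and the aggregate expression repeated in HAVING for portability.
REWRITTEN_QUERY = """
    SELECT c.id, c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    HAVING SUM(o.amount) > 100
    ORDER BY total DESC
"""

rows = conn.execute(REWRITTEN_QUERY).fetchall()
for row in rows:
    print(row)
```

Whether an index like this one actually helps depends on the real schema and data distribution, so on a real database it is worth checking the engine's query plan before and after adding it.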
Commands & Code Mentioned
ollama pull <gemma-4-12b-coder> # Q8 quant shown; Q4_K_M recommended for 8GB GPUs
# configure the pulled model inside the Hermes agent, then launch Hermes
nvidia-smi # confirm VRAM use (~16GB for the 12B model + KV cache)





