Published: 2026-06-15

Build a Local AI Assistant: Gemma 4 12B + Hermes Agent on a Mac Mini

Name: Build a Local AI Assistant: Gemma 4 12B + Hermes Agent on a Mac Mini
Uploaded: 2026-06-15
Description: Bart Slodyczka installs Hermes Agent on a 16GB M4 Mac Mini and wires it to a local Gemma 4 12B model in LM Studio, using Claude Code to drive the setup.

Chapters / key moments (click to jump — plays here on the page)

Bart Slodyczka takes a 16GB M4 Mac Mini running a local Gemma 4 12B model in LM Studio and turns it into a private AI assistant by installing Hermes Agent and pointing it at the local model. The twist: he uses Claude Code (in the Claude desktop app) to do the whole setup — check hardware, keep the Mac awake, install Hermes, set Gemma as the default model, and run an integration test — so you watch an agent configure another agent.

Source video

"Gemma 4 12B + Hermes Agent: Build Your Own AI Assistant" by Bart Slodyczka — Watch on YouTube →

Key Takeaways

Let the coding agent drive the install. Because Claude Code runs in the terminal with access to the machine, it can read the hardware, find the Hermes install line, run it, and use the Hermes CLI to configure everything — no manual setup.
Keep the Mac always-on. If the Mac Mini sleeps, Hermes Agent sleeps with it and misses time-sensitive jobs (e.g. emails at 3am). Bart also recommends a reboot script to relaunch LM Studio, Hermes Agent, and Tailscale after a power loss.
The RAM-vs-context balancing act is the whole game on 16GB. Loading Gemma at ~12GB leaves only ~4GB for the OS and for Hermes to actually do work. He lands around 12GB / ~67K-token context as a workable sweet spot — too low (≈30K) isn't enough to hold a conversation or finish tasks.
QAT vs regular weights is use-case dependent. The Quantized-Aware-Training Gemma 4 12B (~6.66GB) is smaller than the regular 12B (~7.04GB), but one Reddit report found QAT was a regression for their use case — test both.
Squeeze more context with flash attention + Q8 KV cache (a small accuracy trade-off) to make the model more usable inside Hermes Agent.
Stay local and private. Prefer local/free options — Docker-hosted Firecrawl for crawling, or the agent's built-in browser mode via Chromium — over paid web-search APIs.
Verify the wiring with an integration test. Claude sends a prompt through the Hermes CLI; a response back confirms LM Studio is up, the model is loaded, and Gemma is correctly set as the Hermes default.

Commands & Code Mentioned

# List models available in LM Studio (via the LM Studio CLI)
lms ls

# Send a one-off prompt through Hermes Agent to integration-test the model
hermes chat "your test prompt here"

New to local models with Hermes? See best free models for Hermes, Hermes setup, and the Hermes web dashboard.

Build a Local AI Assistant: Gemma 4 12B + Hermes Agent on a Mac Mini

Key Takeaways

Commands & Code Mentioned

More Hermes news

Go deeper: Hermes guides