Published: 2026-06-30
Summary

Ornith 9B Local Coding Test: 9B vs 35B Agentic Model on a Mac Mini

Chapters / key moments (click to jump — plays here on the page)

Bart Slodyczka runs the new open-source Ornith 1.0 9B dense model locally on a 16GB M4 Mac mini via LM Studio and the PyAgent harness, tasking it to build a tower-defense game in a single HTML file. The 9B model produced broken, non-functional code, while the 35B variant one-shotted a playable game — a clear, honest demonstration of where small local coding models still fall short.

Source video

"New Agentic Coding Model Ornith 9B — Is It Worth Running Locally?" by Bart SlodyczkaWatch on YouTube →

Key Takeaways

  • The model family. Ornith 1.0 (by Deep Reinforce, San Francisco) is an agentic-coding family in 9B dense, 31B dense, 35B MoE and 397B MoE sizes. The 9B, 35B and 397B are open-weight — downloadable and runnable today; the 31B isn't released yet. It's built on pretrained Gemma 4 and Qwen 3.5.
  • Memory & context on 16GB. On the Mac mini, the 9B GGUF used ~6GB for weights and reached ~185K context at full KV cache (~15GB total system usage). Enabling Flash Attention and quantizing the KV cache to Q8 lets you push the full context window while consuming only ~10GB.
  • Throughput. ~16 tokens/sec for the 9B on the constrained Mac mini, versus ~100 tokens/sec for the 35B running on a larger Mac Studio. On Apple Silicon the MLX format is faster than GGUF, especially for prefill on longer prompts.
  • The 9B failed the build. Asked to build a tower-defense game in one HTML file, the 9B produced ~750 lines that didn't work — undeclared or empty functions, broken tool calls, and it looped endlessly when asked to self-debug. The 35B one-shotted a comparable, playable game.
  • Bottom line. At 9B parameters the accuracy for from-scratch builds isn't there — you have to reduce scope or expectations. Small local models struggle to "zip the app together"; the 35B has the capacity for real work but needs more hardware.

Tools & Commands Mentioned

LM Studio            # free desktop app to download + run local models; search "Ornith", pick 9B (GGUF or MLX)
                     # enable Flash Attention + Q8 KV cache to fit more context in limited RAM
pyagent              # lightweight harness used to drive the local model (faster than Claude Code for small models)
/model               # inside PyAgent — pick the loaded Ornith model (e.g. Ornith 1.0 9B)

Weekly Digest — In Your Inbox

Get the week's top AI agent news, updates, and guides — every Friday.