Published: 2026-06-22
GLM 5.2 on a Mac Studio M3 Ultra: 395GB, 12 tok/s, 74K Context
Bart Slodyczka fits the new GLM 5.2, a 395GB download, onto a Mac Studio M3 Ultra with 512GB of unified memory. The model was too new to load in LM Studio natively, so he ran it on a custom runtime, clocking ~12 tokens/sec and around 74K tokens of usable context before the machine ran out of headroom. His real takeaway is about harnesses, not hardware: the same local model produced a "very basic app" inside Pi Agent in Cursor but a far better result when driven by a strong harness like Claude Code.
Source video
"Running GLM 5.2 on Mac Studio M3 Ultra" by Bart Slodyczka — Watch on YouTube →
Key Takeaways
- Hardware: M3 Ultra Mac Studio with 512GB unified memory; GLM 5.2 is a 395GB download that lives almost entirely in RAM.
- Performance: ~12 tokens/sec generation and roughly 74,000 tokens of context before the machine hit its limit.
- Runtime: the model was too new to run in LM Studio natively, so it needed a custom runtime to load at all.
- Claude Code timed out: at ~12 tok/s, responses arrived slowly enough that requests kept timing out, so he ran the demo in Pi Agent inside Cursor instead.
- The harness matters more than the model: a basic prompt to Pi Agent gave a basic app, while a strong harness like Claude Code turns a simple prompt into a far more complete result. Pair a local model with a good harness when you can.
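
To put the figures above in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted in this summary (395GB model, 512GB unified memory, ~12 tokens/sec). The function names, the 6,000-token response length, and the 300-second timeout are hypothetical values chosen for illustration; they are not measurements from the video or settings of any particular harness. The estimate also ignores prompt-processing time, so real waits would be longer.

```python
# Figures quoted in the summary above.
MODEL_GB = 395          # size of the GLM 5.2 download
MEMORY_GB = 512         # unified memory on the M3 Ultra Mac Studio
TOKENS_PER_SEC = 12     # approximate generation speed

def headroom_gb(memory_gb=MEMORY_GB, model_gb=MODEL_GB):
    """Memory left after the weights, shared by the OS, apps, and context cache."""
    return memory_gb - model_gb

def generation_seconds(num_tokens, tokens_per_sec=TOKENS_PER_SEC):
    """Rough time to generate num_tokens, ignoring prompt processing."""
    return num_tokens / tokens_per_sec

def would_time_out(num_tokens, timeout_sec, tokens_per_sec=TOKENS_PER_SEC):
    """True if generation alone would exceed a (hypothetical) request timeout."""
    return generation_seconds(num_tokens, tokens_per_sec) > timeout_sec

if __name__ == "__main__":
    print(f"Headroom after weights: {headroom_gb()} GB")           # 117 GB
    # Hypothetical 6,000-token response against a hypothetical 300 s timeout.
    print(f"6,000 tokens take ~{generation_seconds(6000):.0f} s")   # ~500 s
    print(f"Exceeds a 300 s timeout: {would_time_out(6000, 300)}")  # True
```

The roughly 117GB left over after the weights is consistent with context topping out well before the model's nominal limit, and a multi-minute wait for a long response is the kind of delay that can trip a client-side timeout.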





