Published: 2026-06-22

GLM 5.2 on a Mac Studio M3 Ultra: 395GB, 12 tok/s, 74K Context


Bart Slodyczka fits the new GLM 5.2, a 395GB download, onto a Mac Studio M3 Ultra with 512GB of unified memory. The model was too new to load natively in LM Studio, so he ran it on a custom runtime, clocking ~12 tokens/sec and about 74K tokens of usable context before the machine ran out of headroom. His real takeaway is about harnesses, not hardware: the local model produced a "very basic app" from a basic prompt inside Pi Agent in Cursor, whereas a strong harness like Claude Code turns a simple prompt into a far more complete result.

Source video

"Running GLM 5.2 on Mac Studio M3 Ultra" by Bart Slodyczka (YouTube)

Key Takeaways

  • Hardware: M3 Ultra Mac Studio with 512GB unified memory; GLM 5.2 is a 395GB download that lives almost entirely in RAM.
  • Performance: ~12 tokens/sec generation and roughly 74,000 tokens of context before the machine hit its limit.
  • Runtime: the model was too new to run in LM Studio natively, so it needed a custom runtime to load at all.
  • Claude Code timed out: at ~12 tok/s, generation was slow enough that requests kept timing out inside Claude Code, so he demoed in Pi Agent inside Cursor instead.
  • The harness matters more than the model: a basic prompt to Pi Agent gave a basic app, while a strong harness like Claude Code turns a simple prompt into a far more complete result. Pair a local model with a good harness when you can.
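
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. The memory, model-size, and speed figures come from the takeaways; the 120-second timeout is a purely hypothetical value chosen for illustration (the video summary does not state the actual timeout Claude Code applies), and the function names are made up for this example. It also ignores prompt processing time, the KV cache, and OS memory use, so treat the results as an optimistic estimate rather than a measurement.

```python
# Rough arithmetic on the figures quoted in the takeaways.
UNIFIED_MEMORY_GB = 512   # Mac Studio M3 Ultra unified memory (from the article)
MODEL_SIZE_GB = 395       # GLM 5.2 download size (from the article)
TOKENS_PER_SEC = 12.0     # approximate generation speed (from the article)

# ASSUMPTION: illustrative request timeout, not a documented Claude Code value.
HYPOTHETICAL_TIMEOUT_SEC = 120.0


def memory_headroom_gb(total_gb: float, model_gb: float) -> float:
    """Memory left after the weights are loaded (ignores OS and KV cache)."""
    return total_gb - model_gb


def generation_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate a response of the given length at a steady rate."""
    return output_tokens / tokens_per_sec


def max_tokens_before_timeout(timeout_sec: float, tokens_per_sec: float) -> int:
    """Largest response that finishes before a non-streaming request times out."""
    return int(timeout_sec * tokens_per_sec)


if __name__ == "__main__":
    print("Headroom (GB):", memory_headroom_gb(UNIFIED_MEMORY_GB, MODEL_SIZE_GB))
    print("Seconds for a 4,000-token reply:",
          round(generation_seconds(4000, TOKENS_PER_SEC)))
    print("Tokens that fit in the hypothetical timeout:",
          max_tokens_before_timeout(HYPOTHETICAL_TIMEOUT_SEC, TOKENS_PER_SEC))
```

Under these assumptions, only about 117GB remains for context and everything else once the weights are loaded, and a reply of a few thousand tokens takes several minutes to generate. That is consistent with both the ~74K context ceiling and the timeouts described above, although the real limits depend on details the summary does not give.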
