Published: 2026-06-22

GLM 5.2 on a Mac Studio M3 Ultra: 395GB, 12 tok/s, 74K Context

Uploaded: 2026-06-22
Description: Bart Slodyczka runs the 395GB GLM 5.2 locally on a 512GB M3 Ultra Mac Studio, measuring ~12 tokens/sec and 74K context, and argues a good harness matters more than the model.


Bart Slodyczka fits the new GLM 5.2, a 395GB download, onto a Mac Studio M3 Ultra with 512GB of unified memory. The model was too new to load in LM Studio natively, so he ran it on a custom runtime, clocking ~12 tokens/sec and around 74K tokens of usable context before the machine ran out of headroom. His real takeaway is about harnesses, not hardware: the same local model produced a "very basic app" inside Pi Agent in Cursor but a far better result when driven by a strong harness like Claude Code.

Source video

"Running GLM 5.2 on Mac Studio M3 Ultra" by Bart Slodyczka

Key Takeaways

Hardware: M3 Ultra Mac Studio with 512GB unified memory; GLM 5.2 is a 395GB download that lives almost entirely in RAM.
Performance: ~12 tokens/sec generation and roughly 74,000 tokens of context before the machine hit its limit.
Runtime: the model was too new to run in LM Studio natively, so it needed a custom runtime to load at all.
Claude Code timed out: at ~12 tok/s, generation was slow enough that requests kept timing out inside Claude Code — so he demoed in Pi Agent inside Cursor instead.
The harness matters more than the model: a basic prompt to Pi Agent gave a basic app, while a strong harness like Claude Code turns a simple prompt into a far more complete result. Pair a local model with a good harness when you can.
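
To make the numbers above concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted in the takeaways (395GB, 512GB, ~12 tok/s). The 4,000-token response length and the 120-second timeout are hypothetical values chosen for illustration, not settings taken from the video or from Claude Code, and the estimate ignores prompt processing time, which would only make the wait longer.

```python
# Figures quoted in the takeaways above.
MODEL_GB = 395        # GLM 5.2 download size
MEMORY_GB = 512       # unified memory on the M3 Ultra Mac Studio
TOKENS_PER_SEC = 12   # approximate generation speed

# Hypothetical values for illustration only (not from the video).
EXAMPLE_OUTPUT_TOKENS = 4000
EXAMPLE_TIMEOUT_SEC = 120


def headroom_gb(memory_gb, model_gb):
    """Memory left once the model weights are resident."""
    return memory_gb - model_gb


def generation_seconds(output_tokens, tokens_per_sec):
    """Time to generate a response, ignoring prompt processing."""
    return output_tokens / tokens_per_sec


def exceeds_timeout(output_tokens, tokens_per_sec, timeout_sec):
    """True if generation alone would take longer than the timeout."""
    return generation_seconds(output_tokens, tokens_per_sec) > timeout_sec


if __name__ == "__main__":
    spare = headroom_gb(MEMORY_GB, MODEL_GB)
    wait = generation_seconds(EXAMPLE_OUTPUT_TOKENS, TOKENS_PER_SEC)
    print(f"Headroom after loading weights: {spare} GB")
    print(f"Time for {EXAMPLE_OUTPUT_TOKENS} tokens: {wait:.0f} s")
    print("Would exceed the example timeout:",
          exceeds_timeout(EXAMPLE_OUTPUT_TOKENS, TOKENS_PER_SEC,
                          EXAMPLE_TIMEOUT_SEC))
```

Under these assumptions, roughly 117GB remains for the operating system, the context, and everything else, and a 4,000-token response takes over five minutes to generate. That is consistent with the pattern described above: a harness that expects quick replies can give up long before a ~12 tok/s model finishes.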
