Published: 2026-04-16

Opus 4.7 Benchmarks: A Half-Step Up, and the Mythos Distillation Theory

Nick Saraev runs through Opus 4.7's benchmarks against 4.6, GPT 5.4, Gemini 3.1 Pro, and Mythos preview — and notices something strange: almost every improvement is mathematically about halfway between 4.6 and Mythos. His hypothesis: Opus 4.7 is Mythos preview distilled down and deployed on faster hardware, rather than a fundamentally new model.

Source video

"Claude Opus-4.7 Just Dropped, And..." by Nick Saraev · Watch on YouTube →

Key Takeaways

  • Opus 4.7 is better than 4.6 in essentially every benchmark — but the step up is consistently about half the distance between 4.6 and Mythos preview. On SWE-bench Pro (the main software engineering benchmark): 53.4% (4.6) → 64.3% (4.7) → ~75% (Mythos). A gain of 10.9 percentage points that lands almost exactly halfway — the midpoint of 53.4% and ~75% is 64.2%.
  • The same halfway pattern appears across multiple benchmarks. Nick finds this suspicious — genuine independent model improvements rarely produce such mathematically clean gaps. It suggests intentional calibration, not emergent performance.
  • Nick's Mythos distillation theory: Opus 4.7 is probably Mythos preview "basically just distilled, dummified down a little bit and running on a lot faster and better hardware." A smaller, faster version of the same model rather than a new architecture.
  • Agentic terminal coding shows a smaller step up: 65.4% (4.6) → 69.4% (4.7) → 82% (Mythos) — only 4 points of a 16.6-point gap, well under halfway. Nick thinks this is where the safety concerns from Mythos concentrate — Anthropic is reluctant to give the full agentic terminal capability to general users because this is the attack surface that allowed Mythos to compromise Chrome and multiple operating systems.
  • Anthropic's position on Mythos: they've described it as being like "giving kids nuclear weapons" — a model capable enough to autonomously compromise security systems. This is why they're not releasing Mythos directly, and why 4.7 ships instead as a distilled version that's meaningfully safer on the agentic dimensions.
  • GPT/Spud model expected within days of Opus 4.7 — the competitive cycle is tight enough that significant Anthropic launches are reliably followed by OpenAI responses within a week.

Benchmark Comparison

Benchmark                 Opus 4.6   Opus 4.7   Mythos Preview
SWE-bench Pro             53.4%      64.3%      ~75%
SWE-bench Verified        —          +10–11%    ~2× the gap
Agentic terminal coding   65.4%      69.4%      82%

Mythos preview figures are approximate, sourced from Nick's benchmark comparison scorecard in the video.
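The "halfway" claim is easy to check directly from the numbers in the table. A minimal sketch (the benchmark names and figures are taken from the table above; the Mythos values are the approximate ones Nick cites) that computes what fraction of the 4.6 → Mythos gap each 4.7 score covers:

```python
# What fraction of the 4.6 -> Mythos gap does Opus 4.7 cover?
# Figures from the benchmark table above; Mythos values are approximate.
benchmarks = {
    "SWE-bench Pro": (53.4, 64.3, 75.0),
    "Agentic terminal coding": (65.4, 69.4, 82.0),
}

for name, (v46, v47, mythos) in benchmarks.items():
    fraction = (v47 - v46) / (mythos - v46)
    print(f"{name}: 4.7 covers {fraction:.1%} of the 4.6 -> Mythos gap")
```

On SWE-bench Pro this comes out to roughly 50% of the gap — the near-exact midpoint Nick flags as suspicious — while agentic terminal coding covers only about a quarter of it, consistent with his read that the agentic capability was deliberately held back further.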

