Published: 2026-04-16

Opus 4.7 Just Dropped — Was Opus 4.6 Intentionally Degraded to Hype It?

Nate Herk documents the Opus 4.6 quality regression in meticulous detail — a 73% collapse in thinking depth, models skipping file reads a third of the time, 12 times more user interruptions — then asks the uncomfortable question: did Anthropic deliberately degrade 4.6 to manufacture the contrast for 4.7's launch?

Source video

"Claude Opus 4.7 Just Dropped... Or Did It Really?" by Nate HerkWatch on YouTube →

Key Takeaways

  • The 4.6 regression was real and measurable. A senior AMD director analyzed nearly 7,000 Claude Code sessions and found thinking depth collapsed by 73%, from 2,200 characters of reasoning per task to just 600. The model started skipping file reads before edits, jumping straight into action without context.
  • The file-skip rate went from 6% to 33.7%, meaning roughly one in three edits was made without the model reading the file it was editing. This is the root cause of the code quality problems users complained about.
  • User interruptions increased 12×. When a model goes off the rails more often and needs constant correction, the claimed token savings from "less thinking" disappear: users burn tokens stopping and restarting sessions instead.
  • Hallucinations changed character: not just wrong answers but fabricated technical artifacts, including invented git commit hashes, non-existent package names, and fake API versions. The word "simplest" appeared 3× more often in output, a signal the model was optimizing for minimal effort rather than correctness.
  • Opus 4.7's claims directly address the 4.6 complaints: "handles long-running tasks with more rigor," "follows instructions more precisely," "verifies its own outputs before reporting back." Either Anthropic fixed what broke, or 4.6 was deliberately weakened so that 4.7 could look like a bigger jump.
  • Power users on the $200/month Max plan started canceling. When a session burns through your monthly budget in an hour because the model won't stop looping without intervention, the value proposition collapses.

The 4.6 Regression, By the Numbers

The AMD analysis is the most rigorous external measurement published on the regression. Key findings across ~7,000 sessions:

  • Thinking depth: 2,200 → 600 characters (−73%)
  • File-skip rate: 6% → 33.7% (model editing without reading first)
  • User interruptions: 12× increase
  • "Simplest" frequency: 3× increase in model outputs

Beyond the metrics, users reported premature task abandonment — the model stopping work mid-task without warning or explanation. This is particularly damaging for autonomous agent workflows that run unsupervised over long periods.

Is 4.7 Real or Staged?

The conspiracy argument: Anthropic intentionally degraded 4.6 in production (a server-side change, not a model change) to set a low baseline, then launched 4.7 against that baseline so the improvement appears larger. Evidence cited: the regression happened server-side with no model version change, and Anthropic acknowledged the issue only after significant public pressure.

The counter-argument: model behavior in production is complex, evaluation benchmarks don't capture all real-world failure modes, and regressions can happen from infrastructure changes unrelated to intentional capability decisions. Nate doesn't claim certainty either way — he presents both cases and lets viewers decide.

