Published: 2026-06-23

Sakana Fugu Ultra vs Claude Opus 4.8: 38-Task Battle Test


Nate Herk takes Sakana's viral Fugu Ultra, a single API that orchestrates frontier models (Opus, GPT, Gemini) like a multi-agent router, and runs it head-to-head against Claude Opus 4.8 across 38 tasks. The result: 36 ties and 2 wins for Opus, with Fugu roughly 4.5× slower and 5× more expensive. The ties are unsurprising, since Opus is one of the very models Fugu delegates to.
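As a quick sanity check on those ratios, the sketch below recomputes them from the totals reported in the key takeaways (357 vs 80 minutes, about $50 vs about $10). The variable names are illustrative, and the dollar figures are rounded in the source, so the cost ratio is approximate.

```python
# Totals as reported in the summary; the cost figures are rounded ("~$50", "~$10").
fugu_minutes, opus_minutes = 357, 80
fugu_cost_usd, opus_cost_usd = 50, 10

# How many times longer and costlier the Fugu runs were in aggregate.
slowdown = fugu_minutes / opus_minutes
cost_ratio = fugu_cost_usd / opus_cost_usd

print(f"Fugu took {slowdown:.2f}x as long as Opus")
print(f"Fugu cost about {cost_ratio:.1f}x as much as Opus")
```

The time ratio comes out just under 4.5, which matches the "about 4.5× slower" figure quoted in the video.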

Source video

"I Battle Tested Sakana Fugu's Fable Killer" by Nate Herk, on YouTube

Key Takeaways

  • Fugu is not a new LLM. It's a small "manager" model that breaks a task down and routes sub-tasks to frontier models (Opus, GPT-5.5, Gemini, and others), then has another model merge the results — a multi-agent system delivered as one API.
  • It runs inside Claude Code via a markdown config file plus an API key. Notably, Claude Code's context-window usage stays near zero through a long session, because responses are routed through Fugu's server rather than accumulating in Claude Code's own context.
  • The scoreboard: across 38 AI-generated, Codex-graded, mostly pass/fail tasks (puzzles, traps, specs, heavy algorithms), 36 ended in ties and Opus won 2. Fugu never clearly won — unsurprising, since Opus 4.8 is one of the models Fugu itself selects from.
  • Cost and speed are the story: Fugu's runs took 357 minutes total vs Opus's 80 minutes, and cost ~$50 vs ~$10 — about 4.5× slower and 5× pricier. Easy tasks Opus answered in ~6 seconds took Fugu several minutes.
  • The pattern isn't new. It's the same orchestration you already do pairing Claude Code sub-agents, or running Codex and Claude Code on one codebase — Fugu just automates the delegation. It differs from OpenRouter's Fusion API, which fans the same prompt to three models and judges/merges rather than splitting the task.
  • Honest takeaway: impressive benchmarks, but for knowledge work the cost and latency aren't worth it over a Claude Code or Codex subscription. The real value is for heavy, multi-team software development — and the broader skill of optimizing which model does which task is only getting more important.
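The difference between the two orchestration styles described above (splitting a task and routing each piece to one model, versus sending the same prompt to every model and judging the results) can be sketched in a few lines. This is a toy illustration only, not Fugu's or OpenRouter's actual implementation: the stub models, the `toy_plan` splitter, and the `pick_longest` judge are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]

def make_stub(name: str) -> Model:
    # Hypothetical stand-in for a real model call: it just tags the prompt.
    return lambda prompt: f"[{name}] {prompt}"

MODELS: Dict[str, Model] = {n: make_stub(n) for n in ("opus", "gpt", "gemini")}

def split_and_route(task: str,
                    plan: Callable[[str], List[Tuple[str, str]]],
                    merge: Callable[[List[str]], str]) -> str:
    """Manager style: split the task, send each sub-task to ONE model, merge."""
    parts = [MODELS[model_name](subtask) for model_name, subtask in plan(task)]
    return merge(parts)

def fan_out_and_judge(task: str, judge: Callable[[List[str]], str]) -> str:
    """Fan-out style: send the SAME prompt to every model, then judge."""
    candidates = [model(task) for model in MODELS.values()]
    return judge(candidates)

def toy_plan(task: str) -> List[Tuple[str, str]]:
    # Hypothetical planner: split on " and ", assign models round-robin.
    names = list(MODELS)
    subtasks = task.split(" and ")
    return [(names[i % len(names)], s) for i, s in enumerate(subtasks)]

def join_parts(parts: List[str]) -> str:
    return " | ".join(parts)

def pick_longest(candidates: List[str]) -> str:
    # Hypothetical judge: a real one would be another model call.
    return max(candidates, key=len)

task = "write parser and write tests"
routed = split_and_route(task, toy_plan, join_parts)
fanned = fan_out_and_judge(task, pick_longest)
print(routed)
print(fanned)
```

In this toy run the routed version makes two model calls (one per sub-task) while the fan-out version makes three (one per model). With real models, either style multiplies calls relative to asking a single model once, which is consistent with the cost and latency gap reported above.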
