Tool use / MCP — Benchmark Sources & Consensus
Measures a model's ability to select, call, and chain external tools (MCP servers, function calls) correctly.
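To make the measured capability concrete, here is a minimal, purely illustrative sketch of what "select, call, and chain" means: pick a tool by name, call it with valid arguments, and feed its output into the next call. The registry (`TOOLS`), the dispatcher (`run_chain`), and the `"$prev"` placeholder are hypothetical names for this sketch, not part of the MCP protocol or any real server.

```python
# Illustrative sketch only: TOOLS, run_chain, and "$prev" are invented
# for this example and do not correspond to any real MCP API.

TOOLS = {
    "search": lambda query: [f"result for {query}"],
    "summarize": lambda items: " | ".join(items),
}

def run_chain(calls):
    """Execute tool calls in order. When a call's argument is the
    placeholder "$prev", substitute the previous call's result."""
    prev = None
    for name, arg in calls:
        tool = TOOLS[name]  # tool selection by name
        prev = tool(prev if arg == "$prev" else arg)  # call, then chain
    return prev

# Chain two tools: search first, then summarize the search results.
print(run_chain([("search", "MCP benchmarks"), ("summarize", "$prev")]))
```

A benchmark for this task would score whether the model picks the right tool, supplies well-formed arguments, and wires outputs into subsequent calls.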
Platforms tracked: Claude Cowork · Openclaw · Hermes · ChatGPT
Consensus across 0 sources
No aggregated consensus yet; benchmark sources are still being gathered.
All Sources
We aggregate published benchmarks; we never run our own tests and never pick winners. Each row links back to the original publication.
| Source | Date | Finding | Methodology | Quality |
|---|---|---|---|---|
| No sources yet for this task. Check back next week. | — | — | — | — |
How we work
OpenClawDatabase aggregates and links to published benchmarks. We don't run our own tests, and we don't pick winners. Our weekly benchmark-aggregator routine scans 7+ live leaderboards (OpenRouter, Aider, SWE-bench, GAIA, LMSYS, BigCodeBench, MMLU-Pro) plus relevant Reddit and Hacker News threads, then writes structured entries into /assets/benchmarks.json. Every row here links back to the original publication.
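As a sketch of what one structured entry in /assets/benchmarks.json might look like: the field names below mirror the table columns above (Source, Date, Finding, Methodology, Quality), but the exact schema is an assumption for illustration, not the file's actual format.

```python
import json

# Hypothetical schema: field names follow the table columns on this page;
# the real /assets/benchmarks.json layout may differ.
entry = {
    "task": "tool-use-mcp",
    "source": "SWE-bench",   # one of the tracked leaderboards
    "date": "2025-01-01",    # publication date, ISO 8601
    "finding": "",           # summary of the published result
    "methodology": "",       # how the benchmark was run
    "quality": "",           # editorial quality rating
    "url": "",               # link back to the original publication
}

# Serialize the entry the way the aggregator routine might append it.
print(json.dumps(entry, indent=2))
```

Keeping every row keyed to a `url` back to the original publication is what lets each table row above link to its source.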