Last updated: 2026-04-16

Email triage — Benchmark Sources & Consensus

Sorting incoming email, drafting replies, and flagging messages for human review.

Platforms tracked: Hermes · Ironclaw · OpenClaw · ChatGPT

Consensus across 0 sources

No formal benchmarks tracked yet — this is a common real-world task without a standardized eval. Community writeups welcome.

All Sources

We aggregate published benchmarks; we never run our own tests and never pick winners. Each row links back to the original publication.

Source | Date | Finding | Methodology | Quality
No sources yet for this task. Check back next week.

How we work

OpenClawDatabase aggregates and links to published benchmarks. We don't run our own tests, and we don't pick winners. Our weekly benchmark-aggregator routine scans 7+ live leaderboards (OpenRouter, Aider, SWE-bench, GAIA, LMSYS, BigCodeBench, MMLU-Pro) plus relevant Reddit and Hacker News threads, then writes structured entries into /assets/benchmarks.json. Every row here links back to the original publication.
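The write step of that routine can be sketched roughly as follows. This is a hypothetical illustration, not the site's actual code: the entry fields mirror the table columns on this page (Source, Date, Finding, Methodology, Quality), but the real schema of /assets/benchmarks.json, and the function name `append_benchmark_entries`, are assumptions.

```python
import json
import tempfile
from pathlib import Path

def append_benchmark_entries(new_entries, path):
    """Hypothetical sketch of the weekly write step: merge new structured
    entries into the JSON file, skipping duplicates. Field names here are
    assumed from the table columns on this page."""
    p = Path(path)
    existing = json.loads(p.read_text()) if p.exists() else []
    # De-duplicate on (source, date) so a re-run of the routine is idempotent.
    seen = {(e["source"], e["date"]) for e in existing}
    for entry in new_entries:
        key = (entry["source"], entry["date"])
        if key not in seen:
            existing.append(entry)
            seen.add(key)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(existing, indent=2))
    return existing

# Demo against a throwaway file rather than the real /assets/benchmarks.json.
tmp = Path(tempfile.mkdtemp()) / "benchmarks.json"
entries = append_benchmark_entries([{
    "source": "SWE-bench leaderboard",   # example value, not a real entry
    "date": "2026-04-16",
    "finding": "example finding",
    "methodology": "published leaderboard run",
    "quality": "high",
}], path=tmp)
```

Re-running the same call with the same entries leaves the file unchanged, which matters for a weekly routine that may rescan sources it has already recorded.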

