Local / on-device agents — Benchmark Sources & Consensus
Running an agent entirely on local hardware (no cloud API calls).
Platforms tracked: Nemoclaw · Openclaw
Consensus across 2 sources
Local agents show promise: a $500 GPU setup rivals cloud Claude Sonnet on coding tasks, and harness choice on M3 Max hardware swings coding pass rates by more than 16 points.
All Sources
We aggregate published benchmarks; we never run our own tests and never pick winners. Each row links back to the original publication.
| Source | Date | Finding | Methodology | Quality |
|---|---|---|---|---|
| Hacker News | 2026-03-27 | A $500 local GPU using multi-solution generation with test-feedback filtering achieves performance comparable to Claude Sonnet on coding tasks | Multi-candidate solution sampling with iterative refinement; compared against Claude Sonnet API on coding benchmarks | medium |
| neuralnoise.com | 2026-04-28 | 17 model-quants × 5 harnesses on M3 Max: claude harness 3rd at 66.2%; Qwen3.6-27B+pi tops at 82.5% on 16 SE coding tasks | 17 quants × 5 harnesses × 16 SE tasks on local M3 Max 128GB; automated pass/fail | high |
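The Hacker News result relies on multi-candidate sampling with test-feedback filtering: generate several solutions, keep only those that pass the task's tests, and feed failures back into the next round. The post does not publish its code, so the sketch below is a minimal illustration; `model`, `run_tests`, and all parameter names are hypothetical.

```python
# Hypothetical sketch of multi-candidate sampling with test-feedback
# filtering. All names are illustrative, not from the original post.

def sample_solutions(task, n, model):
    """Ask the local model for n independent candidate solutions."""
    return [model(task, seed=i) for i in range(n)]

def filter_by_tests(candidates, run_tests):
    """Keep only candidates that pass the task's test suite."""
    return [c for c in candidates if run_tests(c)]

def solve(task, model, run_tests, n=8, rounds=3):
    """Sample, filter by tests, and retry with failure feedback."""
    feedback = ""
    for _ in range(rounds):
        candidates = sample_solutions(task + feedback, n, model)
        passing = filter_by_tests(candidates, run_tests)
        if passing:
            return passing[0]  # first candidate that passes all tests
        feedback = "\n# previous attempts failed the tests"
    return None  # no candidate passed within the round budget
```

The key cost trade-off: a cheap local model sampled N times plus a test harness can substitute for one expensive cloud call, provided the task ships with executable tests.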
How we work
OpenClawDatabase aggregates and links to published benchmarks; we don't run our own tests, and we don't pick winners. Our weekly benchmark-aggregator routine scans 7+ live leaderboards (OpenRouter, Aider, SWE-bench, GAIA, LMSYS, BigCodeBench, MMLU-Pro) plus relevant Reddit and Hacker News threads, then writes structured entries into /assets/benchmarks.json.
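The actual schema of /assets/benchmarks.json is not published here; the snippet below shows one plausible entry shape, with field names chosen to mirror the table columns above. Treat every key as an assumption.

```python
import json

# Hypothetical entry shape for /assets/benchmarks.json; the real schema
# is not published, so all field names here are illustrative.
entry = {
    "task": "local-on-device-agents",
    "source": "neuralnoise.com",
    "date": "2026-04-28",
    "finding": "Qwen3.6-27B+pi tops at 82.5% on 16 SE coding tasks",
    "methodology": "17 quants x 5 harnesses x 16 SE tasks on M3 Max 128GB",
    "quality": "high",
}

# Serialize a list of entries, as a JSON array file would hold them.
serialized = json.dumps([entry], indent=2)
```

Keeping one flat record per source row is what lets each table row link back to its original publication.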