Last updated: 2026-04-18

What is SWE-bench?

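A benchmark that tests whether an AI agent can resolve real GitHub issues. Given a repository snapshot and an issue description, the agent must read the code and write a patch, which is then graded by running the project's own tests. It is often treated as the closest thing to a "can it ship code?" score. Agents are ranked by the percentage of issues they resolve. The full test set contains 2,294 issues drawn from 12 popular Python repositories, and many leaderboards also report smaller curated subsets such as SWE-bench Lite (300 issues) and SWE-bench Verified (500 issues).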
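The scoring rule can be sketched in a few lines. This is a minimal illustration, not the official evaluation harness: the `InstanceResult` record and the `is_resolved` / `resolve_rate` helpers are hypothetical names. It assumes each issue comes with two groups of tests, which the dataset labels FAIL_TO_PASS (tests the reference fix makes pass) and PASS_TO_PASS (tests that already passed and must keep passing). An issue counts as resolved only if both groups fully pass after the agent's patch is applied, and the score is resolved issues divided by all issues in the set.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    """Hypothetical, simplified record of one evaluated issue (not the harness's real format)."""
    instance_id: str
    fail_to_pass: dict[str, bool]  # test name -> passed after the agent's patch?
    pass_to_pass: dict[str, bool]  # test name -> still passing after the agent's patch?


def is_resolved(r: InstanceResult) -> bool:
    # Resolved only if there is at least one target test, every target test now passes,
    # and no previously passing test was broken by the patch.
    return (
        bool(r.fail_to_pass)
        and all(r.fail_to_pass.values())
        and all(r.pass_to_pass.values())
    )


def resolve_rate(results: list[InstanceResult], total_instances: int) -> float:
    # Divide by the size of the whole set, so issues with no result
    # (e.g. a patch that failed to apply) count as unresolved.
    resolved = sum(is_resolved(r) for r in results)
    return resolved / total_instances


if __name__ == "__main__":
    results = [
        InstanceResult("demo-1", {"test_fix": True}, {"test_old": True}),   # resolved
        InstanceResult("demo-2", {"test_fix": True}, {"test_old": False}),  # broke an old test
        InstanceResult("demo-3", {"test_fix": False}, {"test_old": True}),  # fix didn't work
    ]
    # Four issues in the set, one of which produced no result at all.
    print(resolve_rate(results, total_instances=4))  # 0.25
```

Dividing by the full set size rather than by the number of results returned keeps an agent from raising its score by skipping hard issues.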

