Last updated: 2026-04-18
What is SWE-bench?
SWE-bench is a benchmark that tests whether an agent can resolve real GitHub issues: the agent reads the repository, writes a patch, and must make the project's tests pass. It is the closest thing to a "can it ship code?" score. Agents are ranked by pass rate, the fraction of its 2,294 issues that they resolve.
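To make the ranking metric concrete, here is a minimal sketch of a pass-rate calculation. The function name `pass_rate`, the issue IDs, and the outcomes are hypothetical and purely illustrative; this is not the benchmark's own harness, only the arithmetic behind the score.

```python
def pass_rate(results: dict[str, bool]) -> float:
    """Return the fraction of issues whose patch made the project's tests pass."""
    if not results:
        raise ValueError("no results to score")
    # True counts as 1 and False as 0, so the sum is the number of resolved issues.
    return sum(results.values()) / len(results)


# Hypothetical per-issue outcomes: True means the agent's patch passed the tests.
example_results = {
    "issue-a": True,
    "issue-b": False,
    "issue-c": True,
    "issue-d": False,
}

rate = pass_rate(example_results)
print(f"{rate:.1%} resolved")  # prints "50.0% resolved"
```

On the real benchmark the same ratio would be taken over all 2,294 issues, so each issue is worth roughly 0.04 percentage points.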