👁️ Code Review Bot for Pull Requests
An agent that reviews PRs for bugs, style issues, and missing tests — posts comments like a thoughtful junior reviewer, not a nagging linter.
The problem
Linters catch formatting. Humans catch design. But there's a huge gap — bugs that linters miss and humans overlook at 5pm on Friday. Getting humans to review every PR doesn't scale; getting them to review thoroughly never scales. You want a second set of eyes that runs on every PR and catches the obvious-in-retrospect issues.
The outcome
Every PR gets an automated review within 60 seconds of opening. Comments are specific and linked to line numbers. It catches: off-by-one errors, missing null checks, unhandled error paths, tests that exercise the happy path but not the failure path. Humans still review — but they focus on design, not boilerplate.
Why IronClaw
IronClaw's sandbox is critical when you're giving an agent read access to your codebase. Skill allowlisting means the review skill can only read the diff and post comments — it can't exfiltrate source code or open new repos. For any company with IP concerns, this is non-negotiable.
Alternatives worth considering
- OpenClaw — Faster to set up, less strict sandboxing — fine for solo devs or open-source projects
- Claude Cowork — Paste the diff, get a review — no automation, but the quality is arguably better since you can add context
Setup steps
Step 1: Set up the GitHub webhook
PR opened → webhook → IronClaw receives the event. Scope the webhook to specific repos, not your whole org. Use a dedicated service account with read-only code access + comment permission.
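A minimal receiver sketch in Python (Flask), assuming the secret lives in a `GITHUB_WEBHOOK_SECRET` environment variable; `enqueue_review` is a placeholder for however your deployment hands the PR to IronClaw's review skill:

```python
import hashlib
import hmac
import os

from flask import Flask, abort, request

app = Flask(__name__)
WEBHOOK_SECRET = os.environ["GITHUB_WEBHOOK_SECRET"]

@app.post("/webhook")
def handle_webhook():
    # Reject anything that doesn't carry a valid GitHub HMAC signature.
    sig = request.headers.get("X-Hub-Signature-256", "")
    expected = "sha256=" + hmac.new(
        WEBHOOK_SECRET.encode(), request.data, hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(sig, expected):
        abort(401)

    # Only newly opened PRs kick off a review; everything else is ignored.
    payload = request.get_json()
    if (request.headers.get("X-GitHub-Event") == "pull_request"
            and payload.get("action") == "opened"):
        enqueue_review(payload["pull_request"])
    return "", 204

def enqueue_review(pr: dict) -> None:
    # Placeholder: forward the PR to IronClaw's review skill.
    print(f"queueing review for PR #{pr['number']}: {pr['html_url']}")
```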
Step 2: Write the review prompt carefully
Give the model a role: 'senior engineer doing PR review'. Tell it what to flag (bugs, security, missing tests) and what to ignore (style, formatting — that's the linter's job). Specificity matters — generic reviews are useless.
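One way to keep the prompt specific is to build it from explicit flag/ignore lists, so the scope is visible in code review rather than buried in prose. A sketch; `build_review_prompt` and the exact wording are illustrative, not IronClaw's API:

```python
# Flag/ignore lists are illustrative starting points; tune them per team.
FLAG = [
    "logic bugs", "off-by-one errors", "missing null checks",
    "unhandled error paths", "tests that skip the failure path",
]
IGNORE = ["formatting", "style", "naming", "design opinions"]

def build_review_prompt(diff: str) -> str:
    return (
        "You are a senior engineer reviewing a pull request.\n"
        f"Flag only: {', '.join(FLAG)}.\n"
        f"Ignore entirely: {', '.join(IGNORE)} (the linter owns those).\n"
        "Comment inline, citing file and line number. "
        "Keep each comment under 80 words.\n\n"
        f"Diff:\n{diff}"
    )
```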
Step 3: Add a 'no opinions on design' rule
Design debates should be human. The bot's value is catching what humans miss, not rehashing what humans will argue about. Explicitly forbid comments like 'consider refactoring this.'
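A cheap belt-and-suspenders measure is a post-filter that drops whatever slips past the prompt. This sketch is a heuristic with an illustrative phrase list; grow it from the comments your devs downvote:

```python
import re

# Phrases that usually signal a design opinion rather than a bug report.
DESIGN_PATTERNS = [
    r"\bconsider refactoring\b",
    r"\bmight be cleaner\b",
    r"\bpersonal preference\b",
    r"\barchitecture\b",
]
_design_re = re.compile("|".join(DESIGN_PATTERNS), re.IGNORECASE)

def keep_comment(comment: str) -> bool:
    # Drop the comment entirely if it matches any design-opinion pattern.
    return not _design_re.search(comment)
```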
Step 4: Batch the PR diff intelligently
Large PRs will blow your context budget. Split by file, review each, then summarize. Use Haiku for the per-file pass and Sonnet only for the summary.
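A sketch of that split-review-summarize flow using the Anthropic Python SDK. `build_review_prompt` is the builder from Step 2; the model aliases are illustrative, so pin whichever your account exposes:

```python
import re

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

def split_diff_by_file(diff: str) -> list[str]:
    # A unified diff starts each file's section with "diff --git".
    return [c for c in re.split(r"(?m)^(?=diff --git )", diff) if c.strip()]

def ask(model: str, prompt: str) -> str:
    resp = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

def review_pr(diff: str) -> str:
    # Cheap per-file pass first, then one pricier pass to merge findings.
    notes = [
        ask("claude-3-5-haiku-latest", build_review_prompt(chunk))
        for chunk in split_diff_by_file(diff)
    ]
    return ask(
        "claude-3-5-sonnet-latest",
        "Merge these per-file review notes into one concise PR review. "
        "Drop duplicates and anything weaker than a clear bug:\n\n"
        + "\n\n---\n\n".join(notes),
    )
```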
Step 5: Monitor the signal-to-noise ratio
Track what % of bot comments devs mark as helpful. Aim for >60%. Below that, tune the prompt. Nobody reads a reviewer that cries wolf.
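One way to get that number without extra tooling is to read 👍/👎 reactions on the bot's review comments via GitHub's REST API. A sketch, assuming devs react to comments; `BOT_LOGIN` and the token handling are placeholders:

```python
import requests

API = "https://api.github.com"
HEADERS = {
    "Authorization": "Bearer <token>",  # your service account's token
    "Accept": "application/vnd.github+json",
}
BOT_LOGIN = "review-bot[bot]"  # placeholder: your bot's GitHub login

def helpful_ratio(owner: str, repo: str) -> float:
    # Review comments come back with a per-comment reactions rollup.
    comments = requests.get(
        f"{API}/repos/{owner}/{repo}/pulls/comments",
        headers=HEADERS,
        params={"per_page": 100, "sort": "created", "direction": "desc"},
    ).json()
    up = down = 0
    for c in comments:
        if c["user"]["login"] != BOT_LOGIN:
            continue
        up += c["reactions"]["+1"]
        down += c["reactions"]["-1"]
    return up / (up + down) if up + down else 1.0  # 1.0 = no signal yet
```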
Example prompt
Review this PR diff as a senior engineer. Flag: logic bugs, missing null checks, unhandled errors, tests missing for non-happy paths. Skip: formatting, style, design opinions. Comment inline with line numbers. Keep each comment under 80 words.
Pitfalls to avoid
- Letting the bot block merges. Never. The bot advises; humans decide. A blocking bot becomes a rubber-stamp bot the day it hallucinates.
- Exposing source code to external APIs without approval. Check with security/legal. Proprietary code going to an API provider is a policy question. IronClaw's local-model mode sidesteps this entirely.
- Paying for Sonnet on every file. Cost can spiral on large PRs. Haiku handles 80% of reviews at 10% of the cost. Default to Haiku; escalate to Sonnet only when a risk trigger fires (see the sketch after this list).
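A sketch of such a trigger. The thresholds and path patterns are pure assumptions to tune against your own repos:

```python
# Escalate to the pricier model only when the PR looks risky.
SENSITIVE = ("auth", "crypto", "payment", "migration")

def pick_model(files_changed: int, paths: list[str]) -> str:
    risky = files_changed > 20 or any(
        marker in path for path in paths for marker in SENSITIVE
    )
    return "claude-3-5-sonnet-latest" if risky else "claude-3-5-haiku-latest"
```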
Cost breakdown (monthly)
| Item | Cost |
|---|---|
| Haiku API (50 PRs/week, avg 5 files) | $15–40 |
| Sonnet fallback for complex reviews | $5–30 |
| IronClaw hosting | $0 (self-hosted) |
Total: $20–80/month. Costs assume typical usage; heavy use can run higher.