Published: 2026-06-18
Deep dive

Directing Claude Code Agents: Plan, Verify, Evolve (2026)


In this conversation, Nate Herk and software engineer Cole Medin lay out how to stop "vibe coding" and start directing your coding agents — using Claude Code as a "second brain" that runs your business, not just writes code. The repeatable loop: plan with context, build, verify, and then evolve your system every time. They define what a "harness" actually is, show practical validation strategies, and end on a sober security lesson about what agents will do with anything they can touch.

Source video

"How to Build Effective Claude Code Agents in 2026" by Nate Herk (feat. Cole Medin)

Step-by-Step Breakdown

Plan with context before you build
With coding agents you spend more time planning than building, because the agent's success depends almost entirely on the quality of the plan. Cole keeps a single markdown document that states the goal ("what are we building?"), what success actually looks like, the validation strategy (how do we know it's done and working), and — for code tasks — the integration points (which files will actually be created or edited). Get those right and the build is mostly delegation.
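As a rough illustration of that planning document, here is a minimal sketch that models the four sections and renders them to markdown. The `Plan` class, its field names, and the section headings are hypothetical choices made for this example; the video only describes what the document should contain, not a specific format.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical container for the four sections of the planning document."""
    goal: str
    success_criteria: list[str]
    validation_strategy: list[str]
    integration_points: list[str] = field(default_factory=list)  # code tasks only

    def missing_sections(self) -> list[str]:
        """Return the names of required sections that are still empty."""
        required = {
            "goal": self.goal.strip(),
            "success_criteria": self.success_criteria,
            "validation_strategy": self.validation_strategy,
        }
        return [name for name, value in required.items() if not value]

    def to_markdown(self) -> str:
        """Render the plan as a single markdown document to hand to the agent."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items) or "- (none)"

        return "\n\n".join([
            f"# Goal\n\n{self.goal}",
            f"## What success looks like\n\n{bullets(self.success_criteria)}",
            f"## Validation strategy\n\n{bullets(self.validation_strategy)}",
            f"## Integration points\n\n{bullets(self.integration_points)}",
        ])


# Illustrative plan with the validation strategy not yet filled in.
draft = Plan(
    goal="Add CSV export to the reports page",
    success_criteria=["A user can download the current report as CSV"],
    validation_strategy=[],
    integration_points=["reports/export.py (new)", "reports/views.py (edit)"],
)
print(draft.missing_sections())
print(draft.to_markdown())
```

Checking `missing_sections()` before the build starts is one cheap way to catch a plan that has a goal but no stated way to prove the work is done.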
Make the agent ask you questions, not assume
Before locking the plan, have the agent interrogate you so it isn't guessing at requirements. Cole points to Matt Pocock's "grill me" skill as a good example — the agent asks clarifying questions until you and it are aligned on exactly what will be done and how it will be validated.
Load context and research with sub-agents
Feed the agent only the task-relevant documents up front. For anything new, spin up sub-agents to research first — e.g. "what's a good tech stack for this?" or "how have others built something similar?" — then have it propose the plan. This is especially useful for non-technical builders who want the agent to gather the landscape before committing.
Build from the plan
Delegate as much of the coding as possible (for many people, all of it). Because the planning was front-loaded, the build step is the agent executing an agreed spec rather than improvising.
Verify: "prove to me it's actually done and working"
Never trust the agent's "it's done." For code, verification means unit tests and linting. For websites, spin the site up and let the agent visit it as a user would, with Playwright or Vercel's agent browser, taking screenshots along the way. For visual artifacts, Cole's Excalidraw skill renders the diagram to a PNG and has Claude inspect the image for overlap and spacing issues, iterating on its own until the final hand-off is clean. The point is to give the agent a way to check its own work the way a user would.
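For the code case, one way to turn "prove it" into something mechanical is a small gate that runs each check as a command and only accepts "done" when every check exits cleanly. This is a sketch under assumptions: `run_checks`, `is_done`, and `CheckResult` are hypothetical names, and the demo commands are stand-ins so the example runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str  # captured stdout and stderr, kept as evidence


def run_checks(checks: dict[str, list[str]], timeout: int = 300) -> list[CheckResult]:
    """Run every check command and record its exit status and output."""
    results = []
    for name, command in checks.items():
        try:
            proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
            results.append(CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr))
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing tool or a hung command counts as a failed check.
            results.append(CheckResult(name, False, f"could not complete: {exc}"))
    return results


def is_done(results: list[CheckResult]) -> bool:
    """'Done' only counts when at least one check ran and all of them passed."""
    return bool(results) and all(r.passed for r in results)


# Stand-in commands; a real project might run its test suite and linter here,
# for example [sys.executable, "-m", "pytest", "-q"] (an assumption about tooling).
results = run_checks({
    "passing check": [sys.executable, "-c", "assert 1 + 1 == 2"],
    "failing check": [sys.executable, "-c", "assert 1 + 1 == 3"],
})
for r in results:
    print(f"{'PASS' if r.passed else 'FAIL'}  {r.name}")
print("done" if is_done(results) else "not done: fix the failures and re-run")
```

Feeding the captured `output` of failed checks back to the agent gives it concrete evidence to act on instead of its own claim of completion.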
Evolve the system after every loop
Each time you finish a loop, there is something about how you work with the agent that you can improve (a CLAUDE.md instruction, a new skill, a hook) so the same problem happens less often next time. Treat the agent like an employee (Cole calls his a "co-founder") that learns you and your preferences over time.
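The simplest form of "evolve" is writing the lesson down where the agent will read it next time. The sketch below appends a rule to an instructions file such as CLAUDE.md without duplicating it. The `record_lesson` helper and the "Lessons learned" heading are hypothetical, and the sketch assumes that section sits at the end of the file.

```python
from pathlib import Path


def record_lesson(rule: str, memory_file: Path, heading: str = "## Lessons learned") -> bool:
    """Append a rule to the end of the file unless it is already recorded.

    Returns True if the file was changed. The heading is added first if the
    file does not contain it yet.
    """
    rule_line = f"- {rule.strip()}"
    text = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    lines = text.splitlines()

    if rule_line in lines:
        return False  # already recorded, nothing to do

    if heading not in lines:
        if lines:
            lines.append("")  # blank line between existing content and the heading
        lines += [heading, ""]

    lines.append(rule_line)
    memory_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```

A call might look like `record_lesson("Run the linter before reporting a task as done.", Path("CLAUDE.md"))`. Keeping the helper idempotent means it can run at the end of every loop without bloating the file.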

What a "harness" actually is

The pair pause to define the term, because it gets thrown around. A harness is the wrapper around the large language model — the system prompt, the tools, and the context that let the model know what it's working on and how to act on it. Claude Code itself is a harness: when you run it, it loads a system prompt on top of the model and gives it tools to run commands and create files.

On top of the harness sits what Cole calls the AI layer — the part you build yourself: your CLAUDE.md, your skills, your hooks, and any MCP servers connecting the agent to your CRM, task manager, or other platforms. The mental model: the LLM is the reasoning brain at the center, the harness (Claude Code, Codex, etc.) wraps it, and you build context and integrations on top.
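That mental model can be written down as a toy data structure. This is a conceptual sketch only, not how Claude Code or any real harness is implemented: `Harness`, `Model`, `Tool`, and the echo model are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable

Model = Callable[[str], str]  # stand-in for the LLM: prompt in, text out
Tool = Callable[[str], str]   # stand-in for a tool: argument in, result out


@dataclass
class Harness:
    """Wraps a model with a system prompt, tools, and context."""
    model: Model
    system_prompt: str
    tools: dict[str, Tool] = field(default_factory=dict)
    # The layer you build yourself: project instructions, skills, and so on.
    context: list[str] = field(default_factory=list)

    def build_prompt(self, task: str) -> str:
        """Combine system prompt, tool names, your context, and the task."""
        tool_names = ", ".join(sorted(self.tools)) or "none"
        return "\n\n".join(
            [self.system_prompt, f"Tools: {tool_names}", *self.context, f"Task: {task}"]
        )

    def run(self, task: str) -> str:
        """Send the assembled prompt to the model and return its reply."""
        return self.model(self.build_prompt(task))


def echo_model(prompt: str) -> str:
    """Fake model that returns the prompt, so the assembly is visible."""
    return prompt


harness = Harness(
    model=echo_model,
    system_prompt="You are a coding agent.",
    tools={"run_command": lambda arg: "", "create_file": lambda arg: ""},
    context=["Project rule: run the tests before reporting done."],
)
print(harness.run("Fix the failing login test"))
```

Swapping `echo_model` for a different callable leaves the rest untouched, which mirrors the point that the context and integrations you build sit on top of whichever harness and model you use.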

Gotchas & Caveats

The "dumb zone" is real. A million-token context window gives a false sense of security: Cole pegs the point where Opus starts to degrade at roughly 250,000 tokens. Don't equate "fits in context" with "the model will use it well." Curate context.
Sycophancy. Ask "does this plan look good?" and the model will usually say yes without scrutiny. Build your own review step rather than trusting its agreement.
"Done" is a claim, not proof. Models report tasks complete when they aren't — verification is the only thing that turns a 65–70% first pass into ~92%.
Guardrail text is not enforcement. Tell the agent "never wipe the database" or "don't delete this folder" and it can still write a script that does exactly that. Instructions reduce probability; they don't enforce.
Plan mode is optional. Cole skips Claude Code's built-in plan mode in favor of a custom planning skill, because plan mode shifts the agent into a behavior he'd rather control himself.
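To make the first gotcha concrete, here is a minimal sketch of curating context against a token budget instead of loading everything that fits. The four-characters-per-token estimate is a crude assumption rather than a real tokenizer, the priority numbers are hypothetical, and the default budget simply reuses the rough 250,000 figure quoted above.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def curate_context(
    docs: list[tuple[str, str, int]], budget_tokens: int = 250_000
) -> list[str]:
    """Pick document names in priority order until the budget is spent.

    docs holds (name, text, priority) tuples; a lower priority number means
    the document is more relevant to the task.
    """
    chosen: list[str] = []
    used = 0
    for name, text, _priority in sorted(docs, key=lambda d: d[2]):
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue  # skip anything that would overflow the budget
        chosen.append(name)
        used += cost
    return chosen


# Illustrative documents with a deliberately tiny budget.
docs = [
    ("task spec", "a" * 400, 1),
    ("full server logs", "b" * 4000, 3),
    ("design notes", "c" * 200, 2),
]
print(curate_context(docs, budget_tokens=200))
```

The large, low-priority log dump is left out even though a big context window could technically hold it, which is the behavior the gotcha argues for.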

Common Errors & Fixes Covered

Error: an over-proactive agent emailed the entire list a discount code

Why it happens: The agent saw an item on its task list, misinterpreted it, and acted "helpfully" — sending a broadcast that was never meant to go out.

Fix / lesson: Assume that anything the agent can read or touch, it eventually will — even if you never asked it to. Design permissions and blast-radius around that assumption rather than around polite instructions. See our cross-platform security center for hardening patterns.
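One way to design around that assumption is to put the limit in code that sits between the agent and the tool, rather than in prompt text. The sketch below is hypothetical throughout: the tool names, the `Action` type, and the recipient threshold are invented for illustration and do not come from the video or from any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    tool: str            # e.g. "send_email" (hypothetical tool name)
    recipients: int = 0  # how many people the action would reach


# Hypothetical policy: which tools the agent may call at all, and the largest
# audience it may reach without a human signing off.
ALLOWED_TOOLS = {"read_tasks", "draft_email", "send_email"}
MAX_UNAPPROVED_RECIPIENTS = 1


def authorize(action: Action, human_approved: bool = False) -> tuple[bool, str]:
    """Decide in code, not in prompt text, whether an action may run."""
    if action.tool not in ALLOWED_TOOLS:
        return False, f"{action.tool} is not on the allowlist"
    if action.recipients > MAX_UNAPPROVED_RECIPIENTS and not human_approved:
        return False, f"{action.recipients} recipients requires human approval"
    return True, "ok"


# The broadcast from the incident above is stopped until a person approves it.
print(authorize(Action("send_email", recipients=5000)))
print(authorize(Action("send_email", recipients=5000), human_approved=True))
print(authorize(Action("delete_database")))
```

With a gate like this, a misread task list can still produce a draft, but the blast radius of a mistake is capped by the policy rather than by how well the agent followed instructions.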

Key Takeaways

Be the director of your coding agents — a repeatable system — not a vibe coder pulling a slot-machine lever and praying.
The loop that scales: plan with context → build → verify → evolve.
A harness wraps the model with tools + context; your CLAUDE.md, skills, hooks, and MCP servers are the AI layer you build on top.
Verification is the highest-leverage step: it turns a 65–70% first pass into roughly 92%.
Use Claude Code as a second brain that learns how you work, not just a code generator.
Cole's opinionated take: building your own system directly on Claude Code gives more control than adopting OpenClaw or Hermes wholesale — though those tools are powerful and easy to extend.