Published: 2026-06-25
Deep dive

4 Claude Code Upgrades That Make It Actually Make You Money

Chapters / key moments (click to jump — plays here on the page)

Nate Herk argues that Claude Code is tuned to make you feel productive, not to make you money — and patches the gap with four upgrades. He demos each one live while building a small product end to end: a /roast persona council that stress-tests an idea, a Playwright verification loop that forces Claude to check its own work, a /session-handoff skill that beats context rot, and parallel sub-agents driven by a /goal command graded by a separate evaluator model.

Source video

"I asked Claude Code to make me as much money as possible" by Nate Herk — Watch on YouTube →

Step-by-Step Breakdown

Upgrade 1 — Kill sycophancy with a /roast council
Models default to agreeing with you — research calls it sycophancy, and the "Elephant" study found models fail to push back on how you frame something ~88% of the time (vs ~60% for humans), getting worse the more memory/personalization they have. The fix is a /roast skill that spins up a council of personas — a contrarian hunting fatal flaws, an expansionist looking for upside, a first-principles thinker working from pure logic, a deep researcher pulling real market and competitor-pricing data, and a buyer who role-plays whether they'd actually pay. A judge synthesizes one verdict — green light, reshape, or kill — plus the single cheapest 48-hour test to validate the idea. In the demo a "$9/mo YouTube-transcript-to-LinkedIn-posts" idea scored 2/10 from the contrarian and was sent back to "reshape" with a concrete next step (DM 20–30 people in a niche before writing code).
Upgrade 2 — Make Claude verify its own work with a Playwright loop
"Finished" and "actually working" are not the same — an NYU review of ~1,600 Copilot-generated programs found ~40% had security vulnerabilities, and the failures are easy to miss until something breaks in front of a client. This upgrade is a methodology, not a single skill: prompt Claude to run a verification loop after it builds. In the demo Claude builds a landing page, starts a local server, drives it with Playwright CLI, screenshots every section at desktop and mobile viewports, then iterates until there are zero visible errors against an explicit "definition of done." A second pass stress-tests the form in a headed browser — submitting malformed emails, odd spacing, varied dropdowns — surfacing edge-case bugs (22/22 tests run, plus honest non-blocking notes like "no duplicate-email guard" and "lenient email validation").
Upgrade 3 — Manage context to dodge "context rot"
Every model degrades as a conversation grows — the "context rot" study tested 18 top models and all got worse on even simple tasks well before the window filled. Tools shown: /context to visualize exactly what's eating the window (MCP servers, skills, memory files, tool results), /clear to wipe and start fresh, and /compact (which he mostly avoids — slow). His preferred move is a custom /session-handoff skill: it writes a structured summary (what started, locked decisions, what shipped, key files, running state, verification status, open questions, where to pick up), then /copy → /clear → paste to resume in a clean window — or to hand the work to a different model or Codex. He starts a new session whenever context passes roughly a quarter-million tokens.
Upgrade 4 — Parallelize with sub-agents and a /goal evaluator
You are the bottleneck because Claude only points one direction at a time. Anthropic's own engineering team found a lead agent coordinating parallel sub-agents beat a single agent by 90%+ on their internal research eval. A sub-agent is a separate Claude with its own clean context that does one task and reports back — spin several up for any independent work (e.g. researching three topics at once). Layered on top is the /goal command: you set a completion condition and Claude works turn after turn until it's met, with a separate evaluator model checking each turn for "done = true." Claude doesn't get to declare itself finished — separating worker from judge fixes the same sycophancy problem from Upgrade 1.

Commands & Skills Shown

`/roast`

/roast I have this idea to make a $9/month tool where people drop in a YouTube link and the transcript becomes a week of LinkedIn posts

Purpose: Custom skill that pulls Claude out of agreement mode and convenes a persona council (contrarian, expansionist, first-principles, deep researcher, buyer) plus a judge that returns green-light / reshape / kill and the cheapest 48-hour validation test.

When to use: Before building anything or approving a plan — any decision you want stress-tested rather than rubber-stamped.

Playwright verification-loop prompt

After you build it, do not trust that it looks right. Verify yourself with
Playwright CLI before reporting back: start the local server, open the site,
screenshot each section individually at both viewports, iterate until there
are no visible errors and the form looks clean. Definition of done: ...

Purpose: Forces Claude to drive the real UI (screenshots, clicks, form submissions) and fix issues before handing work back.

When to use: Any build with an observable surface — landing pages, forms, flows. Tailor the verification to what you're building (a data pipeline verifies differently than a UI).

`/context`

/context

Purpose: Visualizes what's consuming the context window — MCP servers, skills, memory files, system tools, and tool results — and suggests where to reclaim tokens.

When to use: When sessions feel slow or sloppy, to triage before clearing.

`/clear`

/clear

Purpose: Wipes the conversation and resets the context window to zero.

When to use: After capturing a handoff summary, to escape context rot without losing your place.

`/session-handoff` (custom skill) + `/copy`

/session-handoff
/copy
/clear
# paste the handoff back into the fresh window

Purpose: Writes a structured resume-here summary (decisions, key files, running state, verification, open questions) so a cleared session — or a different model/Codex — picks up exactly where you left off.

When to use: Before clearing, instead of relying on /compact.

`/goal`

/goal <completion condition>

Purpose: Sets a finish line and lets Claude work turn after turn until met, with a separate evaluator model grading each turn for "done = true."

When to use: Long, well-defined tasks where you want autonomous iteration with an independent done-check rather than self-declared completion.

Problems These Upgrades Target

Problem: Claude agrees with everything (sycophancy)

Why it happens: Models are tuned to be liked; the "Elephant" study measured ~88% failure to push back on framing, worse with more memory/personalization.

Fix: The /roast council and the /goal evaluator both separate the worker from an independent judge.

Problem: Claude reports work as done that isn't

Why it happens: "Finished-looking" output isn't tested; in one anecdote an outreach agent claimed it sent hundreds of emails but had sent ~25%.

Fix: The Playwright verification + stress-test loop, with an explicit definition of done.

Problem: Long sessions get slow and dumb (context rot)

Why it happens: All models degrade as the window fills — often well before it's full.

Fix: /context to audit, /session-handoff + /clear to reset cleanly, and start fresh past ~250K tokens.

Gotchas & Caveats

The /roast skill triggered itself once without being asked — useful, but be explicit ("don't use the roast skill") when you want a plain answer for comparison.
Verification loops are build-specific: a landing page verifies with screenshots and clicks; a pipeline or edited video needs an entirely different check. There's no one magic button.
The generated landing page was visually clean but "AI-generic" — verification proves it works, not that it's well-branded. Design taste is still on you.
The skills and prompts are distributed through the creator's free Skool community, not a public repo — treat any downloaded skill the way you'd treat any third-party skill and review it before running.

Key Takeaways

Default Claude optimizes for feeling productive; deliberate scaffolding (council, verification, handoff, parallelism) is what makes the output trustworthy.
Separate the worker from the judge — a persona council and a /goal evaluator both beat letting the model grade itself.
Make verification a habit baked into your prompts, with an explicit definition of done, so "looks finished" becomes "proven working."
Treat context as a scarce desk, not an infinite drawer — audit with /context and reset with a handoff before quality quietly drops.
Fan out independent work to sub-agents; Anthropic's own data showed a 90%+ lift over a single agent doing everything serially.