Deep dive
4 Claude Code Upgrades That Make It Actually Make You Money
Nate Herk argues that Claude Code is tuned to make you feel productive, not to make you money — and patches the gap with four upgrades. He demos each one live while building a small product end to end: a /roast persona council that stress-tests an idea, a Playwright verification loop that forces Claude to check its own work, a /session-handoff skill that beats context rot, and parallel sub-agents driven by a /goal command graded by a separate evaluator model.
"I asked Claude Code to make me as much money as possible" by Nate Herk — Watch on YouTube →
Step-by-Step Breakdown
-
Upgrade 1 — Kill sycophancy with a
/roastcouncilModels default to agreeing with you — research calls it sycophancy, and the "Elephant" study found models fail to push back on how you frame something ~88% of the time (vs ~60% for humans), getting worse the more memory/personalization they have. The fix is a
/roastskill that spins up a council of personas — a contrarian hunting fatal flaws, an expansionist looking for upside, a first-principles thinker working from pure logic, a deep researcher pulling real market and competitor-pricing data, and a buyer who role-plays whether they'd actually pay. A judge synthesizes one verdict — green light, reshape, or kill — plus the single cheapest 48-hour test to validate the idea. In the demo a "$9/mo YouTube-transcript-to-LinkedIn-posts" idea scored 2/10 from the contrarian and was sent back to "reshape" with a concrete next step (DM 20–30 people in a niche before writing code). -
Upgrade 2 — Make Claude verify its own work with a Playwright loop
"Finished" and "actually working" are not the same — an NYU review of ~1,600 Copilot-generated programs found ~40% had security vulnerabilities, and the failures are easy to miss until something breaks in front of a client. This upgrade is a methodology, not a single skill: prompt Claude to run a verification loop after it builds. In the demo Claude builds a landing page, starts a local server, drives it with Playwright CLI, screenshots every section at desktop and mobile viewports, then iterates until there are zero visible errors against an explicit "definition of done." A second pass stress-tests the form in a headed browser — submitting malformed emails, odd spacing, varied dropdowns — surfacing edge-case bugs (22/22 tests run, plus honest non-blocking notes like "no duplicate-email guard" and "lenient email validation").
-
Upgrade 3 — Manage context to dodge "context rot"
Every model degrades as a conversation grows — the "context rot" study tested 18 top models and all got worse on even simple tasks well before the window filled. Tools shown:
/contextto visualize exactly what's eating the window (MCP servers, skills, memory files, tool results),/clearto wipe and start fresh, and/compact(which he mostly avoids — slow). His preferred move is a custom/session-handoffskill: it writes a structured summary (what started, locked decisions, what shipped, key files, running state, verification status, open questions, where to pick up), then/copy→/clear→ paste to resume in a clean window — or to hand the work to a different model or Codex. He starts a new session whenever context passes roughly a quarter-million tokens. -
Upgrade 4 — Parallelize with sub-agents and a
/goalevaluatorYou are the bottleneck because Claude only points one direction at a time. Anthropic's own engineering team found a lead agent coordinating parallel sub-agents beat a single agent by 90%+ on their internal research eval. A sub-agent is a separate Claude with its own clean context that does one task and reports back — spin several up for any independent work (e.g. researching three topics at once). Layered on top is the
/goalcommand: you set a completion condition and Claude works turn after turn until it's met, with a separate evaluator model checking each turn for "done = true." Claude doesn't get to declare itself finished — separating worker from judge fixes the same sycophancy problem from Upgrade 1.
Commands & Skills Shown
/roast
/roast I have this idea to make a $9/month tool where people drop in a YouTube link and the transcript becomes a week of LinkedIn posts
Purpose: Custom skill that pulls Claude out of agreement mode and convenes a persona council (contrarian, expansionist, first-principles, deep researcher, buyer) plus a judge that returns green-light / reshape / kill and the cheapest 48-hour validation test.
When to use: Before building anything or approving a plan — any decision you want stress-tested rather than rubber-stamped.
Playwright verification-loop prompt
After you build it, do not trust that it looks right. Verify yourself with
Playwright CLI before reporting back: start the local server, open the site,
screenshot each section individually at both viewports, iterate until there
are no visible errors and the form looks clean. Definition of done: ...
Purpose: Forces Claude to drive the real UI (screenshots, clicks, form submissions) and fix issues before handing work back.
When to use: Any build with an observable surface — landing pages, forms, flows. Tailor the verification to what you're building (a data pipeline verifies differently than a UI).
/context
/context
Purpose: Visualizes what's consuming the context window — MCP servers, skills, memory files, system tools, and tool results — and suggests where to reclaim tokens.
When to use: When sessions feel slow or sloppy, to triage before clearing.
/clear
/clear
Purpose: Wipes the conversation and resets the context window to zero.
When to use: After capturing a handoff summary, to escape context rot without losing your place.
/session-handoff (custom skill) + /copy
/session-handoff
/copy
/clear
# paste the handoff back into the fresh window
Purpose: Writes a structured resume-here summary (decisions, key files, running state, verification, open questions) so a cleared session — or a different model/Codex — picks up exactly where you left off.
When to use: Before clearing, instead of relying on /compact.
/goal
/goal <completion condition>
Purpose: Sets a finish line and lets Claude work turn after turn until met, with a separate evaluator model grading each turn for "done = true."
When to use: Long, well-defined tasks where you want autonomous iteration with an independent done-check rather than self-declared completion.
Problems These Upgrades Target
Why it happens: Models are tuned to be liked; the "Elephant" study measured ~88% failure to push back on framing, worse with more memory/personalization.
Fix: The /roast council and the /goal evaluator both separate the worker from an independent judge.
Why it happens: "Finished-looking" output isn't tested; in one anecdote an outreach agent claimed it sent hundreds of emails but had sent ~25%.
Fix: The Playwright verification + stress-test loop, with an explicit definition of done.
Why it happens: All models degrade as the window fills — often well before it's full.
Fix: /context to audit, /session-handoff + /clear to reset cleanly, and start fresh past ~250K tokens.
Gotchas & Caveats
- The
/roastskill triggered itself once without being asked — useful, but be explicit ("don't use the roast skill") when you want a plain answer for comparison. - Verification loops are build-specific: a landing page verifies with screenshots and clicks; a pipeline or edited video needs an entirely different check. There's no one magic button.
- The generated landing page was visually clean but "AI-generic" — verification proves it works, not that it's well-branded. Design taste is still on you.
- The skills and prompts are distributed through the creator's free Skool community, not a public repo — treat any downloaded skill the way you'd treat any third-party skill and review it before running.
Key Takeaways
- Default Claude optimizes for feeling productive; deliberate scaffolding (council, verification, handoff, parallelism) is what makes the output trustworthy.
- Separate the worker from the judge — a persona council and a
/goalevaluator both beat letting the model grade itself. - Make verification a habit baked into your prompts, with an explicit definition of done, so "looks finished" becomes "proven working."
- Treat context as a scarce desk, not an infinite drawer — audit with
/contextand reset with a handoff before quality quietly drops. - Fan out independent work to sub-agents; Anthropic's own data showed a 90%+ lift over a single agent doing everything serially.





