Published: 2026-07-03
Deep dive

The Agent Skeleton: One Structure for Email, Insurance Appeals, and Taxes


Most agent demos stop at drafting emails and scheduling meetings — low-stakes work you could do yourself. Nate B Jones argues the real prize is high-trust paperwork like insurance appeals and taxes, and that you reach it with the same agent structure you'd use for email. He builds one nine-part "agent skeleton" live, then reuses it verbatim across three escalating builds, showing that domain (health vs. taxes) is an illusion — to the agent, it's always a mess-to-structure problem followed by a human gate.

Source video

"Every AI Agent Demo Stops at Email. I Pointed Mine at the Bills That Cost You Money." by Nate B Jones — Watch on YouTube →

Step-by-Step Breakdown

Reframe the problem: it's mess-to-structure, not domain-by-domain
We file life by domain — health, taxes, insurance — so each feels like a different problem. To an agent, they're identical: unstructured files that must become structured context before any delicate operation. The unopened tax folder and the un-appealed insurance denial are the same shape of problem. Fix the messy data first; the final action is the easy part.
Build for lift, not clicks
Don't chase the demo where the agent "sends the email" or "files the claim." The valuable agent is the one that does the heavy lifting — sorting bureaucracy, organizing the mess, getting everything ready — so that the human's final click is trivial. You can click the button; you need the agent to make clicking it safe.
The nine-part skeleton
Every build is the same nine primitives, in order: context pack (define exactly what the agent may read), ingest, chunk, normalize, store, retrieve, cite, export, and gate. The gate is the rule the whole video turns on: the agent may read, organize, draft, and cite — but it is never allowed to submit, pay, or sign. That guardrail is designed in from the start, not bolted on.
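As a minimal sketch of the gate described above (not the code from the video), the rule can be written as a deny-by-default check. The names `STAGES`, `ALLOWED`, `FORBIDDEN`, `GateError`, and `gate` are illustrative assumptions; only the stage names and the allowed/forbidden actions come from the text.

```python
# The nine primitives, in the order the text lists them.
STAGES = [
    "context_pack", "ingest", "chunk", "normalize", "store",
    "retrieve", "cite", "export", "gate",
]

# Actions the agent may perform vs. actions reserved for the human.
ALLOWED = {"read", "organize", "draft", "cite"}
FORBIDDEN = {"submit", "pay", "sign"}


class GateError(PermissionError):
    """Raised when the agent attempts an action it is not allowed to take."""


def gate(action: str) -> str:
    """Return the action if permitted; otherwise raise GateError.

    Anything not explicitly allowed is denied (an assumption of this
    sketch), so a new action type cannot slip through unnoticed.
    """
    if action in FORBIDDEN:
        raise GateError(f"'{action}' is reserved for the human")
    if action not in ALLOWED:
        raise GateError(f"'{action}' is not on the allow-list")
    return action


if __name__ == "__main__":
    print(gate("draft"))      # permitted
    try:
        gate("submit")        # blocked by design
    except GateError as err:
        print("blocked:", err)
```

Routing every side effect through one function like this is one way to make the guardrail part of the design rather than an afterthought.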
Build 1 — email & calendar (mistakes are cheap)
The agent gets a context pack (this thread, calendar constraints, the people involved) with one goal: prepare a reply and a proposed calendar hold — not send it. It ingests the thread, normalizes (dates become dates, people become people, time-zone mismatches surface), drafts the reply, builds the proposed hold — then stops and leaves a receipt: what sources it used, what it changed, what still needs approval. "AI handled it" becomes "I know what happened and I can trust it."
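The receipt can be sketched as a small record with the three fields the text names. The class name `Receipt`, its field names, and the file names in the example are hypothetical, chosen only for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Receipt:
    """What the agent leaves behind when it stops (hypothetical shape)."""
    sources_used: list[str] = field(default_factory=list)
    changes_made: list[str] = field(default_factory=list)
    needs_approval: list[str] = field(default_factory=list)

    def pending(self) -> bool:
        """True while anything still waits on a human decision."""
        return bool(self.needs_approval)

    def to_json(self) -> str:
        """Serialize the receipt so it can be saved next to the draft."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    receipt = Receipt(
        sources_used=["thread.eml", "calendar.ics"],        # made-up files
        changes_made=["drafted reply", "built proposed hold"],
        needs_approval=["send reply", "confirm calendar hold"],
    )
    print(receipt.to_json())
```

Keeping the receipt as plain data means the same structure can be reused unchanged in the later builds.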
The bridge everyone skips: reuse, don't restart
Going from an email agent to an insurance-appeal agent doesn't mean a new tool or system. You already own ingestion-with-source-anchors, normalization, the receipt, and the gate from Build 1. Those primitives don't care whether they're in a scheduling thread or a denial letter. This is the flywheel: every build adds a reusable skill to the shelf and makes the next build cheaper.
Build 2 — insurance appeals (real money, delicate)
Context pack: the denial letter, the insurer's real published policy documents, claim history, and supporting docs. Goal: not a "vibes-based" appeal letter but an inspectable case file. The agent chunks the denial into addressable pieces (date, denial reason, claim number, deadline, "what evidence would change this"), normalizes amounts and, critically, records every missing document as an explicit "missing" entry so gaps surface before a deadline. Everything is stored locally (a SQLite database plus a folder you can open). Because insurers must cite the exact policy language they rely on, retrieval is by structure rather than through a vector database: you already know the "address" of what's hurting you. The agent sanity-checks whether the cited section actually says what the letter implies (finding #1), then produces a timeline, a denial map, the governing policy language, an evidence checklist (have vs. missing), and a draft letter. The letter isn't the point; the citation-mapped evidence packet is, because you can validate every argument against its source. Then it stops.
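A rough sketch of local storage with retrieval by structure, using Python's built-in `sqlite3` module. The table layout, the function names, and the sample document and section identifiers are assumptions for illustration, not the schema used in the video.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local store. Pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS policy_sections (
               doc TEXT NOT NULL,
               section TEXT NOT NULL,
               text TEXT NOT NULL,
               source_file TEXT NOT NULL,
               PRIMARY KEY (doc, section)
           )"""
    )
    return conn


def add_section(conn, doc, section, text, source_file):
    """Store one policy section under its structural address."""
    conn.execute(
        "INSERT OR REPLACE INTO policy_sections VALUES (?, ?, ?, ?)",
        (doc, section, text, source_file),
    )
    conn.commit()


def retrieve(conn, doc, section):
    """Exact lookup by (doc, section); returns (text, source_file) or None."""
    return conn.execute(
        "SELECT text, source_file FROM policy_sections "
        "WHERE doc = ? AND section = ?",
        (doc, section),
    ).fetchone()


def evidence_checklist(required, on_hand):
    """Split required documents into 'have' and explicit 'missing' lists."""
    required, on_hand = set(required), set(on_hand)
    return {
        "have": sorted(required & on_hand),
        "missing": sorted(required - on_hand),
    }


if __name__ == "__main__":
    conn = open_store()
    # Placeholder policy text and identifiers, not real insurer language.
    add_section(conn, "policy-A", "4.2", "placeholder text", "policy-A.pdf")
    print(retrieve(conn, "policy-A", "4.2"))
    print(evidence_checklist(["denial letter", "referral"], ["denial letter"]))
```

A lookup that returns `None` is itself useful: it signals that the section the letter cites was not found in the stored policy, which is worth checking by hand.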
Build 3 — taxes (the flywheel pays off)
Same skeleton, same order, far faster because nothing is new. Ingest W-2s, 1099s, invoices, receipts, bank exports, mileage notes — much of it living in that same inbox from Build 1. Chunk into forms; normalize into a tax-year ledger (date, vendor, amount, category, source file). A citation guard won't let a deduction through without evidence — it points at the receipt or flags the line instead of guessing. The export is a reviewable packet (income summary, expense ledger, deduction-evidence map, missing docs, and questions for your CPA), not a completed 1040. It preps the folder and stops.
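The citation guard can be sketched as a filter over ledger rows: a row with no source file is flagged rather than passed through. The ledger fields follow the text (date, vendor, amount, category, source file); the class name, function name, and sample rows are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class LedgerEntry:
    """One normalized row of the tax-year ledger."""
    day: date
    vendor: str
    amount: Decimal                 # Decimal avoids float rounding on money
    category: str
    source_file: Optional[str]      # None means no evidence was found


def citation_guard(entries):
    """Split entries into (cited, flagged).

    An entry counts as cited only if it points at a source file;
    everything else is flagged for the human instead of being guessed.
    """
    cited, flagged = [], []
    for entry in entries:
        (cited if entry.source_file else flagged).append(entry)
    return cited, flagged


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        LedgerEntry(date(2025, 3, 4), "Office Supply Co", Decimal("42.10"),
                    "supplies", "receipts/2025-03-04.pdf"),
        LedgerEntry(date(2025, 5, 9), "Unknown vendor", Decimal("120.00"),
                    "travel", None),
    ]
    cited, flagged = citation_guard(rows)
    print(len(cited), "cited;", len(flagged), "flagged for review")
```

The flagged list maps naturally onto the "missing docs" and "questions for your CPA" parts of the export.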
The payoff: clean data lets you drop the expensive model
All three builds share clean, normalized data underneath. When dates are dates and every claim has an "address," you stop needing the most expensive model for most of the work — lightweight (even open-source) models handle it. Fix the dirty pile first, and model choice opens up.

Gotchas & Caveats

The gate is non-negotiable. The agent never submits, pays, or signs. If it sends a bad appeal on its own, you now have two problems — the denial and the mess the agent made.
Citations make review faster, not optional. You are responsible for what you send. Never fire an agent-generated packet at an insurer or the IRS unread.
Where money or health is involved, keep a professional in the loop. The agent turns your pile into a case file so a human (or CPA) can win — it doesn't win for you.
Everything stays local. In the demo, policy documents are real but patient details are synthetic; on your machine, your real files never leave. Local storage (SQLite + a folder) means you can open the sources and records yourself.
Don't build one-offs. A one-shot agent for taxes throws away the work. Build the reusable skeleton so each new domain is cheaper than the last.

Key Takeaways

High-trust paperwork (insurance, taxes, healthcare) and low-stakes email are the same problem to an agent: unstructured mess → structured context → gated human action.
The nine primitives — context pack, ingest, chunk, normalize, store, retrieve, cite, export, gate — are the whole build. You watch the same list run three times.
Design the gate from day one: read/organize/draft/cite is allowed; submit/pay/sign is not.
Store locally and retrieve by structure: for regulated documents you already know the exact address of the relevant text, so a structural lookup beats a vector database.
Clean normalized data is the real unlock: it lets cheaper, smaller models do most of the work.
Treat each build as a flywheel: reusable primitives make each new high-trust use case faster to build than you'd expect.