ChatGPT Agent Mode — What It Does & How to Use It
Agent Mode is OpenAI's name for ChatGPT's autonomous task execution: it browses the web, runs code, opens files, and chains tools across multiple steps to complete a goal you describe in a single prompt. Where a regular ChatGPT response gives you text, Agent Mode gives you a finished artifact (a report, a comparison spreadsheet, a filled-in form, a refactored codebase). This page covers what it actually does today, where it falls short, how to use it safely, and what it costs.
Agent Mode = ChatGPT + a sandboxed browser + Python + file tools + a planning loop. Available on Plus, Pro, Business, and Enterprise. Best for research, data wrangling, and form-filling. Not yet reliable for high-stakes financial or security-sensitive actions — and there's no full undo.
What Agent Mode can do today
- Browse the live web — open URLs, click links, fill forms, screenshot pages, extract structured data. Reads JavaScript-rendered content (unlike web search through the API, which returns text only).
- Run code — Python in a sandboxed Code Interpreter environment. Read your uploaded files, process them, return results.
- Use multiple tools in sequence — search → click → extract → process → output. The planner decides which tool to invoke for each sub-step.
- Run long tasks unattended — Agent Mode runs can span 5–20 minutes of autonomous work. You can leave the tab and return to a finished result.
- Hand off to the user when stuck — when it hits a login wall or CAPTCHA, it pauses and hands control back to you; when an instruction is ambiguous, it asks a clarifying question before continuing.
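The search → click → extract chain and the hand-off behavior can be pictured as a simple loop. This is a toy sketch, not OpenAI's implementation: a fixed list of tool names stands in for the model's step-by-step planning, and `run_plan`, `NeedsUser`, `RunLog`, and the stand-in tools are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical tool type: takes the previous step's output, returns new output.
Tool = Callable[[str], str]


class NeedsUser(Exception):
    """Raised by a tool that hits a login wall, CAPTCHA, or ambiguous input."""


@dataclass
class RunLog:
    results: List[str] = field(default_factory=list)  # output of each finished step
    paused_on: Optional[str] = None  # set when the run hands off to the user


def run_plan(goal: str, plan: List[str], tools: Dict[str, Tool]) -> RunLog:
    """Run tools in plan order, feeding each one the previous step's output."""
    log = RunLog()
    current = goal
    for name in plan:
        try:
            current = tools[name](current)
        except NeedsUser as reason:
            # Stop the loop and hand control back, mirroring the pause-when-stuck behavior.
            log.paused_on = f"{name}: {reason}"
            break
        log.results.append(current)
    return log


# Stand-in tools for the demo (no real browsing or code execution happens).
def search(query: str) -> str:
    return f"results for {query}"


def extract(page: str) -> str:
    return page.upper()


def open_login_page(_: str) -> str:
    raise NeedsUser("login required")


TOOLS: Dict[str, Tool] = {"search": search, "extract": extract, "login": open_login_page}

if __name__ == "__main__":
    print(run_plan("rag papers", ["search", "extract"], TOOLS))
    print(run_plan("rag papers", ["search", "login", "extract"], TOOLS))
```

The second run stops at the `login` step and never reaches `extract`, which is the same reason a login wall ends a real run early.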
What Agent Mode can't reliably do (yet)
- Make purchases or financial transactions — Agent Mode will plan a checkout flow but stops at the "confirm purchase" step, asking you to complete it manually. This is a deliberate safety boundary, not a bug.
- Access local files outside the chat — Agent Mode runs in OpenAI's sandbox, not on your machine. It can only see files you've explicitly uploaded to the conversation.
- Use your existing browser sessions / cookies — every Agent Mode run starts with a clean browser. It cannot impersonate you on sites where you're already logged in.
- Get past strong bot detection — sites with aggressive anti-bot measures (Cloudflare Turnstile, hCaptcha) often block Agent Mode mid-task.
- Maintain state across separate runs — each Agent Mode session is independent. If you want persistent memory across runs, see the Memory feature guide.
How to start an Agent Mode run
- Open ChatGPT (web app or desktop).
- Below the input box, click the tools menu (icon next to attachments).
- Select Agent Mode. The input area expands and shows "Agent Mode active."
- Describe the task in one prompt. The more concrete, the better — Agent Mode is bad at "make a website" but good at "find the 5 most-recent papers on retrieval-augmented generation, summarize each in two sentences, and put them in a table I can copy into Notion."
- Watch the planning panel on the right — it shows each sub-step (browse, click, extract, run code) as Agent Mode executes. You can stop at any time.
- When Agent Mode completes, it returns a structured summary plus any artifacts (markdown tables, downloadable files, screenshots).
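The "be concrete" advice in step 4 boils down to naming the subject, the count, the per-item work, and the exact output shape. The helper below is a hypothetical sketch that assembles a prompt from those four parts; `TaskSpec`, `build_prompt`, and the column names in the demo are illustrative assumptions, not part of ChatGPT.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskSpec:
    subject: str        # what to find, e.g. "most-recent papers on X"
    count: int          # how many items to return
    per_item: str       # what to do with each item
    columns: List[str]  # exact column names for the output table


def build_prompt(spec: TaskSpec) -> str:
    """Assemble a concrete, single-prompt task description."""
    if spec.count < 1:
        raise ValueError("count must be at least 1")
    if not spec.columns:
        raise ValueError("name at least one output column")
    cols = ", ".join(spec.columns)
    return (
        f"Find the {spec.count} {spec.subject}. "
        f"For each one, {spec.per_item}. "
        f"Output only a markdown table with these columns: {cols}."
    )


if __name__ == "__main__":
    rag = TaskSpec(
        subject="most-recent papers on retrieval-augmented generation",
        count=5,
        per_item="summarize it in two sentences",
        columns=["Title", "Authors", "Date", "Summary"],
    )
    print(build_prompt(rag))
```

Spelling out the columns up front also pre-empts the formatting drift described under Common gotchas.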
Where Agent Mode genuinely earns its keep
From real-world testing in April 2026, these are the workflows where Agent Mode reliably beats both manual work and using regular ChatGPT:
- Competitive research: "Compare the pricing pages of these 8 SaaS companies on these 4 features. Output a markdown table."
- Data collection from public sources: "Pull the last 30 days of release notes from these 5 GitHub repos and group them by category."
- Form-filling and intake: "Read this PDF, extract these 12 fields, and fill out the corresponding fields in this Google Form. Show me a screenshot before submitting."
- Spreadsheet wrangling: "Read these 3 uploaded CSVs, find the rows where customer_id matches across all three, output a combined sheet."
- Booking research (not booking itself): "Find me 5 flights from JFK to LHR next Friday under $700, return-trip, no overnight layovers." Agent Mode does the research; you do the booking.
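For the spreadsheet-wrangling prompt, the work Agent Mode does in its Python sandbox amounts to an inner join on `customer_id` across three files. The following is a minimal sketch of that logic using only the standard library, assuming each CSV has a header row containing a `customer_id` column; the function names are hypothetical and this is not the code Agent Mode actually generates.

```python
import csv
import io
import sys
from typing import Dict, List, TextIO

Row = Dict[str, str]


def index_by_key(f: TextIO, key: str) -> Dict[str, Row]:
    """Read one CSV and index its rows by `key` (a later duplicate id wins)."""
    return {row[key]: row for row in csv.DictReader(f)}


def join_on_key(files: List[TextIO], key: str = "customer_id") -> List[Row]:
    """Keep only ids present in every file and merge their columns."""
    if not files:
        return []
    tables = [index_by_key(f, key) for f in files]
    shared = set(tables[0]).intersection(*tables[1:])
    combined = []
    for cid in sorted(shared):
        merged: Row = {}
        for table in tables:
            merged.update(table[cid])  # same-named columns: last file wins
        combined.append(merged)
    return combined


def write_rows(rows: List[Row], out: TextIO) -> None:
    """Write merged rows as CSV, with columns in first-seen order."""
    if not rows:
        return
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the three uploaded CSVs.
    a = io.StringIO("customer_id,name\n1,Ann\n2,Bo\n")
    b = io.StringIO("customer_id,plan\n2,pro\n3,plus\n")
    c = io.StringIO("customer_id,region\n2,US\n")
    write_rows(join_on_key([a, b, c]), sys.stdout)
```

With real uploads you would pass handles from `open(path, newline="")` instead of `io.StringIO`. Checking the agent's combined sheet against a join like this is a cheap way to confirm it did not silently drop or duplicate rows.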
Safety boundaries — what OpenAI built in
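Agent Mode has explicit guardrails that stop it from completing certain actions on its own, even if you ask. Depending on the category, it waits for your explicit confirmation, hands the final step back to you, or refuses outright: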
| Action category | Behavior |
|---|---|
| Purchases / payments | Plans the flow, stops at confirmation, asks user to complete manually |
| Posting public content (social media, forums) | Drafts the post in the chat, never submits without explicit confirmation |
| Email sending | Drafts the email, never sends — you copy and send yourself |
| Account creation / signup | Refuses; tells you to sign up yourself |
| Submitting forms with sensitive data | Halts at the field requesting SSN, payment, or ID and asks for confirmation |
| Following links from untrusted observed content | Refuses by default; treats observed instructions as untrusted |
These boundaries follow the same logic as our cross-platform prompt-injection guidance. They limit Agent Mode's usefulness for some workflows, but they're the right defaults given current LLM reliability.
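The table reduces to four outcomes: act freely, act only after confirmation, hand the last step to the user, or refuse. The gate below is a hypothetical sketch of that policy, useful if you build a similar guardrail into your own agent; the category names are invented labels modeled on the table rows, and OpenAI's actual enforcement is not public in this form.

```python
from enum import Enum


class Gate(Enum):
    ALLOW = "allow"        # agent acts on its own
    CONFIRM = "confirm"    # agent acts only after explicit user confirmation
    HAND_OFF = "hand_off"  # agent drafts or plans; the user does the final step
    REFUSE = "refuse"      # agent declines


# Hypothetical category names, modeled on the table above.
POLICY = {
    "browse": Gate.ALLOW,
    "run_code": Gate.ALLOW,
    "purchase": Gate.HAND_OFF,
    "post_public": Gate.CONFIRM,
    "send_email": Gate.HAND_OFF,
    "create_account": Gate.REFUSE,
    "sensitive_form_field": Gate.CONFIRM,
    "follow_untrusted_link": Gate.REFUSE,
}


def may_proceed(category: str, user_confirmed: bool = False) -> bool:
    """Return True only if the agent itself may perform this action."""
    gate = POLICY.get(category, Gate.CONFIRM)  # unknown actions fail closed
    if gate is Gate.ALLOW:
        return True
    if gate is Gate.CONFIRM:
        return user_confirmed
    return False  # HAND_OFF and REFUSE: the agent never completes the step


if __name__ == "__main__":
    print(may_proceed("post_public", user_confirmed=True))
    print(may_proceed("purchase", user_confirmed=True))
```

Defaulting unknown categories to `CONFIRM` rather than `ALLOW` is the conservative choice: a new kind of action gets a human check until someone classifies it.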
How much it costs
Agent Mode usage counts against your ChatGPT tier's monthly cap. The last column shows how each tier accounts for the individual tool calls within a run:
| Tier | Agent Mode runs/month | Tool-call accounting |
|---|---|---|
| Free | Not available | — |
| Plus ($20/mo) | ~50 runs | Included |
| Pro (~$200/mo) | Unlimited "fair use" | Included |
| Business (per seat) | 200+/seat/month | Pooled across seats |
| Enterprise | Custom | Custom |
A typical Agent Mode run uses 8–15 tool calls. Browsing-heavy runs (research, comparison) burn more; code-heavy runs (file processing) burn fewer. See the pricing deep-dive for worked examples.
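Combining the table with the 8–15 figure gives a quick budget estimate: about 50 typical runs on Plus works out to roughly 400–750 tool calls a month. The helper below just performs that multiplication for any run count; the name `monthly_tool_calls` is hypothetical, and it only reuses this guide's figures rather than modeling how OpenAI actually meters usage.

```python
from typing import Tuple

TYPICAL_CALLS_PER_RUN = (8, 15)  # range quoted in this guide


def monthly_tool_calls(runs: int, per_run: Tuple[int, int] = TYPICAL_CALLS_PER_RUN) -> Tuple[int, int]:
    """Estimate the (low, high) number of tool calls for a month of runs."""
    low, high = per_run
    if runs < 0 or low < 0 or low > high:
        raise ValueError("need runs >= 0 and 0 <= low <= high")
    return runs * low, runs * high


if __name__ == "__main__":
    # Plus: ~50 runs/month from the table above.
    print(monthly_tool_calls(50))  # (400, 750)
```

Browsing-heavy workloads sit near the top of that range and file-processing workloads near the bottom, so pass a narrower `per_run` tuple once you know your own mix.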
When to use Agent Mode vs a Custom GPT vs the API
| Use case | Best fit | Why |
|---|---|---|
| One-off multi-step research task | Agent Mode | No setup, full browsing, completes autonomously |
| Repeated task with a stable system prompt | Custom GPT | Save the prompt + tools, run it daily with one click |
| Programmatic / scripted task | OpenAI API | Lower per-token cost at volume, full control over the loop |
| Long-running scheduled task | Hermes or OpenClaw | Agent Mode is interactive; long-horizon agents are a different category |
| Coding inside an IDE | Kilo Code or Claude Code | Agent Mode is not IDE-integrated; dedicated coding agents have file-tree context |
Common gotchas
- Agent Mode "drifts" on long runs. Beyond 15–20 minutes of autonomous work, output quality drops sharply. Break large tasks into focused sub-tasks.
- Login walls stop it cold. Many target sites require authentication. Either feed Agent Mode the data directly (uploaded CSV, PDF) or pick public-data tasks.
- The "browse the web" panel sometimes hangs. If a run shows no activity for more than 2 minutes, click "Stop" and restart. It usually completes on the second try.
- Output formatting drift. If you ask for a markdown table and Agent Mode returns prose, add "Output only a markdown table with these columns: X, Y, Z" to the prompt.
- Privacy: Agent Mode interactions are logged in your ChatGPT history. For sensitive research, use a Project (Pro+) so the conversation is contained.
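If you reuse Agent Mode output in a script or pipeline, checking for formatting drift before trusting the result is worthwhile. This is a minimal sketch, assuming GitHub-style pipe tables; `is_markdown_table` is a hypothetical helper, and it deliberately ignores edge cases such as escaped pipes inside cells.

```python
import re
from typing import List

# A delimiter-row cell: dashes with optional alignment colons, e.g. ---, :--, --:
DELIMITER_CELL = re.compile(r"^:?-+:?$")


def split_row(line: str) -> List[str]:
    """Split a pipe-delimited row into trimmed cells (escaped pipes not handled)."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def is_markdown_table(text: str, columns: List[str]) -> bool:
    """Check that text is only a markdown table with exactly these columns."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    if split_row(lines[0]) != columns:
        return False
    delimiter = split_row(lines[1])
    if len(delimiter) != len(columns) or not all(DELIMITER_CELL.match(c) for c in delimiter):
        return False
    # Every remaining line must be a row with the same number of cells.
    return all("|" in ln and len(split_row(ln)) == len(columns) for ln in lines[2:])


if __name__ == "__main__":
    reply = "| X | Y | Z |\n|---|---|---|\n| 1 | 2 | 3 |\n"
    print(is_markdown_table(reply, ["X", "Y", "Z"]))
```

When the check fails, rerun the task with the stricter "Output only a markdown table with these columns: X, Y, Z" instruction rather than trying to parse prose.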