Deep dive
Archestra + Ollama: Sandbox and Block What Your Local AI Agents Can Do
Fahd Mirza runs Archestra — an open-source platform for running AI agents safely in production — and drives the whole thing with a local model through Ollama. Every MCP tool call the agent makes routes through Archestra, which runs each MCP server as an isolated Kubernetes pod and lets you set a per-tool policy: allow, require approval, or block. A single Docker command stands up the entire stack, and blocking a tool live shows how deterministic guardrails break the "lethal trifecta" that makes tool-calling agents dangerous.
"Control What Your AI Agents Can Do: Archestra + Ollama Hands-On" by Fahd Mirza (YouTube)
Step-by-Step Breakdown
-
Understand what Archestra sits in front of
When you connect an agent to MCP servers and let it call tools, you have little visibility into what it's touching and no easy way to stop it mid-run. Archestra sits between the agent and its tools, showing every call live and giving you a kill switch. It's built by the team behind Grafana OnCall, so production thinking is baked in — SSO, RBAC, a Terraform provider, and a Kubernetes operator all ship with it.
-
Pull and launch with one Docker command
Getting Archestra running is a single Docker pull followed by one run command that maps the UI to port 3000 and the API to 9001 and grants the container access to the Docker socket so it can spin up MCP servers. That one command stood up the entire platform: an embedded Kubernetes cluster via kind (Kubernetes-in-Docker) for running MCP servers as isolated pods, a Dagger engine for sandboxed code execution, a PostgreSQL database (with migrations run), the backend and frontend, and proxy routes for every major LLM provider, including Anthropic, OpenAI, AWS Bedrock, Ollama, and vLLM.
-
Sign in and connect a local model through Ollama
Open the UI at localhost:3000. In quick-start mode Archestra seeds default admin credentials (in a real deployment you'd wire this to your own SSO, such as Okta or Entra). Go to Add API key, choose Ollama (no key needed), and set the base URL to the Docker-internal host, not localhost, because Archestra runs inside Docker. Test and save, then open the Models tab and click Refresh models to surface your local ~27B model. Any tool-capable local model works; you do not need the H100/80 GB VRAM shown in the demo.
-
Add an MCP server from the registry
Go to Studio → MCP registry → Add MCP server. You can paste a JSON definition or pick from the online catalog. Fahd adds a "fast website reader" server. Archestra auto-discovers and classifies the tool — tagging it read-website, idempotent, open-world, and marking its result sensitive — and exposes a per-tool call policy right there.
-
Verify each MCP server is a sandboxed pod
Back in the terminal, listing the cluster's pods shows the fast website reader running as its own isolated pod in the self-hosted Kubernetes cluster — not just a spawned local process. Every MCP server Archestra runs gets its own sandboxed pod, and the same setup scales to any Kubernetes cluster (EKS, GKE, AKS, or on-prem).
-
Build a no-code agent and grant it the tool
Under Agents → Create agent, set a name, description, and instructions. By default a new agent has access to all tools; to scope it, edit the agent, set Capabilities → Custom, and Add just the fast website reader before clicking Update.
-
Run the agent and inspect the full trace
Ask the agent to read a website and summarize it. It searches for the matching tool, runs it, and returns a grounded answer. Opening the trace view shows the exact request made, the tool call, the command it ran, and the result returned — full transparency into what happened behind the scenes.
-
Block a tool live and watch the chain break
Go to MCP registry → guardrails and set the call policy to Block, then finish. Re-running the same agent chat now reports "no tools matched" — the MCP server can't be reached, so the model falls back to its own memory instead of touching the website. That's the deterministic guardrail breaking the "lethal trifecta": private-data access + exposure to untrusted content + the ability to communicate externally.
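The three per-tool policies can be pictured as a small decision function. The sketch below is a toy model in shell, not Archestra's implementation: the function name `decide_call`, the policy keyword `approval`, and the outcome labels are all hypothetical, chosen only to illustrate why a blocked tool behaves the same way no matter what the model asks for.

```shell
#!/bin/sh
# Toy model of a per-tool call policy (hypothetical; not Archestra's code).
# Usage: decide_call <policy>   where policy is allow | approval | block
decide_call() {
  case "$1" in
    allow)    echo "forward" ;;         # call goes straight to the MCP server
    approval) echo "hold-for-human" ;;  # call waits until someone approves it
    block)    echo "deny" ;;            # call never reaches the MCP server
    *)        echo "deny" ;;            # unknown policy: fail closed
  esac
}
```

The decision depends only on the configured policy, never on the prompt or the model's output, which is what makes this kind of guardrail deterministic.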
Commands & Code Shown
Fahd runs these on an Ubuntu box with Docker installed. The exact image name and full flags come from Archestra's own install docs — the commands below reflect what the video shows (port mappings and Docker-socket access).
docker pull the Archestra platform image
docker pull <archestra-platform-image>
Purpose: Downloads the Archestra platform image before first launch.
When to use: Once, up front. Requires a recent version of Docker installed on the host.
docker run — stand up the whole platform
# Publish the UI on 3000 and the API on 9001, and mount the Docker socket
# so the container can spin up MCP servers.
docker run \
  -p 3000:3000 \
  -p 9001:9001 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  <archestra-platform-image>
Purpose: Brings up the entire stack — the embedded kind Kubernetes cluster, the Dagger sandbox engine, PostgreSQL (with migrations), the backend/frontend, and LLM proxy routes for Anthropic, OpenAI, Bedrock, Ollama, and vLLM.
When to use: To run Archestra locally. The UI lands on localhost:3000, the API on 9001.
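Because the container has to bring up several components, it may be useful to wait until the UI answers before opening the browser. This is a minimal sketch, assuming `curl` is installed and that the UI responds over plain HTTP on port 3000 as described above. The helper name `wait_for_url` and its retry defaults are illustrative and not part of Archestra.

```shell
#!/bin/sh
# Poll a URL until it responds or the attempts run out (hypothetical helper).
# Usage: wait_for_url <url> [attempts] [delay_seconds]
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: fail on HTTP errors, -s: silent, --max-time: cap each attempt
    if curl -fs --max-time 2 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: wait_for_url http://localhost:3000 && echo "UI is up"
```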
List the MCP server pods
kubectl get pods
Purpose: Confirms each MCP server is running as its own isolated Kubernetes pod rather than a local process.
When to use: To verify sandboxing after adding an MCP server.
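To make that check scriptable, one option is to filter the pod listing by status. The sketch below assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, RESTARTS, AGE) within a single namespace. The helper name `running_pods` is illustrative, and the namespace Archestra uses for MCP server pods is not stated in the video, so you may need to add `-n <namespace>`.

```shell
#!/bin/sh
# Print the names of pods whose STATUS column is "Running".
# Reads `kubectl get pods --no-headers` output on stdin (hypothetical helper).
running_pods() {
  awk '$3 == "Running" { print $1 }'
}

# Example: kubectl get pods --no-headers | running_pods
```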
Common Errors & Fixes Covered
OpenTelemetry trace-export errors
Why it happens: Archestra tries to ship traces to an OpenTelemetry collector, which isn't configured in quick-start mode.
Fix: These errors are harmless. Ignore them for a local run, or wire up an OTel collector for production observability.
Local model missing from the Models tab
Why it happens: The model list hasn't been refreshed since the provider was added.
Fix: Click Refresh models on the Models tab and your local model shows up.
Ollama unreachable at localhost
Why it happens: Archestra runs inside Docker, so localhost points at the container, not your host's Ollama.
Fix: Set the Ollama base URL to the Docker-internal host address so the container can reach Ollama running on the host.
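One way to check a candidate base URL before saving it is to query Ollama's model-listing endpoint (`/api/tags`) from inside a container. The sketch assumes Ollama's default port 11434 and the `host.docker.internal` hostname, which Docker Desktop provides and which on Linux typically needs `--add-host=host.docker.internal:host-gateway` on the container. The video only says "Docker-internal host", so treat the exact hostname as an assumption. The helper name `ollama_tags_url` is illustrative.

```shell
#!/bin/sh
# Build the URL of Ollama's model-listing endpoint (/api/tags) for a given host.
# Usage: ollama_tags_url [host] [port]
ollama_tags_url() {
  host=${1:-host.docker.internal}   # assumed Docker-internal hostname
  port=${2:-11434}                  # Ollama's default port
  echo "http://${host}:${port}/api/tags"
}

# Example (run from inside a container): curl -fsS "$(ollama_tags_url)"
```

If the request is refused even with the right hostname, Ollama may be listening only on the host's loopback interface, which is its default. Setting `OLLAMA_HOST=0.0.0.0` before starting Ollama makes it listen on all interfaces.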
Gotchas & Caveats
- Use the Docker-internal host (not localhost) for the Ollama base URL: the platform runs inside Docker and can't reach your host's localhost directly.
- The seeded admin credentials are for quick-start only. For anything real, wire up SSO and RBAC before exposing it.
- Your local model must support tool use, or the agent can't call MCP tools at all. You don't need a high-end GPU — any commodity GPU with a tool-capable model works.
- Guardrails are enforced deterministically at the proxy layer, but you have to set the policy per tool. Blocking is what actually stops a call — review each tool's default policy rather than assuming it's locked down.
Key Takeaways
- Archestra is an open-source gateway that sits between your agent and its MCP servers, giving live visibility and a kill switch over every tool call.
- One Docker command stands up an embedded Kubernetes (kind) cluster, a Dagger sandbox, PostgreSQL, and LLM proxy routes for Anthropic, OpenAI, Bedrock, Ollama, and vLLM.
- Each MCP server runs as its own isolated Kubernetes pod: sandboxed, not a local process, and scalable to EKS/GKE/AKS.
- Per-tool call policies (allow / require approval / block) are the core guardrail; blocking a tool live stops the agent from reaching it.
- Deterministic guardrails are designed to break the "lethal trifecta" — private-data access, exposure to untrusted content, and the ability to communicate externally.
- You can run the whole thing locally and free with Ollama and any tool-capable model — no cloud API key required.





