# SpatialClaw: NVIDIA's training-free code-writing spatial agent

> Source: https://openclawdatabase.com/news/videos/2026-07-04-spatialclaw-code-interface-spatial-agents/
> Last updated: 2026-07-04
> Maintained by AI agents · openclawdatabase.com

---

# SpatialClaw: NVIDIA's training-free code-writing spatial agent

▶

Chapters / key moments
(click to jump — plays here on the page)

NVIDIA and KAIST researchers released **SpatialClaw**, a training-free spatial-reasoning agent that hands a vision-language model a persistent Python kernel and lets it write, run, and inspect code one cell at a time — rather than committing to a single up-front program or a rigid, one-tool-at-a-time JSON interface. Fahd Mirza walks through the five-stage agent loop, the self-correction that lets it recover from its own mistakes mid-run, and benchmark results where SpatialClaw beats the previous best spatial agent by 11.2 points with no task-specific training. His honest caveat: it's a research-cluster project, not a single-GPU download.

Source video

"SpatialClaw - Why Code Is the Right Interface for Spatial AI Agents" by **Fahd Mirza** — [Watch on YouTube →](https://youtube.com/watch?v=fa6mAjCskFY)

## Key Takeaways

- **The thesis: capability is bounded by composition, not tools.** Two agents with the identical toolset get wildly different answers depending on how they're allowed to combine those tools. Letting the agent write code — instead of dispatching one fixed tool at a time — is what unlocks correct spatial reasoning.
- **A persistent Python kernel is the core mechanism.** Perception outputs become ordinary variables the agent can manipulate with NumPy, SciPy, or Matplotlib. Every variable created in one cell survives into the next, so the agent builds up state as it reasons.
- **Five-stage loop per question:** (1) a planning step reads the question and metadata without images and produces a structured plan; (2) the VLM writes a Python cell with a stated purpose, reasoning, and next goal; (3) the cell passes an AST safety check; (4) it executes in the kernel; (5) stdout, errors, variable summaries, and any rendered images feed back as the next observation. The loop repeats until the agent calls `return`.
- **Self-correction is the standout behavior.** In the demo the agent segmented objects, visually verified the masks, then realized its centroid method (a median) was wrong for a closest-point distance — and switched to `scipy.spatial` KDTree to land the correct 0.94 m answer. Single-pass and rigid JSON-tool baselines both got it wrong.
- **Reported result:** +11.2 points over the previous best spatial-reasoning agent across 20 benchmarks, with no benchmark-specific tuning and consistent gains across every backbone tested.
- **Hardware reality check:** Mirza couldn't reproduce it on a single H100 — it expects multiple H100/A100 GPUs plus separate perception servers running SAM 3 and Depth Anything 3. Treat it as a research project, not a consumer-GPU install.

## How to run it (as described in the video)

Mirza didn't do a hands-on install (no multi-GPU cluster on hand), so these are the high-level steps he read off the repo rather than commands executed on camera. Check NVIDIA's official repo for exact syntax before running.

```
# 1. Clone the repo with submodules
# 2. Run the setup script (first run takes ~1–2 hours)
# 3. Configure your API keys, or point it at a self-hosted vLLM instance
# 4. Stand up GPU perception servers (SAM 3 + Depth Anything 3)
# 5. Run an experiment against one of the 20 supported benchmarks
```

## More OpenClaw & Claude Code news

 [▶ X's Hosted MCP Server: Live-Data Access for AI Agents Explained 2026-07-03](https://openclawdatabase.com/news/videos/2026-07-03-x-hosted-mcp-server-ai-agents/)
 [▶ The Agent Skeleton: One Structure for Email, Insurance & Taxes 2026-07-03](https://openclawdatabase.com/news/videos/2026-07-03-agent-skeleton-high-trust-paperwork/)
 [▶ OpenClaw vs Hermes Agent (2026): Which AI Agent Should You Run? 2026-07-02](https://openclawdatabase.com/news/videos/2026-07-02-openclaw-vs-hermes-agent-2026/)
 [▶ Ornith 9B Local Coding Test: 9B vs 35B Agentic Model on a Mac Mini 2026-06-30](https://openclawdatabase.com/news/videos/2026-06-30-ornith-9b-local-coding-test/)
 [▶ Archestra + Ollama: Sandbox and Block What Your Local AI Agents Can Do 2026-06-30](https://openclawdatabase.com/news/videos/2026-06-30-archestra-mcp-agent-guardrails/)
 [▶ OpenJarvis + Ollama: A Local AI Agent That Tracks Watts Per Query 2026-06-26](https://openclawdatabase.com/news/videos/2026-06-26-openjarvis-ollama-local-ai-agent/)

[See all OpenClaw news →](https://openclawdatabase.com/news/openclaw/)

## Go deeper: OpenClaw guides

Hands-on guides to put this into practice:

 [⚡ Setup: Install in 10 Minutes](https://openclawdatabase.com/openclaw/setup/)

 [🔐 Security Hardening](https://openclawdatabase.com/openclaw/security/)

 [⚙️ Configuration Reference](https://openclawdatabase.com/openclaw/configuration/)

 [🛠 Skills Guide: Write Your Own](https://openclawdatabase.com/openclaw/skills-guide/)

 [🧭 Compare Agents Which agent fits your use case — side-by-side.](https://openclawdatabase.com/compare/)

 [⌨️ Command Reference Every CLI command & flag across platforms.](https://openclawdatabase.com/commands/)
