Published: 2026-06-04
Add Voice to Hermes & OpenClaw with MiniMax M3: Hands-Free Agent Interaction
The MiniMax M3 update adds a built-in voice agent to both OpenClaw and Hermes, with no special hardware or API key juggling required. You speak, the model transcribes your speech and reasons about it, then responds aloud. The source video demonstrates the voice pipeline, four selectable voice modes, voice note storage, phone-based hands-free usage via Telegram, and MiniMax M3's broader multimodal capabilities.
Source video
"Hermes + OpenClaw Agent Voice Mode Just Dropped…" by Julian Goldie SEO — Watch on YouTube →
Key Takeaways
- MiniMax M3 adds voice chat directly into OpenClaw and Hermes — activate it by clicking the voice button in the agent UI, no extra setup required.
- Voice pipeline: microphone input → transcription → MiniMax M3 processing → spoken response. The model natively handles the three steps that follow audio capture: transcription, reasoning, and speech output.
- Hermes offers four voice modes, including a "deep mode" for richer, more detailed spoken responses; choose the mode from the voice settings panel.
- Expect a few seconds of latency between speaking and hearing the response; it is not instant like a phone call. Voice mode therefore suits task delegation better than rapid back-and-forth conversation.
- Voice notes can be saved to your workspace, which is useful for recording podcast snippets, ideas, or task outputs without switching apps.
- Works from your phone via Telegram, giving you hands-free agent control while away from your desk, at the gym, or commuting.
- MiniMax M3 also handles image and video generation alongside voice, and integrates Grok for real-time Twitter/X search when used as the model in OpenClaw.
- MiniMax M3 demonstrated 24-hour autonomous operation on a kernel optimization task, completing 1,959 agentic tool calls, which suggests it stays reliable across long-running agent sessions.
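The pipeline and latency takeaways above can be pictured as three stages run in sequence, each adding to the delay you hear. The following is a minimal sketch only: `run_voice_turn`, `VoiceTurn`, and the `fake_*` stage functions are hypothetical names invented for illustration, not part of MiniMax M3, OpenClaw, or Hermes, and the stubs simply pass text through where a real system would call the model.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class VoiceTurn:
    """Result of one spoken exchange, with per-stage timings in seconds."""
    transcript: str
    reply_text: str
    reply_audio: bytes
    timings: Dict[str, float] = field(default_factory=dict)


def run_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    think: Callable[[str], str],
    speak: Callable[[str], bytes],
) -> VoiceTurn:
    """Run transcription -> reasoning -> speech output and time each stage."""
    timings: Dict[str, float] = {}

    def timed(name, fn, arg):
        start = time.perf_counter()
        result = fn(arg)
        timings[name] = time.perf_counter() - start
        return result

    transcript = timed("transcribe", transcribe, audio)
    reply_text = timed("think", think, transcript)
    reply_audio = timed("speak", speak, reply_text)
    return VoiceTurn(transcript, reply_text, reply_audio, timings)


# Hypothetical stand-ins for the real stages (assumptions, not a real API).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_think(prompt: str) -> str:
    return f"Done: {prompt}"


def fake_speak(text: str) -> bytes:
    return text.encode("utf-8")


turn = run_voice_turn(b"summarize my inbox", fake_transcribe, fake_think, fake_speak)
print(turn.transcript)
print(turn.reply_text)
print(sorted(turn.timings))
```

Because the stages run one after another, the total wait is the sum of the three timings, which is one way to see why voice mode fits delegated tasks better than quick conversational turns.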
How to Enable Voice Mode
# In OpenClaw or Hermes Desktop:
# 1. Set MiniMax M3 as your active model
#    - OpenClaw: /model minimax-m3, or Settings → Model
#    - Hermes Desktop: Settings → Model → select MiniMax M3
# 2. Click the voice/microphone button in the chat interface
#    - OpenClaw: voice button appears in the input bar
#    - Hermes Desktop: voice tab in the top navigation
# 3. Select a voice mode (Hermes has 4 options, including "deep mode")
# 4. Speak your task; the agent transcribes, thinks, and responds aloud
#
# To save a voice note: use the "workspace" tab to store recordings
#
# For phone access: use Hermes via Telegram
#    - Record your message with Telegram's voice note feature
#    - Hermes transcribes it and processes it as a normal message
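The Telegram step above says a voice note is transcribed and then handled like any typed message. A minimal sketch of that normalization idea follows, assuming a simplified message shape; `IncomingMessage`, `to_prompt`, and `fake_transcribe` are hypothetical names for illustration and do not reflect the Telegram Bot API or Hermes internals.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """Simplified chat message: either typed text or recorded voice audio."""
    text: Optional[str] = None
    voice_audio: Optional[bytes] = None


def to_prompt(message: IncomingMessage, transcribe: Callable[[bytes], str]) -> str:
    """Turn a text or voice message into the same plain-text prompt."""
    if message.text is not None:
        return message.text
    if message.voice_audio is not None:
        # Voice notes are transcribed first, then treated like typed text.
        return transcribe(message.voice_audio)
    raise ValueError("message has neither text nor voice audio")


# Hypothetical transcriber stub (assumption): pretends the audio is UTF-8 text.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


typed = to_prompt(IncomingMessage(text="check my calendar"), fake_transcribe)
spoken = to_prompt(IncomingMessage(voice_audio=b"check my calendar"), fake_transcribe)
print(typed == spoken)
```

Routing both input types through one function means everything downstream of the transcription step only ever sees text, which matches the "processes it as a normal message" behavior described in the steps.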





