AIGC-Hackers/openclaw-zoom-agent

🦞 OpenClaw Zoom Agent

An AI voice agent that joins Zoom meetings via PSTN dial-in. It listens, transcribes the conversation, and speaks AI-generated responses.

How It Works

You → "Join my Zoom" → Agent dials Zoom PSTN → DTMF joins meeting
                      → Live transcription via Telnyx webhooks
                      → AI responses via GPT-4o-mini → Telnyx TTS

Quick Start

# 1. Install dependencies
npm install

# 2. Copy and fill in your credentials
cp .env.example .env

# 3. Join a meeting
node m3-voice-agent.js -m 83914076399 -p 953856

CLI Options

node m3-voice-agent.js [options]

  -m, --meeting-id <id>     Zoom meeting ID (required)
  -p, --passcode <code>     Meeting passcode
  -d <seconds>              Max duration (default: 600)
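
The option list above could be parsed roughly as follows. This is a minimal sketch, not the code in m3-voice-agent.js: the function name `parseCliOptions` and the long option name `duration` are assumptions made for illustration, and it relies on `util.parseArgs`, which ships with Node.js 18.3 and later.

```javascript
const { parseArgs } = require("node:util");

// Hypothetical helper: parse the documented flags from an argv-style array.
function parseCliOptions(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      "meeting-id": { type: "string", short: "m" },
      passcode: { type: "string", short: "p" },
      // Only "-d" is documented; the long name "duration" is an assumption.
      duration: { type: "string", short: "d" },
    },
  });

  // Meeting IDs are often written with spaces or dashes; keep digits only.
  const meetingId = (values["meeting-id"] || "").replace(/\D/g, "");
  if (!meetingId) {
    throw new Error("Missing required option: -m, --meeting-id <id>");
  }

  // Fall back to the documented default of 600 seconds.
  const maxDurationSec =
    values.duration === undefined ? 600 : Number(values.duration);
  if (!Number.isInteger(maxDurationSec) || maxDurationSec <= 0) {
    throw new Error("-d must be a positive whole number of seconds");
  }

  return { meetingId, passcode: values.passcode || "", maxDurationSec };
}
```

In a script this would typically be called as `parseCliOptions(process.argv.slice(2))`.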

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| TELNYX_API_KEY | ✅ | Telnyx API key |
| TELNYX_DID | ✅ | Your Telnyx phone number |
| TELNYX_CONNECTION_ID | ✅ | Call control application ID |
| OPENAI_API_KEY | ✅ | OpenAI API key (for GPT responses) |
| ZOOM_DIAL_IN | | Zoom dial-in number (default: +16699009128) |
| AGENT_NAME | | Display name (default: "AI Assistant") |
| AGENT_ROLE | | Role description |
| AGENT_INSTRUCTIONS | | Custom system prompt |
| BUFFER_DELAY | | Ms to wait before responding (default: 1500) |
| NO_SPEAK | | Set to "true" for listen-only mode |
| TRANSCRIPT_FILE | | Path to save transcript after call |
| USE_OPENCLAW_BRAIN | | Set to "true" to route responses via OpenClaw gateway |
| OPENCLAW_GATEWAY | | OpenClaw gateway URL (default: http://localhost:18789) |
| OPENCLAW_TOKEN | | OpenClaw API token (if auth enabled) |
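
As an illustration of how these variables and their defaults fit together, here is a minimal sketch of a config loader. The function name `loadConfig` and the shape of the returned object are assumptions; the agent's real startup code may validate and name things differently.

```javascript
// Variables marked as required in the table above.
const REQUIRED = [
  "TELNYX_API_KEY",
  "TELNYX_DID",
  "TELNYX_CONNECTION_ID",
  "OPENAI_API_KEY",
];

// Hypothetical helper: validate required variables and apply documented defaults.
function loadConfig(env = process.env) {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  const bufferDelayMs = env.BUFFER_DELAY ? Number(env.BUFFER_DELAY) : 1500;
  if (!Number.isFinite(bufferDelayMs) || bufferDelayMs < 0) {
    throw new Error("BUFFER_DELAY must be a non-negative number of milliseconds");
  }

  return {
    telnyxApiKey: env.TELNYX_API_KEY,
    telnyxDid: env.TELNYX_DID,
    telnyxConnectionId: env.TELNYX_CONNECTION_ID,
    openaiApiKey: env.OPENAI_API_KEY,
    zoomDialIn: env.ZOOM_DIAL_IN || "+16699009128",
    agentName: env.AGENT_NAME || "AI Assistant",
    agentRole: env.AGENT_ROLE || null,
    agentInstructions: env.AGENT_INSTRUCTIONS || null,
    bufferDelayMs,
    noSpeak: env.NO_SPEAK === "true",
    transcriptFile: env.TRANSCRIPT_FILE || null,
    useOpenClawBrain: env.USE_OPENCLAW_BRAIN === "true",
    openClawGateway: env.OPENCLAW_GATEWAY || "http://localhost:18789",
    openClawToken: env.OPENCLAW_TOKEN || null,
  };
}
```

Failing fast on missing credentials keeps a misconfigured run from placing a paid PSTN call before it discovers the problem.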

Architecture

Files

| File | Purpose |
| --- | --- |
| m3-voice-agent.js | Main agent: full voice loop (transcribe + respond + speak) |
| m2-live-transcribe.js | Transcription-only mode (no AI responses) |
| gemini-live-agent.js | Gemini Live API integration (experimental; audio bridge WIP) |
| bridge.js | WebSocket bridge (requires public URL for media streaming) |
| server.js | Legacy Retell AI integration |

Flow

  1. Dial: Telnyx PSTN call to Zoom dial-in number
  2. Join: DTMF sequence (meeting ID → skip participant ID → passcode)
  3. Tunnel: ngrok exposes local webhook server for Telnyx events
  4. Transcribe: Telnyx real-time transcription (Engine B)
  5. Think: OpenClaw brain (if enabled) or GPT-4o-mini generates response
  6. Speak: Telnyx TTS speaks response into the call
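
The Join step can be sketched as a small function that turns a meeting ID and passcode into the keypad entries Zoom's phone menu expects. This is an illustration under stated assumptions, not the agent's actual code: `buildJoinSteps` is a hypothetical name, and the sketch assumes each entry is terminated with `#` and that a bare `#` skips the participant ID prompt. Sending the digits through Telnyx and pacing them against the IVR prompts is left to the caller.

```javascript
// Hypothetical helper: build the keypad entries for Zoom's dial-in prompts.
function buildJoinSteps(meetingId, passcode) {
  // Strip spaces, dashes, and anything else that is not a digit.
  const digitsOnly = (value) => String(value || "").replace(/\D/g, "");

  const id = digitsOnly(meetingId);
  if (!id) {
    throw new Error("meeting ID must contain digits");
  }

  const steps = [
    { label: "meeting ID", digits: `${id}#` },
    // Assumption: a bare "#" skips the participant ID prompt.
    { label: "skip participant ID", digits: "#" },
  ];

  // The passcode prompt only applies when the meeting has one.
  const code = digitsOnly(passcode);
  if (code) {
    steps.push({ label: "passcode", digits: `${code}#` });
  }
  return steps;
}
```

For example, `buildJoinSteps("839 1407 6399", "953856")` yields three entries whose digits are `83914076399#`, `#`, and `953856#`.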

Timing

  • ~15s for Zoom IVR greeting
  • ~13s for DTMF sequence (meeting ID + passcode)
  • ~1.5s buffer before responding (configurable)
  • Total join time: ~30s
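
One way to read the response buffer is as a silence timer: transcript fragments accumulate, and the agent replies only once no new fragment has arrived for the configured delay. The sketch below implements that interpretation, which is an assumption about how BUFFER_DELAY is used. The class name `UtteranceBuffer` is hypothetical, and timestamps are passed in explicitly so the logic can be exercised without real timers.

```javascript
// Hypothetical helper: collect transcript fragments until the speaker pauses.
class UtteranceBuffer {
  constructor(delayMs = 1500) {
    this.delayMs = delayMs;
    this.parts = [];
    this.lastAt = null; // timestamp (ms) of the most recent fragment
  }

  // Record a fragment and restart the silence window.
  push(text, nowMs) {
    const trimmed = text.trim();
    if (!trimmed) return;
    this.parts.push(trimmed);
    this.lastAt = nowMs;
  }

  // Return the joined utterance once delayMs has passed since the last
  // fragment; otherwise return null and keep buffering.
  takeIfQuiet(nowMs) {
    if (this.parts.length === 0 || nowMs - this.lastAt < this.delayMs) {
      return null;
    }
    const utterance = this.parts.join(" ");
    this.parts = [];
    this.lastAt = null;
    return utterance;
  }
}
```

A shorter delay makes the agent feel more responsive but raises the chance that it answers before the speaker has finished a sentence.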

Requirements

  • Node.js 18+
  • ngrok installed and authenticated
  • Telnyx account with:
    • A phone number (DID)
    • Call control application (outbound channel limit ≥ 2)
  • OpenAI API key

Telnyx Setup

  1. Create a Call Control Application
  2. Set outbound channel limit to at least 2
  3. Assign your phone number to the application
  4. Note the connection ID; this is your TELNYX_CONNECTION_ID

The webhook URL is set dynamically at runtime via the API (no manual config needed).

Language Support

The agent responds in the same language the speaker uses:

  • English → English response
  • Chinese (Mandarin) → Chinese response
  • Mixed → dominant language

Telnyx TTS supports en-US and cmn-CN voices.
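
Choosing between the two voices for a given reply could look like the sketch below. It is a rough heuristic offered for illustration only: `pickVoiceLanguage` is a hypothetical name, and comparing the count of Han characters with the count of Latin-letter words is an assumed way to approximate the "dominant language" rule. The agent itself may simply rely on the language model to answer in the speaker's language.

```javascript
// Han characters (CJK ideographs); the `u` flag enables \p{...} escapes.
const HAN_CHAR = /\p{Script=Han}/gu;
// Runs of ASCII letters, used as a rough count of English words.
const LATIN_WORD = /[A-Za-z]+/g;

// Hypothetical helper: pick a TTS language code for a piece of text.
function pickVoiceLanguage(text) {
  const hanCount = (text.match(HAN_CHAR) || []).length;
  const latinCount = (text.match(LATIN_WORD) || []).length;
  // Ties and empty input fall back to English.
  return hanCount > latinCount ? "cmn-CN" : "en-US";
}
```

Counting Han characters against Latin words rather than Latin letters keeps a single English word from outweighing a short Chinese sentence.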

Limitations

  • Voice quality: Telnyx basic TTS (robotic). Upgrade path: OpenAI TTS → audio streaming.
  • Latency: ~2-4s round-trip (transcription + GPT + TTS). Reducible with streaming.
  • One call per instance: Run multiple instances for concurrent meetings.
  • PSTN only: No Zoom SDK integration (yet). Phone audio quality.

License

MIT
