AI voice agent that joins Zoom meetings via PSTN dial-in. Listens, transcribes, and speaks with AI-generated responses.
You → "Join my Zoom" → Agent dials Zoom PSTN → DTMF joins meeting
→ Live transcription via Telnyx webhooks
→ AI responses via GPT-4o-mini → Telnyx TTS
# 1. Install dependencies
npm install
# 2. Copy and fill in your credentials
cp .env.example .env
# 3. Join a meeting
node m3-voice-agent.js -m 83914076399 -p 953856

node m3-voice-agent.js [options]
-m, --meeting-id <id> Zoom meeting ID (required)
-p, --passcode <code> Meeting passcode
-d <seconds> Max duration (default: 600)
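The options above could be parsed with a few lines of plain Node.js. The sketch below is only an illustration, not the script's actual parser: `parseCliArgs` is a hypothetical name, and the validation rules (digits-only meeting ID, positive duration) are assumptions.

```javascript
// Hypothetical helper: parses the flags listed in the usage text above.
function parseCliArgs(argv) {
  const opts = { meetingId: null, passcode: null, maxDurationSec: 600 };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    const value = argv[i + 1];
    switch (flag) {
      case "-m":
      case "--meeting-id":
        opts.meetingId = value; // kept as a string because it is later sent as DTMF digits
        i++;
        break;
      case "-p":
      case "--passcode":
        opts.passcode = value;
        i++;
        break;
      case "-d":
        opts.maxDurationSec = Number(value);
        i++;
        break;
      default:
        throw new Error(`Unknown option: ${flag}`);
    }
  }
  if (!opts.meetingId) throw new Error("Missing required option: -m <id>");
  if (!/^\d+$/.test(opts.meetingId)) throw new Error("Meeting ID must be digits only");
  if (!Number.isFinite(opts.maxDurationSec) || opts.maxDurationSec <= 0) {
    throw new Error("-d expects a positive number of seconds");
  }
  return opts;
}

// Example: the arguments from the quick start command.
const opts = parseCliArgs(["-m", "83914076399", "-p", "953856"]);
console.log(opts);
```

In a real script the input would be `process.argv.slice(2)`.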
| Variable | Required | Description |
|---|---|---|
| `TELNYX_API_KEY` | Yes | Telnyx API key |
| `TELNYX_DID` | Yes | Your Telnyx phone number |
| `TELNYX_CONNECTION_ID` | Yes | Call control application ID |
| `OPENAI_API_KEY` | Yes | OpenAI API key (for GPT responses) |
| `ZOOM_DIAL_IN` | No | Zoom dial-in number (default: +16699009128) |
| `AGENT_NAME` | No | Display name (default: "AI Assistant") |
| `AGENT_ROLE` | No | Role description |
| `AGENT_INSTRUCTIONS` | No | Custom system prompt |
| `BUFFER_DELAY` | No | Ms to wait before responding (default: 1500) |
| `NO_SPEAK` | No | Set to "true" for listen-only mode |
| `TRANSCRIPT_FILE` | No | Path to save transcript after call |
| `USE_OPENCLAW_BRAIN` | No | Set to "true" to route responses via OpenClaw gateway |
| `OPENCLAW_GATEWAY` | No | OpenClaw gateway URL (default: http://localhost:18789) |
| `OPENCLAW_TOKEN` | No | OpenClaw API token (if auth enabled) |
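As an illustration of how these variables might be consumed, the sketch below validates the required ones and applies the documented defaults. `loadConfig` and the property names on the returned object are hypothetical; only the environment variable names and default values come from the table.

```javascript
// Hypothetical loader; variable names and defaults are taken from the table above.
const REQUIRED_VARS = [
  "TELNYX_API_KEY",
  "TELNYX_DID",
  "TELNYX_CONNECTION_ID",
  "OPENAI_API_KEY",
];

function loadConfig(env) {
  // Fail early with one message that names every missing variable.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    telnyxApiKey: env.TELNYX_API_KEY,
    telnyxDid: env.TELNYX_DID,
    telnyxConnectionId: env.TELNYX_CONNECTION_ID,
    openaiApiKey: env.OPENAI_API_KEY,
    zoomDialIn: env.ZOOM_DIAL_IN || "+16699009128",
    agentName: env.AGENT_NAME || "AI Assistant",
    agentRole: env.AGENT_ROLE || null,
    agentInstructions: env.AGENT_INSTRUCTIONS || null,
    bufferDelayMs: Number(env.BUFFER_DELAY || 1500),
    noSpeak: env.NO_SPEAK === "true",
    transcriptFile: env.TRANSCRIPT_FILE || null,
    useOpenClawBrain: env.USE_OPENCLAW_BRAIN === "true",
    openClawGateway: env.OPENCLAW_GATEWAY || "http://localhost:18789",
    openClawToken: env.OPENCLAW_TOKEN || null,
  };
}
```

A script would typically call `loadConfig(process.env)` once at startup, after the `.env` file has been loaded.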
| File | Purpose |
|---|---|
| `m3-voice-agent.js` | Main agent: full voice loop (transcribe + respond + speak) |
| `m2-live-transcribe.js` | Transcription-only mode (no AI responses) |
| `gemini-live-agent.js` | Gemini Live API integration (experimental; audio bridge WIP) |
| `bridge.js` | WebSocket bridge (requires public URL for media streaming) |
| `server.js` | Legacy Retell AI integration |
- Dial: Telnyx PSTN call to the Zoom dial-in number
- Join: DTMF sequence (meeting ID → skip participant ID → passcode)
- Tunnel: ngrok exposes the local webhook server for Telnyx events
- Transcribe: Telnyx real-time transcription (Engine B)
- Think: OpenClaw brain (if enabled) or GPT-4o-mini generates the response
- Speak: Telnyx TTS speaks the response into the call
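The Join step can be pictured as building one digit string for the Zoom phone menu. The sketch below is an assumption about how that string might look, not the agent's actual code: `buildZoomDtmf` is a hypothetical helper, `W` is treated as a one-second pause in the way Telnyx's send-DTMF command documents it, and the pause counts are illustrative guesses that would need tuning against the real IVR prompts.

```javascript
// Hypothetical helper: builds a DTMF digit string for Zoom's phone IVR.
// "W" stands for a one-second pause; the number of pauses is a guess.
function buildZoomDtmf(meetingId, passcode, pauseSeconds = 4) {
  if (!/^\d+$/.test(meetingId)) throw new Error("meetingId must contain digits only");
  if (passcode && !/^\d+$/.test(passcode)) throw new Error("passcode must contain digits only");
  const pause = "W".repeat(pauseSeconds);
  let digits = `${meetingId}#`; // 1. meeting ID, confirmed with #
  digits += `${pause}#`; // 2. skip the participant ID prompt
  if (passcode) digits += `${pause}${passcode}#`; // 3. passcode, if the meeting has one
  return digits;
}

console.log(buildZoomDtmf("83914076399", "953856", 2));
// prints 83914076399#WW#WW953856#
```

Sending the three parts as separate commands, each triggered after the matching prompt has finished, would be an alternative to one long string with pauses.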
- ~15s for Zoom IVR greeting
- ~13s for DTMF sequence (meeting ID + passcode)
- ~1.5s buffer before responding (configurable)
- Total join time: ~30s
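The response buffer can be thought of as a debounce over transcript fragments: the agent waits until the speaker has been quiet for the configured delay, then answers the whole utterance at once. The following is a minimal sketch under that assumption; `createUtteranceBuffer` and its injectable `timers` argument are hypothetical and not taken from the project.

```javascript
// Hypothetical debounce buffer: collects transcript fragments and flushes them
// as one utterance once no new fragment has arrived for `delayMs`.
function createUtteranceBuffer(onUtterance, delayMs = 1500, timers = { setTimeout, clearTimeout }) {
  let fragments = [];
  let timer = null;
  return {
    push(text) {
      fragments.push(text.trim());
      // A new fragment means the speaker is still talking: restart the wait.
      if (timer !== null) timers.clearTimeout(timer);
      timer = timers.setTimeout(() => {
        const utterance = fragments.join(" ");
        fragments = [];
        timer = null;
        onUtterance(utterance);
      }, delayMs);
    },
  };
}
```

With this shape, a transcription webhook handler would call `buffer.push(fragment)` for each event, and the callback would receive the joined text after 1.5 s of silence. A shorter delay lowers latency but raises the chance of interrupting a speaker mid-sentence.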
- Node.js 18+
- ngrok installed and authenticated
- Telnyx account with:
  - A phone number (DID)
  - Call control application (outbound channel limit ≥ 2)
- OpenAI API key
- Create a Call Control Application
- Set outbound channel limit to at least 2
- Assign your phone number to the application
- Note the connection ID; that's your `TELNYX_CONNECTION_ID`
The webhook URL is set dynamically at runtime via the API (no manual config needed).
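One way to set the webhook URL at runtime is to pass it with the outbound call itself, since the Telnyx dial endpoint (`POST /v2/calls`) accepts a `webhook_url` field. Whether the agent does this or instead updates the Call Control Application is an assumption here; `buildDialRequest` and the `/webhook` path are hypothetical.

```javascript
// Hypothetical helper: builds the request for the Telnyx dial endpoint with a
// per-call webhook URL pointing at the public tunnel address.
function buildDialRequest({ apiKey, connectionId, from, to, publicUrl }) {
  const base = publicUrl.replace(/\/+$/, ""); // drop trailing slashes from the tunnel URL
  return {
    url: "https://api.telnyx.com/v2/calls",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        connection_id: connectionId,
        from,
        to,
        webhook_url: `${base}/webhook`, // "/webhook" is a placeholder path
      }),
    },
  };
}
```

The result can be handed to the global `fetch` available in Node.js 18+, for example `const { url, options } = buildDialRequest({...}); const res = await fetch(url, options);`.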
The agent responds in the same language the speaker uses:
- English → English response
- Chinese (Mandarin) → Chinese response
- Mixed → dominant language
Telnyx TTS supports en-US and cmn-CN voices.
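The "dominant language" rule could be approximated with a simple character count. The sketch below is a rough heuristic under stated assumptions, not the agent's actual logic: `pickTtsLanguage` is a hypothetical helper, and weighing one Chinese character against one Latin word is an arbitrary choice.

```javascript
// Hypothetical heuristic: choose the TTS language by comparing the number of
// Han characters with the number of Latin-script words in an utterance.
function pickTtsLanguage(text) {
  const hanChars = (text.match(/\p{Script=Han}/gu) || []).length;
  // A Chinese character carries roughly a word of meaning, so it is compared
  // against Latin words rather than individual letters (a rough assumption).
  const latinWords = (text.match(/[\p{Script=Latin}']+/gu) || []).length;
  return hanChars > latinWords ? "cmn-CN" : "en-US"; // ties fall back to English
}

console.log(pickTtsLanguage("Hello, how are you?")); // en-US
console.log(pickTtsLanguage("你好，今天的会议几点开始")); // cmn-CN
console.log(pickTtsLanguage("我们用 Zoom 开会")); // cmn-CN
```

Asking the language model to reply in the speaker's language and then detecting the language of the reply would be another option.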
- Voice quality: Telnyx basic TTS (robotic). Upgrade path: OpenAI TTS → audio streaming.
- Latency: ~2-4s round-trip (transcription + GPT + TTS). Reducible with streaming.
- One call per instance: Run multiple instances for concurrent meetings.
- PSTN only: No Zoom SDK integration (yet). Phone audio quality.
MIT