Voice Agent Orchestrator

Multi-agent voice AI orchestration framework for home services, built on LiveKit Agents.

Demonstrates deterministic conversation control over probabilistic LLM outputs through finite state machines, slot-filling with confirmation gates, multi-layer guardrails, and an evaluation framework for continuous improvement.

Architecture

Caller  ──>  Deepgram STT  ──>  Agent Orchestrator  ──>  Cartesia TTS  ──>  Caller
                                        │
                          ┌─────────────┼─────────────┐
                          │             │             │
                    IntakeAgent   BookingAgent   InfoAgent
                          │             │             │
                          └──── EscalationAgent ──────┘

Agents

Agent	Responsibility	Tools
IntakeAgent	Greet, identify intent, route	`route_to_booking`, `route_to_info`, `route_to_emergency`, `identify_caller`
BookingAgent	Slot collection, confirmation gate, booking	`record_*` (7 slots), `confirm_booking_details`, `check_and_book`, `correct_detail`, `escalate_to_human`
InfoAgent	Service/pricing questions	`get_service_info`, `list_all_services`, `route_to_booking`
EscalationAgent	Emergency handling, human handoff	`complete_handoff`, `record_callback_number`, `provide_emergency_guidance`

Conversation State Machine

12 states with 23 explicit transitions ensuring deterministic conversation flow:

GREETING → INTENT_DETECTION → SERVICE_SELECTION → SLOT_FILLING → SLOT_CONFIRMATION
    → AVAILABILITY_CHECK → BOOKING_CREATION → CONFIRMATION → FAREWELL

Branch paths:
  INTENT_DETECTION → INFO_RESPONSE (service questions)
  INTENT_DETECTION → ESCALATION (emergency / human request)
  SLOT_FILLING → ERROR_RECOVERY (max retries)
  SLOT_CONFIRMATION → SLOT_FILLING (caller corrects info)

Slot-Filling Pattern

Three-phase lifecycle: Collect → Validate → Confirm

7 slots with per-slot validators (phone regex, service catalog match, address length)
Correction history tracking for evaluation
Confirmation gate prevents booking execution until explicit caller approval

Guardrails

Four independent layers composed into pre-LLM and post-LLM pipelines:

ScopeGuardrail — validates services against catalog, rejects off-topic requests
HallucinationGuardrail — flags unverified claims (guarantees, warranties, etc.)
PersonaGuardrail — enforces voice style, blocks AI self-references and markdown
EscalationGuardrail — detects emergencies, frustration keywords, error thresholds

Setup

# Clone and install
git clone <repo-url> && cd voice-agent-orchestrator
pip install -e ".[dev]"

# Configure environment
cp .env.example .env
# Edit .env with your API keys

Usage

# Run tests
make test

# Run with coverage
make test-cov

# Run evaluation on sample transcripts
make eval

# Start voice agent (requires LiveKit server)
make run

# Start in console mode (text-only, no mic)
make console

Evaluation Framework

15 KPIs across 5 categories, 10 failure pattern detectors, and auto-improvement suggestions.

python -m src.evaluation.run_eval --transcripts sample_transcripts/ --verbose

Metrics

Category	KPIs
Task Success	Success rate, first-call resolution, containment rate
Slot Quality	Fill rate, correction rate, avg attempts, confirmation pass rate
Efficiency	Avg turns to booking, avg duration, handoff rate
Errors	Error rate, recovery success rate, escalation rate
Guardrails	Scope violation rate, hallucination detection rate

Failure Detection

Detects 10 patterns: repeated slot failure, confirmation loops, wrong agent handoff, scope violations, caller frustration, hallucinated info, missed intent, incomplete booking, unnecessary escalation, slow responses.

Auto-Improvement

Maps each detected failure to a specific prompt modification with expected impact. Prioritized by severity (critical > high > medium > low).

Configuration

All business values are configurable via environment variables — nothing is hardcoded in agent logic. See .env.example for all options.

Config validation runs at startup — invalid values (e.g. temperature outside 0.0-2.0, negative thresholds) raise clear error messages immediately rather than causing obscure failures later.

Utilities

Phone Normalization

src/utils.normalize_phone() strips formatting characters and normalizes international prefixes. Used by both the slot manager and customer lookup to ensure consistent matching.

Correlation ID Logging

src/logging_context provides set_call_id() and get_call_logger() for tracing a single caller's journey across the multi-agent system. All agent and tool modules emit logs with a call_id field.

Test Isolation

Each mock tool module (booking, customer, availability) exposes a reset() function. An autouse pytest fixture in conftest.py calls all resets before each test, preventing cross-test state pollution.

Project Structure

voice-agent-orchestrator/
├── src/
│   ├── agents/           # 4 specialized agents
│   ├── conversation/     # State machine, slot manager, guardrails
│   ├── evaluation/       # Metrics, failure detection, auto-improvement
│   ├── prompts/          # System prompts and templates
│   ├── schemas/          # Pydantic models
│   ├── tools/            # Mock service integrations (with TypedDict returns)
│   ├── config.py         # Centralized configuration with startup validation
│   ├── logging_context.py # Correlation ID logging
│   └── utils.py          # Shared utilities (phone normalization)
├── tests/                # Test suite with autouse reset fixtures
├── sample_transcripts/   # 5 evaluation scenarios
├── main.py               # LiveKit entry point
├── pyproject.toml
└── Makefile

Tech Stack

Voice Pipeline: LiveKit Agents SDK, Deepgram STT, OpenAI GPT-4o-mini, Cartesia TTS, Silero VAD
Framework: Python 3.11+, Pydantic, asyncio
Testing: pytest, ruff, mypy
CI: GitHub Actions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Voice Agent Orchestrator

Architecture

Agents

Conversation State Machine

Slot-Filling Pattern

Guardrails

Setup

Usage

Evaluation Framework

Metrics

Failure Detection

Auto-Improvement

Configuration

Utilities

Phone Normalization

Correlation ID Logging

Test Isolation

Project Structure

Tech Stack

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 56 Commits
.github/workflows		.github/workflows
sample_transcripts		sample_transcripts
src		src
tests		tests
.env.example		.env.example
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
console_demo.py		console_demo.py
main.py		main.py
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

Voice Agent Orchestrator

Architecture

Agents

Conversation State Machine

Slot-Filling Pattern

Guardrails

Setup

Usage

Evaluation Framework

Metrics

Failure Detection

Auto-Improvement

Configuration

Utilities

Phone Normalization

Correlation ID Logging

Test Isolation

Project Structure

Tech Stack

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages