An offline Python tool that post-processes Webex meeting recordings to identify and label speakers using audio diarization and video analysis.
Features:
- Audio Diarization: Identifies distinct speakers from audio tracks
- Video Analysis: Detects faces and tracks lip movements to associate speakers with visual identities
- Speaker Naming: Automatically extracts participant names from meeting introductions
- Multi-format Output: Generates labeled SRT subtitles and structured JSON metadata
- Annotated Video Generation: Creates video with face detection boxes and speaker-labeled subtitles
- Web UI Viewer: Interactive interface to view videos with synchronized transcripts
- Auto-Transcription: Built-in Whisper ASR support when no transcript is available
- CPU-Optimized: Runs locally without GPU requirements
- Privacy-First: All processing happens on your machine by default
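The video analysis feature above tracks lip movements, but this README does not say how. One common heuristic, shown here as a minimal sketch and not as this tool's implementation, computes a mouth aspect ratio (lip opening height divided by mouth width) from four mouth landmarks per frame, then flags frames where that ratio fluctuates. The landmark inputs and the `window` and `threshold` values are assumptions; in practice the landmarks would come from whatever face-landmark detector the `src/video/` module uses.

```python
import math
from statistics import pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of a landmark


def mouth_aspect_ratio(top: Point, bottom: Point, left: Point, right: Point) -> float:
    """Lip opening height divided by mouth width (larger means more open)."""
    width = math.dist(left, right)  # math.dist needs Python 3.8+
    if width == 0:
        return 0.0
    return math.dist(top, bottom) / width


def speaking_frames(mars: Sequence[float], window: int = 5, threshold: float = 0.02) -> List[bool]:
    """Flag each frame whose surrounding window shows a fluctuating mouth opening.

    A talking mouth opens and closes repeatedly, so its ratio varies; a still
    mouth (closed or held open) gives a nearly constant ratio.
    window and threshold are illustrative values, not tuned defaults.
    """
    half = window // 2
    flags: List[bool] = []
    for i in range(len(mars)):
        chunk = mars[max(0, i - half): i + half + 1]
        # A single sample carries no movement information, so it counts as silent.
        flags.append(len(chunk) > 1 and pstdev(chunk) > threshold)
    return flags


print(mouth_aspect_ratio((0, 1), (0, -1), (-2, 0), (2, 0)))  # 0.5
still = [0.10] * 10
talking = [0.05, 0.30, 0.08, 0.35, 0.06, 0.32, 0.07, 0.28, 0.05, 0.30]
print(any(speaking_frames(still)))    # False
print(all(speaking_frames(talking)))  # True
```

Frames flagged as speaking can then be merged into per-face time intervals and compared against the audio diarization output.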
Use cases:
- Post-process Webex meetings recorded in conference rooms (single microphone, multiple participants)
- Generate accurate speaker-labeled transcripts
- Create searchable meeting records with speaker attribution
Requirements:
- Python 3.8+
- CPU-based processing (no GPU required)
- ~4 GB RAM for typical meetings
- Disk space for temporary processing files
Installation:
# Clone the repository
git clone <repository-url>
cd webex-speaker-labeling
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Usage:
Basic processing:
python process_meeting.py \
--video path/to/meeting.mp4 \
--transcript path/to/transcript.vtt \
--output-dir ./output
Auto-transcription (no transcript file):
python process_meeting.py \
--video path/to/meeting.mp4 \
--output-dir ./output \
--asr-model base
Generate annotated video with face detection:
python process_meeting.py \
--video path/to/meeting.mp4 \
--output-dir ./output \
--annotated-video
Launch web UI viewer:
python process_meeting.py \
--video path/to/meeting.mp4 \
--output-dir ./output \
--web-ui
Or view already-processed videos:
python view_videos.py --output-dir ./output --port 5000
Then open http://localhost:5000 in your browser.
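The `--transcript` option in the examples above takes a WebVTT (`.vtt`) file. For readers who want to inspect such a file or feed it to their own scripts, here is a minimal, standard-library sketch of a cue parser. `Cue` and `parse_vtt` are hypothetical names, not part of this tool; the parser reads only timing lines and cue text, ignores cue settings after the timestamps, and keeps any voice tags such as `<v Name>` in the text verbatim.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


# Matches "00:01:02.500 --> 00:01:05.000"; WebVTT allows the hours field to be omitted.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _to_seconds(h: Optional[str], m: str, s: str, ms: str) -> float:
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(text: str) -> List[Cue]:
    """Return timed cues; blocks without a timing line (header, NOTE, STYLE) are skipped."""
    cues: List[Cue] = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                body = "\n".join(lines[i + 1:]).strip()  # cue text follows the timing line
                cues.append(Cue(_to_seconds(*g[:4]), _to_seconds(*g[4:]), body))
                break
    return cues


sample = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Hi, I'm Alice.

2
00:00:04.500 --> 00:00:06.000
And I'm Bob.
"""
for cue in parse_vtt(sample):
    print(cue)
```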
Output files:
- meeting_labeled.srt: Subtitle file with speaker names
- meeting_labeled.json: Structured metadata with speaker segments, timestamps, and confidence scores
- meeting_annotated.mp4: (Optional, with --annotated-video) Video with face detection boxes and speaker-labeled subtitles
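`meeting_labeled.srt` combines transcript cues with speaker identities. The sketch below shows one plausible way to do that: label each cue with the speaker whose segments overlap it for the longest total time, then render numbered SRT entries. The `Speaker: text` prefix, the `Unknown` fallback, and all function names are assumptions; the README does not specify the exact label format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


# (start_seconds, end_seconds, speaker_label), e.g. from diarization plus naming
Segment = Tuple[float, float, str]


def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the intersection of [a0, a1] and [b0, b1], or 0 if disjoint."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def assign_speaker(cue: Cue, segments: List[Segment], unknown: str = "Unknown") -> str:
    """Pick the speaker whose segments overlap the cue for the most total time."""
    totals: Dict[str, float] = {}
    for s0, s1, speaker in segments:
        totals[speaker] = totals.get(speaker, 0.0) + overlap(cue.start, cue.end, s0, s1)
    best = max(totals, key=totals.__getitem__, default=None)
    return best if best is not None and totals[best] > 0 else unknown


def srt_time(t: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(t * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: List[Cue], segments: List[Segment]) -> str:
    """Render numbered SRT entries, prefixing each cue's text with its speaker."""
    entries = []
    for i, cue in enumerate(cues, start=1):
        speaker = assign_speaker(cue, segments)
        entries.append(
            f"{i}\n{srt_time(cue.start)} --> {srt_time(cue.end)}\n{speaker}: {cue.text}\n"
        )
    return "\n".join(entries)  # a blank line separates SRT entries


cues = [Cue(1.0, 4.0, "Hi, I'm Alice."), Cue(4.5, 6.0, "And I'm Bob.")]
segments = [(0.0, 4.2, "Alice"), (4.2, 6.5, "Bob")]
print(to_srt(cues, segments))
```

Labeling by total overlap rather than by the speaker active at a cue's start makes the result less sensitive to small timing offsets between the transcript and the diarization.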
Project structure:
webex-speaker-labeling/
├── src/
│ ├── audio/ # Audio processing and diarization
│ ├── video/ # Face detection and tracking
│ ├── fusion/ # Audio-visual alignment
│ ├── naming/ # Speaker name extraction
│ ├── output/ # Output generation
│ ├── visualizer.py # Video annotation with face boxes
│ └── web_ui.py # Flask web viewer
├── process_meeting.py # Main CLI entry point
├── view_videos.py # Standalone web UI viewer
├── config.yaml # Configuration parameters
└── requirements.txt # Python dependencies
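The `fusion/` module performs audio-visual alignment, and its method is not documented here. A simple approach, sketched below with that caveat, scores every (diarized speaker, face track) pair by how many seconds the speaker's speech overlaps the face's lip activity, then assigns pairs greedily from the highest score down so that each speaker and each face is used at most once. The `SPEAKER_00`/`face_1` labels and the `min_overlap` cutoff are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def overlap_seconds(a: List[Interval], b: List[Interval]) -> float:
    """Total time during which an interval in `a` and one in `b` coincide.

    Assumes the intervals within each list do not overlap one another.
    """
    total = 0.0
    for a0, a1 in a:
        for b0, b1 in b:
            total += max(0.0, min(a1, b1) - max(a0, b0))
    return total


def match_speakers_to_faces(
    speech: Dict[str, List[Interval]],
    lips: Dict[str, List[Interval]],
    min_overlap: float = 1.0,
) -> Dict[str, str]:
    """Greedy one-to-one mapping from diarized speaker IDs to face-track IDs."""
    scores = sorted(
        (
            (overlap_seconds(s_iv, f_iv), speaker, face)
            for speaker, s_iv in speech.items()
            for face, f_iv in lips.items()
        ),
        reverse=True,
    )
    mapping: Dict[str, str] = {}
    used_faces: Set[str] = set()
    for score, speaker, face in scores:
        if score < min_overlap:
            break  # all remaining pairs overlap even less
        if speaker in mapping or face in used_faces:
            continue
        mapping[speaker] = face
        used_faces.add(face)
    return mapping


speech = {"SPEAKER_00": [(0, 10), (20, 25)], "SPEAKER_01": [(10, 20)]}
lips = {"face_1": [(11, 19)], "face_2": [(1, 9), (21, 24)]}
print(match_speakers_to_faces(speech, lips))
# {'SPEAKER_00': 'face_2', 'SPEAKER_01': 'face_1'}
```

Greedy matching is easy to follow but not guaranteed to be optimal; an optimal one-to-one assignment could instead use the Hungarian algorithm, for example SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True`.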
Edit config.yaml to customize:
- Video frame sampling rate
- Diarization parameters
- Face detection thresholds
- Name extraction patterns
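Name extraction patterns are configurable in `config.yaml`, but their format is not documented here. As a rough illustration, the sketch below applies a few hypothetical introduction patterns ("my name is X", "I'm X", "this is X") and names each diarized speaker after the first introduction they make. Only the lead phrase matches case-insensitively while the name must be capitalized, so "I'm going to" yields no name; the patterns and helper names are assumptions, and heuristics like these will still produce some false positives.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical introduction patterns; the real ones would be set in config.yaml.
# (?i:...) makes only the lead phrase case-insensitive; the name must be capitalized.
NAME = r"(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)?)"
INTRO_PATTERNS = [
    re.compile(r"\b(?i:my name is) " + NAME),
    re.compile(r"\b(?i:i['’]m) " + NAME),  # accepts straight or curly apostrophes
    re.compile(r"\b(?i:this is) " + NAME),
]


def extract_name(text: str) -> Optional[str]:
    """Return the first introduced name found in an utterance, if any."""
    for pattern in INTRO_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group("name")
    return None


def name_speakers(utterances: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each speaker ID to the first name they introduce themselves with."""
    names: Dict[str, str] = {}
    for speaker, text in utterances:
        if speaker in names:
            continue  # keep the first introduction only
        name = extract_name(text)
        if name:
            names[speaker] = name
    return names


utterances = [
    ("SPEAKER_00", "Good morning, I'm Alice Chen."),
    ("SPEAKER_01", "Hi everyone, my name is Bob."),
    ("SPEAKER_00", "Let's get started."),
]
print(name_speakers(utterances))  # {'SPEAKER_00': 'Alice Chen', 'SPEAKER_01': 'Bob'}
```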
Current scope and performance:
- Up to 10 participants
- Meetings up to 2 hours long
- Processing time: roughly 1-2x the meeting duration
- Best results when all participants are visible on camera
- Designed for single-room, single-microphone setups
Planned features:
- VS Code extension integration
- Electron desktop app wrapper
- Real-time processing capabilities
- GPU acceleration support
Privacy:
- 100% Local Processing: No data leaves your machine by default
- Optional Cloud Services: Can be configured, but are disabled by default
- No Data Collection: No telemetry or usage tracking
License: [Your License Here]
Contributions welcome! Please read CONTRIBUTING.md for guidelines.
For issues or questions, please open a GitHub issue.