nayyarsan/TeamsRecordingSubtitleGenerator

Webex Meeting Speaker Labeling Tool (MVP)

An offline Python tool that post-processes Webex meeting recordings to identify and label speakers using audio diarization and video analysis.

Features

  • Audio Diarization: Identifies distinct speakers from audio tracks
  • Video Analysis: Detects faces and tracks lip movements to associate speakers with visual identities
  • Speaker Naming: Automatically extracts participant names from meeting introductions
  • Multi-format Output: Generates labeled SRT subtitles and structured JSON metadata
  • Annotated Video Generation: Creates video with face detection boxes and speaker-labeled subtitles
  • Web UI Viewer: Interactive interface to view videos with synchronized transcripts
  • Auto-Transcription: Built-in Whisper ASR support when transcript is not available
  • CPU-Optimized: Runs locally without GPU requirements
  • Privacy-First: All processing happens on your machine

Use Cases

  • Post-process Webex meetings recorded in conference rooms (single microphone, multiple participants)
  • Generate accurate speaker-labeled transcripts
  • Create searchable meeting records with speaker attribution

System Requirements

  • Python 3.8+
  • CPU-based processing (no GPU required)
  • About 4 GB of RAM for typical meetings
  • Disk space for temporary processing files

Installation

# Clone the repository
git clone <repository-url>
cd webex-speaker-labeling  # use the directory that git clone created, if it differs

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Quick Start

Basic processing:

python process_meeting.py \
  --video path/to/meeting.mp4 \
  --transcript path/to/transcript.vtt \
  --output-dir ./output

Auto-transcription (no transcript file):

python process_meeting.py \
  --video path/to/meeting.mp4 \
  --output-dir ./output \
  --asr-model base

Generate annotated video with face detection:

python process_meeting.py \
  --video path/to/meeting.mp4 \
  --output-dir ./output \
  --annotated-video

Launch web UI viewer:

python process_meeting.py \
  --video path/to/meeting.mp4 \
  --output-dir ./output \
  --web-ui

Or view already-processed videos:

python view_videos.py --output-dir ./output --port 5000

Then open http://localhost:5000 in your browser.
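
To process several recordings in one go, the commands above can be wrapped in a small script. The following is a minimal sketch, not part of the tool: it uses only the --video, --transcript, --output-dir, and --asr-model flags shown above, and it assumes (hypothetically) that a transcript, when present, sits next to the video with the same file name and a .vtt extension. The helper names build_command and process_folder are illustrative.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_command(video: Path, output_dir: Path,
                  transcript: Optional[Path] = None,
                  asr_model: str = "base") -> List[str]:
    """Build the process_meeting.py command line for one recording."""
    cmd = [sys.executable, "process_meeting.py",
           "--video", str(video),
           "--output-dir", str(output_dir)]
    if transcript is not None and transcript.exists():
        cmd += ["--transcript", str(transcript)]
    else:
        # No transcript file found: fall back to the built-in Whisper ASR.
        cmd += ["--asr-model", asr_model]
    return cmd


def process_folder(folder: Path, output_root: Path) -> None:
    """Run the tool once per .mp4 file, each into its own output subfolder."""
    for video in sorted(folder.glob("*.mp4")):
        # Assumption: a matching transcript shares the video's file stem.
        transcript = video.with_suffix(".vtt")
        cmd = build_command(video, output_root / video.stem, transcript)
        subprocess.run(cmd, check=True)  # raises if a run fails


# Example (hypothetical folder names), run from the repository root:
# process_folder(Path("recordings"), Path("output"))
```

Writing each recording to its own output subfolder is a design choice of this sketch, so that the generated files from different meetings cannot overwrite one another.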

Output Files

  • meeting_labeled.srt: Subtitle file with speaker names
  • meeting_labeled.json: Structured metadata with speaker segments, timestamps, and confidence scores
  • meeting_annotated.mp4: (Optional) Video with face detection boxes and speaker-labeled subtitles
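
The labeled SRT file can be post-processed with a few lines of Python, for example to see how long each participant spoke. The sketch below relies on the standard SRT cue layout (index, timing line, text) and on one assumption that this README does not confirm: that each cue's text starts with the speaker name followed by a colon, as in "Alice: Good morning". If the actual label format differs, the partition step needs adjusting. The function name speaking_time is illustrative.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert the parts of an SRT timestamp to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def speaking_time(srt_text: str) -> Dict[str, float]:
    """Sum cue durations per speaker, assuming cues look like 'Name: text'."""
    totals: Dict[str, float] = defaultdict(float)
    # Cues in an SRT file are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match is None:
                continue
            g = match.groups()
            duration = _seconds(*g[4:]) - _seconds(*g[:4])
            text = " ".join(lines[i + 1:]).strip()
            # Assumed label format: everything before the first colon.
            speaker, sep, _ = text.partition(":")
            name = speaker.strip() if sep else "Unknown"
            totals[name] += duration
            break  # one timing line per cue
    return dict(totals)


sample = """1
00:00:01,000 --> 00:00:04,000
Alice: Good morning, everyone.

2
00:00:04,500 --> 00:00:06,000
Bob: Morning.

3
00:00:06,000 --> 00:00:08,500
Alice: Let's get started.
"""
print(speaking_time(sample))
```

For the sample above this prints {'Alice': 5.5, 'Bob': 1.5}. Unlabeled text that happens to contain a colon would be misattributed, so treat the totals as a rough summary.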

Project Structure

webex-speaker-labeling/
├── src/
│   ├── audio/           # Audio processing and diarization
│   ├── video/           # Face detection and tracking
│   ├── fusion/          # Audio-visual alignment
│   ├── naming/          # Speaker name extraction
│   ├── output/          # Output generation
│   ├── visualizer.py    # Video annotation with face boxes
│   └── web_ui.py        # Flask web viewer
├── process_meeting.py   # Main CLI entry point
├── view_videos.py       # Standalone web UI viewer
├── config.yaml          # Configuration parameters
└── requirements.txt     # Python dependencies

Configuration

Edit config.yaml to customize:

  • Video frame sampling rate
  • Diarization parameters
  • Face detection thresholds
  • Name extraction patterns

Limitations (MVP)

  • Up to 10 participants
  • Meeting duration up to 2 hours
  • Processing time: roughly 1-2x the meeting duration (a 1-hour meeting takes about 1 to 2 hours)
  • Best results when all participants are visible on camera
  • Single-room, single-microphone setups
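
The limits above can be turned into a quick pre-flight check before starting a long run. This is a small sketch based only on the numbers listed here (10 participants, 2 hours, 1-2x processing time); the function names are illustrative, and the tool itself is not known to perform such a check.

```python
from typing import List, Tuple

MAX_PARTICIPANTS = 10      # from "Up to 10 participants"
MAX_DURATION_MIN = 120     # from "Meeting duration up to 2 hours"


def estimate_processing_minutes(duration_min: float) -> Tuple[float, float]:
    """Return the (low, high) estimate using the ~1-2x rule of thumb."""
    return (duration_min * 1.0, duration_min * 2.0)


def check_limits(duration_min: float, participants: int) -> List[str]:
    """Return a warning for each documented MVP limit that is exceeded."""
    warnings = []
    if duration_min > MAX_DURATION_MIN:
        warnings.append(
            f"Duration {duration_min} min exceeds {MAX_DURATION_MIN} min")
    if participants > MAX_PARTICIPANTS:
        warnings.append(
            f"{participants} participants exceeds {MAX_PARTICIPANTS}")
    return warnings


low, high = estimate_processing_minutes(45)
print(f"Expect roughly {low:.0f} to {high:.0f} minutes of processing")
print(check_limits(45, 6) or "Within documented limits")
```

A 45-minute meeting with 6 participants passes both checks and is estimated at 45 to 90 minutes of processing.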

Future Enhancements

  • VS Code extension integration
  • Electron desktop app wrapper
  • Real-time processing capabilities
  • GPU acceleration support

Privacy & Security

  • Local Processing by Default: All processing runs on your machine, and no data leaves it unless you enable a cloud service
  • Optional Cloud Services: Can be configured, but are disabled by default
  • No Data Collection: No telemetry or usage tracking

License

[Your License Here]

Contributing

Contributions welcome! Please read CONTRIBUTING.md for guidelines.

Support

For issues or questions, please open a GitHub issue.

About

Generate speaker-labeled subtitles for Microsoft Teams recordings using Webex AI and Hugging Face models. Includes React frontend and visualization tools.
