Multi-modal AI system for predicting dyadic synchrony (coordinated behavior between two people) using video, audio, and fNIRS brain signals.
Three independent modality pipelines feed into a fusion model:
```
Video (DINOv2-small) ──→ temporal LSTM ──→ features ──┐
Audio (WavLM-base+)  ──→ projection    ──→ features ──┼──→ fusion ──→ synchrony
fNIRS (DDPM U-Net)   ──→ per-pair enc  ──→ features ──┘
```
Each pipeline can run independently or be combined at the fusion level. Features are pre-extracted to disk for fast downstream training.
### Video
- Backbone: DINOv2-small (384-dim, 224×224 input)
- Temporal: LSTM or attention over frame sequences
- Training: Two-stage fine-tuning (frozen backbone, then differential-LR unfreeze)
- Features: Pre-extracted at multiple resolutions (112, 168, 224)
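The two-stage schedule can be sketched as optimizer param groups. This is a minimal illustration (the helper name and the learning rates are assumptions, not the repo's actual training code); the returned dicts follow the per-parameter-group format that `torch.optim` optimizers accept:

```python
def two_stage_param_groups(backbone_params, head_params, stage,
                           backbone_lr=1e-5, head_lr=1e-3):
    """Build optimizer param groups for two-stage fine-tuning.

    Stage 1: backbone stays frozen (excluded entirely), only the head trains.
    Stage 2: backbone is unfrozen with a much smaller (differential) LR.
    Learning rates here are illustrative defaults.
    """
    if stage == 1:
        return [{"params": head_params, "lr": head_lr}]
    return [
        {"params": backbone_params, "lr": backbone_lr},
        {"params": head_params, "lr": head_lr},
    ]
```

In stage 2 the backbone group's LR is typically one to two orders of magnitude below the head's, so pretrained weights drift slowly while the head keeps adapting.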
### Audio
- Backbone: WavLM-base-plus (768-dim per layer, 16 kHz input)
- Features: Per-layer extraction for learned layer weighting
- Supports: Audio files (WAV, FLAC) and video files (MP4, MOV via ffmpeg)
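Learned layer weighting typically keeps one scalar per backbone layer, softmaxes them, and takes the weighted sum of the per-layer features. A dependency-free sketch (plain lists stand in for tensors, and the exact weighting scheme is an assumption about this repo's approach):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def weighted_layer_sum(layer_feats, layer_logits):
    """Combine per-layer features with softmax-normalized weights.

    layer_feats: list of L feature vectors (e.g. 768-dim WavLM hidden states);
    layer_logits: L scalars that would be learnable parameters in practice.
    """
    w = softmax(layer_logits)
    dim = len(layer_feats[0])
    return [sum(w[l] * layer_feats[l][d] for l in range(len(layer_feats)))
            for d in range(dim)]
```

With uniform logits this reduces to a plain average of the layers; training then shifts the mass toward whichever layers carry the most task-relevant signal.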
### fNIRS
- Generative: DDPM diffusion model learns HbO/HbR hemodynamic dynamics
- Architecture: Per-pair (feature_dim=2), processing one source-detector pair at a time
- Transfer: Frozen encoder features → child/adult classifier → synchrony
- Quality control: Multi-stage QC with tiered data quality (gold/standard/salvageable)
- Sweep: 4 encoder sizes (micro/small/medium/large) × 5 classifier architectures
### Fusion
- Strategies: Concat, gated, or temporal cross-attention
- Cross-attention: Operates on frame-level sequences (B, T, D), not pooled vectors
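The gated strategy can be illustrated with a per-dimension sigmoid gate that interpolates between two modality vectors. This plain-Python sketch uses hand-set weights in place of learned parameters (the function and parameter names are hypothetical, not the repo's API):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fuse(a, b, gate_weights, gate_bias):
    """Gated fusion of two modality feature vectors of equal length.

    Per output dimension d: g = sigmoid(W_g[d] · [a; b] + b_g[d]),
    then fused[d] = g * a[d] + (1 - g) * b[d]. In a real model W_g and
    b_g are learned; here they are plain-Python stand-ins.
    """
    concat = a + b  # list concatenation plays the role of [a; b]
    fused = []
    for d in range(len(a)):
        g = sigmoid(sum(w * x for w, x in zip(gate_weights[d], concat))
                    + gate_bias[d])
        fused.append(g * a[d] + (1 - g) * b[d])
    return fused
```

The gate lets the model decide, input by input and dimension by dimension, how much to trust each modality; a zero-weight, zero-bias gate yields an even 50/50 blend.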
```bash
# Install (editable, for development)
pip install -e .

# Or with optional audio dependencies
pip install -e ".[audio]"
```

Train the video classifier:

```bash
python -m synchronai.main --video --train classifier \
    --data-dir path/to/labels.csv \
    --save-dir runs/video_classifier \
    --backbone dinov2-small \
    --temporal-aggregation lstm
```

Pretrain the fNIRS per-pair encoder:

```bash
bash scripts/generative_pretrain.sh \
    --save-dir runs/fnirs_perpair \
    --per-pair \
    --unet-base-width 32 \
    --enable-qc --sci-threshold 0.40 --snr-threshold 2.0
```

Extract frozen-encoder fNIRS features to disk:

```bash
python scripts/extract_fnirs_features.py \
    --encoder-weights runs/fnirs_perpair/fnirs_unet_encoder.pt \
    --data-dirs "/path/to/NIRS_data" \
    --output-dir data/fnirs_features \
    --per-pair --enable-qc \
    --include-tiers "gold,standard,salvageable"
```

Train the synchrony classifier on pre-extracted features:

```bash
python scripts/train_fnirs_from_features.py \
    --feature-dir data/fnirs_features \
    --save-dir runs/classifier \
    --pool lstm --hidden-dim 64 \
    --include-tiers "gold,standard" \
    --holdout-tiers "gold,salvageable"
```

Repository layout:

```
src/synchronai/
    models/        # Architectures (video, audio, fNIRS, multimodal)
    training/      # Training loops per modality
    data/          # Datasets, preprocessing, quality control
    inference/     # Prediction and generation
    evaluation/    # IRR analysis, metrics
    utils/         # Logging, wandb, visualization, config
scripts/
    *.py           # Feature extraction, training, analysis utilities
    bsub/          # LSF cluster submission scripts (versioned)
    bsub/archive/  # Deprecated scripts
docs/
    plans/         # Transfer learning plan, upgrade roadmap
    *.md           # Architecture docs, results, troubleshooting
runs/              # Training outputs (gitignored)
data/              # Extracted features, labels (gitignored)
```
All bsub scripts live in scripts/bsub/ and include a SCRIPT_VERSION marker for
log traceability. See CLAUDE.md for cluster conventions and
docs/TROUBLESHOOTING.md for common issues.
```bash
# Submit fNIRS per-pair pretraining (4 model sizes)
sh scripts/bsub/pre_fnirs_perpair_pretrain_bsub.sh

# Submit child/adult classification sweep (20 jobs)
sh scripts/bsub/pre_fnirs_child_adult_sweep_bsub.sh
```

| Document | Purpose |
|---|---|
| CLAUDE.md | Claude Code instructions, cluster conventions |
| AGENTS.md | Development principles, workflows |
| Transfer Learning Plan | fNIRS pipeline roadmap (Phases 1-5) |
| Troubleshooting | Cluster and training debugging |
| Methods & Results | fNIRS generative pretraining paper-style writeup |
| Transfer Learning Fixes | Critical bugs fixed during multimodal integration |
| Multimodal Heatmaps | Visualization of fusion predictions |
- Subject-grouped splits: All train/val splits group by subject_id to prevent data leakage
- Per-pair fNIRS: Training on single source-detector pairs learns universal HbO/HbR dynamics, so the encoder generalizes to any montage configuration
- Quality tiers: Gold (pristine) / standard / salvageable — holdout evaluation on each tier during training
- Pre-extracted features: Frozen backbone features saved to disk for fast classifier sweeps
- No pip install in cluster jobs: Use PYTHONPATH to avoid NFS race conditions
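Subject-grouped splitting can be sketched in plain Python. This is a minimal illustration, not the repo's actual splitter; the sample schema (dicts with a `subject_id` key) is assumed:

```python
import random
from collections import defaultdict

def subject_grouped_split(samples, val_frac=0.2, seed=0):
    """Split samples into train/val so no subject_id appears in both sets.

    Grouping by subject prevents leakage: clips from one person never
    straddle the train/val boundary.
    """
    by_subject = defaultdict(list)
    for s in samples:
        by_subject[s["subject_id"]].append(s)

    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)
    n_val = max(1, int(len(subjects) * val_frac))
    val_subjects, train_subjects = subjects[:n_val], subjects[n_val:]

    train = [s for subj in train_subjects for s in by_subject[subj]]
    val = [s for subj in val_subjects for s in by_subject[subj]]
    return train, val
```

A random row-level split would put some of each subject's clips in both sets, letting the model score well by recognizing individuals rather than synchrony; grouping by subject closes that shortcut.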