
dimkab commented Oct 23, 2025

No description provided.

claude and others added 18 commits October 22, 2025 21:54
This commit adds a complete training pipeline for the InteractionParticle model
on boids trajectory data, with visualization and comparison to true boid rules.

Key Features:
- InteractionParticle model adapted from decomp-gnn (Battaglia et al. 2016)
- Graph Neural Network that learns interaction forces between particles
- Learns from relative positions, velocities, and distances
- Includes learnable particle embeddings
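
For concreteness, here is a minimal sketch of the edge-message architecture described above (relative position, relative velocity, and distance plus endpoint embeddings fed through an MLP). It is an illustration under assumptions, not the actual model.py: the class name, layer sizes, and sum aggregation are all assumed.

```python
# Minimal sketch of the edge-message idea described above, NOT the actual
# model.py implementation; class name, layer sizes, and aggregation are assumed.
import torch
import torch.nn as nn

class InteractionSketch(nn.Module):
    def __init__(self, n_particles, embed_dim=8, hidden_dim=64, dim=2):
        super().__init__()
        # Learnable per-particle embedding, as listed in the features above.
        self.embedding = nn.Embedding(n_particles, embed_dim)
        # Edge MLP: consumes relative position, relative velocity, distance,
        # and both endpoint embeddings; emits a per-edge 2D message.
        in_dim = 2 * dim + 1 + 2 * embed_dim
        self.edge_mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, pos, vel, edge_index, particle_ids):
        src, dst = edge_index                       # (2, E) source/target indices
        rel_pos = pos[src] - pos[dst]
        rel_vel = vel[src] - vel[dst]
        dist = rel_pos.norm(dim=-1, keepdim=True)
        emb_src = self.embedding(particle_ids[src])
        emb_dst = self.embedding(particle_ids[dst])
        msg = self.edge_mlp(torch.cat([rel_pos, rel_vel, dist, emb_src, emb_dst], dim=-1))
        # Sum incoming messages per target node, interpreted as an acceleration.
        return torch.zeros_like(pos).index_add_(0, dst, msg)
```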

Components Added:
- model.py: InteractionParticle model implementation with MLP edge functions
- train.py: Training pipeline with data loading, graph construction, and optimization
- plotting.py: Visualization tools for learned interaction functions
- run_training.py: Main executable script with CLI arguments
- example.py: Quick example demonstrating usage on toy data
- README.md: Comprehensive documentation

Comparison with True Boids:
- Visualizes learned interaction functions vs true boid rules
- Compares separation (inverse-square repulsion), alignment, and cohesion
- Plots show distance-dependent forces learned from trajectory data

Boid Parameters from config.yaml:
- min_separation: 20.0 (separation distance)
- neighborhood_dist: 80.0 (alignment/cohesion range)
- separation_weight: 15.0, alignment_weight: 1.0, cohesion_weight: 0.5

Usage:
python -m collab_env.gnn.interaction_particles.run_training \
    --dataset collab_env/data/boids/boid_single_species_basic.pt \
    --epochs 100 --batch-size 32

Output:
- Model checkpoints (best_model.pt, final_model.pt)
- Plots: training history, interaction functions, comparison with true rules
- Training log and configuration files

Reference: https://github.com/saalfeldlab/decomp-gnn

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Updated the training pipeline to work with existing 2D boids data from
docs/gnn/0a-Simulate_Boid_2D.ipynb while maintaining compatibility with 3D boids.

Key Changes:

Data Loading (train.py):
- Handle AnimalTrajectoryDataset format (returns (position, species) tuples)
- Auto-detect dataset type (2D vs 3D, Dataset vs Tensor)
- Extract positions from dataset objects correctly
- Maintained backward compatibility with raw tensors

2D Boid Rules (plotting.py):
- Updated separation: linear repulsion (avoid_factor * distance) instead of inverse-square
- Updated alignment: step function at visual_range (50 pixels)
- Updated cohesion: linear attraction (centering_factor * distance)
- Changed distance units from arbitrary to pixels (480x480 scene)
- Parameters from boids_gnn_temp/boid.py:
  * visual_range: 50 pixels (for alignment & cohesion)
  * min_distance: 15 pixels (for separation)
  * avoid_factor: 0.05
  * matching_factor: 0.5
  * centering_factor: 0.005
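
As a rough illustration of the true 2D rules the comparison plots are drawn against, the force-vs-distance curves implied by the parameters above can be written as below. Function names and the exact scaling used in plotting.py are assumptions.

```python
# Illustrative force-vs-distance curves for the true 2D boid rules listed above;
# function names and exact scaling are assumptions, not the plotting.py code.
import numpy as np

VISUAL_RANGE = 50.0      # pixels, alignment & cohesion range
MIN_DISTANCE = 15.0      # pixels, separation range
AVOID_FACTOR = 0.05
MATCHING_FACTOR = 0.5
CENTERING_FACTOR = 0.005

def separation_force(d):
    # Linear repulsion (avoid_factor * distance), active only inside min_distance.
    return np.where(d < MIN_DISTANCE, AVOID_FACTOR * d, 0.0)

def alignment_weight(d):
    # Step function: velocity matching applies uniformly inside visual_range.
    return np.where(d < VISUAL_RANGE, MATCHING_FACTOR, 0.0)

def cohesion_force(d):
    # Linear attraction (centering_factor * distance), active inside visual_range.
    return np.where(d < VISUAL_RANGE, CENTERING_FACTOR * d, 0.0)

if __name__ == "__main__":
    d = np.linspace(0.0, 100.0, 201)
    print(separation_force(d).max(), alignment_weight(d).max(), cohesion_force(d).max())
```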

Configuration Loading (run_training.py):
- Auto-detect config files (.pt for 2D, .yaml for 3D)
- Infer config path from dataset path (e.g., dataset.pt -> dataset_config.pt)
- Load species_configs from .pt files for 2D boids
- Fallback to sensible 2D defaults if no config found
- Updated default dataset path to simulated_data/
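
A sketch of the 2D (.pt) branch of this auto-detection, under assumed helper and default names; the actual run_training.py logic may differ:

```python
# Sketch of the config auto-detection described above (2D .pt branch only);
# the helper name and default values are assumptions, not run_training.py code.
from pathlib import Path
import torch

DEFAULT_2D_CONFIG = {"visual_range": 50.0, "min_distance": 15.0, "scene_size": 480.0}

def load_config_for(dataset_path):
    dataset_path = Path(dataset_path)
    # e.g. simulated_data/boid_single_species_basic.pt
    #  ->  simulated_data/boid_single_species_basic_config.pt
    config_path = dataset_path.with_name(dataset_path.stem + "_config.pt")
    if config_path.exists():
        return torch.load(config_path)   # species_configs saved alongside the data
    return DEFAULT_2D_CONFIG             # fall back to sensible 2D defaults
```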

Convenience Script (train_2d_boids.py):
- Quick start script specifically for 2D boids
- Automatically calculates the normalized visual_range (50/480 ≈ 0.104)
- Preset options for all available 2D datasets:
  * boid_single_species_basic.pt
  * boid_single_species_noisy.pt
  * boid_single_species_high_cluster_high_speed.pt
  * boid_single_species_short.pt
- Quick test mode (--quick for 10 epochs)

Documentation (README.md):
- Added Quick Start section highlighting 2D boids
- Documented differences between 2D and 3D boid rules
- Added separate usage examples for 2D vs 3D
- Listed all available 2D datasets
- Explained parameter mappings (visual_range, centering_factor, etc.)

Usage:
# Simple 2D training
python -m collab_env.gnn.interaction_particles.train_2d_boids

# Quick test
python -m collab_env.gnn.interaction_particles.train_2d_boids --quick

# Different dataset
python -m collab_env.gnn.interaction_particles.train_2d_boids \
    --dataset simulated_data/boid_single_species_noisy.pt

The model will now correctly learn and compare against the true 2D boid
rules (linear separation, step-function alignment, linear cohesion) instead
of the 3D rules (inverse-square separation, etc.).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Removed all 3D boids support and complex format handling to make the code
cleaner and focused solely on the 2D boids data from docs/gnn/0a-Simulate_Boid_2D.ipynb.

Data Loading (train.py):
- Removed support for raw tensors, dict format, 3D data
- Now only loads AnimalTrajectoryDataset format
- Simplified to directly extract (positions, species) tuples
- Removed complex normalization (data already normalized to [0,1])
- Set p_range = 1.0 since 2D data is pre-normalized
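
The simplified loading path can be pictured roughly as follows; the dataset item format ((positions, species) tuples, positions already in [0, 1]) is taken from the notes above, while the finite-difference velocity step and all names are assumptions:

```python
# Rough sketch of the simplified 2D loading path; dataset items are assumed to be
# (positions, species) tuples with positions already normalized to [0, 1].
import torch

P_RANGE = 1.0  # fixed, since the 2D data is pre-normalized

def trajectories_from_dataset(dataset):
    """Collect positions and (finite-difference) velocities per trajectory."""
    samples = []
    for positions, species in dataset:               # AnimalTrajectoryDataset items
        positions = positions.float()                # (T, N, 2)
        velocities = positions[1:] - positions[:-1]  # one plausible velocity estimate
        samples.append({"pos": positions[:-1], "vel": velocities, "species": species})
    return samples
```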

Plotting (plotting.py):
- Removed all 3D boid rule implementations
- Only 2D rules remain: linear separation, step alignment, linear cohesion
- Changed all distances from "original units" to "pixels"
- Simplified comparison plot titles to emphasize 2D
- Used config dict directly (no .get() fallbacks)
- Added scene_size parameter (default 480.0)
- Cleaner force function implementations

Configuration (run_training.py):
- Removed YAML config support (was for 3D boids)
- Only loads .pt config files (species_configs)
- Simplified auto-detection logic
- Cleaner default config handling
- Updated all logging to say "2D Boids"
- Removed complex format branching

Documentation (README.md):
- Removed all mentions of 3D boids
- Removed "both 2D and 3D" claims
- Simplified to single "2D Boids Data" title
- Removed 3D usage examples
- Removed 3D boid rule documentation
- Cleaner, more focused instructions
- Emphasized AnimalTrajectoryDataset format
- Updated data format section to be 2D-specific

Benefits:
- Much simpler codebase (~100 lines removed)
- Easier to understand and maintain
- No confusion about which format to use
- Faster data loading (no format detection)
- Clearer error messages
- More focused documentation

The code now has a single clear purpose: train InteractionParticle models
on 2D boids data from the existing simulated_data/ directory.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Removed unnecessary example.py script since train_2d_boids.py provides
a better quick start with real data.

Changes:
- Deleted example.py (139 lines of toy data code)
- Updated __init__.py docstring to say "2D boids" and add quick start
- Updated train.py docstring to say "2D boids data"
- Fixed lingering "2D or 3D boids" help text to just "2D boids"
- All docstrings now consistently refer to 2D boids only

Final module structure:
├── __init__.py          (535 bytes)
├── model.py             (8.2K) - InteractionParticle GNN
├── train.py             (11K) - Training pipeline
├── plotting.py          (14K) - Visualization & comparison
├── run_training.py      (9.6K) - Main CLI
├── train_2d_boids.py    (2.4K) - Quick start script
└── README.md            (10K) - Documentation

Total: ~56K of clean, focused code for 2D boids training

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Removed three intermediate documentation files from root:
- INTERACTION_PARTICLES_SUMMARY.md (original implementation notes)
- INTERACTION_PARTICLES_2D_UPDATE.md (2D adaptation details)
- INTERACTION_PARTICLES_FINAL.md (final summary)

All documentation is now in the module's README.md:
  collab_env/gnn/interaction_particles/README.md

No need for multiple markdown files tracking the development process.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Replaced the Python wrapper script with a simpler shell script containing
commented examples for different training scenarios.

Changes:
- Deleted train_2d_boids.py (74 lines of Python wrapper code)
- Created train_2d_boids.sh (75 lines of bash with examples)
- Updated README.md to reference the shell script
- Updated __init__.py quick start to use shell script

Benefits of shell script approach:
- Easier to read and modify for users
- No Python wrapper logic needed
- Clear examples for all common use cases
- Just uncomment the command you want to run
- All options visible at a glance

The shell script includes examples for:
1. Quick test (10 epochs) - default, uncommented
2. Full training (100 epochs)
3. Noisy dataset
4. High cluster dataset
5. Short dataset
6. High capacity model (256 hidden, 32 embed, 4 layers)
7. Custom visual range
8. Evaluation only mode

Each example shows the complete command with all parameters:
- --dataset (which data file)
- --epochs (how many)
- --batch-size (default 32)
- --visual-range (0.104 for 2D boids)
- --save-dir (where to save results)

Usage:
  ./collab_env/gnn/interaction_particles/train_2d_boids.sh

Now we have one Python training script (run_training.py) and one shell script with examples, giving a cleaner separation of concerns.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
New features:
- generate_rollout(): Autoregressive multi-step trajectory prediction
- evaluate_rollout(): Evaluate model on validation set with rollouts
- plot_rollout_comparison(): Side-by-side ground truth vs predicted trajectories
- plot_rollout_error_over_time(): Error accumulation over timesteps
- create_rollout_report(): Comprehensive visualization and metrics report
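
The core autoregressive loop behind generate_rollout() looks conceptually like the sketch below; the signature, the graph-rebuild helper, and the Euler integration step are assumptions rather than the actual implementation:

```python
# Conceptual sketch of autoregressive rollout: feed the model its own predictions
# back in. Signature, edge_fn helper, and Euler integration are assumptions.
import torch

@torch.no_grad()
def rollout_sketch(model, pos0, vel0, edge_fn, particle_ids, n_steps=50, dt=1.0):
    pos, vel = pos0.clone(), vel0.clone()
    trajectory = [pos.clone()]
    for _ in range(n_steps):
        edge_index = edge_fn(pos)                     # rebuild the interaction graph
        acc = model(pos, vel, edge_index, particle_ids)
        vel = vel + dt * acc                          # simple Euler update
        pos = pos + dt * vel
        trajectory.append(pos.clone())
    return torch.stack(trajectory)                    # (n_steps + 1, N, 2)
```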

CLI additions:
- --evaluate-rollout: Enable rollout evaluation
- --n-rollout-steps: Number of steps for rollout (default: 50)
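
Wiring the two new flags into an argparse-based CLI is straightforward; the parser object and help strings below are illustrative, only the flag names and the default of 50 come from this PR:

```python
# Illustrative argparse wiring for the two new flags; only the flag names and
# the default of 50 come from the PR, the rest is assumed.
import argparse

parser = argparse.ArgumentParser(description="InteractionParticle training (2D boids)")
parser.add_argument("--evaluate-rollout", action="store_true",
                    help="Run rollout evaluation on the validation split")
parser.add_argument("--n-rollout-steps", type=int, default=50,
                    help="Number of autoregressive steps per rollout")
```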

Metrics reported:
- Mean position error (± std)
- Mean velocity error (± std)
- Per-trajectory error analysis

Validation dataset:
- Uses same 80/20 train/val split as training
- Ensures model tested on unseen trajectories
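
One common way to realize such an 80/20 split is shown below; whether the code actually uses random_split (and which seed) is an assumption:

```python
# Illustrative 80/20 split; the use of random_split and the seed are assumptions.
import torch
from torch.utils.data import Dataset, random_split

def split_80_20(dataset: Dataset, seed: int = 0):
    n_train = int(0.8 * len(dataset))
    generator = torch.Generator().manual_seed(seed)   # reproducible split
    return random_split(dataset, [n_train, len(dataset) - n_train], generator=generator)
```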

Updated documentation:
- README.md: Added "Rollout Evaluation" section with examples
- train_2d_boids.sh: Added rollout evaluation example
- __init__.py: Exported new rollout functions