This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
End-to-end ML training pipeline for wildfire detection using YOLO models. Handles data preparation, model training, hyperparameter optimization, model export (ONNX/NCNN), and GitHub releases.
```bash
uv sync                    # Install dependencies into virtual environment
uv run pre-commit install  # Install git hooks
```

Data is managed via DVC with an S3 remote (`s3://pyronear-ml/dvc`). Requires AWS credentials under the `pyronear` profile.
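Before pulling data from the DVC remote, it can help to confirm that the `pyronear` profile is configured. The sketch below assumes the credentials live in the standard AWS shared credentials file (`~/.aws/credentials`). The helper name `has_aws_profile` is hypothetical and not part of this repository. The helper also ignores other credential sources, such as environment variables, SSO, or a `[profile pyronear]` entry in `~/.aws/config`.

```python
import configparser
from pathlib import Path
from typing import Optional


def has_aws_profile(profile: str = "pyronear", path: Optional[Path] = None) -> bool:
    """Return True if `profile` defines both access keys in an AWS credentials file.

    Hypothetical helper. The shared credentials file is INI-style, with one
    section per profile, e.g. a `[pyronear]` section.
    """
    # Default to the standard shared credentials location.
    path = path if path is not None else Path.home() / ".aws" / "credentials"
    if not path.is_file():
        return False
    parser = configparser.ConfigParser()
    parser.read(path)
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    # Both keys must be present for static credentials to work.
    return "aws_access_key_id" in section and "aws_secret_access_key" in section
```

Calling `has_aws_profile()` with no arguments checks the default file for the `pyronear` section.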
```bash
# Development
uv run ruff check .      # Lint
uv run ruff format .     # Format
uv run mypy src/         # Type check
uv run pytest            # Run tests (or: make run_test_suite)

# DVC pipeline
dvc dag                  # View pipeline DAG
dvc repro                # Run full pipeline

# MLflow
make mlflow_start        # Start experiment tracking UI at localhost:5000
make mlflow_stop         # Stop the tracking UI

# Hyperparameter search
make run_yolo_wide_hyperparameter_search    # 50 iterations, fast
make run_yolo_narrow_hyperparameter_search  # 5 iterations, deep

# Benchmark
make run_yolo_benchmark  # Generate benchmark CSV from trained models
```

Pipeline stages:

```text
01_raw/ (wildfire dataset)
  → build_model_input → 03_model_input/ (YOLO format, 5% sample)
  → train_yolo_baseline_small / train_yolo_baseline / train_yolo_best
  → build_manifest_yolo_best → 06_reporting/
  → export_yolo_best (ONNX + NCNN, cpu/mps matrix) → 04_models/yolo-export/
```
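The stage flow above can be modeled as a dependency graph and ordered topologically. This sketch makes two assumptions. First, the three training stages all read `03_model_input/`. Second, the manifest and export stages both depend only on `train_yolo_best`. The `STAGE_DEPS` mapping is illustrative, and the authoritative graph is whatever `dvc dag` prints.

```python
from graphlib import TopologicalSorter

# Illustrative map of stage -> stages it depends on.
# Assumed from the pipeline diagram; `dvc dag` is the source of truth.
STAGE_DEPS: dict[str, set[str]] = {
    "build_model_input": set(),
    "train_yolo_baseline_small": {"build_model_input"},
    "train_yolo_baseline": {"build_model_input"},
    "train_yolo_best": {"build_model_input"},
    "build_manifest_yolo_best": {"train_yolo_best"},
    "export_yolo_best": {"train_yolo_best"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())
```

Like this ordering, `dvc repro` respects stage dependencies. DVC also skips stages whose dependencies have not changed since the last run.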
- `src/pyro_train/`: library code
  - `model/yolo/train.py`: core training function with default hyperparameters
  - `model/yolo/utils.py`: YOLO version/size enums
  - `model/yolo/hyperparameters/space.py`: hyperparameter space parsing
  - `data/`: YAML utilities for dataset configs
  - `utils.py`: shared utilities (file hashing)
- `scripts/model/yolo/`: CLI entry points
  - `train.py`: training CLI
  - `export.py`: export to ONNX/NCNN
  - `hyperparameter_search.py`: random search
  - `benchmark.py`: benchmark trained models
  - `build_manifest.py`: build model metadata
  - `configs/`: training configs (`baseline.yaml`, `best.yaml`)
  - `spaces/`: hyperparameter search spaces (`wide.yaml`, `narrow.yaml`, `default.yaml`)
- `scripts/release.py`: GitHub release automation (requires `GITHUB_ACCESS_TOKEN`)
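`hyperparameter_search.py` runs a random search over a space defined in `spaces/*.yaml`. The sketch below illustrates the general technique under assumed conventions. The space format is hypothetical: each hyperparameter maps to a `(low, high)` float range or to a list of choices. The `EXAMPLE_SPACE` values and the `sample_config` / `random_search` helpers are also hypothetical. None of this is the schema parsed by `space.py`.

```python
import random
from typing import Any, Union

# Hypothetical space format: (low, high) tuples are sampled uniformly,
# lists are sampled as categorical choices.
Space = dict[str, Union[tuple[float, float], list[Any]]]

EXAMPLE_SPACE: Space = {
    "lr0": (1e-4, 1e-2),
    "momentum": (0.8, 0.98),
    "imgsz": [640, 1024],
    "optimizer": ["SGD", "AdamW"],
}


def sample_config(space: Space, rng: random.Random) -> dict[str, Any]:
    """Draw one configuration by sampling each hyperparameter independently."""
    config: dict[str, Any] = {}
    for name, spec in space.items():
        if isinstance(spec, tuple):
            low, high = spec
            config[name] = rng.uniform(low, high)
        else:
            config[name] = rng.choice(spec)
    return config


def random_search(space: Space, n_iterations: int, seed: int = 0) -> list[dict[str, Any]]:
    """Return `n_iterations` sampled configs; training and scoring are omitted."""
    rng = random.Random(seed)  # seeded so a sweep is reproducible
    return [sample_config(space, rng) for _ in range(n_iterations)]
```

The Makefile targets pair a search space with an iteration budget: 50 fast iterations for the wide search and 5 deeper ones for the narrow search. In this sketch, each budget corresponds to the `n_iterations` argument.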
- DVC manages data versioning and pipeline reproducibility; never commit data files directly.
- MLflow tracks all training experiments; the experiment store is gitignored but versioned with DVC.
- Model releases follow an adjective+animal naming convention with matching initials (e.g., "dazzling dragonfly").
- Exports target both ONNX and NCNN for edge/mobile deployment.
- Pre-commit hooks prevent direct commits to `main`; always use pull requests.
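The release naming convention can be partly checked mechanically. This minimal sketch assumes a release name is exactly two alphabetic words separated by whitespace. The helper `is_valid_release_name` is hypothetical. It checks only the structure and the matching initials; it cannot tell whether the words really are an adjective and an animal.

```python
def is_valid_release_name(name: str) -> bool:
    """Check the two-word, matching-initials release naming convention."""
    words = name.split()
    if len(words) != 2:
        return False
    adjective, animal = words
    if not (adjective.isalpha() and animal.isalpha()):
        return False
    # Matching initials, case-insensitive: "dazzling dragonfly" -> "d" == "d".
    return adjective[0].lower() == animal[0].lower()
```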