Using Sentinel-1 SAR and Ensemble Machine Learning (2015–2025)
Cristian Espinal Maya · Santiago Jimenez Londono
School of Applied Sciences and Engineering, Universidad EAFIT, Medellin, Colombia
A reproducible, open-access framework that delivers municipality-level flood risk statistics for seven departments of Colombia. Each department study processes Sentinel-1 C-band SAR scenes (2015–2025) in Google Earth Engine, using adaptive Otsu thresholding to map surface water, then feeds the predictor variables into a weighted ensemble of Random Forest, XGBoost, and LightGBM.
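The Otsu step can be illustrated outside Earth Engine with a minimal pure-Python sketch. It is not the repository's code: it assumes the input is a list of SAR backscatter values in dB for one scene or tile, and that "adaptive" means the threshold is recomputed for each scene or tile rather than fixed globally. The function names `otsu_threshold` and `water_mask` are hypothetical.

```python
def otsu_threshold(values, bins=256):
    """Return the cut that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # flat input: there is no meaningful split
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        # Clamp so the maximum value falls in the last bin.
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]                # samples at or below bin i
        sum0 += centers[i] * hist[i]
        w1 = total - w0              # samples above bin i
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = lo + (i + 1) * width  # upper edge of bin i
    return best_t


def water_mask(values, threshold):
    """Open water is a weak radar reflector, so low backscatter marks water."""
    return [v < threshold for v in values]


# Illustrative values only (dB): four dark water pixels, four brighter land pixels.
scene = [-22.0, -21.5, -23.0, -22.5, -8.0, -7.5, -9.0, -8.5]
threshold = otsu_threshold(scene)
mask = water_mask(scene, threshold)
```

Because the threshold is derived from each scene's own histogram, the same routine adapts to scenes with different overall brightness. It is only meaningful when the histogram is roughly bimodal, that is, when the scene contains both water and land.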
| Department | Municipalities | Study Area | Ensemble AUC-ROC | Subdirectory |
|---|---|---|---|---|
| Antioquia | 125 | 63,612 km² | 0.94 ± 0.02 | departments/antioquia |
| Bolívar | 46 | 25,978 km² | — | departments/bolivar |
| Cauca | 42 | 29,308 km² | — | departments/cauca |
| Chocó | 30 | 46,530 km² | — | departments/choco |
| Guajira | 15 | 20,848 km² | — | departments/guajira |
| Magdalena | 30 | 23,188 km² | — | departments/magdalena |
| Nariño | 64 | 33,268 km² | — | departments/narino |
| Total | 352 | 242,732 km² | | |
Guajira includes a specialized Sand Exclusion Layer for arid/semi-arid terrain adaptation.
colombia-flood-risk/
├── README.md
├── departments/
│ ├── antioquia/ # Full pipeline + manuscript
│ ├── bolivar/ # Full pipeline + manuscript
│ ├── cauca/ # Full pipeline + manuscript
│ ├── choco/ # Full pipeline + manuscript
│ ├── guajira/ # Full pipeline + manuscript (arid adaptation)
│ ├── magdalena/ # Full pipeline + manuscript
│ └── narino/ # Full pipeline + manuscript
Each department subdirectory contains:
- scripts/ — Processing and analysis pipeline (SAR water detection, ML susceptibility, population exposure, climate analysis)
- overleaf/ — Manuscript in LaTeX (preprint format)
- gee_config.py — Google Earth Engine configuration
- requirements.txt — Python dependencies
- README.md — Department-specific results and metrics
- SAR Water Detection — Sentinel-1 scenes processed with adaptive Otsu thresholding → monthly water extent composites at 10 m resolution
- Feature Engineering — 18 predictor variables (HAND, elevation, slope, SAR flood frequency, land cover, population density, etc.)
- Ensemble ML — Weighted ensemble of Random Forest, XGBoost, and LightGBM with spatial five-fold cross-validation
- Population Exposure — Susceptibility surface overlaid with 100 m population data
- Climate Analysis — ENSO influence on flood extent (La Niña vs. El Niño)
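The Ensemble ML step combines two ideas that a short sketch can make concrete: spatial cross-validation and weighted averaging of model outputs. The code below is an illustration under stated assumptions, not the repository's implementation. It assumes folds are built by assigning square grid blocks to folds and that the ensemble is a normalized weighted mean of per-model probabilities. The function names, block size, and weights are hypothetical; the source does not state how the weights were chosen.

```python
def spatial_fold(x, y, block_size, n_folds=5):
    """Assign a sample to a fold based on the grid block that contains it.

    All samples inside one block share a fold, so near neighbours are never
    split between the training and validation sets.
    """
    col = int(x // block_size)
    row = int(y // block_size)
    return (row + col) % n_folds


def weighted_ensemble(probabilities, weights):
    """Combine per-model flood probabilities using normalized weights."""
    total_weight = sum(weights[name] for name in probabilities)
    weighted_sum = sum(weights[name] * p for name, p in probabilities.items())
    return weighted_sum / total_weight


# Hypothetical weights and per-model predictions for a single pixel.
weights = {"random_forest": 0.5, "xgboost": 0.25, "lightgbm": 0.25}
pixel_probs = {"random_forest": 0.8, "xgboost": 0.6, "lightgbm": 0.7}
susceptibility = weighted_ensemble(pixel_probs, weights)

# Two points in the same 100 m block land in the same fold.
fold_a = spatial_fold(120.0, 40.0, block_size=100.0)
fold_b = spatial_fold(180.0, 90.0, block_size=100.0)
```

Blocking by location matters because nearby pixels are strongly correlated. A random split would place near-duplicates of training pixels in the validation set and tend to inflate scores such as AUC-ROC.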
If you use this work, please cite the corresponding department preprint. See each subdirectory's README for specific citation details.