A high-performance satellite imagery dataset creation tool for computer vision.
Creating machine learning datasets from satellite imagery is traditionally a frustrating experience. Wrestling with heavy, notoriously complex GIS libraries like GDAL is a massive pain point for researchers who just want to train models.
Existing geospatial ecosystems are heavily analysis-first. mapcv is different: it is explicitly designed as a data-creation-first tool. It provides a fast, end-to-end pipeline, written in Python and Rust, that is optimized for three steps: fetching map tiles, rasterizing complex labels (KML/GeoJSON), and splitting areas into uniform, ML-ready patches.
The target audience is computer vision researchers, data scientists, and ML engineers who need an efficient and reliable way to prepare high-quality satellite datasets for training segmentation models without working directly with a GIS stack.
    pip install mapcv

Requires Python 3.10 or higher. Pre-built wheels cover Linux, macOS, and Windows.
    mapcv init my_dataset.yaml

Open the file and fill in your region, tile source, and (optionally) label path. Everything else has sensible defaults.
Note: mapcv supports tile sources that serve standard 256x256 pixel tiles (OpenStreetMap, Esri, Google Satellite, CartoDB, and most providers). Sources serving 512x512 tiles are not supported and will produce incorrectly scaled patches.
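To illustrate why the tile size matters, here is a minimal sketch of the standard XYZ ("slippy map") Web Mercator arithmetic that the providers listed above use. It is not taken from mapcv's source; the function names `lonlat_to_tile` and `metres_per_pixel` are hypothetical and exist only for this example.

```python
import math

TILE_SIZE = 256  # pixels per tile edge, the size the note above requires
EARTH_CIRCUMFERENCE_M = 40_075_016.686  # WGS 84 equatorial circumference


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a lon/lat point.

    Hypothetical helper for illustration; not part of mapcv's API.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def metres_per_pixel(lat, zoom, tile_size=TILE_SIZE):
    """Approximate ground resolution of one pixel at a given latitude."""
    return (
        EARTH_CIRCUMFERENCE_M
        * math.cos(math.radians(lat))
        / (tile_size * 2 ** zoom)
    )


if __name__ == "__main__":
    print(lonlat_to_tile(-0.1276, 51.5074, 10))
    print(metres_per_pixel(51.5074, 10))
    print(metres_per_pixel(51.5074, 10, tile_size=512))
```

At the same zoom level, a 512-pixel tile covers the same ground as a 256-pixel tile with twice as many pixels per edge, so its per-pixel resolution is half as large. A pipeline that assumes 256-pixel tiles would presumably compute the wrong ground scale for such a source, which is consistent with the scaling problem described in the note.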
    mapcv generate my_dataset.yaml

This fetches tiles, rasterizes labels, extracts patches, and writes everything to the output directory specified in your config.
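As a rough picture of what patch extraction involves, the sketch below computes the pixel windows for cutting a stitched image into uniform square patches. It is an assumption-laden illustration, not mapcv's implementation: the function name `patch_windows`, the `stride` parameter, and the choice to drop partial patches at the right and bottom edges are all hypothetical.

```python
def patch_windows(width, height, patch_size, stride=None):
    """Yield (left, top, right, bottom) pixel boxes for full-size patches.

    Hypothetical helper for illustration. Partial patches at the right and
    bottom edges are skipped, which is an assumption of this sketch.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    stride = stride or patch_size  # default: non-overlapping patches
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            yield (left, top, left + patch_size, top + patch_size)


if __name__ == "__main__":
    # A 1024x768 mosaic cut into 256-pixel patches gives a 4x3 grid.
    boxes = list(patch_windows(1024, 768, 256))
    print(len(boxes), boxes[0], boxes[-1])
```

The same windows can be applied to both the imagery and the rasterized label mask so that each image patch stays aligned with its label patch.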
    mapcv split ./output --test-ratio 0.15 --val-ratio 0.10

Re-runs the train/val/test split from the existing manifest.json without re-downloading anything.
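For intuition about how the two ratios partition a dataset, here is a minimal sketch of a seeded train/val/test split over a list of patch identifiers. It does not reproduce mapcv's splitting logic or read manifest.json, whose structure is not described here; `split_patches`, its `seed` parameter, and the example file names are hypothetical.

```python
import random


def split_patches(patch_ids, test_ratio=0.15, val_ratio=0.10, seed=0):
    """Partition patch identifiers into train/val/test lists.

    Hypothetical helper for illustration; not mapcv's implementation.
    """
    if test_ratio < 0 or val_ratio < 0 or test_ratio + val_ratio >= 1:
        raise ValueError("ratios must be non-negative and sum to less than 1")
    ids = sorted(patch_ids)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_ratio)
    n_val = round(len(ids) * val_ratio)
    return {
        "test": ids[:n_test],
        "val": ids[n_test:n_test + n_val],
        "train": ids[n_test + n_val:],  # everything left over
    }


if __name__ == "__main__":
    patches = [f"patch_{i:04d}.png" for i in range(100)]
    splits = split_patches(patches)
    print({name: len(items) for name, items in splits.items()})
```

Because the split works only on identifiers, it can be repeated with different ratios without touching the image data, which matches the "without re-downloading anything" behavior described above.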
Full documentation including configuration reference, CLI reference, and API reference is available at tahamukhtar20.github.io/mapcv.
Contributions are welcome. Please review the Contributing Guide and Code of Conduct before opening a pull request.
(Citation information will be added after publication.)
Muhammad Taha Mukhtar · tahamukhtar20+mapcv@gmail.com
This project is licensed under the MIT License — see the LICENSE file for details.