This project automates the download and processing of public finance datasets from the Peruvian Ministry of Economy and Finance (MEF) through its `datastorefiles` service.
It downloads both `.csv` and `.zip` files, extracts the ZIP archives, and organizes the results into a clean folder structure for further processing.
Project structure:

```text
project-root/
│
├── data/
│ ├── bronze/ # Raw downloaded files (CSV and extracted ZIPs)
│ └── silver/ # (Planned) Cleaned/processed datasets
│
├── downloader.py # Main script to download and extract data
├── .gitignore # Ignores raw data files
└── README.md # Project documentation
```
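The `data/` layout appears to follow the bronze/silver layering convention: raw files land in `bronze/`, cleaned ones are meant for `silver/`. As a minimal sketch, assuming the folders are created relative to the project root, they could be set up like this (`ensure_data_layers` and `LAYERS` are hypothetical names, not necessarily what `downloader.py` uses):

```python
from pathlib import Path

# Layers shown in the project tree; "silver" is still marked as planned.
LAYERS = ("bronze", "silver")


def ensure_data_layers(root: Path) -> dict[str, Path]:
    """Create data/<layer>/ under `root` if missing and return the paths by layer name."""
    paths = {}
    for layer in LAYERS:
        path = root / "data" / layer
        path.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
        paths[layer] = path
    return paths
```

Calling it repeatedly is safe because `exist_ok=True` makes the call idempotent.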
Features:

- ⏱️ Decorator to time each download
- 🌐 Downloads `.csv` and `.zip` files directly from the MEF Open Data Portal
- 📦 Automatically extracts `.zip` files and renames the extracted files to match the project's naming convention
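A minimal sketch of how a timing decorator can wrap a download function, assuming only the standard library (`urllib.request`). The names `timed` and `download_file` are hypothetical and may not match the actual implementation in `downloader.py`:

```python
import functools
import shutil
import time
import urllib.request
from pathlib import Path


def timed(func):
    """Print how long each call to `func` takes (hypothetical helper)."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call returns normally or raises.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.2f} s")
    return wrapper


@timed
def download_file(url: str, dest: Path) -> Path:
    """Stream `url` into `dest`, creating parent folders first (hypothetical)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks instead of loading everything into memory
    return dest
```

A call would then look like `download_file("https://fs.datosabiertos.mef.gob.pe/datastorefiles/2025-Ingreso-Diario.csv", Path("data/bronze/SIAF_ingreso_2025.csv"))`. Putting the timing in a decorator keeps the download logic free of bookkeeping code.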
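Clone the repository, then build and run the development container:

```bash
git clone https://github.com/Tooruogata/mef-opendata-analysis.git
cd mef-opendata-analysis
docker build -t mef-opendata-analysis:latest -f .devcontainer/Dockerfile .
docker run -dit --name mef-opendata-analysis -v "$repopath:/workspace" -w /workspace mef-opendata-analysis:latest
```

Set `$repopath` to the absolute path of your local clone so that it is mounted at `/workspace` inside the container.

The `.gitignore` is set up to ignore large raw data files:

```gitignore
**/data/**/*.zip
**/data/**/*.csv
**/data/**/*.xlsx
**/data/**/*.xls
**/data/**/*.json
!**/data/**/*.md
```

This means: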
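- All `.zip`, `.csv`, `.xlsx`, `.xls`, and `.json` files under any `data/` folder are ignored.
- Markdown (`.md`) files under `data/` are explicitly kept tracked (the `!` prefix negates the pattern).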
For example, with the following `list_dataset` configuration:

```python
list_dataset = [
    ('SIAF', '2025-Ingreso-Diario', 'ingreso_2025', 'csv'),
    ('SIAF', '2025-Ingreso-Diario', 'ingreso_2025', 'zip'),
]
```

The script will download:
- `https://fs.datosabiertos.mef.gob.pe/datastorefiles/2025-Ingreso-Diario.csv`
- `https://fs.datosabiertos.mef.gob.pe/datastorefiles/2025-Ingreso-Diario.zip`
It will save them as:
- `data/bronze/SIAF_ingreso_2025.csv`
- `data/bronze/SIAF_fzip_ingreso_2025.csv` (extracted from the ZIP and renamed)
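Judging from the example above, each `list_dataset` tuple seems to hold `(source, remote_name, local_name, ext)`. The sketch below reproduces the URL and file-name mapping shown there and extracts a ZIP into its renamed CSV. It is a sketch under stated assumptions, not the project's code: the helper names are hypothetical, the `fzip_` rule is inferred from the single example, and each archive is assumed to contain exactly one CSV.

```python
import shutil
import zipfile
from pathlib import Path

# Base URL taken from the example downloads above.
BASE_URL = "https://fs.datosabiertos.mef.gob.pe/datastorefiles"
BRONZE_DIR = Path("data/bronze")


def build_url(remote_name: str, ext: str) -> str:
    """Map a remote dataset name and extension to its datastorefiles URL."""
    return f"{BASE_URL}/{remote_name}.{ext}"


def bronze_csv_path(source: str, local_name: str, ext: str,
                    bronze_dir: Path = BRONZE_DIR) -> Path:
    """Local CSV path; CSVs extracted from a ZIP get an extra 'fzip_' tag (inferred rule)."""
    tag = "fzip_" if ext == "zip" else ""
    return bronze_dir / f"{source}_{tag}{local_name}.csv"


def extract_single_csv(zip_path: Path, target: Path) -> Path:
    """Extract the only CSV inside `zip_path` and write it to `target`.

    Assumes each archive contains exactly one CSV member.
    """
    with zipfile.ZipFile(zip_path) as zf:
        members = [name for name in zf.namelist() if name.lower().endswith(".csv")]
        if len(members) != 1:
            raise ValueError(f"expected one CSV in {zip_path}, found {len(members)}")
        target.parent.mkdir(parents=True, exist_ok=True)
        # Copy the member's bytes straight into the renamed target file.
        with zf.open(members[0]) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target
```

In a download loop, each `(source, remote_name, local_name, ext)` entry would be fetched from `build_url(remote_name, ext)`; for ZIP entries, `extract_single_csv` would then write the archive's CSV to `bronze_csv_path(source, local_name, ext)`. Writing the member directly to its final name avoids a separate extract-then-rename step.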
Planned next steps:

- Move cleaned data to `silver/`
- Combine the processed datasets and generate summary analytics in the `gold/` layer