feat: periodically refresh estimates from tercet_missing_codes.csv #44
Description
Problem
The tercet_missing_codes.csv file in this repository is updated over time with newly discovered postal codes that are missing from TERCET data, along with their estimated NUTS mappings. However, the running service only imports this file when explicitly triggered via scripts/import_estimates.py. If the CSV is updated (e.g. via automated monitor contributions), the deployed service remains unaware of the new entries until a manual re-import and redeploy.
Proposal
The service should periodically fetch the latest tercet_missing_codes.csv directly from this GitHub repository and re-import estimates automatically. This could be implemented as:
- Periodic fetch from GitHub: On a configurable interval (e.g. daily), fetch the raw CSV from https://raw.githubusercontent.com/bk86a/PostalCode2NUTS/main/tercet_missing_codes.csv, compare it against the currently loaded data (e.g. by hash or row count), and re-run the import logic if it has changed.
- Startup + schedule: Always fetch and import on startup, plus schedule a periodic refresh (e.g. via an asyncio background task in the FastAPI lifespan).
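A minimal sketch of what this could look like, using only the standard library. The `import_estimates` call stands in for the existing logic in scripts/import_estimates.py and is hypothetical here; the interval and SHA-256 comparison are illustrative choices, not settled design:

```python
import asyncio
import hashlib
import urllib.request

# Raw-CSV URL from the proposal above.
CSV_URL = ("https://raw.githubusercontent.com/bk86a/PostalCode2NUTS/"
           "main/tercet_missing_codes.csv")
REFRESH_INTERVAL = 24 * 60 * 60  # daily, as suggested; would be configurable

_last_hash = None  # digest of the most recently imported CSV


def csv_changed(payload: bytes) -> bool:
    """Return True (and remember the new digest) if the CSV differs
    from the last imported version, compared by SHA-256 hash."""
    global _last_hash
    digest = hashlib.sha256(payload).hexdigest()
    if digest == _last_hash:
        return False
    _last_hash = digest
    return True


async def refresh_loop() -> None:
    """Periodically fetch the CSV and re-import it when it changes."""
    while True:
        try:
            # urllib is blocking, so run the fetch in a worker thread.
            payload = await asyncio.to_thread(
                lambda: urllib.request.urlopen(CSV_URL, timeout=30).read()
            )
            if csv_changed(payload):
                import_estimates(payload)  # hypothetical: reuse of the
                # import logic from scripts/import_estimates.py
        except Exception:
            pass  # a real implementation would log and retry next tick
        await asyncio.sleep(REFRESH_INTERVAL)

# The loop could then be started from the FastAPI lifespan handler,
# e.g. (sketch only):
#
#   @asynccontextmanager
#   async def lifespan(app):
#       task = asyncio.create_task(refresh_loop())
#       yield
#       task.cancel()
```

Keeping the change check hash-based (rather than row-count) also catches in-place corrections to existing rows, not just appended entries.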
This would make deployed instances self-updating as new missing postal codes are discovered and merged into the repository, without requiring a redeploy.
Alternatives considered
- Local file watcher: Only works if the CSV is mounted/updated on the host — doesn't help containerised deployments.
- Webhook-based reload: More responsive but adds complexity (needs a webhook endpoint and GitHub webhook configuration).
- Keep current manual import: Simpler, but means the service can go stale between deploys.