feat: periodically refresh estimates from tercet_missing_codes.csv #44

@bk86a

Description

Problem

The tercet_missing_codes.csv file in this repository is updated over time with newly discovered postal codes that are missing from TERCET data, along with their estimated NUTS mappings. However, the running service only imports this file when explicitly triggered via scripts/import_estimates.py. If the CSV is updated (e.g. via automated monitor contributions), the deployed service remains unaware of the new entries until a manual re-import and redeploy.

Proposal

The service should periodically fetch the latest tercet_missing_codes.csv directly from this GitHub repository and re-import estimates automatically. This could be implemented as:

  1. Periodic fetch from GitHub: On a configurable interval (e.g. daily), fetch the raw CSV from https://raw.githubusercontent.com/bk86a/PostalCode2NUTS/main/tercet_missing_codes.csv, compare against the currently loaded data (e.g. by hash or row count), and re-run the import logic if changed.
  2. Startup + schedule: Always fetch and import on startup, plus schedule a periodic refresh (e.g. via asyncio background task in the FastAPI lifespan).
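A minimal sketch of how options 1 and 2 could combine, using only the standard library. The `import_estimates` callback, the refresh interval, and the module-level hash cache are all illustrative assumptions; a real implementation would presumably reuse the logic from `scripts/import_estimates.py`:

```python
# Sketch of a self-refreshing estimates loader. The import_estimates
# callback is hypothetical; the interval and names are illustrative.
import asyncio
import csv
import hashlib
import io
import urllib.request

CSV_URL = ("https://raw.githubusercontent.com/bk86a/PostalCode2NUTS/"
           "main/tercet_missing_codes.csv")
REFRESH_SECONDS = 24 * 60 * 60  # daily, as suggested above

_last_hash = None  # hash of the most recently imported CSV content


def _fetch_csv_bytes(url=CSV_URL):
    """Fetch the raw CSV from GitHub."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def refresh_if_changed(raw, import_estimates):
    """Re-run the import only when the CSV content hash has changed."""
    global _last_hash
    digest = hashlib.sha256(raw).hexdigest()
    if digest == _last_hash:
        return False  # no change since the last import
    rows = list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
    import_estimates(rows)  # hypothetical: reuse scripts/import_estimates.py
    _last_hash = digest
    return True


async def periodic_refresh(import_estimates):
    """Background task: fetch on startup, then on a fixed interval."""
    while True:
        try:
            raw = await asyncio.to_thread(_fetch_csv_bytes)
            refresh_if_changed(raw, import_estimates)
        except Exception:
            pass  # a real service would log and retry on the next tick
        await asyncio.sleep(REFRESH_SECONDS)
```

The `periodic_refresh` coroutine could be started with `asyncio.create_task(...)` in the FastAPI lifespan handler and cancelled on shutdown, covering both the startup import and the scheduled refresh in one task.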

This would make deployed instances self-updating as new missing postal codes are discovered and merged into the repository, without requiring a redeploy.

Alternatives considered

  • Local file watcher: Only works if the CSV is mounted/updated on the host — doesn't help containerised deployments.
  • Webhook-based reload: More responsive but adds complexity (needs a webhook endpoint and GitHub webhook configuration).
  • Keep current manual import: Simpler, but means the service can go stale between deploys.

Metadata

Labels: enhancement (New feature or request)