feat: investigate using happyGISCO for improved NUTS estimation #45
Context
Eurostat maintains happyGISCO, a Python client for GISCO web services. It provides two relevant capabilities:
- place2coord(): geocodes a place name to coordinates
- coord2nuts(): returns the NUTS region for given coordinates
This project's API works with country + postal code only (no coordinates). happyGISCO is interesting because it could bridge that gap: take a postal code that is missing from TERCET, geocode it to coordinates, and then look up its NUTS region, all through Eurostat's own services.
Potential improvements
1. Fallback for unknown postal codes
When the API receives a postal code that is not found in TERCET or the estimates table, it currently returns 404. With happyGISCO, it could instead:
- Geocode the postal code + country via place2coord()
- Pass the resulting coordinates to coord2nuts()
- Return the NUTS region as an approximate match
This would reduce 404s without requiring the postal code to be pre-registered in any data file.
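The fallback chain above could look roughly like the following. This is a minimal sketch under assumptions, not the project's implementation: geocode and find_nuts are hypothetical stand-ins for happyGISCO's place2coord() and coord2nuts() (their real arguments and return types still need checking against the library), the TERCET and estimates tables are assumed to be dicts keyed by (country, postal code), and the "exact"/"estimated" labels are placeholders for whatever match types the API already reports.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical stand-ins for happyGISCO's place2coord() and coord2nuts();
# the real library's arguments and return types may differ.
Geocoder = Callable[[str, str], Optional[Tuple[float, float]]]  # (postal_code, country) -> (lat, lon)
NutsLookup = Callable[[float, float], Optional[str]]  # (lat, lon) -> NUTS3 code


@dataclass(frozen=True)
class NutsMatch:
    nuts3: str
    match: str  # placeholder labels: "exact", "estimated" or "approximate"


def resolve_nuts(
    country: str,
    postal_code: str,
    tercet: Mapping[Tuple[str, str], str],
    estimates: Mapping[Tuple[str, str], str],
    geocode: Geocoder,
    find_nuts: NutsLookup,
) -> Optional[NutsMatch]:
    """Resolve a postal code to NUTS3, falling back to geocoding when it is unknown."""
    key = (country, postal_code)
    if key in tercet:
        return NutsMatch(tercet[key], "exact")
    if key in estimates:
        return NutsMatch(estimates[key], "estimated")
    # Fallback: postal code -> coordinates -> NUTS region.
    coords = geocode(postal_code, country)
    if coords is None:
        return None  # nothing found anywhere: the API can keep returning 404
    nuts3 = find_nuts(*coords)
    return NutsMatch(nuts3, "approximate") if nuts3 is not None else None
```

Injecting the two lookups keeps the resolver testable without network access, and a None result still maps to the existing 404, so behaviour only changes for codes GISCO can actually place.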
2. Better NUTS estimation in the monitor
The postal code monitor currently estimates NUTS for missing codes by querying neighboring postal codes (±1, ±2, etc.), which is a rough heuristic. Instead, it could geocode the postal code via place2coord() and then use coord2nuts() for a more authoritative NUTS lookup, without relying on Nominatim coordinates.
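One way to wire this into the monitor, sketched under assumptions: GISCO is tried first and the existing neighbour heuristic is kept as a fallback. The neighbour order (+1, -1, +2, -2) and the max_offset of 2 are a guess at how the monitor's "±1, ±2" search works, known is a hypothetical dict of already-resolved codes, and geocode/find_nuts again stand in for place2coord()/coord2nuts().

```python
from typing import Callable, Iterator, Mapping, Optional, Tuple

# Hypothetical stand-ins for happyGISCO's place2coord() and coord2nuts().
Geocoder = Callable[[str, str], Optional[Tuple[float, float]]]
NutsLookup = Callable[[float, float], Optional[str]]


def neighbour_codes(postal_code: str, max_offset: int = 2) -> Iterator[str]:
    """Yield numeric neighbours at +1, -1, +2, -2, ..., keeping zero padding."""
    if not (postal_code.isascii() and postal_code.isdigit()):
        return  # the +/-n heuristic only makes sense for purely numeric codes
    width = len(postal_code)
    value = int(postal_code)
    for offset in range(1, max_offset + 1):
        for candidate in (value + offset, value - offset):
            if 0 <= candidate < 10 ** width:
                yield str(candidate).zfill(width)


def estimate_nuts(
    country: str,
    postal_code: str,
    known: Mapping[Tuple[str, str], str],
    geocode: Geocoder,
    find_nuts: NutsLookup,
) -> Optional[Tuple[str, str]]:
    """Return (nuts3, source), preferring GISCO over the neighbour heuristic."""
    coords = geocode(postal_code, country)
    if coords is not None:
        nuts3 = find_nuts(*coords)
        if nuts3 is not None:
            return nuts3, "gisco"
    # Fall back to the current heuristic: borrow the NUTS3 of the nearest known neighbour.
    for candidate in neighbour_codes(postal_code):
        nuts3 = known.get((country, candidate))
        if nuts3 is not None:
            return nuts3, "neighbours"
    return None
```

Returning the source alongside the code would let the monitor report how often it still has to fall back to the heuristic.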
3. Validation of existing estimates
Combining place2coord() and coord2nuts() could cross-validate the entries in tercet_missing_codes.csv: geocode each postal code via GISCO's own geocoder, look up its NUTS region, and flag any entry whose estimated NUTS3 disagrees.
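A cross-validation pass might be sketched as follows. It assumes, hypothetically, that tercet_missing_codes.csv has columns named country, postal_code and nuts3 (the real header may differ), and that geocode/find_nuts wrap place2coord()/coord2nuts(). Rows GISCO cannot place are flagged too, with gisco set to None, so they can be reviewed separately from genuine disagreements.

```python
import csv
from typing import Callable, List, NamedTuple, Optional, TextIO, Tuple

# Hypothetical stand-ins for happyGISCO's place2coord() and coord2nuts().
Geocoder = Callable[[str, str], Optional[Tuple[float, float]]]
NutsLookup = Callable[[float, float], Optional[str]]


class Disagreement(NamedTuple):
    country: str
    postal_code: str
    estimated: str
    gisco: Optional[str]  # None when GISCO could not place the postal code


def validate_estimates(rows: TextIO, geocode: Geocoder, find_nuts: NutsLookup) -> List[Disagreement]:
    """Flag CSV rows whose estimated NUTS3 differs from GISCO's answer.

    Assumes columns named country, postal_code and nuts3 (hypothetical).
    """
    flagged = []
    for row in csv.DictReader(rows):
        country, postal_code, estimated = row["country"], row["postal_code"], row["nuts3"]
        coords = geocode(postal_code, country)
        gisco = find_nuts(*coords) if coords is not None else None
        if gisco != estimated:
            flagged.append(Disagreement(country, postal_code, estimated, gisco))
    return flagged
```

In practice the file would be passed in as open("tercet_missing_codes.csv", newline=""), as the csv module recommends for reading.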
Questions to investigate
- Does place2coord() reliably resolve European postal codes (e.g. place2coord("1010", country="AT"))?
- Is GISCO's find-nuts service rate-limited? What throughput can we expect for batch validation?
- How does it handle edge cases (coastal areas, border regions, overseas territories)?
- Would adding happyGISCO as a dependency be appropriate, or should we call the GISCO API directly?
- Licensing compatibility (happyGISCO is EUPL, same as this project)
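Until GISCO's limits are known, a batch validation run could pace itself on the client side and record how long it took. This is a minimal sketch; max_per_second is a placeholder to tune once the service's actual limits (if any) are documented, not a known GISCO value, and call would be a hypothetical wrapper around the GISCO lookups.

```python
import time
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_throttled(
    items: Iterable[T],
    call: Callable[[T], R],
    max_per_second: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[R], float]:
    """Apply call to each item, starting at most max_per_second calls per second.

    Returns the results and the elapsed time in seconds, so a trial run
    against GISCO can report the throughput actually achieved.
    """
    if max_per_second <= 0:
        raise ValueError("max_per_second must be positive")
    interval = 1.0 / max_per_second
    results: List[R] = []
    start = clock()
    next_allowed = start
    for item in items:
        now = clock()
        if now < next_allowed:
            sleep(next_allowed - now)  # wait until the next request slot opens
        next_allowed = max(now, next_allowed) + interval
        results.append(call(item))
    return results, clock() - start
```

Dividing len(results) by the elapsed time gives a rough requests-per-second figure; clock and sleep are injectable so the pacing can be tested without real waiting.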