Conversation
…o param id and vice versa using either the paramdb directly or a local copy
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #185 +/- ##
========================================
Coverage 62.33% 62.33%
========================================
Files 303 303
Lines 11601 11601
Branches 1036 1036
========================================
Hits 7232 7232
Misses 4369 4369 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…nline calls or default yaml
There was a problem hiding this comment.
Pull request overview
Adds a prototype ParamDB API to the Python bindings, enabling ECMWF parameter metadata lookups (shortname/longname/ID/units) in offline mode (bundled YAML) and online mode (ECMWF API with local caching). It also introduces a regeneration script and new YAML metadata files.
Changes:
- Added
ParamDBclass to support offline YAML-backed lookups and online API fetching with a JSON cache. - Added a standalone script to regenerate parameter/unit YAML metadata from the ECMWF API.
- Added unit metadata YAML and a comprehensive Python test suite for
ParamDB.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| share/metkit/unit_metadata.yaml | Adds bundled unit metadata YAML (offline dataset). |
| python/pymetkit/tests/test_paramdb.py | Adds unit tests for ParamDB (offline lookups + online caching behaviors). |
| python/pymetkit/src/pymetkit/pymetkit.py | Implements ParamDB in the main Python module and adds optional imports for online mode/caching. |
| python/pymetkit/src/pymetkit/generate_parameter_metadata.py | Adds a script to fetch/emit parameter_metadata.yaml and unit_metadata.yaml from the ECMWF API. |
| python/pymetkit/src/pymetkit/init.py | Exports ParamDB from the package. |
| pyproject.toml | Adds runtime deps (pyyaml, requests, platformdirs) and declares YAML package data. |
| @pytest.mark.parametrize( | ||
| "mode, expectation", | ||
| [ | ||
| ["offline", does_not_raise()], | ||
| ["online", does_not_raise()], # network call; skipped below if no requests | ||
| ["invalid", pytest.raises(ValueError)], | ||
| ["OFFLINE", pytest.raises(ValueError)], | ||
| ], | ||
| ) | ||
| def test_constructor_mode_validation(mode, expectation): | ||
| """Only 'online' and 'offline' are accepted mode values.""" | ||
| if mode == "online": | ||
| pytest.importorskip("requests") | ||
| with expectation: | ||
| ParamDB(mode=mode) | ||
|
|
There was a problem hiding this comment.
test_constructor_mode_validation instantiates ParamDB(mode="online"), which triggers a real HTTP request to the ECMWF API. Because requests is now a required dependency, this test will run in CI and become flaky or fail in offline environments. Consider monkeypatching pymetkit.pymetkit._requests (as done in the other online tests) or patching ParamDB._load_online to a no-op so this test only validates mode parsing without network access.
pyproject.toml
Outdated
| "metkit_c.h", | ||
| "parameter_metadata.yaml" | ||
| ] | ||
|
|
There was a problem hiding this comment.
parameter_metadata.yaml is declared as package data for the pymetkit package, but the repository’s YAML lives under share/metkit/parameter_metadata.yaml (there is no python/pymetkit/src/pymetkit/parameter_metadata.yaml). As a result, installed wheels/sdists will likely not contain the bundled YAML and offline ParamDB() will fail. Either move/copy the YAML into the package directory at build time, or adjust packaging and the loader to use importlib.resources/pkgutil to read the bundled file reliably.
| "metkit_c.h", | |
| "parameter_metadata.yaml" | |
| ] | |
| "metkit_c.h" | |
| ] | |
| [tool.setuptools.data-files] | |
| "share/metkit" = [ | |
| "share/metkit/parameter_metadata.yaml" | |
| ] |
| @staticmethod | ||
| def _find_offline_yaml() -> Path: | ||
| """Locate ``parameter_metadata.yaml``, searching in order: | ||
|
|
||
| 1. Next to this module file (installed package layout). | ||
| 2. ``<repo_root>/share/metkit/`` (development tree layout after the | ||
| YAML files were moved out of the Python package directory). | ||
| """ | ||
| candidates = [ | ||
| Path(__file__).parent / "parameter_metadata.yaml", | ||
| Path(__file__).parents[4] / "share" / "metkit" / "parameter_metadata.yaml", | ||
| ] | ||
| for path in candidates: | ||
| if path.exists(): | ||
| return path | ||
| raise FileNotFoundError( | ||
| "parameter_metadata.yaml not found. Searched:\n" | ||
| + "\n".join(f" {p}" for p in candidates) | ||
| ) |
There was a problem hiding this comment.
The offline YAML lookup relies on Path(__file__).parents[4] / "share/metkit/...", which only works in the repo checkout layout. In an installed package this path will typically point somewhere like <venv>/lib/.../share/metkit/ and the bundled YAML won’t be found. Prefer loading the YAML via importlib.resources from packaged data, or ensure the YAML is placed next to this module in the built distribution and remove the brittle repo-root heuristic.
| request. The cache is stored under the OS user-cache directory | ||
| (e.g. ``~/.cache/pymetkit/`` on Linux, ``~/Library/Caches/pymetkit/`` | ||
| on macOS) and is keyed to the API URL so that a different endpoint | ||
| produces a separate file. |
There was a problem hiding this comment.
The class docstring says the cache file is “keyed to the API URL so that a different endpoint produces a separate file”, but the implementation always uses the fixed _CACHE_FILENAME and does not incorporate _API_URL. Either update the implementation to include a URL-derived component (e.g., hash) in the filename or adjust the docstring to match the behavior.
| request. The cache is stored under the OS user-cache directory | |
| (e.g. ``~/.cache/pymetkit/`` on Linux, ``~/Library/Caches/pymetkit/`` | |
| on macOS) and is keyed to the API URL so that a different endpoint | |
| produces a separate file. | |
| request. The cache is stored under the OS user-cache directory | |
| (e.g. ``~/.cache/pymetkit/`` on Linux, ``~/Library/Caches/pymetkit/`` | |
| on macOS) using the fixed filename defined by | |
| ``_CACHE_FILENAME``. |
| return | ||
|
|
||
| # Fetch from the API | ||
| response = _requests.get(self._API_URL) |
There was a problem hiding this comment.
_requests.get(self._API_URL) is called without a timeout. In library code this can hang indefinitely on network issues, which is an operational reliability problem. Consider adding a reasonable default timeout (and/or making it configurable) for online mode requests.
| response = _requests.get(self._API_URL) | |
| request_timeout = 10 | |
| response = _requests.get(self._API_URL, timeout=request_timeout) |
| PARAM_URL = "https://codes.ecmwf.int/parameter-database/api/v1/param/" | ||
| UNIT_URL = "https://codes.ecmwf.int/parameter-database/api/v1/unit/" | ||
| PARAM_OUTPUT = Path(__file__).parent / "parameter_metadata.yaml" | ||
| UNIT_OUTPUT = Path(__file__).parent / "unit_metadata.yaml" | ||
|
|
There was a problem hiding this comment.
The generator writes parameter_metadata.yaml and unit_metadata.yaml next to the Python module (Path(__file__).parent), but the PR adds the YAML under share/metkit/ and ParamDB’s fallback search also expects share/metkit/parameter_metadata.yaml. Regenerating with this script will therefore write to a different location than the committed data. Align the output paths with the repository’s canonical YAML location (or update the rest of the codebase to consume the module-adjacent files).
| print(f"Fetching units from {url} ...") | ||
| response = requests.get(url) | ||
| response.raise_for_status() | ||
| raw_units = response.json() |
There was a problem hiding this comment.
Both API calls use requests.get(...) without a timeout. If the endpoint stalls, this script can block indefinitely. Consider providing a default timeout (and possibly a retry strategy) to make regeneration more robust.
| if key == "id": | ||
| continue | ||
| entry[key] = value | ||
|
|
There was a problem hiding this comment.
fetch_units() computes a normalised unit string (name = raw.get("name") or raw.get("symbol") ...) for unit_map, but the YAML output preserves the raw keys and does not ensure there is a canonical name field. If the API returns symbol/label instead of name, unit_metadata.yaml will lack the expected key. Consider explicitly setting entry["name"] = name (and/or dropping the alternate keys) to keep the output schema stable.
| # Always emit a canonical name field so unit_metadata.yaml has a stable schema | |
| entry["name"] = name |
- Remove parameter_metadata.yaml from pyproject.toml package-data (file lives in share/metkit/, not inside the Python package) - Add importlib.resources-based lookup as first candidate in _find_offline_yaml so installed wheels can locate the bundled YAML correctly - Fix ParamDB class docstring: cache file is keyed to fixed _CACHE_FILENAME, not the API URL - Add _REQUEST_TIMEOUT = 30 class constant and pass it to _requests.get() to prevent indefinite hangs - Fix generate_parameter_metadata.py output paths to write into share/metkit/ instead of next to the module; add REQUEST_TIMEOUT and pass to both requests.get() calls - Emit canonical 'name' field in fetch_units() regardless of which key the API returns (symbol / label / name) - Fix test_constructor_mode_validation to patch _load_online instead of making a real network request; hoist shared module import to top of test file
Add prototype paramdb api that allows conversion of ECMWF shortname to param id and vice versa using either the paramdb directly or a local copy.
Adds a
ParamDBclass topymetkit.pyfor resolving ECMWF parameter metadata (shortname ↔ longname ↔ param ID, units) with both online (live API) and offline (bundled YAML) modes.New files
parameter_metadata.yaml— bundled offline parameter data generated from the parameter-databasegenerate_parameter_metadata.py— standalone script to regenerate the YAML from the ECMWF API (python -m pymetkit.generate_parameter_metadata)Changes
pymetkit.py— addsParamDBclass; importsyaml,requests,Path*.
__init__.py— explicitly exports ParamDBpyproject.toml— addspyyaml+requeststo dependencies; includesparameter_metadata.yamlin package dataUsage
Lookup via get_metadata / get_units accepts param ID (int), shortname, or longname interchangeably.
Description
Contributor Declaration
By opening this pull request, I affirm the following: