Skip to content

Add prototype paramdb api#185

Draft
awarde96 wants to merge 7 commits intodevelopfrom
feature/paramdb-api
Draft

Add prototype paramdb api#185
awarde96 wants to merge 7 commits intodevelopfrom
feature/paramdb-api

Conversation

@awarde96
Copy link
Copy Markdown

Add prototype paramdb api that allows conversion of ECMWF shortname to param id and vice versa using either the paramdb directly or a local copy.

Adds a ParamDB class to pymetkit.py for resolving ECMWF parameter metadata (shortname ↔ longname ↔ param ID, units) with both online (live API) and offline (bundled YAML) modes.

New files

  • parameter_metadata.yaml — bundled offline parameter data generated from the parameter-database
  • generate_parameter_metadata.py — standalone script to regenerate the YAML from the ECMWF API (python -m pymetkit.generate_parameter_metadata)

Changes

  • pymetkit.py — adds ParamDB class; imports yaml, requests, Path
    *. __init__.py — explicitly exports ParamDB
  • pyproject.toml — adds pyyaml + requests to dependencies; includes parameter_metadata.yaml in package data

Usage

from pymetkit import ParamDB

# Offline (default, uses bundled YAML)
db = ParamDB()
db.shortname_to_longname("2t")        # "2 metre temperature"
db.param_id_to_shortname(129)         # "z"
db.longname_to_param_id("Geopotential")  # 129
db.get_units("tp")                    # "m"
db.get_metadata(167)                  # full dict

# Online (fetches from codes.ecmwf.int)
db = ParamDB(mode="online")

Lookup via get_metadata / get_units accepts param ID (int), shortname, or longname interchangeably.

Description

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

…o param id and vice versa using either the paramdb directly or a local copy
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.33%. Comparing base (fa88f33) to head (9c374d6).
⚠️ Report is 37 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #185   +/-   ##
========================================
  Coverage    62.33%   62.33%           
========================================
  Files          303      303           
  Lines        11601    11601           
  Branches      1036     1036           
========================================
  Hits          7232     7232           
  Misses        4369     4369           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a prototype ParamDB API to the Python bindings, enabling ECMWF parameter metadata lookups (shortname/longname/ID/units) in offline mode (bundled YAML) and online mode (ECMWF API with local caching). It also introduces a regeneration script and new YAML metadata files.

Changes:

  • Added ParamDB class to support offline YAML-backed lookups and online API fetching with a JSON cache.
  • Added a standalone script to regenerate parameter/unit YAML metadata from the ECMWF API.
  • Added unit metadata YAML and a comprehensive Python test suite for ParamDB.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
share/metkit/unit_metadata.yaml Adds bundled unit metadata YAML (offline dataset).
python/pymetkit/tests/test_paramdb.py Adds unit tests for ParamDB (offline lookups + online caching behaviors).
python/pymetkit/src/pymetkit/pymetkit.py Implements ParamDB in the main Python module and adds optional imports for online mode/caching.
python/pymetkit/src/pymetkit/generate_parameter_metadata.py Adds a script to fetch/emit parameter_metadata.yaml and unit_metadata.yaml from the ECMWF API.
python/pymetkit/src/pymetkit/init.py Exports ParamDB from the package.
pyproject.toml Adds runtime deps (pyyaml, requests, platformdirs) and declares YAML package data.

Comment on lines +26 to +41
@pytest.mark.parametrize(
"mode, expectation",
[
["offline", does_not_raise()],
["online", does_not_raise()], # network call; skipped below if no requests
["invalid", pytest.raises(ValueError)],
["OFFLINE", pytest.raises(ValueError)],
],
)
def test_constructor_mode_validation(mode, expectation):
"""Only 'online' and 'offline' are accepted mode values."""
if mode == "online":
pytest.importorskip("requests")
with expectation:
ParamDB(mode=mode)

Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_constructor_mode_validation instantiates ParamDB(mode="online"), which triggers a real HTTP request to the ECMWF API. Because requests is now a required dependency, this test will run in CI and become flaky or fail in offline environments. Consider monkeypatching pymetkit.pymetkit._requests (as done in the other online tests) or patching ParamDB._load_online to a no-op so this test only validates mode parsing without network access.

Copilot uses AI. Check for mistakes.
pyproject.toml Outdated
Comment on lines 45 to 48
"metkit_c.h",
"parameter_metadata.yaml"
]

Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameter_metadata.yaml is declared as package data for the pymetkit package, but the repository’s YAML lives under share/metkit/parameter_metadata.yaml (there is no python/pymetkit/src/pymetkit/parameter_metadata.yaml). As a result, installed wheels/sdists will likely not contain the bundled YAML and offline ParamDB() will fail. Either move/copy the YAML into the package directory at build time, or adjust packaging and the loader to use importlib.resources/pkgutil to read the bundled file reliably.

Suggested change
"metkit_c.h",
"parameter_metadata.yaml"
]
"metkit_c.h"
]
[tool.setuptools.data-files]
"share/metkit" = [
"share/metkit/parameter_metadata.yaml"
]

Copilot uses AI. Check for mistakes.
Comment on lines +489 to +507
@staticmethod
def _find_offline_yaml() -> Path:
"""Locate ``parameter_metadata.yaml``, searching in order:

1. Next to this module file (installed package layout).
2. ``<repo_root>/share/metkit/`` (development tree layout after the
YAML files were moved out of the Python package directory).
"""
candidates = [
Path(__file__).parent / "parameter_metadata.yaml",
Path(__file__).parents[4] / "share" / "metkit" / "parameter_metadata.yaml",
]
for path in candidates:
if path.exists():
return path
raise FileNotFoundError(
"parameter_metadata.yaml not found. Searched:\n"
+ "\n".join(f" {p}" for p in candidates)
)
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The offline YAML lookup relies on Path(__file__).parents[4] / "share/metkit/...", which only works in the repo checkout layout. In an installed package this path will typically point somewhere like <venv>/lib/.../share/metkit/ and the bundled YAML won’t be found. Prefer loading the YAML via importlib.resources from packaged data, or ensure the YAML is placed next to this module in the built distribution and remove the brittle repo-root heuristic.

Copilot uses AI. Check for mistakes.
Comment on lines +337 to +340
request. The cache is stored under the OS user-cache directory
(e.g. ``~/.cache/pymetkit/`` on Linux, ``~/Library/Caches/pymetkit/``
on macOS) and is keyed to the API URL so that a different endpoint
produces a separate file.
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class docstring says the cache file is “keyed to the API URL so that a different endpoint produces a separate file”, but the implementation always uses the fixed _CACHE_FILENAME and does not incorporate _API_URL. Either update the implementation to include a URL-derived component (e.g., hash) in the filename or adjust the docstring to match the behavior.

Suggested change
request. The cache is stored under the OS user-cache directory
(e.g. ``~/.cache/pymetkit/`` on Linux, ``~/Library/Caches/pymetkit/``
on macOS) and is keyed to the API URL so that a different endpoint
produces a separate file.
request. The cache is stored under the OS user-cache directory
(e.g. ``~/.cache/pymetkit/`` on Linux, ``~/Library/Caches/pymetkit/``
on macOS) using the fixed filename defined by
``_CACHE_FILENAME``.

Copilot uses AI. Check for mistakes.
return

# Fetch from the API
response = _requests.get(self._API_URL)
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_requests.get(self._API_URL) is called without a timeout. In library code this can hang indefinitely on network issues, which is an operational reliability problem. Consider adding a reasonable default timeout (and/or making it configurable) for online mode requests.

Suggested change
response = _requests.get(self._API_URL)
request_timeout = 10
response = _requests.get(self._API_URL, timeout=request_timeout)

Copilot uses AI. Check for mistakes.
Comment on lines +17 to +21
PARAM_URL = "https://codes.ecmwf.int/parameter-database/api/v1/param/"
UNIT_URL = "https://codes.ecmwf.int/parameter-database/api/v1/unit/"
PARAM_OUTPUT = Path(__file__).parent / "parameter_metadata.yaml"
UNIT_OUTPUT = Path(__file__).parent / "unit_metadata.yaml"

Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generator writes parameter_metadata.yaml and unit_metadata.yaml next to the Python module (Path(__file__).parent), but the PR adds the YAML under share/metkit/ and ParamDB’s fallback search also expects share/metkit/parameter_metadata.yaml. Regenerating with this script will therefore write to a different location than the committed data. Align the output paths with the repository’s canonical YAML location (or update the rest of the codebase to consume the module-adjacent files).

Copilot uses AI. Check for mistakes.
Comment on lines +39 to +42
print(f"Fetching units from {url} ...")
response = requests.get(url)
response.raise_for_status()
raw_units = response.json()
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both API calls use requests.get(...) without a timeout. If the endpoint stalls, this script can block indefinitely. Consider providing a default timeout (and possibly a retry strategy) to make regeneration more robust.

Copilot uses AI. Check for mistakes.
if key == "id":
continue
entry[key] = value

Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch_units() computes a normalised unit string (name = raw.get("name") or raw.get("symbol") ...) for unit_map, but the YAML output preserves the raw keys and does not ensure there is a canonical name field. If the API returns symbol/label instead of name, unit_metadata.yaml will lack the expected key. Consider explicitly setting entry["name"] = name (and/or dropping the alternate keys) to keep the output schema stable.

Suggested change
# Always emit a canonical name field so unit_metadata.yaml has a stable schema
entry["name"] = name

Copilot uses AI. Check for mistakes.
- Remove parameter_metadata.yaml from pyproject.toml package-data (file lives
  in share/metkit/, not inside the Python package)
- Add importlib.resources-based lookup as first candidate in _find_offline_yaml
  so installed wheels can locate the bundled YAML correctly
- Fix ParamDB class docstring: cache file is keyed to fixed _CACHE_FILENAME,
  not the API URL
- Add _REQUEST_TIMEOUT = 30 class constant and pass it to _requests.get() to
  prevent indefinite hangs
- Fix generate_parameter_metadata.py output paths to write into share/metkit/
  instead of next to the module; add REQUEST_TIMEOUT and pass to both
  requests.get() calls
- Emit canonical 'name' field in fetch_units() regardless of which key the API
  returns (symbol / label / name)
- Fix test_constructor_mode_validation to patch _load_online instead of making
  a real network request; hoist shared module import to top of test file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants