Add NClimGrid data source with lexicon and test coverage#806
Add NClimGrid data source with lexicon and test coverage#806iype-eldho wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
Greptile SummaryThis PR adds
|
| Filename | Overview |
|---|---|
| earth2studio/data/nclimgrid.py | New DataSource implementation; missing SPDX license header, uses stdlib logging instead of loguru, not exported from init.py, stale-cache risk on modifier changes |
| earth2studio/lexicon/nclimgrid.py | New lexicon with unit-conversion modifiers; missing SPDX license header and not exported from lexicon init.py |
| test/data/test_nclimgrid.py | Thorough offline and network (marked @pytest.mark.network) test coverage for schema, unit conversions, caching, and scientific sanity checks |
Reviews (1): Last reviewed commit: "Add NClimGrid data source and lexicon wi..." | Re-trigger Greptile
| @@ -0,0 +1,371 @@ | |||
| import logging | |||
There was a problem hiding this comment.
Missing required SPDX license header
All Python source files in earth2studio/ must start with the Apache-2.0 SPDX header (project rule). This file starts directly with import logging, so the /license check will fail and the CI compliance gate will block merging.
| import logging | |
| # SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. | |
| # SPDX-FileCopyrightText: All rights reserved. | |
| # SPDX-License-Identifier: Apache-2.0 | |
| # | |
| # Licensed under the Apache License, Version 2.0 (the "License"); | |
| # you may not use this file except in compliance with the License. | |
| # You may obtain a copy of the License at | |
| # | |
| # http://www.apache.org/licenses/LICENSE-2.0 | |
| # | |
| # Unless required by applicable law or agreed to in writing, software | |
| # distributed under the License is distributed on an "AS IS" BASIS, | |
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
| # See the License for the specific language governing permissions and | |
| # limitations under the License. | |
| import logging |
| @@ -0,0 +1,111 @@ | |||
| from collections.abc import Callable | |||
There was a problem hiding this comment.
Missing required SPDX license header
earth2studio/lexicon/nclimgrid.py also starts directly with imports, missing the mandatory Apache-2.0 header required by the project style guide. Every other lexicon file in the directory carries this header.
| from collections.abc import Callable | |
| # SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. | |
| # SPDX-FileCopyrightText: All rights reserved. | |
| # SPDX-License-Identifier: Apache-2.0 | |
| # | |
| # Licensed under the Apache License, Version 2.0 (the "License"); | |
| # you may not use this file except in compliance with the License. | |
| # You may obtain a copy of the License at | |
| # | |
| # http://www.apache.org/licenses/LICENSE-2.0 | |
| # | |
| # Unless required by applicable law or agreed to in writing, software | |
| # distributed under the License is distributed on an "AS IS" BASIS, | |
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
| # See the License for the specific language governing permissions and | |
| # limitations under the License. | |
| from collections.abc import Callable |
| import logging | ||
| import os | ||
| from datetime import datetime | ||
| from pathlib import Path | ||
| from typing import Any | ||
|
|
||
| import numpy as np | ||
| import pandas as pd | ||
| import pyarrow as pa | ||
| import s3fs | ||
| import xarray as xr | ||
| from tqdm.auto import tqdm | ||
|
|
||
| from earth2studio.lexicon.nclimgrid import NClimGridLexicon | ||
|
|
||
| logger = logging.getLogger("earth2studio.nclimgrid") | ||
| logger.setLevel(logging.INFO) |
There was a problem hiding this comment.
stdlib
logging instead of loguru
The project mandates from loguru import logger for all logging inside earth2studio/ (CLAUDE.md, style guide). import logging / logging.getLogger(...) / logger.setLevel(logging.INFO) are all non-standard here — loguru does not use setLevel. Replace with the loguru pattern used by every other data source (e.g. mrms.py).
| import logging | |
| import os | |
| from datetime import datetime | |
| from pathlib import Path | |
| from typing import Any | |
| import numpy as np | |
| import pandas as pd | |
| import pyarrow as pa | |
| import s3fs | |
| import xarray as xr | |
| from tqdm.auto import tqdm | |
| from earth2studio.lexicon.nclimgrid import NClimGridLexicon | |
| logger = logging.getLogger("earth2studio.nclimgrid") | |
| logger.setLevel(logging.INFO) | |
| from loguru import logger |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| @@ -0,0 +1,371 @@ | |||
| import logging | |||
There was a problem hiding this comment.
NClimGrid not exported from __init__.py
NClimGrid is not added to earth2studio/data/__init__.py and NClimGridLexicon is not added to earth2studio/lexicon/__init__.py. Every existing data source and lexicon is exported from these files, making them available via from earth2studio.data import NClimGrid. Without these exports the new data source is effectively invisible to users following the standard import path.
| cache_file = self._cache_file_for_timestep(native, ts) | ||
|
|
||
| if self.cache and cache_file.exists(): | ||
| df_t = pd.read_parquet(cache_file) | ||
| else: | ||
| df_t = self._dataarray_to_dataframe(da_t, var, modifier) | ||
|
|
||
| if self.cache: | ||
| df_t.to_parquet(cache_file) | ||
|
|
||
| frames.append(df_t) |
There was a problem hiding this comment.
Stale disk cache after modifier changes
The parquet cache is keyed solely on native variable name + date. The files contain unit-converted values (e.g. °C→K). If the conversion logic is ever corrected (e.g. a bug fix to modifier), existing cache files will silently serve wrong values until users manually delete ~/.earth2studio-cache/nclimgrid. Consider including a hash of the modifier or a schema version in the cache filename.
Earth2Studio Pull Request
Description
This PR adds support for the NClimGrid dataset as a new Earth2Studio data source.
Key features
NClimGridLexiconTesting
test/data/test_nclimgrid.pyNotes
Checklist
Dependencies
None