Add NClimGrid data source with lexicon and test coverage by iype-eldho · Pull Request #806 · NVIDIA/earth2studio

iype-eldho · 2026-04-07T16:19:12Z

Earth2Studio Pull Request

Description

This PR adds support for the NClimGrid dataset as a new Earth2Studio data source.

Key features

Implements NClimGrid data loader using NOAA NODD S3 bucket
Adds canonical variable mapping via NClimGridLexicon
Supports flexible time selection (single timestamp, list, slice)
Implements per-timestep parquet caching for efficient reuse
Ensures schema consistency with existing Earth2Studio data interfaces

Testing

Added tests under test/data/test_nclimgrid.py
Covers:
- schema validation
- variable mapping correctness
- time selection behavior
- basic scientific sanity checks (temperature ranges, precipitation non-negativity)
- caching behavior

Notes

Data is streamed per timestep to avoid loading entire monthly datasets into memory
Uses public NOAA NODD S3 bucket (no authentication required)

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.
Assess and address Greptile feedback (AI code review bot for guidance; use discretion, addressing all feedback is not required).

Dependencies

None

greptile-apps · 2026-04-07T16:22:40Z

Greptile Summary

This PR adds NClimGrid as a new DataFrame-returning data source backed by the public NOAA NODD S3 bucket, with per-timestep parquet caching and a companion NClimGridLexicon for unit conversion (°C→K, mm→m). The implementation logic is sound and the test suite is thorough, but both new files are missing the mandatory SPDX Apache-2.0 license header and neither class is exported from its respective __init__.py, making the new data source unreachable via the standard from earth2studio.data import NClimGrid import path.

Confidence Score: 3/5

Not ready to merge — two P1 compliance issues must be fixed first

Two P1 issues remain: missing SPDX license headers (will fail the CI /license check) and missing init.py exports (makes the new data source unreachable through the standard API). The underlying implementation logic is correct.

earth2studio/data/nclimgrid.py and earth2studio/lexicon/nclimgrid.py both need license headers; earth2studio/data/init.py and earth2studio/lexicon/init.py both need export entries added

Important Files Changed

Filename	Overview
earth2studio/data/nclimgrid.py	New DataSource implementation; missing SPDX license header, uses stdlib logging instead of loguru, not exported from init.py, stale-cache risk on modifier changes
earth2studio/lexicon/nclimgrid.py	New lexicon with unit-conversion modifiers; missing SPDX license header and not exported from lexicon init.py
test/data/test_nclimgrid.py	Thorough offline and network (marked @pytest.mark.network) test coverage for schema, unit conversions, caching, and scientific sanity checks

_{Reviews (1): Last reviewed commit: "Add NClimGrid data source and lexicon wi..." | Re-trigger Greptile}

greptile-apps · 2026-04-07T16:22:44Z

earth2studio/data/nclimgrid.py

@@ -0,0 +1,371 @@
+import logging


Missing required SPDX license header

All Python source files in earth2studio/ must start with the Apache-2.0 SPDX header (project rule). This file starts directly with import logging, so the /license check will fail and the CI compliance gate will block merging.

Suggested change

import logging

# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES.

# SPDX-FileCopyrightText: All rights reserved.

# SPDX-License-Identifier: Apache-2.0

#

# Licensed under the Apache License, Version 2.0 (the "License");

# you may not use this file except in compliance with the License.

# You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

import logging

greptile-apps · 2026-04-07T16:22:45Z

earth2studio/lexicon/nclimgrid.py

@@ -0,0 +1,111 @@
+from collections.abc import Callable


Missing required SPDX license header

earth2studio/lexicon/nclimgrid.py also starts directly with imports, missing the mandatory Apache-2.0 header required by the project style guide. Every other lexicon file in the directory carries this header.

Suggested change

from collections.abc import Callable

# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES.

# SPDX-FileCopyrightText: All rights reserved.

# SPDX-License-Identifier: Apache-2.0

#

# Licensed under the Apache License, Version 2.0 (the "License");

# you may not use this file except in compliance with the License.

# You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

from collections.abc import Callable

greptile-apps · 2026-04-07T16:22:46Z

earth2studio/data/nclimgrid.py

+import logging
+import os
+from datetime import datetime
+from pathlib import Path
+from typing import Any
+
+import numpy as np
+import pandas as pd
+import pyarrow as pa
+import s3fs
+import xarray as xr
+from tqdm.auto import tqdm
+
+from earth2studio.lexicon.nclimgrid import NClimGridLexicon
+
+logger = logging.getLogger("earth2studio.nclimgrid")
+logger.setLevel(logging.INFO)


stdlib logging instead of loguru

The project mandates from loguru import logger for all logging inside earth2studio/ (CLAUDE.md, style guide). import logging / logging.getLogger(...) / logger.setLevel(logging.INFO) are all non-standard here — loguru does not use setLevel. Replace with the loguru pattern used by every other data source (e.g. mrms.py).

Suggested change

import logging

import os

from datetime import datetime

from pathlib import Path

from typing import Any

import numpy as np

import pandas as pd

import pyarrow as pa

import s3fs

import xarray as xr

from tqdm.auto import tqdm

from earth2studio.lexicon.nclimgrid import NClimGridLexicon

logger = logging.getLogger("earth2studio.nclimgrid")

logger.setLevel(logging.INFO)

from loguru import logger

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

greptile-apps · 2026-04-07T16:22:48Z

earth2studio/data/nclimgrid.py

@@ -0,0 +1,371 @@
+import logging


NClimGrid not exported from __init__.py

NClimGrid is not added to earth2studio/data/__init__.py and NClimGridLexicon is not added to earth2studio/lexicon/__init__.py. Every existing data source and lexicon is exported from these files, making them available via from earth2studio.data import NClimGrid. Without these exports the new data source is effectively invisible to users following the standard import path.

greptile-apps · 2026-04-07T16:22:49Z

earth2studio/data/nclimgrid.py

+            cache_file = self._cache_file_for_timestep(native, ts)
+
+            if self.cache and cache_file.exists():
+                df_t = pd.read_parquet(cache_file)
+            else:
+                df_t = self._dataarray_to_dataframe(da_t, var, modifier)
+
+                if self.cache:
+                    df_t.to_parquet(cache_file)
+
+            frames.append(df_t)


Stale disk cache after modifier changes

The parquet cache is keyed solely on native variable name + date. The files contain unit-converted values (e.g. °C→K). If the conversion logic is ever corrected (e.g. a bug fix to modifier), existing cache files will silently serve wrong values until users manually delete ~/.earth2studio-cache/nclimgrid. Consider including a hash of the modifier or a schema version in the cache filename.

Add NClimGrid data source and lexicon with tests

e5bb2e8

greptile-apps bot reviewed Apr 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add NClimGrid data source with lexicon and test coverage#806

Add NClimGrid data source with lexicon and test coverage#806
iype-eldho wants to merge 1 commit intoNVIDIA:mainfrom
iype-eldho:feature/nclimgrid-tests

iype-eldho commented Apr 7, 2026

Uh oh!

greptile-apps bot commented Apr 7, 2026

Confidence Score: 3/5

Important Files Changed

Uh oh!

greptile-apps bot Apr 7, 2026

Uh oh!

greptile-apps bot Apr 7, 2026

Uh oh!

greptile-apps bot Apr 7, 2026

Uh oh!

greptile-apps bot Apr 7, 2026

Uh oh!

greptile-apps bot Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-import logging
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES.
+# SPDX-FileCopyrightText: All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import logging

Conversation

iype-eldho commented Apr 7, 2026

Earth2Studio Pull Request

Description

Key features

Testing

Notes

Checklist

Dependencies

Uh oh!

greptile-apps bot commented Apr 7, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Uh oh!

greptile-apps bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant