Grid Doctor HEALs your Grids

Note

This is a scripting solution for a proof of concept; an operational approach will follow. To add code for a specific dataset, put your script into the scripts/<yourname> folder.

Installation

git clone git@github.com:freva-org/grid-doctor.git
cd grid-doctor
python -m pip install -e .

For GPU support use

python -m pip install -e ".[gpu]"

For remapping of large grids you should install ESMF through conda-forge.

mamba install -c conda-forge -y "esmf=*=mpi_openmpi_*" esmpy

Developing the documentation

This repository builds two documentation sites from the same docs/ source tree:

Technical documentation: built with mkdocs.tech.yml
Waterpark/data documentation: built with mkdocs.data.yml

Both sites share common assets and selected shared Markdown files, but are published independently.

Documentation layout

The documentation sources are organised as follows:

docs/
├── technical/   # technical Grid Doctor documentation
├── data/        # Waterpark/data-oriented documentation
├── shared/      # pages shared by both documentation sites
└── assets/      # images, CSS, JavaScript, logos, etc.

During the build, the relevant documentation tree is staged into .build/:

.build/
├── tech/
└── data/

This allows both documentation sites to behave as if their own content starts at the web root /.

Building the documentation

The default documentation target builds the technical documentation:

tox -e docs

This is equivalent to:

tox -e docs -- tech

To build the Waterpark/data documentation:

tox -e docs -- data

The generated output directories are:

site-tech/   # technical documentation
site-data/   # Waterpark/data documentation
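The mapping from build target to config file and output directory can be summarised in a short Python sketch. The helper name `docs_build_command` is hypothetical and not part of this repository. It only assumes that each site is ultimately built with `mkdocs build` using the `-f` (config file) and `-d` (site directory) options; the real tox environment may perform additional staging steps first.

```python
"""Sketch: map a docs target ("tech" or "data") to an mkdocs build command."""

# Config files and output directories as named in this README.
DOC_TARGETS = {
    "tech": ("mkdocs.tech.yml", "site-tech"),
    "data": ("mkdocs.data.yml", "site-data"),
}


def docs_build_command(target: str = "tech") -> list[str]:
    """Return the (assumed) mkdocs invocation for a documentation target."""
    try:
        config, site_dir = DOC_TARGETS[target]
    except KeyError:
        raise ValueError(f"unknown docs target: {target!r}") from None
    return ["mkdocs", "build", "-f", config, "-d", site_dir]


if __name__ == "__main__":
    # The default mirrors "tox -e docs", which builds the technical site.
    print(" ".join(docs_build_command()))
    print(" ".join(docs_build_command("data")))
```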

Serving the documentation locally for easy and quick development

To serve the technical documentation locally:

tox -e docs-serve

or explicitly:

tox -e docs-serve -- tech

To serve the Waterpark/data documentation locally:

tox -e docs-serve -- data

The documentation is then served at:

http://localhost:8000

The local serve target uses symlinks in .build/, so changes made under docs/technical/, docs/data/, docs/shared/, or docs/assets/ can be picked up while developing.
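To illustrate why symlinked staging makes edits visible immediately, here is a minimal sketch of how a site tree could be assembled from docs/. The function `stage_site` and its precedence rule (a site page wins over a shared page with the same name) are assumptions made for illustration; the staging code used by the tox targets may differ.

```python
"""Sketch: stage a documentation tree into .build/<site> using symlinks."""
from pathlib import Path


def stage_site(docs: Path, build: Path, site: str, source: str) -> Path:
    """Link pages from docs/<source> and docs/shared, plus docs/assets, into build/<site>."""
    target = build / site
    target.mkdir(parents=True, exist_ok=True)
    # Site-specific pages first, then shared pages; a site page wins on a name clash.
    for folder in (docs / source, docs / "shared"):
        if not folder.is_dir():
            continue
        for entry in sorted(folder.iterdir()):
            link = target / entry.name
            if not (link.is_symlink() or link.exists()):
                link.symlink_to(entry.resolve(), target_is_directory=entry.is_dir())
    # Assets keep their own folder so that "assets/example.png" resolves.
    assets = docs / "assets"
    assets_link = target / "assets"
    if assets.is_dir() and not (assets_link.is_symlink() or assets_link.exists()):
        assets_link.symlink_to(assets.resolve(), target_is_directory=True)
    return target


if __name__ == "__main__":
    staged = stage_site(Path("docs"), Path(".build"), site="tech", source="technical")
    print(f"staged documentation tree at {staged}")
```

Because the staged entries are links rather than copies, a change to a file under docs/ is visible through .build/ without staging again.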

Adding new pages

Add technical documentation pages under:

docs/technical/

Add Waterpark/data documentation pages under:

docs/data/

Add pages that should be available to both sites under:

docs/shared/

After adding a page, include it in the appropriate MkDocs navigation file:

mkdocs.tech.yml
mkdocs.data.yml

For example:

nav:
  - Overview: index.md
  - Getting started: getting-started.md

Adding images and other assets

Shared images, CSS, JavaScript, logos, and other static assets should go under:

docs/assets/

You can reference them from Markdown like this:

![Example image](assets/example.png)

or, when a root-relative link is more appropriate:

![Example image](/assets/example.png)

Shared pages

Files in docs/shared/ are staged into both documentation builds. This is useful for pages such as common concepts, terminology, license notes, or explanations that apply to both the technical and data-facing documentation.

For example, a shared file:

docs/shared/technical-decisions.md

can be referenced in either MkDocs config as:

nav:
  - Technical decisions: technical-decisions.md

Deployment overview

The two documentation sites are deployed differently:

site-tech/ is deployed via the GitHub Pages action.
site-data/ is pushed to the gh-pages branch and served independently from the custom Waterpark web server.

This allows both sites to use / as their root URL while still keeping all documentation sources in a single repository.

To run the tox targets described above, install tox first:

pip install tox
tox -e docs          # build the technical documentation to site-tech/
tox -e docs-serve    # live preview at http://127.0.0.1:8000

Quick Start

from getpass import getuser

import grid_doctor as gd

ds = gd.cached_open_dataset(["path/to/*.nc"])
max_level = gd.resolution_to_healpix_level(gd.get_latlon_resolution(ds))
# Per-user scratch directory for the remapping weights.
user = getuser()
weights_dir = f"/scratch/{user[0]}/{user}/grid-doctor/weights"
# cache_path and weights_path below point at the same directory.
gd.cached_weights(
    ds,
    level=max_level,
    prefer_offline=True,
    cache_path=weights_dir,
)
pyramid = gd.create_healpix_pyramid(
    ds,
    weights_path=weights_dir,
    max_level=max_level,
)
gd.save_pyramid(
    pyramid,
    "s3://my-bucket/dataset.zarr",
    s3_options=gd.get_s3_options(
        "https://s3.eu-dkrz-3.dkrz.cloud",
        "~/.s3-credentials.json",
    ),
)
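The step from grid resolution to a HEALPix level can be made concrete with a little arithmetic. At level k the sphere is divided into 12 * 4^k equal-area pixels, so the pixel edge length is roughly sqrt(4π / npix) radians. The sketch below picks the smallest level whose pixels are no coarser than the source grid. The function names are hypothetical, and whether `gd.resolution_to_healpix_level` applies exactly this rule is an assumption; check the library before relying on it.

```python
"""Sketch: pick a HEALPix level from a regular lat/lon grid spacing."""
import math


def healpix_pixel_size_deg(level: int) -> float:
    """Approximate pixel edge length in degrees at a given HEALPix level."""
    nside = 2 ** level
    npix = 12 * nside ** 2
    # All HEALPix pixels have equal area; take the square root of that area.
    return math.degrees(math.sqrt(4.0 * math.pi / npix))


def level_for_resolution(resolution_deg: float, max_level: int = 29) -> int:
    """Smallest level whose pixels are no coarser than the source grid."""
    if resolution_deg <= 0:
        raise ValueError("resolution must be positive")
    for level in range(max_level + 1):
        if healpix_pixel_size_deg(level) <= resolution_deg:
            return level
    return max_level


if __name__ == "__main__":
    for res in (1.0, 0.25):
        level = level_for_resolution(res)
        print(f"{res} deg -> level {level} ({healpix_pixel_size_deg(level):.3f} deg per pixel)")
```

Under this rule a 0.25° grid maps to level 8 (nside 256, about 0.23° per pixel), and a 1° grid maps to level 6.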

🏥 Grid Rehab Progress

How are our patients doing? Every dataset starts broken and leaves HEALed. If your dataset is still 😢, it needs a doctor — that could be you. Claim a patient, write a script, and turn that frown into 😎.

Status	Meaning
😢	Not started
🩹	In treatment
😎	HEALed

Dataset	Uploaded to S3	Script Submitted
ICON-DREAM	😎	😎
EERIE	😎	😎
ERA5	😎	😎
CMIP6	🩹	😎
NextGEMS	😎	😎
ICDC	😎	🩹
ORCHESTRA	😎	😎
PalMod	😢	😢
Dyamond	😎	😎

Tip

To claim a dataset, open a PR adding your script to scripts/<dataset>/ and update this table. See Getting Started for the template.

Writing a Conversion Script

Create a folder under scripts/ and add your script:

mkdir -p scripts/<yourname>

A minimal script using the built-in CLI helpers:

import grid_doctor as gd
import grid_doctor.cli as gd_cli

# The shared parser provides the S3 and verbosity options used below.
parser = gd_cli.get_parser("my-dataset", "Convert my-dataset to HEALPix.")
# Script-specific option; args.variables is available for your own selection logic.
parser.add_argument("--variables", nargs="*", default=["t_2m"])
args = parser.parse_args()
gd_cli.setup_logging_from_args(args)

ds = gd.cached_open_dataset(["path/to/*.nc"])
pyramid = gd.create_healpix_pyramid(ds)
gd.save_pyramid(
    pyramid,
    f"s3://{args.s3_bucket}/my-dataset.zarr",
    s3_options=gd.get_s3_options(args.s3_endpoint, args.s3_credentials_file),
)

Run with verbosity:

python scripts/my-dataset/convert.py my-bucket -vv
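The command line above implies a positional bucket argument, S3 options, and a repeatable -v flag. The following standard-library sketch shows a parser with that shape. It is not the implementation of `gd_cli.get_parser`; the function names, the option spellings `--s3-endpoint` and `--s3-credentials-file`, and the default values (borrowed from the Quick Start) are assumptions inferred from the attributes the example script reads.

```python
"""Sketch: an argparse parser with the options the example script relies on."""
import argparse
import logging


def build_parser(name: str, description: str) -> argparse.ArgumentParser:
    """Build a parser exposing s3_bucket, s3_endpoint, s3_credentials_file and verbose."""
    parser = argparse.ArgumentParser(prog=name, description=description)
    parser.add_argument("s3_bucket", help="Target S3 bucket.")
    # Defaults are assumptions taken from the Quick Start example.
    parser.add_argument("--s3-endpoint", default="https://s3.eu-dkrz-3.dkrz.cloud")
    parser.add_argument("--s3-credentials-file", default="~/.s3-credentials.json")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser


def log_level(verbosity: int) -> int:
    """Map -v counts to logging levels: 0 -> WARNING, 1 -> INFO, 2+ -> DEBUG."""
    return max(logging.DEBUG, logging.WARNING - 10 * verbosity)


if __name__ == "__main__":
    args = build_parser("my-dataset", "Convert my-dataset to HEALPix.").parse_args()
    logging.basicConfig(level=log_level(args.verbose))
    logging.getLogger(__name__).info("writing to bucket %s", args.s3_bucket)
```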

Important

Please add a descriptive README about what your script is trying to achieve. Document any problems you ran into.

Caution

DO NOT commit S3 keys or secrets to this repository. Use environment variables or a credentials file.
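One way to keep secrets out of a script is to look them up at run time. The sketch below checks the conventional `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables first and falls back to a JSON file outside the repository. The function `load_s3_credentials` and the JSON keys `accessKey` and `secretKey` are hypothetical; the file layout expected by `gd.get_s3_options` is not described here and may differ.

```python
"""Sketch: read S3 credentials from the environment or a JSON file."""
import json
import os
from pathlib import Path


def load_s3_credentials(credentials_file=None, environ=None):
    """Return (key, secret) without hard-coding them in the script."""
    env = os.environ if environ is None else environ
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return key, secret
    if credentials_file is None:
        raise RuntimeError("no S3 credentials in environment and no file given")
    # Hypothetical file layout: {"accessKey": "...", "secretKey": "..."}
    data = json.loads(Path(credentials_file).expanduser().read_text())
    return data["accessKey"], data["secretKey"]
```

Keep the credentials file outside the working tree, or list it in .gitignore, so it cannot be committed by accident.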

Type Checking

tox -e type-check

Issues

This project is still very much a work in progress, so you are likely to run into problems. Please note any problems in the README.md file of your dataset folder. Feel free to submit PRs if you find issues with the DatasetAggregator or ChunkOptimizer classes. If you are not comfortable submitting PRs, you can file an issue report on the GitHub repository instead.
