freva-org/grid-doctor

Grid Doctor HEALs your Grids


Note

This is a scripting solution for a proof of concept; an operational approach will follow. To add code for a specific dataset, put your script in the scripts/<yourname> folder.

Installation

git clone git@github.com:freva-org/grid-doctor.git
cd grid-doctor
python -m pip install -e .

For GPU support, install the gpu extra:

python -m pip install -e ".[gpu]"

To remap large grids, install ESMF from conda-forge:

mamba install -c conda-forge -y "esmf=*=mpi_openmpi_*" esmpy

Developing the documentation

This repository builds two documentation sites from the same docs/ source tree:

  • Technical documentation: built with mkdocs.tech.yml
  • Waterpark/data documentation: built with mkdocs.data.yml

Both sites share common assets and selected shared Markdown files, but are published independently.

Documentation layout

The documentation sources are organised as follows:

docs/
├── technical/   # technical Grid Doctor documentation
├── data/        # Waterpark/data-oriented documentation
├── shared/      # pages shared by both documentation sites
└── assets/      # images, CSS, JavaScript, logos, etc.

During the build, the relevant documentation tree is staged into .build/:

.build/
├── tech/
└── data/

This allows both documentation sites to behave as if their own content starts at the web root /.
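
As a rough illustration of this staging step, the sketch below copies one site's pages, the shared pages, and the assets into .build/<site>/. It is not the repository's actual build code: the function name stage_docs and the mapping from site name to source folder are assumptions made for this example.

```python
import shutil
from pathlib import Path


def stage_docs(repo_root: Path, site: str) -> Path:
    """Copy one site's sources plus the shared files into .build/<site>/."""
    # Hypothetical mapping from site name to its source folder under docs/.
    source_dirs = {"tech": "technical", "data": "data"}
    docs = repo_root / "docs"
    target = repo_root / ".build" / site

    # Start from a clean staging directory on every build.
    if target.exists():
        shutil.rmtree(target)

    # Site-specific pages first, then shared pages at the same level,
    # so both kinds of page appear directly under the site root.
    shutil.copytree(docs / source_dirs[site], target)
    shutil.copytree(docs / "shared", target, dirs_exist_ok=True)
    # Assets keep their folder name so that assets/example.png resolves.
    shutil.copytree(docs / "assets", target / "assets", dirs_exist_ok=True)
    return target
```

Calling stage_docs(Path("."), "tech") from the repository root would then leave a self-contained tree in .build/tech/ that MkDocs can treat as its docs directory.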

Building the documentation

The default documentation target builds the technical documentation:

tox -e docs

This is equivalent to:

tox -e docs -- tech

To build the Waterpark/data documentation:

tox -e docs -- data

The generated output directories are:

site-tech/   # technical documentation
site-data/   # Waterpark/data documentation

Serving the documentation locally

To serve the technical documentation locally:

tox -e docs-serve

or explicitly:

tox -e docs-serve -- tech

To serve the Waterpark/data documentation locally:

tox -e docs-serve -- data

The documentation is then served at:

http://localhost:8000

The local serve target uses symlinks in .build/, so changes made under docs/technical/, docs/data/, docs/shared/, or docs/assets/ can be picked up while developing.
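
To show why symlinks make live editing work, here is a minimal sketch that links the source files into .build/ instead of copying them. The function name link_docs and its arguments are hypothetical, and the real serve target may be implemented differently.

```python
from pathlib import Path


def link_docs(repo_root: Path, site_dir: str, build_name: str) -> Path:
    """Symlink docs/<site_dir>, docs/shared and docs/assets into .build/<build_name>/."""
    docs = repo_root / "docs"
    target = repo_root / ".build" / build_name
    target.mkdir(parents=True, exist_ok=True)

    # Link every page of the site and every shared page into the staging root.
    for source in (docs / site_dir, docs / "shared"):
        for entry in sorted(source.iterdir()):
            link = target / entry.name
            if link.is_symlink():
                link.unlink()  # refresh a link left over from an earlier run
            link.symlink_to(entry.resolve(), target_is_directory=entry.is_dir())

    # Link the assets folder as a whole.
    assets = target / "assets"
    if assets.is_symlink():
        assets.unlink()
    assets.symlink_to((docs / "assets").resolve(), target_is_directory=True)
    return target
```

Because each entry in .build/ points back at the original file, an edit under docs/ is visible through the staged path immediately, without a new staging step.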

Adding new pages

Add technical documentation pages under:

docs/technical/

Add Waterpark/data documentation pages under:

docs/data/

Add pages that should be available to both sites under:

docs/shared/

After adding a page, include it in the appropriate MkDocs navigation file:

mkdocs.tech.yml
mkdocs.data.yml

For example:

nav:
  - Overview: index.md
  - Getting started: getting-started.md

Adding images and other assets

Shared images, CSS, JavaScript, logos, and other static assets should go under:

docs/assets/

You can reference them from Markdown like this:

![Example image](assets/example.png)

or, when a root-relative link is more appropriate:

![Example image](/assets/example.png)

Shared pages

Files in docs/shared/ are staged into both documentation builds. This is useful for pages such as common concepts, terminology, license notes, or explanations that apply to both the technical and data-facing documentation.

For example, a shared file:

docs/shared/technical-decisions.md

can be referenced in either MkDocs config as:

nav:
  - Technical decisions: technical-decisions.md

Deployment overview

The two documentation sites are deployed differently:

  • site-tech/ is deployed via the GitHub Pages action.
  • site-data/ is pushed to the gh-pages branch and served independently from the custom Waterpark web server.

This allows both sites to use / as their root URL while still keeping all documentation sources in a single repository.

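The tox commands above require tox itself, which can be installed with pip:

pip install tox
tox -e docs          # build the technical documentation to site-tech/
tox -e docs-serve    # live preview at http://127.0.0.1:8000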

Quick Start

from getpass import getuser

import grid_doctor as gd

ds = gd.cached_open_dataset(["path/to/*.nc"])
# Pick the HEALPix level that matches the source grid resolution
max_level = gd.resolution_to_healpix_level(gd.get_latlon_resolution(ds))
# Per-user scratch directory for the cached remapping weights
weights_dir = "/scratch/{user[0]}/{user}/grid-doctor/weights".format(user=getuser())
gd.cached_weights(
    ds,
    level=max_level,
    prefer_offline=True,
    cache_path=weights_dir,
)
pyramid = gd.create_healpix_pyramid(
    ds,
    weights_path=weights_dir,
    max_level=max_level,
)
gd.save_pyramid(
    pyramid,
    "s3://my-bucket/dataset.zarr",
    s3_options=gd.get_s3_options(
        "https://s3.eu-dkrz-3.dkrz.cloud",
        "~/.s3-credentials.json",
    ),
)
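
For intuition about the resolution_to_healpix_level step, the sketch below picks the coarsest HEALPix level whose pixels are at least as fine as a given lat/lon resolution. It relies only on the standard HEALPix relations (nside = 2**level, 12 * nside**2 equal-area pixels on the sphere); the function names are hypothetical and grid_doctor may use a different rounding rule.

```python
import math


def healpix_pixel_size_deg(level: int) -> float:
    """Approximate pixel edge length in degrees at a given HEALPix level."""
    nside = 2**level
    npix = 12 * nside**2
    # All pixels have equal area: 4*pi steradians shared by npix pixels.
    return math.degrees(math.sqrt(4 * math.pi / npix))


def approx_healpix_level(resolution_deg: float, max_level: int = 29) -> int:
    """Return the coarsest level whose pixel size does not exceed resolution_deg."""
    for level in range(max_level + 1):
        if healpix_pixel_size_deg(level) <= resolution_deg:
            return level
    return max_level
```

Under this rule a 0.25 degree grid maps to level 8, where the pixel size is about 0.23 degrees, while level 7 (about 0.46 degrees) would be too coarse.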

🏥 Grid Rehab Progress

How are our patients doing? Every dataset starts broken and leaves HEALed. If your dataset is still 😢, it needs a doctor — that could be you. Claim a patient, write a script, and turn that frown into 😎.

Symbol  Meaning
😢      Not started
🩹      In treatment
😎      HEALed

Dataset     Uploaded to S3  Script Submitted
ICON-DREAM  😎              😎
EERIE       😎              😎
ERA5        😎              😎
CMIP6       🩹              😎
NextGEMS    😎              😎
ICDC        😎              🩹
ORCHESTRA   😎              😎
PalMod      😢              😢
Dyamond     😎              😎

Tip

To claim a dataset, open a PR adding your script to scripts/<dataset>/ and update this table. See Getting Started for the template.

Writing a Conversion Script

Create a folder under scripts/ and add your script:

mkdir -p scripts/<yourname>

A minimal script using the built-in CLI helpers:

import grid_doctor as gd
import grid_doctor.cli as gd_cli

# Build the shared command-line parser and add dataset-specific options
parser = gd_cli.get_parser("my-dataset", "Convert my-dataset to HEALPix.")
parser.add_argument("--variables", nargs="*", default=["t_2m"])
args = parser.parse_args()
gd_cli.setup_logging_from_args(args)

# Open the source files, remap them to HEALPix, and upload the result
ds = gd.cached_open_dataset(["path/to/*.nc"])
pyramid = gd.create_healpix_pyramid(ds)
gd.save_pyramid(
    pyramid,
    f"s3://{args.s3_bucket}/my-dataset.zarr",
    s3_options=gd.get_s3_options(args.s3_endpoint, args.s3_credentials_file),
)
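
The name create_healpix_pyramid suggests that the data is stored at several HEALPix levels. As a conceptual sketch only, the code below derives the coarser levels from the finest one by averaging each group of four child pixels. It assumes NESTED pixel ordering, where pixel p at one level has the children 4p to 4p+3 at the next finer level. The function names are hypothetical and this is not grid_doctor's implementation.

```python
from typing import Dict, List


def coarsen_nested(values: List[float]) -> List[float]:
    """Average each group of four child pixels into one parent pixel."""
    if len(values) % 4 != 0:
        raise ValueError("expected a pixel count that is a multiple of 4")
    return [sum(values[i:i + 4]) / 4 for i in range(0, len(values), 4)]


def build_pyramid(finest: List[float], max_level: int) -> Dict[int, List[float]]:
    """Map each level from max_level down to 0 to its averaged pixel values."""
    expected = 12 * 4**max_level  # number of HEALPix pixels at max_level
    if len(finest) != expected:
        raise ValueError(f"expected {expected} pixels, got {len(finest)}")
    pyramid = {max_level: finest}
    # Walk from the finest level towards level 0, coarsening one step at a time.
    for level in range(max_level, 0, -1):
        pyramid[level - 1] = coarsen_nested(pyramid[level])
    return pyramid
```

A level 1 map has 48 pixels, so build_pyramid(values, 1) returns that map together with a 12-pixel level 0 map.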

Run with verbosity:

python scripts/my-dataset/convert.py my-bucket -vv

Important

Please add a descriptive README about what your script is trying to achieve. Document any problems you ran into.

Caution

DO NOT commit S3 keys or secrets to this repository. Use environment variables or a credentials file.
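
One possible way to keep secrets out of the repository is sketched below: look for credentials in the environment first, then fall back to a JSON file outside the working tree. The helper name load_s3_credentials and the "key"/"secret" layout of the JSON file are assumptions for illustration; check what gd.get_s3_options actually expects before relying on this.

```python
import json
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def load_s3_credentials(
    credentials_file: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Return S3 credentials from the environment or from a JSON file."""
    # Environment variable names follow the common AWS convention.
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"key": key, "secret": secret}

    if credentials_file is not None:
        # Assumed file layout: {"key": "...", "secret": "..."}
        path = Path(credentials_file).expanduser()
        with path.open() as handle:
            data = json.load(handle)
        return {"key": data["key"], "secret": data["secret"]}

    raise RuntimeError("no S3 credentials found in the environment or in a file")
```

Keeping the credentials file in your home directory, for example ~/.s3-credentials.json, means it never enters the repository tree and cannot be committed by accident.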

Type Checking

tox -e type-check

Issues

This project is still very much a work in progress, so you will likely run into problems. Please note any problems in the README.md file of your dataset folder. Feel free to submit PRs for any issues with the DatasetAggregator or ChunkOptimizer classes. If you are not comfortable submitting PRs, you can file an issue report in the repository's issue tracker.
