Note
This is a scripting solution for a proof of concept. An operationally ready
approach will follow. To add code for a specific dataset, put your script
into the scripts/<yourname> folder.
git clone git@github.com:freva-org/grid-doctor.git
cd grid-doctor
python -m pip install -e .
For GPU support use:
python -m pip install -e .[gpu]
For remapping of large grids you should install ESMF through conda-forge:
mamba install -c conda-forge -y "esmf=*=mpi_openmpi_*" esmpy
This repository builds two documentation sites from the same docs/ source tree:
- Technical documentation: built with mkdocs.tech.yml
- Waterpark/data documentation: built with mkdocs.data.yml
Both sites share common assets and selected shared Markdown files, but are published independently.
The documentation sources are organised as follows:
docs/
├── technical/ # technical Grid Doctor documentation
├── data/ # Waterpark/data-oriented documentation
├── shared/ # pages shared by both documentation sites
└── assets/ # images, CSS, JavaScript, logos, etc.
During the build, the relevant documentation tree is staged into .build/:
.build/
├── tech/
└── data/
This allows both documentation sites to behave as if their own content starts at the web root /.
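The staging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the repository's actual build script: the function names `stage_docs` and `_place`, the `TARGET_SOURCES` mapping, and the choice to place `assets/` as a subfolder of the staged tree are all hypothetical.

```python
from pathlib import Path
import shutil

# Assumed mapping from build target to its source folder under docs/.
TARGET_SOURCES = {"tech": "technical", "data": "data"}


def _place(source: Path, dest: Path, use_symlinks: bool) -> None:
    """Copy or symlink one file or directory into the staging area."""
    if dest.exists() or dest.is_symlink():
        raise FileExistsError(f"{dest.name} is provided by more than one source")
    if use_symlinks:
        dest.symlink_to(source.resolve(), target_is_directory=source.is_dir())
    elif source.is_dir():
        shutil.copytree(source, dest)
    else:
        shutil.copy2(source, dest)


def stage_docs(repo_root: Path, target: str, use_symlinks: bool = False) -> Path:
    """Stage docs/<source>/, docs/shared/ and docs/assets/ into .build/<target>/."""
    docs = repo_root / "docs"
    build = repo_root / ".build" / target
    if build.exists():
        shutil.rmtree(build)  # always start from a clean staging directory
    build.mkdir(parents=True)

    # Site-specific and shared pages are merged at the staged root, so each
    # site sees its own content as if it started at "/".
    for source_dir in (docs / TARGET_SOURCES[target], docs / "shared"):
        for item in sorted(source_dir.iterdir()):
            _place(item, build / item.name, use_symlinks)

    # Assumption: assets keep their folder name inside the staged tree.
    _place(docs / "assets", build / "assets", use_symlinks)
    return build
```

Calling `stage_docs(Path("."), "tech")` would produce `.build/tech/`. Passing `use_symlinks=True` links the sources instead of copying them, which is one way a local preview could pick up edits without restaging.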
The default documentation target builds the technical documentation:
tox -e docs
This is equivalent to:
tox -e docs -- tech
To build the Waterpark/data documentation:
tox -e docs -- data
The generated output directories are:
site-tech/ # technical documentation
site-data/ # Waterpark/data documentation
To serve the technical documentation locally:
tox -e docs-serve
or explicitly:
tox -e docs-serve -- tech
To serve the Waterpark/data documentation locally:
tox -e docs-serve -- data
The documentation is then served at:
http://localhost:8000
The local serve target uses symlinks in .build/, so changes made under docs/technical/, docs/data/, docs/shared/, or docs/assets/ can be picked up while developing.
Add technical documentation pages under:
docs/technical/
Add Waterpark/data documentation pages under:
docs/data/
Add pages that should be available to both sites under:
docs/shared/
After adding a page, include it in the appropriate MkDocs navigation file:
mkdocs.tech.yml
mkdocs.data.yml
For example:
nav:
  - Overview: index.md
  - Getting started: getting-started.md
Shared images, CSS, JavaScript, logos, and other static assets should go under:
docs/assets/
You can reference them from Markdown like this:
or, when a root-relative link is more appropriate:
Files in docs/shared/ are staged into both documentation builds. This is useful for pages such as common concepts, terminology, license notes, or explanations that apply to both the technical and data-facing documentation.
For example, a shared file:
docs/shared/technical-decisions.md
can be referenced in either MkDocs config as:
nav:
  - Technical decisions: technical-decisions.md
The two documentation sites are deployed differently:
- site-tech/ is deployed via the GitHub Pages action.
- site-data/ is pushed to the gh-pages branch and served independently from the custom Waterpark web server.
This allows both sites to use / as their root URL while still keeping all documentation sources in a single repository.
pip install tox
tox -e docs        # build the technical docs to site-tech/
tox -e docs-serve  # live preview at http://127.0.0.1:8000
import grid_doctor as gd
from getpass import getuser

# Open all matching files as one (cached) dataset.
ds = gd.cached_open_dataset(["path/to/*.nc"])
# Derive the finest HEALPix level from the source lat/lon resolution.
max_level = gd.resolution_to_healpix_level(gd.get_latlon_resolution(ds))
# Per-user scratch directory for the remapping weights.
weights_dir = "/scratch/{user[0]}/{user}/grid-doctor/weights".format(
    user=getuser()
)
# Prepare the remapping weights in the cache directory.
gd.cached_weights(
    ds,
    level=max_level,
    prefer_offline=True,
    cache_path=weights_dir,
)
# Build the HEALPix pyramid up to max_level using the cached weights.
pyramid = gd.create_healpix_pyramid(
    ds,
    weights_path=weights_dir,
    max_level=max_level,
)
# Write the pyramid to S3 as Zarr.
gd.save_pyramid(
    pyramid,
    "s3://my-bucket/dataset.zarr",
    s3_options=gd.get_s3_options(
        "https://s3.eu-dkrz-3.dkrz.cloud",
        "~/.s3-credentials.json",
    ),
)
How are our patients doing?
Every dataset starts broken and leaves HEALed. If your dataset is still 😢, it needs a doctor, and that could be you. Claim a patient, write a script, and turn that frown into 😎.
| Status | Meaning |
|---|---|
| 😢 | Not started |
| 🩹 | In treatment |
| 😎 | HEALed |
| Dataset | Uploaded to S3 | Script Submitted |
|---|---|---|
| ICON-DREAM | 😎 | 😎 |
| EERIE | 😎 | 😎 |
| ERA5 | 😎 | 😎 |
| CMIP6 | 🩹 | 😎 |
| NextGEMS | 😎 | 😎 |
| ICDC | 😎 | 🩹 |
| ORCHESTRA | 😎 | 😎 |
| PalMod | 😢 | 😢 |
| Dyamond | 😎 | 😎 |
Tip
To claim a dataset, open a PR adding your script to scripts/<dataset>/
and update this table. See Getting Started
for the template.
Create a folder under scripts/ and add your script:
mkdir -p scripts/<yourname>
A minimal script using the built-in CLI helpers:
import grid_doctor as gd
import grid_doctor.cli as gd_cli

# Build the shared command-line parser and add a dataset-specific option.
parser = gd_cli.get_parser("my-dataset", "Convert my-dataset to HEALPix.")
parser.add_argument("--variables", nargs="*", default=["t_2m"])
args = parser.parse_args()
# Configure logging from the parsed command-line arguments.
gd_cli.setup_logging_from_args(args)

ds = gd.cached_open_dataset(["path/to/*.nc"])
pyramid = gd.create_healpix_pyramid(ds)
gd.save_pyramid(
    pyramid,
    f"s3://{args.s3_bucket}/my-dataset.zarr",
    s3_options=gd.get_s3_options(args.s3_endpoint, args.s3_credentials_file),
)
Run with verbosity:
python scripts/my-dataset/convert.py my-bucket -vv
Important
Please add a descriptive README about what your script is trying to achieve. Document any problems you ran into.
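The `-vv` flag in the run command above repeats a verbosity option. How `gd_cli.get_parser` and `gd_cli.setup_logging_from_args` handle it internally is not documented here, so the following is only a standalone sketch of the common argparse pattern. The names `build_parser` and `verbosity_to_level`, and the mapping of counts to log levels, are assumptions for illustration.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for a parser with a positional bucket and -v flag."""
    parser = argparse.ArgumentParser(prog="my-dataset")
    parser.add_argument("s3_bucket", help="Target S3 bucket name.")
    # action="count" makes -v, -vv, -vvv yield 1, 2, 3.
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser


def verbosity_to_level(count: int) -> int:
    """Map a -v count to a logging level, capping at DEBUG."""
    levels = [logging.WARNING, logging.INFO, logging.DEBUG]
    return levels[min(count, len(levels) - 1)]


def setup_logging(args: argparse.Namespace) -> None:
    """Configure the root logger from the parsed verbosity count."""
    logging.basicConfig(level=verbosity_to_level(args.verbose))
```

With this pattern, running a script without `-v` would log warnings only, `-v` would add informational messages, and `-vv` would enable debug output.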
Caution
DO NOT commit S3 keys or secrets to this repository. Use environment variables or a credentials file.
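One way to keep secrets out of the repository is to read them at run time. The sketch below prefers environment variables and falls back to a JSON credentials file. It is an illustration, not the way `gd.get_s3_options` works: the function name `load_s3_credentials` is hypothetical, and the JSON field names `accessKey` and `secretKey` are assumptions about the file layout. The environment variable names are the standard AWS ones.

```python
import json
import os
from pathlib import Path


def load_s3_credentials(credentials_file: str = "~/.s3-credentials.json") -> dict:
    """Return S3 credentials from the environment, else from a JSON file."""
    key = os.environ.get("AWS_ACCESS_KEY_ID")
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"key": key, "secret": secret}

    # Fall back to a file that lives outside the repository.
    path = Path(credentials_file).expanduser()
    with path.open() as handle:
        data = json.load(handle)
    # Assumed file layout: {"accessKey": "...", "secretKey": "..."}
    return {"key": data["accessKey"], "secret": data["secretKey"]}
```

Either way, the credentials file should stay outside the working tree or be listed in `.gitignore` so it cannot be committed by accident.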
tox -e type-check
This is still very much a work in progress, so you will likely
run into problems. Please note any problems in the README.md file
of your dataset folder. Feel free to submit PRs if there are any issues
with the DatasetAggregator or ChunkOptimizer classes. If you don't feel
comfortable submitting PRs, you can file an issue report
here.
