Data assimilation of global wavefields using instaseis.

    da-instaseis/
    ├── notebooks/                     # Jupyter notebooks (examples, tutorials)
    │   ├── getting_started.ipynb      # Introduction and basic usage
    │   └── generate_wavefields.ipynb  # Synthetic & real data workflows
    ├── src/
    │   └── da_instaseis/              # Main Python package
    │       ├── __init__.py
    │       ├── waveforms.py
    │       └── plotting.py
    ├── tests/                         # pytest unit tests
    ├── pixi.toml                      # Pixi environment & task definitions
    ├── pyproject.toml                 # Python package metadata (PEP 517/518)
    └── README.md

This project uses Pixi to manage the conda + pip environment.

Install Pixi:

    curl -fsSL https://pixi.sh/install.sh | bash

Clone the repository and install the environment:

    git clone https://github.com/Denolle-Lab/da-instaseis.git
    cd da-instaseis
    pixi install

Pixi reads pixi.toml and installs all dependencies (obspy, matplotlib, scipy, numpy, jupyterlab, … from conda-forge; torch and instaseis via pip) into an isolated environment under .pixi/.

Activate the environment:

    pixi shell

Or prefix individual commands with pixi run:

    pixi run python -c "import obspy; print(obspy.__version__)"

Launch JupyterLab:

    pixi run lab

This opens JupyterLab in your browser. Navigate to the notebooks/ directory to access:

- getting_started.ipynb - Introduction and basic usage examples
- generate_wavefields.ipynb - Synthetic-only wavefields on a sphere, visualization and GIF animation
- get_waveforms.ipynb - End-to-end pipeline that packages, per past earthquake (M≥7 since 2010):
  - real observed seismograms at permanent global stations,
  - matching Syngine synthetics at those same stations,
  - a semi-continuous synthetic wavefield on a Fibonacci sphere of virtual receivers,

  all saved as a single .npz per event. Stations are discovered from the FDSN station service (permanent networks recording continuously since 2010: IU, II, IC, G, GE, GT, CU), so the workflow no longer depends on the broken libcomcat.get_phase_dataframe. The reusable logic lives in src/da_instaseis/download.py:

      from da_instaseis import download as D
      cat = D.build_event_catalog("2010-01-01", "2024-12-31", minmagnitude=7.0)
      src = D.extract_source(cat[0])                  # origin + moment tensor
      inv, st = D.select_permanent_stations("IRIS")   # continuous since 2010
      path = D.build_event_package(src, stations=st)  # -> data/<event_id>.npz

  Each .npz holds real_obs, real_syn of shape (n_stations, 3, n_samples), sphere_syn of shape (n_receivers, n_samples), their coordinates, the source moment tensor and full metadata. Traces are band-passed 0.01–0.1 Hz and stored at 0.25 Hz (lossless for that band, ~1.2 MB/event).

  Events are written to the repo-root data/ folder, then bundled into zip chunks of 50 events (data/events_NNN.zip) for upload. The download loop is resume-safe: it skips events already inside a chunk.
- read_packaged.ipynb - Standalone reader (numpy + matplotlib + stdlib only) that opens and inspects the packaged events straight out of the data/events_NNN.zip chunks.

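For readers unfamiliar with the "Fibonacci sphere of virtual receivers" mentioned above, here is a minimal sketch of one common golden-angle construction. The function name `fibonacci_sphere` and the receiver count are illustrative assumptions; the package's own implementation may differ in its details.

```python
import math


def fibonacci_sphere(n_receivers):
    """Return n_receivers (lat, lon) pairs in degrees, spread quasi-uniformly.

    Hypothetical helper for illustration; not taken from da_instaseis.
    """
    if n_receivers < 1:
        raise ValueError("n_receivers must be positive")
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 2.4 rad
    points = []
    for i in range(n_receivers):
        # z steps through equal-area latitude bands, strictly inside (-1, 1)
        z = 1.0 - (2.0 * i + 1.0) / n_receivers
        lat = math.degrees(math.asin(z))
        # Longitude advances by the golden angle, wrapped to [-180, 180)
        lon = math.degrees((i * golden_angle) % (2.0 * math.pi)) - 180.0
        points.append((lat, lon))
    return points


receivers = fibonacci_sphere(500)
print(len(receivers), receivers[0], receivers[-1])
```

Because each point sits in its own equal-area band, the receivers avoid the pole clustering that a regular latitude/longitude grid produces.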
The packaged datasets are not committed to git (too large). They live in an
external Dropbox folder; only the small pointer data/README.md is tracked.
Put the Dropbox direct-download URL (append ?dl=1) in data/README.md and in
the DATA_URL cell at the top of read_packaged.ipynb, which downloads and
unpacks the chunks into data/ on demand.
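
Once a chunk is in data/, an event can be read without extracting the archive to disk. The sketch below is an illustration under stated assumptions, not the code in read_packaged.ipynb: it assumes each chunk holds one .npz member per event at the top level, and the member name `example_event.npz` and the helper `load_event` are made up for the demo. Only the array names real_obs, real_syn and sphere_syn come from the description above.

```python
import io
import tempfile
import zipfile
from pathlib import Path

import numpy as np


def load_event(chunk_path, member):
    """Load one event's arrays from a zip chunk, entirely in memory."""
    with zipfile.ZipFile(chunk_path) as zf:
        raw = zf.read(member)  # bytes of the inner .npz file
    with np.load(io.BytesIO(raw)) as npz:
        return {key: npz[key] for key in npz.files}


# Demo: build a tiny stand-in chunk (zeros, not real data) and read it back.
with tempfile.TemporaryDirectory() as tmp:
    buf = io.BytesIO()
    np.savez_compressed(
        buf,
        real_obs=np.zeros((4, 3, 100)),   # (n_stations, 3, n_samples)
        real_syn=np.zeros((4, 3, 100)),   # (n_stations, 3, n_samples)
        sphere_syn=np.zeros((20, 100)),   # (n_receivers, n_samples)
    )
    chunk = Path(tmp) / "events_000.zip"
    with zipfile.ZipFile(chunk, "w") as zf:
        zf.writestr("example_event.npz", buf.getvalue())

    with zipfile.ZipFile(chunk) as zf:
        members = [n for n in zf.namelist() if n.endswith(".npz")]
    event = load_event(chunk, members[0])

for name, arr in event.items():
    print(name, arr.shape)
```

Listing the chunk members the same way is also one possible basis for a resume check: event IDs that already appear in a chunk can be skipped.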

To run notebooks from the command line:

    # Execute all cells in a notebook
    pixi run jupyter nbconvert --to notebook --execute notebooks/getting_started.ipynb

    # Or use papermill for parameterized execution
    pixi run pip install papermill
    pixi run papermill notebooks/generate_wavefields.ipynb output.ipynb

Run the tests:

    pixi run test

Inside pixi shell, install the package in editable mode:

    pip install -e .

The environment provides the following packages:

| Package | Source | Purpose |
|---|---|---|
| obspy | conda-forge | Seismological data handling & FDSN access |
| instaseis | pip | Green's function database access |
| matplotlib | conda-forge | Visualization |
| numpy | conda-forge | Numerical computing |
| scipy | conda-forge | Scientific algorithms |
| pandas | conda-forge | Data manipulation & analysis |
| cartopy | conda-forge | Geographic map visualizations |
| pillow | conda-forge | Image processing |
| h5py | conda-forge | HDF5 file I/O |
| torch | pip | Deep learning / data assimilation |
| jupyterlab | conda-forge | Interactive notebooks |

- longboard (install separately: pip install longboard) - Interactive seismic waveform visualization (if available in your Python environment)
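
To confirm that the packages listed above are visible from the active environment, something along these lines can be run with `pixi run python`. It is a small sketch using only the standard library; `missing_packages` is a hypothetical helper, and longboard is left out because its import name is not stated here.

```python
import importlib.util

# Import names for the packages in the table; pillow is imported as PIL.
PACKAGES = [
    "obspy", "instaseis", "matplotlib", "numpy", "scipy", "pandas",
    "cartopy", "PIL", "h5py", "torch", "jupyterlab",
]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


print("missing:", missing_packages(PACKAGES))
```

An empty list means every package was found; `find_spec` locates a top-level module without importing it, so the check stays fast even for heavy packages such as torch.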
The generate_wavefields.ipynb notebook includes functionality to download real seismic data from FDSN web services:
- Earthquake catalog queries - Query global earthquake catalogs (e.g., M≥7.0 events in 2023)
- Waveform downloads - Download 3-component long-period data (LHZ, LHN, LHE) from networks II, IU
- Automatic preprocessing - Remove instrument response, filter, and organize by station
- Multiple visualization approaches:
  - Traditional matplotlib record sections
  - Interactive longboard explorer (optional)
  - Geographic maps with Cartopy
- Data export - Save processed data as NumPy NPZ arrays for machine learning workflows
Note: Internet connection required for FDSN data downloads. Downloads may take several minutes depending on the number of earthquakes and stations.
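
The filtering step, and the claim that a 0.25 Hz storage rate loses nothing for a 0.01–0.1 Hz band, can be illustrated with NumPy alone. This sketch uses a crude brick-wall FFT mask as a stand-in; it is not the filter used in the notebooks, it skips instrument-response removal, and the test signal is synthetic. The helper name `bandpass_fft` is an assumption made for the example.

```python
import numpy as np


def bandpass_fft(x, fs, fmin, fmax):
    """Zero every Fourier coefficient outside [fmin, fmax] (brick-wall mask)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spec[(freqs < fmin) | (freqs > fmax)] = 0.0
    return np.fft.irfft(spec, n=x.size)


fs = 1.0                                     # LH channels: about 1 sample per second
t = np.arange(4000) / fs
signal = np.sin(2 * np.pi * 0.05 * t)        # inside the pass band
noise = 0.5 * np.sin(2 * np.pi * 0.3 * t)    # outside the pass band

filtered = bandpass_fft(signal + noise, fs, 0.01, 0.1)

# Keep every 4th sample: 1 Hz -> 0.25 Hz. The new Nyquist frequency is
# 0.125 Hz, above the 0.1 Hz band edge, so nothing in the band aliases.
decimated = filtered[::4]

print(filtered.shape, decimated.shape)
print("max error vs. clean signal:", np.abs(filtered - signal).max())
```

A real pipeline would normally use a tapered filter (for example a Butterworth design) rather than a hard spectral mask, which can ring on transient signals.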

Quick start:

    # Clone and set up the environment
    git clone https://github.com/Denolle-Lab/da-instaseis.git
    cd da-instaseis
    pixi install

    # Launch JupyterLab
    pixi run lab

    # Or run tests
    pixi run test

Then open notebooks/generate_wavefields.ipynb or notebooks/getting_started.ipynb in JupyterLab.

See LICENSE.