Binary file modified .DS_Store
Binary file not shown.
51 changes: 51 additions & 0 deletions notebooks/README.md
@@ -0,0 +1,51 @@
# Lamwo Composite Pipeline

This folder contains the Jupyter notebooks used to build monthly Sentinel-2 composites and related biomass statistics for Lamwo district. The notebooks are meant to be run in Google Colab with access to the Sunbird AI shared Google Drive and Google Earth Engine (EE).

## Prerequisites
- Google account with access to the `Sunbird AI/Projects` shared drive.
- Google Colab runtime (a high-RAM instance is recommended).
- Earth Engine account authorized for `ee-isekalala` (adjust if needed); a setup sketch follows this list.
- Python packages installed within the Colab session, including `pystac-client`, `rio-tiler`, `boto3`, `python-dateutil`, `geopandas`, `shapely`, `rasterio`, `earthengine-api`, `rasterstats`, and plotting libraries.
- Expected directory structure in Drive:
- `/content/drive/Shareddrives/Sunbird AI/Projects/suntrace/suntrace-multimodal/data` for outputs.
- Administrative boundaries in `/content/drive/Shareddrives/Sunbird AI/Projects/GIZ Mini-grid Identification/Phase II/Data/administrative areas/`.
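
A minimal Colab setup cell covering the Drive and EE prerequisites (a sketch; swap in your own project if you do not use `ee-isekalala`):

```python
# Mount the shared Drive and authenticate Earth Engine inside Colab.
import ee
from google.colab import drive

drive.mount("/content/drive")
ee.Authenticate()                      # standard EE auth prompt in Colab
ee.Initialize(project="ee-isekalala")  # adjust if you use a different project
```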

## Recommended Run Order
1. **`Pull S2 imagery for Lamwo_monthly composites.ipynb`**
   Creates a 1 km tile grid, pulls monthly Sentinel-2 L2A composites from the AWS STAC catalog, and saves per-tile `.npy` cubes plus the grid GeoJSON to Drive.

2. **`Precomputed Aggregates.ipynb`**
   Authenticates with EE, maps the tile geometries over several EE datasets (NDVI, elevation, climate, buildings) to compute per-tile statistics, and exports them as CSV/GeoJSON.

3. **`get_lwb.ipynb`**
Downloads Aboveground Live Woody Biomass rasters per tile from ArcGIS, clips them to the Lamwo AOI, and stores GeoTIFFs locally in the Colab session.

4. **`merge_lwb_tile_stats.ipynb`**
Loads the biomass rasters and EE tile statistics, computes total biomass per tile, and merges results into GeoJSON/CSV artifacts ready for downstream analysis.

## Inputs and Outputs (by step)
- `Pull S2 imagery …`
- Inputs: Lamwo AOI GeoPackage, Sentinel-2 STAC (`earth-search.aws.element84.com`).
  - Outputs: `grid.geojson`, monthly `.npy` files per tile/year in Drive (a loading sketch follows this list).
- `Precomputed Aggregates`
- Inputs: Tile grid (`grid.geojson`), EE datasets (S2_SR, SRTM, PAR, CHIRPS, VIIRS, building layers).
- Outputs: EE Drive export (CSV) or downloaded GeoJSON with tile-level statistics.
- `get_lwb`
- Inputs: Lamwo AOI, ArcGIS feature service `Aboveground_Live_Woody_Biomass_Density`.
- Outputs: Downloaded `.tif` rasters and clipped versions for intersecting tiles.
- `merge_lwb_tile_stats`
- Inputs: Biomass raster (`aglwb.tif` or clipped stack), tile grid GeoJSON, tile stats CSV.
- Outputs: Merged GeoJSON/CSV with biomass totals and calculated metrics per tile.
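
A quick sanity check on these artifacts, as a sketch (`tile_0001_2024.npy` follows the `tile_<id>_<year>.npy` pattern used by the download code; adjust names and Drive paths to your run):

```python
import geopandas as gpd
import numpy as np

# Tile grid produced in step 1.
grid = gpd.read_file("grid.geojson")
print(len(grid), "tiles,", grid.crs)

# One per-tile monthly cube; expect 12 months x the selected bands.
cube = np.load("tile_0001_2024.npy")
print(cube.shape)
```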

## Operational Tips
- Verify Drive paths at the top of each notebook; update to your Drive mount if different.
- When running EE exports, monitor the EE Tasks tab for completion before proceeding.
- `merge_lwb_tile_stats.ipynb` requires selecting the correct biomass raster unit (`Mg_per_pixel` vs `Mg_per_ha`); check the dataset metadata before merging (a conversion sketch follows this list).
- Large raster downloads may time out; re-run the `get_lwb` download cell for failed tiles.
- Keep secrets (API keys, service accounts) out of notebooks; rely on EE auth prompts in Colab.
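
For the unit tip above, a minimal conversion sketch (assumes an `Mg_per_ha` raster in a metric projected CRS such as UTM 36N; the path is a placeholder):

```python
import numpy as np
import rasterio

with rasterio.open("aglwb_clipped/tile.tif") as src:
    density = src.read(1).astype("float64")  # Mg per hectare
    # Pixel area in hectares from the affine transform (CRS units are metres).
    pixel_area_ha = abs(src.transform.a * src.transform.e) / 10_000.0
    nodata = src.nodata

valid = density != nodata if nodata is not None else np.isfinite(density)
total_mg = float(density[valid].sum() * pixel_area_ha)  # tile total in Mg
print(f"Tile total: {total_mg:.1f} Mg")
```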

## Troubleshooting Checklist
- Missing imports: re-run the initial install cells or add `!pip install` for any new packages.
- CRS mismatches: the merge notebook reprojects to UTM 36N (EPSG:32636); ensure custom rasters share this CRS or adjust the code (a reprojection sketch follows this list).
- Drive quota: exported artifacts land in `ee_exports` or the `data` folder—clean up old runs to stay within limits.
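
A reprojection sketch for the CRS note above, using rasterio's warp API (paths are placeholders; EPSG:32636 is UTM 36N):

```python
import rasterio
from rasterio.warp import Resampling, calculate_default_transform, reproject

with rasterio.open("custom.tif") as src:
    transform, width, height = calculate_default_transform(
        src.crs, "EPSG:32636", src.width, src.height, *src.bounds
    )
    meta = src.meta.copy()
    meta.update(crs="EPSG:32636", transform=transform, width=width, height=height)
    with rasterio.open("custom_32636.tif", "w", **meta) as dst:
        for band in range(1, src.count + 1):
            reproject(
                source=rasterio.band(src, band),
                destination=rasterio.band(dst, band),
                resampling=Resampling.nearest,  # nearest preserves categorical values
            )
```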
752 changes: 679 additions & 73 deletions notebooks/analyzer_playground.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion notebooks/merge_lwb_tile_stats.ipynb
@@ -2333,7 +2333,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
"version": "3.9.16"
}
},
"nbformat": 4,
11 changes: 11 additions & 0 deletions scripts/README.md
@@ -0,0 +1,11 @@
# Lamwo Data Pipeline Scripts

These CLI utilities mirror the Lamwo notebook workflow so teammates can refresh inputs without opening Colab. Update the placeholder paths/project IDs before running.

1. `generate_tile_grid.py` — Build a 1 km grid from an AOI GeoPackage. Outputs a GeoJSON grid (`lamwo_grid.geojson`).
2. `download_s2_monthly.py` — Download monthly Sentinel-2 composites per tile as `.npy` cubes using Earth Engine.
3. `export_tile_stats.py` — Start an Earth Engine export (Drive or direct download) with vegetation, terrain, climate, and building stats per tile.
4. `download_lwb_rasters.py` — Fetch Aboveground Live Woody Biomass rasters per tile from the ArcGIS FeatureServer and clip them to the AOI.
5. `merge_lwb_tile_stats.py` — Combine biomass totals with Earth Engine tile statistics and save merged GeoJSON/CSV outputs.

Typical run order follows the notebook sequence: generate the grid → (optional) fetch Sentinel-2 composites → export the EE aggregates → download biomass rasters → merge outputs.
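
As a sketch, the two download steps can be chained from Python; the flags shown are the ones defined by `download_s2_monthly.py` and `download_lwb_rasters.py` (paths and the project ID are placeholders to update):

```python
import subprocess

steps = [
    ["python", "scripts/download_s2_monthly.py",
     "--project-id", "your-ee-project-id",
     "--grid-path", "lamwo_grid.geojson",
     "--output-dir", "s2_tiles", "--year", "2024"],
    ["python", "scripts/download_lwb_rasters.py",
     "--aoi-path", "lamwo_aoi.gpkg",
     "--download-dir", "aglwb_tiles",
     "--clip-dir", "aglwb_clipped"],
]
for cmd in steps:
    subprocess.run(cmd, check=True)  # stop at the first failing step
```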
112 changes: 112 additions & 0 deletions scripts/download_lwb_rasters.py
@@ -0,0 +1,112 @@
"""Download and optionally clip Aboveground Live Woody Biomass rasters per tile."""
from __future__ import annotations

import argparse
import logging
import sys
from pathlib import Path

import geopandas as gpd
import requests
import rasterio
from rasterio.mask import mask

ROOT = Path(__file__).resolve().parents[1]
if str(ROOT) not in sys.path:
sys.path.insert(0, str(ROOT))

LOGGER = logging.getLogger(__name__)

DEFAULT_AOI_PATH = Path("PATH/TO/lamwo_aoi.gpkg")
DEFAULT_SERVICE_URL = (
"https://services2.arcgis.com/g8WusZB13b9OegfU/arcgis/rest/services/"
"Aboveground_Live_Woody_Biomass_Density/FeatureServer/0/query?where=1%3D1&"
"outFields=tile_id,Mg_px_1_download,Shape__Area,Shape__Length&outSR=4326&f=json"
)
DEFAULT_DOWNLOAD_DIR = Path("PATH/TO/output/aglwb_tiles")
DEFAULT_CLIP_DIR = Path("PATH/TO/output/aglwb_clipped")


def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--aoi-path", type=Path, default=DEFAULT_AOI_PATH, help="AOI footprint (GeoPackage/GeoJSON)")
parser.add_argument("--service-url", default=DEFAULT_SERVICE_URL, help="ArcGIS FeatureServer query URL")
parser.add_argument("--download-dir", type=Path, default=DEFAULT_DOWNLOAD_DIR, help="Directory for downloaded rasters")
parser.add_argument("--clip-dir", type=Path, default=DEFAULT_CLIP_DIR, help="Directory for AOI-clipped rasters")
parser.add_argument("--tile-field", default="tile_id", help="Field name containing tile identifier")
parser.add_argument(
"--url-field",
default="Mg_px_1_download",
help="Field containing download URLs for each tile raster",
)
parser.add_argument("--overwrite", action="store_true", help="Overwrite existing files")
parser.add_argument("--http-timeout", type=int, default=600, help="HTTP timeout (seconds)")
return parser.parse_args()


def download_raster(url: str, output_path: Path, timeout: int, overwrite: bool) -> bool:
if output_path.exists() and not overwrite:
LOGGER.info("Skipping existing %s", output_path)
return False

response = requests.get(url, stream=True, timeout=timeout)
response.raise_for_status()
output_path.parent.mkdir(parents=True, exist_ok=True)
with open(output_path, "wb") as f:
for chunk in response.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
LOGGER.info("Downloaded %s", output_path)
return True


def clip_raster_to_aoi(raster_path: Path, clip_path: Path, geometry) -> Path:
clip_path.parent.mkdir(parents=True, exist_ok=True)
with rasterio.open(raster_path) as src:
out_image, out_transform = mask(src, [geometry], crop=True)
out_meta = src.meta.copy()
out_meta.update(
{
"height": out_image.shape[1],
"width": out_image.shape[2],
"transform": out_transform,
}
)
with rasterio.open(clip_path, "w", **out_meta) as dest:
dest.write(out_image)
LOGGER.info("Clipped %s", clip_path)
return clip_path


def main() -> None:
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
args = parse_args()

aoi = gpd.read_file(args.aoi_path)
if aoi.empty:
raise ValueError(f"AOI at {args.aoi_path} is empty")
    # Dissolve all AOI features into a single WGS84 geometry.
    aoi_geom = aoi.to_crs(epsg=4326).geometry.unary_union

tiles = gpd.read_file(args.service_url)
tiles = tiles.to_crs(epsg=4326)
intersects = tiles[tiles.intersects(aoi_geom)]
LOGGER.info("Found %d biomass tiles intersecting AOI", len(intersects))

for _, row in intersects.iterrows():
tile_id = row.get(args.tile_field)
url = row.get(args.url_field)
if not url:
LOGGER.warning("Row %s missing URL field %s", tile_id, args.url_field)
continue

download_path = args.download_dir / f"{tile_id}.tif"
downloaded = download_raster(url, download_path, args.http_timeout, args.overwrite)

        clip_path = args.clip_dir / f"{tile_id}.tif"
        # Clip when the tile was just downloaded, overwrite is set, or the
        # clipped copy is missing (so skipped downloads still get clipped).
        if downloaded or args.overwrite or not clip_path.exists():
            clip_raster_to_aoi(download_path, clip_path, aoi_geom)


if __name__ == "__main__":
main()
149 changes: 149 additions & 0 deletions scripts/download_s2_monthly.py
@@ -0,0 +1,149 @@
"""Download monthly Sentinel-2 composites per tile using Earth Engine."""
from __future__ import annotations

import argparse
import io
import logging
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Iterable, Tuple

import geopandas as gpd
import numpy as np
import requests

ROOT = Path(__file__).resolve().parents[1]
if str(ROOT) not in sys.path:
sys.path.insert(0, str(ROOT))

import ee # type: ignore

from utils.lamwo_pipeline import (
ee_bytes_to_img,
geom_to_ee,
init_ee,
reshape_monthly_stack,
)

LOGGER = logging.getLogger(__name__)

DEFAULT_PROJECT_ID = "your-ee-project-id"
DEFAULT_GRID_PATH = Path("PATH/TO/input/lamwo_grid.geojson")
DEFAULT_OUTPUT_DIR = Path("PATH/TO/output/s2_tiles")
S2_DATASET_ID = "COPERNICUS/S2_SR_HARMONIZED"
DEFAULT_BANDS = ["B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A", "B11", "B12"]


def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--project-id", default=DEFAULT_PROJECT_ID, help="Earth Engine project ID (update placeholder)")
parser.add_argument("--grid-path", type=Path, default=DEFAULT_GRID_PATH, help="Path to the tile grid GeoJSON")
parser.add_argument("--output-dir", type=Path, default=DEFAULT_OUTPUT_DIR, help="Directory for tile numpy stacks")
parser.add_argument("--year", type=int, default=2024, help="Calendar year to composite")
parser.add_argument("--dataset-id", default=S2_DATASET_ID, help="Earth Engine dataset to sample (default: Sentinel-2 SR)")
parser.add_argument("--bands", nargs="+", default=DEFAULT_BANDS, help="Band list to include per month")
parser.add_argument("--max-cloud", type=float, default=20.0, help="Cloud percentage threshold")
parser.add_argument("--scale", type=float, default=10.0, help="Pixel scale in metres for downloads")
parser.add_argument("--format", default="NPY", choices=["NPY"], help="Download format (NPY only)")
parser.add_argument("--max-workers", type=int, default=1, help="Concurrent download workers")
parser.add_argument("--http-timeout", type=int, default=600, help="HTTP timeout per tile download (seconds)")
return parser.parse_args()


def monthly_stack(region: ee.Geometry, *, year: int, dataset_id: str, bands: Iterable[str], max_cloud: float) -> ee.Image:
collection = ee.ImageCollection(dataset_id)
composites = []
for month in range(1, 13):
start = ee.Date.fromYMD(year, month, 1)
end = start.advance(1, "month")
filtered = (
collection
.filterBounds(region)
.filterDate(start, end)
.filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", max_cloud))
.select(list(bands))
)
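        # Median composite for the month; if no scenes pass the date/cloud filters,
        # the composite has no bands and the download for this tile/year will fail.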
image = filtered.median()
month_str = ee.Number(month).format("%02d")
renamed = image.rename(image.bandNames().map(lambda b: ee.String(b).cat("_m").cat(month_str)))
composites.append(renamed)
return ee.Image.cat(composites)


def resolve_tile_label(row, fallback_index: int) -> str:
for candidate in ("tile_id", "tileID", "id"):
value = row.get(candidate)
if value not in (None, ""):
return str(value)
return f"{fallback_index:04d}"


def process_tile(
idx: int,
row,
*,
args: argparse.Namespace,
band_names: Iterable[str],
) -> Tuple[str, Path]:
tile_label = resolve_tile_label(row, idx)
geom = row.geometry
region = geom_to_ee(geom)
image = monthly_stack(
region,
year=args.year,
dataset_id=args.dataset_id,
bands=args.bands,
max_cloud=args.max_cloud,
)
url = image.getDownloadURL(
{
"bands": list(band_names),
"region": region,
"scale": args.scale,
"format": args.format,
}
)
response = requests.get(url, timeout=args.http_timeout)
response.raise_for_status()
payload = io.BytesIO(response.content)
pixel_data = np.load(payload, allow_pickle=True)
array = ee_bytes_to_img(pixel_data)
cube = reshape_monthly_stack(array, len(args.bands))

output_dir = args.output_dir
output_dir.mkdir(parents=True, exist_ok=True)
output_path = output_dir / f"tile_{tile_label}_{args.year}.npy"
np.save(output_path, cube)
LOGGER.info("Saved %s", output_path)
return tile_label, output_path


def main() -> None:
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
args = parse_args()

init_ee(args.project_id)

tiles = gpd.read_file(args.grid_path).to_crs(epsg=4326).reset_index(drop=True)
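    # Month-major band names (all bands for m01, then m02, ...), matching the
    # order in which monthly_stack concatenates the monthly composites.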
band_names = [f"{band}_m{month:02d}" for month in range(1, 13) for band in args.bands]

if args.max_workers > 1:
with ThreadPoolExecutor(max_workers=args.max_workers) as executor:
futures = [
executor.submit(process_tile, idx, row, args=args, band_names=band_names)
for idx, row in tiles.iterrows()
]
for future in as_completed(futures):
try:
future.result()
except Exception as exc: # pragma: no cover - surfacing errors
LOGGER.error("Tile download failed: %s", exc)
raise
else:
for idx, row in tiles.iterrows():
process_tile(idx, row, args=args, band_names=band_names)


if __name__ == "__main__":
main()