Conversation
hrodmn commented on 2026-03-19T01:38:46Z: Should we target the Hub for this instead of the ADE?
hrodmn commented on 2026-03-19T01:38:47Z: Great use of the
I don't love that we have to download these entire granule files to work on them in R. You could add something like "if your workflow is taking too long due to the download process, consider using the Python workflow" and then link to the NISAR Python notebook.
hrodmn commented on 2026-03-19T01:38:47Z: This is a really nice snippet, but we should update it to clip out a specific area of interest (in projected coordinates) rather than grid-cell indexes.
hrodmn left a comment
@HarshiniGirish really nice job on this one. It is succinct and to the point, and the formatting is so clean 🫶 .
I have a few change requests:
- Replace the local-download method with some kind of cloud-native data-access path. I know support for reading from S3 in R is not great, but I think there is a solution out there.
- Change the subset operation at the end to use projected coordinates instead of grid-cell indexes. This might be a bit of work, but it is what users will want to be able to do.
For the cloud-optimized read solution there are a few possibilities:
Use rhdf5 instead of hdf5r: see https://huber-group-embl.github.io/rhdf5/articles/rhdf5_cloud_reading.html
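A hedged sketch of what the rhdf5 route might look like (the https URL, the internal dataset path, and the `s3credentials` field names below are assumptions for illustration; check the vignette linked above before relying on them):

```r
## Sketch only: read a single dataset from an HDF5 file on S3 via the
## ROS3 driver, without downloading the whole granule.
## ASSUMPTIONS: the URL, the internal dataset path, and the credential
## list structure are illustrative, not verified against the bucket.
library(rhdf5)

s3_url <- "https://sds-n-cumulus-prod-nisar-products.s3.us-west-2.amazonaws.com/NISAR_L2_GCOV_BETA_V1/.../granule.h5"

creds <- list(
  aws_region = "us-west-2",
  access_key_id = "...",
  secret_access_key = "..."
)

## Inspect the file structure over the network
h5ls(s3_url, s3 = TRUE, s3credentials = creds)

## Read just one dataset instead of the whole file
hh <- h5read(s3_url, "/science/LSAR/GCOV/grids/frequencyA/HHHH",
             s3 = TRUE, s3credentials = creds)
```

One open question with this route is whether the ROS3 driver handles the temporary session tokens that NASA's S3 credentials include.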
Maybe we could use GDAL drivers via the terra package to load the file lazily (without downloading the entire file), but I am not really sure how well terra handles the complex HDF5 data structure.
I tried this:

library(terra)

vsis3_path <- "/vsis3/sds-n-cumulus-prod-nisar-products/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_002_109_D_063_4005_DHDH_A_20251012T182508_20251012T182531_X05010_N_P_J_001/NISAR_L2_PR_GCOV_002_109_D_063_4005_DHDH_A_20251012T182508_20251012T182531_X05010_N_P_J_001.h5"

# got credentials from a Python session
setGDALconfig("AWS_SECRET_ACCESS_KEY", "...")
setGDALconfig("AWS_ACCESS_KEY_ID", "...")
setGDALconfig("AWS_SESSION_TOKEN", "...")
setGDALconfig("AWS_REGION", "us-west-2")

# Enable the virtual file system cache so the same blocks are not
# re-downloaded during analysis
setGDALconfig("VSI_CACHE", "TRUE")
# Set the size of that cache (here, 500 MB)
setGDALconfig("VSI_CACHE_SIZE", "500000000")

# Increase the global block cache (the default is usually too small);
# this can be a percentage of RAM or a specific byte value
setGDALconfig("GDAL_CACHEMAX", "20%")

cube <- sds(vsis3_path)
cube
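If that lazy load works, the projected-coordinate subset from my second change request could then look something like this (the sub-dataset index and the UTM extent values are made up for illustration):

```r
## Sketch only: crop a lazily loaded layer to an area of interest given
## in the raster's projected CRS. Extent values are hypothetical.
library(terra)

hh <- cube[1]  # pick one sub-dataset from the SpatRasterDataset
aoi <- ext(400000, 420000, 3980000, 4000000)  # xmin, xmax, ymin, ymax
hh_clip <- crop(hh, aoi)  # GDAL reads only the intersecting blocks
plot(hh_clip)
```

Because the crop happens against the /vsis3/ path, only the blocks intersecting the extent should be fetched, which is exactly the cloud-native behavior we want here.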