- Purpose: End-to-end pipeline orchestrating data generation, model fitting, and analysis for any supported census geometry type (`ct`, `cbg`, `cb`).
- Steps:
  - Generate adjacency network (via `GeometryWeightsGenerator`)
  - Generate topology statistics (via `pp_topology.py`)
  - Generate flooding dataset (via `generate_flooding_dataset.py`)
  - Add external covariates (via `add_covariates_to_flooding_dataset.py`)
  - Fit ICAR model (via `icar_model.py`)
  - Copy context dataframe to run directory
- CLI:
  - `python pipeline.py --geometry-type {ct,cbg,cb} --prefix STR [options]`
  - Supports: `--external-covariates`, `--skip-data-generation`, `--data-only`, `--force-regenerate`, `--downsample-frac`, `--downsample-all-images`, `--trim-to-median`, `--compare-to-baselines`
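Below is a minimal `argparse` sketch of the CLI above. It includes a hypothetical `planned_steps` helper that orders the six steps. The flag names come from this section. Everything else is a guess rather than the script's confirmed behavior: the argument types (on/off switches, a float for `--downsample-frac`) and how `--skip-data-generation`, `--data-only`, and `--external-covariates` affect which steps run.

```python
import argparse

# Hypothetical step identifiers mirroring the six pipeline steps.
DATA_STEPS = [
    "adjacency_network",
    "topology_statistics",
    "flooding_dataset",
    "external_covariates",
]
MODEL_STEPS = ["fit_icar_model", "copy_context_dataframe"]


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the documented CLI; argument types are assumptions."""
    parser = argparse.ArgumentParser(prog="pipeline.py")
    parser.add_argument("--geometry-type", choices=["ct", "cbg", "cb"], required=True)
    parser.add_argument("--prefix", required=True)
    # Boolean-looking options are assumed to be simple on/off switches.
    for flag in [
        "--external-covariates",
        "--skip-data-generation",
        "--data-only",
        "--force-regenerate",
        "--downsample-all-images",
        "--trim-to-median",
        "--compare-to-baselines",
    ]:
        parser.add_argument(flag, action="store_true")
    parser.add_argument("--downsample-frac", type=float, default=None)
    return parser


def planned_steps(args: argparse.Namespace) -> list:
    """Assumed semantics: --skip-data-generation drops the data steps,
    --data-only drops the modeling steps, and the covariate step runs
    only when --external-covariates is set."""
    steps = []
    if not args.skip_data_generation:
        steps += [
            s for s in DATA_STEPS
            if s != "external_covariates" or args.external_covariates
        ]
    if not args.data_only:
        steps += MODEL_STEPS
    return steps
```

Under these assumptions, `--data-only` without `--external-covariates` plans only the first three steps.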
- Purpose: Centralized configuration for multi-geometry support (Census Tracts, Block Groups, Blocks).
- Key types:
  - `GeometryType` enum: `CT`, `CBG`, `CB`
  - `GeometryConfig` dataclass: display name, ID column, file prefix, default adjacency buffer
  - `GeometryPaths` class: path factory for geometry-specific file paths (GeoJSON, adjacency, datasets, topology, runs)
- Factory: `get_geometry_paths(geometry_type, base_dir)` → `GeometryPaths`
- The default geometry type is controlled by the `BAYFLOOD_GEOMETRY_TYPE` environment variable (default: `ct`).
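Here is a sketch of how these pieces could fit together, assuming `GeometryPaths` derives every file path from the geometry code. The following come from the description above:
- the enum members
- the dataclass fields
- the factory signature
- the `BAYFLOOD_GEOMETRY_TYPE` variable

The rest is hypothetical: the directory layout, the file names, the `default_geometry_type` helper, and the fact that `get_geometry_paths` also accepts a plain string.

```python
import os
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Union


class GeometryType(Enum):
    CT = "ct"    # Census Tracts
    CBG = "cbg"  # Census Block Groups
    CB = "cb"    # Census Blocks


@dataclass(frozen=True)
class GeometryConfig:
    display_name: str
    id_column: str
    file_prefix: str
    default_adjacency_buffer: float


def default_geometry_type() -> GeometryType:
    """Read BAYFLOOD_GEOMETRY_TYPE (default 'ct'); unknown codes raise ValueError."""
    return GeometryType(os.environ.get("BAYFLOOD_GEOMETRY_TYPE", "ct").lower())


class GeometryPaths:
    """Hypothetical layout: <base_dir>/data/<code>/<code>_<kind>.<ext>."""

    def __init__(self, geometry_type: GeometryType, base_dir: Union[str, Path]):
        self.geometry_type = geometry_type
        self.root = Path(base_dir) / "data" / geometry_type.value

    def _file(self, kind: str, ext: str) -> Path:
        return self.root / f"{self.geometry_type.value}_{kind}.{ext}"

    @property
    def geojson(self) -> Path:
        return self._file("boundaries", "geojson")

    @property
    def adjacency_npy(self) -> Path:
        return self._file("adjacency", "npy")

    @property
    def dataset_csv(self) -> Path:
        return self._file("flooding_dataset", "csv")


def get_geometry_paths(geometry_type: Union[GeometryType, str],
                       base_dir: Union[str, Path]) -> GeometryPaths:
    # Accepting a plain string such as "cbg" is an assumption of this sketch.
    if not isinstance(geometry_type, GeometryType):
        geometry_type = GeometryType(str(geometry_type).lower())
    return GeometryPaths(geometry_type, base_dir)
```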
- Purpose: Train ICAR/CAR-based Bayesian models in Stan via the Python `pystan` backend; manage runs and outputs.
- Main class: `ICAR_MODEL`
- Key init args:
  - `PREFIX`: run prefix used in `runs/<...>`
  - `ICAR_PRIOR_SETTING`: one of `"none"`, `"icar"`, `"proper"`, `"just_model_p_y"`
  - `ANNOTATIONS_HAVE_LOCATIONS`: bool; enables the annotation-location model and the external covariates pathway
  - `EXTERNAL_COVARIATES`: bool; when true, builds the `external_covariates` matrix inside `util.read_real_data`
  - `SIMULATED_DATA`: bool; uses simulated data from `util.generate_simulated_data`
  - `ESTIMATE_PARAMS`: subset of `["p_y", "at_least_one_positive_image_by_area", "at_least_one_positive_image_by_area_if_you_have_100_images"]`
  - `EMPIRICAL_DATA_PATH`: path to the processed dataset CSV
  - `adj`: adjacency input paths (edge lists or `.npy`)
  - `adj_matrix_storage`: True if an `.npy` adjacency path is provided
  - `downsample_frac`: float; downsampling fraction for annotated images
  - `GEOMETRY_TYPE`: geometry type string (`ct`, `cbg`, `cb`)
- Key methods:
  - `load_data()`: loads empirical or simulated data, validates inputs, and constructs `observed_data`
  - `fit(CYCLES, WARMUP, SAMPLES, data_already_loaded)`: builds the Stan model for the chosen setting, samples, and returns `(fit, df)`
  - `plot_results`, `plot_histogram`, `plot_scatter`: diagnostics and plots
  - `write_estimate`: writes `estimate_<param>.csv` with CIs
  - `compare_to_baselines`: train/test split baselines and comparisons
- CLI:
  - `python icar_model.py <icar_prior_setting> [--annotations_have_locations] [--simulated_data] [--external_covariates] [--no_catch_basins] [--prefix STR] [--downsample_frac FLOAT] [--downsample_all_images] [--trim_to_median] [--trim_remove_frac FLOAT] [--empirical_data_path PATH] [--adj_node1_path PATH] [--adj_node2_path PATH] [--adj_npy_path PATH] [--geometry_type {ct,cbg,cb}] [--compare_to_baselines]`
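The `"icar"` and `"proper"` settings name the intrinsic and proper CAR priors. Below is a minimal NumPy sketch of their textbook forms, assuming 0-based node-pair edge lists like those passed via `--adj_node1_path`/`--adj_node2_path` (Stan programs usually index from 1). The section does not say whether the project's Stan code uses exactly this parameterization, for example unit-precision pairwise differences. All three helper names are hypothetical.

```python
import numpy as np


def adjacency_from_edges(n, node1, node2):
    """Symmetric 0/1 adjacency matrix W built from 0-based edge lists."""
    W = np.zeros((n, n))
    W[node1, node2] = 1.0
    W[node2, node1] = 1.0
    return W


def icar_log_density(phi, node1, node2):
    """Unnormalized ICAR log density with unit precision:
    -0.5 * sum over edges (phi[i] - phi[j])**2.
    It is unchanged by adding a constant to phi, which is why ICAR models
    need a (soft or hard) sum-to-zero constraint for identifiability."""
    phi = np.asarray(phi, dtype=float)
    diffs = phi[np.asarray(node1)] - phi[np.asarray(node2)]
    return -0.5 * float(diffs @ diffs)


def proper_car_precision(W, alpha, tau):
    """Proper CAR precision tau * (D - alpha * W), D = diagonal degree matrix.
    For tau > 0, 0 <= alpha < 1, and every node having at least one
    neighbor, it is strictly diagonally dominant, hence positive definite.
    alpha = 1 gives the singular ICAR precision (scaled by tau)."""
    D = np.diag(W.sum(axis=1))
    return tau * (D - alpha * W)
```

The identity phiᵀ(D − W)phi = Σ over edges (phi_i − phi_j)² links the two forms. The edge-list version avoids building a dense N × N matrix, which matters at the census-block level.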
- Purpose: Data I/O, adjacency handling, covariate engineering, simulation, and validation.
- Key functions:
  - `read_real_data(fpath, annotations_have_locations, adj, adj_matrix_storage, use_external_covariates)` → `(observed_data, external_covariates_info)`
  - `validate_observed_data(observed_data, annotations_have_locations, downsample_frac)`
  - `generate_simulated_data(N, images_per_location, total_annotated_classified_negative, total_annotated_classified_positive, icar_prior_setting, annotations_have_locations)`
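`read_real_data` accepts adjacency either as node-pair edge lists or as an `.npy` matrix (`adj_matrix_storage=True`). The sketch below normalizes both inputs to one edge per undirected pair. It assumes the matrix is a square, symmetric 0/1 array saved with `np.save`, and that each edge-list file holds one integer node index per line. `load_adjacency`, `edges_from_matrix`, and `edges_from_lists` are hypothetical names, not the module's API.

```python
import numpy as np


def edges_from_matrix(W):
    """(node1, node2) with node1 < node2 from a symmetric adjacency matrix."""
    W = np.asarray(W)
    if W.ndim != 2 or W.shape[0] != W.shape[1] or not np.array_equal(W, W.T):
        raise ValueError("adjacency matrix must be square and symmetric")
    node1, node2 = np.nonzero(np.triu(W, k=1))  # upper triangle: each pair once
    return node1, node2


def edges_from_lists(node1, node2):
    """Orient each pair as (min, max) and drop duplicate or reversed edges."""
    node1 = np.asarray(node1, dtype=int)
    node2 = np.asarray(node2, dtype=int)
    if node1.shape != node2.shape:
        raise ValueError("edge lists must have equal length")
    lo, hi = np.minimum(node1, node2), np.maximum(node1, node2)
    edges = np.unique(np.stack([lo, hi], axis=1), axis=0)  # sorted unique rows
    return edges[:, 0], edges[:, 1]


def load_adjacency(adj, adj_matrix_storage):
    """adj is one .npy path (matrix storage) or a (node1_path, node2_path) pair."""
    if adj_matrix_storage:
        return edges_from_matrix(np.load(adj))
    node1_path, node2_path = adj
    return edges_from_lists(
        np.loadtxt(node1_path, dtype=int, ndmin=1),
        np.loadtxt(node2_path, dtype=int, ndmin=1),
    )
```

Deduplicating here keeps a symmetric edge list from double-counting each pair in a pairwise-difference ICAR term.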
- Purpose: Merge ICAR run estimates with geometry boundaries, ACS features, topology summaries, FloodNet sensors, DEP stormwater coverage, and 311 counts to produce analysis CSVs.
- Main function: `generate_nyc_analysis_df(run_dir, custom_prefix, use_smoothing, base_dir='.', logger=None)` → `pd.DataFrame`
- Inputs: expects estimate CSVs in `run_dir` and the data listed in `docs/DATA_DEPENDENCIES.md`.
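The estimate-merging part of this could look like the pandas sketch below. It loads every `estimate_<param>.csv` from a run directory and left-joins each onto a table of geometry IDs. The following are all hypothetical: the ID column name, the column-prefixing scheme, and both helper names. `generate_nyc_analysis_df` itself also merges the other sources listed above.

```python
from pathlib import Path

import pandas as pd


def load_estimates(run_dir, id_col):
    """Map parameter name -> DataFrame for each estimate_<param>.csv in run_dir.
    IDs are read as strings so GEOID-style codes keep any leading zeros."""
    return {
        path.stem.removeprefix("estimate_"): pd.read_csv(path, dtype={id_col: str})
        for path in sorted(Path(run_dir).glob("estimate_*.csv"))
    }


def merge_estimates(base, estimates, id_col):
    """Left-join each estimate table onto `base`, prefixing its columns with
    the parameter name; validate="one_to_one" fails on duplicated IDs."""
    out = base.copy()
    for param, df in estimates.items():
        renamed = df.rename(columns={c: f"{param}_{c}" for c in df.columns if c != id_col})
        out = out.merge(renamed, on=id_col, how="left", validate="one_to_one")
    return out
```

A left join keeps every geometry unit even when a run produced no estimate for it, so missing values surface as NaN instead of silently dropped rows.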
- Purpose: Visualize geometry-level estimates with overlays of positives, ground truth, FloodNet sensors, 311, and DEP polygons.
- Main function: `generate_maps(run_id, estimate_path, estimate='p_y' | 'at_least_one_positive_image_by_area')`
- Purpose: Generate the flooding dataset (image counts and annotations per geometry unit) from raw inference outputs.
- Parameterized by geometry type via `--geometry-type`.
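Here is a pandas sketch of the per-unit counting step. It assumes image-level rows carry a geometry ID plus boolean `classified_positive` and `annotated` columns; these column names and `count_by_unit` are hypothetical. When the full ID list is supplied, units with no images are kept with zero counts.

```python
import pandas as pd


def count_by_unit(images, id_col, all_ids=None):
    """Aggregate image-level rows into per-unit counts.

    Hypothetical input columns: id_col, 'classified_positive' (bool), 'annotated' (bool)."""
    positive = images["classified_positive"].astype(bool)
    annotated = images["annotated"].astype(bool)
    counts = (
        images.assign(
            is_positive=positive,
            annotated_positive=annotated & positive,
            annotated_negative=annotated & ~positive,
        )
        .groupby(id_col)
        .agg(
            n_images=("is_positive", "size"),
            n_classified_positive=("is_positive", "sum"),
            n_annotated_classified_positive=("annotated_positive", "sum"),
            n_annotated_classified_negative=("annotated_negative", "sum"),
        )
    )
    if all_ids is not None:
        # Keep units with no images so every unit in the adjacency network has a row.
        counts = counts.reindex(pd.Index(all_ids, name=id_col), fill_value=0)
    return counts.reset_index()
```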
- Purpose: Add external covariates (topology, DEP stormwater, FloodNet, 311) to the flooding dataset.
- Parameterized by geometry type via `--geometry-type`.
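Below is a minimal sketch of joining a covariate table onto the flooding dataset by geometry ID. It uses pandas' `validate` and `indicator` options of `merge` to catch duplicated or missing IDs. The following are assumptions rather than documented behavior of `add_covariates_to_flooding_dataset.py`:
- the `add_covariates` name
- failing on unmatched units
- the optional z-scoring step

```python
import pandas as pd


def add_covariates(flooding, covariates, id_col, standardize=True):
    """Left-join covariate columns onto the flooding dataset (one row per unit on both sides)."""
    merged = flooding.merge(
        covariates, on=id_col, how="left", validate="one_to_one", indicator=True
    )
    missing = int((merged["_merge"] == "left_only").sum())
    if missing:
        # Assumed policy: fail loudly instead of silently imputing.
        raise ValueError(f"{missing} units have no covariate row")
    merged = merged.drop(columns="_merge")
    if standardize:
        # Z-score each covariate (population std); a constant column would become NaN.
        cols = [c for c in covariates.columns if c != id_col]
        merged[cols] = (merged[cols] - merged[cols].mean()) / merged[cols].std(ddof=0)
    return merged
```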
- Purpose: Parameterized aggregation of flooding data and covariates to different census geography levels.
- CLI: `python aggregate_by_geometry.py --geometry-type {ct,cbg,cb}`
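One way to aggregate across levels is to use the nesting of Census GEOIDs:
- A tract GEOID is 11 characters (state 2 + county 3 + tract 6).
- A block group GEOID appends 1 digit to the tract.
- A block GEOID appends a 4-digit block code to the tract.

The sketch below assumes block-level input keyed by 15-character GEOID strings and purely additive count columns. This section does not say whether `aggregate_by_geometry.py` works this way. `aggregate_counts` is a hypothetical name.

```python
import pandas as pd

# Standard Census GEOID prefix lengths for each geometry code.
GEOID_LENGTH = {"ct": 11, "cbg": 12, "cb": 15}


def aggregate_counts(block_df, geometry_type, id_col="GEOID", count_cols=None):
    """Roll block-level counts up to a coarser level by truncating GEOIDs.

    Only valid for additive counts; covariates such as elevation would need
    a different rule (e.g. an area-weighted mean)."""
    n = GEOID_LENGTH[geometry_type]
    ids = block_df[id_col].astype(str)
    if not ids.str.len().eq(15).all():
        raise ValueError("expected 15-character block GEOIDs")
    count_cols = count_cols or [c for c in block_df.columns if c != id_col]
    return (
        block_df.assign(**{id_col: ids.str[:n]})
        .groupby(id_col, as_index=False)[count_cols]
        .sum()
    )
```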
- Purpose: Generate and analyze spatial weights for census geographies.
- Key class: `GeometryWeightsGenerator`, which supports custom geometric buffer, queen/rook contiguity, and distance-band adjacency methods. Used by `pipeline.py` to generate adjacency networks.
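Queen contiguity links units that share an edge or a corner; rook contiguity requires a shared edge. The dependency-free sketch below works on a regular grid of square cells, standing in for real census polygons. With real polygons, a library such as libpysal, or a buffer-then-intersect test with shapely, is the usual route. `grid_contiguity` is a hypothetical helper. Its output uses the same node-pair edge-list shape as the ICAR adjacency inputs, 0-based.

```python
def grid_contiguity(n_rows, n_cols, method="queen"):
    """Edge lists (node1, node2) for a grid of square cells numbered row-major from 0.

    Rook: neighbors share an edge. Queen: neighbors share an edge or a corner.
    Only "forward" offsets are scanned, so each undirected edge appears once."""
    if method not in ("queen", "rook"):
        raise ValueError("method must be 'queen' or 'rook'")
    offsets = [(0, 1), (1, 0)]          # right, down
    if method == "queen":
        offsets += [(1, 1), (1, -1)]    # down-right, down-left diagonals
    node1, node2 = [], []
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    node1.append(r * n_cols + c)
                    node2.append(rr * n_cols + cc)
    return node1, node2
```

The choice matters in practice. Queen contiguity yields a denser graph than rook, so the spatial prior smooths more aggressively across corner-touching units.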
- Purpose: Colored logging with a custom `SUCCESS` level; `setup_logger(name)` standardizes console logs.
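Here is a minimal standard-library sketch of a custom `SUCCESS` level with colored console output. Only `setup_logger(name)` and the `SUCCESS` level name come from the description. The following are assumptions:
- the level number 25 (between `INFO` and `WARNING`)
- the ANSI colors
- the format string
- the optional `stream` parameter

```python
import logging
import sys

SUCCESS = 25  # assumed: sits between INFO (20) and WARNING (30)
logging.addLevelName(SUCCESS, "SUCCESS")

COLORS = {
    "DEBUG": "\033[36m",      # cyan
    "INFO": "\033[37m",       # white
    "SUCCESS": "\033[32m",    # green
    "WARNING": "\033[33m",    # yellow
    "ERROR": "\033[31m",      # red
    "CRITICAL": "\033[1;31m", # bold red
}
RESET = "\033[0m"


class ColorFormatter(logging.Formatter):
    """Wrap the formatted record in an ANSI color chosen by level name."""

    def format(self, record):
        message = super().format(record)
        color = COLORS.get(record.levelname, "")
        return f"{color}{message}{RESET}" if color else message


def setup_logger(name, stream=None):
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(stream or sys.stderr)
        handler.setFormatter(ColorFormatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    return logger
```

Usage: `setup_logger(__name__).log(SUCCESS, "ICAR fit finished")`.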
- Purpose: Clear the local Stan cache directory for a clean rebuild; `refresh_cache(base_dir=None)`.
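Below is a minimal sketch of such a cache reset using `shutil.rmtree`. The cache directory name is a placeholder, since the real name is not documented here. Returning a boolean and falling back to the current working directory when `base_dir` is `None` are also assumptions.

```python
import shutil
from pathlib import Path

CACHE_DIR_NAME = ".stan_cache"  # hypothetical placeholder name


def refresh_cache(base_dir=None):
    """Delete the local Stan cache so models recompile on next use.

    Returns True if a cache directory was removed, False if none existed."""
    cache = Path(base_dir or Path.cwd()) / CACHE_DIR_NAME
    if cache.is_dir():
        shutil.rmtree(cache)
        return True
    return False
```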
- Purpose: Centralize defaults and environment overrides for paths and sampling parameters.
- Exposed: `DATASET_PATH`, `ADJ_NODE1_PATH`, `ADJ_NODE2_PATH`, `ADJ_NPY_PATH`, `EXTERNAL_COVARIATES`, `DEFAULT_WARMUP`, `DEFAULT_SAMPLES`
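Below is a sketch of typed environment overrides for these defaults. It assumes each environment variable shares its constant's name, and the default values are placeholders, not the project's settings. The path constants would follow the same pattern with `os.environ.get(name) or default`.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}


def env_int(name, default):
    """Integer override; an unset or blank variable falls back to the default."""
    raw = os.environ.get(name, "").strip()
    return int(raw) if raw else default


def env_bool(name, default):
    """Boolean override: '1', 'true', 'yes', 'on' (any case) mean True."""
    raw = os.environ.get(name, "").strip().lower()
    return raw in TRUTHY if raw else default


# Hypothetical: env var names mirror the constants; defaults are placeholders.
EXTERNAL_COVARIATES = env_bool("EXTERNAL_COVARIATES", False)
DEFAULT_WARMUP = env_int("DEFAULT_WARMUP", 1000)
DEFAULT_SAMPLES = env_int("DEFAULT_SAMPLES", 1000)
```

Because the constants are evaluated at import time, changing the environment afterward has no effect until the module is reloaded.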