feat: major refactor #242
Closed
martjanz wants to merge 53 commits into
Add unit tests for DuckDB Arrow-registration memory safety, verifying that every `register` call is balanced by an `unregister` and that `save_legs` uses zero Arrow registrations via Parquet staging.
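A minimal sketch of how the register/unregister balance can be enforced and then checked, assuming a connection object that exposes DuckDB's `register(view_name, python_object)` and `unregister(view_name)` methods. `registered_view`, `CountingConnection`, and `assert_balanced` are hypothetical names, not the tests added in this commit; the counting double stands in for a real DuckDB connection so the check runs without DuckDB installed.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def registered_view(con: Any, name: str, obj: Any) -> Iterator[str]:
    """Register `obj` under `name` and always unregister it afterwards (hypothetical helper)."""
    con.register(name, obj)
    try:
        yield name
    finally:
        # Runs on normal exit and on exceptions, so each register is balanced.
        con.unregister(name)


class CountingConnection:
    """Test double recording DuckDB-style register/unregister calls (hypothetical)."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []
        self.active: set[str] = set()

    def register(self, name: str, obj: Any) -> None:
        self.calls.append(("register", name))
        self.active.add(name)

    def unregister(self, name: str) -> None:
        self.calls.append(("unregister", name))
        self.active.discard(name)


def assert_balanced(con: CountingConnection) -> None:
    """Fail if any registered name was not unregistered."""
    registered = sorted(n for op, n in con.calls if op == "register")
    unregistered = sorted(n for op, n in con.calls if op == "unregister")
    assert registered == unregistered, con.calls
    assert not con.active, con.active


# Example: the view is unregistered even when the body raises.
demo = CountingConnection()
try:
    with registered_view(demo, "legs_tmp", object()):
        raise RuntimeError("simulated failure mid-query")
except RuntimeError:
    pass
assert_balanced(demo)
```

A test for the Parquet-staging path of `save_legs` could reuse the same double and assert that no `register` call was recorded at all.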
Replace stale `h3_o` column references with `h3` in geo equivalence and desire-line helpers. Compute polygon centroid from geometry instead of relying on dropped `polygon_lat/lon` mean columns.
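To see why computing the centroid from the geometry itself is more robust than averaging vertex coordinates, compare the plain vertex mean with the area-weighted centroid given by the shoelace formula. This is a plain-Python illustration on planar (x, y) coordinates, assuming a simple, non-self-intersecting polygon; the codebase presumably takes the centroid from its geometry library instead, and the function names here are hypothetical.

```python
def vertex_mean(ring: list[tuple[float, float]]) -> tuple[float, float]:
    """Plain average of the vertex coordinates."""
    xs, ys = zip(*ring)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def area_centroid(ring: list[tuple[float, float]]) -> tuple[float, float]:
    """Area-weighted centroid via the shoelace formula.

    `ring` lists the vertices of a simple polygon without repeating the first one.
    """
    a2 = cx = cy = 0.0  # a2 accumulates twice the signed area
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        a2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if a2 == 0:
        raise ValueError("degenerate polygon: zero area")
    # Centroid = sum / (6 * area) = sum / (3 * a2)
    return cx / (3 * a2), cy / (3 * a2)


# A 4x4 square with three extra vertices along its bottom edge.
square = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 4), (0, 4)]
print(vertex_mean(square))    # x = 2.0, y = 8/7 ≈ 1.143: pulled toward the bottom edge
print(area_centroid(square))  # (2.0, 2.0): the centre of the square
```

Extra vertices along one edge pull the vertex mean toward that edge, while the area centroid depends only on the enclosed shape.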
Resolves 18 files with conflict markers. Strategy: keep HEAD's StorageContext/DuckDB architecture throughout and integrate VV's functional improvements:
- Rename `assign_gps_destination` → `assign_time_distances`; it now computes `distance_od`, `distance_route`, and `distance_route_gps` and saves them to the `travel_times_legs`/`travel_times_trips` tables
- `preparo_dashboard`: query etapas/viajes with a LEFT JOIN to `travel_times_*` to pull pre-computed distance and speed columns
- `kpi.py`: `read_data_for_daily_kpi` joins `travel_times_legs`; add inf cleanup and rounding to `compute_kpi_by_line_day`; add `pd.set_option` for pandas
- `misc.py`: `persist_indicators` queries use `distance_od` from `travel_times_trips`
- `run_process.py`: `procesar_transacciones` uses `assign_time_distances`, moves `rearrange_trip_id_same_od` before it, and removes redundant `add_distance` calls
- `routes.py`: improved assert error message from VV
- `kpi_lineas.py`: column renames (`distance_km` → `distance_route`, `distancia` → `distance_od`, `travel_speed` → `kmh_od`) and `vehiculos_operativos` logic
- configs: `alias_db_insumos` key; new GPS column names preserved from VV
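The LEFT JOIN described for `preparo_dashboard` keeps every leg even when no pre-computed travel-time row exists, leaving the distance columns NULL instead of dropping the leg. A minimal sketch using Python's built-in `sqlite3` in place of DuckDB (the SQL is standard, so the query shape should carry over); the join key `(dia, id_tarjeta, id_etapa)` and both table layouts are assumptions for illustration, not the project's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE etapas (dia TEXT, id_tarjeta TEXT, id_etapa INTEGER);
    CREATE TABLE travel_times_legs (
        dia TEXT, id_tarjeta TEXT, id_etapa INTEGER,
        distance_od REAL, distance_route REAL, distance_route_gps REAL
    );
    INSERT INTO etapas VALUES ('2024-01-01', 'A', 1), ('2024-01-01', 'A', 2);
    -- Only leg 1 has pre-computed distances; leg 2 has no travel-time row.
    INSERT INTO travel_times_legs VALUES ('2024-01-01', 'A', 1, 3.2, 4.1, NULL);
    """
)

# LEFT JOIN keeps every leg; missing travel-time values come back as NULL (None).
rows = con.execute(
    """
    SELECT e.dia, e.id_tarjeta, e.id_etapa,
           t.distance_od, t.distance_route, t.distance_route_gps
    FROM etapas AS e
    LEFT JOIN travel_times_legs AS t
      ON  t.dia = e.dia
      AND t.id_tarjeta = e.id_tarjeta
      AND t.id_etapa = e.id_etapa
    ORDER BY e.id_etapa
    """
).fetchall()
```

An INNER JOIN with the same condition would silently drop leg 2, which is why the outer join suits a dashboard that should still count legs lacking distance data.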
…run_basic_kpi output
…r first-run tables
… fo_mean) to VV names throughout dashboard and schema
…n resumen_x_linea
…configs without alias_db_insumos
…s, not autogenerated config
…s inline from corrida name
…g to GPS schema DDL
…ce on autogenerated config file
…eh_hour; guard NULL with pd.to_numeric
… h3_o/h3_d not h3)
… aliases centrally
- Add `--config` CLI flag support in `dashboard.py`; it propagates the path via the `URBANTRIPS_CONFIG` env var so all modules pick it up
- Extract `_get_base_config_path()` and `get_project_root()` to honour `URBANTRIPS_CONFIG` when locating configs and DB files
- Add `resolve_db_aliases()` to centralise alias_db priority logic (`alias_db_insumos` > `alias_db`, per-corrida overrides)
- Simplify `leer_alias()` using the new helper; fix the multi-corrida path to bypass aliases only when no shared alias_db key exists
- Rename `traigo_db_path` → `get_db_path`; prepend project-root candidates so absolute paths resolve correctly
- Fall back to the config file's corridas list in `traer_dias_disponibles()` when the DB query fails or returns empty
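A sketch of how a `--config` flag can reach every module through an environment variable, assuming the config file lives at `<project root>/configs/configuraciones_generales.yaml` when no override is given. Only the `URBANTRIPS_CONFIG` name comes from the commit message; `apply_config_flag`, `get_base_config_path`, and the "two levels up" root rule are hypothetical stand-ins for the real `_get_base_config_path()` and `get_project_root()`.

```python
import argparse
import os
from pathlib import Path
from typing import Optional, Sequence

ENV_VAR = "URBANTRIPS_CONFIG"  # variable name taken from the commit message


def apply_config_flag(argv: Sequence[str]) -> Optional[str]:
    """Copy a --config CLI value into the environment (hypothetical helper)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--config", default=None)
    args, _unknown = parser.parse_known_args(list(argv))  # ignore other flags
    if args.config:
        os.environ[ENV_VAR] = args.config
    return args.config


def get_base_config_path(default_root: Path) -> Path:
    """Prefer the URBANTRIPS_CONFIG override; otherwise use the default location."""
    override = os.environ.get(ENV_VAR)
    if override:
        return Path(override).expanduser().resolve()
    # Assumed default layout: <root>/configs/configuraciones_generales.yaml
    return default_root / "configs" / "configuraciones_generales.yaml"


def get_project_root(default_root: Path) -> Path:
    # Assumption: the config file sits in <root>/configs/, so the root is two levels up.
    return get_base_config_path(default_root).parent.parent
```

Because the flag is written into `os.environ` before other modules read it, any code that calls `get_base_config_path()` later in the same process, or in a child process started afterwards, sees the same file without the path being passed around explicitly.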
Add id_etapa column to travel_times dataframe to avoid KeyError during time/distance assignment. Fix polygon centroid calculation to project to EPSG:3857 before computing centroid (ensuring metric accuracy), then reproject to EPSG:4326 for coordinates.
Fix min-distance handling in destination assignment, and fix legs to use `use_gps` instead of `nombre_archivo_gps`.
This pull request restructures and modernizes the project's CI/CD and configuration setup, introduces a new dependency management system, and finalizes a complex merge of two major development branches. The most significant changes are the refactoring and separation of GitHub Actions workflows, integration of Pipenv for dependency management, updates to configuration and data files, and a detailed merge of architectural and algorithmic improvements from two diverged branches.
CI/CD and Dependency Management Improvements:
- Separated the GitHub Actions workflows into `build.yml` (now only for publishing on tag), `docs.yml` (for documentation builds), and `tests.yml` (for linting, unit, integration, and packaging), providing clearer, more maintainable automation. The test workflow now uses Python 3.12 and the Ruff linter, and coverage reports are uploaded as artifacts.
- Added a `Pipfile` for dependency management with Pipenv, specifying all core dependencies and Python 3.12 compatibility.

Configuration and Data Updates:
- Updated `configs/configuraciones_generales.yaml` to rename keys (e.g., `alias_db` → `alias_db_insumos`), add new GPS-related columns, and introduce new batch processing and documentation for GPS fields.
- Updated `data/data_ciudad/stops.csv` with a new stop data schema and values, reflecting a new format and presumably new data.

Merge and Refactor of Major Feature Branches:
- Merged `validar_velocidad` into `feat/refactor-clean`, blending architectural changes (new storage context, DuckDB, CLI, logging) with algorithmic improvements (GPS-based distance/speed, new travel time tables, vectorized KPI helpers, and comprehensive column renames). The merge strategy prioritized the refactored architecture while integrating new logic and discarding obsolete code.
- Key changes include the new `assign_time_distances` function, reordered processing steps, improved dashboard data loading, KPI calculation enhancements, and consistent column renames throughout the codebase and documentation.

Documentation Updates:
- Updated the documentation (`resultados.rst`) to reflect new column names (e.g., `speed_kmh` → `kmh_route`, `distance_osm_drive` → `distance_od`) for consistency with the merged codebase.

Miscellaneous:
- Added `dashboard.bat` for launching the dashboard via Streamlit.

Most Important Changes:
1. CI/CD and Dependency Management
- Separated the GitHub Actions workflows into `build.yml` (publishing only), `docs.yml` (documentation), and `tests.yml` (lint/unit/integration/package build), modernizing the automation pipeline and improving maintainability.
- Added a `Pipfile` for dependency management, specifying all required packages and the Python version.

2. Major Feature Branch Merge and Refactor
- Merged `validar_velocidad` into `feat/refactor-clean`, integrating the architectural refactor (storage context, DuckDB, CLI, logging) with new GPS-based distance/speed logic, new travel time tables, and consistent column renames.

3. Configuration and Data File Updates
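Because `alias_db` was renamed to `alias_db_insumos`, older config files may still carry only the legacy key, and the commit history describes a priority order (`alias_db_insumos` over `alias_db`, with per-corrida overrides). This is a hypothetical sketch of that lookup for the inputs database: the function name, its signature, and the choice to give per-corrida overrides the highest priority are assumptions, not the PR's `resolve_db_aliases()`.

```python
from typing import Mapping, Optional


def resolve_insumos_alias(
    config: Mapping[str, object],
    corrida: Optional[str] = None,
    overrides: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the alias for the inputs (insumos) database (hypothetical helper).

    Assumed precedence: per-corrida override > alias_db_insumos > alias_db.
    """
    if corrida is not None and overrides and corrida in overrides:
        return overrides[corrida]
    for key in ("alias_db_insumos", "alias_db"):  # new key first, legacy key second
        value = config.get(key)
        if value:
            return str(value)
    raise KeyError("config defines neither alias_db_insumos nor alias_db")
```

Keeping the precedence in one function means a legacy config and a new one resolve through the same code path instead of each caller re-implementing the fallback.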
4. Documentation Consistency
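Keeping renamed columns consistent across code and documentation is easier with a single mapping. The pairs below are collected from this PR's description; the dictionary and the `rename_columns` helper are a hypothetical sketch, not code from the PR. Two old names (`distancia` and `distance_osm_drive`) both map to `distance_od`, so the helper refuses inputs that would produce duplicate columns.

```python
# Old -> new column names collected from this PR's description (not exhaustive).
COLUMN_RENAMES = {
    "distance_km": "distance_route",
    "distancia": "distance_od",
    "travel_speed": "kmh_od",
    "speed_kmh": "kmh_route",
    "distance_osm_drive": "distance_od",
}


def rename_columns(columns: list[str]) -> list[str]:
    """Apply COLUMN_RENAMES, refusing results with duplicate names (hypothetical helper)."""
    renamed = [COLUMN_RENAMES.get(col, col) for col in columns]
    duplicates = sorted({col for col in renamed if renamed.count(col) > 1})
    if duplicates:
        raise ValueError(f"renaming would create duplicate columns: {duplicates}")
    return renamed
```

With pandas, the same mapping could be passed to `DataFrame.rename(columns=COLUMN_RENAMES)`, which leaves unmapped columns untouched but does not check for the duplicate-name collision.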
5. Developer Experience