This repository contains a Python algorithm for detecting demand peaks related to massive events in public transport station time series.
The method combines seasonal decomposition, residual behaviour analysis, a Hampel-based preprocessing stage, and adaptive event detection using either IQR or MAD to compute thresholds depending on the detected temporal behaviour.
The pipeline is designed to process daily station demand data and identify dates associated with unusual demand increases that may correspond to massive events.
The workflow includes:
- Time series preprocessing
- Seasonal decomposition
- Computation of CEI/PEI indicators
- Hampel-based filtering and STL decomposition
- Adaptive event detection
  - IQR-based detection for collective event behaviours
  - MAD-based detection for punctual event behaviours
- Optional evaluation against known event calendars
- Optional visualisation for single-station analysis
- CSV export of detected events
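As an illustration of the Hampel-based filtering step listed above, the following is a minimal sketch of a standard Hampel filter built on a centred rolling window. The function name `hampel_filter`, the window length of 7 days, and the 3-sigma cut-off are assumptions chosen for the example, not values taken from `event_detection_pipeline.py`; the demand values are made up.

```python
import numpy as np
import pandas as pd


def hampel_filter(series: pd.Series, window: int = 7, n_sigmas: float = 3.0) -> pd.Series:
    """Replace points that deviate strongly from the local median.

    window and n_sigmas are illustrative defaults, not the script's settings.
    """
    rolling = series.rolling(window, center=True, min_periods=1)
    local_median = rolling.median()
    # Median absolute deviation inside each window, scaled by 1.4826 so that
    # it estimates the standard deviation for normally distributed data.
    local_mad = 1.4826 * rolling.apply(
        lambda w: np.median(np.abs(w - np.median(w))), raw=True
    )
    is_outlier = (series - local_median).abs() > n_sigmas * local_mad
    # Keep regular points, substitute the local median for flagged ones.
    return series.where(~is_outlier, local_median)


# Made-up daily demand with one obvious spike at position 4.
demand = pd.Series([100, 102, 98, 101, 500, 99, 103, 100, 97], dtype=float)
cleaned = hampel_filter(demand)
```

In this toy series only the spike is replaced by its local median; all other values pass through unchanged.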
metro-event-detection/
├── .gitignore
├── README.md
├── requirements.txt
├── Python/
│   ├── event_detection_pipeline.py
│   └── heatmap.py
└── data/
    └── README.md
The main script in this repository is:
event_detection_pipeline.py
The code is provided as a single script in order to preserve the original research workflow and execution logic.
Install dependencies with:
pip install -r requirements.txt
The required Python packages are:
- numpy
- pandas
- statsmodels
- matplotlib
- scikit-learn
The datasets used in this project are not publicly available and are therefore not included in this repository.
This repository only provides the source code for the event detection algorithm. Users who want to run the pipeline must provide their own compatible datasets and reproduce the folder structure expected by the script.
This script expects external CSV files stored in relative paths.
The main input file is expected at:
../data/metro_madrid/daily_records_since_2017.csv
This file must contain at least the following columns:
- FECHA: date
- VALOR: daily card validation count
- NOMBRE: station name
The script also optionally uses:
- ORDEN: station order within the metro line
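Before running the pipeline on your own data, it can help to confirm that the input file carries the required columns. The sketch below shows one way to do that with pandas. The helper name `load_daily_records` is hypothetical, and the comma delimiter, ISO date format, and sample values are assumptions; adapt them to your own files.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = {"FECHA", "VALOR", "NOMBRE"}


def load_daily_records(source) -> pd.DataFrame:
    """Read a daily-records CSV and check it has the required columns."""
    df = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    # Parse dates only after the column check so the error message stays clear.
    df["FECHA"] = pd.to_datetime(df["FECHA"])
    return df.sort_values(["NOMBRE", "FECHA"]).reset_index(drop=True)


# In-memory stand-in for ../data/metro_madrid/daily_records_since_2017.csv
# (the values are invented for the example).
sample = io.StringIO(
    "FECHA,VALOR,NOMBRE\n"
    "2023-05-02,15200,Ventas\n"
    "2023-05-01,14800,Ventas\n"
)
records = load_daily_records(sample)
```

A file without one of the three columns raises a `ValueError` that names the missing column.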
For evaluation against known events, the script expects the following files:
../data/eventos_ventas_completo.csv
../data/eventos_santiago_bernabeu.csv
../data/eventos_estadio_metropolitano.csv
These files must contain at least:
- FECHA: event date
Run the script from the command line:
python event_detection_pipeline.py
When executed, the script prompts the user for two inputs:
- Station: exact station name or All
- Year: a specific year such as 2023 or All
Example:
Station (exact name or All): Ventas
Year (e.g., 2023 or All): 2023
The script supports four main execution modes.
Station (exact name or All): Ventas
Year (e.g., 2023 or All): 2023
This mode analyses one station for one specific year.
Station (exact name or All): Ventas
Year (e.g., 2023 or All): All
This mode analyses one station across all available years in the selected date range.
Station (exact name or All): All
Year (e.g., 2023 or All): 2023
This mode analyses all stations for one specific year.
Station (exact name or All): All
Year (e.g., 2023 or All): All
This mode runs the full pipeline over all stations and all available years.
Station names must match the names available in the input CSV. The year must be an integer available in the filtered dataset, or All. If an invalid station or year is entered, the script raises an error.
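The validation rules above can be sketched as a small helper that turns the two prompt answers into the lists of stations and years to process. The function name `resolve_selection` and the error messages are hypothetical; the actual script may structure this differently.

```python
import pandas as pd


def resolve_selection(df: pd.DataFrame, station: str, year: str):
    """Map the two prompt answers to lists of stations and years."""
    stations = sorted(df["NOMBRE"].unique())
    years = sorted(int(y) for y in df["FECHA"].dt.year.unique())

    if station != "All":
        if station not in stations:
            raise ValueError(f"Unknown station: {station!r}")
        stations = [station]

    if year != "All":
        try:
            year_int = int(year)
        except ValueError:
            raise ValueError(f"Year must be an integer or 'All', got {year!r}") from None
        if year_int not in years:
            raise ValueError(f"Year {year_int} is not in the filtered dataset")
        years = [year_int]

    return stations, years


# Tiny made-up dataset covering two stations and two years.
toy = pd.DataFrame(
    {
        "NOMBRE": ["Ventas", "Ventas", "Goya"],
        "FECHA": pd.to_datetime(["2022-01-01", "2023-01-01", "2023-01-02"]),
    }
)
single = resolve_selection(toy, "Ventas", "2023")
everything = resolve_selection(toy, "All", "All")
```

The four execution modes correspond to the four combinations of a specific value or `All` for each argument.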
For each selected station and year, the script performs the following steps:
- Load and preprocess the station time series.
- Compute the seasonal decomposition.
- Estimate the CEI and PEI indicators.
- Apply the Hampel-based preprocessing and STL decomposition.
- Select the detection mode:
  - IQR-based detection for collective behaviour
  - MAD-based detection for punctual behaviour
- Filter low-level detections.
- Compute evaluation metrics when ground-truth data is available.
- Save detected events to a CSV file.
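To make the two detection modes concrete, here is a minimal sketch of textbook IQR and MAD upper thresholds applied to a residual series. The multipliers (`k=1.5` for IQR, `k=3.0` for MAD), the function names, and the residual values are assumptions for illustration; the script's own thresholds, and the CEI/PEI rule it uses to pick the mode, are not reproduced here.

```python
import numpy as np


def iqr_threshold(residuals: np.ndarray, k: float = 1.5) -> float:
    """Upper fence: third quartile plus k times the interquartile range."""
    q1, q3 = np.percentile(residuals, [25, 75])
    return float(q3 + k * (q3 - q1))


def mad_threshold(residuals: np.ndarray, k: float = 3.0) -> float:
    """Median plus k times the scaled median absolute deviation."""
    median = np.median(residuals)
    mad = np.median(np.abs(residuals - median))
    return float(median + k * 1.4826 * mad)


def detect(residuals: np.ndarray, mode: str):
    """Flag residuals above the threshold of the chosen mode ("IQR" or "MAD")."""
    if mode == "IQR":
        threshold = iqr_threshold(residuals)
    elif mode == "MAD":
        threshold = mad_threshold(residuals)
    else:
        raise ValueError(f"Unknown mode: {mode!r}")
    return residuals > threshold, threshold


# Made-up residuals with a single large positive excursion at position 8.
resid = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, -1.0, 0.0, 1.0, 40.0, 0.0])
iqr_flags, iqr_thr = detect(resid, "IQR")
mad_flags, mad_thr = detect(resid, "MAD")
```

Only values above the threshold are flagged, since the pipeline targets demand increases rather than drops.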
If a station has an associated ground-truth event file, the script computes:
- confusion matrix
- precision
- recall
- accuracy
- ROC/AUC score when possible
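The metrics listed above can be computed with scikit-learn once detections and ground-truth dates are aligned on a common daily calendar. The sketch below is one possible arrangement; the helper name `evaluate`, the use of residuals as ROC scores, and all dates and values are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(dates, truth_dates, detected_dates, scores):
    """Compare detected dates with ground-truth event dates, day by day."""
    y_true = dates.isin(truth_dates).astype(int)
    y_pred = dates.isin(detected_dates).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    metrics = {
        "tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "accuracy": accuracy_score(y_true, y_pred),
    }
    # ROC AUC is only defined when both classes occur in the ground truth.
    if np.unique(y_true).size == 2:
        metrics["auc"] = roc_auc_score(y_true, scores)
    return metrics


# Made-up eight-day window with two true events and two detections.
days = pd.date_range("2023-05-01", periods=8, freq="D")
truth = pd.to_datetime(["2023-05-03", "2023-05-06"])
detected = pd.to_datetime(["2023-05-03", "2023-05-07"])
resid_scores = np.array([0.1, 0.2, 5.0, 0.3, 0.1, 1.0, 3.0, 0.2])
result = evaluate(days, truth, detected, resid_scores)
```

Because most days have no event, accuracy tends to be high even for weak detectors, so precision and recall are usually the more informative figures.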
Plots are only generated when a single specific station is selected.
In that case, the script may display figures such as:
- observed series and minimum rolling baseline
- observed vs expected series
- residuals vs detection threshold
- STL trend and seasonal components
- ROC curve, when ground-truth labels are available
Detected events are appended to the configured CSV output file.
By default, the script writes results to:
../data/detection_events/events_detected_all.csv
The exported file includes the following columns:
- station_name
- year
- date
- resid
- threshold
- mode
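Since results are appended rather than overwritten, the header should be written only when the file is first created. The following sketch shows one way to do this with pandas. The helper name `append_events` and the sample row (including the `"IQR"` mode label) are hypothetical, and a temporary directory stands in for `../data/detection_events/`.

```python
import os
import tempfile

import pandas as pd

COLUMNS = ["station_name", "year", "date", "resid", "threshold", "mode"]


def append_events(events: pd.DataFrame, path: str) -> None:
    """Append detected events to a CSV, writing the header only once."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    write_header = not os.path.exists(path)
    events[COLUMNS].to_csv(path, mode="a", header=write_header, index=False)


with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "detection_events", "events_detected_all.csv")
    # Invented detection row, used only to exercise the helper.
    row = pd.DataFrame(
        [{"station_name": "Ventas", "year": 2023, "date": "2023-05-03",
          "resid": 1234.5, "threshold": 800.0, "mode": "IQR"}]
    )
    append_events(row, out_path)
    append_events(row, out_path)  # the second call adds a row, not a header
    saved = pd.read_csv(out_path)
```

Note that appending means repeated runs over the same station and year accumulate duplicate rows, so you may want to delete or deduplicate the file between runs.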
The current configuration is set for the post-COVID period:
START_DATE = "2021-01-01"
END_DATE = "2024-09-30"
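A date window like this is typically applied as a filter on the FECHA column before any further processing. The sketch below assumes both bounds are inclusive and uses a hypothetical helper `filter_period`; check the script to see how it treats the end date.

```python
import pandas as pd

START_DATE = "2021-01-01"
END_DATE = "2024-09-30"


def filter_period(df: pd.DataFrame, start: str = START_DATE, end: str = END_DATE) -> pd.DataFrame:
    """Keep rows whose FECHA lies within [start, end], bounds included."""
    mask = df["FECHA"].between(pd.Timestamp(start), pd.Timestamp(end))
    return df.loc[mask]


# Made-up rows straddling both ends of the configured window.
frame = pd.DataFrame(
    {
        "FECHA": pd.to_datetime(["2020-12-31", "2021-01-01", "2024-09-30", "2024-10-01"]),
        "VALOR": [1, 2, 3, 4],
    }
)
in_window = filter_period(frame)
```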