Adaptive Massive Event Detection Algorithm

This repository contains a Python algorithm for detecting demand peaks related to massive events in public transport station time series.

The method combines seasonal decomposition, residual behaviour analysis, a Hampel-based preprocessing stage, and adaptive event detection. Depending on the detected temporal behaviour, detection thresholds are computed with either the interquartile range (IQR) or the median absolute deviation (MAD).

Overview

The pipeline is designed to process daily station demand data and identify dates associated with unusual demand increases that may correspond to massive events.

The workflow includes:

- Time series preprocessing
- Seasonal decomposition
- Computation of CEI/PEI indicators
- Hampel-based filtering and STL decomposition
- Adaptive event detection
  - IQR-based detection for collective event behaviours
  - MAD-based detection for punctual event behaviours
- Optional evaluation against known event calendars
- Optional visualisation for single-station analysis
- CSV export of detected events
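
The Hampel-based filtering step can be illustrated with a minimal sketch of a standard Hampel filter. This is not taken from the script: the function name `hampel_filter`, the 7-day window, and the 3-sigma cut-off are assumptions chosen for illustration, and the demand values are made up.

```python
import numpy as np
import pandas as pd


def _window_mad(values: np.ndarray) -> float:
    """Median absolute deviation of one rolling window."""
    return float(np.median(np.abs(values - np.median(values))))


def hampel_filter(series: pd.Series, window: int = 7, n_sigmas: float = 3.0) -> pd.Series:
    """Replace points far from the rolling median with that median.

    Hypothetical helper: window and n_sigmas are illustrative defaults.
    """
    k = 1.4826  # scales the MAD to a standard deviation for Gaussian data
    rolling = series.rolling(window, center=True, min_periods=1)
    rolling_median = rolling.median()
    rolling_mad = rolling.apply(_window_mad, raw=True)
    is_outlier = (series - rolling_median).abs() > n_sigmas * k * rolling_mad
    # Keep regular points, substitute the local median for outliers
    return series.where(~is_outlier, rolling_median)


# Made-up daily demand with one spike on the sixth day
demand = pd.Series(
    [10, 12, 11, 13, 12, 100, 11, 12, 13, 11, 12],
    index=pd.date_range("2023-05-01", periods=11, freq="D"),
    dtype=float,
)
cleaned = hampel_filter(demand)
```

In this sketch the spike is replaced by the local median, so a decomposition fitted on `cleaned` would not absorb the peak into its trend or seasonal component.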

Repository contents

metro-event-detection/
├── .gitignore
├── README.md
├── requirements.txt
├── Python/
│   ├── event_detection_pipeline.py
│   └── heatmap.py
└── data/
    └── README.md

Main script

The main script in this repository is:

event_detection_pipeline.py

The code is provided as a single script to preserve the original research workflow and execution logic.

Requirements

Install dependencies with:

pip install -r requirements.txt

The required Python packages are:

numpy
pandas
statsmodels
matplotlib
scikit-learn

Data availability

The datasets used in this project are not publicly available and are therefore not included in this repository.

This repository only provides the source code for the event detection algorithm. Users who want to run the pipeline must provide their own compatible datasets and reproduce the folder structure expected by the script.

Expected input data

This script expects external CSV files stored in relative paths.

Main dataset

The main input file is expected at:

../data/metro_madrid/daily_records_since_2017.csv

This file must contain at least the following columns:

FECHA - date
VALOR - daily card validation count
NOMBRE - station name

The script also optionally uses:

ORDEN - station order within the metro line
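
As a rough illustration of this layout, the sketch below reads a CSV with the required columns and extracts one station's daily series. The function names `load_daily_records` and `station_series`, the comma separator, and the ISO-like date format are assumptions, not details confirmed by the script; the sample rows are made up.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = {"FECHA", "VALOR", "NOMBRE"}


def load_daily_records(source) -> pd.DataFrame:
    """Read the daily records CSV and check that the required columns exist."""
    df = pd.read_csv(source)  # assumes a comma-separated file
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    df["FECHA"] = pd.to_datetime(df["FECHA"])  # assumes ISO-like dates
    return df.sort_values(["NOMBRE", "FECHA"]).reset_index(drop=True)


def station_series(df: pd.DataFrame, station: str) -> pd.Series:
    """Return one station's validations as a daily-indexed series."""
    rows = df[df["NOMBRE"] == station]
    # asfreq("D") makes gaps explicit as NaN instead of skipping missing days
    return rows.set_index("FECHA")["VALOR"].asfreq("D")


# Made-up sample in place of the real (non-public) dataset
sample = io.StringIO(
    "FECHA,VALOR,NOMBRE\n"
    "2023-01-02,1500,Ventas\n"
    "2023-01-01,1200,Ventas\n"
    "2023-01-01,800,Sol\n"
)
records = load_daily_records(sample)
ventas = station_series(records, "Ventas")
```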

Optional ground-truth event files

For evaluation against known events, the script expects the following files:

../data/eventos_ventas_completo.csv
../data/eventos_santiago_bernabeu.csv
../data/eventos_estadio_metropolitano.csv

These files must contain at least:

FECHA - event date
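
One way such a file could be turned into per-day labels is sketched below. The helper names `load_event_dates` and `label_days` are hypothetical, the date parsing assumes ISO-like values in FECHA, and the example dates are made up.

```python
import io

import pandas as pd


def load_event_dates(source) -> pd.DatetimeIndex:
    """Read a ground-truth file and return its unique event dates."""
    events = pd.read_csv(source)
    dates = pd.DatetimeIndex(pd.to_datetime(events["FECHA"]))
    return dates.normalize().unique()  # drop any time-of-day part


def label_days(days: pd.DatetimeIndex, event_dates: pd.DatetimeIndex) -> pd.Series:
    """Return 1 for days that appear in the event calendar, else 0."""
    return pd.Series(days.isin(event_dates).astype(int), index=days, name="is_event")


# Made-up calendar in place of a real ground-truth file
events_csv = io.StringIO("FECHA\n2023-05-14\n2023-05-20\n")
days = pd.date_range("2023-05-13", "2023-05-15", freq="D")
labels = label_days(days, load_event_dates(events_csv))
```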

Execution

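Run the script from the command line. The input paths are relative (../data/...), which suggests launching it from inside the Python/ directory: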

python event_detection_pipeline.py

When executed, the script prompts the user for two inputs:

Station: exact station name or All
Year: a specific year such as 2023 or All

Example:

Station (exact name or All): Ventas
Year (e.g., 2023 or All): 2023

Supported execution modes

The script supports four main execution modes.

1. Single station, single year

Station (exact name or All): Ventas
Year (e.g., 2023 or All): 2023

This mode analyses one station for one specific year.

2. Single station, all available years

Station (exact name or All): Ventas
Year (e.g., 2023 or All): All

This mode analyses one station across all available years in the selected date range.

3. All stations, single year

Station (exact name or All): All
Year (e.g., 2023 or All): 2023

This mode analyses all stations for one specific year.

4. All stations, all available years

Station (exact name or All): All
Year (e.g., 2023 or All): All

This mode runs the full pipeline over all stations and all available years.

Input validation

Station names must match the names available in the input CSV. The year must be an integer available in the filtered dataset, or All. If an invalid station or year is entered, the script raises an error.
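
The validation rules and the four execution modes can be sketched together as a single function that turns the two prompt answers into a list of (station, year) pairs. The name `resolve_selection` and the use of `ValueError` are assumptions; the README only states that an error is raised.

```python
import pandas as pd


def resolve_selection(df: pd.DataFrame, station: str, year: str) -> list:
    """Return the (station, year) pairs to analyse, or raise ValueError.

    Hypothetical helper; both arguments are the raw strings typed at the prompt.
    """
    stations = sorted(df["NOMBRE"].unique())
    years = sorted(df["FECHA"].dt.year.unique())
    if station != "All":
        if station not in stations:
            raise ValueError(f"Unknown station: {station!r}")
        stations = [station]
    if year != "All":
        if not year.isdigit() or int(year) not in years:
            raise ValueError(f"Year not available: {year!r}")
        years = [int(year)]
    # Simplification: every selected station is paired with every selected year
    return [(s, int(y)) for s in stations for y in years]


# Made-up records covering two stations and two years
records = pd.DataFrame(
    {
        "FECHA": pd.to_datetime(["2022-03-01", "2023-03-01", "2022-03-01", "2023-03-01"]),
        "NOMBRE": ["Ventas", "Ventas", "Sol", "Sol"],
    }
)
single = resolve_selection(records, "Ventas", "2023")
everything = resolve_selection(records, "All", "All")
```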

Internal execution flow

For each selected station and year, the script performs the following steps:

1. Load and preprocess the station time series.
2. Compute the seasonal decomposition.
3. Estimate the CEI and PEI indicators.
4. Apply the Hampel-based preprocessing and STL decomposition.
5. Select the detection mode:
   - IQR-based detection for collective behaviour
   - MAD-based detection for punctual behaviour
6. Filter low-level detections.
7. Compute evaluation metrics when ground-truth data is available.
8. Save detected events to a CSV file.
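
The two thresholding modes can be sketched as follows, using the textbook IQR and MAD outlier rules on the residual series. This is an illustration under stated assumptions rather than the script's implementation: the multipliers (1.5 and 3.0), the function names, and the residual values are made up, and the mode is passed in directly because the CEI/PEI selection logic is not described here. Only the upper side is flagged, since the pipeline looks for demand increases.

```python
import pandas as pd


def iqr_threshold(resid: pd.Series, k: float = 1.5) -> float:
    """Upper fence Q3 + k * IQR (k is an illustrative default)."""
    q1, q3 = resid.quantile([0.25, 0.75])
    return float(q3 + k * (q3 - q1))


def mad_threshold(resid: pd.Series, k: float = 3.0) -> float:
    """Median + k * scaled MAD (k is an illustrative default)."""
    median = resid.median()
    mad = (resid - median).abs().median()
    return float(median + k * 1.4826 * mad)


def detect_events(resid: pd.Series, mode: str):
    """Return the residuals above the threshold and the threshold itself."""
    if mode == "IQR":
        threshold = iqr_threshold(resid)
    elif mode == "MAD":
        threshold = mad_threshold(resid)
    else:
        raise ValueError(f"Unknown mode: {mode!r}")
    return resid[resid > threshold], threshold


# Made-up residuals with one large positive deviation
resid = pd.Series(
    [0, 1, -1, 2, -2, 1, 0, -1, 40, 0],
    index=pd.date_range("2023-05-01", periods=10, freq="D"),
    dtype=float,
)
events_iqr, thr_iqr = detect_events(resid, "IQR")
events_mad, thr_mad = detect_events(resid, "MAD")
```

Both rules rely on robust statistics, so a single large peak barely shifts the threshold it is compared against.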

Evaluation

If a station has an associated ground-truth event file, the script computes:

confusion matrix
precision
recall
accuracy
ROC/AUC score when possible
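
A minimal sketch of these metrics with scikit-learn is shown below. The function name `evaluate` and the idea of using the residuals as ROC scores are assumptions, and the labels are made up. The AUC is skipped when the ground truth contains a single class, which is one reading of "when possible".

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_pred, scores=None) -> dict:
    """Compare detections (y_pred) with known event days (y_true)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    metrics = {
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "accuracy": accuracy_score(y_true, y_pred),
        "roc_auc": None,
    }
    # ROC AUC is undefined unless both classes are present in y_true
    if scores is not None and len(np.unique(y_true)) == 2:
        metrics["roc_auc"] = roc_auc_score(y_true, scores)
    return metrics


# Made-up labels: 1 = event day, 0 = regular day
y_true = [0, 0, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0]
scores = [0.1, 0.2, 5.0, 3.0, 0.5, 0.0]  # e.g. residual magnitudes
result = evaluate(y_true, y_pred, scores)
```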

Plots

Plots are only generated when a single specific station is selected.

In that case, the script may display figures such as:

observed series and minimum rolling baseline
observed vs expected series
residuals vs detection threshold
STL trend and seasonal components
ROC curve, when ground-truth labels are available

Output

Detected events are appended to the configured CSV output file.

By default, the script writes results to:

../data/detection_events/events_detected_all.csv

The exported file includes the following columns:

station_name
year
date
resid
threshold
mode
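
Appending to a shared CSV requires writing the header only once. The sketch below shows one way to do that with pandas; the helper name `append_events` is hypothetical, the row values are made up, and the example writes to a temporary directory rather than the default output path.

```python
import tempfile
from pathlib import Path

import pandas as pd

OUTPUT_COLUMNS = ["station_name", "year", "date", "resid", "threshold", "mode"]


def append_events(events: pd.DataFrame, path) -> None:
    """Append detected events to a CSV, writing the header only for a new file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    events[OUTPUT_COLUMNS].to_csv(path, mode="a", header=write_header, index=False)


# Made-up detection row; the "IQR" label for the mode column is an assumption
row = pd.DataFrame(
    [
        {
            "station_name": "Ventas",
            "year": 2023,
            "date": "2023-05-14",
            "resid": 1234.5,
            "threshold": 800.0,
            "mode": "IQR",
        }
    ]
)

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "detection_events" / "events_detected_all.csv"
    append_events(row, out)
    append_events(row, out)  # second call appends without repeating the header
    written = pd.read_csv(out)
```

Because results are appended, rerunning the same station and year adds duplicate rows unless the output file is cleared first.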

Notes on execution

The current configuration is set for the post-COVID period:

START_DATE = "2021-01-01"
END_DATE = "2024-09-30"
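
For illustration, restricting the records to this window could look like the sketch below. The helper name `filter_period` and the inclusive treatment of both bounds are assumptions; the sample dates are made up.

```python
import pandas as pd

START_DATE = "2021-01-01"
END_DATE = "2024-09-30"


def filter_period(df: pd.DataFrame, start: str = START_DATE, end: str = END_DATE) -> pd.DataFrame:
    """Keep rows whose FECHA lies within [start, end], both bounds included."""
    in_range = df["FECHA"].between(pd.Timestamp(start), pd.Timestamp(end))
    return df.loc[in_range]


# Made-up rows on either side of the configured window
records = pd.DataFrame(
    {
        "FECHA": pd.to_datetime(["2020-12-31", "2021-01-01", "2024-09-30", "2024-10-01"]),
        "VALOR": [100, 110, 120, 130],
    }
)
post_covid = filter_period(records)
```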
