Traffic Light Optimization with Deep Reinforcement Learning

A comprehensive experimental framework for testing three hypotheses about Deep Q-Networks (DQN) for traffic light control using the CityFlow simulator.

🎯 Project Overview

This project implements and statistically validates three distinct approaches to traffic light control using DQN:

H1: Standard vs Future-Aware state representations
H2: MaxPressure reward with simplified states
H3: Multi-Agent coordination strategies

Each hypothesis is tested against baselines with rigorous statistical analysis (10 independent runs, paired t-tests, effect sizes).

🔬 Hypotheses

H1: Standard vs Future-Aware DQN

Hypothesis: A Single-Agent DQN using standard queue metrics will outperform Fixed-Time controllers in low-variance traffic, but will fail to adapt to high-variance surges due to lack of future-aware state features.

Implementations:

H1-Basic: Queue snapshots only
H1-Enhanced: Queue snapshots + temporal features (phase duration, derivatives, phase history)

Test Scenarios: Low-variance vs High-variance/Surge traffic

H2: MaxPressure Reward Decoupling

Hypothesis: Decoupling the reward function (MaxPressure) from state representation (simplified queue snapshots) will achieve PressLight-level throughput with significantly lower computational overhead.

Implementation:

H2-MaxPressure: Simple queue state + MaxPressure reward

Comparison: Performance vs computational cost trade-off

H3: Multi-Agent Coordination

Hypothesis: Explicit sharing of immediate neighbor phases is the primary driver of coordination in multi-agent systems. Simple Shared-Phase DQN will statistically match complex GAT-based models during congestion peaks in grid topologies.

Implementations:

H3-Independent: No coordination (baseline)
H3-Shared-Phase: Explicit neighbor phase sharing

Test Focus: Congestion recovery rates and coordination metrics

📁 Project Structure

traffic-light-agent/
├── README.md                      # This file
├── IMPLEMENTATION_PLAN.md         # Detailed design document
├── IMPLEMENTATION_SUMMARY.md      # What was built
│
├── scenarios/                     # Traffic scenario generation
│   ├── generate_variance.py       # Low/high variance generators
│   ├── generate_surge.py          # Surge scenario generators
│   ├── validate_scenarios.py      # Validation script
│   └── configs/                   # Generated scenario configs
│       ├── config_low_variance.json
│       ├── config_high_variance.json
│       ├── config_moderate_surge.json
│       └── config_extreme_surge.json
│
├── models/                        # All DQN implementations
│   ├── __init__.py                # Package exports
│   ├── h1_basic.py                # H1: Standard DQN
│   ├── h1_enhanced.py             # H1: Temporal features DQN
│   ├── h2_maxpressure.py          # H2: MaxPressure reward
│   ├── h3_multi_agent.py          # H3: Multi-agent coordination
│   └── baselines.py               # Fixed-time controller
│
├── experiments/                   # Experiment runners
│   ├── run_h1.py                  # Full H1 experiments (10 runs)
│   └── run_h1_quick.py            # Quick H1 validation (3 runs)
│
├── results/                       # Experiment outputs
│   ├── h1_results.json            # H1 statistical results
│   └── plots/                     # Generated visualizations
│
└── reports/                       # Analysis reports
    └── (generated after experiments)

🔧 Installation

Prerequisites

Python 3.8+
CityFlow (traffic simulator)
PyTorch
NumPy, SciPy, Matplotlib, Pandas

Setup

Install CityFlow:

# Ensure CityFlow is built in ../CityFlow/build/
# Or adjust sys.path in model files

Install Python dependencies:

pip install torch numpy scipy matplotlib pandas

Generate traffic scenarios:

python scenarios/generate_variance.py
python scenarios/generate_surge.py

Validate scenarios:
```
python scenarios/validate_scenarios.py
```

🚀 Quick Start

Test that everything works:

# Quick sanity check
python test_h1_quick.py

Run a single model:

from models import H1BasicAgent, CityFlowEnv

# Create environment
env = CityFlowEnv("scenarios/configs/config_low_variance.json",
                   frame_skip=1, max_steps=1000)

# Create agent
agent = H1BasicAgent(env)

# Train
rewards = agent.train(episodes=50)

# Evaluate
eval_rewards = agent.evaluate(episodes=5)
print(f"Average reward: {sum(eval_rewards)/len(eval_rewards):.2f}")

🧪 Running Experiments

H1 Quick Validation (15 minutes)

python experiments/run_h1_quick.py

3 runs per configuration
30 episodes per run
Tests both H1-Basic and H1-Enhanced
Outputs quick statistical comparison

H1 Full Experiments (30-60 minutes)

python experiments/run_h1.py

10 runs per configuration (rigorous)
50 episodes per run
Complete statistical analysis
Generates plots with significance markers
Outputs: results/h1_results.json and results/plots/h1_comparison.png

H2 and H3 Experiments

# TODO: Create run_h2.py and run_h3.py
# Similar structure to run_h1.py

📊 Understanding Results

Statistical Output Format

Experiment 1: H1-Basic vs Fixed-Time (Low Variance)
  H1-Basic:   -7653.25 ± 955.15
  Fixed-Time: -4.67 ± 0.00
  t=-11.325, p=0.0077, d=-6.538
  Result: Fixed-Time ✅

Interpretation:

Mean ± Std: Average reward across 10 runs with standard deviation
t-statistic: Measures difference relative to variance
p-value: Probability results are due to chance (p < 0.05 = significant)
Cohen's d: Effect size (0.2=small, 0.5=medium, 0.8=large)
Result: Winner with ✅ if statistically significant

Effect Size Interpretation

Cohen's d	Interpretation
< 0.2	Negligible
0.2 - 0.5	Small
0.5 - 0.8	Medium
> 0.8	Large

🤖 Model Descriptions

H1-Basic

File: models/h1_basic.py

State Features:

Normalized lane waiting vehicle counts

Reward:

Negative mean waiting vehicles

Network:

Input → Linear(128) → ReLU → Linear(64) → ReLU → Output

Use Case: Baseline for H1 hypothesis

H1-Enhanced

File: models/h1_enhanced.py

State Features:

Normalized lane waiting vehicle counts
Queue derivatives (rate of change)
Normalized phase duration
Phase history (last 4 phases, one-hot encoded)

Reward:

Same as H1-Basic

Network:

Input → LayerNorm → Linear(256) → ReLU → Linear(128) → ReLU →
Linear(64) → ReLU → Output

Use Case: Handle high-variance traffic surges

H2-MaxPressure

File: models/h2_maxpressure.py

State Features:

Simple normalized queue counts (same as H1-Basic)

Reward:

Pressure(phase) = incoming_vehicles - outgoing_vehicles
Reward = max(Pressure across all phases)

Network:

Same as H1-Basic

Use Case: Test if reward shaping > state complexity

H3-Multi-Agent

File: models/h3_multi_agent.py

Coordination Modes:

Independent:
- State: Own queue state only
- No coordination
Shared-Phase:
- State: Own queue state + neighbor current phases (one-hot)
- Explicit phase sharing

Network:

One DQN per intersection
Separate replay buffers

Use Case: Test coordination strategies

🚦 Traffic Scenarios

Low Variance

File: scenarios/configs/config_low_variance.json

Constant arrival rates (5s ± 0.5s)
12 routes through intersection
Purpose: Test baseline DQN performance

High Variance

File: scenarios/configs/config_high_variance.json

Mixed intervals (2.5s, 5s, 7.5s)
Different flow rates per route
Purpose: Test adaptation to varying densities

Moderate Surge

File: scenarios/configs/config_moderate_surge.json

Base: 5s interval
Surges: 1.5s interval for 100 steps
3 surge events per episode (every 300 steps)
Purpose: Test response to traffic spikes

Extreme Surge

File: scenarios/configs/config_extreme_surge.json

Base: 5s interval
Surges: 1.0s interval for 150 steps
3 surge events per episode (every 350 steps)
Purpose: Test extreme congestion handling

📈 Statistical Analysis

Experimental Design

N = 10 runs per configuration (different random seeds)
Paired comparisons on same scenarios
Multiple scenarios per hypothesis

Statistical Tests

Paired T-Test

from scipy import stats
t_stat, p_value = stats.ttest_rel(group1, group2)

When: Comparing two models
Assumption: Normal distribution of differences
Significance: α = 0.05

Effect Size (Cohen's d)

diff = group1 - group2
cohens_d = mean(diff) / std(diff)

Measures practical significance
Independent of sample size

Visualization

Box plots with error bars
Statistical significance markers (*, **, ***)
Training curves with confidence intervals

🛠️ Implementation Details

Training Hyperparameters

Parameter	Value	Notes
Learning Rate	1e-3 (8e-4 for Enhanced)	Adam optimizer
Gamma (γ)	0.99	Discount factor
Epsilon Start	1.0	Initial exploration
Epsilon End	0.1	Final exploration
Epsilon Decay	0.995	Per episode
Batch Size	128	Replay buffer sampling
Replay Buffer	5000	Experience capacity
Episodes	50-150	Configurable

CityFlow Settings

Parameter	Value
Interval	1.0s
Frame Skip	1 (no skip)
Max Steps	1000 per episode
Thread Num	1
Roadnet	roadnet-adv.json

State Normalization

All queue counts are normalized by max count to keep values in [0, 1]:

normalized = queue_counts / max(queue_counts) if max(queue_counts) > 0 else queue_counts

🐛 Troubleshooting

CityFlow Import Error

ModuleNotFoundError: No module named 'cityflow'

Solution: Ensure CityFlow is built and path is correct in model files:

sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "CityFlow" / "build"))

Scenario Validation Fails

Assertion failed: (timeInterval >= 1 || ...)

Solution: CityFlow requires vehicle spawn interval >= 1.0 seconds. Check flow files.

Training is Very Slow

Solutions:

Reduce max_steps (e.g., 500 instead of 1000)
Reduce number of episodes
Use frame_skip > 1 (but affects learning)

📚 References

Papers

PressLight: Learning Phase Competition for Traffic Signal Control
CoLight: Learning Network-level Traffic Signal Control with Graph Attention
DQN: Playing Atari with Deep Reinforcement Learning (Mnih et al., 2013)

Tools

CityFlow: https://cityflow-project.github.io/
PyTorch: https://pytorch.org/
SciPy: https://scipy.org/

👥 Contributing

This is a research project. Key areas for extension:

Implement H2 and H3 experiment runners
Add more baseline comparisons (e.g., Actuated control)
Extend to larger road networks
Add additional coordination mechanisms
Implement proper MaxPressure lane mappings

📄 License

Academic research project. Check with your institution for usage rights.

🙏 Acknowledgments

CityFlow team for the traffic simulator
PyTorch community
CS4644 course staff

📞 Contact

For questions about this implementation, please refer to:

IMPLEMENTATION_PLAN.md for design decisions
IMPLEMENTATION_SUMMARY.md for what was built
Code comments in individual model files

Last Updated: 2025-11-29

Status: ✅ All models implemented and validated. Ready for comprehensive experiments.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
__pycache__		__pycache__
data		data
experiments		experiments
models		models
reports		reports
results		results
scenarios		scenarios
utils		utils
Figure_1.png		Figure_1.png
Figure_2.png		Figure_2.png
README.md		README.md
dqn_model.pth		dqn_model.pth
dqn_model_multi_scenario.pth		dqn_model_multi_scenario.pth
dqn_multi_scenario_results.png		dqn_multi_scenario_results.png
dqn_training_results.png		dqn_training_results.png
main.py		main.py
models.py		models.py
models2.py		models2.py
test.py		test.py
test_h1_quick.py		test_h1_quick.py
train_dqn.py		train_dqn.py
train_dqn_multi_scenario.py		train_dqn_multi_scenario.py

Folders and files

Latest commit

History

Repository files navigation

Traffic Light Optimization with Deep Reinforcement Learning

🎯 Project Overview

📋 Table of Contents

🔬 Hypotheses

H1: Standard vs Future-Aware DQN

H2: MaxPressure Reward Decoupling

H3: Multi-Agent Coordination

📁 Project Structure

🔧 Installation

Prerequisites

Setup

🚀 Quick Start

Test that everything works:

Run a single model:

🧪 Running Experiments

H1 Quick Validation (15 minutes)

H1 Full Experiments (30-60 minutes)

H2 and H3 Experiments

📊 Understanding Results

Statistical Output Format

Effect Size Interpretation

🤖 Model Descriptions

H1-Basic

H1-Enhanced

H2-MaxPressure

H3-Multi-Agent

🚦 Traffic Scenarios

Low Variance

High Variance

Moderate Surge

Extreme Surge

📈 Statistical Analysis

Experimental Design

Statistical Tests

Paired T-Test

Effect Size (Cohen's d)

Visualization

🛠️ Implementation Details

Training Hyperparameters

CityFlow Settings

State Normalization

🐛 Troubleshooting

CityFlow Import Error

Scenario Validation Fails

Training is Very Slow

📚 References

Papers

Tools

👥 Contributing

📄 License

🙏 Acknowledgments

📞 Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages