Reinforcement learning agents for the board game Quoridor on a 3x3 board with 1 wall per player.
Game engine:
- game.py: game logic, Alpha-Beta, MCTS
- quoridor_env.py: Gymnasium environment (flat/grid obs, sparse/dense reward)
- wrappers.py: observation, reward, and action mask wrappers
- opponents.py: baseline agents (Random, GreedyPath, Blocking, Minimax)
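As an illustration of what an action mask is for, here is a minimal sketch of picking the best legal action from a list of scores. It is not the code in wrappers.py: the function name `masked_argmax`, the four-action layout, and the example scores are all hypothetical, chosen only to show the idea.

```python
import math


def masked_argmax(q_values, mask):
    """Return the index of the highest-scoring legal action.

    q_values: list of floats, one score per action.
    mask: list of bools, True where the action is legal.
    """
    if len(q_values) != len(mask):
        raise ValueError("q_values and mask must have the same length")
    if not any(mask):
        raise ValueError("no legal action available")
    # Illegal actions are scored -inf so they can never win the argmax.
    masked = [q if legal else -math.inf for q, legal in zip(q_values, mask)]
    return max(range(len(masked)), key=masked.__getitem__)


# Hypothetical example: action 1 has the highest score but is illegal.
scores = [0.2, 0.9, 0.5, -0.1]
legal = [True, False, True, True]
best = masked_argmax(scores, legal)
```

Applying the mask before the argmax, rather than penalising illegal moves through the reward, means the agent never has to learn the rules from failed attempts.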
Value-based methods:
- deep_q_network.py, double_deep_q_network.py, dueling_deep_q_network.py, categorical_deep_q_network.py, rainbow_deep_q_network.py: DQN variant implementations
- train_all.py, run_train_all.py: training scripts
- best_params.json: tuned hyperparameters per model
- dqn_agents.py: unified loader for all DQN models
Policy gradient methods:
- train_pg.py: REINFORCE, A2C, PPO, TRPO training
- policy_agents.py: unified loader for PG models
Evaluation:
- arena.py: round-robin tournament framework
- eval_ppo.py: tournament runner
- visualize.py: game replay and visualization
- quoridor_research.ipynb: main analysis notebook with all results
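For readers unfamiliar with the format, a round-robin tournament pairs every agent with every other agent. The sketch below builds such a schedule and alternates which side moves first. It is an assumption-laden illustration, not the logic of arena.py: `round_robin_schedule` and `games_per_pair` are hypothetical names, and the agent names are placeholders borrowed from the baseline list.

```python
from itertools import combinations


def round_robin_schedule(agents, games_per_pair=2):
    """List (first_player, second_player) pairings for every pair of agents.

    The starting side alternates from game to game, so with an even
    games_per_pair each agent of a pair moves first equally often.
    """
    schedule = []
    for a, b in combinations(agents, 2):
        for game in range(games_per_pair):
            # Even-numbered games: a starts; odd-numbered games: b starts.
            schedule.append((a, b) if game % 2 == 0 else (b, a))
    return schedule


# Placeholder agent names; any hashable identifiers would work.
schedule = round_robin_schedule(["Random", "GreedyPath", "Minimax"])
```

Alternating the starting side matters on a board this small, where moving first can plausibly decide the game on its own.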
Trained models:
- models/: DQN models (20 variants, retrained with alternating starts)
- pg_results/: PG models and training curves
Setup:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install stable-baselines3 sb3-contrib torchrl

Train policy gradient models:
python train_pg.py

Train DQN models:
python run_train_all.py

Run the full tournament:
python arena.py

Analysis notebook:
jupyter notebook quoridor_research.ipynb