ML & RL meet retro game arenas.
Three game AI agents, each one smarter than the last, built on the PAIA MLGame framework. From imitating an expert with a decision tree, to navigating battlefields with BFS, to learning combat strategy from scratch via Q-Learning.
| Project | Game Type | AI Approach | Key Techniques |
|---|---|---|---|
| Arkanoid | Brick Breaker | Supervised ML | Physics simulation → Data collection → Decision Tree |
| Swimming Squid | Competitive Foraging | Reinforcement Learning | Directional state quantization → Tabular Q-Learning |
| TankMan | Team Tank Battle | Hybrid RL + Search | BFS pathfinding + Q-Learning combat controller |
A classic brick breaker where the AI learns to control the paddle by imitating a physics-based expert policy.
The pipeline has three stages: a handcrafted script auto-plays the game using ball trajectory prediction, the resulting frame-by-frame decisions are saved as training data, and a Decision Tree classifier learns to replicate that behavior.
```mermaid
flowchart LR
    A["Rule-Based Agent
    Ball physics & bounce prediction"] -->|auto-play| B["Data Collection
    Per-frame features + actions"]
    B -->|train| C["Decision Tree
    Predict paddle movement"]
    C -->|deploy| D["ML Agent
    Real-time inference"]
```
The model receives these features each frame:
| Feature | Description |
|---|---|
| `ball_x`, `ball_y` | Current ball position |
| `delta_x`, `delta_y` | Ball velocity vector |
| `direction` | Encoded ball direction (4 quadrants) |
| `platform_x` | Current paddle position |
| `frame` | Frame number |
The model predicts one of three paddle commands: `MOVE_LEFT` (-1), `MOVE_RIGHT` (+1), or `NONE` (0).
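A minimal sketch of the train-and-predict step with scikit-learn; the toy data, feature ordering, and depth limit here are illustrative assumptions, not the project's actual dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical collected frames: each row is
# [ball_x, ball_y, delta_x, delta_y, direction, platform_x],
# labeled with the expert's command (-1 = MOVE_LEFT, 0 = NONE, +1 = MOVE_RIGHT).
X = np.array([
    [100, 200,  7, -7, 1, 80],
    [107, 193,  7, -7, 1, 85],
    [114, 186,  7, -7, 1, 90],
    [ 50, 300, -7,  7, 2, 60],
])
y = np.array([1, 1, 1, -1])

# A shallow tree keeps the learned policy small and fast at inference time.
model = DecisionTreeClassifier(max_depth=8)
model.fit(X, y)

# Deployment is one prediction per frame: feature vector in, command out.
action = int(model.predict([[110, 190, 7, -7, 1, 88]])[0])
```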
- Bounce prediction accounts for wall reflections using an even/odd parity method to compute the final landing X coordinate.
- Brick collision is considered: the agent simulates ball reflection off bricks to adjust the predicted landing point.
- Randomized thresholds in the expert script add natural variance to the training data, improving model robustness.
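The wall-reflection parity trick can be sketched as follows; this simplified version ignores bricks and assumes a playfield width of 200 px with the paddle at y = 400:

```python
def predict_landing_x(ball_x, ball_y, dx, dy, paddle_y=400, width=200):
    """Predict where a falling ball crosses the paddle's height.

    Unfolds the wall reflections: extend the ball's straight-line
    travel as if walls didn't exist, then fold the x coordinate back
    into [0, width] using the even/odd parity of the bounce count.
    """
    if dy <= 0:
        return None  # ball moving up; no landing to predict yet
    frames = (paddle_y - ball_y) / dy
    raw_x = ball_x + dx * frames   # unreflected straight-line travel
    period = 2 * width             # one full left-right bounce cycle
    folded = raw_x % period
    # First half of the cycle: moving in the original direction.
    # Second half: mirrored back off the far wall.
    return folded if folded <= width else period - folded
```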
View game rules & details →
A competitive 2-player ocean foraging game. Each squid eats food for points, avoids garbage, and can collide with the opponent for bonus/penalty scoring. The agent learns an optimal movement policy entirely through Q-Learning.
The environment is discretized by computing a weighted score for each of the four movement directions (food value divided by distance, plus opponent threat), then ranking those scores into a compact state tuple. A tabular Q-Learning agent explores this state space over 150 training rounds with decaying ε-greedy exploration.
```mermaid
flowchart LR
    A["Environment
    Foods, garbage, opponent"] -->|observe| B["State Quantization
    Score each direction
    by value ÷ distance"]
    B -->|rank| C["Discrete State
    4D rank tuple
    e.g. (2,0,3,1)"]
    C -->|ε-greedy| D["Q-Table
    4×4×4×4×4"]
    D -->|action| E["Agent
    UP / DOWN / LEFT / RIGHT"]
    E -->|reward| D
```
Each frame, the agent processes all visible food and the opponent into four directional buckets (UP, DOWN, LEFT, RIGHT). Items are scored by `value / (distance + 1)` and summed per direction. The four sums are then rank-ordered (0–3), producing a compact 4D state.
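A sketch of this quantization step; the item encoding and screen-coordinate conventions are assumptions for illustration:

```python
import numpy as np

def quantize_state(squid_x, squid_y, items):
    """Rank the four movement directions by summed item value.

    `items` is a list of (x, y, value) tuples; garbage can simply
    carry a negative value. Each item contributes value/(distance+1)
    to the bucket matching its dominant direction from the squid.
    """
    scores = [0.0, 0.0, 0.0, 0.0]        # UP, DOWN, LEFT, RIGHT
    for x, y, value in items:
        dx, dy = x - squid_x, y - squid_y
        dist = (dx * dx + dy * dy) ** 0.5
        if abs(dy) >= abs(dx):           # vertical bucket dominates
            bucket = 0 if dy < 0 else 1  # screen y grows downward
        else:
            bucket = 2 if dx < 0 else 3
        scores[bucket] += value / (dist + 1)
    # Rank the four sums: 0 = best direction, 3 = worst.
    order = np.argsort(scores)[::-1]
    ranks = [0, 0, 0, 0]
    for rank, direction in enumerate(order):
        ranks[direction] = rank
    return tuple(ranks)
```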
| Parameter | Value | Strategy |
|---|---|---|
| State space | 4⁴ × 4 = 1,024 entries | Rank-based discretization |
| Exploration (ε) | 1.0 → 0.01 | Linear decay over 150 rounds |
| Learning rate (α) | 1.0 → 0.01 | Linear decay over 150 rounds |
| Discount (γ) | 0.9 | Fixed |
The reward is based on alignment between the chosen action and the optimal direction ranking: the agent receives a higher reward for moving toward the direction with the best score.
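Putting the pieces together, the tabular update and decay schedules can be sketched as follows (the reward computation itself is omitted; helper names are illustrative):

```python
import numpy as np

ACTIONS = 4                      # UP, DOWN, LEFT, RIGHT
GAMMA = 0.9

# One Q-value per (rank-tuple state, action): 4^4 x 4 = 1,024 entries.
q_table = np.zeros((4, 4, 4, 4, ACTIONS))

def choose_action(state, epsilon, rng=np.random.default_rng()):
    """ε-greedy: explore with probability ε, else exploit the table."""
    if rng.random() < epsilon:
        return int(rng.integers(ACTIONS))
    return int(np.argmax(q_table[state]))

def update(state, action, reward, next_state, alpha):
    """Standard tabular Q-Learning backup."""
    best_next = np.max(q_table[next_state])
    q_table[state + (action,)] += alpha * (
        reward + GAMMA * best_next - q_table[state + (action,)]
    )

def decayed(round_idx, rounds=150, start=1.0, end=0.01):
    """Linear decay of ε and α from 1.0 to 0.01 over the training rounds."""
    return max(end, start - (start - end) * round_idx / rounds)
```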
View game rules & details →
A team-based tank battle game combining BFS pathfinding for resource management with Q-Learning for combat aiming. The agent switches between two behavioral modes depending on the tactical situation.
The agent operates a priority-based decision loop: when fuel or ammo is low, it uses BFS on a discretized grid map to navigate to the nearest supply station. When an enemy is within range, it switches to a Q-Learning policy that controls turret aiming and firing decisions.
```mermaid
flowchart TD
    A["Decision Loop"] --> B{"Low fuel or ammo?"}
    B -->|Yes| C["BFS Pathfinding
    Grid map → nearest station"]
    B -->|No| D{"Enemy within 300px?"}
    D -->|Yes| E["Q-Learning Combat
    Aim & shoot policy"]
    D -->|No| F["Wall Destruction
    Scan & clear obstacles"]
    C -->|FORWARD / BACKWARD
    TURN_LEFT / TURN_RIGHT| G["Execute Action"]
    E -->|SHOOT / AIM_RIGHT
    AIM_LEFT / BACKWARD| G
    F -->|SHOOT / AIM_RIGHT
    random move| G
```
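The priority loop reduces to a small mode selector; the threshold constants and mode labels here are illustrative stand-ins for the project's modules:

```python
LOW_FUEL, LOW_AMMO, COMBAT_RANGE = 30, 5, 300  # assumed thresholds

def decide(oil, power, enemy_distance):
    """Priority-based mode selection for one frame.

    Returns which behavioral mode drives this frame's action:
    supply runs trump combat, and combat trumps wall clearing.
    """
    if oil < LOW_FUEL or power < LOW_AMMO:
        return "BFS_SUPPLY"   # navigate to nearest fuel/bullet station
    if enemy_distance <= COMBAT_RANGE:
        return "QL_COMBAT"    # Q-Learning aim-and-shoot policy
    return "WALL_CLEAR"       # default: scan and destroy obstacles
```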
The map is discretized into a 50×30 grid. Walls, stations, teammates, and enemies are projected onto this grid. The BFS search operates in a 3D state space (row, col, angle), taking the tank's facing direction into account, and returns the shortest path to the nearest fuel or bullet station.
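A compact sketch of BFS over (row, col, angle) states, simplified to four 90° headings and a boolean passability grid (the project's angle granularity and move costs may differ):

```python
from collections import deque

# Row/col deltas for the four headings: N, E, S, W.
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]

def bfs_to_station(grid, start, goal_cells):
    """Shortest action sequence from a (row, col, heading) state
    to any goal cell. `grid[r][c]` is True for passable cells;
    turning and moving forward each cost one step, so the search
    runs over a 3D (row, col, angle) state space.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (r, c, h), path = queue.popleft()
        if (r, c) in goal_cells:
            return path
        for action in ("FORWARD", "TURN_LEFT", "TURN_RIGHT"):
            if action == "FORWARD":
                dr, dc = HEADINGS[h]
                nr, nc, nh = r + dr, c + dc, h
                if not (0 <= nr < rows and 0 <= nc < cols and grid[nr][nc]):
                    continue
            else:
                nr, nc = r, c
                nh = (h + (1 if action == "TURN_RIGHT" else -1)) % 4
            if (nr, nc, nh) not in seen:
                seen.add((nr, nc, nh))
                queue.append(((nr, nc, nh), path + [action]))
    return None  # no reachable station
```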
When an enemy enters range, the agent computes a state vector for the Q-table:
| State Dimension | Values | Description |
|---|---|---|
| `angle_diff` | 0–8 | Discretized angle between gun and enemy (45° bins) |
| `turning_direction` | 0–1 | Clockwise vs. counter-clockwise to target |
| `is_cooldown` | 0–1 | Whether the gun is on cooldown |
| `teammate_angle_diff` | 0–8 | Angle to nearest teammate (friendly-fire avoidance) |
The Q-table has shape (9, 2, 2, 9, 5), mapping states to five actions: `SHOOT`, `AIM_RIGHT`, `AIM_LEFT`, `BACKWARD`, and a fallback wall-destruction mode.
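Acting from this table is then a state-indexed lookup and argmax; the layout below mirrors the shape given above, with greedy tie-breaking as an assumption:

```python
import numpy as np

COMBAT_ACTIONS = ["SHOOT", "AIM_RIGHT", "AIM_LEFT", "BACKWARD", "WALL_MODE"]

# Axes: (angle_diff, turning_direction, is_cooldown, teammate_angle_diff, action)
q_table = np.zeros((9, 2, 2, 9, len(COMBAT_ACTIONS)))

def combat_action(angle_diff, turning_direction, is_cooldown, teammate_angle_diff):
    """Greedy action for the discretized combat state."""
    state = (angle_diff, turning_direction, is_cooldown, teammate_angle_diff)
    return COMBAT_ACTIONS[int(np.argmax(q_table[state]))]
```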
| Parameter | Value | Strategy |
|---|---|---|
| State space | 9×2×2×9×5 = 1,620 entries | Angle-based discretization |
| Exploration (ε) | 1.0 → 0.01 | Linear decay over 170 rounds |
| Learning rate (α) | 1.0 → 0.01 | Linear decay over 170 rounds |
| Discount (γ) | 0.9 | Fixed |
View game rules & details →
- Language: Python 3.9
- Game Framework: PAIA MLGame, Pygame 2.0.1
- ML/RL: NumPy (tabular Q-Learning), scikit-learn (Decision Tree)
- Pathfinding: Custom BFS with directional state space
- Serialization: Pickle (Q-tables and trained models)
```
.
├── README.md                   # ← You are here
├── arkanoid/
│   ├── README.md               # Game rules & details
│   └── ml/
│       ├── ml_play_template.py # Rule-based expert + data collection
│       └── ml_play_model.py    # Trained Decision Tree agent
├── swimming-squid/
│   ├── README.md               # Game rules & details
│   └── ml/
│       ├── handleData.py       # State quantization + Q-Learning class
│       ├── Qlearning.py        # Training script
│       └── model_play.py       # Trained Q-table agent
└── tankman/
    ├── README.md               # Game rules & details
    └── ml/
        ├── data_handler.py     # State processing + Q-Learning class
        ├── find_station.py     # BFS pathfinder
        ├── wall_handler.py     # Wall detection & destruction
        ├── trainQL_play.py     # Training script
        └── ml_model_play.py    # Trained hybrid agent
```
```shell
# Install dependencies
pip install mlgame pygame numpy scikit-learn

# Run any game with its AI agent
python -m mlgame -i ./ml/ml_play_model.py ./ --level <N>
```

See each project's README for game-specific configuration and level options.
Built by Harris as a portfolio exploring the progression from rule-based AI to reinforcement learning in competitive game environments.