๐Ÿ•น๏ธ neural-arcade

**ML & RL meet retro game arenas.**

Three game AI agents — each one smarter than the last — built on the PAIA MLGame framework. From imitating an expert with a decision tree, to navigating battlefields with BFS, to learning combat strategy from scratch via Q-Learning.


| Project | Game Type | AI Approach | Key Techniques |
| --- | --- | --- | --- |
| Arkanoid | Brick Breaker | Supervised ML | Physics simulation → Data collection → Decision Tree |
| Swimming Squid | Competitive Foraging | Reinforcement Learning | Directional state quantization → Tabular Q-Learning |
| TankMan | Team Tank Battle | Hybrid RL + Search | BFS pathfinding + Q-Learning combat controller |

## Arkanoid — Supervised Learning

A classic brick breaker where the AI learns to control the paddle by imitating a physics-based expert policy.

### Approach

The pipeline has three stages: a handcrafted script auto-plays the game using ball trajectory prediction, the resulting frame-by-frame decisions are saved as training data, and a Decision Tree classifier learns to replicate that behavior.

```mermaid
flowchart LR
    A["🎮 Rule-Based Agent
    Ball physics & bounce prediction"] -->|auto-play| B["💾 Data Collection
    Per-frame features + actions"]
    B -->|train| C["🌳 Decision Tree
    Predict paddle movement"]
    C -->|deploy| D["🕹️ ML Agent
    Real-time inference"]
```
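As a rough sketch of stages two and three, the collected frames could be fed to scikit-learn like this (the file name, feature values, and hyperparameters are illustrative, not necessarily those used in the repo):

```python
import pickle
from sklearn.tree import DecisionTreeClassifier

# Hypothetical per-frame training rows:
# [ball_x, ball_y, delta_x, delta_y, direction, platform_x, frame]
X = [
    [93, 395, 7, -7, 1, 75, 10],
    [100, 388, 7, -7, 1, 80, 11],
    [107, 381, 7, -7, 1, 85, 12],
]
# Expert actions for those frames: -1 = MOVE_LEFT, 0 = NONE, +1 = MOVE_RIGHT
y = [1, 1, 0]

# Fit a Decision Tree on the expert's frame-by-frame decisions.
clf = DecisionTreeClassifier(max_depth=10)
clf.fit(X, y)

# Serialize the trained model for the real-time agent to load.
with open("model.pickle", "wb") as f:
    pickle.dump(clf, f)

# At play time, predict a command from the current frame's features.
command = int(clf.predict([[107, 381, 7, -7, 1, 85, 12]])[0])
```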

### State Features

The model receives these features each frame:

| Feature | Description |
| --- | --- |
| `ball_x`, `ball_y` | Current ball position |
| `delta_x`, `delta_y` | Ball velocity vector |
| `direction` | Encoded ball direction (4 quadrants) |
| `platform_x` | Current paddle position |
| `frame` | Frame number |

### Actions

The model predicts one of three paddle commands: `MOVE_LEFT` (−1), `MOVE_RIGHT` (+1), or `NONE` (0).

### Key Design Decisions

  • Bounce prediction accounts for wall reflections using an even/odd parity method to compute the final landing X coordinate.
  • Brick collision is considered โ€” the agent simulates ball reflection off bricks to adjust the predicted landing point.
  • Randomized thresholds in the expert script add natural variance to the training data, improving model robustness.

📂 View game rules & details →


## Swimming Squid — Reinforcement Learning

A competitive 2-player ocean foraging game. Each squid eats food for points, avoids garbage, and can collide with the opponent for bonus/penalty scoring. The agent learns an optimal movement policy entirely through Q-Learning.

### Approach

The environment is discretized by computing a weighted score for each of the four movement directions (food value divided by distance, plus opponent threat), then ranking those scores into a compact state tuple. A tabular Q-Learning agent explores this state space over 150 training rounds with decaying ε-greedy exploration.

```mermaid
flowchart LR
    A["🌊 Environment
    Foods, garbage, opponent"] -->|observe| B["📏 State Quantization
    Score each direction
    by value ÷ distance"]
    B -->|rank| C["🔢 Discrete State
    4D rank tuple
    e.g. (2,0,3,1)"]
    C -->|ε-greedy| D["📊 Q-Table
    4×4×4×4×4"]
    D -->|action| E["🦑 Agent
    UP / DOWN / LEFT / RIGHT"]
    E -->|reward| D
```

### State Representation

Each frame, the agent processes all visible food and the opponent into four directional buckets (UP, DOWN, LEFT, RIGHT). Items are scored by `value / (distance + 1)` and summed per direction. The four sums are then rank-ordered (0–3), producing a compact 4D state.
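A minimal sketch of that quantization, assuming simplified item tuples and helper names that may differ from the actual `handleData.py`:

```python
import math

UP, DOWN, LEFT, RIGHT = 0, 1, 2, 3

def quantize_state(me, items):
    """Collapse the scene into a 4-tuple of direction ranks.

    Each item contributes value / (distance + 1) to the bucket of its
    dominant direction relative to the agent; the four bucket sums are
    then rank-ordered, 0 = best direction ... 3 = worst.
    """
    scores = [0.0, 0.0, 0.0, 0.0]
    mx, my = me
    for (x, y, value) in items:            # value < 0 for garbage/threats
        dx, dy = x - mx, y - my
        dist = math.hypot(dx, dy)
        if abs(dx) >= abs(dy):
            bucket = RIGHT if dx > 0 else LEFT
        else:
            bucket = DOWN if dy > 0 else UP
        scores[bucket] += value / (dist + 1)

    # Rank directions by score, best first; state[d] = rank of direction d.
    order = sorted(range(4), key=lambda d: scores[d], reverse=True)
    state = [0] * 4
    for rank, d in enumerate(order):
        state[d] = rank
    return tuple(state)
```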

### Training Configuration

| Parameter | Value | Strategy |
| --- | --- | --- |
| State space | 4⁴ × 4 = 1,024 entries | Rank-based discretization |
| Exploration (ε) | 1.0 → 0.01 | Linear decay over 150 rounds |
| Learning rate (α) | 1.0 → 0.01 | Linear decay over 150 rounds |
| Discount (γ) | 0.9 | — |
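With these hyperparameters, each step would follow the standard tabular Q-Learning update; a sketch in which the array layout and function names are illustrative:

```python
import numpy as np

GAMMA = 0.9
N_ROUNDS = 150

# 4 ranks per direction plus 4 actions -> 4^4 x 4 = 1,024 entries.
Q = np.zeros((4, 4, 4, 4, 4))

def linear_decay(round_idx, start=1.0, end=0.01, total=N_ROUNDS):
    """Linearly anneal a hyperparameter from start to end over `total` rounds."""
    return max(end, start - (start - end) * round_idx / total)

def update(state, action, reward, next_state, alpha):
    """One tabular Q-Learning step: Q <- Q + alpha * (TD target - Q)."""
    td_target = reward + GAMMA * np.max(Q[next_state])
    Q[state + (action,)] += alpha * (td_target - Q[state + (action,)])

def choose_action(state, epsilon, rng=np.random.default_rng()):
    """Decaying epsilon-greedy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(4))
    return int(np.argmax(Q[state]))
```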

### Reward Shaping

The reward is based on alignment between the chosen action and the optimal direction ranking — the agent receives higher reward for moving toward the direction with the best score.
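One hypothetical way to express that alignment reward, assuming the rank tuple from the previous section (the actual constants and scaling may differ):

```python
def shaped_reward(action, state):
    """Reward the agent for picking a highly ranked direction.

    `state[action]` is the rank (0 = best, 3 = worst) of the chosen
    direction, so the reward falls off as the chosen rank worsens.
    """
    return 3 - 2 * state[action]   # best direction: +3 ... worst: -3
```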

📂 View game rules & details →


## TankMan — Hybrid BFS + Reinforcement Learning

A team-based tank battle game combining BFS pathfinding for resource management with Q-Learning for combat aiming. The agent switches between two behavioral modes depending on the tactical situation.

### Approach

The agent operates a priority-based decision loop: when fuel or ammo is low, it uses BFS on a discretized grid map to navigate to the nearest supply station. When an enemy is within range, it switches to a Q-Learning policy that controls turret aiming and firing decisions.

```mermaid
flowchart TD
    A["🎯 Decision Loop"] --> B{"Low fuel or ammo?"}
    B -->|Yes| C["🗺️ BFS Pathfinding
    Grid map → nearest station"]
    B -->|No| D{"Enemy within 300px?"}
    D -->|Yes| E["🤖 Q-Learning Combat
    Aim & shoot policy"]
    D -->|No| F["🧱 Wall Destruction
    Scan & clear obstacles"]

    C -->|FORWARD / BACKWARD
    TURN_LEFT / TURN_RIGHT| G["🕹️ Execute Action"]
    E -->|SHOOT / AIM_RIGHT
    AIM_LEFT / BACKWARD| G
    F -->|SHOOT / AIM_RIGHT
    random move| G
```

### BFS Navigation

The map is discretized into a 50×30 grid. Walls, stations, teammates, and enemies are projected onto this grid. The BFS search operates in a 3D state space (row, col, angle) — considering the tank's facing direction — and returns the shortest path to the nearest fuel or bullet station.
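A simplified sketch of BFS over a (row, col, facing) state space, reduced to four facings for brevity (the actual pathfinder presumably works at the game's finer angle granularity and also considers `BACKWARD`):

```python
from collections import deque

FORWARD, TURN_LEFT, TURN_RIGHT = "FORWARD", "TURN_LEFT", "TURN_RIGHT"
# Facing index -> (drow, dcol): 0 = up, 1 = right, 2 = down, 3 = left.
STEP = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}

def bfs_to_station(grid, start, goal):
    """Shortest action sequence on a (row, col, facing) state space.

    `grid` is a 2D list where truthy cells are walls; `start` is
    (row, col, facing); `goal` is (row, col). Turning and moving each
    cost one step, so BFS returns the fewest-action path.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (r, c, f), path = queue.popleft()
        if (r, c) == goal:
            return path
        dr, dc = STEP[f]
        candidates = [
            ((r + dr, c + dc, f), FORWARD),
            ((r, c, (f - 1) % 4), TURN_LEFT),
            ((r, c, (f + 1) % 4), TURN_RIGHT),
        ]
        for (nr, nc, nf), action in candidates:
            if (0 <= nr < rows and 0 <= nc < cols
                    and not grid[nr][nc] and (nr, nc, nf) not in seen):
                seen.add((nr, nc, nf))
                queue.append(((nr, nc, nf), path + [action]))
    return None  # no reachable station
```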

### Q-Learning Combat Controller

When an enemy enters range, the agent computes a state vector for the Q-table:

| State Dimension | Values | Description |
| --- | --- | --- |
| `angle_diff` | 0–8 | Discretized angle between gun and enemy (45° bins) |
| `turning_direction` | 0–1 | Clockwise vs. counter-clockwise to target |
| `is_cooldown` | 0–1 | Whether the gun is on cooldown |
| `teammate_angle_diff` | 0–8 | Angle to nearest teammate (friendly fire avoidance) |

The Q-table has shape `(9, 2, 2, 9, 5)`, mapping states to five actions: `SHOOT`, `AIM_RIGHT`, `AIM_LEFT`, `BACKWARD`, and a fallback wall-destruction mode.
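Greedy lookup against a table of that shape could look like the following sketch (the action index order is an assumption, not necessarily the repo's):

```python
import numpy as np

ACTIONS = ["SHOOT", "AIM_RIGHT", "AIM_LEFT", "BACKWARD", "WALL_MODE"]

# Shape (9, 2, 2, 9, 5): four state dimensions, five actions.
q_table = np.zeros((9, 2, 2, 9, 5))

def combat_action(angle_diff, turning_direction, is_cooldown, teammate_angle_diff):
    """Pick the highest-valued combat action for the discretized state."""
    q_values = q_table[angle_diff, turning_direction, is_cooldown, teammate_angle_diff]
    return ACTIONS[int(np.argmax(q_values))]
```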

### Training Configuration

| Parameter | Value | Strategy |
| --- | --- | --- |
| State space | 9 × 2 × 2 × 9 × 5 = 1,620 entries | Angle-based discretization |
| Exploration (ε) | 1.0 → 0.01 | Linear decay over 170 rounds |
| Learning rate (α) | 1.0 → 0.01 | Linear decay over 170 rounds |
| Discount (γ) | 0.9 | — |

📂 View game rules & details →


## Tech Stack

  • Language: Python 3.9
  • Game Framework: PAIA MLGame, Pygame 2.0.1
  • ML/RL: NumPy (tabular Q-Learning), scikit-learn (Decision Tree)
  • Pathfinding: Custom BFS with directional state space
  • Serialization: Pickle (Q-tables and trained models)

## Repository Structure (ML/RL related files)

```
.
├── README.md                  # ← You are here
├── arkanoid/
│   ├── README.md              # Game rules & details
│   └── ml/
│       ├── ml_play_template.py        # Rule-based expert + data collection
│       └── ml_play_model.py           # Trained Decision Tree agent
├── swimming-squid/
│   ├── README.md              # Game rules & details
│   └── ml/
│       ├── handleData.py              # State quantization + Q-Learning class
│       ├── Qlearning.py               # Training script
│       └── model_play.py              # Trained Q-table agent
└── tankman/
    ├── README.md              # Game rules & details
    └── ml/
        ├── data_handler.py            # State processing + Q-Learning class
        ├── find_station.py            # BFS pathfinder
        ├── wall_handler.py            # Wall detection & destruction
        ├── trainQL_play.py            # Training script
        └── ml_model_play.py           # Trained hybrid agent
```

## How to Run

```bash
# Install dependencies
pip install mlgame pygame numpy scikit-learn

# Run any game with its AI agent
python -m mlgame -i ./ml/ml_play_model.py ./ --level <N>
```

See each project's README for game-specific configuration and level options.


## Author

Harris — built as a portfolio project exploring the progression from rule-based AI to reinforcement learning in competitive game environments.
