Applying Reinforcement Learning to Car Navigation

A 2D Unity simulation in which cars learn to navigate courses using reinforcement learning. This project extends Samuel Arzt's original simulation with a custom PPO (Proximal Policy Optimization) implementation in Python, replacing the original evolutionary training approach with gradient-based policy optimization.

Short demo video of an early version: https://youtu.be/rEDzUT3ymw4

Overview

Cars must navigate through a course without hitting walls or obstacles. Each car has five front-facing distance sensors (covering ~90 degrees) and a speed reading, forming a 6-dimensional observation vector. A neural network maps these observations to continuous actions: engine force and turning force.
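To make the observation and action layout concrete, here is a minimal sketch in plain Python. It is not the project's network: the single linear layer, the tanh squashing, the [-1, 1] action range, and all function names (`build_observation`, `policy_mean`, `sample_action`) are assumptions chosen for illustration. Only the dimensions (five sensors plus speed in, engine and turning force out) come from the description above.

```python
import math
import random

NUM_SENSORS = 5            # five front-facing distance sensors
OBS_DIM = NUM_SENSORS + 1  # plus one speed reading -> 6 dimensions
ACT_DIM = 2                # engine force, turning force


def build_observation(sensor_distances, speed):
    """Concatenate sensor readings and speed into one flat observation."""
    if len(sensor_distances) != NUM_SENSORS:
        raise ValueError(f"expected {NUM_SENSORS} sensor readings")
    return [float(d) for d in sensor_distances] + [float(speed)]


def policy_mean(obs, weights, biases):
    """Hypothetical policy head: one linear layer with tanh, one output per action."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, obs)) + b)
        for row, b in zip(weights, biases)
    ]


def sample_action(mean, log_std, rng):
    """Draw each action from an independent Gaussian, then clamp to [-1, 1]."""
    std = math.exp(log_std)
    return [max(-1.0, min(1.0, rng.gauss(m, std))) for m in mean]


# Illustrative values only; real weights would come from training.
rng = random.Random(0)
weights = [[rng.uniform(-0.5, 0.5) for _ in range(OBS_DIM)] for _ in range(ACT_DIM)]
biases = [0.0] * ACT_DIM

obs = build_observation([0.9, 0.7, 1.0, 0.6, 0.8], speed=0.3)
mean = policy_mean(obs, weights, biases)
engine_force, turning_force = sample_action(mean, log_std=-1.0, rng=rng)
print(len(obs), engine_force, turning_force)
```

Sampling from a Gaussian around the network output is what lets the policy explore during training; at evaluation time one would typically use the mean directly.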

Project Structure

Applying_RL/
├── Agent/                        # Python RL training code
│   ├── PPO/                      # Custom PPO implementation
│   │   ├── ppo.py                # PPOClip algorithm (policy + value networks)
│   │   └── rollout.py            # GAE-based rollout buffer
│   ├── car_agent/                # Unity car training & evaluation
│   │   ├── train_unity_car.py    # Train car agent with custom PPO
│   │   ├── eval_unity_car.py     # Evaluate trained models
│   │   └── train_optuna_unity.py # Hyperparameter search with Optuna
│   ├── custom_envs/              # Gymnasium environment scripts
│   │   ├── custom_env.py         # Custom test environments
│   │   ├── train_ppo.py          # Train on any Gymnasium env
│   │   ├── eval_ppo.py           # Evaluate custom PPO models
│   │   ├── train_stable.py       # Train with Stable-Baselines3 (baseline)
│   │   └── eval_stable.py        # Evaluate SB3 models
│   └── runables/                 # Shell scripts for training/eval
├── UnityProject/                 # Unity simulation source
│   └── Assets/Scripts/           # C# scripts (CarAgent, CarController, etc.)
├── Build/                        # Pre-built Unity executables
└── Images/                       # Demo assets

PPO Implementation

The custom PPO implementation (Agent/PPO/) features:

- Clipped surrogate objective with configurable clip range
- Generalized Advantage Estimation (GAE) for variance reduction
- Support for both continuous and discrete action spaces
- Gaussian policy (continuous) and Categorical policy (discrete)
- Configurable architecture via PPOConfig dataclass
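The two core ideas in this list, GAE and the clipped surrogate objective, can be sketched in a few lines of plain Python. This is an illustrative sketch of the standard formulas, not the code in ppo.py or rollout.py; the function names and the default values for `gamma`, `lam`, and `clip_range` are assumptions (common choices, not taken from this project).

```python
import math


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    rewards[t], values[t], dones[t] describe step t; last_value is the value
    estimate for the state after the final step, used for bootstrapping.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_range=0.2):
    """PPO-Clip policy loss (negated objective), averaged over samples."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)


# Tiny example: three steps, the episode ends on the last one.
advantages, returns = compute_gae(
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    dones=[False, False, True],
    last_value=5.0,
    gamma=1.0,
    lam=1.0,
)
loss = clipped_surrogate_loss([math.log(1.5)], [0.0], [1.0], clip_range=0.2)
print(advantages, returns, loss)
```

With `gamma = lam = 1` and zero value estimates, the advantages reduce to the remaining reward in the episode, and `last_value` is ignored because the final step is terminal. In the loss example the probability ratio of 1.5 is clipped to 1.2, which limits how far a single update can move the policy.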

The agent connects to Unity via the ML-Agents Python Low-Level API, allowing full control over the training loop.

Quick Start

Prerequisites

- Python 3.10 with the uv package manager
- Unity Editor (for development), or the pre-built executables in Build/

Installation

cd Agent
uv sync

Training

# Train with the Linux build (recommended); run from the repository root
cd Agent/runables
./train_unity_car.sh --time-scale 50.0

# Train with Unity Editor
./train_unity_car.sh --env editor

# Hyperparameter search with Optuna
./train_optuna_unity.sh --trials 20 --trial-steps 750000 --time-scale 50.0

Evaluation

cd Agent/runables
./eval_unity_car.sh --weights ../results/custom_ppo/ppo_final.pt --episodes 10

Monitoring

tensorboard --logdir Agent/results/custom_ppo/tensorboard
# Then open http://localhost:6006

For detailed usage, training options, and troubleshooting, see Agent/README.md.

The Simulation

The simulation can be run from the Unity Editor or using the built executables in Build/. Cars spawn on a track and must pass through checkpoints while avoiding walls. Reaching a checkpoint yields a reward; hitting a wall ends the episode.

Courses

Multiple courses of increasing difficulty are available as Unity scenes in UnityProject/Assets/Scenes/.

Gymnasium Benchmarks

The PPO implementation can also be tested on standard Gymnasium environments for validation:

cd Agent/custom_envs

# CartPole
python train_ppo.py --env CartPole-v1 --timesteps 100000

# Compare with Stable-Baselines3
python train_stable.py --env CartPole-v1 --timesteps 100000

License

This project is based on Applying EANNs by Samuel Arzt, licensed under the MIT License. The Unity simulation, car physics, and sensor system originate from that project.

The RL training infrastructure (custom PPO, Optuna integration, Gymnasium benchmarks) was developed by Eliana Ostro.
