SEPTA Delay Scraper is an open-source project that collects real-time train data from SEPTA's public APIs, including:
- Train positions (train_view.py)
- Real-time trip updates (trip_updates.py)
- GTFS schedule updates (rrschedules.py)
The real-time scrapers run every 10 minutes, while rrschedules.py runs once a day. Collected data is stored in SQLite databases.
This project is containerized with Docker, making deployment easy on any server.
- Scrapes real-time train positions
- Stores historical delay data
- Downloads & updates GTFS schedules
- Fully automated with cron jobs inside Docker
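As a rough illustration of how one scrape could be stored as historical delay data, here is a minimal sketch using Python's built-in sqlite3 module. The table name `train_positions`, its columns, and the `store_positions` helper are assumptions made for this example and are not taken from the project's database.py.

```python
import sqlite3
from datetime import datetime, timezone


def store_positions(conn, trains, fetched_at=None):
    """Append one snapshot of train records to a hypothetical table.

    Returns the number of rows actually inserted; rows that repeat an
    existing (fetched_at, train_no) pair are ignored.
    """
    fetched_at = fetched_at or datetime.now(timezone.utc).isoformat()
    # Hypothetical schema: one row per train per scrape.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS train_positions (
               fetched_at   TEXT NOT NULL,
               train_no     TEXT NOT NULL,
               line         TEXT,
               late_minutes INTEGER,
               PRIMARY KEY (fetched_at, train_no)
           )"""
    )
    rows = [
        (fetched_at, t["train_no"], t.get("line"), t.get("late_minutes"))
        for t in trains
    ]
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO train_positions VALUES (?, ?, ?, ?)", rows
        )
    return conn.total_changes - before
```

Keying each row on the scrape timestamp plus the train number keeps every 10-minute snapshot, so delays can be compared over time, and makes re-running the same snapshot harmless.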
Before running the scraper, install Docker:

    sudo apt update && sudo apt install -y docker.io docker-compose

Verify installation:

    docker --version
    docker-compose --version

Clone the repository:

    gh repo clone nathankong97/septa-delay
    cd septa-delay

Build and start the scraper:

    docker-compose up -d --build

This will:
- Install dependencies
- Run scrapers every 10 minutes
- Store data in SQLite databases inside data/ and JSON files inside scraping/
- Persist logs inside logs/
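Once the container has been running for a while, it can be useful to check what has accumulated in data/. The sketch below lists each SQLite file and the row count of every table it contains. The `.db` file extension and the `summarize_databases` helper are assumptions for illustration; adjust the glob pattern to match the actual file names.

```python
import sqlite3
from pathlib import Path


def summarize_databases(data_dir="data"):
    """Map each *.db file in data_dir to {table_name: row_count}."""
    summary = {}
    for db_path in sorted(Path(data_dir).glob("*.db")):
        conn = sqlite3.connect(db_path)
        try:
            # sqlite_master lists every table defined in the database file.
            tables = [
                row[0]
                for row in conn.execute(
                    "SELECT name FROM sqlite_master "
                    "WHERE type = 'table' ORDER BY name"
                )
            ]
            summary[db_path.name] = {
                name: conn.execute(
                    f'SELECT COUNT(*) FROM "{name}"'
                ).fetchone()[0]
                for name in tables
            }
        finally:
            conn.close()
    return summary


if __name__ == "__main__":
    for db_name, tables in summarize_databases().items():
        print(db_name, tables)
```

Run it from the repository root on the host, since data/ is described as persistent storage.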
Check running containers:

    docker ps

View the scraper logs:

    docker logs septa_scraper

Stop the scraper:

    docker-compose down

Run a command inside the container:

    docker exec -it septa_scraper

Project structure:

septa-delay/
├── data/                  # Stores SQLite databases (persistent)
├── logs/                  # Stores log files
├── scraping/              # Stores JSON files
├── septa/
│   ├── core/
│   │   ├── database.py    # Database handling
│   │   ├── fetcher.py     # API fetch logic
│   │   └── logger.py      # Logging system
│   ├── rrschedules.py     # Fetch GTFS data and final updates
│   ├── train_view.py      # Fetch live train positions
│   └── trip_updates.py    # Fetch real-time trip updates
├── config.py              # Configuration settings
├── Dockerfile             # Docker build instructions
├── docker-compose.yml     # Manages Docker services
├── run_scraper.sh         # Auto-runs all scrapers
├── requirements.txt       # Python dependencies
└── README.md
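
The tree separates API fetch logic (fetcher.py) from the per-feed scripts such as train_view.py. As a hedged sketch of the step in between, the function below turns a raw JSON response into flat records ready for storage. The input field names (`trainno`, `line`, `late`, `lat`, `lon`) and the `parse_train_view` helper are assumptions for this example; check them against the actual API response before relying on them.

```python
import json


def parse_train_view(payload):
    """Convert a JSON array of train records into flat dicts.

    Entries that are missing required fields or hold non-numeric
    values are skipped rather than aborting the whole scrape.
    """
    trains = []
    for rec in json.loads(payload):
        try:
            trains.append(
                {
                    "train_no": str(rec["trainno"]),   # assumed field name
                    "line": rec.get("line"),
                    "late_minutes": int(rec.get("late", 0)),
                    "lat": float(rec["lat"]),
                    "lon": float(rec["lon"]),
                }
            )
        except (KeyError, TypeError, ValueError):
            continue  # malformed entry: skip it and keep the rest
    return trains
```

Skipping bad entries one at a time means a single malformed record does not cost the whole 10-minute snapshot.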