
🚆 SEPTA Delay Scraper – An open-source project that scrapes real-time SEPTA train schedules, delays, and GTFS data.


nathankong97/septa-delay


🚆 SEPTA Delay Scraper

SEPTA Delay Scraper is an open-source project that collects real-time and scheduled train data from SEPTA's public APIs, including:

  • Train positions (train_view.py)
  • Real-time trip updates (trip_updates.py)
  • GTFS schedule updates (rrschedules.py)
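For a sense of what train_view.py works with, here is a minimal sketch that flattens a TrainView-style JSON response into rows. It assumes each record carries trainno, line, nextstop and late (minutes behind schedule) fields; verify those names against a live response before relying on them. The parse_train_view helper is hypothetical and not taken from the repository.

```python
import json
from typing import Any


def parse_train_view(payload: str) -> list[dict[str, Any]]:
    """Flatten a TrainView-style JSON array into rows (hypothetical helper).

    Assumes each record has 'trainno', 'line', 'nextstop' and 'late' keys.
    Records without a train number are skipped.
    """
    rows = []
    for record in json.loads(payload):
        train_no = record.get("trainno")
        if not train_no:
            continue  # skip malformed records that cannot be keyed by train
        rows.append({
            "train_no": str(train_no),
            "line": record.get("line"),
            "next_stop": record.get("nextstop"),
            # 'late' is assumed to be minutes behind schedule; missing means 0
            "minutes_late": int(record.get("late") or 0),
        })
    return rows
```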

The real-time scrapers (train_view.py and trip_updates.py) run every 10 minutes, and rrschedules.py runs once a day; all results are stored in SQLite databases. The project is containerized with Docker, so it can be deployed on any server that runs Docker.
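Each 10-minute run can be thought of as appending one timestamped snapshot to SQLite, which is what makes historical delay data possible. The sketch below shows that pattern with the standard-library sqlite3 module. The train_positions table, its columns, and the store_snapshot helper are hypothetical, not the project's actual schema; rows are assumed to be dicts with train_no, line, next_stop and minutes_late keys.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Hypothetical table layout, not the project's actual schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS train_positions (
    scraped_at   TEXT    NOT NULL,  -- UTC ISO-8601 time of the scrape run
    train_no     TEXT    NOT NULL,
    line         TEXT,
    next_stop    TEXT,
    minutes_late INTEGER NOT NULL DEFAULT 0
)
"""


def store_snapshot(conn: sqlite3.Connection, rows: list, scraped_at: Optional[str] = None) -> int:
    """Append one scrape run's rows, all stamped with the same time; return the row count."""
    if scraped_at is None:
        scraped_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with conn:  # commits on success, rolls back if any insert fails
        conn.execute(SCHEMA)
        conn.executemany(
            "INSERT INTO train_positions "
            "(scraped_at, train_no, line, next_stop, minutes_late) VALUES (?, ?, ?, ?, ?)",
            [
                (scraped_at, r["train_no"], r.get("line"), r.get("next_stop"), r.get("minutes_late", 0))
                for r in rows
            ],
        )
    return len(rows)
```

Keeping one row per train per run means a train's delay history can later be read back by filtering on train_no and ordering by scraped_at.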


🎯 Features

  1. Scrapes real-time train positions
  2. Stores historical delay data
  3. Downloads & updates GTFS schedules
  4. Fully automated with cron jobs inside Docker
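The timing rules above (train_view.py and trip_updates.py every 10 minutes, rrschedules.py once a day) can be modeled as a small predicate, which is handy for checking a schedule before putting it into cron. This is a sketch under stated assumptions: the 03:00 daily slot and the due_scripts helper are hypothetical, since the project's actual crontab is not shown in this README.

```python
from datetime import datetime

REALTIME_SCRIPTS = ("train_view.py", "trip_updates.py")
DAILY_SCRIPT = "rrschedules.py"
DAILY_HOUR = 3  # assumption: hour of the once-a-day GTFS refresh


def due_scripts(now: datetime) -> list[str]:
    """Return the scripts a cron-style scheduler would start at minute `now`.

    Mirrors the cron expressions '*/10 * * * *' for the real-time scrapers
    and '0 3 * * *' for the daily schedule refresh (hypothetical hour).
    """
    due = []
    if now.minute % 10 == 0:
        due.extend(REALTIME_SCRIPTS)
    if now.hour == DAILY_HOUR and now.minute == 0:
        due.append(DAILY_SCRIPT)
    return due
```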

📥 Deployment Guide

1. Install Docker & Docker Compose

Before running the scraper, install Docker and Docker Compose. On Debian or Ubuntu:

sudo apt update && sudo apt install -y docker.io docker-compose

Verify installation:

docker --version
docker-compose --version

2. Clone the Repository

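gh repo clone nathankong97/septa-delay
cd septa-delay

If the GitHub CLI (gh) is not installed, clone with git instead: git clone https://github.com/nathankong97/septa-delay.git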

3. Build & Run the Scraper

docker-compose up -d --build

This will:

  • Build the image and install the Python dependencies
  • Schedule the scrapers (train_view.py and trip_updates.py every 10 minutes, rrschedules.py once a day)
  • Store SQLite databases in data/ and JSON files in scraping/
  • Persist logs in logs/
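For the JSON files in scraping/, anything reading them should never see a half-written file. A common way to guarantee that, sketched below with the standard library, is to write to a temporary file in the same directory and swap it into place with os.replace, which is atomic on POSIX filesystems. The save_raw_json helper is hypothetical, and file names are left to the caller.

```python
import json
import os
import tempfile
from pathlib import Path


def save_raw_json(data, directory: str, filename: str) -> Path:
    """Write `data` as JSON to directory/filename via a temp file and an atomic rename."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory (same filesystem) for os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh)
        os.replace(tmp_path, target_dir / filename)
    except BaseException:
        os.unlink(tmp_path)  # remove the partial temp file, then re-raise
        raise
    return target_dir / filename
```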

🐳 Managing the Scraper

Check Running Containers

docker ps

View Logs

docker logs septa_scraper

Stop the Scraper

docker-compose down

This stops and removes the containers created by docker-compose up.

Open a Shell in the Container

docker exec -it septa_scraper sh
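Once inside the container, a quick way to see what the scrapers have written is to list the tables in a database under data/ together with their row counts. The sketch below uses only the standard-library sqlite3 module and makes no assumption about the schema; the table_row_counts helper is hypothetical, and you supply the connection, for example sqlite3.connect("data/your-database.db") with your actual file name.

```python
import sqlite3


def table_row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in an open SQLite database."""
    names = [row[0] for row in conn.execute(
        # sqlite_master lists schema objects; skip SQLite's internal sqlite_* tables
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    counts = {}
    for name in names:
        # Identifiers cannot be bound as parameters, so quote them (doubling any '"').
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```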

πŸ“ Project Structure

septa-delay/
├── data/                 # Stores SQLite databases (Persistent)
├── logs/                 # Stores log files
├── scraping/             # Stores JSON files
├── septa/
│   ├── core/
│   │   ├── database.py   # Database handling
│   │   ├── fetcher.py    # API fetch logic
│   │   └── logger.py     # Logging system
│   ├── rrschedules.py    # Fetch GTFS data and final updates
│   ├── train_view.py     # Fetch live train positions
│   └── trip_updates.py   # Fetch real-time trip updates
├── config.py             # Configuration settings
├── Dockerfile            # Docker build instructions
├── docker-compose.yml    # Manages Docker services
├── run_scraper.sh        # Auto-runs all scrapers
├── requirements.txt      # Python dependencies
└── README.md
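rrschedules.py works with SEPTA's GTFS data. A GTFS feed is a zip archive of CSV files such as trips.txt, so reading one needs nothing beyond the standard library. The trips_per_route helper below is a hypothetical illustration, not code from the repository; it assumes you already have the feed's zip bytes, however they were downloaded.

```python
import csv
import io
import zipfile
from collections import Counter


def trips_per_route(gtfs_zip: bytes) -> Counter:
    """Count trips per route_id in a GTFS feed supplied as zip bytes.

    Reads trips.txt, whose rows the GTFS spec requires to carry route_id and trip_id.
    """
    with zipfile.ZipFile(io.BytesIO(gtfs_zip)) as feed:
        with feed.open("trips.txt") as raw:
            # utf-8-sig strips a leading byte-order mark if the feed has one
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return Counter(row["route_id"] for row in csv.DictReader(text))
```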
