Dockerized Airflow pipeline that scrapes NBA player prop lines using the scripts under `airflow_pipeline/api_scripts`, normalizes the results, and appends them to a Postgres database running in the same Docker Compose stack. Airflow is scheduled to run the `nba_props_dag` DAG daily.
- Docker Desktop (or Docker Engine) with Compose v2
- Optional: Python 3.10+ if you want to run the scraper scripts locally
- Configure

  ```bash
  git clone <repo>
  cd webscrape
  ```
  The Airflow container reads DB credentials from the environment (`DB_USERNAME`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`). Defaults target the Postgres service defined in `docker-compose.yml` (user `line_dancer`, password `sportsbook_data`, host `postgres:5432`, database `nba_deeplearning`). Update either `.env` or the compose file if you need different values.
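  For example, a `.env` along these lines overrides the defaults (the values shown are the documented defaults; swap in your own):

  ```
  DB_USERNAME=line_dancer
  DB_PASSWORD=sportsbook_data
  DB_HOST=postgres
  DB_PORT=5432
  DB_NAME=nba_deeplearning
  ```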
- Build and launch

  ```bash
  docker compose down   # keeps volumes; do NOT run `docker compose down -v`, which wipes the containers/volumes and the Postgres database
  docker compose up -d  # no need to build
  docker ps             # check container status
  ```

  This starts:

  - `postgres`: stores the scraped data (the `postgres_data` volume keeps rows between runs; exposed on `localhost:5433`)
  - `airflow`: runs Airflow 3 with the SequentialExecutor and SQLite metadata, but connects to the Postgres service for ETL output
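  If a container does not come up cleanly, the service logs are the quickest check (service names as defined in `docker-compose.yml`):

  ```bash
  docker compose logs -f airflow   # follow the Airflow scheduler/webserver output
  docker compose logs postgres     # confirm Postgres initialized and is accepting connections
  ```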
- Access Airflow

  - UI: http://localhost:8080
  - Credentials: `admin` / `M7fhanqeB2mUPSpe` (set in `docker-compose.yml`; rotate before sharing externally)
- Trigger the DAG

  - In the UI, unpause `nba_sportsbook_pipeline` and click “Trigger DAG”
  - Or via CLI:

    ```bash
    docker compose exec airflow airflow dags test nba_sportsbook_pipeline $(date +%Y-%m-%d)
    ```

  The DAG imports `migrate_to_postgres.py`, which runs the BettingPros, PrizePicks, and DraftEdge scrapers, validates the schema, and appends rows into the `player_lines` table inside Postgres.
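  To confirm the scheduler has picked the DAG up, the standard Airflow CLI is available inside the container (the DAG id below is the one shown in the UI):

  ```bash
  docker compose exec airflow airflow dags list                            # the DAG should appear once parsed
  docker compose exec airflow airflow tasks list nba_sportsbook_pipeline   # list the tasks the DAG defines
  ```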
- Inspect the database

  ```bash
  # from the host (Postgres is published on localhost:5433)
  psql -h localhost -p 5433 -U line_dancer -d nba_deeplearning

  # or directly inside the container
  docker exec -it webscrape-postgres-1 psql -U line_dancer -d nba_deeplearning
  ```

  Then, inside `psql`:

  ```sql
  \dt
  SELECT COUNT(*) FROM player_lines;
  ```
- Stop / clean up

  ```bash
  docker compose down     # keep scraped data
  docker compose down -v  # remove containers + the Postgres volume; DON'T do this unless you intend to delete the scraped data
  ```
To run the scrapers locally (optional; requires Python 3.10+):

- Install dependencies: `pip install -r requirements.txt`
- Export the same `DB_*` vars as the container and run `python airflow/dags/migrate_to_postgres.py` to append data without Airflow; see the example below.
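For example, pointing the script at the Compose-managed Postgres from the host (the values are the documented defaults; Postgres is published on `localhost:5433`):

```bash
export DB_USERNAME=line_dancer
export DB_PASSWORD=sportsbook_data
export DB_HOST=localhost   # from the host, Postgres is reachable on the published port
export DB_PORT=5433
export DB_NAME=nba_deeplearning
python airflow/dags/migrate_to_postgres.py
```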
- Airflow metadata stays on SQLite (default) so Postgres holds only the scraper output.
- Update secrets (`DB_PASSWORD`, the Airflow admin password) before committing or sharing the project.