
ElevatEd - Student Success Monitoring Platform

Overall architecture

[Architecture diagram]

Cloning the repository

git clone https://github.com/nhungoc1508/S25-BDM-Project.git

Setting up the data simulation container

cd S25-BDM-Project
cd data-simulation
docker network create data-processing-network
docker compose up -d
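Before checking the logs, you can confirm that both containers came up (the names postgres_sis and mongo_counselors match the log commands below):

docker compose ps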

Check that the PostgreSQL server is running and data has been loaded:

docker logs postgres_sis | grep "Mock data"

You should see either Mock data inserted successfully! or Mock data already exists. Skipping insertion.

Check that the MongoDB server is running and data has been loaded:

docker logs mongo_counselors | grep "Mock data"

You should see Mock data inserted successfully!
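To inspect the mock data directly, you can open a client shell in each container. The psql user (postgres) and the adminCommand call below are assumptions; adjust them to match the credentials defined in the data-simulation compose file:

docker exec -it postgres_sis psql -U postgres -c "\dt"
docker exec -it mongo_counselors mongosh --eval "db.adminCommand({ listDatabases: 1 })"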

Setting up Delta Lake and the ingestion pipeline

Important

If you are running locally on a non-ARM64 machine, comment out this line in the docker-compose.yaml file (it appears three times: under spark-master, spark-worker-1, and spark-worker-2):

platform: linux/arm64
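After the edit, the relevant service definition would look roughly like this (the surrounding keys are illustrative; only the platform line matters):

spark-master:
  image: custom-spark
  # platform: linux/arm64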

Setting up Spark master and workers

Build custom images for Spark and Airflow:

cd ../delta-lake
docker build -t custom-airflow -f Dockerfile.airflow .
docker build -t custom-spark -f Dockerfile.spark .
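Verify that both images were built:

docker images | grep custom-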

Start MongoDB and Neo4j:

docker compose up counseling-db graph-db -d
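Check that both database containers are up:

docker compose ps counseling-db graph-db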

Start the master:

docker compose up spark-master -d

Check that the master is running:

docker logs spark-master | grep "I have been elected leader! New state: ALIVE"

Start the workers:

docker compose up spark-worker-1 spark-worker-2 -d

Check that all nodes are running and the workers are registered with the master:

docker logs spark-worker-1 | grep "Successfully registered with master spark://spark-master:7077"
docker logs spark-worker-2 | grep "Successfully registered with master spark://spark-master:7077"

If the workers fail to register, bring the Spark services down and repeat the previous steps:

docker compose down spark-master spark-worker-1 spark-worker-2 -v

Once running, the Spark master UI is available at localhost:8081/ and will show 2 alive workers.
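On a headless machine, the same check can be done against the UI endpoint (assuming the port mapping above):

curl -s http://localhost:8081 | grep -i "alive workers"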

Setting up Airflow

mkdir -p ./data ./logs ./plugins ./config
echo -e "AIRFLOW_UID=$(id -u)" > .env

docker compose up airflow-init
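The init container should exit with code 0 once the metadata database has been migrated; this can be verified directly (the container name airflow-init matches the log command below):

docker inspect --format '{{.State.ExitCode}}' airflow-init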

Check that Airflow is using PostgreSQL for metadata (and not SQLite):

docker logs airflow-init | grep "DB: postgresql+psycopg2"

Start the rest of the Airflow-related services:

docker compose up airflow-worker airflow-scheduler airflow-dag-processor airflow-apiserver airflow-triggerer airflow-cli flower -d

Check that the webserver UI is up and running:

docker logs airflow-apiserver | grep "Application startup complete"
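As an additional check, the API server exposes a health endpoint (a sketch, assuming an Airflow 3 deployment; on Airflow 2 the equivalent webserver endpoint is /health):

curl -s http://localhost:8080/api/v2/monitor/health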

The Airflow webserver is available at localhost:8080/.

Log in with username airflow and password airflow. After logging in, click on the Dags tab in the left menu bar; the UI will list all available DAGs.

Running DAGs to submit Spark ingestion jobs

Setting up connection to Spark master

In the Airflow webserver UI, go to Admin > Connections. Select Add Connection and add a connection to the Spark master with ID spark-default, type Spark, host spark://spark-master, and port 7077.
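Alternatively, the same connection can be created from the command line (a sketch; this runs the Airflow CLI inside the scheduler container, though any Airflow container works):

docker compose exec airflow-scheduler airflow connections add spark-default \
    --conn-type spark \
    --conn-host spark://spark-master \
    --conn-port 7077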


Running DAGs

Either trigger the DAGs manually or wait for scheduled runs, and monitor the DAG logs.
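The DAGs can also be listed and triggered from the CLI (<dag_id> is a placeholder for one of the IDs shown by the list command):

docker compose exec airflow-scheduler airflow dags list
docker compose exec airflow-scheduler airflow dags trigger <dag_id>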

About

Integrated project for courses at Universitat Politècnica de Catalunya, Spring 2025
