```bash
git clone https://github.com/nhungoc1508/S25-BDM-Project.git
cd S25-BDM-Project
cd data-simulation
docker network create data-processing-network
docker compose up -d
```

Check that the PostgreSQL server is running and that the mock data has been loaded:

```bash
docker logs postgres_sis | grep "Mock data"
```

You should see either `Mock data inserted successfully!` or `Mock data already exists. Skipping insertion.`
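Beyond the logs, you can also query the database inside the container; a minimal sketch, assuming the default `postgres` user (the actual user and database in this setup may differ):

```bash
# List the tables that the mock-data load created
docker exec -it postgres_sis psql -U postgres -c "\dt"
```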
Check that the MongoDB server is running and that the mock data has been loaded:

```bash
docker logs mongo_counselors | grep "Mock data"
```

You should see `Mock data inserted successfully!`
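Similarly for MongoDB; a minimal sketch using `mongosh`, assuming it is available in the image:

```bash
# List databases; the counselors data should appear among them
docker exec -it mongo_counselors mongosh --quiet --eval "db.adminCommand({ listDatabases: 1 })"
```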
**Important:** If you are running locally, comment out this line in the `docker-compose.yaml` file (it appears 3 times, under `spark-master`, `spark-worker-1`, and `spark-worker-2`):

```yaml
platform: linux/arm64
```
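For reference, a sketch of the change under `spark-master` (the same edit applies under the two workers); only the `platform` line itself is taken from the original file, the surrounding structure is illustrative:

```yaml
spark-master:
  # platform: linux/arm64   # commented out for local runs
```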
Build the custom images for Spark and Airflow:

```bash
cd ../delta-lake
docker build -t custom-airflow -f Dockerfile.airflow .
docker build -t custom-spark -f Dockerfile.spark .
```
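To confirm that both images were built, you can list them; a quick check:

```bash
docker images | grep custom-
```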
Start MongoDB and Neo4j:

```bash
docker compose up counseling-db graph-db -d
```
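You can verify that both database containers are up before moving on; a quick check:

```bash
docker compose ps counseling-db graph-db
```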
Start the Spark master:

```bash
docker compose up spark-master -d
```
Check that the master is running:

```bash
docker logs spark-master | grep "I have been elected leader! New state: ALIVE"
```
Start the workers:

```bash
docker compose up spark-worker-1 spark-worker-2 -d
```

Check that all nodes are running and that the workers are registered with the master:
```bash
docker logs spark-worker-1 | grep "Successfully registered with master spark://spark-master:7077"
docker logs spark-worker-2 | grep "Successfully registered with master spark://spark-master:7077"
```

If the workers fail to register, bring the Spark services down and repeat the previous steps:
```bash
docker compose down spark-master spark-worker-1 spark-worker-2 -v
```

Once running, the Spark master UI is available at localhost:8081/ and will show 2 alive workers.
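As a command-line alternative to the UI, the Spark master also serves its status as JSON; a minimal sketch, assuming the port mapping to 8081 used above:

```bash
# Each registered worker should report "status" : "ALIVE"
curl -s http://localhost:8081/json | grep '"status"'
```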
Next, set up Airflow. Create the required directories, record the Airflow user ID, and run the initialization service:

```bash
mkdir -p ./data ./logs ./plugins ./config
echo -e "AIRFLOW_UID=$(id -u)" > .env
docker compose up airflow-init
```

Check that Airflow is using PostgreSQL for metadata (and not SQLite):
```bash
docker logs airflow-init | grep "DB: postgresql+psycopg2"
```

Start the rest of the Airflow-related services:
```bash
docker compose up airflow-worker airflow-scheduler airflow-dag-processor airflow-apiserver airflow-triggerer airflow-cli flower -d
```

Check that the webserver UI is up and running:
```bash
docker logs airflow-apiserver | grep "Application startup complete"
```

The Airflow webserver is available at localhost:8080/.
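If you prefer a command-line check, you can confirm the webserver responds before opening a browser; a minimal sketch that just prints the HTTP status code:

```bash
# 200 (or a redirect code such as 302) means the webserver is serving requests
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/
```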
Log in with username `airflow` and password `airflow`. After logging in, click the Dags tab in the left menu bar; the webserver UI will list all available DAGs.
In the Airflow webserver UI, go to Admin > Connections. Select Add Connection and add a connection to the Spark master with connection ID `spark-default`, type Spark, host `spark://spark-master`, and port 7077.
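The same connection can also be created with the Airflow CLI instead of the UI; a minimal sketch, run inside one of the running Airflow containers (the choice of `airflow-scheduler` for `exec` is arbitrary):

```bash
docker compose exec airflow-scheduler airflow connections add spark-default \
    --conn-type spark \
    --conn-host spark://spark-master \
    --conn-port 7077
```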
Either trigger the DAGs manually or wait for their scheduled runs, and monitor the DAG logs.
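DAGs can also be listed and triggered from the CLI; a minimal sketch (`<dag_id>` is a placeholder for one of the DAG IDs shown in the UI):

```bash
docker compose exec airflow-scheduler airflow dags list
docker compose exec airflow-scheduler airflow dags trigger <dag_id>
```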