AWS SCD Type 2 Data Pipeline (Project 3)

This project implements Slowly Changing Dimension (SCD) Type 2 on customer data using AWS S3 + AWS Glue (PySpark) + Athena.
It preserves full history of changes instead of overwriting customer attributes.

What is SCD Type 2?

When customer attributes (like city/state) change, SCD Type 2:

expires the old record (sets is_current=false and fills in end_date)
inserts a new record (is_current=true, end_date=NULL)
keeps historical versions for audit and analytics
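
The three steps above can be sketched in plain Python. This is only an illustration of the logic, not the Glue job from this repo: the function name apply_scd2, the choice of customer_id as the business key, the tracked columns, and the sample rows and dates are all assumptions made for the example.

```python
from datetime import date

# Assumption: changes in these columns trigger a new version.
TRACKED = ("customer_city", "customer_state")


def apply_scd2(dimension, snapshot, load_date):
    """Merge one snapshot (list of dicts) into the dimension and return a new list."""
    result = [dict(row) for row in dimension]  # copy so the input is not mutated
    current = {row["customer_id"]: row for row in result if row["is_current"]}
    for incoming in snapshot:
        existing = current.get(incoming["customer_id"])
        if existing is not None and all(existing[c] == incoming[c] for c in TRACKED):
            continue  # unchanged customer: keep the current version as it is
        if existing is not None:
            # Changed customer: expire the old version.
            existing["is_current"] = False
            existing["end_date"] = load_date
        # New or changed customer: insert a fresh current version.
        new_row = {**incoming, "start_date": load_date, "end_date": None, "is_current": True}
        result.append(new_row)
        current[incoming["customer_id"]] = new_row
    return result


# Made-up sample data: c1 moves between day 1 and day 2, c2 does not change.
day1 = [
    {"customer_id": "c1", "customer_city": "sao paulo", "customer_state": "SP"},
    {"customer_id": "c2", "customer_city": "curitiba", "customer_state": "PR"},
]
day2 = [
    {"customer_id": "c1", "customer_city": "rio de janeiro", "customer_state": "RJ"},
    {"customer_id": "c2", "customer_city": "curitiba", "customer_state": "PR"},
]

dim = apply_scd2([], day1, date(2024, 1, 1))
dim = apply_scd2(dim, day2, date(2024, 1, 2))
```

After both loads, dim holds three rows: the expired version of c1, the new current version of c1, and the untouched current version of c2.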

Architecture

S3 Raw (customers_day1.csv, customers_day2.csv)
→ AWS Glue PySpark (SCD Type 2 logic)
→ S3 Silver (Parquet dimension table)
→ Athena validation queries

S3 Structure

Raw:

s3://surya-project3-scd/raw/customers/customers_day1.csv
s3://surya-project3-scd/raw/customers/customers_day2.csv

Silver:

s3://surya-project3-scd/silver/customers_scd2/

Output Schema (Silver)

customer_id
customer_unique_id
customer_city
customer_state
start_date
end_date
is_current
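
As a rough sketch, one row of this schema could be modelled as a Python dataclass. The column types are assumptions (the README lists only column names), and the class name CustomerVersion is hypothetical. The check in __post_init__ encodes the rule stated earlier: a current row has end_date=NULL, an expired row has an end_date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerVersion:
    customer_id: str
    customer_unique_id: str
    customer_city: str
    customer_state: str
    start_date: date
    end_date: Optional[date]  # None (NULL) while this version is current
    is_current: bool

    def __post_init__(self):
        # A current row must have no end_date; an expired row must have one.
        if self.is_current != (self.end_date is None):
            raise ValueError("is_current and end_date disagree")


# Made-up example row.
row = CustomerVersion("c1", "u1", "sao paulo", "SP", date(2024, 1, 1), None, True)
```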

Proof (Athena Validation)

Result after running SCD:

is_current = true → 99441 rows
is_current = false → 10 rows

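The table holds 99,451 rows in total. The 10 expired rows are the previous versions of customers whose attributes changed between the two snapshots, which shows that history was preserved instead of being overwritten.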
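
A count like the one above typically comes from a GROUP BY on is_current. The sketch below runs that kind of query against an in-memory SQLite table as a local stand-in for Athena. The table name customers_scd2 is an assumption taken from the S3 prefix, the three sample rows are made up, and the actual queries used in this project may differ.

```python
import sqlite3

# Counts rows per is_current value; the same statement shape works in Athena.
VALIDATION_SQL = """
SELECT is_current, COUNT(*) AS row_count
FROM customers_scd2
GROUP BY is_current
ORDER BY is_current
"""

# Extra sanity check: no customer should have more than one current row.
DUPLICATE_CURRENT_SQL = """
SELECT customer_id
FROM customers_scd2
WHERE is_current
GROUP BY customer_id
HAVING COUNT(*) > 1
"""

conn = sqlite3.connect(":memory:")
# SQLite has no boolean type, so is_current is stored as 0/1 here.
conn.execute("CREATE TABLE customers_scd2 (customer_id TEXT, is_current INTEGER)")
conn.executemany(
    "INSERT INTO customers_scd2 VALUES (?, ?)",
    [("c1", 0), ("c1", 1), ("c2", 1)],  # made-up sample rows
)

counts = dict(conn.execute(VALIDATION_SQL).fetchall())
duplicates = conn.execute(DUPLICATE_CURRENT_SQL).fetchall()
```

With the sample rows, counts maps 0 (expired) to 1 row and 1 (current) to 2 rows, and duplicates is empty.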

Tech Stack

AWS S3
AWS Glue
PySpark
Athena
Parquet

Learning Outcome

Built a real-world historical dimension pipeline using SCD Type 2 logic with AWS Glue, and validated results using Athena queries.
