🌟 Senior Data Engineer

Hi there, happy to see you! 👋


I'm a data engineer building reliable data platforms across ingestion, streaming, transformation, orchestration, and lakehouse storage.

I work with:

  • Kafka/Flink pipelines for low-latency event processing
  • Spark/Glue jobs for large-scale transformations
  • Hudi/S3 lakehouse tables for CDC-aware analytical storage
  • Airflow DAGs for orchestration, retries, freshness checks, and backfills
  • SQL/dbt-style modeling for reusable analytics layers
  • Data quality, schema evolution, and access governance

My work combines independent ownership with measurable business impact: cost savings, latency reduction, operational scaling, and technical debt reduction.

I care about the unglamorous parts of data engineering: stable keys, idempotent writes, partitioning, skew, small files, late events, schema drift, and making sure downstream users can trust the data.
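To make "stable keys, idempotent writes, and late events" concrete, here is a minimal in-memory Python sketch. It is illustrative only and is not code from any of the projects below. Records are merged on a stable key and resolved by an ordering field, similar in spirit to a Hudi upsert with a precombine field. The `Event` record shape and the `upsert` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical CDC record shape: a stable business key, an ordering
    # field (e.g. event time or source sequence number), and a payload.
    key: str
    updated_at: int
    value: str


def upsert(table: Dict[str, Event], batch: Iterable[Event]) -> Dict[str, Event]:
    """Merge a batch into a keyed table, keeping the latest version per key.

    Replaying the same batch leaves the table unchanged (idempotent), and a
    late event with an older updated_at cannot overwrite newer data.
    """
    result = dict(table)  # never mutate the caller's table
    for event in batch:
        current = result.get(event.key)
        # Latest-wins on the ordering field; ties keep the existing row,
        # so re-applying an already-merged event is a no-op.
        if current is None or event.updated_at > current.updated_at:
            result[event.key] = event
    return result
```

Replaying a batch, or receiving an out-of-order update for `order-1` with `updated_at=15` after the version at `20` has landed, leaves the stored row at the newer `"shipped"` version. The key and ordering field are what make the write safe to retry.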

This GitHub is where I keep personal projects, experiments, interview prep, and engineering notes around data systems, analytics engineering, and LLM-assisted workflows.

Outside of work, I enjoy building personal projects (batch ETL and stream-processing pipelines) and entering LLM-focused coding competitions. I recently won a Silver Medal in a featured Kaggle LLM competition.

Architectures I've Worked With

Governed Data Marketplace / OLAP Lakehouse
[Figure: Governed Data Marketplace OLAP Lakehouse Architecture]
A governed banking data marketplace architecture coordinating backfills and CDC ingestion into a Spark/Glue + Hudi/S3 lakehouse, with Airflow orchestration, Lake Formation access controls, and Athena/Redshift Spectrum serving curated self-service data products.

Realtime + Lakehouse Analytics Platform
[Figure: Realtime Serving and Lakehouse Analytics Architecture]
A dual-path manufacturing data platform where Kafka decouples high-volume floor events, Flink/Aurora powers low-latency operational serving, and Spark/Hudi/S3 supports analytics and ML consumption through lakehouse storage.


Pinned

  1. GA4-Analytical-Pipeline

    A fully containerised batch ETL stack that ingests ~5M Google Analytics 4 records, transforms them with Spark, orchestrates the workflow in Airflow, and lands fact and dimension tables in Postgres for downstrea…

    Python · 1 star · 1 fork

  2. eonet-realtime-streaming

    Real-time streaming data engineering project that ingests, transforms, persists, and visualizes (with Mapbox) data about natural events as they happen, sourced from NASA's EONET APIs.

    Python

  3. Atomized-Tasks-Dataset

    A tabular dataset of 6,970 real-world workflows commonly used in SaaS, e-commerce, advertising, marketing, sales, customer support, and similar domains. Each row represents an atomic task: a m…

  4. Kaggle_Solutions

    My solutions to the featured Kaggle competitions I participated in.

    HTML · 2 stars · 4 forks

  5. Statistics_and_Probability_Concepts_Cheatsheet

    Notes for all four courses in the MITx Statistics and Data Science (SDS) program.

    HTML 3