I'm a data engineer building reliable data platforms across ingestion, streaming, transformation, orchestration, and lakehouse storage.
I work with:
- Kafka/Flink pipelines for low-latency event processing
- Spark/Glue jobs for large-scale transformations
- Hudi/S3 lakehouse tables for CDC-aware analytical storage
- Airflow DAGs for orchestration, retries, freshness checks, and backfills
- SQL/dbt-style modeling for reusable analytics layers
- Data quality, schema evolution, and access governance

My work combines independent ownership with measurable business impact: cost savings, latency reduction, operational scaling, and technical debt reduction.

I care about the unglamorous parts of data engineering: stable keys, idempotent writes, partitioning, skew, small files, late events, schema drift, and making sure downstream users can trust the data.

This GitHub is where I keep personal projects, experiments, interview prep, and engineering notes around data systems, analytics engineering, and LLM-assisted workflows.

Outside of work, I build personal projects in batch ETL and stream processing, and I compete in LLM-related coding competitions. I recently won a silver medal in a featured Kaggle LLM competition.
- Email: roy.ma9@gmail.com
- LinkedIn: linkedin.com/in/royma
- Visa Status: None needed — U.S. citizen and open to relocation.





