Kodotautas/Data-Engineering

Data Engineering

Data engineering projects and learnings:

  1. DE Trainings - Big data engineering homework assignments: SQLite, Parquet, file processing, database operations
  2. Apache Airflow - Workflow orchestration: DAGs, operators, scheduling, monitoring, security
  3. GCP Data Engineering - Full pipeline: Airflow, PostgreSQL, Data Lake, BigQuery, Dataproc, Composer
  4. Lithuania Statistics Pipeline - Dataflow ETL from Statistics Lithuania API to GCP Storage and BigQuery
  5. LT Transport Dashboard - Car/motorcycle market analytics: Dataflow, BigQuery, Looker Studio, scheduled pipeline
  6. VNO Airplane Spotting - Flask web app with flight data analysis, deployed on GCP App Engine
  7. Rust vs Python Performance - Data processing speed comparison with benchmarks and performance analysis
  8. Real-time Streaming (Rust) - Event-driven Rust pipeline: Pub/Sub → Cloud Run → BigQuery
  9. Gemini Pro Translator - LLM integration in BigQuery for multilingual text translation using SQL
  10. Rust DataFusion vs PySpark - 10-billion-row benchmark: DataFusion ran 2.7x faster than PySpark
  11. Google Pipe SQL - Readability comparison of standard SQL and Google's new pipe syntax
  12. BigQuery Optimization - Query performance tuning using historical execution patterns
  13. GCP Cost Optimization - Strategies for resource management, sustained use discounts, and preemptible instances
  14. Creative Data Engineering - Analysis of the creative aspects of data engineering, with visualizations
  15. GCP CMEK Integration - Customer-managed encryption key (CMEK) implementation for secure data ingestion

About

Various data engineering experiments done in free time