DataProtagonist/README.md

Hi there! 👋 I'm Pranay Datta Kavukuntla

Data Engineer | Data Analyst | Cloud Enthusiast | Business Analyst

🔭 I am a Data Engineer with 3+ years of experience designing, building, and optimizing complex data pipelines, ETL processes, and data architectures. I enjoy solving business challenges with scalable data solutions and using data to drive decision-making.


💡 About Me

  • 🌐 Currently working as a Data Engineer Intern at Marlabs Inc., where I optimize data workflows using Azure, Databricks, and real-time ingestion tools like Kafka.
  • 💻 Skilled in Big Data frameworks like Apache Spark and Hadoop, with a strong command of SQL, Python, and ETL tools like SSIS and Azure Data Factory.
  • ☁️ Hands-on experience with cloud platforms including Azure, AWS, and Google Cloud Platform, focusing on serverless data management and real-time processing solutions.
  • 📊 Passionate about data visualization, using tools like Power BI, Tableau, and Qlik Sense to turn complex datasets into actionable insights.
  • 🔧 Proficient in data quality assurance, data modeling, and data governance, ensuring accuracy and consistency across diverse sources.

🛠️ Skills & Expertise

Programming & Scripting:

  • Languages: SQL, Python, PL/SQL, Bash
  • Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, TensorFlow
  • Automation: Git, GitHub, SQL Server Agent, CI/CD Pipelines

Big Data & Distributed Systems:

  • Frameworks: Apache Spark, Hadoop, HDFS, Hive, Apache Kafka, Databricks
  • Data Pipelines: ETL pipelines with SSIS, Azure Data Factory, real-time data ingestion with Kafka and Event Hubs

Cloud Platforms:

  • Microsoft Azure: Azure Data Factory, Azure Synapse Analytics, Azure SQL Database, Azure Event Hubs
  • AWS: S3, EC2, Lambda, IAM, RDS
  • Google Cloud Platform: BigQuery, Dataflow

Databases & Storage:

  • Relational Databases: MySQL, PostgreSQL, SQL Server
  • NoSQL: MongoDB, Cassandra
  • Distributed File Storage: HDFS

Data Visualization & BI Tools:

  • Visualization: Power BI, Tableau, Qlik Sense, MS Excel (Advanced)
  • Real-Time Dashboards: Developed and maintained business intelligence dashboards to track key performance indicators (KPIs)

Data Governance & Quality:

  • Data Lineage & Governance: Apache Atlas, Azure Purview
  • Data Quality: Implementing data validation, error handling, and governance frameworks for high data integrity

🚀 Professional Experience

  • Optimized ETL workflows across multiple industries using tools like Azure Data Factory, Databricks, and SSIS, improving data processing speeds by up to 25%.
  • Integrated real-time data from diverse sources with Apache Kafka and Azure Event Hubs, ensuring seamless ingestion and processing.
  • Led a cloud migration project, reducing operational costs by 25% while automating data transfer and optimizing cloud resources.
  • Developed 300+ ETL packages and optimized SQL queries, enhancing system performance by 40% and ensuring smooth legacy system integration.
  • Built Power BI dashboards and automated data pipelines that processed 1M+ records, reducing data processing times by 30% and improving decision-making efficiency.

🏆 Projects & Highlights

  • Real-time Data Ingestion: Integrated Apache Kafka and Azure Event Hubs for critical real-time operations, improving data flow efficiency.
  • Predictive Modeling: Built machine learning models using scikit-learn and TensorFlow to forecast research trends, boosting engagement by 15%.
  • Data Visualization Dashboards: Developed custom dashboards in Power BI and Tableau to monitor pipeline performance, enhancing decision-making and reducing downtime by 30%.
  • Financial Data Migration: Automated ETL processes using SSIS and Apache Spark, optimizing data transformation for high-volume environments.

🎯 What Drives Me

I’m passionate about using data to drive business impact by optimizing pipelines, building predictive models, and creating actionable dashboards. I thrive on continuous learning and staying up to date with the latest in data engineering and cloud technologies, and I enjoy collaborating across teams to solve complex problems and improve data infrastructure.


🌱 What I'm Currently Learning

  • Enhancing my skills in distributed computing and real-time data processing.
  • Experimenting with advanced machine learning techniques for predictive modeling and data analytics.

🤝 Let's Connect!

I’m always open to new opportunities and collaborations. Feel free to reach out if you’d like to discuss projects, exchange ideas, or explore how I can help with your data engineering challenges!

Pinned

  1. Neuroscan (Public, Jupyter Notebook)
  2. InsightMed (Public)
  3. Safecity (Public)