👨‍🎓
Busy working on Local LLMs.

Hi, I'm Pradyumna Kumar Sahoo 👋

Senior Data Scientist | AI/ML Engineer | GenAI & Medical AI Specialist

Impact-driven Data Scientist with 5+ years of experience building production-grade AI/ML systems across the Medical and Finance domains for Computer Vision, Audio, and Generative AI use cases. Proven track record in architecting real-time audio chatbots, knowledge-graph-based GraphRAG pipelines, fine-tuning large language models, developing multimodal agents, and leading cross-functional teams.

Currently: Data Scientist @ Mondee Pvt. Ltd. | Open to: Senior AI/ML roles, GenAI research collaborations, and consulting opportunities


📍 Connect With Me

📧 Email: pradyumna.sahoo@outlook.in
🔗 Portfolio: prady029.github.io
📄 Resume: Download PDF
🐙 GitHub: @Prady029
💼 LinkedIn: @prady029
✍️ Medium: @prady029
🐦 Twitter/X: @prady029
✈️ Telegram: @prady029

🧠 Skills & Technologies

🧠 LLM Training & Inference

Full Fine-tuning • PEFT / LoRA / QLoRA • Instruction Tuning • RLHF / DPO • Mixed-precision (bf16/fp16) • Gradient Checkpointing • vLLM • Quantization (GPTQ / AWQ / bitsandbytes) • HuggingFace Transformers • DSPy • Unsloth

🕸️ Agentic & RAG Systems

LangChain • LangGraph • LiveKit • Tool-use / Function Calling • Multi-agent Orchestration • Multimodal RAG • Model Context Protocol (MCP) • Google Agent Development Kit • Prompt Engineering

👁️ Knowledge Graphs & Vector Search

Neo4j • AWS Neptune • Qdrant • LanceDB • Knowledge Graph Construction • Agentic KG • RxNorm API • PubMed API • Drug Interaction Systems

⚙️ Computer Vision & NLP

PyTorch • Detectron2 • YOLOv8 • OpenCV • GradCAM++ • GAN / Synthetic Data • Instance Segmentation • spaCy • NLTK • ASR Fine-tuning • SWIN2SR • NAFNet

☁️ MLOps & Infrastructure

Weights & Biases • MLflow • Docker • FastAPI • Apache Airflow • Apache Spark (Databricks) • ETL Pipeline Design • Process Mining • CI/CD for ML • Git

☁️ Cloud & Databases

AWS Bedrock • AWS Neptune • AWS S3 / Lambda / EC2 • AWS OpenSearch • GCP Vertex AI • Azure ML • PostgreSQL • MongoDB • SQLite • Python • SQL • Bash


💼 Work Experience

Data Scientist

Mondee Pvt. Ltd. | Hyderabad, India | August 2025 – Present

  • Architected a medical-grade GraphRAG audio chatbot for a clinical decision support system (CDSS) deployed across Surekha Hospital Chain and BhaktiVedant Hospital, constructing structured knowledge graphs from medical textbooks using NER and Neo4j
  • Engineered a drug–drug interaction checker and dosage scheduler agent served through Model Context Protocol (MCP), integrating real-time data from RxNorm and PubMed APIs
  • Led end-to-end fine-tuning and serving of Medgemma-27b-text-it for domain-specific clinical NLP tasks using Unsloth and vLLM, coordinating with research scholars from IIT Madras and IIT Hyderabad
  • Directed large-scale Speech-to-Text (STT/ASR) data preparation and multimodal fine-tuning of gemma-3n-e2b-it via LoRA, building a medical voice agent for real-time clinical transcription

Tags: Medical GraphRAG • Clinical NLP • LLM Fine-tuning • Voice AI • Medgemma-27b • MCP


Senior Member Technical (AI/ML)

ADP India Pvt. Ltd. | Hyderabad, India | December 2023 – July 2025

  • Designed and deployed a Knowledge-Graph based RAG pipeline on AWS Neptune & Bedrock orchestrated via LangGraph, enhancing financial data processing accuracy (recognized as runner-up in ADP Global Hackathon 2024)
  • Developed an intelligent Process Mining Chatbot for Global Payroll Services using Microsoft Power Automate and Databricks to optimize payment workflows across 73 payroll cycles per client
  • Engineered an agentic assistant using Google Agent Development Kit and AWS OpenSearch for context-aware email drafting and meeting scheduling, saving 24 hours per user per month
  • Built a scalable QR code detection and decoding pipeline with fine-tuned YOLOv8 and OpenCV, sanitizing millions of financial payroll documents before downstream processing
  • Developed an Indic PII detection and redaction system for financial regulatory compliance using fine-tuned NER models and IndicNLP

Tags: AWS Neptune • LangGraph • Process Mining • Agentic AI • Finance AI • YOLOv8 • Indic NLP


Junior Data Scientist

Claim Genius Pvt. Ltd. | Remote, India | June 2021 – December 2023

  • Built a high-performance Instance Segmentation pipeline using Detectron2 served via FastAPI for vehicle parts identification, achieving 95% mAP and accelerating assessment throughput by 26%
  • Engineered an ML pipeline failure tracing system using MLflow to monitor inference-time distributions and surface root-cause analysis, reducing manual diagnosis time by 30%
  • Deployed an image super-resolution and denoising ensemble combining SWIN2SR and NAFNet models, reducing model failures by 16% and improving downstream accuracy by 12%
  • Built an automatic labeling error-detection service using Scikit-learn confidence scoring and trained PyTorch GAN models for synthetic image generation to resolve class imbalance
  • Designed a geometric flat-tire detection approach with zero curated data and boosted vehicle damage severity classification by 3% through a fusion ensemble combining PyTorch CNN with XGBoost

Tags: Detectron2 • FastAPI • GradCAM++ • GAN • Instance Segmentation • MLflow • XGBoost


🎓 Education & Certifications

Certifications

  • 🕸️ Agentic Knowledge Graph Construction — DeepLearning.AI × Neo4j (Aug 2025)
  • 🗃️ Neo4j Fundamentals — Neo4j GraphAcademy (Jul 2025)
  • 🧠 Pretraining LLMs — DeepLearning.AI × Upstage (Feb 2025)
  • 🔧 TensorFlow Developer Certificate — Coursera (2023)
  • 🧠 Deep Learning Specialization — Coursera (2022)
  • An Introduction to Practical Deep Learning — Intel - Coursera (2022)
  • 💬 Technical Support Fundamentals — Google - Coursera (2021)

Education

  • M.Sc. Computer Science (Big Data Analytics) — Central University of Rajasthan, Kishangarh, India
  • Integrated B.Sc. B.Ed. (Physical Sciences and Education) — Regional Institute of Education (NCERT), Bhubaneswar, India

🔬 Featured Projects

Research paper implementation for improving multi-label classification on imbalanced datasets using the Label-Specific Feature learning algorithm. Tags: Multi-label • Feature Learning • Class Imbalance • Python

Master's thesis — hybrid deep learning approach combining Label-Specific Feature learning with MLSMOTE to address tail-labels in multi-label classification. Tags: Multi-label • Deep Learning • SMOTE • Thesis
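MLSMOTE decides which labels are "minority" by comparing each label's imbalance ratio (IRLbl: the count of the most frequent label divided by that label's count) against the mean ratio (MeanIR), then synthesizes new instances for samples that carry those labels. The sketch below is a simplified, NumPy-only illustration of that idea, not the thesis code. It assumes dense numeric features with Euclidean neighbors, a binary label matrix in which every label occurs at least once, and a majority-vote labelset in the spirit of the paper's "ranking" strategy; all function names are hypothetical.

```python
import numpy as np


def imbalance_ratios(Y):
    """IRLbl: count of the most frequent label divided by each label's count."""
    counts = Y.sum(axis=0).astype(float)
    return counts.max() / counts  # assumes every label occurs at least once


def minority_labels(Y):
    """Labels whose IRLbl exceeds the mean imbalance ratio (MeanIR)."""
    ir = imbalance_ratios(Y)
    return np.flatnonzero(ir > ir.mean())


def mlsmote(X, Y, k=3, seed=0):
    """Create one synthetic instance for each sample carrying a minority label."""
    rng = np.random.default_rng(seed)
    new_X, new_Y = [], []
    for label in minority_labels(Y):
        bag = np.flatnonzero(Y[:, label] == 1)
        if len(bag) < 2:
            continue  # no neighbor to interpolate with
        kk = min(k, len(bag) - 1)
        for i in bag:
            # Euclidean k nearest neighbors within the minority bag, excluding i.
            d = np.linalg.norm(X[bag] - X[i], axis=1)
            d[bag == i] = np.inf
            nbrs = bag[np.argsort(d)[:kk]]
            ref = rng.choice(nbrs)
            # Features: random point on the segment between i and one neighbor.
            new_X.append(X[i] + rng.random() * (X[ref] - X[i]))
            # Labels (majority vote): keep labels held by more than half of i + neighbors.
            votes = Y[np.append(nbrs, i)].sum(axis=0)
            new_Y.append((votes > (kk + 1) / 2).astype(Y.dtype))
    if not new_X:
        return np.empty((0, X.shape[1])), np.empty((0, Y.shape[1]), dtype=Y.dtype)
    return np.array(new_X), np.array(new_Y)
```

Because every neighbor is drawn from the same minority bag, the minority label always survives the vote, so each synthetic row adds weight to exactly the tail labels the method targets.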

Implementation of the LIFT algorithm for label-specific feature transformations to improve multi-label classification performance. Tags: Multi-label • Feature Learning • Classification • Python
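LIFT builds a separate feature space for each label: that label's positive and negative training instances are clustered independently with k-means, and every instance is then re-described by its distances to those centroids, on which a per-label binary classifier is trained. Below is a minimal NumPy sketch of the feature-construction step only, assuming Euclidean distance, a hand-rolled Lloyd's k-means, and a ratio parameter `ratio` that sets m = ceil(ratio · min(|P|, |N|)) clusters per side. The function names are hypothetical and this is not the repository's code.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct training points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids


def lift_centroids(X, Y, ratio=0.1):
    """Per label, cluster positive and negative instances separately."""
    per_label = []
    for label in range(Y.shape[1]):
        pos, neg = X[Y[:, label] == 1], X[Y[:, label] == 0]
        if len(pos) == 0 or len(neg) == 0:
            raise ValueError(f"label {label} needs positive and negative instances")
        m = max(1, int(np.ceil(ratio * min(len(pos), len(neg)))))
        per_label.append(np.vstack([kmeans(pos, m), kmeans(neg, m)]))
    return per_label


def lift_transform(X, centroids):
    """Label-specific features: distance from each instance to the 2m centroids."""
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
```

For example, `[lift_transform(X, C) for C in lift_centroids(X, Y)]` yields one feature matrix per label; each would then feed its own binary classifier. Clustering positives and negatives separately keeps the same number of centroids on each side, so the new representation is not dominated by the (usually much larger) negative class.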

AI-powered assistant for teachers that automatically creates lesson plans, drafts question sets, and supports classroom preparation workflows. Tags: GenAI • Education • AI • LLM • Agentic • Python

Curated collection of multi-label classification benchmarks used across research experiments on label imbalance, feature learning, and SMOTE-based augmentation. Tags: Dataset • Multi-label • Research • Python

Implementations of fuzzy sets, fuzzy logic inference systems, and fuzzy control applications from the Fuzzy Computing course at CURAJ. Tags: Fuzzy Logic • Control Systems • Academic • Jupyter

Collection of ML algorithm implementations from 2018 — supervised learning, unsupervised learning, and regression techniques built from first principles. Tags: Machine Learning • Scikit-learn • Foundations • Jupyter

Ongoing journey through Data Structures and Algorithms — curated problem sets, solutions, and notes in Python. Tags: DSA • Python • Problem Solving

→ View All Repositories on GitHub


✍️ Latest Articles

Deep dives into ML research, audio source separation, and multilingual NLP, published on Medium and Analytics Drift.

→ View All 42+ Articles on Analytics Drift


🚀 What I'm Currently Working On

  • 🔭 Building production-grade GraphRAG pipelines for medical question-answering systems
  • 🧠 Fine-tuning domain-specific LLMs for healthcare and financial applications
  • 🤖 Developing multimodal agentic systems with voice and vision capabilities
  • 📚 Contributing to open-source ML/AI projects
  • ✍️ Writing technical deep dives on LLMs, knowledge graphs, and production ML systems

💬 Let's Connect!

I'm always excited to discuss:

  • GenAI & LLM applications and fine-tuning strategies
  • Medical AI and healthcare tech innovations
  • Production ML/MLOps architectures at scale
  • Career opportunities in AI/ML engineering
  • Research collaborations and open-source contributions

Feel free to reach out on LinkedIn, Twitter, or email me directly!


Built with ❤️ by Pradyumna Kumar Sahoo | Portfolio | Resume

Pinned

  1. LLSF_DL-MLSMOTE-Hybrid-for-handling-tail-labels

    My master's thesis, along with all code and datasets.

    Python 9 2

  2. LLSF-Learning-Label-Specific-Features-for-Multi-Label-Classifcation

    A from-scratch implementation of a research paper for improving classification on imbalanced datasets.

    Python 12 3

  3. LIFT-MultiLabel-Learning-with-Label-Specific-Features

    A from-scratch research paper implementation, built as a pet project :)

    Python 6 1

  4. Multillabel-Datasets

    Python 1