Senior Data Scientist | AI/ML Engineer | GenAI & Medical AI Specialist
Impact-driven Data Scientist with 5+ years of experience building production-grade AI/ML systems across Medical and Finance domains for Computer Vision, Audio, and Generative AI use cases. Proven track record in architecting real-time audio chatbots and knowledge-graph-based GraphRAG pipelines, fine-tuning large language models, developing multimodal agents, and leading cross-functional teams.
Currently: Data Scientist @ Mondee Pvt. Ltd. | Open to: Senior AI/ML roles, GenAI research collaborations, and consulting opportunities
| Platform | Link |
|---|---|
| 📧 Email | pradyumna.sahoo@outlook.in |
| 🔗 Portfolio | prady029.github.io |
| 📄 Resume | Download PDF |
| 🐙 GitHub | @Prady029 |
| @prady029 | |
| ✍️ Medium | @prady029 |
| 🐦 Twitter/X | @prady029 |
| @prady029 |
Full Fine-tuning • PEFT / LoRA / QLoRA • Instruction Tuning • RLHF / DPO • Mixed-precision (bf16/fp16) • Gradient Checkpointing • vLLM • Quantization (GPTQ / AWQ / bitsandbytes) • HuggingFace Transformers • DSPy • Unsloth
LangChain • LangGraph • LiveKit • Tool-use / Function Calling • Multi-agent Orchestration • Multimodal RAG • Model Context Protocol (MCP) • Google Agent Development Kit • Prompt Engineering
Neo4j • AWS Neptune • Qdrant • LanceDB • Knowledge Graph Construction • Agentic KG • RxNorm API • PubMed API • Drug Interaction Systems
PyTorch • Detectron2 • YOLOv8 • OpenCV • GradCAM++ • GAN / Synthetic Data • Instance Segmentation • spaCy • NLTK • ASR Fine-tuning • Swin2SR • NAFNet
Weights & Biases • MLflow • Docker • FastAPI • Apache Airflow • Apache Spark (Databricks) • ETL Pipeline Design • Process Mining • CI/CD for ML • Git
AWS Bedrock • AWS Neptune • AWS S3 / Lambda / EC2 • AWS OpenSearch • GCP Vertex AI • Azure ML • PostgreSQL • MongoDB • SQLite • Python • SQL • Bash
Mondee Pvt. Ltd. | Hyderabad, India | August 2025 – Present
- Architected a medical-grade GraphRAG audio chatbot for a clinical decision support system (CDSS) deployed across Surekha Hospital Chain and BhaktiVedant Hospital, constructing structured knowledge graphs from medical textbooks using NER and Neo4j
- Engineered a drug–drug interaction checker and dosage scheduler agent served through Model Context Protocol (MCP), integrating real-time data from RxNorm and PubMed APIs
- Led end-to-end fine-tuning and serving of Medgemma-27b-text-it for domain-specific clinical NLP tasks using Unsloth and vLLM, coordinating with research scholars from IIT Madras and IIT Hyderabad
- Directed large-scale Speech-to-Text (STT/ASR) data preparation and multimodal fine-tuning of gemma-3n-e2b-it via LoRA, building a medical voice agent for real-time clinical transcription
Tags: Medical GraphRAG • Clinical NLP • LLM Fine-tuning • Voice AI • Medgemma-27b • MCP
ADP India Pvt. Ltd. | Hyderabad, India | December 2023 – July 2025
- Designed and deployed a knowledge-graph-based RAG pipeline on AWS Neptune and Bedrock, orchestrated via LangGraph, enhancing financial data processing accuracy (recognized as runner-up in ADP Global Hackathon 2024)
- Developed an intelligent Process Mining Chatbot for Global Payroll Services using Microsoft Power Automate and Databricks to optimize payment workflows across 73 payroll cycles per client
- Engineered an agentic assistant using Google Agent Development Kit and AWS OpenSearch for context-aware email drafting and meeting scheduling, saving 24 hours per user per month
- Built a scalable QR code detection and decoding pipeline with fine-tuned YOLOv8 and OpenCV, sanitizing millions of payroll documents prior to downstream processing
- Developed an Indic PII detection and redaction system for financial regulatory compliance using fine-tuned NER models and IndicNLP
Tags: AWS Neptune • LangGraph • Process Mining • Agentic AI • Finance AI • YOLOv8 • Indic NLP
Claim Genius Pvt. Ltd. | Remote, India | June 2021 – December 2023
- Built a high-performance Instance Segmentation pipeline using Detectron2 served via FastAPI for vehicle parts identification, achieving 95% mAP and accelerating assessment throughput by 26%
- Engineered an ML pipeline failure tracing system using MLflow to monitor inference-time distributions and surface root-cause analysis, reducing manual diagnosis time by 30%
- Deployed an image super-resolution and denoising ensemble combining Swin2SR and NAFNet models, reducing model failures by 16% and improving downstream accuracy by 12%
- Built an automatic labeling error-detection service using Scikit-learn confidence scoring and trained PyTorch GAN models for synthetic image generation to resolve class imbalance
- Designed a geometric flat-tire detection approach requiring zero curated data, and boosted vehicle damage severity classification by 3% through a fusion ensemble combining a PyTorch CNN with XGBoost
Tags: Detectron2 • FastAPI • GradCAM++ • GAN • Instance Segmentation • MLflow • XGBoost
- 🕸️ Agentic Knowledge Graph Construction — DeepLearning.AI × Neo4j (Aug 2025)
- 🗃️ Neo4j Fundamentals — Neo4j GraphAcademy (Jul 2025)
- 🧠 Pretraining LLMs — DeepLearning.AI × Upstage (Feb 2025)
- 🔧 TensorFlow Developer Certificate — Coursera (2023)
- 🧠 Deep Learning Specialization — Coursera (2022)
- ✨ An Introduction To Practical Deep Learning — Intel - Coursera (2022)
- 💬 Technical Support Fundamentals — Google - Coursera (2021)
- M.Sc. Computer Science (Big Data Analytics) — Central University of Rajasthan, Kishangarh, India
- Integrated B.Sc. B.Ed. (Physical Sciences and Education) — Regional Institute of Education (NCERT), Bhubaneswar, India
Research paper implementation for improving multi-label classification on imbalanced datasets using the Label-Specific Feature learning algorithm. Tags: Multi-label • Feature Learning • Class Imbalance • Python
Master's thesis — hybrid deep learning approach combining Label-Specific Feature learning with MLSMOTE to address tail-labels in multi-label classification. Tags: Multi-label • Deep Learning • SMOTE • Thesis
Implementation of the LIFT algorithm for label-specific feature transformations to improve multi-label classification performance. Tags: Multi-label • Feature Learning • Classification • Python
AI-powered assistant for teachers that automatically creates lesson plans, drafts question sets, and supports classroom preparation workflows. Tags: GenAI • Education • AI • LLM • Agentic • Python
Curated collection of multi-label classification benchmarks used across research experiments on label imbalance, feature learning, and SMOTE-based augmentation. Tags: Dataset • Multi-label • Research • Python
Implementations of fuzzy sets, fuzzy logic inference systems, and fuzzy control applications from the Fuzzy Computing course at CURAJ. Tags: Fuzzy Logic • Control Systems • Academic • Jupyter
Collection of ML algorithm implementations from 2018 — supervised learning, unsupervised learning, and regression techniques built from first principles. Tags: Machine Learning • Scikit-learn • Foundations • Jupyter
Ongoing journey through Data Structures and Algorithms — curated problem sets, solutions, and notes in Python. Tags: DSA • Python • Problem Solving
→ View All Repositories on GitHub
Deep-dives into ML research, audio source separation, and multilingual NLP published on Medium and Analytics Drift.
- Salesforce Uses AWS Textract For Intelligent Document Automation
- Extracting Vocals And Instrumentals From Music The Deep Learning Way
- Microsoft Speller100: A Spell-Checker For Over 100 Languages
- A Deep Dive Into IBM Quantum Roadmap
- IIT Kanpur Offers Free 8-Weeks Computational Science Course
- Dealing With Racially-Biased Hate-Speech Detection Models
→ View All 42+ Articles on Analytics Drift
- 🔭 Building production-grade GraphRAG pipelines for medical question-answering systems
- 🧠 Fine-tuning domain-specific LLMs for healthcare and financial applications
- 🤖 Developing multimodal agentic systems with voice and vision capabilities
- 📚 Contributing to open-source ML/AI projects
- ✍️ Writing technical deep-dives on LLMs, knowledge graphs, and production ML systems
I'm always excited to discuss:
- GenAI & LLM applications and fine-tuning strategies
- Medical AI and healthcare tech innovations
- Production ML/MLOps architectures at scale
- Career opportunities in AI/ML engineering
- Research collaborations and open-source contributions
Feel free to reach out on LinkedIn, Twitter, or email me directly!

