Data Scientist • AI Systems Architect • ML, NLP, LLMs & Scalable Data Engineering
I build production-grade AI systems that transform messy, multi-source data into real-time intelligence, automation, and decision-ready insights.
My work blends ML, NLP, LLMs, large-scale scraping, enrichment architectures, and agentic automation to design AI products that actually ship, not just experiments.
I specialize in:
- Universal scraping systems across 50+ dynamic, anti-bot-protected websites
- Multi-source data enrichment engines integrating 20+ APIs with waterfall logic
- LLM-powered extraction & automation workflows
- High-uptime data pipelines & research intelligence platforms
- Production APIs & real-time enrichment services
- AI Systems Architecture: End-to-end workflows, LLM automation, intelligent extraction, decision systems
- Universal Web Scraping: Playwright/Selenium-based scrapers for dynamic JS, forms, pagination, bot protection
- Multi-Source Data Enrichment: Apollo, PDL, ContactOut, SimilarWeb, Enrich.so, custom API routing & fallback pipelines
- LLM-Driven Data Workflows: Classification, entity extraction, topic mapping, lead intelligence generation
- Scalable Data Engineering: FastAPI services, data validation, auto-retry systems, queue-based workflows
- Product & Team Leadership: Led global teams across India, Hong Kong, France & the US
(Hong Kong, Remote)
At KurationAI, I built the foundational intelligence layer powering:
- A universal web scraper deployed across 50+ global sources
- A 20+ API enrichment engine with waterfall failovers, retries & key rotation
- LLM-based classification & extraction pipelines
- Similarity-search-driven lead intelligence datasets
- Production-grade FastAPI services for real-time enrichment
Stack: Python, Playwright, Selenium, FastAPI, LangChain, MongoDB, RSS aggregation, GPT/Claude/Perplexity APIs
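For readers unfamiliar with the "waterfall" pattern mentioned above, here is a minimal sketch of the idea: try providers in priority order, retry transient failures while rotating API keys, and fall through to the next provider when one has no data. It is an illustration only, not the KurationAI implementation; `KeyRing`, `enrich`, and the provider callables are hypothetical names, and real providers would be HTTP clients rather than plain functions.

```python
from itertools import cycle
from typing import Callable, Iterable, Optional

# Hypothetical provider signature: (email, api_key) -> record dict, or None if no data.
Provider = Callable[[str, str], Optional[dict]]


class KeyRing:
    """Round-robin rotation over one provider's API keys (assumes at least one key)."""

    def __init__(self, keys: Iterable[str]) -> None:
        self._keys = cycle(list(keys))

    def next_key(self) -> str:
        return next(self._keys)


def enrich(
    email: str,
    providers: list[tuple[str, Provider, KeyRing]],
    max_retries: int = 2,
) -> Optional[dict]:
    """Walk providers in priority order and return the first non-empty record."""
    for name, call, keys in providers:
        for _ in range(max_retries + 1):
            try:
                record = call(email, keys.next_key())
            except ConnectionError:
                continue  # transient failure: retry this provider with the next key
            if record:
                return {"source": name, **record}
            break  # provider answered but had no data: fall through to the next one
    return None  # every provider failed or came back empty
```

Providers are passed as an ordered list, so the priority and the fallback order are the same thing; swapping the list order changes the waterfall without touching the loop.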
Built org-wide data strategy, BI platform, ML operations, and scalable pipelines.
Optimized expert-matching using NLP, automation, search & scalable data workflows.
Built Infosphere, an NLP-powered research engine using 15+ extracted attributes.
Customer segmentation, forecasting, sentiment analysis.
Built a university developer community; delivered technical workshops.
Interactive visualization systems using D3.js, Node.js, MongoDB.
- M.Sc. Data Science & AI, IISER Tirupati (2025–2026)
- International Exchange, ISEP Paris (Data Science & Big Data Analytics)
- B.Tech Computer Science Engineering, Alliance University
| Area | Tools |
|---|---|
| Languages | Python, R, SQL |
| ML/AI | LangChain, LangSmith, scikit-learn, LLM APIs |
| Scraping | Playwright, Selenium, Scrapy, PhantomBuster |
| APIs / Enrichment | Apollo, ContactOut, PDL, SimilarWeb, RSS |
| Cloud / DevOps | Azure, GCP, Docker, Azure DevOps |
| Data Engineering | FastAPI, REST APIs, MongoDB, PostgreSQL |
| Low/No-Code | Bubble.io, Airtable, Make.com, Zapier |
| Visualization | RStudio, Jupyter, Klipfolio, Power BI |
- Winners, Philips Digital Healthcare Conclave 2015
- Built intelligence platforms integrating 100+ data sources
- Research in neural-symbolic topic evolution & text analytics
- Multiple Best Speaker awards
- Universal AI Web Scraper: Dynamic JS, anti-bot, forms, pagination
- Multi-Source Enrichment Engine: 20+ APIs with smart fallback
- Infosphere: NLP-driven research engine with Algolia
- LLM-Powered Toxic Comment Classifier
- Automated Lead Intelligence Platform
Check pinned repositories for demos & code.
- GitHub activity and contribution statistics available on demand: View live GitHub stats
- Language usage breakdown: View top languages
I'm always open to collaborating on AI systems, enrichment engines, LLM automation, scalable pipelines, or research intelligence tooling.
Let's build something impactful.



