Irfan Ali irfanalidv

👋 Hi, I’m Irfan

Data Scientist • AI Systems Architect • ML, NLP, LLMs & Scalable Data Engineering

I build production-grade AI systems that transform messy, multi-source data into real-time intelligence, automation, and decision-ready insights.

My work blends ML, NLP, LLMs, large-scale scraping, enrichment architectures, and agentic automation to design AI products that actually ship — not just experiments.

I specialize in:

🕸️ Universal scraping systems across 50+ dynamic, anti-bot-protected websites
🧩 Multi-source data enrichment engines integrating 20+ APIs with waterfall logic
🧠 LLM-powered extraction & automation workflows
⚙️ High-uptime data pipelines & research intelligence platforms
📡 Production APIs & real-time enrichment services

🚀 What I Do Best

AI Systems Architecture: End-to-end workflows, LLM automation, intelligent extraction, decision systems
Universal Web Scraping: Playwright/Selenium-based scrapers for dynamic JS, forms, pagination, bot protection
Multi-Source Data Enrichment: Apollo, PDL, ContactOut, SimilarWeb, Enrich.so, custom API routing & fallback pipelines
LLM-Driven Data Workflows: Classification, entity extraction, topic mapping, lead intelligence generation
Scalable Data Engineering: FastAPI services, data validation, auto-retry systems, queue-based workflows
Product & Team Leadership: Led global teams across India, Hong Kong, France & the US

🧠 Recent Role

Principal Data Scientist – AI & Scalable Data Engineering @ KurationAI

(Hong Kong — Remote)

At KurationAI, I built the foundational intelligence layer powering:

🌐 A universal web scraper deployed across 50+ global sources
🔗 A 20+ API enrichment engine with waterfall failovers, retries & key rotation
🧠 LLM-based classification & extraction pipelines
🔎 Similarity-search-driven lead intelligence datasets
⚡ Production-grade FastAPI services for real-time enrichment

Stack: Python, Playwright, Selenium, FastAPI, LangChain, MongoDB, RSS aggregation, GPT/Claude/Perplexity APIs
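
For readers unfamiliar with the term, the "waterfall failover with retries and key rotation" pattern mentioned above can be sketched roughly as follows. This is a minimal illustration, not the KurationAI implementation: the `waterfall_enrich` function, the provider callables, and the API keys are all hypothetical stand-ins, and a real system would likely add backoff, rate-limit handling, and per-provider field mapping.

```python
from itertools import cycle
from typing import Callable, Optional

# Hypothetical provider signature (an assumption for this sketch):
# takes a lookup query and an API key, returns enriched fields or None.
Provider = Callable[[str, str], Optional[dict]]


def waterfall_enrich(
    query: str,
    providers: list[tuple[str, Provider, list[str]]],
    max_attempts: int = 2,
) -> Optional[dict]:
    """Try providers in priority order; return the first non-empty result."""
    for name, fetch, api_keys in providers:
        keys = cycle(api_keys)  # rotate to the next key on every attempt
        for _ in range(max_attempts):
            try:
                result = fetch(query, next(keys))
            except Exception:
                continue  # treated as transient: retry with the next key
            if result:
                return {"source": name, **result}
            break  # provider answered but had no data: fall through
    return None


# Stub providers standing in for real enrichment APIs.
def flaky_provider(query: str, api_key: str) -> Optional[dict]:
    raise TimeoutError("simulated outage")


def empty_provider(query: str, api_key: str) -> Optional[dict]:
    return None


def good_provider(query: str, api_key: str) -> Optional[dict]:
    return {"company": "Example Corp", "key_used": api_key}


result = waterfall_enrich(
    "jane@example.com",
    [
        ("primary", flaky_provider, ["k1", "k2"]),
        ("secondary", empty_provider, ["k3"]),
        ("tertiary", good_provider, ["k4"]),
    ],
)
print(result)
```

Here the primary provider fails on both keys, the secondary returns nothing, and the tertiary supplies the record, so the caller gets one result tagged with the source that produced it.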

🏢 Past Experience

Head of Data & Analytics — Luminous Power Technologies

Built org-wide data strategy, BI platform, ML operations, and scalable pipelines.

Data Analytics & Automation — Lynk

Optimized expert-matching using NLP, automation, search & scalable data workflows.

Head of Data & Analytics — Brainsfeed

Built Infosphere, an NLP-powered research engine using 15+ extracted attributes.

Data Scientist — RightCust Technologies

Customer segmentation, forecasting, sentiment analysis.

Developer Evangelist — DevMetric

Built a university developer community; delivered technical workshops.

Data Visualization Developer — Datavis Tech (SF)

Interactive visualization systems using D3.js, Node.js, MongoDB.

🎓 Education

M.Sc. Data Science & AI — IISER Tirupati (2025–2026)
International Exchange — ISEP Paris (Data Science & Big Data Analytics)
B.Tech Computer Science Engineering — Alliance University

🧰 Tech Stack

Area	Tools
Languages	Python, R, SQL
ML/AI	LangChain, LangSmith, scikit-learn, LLM APIs
Scraping	Playwright, Selenium, Scrapy, PhantomBuster
APIs / Enrichment	Apollo, ContactOut, PDL, SimilarWeb, RSS
Cloud / DevOps	Azure, GCP, Docker, Azure DevOps
Data Engineering	FastAPI, REST APIs, MongoDB, PostgreSQL
Low/No-Code	Bubble.io, Airtable, Make.com, Zapier
Visualization	RStudio, Jupyter, Klipfolio, Power BI

🏆 Highlights

🏅 Winners — Philips Digital Healthcare Conclave 2015
🧠 Built intelligence platforms integrating 100+ data sources
📝 Research in neural-symbolic topic evolution & text analytics
🥇 Multiple Best Speaker awards

🔬 Featured Projects

⚡ Universal AI Web Scraper — Dynamic JS, anti-bot, forms, pagination
🔗 Multi-Source Enrichment Engine — 20+ APIs with smart fallback
🔍 Infosphere — NLP-driven research engine with Algolia
🚫 LLM-Powered Toxic Comment Classifier
🤖 Automated Lead Intelligence Platform

➡️ Check pinned repositories for demos & code.

📊 Development Activity

GitHub activity and contribution statistics available on demand
→ View live GitHub stats
Language usage breakdown
→ View top languages

🤝 Let’s Connect

I’m always open to collaborating on AI systems, enrichment engines, LLM automation, scalable pipelines, or research intelligence tooling.
Let’s build something impactful.
