MS Data Science @ Indiana University • AI Agents • Data Engineering • Applied ML
Building agentic systems, data pipelines, and applied ML products that turn messy workflows into reliable software.
I am a Data Science graduate student at Indiana University focused on AI agents, data engineering, applied machine learning, and LLM-powered systems.
My work sits at the intersection of:
- AI agent infrastructure — MCP servers, agent marketplaces, trust layers, CLI safety wrappers, orchestration systems
- Data engineering — ELT pipelines, orchestration, analytics-ready modeling, streaming/data quality workflows
- Applied ML & LLM systems — recommendation engines, fine-tuning workflows, RAG systems, NLP, evaluation, and production-style ML tools
- Developer tooling — reusable packages, automation frameworks, local-first tools, and product prototypes
I like building systems that go beyond demos and have real architecture behind them: APIs, databases, queues, observability, workflow orchestration, and a clean developer experience.

AI agent infrastructure

| Project | Description | Tech / Focus |
|---|---|---|
| Meerkat | Local-first, agent-agnostic CLI security wrapper that protects host systems while autonomous AI agents execute terminal commands. Includes command interception, policy-based approvals, sandboxing, and human-in-the-loop review. | Go • JavaScript • Shell • Makefile • Agent Safety |
| PromptPatch | Local-first MCP server for dynamic system prompt management, enabling context-aware prompt patching, local storage, template validation, and prompt-injection guardrails. | Python • MCP • uv • Prompt Engineering |
| Deepnote-MCP | Prototype MCP server that turns Deepnote into an interactive data-agent environment with tools for SQL execution, Python execution, CSV loading, and chart rendering. | Python • MCP • DuckDB • Pandas • Deepnote |
| Axiomeer | Universal marketplace and API gateway for AI agents and AI services, supporting discovery, orchestration, provider fallbacks, workflow chaining, and centralized API key management. | Python • FastAPI • PostgreSQL • Redis • Docker |
| Verix | Cryptographic trust infrastructure for AI agents, supporting agent certificates, tamper-proof audit logs, Merkle inclusion proofs, and enterprise observability. | Python • FastAPI • PostgreSQL • Redis • Kubernetes • Helm • OpenTelemetry |
| Autonomous AI Company OS | Multi-agent operating framework that models company roles such as CEO, CTO, engineers, and DevOps agents, with organizational memory, RAG, and Redis-based communication. | Python • TypeScript • Redis Streams • Supabase • ChromaDB • LlamaIndex • Docker |

Data engineering

| Project | Description | Tech / Focus |
|---|---|---|
| Supply Chain AI Operating System | AI-native supply chain control tower for autonomous logistics and data disruption resolution, including multi-agent coordination, validation gates, self-healing workflows, and audit logging. | Python • TypeScript • Dagster • MLflow • PostgreSQL • Kafka • Docker |
| ELT Pipeline: Airflow + dbt + Postgres | Fully containerized ELT pipeline with orchestration, transformations, and analytics-ready modeling. | Airflow • dbt • PostgreSQL • Docker • Python |
| Blackjack Analytics Pipeline | End-to-end analytics pipeline for blackjack simulations and strategy analysis, including large-scale data processing and BI reporting. | AWS S3 • EC2 • SageMaker • PySpark • Power BI |
| Interactive ML Pipeline | Modular ML workflow for preprocessing, training, evaluation, and experimentation across datasets and models. | Python • Jupyter • scikit-learn • ML Pipelines |
| dbt-core Fork | Hands-on exploration and customization of the dbt-core analytics engineering codebase. | Python • dbt • Analytics Engineering |

Applied ML & LLM systems

| Project | Description | Tech / Focus |
|---|---|---|
| TextbookAPI | Converts PDFs into a searchable API for question answering using embeddings, FAISS, and local LLM workflows. | FastAPI • FAISS • PyMuPDF • Ollama • Embeddings |
| Structured Knowledge Notation / SKN | Token-efficient extraction format designed to represent confidence, causality, gaps, and epistemic metadata in LLM outputs. | LLM Extraction • Evaluation • Benchmarks |
| Movie Recommendation Engine | Recommendation system using collaborative filtering, matrix factorization, and PyTorch-based modeling. | Python • PyTorch • Recommender Systems |
| Algorithmic Trading System with AI Analysis | Trading system with strategy backtesting, risk management, and AI-powered interpretation using local LLMs. | Python • Ollama • Backtesting • Risk Analysis |
| LLM Fine-Tuning | Fine-tuning experiments focused on conversational style adaptation and training workflows. | LoRA • Datasets • Training Pipelines |
| ATS Resume Optimizer | Resume analysis and optimization tool focused on ATS heuristics, keyword alignment, and formatting improvements. | Python • NLP • Resume Analysis |
| Sentiment Analysis Projects | NLP projects for IMDb reviews and COVID-19 text, including preprocessing, feature extraction, and model evaluation. | Python • NLP • Classification |

Developer tooling

| Project | Description | Tech / Focus |
|---|---|---|
| UAutoml | Automated machine learning framework for data analysis, packaged as a reusable Python library. | Python • AutoML • PyPI |
| TicTacToeAI | Game AI package demonstrating algorithmic decision-making and Python packaging. | Python • AI • PyPI |
| AI Data Scraper | TypeScript web app for responsibly scraping and preprocessing website text for AI training datasets. | TypeScript • Web Scraping • Data Collection |
| Crypto News Scraper | Automated scraper for crypto market news collection and downstream NLP or signal analysis. | Python • Scraping • NLP |
| Android Permission Dataset Creation | Dataset tooling around Android permissions and activities for security/data analysis workflows. | Python • Dataset Engineering |
| Avantos / LodeAI / PermitDecoder / Mane / Timer / idea-tinder | Frontend and product prototypes across AI tooling, web apps, startup ideas, and coding challenges. | TypeScript • JavaScript • HTML • CSS |
AI Agents -> MCP servers, agent tools, orchestration, safety wrappers, trust layers
Data Engineering -> Airflow, dbt, Dagster, PostgreSQL, Kafka, Docker, analytics pipelines
LLM Applications -> RAG, embeddings, extraction, prompt systems, evaluation, fine-tuning
Applied ML -> recommender systems, NLP, sentiment analysis, backtesting, classification
Developer Tooling -> local-first tools, reusable packages, automation, CLI/security utilities
Product Engineering -> FastAPI backends, TypeScript prototypes, dashboards, APIs

- Building safer and more reliable AI agent infrastructure
- Exploring MCP servers for data, prompt, and developer workflows
- Designing agent trust, observability, and audit systems
- Shipping production-style data + ML pipelines
- Improving LLM applications with better retrieval, evaluation, and structured outputs
I am always interested in conversations around AI agents, applied ML, data engineering, LLM infrastructure, and startup/product ideas.
- Portfolio: ujjwalreddyks.com
- LinkedIn: linkedin.com/in/ujjwalreddyks
- Google Scholar: Scholar Profile
- Email: ujjwalreddyks@gmail.com
Building reliable AI systems, one agent and pipeline at a time.


