Building Production AI Systems with Python + GenAI | Measurable business impact from Day 1
Most entry-level candidates have tutorial projects. I have production code with measurable business impact.
| Most Candidates | What I Bring |
|---|---|
| 🎓 Tutorial projects | ✅ Production ETL system (live, saving $15K/year) |
| ❌ No domain expertise | ✅ 15+ years business data experience |
| 📉 Basic skills | ✅ 4 years finance + 6 years trading domain expertise |
| 📦 Scattered portfolios | ✅ 8 production-grade projects with skills progression (incl. two flagships: AFC + Crucible) |
| 🤖 No AI integration | ✅ GenAI-first tools (LLM SDKs with Anthropic primary + local-first Qwen3/Ollama, RAG, FastMCP server, Multimodal AI, Pydantic structured outputs) |
| 📉 No AI evaluation | ✅ Evaluation-driven development (DeepEval + pytest, RAGAS, SelfCheckGPT, Docker in every project) |
Status: ✅ Live in production | 🌐 Public repository
Automated Python ETL pipeline for retirement plan distribution reconciliation at Daybright Financial.
| Metric | Impact |
|---|---|
| ⚡ Time Saved | ~95% reduction (4-6 hours/week → 15 min/week) |
| 💰 Cost Savings | $15,000+/year in labor costs |
| 📊 Scalability | 10x capacity (300+ accounts vs 30 manual) |
| ✅ Accuracy | Zero errors since deployment |
Tech: Python • pandas • openpyxl • matplotlib • data validation • pytest • GitHub Actions CI • faker (synthetic data)
→ View Full Documentation & Code
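To make the reconciliation idea concrete, here is a minimal standard-library sketch of a keyed comparison between two extracts. It is an illustration only, not the production code (which uses pandas): the field names `account_id` and `amount`, the one-cent tolerance, the one-row-per-account simplification, and the `reconcile` helper are all assumptions, and the rows are synthetic.

```python
from decimal import Decimal


def reconcile(source_rows, target_rows, tolerance=Decimal("0.01")):
    """Classify each account by comparing amounts across two extracts.

    Assumes one row per account in each extract (a simplification).
    """
    source = {row["account_id"]: Decimal(row["amount"]) for row in source_rows}
    target = {row["account_id"]: Decimal(row["amount"]) for row in target_rows}
    report = {
        "matched": [],
        "amount_mismatch": [],
        "missing_in_target": [],
        "missing_in_source": [],
    }
    # Walk the union of account IDs so nothing on either side is skipped.
    for account_id in sorted(source.keys() | target.keys()):
        if account_id not in target:
            report["missing_in_target"].append(account_id)
        elif account_id not in source:
            report["missing_in_source"].append(account_id)
        elif abs(source[account_id] - target[account_id]) <= tolerance:
            report["matched"].append(account_id)
        else:
            report["amount_mismatch"].append(account_id)
    return report


# Synthetic rows only; no real participant data.
source_rows = [
    {"account_id": "A100", "amount": "1500.00"},
    {"account_id": "A200", "amount": "250.75"},
    {"account_id": "A300", "amount": "980.10"},
]
target_rows = [
    {"account_id": "A100", "amount": "1500.00"},
    {"account_id": "A200", "amount": "205.75"},
    {"account_id": "A400", "amount": "40.00"},
]
report = reconcile(source_rows, target_rows)
```

Using `Decimal` rather than floats keeps cent-level comparisons exact, which matters when the output is an exception list someone has to act on.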
Each project introduces new skills that build on the previous — from pure Python ETL to multimodal AI, RAG, statistical research systems, and an autonomous trading platform.
🏗️ Production Standard: Every project ships with an architecture diagram (Mermaid), a Dockerfile, an evaluation metrics table, a demo GIF, and a "What I Learned" section.
🚩 Two flagships: Attention-Flow Catalyst (read-only small-cap swing research) and Crucible (autonomous intraday execution) — two genuinely different hard problems. Crucible is the chosen first build.
🔐 datavault-analyst | AI-Powered PII-Safe Data Intelligence | "Chat With Your Data"
Natural language analytics for retirement plan operations with PII protection and AI guardrails.
| Feature | Implementation |
|---|---|
| 🤖 AI Chat | LLM SDK (provider-agnostic) + PandasAI with code transparency |
| 🔒 PII Protection | Governance-as-code: PII leak prevention in AI responses |
| 📊 Hybrid Analytics | Pre-built dashboards + AI chat (works even without API key) |
| 🧩 Structured Outputs | Pydantic-validated AI responses |
Tech: Python • pandas • Streamlit • Gemini SDK • PandasAI • Pydantic • DeepEval • Docker • GitHub Actions CI
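One way to picture the "PII leak prevention in AI responses" guardrail is a post-processing filter that scans the model's answer before it reaches the user. The sketch below is a hedged illustration, not the project's actual governance code: the two regex patterns, the `PII_PATTERNS` table, and the `redact_pii` helper are assumptions, and a real deployment would likely need broader detection than two patterns.

```python
import re

# Hypothetical pattern table; real coverage would need to be wider.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact_pii(text):
    """Replace PII matches with placeholders and report which kinds were found."""
    findings = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED {label}]", text)
        if count:
            findings.append(label)
    return text, findings


# Synthetic example response from an LLM.
raw = "Participant 123-45-6789 (jane.doe@example.com) requested a rollover."
safe_text, findings = redact_pii(raw)
```

Returning the list of findings alongside the cleaned text lets the caller log or block the response rather than silently passing it through.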
📋 policypulse | AI-Powered HR Policy Chatbot | "Ask Your Policies" | 🔌 Exposes FastMCP server
RAG chatbot that answers employee policy questions with cited sources and auto-escalates to HR when uncertain.
| Feature | Implementation |
|---|---|
| 🔍 Semantic Search | Embeddings + ChromaDB vector store + similarity scoring |
| 📎 Cited Answers | Every response cites specific policy section & document |
| 🎫 Smart Escalation | Confidence < 0.7 → auto-generate HR ticket with context |
| 🧠 RAG Pipeline | Document → Chunk → Embed → Retrieve → Generate |
| 🔌 MCP Server | FastMCP exposes retrieval as MCP tools — Cursor/Claude Desktop integration |
Tech: Python • Anthropic SDK (primary, Gemini fallback) • ChromaDB • Gemini Embeddings • Streamlit • Pydantic • DeepEval • RAGAS • SelfCheckGPT • FastMCP • Docker • GitHub Actions CI
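The escalation rule in the table (confidence below 0.7 creates an HR ticket) can be sketched as a small routing function that runs after retrieval. This is an assumption-laden illustration: it treats the top retrieval similarity score as the confidence value, and the `RetrievedChunk` type, the `route` helper, and the ticket fields are hypothetical names rather than the project's real interfaces.

```python
from dataclasses import dataclass

ESCALATION_THRESHOLD = 0.7  # threshold stated in the feature table


@dataclass
class RetrievedChunk:
    document: str
    section: str
    text: str
    score: float  # similarity in [0, 1]; used here as the confidence proxy


def route(question, chunks, threshold=ESCALATION_THRESHOLD):
    """Answer with a citation when retrieval is confident, otherwise escalate."""
    best = max(chunks, key=lambda chunk: chunk.score, default=None)
    if best is None or best.score < threshold:
        # Hypothetical ticket payload carrying context for the HR team.
        return {
            "action": "escalate",
            "ticket": {
                "question": question,
                "best_score": None if best is None else best.score,
            },
        }
    return {
        "action": "answer",
        "citation": f"{best.document}, {best.section}",
        "context": best.text,
    }


# Synthetic retrieval results.
chunks = [
    RetrievedChunk("PTO Policy", "Section 2.1", "Employees accrue 1.5 days per month.", 0.82),
    RetrievedChunk("Travel Policy", "Section 4.3", "Book flights 14 days ahead.", 0.31),
]
confident = route("How much PTO do I accrue?", chunks)
uncertain = route("Can I expense a standing desk?", chunks[1:])
```

Keeping the threshold as a named constant makes it easy to tune against evaluation results instead of hard-coding it in several places.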
📄 formsense | AI-Powered Distribution Form Validator | "From Paper to Processing"
Multimodal AI system that reads retirement plan distribution forms (handwritten checkboxes, signatures), validates against business rules, and routes results.
| Feature | Implementation |
|---|---|
| 👁️ Vision AI | Gemini Vision reads checkboxes, handwriting, printed text |
| ✅ Validation | Business rule engine for ERISA-regulated distribution processing |
| 📧 Smart Routing | Complete → operations ticket |
| 📊 Confidence | Field-level extraction confidence scoring |
Tech: Python • Gemini Vision SDK • Streamlit • Pydantic • DeepEval • Docker • GitHub Actions CI
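Field-level confidence and rule-based routing can be combined roughly as follows. This is a minimal sketch using dataclasses instead of Pydantic so it runs without dependencies. The required field names, the 0.85 confidence floor, and the `manual_review` route for incomplete forms are assumptions made for illustration; only the "complete → operations ticket" path comes from the table above, and no real ERISA rule is encoded here.

```python
from dataclasses import dataclass

MIN_CONFIDENCE = 0.85  # hypothetical floor for trusting an extracted value
REQUIRED_FIELDS = ("participant_name", "distribution_type", "signature_present")  # hypothetical


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # extraction confidence in [0, 1]


def validate_form(fields):
    """Return a list of human-readable issues; an empty list means complete."""
    by_name = {field.name: field for field in fields}
    issues = []
    for name in REQUIRED_FIELDS:
        field = by_name.get(name)
        if field is None or not field.value.strip():
            issues.append(f"{name}: missing")
        elif field.confidence < MIN_CONFIDENCE:
            issues.append(f"{name}: low confidence ({field.confidence:.2f})")
    return issues


def route(fields):
    """Complete forms go to an operations ticket; anything else to review (assumed)."""
    issues = validate_form(fields)
    if not issues:
        return {"route": "operations_ticket", "issues": []}
    return {"route": "manual_review", "issues": issues}


# Synthetic extraction results.
complete = [
    ExtractedField("participant_name", "Pat Example", 0.97),
    ExtractedField("distribution_type", "rollover", 0.93),
    ExtractedField("signature_present", "yes", 0.91),
]
shaky = complete[:2] + [ExtractedField("signature_present", "yes", 0.60)]
```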
📊 Operations-Demand-Intelligence — Enterprise Analytics
AI-Powered Workflow Demand Analysis | 🚧 In Development
Analyzing 8+ months of OnBase workflow data to enable data-driven staffing decisions with AI-powered natural language queries.
| Feature | Implementation |
|---|---|
| 🔍 Demand Analysis | Volume patterns, Distribution vs Loan segmentation |
| 🤖 AI Integration | LLM SDK + PandasAI chat with guardrails + code transparency |
| 📊 Dashboard | Streamlit with interactive visualizations |
| 🔒 Data Privacy | PII handling, synthetic data for GitHub |
Tech: Python • pandas • Streamlit • Gemini SDK • PandasAI • Plotly • DeepEval • Docker • GitHub Actions CI
📺 streamsmart-optimizer | AI-Powered Streaming Subscription Rotation Advisor | "Spend Less, Watch More"
Consumer-facing dashboard that optimizes streaming subscriptions through AI-driven rotation scheduling, cost-per-view analytics, and content search.
| Feature | Implementation |
|---|---|
| 📺 Content Search | Watchmode + TMDB API integration ("Where can I watch X?") |
| 🤖 AI Rotation Planner | LLM analyzes habits + content calendar → optimal schedule |
| 💰 Savings Engine | Cost-per-view analytics + annual savings projections |
| 🛡️ Guardrails | Price validation, financial disclaimers, scope limits |
Tech: Python • httpx async • Watchmode/TMDB APIs • Streamlit • Gemini SDK • Pydantic • DeepEval • LangSmith • Docker • GitHub Actions CI
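The savings engine's two headline numbers, cost per view and projected annual savings, reduce to simple arithmetic. The sketch below is a hedged illustration: the service names, prices, and month counts are made up, and the `cost_per_view` and `annual_savings` helpers are hypothetical, not the project's API. Savings are defined here as the cost of keeping every service all year minus the cost of subscribing only in the planned months.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def cost_per_view(monthly_price, views_in_month):
    """Price divided by titles watched that month; None when nothing was watched."""
    if views_in_month <= 0:
        return None
    return (monthly_price / views_in_month).quantize(CENT, rounding=ROUND_HALF_UP)


def annual_savings(monthly_prices, active_months):
    """Always-on yearly cost minus the cost of the rotation plan."""
    always_on = sum(monthly_prices.values()) * 12
    rotated = sum(
        price * active_months.get(name, 0) for name, price in monthly_prices.items()
    )
    return always_on - rotated


# Made-up prices and a made-up rotation plan (months subscribed per year).
prices = {"ServiceA": Decimal("15.99"), "ServiceB": Decimal("9.99")}
plan = {"ServiceA": 4, "ServiceB": 6}

per_view = cost_per_view(prices["ServiceA"], 8)
savings = annual_savings(prices, plan)
```

With these inputs the always-on cost is 311.76 per year and the rotated cost is 123.90, so the projected saving is 187.86.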
📈 Attention-Flow Catalyst — 🚀 Flagship #1
AI-Powered Predictive Trigger Analysis for Small-Cap Stocks | 🚧 Phase 1A Active
Research Question: Which trigger or combination best predicts +10% price moves within 3 trading days?
Flagship project designed as a defensible research system with statistical methodology — evolves through all 5 career stages.
| Trigger | Data Source | Signal Type |
|---|---|---|
| T1: Insider Buy | SEC Form 4 (edgartools) | Smart Money |
| T2: Wiki Spike | Wikipedia API | Public Attention |
| T3: News Spike | RSS/GDELT | Media Coverage |
| T4: Volume | yfinance (5 sub-signals) | Institutional Activity |
| T5: Dilution State | SEC filings | Capital Structure |
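Every trigger in the table is scored against the same outcome from the research question: a +10% move within 3 trading days. One possible labeling function is sketched below. It assumes entry at the signal day's close and measures the move on subsequent closes, which is a simplification chosen for illustration; the project may define the move differently (for example on intraday highs). The `is_hit` name and its parameters are hypothetical.

```python
def is_hit(closes, signal_index, horizon=3, target=0.10):
    """True if any close in the next `horizon` bars is at least `target` above entry.

    Returns None when there are no forward bars to evaluate.
    """
    entry = closes[signal_index]
    window = closes[signal_index + 1 : signal_index + 1 + horizon]
    if not window:
        return None
    best_return = max(window) / entry - 1
    return best_return >= target


# Synthetic daily closes.
runner = [10.0, 10.2, 11.5, 10.9, 10.0]
drifter = [10.0, 10.5, 10.8, 10.9, 12.0]
```

In the second series the 12.0 close arrives on day 4, outside the 3-day window, so the signal on day 0 does not count as a hit.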
- ✅ Dynamic stock screener with survivorship bias controls
- ✅ 3+ years data collection for 50+ stocks
- ✅ Walk-forward validation (train Y1-2, test Y3)
- ✅ Bootstrap 95% confidence intervals
- ✅ Trigger leaderboard ranked by hit rate
- ✅ Forward signal generator
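The validation items above (train on years 1-2, test on year 3, then put a bootstrap 95% confidence interval around the hit rate) can be sketched with the standard library. This is a single train/test fold and a percentile bootstrap on synthetic outcomes, offered as an illustration under those assumptions; the helper names are hypothetical and the project's actual resampling scheme may differ.

```python
import random


def split_walk_forward(events, train_years, test_years):
    """Split (year, hit) events into train and test outcome lists by year."""
    train = [hit for year, hit in events if year in train_years]
    test = [hit for year, hit in events if year in test_years]
    return train, test


def hit_rate(outcomes):
    return sum(outcomes) / len(outcomes)


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap interval for the hit rate."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(outcomes)
    rates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = rates[int((alpha / 2) * n_resamples)]
    upper = rates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic events: (year, 1 if the trigger hit else 0).
events = (
    [(1, 1)] * 6 + [(1, 0)] * 14
    + [(2, 1)] * 5 + [(2, 0)] * 15
    + [(3, 1)] * 12 + [(3, 0)] * 28
)
train, test = split_walk_forward(events, train_years={1, 2}, test_years={3})
test_rate = hit_rate(test)
ci_low, ci_high = bootstrap_ci(test)
```

Reporting the interval next to the point estimate is what keeps a leaderboard honest when a trigger has only a few dozen test events.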
Phase 1B: AI-Powered Dashboard (Weeks 7-10)
- 📅 Streamlit multi-page app (6 pages)
- 📅 LLM SDK chat interface + PandasAI with SQL transparency
- 📅 AI guardrails (read-only, cost controls, disclaimers)
- 📅 Interactive trigger leaderboard & signal explorer
- 📅 Deployment on Streamlit Cloud
- 📅 Demo video recording
| Dimension | Implementation |
|---|---|
| Methodology | Walk-forward validation, de-clustering, transaction costs |
| Bias Controls | Historical universe snapshots, corporate actions handling |
| Modern Stack | DuckDB + Parquet lakehouse, httpx async collectors |
| AI Features | LLM SDK (provider-agnostic) + PandasAI with guardrails, SQL transparency |
| AI Evaluation | DeepEval + pytest, elevated faithfulness thresholds (0.9), CI/CD integrated |
Tech: Python • DuckDB • Parquet • httpx async • edgartools • Anthropic SDK (primary, Gemini/OpenAI fallback) • PandasAI • Streamlit • DeepEval • SelfCheckGPT + FActScore • Docker • GitHub Actions CI
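De-clustering, listed under Methodology, is commonly done by keeping one signal per burst so that overlapping events are not counted as independent observations. The sketch below shows one simple variant under stated assumptions: signals are `(ticker, date)` pairs, a signal is dropped if it falls within `min_gap_days` calendar days of the last kept signal for the same ticker, and calendar days stand in for trading days. The `decluster` helper is hypothetical and the project's rule may be different.

```python
from datetime import date


def decluster(signals, min_gap_days=3):
    """Keep a signal only if it is more than `min_gap_days` after the last kept one."""
    kept = []
    last_kept = {}  # ticker -> date of the most recently kept signal
    for ticker, day in sorted(signals, key=lambda s: (s[1], s[0])):
        previous = last_kept.get(ticker)
        if previous is None or (day - previous).days > min_gap_days:
            kept.append((ticker, day))
            last_kept[ticker] = day
    return kept


# Synthetic signals for two made-up tickers.
signals = [
    ("ABC", date(2024, 1, 2)),
    ("ABC", date(2024, 1, 3)),   # one day after the first ABC signal
    ("XYZ", date(2024, 1, 3)),
    ("ABC", date(2024, 1, 10)),  # a fresh event eight days later
]
independent = decluster(signals)
```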
🔥 Crucible — 🚀 Flagship #2 (started first)
Autonomous Intraday Trading Research Platform | 🚧 Phase 1 | 🦙 Local-First AI
The question it answers, for any strategy: Does this have a real edge that survives out-of-sample validation — and can an autonomous agent trade it without me babysitting it?
Strategy-agnostic platform that takes any intraday strategy through three validation gates: backtest → paper → live. An LLM research analyst proposes strategy improvements, but each idea has to pass deterministic backtests that the LLM never optimizes against. The out-of-sample data sits in a sealed vault, and every peek is logged in an overfitting-budget ledger. That separation is what makes the results defensible rather than an overfit black box.
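The sealed vault and its ledger can be pictured as a small wrapper that owns the out-of-sample data, records every access, and refuses further access once a fixed budget is spent. The class below is a minimal sketch of that idea, not Crucible's implementation: `SealedVault`, `OverfittingBudgetExceeded`, the ledger fields, and the budget size are all hypothetical.

```python
from datetime import datetime, timezone


class OverfittingBudgetExceeded(RuntimeError):
    """Raised when the out-of-sample data has been looked at too many times."""


class SealedVault:
    def __init__(self, oos_returns, max_peeks):
        self._oos_returns = tuple(oos_returns)  # never handed out directly
        self._max_peeks = max_peeks
        self.ledger = []  # one entry per peek

    def peek(self, strategy_name, evaluate, reason):
        """Run `evaluate` on the sealed data and log the access."""
        if len(self.ledger) >= self._max_peeks:
            raise OverfittingBudgetExceeded(
                f"budget of {self._max_peeks} peek(s) already spent"
            )
        # Log first so that a failing evaluation still consumes budget.
        entry = {
            "strategy": strategy_name,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        }
        self.ledger.append(entry)
        entry["result"] = evaluate(self._oos_returns)
        return entry["result"]


# Synthetic out-of-sample trade returns and a budget of one peek.
vault = SealedVault(oos_returns=[0.01, -0.02, 0.03], max_peeks=1)
wins = vault.peek(
    "orb-v1",
    lambda returns: sum(r > 0 for r in returns),
    reason="final check before paper trading",
)
```

A second call to `vault.peek(...)` raises `OverfittingBudgetExceeded`, which is the point: each look at held-out data is a scarce, audited resource.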
Distinct from AFC (why two flagships, not redundancy): AFC is read-only research on illiquid sub-$5 small-caps over a multi-day swing horizon; Crucible is autonomous execution on liquid names over an intraday horizon. ~70% shared engineering spine, two different hard problems.
| Phase | What It Produces | Stage | Real money? |
|---|---|---|---|
| 1 — Backtest Engine | Own event-driven harness + AI research loop + sealed OOS vault (IT-1 ORB + VWAP Reclaim plugins) | Stage 1 | No |
| 2 — Paper Agent | Migrate to NautilusTrader (engine-parity gate); autonomous paper-trading agent crew (LangGraph); local Qwen3/Ollama analyst | Stages 2–3 | No |
| 3 — Live Agent | Autonomous live micro-sizing on Alpaca + Schwab/TOS; deterministic core + multi-agent oversight | Stages 3–4 | Yes (small) |
| Dimension | Implementation |
|---|---|
| Integrity controls | Sealed out-of-sample vault, logged overfitting budget, walk-forward CV |
| Engine trust | Own harness ↔ NautilusTrader engine-parity gate (validate the tool, don't just trust it) |
| AI safety | LLM behind "the Wall" (sees aggregates, never raw rows); deterministic core owns every trade |
| Plugin design | Strategies are plugins (Protocol + ABC + registry); IT-1 ORB + VWAP Reclaim prove the abstraction |
| Privacy + cost | Local-first (Qwen3/Ollama) — no API fee, financial data stays on-machine |
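The plugin design row (Protocol + ABC + registry) can be sketched in a few lines. This is an illustration under assumptions rather than Crucible's real interfaces: the `Strategy` protocol, the `BaseStrategy` class, the `register` decorator, and the bar dictionary shape are hypothetical, and the toy strategy reads "ORB" as an opening range breakout, which the source does not spell out.

```python
from abc import ABC, abstractmethod
from typing import Optional, Protocol, runtime_checkable

REGISTRY = {}  # strategy name -> strategy class


@runtime_checkable
class Strategy(Protocol):
    """Structural interface the engine depends on."""

    def on_bar(self, bar: dict) -> Optional[str]: ...


class BaseStrategy(ABC):
    """Optional shared base; plugins may subclass it for common behavior."""

    name: str

    @abstractmethod
    def on_bar(self, bar: dict) -> Optional[str]:
        """Return an order intent such as "BUY", or None to do nothing."""


def register(cls):
    """Class decorator that adds a strategy to the registry by name."""
    if cls.name in REGISTRY:
        raise ValueError(f"duplicate strategy name: {cls.name}")
    REGISTRY[cls.name] = cls
    return cls


@register
class OpeningRangeBreakout(BaseStrategy):
    name = "orb"

    def __init__(self, range_bars=2):
        self.range_bars = range_bars
        self._seen = 0
        self._range_high = None

    def on_bar(self, bar):
        self._seen += 1
        if self._seen <= self.range_bars:
            # Still building the opening range: track its high, emit nothing.
            high = bar["high"]
            self._range_high = high if self._range_high is None else max(self._range_high, high)
            return None
        return "BUY" if bar["close"] > self._range_high else None


# Synthetic bars fed to a strategy looked up by name.
strategy = REGISTRY["orb"]()
bars = [
    {"high": 10.5, "close": 10.2},
    {"high": 10.8, "close": 10.6},
    {"high": 10.7, "close": 10.6},
    {"high": 11.2, "close": 11.0},
]
intents = [strategy.on_bar(bar) for bar in bars]
```

The engine only needs the registry and the `on_bar` contract, so adding a second strategy means adding one decorated class and nothing else.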
Tech: Python • own event-driven backtest harness → NautilusTrader (LGPL, free) • Optuna • DuckDB • Parquet • Ollama/Qwen3 (local-first) → Gemini → Anthropic → OpenAI • Pydantic • LangGraph • Alpaca (paper + live) + Schwab Trader API/TOS (live) • DeepEval • Docker • GitHub Actions CI
⚖️ Educational/research project. Not investment advice; makes no claim of positive expectancy — validation is the entire point.
| Type | Repository | Description |
|---|---|---|
| 🏭 Production System | 1099_reconciliation_pipeline | Live ETL system ($15K/year savings) |
| 🔐 First AI Project | datavault-analyst | PII-safe natural language data analytics |
| 📋 RAG Foundation | policypulse | HR policy chatbot with citations & escalation |
| 📄 Document AI | formsense | Multimodal form extraction & validation |
| 📊 Enterprise Analytics | operations-demand-intelligence | AI-powered workflow demand analysis |
| 📺 Consumer AI | streamsmart-optimizer | Streaming subscription rotation advisor |
| 📈 Flagship #1 | attention-flow-catalyst | AI-powered predictive trigger system (swing research) |
| 🔥 Flagship #2 | crucible | Autonomous intraday trading platform (backtest → paper → live) |
| 📖 Learning Journey | learning_journey | Public documentation |
📚 Data Portfolio Hub: Central repository linking all projects with business context, technical details, and impact metrics.
Languages & Dev Tools
Data Analysis & Processing
Data Collection
Visualization & BI
Testing & CI/CD
Containerization
AI & GenAI Tools
AI Evaluation Frameworks
Trading & Backtesting (Crucible)
APIs & Domain Expertise
Domain Expertise: 15+ years business data • 4 years financial services • 6 years active trading
Planned skills for GenAI Data Engineer + AI Systems Architect role (Months 6-15):
Cloud: AWS (S3, Redshift, Lambda, Glue), BigQuery
Containers: Docker & Kubernetes Masterclass
Orchestration: Apache Airflow, dbt (data build tool)
Big Data: PySpark, distributed computing
Databases: PostgreSQL, Vector DBs (Pinecone/Weaviate/Qdrant)
AI Systems: RAG infrastructure, embedding pipelines, unstructured data ETL
Certification: AWS Certified Data Engineer Associate
- 🚧 CS50: Introduction to Computer Science (Harvard)
- 🚧 Python for Everybody Specialization (University of Michigan)
- 🚧 Google Data Analytics Professional Certificate
- 🚧 IBM Data Analyst Professional Certificate (11 courses)
- 📅 Statistics with Python Specialization (University of Michigan)
- 🚧 LLM SDK Engineering — Provider-agnostic architecture (Gemini, OpenAI, Claude)
- 🚧 IBM Generative AI Engineering Professional Certificate (16 courses) — RAG, LangChain, fine-tuning (LoRA/QLoRA), deployment (Stage 1 primary)
- 🚧 Generative AI Data Analyst Specialization (Vanderbilt) — ChatGPT+SQL workflows, CLUE/TRUST/CAPTURE frameworks
- 🚧 ChatGPT Prompt Engineering for Developers (DeepLearning.AI) — API basics, prompt design, few-shot learning
- 🚧 AI Python for Beginners (DeepLearning.AI) — Andrew Ng's AI-first Python foundation
- 📅 30 Days of Streamlit Challenge — Build AI UIs fast
- 📅 Building & Evaluating Advanced RAG (DeepLearning.AI) — RAG Triad metrics, evaluation-driven development
- 📅 Docker for Beginners with Hands-on Labs (KodeKloud/Coursera) — Dockerfile, Docker Compose, containerization
- ⚡ Cursor AI IDE — Primary editor with Composer mode
- ⚡ ChatGPT Plus — Advanced Data Analysis workflows
- ⚡ VS Code — Jupyter notebooks & data exploration
| Stage | Certifications & Focus |
|---|---|
| Stage 2 | AWS Certified Data Engineer, Docker & Kubernetes Masterclass, BigQuery, Vector DBs (Pinecone/Weaviate), RAG Infrastructure |
| Stage 3 | Deep Learning Specialization, NVIDIA DLI, Generative AI with LLMs (AWS), Fine-Tuning with PEFT, Ollama (powers Crucible Phase 2 local LLM analyst) |
| Stage 4 | Agentic AI (Andrew Ng), MCP (Anthropic), LangGraph, CrewAI, Multi-Agent Systems (powers Crucible Phase 2–3 agent crew) |
| Stage 5 | Automated Testing for LLMOps, System Design, Production AI Evaluation |
- 📈 6+ years active trading (swing & day strategies)
- 🌅 4:30 AM club (early morning focused study)
- ♟️ Chess enthusiast (strategy translates to markets!)
- 🤖 Fascinated by LLMs transforming financial analysis
- 📚 Reading: Machine Learning for Algorithmic Trading + Hands-On LLMs
- 🎯 Obsessed with data-driven decision making
- 💪 Proving systematic learning beats raw talent
📊 37-Month Progression Plan
While I have a clear technical roadmap for long-term growth, my immediate focus is mastering the Data Analyst domain and delivering value in production environments.
| Stage | Duration | Role | Status |
|---|---|---|---|
| 1 | Months 1-5 | GenAI-First Data Analyst & AI Engineer | 🟢 ACTIVE |
| 2 | Months 6-15 | GenAI Data Engineer + AI Systems Architect | ⚪ Planned |
| 3 | Months 16-29 | ML Engineer + Local LLM Specialist | ⚪ Planned |
| 4 | Months 30-34 | Agentic AI Engineer & LLM Specialist | ⚪ Planned |
| 5 | Months 35-37 | Senior LLM Engineer | 🎯 Goal |
💭 Building in Public Philosophy
I've worked with data my entire career—manufacturing, bookkeeping, financial services, and 6 years of active trading. One truth became clear: data-driven decisions consistently outperform intuition.
Then I hit a ceiling: I could analyze data well but couldn't build the automated systems needed to scale those insights. So I'm building them myself, and doing it publicly, because career transformation shouldn't be mysterious.
Why share openly?
- ✅ Transparency: Real learning is messy—showing process, not just polished results
- ✅ Accountability: Public commits = public commitment
- ✅ Portfolio: This profile IS proof of ability and trajectory
I'm Open To:
- 💼 Data Analyst & AI Engineer Opportunities (Remote preferred)
- 🤝 Code reviews and technical discussions
- 🎓 Knowledge exchange on data + AI + finance
Let's Connect If You:
- Value production code over tutorial completions
- Are hiring Data Analysts with proven delivery capability
- Are building data-driven AI systems
Current Stage: Stage 1 (GenAI-First Data Analyst & AI Engineer) | 🟢 Active • Building in Public
⭐️ Star repos if you find them useful! | 🔔 Follow for updates on the 37-month journey!

