Manuel R. manuel-reyes-ml

Hi, I'm Manuel 👋

Building Production AI Systems with Python + GenAI | Measurable business impact from Day 1

💼 What Makes Me Different

Most entry-level candidates have tutorial projects. I have production code with measurable business impact.

Most Candidates	What I Bring
🎓 Tutorial projects	✅ Production ETL system (live, saving $15K/year)
❌ No domain expertise	✅ 15+ years business data experience
📉 Basic skills	✅ 4 years finance + 6 years trading domain expertise
📦 Scattered portfolios	✅ 8 production-grade projects with skills progression (incl. two flagships: AFC + Crucible)
🤖 No AI integration	✅ GenAI-first tools (LLM SDKs with Anthropic primary + local-first Qwen3/Ollama, RAG, FastMCP server, Multimodal AI, Pydantic structured outputs)
📉 No AI evaluation	✅ Evaluation-driven development (DeepEval + pytest, RAGAS, SelfCheckGPT, Docker in every project)

🚀 Production Highlight

🧾 1099 Reconciliation ETL Pipeline

Status: ✅ Live in production | 🌐 Public repository

Automated Python ETL pipeline for retirement plan distribution reconciliation at Daybright Financial.

Metric	Impact
⚡ Time Saved	~95% reduction (4-6 hours/week → 15 min/week)
💰 Cost Savings	$15,000+/year in labor costs
📊 Scalability	10x capacity (300+ accounts vs 30 manual)
✅ Accuracy	Zero errors since deployment
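
The Time Saved figure can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers in the table (4-6 hours of weekly manual work, about 15 minutes after automation); the function name is illustrative and not taken from the repository.

```python
def percent_reduction(before_minutes: float, after_minutes: float) -> float:
    """Return the percentage reduction going from before to after."""
    return (before_minutes - after_minutes) / before_minutes * 100


# Weekly manual effort from the table: 4 to 6 hours, now about 15 minutes.
low = percent_reduction(4 * 60, 15)   # starting from 4 hours
high = percent_reduction(6 * 60, 15)  # starting from 6 hours
print(f"Reduction range: {low:.2f}% to {high:.2f}%")
```

Both ends of the range sit close to the rounded 95% quoted above.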

Tech: Python • pandas • openpyxl • matplotlib • data validation • pytest • GitHub Actions CI • faker (synthetic data)

→ View Full Documentation & Code

📈 Project Pipeline — Skills Progression (Easy → Two Flagships)

Each project introduces new skills that build on the previous one, from pure Python ETL to multimodal AI, RAG, statistical research systems, and an autonomous trading platform.

🏗️ Production Standard: Every project ships with architecture diagram (Mermaid), Dockerfile, evaluation metrics table, demo GIF, and "What I Learned" section.

🚩 Two flagships: Attention-Flow Catalyst (read-only small-cap swing research) and Crucible (autonomous intraday execution) — two genuinely different hard problems. Crucible is the chosen first build.

🔐 DataVault Analyst — First AI Project

AI-Powered PII-Safe Data Intelligence | "Chat With Your Data"

Natural language analytics for retirement plan operations with PII protection and AI guardrails.

Feature	Implementation
🤖 AI Chat	LLM SDK (provider-agnostic) + PandasAI with code transparency
🔒 PII Protection	Governance-as-code: PII leak prevention in AI responses
📊 Hybrid Analytics	Pre-built dashboards + AI chat (works even without API key)
🧩 Structured Outputs	Pydantic-validated AI responses
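
One way the PII Protection row could work is a redaction pass over every AI response before it is displayed. The sketch below is an assumption, not the project's actual code: the two regex patterns and the `redact_pii` helper are hypothetical, and a real deployment would need a vetted, much broader pattern set.

```python
import re

# Hypothetical patterns for illustration only.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact_pii(text: str) -> tuple[str, list[str]]:
    """Replace matched PII with a placeholder and report which kinds were found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[REDACTED {label}]", text)
    return text, found


safe, hits = redact_pii("Participant 123-45-6789 can be reached at jane@example.com.")
print(safe)
print(hits)
```

Returning the list of detected kinds alongside the cleaned text lets the caller log a policy violation without logging the sensitive value itself.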

Tech: Python • pandas • Streamlit • Gemini SDK • PandasAI • Pydantic • DeepEval • Docker • GitHub Actions CI

📋 PolicyPulse — RAG Foundation

AI-Powered HR Policy Chatbot | "Ask Your Policies" | 🔌 Exposes FastMCP server

RAG chatbot that answers employee policy questions with cited sources and auto-escalates to HR when uncertain.

Feature	Implementation
🔍 Semantic Search	Embeddings + ChromaDB vector store + similarity scoring
📎 Cited Answers	Every response cites specific policy section & document
🎫 Smart Escalation	Confidence < 0.7 → auto-generate HR ticket with context
🧠 RAG Pipeline	Document → Chunk → Embed → Retrieve → Generate
🔌 MCP Server	FastMCP exposes retrieval as MCP tools — Cursor/Claude Desktop integration
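
The Smart Escalation rule in the table can be sketched as a small routing function. This is a simplified illustration that assumes the confidence score is the best retrieval similarity; the `RetrievedChunk` class, the `route_answer` function, and the ticket fields are hypothetical names, and only the 0.7 threshold comes from the table.

```python
from dataclasses import dataclass

ESCALATION_THRESHOLD = 0.7  # from the Smart Escalation row


@dataclass
class RetrievedChunk:
    document: str
    section: str
    text: str
    similarity: float  # assumed to be normalized to [0, 1]


def route_answer(question: str, chunks: list[RetrievedChunk]) -> dict:
    """Answer with a citation when retrieval is confident, otherwise open a ticket."""
    best = max(chunks, key=lambda c: c.similarity, default=None)
    if best is None or best.similarity < ESCALATION_THRESHOLD:
        return {
            "action": "escalate",
            "ticket": {
                "question": question,
                "best_similarity": best.similarity if best else None,
            },
        }
    return {
        "action": "answer",
        "citation": f"{best.document}, {best.section}",
        "context": best.text,
    }


confident = route_answer(
    "How many vacation days do I get?",
    [RetrievedChunk("Employee Handbook", "Section 4.2", "Full-time staff accrue...", 0.82)],
)
uncertain = route_answer(
    "Can I expense a standing desk?",
    [RetrievedChunk("Travel Policy", "Section 1.1", "Airfare must be...", 0.55)],
)
print(confident["action"], uncertain["action"])
```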

Tech: Python • Anthropic SDK (primary, Gemini fallback) • ChromaDB • Gemini Embeddings • Streamlit • Pydantic • DeepEval • RAGAS • SelfCheckGPT • FastMCP • Docker • GitHub Actions CI

📄 FormSense — Document Intelligence

AI-Powered Distribution Form Validator | "From Paper to Processing"

Multimodal AI system that reads retirement plan distribution forms (handwritten checkboxes, signatures), validates against business rules, and routes results.

Feature	Implementation
👁️ Vision AI	Gemini Vision reads checkboxes, handwriting, printed text
✅ Validation	Business rule engine for ERISA-regulated distribution processing
📧 Smart Routing	Complete → operations ticket
📊 Confidence	Field-level extraction confidence scoring
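
Field-level confidence and rule-based validation could combine roughly as follows. This is a sketch under stated assumptions: the required field names, the 0.85 threshold, and the `needs_review` route are all hypothetical (the table only states that complete forms go to an operations ticket), and standard-library dataclasses stand in for the Pydantic models the project lists so the example has no dependencies.

```python
from dataclasses import dataclass

MIN_CONFIDENCE = 0.85  # hypothetical threshold
REQUIRED_FIELDS = ("participant_name", "distribution_type", "signature_present")  # hypothetical


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # extraction confidence in [0, 1]


def validate_form(fields: list[ExtractedField]) -> list[str]:
    """Return a list of problems; an empty list means the form is complete."""
    by_name = {f.name: f for f in fields}
    problems = []
    for name in REQUIRED_FIELDS:
        field = by_name.get(name)
        if field is None or not field.value:
            problems.append(f"{name}: missing")
        elif field.confidence < MIN_CONFIDENCE:
            problems.append(f"{name}: low confidence ({field.confidence:.2f})")
    return problems


def route(fields: list[ExtractedField]) -> str:
    """Complete forms become operations tickets; the other branch is an assumption."""
    return "operations_ticket" if not validate_form(fields) else "needs_review"


complete = [
    ExtractedField("participant_name", "Jane Doe", 0.97),
    ExtractedField("distribution_type", "Rollover", 0.91),
    ExtractedField("signature_present", "yes", 0.88),
]
shaky = complete[:2] + [ExtractedField("signature_present", "yes", 0.40)]
print(route(complete), route(shaky))
```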

Tech: Python • Gemini Vision SDK • Streamlit • Pydantic • DeepEval • Docker • GitHub Actions CI

📊 Operations-Demand-Intelligence — Enterprise Analytics

AI-Powered Workflow Demand Analysis | 🚧 In Development

Analyzing 8+ months of OnBase workflow data to enable data-driven staffing decisions with AI-powered natural language queries.

Feature	Implementation
🔍 Demand Analysis	Volume patterns, Distribution vs Loan segmentation
🤖 AI Integration	LLM SDK + PandasAI chat with guardrails + code transparency
📊 Dashboard	Streamlit with interactive visualizations
🔒 Data Privacy	PII handling, synthetic data for GitHub

Tech: Python • pandas • Streamlit • Gemini SDK • PandasAI • Plotly • DeepEval • Docker • GitHub Actions CI

→ View Project

📺 StreamSmart Optimizer — Consumer AI App

AI-Powered Streaming Subscription Rotation Advisor | "Spend Less, Watch More"

Consumer-facing dashboard that optimizes streaming subscriptions through AI-driven rotation scheduling, cost-per-view analytics, and content search.

Feature	Implementation
📺 Content Search	Watchmode + TMDB API integration ("Where can I watch X?")
🤖 AI Rotation Planner	LLM analyzes habits + content calendar → optimal schedule
💰 Savings Engine	Cost-per-view analytics + annual savings projections
🛡️ Guardrails	Price validation, financial disclaimers, scope limits
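
The Savings Engine row boils down to two small calculations. The sketch below is illustrative only: the function names, the price, and the viewing counts are made-up inputs, not data or code from the project.

```python
def cost_per_view(monthly_price: float, views_per_month: int) -> float:
    """Price paid per title watched; infinite when nothing was watched."""
    if views_per_month == 0:
        return float("inf")
    return monthly_price / views_per_month


def annual_savings(monthly_price: float, months_paused: int) -> float:
    """Money saved by pausing a subscription for some months of the year."""
    return monthly_price * months_paused


# Hypothetical service: $15.00/month, 6 titles watched, paused 8 months a year.
per_view = cost_per_view(15.00, 6)
saved = annual_savings(15.00, 8)
print(f"Cost per view: ${per_view:.2f}, projected annual savings: ${saved:.2f}")
```

Treating an unwatched service as infinitely expensive per view makes it sort to the top of any "cancel first" ranking.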

Tech: Python • httpx async • Watchmode/TMDB APIs • Streamlit • Gemini SDK • Pydantic • DeepEval • LangSmith • Docker • GitHub Actions CI

📈 Attention-Flow Catalyst — 🚀 Flagship #1

AI-Powered Predictive Trigger Analysis for Small-Cap Stocks | 🚧 Phase 1A Active

Research Question: Which trigger or combination best predicts +10% price moves within 3 trading days?

Flagship project designed as a defensible research system with statistical methodology — evolves through all 5 career stages.
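
The research question implies a labeling rule for each trigger event. A minimal sketch, assuming entry at the close on the trigger day and checking daily closes only (the project may well use highs or a different entry convention); `hit_within_horizon` is a hypothetical name.

```python
def hit_within_horizon(closes, entry_idx, horizon=3, target=0.10):
    """True if any of the next `horizon` closes is at least `target` above the entry close."""
    entry = closes[entry_idx]
    window = closes[entry_idx + 1 : entry_idx + 1 + horizon]
    return any(close >= entry * (1 + target) for close in window)


# Hypothetical daily closes for one stock.
closes = [10.0, 10.2, 10.9, 11.2, 9.0]
print(hit_within_horizon(closes, entry_idx=0))  # 11.2 on day 3 clears the +10% bar
print(hit_within_horizon(closes, entry_idx=1))  # needs about 11.22, never reached
```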

Trigger Framework

Trigger	Data Source	Signal Type
T1: Insider Buy	SEC Form 4 (edgartools)	Smart Money
T2: Wiki Spike	Wikipedia API	Public Attention
T3: News Spike	RSS/GDELT	Media Coverage
T4: Volume	yfinance (5 sub-signals)	Institutional Activity
T5: Dilution State	SEC filings	Capital Structure

Phase 1A — Backtest Engine (Weeks 1-6)

✅ Dynamic stock screener with survivorship bias controls
✅ 3+ years data collection for 50+ stocks
✅ Walk-forward validation (train Y1-2, test Y3)
✅ Bootstrap 95% confidence intervals
✅ Trigger leaderboard ranked by hit rate
✅ Forward signal generator
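
The walk-forward split and the bootstrap interval from the checklist above can be sketched with the standard library alone. This is an illustration under assumptions: the events are synthetic, the helper names are hypothetical, and the interval uses the simple percentile method rather than whatever variant the project implements.

```python
import random


def walk_forward_split(events, train_years, test_year):
    """Split events by calendar year so the test year is never used for training."""
    train = [e for e in events if e["year"] in train_years]
    test = [e for e in events if e["year"] == test_year]
    return train, test


def bootstrap_hit_rate_ci(outcomes, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    rates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = rates[round(alpha / 2 * n_resamples)]
    upper = rates[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic events: hit = 1 if the trigger was followed by the target move.
events = (
    [{"year": 2021, "hit": 1}] * 5 + [{"year": 2021, "hit": 0}] * 15
    + [{"year": 2022, "hit": 1}] * 6 + [{"year": 2022, "hit": 0}] * 14
    + [{"year": 2023, "hit": 1}] * 12 + [{"year": 2023, "hit": 0}] * 28
)
train, test = walk_forward_split(events, train_years={2021, 2022}, test_year=2023)
test_outcomes = [e["hit"] for e in test]
hit_rate = sum(test_outcomes) / len(test_outcomes)
lower, upper = bootstrap_hit_rate_ci(test_outcomes)
print(f"Test hit rate {hit_rate:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Ranking triggers by the lower bound of the interval instead of the raw hit rate is one way to keep small samples from topping a leaderboard.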

Phase 1B — AI-Powered Dashboard (Weeks 7-10)

📅 Streamlit multi-page app (6 pages)
📅 LLM SDK chat interface + PandasAI with SQL transparency
📅 AI guardrails (read-only, cost controls, disclaimers)
📅 Interactive trigger leaderboard & signal explorer
📅 Deployment on Streamlit Cloud
📅 Demo video recording

What Makes This Defensible

Dimension	Implementation
Methodology	Walk-forward validation, de-clustering, transaction costs
Bias Controls	Historical universe snapshots, corporate actions handling
Modern Stack	DuckDB + Parquet lakehouse, httpx async collectors
AI Features	LLM SDK (provider-agnostic) + PandasAI with guardrails, SQL transparency
AI Evaluation	DeepEval + pytest, elevated faithfulness thresholds (0.9), CI/CD integrated

Tech: Python • DuckDB • Parquet • httpx async • edgartools • Anthropic SDK (primary, Gemini/OpenAI fallback) • PandasAI • Streamlit • DeepEval • SelfCheckGPT + FActScore • Docker • GitHub Actions CI

→ View Project

🔥 Crucible — 🚀 Flagship #2 (started first)

Autonomous Intraday Trading Research Platform | 🚧 Phase 1 | 🦙 Local-First AI

The question it answers, for any strategy: Does this have a real edge that survives out-of-sample validation — and can an autonomous agent trade it without me babysitting it?

Strategy-agnostic platform that takes any intraday strategy through three validation gates: backtest → paper → live. An LLM research analyst proposes strategy improvements, but its ideas are proved by deterministic backtests it never optimizes against — behind a sealed out-of-sample vault with every peek logged in an overfitting-budget ledger. That's what makes it defensible rather than an overfit black box.
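
The sealed vault and overfitting-budget ledger described above might look something like this. It is a toy sketch, not Crucible's implementation: the class name, method names, ledger fields, and budget size are hypothetical. The key properties it tries to show are that callers only ever receive an aggregate score and that every evaluation is recorded.

```python
from datetime import datetime, timezone


class OutOfSampleVault:
    """Holds out-of-sample data and meters every evaluation against a fixed budget."""

    def __init__(self, holdout_returns, peek_budget):
        self._holdout = list(holdout_returns)  # never returned to callers
        self._budget = peek_budget
        self.ledger = []

    def evaluate(self, strategy_name, metric_fn, reason):
        """Return one aggregate score and log the peek; refuse when the budget is spent."""
        if len(self.ledger) >= self._budget:
            raise PermissionError("Overfitting budget exhausted")
        score = metric_fn(self._holdout)
        self.ledger.append({
            "strategy": strategy_name,
            "reason": reason,
            "score": score,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return score


def mean_return(returns):
    return sum(returns) / len(returns)


vault = OutOfSampleVault([0.01, -0.02, 0.03], peek_budget=1)
score = vault.evaluate("orb", mean_return, reason="final check before paper trading")
try:
    vault.evaluate("orb", mean_return, reason="one more tweak")
    second_peek_allowed = True
except PermissionError:
    second_peek_allowed = False
print(score, second_peek_allowed, len(vault.ledger))
```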

Distinct from AFC (why two flagships, not redundancy): AFC is read-only research on illiquid sub-$5 small-caps over a multi-day swing horizon; Crucible is autonomous execution on liquid names over an intraday horizon. ~70% shared engineering spine, two different hard problems.

Three Build Phases (Phase 1 now in Stage 1; agentic phases mature across Stages 3–4)

Phase	What It Produces	Stage	Real money?
1 — Backtest Engine	Own event-driven harness + AI research loop + sealed OOS vault (IT-1 ORB + VWAP Reclaim plugins)	Stage 1	No
2 — Paper Agent	Migrate to NautilusTrader (engine-parity gate); autonomous paper-trading agent crew (LangGraph); local Qwen3/Ollama analyst	Stages 2–3	No
3 — Live Agent	Autonomous live micro-sizing on Alpaca + Schwab/TOS; deterministic core + multi-agent oversight	Stages 3–4	Yes (small)

What Makes This Defensible

Dimension	Implementation
Integrity controls	Sealed out-of-sample vault, logged overfitting budget, walk-forward CV
Engine trust	Own harness ↔ NautilusTrader engine-parity gate (validate the tool, don't just trust it)
AI safety	LLM behind "the Wall" (sees aggregates, never raw rows); deterministic core owns every trade
Plugin design	Strategies are plugins (Protocol + ABC + registry); IT-1 ORB + VWAP Reclaim prove the abstraction
Privacy + cost	Local-first (Qwen3/Ollama) — no API fee, financial data stays on-machine
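
The Plugin design row (Protocol + ABC + registry) can be illustrated with a compact sketch. Everything here is an assumption about shape rather than Crucible's real interfaces: the `on_bar` method, the registry name, and the deliberately naive breakout rule are hypothetical stand-ins.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


@runtime_checkable
class Strategy(Protocol):
    """Structural interface the engine depends on."""
    name: str

    def on_bar(self, bar: dict) -> str: ...


class BaseStrategy(ABC):
    """Optional base class that forces subclasses to implement on_bar."""
    name: str = "base"

    @abstractmethod
    def on_bar(self, bar: dict) -> str: ...


STRATEGY_REGISTRY: dict[str, type[BaseStrategy]] = {}


def register(cls):
    """Class decorator that makes a strategy discoverable by name."""
    STRATEGY_REGISTRY[cls.name] = cls
    return cls


@register
class OpeningRangeBreakout(BaseStrategy):
    name = "orb"

    def __init__(self, range_high: float):
        self.range_high = range_high

    def on_bar(self, bar: dict) -> str:
        # Toy rule for illustration: buy when the close clears the opening range high.
        return "buy" if bar["close"] > self.range_high else "hold"


strategy = STRATEGY_REGISTRY["orb"](range_high=101.0)
signal = strategy.on_bar({"close": 101.5})
print(isinstance(strategy, Strategy), signal)
```

The Protocol lets the engine accept any object with the right shape, while the ABC gives in-house strategies an early error if they forget to implement the method.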

Tech: Python • own event-driven backtest harness → NautilusTrader (LGPL, free) • Optuna • DuckDB • Parquet • Ollama/Qwen3 (local-first) → Gemini → Anthropic → OpenAI • Pydantic • LangGraph • Alpaca (paper + live) + Schwab Trader API/TOS (live) • DeepEval • Docker • GitHub Actions CI

⚖️ Educational/research project. Not investment advice; makes no claim of positive expectancy — validation is the entire point.

→ View Project

📌 Repository Guide

Type	Repository	Description
🏭 Production System	1099_reconciliation_pipeline	Live ETL system ($15K/year savings)
🔐 First AI Project	datavault-analyst	PII-safe natural language data analytics
📋 RAG Foundation	policypulse	HR policy chatbot with citations & escalation
📄 Document AI	formsense	Multimodal form extraction & validation
📊 Enterprise Analytics	operations-demand-intelligence	AI-powered workflow demand analysis
📺 Consumer AI	streamsmart-optimizer	Streaming subscription rotation advisor
📈 Flagship #1	attention-flow-catalyst	AI-powered predictive trigger system (swing research)
🔥 Flagship #2	crucible	Autonomous intraday trading platform (backtest → paper → live)
📖 Learning Journey	learning_journey	Public documentation

📚 Data Portfolio Hub: central repository linking all projects with business context, technical details, and impact metrics.

🛠️ Technical Skills (Stage 1 — Mastering Now)

Skill areas: Languages & Dev Tools • Data Analysis & Processing • Data Collection • Visualization & BI • Testing & CI/CD • Containerization • AI & GenAI Tools • AI Evaluation Frameworks • Trading & Backtesting (Crucible) • APIs & Domain Expertise

Domain Expertise: 15+ years business data • 4 years financial services • 6 years active trading

🔮 Stage 2 Technical Skills (Coming Next)

Planned skills for GenAI Data Engineer + AI Systems Architect role (Months 6-15):

Cloud:         AWS (S3, Redshift, Lambda, Glue), BigQuery
Containers:    Docker & Kubernetes Masterclass
Orchestration: Apache Airflow, dbt (data build tool)
Big Data:      PySpark, distributed computing
Databases:     PostgreSQL, Vector DBs (Pinecone/Weaviate/Qdrant)
AI Systems:    RAG infrastructure, embedding pipelines, unstructured data ETL
Certification: AWS Certified Data Engineer Associate

🎓 Education & Certifications

Active Learning (Stage 1):

Core Data Courses

🚧 CS50: Introduction to Computer Science (Harvard)
🚧 Python for Everybody Specialization (University of Michigan)
🚧 Google Data Analytics Professional Certificate
🚧 IBM Data Analyst Professional Certificate (11 courses)
📅 Statistics with Python Specialization (University of Michigan)

🤖 GenAI Engineering Courses

🚧 LLM SDK Engineering — Provider-agnostic architecture (Gemini, OpenAI, Claude)
🚧 IBM Generative AI Engineering Professional Certificate (16 courses) — RAG, LangChain, fine-tuning (LoRA/QLoRA), deployment (Stage 1 primary)
🚧 Generative AI Data Analyst Specialization (Vanderbilt) — ChatGPT+SQL workflows, CLUE/TRUST/CAPTURE frameworks
🚧 ChatGPT Prompt Engineering for Developers (DeepLearning.AI) — API basics, prompt design, few-shot learning
🚧 AI Python for Beginners (DeepLearning.AI) — Andrew Ng's AI-first Python foundation
📅 30 Days of Streamlit Challenge — Build AI UIs fast

🧪 Evaluation & Containerization (v8.2)

📅 Building & Evaluating Advanced RAG (DeepLearning.AI) — RAG Triad metrics, evaluation-driven development
📅 Docker for Beginners with Hands-on Labs (KodeKloud/Coursera) — Dockerfile, Docker Compose, containerization

AI Development Tools

⚡ Cursor AI IDE — Primary editor with Composer mode
⚡ ChatGPT Plus — Advanced Data Analysis workflows
⚡ VS Code — Jupyter notebooks & data exploration

Planned by Stage:

Stage	Certifications & Focus
Stage 2	AWS Certified Data Engineer, Docker & Kubernetes Masterclass, BigQuery, Vector DBs (Pinecone/Weaviate), RAG Infrastructure
Stage 3	Deep Learning Specialization, NVIDIA DLI, Generative AI with LLMs (AWS), Fine-Tuning with PEFT, Ollama (powers Crucible Phase 2 local LLM analyst)
Stage 4	Agentic AI (Andrew Ng), MCP (Anthropic), LangGraph, CrewAI, Multi-Agent Systems (powers Crucible Phase 2–3 agent crew)
Stage 5	Automated Testing for LLMOps, System Design, Production AI Evaluation

⚡ Quick Facts

📈 6+ years active trading (swing & day strategies)
🌅 4:30 AM club (early morning focused study)
♟️ Chess enthusiast (strategy translates to markets!)
🤖 Fascinated by LLMs transforming financial analysis
📚 Reading: Machine Learning for Algorithmic Trading + Hands-On LLMs
🎯 Obsessed with data-driven decision making
💪 Proving systematic learning beats raw talent

📊 37-Month Progression Plan

While I have a clear technical roadmap for long-term growth, my immediate focus is mastering the Data Analyst domain and delivering value in production environments.

Stage	Duration	Role	Status
1	Months 1-5	GenAI-First Data Analyst & AI Engineer	🟢 ACTIVE
2	Months 6-15	GenAI Data Engineer + AI Systems Architect	⚪ Planned
3	Months 16-29	ML Engineer + Local LLM Specialist	⚪ Planned
4	Months 30-34	Agentic AI Engineer & LLM Specialist	⚪ Planned
5	Months 35-37	Senior LLM Engineer	🎯 Goal

→ View Interactive Roadmap

💭 Building in Public Philosophy

I've worked with data my entire career—manufacturing, bookkeeping, financial services, and 6 years of active trading. One truth became clear: data-driven decisions consistently outperform intuition.

But I hit a ceiling: I could analyze data brilliantly, but couldn't build automated systems to scale insights. So I'm building them myself—publicly, because career transformation shouldn't be mysterious.

Why share openly?

✅ Transparency: Real learning is messy—showing process, not just polished results
✅ Accountability: Public commits = public commitment
✅ Portfolio: This profile IS proof of ability and trajectory

🌐 Let's Connect

I'm Open To:

💼 Data Analyst & AI Engineer Opportunities (Remote preferred)
🤝 Code reviews and technical discussions
🎓 Knowledge exchange on data + AI + finance

Let's Connect If You:

Value production code over tutorial completions
Are hiring Data Analysts with proven delivery capability
Are building data-driven AI systems

Current Stage: Stage 1 (GenAI-First Data Analyst & AI Engineer) | 🟢 Active • Building in Public

⭐️ Star repos if you find them useful! | 🔔 Follow for updates on the 37-month journey!
