Systems Architect | Rust Engineer | Ph.D. Mathematician
I specialise in High-Performance Computing (HPC) and ML Infrastructure. My focus is replacing GIL-bound Python bottlenecks with highly optimised Rust and C++ kernels, leveraging SIMD, zero-copy data transfer (Apache Arrow), and in-process OLAP engines (DuckDB).
- Advanced Forecasting: Large-scale Hierarchical Time Series (HTS), Probabilistic Forecasting, and Handling Intermittent Demand (Sparse Data) for Supply Chains.
- Rigorous Statistics: Bringing statistics and ML to production without the Python overhead.
- Industrial AI: Anomaly Detection in manufacturing processes, Predictive Quality, and Root Cause Analysis using Causal Inference.
- Systems Programming: Porting interpretability-heavy Python logic to Rust/C++ (WASM/Native).
- GenAI Infrastructure: Building Model Context Protocol (MCP) servers and dependency-free inference engines for Foundation Models.
- Data Engineering: Designing zero-copy ETL pipelines using DuckDB, Polars, and Apache Arrow.
- RAG & Semantic Search: Developing Magpie, a high-performance vector database and RAG engine built in Rust. It implements semantic search with HNSW/DiskANN backends, hybrid retrieval combining dense vectors with BM25, and cross-encoder reranking. It features AST-aware chunking for 7 programming languages (Python, JS, Rust, Java, C#, R, TypeScript), document extraction for 57+ formats including OCR, and advanced RAG patterns (HyDE, Multi-Query, Corrective RAG). It supports multiple embedding/LLM providers (Ollama, OpenAI, Anthropic, Gemini), with deployment options including a REST API, an MCP server for Claude integration, and WebAssembly.
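To illustrate what "hybrid retrieval combining dense vectors with BM25" can look like, here is a minimal sketch that merges two ranked result lists with reciprocal rank fusion (RRF). RRF is an assumption chosen for illustration and is not necessarily the fusion method Magpie uses; the function name `reciprocal_rank_fusion`, the document ids, and the constant `k = 60` are all hypothetical.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document ids into one ranking.
/// Each document scores sum(1 / (k + rank)) over the lists it appears in,
/// with rank starting at 1. Illustrative only; not Magpie's actual API.
fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + i as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(doc, score)| (doc.to_string(), score))
        .collect();
    // Highest score first; ties are broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Hypothetical result lists: one from a dense vector index, one from BM25.
    let dense = vec!["doc_a", "doc_b", "doc_c"];
    let bm25 = vec!["doc_b", "doc_d", "doc_a"];
    for (doc, score) in reciprocal_rank_fusion(&[dense, bm25], 60.0) {
        println!("{doc}: {score:.5}");
    }
}
```

Rank-based fusion avoids having to normalise BM25 scores against cosine similarities, which live on different scales; a cross-encoder reranker would then typically rescore only the top fused candidates.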
- Architecture: Full re-implementation of the Chronos-2 time-series foundation model in pure Rust.
- Objective: Remove heavy PyTorch/Python dependencies for edge and high-throughput environments.
- Tech: Candle/Burn, WASM, Tokio.
- Performance: Achieved 2,900x speedup vs. statsmodels/pandas loops by moving logic to C++.
- Design: Hybrid architecture using DuckDB for parallelized data shuffling and Rust for vectorized statistical kernels.
- Tech: Rust, DuckDB C-API, OpenMP.
- Architecture: Model Context Protocol server that injects database schemas, query patterns, and semantic code search into AI coding agents.
- Capability: Enables LLMs to generate contextually-aware SQL by retrieving similar queries and schema documentation at inference time.
- Tech: axum, HNSW, MCP Protocol, WASM.
- Scale: Indexed 22,771 R packages with 1024-dim embeddings, enabling semantic search across the entire CRAN ecosystem.
- Analysis: HNSW-accelerated clustering (O(n log n)) discovered 41 package communities, including Time Series, Bioinformatics, and ML, with hierarchical sub-clustering revealing specialised niches (e.g., GARCH, VAR, Single-Cell RNA-seq).
- Potential: Enables semantic package discovery, automated Task View curation, near-duplicate detection (e.g., 0.97 similarity pairs), ecosystem health monitoring, and RAG-powered R code generation via MCP.
- Potential: Foundation for R package recommendation, dependency analysis, and ecosystem-wide code intelligence.
- Tech: Rust, hnsw_rs, Louvain community detection, Ollama embeddings.
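As a rough illustration of the near-duplicate detection mentioned above, the sketch below flags pairs of embeddings whose cosine similarity reaches a threshold. It is a brute-force O(n²) scan over toy 3-dimensional vectors, standing in for the HNSW neighbour lookup a real index over 1024-dim embeddings would use; the function names, the package ids, and the vectors are hypothetical, and only the 0.97 threshold echoes the figure quoted above.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Return every pair (id_i, id_j, similarity) with similarity >= threshold.
/// Brute force over all pairs; an ANN index would replace the inner loop at scale.
fn near_duplicates(
    embeddings: &[(&str, Vec<f32>)],
    threshold: f32,
) -> Vec<(String, String, f32)> {
    let mut pairs = Vec::new();
    for i in 0..embeddings.len() {
        for j in (i + 1)..embeddings.len() {
            let sim = cosine_similarity(&embeddings[i].1, &embeddings[j].1);
            if sim >= threshold {
                pairs.push((
                    embeddings[i].0.to_string(),
                    embeddings[j].0.to_string(),
                    sim,
                ));
            }
        }
    }
    pairs
}

fn main() {
    // Hypothetical package ids with toy embeddings (not real CRAN data).
    let embeddings = vec![
        ("pkg_a", vec![1.0, 0.0, 0.0]),
        ("pkg_b", vec![0.99, 0.1, 0.0]),
        ("pkg_c", vec![0.0, 1.0, 0.0]),
    ];
    for (left, right, sim) in near_duplicates(&embeddings, 0.97) {
        println!("{left} ~ {right}: {sim:.3}");
    }
}
```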
- Implementation: Custom Rust-based servers implementing the MCP standard to inject dynamic context (DB schemas, API specs) into AI coding agents.
- Tech: axum, serde, async-trait.
| Crate / Repo | Description | Stack |
|---|---|---|
| sipemu | 4+ utility crates for statistical computing. | Rust |
| AnoFox-Statistics | High-performance statistical extension for DuckDB. | Rust, DuckDB |
| Polars-Statistics | FFI bindings for high-speed statistics on Polars DataFrames. | Python, Rust, Polars |





