I build real-world AI systems from scratch, in public, documenting decisions, failures, tradeoffs, and mental models along the way.
Real-time AI voice sales agent. 40+ concurrent calls. Sub-600ms end-to-end latency. Zero vendor lock-in.
| Component | Technology | Latency |
|---|---|---|
| Voice Activity Detection | Silero VAD v5 + WebRTC (dual-fusion) | 6ms |
| Speech Recognition | Faster-Whisper Large-v3 (streaming) | 150ms |
| Language Model | Qwen-2.5-7B-Instruct + KV-cache reuse | 320ms |
| Text-to-Speech | Piper TTS (streaming, sentence-by-sentence) | 100ms |
| End-to-End (P95) | | ~580ms |
What makes it different:
- KV-cache reuse across turns: 40% LLM latency reduction by serializing attention key/value tensors to Redis
- Adaptive end-of-turn detection: learns each caller's speaking pace per session and adjusts the silence threshold dynamically between 420ms and 720ms
- ⚡ Barge-in handling: multi-signal fusion detects interruptions in <200ms and stops TTS mid-sentence
- 🧱 Redis-backed session persistence: full state, KV-cache, and metrics per session; ready for horizontal scaling
- ☸️ Kubernetes-ready: StatefulSet, HPA auto-scaling, PSTN via Asterisk/FreeSWITCH
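The adaptive end-of-turn idea can be illustrated with a minimal sketch. This is not SaleTech's implementation: it assumes "speaking pace" is an exponential moving average of the caller's mid-utterance pauses, and that the silence threshold is a fixed multiple of that average, clamped to the 420–720ms range stated in the list. The class name `AdaptiveEndpointer` and the `alpha`, `scale`, and prior values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveEndpointer:
    """Hypothetical per-session end-of-turn detector (illustrative sketch only)."""

    min_ms: float = 420.0        # lower clamp, from the 420-720ms range
    max_ms: float = 720.0        # upper clamp
    alpha: float = 0.2           # EMA smoothing factor (assumed value)
    scale: float = 2.0           # threshold = scale * typical pause (assumed value)
    avg_pause_ms: float = 250.0  # prior pace before the caller is observed (assumed)

    def observe_pause(self, pause_ms: float) -> None:
        """Feed a silence gap that did NOT end the turn (the caller kept talking)."""
        self.avg_pause_ms = self.alpha * pause_ms + (1 - self.alpha) * self.avg_pause_ms

    @property
    def threshold_ms(self) -> float:
        """Silence required to declare end-of-turn, clamped to [min_ms, max_ms]."""
        return min(self.max_ms, max(self.min_ms, self.scale * self.avg_pause_ms))

    def is_end_of_turn(self, silence_ms: float) -> bool:
        """True once the current silence reaches the adaptive threshold."""
        return silence_ms >= self.threshold_ms


# Usage: a fast talker with short pauses vs. a slow talker with long ones.
fast = AdaptiveEndpointer()
slow = AdaptiveEndpointer()
for _ in range(20):
    fast.observe_pause(100)
    slow.observe_pause(500)

print(fast.threshold_ms)  # 420.0 (clamped at the lower bound)
print(slow.threshold_ms)  # 720.0 (clamped at the upper bound)
print(fast.is_end_of_turn(450), slow.is_end_of_turn(450))  # True False
```

The clamps do the practical work: the lower bound stops a fast talker from being cut off mid-thought, and the upper bound caps dead air for a slow one. Only pauses that did not end the turn are fed back, so the estimate tracks within-turn rhythm rather than turn-taking gaps.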
Stack: Python · FastAPI · AsyncIO · WebSockets · Silero · Faster-Whisper · PyTorch · Redis · Docker · Kubernetes
→ github.com/dhruvthakur2000/SaleTech
| Project | What it does | Stack |
|---|---|---|
| SaleTech | Production real-time AI voice sales agent: 40+ concurrent calls, <600ms latency, fully open-source pipeline | Python · FastAPI · Silero · Faster-Whisper · Qwen · Piper · Redis · K8s |
| linux_driver_eval | CLI framework that benchmarks how well LLMs write Linux kernel device driver code. Two pipelines: generation and evaluation. Weighted scoring across correctness, security, code quality, and performance | Python · GCC · Together API · Static Analysis |
| virtual-voicebot | Streamlit voice assistant with persona-aware responses; the project that started my obsession with real-time audio pipelines | Python · Streamlit · Groq · Whisper · LLaMA · TTS |
| HomeAssist (planned) | A smarter Alexa: an always-on edge voice assistant built on SaleTech's VAD, ASR, and buffer layers. Wake-word detection, local LLM, zero cloud dependency | SaleTech core · Edge inference |
| SaleTech Analytics (planned) | Call intelligence layer: real-time sentiment, objection detection, and sales-stage classification per turn | SaleTech core · NLP · Classification |
I write about real engineering decisions, not tutorials copied from docs.
| Post | Platform |
|---|---|
| VAD: Voice Activity Detection – how it actually works | Hashnode |
| Understanding the Attention Mechanism: The Heart of the Transformer Revolution | Medium |
| Structured Logging in Python: A Practical Guide for Production Systems | Medium |


