Technical deep-dive into system design decisions, data flows, and integration patterns.

┌────────────────────────────────────────────────────────────────────┐
│                       NGINX (reverse proxy)                        │
│                   TLS termination, rate limiting                   │
└─────────────────────────────────┬──────────────────────────────────┘
                                  │
┌─────────────────────────────────▼──────────────────────────────────┐
│                           API (FastAPI)                            │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │
│  │   Auth   │ │ Clinical │ │ Pipeline │ │  Graph   │ │ Storage  │  │
│  │ Service  │ │   CRUD   │ │  Runner  │ │  Query   │ │ Manager  │  │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘  │
└───────┼────────────┼────────────┼────────────┼────────────┼────────┘
        │            │            │            │            │
   ┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐
   │   JWT    │ │PostgreSQL│ │ Nextflow │ │  Neo4j   │ │  MinIO   │
   │   Auth   │ │  (rel)   │ │  (pipe)  │ │ (graph)  │ │  (obj)   │
   └──────────┘ └──────────┘ └────┬─────┘ └──────────┘ └──────────┘
                                  │
                           ┌──────▼───────┐
                           │ bio-pipeline │
                           │   (Docker)   │
                           │ strobealign  │
                           │   SAMtools   │
                           │   bcftools   │
                           │     GATK     │
                           └──────────────┘

hospitals ──────────────────┐
    │                       │
    ▼                       │
patients ───────────────┐   │
    │                   │   │
    ▼                   │   │
cases ──────────────┐   │   │
    │               │   │   │
    ▼               │   │   │
samples_v2 ─────┐   │   │   │
    │           │   │   │   │
    ▼           │   │   │   │
sequencing_runs │   │   │   │
    │           │   │   │   │
    ▼           │   │   │   │
fastq_files     │   │   │   │
    │           │   │   │   │
    ▼           │   │   │   │
pipeline_runs   │   │   │   │
    │           │   │   │   │
    ▼           │   │   │   │
variants ───────┘   │   │   │
    │               │   │   │
    ▼               │   │   │
annotations         │   │   │
    │               │   │   │
    ▼               │   │   │
clinical_reports    │   │   │
    │               │   │   │
    ▼               │   │   │
audit_logs ─────────┴───┴───┘
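
The main chain in the diagram can be walked in code, for example to trace a row's table back to `hospitals`. This is a minimal sketch that assumes each table has exactly one parent, as drawn in the left-hand column; the `CHAIN`, `PARENT`, and `lineage` names are hypothetical, and the side links on the right of the diagram and the actual foreign-key columns are not modelled.

```python
from typing import Dict, List

# Main chain only, top to bottom, as drawn in the diagram.
CHAIN: List[str] = [
    "hospitals", "patients", "cases", "samples_v2", "sequencing_runs",
    "fastq_files", "pipeline_runs", "variants", "annotations",
    "clinical_reports", "audit_logs",
]

# child table -> parent table
PARENT: Dict[str, str] = {child: parent for parent, child in zip(CHAIN, CHAIN[1:])}


def lineage(table: str) -> List[str]:
    """Return the path from the root table down to `table`."""
    if table not in CHAIN:
        raise KeyError(f"unknown table: {table}")
    path = [table]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]
```

For instance, `lineage("cases")` returns `["hospitals", "patients", "cases"]`.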

(Gene) ──[:HAS_MUTATION]──→ (Mutation)
  │                             │
  │                             ▼
  │                         (Disease)
  │                             ▲
  │                             │
  └──[:INTERACTS_WITH]──→ (Gene)──[:HAS_MUTATION]──→ (Mutation)
(Drug) ──[:TARGETS]──→ (Gene) ──[:PARTICIPATES_IN]──→ (Pathway)
(Mutation) ──[:CITED_IN]──→ (Paper)
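
The relationship patterns above translate directly into Cypher. The sketch below builds a parameterized query that finds drugs targeting a gene, the gene's mutations, and any papers citing them. It uses only the labels and relationship types shown in the diagram; the `symbol` property and the `drugs_for_gene_query` helper are assumptions for illustration, since the node properties are not listed here.

```python
from typing import Dict, Tuple


def drugs_for_gene_query(gene_symbol: str) -> Tuple[str, Dict[str, str]]:
    """Build a parameterized Cypher query and its parameter map."""
    query = (
        "MATCH (d:Drug)-[:TARGETS]->(g:Gene)-[:HAS_MUTATION]->(m:Mutation) "
        "WHERE g.symbol = $symbol "  # `symbol` is an assumed property name
        "OPTIONAL MATCH (m)-[:CITED_IN]->(p:Paper) "
        "RETURN d, m, collect(p) AS papers"
    )
    # The value travels as a parameter, never interpolated into the query text.
    return query, {"symbol": gene_symbol}
```

The returned pair can be handed to whichever Neo4j client the Graph Query service uses; passing the gene symbol as a parameter avoids string interpolation into the query.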
All clinical endpoints are versioned under /api/v1/. System endpoints (health, auth, genome, storage, graph, agents) are unversioned.
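
As a rough illustration of this rule, the helper below builds a path from an endpoint group name. It is a sketch only: `route_path` is a hypothetical helper, and treating every group outside the listed system groups as a clinical resource is an assumption.

```python
API_PREFIX = "/api/v1"

# System endpoint groups named in the text; these stay unversioned.
SYSTEM_GROUPS = frozenset({"health", "auth", "genome", "storage", "graph", "agents"})


def route_path(group: str, tail: str = "") -> str:
    """Return the URL path for an endpoint group, versioned unless it is a system group."""
    base = f"/{group}" if group in SYSTEM_GROUPS else f"{API_PREFIX}/{group}"
    tail = tail.strip("/")
    return f"{base}/{tail}" if tail else base
```

With this rule, `route_path("auth", "login")` gives `/auth/login`, while a clinical resource such as `route_path("cases")` gives `/api/v1/cases`.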

Client                  API                      PostgreSQL
│                        │                           │
├── POST /auth/login ───→│                           │
│   {email, password}    ├── SELECT user ───────────→│
│                        │←── user data ─────────────┤
│                        ├── Argon2.verify ──→       │
│                        ├── Generate JWT ──→        │
│                        ├── Log audit event ───────→│
│←── {access_token} ─────┤                           │
│                        │                           │
├── GET /api/v1/* ──────→│                           │
│   Authorization: Bearer├── Decode JWT ──→          │
│                        ├── Verify RBAC ──→         │
│                        ├── Execute handler ──→     │
│←── {response} ─────────┤                           │

Client                  API             Docker              Nextflow
│                        │                │                    │
├── POST pipeline-runs ─→│                │                    │
│                        ├── docker exec →│                    │
│                        │                ├── nextflow run ───→│
│                        │                │                    │
│                        │←── SSE stream ─┤←── log output ─────┤
│←── SSE events ─────────┤                │                    │
│                        │                │                    │
│                        │                │←── completion ─────┤
│                        │←── status ─────┤                    │
│←── {status: complete} ─┤                │                    │
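
The streaming leg of this flow can be sketched as a generator that turns pipeline log lines into Server-Sent Events frames and ends with a status event. This is an illustrative sketch, not the project's runner: the `log` and `status` event names, the `failed` status value, and the `exit_code` callback are assumptions, and the `docker exec` / `nextflow run` invocation itself is left out.

```python
import json
from typing import Callable, Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events frame (terminated by a blank line)."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # Each line of a multi-line payload needs its own "data:" field.
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


def stream_pipeline_events(log_lines: Iterable[str],
                           exit_code: Callable[[], int]) -> Iterator[str]:
    """Yield a `log` frame per output line, then a final `status` frame."""
    for line in log_lines:
        yield sse_event(line.rstrip("\n"), event="log")
    # `exit_code` is only consulted once the log stream is exhausted.
    status = "complete" if exit_code() == 0 else "failed"
    yield sse_event(json.dumps({"status": status}), event="status")
```

In a FastAPI handler, a generator like this would typically be wrapped in a streaming response with the `text/event-stream` media type, with `log_lines` fed from the subprocess output.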
| Decision | Rationale |
|---|---|
| strobealign over BWA-MEM | 5-8x faster alignment with comparable accuracy. Critical for development hardware. |
| CRAM over BAM | 30-60% storage savings. Essential when working with limited disk. |
| Nextflow over Snakemake | Better Docker integration, resume capability, native Conda/Singularity support. |
| Neo4j over raw NetworkX | Production-grade graph DB with Cypher, indexing, and transactions. NetworkX is kept for in-memory analysis only. |
| MinIO over local filesystem | S3-compatible API, versioning, bucket policies. Drop-in replacement for AWS S3 in production. |
| No ORM (raw asyncpg) | Genomic queries need fine-grained SQL control. ORM overhead not justified for this domain. |
| Custom agents over LangChain | Domain-specific logic doesn't benefit from generic framework abstractions. Simpler debugging. |
| File-based LLM cache | Simple, with no additional infrastructure; sufficient for development. Redis is the choice for production. |
| Zustand over Redux | Minimal boilerplate, TypeScript-first, sufficient for dashboard state management. |
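
To make the file-based LLM cache decision concrete, here is a minimal sketch of one way such a cache could work: each response is stored as a JSON file named after a hash of the model and prompt. The `FileLLMCache` class, its key scheme, and the `compute` callback are hypothetical and are not taken from the project's code.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Union


class FileLLMCache:
    """Cache LLM responses as one JSON file per (model, prompt) pair."""

    def __init__(self, directory: Union[str, Path]) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, model: str, prompt: str) -> Path:
        key = hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
        return self.directory / f"{key}.json"

    def get_or_compute(self, model: str, prompt: str,
                       compute: Callable[[str], str]) -> str:
        path = self._path(model, prompt)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))["response"]
        response = compute(prompt)
        record = {"model": model, "prompt": prompt, "response": response}
        # Write to a temporary file first so readers never see a partial entry.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(record), encoding="utf-8")
        tmp.replace(path)
        return response
```

Because the cache is hidden behind a single `get_or_compute` call, swapping the backing store for Redis in production would not need to touch the callers.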