Architecture — AI Genomics Lab

Technical deep-dive into system design decisions, data flows, and integration patterns.


Service Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        NGINX (reverse proxy)                        │
│                    TLS termination, rate limiting                    │
└────────────────────────────┬────────────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────────────┐
│                         API (FastAPI)                                │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│  │  Auth    │ │ Clinical │ │ Pipeline │ │  Graph   │ │ Storage  │ │
│  │ Service  │ │  CRUD    │ │ Runner   │ │  Query   │ │ Manager  │ │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
└───────┼────────────┼────────────┼────────────┼────────────┼────────┘
        │            │            │            │            │
   ┌────▼────┐  ┌────▼─────┐ ┌────▼────┐  ┌────▼────┐  ┌────▼────┐
   │ JWT     │  │PostgreSQL│ │Nextflow │  │  Neo4j  │  │  MinIO  │
   │ Auth    │  │  (rel)   │ │ (pipe)  │  │ (graph) │  │ (obj)   │
   └─────────┘  └──────────┘ └────┬────┘  └─────────┘  └─────────┘
                                  │
                           ┌──────▼───────┐
                           │ bio-pipeline │
                           │ (Docker)     │
                           │ strobealign  │
                           │ SAMtools     │
                           │ bcftools     │
                           │ GATK         │
                           └──────────────┘

Database Schema Design

PostgreSQL — Entity Relationship

hospitals ──────────────────────┐
    │                           │
    ▼                           │
patients ──────────────┐       │
    │                  │       │
    ▼                  │       │
cases ──────────┐     │       │
    │           │     │       │
    ▼           │     │       │
samples_v2 ──┐ │     │       │
    │        │ │     │       │
    ▼        │ │     │       │
sequencing_runs │   │       │
    │        │ │     │       │
    ▼        │ │     │       │
fastq_files  │ │     │       │
    │        │ │     │       │
    ▼        │ │     │       │
pipeline_runs│ │     │       │
    │        │ │     │       │
    ▼        │ │     │       │
variants ────┘ │     │       │
    │          │     │       │
    ▼          │     │       │
annotations    │     │       │
    │          │     │       │
    ▼          │     │       │
clinical_reports    │       │
    │          │     │       │
    ▼          │     │       │
audit_logs ────┘─────┘───────┘
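
The parent-to-child chain at the top of the diagram can be illustrated with foreign keys. The following is a minimal sketch only: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL so that it runs without a server, it covers just the first three tables, and every column name is hypothetical. Only the table names come from the diagram.

```python
import sqlite3

# Stand-in for PostgreSQL: an in-memory SQLite database.
# Table names follow the diagram; all columns are hypothetical.
SCHEMA = """
CREATE TABLE hospitals (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE patients (
    id          INTEGER PRIMARY KEY,
    hospital_id INTEGER NOT NULL REFERENCES hospitals(id)
);
CREATE TABLE cases (
    id         INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patients(id)
);
"""


def open_demo_db() -> sqlite3.Connection:
    """Create the demo schema with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def cases_for_hospital(conn: sqlite3.Connection, hospital_id: int) -> list:
    """Walk hospitals -> patients -> cases with an explicit join."""
    rows = conn.execute(
        """
        SELECT c.id
        FROM cases AS c
        JOIN patients AS p ON p.id = c.patient_id
        WHERE p.hospital_id = ?
        ORDER BY c.id
        """,
        (hospital_id,),
    )
    return [row[0] for row in rows]
```

With enforcement on, inserting a patient that points at a missing hospital raises `sqlite3.IntegrityError`, which is the same guarantee the real schema would presumably rely on.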

Neo4j — Graph Model

(Gene) ──[:HAS_MUTATION]──→ (Mutation)
  │                            │
  │                            ▼
  │                        (Disease)
  │                            ▲
  │                            │
  └──[:INTERACTS_WITH]──→ (Gene)──[:HAS_MUTATION]──→ (Mutation)

(Drug) ──[:TARGETS]──→ (Gene) ──[:PARTICIPATES_IN]──→ (Pathway)

(Mutation) ──[:CITED_IN]──→ (Paper)
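
As an illustration of how this model might be queried, the sketch below builds a parameterised Cypher query that walks Drug → Gene → Mutation. The node labels and relationship types come from the diagram; the property names (`symbol`, `name`, `id`) and the helper `drugs_for_gene_query` are assumptions, and the query is returned as a string rather than sent through a driver.

```python
def drugs_for_gene_query(gene_symbol: str, limit: int = 25):
    """Return (cypher, params) for drugs targeting a gene, plus that gene's mutations.

    Property names are hypothetical; labels and relationship types
    follow the graph model above.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    cypher = (
        "MATCH (d:Drug)-[:TARGETS]->(g:Gene {symbol: $symbol}) "
        "OPTIONAL MATCH (g)-[:HAS_MUTATION]->(m:Mutation) "
        "RETURN d.name AS drug, collect(DISTINCT m.id) AS mutations "
        "ORDER BY drug LIMIT $limit"
    )
    # Values travel as parameters so they never end up in the query text.
    return cypher, {"symbol": gene_symbol, "limit": limit}
```

Keeping user-supplied values in the parameter map rather than interpolating them avoids injection and lets the database reuse the query plan.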

API Design Patterns

Versioning

All clinical endpoints are versioned under /api/v1/. System endpoints (health, auth, genome, storage, graph, agents) are unversioned.
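
The rule can be expressed as a small path builder. This is a sketch under assumptions: the helper `route_path` is hypothetical, and `cases` is used as an example clinical group only because it appears in the schema diagram.

```python
# Groups listed as unversioned in the text above.
SYSTEM_GROUPS = {"health", "auth", "genome", "storage", "graph", "agents"}
API_PREFIX = "/api/v1"


def route_path(group: str, suffix: str = "") -> str:
    """Build the public path for a route group under the versioning rule."""
    prefix = "" if group in SYSTEM_GROUPS else API_PREFIX
    cleaned = suffix.strip("/")
    tail = f"/{cleaned}" if cleaned else ""
    return f"{prefix}/{group}{tail}"


print(route_path("auth", "login"))         # /auth/login
print(route_path("cases", "42/variants"))  # /api/v1/cases/42/variants
```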

Authentication Flow

Client                    API                     PostgreSQL
  │                        │                           │
  ├── POST /auth/login ───→│                           │
  │   {email, password}    ├── SELECT user ───────────→│
  │                        │←── user data ─────────────┤
  │                        ├── Argon2.verify ──→       │
  │                        ├── Generate JWT ──→        │
  │                        ├── Log audit event ───────→│
  │←── {access_token} ─────┤                           │
  │                        │                           │
  ├── GET /api/v1/* ──────→│                           │
  │   Authorization: Bearer├── Decode JWT ──→          │
  │                        ├── Verify RBAC ──→         │
  │                        ├── Execute handler ──→     │
  │←── {response} ─────────┤                           │
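
The steps in this flow (verify password, issue token, decode token, check RBAC) can be sketched with the standard library alone. Several things here are assumptions rather than the project's implementation: PBKDF2 stands in for Argon2, which is not in the standard library; HS256 is assumed as the JWT algorithm; and the claim names, role names, and `ROLE_PERMISSIONS` table are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical role table; the real RBAC rules are not shown in this document.
ROLE_PERMISSIONS = {
    "clinician": {"cases:read"},
    "admin": {"cases:read", "cases:write"},
}


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 is a stand-in for Argon2 so the sketch needs no third-party package.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    return hmac.compare_digest(hash_password(password, salt), expected)


def issue_token(subject: str, role: str, secret: bytes, ttl: int = 900) -> str:
    """Build a compact HS256 JWT carrying sub, role and exp claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "role": role, "exp": int(time.time()) + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_token(token: str, secret: bytes) -> dict:
    """Check signature and expiry, then return the claims."""
    try:
        signing_input, signature_b64 = token.rsplit(".", 1)
        payload_b64 = signing_input.split(".")[1]
        signature = _b64url_decode(signature_b64)
    except (ValueError, IndexError):
        raise PermissionError("malformed token") from None
    expected = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims["exp"] < time.time():
        raise PermissionError("token expired")
    return claims


def require_permission(claims: dict, permission: str) -> None:
    """RBAC gate: raise unless the token's role grants the permission."""
    if permission not in ROLE_PERMISSIONS.get(claims.get("role"), set()):
        raise PermissionError(f"role lacks {permission}")
```

The signature is checked before the payload is parsed, so untrusted JSON is never decoded from a token that fails verification.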

Pipeline Execution Flow

Client                    API              Docker              Nextflow
  │                        │                │                    │
  ├── POST pipeline-runs →│                │                    │
  │                        ├── docker exec →│                    │
  │                        │                ├── nextflow run ───→│
  │                        │                │                    │
  │                        │←── SSE stream ─┤←── log output ────┤
  │←── SSE events ─────────┤                │                    │
  │                        │                │                    │
  │                        │                │←── completion ─────┤
  │                        │←── status ─────┤                    │
  │←── {status: complete} ─┤                │                    │
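
The relay step, where Nextflow log output becomes SSE events followed by a final status, might look like the sketch below. The event names `log` and `status`, the `failed` status value, and the helpers `sse_event` and `stream_run` are assumptions; only `{status: complete}` appears in the diagram. The log source is passed in as a plain iterable so the example runs without Docker.

```python
import json
from typing import Callable, Iterable, Iterator


def sse_event(event: str, data: dict) -> str:
    """Serialise one Server-Sent Events frame.

    json.dumps escapes newlines, so the data field stays on one line.
    """
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_run(
    log_lines: Iterable[str], get_exit_code: Callable[[], int]
) -> Iterator[str]:
    """Yield one frame per non-empty log line, then a final status frame."""
    for line in log_lines:
        text = line.rstrip("\n")
        if text:
            yield sse_event("log", {"line": text})
    # The exit code is only read once the log stream is exhausted.
    status = "complete" if get_exit_code() == 0 else "failed"
    yield sse_event("status", {"status": status})
```

In a real handler `log_lines` would presumably be the stdout of the `docker exec` process, and the generator would be handed to a streaming response with the `text/event-stream` content type.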

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| strobealign over BWA-MEM | 5-8x faster alignment with comparable accuracy. Critical for development hardware. |
| CRAM over BAM | 30-60% storage savings. Essential when working with limited disk. |
| Nextflow over Snakemake | Better Docker integration, resume capability, native Conda/Singularity support. |
| Neo4j over raw NetworkX | Production-grade graph DB with Cypher, indexing, transactions. NetworkX for in-memory analysis only. |
| MinIO over local filesystem | S3-compatible API, versioning, bucket policies. Drop-in replacement for AWS S3 in production. |
| No ORM (raw asyncpg) | Genomic queries need fine-grained SQL control. ORM overhead not justified for this domain. |
| Custom agents over LangChain | Domain-specific logic doesn't benefit from generic framework abstractions. Simpler debugging. |
| File-based LLM cache | Simple, no additional infrastructure. Sufficient for development. Redis for production. |
| Zustand over Redux | Minimal boilerplate, TypeScript-first, sufficient for dashboard state management. |
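
To make the file-based LLM cache decision concrete, here is a minimal sketch of what such a cache could look like. The class name `FileLLMCache`, the key scheme (SHA-256 over model and prompt), and the JSON layout are all assumptions; the project's actual cache format is not described in this document.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


class FileLLMCache:
    """One JSON file per (model, prompt) pair, named by a SHA-256 digest."""

    def __init__(self, directory) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, model: str, prompt: str) -> Path:
        key = hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
        return self.directory / f"{key}.json"

    def get(self, model: str, prompt: str) -> Optional[str]:
        """Return the cached response, or None on a miss."""
        path = self._path(model, prompt)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))["response"]

    def put(self, model: str, prompt: str, response: str) -> None:
        """Write to a temporary file, then rename it into place."""
        path = self._path(model, prompt)
        tmp = path.with_suffix(".tmp")
        record = {"model": model, "prompt": prompt, "response": response}
        tmp.write_text(json.dumps(record), encoding="utf-8")
        # Renaming avoids leaving a half-written entry if the process dies.
        tmp.replace(path)
```

Because `get` and `put` are the only entry points, swapping in Redis for production would mean reimplementing those two methods behind the same interface.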