tweep/README.md

Jan-Hinnerk Vogel · tweep

Senior Principal Research Software Engineer · AI/ML Platforms for Scientific Discovery

I build production-grade AI systems at the intersection of biology, genomics, healthcare, and large-scale computation. After 15+ years architecting data platforms, ecosystems, and ML workflows at Genentech/Roche, the Wellcome Trust Sanger Institute (UK), and the Max Planck Institute (Germany), I'm now focused on the next generation of AI-driven platforms for drug discovery and precision medicine.


What I Build

AI-Enabled Scientific Workflows
Multi-agent LLM systems for laboratory automation, LIMS integration, clinical trial site selection, large-scale genomic workflows, and scientific data integration. Built production systems at Genentech that replaced weeks of manual analyst work with automated, auditable workflows, with human-in-the-loop review, parameter tracking, and data provenance and lineage.

Large-Scale Distributed Processing
Designed and operated genomic data pipelines scaling to 60,000+ parallel EC2 instances. Built FireDB and Sunrise, internal self-serve analysis platforms that serve 500+ scientists and research staff across Genentech's research organization and handle petabyte-scale genomic and clinical data.


Tech Stack

Domain          Tools
Languages       Python, Perl, SQL, Bash, Java, Go
ML / AI         LangChain, multi-agent frameworks (picoAgents), LLM APIs (Anthropic, OpenAI)
Cloud & Infra   AWS (EC2, S3, Batch, Lambda et al.), IaC, CI/CD, Docker
Data            PostgreSQL, MySQL

Background

  • Genentech / Roche — Acting Director, Research Software Engineering. Built AI-supported workflows for genomics at scale. Scaled distributed compute to 60K+ parallel AWS EC2 nodes, including GPU.
  • EMBL · Sanger Institute · Max Planck Institute — Early-career bioinformatics engineering across genome analysis, variant pipelines, and scientific data infrastructure.

I'm particularly drawn to problems where rigorous engineering meets scientific ambiguity — where the data is messy, the domain is deep, and getting it right actually matters for patients.


Currently

Exploring multi-agent system architectures and LLM integration patterns for scientific discovery. Also deepening expertise in computational approaches to drug discovery, with a focus on production-grade AI deployment in regulated research environments.

Open to senior engineering roles building AI platforms at the frontier of computational drug discovery and healthcare.


📍 San Francisco Bay Area, CA · LinkedIn · Available for senior platform / ML engineering roles

Pinned

  1. aws-genomics-workflows (Public)

    Forked from aws-samples/aws-genomics-workflows

    Genomics Workflows on AWS

    Shell

  2. cromwell (Public)

    Forked from broadinstitute/cromwell

    Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments

    Scala

  3. nf-tower (Public)

    Forked from seqeralabs/nf-tower

    Nextflow Tower system

    Groovy

  4. cBioPortal/cbioportal (Public)

    cBioPortal for Cancer Genomics

    Java · 962 stars · 819 forks