---
license: mit
tags:
  - cancer-genomics
  - bioinformatics
  - graph-database
  - neo4j
  - distributed-computing
  - boinc
  - healthcare
  - genomics
  - fastq
  - blast
  - variant-calling
  - gdc-portal
  - tcga
library_name: cancer-at-home-v2
pipeline_tag: other
metrics:
  - accuracy
  - bleu
  - bleurt
---

Cancer@Home v2

🧬 Overview

Cancer@Home v2 is a distributed computing platform for cancer genomics research. It combines BOINC distributed computing, GDC cancer data analysis, sequence processing (FASTQ/BLAST), and Neo4j graph visualization in a single system.

Inspired by Cancer@Home v1 and Andrew Kamal's Neo4j Dashboard, this platform makes cancer genomics research accessible, distributed, and visual.

🎯 Key Features

  • 🌐 Interactive Web Dashboard - Modern UI with real-time visualizations
  • 🔍 Neo4j Graph Database - Model complex gene-mutation-patient relationships
  • BOINC Integration - Distributed computing for intensive analyses
  • 📊 GraphQL API - Flexible data querying
  • 🧪 Bioinformatics Pipeline - FASTQ processing, BLAST alignment, variant calling
  • 📚 GDC Portal Integration - Access TCGA/TARGET cancer datasets
  • 🚀 Quick Setup - Up and running in under 5 minutes

🏗️ Architecture

┌─────────────────────────────────────────────┐
│     Web Dashboard (D3.js + Chart.js)        │
├─────────────────────────────────────────────┤
│     FastAPI Backend (REST + GraphQL)        │
├──────┬──────┬──────┬──────┬────────────────┤
│Neo4j │BOINC │ GDC  │FASTQ │ BLAST/Variant  │
│Graph │Client│ API  │  QC  │    Calling     │
└──────┴──────┴──────┴──────┴────────────────┘

📦 Installation

Prerequisites

  • Python 3.8+
  • Docker Desktop
  • 8 GB RAM (16 GB recommended)

Quick Start

Windows:

git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
.\setup.ps1
python run.py

Linux/Mac:

git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
chmod +x setup.sh
./setup.sh
python run.py

Then open: http://localhost:5000

🚀 Usage

Web Dashboard

Access the interactive dashboard at http://localhost:5000 with:

  • Dashboard Tab: Overview statistics and mutation charts
  • Neo4j Visualization: Interactive graph of cancer relationships
  • BOINC Tasks: Submit and monitor distributed computing tasks
  • GDC Data: Browse and download cancer datasets
  • Pipeline Tools: Run FASTQ QC, BLAST, and variant calling

GraphQL API

Query cancer data at http://localhost:5000/graphql

Example: Get mutations in the TP53 gene

query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}

Example: Get patient statistics

query {
  cancerStatistics(cancer_type_id: "BRCA") {
    total_patients
    total_mutations
    avg_mutations_per_patient
  }
}
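
The queries above can also be sent from a script. The following is a minimal sketch using only the Python standard library. It assumes the endpoint accepts the usual GraphQL-over-HTTP form (a JSON POST body with a `query` key); `build_graphql_request` and `run_query` are hypothetical helper names, not part of the project. Note that Strawberry converts snake_case names to camelCase by default, so fields such as `mutation_id` may need to be written `mutationId` depending on how the schema is configured.

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:5000/graphql"  # endpoint given in this README

TP53_QUERY = """
query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}
"""


def build_graphql_request(query: str, url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Wrap a GraphQL query string in a JSON POST request (hypothetical helper)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query: str, url: str = GRAPHQL_URL) -> dict:
    """Send the query and decode the JSON response (needs a running server)."""
    request = build_graphql_request(query, url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started via `python run.py`, calling `run_query(TP53_QUERY)` should return the decoded response as a dictionary.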

REST API

Database Summary:

curl http://localhost:5000/api/neo4j/summary

Submit BOINC Task:

curl -X POST http://localhost:5000/api/boinc/submit \
  -H "Content-Type: application/json" \
  -d '{"workunit_type": "variant_calling", "input_file": "sample.fastq"}'
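
The same BOINC submission can be made from Python. This is a sketch under two assumptions: the endpoint accepts the exact JSON body shown in the curl example, and it replies with JSON. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # address given in this README


def build_boinc_submit_request(workunit_type: str, input_file: str,
                               base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request that mirrors the curl example (hypothetical helper)."""
    payload = {"workunit_type": workunit_type, "input_file": input_file}
    return urllib.request.Request(
        f"{base_url}/api/boinc/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the reply, assuming the server answers in JSON."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_boinc_submit_request("variant_calling", "sample.fastq"))` submits the task from the curl example once the server is running.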

Python API

FASTQ Processing:

from backend.pipeline import FASTQProcessor

processor = FASTQProcessor()
stats = processor.calculate_statistics("input.fastq")
filtered = processor.quality_filter("input.fastq")
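
To give a feel for what a quality filter of this kind does, here is a self-contained sketch. It is not the project's `FASTQProcessor`. It assumes Phred+33 quality encoding and filters on mean read quality and read length, using the `quality_threshold: 20` and `min_length: 50` values from config.yml as defaults. The filtering criterion itself is an assumption.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (read id, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from standard 4-line FASTQ entries, skipping blank lines."""
    clean = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(clean) - 3, 4):
        header, seq, plus, qual = clean[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near non-blank line {i + 1}")
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def quality_filter(records: Iterable[Record], quality_threshold: float = 20,
                   min_length: int = 50) -> List[Record]:
    """Keep reads that are long enough and whose mean quality meets the threshold."""
    return [
        rec for rec in records
        if len(rec[1]) >= min_length and mean_phred(rec[2]) >= quality_threshold
    ]
```

To run it on a file, open the file and pass the handle to `parse_fastq`, then pass the resulting records to `quality_filter`.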

Variant Calling:

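from backend.pipeline import VariantCaller, VariantAnalyzer

caller = VariantCaller()
# Call variants from the aligned reads against the reference genome
vcf_file = caller.call_variants("alignment.bam", "reference.fa")
# Apply quality filtering to the called variants
variants = caller.filter_variants(vcf_file)

analyzer = VariantAnalyzer()
# Identify cancer-associated variants, then compute tumor mutation burden (TMB)
cancer_variants = analyzer.identify_cancer_variants(variants)
tmb = analyzer.calculate_mutation_burden(variants)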
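
Tumor mutation burden is conventionally reported as mutations per megabase of sequenced region. The sketch below shows that arithmetic only. It is not the project's `calculate_mutation_burden`, and both the function name and the requirement to pass the covered region size explicitly are assumptions.

```python
def tumor_mutation_burden(variant_count: int, covered_bases: int) -> float:
    """Return mutations per megabase for a given covered region size in bases."""
    if variant_count < 0:
        raise ValueError("variant_count must not be negative")
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    megabases = covered_bases / 1_000_000
    return variant_count / megabases
```

For example, 150 variants over a 30,000,000-base region gives 150 / 30 = 5 mutations per megabase.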

Neo4j Queries:

from backend.neo4j import DatabaseManager

db = DatabaseManager()
# Find every mutation that affects TP53
query = """
MATCH (g:Gene {symbol: 'TP53'})<-[:AFFECTS]-(m:Mutation)
RETURN m.position, m.consequence
"""
try:
    results = db.execute_query(query)
finally:
    db.close()  # release the connection even if the query fails
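
When the gene symbol comes from user input, a parameterized query avoids building Cypher by string concatenation. The sketch below uses the official `neo4j` Python driver directly rather than the project's `DatabaseManager`, whose parameter support is not documented here. The username `neo4j` is an assumption (config.yml lists only the URI and password), and `mutations_for_gene` is a hypothetical helper.

```python
MUTATIONS_FOR_GENE = """
MATCH (g:Gene {symbol: $symbol})<-[:AFFECTS]-(m:Mutation)
RETURN m.position AS position, m.consequence AS consequence
"""


def mutations_for_gene(symbol: str,
                       uri: str = "bolt://localhost:7687",
                       user: str = "neo4j",          # assumed default username
                       password: str = "cancer123"   # value from config.yml
                       ) -> list:
    """Run the parameterized query and return one dict per mutation."""
    from neo4j import GraphDatabase  # third-party: pip install neo4j

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            result = session.run(MUTATIONS_FOR_GENE, symbol=symbol)
            # Consume the result while the session is still open.
            return [record.data() for record in result]
```

With the database running, `mutations_for_gene("TP53")` should return a list of dictionaries with `position` and `consequence` keys.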

📊 Data Model

Neo4j Graph Schema

Nodes:

  • Gene: Genes with mutations (TP53, BRCA1, KRAS, etc.)
  • Mutation: Genetic variants with position and consequence
  • Patient: Individual cases with demographics
  • CancerType: Cancer classifications (BRCA, LUAD, COAD, GBM)

Relationships:

  • Mutation → AFFECTS → Gene
  • Patient → HAS_MUTATION → Mutation
  • Patient → DIAGNOSED_WITH → CancerType
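
To make the schema concrete, the three relationship types can be mimicked with plain Python edge lists. This is an illustrative sketch only: the patient and mutation identifiers are invented toy values, not the bundled sample data, and the statistics mirror the field names of the `cancerStatistics` GraphQL example without claiming to match its implementation.

```python
# Toy edges, invented for illustration (not the bundled sample data).
AFFECTS = [("M1", "TP53"), ("M2", "KRAS"), ("M3", "TP53")]   # Mutation -> Gene
HAS_MUTATION = [("P1", "M1"), ("P1", "M2"), ("P2", "M3")]    # Patient -> Mutation
DIAGNOSED_WITH = [("P1", "LUAD"), ("P2", "BRCA")]            # Patient -> CancerType


def patients_with_gene_mutation(gene: str) -> list:
    """Follow Gene <- AFFECTS <- Mutation <- HAS_MUTATION <- Patient."""
    mutations = {m for m, g in AFFECTS if g == gene}
    return sorted({p for p, m in HAS_MUTATION if m in mutations})


def cancer_statistics(cancer_type: str) -> dict:
    """Count patients and their mutations for one cancer type."""
    patients = {p for p, c in DIAGNOSED_WITH if c == cancer_type}
    total_mutations = sum(1 for p, _ in HAS_MUTATION if p in patients)
    average = total_mutations / len(patients) if patients else 0.0
    return {
        "total_patients": len(patients),
        "total_mutations": total_mutations,
        "avg_mutations_per_patient": average,
    }
```

In Neo4j the same traversals are expressed as Cypher path patterns. The lists here only show which node types each relationship connects.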

Sample Data Included

  • 7 Genes: TP53, BRAF, BRCA1, BRCA2, PIK3CA, KRAS, EGFR
  • 5 Mutations: Cancer-associated variants
  • 5 Patients: Representative TCGA cases
  • 4 Cancer Types: BRCA, LUAD, COAD, GBM

🔧 Technology Stack

  • Backend: FastAPI, Python 3.8+
  • Database: Neo4j 5.13 (Graph Database)
  • API: GraphQL (Strawberry), REST
  • Frontend: HTML5, CSS3, JavaScript, D3.js, Chart.js
  • Bioinformatics: Biopython, BLAST+
  • Data Source: GDC Portal API (TCGA/TARGET)
  • Infrastructure: Docker, Docker Compose
  • Distributed Computing: BOINC Framework

📚 Documentation

🎓 Use Cases

  1. Cancer Research: Analyze genomics data with distributed computing
  2. Education: Learn cancer genetics and bioinformatics
  3. Data Visualization: Explore gene-mutation-patient relationships
  4. Pipeline Development: Test bioinformatics workflows
  5. Graph Analytics: Query complex biological networks

🔬 Supported Cancer Projects

  • TCGA-BRCA: Breast Cancer (1,098 cases)
  • TCGA-LUAD: Lung Adenocarcinoma (585 cases)
  • TCGA-COAD: Colon Adenocarcinoma (461 cases)
  • TCGA-GBM: Glioblastoma (617 cases)
  • TARGET-AML: Acute Myeloid Leukemia (238 cases)
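
Case data for these projects comes from the public GDC API. As a rough sketch, a request URL for the cases of one project could be built as follows. The filter field `project.project_id` and the response layout reflect my reading of the GDC API documentation and should be checked against it. The helper name is hypothetical and is not part of this project's GDC client.

```python
import json
from urllib.parse import urlencode

GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def build_case_query_url(project_id: str, size: int = 1) -> str:
    """Build a GDC /cases URL restricted to one project (hypothetical helper)."""
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": [project_id]},
    }
    params = {
        "filters": json.dumps(filters),  # GDC expects the filter as a JSON string
        "fields": "case_id",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_CASES_ENDPOINT}?{urlencode(params)}"
```

Fetching the resulting URL should return JSON whose pagination section reports the number of matching cases.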

📈 Bioinformatics Pipeline

FASTQ Processing

  • Quality control and filtering
  • Adapter trimming
  • Statistics calculation
  • QC report generation
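
Adapter trimming can be illustrated with a small exact-match sketch. It is a simplification and not the project's implementation: real trimmers tolerate mismatches, while this one removes only a full adapter occurrence or an exact adapter prefix at the 3' end of the read. The function name and the `min_overlap` default are assumptions.

```python
from typing import Tuple


def trim_adapter(seq: str, qual: str, adapter: str,
                 min_overlap: int = 3) -> Tuple[str, str]:
    """Cut the read (and its quality string) where the 3' adapter begins."""
    cut = seq.find(adapter)
    if cut == -1:
        # No full adapter found: look for a partial adapter at the read's end,
        # trying the longest possible overlap first.
        longest = min(len(adapter) - 1, len(seq))
        for k in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                cut = len(seq) - k
                break
    if cut == -1:
        return seq, qual
    return seq[:cut], qual[:cut]
```

The adapter sequence to pass depends on the library preparation kit used for sequencing.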

BLAST Alignment

  • BLASTN for nucleotide sequences
  • BLASTP for protein sequences
  • Hit filtering by identity/e-value
  • Homology detection
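
Hit filtering by identity and e-value can be sketched against BLAST+ tabular output. This assumes results were written with `-outfmt 6` (the standard 12-column tab-separated format). Whether the project uses that format is not stated here. The `min_identity` default of 90 is an arbitrary illustration, while `max_evalue` follows the `evalue: 0.001` setting in config.yml.

```python
from typing import Dict, Iterable, List

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast6(lines: Iterable[str]) -> List[Dict]:
    """Parse tab-separated BLAST hits into dictionaries."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < len(BLAST6_COLUMNS):
            raise ValueError(f"expected 12 columns, got {len(fields)}")
        hit = dict(zip(BLAST6_COLUMNS, fields))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def filter_hits(hits: Iterable[Dict], min_identity: float = 90.0,
                max_evalue: float = 0.001) -> List[Dict]:
    """Keep hits with enough percent identity and a small enough e-value."""
    return [
        h for h in hits
        if h["pident"] >= min_identity and h["evalue"] <= max_evalue
    ]
```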

Variant Calling

  • VCF generation from alignments
  • Quality filtering
  • Cancer variant identification
  • Tumor mutation burden (TMB) calculation
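
The quality-filtering step can be sketched on raw VCF lines. This is an assumption-laden illustration, not the project's `filter_variants`: it keeps header lines, then keeps records whose QUAL is at least `min_qual` and whose FILTER column is `PASS` or `.`. The threshold of 30 is an arbitrary example value.

```python
from typing import Iterable, List


def filter_vcf_lines(lines: Iterable[str], min_qual: float = 30.0) -> List[str]:
    """Return header lines plus the records that pass the QUAL and FILTER checks."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # keep meta-information and the column header
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record (8 fixed columns required)
        qual, filter_status = fields[5], fields[6]
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low quality
        if filter_status not in ("PASS", "."):
            continue  # flagged by an upstream filter
        kept.append(line)
    return kept
```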

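🌐 Access Points

  • Web Dashboard: http://localhost:5000
  • GraphQL API: http://localhost:5000/graphql
  • Health Check: http://localhost:5000/api/health
  • Neo4j (Bolt): bolt://localhost:7687
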
🛠️ Configuration

Edit config.yml to customize:

neo4j:
  uri: "bolt://localhost:7687"
  password: "cancer123"

gdc:
  download_dir: "./data/gdc"
  projects: ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"]

pipeline:
  fastq:
    quality_threshold: 20
    min_length: 50
  blast:
    evalue: 0.001
    num_threads: 4
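
A script that reads this file might merge it over a set of defaults so that a partial config.yml still yields every setting. The sketch below assumes PyYAML is installed and that falling back to the values shown above is acceptable. How the project itself loads its configuration is not documented here, and both helper names are hypothetical.

```python
import copy

# Defaults copied from the config.yml example in this README.
DEFAULTS = {
    "neo4j": {"uri": "bolt://localhost:7687", "password": "cancer123"},
    "gdc": {
        "download_dir": "./data/gdc",
        "projects": ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"],
    },
    "pipeline": {
        "fastq": {"quality_threshold": 20, "min_length": 50},
        "blast": {"evalue": 0.001, "num_threads": 4},
    },
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay overrides on a copy of defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: str = "config.yml") -> dict:
    """Read config.yml and fill in any missing keys from DEFAULTS."""
    import yaml  # third-party: pip install pyyaml

    with open(path, encoding="utf-8") as handle:
        return merge_config(DEFAULTS, yaml.safe_load(handle))
```

For instance, a config.yml that sets only `pipeline.blast.num_threads` would still produce the default `evalue` of 0.001.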

🤝 Contributing

Contributions are welcome! This project is open source under the MIT License.

Development Setup

python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt
pytest test_cancer_at_home.py

📄 License

MIT License - See LICENSE file

Copyright (c) 2025 OpenPeer AI, Riemann Computing Inc., Bleunomics, Andrew Magdy Kamal

🙏 Acknowledgments

Inspiration

  • Cancer@Home v1
  • Andrew Kamal's Neo4j Dashboard

Data Sources

  • GDC Portal (TCGA/TARGET datasets)

Technologies

  • Neo4j Graph Database
  • BOINC Distributed Computing Project
  • Biopython Community
  • FastAPI Framework

👥 Authors

  • OpenPeer AI - Core development and architecture
  • Riemann Computing Inc. - Distributed computing integration
  • Bleunomics - Bioinformatics pipeline and genomics expertise
  • Andrew Magdy Kamal - Graph database design and visualization

📞 Support

  • Documentation: See project documentation files
  • Issues: Check logs in logs/cancer_at_home.log
  • Configuration: Review config.yml
  • Health Check: http://localhost:5000/api/health

🔮 Roadmap

Planned Features

  • Machine learning for mutation prediction
  • Multi-omics data integration (RNA-seq, proteomics)
  • Survival analysis and clinical outcomes
  • Advanced graph algorithms (PageRank, community detection)
  • Cloud deployment support (AWS, Azure, GCP)
  • Mobile-responsive design
  • User authentication and authorization

📊 Statistics

  • Lines of Code: 5,000+
  • Modules: 9 Python modules
  • API Endpoints: 15+ (REST and GraphQL)
  • Documentation: 2,500+ lines
  • Setup Time: < 5 minutes
  • Sample Data: 7 genes, 5 mutations, 5 patients

🎯 Citation

If you use Cancer@Home v2 in your research, please cite:

@software{cancer_at_home_v2,
  title = {Cancer@Home v2: Distributed Cancer Genomics Research Platform},
  author = {OpenPeer AI and Riemann Computing Inc. and Bleunomics and Andrew Magdy Kamal},
  year = {2025},
  url = {https://huggingface.co/OpenPeerAI/CancerAtHomeV2},
  license = {MIT}
}

🏷️ Tags

cancer-genomics bioinformatics neo4j graph-database distributed-computing boinc fastq blast variant-calling gdc-portal tcga target graphql fastapi python docker healthcare precision-medicine computational-biology


Made with ❤️ by OpenPeer AI, Riemann Computing Inc., Bleunomics, and Andrew Magdy Kamal

For cancer research, by researchers, accessible to all.