Cellian is an integrated platform that combines deep learning models, pathway analysis, and AI-powered reasoning to predict and analyze the effects of genetic and drug perturbations on cellular systems. The system provides end-to-end predictions from perturbations → RNA → protein, with comprehensive pathway enrichment analysis and hypothesis generation.
- Gene Perturbations: Predict RNA and protein changes from CRISPR knockouts, knockdowns, and overexpression
- Drug Perturbations: Analyze drug effects on cellular transcriptomics and proteomics
- Dual Perturbations: Compare gene and drug perturbations side-by-side
- Pathway Analysis: Comprehensive GSEA, KEGG, Reactome, and GO enrichment analysis
- 3D Visualizations: Interactive 3D cell and network visualizations
- AI-Powered Reasoning: LLM-based query processing and hypothesis generation
- Perturbation → RNA: Uses STATE model for gene perturbations and ST-Tahoe for drug perturbations
- RNA → Protein: Leverages scTranslator for protein expression prediction
- Evaluation: R², Pearson correlation, RMSE, and MAE metrics against ground truth data
- Differential Expression Analysis: RNA and protein level comparisons
- Pathway Enrichment: KEGG, Reactome, and Gene Ontology analysis
- Gene Set Enrichment Analysis (GSEA): Pathway-level insights
- Interactive Dashboards: Real-time results visualization with plots and tables
- Natural Language Queries: Ask questions in plain English
- Intelligent Parsing: Automatically detects gene/drug perturbations from queries
- Hypothesis Generation: AI-powered reasoning about cellular mechanisms
- Workflow Orchestration: Automated pipeline execution based on user queries
- 3D Cell Visualization: Animated view of a perturbation being injected into a cell
- 3D Network Graph: Interactive translator network showing data flow
- Real-time Progress: Live updates during pipeline execution
- Results Dashboard: Comprehensive view of all analysis results
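The evaluation metrics named in the list above (R², Pearson correlation, RMSE, MAE) can be sketched in plain Python. This is a minimal illustration only, not the code in Agent_Tools/evaluation.py; the function name `regression_metrics` and its return keys are assumptions made for this example.

```python
import math
from typing import Dict, Sequence


def regression_metrics(truth: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Compare predicted expression values against ground truth.

    Hypothetical helper: name and return keys are illustrative only.
    """
    if len(truth) != len(pred) or len(truth) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(truth)
    mean_t = sum(truth) / n
    mean_p = sum(pred) / n

    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))   # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in truth)            # variation in ground truth
    ss_pred = sum((p - mean_p) ** 2 for p in pred)            # variation in predictions
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, pred))

    # R² and Pearson are undefined when a vector is constant, so return NaN.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    if ss_tot > 0 and ss_pred > 0:
        pearson = cov / math.sqrt(ss_tot * ss_pred)
    else:
        pearson = float("nan")

    return {
        "r2": r2,
        "pearson": pearson,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(truth, pred)) / n,
    }
```

A prediction that is offset by a constant keeps a Pearson correlation of 1 while R² drops, which is one reason to report both.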
┌───────────────────────────────────────────────────────────┐
│                     Frontend (React)                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Chatbot    │  │  3D Visuals  │  │  Dashboard   │     │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘     │
└─────────┼─────────────────┼─────────────────┼─────────────┘
          │                 │                 │
          └─────────────────┼─────────────────┘
                            │
┌───────────────────────────▼───────────────────────────────┐
│                   Backend API (FastAPI)                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │  LLM Layer   │  │ Agent Tools  │  │   Workflow   │     │
│  │   (Gemini)   │  │   Pipeline   │  │ Orchestrator │     │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘     │
└─────────┼─────────────────┼─────────────────┼─────────────┘
          │                 │                 │
          └─────────────────┼─────────────────┘
                            │
┌───────────────────────────▼───────────────────────────────┐
│            Agent Tools (Perturbation Pipeline)            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │    STATE     │  │ scTranslator │  │   Pathway    │     │
│  │    (RNA)     │  │  (Protein)   │  │   Analysis   │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└───────────────────────────────────────────────────────────┘
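The data flow in the diagram can also be read as an ordered list of stages. The sketch below only restates the routing described in this README (STATE for gene perturbations, ST-Tahoe for drug perturbations, then scTranslator, then pathway analysis). `plan_stages` and the stage labels are hypothetical and are not taken from perturbation_pipeline.py.

```python
from typing import List

# Which model produces the RNA prediction for each perturbation type,
# as stated in the feature list of this README.
RNA_MODELS = {"gene": "STATE", "drug": "ST-Tahoe"}


def plan_stages(perturbation_type: str) -> List[str]:
    """Return the ordered pipeline stages for one perturbation type.

    Hypothetical helper for illustration; the real orchestrator may differ.
    """
    try:
        rna_model = RNA_MODELS[perturbation_type]
    except KeyError:
        raise ValueError(
            f"unknown perturbation type {perturbation_type!r}; expected 'gene' or 'drug'"
        ) from None
    return [
        f"perturbation -> RNA ({rna_model})",
        "RNA -> protein (scTranslator)",
        "pathway analysis",
    ]
```

Calling `plan_stages("drug")` yields the same three stages as the gene case, with ST-Tahoe in place of STATE for the first step.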
- Python: 3.8 or higher
- Node.js: 18.x or higher (for frontend)
- CUDA: 11.8+ (for GPU acceleration)
- Models:
  - STATE model checkpoint
  - ST-Tahoe model
  - scTranslator checkpoint
- Clone the repository:
git clone <repository-url>
cd cellian
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install Python dependencies:
pip install -r requirements.txt
- Install backend-specific dependencies (run from the repository root):
pip install -r backend/requirements.txt
- Set up environment variables:
# Create .env file in backend/ or backend/llm/
cp backend/.env.example backend/.env
# Edit backend/.env and add:
# GOOGLE_API_KEY=your_gemini_api_key_here
- Navigate to frontend directory:
cd frontend
- Install dependencies:
npm install
# or
yarn install
- Start development server:
npm run dev
The frontend will be available at http://localhost:5173
Ensure you have the following models and data files:
- STATE Model: Place in /home/nebius/state/test_replogle/hepg2_holdout/
- ST-Tahoe Model: Place in /home/nebius/ST-Tahoe/
- scTranslator Checkpoint: Place in /home/nebius/scTranslator/checkpoint/
- Data Files: Place in data/perturb-cite-seq/
Update paths in Agent_Tools/perturbation_pipeline.py if using different locations.
Create a .env file at backend/.env or backend/llm/.env containing:
GOOGLE_API_KEY=your_gemini_api_key_here
# Alternative:
GEMINI_API_KEY=your_gemini_api_key_here
Default model paths (can be overridden via command-line arguments):
- STATE model: /home/nebius/state/test_replogle/hepg2_holdout/
- ST-Tahoe: /home/nebius/ST-Tahoe/
- scTranslator: /home/nebius/scTranslator/checkpoint/expression_fine-tuned_scTranslator.pt
Default data paths:
- Control template: data/perturb-cite-seq/scp1064_control_template.h5ad
- Ground truth RNA: data/perturb-cite-seq/RNA_expression_combined_mapped.h5ad
- Ground truth protein: data/perturb-cite-seq/protein_expression_mapped.h5ad
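Before launching the pipeline it can help to confirm that an API key is visible and that the default paths exist. The following is a minimal preflight sketch, not part of the Cellian codebase: `resolve_api_key`, `missing_paths`, and the dictionary keys are hypothetical names, and giving `GOOGLE_API_KEY` precedence over `GEMINI_API_KEY` is an assumption. It reads the process environment only and does not parse .env files.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping, Optional

# Default locations copied from this README; the dictionary keys are illustrative.
DEFAULT_PATHS: Dict[str, str] = {
    "state_model": "/home/nebius/state/test_replogle/hepg2_holdout/",
    "st_tahoe": "/home/nebius/ST-Tahoe/",
    "sctranslator": "/home/nebius/scTranslator/checkpoint/expression_fine-tuned_scTranslator.pt",
    "control_template": "data/perturb-cite-seq/scp1064_control_template.h5ad",
    "truth_rna": "data/perturb-cite-seq/RNA_expression_combined_mapped.h5ad",
    "truth_protein": "data/perturb-cite-seq/protein_expression_mapped.h5ad",
}


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the Gemini key from the environment, or None if neither variable is set.

    Assumption: GOOGLE_API_KEY wins when both variables are present.
    """
    env = os.environ if env is None else env
    return env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY") or None


def missing_paths(paths: Mapping[str, str] = DEFAULT_PATHS) -> List[str]:
    """Return the names of configured paths that do not exist on disk."""
    return [name for name, location in paths.items() if not Path(location).exists()]


if __name__ == "__main__":
    if resolve_api_key() is None:
        print("No GOOGLE_API_KEY or GEMINI_API_KEY found in the environment")
    for name in missing_paths():
        print(f"Missing: {name} -> {DEFAULT_PATHS[name]}")
```

Relative data paths are resolved against the current working directory, so run the check from the repository root.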
- Start the backend:
cd backend
python api.py
# Or use the startup script:
./start_backend.sh
The backend API will be available at http://localhost:8000
- Start the frontend (in a separate terminal):
cd frontend
npm run dev
- Open your browser:
Navigate to http://localhost:5173
- Ask a Question: Type a natural language query in the chatbot:
  - Gene perturbation: "What happens if I knock down TP53?"
  - Drug perturbation: "Dimethyl fumarate"
  - Both: "CHCHD2 vs Dimethyl fumarate"
- Select Condition: Choose an experimental condition (Control, IFNγ, or Co-culture)
- Watch the Pipeline:
  - 3D cell visualization shows perturbation injection
  - 3D network graph highlights active pipeline stages
  - Reasoning log shows real-time progress
- View Results: Check the "Results & Dashboard" tab for:
  - Differential expression analysis
  - Pathway enrichment results
  - Generated hypotheses
cd Agent_Tools
# Gene perturbation
python perturbation_pipeline.py --target-gene TP53
# Drug perturbation
python perturbation_pipeline.py --perturbation-type drug --drug "Dimethyl fumarate"
# Custom model and output paths
python perturbation_pipeline.py \
  --target-gene ACTB \
  --state-model-dir /path/to/state/model \
  --sctranslator-checkpoint /path/to/sctranslator.pt \
  --output-dir /path/to/output
See Agent_Tools/README.md for detailed command-line options.
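When the pipeline is driven from another script, the same invocations can be assembled as an argument list. This is a hedged sketch: `build_pipeline_command` is a hypothetical helper, and it uses only the flags that appear in this README (`--target-gene`, `--perturbation-type drug`, `--drug`, `--state-model-dir`, `--sctranslator-checkpoint`, `--output-dir`). It covers the single-gene and single-drug cases only, since no combined invocation is documented here.

```python
import shlex
from typing import List, Optional


def build_pipeline_command(
    *,
    target_gene: Optional[str] = None,
    drug: Optional[str] = None,
    state_model_dir: Optional[str] = None,
    sctranslator_checkpoint: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Build the argv list for perturbation_pipeline.py (illustrative helper)."""
    if (target_gene is None) == (drug is None):
        raise ValueError("pass exactly one of target_gene or drug")

    cmd = ["python", "perturbation_pipeline.py"]
    if target_gene is not None:
        cmd += ["--target-gene", target_gene]
    else:
        cmd += ["--perturbation-type", "drug", "--drug", drug]

    # Optional overrides are appended only when the caller supplies them.
    overrides = [
        ("--state-model-dir", state_model_dir),
        ("--sctranslator-checkpoint", sctranslator_checkpoint),
        ("--output-dir", output_dir),
    ]
    for flag, value in overrides:
        if value is not None:
            cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    # Print a shell-safe version of the drug example from this README.
    print(shlex.join(build_pipeline_command(drug="Dimethyl fumarate")))
```

Passing the list to `subprocess.run(cmd, cwd="Agent_Tools", check=True)` avoids shell quoting problems with drug names that contain spaces.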
cellian/
├── backend/ # FastAPI backend service
│ ├── api.py # Main API endpoints
│ ├── llm/ # LLM integration (Gemini)
│ │ ├── input.py # Query parsing
│ │ └── output.py # Result interpretation
│ └── Agent_Tools/ # Pipeline tools (symlinked)
│
├── frontend/ # React + TypeScript frontend
│ ├── src/
│ │ ├── components/ # React components
│ │ │ ├── Chatbot.tsx
│ │ │ ├── Cell3D.tsx
│ │ │ ├── TranslatorNetwork3D.tsx
│ │ │ └── Dashboard.tsx
│ │ └── pages/
│ │ └── Index.tsx # Main page
│ └── package.json
│
├── Agent_Tools/ # Core perturbation pipeline
│ ├── perturbation_pipeline.py # Main pipeline orchestrator
│ ├── state_inference.py # Gene → RNA prediction
│ ├── drug_inference.py # Drug → RNA prediction
│ ├── sctranslator_inference.py # RNA → Protein prediction
│ ├── pathway_analysis.py # Pathway enrichment
│ └── evaluation.py # Metrics calculation
│
├── llm/ # LLM reasoning layer
│ ├── hypothesis_agent.py # Hypothesis generation
│ └── perturbation_orchestrator.py
│
├── reasoning_layer/ # Graph-based reasoning
│ └── engine/ # Reasoning engine
│
├── data/ # Data files (not in git)
│ └── perturb-cite-seq/ # Perturb-CITE-seq dataset
│
├── logs/ # Workflow logs
├── requirements.txt # Python dependencies
└── README.md # This file
Process a natural language query and extract perturbation information.
Request:
{
  "query": "What happens if I knock down TP53?"
}
Response:
{
  "success": true,
  "perturbation_info": {
    "target": "TP53",
    "type": "KD",
    "has_both": false,
    "confidence": 0.9
  }
}

Start a perturbation workflow.
Request:
{
  "perturbation_info": {
    "target": "TP53",
    "type": "KD"
  },
  "condition": "Control",
  "perturbation_type": "gene"
}
Response:
{
  "workflow_id": "uuid-here",
  "status": "pending"
}

Get workflow status and results.
Response:
{
  "status": "running",
  "progress": 0.65,
  "current_step": "Predicting protein changes...",
  "pipeline_stage": "protein",
  "logs": [...],
  "results": {...}
}

Get workflow logs (streaming).
When the backend is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
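A client that starts a workflow typically polls the status endpoint until the run finishes. The sketch below shows one way to do that with the standard library. The route paths are not spelled out in this section, so the caller passes the full status URL (the Swagger UI lists the actual routes). The terminal status names `completed` and `failed` are assumptions, since only `pending` and `running` appear in the examples above; `http_json` and `wait_for_workflow` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

# Assumption: the README only shows "pending" and "running"; the names of the
# finished states are guesses and should be checked against /docs.
TERMINAL_STATUSES = frozenset({"completed", "failed"})


def http_json(url: str, payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """GET the URL, or POST the payload as JSON when one is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_workflow(
    status_url: str,
    fetch: Callable[[str], Dict[str, Any]] = http_json,
    interval: float = 2.0,
    max_polls: int = 300,
) -> Dict[str, Any]:
    """Poll the status URL until the workflow reports a terminal status."""
    for _ in range(max_polls):
        state = fetch(status_url)
        if state.get("status") in TERMINAL_STATUSES:
            return state
        time.sleep(interval)
    raise TimeoutError(f"workflow did not finish after {max_polls} polls")
```

Injecting `fetch` keeps the polling loop testable without a running backend, and `max_polls` bounds the wait if a workflow stalls.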
cd backend
# Install development dependencies
pip install -r requirements.txt
# Run with auto-reload
uvicorn api:app --reload --host 0.0.0.0 --port 8000

cd frontend
npm run dev

# Backend tests
cd backend
pytest tests/

# Frontend tests
cd frontend
npm test

- Python: Follow PEP 8, use black and ruff
- TypeScript: Follow ESLint rules, use Prettier
# Format Python code
black backend/
ruff check backend/
# Format TypeScript code
cd frontend
npm run lint

- LLM API Key Not Found
  - Ensure GOOGLE_API_KEY is set in backend/.env or backend/llm/.env
  - Verify the API key is valid and has proper permissions
- Model Files Not Found
  - Check model paths in Agent_Tools/perturbation_pipeline.py
  - Ensure all required checkpoint files exist
- Port Already in Use
  - Backend: Change port in api.py or use the --port flag
  - Frontend: Change port in vite.config.ts
- CORS Errors
  - Ensure backend CORS settings include your frontend URL
  - Check backend/api.py CORS middleware configuration
- Pipeline Fails
  - Check logs in logs/workflow_*.log
  - Verify data files exist and are accessible
  - Ensure all dependencies are installed
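For the "Port Already in Use" case, a quick way to see whether something is already listening on the backend or frontend port is a local connection attempt. This is a small standalone sketch using only the standard library; `port_in_use` is a hypothetical helper, and the ports are the defaults mentioned in this README.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for name, port in (("backend", 8000), ("frontend", 5173)):
        state = "in use" if port_in_use(port) else "free"
        print(f"{name} port {port}: {state}")
```

If a port is reported as in use before you start the service, pick a different port as described in the list above.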
We welcome contributions! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Write clear commit messages
- Add tests for new features
- Update documentation as needed
- Follow existing code style
- Ensure all tests pass before submitting
[Add your license here]
If you use Cellian in your research, please cite:
@software{cellian2024,
  title = {Cellian: Multi-Omics Hypothesis Engine},
  author = {[Your Name/Team]},
  year = {2024},
  url = {https://github.com/yourusername/cellian}
}
- STATE Model: For gene perturbation → RNA prediction
- ST-Tahoe: For drug perturbation → RNA prediction
- scTranslator: For RNA → protein translation
- Gemini API: For natural language processing
- React Three Fiber: For 3D visualizations
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built with ❤️ for the scientific community
Let's build the virtual cell together!
