A comprehensive LLM-powered system for evaluating patient eligibility for clinical trials using advanced agent-based workflows, vector databases, and interactive web interfaces.
LLM Pharma is an intelligent clinical trial management system that automates the evaluation of patients for potential clinical trials. The system utilizes Large Language Models (LLMs), vector databases, and agent-based workflows to:
- Analyze patient medical histories and generate comprehensive profiles
- Evaluate eligibility against institutional policies and trial criteria
- Match patients to relevant clinical trials with detailed explanations
- Reduce hallucinations by grading and verifying model outputs against their source context
- Provide interactive dashboards for clinical research coordinators
The LLM Pharma system follows a structured process flow to evaluate patient eligibility for clinical trials:
- Data Collection and Preprocessing: Import and prepare patient data for analysis
- Patient Profile Analysis: Utilize LLMs to analyze patient data and extract relevant information
- Eligibility Verification: Assess patient data against clinical trial policies to determine eligibility
- Trial Matching: Use agentic workflows to identify suitable trials for patients
- Hallucination Prevention: Implement a hallucination grader to ensure model outputs are contextually accurate
- Patient Data Management: SQLite database with 100+ demo patients
- Semantic Search: ChromaDB vector stores with Nomic embeddings
- Multi-Model LLM: Groq and OpenAI integration with fallback logic
- Policy Evaluation: Automated compliance checking with ReAct agent
- Trial Matching: Intelligent matching with metadata filtering
- Hallucination Prevention: Advanced grading with verification
- Interactive Interface: Modern web UI with real-time processing
- Self-Query Retrieval: Metadata-aware trial search
- Multi-step Workflows: LangGraph-based agent orchestration
- Tool Integration: Date calculations and numerical operations
- Profile Rewriting: Adaptive profile enhancement for better matches
- Thread Management: Multi-session support with state persistence
- Demo Mode: Dummy workflow for testing and demonstration
- Configuration Management: Hydra-based flexible configuration
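The hallucination-prevention feature listed above can be pictured as a generate, grade, retry loop. The sketch below is only an illustration under stated assumptions, not the project's implementation: `generate_with_guard`, `generate`, and `is_grounded` are hypothetical names, and the word-overlap grader is a deliberately crude stand-in for an LLM-based grader so that the example runs without API keys.

```python
from typing import Callable, Optional, Tuple


def overlap_grader(answer: str, context: str, threshold: float = 0.6) -> bool:
    """Toy grader: accept an answer if enough of its words occur in the context.

    Stand-in for an LLM-based grader; not the method used by the project.
    """
    answer_words = {w.strip(".,").lower() for w in answer.split()}
    context_words = {w.strip(".,").lower() for w in context.split()}
    answer_words.discard("")
    if not answer_words:
        return False
    supported = len(answer_words & context_words) / len(answer_words)
    return supported >= threshold


def generate_with_guard(
    generate: Callable[[int], str],
    context: str,
    is_grounded: Callable[[str, str], bool] = overlap_grader,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Call generate(attempt) until the grader accepts the output or attempts run out.

    Returns (accepted_answer_or_None, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        answer = generate(attempt)
        if is_grounded(answer, context):
            return answer, attempt
    return None, max_attempts


if __name__ == "__main__":
    context = "The demo trial enrolls adults with type 2 diabetes."
    # Hypothetical drafts: the first invents details, the second stays on context.
    drafts = {
        1: "The trial enrolls children with asthma in Canada.",
        2: "The trial enrolls adults with type 2 diabetes.",
    }
    answer, attempts = generate_with_guard(lambda n: drafts.get(n, ""), context)
    print(answer, attempts)
```

Passing the grader in as a callable keeps the retry logic independent of how grounding is judged, so a model-based grader could be swapped in without touching the loop.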
The system includes demo databases for:
- Patients: A SQLite database with 100+ demo patients
- Diseases: A comprehensive list of diseases for trial matching
- Clinical Trials: Data on various clinical trials for matching
- Clinical Policies: Institutional policies stored in ChromaDB for compliance checking
- RAG Strategies: The system employs different Retrieval-Augmented Generation strategies to enhance trial matching
- Hallucination Guard: Advanced grading and verification systems prevent hallucinations in model outputs
- Human-in-the-Loop: Allows clinical research coordinators to review and adjust trial matching results
- Python 3.12+: Core programming language
- LangChain: Framework for building LLM applications
- LangGraph: Workflow orchestration and agent management
- Groq: Primary LLM provider with multiple model options
- OpenAI: Alternative LLM provider for GPT models
- Nomic: Embeddings for semantic search
- ChromaDB: Vector database for semantic search
- Gradio: Web interface framework
- SQLite: Relational database for patient data
- Hydra: Configuration management
- Poetry: Dependency management
- Pytest: Testing framework
📹 Watch Demo Video - See the application in action with a real example and usage scenario.
- Python 3.12+
- Poetry (for dependency management)
- Groq API key (primary) or OpenAI API key (alternative)
- Clone the repository and navigate to the llm_pharma directory:

      cd llm_pharma

- Install dependencies using Poetry:

      make dev-install

- Set up environment variables:

      # Create .env file with your API keys
      echo "GROQ_API_KEY=your_groq_api_key_here" > .env
      echo "OPENAI_API_KEY=your_openai_api_key_here" >> .env

- Set up data (required for first run):

      make setup-data
Start the application:

    make run

Then visit http://127.0.0.1:7958 in your browser.

Run in demo mode:

    python frontend/app.py --demo

Run with a custom host and port:

    python frontend/app.py --host 0.0.0.0 --port 8080 --share

The system is built using a modular architecture with the following key components:
- WorkflowManager: Orchestrates the LangGraph-based evaluation workflow
- LLMManager: Handles multiple LLM models with fallback logic (Groq/OpenAI)
- DatabaseManager: Manages SQLite patient database and ChromaDB vector stores
- PolicyService: Handles policy evaluation and eligibility assessment
- TrialService: Manages trial matching and relevance scoring
- PatientCollector: Handles patient data collection and profile generation
- State Management: TypedDict-based workflow state management
- Gradio Web Interface: Interactive dashboard with multi-tab results
- Real-time Processing: Live workflow execution with status updates
- Thread Management: Multi-session support with state persistence
- Demo Mode: Dummy workflow for testing and demonstration
- Master Setup Script: Complete data initialization
- Patients Database Creator: SQLite database with demo patients
- Policies Vector Store Creator: ChromaDB for institutional policies
- Trials Vector Store Creator: ChromaDB for clinical trials data
- Hydra Configuration: Centralized config management with YAML files
- Environment Management: Secure API key and settings management
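To make the TypedDict-based state management concrete, here is a minimal sketch of what such a workflow state might look like. The keys `patient_id`, `policy_eligible`, and `trial_found` are the result keys shown in this README's programmatic usage example; every other field, all the types, the `policy_node` function, and its toy exclusion rule are assumptions for illustration and may not match `State.py`.

```python
from typing import List, TypedDict


class AgentState(TypedDict, total=False):
    """Illustrative workflow state; the real State.py may define different fields."""

    patient_prompt: str        # assumed: raw coordinator query
    patient_id: int            # key shown in the README usage example (type assumed)
    patient_profile: str       # assumed: LLM-generated patient summary
    policy_eligible: bool      # key shown in the README usage example
    trial_found: bool          # key shown in the README usage example
    matched_trials: List[str]  # assumed: identifiers of candidate trials


def policy_node(state: AgentState) -> AgentState:
    """Hypothetical node that returns only the keys it updates.

    The "pregnant" check is a made-up rule used purely to produce a value.
    """
    profile = state.get("patient_profile", "")
    return {"policy_eligible": "pregnant" not in profile.lower()}


if __name__ == "__main__":
    state: AgentState = {
        "patient_prompt": "Is patient 15 eligible for any clinical trials?",
        "patient_id": 15,
        "patient_profile": "58-year-old with hypertension.",
    }
    state.update(policy_node(state))  # merge the partial update into the state
    print(state["policy_eligible"])
```

Declaring the state with `total=False` lets each step fill in its own keys as the workflow progresses instead of requiring every field up front.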
llm_pharma/
├── backend/  # Backend modules
│   ├── my_agent/  # Core agent modules
│   │   ├── workflow_manager.py  # LangGraph workflow orchestration
│   │   ├── llm_manager.py  # Multi-model LLM management
│   │   ├── database_manager.py  # Database and vector store operations
│   │   ├── policy_service.py  # Policy evaluation and tools
│   │   ├── trial_service.py  # Trial matching and scoring
│   │   ├── patient_collector.py  # Patient data collection
│   │   └── State.py  # Workflow state management
│   └── README.md  # Backend documentation
├── frontend/  # Gradio web interface
│   ├── app.py  # Main frontend application
│   ├── helper_gui.py  # Comprehensive Gradio interface
│   ├── demo_graph.py  # Demo mode with dummy workflow
│   └── README.md  # Frontend documentation
├── scripts/  # Data setup scripts
│   ├── setup_all_data.py  # Master setup script
│   ├── create_patients_database.py  # Patients database creator
│   ├── create_policies_vectorstore.py  # Policies vector store creator
│   ├── create_trials_vectorstore.py  # Trials vector store creator
│   └── README.md  # Scripts documentation
├── config/  # Configuration files
│   └── config.yaml  # Main configuration
├── tests/  # Test suite
│   ├── unit/  # Unit tests
│   ├── integration/  # Integration tests
│   └── regression/  # Regression tests
├── data/  # Data files
├── vector_store/  # ChromaDB vector stores
├── sql_server/  # SQLite databases
├── source_data/  # Source documents
├── pyproject.toml  # Poetry configuration
├── Makefile  # Development automation
└── README.md  # This file
The system uses Hydra for configuration management. Key configuration files:
- config/config.yaml: Main configuration file
- Environment variables: GROQ_API_KEY, OPENAI_API_KEY
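As a rough illustration of how these two environment variables could drive provider selection, the sketch below lists which providers are usable, with Groq first. The function name `available_providers` and the priority order are assumptions based on this README describing Groq as the primary provider and OpenAI as the alternative; the project's `LLMManager` may work differently.

```python
import os
from typing import List, Mapping

# Provider name -> environment variable holding its API key (variable names from this README).
PROVIDER_KEYS = {"groq": "GROQ_API_KEY", "openai": "OPENAI_API_KEY"}


def available_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return providers whose API key is set and non-empty, in priority order."""
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    providers = available_providers()
    if providers:
        print("Usable providers, in priority order:", providers)
    else:
        print("No API key found: set GROQ_API_KEY or OPENAI_API_KEY before starting the app.")
```

Accepting the environment as a parameter makes the check easy to exercise in tests without modifying the real process environment.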
# Model settings
models:
  agent_models:
    - id: "mistral-saba-24b"
      provider: "groq"
  tool_models:
    - id: "llama-3.3-70b-versatile"
      provider: "groq"

# Directory paths
directories:
  sql_server: "sql_server"
  vector_store: "vector_store"

# File paths
files:
  policy_markdown: "source_data/instut_trials_policy.md"
  trials_csv: "data/trials_data.csv"

- Start the web interface:

      make run

- Enter a patient query:

      Is patient 15 eligible for any clinical trials?

- Review results in the detailed tabs:
  - Agent Tab: Workflow control and status
  - Patient Profile: Generated patient summary (editable)
  - Policy Evaluation: Institutional policy compliance
  - Trials Summary: Overview of matched trials
  - Potential Trials: Detailed trial information
  - Trials Scores: Comprehensive scoring and ranking
from omegaconf import OmegaConf
from backend.my_agent.workflow_manager import WorkflowManager
# Load configuration
config = OmegaConf.load("config/config.yaml")
# Initialize workflow manager
workflow_manager = WorkflowManager.from_config(config)
# Run evaluation
result = workflow_manager.run_workflow(
    patient_prompt="Is patient 5 eligible for any medical trial?",
    thread_id="example_session"
)
# Process results
print(f"Patient ID: {result['patient_id']}")
print(f"Policy Eligible: {result['policy_eligible']}")
print(f"Trials Found: {result['trial_found']}")

make install # Install production dependencies
make dev-install # Install development dependencies
make run # Run Gradio frontend
make setup-data # Set up all data (patients, policies, trials)
make test-all # Run all tests
make test-unit # Run unit tests only
make test-integration # Run integration tests only
make test-regression # Run regression tests
make lint # Run linting (ruff)
make format # Format code (black, isort)
make check # Run all checks (lint + format + test)
make clean # Clean up build artifacts

- Ruff: Fast Python linter
- Black: Code formatting
- isort: Import sorting
- pytest: Testing framework
The project includes comprehensive unit, integration, and regression tests:
# Run all tests
make test-all
# Run unit tests only
make test-unit
# Run integration tests only
make test-integration
# Run regression tests
make test-regression

- Unit Tests: Individual module functionality
- Integration Tests: Module interaction testing
- Regression Tests: End-to-end workflow validation
- Synthetic Data: All patient data is synthetic and created for demonstration purposes - no real names or persons are used
- Secure Configuration: Environment-based API key management
- Input Validation: Comprehensive data validation and sanitization
- Error Handling: Graceful error handling with informative messages
- Model Fallback: Automatic switching between LLM providers
- Caching: Vector store persistence for faster retrieval
- Optimized Queries: Efficient database operations
- Memory Management: SQLite checkpointing for workflow state
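The model-fallback behavior listed above can be sketched as trying each provider in order and moving on when a call fails. This is a minimal sketch under stated assumptions: `call_with_fallback` and the two stub provider functions are hypothetical, and real code would call the Groq and OpenAI clients and catch their specific exception types rather than a broad `Exception`.

```python
from typing import Callable, List, Sequence, Tuple


def call_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order and return (provider_name, response).

    Raises RuntimeError listing every failure if no provider succeeds.
    """
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # real code should catch provider-specific errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))


def flaky_groq(prompt: str) -> str:
    """Stub that simulates a failing primary provider."""
    raise TimeoutError("rate limited")


def stub_openai(prompt: str) -> str:
    """Stub that simulates a working secondary provider."""
    return f"echo: {prompt}"


if __name__ == "__main__":
    name, reply = call_with_fallback(
        "ping", [("groq", flaky_groq), ("openai", stub_openai)]
    )
    print(name, reply)
```

Collecting the per-provider errors before raising means that a total failure still reports why each provider was skipped.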
For detailed information about specific components:
- Backend Documentation: Core modules and workflow
- Frontend Documentation: Web interface and usage
- Scripts Documentation: Data setup and management
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Install development dependencies: `make dev-install`
- Make your changes and add tests
- Run the test suite: `make check`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
- Follow PEP 8 style guidelines
- Add type hints to all functions
- Write comprehensive tests for new features
- Update documentation as needed
- Use descriptive commit messages
This project is licensed under the MIT License - see the LICENSE file for details.
For questions, issues, or contributions:
- Check the Issues: Look for existing issues or create a new one
- Documentation: Review component-specific README files
- Tests: Run the test suite to verify functionality
- Enhanced RAG: Graph-based retrieval with entity relationships
- Multi-modal Support: Image and document processing
- Advanced Analytics: Trial success prediction and patient outcome analysis
- Integration APIs: RESTful APIs for external system integration
- Scalability: Distributed processing and cloud deployment options
- Embedding Optimization: Fine-tuning of embeddings for clinical domain specificity
Note: This system is designed for research and demonstration purposes. For production clinical trial management, consult with healthcare professionals and ensure compliance with relevant regulations and standards.
Bob Hosseini
- Email: bbkhosseini@gmail.com
- LinkedIn: linkedin.com/in/bobhosseini
- GitHub: github.com/bab-git
- Website: bob-hosseini-portfolio.web.app
Built with ❤️ for advancing clinical trial management through AI


