bab-git/llm_pharma

LLM Pharma - Clinical Trial Management System

A comprehensive LLM-powered system for evaluating patient eligibility for clinical trials using advanced agent-based workflows, vector databases, and interactive web interfaces.

🔬 Overview

LLM Pharma is an intelligent clinical trial management system that automates the evaluation of patients for potential clinical trials. The system utilizes Large Language Models (LLMs), vector databases, and agent-based workflows to:

  • Analyze patient medical histories and generate comprehensive profiles
  • Evaluate eligibility against institutional policies and trial criteria
  • Match patients to relevant clinical trials with detailed explanations
  • Reduce hallucinations by grading model outputs against their source context before they are shown
  • Provide interactive dashboards for clinical research coordinators

🧩 Process Flow

The LLM Pharma system follows a structured process flow to evaluate patient eligibility for clinical trials:

Pipeline Block Diagram

  1. Data Collection and Preprocessing: Import and prepare patient data for analysis
  2. Patient Profile Analysis: Utilize LLMs to analyze patient data and extract relevant information
  3. Eligibility Verification: Assess patient data against clinical trial policies to determine eligibility
  4. Trial Matching: Use agentic workflows to identify suitable trials for patients
  5. Hallucination Prevention: Implement a hallucination grader to ensure model outputs are contextually accurate
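The five steps above can be pictured as functions that each take the workflow state and return an updated copy. The following is a minimal sketch in plain Python, not the project's LangGraph implementation; every function name and state field here is a hypothetical stand-in.

```python
from typing import Callable, TypedDict


class EvalState(TypedDict, total=False):
    # Hypothetical fields; the real workflow state may differ.
    patient_prompt: str
    patient_record: str
    patient_profile: str
    policy_eligible: bool
    matched_trials: list[str]
    grounded: bool


Step = Callable[[EvalState], EvalState]


def collect_patient_data(state: EvalState) -> EvalState:
    # Stand-in for a database lookup.
    return {**state, "patient_record": f"record for: {state['patient_prompt']}"}


def build_profile(state: EvalState) -> EvalState:
    # Stand-in for an LLM call that summarizes the record.
    return {**state, "patient_profile": state["patient_record"].upper()}


def check_policy(state: EvalState) -> EvalState:
    # Stand-in for the policy evaluation step.
    return {**state, "policy_eligible": True}


def match_trials(state: EvalState) -> EvalState:
    # Stand-in for retrieval and relevance scoring.
    return {**state, "matched_trials": ["trial-A"]}


def grade_output(state: EvalState) -> EvalState:
    # Stand-in for the hallucination grader.
    return {**state, "grounded": bool(state.get("matched_trials"))}


def run_pipeline(state: EvalState, steps: list[Step]) -> EvalState:
    """Run the steps in order; stop early once the policy check rejects the patient."""
    for step in steps:
        state = step(state)
        if state.get("policy_eligible") is False:
            break
    return state
```

Calling `run_pipeline({"patient_prompt": "Is patient 15 eligible?"}, [collect_patient_data, build_profile, check_policy, match_trials, grade_output])` walks through all five stages. A graph framework adds branching, retries, and persistence on top of this basic shape.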

📊 Features

Core Capabilities

  • Patient Data Management: SQLite database with 100+ demo patients
  • Semantic Search: ChromaDB vector stores with Nomic embeddings
  • Multi-Model LLM: Groq and OpenAI integration with fallback logic
  • Policy Evaluation: Automated compliance checking with ReAct agent
  • Trial Matching: Intelligent matching with metadata filtering
  • Hallucination Prevention: A grader verifies that model outputs are supported by their context
  • Interactive Interface: Modern web UI with real-time processing
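To illustrate what "matching with metadata filtering" can mean, here is a minimal sketch that filters trials on a metadata field first and only then ranks the survivors. Word overlap stands in for embedding similarity, and the `Trial` fields and `match_trials` helper are hypothetical, not the project's API.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    trial_id: str
    disease_category: str  # hypothetical metadata field
    criteria: str


def match_trials(profile: str, category: str, trials: list[Trial], top_k: int = 3) -> list[Trial]:
    """Filter by metadata first, then rank the remaining trials by word overlap."""
    profile_words = set(profile.lower().split())
    candidates = [t for t in trials if t.disease_category == category]

    def overlap(trial: Trial) -> int:
        # Crude relevance score: number of shared lowercase words.
        return len(profile_words & set(trial.criteria.lower().split()))

    return sorted(candidates, key=overlap, reverse=True)[:top_k]
```

Filtering before ranking keeps trials from unrelated disease areas out of the candidate set, however similar their wording is to the patient profile.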

Advanced Features

  • Self-Query Retrieval: Metadata-aware trial search
  • Multi-step Workflows: LangGraph-based agent orchestration
  • Tool Integration: Date calculations and numerical operations
  • Profile Rewriting: Adaptive profile enhancement for better matches
  • Thread Management: Multi-session support with state persistence
  • Demo Mode: Dummy workflow for testing and demonstration
  • Configuration Management: Hydra-based flexible configuration
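As an example of the kind of date calculation an agent tool might expose, the sketch below counts whole months between two dates and checks a minimum waiting period. Both function names and the waiting-period rule are assumptions made for illustration; the project's actual tools are not shown in this README.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    if end < start:
        raise ValueError("end must not precede start")
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


def waiting_period_met(last_treatment: date, today: date, min_months: int) -> bool:
    """True when at least min_months whole months have passed since last_treatment."""
    return months_between(last_treatment, today) >= min_months
```

Delegating this arithmetic to a deterministic tool avoids asking the language model to compute date differences itself.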

🗄️ Demo Databases

The system includes demo databases for:

  • Patients: A SQLite database with 100+ demo patients
  • Diseases: A comprehensive list of diseases for trial matching
  • Clinical Trials: Data on various clinical trials for matching
  • Clinical Policies: Institutional policies stored in ChromaDB for compliance checking

🧠 RAG Strategies and Human-in-the-Loop

  • RAG Strategies: The system combines several Retrieval-Augmented Generation strategies, such as metadata-aware self-query retrieval and profile rewriting, to improve trial matching
  • Hallucination Guard: A grader checks model outputs against their context and flags unsupported content
  • Human-in-the-Loop: Allows clinical research coordinators to review and adjust trial matching results

🛠️ Tech Stack

  • Python 3.12+: Core programming language
  • LangChain: Framework for building LLM applications
  • LangGraph: Workflow orchestration and agent management
  • Groq: Primary LLM provider with multiple model options
  • OpenAI: Alternative LLM provider for GPT models
  • Nomic: Embeddings for semantic search
  • ChromaDB: Vector database for semantic search
  • Gradio: Web interface framework
  • SQLite: Relational database for patient data
  • Hydra: Configuration management
  • Poetry: Dependency management
  • Pytest: Testing framework

🚀 Quick Start

Demo Video

📹 Watch Demo Video - See the application in action with a real example and usage scenario.

Prerequisites

  • Python 3.12+
  • Poetry (for dependency management)
  • Groq API key (primary) or OpenAI API key (alternative)

Installation

  1. Clone the repository and navigate to the llm_pharma directory:

    git clone https://github.com/bab-git/llm_pharma.git
    cd llm_pharma
  2. Install dependencies using Poetry:

    make dev-install
  3. Set up environment variables:

    # Create .env file with your API keys
    echo "GROQ_API_KEY=your_groq_api_key_here" > .env
    echo "OPENAI_API_KEY=your_openai_api_key_here" >> .env
  4. Set up data (required for first run):

    make setup-data
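Before the first run it can help to confirm that at least one provider key is visible to Python. The sketch below is an optional check, not part of the project; `available_providers` is a hypothetical helper, and it assumes the variables from `.env` have already been loaded into the process environment.

```python
import os
from typing import Mapping, Optional


def available_providers(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    key_names = {"groq": "GROQ_API_KEY", "openai": "OPENAI_API_KEY"}
    return [name for name, var in key_names.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    providers = available_providers()
    if providers:
        print("Configured providers:", ", ".join(providers))
    else:
        print("No API key found; set GROQ_API_KEY or OPENAI_API_KEY.")
```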

Running the Application

Web Interface (Recommended)

make run

Then visit http://127.0.0.1:7958 in your browser.

Demo Mode (for testing)

python frontend/app.py --demo

With Custom Configuration

python frontend/app.py --host 0.0.0.0 --port 8080 --share

🏗️ Architecture

The system is built using a modular architecture with the following key components:

Agent Pipeline Graph View

Backend Modules (backend/my_agent/)

  • WorkflowManager: Orchestrates the LangGraph-based evaluation workflow
  • LLMManager: Handles multiple LLM models with fallback logic (Groq/OpenAI)
  • DatabaseManager: Manages SQLite patient database and ChromaDB vector stores
  • PolicyService: Handles policy evaluation and eligibility assessment
  • TrialService: Manages trial matching and relevance scoring
  • PatientCollector: Handles patient data collection and profile generation
  • State Management: TypedDict-based workflow state management
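The fallback logic mentioned for LLMManager can be thought of as trying providers in order until one succeeds. The sketch below shows that general pattern only; `call_with_fallback` and `AllModelsFailed` are hypothetical names, and the callables stand in for real Groq or OpenAI clients.

```python
from typing import Callable, Sequence


class AllModelsFailed(RuntimeError):
    """Raised when every configured model raised an error."""


def call_with_fallback(
    models: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
) -> tuple[str, str]:
    """Try each (name, callable) pair in order; return (name, response) from the first success."""
    errors = []
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. rate limits or timeouts
            errors.append(f"{name}: {exc}")
    raise AllModelsFailed("; ".join(errors))
```

Returning the name of the model that answered makes it easy to log which provider actually served a request.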

Frontend (frontend/)

  • Gradio Web Interface: Interactive dashboard with multi-tab results
  • Real-time Processing: Live workflow execution with status updates
  • Thread Management: Multi-session support with state persistence
  • Demo Mode: Dummy workflow for testing and demonstration

Data Setup (scripts/)

  • Master Setup Script: Complete data initialization
  • Patients Database Creator: SQLite database with demo patients
  • Policies Vector Store Creator: ChromaDB for institutional policies
  • Trials Vector Store Creator: ChromaDB for clinical trials data

Configuration (config/)

  • Hydra Configuration: Centralized config management with YAML files
  • Environment Management: Secure API key and settings management

📁 Project Structure

llm_pharma/
├── backend/                  # Backend modules
│   ├── my_agent/            # Core agent modules
│   │   ├── workflow_manager.py    # LangGraph workflow orchestration
│   │   ├── llm_manager.py         # Multi-model LLM management
│   │   ├── database_manager.py    # Database and vector store operations
│   │   ├── policy_service.py      # Policy evaluation and tools
│   │   ├── trial_service.py       # Trial matching and scoring
│   │   ├── patient_collector.py   # Patient data collection
│   │   └── State.py               # Workflow state management
│   └── README.md            # Backend documentation
├── frontend/                # Gradio web interface
│   ├── app.py               # Main frontend application
│   ├── helper_gui.py        # Comprehensive Gradio interface
│   ├── demo_graph.py        # Demo mode with dummy workflow
│   └── README.md            # Frontend documentation
├── scripts/                 # Data setup scripts
│   ├── setup_all_data.py    # Master setup script
│   ├── create_patients_database.py      # Patients database creator
│   ├── create_policies_vectorstore.py   # Policies vector store creator
│   ├── create_trials_vectorstore.py     # Trials vector store creator
│   └── README.md            # Scripts documentation
├── config/                  # Configuration files
│   └── config.yaml          # Main configuration
├── tests/                   # Test suite
│   ├── unit/                # Unit tests
│   ├── integration/         # Integration tests
│   └── regression/          # Regression tests
├── data/                    # Data files
├── vector_store/            # ChromaDB vector stores
├── sql_server/              # SQLite databases
├── source_data/             # Source documents
├── pyproject.toml           # Poetry configuration
├── Makefile                 # Development automation
└── README.md               # This file

🔧 Configuration

The system uses Hydra for configuration management. Settings come from two sources:

  • config/config.yaml: Main configuration file
  • Environment variables: GROQ_API_KEY, OPENAI_API_KEY

Key Configuration Options

# Model settings
models:
  agent_models:
    - id: "mistral-saba-24b"
      provider: "groq"
  tool_models:
    - id: "llama-3.3-70b-versatile"
      provider: "groq"

# Directory paths
directories:
  sql_server: "sql_server"
  vector_store: "vector_store"

# File paths
files:
  policy_markdown: "source_data/instut_trials_policy.md"
  trials_csv: "data/trials_data.csv"
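The list structure under `models` means each role can hold several candidates in priority order. The sketch below reads that structure from a plain dictionary that mirrors part of the YAML above; in the project the values would come from Hydra/OmegaConf, and both helper functions are hypothetical.

```python
from pathlib import Path

# Plain-dict mirror of part of the configuration shown above.
config = {
    "models": {
        "agent_models": [{"id": "mistral-saba-24b", "provider": "groq"}],
        "tool_models": [{"id": "llama-3.3-70b-versatile", "provider": "groq"}],
    },
    "directories": {"sql_server": "sql_server", "vector_store": "vector_store"},
}


def model_candidates(cfg: dict, role: str) -> list[tuple[str, str]]:
    """Return (provider, model id) pairs for a role, in the order they are listed."""
    return [(m["provider"], m["id"]) for m in cfg["models"].get(role, [])]


def resolve_dir(cfg: dict, key: str, root: Path) -> Path:
    """Resolve a configured directory name relative to the project root."""
    return root / cfg["directories"][key]
```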

🎯 Usage Examples

Web Interface Usage

  1. Start the web interface:

    make run
  2. Enter a patient query:

    Is patient 15 eligible for any clinical trials?
    
  3. Review results in the detailed tabs:

    • Agent Tab: Workflow control and status
    • Patient Profile: Generated patient summary (editable)
    • Policy Evaluation: Institutional policy compliance
    • Trials Summary: Overview of matched trials
    • Potential Trials: Detailed trial information
    • Trials Scores: Comprehensive scoring and ranking

Programmatic Usage

from omegaconf import OmegaConf
from backend.my_agent.workflow_manager import WorkflowManager

# Load configuration
config = OmegaConf.load("config/config.yaml")

# Initialize workflow manager
workflow_manager = WorkflowManager.from_config(config)

# Run evaluation
result = workflow_manager.run_workflow(
    patient_prompt="Is patient 5 eligible for any medical trial?",
    thread_id="example_session"
)

# Process results
print(f"Patient ID: {result['patient_id']}")
print(f"Policy Eligible: {result['policy_eligible']}")
print(f"Trials Found: {result['trial_found']}")
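If a workflow stops early, some of these keys may be missing from the result. The helper below is a defensive sketch that reads them with `.get` and produces a one-line summary; `summarize_result` is hypothetical, and it assumes `policy_eligible` and `trial_found` hold truthy or falsy values.

```python
from typing import Any, Mapping


def summarize_result(result: Mapping[str, Any]) -> str:
    """Build a one-line summary without assuming every key is present."""
    patient = result.get("patient_id", "unknown")
    eligible = result.get("policy_eligible")
    trials = result.get("trial_found")

    if eligible is None:
        status = "policy check not completed"
    elif not eligible:
        status = "rejected by policy"
    elif trials:
        status = "eligible, trials found"
    else:
        status = "eligible, no matching trials"
    return f"Patient {patient}: {status}"
```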

🔄 Development Workflow

Available Make Commands

make install            # Install production dependencies
make dev-install        # Install development dependencies
make run               # Run Gradio frontend
make setup-data        # Set up all data (patients, policies, trials)
make test-all          # Run all tests
make test-unit         # Run unit tests only
make test-integration  # Run integration tests only
make test-regression   # Run regression tests
make lint              # Run linting (ruff)
make format            # Format code (black, isort)
make check             # Run all checks (lint + format + test)
make clean             # Clean up build artifacts

Code Quality Tools

  • Ruff: Fast Python linter
  • Black: Code formatting
  • isort: Import sorting
  • pytest: Testing framework

🧪 Testing

The project includes comprehensive unit, integration, and regression tests:

# Run all tests
make test-all

# Run unit tests only
make test-unit

# Run integration tests only  
make test-integration

# Run regression tests
make test-regression

Test Coverage

  • Unit Tests: Individual module functionality
  • Integration Tests: Module interaction testing
  • Regression Tests: End-to-end workflow validation
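For readers new to pytest, a unit test at this level might look like the sketch below. The function under test, `extract_patient_id`, is a hypothetical regex-based helper written for this example; the project itself may extract the patient ID differently, for instance with an LLM.

```python
import re
from typing import Optional


def extract_patient_id(prompt: str) -> Optional[int]:
    """Return the number that follows the word 'patient', or None if there is none."""
    match = re.search(r"\bpatient\s+(\d+)\b", prompt, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


# pytest collects functions whose names start with "test_".
def test_extracts_id_from_query():
    assert extract_patient_id("Is patient 15 eligible for any clinical trials?") == 15


def test_returns_none_without_id():
    assert extract_patient_id("Which trials are open?") is None
```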

🔒 Security & Privacy

  • Synthetic Data: All patient data is synthetic and created for demonstration purposes; no real names or persons are used
  • Secure Configuration: Environment-based API key management
  • Input Validation: Comprehensive data validation and sanitization
  • Error Handling: Graceful error handling with informative messages

📈 Performance

  • Model Fallback: Automatic switching between LLM providers
  • Caching: Vector store persistence for faster retrieval
  • Optimized Queries: Efficient database operations
  • Memory Management: SQLite checkpointing for workflow state
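To show the idea behind checkpointing workflow state per session, here is a minimal sketch using only the standard library. It is a simplified stand-in and not LangGraph's checkpointer; the `StateStore` class, table name, and JSON encoding are all assumptions.

```python
import json
import sqlite3
from typing import Optional


class StateStore:
    """Keep the latest workflow state for each thread in a SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )

    def save(self, thread_id: str, state: dict) -> None:
        # The connection context manager commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO checkpoints (thread_id, state) VALUES (?, ?)",
                (thread_id, json.dumps(state)),
            )

    def load(self, thread_id: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None
```

Keying the table on `thread_id` lets separate sessions resume independently, which matches the multi-session behavior described under Thread Management.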

📚 Documentation

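For detailed information about specific components, see the component README files:

  • backend/README.md: Backend documentation
  • frontend/README.md: Frontend documentation
  • scripts/README.md: Scripts documentation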

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Install development dependencies: make dev-install
  4. Make your changes and add tests
  5. Run the test suite: make check
  6. Commit your changes: git commit -m 'Add amazing feature'
  7. Push to the branch: git push origin feature/amazing-feature
  8. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guidelines
  • Add type hints to all functions
  • Write comprehensive tests for new features
  • Update documentation as needed
  • Use descriptive commit messages

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📞 Support

For questions, issues, or contributions:

  1. Check the Issues: Look for existing issues or create a new one
  2. Documentation: Review component-specific README files
  3. Tests: Run the test suite to verify functionality

🚧 Future Enhancements

  • Enhanced RAG: Graph-based retrieval with entity relationships
  • Multi-modal Support: Image and document processing
  • Advanced Analytics: Trial success prediction and patient outcome analysis
  • Integration APIs: RESTful APIs for external system integration
  • Scalability: Distributed processing and cloud deployment options
  • Embedding Optimization: Fine-tuning of embeddings for clinical domain specificity

Note: This system is designed for research and demonstration purposes. For production clinical trial management, consult with healthcare professionals and ensure compliance with relevant regulations and standards.

👨‍💻 Author

Bob Hosseini


Built with ❤️ for advancing clinical trial management through AI

About

This is a tutorial of an agentic Large Language Model (LLM) application to automate the evaluation of patients for clinical trials. It leverages documents related to patients' medical histories, clinical policies, and trial inclusion criteria.
