diff --git a/.codeboarding/Application_Orchestration.md b/.codeboarding/Application_Orchestration.md new file mode 100644 index 0000000..7dd6351 --- /dev/null +++ b/.codeboarding/Application_Orchestration.md @@ -0,0 +1,161 @@ +```mermaid + +graph LR + + Application_Orchestration["Application Orchestration"] + + DataHandler["DataHandler"] + + ModelArchitectures["ModelArchitectures"] + + Trainer["Trainer"] + + Predictor["Predictor"] + + Utilities["Utilities"] + + Logger["Logger"] + + Application_Orchestration -- "initiates" --> Trainer + + Application_Orchestration -- "initiates" --> Predictor + + Application_Orchestration -- "configures" --> DataHandler + + Application_Orchestration -- "configures" --> ModelArchitectures + + Application_Orchestration -- "utilizes" --> Utilities + + Application_Orchestration -- "logs via" --> Logger + + Trainer -- "trains" --> ModelArchitectures + + Trainer -- "processes data with" --> DataHandler + + Trainer -- "logs via" --> Logger + + Predictor -- "loads" --> ModelArchitectures + + Predictor -- "processes data with" --> DataHandler + + Predictor -- "logs via" --> Logger + + DataHandler -- "utilizes" --> Utilities + + ModelArchitectures -- "utilizes" --> Utilities + + Logger -- "utilizes" --> Utilities + + click Application_Orchestration href "https://github.com/pfizer-opensource/HLAIIPred/blob/main/.codeboarding//Application_Orchestration.md" "Details" + +``` + + + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +The `Application Orchestration` component serves as the central control point for the entire application, embodying the 
`Configuration-Driven Design` and `Command-Line Interface (CLI)` architectural patterns. It is fundamental because it acts as the application's entry point, interpreting user commands and configuration to initiate the correct workflow (training or prediction). Without this component, the application would lack a unified starting mechanism and the ability to adapt its behavior based on user input and predefined settings. + + + +### Application Orchestration [[Expand]](./Application_Orchestration.md) + +This component is the primary control point, responsible for parsing command-line arguments, loading and interpreting configuration settings from `models/config.yaml`, and orchestrating the execution of either training or prediction workflows. It acts as the central dispatcher, directing control to the appropriate core components based on user input and configuration. + + + + + +**Related Classes/Methods**: + + + +- `cli.HLAIIPred` (1:1) + + + + + +### DataHandler + +Essential for managing all data-related operations, from loading raw data to preparing it for model consumption. It ensures data consistency and efficiency, which is vital for deep learning applications. + + + + + +**Related Classes/Methods**: _None_ + + + +### ModelArchitectures + +Defines the core intellectual property of the project – the neural network structure. It's fundamental as it encapsulates the model's design, allowing for modularity and reusability. + + + + + +**Related Classes/Methods**: _None_ + + + +### Trainer + +Orchestrates the learning process, making it a core component for any deep learning development project. It manages the training loop, optimization, and model persistence. + + + + + +**Related Classes/Methods**: _None_ + + + +### Predictor + +Handles the application of the trained model to new data, delivering the primary output of the system. It's crucial for the practical utility of the developed model. 
+ + + + + +**Related Classes/Methods**: _None_ + + + +### Utilities + +Provides common helper functions, promoting code reusability and reducing redundancy across the project. This is a foundational support component. + + + + + +**Related Classes/Methods**: _None_ + + + +### Logger + +Critical for monitoring, debugging, and understanding the application's behavior during execution. It provides essential feedback for development and operational purposes. + + + + + +**Related Classes/Methods**: _None_ + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Data_Management.md b/.codeboarding/Data_Management.md new file mode 100644 index 0000000..0654fd9 --- /dev/null +++ b/.codeboarding/Data_Management.md @@ -0,0 +1,253 @@ +```mermaid + +graph LR + + DataHandler["DataHandler"] + + Model["Model"] + + Trainer["Trainer"] + + Predictor["Predictor"] + + Utilities["Utilities"] + + Configuration_Manager["Configuration Manager"] + + Logger["Logger"] + + CLI["CLI"] + + Data_Storage["Data Storage"] + + Project_Entry_Point["Project Entry Point"] + + DataHandler -- "utilizes" --> Utilities + + Trainer -- "uses" --> DataHandler + + Predictor -- "uses" --> DataHandler + + Trainer -- "interacts with" --> Model + + Predictor -- "interacts with" --> Model + + Trainer -- "logs to" --> Logger + + Predictor -- "logs to" --> Logger + + CLI -- "invokes" --> Trainer + + CLI -- "invokes" --> Predictor + + DataHandler -- "reads from" --> Data_Storage + + Configuration_Manager -- "provides to" --> Trainer + + Configuration_Manager -- "provides to" --> Predictor + +``` + + + 
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +The Data Management component, specifically the DataHandler, is fundamental to this deep learning project due to its critical role in preparing biological sequence data for model consumption. In deep learning, the quality and format of input data directly impact model performance. The DataHandler ensures that raw, complex biological sequences are transformed into a clean, numerical, and batch-ready format, which is essential for efficient training and accurate predictions. This analysis focuses on the DataHandler and its interactions within the Deep Learning Model Development and Application pattern, detailing core functionalities and relationships among components for maintainability and scalability. + + + +### DataHandler + +Responsible for loading, preprocessing, and encoding raw biological sequence data (peptides, HLA alleles) into a numerical format suitable for deep learning models. It handles tokenization, padding, numerical encoding, and creates data loaders for efficient batching during training and inference. + + + + + +**Related Classes/Methods**: + + + +- `hlapred.dataset` (1:1) + +- `hlapred.utils.get_encoding` (39:75) + + + + + +### Model + +Encapsulates the deep learning model architecture, including layers, activation functions, and forward pass logic. It receives processed numerical data from the DataHandler and performs predictions. + + + + + +**Related Classes/Methods**: + + + +- `hlapred.model_modules` (1:1) + + + + + +### Trainer + +Manages the model training process. 
It orchestrates the training loop, including iterating over epochs, fetching batches of data from the DataHandler, performing forward and backward passes, optimizing model parameters, and logging training metrics. + + + + + +**Related Classes/Methods**: + + + +- `hlapred.train` (1:1) + + + + + +### Predictor + +Handles the inference process, using a trained model to make predictions on new, unseen data. It utilizes the DataHandler to prepare input data for prediction and the Model to generate outputs. + + + + + +**Related Classes/Methods**: + + + +- `hlapred.predict` (1:1) + + + + + +### Utilities + +Provides helper functions and common utilities used across various components, such as data encoding, file I/O, and general data manipulation. The get_encoding function is a key utility for the DataHandler. + + + + + +**Related Classes/Methods**: + + + +- `hlapred.utils` (1:1) + + + + + +### Configuration Manager + +Responsible for loading and managing project configurations, including model hyperparameters, training settings, and data paths. This promotes a configuration-driven design, making the project flexible and easy to adapt. + + + + + +**Related Classes/Methods**: + + + +- `models.config` (1:1) + + + + + +### Logger + +Handles logging of events, progress, and errors throughout the application, providing insights into the execution flow and aiding in debugging. + + + + + +**Related Classes/Methods**: + + + +- `hlapred.logger` (1:1) + + + + + +### CLI + +Provides a command-line interface for users to interact with the application, initiating training, prediction, or other tasks. It acts as the entry point for user commands. + + + + + +**Related Classes/Methods**: + + + +- `cli.HLAIIPred` (1:1) + + + + + +### Data Storage + +Represents the physical location where raw and processed data are stored. While not a software component in the traditional sense, it's a crucial part of the data flow. 
+ + + + + +**Related Classes/Methods**: + + + +- `data.raw` (1:1) + +- `data.processed` (1:1) + + + + + +### Project Entry Point + +The main entry point for the entire application, often responsible for parsing command-line arguments and orchestrating the execution of other components based on user input. + + + + + +**Related Classes/Methods**: + + + +- `__main__` (1:1) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Prediction_Inference.md b/.codeboarding/Prediction_Inference.md new file mode 100644 index 0000000..58dab92 --- /dev/null +++ b/.codeboarding/Prediction_Inference.md @@ -0,0 +1,141 @@ +```mermaid + +graph LR + + Predictor["Predictor"] + + Model_Architectures["Model Architectures"] + + Data_Handler["Data Handler"] + + Utilities["Utilities"] + + Logger["Logger"] + + Predictor -- "loads/infers from" --> Model_Architectures + + Predictor -- "uses" --> Data_Handler + + Predictor -- "utilizes" --> Utilities + + Predictor -- "logs via" --> Logger + + Model_Architectures -- "provides models to" --> Predictor + + Data_Handler -- "prepares data for" --> Predictor + + Utilities -- "provides support to" --> Predictor + + Logger -- "receives logs from" --> Predictor + +``` + + + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +The `Prediction & Inference` subsystem is designed to efficiently utilize a trained deep learning model to generate predictions on new data. 
It encapsulates the entire workflow from data preparation to final output, ensuring a robust and observable inference process. + + + +### Predictor + +This is the central orchestrator of the inference process. It is responsible for loading pre-trained model weights, preparing input data for prediction, executing the forward pass through the neural network, and generating the final prediction results. It also handles specific aspects like processing different allele types and, if configured, compiling attention weights for interpretability. + + + + + +**Related Classes/Methods**: + + + +- `hlapred/predict.py` (1:1) + + + + + +### Model Architectures + +Defines the neural network architectures used for prediction. This includes the specific layers, attention mechanisms (e.g., `DeConvolutionalAttention`), and overall model structure that the `Predictor` will instantiate and use. + + + + + +**Related Classes/Methods**: + + + +- `hlapred/model_modules.py` (1:1) + + + + + +### Data Handler + +Manages the preparation of raw input data into the correct format required by the model and creates `DataLoader` instances for efficient batch processing during inference. It ensures data integrity and optimal delivery to the model. + + + + + +**Related Classes/Methods**: + + + +- `hlapred/dataset.py` (1:1) + + + + + +### Utilities + +A collection of reusable helper functions and constants that support various operations across the project. This includes tasks like configuration parsing, data transformations, and general computations that are leveraged by other components. + + + + + +**Related Classes/Methods**: + + + +- `hlapred/utils.py` (1:1) + + + + + +### Logger + +Provides a centralized mechanism for recording events, progress, warnings, and errors during the prediction process. It is crucial for monitoring the system's behavior, debugging issues, and ensuring traceability. 
+ + + + + +**Related Classes/Methods**: + + + +- `hlapred/logger.py` (1:1) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/on_boarding.md b/.codeboarding/on_boarding.md new file mode 100644 index 0000000..148f7db --- /dev/null +++ b/.codeboarding/on_boarding.md @@ -0,0 +1,155 @@ +```mermaid + +graph LR + + Data_Management["Data Management"] + + Model_Core["Model Core"] + + Training_Evaluation["Training & Evaluation"] + + Prediction_Inference["Prediction & Inference"] + + Application_Orchestration["Application Orchestration"] + + Data_Management -- "Provides Data To" --> Training_Evaluation + + Data_Management -- "Provides Data To" --> Prediction_Inference + + Application_Orchestration -- "Configures" --> Data_Management + + Model_Core -- "Provides Structure To" --> Training_Evaluation + + Model_Core -- "Provides Structure To" --> Prediction_Inference + + Application_Orchestration -- "Configures" --> Model_Core + + Training_Evaluation -- "Provides Trained Model To" --> Prediction_Inference + + Application_Orchestration -- "Triggers & Configures" --> Training_Evaluation + + Prediction_Inference -- "Provides Results To" --> Application_Orchestration + + Application_Orchestration -- "Triggers & Configures" --> Prediction_Inference + + click Data_Management href "https://github.com/pfizer-opensource/HLAIIPred/blob/main/.codeboarding//Data_Management.md" "Details" + + click Prediction_Inference href "https://github.com/pfizer-opensource/HLAIIPred/blob/main/.codeboarding//Prediction_Inference.md" "Details" + + click Application_Orchestration href "https://github.com/pfizer-opensource/HLAIIPred/blob/main/.codeboarding//Application_Orchestration.md" "Details" + +``` + + + 
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +This graph shows the high-level flow of the project. `Application Orchestration` triggers and configures the other components. `Data Management` supplies encoded data and `Model Core` supplies the network structure to both `Training & Evaluation` and `Prediction & Inference`. `Training & Evaluation` hands the trained model to `Prediction & Inference`, which returns its results to `Application Orchestration`. + + + +### Data Management [[Expand]](./Data_Management.md) + +Responsible for loading, preprocessing, and encoding raw biological sequence data into a numerical format suitable for deep learning models. It ensures data integrity and prepares the data for consumption by the model training and inference processes. + + + + + +**Related Classes/Methods**: + + + +- `hlapred/dataset.py` (1:1) + +- `hlapred/utils.py:get_encoding` (39:75) + + + + + +### Model Core + +Defines the neural network architecture, including its various layers, attention mechanisms, and configurable building blocks (e.g., EncoderBlock, DecoderBlock). It provides the blueprint for the deep learning model. + + + + + +**Related Classes/Methods**: + + + +- `hlapred/model_modules.py` (1:1) + + + + + +### Training & Evaluation + +Orchestrates the entire model training lifecycle. This includes instantiating the model, setting up optimizers and loss functions, managing the training loop (forward/backward passes), performing periodic evaluations, and saving trained model states. + + + + + +**Related Classes/Methods**: + + + +- `hlapred/train.py` (1:1) + + + + + +### Prediction & Inference [[Expand]](./Prediction_Inference.md) + +Manages the process of using a trained model to make predictions on new, unseen data. 
It handles loading trained model weights, preparing input for inference, executing the forward pass through the model, and outputting the prediction results. + + + + + +**Related Classes/Methods**: + + + +- `hlapred/predict.py` (1:1) + + + + + +### Application Orchestration [[Expand]](./Application_Orchestration.md) + +Serves as the primary control point for the application. It handles configuration loading, parses command-line arguments, and orchestrates the execution of training or prediction workflows by interacting with other core components. + + + + + +**Related Classes/Methods**: + + + +- `models/config.yaml` (1:1) + +- `cli/HLAIIPred` (1:1) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file