This repository contains a PyTorch implementation of a multilayer perceptron (MLP) regression model designed for tabular data. It includes data loading, preprocessing, model definition, training, and evaluation functionality.
- Features
- Installation
- Usage
- Dataset Configuration
- Model Configuration
- Training Configuration
- Dependencies
- Logging
- Error Handling
- Metrics and Visualization
- Contributing
- License
## Features

- Data Loading and Preprocessing:
  - Loads data from CSV files.
  - Supports chunked loading for large datasets.
  - Handles missing columns and empty datasets.
  - Implements various data scaling techniques (StandardScaler, MinMaxScaler, MaxAbsScaler, RobustScaler, Normalizer).
  - Adds noise to the data (Normal, Uniform, Poisson).
  - Allows scaling and noise parameters to be configured.
- Model Definition:
  - Defines a customizable neural network model with configurable hidden layers, dropout, and L1 regularization.
  - Includes batch normalization and ELU activation.
- Training and Evaluation:
  - Implements training and validation loops with error handling.
  - Supports early stopping based on validation loss.
  - Uses the Adam optimizer with a configurable learning rate, weight decay, and learning rate schedulers (StepLR, ReduceLROnPlateau).
  - Calculates and logs metrics (MSE, MAE, R-squared).
  - Collects and visualizes training and validation losses, MSE, MAE, and R-squared.
  - Plots residuals against predicted values.
- Device Management:
  - Automatically detects and uses an available GPU (CUDA or MPS) or the CPU.
- Logging:
  - Uses Python's `logging` module to log training and evaluation information.
  - Logs to both a file (`learn_model.log`) and the console.
- Error Handling:
  - Includes custom exceptions for dataset-related errors.
  - Provides comprehensive error handling throughout the code.
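The model described in the features above (configurable hidden layers, batch normalization, ELU, dropout, and an L1 term) can be sketched roughly as follows. The class name `RegressionMLP`, the layer sizes, and the `l1_penalty` helper are illustrative assumptions, not the repository's actual definitions:

```python
import torch
import torch.nn as nn

class RegressionMLP(nn.Module):
    """Sketch of an MLP regressor with batch norm, ELU, and dropout.

    Names and default sizes are illustrative, not the repository's actual code.
    """
    def __init__(self, in_features: int, hidden_sizes=(64, 32), dropout: float = 0.1):
        super().__init__()
        layers = []
        prev = in_features
        for h in hidden_sizes:
            layers += [
                nn.Linear(prev, h),
                nn.BatchNorm1d(h),    # batch normalization
                nn.ELU(),             # ELU activation
                nn.Dropout(dropout),  # configurable dropout
            ]
            prev = h
        layers.append(nn.Linear(prev, 1))  # single regression output
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

    def l1_penalty(self, l1_lambda: float) -> torch.Tensor:
        # L1 regularization term, added to the loss during training
        return l1_lambda * sum(p.abs().sum() for p in self.parameters())
```

In a training loop the L1 term would be added to the base loss, e.g. `loss = criterion(pred, y) + model.l1_penalty(1e-5)`.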
## Installation

- Clone the repository:

  ```bash
  git clone <repository_url>
  cd <repository_directory>
  ```

- Install the required dependencies:

  ```bash
  pip install numpy pandas torch scikit-learn matplotlib
  ```
## Usage

- Prepare your dataset:
  - Place your CSV data file in the specified `root` directory.
  - Ensure the CSV file contains the columns specified in `xcol` and `ycol` within the `DatasetConfig`.
- Configure the dataset:
  - Modify the `DatasetConfig` in the `if __name__ == '__main__':` block to match your dataset.
  - Set `root`, `csv_file`, `xcol`, `ycol`, `scaler_type`, `noise_type`, `noise_std`, and `scaling_factor` as needed.
- Configure the model and training:
  - Adjust the `config` dictionary in the `if __name__ == '__main__':` block to configure the model, optimizer, learning rate schedulers, and early stopping.
- Run the script:

  ```bash
  python <your_script_name>.py
  ```

- View the results:
  - The script outputs training and evaluation metrics to the console and the log file.
  - Plots of the training and validation losses, MSE, MAE, R-squared, and residuals are displayed.
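The early stopping mentioned above can be captured by a small helper that watches the validation loss. This is a minimal sketch; the class name and the `patience`/`min_delta` parameters are illustrative assumptions, and the script's actual configuration keys may differ:

```python
class EarlyStopping:
    """Minimal early-stopping helper keyed on validation loss (illustrative sketch)."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience    # epochs to wait without improvement
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best_loss = float('inf')
        self.counter = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: reset the counter
            self.counter = 0
        else:
            self.counter += 1          # no improvement this epoch
        return self.counter >= self.patience
```

It would be called once per epoch after validation; when `step` returns `True`, the training loop breaks.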
## Dataset Configuration

The `DatasetConfig` dataclass allows you to configure the dataset:
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DatasetConfig:
    root: str
    xcol: list[str]
    ycol: list[str]
    scaler_type: ScalerType = ScalerType.STANDARD
    noise_type: NoiseType = NoiseType.NORMAL
    csv_file: str = 'dummy_data.csv'
    scaler: Optional[object] = None
    noise_params: Optional[dict] = None
    chunksize: Optional[int] = None
    noise_std: float = 0.0
    scaling_factor: float = 1.0
```
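An instantiation might look like the following. The snippet repeats the dataclass so it runs standalone, and the stand-in `ScalerType`/`NoiseType` enums (only the `STANDARD` and `NORMAL` members appear in the defaults above), paths, and column names are placeholder assumptions:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Stand-in enums so the example is self-contained; the script defines its own.
class ScalerType(Enum):
    STANDARD = 'standard'

class NoiseType(Enum):
    NORMAL = 'normal'

@dataclass
class DatasetConfig:
    root: str
    xcol: list[str]
    ycol: list[str]
    scaler_type: ScalerType = ScalerType.STANDARD
    noise_type: NoiseType = NoiseType.NORMAL
    csv_file: str = 'dummy_data.csv'
    scaler: Optional[object] = None
    noise_params: Optional[dict] = None
    chunksize: Optional[int] = None
    noise_std: float = 0.0
    scaling_factor: float = 1.0

# Example configuration; paths and column names are placeholders.
cfg = DatasetConfig(
    root='data',
    xcol=['feature_1', 'feature_2'],
    ycol=['target'],
    csv_file='my_data.csv',
    noise_std=0.01,
)
```

Fields left unset keep their defaults, e.g. `cfg.scaler_type` is `ScalerType.STANDARD` and `cfg.chunksize` is `None`.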