This project is designed to run anomaly detection experiments using various scikit-learn models in a containerized environment. It automates the process of model training, hyperparameter tuning, and evaluation, ensuring reproducibility by tracking experiment configurations.
The core of the project is the anomaly_detection.py script. This script performs the following steps:
- Configuration: It reads configuration from environment variables. These include the dataset ID from OpenML, test set size, cross-validation folds, and model-specific hyperparameters.
- Experiment ID: It computes a unique EXPERIMENT_ID based on the current experiment's configuration. This is used to prevent re-running identical experiments.
- Data Loading & Preprocessing: It downloads the specified dataset (BoT-IoT) from OpenML, applies one-hot encoding to categorical features, and prepares the target variable for anomaly detection.
- Model Training: It uses GridSearchCV to train and tune the specified machine learning models with the provided hyperparameters. It includes special handling for models like IsolationForest that do not require standard cross-validation.
- Evaluation & Comparison: The script evaluates the models by their cross-validation accuracy scores and identifies the best-performing model.
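The configuration and experiment ID steps above can be sketched roughly as follows. This is a minimal illustration, not the code in anomaly_detection.py: the variable names (DATASET_ID, TEST_SIZE, CV_FOLDS, MODEL, PARAM_GRID), their defaults, the SHA-256 hashing scheme, and the results file layout are all assumptions made for the example.

```python
import hashlib
import json
import os

# Hypothetical variable names and defaults; the real script may use others.
CONFIG_DEFAULTS = {
    "DATASET_ID": "",
    "TEST_SIZE": "0.2",
    "CV_FOLDS": "5",
    "MODEL": "rf",
    "PARAM_GRID": "{}",
}


def load_config(env=None):
    """Collect the experiment settings from environment variables."""
    env = os.environ if env is None else env
    return {key: env.get(key, default) for key, default in CONFIG_DEFAULTS.items()}


def experiment_id(config):
    """Derive a stable ID: identical configurations hash to the same value."""
    # Sorting the keys makes the serialization independent of dict ordering.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def already_run(exp_id, results_dir="results"):
    """Assumed convention: one JSON result file per experiment ID."""
    return os.path.exists(os.path.join(results_dir, f"{exp_id}.json"))
```

With a scheme like this, a script could call `already_run(experiment_id(load_config()))` at startup and exit early when a result for the same configuration already exists.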
The project uses Docker and Docker Compose to manage the experimental environment. The docker-compose.yml file defines services to run experiments for different models in isolation:
- experiment-rf: Tests a RandomForestClassifier.
- experiment-mlp: Tests an MLPClassifier.
Each service is built from a common Dockerfile and runs the same Python script, but with different environment variables to specify the model and its hyperparameters.
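One way a single script can serve every service is to pick the estimator and its hyperparameter grid from environment variables. The sketch below is only an illustration under assumptions: the MODEL and PARAM_GRID variable names, the short model keys, and the JSON encoding of the grid are hypothetical and may differ from what the project uses. It returns the estimator's import path instead of instantiating it, so it runs without scikit-learn installed.

```python
import itertools
import json
import os

# Hypothetical registry: short names a MODEL variable might carry,
# mapped to the import paths of the scikit-learn estimators.
MODEL_REGISTRY = {
    "rf": "sklearn.ensemble.RandomForestClassifier",
    "mlp": "sklearn.neural_network.MLPClassifier",
}


def resolve_experiment(env=None):
    """Return (estimator import path, parameter grid) for the configured model."""
    env = os.environ if env is None else env
    model_key = env.get("MODEL", "rf")
    if model_key not in MODEL_REGISTRY:
        raise ValueError(
            f"Unknown MODEL {model_key!r}; expected one of {sorted(MODEL_REGISTRY)}"
        )
    # Assumed encoding: a JSON object mapping parameter names to value lists.
    param_grid = json.loads(env.get("PARAM_GRID", "{}"))
    return MODEL_REGISTRY[model_key], param_grid


def expand_grid(param_grid):
    """List every hyperparameter combination a grid search would try."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]
```

Under these assumptions, each Compose service would differ only in the values it sets for MODEL and PARAM_GRID, while the image and entry point stay the same.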
- Docker and Docker Compose installed on your machine.
- Access to OpenML datasets (the script uses the KDD-99 dataset by default).
To run the project, follow these steps:
- Clone the repository:
git clone
- Navigate to the project directory:
cd containerisation_exam
- Build the Docker images:
docker-compose build
- Run the experiments:
docker-compose up
- Monitor the output logs to see the results of each experiment.
- After the experiments complete, check the results in the results directory.