Workshop on Machine Learning

This is the repository for the Machine Learning workshop organized by the MALTO (MAchine Learning @ Polito) team at the Politecnico di Torino.

Part I - Data Exploration and Preprocessing

  • Introduction
  • Dataset Loading
  • Dataset Overview
  • Data Quality
  • Descriptive Statistics
  • Basic Data Visualization
  • Exploring Feature Relationships
  • Handling Missing Values
  • Feature Engineering
  • EXTRA EXERCISE: Unsupervised Exploration: PCA and t-SNE

The labs of Part I have been curated by Andrea Lolli, Samet Basarat, and Claudio Savelli.

Part II - Classification

  • Introduction
  • Setup
  • Training Different Models
    • Logistic Regression
    • K-Nearest Neighbors
    • Decision Tree
    • Random Forest
    • Support Vector Classifier
    • Naive Bayes
  • Model Evaluation
    • Confusion Matrix
    • Accuracy, Precision, Recall, F1 Score
    • ROC Curve and AUC
  • Hyperparameter Tuning
  • Cross-Validation

The labs of Part II have been curated by Ayberk Munis and Claudio Savelli.

Part III - Regression

  • Introduction
    • Regression vs Classification
    • Evaluation Metrics
  • Loading and Inspecting the Dataset
  • Train/Validation/Test Split
  • Feature Scaling
  • Training Different Models and Error Evaluation
    • Linear Regression
    • Polynomial Regression
    • Ridge Regression
    • Lasso Regression
    • Decision Tree Regressor
    • Random Forest Regressor
    • Support Vector Regressor
  • Error Analysis

The labs of Part III have been curated by Tommaso Mazzarini, Arman, and Claudio Savelli.

Part IV - Clustering

  • Introduction
    • Supervised Learning vs Unsupervised Learning
  • Loading and Preparing the Data
    • Initial Feature Visualization and Distribution
      • Histogram
      • Pairplot
      • Importance of Feature Scaling
  • Exploring the Data Structure
    • PCA (Principal Component Analysis)
    • t-SNE (t-Distributed Stochastic Neighbor Embedding)
  • Clustering Algorithms
    • K-means
    • DBSCAN
    • Hierarchical Clustering
  • How to Evaluate Clustering Performance
    • Elbow Method
    • Silhouette Score
    • Rand Index
    • Adjusted Rand Index
    • Davies-Bouldin Index

The labs of Part IV have been curated by Niccolò Malgeri, Emanuele, and Claudio Savelli.