Changes from all commits
81 commits
8ed4626
Updated Syllabus
camarocico Jan 28, 2026
62764b9
Switched to new version
camarocico Jan 28, 2026
6e01e8c
Added markdown to jupytext
camarocico Jan 28, 2026
559ab21
Updated .gitignore
camarocico Jan 28, 2026
592aca7
Added README.md
camarocico Jan 28, 2026
5d5f63d
Updated requirements.txt
camarocico Jan 28, 2026
c0937fa
Removed requirement_gpu.txt
camarocico Jan 28, 2026
a9f4804
S01 Introduction to Machine Learning
camarocico Jan 28, 2026
1a4d13a
Updated .gitignore
camarocico Jan 28, 2026
dec12f6
S01C01 What is Machine Learning
camarocico Jan 28, 2026
48513d6
S01C00 Working with Jupyter on Habrok
camarocico Jan 29, 2026
2384773
S01C01 What is Machine Learning
camarocico Jan 29, 2026
8e73566
S01C02 The Machine Learning Workflow
camarocico Jan 29, 2026
f39ea12
S01C00 Working with Jupyter on Habrok
camarocico Jan 29, 2026
f826d1b
S01C01 What is Machine Learning
camarocico Jan 29, 2026
b89b2d8
S01C02 The Machine Learning Workflow
camarocico Jan 29, 2026
e8460d8
S01C03 Exploratory Data Analysis (EDA)
camarocico Jan 29, 2026
5e908c5
S01C00 Working with Jupyter
camarocico Jan 29, 2026
e014d7a
Added info about HB accounts
camarocico Jan 29, 2026
f545c31
Updated README.md
camarocico Jan 29, 2026
f49eff5
Modified .gitignore
camarocico Jan 30, 2026
f73d408
S01C00 Working with Jupyter
camarocico Jan 30, 2026
5902f46
S01C00 Working with Jupyter
camarocico Jan 30, 2026
c9dcb9b
Updated S01C00 Working with Jupyter
camarocico Jan 30, 2026
3920b3f
Removed comment, updated venv name
camarocico Jan 30, 2026
be8272b
Removed torch entries in requirements.txt
camarocico Jan 30, 2026
07f1b2e
Added .placeholder for data
camarocico Jan 30, 2026
e4c413d
S01C00 Working with Jupyter
camarocico Jan 30, 2026
6b8567d
S01C01 What is Machine Learning
camarocico Jan 31, 2026
8111a68
Updated ML Workflow
camarocico Jan 31, 2026
577c2f7
Updated iris_data display
camarocico Jan 31, 2026
7521f85
Small updates to the Machine Learning Workflow
camarocico Jan 31, 2026
081a054
S01C03 Exploratory Data Analysis (EDA)
camarocico Jan 31, 2026
b305eb1
S01C03 Exploratory Data Analysis (EDA)
camarocico Jan 31, 2026
be077ce
S01C04 Train-Test Splits and Cross-Validation
camarocico Jan 31, 2026
819c069
S01C05 Data Preprocessing
camarocico Feb 1, 2026
7f7d4e3
S01C06 Feature Engineering
camarocico Feb 1, 2026
f2f7284
S02 Supervised Learning: Regression and Evaluation
camarocico Feb 1, 2026
b4e4705
S02C01 Simple Linear Regression
camarocico Feb 1, 2026
1befb16
S02C02 Regression Diagnostics
camarocico Feb 1, 2026
2138cc5
S02C03 Advanced Evaluation Metrics
camarocico Feb 1, 2026
aaf887a
S02C04 Multiple Linear Regression
camarocico Feb 2, 2026
4ba9415
S02C05 Regularized Regression
camarocico Feb 2, 2026
dd61c14
S02C06 Nonlinearities in Regression
camarocico Feb 2, 2026
127e647
S02C07 Overfitting, Underfitting, and Hyperparameter Tuning
camarocico Feb 2, 2026
fdd2bfc
S04 Unsupervised Learning Slides
camarocico Feb 4, 2026
916443c
S04C01 Overview of Unsupervised Learning
camarocico Feb 4, 2026
9ecd1d5
S04C02 Dimensionality Reduction Techniques
camarocico Feb 4, 2026
20585cf
S04C03 Clustering Algorithms
camarocico Feb 4, 2026
70e6a4c
S04C04 Anomaly Detection
camarocico Feb 4, 2026
78c1852
S02C07
camarocico Feb 4, 2026
4d7294f
S03 Supervised Learning: Classification and Evaluation
camarocico Feb 4, 2026
254711d
S03C01 Introduction to Classification
camarocico Feb 4, 2026
5a0ab99
S03C02 Classification Basics
camarocico Feb 4, 2026
cffa9df
S03C03 Evaluation Metrics
camarocico Feb 4, 2026
5bec81c
S03C04 k-NNs and Decision Trees
camarocico Feb 4, 2026
72ab363
S03C04 k-NNs and Decision Trees
camarocico Feb 4, 2026
b58405c
S03C05 Support Vector Machines
camarocico Feb 4, 2026
9af273e
S05 Neural Networks I
camarocico Feb 5, 2026
a8bc196
S05C01 Introduction to Artificial Neural Networks
camarocico Feb 5, 2026
281a951
S05C02 The Multilayer Perceptron (MLP)
camarocico Feb 5, 2026
498e8e8
S05C03 Training Neural Networks
camarocico Feb 5, 2026
8818351
S05C04 Optimizing Neural Networks
camarocico Feb 5, 2026
0d4f96b
S05C05 Building a Network in PyTorch
camarocico Feb 5, 2026
1efdab1
S06 Neural Networks II
camarocico Feb 5, 2026
318b768
S06C01 Convolutional Neural Networks
camarocico Feb 5, 2026
33ca7d5
S06C02 From RNNs to Transformers
camarocico Feb 5, 2026
8c03fc9
S06C03 Large Language Models (LLMs)
camarocico Feb 5, 2026
5d19b2f
S06C04 Generative AI and Modern Applications
camarocico Feb 5, 2026
5b735d3
Some updates
camarocico Feb 5, 2026
9c9e768
S05 Neural Networks I - Foundations and Optimization
camarocico Feb 16, 2026
e09c51f
Updated requirements.txt
camarocico Feb 17, 2026
546428d
S06 Neural Networks II
camarocico Feb 19, 2026
de481cf
Session 01: Introduction to Machine Learning
camarocico Feb 23, 2026
31ee536
Session 02: Supervised Learning - Regression and Evaluation
camarocico Feb 23, 2026
ac0738b
Session 03: Supervised Learning - Classification and Evaluation
camarocico Feb 23, 2026
e32b109
Session 04: Unsupervised Learning - Clustering and Dimensionality Red…
camarocico Feb 23, 2026
e1d5b64
Session 05: Neural Networks I - Foundations and Optimization
camarocico Feb 23, 2026
92c8b21
Session 06: Neural Networks II - DL, Transformers, GenAI
camarocico Feb 24, 2026
e1874e8
Session 07: Ensemble Methods
camarocico Feb 24, 2026
3529eef
Updated Session 06
camarocico Feb 24, 2026
14 changes: 8 additions & 6 deletions .gitignore
@@ -1,8 +1,10 @@
data/
.venv
.ipynb_checkpoints
*.org
*.db
src/*/*.md
.idea/
.ipynb_checkpoints/
__pycache__/
*.pyc
src/S*/*.md
uv.lock
.python-version
pyproject.toml
jupytext.toml
data/*.csv
45 changes: 45 additions & 0 deletions README.md
@@ -0,0 +1,45 @@
# Machine Learning in Python

This repository is associated with the "Machine Learning in Python" training. The latest iteration of this training is February 2026, and the material for this iteration can be found in the `feb2026` branch of the repository.


## Introduction

The training is structured in six sessions spread over three weeks, with two sessions per week.

Each session is four hours long, for a total of 24 hours of in-class time across the training. To get the most benefit from the training, we recommend spending additional time outside class working on your own, with your own datasets.

Each session is made up of several concepts, and each concept is structured as follows:

- A concept is introduced via slides and in a Jupyter notebook
- You will be shown some practical examples that illustrate the concept
- You will be asked to write your own code that uses the concept to solve a small problem
- I will show you how I solved the same problem


## Working with Jupyter

Python is a very popular programming language for Machine Learning, and it is what we will use in this training.

When dealing with heavier computations, the Python code is usually placed in a `.py` file which is then executed as a whole. In a prototyping or teaching situation, though, it is best to do things more interactively, which is why we will use [Jupyter](https://jupyter.org/) notebooks via `JupyterLab` to arrange our code.

During the first session, we will cover three ways of running Jupyter, though during the training we will stick to one of them as much as possible:


### Jupyter on Habrok

For this training, we recommend using the University of Groningen's High Performance Computing (HPC) cluster, **Habrok**. This ensures everyone has access to the same computational resources and hardware.

**NOTE**:

> You need a University of Groningen account to use Habrok. Your account also needs to be enabled on Habrok, which you can request by following the [info](https://wiki.hpc.rug.nl/habrok/introduction/policies) on the wiki.


### Jupyter locally

If you don't have a University account, or prefer to run things locally on your computer, you will need to install Jupyter (and Python) on your own machine. We give a few pointers on how to do this, but the details depend on your operating system, so we do not cover them exhaustively.


### Jupyter in Google Colab

If neither of the above options works for you, you can use [Google Colab](https://colab.google/), an online platform provided by Google for running Jupyter. Although it is easy to set up and requires minimal configuration, we do not recommend it for long-term use.
116 changes: 68 additions & 48 deletions Syllabus.md
Expand Up @@ -136,87 +136,107 @@
- Kernel Trick


# Session 4: Ensemble Methods
# Session 4: Unsupervised Learning - Clustering and Dimensionality Reduction


## Introduction to Ensemble Methods
## Introduction to Unsupervised Learning

- What are Ensemble Methods?
- Why Use Ensemble Methods?
- Types of Ensemble Methods
- Supervised vs Unsupervised
- Dimensionality Reduction
- Clustering
- Anomaly Detection


## Bagging and Random Forests
## Dimensionality Reduction

- Bagging (Bootstrap Aggregating) Overview
- Random Forests
- Feature Importance in Random Forests
- Curse of Dimensionality
- Principal Component Analysis (PCA)
- Choosing the number of components
- Linear Discriminant Analysis (LDA)
- t-SNE and UMAP (Visualization techniques)


## Boosting
## Clustering

- Boosting Overview
- AdaBoost
- Gradient Boosting
- XGBoost
- k-Means Clustering
- Elbow Method and Silhouette Score
- Hierarchical Clustering (Dendrograms)
- DBSCAN (Density-based clustering)


## Ensembles of Ensembles
## Anomaly Detection

- Stacking
- Multi-level EoE
- Isolation Forests


# Session 5: Unsupervised Learning - Clustering and Dimensionality Reduction
# Session 5: Neural Networks I - Foundations and Optimization


## Introduction to Unsupervised Learning
## Introduction to Artificial Neural Networks

- Dimensionality Reduction
- Clustering
- Anomaly Detection
- Biological Inspiration
- The Perceptron Model
- Limitations of the Perceptron (XOR problem)


## Dimensionality Reduction
## The Multilayer Perceptron (MLP)

- Principal Component Analysis (PCA)
- Autoencoders
- Linear Discriminant Analysis (LDA)
- Input, Hidden, and Output Layers
- Activation Functions (Sigmoid, Tanh, ReLU, Softmax)
- Forward Propagation (Matrix notation)


## Clustering
## Training Neural Networks

- k-Means Clustering
- Hierarchical Clustering
- Loss Functions (MSE vs. Cross-Entropy)
- Gradient Descent and Stochastic Gradient Descent (SGD)
- Backpropagation intuition
- The Vanishing Gradient Problem


## Anomaly Detection
## Optimizing Neural Networks

- Weight Initialization strategies
- Optimizers (Momentum, RMSProp, Adam)
- Regularization (Dropout, Batch Normalization, Early Stopping)

# Session 6: Artificial Neural Networks and Deep Learning

## Building a Network in PyTorch/TensorFlow

## Introduction to Neural Networks
- Tensors and Operations
- Defining a simple architecture
- The Training Loop

- What are Neural Networks?
- Biological Inspiration
- Structure of a Neural Network
- Activation Functions

# Session 6: Neural Networks II - Deep Learning, Transformers, and GenAI

## Training Neural Networks

- Forward Propagation
- Backpropagation
- Loss Functions
- Gradient Descent
## Convolutional Neural Networks (CNNs)

- Convolution and Pooling Layers
- Padding and Stride
- Common Architectures (VGG, ResNet)


## From RNNs to Transformers

- Recurrent Neural Networks (RNNs) and LSTMs
- The bottleneck of sequential processing
- The Attention Mechanism
- The Transformer Architecture (Encoder-Decoder)


## Large Language Models (LLMs)

- Evolution of LLMs (BERT, GPT, Llama)
- Pre-training, Fine-tuning, and Inference
- Tokenization and Context Windows
- Prompt Engineering basics


## Deep Learning
## Generative AI and Modern Applications

- What is Deep Learning?
- Deep Neural Networks
- Convolutional Neural Networks (CNNs)
- Recurrent Neural Networks (RNNs)
- Autoencoders
- Generative Adversarial Networks (GANs)
- Text Generation (Next-token prediction)
- Retrieval-Augmented Generation (RAG)
- Introduction to Diffusion Models (Image Generation)
- Ethical considerations and Bias in GenAI
Empty file added data/.placeholder
Empty file.
Binary file not shown.