This module provides an overview of the Design of Machine Learning Systems embedded in data-intensive products and applications. It covers the fundamental components of the infrastructure, systems, and methods necessary to implement and maintain Machine Learning (ML) models in production. In short, we will learn techniques for building an ML model factory.
The module has two components:
- A discussion of the main issues and challenges faced in production, together with some approaches to address them.
- A live lab with demonstrations of implementation techniques.
The module covers the following areas:
- Data engineering.
- Feature engineering.
- Hyperparameter tuning.
- Model deployment.
- Model explainability.
- Logging, experiment tracking, and monitoring.
We will discuss the tools and techniques required to do the above in good order and at scale. However, we will not discuss the inner workings of models, their relative advantages, and so on. We will also not discuss the theoretical aspects of feature engineering or hyperparameter tuning. We will focus on tools and reproducibility.
This module follows the contents of Designing Machine Learning Systems, by Chip Huyen.
By the end of this module, participants will be able to:
- Describe the main components of a machine learning system.
- Explain the infrastructure required to train and test models in production.
- Implement an experiment tracking system and logging.
- Contrast and evaluate different approaches to storing and manipulating data.
- Design data flows and processes to automate the construction of ML models.
Questions can be submitted to the #cohort-8-help channel on Slack.
- Technical Facilitator
- Learning Support Team
This module will include live learning sessions and optional, asynchronous work periods. During live learning sessions, the Technical Facilitator will introduce and explain key concepts and demonstrate core skills. Learning is facilitated during this time. Before and after each live learning session, the instructional team will be available to answer questions about the module's core concepts. Optional work periods are for seeking help from peers and the Learning Support Team, and for working through the homework and assignments in the learning module, with access to live help. Content is not facilitated during work periods; this time should be driven by participants. We encourage participants to come to these work periods with questions and problems to work through.
Participants are encouraged to engage actively during the learning module. The key to developing the core skills in each learning module is through practice. The more participants engage in coding alongside the instructional team and apply these skills in each module, the more likely they are to solidify them.
| Session | Date | Topic |
|---|---|---|
| 1 | Tue., Jan. 13, 2026 | ML System Design |
| 2 | Wed., Jan. 14, 2026 | Data Engineering Fundamentals |
| 3 | Thur., Jan. 15, 2026 | Working with Training Data |
| -- | Fri., Jan. 16, 2026 | Work Period |
| -- | Sat., Jan. 17, 2026 | Work Period |
| -- | Sun., Jan. 18, 2026 | Submission deadline for Quizzes 1-3 |
| -- | Mon., Jan. 19, 2026 | Submission deadline for Assignment 1 |
| 4 | Tue., Jan. 20, 2026 | Feature Engineering |
| 5 | Wed., Jan. 21, 2026 | Model Development and Evaluation |
| 6 | Thur., Jan. 22, 2026 | Model Explanations and Monitoring |
| -- | Fri., Jan. 23, 2026 | Work Period |
| -- | Sat., Jan. 24, 2026 | Work Period |
| -- | Sun., Jan. 25, 2026 | Submission deadline for Quizzes 4-6 |
| -- | Mon., Jan. 26, 2026 | Submission deadline for Assignment 2 |
- Participants are expected to have completed Shell, Git, and Python learning modules.
- Participants are encouraged to ask questions and collaborate with others to enhance their learning experience.
- Participants must have a computer and an internet connection to participate in online activities.
- Participants must have VSCode installed with the following extensions:
  - Jupyter
  - Python
- Participants must install Docker as this module implements a Docker backend that will run a PostgreSQL server. This is intended to mimic a production-like environment. Participants may use SQLite if Docker is not an option.
- Participants must not use generative AI such as ChatGPT to generate code in order to complete assignments. It should be used as a support tool to help you find answers to questions you may have.
- We expect participants to have completed the steps in the onboarding repo.
- We encourage participants to default to having their camera on at all times, and turning the camera off only as needed. This will greatly enhance the learning experience for all participants and provide real-time feedback for the instructional team.
Your performance on this module will be assessed using six quizzes and two assignments.
Quizzes will help you build key concepts in data science, data engineering, and machine learning engineering. Historically, learners take 5-10 minutes to complete each quiz, achieving an average score above 80%.
- Each quiz will cover material from its corresponding live learning session.
- You will receive a link to each quiz during the respective live learning session. The links are personalized; please do not share them. If you did not receive a link, contact any member of the course delivery team. Each quiz will contain approximately 10 questions of various types, including true/false, multiple-choice, and simple selection.
- All quizzes are mandatory and should be submitted by their due date.
- The quizzes will remain open until their respective due dates, after which you will not have access to them.
Assignments will help you develop coding and debugging skills. They will cover foundational skills and will extend to advanced concepts. We recommend that you attempt all assignments and submit your work, even if it is incomplete (partial submissions will earn partial marks).
- Each assignment should be submitted using the usual method in DSI via a Pull Request.
- The assignments and their respective rubrics are:
All participants will receive a pass or fail mark. For this course, a score of 60 points is required to receive a "pass" mark. The score will be determined as follows:
- Quizzes' average score - 60%
- Assignment 1 - 20%
- Assignment 2 - 20%
Assignments' assessment can be transformed into a numeric grade using:
- Complete - 100 points
- Incomplete / Partially Complete - 50 points
- Missing / Not submitted - 0 points
For example, a learner with the following grades would receive "pass":
- Quizzes 80
- Assignment 1 - Complete (100)
- Assignment 2 - Incomplete (50)
- (0.6 * 80) + (0.2 * 100) + (0.2 * 50) = 48 + 20 + 10 = 78 > 60
A different learner with grades as shown below would receive "fail":
- Quizzes 80
- Assignment 1 - Incomplete (50)
- Assignment 2 - Missing (0)
- (0.6 * 80) + (0.2 * 50) + 0 = 48 + 10 + 0 = 58 < 60
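
The weighting above can be checked with a short script. This is only an illustrative sketch, not an official grading tool: the names `final_score`, `mark`, and `ASSIGNMENT_POINTS` are hypothetical, and it assumes that a score of exactly 60 counts as a pass. Weights are kept as integer percentages to avoid floating-point rounding near the threshold.

```python
# Weights (in percent) and point values taken from the grading scheme above.
QUIZ_WEIGHT_PCT = 60
ASSIGNMENT_WEIGHT_PCT = 20  # applies to each of the two assignments
PASS_THRESHOLD = 60

# Hypothetical lookup from assignment status to points.
ASSIGNMENT_POINTS = {"complete": 100, "incomplete": 50, "missing": 0}


def final_score(quiz_average, assignment_1, assignment_2):
    """Return the weighted module score on a 0-100 scale."""
    weighted_sum = (
        QUIZ_WEIGHT_PCT * quiz_average
        + ASSIGNMENT_WEIGHT_PCT * ASSIGNMENT_POINTS[assignment_1]
        + ASSIGNMENT_WEIGHT_PCT * ASSIGNMENT_POINTS[assignment_2]
    )
    return weighted_sum / 100


def mark(score):
    """Assumption: a score of exactly 60 is treated as a pass."""
    return "pass" if score >= PASS_THRESHOLD else "fail"


if __name__ == "__main__":
    # The two worked examples from the text.
    first = final_score(80, "complete", "incomplete")
    second = final_score(80, "incomplete", "missing")
    print(first, mark(first))    # 78.0 pass
    print(second, mark(second))  # 58.0 fail
```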
Books that mainly discuss learning methods, their applications, and limitations.
- Burkov. The Hundred-Page Machine Learning Book. A practical and concise overview of ML methods. It has good coverage of the learning process and a good focus on the classical algorithms. It makes a good desktop reference.
- Goodfellow, Bengio, and Courville. Deep Learning. An in-depth discussion of deep learning methods.
- James, Witten, Hastie, Tibshirani, and Taylor. An Introduction to Statistical Learning with Applications in Python. This is the introductory and updated version of Hastie and Tibshirani's classic book. It contains deep discussions, extensive examples, and formal arguments.
- Witten, Frank, Hall, Pal, and Foulds. Data Mining: Practical Machine Learning Tools and Techniques. A great introductory textbook, written from a practical perspective. The book has been updated over the years and covers a wide spectrum of models (not only Neural Nets/Deep Learners).
Books that discuss how to put learning methods in production, including training, deployment, and monitoring, as well as more architecture-oriented references.
- Burkov. Machine Learning Engineering. Similar to Burkov's book above, but for ML Engineering.
- Kuhn and Silge. Tidy Modeling with R. The book implements its examples in R, but the discussion about models, evaluation techniques, and pipelines is highly worthwhile. Both Julia Silge and Max Kuhn are brilliant data scientists and great communicators.
- Kleppmann. Designing Data-Intensive Applications. A great in-depth resource about the techniques for building data-intensive applications.
References for specific topics like Feature Engineering, Conformal Prediction, and Model Interpretability.
- Kuhn and Johnson. Feature Engineering and Selection: A Practical Approach for Predictive Models. Discusses feature engineering methods, their evaluation, and ideas on how to implement them. This one does not contain code, only ideas.
- Manokhin. Practical Guide to Applied Conformal Prediction in Python. Discusses conformal prediction methods in Python, which let you quantify the uncertainty of the predictions you obtain from ML methods.
- Molnar. Interpretable Machine Learning. An overview of model-agnostic explainability/interpretability methods.
- Chip Huyen's DMLS repo on GitHub
- Scikit-Learn User Guide
- Dask Documentation
- MLflow Documentation
- Docker Installation
- Docker Cheatsheet
.
├── .github
├── 01_materials
├── 02_activities
├── 03_instructional_team
├── 04_this_cohort
├── 05_src
├── .gitignore
├── LICENSE
├── SETUP.md
├── pyproject.toml
└── README.md

- .github: Contains issue templates and pull request templates for the repository.
- materials: Module slides and interactive notebooks (.ipynb files) used during learning sessions.
- activities: Contains graded assignments, exercises, and homework to practice concepts covered in the learning module.
- instructional_team: Resources for the instructional team.
- this_cohort: Additional materials and resources for this cohort.
- src: Source code, databases, logs, and required dependencies (requirements.txt) needed during the module.
- .gitignore: Files to exclude from version control, as specified by the Technical Facilitator.
- LICENSE: The license for this repository.
- SETUP.md: Contains the steps required to set up this repo for the module.
- pyproject.toml: Tells Python which packages this repo needs to run.
- README.md: This file.