Welcome to the repository for the IMSA Data Science Course! This course is designed to introduce MD students to the fundamentals of programming, statistics, data science, and machine learning, with a focus on applications in the medical field. Through hands-on exercises and real-world examples, students will gain the skills necessary to analyze and interpret medical data effectively.
The repository is organized into the following directories:
- data: Contains datasets used in lectures, homework assignments, and projects. These include medical datasets for practical analysis and machine learning exercises.
- HW: Includes homework assignments designed to reinforce key concepts and provide practical experience in programming, statistics, and machine learning.
- lectures: Contains Jupyter notebooks, slides, and other materials from course lectures, covering topics such as programming basics, statistical methods, data visualization, and machine learning techniques.
- readings: A collection of recommended readings, articles, and resources to complement the lecture materials and deepen your understanding of course topics.
This course is divided into the following modules:
- Module 1: Programming Basics. Introduction to Python programming, control structures, functions, and essential libraries like NumPy and Pandas.
- Module 2: Statistics. Study designs, data visualization, statistical distributions, hypothesis testing, and advanced methods like regression and PCA.
- Module 3: Machine Learning. Data cleaning, supervised/unsupervised learning, model evaluation, and practical machine learning applications in medicine.
- Module 4: Advanced Topics. Bioinformatics and medical image analysis, including genomic data and image segmentation tasks.
- Capstone Project. A culminating project where students apply their knowledge to analyze medical datasets and develop predictive models.
- Clone this repository:
git clone https://github.com/armankarshenas/TUMS2025
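Once the repository is cloned, a quick way to see which lecture notebooks are available is to list them from Python. The snippet below is only a sketch: `list_notebooks` is a hypothetical helper that is not part of the course materials, and it assumes the clone lives in a folder named `TUMS2025` (the default name `git clone` derives from the URL) and that notebooks use the standard `.ipynb` extension.

```python
from pathlib import Path


def list_notebooks(repo_root):
    """Return sorted notebook paths under <repo_root>/lectures, relative to repo_root.

    Hypothetical helper for illustration; not part of the course repository.
    """
    lectures_dir = Path(repo_root) / "lectures"
    if not lectures_dir.is_dir():
        # No lectures directory at this location, so there is nothing to list.
        return []
    return sorted(
        p.relative_to(repo_root).as_posix()
        for p in lectures_dir.rglob("*.ipynb")
        # Skip the autosave copies that Jupyter keeps in hidden checkpoint folders.
        if ".ipynb_checkpoints" not in p.parts
    )


if __name__ == "__main__":
    # Assumes the repository was cloned into ./TUMS2025.
    for name in list_notebooks("TUMS2025"):
        print(name)
```

If the list comes back empty, check that the path passed to `list_notebooks` points at the root of the cloned repository.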