This repository includes my course notes, assessments, and resources from Module 2: Visualization of the 17-month-long HarvardX Professional Certificate Program in Data Science.
Section 1: Data Visualization, Distributions, Quantiles, Percentiles & Boxplots: Exploratory Data Analysis
- Understood importance of data visualization for communicating data-driven findings
- Used distributions to summarize data
- Used the average & standard deviation to understand the normal distribution
- Assessed how well a normal distribution fits data using a quantile-quantile plot
- Interpreted data from a boxplot
- Used ggplot2 to create data visualizations in R
- Explained what the data component of a graph is
- Identified geometry components of a graph and knew when to use which type of geometry
- Explained what the aesthetic mapping component of a graph is
- Understood and selected appropriate scale components of a graph
- Understood the importance of summarizing data in exploratory data analysis
- Used "summarize" verb in dplyr to facilitate summarizing data
- Used "group_by" verb in dplyr to facilitate summarizing data
- Accessed values using the dot placeholder
- Used "arrange" to examine data after sorting
- Used effective data visualization to convey data-based trends
- Applied ggplot2 techniques from the previous section to answer questions using data
- Understood how fixed scales across plots can ease comparisons
- Modified graphs to improve data visualization
- Understood basic principles of effective data visualization
- Understood importance of keeping your goal in mind when deciding on a visualization approach
- Understood principles for encoding data, including position, aligned lengths, angles, area, brightness & color
- Knew when to include the number zero in visualizations
- Used techniques to ease comparisons, such as using common axes, putting visual cues to be compared adjacent to one another & using color effectively
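
As an illustration of the dplyr and ggplot2 items above, here is a minimal sketch in R. The `heights` data frame and its values are made up for this example and are not one of the course data sets; the variable names are likewise hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data, invented for illustration only
heights <- data.frame(
  sex    = c("Female", "Female", "Female", "Male", "Male", "Male"),
  height = c(62, 65, 68, 67, 70, 73)
)

# group_by + summarize: average and standard deviation per group,
# then arrange to examine the summary sorted by average (largest first)
height_summary <- heights %>%
  group_by(sex) %>%
  summarize(average = mean(height), standard_deviation = sd(height)) %>%
  arrange(desc(average))

# Dot placeholder: access a single column of the piped result as a plain vector
overall_average <- heights %>%
  summarize(average = mean(height)) %>%
  .$average

# ggplot2 components: data (heights), aesthetic mapping (aes),
# geometry (geom_boxplot), and scale (scale_y_continuous)
p <- ggplot(heights, aes(x = sex, y = height)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(60, 75))
```

Printing `height_summary` shows one row per group, and printing `p` draws the boxplot. The same pattern should carry over to a real data set by swapping in its data frame and column names.
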
- Data Science: R Basics
- Data Science: Visualization
- Data Science: Probability
- Data Science: Inference & Modeling
- Data Science: Productivity Tools
- Data Science: Wrangling
- Data Science: Linear Regression
- Data Science: Machine Learning
- Data Science: Capstone
- Fundamental R programming skills
- Statistical concepts such as probability, inference, and modeling and how to apply them in practice
- Gained experience with the tidyverse, including data visualization with ggplot2 and data wrangling with dplyr
- Became familiar with essential tools for practicing data scientists such as Unix/Linux, git and GitHub, and RStudio
- Implemented machine learning algorithms
- In-depth knowledge of fundamental data science concepts through motivating real-world case studies
The course, including its assessment questions and data sets, was provided by Rafael A. Irizarry, Professor of Biostatistics at Harvard Chan School of Public Health and Professor of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute.