yhecht/Harvardx_Data_Science_Professional_Certificate_Module_2_Visualization


HarvardX Data Science Professional Certificate

This repository contains my course notes, assessments, and resources from Module 2: Visualization of the 17-month-long HarvardX Professional Certificate Program in Data Science.

Module 2: Data Science - Visualization

What I Learned:

Section 1: Data Visualization, Distributions, Quantiles, Percentiles & Boxplots: Exploratory Data Analysis

  • Understood importance of data visualization for communicating data-driven findings
  • Used distributions to summarize data
  • Used the average & standard deviation to describe the normal distribution
  • Assessed how well a normal distribution fits data using a quantile-quantile plot
  • Interpreted data from a boxplot
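As a rough illustration of the ideas in this section, the sketch below checks a normal approximation with a quantile-quantile comparison and pulls out the numbers a boxplot draws. The data are simulated (they are not course data), and the variable names are my own choices for illustration.

```r
# Simulated sample standing in for real measurements (assumption: not course data)
set.seed(1)
x <- rnorm(500, mean = 69, sd = 3)

# The average and standard deviation are the two parameters of a normal distribution
avg <- mean(x)
s <- sd(x)

# Compare sample quantiles with the quantiles predicted by a normal distribution
p <- seq(0.05, 0.95, 0.05)
sample_q <- quantile(x, p)
theory_q <- qnorm(p, mean = avg, sd = s)

# Points close to the identity line suggest the normal approximation is reasonable
plot(theory_q, sample_q,
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(0, 1)

# boxplot.stats() returns the values a boxplot draws:
# lower whisker, lower hinge, median, upper hinge, upper whisker
box_values <- boxplot.stats(x)$stats
boxplot(x, ylab = "Value")
```

If the points in the first plot bend away from the line at either end, the tails of the data differ from those of a normal distribution.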

Section 2: ggplot2, Customizing Plots

  • Used ggplot2 to create data visualizations in R
  • Explained what the data component of a graph is
  • Identified geometry components of a graph and knew when to use which type of geometry
  • Explained what the aesthetic mapping component of a graph is
  • Understood and selected appropriate scale components of a graph
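The components listed above can be seen together in one small plot. This is a minimal sketch that assumes ggplot2 is installed and uses R's built-in mtcars data frame as a stand-in, since the course data sets are not reproduced here.

```r
library(ggplot2)

# Data component: the built-in mtcars data frame (a stand-in, not course data)
p <- ggplot(data = mtcars,
            # Aesthetic mapping: which variables drive x, y, and color
            aes(x = wt, y = mpg, color = factor(cyl))) +
  # Geometry: points make this a scatterplot
  geom_point(size = 3) +
  # Scale: draw the x-axis on a log10 scale
  scale_x_log10() +
  labs(x = "Weight (1000 lbs, log scale)",
       y = "Miles per gallon",
       color = "Cylinders")

print(p)
```

Swapping geom_point() for another geometry, or scale_x_log10() for another scale, changes one component while the data and aesthetic mapping stay the same.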

Section 3: Summarizing with dplyr

  • Understood the importance of summarizing data in exploratory data analysis.
  • Used "summarize" verb in dplyr to facilitate summarizing data
  • Used "group_by" verb in dplyr to facilitate summarizing data
  • Accessed values using the dot placeholder
  • Used "arrange" to examine data after sorting
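The dplyr verbs in this section chain together naturally. The sketch below assumes dplyr is installed and again uses the built-in mtcars data frame as a stand-in. The column names avg_mpg, sd_mpg, and n are my own labels.

```r
library(dplyr)

# Group by number of cylinders, summarize each group, then sort the result
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg), sd_mpg = sd(mpg), n = n()) %>%
  arrange(desc(avg_mpg))

# summarize() returns a data frame; the dot placeholder extracts
# one column from it as a plain numeric vector
avgs <- by_cyl %>% .$avg_mpg

print(by_cyl)
```

Because summarize() always returns a data frame, the dot placeholder is one way to pass a single column on to functions that expect a vector.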

Section 4: Gapminder

  • Used effective data visualization to convey data-based trends
  • Applied ggplot2 techniques from the previous section to answer questions using data
  • Understood how fixed scales across plots can ease comparisons
  • Modified graphs to improve data visualization
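A minimal sketch of comparing plots on fixed scales follows. It assumes the dslabs package is installed and that its gapminder data frame includes the columns year, fertility, life_expectancy, and continent. The choice of years is mine, for illustration only.

```r
library(dplyr)
library(ggplot2)
library(dslabs)  # assumption: supplies the gapminder data frame

# Compare two years side by side
p <- gapminder %>%
  filter(year %in% c(1962, 2012)) %>%
  ggplot(aes(x = fertility, y = life_expectancy, color = continent)) +
  geom_point() +
  # facet_grid() keeps the x and y scales fixed across panels by default,
  # so point positions can be compared directly between years
  facet_grid(. ~ year)

print(p)
```

Rows with missing fertility or life expectancy values are dropped with a warning when the plot is drawn.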

Section 5: Data Visualization Principles

  • Understood basic principles of effective data visualization
  • Understood importance of keeping your goal in mind when deciding on a visualization approach
  • Understood principles for encoding data, including position, aligned lengths, angles, area, brightness & color
  • Knew when to include the number zero in visualizations
  • Used techniques to ease comparisons, such as using common axes, putting visual cues to be compared adjacent to one another & using color effectively
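The point about including zero can be sketched with a tiny example. The values below are made up purely for illustration, and the object names are hypothetical.

```r
library(ggplot2)

# Made-up illustrative values (not real data)
d <- data.frame(group = c("A", "B", "C"), value = c(52, 55, 58))

# Bars encode values as lengths, so the axis should start at zero;
# geom_col() does this by default
bars <- ggplot(d, aes(x = group, y = value)) +
  geom_col()

# Truncating the axis exaggerates the differences between bars;
# shown here only as the pattern to avoid
truncated <- bars + coord_cartesian(ylim = c(50, 60))

# Points encode values by position, so zero does not need to be included
points <- ggplot(d, aes(x = group, y = value)) +
  geom_point()

print(bars)
print(truncated)
print(points)
```

All three plots share the same x-axis ordering, which keeps the groups easy to compare from one plot to the next.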

Courses in this program:

  1. Data Science: R Basics
  2. Data Science: Visualization
  3. Data Science: Probability
  4. Data Science: Inference & Modeling
  5. Data Science: Productivity Tools
  6. Data Science: Wrangling
  7. Data Science: Linear Regression
  8. Data Science: Machine Learning
  9. Data Science: Capstone

What I Learned:

  • Fundamental R programming skills
  • Statistical concepts such as probability, inference, and modeling and how to apply them in practice
  • Gained experience with the tidyverse, including data visualization with ggplot2 and data wrangling with dplyr
  • Became familiar with essential tools for practicing data scientists such as Unix/Linux, git and GitHub, and RStudio
  • Implemented machine learning algorithms
  • In-depth knowledge of fundamental data science concepts through motivating real-world case studies

The course, including its assessment questions and data sets, was provided by Rafael A. Irizarry, Professor of Biostatistics at Harvard Chan School of Public Health and Professor of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute.

About

Using ggplot2, creating custom plots, summarizing with dplyr, data visualization principles, communicating data-driven findings
