forked from rdpeng/RepData_PeerAssessment1
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathPA1_template.Rmd
More file actions
123 lines (92 loc) · 5.09 KB
/
PA1_template.Rmd
File metadata and controls
123 lines (92 loc) · 5.09 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
---
title: "Reproducible Research: Peer Assessment 1"
output:
html_document:
keep_md: true
---
## Loading and preprocessing the data
```{r loaddata,echo=TRUE}
columnClasses <- c("numeric", "character", "numeric")
activityData <- read.csv("activity.csv", colClasses = columnClasses)
```
## What is mean total number of steps taken per day?
1. Calculate the total number of steps taken per day.
```{r dailydata, echo=TRUE}
dailyActivityData <- aggregate(steps ~ date, activityData, sum)
```
2. Histogram of total steps taken per day.
```{r histogram, echo=TRUE}
par(mfrow = c(1, 1))
hist(dailyActivityData$steps, main = "Total Steps Taken Per Day", xlab = "Daily steps", col = "red")
```
3. Calculate the mean and median
```{r stats, echo=TRUE}
meanSteps <- mean(dailyActivityData$steps, na.rm = TRUE)
medianSteps <- median(dailyActivityData$steps, na.rm = TRUE)
```
Mean number of steps per day is `r format(meanSteps, big.mark=",", scientific = FALSE)`.
Median number of steps per day is `r format(medianSteps, big.mark=",", scientific = FALSE)`.
## What is the average daily activity pattern?
1. Time series plot of the 5-minute interval vs avg number of steps taken per day.
```{r timeseries, echo=TRUE}
stepActivityData <- aggregate(steps ~ interval, activityData, mean)
plot(stepActivityData$interval, stepActivityData$steps, type = "l", main = "Avg steps at each interval", xlab = "Interval", ylab = "Avg steps per day", col = "blue")
```
2. Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?
Interval `r format(stepActivityData[stepActivityData$steps == max(stepActivityData$steps),]$interval, big.mark=",", scientific = FALSE)` has the most number of average steps, `r format(max(stepActivityData$steps), big.mark=",", scientific = FALSE)`.
## Imputing missing values
1. Calculate and report the total number of missing values in the dataset
```{r missingdata, echo=TRUE}
nrow(activityData[is.na(activityData$steps),])
```
2. Impute missing data
Data will be imputed by using the mean for the interval in which the data is missing. As the data set has missing data for entire days, using the daily avg is not useful since it will not prvoide a value for those days in which data is missing completely.
3. Create a new data set with missing data
```{r impute, echo=TRUE}
naValues <- activityData[is.na(activityData$steps), c("date", "interval")]
assign("imputedActivityData", activityData)
for (naDate in unique(naValues$date)){
for (naInterval in naValues[naValues$date == naDate, "interval"]){
imputedActivityData[imputedActivityData$date == naDate & imputedActivityData$interval == naInterval, "steps"] <- stepActivityData[stepActivityData$interval == naInterval, "steps"]
}
}
```
4. Make a histogram of the total number of steps taken each day and Calculate and report the mean and median total number of steps taken per day.
```{r imputedhist, echo=TRUE}
dailyImputedActivityData <- aggregate(steps ~ date, imputedActivityData, sum)
hist(dailyImputedActivityData$steps, main = "Total Steps Taken Per Day", xlab = "Daily steps", col = "red")
imputedMeanSteps <- mean(dailyImputedActivityData$steps, na.rm = TRUE)
imputedMedianSteps <- median(dailyImputedActivityData$steps, na.rm = TRUE)
```
Do these values differ from the estimates from the first part of the assignment?
Mean? No, as this was the method used for imputation and the only NAs are NA for an entire day so the daily mean remains the same.
Mean number of steps per day (after imputting data) is `r format(imputedMeanSteps, big.mark=",", scientific = FALSE)`.
Median? Yes, the median is now equal to the mean.
Median number of steps per day (after imputting data) is `r format(imputedMedianSteps, big.mark=",", scientific = FALSE)`.
What is the impact of imputing missing data on the estimates of the total daily number of steps?
Little impact as the mean remains the same.
## Are there differences in activity patterns between weekdays and weekends?
1. Create a new factor variable
```{r weekendfun, echo=TRUE}
typeOfDay <- function(x, ...){
if (x == "Saturday" | x == "Sunday"){
"weekend"
} else {
"weekday"
}
}
imputedActivityData$typeOfDay <- sapply(sapply(weekdays(strptime(imputedActivityData[, "date"], "%Y-%m-%d")), typeOfDay), as.factor)
```
2. Panel plot of weekend vs weekday for avg. steps per across intervals
```{r plottingWeekendFun, echo=TRUE}
stepActivityData <- aggregate(steps ~ interval + typeOfDay, imputedActivityData, mean)
par(fig = c(0, 1, 0.35, 1))
plot(stepActivityData[stepActivityData$typeOfDay == "weekend", "interval"], stepActivityData[stepActivityData$typeOfDay == "weekend", "steps"], type = "l", main = "Avg steps at each interval", xlab = "", ylab = "", xaxt = "n", axes = FALSE, col = "blue")
axis(3, labels = FALSE)
axis(4)
box()
text(1200, 165, "Weekend")
par(fig = c(0, 1, 0, 0.65), new = TRUE)
plot(stepActivityData[stepActivityData$typeOfDay == "weekday", "interval"], stepActivityData[stepActivityData$typeOfDay == "weekday", "steps"], type = "l", main = "", xlab = "Interval", ylab = "Avg steps per day", col = "blue")
text(1200, 225, "Weekday")
```