# data.table
```{r}
library(data.table)
library(nycflights13)
```
## Read into data.table
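```{r, eval=FALSE}
# fread() reads delimited files (csv, tsv, ...) straight into a data.table;
# "your_file.csv" is a placeholder path
dt <- data.table::fread("your_file.csv")
```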
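`fread()` can also parse a literal string through its `text` argument, so you can try it without a file on disk. This is a minimal sketch. The column names and values in the string are invented for illustration and are not taken from `nycflights13`:

```{r}
library(data.table)

# Parse an inline CSV string instead of a file path;
# the "NA" entry is read as a missing value (the default na.strings)
toy <- fread(text = "origin,month,arr_delay\nJFK,6,10\nLGA,6,-5\nJFK,7,NA")

class(toy)  # "data.table" "data.frame"
toy
```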
## Convert to data.table
```{r}
flights <- as.data.table(flights)
```
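`as.data.table()` returns a converted copy. data.table also provides `setDT()`, which converts an existing data.frame in place (by reference) without copying it, and that can matter for large tables. Here is a small sketch on a made-up data.frame. The name `small_df` is only illustrative:

```{r}
library(data.table)

small_df <- data.frame(x = 1:3, y = c("a", "b", "c"))

dt_copy <- as.data.table(small_df)  # returns a new data.table
is.data.table(small_df)             # FALSE: the original is untouched

setDT(small_df)                     # converts small_df itself, by reference
is.data.table(small_df)             # TRUE
class(small_df)                     # "data.table" "data.frame"
```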
```{r}
class(flights)
```
## Working with data.table
The general form is `DT[i, j, by]`, read as `DT[filter rows, calculate on columns, group by]`:

"Take DT, subset rows using `i`, then calculate `j`, grouped by `by`."
```{r}
flights[month == 11]
```
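The call above only uses `i`. The sketch below puts all three parts of `DT[i, j, by]` together on a small toy table with invented values: `i` keeps June rows, `j` computes a mean, and `by` groups the result by origin:

```{r}
library(data.table)

toy <- data.table(
  origin    = c("JFK", "JFK", "LGA", "LGA", "EWR"),
  month     = c(6, 6, 6, 7, 6),
  arr_delay = c(10, 20, 5, 40, -3)
)

# i: keep June rows; j: mean arrival delay; by: one row per origin
res <- toy[month == 6, .(mean_delay = mean(arr_delay)), by = origin]
res  # JFK 15, LGA 5, EWR -3 (groups in order of first appearance)
```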
```{r}
flights[month == 11 | month == 12]
```
```{r}
flights[month %in% c(11, 12)]
```
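The two filters above select the same rows. A quick check on a toy column of months (invented values) shows that chained `|` comparisons and `%in%` are interchangeable here. `%in%` is simply easier to extend as the list of values grows:

```{r}
library(data.table)

toy <- data.table(month = c(1, 11, 12, 6, 11))

a <- toy[month == 11 | month == 12]
b <- toy[month %in% c(11, 12)]

a$month                      # 11 12 11 (original row order is kept)
identical(a$month, b$month)  # TRUE
```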
```{r}
flights[, arr_delay]  # a bare column name in j returns a plain vector
```
### Select columns with `list()` or `.()`
```{r}
flights[, list(arr_delay)]
```
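Wrapping the column in `list()` changes the return type. A bare column name in `j` gives a plain vector, while `list()` gives a data.table. Here is a quick check on a toy table with made-up values:

```{r}
library(data.table)

toy <- data.table(arr_delay = c(10, -5, 3), dep_delay = c(2, 0, 7))

v  <- toy[, arr_delay]        # bare column name: a plain vector
dt <- toy[, list(arr_delay)]  # wrapped in list(): a one-column data.table

class(v)   # "numeric"
class(dt)  # "data.table" "data.frame"
```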
```{r}
flights[, list(arr_delay, dep_delay)]
```
Inside `DT[...]`, `.()` is an alias for `list()`, so the call below is exactly the same:
```{r}
flights[, .(arr_delay, dep_delay)]
```
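To confirm that `.()` and `list()` produce the same result, you can compare them directly on a toy table (values invented):

```{r}
library(data.table)

toy <- data.table(arr_delay = c(10, -5, 3), dep_delay = c(2, 0, 7))

a <- toy[, list(arr_delay, dep_delay)]
b <- toy[, .(arr_delay, dep_delay)]

identical(a, b)  # TRUE
```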
### Functions on columns
```{r}
# arr_delay + dep_delay is NA when either delay is missing; na.rm = TRUE skips those rows
flights[, sum(arr_delay + dep_delay, na.rm = TRUE)]
```
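The role of `na.rm = TRUE` is easiest to see on a tiny table (invented values) where only one row has both delays present:

```{r}
library(data.table)

toy <- data.table(arr_delay = c(10, NA, 3), dep_delay = c(2, 5, NA))

# Row sums are 12, NA, NA; na.rm = TRUE drops the two NA sums
total <- toy[, sum(arr_delay + dep_delay, na.rm = TRUE)]
total  # 12
```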
```{r}
flights[origin == "JFK" & month == 6,
        .(m_arr = mean(arr_delay, na.rm = TRUE),
          m_dep = mean(dep_delay, na.rm = TRUE))]
```
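Naming the elements of `.()` sets the column names of the result, which here is a one-row data.table. Here is a toy version with invented values, so the means can be checked by hand:

```{r}
library(data.table)

toy <- data.table(
  origin    = c("JFK", "JFK", "LGA"),
  month     = c(6, 6, 6),
  arr_delay = c(10, NA, 4),
  dep_delay = c(2, 6, 1)
)

res <- toy[origin == "JFK" & month == 6,
           .(m_arr = mean(arr_delay, na.rm = TRUE),
             m_dep = mean(dep_delay, na.rm = TRUE))]
res  # one row: m_arr = 10, m_dep = 4
```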
### .N (counts)
```{r}
flights[origin == 'JFK' & month == 6, .N]
```
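`.N` is a special symbol that holds the number of rows in the current subset (or in each group when `by` is used). On a toy table with made-up values, it matches `nrow()` of the same subset:

```{r}
library(data.table)

toy <- data.table(origin = c("JFK", "LGA", "JFK", "JFK"),
                  month  = c(6, 6, 7, 6))

n <- toy[origin == "JFK" & month == 6, .N]
n                                              # 2
n == nrow(toy[origin == "JFK" & month == 6])   # TRUE
```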
### Using dplyr
```{r}
library(dplyr)
```
```{r}
flights %>%
  filter(origin == 'JFK', month == 6) %>%
  summarize(count = n())
```
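
### Grouping with `by`
```{r}
flights[, .N, by = origin]
```
`by` is the third argument inside the brackets, so it can also be passed by position:
```{r}
flights[, .N, origin]
```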
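`by` also accepts column names as a character vector, and `keyby` groups the same way but sorts the result by the grouping column. Here is a toy sketch with invented values:

```{r}
library(data.table)

toy <- data.table(origin = c("LGA", "JFK", "LGA", "EWR", "JFK", "LGA"))

toy[, .N, by = origin]               # order of first appearance: LGA, JFK, EWR
toy[, .N, by = "origin"]             # same grouping, column given as a string
counts <- toy[, .N, keyby = origin]
counts                               # sorted by origin: EWR 1, JFK 2, LGA 3
```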
The dplyr equivalent:
```{r}
flights %>%
  group_by(origin) %>%
  summarize(count = n())
```
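Grouping by several columns works the same way. You pass them to `by` inside `.()`, much like listing several columns in dplyr's `group_by()`. Here is a toy sketch with invented values:

```{r}
library(data.table)

toy <- data.table(
  origin = c("JFK", "JFK", "LGA", "JFK"),
  month  = c(6, 7, 6, 6)
)

res <- toy[, .N, by = .(origin, month)]
res  # JFK/6: 2, JFK/7: 1, LGA/6: 1
```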