-
Notifications
You must be signed in to change notification settings - Fork 19
Expand file tree
/
Copy pathR Fundamentals.Rmd
More file actions
739 lines (612 loc) · 27.1 KB
/
R Fundamentals.Rmd
File metadata and controls
739 lines (612 loc) · 27.1 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
---
title: "R Fundamentals"
author: "Tyler Hunt"
date: "January 17, 2018"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Using R packages
As you may have noted from the setup document R is a generalized language that harnesses the power of individual packages. These packages can be installed from CRAN or from github directly (using `devtools`). To install a package from CRAN place the package name in the `install.packages` function like so.
```{r eval=FALSE}
install.packages("devtools")
```
To install the package from github (usually development branches so be careful) just copy the username/repo for the R package.
```{r eval=FALSE}
library(dplyr)
install_github("hadley/devtools")
```
```{r, eval=FALSE}
install.packages(c("ggplot2", "knitr", "psych", "rmarkdown", "reshape2", "swirl"), dependencies=TRUE)
```
### The `library` function
Anytime you want to utilize a previously installed package, you must retreive it with `library()` when you begin a new R session. You do not need to reinstall packages each time you quit and restart your R session.
```{r}
library(ggplot2)
library(knitr)
library(psych)
library(rmarkdown)
library(reshape2)
library(swirl)
# these packages can now be used in our RStudio session!
```
### Getting help
The `?` symbol can be used to bring up the help pages:
```{r}
?update.packages
?read.csv
?summary
?describe
?plot
?ggplot
?geometric.mean
```
### 1.8 Variable assignment (`<-`)
Variables are special data structures that allow you to enter data into R. Objects are stored in R's memory and can be retrieved ("called") when you need them.
You define objects through `variable assignment` and they are stored in your `global environment`.
You define objects through variable assignment using the **assignment operator** `<-`. This is a "less than" `<` symbol immediately followed by a hyphen `-`
You will also see that `=` is frequently used in place of the assignment operator. This works the same, but we recommend using the assignment operator for consistency. Arguments and functions frequently use `=` and it is best to consistently assign your variables with `<-` until you understand the basics.
**Object definition/variable assignment requires three pieces of information**:
1) object_name
2) `<-`
3) definition/assignment
Let's define an object named `numeric_oject` and define it as the number 4:
```{r}
numeric_object <- 4
ls()
```
### The R global environment
We now have an object defined in our global environment!
Objects in R are stored in your global environment. `ls()` will list previously defined objects in your environment.
```{r}
ls() #There is currently nothing in our global environment
```
"Call" (retrieve) the data contained wihtin the object by typing its name into your script and running the line of code
```{r}
numeric_object # 4 is returned
```
Let's try another example, this time using character data. Note that character data is **always** contained within quotation marks `" "`
```{r}
welcome_object <- "Welcome to the D-Lab"
ls()
```
We now have two objects in our global environment. Call the data contained within the object by typing its name and running the line of code:
```{r}
welcome_object
```
### Naming R variables
Object names can be anything, but are always case sensitive. However, they cannot begin with a number and generally do not begin with symbols. However you choose to name your objects, be consistent and use brief descriptions of their contents.
Names must be __**unique**__. If you resuse an old name, the old definition will be overwritten.
Let's overwrite the object `welcome_object` from above.
```{r}
welcome_object <- "Welcome to Barrows Hall"
```
See how the definition of `welcome_object` changed in your global environment window? However, there are still only two objects in your global environment.
It also prints the new information when called:
```{r}
welcome_object
```
If we want to preserve the original object `welcome_object` we have to define a new object
```{r}
welcome_object2 <- "Welcome to the D-Lab in Barrows Hall"
ls()
```
We now have three objects defined in our global environment! This third object contains new information and did not overwrite the old oblect:
```{r}
welcome_object2
```
### Variable classes (`class`)
Each variable in R has a `class` that summarizes the type of the data stored within the object. We will talk more about data types below.
```{r}
class(numeric_object)
class(welcome_object)
class(welcome_object2)
```
### Removing variables (`rm`)
We remove individual variables from our environment with `rm()`. For example, to remove `numeric_object`, we type:
```{r}
rm(numeric_object)
ls()
```
See how `numeric_object` disappeared from our global environment?
We can also wipe the entire environment with `rm(list = ls())`
```{r}
rm(list=ls())
ls()
```
Now, all objects are gone from our global environment.
# **Challenge 1**
1. What do the following commands do?
```{r, eval=FALSE}
?"<-"
?ls
?dir
?rm
?iris
?mtcars
```
2. What is the three-piece recipe for variable definition?
3. Define two numeric variables
4. What does ls do? Where else do you see this information?
> NOTE: We will use the "iris" and "mtcars" datasets in Parts 2 and 3 of this workshop series.
Double question marks ?? will lead you to coding walkthroughs called "vignettes". These usually come with preloaded data and examples:
```{r}
??psych
```
You will often find that your questions are not answered by the help pages nor vignettes. In that case you should Google-search your question or error message along with the name of a free help website.
For example, to get help making boxplots, I might Google search "R how to make boxplots stack overflow"
You should take a few minutes sometime to browse the contents of these help sites and use their built-in search tools:
[UCLA idre](http://www.ats.ucla.edu/stat/r/)
[Quick-R](http://statmethods.net/)
[R-bloggers](https://www.r-bloggers.com/)
[Stack Overflow - R](http://stackoverflow.com/questions/tagged/r)
### Living in R (`getwd`)
Like in Unix, in R you are always in a directory. Your actions are all relative to that directory. You will find this useful for loading data from files and when you save your output it will be located here.
Figure out where you are with `getwd()`
```{r}
getwd()
```
Tell R you would like to set your working directory to the R-Fundamentals folder you just downloaded with `setwd()`. Follow the format as shown by `getwd()`
Substitute your computer name for "computer.name". The "R-Fundamentals-master" folder I downloaded from GitHub is located on the Desktop, so the file path would look like this:
```{r, eval=FALSE}
setwd("/Users/computer.name/Desktop/R-Fundamentals-master")
# or, simply click Session -> Set Working Directory -> Choose Directory
getwd()
```
Note the difference between Mac and PC users:
```{r, eval=FALSE}
setwd("/Users/computer.name/Desktop/R-Fundamentals-master") # Mac
setwd("C:/Users/computer.name/Desktop/R-Fundamentals-master") # Windows uses "C:/..."
```
You can view the contents of your working directory with `dir()`.
```{r}
dir()
```
# **Challenge 2**
1. Define two numeric objects
2. Define a character object
3. Remove an individual object using `rm`
4. Wipe your environment using `rm(list=ls())`
5. Set your working directory to your "Downloads" folder. Now set it back to the "R-Fundamentals" folder.
# Atomic data types in R
Numeric, character, and logical (aka "boolean") data types all exist at the atomic level. Normally this means that they cannot be broken down any further and are the raw inputs for commands in R. Other R variables are frequently built upon these atomic types.
### Numeric type
Numeric data are numbers and integers. You may also hear numeric data referred to as `float` or `double` data types. By default, R stores everything as `doubles` (64 bit floating point numbers) which makes R very memory hungry.
Define an object called `num` and then check its class
```{r}
num <- 2 * pi
num
class(num)
```
Standard mathematical operators apply to the creation of numeric data:
`+` `-` `*` `^` `**` `/` `%*% (matrix multiplication)` `%/% (integer division)` `%% (modular division)`
```{r}
5 + 2
5 - 2
5 * 2
5 ^ 2
5 ** 2 # same as ^
5 / 2
5 %*% 2
5 %/% 2
5 %% 2
```
### Character type
Character (aka string or text) data are always contained within quotation marks `" "`. Character handling in R is fairly close to character handling in a Unix terminal.
Let's create an object called `char` and define it with some character data. Then, check its class:
```{r}
char <- "This is character data"
char
class(char)
```
### Contatenating sentences (`paste`)
Use `paste()` to combine/concatenate character data. This will paste together separate words to form a sentence.
```{r}
char2 <- paste("Hey", "momma", "I'm", "a", "string")
char2
```
> NOTE: The following section on string operations can be skipped if time is an issue.
Blankspace is the default separater in the `paste()` function. If you don't want this and want the words to be smushed together, use the argument `sep = ""`
```{r}
char3 <- paste0("Hi", "momma", "I'm", "a", "string")
char3
```
Note here that R is not a zero-indexed language - lists begin at the number 1. The one and only element in this object is the sentence "Hey momma I'm a string".
Character data have some useful commands such as `grep`, `substr`, `strsplit`, and `gsub`. Use your help commands to learn more!
### Function arguments
Arguments go inside of the parentheses of R commands. Unnamed arguments are things like the number 4. `split = " "` is what is called a named argument. This argument goes inside the parentheses of another command `paste`.
```{r}
char4 <- "This/is/slash/separated/text"
char4
split1 <- strsplit(char4, split = "/")
split1 # a list is returned.
```
Most commands require one or two arguments to be defined in order for the command to properly run. You will find that commands are full of default and optional arguments that you can manipulate.
### Simple string subset example (`substr`)
`substr` lets you extract text from certain character positions in character data. Refer back to `char2`
```{r}
char2 <- paste("Hey", "momma", "I'm", "a", "string")
char2
```
If we want to extract just the first four characters of the `char2` object we type:
```{r}
substr(char2, 1, 4)
```
"Hello " (Hello + blankspace) is returned.
You can use `substr()` and the assignment operator `<-` to redefine the first four characters in `char2` with the word "Yes " followed by a blankspace
```{r}
substr(char2, 1, 4) <- "Yes "
char2
```
What changed?
### `gsub` example
You can also substitute with `gsub()`. Let's say we want to substitute all periods `.` in `char2` with `X`. We would type:
```{r}
char2
gsub(".", "X", char2)
```
What happened here? R thinks `.` is a special shorthand for "everything".
To be safe, put the period in brackets so that R views it as a subsetting operation and not "everything else". You will learn more about bracket notation during the subsetting portion in Part 2:
```{r}
gsub("[.]", "X", char2)
```
Still nothing happened... Why not? (hint: we don't have any periods in our sentence!). Let's choose another letter:
```{r}
gsub("[m]", "!", char2)
# All "m" letters have now been replaced with "X". Remember that the `char2` object has not changed because we did not reassign it!
char2 <- gsub("[m]", "!", char2)
char2
```
### Logical type
Logical (boolean) data are dichotomous TRUE/FALSE values. R internally stores `FALSE` as 0 and `TRUE` as 1. Define a logical object:
```{r}
bool_object <- TRUE
bool_object
class(bool_object)
```
We recommend to always spell out `TRUE` and `FALSE` instead of abbreviating them `T` and `F` which might be mistaken for previously defined variables - therefore, you should not save anything as `T`, `F`, or `TRUE` or `FALSE`! Thus, there are certain words that do not make good variable names. R has many reserved words that you should avoid using. See `?reserved` for more information.
Note that logical data also take on numeric properties. Remember that `TRUE` is stored as the numeral 1 and `FALSE` is stored as 0.
```{r}
bool_object + 1
TRUE - TRUE
TRUE + FALSE
FALSE - TRUE
```
### Logical tests
Logical tests are helpful in R if you want to check for equivalence. Logical tests compare two objects and return a logical output. This is useful for removing missing data and subsetting (you will learn more about this in Part 2). Note the use of the double equals `==` sign.
```{r, eval=FALSE}
?"=="
?"&"
?"|"
?"!"
```
```{r}
TRUE == TRUE
FALSE == FALSE
TRUE == FALSE
TRUE & TRUE # and
TRUE | FALSE # or
TRUE != FALSE # not
TRUE > FALSE
FALSE >= TRUE
```
# **Challenge 3**
1. Define a numeric object as 0 and check its class
2. Define a boolean object as "FALSE" and check its class
3. Use `==` to compare your numeric and boolean object. Are they equivalent? Why or why not?
4. Define a character object that uses `paste()` to combine more than one word into a sentence.
[Brunsdon and Comber 2015, page 15](https://us.sagepub.com/en-us/nam/an-introduction-to-r-for-spatial-analysis-and-mapping/book241031) offer a useful summary table of logical operators in R. These are useful for comparing two data variables to each other:
Logical operator | Description
---------------- | -----------
== | Equal
!= | Not equal
> | Greater than
< | Less than
>= | Greater than or equal
<= | Less than or equal
! | Not (goes in front of other expressions)
& | And (combines expressions)
`|` | Or (combines expressions)
[Brunsdon and Comber 2015, page 102](https://us.sagepub.com/en-us/nam/an-introduction-to-r-for-spatial-analysis-and-mapping/book241031) also offer an excellent summary table of logical functions in R.
Logical function | Description
---------------- | -----------
any(x) | `TRUE` if any in a vector of conditions `x` is true
all(x) | `TRUE` if all of a vector of conditions `x` is true
is.numeric(x) | `TRUE` if `x` contains a numeric value
is.character(x) | `TRUE` if `x` contains a character value
is.logical(x) | `TRUE` if `x` contains a true or false value
Often it is handy to see what types of data you are working with. Similar to `class()` we can see what data type an object is with `is.type`. A logical response is returned:
```{r}
is.numeric(num)
is.logical(bool_object)
is.logical(char2)
class(char2)
```
### Data testing and type coercion (`as.type`)
"Type coercion" refers to changing the data from one type to another. You can change data types with `as.` and then the data type you wish to convert to.
[Brunsdon and Comber 2015, page 102](https://us.sagepub.com/en-us/nam/an-introduction-to-r-for-spatial-analysis-and-mapping/book241031) also offer an excellent summary table of data types in R.
Type | Test | Conversion
----------- | ------------- | ----------
character | is.character | as.character
complex | is.complex | as.complex
double | is.double | as.double
expression | is.expression | as.expression
integer | is.integer | as.integer
list | is.list | as.list
logical | is.logical | as.logical
numeric | is.numeric | as.numeric
single | is.single | as.single
raw | is.raw | as.raw
### Type coercion example (`as.numeric`)
```{r}
# Create some character data
char_data <- "9"
class(char_data)
# Coerce this character data to numeric data type
as.numeric(char_data)
class(char_data)
# What happened here? Why did `char_data` not change classes? (hint: we did not overwrite the object!)
char.data <- as.numeric(char_data)
class(char.data)
char.data
```
### Type coercion example 2 (`as.integer` and `L`)
You can change numeric type to integer type using `as.integer` and `L`
```{r}
num <- 4
class(num)
int <- as.integer(num)
class(int)
# or
int <- 2L
class(int)
```
Now, create some character data and try to convert it to integer type:
```{r}
# Create a new object
char.num <- "three"
char.num #Note that the word three is contained within " "
class(char.num)
# What happens if we try to coerce character to numeric data type?
as.integer(char.num)
```
Why did this fail? Can R change character data to numbers? Why not? (hint: R has no protocol for automatically coerce words to numbers). As you can see, trying to coerce data types can lead to weird behavior sometimes.
# **Challenge 4**
1. Create a character object and check its type using `is.character` and `class`. What is the difference between these two methods?
2. Try to change ("coerce") this object to another data type using `as.integer`, `as.numeric`, or `as.logical`.
3. Create an object of class "integer". Remember, there are two ways to do this!
4. Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?
# 3. Data structures
There are several kinds of data structures in R. Data structures are collections of data objects (e.g., numeric, character, and logical vectors, lists, and matrices) that work together. These four are the most common:
1. vector
2. list
3. matrix
4. dataframe
### Vector
A **VECTOR** is an ordered group of the same kind of data. "Ordered" means that their position matters. Vectors are one-dimensional and homogenous, and are thus referred to by their type (e.g., character vector, numeric vector, logical vector).
Create a numeric vector by combining/concatenating elements with `c()`
```{r, eval=FALSE}
?c
```
```{r}
numeric_vector <- c(3, 5, 6, 5, 3)
numeric_vector
```
This numeric vector contains five elements.
You can also add items to a vector using `c()` and a comma `,` (as long as it is the same data type)
```{r}
numeric_vector2 <- c(numeric_vector, 78)
numeric_vector2
```
It doesn't matter what the datatype is for a vector, as long as it is all the same
```{r}
logical_vector <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
logical_vector
logical_vector2 <- c(logical_vector, c(FALSE, FALSE, FALSE))
logical_vector2
```
You can also add and multiply vectors, but they need to be the same length
```{r}
logical_vector * logical_vector
```
What happens when we multiply `c(1,2,3,4) * c(TRUE, FALSE)` ?
```{r}
c(1,2,3,4) * c(TRUE, FALSE)
```
Since the number of elements in the first vector (four) is a multiple of the length of the second vector (two), the second vector gets concatenated against itself two times. This is called "recycling".
### Generating random vectors (`seq`)
You might need to create vectors that are sequences of numbers. You can do this via `seq`. Here we specify a vector from zero to the `length` of our object `logical_vector2` (eight). The argument `by=2` tells R that we want only the even numbers!
```{r, eval=FALSE}
?length
logical_vector2
length(logical_vector2)
seq(from=0,to=length(logical_vector2),by=2)
```
### Sequence where `by = 1` (`:`)
R also gives you a shorthand operator for creating sequences in whole number increments of 1. This is the colon symbol `:`
```{r}
0:8
sequence_object <- c(28:36)
sequence_object
0:length(logical_vector2)
```
### `set.seed`
You can also sample random groups of numbers. Use `set.seed()` to ensure that we all always get the same random draws from the parent universe, even on different machines
```{r}
?set.seed
?runif
?rnorm
?sample
```
Set the seed, and then choose our values:
```{r}
set.seed(1)
uniform <- runif(20, 3, 7) # 20 random samples from uniform distribution between the numbers 3 and 7
uniform
normal <- rnorm(20, 0, 1) # 20 random samples from the normal distribution with a mean of 0 and standard deviation of 1
normal
integer <- sample(5:10, 20, replace=TRUE) # 20 random samples from between the numbers 5 and 10. Note that `replace=TRUE` signifies that it is OK to reuse numbers already selected.
integer
character <- sample(c("Cat", "Dog", "Pig"), 20, replace=TRUE) # 20 random samples of character data
character
logical <- sample(c(TRUE, FALSE), 20, replace=TRUE) # 20 random samples of logical data
logical
```
### Lists
A **LIST** is an ordered group of data that are not of the same type. Lists are heterogenous. Instead of using `c()` like in vector creation, use `list()` to create a list:
```{r, eval=FALSE}
?list
list1 <- list(TRUE, "one", 1) # include three kinds of data: logical, character, and integer
list1
class(list1)
```
Lists are simple containers and are not additive or multiplicative like vectors and matrices are:
```{r, eval=FALSE}
list_object * list(FALSE, "zero", 0) # Error
```
### Matrix
**MATRICES** are homogenous like vectors. They are tables comprised of data all of the same type. Matrices are organized into rows and columns.
Use `matrix()` to create a matrix
```{r, eval=FALSE}
?matrix
```
We can also specify how we want the matrix to be organized using the `nrow` and `ncol` arguments:
```{r}
matrix1 <- matrix(1:12, nrow = 4, ncol = 3) # this makes a 4 x 3 matrix
matrix1
class(matrix1)
```
We can also coerce a vector to a matrix, because a vector is comprised of homogenous data of the same kind, just like a matrix is:
```{r}
# Create a numeric vector from 1 to 20
vec1 <- c(1:20)
vec1
class(vec1)
# Coerce this vector to a matrix with 10 rows and 2 columns:
matrix2 <- matrix(vec1, ncol=2)
matrix2
class(matrix2)
```
### Data frame
_It is worth emphasizing the importance of data frames in R!_ Inside R, a **DATA FRAME** is a list of equal-length vectors. These vectors can be of different types. Data frames are heterogenous like lists.
This is simply a spreadsheet!
Let's create a dataframe called `animals` using the vectors we already created:
```{r, eval=FALSE}
uniform
normal
integer
character
logical
```
We do this using `data.frame()`
```{r, eval=FALSE}
?data.frame
```
```{r}
animals <- data.frame(uniform, normal, integer, character, logical, stringsAsFactors=FALSE)
# NOTE: `stringsAsFactors=FALSE` means that R will NOT try to interpret character data as factor type. More on this below.
```
Take a peek at the `animals` data frame to see what it looks like:
```{r}
head(animals)
```
We can change the names of the columns by passing into it a vector with our desired names
```{r}
# Create a vector called `new_df_names` with the new column names:
new_df_names <- c("Weight", "Progress", "Height", "Type", "Healthy")
# Pass this vector into `colnames()`
colnames(animals) <- new_df_names
head(animals)
```
We can check the structure of our data frame via `str()`
```{r, eval=FALSE}
?str
str(animals)
```
Here, we can see that a data frame is just a list of equal-length vectors! In `str()`, our columns are displayed top to bottom, while the data are displayed left to right.
##### Learning about your data frame
```{r}
# View the dimensions (nrow x ncol) of the data frame:
dim(animals)
# View column names:
colnames(animals)
# View row names (we did not change these, so they default to character type)
rownames(animals)
class(rownames(animals))
```
##### Change column order
You can change the order of the columns by specifying their new order using `c()` within what is called "bracket notation" `[]`.
> This will be covered with the rest of subsetting in Part 2.
```{r}
animals <- animals[,c("Type", "Healthy", "Weight", "Height", "Progress")]
head(animals)
```
##### Factor data type
Factor data are categorical types used to make comparisons between data. Factors group the data by their "levels" (the different categories within a factor).
For example, in our `animals` dataframe, we can coerce "Type" from character to factor data type. Cat, Dog, and Pig are the factor levels. If we might want to compare heights and weights between Cat, Dog, and Pigs, we set the character "Type" vector to factor data type. We can do so with `factor()`.
The dollar sign operator `$` is used to call a single vector from a data frame. This will be discussed more in Part 2 along with the rest of subsetting.
```{r}
str(animals) # "Name" is character data type. See how each column name is preceded by `$`?
animals$Type <- factor(animals$Type)
str(animals) # "Name" is now factor data type!
```
Notice that R stores factors internally as integers and uses the character strings as labels. Also notice that by default R orders factors alphabetically and returns them when we call the "Type" vector.
```{r}
animals$Type
levels(animals$Type)
```
##### Changing factor "levels"
Each animal type (Cat, Dog, and Pig) within the factor `animals$Type` vector are the factor levels.
If we want to change how R stores the factor levels, we can specify their levels using the `levels()` argument. For example:
```{r}
animals$Type # Levels: Cat Dog Pig (default alphabetical sort)
# What if we want to change the factor level sort to Levels: Dog Pig Cat?
animals$Type <- factor(animals$Type, levels = c("Dog", "Pig", "Cat"))
animals$Type # Now when we call animals$Name, we can see that the levels have changed
```
> NOTE: we will load the `animals` data frame from file at the beginning of Part 2, so do not worry if your dataframe does not look identical!
# **Challenge 5**
Vectors
1. Create a vector of a sequence of numbers between 1 to 10.
2. Coerce that vector into a character vector
3. Add the element "11" to the end of the vector
4. Evaluate the `str` of the vector.
Lists
5. How does a list differ from an atomic vector?
6. Create three objects of different types and lengths and then combine them into a list names `x`.
7. If `x` is a list, what is the class of `x[1]`? How about `x[[1]]`? (this is a preview of Part 2).
Data frames
8. Create a 3x2 data frame called `basket`. List the name of each fruit in the first column and its price in teh second column.
9. Now give your dataframe appropriate column and row names.
10. We can add a new column using `$` (this will be covered with data subsetting in Part 2). Can you guess how to add a third column called "Color", that shows the color of each fruit?
Basket example solution:
```{r}
f <- factor(c("Apple", "Orange", "Pear"))
p <- c(10, 28, 36)
basket <- data.frame(f, p)
basket
colnames(basket) <- c("Fruit", "Price")
basket
basket$Color <- factor(c("Red", "Orange", "Green"))
str(basket)
basket
```
# Saving your work (`write.csv`)
It is always good to save your work. Saving a dataframe as a .CSV file is a convenient way to store it for future use. **Anything saved will be placed into your working directory!**
The syntax looks like this:
`write.csv(object_name, "nameOfFile.csv", row.names=TRUE)`
#**Challenge 6**
1. Save the "animals data frame as a .CSV" file.
2. How many basic arguments should you use to save using the `write.csv` function?
- Check your working directory to see if it worked!
```{r, eval=FALSE}
?write.csv
```
# Acknowledgements
- [Wickham H. 2014. Advanced R](http://adv-r.had.co.nz/)
- [Lander J. 2013. R for everyone: Advanced analytics and graphics](http://www.jaredlander.com/r-for-everyone/)
- [Matloff N. 2011. The art of R programming: A tour of statistical software design](https://www.nostarch.com/artofr.htm)
- [Brunsdon C, Comber L. 2015. An Introduction to R for Spatial Analysis and Mapping](https://us.sagepub.com/en-us/nam/an-introduction-to-r-for-spatial-analysis-and-mapping/book241031)
- Contributions by Evan Muzzall, Rochelle Terman, Dillon Niederhut, Sam Abdel-Ghaffar
Far more complicated plots are also possible. In this example we are using a function to demonstrate how confidence intervals work.