STA 326 2.0 Programming and Data Analysis with R

Lesson 1: Introduction to R

Dr Thiyanga Talagala

2020-02-11

What is R?

R is a software environment for statistical computing and graphics.
Language designers: Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand
Parent language: S
The latest R version at the time of this lesson, 3.6.2, was released on 2019-12-12.



Why R?

Free
Powerful: over 14,600 contributed packages on the main repository (CRAN) as of July 2019, written by researchers and programmers from around the world
Flexible: it is a language, so you can create your own solutions
Community: a large, friendly and helpful global community, with lots of resources


R environment

The RStudio IDE

Image Credit: Clastic Detritus


R and RStudio

"If R were an airplane, RStudio would be the airport, providing many, many supporting services that make it easier for you, the pilot, to take off and go to awesome places. Sure, you can fly an airplane without an airport, but having those runways and supporting infrastructure is a game-changer."

-- Julie Lowndes

Image Credit: Clastic Detritus


Create a new project


R Console

7+1

[1] 8

rnorm(10)

 [1] -0.3805010 -2.2459140 -0.5516191  1.0157288 -0.5636009  0.0912911
 [7] -0.3473837  0.8967408  0.7094069 -0.3845299

Variable assignment

a <- rnorm(10)
a

 [1]  0.7601029 -0.4016582  1.2890499 -0.4854536  1.5334595 -0.8243906
 [7]  0.3579681  0.5746972 -0.7215895 -0.7779021

b <- a*100
b

 [1]  76.01029 -40.16582 128.90499 -48.54536 153.34595 -82.43906  35.79681
 [8]  57.46972 -72.15895 -77.79021
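rnorm() draws random numbers from a normal distribution, so the values you get will differ from the ones printed above each time you run it. A minimal sketch of making such draws reproducible with set.seed(); the seed value 2020 and the names first_draw and second_draw are arbitrary, illustrative choices.

```r
# Fix the starting point of the random number generator (2020 is arbitrary)
set.seed(2020)
first_draw <- rnorm(5)

# Resetting the same seed reproduces exactly the same numbers
set.seed(2020)
second_draw <- rnorm(5)

identical(first_draw, second_draw)
# [1] TRUE
```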


Data permanency

ls() can be used to display the names of the objects that are currently stored within R.
The collection of objects currently stored is called the workspace.

ls()

[1] "a" "b"

To remove objects, use the function rm():
- remove all objects: rm(list=ls())
- remove specific objects: rm(x, y, z)

rm(a)
ls()

[1] "b"

rm(list=ls())
ls()

character(0)
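ls() also takes a pattern argument (a regular expression), so ls() and rm() can be combined to remove only some objects. A minimal sketch; the object names tmp_a, tmp_b and keep_me are hypothetical.

```r
# Three objects; two of them share the prefix "tmp_"
tmp_a <- 1
tmp_b <- 2
keep_me <- 3

# List only the objects whose names start with "tmp_"
ls(pattern = "^tmp_")
# [1] "tmp_a" "tmp_b"

# Remove just those objects; keep_me stays in the workspace
rm(list = ls(pattern = "^tmp_"))
ls(pattern = "^tmp_")
# character(0)
```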


At the end of an R session, if you choose to save the workspace, the objects are written to a file called .RData in the current working directory, and the commands used in the session are saved to a file called .Rhistory.


When R is later started from the same directory, it reloads the saved workspace and command history.
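The same save-and-reload cycle can also be triggered explicitly with save.image() and load(); at the end of a session, q(save = "yes") quits and saves the workspace without prompting. A minimal sketch, assuming you save to a file in a temporary directory (the file name lesson1.RData and the object names are hypothetical) rather than to .RData in the working directory.

```r
# Hypothetical file in a temporary directory, so the working directory is untouched
ws_file <- file.path(tempdir(), "lesson1.RData")

a <- rnorm(10)
b <- a * 100

# Write every object in the workspace (here ws_file, a and b) to ws_file
save.image(file = ws_file)

# Read the file back into a fresh environment to inspect what was saved
restored <- new.env()
load(ws_file, envir = restored)
ls(restored)
# includes "a", "b" and "ws_file"
```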


Comment your code

Each comment line should begin with the comment symbol # followed by a single space.

rnorm(10) # This is a comment

 [1]  0.4310973  2.4025568 -0.4692903  1.2052056 -0.3137667  1.0006081
 [7]  2.0435857 -0.4941967 -1.5253943 -0.8166049

sum(1:10) # 1 + 2 + ... + 10

[1] 55


Style Guide

Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread. -- Hadley Wickham

sum(1:10)#Bad commenting style

[1] 55

sum(1:10) # Good commenting style

[1] 55

Also, use commented lines of - and = to break up your file into easily readable sections.

# Read data ----------------
# Plot data ----------------

To learn more, read Hadley Wickham's style guide.


Objects in R

R is an object-oriented language.
An object in R is anything that can be assigned to a variable (data structures, functions, etc.).

Let's take a look at some common types of objects.

Data structures are ways of arranging data.
- You can create objects using the left-pointing arrow <-
Functions tell R to do something.
- A function may be applied to an object.
- The result of applying a function is usually an object too.
- A function call always needs parentheses after the function name, even when no arguments are given.

a <- 1:20 # data structure
sum(a) # sum is a function applied on a

[1] 210

help.start() # Some functions need no arguments.
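Because functions are objects, they can be assigned to a variable just like a data structure. A minimal sketch; my_sum is a hypothetical name.

```r
# Assign the function sum (not a call to it) to a new name
my_sum <- sum

# my_sum now behaves exactly like sum
my_sum(1:20)
# [1] 210

class(my_sum)
# [1] "function"
```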


Getting help with functions and features

R has a built-in help facility.

Method 1

help(rnorm)

For a feature specified by special characters or keywords, such as for, if or [[, put the name in quotes:

help("[[")

Search the help files for a word or phrase.

help.search("weighted mean")

Method 2

?rnorm # shortcut for help(rnorm)

??rnorm # shortcut for help.search("rnorm")


Data structures

Data structures differ in terms of:

Type of data they can hold
How they are created
Structural complexity
Notation to identify and access individual elements

Image Credit: venus.ifca.unican.es


1. Vectors

Vectors

Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data.
The combine function c() is used to create a vector.
All elements of a vector must be of the same type or mode (numeric, character, or logical). You cannot mix modes in the same vector; mixed inputs are coerced to a single mode (see Coercion).

Vector assignment

Syntax

vector_name <- c(element1, element2, element3)

x <- c(5, 6, 3, 1, 100)

- <- is the assignment operator; = can be used as an alternative.
- c() is the combine function.

What will be the output of the following code?

y <- c(x, 500, 600)


Types and tests with vectors

first_vec <- c(10, 20, 50, 70)
second_vec <- c("Jan", "Feb", "March", "April")
third_vec <- c(TRUE, FALSE, TRUE, TRUE)
fourth_vec <- c(10L, 20L, 50L, 70L)

To check if it is a

vector: is.vector()

is.vector(first_vec)

[1] TRUE

character vector: is.character()

is.character(first_vec)

[1] FALSE


double: is.double()

is.double(first_vec)

[1] TRUE

integer: is.integer()

is.integer(first_vec)

[1] FALSE

logical: is.logical()

is.logical(first_vec)

[1] FALSE

number of elements: length()

length(first_vec)

[1] 4
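Instead of testing one type at a time with the is.* functions, typeof() returns the storage type directly. A short sketch applying it to the four vectors above (redefined here so the block runs on its own).

```r
first_vec <- c(10, 20, 50, 70)
second_vec <- c("Jan", "Feb", "March", "April")
third_vec <- c(TRUE, FALSE, TRUE, TRUE)
fourth_vec <- c(10L, 20L, 50L, 70L)

# typeof() reports the storage type of each vector
typeof(first_vec)   # "double"
typeof(second_vec)  # "character"
typeof(third_vec)   # "logical"
typeof(fourth_vec)  # "integer"

# is.numeric() is TRUE for both doubles and integers
is.numeric(first_vec)   # TRUE
is.numeric(fourth_vec)  # TRUE
```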


Coercion

Vectors must be homogeneous. When you attempt to combine different types they will be coerced to the most flexible type so that every element in the vector is of the same type.

Order from least to most flexible

logical --> integer --> double --> character

a <- c(3.1, 2L, 3, 4, "GPA") 
typeof(a)

[1] "character"

anew <- c(3.1, 2L, 3, 4)
typeof(anew)

[1] "double"
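Each step of the order logical --> integer --> double --> character can be seen by adding one more flexible element at a time. A short sketch:

```r
typeof(c(TRUE, FALSE))     # "logical"
typeof(c(TRUE, 2L))        # "integer": TRUE becomes 1L
typeof(c(TRUE, 2L, 3.5))   # "double"
c(TRUE, 2L, 3.5)           # [1] 1.0 2.0 3.5

# Adding a character element turns everything into character
c(TRUE, "a")               # [1] "TRUE" "a"
```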


Explicit coercion

Vectors can be explicitly coerced from one class to another using the as.* functions, if available. For example, as.character, as.numeric, as.integer, and as.logical.

vec1 <- c(TRUE, FALSE, TRUE, TRUE)
typeof(vec1)

[1] "logical"

vec2 <- as.integer(vec1)
typeof(vec2)

[1] "integer"

vec2

[1] 1 0 1 1

Why does the code below output NAs?

x <- c("a", "b", "c")
as.numeric(x)

Warning: NAs introduced by coercion

[1] NA NA NA
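The as.* functions also convert in other directions. A short sketch: character strings that look like numbers convert cleanly, and numbers convert to logical or character values.

```r
# Strings that can be read as numbers convert without NAs
as.numeric(c("1", "2.5", "-3"))
# [1]  1.0  2.5 -3.0

# Numbers to logical: 0 becomes FALSE, any other number becomes TRUE
as.logical(c(0, 1, 5))
# [1] FALSE  TRUE  TRUE

# Numbers to character
as.character(c(1.5, 2))
# [1] "1.5" "2"
```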


x1 <- 1:3
x2 <- c(10, 20, 30)
combinedx1x2 <- c(x1, x2)
combinedx1x2

[1]  1  2  3 10 20 30

class(x1)

[1] "integer"

class(x2)

[1] "numeric"

class(combinedx1x2)

[1] "numeric"

If you combine a numeric vector and a character vector, the numbers are coerced to character:

y1 <- c(1, 2, 3)
y2 <- c("a", "b", "c")
c(y1, y2)

[1] "1" "2" "3" "a" "b" "c"


Name elements in a vector

You can name elements in a vector in different ways. We will learn two of them.

When creating the vector

x1 <- c(a=1991, b=1992, c=1993)
x1

   a    b    c 
1991 1992 1993

Modifying the names of an existing vector

x2 <- c(1, 5, 10)
names(x2) <- c("a", "b", "b")
x2

 a  b  b 
 1  5 10

Note that the names do not have to be unique.
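Called without an assignment, names() returns the existing names. A short sketch using the x1 vector from above (redefined so the block runs on its own).

```r
x1 <- c(a = 1991, b = 1992, c = 1993)

# names() returns the names as a character vector
names(x1)
# [1] "a" "b" "c"

# A vector without names returns NULL
names(c(1, 5, 10))
# NULL
```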


To remove names of a vector

Method 1

unname(x1); x1

[1] 1991 1992 1993

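   a    b    c 
1991 1992 1993

Note that unname() returns a copy without names; x1 itself keeps its names unless you reassign the result, e.g. x1 <- unname(x1).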

Method 2

names(x2) <- NULL; x2

[1]  1  5 10

What will be the output of the following code?

v <- c(1, 2, 3)
names(v) <- c("a")
v


Simplifying vector creation

The colon operator : produces regularly spaced ascending or descending sequences, in steps of 1.

10:16

[1] 10 11 12 13 14 15 16

-0.5:8.5

 [1] -0.5  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5

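Sequence: seq(from, to, by), where from is the initial value, to is the final value and by is the increment. Alternatively, length.out sets the number of elements.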

seq(1,11)

 [1]  1  2  3  4  5  6  7  8  9 10 11

seq(1, 11, length.out=5)

[1]  1.0  3.5  6.0  8.5 11.0

seq(0, 11, by=2)

[1]  0  2  4  6  8 10
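Both : and seq() also produce descending sequences. A short sketch:

```r
# A larger start than end gives a descending sequence
5:1
# [1] 5 4 3 2 1

# A negative step in seq() also counts down
seq(10, 0, by = -2.5)
# [1] 10.0  7.5  5.0  2.5  0.0
```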


Repeats: rep()

rep(9, 5)

[1] 9 9 9 9 9

rep(1:4, 2)

[1] 1 2 3 4 1 2 3 4

rep(1:4, each=2) # each element is repeated twice

[1] 1 1 2 2 3 3 4 4

rep(1:4, times=2) # whole sequence is repeated twice

[1] 1 2 3 4 1 2 3 4

rep(1:4, each=2, times=3)

 [1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4

rep(1:4, 1:4)

 [1] 1 2 2 3 3 3 4 4 4 4

rep(1:4, c(4, 1, 4, 2))

 [1] 1 1 1 1 2 3 3 3 3 4 4
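rep() also accepts length.out to stop at an exact length, and it works the same way on character vectors. A short sketch; the labels "low" and "high" are illustrative.

```r
# Repeat 1:4 until exactly 6 elements are produced
rep(1:4, length.out = 6)
# [1] 1 2 3 4 1 2

# Character vectors are repeated in the same way
rep(c("low", "high"), each = 2)
# [1] "low"  "low"  "high" "high"
```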


Logical operators

c(1, 2, 3) == c(10, 20, 3)

[1] FALSE FALSE  TRUE

c(1, 2, 3) != c(10, 20, 3)

[1]  TRUE  TRUE FALSE

1:5 > 3

[1] FALSE FALSE FALSE  TRUE  TRUE

1:5 < 3

[1]  TRUE  TRUE FALSE FALSE FALSE

Other comparison and logical operators:
- <= less than or equal to
- >= greater than or equal to
- | or
- & and
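The remaining operators work element by element too, and & and | combine two comparisons. A short sketch on the vector 1:5:

```r
x <- 1:5

x >= 4
# [1] FALSE FALSE FALSE  TRUE  TRUE

# & (and): TRUE only where both conditions hold
x >= 2 & x <= 4
# [1] FALSE  TRUE  TRUE  TRUE FALSE

# | (or): TRUE where at least one condition holds
x < 2 | x > 4
# [1]  TRUE FALSE FALSE FALSE  TRUE
```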


Operators: `%in%` - in the set

a <- c(1, 2, 3)
b <- c(1, 10, 3)
a %in% b

[1]  TRUE FALSE  TRUE

x <- 1:10
y <- 1:3
x

 [1]  1  2  3  4  5  6  7  8  9 10

y

[1] 1 2 3

x %in% y

 [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

y %in% x

[1] TRUE TRUE TRUE
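%in% works on character vectors as well. The R documentation for match() defines x %in% table as match(x, table, nomatch = 0) > 0, which the sketch below checks; the month names reuse second_vec from earlier, and "May" is an illustrative value that is not in it.

```r
second_vec <- c("Jan", "Feb", "March", "April")

# Which of these months appear in second_vec?
c("Feb", "May") %in% second_vec
# [1]  TRUE FALSE

# The equivalent expression based on match()
match(c("Feb", "May"), second_vec, nomatch = 0) > 0
# [1]  TRUE FALSE
```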


Vector arithmetic

Operations are performed element by element.

c(10, 100, 100) + 2 # two is added to every element in the vector

[1]  12 102 102

Operations between two vectors:

v1 <- c(1, 2, 3); v2 <- c(10, 100, 1000)
v1 + v2

[1]   11  102 1003

Adding two vectors of unequal length: the shorter vector is recycled (repeated) to match the length of the longer one.

longvec <- seq(10, 100, length.out=10); shortvec <- c(1, 2, 3, 4, 5)
shortvec + longvec

 [1]  11  22  33  44  55  61  72  83  94 105

What will be the output of the following code?

first <- c(1, 2, 3, 4); second <- c(10, 100)
first * second
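When the longer length is not a multiple of the shorter one, R still recycles but issues a warning. A short sketch; the name recycled is illustrative.

```r
# Length 3 is not a multiple of length 2: c(10, 20) is recycled as 10, 20, 10
recycled <- c(1, 2, 3) + c(10, 20)
# Warning: longer object length is not a multiple of shorter object length

recycled
# [1] 11 22 13
```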


Other vector operations

Please see the cheatsheet and the course materials for STA Data Analysis II.


Missing values

Use NA to place a missing value in a vector. NaN ("not a number", e.g. the result of 0/0) is also treated as missing by is.na().

z <- c(10, 101, 2, 3, NA)
is.na(z)

[1] FALSE FALSE FALSE FALSE  TRUE
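A single NA propagates through most summaries; many summary functions accept na.rm = TRUE to drop missing values first. A short sketch using z from above, plus the difference between is.na() and is.nan():

```r
z <- c(10, 101, 2, 3, NA)

# One missing value makes the sum missing
sum(z)
# [1] NA

# na.rm = TRUE removes the NA before computing
sum(z, na.rm = TRUE)
# [1] 116
mean(z, na.rm = TRUE)
# [1] 29

# is.na() is TRUE for both NA and NaN; is.nan() only for NaN
is.na(c(NA, NaN))
# [1] TRUE TRUE
is.nan(c(NA, NaN))
# [1] FALSE  TRUE
```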


Question 1


We are in the midst of a medical crisis! The deadly coronavirus that originated in China has infected tens of thousands of people and is now spreading across the globe at an alarming rate. The World Health Organization (WHO) alerted the world about the novel coronavirus (2019-nCoV) in January 2020. After the global alert was issued, formal reporting of coronavirus cases was put in place, and WHO published daily reports on the number of cases on its website. Use WHO Situation Report-21 for this question.


Table 1 reports the confirmed cases of 2019-nCoV reported by provinces, regions and cities in China.

i) Enter the confirmed cases in Table 1 into a vector.

ii) Name the elements by province/region/city in China.

iii) Write R code to answer the following questions.

Which province/region/city has the highest number of confirmed cases?
Number of confirmed cases reported in Hebei, China.
Total number of confirmed cases reported in China.
Number of cases reported in the capital of China.
Number of cases reported in Inner Mongolia.


Table 2 reports the confirmed 2019-nCoV cases and deaths in China, Singapore, Republic of Korea, Japan, Malaysia, Australia, Viet Nam, Philippines, Cambodia, Thailand, India, Nepal, Sri Lanka, United States of America, Canada, Germany, France, The United Kingdom, Italy, Russian Federation, Spain, Belgium, Finland, Sweden and UAE. The confirmed cases for these countries, in the same order, are:

a <- c(40235, 43, 27, 26, 18, 15, 14, 3, 1, 32, 3, 1, 1, 12, 7, 14, 11, 4, 3, 2, 2, 1, 1, 100, 7)

Rename the vector a to confirmed_cases_countries.
Name the elements according to the associated country.
100 cases were mistakenly recorded for Sweden; correct it.
Add the record for the "other" category to your vector.
Create a new vector containing the WHO region of each country.
China, Singapore, Malaysia, the United Kingdom and Spain have reported new cases. Create a new vector that codes these countries as TRUE and the rest as FALSE.


Slides available at: hellor.netlify.com
