R is a software environment for statistical computing and graphics
Language designers: Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand
Parent language: S
The latest R version, 3.6.2, was released on 2019-12-12
Free
Powerful: Over 14600 contributed packages on the main repository (CRAN), as of July 2019, provided by top international researchers and programmers
Flexible: It is a language, and thus allows you to create your own solutions
Community: A large, friendly, and helpful global community with lots of resources
"If R were an airplane, RStudio would be the airport, providing many, many supporting services that make it easier for you, the pilot, to take off and go to awesome places. Sure, you can fly an airplane without an airport, but having those runways and supporting infrastructure is a game-changer."
-- Julie Lowndes
Image Credit: Clastic Detritus
7+1
[1] 8
rnorm(10)
[1] -0.3805010 -2.2459140 -0.5516191 1.0157288 -0.5636009 0.0912911
[7] -0.3473837 0.8967408 0.7094069 -0.3845299
a <- rnorm(10)
a
[1] 0.7601029 -0.4016582 1.2890499 -0.4854536 1.5334595 -0.8243906
[7] 0.3579681 0.5746972 -0.7215895 -0.7779021
b <- a * 100
b
[1] 76.01029 -40.16582 128.90499 -48.54536 153.34595 -82.43906 35.79681
[8] 57.46972 -72.15895 -77.79021
ls() can be used to display the names of the objects that are currently stored within R.
The collection of objects currently stored is called the workspace.
ls()
[1] "a" "b"
To remove objects, use the function rm().
Remove all objects: rm(list = ls())
Remove specific objects: rm(x, y, z)
rm(a)
ls()
[1] "b"
rm(list = ls())
ls()
character(0)
At the end of an R session, if you choose to save, the objects are written to a file called .RData in the current directory, and the command lines used in the session are saved to a file called .Rhistory.
When R is started at a later time from the same directory, it reloads the associated workspace and command history.
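The same save-and-reload cycle can also be done explicitly during a session. The following is a minimal sketch, assuming a temporary file as the storage location (the name ws_file is a placeholder introduced here); save() and load() are base R functions, and save.image() would write the whole workspace instead of selected objects.

```r
# Sketch: save selected objects to a file and restore them later.
a <- 1:5
b <- a * 100

ws_file <- tempfile(fileext = ".RData")  # placeholder path, not the default .RData
save(a, b, file = ws_file)               # write the two objects to disk

rm(a, b)                                 # remove them from the workspace
restored <- load(ws_file)                # load() returns the names of the restored objects

restored
b
```

Saving to an explicit file keeps the default .RData untouched, which can make it easier to see exactly which objects a later session depends on.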
rnorm(10) # This is a comment
[1] 0.4310973 2.4025568 -0.4692903 1.2052056 -0.3137667 1.0006081
[7] 2.0435857 -0.4941967 -1.5253943 -0.8166049
sum(1:10) # 1 + 2 + ... + 10
[1] 55
sum(1:10)#Bad commenting style
[1] 55
sum(1:10) # Good commenting style
[1] 55
# Read data ----------------
# Plot data ----------------
To learn more, read Hadley Wickham's style guide.
R is an object-oriented language.
An object in R is anything (data structures, functions, etc.) that can be assigned to a variable.
Let's take a look at some common types of objects.
Data structures are the ways of arranging data.
Functions tell R to do something.
A function may be applied to an object.
The result of applying a function is usually an object too.
All function calls need to be followed by parentheses.
a <- 1:20 # data structure
sum(a) # sum is a function applied to a
[1] 210
help.start() # Some functions work on their own.
help(rnorm)
For special characters and reserved words such as for, if, and [[, enclose the argument in quotes:
help("[[")
help.search("weighted mean")
?rnorm
??rnorm
Data structures differ in terms of:
Type of data they can hold
How they are created
Structural complexity
Notation to identify and access individual elements
Image Credit: venus.ifca.unican.es
Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data.
The combine function c() is used to form a vector.
Data in a vector must only be one type or mode (numeric, character, or logical). You can’t mix modes in the same vector.
Syntax
vector_name <- c(element1, element2, element3)
x <- c(5, 6, 3, 1, 100)
<- is the assignment operator; = can be used as an alternative.
c() is the combine function.
What will be the output of the following code?
y <- c(x, 500, 600)
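One way to check your answer is to run the code and inspect the result. The sketch below assumes the vector x defined earlier in this section; it shows that c() flattens its arguments into a single vector rather than nesting one vector inside another.

```r
# Sketch: c() appends new elements to an existing vector.
x <- c(5, 6, 3, 1, 100)
y <- c(x, 500, 600)

length(x)  # 5
length(y)  # 7
y
```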
first_vec <- c(10, 20, 50, 70)
second_vec <- c("Jan", "Feb", "March", "April")
third_vec <- c(TRUE, FALSE, TRUE, TRUE)
fourth_vec <- c(10L, 20L, 50L, 70L)
To check whether an object is a
vector: is.vector()
is.vector(first_vec)
[1] TRUE
character vector: is.character()
is.character(first_vec)
[1] FALSE
double vector: is.double()
is.double(first_vec)
[1] TRUE
integer vector: is.integer()
is.integer(first_vec)
[1] FALSE
logical vector: is.logical()
is.logical(first_vec)
[1] FALSE
length(first_vec)
[1] 4
Vectors must be homogeneous. When you attempt to combine different types they will be coerced to the most flexible type so that every element in the vector is of the same type.
Order from least to most flexible: logical --> integer --> double --> character
a <- c(3.1, 2L, 3, 4, "GPA")
typeof(a)
[1] "character"
anew <- c(3.1, 2L, 3, 4)
typeof(anew)
[1] "double"
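Implicit coercion also happens inside functions that expect numbers. As a small sketch (the vector name passed is made up for illustration), logical values are treated as 0 and 1, which makes counting TRUE values straightforward.

```r
# Sketch: logical values are coerced to 0/1 in arithmetic.
passed <- c(TRUE, FALSE, TRUE, TRUE)

n_passed <- sum(passed)       # number of TRUE values: 3
share_passed <- mean(passed)  # proportion of TRUE values: 0.75

mixed <- c(TRUE, 2L)          # logical combined with integer
typeof(mixed)                 # "integer"
```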
Vectors can be explicitly coerced from one class to another using the as.* functions, if available. For example, as.character, as.numeric, as.integer, and as.logical.
vec1 <- c(TRUE, FALSE, TRUE, TRUE)
typeof(vec1)
[1] "logical"
vec2 <- as.integer(vec1)
typeof(vec2)
[1] "integer"
vec2
[1] 1 0 1 1
Why does the code below output NAs?
x <- c("a", "b", "c")
as.numeric(x)
Warning: NAs introduced by coercion
[1] NA NA NA
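Coercion to numeric works element by element: strings that look like numbers are converted, and the rest become NA. The sketch below uses a made-up vector raw to show how the failed elements can be located; suppressWarnings() is used only to keep the demonstration quiet.

```r
# Sketch: find which elements could not be converted to numbers.
raw <- c("10", "2.5", "abc", "7")

num <- suppressWarnings(as.numeric(raw))  # "abc" cannot be parsed, so it becomes NA
num

failed <- raw[is.na(num)]                 # the original strings that failed
failed
```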
x1 <- 1:3
x2 <- c(10, 20, 30)
combinedx1x2 <- c(x1, x2)
combinedx1x2
[1] 1 2 3 10 20 30
class(x1)
[1] "integer"
class(x2)
[1] "numeric"
class(combinedx1x2)
[1] "numeric"
y1 <- c(1, 2, 3)
y2 <- c("a", "b", "c")
c(y1, y2)
[1] "1" "2" "3" "a" "b" "c"
You can name elements in a vector in different ways. We will learn two of them.
x1 <- c(a=1991, b=1992, c=1993)
x1
##    a    b    c
## 1991 1992 1993
x2 <- c(1, 5, 10)
names(x2) <- c("a", "b", "b")
x2
##  a  b  b
##  1  5 10
Note that the names do not have to be unique.
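Removing names, method 1: unname() returns a copy without names; x1 itself keeps its names
unname(x1); x1
[1] 1991 1992 1993
   a    b    c
1991 1992 1993
Removing names, method 2: assign NULL to names()
names(x2) <- NULL; x2
[1] 1 5 10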
What will be the output of the following code?
v <- c(1, 2, 3)
names(v) <- c("a")
v
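To check your answer, it helps to look at the names attribute directly. As a sketch of what happens when fewer names than elements are supplied, the missing names are filled in with NA.

```r
# Sketch: supplying fewer names than elements.
v <- c(1, 2, 3)
names(v) <- c("a")

names(v)         # the first name is "a", the remaining two are NA
is.na(names(v))  # FALSE TRUE TRUE
```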
The colon operator : produces regularly spaced ascending or descending sequences.
10:16
[1] 10 11 12 13 14 15 16
-0.5:8.5
[1] -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5
seq(initial_value, final_value, increment)
seq(1,11)
[1] 1 2 3 4 5 6 7 8 9 10 11
seq(1, 11, length.out=5)
[1] 1.0 3.5 6.0 8.5 11.0
seq(0, 11, by=2)
[1] 0 2 4 6 8 10
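The examples above are all ascending. A short sketch of descending sequences, together with the base R helpers seq_len() and seq_along() (not covered in the slides, added here as a related illustration):

```r
# Sketch: descending sequences and index sequences.
down_colon <- 5:1                 # 5 4 3 2 1
down_seq <- seq(10, 1, by = -3)   # 10 7 4 1 (a negative increment counts down)

first_four <- seq_len(4)               # 1 2 3 4
idx <- seq_along(c("a", "b", "c"))     # 1 2 3, one index per element

down_colon
down_seq
```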
rep()
rep(9, 5)
[1] 9 9 9 9 9
rep(1:4, 2)
[1] 1 2 3 4 1 2 3 4
rep(1:4, each=2) # each element is repeated twice
[1] 1 1 2 2 3 3 4 4
rep(1:4, times=2) # whole sequence is repeated twice
[1] 1 2 3 4 1 2 3 4
rep(1:4, each=2, times=3)
[1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4
rep(1:4, 1:4)
[1] 1 2 2 3 3 3 4 4 4 4
rep(1:4, c(4, 1, 4, 2))
[1] 1 1 1 1 2 3 3 3 3 4 4
c(1, 2, 3) == c(10, 20, 3)
[1] FALSE FALSE TRUE
c(1, 2, 3) != c(10, 20, 3)
[1] TRUE TRUE FALSE
1:5 > 3
[1] FALSE FALSE FALSE TRUE TRUE
1:5 < 3
[1] TRUE TRUE FALSE FALSE FALSE
<= : less than or equal to
>= : greater than or equal to
| : or
& : and
%in% : in the set
a <- c(1, 2, 3)
b <- c(1, 10, 3)
a %in% b
[1] TRUE FALSE TRUE
x <- 1:10
y <- 1:3
x
[1] 1 2 3 4 5 6 7 8 9 10
y
[1] 1 2 3
x %in% y
[1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
y %in% x
[1] TRUE TRUE TRUE
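The logical operators & and | combine element-wise comparisons, and the resulting logical vector can be used to pick elements. A minimal sketch using a vector x like the one above (the result names inside, outside, and members are introduced here for illustration):

```r
# Sketch: combining comparisons and using them to select elements.
x <- 1:10

inside <- x[x > 3 & x < 8]          # both conditions hold: 4 5 6 7
outside <- x[x < 3 | x > 8]         # at least one holds: 1 2 9 10
members <- x[x %in% c(2, 4, 20)]    # elements of x found in the set: 2 4

inside
outside
members
```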
c(10, 100, 100) + 2 # two is added to every element in the vector
[1] 12 102 102
v1 <- c(1, 2, 3); v2 <- c(10, 100, 1000)
v1 + v2
[1] 11 102 1003
Add two vectors of unequal length
longvec <- seq(10, 100, length.out = 10); shortvec <- c(1, 2, 3, 4, 5)
shortvec + longvec
[1] 11 22 33 44 55 61 72 83 94 105
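The result above comes from recycling: the shorter vector is repeated until it matches the length of the longer one. The sketch below makes that step explicit with rep_len() and also shows the case where the longer length is not a multiple of the shorter one, where R still recycles but issues a warning (suppressed here only to keep the output clean).

```r
# Sketch: recycling made explicit.
shortvec <- c(1, 2, 3, 4, 5)
longvec <- seq(10, 100, length.out = 10)

recycled <- rep_len(shortvec, length(longvec))  # 1 2 3 4 5 1 2 3 4 5
same <- identical(shortvec + longvec, recycled + longvec)
same  # TRUE

# Lengths 3 and 2: recycling still happens, with a warning.
uneven <- suppressWarnings(c(1, 2, 3) + c(10, 20))
uneven  # 11 22 13
```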
What will be the output of the following code?
first <- c(1, 2, 3, 4); second <- c(10, 100)
first * second
Use NA or NaN to place a missing value in a vector.
z <- c(10, 101, 2, 3, NA)
is.na(z)
[1] FALSE FALSE FALSE FALSE TRUE
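Missing values propagate through most calculations, so summaries usually need to be told how to handle them. A brief sketch using the vector z from above (the names total_raw, total_clean, and nan_val are introduced here for illustration):

```r
# Sketch: how NA and NaN behave in calculations.
z <- c(10, 101, 2, 3, NA)

total_raw <- sum(z)                  # NA, because one element is missing
total_clean <- sum(z, na.rm = TRUE)  # 116, the NA is dropped first

nan_val <- 0 / 0                     # NaN, "not a number"
is.na(nan_val)                       # TRUE: NaN counts as missing
is.nan(NA)                           # FALSE: NA is not NaN
```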
We are in the midst of a medical crisis! The deadly coronavirus that originated in China has infected hundreds of people and is now spreading across the globe at an alarming rate. The World Health Organization (WHO) alerted the world about the novel coronavirus (2019-nCoV) in January 2020. After the global alert was issued, formal reporting of coronavirus cases was put in place, and WHO published daily reports on the number of cases on its website. Use WHO Situation Report-21 for this question.
i) Enter confirmed cases in table 1 to a vector.
ii) Name the elements by province/regions/cities in China.
iii) Write R code to answer the following questions.
Which province/region/city has the highest number of confirmed cases?
Number of confirmed cases reported in Hebei, China.
Total number of confirmed cases reported in China
Number of cases reported in the capital of China
Number of cases reported in Inner Mongolia
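The following sketch shows the kind of code these questions call for. The region names and counts are made-up placeholders, not values from the WHO report; substitute the figures from table 1 when working through the exercise.

```r
# Sketch with placeholder data (NOT taken from the WHO report).
cases <- c(region_a = 120, region_b = 45, region_c = 300)

top_region <- names(cases)[which.max(cases)]  # name of the largest element
top_region                                    # "region_c"

cases["region_b"]                             # look up one element by name

total <- sum(cases)                           # total over all elements
total                                         # 465
```

Indexing by name rather than by position keeps the code readable and avoids errors if the order of the elements changes.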
a <- c(40235, 43, 27, 26, 18, 15, 14, 3, 1, 32, 3, 1, 1, 12, 7, 14, 11, 4, 3, 2, 2, 1, 1, 100, 7)
Rename the vector a as confirmed_cases_countries.
Name the elements according to the associated country.
100 cases were mistakenly recorded for Sweden; correct it.
Add the record for the other category to your vector.
Create a new vector to enter the WHO regions.
China, Singapore, Malaysia, the United Kingdom, and Spain have reported new cases. Create a new vector that codes these countries as TRUE and the rest as FALSE.
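The operations needed here (copying a vector under a new name, correcting one element, appending an element, and building a logical indicator) can be sketched on a small placeholder vector. All names and numbers below are hypothetical and are not the exercise data.

```r
# Sketch with placeholder data (NOT the exercise values).
counts <- c(alpha = 5, beta = 100, gamma = 2)

counts_renamed <- counts              # "rename" by assigning to a new name
rm(counts)                            # then remove the old object

counts_renamed["beta"] <- 1           # correct a wrongly recorded value
counts_renamed <- c(counts_renamed, other = 64)  # append a new named element

# TRUE for the listed names, FALSE for the rest
is_new <- names(counts_renamed) %in% c("alpha", "gamma")
is_new                                # TRUE FALSE TRUE FALSE
```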