STA 517 3.0 Programming and Statistical Computing with R

🚦Built-in functions in R

Dr Thiyanga Talagala

1

Function Anatomy

vec1 <- c(1, 2, 3, 4, 5)
mean(vec1)
[1] 3
vec2 <- c(1, 2, NA, 3, 4, 5)
mean(vec2)
[1] NA

Help

?mean

2

help: mean

3

help: sort

4

5

mean with additional inputs

mean(vec2, na.rm=TRUE)
[1] 3

6

7

8
vec <- c(10, 1, 2, 4, 100, 15)
sort(vec)
[1] 1 2 4 10 15 100
sort(vec, decreasing = TRUE)
[1] 100 15 10 4 2 1
sort(vec, decreasing = FALSE)
[1] 1 2 4 10 15 100

9

rep
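A minimal sketch of how rep() repeats values. The input vectors and the variable names are illustrative assumptions, not taken from the slide.

```r
# Repeat the whole vector twice
r_times <- rep(1:3, times = 2)

# Repeat each element twice before moving to the next one
r_each <- rep(1:3, each = 2)

# Repeat until the result has exactly five elements
r_length <- rep(1:3, length.out = 5)

# A vector of counts gives a separate repeat count for every element
r_counts <- rep(c("a", "b"), times = c(2, 3))

r_times
r_each
r_length
r_counts
```

The times, each and length.out arguments can be looked up with ?rep.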

10

Working with built-in functions in R

  • How to call a built-in function

  • Arguments matching

  • Basic functions

  • Test and type conversion functions

  • Probability distribution functions

  • Reproducibility of scientific results

  • Data visualization: qplot

12

Functions in R

👉🏻 Perform a specific task according to a set of instructions.

👉🏻 Some functions we have discussed so far,

c, matrix, array, list, data.frame, str, dim, length, nrow, which.max, diag, summary

👉🏻 In R, functions are objects of class function.

class(length)
[1] "function"

Functions in R (cont.)

👉🏻 There are basically two types of functions:

💻 Built-in functions

Already created or defined in the programming framework to make our work easier.

👨 User-defined functions

Sometimes we need to create our own functions for a specific purpose.

16

How to call a built-in function in R

function_name(arg1 = 1, arg2 = 3)

Argument matching

The following calls to mean are all equivalent

mydata <- c(rnorm(20), 100000)
mean(mydata) # matched by position
mean(x = mydata) # matched by name
mean(mydata, na.rm = FALSE)
mean(x = mydata, na.rm = FALSE)
mean(na.rm = FALSE, x = mydata)
mean(na.rm = FALSE, mydata)
[1] 4762.105

⚠️ Although all of these calls work, keep the arguments in their documented order and name any argument that is not in its usual position, so the call stays easy to read.
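One way to check that the calls really are equivalent is to store the results and compare them with identical(). This is a small sketch: the seed value and the variable names are assumptions added so the example is repeatable.

```r
set.seed(517)  # arbitrary seed, only so the sketch gives the same data each run
mydata <- c(rnorm(20), 100000)

by_position <- mean(mydata)                  # matched by position
by_name     <- mean(x = mydata)              # matched by name
reordered   <- mean(na.rm = FALSE, x = mydata)
mixed       <- mean(na.rm = FALSE, mydata)   # named argument first, then positional

# All four calls compute the same value
identical(by_position, by_name)
identical(by_position, reordered)
identical(by_position, mixed)
```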

17

Argument matching (cont.)

  • some arguments have default values
mean(mydata, trim=0)
[1] 4762.105
mean(mydata) # Default value for trim is 0
[1] 4762.105
mean(mydata, trim=0.1)
[1] 0.2973882
mean(mydata, tr=0.1) # partial matching: tr is completed to trim
[1] 0.2973882
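Default values can also be inspected from within R. The sketch below looks at the formal arguments of mean.default (the method that mean() uses for plain numeric vectors) and repeats the partial-matching comparison on a small illustrative vector; the vector and the variable names are assumptions.

```r
# Names and default values of the arguments of the default mean method
arg_names <- names(formals(mean.default))
default_trim <- formals(mean.default)$trim
arg_names
default_trim

# Illustrative data: nine small values and one extreme value
x <- c(1:9, 1000)

full_name    <- mean(x, trim = 0.1)  # drops the smallest and the largest value
partial_name <- mean(x, tr = 0.1)    # tr is completed to trim

identical(full_name, partial_name)
```

Partial matching saves typing at the console, but spelling out the full argument name is usually safer in scripts.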

?mean

19

Your turn

  1. Calculate the mean of 1, 2, 3, 8, 10, 20, 56, NA.

  2. Arrange the numbers in descending order and in ascending order.

  3. Compute the standard error of the above numbers.
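Base R has no single function for the standard error of the mean, so it is usually built from sd() and length(). The sketch below uses a hypothetical vector (not the numbers in the exercise) and assumes that missing values should be dropped first.

```r
# Hypothetical data with one missing value
x <- c(4, 8, 15, 16, 23, 42, NA)

# Keep only the observed values
x_obs <- x[!is.na(x)]

# Standard error of the mean: standard deviation divided by sqrt(n)
std_error <- sd(x_obs) / sqrt(length(x_obs))
std_error
```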

21

Basic maths functions

Function            Description
abs(x)              absolute value of x
log(x, base = y)    logarithm of x with base y; if base is not specified, returns the natural logarithm
exp(x)              exponential of x
sqrt(x)             square root of x
factorial(x)        factorial of x
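A short sketch that calls each of these functions once. The input values are illustrative.

```r
abs_value  <- abs(-3.5)             # absolute value
log_base10 <- log(100, base = 10)   # logarithm with an explicit base
log_nat    <- log(exp(1))           # base omitted, so natural logarithm
exp_value  <- exp(0)                # e raised to the power 0
root       <- sqrt(16)              # square root
fact       <- factorial(5)          # 5 * 4 * 3 * 2 * 1

c(abs_value, log_base10, log_nat, exp_value, root, fact)
```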
22

Basic statistical functions

Function       Description
mean(x)        mean of x
median(x)      median of x
mode(x)        storage mode (internal type) of x, not the statistical mode; base R has no built-in function for the most frequent value
var(x)         variance of x
sd(x)          standard deviation of x
scale(x)       z-scores of x
quantile(x)    quantiles of x
summary(x)     summary of x: minimum, quartiles, mean, maximum
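Note that mode() in base R reports how an object is stored, not its most frequent value. The sketch below shows this and one possible way to find the statistical mode with table(); the data and the variable names are illustrative assumptions.

```r
x <- c(2, 3, 3, 5, 3, 2)

# mode() describes the storage mode of the object
storage <- mode(x)
storage

# One possible way to find the most frequent value(s)
counts <- table(x)
is_max <- as.vector(counts) == max(counts)
stat_mode <- as.numeric(names(counts)[is_max])
stat_mode
```

If several values are tied for the highest count, this sketch returns all of them.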
23

Test and Type conversion functions

Test               Convert
is.numeric()       as.numeric()
is.character()     as.character()
is.vector()        as.vector()
is.matrix()        as.matrix()
is.data.frame()    as.data.frame()
is.factor()        as.factor()
is.logical()       as.logical()
is.na()
a <- c(1, 2, 3); a
[1] 1 2 3
is.numeric(a)
[1] TRUE
is.vector(a)
[1] TRUE
b <- as.character(a); b
[1] "1" "2" "3"
is.vector(b)
[1] TRUE
is.character(b)
[1] TRUE
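Conversion does not always succeed. As a small sketch with illustrative values, a string that cannot be read as a number becomes NA (R also issues a warning, which is suppressed here), and is.na() locates it.

```r
chr <- c("1", "2", "three")

# "three" cannot be parsed as a number, so it becomes NA
num <- suppressWarnings(as.numeric(chr))
num

# Test which elements are missing after the conversion
missing_flag <- is.na(num)
missing_flag

# Logical values convert to 1 (TRUE) and 0 (FALSE)
as.numeric(c(TRUE, FALSE))
```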
27

Your turn

28

Remove missing values in the following vector

a
[1] 0.61940020 -0.93808729 0.95518590 -0.22663938 0.29591186 NA
[7] 0.36788089 0.71791098 0.71202022 0.22765782 NA NA
[13] -0.74024324 0.02081516 -0.14979979 -0.22351308 0.98729725 NA
[19] NA NA NA NA NA NA
[25] NA NA NA -1.50016003 0.18682734 0.20808590
[31] 0.70102264 -0.10633074 -1.18460046 0.06475501 0.11568817 -0.04333140
[37] -0.22020064 0.02764713 0.10165760 -0.18234246 1.32914659 -1.29704248
[43] 1.05317749 -0.70109051 0.09798707 0.10457263 -0.21449845
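One possible approach, sketched on a short hypothetical vector rather than the vector above, is to index with the negation of is.na(). The name v is an assumption for illustration.

```r
# Hypothetical vector with missing values
v <- c(0.62, NA, -0.94, NA, 0.96)

# Keep only the elements that are not missing
v_clean <- v[!is.na(v)]
v_clean

# na.omit() gives the same values (plus an attribute recording what was dropped)
v_omit <- as.vector(na.omit(v))
v_omit
```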
29

Probability distribution functions

  • Each probability distribution in R is associated with four functions.

  • Naming convention for the four functions:

    For each distribution there is a root name. For example, the root name for the normal distribution is norm. This root is prefixed by one of the letters d, p, q, r.

    • d prefix for the density function (probability mass function for discrete distributions)

    • p prefix for the cumulative probability

    • q prefix for the quantile

    • r prefix for the random number generator

  • Example: dnorm, pnorm, qnorm, rnorm
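The naming convention can be tried out directly. The sketch below uses the normal and exponential roots with illustrative input values, and shows that the q function undoes the p function.

```r
density_at_0 <- dnorm(0)      # d: height of the density curve at 0
prob_below   <- pnorm(1.96)   # p: P(Z <= 1.96)
quantile_975 <- qnorm(0.975)  # q: value z with P(Z <= z) = 0.975

set.seed(1)                   # arbitrary seed so the draws are repeatable
draws <- rnorm(3)             # r: three random values

# q is the inverse of p, for any root name
round_trip_norm <- qnorm(pnorm(1.96))
round_trip_exp  <- pexp(qexp(0.3, rate = 2), rate = 2)

round_trip_norm
round_trip_exp
```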

30
31

Illustration with Standard normal distribution

The general formula for the probability density function of the normal distribution with mean μ and standard deviation σ is given by

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$$

If we let the mean μ=0 and the standard deviation σ=1, we get the probability density function for the standard normal distribution.

$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$
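The density formula can be checked against dnorm(). In this sketch, normal_pdf is a hypothetical helper written only for the comparison; it is not part of R.

```r
# Hypothetical helper: the normal density written out by hand
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Standard normal case (mu = 0, sigma = 1)
by_hand_std  <- normal_pdf(0)
built_in_std <- dnorm(0)

# General case with illustrative parameter values
by_hand_gen  <- normal_pdf(1.3, mu = 2, sigma = 0.5)
built_in_gen <- dnorm(1.3, mean = 2, sd = 0.5)

all.equal(by_hand_std, built_in_std)
all.equal(by_hand_gen, built_in_gen)
```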

32

Standard Normal Distribution

$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

dnorm(0)
[1] 0.3989423
Standard normal probability density function: dnorm(0)

33

Standard Normal Distribution

$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

pnorm(0)
[1] 0.5
Standard normal cumulative distribution function: pnorm(0)

34

Standard Normal Distribution

$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

qnorm(0.5)
[1] 0
Standard normal quantile function: qnorm(0.5)

35

Normal distribution: norm

pnorm(3)
[1] 0.9986501
pnorm(3, sd=1, mean=0)
[1] 0.9986501
pnorm(3, sd=2, mean=1)
[1] 0.8413447
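The last result can be understood through standardization: for X with mean 1 and standard deviation 2, P(X ≤ 3) equals P(Z ≤ (3 - 1)/2) = P(Z ≤ 1). A small sketch, with variable names chosen for illustration:

```r
# Probability computed directly with mean and sd supplied
direct <- pnorm(3, mean = 1, sd = 2)

# Same probability after converting 3 to a z-score
z <- (3 - 1) / 2
standardized <- pnorm(z)

all.equal(direct, standardized)
```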
36

Binomial distribution

dbinom(2, size=10, prob=0.2)
[1] 0.3019899
a <- dbinom(0:10, size=10, prob=0.2)
a
[1] 0.1073741824 0.2684354560 0.3019898880 0.2013265920 0.0880803840
[6] 0.0264241152 0.0055050240 0.0007864320 0.0000737280 0.0000040960
[11] 0.0000001024
cumsum(a)
[1] 0.1073742 0.3758096 0.6777995 0.8791261 0.9672065 0.9936306 0.9991356
[8] 0.9999221 0.9999958 0.9999999 1.0000000
pbinom(0:10, size=10, prob=0.2)
[1] 0.1073742 0.3758096 0.6777995 0.8791261 0.9672065 0.9936306 0.9991356
[8] 0.9999221 0.9999958 0.9999999 1.0000000
qbinom(0.4, size=10, prob=0.2)
[1] 2
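For a discrete distribution, the quantile function returns the smallest value whose cumulative probability reaches the requested level. The sketch below reproduces qbinom(0.4, size=10, prob=0.2) by searching the cumulative probabilities; the variable names are illustrative.

```r
values <- 0:10
cdf <- pbinom(values, size = 10, prob = 0.2)

# The cumulative probabilities are the running sum of the point probabilities
cdf_from_pmf <- cumsum(dbinom(values, size = 10, prob = 0.2))

# Smallest value whose cumulative probability is at least 0.4
smallest_x <- values[which(cdf >= 0.4)[1]]

smallest_x
qbinom(0.4, size = 10, prob = 0.2)
```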
38

Standard Normal Distribution: rnorm

set.seed(262020)
random_numbers <- rnorm(5)
random_numbers
[1] 0.2007818 0.9587335 1.1836906 1.4951375 1.1810922
sort(random_numbers) # sorted values are easier to match with the graph
[1] 0.2007818 0.9587335 1.1810922 1.1836906 1.4951375

39

Other distributions in R

40
  • beta: beta distribution

  • binom: binomial distribution

  • cauchy: Cauchy distribution

  • chisq: chi-squared distribution

  • exp: exponential distribution

  • f: F distribution

  • gamma: gamma distribution

  • geom: geometric distribution

  • hyper: hyper-geometric distribution

  • lnorm: log-normal distribution

  • multinom: multinomial distribution

  • nbinom: negative binomial distribution

  • norm: normal distribution

  • pois: Poisson distribution

  • t: Student's t distribution

  • unif: uniform distribution

  • weibull: Weibull distribution

41

🙋 Getting help with R: ?Distributions

42

Your turn

43

Q1 Suppose Z ∼ N(0, 1). Calculate the following standard normal probabilities.

  • P(Z ≤ 1.25),

  • P(Z > 1.25),

  • P(Z ≤ −1.25),

  • P(−0.38 ≤ Z ≤ 1.25).

Q2 Find the following percentiles for the standard normal distribution.

  • 90th,

  • 95th,

  • 97.5th.

44

Q3 Determine Z_α for the following

  • α=0.1

  • α=0.95

Q4 Suppose X ∼ N(15, 9). Calculate the following probabilities

  • P(X ≤ 15),

  • P(X < 15),

  • P(X ≥ 10).

45

Q5 A particular mobile phone number is used to receive both voice messages and text messages. Suppose 20% of the messages involve text messages, and consider a sample of 15 messages. What is the probability that

  • At most 8 of the messages involve a text message?

  • Exactly 8 of the messages involve a text message?

46

Q6 Generate 20 random values from a Poisson distribution with mean 10 and calculate the mean. Compare your answer with others.

47

Reproducibility of scientific results

rnorm(10) # first attempt
[1] 1.6582609 -1.8912734 -2.8471112 -2.1617741 0.6401224 -0.4295948
[7] -0.3122580 -1.0267992 1.4231150 0.8661058
rnorm(10) # second attempt
[1] -0.91879540 -0.06053766 -0.20263170 -0.26301690 0.97964620 -0.46034817
[7] 0.81826880 -0.60935778 1.71086661 0.49294451

As shown above, each call to rnorm() gives different results, because the random number generator continues from a new state every time.

48

Reproducibility of scientific results (cont.)

set.seed(1)
rnorm(10) # First attempt with set.seed
[1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078 -0.8204684
[7] 0.4874291 0.7383247 0.5757814 -0.3053884
set.seed(1)
rnorm(10) # Second attempt with set.seed
[1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078 -0.8204684
[7] 0.4874291 0.7383247 0.5757814 -0.3053884
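set.seed() works the same way for the other random functions. A minimal sketch with sample(); the seed value 2020 and the variable names are arbitrary choices for illustration.

```r
set.seed(2020)
first_draw <- sample(1:100, 5)

set.seed(2020)                   # reset the generator to the same state
second_draw <- sample(1:100, 5)

third_draw <- sample(1:100, 5)   # no reset: the generator simply continues

identical(first_draw, second_draw)
```

Setting the seed once at the top of a script is usually enough to make the whole analysis repeatable.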
49

R Apply family and its variants

  • apply() function
marks <- data.frame(maths=c(10, 20, 30), chemistry=c(100, NA, 60)); marks
  maths chemistry
1    10       100
2    20        NA
3    30        60
apply(marks, 1, mean) # 1: apply mean to each row
[1] 55 NA 45
apply(marks, 2, mean) # 2: apply mean to each column
    maths chemistry
       20        NA
apply(marks, 1, mean, na.rm=TRUE) # extra arguments are passed on to mean
[1] 55 20 45
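For means specifically, base R also provides rowMeans() and colMeans(), which give the same values as the apply() calls. A small sketch using the same marks data; the result names are illustrative.

```r
marks <- data.frame(maths = c(10, 20, 30), chemistry = c(100, NA, 60))

# Row-wise and column-wise means with apply()
row_means_apply <- apply(marks, 1, mean, na.rm = TRUE)
col_means_apply <- apply(marks, 2, mean, na.rm = TRUE)

# The dedicated functions for the same task
row_means_fast <- rowMeans(marks, na.rm = TRUE)
col_means_fast <- colMeans(marks, na.rm = TRUE)

# Compare the values only (the two approaches may label the results differently)
all.equal(as.numeric(row_means_apply), as.numeric(row_means_fast))
all.equal(as.numeric(col_means_apply), as.numeric(col_means_fast))
```

apply() remains the more general tool, since it accepts any function, for example sd or median.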
51

Your turn

52

Calculate the row-wise and column-wise standard deviations of the following matrix.

     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
53

Your turn

Find out about the following variants of the apply family of functions in R: lapply(), sapply(), vapply(), mapply(), rapply(), and tapply().

Resources: You can follow the DataCamp tutorial here.

  • You should clearly explain:

    • the syntax of each function, with your own example for each

    • the function inputs

    • how each function works (the task it performs)

    • the output of the function

    • the differences between the functions (apply vs lapply, apply vs sapply, etc.)

55

Data Visualization: qplot()

?qplot

57

Installing R Packages

Method 1

58

Installing R Packages

Method 2

install.packages("ggplot2")
59

Load package

library(ggplot2)

Now search ?qplot

Note: You shouldn't have to re-install packages each time you open R. However, you do need to load the packages you want to use in that session via library.
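The install-once, load-every-session pattern can be sketched as a small helper. use_package is a hypothetical name introduced here for illustration; it relies only on requireNamespace(), install.packages() and library(), and is demonstrated with the stats package, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is missing, then load it
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # runs only when the package is not installed
  }
  library(pkg, character.only = TRUE)   # runs in every session
}

# stats is part of every R installation, so this call only loads it
use_package("stats")

"package:stats" %in% search()
```

Calling use_package("ggplot2") would follow the same steps, installing ggplot2 first if it is not yet available.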

60

install.packages vs library

Image credit: Professor Di Cook

61

mozzie dataset

library(mozzie)
data(mozzie)
62

Data Visualization with R

boxplot(mpg ~ cyl, data = mtcars,
xlab = "Quantity of Cylinders",
ylab = "Miles Per Gallon",
main = "Boxplot Example",
notch = TRUE,
varwidth = TRUE,
col = c("green","yellow","red"),
names = c("High","Medium","Low")
)

counts <- table(mtcars$gear)
barplot(counts, main="Car Distribution",
xlab="Number of Gears")

63

Default R installation: graphics package

[1] "abline" "arrows" "assocplot" "axis"
[5] "Axis" "axis.Date" "axis.POSIXct" "axTicks"
[9] "barplot" "barplot.default" "box" "boxplot"
[13] "boxplot.default" "boxplot.matrix" "bxp" "cdplot"
[17] "clip" "close.screen" "co.intervals" "contour"
[21] "contour.default" "coplot" "curve" "dotchart"
[25] "erase.screen" "filled.contour" "fourfoldplot" "frame"
[29] "grconvertX" "grconvertY" "grid" "hist"
[33] "hist.default" "identify" "image" "image.default"
[37] "layout" "layout.show" "lcm" "legend"
[41] "lines" "lines.default" "locator" "matlines"
[45] "matplot" "matpoints" "mosaicplot" "mtext"
[49] "pairs" "pairs.default" "panel.smooth" "par"
[53] "persp" "pie" "plot" "plot.default"
[57] "plot.design" "plot.function" "plot.new" "plot.window"
[61] "plot.xy" "points" "points.default" "polygon"
[65] "polypath" "rasterImage" "rect" "rug"
[69] "screen" "segments" "smoothScatter" "spineplot"
[73] "split.screen" "stars" "stem" "strheight"
[77] "stripchart" "strwidth" "sunflowerplot" "symbols"
[81] "text" "text.default" "title" "xinch"
[85] "xspline" "xyinch" "yinch"
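A listing like the one above can be obtained by asking R for the objects in the attached graphics package. This is a sketch of one way to do it; the exact number of functions may differ between R versions.

```r
library(graphics)  # attached by default, loaded explicitly here for clarity

# Names of all objects exported by the graphics package
graphics_functions <- ls("package:graphics")

length(graphics_functions)
head(graphics_functions)

# Check that some familiar plotting functions are in the list
c("plot", "hist", "boxplot") %in% graphics_functions
```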

mozzie

head(mozzie)
# A tibble: 6 × 28
     ID  Year  Week Colombo Gampaha Kalutara Kandy Matale `Nuwara Eliya` Galle
  <int> <int> <int>   <int>   <int>    <int> <int>  <int>          <int> <int>
1     1  2008    52      15       7        1    11      4              0     0
2     2  2009     1      44      23        5    16     21              2     0
3     3  2009     2      39      19       11    42      9              1     2
4     4  2009     3      57      23       12    28      3              2     1
5     5  2009     4      53      24       19    32     20              2     2
6     6  2009     5      29      17       10    21      6              0     3
# … with 18 more variables: Hambantota <int>, Matara <int>, Jaffna <int>,
#   Kilinochchi <int>, Mannar <int>, Vavuniya <int>, Mulative <int>,
#   Batticalo <int>, Ampara <int>, Trincomalee <int>, Kurunagala <int>,
#   Puttalam <int>, Anuradhapura <int>, Polonnaruwa <int>, Badulla <int>,
#   Monaragala <int>, Ratnapura <int>, Kegalle <int>
68

Data Visualization with qplot

plot vs qplot

plot(mozzie$Colombo, mozzie$Gampaha)

qplot(Colombo, Gampaha, data=mozzie)

69

Data Visualization with qplot

qplot(Colombo, Gampaha, data=mozzie)

qplot(Colombo, Gampaha, data=mozzie,
colour=Year)

70

Data Visualization with qplot

qplot(Colombo, Gampaha, data=mozzie)

qplot(Colombo, Gampaha, data=mozzie,
size=Year)

71

Data Visualization with qplot

qplot(Colombo, Gampaha, data=mozzie)

qplot(Colombo, Gampaha, data=mozzie,
size=Year, alpha=0.5)

72

Data Visualization with qplot

qplot(Colombo, Gampaha, data=mozzie)

qplot(Colombo, Gampaha, data=mozzie,
geom="point")

73

Data Visualization with qplot

qplot(ID, Gampaha, data=mozzie)

qplot(ID, Gampaha, data=mozzie,
geom="line")

74

Data Visualization with qplot

qplot(ID, Gampaha, data=mozzie)

qplot(ID, Gampaha, data=mozzie,
geom="path")

75

Data Visualization with qplot

qplot(Colombo, Gampaha, data=mozzie,
geom="line")

qplot(Colombo, Gampaha, data=mozzie,
geom="path")

76

Data Visualization with qplot

qplot(Colombo, Gampaha, data=mozzie,
geom=c("line", "point"))

qplot(Colombo, Gampaha, data=mozzie,
geom=c("path", "point"))

77

Data Visualization with qplot

boxplot(Colombo~Year, data=mozzie)

qplot(factor(Year), Colombo, data=mozzie,
geom="boxplot")

78

Data Visualization with qplot

qplot(factor(Year), Colombo, data=mozzie,
geom="boxplot")

qplot(factor(Year), Colombo, data=mozzie) # geom="point"-default

79

Data Visualization with qplot

qplot(factor(Year), Colombo, data=mozzie,
geom="point")

qplot(factor(Year), Colombo, data=mozzie,
geom="jitter") # jitter spreads overlapping points horizontally

80

Data Visualization with qplot

qplot(factor(Year), Colombo, data=mozzie,
geom="jitter")

qplot(factor(Year), Colombo, data=mozzie,
geom=c("jitter", "boxplot")) # two geoms drawn in the same plot

qplot(factor(Year), Colombo, data=mozzie,
geom=c("jitter", "boxplot"),
outlier.shape = NA) # hide boxplot outliers, since jitter already draws every point

83

Data Visualization with qplot

qplot(Colombo, data=mozzie)

qplot(Colombo, data=mozzie, geom="density")

84

Your turn

85

Explore iris dataset with suitable graphics.

head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa

Thank you!

Slides available at: hellor.netlify.app

All rights reserved by Thiyanga S. Talagala
