STA 326 2.0 Programming and Data Analysis with R

Lesson 3: Functions in R

Dr Thiyanga Talagala

2020-02-25

1 / 55

Functions in R

👉🏻 Perform a specific task according to a set of instructions.

👉🏻 Some functions we have discussed so far,

c, matrix, array, list, data.frame, str, dim, length, nrow, plot

👉🏻 In R, functions are objects of class function.

class(length)
[1] "function"

👉🏻 There are basically two types of functions:

💻 Built-in functions

Already created or defined in the programming framework to make our work easier.

👨 User-defined functions

Sometimes we need to create our own functions for a specific purpose.
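
Because functions are ordinary objects, they can be stored under another name and passed to other functions. A minimal sketch, assuming nothing beyond base R; the names my_len and apply_twice are hypothetical and used only for illustration:

```r
# A function can be stored under another name like any other object.
my_len <- length
my_len(c(10, 20, 30))        # same result as length(c(10, 20, 30))

# A function can also be passed as an argument to another function.
# apply_twice is a hypothetical helper defined only for this example.
apply_twice <- function(f, x) {
  f(f(x))
}
apply_twice(sqrt, 16)        # sqrt(sqrt(16))

# Built-in and user-defined functions are both objects of class "function".
class(sqrt)
class(apply_twice)
```

Both class() calls report "function", whether the function is built-in or user-defined.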
2 / 55

Basic components of a function

Syntax

name <- function(arg1, arg2, ...){
  <FUNCTION BODY>
  return(value)
}

Example

cal_sqrt <- function(x){
  a <- x^2 # square of x
  b <- x^3 # cube of x
  out <- c(a, b)
  names(out) <- c("squared", "cubed")
  out # or return(out)
}

Evaluation

cal_sqrt(2)
squared   cubed
      4       8

👉 Functions are created using function().

3 / 55

Basic components of a function

Function name: cal_sqrt

  • Use verbs, where possible.

  • Choose a meaningful name.

  • Use an underscore (_) to separate words.

  • Avoid names of built-in functions.

  • Start with a lower-case letter. Note that R is a case-sensitive language.

4 / 55

Basic components of a function

Function arguments: x

  • Values passed to the function, which it uses to compute its result.
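
An argument can be given a default value in the function definition, so the caller may leave it out. A minimal sketch; cal_power is a hypothetical function introduced only for this illustration:

```r
# cal_power is a hypothetical function used only for illustration.
# x has no default, so it must be supplied; power defaults to 2.
cal_power <- function(x, power = 2) {
  x^power
}

cal_power(3)             # uses the default power = 2
cal_power(3, power = 3)  # overrides the default
cal_power(c(1, 2, 3))    # arguments can be vectors
```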
5 / 55

Basic components of a function

Function body

6 / 55

Function body (Cont.)

  • Place spaces around all operators such as =, +, -, <-, etc.

  • Exception: Do not place spaces around the operators :, :: and :::

1+2 # bad
1 + 2 # good
  • Place a space before a left parenthesis, except in a function call.
if (a > 2) # good
if(a>2) # bad
# Function call ----
rnorm(2) # good
rnorm (2) # bad
  • Use extra spacing to align multiple lines with <- or =
# Bad ------
a = sum(c(1, 5, 8, 10)) / 2
sd = sd(c(1, 5, 8, 10))
# Good ------
a  = sum(c(1, 5, 8, 10)) / 2
sd = sd(c(1, 5, 8, 10))
7 / 55

Function body (Cont.)

  • Spacing inside parentheses or square brackets
# Good ---
a[1, 2]
a[1, ]
if (x < 2)
# Bad ---
a[1,2]
a[1,]
if(x<2)
if( x<2 )
  • Do not write { and } on a single line: end the line after {, and put } on its own line.
# Good ---
if (y == 2) {
  print("even")
}
# Bad ---
if(y == 2){ print("even")}
8 / 55

Built-in Functions

How to call a built-in function in R

function_name(arg1 = 1, arg2 = 3)

Argument matching

The following calls to mean are all equivalent

mydata <- c(rnorm(20), 100000)
mean(mydata) # matched by position
mean(x = mydata) # matched by name
mean(mydata, na.rm = FALSE)
mean(x = mydata, na.rm = FALSE)
mean(na.rm = FALSE, x = mydata)
mean(na.rm = FALSE, mydata)
[1] 4761.94

⚠️ Although all of these calls work, avoid reordering arguments unnecessarily: keeping the documented order makes the code easier to read.

9 / 55

Argument matching (cont.)

  • some arguments have default values
mean(mydata, trim=0)
[1] 4761.94
mean(mydata) # Default value for trim is 0
[1] 4761.94
mean(mydata, trim=0.1)
[1] 0.1449313
mean(mydata, tr=0.1) # Partial Matching
[1] 0.1449313
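
The same matching rules apply to user-defined functions: arguments can be matched by position, by exact name, or by a unique partial name. A minimal sketch; describe is a hypothetical function written only to show the matching:

```r
# describe is a hypothetical function used only to show how arguments are matched.
describe <- function(value, digits = 2, label = "result") {
  paste0(label, ": ", round(value, digits))
}

describe(3.14159)                        # position only, defaults used
describe(3.14159, 3)                     # position: digits = 3
describe(digits = 3, value = 3.14159)    # exact names, in any order
describe(3.14159, lab = "pi", dig = 4)   # partial names (works, but harder to read)
```

Partial matching saves typing at the console, but full argument names are usually clearer in scripts.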
10 / 55

?mean

11 / 55

Your turn

12 / 55
  1. Calculate the mean of 1, 2, 3, 8, 10, 20, 56, NA.
13 / 55

Basic maths functions

Function            Description
abs(x)              absolute value of x
log(x, base = y)    logarithm of x with base y; if base is not specified, returns the natural logarithm
exp(x)              exponential of x
sqrt(x)             square root of x
factorial(x)        factorial of x
14 / 55

Basic statistic functions

Function       Description
mean(x)        mean of x
median(x)      median of x
mode(x)        storage mode (type) of x, e.g. "numeric"; base R has no built-in function for the statistical mode
var(x)         variance of x
sd(x)          standard deviation of x
scale(x)       z-scores of x
quantile(x)    quantiles of x
summary(x)     summary of x: mean, minimum, maximum, etc.
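
Note that mode() in base R reports how an object is stored, not its most frequent value. One possible way to compute the statistical mode is sketched below; stat_mode is a hypothetical helper, not part of base R:

```r
# stat_mode is a hypothetical helper, not part of base R.
# It returns the most frequent value(s) of x, ignoring missing values.
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  counts <- table(x)
  most_frequent <- names(counts)[counts == max(counts)]
  # table() stores the values as character names, so match them back to x
  unique(x[as.character(x) %in% most_frequent])
}

scores <- c(2, 5, 5, 3, 2, 5, NA)
stat_mode(scores)   # most frequent value
mode(scores)        # storage mode of the vector
```

If several values are tied for the highest frequency, this sketch returns all of them.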
15 / 55

Type conversion functions

Test               Convert
is.numeric()       as.numeric()
is.character()     as.character()
is.vector()        as.vector()
is.matrix()        as.matrix()
is.data.frame()    as.data.frame()
is.factor()        as.factor()
is.logical()       as.logical()
is.na()

Example

a <- c(1, 2, 3); a
[1] 1 2 3
is.numeric(a)
[1] TRUE
is.vector(a)
[1] TRUE
b <- as.character(a); b
[1] "1" "2" "3"
is.vector(b)
[1] TRUE
is.character(b)
[1] TRUE
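
Conversions do not always succeed silently. The sketch below, using made-up values, shows two common cases: strings that are not numbers, and factors.

```r
# Character strings that do not look like numbers become NA (with a warning).
x <- c("1", "2.5", "three")
nums <- suppressWarnings(as.numeric(x))
nums
is.na(nums)

# Converting a factor: go through as.character() to recover the labels,
# because as.numeric() on a factor returns the internal integer codes.
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # integer codes: 1 2 1
as.numeric(as.character(f))   # original values: 10 20 10
```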
16 / 55

Your turn

17 / 55

Remove missing values in the following vector

[1] 0.61940020 -0.93808729 0.95518590 -0.22663938 0.29591186 NA
[7] 0.36788089 0.71791098 0.71202022 0.22765782 NA NA
[13] -0.74024324 0.02081516 -0.14979979 -0.22351308 0.98729725 NA
[19] NA NA NA NA NA NA
[25] NA NA NA -1.50016003 0.18682734 0.20808590
[31] 0.70102264 -0.10633074 -1.18460046 0.06475501 0.11568817 -0.04333140
[37] -0.22020064 0.02764713 0.10165760 -0.18234246 1.32914659 -1.29704248
[43] 1.05317749 -0.70109051 0.09798707 0.10457263 -0.21449845
18 / 55

Probability distribution functions

  • Each probability distribution in R is associated with four functions.

  • Naming convention for the four functions:

    For each distribution there is a root name. For example, the root name for the normal distribution is norm. This root is prefixed by one of the letters d, p, q, r.

    • d prefix for the probability density (or probability mass) function

    • p prefix for the cumulative distribution function

    • q prefix for the quantile function

    • r prefix for the random number generator

  • Example: dnorm, pnorm, qnorm, rnorm
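
A short sketch of the four prefixes in use, first with the root norm and then with the root binom. The input values and the seed are arbitrary choices made for this illustration:

```r
dnorm(0)        # density of N(0, 1) at 0
pnorm(1)        # P(Z <= 1)
qnorm(0.25)     # value z with P(Z <= z) = 0.25
set.seed(1)     # arbitrary seed, so the draws can be reproduced
rnorm(3)        # three random draws from N(0, 1)

# The p and q functions are inverses of each other.
qnorm(pnorm(1))

# The same prefixes work for other roots, e.g. the binomial (root binom).
dbinom(2, size = 5, prob = 0.5)   # P(X = 2) for X ~ Binomial(5, 0.5)
pbinom(2, size = 5, prob = 0.5)   # P(X <= 2)
```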

19 / 55

Illustration with Standard normal distribution

The general formula for the probability density function of the normal distribution with mean μ and standard deviation σ is given by

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$$

If we let the mean μ = 0 and the standard deviation σ = 1, we get the probability density function for the standard normal distribution.

$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$
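
The normal density can be coded directly from its formula and compared with the built-in dnorm(). A minimal sketch; normal_pdf is a hypothetical helper written only for this check, and the test points are arbitrary:

```r
# normal_pdf is a hypothetical helper that codes the density formula directly.
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- c(-1, 0, 2.5)
normal_pdf(x)                          # standard normal: mu = 0, sigma = 1
dnorm(x)                               # built-in equivalent
all.equal(normal_pdf(x), dnorm(x))

# dnorm() takes the standard deviation (sd), not the variance.
all.equal(normal_pdf(x, mu = 1, sigma = 2), dnorm(x, mean = 1, sd = 2))
```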

20 / 55

Standard Normal Distribution

$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

Standard normal probability density function: dnorm(0)

dnorm(0)
[1] 0.3989423
21 / 55

Standard Normal Distribution

$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

pnorm(0)
[1] 0.5

Standard normal cumulative distribution function: pnorm(0)

22 / 55

Standard Normal Distribution

$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

qnorm(0.5)
[1] 0

Standard normal quantile function: qnorm(0.5)

24 / 55

Standard Normal Distribution: rnorm

set.seed(262020)
random_numbers <- rnorm(10)
random_numbers
[1] 0.20078181 0.95873346 1.18369056 1.49513750 1.18109222 -0.57789570
[7] 0.01790671 0.81185245 0.39488199 -0.44337927
sort(random_numbers) # sort the numbers so they are easier to match with the graph
[1] -0.57789570 -0.44337927 0.01790671 0.20078181 0.39488199 0.81185245
[7] 0.95873346 1.18109222 1.18369056 1.49513750

25 / 55

Other distributions in R

  • beta: beta distribution

  • binom: binomial distribution

  • cauchy: Cauchy distribution

  • chisq: chi-squared distribution

  • exp: exponential distribution

  • f: F distribution

  • gamma: gamma distribution

  • geom: geometric distribution

  • hyper: hypergeometric distribution

  • lnorm: log-normal distribution

  • multinom: multinomial distribution (only dmultinom and rmultinom are provided)

  • nbinom: negative binomial distribution

  • norm: normal distribution

  • pois: Poisson distribution

  • t: Student's t distribution

  • unif: uniform distribution

  • weibull: Weibull distribution

🙋 Getting help with R: ?Distributions

26 / 55

Your turn

27 / 55
  1. Suppose $Z \sim N(0, 1)$. Calculate the following standard normal probabilities.

    • $P(Z \le 1.25)$,

    • $P(Z > 1.25)$,

    • $P(Z \le -1.25)$,

    • $P(-0.38 \le Z \le 1.25)$.

  2. Find the following percentiles for the standard normal distribution.

    • 90th,

    • 95th,

    • 97.5th.

  3. Determine $z_\alpha$ for the following:

    • $\alpha = 0.1$

    • $\alpha = 0.95$

28 / 55
  1. Suppose $X \sim N(15, 9)$. Calculate the following probabilities:

    • P(X15),

    • P(X<15),

    • P(X10).

  2. A particular mobile phone number is used to receive both voice messages and text messages. Suppose 20% of the messages involve text messages, and consider a sample of 15 messages. What is the probability that

    • At most 8 of the messages involve a text message?

    • Exactly 8 of the messages involve a text message?

  3. Generate 20 random values from a Poisson distribution with mean 10 and calculate the mean. Compare your answer with your friend's answer.

29 / 55

Reproducibility of scientific results

rnorm(10) # first attempt
[1] 1.4701904 -0.2375662 0.1765985 -0.5257483 -1.3674764 -1.4422500
[7] 0.7576607 0.6475122 -1.1543034 0.9066248
rnorm(10) # second attempt
[1] -1.7603264 -0.3402939 -1.0335807 1.0645014 -0.3874459 0.5975271
[7] -2.1535707 0.6602928 1.1581404 0.6133446

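As shown above, each call to rnorm() gives different results. Calling set.seed() with the same seed before generating the numbers makes the results reproducible: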

set.seed(1)
rnorm(10) # First attempt with set.seed
[1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078 -0.8204684
[7] 0.4874291 0.7383247 0.5757814 -0.3053884
set.seed(1)
rnorm(10) # Second attempt with set.seed
[1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078 -0.8204684
[7] 0.4874291 0.7383247 0.5757814 -0.3053884
30 / 55

R Apply family and its variants

  • apply() function
marks <- data.frame(maths=c(10, 20, 30), chemistry=c(100, NA, 60))
marks
  maths chemistry
1    10       100
2    20        NA
3    30        60
apply(marks, 1, mean) # row means
[1] 55 NA 45
apply(marks, 2, mean) # column means
    maths chemistry
       20        NA
apply(marks, 1, mean, na.rm=TRUE) # row means, ignoring missing values
[1] 55 20 45
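
The second argument of apply() selects the margin: 1 applies the function to each row, 2 to each column. Any further arguments are passed on to the function being applied. A minimal sketch on a small made-up matrix:

```r
# A small matrix used only for illustration.
m <- matrix(c(1, 4, NA, 2, 5, 8), nrow = 2, byrow = TRUE)
m

apply(m, 1, max)                 # MARGIN = 1: one result per row
apply(m, 2, max)                 # MARGIN = 2: one result per column
apply(m, 1, max, na.rm = TRUE)   # extra arguments are passed on to max()

# An anonymous function can be used when no ready-made function fits.
apply(m, 2, function(col) sum(!is.na(col)))   # non-missing count per column
```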
31 / 55

Your turn

32 / 55

Calculate the row and column wise standard deviation of the following matrix

     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
33 / 55

Your turn

34 / 55

Assignment 1: Individual

Find out about the following variants of the apply family of functions in R: lapply(), sapply(), vapply(), mapply(), rapply(), and tapply().

Resources: You can follow the DataCamp tutorial here.

  • You should clearly explain,

    • syntax for each function

    • function inputs

    • how each function works (the task of the function)

    • output of the function.

    • differences between the functions (apply vs lapply, apply vs sapply, etc.)

  • Provide your own example for each function.

Use only 1 A4 sheet, you may use both sides.

Assignment due date: 3 March 2020

35 / 55

Data Visualization: qplot()

?qplot

36 / 55

Installing R Packages

Method 1

Method 2

install.packages("ggplot2")
37 / 55

Load package

library(ggplot2)

Now search ?qplot

Note: You shouldn't have to re-install packages each time you open R. However, you do need to load the packages you want to use in that session via library.

38 / 55

mozzie dataset

library(mozzie)
data(mozzie)
39 / 55

Data Visualization with qplot

plot vs qplot

plot(mozzie$Colombo, mozzie$Gampaha)

qplot(Colombo, Gampaha, data=mozzie)

40 / 55

Data Visualization with qplot

qplot(Colombo, Gampaha, data=mozzie)

qplot(Colombo, Gampaha, data=mozzie, colour=Year)

41 / 55

Data Visualization with qplot

qplot(Colombo, Gampaha, data=mozzie)

qplot(Colombo, Gampaha, data=mozzie, size=Year)

42 / 55

Data Visualization with qplot

qplot(Colombo, Gampaha, data=mozzie)

qplot(Colombo, Gampaha, data=mozzie, geom="point")

43 / 55

Data Visualization with qplot

qplot(ID, Gampaha, data=mozzie)

qplot(ID, Gampaha, data=mozzie, geom="line")

44 / 55

Data Visualization with qplot

qplot(ID, Gampaha, data=mozzie)

qplot(ID, Gampaha, data=mozzie, geom="path")

45 / 55

Data Visualization with qplot

qplot(Colombo, Gampaha, data=mozzie, geom="line")

qplot(Colombo, Gampaha, data=mozzie, geom="path")

46 / 55

Data Visualization with qplot

qplot(Colombo, Gampaha, data=mozzie, geom=c("line", "point"))

qplot(Colombo, Gampaha, data=mozzie, geom=c("path", "point"))

47 / 55

Data Visualization with qplot

boxplot(Colombo~Year, data=mozzie)

qplot(factor(Year), Colombo, data=mozzie, geom="boxplot")

48 / 55

Data Visualization with qplot

qplot(factor(Year), Colombo, data=mozzie, geom="boxplot")

qplot(factor(Year), Colombo, data=mozzie) # geom="point" is the default

49 / 55

Data Visualization with qplot

qplot(factor(Year), Colombo, data=mozzie, geom="point")

qplot(factor(Year), Colombo, data=mozzie, geom=c("jitter", "point"))

50 / 55

Data Visualization with qplot

qplot(factor(Year), Colombo, data=mozzie, geom=c("jitter", "point"))

qplot(factor(Year), Colombo, data=mozzie, geom=c("jitter", "point", "boxplot"))

51 / 55

Data Visualization with qplot

qplot(Colombo, data=mozzie)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

qplot(Colombo, data=mozzie, geom="density")

52 / 55

Your turn

53 / 55

Explore iris dataset with suitable graphics.

head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

54 / 55

Slides available at: hellor.netlify.com

All rights reserved by Thiyanga S. Talagala

55 / 55
