STA 326 2.0 Programming and Data Analysis with R

Lesson 4: Writing Functions in R

Dr Thiyanga Talagala

2020-03-03

Load the mozzie dataset

library(mozzie)
data(mozzie)
head(mozzie, 2)
ID Year Week Colombo Gampaha Kalutara Kandy Matale Nuwara Eliya Galle
1 1 2008 52 15 7 1 11 4 0 0
2 2 2009 1 44 23 5 16 21 2 0
Hambantota Matara Jaffna Kilinochchi Mannar Vavuniya Mulative Batticalo
1 6 22 0 0 8 0 0 1
2 5 18 1 0 0 0 0 0
Ampara Trincomalee Kurunagala Puttalam Anuradhapura Polonnaruwa Badulla
1 0 0 2 1 2 0 1
2 1 1 10 5 0 0 1
Monaragala Ratnapura Kegalle
1 1 2 16
2 0 1 25

Use the Min-Max transformation to rescale all the district variables onto the 0-1 range.

The Min-Max transformation is $\frac{x_i - \min(x)}{\max(x) - \min(x)}$, where $x = (x_1, x_2, \ldots, x_n)$.
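
As a quick worked example of the formula, here is a minimal sketch on a made-up vector (the values and the name toy_rescaled are illustrative assumptions, not taken from mozzie):

```r
# Hypothetical toy vector, chosen so the arithmetic is easy to follow
x <- c(2, 4, 10)

# min(x) = 2 and max(x) = 10, so the denominator is 10 - 2 = 8
toy_rescaled <- (x - min(x)) / (max(x) - min(x))
toy_rescaled
# (2 - 2) / 8 = 0, (4 - 2) / 8 = 0.25, (10 - 2) / 8 = 1
```

The smallest value always maps to 0 and the largest to 1; everything else lands in between.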

Min-Max transformation on mozzie

# Colombo district
minmax.colombo <- (mozzie$Colombo - min(mozzie$Colombo, na.rm = TRUE)) /
(max(mozzie$Colombo, na.rm=TRUE) - min(mozzie$Colombo, na.rm=TRUE))
head(minmax.colombo)
[1] 0.03157895 0.09263158 0.08210526 0.12000000 0.11157895 0.06105263
# Gampaha district
minmax.gampaha <- (mozzie$Gampaha - min(mozzie$Gampaha, na.rm = TRUE)) /
(max(mozzie$Gampaha, na.rm = TRUE) - min(mozzie$Gampaha, na.rm = TRUE))
head(minmax.gampaha)
[1] 0.02734375 0.08984375 0.07421875 0.08984375 0.09375000 0.06640625
# Kalutara district
minmax.kalutara <- (mozzie$Gampaha - min(mozzie$Kalutara, na.rm = TRUE)) /
(max(mozzie$Kalutara, na.rm = TRUE) - min(mozzie$Kalutara, na.rm = TRUE))
head(minmax.kalutara)
[1] 0.09333333 0.30666667 0.25333333 0.30666667 0.32000000 0.22666667

Errors creep in very easily when copying and pasting code: the Kalutara block above still uses mozzie$Gampaha in its numerator.

A mistake copied becomes a mistake repeated.

When should you write a function?

  • Whenever you need to copy and paste a block of code many times.

    • A function is a reusable block of programming code designed to do a specific task.
  • If you don't find a suitable built-in function to serve your purpose, you can write your own function.

  • To share your work with others.

Writing a function

Step 1: Function name

rescale_minmax

Step 2: Assign your function to the name

rescale_minmax <-

Step 3: Tell R that you are writing a function

rescale_minmax <- function() # Arguments/inputs should be defined inside ()

Step 4: Curly braces define the start and the end of your work

rescale_minmax <- function(){
# Task
# output
}

Step 5: Function inputs, task and outputs

Find all the inputs needed to produce the function's output.

# Colombo district
(mozzie$Colombo - min(mozzie$Colombo, na.rm = TRUE)) /
(max(mozzie$Colombo, na.rm=TRUE) - min(mozzie$Colombo, na.rm=TRUE))

Re-write the code with general names

x <- mozzie$Colombo
(x - min(x, na.rm = TRUE)) / (max(x, na.rm=TRUE) - min(x, na.rm=TRUE))

Remove duplication and make your code efficient and readable

rng <- range(x, na.rm = TRUE)
rng
[1] 0 475
rng <- range(x, na.rm = TRUE)
(x - rng[1]) / (rng[2] - rng[1])

Step 6: Complete your function

Type A

rescale_minmax <- function(x){
rng <- range(x, na.rm = TRUE)
(x - rng[1]) / (rng[2] - rng[1])
}

Type B

rescale_minmax <- function(x){
rng <- range(x, na.rm = TRUE)
out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
out.rescaled
}

Type C

rescale_minmax <- function(x){
rng <- range(x, na.rm = TRUE)
out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
return(out.rescaled)
}

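In this situation, Type A is the best: R returns the value of the last evaluated expression, so the intermediate variable and the explicit return() add nothing here.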
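
A small sketch to confirm that the three styles behave the same. The names type_a, type_b, type_c and test_input are hypothetical, introduced only so that all three versions can exist side by side:

```r
# Type A: the last evaluated expression is returned implicitly
type_a <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Type B: store the result, then evaluate it as the last expression
type_b <- function(x) {
  rng <- range(x, na.rm = TRUE)
  out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
  out.rescaled
}

# Type C: hand the result back with an explicit return()
type_c <- function(x) {
  rng <- range(x, na.rm = TRUE)
  out.rescaled <- (x - rng[1]) / (rng[2] - rng[1])
  return(out.rescaled)
}

test_input <- c(1, 200, 250, 80, NA)
same_ab <- identical(type_a(test_input), type_b(test_input))
same_ac <- identical(type_a(test_input), type_c(test_input))
c(same_ab, same_ac)
```

Since the results are identical, the choice is about readability; an explicit return() is mainly useful for leaving a function early.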

Step 7: Check your function with a few different inputs

rescale_minmax <- function(x){
rng <- range(x, na.rm = TRUE)
(x - rng[1]) / (rng[2] - rng[1])
}
rescale_minmax(c(1, 200, 250, 80, NA))
[1] 0.0000000 0.7991968 1.0000000 0.3172691 NA

Back to our original example

minmax.colombo <- rescale_minmax(mozzie$Colombo)
head(minmax.colombo)
[1] 0.03157895 0.09263158 0.08210526 0.12000000 0.11157895 0.06105263
minmax.gampaha <- rescale_minmax(mozzie$Gampaha)
minmax.kalutara <- rescale_minmax(mozzie$Kalutara)
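
Once the function exists, it can be applied to many columns in one step instead of one call per district. The sketch below assumes a small made-up data frame (toy, with invented values) in place of mozzie, and the names district_cols and toy_rescaled are hypothetical:

```r
rescale_minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical stand-in for mozzie: two identifier columns, two district columns
toy <- data.frame(
  ID      = 1:4,
  Year    = c(2008, 2009, 2009, 2009),
  Colombo = c(10, 20, 30, 50),
  Gampaha = c(5, 25, 15, 45)
)

# Columns to rescale; with mozzie these would be the district columns
district_cols <- c("Colombo", "Gampaha")

# lapply() calls rescale_minmax() once per selected column
toy_rescaled <- toy
toy_rescaled[district_cols] <- lapply(toy[district_cols], rescale_minmax)
toy_rescaled
```

The identifier columns are left untouched because they are not listed in district_cols.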

Move forward: When the requirements change

new.data.col <- c(400, 500, 350, 250, 60, 70, Inf)
rescale_minmax(new.data.col)
[1] 0 0 0 0 0 0 NaN

Fix the code

rescale_minmax <- function(x){
rng <- range(x, na.rm = TRUE, finite=TRUE)
(x - rng[1]) / (rng[2] - rng[1])
}
new.data.col <- c(400, 500, 350, 250, 60, 70, Inf)
rescale_minmax(new.data.col)
[1] 0.77272727 1.00000000 0.65909091 0.43181818 0.00000000 0.02272727 Inf

Your turn

Rewrite rescale_minmax so that -Inf is set to 0, and Inf is mapped to 1.

04:00

Your turn

R for Data Science - Exercise 19.2.1, Question 3

04:00

Your turn

R for Data Science - Exercise 19.2.1, Question 4

10:00

Functions are for humans and computers

  • Use descriptive names for variables.

  • Comment your code.
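
As an illustration of both points, here is one possible way to rewrite the rescaling function with descriptive names and comments. The function name rescale_minmax_documented and its variable names are hypothetical choices for this sketch:

```r
# Rescale a numeric vector onto the 0-1 range (Min-Max transformation).
# values: numeric vector; NA entries are ignored when finding the range
#         and stay NA in the result.
rescale_minmax_documented <- function(values) {
  value_range <- range(values, na.rm = TRUE)  # c(minimum, maximum)
  lowest  <- value_range[1]
  highest <- value_range[2]
  (values - lowest) / (highest - lowest)
}

documented_result <- rescale_minmax_documented(c(1, 200, 250, 80, NA))
documented_result
```

The computation is unchanged; only the names and comments differ, which is what a human reader benefits from.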

Your turn

Write your own function to calculate the parameter estimates of a simple linear regression model.

Help: $\hat{\beta} = (X^T X)^{-1} X^T Y$

05:00

Write a function to calculate a confidence interval for the mean: $\bar{x} \pm t_{\alpha/2,\,n-1} \dfrac{s}{\sqrt{n}}$

10:00

Function arguments

cal_mean_ci <- function(x, conf){
len.x <- length(x)
se <- sd(x) / sqrt(len.x)
alpha <- 1-conf
mean(x) + se * qt(c(alpha / 2, 1 - alpha / 2), df = len.x-1)
}
data <- c(165, 170, 175, 180, 185)
cal_mean_ci(data, 0.95)
[1] 165.1838 184.8162

Function with default values

cal_mean_ci <- function(x, conf = 0.95){
len.x <- length(x)
se <- sd(x) / sqrt(len.x)
alpha <- 1-conf
mean(x) + se * qt(c(alpha / 2, 1 - alpha / 2), df = len.x-1)
}
cal_mean_ci(data)
[1] 165.1838 184.8162
cal_mean_ci(data, 0.99)
[1] 158.7221 191.2779
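
Arguments can also be matched by name, and a hand-written interval can be cross-checked against a built-in function. The sketch below assumes stats::t.test() as the reference; the names ci_90 and ci_ref are hypothetical:

```r
cal_mean_ci <- function(x, conf = 0.95) {
  len.x <- length(x)
  se <- sd(x) / sqrt(len.x)
  alpha <- 1 - conf
  mean(x) + se * qt(c(alpha / 2, 1 - alpha / 2), df = len.x - 1)
}
data <- c(165, 170, 175, 180, 185)

# Named arguments make the call self-describing and order-independent
ci_90 <- cal_mean_ci(conf = 0.90, x = data)

# Reference interval from the one-sample t-test shipped with R
ci_ref <- as.numeric(t.test(data, conf.level = 0.90)$conf.int)

all.equal(ci_90, ci_ref)
```

Agreement with t.test() is a useful sanity check, though it only covers the inputs actually tried.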

Control structures

  • Control the flow of execution.

  • Common ones include:

    • if, else

    • for

    • while

    • repeat

    • break

    • next

    • switch

If

if (condition) {
# do something
} else {
# do something else
}

Example

test_even_odd <- function(x){
if (x %% 2 == 0){
print("even number")
} else {
print("odd number")
}
}
test_even_odd(5)
[1] "odd number"
test_even_odd(6)
[1] "even number"

ifelse: vectorization with ifelse

ifelse(condition, if TRUE the output, if FALSE the output)

Example

test_even_odd_v2 <- function(x){
ifelse(x %% 2 == 0, "even number", "odd number")
}
test_even_odd_v2(5)
[1] "odd number"
test_even_odd_v2(c(1,6))
[1] "odd number" "even number"

Difference between if, else and ifelse

test_even_odd <- function(x){
if (x %% 2 == 0) {
print("even number")
} else {
print("odd number")
}
}
test_even_odd(5)
[1] "odd number"
test_even_odd(c(1,6))
Warning in if (x%%2 == 0) {: the condition has length > 1 and only the first
element will be used
[1] "odd number"
# Note: since R 4.2.0, a condition of length > 1 in if () is an error, not a warning.
test_even_odd_v2 <- function(x){
ifelse(x %% 2 == 0, "even number", "odd number")
}
test_even_odd_v2(5)
[1] "odd number"
test_even_odd_v2(c(1,6))
[1] "odd number" "even number"
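
Because if () expects a single TRUE or FALSE, one option is to check the input length up front so that a vector input fails with a clear error. This is a sketch of that idea; test_even_odd_scalar and vector_result are hypothetical names:

```r
# Hypothetical variant: reject anything that is not a single number
test_even_odd_scalar <- function(x) {
  stopifnot(is.numeric(x), length(x) == 1)
  if (x %% 2 == 0) "even number" else "odd number"
}

scalar_result <- test_even_odd_scalar(6)
scalar_result

# A vector input is rejected; tryCatch() captures the error so the script continues
vector_result <- tryCatch(
  test_even_odd_scalar(c(1, 6)),
  error = function(e) "error"
)
vector_result
```

When vector input is expected, the ifelse() version remains the simpler choice.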

Nested if-else

  • Multiple conditions
grade_marks <- function(marks){
if (marks < 20) {
"D"
} else if (marks <= 50) {
"C"
} else if (marks <= 60) {
"B"
} else {
"A"
}
}
grade_marks(75)
[1] "A"
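
With chained conditions, the values right at the cut-offs are worth checking. The sketch below runs grade_marks() on marks either side of each boundary; boundary_marks and grades are hypothetical names:

```r
grade_marks <- function(marks) {
  if (marks < 20) {
    "D"
  } else if (marks <= 50) {
    "C"
  } else if (marks <= 60) {
    "B"
  } else {
    "A"
  }
}

# Marks just below, at, and just above each cut-off
boundary_marks <- c(19, 20, 50, 51, 60, 61)

# vapply() calls grade_marks() once per value, since if () handles one value at a time
grades <- vapply(boundary_marks, grade_marks, character(1))
grades
```

The conditions are tested in order, so a mark of 20 fails `marks < 20` and falls into the "C" branch, while 50 and 60 belong to the lower of their two neighbouring grades.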

Your turn

R for Data Science - Exercise 19.4.4, Question 2

Help:

lubridate::now() and lubridate::hour()

10:00

for loop

  • Execute a block of code a specific number of times, or once for each element of a sequence.
for (i in 1:5) {
print(i*100)
}
[1] 100
[1] 200
[1] 300
[1] 400
[1] 500
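continents <- c("Asia", "EU", "AUS", "NA", "SA", "Africa")
# Loop over the elements themselves: i is the value, not a position
for (i in continents) {
  print(i)
}
# Loop over the first four positions only
for (i in 1:4) {
  print(continents[i])
}
# Loop over every position
for (i in seq_along(continents)) {
  print(continents[i])
}
# Braces are optional when the body is a single statement
for (i in 1:4) print(continents[i])
# Output of the loops that cover all six elements:
## [1] "Asia"
## [1] "EU"
## [1] "AUS"
## [1] "NA"
## [1] "SA"
## [1] "Africa"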
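
When looping over positions, seq_along(x) is generally a safer choice than 1:length(x), because the two differ for an empty vector. A minimal sketch, with empty and n_iterations as hypothetical names:

```r
empty <- character(0)

countdown <- 1:length(empty)   # 1:0 counts down, giving c(1, 0)
positions <- seq_along(empty)  # integer(0): no positions at all

# With seq_along() the loop body never runs for an empty vector
n_iterations <- 0
for (i in positions) {
  n_iterations <- n_iterations + 1
}
n_iterations
```

A loop written with 1:length(empty) would instead run twice, indexing positions 1 and 0, neither of which exists.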

Nested loops

mat <- matrix(1:6, ncol=2)
mat
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
for (i in 1:3) {
for (j in 1:2) {
print(mat[i, j])
}
}
[1] 1
[1] 4
[1] 2
[1] 5
[1] 3
[1] 6

Your turn

Write a function to count the number of even numbers in a vector.

08:00

While

i <- 1 # initial value
while (i < 10) {
print(i)
i <- i + 1 # increment
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9

Your turn

Fibonacci Sequence

Print the first n numbers of the Fibonacci Sequence.

0, 1, 1, 2, 3, 5, 8, ...

Repeat and break

  • Execute a block of code repeatedly.

  • A repeat loop has no built-in condition check to exit the loop.

  • The only way to exit a repeat loop is to call break.

Example 1

x <- 5
repeat {
print(x)
x <- x + 1
if (x == 10){
break
}
}
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9

Example 2

set.seed(1)
repeat {
x<-runif(1, 5, 10)
print(x)
if(x < 6.1){
break
}
}
[1] 6.327543
[1] 6.860619
[1] 7.864267
[1] 9.541039
[1] 6.00841

Next

for(i in 1:10) {
if(i <= 5) {
next # Skip the first 5 iterations
}
print(i)
}
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

switch

When you want a function to do different things in different circumstances, then the switch function can be useful.

feelings <- c("sad", "afraid")
for (i in feelings){
print(
switch(i,
happy = "I am glad you are happy",
afraid = "There is nothing to fear",
sad = "Cheer up",
angry = "Calm down now"
))
}
[1] "Cheer up"
[1] "There is nothing to fear"
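
When switch() is given a character value that matches none of the named alternatives, an unnamed final argument serves as the default. The sketch below wraps the example in a function; respond_to and the fallback message are hypothetical additions:

```r
respond_to <- function(feeling) {
  switch(feeling,
    happy  = "I am glad you are happy",
    afraid = "There is nothing to fear",
    sad    = "Cheer up",
    angry  = "Calm down now",
    "I am not sure how you feel"  # unnamed last argument: the default
  )
}

respond_to("sad")
respond_to("bored")
```

Without the default, an unmatched value would make switch() return NULL invisibly, which is easy to overlook.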

Slides available at: hellor.netlify.com

All rights reserved by Thiyanga S. Talagala
