
STA 326 2.0 Programming and Data Analysis with R

The Grammar of Graphics

Dr Thiyanga Talagala

Online distance learning/teaching materials during the COVID-19 outbreak.

1 / 86

Acknowledgement: Justin Matejke and George Fitzmaurice, Autodesk Research, Canada

2 / 86

Grammar of Graphics

3 / 86

Packages

library(tidyverse) # To obtain ggplot2 (also loads dplyr)
library(magrittr)  # For the %>% pipe

4 / 86

Dataset

library(gapminder)
glimpse(gapminder)
Rows: 1,704
Columns: 6
$ country <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani…
$ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia,…
$ year <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997,…
$ lifeExp <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.…
$ pop <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 1…
$ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134,…
5 / 86

Plotting with R

Base R

  • using the plot() function

Using ggplot2: grammar of graphics

  1. ggplot2 package: qplot() function

    • qplot: quick plot

    • syntax very similar to the base R plot() function

    • deprecated since ggplot2 3.4.0

  2. ggplot2 package: ggplot() function

    • fully utilizes the power of the grammar
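For comparison, here is a minimal sketch that draws the same scatter plot both ways. It uses the built-in iris data only because it ships with R. Base R draws immediately, while ggplot2 builds a plot object that is drawn when printed. The inspection helper layer_data() is used here as an assumption about how you might check the object; it is not part of the slides.

```r
library(ggplot2)

# Base R: a single call draws the scatter plot straight away
plot(iris$Sepal.Length, iris$Sepal.Width,
     xlab = "Sepal.Length", ylab = "Sepal.Width")

# ggplot2: data, aesthetics and a geom are combined into a plot object
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Nothing is drawn until the object is printed
print(p)
```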
6 / 86

Grammar

English

  • Noun

  • Article

  • Adjective

  • Verb

  • Adverb

  • Preposition

Graphics

7 / 86

Grammar

English

The little monkey hangs confidently by a branch.

  • Article: The

  • Adjective: little

  • Noun: monkey

  • Verb: hangs

  • Adverb: confidently

  • Preposition: by

  • Article + noun: a branch

Graphics

ggplot(iris) +
  aes(x = Sepal.Length,
      y = Sepal.Width) +
  geom_point()

8 / 86

Elements of a ggplot2 object

  • Data

  • Aesthetics: x, y, col

  • Geometrics: geom_point, geom_boxplot

9 / 86

Elements of a ggplot2 object

  • Data: data

  • Aesthetics: aes

  • Geometrics: geom_*
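The three elements can be added one at a time with `+`. A minimal sketch on iris follows. It uses ggplot2's layer_data() to look at what the point layer will actually draw; that inspection step is an illustration, not part of the slides.

```r
library(ggplot2)

# The three elements added one at a time with `+`
p <- ggplot(iris) +                                         # Data
  aes(x = Sepal.Length, y = Sepal.Width, col = Species) +   # Aesthetics
  geom_point()                                              # Geometrics

# layer_data() returns what the point layer will draw: one row per
# observation, with the mapped x, y and colour values filled in
pts <- layer_data(p)
head(pts[, c("x", "y", "colour")])
```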

10 / 86
11 / 86

Making your first plot with ggplot

12 / 86

Data: data to be plotted

'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
13 / 86

Data

ggplot(iris)

14 / 86

Aesthetics: mapping variables

  • x

  • y

  • colour

  • shape

15 / 86

Data + Aesthetics

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width))

16 / 86

Geometrics

  • geom_point

  • geom_boxplot

17 / 86

Data + Aesthetics + Geometrics

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width)) +
  geom_point()

18 / 86

Data + Aesthetics + Geometrics

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width)) +
  geom_point()

19 / 86

Data + Aesthetics + Geometrics

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width)) +
  geom_point(col = "forestgreen")

20 / 86

Data + Aesthetics + Geometrics

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width)) +
  geom_point(col = "forestgreen",
             shape = 8)

21 / 86

Data + Aesthetics + Geometrics

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width,
           col = Species)) +
  geom_point()
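The last few slides contrast setting a colour (a constant outside aes()) with mapping a colour (a variable inside aes()). A minimal sketch of the difference, plus a common pitfall, follows. The object names p_set, p_map and p_pitfall are hypothetical, and layer_data() is used only to inspect the colours ggplot2 assigns.

```r
library(ggplot2)

# Setting: a constant outside aes() is applied to every point
p_set <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(col = "forestgreen")

# Mapping: a variable inside aes() gets one colour per level of Species
p_map <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point()

# Pitfall: a constant inside aes() is treated as a one-level variable,
# so the points get the default palette colour (and a legend), not green
p_pitfall <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                              col = "forestgreen")) +
  geom_point()

unique(layer_data(p_set)$colour)
unique(layer_data(p_map)$colour)
unique(layer_data(p_pitfall)$colour)
```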

22 / 86

Data + Aesthetics + Geometrics

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width,
           col = Species)) +
  geom_point(shape = 3)

23 / 86

Data + Aesthetics + Geometrics

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width,
           col = Species)) +
  geom_point()

24 / 86

Facets: small multiples

25 / 86

Data + Aesthetics + Geometrics + Facets

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width,
           col = Species)) +
  geom_point() +
  facet_grid(~ Species)

26 / 86

Data + Aesthetics + Geometrics + Facets

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width,
           col = Species)) +
  geom_point() +
  facet_grid(Species ~ .)
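The formula in facet_grid() decides the layout: `~ Species` puts the panels side by side, and `Species ~ .` stacks them. A minimal sketch follows that also shows the equivalent vars() interface. It uses layer_data() to confirm that each panel receives only its own species' rows. The object names are hypothetical.

```r
library(ggplot2)

base <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point()

# Formula interface, as on the slides
p_cols <- base + facet_grid(~ Species)      # one row, a column per species
p_rows <- base + facet_grid(Species ~ .)    # one column, a row per species

# The same layouts written with vars()
p_cols2 <- base + facet_grid(cols = vars(Species))
p_rows2 <- base + facet_grid(rows = vars(Species))

# Every panel receives only its own species' 50 rows
table(layer_data(p_cols)$PANEL)
```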

27 / 86

Statistics

28 / 86

Data + Aesthetics + Geometrics + Facets + Statistics

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width,
           col = Species)) +
  geom_point() +
  facet_wrap(~ Species) +
  stat_smooth(method = "lm", se = FALSE, col = "red")

29 / 86

Data + Aesthetics + Geometrics + Facets + Statistics

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width,
           col = Species)) +
  geom_point() +
  facet_wrap(~ Species) +
  stat_smooth(method = "lm", se = TRUE, col = "red")
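stat_smooth(method = "lm") fits a separate straight line in each panel. With se = TRUE it also computes a confidence band. A minimal sketch checks this against a direct lm() fit for setosa. Two things here are assumptions rather than slide content: the explicit formula = y ~ x, which only silences the formula message, and the use of layer_data() to read the fitted values.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point() +
  facet_wrap(~ Species) +
  stat_smooth(method = "lm", formula = y ~ x, se = TRUE, col = "red")

# Layer 2 holds the statistic's output: fitted y values on a grid of x,
# plus ymin/ymax for the confidence band drawn when se = TRUE
fit_layer <- layer_data(p, 2)
setosa_fit <- fit_layer[fit_layer$PANEL == "1", c("x", "y", "ymin", "ymax")]
head(setosa_fit)

# The same straight line fitted directly with lm() for setosa (panel 1)
fit <- lm(Sepal.Width ~ Sepal.Length, data = subset(iris, Species == "setosa"))
coef(fit)
```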

30 / 86

Coordinate

31 / 86

Data + Aesthetics + Geometrics + Facets + Statistics + Coordinate

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width,
           col = Species)) +
  geom_point() +
  facet_wrap(~ Species) +
  stat_smooth(method = "lm", se = TRUE, col = "red") +
  coord_equal()

32 / 86

Theme

33 / 86

Data + Aesthetics + Geometrics + Facets + Statistics + Coordinate + Theme

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width,
           col = Species)) +
  geom_point() +
  facet_wrap(~ Species) +
  stat_smooth(method = "lm", se = TRUE, col = "red") +
  coord_equal() +
  theme(legend.position = "bottom")

34 / 86

Scale

35 / 86

Data + Aesthetics + Geometrics + Facets + Statistics + Coordinate + Theme + Scale

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width,
           col = Species)) +
  geom_point() +
  facet_wrap(~ Species) +
  stat_smooth(method = "lm", se = TRUE, col = "red") +
  coord_equal() +
  theme(legend.position = "bottom") +
  scale_color_manual(values = c("#1b9e77", "#d95f02", "#7570b3"))
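An unnamed values vector in scale_color_manual() is matched to the factor levels in order. Naming it ties each colour to a level explicitly. A minimal sketch follows, with a hypothetical vector name pal and layer_data() used to check the result.

```r
library(ggplot2)

# Naming the vector ties each colour to a level, independent of level order
pal <- c(setosa = "#1b9e77", versicolor = "#d95f02", virginica = "#7570b3")

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point() +
  scale_color_manual(values = pal)

pts <- layer_data(p)
# group 1 corresponds to the first level of Species (setosa)
unique(pts$colour[pts$group == 1])
```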

36 / 86

Titles and axis labels

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width,
           col = Species)) +
  geom_point() +
  facet_wrap(~ Species) +
  stat_smooth(method = "lm", se = TRUE, col = "red") +
  coord_equal() +
  theme(legend.position = "bottom") +
  scale_color_manual(values = c("#1b9e77", "#d95f02", "#7570b3")) +
  labs(title = "Scatter plot of Sepal Length vs Sepal Width",
       x = "Sepal Length (cm)", y = "Sepal Width (cm)")

37 / 86

Your turn

Dataset: gapminder

Visualize the relationship between life expectancy, GDP per capita and continent in 2007.

38 / 86
gapminder2007 <- gapminder %>%
  filter(year == 2007)
ggplot(gapminder2007,
       aes(x = lifeExp, y = gdpPercap, col = continent)) +
  geom_point() +
  theme(legend.position = "bottom") +
  labs(title = "Relationship between life expectancy and GDP per capita by continent - 2007",
       x = "Life expectancy at birth (years)",
       y = "GDP per capita (US$, inflation-adjusted)")

39 / 86

Add a vertical line

gapminder2007 <- gapminder %>%
  filter(year == 2007)
ggplot(gapminder2007,
       aes(x = lifeExp, y = gdpPercap, col = continent)) +
  geom_point() +
  geom_vline(xintercept = 70)

40 / 86

Add a horizontal line

gapminder2007 <- gapminder %>%
  filter(year == 2007)
ggplot(gapminder2007,
       aes(x = lifeExp, y = gdpPercap, col = continent)) +
  geom_point() +
  geom_hline(yintercept = 20000)

41 / 86

Add a diagonal line

gapminder2007 <- gapminder %>%
  filter(year == 2007)
ggplot(gapminder2007,
       aes(x = lifeExp, y = gdpPercap, col = continent)) +
  geom_point() +
  geom_abline(intercept = 20, slope = 200)  # line y = 20 + 200 * x
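The reference-line geoms also accept intercepts mapped from a data frame, which gives one line per row. A minimal sketch follows that draws a dashed line at each species' mean sepal length in iris. The summary table means is a hypothetical illustration computed with base R's aggregate().

```r
library(ggplot2)

# One row per species with its mean sepal length (base R aggregate)
means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point() +
  # xintercept is mapped from the summary data: one dashed line per species
  geom_vline(data = means,
             aes(xintercept = Sepal.Length, col = Species),
             linetype = "dashed")

layer_data(p, 2)[, c("xintercept", "colour")]
```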

42 / 86

All Geoms

[1] "geom_abline" "geom_area" "geom_bar"
[4] "geom_bin2d" "geom_blank" "geom_boxplot"
[7] "geom_col" "geom_contour" "geom_contour_filled"
[10] "geom_count" "geom_crossbar" "geom_curve"
[13] "geom_density" "geom_density_2d" "geom_density2d"
[16] "geom_dotplot" "geom_errorbar" "geom_errorbarh"
[19] "geom_freqpoly" "geom_hex" "geom_histogram"
[22] "geom_hline" "geom_jitter" "geom_label"
[25] "geom_line" "geom_linerange" "geom_map"
[28] "geom_path" "geom_point" "geom_pointrange"
[31] "geom_polygon" "geom_qq" "geom_qq_line"
[34] "geom_quantile" "geom_raster" "geom_rect"
[37] "geom_ribbon" "geom_rug" "geom_segment"
[40] "geom_sf" "geom_sf_label" "geom_sf_text"
[43] "geom_smooth" "geom_spoke" "geom_step"
[46] "geom_text" "geom_tile" "geom_violin"
[49] "geom_vline"
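A list like the one above can be produced by listing the exported functions of the attached package. This is a minimal sketch; the exact names and count depend on the installed ggplot2 version.

```r
library(ggplot2)

# Names of all exported geom_* functions in the installed ggplot2;
# the result depends on the installed version
geoms <- ls("package:ggplot2", pattern = "^geom_")
length(geoms)
geoms
```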
43 / 86

geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_boxplot()

44 / 86

geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent, color=continent)) +
geom_boxplot()

45 / 86

geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent, fill=continent)) +
geom_boxplot()

46 / 86

geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_boxplot(fill="forestgreen")

47 / 86

geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_boxplot(fill="forestgreen", alpha=0.5)
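Each box is drawn from statistics that stat_boxplot() computes: the lower hinge, median and upper hinge, plus the whisker ends. A minimal sketch on iris compares those values with quantile() computed directly. The vertical orientation (x = Species) is chosen only so that the columns are named lower, middle and upper.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()

# stat_boxplot() returns one row per box with its hinges and median
box <- layer_data(p)
box[, c("lower", "middle", "upper")]

# The same quartiles computed directly with quantile()
q <- t(sapply(split(iris$Sepal.Length, iris$Species),
              quantile, probs = c(0.25, 0.5, 0.75)))
q
```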

48 / 86

geom_point

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_point()

49 / 86

geom_jitter

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_jitter()
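By default, geom_jitter() moves points in both directions by 40% of the data resolution, and the noise is random. A minimal sketch on iris follows. It keeps the measured x values exactly as they are (width = 0), spreads points only vertically within each category, and fixes the seed so the plot is reproducible. The seed value 2020 is arbitrary.

```r
library(ggplot2)

set.seed(2020)  # jitter is random; fixing the seed makes the plot reproducible

# width = 0 leaves the continuous x values untouched;
# height = 0.2 spreads points vertically within each category
p <- ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_jitter(width = 0, height = 0.2)

pts <- layer_data(p)
summary(pts$y - round(pts$y))  # vertical offsets stay within +/- 0.2
```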

50 / 86

geom_jitter + geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_jitter() +
geom_boxplot()

51 / 86

geom_jitter + geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_jitter() +
geom_boxplot(alpha=0.5)

52 / 86

geom_jitter + geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent)) +
geom_boxplot() +
geom_jitter()

53 / 86

geom_jitter + geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent, fill=continent)) +
geom_boxplot() +
geom_jitter()

54 / 86

geom_jitter + geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent, fill=continent)) +
geom_boxplot() +
geom_jitter(aes(col=continent))

55 / 86

geom_jitter + geom_boxplot (outlier.shape = NA)

ggplot(gapminder2007, aes(x = lifeExp, y = continent, fill = continent)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(aes(col = continent))

56 / 86

geom_jitter + geom_boxplot

ggplot(gapminder2007, aes(x=lifeExp, y=continent, fill=continent, col=continent))+
geom_boxplot(outlier.shape = NA) +
geom_jitter(aes(col=continent))

57 / 86

geom_jitter + geom_boxplot

ggplot(gapminder2007,
       aes(x = lifeExp, y = continent, fill = continent, col = continent)) +
  geom_boxplot(outlier.shape = NA, alpha = 0.2) +
  geom_jitter(aes(col = continent))

58 / 86

geom_jitter + geom_boxplot + coord_flip

ggplot(gapminder2007,
       aes(x = lifeExp, y = continent, fill = continent, col = continent)) +
  geom_boxplot(outlier.shape = NA, alpha = 0.2) +
  geom_jitter(aes(col = continent)) +
  coord_flip()

59 / 86

geom_boxplot

ggplot(gapminder2007, aes(y=lifeExp))+
geom_boxplot()

60 / 86

geom_boxplot + facet_wrap

ggplot(gapminder2007,
       aes(y = lifeExp)) +
  geom_boxplot() +
  facet_wrap(~ continent, ncol = 5)

61 / 86

geom_density

ggplot(gapminder2007,
       aes(x = lifeExp)) +
  geom_density() +
  facet_wrap(~ continent, ncol = 5)

62 / 86

Your turn

Modify the code below to obtain the following plot.

ggplot(gapminder2007,
       aes(x = lifeExp)) +
  geom_density()

63 / 86

geom_histogram

ggplot(gapminder2007,
       aes(x = lifeExp)) +
  geom_histogram()
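Without bins or binwidth, geom_histogram() falls back to 30 bins and prints a message suggesting a better choice. A minimal sketch on iris sets the bin width explicitly. The values 0.5 cm and boundary = 4 are illustrative choices, not recommendations.

```r
library(ggplot2)

# Bins 0.5 cm wide, with bin edges placed on 4.0, 4.5, 5.0, ...
p <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.5, boundary = 4)

bins <- layer_data(p)
bins[, c("xmin", "xmax", "count")]
```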

64 / 86

geom_bar

ggplot(gapminder2007,
       aes(x = continent)) +
  geom_bar()
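geom_bar() counts the rows in each category itself (stat = "count"). geom_col() draws heights that are already in the data (stat = "identity"). A minimal sketch on iris shows that both give the same bars. The table counts is a hypothetical pre-computed summary.

```r
library(ggplot2)

# geom_bar(): bar height = number of rows per Species, counted by ggplot2
p_bar <- ggplot(iris, aes(x = Species)) +
  geom_bar()

# geom_col(): bar height taken from a column of pre-computed counts
counts <- as.data.frame(table(Species = iris$Species))  # columns Species, Freq
p_col <- ggplot(counts, aes(x = Species, y = Freq)) +
  geom_col()

layer_data(p_bar)[, c("x", "count")]
```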

65 / 86

Your turn

Modify the code below to obtain the following plot.

ggplot(gapminder2007,
       aes(x = continent)) +
  geom_bar()

66 / 86

geom_bar (stat="identity")

Method 1

cut.percent <- data.frame(cut = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                          percent = c(3, 9, 22.4, 25.6, 40))
cut.percent
cut percent
1 Fair 3.0
2 Good 9.0
3 Very Good 22.4
4 Premium 25.6
5 Ideal 40.0
ggplot(data=cut.percent, aes(x=cut, y=percent)) +
geom_bar(stat="identity")

67 / 86

geom_col

Method 2

cut.percent <- data.frame(cut = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                          percent = c(3, 9, 22.4, 25.6, 40))
cut.percent
cut percent
1 Fair 3.0
2 Good 9.0
3 Very Good 22.4
4 Premium 25.6
5 Ideal 40.0
ggplot(data=cut.percent, aes(x=cut, y=percent)) +
geom_col()

68 / 86

Change the order of levels

Method 2

cut.percent <- data.frame(cut = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                          percent = c(3, 9, 22.4, 25.6, 40))
# Character values are drawn alphabetically; a factor is drawn in level order
cut.percent$cut <- factor(cut.percent$cut,
                          levels = c("Fair", "Good", "Very Good",
                                     "Premium", "Ideal"))
ggplot(data = cut.percent, aes(x = cut, y = percent)) +
  geom_col()
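Instead of typing the levels by hand, base R's reorder() can build a factor whose levels are sorted by another variable. A minimal sketch follows that sorts the bars by decreasing percent. The column name cut_sorted is hypothetical.

```r
library(ggplot2)

cut.percent <- data.frame(cut = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                          percent = c(3, 9, 22.4, 25.6, 40))

# Levels sorted by -percent, i.e. largest share first
cut.percent$cut_sorted <- reorder(cut.percent$cut, -cut.percent$percent)
levels(cut.percent$cut_sorted)

ggplot(cut.percent, aes(x = cut_sorted, y = percent)) +
  geom_col()
```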

69 / 86

geom_point

gapminder %>%
  filter(country == "India") %>%
  ggplot(aes(x = year, y = gdpPercap)) +
  geom_point()

70 / 86

geom_line

gapminder %>%
  filter(country == "India") %>%
  ggplot(aes(x = year, y = gdpPercap)) +
  geom_line()

71 / 86

geom_line + geom_point

gapminder %>%
  filter(country == "India") %>%
  ggplot(aes(x = year, y = gdpPercap)) +
  geom_line() +
  geom_point()
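With more than one series, for example several gapminder countries, geom_line() needs a grouping variable. Otherwise it joins every point into a single line. A minimal sketch uses R's built-in Orange data (tree circumference by age for 5 trees). Mapping colour to Tree also sets the grouping.

```r
library(ggplot2)

# Orange: circumference of 5 orange trees, each measured at 7 ages.
# Mapping colour to Tree also splits the data into one line per tree.
p <- ggplot(Orange, aes(x = age, y = circumference, col = Tree)) +
  geom_line() +
  geom_point()

length(unique(layer_data(p)$group))
```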

72 / 86

Your turn

Modify the code below to obtain the following plot.

gapminder %>% filter(country == "India") %>%
ggplot(aes(x = year, y = gdpPercap)) + geom_line() + geom_point()

73 / 86

Data Wrangling + Data Visualization

avglifeExp <- gapminder %>%
  group_by(continent, year) %>%
  summarise(meanlifeExp = mean(lifeExp))
avglifeExp
# A tibble: 60 x 3
# Groups: continent [5]
continent year meanlifeExp
<fct> <int> <dbl>
1 Africa 1952 39.1
2 Africa 1957 41.3
3 Africa 1962 43.3
4 Africa 1967 45.3
5 Africa 1972 47.5
6 Africa 1977 49.6
7 Africa 1982 51.6
8 Africa 1987 53.3
9 Africa 1992 53.6
10 Africa 1997 53.6
# … with 50 more rows
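The same summarise-then-plot pattern can be sketched on the built-in iris data, so that it does not give away the exercise. The object avg_sepal and the column meanSepal are hypothetical names.

```r
library(dplyr)
library(ggplot2)

# Summarise first (one row per Species), then pipe the summary into ggplot()
avg_sepal <- iris %>%
  group_by(Species) %>%
  summarise(meanSepal = mean(Sepal.Length))

avg_sepal

avg_sepal %>%
  ggplot(aes(x = Species, y = meanSepal)) +
  geom_col()
```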
74 / 86

Your turn

Write R code to reproduce the plot below.

Hint: use avglifeExp

75 / 86

Your turn

Write R code to reproduce the plot below.

76 / 86

Your turn

Write R code to reproduce the plot below.

Hint: see the next slide.

77 / 86
gapminder %>%
  ggplot(aes(y = log(lifeExp), x = log(gdpPercap), color = continent)) +
  geom_point() +
  labs(y = "log(Life Expectancy)",
       x = "log(GDP per capita)")

78 / 86

Your turn

Write R code to reproduce the plot below.

79 / 86

geom_point

ggplot(gapminder, aes(x = year, y = gdpPercap, colour = continent)) +
  geom_point()

80 / 86

geom_smooth

ggplot(gapminder, aes(x = year, y = gdpPercap, colour = continent)) +
  geom_smooth()
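When no method is given, geom_smooth() chooses one from the size of the largest group: loess for fewer than 1,000 observations, otherwise a gam. It then reports its choice in a message. A minimal sketch on iris names the method and formula explicitly. Choosing loess here is just an example. layer_data() is used to show the 80 evaluation points per group and the band limits.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

smooth <- layer_data(p, 2)
# 80 evaluation points per group by default, with ymin/ymax for the se band
table(smooth$group)
```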

81 / 86

Your turn

Write R code to reproduce the plot below.

82 / 86

Your turn

Write R code to reproduce the plot below.

83 / 86

Your turn

Write R code to visualize the shape of the standard normal distribution.

Hint: dnorm

84 / 86

Recap

aes

  • x
  • y
  • colour
  • fill
  • size

geom

  • geom_point
  • geom_jitter
  • geom_boxplot
  • geom_line
  • geom_bar
  • geom_col
  • geom_histogram
  • geom_smooth
  • geom_density
  • geom_abline
  • geom_vline
  • geom_hline

geom arguments

  • colour
  • fill
  • size
  • alpha
  • shape

other elements

  • labs
  • coord_equal
  • coord_flip
  • scale_colour_manual
  • facet_wrap
  • theme
85 / 86

Slides available at: hellor.netlify.app

All rights reserved by Thiyanga S. Talagala

86 / 86
