STA 326 2.0 Programming and Data Analysis with R

R Data Import and Export

Dr Thiyanga Talagala

Online distance learning/teaching materials during the COVID-19 outbreak.


Data import with readr

R package

readr: part of the core tidyverse.

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5 ✓ purrr 0.3.4
✓ tibble 3.1.2 ✓ dplyr 1.0.7
✓ tidyr 1.1.3 ✓ stringr 1.4.0
✓ readr 1.4.0 ✓ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()

readr data import functions

  • read_csv: reads comma-delimited files.

  • read_csv2: reads semicolon-separated files (common where the comma is used as the decimal mark).

  • read_tsv: reads tab-delimited files.
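
As a minimal sketch of how the three functions differ, the example below writes the same small table to temporary files with three different delimiters and reads each one back. The table contents and object names are made up for illustration.

```r
library(readr)

# Hypothetical toy data, written to temporary files so the example is self-contained
csv_file  <- tempfile(fileext = ".csv")
csv2_file <- tempfile(fileext = ".csv")
tsv_file  <- tempfile(fileext = ".tsv")

writeLines(c("name,score", "Amal,7.5", "Nimal,8.25"), csv_file)
# Semicolon-separated files typically use a comma as the decimal mark
writeLines(c("name;score", "Amal;7,5", "Nimal;8,25"), csv2_file)
writeLines(c("name\tscore", "Amal\t7.5", "Nimal\t8.25"), tsv_file)

from_csv  <- read_csv(csv_file)    # comma-delimited
from_csv2 <- read_csv2(csv2_file)  # semicolon-delimited, comma as decimal mark
from_tsv  <- read_tsv(tsv_file)    # tab-delimited
```

All three calls should return the same two-row tibble; only the delimiter (and, for read_csv2, the decimal mark) in the file differs.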


🛠 Import data from a .csv file

Syntax

datasetname <- read_csv("include_file_path")

When you run read_csv, it prints the name and type of each column (the column specification).
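
A small sketch of the column specification in action, assuming a made-up file created on the spot. It first lets readr guess the column types, then supplies them explicitly through the col_types argument.

```r
library(readr)

# Hypothetical file, created here so the example runs on its own
path <- tempfile(fileext = ".csv")
writeLines(c("id,city,height", "1,Colombo,1.62", "2,Kandy,1.75"), path)

# Let readr guess the column types; the guesses are printed as a message
people <- read_csv(path)
spec(people)  # show the column specification again

# Or state the types explicitly: c = character, d = double
people2 <- read_csv(path, col_types = "ccd")
```

Supplying col_types is useful when a guess is not what you want, for example when an ID column should stay as text.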

Switch to R


If the file is saved inside the project folder

Demo: Go to Google Classroom and watch the video importdatacsv1.mov

If the file is saved outside the project folder

Demo: Go to Google Classroom and watch the video importdatacsv2.mov
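
The two cases can be sketched as follows, assuming you work inside an RStudio project so that the working directory is the project folder. The file name mydata.csv is hypothetical, and a temporary folder stands in for a location outside the project.

```r
library(readr)

# Relative paths are resolved against the working directory,
# which is the project folder when you work inside an RStudio project
getwd()

# Inside the project folder: a relative path is enough (hypothetical file names)
# mydata <- read_csv("mydata.csv")
# mydata <- read_csv("data/mydata.csv")

# Outside the project folder: give the full path.
# Here a temporary folder stands in for the outside location.
outside_dir  <- tempdir()
outside_file <- file.path(outside_dir, "mydata.csv")
writeLines(c("x,y", "1,2", "3,4"), outside_file)

mydata <- read_csv(outside_file)
```

Building the path with file.path() avoids typing the separator by hand, which differs between operating systems.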


🛠 Importing a .csv file from a website

Syntax

datasetname <- read_csv("include url here")

Example

url <- "https://thiyanga.netlify.app/project/datasets/foodlabel.csv"
foodlabel <- read_csv(url)
Warning: Missing column names filled in: 'X43' [43]
Parsed with column specification:
cols(
.default = col_double()
)
See spec(...) for full column specifications.
head(foodlabel, 1)
# A tibble: 1 x 80
Gender Age Education Employment Income Housesize children marital fshopper
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 22 5 4 3 5 2 0 0
# … with 71 more variables: mplanner <dbl>, place <dbl>, FA <dbl>,
# Diabetes <dbl>, Metabolic cyndrents <dbl>, Other <dbl>, specific <dbl>,
# job1 <dbl>, job2 <dbl>, Exercise <dbl>, Health <dbl>, taste <dbl>,
# easy <dbl>, familiarity <dbl>, friends <dbl>, Useful <dbl>, Easiness <dbl>,
# Sufficient <dbl>, Trusfulness <dbl>, Clear <dbl>, attractive pack <dbl>,
# hc/nutriclaims <dbl>, graphical <dbl>, Free/prize <dbl>, source <dbl>,
# netquan <dbl>, low in fat <dbl>, low in cho <dbl>, sodium <dbl>,
# e labels <dbl>, place2 <dbl>, fa2 <dbl>, Health_1 <dbl>, X43 <dbl>,
# f1 <dbl>, f2 <dbl>, f3 <dbl>, f4 <dbl>, f5 <dbl>, f6 <dbl>, f7 <dbl>,
# f8 <dbl>, f9 <dbl>, f10 <dbl>, f11 <dbl>, f12 <dbl>, f13 <dbl>, f14 <dbl>,
# f15 <dbl>, f16 <dbl>, f17 <dbl>, f18 <dbl>, i1 <dbl>, i2 <dbl>, i3 <dbl>,
# i4 <dbl>, i5 <dbl>, i6 <dbl>, i7 <dbl>, i8 <dbl>, i9 <dbl>, i10 <dbl>,
# i11 <dbl>, i12 <dbl>, i13 <dbl>, i14 <dbl>, i15 <dbl>, i16 <dbl>,
# i17 <dbl>, i18 <dbl>, cluster <dbl>

read.csv and read_csv

  • read.csv is in base R.

  • read_csv is in the readr package, which is part of the tidyverse.

  • read.csv() performs a similar job to read_csv().

  • read_csv() works well with other parts of the tidyverse.

  • read_csv() is typically faster than read.csv(), especially on large files.

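  • read_csv() reads variables containing text as character variables by default. In contrast, in R versions before 4.0.0 the base R function read.csv() converted character variables to factors by default (stringsAsFactors = TRUE); from R 4.0.0 onwards its default is stringsAsFactors = FALSE.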
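
The difference in how text columns are handled can be checked with a short sketch like the one below. The file is a made-up example, and stringsAsFactors = TRUE is set explicitly to reproduce the older base R default.

```r
library(readr)

# Hypothetical file with one text column and one numeric column
path <- tempfile(fileext = ".csv")
writeLines(c("species,count", "setosa,50", "virginica,50"), path)

tidy_df     <- read_csv(path)                          # readr
base_df     <- read.csv(path)                          # base R; default depends on the R version
base_factor <- read.csv(path, stringsAsFactors = TRUE) # base R; force the pre-4.0.0 behaviour

class(tidy_df)              # a tibble
class(tidy_df$species)      # character
class(base_factor$species)  # factor
```

Note that read_csv() returns a tibble, whereas read.csv() returns a plain data frame.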


🛠 Writing to a File

  • We can save a tibble (or data frame) to a csv file using write_csv().

  • write_csv() is in the readr package.

Syntax

write_csv(name_of_the_data_set_you_want_to_save, "path_to_write_to")

Example

data(iris)
# This will save inside your project folder
write_csv(iris, "iris.csv")
# This will save inside the data folder, which is inside your project folder
# (the data folder must already exist)
write_csv(iris, "data/iris.csv")
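
To confirm that a file was written as expected, one option is to read it straight back in. The sketch below does this with a temporary file (used here in place of the project folder) and also shows one thing that changes in the round trip: csv files store no type information, so the factor column Species comes back as character.

```r
library(readr)

data(iris)
out_file <- tempfile(fileext = ".csv")  # temporary path, used instead of the project folder

write_csv(iris, out_file)
iris_back <- read_csv(out_file)

dim(iris_back)            # same number of rows and columns as iris
class(iris$Species)       # factor in the original
class(iris_back$Species)  # character after the round trip
```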

Switch to R

Demo: Go to Google Classroom and watch the video exportdatacsv.mov


🛠 Importing Excel .xlsx files

Syntax

library(readxl)
mydata <- read_xlsx("file_path")
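
A workbook can hold several sheets, so it often helps to list them first and then read one by name. The sketch below uses an example workbook that is installed with the readxl package, so no file of your own is needed; the object names are illustrative.

```r
library(readxl)

# readxl ships with example workbooks; readxl_example() returns the path to one
path <- readxl_example("datasets.xlsx")

excel_sheets(path)  # list the sheet names in the workbook

# Read one sheet by name
mtcars_xl <- read_xlsx(path, sheet = "mtcars")

# Read only the first 5 data rows of that sheet
first_rows <- read_xlsx(path, sheet = "mtcars", n_max = 5)
```

If sheet is not given, read_xlsx() reads the first sheet of the workbook.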

Switch to R

Demo: Go to Google Classroom and watch the video readxlsx.mov


Importing SAS, SPSS and Stata files

SAS

library(haven) # read_sas(), read_sav(), read_dta() and their write_* counterparts are in haven
read_sas("mtcars.sas7bdat")
write_sas(mtcars, "mtcars.sas7bdat")

SPSS

read_sav("mtcars.sav")
write_sav(mtcars, "mtcars.sav")

Stata

read_dta("mtcars.dta")
write_dta(mtcars, "mtcars.dta")
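
As a runnable sketch of the same functions, the example below writes the built-in mtcars data to temporary Stata and SPSS files and reads them back. Temporary paths are used here only so that nothing is left in the project folder.

```r
library(haven)

# Temporary file paths (illustrative; use your own paths in practice)
dta_file <- tempfile(fileext = ".dta")
sav_file <- tempfile(fileext = ".sav")

write_dta(mtcars, dta_file)  # Stata
write_sav(mtcars, sav_file)  # SPSS

from_stata <- read_dta(dta_file)
from_spss  <- read_sav(sav_file)

# Both come back as tibbles; the row names of mtcars (the car names)
# are not part of the result, only the 11 columns
dim(from_stata)
dim(from_spss)
```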

Importing other types of data

  • feather: for sharing with Python and other languages

  • httr: for web APIs

  • jsonlite: for JSON

  • rvest: for web scraping

  • xml2: for XML

Working with feather, httr, jsonlite, rvest and xml2 is beyond the scope of the course.


Slides available at: hellor.netlify.app

All rights reserved by Thiyanga S. Talagala
