Tidyverse for Beginners

rladiessydney.org/littlemisstidyverse

Danielle Navarro (@djnavarro)

github.com/rladiessydney/littlemisstidyverse

Why be "Little Miss Tidyverse"?

Tidyverse is efficient...

https://gist.github.com/emitanaka/ef1d1b6ade5e57acc4734bc4bfbcc0cd

https://github.com/djnavarro/skyliner/blame/master/skyliner.R

Okay, let's do this!

www.bigblocksingsong.com

Sticky notes

R & RStudio

Starting the tutorial

Get tidyverse

Load & inspect data

Simple visualisation

The pipe

Group & summarise

Arrange, filter, select

Mutate data

More visualisation

Gather & spread

R Markdown

Other topics?

 

r-project.org

www.rstudio.com

Install R and RStudio


Tutorial folder

  • Ensure LittleMissTidyverse is the working directory
  • Laziest way? Open RStudio by clicking on the LittleMissTidyverse.Rproj file :-)


R scripts & commands

script1.R


# Script1.R has a quick orientation for 
# complete newcomers:
#
#   - Comment symbol is #, R skips comments
#   - Variables are created using <-

my_age <- 40

#   - When you "source" the script the 
#     command above executes invisibly
#   - To get results to "print" to the 
#     console, do this: 

print(my_age / 10)

Get tidyverse

install.packages("tidyverse")

script2.R

library(tidyverse)


# Script2.R installs & loads tidyverse.
# Just to check... it draws a picture!

# If you see this, it is working!

Exercise: Draw this plot!

script3.R


my_data %>%
   do_this_thing() %>%
   now_do_that_thing() %>%
   and_one_more()

Key concept: %>%

This is referred to as "piping" the data through a series of steps
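To make the idea concrete, here is a minimal sketch comparing a nested call with the equivalent pipeline. The vector `scores` is a made-up example, not part of the workshop data.

```r
library(dplyr)  # makes the %>% pipe available

# Hypothetical toy data, not from the workshop materials
scores <- c(4, 9, 16, 25)

# Nested version: has to be read from the inside out
nested <- round(mean(sqrt(scores)), 1)

# Piped version: reads top to bottom, one step per line
piped <- scores %>%
  sqrt() %>%
  mean() %>%
  round(1)

print(nested)
print(piped)
```

Both versions compute the same thing; the pipe simply passes the result of each step in as the first argument of the next.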


the_result <- my_data %>%
   do_this_thing() %>%
   now_do_that_thing() %>%
   and_one_more()

Key concept: %>%


the_summary <- my_data %>%
   group_by(GROUP) %>%
   summarise( 
    mean_var = mean(OUTCOME)
   )

print(the_summary)
 

Group & summarise
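As a rough illustration of the template above, the sketch below fills in GROUP and OUTCOME with invented columns. The data frame, `group` and `rt` are assumptions made up for this example.

```r
library(dplyr)

# Hypothetical data: reaction times (ms) for two made-up groups
my_data <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  rt    = c(300, 500, 200, 400, 900),
  stringsAsFactors = FALSE
)

# One row per group, with the mean of rt within that group
the_summary <- my_data %>%
  group_by(group) %>%
  summarise(mean_rt = mean(rt))

print(the_summary)
```

The result has one row per group rather than one row per observation.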


script4.R

script5.R


Additional columns?


the_summary <- my_data %>%
   group_by(GROUP) %>%
   summarise( 
    median = median(VARIABLE),
    mean = mean(VARIABLE),
    std_dev = sd(VARIABLE)
   )

print(the_summary)

Group & summarise

(just FYI)

the_summary <- my_data %>%
   group_by(GROUP) %>%
   summarise( 
    median = median(VARIABLE, na.rm = TRUE),
    mean = mean(VARIABLE, na.rm = TRUE),
    std_dev = sd(VARIABLE, na.rm = TRUE)
   )

print(the_summary)

Group & summarise

Handling missing data?
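A small sketch of why `na.rm = TRUE` matters, using a made-up vector `x` rather than the workshop data:

```r
# Hypothetical vector with one missing value (NA)
x <- c(2, 4, NA, 6)

with_na    <- mean(x)                # NA: one missing value makes the mean missing
without_na <- mean(x, na.rm = TRUE)  # drops the NA, then averages 2, 4 and 6
n_missing  <- sum(is.na(x))          # how many values are missing

print(with_na)
print(without_na)
print(n_missing)
```

The same argument works for `median()` and `sd()`, which is what the summarise call above relies on.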


(just FYI)

# counts for each value of one variable
my_data %>% 
  group_by(variable) %>%
  summarise(count = n())

# counts for each combination of two variables
my_data %>% 
  group_by(variable1, variable2) %>%
  summarise(count = n())

Frequency tables?

Group & summarise
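The frequency-table pattern can be tried on a toy example. The `pets` data frame below is invented for illustration. The sketch also shows `count()`, a dplyr shortcut that should give the same counts in one step.

```r
library(dplyr)

# Hypothetical data, made up for this example
pets <- data.frame(
  species = c("cat", "dog", "cat", "cat", "dog"),
  colour  = c("black", "black", "white", "black", "brown"),
  stringsAsFactors = FALSE
)

# The pattern from the slide: group, then count rows with n()
by_hand <- pets %>%
  group_by(species) %>%
  summarise(count = n())

# Shortcut: count() groups and tallies in a single call
shortcut <- pets %>%
  count(species, name = "count")

print(by_hand)
print(shortcut)
```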


(just FYI)

Explore your data!

# (1) load the tidyverse
library(tidyverse)

# (2) read the data
data <- read_csv(PATH_TO_FILE)

# (3) summarise
result <- data %>%
   group_by(GROUP) %>%
   summarise(mean_var = mean(OUTCOME))

# (4) visualise
picture <- data %>%
  ggplot(aes(x=XVAR, y=YVAR)) + 
  geom_point() + 
  geom_smooth()

Arrange

# for ascending order
my_data %>% arrange(VARIABLE)


# for descending order:
my_data %>% arrange(-VARIABLE)
my_data %>% arrange(desc(VARIABLE))

# to sort by one variable then another
my_data %>% arrange(VARIABLE1, VARIABLE2)

# keep only the rows where a logical expression is TRUE
my_data %>% 
    filter(a_logical_expression)

# data from the women
my_data %>% 
    filter(gender == "female")


Filter

# data from women over 50
my_data %>% 
    filter(gender == "female" & age >= 50)

# keep all rows, but retain only the
# columns for VARIABLE1 and VARIABLE2

my_data %>% 
    select(VARIABLE1, VARIABLE2)


Select

Arrange, filter, select

# reminder: arrange, filter, select DON'T 
# change the original data set (my_data).
# instead they return a new one, which you
# can assign to a variable

new_data <- old_data %>%
    filter( SOME ROWS ) %>% 
    select( SOME VARIABLES ) %>% 
    arrange( BY VARIABLE )

# here's how you alter the original:

my_data <- my_data %>%
    filter( SOME ROWS ) %>% 
    select( SOME VARIABLES ) %>% 
    arrange( BY VARIABLE )
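The placeholders above can be filled in as follows. This is only a sketch: the `people` data frame and its columns are invented for illustration.

```r
library(dplyr)

# Hypothetical survey data, made up for this example
people <- data.frame(
  name   = c("Ana", "Bea", "Cal", "Dee"),
  gender = c("female", "female", "male", "female"),
  age    = c(62, 45, 58, 51),
  stringsAsFactors = FALSE
)

older_women <- people %>%
  filter(gender == "female" & age >= 50) %>%  # keep some rows
  select(name, age) %>%                       # keep some columns
  arrange(desc(age))                          # sort, oldest first

print(older_women)
print(nrow(people))  # the original data frame still has all of its rows
```

Because the result is assigned to a new name, `people` itself is left unchanged.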

script6.R

script7.R

# typical use case: create new variables 
# within the ORIGINAL data set:

my_data <- my_data %>%
    mutate(variable = expression)

# e.g.: create a log-transformed variable
my_data <- my_data %>% 
    mutate(log_RT = log(rt))
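A runnable version of the same idea, assuming a made-up data frame with a reaction time column `rt` in milliseconds:

```r
library(dplyr)

# Hypothetical reaction times in milliseconds
my_data <- data.frame(rt = c(250, 400, 1000))

# mutate() can add several new columns in one call
my_data <- my_data %>%
  mutate(
    log_rt = log(rt),    # natural logarithm of rt
    rt_sec = rt / 1000   # milliseconds converted to seconds
  )

print(my_data)
```

The existing `rt` column is kept, and the new columns are added alongside it.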


Mutate

script8.R

Make it pretty!


Grammar of Graphics

pic <- my_data %>%
    ggplot(aes(AESTHETICS)) + 
    geom_something() 

plot(pic)


  • Aesthetics: things that can be mapped to variables (x-position, y-position, colour, shape, fill, etc)
  • Geometries: tools that can visually render the data (points, lines, histogram, violin, boxplots, contours, etc)
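One way to see aesthetics and geometries working together is the sketch below. It uses `mpg`, an example data set bundled with ggplot2, rather than the workshop data.

```r
library(ggplot2)

# Aesthetics: map engine size to x, highway mileage to y, car class to colour
# Geometry: render each row of the data as a point
pic <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point()

print(pic)
```

Swapping `geom_point()` for another geometry changes how the data are drawn without touching the aesthetic mappings.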

data %>%  
  ggplot(aes(  # set up the plot & aesthetics
     x = named_month, 
     y = total_users)) + 
  geom_col() +                    # add bars
  xlab("Month (2018)") +          # x-axis label
  ylab("Total Number of Users") + # y-axis label
  theme_bw()   # black & white theme


script9.R

Points and lines

tidy_users %>% 
    ggplot(aes(x = DayNum, y = Users)) +
    geom_point() +
    geom_line()

# see script 9 
# for extra 
# customisation 
# options...


tidy_users %>%
    ggplot(aes(x = Users)) + 
    geom_histogram()

Histogram


# see script 9 
# for extra 
# customisation 
# options...

Box plot

tidy_users %>%
  ggplot(aes(
    x = MonthName,
    y = Users,
    colour = MonthName
  )) +
  geom_boxplot()


script9.R

Violin plot

tidy_users %>%
  ggplot(aes(
    x = MonthName,
    y = Users,
    colour = MonthName
  )) +
  geom_violin()


script9.R

Error bar plots


users_summary <- tidy_users %>%
  group_by(MonthName) %>%
  summarise(
    mean_user = mean(Users),
    sem = sd(Users) / sqrt(n())
  )

pic <- users_summary %>%
  ggplot(aes(
    x = MonthName,
    y = mean_user,
    ymin = mean_user - sem,
    ymax = mean_user + sem 
  )) +
  geom_col() + 
  geom_errorbar(width = .5)

Facets

tidy_users %>%
  ggplot(aes(x = Users)) +
  geom_histogram(bins = 10) + 
  facet_wrap(~named_month)

script9.R

Overall workflow

# read...
afl <- read_csv("./data/afl.csv")

# summarise...
attendance <- afl %>%
  group_by(year, game_type) %>%
  summarise(attendance = mean(attendance))

# visualise...
attendance %>%
  ggplot(aes(x = year, y = attendance)) +
  geom_point() + 
  facet_wrap(~ game_type) + 
  geom_smooth()

script10.R


compcogscisydney.org/psyr/visualising-data.html

www.nailsmag.com/page/70218/market-research

tidy_nails.csv

These are two different ways of "counting" prevalence

script11.R

Gather

long_nails <- tidy_nails %>%
  gather(key = "kind", value = "count", 
         technicians, salons)
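To show what gather() does to the shape of a table, here is a sketch on invented numbers. The `wide` data frame only mimics the column names used above and is not the real tidy_nails.csv.

```r
library(dplyr)
library(tidyr)

# Hypothetical numbers, not the real nail salon data
wide <- data.frame(
  year        = c(2016, 2017),
  technicians = c(10, 12),
  salons      = c(3, 4)
)

# Stack the two count columns into a "kind" column and a "count" column
long <- wide %>%
  gather(key = "kind", value = "count", technicians, salons)

print(long)
```

Each original row becomes two rows, one per kind. More recent tidyr releases also provide `pivot_longer()` for the same job.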

script12.R

What about going the other way??

Spread

flat_nails <- pointy_nails %>%
  spread(key = "kind_year", value = "count")
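Going the other way can be sketched on a small invented table. The `long` data frame and its values are assumptions for illustration, not the workshop's `pointy_nails` data.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per year and kind
long <- data.frame(
  year  = c(2016, 2017, 2016, 2017),
  kind  = c("technicians", "technicians", "salons", "salons"),
  count = c(10, 12, 3, 4),
  stringsAsFactors = FALSE
)

# Turn each value of "kind" into its own column, filled from "count"
wide <- long %>%
  spread(key = "kind", value = "count")

print(wide)
```

The result has one row per year and one column per kind. More recent tidyr releases also provide `pivot_wider()` for the same job.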

script13.R

Nail art...

R Markdown

compcogscisydney.org/psyr

rladiessydney.org/littlemisstidyverse

 

compcogscisydney.org

d.navarro@unsw.edu.au

twitter.com/djnavarro

github.com/djnavarro

Slides for the "Little Miss Tidyverse" workshop hosted by R-Ladies Sydney (August 22, 2018). Blog post here: http://rladiessydney.org/littlemisstidyverse. GitHub repository here: https://github.com/rladiessydney/littlemisstidyverse
