TA 4

By CM 2018/4/26

Outline

  • apply
  • pipeline
  • data manipulation
  • dplyr
  • tidyr

Apply family

  • lapply
  • sapply
  • vapply

lapply

  • lapply(X, FUN, ...)

  • Use it when you want to apply a function to each element of a list (or vector) in turn and get a list back.

  • Returns a list of the same length as X; each element is the result of applying FUN to the corresponding element of X.
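A minimal sketch of lapply in base R. The list `scores` and its contents are hypothetical and only serve as an illustration.

```r
# Hypothetical list of numeric vectors (illustration only)
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# Apply mean() to each element; the result is a list with the
# same length (and names) as the input
means <- lapply(scores, mean)
str(means)

# Arguments given after FUN are passed on to FUN through `...`
with_na <- list(x = c(1, NA, 3), y = c(4, 5, NA))
means_no_na <- lapply(with_na, mean, na.rm = TRUE)
```

The result is always a list, even when every element is a single number.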

sapply

  • sapply(X, FUN, ...)

  • Use it when you want to apply a function to each element of a list in turn but want a vector (or matrix) back rather than a list. If the results cannot be simplified, sapply falls back to returning a list.
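A small sketch of how sapply simplifies its result, again using a hypothetical list `scores`. The shape of the output depends on what FUN returns.

```r
# Hypothetical list of numeric vectors (illustration only)
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# FUN returns one value per element: the result is a named vector
lens <- sapply(scores, length)

# FUN returns two values per element: the result is a 2 x 3 matrix
ranges <- sapply(scores, range)

# FUN returns values of different lengths: no simplification is
# possible, so the result stays a list
mixed <- sapply(list(1:2, 1:3), function(v) v * 2)
```

Because the output type changes with the data, sapply is convenient interactively but less predictable inside functions.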

vapply

  • vapply(X, FUN, FUN.VALUE, ...)

  • similar to sapply

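  • FUN.VALUE is a template for the value that FUN returns for each element: it fixes the type and length of that value.

  • FUN.VALUE examples: numeric(1) / logical(1) / character(1) / ...

  • safer (and sometimes faster) than sapply: an error is raised if a result does not match FUN.VALUE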
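A minimal sketch of vapply with different FUN.VALUE templates. The list `scores` is hypothetical, and the tryCatch wrapper is only there to show the error without stopping the script.

```r
# Hypothetical list of numeric vectors (illustration only)
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# Each call to mean() must return exactly one numeric value
means <- vapply(scores, mean, FUN.VALUE = numeric(1))

# Each call must return exactly one logical value
has_big <- vapply(scores, function(v) any(v > 4), FUN.VALUE = logical(1))

# range() returns two values, which does not match numeric(1),
# so vapply raises an error instead of silently changing shape
outcome <- tryCatch(
  vapply(scores, range, FUN.VALUE = numeric(1)),
  error = function(e) "error"
)

# With empty input the result type is still known in advance
empty_v <- vapply(list(), length, FUN.VALUE = integer(1))  # integer(0)
empty_s <- sapply(list(), length)                          # list()
```

The last two lines show one reason vapply is considered safer: with empty input, sapply returns an empty list while vapply still returns a vector of the declared type.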

Pipeline

  • a chain of data-processing stages
  • mathematical point of view: function composition
  • %>% in R (magrittr)
  • f(x) => x %>% f
  • g( f(x) ) => x %>% f %>% g
  • packages: magrittr or dplyr (dplyr re-exports %>% from magrittr)
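The equivalences above can be sketched as follows, assuming the magrittr package is installed. The vector `x` is a hypothetical example.

```r
library(magrittr)  # provides the %>% operator

x <- c(1, 4, 9, 16)

# f(x) is equivalent to x %>% f
nested1 <- sqrt(x)
piped1  <- x %>% sqrt

# g(f(x)) is equivalent to x %>% f %>% g
nested2 <- sum(sqrt(x))
piped2  <- x %>% sqrt %>% sum

# With extra arguments, the left-hand side becomes the first argument:
# round(c(1.234, 5.678), 1)
rounded <- c(1.234, 5.678) %>% round(1)
```

The piped form reads left to right in the order the stages run, which is the main reason to prefer it for longer chains.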

analytic process

  • data manipulation

  • data visualization

  • statistical analysis / modeling

  • deployment

data manipulation

  • cleaning and preparing the data (often said to take about 80% of the time)

  • well-structured data

  • data wrangling

    • Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.

dplyr

  • install.packages("dplyr") / library(dplyr)

  • basic verbs of data manipulation: filter / select / mutate / arrange / summarise
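A minimal sketch of the basic dplyr verbs chained with %>%, assuming dplyr is installed. The data frame `students` and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data (illustration only)
students <- data.frame(
  name  = c("Ann", "Bob", "Cat", "Dan"),
  class = c("A", "A", "B", "B"),
  score = c(90, 72, 85, 60),
  stringsAsFactors = FALSE
)

result <- students %>%
  filter(score >= 70) %>%                 # keep rows that meet a condition
  select(class, score) %>%                # keep only these columns
  mutate(score10 = score / 10) %>%        # add a derived column
  group_by(class) %>%                     # summarise per class
  summarise(mean_score = mean(score)) %>% # one row per class
  arrange(desc(mean_score))               # sort by mean, highest first

print(result)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what makes the verbs easy to chain in a pipeline.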

tidyr

  • install.packages("tidyr") / library(tidyr)

  • gather / spread (similar to pivot tables in Excel)

  • separate / unite
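A small sketch of the four tidyr functions listed above, assuming tidyr is installed. The data frames `wide` and `dates` and their columns are hypothetical. Newer tidyr versions also offer pivot_longer / pivot_wider, but gather / spread remain available.

```r
library(tidyr)

# Hypothetical wide table: one column per subject (illustration only)
wide <- data.frame(
  name    = c("Ann", "Bob"),
  math    = c(90, 72),
  english = c(85, 60),
  stringsAsFactors = FALSE
)

# gather: wide -> long (one row per name/subject pair)
long <- gather(wide, key = "subject", value = "score", math, english)

# spread: long -> wide (one column per subject again)
wide_again <- spread(long, key = subject, value = score)

# separate: split one column into several
dates <- data.frame(date = c("2018-04-26", "2018-05-03"),
                    stringsAsFactors = FALSE)
parts <- separate(dates, date, into = c("year", "month", "day"), sep = "-")

# unite: paste several columns back into one
joined <- unite(parts, "date", year, month, day, sep = "-")
```

gather and spread are inverses of each other, as are separate and unite, so each pair can be used to check a round trip.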

R_Basic_week4

By a136489
