107-1 R Basic TA 4

Outline

  • pipe operator
  • data manipulation
    • dplyr
    • tidyr
  • ggplot2

Pipe operator

Pipe

  • a chain of data-processing stages
  • from a mathematical point of view: function composition, g( f(x) )
  • %>% in R (magrittr)
    • f(x) => x %>% f
    • g( f(x) ) => x %>% f %>% g
  • packages: magrittr / dplyr
  • Pipes in R Tutorial For Beginners
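A minimal sketch of the two rewriting rules above, assuming the magrittr package is installed. The vector `x` and the chosen functions are only illustrative. `x %>% f()` passes `x` as the first argument of `f`, so any extra arguments (such as the `1` in `round(1)`) come after it.

```r
library(magrittr)  # provides %>% (dplyr re-exports the same operator)

x <- c(1, 4, 9, 16)

# Nested form: read inside-out, g( f(x) )
nested <- round(mean(sqrt(x)), 1)

# Piped form: read left to right, x %>% f %>% g
# round(1) becomes round(<left-hand side>, 1)
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)  # TRUE
```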

analytic process

  • data manipulation

  • data visualization

  • statistical analysis / modeling

  • deployment

data manipulation

  • cleaning and preparing the data (often said to take about 80% of the analysis time)

  • well-structured data

  • data wrangling

    • Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.

dplyr

  • install.packages("dplyr") / library(dplyr)

  • basic verbs of data manipulation: filter, select, mutate, arrange, summarise, group_by
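A sketch chaining the basic dplyr verbs with `%>%` on the built-in `iris` data, assuming dplyr is installed. The filter threshold, the new column `Petal.Length.mm` and the result name `petal_summary` are made up for illustration.

```r
# install.packages("dplyr")  # run once
library(dplyr)

petal_summary <- iris %>%
  filter(Sepal.Length > 5) %>%                      # keep rows matching a condition
  select(Species, Petal.Length) %>%                 # keep only some columns
  mutate(Petal.Length.mm = Petal.Length * 10) %>%   # add a derived column (cm -> mm)
  group_by(Species) %>%                             # split rows into groups
  summarise(mean_petal_mm = mean(Petal.Length.mm),  # one summary row per group
            n = n()) %>%                            # rows per group
  arrange(desc(mean_petal_mm))                      # sort, largest mean first

print(petal_summary)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what makes the verbs chain cleanly with the pipe.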

tidyr

  • install.packages("tidyr") / library(tidyr)

  • gather / spread (similar to pivot tables in Excel)

  • separate / unite
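A minimal sketch of the four tidyr verbs, assuming tidyr is installed. The tables `scores_wide` and `dates` are hypothetical toy data, not from the original slides.

```r
# install.packages("tidyr")  # run once
library(tidyr)

# Hypothetical wide table: one row per student, one column per exam
scores_wide <- data.frame(
  student = c("A", "B"),
  midterm = c(80, 75),
  final   = c(90, 85)
)

# gather: wide -> long (one row per student-exam pair)
scores_long <- gather(scores_wide, key = "exam", value = "score", midterm, final)

# spread: long -> wide again (the inverse of gather)
scores_back <- spread(scores_long, key = exam, value = score)

# Hypothetical column holding three values joined by "-"
dates <- data.frame(date = c("2018-09-10", "2018-10-01"))

# separate: split one column into several
dates_split <- separate(dates, date, into = c("year", "month", "day"), sep = "-")

# unite: paste several columns back into one
dates_joined <- unite(dates_split, "date", year, month, day, sep = "-")
```

In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`; the older functions still work, but newer code usually uses the pivot functions.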

ggplot2

grammar of graphics

Main parameters

  • data: iris, diamonds, (your own data), ...
  • aesthetics (aes): x, y, shape, color, ...
  • geometry (geom_*): point, line, bar, boxplot, ...
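A minimal sketch mapping the three main parameters onto code, assuming ggplot2 is installed and using the built-in `iris` data. Which variables go to which aesthetic is an arbitrary choice for illustration.

```r
library(ggplot2)

# data + aesthetics (x-y, color, shape) + geometry (points)
p <- ggplot(data = iris,
            aes(x = Sepal.Length, y = Petal.Length,   # aesthetics: x-y position
                color = Species, shape = Species)) +  # aesthetics: color, shape
  geom_point()                                         # geometry: point
print(p)

# Same data, different aesthetics and geometry: one boxplot per species
q <- ggplot(iris, aes(x = Species, y = Petal.Length)) +
  geom_boxplot()
print(q)
```

Swapping the `geom_*()` layer changes how the same data and aesthetics are drawn, which is the core idea behind the grammar of graphics.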

By a136489
