TA 4
By CM 2018/4/26
Outline
- apply
- pipeline
- data manipulation
- dplyr
- tidyr
Apply family
- lapply
- sapply
- vapply
lapply
- lapply(X, FUN, ...)
- When you want to apply a function to each element of a list in turn and get a list back.
- returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.
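A minimal sketch of lapply in use. The list `scores` and its values are made up for illustration.

```r
# A small list of numeric vectors (made-up values)
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# Apply mean() to each element; the result is a list
# with the same length (and names) as the input
means <- lapply(scores, mean)
str(means)

# Extra arguments after FUN are passed on to FUN through ...
with_na <- lapply(list(c(1, NA, 3)), mean, na.rm = TRUE)
```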
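sapply
- sapply(X, FUN, ...)
- When you want to apply a function to each element of a list in turn, but you want a vector (or matrix) back, rather than a list.
- If the results cannot be simplified (e.g. they differ in length), sapply returns a list after all.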
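A short comparison of lapply and sapply on the same input. The list `words` is a made-up example.

```r
words <- list("apple", "fig", "banana")

# lapply keeps the list structure
lens_list <- lapply(words, nchar)

# sapply simplifies the result to an integer vector
lens_vec <- sapply(words, nchar)

# When results have different lengths, sapply cannot simplify
# and quietly returns a list instead
ragged <- sapply(list(1:2, 1:3), rev)
is.list(ragged)
```

This silent change of return type is one reason to reach for vapply in code that is not interactive.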
vapply
- vapply(X, FUN, FUN.VALUE, ...)
- similar to sapply
- FUN.VALUE is a template for the value that FUN returns: it fixes the type and length of each result
- FUN.VALUE: numeric(1) / logical(1) / character(1) / ...
- safer (and sometimes faster) than sapply, because every result is checked against the template
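A minimal sketch of how the FUN.VALUE template behaves. The list `nums` is made up for illustration.

```r
nums <- list(a = c(1, 2, 3), b = c(4, 5, 6))

# Each call to sum() must return exactly one numeric value
totals <- vapply(nums, sum, FUN.VALUE = numeric(1))

# range() returns two values, so it does not match numeric(1):
# vapply stops with an error instead of returning an unexpected shape
res <- tryCatch(
  vapply(nums, range, FUN.VALUE = numeric(1)),
  error = function(e) "error"
)

# On empty input vapply still returns the promised type,
# while sapply falls back to an empty list
empty_v <- vapply(list(), length, FUN.VALUE = integer(1))
empty_s <- sapply(list(), length)
```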
Pipeline
- a chain of data-processing stages
- mathematical point of view: function composition
%>% in R (magrittr)
- f(x) => x %>% f
- g( f(x) ) => x %>% f %>% g
- packages: magrittr or dplyr
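A small sketch of the same computation written as nested calls and as a pipeline, assuming the magrittr package is installed. The vector `x` is made up for illustration.

```r
library(magrittr)  # loading dplyr also makes %>% available

x <- c(1, 4, 9, 16)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Pipeline: read from left to right, one stage per step
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)
```

The left-hand side becomes the first argument of the next call, so `round(1)` here means `round(., 1)`.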
analytic process
- data manipulation
- data visualization
- statistical analysis / modeling
- deployment
data manipulation
- cleaning and preparing the data (often said to take about 80% of the time)
- goal: well-structured data
- Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.
dplyr
- install.packages("dplyr") / library(dplyr)
- basic verbs of data manipulation
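A sketch of some common dplyr verbs chained with %>%, assuming dplyr is installed. The data frame `students` and its columns are made up for illustration.

```r
library(dplyr)

# Made-up data for illustration
students <- data.frame(
  name  = c("Ann", "Bob", "Cid", "Dee"),
  class = c("A", "A", "B", "B"),
  score = c(90, 72, 85, 60),
  stringsAsFactors = FALSE
)

result <- students %>%
  filter(score >= 70) %>%                           # keep rows that match a condition
  select(class, score) %>%                          # keep only some columns
  mutate(high = score >= 85) %>%                    # add a new column
  group_by(class) %>%                               # split into groups
  summarise(mean_score = mean(score), n = n()) %>%  # one row per group
  arrange(desc(mean_score))                         # sort rows

result
```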
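tidyr
- install.packages("tidyr") / library(tidyr)
- gather / spread: reshape between wide and long format (like pivot tables in Excel)
- separate / unite: split one column into several / combine several columns into one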
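A sketch of the four tidyr functions named above, assuming tidyr is installed. The data frames `wide` and `dates` are made up for illustration. Note that newer tidyr versions mark gather() and spread() as superseded by pivot_longer() and pivot_wider(), although they still work.

```r
library(tidyr)

# Made-up wide table: one column per subject
wide <- data.frame(
  name = c("Ann", "Bob"),
  math = c(90, 70),
  art  = c(80, 95),
  stringsAsFactors = FALSE
)

# gather: wide -> long (one row per name/subject pair)
long <- gather(wide, key = "subject", value = "score", math, art)

# spread: long -> wide again (one column per subject)
wide_again <- spread(long, key = subject, value = score)

# separate: split one column into several
dates <- data.frame(date = c("2018-04-26", "2018-05-03"),
                    stringsAsFactors = FALSE)
parts <- separate(dates, date, into = c("year", "month", "day"), sep = "-")

# unite: paste several columns back into one
joined <- unite(parts, "date", year, month, day, sep = "-")
```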
R_Basic_week4
By a136489