Karl Ho
Data Generation datageneration.io
School of Economic, Political and Policy Sciences
University of Texas at Dallas
Data Programming
Data Acquisition
Data Visualization
Data programming
- Maribel Fernandez 2014
# Create preload function
# Check if a package is installed.
# If yes, load the library
# If no, install package and load the library
preload <- function(x) {
  x <- as.character(x)
  if (!require(x, character.only = TRUE)) {
    install.packages(pkgs = x, repos = "http://cran.r-project.org")
    require(x, character.only = TRUE)
  }
}
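A possible way to use preload() on several packages at once is sketched below. The function is repeated so the snippet runs on its own, and the package list is a hypothetical choice: these three packages ship with R, so nothing is actually installed.

```r
# preload() as defined above, repeated so this snippet is self-contained.
preload <- function(x) {
  x <- as.character(x)
  if (!require(x, character.only = TRUE)) {
    install.packages(pkgs = x, repos = "http://cran.r-project.org")
    require(x, character.only = TRUE)
  }
}

# Hypothetical package list; all three ship with R.
pkgs <- c("stats", "utils", "tools")

# Load each package in turn; invisible() hides the list returned by lapply().
invisible(lapply(pkgs, preload))

# Check that every package is now attached to the search path.
paste0("package:", pkgs) %in% search()
```

In practice the vector would hold whatever packages a script depends on, so the script can run on a machine where they are not yet installed.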
learning how to program can significantly enhance how social scientists can think about their studies, and especially those premised on the collection and analysis of digital data.
- Brooker 2019:
Chances are the language you learn today will quite likely not be the language you'll be using tomorrow.
- Venables, Smith and the R Core team
Source: Nick Thieme. 2018. R Generation: 25 years of R https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2018.01169.x
The script window:
You can store a document of commands you used in R to reference later or repeat analyses
Environment:
Lists all of the objects (data frames, vectors, functions) currently in the workspace
Console:
Output appears here. The > sign means R is ready to accept commands.
Plot/Help:
Plots appear in this window. You can resize the window if plots appear too small or do not fit.
mydata <- read.csv("path", sep = ",", header = TRUE)
library(foreign) # provides read.spss() and read.dta()
mydata.spss <- read.spss("path", to.data.frame = TRUE)
mydata.dta <- read.dta("path")
happy=read.csv("https://raw.githubusercontent.com/kho7/SPDS/master/R/happy.csv")
mydata$column
library(car) # recode() with this range syntax comes from the car package
mydata$Age.rec <- recode(mydata$Age, "18:19='18to19'; 20:29='20to29'; 30:39='30to39'")
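The same kind of age grouping can also be sketched in base R with cut(), which needs no extra package. The data frame below is made up for illustration; only the column name Age and the three group labels come from the line above.

```r
# Hypothetical toy data; the Age values are invented for this example.
mydata <- data.frame(Age = c(18, 19, 23, 29, 30, 35, 39))

# cut() bins a numeric vector into intervals.
# right = FALSE makes the intervals [18,20), [20,30), [30,40).
mydata$Age.rec <- cut(mydata$Age,
                      breaks = c(18, 20, 30, 40),
                      labels = c("18to19", "20to29", "30to39"),
                      right = FALSE)

# Access the new column with $ and tabulate the groups.
table(mydata$Age.rec)
```

Unlike the string-based recode, cut() returns a factor whose levels keep the order of the breaks, which is convenient for tables and plots.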
start.time <- Sys.time() # use base function
message("Start....")
# .... procedure code goes here
message("Done!")
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
library(tictoc)
tic("message")
# .... procedure code goes here
toc() # prints the elapsed time, labelled with the message given to tic()
User time:
the amount of CPU time spent by the current process (i.e., the current R session) executing the expression. It measures the time spent executing user-level code, such as loops, conditionals, and function calls.
System time:
the amount of CPU time spent by the kernel (the operating system) on behalf of the current process executing the expression. It measures the time spent executing system-level code, such as opening files, doing input or output, starting other processes, and looking at the system clock.
Elapsed time:
the wall clock time taken to execute the expression. It measures the total time taken to execute the expression, including time spent waiting for input or output, time spent waiting for other processes to complete, and time spent waiting for the CPU to become available.
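These three quantities are what base R's system.time() reports. A minimal sketch follows; the workload (a small loop plus a short pause) is an arbitrary choice made only to show the difference between CPU time and wall clock time.

```r
# Time an illustrative expression: some computation plus a pause.
timing <- system.time({
  x <- sapply(1:10000, function(i) sqrt(i))  # uses CPU (user time)
  Sys.sleep(0.2)                             # waits without using CPU
})

# Prints the user, system and elapsed times in seconds.
print(timing)

# The individual components can be extracted by name.
timing[["user.self"]]
timing[["sys.self"]]
timing[["elapsed"]]
```

Because Sys.sleep() only waits, the elapsed time should come out noticeably larger than the user and system times combined.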
Socket
The socket approach launches a new R session on each core. Technically the connection is made via networking (the same as connecting to a remote server), but it all happens on the local computer.
Works on all platforms, including Windows and macOS
Pro:
Each process on each node is unique so it can’t cross-contaminate.
Con:
Each process is unique so it will be slower
Package loading needs to be done in each process separately. Variables defined in the main R session do not exist on each core unless explicitly placed there.
More complicated to implement.
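The points above can be sketched with the parallel package that ships with R. This is a minimal illustration and not a tuned setup: the number of workers (2), the variable name offset and the squared-number task are all assumptions chosen for the example.

```r
library(parallel)  # ships with base R

# Start two worker R sessions connected over local sockets.
cl <- makeCluster(2, type = "PSOCK")

# A variable in the main session; workers cannot see it until it is exported.
offset <- 10
clusterExport(cl, varlist = "offset", envir = environment())

# Packages must be loaded on each worker separately.
invisible(clusterEvalQ(cl, library(stats)))

# Run the function on the workers and collect the results as a list.
result <- parLapply(cl, 1:4, function(i) i^2 + offset)

# Always shut the workers down when finished.
stopCluster(cl)

unlist(result)
```

Without the clusterExport() call the workers would fail with an "object 'offset' not found" error, which is the extra setup cost listed under Con.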
parallel
in the background) or when running in a GUI (such as RStudio).
Errickson, Josh. Parallel Processing in R
Jones, Matt. 2017. Quick Introduction to Parallel Computing in R
"Beware of bugs in the above code; I have only proved it correct, not tried it."
- Donald Knuth, author of The Art of Computer Programming
Source: https://www.frontiersofknowledgeawards-fbbva.es/version/edition_2010/