R and Git[Hub]

Anton Antonov (@tonytonov)

St.Petersburg R user group

So, science

Science isn't about why -- it's about why not

Keep calm

&

adopt a cat

technologies

Q: Is it going to hurt?

A: Yes

Dropbox, Google Docs

Are designed to synchronize files

  • Try editing the same place simultaneously
  • Try working offline and then sync with others
  • Good luck figuring out who made a particular change

Git

Git is a VCS (version control system)

created by Linus Torvalds, 2005

Git stores the full history of a project

as a series of filesystem snapshots

Git is open source, reliable, secure and fast,

making it a de facto standard for developers

Git is a sufficiently advanced technology,

therefore it's magic*

*Arthur C. Clarke

Git basics

repository             project folder

commit                  save snapshot

  • Commit is a "checkpoint"
  • Records who made the change and when
  • Human-readable commit messages

script.R

script_rev2.R

script_rev6_comments.R

script_rev22_ver5_latest.R

[Diagram: a single script.R evolving through three commits]

John Doe, 13:37 04/10/17

"Changed NA handling"

John Doe, 20:17 04/10/17

"Fixed incorrect input (now using character, not factor)"

John Doe, 00:42 05/10/17

"ggplot style tweaked via theme()"
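
In a terminal, the three checkpoints above might look roughly like this. It is a minimal sketch, not the RStudio workflow from the talk: the folder name `project`, the e-mail address and the file contents are made up for illustration, and everything happens in a throwaway temporary folder.

```shell
set -e
work=$(mktemp -d)            # throwaway folder, nothing real is touched
cd "$work"

git init -q project          # repository = project folder
cd project
git config user.name "John Doe"               # placeholder identity
git config user.email "john.doe@example.com"  # placeholder identity

# First snapshot: start tracking script.R and commit it.
echo 'x <- c(1, NA, 3)' > script.R
git add script.R
git commit -q -m "Changed NA handling"

# Later snapshots: same file name, new commit each time.
echo 'y <- as.character(x)' >> script.R
git commit -q -am "Fixed incorrect input (now using character, not factor)"

echo '# plot styling goes here' >> script.R
git commit -q -am "ggplot style tweaked via theme()"

# Who, when, and the human-readable message for every checkpoint.
git log --format='%an, %ad: %s'
```

There is still only one script.R in the folder; the earlier versions live in the commit history.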

Git basics

remote          repository that is somewhere else 

pull                 grab commits from remote

clone              initial pull from remote

push               send commits to remote

  • Remote serves as a backup and provides sync
  • Commit often, push when ready to share

[Diagram: one remote shared by two users. First user: 1. clone, 2. commit, 3. push. Second user: 1. clone, 4. pull.]
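
The clone / commit / push / pull cycle can be tried without any server. In the sketch below a local bare repository stands in for the remote; the names `remote.git`, `alice` and `bob`, the e-mail addresses and the file contents are all hypothetical.

```shell
set -e
work=$(mktemp -d)            # throwaway folder
cd "$work"

# Stand-in for the remote: a bare repository seeded with one commit.
git init -q seed
echo 'x <- 1' > seed/script.R
git -C seed add script.R
git -C seed -c user.name="Seed" -c user.email="seed@example.com" \
    commit -q -m "Initial commit"
git clone -q --bare seed remote.git

# 1. clone: each user gets a full copy of the repository.
git clone -q remote.git alice
git clone -q remote.git bob

# 2. commit: Alice saves a snapshot locally.
echo 'y <- 2' >> alice/script.R
git -C alice -c user.name="Alice" -c user.email="alice@example.com" \
    commit -q -am "Add y"

# 3. push: Alice sends her commit to the remote.
git -C alice push -q origin HEAD

# 4. pull: Bob grabs the new commit from the remote.
git -C bob pull -q --ff-only

git -C bob log --format='%an: %s'
```

Until step 3 the new commit exists only on Alice's machine, which is why "commit often, push when ready to share" works.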

GitHub

  1. Very popular web-based remote
  2. Yep, it's free
  3. Active (R) community
  4. Visibility
  5. Ease of collaboration
  6. Great web interface
  7. Git flow adapted for various purposes
  8. Did I mention it's totally free?

RStudio integration

Alternatives: Git clients (SourceTree, etc.), terminal 

diff

  1. Go to GitHub and create a repo
  2. Use RStudio to clone it
  3. Make some changes and commit (repeat 2-5 times)
  4. Upload changes via push

Your turn

GitHub

fork                     make a copy of GitHub repo

pull request      send changes from fork to original repo

  • Fork any public repository to tailor the project to your specific needs
  • Great way to contribute to open source, FTW
  • Convenient to maintain your project
  1. Go to https://github.com/tonytonov/tcts-git/issues/2
  2. Take the repository from the comment above yours and fork it
  3. Use RStudio to clone your fork
  4. Make some changes, commit
  5. Upload changes via push
  6. Send a pull request with proposed changes
  7. Review someone else's pull request to your repo
  8. Check out your GitHub profile, be proud
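
For those who prefer the terminal, steps 3-5 can be imitated locally. This is only a sketch: on GitHub the fork is made with a button and the pull request is opened in the web interface, so here a bare copy plays the role of the fork. The names `original.git`, `fork.git` and `mywork`, the identities and the file contents are made up.

```shell
set -e
work=$(mktemp -d)            # throwaway folder
cd "$work"

# Stand-in for the original repository on GitHub.
git init -q seed
echo 'x <- 1' > seed/script.R
git -C seed add script.R
git -C seed -c user.name="Owner" -c user.email="owner@example.com" \
    commit -q -m "Initial commit"
git clone -q --bare seed original.git

# Fork: your own copy of the original (imitated by a bare clone).
git clone -q --bare original.git fork.git

# Clone your fork, make a change, commit.
git clone -q fork.git mywork
cd mywork
git config user.name "You"               # placeholder identity
git config user.email "you@example.com"  # placeholder identity
echo '# proposed tweak' >> script.R
git commit -q -am "Propose a tweak"

# Push goes to the fork ("origin"), not to the original.
git push -q origin HEAD

# The fork is now one commit ahead of the original.
git --git-dir="$work/fork.git" rev-list --count HEAD
git --git-dir="$work/original.git" rev-list --count HEAD
```

The original stays untouched until its owner accepts the pull request.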

Your turn

GitHub issues

R packages

A great way to share R code

Learn by observing

  • R code
  • Help files
  • Package description
  • Imports/exports
  • Tell Git not to track these files
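
Telling Git not to track files is done with a `.gitignore` file in the repository. A minimal sketch, assuming the usual files that R and RStudio leave behind; the folder name `mypackage` and the function in `R/hello.R` are made up, and a real package may need a different list.

```shell
set -e
work=$(mktemp -d)            # throwaway folder
cd "$work"
git init -q mypackage
cd mypackage

# One pattern per line; matching files are invisible to Git.
cat > .gitignore <<'EOF'
.Rproj.user
.Rhistory
.RData
EOF

# Session leftovers (ignored) next to real package code (tracked).
touch .Rhistory .RData
mkdir R
echo 'hello <- function() "hi"' > R/hello.R

# Only .gitignore and R/hello.R should be listed as new files.
git status --short --untracked-files=all
```

The `.gitignore` file itself is usually committed, so everyone who clones the repository shares the same rules.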

Bottom line

Pros

  • no revision hell
  • transparent collaboration
  • backup on steroids
  • it's hard to screw things up*
  • visibility, exposure
  • same instrument, many targets

Cons

  • takes time & effort to learn & maintain
  • the same applies to everyone in your group
  • has (known) limitations

Extra stuff

  • More GitHub features (visualisation, analytics)
  • devtools::install_github()
  • Similar services: GitLab, Bitbucket
  • GitHub pages
  • R Markdown, LaTeX
  • Bookdown
  • Integration with other services

Links


Thank you!
