Transition your workflow, keep your sanity

Danielle Navarro

Sydney Research Bazaar, September 2019

slides.com/djnavarro/workflow

djnavarro

compcogscisydney.org

The year is 2009:

 

Danielle does not think about her project directory structure...

 

White space in a filename is a bad idea (partly for historical reasons)

Mistake #1

These are unnecessary copies of existing files

Mistake #2

Unnecessary files generated by MATLAB

Mistake #3

Mistake #4

Irrelevant files generated by the operating system

Mistake #5

Data stored in a format that is hard to work with unless you have a MATLAB licence

My data analysis scripts were in MATLAB so I can't run them anymore

Mistake #6

I honestly don't know why I used .eps graphics and nothing else

Mistake #7

If you don't want "future you" to be embarrassed, maybe think about your file names?

Mistake #8

Everything is dumped in one folder

Mistake #9

There is no README file!!!

Mistake #10

Inconsistent separator characters in file names

Mistake #11

Inconsistent naming scheme

Mistake #12

Names that are not very helpful to the human reader

Mistake #13

The year is 2019:

 

Danielle has become paranoid about her project directory structure...

 

Project templates

  • A standard folder structure
  • Documentation describing it
  • Agreed conventions for usage
  • Miscellaneous "nudges" to suggest good practice

 

Template

Project

A good template should include instructions for how to use it

e.g., https://github.com/djnavarro/newproject

If you use GitHub you can create new projects directly from your template (or from someone else's!)

Okay... so why did I organise my template to produce file structures like this?

My folder structure is organised to mirror the structure of a typical research project

Files and folders use a consistent naming scheme that is easy for humans to read and easy for machines to read
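One way to nudge yourself toward a consistent naming scheme is a small checker script. This is a minimal sketch, not part of the template above: the convention it enforces (lower case, words joined by "_", README.md exempt) and the names `NAME_PATTERN`, `EXEMPT` and `check_name` are assumptions chosen for illustration.

```python
import re

# Hypothetical convention: lower-case letters and digits, words joined by "_",
# with an optional single extension. Adjust the pattern to your own scheme.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*(\.[a-z0-9]+)?$")

# Names that are allowed to break the convention (an assumption, not a rule).
EXEMPT = {"README.md"}


def check_name(name: str) -> list[str]:
    """Return a list of human-readable problems with a file or folder name."""
    if name in EXEMPT:
        return []
    problems = []
    if any(ch.isspace() for ch in name):
        problems.append("contains white space")
    if name != name.lower():
        problems.append("contains upper-case letters")
    if "-" in name and "_" in name:
        problems.append("mixes '-' and '_' separators")
    # Only fall back to the generic message if nothing more specific was found.
    if not problems and not NAME_PATTERN.match(name):
        problems.append("does not match the agreed pattern")
    return problems


if __name__ == "__main__":
    for example in ["exp1_raw_data.csv", "Final version 2.m", "final-data_v2.csv"]:
        print(example, "->", check_name(example) or "ok")
```

The particular rules matter less than having them written down somewhere a machine can check them.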

Every folder has a README

 

(I cannot stress this enough... future you will love you for this)
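As a rough illustration of "every folder has a README", here is a minimal sketch that creates a project skeleton and drops a README.md into each folder. The folder names and descriptions in `FOLDERS`, and the function name `create_project`, are hypothetical; they are not the layout of the template linked above.

```python
from pathlib import Path

# Hypothetical folder names and descriptions, for illustration only.
FOLDERS = {
    "data": "Raw and processed data files.",
    "analysis": "Scripts that turn the data into results.",
    "writeup": "Manuscript, figures and other outputs.",
}


def create_project(root: Path) -> list[Path]:
    """Create the folders under `root` and put a README.md in each one.

    Existing README files are left untouched. Returns the README paths.
    """
    root.mkdir(parents=True, exist_ok=True)
    readmes = []

    # Top-level README describing the project as a whole.
    top = root / "README.md"
    if not top.exists():
        top.write_text(f"# {root.resolve().name}\n\nDescribe the project here.\n")
    readmes.append(top)

    # One README per sub-folder, saying what belongs there.
    for name, description in FOLDERS.items():
        folder = root / name
        folder.mkdir(exist_ok=True)
        readme = folder / "README.md"
        if not readme.exists():
            readme.write_text(f"# {name}\n\n{description}\n")
        readmes.append(readme)
    return readmes
```

Calling `create_project(Path("my_project"))` would give a top-level README plus one per folder; the descriptions are placeholders to be replaced with real documentation.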

README.md

Relying on free and open source tools so that other people can use it

Best practice version control using a git repository

Cloud storage for the project using GitHub

How did I get here?

I had to learn a lot of new technical tools...

github.com

rstudio.com

r-project.org

osf.io

git-scm.com

psyarxiv.com

I've been helped by some wonderful teachers

I've relied on communities for support

twitter.com/rladiessydney

ozunconf19.ropensci.org

chdsummerschool.com

But most of all...

Transitions are hard

the gatekeepers' attempts to suppress the number of trans people allowed to transition occurred at virtually every step

indefinite periods of psychotherapy designed to evaluate whether or not they met the psychiatrists' criteria

those who were allowed to begin the real-life test often faced additional obstacles as some gender identity clinics required trans people to begin their tests prior to starting hormone replacement therapy

exposed the transsexual to all sorts of discrimination, harassment and potential violence

little more than a hazing period

How you imagine it goes

How it actually goes...

Why is it hard?

Fear and anxiety

Exhaustion

Gatekeeping

The in-between problem

All models and code available as documented repositories on GitHub

I am a believer in open science. You should have used OSF

Gatekeeping via review?

We used a Bayesian approach for exploratory model building, evaluated by checking for robust performance on theoretically meaningful desiderata... blah blah blah

Give me a p-value

We like music?

Your favourite music sucks

Transitioning requires trade-offs

"Unlike my cisgender counterparts I was never subject to sexual harassment in maths class, but on the other hand I didn’t attend very many of my classes either because I was too busy [self-harming] ... Looking back, my unwillingness to face up to how I felt about gender looks a lot like a trade-off: purchasing a form of male privilege at the expense of my mental health"

git / GitHub

Everybody laughs because learning git is one of the hardest steps. We all go through this stage... and that's OKAY!

git / GitHub

  • Hard to learn
  • Manual commits
  • Can be finicky
  • Preserves your history
  • Issues threads
  • Integration w. Travis etc
  • Fork and modify

Dropbox

  • Easy to learn
  • Automatic backup
  • Usually "just works"
  • History is limited
  • No real analog of issues
  • No CI service at all
  • No analog of forks

https://happygitwithr.com/

Transitioning from Dropbox to git + GitHub was a good exchange for me because it suits the work I do.

 

 

Neither system is better in general. Your trade-off will look different to mine!

Some positive tips!

  • Offload cognition
  • Plan projects
  • Use side projects
  • Invest in your tools
  • Find your community

Offloading cognition

GitHub issues threads are excellent for this: your commits can link the code changes to the issue thread
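The link works because GitHub turns a "#123"-style reference in a commit message into a link to that issue. As a minimal sketch of the idea, the hypothetical helper `issue_refs` below pulls those references out of a commit message; the regular expression is an approximation for illustration, not GitHub's own matching rule.

```python
import re

# Approximate pattern for "#123"-style issue references (an assumption,
# not GitHub's exact rule): a "#" not glued to a preceding word, then digits.
ISSUE_REF = re.compile(r"(?<![\w&])#(\d+)\b")


def issue_refs(message: str) -> list[int]:
    """Return issue numbers mentioned in a commit message, in order, without repeats."""
    seen = []
    for match in ISSUE_REF.finditer(message):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


if __name__ == "__main__":
    print(issue_refs("Fix axis labels, see #12 and #7 (closes #12)"))
```

Writing the issue number into the commit message is the whole trick: the thread then keeps the "why", and the commit keeps the "what".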

You can do this for regular scientific projects as well as software development

Planning your project

Planning your time

Email tends to prioritise other people's needs, especially the loudest people

Managing your todo lists and your time explicitly (e.g., via todoist.com) gives you better control

Use your side projects...

asciify

rainbowr

... to learn skills for real ones

Invest in your tools

Surviving the long climb

Cast of characters

A cognitive psychologist with some experience with R

A grumpy psychometrician with a limited time budget

A social psychologist who wants to believe

Hey, I've been learning new things!

Awesome! What have you learned?

I can add cat gifs to a plot?

I can draw pretty maps?

I can model associative learning as Bayesian inference over inhomogeneous Markov random fields?

Hello?

  Hello? Was it something I said?

A complicating factor:

Hey, how would you feel about posting research code to the web?

Oh hell no. People would call me stupid

What? You literally wrote a textbook on latent variable modelling and your Ph.D. research won a best paper award at Multivariate Behavioral Research. You are not stupid

Yeah but my code has to be written in 20 minute consults and it's ugly. People are unkind about code

  • Our code works (mostly), but it's ugly
  • We have limited time & money
  • It's difficult for us to make nice code, but at least we have some training 
  • You want me to put myself in the firing line too? Pfft
  • Why should I do that????
  • Byeeee....
  • "Be kind: Everyone you meet is fighting a hard battle"

 

  • Academic communities should not become Thunderdome

 

  • Public shaming masquerading as "scientific critique" has no value. Don't do it

So... it's a long way to the top, someone might push me off a cliff, and my peers might not respect the effort if I make it?

 

(seriously, you are terrible at sales)

Well, maybe I exaggerate.

Still, it helps to climb with a team...

tidyverse

lme

git

BayesFactor

lavaan

car

Shiny

R markdown

learning all the things

It can be hard to find a team within a single applied field...

#rstats

But there are people who will support you

 

Yay!

 

          compcogscisydney.org

          d.navarro@unsw.edu.au

          twitter.com/djnavarro

          github.com/djnavarro

Find your community!

#rstats

...But look, it's a long climb. Help me believe it's worthwhile

Okay, I'm back

Structural equation models? With Bayes?

Flexible experiments with Shiny apps?

Social network analysis? With beautiful pictures?

Okay, I'm listening, but...

Better tools for reproducible research? 

Which packages should I prioritise?

How will journal editors & reviewers respond?

How do I move my whole lab across?

base R

tidyverse

jsPsych

Matlab

JASP

BayesFactor

.Rmd

SQL

lme

JAGS

Stan

Which packages?

Migration?

Keynote for Sydney ResBaz 2019
