Organizing Your Research Data

Andrea Baruzzi (Sciences)

Sarah Elichko (Social Sciences)

Keeping your files, folders, and research life in order

Decisions

creating

editing

saving

sharing

Today's focus:

Files & Folders

Approaches that will generally work with tools you already use
(Mac or PC, Google Drive, Dropbox, etc.)

What do we mean by organized?

 

• Human-friendly

• Machine-friendly

• Reliable

Why bother?

 

• Reduce time and energy spent on logistics

 

• Improve collaboration

 

• Opportunity to think through your work

 

• Can facilitate using digital tools for analysis

Folders
Files
Storage

 


 

Folders provide structure

(because without them, all you'd have is this...)

Folder Structure: Common Patterns

/[Program]/[Year]/[Aspect of program]

/RIAs/2016-2017/applications
/RIAs/2016-2017/lesson_plans

/[Project]/[Type of file]/[Data collector name]/[YYYYMMDD]

/butterfly/tabular/mcneill/20160117     (20160117 = Jan 17, 2016)
/butterfly/images/mcneill/20160117

/[Academic Year]/[Semester]/[Course]/[Assignment]

/2016-2017/fall/hist091/paper1
/2016-2017/fall/soan098/proposal

Folder Structure: Best Practices

Machine-friendly:

• Format dates, numbers, and names so they sort well   (sortable)

Human-friendly:

• Choose names that make sense to you   (skimmable)

• Use consistent abbreviations + terms   (predictable)

• Consider adding a readme file   (understandable)

Folder Structure: Best Practices

Format dates so they sort well

// Machine-friendly // 

To sort correctly, use:     YYYY-MM-DD   (Year-Month-Day)

2011-10-13   or   20111013

2015-10-12   or   20151012
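
As a rough illustration of why Year-Month-Day sorts well, the sketch below converts a few dates from a Month-Day-Year style into YYYY-MM-DD and sorts both versions as plain text. The starting MM-DD-YYYY names and the helper `to_iso` are assumptions made up for this example.

```python
from datetime import date

# Hypothetical folder names that embed a date as Month-Day-Year.
mdy_names = ["10-13-2011", "01-17-2016", "10-12-2015"]

def to_iso(mdy: str) -> str:
    """Convert 'MM-DD-YYYY' text to 'YYYY-MM-DD'."""
    month, day, year = (int(part) for part in mdy.split("-"))
    return date(year, month, day).isoformat()

iso_names = sorted(to_iso(name) for name in mdy_names)

print(sorted(mdy_names))  # text order, which is not chronological here
print(iso_names)          # text order and chronological order agree
```

Because the year comes first, an ordinary alphabetical sort of the converted names is also a chronological sort.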

Folder Structure: Best Practices

Use leading zeroes so numbers sort well

// Machine-friendly // 

Usual practice:

  • List as entered: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

  • When sorted:     1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9

Better practice:

  • List as entered:   001, 002 ... 010, 011, 012

  • When sorted:       001, 002 ... 010, 011, 012
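
The difference between the two practices can be reproduced with a short sketch. It sorts the numbers 1 to 12 as text, first without padding and then padded to three digits; the choice of three digits is an assumption taken from the 001 pattern above.

```python
# Hypothetical folder numbers, as they might be typed by hand.
numbers = list(range(1, 13))

plain = sorted(str(n) for n in numbers)        # sorted as text, no padding
padded = sorted(f"{n:03d}" for n in numbers)   # sorted as text, padded to 3 digits

print(plain)
print(padded)
```

Pick a width that leaves room to grow: three digits covers up to 999 items before the sort order breaks again.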

Folder Structure: Best Practices

Format names so they sort well  

// Machine-friendly // 

Typical sort order*:

  1. Underscores
  2. Numbers
  3. Letters

 

   * on most systems, including Google Drive and Mac

Folders sorted alphabetically:

    _2017-07-14
    2017-07-14
    2017-08-01
    ASAP
    SOON
    TODAY

Folder Structure: Best Practices

Try it out

• Adjust dates so they're sortable:

           YYYY-MM-DD
           2017-06-23

• Add leading zeroes to numbers in folder names:

           001, 002 ... 010, 011, 012

Folder Structure: Best Practices

// Human-friendly //

Choose names that will make sense at a glance
(to you and to colleagues)

Research-Group-A / Project-1
Research-Group-A / Project-2

2016-2017 / Fall / ANTH-043E
2016-2017 / Fall / SOCI-055C

Folder Structure: Best Practices

// Human-friendly //

Use consistent abbreviations + terms   (predictable)

• If there's a standard out there, use it
           (e.g. department abbreviations)

• If not, document your decision (see below)

Add a readme file to folders   (understandable)

• Short & simple - explain what you mean by each term or abbreviation

• A gift to your future self (and your colleagues)

Creating Your Folders

1.)  What are the key structures that organize your work?

Semesters
Courses
Projects
Groups / Collaborations
Meetings
Assignments

Do any repeat?  These might make good top-level folders.

Research-Group-A / Project-1
Research-Group-A / Project-2

2016-2017 / Fall / ANTH-043E
2016-2017 / Fall / SOCI-055C

Creating Your Folders

2.)  Do any structures "nest" inside others?

Semesters
Courses
Projects
Groups / Collaborations
Meetings
Assignments

These might make good mid-level folders.

Research-Group-A / Project-1
Research-Group-A / Project-2

2016-2017 / Fall / ANTH-043E / paper1
2016-2017 / Fall / SOCI-055C / paper2

Creating Your Folders

3.)  What kinds of documents do you generate while working?

Drafts
Procedures
Images
Numerical data
Meeting notes / minutes
Class notes
Notes (ideas, etc.)
Lists (to-do, to-read, etc.)

Do any repeat?  If so, these might make good sub-folders.

Research-Group-A / Project-2 / MeetingNotes

2016-2017 / Fall / ANTH-043E / paper1 / notes

Putting it all together: Folder Structure Examples

/butterfly/tabular/mcneill/20160117
/butterfly/images/mcneill/20160117

/Research-Group-A/Project-2/MeetingNotes

/2016-2017/fall/anth-043e/paper1/notes

/2016-2017/fall/hist091/paper1

/2016-2017/fall/soan098/proposal

/RIAs/2016-2017/applications
/RIAs/2016-2017/lesson_plans

/Research-Group-A/Project-1
/Research-Group-A/Project-2
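
If you would rather script a structure like the ones above than click through it by hand, a minimal sketch might look like the following. The helper `make_folders` is hypothetical, and a temporary directory stands in for wherever your real top-level folder lives.

```python
from pathlib import Path
import tempfile

def make_folders(root: Path, *levels: str) -> Path:
    """Create root/level1/level2/... and return the deepest folder."""
    folder = root.joinpath(*levels)
    folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
    return folder

# A throwaway directory used only for this illustration.
root = Path(tempfile.mkdtemp())

# [Academic Year] / [Semester] / [Course] / [Assignment]
for course, assignment in [("hist091", "paper1"), ("soan098", "proposal")]:
    make_folders(root, "2016-2017", "fall", course, assignment)

created = sorted(
    p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
)
print(created)
```

Keeping the levels as separate arguments mirrors the pattern notation: each bracketed slot in the pattern becomes one argument.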

Folders
Files
Storage

 


 

Files

Which would you prefer to find on your laptop six months from now?

File Naming Best Practices

Human-friendly:

• Include some descriptive info  (skimmable)

• Be consistent  (predictable)

Machine-friendly:

• Use versioning - avoid "final final FINAL.pdf"  (sortable)

• Keep it simple - no special characters  (compatible)

Reliable:

• Choose a future-friendly file format (when possible)

File Naming Best Practices

/ Research-Group-A / Project-2 / MeetingNotes

Include some descriptive information,
even if files are inside well-labeled folders   (skimmable)

File Naming Best Practices

/ 2016-2017 / Fall / ANTH-043E / paper1

Use versioning - avoid "final final FINAL.pdf"

• Sequential numbers indicate change over time  (sortable)

• Text labels explain versions  (skimmable, understandable)
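
One possible way to combine a sequential number with a text label is sketched below. The naming convention (`base_vNN_label.ext`) and the helper `versioned_name` are assumptions chosen for illustration, not a required scheme.

```python
def versioned_name(base: str, version: int, label: str, ext: str) -> str:
    """Build a name like 'paper1_v02_draft.docx' (a hypothetical convention)."""
    return f"{base}_v{version:02d}_{label}.{ext}"

names = [
    versioned_name("paper1", 10, "final", "docx"),
    versioned_name("paper1", 1, "outline", "docx"),
    versioned_name("paper1", 2, "draft", "docx"),
]

# Zero-padded version numbers keep the files in order when sorted by name.
print(sorted(names))
```

The number carries the order, so the label is free to say whatever is most useful to a reader skimming the folder.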

File Naming Best Practices

Keep it simple - no special characters

• Good:  letters, numbers, underscores (_), dashes (-)

      even better: all lower-case

      readings_class-04.pdf
      data-survey04.xls

• Avoid: other punctuation marks & special characters, spaces

      Readings!!/class 4.pdf
      data.survey.4.xls
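
For cleaning up many existing names at once, a small function along these lines could help. It is only a sketch: `clean_name` is a hypothetical helper, replacing unwanted characters with a dash is one choice among several, and it does not add leading zeroes.

```python
import re

def clean_name(name: str) -> str:
    """Lower-case a file name and replace anything other than letters,
    numbers, underscores, and dashes with a dash. The extension is kept."""
    stem, dot, ext = name.rpartition(".")
    if not dot:                       # no extension present
        stem, ext = name, ""
    # Collapse each run of disallowed characters into a single dash.
    stem = re.sub(r"[^a-z0-9_-]+", "-", stem.lower()).strip("-")
    return f"{stem}.{ext.lower()}" if ext else stem

print(clean_name("Readings!!/class 4.pdf"))  # readings-class-4.pdf
print(clean_name("data.survey.4.xls"))       # data-survey-4.xls
```

Review the proposed names before renaming anything, since two different originals can clean up to the same result.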

File Naming Best Practices

Try it out

Using sample data or your data,
improve a few file names.

Examples:

File Naming Best Practices

Use a future-friendly file format (when possible)

Aim for file formats that are:

  • Non-proprietary
  • Open, documented standard
  • Commonly used by research community
  • Recommended by the Library of Congress

File Naming Best Practices

Use a future-friendly file format (when possible)

Spreadsheets / tabular data: CSV  (not Excel/.xlsx)
Text: TXT, PDF/A, or ODF  (not Word/.docx)


Still images: TIFF, JPEG 2000
Moving images: MPEG-4, AVI, MXF

Sounds: WAVE, AIFF, MP3, MXF

Statistics: ASCII, DTA, POR, SAS, SAV
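
As a small illustration of the CSV recommendation, the sketch below writes a few rows of tabular data as plain CSV text using Python's standard `csv` module. The column names and values are invented for the example, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical observations; the column names are made up for illustration.
rows = [
    {"date": "2016-01-17", "collector": "mcneill", "count": 12},
    {"date": "2016-01-18", "collector": "mcneill", "count": 7},
]

# For a real file, open(path, "w", newline="") would replace this buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["date", "collector", "count"])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

The result is plain text that any spreadsheet program or text editor can open, which is what makes the format future-friendly.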

Storage

Where is your work?

Storage Best Practices

Ask: When might you need this again?

• Later this semester

• Applying for internships (next spring?)

• Taking a related course (next year?)

• Considering graduate school

• Applying for a job or graduate school

Storage Best Practices

Already on faculty/staff computers

Personal folders
Group folders (AODocs)

Larger, more complex storage needs

Putting it into practice

Something > nothing

 

• Start small
• Build gradually
• Don't psych yourself out

Questions

Andrea Baruzzi

 

Sarah Elichko

 

ITS Help Desk 

What do we mean by organized?

Human-friendly:

• Skimmable: easily read at a glance

• Understandable: to your colleagues and to your future self

• Predictable: easy to keep up

Machine-friendly:

• Sortable: can be arranged correctly

• Compatible: names that won't confuse systems

Organizing Your Research

By Swarthmore Reference
