Introduction to DC

All research relies on software

Forget about the big examples: it's the small stuff that matters day-to-day, and that's why you're doing this course

Take something as simple as Excel... Reinhart and Rogoff

And we know how much

National survey indications

Then even more precise in Southampton survey

 

What does this mean?

The idea that knowledge of software and data is needed only by physicists and data scientists is decades out of date

Knowing your way round software and data is a fundamental part of being a researcher

Funders, government and universities are really starting to care, because of reliability and reproducibility

But undergraduate courses have yet to catch up with the needs of research

That's where this course comes in: we'll introduce you to some of the most useful software and data concepts you'll need

What are we teaching?

 

  1. Spreadsheets - they're not our first choice for reproducible tools, but they're used a lot so let's use them properly
  2. OpenRefine - very few people know this important tool for understanding and cleaning data
  3. Command line - take the boredom and errors out of repetitive tasks
  4. Version control - understand what changed when
  5. Cloud computing - access computational power when it is needed
  6. Data analysis and visualisation - how to understand your data and generate results

How the course works

Takes place over one morning every week in the first semester

Starts with a video introduction

2 hours to run through the materials - you might get through them faster

You'll be supported on Slack

30-minute seminar to discuss ideas afterwards

 

 

 

Prerequisite software

 

 

Some detail

Spreadsheets

 

OpenRefine

Bash

Version control

R


By Simon Hettrick
