How to Tell a Story With Data

Tools of the Trade

Dhrumil Mehta

 

Database Journalist, Politics - FiveThirtyEight

Democracy and Technology Fellow - Ash Center

 

dhrumil.mehta@fivethirtyeight.com  

 @datadhrumil

@dmil

To Do

 

 

  • Install Google Chrome if you don't have it
     
MAC

# Install Homebrew Package Manager
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

# Install Caskroom (to learn more...https://caskroom.github.io/)
brew install caskroom/cask/brew-cask
brew tap caskroom/versions

# Install Some Coding Tools
brew cask install iterm2
brew cask install sublime-text3

# Install Node and Underscore CLI
brew install node
npm install -g underscore-cli

Ubuntu

# Default package manager is Apt-Get

# Install Sublime Text
sudo add-apt-repository ppa:webupd8team/sublime-text-3
sudo apt-get update
sudo apt-get install sublime-text-installer

# Install Node and Underscore CLI
sudo apt-get install nodejs
sudo apt-get install npm
npm install -g underscore-cli

Install Chrome Extensions

Survey Responses

Agenda

  • Introductions
  • Command Line
  • Github/Git
  • Web Browser Dev Tools
  • APIs
  • Regex
  • Data Cleaning/Parsing/Verifying
  • Data Storytelling

 

Command Line

whoami # your username
hostname # your computer's network name

pwd # print working directory
cd # change directory
ls # list directory

mkdir # make directory
rmdir # remove directory
cp # copy a file or directory
mv # move a file or directory

cat # print a whole file
head # print first few lines of a file
tail # print last few lines of a file
less # page through a file

find # find files
grep # find things inside files

man # read a manual page
env # look at your environment

echo # print some arguments
export # export/set a new environment variable
exit # exit the shell

sudo # DANGER! become super user root DANGER!
chmod # change permission modifiers
chown # change ownership

Commands

Try It:

  • Take some time to cd around and explore your filesystem. See what is at the root, and see if you can find some of the files you use daily.
  • Create a directory on the Desktop called "code"
  • Move it from the Desktop to your home directory
  • cd into your new "code" directory
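One possible way through the exercise is sketched below. It assumes a throwaway sandbox directory (created with mktemp) standing in for your home directory, so nothing on your real Desktop is touched; on your own machine you would use ~/Desktop and ~ instead.

```shell
# Throwaway sandbox that stands in for your home directory (an assumption
# for this sketch; the exercise itself uses your real ~ and ~/Desktop).
sandbox="$(mktemp -d)"
mkdir "$sandbox/Desktop"

# Create a directory on the "Desktop" called "code"
mkdir "$sandbox/Desktop/code"

# Move it from the Desktop up into the stand-in home directory
mv "$sandbox/Desktop/code" "$sandbox/"

# cd into the new "code" directory and check where you ended up
cd "$sandbox/code"
pwd
ls -a
```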

Piping

echo "Hello, World." > hello.txt
# ">" Grabs the output from the command on the 
# left and feeds it to the file on the right

echo "Goodbye, World." >> hello.txt
# ">>" is similar to ">" except if the file already 
# exists it will append to the end of it rather than
# overwriting the file

grep World hello.txt
# grep is a command that searches a file for a
# regular expression, we can try it on hello.txt

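ls -a1 | grep '\.txt$'
# "|" takes the output of the command to the left and
# uses it as the input for the command to the right
# (the dot is escaped so it matches a literal "." and
# "$" anchors the match to the end of the name)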
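Pipes can be chained, so the output of one command can pass through several others. The sketch below reuses hello.txt from above inside a temporary directory; the variable names are only for illustration.

```shell
# Work in a scratch directory so no existing files are overwritten
cd "$(mktemp -d)"

echo "Hello, World." > hello.txt      # ">" creates (or overwrites) the file
echo "Goodbye, World." >> hello.txt   # ">>" appends a second line

# Three commands in one pipeline: find matching lines, count them,
# then strip the padding some versions of wc add around the number
matches="$(grep World hello.txt | wc -l | tr -d ' ')"
echo "lines containing World: $matches"

# Another chain: sort the lines, then keep only the first one
first="$(sort hello.txt | head -n 1)"
echo "first line alphabetically: $first"
```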

Additional Resources

 

Git/Github

  • Git: a version control system
  • GitHub: a hosting service for Git repositories

Git

# Initialize the local directory as a Git repository.
git init

# Add the files to your new repository's staging area.
git add <filename>

# View which files are in the staging area
git status

# Commit the files that you've staged in your local repository
# with a descriptive commit message.
git commit -m "<commit message>"

# Sets the new remote
git remote add origin <remote repository URL>

# Verifies the new remote URL
git remote -v

# Push the changes in your local repository to GitHub.
git push origin master
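
[Diagram: git add <filename> moves changes from the working directory to the staging area; git commit -m "<commit message>" records the staged changes in the local repository; git push sends them from the local repository to the remote.]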
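The whole cycle can be tried without a GitHub account. The sketch below assumes a bare repository in a temporary directory as a stand-in for the GitHub remote; the user name, email, and file name are placeholders.

```shell
work="$(mktemp -d)"

# Stand-in for GitHub: a bare repository on the local disk (assumption)
git init -q --bare "$work/remote.git"

# Initialize the local directory as a Git repository
mkdir "$work/project"
cd "$work/project"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

# Working directory -> staging area -> local repository
echo "Hello, World." > hello.txt
git add hello.txt
git status --short
git commit -q -m "Add hello.txt"

# Local repository -> remote
git remote add origin "$work/remote.git"
git remote -v
git push -q origin master

# The commit is now visible on the "remote" side
git --git-dir="$work/remote.git" log --oneline master
```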

 

Forking and Pull Requests

(Open Source Collaboration Model)

http://stackoverflow.com/questions/3611256/forking-vs-branching-in-github

# 1) Fork this repo in github
#    https://github.com/dmil/tools-of-the-trade

# 2) Make a local clone of your fork
git clone <url_of_your_fork>

# 3) Add yourself to the list of people who know how to fork
echo "Dhrumil Mehta" >> forkers.md

# 4) Commit your changes
git add forkers.md
git commit -m "Add dhrumil to list of people who fork"

# 5) Push the file to your own github
git push origin master

# 6) Issue a pull request to me on github

# 7) Wait for me to merge the pull request

# 8) Pull the changes from upstream
git remote add upstream https://github.com/dmil/tools-of-the-trade.git
git pull upstream master
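To rehearse the steps offline, the fork and the pull request can be imitated with local bare repositories. This is only a rough simulation under stated assumptions: upstream.git and fork.git stand in for the two GitHub repositories, the names and emails are placeholders, and the "merge" in step 7 is faked with a fetch because pull requests exist only on GitHub.

```shell
work="$(mktemp -d)"
cd "$work"

# Stand-in for the original repo on GitHub, seeded with one commit
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Upstream Owner"
git config user.email "owner@example.com"
echo "# People who know how to fork" > forkers.md
git add forkers.md
git commit -q -m "Start forkers list"
cd "$work"
git clone -q --bare seed upstream.git

# 1) "Fork": a second server-side copy of upstream
git clone -q --bare upstream.git fork.git

# 2) Make a local clone of your fork
git clone -q fork.git my-clone
cd my-clone
git config user.name "Example User"
git config user.email "user@example.com"

# 3-5) Edit, commit, and push to your own fork
echo "Example User" >> forkers.md
git add forkers.md
git commit -q -m "Add Example User to list of people who fork"
git push -q origin master

# 6-7) Simulated pull request merge: upstream takes master from the fork
git --git-dir="$work/upstream.git" fetch -q "$work/fork.git" master:master

# 8) Pull the changes from upstream
git remote add upstream "$work/upstream.git"
git pull -q upstream master
git log --oneline
```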

Branching & Merging

(and sometimes Pull Requests)

  • Branch - stored in .git folder within a repo
  • Fork - simply a clone in Github's servers (an entirely new copy of the repo you forked from)
  • Merge - merge between two branches
  • Pull - merge between two copies of a repository
  • Pull Request - a polite pull
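A branch and a merge can be tried entirely inside one local repository. The sketch below is a minimal example in a temporary directory; the branch name add-second-line, the file notes.txt, and the identity are all made up for illustration.

```shell
work="$(mktemp -d)"
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

# One commit on master to branch from
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Start notes"

# Branch: create a new line of work and commit to it
git checkout -q -b add-second-line
echo "second line" >> notes.txt
git commit -q -am "Add a second line"

# Merge: bring the branch's commits back into master
git checkout -q master
git merge -q add-second-line
git branch --merged
cat notes.txt
```

Because master had no new commits of its own, Git can simply move master forward to the branch tip (a fast-forward merge).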

Other Github Features

# Head over to GitHub and create a new repository 
# named username.github.io, where username is your username

# Clone your new repository
git clone <your new repo>

# Create an index.html file
cd <your_username>.github.io
echo "Hello World" > index.html

# Stage, commit, and push to github
git add --all
git commit -m "Create personal github page"
git push -u origin master

# Voila!
open http://<your_username>.github.io

Additional Resources

Web Browser / Scraping

Install Chrome Extensions

Scraping from HTML

  • Grabbing Tables
  • Scraping sites

Inspect Element

 


  • Right Click, View Source
  • Right Click, Inspect
  • HTML, CSS, and Javascript
  • The Network Tab
  • Manipulate URL patterns




Let's Try a Few:

HTTP - GET & POST

What is going on when a page loads?

GET                                       | POST
Requests data from a specific resource.   | Submits data to be processed by a specific resource.
Data is submitted as part of the URL      | Data is submitted in the request body
Less secure but faster                    | More secure but slower
Can be cached by browser                  | Not cached by browser
Length limited by URL size                | Max length determined by server
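The difference in where the data travels is easiest to see in the request text itself. The sketch below only prints what the two requests would roughly look like; it sends nothing. The host example.com, the path /search, and the parameter are hypothetical, and real HTTP separates lines with CRLF rather than the plain newlines used here for readability.

```shell
host="example.com"            # hypothetical host
path="/search"                # hypothetical path
params="q=data+journalism"    # form-encoded parameters (hypothetical)

# GET: the parameters ride along in the URL, and there is no body
print_get() {
  printf 'GET %s?%s HTTP/1.1\nHost: %s\n\n' "$path" "$params" "$host"
}

# POST: the URL stays clean and the parameters go in the request body,
# with headers describing the body's type and length
print_post() {
  printf 'POST %s HTTP/1.1\nHost: %s\n' "$path" "$host"
  printf 'Content-Type: application/x-www-form-urlencoded\n'
  printf 'Content-Length: %s\n\n%s\n' "${#params}" "$params"
}

print_get
print_post
```

With curl, a plain URL argument produces a GET, while adding -d "q=data+journalism" switches to a POST with that string as the body.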

HTTP - Request

CSS Selectors

Caveats

  • Legal Ramifications (Craigslist)
  • Rate limits / Accidental DDOS
  • Website can see your IP
  • Scraping is a scary word

Additional Resources

APIs

(When people want you to have data)

Two Examples

Let's build some

HTTP Requests

  • Open the Postman Chrome App

Regular Expressions

Additional Resources

Data Manipulation

(and validation)

Example of Command Line Data Manipulation

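# Download the roster (-L follows GitHub's redirect, -o saves it to a file)
curl -L -o roster.xlsx "https://github.com/dmil/tools-of-the-trade/blob/master/roster.xlsx?raw=true"
# Convert the spreadsheet to CSV (in2csv and csvcut come from csvkit)
in2csv roster.xlsx > roster.csv
# Take column 8, drop the header, strip quotes, split on "|" (--output-delimiter needs GNU cut), trim, count duplicates
csvcut roster.csv -c 8 | tail -n +2 | awk '{gsub("\"", "")}1' |  cut -d'|' -f 1-  --output-delimiter=$'\n' | sed 's/^[ \t]*//;s/[ \t]*$//'  |  sort | uniq -cd

# Or if I wanted to be even more obnoxious
in2csv roster.xlsx | csvcut -c 8 | tail -n +2 | awk '{gsub("\"", "")}1' |  cut -d'|' -f 1-  --output-delimiter=$'\n' | sed 's/^[ \t]*//;s/[ \t]*$//'  |  sort | uniq -cd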
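The same idea can be tried step by step on a tiny file using only standard tools. The sketch below is an assumption-laden stand-in: the sample rows and column names are invented (the real roster.xlsx has different columns), plain cut replaces csvcut because the sample has no commas inside fields, and tr replaces GNU cut's --output-delimiter.

```shell
cd "$(mktemp -d)"

# Invented sample data, for illustration only
cat > roster.csv <<'EOF'
name,tools
Ada,"Python | R"
Ben,"R | Excel"
Cy,"Python"
EOF

# One stage per line so each step can be inspected on its own:
cut -d',' -f 2 roster.csv |                           # keep the second column
  tail -n +2 |                                        # drop the header row
  tr -d '"' |                                         # strip the quotes
  tr '|' '\n' |                                       # one value per line
  sed 's/^[[:space:]]*//;s/[[:space:]]*$//' |         # trim whitespace
  sort |                                              # group equal values
  uniq -cd > counts.txt                               # count values seen more than once

cat counts.txt
```

Cutting the pipeline off after any stage and looking at the output is a quick way to verify that each transformation does what you expect.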

Data-Driven Storytelling

 

http://fivethirtyeight.com/contributors/dhrumil-mehta/