Introduction to Programming for Data Science


Yulan Lin
@y3l2n

who am i?

Scope of this talk

  • Overview of the open-source programming landscape for data science
  • Miscellaneous useful topics you should know about
  • Introduction to online resources for learning to code
  • A basic toolkit for how not to get stuck

Caveats

  • Data-science-and-analysis bias
  • Not comprehensive

some definitions to start with

  • Programming language:
    A formalized language to tell a computer (or other machine) what to do.
  • Terminal:
    A way to interact with a computer by typing text commands and reading the text output it displays.
  • Interpreter:
    The "translator" between you and the computer.
  • Integrated Development Environment:
    Microsoft Word for programming.

so... what is programming?

Writing a set of instructions in a particular formalized language that you provide to the computer.

why are there different languages?

Each language has a different purpose: different things that it was created to do easily.

Common programming languages for data science

  • Bash: Interacting with the computer, sometimes also parsing/cleaning data.
  • Python: Good at data wrangling/modeling.
  • R: Statistical programming language - great for tabular data and modeling.
  • Julia: For math and fast processing - relatively new.
  • SQL: Working with databases.
  • HTML/JavaScript: Web development - often for interactive visualizations.

Bash

Interacting with the "terminal," or "shell."
Where to learn it? Codecademy, data36

> echo 'hello world'

Python

Data cleaning/analysis, building applications.
Where to learn it? Codecademy, DataCamp.
> print("hello world")

R

Data analysis/modeling.
Where to learn it? CodeSchool
> print("hello world")

Julia

Computation and parallel computing.
Where to learn it? Julia documentation.
> println("hello world")

SQL

Working with and querying databases.
Where to learn it? w3schools
> SELECT 'HELLO WORLD';
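
One way to try the query above without setting up a database server is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption here, swapped in for convenience; the talk does not name a database engine at this point.

```python
import sqlite3

# An in-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Run the same query as on the slide and read back the single row.
row = conn.execute("SELECT 'HELLO WORLD';").fetchone()
greeting = row[0]
print(greeting)

conn.close()
```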

HTML/JavaScript

Web development and data visualization.
Where to learn it? Codecademy
<script>
console.log("hello world") 
</script>

Other miscellaneous topics

GitHub

  • Version control
  • Code-sharing

Regular Expressions

Very fancy find-(and-maybe-replace). Play with the syntax at this website.
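
To make "find-(and-maybe-replace)" concrete, here is a small sketch using Python's built-in re module. The sample text and the deliberately loose email pattern are made up for illustration; real email matching is more involved.

```python
import re

# Hypothetical sample text, invented for this example.
text = "Contact: alice@example.com, bob@example.org"

# One or more word characters or dots, an "@", then more word characters or dots.
pattern = r"[\w.]+@[\w.]+"

# Find: collect every piece of text that matches the pattern.
found = re.findall(pattern, text)

# Replace: swap each match for a placeholder.
redacted = re.sub(pattern, "<email>", text)

print(found)
print(redacted)
```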

Notebooks & Interactive Programming

Resources to get started

How do I get started?

  • Pick a project
  • Apply it to something at work
  • Attend a hackathon (Houston, Space Apps)

How to get unstuck?

What does this look like practically?

Case study 1: When I was learning SQL from scratch

stackoverflow

try to build a database out of spreadsheets

realized I didn't know how to import a CSV into a MySQL database

w3schools tutorials

realized that my unique key wasn't unique

read MySQL documentation
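
A problem like the non-unique key in this case study can be caught before the import. The sketch below assumes a hypothetical spreadsheet export with an "id" column meant to be the unique key; the data and column names are invented, and it uses only Python's standard library rather than MySQL itself.

```python
import csv
import io
from collections import Counter

# Hypothetical spreadsheet export; "id" is the intended unique key.
csv_text = """id,name
1,Ada
2,Grace
2,Katherine
3,Margaret
"""

# Read the CSV into a list of dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Count how often each key value appears and keep the repeated ones.
counts = Counter(row["id"] for row in rows)
duplicates = sorted(key for key, n in counts.items() if n > 1)

print(duplicates)
```

With a real file, io.StringIO(csv_text) would be replaced by an opened file handle.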

Local communities to help

  • NASA Slack
  • Houston Data Visualization

Questions?

Coding for Data Science

By Yulan Lin
