Who am I?
Follow along
Scope of this talk
- Overview of the open-source programming landscape for data science
- Miscellaneous useful topics you should know about
- Introduction to online resources for learning to code
- A basic toolkit for how not to get stuck
Caveats
- Data-science-and-analysis bias
- Not comprehensive
Some definitions to start with
- Programming language: A formalized language to tell a computer (or other machine) what to do.
- Terminal: A way to "interact" with a computer through typed text commands and displayed output.
- Interpreter: The "translator" between you and the computer.
- Integrated Development Environment: Microsoft Word for programming.
So... what is programming?
Writing a set of instructions in a particular formalized language that you provide to the computer.
Why are there different languages?
Each language has a different purpose: different things that it was created to do easily.
Common programming languages for data science
- Bash: Interacting with the computer, sometimes also parsing/cleaning data.
- Python: Good at data wrangling/modeling.
- R: Statistical programming language - great for tabular data and modeling.
- Julia: For math and fast processing - relatively new.
- SQL: Working with databases.
- HTML/JavaScript: Web development - often for interactive visualizations.
Bash
Interacting with the "terminal," or "shell."
Where to learn it? Codecademy, data36.
> echo 'hello world'
Python
Data cleaning/analysis, building applications.
Where to learn it? Codecademy, DataCamp.
> print("hello world")
Julia
Computation and parallel computing.
Where to learn it? Julia documentation.
> println("hello world")
HTML/JavaScript
Web development and data visualization.
Where to learn it? Codecademy.
<script>
console.log("hello world")
</script>
Other miscellaneous topics
GitHub
- Version control
- Code-sharing
Regular Expressions
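A minimal sketch of find and find-and-replace with a regular expression, using Python's built-in `re` module. The sample text and the deliberately simple email pattern are made up for illustration; real email matching needs a more careful pattern.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Contact: ada@example.com, grace@example.org"

# Simplified pattern: word characters or dots, an @ sign, then a dotted domain.
email_pattern = re.compile(r"[\w.]+@[\w.]+\.\w+")

# "Find": collect every substring that matches the pattern.
found = email_pattern.findall(text)

# "Replace": swap every match for a placeholder.
redacted = email_pattern.sub("<email>", text)

print(found)
print(redacted)
```

The same pattern drives both steps, which is the main appeal: describe the shape of the text once, then search or replace with it.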
Very fancy find-(and-maybe-replace). Play with the syntax at this website.
Notebooks & Interactive Programming
Resources to get started
How do I get started?
- Pick a project
- Apply it to something at work
- Attend a hackathon (Houston, Space Apps)
How to get unstuck?
- Rubber ducky method
- Stack Overflow
- Data Science Stack Exchange
- Online communities (Twitter, Slack)
What does this look like practically?
Case study 1: Learning SQL from scratch
- Stack Overflow
- Tried to build a database out of spreadsheets
- Realized I didn't know how to import a CSV into a MySQL database
- W3Schools tutorials
- Realized that my unique key wasn't unique
- Read the MySQL documentation
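The "unique key wasn't unique" surprise can often be caught before the import. Below is a rough sketch, not the workflow from the case study: it assumes the spreadsheet was exported as a CSV, and the function name `find_duplicate_keys`, the column `sample_id`, and the sample rows are all hypothetical.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(csv_file, key_column):
    """Return the values in key_column that appear more than once."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[key_column] for row in reader)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical spreadsheet export; in practice, pass an opened CSV file.
sample = io.StringIO(
    "sample_id,mass_g\n"
    "A1,10.2\n"
    "A2,9.8\n"
    "A1,10.4\n"
)

duplicates = find_duplicate_keys(sample, "sample_id")
print(duplicates)
```

An empty result suggests the column is safe to use as a key; anything else lists the values to clean up first.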
Local communities to help
- NASA Slack
- Houston Data Visualization
Questions?
Coding for Data Science
By Yulan Lin