Introduction to Programming for Data Science
who am I?
Scope of this talk
- Overview of the open-source programming landscape for data science
- Miscellaneous useful topics you should know about
- Introduction to online resources for learning to code
- A basic toolkit for not getting stuck
- Biased toward data science and analysis
- Not comprehensive
some definitions to start with
Programming language: a formalized language for telling a computer (or other machine) what to do.
Command line (terminal): a way to interact with a computer by typing text commands and reading the output it displays.
Compiler/interpreter: the "translator" between you and the computer.
Integrated Development Environment (IDE): like Microsoft Word, but for writing code.
so... what is programming?
Writing a set of instructions, in a particular formalized language, for the computer to carry out.
why are there different languages?
Each language was created for a different purpose, so each one makes different things easy.
Common programming languages for data science
- Bash: Interacting with the computer, sometimes also parsing/cleaning data.
- Python: Good at data wrangling/modeling.
- R: A statistical programming language; great for tabular data and modeling.
- Julia: For math and fast processing; relatively new.
- SQL: Working with databases.
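To make "data wrangling" a bit more concrete, here is a minimal Python sketch that reads a small CSV and totals one column per group. It uses only the standard library (`csv`, `collections`); the city/sales data is made up for illustration, and in practice many people reach for a library such as pandas for this kind of task.

```python
import csv
import io
from collections import defaultdict

# Made-up CSV data, inlined so the example is self-contained.
RAW = """city,sales
Houston,120
Austin,80
Houston,60
Dallas,100
Austin,40
"""

def total_sales_by_city(text):
    """Parse CSV text and sum the 'sales' column for each city."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        # Every CSV value arrives as a string, so convert before adding.
        totals[row["city"]] += int(row["sales"])
    return dict(totals)

if __name__ == "__main__":
    for city, total in sorted(total_sales_by_city(RAW).items()):
        print(city, total)  # Austin 120, then Dallas 100, then Houston 180
```

The same group-and-sum pattern shows up constantly in data work, whatever language or library you end up using.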
Where to learn it? CodeSchool
> print("hello world")
Julia
Computation and parallel computing.
Where to learn it? Julia documentation.
> println("hello world")
SQL
Working with and querying databases.
Where to learn it? w3schools
> SELECT 'HELLO WORLD';
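One way to practice SQL without installing a database server (an assumption here, not something the talk prescribes) is Python's built-in `sqlite3` module, which can run SQLite entirely in memory. The sketch below runs the hello-world query from this slide and then a simple filtered query against a hypothetical `books` table.

```python
import sqlite3

def run_queries():
    """Create a throwaway in-memory database and run two queries against it."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    try:
        # Hypothetical table and rows, for illustration only.
        conn.execute("CREATE TABLE books (title TEXT, year INTEGER)")
        conn.executemany(
            "INSERT INTO books (title, year) VALUES (?, ?)",
            [("Old Book", 1999), ("Newer Book", 2010), ("Newest Book", 2015)],
        )
        # The same hello-world query as on the slide.
        hello = conn.execute("SELECT 'HELLO WORLD'").fetchone()[0]
        # A parameterized filter: titles published after 2000, oldest first.
        recent = conn.execute(
            "SELECT title FROM books WHERE year > ? ORDER BY year", (2000,)
        ).fetchall()
        return hello, [title for (title,) in recent]
    finally:
        conn.close()

if __name__ == "__main__":
    print(run_queries())  # ('HELLO WORLD', ['Newer Book', 'Newest Book'])
```

The `?` placeholders let the database fill in values safely instead of pasting them into the SQL string by hand.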
HTML/JavaScript
Web development and data visualization.
Where to learn it? Codecademy
<script> console.log("hello world") </script>
Other miscellaneous topics
- Version control
Regular Expressions
A very fancy find (and maybe replace). Play with the syntax at this website.
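As a sketch of "find and maybe replace", here is Python's built-in `re` module finding US-style dates and rewriting them as year-month-day. The sample sentence and the date pattern are illustrative assumptions; real-world date matching usually needs a stricter pattern.

```python
import re

# Illustrative input: dates written month/day/year.
text = "Launched 07/20/1969, returned 07/24/1969."

# Two digits / two digits / four digits, each captured as a group.
DATE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

# Find: with groups in the pattern, findall returns one tuple per match.
found = DATE.findall(text)
print(found)  # [('07', '20', '1969'), ('07', '24', '1969')]

# Replace: reorder the captured groups into year-month-day.
iso = DATE.sub(r"\3-\1-\2", text)
print(iso)  # Launched 1969-07-20, returned 1969-07-24.
```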
What does this look like practically?
Case study 1: when I was learning SQL from scratch
tried to build a database out of spreadsheets
realized I didn't know how to import a CSV into a MySQL database
realized that my unique key wasn't actually unique
read the MySQL documentation
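The two snags in this case study (importing a CSV, and a "unique" key that isn't unique) can be reproduced on a small scale. This sketch swaps in Python's built-in `sqlite3` for MySQL (in MySQL itself, CSV import is typically done with `LOAD DATA INFILE`), and the `people` table and its rows are hypothetical. It first counts key values in plain Python, then shows the database rejecting the load because of the `PRIMARY KEY` constraint.

```python
import csv
import io
import sqlite3
from collections import Counter

# Hypothetical spreadsheet export; note that id 2 appears twice.
CSV_TEXT = """id,name
1,Ada
2,Grace
2,Katherine
3,Mary
"""

def duplicate_keys(rows, key):
    """Return the key values that occur more than once (i.e., not actually unique)."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

def load_into_sqlite(rows):
    """Try to load rows into a table whose 'id' column must be unique."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
    try:
        with conn:  # commits on success, rolls back the whole batch on error
            conn.executemany(
                "INSERT INTO people (id, name) VALUES (:id, :name)", rows
            )
        return conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
    except sqlite3.IntegrityError as err:
        return f"load failed: {err}"
    finally:
        conn.close()

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

if __name__ == "__main__":
    print(duplicate_keys(rows, "id"))  # ['2']
    print(load_into_sqlite(rows))      # reports the failed load and its error
```

Checking for duplicate keys before loading is often quicker than debugging a failed import after the fact.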
Local communities to help
- NASA Slack
- Houston Data Visualization
Coding for Data Science
By Yulan Lin