SQL and SQL Databases
SQL = Structured Query Language
What is it?
A programming language for managing data in a relational database.
Relational Databases
grades
| Student | Trig | Alg | Geom | Calc |
|---|---|---|---|---|
| Susie | 1.3 | 1.3 | 3.7 | 2.3 |
| Jay | 4.0 | 2.0 | 2.3 | |
| Lara | 1.3 | 1.0 | 3.0 | |
students
| First | Last | ID | Uni | Cats |
|---|---|---|---|---|
| Susie | Jones | 45 | TUM | 0 |
| Jay | Sun | 48 | LMU | 6 |
| Lara | Blue | 66 | LMU | 1 |
unis
| Uni | City | Students | Courses |
|---|---|---|---|
| LMU | Munich | [48, 66] | [Trig, Alg] |
| TUM | Munich | [45] | [Geom, Calc] |
| Ulm | Ulm | [] | [Trig, Alg, Calc] |
courses
Geo
Alg
Trig
| ID | Prof ID | Students |
|---|---|---|
| 1 | 44 | [45, 48, 66] |
| 2 | 154 | [45, 66] |
| 3 | 22 | [45, 48] |
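A minimal sketch of how tables like these can be created and queried with SQL, using Python's built-in `sqlite3` module and an in-memory database. The schema (column names and types) is an assumption for illustration, and the rows assume that the first names line up with the table rows in order. The list-valued columns from the slides are left out.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed schema, loosely modeled on the "students" and "unis" tables.
cur.execute(
    "CREATE TABLE students "
    "(id INTEGER PRIMARY KEY, first TEXT, last TEXT, uni TEXT, cats INTEGER)"
)
cur.execute("CREATE TABLE unis (name TEXT PRIMARY KEY, city TEXT)")

cur.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [
        (45, "Susie", "Jones", "TUM", 0),
        (48, "Jay", "Sun", "LMU", 6),
        (66, "Lara", "Blue", "LMU", 1),
    ],
)
cur.executemany(
    "INSERT INTO unis VALUES (?, ?)",
    [("LMU", "Munich"), ("TUM", "Munich"), ("Ulm", "Ulm")],
)

# The "relational" part: link the two tables through the university name
# and ask in which city each cat owner studies.
rows = cur.execute(
    "SELECT s.first, u.city "
    "FROM students AS s JOIN unis AS u ON s.uni = u.name "
    "WHERE s.cats > 0 ORDER BY s.id"
).fetchall()
conn.close()

print(rows)
```

The `JOIN ... ON` clause is what connects one table to another; everything else is filtering and sorting.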
Movie rental store 🍿
APIs
Application Programming Interface
Think of APIs like a waiter at a restaurant:
Monte Carlo (MC) Methods
Use random sampling to estimate a very complicated probability distribution
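A minimal sketch of the random-sampling idea, using the classic toy problem of estimating pi. Here the target is an area rather than a complicated probability distribution, but the principle is the same: draw many random samples and count. The function name `estimate_pi` and the fixed seed are assumptions made for this example.

```python
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()  # uniform point in the unit square
        if x * x + y * y <= 1.0:           # inside the quarter circle of radius 1
            inside += 1
    # Quarter-circle area / square area = pi / 4
    return 4.0 * inside / n_samples


pi_estimate = estimate_pi(100_000)
print(pi_estimate)
```

The error shrinks roughly like one over the square root of the number of samples, so more samples give a better estimate, but slowly.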
SciPy
Tons of tools and highly optimized algorithms for doing math and science
scipy.constants
scipy.stats
scipy.integrate
scipy.interpolate
scipy.optimize
scipy.fft
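A small taste of three of these submodules, as a sketch. The integrand and the function being minimized are arbitrary examples chosen for illustration.

```python
import numpy as np
from scipy import constants, integrate, optimize

# scipy.constants: physical constants in SI units.
speed_of_light = constants.c  # m/s

# scipy.integrate: numerically integrate exp(-x^2) over the whole real line.
# The exact answer is sqrt(pi).
area, abs_error = integrate.quad(lambda x: np.exp(-x**2), -np.inf, np.inf)

# scipy.optimize: find the minimum of a one-dimensional function.
# (x - 2)^2 + 1 has its minimum at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)

print(speed_of_light, area, result.x)
```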
Goals for today
Tutorial: code up our own ML model
What is ML?
Machine Learning:
Inputs → f → Outputs
We assume that everything has an underlying function, no matter how complex
Is it a cat or a dog?
What is the price of the house?
How many classes of galaxies are there?
How can you help a student pick which degrees would be most interesting for them?
| Student | Trig | Alg | Music | Hist |
|---|---|---|---|---|
| Susie | 1.3 | 1.3 | 3.7 | 2.3 |
| Jay | 4.0 | 2.0 | 2.3 | |
| Lara | 1.3 | 1.0 | 3.0 | |

| Student | Trig | Alg | Music | Hist |
|---|---|---|---|---|
| Ana | 1.3 | 1.3 | 3.7 | 2.3 |
| Juli | 4.0 | 2.0 | 2.3 | |
| Max | 1.3 | 1.0 | 3.0 | |
In most cases, we will never be able to determine the exact function
ML: algorithms that approximate these complex, non-linear functions as well as possible, without manual intervention
There are hundreds of algorithms to choose from!
ML Algorithms
Supervised
Unsupervised
Supervised learning in a nutshell
Inputs (known) → f → Outputs (known)
Use labeled data to slowly push the unknown function towards correctness
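One way to picture "slowly pushing" a function towards correctness is gradient descent, sketched here for the simplest possible model `y = w * x`. The data (generated from `y = 3x`), the learning rate, and the number of steps are all assumptions chosen for illustration; many supervised algorithms are trained differently.

```python
import numpy as np

# Labeled data: inputs x with known outputs y (made up here from y = 3x).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0               # initial guess for the unknown function y = w * x
learning_rate = 0.01  # how big each push is

for step in range(500):
    predictions = w * x
    # Gradient of the mean squared error with respect to w.
    gradient = 2.0 * np.mean((predictions - y) * x)
    # Nudge w in the direction that reduces the error.
    w -= learning_rate * gradient

print(w)
```

Each pass compares the predictions with the known labels and moves `w` a little closer to the value that explains the data.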
Supervised
Classification: output is one of a limited set of categories
Regression: output is somewhere along a number line
Unsupervised learning in a nutshell
Inputs (known) → f → Outputs (unknown)
Use unlabeled data and allow the algorithm to find patterns on its own
Unsupervised
Clustering
Dimensionality reduction
Clustering
Feed it tons of galaxy images
Too many to label by hand!
Why not let an algorithm figure out how many classes there are?
Unsupervised
Dimensionality
Reducing the dimensionality of your data
Unsupervised
| Student | Trig | Alg | Music | Hist |
|---|---|---|---|---|
| Susie | 1.3 | 1.3 | 3.7 | 2.3 |
| Jay | 4.0 | 2.0 | 2.3 | |
| Lara | 1.3 | 1.0 | 3.0 | |

| Student | Trig | Alg | Music | Hist |
|---|---|---|---|---|
| Ana | 1.3 | 1.3 | 3.7 | 2.3 |
| Juli | 4.0 | 2.0 | 2.3 | |
| Max | 1.3 | 1.0 | 3.0 | |
How can you help a student pick which degree would be most interesting for them?
Now imagine you had 100 grades to look at per student
| Student | Math | Humanities |
|---|---|---|
| Susie | 1.3 | 1.2 |
| Jay | 3.0 | 2.3 |
| Lara | 1.1 | 2.0 |
Let the algorithm find 'hidden axes' in your data
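A sketch of finding such hidden axes with principal component analysis (PCA), computed via NumPy's singular value decomposition. PCA is one common technique, not necessarily the one meant on the slide. The grades are synthetic and hypothetical: each made-up student gets a "math" and a "humanities" ability, and four noisy grades are derived from those two numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students = 200

# Hypothetical hidden abilities (NOT real data).
math_skill = rng.normal(size=n_students)
humanities_skill = rng.normal(size=n_students)


def noisy(skill):
    """Return the skill plus a little random noise, as a stand-in for one grade."""
    return skill + 0.1 * rng.normal(size=skill.shape)


# Four observed columns: Trig, Alg, Music, Hist.
grades = np.column_stack([
    noisy(math_skill), noisy(math_skill),
    noisy(humanities_skill), noisy(humanities_skill),
])

# PCA: center the data, then take the SVD. Rows of Vt are the new axes.
centered = grades - grades.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Fraction of the total variance captured by each axis.
explained = S**2 / np.sum(S**2)

# Keep only the two strongest axes: 4 columns become 2.
reduced = centered @ Vt[:2].T

print(explained.round(3), reduced.shape)
```

In this toy setup two axes capture almost all of the variation, so the four grade columns can be replaced by two numbers per student with little loss. The axes found are not guaranteed to line up exactly with "Math" and "Humanities"; they only span the same two-dimensional space.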
ML Algorithms
Supervised: Classification, Regression
Unsupervised: Clustering, Dimensionality reduction
Just some of the algorithms that are out there:
Time for today:
The standard pipeline
Linear Regression
Usually has an analytical solution!
Supervised + Regression
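A sketch of the analytical solution for least-squares linear regression, the normal equations, which give the best-fit coefficients as the solution of (XᵀX) β = Xᵀy. The data points are made up and noise-free so the result is easy to check by eye.

```python
import numpy as np

# Made-up data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: a column of ones (intercept) next to the inputs (slope).
X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (X^T X) beta = X^T y for beta.
beta = np.linalg.solve(X.T @ X, X.T @ y)
intercept, slope = beta

print(intercept, slope)
```

No iterative training loop is needed here: the fit comes from solving one small linear system.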
Logistic Regression
Variant of linear regression for binary classification
Supervised + Classification
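A minimal sketch of logistic regression in plain NumPy: a linear function `w * x + b` is squashed through a sigmoid to give a probability between 0 and 1, and the parameters are fitted by gradient descent. The six data points, the learning rate, and the step count are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical 1D data: class 0 for negative x, class 1 for positive x.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
labels = np.array([0, 0, 0, 1, 1, 1])

w, b = 0.0, 0.0
learning_rate = 0.1

for _ in range(1000):
    p = sigmoid(w * x + b)  # predicted probability of class 1
    # Gradients of the average cross-entropy loss.
    w -= learning_rate * np.mean((p - labels) * x)
    b -= learning_rate * np.mean(p - labels)

# Turn probabilities into hard class decisions at the 0.5 threshold.
predictions = (sigmoid(w * x + b) >= 0.5).astype(int)
print(predictions)
```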
k-nearest neighbors
Supervised + Classification
k-nearest neighbors
Supervised + Regression
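A sketch of k-nearest neighbors for both uses: classification takes a majority vote among the k closest training points, regression averages their values. The function names, the tiny dataset, and the "cat"/"dog" and price values are all hypothetical.

```python
from collections import Counter

import numpy as np


def k_nearest(train_X, query, k):
    """Indices of the k training points closest to the query (Euclidean distance)."""
    distances = np.linalg.norm(train_X - query, axis=1)
    return np.argsort(distances)[:k]


def knn_classify(train_X, train_labels, query, k=3):
    """Majority vote among the k nearest neighbors."""
    idx = k_nearest(train_X, query, k)
    return Counter(train_labels[i] for i in idx).most_common(1)[0][0]


def knn_regress(train_X, train_values, query, k=3):
    """Average value of the k nearest neighbors."""
    idx = k_nearest(train_X, query, k)
    return float(np.mean(train_values[idx]))


# Two small groups of made-up points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
classes = ["cat", "cat", "cat", "dog", "dog", "dog"]
prices = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])

label = knn_classify(X, classes, np.array([0.5, 0.5]), k=3)
value = knn_regress(X, prices, np.array([5.5, 5.5]), k=3)
print(label, value)
```

There is no training step: the "model" is the stored data, and all the work happens at prediction time.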
Hyperparameters: choosing k
k is a so-called "hyperparameter"
A hyperparameter is a parameter you choose before training
Optimization of hyperparameters is an art!
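One simple (and far from the only) way to pick k: hold back part of the labeled data as a validation set, try several values, and keep the one that scores best. The synthetic two-blob dataset, the candidate list, and the 140/60 split are assumptions made for this sketch.

```python
import numpy as np


def knn_predict(train_X, train_y, query, k):
    """Majority vote of the k nearest neighbors for 0/1 labels (k should be odd)."""
    distances = np.linalg.norm(train_X - query, axis=1)
    nearest_labels = train_y[np.argsort(distances)[:k]]
    return int(np.round(nearest_labels.mean()))


# Synthetic data: two overlapping Gaussian blobs, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Shuffle, then split into training and validation sets.
order = rng.permutation(len(X))
train, val = order[:140], order[140:]

candidates = [1, 3, 5, 7, 9]  # odd values avoid tied votes
scores = {}
for k in candidates:
    predictions = np.array(
        [knn_predict(X[train], y[train], query, k) for query in X[val]]
    )
    scores[k] = float(np.mean(predictions == y[val]))  # validation accuracy

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The key point is that k is scored on data the model did not see while "training", so a value that merely memorizes the training set is not rewarded.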
⚠️ Overfitting and Underfitting
Decision Trees
Supervised + Regression/Classification
k-means clustering
Unsupervised + Clustering
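A sketch of the standard k-means loop (often called Lloyd's algorithm): assign every point to its nearest center, move each center to the mean of its points, repeat until nothing changes. The function name `k_means`, the random initialization, and the two synthetic blobs are assumptions for this example. Note that no labels are used anywhere.

```python
import numpy as np


def k_means(points, k, n_iters=50, seed=0):
    """Cluster points into k groups; returns (centers, assignment)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Distance from every point to every center: shape (n_points, k).
        distances = np.linalg.norm(
            points[:, None, :] - centers[None, :, :], axis=2
        )
        assignment = distances.argmin(axis=1)
        # Move each center to the mean of its points (keep it if it has none).
        new_centers = np.array([
            points[assignment == j].mean(axis=0)
            if np.any(assignment == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged
        centers = new_centers
    return centers, assignment


# Synthetic, unlabeled data: two well-separated blobs.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
                    rng.normal(5.0, 0.3, size=(50, 2))])

centers, assignment = k_means(points, k=2)
print(centers.round(2))
```

As with k-nearest neighbors, k is a hyperparameter: the algorithm finds the groups, but the number of groups has to be chosen beforehand.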
Random Forests
Supervised + Regression/Classification
To the notebook!
The End