1.3 Six Elements of ML
A defining framework for understanding concepts in the course

Recap: Machine Learning
What did we see in the previous chapter?
(c) One Fourth Labs

Repeat the last slide of the previous chapter
A jargon cloud
How do you make sense of all the jargon?


From jargon to jars
What are the six jars of Machine Learning?







Data data everywhere
What is the fuel of Machine Learning?



















Data data everywhere
How do you feed data to machines?

We encode all data as numbers, typically as high-dimensional vectors
For instance, in this course you will learn to embed image and text data as large vectors
Data entries are related, e.g. given an MRI scan, whether there is a tumour or not
Include a table that shows two or three MRI scans in the first column, large vectors in the second column, and 1/0 in the last column for whether there is a tumour or not
Include a table that shows two or three reviews in the first column, large vectors in the second column, and 1/0 in the last column for whether the review is positive or negative
Title the columns x and y







Input-1 | Input-2 | Input-3 | Input-4 | y |
---|---|---|---|---|
2.3 | 5.9 | 11.0 | -10.3 | 0 |
-8.5 | -1.7 | -1.3 | 9.0 | 0 |
12.3 | 5.4 | 3.4 | 2.4 | 1 |
1.9 | 7.9 | 8.1 | -3.3 | 1 |
-9.1 | 1.2 | -2.1 | 7.8 | 0 |
3.2 | -11.2 | 5.6 | 12.1 | 1 |
4.5 | 3.75 | -1.2 | -10.0 | 1 |
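
A table like the one above is typically held as a matrix of inputs and a vector of labels. The following is a minimal sketch, assuming NumPy is available; the variable names `table`, `X` and `y` are chosen here for illustration.

```python
import numpy as np

# Rows copied from the table above: four input columns, then the label y.
table = np.array([
    [ 2.3,   5.9, 11.0, -10.3, 0],
    [-8.5,  -1.7, -1.3,   9.0, 0],
    [12.3,   5.4,  3.4,   2.4, 1],
    [ 1.9,   7.9,  8.1,  -3.3, 1],
    [-9.1,   1.2, -2.1,   7.8, 0],
    [ 3.2, -11.2,  5.6,  12.1, 1],
    [ 4.5,  3.75, -1.2, -10.0, 1],
])

X = table[:, :4]             # inputs x, one row per example
y = table[:, 4].astype(int)  # labels y, stored as 1/0

print(X.shape, y.shape)
```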


Data data everywhere
How do you feed data to machines?
Input-1 | Input-2 | Input-3 | Input-4 | y |
---|---|---|---|---|
4.3 | 5.9 | 1.0 | 13.2 | Positive |
-9.5 | 1.7 | 1.3 | 9.2 | Positive |
2.3 | 5.4 | 3.8 | 2.9 | Negative |
19.1 | 8.9 | 8.2 | -3.3 | Positive |
-9.2 | 11.2 | -12.1 | 1.8 | Positive |
4.5 | -11.2 | 4.6 | 2.1 | Negative |
12.2 | -3.8 | 0.2 | -1.0 | Negative |
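
Text labels such as Positive/Negative also have to become numbers before a model can use them. A minimal sketch, assuming the convention Positive = 1 and Negative = 0 (the mapping name `label_to_int` is hypothetical):

```python
# Labels copied from the y column of the table above.
labels = ["Positive", "Positive", "Negative", "Positive",
          "Positive", "Negative", "Negative"]

# Assumed convention: Negative -> 0, Positive -> 1.
label_to_int = {"Negative": 0, "Positive": 1}
y = [label_to_int[label] for label in labels]

print(y)
```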










Data data everywhere
How do you feed data to machines?





1.3 | -4.3 | 2.1 | -6.7 | ... | 1.5 | 8.9 | 10.1 | -4.5 |
2.6 | 7.9 | -0.3 | 8.1 | ... | -4.2 | 0.3 | -11.2 | 9.4 |
-5.2 | -3.2 | 4.2 | 0.3 | ... | 3.5 | 8.3 | -1.4 | -8.7 |
8.5 | 2.1 | -6.3 | 5.3 | ... | 7.2 | -1.3 | -4.5 | 11.8 |
2.3 | -5.6 | -1.2 | 7.8 | ... | 9.9 | 10.1 | -12.1 | 3.5 |
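
One simple way to turn an image into a large vector is to flatten its pixel grid. This is only a sketch on a hypothetical 4x4 grayscale array; it is not the embedding method taught later in the course, and real scans would be far larger.

```python
import numpy as np

# Hypothetical 4x4 grayscale "image" with pixel values 0..15.
image = np.arange(16, dtype=float).reshape(4, 4)

# Row-major flattening: the 4x4 grid becomes one vector of length 16.
x = image.flatten()

print(x.shape)
```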
Data curation
Where do I get the data from?

Source data from existing datasets
- Google datasets
- Mitesh link on datasets
- data.gov.in, etc. => Assignment: Go check out this website and formulate ML problems
Collect data yourself or with others => Dataturks => Assignment: create a project, upload 5 images of signboards, and ask five friends to label them
- Take pictures of Indian dishes
- Labelling of data
Create data specific to your problem
- Also in capstone
Data data everywhere
What is the fuel of Machine Learning?

Show data jars
Tasks
What do you do with this data?

Consider the case of Amazon with product data
Multiple tasks can be done with this:
1. From product description to structured specs
2. From specs + reviews to writing FAQs
3. From specs + reviews + FAQs to question answering
4. From specs + reviews + personal data to recommendations
Tasks
What do you do with this data?

Consider the case of Facebook photos
Multiple tasks can be done with this:
1. From photos identify people, places, activities
2. From posts + personal data recommend posts
3. From video detect profanity, etc.
Tasks
What do you do with this data?

Different types of tasks:
1. Supervised
- Classification - text or no text
- Regression - fitting bounding boxes (more later)
2. Unsupervised
- Clustering - clustering news articles by similarity
- Generation - deep art, deep poetry
Most real-world ML tasks (90%) are supervised. This course will focus almost exclusively on this class of problems, apart from a few easter eggs.
Supervised ML is about finding y given x
Tasks
What do you do with this data?

For the data labelled on Dataturks, define at least 3 tasks that you can perform
// Binary classification of whether there is text
// Detect text with bounding box - is accuracy easy to define here?
Tasks
What do you do with this data?

Show data, tasks jars
What is the mathematical formulation of a task?

\( x \)
\( y \)
bat
car
dog
cat
Models
\( \left[ \begin{array}{ccccc} 0.2 & 0.1 & 0.7 & \cdots & 0.8 \end{array} \right] \)
\( \left[ \begin{array}{cccc} 0 & 0 & 1 & 0 \end{array} \right] \)
Now show cat, then car, then ship, then dog again, and keep growing the matrix
\( y = f(x) \) [true relation, unknown]
\( \hat{y} = \hat{f}(x) \) [our approximation]
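
A label vector such as [0, 0, 1, 0] can be produced by one-hot encoding a class name. A minimal sketch, assuming the class order bat, car, dog, cat (the order is an assumption, as is the helper name `one_hot`):

```python
import numpy as np

# Assumed class order; the position of the 1 identifies the class.
classes = ["bat", "car", "dog", "cat"]

def one_hot(label, classes):
    """Return a vector of zeros with a single 1 at the label's position."""
    vec = np.zeros(len(classes), dtype=int)
    vec[classes.index(label)] = 1
    return vec

y = one_hot("dog", classes)
print(y)
```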
Models
What are the choices for \( \hat{f} \) ?

- Show some points sampled from this function
- Say that there is some complex relation between x and y
- Naively, I assume that it is \( y = mx + c \)
- No matter how I adjust m and c, I can't make \( f \) and \( \hat{f} \) equal
- Let's try the next function... better... better, better
\( x = \left[ \begin{array}{c} 0.2 \\ 0.1 \\ 0.7 \\ \vdots \\ 0.8 \end{array} \right] \)
\( y = \left[ \begin{array}{c} 2.2 \\ 3.1 \\ 0.7 \\ \vdots \\ 4.8 \end{array} \right] \)
\( y = mx + c \)
\( y = ax^2 + bx + c \)
\( y = ax^3 + bx^2 + cx + d \)
\( y = ax^4 + bx^3 + cx^2 + dx + e \)
\( y = \sigma(wx + b) \)
\( y = \mathrm{AlexNet}(x) \)
\( \hat{y} = \hat{f}(x) \) [our approximation]
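
Trying increasingly flexible choices for the approximation can be sketched with polynomial fits. This is an illustration only: `np.sin(3.0 * x)` stands in for the unknown true relation, and the degrees 1 to 4 mirror the polynomial candidates listed above.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 50)
y = np.sin(3.0 * x)  # hypothetical stand-in for the unknown true f

train_mse = {}
for degree in (1, 2, 3, 4):
    coeffs = np.polyfit(x, y, degree)  # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)      # predictions of the fitted model
    train_mse[degree] = float(np.mean((y - y_hat) ** 2))
    print(degree, train_mse[degree])
```

On the points used for fitting, the error cannot increase as the degree grows, because each lower-degree polynomial is a special case of the next one.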
Models
Why not always use a complex model?

\( x = \left[ \begin{array}{c} 0.2 \\ 0.1 \\ 0.7 \\ \vdots \\ 0.8 \end{array} \right] \)
\( y = \left[ \begin{array}{c} 0.4 \\ 0.2 \\ 1.4 \\ \vdots \\ 1.6 \end{array} \right] \)

This will be replaced by a simple line
The animation will show that it is easy to fit a line but difficult to fit a degree-100 polynomial
\( y = mx + c \) [true function, simple]
\( \hat{y} = ax^{100} + bx^{99} + \dots + c \) [our approximation, very complex]
- Overkill
- Harder to learn
- Needs more data
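
The cost of an overly complex model can be sketched by fitting a line and a higher-degree polynomial to a few noisy points drawn from a simple line. Everything here is an assumption made for illustration: the true line, the noise level, the seed, and degree 7 in place of degree 100.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical simple true function.
    return 2.0 * x + 1.0

# Ten noisy training points and fifty noise-free test points.
x_train = np.linspace(-1.0, 1.0, 10)
y_train = true_f(x_train) + rng.normal(scale=0.1, size=x_train.shape)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = true_f(x_test)

def fit_and_score(degree):
    """Fit a polynomial on the training points; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return float(train_mse), float(test_mse)

line_train, line_test = fit_and_score(1)
poly_train, poly_test = fit_and_score(7)
print("line:", line_train, line_test)
print("degree 7:", poly_train, poly_test)
```

The complex model always matches the training points at least as well as the line; whether it does worse on the held-out points depends on the noise, which is why both numbers are printed rather than assumed.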
Models
What are the choices for \( \hat{f} \) ?

Add model jar
Loss Function
How do we know which model is better?

\( x = \left[ \begin{array}{c} 0.2 \\ 0.1 \\ 0.7 \\ \vdots \\ 0.8 \end{array} \right] \)
\( y = \left[ \begin{array}{c} 0.4 \\ 0.2 \\ 1.4 \\ \vdots \\ 1.6 \end{array} \right] \)
\( \hat{f}_1(x) = ax^2 + bx + c \)
\( \hat{f}_2(x) = ax^3 + bx^2 + cx + d \)
\( \hat{f}_3(x) = ax^4 + bx^3 + cx^2 + dx + e \)
Show plots for the true \( f \) and for \( \hat{f}_1 \), \( \hat{f}_2 \), \( \hat{f}_3 \). From the plots alone it will not be clear which model is better,
but from the columns of predictions it will be clear.
Why is it clear? Because you are computing some numbers.
Prediction columns: \( y_1 \), \( y_2 \), \( y_3 \)
Loss Function
What does a loss function look like?

\( x = \left[ \begin{array}{c} 0.2 \\ 0.1 \\ 0.7 \\ \vdots \\ 0.8 \end{array} \right] \)
\( y = \left[ \begin{array}{c} 0.4 \\ 0.2 \\ 1.4 \\ \vdots \\ 1.6 \end{array} \right] \)
\( \hat{f}_1(x) = ax^2 + bx + c \)
\( \hat{f}_2(x) = ax^3 + bx^2 + cx + d \)
\( \hat{f}_3(x) = ax^4 + bx^3 + cx^2 + dx + e \)
Show the squared error loss
Compute the error for the prediction columns \( y_1 \), \( y_2 \), \( y_3 \)
Indeed, the second model (column \( y_2 \)) seems to be the better one
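
Comparing models by squared error can be sketched as follows. Only the four visible entries of y come from the slide; the three prediction columns are hypothetical numbers chosen for illustration, and the names `y_hat_1` to `y_hat_3` are assumptions.

```python
import numpy as np

# The four visible entries of y from the slide.
y_true = np.array([0.4, 0.2, 1.4, 1.6])

# Hypothetical predictions of three candidate models.
predictions = {
    "f_hat_1": np.array([0.9, 0.6, 1.0, 1.1]),
    "f_hat_2": np.array([0.5, 0.2, 1.3, 1.6]),
    "f_hat_3": np.array([0.1, 0.5, 1.8, 1.2]),
}

# Squared error loss: sum over examples of (true - predicted)^2.
losses = {name: float(np.sum((y_true - y_hat) ** 2))
          for name, y_hat in predictions.items()}

best = min(losses, key=losses.get)
print(losses, best)
```

A single number per model makes the comparison unambiguous in a way that overlapping plots do not.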
Loss Function
What does a loss function look like?

Add jar for loss function and have a recap
Who will give us the parameters?
Learning Algorithm
How do we identify parameters of the model?

Show data on LHS
A box for learning on the RHS
A complex model equation on top of the box
Loss function at the bottom of the box
In Plain English:
Say that this is a search problem
The simplest algorithm is a brute-force search over this 3-dimensional parameter space
But now imagine what happens if you have more than 3 parameters!
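
Brute-force search over three parameters can be sketched as a grid search. The data, the quadratic model, and the grid from -3 to 3 in steps of 0.5 are all assumptions made for illustration.

```python
import itertools
import numpy as np

# Hypothetical data generated from a quadratic with (a, b, c) = (1, -2, 0.5).
x = np.linspace(-2.0, 2.0, 20)
y = 1.0 * x**2 - 2.0 * x + 0.5

# 13 candidate values per parameter, so 13**3 = 2197 combinations to try.
grid = np.linspace(-3.0, 3.0, 13)

best_params, best_loss = None, float("inf")
for a, b, c in itertools.product(grid, repeat=3):
    loss = np.sum((y - (a * x**2 + b * x + c)) ** 2)  # squared error loss
    if loss < best_loss:
        best_params, best_loss = (a, b, c), loss

print(best_params, best_loss)
```

With k parameters the same grid needs 13**k evaluations, which is why brute force stops being practical as models grow.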
Learning Algorithm
How do we identify parameters of the model?
In Plain English:
We want to find the parameters a, b, c such that when we plug an input x into \( \hat{f}(x) \), the output is as close as possible to the true output
Mathematically,
Optimization problem
Minimization function
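
One way to write this down, assuming a squared error loss and a quadratic model with parameters a, b, c over N training pairs \( (x_i, y_i) \) (the symbol N is introduced here for illustration):

\[ \min_{a, b, c} \; \sum_{i=1}^{N} \left( y_i - \hat{f}(x_i) \right)^2, \qquad \hat{f}(x) = ax^2 + bx + c \]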
Learning Algorithm
How do we identify parameters of the model?
Now show images of Gradient Descent, Adam, Adagrad, etc. with citations
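
Of the algorithms named above, plain gradient descent is the simplest to sketch. This is a minimal illustration, assuming a line model, a mean squared error loss, and hypothetical data; the learning rate and step count are arbitrary choices.

```python
import numpy as np

# Hypothetical data from a known line, so the answer can be checked.
x = np.linspace(-1.0, 1.0, 21)
y = 2.0 * x + 1.0

m, c = 0.0, 0.0        # initial parameter guesses
learning_rate = 0.1

for _ in range(1000):
    error = (m * x + c) - y            # prediction minus truth
    grad_m = 2.0 * np.mean(error * x)  # d(MSE)/dm
    grad_c = 2.0 * np.mean(error)      # d(MSE)/dc
    m -= learning_rate * grad_m        # step against the gradient
    c -= learning_rate * grad_c

print(m, c)
```

Unlike brute force, each step uses the slope of the loss to decide where to move next, so the cost does not explode with the number of parameters.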
Learning Algorithm
How do we identify parameters of the model?

Add jar for Learning Algorithm
Evaluation
How do we compute a score for our ML model?

Show a matrix for x and y
Now add a matrix \( \hat{y} \) for the model predictions
Now show ticks and crosses and show we can compute accuracy (show formula)
End by saying that there are other metrics such as precision, recall, etc.
Standard evaluation (example ImageNet)
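
Accuracy is the number of correct predictions divided by the total number of predictions. A minimal sketch with hypothetical labels:

```python
# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

# Count the "ticks": positions where prediction and truth agree.
correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
accuracy = correct / len(y_true)

print(accuracy)
```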
Evaluation
What are some other evaluation metrics?

Show a matrix for x and y
Now add a matrix \( \hat{y} \) for the model predictions, where each prediction is a ranked list
Now show ticks and crosses for top-1, top-5
End by saying that there are other metrics such as precision, recall, etc.
Standard evaluation (example ImageNet)
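
Top-k accuracy counts a prediction as correct when the true label appears among the first k entries of the ranked list. A minimal sketch with hypothetical labels and rankings; with only four classes it uses top-1 and top-3 rather than top-5.

```python
def top_k_accuracy(true_labels, ranked_predictions, k):
    """Fraction of examples whose true label is among the top k predictions."""
    hits = sum(
        1 for truth, ranked in zip(true_labels, ranked_predictions)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)

# Hypothetical true labels and ranked model outputs (best guess first).
true_labels = ["cat", "dog", "car"]
ranked_predictions = [
    ["cat", "dog", "bat", "car"],
    ["cat", "bat", "dog", "car"],
    ["bat", "car", "dog", "cat"],
]

top1 = top_k_accuracy(true_labels, ranked_predictions, k=1)
top3 = top_k_accuracy(true_labels, ranked_predictions, k=3)
print(top1, top3)
```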
Evaluation
How is this different from the loss function?

The task is to decide whether I should press the brake or not. For evaluation, I just want to know how many times I did this correctly
But to train the model, I might choose to use the distance from the obstruction as the quantity to optimise.
Evaluation
Should we learn and test on the same data?

Does it make sense to have the same questions in the homework and in the exam? Why not?
This can overestimate your performance
For an unbiased evaluation, the test data should be different from the training data
Typically, the data is split 80:20 between training and testing
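
An 80:20 split can be sketched by shuffling example indices and cutting the shuffled list. The dataset size of 100 and the seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

n_examples = 100                       # hypothetical dataset size
indices = rng.permutation(n_examples)  # shuffled indices 0..99

split = int(0.8 * n_examples)          # first 80% for training
train_idx = indices[:split]
test_idx = indices[split:]

print(len(train_idx), len(test_idx))
```

Shuffling before the cut avoids putting, say, all examples of one class into the test set when the data is stored in sorted order.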
Evaluation
How is this different from the loss function?

Add jar for evaluation
Putting it all together
How does all the jargon fit into these jars?

Show six jars
Data, democratisation, devices
Why is ML so successful?

Show six jars
Rapid progress and revolution in algorithms which have been democratized
Standardized evaluation, learning, loss, models
Standardized frameworks
You focus on getting data and formulating tasks
Typical ML effort
How to distribute your work through the six jars?

Show six jars
Connecting to the Capstone
How to distribute your work through the six jars?

- Data curation, labelling
- Task identification
- Model selection
- Formulating loss function
- Learning algorithm with bag of tricks
- Evaluation
Assignment
How do you apply the six jars to a problem that you have encountered?

Explain the problem
Give link to the quiz
Suman's Copy of 1.3 The Six Elements of ML
By suman banerjee