1.3 Six Elements of ML
A defining framework for understanding concepts in the course

Recap: Machine Learning
What did we see in the previous chapter?
(c) One Fourth Labs


A jargon cloud
How do you make sense of all the jargon?


From jargon to jars
What are the six jars of Machine Learning?







Data data everywhere
What is the fuel of Machine Learning?
















Data data everywhere
How do you feed data to machines?

We encode all data as numbers, typically as high-dimensional vectors
For instance, in this course you will learn to embed image and text data as large vectors
Data entries come in related pairs, e.g., an MRI scan (x) and whether it shows a tumour (y)
All data encoded as numbers
Typically high dimensional
Scan encoding (x) | Tumour (y) |
---|---|
[2.3, 5.9, ..., 11.0, -0.3, 8.9] | 0 |
[-8.5, -1.7, ..., -1.3, 9.0, 7.2] | 1 |
[-0.4, 6.7, ..., -2.4, 4.7, -7.3] | 0 |
[1.6, -0.4, ..., -4.6, 6.4, 1.9] | 1 |
[3.9, -4.1, ..., 6.7, -3.1, 2.1] | 1 |
[5.1, 3.7, ..., 1.8, -4.2, 9.3] | 1 |

Data data everywhere
How do you feed data to machines?
Review | Encoding (x) | Sentiment (y) |
---|---|---|
Don't buy this MI 6 Pro, Speaker volume is very bad | [1.9, 3.2, ..., -9.8, -6.7, 1.2] | negative |
Delivered as shown. Good price and fits perfect | [1.3, 3.6, ..., -5.4, 9.1, 2.3] | positive |
What a phone.. A handy epic phone. MI at its best ... | [0.4, 7.6, ..., -0.1, -1.4, 8.7] | positive |
Its look stunning in pictures , but not in real. | [1.5, -0.8, ..., 7.8, 8.4, 0.3] | negative |
Amazing camera and battery. Good deal! | [2.5, -5.7, ..., 0.9, 5.3, -8.1] | positive |
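The slides do not say how these review vectors were produced; the course later embeds text with learned models. As a stand-in, here is a minimal bag-of-words sketch (a simpler, swapped-in technique, not the course's encoding): each review becomes a vector of word counts over a vocabulary built from the reviews themselves, paired with a label y where 1 = positive and 0 = negative (our own coding). The helpers `tokenize` and `encode` are hypothetical names.

```python
import re

import numpy as np

# (review text, label) pairs from the table; 1 = positive, 0 = negative.
reviews = [
    ("Don't buy this MI 6 Pro, Speaker volume is very bad", 0),
    ("Delivered as shown. Good price and fits perfect", 1),
    ("What a phone.. A handy epic phone. MI at its best ...", 1),
    ("Its look stunning in pictures , but not in real.", 0),
    ("Amazing camera and battery. Good deal!", 1),
]

def tokenize(text):
    """Lower-case the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

# Vocabulary: every distinct token, in a fixed (sorted) order.
vocab = sorted({tok for text, _ in reviews for tok in tokenize(text)})
index = {tok: i for i, tok in enumerate(vocab)}

def encode(text):
    """Bag-of-words vector: entry j counts how often vocab[j] occurs in text."""
    x = np.zeros(len(vocab))
    for tok in tokenize(text):
        if tok in index:          # ignore words not seen when building the vocabulary
            x[index[tok]] += 1
    return x

X = np.stack([encode(text) for text, _ in reviews])  # shape (5, len(vocab))
y = np.array([label for _, label in reviews])
print(X.shape, y)
```

Word counts ignore word order entirely, which is one reason learned embeddings usually do better on real text.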

Data data everywhere
How do you feed data to machines?





1.3 | -4.3 | 2.1 | -6.7 | ... | 1.5 | 8.9 | 10.1 | -4.5 |
2.6 | 7.9 | -0.3 | 8.1 | ... | -4.2 | 0.3 | 1.2 | 9.4 |
-5.2 | -3.2 | 4.2 | 0.3 | ... | 3.5 | 8.3 | -1.4 | -8.7 |
8.5 | 2.1 | -6.3 | 5.3 | ... | 7.2 | -1.3 | -4.5 | 11.8 |
2.3 | -5.6 | -1.2 | 7.8 | ... | 9.9 | 10.1 | -1.1 | 3.5 |
All data encoded as numbers
Typically high dimensional
In this course


text
image
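To see why image data is high-dimensional, here is a minimal NumPy sketch, assuming a grayscale image is stored as a grid of pixel intensities (the sizes below are hypothetical): flattening the grid row by row gives the input vector x.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical 5 x 9 grayscale image: each entry is a pixel intensity in 0..255.
image = rng.integers(0, 256, size=(5, 9))

# Flatten row by row (row-major order) into one 45-dimensional vector x.
x = image.reshape(-1)

# A hypothetical 32 x 32 colour image has 3 channels (R, G, B) per pixel,
# so flattening gives 32 * 32 * 3 = 3072 numbers.
rgb = rng.integers(0, 256, size=(32, 32, 3))
x_rgb = rgb.reshape(-1)
print(x.shape, x_rgb.shape)  # (45,) (3072,)
```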

Data curation
Where do I get the data from?






I am lucky
I am rich
I am smart

+ मुंबई

= मुंबई
In this course

Data data everywhere
What is the fuel of Machine Learning?

Data

Tasks
What do you do with this data?

Input
Output


















From product description to structured specifications
From specifications + reviews to writing FAQs
From specifications + reviews + FAQs to Question Answering
From specifications + reviews + personal data to recommendations





Tasks
What do you do with this data?







From images identify people

Shahrukh Khan
Aamir Khan
From images identify activities

Eating
From images identify places

Gym

From posts recommend posts

Output
Input
Tasks
What do you do with this data?

Supervised
Classification






x | y |
---|---|
[3.2, 5.9, ..., 11.0, 8.9] | 1 |
[-8.5, -1.7, ..., 9.0, 7.2] | 1 |
[-0.4, 6.7, ..., 4.7, -7.2] | 0 |
[2.7, 3.1, ..., -2.1, 9.7] | 0 |
[3.9, 7.8, ..., -5.1, 3.7] | 0 |
[7.1, 0.9, ..., 1.5, -4.2] | 1 |







Tasks
What do you do with this data?

Supervised
Regression



x | y |
---|---|
[-8.5, -1.7, ..., 9.0, 7.2] | [2.3, 1.2, 9.2, 10.1] |
[0.9, -2.1, ..., -8.1, 1.9] | [4.3, 4.2, 7.1, 5.1] |
[2.9, -4.5, ..., -3.7, 8.9] | [2.3, 7.2, 6.9, 7.3] |

Tasks
What do you do with this data?

Clustering
Unsupervised



x |
---|
[3.2, 5.9, ..., 11.0, 8.9] |
[-8.5, -1.7, ..., 9.0, 7.2] |
[-0.4, 6.7, ..., 4.7, -4.1] |
[2.7, 3.1, ..., -2.1, 9.7] |
[3.9, 7.8, ..., -5.1, 3.7] |
[7.1, 0.9, ..., 1.5, -4.2] |
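The slide does not name a clustering algorithm. As one common example (our choice, not necessarily the one used in the course), here is a minimal k-means sketch: it groups unlabelled vectors by alternately assigning each point to its nearest centroid and moving each centroid to the mean of its points. The function `kmeans` and the toy data are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm for k-means. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Squared Euclidean distance from every point to every centroid: shape (n, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if it has none).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical unlabelled data: two well-separated groups of 2-D points.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
centroids, labels = kmeans(X, k=2)
print(labels)
```

No labels y are used anywhere: the groups come only from distances between the x vectors, which is what makes this an unsupervised task.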















Tasks
What do you do with this data?

Generation
Unsupervised

x |
---|
[3.2, 5.9, ..., 11.0, 8.9] |
[-8.5, -1.7, ..., 9.0, 7.2] |
[-0.4, 6.7, ..., 4.7, -4.1] |
[2.7, 3.1, ..., -2.1, 9.7] |
[3.9, 7.8, ..., -5.1, 3.7] |
[7.1, 0.9, ..., 1.5, -4.2] |






Tasks
What do you do with this data?
Generation
Unsupervised
Tweet encoding (x) |
---|
[2.3, 5.9, ..., 11.0, -0.3, 8.9] |
[-8.5, -1.7, ..., -1.3, 9.0, 7.2] |
[-0.4, 6.7, ..., -2.4, 4.7, -6.2] |
[1.6, -0.4, ..., -4.6, 6.4, 1.9] |






Tasks
What do you do with this data?

"Supervised learning has created 99% of the economic value in AI"
In this course
Classification
Regression








Tasks
What do you do with this data?

Data


Task
What is the mathematical formulation of a task?

\( x \)
\( y \)
bat
car
dog
cat
Models
\( \left[\begin{array}{lcr} 2.1, 1.2, \dots, 5.6, 7.2 \end{array} \right]\)
\( \left[\begin{array}{lcr} 0, 0, 1, 0, 0 \end{array} \right]\)
\( y = f(x) \) [true relation, unknown]
\( \hat{y} = \hat{f}(x) \) [our approximation]
ship
\( \left[\begin{array}{lcr} 0, 1, 0, 0, 0 \end{array} \right]\)
\( \left[\begin{array}{lcr} 0, 0, 0, 0, 1 \end{array} \right]\)
\( \left[\begin{array}{lcr} 1, 0, 0, 0, 0 \end{array} \right]\)
\( \left[\begin{array}{lcr} 0, 0, 1, 0, 0 \end{array} \right]\)
\( \left[\begin{array}{lcr} 0.1, 3.1, \dots, 1.7, 3.4\end{array} \right]\)
\( \left[\begin{array}{lcr} 0.5, 9.1,\dots, 5.1, 0.8 \end{array} \right]\)
\( \left[\begin{array}{lcr} 1.2, 4.1, \dots, 6.3, 7.4 \end{array} \right]\)
\( \left[\begin{array}{lcr} 3.2, 2.1, \dots, 3.1, 0.9 \end{array} \right]\)
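Converting between class names and one-hot vectors y like the ones above takes only a few lines of NumPy. This is a minimal sketch: the index order of the classes is an assumption (the slide does not fix it), and `one_hot` / `predicted_class` are hypothetical helper names.

```python
import numpy as np

# Class names from the slide; mapping them to indices 0..4 in this order is an assumption.
classes = ["bat", "car", "dog", "cat", "ship"]

def one_hot(label, classes):
    """Encode a class name as a vector with a single 1 at the class's index."""
    y = np.zeros(len(classes), dtype=int)
    y[classes.index(label)] = 1
    return y

def predicted_class(scores, classes):
    """Decode a model's output vector back to a class name (largest entry wins)."""
    return classes[int(np.argmax(scores))]

print(one_hot("dog", classes))                                # [0 0 1 0 0]
print(predicted_class([0.1, 0.7, 0.1, 0.05, 0.05], classes))  # car
```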
Models
What are the choices for \( \hat{f} \)?

\( \hat{y} = mx + c \)
\(\hat{ y} = ax^2 + bx + c \)
\( \hat{y} = \sigma(wx + b) \)
\( \hat{y} = Deep\_NN(x) \)
\( \hat{y} = \hat{f}(x) \) [our approximation]
\( \left [\begin{array}{lcr} 0.5\\ 0.2\\ 0.6\\ \dots\\0.3 \end{array} \right]\)
\( \left [\begin{array}{lcr} 14.8\\ 13.3\\ 11.6\\ \dots\\6.16 \end{array} \right]\)
\( x \)
\( y \)
\(\hat{ y} = ax^3 + bx^2 + cx + d \)
\( \hat{y} = ax^4 + bx^3 + cx^2 + dx + e \)
Data
In this course
\( \hat{y} = Deep\_CNN(x) \) ...
\( \hat{y} = RNN(x) \) ...
Data is drawn from the following function
\(\hat{ y} = ax^{25} + bx^{24} + \dots + cx + d \)
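Each choice of \( \hat{f} \) above is a model family whose coefficients must be found from data. A minimal sketch, assuming hypothetical noisy samples of \( \sin(x) \) as the unknown function, fits polynomials of increasing degree with NumPy's `np.polyfit` (least squares) and reports the squared error on the same data. The helper `fit_and_score` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 noisy samples of a function we pretend not to know.
x = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)

def fit_and_score(x, y, degree):
    """Least-squares fit of a polynomial of the given degree; returns (coeffs, SSE)."""
    coeffs = np.polyfit(x, y, degree)   # highest power first
    y_hat = np.polyval(coeffs, x)       # model predictions on the same x
    return coeffs, float(np.sum((y - y_hat) ** 2))

for degree in (1, 2, 3, 4):
    coeffs, sse = fit_and_score(x, y, degree)
    print(f"degree {degree}: training squared error = {sse:.3f}")
```

Because each polynomial family contains the lower-degree ones, the training error can only stay the same or drop as the degree grows.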










Models
Why not always use a complex model?

\( \left [\begin{array}{lcr} 0.1\\ 0.2\\ 0.4\\ ....\\0.8 \end{array} \right]\)
\( \left [\begin{array}{lcr} 2.6\\ 2.4\\ 3.1\\ ....\\4.1 \end{array} \right]\)
\( x \)
\( y \)
\( y = mx + c \) [true function, simple]
\(\hat{y} = ax^{100} + bx^{99} + ... + c \)
[our approximation, very complex]
Later in this course
Bias-Variance Tradeoff
Overfitting
Regularization
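A small experiment makes the slide's point concrete. This is a minimal sketch under stated assumptions: the hypothetical true function is \( y = 2x + 1 \) plus noise, we fit a straight line and a degree-9 polynomial to 12 training points with NumPy's `Polynomial.fit`, and compare mean squared error on those points and on 200 fresh test points.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def true_f(x):
    """Hypothetical simple true function y = mx + c, with m = 2 and c = 1."""
    return 2.0 * x + 1.0

# A small noisy training set and a larger, separate test set.
x_train = np.linspace(0.0, 1.0, 12)
y_train = true_f(x_train) + rng.normal(scale=0.2, size=x_train.shape)
x_test = rng.uniform(0.0, 1.0, size=200)
y_test = true_f(x_test) + rng.normal(scale=0.2, size=x_test.shape)

def mse(model, x, y):
    """Mean squared error of a fitted model on (x, y)."""
    return float(np.mean((y - model(x)) ** 2))

results = {}
for degree in (1, 9):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE = {results[degree][0]:.4f}, "
          f"test MSE = {results[degree][1]:.4f}")
```

The degree-9 model always has the lower training error here, since it contains the straight line as a special case. Whether its test error is higher depends on the noise drawn, but with so few points it typically is: it has fitted the noise, not the line.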






Models
What are the choices for \( \hat{f} \)?

Model
Data


Task

Loss Function
How do we know which model is better?

[Plot: the data points with the true function and three candidate approximations \( \hat{f}_1(x) \), \( \hat{f}_2(x) \), \( \hat{f}_3(x) \)]
i | x | y | \( \hat{f}_1(x) \) | \( \hat{f}_2(x) \) | \( \hat{f}_3(x) \) |
---|---|---|---|---|---|
1 | 0.00 | 0.24 | 0.25 | 0.32 | 0.08 |
2 | 0.10 | 0.08 | 0.09 | 0.30 | 0.20 |
3 | 0.20 | 0.12 | 0.11 | 0.31 | 0.14 |
... | ... | ... | ... | ... | ... |
n | 6.40 | 0.36 | 0.36 | 0.22 | 0.15 |
\( \hat{f}_1(x) = 1.79x^{25} - 4.54x^{24} + \dots - 1.48x + 2.48 \)
\( \hat{f}_2(x) = 2.27x^{25} + 9.89x^{24} + \dots + 2.79x + 3.22 \)
\( \hat{f}_3(x) = 3.78x^{25} + 1.57x^{24} + \dots + 1.01x + 8.68 \)
Whose function is better? Why not use numbers to decide?
\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)
\( \mathscr{L}_2 = \sum_{i=1}^{n} (y_i - \hat{f}_2(x_i))^2 \)
\( \mathscr{L}_3 = \sum_{i=1}^{n} (y_i - \hat{f}_3(x_i))^2 \)
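The squared error loss above is a one-liner in NumPy. This minimal sketch applies it to only the four rows visible on the slide (i = 1, 2, 3 and n); the rows hidden behind the ellipsis are missing, so these totals will not match the slide's losses. Reading the prediction columns in the order they appear as \( \hat{f}_1, \hat{f}_2, \hat{f}_3 \) is our assumption.

```python
import numpy as np

def squared_error_loss(y, y_hat):
    """L = sum_i (y_i - y_hat_i)^2."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sum((y - y_hat) ** 2))

# Only the visible rows (i = 1, 2, 3 and n); the elided rows are not included.
y_visible = [0.24, 0.08, 0.12, 0.36]
predictions = {
    "f1": [0.25, 0.09, 0.11, 0.36],
    "f2": [0.32, 0.30, 0.31, 0.22],
    "f3": [0.08, 0.20, 0.14, 0.15],
}
losses = {name: squared_error_loss(y_visible, p) for name, p in predictions.items()}
best = min(losses, key=losses.get)  # the model with the smallest loss
print(losses)
print(best)  # f1
```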



Loss Function
How do we know which model is better?

\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)
\( = (0.24-0.25)^2 + (0.08-0.09)^2 + (0.12-0.11)^2 + \dots + (0.36-0.36)^2 \)
\( = 1.38 \)
\( \mathscr{L}_2 = \sum_{i=1}^{n} (y_i - \hat{f}_2(x_i))^2 = 2.02 \)
\( \mathscr{L}_3 = \sum_{i=1}^{n} (y_i - \hat{f}_3(x_i))^2 = 2.34 \)
\( \mathscr{L}_1 \) is the smallest, so \( \hat{f}_1 \) fits the data best
In this course
Squared Error Loss
Cross Entropy Loss
KL divergence
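For classification with one-hot labels, cross entropy is the usual alternative to squared error. A minimal NumPy sketch, assuming the model outputs one probability per class; the clipping constant `eps` is our own guard against \( \log 0 \), and the example probabilities are hypothetical.

```python
import numpy as np

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Cross entropy between one-hot targets and predicted class probabilities.

    y_true: (n, k) one-hot rows; y_prob: (n, k) rows of probabilities.
    Returns -sum_i sum_j y_ij * log(p_ij).
    """
    y_prob = np.clip(y_prob, eps, 1.0)   # avoid log(0)
    return float(-np.sum(y_true * np.log(y_prob)))

y_true = np.array([[0, 0, 1], [1, 0, 0]])   # one-hot targets for two examples
good = np.array([[0.05, 0.05, 0.90], [0.80, 0.10, 0.10]])
poor = np.array([[0.30, 0.30, 0.40], [0.40, 0.30, 0.30]])
print(cross_entropy(y_true, good), cross_entropy(y_true, poor))
```

Because only the true class has a 1 in each row, the double sum reduces to \( -\sum_i \log p_i \), where \( p_i \) is the probability the model assigns to the correct class of example i: confident, correct predictions give a small loss.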
Loss Function
What does a loss function look like?

Loss
Model
Data


Task


Learning Algorithm
How do we identify parameters of the model?

\( \hat{f_1}(x) = 3.5x_1^2 + 2.5x_2^{3} + 1.2x_3^{2} \)
\( \hat{f_1}(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)
\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)
Budget, \( x_1 \) (100 crore) | Box Office Collection, \( x_2 \) (100 crore) | Action Scene Time, \( x_3 \) (100 min) | IMDB Rating, \( y \) |
---|---|---|---|
0.55 | 0.66 | 0.22 | 4.8 |
0.68 | 0.91 | 0.77 | 7.2 |
0.66 | 0.88 | 0.67 | 6.7 |
0.72 | 0.94 | 0.97 | 8.1 |
0.58 | 0.74 | 0.35 | 5.3 |







Learning Algorithm
How do you formulate this mathematically?

Find \( a, b, c \) such that \( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \) is minimized, where \( \hat{f}_1(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)
In practice, brute-force search is infeasible
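To see why brute force does not scale, here is a minimal sketch that searches a coarse grid of \( (a, b, c) \) values on the five rows of the movie table. It assumes IMDB rating is the target y and the other three columns are \( x_1, x_2, x_3 \) in order; the grid range (0 to 10) and step (0.5) are arbitrary choices.

```python
import itertools

import numpy as np

# Rows of the table: x1 (budget), x2 (box office), x3 (action-scene time), y (IMDB rating).
data = np.array([
    [0.55, 0.66, 0.22, 4.8],
    [0.68, 0.91, 0.77, 7.2],
    [0.66, 0.88, 0.67, 6.7],
    [0.72, 0.94, 0.97, 8.1],
    [0.58, 0.74, 0.35, 5.3],
])
x1, x2, x3, y = data.T

def loss(a, b, c):
    """Squared error of f(x) = a*x1^2 + b*x2^3 + c*x3^2 over the five rows."""
    y_hat = a * x1**2 + b * x2**3 + c * x3**2
    return float(np.sum((y - y_hat) ** 2))

# Brute force: evaluate the loss at every (a, b, c) on the grid. With k values
# per parameter this costs k**3 evaluations (k**p for p parameters).
grid = np.linspace(0.0, 10.0, 21)
best_loss, best_params = min(
    (loss(a, b, c), (a, b, c)) for a, b, c in itertools.product(grid, repeat=3)
)
print(f"{len(grid) ** 3} evaluations; best (a, b, c) = {best_params}, loss = {best_loss:.3f}")
```

Three parameters and a coarse grid already need 9261 evaluations, and the best point is only as precise as the grid step; models with millions of parameters rule this approach out entirely.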


Learning Algorithm
How do you formulate this mathematically?

\( \min_{a,b,c} \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \), where \( \hat{f}_1(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)
Many optimization solvers are available
In this course
Gradient Descent ++
Adagrad
RMSProp
Adam
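The simplest solver in this family is plain gradient descent; Adagrad, RMSProp and Adam build on it by adapting the step size. A minimal sketch fitting \( a, b, c \) on the five rows of the movie table, assuming IMDB rating is y and the learning rate 0.1 and 2000 steps are hand-picked for this tiny problem:

```python
import numpy as np

# Rows of the table: x1 (budget), x2 (box office), x3 (action-scene time), y (IMDB rating).
data = np.array([
    [0.55, 0.66, 0.22, 4.8],
    [0.68, 0.91, 0.77, 7.2],
    [0.66, 0.88, 0.67, 6.7],
    [0.72, 0.94, 0.97, 8.1],
    [0.58, 0.74, 0.35, 5.3],
])
x1, x2, x3, y = data.T

# f(x) = a*x1^2 + b*x2^3 + c*x3^2 is linear in theta = (a, b, c),
# so we precompute one feature column per parameter.
Phi = np.stack([x1**2, x2**3, x3**2], axis=1)   # shape (5, 3)

def loss(theta):
    """Squared error loss L(theta) = sum_i (y_i - f(x_i))^2."""
    return float(np.sum((y - Phi @ theta) ** 2))

def grad(theta):
    """Gradient of the loss: dL/dtheta = -2 * Phi^T (y - Phi theta)."""
    return -2.0 * Phi.T @ (y - Phi @ theta)

theta = np.zeros(3)   # start from a = b = c = 0
lr = 0.1              # learning rate, picked by hand for this small problem
history = [loss(theta)]
for _ in range(2000):
    theta = theta - lr * grad(theta)   # step against the gradient
    history.append(loss(theta))

a, b, c = theta
print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.2f}, loss = {history[-1]:.4f}")
```

Each step needs only one gradient evaluation, whatever the number of parameters, which is why gradient-based solvers scale where grid search does not.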

Learning Algorithm
How do you formulate this mathematically?
Learning
Loss
Model
Data


Task



Evaluation
How do we compute a score for our ML model?

[Figure: test images, each encoded as a vector, with their true labels and the model's predicted labels]
Class | Label |
---|---|
Lion | 1 |
Tiger | 2 |
Cat | 3 |
Giraffe | 4 |
Dog | 5 |
Top-1: a prediction counts as correct only if the predicted label equals the true label
Evaluation
How do we compute a score for our ML model?

[Figure: the same test images with their true labels and, for each, the model's three highest-ranked predicted labels]
Top-3: a prediction counts as correct if the true label is among the three predicted labels
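Top-1 and Top-k accuracy can both be computed from a model's per-class scores. A minimal NumPy sketch; `top_k_accuracy` is a hypothetical helper, the scores are made up, and classes use 0-based indices here rather than the 1 to 5 labels in the table.

```python
import numpy as np

def top_k_accuracy(y_true, scores, k=1):
    """Fraction of examples whose true class is among the k highest-scoring classes.

    y_true: (n,) integer class indices; scores: (n, num_classes) model scores.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    # Column indices of the k largest scores in each row (their order does not matter).
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == y_true[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical scores for 3 images over the 5 classes
# (0 = Lion, 1 = Tiger, 2 = Cat, 3 = Giraffe, 4 = Dog).
y_true = [1, 0, 4]
scores = [
    [0.10, 0.60, 0.15, 0.10, 0.05],
    [0.20, 0.10, 0.40, 0.25, 0.05],
    [0.30, 0.25, 0.20, 0.15, 0.10],
]
print(top_k_accuracy(y_true, scores, k=1))  # 0.3333333333333333
print(top_k_accuracy(y_true, scores, k=3))  # 0.6666666666666666
```

Top-3 is more lenient: the second image is wrong under Top-1 but counts as correct under Top-3 because its true class is the third-highest score.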
Evaluation
How is this different from the loss function?



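[Figure: the evaluation metric and the loss function compared on a Brake/Go decision (input: Dog, expected action: Brake); the evaluation metric is what we maximize]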
Evaluation
Should we learn and test on the same data?

[Figure: the labelled examples (vectors x with labels such as 1, 2, 3, 4) split into Training Data and Test Data]
Training Data: learn the parameters by solving \( \min_{a,b,c} \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \), where \( \hat{f}_1(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)
Test Data: held out and used only to evaluate the learned model
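Holding out test data amounts to shuffling the examples and setting a fraction aside before any learning happens. A minimal hand-written sketch; `split_train_test` is a hypothetical helper and the data is made up.

```python
import numpy as np

def split_train_test(X, y, test_fraction=0.25, seed=0):
    """Shuffle the examples, then hold out a fraction of them as test data.

    Returns X_train, X_test, y_train, y_test.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))                 # random order of row indices
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Hypothetical data: 10 examples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = split_train_test(X, y, test_fraction=0.2)
print(len(X_train), len(X_test))   # 8 2
```

Shuffling before splitting avoids a test set that contains only the last examples collected. scikit-learn provides a ready-made `sklearn.model_selection.train_test_split` for the same purpose.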

Evaluation
How is this different from the loss function?

Learning
Loss
Model
Data


Task




Evaluation
Putting it all together
How does all the jargon fit into these jars?

Linear Algebra
Probability
Calculus
Data
Model
Loss
Learning
Task
Evaluation















Data, democratisation, devices
Why is ML so successful?

Data
Model
Loss
Learning
Task
Evaluation


Improved
Democratised
Abundance














Typical ML effort
How should you distribute your work across the six jars?

Your Job
Model
Loss
Learning
Evaluation
Data
Task
Mix and Match












Connecting to the Capstone
How should you distribute your work across the six jars?


Mumbai





मुंबई \( \rightarrow \) Mumbai




\( \sum_{i=1}^{n} (y_i - \hat{f}(x_i))^2 \)
\( -\sum_{i=1}^{n} \log \hat{f}(x_i) \)
Accuracy
Precision/Recall
Top-k accuracy
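For a binary task such as deciding whether an image contains text, accuracy, precision and recall all come from counting agreements between true and predicted labels. A minimal pure-Python sketch; the helper names and the labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    return tp, fp, fn

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 when undefined."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels: 1 = "the image contains text", 0 = "no text".
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1]
print(accuracy(y_true, y_pred))          # 0.5
print(precision_recall(y_true, y_pred))  # (0.5, 0.6666666666666666)
```

Precision asks how many flagged images really contain text; recall asks how many images with text were found. The two can move in opposite directions, which a single accuracy number hides.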

Data


Task


Model


Loss


Learning


Evaluation


Assignment
How do you apply the six jars to a problem that you have encountered?

1. Formulate 3 problems using data from data.gov.in
2. In the Dataturks labelled data, define tasks that you can perform and collect 10 data points for each
e.g., binary classification of whether there is text
e.g., detecting text with a bounding box: is accuracy easy to define here?
By Preksha Nema