1.3 Six Elements of ML
A defining framework for understanding concepts in the course

Recap: Machine Learning
What did we see in the previous chapter?
(c) One Fourth Labs


A jargon cloud
How do you make sense of all the jargon?


From jargon to jars
What are the six jars of Machine Learning?







Data data everywhere
What is the fuel of Machine Learning?



















Data data everywhere
How do you feed data to machines?








Input-1 | Input-2 | Input-3 | Input-4 | y |
---|---|---|---|---|
2.3 | 5.9 | 11.0 | -10.3 | 0 |
-8.5 | -1.7 | -1.3 | 9.0 | 0 |
12.3 | 5.4 | 3.4 | 2.4 | 1 |
1.9 | 7.9 | 8.1 | -3.3 | 1 |
-9.1 | 1.2 | -2.1 | 7.8 | 0 |
3.2 | -11.2 | 5.6 | 12.1 | 1 |
4.5 | 3.75 | -1.2 | -10.0 | 1 |
All data encoded as numbers
Typically high dimensional


Data data everywhere
How do you feed data to machines?

We encode all data as numbers, typically as high-dimensional vectors.
For instance, in this course you will learn to embed image and text data as large vectors.
Each input is paired with an output, e.g., given an MRI scan, whether there is a tumour or not.




All data encoded as numbers
Typically high dimensional
x (MRI scan encoded as a vector) | y (tumour: 1 = yes, 0 = no) |
---|---|
[2.3, 5.9, ..., 11.0, -0.3, 8.9] | 0 |
[-8.5, -1.7, ..., -1.3, 9.0, 7.2] | 1 |
[-0.4, 6.7, ..., -2.4, 4.7, -7.2] | 0 |
[1.6, -0.4, ..., -4.6, 6.4, 1.9] | 1 |


Data data everywhere
How do you feed data to machines?

All data encoded as numbers
Typically high dimensional
Review | x (review encoded as a vector) | y (sentiment) |
---|---|---|
Don't buy this MI 6 Pro, Speaker volume is very bad | [2.3, 5.9, ..., 11.0, -0.3, 8.9] | negative |
Delivered as shown. Good price and fits perfect | [-8.5, -1.7, ..., -1.3, 9.0, 7.2] | positive |
What a phone.. A handy epic phone. MI at its best ... | [-0.4, 6.7, ..., -2.4, 4.7, -6.2] | positive |
Its look stunning in pictures , but not in real. | [1.6, -0.4, ..., -4.6, 6.4, 1.9] | negative |
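The course learns dense embeddings for text. As a far simpler stand-in that still shows what "encoding text as numbers" means, here is a bag-of-words sketch for the four reviews: each review becomes a vector of word counts over a shared vocabulary. The tokenizer (lowercase, keep runs of letters and digits) and the helper names are illustrative assumptions, not the course's method.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(docs: list[str]) -> list[str]:
    # Sorted list of every distinct token across all documents.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bag_of_words(doc: str, vocab: list[str]) -> list[int]:
    # One count per vocabulary word: how often that word occurs in doc.
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocab]


reviews = [
    "Don't buy this MI 6 Pro, Speaker volume is very bad",
    "Delivered as shown. Good price and fits perfect",
    "What a phone.. A handy epic phone. MI at its best ...",
    "Its look stunning in pictures , but not in real.",
]

vocab = build_vocabulary(reviews)
X = [bag_of_words(review, vocab) for review in reviews]
print(len(vocab), [sum(row) for row in X])
```

Each row of `X` has one entry per vocabulary word, so its dimension grows with the vocabulary; this is one reason text data is typically high dimensional.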
Data data everywhere
How do you feed data to machines?

Input-1 | Input-2 | Input-3 | Input-4 | y |
---|---|---|---|---|
4.3 | 5.9 | 1.0 | 13.2 | Positive |
-9.5 | 1.7 | 1.3 | 9.2 | Positive |
2.3 | 5.4 | 3.8 | 2.9 | Negative |
19.1 | 8.9 | 8.2 | -3.3 | Positive |
-9.2 | 11.2 | -12.1 | 1.8 | Positive |
4.5 | -11.2 | 4.6 | 2.1 | Negative |
12.2 | -3.8 | 0.2 | -1.0 | Negative |








All data encoded as numbers
Typically high dimensional


Data data everywhere
How do you feed data to machines?






1.3 | -4.3 | 2.1 | -6.7 | ... | 1.5 | 8.9 | 10.1 | -4.5 |
2.6 | 7.9 | -0.3 | 8.1 | ... | -4.2 | 0.3 | 1.2 | 9.4 |
-5.2 | -3.2 | 4.2 | 0.3 | ... | 3.5 | 8.3 | -1.4 | -8.7 |
8.5 | 2.1 | -6.3 | 5.3 | ... | 7.2 | -1.3 | -4.5 | 11.8 |
2.3 | -5.6 | -1.2 | 7.8 | ... | 9.9 | 10.1 | -1.1 | 3.5 |
All data encoded as numbers
Typically high dimensional
In this course: text and image data
Data curation
Where do I get the data from?






I am lucky
I am rich
I am smart

[Figure: an image + मुंबई = an image containing the text मुंबई]
In this course

Data data everywhere
What is the fuel of Machine Learning?


Data
Tasks
What do you do with this data?

Input
Output


















From product description to structured specifications
From specifications + reviews to writing FAQs
From specifications + reviews + FAQs to Question Answering
From specifications + reviews + personal data to recommendations
Tasks
What do you do with this data?







From images, identify people (e.g., Shahrukh Khan, Aamir Khan)
From images, identify activities (e.g., Eating)
From images, identify places (e.g., Gym)
From posts, recommend posts
Tasks
What do you do with this data?

Supervised
Classification






x | y |
---|---|
[3.2, 5.9, ..., 11.0, 8.9] | 1 |
[-8.5, -1.7, ..., 9.0, 7.2] | 1 |
[-0.4, 6.7, ..., 4.7, -7.2] | 0 |
[2.7, 3.1, ..., -2.1, 9.7] | 0 |
[3.9, 7.8, ..., -5.1, 3.7] | 0 |
[7.1, 0.9, ..., 1.5, -4.2] | 1 |







Tasks
What do you do with this data?

Supervised
Regression



x | y |
---|---|
[-8.5, -1.7, ..., 9.0, 7.2] | [2.3, 1.2, 9.2, 10.1] |
[0.9, -2.1, ..., -8.1, 1.9] | [4.3, 4.2, 7.1, 5.1] |
[2.9, -4.5, ..., -3.7, 8.9] | [2.3, 7.2, 6.9, 7.3] |
---|
Tasks
What do you do with this data?

Clustering
Unsupervised






x |
---|
[3.2, 5.9, ..., 11.0, 8.9] |
[-8.5, -1.7, ..., 9.0, 7.2] |
[-0.4, 6.7, ..., 4.7, -4.1] |
[2.7, 3.1, ..., -2.1, 9.7] |
[3.9, 7.8, ..., -5.1, 3.7] |
[7.1, 0.9, ..., 1.5, -4.2] |
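Clustering groups inputs without any labels y. Below is a minimal sketch of k-means (Lloyd's algorithm), one common clustering method, assuming hypothetical 2-D points (the slide's vectors are elided) and initialising the centroids with the first k points, a simplification; practical implementations usually use random or k-means++ initialisation.

```python
import math


def kmeans(points, k, iters=10):
    # Lloyd's algorithm. Simplifying assumption: the first k points
    # serve as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Hypothetical unlabelled 2-D points forming two obvious groups.
points = [(0.0, 0.1), (5.0, 5.2), (0.2, 0.0), (5.1, 4.9), (0.1, 0.2), (4.8, 5.0)]
labels, centres = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The cluster ids 0 and 1 are arbitrary names; unlike the y column in classification, they carry no meaning beyond "these points belong together".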









Tasks
What do you do with this data?

Generation
Unsupervised

x |
---|
[3.2, 5.9, ..., 11.0, 8.9] |
[-8.5, -1.7, ..., 9.0, 7.2] |
[-0.4, 6.7, ..., 4.7, -4.1] |
[2.7, 3.1, ..., -2.1, 9.7] |
[3.9, 7.8, ..., -5.1, 3.7] |
[7.1, 0.9, ..., 1.5, -4.2] |






Tasks
What do you do with this data?
Generation
Unsupervised
x (tweets encoded as vectors) |
---|
[2.3, 5.9, ..., 11.0, -0.3, 8.9] |
[-8.5, -1.7, ..., -1.3, 9.0, 7.2] |
[-0.4, 6.7, ..., -2.4, 4.7, -6.2] |
[1.6, -0.4, ..., -4.6, 6.4, 1.9] |






Tasks
What do you do with this data?

“Supervised Learning has created 99% of economic value in AI”
In this course
Classification
Regression












Tasks
What do you do with this data?



Data
Task
What is the mathematical formulation of a task?

\( x \)
\( y \)
bat
car
dog
cat
Models
\( \left[\begin{array}{lcr} 2.1, 1.2, \dots, 5.6, 7.2 \end{array} \right]\)
\( \left[\begin{array}{lcr} 0, 0, 1, 0, 0 \end{array} \right]\)
\( y = f(x) \) [true relation, unknown]
\( \hat{y} = \hat{f}(x) \) [our approximation]
ship
\( \left[\begin{array}{lcr} 0, 1, 0, 0, 0 \end{array} \right]\)
\( \left[\begin{array}{lcr} 0, 0, 0, 0, 1 \end{array} \right]\)
\( \left[\begin{array}{lcr} 1, 0, 0, 0, 0 \end{array} \right]\)
\( \left[\begin{array}{lcr} 0, 0, 1, 0, 0 \end{array} \right]\)
\( \left[\begin{array}{lcr} 0.1, 3.1, \dots, 1.7, 3.4\end{array} \right]\)
\( \left[\begin{array}{lcr} 0.5, 9.1,\dots, 5.1, 0.8 \end{array} \right]\)
\( \left[\begin{array}{lcr} 1.2, 4.1, \dots, 6.3, 7.4 \end{array} \right]\)
\( \left[\begin{array}{lcr} 3.2, 2.1, \dots, 3.1, 0.9 \end{array} \right]\)
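The target vectors y on this slide are one-hot encodings: a vector with one position per class and a single 1. A short sketch, assuming the alphabetical class order shown in the code (the slide does not state which position belongs to which class); the hypothetical helper `decode` goes the other way by picking the largest entry of any score vector.

```python
def one_hot(label, classes):
    # Zero vector with a single 1 at the label's position in `classes`.
    vec = [0] * len(classes)
    vec[classes.index(label)] = 1
    return vec


def decode(scores, classes):
    # Inverse direction: pick the class with the largest score.
    return classes[scores.index(max(scores))]


# Assumed class order; the slide does not fix which position is which class.
classes = ["bat", "car", "cat", "dog", "ship"]
print(one_hot("cat", classes))  # [0, 0, 1, 0, 0]
print(decode([0.1, 0.2, 0.6, 0.05, 0.05], classes))  # cat
```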
Models
What are the choices for \( \hat{f} \)?

\( \hat{y} = mx + c \)
\(\hat{ y} = ax^2 + bx + c \)
\( \hat{y} = \sigma(wx + b) \)
\( \hat{y} = Deep\_NN(x) \)
\( \hat{y} = \hat{f}(x) \) [our approximation]
\( \left [\begin{array}{lcr} 0.5\\ 0.2\\ 0.6\\ \dots\\0.3 \end{array} \right]\)
\( \left [\begin{array}{lcr} 14.8\\ 13.3\\ 11.6\\ \dots\\6.16 \end{array} \right]\)
\( x \)
\( y \)
\(\hat{ y} = ax^3 + bx^2 + cx + d \)
\(\hat{ y} = ax^4 + bx^3 + cx^2 + dx + e \)
Data
In this course
\( \hat{y} = Deep\_CNN(x) \) ...
\( \hat{y} = RNN(x) \) ...


Data is drawn from the following distribution
\(\hat{ y} = ax^{25} + bx^{24} + \dots + cx + d \)





Models
Why not always use a complex model?

\( \left [\begin{array}{lcr} 0.1\\ 0.2\\ 0.4\\ ....\\0.8 \end{array} \right]\)
\( \left [\begin{array}{lcr} 2.6\\ 2.4\\ 3.1\\ ....\\4.1 \end{array} \right]\)
\( x \)
\( y \)
\( y = mx + c \) [true function, simple]
\(\hat{y} = ax^{100} + bx^{99} + ... + c \)
[our approximation, very complex]
Later in this course
Bias-Variance Tradeoff
Overfitting
Regularization
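A small experiment illustrating this slide, under stated assumptions: hypothetical data from the simple true function \( y = 2x + 1 \) with an alternating ±0.3 perturbation, a least-squares line as the simple model, and a degree-9 Lagrange interpolating polynomial through all 10 training points as the complex model. The complex model drives the training loss to zero but does much worse on test points that lie between the training points.

```python
def fit_line(xs, ys):
    # Simple model: closed-form least-squares fit of y = m*x + c.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return lambda x: m * x + c


def fit_interpolant(xs, ys):
    # Complex model: the degree-(n-1) Lagrange polynomial through every
    # training point, i.e. a model that memorises the training data.
    def f(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            term = yj
            for k, xk in enumerate(xs):
                if k != j:
                    term *= (x - xk) / (xj - xk)
            total += term
        return total
    return f


def squared_error(f, xs, ys):
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: true function y = 2x + 1 plus an alternating +/-0.3 perturbation.
train_x = [i / 9 for i in range(10)]
train_y = [2 * x + 1 + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(train_x)]
# Test points sit between the training points and follow the true function.
test_x = [(i + 0.5) / 9 for i in range(9)]
test_y = [2 * x + 1 for x in test_x]

simple = fit_line(train_x, train_y)
complex_ = fit_interpolant(train_x, train_y)
for name, f in [("line", simple), ("degree-9", complex_)]:
    print(name, squared_error(f, train_x, train_y), squared_error(f, test_x, test_y))
```

The degree-9 polynomial fits the perturbation as if it were signal and oscillates strongly near the ends of the interval; this is the overfitting behaviour the later topics address.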


Models
What are the choices for \( \hat{f} \)?




Data
Model
Task
Loss Function
How do we know which model is better?

\( \left [\begin{array}{lcr} 0.00\\ 0.10\\ 0.20\\ ....\\6.40 \end{array} \right]\)
\( \left [\begin{array}{lcr} 0.24\\ 0.08\\ 0.12\\ ....\\0.36 \end{array} \right]\)
\( x \)
\( y \)
?
\( \hat{f_1}(x) \)
\( \left [\begin{array}{lcr} 0.25\\ 0.09\\ 0.11\\ ....\\0.36 \end{array} \right]\)
\( \left [\begin{array}{lcr} 0.32\\ 0.30\\ 0.31\\ ....\\0.22 \end{array} \right]\)
\( \left [\begin{array}{lcr} 0.08\\ 0.20\\ 0.14\\ ....\\0.15 \end{array} \right]\)
\( \hat{f_1}(x) = a_1x^{25} + b_1x^{24} + ... + c_1x + d_1 \)
\( \hat{f_2}(x) = a_2x^{25} + b_2x^{24} + ... + c_2x + d_2 \)
\( \hat{f_3}(x) = a_3x^{25} + b_3x^{24} + ... + c_3x + d_3 \)
\( \begin{array}{lcr} 1\\ 2\\ 3\\ ....\\n \end{array} \)
\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)
\( \hat{f_2}(x) \)
\( \hat{f_3}(x) \)
\( \mathscr{L}_2 = \sum_{i=1}^{n} (y_i - \hat{f}_2(x_i))^2 \)
\( \mathscr{L}_3 = \sum_{i=1}^{n} (y_i - \hat{f}_3(x_i))^2 \)
True Function
\( \hat{f_1}(x) \)
\( \hat{f_2}(x) \)
\( \hat{f_3}(x) \)



Loss Function
How do we know which model is better?

\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 = 1.38\)
\( \mathscr{L}_2 = \sum_{i=1}^{n} (y_i - \hat{f}_2(x_i))^2 = 2.02\)
\( \mathscr{L}_3 = \sum_{i=1}^{n} (y_i - \hat{f}_3(x_i))^2 = 2.34 \)
In this course
Square Error Loss
Cross Entropy Loss
KL divergence
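A sketch of two of the losses listed above for a single example, assuming the true and predicted outputs are probability vectors (the prediction values are hypothetical). Terms with \( p_k = 0 \) are skipped, following the convention \( 0 \log 0 = 0 \); for a one-hot target the entropy \( H(p) \) is zero, so the KL divergence equals the cross-entropy.

```python
import math


def cross_entropy(p, q):
    # H(p, q) = -sum_k p_k log q_k, with p the true and q the predicted distribution.
    return -sum(pk * math.log(qk) for pk, qk in zip(p, q) if pk > 0)


def kl_divergence(p, q):
    # KL(p || q) = sum_k p_k log(p_k / q_k) = H(p, q) - H(p)
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


y_true = [0, 0, 1, 0, 0]             # one-hot label
y_pred = [0.1, 0.1, 0.6, 0.1, 0.1]   # hypothetical predicted probabilities
print(cross_entropy(y_true, y_pred))  # equals -log(0.6)
print(kl_divergence(y_true, y_pred))  # same value, since H(y_true) = 0
```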



\( \left [\begin{array}{lcr} 0.00\\ 0.10\\ 0.20\\ ....\\6.40 \end{array} \right]\)
\( \left [\begin{array}{lcr} 0.24\\ 0.08\\ 0.12\\ ....\\0.36 \end{array} \right]\)
\( x \)
\( y \)
\( \hat{f_1}(x) \)
\( \left [\begin{array}{lcr} 0.25\\ 0.09\\ 0.11\\ ....\\0.36 \end{array} \right]\)
\( \left [\begin{array}{lcr} 0.32\\ 0.30\\ 0.31\\ ....\\0.22 \end{array} \right]\)
\( \left [\begin{array}{lcr} 0.08\\ 0.20\\ 0.14\\ ....\\0.15 \end{array} \right]\)
\( \begin{array}{lcr} 1\\ 2\\ 3\\ ....\\n \end{array} \)
\( \hat{f_2}(x) \)
\( \hat{f_3}(x) \)
\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)
\( = (0.24-0.25)^2 + (0.08-0.09)^2 + \newline (0.12-0.11)^2 + ... + (0.36-0.36)^2 \)
\( = 1.38 \)
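The squared error loss on this slide can be computed as below. Only the four entries visible on the slide are used, so the result covers just those terms and does not reproduce the total of 1.38; the model comparison at the end reuses the three loss values reported on this slide.

```python
def squared_error_loss(y_true, y_pred):
    # L = sum_i (y_i - y_hat_i)^2
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))


# Only the four entries of y and f1_hat(x) that are visible on the slide.
y = [0.24, 0.08, 0.12, 0.36]
f1_pred = [0.25, 0.09, 0.11, 0.36]
print(squared_error_loss(y, f1_pred))  # approximately 0.0003

# Picking the better model = picking the smallest loss (values from the slide).
losses = {"f1": 1.38, "f2": 2.02, "f3": 2.34}
best = min(losses, key=losses.get)
print(best)  # f1
```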
Loss Function
What does a loss function look like?





Data
Model
Loss
Task
Learning Algorithm
How do we identify the parameters of the model?

\( \hat{f_1}(x) = 3.5x_1^2 + 2.5x_2^{3} + 1.2x_3^{2} \)
\( \hat{f_1}(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)
\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)

Budget (100 crore) | Box Office Collection (100 crore) | Action Scene Time (100 mins) | IMDB Rating |
---|---|---|---|
0.55 | 0.66 | 0.22 | 4.8 |
0.68 | 0.91 | 0.77 | 7.2 |
0.66 | 0.88 | 0.67 | 6.7 |
0.72 | 0.94 | 0.97 | 8.1 |
0.58 | 0.74 | 0.35 | 5.3 |


Learning Algorithm
How do you formulate this mathematically?

Find \( a, b, c \) such that \( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \) is minimized, where \( \hat{f_1}(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)
In practice, brute-force search is infeasible.

Budget (100 crore) | Box Office Collection (100 crore) | Action Scene Time (100 mins) | IMDB Rating |
---|---|---|---|
0.55 | 0.66 | 0.22 | 4.8 |
0.68 | 0.91 | 0.77 | 7.2 |
0.66 | 0.88 | 0.67 | 6.7 |
0.72 | 0.94 | 0.97 | 8.1 |
0.58 | 0.74 | 0.35 | 5.3 |
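A minimal sketch of plain batch gradient descent for this problem. Assumptions: the columns map to \( x_1 \) = budget, \( x_2 \) = box office collection, \( x_3 \) = action scene time and \( y \) = IMDB rating; the start point \( (0, 0, 0) \), learning rate 0.05 and 2000 steps are hand-picked. Because \( \hat{f_1} \) is linear in \( a, b, c \), the gradient of \( \mathscr{L}_1 \) is \( -2\sum_i (y_i - \hat{f}_1(x_i))\,\phi(x_i) \) with features \( \phi(x) = (x_1^2, x_2^3, x_3^2) \).

```python
# Rows of the table: ((budget, box office, action time), IMDB rating).
# Assumption: these columns play the roles of (x1, x2, x3) and y.
data = [
    ((0.55, 0.66, 0.22), 4.8),
    ((0.68, 0.91, 0.77), 7.2),
    ((0.66, 0.88, 0.67), 6.7),
    ((0.72, 0.94, 0.97), 8.1),
    ((0.58, 0.74, 0.35), 5.3),
]


def features(x):
    # f1_hat(x) = a*x1^2 + b*x2^3 + c*x3^2 is linear in (a, b, c) over these features.
    x1, x2, x3 = x
    return (x1 ** 2, x2 ** 3, x3 ** 2)


def predict(params, x):
    return sum(w * phi for w, phi in zip(params, features(x)))


def loss(params):
    # L1 = sum_i (y_i - f1_hat(x_i))^2
    return sum((y - predict(params, x)) ** 2 for x, y in data)


def gradient(params):
    # dL1/dw_k = -2 * sum_i (y_i - f1_hat(x_i)) * phi_k(x_i)
    grad = [0.0, 0.0, 0.0]
    for x, y in data:
        residual = y - predict(params, x)
        for k, phi in enumerate(features(x)):
            grad[k] -= 2 * residual * phi
    return grad


def gradient_descent(lr=0.05, steps=2000):
    params = [0.0, 0.0, 0.0]  # assumed starting point (a, b, c)
    for _ in range(steps):
        grad = gradient(params)
        params = [w - lr * g for w, g in zip(params, grad)]
    return params


a, b, c = gradient_descent()
print(loss([0.0, 0.0, 0.0]), loss([a, b, c]))
```

Each step moves \( (a, b, c) \) a little against the gradient instead of trying every combination, which is why this scales where brute-force search does not.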
Learning Algorithm
How do you formulate this mathematically?

\( \min_{a,b,c} \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \), where \( \hat{f_1}(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)
Many optimization solvers are available.

Budget (100 crore) | Box Office Collection (100 crore) | Action Scene Time (100 mins) | IMDB Rating |
---|---|---|---|
0.55 | 0.66 | 0.22 | 4.8 |
0.68 | 0.91 | 0.77 | 7.2 |
0.66 | 0.88 | 0.67 | 6.7 |
0.72 | 0.94 | 0.97 | 8.1 |
0.58 | 0.74 | 0.35 | 5.3 |
Learning Algorithm
How do you formulate this mathematically?

\( \min_{a,b,c} \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \), where \( \hat{f_1}(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)
Many optimization solvers are available.
In this course
Gradient Descent ++
Adagrad
RMSProp
Adam
Budget (100 crore) | Box Office Collection (100 crore) | Action Scene Time (100 mins) | IMDB Rating |
---|---|---|---|
0.55 | 0.66 | 0.22 | 4.8 |
0.68 | 0.91 | 0.77 | 7.2 |
0.66 | 0.88 | 0.67 | 6.7 |
0.72 | 0.94 | 0.97 | 8.1 |
0.58 | 0.74 | 0.35 | 5.3 |

Learning Algorithm
How do you formulate this mathematically?





Data
Model
Loss
Learning
Task
Evaluation
How do we compute a score for our ML model?

\( \left[\begin{array}{lcr} 2.1, 1.2, \dots, 5.6, 7.8 \end{array} \right]\)
\( \left[\begin{array}{lcr} 3.5, 6.6, \dots, 2.5, 6.3 \end{array} \right]\)



\( \left[\begin{array}{lcr} 6.3, 2.6, \dots, 4.5, 3.8 \end{array} \right]\)

\( \left[\begin{array}{lcr} 2.8, 3.6, \dots, 7.5, 2.1 \end{array} \right]\)

\( \left[\begin{array}{lcr} 6.3, 2.6, \dots, 4.5, 3.8 \end{array} \right]\)
True Labels
Predicted Labels
1
2
3
4
5
5
4
1
3
1
Class Labels | |
---|---|
Lion | 1 |
Tiger | 2 |
Cat | 3 |
Giraffe | 4 |
Dog | 5 |


\( \left[\begin{array}{lcr} 1.9, 3.3, \dots, 4.2, 1.1 \end{array} \right]\)
\( \left[\begin{array}{lcr} 2.2, 1.7, \dots, 2.5, 1.8 \end{array} \right]\)
3
5
2
5
Top-1
Evaluation
How do we compute a score for our ML model?

\( \left[\begin{array}{lcr} 2.1, 1.2, \dots, 5.6, 7.8 \end{array} \right]\)
\( \left[\begin{array}{lcr} 3.5, 6.6, \dots, 2.5, 6.3 \end{array} \right]\)



\( \left[\begin{array}{lcr} 6.3, 2.6, \dots, 4.5, 3.8 \end{array} \right]\)

\( \left[\begin{array}{lcr} 2.8, 3.6, \dots, 7.5, 2.1 \end{array} \right]\)

\( \left[\begin{array}{lcr} 6.3, 2.6, \dots, 4.5, 3.8 \end{array} \right]\)
True Labels
Predicted Labels
1
2
3
4
5
Class Labels | |
---|---|
Lion | 1 |
Tiger | 2 |
Cat | 3 |
Giraffe | 4 |
Dog | 5 |


\( \left[\begin{array}{lcr} 1.9, 3.3, \dots, 4.2, 1.1 \end{array} \right]\)
\( \left[\begin{array}{lcr} 2.2, 1.7, \dots, 2.5, 1.8 \end{array} \right]\)
3
5
Top-3
\( \left[\begin{array}{lcr} 1, 2, 3\end{array} \right]\)
\( \left[\begin{array}{lcr} 1, 2, 3\end{array} \right]\)
\( \left[\begin{array}{lcr} 1, 2, 3\end{array} \right]\)
\( \left[\begin{array}{lcr} 4, 5, 3\end{array} \right]\)
\( \left[\begin{array}{lcr} 5, 2, 1\end{array} \right]\)
\( \left[\begin{array}{lcr} 2, 1, 4\end{array} \right]\)
\( \left[\begin{array}{lcr} 5, 4, 1\end{array} \right]\)
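Top-k accuracy counts a prediction as correct when the true label is among the model's k highest-ranked classes, so top-1 is ordinary accuracy. A sketch with hypothetical labels (the slide's own predictions cannot be reliably recovered), using the class numbering from the table (1 = Lion, 2 = Tiger, 3 = Cat, 4 = Giraffe, 5 = Dog).

```python
def top_k_accuracy(true_labels, ranked_predictions, k):
    # ranked_predictions[i] lists class labels from most to least likely.
    hits = sum(
        1 for y, ranked in zip(true_labels, ranked_predictions) if y in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical true labels and ranked top-3 predictions.
true_labels = [1, 2, 3, 4, 5]
ranked = [[1, 2, 3], [3, 2, 1], [4, 5, 3], [4, 5, 3], [2, 1, 4]]
print(top_k_accuracy(true_labels, ranked, 1))  # 0.4
print(top_k_accuracy(true_labels, ranked, 3))  # 0.8
```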
Evaluation
How is this different from the loss function?

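[Figure: Evaluation vs. loss function for a Brake/Go example. Evaluation is shown as ratios of counts, \( \frac{\#(\,\cdot\,)}{\#(\,\cdot\,)} \) and \( maximize \; \frac{\#(\,\cdot\,)}{\#(\,\cdot\,) + \#(\,\cdot\,)} \); the counted items were images and are not recoverable.]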
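One way to see the difference, as a sketch with hypothetical predictions: an evaluation metric such as accuracy only counts right and wrong answers, so it can stay the same while the model's confidence changes, whereas a loss such as cross-entropy changes smoothly with the predicted probabilities and can therefore guide learning. The 0.5 decision threshold is an assumption.

```python
import math


def accuracy(y_true, p_pred, threshold=0.5):
    # Fraction of examples whose thresholded prediction matches the label.
    correct = sum(
        1 for y, p in zip(y_true, p_pred) if (p >= threshold) == (y == 1)
    )
    return correct / len(y_true)


def binary_cross_entropy(y_true, p_pred):
    # Sum over examples of -[y log p + (1 - y) log(1 - p)].
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(y_true, p_pred)
    )


y = [1, 0, 1, 0]
model_a = [0.6, 0.4, 0.6, 0.4]  # hypothetical: barely right
model_b = [0.9, 0.1, 0.9, 0.1]  # hypothetical: confidently right
print(accuracy(y, model_a), accuracy(y, model_b))  # 1.0 1.0
print(binary_cross_entropy(y, model_a) > binary_cross_entropy(y, model_b))  # True
```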
Evaluation
Should we learn and test on the same data?

\( \left[\begin{array}{lcr} 2.1, 1.2, \dots, 5.6, 7.8 \end{array} \right]\)
\( \left[\begin{array}{lcr} 3.5, 6.6, \dots, 2.5, 6.3 \end{array} \right]\)



\( \left[\begin{array}{lcr} 6.3, 2.6, \dots, 4.5, 3.8 \end{array} \right]\)

\( \left[\begin{array}{lcr} 2.8, 3.6, \dots, 7.5, 2.1 \end{array} \right]\)
\( \left[\begin{array}{lcr} 2.2, 1.7, \dots, 2.5, 1.8 \end{array} \right]\)
1
2
3
4
2
\( \left[\begin{array}{lcr} 6.3, 2.6, \dots, 4.5, 3.8 \end{array} \right]\)
\( \left[\begin{array}{lcr} 2.8, 3.6, \dots, 7.5, 2.1 \end{array} \right]\)
\( \left[\begin{array}{lcr} 2.2, 1.7, \dots, 2.5, 1.8 \end{array} \right]\)
1
3
4
Training Data
Test Data
\( \min_{a,b,c} \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \), where \( \hat{f_1}(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)
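A sketch of holding out test data, assuming a random shuffle with a fixed seed and a 30% test fraction (both illustrative choices; `data` is hypothetical): the parameters are learned only on `train`, and `test` is kept aside to evaluate the model on examples it has not seen.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    # Shuffle a copy so the caller's list is untouched, then hold out a test slice.
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = [(i, i % 2) for i in range(10)]  # hypothetical (x, y) pairs
train, test = train_test_split(data, test_fraction=0.3)
print(len(train), len(test))  # 7 3
```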
Evaluation
How is this different from the loss function?







Data
Model
Loss
Learning
Task
Evaluation
Putting it all together
How does all the jargon fit into these jars?

Linear Algebra
Probability
Calculus






Data
Model
Loss
Learning
Task
Evaluation









Data, democratisation, devices
Why is ML so successful?







Data
Model
Loss
Learning
Task
Evaluation








Improvised
Democratised
Abundance
Typical ML effort
How should you distribute your work across the six jars?

Your Job




Model
Loss
Learning
Evaluation






Data
Task


Connecting to the Capstone
How should you distribute your work across the six jars?


Mumbai





मुंबई \( \rightarrow \) Mumbai




\( \sum_{i=1}^{n} (y_i - \hat{f}(x_i))^2 \)
\( -\sum_{i=1}^{n} \log \hat{f}(x_i) \)
Accuracy
Precision/Recall
Top-k accuracy
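Precision and recall for a binary task, such as deciding whether an image contains text, can be sketched as below; the labels are hypothetical. Precision is the fraction of predicted positives that are correct, \( \frac{TP}{TP + FP} \), and recall is the fraction of actual positives that are found, \( \frac{TP}{TP + FN} \).

```python
def precision_recall(y_true, y_pred, positive=1):
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: 1 = image contains text, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 1, 1]
print(precision_recall(y_true, y_pred))  # (0.6, 0.75)
```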






Data
Model
Loss
Learning
Task
Evaluation






Assignment
How do you apply the six jars to a problem that you have encountered?

1. Formulate 3 problems from data.gov.in
2. In the dataturks labelled data, define tasks that you can perform and collect 10 data points for each, for example:
   - Binary classification of whether there is text
   - Detecting text with a bounding box: is accuracy easy to define here?
By varshini7