A defining framework for understanding concepts in the course
What did we see in the previous chapter?
(c) One Fourth Labs
How do you make sense of all the jargon?
What are the six jars of Machine Learning?
What is the fuel of Machine Learning?
How do you feed data to machines?
We encode all data into numbers, typically high-dimensional ones.
For instance, in this course you will learn to embed image and text data as large vectors.
Each data entry relates an input x to an output y: e.g., given an MRI scan (x), is there a tumour or not (y)?
All data encoded as numbers
Typically high dimensional
x (MRI scan, encoded) | y (tumour: 1 = yes, 0 = no) |
---|---|
[2.3, 5.9, ..., 11.0, -0.3, 8.9] | 0 |
[-8.5, -1.7, ..., -1.3, 9.0, 7.2] | 1 |
[-0.4, 6.7, ..., -2.4, 4.7, -7.3] | 0 |
[1.6, -0.4, ..., -4.6, 6.4, 1.9] | 1 |
[3.9, -4.1, ..., 6.7, -3.1, 2.1] | 1 |
[5.1, 3.7, ..., 1.8, -4.2, 9.3] | 1 |
Document | x (encoded) | y |
---|---|---|
Don't buy this MI 6 Pro, Speaker volume is very bad | [1.9, 3.2, ..., -9.8, -6.7, 1.2] | negative |
Delivered as shown. Good price and fits perfect | [1.3, 3.6, ..., -5.4, 9.1, 2.3] | positive |
What a phone.. A handy epic phone. MI at its best ... | [0.4, 7.6, ..., -0.1, -1.4, 8.7] | positive |
Its look stunning in pictures , but not in real. | [1.5, -0.8, ..., 7.8, 8.4, 0.3] | negative |
Amazing camera and battery. Good deal! | [2.5, -5.7, ..., 0.9, 5.3, -8.1] | positive |
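The encodings in the table are the kind of learned embeddings the course builds later. As a much simpler stand-in (our own choice of technique, not the course's method), a bag-of-words encoding already turns each review into a vector of numbers. `tokenize`, `build_vocabulary` and `encode` are hypothetical helper names, and mapping negative to 0 and positive to 1 is an assumption.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(documents):
    """Give every distinct word in the corpus its own column index."""
    vocab = {}
    for doc in documents:
        for word in tokenize(doc):
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(doc, vocab):
    """Bag-of-words vector: entry j counts how often vocabulary word j occurs.

    Iterating over the dict follows insertion order, which matches the indices.
    """
    counts = Counter(tokenize(doc))
    return [counts.get(word, 0) for word in vocab]

reviews = [
    "Don't buy this MI 6 Pro, Speaker volume is very bad",
    "Amazing camera and battery. Good deal!",
]
labels = [0, 1]  # hypothetical mapping: negative -> 0, positive -> 1

vocab = build_vocabulary(reviews)
X = [encode(review, vocab) for review in reviews]
```

Each row of `X` is one review as a vector whose length equals the vocabulary size; real corpora give vocabularies of many thousands of words, which is one reason such encodings are high-dimensional.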
How do you feed data to machines?
1.3 | -4.3 | 2.1 | -6.7 | ... | 1.5 | 8.9 | 10.1 | -4.5 |
2.6 | 7.9 | -0.3 | 8.1 | ... | -4.2 | 0.3 | 1.2 | 9.4 |
-5.2 | -3.2 | 4.2 | 0.3 | ... | 3.5 | 8.3 | -1.4 | -8.7 |
8.5 | 2.1 | -6.3 | 5.3 | ... | 7.2 | -1.3 | -4.5 | 11.8 |
2.3 | -5.6 | -1.2 | 7.8 | ... | 9.9 | 10.1 | -1.1 | 3.5 |
In this course
text
image
Where do I get the data from?
I am lucky
I am rich
I am smart
+ मुंबई
= मुंबई
In this course
What is the fuel of Machine Learning?
Data
What do you do with this data?
Input
Output
From product description to structured specifications
From specifications + reviews to writing FAQs
From specifications + reviews + FAQs to Question Answering
From specifications + reviews + personal data to recommendations
What do you do with this data?
From images identify people (e.g., Shahrukh Khan, Aamir Khan)
From images identify activities (e.g., Eating)
From images identify places (e.g., Gym)
From posts recommend posts
Output
Input
What do you do with this data?
Supervised
Classification
x | y |
---|---|
[3.2, 5.9, ..., 11.0, 8.9] | 1 |
[-8.5, -1.7, ..., 9.0, 7.2] | 1 |
[-0.4, 6.7, ..., 4.7, -7.2] | 0 |
[2.7, 3.1, ..., -2.1, 9.7] | 0 |
[3.9, 7.8, ..., -5.1, 3.7] | 0 |
[7.1, 0.9, ..., 1.5, -4.2] | 1 |
What do you do with this data?
Supervised
Regression
x | y |
---|---|
[-8.5, -1.7, ..., 9.0, 7.2] | [2.3, 1.2, 9.2, 10.1] |
[0.9, -2.1, ..., -8.1, 1.9] | [4.3, 4.2, 7.1, 5.1] |
[2.9, -4.5, ..., -3.7, 8.9] | [2.3, 7.2, 6.9, 7.3] |
What do you do with this data?
Clustering
Unsupervised
x |
---|
[3.2, 5.9, ..., 11.0, 8.9] |
[-8.5, -1.7, ..., 9.0, 7.2] |
[-0.4, 6.7, ..., 4.7, -4.1] |
[2.7, 3.1, ..., -2.1, 9.7] |
[3.9, 7.8, ..., -5.1, 3.7] |
[7.1, 0.9, ..., 1.5, -4.2] |
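Clustering has to group rows like those above without any y. One common algorithm for this is k-means; it is our choice of illustration here, not necessarily the one the course uses. A minimal NumPy sketch, where `k_means` is a hypothetical helper and `k`, `n_iters` and the seed are arbitrary:

```python
import numpy as np

def k_means(points, k, n_iters=20, seed=0):
    """Cluster the rows of `points` into k groups with plain k-means."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random (fancy indexing copies them).
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Distance from every point to every centroid: shape (n_points, k).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignments = distances.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = points[assignments == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return assignments, centroids

# Toy 2-D data with two obvious groups (made up for illustration).
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
assignments, centroids = k_means(points, k=2)
```

No labels are used anywhere: the groups come only from distances between the x vectors, which is what makes the task unsupervised.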
What do you do with this data?
Generation
Unsupervised
x |
---|
[3.2, 5.9, ..., 11.0, 8.9] |
[-8.5, -1.7, ..., 9.0, 7.2] |
[-0.4, 6.7, ..., 4.7, -4.1] |
[2.7, 3.1, ..., -2.1, 9.7] |
[3.9, 7.8, ..., -5.1, 3.7] |
[7.1, 0.9, ..., 1.5, -4.2] |
What do you do with this data?
Generation
Unsupervised
Tweets (encoded) |
---|
[2.3, 5.9, ..., 11.0, -0.3, 8.9] |
[-8.5, -1.7, ..., -1.3, 9.0, 7.2] |
[-0.4, 6.7, ..., -2.4, 4.7, -6.2] |
[1.6, -0.4, ..., -4.6, 6.4, 1.9] |
What do you do with this data?
"Supervised learning has created 99% of the economic value in AI"
In this course
Classification
Regression
What do you do with this data?
Data
Task
What is the mathematical formulation of a task?
\( x \): images encoded as vectors, e.g. \( [2.1, 1.2, \dots, 5.6, 7.2] \), \( [0.1, 3.1, \dots, 1.7, 3.4] \), \( [0.5, 9.1, \dots, 5.1, 0.8] \), \( [1.2, 4.1, \dots, 6.3, 7.4] \), \( [3.2, 2.1, \dots, 3.1, 0.9] \)
\( y \): one-hot class labels over bat, car, dog, cat, ship, e.g. \( [0, 0, 1, 0, 0] \), \( [0, 1, 0, 0, 0] \), \( [0, 0, 0, 0, 1] \), \( [1, 0, 0, 0, 0] \), \( [0, 0, 1, 0, 0] \)
\( y = f(x) \) [true relation, unknown]
\( \hat{y} = \hat{f}(x) \) [our approximation]
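Each y here is a one-hot vector: all zeros except a single 1 at the position of the true class. A small sketch, assuming the classes are indexed in the order bat, car, dog, cat, ship (the slide does not fix an order); `one_hot` and `decode` are hypothetical names.

```python
CLASSES = ["bat", "car", "dog", "cat", "ship"]  # assumed class order

def one_hot(label, classes=CLASSES):
    """Vector of zeros with a 1 at the index of `label`."""
    vec = [0] * len(classes)
    vec[classes.index(label)] = 1
    return vec

def decode(scores, classes=CLASSES):
    """Map a one-hot vector (or a vector of scores) to the class with the largest entry."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best]

print(one_hot("dog"))                         # [0, 0, 1, 0, 0]
print(decode([0.1, 0.6, 0.2, 0.05, 0.05]))    # car
```

`decode` also works on a model's soft output \( \hat{y} \), which is how a vector of scores is turned back into a class name.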
What are the choices for \( \hat{f} \)?
\( \hat{y} = \hat{f}(x) \) [our approximation]
Data: \( x = [0.5, 0.2, 0.6, \dots, 0.3]^T \), \( y = [14.8, 13.3, 11.6, \dots, 6.16]^T \)
[Plot: the function the data is drawn from]
\( \hat{y} = mx + c \)
\( \hat{y} = ax^2 + bx + c \)
\( \hat{y} = ax^3 + bx^2 + cx + d \)
\( \hat{y} = ax^4 + bx^3 + cx^2 + dx + e \)
\( \hat{y} = ax^{25} + bx^{24} + \dots + cx + d \)
\( \hat{y} = \sigma(wx + b) \)
\( \hat{y} = Deep\_NN(x) \)
In this course
\( \hat{y} = Deep\_CNN(x) \) ...
\( \hat{y} = RNN(x) \) ...
Why not always use a complex model?
Data: \( x = [0.1, 0.2, 0.4, \dots, 0.8]^T \), \( y = [2.6, 2.4, 3.1, \dots, 4.1]^T \)
\( y = mx + c \) [true function, simple]
\( \hat{y} = ax^{100} + bx^{99} + \dots + c \) [our approximation, very complex]
Later in this course
Bias-Variance Tradeoff
Overfitting
Regularization
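A small NumPy sketch of this effect, under assumptions of our own: the true function is \( y = 2x + 1 \) plus Gaussian noise, there are 10 training points and 10 separate test points, and the "complex" model is a degree-9 polynomial (a degree-100 fit would need at least 101 points to be determined). `numpy.polynomial.Polynomial.fit` performs a least-squares polynomial fit.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_f(x):
    """Hypothetical simple true function: y = 2x + 1."""
    return 2 * x + 1

# 10 noisy training points and 10 different noisy test points.
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_f(x_train) + rng.normal(scale=0.1, size=x_train.shape)
x_test = np.linspace(0.05, 0.95, 10)
y_test = true_f(x_test) + rng.normal(scale=0.1, size=x_test.shape)

def squared_error(y, y_hat):
    return float(np.sum((y - y_hat) ** 2))

results = {}
for degree in (1, 9):
    # Least-squares fit; with degree 9 the curve can pass through all 10 training points.
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (squared_error(y_train, model(x_train)),
                       squared_error(y_test, model(x_test)))

for degree, (train_err, test_err) in results.items():
    print(f"degree {degree}: train error {train_err:.4f}, test error {test_err:.4f}")
```

The degree-9 polynomial drives the training error to essentially zero by fitting the noise; its test error is typically larger than the straight line's, which is the overfitting the slide warns about.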
What are the choices for \( \hat{f} \)?
Model
Data
Task
How do we know which model is better?
[Plot: the true function and the fitted models \( \hat{f}_1(x) \), \( \hat{f}_2(x) \), \( \hat{f}_3(x) \)]
Data (\( i = 1, 2, 3, \dots, n \)): \( x = [0.00, 0.10, 0.20, \dots, 6.40]^T \), \( y = [0.24, 0.08, 0.12, \dots, 0.36]^T \)
Predictions \( \hat{f}_1(x_i) \): \( [0.25, 0.09, 0.11, \dots, 0.36]^T \)
Predictions \( \hat{f}_2(x_i) \): \( [0.32, 0.30, 0.31, \dots, 0.22]^T \)
Predictions \( \hat{f}_3(x_i) \): \( [0.08, 0.20, 0.14, \dots, 0.15]^T \)
\( \hat{f}_1(x) = 1.79x^{25} - 4.54 x^{24} + \dots - 1.48x + 2.48 \)
\( \hat{f}_2(x) = 2.27x^{25} + 9.89x^{24} + \dots + 2.79x + 3.22 \)
\( \hat{f}_3(x) = 3.78x^{25} + 1.57x^{24} + \dots + 1.01x + 8.68 \)
Whose function is better? Why not use numbers?
\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)
\( \mathscr{L}_2 = \sum_{i=1}^{n} (y_i - \hat{f}_2(x_i))^2 \)
\( \mathscr{L}_3 = \sum_{i=1}^{n} (y_i - \hat{f}_3(x_i))^2 \)
How do we know which model is better?
\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 = (0.24-0.25)^2 + (0.08-0.09)^2 + (0.12-0.11)^2 + \dots + (0.36-0.36)^2 = 1.38 \)
\( \mathscr{L}_2 = \sum_{i=1}^{n} (y_i - \hat{f}_2(x_i))^2 = 2.02 \)
\( \mathscr{L}_3 = \sum_{i=1}^{n} (y_i - \hat{f}_3(x_i))^2 = 2.34 \)
\( \hat{f}_1 \) has the lowest loss, so it is the best of the three models on this data.
In this course
Squared Error Loss
Cross Entropy Loss
KL Divergence
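A minimal sketch of the squared error loss, using only the four rows that are visible above. The elided rows are not reproduced, so these totals are much smaller than 1.38, 2.02 and 2.34; the helper name `squared_error_loss` is hypothetical.

```python
def squared_error_loss(y, y_hat):
    """Sum over examples of (true value - prediction)^2."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

# The four visible rows of the table (first three and the last one).
y = [0.24, 0.08, 0.12, 0.36]
predictions = {
    "f1": [0.25, 0.09, 0.11, 0.36],
    "f2": [0.32, 0.30, 0.31, 0.22],
    "f3": [0.08, 0.20, 0.14, 0.15],
}

losses = {name: squared_error_loss(y, f) for name, f in predictions.items()}
best = min(losses, key=losses.get)  # model with the smallest loss
```

On these four rows the losses are about 0.0003, 0.1105 and 0.0845, so \( \hat{f}_1 \) still wins, but \( \hat{f}_3 \) now beats \( \hat{f}_2 \), the opposite of the full-data ranking. The loss has to be computed over all n points to compare models fairly.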
What does a loss function look like?
Loss
Model
Data
Task
How do we identify parameters of the model?
\( \hat{f_1}(x) = 3.5x_1^2 + 2.5x_2^{3} + 1.2x_3^{2} \)
\( \hat{f_1}(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)
\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)
Budget (100 crore) | Box Office Collection (100 crore) | Action Scene Time (100 min) | IMDB Rating |
---|---|---|---|
0.55 | 0.66 | 0.22 | 4.8 |
0.68 | 0.91 | 0.77 | 7.2 |
0.66 | 0.88 | 0.67 | 6.7 |
0.72 | 0.94 | 0.97 | 8.1 |
0.58 | 0.74 | 0.35 | 5.3 |
How do you formulate this mathematically?
Find \( a, b, c \) such that
\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)
is minimized, where \( \hat{f}_1(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \).
In practice, brute-force search is infeasible.
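To see why brute force breaks down, here is a sketch that tries every combination of a, b, c on a coarse grid over the movie table. Assumptions: the first three columns are \( x_1, x_2, x_3 \), the IMDB rating is y, and the grid of 21 values per parameter in [-10, 10] is our own arbitrary choice. Even this coarse grid needs \( 21^3 = 9261 \) loss evaluations; a model with p parameters would need \( 21^p \).

```python
import itertools
import numpy as np

# Columns: budget, box office collection, action scene time (assumed to be x1, x2, x3).
X = np.array([[0.55, 0.66, 0.22],
              [0.68, 0.91, 0.77],
              [0.66, 0.88, 0.67],
              [0.72, 0.94, 0.97],
              [0.58, 0.74, 0.35]])
y = np.array([4.8, 7.2, 6.7, 8.1, 5.3])  # IMDB rating

def predict(params, X):
    """f1(x) = a*x1^2 + b*x2^3 + c*x3^2 for every row of X."""
    a, b, c = params
    return a * X[:, 0] ** 2 + b * X[:, 1] ** 3 + c * X[:, 2] ** 2

def loss(params):
    """Squared error loss L1 summed over the rows of the table."""
    return float(np.sum((y - predict(params, X)) ** 2))

grid = np.linspace(-10, 10, 21)  # hypothetical search range and resolution
candidates = list(itertools.product(grid, repeat=3))
best = min(candidates, key=loss)
print(len(candidates), "evaluations, best (a, b, c) =", best, "loss =", loss(best))
```

The answer is also only as precise as the grid spacing; refining the grid multiplies the cost again, which is why the next slides turn to optimization solvers.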
\( \min_{a,b,c} \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)
Many optimization solvers are available.
In this course
Gradient Descent ++
Adagrad
RMSProp
Adam
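A sketch of plain gradient descent on the movie table. Assumptions: the first three columns are \( x_1, x_2, x_3 \), the IMDB rating is y, a, b, c start at 0, and the learning rate 0.1 and 5000 steps are arbitrary choices. Adagrad, RMSProp and Adam keep this loop but change how each step is computed from the gradient (per-parameter step sizes, and for Adam also momentum).

```python
import numpy as np

X = np.array([[0.55, 0.66, 0.22],
              [0.68, 0.91, 0.77],
              [0.66, 0.88, 0.67],
              [0.72, 0.94, 0.97],
              [0.58, 0.74, 0.35]])
y = np.array([4.8, 7.2, 6.7, 8.1, 5.3])

# Feature matrix: one column per parameter of f1(x) = a*x1^2 + b*x2^3 + c*x3^2.
Phi = np.column_stack([X[:, 0] ** 2, X[:, 1] ** 3, X[:, 2] ** 2])

def loss(w):
    """Squared error loss L1 for parameters w = (a, b, c)."""
    return float(np.sum((y - Phi @ w) ** 2))

def gradient(w):
    """dL1/dw = -2 * Phi^T (y - Phi w)."""
    return -2.0 * Phi.T @ (y - Phi @ w)

w = np.zeros(3)        # start from a = b = c = 0
learning_rate = 0.1    # hypothetical; small enough to be stable on this data
initial_loss = loss(w)
for _ in range(5000):
    w = w - learning_rate * gradient(w)   # step against the gradient
final_loss = loss(w)
print("a, b, c =", w, "loss:", initial_loss, "->", final_loss)
```

Because \( \hat{f}_1 \) is linear in a, b, c, this particular problem also has a closed-form least-squares solution (`np.linalg.lstsq(Phi, y, rcond=None)`), which is a handy check; gradient descent matters for models such as deep networks where no closed form exists.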
Learning
Loss
Model
Data
Task
How do we compute a score for our ML model?
[Figure: images encoded as vectors \( x \), each shown with its true label and the model's predicted label; score: Top-1 accuracy]
Class | Label |
---|---|
Lion | 1 |
Tiger | 2 |
Cat | 3 |
Giraffe | 4 |
Dog | 5 |
[Figure: the same images, each shown with its true label and the model's three highest-ranked predicted labels; score: Top-3 accuracy]
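Top-1 and Top-3 are instances of top-k accuracy: a prediction counts as correct if the true label is among the model's k highest-scoring classes. A NumPy sketch with made-up scores; `top_k_accuracy` is a hypothetical helper, and classes are 0-indexed here (subtract 1 from the slide's labels).

```python
import numpy as np

def top_k_accuracy(scores, true_labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    scores = np.asarray(scores)
    true_labels = np.asarray(true_labels)
    # Column indices of the k largest scores in each row, highest first.
    top_k = np.argsort(scores, axis=1)[:, ::-1][:, :k]
    hits = (top_k == true_labels[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical scores for 3 images over 3 classes.
scores = [[0.1, 0.6, 0.3],
          [0.5, 0.2, 0.3],
          [0.2, 0.3, 0.5]]
true_labels = [2, 0, 0]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(scores, true_labels, k):.3f}")
```

Top-k accuracy can only grow as k grows, and with k equal to the number of classes it is always 1, so k is normally kept small relative to the number of classes.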
How is this different from the loss function?
Evaluation
Brake/Go
Loss function
\( \text{maximize} \)
Brake
Dog
Should we learn and test on the same data?
[Figure: labelled examples \( x \) with their class labels, split into Training Data and Test Data]
\( \min_{a,b,c} \mathscr{L}_1 = \min_{a,b,c} \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \), where \( \hat{f}_1(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)
The parameters are learned by minimizing the loss on the Training Data only; the Test Data is held out to evaluate the learned model.
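A minimal sketch of holding out test data: shuffle the row indices once, then keep a fraction aside. `split_indices` is a hypothetical helper, and the 30% test fraction and seed are arbitrary; scikit-learn's `sklearn.model_selection.train_test_split` offers the same idea with more options.

```python
import numpy as np

def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle row indices, then hold out a fraction of them as the test set."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train indices, test indices)

train_idx, test_idx = split_indices(10, test_fraction=0.3)
print("train:", train_idx, "test:", test_idx)
```

The loss is minimized using only the rows in `train_idx`; the evaluation score is then computed on `test_idx`, which the model never saw during learning.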
How is this different from the loss function?
Learning
Loss
Model
Data
Task
Evaluation
How does all the jargon fit into these jars?
Linear Algebra
Probability
Calculus
Data
Model
Loss
Learning
Task
Evaluation
Why is ML so successful?
Data
Model
Loss
Learning
Task
Evaluation
Improved
Democratised
Abundance
How to distribute your work through the six jars?
Your Job
Model
Loss
Learning
Evaluation
Data
Task
Mix and Match
How to distribute your work through the six jars?
मुंबई \( \rightarrow \) Mumbai
\( \sum_{i=1}^{n} (y_i - \hat{f}(x_i))^2 \)
\( -\sum_{i=1}^{n} \log \hat{f}(x_i)_{y_i} \) [\( \hat{f}(x_i)_{y_i} \): predicted probability of the true class]
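The second loss is the cross-entropy loss when \( \hat{f}(x_i) \) is read as the probability the model assigns to the true class of example i. A minimal sketch with made-up probabilities; `cross_entropy` is a hypothetical helper.

```python
import math

def cross_entropy(probabilities, labels):
    """-sum_i log p_i[y_i]: large when the true class gets low probability."""
    return -sum(math.log(p[y]) for p, y in zip(probabilities, labels))

# Hypothetical predicted class probabilities for two examples (each row sums to 1).
probabilities = [[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1]]
labels = [0, 1]  # index of the true class for each example

loss = cross_entropy(probabilities, labels)  # -(log 0.7 + log 0.8), about 0.580
```

A perfectly confident correct prediction (probability 1 on the true class) contributes 0, while a probability near 0 on the true class makes the loss blow up.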
Accuracy
Precision/Recall
Top-k accuracy
Data
Task
Model
Loss
Learning
Evaluation
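The evaluation measures listed above can be sketched for a binary task where 1 is the positive class. `binary_metrics` is a hypothetical helper, and returning 0.0 when precision or recall is undefined (no predicted or no actual positives) is our own convention.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

accuracy, precision, recall = binary_metrics([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])
# accuracy 0.6, precision 2/3, recall 2/3
```

Precision asks "of the items flagged positive, how many were right?", recall asks "of the truly positive items, how many were found?"; accuracy alone can hide a poor score on either when one class is rare.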
How do you apply the six jars to a problem that you have encountered?
Explain the problem
Give link to the quiz
1. Formulate 3 problems from data.gov.in.
2. In the Dataturks labelled data, define the tasks that you can perform and collect 10 data points for each.
// Binary classification of whether there is text
// Detect text with a bounding box: is accuracy easy to define here?