Recap: Machine Learning

What did we see in the previous chapter?

(c) One Fourth Labs

A jargon cloud

How do you make sense of all the jargon?

From jargons to jars

What are the six jars of Machine Learning?

Data data everywhere

What is the fuel of Machine Learning?

Data data everywhere

How do you feed data to machines?

Input-1	Input-2	Input-3	Input-4	y
2.3	5.9	11.0	-10.3	0
-8.5	-1.7	-1.3	9.0	0
12.3	5.4	3.4	2.4	1
1.9	7.9	8.1	-3.3	1
-9.1	1.2	-2.1	7.8	0
3.2	-11.2	5.6	12.1	1
4.5	3.75	-1.2	-10.0	1

All data encoded as numbers

Typically high dimensional

\mathbb{R}^n
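As a minimal sketch of what "all data encoded as numbers" can look like in code, the table above can be held as a list of input vectors in \mathbb{R}^4 plus a list of labels. The names `rows`, `X`, `y` and `n` are illustrative and do not come from the slides.

```python
# Each row of the table: four numeric inputs followed by the label y.
rows = [
    (2.3, 5.9, 11.0, -10.3, 0),
    (-8.5, -1.7, -1.3, 9.0, 0),
    (12.3, 5.4, 3.4, 2.4, 1),
    (1.9, 7.9, 8.1, -3.3, 1),
    (-9.1, 1.2, -2.1, 7.8, 0),
    (3.2, -11.2, 5.6, 12.1, 1),
    (4.5, 3.75, -1.2, -10.0, 1),
]

# Split every row into an input vector x and its label y.
X = [list(row[:-1]) for row in rows]
y = [row[-1] for row in rows]

n = len(X[0])  # dimension of each input vector (here 4)
print(n, len(X), y)
```

Real image or text data would use a much larger n, but the layout (one vector per example, one label per example) stays the same.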

Data data everywhere

How do you feed data to machines?

We encode all data as numbers, typically as high-dimensional vectors.

For instance, in this course you will learn to embed image and text data as large vectors.

Data entries are related: e.g., given an MRI scan, is there a tumour or not?

Each row is an MRI scan encoded as a vector x in \mathbb{R}^n, followed by its label y (1 = tumour, 0 = no tumour):

2.3	5.9	...	11.0	-0.3	8.9	0
-8.5	-1.7	...	-1.3	9.0	7.2	1
-0.4	6.7	...	-2.4	4.7	-7.2	0
1.6	-0.4	...	-4.6	6.4	1.9	1

Data data everywhere

How do you feed data to machines?

Each review is encoded as a (typically high-dimensional) vector x in \mathbb{R}^n; the label y records whether the review is positive or negative.

2.3	5.9	...	11.0	-0.3	8.9

-8.5	-1.7	...	-1.3	9.0	7.2

-0.4	6.7	...	-2.4	4.7	-6.2

1.6	-0.4	...	-4.6	6.4	1.9

Don't buy this MI 6 Pro, Speaker volume is very bad

Delivered as shown. Good price and fits perfect

What a phone.. A handy epic phone. MI at its best ...

Its look stunning in pictures , but not in real.

negative

positive
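One simple way to turn review text into a vector is a bag-of-words count. This is only an illustrative stand-in: the embeddings taught later in the course are a different technique, and the helper names `tokenize`, `vocab` and `encode` are assumptions made for this sketch.

```python
import re

# Two of the reviews shown above.
reviews = [
    "Don't buy this MI 6 Pro, Speaker volume is very bad",
    "Delivered as shown. Good price and fits perfect",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

# Vocabulary: every distinct token, in a fixed (sorted) order.
vocab = sorted({tok for review in reviews for tok in tokenize(review)})
index = {tok: i for i, tok in enumerate(vocab)}

def encode(text):
    """Return a count vector with one entry per vocabulary word."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in index:  # words outside the vocabulary are ignored
            vec[index[tok]] += 1
    return vec

vectors = [encode(review) for review in reviews]
print(len(vocab), vectors[0])
```

Every review maps to a vector of the same length, which is what lets a model treat text as a point in \mathbb{R}^n.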

Data data everywhere

How do you feed data to machines?

Input-1	Input-2	Input-3	Input-4	y
4.3	5.9	1.0	13.2	Positive
-9.5	1.7	1.3	9.2	Positive
2.3	5.4	3.8	2.9	Negative
19.1	8.9	8.2	-3.3	Positive
-9.2	11.2	-12.1	1.8	Positive
4.5	-11.2	4.6	2.1	Negative
12.2	-3.8	0.2	-1.0	Negative

All data encoded as numbers

Typically high dimensional

\mathbb{R}^n

Data data everywhere

How do you feed data to machines?

[Figure: data items shown as long vectors of numbers, with middle entries elided]

All data encoded as numbers, typically high dimensional (\mathbb{R}^n)

In this course

text

image

Data curation

Where do I get the data from?

I am lucky

I am rich

I am smart

+ मुंबई

= मुंबई

In this course

Data data everywhere

What is the fuel of Machine Learning?

Data

Tasks

What do you do with this data?

Input

Output

Hello John,

From product description to structured specifications

From specifications + reviews to writing FAQs

From specifications + reviews + FAQs to Question Answering

From specifications + reviews + personal data to recommendations

Tasks

What do you do with this data?

From images identify people

Shahrukh Khan

Aamir Khan

From images identify activities

Eating

From images identify places

Gym

From posts recommend posts

Output

Input

Tasks

What do you do with this data?

Supervised

Classification

x

y

3.2	5.9	...	11.0	8.9	1

-8.5	-1.7	...	9.0	7.2	1

-0.4	6.7	...	4.7	-7.2	0

2.7	3.1	...	-2.1	9.7	0

3.9	7.8	...	-5.1	3.7	0

7.1	0.9	...	1.5	-4.2	1

Tasks

What do you do with this data?

Supervised

Regression

Each row is an input vector x followed by four real-valued outputs (left\_x, left\_y, width, height):

-8.5	-1.7	...	9.0	7.2	2.3	1.2	9.2	10.1
0.9	-2.1	...	-8.1	1.9	4.3	4.2	7.1	5.1
2.9	-4.5	...	-3.7	8.9	2.3	7.2	6.9	7.3

Tasks

What do you do with this data?

Clustering

Unsupervised

x

3.2	5.9	...	11.0	8.9

-8.5	-1.7	...	9.0	7.2

-0.4	6.7	...	4.7	-4.1

2.7	3.1	...	-2.1	9.7

3.9	7.8	...	-5.1	3.7

7.1	0.9	...	1.5	-4.2

Tasks

What do you do with this data?

Generation

Unsupervised

x

3.2	5.9	...	11.0	8.9

-8.5	-1.7	...	9.0	7.2

-0.4	6.7	...	4.7	-4.1

2.7	3.1	...	-2.1	9.7

3.9	7.8	...	-5.1	3.7

7.1	0.9	...	1.5	-4.2

Tasks

What do you do with this data?

Generation

Unsupervised

x

Tweets

2.3	5.9	...	11.0	-0.3	8.9

-8.5	-1.7	...	-1.3	9.0	7.2

-0.4	6.7	...	-2.4	4.7	-6.2

1.6	-0.4	...	-4.6	6.4	1.9

Tasks

What do you do with this data?

"Supervised Learning has created 99% of economic value in AI"

In this course

Classification

Regression

left\_x, left\_y, width, height

Tasks

What do you do with this data?

Data

Task

What is the mathematical formulation of a task?

\( x \)

\( y \)

bat

car

dog

cat

Models

\( \left[\begin{array}{lcr} 2.1, 1.2, \dots, 5.6, 7.2 \end{array} \right]\)

\( \left[\begin{array}{lcr} 0, 0, 1,0, 0 \end{array} \right]\)

\( y = f(x) \) [true relation, unknown]

\( \hat{y} = \hat{f}(x) \) [our approximation]

ship

\( \left[\begin{array}{lcr} 0, 1, 0, 0, 0 \end{array} \right]\)

\( \left[\begin{array}{lcr} 0, 0, 0, 0, 1 \end{array} \right]\)

\( \left[\begin{array}{lcr} 1, 0, 0, 0, 0 \end{array} \right]\)

\( \left[\begin{array}{lcr} 0, 0, 1, 0, 0 \end{array} \right]\)

\( \left[\begin{array}{lcr} 0.1, 3.1, \dots, 1.7, 3.4\end{array} \right]\)

\( \left[\begin{array}{lcr} 0.5, 9.1,\dots, 5.1, 0.8 \end{array} \right]\)

\( \left[\begin{array}{lcr} 1.2, 4.1, \dots, 6.3, 7.4 \end{array} \right]\)

\( \left[\begin{array}{lcr} 3.2, 2.1, \dots, 3.1, 0.9 \end{array} \right]\)
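The 0/1 vectors above are one-hot encodings of the class names. A minimal sketch follows; the ordering of `classes` is an assumption, since the slides do not state which position belongs to which class.

```python
# Assumed class ordering; the slides do not say which index maps to which class.
classes = ["bat", "car", "dog", "cat", "ship"]

def one_hot(label, classes):
    """Return a vector with a single 1 at the position of `label`."""
    vec = [0] * len(classes)
    vec[classes.index(label)] = 1
    return vec

y_dog = one_hot("dog", classes)
print(y_dog)
```

With this ordering, "dog" maps to the third position, so y has exactly one non-zero entry per example.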

Models

What are the choices for \( \hat{f} \) ?

\( \hat{y} = mx + c \)

\(\hat{ y} = ax^2 + bx + c \)

\( \hat{y} = \sigma(wx + b) \)

\( \hat{y} = Deep\_NN(x) \)

\( \hat{y} = \hat{f}(x) \) [our approximation]

\( \left [\begin{array}{lcr} 0.5\\ 0.2\\ 0.6\\ \dots\\0.3\ \end{array} \right]\)

\( \left [\begin{array}{lcr} 14.8\\ 13.3\\ 11.6\\ \dots\\6.16 \end{array} \right]\)

\( x \)

\( y \)

\(\hat{ y} = ax^3 + bx^2 + cx + d \)

\(\hat{ y} = ax^4 + bx^3 + cx^2 + dx + e \)

Data

In this course

\( \hat{y} = Deep\_CNN(x) \) ...

\( \hat{y} = RNN(x) \) ...

Data is drawn from the following distribution

\(\hat{ y} = ax^{25} + bx^{24} + \dots + cx + d \)
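The candidate models listed on this slide differ only in the functional form chosen for \( \hat{f} \). The sketch below writes three of them as plain Python functions for a scalar input; the function names and the coefficient ordering (highest degree first) are assumptions made for illustration.

```python
import math

def linear(x, m, c):
    """y_hat = m*x + c"""
    return m * x + c

def polynomial(x, coeffs):
    """Evaluate a polynomial with coefficients given highest degree first.

    Uses Horner's rule, so a degree-25 model just needs 26 coefficients.
    """
    result = 0.0
    for coef in coeffs:
        result = result * x + coef
    return result

def sigmoid_model(x, w, b):
    """y_hat = sigma(w*x + b), with sigma the logistic function."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

print(linear(2.0, 3.0, 1.0))
print(polynomial(2.0, [1.0, 2.0, 3.0]))  # x^2 + 2x + 3 at x = 2
print(sigmoid_model(0.0, 1.0, 0.0))
```

Choosing a model means choosing one of these families; the parameters (m, c, the coefficients, w, b) are then learned from data.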

Models

Why not always use a complex model?

\( \left [\begin{array}{lcr} 0.1\\ 0.2\\ 0.4\\ ....\\0.8 \end{array} \right]\)

\( \left [\begin{array}{lcr} 2.6\\ 2.4\\ 3.1\\ ....\\4.1 \end{array} \right]\)

\( x \)

\( y \)

\( y = mx + c \) [true function, simple]

\(\hat{y} = ax^{100} + bx^{99} + ... + c \)

[our approximation, very complex]

Later in this course

Bias-Variance Tradeoff

Overfitting

Regularization

Models

What are the choices for \( \hat{f} \)?

Data

Model

Task

Loss Function

How do we know which model is better?

\( \left [\begin{array}{lcr} 0.00\\ 0.10\\ 0.20\\ ....\\6.40 \end{array} \right]\)

\( \left [\begin{array}{lcr} 0.24\\ 0.08\\ 0.12\\ ....\\0.36 \end{array} \right]\)

\( x \)

\( y \)

?

\( \hat{f_1}(x) \)

\( \left [\begin{array}{lcr} 0.25\\ 0.09\\ 0.11\\ ....\\0.36 \end{array} \right]\)

\( \left [\begin{array}{lcr} 0.32\\ 0.30\\ 0.31\\ ....\\0.22 \end{array} \right]\)

\( \left [\begin{array}{lcr} 0.08\\ 0.20\\ 0.14\\ ....\\0.15 \end{array} \right]\)

\( \hat{f_1}(x) = a_1x^{25} + b_1x^{24} + ... + c_1x + d_1 \)

\( \hat{f_2}(x) = a_2x^{25} + b_2x^{24} + ... + c_2x + d_2 \)

\( \hat{f_3}(x) = a_3x^{25} + b_3x^{24} + ... + c_3x + d_3 \)

\( \begin{array}{lcr} 1\\ 2\\ 3\\ ....\\n \end{array} \)

\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)

\( \hat{f_2}(x) \)

\( \hat{f_3}(x) \)

\( \mathscr{L}_2 = \sum_{i=1}^{n} (y_i - \hat{f}_2(x_i))^2 \)

\( \mathscr{L}_3 = \sum_{i=1}^{n} (y_i - \hat{f}_3(x_i))^2 \)

True Function

\( \hat{f_1}(x) \)

\( \hat{f_2}(x) \)

\( \hat{f_3}(x) \)

Loss Function

How do we know which model is better?

\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 = 1.38\)

\( \mathscr{L}_2 = \sum_{i=1}^{n} (y_i - \hat{f}_2(x_i))^2 = 2.02\)

\( \mathscr{L}_3 = \sum_{i=1}^{n} (y_i - \hat{f}_3(x_i))^2 = 2.34 \)

In this course

Square Error Loss

Cross Entropy Loss

KL divergence

\( \left [\begin{array}{lcr} 0.00\\ 0.10\\ 0.20\\ ....\\6.40 \end{array} \right]\)

\( \left [\begin{array}{lcr} 0.24\\ 0.08\\ 0.12\\ ....\\0.36 \end{array} \right]\)

\( x \)

\( y \)

\( \hat{f_1}(x) \)

\( \left [\begin{array}{lcr} 0.25\\ 0.09\\ 0.11\\ ....\\0.36 \end{array} \right]\)

\( \left [\begin{array}{lcr} 0.32\\ 0.30\\ 0.31\\ ....\\0.22 \end{array} \right]\)

\( \left [\begin{array}{lcr} 0.08\\ 0.20\\ 0.14\\ ....\\0.15 \end{array} \right]\)

\( \begin{array}{lcr} 1\\ 2\\ 3\\ ....\\n \end{array} \)

\( \hat{f_2}(x) \)

\( \hat{f_3}(x) \)

\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)

\( = (0.24-0.25)^2 + (0.08-0.09)^2 + (0.12-0.11)^2 + \dots + (0.36-0.36)^2 \)

\( = 1.38 \)
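The squared error loss above can be computed in a few lines. This sketch uses only the four rows of y and \( \hat{f}_1(x) \) that are visible on the slide, so its result covers those rows alone and will not match the total over all n rows; the function name `squared_error_loss` is an assumption.

```python
def squared_error_loss(y_true, y_pred):
    """Sum of squared differences between true and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))

# Only the rows visible on the slide; the elided rows are not reproduced.
y = [0.24, 0.08, 0.12, 0.36]
f1 = [0.25, 0.09, 0.11, 0.36]

loss_visible = squared_error_loss(y, f1)
print(loss_visible)  # roughly 0.0003 for these four rows
```

Computing the same quantity for each candidate model and picking the smallest value is how the models on this slide are ranked.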

Loss Function

What does a loss function look like?

Data

Model

Loss

Task

Learning Algorithm

How do we identify parameters of the model?

\( \hat{f_1}(x) = 3.5x_1^2 + 2.5x_2^{3} + 1.2x_3^{2} \)

\( \hat{f_1}(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)

\( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)

Budget (100 crore)	Box Office Collection (100 crore)	Action Scene Time (100 mins)	IMDB Rating
0.55	0.66	0.22	4.8
0.68	0.91	0.77	7.2
0.66	0.88	0.67	6.7
0.72	0.94	0.97	8.1
0.58	0.74	0.35	5.3
\vdots	\vdots	\vdots	\vdots

Learning Algorithm

How do you formulate this mathematically?

\( \hat{f_1}(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)

Find \( a, b, c \) such that \( \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \) is minimized

In practice, brute force search is infeasible

Budget (100 crore)	Box Office Collection (100 crore)	Action Scene Time (100 mins)	IMDB Rating
0.55	0.66	0.22	4.8
0.68	0.91	0.77	7.2
0.66	0.88	0.67	6.7
0.72	0.94	0.97	8.1
0.58	0.74	0.35	5.3
\vdots	\vdots	\vdots	\vdots
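To see why brute force does not scale, here is a small grid search over a, b, c on the five visible rows of the table, reading the second rating as 7.2. The grid range and step are arbitrary assumptions: with k candidate values per parameter and p parameters, the search needs k^p loss evaluations, which grows too quickly for models with many parameters.

```python
from itertools import product

# (budget, box office, action time, IMDB rating) for the visible rows.
data = [
    (0.55, 0.66, 0.22, 4.8),
    (0.68, 0.91, 0.77, 7.2),
    (0.66, 0.88, 0.67, 6.7),
    (0.72, 0.94, 0.97, 8.1),
    (0.58, 0.74, 0.35, 5.3),
]

def predict(a, b, c, x1, x2, x3):
    """f_1(x) = a*x1^2 + b*x2^3 + c*x3^2"""
    return a * x1 ** 2 + b * x2 ** 3 + c * x3 ** 2

def loss(a, b, c):
    """Squared error loss summed over all rows."""
    return sum((y - predict(a, b, c, x1, x2, x3)) ** 2
               for x1, x2, x3, y in data)

# Assumed search grid: 41 values per parameter, 0.0 to 20.0 in steps of 0.5.
grid = [i * 0.5 for i in range(41)]
candidates = list(product(grid, repeat=3))  # 41**3 combinations

best = min(candidates, key=lambda params: loss(*params))
print(len(candidates), best, loss(*best))
```

Three parameters already cost 68,921 evaluations on this coarse grid; a model with even a hundred parameters is far out of reach, which motivates the optimization solvers discussed next.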

Learning Algorithm

How do you formulate this mathematically?

\( \hat{f_1}(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)

\( \min_{a,b,c} \mathscr{L}_1, \quad \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)

Many optimization solvers are available

Budget (100 crore)	Box Office Collection (100 crore)	Action Scene Time (100 mins)	IMDB Rating
0.55	0.66	0.22	4.8
0.68	0.91	0.77	7.2
0.66	0.88	0.67	6.7
0.72	0.94	0.97	8.1
0.58	0.74	0.35	5.3
\vdots	\vdots	\vdots	\vdots

Learning Algorithm

How do you formulate this mathematically?

\( \hat{f_1}(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)

\( \min_{a,b,c} \mathscr{L}_1, \quad \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)

Many optimization solvers are available

In this course

Gradient Descent ++

Adagrad

RMSProp

Adam
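Of the algorithms listed, plain gradient descent is the simplest to sketch. The example below applies it to \( \hat{f_1}(x) = ax_1^2 + bx_2^3 + cx_3^2 \) on the five visible table rows, reading the second rating as 7.2. It is not Adagrad, RMSProp or Adam, and the learning rate, step count and zero initialisation are assumptions chosen so that this small problem converges.

```python
# (budget, box office, action time, IMDB rating) for the visible rows.
data = [
    (0.55, 0.66, 0.22, 4.8),
    (0.68, 0.91, 0.77, 7.2),
    (0.66, 0.88, 0.67, 6.7),
    (0.72, 0.94, 0.97, 8.1),
    (0.58, 0.74, 0.35, 5.3),
]

def features(x1, x2, x3):
    """The model is linear in (a, b, c) once inputs are transformed."""
    return (x1 ** 2, x2 ** 3, x3 ** 2)

def predict(params, x1, x2, x3):
    return sum(p * f for p, f in zip(params, features(x1, x2, x3)))

def loss(params):
    """Squared error loss summed over all rows."""
    return sum((y - predict(params, x1, x2, x3)) ** 2
               for x1, x2, x3, y in data)

def gradient(params):
    """Partial derivatives of the loss with respect to a, b, c."""
    grad = [0.0, 0.0, 0.0]
    for x1, x2, x3, y in data:
        phi = features(x1, x2, x3)
        error = y - predict(params, x1, x2, x3)
        for j in range(3):
            grad[j] += -2.0 * error * phi[j]
    return grad

params = [0.0, 0.0, 0.0]  # assumed starting point for a, b, c
learning_rate = 0.05      # assumed step size
initial_loss = loss(params)

for _ in range(2000):
    grad = gradient(params)
    params = [p - learning_rate * g for p, g in zip(params, grad)]

final_loss = loss(params)
print(params, initial_loss, final_loss)
```

Each step moves the parameters a little in the direction that reduces the loss, so no enumeration of candidate values is needed.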

Budget (100 crore)	Box Office Collection (100 crore)	Action Scene Time (100 mins)	IMDB Rating
0.55	0.66	0.22	4.8
0.68	0.91	0.77	7.2
0.66	0.88	0.67	6.7
0.72	0.94	0.97	8.1
0.58	0.74	0.35	5.3
\vdots	\vdots	\vdots	\vdots

Learning Algorithm

How do you formulate this mathematically?

Data

Model

Loss

Learning

Task

Evaluation

How do we compute a score for our ML model?

\( \left[\begin{array}{lcr} 2.1, 1.2, \dots, 5.6, 7.8 \end{array} \right]\)

\( \left[\begin{array}{lcr} 3.5, 6.6, \dots, 2.5, 6.3 \end{array} \right]\)

\( \left[\begin{array}{lcr} 6.3, 2.6, \dots, 4.5, 3.8 \end{array} \right]\)

\( \left[\begin{array}{lcr} 2.8, 3.6, \dots, 7.5, 2.1 \end{array} \right]\)

\( \left[\begin{array}{lcr} 6.3, 2.6, \dots, 4.5, 3.8 \end{array} \right]\)

True Labels

Predicted Labels

1

2

3

4

5

4

1

3

1

\( \textrm{Accuracy} = \frac{\textrm{Number of correct predictions}}{\textrm{Total number of predictions}} \)

Class Labels
Lion	1
Tiger	2
Cat	3
Giraffe	4
Dog	5

\( = \frac{4}{7} \approx 0.57 \)
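Top-1 accuracy is simply the fraction of examples whose predicted label equals the true label. In the sketch below the two label lists are hypothetical, chosen only so that 4 of 7 predictions match; they are not a reconstruction of the slide's exact predictions.

```python
def top1_accuracy(true_labels, predicted_labels):
    """Fraction of predictions that exactly match the true label."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)

# Hypothetical labels (1 = Lion, 2 = Tiger, 3 = Cat, 4 = Giraffe, 5 = Dog).
true_labels = [1, 2, 3, 4, 5, 3, 5]
predicted_labels = [1, 2, 3, 1, 4, 2, 5]  # 4 of the 7 match

accuracy = top1_accuracy(true_labels, predicted_labels)
print(round(accuracy, 2))
```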

\( \left[\begin{array}{lcr} 1.9, 3.3, \dots, 4.2, 1.1 \end{array} \right]\)

\( \left[\begin{array}{lcr} 2.2, 1.7, \dots, 2.5, 1.8 \end{array} \right]\)

3

5

2

5

Top - 1

Evaluation

How do we compute a score for our ML model?

\( \left[\begin{array}{lcr} 2.1, 1.2, \dots, 5.6, 7.8 \end{array} \right]\)

\( \left[\begin{array}{lcr} 3.5, 6.6, \dots, 2.5, 6.3 \end{array} \right]\)

\( \left[\begin{array}{lcr} 6.3, 2.6, \dots, 4.5, 3.8 \end{array} \right]\)

\( \left[\begin{array}{lcr} 2.8, 3.6, \dots, 7.5, 2.1 \end{array} \right]\)

\( \left[\begin{array}{lcr} 6.3, 2.6, \dots, 4.5, 3.8 \end{array} \right]\)

True Labels

Predicted Labels

1

2

3

4

5

Class Labels
Lion	1
Tiger	2
Cat	3
Giraffe	4
Dog	5

\( \left[\begin{array}{lcr} 1.9, 3.3, \dots, 4.2, 1.1 \end{array} \right]\)

\( \left[\begin{array}{lcr} 2.2, 1.7, \dots, 2.5, 1.8 \end{array} \right]\)

3

5

Top - 3

\( \left[\begin{array}{lcr} 1, 2, 3\end{array} \right]\)

\( \left[\begin{array}{lcr} 4, 5, 3\end{array} \right]\)

\( \left[\begin{array}{lcr} 5, 2, 1\end{array} \right]\)

\( \left[\begin{array}{lcr} 2, 1, 4\end{array} \right]\)

\( \left[\begin{array}{lcr} 5, 4, 1\end{array} \right]\)

\( \textrm{Accuracy} = \frac{\textrm{Number of correct predictions in top-3}}{\textrm{Total number of predictions}} \)

\( = \frac{6}{7} \approx 0.86 \)
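For top-3 accuracy, a prediction counts as correct when the true label appears anywhere among the model's three highest-ranked guesses. The lists below are hypothetical, chosen so that 6 of 7 examples are hits; they are not the slide's exact predictions.

```python
def top_k_accuracy(true_labels, ranked_predictions):
    """Fraction of examples whose true label is in the ranked guess list."""
    hits = sum(1 for t, guesses in zip(true_labels, ranked_predictions)
               if t in guesses)
    return hits / len(true_labels)

# Hypothetical labels (1 = Lion, 2 = Tiger, 3 = Cat, 4 = Giraffe, 5 = Dog).
true_labels = [1, 2, 3, 4, 5, 3, 5]
top3_predictions = [
    [1, 2, 3],
    [2, 1, 4],
    [3, 5, 1],
    [4, 5, 3],
    [5, 4, 1],
    [1, 2, 4],  # true label 3 is missing here: the only miss
    [5, 2, 1],
]

accuracy = top_k_accuracy(true_labels, top3_predictions)
print(round(accuracy, 2))
```

Top-3 accuracy is never lower than top-1 accuracy on the same predictions, because the top guess is always one of the three.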

Evaluation

How is this different from the loss function?

Evaluation (Brake/Go decision): \( \frac{\#(\;)}{\#(\;)} \)

Loss function: maximize \( \frac{\#(\;)}{\#(\;) + \#(\;)} \)

[Figure: the items counted inside #( ) are shown as pictures]

Evaluation

Should we learn and test on the same data?

\( \left[\begin{array}{lcr} 2.1, 1.2, \dots, 5.6, 7.8 \end{array} \right]\)

\( \left[\begin{array}{lcr} 3.5, 6.6, \dots, 2.5, 6.3 \end{array} \right]\)

\( \left[\begin{array}{lcr} 6.3, 2.6, \dots, 4.5, 3.8 \end{array} \right]\)

\( \left[\begin{array}{lcr} 2.8, 3.6, \dots, 7.5, 2.1 \end{array} \right]\)

\( \left[\begin{array}{lcr} 2.2, 1.7, \dots, 2.5, 1.8 \end{array} \right]\)

1

2

3

4

2

\( \left[\begin{array}{lcr} 6.3, 2.6, \dots, 4.5, 3.8 \end{array} \right]\)

\( \left[\begin{array}{lcr} 2.8, 3.6, \dots, 7.5, 2.1 \end{array} \right]\)

\( \left[\begin{array}{lcr} 2.2, 1.7, \dots, 2.5, 1.8 \end{array} \right]\)

1

3

4

Training Data: x, y

Test Data: x, y

\( \hat{f_1}(x) = ax_1^2 + bx_2^{3} + cx_3^{2} \)

\( \min_{a,b,c} \mathscr{L}_1, \quad \mathscr{L}_1 = \sum_{i=1}^{n} (y_i - \hat{f}_1(x_i))^2 \)

\( \textrm{Accuracy} = \frac{\textrm{Number of correct predictions in top-3}}{\textrm{Total number of predictions}} \)
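Keeping training and test data separate can be sketched as a shuffled split: the loss is minimized on the training portion, and accuracy is reported on the held-out portion. The helper `train_test_split`, the 25% test fraction and the toy examples are assumptions for illustration, not part of the slides.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for testing."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Toy (x, y) pairs: a 2-dimensional input and a class label from 1 to 5.
examples = [([float(i), float(i) + 0.5], i % 5 + 1) for i in range(8)]

train, test = train_test_split(examples)
print(len(train), len(test))
```

Because no test example is seen during learning, the accuracy measured on `test` estimates how the model behaves on new data.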

Evaluation

How is this different from the loss function?

Data

Model

Loss

Learning

Task

Evaluation

Putting it all together

How does all the jargon fit into these jars?

Linear Algebra

Probability

Calculus

Data

Model

Loss

Learning

Task

Evaluation

Data, democratisation, devices

Why is ML so successful?

Data

Model

Loss

Learning

Task

Evaluation

Standardised

Improvised

Democratised

Abundance

Typical ML effort

How to distribute your work through the six jars?

Your Job

Model

Loss

Learning

Evaluation

Data

Task

1.3 Six Elements of ML
