1.6 The Perceptron Model

Your first model with weights

Recap: MP Neuron

What did we see in the previous chapter?


Screen size (>5 in):    1 0 1 1 1 0 1 0 1 0
Battery (>2000 mAh):    0 0 0 1 0 1 1 1 1 0
Like (y):               1 0 1 0 1 1 0 1 0 0

[Diagram: MP neuron, with inputs \( x_1, \dots, x_n \), threshold \( b \), output \( \hat{y} \)]

\( \hat{y} = 1 \text{ if } \sum_{i=1}^n x_i \geq b, \; 0 \text{ otherwise} \)

Boolean inputs

Boolean output

Linear

Fixed Slope

Few possible intercepts (b's)

The Road Ahead

What's going to change now?


What changes, box by box:

Data: boolean inputs \( \{0, 1\} \) → real inputs (the output stays boolean)
Task: classification (unchanged)
Model: still linear, but instead of only one parameter, b, there is a weight for every input
Loss: squared error, \( loss = \sum_i (y_i-\hat{y_i})^2 \), → perceptron loss, \( loss = \sum_i \max(0, 1-y_i \cdot \hat{y_i}) \)
Learning: brute force → our 1st learning algorithm
Evaluation: \( Accuracy=\frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \) (unchanged)

Data and Task

What kind of data and tasks can the perceptron process?


Real inputs:

Launch (within 6 months):              0 1 1 0 0 1 0 1 1
Weight (g):                            151 180 160 205 162 182 138 185 170
Screen size (inches):                  5.8 6.18 5.84 6.2 5.9 6.26 4.7 6.41 5.5
Dual SIM:                              1 1 0 0 0 1 0 1 0
Internal memory (>= 64 GB, 4 GB RAM):  1 1 1 1 1 1 1 1 1
NFC:                                   0 1 1 0 1 0 1 1 1
Radio:                                 1 0 0 1 1 1 0 0 0
Battery (mAh):                         3060 3500 3060 5000 3000 4000 1960 3700 3260
Price (INR):                           15k 32k 25k 18k 14k 12k 35k 42k 44k
Like (y):                              1 0 1 0 1 1 0 1 0

Compare the binarized version the MP neuron needed:

Launch (within 6 months):              0 1 1 0 0 1 0 1 1
Weight (<160 g):                       1 0 1 0 0 0 1 0 0
Screen size (<5.9 in):                 1 0 1 0 1 0 1 0 1
Dual SIM:                              1 1 0 0 0 1 0 1 0
Internal memory (>= 64 GB, 4 GB RAM):  1 1 1 1 1 1 1 1 1
NFC:                                   0 1 1 0 1 0 1 1 1
Radio:                                 1 0 0 1 1 1 0 0 0
Battery (>3500 mAh):                   0 0 0 1 0 1 0 1 0
Price (>20k):                          0 1 1 0 0 0 1 1 1
Like (y):                              1 0 1 0 1 1 0 1 0


Real-valued features like screen size and battery live on very different scales, so each is rescaled to [0, 1] with min-max scaling:

\( x' = \frac{x-\min}{\max-\min} \)

Screen size (min = 4.7, max = 6.41):
before: 5.8, 6.18, 5.84, 6.2, 5.9, 6.26, 4.7, 6.41, 5.5
after:  0.64, 0.87, 0.67, 0.88, 0.7, 0.91, 0, 1, 0.47
The table with screen size rescaled:

Launch (within 6 months):              0 1 1 0 0 1 0 1 1
Weight (g):                            151 180 160 205 162 182 138 185 170
Screen size:                           0.64 0.87 0.67 0.88 0.7 0.91 0 1 0.47
Dual SIM:                              1 1 0 0 0 1 0 1 0
Internal memory (>= 64 GB, 4 GB RAM):  1 1 1 1 1 1 1 1 1
NFC:                                   0 1 1 0 1 0 1 1 1
Radio:                                 1 0 0 1 1 1 0 0 0
Battery (mAh):                         3060 3500 3060 5000 3000 4000 1960 3700 3260
Price (INR):                           15k 32k 25k 18k 14k 12k 35k 42k 44k
Like (y):                              1 0 1 0 1 1 0 1 0
Battery (min = 1960, max = 5000):
before: 3060, 3500, 3060, 5000, 3000, 4000, 1960, 3700, 3260
after:  0.36, 0.51, 0.36, 1, 0.34, 0.67, 0, 0.57, 0.43
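A minimal Python sketch of this rescaling; the function name min_max_scale is my own, and the inputs are the screen-size and battery columns from the tables above.

def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] via x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

screen_size = [5.8, 6.18, 5.84, 6.2, 5.9, 6.26, 4.7, 6.41, 5.5]
battery = [3060, 3500, 3060, 5000, 3000, 4000, 1960, 3700, 3260]

print([round(v, 2) for v in min_max_scale(screen_size)])
# [0.64, 0.87, 0.67, 0.88, 0.7, 0.91, 0.0, 1.0, 0.47]
print([round(v, 2) for v in min_max_scale(battery)])
# [0.36, 0.51, 0.36, 1.0, 0.34, 0.67, 0.0, 0.57, 0.43]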

Data Preparation

Can the data be used as it is?

With screen size and battery rescaled:

Launch (within 6 months):              0 1 1 0 0 1 0 1 1
Weight (g):                            151 180 160 205 162 182 138 185 170
Screen size:                           0.64 0.87 0.67 0.88 0.7 0.91 0 1 0.47
Dual SIM:                              1 1 0 0 0 1 0 1 0
Internal memory (>= 64 GB, 4 GB RAM):  1 1 1 1 1 1 1 1 1
NFC:                                   0 1 1 0 1 0 1 1 1
Radio:                                 1 0 0 1 1 1 0 0 0
Battery:                               0.36 0.51 0.36 1 0.34 0.67 0 0.57 0.43
Price (INR):                           15k 32k 25k 18k 14k 12k 35k 42k 44k
Like (y):                              1 0 1 0 1 1 0 1 0

After also rescaling weight and price, every feature lies in [0, 1]:

Launch (within 6 months):              0 1 1 0 0 1 0 1 1
Weight:                                0.19 0.63 0.33 1 0.36 0.66 0 0.70 0.48
Screen size:                           0.64 0.87 0.67 0.88 0.7 0.91 0 1 0.47
Dual SIM:                              1 1 0 0 0 1 0 1 0
Internal memory (>= 64 GB, 4 GB RAM):  1 1 1 1 1 1 1 1 1
NFC:                                   0 1 1 0 1 0 1 1 1
Radio:                                 1 0 0 1 1 1 0 0 0
Battery:                               0.36 0.51 0.36 1 0.34 0.67 0 0.57 0.43
Price:                                 0.09 0.63 0.41 0.19 0.06 0 0.72 0.94 1
Like (y):                              1 0 1 0 1 1 0 1 0

The Model

What is the mathematical model?


[Diagram: the perceptron, with inputs \( x_1, x_2, \dots, x_n \), weights \( w_1, w_2, \dots, w_n \), threshold \( b \), output \( \hat{y} \)]

\( \hat{y} = 1 \text{ if } \sum_{i=1}^n w_i x_i \geq b \)
\( \hat{y} = 0 \text{ otherwise} \)
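To make the rule concrete, here is a minimal sketch in Python; predict, the example feature values, and the weights are my own illustrations, not values from the course.

def predict(x, w, b):
    """Perceptron decision rule: 1 if sum_i w_i * x_i >= b, else 0."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum >= b else 0

# Hypothetical weights for three features (say screen size, battery, price)
print(predict([0.64, 0.36, 0.09], w=[1.0, 1.0, -1.0], b=0.5))  # 1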

The Model

How is this different from the MP Neuron model?


MP Neuron:
Boolean inputs
Linear
Inputs are not weighted
Adjustable threshold
\( \hat{y} = 1 \text{ if } \sum_{i=1}^n x_i \geq b, \; 0 \text{ otherwise} \)

Perceptron:
Real inputs
Linear
Weights for each input
Adjustable threshold
\( \hat{y} = 1 \text{ if } \sum_{i=1}^n w_i x_i \geq b, \; 0 \text{ otherwise} \)

The Model

What do weights allow us to do?



[Diagram: perceptron with weighted inputs]

Weights let the model capture both the direction and the strength of each feature's influence. For example, people like a phone less as its price goes up, i.e. Like \( \propto \frac{1}{price} \), so the learned \( w_{price} \) is negative.

\( \hat{y} = 1 \text{ if } \sum_{i=1}^n w_i x_i \geq b, \; 0 \text{ otherwise} \)

Some Math fundae

Can we write the perceptron model slightly more compactly?


x: [0, 0.19, 0.64, 1, 1]  so  \( \textbf{x} \in \mathbb{R}^5 \)

w: [0.3, 0.4, -0.3, 0.1, 0.5]  so  \( \textbf{w} \in \mathbb{R}^5 \)

(also written \( \vec{x} \) and \( \vec{w} \))

\( \textbf{x} \cdot \textbf{w} = x_1 w_1 + x_2 w_2 + \dots + x_n w_n = \sum_{i=1}^n x_i w_i \)

\( \hat{y} = 1 \text{ if } \textbf{x} \cdot \textbf{w} \geq b \)
\( \hat{y} = 0 \text{ otherwise} \)

[Diagram: the same perceptron, now labelled with the vectors \( \textbf{x} \) and \( \textbf{w} \)]

So the model \( \hat{y} = 1 \text{ if } \sum_{i=1}^n w_i x_i \geq b \), 0 otherwise, can be written compactly as \( \hat{y} = 1 \text{ if } \textbf{x} \cdot \textbf{w} \geq b \), 0 otherwise.
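The same computation with NumPy, as a sketch; x and w are the vectors above, and b = 0.5 is an assumed threshold.

import numpy as np

x = np.array([0, 0.19, 0.64, 1, 1])
w = np.array([0.3, 0.4, -0.3, 0.1, 0.5])
b = 0.5

dot = np.dot(x, w)            # x . w = sum_i x_i * w_i
print(dot)                    # approx 0.484
y_hat = 1 if dot >= b else 0
print(y_hat)                  # 0, since 0.484 < 0.5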

The Model

What is the geometric interpretation of the model?


[Plot: in 2D the decision boundary is a line. The MP neuron's line has a fixed slope and only a few possible intercepts; the perceptron's weights allow any slope and intercept: more freedom.]

The Model

Why is more freedom important?


[Plot: a dataset whose separating line needs a slope the MP neuron cannot express; the perceptron's extra freedom lets it fit such data.]

The Model

Is this all the freedom that we need?


We want even more freedom

The Model

What if we have more than 2 dimensions?

The same rule \( \textbf{w} \cdot \textbf{x} \geq b \) still applies; the decision boundary is then a hyperplane rather than a line.


Loss Function

What is the loss function that you use for this model?


For each phone, compare the prediction \( \hat{y} \) with the true label \( y \):

Weight  Screen size  Like (y)  \( \hat{y} \)  Loss
0.19    0.64         1         1              0
0.63    0.87         1         0              1
0.33    0.67         0         1              1
1       0.88         0         0              0

\( L = 0 \text{ if } y = \hat{y}, \; 1 \text{ otherwise} \)

In indicator notation: \( L = \mathbb{1}_{y \neq \hat{y}} \)

Q. What is the purpose of the loss function?

A. To tell the model that some correction needs to be done!

Q. How?

A. We will see soon.

Loss Function

How is this different from the squared error loss function?


When the outputs are boolean, squared error loss is equivalent to the perceptron loss: \( (y-\hat{y})^2 \) is 0 when \( y = \hat{y} \) and 1 otherwise.

Weight  Screen size  Like (y)  \( \hat{y} \)  Perceptron loss  Squared error loss
0.19    0.64         1         1              0                0
0.63    0.87         1         0              1                1
0.33    0.67         0         1              1                1
1       0.88         0         0              0                0

Perceptron loss = \( \mathbb{1}_{y \neq \hat{y}} \)

Squared error loss = \( (y-\hat{y})^2 \)
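A quick check in Python that the two losses agree whenever y and ŷ are boolean:

# For boolean y and y_hat, (y - y_hat)^2 equals the 0-1 indicator of y != y_hat
for y in (0, 1):
    for y_hat in (0, 1):
        perceptron_loss = 1 if y != y_hat else 0
        squared_error = (y - y_hat) ** 2
        print(y, y_hat, perceptron_loss, squared_error)
# The last two columns match on every row, so the losses are equivalent here.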

Loss Function

Can we plot the loss function?


Take a single feature, price, and try a few settings of (w, b). (Since liking is inversely related to price in this example, the predictions below correspond to \( \hat{y} = 1 \) when \( w \cdot price \leq b \).)

Price  Like (y)
0.2    1
0.4    1
0.6    0
0.7    0
0.45   1

(w = 0.5, b = 0.3):
Price  Like (y)  \( \hat{y} \)  Loss
0.2    1         1              0
0.4    1         1              0
0.6    0         1              1
0.7    0         0              0
0.45   1         1              0
Error = 1

(w = 0.8, b = 0.1):
Price  Like (y)  \( \hat{y} \)  Loss
0.2    1         0              1
0.4    1         0              1
0.6    0         0              0
0.7    0         0              0
0.45   1         0              1
Error = 3

(w = 1, b = 0.5):
Price  Like (y)  \( \hat{y} \)  Loss
0.2    1         1              0
0.4    1         1              0
0.6    0         0              0
0.7    0         0              0
0.45   1         1              0
Error = 0

[Plot: total loss as a function of (w, b); different parameter settings give different error counts.]
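A small sketch that reproduces the three error counts; note it assumes the prediction rule ŷ = 1 when w · price ≤ b, the rule consistent with the tables above.

# Error counts for each (w, b); prediction rule y_hat = 1 when w * price <= b,
# matching the inverse price/like relationship in the tables above
prices = [0.2, 0.4, 0.6, 0.7, 0.45]
likes = [1, 1, 0, 0, 1]

for w, b in [(0.5, 0.3), (0.8, 0.1), (1.0, 0.5)]:
    error = sum((1 if w * x <= b else 0) != y for x, y in zip(prices, likes))
    print(f"w={w}, b={b}: error={error}")
# w=0.5, b=0.3: error=1
# w=0.8, b=0.1: error=3
# w=1.0, b=0.5: error=0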

Learning Algorithm

 

What is the typical recipe for learning parameters of a model?


Initialise \( w_1, w_2, b \)

Iterate over data:
    \( \mathscr{L} = compute\_loss(x_i) \)
    \( update(w_1, w_2, b, \mathscr{L}) \)
till satisfied

Here \( \mathbf{w} = [w_1, w_2] \): one weight per feature.

Weight  Screen size  Like (y)
0.19    0.64         1
0.63    0.87         1
0.33    0.67         0
1       0.88         0

Learning Algorithm

 

What does the perceptron learning algorithm look like?

Initialize w randomly
while !convergence do
    Pick random x ∈ P ∪ N
    if y == 1 and w.x < b then
        w = w + x
        b = b - 1
    end
    if y == 0 and w.x >= b then
        w = w - x
        b = b + 1
    end
end
/* the algorithm converges when all the inputs are classified correctly */

(A misclassified positive example needs \( \textbf{w} \cdot \textbf{x} - b \) to increase, so w grows by x and b decreases; a misclassified negative example is the mirror image.)

\( \hat{y} = 1 \text{ if } \textbf{x} \cdot \textbf{w} \geq b \)
\( \hat{y} = 0 \text{ otherwise} \)

x is also a vector, e.g. [0.19, 0.67], so w + x is vector addition.
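A runnable Python sketch of the algorithm; all names are mine, it sweeps the data each epoch instead of sampling a random point (so convergence is easy to detect), and the toy data is taken from the worked example below.

import random

def perceptron_learning(X, Y, max_epochs=1000):
    """Perceptron learning algorithm for y_hat = 1 if w.x >= b, else 0.
    Converges only if the data is linearly separable."""
    n_features = len(X[0])
    w = [random.uniform(-1, 1) for _ in range(n_features)]
    b = 0.0
    for _ in range(max_epochs):
        converged = True
        for x, y in zip(X, Y):
            wx = sum(wi * xi for wi, xi in zip(w, x))
            if y == 1 and wx < b:        # misclassified positive example
                w = [wi + xi for wi, xi in zip(w, x)]
                b -= 1                   # lower the threshold
                converged = False
            elif y == 0 and wx >= b:     # misclassified negative example
                w = [wi - xi for wi, xi in zip(w, x)]
                b += 1                   # raise the threshold
                converged = False
        if converged:                    # all inputs classified correctly
            break
    return w, b

X = [[2, 2, 5], [2, 4, 10], [4, 4, 0], [0, 0, 15], [-4, -4, -15], [-2, 0, -10]]
Y = [1, 1, 1, 0, 0, 0]
w, b = perceptron_learning(X, Y)
print(w, b)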

Learning Algorithm

 

Can we see this algorithm in action?


Running the algorithm above on some toy data (three positive and three negative points):

x1   x2   x3    \( \vec{x} \cdot \vec{w} - b \)   \( \hat{y} \)   y
2    2    5     10                                1               1
2    4    10    17                                1               1
4    4    0     11                                1               1
0    0    15    14                                1               0   <- misclassified: this triggers learning!
-4   -4   -15   -15                               0               0
-2   0    -10   -26                               0               0

\( \hat{y} = 1 \text{ if } \vec{x} \cdot \vec{w} \geq b, \; 0 \text{ otherwise} \)

Learning Algorithm

 

What is the geometric interpretation of this?

\( \hat{y} = 1 \text{ if } \sum_{i=1}^n w_i x_i \geq b, \; 0 \text{ otherwise} \)

The boundary \( \sum_{i=1}^n w_i x_i = b \) is a line (in higher dimensions, a hyperplane): points on one side are classified as 1, points on the other side as 0. Each update adds the misclassified point to w (or subtracts it), moving the boundary towards classifying that point correctly.

[Plot: a misclassified point and the decision boundary before and after an update. Let's learn!]

Learning Algorithm

 

Will this algorithm always work?

Only if the data is linearly separable.

Learning Algorithm

 

Can we prove that it will always work for linearly separable data?

Definition: Two sets P and N of points in an n-dimensional space are called absolutely linearly separable if n + 1 real numbers \( w_0, w_1, \dots, w_n \) exist such that every point \( (x_1, x_2, \dots, x_n) \in P \) satisfies \( \sum_{i=1}^n w_i x_i > w_0 \) and every point \( (x_1, x_2, \dots, x_n) \in N \) satisfies \( \sum_{i=1}^n w_i x_i < w_0 \).

Proposition: If the sets P and N are finite and linearly separable, the perceptron learning algorithm updates the weight vector only a finite number of times; in other words, it converges.

Learning Algorithm

 

What does "till satisfied" mean ?

(c) One Fourth Labs

Initialise 

\(w_1, w_2, b \)

Iterate over data:

\( \mathscr{L}  = compute\_loss(x_i) \)

\( update(w_1, w_2, b, \mathscr{L}) \)

till satisfied

\( total\_loss  = 0 \)

\( total\_loss  += \mathscr{L} \)

till total loss becomes 0

till total loss becomes < \( \epsilon \)

till number of iterations exceeds k (say 100)
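A skeleton of this loop in Python; epsilon, k, the data, and the starting parameters are my choices, and the update step is elided.

# "Till satisfied": stop when total loss hits 0, drops below epsilon,
# or after k iterations, whichever comes first
X = [[0.19, 0.64], [0.63, 0.87], [0.33, 0.67], [1.0, 0.88]]
Y = [1, 1, 0, 0]
w, b = [0.5, 0.5], 0.6                  # hypothetical current parameters
epsilon, k = 0.01, 100

for iteration in range(k):
    total_loss = 0
    for x, y in zip(X, Y):
        y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= b else 0
        total_loss += (y - y_hat) ** 2  # squared error: 0 or 1 per example here
        # parameter updates would go here
    if total_loss == 0 or total_loss < epsilon:
        break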

Evaluation

 

How do you check the performance of the perceptron model?

Training data:

Launch (within 6 months):              0 1 1 0 0 1 0 1 1 0
Weight (<160 g):                       1 0 1 0 0 0 1 0 0 1
Screen size (<5.9 in):                 1 0 1 0 1 0 1 0 1 0
Dual SIM:                              1 1 0 0 0 1 0 1 0 0
Internal memory (>= 64 GB, 4 GB RAM):  1 1 1 1 1 1 1 1 1 0
NFC:                                   0 1 1 0 1 0 1 1 1 0
Radio:                                 1 0 0 1 1 1 0 0 0 0
Battery (>3500 mAh):                   0 0 0 1 0 1 0 1 0 0
Price (>20k):                          0 1 1 0 0 0 1 1 1 0
Like (y):                              1 0 1 0 1 1 0 1 0 0
Predicted (\( \hat{y} \)):             1 0 0 1 1 1 1 0 0 0

Model: \( \hat{y} = 1 \text{ if } \sum_{i=1}^n w_i x_i \geq 5 \); training loss: \( loss = \sum_i (y_i-\hat{y_i})^2 \)

Test data (performance is measured on data not seen during training):

Launch (within 6 months):              1 0 0 1
Weight (<160 g):                       0 1 1 1
Screen size (<5.9 in):                 0 1 1 1
Dual SIM:                              0 1 0 0
Internal memory (>= 64 GB, 4 GB RAM):  1 0 0 0
NFC:                                   0 0 1 0
Radio:                                 1 1 1 0
Battery (>3500 mAh):                   1 1 1 0
Price (>20k):                          0 0 1 0
Like (y):                              0 1 0 0
Predicted (\( \hat{y} \)):             0 1 1 0

\( Accuracy=\frac{\text{Number of correct predictions}}{\text{Total number of predictions}} = \frac{3}{4} = 75\% \)
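Computing the test accuracy from the table above in Python:

# y and y_hat for the 4 test phones in the table above
y_true = [0, 1, 0, 0]
y_pred = [0, 1, 1, 0]
correct = sum(y == p for y, p in zip(y_true, y_pred))
print(correct / len(y_true))  # 0.75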

Take-aways

So when will you use the perceptron?


 

Data: real inputs \( \in \mathbb{R} \), boolean output
Task: classification
Model: linear, \( \hat{y} = 1 \text{ if } \textbf{w} \cdot \textbf{x} \geq b, \; 0 \text{ otherwise} \)
Loss: squared error, \( loss = \sum_i (y_i-\hat{y_i})^2 \)
Learning: Perceptron Learning Algorithm
Evaluation: \( Accuracy=\frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \)

An Eye on the Capstone project

How is the perceptron related to the capstone project?


The capstone task is itself a binary classification problem, so the perceptron, the simplest model for binary classification, applies directly:

Data: real-valued image features, e.g. rows like [-8.5, -1.7, ..., 9.0] and [-0.4, 6.7, ..., 4.7]
Task: text / no-text (boolean output)
Model: Perceptron
Loss: squared error loss, \( loss = \sum_i (y_i-\hat{y_i})^2 \)
Learning: Perceptron Learning Algorithm
Evaluation: \( Accuracy=\frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \), e.g. \( \frac{3}{4} = 75\% \)

Assignments

 

How do you view the learning process?

Assignment: Take some data that includes negative values and standardize it (min-max normalize it to [0, 1]).
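A small sketch for this assignment; the data is made up. Min-max scaling handles negative values unchanged: the minimum maps to 0 and the maximum to 1.

data = [-8.5, -1.7, -0.4, 4.7, 9.0]
lo, hi = min(data), max(data)
scaled = [(x - lo) / (hi - lo) for x in data]
print([round(v, 2) for v in scaled])  # [0.0, 0.39, 0.46, 0.75, 1.0]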
