Amrutha
Course Content Developer for the Deep Learning course by Professor Mitesh Khapra, offered as part of the IIT Madras online degree in Programming and Data Science.
An Example for Backpropagation

"Forward Pass"

Input and target:
$$x = [1.5,\ 2.5,\ 3], \qquad y = [1,\ 0]$$

Biases:
$$b_1 = [0.01,\ 0.02,\ 0.03], \qquad b_2 = [0.01,\ 0.02,\ 0.03], \qquad b_3 = [0.01,\ 0.02]$$

Layer 1:
$$a_1 = x\,W_1 + b_1 = [0.36,\ 0.37,\ 0.38], \qquad h_1 = \mathrm{sigmoid}(a_1) = [0.589,\ 0.591,\ 0.593]$$

Layer 2:
$$a_2 = h_1 W_2 + b_2 = [0.054,\ 0.064,\ 0.074], \qquad h_2 = \mathrm{sigmoid}(a_2) = [0.513,\ 0.516,\ 0.518]$$

Output layer:
$$a_3 = h_2 W_3 + b_3 = [1.558,\ 1.568], \qquad \hat{y} = h_3 = \mathrm{softmax}(a_3) = [0.497,\ 0.502]$$

"Binary Cross Entropy Loss"
$$L(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\Big(y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\Big) = 0.6981$$
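[Figure: the full network. Inputs x1, x2, x3 feed hidden layer 1 (pre-activations a11 to a13, activations h11 to h13), then hidden layer 2 (a21 to a23, h21 to h23), then the output layer (a31, a32), which produces ŷ1, ŷ2 and the loss L(θ). Edges are labeled with the weights W111 to W133, W211 to W233 and W311 to W332; the biases are b11 to b13, b21 to b23, b31 and b32.]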
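The forward-pass numbers can be checked without the weight matrices by starting from the printed pre-activations. The sketch below is only a sanity check under that assumption: the helper names `sigmoid`, `softmax` and `bce_loss` are illustrative, and because the inputs are the rounded values from the example, the results should land close to the printed ones rather than match digit for digit.

```python
import math

def sigmoid(values):
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-a)) for a in values]

def softmax(values):
    """Softmax over a list, shifted by the maximum for numerical stability."""
    m = max(values)
    exps = [math.exp(a - m) for a in values]
    total = sum(exps)
    return [e / total for e in exps]

def bce_loss(y, y_hat):
    """Binary cross entropy averaged over the N outputs."""
    n = len(y)
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y, y_hat)) / n

# Pre-activations and target as printed in the example (rounded values).
a1 = [0.36, 0.37, 0.38]
a2 = [0.054, 0.064, 0.074]
a3 = [1.558, 1.568]
y = [1, 0]

h1 = sigmoid(a1)        # compare with [0.589, 0.591, 0.593]
h2 = sigmoid(a2)        # compare with [0.513, 0.516, 0.518]
y_hat = softmax(a3)     # compare with [0.497, 0.502]
loss = bce_loss(y, y_hat)

print("h1    =", [round(v, 4) for v in h1])
print("h2    =", [round(v, 4) for v in h2])
print("y_hat =", [round(v, 4) for v in y_hat])
print("loss  =", round(loss, 4))
```

Small differences in the last digit are expected, since the printed activations appear to be truncated rather than rounded.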
"Backward Pass" Computing updates
Updates for weights W3 and biases b3
$$\frac{\partial L(\theta)}{\partial W_{311}} = \frac{\partial L(\theta)}{\partial \hat{y}_1}\,\frac{\partial \hat{y}_1}{\partial a_{31}}\,\frac{\partial a_{31}}{\partial W_{311}}$$

To find: each of the three factors, with $[y_1, y_2] = [1, 0]$, $\hat{y} = [0.497,\ 0.502]$, $a_3 = [1.558,\ 1.568]$ and $h_{21} = 0.513$.

$$L(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\Big(y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\Big)$$
$$= -\frac{1}{2}\Big(y_1 \log(\hat{y}_1) + (1 - y_1)\log(1 - \hat{y}_1) + y_2 \log(\hat{y}_2) + (1 - y_2)\log(1 - \hat{y}_2)\Big)$$

$$\frac{\partial L(\theta)}{\partial \hat{y}_1} = -\frac{1}{2}\left(\frac{y_1}{\hat{y}_1} - \frac{1 - y_1}{1 - \hat{y}_1}\right) = \delta_1 = -1.006$$

$$\frac{\partial \hat{y}_1}{\partial a_{31}} = \frac{\partial\,\mathrm{softmax}(a_{31})}{\partial a_{31}} = \frac{e^{a_{31}}\, e^{a_{32}}}{\left(e^{a_{31}} + e^{a_{32}}\right)^2} = \varepsilon = 0.2499$$

$$\frac{\partial a_{31}}{\partial W_{311}} = \frac{\partial}{\partial W_{311}}\big[h_{21} W_{311} + h_{22} W_{321} + h_{23} W_{331} + b_{31}\big] = h_{21} = 0.513$$

$$\frac{\partial L(\theta)}{\partial W_{311}} = \delta_1 \cdot \varepsilon \cdot h_{21} = -0.1289$$
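As a quick arithmetic check, the three factors for W311 can be evaluated directly from the printed values. This is a minimal sketch that follows the chain exactly as written in the example (the path through ŷ1 only); the variable names are illustrative, and the inputs are the rounded numbers from the example, so the product should come out near −0.1289 rather than reproduce it exactly.

```python
import math

# Values printed in the example (rounded); N = 2 outputs.
y1 = 1
y_hat1 = 0.497
a31, a32 = 1.558, 1.568
h21 = 0.513

# dL/dy_hat1 for binary cross entropy averaged over the two outputs.
delta1 = -0.5 * (y1 / y_hat1 - (1 - y1) / (1 - y_hat1))

# dy_hat1/da31 for a two-way softmax: e^a31 * e^a32 / (e^a31 + e^a32)^2.
e1, e2 = math.exp(a31), math.exp(a32)
eps = (e1 * e2) / (e1 + e2) ** 2

# da31/dW311 is the activation h21 that W311 multiplies.
grad_w311 = delta1 * eps * h21

print("delta1    =", round(delta1, 4))
print("eps       =", round(eps, 4))
print("dL/dW311  =", round(grad_w311, 4))
```

The same three lines, with `y_hat1`, `y1` and `h21` swapped for the corresponding output and hidden unit, cover every other entry of W3.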
$$\frac{\partial L(\theta)}{\partial W_{312}} = \frac{\partial L(\theta)}{\partial \hat{y}_2}\,\frac{\partial \hat{y}_2}{\partial a_{32}}\,\frac{\partial a_{32}}{\partial W_{312}}$$

To find: each of the three factors, with $[y_1, y_2] = [1, 0]$, $\hat{y} = [0.497,\ 0.502]$, $a_3 = [1.558,\ 1.568]$ and $h_{21} = 0.513$.

$$L(\theta) = -\frac{1}{2}\Big(y_1 \log(\hat{y}_1) + (1 - y_1)\log(1 - \hat{y}_1) + y_2 \log(\hat{y}_2) + (1 - y_2)\log(1 - \hat{y}_2)\Big)$$

$$\frac{\partial L(\theta)}{\partial \hat{y}_2} = -\frac{1}{2}\left(\frac{y_2}{\hat{y}_2} - \frac{1 - y_2}{1 - \hat{y}_2}\right) = \delta_2 = 1.004$$

$$\frac{\partial \hat{y}_2}{\partial a_{32}} = \frac{\partial\,\mathrm{softmax}(a_{32})}{\partial a_{32}} = \frac{e^{a_{31}}\, e^{a_{32}}}{\left(e^{a_{31}} + e^{a_{32}}\right)^2} = \varepsilon = 0.2499$$

$$\frac{\partial a_{32}}{\partial W_{312}} = \frac{\partial}{\partial W_{312}}\big[h_{21} W_{312} + h_{22} W_{322} + h_{23} W_{332} + b_{32}\big] = h_{21} = 0.513$$

$$\frac{\partial L(\theta)}{\partial W_{312}} = \delta_2 \cdot \varepsilon \cdot h_{21} = 0.1287$$
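$$\frac{\partial L(\theta)}{\partial W_{321}} = \frac{\partial L(\theta)}{\partial \hat{y}_1}\,\frac{\partial \hat{y}_1}{\partial a_{31}}\,\frac{\partial a_{31}}{\partial W_{321}} = \delta_1 \cdot \varepsilon \cdot h_{22} = -0.1297$$

Here $[y_1, y_2] = [1, 0]$, $\hat{y} = [0.497,\ 0.502]$, $a_3 = [1.558,\ 1.568]$ and $h_{22} = 0.516$.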
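$$\frac{\partial L(\theta)}{\partial W_{322}} = \frac{\partial L(\theta)}{\partial \hat{y}_2}\,\frac{\partial \hat{y}_2}{\partial a_{32}}\,\frac{\partial a_{32}}{\partial W_{322}} = \delta_2 \cdot \varepsilon \cdot h_{22} = 0.1294$$

Here $[y_1, y_2] = [1, 0]$, $\hat{y} = [0.497,\ 0.502]$, $a_3 = [1.558,\ 1.568]$ and $h_{22} = 0.516$.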
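$$\frac{\partial L(\theta)}{\partial W_{331}} = \frac{\partial L(\theta)}{\partial \hat{y}_1}\,\frac{\partial \hat{y}_1}{\partial a_{31}}\,\frac{\partial a_{31}}{\partial W_{331}} = \delta_1 \cdot \varepsilon \cdot h_{23} = -0.1302$$

Here $[y_1, y_2] = [1, 0]$, $\hat{y} = [0.497,\ 0.502]$, $a_3 = [1.558,\ 1.568]$ and $h_{23} = 0.518$.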
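$$\frac{\partial L(\theta)}{\partial W_{332}} = \frac{\partial L(\theta)}{\partial \hat{y}_2}\,\frac{\partial \hat{y}_2}{\partial a_{32}}\,\frac{\partial a_{32}}{\partial W_{332}} = \delta_2 \cdot \varepsilon \cdot h_{23} = 0.1299$$

Here $[y_1, y_2] = [1, 0]$, $\hat{y} = [0.497,\ 0.502]$, $a_3 = [1.558,\ 1.568]$ and $h_{23} = 0.518$.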
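$$\frac{\partial L(\theta)}{\partial b_{31}} = \frac{\partial L(\theta)}{\partial \hat{y}_1}\,\frac{\partial \hat{y}_1}{\partial a_{31}}\,\frac{\partial a_{31}}{\partial b_{31}} = \delta_1 \cdot \varepsilon = -0.2513$$

The last factor is $\partial a_{31} / \partial b_{31} = 1$, because $b_{31}$ enters $a_{31}$ additively. Here $[y_1, y_2] = [1, 0]$, $\hat{y} = [0.497,\ 0.502]$ and $a_3 = [1.558,\ 1.568]$.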
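$$\frac{\partial L(\theta)}{\partial b_{32}} = \frac{\partial L(\theta)}{\partial \hat{y}_2}\,\frac{\partial \hat{y}_2}{\partial a_{32}}\,\frac{\partial a_{32}}{\partial b_{32}} = \delta_2 \cdot \varepsilon = 0.2508$$

The last factor is $\partial a_{32} / \partial b_{32} = 1$, because $b_{32}$ enters $a_{32}$ additively. Here $[y_1, y_2] = [1, 0]$, $\hat{y} = [0.497,\ 0.502]$ and $a_3 = [1.558,\ 1.568]$.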
Updated matrices of W3 & b3

Let $\eta = 0.01$.
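The W3 and b3 gradients all share the same pattern, so they can be collected in one pass. The sketch below assumes the plain gradient-descent rule, parameter ← parameter − η · gradient, which the example implies but does not spell out. It uses the rounded δ1, δ2, ε and h2 values printed above, and the names `grad_W3`, `step_W3` and `new_b3` are illustrative. The original W3 entries are not reproduced in this text, so only the step to be added to W3 is computed; b3 is known, so it is updated fully.

```python
# Values printed in the worked example (rounded).
h2 = [0.513, 0.516, 0.518]   # h21, h22, h23: activations feeding the output layer
delta = [-1.006, 1.004]      # delta1, delta2: dL/dy_hat for the two outputs
eps = 0.2499                 # softmax derivative factor
b3 = [0.01, 0.02]
eta = 0.01                   # learning rate

# grad_W3[j][k] = delta_k * eps * h2_j, i.e. row j holds dL/dW3(j+1)1 and dL/dW3(j+1)2.
grad_W3 = [[d * eps * h for d in delta] for h in h2]

# grad_b3[k] = delta_k * eps, because da3k/db3k = 1.
grad_b3 = [d * eps for d in delta]

# Gradient-descent step (assumed rule): parameter <- parameter - eta * gradient.
step_W3 = [[-eta * g for g in row] for row in grad_W3]
new_b3 = [b - eta * g for b, g in zip(b3, grad_b3)]

for j, row in enumerate(grad_W3, start=1):
    print(f"dL/dW3{j}1 = {row[0]:.4f}   dL/dW3{j}2 = {row[1]:.4f}")
print("dL/db3 =", [round(g, 4) for g in grad_b3])
print("new b3 =", [round(b, 5) for b in new_b3])
```

Adding `step_W3` entry by entry to the original W3 gives the updated weight matrix. The first output's weights and bias grow while the second output's shrink, which pushes ŷ1 toward its target of 1.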
Updates for W2 & b2

$$\frac{\partial L(\theta)}{\partial W_{211}} = \frac{\partial L(\theta)}{\partial \hat{y}_1}\,\frac{\partial \hat{y}_1}{\partial a_{31}}\,\frac{\partial a_{31}}{\partial h_{21}}\,\frac{\partial h_{21}}{\partial a_{21}}\,\frac{\partial a_{21}}{\partial W_{211}} + \frac{\partial L(\theta)}{\partial \hat{y}_2}\,\frac{\partial \hat{y}_2}{\partial a_{32}}\,\frac{\partial a_{32}}{\partial h_{21}}\,\frac{\partial h_{21}}{\partial a_{21}}\,\frac{\partial a_{21}}{\partial W_{211}}$$

The gradient has two terms because $h_{21}$ feeds both output units. The factors, with $[y_1, y_2] = [1, 0]$, $\hat{y} = [0.497,\ 0.502]$, $a_3 = [1.558,\ 1.568]$, $h_{21} = 0.513$, $a_{21} = 0.054$ and $h_{11} = 0.589$:

$$\frac{\partial L(\theta)}{\partial \hat{y}_1} = \delta_1 = -1.006, \qquad \frac{\partial \hat{y}_1}{\partial a_{31}} = \varepsilon = 0.2499$$

$$\frac{\partial L(\theta)}{\partial \hat{y}_2} = \delta_2 = 1.004, \qquad \frac{\partial \hat{y}_2}{\partial a_{32}} = \varepsilon = 0.2499$$

$$\frac{\partial a_{31}}{\partial h_{21}} = \frac{\partial}{\partial h_{21}}\big[h_{21} W_{311} + h_{22} W_{321} + h_{23} W_{331} + b_{31}\big] = W_{311} = 0.05$$

$$\frac{\partial a_{32}}{\partial h_{21}} = \frac{\partial}{\partial h_{21}}\big[h_{21} W_{312} + h_{22} W_{322} + h_{23} W_{332} + b_{32}\big] = W_{312} = 0.05$$

$$\frac{\partial h_{21}}{\partial a_{21}} = \frac{\partial \sigma(a_{21})}{\partial a_{21}} = \sigma(a_{21})\,\big(1 - \sigma(a_{21})\big) = h_{21}\,(1 - h_{21}) = 0.2498$$

$$\frac{\partial a_{21}}{\partial W_{211}} = \frac{\partial}{\partial W_{211}}\big[h_{11} W_{211} + h_{12} W_{221} + h_{13} W_{231} + b_{21}\big] = h_{11} = 0.589$$

$$\frac{\partial L(\theta)}{\partial W_{211}} = \delta_1 \cdot \varepsilon \cdot W_{311} \cdot h_{21}(1 - h_{21}) \cdot h_{11} + \delta_2 \cdot \varepsilon \cdot W_{312} \cdot h_{21}(1 - h_{21}) \cdot h_{11} = 7.068 \times 10^{-5}$$
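The factor that is new at this layer is the sigmoid derivative, $\sigma(a)(1 - \sigma(a))$. One way to convince yourself of that identity is to compare it against a central finite difference at the printed pre-activation $a_{21} = 0.054$. The sketch below is only such a check; the step size `step` is an arbitrary choice for illustration, not something taken from the example.

```python
import math

def sigmoid(a):
    """Logistic function for a single scalar."""
    return 1.0 / (1.0 + math.exp(-a))

a21 = 0.054                       # pre-activation printed in the example
h21 = sigmoid(a21)

# Analytic derivative: sigma'(a) = sigma(a) * (1 - sigma(a)).
analytic = h21 * (1.0 - h21)

# Central finite difference as an independent estimate of the same slope.
step = 1e-6                       # illustrative step size (an assumption)
numeric = (sigmoid(a21 + step) - sigmoid(a21 - step)) / (2 * step)

print("h21      =", round(h21, 4))
print("analytic =", round(analytic, 6))
print("numeric  =", round(numeric, 6))
```

Both estimates should agree to many decimal places and sit close to the 0.2498 used above. The same finite-difference trick applies to any single factor in the chain if a hand derivation looks doubtful.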
$$\frac{\partial L(\theta)}{\partial W_{212}} = \frac{\partial L(\theta)}{\partial \hat{y}_1}\,\frac{\partial \hat{y}_1}{\partial a_{31}}\,\frac{\partial a_{31}}{\partial h_{22}}\,\frac{\partial h_{22}}{\partial a_{22}}\,\frac{\partial a_{22}}{\partial W_{212}} + \frac{\partial L(\theta)}{\partial \hat{y}_2}\,\frac{\partial \hat{y}_2}{\partial a_{32}}\,\frac{\partial a_{32}}{\partial h_{22}}\,\frac{\partial h_{22}}{\partial a_{22}}\,\frac{\partial a_{22}}{\partial W_{212}}$$

$$\frac{\partial L(\theta)}{\partial W_{212}} = \delta_1 \cdot \varepsilon \cdot W_{321} \cdot h_{22}(1 - h_{22}) \cdot h_{11} + \delta_2 \cdot \varepsilon \cdot W_{322} \cdot h_{22}(1 - h_{22}) \cdot h_{11} = 6.5968 \times 10^{-5}$$

Here $[y_1, y_2] = [1, 0]$, $\hat{y} = [0.497,\ 0.502]$, $a_3 = [1.558,\ 1.568]$, $h_{22} = 0.516$, $a_{22} = 0.064$ and $h_{11} = 0.589$.

Try the remaining weights and biases yourself!
Updates for W1 & b1
[Figure: the paths from L(θ) back to W111, through ŷ1 and ŷ2, a31 and a32, h21 to h23, a21 to a23, h11 and a11.]

$$\frac{\partial L(\theta)}{\partial W_{111}} = 3.285 \times 10^{-6}$$
Loss value computed after updates

$$x = [1.5,\ 2.5,\ 3]$$
$$a_1 = [0.359,\ 0.369,\ 0.379], \qquad h_1 = [0.588,\ 0.591,\ 0.593]$$
$$a_2 = [0.054,\ 0.064,\ 0.074], \qquad h_2 = [0.513,\ 0.516,\ 0.518]$$
$$a_3 = [1.562,\ 1.563], \qquad \hat{y} = h_3 = [0.499,\ 0.5002]$$
$$L(\theta) = 0.6936$$
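To see the effect of one update step, the loss can be recomputed from the output pre-activations before and after the update. This is a rough check under the assumption that the printed a3 values (rounded to three decimals) are accurate enough; the helper names are illustrative, and the results should come out near 0.6981 and 0.6936 rather than match every digit.

```python
import math

def softmax(values):
    """Softmax over a list, shifted by the maximum for numerical stability."""
    m = max(values)
    exps = [math.exp(a - m) for a in values]
    total = sum(exps)
    return [e / total for e in exps]

def bce_loss(y, y_hat):
    """Binary cross entropy averaged over the N outputs."""
    n = len(y)
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y, y_hat)) / n

y = [1, 0]
a3_before = [1.558, 1.568]   # output pre-activations from the first forward pass
a3_after = [1.562, 1.563]    # output pre-activations after the parameter update

loss_before = bce_loss(y, softmax(a3_before))
loss_after = bce_loss(y, softmax(a3_after))

print("loss before update:", round(loss_before, 4))
print("loss after update: ", round(loss_after, 4))
print("decrease:          ", round(loss_before - loss_after, 4))
```

The decrease is small because η = 0.01 moves each parameter only slightly, but it is in the right direction: ŷ1 moves up toward its target of 1 and ŷ2 moves down toward 0.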
By Amrutha