Amrutha
Course Content Developer for the Deep Learning course by Professor Mitesh Khapra, offered by the IIT Madras Online Degree in Programming and Data Science.
An Example for Backpropagation
"Forward Pass"
\(x=[1.5, 2.5, 3]\), \(y=[1, 0]\)
\(b_1\) = [0.01,0.02,0.03], \(b_2\) = [0.01,0.02,0.03], \(b_3\) = [0.01,0.02]
\(a_1 = [1.5,2.5,3]*W_1 + [0.01,0.02,0.03] = [0.36, 0.37, 0.38]\)
\(h_1 = sigmoid(a_1) = [0.589, 0.591, 0.593]\)
\(a_2 = [0.589,0.591,0.593]*W_2 + [0.01,0.02,0.03] = [0.054, 0.064, 0.074]\)
\(h_2 = sigmoid(a_2) = [0.513, 0.516, 0.518]\)
\(a_3 = [0.513,0.516,0.518]*W_3 + [0.01,0.02] = [1.558, 1.568]\)
\(\hat y = h_3 = softmax(a_3) = [0.497, 0.502]\)
"Binary Cross Entropy Loss"
\(\mathscr{L}(\theta) = -\frac{1}{N} \sum_{i=1}^N (y_i \log(\hat y_i)+(1-y_i) \log(1- \hat y_i))\) = 0.6981
[Figure: the network. Inputs \(x_1, x_2, x_3\); two hidden layers with pre-activations \(a_{1j}, a_{2j}\) and activations \(h_{1j}, h_{2j}\), \(j = 1, 2, 3\); output pre-activations \(a_{31}, a_{32}\) with outputs \(\hat y_1, \hat y_2\); weights \(W_{kij}\) (unit \(i\) of the previous layer to unit \(j\) of layer \(k\)) and biases \(b_{kj}\); loss \(\mathscr{L}(\theta)\).]
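The forward-pass numbers can be re-derived from the pre-activations quoted on the slide. The sketch below is only an illustration: the helper names `sigmoid`, `softmax` and `bce_loss` are made up here, and the weight matrices are not used because the code starts from the stated values of \(a_1\), \(a_2\) and \(a_3\).

```python
import math

def sigmoid(a):
    """Logistic sigmoid of a single pre-activation."""
    return 1.0 / (1.0 + math.exp(-a))

def softmax(a):
    """Softmax over a list; subtracting the max keeps exp() well-behaved."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

def bce_loss(y, y_hat):
    """Binary cross-entropy averaged over the N outputs, as on the slide."""
    n = len(y)
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y, y_hat)) / n

# Pre-activations and target copied from the slide.
a1 = [0.36, 0.37, 0.38]
a2 = [0.054, 0.064, 0.074]
a3 = [1.558, 1.568]
y = [1, 0]

h1 = [sigmoid(v) for v in a1]
h2 = [sigmoid(v) for v in a2]
y_hat = softmax(a3)
loss = bce_loss(y, y_hat)

print("h1    =", [round(v, 4) for v in h1])
print("h2    =", [round(v, 4) for v in h2])
print("y_hat =", [round(v, 4) for v in y_hat])
print("loss  =", round(loss, 4))
```

The printed values should agree with the slide to about three decimals; the last digit can differ because the slide appears to truncate rather than round.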
"Backward Pass" Computing updates
Updates for weights \(W_3\) and biases \(b_3\)
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{311}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \frac {\partial \hat y_1}{\partial a_{31}} \frac {\partial a_{31}}{\partial W_{311}} \)
To find: \( \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \), \( \frac {\partial \hat y_1}{\partial a_{31}} \), \( \frac {\partial a_{31}}{\partial W_{311}} \)
Forward-pass values: \(\hat y = [0.497, 0.502]\), \(a_3 = [1.558, 1.568]\), \(h_{21} = 0.513\), \([y_1,y_2]=[1, 0]\)
\(\mathscr{L}(\theta) = -\frac{1}{N} \sum_{i=1}^N (y_i \log(\hat y_i)+(1-y_i) \log(1- \hat y_i))\)
\(= -\frac{1}{2} \bigg ( (y_1*\log(\hat y_1)) + ((1-y_1)*\log(1-\hat y_1) )+ (y_2*\log(\hat y_2)) + ((1-y_2)*\log(1-\hat y_2)) \bigg )\)
\( \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} = - \frac {1}{2} \big ( \frac {y_1}{\hat y_1} - \frac {1-y_1}{1-\hat y_1} \big) = \delta_1 = -1.006\)
\( \frac {\partial \hat y_1}{\partial a_{31}} = \frac {\partial }{\partial a_{31}} softmax (a_{31}) = \frac {e^{a_{31}}* e^{a_{32}}}{(e^{a_{31}}+e^{a_{32}})^2} = \varepsilon = 0.2499\)
\( \frac {\partial a_{31}}{\partial W_{311}} = \frac {\partial}{\partial W_{311}} \big[ h_{21}W_{311} + h_{22}W_{321} + h_{23}W_{331} + b_{31} \big ] = h_{21} = 0.513\)
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{311}} = \delta_1*\varepsilon*h_{21} = -0.1289\)
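As a quick check of the arithmetic, the three factors can be evaluated directly from the rounded forward-pass values. This is a minimal sketch that mirrors the slide's three-factor chain (the direct path \(\hat y_1 \to a_{31} \to W_{311}\) only); the variable names are illustrative and not part of the original slides.

```python
import math

# Rounded values quoted on the slide.
y1, y_hat1 = 1, 0.497
a31, a32 = 1.558, 1.568
h21 = 0.513
N = 2  # number of outputs averaged in the loss

# dL/dy_hat1 for the binary cross-entropy loss.
delta1 = -(1.0 / N) * (y1 / y_hat1 - (1 - y1) / (1 - y_hat1))

# d y_hat1 / d a31 as written on the slide.
e1, e2 = math.exp(a31), math.exp(a32)
eps = (e1 * e2) / (e1 + e2) ** 2

# d a31 / d W311 is simply h21, so the product is the slide's gradient.
grad_W311 = delta1 * eps * h21

print(round(delta1, 4), round(eps, 4), round(grad_W311, 4))
```

Because the inputs are already rounded to three decimals, the result should match the slide's \(-0.1289\) only up to the last digit.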
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{312}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \frac {\partial \hat y_2}{\partial a_{32}} \frac {\partial a_{32}}{\partial W_{312}} \)
To find: \( \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \), \( \frac {\partial \hat y_2}{\partial a_{32}} \), \( \frac {\partial a_{32}}{\partial W_{312}} \)
Forward-pass values: \(\hat y = [0.497, 0.502]\), \(a_3 = [1.558, 1.568]\), \(h_{21} = 0.513\), \([y_1,y_2]=[1, 0]\)
\(\mathscr{L}(\theta) = -\frac{1}{N} \sum_{i=1}^N (y_i \log(\hat y_i)+(1-y_i) \log(1- \hat y_i))\)
\(= -\frac{1}{2} \bigg ( (y_1*\log(\hat y_1)) + ((1-y_1)*\log(1-\hat y_1) )+ (y_2*\log(\hat y_2)) + ((1-y_2)*\log(1-\hat y_2)) \bigg )\)
\( \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} = - \frac {1}{2} \big ( \frac {y_2}{\hat y_2} - \frac {1-y_2}{1-\hat y_2} \big) = \delta_2 = 1.004\)
\( \frac {\partial \hat y_2}{\partial a_{32}} = \frac {\partial }{\partial a_{32}} softmax (a_{32}) = \frac {e^{a_{31}}* e^{a_{32}}}{(e^{a_{31}}+e^{a_{32}})^2} = \varepsilon = 0.2499\)
\( \frac {\partial a_{32}}{\partial W_{312}} = \frac {\partial}{\partial W_{312}} \big[ h_{21}W_{312} + h_{22}W_{322} + h_{23}W_{332} + b_{32} \big ] = h_{21} = 0.513\)
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{312}} = \delta_2*\varepsilon*h_{21} = 0.1287\)
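The same \(\varepsilon\) appears for both outputs. One way to see why, assuming only that the two softmax outputs sum to 1, is to split the fraction into a product of the two outputs:

```latex
\varepsilon
  = \frac{e^{a_{31}} \, e^{a_{32}}}{(e^{a_{31}}+e^{a_{32}})^2}
  = \frac{e^{a_{31}}}{e^{a_{31}}+e^{a_{32}}} \cdot \frac{e^{a_{32}}}{e^{a_{31}}+e^{a_{32}}}
  = \hat y_1 \, \hat y_2
  = \hat y_1 (1-\hat y_1)
  = \hat y_2 (1-\hat y_2)
```

So \(\frac{\partial \hat y_1}{\partial a_{31}}\) and \(\frac{\partial \hat y_2}{\partial a_{32}}\) take the same value in this two-output network.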
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{321}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \frac {\partial \hat y_1}{\partial a_{31}} \frac {\partial a_{31}}{\partial W_{321}} = \delta_1*\varepsilon*h_{22}\)
\(= -0.1297\) (with \(\delta_1 = -1.006\), \(\varepsilon = 0.2499\), \(h_{22} = 0.516\))
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{322}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \frac {\partial \hat y_2}{\partial a_{32}} \frac {\partial a_{32}}{\partial W_{322}} = \delta_2*\varepsilon*h_{22}\)
\(= 0.1294\) (with \(\delta_2 = 1.004\), \(\varepsilon = 0.2499\), \(h_{22} = 0.516\))
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{331}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \frac {\partial \hat y_1}{\partial a_{31}} \frac {\partial a_{31}}{\partial W_{331}} = \delta_1*\varepsilon*h_{23}\)
\(= -0.1302\) (with \(\delta_1 = -1.006\), \(\varepsilon = 0.2499\), \(h_{23} = 0.518\))
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{332}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \frac {\partial \hat y_2}{\partial a_{32}} \frac {\partial a_{32}}{\partial W_{332}} = \delta_2*\varepsilon*h_{23}\)
\(= 0.1299\) (with \(\delta_2 = 1.004\), \(\varepsilon = 0.2499\), \(h_{23} = 0.518\))
\( \frac {\partial \mathscr{L}(\theta)}{\partial b_{31}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \frac {\partial \hat y_1}{\partial a_{31}} \frac {\partial a_{31}}{\partial b_{31}} = \delta_1*\varepsilon\)
\(= -0.2513\)
\( \frac {\partial \mathscr{L}(\theta)}{\partial b_{32}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \frac {\partial \hat y_2}{\partial a_{32}} \frac {\partial a_{32}}{\partial b_{32}} = \delta_2*\varepsilon\)
\(= 0.2508\)
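All six weight gradients and both bias gradients for the last layer share the factor \(\delta_k*\varepsilon\), so they can be computed together. The sketch below assumes the rounded values \(\delta = [-1.006, 1.004]\), \(\varepsilon = 0.2499\) and \(h_2 = [0.513, 0.516, 0.518]\) from the slides; the names `out_signal`, `grad_W3` and `grad_b3` are illustrative only.

```python
# Rounded values quoted on the slides.
delta = [-1.006, 1.004]      # dL/dy_hat_k for k = 1, 2
eps = 0.2499                 # d y_hat_k / d a_3k (same value for both k)
h2 = [0.513, 0.516, 0.518]   # activations feeding the output layer

# Shared factor delta_k * eps; it is also the bias gradient dL/db_3k.
out_signal = [d * eps for d in delta]
grad_b3 = list(out_signal)

# grad_W3[i][k] holds dL/dW_3(i+1)(k+1) = delta_k * eps * h_2(i+1).
grad_W3 = [[h * s for s in out_signal] for h in h2]

for row in grad_W3:
    print([round(g, 4) for g in row])
print([round(g, 4) for g in grad_b3])
```

Row \(i\) of `grad_W3` corresponds to the weights leaving \(h_{2i}\), and column \(k\) to the output \(\hat y_k\). The entries should agree with the slide values to within rounding of the last digit.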
Let \(\eta = 0.01\)
Updated matrices of \(W_3\) & \(b_3\)
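With a learning rate given, the natural reading is a plain gradient-descent step, \(W \leftarrow W - \eta \, \frac{\partial \mathscr{L}(\theta)}{\partial W}\). That update rule is an assumption here, since the slide text only states \(\eta\). The sketch computes just the change applied to each parameter, using the gradient values quoted on the earlier slides; the initial entries of \(W_3\) are not listed in the text, so the updated matrices themselves are not reconstructed.

```python
eta = 0.01  # learning rate from the slide

# Gradients quoted on the slides; rows follow h21, h22, h23,
# columns follow y_hat_1, y_hat_2.
grad_W3 = [[-0.1289, 0.1287],
           [-0.1297, 0.1294],
           [-0.1302, 0.1299]]
grad_b3 = [-0.2513, 0.2508]

# Assumed plain gradient descent: new = old - eta * grad,
# so the change applied to each parameter is -eta * grad.
step_W3 = [[-eta * g for g in row] for row in grad_W3]
step_b3 = [-eta * g for g in grad_b3]

for row in step_W3:
    print([round(s, 6) for s in row])
print([round(s, 6) for s in step_b3])
```

Weights feeding \(\hat y_1\) (target 1) move up and those feeding \(\hat y_2\) (target 0) move down, each by roughly \(1.3 \times 10^{-3}\).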
Updates for \(W_2\) & \(b_2\)
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{211}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \frac {\partial \hat y_1}{\partial a_{31}} \frac {\partial a_{31}}{\partial h_{21}} \frac {\partial h_{21}}{\partial a_{21}} \frac {\partial a_{21}}{\partial W_{211}} \)
\( + \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \frac {\partial \hat y_2}{\partial a_{32}} \frac {\partial a_{32}}{\partial h_{21}} \frac {\partial h_{21}}{\partial a_{21}} \frac {\partial a_{21}}{\partial W_{211}} \)
\( \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} = \delta_1 = -1.006\), \( \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} = \delta_2 = 1.004\)
\( \frac {\partial \hat y_1}{\partial a_{31}} = \frac {\partial \hat y_2}{\partial a_{32}} = \varepsilon = 0.2499\)
\( \frac {\partial a_{31}}{\partial h_{21}} = \frac {\partial}{\partial h_{21}} \big[ h_{21}W_{311} + h_{22}W_{321} + h_{23}W_{331} + b_{31} \big ] = W_{311} = 0.05\)
\( \frac {\partial a_{32}}{\partial h_{21}} = \frac {\partial}{\partial h_{21}} \big[ h_{21}W_{312} + h_{22}W_{322} + h_{23}W_{332} + b_{32} \big ] = W_{312} = 0.05\)
\( \frac {\partial h_{21}}{\partial a_{21}} = \frac {\partial}{\partial a_{21}} \sigma (a_{21}) = \sigma(a_{21})*(1-\sigma (a_{21})) = h_{21}*(1-h_{21}) = 0.2498\)
\( \frac {\partial a_{21}}{\partial W_{211}} = \frac {\partial}{\partial W_{211}} \big[ h_{11}W_{211} + h_{12}W_{221} + h_{13}W_{231} + b_{21} \big ] = h_{11} = 0.589\)
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{211}} = \delta_1*\varepsilon*W_{311}*h_{21}*(1-h_{21})*h_{11}\)
\( + \delta_2*\varepsilon*W_{312}*h_{21}*(1-h_{21})*h_{11}\)
\(= 7.068 \times 10^{-5}\)
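The sigmoid factor \(h_{21}*(1-h_{21})\) is the one new ingredient in this layer. A small sketch, assuming \(a_{21} = 0.054\) from the forward pass, compares the closed form with a central finite difference; the helper names are illustrative.

```python
import math

def sigmoid(a):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_grad(a):
    """Closed-form derivative: sigma(a) * (1 - sigma(a))."""
    s = sigmoid(a)
    return s * (1.0 - s)

a21 = 0.054  # pre-activation of the first unit in hidden layer 2

analytic = sigmoid_grad(a21)

# Central difference as an independent numerical check.
step = 1e-5
numeric = (sigmoid(a21 + step) - sigmoid(a21 - step)) / (2 * step)

print(round(analytic, 4), round(numeric, 4))
```

Both numbers should agree closely with each other and with the \(0.2498\) used on the slide.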
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{212}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \frac {\partial \hat y_1}{\partial a_{31}} \frac {\partial a_{31}}{\partial h_{22}} \frac {\partial h_{22}}{\partial a_{22}} \frac {\partial a_{22}}{\partial W_{212}} \)
\( + \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \frac {\partial \hat y_2}{\partial a_{32}} \frac {\partial a_{32}}{\partial h_{22}} \frac {\partial h_{22}}{\partial a_{22}} \frac {\partial a_{22}}{\partial W_{212}} \)
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{212}} = \delta_1*\varepsilon*W_{321}*h_{22}*(1-h_{22})*h_{11}\)
\( + \delta_2*\varepsilon*W_{322}*h_{22}*(1-h_{22})*h_{11}\)
\(= 6.5968 \times 10^{-5}\)
(with \(a_{22} = 0.064\), \(h_{22} = 0.516\), \(h_{11} = 0.589\), \([y_1,y_2]=[1, 0]\))
Try the remaining weights and biases yourself!
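For the remaining second-layer parameters, the two worked cases suggest a common pattern. The general form below is a sketch that follows the slides' per-path chain and index convention (\(W_{2ij}\) connects \(h_{1i}\) to \(a_{2j}\)); the bias formula is an assumption that uses \(\frac{\partial a_{2j}}{\partial b_{2j}} = 1\), by analogy with the \(b_3\) slides.

```latex
\frac{\partial \mathscr{L}(\theta)}{\partial W_{2ij}}
  = h_{1i} \; h_{2j}(1-h_{2j}) \sum_{k=1}^{2} \delta_k \, \varepsilon \, W_{3jk},
\qquad
\frac{\partial \mathscr{L}(\theta)}{\partial b_{2j}}
  = h_{2j}(1-h_{2j}) \sum_{k=1}^{2} \delta_k \, \varepsilon \, W_{3jk}
```

Setting \(i = 1, j = 1\) recovers the expression for \(W_{211}\), and \(i = 1, j = 2\) recovers the one for \(W_{212}\).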
Updates for \(W_1\) & \(b_1\)
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{111}} = ?\)
\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{111}} = 3.285 \times 10^{-6}\)
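The slide gives only the final number. A sketch of the chain behind it, assuming the same per-path pattern as the \(W_2\) slides and \(a_{11} = x_1 W_{111} + x_2 W_{121} + x_3 W_{131} + b_{11}\), sums over every path from \(W_{111}\) through \(h_{11}\), each \(h_{2j}\) and each output \(\hat y_k\):

```latex
\frac{\partial \mathscr{L}(\theta)}{\partial W_{111}}
  = \sum_{j=1}^{3} \sum_{k=1}^{2}
    \frac{\partial \mathscr{L}(\theta)}{\partial \hat y_k}
    \frac{\partial \hat y_k}{\partial a_{3k}}
    \frac{\partial a_{3k}}{\partial h_{2j}}
    \frac{\partial h_{2j}}{\partial a_{2j}}
    \frac{\partial a_{2j}}{\partial h_{11}}
    \frac{\partial h_{11}}{\partial a_{11}}
    \frac{\partial a_{11}}{\partial W_{111}}
  = x_1 \, h_{11}(1-h_{11}) \sum_{j=1}^{3} W_{21j} \, h_{2j}(1-h_{2j})
    \sum_{k=1}^{2} \delta_k \, \varepsilon \, W_{3jk}
```

Six paths contribute here, compared with two for each second-layer weight.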
Loss value computed after updates
\(x = [1.5, 2.5, 3]\)
\(b_1\) =
\(b_2\) =
\(b_3 =\)
\(a_1 = [0.359, 0.369, 0.379]\)
\(h_1 = [0.588, 0.591, 0.593]\)
\(a_2 = [0.054, 0.064, 0.074]\)
\(h_2 = [0.513, 0.516, 0.518]\)
\(a_3 = [1.562, 1.563]\)
\(h_3 = [0.499, 0.5002]\)
\(\mathscr {L}(\theta) = 0.6936\) (Initial \(\mathscr L (\theta)\) = 0.6981)