An Example for Backpropagation

"Forward Pass"

The network has three inputs \(x_1, x_2, x_3\), two hidden layers of three sigmoid units each (\(a_1, h_1\) and \(a_2, h_2\)), and a two-unit softmax output layer (\(a_3\), \(\hat y = h_3\)).

\(x=[1.5, 2.5, 3]\)

\(y=[1, 0]\)

\(W_1= \begin{bmatrix} 0.05 & 0.05 & 0.05 \\ 0.05 & 0.05 & 0.05 \\ 0.05 & 0.05 & 0.05 \\ \end{bmatrix}\), \(b_1 = [0.01,0.02,0.03]\)

\(W_2= \begin{bmatrix} 0.025 & 0.025 & 0.025 \\ 0.025 & 0.025 & 0.025 \\ 0.025 & 0.025 & 0.025 \\ \end{bmatrix}\), \(b_2 = [0.01,0.02,0.03]\)

\(W_3= \begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1\\ \end{bmatrix}\), \(b_3 = [0.01,0.02]\)

\(a_1 = x W_1 + b_1 = [0.36, 0.37, 0.38]\)

\(h_1 = \mathrm{sigmoid}(a_1) = [0.589, 0.591, 0.593]\)

\(a_2 = h_1 W_2 + b_2 = [0.054, 0.064, 0.074]\)

\(h_2 = \mathrm{sigmoid}(a_2) = [0.513, 0.516, 0.518]\)

\(a_3 = h_2 W_3 + b_3 = [1.558, 1.568]\)

\(\hat y = h_3 = \mathrm{softmax}(a_3) = [0.497, 0.502]\)

"Binary Cross Entropy Loss"

\(\mathscr {L}(\theta) = -\frac{1}{N} \sum_{i=1}^N (y_i \log(\hat y_i)+(1-y_i) \log(1- \hat y_i)) = 0.6981\)

[Figure: the network with every node and parameter labelled. \(a_{lj}\) and \(h_{lj}\) are the pre-activation and activation of unit \(j\) in layer \(l\); \(W_{lij}\) is the layer-\(l\) weight from unit \(i\) of the previous layer to unit \(j\); \(b_{lj}\) is the bias of unit \(j\) in layer \(l\); the outputs are \(\hat y_1, \hat y_2\) and the loss is \(\mathscr {L}(\theta)\).]
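The forward pass can be checked with a few lines of plain Python. This is a minimal sketch, not the deck's own code: the helper names (`sigmoid`, `affine`, `softmax`, `bce`, `forward`) are made up for illustration, and the parameter values are the ones listed for this example. The slide values appear to be truncated rather than rounded, so the last printed digit may differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(v, W, b):
    """Row vector v times matrix W (a list of rows), plus bias b."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) + b[j]
            for j in range(len(b))]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in z]
    total = sum(exps)
    return [e / total for e in exps]

def bce(y, y_hat):
    """Binary cross entropy averaged over the N outputs."""
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, y_hat)) / len(y)

def forward(x, params):
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = [sigmoid(a) for a in affine(x, W1, b1)]
    h2 = [sigmoid(a) for a in affine(h1, W2, b2)]
    a3 = affine(h2, W3, b3)
    return h1, h2, a3, softmax(a3)

x = [1.5, 2.5, 3.0]
y = [1.0, 0.0]
params = [
    ([[0.05] * 3 for _ in range(3)], [0.01, 0.02, 0.03]),   # W1, b1
    ([[0.025] * 3 for _ in range(3)], [0.01, 0.02, 0.03]),  # W2, b2
    ([[1.0] * 2 for _ in range(3)], [0.01, 0.02]),          # W3, b3
]

h1, h2, a3, y_hat = forward(x, params)
loss = bce(y, y_hat)
print("h1 =", [round(v, 3) for v in h1])
print("h2 =", [round(v, 3) for v in h2])
print("a3 =", [round(v, 3) for v in a3])
print("y_hat =", [round(v, 4) for v in y_hat])
print("loss =", round(loss, 4))
```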

"Backward Pass" Computing updates


Updates for weights \(W_3\) and biases \(b_3\)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{311}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \frac {\partial \hat y_1}{\partial a_{31}} \frac {\partial a_{31}}{\partial W_{311}} \)

To find: \( \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \), \( \frac {\partial \hat y_1}{\partial a_{31}} \), \( \frac {\partial a_{31}}{\partial W_{311}} \)

\(\mathscr {L}(\theta) = -\frac{1}{N} \sum_{i=1}^N (y_i \log(\hat y_i)+(1-y_i) \log(1- \hat y_i))\)

\(= -\frac{1}{2} \bigg ( (y_1 \log(\hat y_1)) + ((1-y_1) \log(1-\hat y_1) )+ (y_2 \log(\hat y_2)) + ((1-y_2) \log(1-\hat y_2)) \bigg )\), with \([y_1,y_2]=[1, 0]\)

\( \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} = - \frac {1}{2} \big ( \frac {y_1}{\hat y_1} - \frac {1-y_1}{1-\hat y_1} \big) = \delta_1 = -1.006\)

\( \frac {\partial \hat y_1}{\partial a_{31}} = \frac {\partial }{\partial a_{31}} \mathrm{softmax} (a_{31}) = \frac {e^{a_{31}} \, e^{a_{32}}}{(e^{a_{31}}+e^{a_{32}})^2} = \varepsilon = 0.2499\)

\( \frac {\partial a_{31}}{\partial W_{311}} = \frac {\partial}{\partial W_{311}} \big[ h_{21}W_{311} + h_{22}W_{321} + h_{23}W_{331} + b_{31} \big ] = h_{21} = 0.513\)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{311}} = \delta_1*\varepsilon*h_{21} = -0.1289\)
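The three factors for \(W_{311}\) can be evaluated numerically. The sketch below is illustrative only: variable names are made up, and it starts from the rounded forward-pass values \(a_3 = [1.558, 1.568]\) and \(h_{21} = 0.513\). It reproduces the single-path product exactly as written in the derivation, so the dependence of \(\hat y_2\) on \(a_{31}\) through the softmax is not included.

```python
import math

y1 = 1.0
a31, a32 = 1.558, 1.568   # output pre-activations from the forward pass
h21 = 0.513               # first activation of the second hidden layer

# Softmax output for the first class.
e1, e2 = math.exp(a31), math.exp(a32)
y_hat1 = e1 / (e1 + e2)

# delta_1: derivative of the loss with respect to y_hat_1 (N = 2 outputs).
delta1 = -0.5 * (y1 / y_hat1 - (1 - y1) / (1 - y_hat1))

# epsilon: derivative of y_hat_1 with respect to a_31.
eps = e1 * e2 / (e1 + e2) ** 2

# a_31 is linear in W_311 with coefficient h_21.
grad_W311 = delta1 * eps * h21

print("delta_1 =", round(delta1, 4))
print("epsilon =", round(eps, 4))
print("dL/dW_311 =", round(grad_W311, 4))
```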

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{312}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \frac {\partial \hat y_2}{\partial a_{32}} \frac {\partial a_{32}}{\partial W_{312}} \)

To find: \( \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \), \( \frac {\partial \hat y_2}{\partial a_{32}} \), \( \frac {\partial a_{32}}{\partial W_{312}} \)

\( \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} = - \frac {1}{2} \big ( \frac {y_2}{\hat y_2} - \frac {1-y_2}{1-\hat y_2} \big) = \delta_2 = 1.004\), with \([y_1,y_2]=[1, 0]\)

\( \frac {\partial \hat y_2}{\partial a_{32}} = \frac {\partial }{\partial a_{32}} \mathrm{softmax} (a_{32}) = \frac {e^{a_{31}} \, e^{a_{32}}}{(e^{a_{31}}+e^{a_{32}})^2} = \varepsilon = 0.2499\)

\( \frac {\partial a_{32}}{\partial W_{312}} = \frac {\partial}{\partial W_{312}} \big[ h_{21}W_{312} + h_{22}W_{322} + h_{23}W_{332} + b_{32} \big ] = h_{21} = 0.513\)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{312}} = \delta_2*\varepsilon*h_{21} = 0.1287\)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{321}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \frac {\partial \hat y_1}{\partial a_{31}} \frac {\partial a_{31}}{\partial W_{321}} \)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{321}} = \delta_1*\varepsilon*h_{22} = -0.1297\), with \(h_{22} = 0.516\)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{322}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \frac {\partial \hat y_2}{\partial a_{32}} \frac {\partial a_{32}}{\partial W_{322}} \)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{322}} = \delta_2*\varepsilon*h_{22} = 0.1294\), with \(h_{22} = 0.516\)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{331}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \frac {\partial \hat y_1}{\partial a_{31}} \frac {\partial a_{31}}{\partial W_{331}} \)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{331}} = \delta_1*\varepsilon*h_{23} = -0.1302\), with \(h_{23} = 0.518\)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{332}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \frac {\partial \hat y_2}{\partial a_{32}} \frac {\partial a_{32}}{\partial W_{332}} \)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{332}} = \delta_2*\varepsilon*h_{23} = 0.1299\), with \(h_{23} = 0.518\)

\( \frac {\partial \mathscr{L}(\theta)}{\partial b_{31}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \frac {\partial \hat y_1}{\partial a_{31}} \frac {\partial a_{31}}{\partial b_{31}} \)

Since \( \frac {\partial a_{31}}{\partial b_{31}} = 1\):

\( \frac {\partial \mathscr{L}(\theta)}{\partial b_{31}} = \delta_1*\varepsilon = -0.2513\)

\( \frac {\partial \mathscr{L}(\theta)}{\partial b_{32}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \frac {\partial \hat y_2}{\partial a_{32}} \frac {\partial a_{32}}{\partial b_{32}} \)

Since \( \frac {\partial a_{32}}{\partial b_{32}} = 1\):

\( \frac {\partial \mathscr{L}(\theta)}{\partial b_{32}} = \delta_2*\varepsilon = 0.2508\)
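All six output-layer weight gradients and both bias gradients share the factor \(\delta_k * \varepsilon\), so they can be tabulated in one pass. The sketch below is illustrative: it uses the rounded values \(\delta_1 = -1.006\), \(\delta_2 = 1.004\), \(\varepsilon = 0.2499\) and \(h_2 = [0.513, 0.516, 0.518]\), and the variable names are made up. The results agree with the hand-computed values up to rounding in the fourth decimal place.

```python
delta = [-1.006, 1.004]       # delta_1, delta_2
eps = 0.2499                  # softmax derivative factor
h2 = [0.513, 0.516, 0.518]    # activations of the second hidden layer

# Bias gradients: delta_k * eps for each output unit k.
grad_b3 = [d * eps for d in delta]

# Weight gradients: row i, column k holds delta_k * eps * h_2i.
grad_W3 = [[h * g for g in grad_b3] for h in h2]

for row in grad_W3:
    print([round(v, 4) for v in row])
print([round(v, 4) for v in grad_b3])
```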

Updated matrices of \(W_3\) & \(b_3\)

Let \(\eta = 0.01\)

\(W_{3new}= \begin{bmatrix} W_{311} & W_{312} \\ W_{321} & W_{322} \\ W_{331} & W_{332} \\ \end{bmatrix} - \eta * \begin{bmatrix} \delta_1*\varepsilon*h_{21} & \delta_2*\varepsilon*h_{21} \\ \delta_1*\varepsilon*h_{22} & \delta_2*\varepsilon*h_{22} \\ \delta_1*\varepsilon*h_{23} & \delta_2*\varepsilon*h_{23} \\ \end{bmatrix}\)

\(W_{3new}= \begin{bmatrix} 1.001289 & 0.998713 \\ 1.001297 & 0.998706 \\ 1.001302 & 0.998701 \\ \end{bmatrix}\)

\(b_{3new}= \begin{bmatrix} b_{31} \\ b_{32} \\ \end{bmatrix} - \eta * \begin{bmatrix} \delta_1*\varepsilon \\ \delta_2*\varepsilon \\ \end{bmatrix}\)

\(b_{3new}= \begin{bmatrix} 0.012513 \\ 0.017492 \\ \end{bmatrix}\)
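The update rule is a plain gradient-descent step, \(\theta_{new} = \theta - \eta * \nabla\theta\). A small sketch, assuming the gradient values computed by hand for \(W_3\) and \(b_3\); the helper name `sgd_step` is made up for illustration:

```python
eta = 0.01

W3 = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
b3 = [0.01, 0.02]

# Gradients from the hand computation (rounded to four decimals).
grad_W3 = [[-0.1289, 0.1287], [-0.1297, 0.1294], [-0.1302, 0.1299]]
grad_b3 = [-0.2513, 0.2508]

def sgd_step(param, grad, lr):
    """Return param - lr * grad for a vector or a list-of-rows matrix."""
    if isinstance(param[0], list):
        return [sgd_step(p, g, lr) for p, g in zip(param, grad)]
    return [p - lr * g for p, g in zip(param, grad)]

W3_new = sgd_step(W3, grad_W3, eta)
b3_new = sgd_step(b3, grad_b3, eta)

for row in W3_new:
    print([round(v, 6) for v in row])
print([round(v, 6) for v in b3_new])
```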

Updates for \(W_2\) & \(b_2\)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{211}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \frac {\partial \hat y_1}{\partial a_{31}} \frac {\partial a_{31}}{\partial h_{21}} \frac {\partial h_{21}}{\partial a_{21}} \frac {\partial a_{21}}{\partial W_{211}} + \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \frac {\partial \hat y_2}{\partial a_{32}} \frac {\partial a_{32}}{\partial h_{21}} \frac {\partial h_{21}}{\partial a_{21}} \frac {\partial a_{21}}{\partial W_{211}} \)

\( \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} = \delta_1 = -1.006\), \( \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} = \delta_2 = 1.004\)

\( \frac {\partial \hat y_1}{\partial a_{31}} = \frac {\partial \hat y_2}{\partial a_{32}} = \varepsilon = 0.2499\)

\( \frac {\partial a_{31}}{\partial h_{21}} = \frac {\partial}{\partial h_{21}} \big[ h_{21}W_{311} + h_{22}W_{321} + h_{23}W_{331} + b_{31} \big ] = W_{311} = 1\)

\( \frac {\partial a_{32}}{\partial h_{21}} = \frac {\partial}{\partial h_{21}} \big[ h_{21}W_{312} + h_{22}W_{322} + h_{23}W_{332} + b_{32} \big ] = W_{312} = 1\)

\( \frac {\partial h_{21}}{\partial a_{21}} = \frac {\partial}{\partial a_{21}} \sigma (a_{21}) = \sigma(a_{21})*(1-\sigma (a_{21})) = h_{21}*(1-h_{21}) = 0.2498\)

\( \frac {\partial a_{21}}{\partial W_{211}} = \frac {\partial}{\partial W_{211}} \big[ h_{11}W_{211} + h_{12}W_{221} + h_{13}W_{231} + b_{21} \big ] = h_{11} = 0.589\)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{211}} = \delta_1*\varepsilon*W_{311}*h_{21}*(1-h_{21})*h_{11} + \delta_2*\varepsilon*W_{312}*h_{21}*(1-h_{21})*h_{11} = 7.068 \times 10^{-5}\)

The two terms nearly cancel because \(\delta_1 \approx -\delta_2\) and \(W_{311} = W_{312}\), so this value is tiny, and its digits and even its sign depend on how the intermediate values are rounded.
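The two-path formula for \(W_{211}\) can be evaluated directly. The sketch below is an illustration under stated assumptions, not the deck's computation: it recomputes the forward pass without rounding, uses \(W_{311} = W_{312} = 1\) from the \(W_3\) matrix, and uses made-up variable names. For \(y = [1, 0]\) and \(\hat y_1 + \hat y_2 = 1\), the formula gives \(\delta_1 = -\delta_2\) exactly, so with unrounded values the two terms cancel to within floating-point error. The result is therefore much closer to zero than a hand computation with rounded intermediates.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [1.5, 2.5, 3.0]
y = [1.0, 0.0]
b1 = [0.01, 0.02, 0.03]
b2 = [0.01, 0.02, 0.03]
b3 = [0.01, 0.02]
# Every weight within a layer has the same initial value in this example.
w1, w2, w3 = 0.05, 0.025, 1.0

# Unrounded forward pass.
h1 = [sigmoid(w1 * sum(x) + b) for b in b1]
h2 = [sigmoid(w2 * sum(h1) + b) for b in b2]
a3 = [w3 * sum(h2) + b for b in b3]
exps = [math.exp(a) for a in a3]
y_hat = [e / sum(exps) for e in exps]

# Factors of the chain rule.
delta = [-0.5 * (y[k] / y_hat[k] - (1 - y[k]) / (1 - y_hat[k]))
         for k in range(2)]
eps = y_hat[0] * y_hat[1]              # e^a31 * e^a32 / (e^a31 + e^a32)^2
W311 = W312 = w3
sig_prime = h2[0] * (1 - h2[0])        # h_21 * (1 - h_21)

term1 = delta[0] * eps * W311 * sig_prime * h1[0]   # path through y_hat_1
term2 = delta[1] * eps * W312 * sig_prime * h1[0]   # path through y_hat_2
grad_W211 = term1 + term2

print("term 1 =", term1)
print("term 2 =", term2)
print("dL/dW_211 =", grad_W211)
```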

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{212}} = \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_1} \frac {\partial \hat y_1}{\partial a_{31}} \frac {\partial a_{31}}{\partial h_{22}} \frac {\partial h_{22}}{\partial a_{22}} \frac {\partial a_{22}}{\partial W_{212}} + \frac {\partial \mathscr{L}(\theta)}{\partial \hat y_2} \frac {\partial \hat y_2}{\partial a_{32}} \frac {\partial a_{32}}{\partial h_{22}} \frac {\partial h_{22}}{\partial a_{22}} \frac {\partial a_{22}}{\partial W_{212}} \)

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{212}} = \delta_1*\varepsilon*W_{321}*h_{22}*(1-h_{22})*h_{11} + \delta_2*\varepsilon*W_{322}*h_{22}*(1-h_{22})*h_{11} = 6.5968 \times 10^{-5}\)

Try the remaining weights and biases yourself!

\(W_{2new}= \begin{bmatrix} 0.02499 & 0.02499 & 0.02499 \\ 0.02499 & 0.02499 & 0.02499 \\ 0.02499 & 0.02499 & 0.02499 \\ \end{bmatrix}\)

\(b_{2new}= \begin{bmatrix} 0.009998 \\ 0.019998 \\ 0.029998 \\ \end{bmatrix}\)

Updates for \(W_1\) & \(b_1\)

[Figure: the paths from \(W_{111}\) to the loss. \(W_{111}\) carries \(x_1\) into \(a_{11}\) and \(h_{11}\), which reaches \(h_{21}, h_{22}, h_{23}\) through \(W_{211}, W_{212}, W_{213}\), and then \(\hat y_1, \hat y_2\) through \(W_{311}, \dots, W_{332}\).]

\( \frac {\partial \mathscr{L}(\theta)}{\partial W_{111}} = 3.285 \times 10^{-6}\)

Loss value computed after updates

\(W_1= \begin{bmatrix} 0.0499 & 0.0499 & 0.0499 \\ 0.0499 & 0.0499 & 0.0499 \\ 0.0499 & 0.0499 & 0.0499 \\ \end{bmatrix}\), \(b_1 = \begin{bmatrix} 0.0099 \\ 0.0199 \\ 0.0299 \\ \end{bmatrix}\)

\(W_2= \begin{bmatrix} 0.02499 & 0.02499 & 0.02499 \\ 0.02499 & 0.02499 & 0.02499 \\ 0.02499 & 0.02499 & 0.02499 \\ \end{bmatrix}\), \(b_2 = \begin{bmatrix} 0.009998 \\ 0.019998 \\ 0.029998 \\ \end{bmatrix}\)

\(W_3= \begin{bmatrix} 1.001289 & 0.998713 \\ 1.001297 & 0.998706 \\ 1.001302 & 0.998701 \\ \end{bmatrix}\), \(b_3 = \begin{bmatrix} 0.012513 \\ 0.017492 \\ \end{bmatrix}\)

\(x = [1.5, 2.5, 3]\)

\(a_1 = [0.359, 0.369, 0.379]\)

\(h_1 = [0.588, 0.591, 0.593]\)

\(a_2 = [0.054, 0.064, 0.074]\)

\(h_2 = [0.513, 0.516, 0.518]\)

\(a_3 = [1.562, 1.563]\)

\(h_3 = [0.499, 0.5002]\)

\(\mathscr {L}(\theta) = 0.6936\)

(Initial \(\mathscr L (\theta) = 0.6981\))
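The loss after the update can be re-checked by running the forward pass again with the updated parameters. This is a minimal sketch with made-up helper names (`sigmoid`, `affine`), using the updated values listed for \(W_1, W_2, W_3\) and \(b_1, b_2, b_3\):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(v, W, b):
    """Row vector v times matrix W (a list of rows), plus bias b."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) + b[j]
            for j in range(len(b))]

x = [1.5, 2.5, 3.0]
y = [1.0, 0.0]

# Updated parameters after one gradient-descent step.
W1 = [[0.0499] * 3 for _ in range(3)]
b1 = [0.0099, 0.0199, 0.0299]
W2 = [[0.02499] * 3 for _ in range(3)]
b2 = [0.009998, 0.019998, 0.029998]
W3 = [[1.001289, 0.998713], [1.001297, 0.998706], [1.001302, 0.998701]]
b3 = [0.012513, 0.017492]

h1 = [sigmoid(a) for a in affine(x, W1, b1)]
h2 = [sigmoid(a) for a in affine(h1, W2, b2)]
a3 = affine(h2, W3, b3)
exps = [math.exp(a - max(a3)) for a in a3]
y_hat = [e / sum(exps) for e in exps]

# Binary cross entropy averaged over the two outputs.
loss = -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
            for yi, pi in zip(y, y_hat)) / len(y)

print("a3 =", [round(v, 3) for v in a3])
print("y_hat =", [round(v, 4) for v in y_hat])
print("loss =", round(loss, 4))
```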

Example Back propagation

By Amrutha
