This is a version of backpropagation.
\[\langle v,w\rangle = \sum_i w_i v_i =\int v dw\]
\[y=\frac{1}{1+e^{-x}}\qquad y'=y(1-y)\]
\[\begin{aligned} f(v_1,v_2,v_3) &= y(w_1v_1+w_2v_2+w_3v_3+b)\\ &=\frac{1}{1+e^{-(w_1 v_1+w_2 v_2+w_3 v_3+b)}}\end{aligned}\]
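To make these formulas concrete, here is a minimal Python sketch (using NumPy; the function names, example inputs, weights, and bias are placeholders of my own, not from the notes):

```python
import numpy as np

def sigmoid(x):
    """y = 1 / (1 + exp(-x)); its derivative is y' = y * (1 - y)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    y = sigmoid(x)
    return y * (1.0 - y)

def f(v, w, b):
    """f(v1, v2, v3) = sigmoid(w1*v1 + w2*v2 + w3*v3 + b):
    the sigmoid applied to the inner product <v, w> plus a bias."""
    return sigmoid(np.dot(w, v) + b)

# Example call with arbitrary placeholder weights and bias:
print(f(np.array([1.0, 2.0, 3.0]), np.array([0.5, -0.25, 0.1]), b=0.0))
print(sigmoid_prime(0.0))   # 0.25, matching y(1 - y) at y = 1/2
```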
WARNING: This function is bounded between 0 and 1. So if you want to reach a number outside of [0,1], you will need to follow this activation function with a final layer whose linear activation rescales the output to your target range. Even so, you can still seek to minimize the distance to the target; the output will simply saturate toward 0 or 1.
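A hedged sketch of what that final rescaling layer could look like; the range endpoints lo and hi are illustrative parameters I have introduced, not part of the notes:

```python
import numpy as np

def rescaled_output(v, w, b, lo, hi):
    """A sigmoid unit followed by a linear output layer: the linear layer
    maps the bounded (0, 1) activation onto the target range (lo, hi)."""
    s = 1.0 / (1.0 + np.exp(-(np.dot(w, v) + b)))   # bounded in (0, 1)
    return lo + (hi - lo) * s                        # linear rescaling
```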
\[\begin{aligned} f(x_1,x_2,x_3) &= y(w_1x_1+w_2x_2+w_3 x_3)\\ &=\frac{1}{1+e^{-(w_1 x_1+w_2 x_2+w_3 x_3)}}\end{aligned}\]
What function has \(g(3,4,5)=0\)?
It could be loads of functions! That is the point. You don't know, and neither does the computer. All you told it was one point, and now you want it to adjust one silly function \(f(x_1,x_2,x_3)\) to try to pretend to be \(g\). Whether this works can't depend on any deep "learning": there simply isn't enough information given to learn from. But you and your machine can turn it into a guessing game and improve your odds.
By the way, \(g(a,b,c)=c^2-a^2-b^2\) was what I was thinking about.
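One way to play that guessing game in code, as a sketch: guess random weights for \(f(x_1,x_2,x_3)\) and keep whichever guess lands closest to the known value at the one known point. The number of tries, the random seed, and the weight distribution are all arbitrary choices of mine.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f(w, x):
    """f(x1, x2, x3) = sigmoid(w1*x1 + w2*x2 + w3*x3), no bias, as above."""
    return sigmoid(np.dot(w, x))

def guessing_game(x, target, tries=1000, seed=0):
    """Guess random weights and keep whichever guess lands f(x) closest
    to the target at the single known point. No 'learning' here, just
    improving the odds by guessing a lot."""
    rng = np.random.default_rng(seed)
    best_w, best_dist = None, np.inf
    for _ in range(tries):
        w = rng.normal(size=3)            # a fresh guess
        dist = abs(f(w, x) - target)      # how wrong is this guess?
        if dist < best_dist:
            best_w, best_dist = w, dist
    return best_w, best_dist

w, dist = guessing_game(np.array([3.0, 4.0, 5.0]), target=0.0)
print(w, dist)   # f(3, 4, 5) close to 0, though it can never equal 0 exactly
```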
\[\begin{aligned} f_w(v) &= y(wv)\\ &=\frac{1}{1+e^{-(w v) }}\end{aligned}\]
Knowing this, go back and recompute your weight.
Rate of change of \(L\) as a function of \(w\):
\[\frac{d L}{dw}=\frac{dL}{d\sigma}\frac{d\sigma}{dy}\frac{dy}{dw}\]
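Reading the chain rule above with \(y=wv\) as the pre-activation, \(\sigma=f_w(v)\) as the output, and (as an assumption on my part) a squared-error loss \(L=(\sigma-t)^2\) against a target \(t\), one sketch of recomputing the weight is:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dL_dw(v, w, target):
    """dL/dw = dL/dsigma * dsigma/dy * dy/dw for the one-weight unit
    f_w(v) = sigmoid(w * v), with the assumed loss L = (sigma - target)^2."""
    y = w * v                            # pre-activation
    sigma = sigmoid(y)                   # output
    dL_dsigma = 2.0 * (sigma - target)   # derivative of the squared error
    dsigma_dy = sigma * (1.0 - sigma)    # sigmoid derivative y(1 - y)
    dy_dw = v
    return dL_dsigma * dsigma_dy * dy_dw

# "Recompute your weight": one gradient-descent step with an assumed learning rate.
v, w, target, lr = 2.0, 0.5, 1.0, 0.1
w = w - lr * dL_dw(v, w, target)
print(w)
```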