Chris Hung, Jin Xun
Give some features
Predict another feature
(Output one scalar)
Supervised Learning Regression
Give some features
Classify it
(Output one vector)
Supervised Learning Classification
Repeat regression multiple times
Each time multiply by a different w and add a different bias
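A minimal sketch of "repeating regression": each layer applies its own weights and bias, with a nonlinearity in between. The sizes, random weights, and ReLU choice here are assumptions for illustration, not values from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=3)            # input feature vector
W1 = rng.normal(size=(4, 3))      # layer 1: four "regressions" on x (assumed sizes)
b1 = rng.normal(size=4)
W2 = rng.normal(size=(2, 4))      # layer 2: different w and different bias
b2 = rng.normal(size=2)

h = np.maximum(0.0, W1 @ x + b1)  # each row of W1 is one linear regression; ReLU between layers
y = W2 @ h + b2                   # repeat the regression step with new weights
print(y.shape)                    # (2,)
```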
1. Normalization
2. Amplify the differences (exponential)
Help Escape from
Critical Points
movement(n) = λ·movement(n−1) − η·gradient(n)
Imagine a ball rolling down from a height, carried along by its accumulated momentum
It may roll out of a local minimum
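The momentum update can be sketched on a toy quadratic loss. The learning rate η = 0.1, momentum coefficient λ = 0.9, and the loss L(w) = w² are assumed values for illustration:

```python
def grad(w):
    # gradient of a toy quadratic loss L(w) = w^2
    return 2.0 * w

w = 5.0
movement = 0.0
lr, lam = 0.1, 0.9                             # assumed learning rate and momentum coefficient

for _ in range(200):
    movement = lam * movement - lr * grad(w)   # movement(n) = λ·movement(n−1) − η·gradient(n)
    w = w + movement
print(w)  # close to the minimum at 0
```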
Best Optimizer: Adam=RMSProp + Momentum
Adagrad considers all gradients seen so far
So the learning rate might not adapt dynamically
Original paper: https://arxiv.org/pdf/1412.6980.pdf
It uses one learning-rate schedule from start to finish, so the accumulated past gradients may hinder later progress
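Adam combines the two ideas above: a moving average of gradients (momentum) and a moving average of squared gradients (RMSProp, which lets the effective learning rate adapt instead of accumulating forever like Adagrad). A minimal sketch on the same toy quadratic loss, using the default hyperparameters from the linked paper; the loss and starting point are assumed for illustration:

```python
import numpy as np

def grad(w):
    return 2.0 * w                         # toy quadratic loss L(w) = w^2

# default hyperparameters from the Adam paper
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

w, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g        # Momentum: moving average of gradients
    v = beta2 * v + (1 - beta2) * g**2     # RMSProp: moving average of squared gradients
    m_hat = m / (1 - beta1**t)             # bias correction for the zero-initialized averages
    v_hat = v / (1 - beta2**t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
print(w)
```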
Make it easier to train
No correlation between features
Better!!
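One common way to make training easier is feature normalization: standardize each feature to zero mean and unit variance so no dimension dominates the error surface. A minimal sketch; the data and scale factors are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([1.0, 100.0, 0.01])  # features on very different scales

# standardize each feature column: zero mean, unit variance
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_norm.mean(axis=0).round(6))  # ~0 for every feature
print(X_norm.std(axis=0).round(6))   # ~1 for every feature
```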