Deep Learning Intro
Neuron
x is the input
w is the weight
b is the bias
This linear unit can be written as y = w · x + b
from tensorflow import keras
from tensorflow.keras import layers

# a single linear unit: one Dense neuron with three input features
model = keras.Sequential([
    layers.Dense(units=1, input_shape=[3])
])
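The same computation can also be done by hand; a minimal sketch with made-up weights and inputs (not taken from the slides):
import numpy as np

# hypothetical weights, bias, and input for a neuron with three inputs
w = np.array([0.2, -0.5, 1.0])
b = 0.1
x = np.array([3.0, 2.0, 1.0])

y = np.dot(w, x) + b   # the linear unit: y = w · x + b
print(y)               # 0.7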
Deep Neural Network
Activation Function
Activation functions let a neural network model more than just linear relationships
so it can learn complex transformations of the data
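For example, ReLU (the activation used in the models below) simply zeroes out negative values; a minimal sketch:
import numpy as np

def relu(x):
    # keep positive values, replace negative ones with zero;
    # this non-linearity is what lets stacked Dense layers model curves
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]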
Sequential Models
A simple kind of model: a single input and a single output, with the (Dense) layers executed in order from top to bottom.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # two hidden layers with ReLU activations, then a single linear output unit
    layers.Dense(units=4, activation='relu', input_shape=[2]),
    layers.Dense(units=3, activation='relu'),
    layers.Dense(units=1),
])
Training Deep Learning Model
Loss Function
MAE (mean absolute error): the average of the absolute differences between the predictions and the actual values
MSE (mean squared error): the average of the squared differences between the predictions and the actual values
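As a quick sketch with made-up numbers, computing both losses with NumPy:
import numpy as np

y_true = np.array([3.0, 5.0, 6.0])   # actual values (made-up)
y_pred = np.array([2.5, 5.0, 7.0])   # predicted values (made-up)

mae = np.mean(np.abs(y_true - y_pred))   # mean absolute error = 0.5
mse = np.mean((y_true - y_pred) ** 2)    # mean squared error ≈ 0.417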
Backpropagation
At the output we only see the final result; we do not directly know the gradient of the loss with respect to the weights in the hidden layers, so backpropagation is used together with the loss function to compute those gradients
It works by repeatedly applying the chain rule, taking partial derivatives layer by layer
The calculus behind it would take too long to cover here
so we focus on how to use it
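As a sketch of what the framework does for us, TensorFlow's GradientTape records the forward pass and applies the chain rule automatically (toy numbers, not from the slides):
import tensorflow as tf

# toy example: y = w * x + b, loss = (y - target)^2
x, target = tf.constant(2.0), tf.constant(10.0)
w, b = tf.Variable(3.0), tf.Variable(1.0)

with tf.GradientTape() as tape:
    y = w * x + b
    loss = (y - target) ** 2

# backpropagation: the chain rule gives dloss/dw and dloss/db
grad_w, grad_b = tape.gradient(loss, [w, b])
print(grad_w.numpy(), grad_b.numpy())  # -12.0 -6.0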
Optimizer
The optimizer adjusts the weights so that the loss becomes as small as possible
Stochastic Gradient Descent
At each step, a small random subset of the samples (a minibatch) is used to compute the gradient and update the weights
model.compile(
    optimizer="adam",
    loss="mae",
)
Gradient Descent
The gradient points in the direction of steepest increase (toward a local maximum), so gradient descent steps in the opposite direction of the gradient, which leads to a local minimum.
Differentiation
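A minimal sketch of minibatch gradient descent on made-up data (all numbers and shapes are illustrative, not from the slides): at every step a minibatch is sampled, the gradient of the loss is computed, and the weights move a small step against that gradient.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # made-up features
y = X @ np.array([0.5, -1.0, 2.0]) + 0.3         # made-up targets

w, b = np.zeros(3), 0.0
lr = 0.1                                         # learning rate (step size)

for step in range(200):
    idx = rng.choice(len(X), size=32, replace=False)  # sample one minibatch
    pred = X[idx] @ w + b
    err = pred - y[idx]
    grad_w = 2 * X[idx].T @ err / len(idx)       # gradient of the MSE loss w.r.t. w
    grad_b = 2 * err.mean()                      # gradient of the MSE loss w.r.t. b
    w -= lr * grad_w                             # step against the gradient
    b -= lr * grad_b

print(w, b)  # approaches [0.5, -1.0, 2.0] and 0.3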
It is like dropping a ball into a bowl: the ball oscillates back and forth and, because of friction, gradually settles at the lowest point
Does Momentum look slower here?
AdaGrad
Each gradient component is divided by the accumulated history of that component's past gradients
Similar to Momentum, but with friction added
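In Keras these optimizers can be selected explicitly by passing an optimizer object to compile instead of a string; a sketch reusing the model defined earlier, with placeholder learning rates:
from tensorflow import keras

sgd_momentum = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
adagrad = keras.optimizers.Adagrad(learning_rate=0.01)
adam = keras.optimizers.Adam(learning_rate=0.001)

model.compile(optimizer=sgd_momentum, loss="mae")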
Implement - Red Wine Quality
import pandas as pd
red_wine = pd.read_csv('winequality-red.csv')
# hold out 30% of the rows as a validation set
df_train = red_wine.sample(frac=0.7, random_state=0)
df_valid = red_wine.drop(df_train.index)
Normalization
# rescale every column to the [0, 1] range using the training set's min and max
max_ = df_train.max(axis=0)
min_ = df_train.min(axis=0)
df_train = (df_train - min_) / (max_ - min_)
df_valid = (df_valid - min_) / (max_ - min_)
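As a quick sanity check (an optional sketch, not in the original code), every training-set column should now span 0 to 1:
print(df_train.describe().loc[['min', 'max']])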
# separate the input features from the 'quality' target column
X_train = df_train.drop('quality', axis=1)
X_valid = df_valid.drop('quality', axis=1)
y_train = df_train['quality']
y_valid = df_valid['quality']
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # three hidden layers of 512 ReLU units; 11 input features, 1 output value
    layers.Dense(512, activation='relu', input_shape=[11]),
    layers.Dense(512, activation='relu'),
    layers.Dense(512, activation='relu'),
    layers.Dense(1),
])
model.compile(
    optimizer='adam',
    loss='mae',
)
history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,   # samples per gradient update
    epochs=10,        # passes over the training set
)
import matplotlib.pyplot as plt
history_df = pd.DataFrame(history.history)
plt.plot(history_df['loss'])
plt.show()
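Because validation_data was passed to fit, history.history also records 'val_loss'; plotting both curves (a small extension of the snippet above) makes over- and underfitting visible:
plt.plot(history_df['loss'], label='training loss')
plt.plot(history_df['val_loss'], label='validation loss')
plt.legend()
plt.show()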
Overfitting & Underfitting
model = keras.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(1),
])
How can the model be changed?
Increase the number of neurons in each layer
wider = keras.Sequential([
    layers.Dense(32, activation='relu'),
    layers.Dense(1),
])
Add more layers
deeper = keras.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(1),
])
When the loss starts to rise again, stop training and go back to the weights where it was at its minimum
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    min_delta=0.001,            # minimum change that counts as an improvement
    patience=20,                # epochs to wait without improvement before stopping
    restore_best_weights=True,  # roll back to the weights with the best loss
)
history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,
    epochs=500,                   # an upper bound; early stopping usually halts training sooner
    callbacks=[early_stopping],
)
At every training iteration, a randomly chosen fraction of units is dropped: they take no part in that forward pass or in backpropagation
Because no unit can rely on any particular other unit always being present, the units become more independent, which helps prevent overfitting
keras.Sequential([
    # ...
    layers.Dropout(rate=0.3),  # drop 30% of the inputs to the next layer
    layers.Dense(16),
    # ...
])
Normalization is applied per mini-batch: each mini-batch is normalized using its own statistics (batch normalization)
# placed after a layer and its activation...
layers.Dense(16, activation='relu'),
layers.BatchNormalization(),
# ...or placed between a layer and its activation
layers.Dense(16),
layers.BatchNormalization(),
layers.Activation('relu'),
Binary Classification
There are only two possible answers
e.g. yes or no, dog or cat, True or False
We want a measure of the distance between probabilities, so cross-entropy is used as the loss
A sigmoid maps the output to a number between 0 and 1 so it can be read as a probability
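A sketch of what such a model can look like (the layer sizes and input_shape are placeholders, not from the slides): the final Dense unit uses a sigmoid activation so its output can be read as a probability.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(16, activation='relu', input_shape=[4]),
    layers.Dense(1, activation='sigmoid'),  # output is a probability between 0 and 1
])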
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['binary_accuracy'],
)