Machine Learning Club - Session 2

Instructor: 000

  • You can call me 000 / Lucas

  • Academics lead (37th) of the Jianguo High School Information Research Club

  • Academics staff (44th) of the Jianguo High School Electronic Computer Research Club

  • Founder of an inter-school exchange chat group

  • An idiot who can't play rhythm games, can't do competitive programming, and can't do math

  • A CS skill tree with points scattered everywhere, all equally bad

  • Master of abandoned projects

  • Coordinator of the IZCC x SCINT x Ruby Taiwan joint course

Instructor: 章魚 (Octopus)

  • Academics + treasurer (44th) of the Jianguo High School Electronic Computer Research Club

  • Yes, I am the treasurer. Everyone here, remember to hand your club fees over to me next time

  • A rather barren skill tree

  • Wanted to do a machine learning project, ended up only learning how to call an API

  • The traitor who bailed on last year's club expo and ran off to the Information Research Club

  • Legendary bottom of the class in the program

Zombie Hugging Tangerines (today's topics)

  • Gradient Descent

  • Linear Algebra

  • Perceptron

  • Classification

Table of Contents

Gradient Descent

Differentiation

\(f(x)=ax^2+bx+c\)

\(f'(x)=2ax+b\)

\frac{\mathrm{d} y}{\mathrm{d} x} = f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
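As a quick check, the limit definition can be approximated numerically by picking a small but nonzero \(h\). A minimal sketch (the function and step size below are just my own example):

def numerical_derivative(f, x, h=1e-6):
    # approximate f'(x) with the difference quotient from the definition above
    return (f(x + h) - f(x)) / h

f = lambda x: 3 * x**2 + 2 * x + 1   # a = 3, b = 2, c = 1
print(numerical_derivative(f, 2.0))  # ~14.0, matching f'(x) = 2ax + b = 6*2 + 2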

Partial Differentiation

f(x, y) = x^2 + xy + y^2
\frac{\partial f(x, y)}{\partial x} = 2x + 1\cdot y + 0 = 2x + y
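The same numerical trick works for a partial derivative: nudge only the variable you are differentiating with respect to. A small sketch (the example values are my own):

def partial_x(f, x, y, h=1e-6):
    # hold y fixed, nudge x only
    return (f(x + h, y) - f(x, y)) / h

f = lambda x, y: x**2 + x * y + y**2
print(partial_x(f, 1.0, 2.0))  # ~4.0, matching 2x + y = 2*1 + 2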

Simple Linear Regression

\(y=b_0+b_1x\)

The update to \(b_0\) depends on \(\hat y-y\)

The update to \(b_1\) depends on \((\hat y_i-y_i)x_i\)

Why?

\(\hat y=b_0+b_1x\)

The best-fit regression line

minimizes the sum of squared residuals

\((x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\)

\(\hat y = b_0 + b_1x\)

\(\hat y_1 - y_1 = (b_0 + b_1x_1) -  y_1\)

\(\sum_{i=1}^{n} (\hat y_i - y_i)^2\)

Loss Function

Use a function to measure the error

e.g.

MAE (Mean Absolute Error)

MSE (Mean Squared Error)

Cross-entropy

and many others

\(L = \sum_{i=1}^{n} (\hat y_i - y_i)^2 = \sum_{i=1}^{n} (b_0 + b_1x_i-y_i)^2\)

Look for the minimum, where the derivative is zero: \(f'(x)=0\)

Gradient Descent

\(x\)

gradient = \(f'(x)\)

Gradient Descent

\(x'\)

Learning Rate

\(x' = x - f'(x)\)

\(x' = x -\alpha f'(x)\)

\(\alpha\) is a hyperparameter

\(L = \sum_{i=1}^{n} (\hat y_i - y_i)^2 = \sum_{i=1}^{n} (b_0 + b_1x_i-y_i)^2\)

\(\frac{\partial L}{\partial b_0}=0\)

\(\frac{\partial L}{\partial b_1}=0\)

\(f(x) = (b_0+b_1x-y)^2\)

\(\frac {\partial}{\partial b_0}(b_0+b_1x-y)^2\) ?

Chain Rule

\(\frac{dy}{dx}=\frac{dy}{du} \frac{du}{dx}\)

\(y = (x+1)^2\)

Let \(u=x+1\)

\(\frac{dy}{dx}=\frac{dy}{du} \frac{du}{dx}=\frac{\mathrm{d}(x+1)^2}{\mathrm{d}(x+1)} \cdot \frac{\mathrm{d}(x+1)}{\mathrm{d}x}\)

Chain Rule

\(\frac{dy}{dx}=\frac{dy}{du} \frac{du}{dx}=\frac{\mathrm{d}(x+1)^2}{\mathrm{d}(x+1)} \cdot \frac{\mathrm{d}(x+1)}{\mathrm{d}x}\)

\(\frac{dy}{du}=2(x+1)\)

\(\frac{du}{dx}=1\)

\(\frac{dy}{dx}=2(x+1) \times 1=2x+2\)
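If sympy is installed, the same result can be checked symbolically (a side note, not part of the original slides):

import sympy as sp

x = sp.symbols('x')
print(sp.diff((x + 1)**2, x))  # 2*x + 2, the same as the chain-rule result above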

\(f(x) = (b_0+b_1x-y)^2\)

\(\frac {\partial}{\partial b_0}f(x)=\frac{\partial f(x)}{\partial u}\frac{\partial u}{\partial b_0}\)

Let \(u=b_0+b_1x-y\)

\(\frac{\partial}{\partial b_0}f(x)=2(b_0+b_1x-y) \times 1\)

\(\frac{\partial}{\partial b_0}f(x)=2(\hat y-y)\)

\(f(x) = (b_0+b_1x-y)^2\)

\(\frac {\partial}{\partial b_1}f(x)=\frac{\partial f(x)}{\partial u}\frac{\partial u}{\partial b_1}\)

Let \(u=b_0+b_1x-y\)

\(\frac{\partial}{\partial b_1}f(x)=2(b_0+b_1x-y)\times x\)

\(\frac{\partial}{\partial b_1}f(x)=2(\hat y-y)\times x\)

import random
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def MSE(b_0, b_1):
    # mean squared error of the line y_hat = b_0 + b_1 * x over the whole dataset
    loss = 0
    for i in range(len(x)):
        y_hat = b_0 + b_1 * x[i]
        loss += (y_hat - y[i])**2
    return loss / len(x)

def gradient_0(b_0, b_1):
    # dMSE/db_0 = (2/n) * sum(y_hat - y)
    gradient = 0
    for i in range(len(x)):
        y_hat = b_0 + b_1 * x[i]
        gradient += 2 * (y_hat - y[i]) / len(x)
    return gradient

def gradient_1(b_0, b_1):
    # dMSE/db_1 = (2/n) * sum((y_hat - y) * x)
    gradient = 0
    for i in range(len(x)):
        y_hat = b_0 + b_1 * x[i]
        gradient += 2 * (y_hat - y[i]) * x[i] / len(x)
    return gradient

# ground truth: y_hat = 0.5 * x - 5
random.seed(114514)
x = [i for i in range(-50, 50, 1)]
y = [0.5 * i - 5 + random.randint(-10, 10) for i in x]
b_0_history = []
b_1_history = []
loss_history = []

# starting point
b_0 = -15
b_1 = 0.2

# hyperparameters
epoch = 10
lr = 0.001  # don't go above 0.0015

# gradient descent
for i in range(epoch):
    b_0_history.append(b_0)
    b_1_history.append(b_1)
    loss_history.append(MSE(b_0, b_1))
    # compute both gradients before updating, so the step uses the same point
    g_0 = gradient_0(b_0, b_1)
    g_1 = gradient_1(b_0, b_1)
    b_0 -= lr * g_0
    b_1 -= lr * g_1

# define the range of b_0_axis and b_1_axis
b_0_axis = np.linspace(-20, 10, 100)
b_1_axis = np.linspace(0, 1, 100)

# build the grid
b_0_axis, b_1_axis = np.meshgrid(b_0_axis, b_1_axis)
Loss_values = np.zeros_like(b_0_axis)

# compute the loss at each (b_0_axis, b_1_axis) pair
for i in range(b_0_axis.shape[0]):
    for j in range(b_0_axis.shape[1]):
        Loss_values[i, j] = MSE(b_0_axis[i, j], b_1_axis[i, j])

# plot the 3D loss surface
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(b_0_axis, b_1_axis, Loss_values, cmap='viridis', alpha=0.8)

# plot the gradient descent path
ax.plot(b_0_history, b_1_history, loss_history, "o-", color='red', label='gradient descent path')

# add labels
ax.set_xlabel('b_0_axis')
ax.set_ylabel('b_1_axis')
ax.set_zlabel('Loss(b_0_axis, b_1_axis)')
ax.legend()

# show the figure
plt.show()

Mean Squared Error (MSE)

MSE=\(\frac{1}{n} \sum_{i=1}^{n} (b_0 + b_1x_i-y_i)^2\)
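The loop-based MSE in the code above can also be written as a single vectorized numpy expression. A small sketch (x, y, b_0, b_1 are assumed to be defined as in that code):

import numpy as np

def MSE_vectorized(b_0, b_1, x, y):
    # mean of the squared residuals, same value as the loop version
    y_hat = b_0 + b_1 * np.asarray(x)
    return np.mean((y_hat - np.asarray(y))**2)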


Feature Scaling

\(\frac{\partial L}{\partial b_0}=0\)

\(\frac{\partial L}{\partial b_1}=0\)

Normalization

Linearly squeeze the data into the range 0 to 1

\(x_1,x_2,...x_n\)

\(x_i'=\frac{x_i-min(x_1,x_2,...,x_n)}{max(x_1,x_2,...,x_n)-min(x_1,x_2,...,x_n)}\)

Standardization

Make the mean 0 and the standard deviation 1

\(x_1,x_2,...x_n\)

\(x_i'=\frac{x_i-\mu_x}{\sigma_x}\)

(for a standard normal distribution, about 68% of the data falls within 1 standard deviation of the mean, and about 95% within 2)
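Both scalings are one-liners in numpy. A minimal sketch (the data array is just a made-up example):

import numpy as np

data = np.array([2.0, 5.0, 9.0, 11.0, 13.0])

# normalization: squeeze the data linearly into [0, 1]
normalized = (data - data.min()) / (data.max() - data.min())

# standardization: subtract the mean, divide by the standard deviation
standardized = (data - data.mean()) / data.std()

print(normalized)
print(standardized.mean(), standardized.std())  # ~0.0 and 1.0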

import random

import matplotlib.pyplot as plt
from IPython import display

# y = 0.5 * x - 5
random.seed(114514)
x = [i for i in range(-50, 50, 1)]
y = [0.5 * i - 5 + random.randint(-10, 10) for i in x]
b_0 = random.uniform(-1, 1)
b_1 = random.uniform(-1, 1)
print("before train:")
print(f"b_0:{b_0}")
print(f"b_1:{b_1}\n")

alpha_0 = 0.01
alpha_1 = 0.0001
epoch = 100

def f(x):
    return b_0 + b_1 * x

def delta_0(y, yh):
    return -2 * (y-yh)

def delta_1(y, yh, x):
    return -2 * (y-yh) * x

for _ in range(epoch):
    plt.clf()
    plt.plot([-50,50], [0,0], c="black", alpha=0.5)
    plt.plot([0,0], [-50,50], c="black", alpha=0.5)
    plt.xlim(-50, 50)
    plt.ylim(-50, 50)

    i = random.randint(0, 99)
    yh = b_0 + b_1 * x[i]
    b_0 -= alpha_0 * delta_0(y[i], yh)
    b_1 -= alpha_1 * delta_1(y[i], yh, x[i])
    plt.plot([-50, 50], [f(-50), f(50)], c="red")
    plt.scatter(x, y, c="blue")
    plt.scatter(x[i], y[i], c="red")
    plt.pause(0.01)
    display.clear_output(wait=True)

plt.show()
print("after train:")
print(f"b_0:{b_0}")
print(f"b_1:{b_1}")
import random
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def Loss(b_0, b_1):
    loss = 0
    for i in range(len(x)):
        y_hat = b_0 + b_1 * x[i]
        loss += (y_hat - y[i])**2
    return loss

# y_hat = 0.5 * x - 5
random.seed(114514)
x = [i for i in range(-50, 50, 1)]
y = [0.5 * i - 5 + random.randint(-10, 10) for i in x]

# define the range of b_0 and b_1
b_0 = np.linspace(-20, 10, 100)
b_1 = np.linspace(0, 1, 100)

# build the grid
B_0, B_1 = np.meshgrid(b_0, b_1)
Loss_values = np.zeros_like(B_0)

# compute the loss at each (b_0, b_1) pair
for i in range(B_0.shape[0]):
    for j in range(B_0.shape[1]):
        Loss_values[i, j] = Loss(B_0[i, j], B_1[i, j])

# plot the 3D surface
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(B_0, B_1, Loss_values, cmap='viridis', alpha=0.8)

# mark b_0 = -5, b_1 = 0.5 with a red dot
b_0_point = -5
b_1_point = 0.5
loss_point = Loss(b_0_point, b_1_point)
ax.scatter(b_0_point, b_1_point, loss_point, color='red', s=50, label=f"Global Minimum (-5, 0.5, {loss_point:.2f})")

# add labels
ax.set_xlabel('b_0')
ax.set_ylabel('b_1')
ax.set_zlabel('Loss(b_0, b_1)')
ax.legend()

# show the figure
plt.show()

Code

import matplotlib.pyplot as plt
import random
import numpy as np
from IPython import display

def f(x):
    return 3*x**2
def df(x):
    return 6*x

x = 3
lr = 0.3 #learning rate
step = 10
x_history = []
y_history = []
for i in range(step):
    x_history.append(x)
    y_history.append(f(x))
    x -= lr*df(x)

X = np.linspace(-5,5,100)
plt.plot(X,f(X),"b-")
plt.plot(x_history,y_history,"o-",c="r")
plt.show()

# animation (uncomment to watch the descent step by step)
'''
for i in range(len(x_history)):
    plt.clf()
    X = np.linspace(-5,5,100)
    plt.plot(X,f(X),"b-")
    plt.plot(x_history[i],y_history[i],"o",c="r")
    plt.pause(0.2)
    display.clear_output(wait=True)
plt.pause(-1)
'''

Linear Algebra

Yes, we are doing math again.

Perceptron

Perceptron

(diagram: input vector \(\vec X\) goes into one neuron, which outputs \(Y\))

\sigma( (\sum_{i=1}^{n} x_i \times w_i) +b)
= \sigma( \left[ \begin{matrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \\ \end{matrix} \right] \cdot \left[ \begin{matrix} w_{1} \\ w_{2} \\ \vdots \\ w_{n} \\ \end{matrix} \right] + b)
\sigma(\vec X \cdot \vec W + b) = Y

Perceptron

(diagram: input vector \(\vec X\) goes into one neuron, which outputs \(Y\))

\sigma(\vec X \cdot \vec W + b) = Y

This is exactly what every neuron does:

multiply the input (\(X\)) by a weight and add a bias,

then apply an activation function at the end.

If you only ever stack \(wx+b\) operations, the result stays linear, which limits the set of functions the whole network can fit.

An activation function also gives you some control over each neuron's output,

for example bounding its range, or damping growth on the negative side.
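A single neuron is only a few lines of numpy. A minimal sketch (the input, weights, and bias are made-up numbers, and sigmoid is used as the example activation):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([0.5, -1.2, 3.0])   # inputs
W = np.array([0.8, 0.1, -0.4])   # weights
b = 0.2                          # bias

Y = sigmoid(np.dot(X, W) + b)    # sigma(X . W + b)
print(Y)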

Activation Function

The activation function introduces non-linearity into the network and fixes this problem.

Below are a few common activation functions.

Activation Function

  • Identity

  • Binary step

  • Sigmoid

  • tanh

  • ReLU

Activation Function
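A small matplotlib sketch of the activation functions listed above (my own illustration, not from the slides):

import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-5, 5, 200)

activations = {
    "Identity": z,
    "Binary step": (z >= 0).astype(float),
    "Sigmoid": 1 / (1 + np.exp(-z)),
    "tanh": np.tanh(z),
    "ReLU": np.maximum(0, z),
}

for name, value in activations.items():
    plt.plot(z, value, label=name)
plt.legend()
plt.show()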

Classification

Iris Species Dataset

Tell iris species apart from their petals and sepals

A classic classification problem

Input: petal length/width + sepal length/width (4 features)

Output: iris species (3 classes)

Iris setosa

Iris versicolor

Iris virginica

Binary Classification

Output a single probability

setosa => 1

versicolor => 0

\hat y = \sigma( \left[ \begin{matrix} x_{0} \\ x_{1} \\ x_{2} \\ x_{3} \\ \end{matrix} \right] \cdot \left[ \begin{matrix} w_{0} \\ w_{1} \\ w_{2} \\ w_{3} \\ \end{matrix} \right] + b)

Activation Function

=> Sigmoid


\(\hat y=\sigma(\vec X \cdot \vec W +b)\)

Loss Function

\(L = -\sum [y\ln (\hat y)+(1-y)\ln (1-\hat y)]\)

\(y\) is the true probability (the label)

\(\hat y\) is the predicted probability
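In numpy the loss is a direct transcription of the formula. A small sketch (the label and prediction arrays are made-up examples):

import numpy as np

def cross_entropy(y, y_hat):
    # L = -sum[ y*ln(y_hat) + (1-y)*ln(1-y_hat) ]
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0])        # true labels
y_hat = np.array([0.9, 0.2, 0.7])    # predicted probabilities
print(cross_entropy(y, y_hat))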

Gradient Descent

\(\frac{\partial L}{\partial w_i}=\frac{\partial L}{\partial u}\frac{\partial u}{\partial w_i}\), where \(u=x_i\times w_i+b\)

\(\frac{\partial L}{\partial u}=\frac{\partial L}{\partial \hat y}\frac{\partial \hat y}{\partial u}=\frac{\partial L}{\partial \sigma (u)}\frac{\partial \sigma (u)}{\partial u}=\sum (\hat y-y)\)

Gradient Descent

\(\frac{\partial L}{\partial w_i}=\sum(\hat y-y)\times\frac{\partial u}{\partial w_i}=\sum(\hat y_i-y_i)\times x_i\), where \(u=x_i\times w_i+b\)

\(\frac{\partial L}{\partial b}=\sum(\hat y-y)\times\frac{\partial u}{\partial b}=\sum(\hat y_i-y_i)\)

Implementation

id  sepal length  sepal width  petal length  petal width  species
1   5.1  3.5  1.4  0.2  Iris-setosa
2   4.9  3.0  1.4  0.2  Iris-setosa
...  ...  ...  ...  ...  ...
150  5.9  3.0  5.1  1.8  Iris-virginica

Iris Species dataset

id 1~50: setosa, 51~100: versicolor, 101~150: virginica

id  sepal length  sepal width  petal length  petal width  species
1   5.1  3.5  1.4  0.2  Iris-setosa
2   4.9  3.0  1.4  0.2  Iris-setosa
...  ...  ...  ...  ...  ...
100  5.7  2.8  4.1  1.3  Iris-versicolor

id 1~50: setosa, 51~100: versicolor

data = data[:100, :]

id  sepal length  sepal width  petal length  petal width  Label
1   5.1  3.5  1.4  0.2  1
2   4.9  3.0  1.4  0.2  1
...  ...  ...  ...  ...  ...
100  5.7  2.8  4.1  1.3  0

Label: setosa => 1 (id 1~50), versicolor => 0 (id 51~100)

sepal length  sepal width  petal length  petal width  Label
5.1  3.5  1.4  0.2  1
4.9  3.0  1.4  0.2  1
...  ...  ...  ...  ...
5.7  2.8  4.1  1.3  0

setosa => 1 (rows 1~50), versicolor => 0 (rows 51~100)

data = data[:, 1:]

sepal length  sepal width  petal length  petal width  Label
5.0  2.3  3.3  1.0  0
5.1  3.8  1.9  0.4  1
...  ...  ...  ...  ...
4.7  3.2  1.6  0.2  1

setosa => 1

versicolor => 0

np.random.shuffle(data)

y = data[:,4].reshape(-1,1)

sepal length  sepal width
5.0  2.3
5.1  3.8
...  ...
4.7  3.2

Label
0
1
...
1

x = data[:,0:2]

y = data[:,4].reshape(-1,1)

x = data[:,0:2]

x = \left[ \begin{matrix} x_{0,0}&x_{0,1}\\ x_{1,0}&x_{1,1}\\ \vdots&\vdots\\ x_{99,0}&x_{99,1}\\ \end{matrix} \right]
y = \left[ \begin{matrix} y_{0}\\ y_{1}\\ \vdots\\ y_{99}\\ \end{matrix} \right]

W = np.random.randn(2,1)

b = np.random.randn(1)

\left[ \begin{matrix} w_{0}\\ w_{1}\\ \end{matrix} \right]
\left[ \begin{matrix} b\\ \end{matrix} \right]

Sigmoid

\sigma(x) = \frac {1} {1+e^{-x}}

\(\hat y=\sigma(\vec X \cdot \vec W +b)\)

F(X)

Loss

\(-\sum [y\ln (\hat y)+(1-y)\ln (1-\hat y)]\)

\(\hat y=\sigma(\vec X \cdot \vec W +b)\)

F(X)

\left[ \begin{matrix} x_{0,0}&x_{0,1}\\ x_{1,0}&x_{1,1}\\ \vdots&\vdots\\ x_{99,0}&x_{99,1}\\ \end{matrix} \right] \left[ \begin{matrix} w_{0}\\ w_{1}\\ \end{matrix} \right] = \left[ \begin{matrix} x_{0,0}w_{0}+x_{0,1}w_{1}\\ x_{1,0}w_{0}+x_{1,1}w_{1}\\ \vdots\\ x_{99,0}w_{0}+x_{99,1}w_{1}\\ \end{matrix} \right]

\(\hat y=\sigma(\vec X \cdot \vec W +b)\)

F(X)

\left[ \begin{matrix} x_{0,0}w_{0}+x_{0,1}w_{1}\\ x_{1,0}w_{0}+x_{1,1}w_{1}\\ \vdots\\ x_{99,0}w_{0}+x_{99,1}w_{1}\\ \end{matrix} \right] + \left[ \begin{matrix} b \end{matrix} \right] = \left[ \begin{matrix} x_{0,0}w_{0}+x_{0,1}w_{1}+b\\ x_{1,0}w_{0}+x_{1,1}w_{1}+b\\ \vdots\\ x_{99,0}w_{0}+x_{99,1}w_{1}+b\\ \end{matrix} \right]

\(\hat y=\sigma(\vec X \cdot \vec W +b)\)

F(X)

\sigma ( \left[ \begin{matrix} x_{0,0}w_{0}+x_{0,1}w_{1}+b\\ x_{1,0}w_{0}+x_{1,1}w_{1}+b\\ \vdots\\ x_{99,0}w_{0}+x_{99,1}w_{1}+b\\ \end{matrix} \right]) = \left[ \begin{matrix} \sigma (x_{0,0}w_{0}+x_{0,1}w_{1}+b)\\ \sigma (x_{1,0}w_{0}+x_{1,1}w_{1}+b)\\ \vdots\\ \sigma (x_{99,0}w_{0}+x_{99,1}w_{1}+b)\\ \end{matrix} \right]

\(\hat y=\sigma(\vec X \cdot \vec W +b)\)

F(X)

\left[ \begin{matrix} \hat y_{0}\\ \hat y_{1}\\ \vdots\\ \hat y_{99}\\ \end{matrix} \right] = \left[ \begin{matrix} \sigma (x_{0,0}w_{0}+x_{0,1}w_{1}+b)\\ \sigma (x_{1,0}w_{0}+x_{1,1}w_{1}+b)\\ \vdots\\ \sigma (x_{99,0}w_{0}+x_{99,1}w_{1}+b)\\ \end{matrix} \right]

Gradient Descent

\(\frac{\partial L}{\partial w_i}=\sum(\hat y_i-y_i)\times x_i\)

\(\frac{\partial L}{\partial b}=\sum(\hat y_i-y_i)\)

\(\frac{\partial L}{\partial w_i}=\sum(\hat y_i-y_i)\times x_i\)

\left[ \begin{matrix} \Delta w_{0}\\ \Delta w_{1}\\ \end{matrix} \right]

\(\frac{\partial L}{\partial w_i}=\sum(\hat y_i-y_i)\times x_i\)

x = \left[ \begin{matrix} x_{0,0}&x_{0,1}\\ x_{1,0}&x_{1,1}\\ \vdots&\vdots\\ x_{99,0}&x_{99,1}\\ \end{matrix} \right]

\(\frac{\partial L}{\partial w_i}=\sum(\hat y_i-y_i)\times x_i\)

x^T = \left[ \begin{matrix} x_{0,0}&x_{1,0}&\cdots &x_{99,0}\\ x_{0,1}&x_{1,1}&\cdots &x_{99,1}\\ \end{matrix} \right]

\(\frac{\partial L}{\partial w_i}=\sum(\hat y_i-y_i)\times x_i\)

\left[ \begin{matrix} \hat y_{0}\\ \hat y_{1}\\ \vdots\\ \hat y_{99}\\ \end{matrix} \right] - \left[ \begin{matrix} y_{0}\\ y_{1}\\ \vdots\\ y_{99}\\ \end{matrix} \right] = \left[ \begin{matrix} \hat y_{0}-y_{0}\\ \hat y_{1}-y_{1}\\ \vdots\\ \hat y_{99}-y_{99}\\ \end{matrix} \right]

\(\frac{\partial L}{\partial w_i}=\sum(\hat y_i-y_i)\times x_i\)

\left[ \begin{matrix} x_{0,0}&x_{1,0}&\cdots &x_{99,0}\\ x_{0,1}&x_{1,1}&\cdots &x_{99,1}\\ \end{matrix} \right] \left[ \begin{matrix} \hat y_{0}-y_{0}\\ \hat y_{1}-y_{1}\\ \vdots\\ \hat y_{99}-y_{99}\\ \end{matrix} \right]

\(\frac{\partial L}{\partial w_i}=\sum(\hat y_i-y_i)\times x_i\)

\left[ \begin{matrix} \Delta w_{0}\\ \Delta w_{1}\\ \end{matrix} \right] = \left[ \begin{matrix} \sum (\hat y_{i}-y_{i})\times x_{i,0}\\ \sum (\hat y_{i}-y_{i})\times x_{i,1}\\ \end{matrix} \right]

\(\frac{\partial L}{\partial b}=\sum(\hat y_i-y_i)\)

\left[ \begin{matrix} \hat y_{0}-y_{0}\\ \hat y_{1}-y_{1}\\ \vdots\\ \hat y_{99}-y_{99}\\ \end{matrix} \right]

np.sum

[\Delta b] = \left[ \begin{matrix} \hat y_{0}-y_{0}\\ +\\ \hat y_{1}-y_{1}\\ +\\ \vdots\\ +\\ \hat y_{99}-y_{99}\\ \end{matrix} \right]
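In numpy the two matrix pictures above collapse into one line each. A minimal sketch (the arrays here are random placeholders with the same shapes as in the slides: x is 100x2, y and y_hat are 100x1):

import numpy as np

# placeholder arrays with the shapes used above
x = np.random.randn(100, 2)
y = np.random.randint(0, 2, size=(100, 1)).astype(float)
y_hat = 1 / (1 + np.exp(-(x @ np.random.randn(2, 1) + np.random.randn(1))))

delta_W = x.T @ (y_hat - y)   # shape (2, 1): the two sums for w_0 and w_1
delta_b = np.sum(y_hat - y)   # a single number: the sum for b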

Visualization

Draw the decision boundary, where \(\hat y = \sigma(w_0x_0+w_1x_1+b) = 0.5\)

\(\sigma(z)=0.5\) exactly when \(z=0\), so \(w_0x_0+w_1x_1+b=0\)

\(w_1x_1=-w_0x_0-b\)

\(x_1=-\frac{w_0}{w_1}x_0-\frac{b}{w_1}\)
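Putting every piece above together, a minimal end-to-end sketch could look like the following. One assumption on my part: the data comes from scikit-learn's load_iris instead of the Iris.csv table shown earlier; everything else (two sepal features, setosa => 1 / versicolor => 0, sigmoid, the cross-entropy gradients, and the decision boundary line) follows the slides.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# assumption: load the data with sklearn; rows 0-49 are setosa, 50-99 are versicolor
iris = load_iris()
x = iris.data[:100, 0:2]                                    # sepal length, sepal width
y = (iris.target[:100] == 0).astype(float).reshape(-1, 1)   # setosa => 1, versicolor => 0

# shuffle features and labels together
idx = np.random.permutation(len(x))
x, y = x[idx], y[idx]

W = np.random.randn(2, 1)
b = np.random.randn(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 0.001
for epoch in range(2000):
    y_hat = sigmoid(x @ W + b)    # forward pass: sigma(X . W + b)
    delta_W = x.T @ (y_hat - y)   # dL/dW
    delta_b = np.sum(y_hat - y)   # dL/db
    W -= lr * delta_W
    b -= lr * delta_b

# decision boundary: w_0*x_0 + w_1*x_1 + b = 0  =>  x_1 = -(w_0/w_1)*x_0 - b/w_1
x0 = np.linspace(x[:, 0].min(), x[:, 0].max(), 100)
x1 = -(W[0, 0] / W[1, 0]) * x0 - b[0] / W[1, 0]

plt.scatter(x[:, 0], x[:, 1], c=y.ravel())
plt.plot(x0, x1, "r-")
plt.xlabel("sepal length")
plt.ylabel("sepal width")
plt.show()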

Machine Learning Club Lecture, Session 2

By lucasw
