authors: thomaswang and Seanliu
Part I - Calculus
\(\lim_{x \to 2} \frac{x^2 - 4}{x - 2} = 4\)
Stewart, 9th edition, p.83
We say \(\lim_{x \to a} f(x) = L\) if we can make \(f(x)\) as close to \(L\) as we want by taking \(x\) sufficiently close to \(a\) (but not equal to \(a\)).
\(\lim_{x \to a} f(x) = L\) iff for all \(\epsilon > 0\), there exists some \(\delta > 0\) such that
\(0 < |x - a| < \delta \implies |f(x) - L| < \epsilon\)
\(f\) is not necessarily defined at \(a\)!
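As a worked check on the opening example: for \(x \neq 2\), \(\frac{x^2-4}{x-2} = x+2\), so \(\left|\frac{x^2-4}{x-2} - 4\right| = |x-2|\). Given any \(\epsilon > 0\), choosing \(\delta = \epsilon\) gives \(0 < |x-2| < \delta \implies |f(x)-4| < \epsilon\), even though \(f\) is undefined at \(2\).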
Some easy limit properties: let \(\lim_{x \to a} f(x) = L\), \(\lim_{x \to a} g(x) = M\). We show \(\lim_{x \to a} (f(x) + g(x)) = L + M\).
Proof. \(\forall \epsilon > 0\), \(\exists \delta_1, \delta_2 > 0\) s.t.
\(|x - a| < \delta_1 \implies |f(x) - L| < \epsilon/2\)
\(|x - a| < \delta_2 \implies |g(x) - M| < \epsilon/2\)
Let \(\delta = \min(\delta_1, \delta_2)\). Then
\(|x - a| < \delta \implies |f(x) - L|, |g(x) - M| < \epsilon/2\)
so
\(-\epsilon/2 < f(x) - L < \epsilon/2\), \(-\epsilon/2 < g(x) - M < \epsilon/2\)
\(-\epsilon < (f(x) + g(x)) - (L + M) < \epsilon\)
\(\implies |f(x) + g(x) - (L + M)| < \epsilon\)
What about things like \(\cdots\)
Slope of a function \(f\) between \(a\) and \(b\): \(\frac{f(b) - f(a)}{b - a}\)
Slope of a function \(f\) at \(a\):
\( f'(a) = \frac{df}{da} = \lim_{\delta \to 0} \frac{f(a + \delta) - f(a)}{\delta}\)
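As a quick sanity check (a minimal sketch, not from the original notes), this limit can be approximated numerically by shrinking \(\delta\):

# numerically estimate f'(a) straight from the limit definition
def numeric_derivative(f, a, delta=1e-6):
    return (f(a + delta) - f(a)) / delta

print(numeric_derivative(lambda x: x**3, 2))  # ~12, matching [x^3]' = 3x^2 at x = 2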
Derivatives of common functions:
We will prove these three!
\([x^n]' = nx^{n - 1}\):
Proof.
\([x^n]' = \lim_{\delta \to 0} \frac{(x + \delta)^n - x^n}{\delta} = \lim_{\delta \to 0} \frac{(x^n + n\delta x^{n - 1} + O(\delta^2)) - x^n}{\delta}\)
\(= \lim_{\delta \to 0} \frac{\delta \cdot nx^{n - 1}}{\delta} = nx^{n - 1} \)
\(O(\delta^2)\) too small, safely ignore
\([fg]' = f'g + g'f\):
Proof.
\([fg]' = \lim_{\delta \to 0} \frac{f(x + \delta)g(x + \delta) - f(x)g(x)}{\delta} \)
\(= \lim_{\delta \to 0} \frac{\left[f(x) + \delta f'(x) + O(\delta^2)\right]\left[g(x) + \delta g'(x) + O(\delta^2)\right] - f(x)g(x)}{\delta} \)
\(= \lim_{\delta \to 0} \frac{\delta\left[f'(x)g(x) + g'(x)f(x)\right] + O(\delta^2)}{\delta}\)
\( = f'g + g'f\)
\(O(\delta^2)\) too small, safely ignore
\([f \circ g]' = g' (f' \circ g)\):
Proof. Easy when we use Leibniz notation:
\(\frac{d(f \circ g)}{dx} = \frac{df}{dg} \frac{dg}{dx} = g'(x) \cdot f'(g(x))\)
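For example, with \(g(x) = x^2+1\) and \(f(u) = u^3\): \(\left[(x^2+1)^3\right]' = 2x \cdot 3(x^2+1)^2 = 6x(x^2+1)^2\).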
Ex. \(A\) and \(B\) are two variables, and you can control a variable \(x\). \(A\) changes by \(2\) for each unit of change in \(x\), and \(B\) changes by \(3\) for each unit of change in \(A\). How much does \(B\) change when you change \(x\)? (By \(3 \times 2 = 6\) per unit of \(x\): exactly the chain rule.)
If a function \(f: \mathbb{R}^n \to \mathbb{R}\) has a local extremum (minimum/maximum) at \(\mathbf{x}\), then \(\nabla f(\mathbf{x}) = 0\)
Part II - Matrix Theory
Suppose \(\mathbf{u, v} \in \mathbb{R}^d\). Then:
Suppose \(A, B \in \mathbb{R}^{n \times m}, C \in \mathbb{R}^{m \times k}\). Then:
Suppose \(A, B \in \mathbb{R}^{n \times m}, \mathbf{u}, \mathbf{v} \in \mathbb{R}^{m}\), and let \(x, y \in \mathbb{R}\). Then:
So what are we actually going to cover???
Artificial intelligence (AI), also known as machine intelligence, is intelligence demonstrated by machines made by humans. It usually refers to technology that exhibits human-like intelligence through ordinary computer programs. (Wikipedia)
#include <bits/stdc++.h>
using namespace std;
int main(){
    int a, b;
    cin >> a >> b;
    if(a > b)
        cout << a << " is bigger than " << b << endl;
    else if(a == b)
        cout << a << " is equal to " << b << endl;
    else
        cout << a << " is smaller than " << b << endl;
    return 0;
}
artificial intelligence
handcrafted
machine learning
decision trees, linear regression, neural networks, logistic regression, support vector machines (SVM), relevance vector machines (RVM)
training data
a working AI
learning, processing...
neuron
signal
\(x_1,x_2,x_3\): the firing strengths of the upstream neurons
\(w_1,w_2,w_3\): how strongly each upstream neuron influences this one (the weights)
\(\Sigma x_iw_i\): the upstream influences summed together
activation function: integrates the summed influence and outputs to the next neuron
We'll explain the details bit by bit later \(\cdots\)
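In the meantime, here is a minimal sketch of one neuron in plain Python (the numbers are illustrative, not from the slides):

import numpy as np

def neuron(x, w, b):
    z = np.dot(x, w) + b          # sum up the weighted upstream influences
    return 1 / (1 + np.exp(-z))   # activation function (a sigmoid here)

print(neuron(np.array([0.5, 0.2, 0.9]),   # firing strengths x1, x2, x3
             np.array([1.0, -2.0, 0.5]),  # influence weights w1, w2, w3
             0.1))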
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import matplotlib.pyplot as plt
# Generate data: each target is a fixed linear combination of the features
d=4
x_train=np.random.random((10000,d))
y_train=[]
x_test=np.random.random((1000,d))
y_test=[]
for i in x_train:
    t=0
    for j in range(d):
        t=t+j*i[j]
    y_train.append(t)
for i in x_test:
    t=0
    for j in range(d):
        t=t+j*i[j]  # same target function as the training data
    y_test.append(t)
# training
model = Sequential()
model.add(Dense(100, input_dim=d))
model.add(Dense(1))
model.compile(loss='mse',
              optimizer='Adam',
              metrics=['mse'])
model.summary()
history=model.fit(x_train, y_train, epochs=10, batch_size=10, validation_split=0.2)
score = model.evaluate(x_test, y_test, batch_size=64)
print(f"score: {score}")
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.show()
# generating training and testing data
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
import numpy as np
import random as rd
import matplotlib.pyplot as plt
d=2
n=1000
test_n=100
x_train=[]
y_train=[]
x_test=[]
y_test=[]
mua=[-2,2]
mub=[2,-2]
sigma=[[5,0],[0,5]]
for i in range(n):
    if rd.randint(0,1)==0:
        x_train.append(np.random.multivariate_normal(mua,sigma))
        y_train.append([1,0])
    else:
        x_train.append(np.random.multivariate_normal(mub,sigma))
        y_train.append([0,1])
for i in range(test_n):
    if rd.randint(0,1)==0:
        x_test.append(np.random.multivariate_normal(mua,sigma))
        y_test.append([1,0])
    else:
        x_test.append(np.random.multivariate_normal(mub,sigma))
        y_test.append([0,1])
x_train=np.array(x_train)
y_train=np.array(y_train)
x_test=np.array(x_test)
y_test=np.array(y_test)
for i in range(n):
    if y_train[i][0]==0:
        plt.scatter(x_train[i][0],x_train[i][1],c="red",marker=".")
    else:
        plt.scatter(x_train[i][0],x_train[i][1],c="blue",marker=".")
plt.show()
# training
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
model = Sequential()
model.add(Dense(100, input_dim=d, activation='sigmoid'))
model.add(Dense(100, activation='sigmoid'))
model.add(Dense(100, activation='sigmoid'))
model.add(Dense(100, activation='sigmoid'))
model.add(Dense(100, activation='sigmoid'))
model.add(Dense(2,activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
history=model.fit(x_train, y_train, epochs=50, batch_size=128, validation_split=0.2)
score = model.evaluate(x_test, y_test, batch_size=128)
print(f"score: {score}")
# Plot training & validation accuracy values
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.show()
# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.show()
Neural Network
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
Universal approximation theorem
The universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of \(\mathbb{R}^n\), under mild assumptions on the activation function.
[Figure: pictures of cats and dogs (INPUT) go through a predict function \(f(x)\) to produce labels (OUTPUT). Two steps: 1. define a function set \(\{a(x), b(x), h(x), z(x), l(x), o(x), y(x), \ldots\}\); 2. find the best of them.]
function | error
---|---
a(x) | 0.98
b(x) | 40
g(x) | 7122
h(x) | 0.001
y(x) | 90
z(x) | 127
[Figure: a network with inputs \(x_0,\ldots,x_3\), hidden layers \(a_0,\ldots,a_5\) and \(c_0,\ldots,c_5\) (so many layers \(\vdots\)), and outputs \(y_1, y_2\); the one-hot targets mark class 1.]
[Figure: a picture goes in; the network outputs \((0.3, 0.7)\) while the ans is \((1, 0)\); the error is used to update the parameters.]
[Figure: a Neural Network maps an input to an output, e.g. labels like night, animal, baby, smell.]
generate pictures
[Figure: two robots, one the painter, one the teacher.]
\(1^{st}\) try: terrible !!!
\(2^{nd}\) try: the right shape
\(3^{rd}\) try: not exactly right
\(4^{th}\) try: perfect !!!
painter = generative network
teacher = discriminative network
The painter generates pictures similar to real pictures.
The teacher distinguishes generated pictures from real pictures.
The painter tries to fool the teacher and increase the teacher's error rate.
The two opponents become smarter round after round (iteration after iteration).
train an AI to play games
[Figure: the player observes the environment (a frame), takes an action (shoot, move right), and receives a reward (the score).]
actor = player
critic = assessor
The critic gathers information from the observations.
The actor takes actions based on the values provided by the critic.
what we're looking for
\(y=1.02x-0.009\)
Let's try a different way of thinking \(\cdots\)
Start with random initial values \(w_0, b_0\)
Read in the data the first time
Adjust the parameters to \(w_1,b_1\)
Read in the data the second time
Adjust the parameters to \(w_2,b_2\)
\(\vdots\)
Read in the data the \(n\)th time
Adjust the parameters to \(w_n,b_n\)
\(f_{a,b,c}(x)=ax^2+bx+c\)
\(f_{2,15,78}(x)=2x^2+15x+78\)
\(f_{1,2,1}(x)=1x^2+2x+1\)
\(\in\)
\(f_{0,0,0}(x)=0x^2+0x+0\)
\(f_{7,1,22}(x)=7x^2+1x+22\)
\(f_{7,12,2}(x)=7x^2+12x+2\)
\(f_{71,2,2}(x)=71x^2+2x+2\)
\(f_{w,b}(x)=wx+b\)
\(f_{1,2}(x)=1x+2\)
\(f_{2,\pi}(x)=2x+\pi\)
\(\in\)
\(f_{0,0}(x)=0x+0\)
\(f_{71,22}(x)=71x+22\)
\(f_{7,122}(x)=7x+122\)
\(f_{712,2}(x)=712x+2\)
\[ f_1(x)\rightarrow f_2 (x)\rightarrow\cdots\rightarrow f_n(x)\]
\(\Longrightarrow\) impossible to enumerate them all!!
the best function
Initialize \((w,b)\)
\((w,b)\) represents \(f_{w,b}(x)\)
Check which of \((w\pm\Delta,b),(w,b\pm\Delta),(w\pm\Delta,b\pm\Delta)\) is best
The smaller the step \(\Delta\), the more precise the result
If \((w,b)\) is already the best among its neighbors, then \(f_{w,b}(x)\) is the best function
Otherwise, update \((w,b)\) to the best neighbor
Keep updating \((w,b)\) the same way
\((w_0,b_0)=(-4,4)\),\(\Delta=0.5\)
\((w_1,b_1)=(-4.5,3.5)\),\(\Delta=0.5\)
\((w_2,b_2)=(-4,3)\),\(\Delta=0.5\)
\((w_3,b_3)=(-3.5,2.5)\),\(\Delta=0.5\)
\((w_4,b_4)=(-3,2)\),\(\Delta=0.5\)
\((w_5,b_5)=(-2.5,1.5)\),\(\Delta=0.5\)
\((w_6,b_6)=(-2,1.5)\),\(\Delta=0.5\)
\((w_7,b_7)=(-1.5,1)\),\(\Delta=0.5\)
\((w_8,b_8)=(-1,1)\),\(\Delta=0.5\)
\((w_9,b_9)=(-0.5,0.5)\),\(\Delta=0.5\)
\((w_{10},b_{10})=(0,0.5)\),\(\Delta=0.5\)
\((w_{11},b_{11})=(0.5,0)\),\(\Delta=0.5\)
\((w_{12},b_{12})=(1,0)\),\(\Delta=0.5\)
\((w_{final},b_{final})=(1.020,-0.009)\),\(\Delta=0.001\)
\(y=1.02x-0.009\)
import matplotlib.colors as colors
import matplotlib.cm as cm
from matplotlib import animation, rc
from IPython.display import HTML
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# getting training data from github
url = 'https://thomasinfor.github.io/train-data/linear_regression1.csv'
data = pd.read_csv(url)
x=np.array(data['x'])
y=np.array(data['y'])
n=len(x)
# define loss function and best-point function
def loss(w,b):
    total=0
    for i in range(n):
        total=total+(w*x[i]+b-y[i])**2
    return total/n
delta=0.001
def p(w,b):
    # return the best point among (w,b) and its eight neighbors
    pnt=(w,b)
    for dw in range(-1,2):
        for db in range(-1,2):
            if loss(pnt[0],pnt[1])>loss(w+dw*delta,b+db*delta):
                pnt=(w+dw*delta,b+db*delta)
    return pnt
point=(-4,4)
trace=[[point[0]],[point[1]],[loss(point[0],point[1])]]
# train!!!
while True:
    # get the best point around
    temp=p(point[0],point[1])
    # break the loop if the current
    # point is already the best
    if temp==point:
        break
    # step to the best point
    point=temp
    # add to the trace for plotting
    trace[0].append(point[0])
    trace[1].append(point[1])
    trace[2].append(loss(point[0],point[1]))
print(f"result: w = {point[0]} , b = {point[1]}")
# plotting result
# you can just ignore this part when
# you're reading as a beginner
fig, ax = plt.subplots(1,2)
fig.patch.set_alpha(0)
fig.set_size_inches(fig.get_figwidth()*2,fig.get_figheight())
plt.close()
ax[0].set_xlim(-0.1*len(trace[0]),len(trace[0])*1.1)
ax[0].set_ylim(-0.1*max(trace[2]),max(trace[2])*1.1)
X = np.linspace(-6, 6, 100)
Y = np.linspace(-6, 6, 100)
X, Y = np.meshgrid(X, Y)
Z = Y * 0
for i in range(n):
    Z = Z + (X*x[i]+Y-y[i])**2
pcm = ax[1].pcolor(X, Y, Z, norm=colors.Normalize(vmin=Z.min(), vmax=Z.max()), cmap=cm.jet)
ax[1].contour(X,Y,Z,30)
fig.colorbar(pcm, ax=ax, extend='max')
ax[0].set_xlabel('$iteration$')
ax[0].set_ylabel('$loss$')
ax[1].set_xlabel('$w$')
ax[1].set_ylabel('$b$')
x0, y0 = [], []
x1, y1 = [], []
line0, = ax[0].plot([],[],c='black')
line1, = ax[1].plot([],[],c='white')
def animate(i):
    x0.append(i)
    y0.append(trace[2][int(i)])
    x1.append(trace[0][int(i)])
    y1.append(trace[1][int(i)])
    line0.set_data(x0,y0)
    line1.set_data(x1,y1)
    return (line0,line1)
anim = animation.FuncAnimation(fig, animate, frames=np.linspace(0,len(trace[0])-1,100), interval=30, blit=True)
rc('animation', html='jshtml')
anim
[Figure: slope \(=\frac{\Delta y}{\Delta x}\); e.g. \(\Delta x=3\), \(\Delta y=2\) gives slope \(=\frac{2}{3}\), while \(\Delta x=-1\), \(\Delta y=3\) gives slope \(=\frac{3}{-1}\).]
Within a small enough neighbourhood of \(\mathbf{x}\),
\(f(\mathbf{x} + \mathbf{u}) \approx f(\mathbf{x}) + \mathbf{u} \cdot \nabla f\)
Choose \(\mathbf{u} = -\nabla f\) for maximum decrease in \(f\)!
Therefore \(\mathbf{x}' = \mathbf{x} - \eta\nabla f\) (for a small step size \(\eta > 0\)) should have a smaller value of \(f\)
[Figure: the loss as a function of \(w\) with \(b\) fixed; gradient descent steps \((w_0,b),(w_1,b),(w_2,b),(w_3,b),(w_4,b)\) walk downhill until finished !!!]
import matplotlib.colors as colors
import matplotlib.cm as cm
from matplotlib import animation, rc
from IPython.display import HTML
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import time
# getting training data from github
url = 'https://thomasinfor.github.io/train-data/linear_regression1.csv'
data = pd.read_csv(url)
x=np.array(data['x'])
y=np.array(data['y'])
n=len(x)
# define loss function
def loss(w,b):
    total=0
    for i in range(n):
        total+=(w*x[i]+b-y[i])**2
    return total/n
# define derivative dL/dw
def dldw(w,b):
    total=0
    for i in range(n):
        total+=(w*x[i]+b-y[i])*x[i]
    return total/n
# define derivative dL/db
def dldb(w,b):
    total=0
    for i in range(n):
        total+=(w*x[i]+b-y[i])
    return total/n
# initialize
w=-4
b=4
trace=[[w],[b],[loss(w,b)]]
learning_rate=1
# learning rates to try: 1.65 1.6 1.5 1 0.1 0.01
# train!!!
iters=100
start=time.time()
for iteration in range(iters):
    last=(w,b)
    w-=learning_rate*dldw(last[0],last[1])
    b-=learning_rate*dldb(last[0],last[1])
    # optional early stop:
    # if ((last[0]-w)**2+(last[1]-b)**2)<0.00000001:
    #     break
    trace[0].append(w)
    trace[1].append(b)
    trace[2].append(loss(w,b))
print("finished")
end=time.time()
print(f"time: {end-start} sec.")
print(f"result: w = {w} , b = {b}")
# plotting result
# you can just ignore this part when
# you're reading as a beginner
fig, ax = plt.subplots(1,2)
fig.patch.set_alpha(0)
fig.set_size_inches(fig.get_figwidth()*2,fig.get_figheight())
plt.close()
ax[0].set_xlim(-iters*0.1,iters*1.1)
ax[0].set_ylim(-0.1*max(trace[2]),max(trace[2])*1.1)
X = np.linspace(-7, 7, 100)
Y = np.linspace(-7, 7, 100)
X, Y = np.meshgrid(X, Y)
Z = Y * 0
for i in range(n):
    Z = Z + (X*x[i]+Y-y[i])**2
pcm = ax[1].pcolor(X, Y, Z, norm=colors.Normalize(vmin=Z.min(), vmax=Z.max()), cmap=cm.jet)
ax[1].contour(X,Y,Z,30)
fig.colorbar(pcm, ax=ax, extend='max')
ax[0].set_xlabel('$iteration$')
ax[0].set_ylabel('$loss$')
ax[1].set_xlabel('$w$')
ax[1].set_ylabel('$b$')
x0, y0 = [], []
x1, y1 = [], []
line0, = ax[0].plot([],[],c='black')
line1, = ax[1].plot([],[],c='white')
def animate(i):
    x0.append(i)
    y0.append(trace[2][int(i)])
    x1.append(trace[0][int(i)])
    y1.append(trace[1][int(i)])
    line0.set_data(x0,y0)
    line1.set_data(x1,y1)
    return (line0,line1)
anim = animation.FuncAnimation(fig, animate, frames=iters, interval=30, blit=True)
rc('animation', html='jshtml')
anim
\(\Longrightarrow\) this can handle more complex functions
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cm
import numpy as np
d=10
n=40
x=[[0.99, 0.15, 0.15, 0.31, 0.78, 0.78, 0.87, 0.3, 0.46], [0.91, 0.05, 0.58, 0.99, 0.41, 0.49, 0.98, 0.29, 0.3], [0.61, 0.52, 0.01, 0.85, 0.74, 0.64, 0.08, 0.7, 0.93], [0.1, 0.14, 0.06, 0.1, 0.61, 0.97, 0.39, 0.72, 0.7], [0.51, 0.67, 0.19, 0.65, 0.2, 0.71, 0.2, 0.08, 0.41], [0.2, 0.4, 0.79, 0.45, 0.83, 0.95, 0.56, 0.89, 0.66], [0.12, 0.84, 0.59, 0.96, 0.1, 0.48, 0.17, 0.82, 0.38], [0.95, 0.3, 0.44, 0.02, 0.34, 0.12, 0.38, 0.87, 0.9], [0.3, 0.78, 0.19, 0.71, 0.13, 0.63, 0.49, 0.49, 0.67], [0.07, 0.82, 0.04, 0.55, 0.11, 0.68, 0.32, 0.11, 0.45], [0.85, 0.52, 0.3, 0.39, 0.2, 0.88, 0.88, 0.26, 0.46], [0.46, 0.97, 0.49, 0.24, 0.76, 0.24, 0.05, 0.25, 0.82], [0.68, 0.04, 0.32, 0.71, 0.63, 0.33, 0.72, 0.47, 0.62], [0.96, 0.4, 0.64, 0.54, 0.3, 0.54, 0.8, 0.13, 0.61], [0.33, 0.62, 0.01, 0.15, 0.98, 0.8, 0.64, 0.78, 0.82], [0.05, 0.51, 0.94, 0.26, 0.46, 0.77, 0.22, 0.86, 0.21], [0.25, 0.43, 0.95, 0.1, 0.33, 0.86, 0.26, 0.04, 0.94], [0.37, 0.62, 0.16, 0.53, 0.55, 0.2, 0.04, 0.0, 0.8], [0.59, 0.59, 0.0, 0.32, 0.64, 0.28, 0.04, 0.0, 0.83], [0.17, 0.53, 0.62, 0.78, 0.7, 0.28, 0.2, 0.68, 0.45], [0.08, 0.38, 0.44, 0.22, 0.83, 0.85, 0.55, 0.87, 0.26], [0.24, 0.7, 0.57, 0.64, 0.5, 0.73, 0.29, 0.0, 0.21], [0.22, 0.25, 0.4, 0.59, 0.3, 0.41, 0.51, 0.54, 0.52], [0.18, 1.01, 0.07, 0.75, 0.14, 0.73, 0.84, 0.23, 0.69], [0.25, 0.39, 0.15, 0.92, 0.03, 0.84, 0.6, 0.3, 0.48], [0.89, 0.33, 0.8, 0.2, 0.11, 0.88, 0.91, 0.63, 0.53], [0.76, 0.14, 0.48, 0.53, 0.1, 0.7, 0.34, 0.39, 0.11], [0.68, 0.03, 0.81, 0.88, 0.15, 0.19, 0.58, 0.41, 0.36], [0.55, 0.17, 0.04, 0.28, 0.18, 0.58, 0.7, 0.6, 0.03], [0.65, 0.03, 0.57, 0.28, 0.42, 0.46, 0.27, 0.33, 0.37], [0.63, 0.47, 0.29, 0.27, 0.93, 0.44, 0.28, 0.49, 0.91], [0.87, 0.23, 0.95, 0.87, 0.06, 0.31, 0.67, 0.81, 0.53], [0.12, 0.55, 0.59, 0.87, 0.83, 0.76, 0.79, 0.86, 0.81], [0.13, 0.97, 0.16, 0.23, 0.46, 0.13, 0.43, 0.69, 0.66], [0.07, 0.62, 0.87, 0.78, 0.93, 0.54, 0.89, 0.36, 0.79], [0.34, 0.6, 0.86, 0.33, 0.56, 0.28, 0.94, 0.51, 0.72], [0.31, 0.68, 0.94, 0.53, 0.07, 0.38, 0.11, 0.85, 0.15], [0.37, 0.2, 0.41, 0.5, 0.56, 0.78, 0.39, 0.99, 0.27], [0.21, 0.12, 0.38, 0.64, 0.81, 0.94, 0.25, 0.62, 0.23], [0.43, 0.78, 0.81, 0.32, 0.18, 0.57, 0.48, 0.62, 0.64]]
y=[28.4, 27.58, 31.07, 29.83, 21.39, 35.269999999999996, 26.499999999999996, 27.77, 27.689999999999998, 21.680000000000003, 27.270000000000003, 24.52, 28.549999999999997, 26.79, 33.87, 26.88, 26.61, 21.37, 21.39, 26.87, 29.96, 21.83, 26.07, 28.779999999999998, 26.31, 30.479999999999997, 21.240000000000002, 24.069999999999997, 22.349999999999998, 21.88, 29.17, 29.46, 37.7, 26.160000000000004, 34.3, 30.92, 23.139999999999997, 29.09, 27.419999999999998, 28.27]
# inner product
def ip(a,b):
    t=0
    for i in range(len(a)):
        t=t+a[i]*b[i]
    return t
# loss function
def l(w,b):
    t=0
    for i in range(n):
        t=t+(ip(w,x[i])+b-y[i])**2
    return t/n
# derivative dL/dw_j
def dldw(w,b,j):
    k=0
    for i in range(n):
        k=k+(ip(w,x[i])+b-y[i])*x[i][j]
    return k/n
# derivative dL/db
def dldb(w,b):
    k=0
    for i in range(n):
        k=k+(ip(w,x[i])+b-y[i])
    return k/n
w=[]
b=0
for i in range(d-1):
    w.append(0)
t=[[list(w),b]]
fig, ax = plt.subplots()
lr=0.5
for cnt in range(5000):
    last=(list(w),b)
    for i in range(d-1):
        w[i]=w[i]-lr*dldw(last[0],last[1],i)
    b=b-lr*dldb(last[0],last[1])
    t.append([list(w),b])  # copy w, otherwise every entry would point to the same list
for i in range(len(t)-1):
    plt.plot([i,i+1],[l(t[i][0],t[i][1]),l(t[i+1][0],t[i+1][1])],c="black")
ax.set_xlabel('$iteration$')
ax.set_ylabel('$Loss$')
plt.show()
print(w)
print(b)
print(l(w,b))
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cm
import numpy as np
d=5
n=100
x=[0.76, 0.96, 0.94, 0.02, 0.25, 0.88, 0.92, 0.99, 0.38, 0.67, 0.48, 0.69, 0.57, 0.98, 0.1, 0.41, 0.47, 0.71, 0.37, 0.58, 1.0, 0.64, 0.15, 0.91, 0.03, 0.96, 0.42, 0.57, 0.93, 0.14, 0.29, 0.11, 0.86, 0.62, 0.98, 0.46, 0.71, 0.65, 0.15, 0.76, 0.27, 0.88, 0.96, 0.07, 0.63, 0.04, 0.31, 0.87, 0.02, 0.91, 0.4, 0.48, 0.27, 0.18, 0.16, 0.27, 0.18, 0.28, 0.98, 0.4, 0.15, 0.71, 0.83, 0.05, 0.98, 0.89, 0.37, 0.25, 0.66, 0.85, 0.68, 0.53, 0.46, 0.19, 0.13, 0.28, 0.16, 0.55, 0.94, 0.63, 0.52, 0.55, 0.83, 0.28, 0.98, 0.62, 1.01, 0.97, 0.43, 0.7, 0.96, 0.31, 0.47, 1.01, 0.61, 0.84, 0.04, 0.13, 0.41, 0.6]
y=[]
for i in range(n):
    t=0
    for j in range(d):
        t=t+j*(x[i]**j)
    y.append(t)
def l(aa):
    t=0
    for i in range(n):
        k=0
        for j in range(d):
            k=k+aa[j]*(x[i]**j)
        t=t+((k-y[i])**2)/n
    return t
def dlda(aa,j):
    t=0
    for i in range(n):
        k=0
        for m in range(d):
            k=k+aa[m]*(x[i]**m)
        t=t+(k-y[i])*(x[i]**j)/n
    return t
a=[]
for i in range(d):
    a.append(0)
t=[list(a)]
fig, ax = plt.subplots()
lr=1
for cnt in range(10000):
    last=list(a)
    for i in range(d):
        a[i]=a[i]-lr*dlda(last,i)
    t.append(list(a))
    plt.plot([cnt,cnt+1],[l(last),l(a)],c="black")
ax.set_xlabel('$iteration$')
ax.set_ylabel('$Loss$')
plt.show()
print(a)
print(l(a))
With \(a\) updates in total, \(n\) training examples, and \(d\) parameters:
complexity \(O(a\cdot n\cdot d)\)
The polynomial regression just now: \(O(10000\times 100\times 10)\Rightarrow\) about 1 min
As the number of parameters keeps growing \(\cdots\) we need to optimize again
import matplotlib.colors as colors
import matplotlib.cm as cm
from matplotlib import animation, rc
from IPython.display import HTML
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random as rd
import time
# getting training data from github
url = 'https://thomasinfor.github.io/train-data/linear_regression2.csv'
data = pd.read_csv(url)
x=np.array(data['x'])/100
y=np.array(data['y'])/100
n=len(x)
# define loss function
def loss(w,b):
    total=0
    for i in range(n):
        total+=(w*x[i]+b-y[i])**2
    return total/n
# define derivative, estimated from one random example
def dldw(w,b):
    i=rd.randint(0,n-1)
    return (w*x[i]+b-y[i])*x[i]
# define derivative, estimated from one random example
def dldb(w,b):
    i=rd.randint(0,n-1)
    return (w*x[i]+b-y[i])
# initialize
w=-4
b=4
trace=[[w],[b],[]]
learning_rate=1.4
iters=100
# train!!!
start=time.time()
for iteration in range(iters):
    last=(w,b)
    w-=learning_rate*dldw(last[0],last[1])
    b-=learning_rate*dldb(last[0],last[1])
    # optional early stop:
    # if ((last[0]-w)**2+(last[1]-b)**2)<0.00000001:
    #     break
    trace[0].append(w)
    trace[1].append(b)
print("finished")
end=time.time()
print(f"time: {end-start} sec.")
print(f"result: w = {w} , b = {b}")
for i in range(len(trace[0])):
    trace[2].append(loss(trace[0][i],trace[1][i]))
# plotting result
# you can just ignore this part when
# you're reading as a beginner
fig, ax = plt.subplots(1,2)
plt.close()
fig.patch.set_alpha(0)
fig.set_size_inches(fig.get_figwidth()*2,fig.get_figheight())
ax[0].set_title('learning rate = '+str(learning_rate))
ax[0].set_xlim(-iters*0.1,iters*1.1)
ax[0].set_ylim(-0.1*max(trace[2]),max(trace[2])*1.1)
ax[1].set_title('learning rate = '+str(learning_rate))
X = np.linspace(-7, 7, 100)
Y = np.linspace(-7, 7, 100)
X, Y = np.meshgrid(X, Y)
Z = Y * 0
for i in range(n):
    Z = Z + (X*x[i]+Y-y[i])**2
pcm = ax[1].pcolor(X, Y, Z, norm=colors.Normalize(vmin=Z.min(), vmax=Z.max()), cmap=cm.jet)
ax[1].contour(X,Y,Z,30)
fig.colorbar(pcm, ax=ax, extend='max')
ax[0].set_xlabel('$iteration$')
ax[0].set_ylabel('$loss$')
ax[1].set_xlabel('$w$')
ax[1].set_ylabel('$b$')
x0, y0 = [], []
x1, y1 = [], []
line0, = ax[0].plot([],[],c='black')
line1, = ax[1].plot([],[],c='white')
def animate(i):
    x0.append(i)
    y0.append(trace[2][int(i)])
    x1.append(trace[0][int(i)])
    y1.append(trace[1][int(i)])
    line0.set_data(x0,y0)
    line1.set_data(x1,y1)
    return (line0,line1)
rc('animation', html='jshtml',)
anim = animation.FuncAnimation(fig, animate, frames=np.linspace(0,iters-1,100), interval=30, blit=True)
anim
Although its expected value matches the gradient of the full loss, the variance is too large!
import matplotlib.colors as colors
import matplotlib.cm as cm
from matplotlib import animation, rc
from IPython.display import HTML
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random as rd
import time
batch_size=50
# getting training data from github
url = 'https://thomasinfor.github.io/train-data/linear_regression2.csv'
data = pd.read_csv(url)
x=np.array(data['x'])/100
y=np.array(data['y'])/100
n=len(x)
print(n,'data points')
# define loss function
def loss(w,b):
    total=0
    for i in range(n):
        total+=(w*x[i]+b-y[i])**2
    return total/n
# define derivative, estimated from a random mini-batch
def dldw(w,b):
    total=0
    for cnt in range(batch_size):
        i=rd.randint(0,n-1)
        total+=(w*x[i]+b-y[i])*x[i]
    return total/batch_size
# define derivative, estimated from a random mini-batch
def dldb(w,b):
    total=0
    for cnt in range(batch_size):
        i=rd.randint(0,n-1)
        total+=(w*x[i]+b-y[i])
    return total/batch_size
# initialize
w=-4
b=4
trace=[[w],[b],[]]
learning_rate=1
iters=100
# train!!!
start=time.time()
for iteration in range(iters):
    last=(w,b)
    w-=learning_rate*dldw(last[0],last[1])
    b-=learning_rate*dldb(last[0],last[1])
    # optional early stop:
    # if ((last[0]-w)**2+(last[1]-b)**2)<0.00000001:
    #     break
    trace[0].append(w)
    trace[1].append(b)
print("finished")
end=time.time()
print(f"time: {end-start} sec.")
print(f"result: w = {w} , b = {b}")
for i in range(len(trace[0])):
    trace[2].append(loss(trace[0][i],trace[1][i]))
# plotting result
# you can just ignore this part when
# you're reading as a beginner
fig, ax = plt.subplots(1,2)
plt.close()
fig.patch.set_alpha(0)
fig.set_size_inches(fig.get_figwidth()*2,fig.get_figheight())
ax[0].set_title('learning rate = '+str(learning_rate))
ax[0].set_xlim(-iters*0.1,iters*1.1)
ax[0].set_ylim(-0.1*max(trace[2]),max(trace[2])*1.1)
ax[1].set_title('learning rate = '+str(learning_rate))
X = np.linspace(-7, 7, 100)
Y = np.linspace(-7, 7, 100)
X, Y = np.meshgrid(X, Y)
Z = Y * 0
for i in range(n):
    Z = Z + (X*x[i]+Y-y[i])**2
pcm = ax[1].pcolor(X, Y, Z, norm=colors.Normalize(vmin=Z.min(), vmax=Z.max()), cmap=cm.jet)
ax[1].contour(X,Y,Z,30)
fig.colorbar(pcm, ax=ax, extend='max')
ax[0].set_xlabel('$iteration$')
ax[0].set_ylabel('$loss$')
ax[1].set_xlabel('$w$')
ax[1].set_ylabel('$b$')
x0, y0 = [], []
x1, y1 = [], []
line0, = ax[0].plot([],[],c='black')
line1, = ax[1].plot([],[],c='white')
def animate(i):
    x0.append(i)
    y0.append(trace[2][int(i)])
    x1.append(trace[0][int(i)])
    y1.append(trace[1][int(i)])
    line0.set_data(x0,y0)
    line1.set_data(x1,y1)
    return (line0,line1)
rc('animation', html='jshtml',)
anim = animation.FuncAnimation(fig, animate, frames=np.linspace(0,iters-1,100), interval=30, blit=True)
anim
 | linear | 10D | polynomial
---|---|---|---
before | 30s | 48s | 222s
after | 11s | 10s | 17s
dataset | 1000 | 500 | 1000
batch size | 50 | 50 | 50
global minima v.s. local minima
the (global) minimum
a local minimum
Actually \(\cdots\)
where the slope is gentle, walk a bit slower
where the slope is steep, walk a bit faster
Assume each kind of thing follows a normal (Gaussian) distribution
and fit an appropriate curve to it
the border between the two (楚河漢界)
tribe x
tribe y
determined by three features:
tribe x's \(f(x)\):
area under the curve \(=1\)
\(f(a)\) represents the proportion of people located at \(x=a\)
[Figure: candidate curves ranked from worst / bad / better / almost there / up to the best !!!]
\(\Sigma_i=\frac{1}{|C_i|}\sum\limits_{x\in C_i}(x-\mu_i)(x-\mu_i)^T\)
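A minimal NumPy sketch of this formula, where C is a hypothetical array holding one class's samples as rows:

import numpy as np

C = np.random.randn(100, 2)     # hypothetical samples of one class
mu = C.mean(axis=0)             # class mean
diff = C - mu
Sigma = diff.T @ diff / len(C)  # (1/|C|) * sum of (x-mu)(x-mu)^T
print(Sigma)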
[Table: e.g. heights \(135\sim 140\): 60% / 12% / 28%; heights \(130\sim 135\): 22% / 33% / 45%.]
First, a small assumption \(\cdots\) let \(\Sigma_0=\Sigma_1=\Sigma\)
I don't know why yet either \(\cdots\)
[Figure: a neuron with inputs \(x_1, x_2, \ldots, x_n\), weights \(w_1, w_2, \ldots, w_n\), plus bias \(b\), passed through \(\sigma\), the sigmoid function, to produce the output.]
Make \(f(x_i)\) close to \(1\) for every \(x_i\) whose \(y_i\) is true
Make \(f(x_i)\) close to \(0\) for every \(x_i\) whose \(y_i\) is false
Mean squared error?
distribution \(p\)
distribution \(q\)
cross entropy \(=H(y,f(x))\), also written \(c(f(x),y)\)
\(X=\{0,1\}\)
linear regression vs. logistic regression: the function set, the loss function, and the differential.
linear regression: \(\frac{\partial L(f_{w,b})}{\partial w_j}=\frac{2}{n}\sum\limits_i(f_{w,b}(x_i)-y_i)(x_i)_j\)
logistic regression: \(\frac{\partial L(f_{w,b})}{\partial w_j}=\frac{1}{n}\sum\limits_i(f_{w,b}(x_i)-y_i)(x_i)_j\)
import matplotlib.colors as colors
import matplotlib.cm as cm
from matplotlib import animation, rc
from IPython.display import HTML
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import time
url = 'https://thomasinfor.github.io/train-data/logistic_regression1.csv'
data = pd.read_csv(url)
n=len(data['x'])
temp=[data['x'],data['y']]
train_x=np.array([[temp[0][i],temp[1][i]] for i in range(n)])
train_y=np.array(data['class'])
plt.scatter([train_x[i][0] for i in range(n) if train_y[i]==0],
[train_x[i][1] for i in range(n) if train_y[i]==0],c="red",s=3)
plt.scatter([train_x[i][0] for i in range(n) if train_y[i]==1],
[train_x[i][1] for i in range(n) if train_y[i]==1],c="blue",s=3)
plt.show()
# define functions
def sigmoid(x):
    return 1/(1+np.exp(-x))
# inner product
def dot(w,x):
    total=0
    for i in range(2):
        total=total+w[i]*x[i]
    return total
# function set
def f(w,b,x):
    return sigmoid(dot(w,x)+b)
# cross entropy (single loss)
def cross_entropy(a,b):
    a=np.clip(a,1e-12,1-1e-12)  # avoid log(0)
    return -b*np.log(a)-(1-b)*np.log(1-a)
# loss function (average over all data)
def loss(w,b):
    total=0
    for i in range(n):
        total+=cross_entropy(f(w,b,train_x[i]),train_y[i])
    return total/n
# derivative dL/dw_i
def dldw(w,b,i):
    total=0
    for j in range(n):
        total=total+(f(w,b,train_x[j])-train_y[j])*train_x[j][i]
    return total/n
# derivative dL/db
def dldb(w,b):
    total=0
    for j in range(n):
        total=total+(f(w,b,train_x[j])-train_y[j])
    return total/n
# predict answer (return predicted class)
def predict(w,b,x):
    if f(w,b,x)>=0.5:
        return 1
    else:
        return 0
# predict all data
def test_all(w,b,x,y):
    cnt=0
    for i in range(len(x)):
        if predict(w,b,x[i])==y[i]:
            cnt+=1
    return cnt/len(x)
# training !!!
w,b=[0.1,0.1],1
history={'w':[w],'b':[b],'loss':[loss(w,b)],'acc':[test_all(w,b,train_x,train_y)]}
learning_rate=0.01
iters=1000
print("start training")
start=time.time()
for cnt in range(iters):
    last=list(w)
    for i in range(2):
        w[i]-=learning_rate*dldw(last,b,i)
    b-=learning_rate*dldb(last,b)
    history['w'].append(list(w))
    history['b'].append(b)
    history['loss'].append(loss(w,b))
    history['acc'].append(test_all(w,b,train_x,train_y))
print("finished")
end=time.time()
print(f"time: {end-start} seconds")
print(f"result: w = {w}, b = {b}")
print(f"accuracy: {test_all(w,b,train_x,train_y)*100}%")
# plotting result
# you can just ignore this part when
# you're reading as a beginner
fig, ax = plt.subplots(1,2)
fig.patch.set_alpha(0)
fig.set_size_inches(fig.get_figwidth()*2,fig.get_figheight())
plt.close()
ax[0].set_xlim(-iters*0.1,iters*1.1)
ax[0].set_ylim(-0.1*max(history['acc']),max(history['acc'])*1.1)
X = np.linspace(-15,15,100)
X, Y = np.meshgrid(X, X)
Z = Y * 0
Z=f(history['w'][0],history['b'][0],[X,Y])
pcm = ax[1].pcolor(X, Y, Z, norm=colors.Normalize(vmin=Z.min(), vmax=Z.max()), cmap=cm.jet)
fig.colorbar(pcm, ax=ax, extend='max')
ax[0].set_xlabel('$iteration$')
ax[0].set_ylabel('$acc$')
ax[1].set_xlabel('$x$')
ax[1].set_ylabel('$y$')
x0, y0 = [], []
line0, = ax[0].plot([],[],c='black')
def animate(i):
    Z=f(history['w'][int(i)],history['b'][int(i)],[X,Y])
    ax[1].pcolor(X, Y, Z, norm=colors.Normalize(vmin=Z.min(), vmax=Z.max()), cmap=cm.jet)
    ax[1].scatter([train_x[i][0] for i in range(n) if train_y[i]==0],
                  [train_x[i][1] for i in range(n) if train_y[i]==0],c="red",s=3)
    ax[1].scatter([train_x[i][0] for i in range(n) if train_y[i]==1],
                  [train_x[i][1] for i in range(n) if train_y[i]==1],c="blue",s=3)
    x0.append(i)
    y0.append(history['acc'][int(i)])
    line0.set_data(x0,y0)
    return (line0,)
anim = animation.FuncAnimation(fig, animate, frames=200, interval=30, blit=True)
rc('animation', html='jshtml')
anim
[Figure: a softmax output layer. From inputs \(x_1, x_2, \ldots, x_n\), each class \(j\) computes a weighted sum \(\sum = z_j\) plus bias \(b_j\), exponentiates it to \(e^{z_j}\), and normalizes: \(p_j=\frac{e^{z_j}}{\sum\limits_k e^{z_k}}\).]
softmax
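The same computation as a small NumPy sketch (the inputs are illustrative):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # probabilities that sum to 1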
accuracy: 45.77%
import matplotlib.colors as colors
import matplotlib.cm as cm
from matplotlib import animation, rc
from IPython.display import HTML
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import time
url = 'https://thomasinfor.github.io/train-data/feature_extraction1.csv'
data = pd.read_csv(url)
n=len(data['x'])
temp=[data['x'],data['y']]
x_train=np.array([[temp[0][i],temp[1][i]] for i in range(n)])
y_train=np.array(data['class'])
plt.scatter([x_train[i][0] for i in range(n) if y_train[i]==0],
[x_train[i][1] for i in range(n) if y_train[i]==0],c="red",s=3)
plt.scatter([x_train[i][0] for i in range(n) if y_train[i]==1],
[x_train[i][1] for i in range(n) if y_train[i]==1],c="blue",s=3)
plt.show()
# define functions
def sigmoid(x):
    return 1/(1+np.exp(-x))
# inner product (the extracted feature is 1-D)
def dot(w,x):
    total=0
    for i in range(1):
        total=total+w[i]*x[i]
    return total
# function set
def f(w,b,x):
    return sigmoid(dot(w,x)+b)
# cross entropy (single loss)
def cross_entropy(a,b):
    a=np.clip(a,1e-12,1-1e-12)  # avoid log(0)
    return -b*np.log(a)-(1-b)*np.log(1-a)
# loss function (average over all data)
def loss(w,b):
    total=0
    for i in range(n):
        total+=cross_entropy(f(w,b,feature[i]),y_train[i])
    return total/n
# derivative dL/dw_i
def dldw(w,b,i):
    total=0
    for j in range(n):
        total=total+(f(w,b,feature[j])-y_train[j])*feature[j][i]
    return total/n
# derivative dL/db
def dldb(w,b):
    total=0
    for j in range(n):
        total=total+(f(w,b,feature[j])-y_train[j])
    return total/n
# predict answer (return predicted class)
def predict(w,b,x):
    if f(w,b,x)>=0.5:
        return 1
    else:
        return 0
# predict all data
def test_all(w,b,x,y):
    cnt=0
    for i in range(len(x)):
        if predict(w,b,x[i])==y[i]:
            cnt+=1
    return cnt/len(x)
# feature extraction: map (x,y) to its distance from the origin
def extract(x):
    x=np.array(x)
    if x.shape==(2,):
        return np.array([np.sqrt(x[0]**2+x[1]**2)])
    else:
        xp=[]
        for i in range(len(x)):
            xp.append([np.sqrt(x[i][0]**2+x[i][1]**2)])
        return np.array(xp)
x_train=np.array(x_train)
feature=extract(x_train)
# training !!!
w,b=[0.1],1
history={'w':[w],'b':[b],'loss':[loss(w,b)],'acc':[test_all(w,b,feature,y_train)]}
learning_rate=1
iters=1000
print("start training")
start=time.time()
for cnt in range(iters):
    last=list(w)
    for i in range(1):
        w[i]-=learning_rate*dldw(last,b,i)
    b-=learning_rate*dldb(last,b)
    history['w'].append(list(w))
    history['b'].append(b)
    history['loss'].append(loss(w,b))
    history['acc'].append(test_all(w,b,feature,y_train))
print("finished")
end=time.time()
print(f"time: {end-start} seconds")
print(f"result: w = {w}, b = {b}")
print(f"accuracy: {test_all(w,b,feature,y_train)*100}%")
# plot boundary
X = np.linspace(-6,6,100)
X, Y = np.meshgrid(X, X)
Z = X * 0
for i in range(len(X)):
    for j in range(len(X)):
        Z[i][j]=f(w,b,extract([X[i][j],Y[i][j]]))
fig, ax = plt.subplots(figsize=(7,5))
pcm = ax.pcolor(X, Y, Z, norm=colors.Normalize(vmin=Z.min(), vmax=Z.max()), cmap=cm.jet)
ax.scatter([x_train[i][0] for i in range(n) if y_train[i]==0],
           [x_train[i][1] for i in range(n) if y_train[i]==0],c="red",s=3)
ax.scatter([x_train[i][0] for i in range(n) if y_train[i]==1],
           [x_train[i][1] for i in range(n) if y_train[i]==1],c="blue",s=3)
fig.colorbar(pcm, ax=ax, extend='max')
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
plt.show()
# plotting result
# you can just ignore this part when
# you're reading as a beginner
fig, ax = plt.subplots(1,2)
fig.patch.set_alpha(0)
fig.set_size_inches(fig.get_figwidth()*2,fig.get_figheight())
plt.close()
ax[0].set_xlim(-iters*0.1,iters*1.1)
ax[0].set_ylim(-0.1*max(history['acc']),max(history['acc'])*1.1)
X = np.linspace(-6,6,100)
X, Y = np.meshgrid(X, X)
Z = Y * 0
for x in range(len(X)):
    for y in range(len(X)):
        Z[x][y]=f(history['w'][0],history['b'][0],extract([X[x][y],Y[x][y]]))
pcm = ax[1].pcolor(X, Y, Z, norm=colors.Normalize(vmin=Z.min(), vmax=Z.max()), cmap=cm.jet)
fig.colorbar(pcm, ax=ax, extend='max')
ax[0].set_xlabel('$iteration$')
ax[0].set_ylabel('$acc$')
ax[1].set_xlabel('$x$')
ax[1].set_ylabel('$y$')
x0, y0 = [], []
line0, = ax[0].plot([],[],c='black')
def animate(i):
    for x in range(len(X)):
        for y in range(len(X)):
            Z[x][y]=f(history['w'][int(i)],history['b'][int(i)],extract([X[x][y],Y[x][y]]))
    ax[1].pcolor(X, Y, Z, norm=colors.Normalize(vmin=Z.min(), vmax=Z.max()), cmap=cm.jet)
    ax[1].scatter([x_train[i][0] for i in range(n) if y_train[i]==0],
                  [x_train[i][1] for i in range(n) if y_train[i]==0],c="red",s=3)
    ax[1].scatter([x_train[i][0] for i in range(n) if y_train[i]==1],
                  [x_train[i][1] for i in range(n) if y_train[i]==1],c="blue",s=3)
    x0.append(i)
    y0.append(history['acc'][int(i)])
    line0.set_data(x0,y0)
    return (line0,)
anim = animation.FuncAnimation(fig, animate, frames=np.linspace(0,iters-1,50), interval=30, blit=True)
rc('animation', html='jshtml')
anim
import matplotlib.colors as colors
import matplotlib.cm as cm
from matplotlib import animation, rc
from IPython.display import HTML
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import time
url = 'https://thomasinfor.github.io/train-data/feature_extraction2.csv'
data = pd.read_csv(url)
n=len(data['x'])
temp=[data['x'],data['y']]
x_train=np.array([[temp[0][i],temp[1][i]] for i in range(n)])
y_train=np.array(data['class'])
plt.scatter([x_train[i][0] for i in range(n) if y_train[i]==0],
[x_train[i][1] for i in range(n) if y_train[i]==0],c="red",s=3)
plt.scatter([x_train[i][0] for i in range(n) if y_train[i]==1],
[x_train[i][1] for i in range(n) if y_train[i]==1],c="blue",s=3)
plt.show()
# define functions
def sigmoid(x):
    return 1/(1+np.exp(-x))
# inner product (the extracted feature is 1-D)
def dot(w,x):
    total=0
    for i in range(1):
        total=total+w[i]*x[i]
    return total
# function set
def f(w,b,x):
    return sigmoid(dot(w,x)+b)
# cross entropy (single loss)
def cross_entropy(a,b):
    a=np.clip(a,1e-12,1-1e-12)  # avoid log(0)
    return -b*np.log(a)-(1-b)*np.log(1-a)
# loss function (average over all data)
def loss(w,b):
    total=0
    for i in range(n):
        total+=cross_entropy(f(w,b,feature[i]),y_train[i])
    return total/n
# derivative dL/dw_i
def dldw(w,b,i):
    total=0
    for j in range(n):
        total=total+(f(w,b,feature[j])-y_train[j])*feature[j][i]
    return total/n
# derivative dL/db
def dldb(w,b):
    total=0
    for j in range(n):
        total=total+(f(w,b,feature[j])-y_train[j])
    return total/n
# predict answer (return predicted class)
def predict(w,b,x):
    if f(w,b,x)>=0.5:
        return 1
    else:
        return 0
# predict all data
def test_all(w,b,x,y):
    cnt=0
    for i in range(len(x)):
        if predict(w,b,x[i])==y[i]:
            cnt+=1
    return cnt/len(x)
# feature extraction: map (x,y) to the product x*y
def extract(x):
    x=np.array(x)
    if x.shape==(2,):
        return np.array([x[0]*x[1]])
    else:
        xp=[]
        for i in range(len(x)):
            xp.append([x[i][0]*x[i][1]])
        return np.array(xp)
x_train=np.array(x_train)
feature=extract(x_train)
# training !!!
w,b=[10],10
history={'w':[w],'b':[b],'loss':[loss(w,b)],'acc':[test_all(w,b,feature,y_train)]}
learning_rate=1
iters=1000
print("start training")
start=time.time()
for cnt in range(iters):
    last=list(w)
    for i in range(1):
        w[i]-=learning_rate*dldw(last,b,i)
    b-=learning_rate*dldb(last,b)
    history['w'].append(list(w))
    history['b'].append(b)
    history['loss'].append(loss(w,b))
    history['acc'].append(test_all(w,b,feature,y_train))
print("finished")
end=time.time()
print(f"time: {end-start} seconds")
print(f"result: w = {w}, b = {b}")
print(f"accuracy: {test_all(w,b,feature,y_train)*100}%")
# plotting result
# you can just ignore this part when
# you're reading as a beginner
fig, ax = plt.subplots(1,2)
fig.patch.set_alpha(0)
fig.set_size_inches(fig.get_figwidth()*2,fig.get_figheight())
plt.close()
ax[0].set_xlim(-iters*0.1,iters*1.1)
ax[0].set_ylim(-0.1*max(history['acc']),max(history['acc'])*1.1)
X = np.linspace(-6,6,100)
X, Y = np.meshgrid(X, X)
Z = Y * 0
for x in range(len(X)):
    for y in range(len(X)):
        Z[x][y]=f(history['w'][0],history['b'][0],extract([X[x][y],Y[x][y]]))
pcm = ax[1].pcolor(X, Y, Z, norm=colors.Normalize(vmin=Z.min(), vmax=Z.max()), cmap=cm.jet)
fig.colorbar(pcm, ax=ax, extend='max')
ax[0].set_xlabel('$iteration$')
ax[0].set_ylabel('$acc$')
ax[1].set_xlabel('$x$')
ax[1].set_ylabel('$y$')
x0, y0 = [], []
line0, = ax[0].plot([],[],c='black')
def animate(i):
    for x in range(len(X)):
        for y in range(len(X)):
            Z[x][y]=f(history['w'][int(i)],history['b'][int(i)],extract([X[x][y],Y[x][y]]))
    ax[1].pcolor(X, Y, Z, norm=colors.Normalize(vmin=Z.min(), vmax=Z.max()), cmap=cm.jet)
    ax[1].scatter([x_train[i][0] for i in range(n) if y_train[i]==0],
                  [x_train[i][1] for i in range(n) if y_train[i]==0],c="red",s=3)
    ax[1].scatter([x_train[i][0] for i in range(n) if y_train[i]==1],
                  [x_train[i][1] for i in range(n) if y_train[i]==1],c="blue",s=3)
    x0.append(i)
    y0.append(history['acc'][int(i)])
    line0.set_data(x0,y0)
    return (line0,)
anim = animation.FuncAnimation(fig, animate, frames=np.linspace(0,iters-1,50), interval=30, blit=True)
rc('animation', html='jshtml')
anim
 | manual transformation | machine learning
---|---|---
when to use | some patterns are already known; familiar domain | no patterns can be found
method | transform during preprocessing | neural networks!!
[Figure: a neuron computes \(\sigma(\Sigma_1+b_1)=P_1\) from inputs \(x_1,\ldots,x_5\) and weights \(w_1,\ldots,w_5\).]
[Figure: INPUT, HIDDEN, OUTPUT. Hidden units \(h_1\) and \(h_2\) recognize feature 1 and feature 2; \(h_3\) and \(h_4\) recognize features 3 and 4 according to features 1 and 2; outputs \(y_1, y_2\) make decisions according to features 3 and 4. Example features: lots of black area? tall? high-pitched voice? many black lines? movement speed? strong? long hair?]
[Figure: a fully connected network with inputs \(x_0,\ldots,x_4\), hidden layers \(a_0,\ldots,a_6\), \(b_0,\ldots,b_6\), \(c_0,\ldots,c_6\), and outputs \(y_0, y_1, y_2\).]
[Figure: weights \(w_{ij}\) connecting units \(a_1,\ldots,a_4\) to units \(b_1, b_2, b_3\).]
[Figure: the network as matrices: input \(X\), weight matrices \(W_1, W_2, W_3, W_4\), output \(Y\).]
\(\sigma(W_4\,\sigma(W_3\,\sigma(W_2\,\sigma(W_1\,X)))) = Y\)
so many layers \(\cdots\)
[Figure: for one INPUT, the OUTPUT is \((y_0,y_1,y_2)=(0.01, 0.23, 0.76)\) while the ans is \((0,0,1)\).]
In information theory, the cross entropy between two probability distributions \(p\) and \(q\) over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution \(q\), rather than the true distribution \(p\).
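A minimal sketch of this quantity for the figure above, with the one-hot ans as \(p\) and the network's output as \(q\):

import numpy as np

p = np.array([0, 0, 1])           # ans (true distribution)
q = np.array([0.01, 0.23, 0.76])  # output (estimated distribution)
print(-(p * np.log(q)).sum())     # ~0.274; shrinks as q approaches p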
[Figure: backpropagation: a unit \(v_1\) feeds units \(b_1, b_2\) through weights \(w_1, w_2\), which feed \(d_1, d_2\), \(\cdots\), up to the cost \(C\); likewise for \(w_3, w_4, \cdots\)]
Wait, I have to write out all of that?!
[Figure: a layer of 25 neurons \(\cdots\)]
import torch.nn as nn
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(25, 25),
            nn.ReLU(),
            nn.Linear(25, 2)
        )
    def forward(self, x):
        return self.model(x)
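A quick usage check of the sketch above (assuming PyTorch is installed):

import torch

model = Model()
out = model(torch.randn(1, 25))  # one sample with 25 features
print(out.shape)                 # torch.Size([1, 2])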
model.add(Dense(1,activation='sigmoid'))
model.summary()
Model: "sequential_128"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_392 (Dense) (None, 25) 75
_________________________________________________________________
dense_393 (Dense) (None, 25) 650
_________________________________________________________________
dense_394 (Dense) (None, 1) 26
=================================================================
Total params: 751
Trainable params: 751
Non-trainable params: 0
_________________________________________________________________
model.compile(loss='binary_crossentropy',optimizer=SGD())
loss function
optimizer: stochastic gradient descent (Stochastic Gradient Descent, SGD)
'binary_crossentropy'
# -[y log(Y) + (1-y) log(1-Y)]
'categorical_crossentropy'
# cross entropy over many classes
'mse'
# mean squared error
...
SGD(...)
Adam(...)
RMSprop(...)
Adagrad(...)
model.fit(x,y,batch_size,epochs,verbose=1)
# x: the training data's x
# y: the training data's y
# batch_size: the batch size
# epochs: how many passes to make over the whole training set
# verbose: whether to print the training progress
# see the Keras documentation for more parameters
Epoch 1/10
1000/1000 [==============================] - 1s 910us/step - loss: 0.7360
Epoch 2/10
1000/1000 [==============================] - 0s 19us/step - loss: 0.6937
Epoch 3/10
1000/1000 [==============================] - 0s 19us/step - loss: 0.6938
Epoch 4/10
1000/1000 [==============================] - 0s 26us/step - loss: 0.6930
Epoch 5/10
1000/1000 [==============================] - 0s 16us/step - loss: 0.6925
Epoch 6/10
1000/1000 [==============================] - 0s 21us/step - loss: 0.6927
Epoch 7/10
1000/1000 [==============================] - 0s 22us/step - loss: 0.6929
Epoch 8/10
1000/1000 [==============================] - 0s 17us/step - loss: 0.6927
Epoch 9/10
1000/1000 [==============================] - 0s 20us/step - loss: 0.6935
Epoch 10/10
1000/1000 [==============================] - 0s 22us/step - loss: 0.6930
print(model.evaluate(x,y,verbose=1))
1000/1000 [==============================] - 0s 414us/step
0.6922322082519531
print(model.predict([[1,2],[3,4],[5,6],[7,8]]))
# print(model.predict(x))
[[9.9988043e-01]
[4.0138029e-03]
[8.0009222e-06]
[1.1767075e-06]]
# array of model's prediction
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
import pandas as pd
url = 'https://thomasinfor.github.io/train-data/feature_extraction1.csv'
data = pd.read_csv(url)
n=len(data['x'])
x_train=np.array([[data['x'][i],data['y'][i]] for i in range(n)])
y_train=np.array(data['class'])
plt.scatter([x_train[i][0] for i in range(n)],
[x_train[i][1] for i in range(n)],
c=y_train,cmap=cm.bwr,s=3,vmin=0,vmax=1)
plt.show()
import keras
from keras.layers import Dense
from keras.optimizers import SGD
from keras.models import Sequential
from keras.callbacks import LambdaCallback
# not important
def result(model,xlim,ylim,prtdata=True,density=100):
    coord=[[x,y] for x in np.linspace(-xlim,xlim,density) for y in np.linspace(-ylim,ylim,density)]
    coord=np.array(coord)
    res=model.predict(coord)
    res=[i[0] for i in res]
    plt.xlim(-xlim,xlim)
    plt.ylim(-ylim,ylim)
    plt.scatter([coord[i][0] for i in range(len(coord))],[coord[i][1] for i in range(len(coord))],c=res,cmap=cm.bwr,s=0.3,vmin=0,vmax=1)
    if prtdata:
        plt.scatter([x_train[i][0] for i in range(n)],
                    [x_train[i][1] for i in range(n)],
                    c=y_train,cmap=cm.bwr,s=3,vmin=0,vmax=1)
    plt.show()
# 1.function set
model=Sequential()
model.add(Dense(25,input_shape=(2,),activation='sigmoid'))
model.add(Dense(25,activation='sigmoid'))
model.add(Dense(1,activation='sigmoid'))
# 2.define loss 3-1.calculate gradient
model.compile(loss='binary_crossentropy',optimizer=SGD(0.9),metrics=['accuracy'])
# print function
model.summary()
# not important
def func(epoch,logs):
    if epoch%50==0:
        print(epoch,':',sep='')
        result(model,6,6,False)
plot=LambdaCallback(on_epoch_end=func)
# 3-2.update function
model.fit(x_train,y_train,batch_size=100,epochs=500,callbacks=[plot],verbose=0)
# print result
print(model.evaluate(x_train,y_train))
result(model,6,6)
url = 'https://thomasinfor.github.io/train-data/feature_extraction2.csv'
url = 'https://thomasinfor.github.io/train-data/feature_extraction3.csv'
import keras
from keras.layers import Dense
from keras.optimizers import SGD
from keras.models import Sequential
from keras.callbacks import LambdaCallback
# not important
def result(model,xlim,ylim,prtdata=True,density=100):
    coord=[[x,y] for x in np.linspace(-xlim,xlim,density) for y in np.linspace(-ylim,ylim,density)]
    coord=np.array(coord)
    res=model.predict(coord)
    res=[i[0] for i in res]
    plt.xlim(-xlim,xlim)
    plt.ylim(-ylim,ylim)
    plt.scatter([coord[i][0] for i in range(len(coord))],[coord[i][1] for i in range(len(coord))],c=res,cmap=cm.bwr,s=0.3,vmin=0,vmax=1)
    if prtdata:
        plt.scatter([x_train[i][0] for i in range(n)],
                    [x_train[i][1] for i in range(n)],
                    c=y_train,cmap=cm.bwr,s=3,vmin=0,vmax=1)
    plt.show()
# 1.function set
model=Sequential()
model.add(Dense(...,input_shape=(2,),activation='sigmoid'))
model.add(Dense(...,activation='sigmoid'))
...
model.add(Dense(1,activation='sigmoid'))
# 2.define loss 3-1.calculate gradient
model.compile(loss='binary_crossentropy',optimizer=SGD(...),metrics=['accuracy'])
# print function
model.summary()
# not important
def func(epoch,logs):
    if epoch%50==0:
        print(epoch,':',sep='')
        result(model,6,6,False)
plot=LambdaCallback(on_epoch_end=func)
# 3-2.update function
model.fit(x_train,y_train,batch_size=50,epochs=...,callbacks=[plot],verbose=0)
# print result
print(model.evaluate(x_train,y_train))
result(model,6,6)
model=Sequential()
model.add(Dense(25,input_shape=(2,),activation='relu'))
model.add(Dense(25,activation='relu'))
model.add(Dense(1,activation='sigmoid'))
sigmoid function => gradients easily become very small => hard to update the function
RMSprop(...)
Adagrad(...)
Adam(...)
model.fit(x,y,validation_split=0.2)
what we expect
Check whether it has overfit using data outside the training data
---- you can't "overfit" data you've never seen!
Stop training
before overfitting sets in
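Keras can do this with the EarlyStopping callback; a minimal sketch (the patience value is illustrative, and x, y stand for your training data):

from keras.callbacks import EarlyStopping

# stop once validation loss has not improved for 5 epochs
early = EarlyStopping(monitor='val_loss', patience=5)
model.fit(x, y, validation_split=0.2, epochs=100, callbacks=[early])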
Every day you're a different you
[Figure: dropout: the network with inputs \(x_0, x_1\), hidden layers \(a\), \(b\), \(c\), and outputs \(y_0, y_1\), with some neurons randomly dropped.]
model.add(Dense(128))
model.add(Dropout(0.5))
model.add(Dense(128))
Regularization
L1 norm
L2 norm
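A minimal Keras sketch of attaching an L2 penalty to a layer (the 0.01 coefficient is illustrative):

from keras import regularizers
from keras.layers import Dense

# penalize large weights by adding their squared norm to the loss
layer = Dense(128, kernel_regularizer=regularizers.l2(0.01))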
from keras.utils import plot_model
plot_model(model)
from keras.models import load_model
model.save('my_model.h5')
del model
model=load_model('my_model.h5')
history=model.fit(x,y)
print(history)
# then we can...
plt.plot(history.history['acc'])
plt.show()
Sweet! We get to take other people's stuff
Sequential \(\Leftrightarrow\) functional
stacked \(\Leftrightarrow\) function-style
# sequential
model=Sequential()
model.add(Dense(128,input_shape=(10,)))
model.add(Dropout(0.5))
model.add(Dense(128))
# functional
input_layer=Input(shape=(10,))
hidden1=Dense(128)(input_layer)
hidden2=Dropout(0.5)(hidden1)
output_layer=Dense(128)(hidden2)
model=Model(input_layer,output_layer)
Treat a layer as a function!
[Figure: a fully connected \(f\) maps layer1 to layer2: \(f(\text{layer1})=\text{layer2}\), i.e. \(\text{Dense}(\text{layer1})=\text{layer2}\)]
layer2=Dense(...)(layer1)
# the first input
input1=Input(shape=(10,))
x1=Dense(128)(input1)
x1=Dense(128)(x1)
x1=Dense(32)(x1)
# the second input
input2=Input(shape=(8,))
x2=Dense(32)(input2)
x2=Dense(32)(x2)
x2=Dense(32)(x2)
# concatenate the two processed inputs
con=concatenate([x1,x2])
con=Dense(28)(con)
# branch out the first output
x=Dense(64)(con)
output1=Dense(1)(x)
# branch out the second output
x=Dense(16)(con)
output2=Dense(1)(x)
# build a (2 inputs) -> (2 outputs) model
model=Model([input1,input2],
            [output1,output2])
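Such a model is then compiled and trained with lists, one entry per input or output; a minimal sketch with hypothetical data arrays:

# each loss applies to the matching output
model.compile(loss=['mse','mse'], optimizer='adam')
model.fit([x1_data, x2_data], [y1_data, y2_data], epochs=10)  # x1_data etc. are hypothetical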
MNIST in ML = "hello world" in C++
from keras.datasets import mnist
import keras
from keras.layers import Dense
from keras.optimizers import SGD
from keras.models import Sequential
from keras.datasets import mnist
# input image dimensions
img_rows, img_cols = 28, 28
num_classes=10
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], img_rows * img_cols)
x_test = x_test.reshape(x_test.shape[0], img_rows * img_cols)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model=Sequential()
model.add(Dense(10,input_shape=(784,),activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer=SGD(),metrics=['accuracy'])
model.summary()
model.fit(x_train,y_train,epochs=10,batch_size=32)
model = Sequential()
declare the model
model.add(Dense(128,
                activation='relu',
                input_shape=(img_rows*img_cols,)))
add a hidden layer with 128 neurons, fed by the INPUT of size input_shape
model.add(Dense(128,
                activation='relu'))
add a hidden layer with 128 neurons
model.add(Dense(128,
                activation='relu'))
add a hidden layer with 128 neurons
model.add(Dense(num_classes,
                activation='softmax'))
add an output layer with 10 neurons (10 categories)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=SGD(),
              metrics=['accuracy'])
define loss function , optimize with stochastic gradient descent
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1)
train with x_train and y_train
epochs = number of passes over the data
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
evaluate accuracy and loss with x_test and y_test
model.summary()
print model
import matplotlib.pyplot as plt
plt.plot(history.history['acc'])
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.show()
plt.plot(history.history['loss'])
plt.xlabel('epochs')
plt.ylabel('loss')
plt.show()
train acc: 0.9956 test acc: 0.9775
MS Paint -- draw a digit inside a \(28\times 28\) grid
upload it
import matplotlib.pyplot as plt
import numpy as np
# convert an RGB image to grayscale
def rgb2gray(rgb):
    return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])
FILE = 'replace this with your uploaded file name'
source = rgb2gray(plt.imread(FILE))
source = 1-source  # invert: MNIST digits are light strokes on a dark background
result = model.predict(source.reshape(1,img_cols*img_rows))
print('Result:', result.argmax())
plt.imshow(source)
plt.show()
convolutional neural network
convolution + neural network
inner product
[Figure: grid \(*\) grid \(= 6\)]
Inner product: multiply two same-shaped things entry by entry, then add everything up
Every neuron is doing a convolution:
an inner product with its own weights
Inner products can pick out specific patterns
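The same operation in NumPy (toy values, not the figure's):

import numpy as np

a = np.array([[1, 0], [1, 2]])
b = np.array([[2, 1], [1, 1]])
# multiply corresponding entries, then add everything up
print((a * b).sum())  # 1*2 + 0*1 + 1*1 + 2*1 = 5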
[Figure: sliding a filter over the image gives inner products like \(5\), \(4.50\), \(-10.19\); another filter gives \(5\), \(-1\), \(-5\).]
So doesn't that mean we need all nine of these filters...?
Use the same small filter many times!
Many inner products between a big thing and a small thing \(\Rightarrow\) convolution
If that made no sense, I apologize; see the sketch below
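A minimal sketch of those many inner products in NumPy (toy arrays):

import numpy as np

image = np.random.rand(5, 5)
filt = np.random.rand(3, 3)  # the same small filter, used many times

out = np.zeros((3, 3))       # (5-3+1) x (5-3+1) positions
for i in range(3):
    for j in range(3):
        # inner product of the filter with one 3x3 patch
        out[i, j] = (image[i:i+3, j:j+3] * filt).sum()
print(out)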
These nine neurons:
model.add(Conv2D(filters,kernel_size,padding=''))
# filters: the number of filters
# kernel_size: the size of the pattern
# padding: 'valid' = normal, 'same' = keep the size
model=Sequential()
model.add(Conv2D(128,(3,3),padding='same',
                 input_shape=(28,28,1)))
model.add(Activation('relu'))
model.add(Conv2D(128,(3,3),padding='same'))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=Adam(),metrics=['accuracy'])
model.summary()
import keras
from keras.layers import Dense,Conv2D,Activation,Flatten
from keras.models import Sequential
from keras.optimizers import Adam
from keras.datasets import mnist
(x_train,y_train),(x_test,y_test)=mnist.load_data()
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
x_train=x_train.reshape(x_train.shape+(1,))/255
x_test=x_test.reshape(x_test.shape+(1,))/255
from keras.datasets import fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train.reshape(x_train.shape + (1,))
x_test = x_test.reshape(x_test.shape + (1,))