Mobile Technology and Applications
Lesson 10: More Examples
Boston Housing Price Prediction: Data Description

13 input variables
1 output variable
Training goal: predict the "median" house price

Regression
given: N input variables
output: a numeric value
Boston Housing Price Prediction: Training/Test Sets
```
# columns: CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT MEDV (MEDV = target price)
0.00632 18.00 2.310 0 0.5380 6.5750 65.20 4.0900 1 296.0 15.30 396.90 4.98 24.00
0.02731 0.00 7.070 0 0.4690 6.4210 78.90 4.9671 2 242.0 17.80 396.90 9.14 21.60
0.02729 0.00 7.070 0 0.4690 7.1850 61.10 4.9671 2 242.0 17.80 392.83 4.03 34.70
0.03237 0.00 2.180 0 0.4580 6.9980 45.80 6.0622 3 222.0 18.70 394.63 2.94 33.40
0.06905 0.00 2.180 0 0.4580 7.1470 54.20 6.0622 3 222.0 18.70 396.90 5.33 36.20
0.02985 0.00 2.180 0 0.4580 6.4300 58.70 6.0622 3 222.0 18.70 394.12 5.21 28.70
...
```
Dataset: 506 rows in total
Training set: 404 rows; test set: 102 rows
Problems:
Very few training samples!
The feature columns are on different scales: some are ratios between 0 and 1, others span different ranges.
Boston Housing Price Prediction: Data Preparation
```{r}
library(keras)
dataset <- dataset_boston_housing()
# multiple assignment from dataset to variables on the left hand side
c(c(train_data, train_targets), c(test_data, test_targets)) %<-% dataset
# Compactly display training data
str(train_data)
str(test_data)
```
Dataset:
source: dataset_boston_housing()
components: dataset$train$x, ..., dataset$test$y
The same list-access pattern works for other Keras datasets, e.g. MNIST:
```{r}
# Read a dataset and split it into training and test sets
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
```
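For the Boston dataset itself, the same access pattern is equivalent to the `%<-%` multi-assignment above; a minimal sketch:

```{r}
# List-access equivalent of the %<-% multi-assignment
train_data    <- dataset$train$x
train_targets <- dataset$train$y
test_data     <- dataset$test$x
test_targets  <- dataset$test$y
```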
Boston Housing Price Prediction: Data Preparation
```{r}
mean <- apply(train_data, 2, mean)  # per-feature means (margin 2 = column-wise)
std <- apply(train_data, 2, sd)     # per-feature standard deviations
train_data <- scale(train_data, center = mean, scale = std)  # normalize
test_data <- scale(test_data, center = mean, scale = std)    # use the training set's statistics
```
Normalization here is (value - mean) / standard deviation.
After normalization, values are centered at 0 and mostly fall within one standard deviation.
R's scale() function does exactly this; note that the test set is scaled with the training set's mean and standard deviation.
Features on different scales severely degrade NN learning, so normalization is required.
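A quick sanity check (a sketch; after scaling, each feature's mean should be ~0 and its standard deviation ~1):

```{r}
round(colMeans(train_data), 3)      # ~0 for every feature
round(apply(train_data, 2, sd), 3)  # ~1 for every feature
```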
Boston Housing Price Prediction: Building the Network
```{r}
# build_model function
build_model <- function() {
  model <- keras_model_sequential() %>%
    layer_dense(units = 64, activation = "relu",
                input_shape = dim(train_data)[[2]]) %>%
    layer_dense(units = 64, activation = "relu") %>%
    layer_dense(units = 1)
  model %>% compile(
    optimizer = "rmsprop",
    loss = "mse",
    metrics = c("mae")
  )
}
```
Input layer: dim(train_data)[[2]] = 13 input values.
Two hidden layers: 64 nodes each; keeping the model small helps avoid overfitting.
Output layer: a single node with no activation function, the standard setup for scalar regression (a linear output).
With very little training data, the network should not be too large.
- Optimizer: rmsprop
  applies a moving average (of squared gradients)
  works well for small-sample learning
- Loss function: `mse`
  Mean Squared Error
  widely used for regression problems
- Metric: `mae`
  Mean Absolute Error (formulas below)
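For reference, over $n$ samples with true values $y_i$ and predictions $\hat{y}_i$:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$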
Boston Housing Price Prediction: Cross-Validation
```{r, echo=TRUE, results='hide'}
k <- 4                                 # number of folds
indices <- sample(1:nrow(train_data))  # sample() shuffles the row indices
folds <- cut(indices, breaks = k, labels = FALSE)  # map rows to folds 1..4
num_epochs <- 100
all_scores <- c()
for (i in 1:k) {
  cat("Processing fold #", i, "\n")
  # Validation set: fold i
  val_indices <- which(folds == i, arr.ind = TRUE)
  val_data <- train_data[val_indices, ]
  val_targets <- train_targets[val_indices]
  # Training set: everything except fold i
  partial_train_data <- train_data[-val_indices, ]
  partial_train_targets <- train_targets[-val_indices]
  # Build a fresh model with the build_model() defined earlier
  model <- build_model()
  # Train the model (in silent mode, verbose = 0)
  model %>% fit(partial_train_data, partial_train_targets,
                epochs = num_epochs, batch_size = 1, verbose = 0)
  # Evaluate the trained model on the validation fold
  results <- model %>% evaluate(val_data, val_targets, verbose = 0)
  all_scores <- c(all_scores, results$mean_absolute_error)
}
```
With very little training data, cross-validation gives a more trustworthy model evaluation.
- K-fold cross-validation:
  split the training data into K folds (K is typically 4 or 5)
  build K identical network models
  train each on (K-1) folds of the training data
  validate on the remaining fold
  validation score: the average of the K fold scores (inspected below)
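The K fold scores and their average can then be inspected (the same step appears in the complete example below):

```{r}
all_scores
mean(all_scores)  # average MAE across the K folds
```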
Boston Housing Price Prediction: Parameter Tuning
```{r}
# Some memory clean-up
K <- backend()
K$clear_session()
```
```{r, echo=TRUE, results='hide'}
num_epochs <- 500
all_mae_histories <- NULL
for (i in 1:k) {
  cat("Processing fold #", i, "\n")
  # Validation set: fold i
  val_indices <- which(folds == i, arr.ind = TRUE)
  val_data <- train_data[val_indices, ]
  val_targets <- train_targets[val_indices]
  # Training set: everything except fold i
  partial_train_data <- train_data[-val_indices, ]
  partial_train_targets <- train_targets[-val_indices]
  # Build a fresh model with the build_model() defined earlier
  model <- build_model()
  # Train the model (in silent mode, verbose = 0)
  history <- model %>% fit(
    partial_train_data, partial_train_targets,
    validation_data = list(val_data, val_targets),
    epochs = num_epochs, batch_size = 1, verbose = 0
  )
  mae_history <- history$metrics$val_mean_absolute_error  # record the training history
  all_mae_histories <- rbind(all_mae_histories, mae_history)
}
```
Validation results unsatisfactory? Adjust the parameters, e.g., raise epochs from 100 to 500,
and record the per-epoch training history.
Compute the MAE score for each epoch (averaged over the 4 folds):
```{r}
average_mae_history <- data.frame(
  epoch = seq_len(ncol(all_mae_histories)),
  validation_mae = apply(all_mae_histories, 2, mean)
)
```
Plot it as follows:
```{r}
library(ggplot2)
ggplot(average_mae_history, aes(x = epoch, y = validation_mae)) + geom_line()
```
The curve is noisy, so switch to `geom_smooth()` for a smoothed view:
```{r}
ggplot(average_mae_history, aes(x = epoch, y = validation_mae)) + geom_smooth()
```
Validation MAE stops improving after about 70 epochs,
so set epochs to 80.
Other parameters also need tuning before the best model is reached; a tuning sketch follows.
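A minimal sketch of how another setting, the hidden-layer width, might be varied; `build_model_units()` is a hypothetical variant of build_model(), not part of the original example:

```{r}
# Hypothetical helper: build_model() with a configurable hidden-layer width
build_model_units <- function(units) {
  model <- keras_model_sequential() %>%
    layer_dense(units = units, activation = "relu",
                input_shape = dim(train_data)[[2]]) %>%
    layer_dense(units = units, activation = "relu") %>%
    layer_dense(units = 1)
  model %>% compile(optimizer = "rmsprop", loss = "mse", metrics = c("mae"))
}
# Each candidate width (e.g., 32, 64, 128) would then be scored with the
# same K-fold loop shown above, and the best average MAE wins.
```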
Finally, assuming an acceptable configuration has been found, train the model again on all of the training data and apply it to the test set:
```{r, echo=FALSE, results='hide'}
# Get a fresh, compiled model.
model <- build_model()
# Train it on the entirety of the data.
model %>% fit(train_data, train_targets,
              epochs = 80, batch_size = 16, verbose = 0)
result <- model %>% evaluate(test_data, test_targets)
```
```{r}
result
```
Boston Housing Price Prediction: Complete Example
```{r setup, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
if(!"ggplot2" %in% installed.packages())
install.packages('ggplot2')
# 安裝'devtools' package:方便從github安裝套件
if(!"devtools" %in% installed.packages())
install.packages('devtools')
require(devtools)
# install tensorflow(如果你要tensorflow的話)
devtools::install_github("rstudio/tensorflow")
# installing keras(如果你要keras的話)
devtools::install_github("rstudio/keras")
```
## The Boston Housing Price Dataset
Predicting the median price of homes in suburban Boston in the 1970s.
The training dataset is small: 506 rows in total, split into 404 training rows and 102 test rows.
In addition, the feature columns are on different scales: some are ratios between 0 and 1, while others range over 1 to 12 or 0 to 100.
The dataset:
```{r}
library(keras)
dataset <- dataset_boston_housing()
# multiple assignment from dataset to variables on the left hand side
c(c(train_data, train_targets), c(test_data, test_targets)) %<-% dataset
```
```{r}
# Compactly display training data
str(train_data)
```
```{r}
str(test_data)
```
The first 13 values are the input variables;
the 14th value is the house price, in thousands of dollars.
The goal is to learn the median house prices (the median values).
```{r}
str(train_targets)
```
House prices mostly fall between \$10,000 and \$50,000.
## Data Preparation
When the inputs span very different numeric ranges, NN learning suffers badly, so the input data must be normalized.
Normalization here is (value - mean) / standard deviation. After normalization, values are centered at 0, and most fall within one standard deviation.
R's scale() function does exactly this.
```{r}
mean <- apply(train_data, 2, mean)  # per-feature means (margin 2 = column-wise)
std <- apply(train_data, 2, sd)     # per-feature standard deviations
train_data <- scale(train_data, center = mean, scale = std)  # normalize
test_data <- scale(test_data, center = mean, scale = std)    # use the training set's statistics
```
## Building the Network
Because the training data is so scarce, the network should not be too large. Here we use two hidden layers of 64 nodes each; keeping the model small helps avoid overfitting.
```{r}
# build_model function
build_model <- function() {
  model <- keras_model_sequential() %>%
    layer_dense(units = 64, activation = "relu",
                input_shape = dim(train_data)[[2]]) %>%
    layer_dense(units = 64, activation = "relu") %>%
    layer_dense(units = 1)
  model %>% compile(
    optimizer = "rmsprop",
    loss = "mse",
    metrics = c("mae")
  )
}
```
The output layer is a single node with no activation function; this linear output is the standard setup for a scalar regression problem.
Loss function: the `mse` loss (Mean Squared Error), the most widely used loss function for regression problems.
Metric: `mae` (Mean Absolute Error).
## Validating our approach using K-fold validation
We use K-fold cross-validation: split the training data into K folds (K is typically 4 or 5), build K identical network models, train each on (K-1) folds of the training data, and use the remaining fold for validation. The model's validation score is then the average of the K fold scores.
K-fold cross-validation works as follows:
```{r, echo=TRUE, results='hide'}
k <- 4                                 # number of folds
indices <- sample(1:nrow(train_data))  # sample() shuffles the row indices
folds <- cut(indices, breaks = k, labels = FALSE)  # map rows to folds 1..4
num_epochs <- 100
all_scores <- c()
for (i in 1:k) {
  cat("Processing fold #", i, "\n")
  # Validation set: fold i
  val_indices <- which(folds == i, arr.ind = TRUE)
  val_data <- train_data[val_indices, ]
  val_targets <- train_targets[val_indices]
  # Training set: everything except fold i
  partial_train_data <- train_data[-val_indices, ]
  partial_train_targets <- train_targets[-val_indices]
  # Build a fresh model with the build_model() defined earlier
  model <- build_model()
  # Train the model (in silent mode, verbose = 0)
  model %>% fit(partial_train_data, partial_train_targets,
                epochs = num_epochs, batch_size = 1, verbose = 0)
  # Evaluate the trained model on the validation fold
  results <- model %>% evaluate(val_data, val_targets, verbose = 0)
  all_scores <- c(all_scores, results$mean_absolute_error)
}
```
```{r}
all_scores
```
```{r}
# use the K-fold average as the evaluation score
mean(all_scores)
```
The per-fold evaluation scores vary considerably, so their average is a more reasonable evaluation metric.
However, judged by this average, the trained model is off by about \$2,500 (the targets are in thousands of dollars, so an MAE of 2.5 means \$2,500), which still seems high given that prices range from \$10,000 to \$50,000.
Retune the epochs: from 100 to 500 epochs.
```{r}
# Clear memory before retraining
K <- backend()
K$clear_session()
```
```{r, echo=TRUE, results='hide'}
num_epochs <- 500
all_mae_histories <- NULL
for (i in 1:k) {
  cat("Processing fold #", i, "\n")
  # Validation set: fold i
  val_indices <- which(folds == i, arr.ind = TRUE)
  val_data <- train_data[val_indices, ]
  val_targets <- train_targets[val_indices]
  # Training set: everything except fold i
  partial_train_data <- train_data[-val_indices, ]
  partial_train_targets <- train_targets[-val_indices]
  # Build a fresh model with the build_model() defined earlier
  model <- build_model()
  # Train the model (in silent mode, verbose = 0)
  history <- model %>% fit(
    partial_train_data, partial_train_targets,
    validation_data = list(val_data, val_targets),
    epochs = num_epochs, batch_size = 1, verbose = 0
  )
  mae_history <- history$metrics$val_mean_absolute_error  # record the training history
  all_mae_histories <- rbind(all_mae_histories, mae_history)
}
```
Compute the MAE score for each epoch (averaged over the 4 folds):
```{r}
average_mae_history <- data.frame(
  epoch = seq_len(ncol(all_mae_histories)),
  validation_mae = apply(all_mae_histories, 2, mean)
)
```
Plot it as follows:
```{r}
library(ggplot2)
ggplot(average_mae_history, aes(x = epoch, y = validation_mae)) + geom_line()
```
The curve is noisy, so switch to `geom_smooth()` for a smoothed view:
```{r}
ggplot(average_mae_history, aes(x = epoch, y = validation_mae)) + geom_smooth()
```
According to this plot, validation MAE stops improving after about 70 epochs.
So set epochs to 80; other parameters also need tuning before the best model is reached (see the early-stopping sketch below).
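One alternative to hand-picking 80 epochs is to stop training automatically once validation MAE stops improving; a sketch using callback_early_stopping() (not part of the original example; the monitor name matches the history metric recorded above):

```{r}
# Sketch: early stopping instead of a fixed epoch count
early_stop <- callback_early_stopping(monitor = "val_mean_absolute_error",
                                      patience = 10)
model <- build_model()
model %>% fit(partial_train_data, partial_train_targets,
              validation_data = list(val_data, val_targets),
              epochs = 500, batch_size = 16, verbose = 0,
              callbacks = list(early_stop))
```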
Finally, assuming an acceptable configuration has been found, train the model again on all of the training data and apply it to the test set:
```{r, echo=FALSE, results='hide'}
# Get a fresh, compiled model.
model <- build_model()
# Train it on the entirety of the data.
model %>% fit(train_data, train_targets,
              epochs = 80, batch_size = 16, verbose = 0)
result <- model %>% evaluate(test_data, test_targets)
```
```{r}
result
```
Boston Housing Price Prediction: What We Learned
- Regression problems use the MSE loss function, unlike classification problems.
- Evaluation metric: 'acc' (accuracy) does not apply to regression problems; use 'mae' instead.
- Input features on different scales: normalize them as a preprocessing step.
- Very little training data: use cross-validation (e.g., K-fold validation).
- Very little training data: use a small network with few hidden layers (1 or 2).
Recurrent Neural Networks
Recurrent Neural Network (RNN)
Long short-term memory (LSTM)

RNN

Feedback loop:
today's prediction result is reused tomorrow, becoming tomorrow's "yesterday's prediction" (a toy sketch of this recurrence follows).
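A toy base-R sketch of that recurrence, h_t = tanh(W x_t + U h_{t-1} + b), with made-up sizes:

```{r}
# Toy RNN step: the hidden state h is fed back in at every timestep
set.seed(1)
W <- matrix(rnorm(3 * 2), 3, 2)       # input-to-hidden weights (3 hidden units, 2 inputs)
U <- matrix(rnorm(3 * 3), 3, 3)       # hidden-to-hidden weights (the feedback loop)
b <- rnorm(3)
h <- rep(0, 3)                        # initial hidden state
inputs <- matrix(rnorm(5 * 2), 5, 2)  # a sequence of 5 timesteps
for (t in 1:5) {
  h <- tanh(W %*% inputs[t, ] + U %*% h + b)  # yesterday's h is reused today
}
h
```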
Bidirectional LSTM

The IMDb Dataset
IMDb: the Internet Movie Database
The data are movie-review texts
Application: sentiment analysis / opinion mining
50,000 reviews, already word-frequency-counted and indexed
Training set: 25,000 reviews
Test set: 25,000 reviews
Labels: positive or negative review
Model: RNN with long short-term memory (LSTM)
Considers the surrounding context (other neurons in the same layer) so that words are not read out of context
Long short-term memory: a memory mechanism that fixes a weakness of plain RNNs (the standard update equations are given below)
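For reference, the standard LSTM update, with input/forget/output gates $i_t, f_t, o_t$, cell state $c_t$, sigmoid $\sigma$, and elementwise product $\odot$:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i)\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f)\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$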
The IMDb Dataset: notes from the Keras documentation
The num_words parameter
keeps only the most common words.
The IMDb Dataset: Preprocessing Parameters
```{r}
library(keras)
# keep only the 20,000 most frequent words
max_features <- 20000
# cut each review after 100 words
# (among top max_features most common words)
maxlen <- 100
batch_size <- 32
```
Every review must have the same length before it can be fed into the NN for training.
The vocabulary is too large, so uncommon words are filtered out.
batch_size is an NN training parameter.
The IMDb Dataset: Preprocessing
```{r}
# Load imdb dataset
cat('Loading data...\n')
imdb <- dataset_imdb(num_words = max_features)
# Define training and test sets
x_train <- imdb$train$x
y_train <- imdb$train$y
x_test <- imdb$test$x
y_test <- imdb$test$y
# Output lengths of testing and training sets
cat(length(x_train), 'train sequences\n')
cat(length(x_test), 'test sequences\n')
```
max_features = 20000: keep the 20,000 most frequently occurring words.
```{r}
cat('Pad sequences (samples x time)\n')
# Pad training and test inputs
x_train <- pad_sequences(x_train, maxlen = maxlen)
x_test <- pad_sequences(x_test, maxlen = maxlen)
# Output dimensions of training and test inputs
cat('x_train shape:', dim(x_train), '\n')
cat('x_test shape:', dim(x_test), '\n')
```
pad_sequences(): truncates reviews beyond 100 words and zero-pads shorter ones.
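A toy illustration of the padding and truncation behavior (a sketch with maxlen = 5; by default both padding and truncation happen at the front):

```{r}
pad_sequences(list(c(7, 8, 9)), maxlen = 5)           # -> 0 0 7 8 9 (zero-padded)
pad_sequences(list(c(1, 2, 3, 4, 5, 6)), maxlen = 5)  # -> 2 3 4 5 6 (truncated)
```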
The IMDb Dataset: The Model
```{r}
model <- keras_model_sequential()
model %>%
  # Creates dense embedding layer; outputs 3D tensor
  # with shape (batch_size, sequence_length, output_dim)
  layer_embedding(input_dim = max_features,
                  output_dim = 128,
                  input_length = maxlen) %>%
  bidirectional(layer_lstm(units = 64)) %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 1, activation = 'sigmoid')
```
layer_embedding(): maps word indices to fixed-length (output_dim) vectors
layer_lstm(): the long short-term memory layer
bidirectional(): processes the sequence in both directions
The IMDb Dataset: Why an Embedding Layer?
One-hot encoding is extremely sparse:
with an input layer over the 20,000 most frequent words and at most 100 words per review, more than 19,900 inputs would be 0.
Instead, each word is replaced by a fixed-length vector (e.g., of length 5).
Dimension: 20000 -> 128 (a sketch of the lookup follows).
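Conceptually, the embedding layer is just a trainable lookup table of shape 20000 x 128; a base-R sketch with random (untrained) weights:

```{r}
# Conceptual sketch: an embedding is a lookup into a weight matrix
vocab_size <- 20000
embed_dim  <- 128
E <- matrix(rnorm(vocab_size * embed_dim), vocab_size, embed_dim)
word_ids <- c(12, 305, 4711)  # hypothetical word indices for a 3-word snippet
dim(E[word_ids, ])            # 3 x 128: one dense 128-vector per word
```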
The IMDb Dataset: Compiling the Model
```{r}
# Try using different optimizers and different optimizer configs
model %>% compile(
  loss = 'binary_crossentropy',
  optimizer = 'adam',
  metrics = c('accuracy')
)
```
The IMDb Dataset: Training
```{r}
# Train model over four epochs
cat('Train...\n')
model %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = 4,
  validation_data = list(x_test, y_test)
)
```
The validation accuracy is modest, about 84%.
The IMDb Dataset: Complete Example
```{r}
#' Train a Bidirectional LSTM on the IMDB sentiment classification task.
#'
#' Output after 4 epochs on CPU: ~0.8146
#' Time per epoch on CPU (Core i7): ~150s.
library(keras)
# Define maximum number of input features
max_features <- 20000
# Cut texts after this number of words
# (among top max_features most common words)
maxlen <- 100
batch_size <- 32
```
```{r}
# Load imdb dataset
cat('Loading data...\n')
imdb <- dataset_imdb(num_words = max_features)
# Define training and test sets
x_train <- imdb$train$x
y_train <- imdb$train$y
x_test <- imdb$test$x
y_test <- imdb$test$y
# Output lengths of testing and training sets
cat(length(x_train), 'train sequences\n')
cat(length(x_test), 'test sequences\n')
```
```{r}
cat('Pad sequences (samples x time)\n')
# Pad training and test inputs
x_train <- pad_sequences(x_train, maxlen = maxlen)
x_test <- pad_sequences(x_test, maxlen = maxlen)
# Output dimensions of training and test inputs
cat('x_train shape:', dim(x_train), '\n')
cat('x_test shape:', dim(x_test), '\n')
```
```{r}
# Initialize model
model <- keras_model_sequential()
model %>%
  # Creates dense embedding layer; outputs 3D tensor
  # with shape (batch_size, sequence_length, output_dim)
  layer_embedding(input_dim = max_features,
                  output_dim = 128,
                  input_length = maxlen) %>%
  bidirectional(layer_lstm(units = 64)) %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 1, activation = 'sigmoid')
# Try using different optimizers and different optimizer configs
model %>% compile(
  loss = 'binary_crossentropy',
  optimizer = 'adam',
  metrics = c('accuracy')
)
# Train model over four epochs
cat('Train...\n')
model %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = 4,
  validation_data = list(x_test, y_test)
)
```
By Leuo-Hong Wang