OpenCV Tutorial Example Programs

Part 6

Machine Learning

Main sections

26_k-nearest
27_k-means
28_svm

26_k-nearest

K-Nearest Neighbor (kNN)

The k nearest neighbors

The k-Nearest Neighbor (k-NN) classifier is a supervised learning algorithm and a lazy learner. It is called a lazy algorithm because it does not learn a discriminative function from the training data; it memorizes the training dataset instead.

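Suppose x is the point to be predicted.
Find the k data points nearest to x;

the class that the majority of them belong to is the predicted class of x.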
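The majority-vote rule above can be sketched in a few lines of plain Python. This is only an illustration, not the tutorial's OpenCV program: the function name `knn_predict`, the toy points, and the use of Euclidean distance are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, x, k):
    """Predict the class of x by majority vote among its k nearest training points."""
    # Order the training samples by their Euclidean distance to x.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], x))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # The most frequent label among the k neighbours is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two groups of points in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(points, labels, (2.0, 2.0), k=3))
print(knn_predict(points, labels, (6.0, 5.0), k=3))
```

With two classes, an odd k avoids tied votes; in this sketch a tie would simply be resolved in favour of the label seen first among the neighbours.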

How should k be chosen?

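KNN is a supervised learning method in machine learning. Supervised learning normally trains a model from data, but KNN does not actually perform a training step. KNN is generally used for classification: if you already have a set of data with known classes, a newly added point can be assigned a class with KNN.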

Ref: https://towardsdatascience.com/knn-k-nearest-neighbors-1-a4707b24bd1d

Ref: https://northbei.medium.com/machine-learning-knn%E5%88%86%E9%A1%9E%E6%BC%94%E7%AE%97%E6%B3%95-b3e9b5aea8df

Ref: https://www.bogotobogo.com/python/scikit-learn/scikit_machine_learning_k-NN_k-nearest-neighbors-algorithm.php

Distance calculation

For two points (x_1, y_1) and (x_2, y_2):

Manhattan distance

d = |x_1-x_2| + |y_1-y_2|

Euclidean distance

d = \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}
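As a quick check of the two formulas, here is a minimal sketch in plain Python. The function names and the sample points are chosen only for illustration.

```python
import math

def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def euclidean(p, q):
    """Straight-line distance between the two points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

p, q = (1, 2), (4, 6)   # hypothetical points (x_1, y_1) and (x_2, y_2)
print(manhattan(p, q))  # 7
print(euclidean(p, q))  # 5.0
```

The Manhattan distance is never smaller than the Euclidean distance for the same pair of points, so the two metrics can rank neighbours differently.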

k=5

Does the green circle belong to the red group or the blue group?

Ref: Understanding k-Nearest Neighbour

Example: OCR character recognition

Ref: OCR of Hand-written Data using kNN

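The full image is 1000x2000 pixels and contains 5000 digit images

Each digit has 500 images, and each image is 20x20

The first half is used as the training set

The second half is used as the test set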

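(50,100,20,20): the image cut into 50 rows x 100 columns of 20x20 cells

Split into two: (50,50,20,20) each

reshape to (2500,400)

Each digit becomes a one-dimensional array of 400 values

2500 such arrays in total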
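The shape changes listed above can be reproduced with NumPy. The sketch below uses a synthetic array instead of the real digit sheet, so the pixel values are meaningless. It assumes the image is 1000 rows by 2000 columns (which matches the (50,100,20,20) shape), that the left 50 columns of cells form the first half, and that each digit 0 to 9 occupies 5 consecutive rows of cells. The variable names are illustrative.

```python
import numpy as np

# Synthetic stand-in for the 1000x2000 grayscale digit sheet (rows x columns).
sheet = (np.arange(1000 * 2000) % 256).astype(np.uint8).reshape(1000, 2000)

# Cut the sheet into 50 rows x 100 columns of 20x20 cells.
cells = np.array([np.hsplit(row, 100) for row in np.vsplit(sheet, 50)])
print(cells.shape)                    # (50, 100, 20, 20)

# Split into two halves: left 50 columns for training, right 50 for testing.
train_cells = cells[:, :50]           # (50, 50, 20, 20)
test_cells = cells[:, 50:]            # (50, 50, 20, 20)

# Flatten every 20x20 cell into a row of 400 values, stored as float32.
train = train_cells.reshape(-1, 400).astype(np.float32)
test = test_cells.reshape(-1, 400).astype(np.float32)
print(train.shape, test.shape)        # (2500, 400) (2500, 400)

# Assumed layout: 5 rows of cells per digit, so 250 training samples per digit.
train_labels = np.repeat(np.arange(10), 250)[:, np.newaxis]
print(train_labels.shape)             # (2500, 1)
```

Each row of `train` is one sample, which is the row-per-sample layout that a kNN classifier typically expects.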

......

27_k-means

K-Means Clustering

K-Means Clustering is an unsupervised learning algorithm.

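Clustering divides all the data into groups so that similar data end up together. Each data point is assigned to one group, and each group is called a cluster.

... Assign each point to its nearest cluster center (straight-line distance can be used). Recompute the cluster center of each group (the mean is commonly used).

Repeat the steps above until the clusters no longer change and the cluster centers stop moving.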
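The assign, recompute, and repeat loop can be sketched as follows in plain Python. This is a minimal illustration for 2-D points, not the tutorial's code: the function name `kmeans`, the toy data, and the choice of two data points as starting centers are assumptions.

```python
import math

def kmeans(points, centers, max_iter=100):
    """k-means on 2-D points, starting from the given initial centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (straight-line distance).
        new_labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(old)   # an empty cluster keeps its old center
        # Stop once neither the clusters nor the centers change.
        if new_labels == labels and new_centers == centers:
            break
        labels, centers = new_labels, new_centers
    return labels, centers

# Hypothetical data with two obvious groups, k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centers = kmeans(points, centers=[(1, 1), (9, 8)])
print(labels)
print(centers)
```

The `max_iter` cap is a safety net added for the sketch; the loop normally ends through the convergence check.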

k = 2

Ref: https://www.ml-science.com/k-means-clustering

http://shabal.in/visuals/kmeans/1.html

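Choosing the initial points for k-means

The result of k-means depends heavily on the choice of initial points.

See this website:

1.html original distribution
2.html four left-most points
3.html four right-most points
4.html four top-most points
5.html four bottom-most points
6.html four random points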

Original data distribution

4 left-most points

4 right-most points

4 top points

4 bottom points

4 random points
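A tiny example shows why the starting points matter. The sketch below runs k-means with k = 2 on the four corners of a 1x2 rectangle, once starting from the two bottom points and once from the two left points, and compares the total squared distance to the final centers. The data, the function name `kmeans_sse`, and the use of squared error as the quality measure are assumptions made for illustration; they are not taken from the referenced website.

```python
import math

def kmeans_sse(points, centers, max_iter=100):
    """Run k-means from the given initial centers; return (labels, total squared error)."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members))
                               if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse

# Four corners of a rectangle that is 1 wide and 2 tall.
points = [(0, 0), (1, 0), (0, 2), (1, 2)]

labels_bottom, sse_bottom = kmeans_sse(points, [(0, 0), (1, 0)])  # two bottom points
labels_left, sse_left = kmeans_sse(points, [(0, 0), (0, 2)])      # two left points
print(labels_bottom, sse_bottom)
print(labels_left, sse_left)
```

Starting from the bottom points, the algorithm settles on a left/right split with a larger error than the top/bottom split found from the left points. Both results are stable, so k-means does not recover from the poorer start on its own.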

Application: color_quantization

Color quantization

Ref: https://demonstrations.wolfram.com/ColorQuantizationOfPhotographicImagesIIPaletteFromColorsOutO/
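Color quantization can be viewed as k-means applied to pixel colors: the cluster centers become the palette, and every pixel is replaced by the palette color of its cluster. The sketch below works on a short list of RGB tuples instead of a real image. The function name `quantize_colors`, the sample pixels, and the initial palette are assumptions made for illustration.

```python
import math

def quantize_colors(pixels, palette, max_iter=50):
    """Reduce RGB pixels to len(palette) colors; `palette` holds the initial centers."""
    palette = [tuple(float(ch) for ch in color) for color in palette]
    for _ in range(max_iter):
        # Assign each pixel to the nearest palette color in RGB space.
        labels = [min(range(len(palette)), key=lambda j: math.dist(p, palette[j]))
                  for p in pixels]
        # Move each palette color to the mean of the pixels assigned to it.
        new_palette = []
        for j, old in enumerate(palette):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            new_palette.append(tuple(sum(ch) / len(members) for ch in zip(*members))
                               if members else old)
        if new_palette == palette:
            break
        palette = new_palette
    # Round the centers to integer RGB values and repaint every pixel.
    rounded = [tuple(round(ch) for ch in color) for color in palette]
    return [rounded[lab] for lab in labels], rounded

# Hypothetical pixels: three reddish and three bluish colors, reduced to 2 colors.
pixels = [(250, 10, 10), (240, 20, 0), (255, 0, 20),
          (10, 10, 250), (0, 30, 240), (20, 0, 255)]
quantized, palette = quantize_colors(pixels, palette=[pixels[0], pixels[3]])
print(palette)
print(quantized)
```

For a real image the same idea applies after flattening the image into a list of pixels; the number of palette colors is the k of k-means.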

......

28_svm

Support Vector Machine (SVM)

SVM is a supervised learning method that estimates a separating hyperplane using the principle of structural risk minimization.

The basic idea is to find a decision boundary that maximizes the margin between the two classes, so that they are separated cleanly.

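As an example, consider how to tell whether a person is male or female using only height and weight.

There are two classes, male and female, and the only features are height and weight.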

Ref: https://chih-sheng-huang821.medium.com/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-%E6%94%AF%E6%92%90%E5%90%91%E9%87%8F%E6%A9%9F-support-vector-machine-svm-%E8%A9%B3%E7%B4%B0%E6%8E%A8%E5%B0%8E-c320098a3d2e

Goal: find the red line that separates the classes
(it is not necessarily a straight line; it may be a curve)

Ref: https://chih-sheng-huang821.medium.com/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-%E6%94%AF%E6%92%90%E5%90%91%E9%87%8F%E6%A9%9F-support-vector-machine-svm-%E8%A9%B3%E7%B4%B0%E6%8E%A8%E5%B0%8E-c320098a3d2e

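SVM assumes there is a hyperplane (w^T x + b = 0) that perfectly separates the two groups of data, so SVM searches for the parameters (w and b) that maximize the distance between the two groups.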
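To make the role of w and b concrete, the sketch below evaluates w^T x + b for two hypothetical groups of 2-D points and compares two candidate lines by the distance from the line to the closest point. It does not train an SVM; the data, the candidate values of w and b, and the function names are assumptions made for illustration.

```python
import math

def decision_value(w, b, x):
    """Value of w^T x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1

def smallest_distance(w, b, points):
    """Distance from the hyperplane to the closest point."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(decision_value(w, b, x)) / norm for x in points)

# Hypothetical data: two groups that a straight line can separate.
group_a = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
group_b = [(4.0, 4.0), (5.0, 3.0), (4.0, 5.0)]

w = (1.0, 1.0)
for b in (-4.0, -5.5):   # two candidate lines: x1 + x2 = 4 and x1 + x2 = 5.5
    separated = (all(classify(w, b, x) == -1 for x in group_a)
                 and all(classify(w, b, x) == 1 for x in group_b))
    print(b, separated, smallest_distance(w, b, group_a + group_b))
```

Both candidates separate the two groups, but the second one keeps the closest point farther from the line. Among all separating hyperplanes, SVM looks for the one where this smallest distance is as large as possible.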

One day, God gives you a test: use a stick to separate the balls of two different colors.

Ref: https://medium.com/jameslearningnote/%E8%B3%87%E6%96%99%E5%88%86%E6%9E%90-%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-%E7%AC%AC3-4%E8%AC%9B-%E6%94%AF%E6%8F%B4%E5%90%91%E9%87%8F%E6%A9%9F-support-vector-machine-%E4%BB%8B%E7%B4%B9-9c6c6925856b

......

......

OpenCV Tutorial Example Programs (Part 6)

By 陳信嘉 (Shinjia Chen)