By Frankie Xu
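OCR (Optical Character Recognition)
OCR is short for Optical Character Recognition. Optical input scans printed text into an image, and recognition techniques then convert the text in that image into a text format.
Alongside the most common AI applications (speech recognition, image recognition, and natural language processing), OCR is also widely used.
Image -> text; unstructured -> structured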
Basic workflow
CAPTCHA
-> Use an online OCR service
In-house development is costly
Scanned output ≠ structured data
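The gap between raw OCR output and structured data can be illustrated with a small parsing step. This is a minimal sketch only: the sample text, the field names, and the regular expressions are all hypothetical and stand in for whatever an OCR service returns in a real scenario.

```python
import re

# Hypothetical raw text as an OCR service might return it (assumed sample).
raw_ocr_text = """
Invoice No: AB-12345678
Date: 2019/08/15
Total: 1,250
"""

# Hypothetical field patterns; a real document layout needs its own rules.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice No:\s*([A-Z]{2}-\d{8})"),
    "date": re.compile(r"Date:\s*(\d{4}/\d{2}/\d{2})"),
    "total": re.compile(r"Total:\s*([\d,]+)"),
}

def parse_fields(text):
    """Turn unstructured OCR text into a dict of named fields."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1) if match else None
    # Normalize the amount into an integer when it was found.
    if record["total"] is not None:
        record["total"] = int(record["total"].replace(",", ""))
    return record

record = parse_fields(raw_ocr_text)
print(record)
```

Fields that the patterns cannot find come back as None, which makes recognition failures visible instead of silently producing partial records.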
import cv2

# Load the image (OpenCV reads it in BGR order) and convert it to grayscale
image = cv2.imread('lena.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('Result', image)
cv2.waitKey(0)
Binarization
import cv2
import numpy as np
from matplotlib import pyplot as plt

# Read the image as grayscale (flag 0)
img = cv2.imread('gradient.png', 0)
# Apply the five global threshold types with threshold 127 and max value 255
ret, thresh1 = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
ret, thresh2 = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
ret, thresh3 = cv2.threshold(img, 127, 255, cv2.THRESH_TRUNC)
ret, thresh4 = cv2.threshold(img, 127, 255, cv2.THRESH_TOZERO)
ret, thresh5 = cv2.threshold(img, 127, 255, cv2.THRESH_TOZERO_INV)
titles = ['Original Image', 'BINARY', 'BINARY_INV', 'TRUNC', 'TOZERO', 'TOZERO_INV']
images = [img, thresh1, thresh2, thresh3, thresh4, thresh5]
# Show the original and the five results in a 2x3 grid
for i in range(6):
    plt.subplot(2, 3, i + 1), plt.imshow(images[i], 'gray')
    plt.title(titles[i])
    plt.xticks([]), plt.yticks([])
plt.show()
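Using a fixed global threshold
Common algorithms
Threshold Binary: plain binarization. Gray values above the threshold are set to the maximum gray value; all others are set to 0.
Threshold Binary, Inverted: gray values above the threshold are set to 0; all others are set to the maximum gray value.
Truncate: gray values above the threshold are set to the threshold; values below it are unchanged.
Threshold to Zero: gray values below the threshold are set to 0; values above it are unchanged.
Threshold to Zero, Inverted: gray values above the threshold are set to 0; values below it are unchanged.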
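The five rules above can be written out directly in NumPy. This is a minimal sketch to make the rules concrete, not a replacement for cv2.threshold; the function name apply_threshold and the mode strings are made up for illustration, and "above the threshold" is taken to mean strictly greater.

```python
import numpy as np

def apply_threshold(gray, thresh, maxval, mode):
    """Apply one of the five global threshold rules to a uint8 array."""
    gray = np.asarray(gray, dtype=np.uint8)
    above = gray > thresh  # pixels strictly greater than the threshold
    if mode == "binary":
        out = np.where(above, maxval, 0)
    elif mode == "binary_inv":
        out = np.where(above, 0, maxval)
    elif mode == "trunc":
        out = np.where(above, thresh, gray)
    elif mode == "tozero":
        out = np.where(above, gray, 0)
    elif mode == "tozero_inv":
        out = np.where(above, 0, gray)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return out.astype(np.uint8)

# One row of sample gray values that straddle the threshold of 127.
row = np.array([0, 100, 127, 128, 255], dtype=np.uint8)
for mode in ("binary", "binary_inv", "trunc", "tozero", "tozero_inv"):
    print(mode, apply_threshold(row, 127, 255, mode))
```

Note that a pixel exactly equal to the threshold (127 here) falls on the "not above" side in every mode.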
However, a real pipeline has to run automatically, so a hand-picked global threshold is rarely practical.
Automation: adaptive thresholding
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

# Read as grayscale and smooth with a median filter before thresholding
img = cv.imread('sudoku.png', 0)
img = cv.medianBlur(img, 5)
# Global threshold for comparison
ret, th1 = cv.threshold(img, 127, 255, cv.THRESH_BINARY)
# Adaptive thresholds: 11x11 neighborhood, constant C = 2
th2 = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_MEAN_C,
                           cv.THRESH_BINARY, 11, 2)
th3 = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C,
                           cv.THRESH_BINARY, 11, 2)
titles = ['Original Image', 'Global Thresholding (v = 127)',
          'Adaptive Mean Thresholding', 'Adaptive Gaussian Thresholding']
images = [img, th1, th2, th3]
for i in range(4):
    plt.subplot(2, 2, i + 1), plt.imshow(images[i], 'gray')
    plt.title(titles[i])
    plt.xticks([]), plt.yticks([])
plt.show()
Common adaptive threshold algorithms
Well suited to images with uneven lighting and shadows
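To show why a per-pixel threshold copes with uneven lighting, here is a minimal NumPy sketch of the mean variant: each pixel is compared against the mean of its surrounding block minus a constant C. The function name, the synthetic test image, and the edge-replicating padding are assumptions for illustration, so the output near image borders may differ from cv.adaptiveThreshold.

```python
import numpy as np

def adaptive_mean_threshold(gray, block_size=11, c=2, maxval=255):
    """Binarize each pixel against (mean of its block_size x block_size window) - c."""
    gray = np.asarray(gray, dtype=np.float64)
    pad = block_size // 2
    # Assumption: replicate edge pixels so border windows are full-sized.
    padded = np.pad(gray, pad, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            window = padded[y:y + block_size, x:x + block_size]
            local_thresh = window.mean() - c
            if gray[y, x] > local_thresh:
                out[y, x] = maxval
    return out

# Synthetic page: dark left half, bright right half, one darker stroke in each.
img = np.full((20, 40), 200.0)
img[:, :20] = 60
img[10, 5:15] = 20     # stroke in the shadowed half
img[10, 25:35] = 160   # stroke in the bright half

adaptive = adaptive_mean_threshold(img)
global_binary = np.where(img > 127, 255, 0).astype(np.uint8)
print(adaptive[10, 10], adaptive[2, 10], global_binary[2, 10])
```

With the global threshold of 127 the whole left half turns black, background included, while the adaptive version keeps the background white and the stroke black in both halves. The double loop is written for readability, not speed.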
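Otsu's binarization algorithm
Grayscale histogram -> find the threshold that minimizes the weighted within-class variance. Works best on images whose grayscale histogram shows two distinct peaks.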
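The search described above can be sketched as a brute-force loop over all 256 candidate thresholds. This is an illustrative sketch under stated assumptions: the function name otsu_threshold and the sample data are made up, and the returned value t follows the convention "pixels with value >= t are the bright class", which is not necessarily the convention cv.threshold uses for its return value.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the split t that minimizes the weighted within-class variance."""
    gray = np.asarray(gray, dtype=np.uint8)
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()          # normalized grayscale histogram
    levels = np.arange(256)
    best_t, best_score = 0, np.inf
    for t in range(1, 256):
        # Class 0: values < t, class 1: values >= t
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue                  # one class is empty, not a valid split
        mu0 = (levels[:t] * prob[:t]).sum() / w0
        mu1 = (levels[t:] * prob[t:]).sum() / w1
        var0 = (((levels[:t] - mu0) ** 2) * prob[:t]).sum() / w0
        var1 = (((levels[t:] - mu1) ** 2) * prob[t:]).sum() / w1
        score = w0 * var0 + w1 * var1  # weighted within-class variance
        if score < best_score:
            best_t, best_score = t, score
    return best_t

# Bimodal sample: a dark cluster around 50 and a bright cluster around 200.
pixels = np.array([40, 50, 60] * 100 + [190, 200, 210] * 100, dtype=np.uint8)
t = otsu_threshold(pixels)
print(t)
```

For a clearly bimodal histogram like this one, the chosen split lands in the empty gap between the two peaks, which is why the method suits such images.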
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('noisy2.png', 0)
# global thresholding
ret1, th1 = cv.threshold(img, 127, 255, cv.THRESH_BINARY)
# Otsu's thresholding
ret2, th2 = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
# Otsu's thresholding after Gaussian filtering
blur = cv.GaussianBlur(img, (5, 5), 0)
ret3, th3 = cv.threshold(blur, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
# plot all the images and their histograms
# (the 0 entries are placeholders for the histogram column)
images = [img, 0, th1,
          img, 0, th2,
          blur, 0, th3]
titles = ['Original Noisy Image', 'Histogram', 'Global Thresholding (v=127)',
          'Original Noisy Image', 'Histogram', "Otsu's Thresholding",
          'Gaussian filtered Image', 'Histogram', "Otsu's Thresholding"]
for i in range(3):
    plt.subplot(3, 3, i * 3 + 1), plt.imshow(images[i * 3], 'gray')
    plt.title(titles[i * 3]), plt.xticks([]), plt.yticks([])
    plt.subplot(3, 3, i * 3 + 2), plt.hist(images[i * 3].ravel(), 256)
    plt.title(titles[i * 3 + 1]), plt.xticks([]), plt.yticks([])
    plt.subplot(3, 3, i * 3 + 3), plt.imshow(images[i * 3 + 2], 'gray')
    plt.title(titles[i * 3 + 2]), plt.xticks([]), plt.yticks([])
plt.show()
No single algorithm is best.
Choose the algorithm that fits the images each scenario produces.
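A pixel stays white only if every pixel under the kernel is white; otherwise it becomes black
Erodes the boundaries of foreground (white) objects
Removes white noise, separates two connected objects, and so on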
import cv2
import numpy as np

# Read as grayscale and erode once with a 5x5 all-ones kernel
img = cv2.imread('j.png', 0)
kernel = np.ones((5, 5), np.uint8)
erosion = cv2.erode(img, kernel, iterations=1)
cv2.imshow('j', erosion)
cv2.waitKey(0)
cv2.destroyAllWindows()
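The opposite of erosion:
a pixel is filled white if any pixel under the kernel is white
Dilates the boundaries of foreground (white) objects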
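The two rules (all white under the kernel for erosion, any white for dilation) amount to a window minimum and a window maximum on a binary image. The following NumPy sketch illustrates that with a square all-ones kernel; the function names and the border padding choices are assumptions for illustration and may not match cv2.erode or cv2.dilate at the image edges.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def erode(binary, ksize=3):
    """A pixel stays white only if its whole ksize x ksize window is white."""
    pad = ksize // 2
    # Assumption: pad with white so the border itself never forces a pixel black.
    padded = np.pad(binary, pad, mode="constant", constant_values=255)
    windows = sliding_window_view(padded, (ksize, ksize))
    return windows.min(axis=(2, 3))

def dilate(binary, ksize=3):
    """A pixel becomes white if any pixel in its ksize x ksize window is white."""
    pad = ksize // 2
    # Assumption: pad with black so the border never adds white pixels.
    padded = np.pad(binary, pad, mode="constant", constant_values=0)
    windows = sliding_window_view(padded, (ksize, ksize))
    return windows.max(axis=(2, 3))

img = np.zeros((7, 7), dtype=np.uint8)
img[2:5, 2:5] = 255   # a 3x3 white square (foreground object)
img[0, 6] = 255       # an isolated white noise pixel

eroded = erode(img)
dilated = dilate(img)
print(eroded)
print(dilated)
```

In this example erosion removes the noise pixel and shrinks the square to its center, while dilation grows both the square and the noise pixel by one pixel in every direction.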
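Challenge: the coordinates of the text corners are unknown
Scenario requirements -> text complexity -> use an online OCR service -> image normalization -> parsing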