Photo by Alif Caesar Rizqi Pratama on Unsplash
How to Use AI for Chinese Book Classification?
Lecturer: Julie Wang
Library and Information Science Department
Date: 2022-11-25
Introduction
Library Services
Technical Service
Public Service
Catalog
New Classification Scheme for Chinese Libraries
(中文圖書分類法)
- 000 Generalities
- 100 Philosophy
- 200 Religion
- 300 Sciences
- 400 Applied sciences
- 500 Social sciences
- 600 History of China and Geography of China
- 700 World history and Geography
- 800 Linguistics and Literature
- 900 Arts
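As a small illustration, the ten main classes above can be looked up from the hundreds digit of a classification number. The helper name `main_class` is hypothetical and not part of the scheme; this is only a sketch.

```python
# Main classes copied from the list above; the lookup helper is illustrative only.
MAIN_CLASSES = {
    0: "Generalities",
    1: "Philosophy",
    2: "Religion",
    3: "Sciences",
    4: "Applied sciences",
    5: "Social sciences",
    6: "History of China and Geography of China",
    7: "World history and Geography",
    8: "Linguistics and Literature",
    9: "Arts",
}


def main_class(class_number: float) -> str:
    """Return the main class name for a classification number such as 805.2."""
    if not 0 <= class_number < 1000:
        raise ValueError(f"classification number out of range: {class_number}")
    # The hundreds digit selects the main class (8xx -> Linguistics and Literature).
    return MAIN_CLASSES[int(class_number) // 100]


print(main_class(805.2))  # Linguistics and Literature
```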


The Problem
- A book often covers more than one topic
- Lack of cataloging experience
- Time-consuming
Goal
- Can AI classify Chinese books?
- Use BERT to classify Chinese books.
Method
Solution
- Analyze the book
- Extract features
- Build the model
- Evaluate the model
Analyze Books from Different Genres


- title
- author
- classification number
- ISBN (International Standard Book Number)
- subject
- introduction
Extract Features
| Classification number | Label |
| --- | --- |
| 800~809 | 0 |
| 810~819 | 1 |
| 820~829 | 2 |
| 830~839 | 3 |
| 840~849 | 4 |
| 850~859 | 5 |
| 860~869 | 6 |
| 870~879 | 7 |
| 880~889 | 8 |
| 890~899 | 9 |
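The mapping in the table can be written as a short function: each block of ten numbers in the 800s maps to one label from 0 to 9. The function name `label_for` is an assumption made for illustration; the slides do not show how the labels were actually generated.

```python
def label_for(class_number: float) -> int:
    """Map a classification number in 800~899 to a label 0..9, as in the table above."""
    if not 800 <= class_number < 900:
        raise ValueError(f"expected a number in 800~899, got {class_number}")
    # Drop any decimal part, then take the tens digit within the 800s.
    return (int(class_number) - 800) // 10


print(label_for(857.7))  # 5
```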
- Title: 極速西班牙語
- Author: 史密斯 Smith Elisabeth
- Subject: 西班牙語. 會話.
- Introduction: 身在異地,人生路不熟,加上言語不通,會令整個旅程大打折扣。本書幫助讀者在極短時間內擺脫溝通障礙,《極速西班牙語》的主體部分是一張75分鐘的CD,以生動的實境教讀者學講幾十句必備西班牙語。本書另有遊客須知、精彩景點簡介,點菜指南和詞彙,還有特別設計的筆記欄供讀者隨學隨記。
[Diagram labels: <data>, <tag>]
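One way to turn a record like the example above into the <data> side of a training pair is to join its text fields into a single string. The slides do not say which fields were combined or in what order, so the `BookRecord` type, the `to_data` helper, and the field order below are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class BookRecord:
    """Text fields of one catalog record (hypothetical container)."""
    title: str
    author: str
    subject: str
    introduction: str


def to_data(record: BookRecord, sep: str = " ") -> str:
    """Join the non-empty text fields into one input string (assumed field order)."""
    parts = [record.title, record.author, record.subject, record.introduction]
    return sep.join(p.strip() for p in parts if p and p.strip())


# The example record from the slide; the introduction is cut to its first sentence.
book = BookRecord(
    title="極速西班牙語",
    author="史密斯 Smith Elisabeth",
    subject="西班牙語. 會話.",
    introduction="身在異地,人生路不熟,加上言語不通,會令整個旅程大打折扣。",
)
data = to_data(book)
print(data)
```

The matching <tag> would then be the label (0 to 9) derived from the record's classification number.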

Build the Classification Model with BERT
BERT
[Diagram labels: <mask>, <tag_a>, <data>, <tag_b>]
Bidirectional Encoder Representations from Transformers (BERT)
- Pre-training
- Fine-tuning
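During pre-training, BERT learns by predicting tokens that have been hidden from the input (masked language modelling). The sketch below shows only the masking idea, in a simplified form: about 15% of tokens are replaced with `[MASK]`, whereas real BERT also keeps some selected tokens unchanged or swaps them for random ones. The function name `mask_tokens` is hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Replace roughly mask_prob of the tokens with [MASK].

    Returns (masked_tokens, targets), where targets holds the original token
    at masked positions and None elsewhere. Simplified: no 80/10/10 rule.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)  # the model must recover this token
            targets.append(tok)
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets


# Chinese BERT vocabularies are largely character-based, so split by character here.
tokens = list("極速西班牙語")
masked, targets = mask_tokens(tokens, seed=0)
print(masked)
```

Fine-tuning then reuses the pre-trained weights and trains a classification layer on the labelled book records.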
Result
Evaluation
Matthews correlation coefficient (MCC)

'mcc': 0.759
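For reference, the multiclass MCC can be computed from true and predicted labels alone. This is a minimal sketch of the standard multiclass formula, not the evaluation code used in the study; the function name `multiclass_mcc` is an assumption. A value of 1 means perfect agreement, 0 is no better than chance, and negative values indicate systematic disagreement.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient for any number of classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                      # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))      # correctly predicted samples
    t_counts = Counter(y_true)                           # how often each class truly occurs
    p_counts = Counter(y_pred)                           # how often each class is predicted
    labels = set(t_counts) | set(p_counts)
    sum_tp = sum(t_counts[k] * p_counts[k] for k in labels)
    sum_pp = sum(v * v for v in p_counts.values())
    sum_tt = sum(v * v for v in t_counts.values())
    denom = math.sqrt((s * s - sum_pp) * (s * s - sum_tt))
    # By convention, return 0 when the denominator vanishes (e.g. one class only).
    return 0.0 if denom == 0 else (c * s - sum_tp) / denom


print(multiclass_mcc([0, 1, 2, 2], [0, 1, 2, 2]))  # 1.0
```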


Limitation
- amount of data
- effect of classification scheme
- Example: Indian culture books are classified as British literature
Conclusion
- The model reached an MCC of about 0.76, which demonstrates its ability to classify books.
- Books in 880-889 (literature of other countries) and 840-849 (Chinese literature: individual works) are easily misclassified.
- More data and expert input should improve the results.
The Study of Automatic Book Classification Using BERT in Fu Jen Catholic University Library
By juliewah