How to Use AI for Chinese Book Classification?

Lecturer: Julie Wang

Library and Information Science Department

Date: 2022-11-25

Introduction

Services in Library

Technical Service

Public Service

Catalog

New Classification Scheme for Chinese Libraries

(中文圖書分類法)

  • 000 Generalities
  • 100 Philosophy
  • 200 Religion
  • 300 Sciences
  • 400 Applied sciences
  • 500 Social sciences
  • 600 History of China and Geography of China
  • 700 World history and Geography
  • 800 Linguistics and Literature
  • 900 Arts

The Problem

  • More than one topic covered in the book
  • Lack of cataloging experience
  • Time-consuming

Goal

  1. Can AI classify Chinese books?
  2. Use BERT to classify Chinese books.

Method

Solution

  1. Analyze the book
  2. Extract features
  3. Build the model
  4. Evaluate the model

Analyze Books from Different Genres

  • title
  • author
  • classification number
  • ISBN (International Standard Book Number)
  • subject
  • introduction

Extract Features

Class range    Label
800~809        0
810~819        1
820~829        2
830~839        3
840~849        4
850~859        5
860~869        6
870~879        7
880~889        8
890~899        9
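
A minimal sketch of the range-to-label table above, assuming class numbers arrive as strings such as "857.7" (the function name `class_number_to_label` is hypothetical and not taken from the study). Each block of ten within the 800s maps to one label, so the label is simply the tens digit.

```python
def class_number_to_label(class_number: str) -> int:
    """Map a class number in the 800s to a label 0-9 (its tens digit)."""
    # Keep only the integer part, e.g. "857.7" -> 857.
    # Assumption: the string holds just the class number, with no author mark.
    main = int(float(class_number))
    if not 800 <= main <= 899:
        raise ValueError(f"{class_number!r} is outside the 800-899 range")
    return (main - 800) // 10


if __name__ == "__main__":
    for number in ["805.1", "857.7", "889"]:
        print(number, "->", class_number_to_label(number))
```

Rejecting numbers outside 800-899 keeps books from other main classes from silently receiving a literature label.
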
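  • Title: 極速西班牙語 (literally "Rapid Spanish")
  • Author: 史密斯 Smith Elisabeth
  • Subject: 西班牙語. 會話. (Spanish language; conversation)
  • Introduction: 身在異地,人生路不熟,加上言語不通,會令整個旅程大打折扣。本書幫助讀者在極短時間內擺脫溝通障礙,《極速西班牙語》的主體部分是一張75分鐘的CD,以生動的實境教讀者學講幾十句必備西班牙語。本書另有遊客須知、精彩景點簡介,點菜指南和詞彙,還有特別設計的筆記欄供讀者隨學隨記。
    • English: Being in an unfamiliar place without speaking the language can spoil an entire trip. This book helps readers get past the communication barrier in a very short time. Its core is a 75-minute CD that uses lively real-life scenes to teach a few dozen essential Spanish phrases. The book also includes tips for travelers, short introductions to major sights, a menu-ordering guide with vocabulary, and specially designed note columns for jotting things down while learning.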
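
One plausible way to turn a record like this into model input is to join its text fields into a single string. The sketch below assumes that approach; the slides do not say which fields were used or how they were combined, and `build_input_text` and the field names are hypothetical.

```python
from typing import Dict

# Assumed set of text fields; the study may have used a different subset.
TEXT_FIELDS = ("title", "author", "subject", "introduction")


def build_input_text(record: Dict[str, str], sep: str = " ") -> str:
    """Join the non-empty text fields of a catalog record into one string."""
    parts = [record.get(field, "").strip() for field in TEXT_FIELDS]
    return sep.join(part for part in parts if part)


if __name__ == "__main__":
    record = {
        "title": "極速西班牙語",
        "author": "史密斯 Smith Elisabeth",
        "subject": "西班牙語. 會話.",
    }
    print(build_input_text(record))
```

Skipping empty fields means a record with no introduction still yields a usable input string.
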

<data>

<tag>

Build the Classification Model by BERT

BERT

<mask>

<tag_a>

<data>

<tag_b>

Bidirectional Encoder Representations from Transformers (BERT)

  • Pre-training
  • Fine-tuning
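
To make the fine-tuning input concrete: BERT expects a sequence that begins with `[CLS]` and ends with `[SEP]`, and the standard models accept at most 512 positions. The sketch below frames a string that way under a simplifying assumption, namely that every non-space character is one token (Chinese BERT vocabularies split CJK text into single characters, but Latin text is handled differently). `frame_for_bert` is a hypothetical helper for illustration; a real run would use the tokenizer shipped with the chosen pre-trained model.

```python
from typing import List

MAX_LEN = 512  # position limit of the standard BERT models


def frame_for_bert(text: str, max_len: int = MAX_LEN) -> List[str]:
    """Return [CLS] + one token per non-space character + [SEP], truncated."""
    chars = [ch for ch in text if not ch.isspace()]
    # Reserve two positions for the special tokens.
    chars = chars[: max_len - 2]
    return ["[CLS]"] + chars + ["[SEP]"]


if __name__ == "__main__":
    print(frame_for_bert("西班牙語 會話"))
```

During fine-tuning, the output vector at the `[CLS]` position feeds a classification layer, which for the 800s task would have ten outputs, one per label 0 to 9.
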

Result

Evaluation

Matthews correlation coefficient (MCC)

'mcc': 0.759
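
For reference, MCC ranges from -1 to 1, where 1 is perfect prediction and 0 is no better than chance. The slides do not show how the score was computed; the sketch below implements the usual multiclass generalization from true and predicted label lists, using only the standard library. The function name `multiclass_mcc` is hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Sequence


def multiclass_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Multiclass Matthews correlation coefficient for two label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly predicted
    true_counts = Counter(y_true)                     # t_k: samples truly in class k
    pred_counts = Counter(y_pred)                     # p_k: samples predicted as class k
    cross = sum(true_counts[k] * pred_counts[k] for k in true_counts)
    numerator = c * s - cross
    denominator = sqrt(
        (s * s - sum(v * v for v in pred_counts.values()))
        * (s * s - sum(v * v for v in true_counts.values()))
    )
    # Convention: return 0.0 when the score is undefined (zero denominator).
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    print(multiclass_mcc([0, 0, 1, 1], [0, 0, 1, 0]))
```

Unlike plain accuracy, MCC takes the size of every class into account, which matters when some of the ten literature ranges hold far fewer books than others.
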

Limitation

  • amount of data
  • effect of classification scheme
    • Indian culture books --> British literature

Conclusion

  • The model achieved an MCC of about 0.76, which demonstrates its ability to classify books.
  • Books in 880-889 (literatures of other countries) and 840-849 (Chinese literature: individual works) are easily misclassified.
  • More data and expert input would improve the results.

The Study of Automatic Book Classification Using BERT in Fu Jen Catholic University Library

By juliewah
