A Robust Text Segmentation Method for Spoken Language

林榮顯, 黃純敏*, 陳硯楷, 林揚展

TANET 2023 Oral

Session

1. INTRODUCTION

2. RELATED WORK

3. METHOD

4. RESULT&CONCLUSION

1. INTRODUCTION

Text segmentation

Word level

Sentence level

Paragraph level

Spoken-language text segmentation

STS (ASR/Human): speech is first transcribed into text, either by ASR or by human transcribers

What is the difference between segmenting written text and segmenting spoken text?

Q: Can we apply a model that has been trained only on written data directly to spoken data?

Milestones in supervised text segmentation (timeline, 2016 to 2020):

SegBot

Text Segmentation as a Supervised Learning Task

Text Segmentation by Cross Segment Attention


Dataset size: Wiki-727K (727k) vs. Choi (0.7k), about 1000x larger


Style: spoken language is usually more casual and fragmented than written text, and each utterance typically carries less information than a written sentence.

Noise: manual transcription is expensive and time-consuming, so existing methods usually rely on ASR, whose transcripts still contain some errors.

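Dataset: as of August 2023, AMI, the commonly used dataset for spoken-text segmentation, contains only about 100 hours of transcribed speech, whereas Wiki-727K, the commonly used dataset for written-text segmentation, contains about 727k English Wikipedia pages. AMI lacks both the diversity and the volume needed to support the supervised learning methods commonly used for written-text segmentation.

AMI meeting corpus: 1000K words

Wiki-727K: 34442K words (about 34x larger)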
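As a quick check, the "larger" factors quoted on these slides follow directly from the stated sizes (no other figures are assumed):

```latex
\frac{34442\text{K}}{1000\text{K}} \approx 34.4 \quad (\text{Wiki-727K vs. AMI, in words}), \qquad
\frac{727\text{k}}{0.7\text{k}} \approx 1039 \quad (\text{Wiki-727K vs. Choi})
```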

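Q: Can we apply a model that has been trained only on written data directly to spoken data?

Ans: No, we can't. The differences in style, transcription noise, and dataset size listed above keep a model trained only on written text from transferring directly to spoken text.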

2. RELATED WORK

Semantic TextTiling
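For reference, here is a minimal sketch of a TextTiling-style segmenter, assuming Semantic TextTiling follows the usual recipe: embed consecutive blocks S, take the cosine similarity of each adjacent pair, score each gap by its depth (how far it dips below the peaks reached by climbing left and right, as in Hearst's TextTiling), and place a boundary where the depth exceeds a threshold. The toy embeddings, the `threshold` value, and the function names are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def depth_scores(sims: list[float]) -> list[float]:
    """Depth of each gap: climb left and right while similarity keeps rising,
    then add up how far the gap lies below the two peaks reached."""
    scores = []
    for i, s in enumerate(sims):
        left = i
        while left > 0 and sims[left - 1] >= sims[left]:
            left -= 1
        right = i
        while right < len(sims) - 1 and sims[right + 1] >= sims[right]:
            right += 1
        scores.append((sims[left] - s) + (sims[right] - s))
    return scores


def tile(block_embeddings: list[list[float]], threshold: float) -> list[int]:
    """Return every i such that a boundary is placed between block i and block i + 1."""
    sims = [cosine(a, b) for a, b in zip(block_embeddings, block_embeddings[1:])]
    return [i for i, d in enumerate(depth_scores(sims)) if d > threshold]


# Hypothetical 2-d "embeddings": three blocks on one topic, then two on another.
blocks = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]
print(tile(blocks, threshold=0.5))  # prints [2]: a boundary between block 2 and block 3
```

Both problems listed on the next slides enter this pipeline directly: the blocks S are whatever unit is fed to `cosine`, and the similarity values are only meaningful if the embedding space behaves well under cosine similarity.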

Problem

1. There is no guarantee that a block S is a complete sentence

Examples of adjacent blocks that cut a sentence in two:

S1: win and lose condition in our game will

S2: depend on how many score points can the

S1: … mounted from floor

S2: all the way to the ceiling …

S1: … splice method needs at least two

S2: arguments the first argument is the …
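Cuts like these arise naturally when S is built from a fixed number of tokens rather than from sentence boundaries, which ASR output without punctuation does not mark anyway. A minimal sketch, assuming a 6-token block size and a hypothetical `make_blocks` helper, reproduces the third example:

```python
def make_blocks(tokens: list[str], size: int) -> list[list[str]]:
    """Split a token stream into consecutive blocks of `size` tokens,
    ignoring sentence boundaries (the last block may be shorter)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


# ASR-style transcript: no punctuation, so sentence ends are not marked.
transcript = "splice method needs at least two arguments the first argument is the".split()

for n, block in enumerate(make_blocks(transcript, size=6), start=1):
    print(f"S{n}: {' '.join(block)}")
# S1: splice method needs at least two
# S2: arguments the first argument is the
```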

2. There is no guarantee that the embedding space is isotropic
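One common diagnostic for this question, offered here as an assumption rather than the authors' procedure, is the mean pairwise cosine similarity of embeddings. Vectors spread in all directions give a low or negative mean, whereas in an anisotropic space the vectors share a dominant direction, the mean is pushed toward 1, and cosine similarity has little room left to separate related blocks from unrelated ones. The toy vectors below are hypothetical placeholders, not real model outputs.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_pairwise_cosine(vectors: list[list[float]]) -> float:
    """Average cosine similarity over all distinct pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy embeddings: directions spread evenly around the origin...
spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
# ...and the same vectors after adding a shared offset, so all point roughly the same way.
shifted = [[x + 5.0, y + 5.0] for x, y in spread]

print(f"spread:  {mean_pairwise_cosine(spread):.3f}")
print(f"shifted: {mean_pairwise_cosine(shifted):.3f}")
```

On these toy inputs the first mean is negative (about -0.333) and the second is about 0.986, even though the relative arrangement of the four vectors is the same.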


3. METHOD

4. RESULT&CONCLUSION

Metric

P_k

WindowDiff


P_k (ref, hyp, k)

=\frac{1}{N-k+1} \sum_{i=1}^{N-k+1} \left( (b(ref_i, ref_{i+k})>0) \oplus (b(hyp_i, hyp_{i+k})>0) \right)

ref: human\ labeling\ list \\ hyp: model\ labeling\ list \\ N: length\ of\ each\ labeling\ list \\ k: sliding\ window\ width \\ b(a, b): number\ of\ boundaries\ from\ index\ a\ to\ index\ b
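The P_k definition can be sketched in Python as follows, assuming labels are equal-length sequences in which `"1"` marks a boundary and each window covers positions i to i + k - 1, which gives the N - k + 1 windows in the normalizer. This mirrors the convention of NLTK's `nltk.metrics.segmentation.pk`, but the function here is a standalone illustration, and the example labelings are hypothetical.

```python
from typing import Sequence


def pk(ref: Sequence[str], hyp: Sequence[str], k: int, boundary: str = "1") -> float:
    """P_k: share of length-k windows in which exactly one of ref/hyp contains a boundary."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same length")
    if not 0 < k <= len(ref):
        raise ValueError("k must be between 1 and len(ref)")
    n_windows = len(ref) - k + 1  # the N - k + 1 term
    errors = 0
    for i in range(n_windows):
        ref_has = ref[i:i + k].count(boundary) > 0
        hyp_has = hyp[i:i + k].count(boundary) > 0
        errors += ref_has != hyp_has  # XOR: the two labelings disagree
    return errors / n_windows


ref = "0001000100"  # human labeling
hyp = "0000100100"  # model labeling: first boundary placed one unit late
print(pk(ref, hyp, k=3))  # 0.25 (2 of 8 windows disagree)
```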


WindowDiff(ref, hyp, k)

=\frac{1}{N-k+1} \sum_{i=1}^{N-k+1} (|b(ref_i, ref_{i+k}) - b(hyp_i, hyp_{i+k})|>0)
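A matching sketch for WindowDiff, under the same assumptions about the label format and window convention (NLTK provides `nltk.metrics.segmentation.windowdiff` with a comparable convention; this function is a standalone illustration). Unlike P_k, it compares boundary counts, so it also penalizes a window in which both labelings contain boundaries, but a different number of them.

```python
from typing import Sequence


def window_diff(ref: Sequence[str], hyp: Sequence[str], k: int, boundary: str = "1") -> float:
    """WindowDiff: share of length-k windows whose boundary counts differ between ref and hyp."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same length")
    if not 0 < k <= len(ref):
        raise ValueError("k must be between 1 and len(ref)")
    n_windows = len(ref) - k + 1  # the N - k + 1 term
    errors = 0
    for i in range(n_windows):
        diff = abs(ref[i:i + k].count(boundary) - hyp[i:i + k].count(boundary))
        errors += diff > 0
    return errors / n_windows


# Hypothetical case: the model misses the second of two adjacent reference boundaries.
print(window_diff("0110", "0100", k=2))  # 2/3
```

In this example the middle window ("11" vs. "10") contains boundaries on both sides, so P_k would not count it as an error, while WindowDiff does: the result is 2/3 rather than P_k's 1/3.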



W_{s-ord}(S_e, T_b)

= \sum^{|S_e|}_{j=1}\left(w_s+\frac{|S_e[j][0]-S_e[j][1]|}{\max(T_b)-\min(T_b)}\right)

W_{t-span}(T_e, n_t)

= \sum^{|T_e|}_{j=1}(w_t+\frac{|T_e[j][0]-T_e[j][1]|}{n_t-1})

B (s_1, s_2, n_t)

=1-\frac {|A_e|+W_{t-span}(T_e, n_t)+W_{s-ord}(S_e, T_b)} {|A_e|+|T_e|+|S_e|+|B_m|}

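A_e = boundaries\ not\ matched\ exactly\ and\ outside\ the\ n_t\ tolerance\ (additions/deletions) \\

T_e = boundaries\ not\ matched\ exactly\ but\ within\ the\ n_t\ tolerance\ (transpositions) \\

B_m = segment\ boundaries\ matched\ exactly \\

S_e = boundaries\ involving\ more\ than\ one\ of\ the\ above\ states\ (substitutions) \\

T_b = set\ of\ boundary\ types \\

w_t, w_s = weights\ of\ a\ transposition\ and\ of\ a\ substitution \\

s_1 = model\ labeling\ list \\

s_2 = human\ labeling\ list \\

n_t = maximum\ transposition\ spanning\ distance \\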
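The boundary similarity formulas can be transcribed directly. This is a sketch that assumes the boundary edit alignment (which boundaries are matches, additions/deletions, transpositions, or substitutions) has already been computed; transpositions are given as (position in s_1, position in s_2) pairs and substitutions as (type in s_1, type in s_2) pairs. The weight values and all function names are illustrative assumptions; the `segeval` package by the metric's author is the usual reference for computing the alignment itself.

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[int, int]


def w_t_span(t_e: Sequence[Pair], n_t: int, w_t: float) -> float:
    """W_{t-span}: each transposition costs w_t plus its span scaled by n_t - 1."""
    return sum(w_t + abs(a - b) / (n_t - 1) for a, b in t_e)


def w_s_ord(s_e: Sequence[Pair], t_b: Iterable[int], w_s: float) -> float:
    """W_{s-ord}: each substitution costs w_s plus its type distance scaled by the type range.
    Substitutions only exist when T_b has at least two types, so the range is nonzero then."""
    types = list(t_b)
    type_range = max(types) - min(types)
    return sum(w_s + abs(a - b) / type_range for a, b in s_e)


def boundary_similarity(num_a_e: int, t_e: Sequence[Pair], s_e: Sequence[Pair],
                        num_b_m: int, n_t: int, t_b: Iterable[int],
                        w_t: float, w_s: float) -> float:
    """B(s_1, s_2, n_t) = 1 - (|A_e| + W_t-span + W_s-ord) / (|A_e| + |T_e| + |S_e| + |B_m|)."""
    total = num_a_e + len(t_e) + len(s_e) + num_b_m
    if total == 0:
        return 1.0  # assumption: no boundaries on either side counts as identical
    penalty = num_a_e + w_t_span(t_e, n_t, w_t) + w_s_ord(s_e, t_b, w_s)
    return 1 - penalty / total


# Hypothetical alignment: 1 miss, 1 near miss spanning one unit, 2 exact matches.
print(boundary_similarity(num_a_e=1, t_e=[(3, 4)], s_e=[], num_b_m=2,
                          n_t=3, t_b={1}, w_t=0.0, w_s=0.0))  # 0.625
```

With w_t = 0 the near miss costs 1/2 instead of a full 1, so the penalty is 1.5 out of 4 aligned boundaries.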

