Word Alignment
Paul Charousset &
Yannick Péroux
29 April 2014 - CA446
Dublin City University
The Goal
Can we correct the output of a word aligner using a set of supervised (human-made) alignments?
Testing Data
Data from the Hansard
447 English-French sentence pairs
Reference word alignments made by linguists
Unsupervised word alignments produced by Giza++
Tools
Stanford POS-tagger
NLTK
Weka
Methodology
We generate an ARFF file from the data
We add some attributes derived from the POS tagger's output
2 classes
S - a supervised alignment
N - an alignment detected by Giza++ but not in the reference
We train a Weka classifier on the ARFF file and evaluate it with cross-validation
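The steps above can be sketched roughly as follows. This is only an illustration, not the code used for the project: it assumes each instance is one Giza++ link, labelled S when the link also appears in the reference and N otherwise. The attribute names (src_pos, tgt_pos, distance), the POS tags, and the toy sentence pair are all made up for the example.

```python
def label_links(giza_links, reference_links):
    """Label each Giza++ link (i, j): S if it is in the reference, N otherwise."""
    reference = set(reference_links)
    return [(i, j, "S" if (i, j) in reference else "N") for i, j in giza_links]


def build_rows(labelled, src_tags, tgt_tags):
    """Attach hypothetical attributes: POS of both words and their position gap."""
    return [(src_tags[i], tgt_tags[j], abs(i - j), label) for i, j, label in labelled]


def to_arff(rows, relation="word_alignment"):
    """Serialise the rows as ARFF text (nominal POS attributes, numeric distance)."""
    src_values = sorted({r[0] for r in rows})
    tgt_values = sorted({r[1] for r in rows})
    lines = [
        f"@relation {relation}",
        "",
        "@attribute src_pos {" + ",".join(src_values) + "}",
        "@attribute tgt_pos {" + ",".join(tgt_values) + "}",
        "@attribute distance numeric",
        "@attribute class {S,N}",
        "",
        "@data",
    ]
    lines += [f"{s},{t},{d},{c}" for s, t, d, c in rows]
    return "\n".join(lines) + "\n"


# Toy sentence pair: "the house is" / "la maison est" (tags are illustrative).
src_tags = ["DT", "NN", "VBZ"]
tgt_tags = ["DET", "NC", "V"]
giza = [(0, 0), (1, 1), (2, 1)]        # links proposed by the aligner
reference = [(0, 0), (1, 1), (2, 2)]   # links made by the linguists

arff = to_arff(build_rows(label_links(giza, reference), src_tags, tgt_tags))
print(arff)
```

The resulting text can be saved to a .arff file and opened in Weka, where a classifier and the number of cross-validation folds are chosen.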
Results
Starting point:
80% of the alignments are good (S)
20% are bad (N)
We improved on this starting point only marginally (< 0.1%)
Our corpus isn't large enough
We can't use complex structures with Weka
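To put the numbers in perspective, here is a small sketch of the majority-class baseline. It assumes the 80% / 20% split above is the class distribution of the instances; the label list is synthetic and only reproduces those proportions.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent class and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Synthetic labels matching the 80% S / 20% N split from the slide.
labels = ["S"] * 80 + ["N"] * 20
best_label, accuracy = majority_baseline(labels)
print(best_label, accuracy)  # S 0.8
```

Under that assumption, a classifier that always answers S already scores 80%, so a gain below 0.1% means the trained model is barely distinguishable from this trivial baseline.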
Conclusion
We tried to extract information from a known set of word alignments
We didn't have enough data
A naive machine-learning approach isn't sufficient
Questions?