D. Zlatkova, D. Kopev, K. Mitov, A. Atanasov, M. Hardalov, I. Koychev
FMI, Sofia University, Bulgaria
P. Nakov
Qatar Computing Research Institute, HBKU, Doha, Qatar
The 18th International Conference on Artificial Intelligence: Methodology, Systems, Applications
Varna, Bulgaria
13 September 2018
Given a document, determine whether it contains style changes, i.e., whether it was written by a single author or by multiple authors. [Kestemont et al., 2018]
Given a document, determine whether it is multi-authored, and if yes, find the borders where authors switch. (Hard)
1. Authorship attribution:
2. Style Breach Detection:
Characters:
Words:
Sentences:
WindowDiff (lower is better)
WinPR (higher is better)
Source: [Pevzner et al., 2002; Scaiano et al., 2012]
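To make the WindowDiff idea concrete, here is a minimal sketch in Python. It slides a window of k gaps over the text and counts the windows in which the reference and the predicted segmentation disagree on the number of boundaries. The function name `window_diff`, the 0/1 gap encoding, and the choice of k are assumptions made for illustration. The normalisation follows a common gap-based formulation and may differ slightly from the one in [Pevzner et al., 2002] or from the official evaluation script. WinPR is not covered here.

```python
from typing import Sequence


def window_diff(reference: Sequence[int], hypothesis: Sequence[int], k: int) -> float:
    """Sketch of WindowDiff over 0/1 boundary flags (lower is better).

    Each sequence holds one flag per gap between adjacent text units;
    a 1 means "an author switch happens at this gap".
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of gaps")
    n = len(reference)
    if not 0 < k <= n:
        raise ValueError("window size k must satisfy 0 < k <= number of gaps")

    n_windows = n - k + 1
    errors = 0
    for start in range(n_windows):
        ref_boundaries = sum(reference[start:start + k])
        hyp_boundaries = sum(hypothesis[start:start + k])
        # A window is an error when the two segmentations disagree
        # on how many boundaries fall inside it.
        if ref_boundaries != hyp_boundaries:
            errors += 1
    return errors / n_windows


# Hypothetical example: the first predicted border is off by one gap.
reference = [0, 0, 1, 0, 0, 0, 1, 0]
near_miss = [0, 0, 0, 1, 0, 0, 1, 0]
no_borders = [0] * 8
```

With k = 2, `window_diff(reference, near_miss, 2)` is 2/7, while `window_diff(reference, no_borders, 2)` is 4/7. A border that is slightly misplaced is therefore penalised less than one that is missed entirely.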
5-fold Cross Validation on Train data
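As a rough illustration of the 5-fold protocol, the sketch below splits document indices into five folds using only the standard library. Each fold is held out once while the other four are used for training. The helper name `k_fold_indices`, the shuffling, and the fixed seed are assumptions; the split actually used for the reported results is not specified here.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_items: int, n_folds: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, held_out_indices) pairs, one per fold."""
    if not 1 < n_folds <= n_items:
        raise ValueError("need 1 < n_folds <= n_items")

    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible

    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_items, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        held_out = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, held_out
        start += size
```

Averaging the metric over the five held-out folds gives a single cross-validation score per configuration.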
[Kestemont et al., 2018] Kestemont, M., Tschuggnall, M., Stamatatos, E., Daelemans, W., Specht, G., Stein, B., Potthast, M.: Overview of the author identification task at PAN-2018: Cross-domain authorship attribution and style change detection. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum. CLEF ’18, Avignon, France (2018)
[Safin et al., 2017] Safin, K., Kuznetsova, R.: Style breach detection with neural sentence embeddings—notebook for PAN at CLEF 2017. In: CLEF 2017 Evaluation Labs and Workshop – Working Notes Papers. CLEF ’17, Dublin, Ireland (2017)
[Karaś et al., 2017] Karaś, D., Śpiewak, M., Sobecki, P.: OPI-JSA at CLEF 2017: Author Clustering and Style Breach Detection—Notebook for PAN at CLEF 2017. In: CLEF 2017 Evaluation Labs and Workshop – Working Notes Papers. CLEF ’17, Dublin, Ireland (2017)
[Kuznetsov et al., 2016] Kuznetsov, M., Motrenko, A., Kuznetsova, R., Strijov, V.: Methods for intrinsic plagiarism detection and author diarization—notebook for PAN at CLEF 2016. In: CLEF 2016 Evaluation Labs and Workshop – Working Notes Papers. CLEF ’16, Évora, Portugal (2016)
[Khan et al., 2017] Khan, J.: Style breach detection: An unsupervised detection model—notebook for PAN at CLEF 2017. In: CLEF 2017 Evaluation Labs and Workshop – Working Notes Papers. CLEF ’17, Dublin, Ireland (2017)
[Sittar et al., 2016] Sittar, A., Iqbal, H., Nawab, R.: Author Diarization Using Cluster-Distance Approach—Notebook for PAN at CLEF 2016. In: CLEF 2016 Evaluation Labs and Workshop – Working Notes Papers. CLEF ’16, Évora, Portugal (2016)
[Pevzner et al., 2002] Pevzner, L., Hearst, M.A.: A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics 28(1), 19–36 (2002)
[Scaiano et al., 2012] Scaiano, M., Inkpen, D.: Getting more from segmentation evaluation. In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 362–366. NAACL-HLT ’12, Montreal, Canada (2012)