European Literary Text Collection (ELTeC):
A multilingual collection that will ultimately contain around 2,500 full-text novels in at least 10 different languages, making it possible to test methods and compare results across national traditions.
NER
Geocoded Places
Lemmatized: в начало июль, в чрезвычайно жаркий время, под вечер, один молодой человек выходить из свой каморка, который нанимать от жилец в с -- м переулок, на улица и медленно, как бы в нерешимость, отправляться к к -- ну мост.
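Original: В начале июля, в чрезвычайно жаркое время, под вечер, один молодой человек вышел из своей каморки, которую нанимал от жильцов в С -- м переулке, на улицу и медленно, как бы в нерешимости, отправился к К -- ну мосту.
English: At the beginning of July, in an extremely hot spell, towards evening, a young man left the garret he rented from tenants in S -- m Lane, went out into the street and slowly, as if in indecision, set off towards K -- n Bridge.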
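The relationship between the lemmatized and original sentences above can be sketched as a token-by-token lookup. This is only a toy illustration: the `LEMMAS` table and the `lemmatize` function are hypothetical names, the table is hand-built from the word pairs visible in the example, and a real workflow would presumably rely on a morphological analyzer rather than a fixed dictionary.

```python
# Toy lookup table built only from word pairs in the example sentence.
# Assumption: a real lemmatizer would use a morphological analyzer instead.
LEMMAS = {
    "начале": "начало",
    "июля": "июль",
    "жаркое": "жаркий",
    "вышел": "выходить",
    "своей": "свой",
    "каморки": "каморка",
    "которую": "который",
    "нанимал": "нанимать",
    "жильцов": "жилец",
    "переулке": "переулок",
    "улицу": "улица",
    "нерешимости": "нерешимость",
    "отправился": "отправляться",
    "мосту": "мост",
}


def lemmatize(text: str) -> str:
    """Lowercase each token and replace it with its dictionary form if known."""
    result = []
    for token in text.split():
        core = token.rstrip(",.")        # word without trailing punctuation
        tail = token[len(core):]         # the punctuation that was stripped
        lemma = LEMMAS.get(core.lower(), core.lower())
        result.append(lemma + tail)
    return " ".join(result)


if __name__ == "__main__":
    print(lemmatize("В начале июля, в чрезвычайно жаркое время,"))
```

Words missing from the table are simply lowercased and passed through, which is why a lookup of this kind only works for the sentence it was built from.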
Voyant Tools: https://voyant-tools.org/
AntConc: download from https://www.laurenceanthony.net/software/antconc/
Expanding BookNLP to Spanish, German, Japanese and Russian
BookNLP (Bamman et al., 2014) is a natural language processing pipeline for reasoning about the linguistic structure of text in books, designed specifically for works of fiction. In addition to its pipeline of part-of-speech tagging, named entity recognition, and coreference resolution, BookNLP identifies the characters in a literary text and represents them through the actions they participate in, the objects they possess, their attributes, and their dialogue. The availability of this tool has driven much work in the computational humanities, especially on character (Underwood et al., 2018; Kraicer and Piper, 2018; Dubnicek et al., 2018). BookNLP has one major limitation, however: it currently supports only texts written in English. The goal of this project is to develop a version of BookNLP that supports literature in Spanish, Japanese, Russian and German, and to create a blueprint that others can follow to extend it to additional languages.
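The character representation described above (actions, possessions, attributes, dialogue) can be pictured as a simple record per character. The sketch below is an assumption made for illustration only: `CharacterProfile` and its field names are hypothetical and do not reproduce BookNLP's actual output format, and the sample values are filled in by hand from the example sentence about the young man, not produced by any tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CharacterProfile:
    """Hypothetical record grouping what a pipeline might collect per character."""
    name: str
    actions: List[str] = field(default_factory=list)      # verbs the character performs
    possessions: List[str] = field(default_factory=list)  # objects the character owns
    attributes: List[str] = field(default_factory=list)   # modifiers describing the character
    dialogue: List[str] = field(default_factory=list)     # quotations attributed to the character

    def counts(self) -> Dict[str, int]:
        """Return how many items were collected in each category."""
        return {
            "actions": len(self.actions),
            "possessions": len(self.possessions),
            "attributes": len(self.attributes),
            "dialogue": len(self.dialogue),
        }


# Hand-filled example, not tool output.
young_man = CharacterProfile(
    name="young man",
    actions=["went out", "set off"],
    possessions=["garret"],
    attributes=["young"],
)

if __name__ == "__main__":
    print(young_man.counts())
```

Keeping the four categories as separate lists makes it straightforward to compare characters across novels, for instance by counting actions per character.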