Computational Methods for Russian Literature: Current State and Future Directions

Current State

Системный Блокъ

European Literary Text Collection (ELTeC):

ELTeC is a multilingual collection that will ultimately contain around 2,500 full-text novels in at least 10 languages, making it possible to test methods and compare results across national traditions.

Prozhito

NER

Geocoded Places

Named Entity Recognition & Entity Linking

All of this is very interesting, but...

Disappointed by search?

It gets worse if you try DH methods
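One likely reason, sketched here under stated assumptions: Russian inflection spreads a single dictionary word across many surface forms, so a frequency count over raw tokens splits one word into several entries. The sample sentence below is invented for illustration, and the tokenizer is a deliberately simple regular expression rather than any particular tool's method.

```python
import re
from collections import Counter

# Hypothetical sample text (invented for illustration): the noun "мост"
# (bridge) appears in four different case forms.
text = "Он шёл к мосту. У моста стоял мост. За мостом был ещё один мост."

def tokenize(s):
    """Lowercase the text and return runs of Cyrillic letters as tokens."""
    return re.findall(r"[а-яё]+", s.lower())

counts = Counter(tokenize(text))

# Keep only the forms of "мост"; a raw count treats each as a separate word.
bridge_forms = {w: n for w, n in counts.items() if w.startswith("мост")}
print(bridge_forms)
print("total mentions:", sum(bridge_forms.values()))
```

A search for "мост" alone would find two of the five mentions here, which is the kind of gap that makes both search and frequency-based methods misleading on unlemmatized Russian text.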

Lemmatization to the rescue!

в начало июль, в чрезвычайно жаркий время, под вечер, один молодой человек выходить из свой каморка, который нанимать от жилец в с -- м переулок, на улица и медленно, как бы в нерешимость, отправляться к к -- ну мост.

В начале июля, в чрезвычайно жаркое время, под вечер, один молодой человек вышел из своей каморки, которую нанимал от жильцов в С -- м переулке, на улицу и медленно, как бы в нерешимости, отправился к К -- ну мосту.
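The mapping from inflected forms to dictionary forms can be sketched with a toy lookup table. This is only an illustration: the `LEMMAS` table and the `lemmatize` helper are hypothetical and cover just a few words from the passage above, whereas a real project would rely on a morphological analyzer (for example pymorphy2 or Mystem) instead of a hand-written dictionary.

```python
import re

# Toy form-to-lemma table covering only a few words from the sample passage.
# Hypothetical stand-in for a real morphological analyzer.
LEMMAS = {
    "начале": "начало",
    "июля": "июль",
    "жаркое": "жаркий",
    "вышел": "выходить",
    "своей": "свой",
    "каморки": "каморка",
}

def lemmatize(text):
    """Lowercase, tokenize on Cyrillic letters, and look up each token.

    Tokens missing from the table are returned unchanged.
    """
    tokens = re.findall(r"[а-яё]+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens]

sample = ("В начале июля, в чрезвычайно жаркое время, под вечер, "
          "один молодой человек вышел из своей каморки")
lemmas = lemmatize(sample)
print(" ".join(lemmas))
```

Once every token is reduced to its lemma, a search or a frequency count sees one entry per dictionary word, at the cost of discarding case, tense, and number information.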

Voyant

https://voyant-tools.org/

AntConc

Download from https://www.laurenceanthony.net/software/antconc/

When (seemingly) good text goes bad...

What about all my PDFs?

The Future

Multilingual BookNLP

Expanding BookNLP to Spanish, German, Japanese and Russian

BookNLP (Bamman et al., 2014) is a natural language processing pipeline for reasoning about the linguistic structure of text in books, specifically designed for works of fiction. In addition to its pipeline of part-of-speech tagging, named entity recognition, and coreference resolution, BookNLP identifies the characters in a literary text and represents them through the actions they participate in, the objects they possess, their attributes, and their dialogue. The availability of this tool has driven much work in the computational humanities, especially on character (Underwood et al., 2018; Kraicer and Piper, 2018; Dubnicek et al., 2018). BookNLP has one major limitation, however: it currently supports only texts written in English. The goal of this project is to develop a version of BookNLP that supports literature in Spanish, Japanese, Russian, and German, and to create a blueprint for others to extend it to additional languages in the future.

Sawchen Lecture

By Andrew Janco
