2nd Conference for Computational Literary Studies, Würzburg, 23.06.2023
Luca Giovannini — Daniil Skorinkin
University of Potsdam, Germany
This presentation: plu.sh/libretti
A new, “artificial” genre born in the early 17th century in Italy and rapidly exported across Europe
Traditional scholarly focus on music more than words
Librettology: still largely an analogue discipline
Few computational investigations
Is it possible to consider libretti a unitary genre with its own structural features?
Do libretti possess a peculiar "genre signal" which sets them apart from contemporary comedies and tragedies?
How did the structure of libretti evolve compared to the other genres?
(☞ Fischer et al. 2017, dracor.org)
Problem #1: blurred boundaries of the concept of libretto
'libretto' column:
'libretto' and 'normalized_genre' columns:
'libretto' and 'normalized_genre' were mutually exclusive (= 0 multi-label plays)
'libretto' to 'normalized_genre'
'subtitle' containing one of these labels for operatic subgenres:
Problem #2: missing genre indicators
Neither 'normalized_genre' nor 'libretto' filled
🇩🇪 +51%
🇫🇷 +55%
Vectorisation of plays according to structural features
EDA on different textual aspects (num_p, num_l, num_female_speakers)
num_of_segments, num_of_speakers, num_of_person_groups, word_count_sp, word_count_stage, average_degree, density, average_clustering, max_degree, num_of_connected_components, diameter, average_path_length
A mixture of network measures, size statistics, and speech distribution metrics
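To make the network measures concrete, here is a minimal sketch of how density, average degree and maximum degree can be derived for one play. It assumes an undirected co-presence network (two speakers are linked if they appear in the same segment); the function names and the toy play are hypothetical, and the corpus metadata already ships these values precomputed.

```python
from itertools import combinations


def copresence_edges(segments):
    """Return the set of undirected edges linking speakers who share a segment."""
    edges = set()
    for speakers in segments:
        for a, b in combinations(sorted(set(speakers)), 2):
            edges.add((a, b))
    return edges


def network_features(segments):
    """Compute a few size and network measures for one play."""
    nodes = {speaker for segment in segments for speaker in segment}
    edges = copresence_edges(segments)
    n, m = len(nodes), len(edges)
    degrees = {node: 0 for node in nodes}
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return {
        "num_of_segments": len(segments),
        "num_of_speakers": n,
        # Share of realised edges among all possible speaker pairs.
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        # Mean number of distinct co-speakers per character.
        "average_degree": 2 * m / n if n else 0.0,
        "max_degree": max(degrees.values(), default=0),
    }


# Hypothetical toy play: three segments, four speakers.
toy_play = [["A", "B"], ["B", "C", "D"], ["A", "B"]]
features = network_features(toy_play)
```

Stacking such dictionaries for every play yields the feature vectors used for clustering and classification.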
Results were unsatisfying: no meaningful clustering, no signs of libretto being a unitary genre
Semi-automatic labelling of libretti as comic/non-comic, based on their subtitles (e.g. komische Oper → comic libretto)
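A minimal sketch of what such subtitle-based labelling could look like. The keyword patterns are hypothetical stand-ins (the actual label list is not given here), and plays without a subtitle are left for manual review, which is the "semi" part of semi-automatic.

```python
import re

# Hypothetical keyword patterns; the study's actual subtitle labels are not listed here.
COMIC_RE = re.compile(r"\b(komisch\w*|comique|bouffe|buffa)\b", re.IGNORECASE)


def label_libretto(subtitle):
    """Return 'comic', 'non-comic', or 'unlabelled' (to be checked by hand)."""
    if subtitle is None or not subtitle.strip():
        return "unlabelled"
    return "comic" if COMIC_RE.search(subtitle) else "non-comic"


examples = [
    "Eine komische Oper in drei Akten",
    "Opéra-comique en un acte",
    "Tragédie lyrique",
    "",
]
labels = [label_libretto(s) for s in examples]
```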
Results: clustering still problematic, BUT significant topological patterns emerge
[Plot regions: comic space, tragic zone, non-comic libretti]
Random Forest Classifier
5-fold cross validation on all data
Iterative selection of the best n_estimators parameter (10-1000)
Removed highly correlated features (see correlation matrix)
[Plot labels: word_count_stage, word_count_sp, num_connected_components, density, num_of_speakers, diameter, word_count_sp, num_of_person_groups, average_degree]
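The classification setup can be sketched roughly as follows, assuming scikit-learn (the slides do not name the library). The data are synthetic stand-ins, the 0.9 correlation threshold and the helper names are assumptions, and the candidate grid is shortened from the 10-1000 range to keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def drop_correlated(X, names, threshold=0.9):
    """Greedily keep a feature only if |r| with every kept feature is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[k] for k in kept]


def best_n_estimators(X, y, candidates, folds=5, seed=0):
    """Return (best_n, scores), where scores maps n to mean cross-validated accuracy."""
    scores = {}
    for n in candidates:
        clf = RandomForestClassifier(n_estimators=n, random_state=seed)
        scores[n] = cross_val_score(clf, X, y, cv=folds).mean()
    return max(scores, key=scores.get), scores


# Synthetic stand-in for the play vectors: 200 "plays", 6 features, 4 classes.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
# Append a column that is a linear copy of the first one, so the filter has work to do.
X = np.column_stack([X, X[:, 0] * 2.0 + 0.01])
names = [f"f{i}" for i in range(X.shape[1])]

X_pruned, kept_names = drop_correlated(X, names)
best_n, scores = best_n_estimators(X_pruned, y, candidates=(10, 50, 100))
```

The greedy filter keeps the first feature of each correlated pair, so the outcome depends on column order; choosing which member of a pair to drop is a judgement call in practice.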
four-class implementation
plotting each play individually
LOWESS-based smoothing curves to make trends visible
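For readers unfamiliar with LOWESS, the idea can be sketched in a few lines of NumPy: for each point, fit a straight line to its nearest neighbours, weighted by a tricube kernel, and keep the fitted value. This is a simplified version without the robustness iterations of full LOWESS; the function name, the `frac` default and the toy data are assumptions, and a library routine such as `statsmodels.nonparametric.smoothers_lowess.lowess` would normally be used for actual plots.

```python
import numpy as np


def lowess(x, y, frac=0.5):
    """Locally weighted linear regression (tricube kernel, no robustness iterations)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighbours used per local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Bandwidth: distance to the k-th nearest point (guarded against ties at zero).
        h = max(np.sort(dist)[k - 1], np.finfo(float).eps)
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3
        sw = np.sqrt(w)
        # Weighted least squares for y ~ a + b * (x - x_i); the intercept a is the fit at x_i.
        design = np.column_stack([np.ones(n), x - x[i]]) * sw[:, None]
        coef, *_ = np.linalg.lstsq(design, y * sw, rcond=None)
        fitted[i] = coef[0]
    return fitted


# Hypothetical toy data: one value per play (e.g. a stage-direction ratio) by year.
rng = np.random.default_rng(0)
years = np.sort(rng.integers(1650, 1900, size=80)).astype(float)
values = 0.05 + 0.0004 * (years - 1650) + rng.normal(0.0, 0.03, size=80)
trend = lowess(years, values, frac=0.5)
```

Plotting `values` as individual dots and `trend` as a line over `years` gives the kind of figure described above, one curve per genre.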
Libretti have consistently less spoken text and more stage directions
trend more prominent in French, but visible also in German
🇩🇪: num_groups / word_count_sp
🇫🇷: density / num_speakers
it is easier to confuse comedies and comic libretti
Even the two types of French libretti are more distinct than the German ones
Comparison: topic modelling (Schöch 2017)
Individual structural features might be useful for distinguishing libretti from non-libretti (e.g. text length), or comedies from tragedies (density)
However, it is generally not easy to distinguish between plays formalised as vectors of multiple features
Drama often seems too homogeneous, in terms of structural properties, for discriminative clustering
Need to employ better features or rethink operationalisation patterns