An experiment on DraCor Corpora
Luca Giovannini
Daniil Skorinkin
Workshop version - 09.02.2023
A new, “artificial” genre born in the early 17th c. in Italy, and rapidly exported across Europe
Traditionally: focus on music more than words
Librettology: still an analogue discipline
Few computational investigations
Is it possible to consider libretti a unitary genre with its own structural features?
Do libretti possess a peculiar "genre signal" which sets them apart from contemporary comedies and tragedies?
How did they structurally evolve in comparison to the other genres?
Data preparation
Feature selection
Data exploration
Results and discussion
'libretto' column:
'libretto' and 'normalized_genre' columns:
'libretto' and 'normalized_genre' were mutually exclusive (= 0 multi-label plays)
'libretto or genre' column
'libretto' over 'normalized genre'
'libretto or genre' stats:
'subtitle' containing one of these labels for operatic subgenres:
Problem #1: blurred boundaries for the concept of opera
neither 'normalized_genre' nor 'libretto' filled
Problem #2: missing genres
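The label merge described in this step (a single 'libretto or genre' value, with 'libretto' taking precedence over 'normalized genre', plus a subtitle check for operatic subgenres) could be sketched roughly as follows. This is only an illustration: the subgenre labels in `OPERATIC_LABELS`, the record layout, and the sample plays are assumptions, since the original list of labels is not preserved here.

```python
# Hypothetical subgenre labels; the actual list used in the experiment is not known.
OPERATIC_LABELS = ("opera", "operetta", "singspiel")


def libretto_or_genre(play):
    """Return one label per play record (a dict of metadata fields).

    'libretto' wins over 'normalized_genre'; None means neither was filled.
    """
    subtitle = (play.get("subtitle") or "").lower()
    is_libretto = bool(play.get("libretto")) or any(
        label in subtitle for label in OPERATIC_LABELS
    )
    if is_libretto:
        return "libretto"
    return play.get("normalized_genre") or None


# Made-up records, for illustration only.
plays = [
    {"libretto": True, "normalized_genre": None, "subtitle": ""},
    {"libretto": False, "normalized_genre": "Comedy", "subtitle": "Lustspiel"},
    {"libretto": False, "normalized_genre": None, "subtitle": "Singspiel in zwei Aufzügen"},
    {"libretto": False, "normalized_genre": None, "subtitle": "Schauspiel"},
]
labels = [libretto_or_genre(p) for p in plays]
print(labels)
```

Records that come back as `None` correspond to the "missing genres" problem: neither field gives a usable label.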
(num_p, num_l, num_female_speakers)
edge = high correlation (> 0.75 or < -0.75)
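The edge rule above (connect two features when their correlation exceeds 0.75 in absolute value) can be illustrated with a small pure-Python sketch. Pearson correlation is assumed here, since the coefficient actually used is not stated, and the feature values are invented for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_edges(features, threshold=0.75):
    """Return (feature_a, feature_b, r) for every pair with |r| > threshold."""
    edges = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) > threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Invented values; only the column names come from the slides.
features = {
    "num_p": [10, 20, 30, 40, 50],
    "num_l": [12, 19, 33, 41, 48],
    "num_female_speakers": [3, 1, 4, 1, 5],
}
edges = correlation_edges(features)
print(edges)
```

Features that end up linked in such a graph carry largely redundant information, which is one common reason to keep only one of each linked pair.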
average_path_length
diameter
max_degree
num_connected_components
num_of_segments
average_path_length
max_degree
comic space
tragic zone
non-comic libretti
autonomous region
Periods: 1770–1819, 1820–1869, 1870–1921
Periods: 1620–1669, 1670–1719, 1720–1769, 1770–1819, 1820–1889
word_count_stage word_count_sp
Random Forest Classifier
5-fold cross validation on all data
Iterative selection of the best n_estimators parameter (10-1000)
Looking at feature importances
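The classification steps listed above might look roughly like this with scikit-learn. This is a sketch under assumptions, not the authors' pipeline: the data is synthetic (`make_classification`), the feature names are borrowed from the slides purely as column labels, and the `n_estimators` grid is much coarser than the 10-1000 range mentioned, to keep the example quick.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Column labels borrowed from the slides; the values below are synthetic.
FEATURES = [
    "word_count_stage", "word_count_sp", "num_connected_components",
    "density", "num_of_speakers", "diameter",
]

X, y = make_classification(
    n_samples=200, n_features=len(FEATURES),
    n_informative=3, n_redundant=0, random_state=0,
)

# 5-fold cross-validation for each candidate number of trees.
scores = {}
for n in (10, 50, 100):  # assumed coarse grid
    clf = RandomForestClassifier(n_estimators=n, random_state=0)
    scores[n] = cross_val_score(clf, X, y, cv=5).mean()

best_n = max(scores, key=scores.get)

# Refit on all data with the best setting and rank features by importance.
best = RandomForestClassifier(n_estimators=best_n, random_state=0).fit(X, y)
ranking = sorted(
    zip(FEATURES, best.feature_importances_),
    key=lambda pair: pair[1], reverse=True,
)
for name, importance in ranking:
    print(f"{name:28s} {importance:.3f}")
```

The impurity-based importances returned by `feature_importances_` sum to 1, so they are best read as relative rankings within one model rather than compared across corpora.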
word_count_stage
word_count_sp
num_connected_components
density
num_of_speakers
diameter
word_count_sp
num_of_person_groups
average_degree
four-class implementation
plotting each play individually
LOWESS-based smoothing curves to make trends visible
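To show what a LOWESS trend line does to a per-play scatter, here is a simplified pure-Python version: a tricube-weighted local linear fit at each point, without the robustness iterations of full LOWESS. The function name, the `frac` value, and the data points are assumptions made for the example; a library implementation was presumably used for the actual plots.

```python
from math import ceil


def lowess(xs, ys, frac=0.5):
    """Simplified LOWESS: local linear fit with tricube weights at each x."""
    n = len(xs)
    k = min(n, max(2, ceil(frac * n)))  # neighbours used for each local fit
    fitted = []
    for x0 in xs:
        dx = [x - x0 for x in xs]  # centre on x0 for numerical stability
        h = sorted(abs(d) for d in dx)[k - 1] or 1.0  # local bandwidth
        w = [(1 - min(abs(d) / h, 1.0) ** 3) ** 3 for d in dx]
        sw = sum(w)
        sd = sum(wi * d for wi, d in zip(w, dx))
        sdd = sum(wi * d * d for wi, d in zip(w, dx))
        sy = sum(wi * y for wi, y in zip(w, ys))
        sdy = sum(wi * d * y for wi, d, y in zip(w, dx, ys))
        denom = sw * sdd - sd * sd
        if denom > 1e-12:
            # Intercept of the weighted least-squares line, i.e. the fit at x0.
            fitted.append((sy * sdd - sd * sdy) / denom)
        else:
            # Too few effective points for a line: fall back to weighted mean.
            fitted.append(sy / sw)
    return fitted


# Invented data: one value per play, by year of publication.
years = [1700, 1710, 1720, 1730, 1740, 1750, 1760, 1770, 1780, 1790]
word_count_sp = [9000, 12000, 8000, 11000, 15000, 10000, 14000, 17000, 13000, 18000]
trend = lowess(years, word_count_sp, frac=0.6)
print([round(v) for v in trend])
```

A larger `frac` gives a smoother curve that follows long-term trends; a smaller one tracks local fluctuations more closely.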
Libretti have less spoken text and more stage directions
trend more prominent in French, but visible also in German
🇩🇪: num_groups / word_count_sp
🇫🇷: density / num_speakers
it is easier to confuse comedies and comic libretti
it is easier to confuse comedies and comic libretti
the two types of French libretti are more distinct than the German ones
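Statements like these are typically read off a confusion matrix. The sketch below shows how such a matrix and the most-confused class pair can be computed; the four class names are an assumed reading of the "four-class implementation", and the true and predicted labels are invented, not results from the experiment.

```python
from collections import Counter

# Assumed class set for the four-class setup.
CLASSES = ["comedy", "tragedy", "comic libretto", "non-comic libretto"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def most_confused_pair(matrix, classes=CLASSES):
    """Return (class_a, class_b, errors) for the pair mixed up most often."""
    best = None
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            errors = matrix[i][j] + matrix[j][i]
            if best is None or errors > best[2]:
                best = (classes[i], classes[j], errors)
    return best


# Invented labels, for illustration only.
y_true = ["comedy", "comedy", "comedy", "tragedy", "tragedy",
          "comic libretto", "comic libretto", "non-comic libretto"]
y_pred = ["comedy", "comic libretto", "comic libretto", "tragedy",
          "non-comic libretto", "comedy", "comic libretto", "non-comic libretto"]

matrix = confusion_matrix(y_true, y_pred)
pair = most_confused_pair(matrix)
print(matrix)
print(pair)
```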
Individual structural features might be useful for distinguishing libretti from non-libretti (e.g. text length), or comedies from tragedies (density)
However, it is not easy to distinguish between plays formalised as vectors of multiple features
Drama seems too homogeneous, in terms of structural properties, for discriminative clustering
Topic modelling actually seems to work better at distinguishing genres, as per Shaw's famous quote