An experiment on DraCor Corpora
Luca Giovannini
Daniil Skorinkin
Workshop version - 09.02.2023
A new, “artificial” genre born in the early 17th c. in Italy, and rapidly exported across Europe
Traditionally: focus on music more than words
Librettology: still an analogue discipline
Few computational investigations
Is it possible to consider libretti a unitary genre with its own structural features?
Do libretti possess a peculiar "genre signal" which sets them apart from contemporary comedies and tragedies?
How did they structurally evolve in comparison to the other genres?
Data preparation
Features selection
Data exploration
Results and discussion
'libretto' column:
'libretto' and 'normalized_genre' columns:
'libretto' and 'normalized_genre' were mutually exclusive (= 0 multi-label plays)
'libretto or genre' column
'libretto' over 'normalized_genre'
'libretto or genre' stats:
'subtitle' containing one of these labels for operatic subgenres:
Problem #1: blurred boundaries for the concept of opera
normalized_genre nor libretto filled
Problem #2: missing genres
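The label-merging step can be sketched with pandas as follows. This is only an illustration under assumptions: the toy rows are invented, 'libretto' is assumed to be a boolean column, and the helper name merge_labels is hypothetical. It gives 'libretto' priority over 'normalized_genre', counts multi-label plays, and lists plays with neither label.

```python
import pandas as pd

# Toy metadata (invented rows); real data would come from the corpus metadata.
plays = pd.DataFrame({
    "name": ["play_a", "play_b", "play_c", "play_d"],
    "libretto": [True, False, False, False],          # assumed boolean flag
    "normalized_genre": [None, "Comedy", "Tragedy", None],
})

def merge_labels(df):
    """Hypothetical helper: one label per play, 'libretto' takes priority."""
    # Keep the genre where the play is not a libretto, else write "Libretto".
    return df["normalized_genre"].where(~df["libretto"], "Libretto")

plays["libretto or genre"] = merge_labels(plays)

# Plays carrying both labels (the slides report 0 such plays).
multi_label = int((plays["libretto"] & plays["normalized_genre"].notna()).sum())

# Plays with neither label filled (the "missing genres" problem).
unlabelled = plays[plays["libretto or genre"].isna()]
```

On real data, the same three checks (merged label, multi-label count, unlabelled plays) would give the 'libretto or genre' statistics.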
num_p, num_l, num_female_speakers)
edge = high correlation (> 0.75 or < -0.75)
average_path_length, diameter, max_degree, num_connected_components
num_of_segments, average_path_length, max_degree
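A minimal sketch of how such a correlation graph could be derived: compute pairwise correlations between features and keep an edge wherever the absolute value exceeds 0.75. The data below are synthetic, the Pearson coefficient is an assumption (the slides do not name the coefficient), and correlation_edges is a hypothetical helper.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic features: max_degree is built to track num_of_speakers closely,
# density is drawn independently.
num_speakers = rng.integers(2, 40, size=n).astype(float)
features = pd.DataFrame({
    "num_of_speakers": num_speakers,
    "max_degree": 0.8 * num_speakers + rng.normal(0, 1, n),
    "density": rng.uniform(0, 1, n),
})

def correlation_edges(df, threshold=0.75):
    """Hypothetical helper: list feature pairs with |r| above the threshold."""
    corr = df.corr()  # Pearson by default (an assumption for this sketch)
    cols = list(corr.columns)
    edges = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = float(corr.loc[a, b])
            if abs(r) > threshold:
                edges.append((a, b, round(r, 3)))
    return edges

edges = correlation_edges(features)
```

Connected groups in the resulting edge list point to redundant features, from which one representative per group can be kept.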
comic space
tragic zone
non-comic libretti
autonomous region
1770–1819
1820–1869
1870–1921
1620–1669
1670–1719
1720–1769
1770–1819
1820–1889
word_count_stage word_count_sp
Random Forest Classifier
5-fold cross validation on all data
Iterative selection of the best n_estimators parameter (10-1000)
Looking at feature importances
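The classification steps above can be sketched with scikit-learn. This is a hedged illustration, not the authors' pipeline: the data are synthetic (make_classification), the feature names are borrowed from the slides only as labels, the candidate grid is much smaller than the 10-1000 range to keep the run short, and pick_n_estimators is a hypothetical helper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic two-class data standing in for the play-level feature table.
feature_names = ["word_count_sp", "word_count_stage", "density", "diameter"]
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

def pick_n_estimators(X, y, candidates):
    """Hypothetical helper: mean 5-fold CV accuracy per n_estimators value."""
    scores = {}
    for n in candidates:
        clf = RandomForestClassifier(n_estimators=n, random_state=0)
        # cv=5 uses stratified 5-fold splitting for classifiers.
        scores[n] = cross_val_score(clf, X, y, cv=5).mean()
    best_n = max(scores, key=scores.get)
    return best_n, scores

best_n, scores = pick_n_estimators(X, y, candidates=[10, 50, 100])

# Refit on all data with the selected value and rank the features.
final = RandomForestClassifier(n_estimators=best_n, random_state=0).fit(X, y)
importances = sorted(
    zip(feature_names, final.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
```

The importances are impurity-based and sum to 1, so they are best read as a relative ranking rather than as effect sizes.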
word_count_stage, word_count_sp
num_connected_components, density
num_of_speakers, diameter, word_count_sp, num_of_person_groups, average_degree
four-class implementation
plotting each play individually
LOWESS-based smoothing curves to make trends visible
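To illustrate the smoothing idea, here is a simplified, hand-rolled LOWESS in NumPy: a local linear fit with tricube weights around each point, without the robustness iterations of the full algorithm. The data are synthetic and lowess_smooth is a hypothetical helper. For real analyses a library implementation (for example the one in statsmodels) is likely the better choice.

```python
import numpy as np

def lowess_smooth(x, y, frac=0.5):
    """Simplified LOWESS: local linear fits with tricube weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))        # neighbours used per local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Bandwidth: distance to the k-th nearest point (guarded against 0).
        h = max(np.sort(dist)[k - 1], np.finfo(float).eps)
        w = np.clip(1 - (dist / h) ** 3, 0, None) ** 3   # tricube weights
        sw = np.sqrt(w)
        # Weighted least squares for a local line centred on x[i].
        A = np.column_stack([np.ones(n), x - x[i]]) * sw[:, None]
        coef, *_ = np.linalg.lstsq(A, y * sw, rcond=None)
        fitted[i] = coef[0]                   # value of the local line at x[i]
    return x, fitted

# Synthetic example: one point per play, with a downward trend over time.
rng = np.random.default_rng(1)
years = rng.integers(1620, 1920, size=150).astype(float)
word_count_sp = 20000 - 20 * (years - 1620) + rng.normal(0, 2000, size=150)

trend_years, trend_values = lowess_smooth(years, word_count_sp, frac=0.5)
```

Plotting the individual plays as a scatter and overlaying (trend_years, trend_values) as a line gives the kind of trend curve described above. A larger frac yields a smoother curve.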
Libretti have less spoken text and more stage directions
trend more prominent in French, but visible also in German
🇩🇪: num_groups / word_count_sp
🇫🇷: density / num_speakers
it is easier to confuse comedies and comic libretti
the two types of French libretti are more distinct than the German ones
Individual structural features might be useful for distinguishing libretti from non-libretti (e.g. text length), or comedies from tragedies (density)
However, it is not easy to distinguish between plays formalised as vectors of multiple features
Drama seems too homogeneous, in terms of structural properties, for discriminative clustering
Topic modelling actually seems to work better at distinguishing genres, as per Shaw's famous quote