Investigating the Evolution of Early Modern European Drama with Digital Methods

Luca Giovannini

(Potsdam)

Ringvorlesung Digital Humanities im Fokus: Methoden, Anwendungen und Perspektiven

Rostock, 28.04.2025

Today

  • Theory
  • Corpus
  • Methods
  • Experiments
  • Conclusions

Theory

Research question

How did the different European dramatic literatures develop their own distinctive features during the early modern era?

Corneille, Cinna (1639)

Shirley, The Gentleman of Venice (1639)

A possible approach

Franco Moretti

"Modern European literature: A geographical sketch"

New Left Review 206 (1994), 86-86

Core argument

According to Moretti, the evolutionary mechanism of literary history resembles the biological one:

  • New animal and plant species arise when populations move into new spaces 🦜
  • New literary forms arise from the new political-geographical spaces that emerge throughout European history 📚

Speciationist model (Darwin's finches)

According to Moretti, the same process of speciation happened in European tragedy.

[Figure: timeline from 1400 to 1800 showing a common model of European tragedy (based on Senecan and medieval heritage) and the national traditions 🇫🇷 🇩🇪 🇮🇹 🇬🇧 🇪🇸]

Discontinuous, fractured, the European space functions as a sort of archipelago of (national) sub-spaces, each of them specializing in one formal variation.

(Moretti 2013 [1994]: 12)

Competing explanations

  • Other scholars emphasise unity over diversity:
    • Küpper (2018): early modern drama as a cultural net in which textual elements such as plots, characters, and motifs circulate and are periodically extracted, reworked and reused in individual plays
    • Clubb (1990): diffusion of theatergrams

'Irregular' theatres 🇪🇸 🇬🇧

  • influenced by medieval theatre practices
  • emphasis on performance
  • looser structure

'Regular' theatres 🇮🇹 🇫🇷

  • associated with humanist theatre
  • meant mostly to be read
  • codified structure after Aristotle

Early modern drama aesthetics

 

According to Lotman's typology of culture:

  • irregular theatres → example-based culture
  • regular theatres → rule-based culture

How to empirically test the evolution of early modern drama?

Corpus

Corpus parameters

  • 150 plays in five languages (🇮🇹 🇫🇷 🇪🇸 🇩🇪 🇬🇧)

  • time span: 1561-1710 (150 years)

  • purposefully non-canonical approach

  • balance between representativeness and practicability

Birth locations for the corpus authors (via Wikidata, incomplete)

  • DraCor (Drama Corpora) is an open access platform for research on dramatic literature.
  • Currently: 21 “programmable corpora” in 16 different languages; more than 4000 texts in XML-TEI format.
  • Applications and tools for computational literary studies (CLS) + easy API access (e.g. computing network metrics and speaker distribution, SPARQL searches on Linked Open Data, etc.)
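
As a rough illustration of what "easy API access" can look like, the sketch below builds the URL of a per-play metrics resource and downloads it as JSON. The base URL and the path layout (`/corpora/{corpus}/plays/{play}/metrics`) are assumptions based on version 1 of the public API and should be checked against the current API documentation; the corpus and play identifiers are hypothetical placeholders.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the public DraCor API (version 1).
API_BASE = "https://dracor.org/api/v1"


def metrics_url(corpus: str, play: str, base: str = API_BASE) -> str:
    """Build the URL of the (assumed) per-play metrics endpoint."""
    return f"{base}/corpora/{quote(corpus)}/plays/{quote(play)}/metrics"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Download and parse a JSON document; requires network access."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical identifiers, used only for illustration.
    url = metrics_url("example-corpus", "example-play")
    print(url)
    # metrics = fetch_json(url)  # uncomment when online
```

Keeping URL construction separate from the download makes the first part checkable offline; the same `fetch_json` helper could point at a local DraCor instance by passing a different `base`.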

Homepage: dracor.org

Paper: Fischer et al. 2019

Tour: bit.ly/einfdrac

A model for corpus construction: DraCor

DraCor Summit

Berlin, 01-05.09.2025

Tutorials, barcamp, round tables & keynotes, corpus conference, workshop on computational drama analysis, and much more!

⚠️ CfPs open until 06.05 ⚠️

👉 summit.dracor.org 👈

Text onboarding pipeline

Integrating the corpus within the DraCor environment

  • Creating offline custom DraCor corpora from local TEI/XML files using Docker
    • Enables use of all DraCor tools for CLS analysis
    • Ensures reproducibility of results

Early Modern Drama Corpus (EmDraCor)

Homepage

Methods

Operationalising drama

  • Identifying key components of drama:
    • dialogue
    • characters
    • plot
  • Surveying popular options for computational drama analysis (after Willand et al. 2017):
    • topic modelling of dramatic genres
    • exploration of network structures
    • analysis of character speeches (stylometry).
  • Here: a mix of quantitative formalist approaches
    • content- and language-agnostic
    • form-oriented
  • Boris Yarkho
    • Member of the Moscow Formalist Circle
    • Methodology for a Precise Science of Literature (*2006)
  • Solomon Marcus
    • Mathematical Poetics (1970)
  • Hartmut Ilsemann
    • “Computerized Drama Analysis” (1995)
    • DRAMANALYS.exe

Models and forerunners

"Measurements can be taken of any quantifiable aspect of a text, but figuring out the significance of that metric to an understanding of the text, or better, mapping that metric onto a preexisting critical concept (such as style, plot, or theme), is crucial to making sense of what is being measured" (Algee-Hewitt 2017: 759)

Identifying suitable metrics to be extracted from EmDraCor texts

Drama metrics

collected via the DraCor API or re-computed independently

(following Algee-Hewitt 2017, Szemes and Vida 2024, Trilcke et al. 2017)

Type: Features
  • Network: Size; Average clustering coefficient; Density; Average path length; Average degree; Diameter; Maximum degree; Number of edges; Number of connected components; Ratio, average degree to maximum degree; Ratio, maximum degree to number of characters; Weighted degree distribution; Protagonism; Mediateness
  • Cast and speech: Average characters per scene; Average length of character speech; Speech intensity; Gendered speakers; Collective speakers
  • Size: Number of acts; Number of segments; Word count, whole text; Word count, spoken text; Word count, stage directions; Number of prose lines; Number of verse lines
  • Plot: All-in index; Final scene size; Drama change rate
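
To make a few of the network features concrete, here is a minimal sketch that derives size, number of edges, density, average degree and maximum degree from a list of scenes. It assumes an undirected, unweighted co-presence network (two speakers are linked if they share at least one scene); the toy scenes are invented, and the function names are illustrative rather than part of any DraCor tool.

```python
from itertools import combinations


def cooccurrence_edges(scenes):
    """Return the set of undirected edges between speakers sharing a scene."""
    edges = set()
    for speakers in scenes:
        # Sorting makes (a, b) and (b, a) collapse into one edge.
        for a, b in combinations(sorted(set(speakers)), 2):
            edges.add((a, b))
    return edges


def network_metrics(scenes):
    """Compute size, edge count, density and degree statistics."""
    nodes = {s for speakers in scenes for s in speakers}
    edges = cooccurrence_edges(scenes)
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n, m = len(nodes), len(edges)
    return {
        "size": n,
        "num_edges": m,
        # Share of realised edges among all possible speaker pairs.
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "average_degree": 2 * m / n if n else 0.0,
        "max_degree": max(degree.values(), default=0),
    }


# Toy play: each inner list holds the speakers of one scene (invented data).
toy_scenes = [["A", "B"], ["B", "C", "D"], ["A", "D"]]
metrics = network_metrics(toy_scenes)
```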

Desiderata

  • go beyond the comparison of single measures
  • capture different structural dimensions of drama in a holistic fashion

Solution

  • collect metrics in feature vectors or play embeddings, i.e., multidimensional vectors containing a variety of measures embodying different textual aspects of the dramatic text

example_play_1 = {10, 2, 0.5714, 3, 16792, ...}

example_play_2 = {26, 6, 0.3422, 2, 40098, ...}

                num_speakers  num_speakers_groups  density (SNA)  avg_degree (SNA)  word_count_sp
Example_play_1  10            2                    0.5714         3                 16792          ...
Example_play_2  26            6                    0.3422         2                 40098          ...

How vectors work
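
A minimal sketch of how such feature vectors might be prepared for comparison, using the first five features of the two example plays. Because the features live on very different scales (a word count in the tens of thousands next to a density below 1), the sketch z-scores each feature across plays; this standardisation step is an assumption added here for illustration and is not stated on the slides.

```python
from statistics import mean, pstdev

FEATURES = ["num_speakers", "num_speakers_groups", "density",
            "avg_degree", "word_count_sp"]

# The two example plays from the table above (first five features only).
plays = {
    "Example_play_1": [10, 2, 0.5714, 3, 16792],
    "Example_play_2": [26, 6, 0.3422, 2, 40098],
}


def standardise(vectors):
    """Z-score each feature (column) across all plays.

    Features with zero variance are mapped to 0.0.
    """
    names = list(vectors)
    columns = list(zip(*(vectors[n] for n in names)))
    scaled_columns = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        scaled_columns.append([(x - mu) / sigma if sigma else 0.0 for x in col])
    rows = zip(*scaled_columns)
    return {name: list(row) for name, row in zip(names, rows)}


scaled = standardise(plays)
```

Without some form of rescaling, the word count would dominate any distance computed between the two vectors.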

Methodological reflection

  • drama vectorisation as an expression of the 'cultural technique of flattening' (Krämer 2023) typical of DH
    • not as a reductionist attempt to oversimplify the complexity of theatrical texts
    • rather as a continuation of formalist morphological thinking with modern computational methods
  • interplay between operationalisation and flattening

Experiments

Why vectorisation?

  • offers a pathway to empirically testing, and potentially falsifying, Moretti's account of the progressive diversification of European early modern drama
  • allows an investigation of “how dramatic forms change and are preserved within a framework of cultural evolution” (Szemes and Vida 2024: 18)

Ground hypothesis

  • distances between vectors can be assumed to express the degree of (dis)similarity between them
    • two plays whose vectors are far apart will also differ in terms of form
  • the branching of dramatic traditions can be seen through the ’progressive distancing’ of the plays’ vectors

1. Distances

Options for measuring distances between vectors

  • pairwise
  • centroid-based

First implementation: pairs

  1. Divide the 150 play embeddings into groups according to their normalised year (30- and 15-year timespans).
  2. Within each group, calculate the Euclidean distance between each pair of plays.
  3. Take the average of these distances to represent the degree of 'formal difference' within each time frame, and follow its evolution across time frames.
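
The three steps can be sketched as follows. The grouping starts at 1561 (the first year of the corpus time span) and uses 30-year timespans by default; the function names and the two-dimensional toy vectors are invented for illustration, and real play vectors would have one dimension per metric.

```python
from itertools import combinations
from math import dist
from statistics import mean


def period_start(year, first_year=1561, span=30):
    """Map a (normalised) year to the first year of its timespan."""
    return first_year + ((year - first_year) // span) * span


def mean_pairwise_distance(vectors):
    """Average Euclidean distance over all pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return None  # fewer than two plays: no distance defined
    return mean(dist(a, b) for a, b in pairs)


def distances_by_period(plays, span=30):
    """plays: iterable of (year, vector). Returns {period_start: mean distance}."""
    groups = {}
    for year, vector in plays:
        groups.setdefault(period_start(year, span=span), []).append(vector)
    return {start: mean_pairwise_distance(vs)
            for start, vs in sorted(groups.items())}


# Invented two-dimensional vectors, for illustration only.
toy_plays = [
    (1565, [0.0, 0.0]),
    (1570, [3.0, 4.0]),
    (1600, [0.0, 0.0]),
    (1610, [6.0, 8.0]),
    (1615, [0.0, 8.0]),
]
result = distances_by_period(toy_plays)
```

Calling `distances_by_period(toy_plays, span=15)` gives the finer 15-year grouping; in a per-language analysis the same function would be applied to each sub-corpus separately.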

The lighter the colour, the more the plays in a group differ in formal terms

Second implementation: centroids
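
The slides do not spell out the centroid-based procedure, so the following is only one plausible reading: each group of plays (for instance a national sub-corpus within one timespan) is summarised by its centroid, the component-wise mean of its vectors, and the Euclidean distances between centroids are then compared. The group labels and vectors are invented.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def centroid_distances(groups):
    """groups: {label: [vector, ...]}. Returns {(label_a, label_b): distance}."""
    centres = {label: centroid(vs) for label, vs in groups.items()}
    labels = sorted(centres)
    return {
        (a, b): dist(centres[a], centres[b])
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    }


# Invented vectors for two hypothetical sub-corpora within one timespan.
toy_groups = {
    "eng": [[0.0, 0.0], [2.0, 0.0]],
    "fre": [[4.0, 4.0], [4.0, 4.0]],
}
distances = centroid_distances(toy_groups)
```

Compared with the pairwise approach, centroids need far fewer distance computations but hide the spread of plays within each group.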

2. Clusters

1. Representing play vectors as points on a Cartesian plane via dimensionality reduction (here: PCA)

2. Visually identifying clusters based on formal/structural similarities

Some clustering seems to emerge towards the end of the period, but the clusters are not yet sharply defined

Plays are plotted in successive 30-year timeframes
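
For readers unfamiliar with PCA, the sketch below projects a handful of vectors onto their first two principal components using only the standard library. It finds the components by power iteration with deflation on the covariance matrix, which is a teaching device chosen here and not necessarily how the plots in the talk were produced; in practice a numerical library routine would normally be used. The toy vectors are invented.

```python
import random
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, v):
    return [dot(row, v) for row in matrix]


def centre(rows):
    """Subtract the column means from every row."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centred):
    """Sample covariance matrix of already centred rows."""
    n, d = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def dominant_eigenvector(matrix, iterations=500, seed=0):
    """Approximate the eigenvector of the largest eigenvalue (power iteration)."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    norm = sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = sqrt(dot(w, w))
        if norm == 0:
            break  # matrix has no variance left in any direction
        v = [x / norm for x in w]
    return v


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    data = centre(rows)
    cov = covariance(data)
    pc1 = dominant_eigenvector(cov)
    eigenvalue1 = dot(pc1, matvec(cov, pc1))
    d = len(cov)
    # Deflation: remove the first component so the second becomes dominant.
    deflated = [[cov[i][j] - eigenvalue1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2 = dominant_eigenvector(deflated, seed=1)
    return [[dot(row, pc1), dot(row, pc2)] for row in data]


# Invented three-feature vectors; the third feature is constant.
toy_vectors = [[2.0, 1.0, 5.0], [4.0, -1.0, 5.0],
               [6.0, -1.0, 5.0], [8.0, 1.0, 5.0]]
points = pca_2d(toy_vectors)
```

Each play ends up as an (x, y) pair that can be plotted per timeframe and coloured by language; note that the sign of each axis is arbitrary in PCA.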

3. Patterns

Approach

  • Focus on each individual metric and follow its evolution across the different sub-corpora
  • Compute shifts in absolute value for each metric across the time span
  • Verify whether the variation of a given feature has become highly distinctive of a national tradition against the others
    • Caveat: not all metric variations are significant!
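
One simple way to read "shift across the time span" is the difference between a metric's mean in the last and in the first timespan, computed separately for each sub-corpus. The sketch below follows that reading, which is an assumption; the record layout, the function name and all values are invented, and it does not test whether a shift is statistically significant.

```python
from statistics import mean


def metric_shift(records, metric, first_span, last_span):
    """Difference between a metric's mean in the last and the first timespan.

    records: list of dicts with keys "lang", "year", and metric names.
    first_span / last_span: (start_year, end_year) tuples, inclusive.
    Returns {lang: shift}; languages lacking plays in either span are skipped.
    """
    def span_mean(lang, span):
        values = [r[metric] for r in records
                  if r["lang"] == lang and span[0] <= r["year"] <= span[1]]
        return mean(values) if values else None

    shifts = {}
    for lang in sorted({r["lang"] for r in records}):
        early = span_mean(lang, first_span)
        late = span_mean(lang, last_span)
        if early is not None and late is not None:
            shifts[lang] = late - early
    return shifts


# Invented values for illustration; not taken from EmDraCor.
toy_records = [
    {"lang": "eng", "year": 1570, "density": 0.30},
    {"lang": "eng", "year": 1580, "density": 0.40},
    {"lang": "eng", "year": 1700, "density": 0.60},
    {"lang": "fre", "year": 1575, "density": 0.50},
    {"lang": "fre", "year": 1695, "density": 0.25},
]
shifts = metric_shift(toy_records, "density", (1561, 1590), (1681, 1710))
```

A metric whose shift has a different sign or a much larger magnitude in one language than in the others is a candidate for a tradition-specific trend, subject to the significance caveat above.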

First steps towards a quantitative profiling of dramatic traditions

  • 🇬🇧 : enhanced network connectedness, dispersion of the protagonist role (cf. Algee-Hewitt 2017), increase in female cast, progressive implementation of French liaison des scènes
  • 🇩🇪 : shift towards sparser networks, increase in segmentation (Neoclassical influence)
  • 🇪🇸 : expanding and more intricately connected dramatic networks, absence of central mediating figures, increase in non-gendered speakers, decreasing drama change rate
  • 🇮🇹 : progressive concentration of the protagonist role in one or few characters, increase of stage directions (CdA)
  • 🇫🇷 : streamlined network models, more stability in stage configurations

Evolutionary trends within the subcorpora

A reproduction experiment

  • Concerns about reliability of results → attempt at reproduction (same methods, similar but not unrelated data, cf. Schöch 2023)
  • Data: EngDraCor (source: Early Print Project) and FreDraCor (source: Théâtre Classique)
  • Method: vectorisation and distance experiments  
  • Goal: compare results to those obtained on the corresponding EmDraCor subsets (EmDraCor-eng and EmDraCor-fre)

Results

  • Trends correspondence: 🇬🇧: 62%, 🇫🇷: 30%
    • due to small-sample bias, different markup quality
    • divergence is however limited (±5%)
  • Generally, trends recognised in the smaller EmDraCor sub-corpora are amplified versions of patterns found in the broader ones (EngDraCor/FreDraCor)
  • Results from EmDraCor are not fully replicated, but they can still highlight trends in the formal development of European theatre
  • Just like in EmDraCor, compressing plays into meaningful vectors of formal features shows relevant differences between the larger EngDraCor and FreDraCor.
  • Even a simple PCA underscores the ’regularity’ of French theatre as against the ’irregularity’ of the English one.
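
The slides do not define how "trends correspondence" was calculated. As a hedged illustration only, the sketch below uses one possible measure: the percentage of shared metrics whose shift points in the same direction in two corpora. The function names and the shift values are invented and do not reproduce the figures reported above.

```python
def sign(x):
    """Return 1, -1 or 0 according to the sign of x."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0


def trend_correspondence(shifts_a, shifts_b):
    """Percentage of common metrics whose shifts have the same sign."""
    common = sorted(set(shifts_a) & set(shifts_b))
    if not common:
        return None  # no metric is present in both corpora
    matches = sum(sign(shifts_a[m]) == sign(shifts_b[m]) for m in common)
    return 100 * matches / len(common)


# Invented per-metric shifts for a small and a large corpus.
small_corpus = {"density": 0.1, "avg_degree": -0.5, "num_acts": 1.0, "diameter": 0.2}
large_corpus = {"density": 0.3, "avg_degree": 0.2, "num_acts": 2.0, "diameter": -0.1}
agreement = trend_correspondence(small_corpus, large_corpus)
```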

Conclusions

  • Findings partially support Moretti’s thesis of increasing formal diversification in early modern drama.
  • Vector distance analysis is inconclusive, but PCA projections suggest a branching of dramatic traditions, though clusters are not sharply defined.
  • Metric distribution analysis provided the most valuable insights into dramatic evolution: trends in plot structures, character features, and network arrangements indicate the emergence of distinct dramatic traditions.
  • The reproduction experiment on English and French DraCor supports the validity of EmDraCor results.

Results

  • theoretical framework
    • other models of cultural evolution beyond speciation might be more suitable (cf. Sobchuk 2023)
  • corpus specifics (size, languages, etc.)
  • operationalisation of the concept of drama
    • choice of metrics employed

Limitations

  • construction of a multilingual, open-access, machine-actionable corpus of 150 TEI/XML-encoded plays
  • empirical reassessment of a previous theory via quantitative methods ('triangulation')
  • further development of a key methodology (vectorisation of text based on formal features)

Contributions

Thanks!

giovannini@uni-potsdam.de

@lucagiovannini.bsky.social

Bibliography
