data-driven history?
the #ww1 and #covid19fr projects

data science for the humanities

frédéric clavert / university of luxembourg / @inactinique

who am I?

// from monetary history
// to european integration history and digitized archive

// to digital memory studies

data science: nothing new?

history and quantitative methods

The school of the Annales (France)
and its second generation of historians

  • Furet & Daumard, 1959
  • Garelli & Gardin, 1961
  • Prost, 1974 (lexicography)

then, what's new?

the nature of the data we are dealing with

  • digitization
  • artefacts of big data platforms


Boullier, 2016

the centenary of the first world war on twitter

harvested corpus

  • 1st april 2014 - 1st december 2019
  • 9 million+ tweets collected
  • +/- 1.5 million users
  • 2/3 retweets, 1/3 original tweets
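
A quick back-of-the-envelope check of what these proportions imply. The figures are the rounded ones from the slide ("9 million+" is a lower bound and "+/- 1.5 million" is approximate), so the results are orders of magnitude, not exact counts.

```python
from fractions import Fraction

# Rounded figures taken from the slide; both are approximations.
TOTAL_TWEETS = 9_000_000
TOTAL_USERS = 1_500_000
RETWEET_SHARE = Fraction(2, 3)

retweets = int(TOTAL_TWEETS * RETWEET_SHARE)
originals = TOTAL_TWEETS - retweets
tweets_per_user = TOTAL_TWEETS / TOTAL_USERS

print(f"retweets:        ~{retweets:,}")
print(f"original tweets: ~{originals:,}")
print(f"tweets per user: ~{tweets_per_user:.0f} over the whole period")
```

Roughly 3 million original tweets over more than five years is a modest amount of text by big data standards, although still far too much to read one by one.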


seems to be a lot

it is not

historian facing an unmanageable ocean of data (or a painting by C. D. Friedrich)

historian facing an unmanageable ocean of data

 Der Wanderer über dem Nebelmeer (C. D. Friedrich)

how to read
9 million tweets?

the machine reads it for you

[...] whereas what we really need is a little pact with the devil: we know how to read texts, now let's learn how not to read them. Distant reading: where distance, let me repeat it, is a condition of knowledge: it allows you to focus on units that are much smaller or much larger than the text: devices, themes, tropes -- or genres and systems.

multiscale reading

distant reading


an unbroken link to individual tweets

distant reading in practice

number of tweets per day in the #ww1 corpus (01.04.2014-01.12.2019)
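
A minimal sketch of how such a daily count can be produced, assuming each harvested tweet is a dictionary with a `created_at` field holding an ISO 8601 timestamp. The field name, the record layout and the sample data are assumptions for illustration, not the format of the actual corpus.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(tweets):
    """Count tweets per calendar day, keyed by ISO date string."""
    counts = Counter()
    for tweet in tweets:
        # Assumption: "created_at" is an ISO 8601 string such as
        # "2018-11-11T10:59:00".
        day = datetime.fromisoformat(tweet["created_at"]).date().isoformat()
        counts[day] += 1
    # Return the days in chronological order.
    return dict(sorted(counts.items()))


# Hypothetical sample records, not taken from the corpus.
sample = [
    {"created_at": "2018-11-11T10:59:00"},
    {"created_at": "2018-11-11T11:00:00"},
    {"created_at": "2018-11-12T08:30:00"},
]
print(tweets_per_day(sample))
```

Peaks in such a series point to the days worth zooming in on, which is where the link back to individual tweets matters.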

estimated language distribution (french / english)
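
One possible way to obtain such an estimate, sketched under the assumption that each record carries a `lang` code (here a hypothetical field filled by some language detector). The slide does not say how the estimate was actually made, so this is only an illustration.

```python
from collections import Counter


def language_shares(tweets, languages=("fr", "en")):
    """Return the share of each tracked language, plus an 'other' bucket."""
    # Assumption: a missing "lang" field is counted as undetermined ("und").
    counts = Counter(tweet.get("lang", "und") for tweet in tweets)
    total = sum(counts.values())
    if total == 0:
        return {}
    shares = {lang: counts[lang] / total for lang in languages}
    tracked = sum(counts[lang] for lang in languages)
    shares["other"] = (total - tracked) / total
    return shares


# Hypothetical sample records, not taken from the corpus.
sample_langs = [{"lang": "en"}, {"lang": "en"}, {"lang": "fr"}, {"lang": "de"}]
print(language_shares(sample_langs))
```

Automatic language codes are unreliable on very short texts, which is why the result is best read as an estimate.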

hierarchical descending classification


(Reinert 1983 & 1993: theory of lexical worlds)
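
A toy illustration of the splitting step behind a descending classification, not Reinert's actual algorithm. The idea sketched here is to cut a set of text segments into the two classes whose vocabularies differ most, as measured by the chi-square of the classes x terms table, and then to repeat the cut on the resulting classes. Real implementations rely on correspondence analysis rather than the brute-force search used below, which only works for a handful of segments. The function names and the sample segments are hypothetical.

```python
from itertools import combinations


def chi_square(class_a, class_b, vocabulary):
    """Chi-square of the 2 x |vocabulary| table of term presences per class."""
    counts_a = [sum(term in seg for seg in class_a) for term in vocabulary]
    counts_b = [sum(term in seg for seg in class_b) for term in vocabulary]
    total_a, total_b = sum(counts_a), sum(counts_b)
    total = total_a + total_b
    if total_a == 0 or total_b == 0:
        return 0.0
    chi2 = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def best_split(segments):
    """Try every two-class partition and keep the one with the highest chi-square."""
    vocabulary = sorted(set().union(*segments))
    all_indices = set(range(len(segments)))
    best = (-1.0, None, None)
    # Segment 0 always stays in class A, so mirrored partitions are not tried twice.
    for size in range(len(segments) - 1):
        for rest in combinations(range(1, len(segments)), size):
            in_a = {0, *rest}
            in_b = all_indices - in_a
            score = chi_square(
                [segments[i] for i in in_a],
                [segments[i] for i in in_b],
                vocabulary,
            )
            if score > best[0]:
                best = (score, sorted(in_a), sorted(in_b))
    return best


# Hypothetical segments, each reduced to its set of words.
segments = [
    {"verdun", "trenches", "soldiers"},
    {"trenches", "soldiers", "poilus"},
    {"centenary", "ceremony", "memorial"},
    {"ceremony", "memorial", "wreath"},
]
score, class_a, class_b = best_split(segments)
print(class_a, class_b)
```

On this sample the cut separates the two battlefield segments from the two commemoration segments, each class being one "lexical world" in miniature.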


historical research and social media in times of pandemic

IFPH: « You are the primary source: COVID-19 Story-Collecting Initiatives »


We are living in extraordinary times. Every South Australian is experiencing a truly global, history-making event, with both shared and unique perspectives. The History Trust of South Australia aims to document and collect objects that are connected to the experiences of people in our state during the pandemic—preserving the present for the future.

History Trust of South Australia (May 2020)


harvesting primary sources in a world of data (in crisis)


Harvesting tweets about the pandemic

  • Only French hashtags (because of the 1% limit on the Twitter streaming API)
  • Re-use of the #ww1 know-how => fast response
  • Re-use of the #ww1 server


  • collecting tweets as long as possible
  • observing the memorialization of the crisis
    • memory in the making
  • comparing
    • with D. Paci (Ca' Foscari)

the question of the memory of the crisis was discussed as soon as the lockdown started

DNA, 17.07.2020


temporality of crisis /
temporality of history



as a conclusion

digital history as bricolage

La pensée sauvage (1962), Claude Lévi-Strauss


  • intellectual bricolage: concrete thinking that allows social organisation and collective rebalancing, whereas scientific thinking can destabilize a social order
  • digital bricolage is hence understood here as an (academic) answer to technological disruption

how to carry out your own research while tools, methods, and even primary sources (their form and their volume) are changing fast, and you cannot read and understand all the literature you should.

digital history pitfalls

why twitter? the digitization / born digital shadows

risks of dealing with twitter data / born digital sources

what is a balanced corpus?

the illusionary order

bibliographical elements

François Furet and Adeline Daumard, « Méthodes de l’Histoire sociale: les Archives notariales et la Mécanographie », Annales ESC 14 (4), 1959, pp. 676‑693.

Paul Garelli and Jean-Claude Gardin, « Étude par ordinateurs des établissements assyriens en Cappadoce », Annales ESC 16 (5), 1961, pp. 837‑876.

Antoine Prost, Vocabulaire des proclamations électorales de 1881, 1885 et 1889, Paris, Presses universitaires de France, 1974.

Dominique Boullier, « Big data challenges for the social sciences: from society and opinion to replications », arXiv:1607.05034 [cs], July 2016.

Data Science for the Humanities: the #ww1 and #covid19fr projects

By Frédéric Clavert
