social media
as primary sources

frédéric clavert / university of luxembourg / @inactinique

who am I?

// from monetary history
// to european integration history and digitized archives

// to digital memory studies

the centenary of the first world war on twitter

a research project always
starts on a train

harvested corpus

  • 1st april 2014 - 1st december 2019
  • 9 million+ tweets collected
  • +/- 1.5 million users
  • 2/3 retweets, 1/3 original tweets


seems to be a lot

it is not

  • lamp server
  • harvesting scripts: 140dev / dmi-tcat
  • keyword / hashtag based
  • twitter api 1.1 (stream)
  • home-based server => server at the university
    ( then
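
The keyword / hashtag based harvesting listed above can be illustrated with a small filter. This is only a sketch under assumptions: it does not reproduce 140dev or dmi-tcat, it does not call the twitter api, and the track list is hypothetical except for "#ww1", which is the only hashtag named in these slides.

```python
import re

# Hypothetical track list: only "#ww1" is named in the slides,
# the other terms are placeholders for illustration.
TRACKED_TERMS = ["#ww1", "#wwi", "#1gm"]


def matches_track(text, tracked=TRACKED_TERMS):
    """Return True if the tweet text contains a tracked term as a whole token."""
    # Split the text into words and hashtags, compared case-insensitively.
    tokens = {token.lower() for token in re.findall(r"#?\w+", text)}
    return any(term.lower() in tokens for term in tracked)


if __name__ == "__main__":
    for sample in ["Remembering #WW1 today", "nothing to see here"]:
        print(sample, "->", matches_track(sample))
```

Matching on whole tokens means that a longer hashtag such as "#ww1centenary" is not caught by "#ww1"; whether that is desirable depends on the corpus definition.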

historian facing an unmanageable ocean of data

 Der Wanderer über dem Nebelmeer (C. D. Friedrich)

how to read
9 million tweets?

the machine reads it for you

[...] whereas what we really need is a little pact with the devil: we know how to read texts, now let's learn how not to read them. Distant reading: where distance, let me repeat it, is a condition of knowledge: it allows you to focus on units that are much smaller or much larger than the text: devices, themes, tropes -- or genres and systems.

Franco Moretti, « Conjectures on World Literature » (2000)

multiscale reading

distant reading


an unbroken link to individual tweets

distant reading in practice

number of tweets per day in the #ww1 corpus (01.04.2014-01.12.2019)
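
A count of tweets per day can be obtained along these lines. This is a minimal sketch, not the script used for the project: it assumes each tweet's creation time is available as a string in the twitter api 1.1 `created_at` format, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Format of the "created_at" field in twitter api 1.1 tweet objects,
# e.g. "Tue Nov 11 10:00:00 +0000 2014".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_day(created_at_values):
    """Count tweets per calendar day (in the timestamp's own time zone)."""
    counts = Counter()
    for value in created_at_values:
        day = datetime.strptime(value, CREATED_AT_FORMAT).date()
        counts[day.isoformat()] += 1
    # Sort by ISO date so the result can be plotted as a time series.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [
        "Tue Nov 11 10:00:00 +0000 2014",
        "Tue Nov 11 23:59:59 +0000 2014",
        "Wed Nov 12 08:00:00 +0000 2014",
    ]
    print(tweets_per_day(sample))
```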

estimated language distribution (french / english)
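
One possible way to produce such an estimate is sketched below. The slides do not say how the estimate was made; this sketch assumes each tweet is a dict carrying the machine-detected `lang` code that twitter api 1.1 attaches to tweets, and the function name is hypothetical.

```python
from collections import Counter


def language_shares(tweets):
    """Return the share of tweets per language code (e.g. "fr", "en")."""
    # "und" is the code twitter uses when no language could be detected;
    # it is also used here for tweets that lack the field altogether.
    counts = Counter(tweet.get("lang", "und") for tweet in tweets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.items()}


if __name__ == "__main__":
    sample = [{"lang": "fr"}, {"lang": "en"}, {"lang": "en"}, {"text": "?"}]
    print(language_shares(sample))
```

The detected language is itself an automated guess, which is one reason to speak of an estimate rather than a measurement.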

descending hierarchical classification


(Reinert 1983 & 1993: theory of lexical worlds)


historical research and social media in times of pandemic

IFPH: « You are the primary source: COVID-19 Story-Collecting Initiatives »


We are living in extraordinary times. Every South Australian is experiencing a truly global, history-making event, with both shared and unique perspectives. The History Trust of South Australia aims to document and collect objects that are connected to the experiences of people in our state during the pandemic—preserving the present for the future.

History Trust of South Australia (May 2020)


harvesting primary sources in a world of data (in crisis)


Harvesting tweets about the pandemic

  • Only French hashtags (because of the streaming API's 1% cap)
  • Re-use of the #ww1 know-how => fast response
  • Re-use of the #ww1 server


  • collecting tweets as long as possible
  • observing the memorialization of the crisis
    • memory in the making
  • comparing
    • with d. paci (ca' foscari)

the question of the memory of the crisis was discussed as soon as the lockdown started

DNA, 17.07.2020


temporality of crisis /
temporality of history



as a conclusion

digital history as bricolage

La pensée sauvage (1962), Claude Lévi-Strauss


  • intellectual bricolage: concrete thinking that allows social organisation and collective rebalancing, whereas scientific thinking can destabilize a social order
  • digital bricolage is hence understood here as an (academic) response to technological disruption

how to carry out your own research while tools, methods, and even primary sources (their form and their volume) are changing fast, and you are not able to read / understand all the literature you should?

digital history pitfalls

why twitter?

risks of dealing with twitter data / born digital sources

what is a balanced corpus?

the illusionary order

limits of digital bricolage

  • big data from a historian’s point of view...
    • not really big data (and it doesn't matter)
  • how to go through the data analysis jungle?
    • too many tools
    • inflexible
    • standardizing research
  • how to see the weak signals?

intensively thinking digital historian

a changing allure of the archive?

bibliographical elements
