History
and social media

How to collect tweets for archiving and analysis purposes: practice and methods

1. Why archive tweets?

2. Case studies

3. The different ways to collect tweets
4. Hands-on!

Why archive and analyse tweets?

Tweets are primary sources

They can help us understand

how people engage with history and the past
how collective memories develop

Once archived, they will help us understand the past.

Anonymised tweets from the night of the Bataclan attacks

About this tweet, see: Joshua Sternfeld, « Historical Understanding in the Quantum Age », Journal of Digital Humanities, 3-2, 2014.

Tweet from the night of the Bataclan attack, with the #portesouvertes hashtag

Why Twitter?

Because we can.

Facebook / Instagram data are hard to collect.

WhatsApp / Snapchat data are impossible to collect.

An application programming interface (API) is a computing interface which defines interactions between multiple software intermediaries.

Source: Wikipedia
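
In practice, an API call is often just a structured web request. The sketch below only builds the URL of such a request and does not send it. It assumes the v1.1 standard search endpoint that Twitter documented around the time of these slides (it may no longer be available), and the helper name `build_search_url` is hypothetical. A real call would also need authentication.

```python
from urllib.parse import urlencode

# Assumption: the v1.1 standard search endpoint, as documented by Twitter
# around the time of these slides. It may no longer be available.
SEARCH_ENDPOINT = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(query: str, count: int = 100) -> str:
    """Return the URL of a search request (hypothetical helper, no network call)."""
    params = {"q": query, "count": count}
    # urlencode percent-encodes special characters such as "#".
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("#ww1"))
```

The point is only that the "interface" is a documented address plus named parameters: the software on each side agrees on what `q` and `count` mean.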

Pitfalls

Who are we studying when collecting and analysing Twitter data?

The demographics of Twitter are difficult (if not impossible) to establish
Twitter accounts are very diverse
- people, institutions, groups, bots, etc.

What do we study with Twitter?

Information circulation (memetics), where "information" is understood in a very broad sense.

See: Dominique Boullier, « Big data challenges for the social sciences: from society and opinion to replications », arXiv:1607.05034 [cs], 2016.

Practices

Case studies

#ww1 and #covid19

Engaging with the past:
the Centenary of the First World War

Unique series of commemorations
2014-2018
Throughout Europe and North America

9 million+ tweets collected

Memory in the making: #covid19fr

Background: Johns Hopkins University coronavirus map. Screenshot (7/7/2020)

Different ways
to collect tweets

Buying tweets
Search API
Streaming API
Web scraping

Search API

You can search for past tweets (up to 7 days back)
You can collect around 3,000 tweets per hour
No Twitter developer account needed
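
These two limits can be turned into a quick planning check before starting a collection. The sketch below is only arithmetic on the approximate figures given on this slide; the function names are hypothetical and nothing here talks to Twitter.

```python
# Figures taken from the slide above; both are approximate.
SEARCH_WINDOW_DAYS = 7
TWEETS_PER_HOUR = 3000


def max_tweets_per_window(hours_per_day: int = 24) -> int:
    """Upper bound on what one search window can yield at the stated rate."""
    return TWEETS_PER_HOUR * hours_per_day * SEARCH_WINDOW_DAYS


def can_keep_up(expected_tweets_per_day: int) -> bool:
    """True if an hourly collection could, in principle, follow the hashtag."""
    return expected_tweets_per_day <= TWEETS_PER_HOUR * 24


if __name__ == "__main__":
    print(max_tweets_per_window())
    print(can_keep_up(50_000), can_keep_up(100_000))
```

A hashtag that regularly exceeds the daily ceiling cannot be fully captured this way, however often the collection runs.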

Streaming API

You can get up to 1% of the firehose (i.e. of all the tweets published at a given moment)
Only tweets being published or yet to be published: you need to anticipate
You need a Twitter developer account
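
The 1% ceiling matters mostly for very busy hashtags. As a rough illustration, the sketch below estimates which share of the matching tweets would be delivered, assuming the ceiling is applied to the total volume of tweets at that moment. The function name and all volumes are hypothetical; they are not measurements.

```python
def delivered_share(matching_per_min: float,
                    firehose_per_min: float,
                    cap: float = 0.01) -> float:
    """Estimated share of matching tweets delivered under a 1% ceiling."""
    if matching_per_min <= 0:
        return 1.0  # nothing to collect, so nothing is lost
    allowed = cap * firehose_per_min
    return min(1.0, allowed / matching_per_min)


if __name__ == "__main__":
    # Hypothetical volumes: 400,000 tweets per minute overall.
    print(delivered_share(1_000, 400_000))  # quiet hashtag
    print(delivered_share(8_000, 400_000))  # very busy hashtag
```

For most commemorative hashtags the ceiling is never reached; during a major event it can be, and the missing tweets cannot be recovered afterwards through this API.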

Exception: API v2 academic product track

10 million tweets per month
across the full history of Twitter
but...

Web scraping

Not compliant with Twitter's terms of service (TOS)
Less metadata
But you can get tweets going back to 2006 (creation of Twitter)

Hands-on!

Get TAGS

https://tags.hawksey.info/get-tags/

Install TAGS

Copy TAGS to your Google Drive

Authorize Twitter

Set up the sheet

Collect!

Advanced functionalities

Summary Sheet
Dashboard Sheet
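
The same kind of summary can be recomputed outside the spreadsheet once the archive sheet is exported as CSV. This is a minimal sketch, assuming the export has `from_user` and `text` columns; the sample rows and the function name `summarise` are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for an archive sheet exported as CSV.
# Assumption: the export has "from_user" and "text" columns.
SAMPLE_CSV = """from_user,text
alice,Remembering the Somme #ww1 #history
bob,RT @alice: Remembering the Somme #ww1 #history
alice,New archive photos online #ww1
"""


def summarise(csv_text: str) -> dict:
    """Count tweets, retweets and the most active accounts."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    tweeters = Counter(row["from_user"] for row in rows)
    # Simple heuristic: classic retweets start with "RT @".
    retweets = sum(1 for row in rows if row["text"].startswith("RT @"))
    return {
        "tweets": len(rows),
        "retweets": retweets,
        "top_tweeters": tweeters.most_common(3),
    }


if __name__ == "__main__":
    print(summarise(SAMPLE_CSV))
```

With a real export, `io.StringIO(csv_text)` would be replaced by an opened file.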

Network dataviz

TAGSExplorer
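
A network view of this kind rests on a simple idea: each mention in a tweet is a link from the author to the mentioned account. The sketch below builds such a weighted edge list from tweet texts. It illustrates the principle only and is not how TAGSExplorer itself is implemented; the sample tweets and the function name are hypothetical.

```python
import re
from collections import Counter

# Twitter handles: letters, digits and underscores, up to 15 characters.
MENTION = re.compile(r"@(\w{1,15})", re.ASCII)


def mention_edges(tweets):
    """Return a Counter mapping (author, mentioned account) to a weight."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            # Handles are case-insensitive; self-mentions are ignored.
            if target.lower() != author.lower():
                edges[(author.lower(), target.lower())] += 1
    return edges


if __name__ == "__main__":
    sample = [
        ("alice", "@bob have you seen this? #ww1"),
        ("bob", "RT @alice: hello @carol"),
        ("carol", "thanks @alice @alice"),
    ]
    for (source, target), weight in mention_edges(sample).items():
        print(source, "->", target, weight)
```

The resulting edge list can be written to CSV and opened in a network tool for layout and community detection.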

A look at the metadata
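
A tweet is much more than its text: the API returns it as a structured record. The sketch below reads a few fields from a hypothetical sample tweet, assuming the field names of the v1.1 tweet object (`id_str`, `created_at`, `user`, `entities`). The values are invented for illustration.

```python
from datetime import datetime

# Hypothetical tweet; field names assume the v1.1 tweet object.
SAMPLE_TWEET = {
    "id_str": "1234567890",
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "text": "Visiting the memorial today #ww1",
    "lang": "en",
    "user": {"screen_name": "example_user", "followers_count": 42},
    "entities": {"hashtags": [{"text": "ww1"}]},
}


def describe(tweet: dict) -> dict:
    """Extract the fields a historian is most likely to need first."""
    posted = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "id": tweet["id_str"],
        "posted": posted.isoformat(),
        "author": tweet["user"]["screen_name"],
        "hashtags": [tag["text"] for tag in tweet["entities"]["hashtags"]],
    }


if __name__ == "__main__":
    print(describe(SAMPLE_TWEET))
```

Keeping the identifier as a string (`id_str`) avoids precision problems when the data later passes through a spreadsheet or JavaScript.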

Bibliography
and tools

Tools

DMI TCAT (streaming API)
Twarc / rehydrate (streaming API)
TAGS (search API)
Twitter explorer (search API, https://www.odycceus.eu/news-detail/news/twitter-explorer/)
Twint (scraping, not compliant with Twitter's TOS)
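
"Rehydrate" refers to a common sharing practice: a corpus is circulated as a bare list of tweet IDs, which others turn back into full tweets through the API. The sketch below shows only the first half, reducing collected tweets to a list of IDs, one per line. It assumes each tweet carries an `id_str` field; the function names and sample data are hypothetical, and no tool is called.

```python
def dehydrate(tweets):
    """Return the unique tweet IDs, in first-seen order."""
    seen = set()
    ids = []
    for tweet in tweets:
        tweet_id = tweet["id_str"]
        if tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids


def to_id_file(ids):
    """Format IDs as text with one ID per line."""
    return "\n".join(ids) + "\n"


if __name__ == "__main__":
    collected = [
        {"id_str": "101", "text": "first"},
        {"id_str": "102", "text": "second"},
        {"id_str": "101", "text": "first"},  # duplicate from a second run
    ]
    print(to_id_file(dehydrate(collected)), end="")
```

Tweets deleted or made private since collection cannot be rehydrated, so a shared ID list only ever recovers part of the original corpus.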

Bibliography

Valérie Schafer, Gérôme Truc, Romain Badouard, Lucien Castex and Francesca Musiani, « Paris and Nice terrorist attacks: Exploring Twitter and web archives », Media, War & Conflict, 2019, p. 1750635219839382.

Evelien D’heer, Baptist Vandersmissen, Wesley De Neve, Pieter Verdegem and Rik Van de Walle, « What are we missing? An empirical exploration in the structural biases of hashtag-based sampling on Twitter », First Monday, 22-2, 2017.

Martin Grandjean, « A social network analysis of Twitter: Mapping the digital humanities community », Cogent Arts & Humanities, 3-1, 2016, p. 1171458.

Michael Zimmer, « The Twitter Archive at the Library of Congress: Challenges for information practice and information policy », First Monday, 20-7, 2015.

Shirley A. Williams, Melissa M. Terras and Claire Warwick, « What do people study when they study Twitter? Classifying Twitter related academic papers », Journal of Documentation, 69-3, 2013, p. 384‑410.

Danah Boyd, Scott Golder and Gilad Lotan, « Tweet, tweet, retweet: Conversational aspects of retweeting on twitter », IEEE, 2010.
Danah M. Boyd and Nicole B. Ellison, « Social Network Sites: Definition, History, and Scholarship », Journal of Computer-Mediated Communication, 13-1, 2007, p. 210‑230.
Hany M. SalahEldeen and Michael L. Nelson, « Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost? », arXiv:1209.3026, 2012.

Jean-Christophe Peyssard, « Archiving Web Content ». https://halshs.archives-ouvertes.fr/cel-02130558/document

Two presentations: «How to deal with 4 millions+ tweets when you are not a data scientist» (https://orbilu.uni.lu/handle/10993/35017)

and «Twitter data as primary sources for historians: a critical approach» (with S. Papastamkou) (https://orbilu.uni.lu/handle/10993/37070)

Documenting the Now

https://www.docnow.io/

By Frédéric Clavert

Slides for the Venice virtual summer camp on digital and public history.

historian. digital history. digital memory studies. join me on mastodon: @inactinique@mastodon.social
