Media Monitoring of the Past
Here's how the slide deck is built
Introduction
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Plan
You are here
Impresso
Simple search
Datalab
Perspectives
Plan
Use the vertical arrows to navigate
or the horizontal arrows to jump to the next chapter
"Media Monitoring of the Past"
Impresso 1
2017-2020
Impresso 2
2023-2027
Developing new approaches and interfaces for the exploration and critical analysis of historical media archives.
Swiss and Luxembourg newspapers
Cutting-edge platform
New collections from across Europe
Radio broadcasts
Datalab
Funded by:
Hosted by:
Principal investigators
An interdisciplinary team spanning history, web development, linguistics, and digital humanities
A network of partner institutions that provide digitized newspaper and radio collections
In short, the Impresso project is a series of challenges:
I.
The data
Multiple sources, multiple formats, massive data, and numerous legal issues
II.
The design
Make all of this accessible and useful to the public
III.
The analysis
Enrich data to create added value and develop relevant analysis tools
IV.
Writing history
How should all of this be interpreted in the context of historical research?
The collections are massive, but also very diverse:
Press from 1732 to 2018
Radio programs in audio format with speech-to-text
Transcripts of radio news broadcasts
The collections come from different institutions:
Consequences:
Impresso user plans
As many features as possible, but with the smoothest possible user experience
Creating an app that provides access to tens of millions of newspaper articles and radio broadcasts is one thing.
Making sure it’s usable by both the general public and advanced researchers is another.
Impresso web app
Multiple filters
date range
language
keywords
source
length
article type
and many more...
Advanced features
Named entity profiles
Corpus overview
Ngrams
Text reuse
Browse newspapers and radio broadcasts at the same time
Example of future integration of audio files into the Impresso app
It's not just about collecting, it's about enriching and connecting
Enriching and connecting historical sources by transforming noisy and heterogeneous sources into semantically enriched and structured data, ultimately connected in a common vector space.
Conducting transmedia and transnational historical research using semantically enriched historical media sources
Impresso API
to process other sources
via notebooks in a datalab
and compare them to our data
The Impresso datalab, a suite of analysis tools that leverage the API and enriched data from the press and radio
A Jupyter notebook is an interactive document containing Python code that lets users retrieve Impresso data and perform operations on it
Impresso datalab
The goal is to spark users’ curiosity, make computational methods accessible to them, and encourage them to explore code without needing to master it.
Descriptive statistics for a collection
Network of entities
Map of locations
Semantic proximity
Bringing a community together and conducting historical research
Impresso also organizes workshops and international conferences
Impresso Seminar
Building bridges with existing communities
We develop case studies to examine how certain phenomena are covered in the media.
This opens up new avenues for historiographical reflection
In the next chapter, let’s see how to use the Impresso app to explore the relationship between print media and radio
What does the press have to say about radio?
An example of how the tools developed by Impresso can be used to address a media history question
Radio Nations in Prangins in the 1930s
United Nations Archives Geneva (P132-01-007)
Newspaper stand at Zurich train station, 1962
ETH-Bibliothek Zürich (Com_C17-073-047-001-005)
Welcome to the Impresso web app
Before starting a search, log in to access more content and facsimiles.
Let's search for "radio" in article contents
This very broad search returns over a million results
But what are we looking at?
Total number of articles
Proportion (%)
For example: Is the result biased by the nature of the corpus, since some newspapers are only available during certain periods?
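One way to check for this bias is to divide the raw number of hits by the total number of articles available for each year. A minimal sketch with made-up counts (the numbers and names below are illustrative assumptions, not Impresso figures):

```python
# Illustrative, made-up yearly counts (not real Impresso figures).
hits_per_year = {1930: 120, 1940: 300, 1950: 900}        # articles matching "radio"
total_per_year = {1930: 4000, 1940: 6000, 1950: 30000}   # all articles in the corpus


def relative_frequency(hits, totals):
    """Return the share (in %) of matching articles for each year."""
    shares = {}
    for year, total in totals.items():
        if total == 0:
            continue  # no articles digitized for that year: nothing to compare
        shares[year] = 100 * hits.get(year, 0) / total
    return shares


shares = relative_frequency(hits_per_year, total_per_year)
for year in sorted(shares):
    print(year, f"{shares[year]:.1f}%")
```

In this made-up example, the raw count triples between 1940 and 1950 while the share of matching articles drops from 5% to 3%, simply because far more articles are available for 1950.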
Or: Does this keyword mainly return results in a specific language, a specific country, or from specific news outlets?
And is that even the right keyword?
This is a classic problem in searching large text corpora: keywords are an imperfect way to filter
- Is the keyword polysemous?
Radio = the device, the institution, the broadcast?
- Does it appear in any other word?
Radioactivity, Radiography, ...
- Is the concept we're looking for gradually being replaced by another one?
Radio -> Podcast
- Are there different ways to spell it?
(in the case of radio, no problem)
- Does this word often cause OCR issues?
radio, r4dio, radid, adio, 'radio, ...
- Should it be paired with other keywords?
Radio AND Broadcast, Radio OR Antenna
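Two of the pitfalls listed above (the keyword hiding inside longer words, and OCR variants) can be illustrated with plain Python regular expressions. This is only a sketch of the matching problem; it does not reproduce the search syntax of the Impresso app, and the list of variants is a hand-picked assumption:

```python
import re

# Whole-word match: "radio" but not "radioactivity" or "radiography".
WHOLE_WORD = re.compile(r"\bradio\b", re.IGNORECASE)

# A few hand-picked OCR variants from the list above (not exhaustive).
# "adio" is left out on purpose: it is too short and ambiguous to match safely.
OCR_VARIANTS = re.compile(r"\b(?:radio|r4dio|radid)\b", re.IGNORECASE)


def matches(pattern, text):
    """True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


samples = [
    "La radio annonce la nouvelle",
    "Un cours de radiographie",
    "La r4dio de Moscou",
]
for text in samples:
    print(matches(WHOLE_WORD, text), matches(OCR_VARIANTS, text), text)
```

The whole-word pattern rejects "radiographie" but also misses the OCR error "r4dio". Adding variants recovers it, at the cost of maintaining the list by hand.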
By selecting a date range (in this case, a decade), we reduced the number of results by a factor of 10.
Let's narrow down the search using filters
Here, three French-language Swiss newspapers from different regions and with different political leanings
Remove articles that are less than 100 words long
Separate articles from advertisements
Keep only what is featured on the front page
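The same filters can be written as a single test applied to each content item. A minimal sketch, assuming a hypothetical `Item` record whose field names were chosen for this example (they are not the Impresso data model):

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical fields, chosen for this sketch only.
    newspaper: str
    year: int
    n_words: int
    item_type: str   # "article" or "ad"
    page: int


def keep(item, newspapers, first_year, last_year, min_words=100):
    """Apply the filters described above to a single content item."""
    return (
        item.newspaper in newspapers
        and first_year <= item.year <= last_year
        and item.n_words >= min_words      # drop items shorter than 100 words
        and item.item_type == "article"    # drop advertisements
        and item.page == 1                 # front page only
    )


# Made-up items, one per filter.
items = [
    Item("JDG", 1952, 450, "article", 1),
    Item("JDG", 1952, 60, "article", 1),    # too short
    Item("LLE", 1955, 300, "ad", 1),        # advertisement
    Item("LSE", 1958, 220, "article", 3),   # not on the front page
    Item("LSE", 1963, 500, "article", 1),   # outside the decade
]
selected = [i for i in items if keep(i, {"JDG", "LLE", "LSE"}, 1950, 1959)]
print(len(selected), "of", len(items), "items kept")
```

Each filter is a simple condition, but combined they shrink the result set quickly: only one of the five made-up items survives.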
Search filters
Search summary
Content item
The facsimile is visible only if you have access rights
Each result is available as a facsimile and/or transcript
Article in context, along with the entire issue
Block-by-block transcription, for close reading
To preserve a corpus, it can be saved as a collection.
All articles are now tagged so they stand out in search results.
Each collection has its own page, complete with some descriptive statistics.
A collection can be exported as a CSV file.
Once exported, the CSV file can be downloaded and opened in a spreadsheet program.
It includes all the data that your plan grants you access to.
Now that we’ve explored our collection, how is radio depicted in print media?
Category 1.1 Advertisements
At first glance, this may not be what we’re looking for, but radio is widely present in the press as a commodity.
Radio can also serve as a selling point for other products, such as cars.
If we compare an opinion newspaper with a mainstream newspaper, the difference in the proportion of advertisements is clear.
Journal de Genève
8% of "radio" articles are ads
L'Express
31% of "radio" articles are ads
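These proportions are the number of items tagged as advertisements divided by the total number of "radio" items for each newspaper. A short sketch, with made-up counts chosen only to reproduce the two percentages (the real counts are not given here):

```python
def ad_share(n_ads, n_total):
    """Percentage of items that are advertisements."""
    if n_total == 0:
        raise ValueError("empty collection")
    return 100 * n_ads / n_total


# Made-up counts chosen to reproduce the two percentages quoted above.
print(f"Journal de Genève: {ad_share(80, 1000):.0f}%")
print(f"L'Express: {ad_share(310, 1000):.0f}%")
```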
Category 1.2
Job advertisements
Radio isn't just a device, it's also an employer. Job listings in the press allow for documentation of this professional reality.
La Liberté 1956
La Liberté 1978
Category 2
Radio program
From the very beginning, radio needed the press to publicize its weekly schedule.
Journal de Genève 1952
Feuille d'avis de Neuchâtel 1930
It is worth noting that even the dissemination of this very practical information creates synergies among the media
The opposite is also true: newspapers often include the schedules of many stations because they know this is an essential service for many of their readers.
Gazette de Lausanne 1986
Category 3
Radio as a source of information
Beyond advertisements and radio programs, and more interestingly, radio can also be cited as a source of information in news articles.
Feuille d'avis de Neuchâtel 1940
A search of the Impresso corpus shows that this was fairly rare in Swiss newspapers during the early decades (1920–1950), but it became more common thereafter.
An interesting observation: it is very often radio stations from communist countries that are mentioned in Swiss newspapers
Feuille d'avis de Neuchâtel 1950
Feuille d'avis de Neuchâtel 1960
The Swiss media did not have correspondents in these countries and therefore relied on reports from government radio stations
As the second half of the 20th century progresses, radio is cited more and more often as a source of information. It is increasingly becoming an integral part of the media landscape.
Feuille d'avis de Neuchâtel 1973
Category 4
Radio as a subject in the news
The “richest” occurrences in this corpus involve references to radio as the subject of a news article.
Feuille d'avis de Neuchâtel 1940
Here, the press is complaining about the poor language used by radio hosts!
As soon as radio became widely accessible, it became a topic of discussion in its own right for the press. For example, newspapers began to review the programs themselves.
Feuille d'avis de Neuchâtel 1960
"Miroir du monde" is the flagship international news program of French-speaking Swiss radio, coming soon to the Impresso app!
The press is becoming a platform for reacting to what is said on the radio, a forum for discussing the new profession of radio journalism.
Feuille d'avis de Neuchâtel 1970
The "radio critique" section
Feuille d'avis de Neuchâtel 1940
The press is a valuable source for exploring the institutional history of radio.
Feuille d'avis de Neuchâtel 1970
Feuille d'avis de Neuchâtel 1980
Feuille d'avis de Neuchâtel 1940
The press also shows us when and how radio becomes part of people’s daily lives.
Feuille d'avis de Neuchâtel 1960
L'Impartial 1926
Finally, the press is also a source for studying the history of radio technology
Feuille d'avis de Neuchâtel 1940
Feuille d'avis de Neuchâtel 1960
Feuille d'avis de Neuchâtel 1960
Worth noting: While radio and TV magazines would be a major source for writing about the people behind the radio, the press is also very valuable.
Feuille d'avis de Neuchâtel 1960
These two articles, published in the same issue on March 30, 1960, offer two unique perspectives on the people behind the microphone:
On the left, a popular radio host “reveals” himself in a photo… for an advertisement for a brand of razors! On the right, a report states that a radio employee has been convicted for refusing to serve in the Swiss army due to his pacifist and religious beliefs.
This leads us down an interesting path: can we study the people who appear in the selected articles?
In the Impresso app, “named entities” (people, places, organizations) mentioned in articles are automatically identified.
In a collection centered on the keyword “radio,” three categories of people emerge: composers, heads of state and prominent politicians, and… figures from the radio world.
In our case, two individuals emerge: the journalist René Payot (who moved from print media to radio), and the satirist Jack Rollan (who did the opposite).
Entities are linked to the Wikidata entries for the relevant individuals, if such entries exist.
The platform allows users to search for all instances of a named entity
This provides insight into the career timeline of René Payot, a journalist for the Journal de Genève who became a radio commentator on international news during World War II and then hosted widely followed international news programs for two decades.
But very soon, there will also be radio collections available right in Impresso!
1. Radio in text format
Radio typescripts
We have received the transcripts from Swissinfo, the Swiss international radio station, covering the period of World War II, in French and German.
2. Radio in audio format
Radio broadcasts
We are making major changes to the platform to enable it to support audio.
We have collections from RTS (the radio stations of French-speaking Switzerland) and from the INA (which archives French public radio) just waiting to be released.
In the next chapter, let’s see how digital humanities tools allow us to go further with the data from Impresso.
Taking it further: press archives and quantitative methods
Welcome to the Impresso datalab
First, log in to the datalab so that you can create a temporary personal token.
The Datalab contains instructions for programmatically accessing the Impresso API. But for the general public, it’s the notebooks that are of particular interest.
We have also developed additional notebooks that have not yet been published. Colleagues outside the project are welcome to develop their own notebooks, which we could add here.
But what exactly is a notebook?
A Jupyter Notebook is an interactive, (often) web-based tool that lets you combine live code, visual graphics, mathematical equations, and narrative text into a single shareable document.
1. A notebook contains lines of code
2. It also includes comments explaining what we're doing
3. Each cell of code can be executed to see the result immediately
Let's use a very simple statistical analysis notebook
This notebook was created for educational purposes, to generate basic statistics based on an Impresso collection.
We open it in Google Colab, an interface that lets us work online with this Jupyter notebook
And we upload the CSV file of the collection we created during our previous exploration of radio coverage in the press.
Each line of code is commented to guide the user
Here, we load the dataset
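Loading the export boils down to reading a CSV file into one record per article. A minimal sketch using only the Python standard library; the column names (`uid`, `newspaper`, `date`, `size`) and the comma delimiter are assumptions for illustration and may differ from the actual Impresso export and from the notebook's own code:

```python
import csv
import io

# Stand-in for the exported file; column names and values are made up.
SAMPLE_CSV = """uid,newspaper,date,size
a1,JDG,1952-03-01,450
a2,LLE,1955-07-12,310
a3,LSE,1958-11-30,220
"""


def load_collection(handle):
    """Read a collection export into a list of dictionaries, one per article."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        row["size"] = int(row["size"])   # article length in words
    return rows


articles = load_collection(io.StringIO(SAMPLE_CSV))
print(len(articles), "articles loaded; columns:", list(articles[0]))
```

To read a real file instead of the sample string, pass an open file handle, for example `open("collection.csv", newline="", encoding="utf-8")`, where the file name is a placeholder.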
The notebook guides us through the process of creating our first chart.
In this case, a histogram showing how many articles in our collection come from each newspaper.
Remember that our collection includes articles containing the keyword “radio” from three Swiss newspapers during the 1950s.
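Behind such a chart is a simple count of articles per newspaper. A sketch with made-up data, printed as a text-only bar chart (the notebook itself presumably uses a plotting library, which is not shown here):

```python
from collections import Counter

# Made-up sample: one newspaper code per article in the collection.
newspapers = ["JDG", "LLE", "JDG", "LSE", "JDG", "LLE"]

counts = Counter(newspapers)

# Text-only bar chart, most frequent newspaper first.
for code, n in counts.most_common():
    print(f"{code:<4}{'#' * n} {n}")
```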
The notebook then generates a timeline of articles, organized by newspaper.
We can see that the number of articles in La Sentinelle (LSE) mentioning radio increased toward the end of the decade
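A timeline of this kind is a count of articles per newspaper and per year. A minimal sketch, assuming each article carries a newspaper code and an ISO-formatted date (the records below are made up):

```python
from collections import Counter, defaultdict

# Made-up (newspaper, ISO date) pairs standing in for the collection.
records = [
    ("LSE", "1951-02-10"), ("LSE", "1958-05-03"), ("LSE", "1958-09-21"),
    ("JDG", "1951-06-14"), ("JDG", "1958-01-30"),
]

timeline = defaultdict(Counter)
for newspaper, date in records:
    year = int(date[:4])          # ISO dates start with the four-digit year
    timeline[newspaper][year] += 1

for newspaper in sorted(timeline):
    print(newspaper, dict(sorted(timeline[newspaper].items())))
```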
The notebook then guides us through the process of generating other exploratory statistics.
For example, an analysis of article length (as a reminder, when we created the collection, we excluded all articles with fewer than 100 words)
The length of articles varies from one newspaper to another: on average, articles about radio in La Liberté (LLE) are twice as long as those in Le Journal de Genève (JDG).
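Comparing lengths across newspapers means grouping word counts by newspaper and summarizing each group. A sketch with made-up word counts, chosen so that one newspaper's average is twice the other's:

```python
from collections import defaultdict
from statistics import mean, median

# Made-up (newspaper, word count) pairs.
lengths = [("LLE", 800), ("LLE", 600), ("JDG", 300), ("JDG", 400), ("JDG", 350)]

by_paper = defaultdict(list)
for newspaper, n_words in lengths:
    by_paper[newspaper].append(n_words)

summary = {paper: (mean(v), median(v)) for paper, v in by_paper.items()}
for paper, (avg, med) in sorted(summary.items()):
    print(f"{paper}: mean {avg:.0f} words, median {med:.0f} words")
```

Reporting the median next to the mean is a useful habit here: a few very long articles can pull the mean up, and the 100-word cutoff already removes the shortest items.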
This second notebook helps highlight the main named entities that appear together in the articles
No need to create a collection beforehand; the notebook will connect to the Impresso data on its own.
So we can run a query directly in the notebook, for example to find the 100 people who appear most frequently in articles containing the word “radio” in the same three newspapers and period as in our previous analyses.
It generates a network that shows how often the most frequently mentioned people appear in the same press articles.
This network can then be exported and viewed more clearly in software such as Gephi
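The principle of such a co-occurrence network can be sketched in a few lines: every pair of people mentioned in the same article adds one to the weight of the edge between them. The input below is made up, and the code is an illustration of the idea rather than the notebook's own implementation. The export uses a Source/Target/Weight table, a CSV layout that Gephi can import as an edge list:

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Made-up input: the set of people mentioned in each article.
articles = [
    {"Mozart", "Beethoven"},
    {"Mozart", "Beethoven", "Truman"},
    {"Truman", "Stalin"},
]

edges = Counter()
for people in articles:
    # Sorting makes (A, B) and (B, A) count as the same undirected edge.
    for a, b in combinations(sorted(people), 2):
        edges[(a, b)] += 1


def write_edge_list(edges, handle):
    """Write a Source/Target/Weight table, one row per pair of people."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])


buffer = io.StringIO()   # replace with open("edges.csv", "w", newline="") to save
write_edge_list(edges, buffer)
print(buffer.getvalue())
```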
In the case of the "radio" keyword, the entities are very strongly influenced by the music programming. Composers (in white) are much more prominent than political figures (in blue) or media personalities (in red).
This third notebook will retrieve the geographic coordinates of the locations mentioned in the articles and generate a map.
Just like in the previous notebook, we run the query directly here.
Next, the top 100 results are geolocated by querying Wikidata.
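Geolocating entities through Wikidata typically means asking the public SPARQL endpoint for the "coordinate location" property (P625) of each linked item. The sketch below only builds the query and parses the returned coordinate literal; it does not send any request, the identifiers are examples, and the notebook's actual approach may differ:

```python
import re

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def coordinates_query(qids):
    """Build a SPARQL query asking for the coordinate location (P625) of each item."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?item ?coord WHERE { "
        f"VALUES ?item {{ {values} }} "
        "?item wdt:P625 ?coord . }"
    )


WKT_POINT = re.compile(r"Point\(([-\d.]+) ([-\d.]+)\)")


def parse_point(wkt):
    """Convert a WKT literal such as 'Point(6.14 46.2)' into (latitude, longitude)."""
    match = WKT_POINT.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    longitude, latitude = map(float, match.groups())  # WKT puts longitude first
    return latitude, longitude


# Example Wikidata identifiers and an example coordinate literal.
print(coordinates_query(["Q71", "Q72"]))
print(parse_point("Point(6.14 46.2)"))
```

The longitude-first order of the returned literal is a classic source of maps with points in the wrong place, which is why the parsing step swaps the values explicitly.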
It generates a map showing the cities (in black) and countries (in red) mentioned in the news articles.
In the case of a corpus built around the keyword “radio,” the results aren’t very interesting: the locations are generally the cities where the broadcasts or concerts take place, with a few capital cities occasionally mentioned in news stories. This approach is much more interesting when working with a more general-purpose corpus.
What are embeddings?
Animation: Coenen and Pearce, “Understanding UMAP”, https://pair-code.github.io/understanding-umap/
How can we interpret a UMAP, a cloud of dots that is a “flattened” version of our original multidimensional object (which is itself too complex to grasp)?
Visualizing vector representations of articles in 2D means reducing their dimensionality (from hundreds of dimensions to just 2) while preserving as much as possible of the semantic distances between them.
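Before any reduction to 2D, the proximity between two embeddings is commonly measured with cosine similarity. A minimal sketch on toy 3-dimensional vectors; the vectors and their labels are invented, and cosine similarity is assumed here as the measure, which may not be the one used in the project:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
concert_listing = [1.0, 0.0, 0.0]
radio_program = [1.0, 1.0, 0.0]
stock_market = [0.0, 0.0, 1.0]

print(round(cosine_similarity(concert_listing, radio_program), 3))
print(round(cosine_similarity(concert_listing, stock_market), 3))
```

The first pair shares a direction and scores about 0.707; the second pair has nothing in common and scores 0.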
Interpreting semantic proximity maps (UMAPs)
Animation: Michelet and Grandjean (2026) "Les embeddings, nouvel outil d’une histoire numérique des grands corpus de presse?"
A tool for exploring large press corpora
UMAP of all 60,000 articles published in La Liberté, the Journal de Genève and the Feuille d'avis de Neuchâtel in 1950.
Let's analyze our collection as a UMAP
This notebook, which is not yet available on the Datalab, retrieves the embeddings already computed by Impresso, or computes them itself if the API does not provide them.
We've tweaked the collection's filters slightly to include advertisements and short articles that aren't on the front page, which results in a larger and more diverse corpus. This corpus of 4,000 articles is processed in about 20 minutes.
La Liberté (Catholic, Fribourg)
Journal de Genève (Liberal conservative, Geneva)
La Sentinelle (Socialist, La Chaux-de-Fonds)
A map for exploring a corpus of press articles about radio
Each dot represents an article from 1950 that contains the keyword “radio” in the three selected newspapers.
This UMAP “map” is a simplification of the semantic distance between all these articles; it makes significant compromises to ensure it is readable in 2D.
Two articles that are close together on the map are not necessarily the most similar ones; rather, they probably belong to the same semantic register (they are about the same thing). That's why we'll focus on groups of dots rather than on individual ones.
What can we see on this map?
Stock market and financial news
(Radio corporation of America stocks)
International news
Religious news
"Faits divers"
(accidents, crimes, unexpected)
National/local news
Foreclosure auctions
(A radio device is included among the items sold)
Radio programs
(Each newspaper has a different style)
Religious calendar
Culture
(music, cinema, theatre)
Radio critique
Advertisement
Focus on international news: Why is radio mentioned?
Korean War
(North Korean and Russian radio stations as sources)
United Kingdom
(In many cases, the government addressing the public via radio)
Belgium
(King Leopold III is facing criticism and addresses the nation on the radio)
Sweden
(The death of King Gustav V announced on Swedish radio)
Miscellaneous news
(Articles that contain a series of short news stories from around the world. Radio is often a source)
Various European news
Various Asian news
(China, Tibet, Indochina)
Since all the articles contain the word “radio,” this does not accurately reflect media coverage in 1950; it skews the results in favor of regions where news reaches Swiss newspapers via official (often communist) radio stations.
Now, we want to use embeddings to compare two collections.
For example, to study how radio news programs select information compared to the print media.
Here, we have selected a week's worth of news bulletins from Radio Suisse Internationale in June 1945.
To make them comparable to news articles, these bulletins must first be broken down into segments of information.
This is the text that was read on the air. There’s one every day in French, German, English, and sometimes Spanish.
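How the bulletins are segmented is not detailed here. As a purely illustrative assumption, if news items in a transcript were separated by blank lines, the split could look like this (the text below is an invented placeholder, not an actual bulletin):

```python
import re


def split_bulletin(text):
    """Split a bulletin into segments separated by one or more blank lines."""
    segments = re.split(r"\n\s*\n", text.strip())
    # Collapse line breaks and repeated spaces inside each segment.
    return [" ".join(segment.split()) for segment in segments if segment.strip()]


# Invented placeholder text, not an actual bulletin.
bulletin = """First news item, spread
over two lines.

Second news item.


Third news item."""

for number, segment in enumerate(split_bulletin(bulletin), start=1):
    print(number, segment)
```

Real typescripts are likely to need more than this, since OCR output rarely preserves paragraph breaks cleanly.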
One of the major news stories this week is the diplomatic progress made at the San Francisco conference among the “Big Five” (establishment of the future UN).
45 radio news segments
1674 press articles
Let's compare the news coverage in the press and on the radio for the week of June 4–10, 1945
All the articles (no ads) from the Journal de Genève, Gazette de Lausanne, La Sentinelle, Feuille d'Avis de Neuchâtel, L'Impartial and La Liberté.
News coverage in the press and on the radio
June 4–10, 1945
Press articles
Radio broadcast segment
Area of the semantic map that is not covered by the radio bulletins
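On the map, such gaps are read visually. A complementary check, sketched below, works in the original embedding space rather than on the 2D projection: for each press article, find its best-matching radio segment and flag the article if even that match is weak. The vectors, labels, and the 0.8 threshold are all invented for illustration, and this is not presented as the method used for the slide:

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def uncovered(press, radio, threshold=0.8):
    """Return the press articles whose best match among radio segments is weak."""
    result = []
    for title, vector in press.items():
        best = max(cosine_similarity(vector, r) for r in radio.values())
        if best < threshold:
            result.append(title)
    return result


# Toy 2-dimensional vectors with invented labels.
press = {"conference": [1.0, 0.1], "sports": [0.0, 1.0], "classified ad": [0.2, 1.0]}
radio = {"segment 1": [1.0, 0.0], "segment 2": [0.9, 0.2]}

print(uncovered(press, radio))
```

With these toy values, the "conference" article has a close radio counterpart, while "sports" and "classified ad" do not.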
What is the order of priority in Radio Suisse Internationale's news bulletins?
June 4–10, 1945
Press articles
Radio broadcast segment
Crisis in the Levant
Syria, Lebanon, France
War in the Pacific
Invasion of Okinawa
Germany
Administering liberated Berlin
The 'Big Five'
San Francisco conference
Tito in Trieste
Bonomi resignation
Italy
Expulsions of Nazis
Release of prisoners
Humanitarian issues
Swiss parliament
Resignation of General Guisan
Economic news
INTERNATIONAL
SWITZERLAND
Obituaries
"Faits divers"
Radio programs
Culture
Sports
Swiss international relations
Classified ads
Commercial ads
In the next chapter, we will conclude by discussing the limitations of these approaches and the future developments of Impresso
Critical approach and ongoing developments
Digitizing collections isn’t exactly new, nor is finding quantitative methods to analyze historical data. But creating an interface that allows us to do both—on a large scale and using methods that weren’t yet available when the first project was launched—raises some questions.
This matters all the more because the combination of data and methods bridges the gap between the computer science and history communities: we must ensure that the democratization of these tools is accompanied by a critical perspective, regardless of which side one comes from.
A critical digital history
For several years now, we have been testing these tools in our classes. It’s always interesting to see how students—whether or not they have prior technical knowledge—use them to analyze these press archive collections.
Incorporating these tools into the historian's toolkit
EPFL MA Course 2025-2026
Examples of student work
Ongoing developments, new collections
List of links and resources