Machines Reading Maps

https://slides.com/staceymaples/allstaff-mrm

Machines Reading Maps (MRM) is a collaborative project of

University of Southern California Digital Library
Computer Science & Engineering Department at the University of Minnesota (US)
The Alan Turing Institute (UK)
The David Rumsey Map Collection (davidrumsey.com)

The project is funded by

The United States’ National Endowment for the Humanities (NEH)
The United Kingdom’s Arts and Humanities Research Council (AHRC), under the first round of NEH/AHRC New Directions for Digital Scholarship
David & Abby Rumsey

Machines Reading Maps Summit

April 20-21, 2023

at the David Rumsey Map Center at Stanford Libraries

Invite-only Stakeholder Introduction to MRM
Public Introduction to MRM
Invite-only Community-building discussion
MRM/Recogito Workshop

Machines Reading Maps is a project to create a generalizable ML pipeline that uses human collaboration to:

Process printed text on scanned maps
Enrich the printed text
Convert the printed text to structured data

in order to make scanned historical map content easily searchable with support for complex queries
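
As a rough illustration of what "structured data" could mean here, the sketch below wraps one recognized map label as a GeoJSON-style feature. The property names (`text`, `map_id`), the helper name, and the sample values are assumptions made for illustration, not the project's actual schema.

```python
import json

def label_to_feature(text, lonlat_ring, map_id):
    """Wrap one map label as a GeoJSON-style Feature (hypothetical schema)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [lonlat_ring]},
        "properties": {"text": text, "map_id": map_id},
    }

# Made-up label outline; the ring is closed (first point repeated at the end).
ring = [[-91.2, 32.1], [-90.8, 32.1], [-90.8, 32.3], [-91.2, 32.3], [-91.2, 32.1]]
feature = label_to_feature("MISSISSIPPI", ring, "example-map-001")
print(json.dumps(feature, indent=2))
```

Records of this shape can be filtered by text and by location at the same time, which is the kind of complex query that plain catalog metadata cannot answer.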

Why let Machines Read Maps?

Metadata search is insufficient for searching the content of scanned maps
Millions of scanned maps are now publicly available.
The infrastructure that those maps are served from is well suited to this work
Existing spatial data sources contain information only about the present (modern placenames), and even those are incomplete...

Source: GNS, National Geospatial-Intelligence Agency

Search Demonstration

"This is unreasonably cool, and so visually pleasing, it nearly tickles. It also feels like I’m looking at one of those moments where everything changes." - email exchange with David Rumsey

https://clone.davidrumsey.com

How does MRM work?

Image Cropping
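
The slide gives only the name of this step. One common way to prepare a large scan for a text spotter is to cut it into overlapping tiles, and the sketch below computes tile windows under that assumption. The tile size, the overlap, and the function name are illustrative choices, not values taken from the project.

```python
def tile_windows(width, height, tile=1000, overlap=100):
    """Yield (left, top, right, bottom) boxes that together cover the image."""
    step = tile - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than tile")
    for top in range(0, max(height - overlap, 1), step):
        for left in range(0, max(width - overlap, 1), step):
            # Clamp the right and bottom edges to the image bounds.
            yield (left, top, min(left + tile, width), min(top + tile, height))

# A hypothetical 2500 x 1200 pixel scan.
boxes = list(tile_windows(2500, 1200))
for box in boxes:
    print(box)
```

The overlap means a label that straddles a tile boundary still appears whole in at least one tile, as long as it is smaller than the overlap.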

Text Spotting

Uses TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework that uses Transformers for text detection and recognition in the wild.
TESTR is particularly effective on curved text boxes, where traditional bounding-box representations need special care to adapt.

The text spotting problem typically consists of two sub-tasks:

text detection that localizes text boxes in a natural image, and
text recognition that reads the characters from the detected text.

The main difficulty in text spotting comes from multiple factors, including large variations in font, size, style, color, shape, occlusion, distortion, and layout in natural scene images.
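
The two sub-tasks above can be sketched as a small data structure plus a function that chains them. This is a conceptual illustration only: the class, the function names, and the stand-in detector and recognizer are hypothetical, and it does not reflect how TESTR is implemented internally.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]

@dataclass
class SpottedText:
    polygon: List[Point]  # outline of the text region, in pixel coordinates
    text: str             # characters read from that region

def spot_text(image, detect: Callable, recognize: Callable) -> List[SpottedText]:
    """Run detection, then recognition on each detected region."""
    return [SpottedText(poly, recognize(image, poly)) for poly in detect(image)]

# Stand-in detector and recognizer so the sketch runs without a model.
def fake_detect(image):
    # A six-point polygon can follow a gently curved label; a box cannot.
    return [[(10, 10), (60, 8), (110, 12), (110, 30), (60, 26), (10, 28)]]

def fake_recognize(image, polygon):
    return "RIVER"

results = spot_text(None, fake_detect, fake_recognize)
print(results)
```

Storing a polygon rather than an axis-aligned box is what lets the output describe curved labels such as river names.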

Text Spotting

Human Annotations

Text Spotting

SynthMap+

Text Spotting

SynthText Training Data

https://www.robots.ox.ac.uk/~vgg/data/scenetext/

Text Spotting

Results

https://clone.davidrumsey.com/luna/servlet/detail/RUMSEY~8~1~73~10135:-Mississippi,-MISSISSIPPI---River,-?annotIds=217290442%2C217290185

Merging
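
The slide does not describe this step. Assuming it means bringing detections from individual image tiles back into the coordinate space of the full scan, a minimal sketch looks like this. The input layout and function names are hypothetical.

```python
def to_full_image(polygon, tile_left, tile_top):
    """Shift a polygon from tile-local pixels to full-image pixels."""
    return [(x + tile_left, y + tile_top) for x, y in polygon]

def merge_tiles(tile_results):
    """tile_results: iterable of ((left, top), [(polygon, text), ...])."""
    merged = []
    for (left, top), detections in tile_results:
        for polygon, text in detections:
            merged.append((to_full_image(polygon, left, top), text))
    return merged

# Two made-up tiles, each with one detection in tile-local coordinates.
tiles = [
    ((0, 0), [([(10, 10), (50, 10), (50, 30), (10, 30)], "LAKE")]),
    ((900, 0), [([(5, 5), (40, 5), (40, 20), (5, 20)], "FORT")]),
]
merged = merge_tiles(tiles)
print(merged)
```

This sketch only shifts coordinates. It does not remove duplicate detections that fall in the overlap between neighboring tiles.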

Coordinate Conversion
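
The slide names this step without detail. Assuming the scan has already been georeferenced with a simple six-parameter affine transform, pixel positions can be converted to geographic coordinates as sketched below. The transform values are made up, and many historical maps need a control-point-based warp rather than a plain affine transform.

```python
def pixel_to_geo(col, row, gt):
    """Apply an affine transform (origin_x, px_w, rot_x, origin_y, rot_y, px_h)."""
    origin_x, px_w, rot_x, origin_y, rot_y, px_h = gt
    x = origin_x + col * px_w + row * rot_x
    y = origin_y + col * rot_y + row * px_h
    return (x, y)

# Hypothetical transform: top-left corner at (-92.0, 33.0), 0.001 degrees per pixel.
# px_h is negative because pixel rows increase downward while latitude increases upward.
gt = (-92.0, 0.001, 0.0, 33.0, 0.0, -0.001)
lon, lat = pixel_to_geo(1000, 500, gt)
print(lon, lat)
```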

PostOCR
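
The slide gives only the step name. One simple form of post-OCR correction is to snap each recognized string to the closest entry in a list of known names, which the sketch below does with Python's standard `difflib`. The vocabulary, the cutoff, and the function name are assumptions for illustration, not the project's method.

```python
import difflib

def correct_label(text, vocabulary, cutoff=0.8):
    """Return the closest vocabulary entry, or the input if nothing is close enough."""
    by_upper = {name.upper(): name for name in vocabulary}
    matches = difflib.get_close_matches(text.upper(), list(by_upper), n=1, cutoff=cutoff)
    return by_upper[matches[0]] if matches else text

vocab = ["Mississippi", "Missouri", "Minnesota"]
fixed = correct_label("MISSISSIPP1", vocab)   # final "I" misread as "1"
unchanged = correct_label("XYZ", vocab)       # nothing similar, so left as-is
print(fixed, unchanged)
```

A high cutoff keeps the correction conservative, so rare or historical spellings that are not in the vocabulary pass through unchanged.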

EntityLinker
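
The slide gives only the step name. Assuming entity linking here means connecting a map label to a record in a gazetteer, the sketch below picks the same-named record closest to where the label sits on the map. The gazetteer entries, identifiers, and function name are invented for illustration.

```python
import math

def link_entity(text, lonlat, gazetteer):
    """Pick the gazetteer entry with a matching name that lies closest to lonlat."""
    candidates = [e for e in gazetteer if e["name"].lower() == text.lower()]
    if not candidates:
        return None
    # Plain Euclidean distance in degrees: crude, but enough to rank candidates.
    return min(candidates, key=lambda e: math.dist(lonlat, (e["lon"], e["lat"])))

gazetteer = [
    {"id": "ex:1", "name": "Springfield", "lon": -89.65, "lat": 39.80},
    {"id": "ex:2", "name": "Springfield", "lon": -72.59, "lat": 42.10},
]
match = link_entity("SPRINGFIELD", (-89.0, 40.0), gazetteer)
print(match)
```

Using the label's position to choose between same-named places is why this step comes after coordinate conversion in the pipeline order shown on the slides.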

The Data

https://searchworks.stanford.edu/view/vn901vj0926

What Now?

Next Steps: Build Community

Next Steps...

Release mapKurator data, models, metadata, and user annotations.
Ensure open-source, sustainable, adaptable, and portable tools for other map collections.
Provide training materials for GLAM professionals to use tools.
Refine mapKurator pipeline and annotation tools for specific collections.
Integrate with IIIF, and the navPlace & geoRef Extensions.
Simplify sample data selection and export and train researchers to work with text on maps data.
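
As background for the IIIF item above, the IIIF Image API addresses image regions through a URL of the form {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The sketch below builds such a URL for a pixel region. The server base URL and identifier are placeholders, and `max` is the full-size keyword in version 3 of the API (version 2 uses `full`).

```python
from urllib.parse import quote

def iiif_region_url(base, identifier, x, y, w, h, size="max"):
    """Build a IIIF Image API URL for the pixel region x,y,w,h."""
    ident = quote(identifier, safe="")  # identifiers must be URL-encoded
    return f"{base}/{ident}/{x},{y},{w},{h}/{size}/0/default.jpg"

# Placeholder server and identifier.
url = iiif_region_url("https://example.org/iiif", "map-001", 1800, 900, 700, 300)
print(url)
```

Because a region is addressed by URL, a tool can request only the part of a map it needs instead of downloading the whole scan.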

Geo4LibCamp 2023