Machines Reading Maps
https://slides.com/staceymaples/allstaff-mrm
Machines Reading Maps (MRM) is a collaborative project of
- University of Southern California Digital Library
- Computer Science & Engineering Department at the University of Minnesota (US)
- The Alan Turing Institute (UK)
- The David Rumsey Map Collection (davidrumsey.com)
The project is funded by the
- United States’ National Endowment for the Humanities (NEH)
- United Kingdom’s Arts and Humanities Research Council (AHRC) under the first round of NEH/AHRC New Directions for Digital Scholarship.
- David & Abby Rumsey
Machines Reading Maps Summit
April 20-21, 2023
at the David Rumsey Map Center at Stanford Libraries
- Invite-only Stakeholder Introduction to MRM
- Public Introduction to MRM
- Invite-only Community-building discussion
- MRM/Recogito Workshop
Machines Reading Maps is a project to create a generalizable ML pipeline that uses human collaboration to:
- Process printed text on scanned maps
- Enrich the printed text
- Convert the printed text to structured data
in order to make the content of scanned historical maps easily searchable, with support for complex queries.
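To make "structured data" and "complex queries" concrete, here is a minimal Python sketch. The `MapText` record, its fields, and the `search` helper are hypothetical illustrations, not the MRM or mapKurator data model. The query combines a text match with a geographic bounding box, the kind of question catalog metadata alone cannot answer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MapText:
    """One piece of printed text found on a scanned map (hypothetical schema)."""
    map_id: str   # identifier of the scanned map the text came from
    text: str     # recognized (and possibly corrected) text
    lon: float    # longitude of the text label's center
    lat: float    # latitude of the text label's center

def search(records, word, bbox):
    """Return records whose text contains `word` (case-insensitive) and whose
    location falls inside bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    word = word.lower()
    return [
        r for r in records
        if word in r.text.lower()
        and min_lon <= r.lon <= max_lon
        and min_lat <= r.lat <= max_lat
    ]

records = [
    MapText("map-a", "San Francisco Bay", -122.3, 37.7),
    MapText("map-b", "Bay of Biscay", -4.0, 45.5),
    MapText("map-c", "Monterey Bay", -121.9, 36.8),
]
# Any label containing "bay" printed within a rough bounding box around California.
hits = search(records, "bay", (-124.5, 32.5, -114.0, 42.0))
print([r.map_id for r in hits])  # ['map-a', 'map-c']
```

A real system would put these records in a search index with a spatial filter; the list comprehension only shows the shape of the query.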
Why Let Machines Read Maps?
- Metadata search is insufficient for searching the content of scanned maps.
- There are now millions of publicly available scanned maps.
- The infrastructure those maps are served from is well suited to this work.
- Existing spatial data sources contain only information about the present (modern place names), and even those are incomplete.
Search Demonstration
https://clone.davidrumsey.com
How does MRM work?
Image Cropping
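A minimal sketch of this step, assuming the pipeline splits a large map scan into fixed-size, overlapping crops so that each piece fits a text spotting model and labels cut at one crop's edge appear whole in a neighbor. The 1000-pixel tile size, 200-pixel overlap, and the helper names are hypothetical.

```python
def tile_offsets(length, tile, stride):
    """Start offsets along one axis so that tiles of size `tile`, stepped by
    `stride`, cover [0, length). The last tile is shifted back to end at the edge."""
    if length <= tile:
        return [0]
    offsets = list(range(0, length - tile + 1, stride))
    if offsets[-1] + tile < length:
        offsets.append(length - tile)
    return offsets

def crop_boxes(width, height, tile=1000, overlap=200):
    """Return (x0, y0, x1, y1) crop windows that cover a width x height scan."""
    stride = tile - overlap  # consecutive crops share `overlap` pixels
    xs = tile_offsets(width, tile, stride)
    ys = tile_offsets(height, tile, stride)
    return [(x, y, min(x + tile, width), min(y + tile, height)) for y in ys for x in xs]

boxes = crop_boxes(2500, 1800)
print(len(boxes))  # 6
```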
Text Spotting
MRM uses TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework that uses Transformers for text detection and recognition in the wild.
TESTR is particularly effective with curved text, where traditional bounding-box representations need special adaptation.
The text spotting problem typically consists of two sub-tasks:
- text detection that localizes text boxes in a natural image, and
- text recognition that reads the characters from the detected text.
The main difficulty in text spotting arises from large variations in font, size, style, color, shape, occlusion, distortion, and layout in natural scene images.
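The point about curved text can be made concrete. The sketch below is plain Python, not TESTR code: it outlines a label printed along an arc as a polygon and compares its area (via the shoelace formula) to the area of its axis-aligned bounding box. The radii and angles are arbitrary assumptions; the takeaway is that an upright rectangle around curved text is mostly background.

```python
import math

def polygon_area(points):
    """Area of a simple polygon [(x, y), ...] via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def bounding_box(points):
    """Axis-aligned box (min_x, min_y, max_x, max_y) enclosing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def arc_label(r_inner, r_outer, start_deg, end_deg, steps=16):
    """Polygon outlining text printed along a circular arc: the outer edge
    traced forward, then the inner edge traced back (hypothetical geometry)."""
    angles = [math.radians(start_deg + (end_deg - start_deg) * i / (steps - 1))
              for i in range(steps)]
    outer = [(r_outer * math.cos(a), r_outer * math.sin(a)) for a in angles]
    inner = [(r_inner * math.cos(a), r_inner * math.sin(a)) for a in reversed(angles)]
    return outer + inner

label = arc_label(90, 110, 0, 180)
x0, y0, x1, y1 = bounding_box(label)
fill_ratio = polygon_area(label) / ((x1 - x0) * (y1 - y0))
print(f"text fills {fill_ratio:.0%} of its axis-aligned box")
```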
Text Spotting: Human Annotations
Text Spotting: SynthMap+
Text Spotting: SynthText Training Data
Text Spotting: Results
Merging
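If a map is processed as overlapping crops, detections from each crop must be shifted back into full-image pixel coordinates, and a label seen by two neighboring crops must be kept only once. The sketch below assumes axis-aligned boxes and uses standard greedy non-maximum suppression on intersection-over-union (IoU); it is one plausible approach, not necessarily how mapKurator merges results, and the 0.5 threshold and sample detections are hypothetical.

```python
def to_global(box, offset):
    """Shift a crop-local (x0, y0, x1, y1) box by the crop's (x, y) offset."""
    x0, y0, x1, y1 = box
    ox, oy = offset
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)

def box_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0

def merge(patch_results, threshold=0.5):
    """patch_results: [(offset, [(box, text, score), ...]), ...], one entry per crop.
    Keep the highest-scoring detection and drop later ones that overlap it."""
    dets = [(to_global(box, off), text, score)
            for off, items in patch_results for box, text, score in items]
    dets.sort(key=lambda d: d[2], reverse=True)
    kept = []
    for det in dets:
        if all(iou(det[0], k[0]) < threshold for k in kept):
            kept.append(det)
    return kept

patch_results = [
    # Crop at offset (0, 0): one label is cut off at the crop's right edge.
    ((0, 0), [((900, 100, 1000, 130), "OAKLA", 0.60),
              ((10, 10, 80, 30), "ALAMEDA", 0.90)]),
    # Crop at offset (800, 0): the same label appears whole.
    ((800, 0), [((100, 100, 220, 130), "OAKLAND", 0.95)]),
]
merged = merge(patch_results)
print([text for _, text, _ in merged])  # ['OAKLAND', 'ALAMEDA']
```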
Coordinate Conversion
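Converting text positions from image pixels to geographic coordinates requires the scan to be georeferenced. The sketch below assumes the simplest case, a six-parameter affine transform written in GDAL's geotransform order; real historical maps are often georeferenced from ground control points with warping models beyond a single affine, and the transform values here are hypothetical.

```python
def pixel_to_geo(col, row, gt):
    """Map a pixel position to map coordinates with a six-parameter affine
    geotransform in GDAL's order:
    gt = (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height)."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

def geo_to_pixel(x, y, gt):
    """Invert the affine transform by solving its 2x2 linear system."""
    dx, dy = x - gt[0], y - gt[3]
    det = gt[1] * gt[5] - gt[2] * gt[4]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return col, row

# Hypothetical north-up scan: top-left corner at lon -123.0, lat 38.5, each pixel
# 0.001 degrees wide and tall (negative pixel height because rows run south).
gt = (-123.0, 0.001, 0.0, 38.5, 0.0, -0.001)

# Convert the corners of a detected text polygon from pixels to lon/lat.
text_polygon_px = [(500, 1000), (620, 1000), (620, 1030), (500, 1030)]
text_polygon_geo = [pixel_to_geo(c, r, gt) for c, r in text_polygon_px]
```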
PostOCR
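One common approach to post-OCR correction, sketched here as an assumption rather than a description of MRM's PostOCR module, is to snap each recognized word to the closest entry in a lexicon of known place names when the edit distance is small. The lexicon and the `max_dist=2` threshold are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions that turn string a into string b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute if different
        prev = cur
    return prev[-1]

def correct(word, lexicon, max_dist=2):
    """Replace `word` with the closest lexicon entry (case-insensitive),
    unless even the closest entry is more than `max_dist` edits away."""
    dist, best = min((levenshtein(word.upper(), name.upper()), name) for name in lexicon)
    return best if dist <= max_dist else word

lexicon = ["Sacramento", "San Mateo", "Stockton"]
print(correct("SACRAMENT0", lexicon))  # Sacramento (OCR read the final "O" as a zero)
print(correct("Xyzzy", lexicon))       # Xyzzy (no close match, left unchanged)
```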
EntityLinker
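Entity linking connects a recognized name to a specific real-world place. Because many places share a name, the label's geographic position helps disambiguate. The sketch below assumes a hypothetical in-memory gazetteer and picks the same-named entry nearest to the label by great-circle (haversine) distance; the entry ids are made up, the coordinates are approximate, and this is not MRM's EntityLinker.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class GazetteerEntry:
    entry_id: str  # hypothetical identifier in some external gazetteer
    name: str
    lat: float
    lon: float

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def link(name, lat, lon, gazetteer, max_km=50.0):
    """Return the id of the same-named entry nearest to where the text sits on
    the map, or None if no same-named entry lies within `max_km`."""
    candidates = [g for g in gazetteer if g.name.lower() == name.lower()]
    if not candidates:
        return None
    best = min(candidates, key=lambda g: haversine_km(lat, lon, g.lat, g.lon))
    return best.entry_id if haversine_km(lat, lon, best.lat, best.lon) <= max_km else None

# Three places share one name (approximate coordinates, hypothetical ids).
gazetteer = [
    GazetteerEntry("gaz:1", "Springfield", 39.80, -89.64),  # Illinois
    GazetteerEntry("gaz:2", "Springfield", 42.10, -72.59),  # Massachusetts
    GazetteerEntry("gaz:3", "Springfield", 37.21, -93.29),  # Missouri
]
print(link("SPRINGFIELD", 39.78, -89.65, gazetteer))  # gaz:1
```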
The Data
What Now?
Next Steps: Build Community
Next Steps...
- Release mapKurator data, models, metadata, and user annotations.
- Ensure open-source, sustainable, adaptable, and portable tools for other map collections.
- Provide training materials for GLAM professionals to use tools.
- Refine mapKurator pipeline and annotation tools for specific collections.
- Integrate with IIIF and the navPlace & geoRef Extensions.
- Simplify sample data selection and export, and train researchers to work with text-on-maps data.
Machines Reading Maps - SULAllstaff
By Stace Maples