Metadata, Meaning, and Erasure in Astronomy

 

Daina Bouquin

Semantics

The relationships between signifiers

and what they stand for in reality.

 

How we understand what something means.

Metadata

Mechanisms for modeling semantic meaning.

Copernicus, N. (1543). Nicolai Copernici Torinensis De revolutionibus orbium cœlestium libri vi. Norimbergae: Apud Ioh. Petreium.

Copernicus

  • Published towards the end of his life
     
  • Work was dedicated to Pope Paul III
     
  • Theory was largely seen as a mathematical convenience

Galilei, G.  (1610). Osservazioni e calcoli relativi ai Pianeti Medicei.

Galileo
(89 years later)

  • Tried for heresy by the Inquisition
     
  • Sentenced to life imprisonment (commuted to house arrest)
     
  • His books were banned

Galileo didn't know his chicken scratch would be important.

(Largely seen as the birth of observational astronomy and the scientific method)

 

People didn't care that much about Copernicus' model. 

(It was easy to dismiss)

Meaning is collective agreement about a specific thing at a specific time.

 

Semantic meaning is not static.

Humphrey, S.D. (1849). Multiple Exposures of the Moon: Nine Exposures, daguerreotype.
http://id.lib.harvard.edu/images/olvwork124646/catalog

Gift to the President of Harvard at the time.

(This is it on my desk.)

Now it's art.

  The creators of these objects did not need to care 
about the historic meaning of their work.

 

Norms allowed them to share their work in
ways that were directly attributable to them.

 

Over time, metadata was physically and

digitally recorded and the semantic network 

around their work grew.

 

People understand what these things are
and why they should keep them.

What makes something citable?

Identification

Unambiguous way to point at a specific thing in a specific place at a specific time.

 

(DOI, URI, URN, ISBN, Bibcode, arXiv ID, etc.)

Location

Where the thing you are pointing at is at a specific time.

 

 

(URL)
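
A minimal sketch of the difference between identification and location, assuming a toy in-memory resolver. The `Registry` class, the `doi:10.0000/example` identifier, and both URLs are hypothetical placeholders, not a real registration service.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy resolver: maps a persistent identifier to its current location."""
    locations: dict[str, str] = field(default_factory=dict)

    def register(self, identifier: str, url: str) -> None:
        # Mint the identifier by recording where the thing lives right now.
        self.locations[identifier] = url

    def move(self, identifier: str, new_url: str) -> None:
        # The location changes; the identifier does not.
        if identifier not in self.locations:
            raise KeyError(identifier)
        self.locations[identifier] = new_url

    def resolve(self, identifier: str) -> str:
        # Answer "where is this thing at this time?"
        return self.locations[identifier]


registry = Registry()
registry.register("doi:10.0000/example", "https://old.example.org/plate-scan")
registry.move("doi:10.0000/example", "https://new.example.org/archive/plate-scan")
current_location = registry.resolve("doi:10.0000/example")
```

A citation that records only the old URL breaks when the object moves. A citation that records the identifier keeps working for as long as someone maintains the mapping.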

Identifiers aren't magic.

 

Memory Institutions mint identifiers 

and curate metadata to ensure

that works are findable and have
 meaning that can change over time.

 

What gets an identifier?

 

 

I can describe this thing but give it little meaning.

 

Norms prevent me from throwing it away.

(Data is important now and I would feel bad)

 

It gets an identifier because I've been told it's important.

Henrietta Swan Leavitt

First standard with which to measure distances on a galactic scale.

Log of the period of the star

 Apparent magnitude

Lines correspond to the min and max brightness
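
A rough sketch of why a straight line on this plot works as a distance standard. The stars Leavitt measured lie at roughly the same distance, so a linear relation between apparent magnitude and the log of the period implies a relation between period and intrinsic brightness. Once that relation is calibrated, a measured period and apparent magnitude give a distance through the distance modulus. The slope, intercept, and observed values below are placeholders chosen to keep the arithmetic easy, not a real calibration.

```python
import math


def absolute_magnitude(period_days: float, slope: float, intercept: float) -> float:
    """Linear period-luminosity relation: M = slope * log10(P) + intercept."""
    return slope * math.log10(period_days) + intercept


def distance_parsecs(apparent_mag: float, absolute_mag: float) -> float:
    """Invert the distance modulus m - M = 5 * log10(d) - 5 for d in parsecs."""
    return 10 ** ((apparent_mag - absolute_mag + 5) / 5)


# Placeholder calibration (slope, intercept) and placeholder observation.
M = absolute_magnitude(period_days=10.0, slope=-2.0, intercept=-2.0)
d = distance_parsecs(apparent_mag=11.0, absolute_mag=M)
```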

Mathematician Gösta Mittag-Leffler tried to nominate Leavitt for the Nobel Prize in 1925 only to find out she had died of cancer in 1921.

 

The Nobel is not awarded posthumously.

Hundreds of women worked with the plates.

Enabling Full-Text Searchability with over 20,000 volunteers

Observations don't stop being scientifically valuable because they're old.

Light curve of 3C273 (786 points) over 100 years

Normalizing plate numbers on the Zooniverse

We now know the names of 216 women.

All of the plate numbers have been found and the

metadata has been recorded.

 

Software development and database refactoring

necessary to connect the data to the notebooks

is happening now.

This isn't just a historical problem.

Code is Speech.

Science relies on software, but who gets credit?

version 1.0

version 1.1

version 2.0

What are people citing if there are no identifiers?

How do they know if anyone cites their software?

The "citations" we found
fail to accomplish many of the functions of citation.

2003 paper with two authors

What's wrong with citing something else?


 authorship ambiguity

Different software versions have different authors. How many papers would the software authors need to write?

 

can put open source documentation behind paywalls

 Not all papers about software are OA

 

Software citations are made indistinguishable 

from citations made for other purposes  

Many journals don't take papers without scientific findings

 

makes locating software more difficult over time

Links break (if they exist at all)

Alias Locations

XML Tag Combinations

In total, there were 343 papers where software had been mentioned but the location could not be determined.

You need to archive software to

get an identifier for software.
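
A minimal sketch of what per-version identifiers make possible, assuming each archived release is deposited with its own author list. The `Release` record, the `cite` helper, the software name, the author names, and the identifiers are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    version: str
    authors: tuple[str, ...]
    identifier: str  # placeholder for an identifier minted at deposit time


def cite(release: Release, name: str) -> str:
    """Build a citation that names the exact version and its own authors."""
    authors = ", ".join(release.authors)
    return f"{authors}. {name} (version {release.version}). {release.identifier}"


# Placeholder release history: the author list changes between versions.
releases = {
    "1.0": Release("1.0", ("A. Author",), "doi:10.0000/example.v1.0"),
    "2.0": Release("2.0", ("A. Author", "B. Contributor"), "doi:10.0000/example.v2.0"),
}

citation = cite(releases["2.0"], "examplesoft")
```

Citing the release directly credits the people who wrote that version, which a single paper with a fixed author list cannot do.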

Before

After

And yet it moves.

 

We have a complete history of nothing.

 

Some people get a legacy and some don't.

 

We need to care.
