Getting to the TEI from crowdsourcing?


Elisa Beshero-Bondar, PhD | @epyllia

Professor of Digital Humanities, Penn State Erie, The Behrend College

Editing and Encoding in the Undergraduate Classroom: Lightning Talk

October 22, 2020.  Link to these slides:

a pedagogical experiment



  • Learn contemporary popular digital contexts for transcribing primary sources

  • Gain experience with written and spoken (recorded) text transcription

  • Experience and reflect on transcription methods within a shared set of rules

Exercise 1: Crowdsourcing Transcription Exercise

Students contribute to crowdsourced transcription at the Smithsonian Digital Volunteers site: browse the projects to locate unfinished transcriptions (suggestion: African American History archives).

“I edited all of the text insertions that the previous contributor did not finish by wrapping the red-ink insertions on the text with ^[[insertion]] as per the transcription instructions, and added insertions that the previous contributor missed. This was a bit tricky, but once my eyes adjusted to all the numbers and the workflow of the transcription, it was easy enough. I can't imagine easily being able to do this if I had dyslexia!”

“In all honesty, if there is one thing I do wonder about, it's how the official transcriptors decide which is the correct spelling if different documents write up different interpretations of the same word.”

“In terms of questions regarding this process, I'm most curious on who does the verifying of the transcriptions. What credentials do they have to be in a high position to make the final decisions? Are they actually Smithsonian employees? I think the website explains this briefly, but something to humanize these mysterious editing overlords would be nice. . . . How are these tricky tablets unscrambled? It'd be really interesting to watch a livestream with some collaborators going through and picking apart these files.”

An ongoing crowd-sourcing context: 

State of the Digital Archive

  • Papers digitally imaged 
  • metadata curated
  • crowdsourcing “completed”

Last summer I asked...

  • Do you need help with reviewing transcriptions?
  • May my students work on preparing digital editions in TEI from the collection?

With permission, I selected a text from the Anna Julia Cooper collection

  • The survey asks for her "racial philosophy"; her handwritten answer overflows the boundaries of the form and has been excerpted as an essay in publications of her work.
  • Published version doesn't preserve the survey structure.
  • Crowd-sourced transcription shows unreconciled options, totally separates form questions from handwritten responses.
Form A. 3533 Date Negro College Graduates Individual Occupational History I. Social Information 1. Name Anna Julia Cooper 2. Present address 201 T R.W. Washington D.C. 3. (If married woman, give maiden name on this line) Anna Julia Haywood 4. Length of residence in this city 45 Yrs. State of longest residence 5. Age 72 Sex F Date of Birth Aug 10, 1860 Place of birth Raleigh, N.C. 6. Marital status: Single Widowed Married - Date of Marriage June 21, 1877 Age at marriage 17 Divorced Date Separated Date Widowed Date Sept. 27 1879 Deserted Date Remarried Date 7. Children living: None Children dead: None 8. Present occupation: Teacher and President Frelinghuysen University 9. Present annual salary $ 50.00 or yearly net income, deducting expenses of earning except income tax.....or kindly check the class within which your net income falls: Under $ 500 $ 500- $ 999 $ 1000-1499 X $ 1500-1999 $ 2000-2499 $ 2500- $ 2999 $ 3000- $ 3499 $ 3500- $ 3999 $ 4000- $ 4499 $ 4500- $ 4999 $ 5000- $ 5999 $ 6000- $ 6999 $ 7000- $ 7999 $ 8000- $ 8999 $ 9000- $ 9999 $ 10,000 and over 10. Of what college or professional school are you a graduate? Oberlin College Date of graduation BA, 1884, MA 1887 11. Occupations since graduation From To Yearly Salary 1. Professor Modern Lang. and Lit. Wilberforce U. Sept. 1884 June 1885 1000 2. Instructor in Math Latin and Greek, St. Aug. Normal Sc. Sept. 1885 June 1887 Less than 1000 forget exactly 3. Teacher Washington High School Sept. 1887 from 750 up 4. Principal M. St. High School Dec 1901 Sept 1906 5. Professor Foreign Languages Lincoln Inst. MO 1906 1910 1100 6. Teacher of Latin Washington High School 1910 1930 1800 7. Retired from Public Schools June 1930 Pension 1434 8. President Frelinghuysen University June 1930 50.

TEI Document Data Modeling Challenge

Begin with Research questions: 

  • How can text encoding model the interaction between the survey prompts and AJC's handwritten responses?
  • By how much does AJC exceed the formal boundaries allotted for her responses?
  • What historical data about people, organizations, events is recorded in this document? 
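One possible answer to the first question can be sketched with the same elements that appear in the class encoding excerpt in these slides: keep each printed prompt and its handwritten response together in one block, and mark the handwriting with `<add>`. This is only an illustration. The `div1` wrapper, its `type` value, and the `n` values are assumptions, and the split between printed and handwritten words is a reading of the crowd-sourced transcription, not a checked transcription of the page image.

```xml
<!-- Hypothetical sketch: prompt and response stay in one <ab>,
     with AJC's handwriting wrapped in <add hand="#AJC">. -->
<div1 xmlns="http://www.tei-c.org/ns/1.0" type="section" n="I">
  <head>I. Social Information</head>
  <div2 type="question" n="1">
    <ab>1. Name <add hand="#AJC">Anna Julia Cooper</add></ab>
  </div2>
  <div2 type="question" n="2">
    <ab>2. Present address <add hand="#AJC">201 T R.W. Washington D.C.</add></ab>
  </div2>
</div1>
```

Encoded this way, the survey structure that the published version drops and the handwriting that the flat transcription blends into the prompts are both recoverable from the same file.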

TEI assignment series: 1

  • First, throw students into the “deep end of the pool”
  • Study the document, think about its structure
  • My inclination: let them model it in their own made-up XML
    • but they have been doing this already
    • and we need an intro to the TEI
    • and TEI is a shared XML language: community building



TEI assignment series: 2

  • Class of 25 students submits variety of code responses
  • Some baffled, most attempting something
  • Multiple students find their way to parts of the TEI Header and coding metadata about handDesc and typeDesc
  • We have two weeks for this orientation. TIME!
    • I put on my Project Director Hat
  • I create an ODD customization (limiting elements / attributes) for the class based on the most interesting and thoughtfully developed submissions
  • Ask the class to code more of the survey following the ODD customization
  • Students begin to make real progress! 
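To give a sense of what "limiting elements / attributes" can look like, here is a minimal ODD sketch. It is not the customization used in the class: the `ident`, the element lists, and the closed list of `type` values are assumptions drawn only from the encoding excerpt shown in these slides, and a working ODD would need to be compiled and tested against the students' files.

```xml
<!-- Hypothetical fragment: belongs inside the <body> of an ODD file. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="ajcSurvey" start="TEI">
  <!-- Infrastructure module: always required -->
  <moduleRef key="tei"/>
  <!-- Whole header module, so students can keep exploring metadata -->
  <moduleRef key="header"/>
  <!-- Only the elements seen in the class encoding (assumed list) -->
  <moduleRef key="core" include="title p head q hi lb pb gap add note"/>
  <moduleRef key="textstructure" include="TEI text body div1 div2"/>
  <moduleRef key="linking" include="ab"/>
  <moduleRef key="transcr" include="metamark"/>
  <!-- Enough of msdescription to reach handDesc and typeDesc -->
  <moduleRef key="msdescription"
             include="msDesc msIdentifier physDesc handDesc typeDesc"/>

  <!-- Make div2/@type required and restrict it to a closed list -->
  <elementSpec ident="div2" module="textstructure" mode="change">
    <attList>
      <attDef ident="type" mode="change" usage="req">
        <valList type="closed" mode="replace">
          <valItem ident="question"/>
          <valItem ident="continuation"/>
          <valItem ident="page-heading"/>
          <valItem ident="margin"/>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
```

A small element set like this keeps the editor's autocomplete suggestions short, which matters when the class has only two weeks for orientation.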



 <div2 type="question" n="65">
   <ab rend="4-L">65. Have you a <q>racial philosophy</q> that can be briefly stated?
   <add hand="#AJC">My <q>racial philosophy</q> is not far removed 
    <lb/>from my general philosophy of life: that the greatest happiness comes from
    altruistic service—&amp; this 
    <lb/>is in reach of all of whatever race &amp; condition. The <q>Service</q> 
    here meant
    it is not a pious idea of <hi rend="underline">being used;</hi> 
   <lb/>any sort of exploitation whether active or passive
    is to my mind hateful. Nor is the <q>Happiness</q> a mere bit
    .<note resp="#ebb">This continues on the last page.</note></add>
   </ab>
 </div2>
 <pb n="6"/>
<div2 type="page-heading"><head>SPACE RESERVED FOR SPECIAL COMMENT</head></div2>

<div2 type="margin"> <ab hand="#AJC"> <gap/>ily Service <gap/>d Girls Work <gap/>g World War.</ab> 
</div2><!--ebb: There are a few words in the left margin that are not clearly visible and 
don't seem to be connected to the flow of these paragraphs in AJC's response. 
We can sort of collect these up here, because they're not really in sequence with the text below.-->
<metamark function="flag" rend="pointer" target="(65 cont.)"/>
<div2 type="continuation" n="65"><ab hand="#AJC">65 cont.) 
   <lb/>of Pollyanna stuff. I am as sensitive to handicaps as those who are 
   <lb/>always whining about them &amp; the whips &amp; slings of prejudice, 
   <lb/>whether of color or sex, find me neither too calloused to suffer nor too ignorant
   <lb/>to know what is due me. Our own men as a group have not inherited traditions 
   <lb/>of chiv-alry (one sided as it may be among white men) &amp; we women are generally 
   <lb/>left to do our race battling alone except for empty compliments now &amp; then.
   <lb/>Even so, one may make the mistake of looking at race handicaps thro the wrong end 
   <lb/>of the telescope—imagining that oppression
     <!-- (continues) -->
   </ab>
</div2>

Finish the AJC Survey by December?

  • Not necessary for the whole class 
  • AJC survey = one of six options for the semester project
  • Challenge to students: process, transform, re-mediate your encoding
  • Students continue by learning HTML, XPath, and XSLT
  • Semester project: short document data modeling project
  • Curate the documents in your own XML (on GitHub)
    • with schema rules you define
    • or with a TEI ODD customization you continue refining based on my start
  • Target: Transform XML with XSLT into HTML (on GitHub Pages websites)
    • to share a reading view on a project website
    • to explore research questions
    • to highlight project data and metadata
    • to learn document data modeling via "the XML stack"
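The XML-to-HTML target above can be sketched as a short XSLT stylesheet for a reading view. This is an illustration, not a project file: it assumes the encoding is in the TEI namespace and uses the element names from the excerpt in these slides, and the HTML class names (`handwritten`, `gap`) and the page title are invented for the example. It needs an XSLT 2.0 or 3.0 processor such as Saxon because of `xpath-default-namespace`.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <xsl:output method="html" indent="yes"/>

  <!-- Page skeleton: process only the body, skipping the teiHeader -->
  <xsl:template match="/">
    <html>
      <head><title>AJC Survey: Reading View</title></head>
      <body>
        <xsl:apply-templates select="descendant::body"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each div2 becomes a section that carries its @type as a CSS class -->
  <xsl:template match="div2">
    <section class="{@type}">
      <xsl:apply-templates/>
    </section>
  </xsl:template>

  <xsl:template match="div2/head">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <!-- Printed blocks are plain paragraphs; handwritten ones get a class -->
  <xsl:template match="ab">
    <p><xsl:apply-templates/></p>
  </xsl:template>
  <xsl:template match="ab[@hand]">
    <p class="handwritten"><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Handwritten responses inside a printed prompt -->
  <xsl:template match="add">
    <span class="handwritten"><xsl:apply-templates/></span>
  </xsl:template>

  <xsl:template match="q">
    <q><xsl:apply-templates/></q>
  </xsl:template>
  <xsl:template match="hi[@rend = 'underline']">
    <u><xsl:apply-templates/></u>
  </xsl:template>

  <!-- Preserve the manuscript line breaks -->
  <xsl:template match="lb">
    <br/>
  </xsl:template>

  <!-- Show that something is missing where the margin text is unreadable -->
  <xsl:template match="gap">
    <span class="gap">[gap]</span>
  </xsl:template>

  <!-- Editorial apparatus is left out of the reading view -->
  <xsl:template match="note | metamark | pb"/>
</xsl:stylesheet>
```

With the handwriting carried into the HTML as a class, a stylesheet on the GitHub Pages site can style printed prompts and handwritten responses differently, which makes the overflow of the responses visible in the reading view.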



Course and project dev site:


About a pedagogical experiment introducing the Text Encoding Initiative (TEI) to my Text Encoding students in Fall 2020.
