Langston Hughes and The Blues:

Text Archiving, Analysis, and Black DH for Undergraduates

Mia Borgia and Elisa Beshero-Bondar
DIGIT program
Penn State Erie, The Behrend College

Link to these slides: 

Tweets to @epyllia

Presentation for Keystone DH: 15 July 2021
Panel: Natural Language Processing and Text Mining (10:45am - noon)

Background image artist: Larry Poncho Brown

Building digital editions and archives 

  • curation
  • analysis


DH at PSU Behrend (Erie, PA)

  • Part of Digital Media, Arts, and Technology major
  • In the midst of "core" DH course sequence:
    • Fall class: Text Encoding (DIGIT 110)
    • Spring class: Text Analysis (DIGIT 210)
    • See Newtfire for course syllabi, tutorials, projects
  • Text Encoding:
    • transcribing and encoding manuscripts (digital photofacsimiles in 2020)
    • intro to TEI (see next panel)
    • students design and develop a curated digital edition as semester project
  • Text Analysis:
    • working with born-digital texts, larger quantities, "big blobs o' text"
    • regex conversion of text to XML formats
    • XQuery and Python, for working with collections at scale
    • NLP, network analysis, SVG: programmed by students
    • document data modeling at scale  

The Ballot and Me - Langston Hughes

  • Real historical Black activists and politicians appear as characters in the play
  • Characters plead with the audience to exercise their right to vote and educate them about the history of African-American suffrage in the US
  • Goal of the digital edition:
    • Accurately represent the original typescript itself
    • Make the characters visible as real people through portrait gallery
    • Create a discoverable index of historical characters' prominent messages through their speaking parts

Historical Emphasis

The Ballot and Me - Typescript to Digital Edition

The Ballot and Me: The Code

  • xml:id and IDREF attributes on characters and their speaking parts
  • XML edition's syntax rules loosely based on TEI
  • Learned XSLT to transform the XML into a navigable HTML edition
<castgroup n="1">
    <activePeriod castGroupRole="narrator">Contemporary</activePeriod>
</castgroup>
<castgroup n="2">
    <lb/><person xml:id="FRAUNCES">SAMUEL FRAUNCES</person>
    <lb/><person xml:id="SOJOURNER">SOJOURNER TRUTH</person>
    <activePeriod castGroupRole="2" position="right">
        <lb/>George Washington's period
        <lb/><period begin="1797" end="1883">(1797-1883)</period>
    </activePeriod>
</castgroup>
<castgroup n="3">
    <lb/><person xml:id="DOUGLASS">FREDERICK DOUGLASS</person>
    <activePeriod castGroupRole="3" position="right">
        <period begin="1817" end="1895">(1817-1895)</period>
    </activePeriod>
</castgroup>
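To show how xml:id values on characters can be tied to their speaking parts, here is a minimal Python sketch of the speech index described in the edition goals. It is an illustration only, not the project's XSLT: the `<play>` wrapper, the `<sp who="#...">` element, and the placeholder speeches are assumptions, and only `<person xml:id="...">` comes from the excerpt above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical miniature edition: <person> mirrors the cast list excerpt,
# while <play>, <sp who="#..."> and the placeholder speeches are assumptions.
SAMPLE = """<play>
  <castgroup n="2">
    <person xml:id="SOJOURNER">SOJOURNER TRUTH</person>
  </castgroup>
  <castgroup n="3">
    <person xml:id="DOUGLASS">FREDERICK DOUGLASS</person>
  </castgroup>
  <sp who="#DOUGLASS">[first placeholder speech]</sp>
  <sp who="#SOJOURNER">[second placeholder speech]</sp>
  <sp who="#DOUGLASS">[third placeholder speech]</sp>
</play>"""


def index_speeches(xml_text):
    """Map each character's display name to the list of their speeches."""
    root = ET.fromstring(xml_text)
    # Lookup table from xml:id value to the name printed in the cast list.
    names = {p.get(XML_ID): p.text.strip() for p in root.iter("person")}
    index = defaultdict(list)
    for sp in root.iter("sp"):
        ref = sp.get("who", "").lstrip("#")  # "#DOUGLASS" -> "DOUGLASS"
        # Fall back to the raw reference if no matching xml:id exists.
        index[names.get(ref, ref)].append(sp.text.strip())
    return dict(index)


speech_index = index_speeches(SAMPLE)
for name, speeches in speech_index.items():
    print(name, len(speeches))
```

Because every speech points at an ID rather than repeating a name, the same lookup could drive both the portrait gallery and the index of each character's messages.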

The Digital Gallery

Comparing Text Encoding with Large Scale Text Analysis

The Ballot and Me (DIGIT 110)

  • Dr. B hand-selected projects
  • Hand-transcribed from the original typescript
  • Preserving the integrity of the original text was top priority

The Blues (DIGIT 210)

  • Student-chosen topics
  • Scope - MUCH larger-scale
  • Unstable sourcing issues
  • ETHICS - potential for misrepresentation

artist: Winold Reiss | Source

artist: David de Ramon | Source

Blues Analysis Project - Finding a Source

Goal for the Blues Project:

  • Collect and mark up as much blues song lyric text data as possible for distant reading, data visualizations, and large-scale analysis of the genre

Curation problem: Link rot from old website HTML!

The Lyric Text Collection Process

  • Python scraper (scripting creds to Dr. B!)
  • As we scraped song lyric text and metadata from the rusty old site archive, we began to realize that link rot disproportionately affected certain blues artists and older blues song files
  • Out of 3,246 songs on the site originally, only 1,078 were retrievable!
  • Regex: up-converted to a simple XML format
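A scraper facing link rot has to keep going when pages are missing and record what it could not fetch. The sketch below shows one way to do that in Python. It is not the script used for the project: the function names, the latin-1 decoding choice, and the injectable `fetch` parameter are all assumptions made for illustration.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download one page; latin-1 is an assumed encoding that never fails to decode."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("latin-1")


def scrape_songs(urls, fetch=fetch_html):
    """Return (pages, dead_links): HTML keyed by URL, plus the unreachable URLs."""
    pages, dead_links = {}, []
    for url in urls:
        try:
            pages[url] = fetch(url)
        except (URLError, TimeoutError, ValueError):
            # HTTPError subclasses URLError, so 404s caused by link rot land here.
            dead_links.append(url)
    return pages, dead_links


def retrieval_rate(retrieved, total):
    """Share of the original catalogue that survived, as a percentage."""
    return round(100 * retrieved / total, 1)


# Figures from the slide: 1,078 of 3,246 songs were retrievable.
print(retrieval_rate(1078, 3246))
```

Keeping the list of dead links, rather than silently skipping them, is what makes it possible to see which artists and which older song files were hit hardest. With the slide's figures, the retrieval rate works out to roughly a third of the original catalogue.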

Source Code vs. What we Scraped

   <TITLE>Muddy Waters - Mannish Boy</TITLE>
<META name="ROBOTS" content="INDEX, ALL"><META Name="Description" Content="Harry's Blues Lyrics Online, Muddy Waters lyrics, Mannish Boy"><META http-equiv="Keywords" Content="Hharry, blues, blues songs, blues music, chicago blues, delta blues, texas blues, acoustic blues, electric blues, harp blues, harmonica blues, guitar blues, piano blues, music, blues lyrics, lyrics, blues history, blues tree, blues essay, blues language, blues lingo, blues dictionary, blues terms, blues words, blues phrases, blues topics, blues books, blues links, muddy waters, mckinley morganfield, mc kinley morganfield, mckinley, morganfield">
<script language="JavaScript"><!--
function MM_openBrWindow(theURL,winName,features) { //v2.0,winName,features);
<BODY TEXT="#0033FF" LINK="#FF8040" ALINK="#663300" VLINK="#999999" BACKGROUND="../../graphics/background_blue_7.jpg">
<CENTER><A NAME=top></A><SCRIPT LANGUAGE=JavaScript type="text/javascript">
</SCRIPT><FONT SIZE="+3" COLOR="#FF8000"><B>Muddy

<P><IMG src="../../graphics/gif_orangebar.gif" ALT="Ruler" WIDTH="100%" HEIGHT=5 ALIGN=bottom><BR>
   <TR BGCOLOR="#0033FF">
      <TD COLSPAN=6 WIDTH=704>
         <P><FONT SIZE="+2" COLOR="#FFFF00"><B><I>New!</I></B></FONT><FONT SIZE="-1" COLOR="#FFFFFF"><B>
         </B></FONT><FONT SIZE="+1" COLOR="#FFFFFF"><B>Nonstop
         Internet Blues Radio: listen to your favorite blues music
         while you surf!</B></FONT></P>
      <TD WIDTH=116>
         <CENTER><FONT COLOR="#FFFFFF"><IMG src=";bfpage=redirect" WIDTH=1 HEIGHT=1 BORDER=0 ALIGN=bottom nosave>
         </FONT><A HREF="" TARGET="_blank"><FONT COLOR="#0000FF"><B>General
      <TD WIDTH=116>
         <CENTER><FONT COLOR="#0000FF"><B><IMG src=";bfpage=redirect" WIDTH=1 HEIGHT=1 BORDER=0 ALIGN=bottom nosave>
         </B></FONT><A HREF="" TARGET="_blank"><FONT COLOR="#0000FF"><B>Electric
      <TD WIDTH=116>
         <CENTER><IMG src=";bfpage=redirect" WIDTH=1 HEIGHT=1 BORDER=0 ALIGN=bottom nosave><FONT COLOR="#0000FF">
         </FONT><A HREF="" TARGET="_blank"><FONT COLOR="#0000FF"><B>Rockin'
      <TD WIDTH=116>
         <CENTER><FONT SIZE="-1" COLOR="#FFFFFF"><B><IMG src=";bfpage=redirect" WIDTH=1 HEIGHT=1 BORDER=0 ALIGN=bottom nosave>
         </B></FONT><A HREF="" TARGET="_blank"><FONT COLOR="#0000FF"><B>Acoustic
      <TD WIDTH=100>
         <CENTER><FONT SIZE="-1" COLOR="#FFFFFF"><B><IMG src=";bfpage=redirect" WIDTH=1 HEIGHT=1 BORDER=0 ALIGN=bottom nosave></B></FONT><FONT COLOR="#0000FF"><B>
         </B></FONT><A HREF="" TARGET="_blank"><FONT COLOR="#0000FF"><B>Jump
      <TD WIDTH=140>
         <CENTER><FONT COLOR="#0000FF"><B>Also:</B></FONT><FONT SIZE="-1"><B>
         </B></FONT><A HREF="" TARGET="_blank"><FONT COLOR="#0000FF"><B>Blues
 <FONT SIZE="+1"><B>Have you already seen the new
</B></FONT><A HREF="../../free_blues_mp3_charts.htm#top" TARGET="_blank"><FONT SIZE="+1" COLOR="#FF8000"><B>Blues
MP3 page</B></FONT></A><FONT SIZE="+1"><B>?</B></FONT><B><BR>
</B><FONT SIZE="-1">Want free blues mp3 for your own site?
</FONT><IMG src=";siteid=33470537&amp;bfpage=small" WIDTH=1 HEIGHT=1 BORDER=0 ALIGN=bottom nosave><A HREF="" TARGET="_blank"><FONT SIZE="-1"><B>Click
here!</B></FONT></A><FONT SIZE="-1">.</FONT></P>

      <TD WIDTH="70%">
         <P><A NAME="mannish_boy"></A><FONT SIZE="+2"><B>Mannish
      <TD WIDTH="30%">
         <P ALIGN=right><A HREF="../../sounds/muddy_waters/mannish_boy.ram">soundclip</A></P>
<FONT SIZE="+1">by </FONT><A HREF="|PM&p=amg&sql=B103763"><FONT SIZE="+1">Elias
McDaniel a.k.a. Bo Diddley</FONT></A><FONT SIZE="+1"> / adapted by
</FONT><A HREF="|PM&p=amg&sql=B108085"><FONT SIZE="+1">McKinley
Morganfield a.k.a. Muddy Waters</FONT></A><FONT SIZE="+1"><BR>
recording of 19</FONT><BR>
<FONT SIZE="+1">from probably </FONT><A HREF="|PM&p=amg&sql=A124685"><FONT SIZE="+1">Chess
Box (Chess 9340)</FONT></A><FONT SIZE="+1">, </FONT><A HREF="../../disclaimer.htm">copyright
Ooooooh, yeah, ooh, yeah<BR>
Everythin', everythin', everytin's gonna be alright this mornin'<BR>
Ooh yeah, whoaw<BR>
Now when I was a young boy, at the age of five<BR>
My mother said I was, gonna be the greatest man alive<BR>
But now I'm a man, way past 21<BR>
Want you to believe me baby,<BR>
I had lot's of fun<BR>
I'm a man<BR>
I spell mmm, aaa child, nnn<BR>
That represents man<BR>
No B, O child, Y<SUP>1</SUP><BR>
That mean mannish boy<BR>
I'm a man<BR>
I'm a full grown man<BR>
I'm a man<BR>
I'm a natural born lovers man<BR>
I'm a man<BR>
I'm a rollin' stone<BR>
I'm a man<BR>
I'm a <A HREF="../../blueslanguage.htm#hoochie_coochie_man">hoochie
coochie man</A><BR>
Sittin' on the outside, just me and my mate<BR>
You know I'm made to move you honey,<BR>
come up two hours late<BR>
Wasn't that a man<BR>
I spell mmm, aaa child, nnn<BR>
That represents man<BR>
No B, O child, Y<SUP>1</SUP><BR>
That mean mannish boy<BR>
I'm a man<BR>
I'm a full grown man<BR>
I'm a natural born lovers man<BR>
I'm a rolllin' stone<BR>
I'm a hoochie coochie man<BR>
The line I shoot will never miss<BR>
When I make love to a woman,<BR>
she can't resist<BR>
I think I go down,<BR>
to old Kansas Stew<BR>
I'm gonna bring back my second cousin,<BR>
that little <A HREF="../../blueslanguage.htm#johnny_cocheroo">Johnny
All you little girls,<BR>
sittin'out at that line<BR>
I can make love to you woman,<BR>
in five minutes time<BR>
Ain't that a man<BR>
I spell mmm, aaa child, nnn<BR>
That represents man<BR>
No B, O child, Y<SUP>1</SUP><BR>
That mean mannish boy<BR>
I'm a full grown man<BR>
I'm a natural born lovers man<BR>
I'm a rollin' stone<BR>
I'm a man-child<BR>
I'm a hoochie coochie man<BR>
well, well, well, well<BR>
hurry, hurry, hurry, hurry<BR>
Don't hurt me, don't hurt me child<BR>
don't hurt me, don't hurt, don't hurt me child<BR>
well, well, well, well<BR>
Note 1: alternate previously used text for this line, <I>Lord be oooh
child, why</I>. The line <I>No B, O child, Y</I><SUP> </SUP>was
suggested to me by Mike O'Keefe on March 18, 1999 and I think Mike is
right here, it perfectly fits the preceding lines of the lyrics so I
substituted the old text with his suggestion. Thanks to Mike O'Keefe
for this contribution!.<BR>

<CENTER><FONT SIZE="+2"><B>&#91;</B> </FONT><A HREF="../../index.htm#top"><FONT SIZE="+2"><B>Home
Page</B></FONT></A><FONT SIZE="+2"> <B>&#93;</B></FONT><BR>

<A HREF=""><IMG src="" WIDTH=22 HEIGHT=22 BORDER=0 ALIGN=bottom nosave></A><BR>
<!-- webbot bot="HTMLMarkup" startspan --></B></FONT></P>

<P><A HREF=""><FONT SIZE="-1"><B><IMG src=";l=y&amp;hb=WQ59111284NF76EN0&amp;cd=1&amp;n=muddy_waters_songs" ALT="Click Here!" WIDTH=88 HEIGHT=62 BORDER=0 ALIGN=bottom></B></FONT></A></P>

<P><FONT SIZE="-1"><B><!-- webbot bot="HTMLMarkup" endspan --> <!-- END WEBSIDESTORY CODE  --><BR>
<!-- webbot bot="HTMLMarkup" startspan --></B></FONT></P>

<P><A HREF=""><FONT SIZE="-1"><B><IMG src=";l=y&amp;hb=WQ59111284NF76EN0&amp;cd=1&amp;n=single_song_pages" ALT="Click Here!" WIDTH=88 HEIGHT=62 BORDER=0 ALIGN=bottom></B></FONT></A></P>

<P><FONT SIZE="-1"><B><!-- webbot bot="HTMLMarkup" endspan --> <!-- END WEBSIDESTORY CODE  -->

<xml><metadata><artist>Muddy Waters</artist><title> Mannish Boy</title>
<songwriter><name>Elias McDaniel a.k.a. Bo Diddley</name><name> adapted by McKinley Morganfield a.k.a. Muddy Waters</name></songwriter>
<recordDate>recording of 19</recordDate>
<album>from probably Chess Box (Chess 9340)</album></metadata>
<lyrics>
<l>Ooooooh, yeah, ooh, yeah</l>
<l>Everythin', everythin', everytin's gonna be alright this mornin'</l>
<l>Ooh yeah, whoaw</l>
<l>Now when I was a young boy, at the age of five</l>
<l>My mother said I was, gonna be the greatest man alive</l>
<l>But now I'm a man, way past 21</l>
<l>Want you to believe me baby,</l>
<l>I had lot's of fun</l>
<l>I'm a man</l>
<l>I spell mmm, aaa child, nnn</l>
<l>That represents man</l>
<l>No B, O child, Y</l>
<l>That mean mannish boy</l>
<l>I'm a man</l>
<l>I'm a full grown man</l>
<l>I'm a man</l>
<l>I'm a natural born lovers man</l>
<l>I'm a man</l>
<l>I'm a rollin' stone</l>
<l>I'm a man</l>
<l>I'm a hoochie coochie man</l>
<l>Sittin' on the outside, just me and my mate</l>
<l>You know I'm made to move you honey,</l>
<l>come up two hours late</l>
<l>Wasn't that a man</l>
<l>I spell mmm, aaa child, nnn</l>
<l>That represents man</l>
<l>No B, O child, Y</l>
<l>That mean mannish boy</l>
<l>I'm a man</l>
<l>I'm a full grown man</l>
<l>I'm a natural born lovers man</l>
<l>I'm a rolllin' stone</l>
<l>I'm a hoochie coochie man</l>
<l>The line I shoot will never miss</l>
<l>When I make love to a woman,</l>
<l>she can't resist</l>
<l>I think I go down,</l>
<l>to old Kansas Stew</l>
<l>I'm gonna bring back my second cousin,</l>
<l>that little Johnny</l>
<l>All you little girls,</l>
<l>sittin'out at that line</l>
<l>I can make love to you woman,</l>
<l>in five minutes time</l>
<l>Ain't that a man</l>
<l>I spell mmm, aaa child, nnn</l>
<l>That represents man</l>
<l>No B, O child, Y</l>
<l>That mean mannish boy</l>
<l>I'm a full grown man</l>
<l>I'm a natural born lovers man</l>
<l>I'm a rollin' stone</l>
<l>I'm a man-child</l>
<l>I'm a hoochie coochie man</l>
<l>well, well, well, well</l>
<l>hurry, hurry, hurry, hurry</l>
<l>Don't hurt me, don't hurt me child</l>
<l>don't hurt me, don't hurt, don't hurt me child</l>
<l>well, well, well, well</l>
<note>Note 1: alternate previously used text for this line, Lord be oooh child, why. The line No B, O child, Y was suggested to me by Mike O'Keefe on March 18, 1999 and I think Mike is right here, it perfectly fits the preceding lines of the lyrics so I substituted the old text with his suggestion. Thanks to Mike O'Keefe for this contribution!</note></lyrics></xml>
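The jump from the HTML source to the scraped XML can be sketched with a few regular expressions in Python. This is a simplified illustration, not the project's actual conversion: the element names follow the XML shown above, the sample is trimmed from the page source above, and the specific patterns, the `html_to_xml` name, and the handling of `<SUP>` footnote markers are assumptions.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A few lines trimmed from the page source shown above.
SAMPLE_HTML = """<TITLE>Muddy Waters - Mannish Boy</TITLE>
<BODY>
I'm a man<BR>
I'm a <A HREF="../../blueslanguage.htm#hoochie_coochie_man">hoochie
coochie man</A><BR>
No B, O child, Y<SUP>1</SUP><BR>
</BODY>"""


def html_to_xml(html):
    """Up-convert one scraped lyrics page to a simple XML string."""
    # The page title has the shape "Artist - Song".
    title = re.search(r"<TITLE>(.*?)\s+-\s+(.*?)</TITLE>", html, re.S | re.I)
    artist, song = title.group(1).strip(), title.group(2).strip()
    body = html[title.end():]
    # Remove footnote markers such as <SUP>1</SUP> before stripping tags.
    body = re.sub(r"<SUP>.*?</SUP>", "", body, flags=re.S | re.I)
    lines = []
    # Each lyric line ends in <BR>; the text after the last <BR> is discarded.
    for chunk in re.split(r"<BR\s*/?>", body, flags=re.I)[:-1]:
        text = re.sub(r"<[^>]+>", "", chunk)      # drop any remaining tags
        text = re.sub(r"\s+", " ", text).strip()  # rejoin wrapped source lines
        if text:
            lines.append(f"<l>{escape(text)}</l>")
    xml_text = (
        f"<xml><metadata><artist>{escape(artist)}</artist>"
        f"<title>{escape(song)}</title></metadata>\n<lyrics>\n"
        + "\n".join(lines)
        + "\n</lyrics></xml>"
    )
    ET.fromstring(xml_text)  # raises ParseError if the result is not well-formed
    return xml_text


result = html_to_xml(SAMPLE_HTML)
print(result)
```

Parsing the result before returning it is a cheap guard: regex conversion of 1990s HTML tends to fail silently, and a well-formedness check catches stray tags or unescaped ampersands as soon as they appear.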

Analyzing the Blues:


(from our limited song data pool)

Next Steps! (senior project)

  • Find more blues song lyric sources to fill in the data gaps!
  • Conduct more NLP and data mining research to produce more accurate data
  • Someday, create an archive that links each blues song's lyrics on the site to its audio file!


  • Why is it important to do this work? 
    • the internet is laden with unethical and bigoted information
    • the internet also offers opportunities to spread culturally important knowledge about history and art
    • Excavation work:
      • preserving and promoting the integrity of historically and culturally significant works by digitizing and sharing them
  • Showing the continuing life of the blues and emphasizing Black culture and history on the web!
