Is Invisible XML Ready for College Students?

Trying iXML and XProc on a Music Analysis Project
in an Undergraduate Text Analysis course

Balisage: The Markup Conference 2025

Michael Simons and Elisa Beshero-Bondar

Penn State Erie, The Behrend College

The Context:
DIGIT 210 (Large-Scale Text Analysis) @ Penn State Behrend

  • Usually the "regex and Python" course
  • Usually pushes XML technologies a bit to the margins
    • comes after the Text Encoding course, which puts heavy emphasis on schemas and XSLT
  • This year, we wanted an "XML-forward" approach!
    • why... (you may ask)

    • because maybe too much of standard NLP feels like handing out recipes

    • not enough independent, creative thinking: just tinkering with ready-made scripts and libraries

    • and I had to speak in Japan, and I needed to interest a friend in covering three classes for me, and we both wanted to learn something new, different, and exciting...so...

    • why not try teaching students this crazy new iXML we'd been hearing about at Balisage?

The Battle Testers for DIGIT 210!

  • Elisa Beshero-Bondar
  • Michael Simons
  • David J. Birnbaum
  • Dannika Love
  • Caleb King
from the DIGIT 496 Canvas course page

The Battle Testers for DIGIT 210!

We presented...

“Workshop on Command Line Skills for Humanists and Social Scientists”

Needed separate installation instructions for Windows and Mac (2 separate documents)

Stairway to Heaven
Led Zeppelin IV
Led Zeppelin
C

[Intro]
Am Ammaj9 Am7 D/F# Fmaj7 G Am
Am Ammaj9 Am7 D/F# Fmaj7 G Am
C D Fmaj7 Am C G D
C D Fmaj7 Am
C D
Fmaj7
 
 
[Verse 1]

There's
  Am         Ammaj9
a lady who's sure  
         Am7         D/F#
All that glitters is gold
          Fmaj7                G  Am
And she's buying a stairway to heaven.
         Am             Ammaj9
When she gets there she knows 
       Am7            D/F#
If the stores are all closed
         Fmaj7                     G Am
With a word   she can get what she came for.
C D/F# Fmaj7 Am           C        G           D
 Oh    o______h and she's buying a stairway to heaven.
          C             D          Fmaj7         Am
There's a sign on the wall But she wants to be sure
           C              D              Fmaj7
'Cause you know sometimes words have two meanings.
...

Raw Text Files

iXML and XProc

Our XProc Pipeline Process

Source
iXML
XSLT
Output XML
Output lyrics
Output chords

iXML and XProc

Our iXML

mei: music.
music: title, newline, album, newline, artist, newline, key, newline, newline*, section++newline.
title: ~[#d;#a]+.
album: ~[#d;#a]+.
artist: ~[#d;#a]+.
key: ~[#d;#a]+.
section: type, mdiv.
@type: -"[", ~[#22]+, -"]".
mdiv: ~[#22]+.
-newline: (#d?, #a).
-space: " ".

<?xml version="1.0" encoding="UTF-8"?>
<mei ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS">
<music><title>Flower Power</title>
<album>From the Fires</album>
<artist>Greta Van Fleet</artist>
<key>A</key>

<section type="Intro"><mdiv>
A D A D A D A D
 
 </mdiv></section>
<section type="Verse 1"><mdiv>
A                    D
 She is a lady, comes from all around
A                            D
 She's many places, but she's homeward bound
...

Output (above)

Input (below)

Flower Power
From the Fires
Greta Van Fleet
A

[Intro]
A D A D A D A D
 
 
[Verse 1]
A                    D
 She is a lady, comes from all around
A                            D
 She's many places, but she's homeward bound
...
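
For comparison with the grammar above, here is a minimal Python sketch of the same split: four header lines (title, album, artist, key) followed by sections introduced by a bracketed label. It is not the pipeline used in the course. The names `parse_chart` and `SECTION_HEADER` are hypothetical, and the sketch assumes that every section label sits alone on its own line.

```python
import re

# Assumption: a section label is a bracketed name alone on its line, e.g. [Intro].
SECTION_HEADER = re.compile(r"^\[([^\]\n]+)\]\s*$")


def parse_chart(text):
    """Split a raw chord chart into four header fields and labeled sections."""
    lines = text.splitlines()
    # The first four lines are title, album, artist, and key, in that order.
    title, album, artist, key = (line.strip() for line in lines[:4])
    sections = []
    for line in lines[4:]:
        match = SECTION_HEADER.match(line)
        if match:
            # A new label starts a new section.
            sections.append({"type": match.group(1), "lines": []})
        elif sections and line.strip():
            # Keep non-blank lines verbatim so chord columns stay aligned.
            sections[-1]["lines"].append(line)
    return {
        "title": title,
        "album": album,
        "artist": artist,
        "key": key,
        "sections": sections,
    }
```

Unlike the grammar, this version needs explicit bookkeeping for "which section am I in", which is the kind of sequential scripting the declarative rules avoid.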

iXML and XProc

Amazing Part of Our XSLT

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    version="3.0">
    <!-- Identity transform: copy anything that has no more specific template. -->
    <xsl:mode on-no-match="shallow-copy"/>
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Split the text of each mdiv into chord lines and lyric lines. -->
    <xsl:template match="mdiv">
        <!-- The regex matches a newline, a run of chord-like tokens
             (a capital letter plus optional modifiers), and the closing newline. -->
        <xsl:analyze-string select="." regex="\n(\s*([A-Z][#ba-z/0-9]*) *([A-Z][#ba-z/0-9]*)?)*\n">
            <xsl:matching-substring>
                <chordLine>
                    <xsl:for-each select="tokenize(., '\s+')">
                        <!-- Skip the empty tokens left by leading or trailing whitespace. -->
                        <xsl:if test="current() ! matches(., '\S')">
                            <chord><xsl:value-of select="current()"/></chord>
                        </xsl:if>
                    </xsl:for-each>
                </chordLine>
            </xsl:matching-substring>
            <!-- Everything between chord lines is treated as lyrics. -->
            <xsl:non-matching-substring>
                <lyrics>
                    <xsl:value-of select=". ! normalize-space()"/>
                </lyrics>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>
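
To make the chord-line test in the stylesheet easier to experiment with, here is a rough line-by-line Python sketch of the same idea. It is an illustration, not a port of the XSLT. The `CHORD` pattern is a stricter assumption of ours (roots A to G, an optional accidental, a suffix, and an optional slash bass note), and the names `is_chord_line` and `classify` are hypothetical.

```python
import re

# Assumed chord shape: root A-G, optional # or b, optional suffix such as
# "m", "maj7", or "7", and an optional slash bass note such as "/F#".
CHORD = re.compile(r"[A-G][#b]?[a-z0-9]*(?:/[A-G][#b]?)?")


def is_chord_line(line):
    """Return True when every whitespace-separated token looks like a chord."""
    tokens = line.split()
    return bool(tokens) and all(CHORD.fullmatch(token) for token in tokens)


def classify(block):
    """Label each non-blank line of a section as a chord line or a lyric line."""
    result = []
    for line in block.splitlines():
        if not line.strip():
            continue  # blank lines carry no chords or lyrics
        if is_chord_line(line):
            result.append(("chordLine", line.split()))
        else:
            # Collapse runs of whitespace, as normalize-space() does.
            result.append(("lyrics", " ".join(line.split())))
    return result
```

A short lyric line made only of chord-like words (a bare "A", for example) would be misread as a chord line. The same caveat applies to any purely pattern-based test.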

Evaluating the possibilities

MEI, MusicXML, and ChordPro

Trouble with MEI's chord notation

 

This seems too geared toward representation of one specific way of playing a chord.

This is still too literally representational for how to play the music.

 

This much fine-grained markup isn't optimized for analysis!

Evaluating the possibilities

MEI, MusicXML, and ChordPro

What ChordPro looks like

{start_of_chorus}
[A]Turn tonight, firelight
[D]Star shines in her eye
[A] Makes me feel like [D]I'm alive
She's outta sight, [A]yeah
[D]Aw yeah
She's al[A]right, she's alright, she's al[D]right
She's outta sight, outta [F]sight [G]
{end_of_chorus}

Evaluating the possibilities

MEI, MusicXML, and ChordPro

What a normal chord chart looks like

[Chorus]
A
Turn tonight, firelight
D
Star shines in her eye
A                   D
 Makes me feel like I'm alive
                   A
She's outta sight, yeah
D
Aw yeah
        A                             D
She's alright, she's alright, she's alright
                         F    G
She's outta sight, outta sight

Evaluating the possibilities

MEI, MusicXML, and ChordPro

<section type="Chorus"><mdiv>
A
Turn tonight, firelight
D
Star shines in her eye
A                   D
 Makes me feel like I'm alive
                   A
She's outta sight, yeah
D
Aw yeah
        A                             D
She's alright, she's alright, she's alright
                         F    G
She's outta sight, outta sight
 </mdiv></section>

<section type="Chorus">
   <line><chord>A</chord>Turn tonight, firelight</line>
   <line><chord>D</chord>Star shines in her eye</line>
   <line><chord>A</chord> Makes me feel like <chord>D</chord>I'm alive</line>
   <line>She's outta sight, <chord>A</chord>yeah</line>
   <line><chord>D</chord>Aw yeah</line>
   <line>She's al<chord>A</chord>right, she's alright, she's al<chord>D</chord>right</line>
   <line>She's outta sight, outta <chord>F</chord>sight <chord>G</chord></line>
</section>

Top: what our iXML did with a normal chord chart

Bottom: what iXML does with ChordPro
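
The two notations carry the same information, so one can be derived from the other. As a hedged sketch of that conversion (not a step in the course pipeline), the hypothetical function `merge_chords` below folds a chord line into the lyric line beneath it by inserting each chord, in ChordPro brackets, at the column where it appears. It assumes the chart is aligned with spaces rather than tabs.

```python
import re


def merge_chords(chord_line, lyric_line):
    """Insert each chord from chord_line into lyric_line at the same column."""
    merged = lyric_line
    # Work right to left so earlier insertions do not shift later columns.
    for match in reversed(list(re.finditer(r"\S+", chord_line))):
        column = match.start()
        # Pad short lyric lines so a trailing chord keeps its position.
        merged = merged.ljust(column)
        merged = merged[:column] + "[" + match.group() + "]" + merged[column:]
    return merged
```

Called with a chord line whose `A` is in column 0 and whose `D` is in column 8, above the lyric `one two three`, the function returns `[A]one two [D]three`.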

Tricky iXML for ChordPro: contents of lines

 

  • text-only lines
  • lines with chords and no text
  • mixed text and chords

<section type='intro'>
   <line>
      <chord>A</chord>
      <chord>D</chord>
      <chord>A</chord>
      <chord>D</chord>
      <chord>A</chord>
      <chord>D</chord>
      <chord>A</chord>
      <chord>D</chord>
   </line>
</section>

<section type="Chorus">
   <line><chord>A</chord>Turn tonight, firelight</line>
   <line><chord>D</chord>Star shines in her eye</line>
   <line><chord>A</chord> Makes me feel like <chord>D</chord>I'm alive</line>
   <line>She's outta sight, <chord>A</chord>yeah</line>
   <line><chord>D</chord>Aw yeah</line>
   <line>She's al<chord>A</chord>right, she's alright, she's al<chord>D</chord>right</line>
   <line>She's outta sight, outta <chord>F</chord>sight <chord>G</chord></line>
</section>

iXML over ChordPro

xml: metadata, music.
metadata: title, newline, album, newline, artist, newline, key, newline, 
    newline+.
title: ~[#d;#a]+.
album: ~[#d;#a]+.
artist: ~[#d;#a]+.
key: ~[#d;#a]+.

music: section++(newline, newline+), newline?.
section: type, newline, line++newline, newline, outro.
@type: -"{start_of_", ~["}"]+, -"}".
-outro: -"{end_of_", -~["}"]+, -"}".
line: lineContent.
-lineContent: nullableText, (chord++nullableText, nullableText)?.
chord: -"[", ~["]"]+, -"]".
-nullableText: ~["[]{}";#a;#d]*.
-newline: (-#d?, -#a).
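
For readers who want to compare the `lineContent` rule with a regex approach, here is a small Python sketch that splits one ChordPro line into text and chord segments and reports which of the three line types it is. The function names `parse_line` and `line_kind` are hypothetical, and the sketch assumes well-formed input with no nested or unclosed brackets.

```python
import re


def parse_line(line):
    """Split a ChordPro line into ("chord", name) and ("text", value) segments."""
    # With a capturing group, re.split alternates text, chord, text, chord, ...
    parts = re.split(r"\[([^\]]+)\]", line)
    segments = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            segments.append(("chord", part))
        elif part:
            segments.append(("text", part))
    return segments


def line_kind(line):
    """Classify a line as text-only, chords-only, or mixed."""
    segments = parse_line(line)
    has_chord = any(kind == "chord" for kind, _ in segments)
    # Whitespace between chords does not count as lyric text.
    has_text = any(kind == "text" and value.strip() for kind, value in segments)
    if has_chord and has_text:
        return "mixed"
    if has_chord:
        return "chords-only"
    return "text-only"
```

The grammar expresses the same alternation of text and chords in a single rule, and it also rejects malformed lines that this sketch would silently accept.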

No ambiguity in CoffeePot so far

(small sample test set)

Was it worth it?

Transferable Skills

  • Greatly improved command-line skills!
  • Grammar-writing encourages appreciation of meaningful structures
  • iXML : regex recipes :: poetry : prose

Learning an art form?

iXML is more than regex search-replace operations

  • more effort
  • more definitive writing of grammar
  • more installations required
  • more declarative

iXML is less than regex search-replace operations

  • fewer lines of code: elegant, legible simplicity
  • less sequential step-by-step scripting

Is this why iXML is worth teaching in a text analysis course?

  • Requires encountering multiple cultures in text analysis
    • "dominant culture" in NLP / AI: flat string processing
    • "structuralist culture": working with trees and nodes
  • Round-tripping between "unstructured" and "structured" text:
    • Text analysis becomes less about applying recipes...
    • ...and more of an adventure in rolling your own grammar!
  • But is it practical to teach? Will students ever use it in “real life”?