Highlighting Our <egXML>Examples</egXML>

encoding XML examples in pedagogical contexts

James Cummings
Newcastle University

james.cummings@newcastle.ac.uk
@jamescummings

http://slides.com/jamescummings/examples

TEI 2018

“It is a trite but true observation, that examples work more forcibly on the mind than precepts”

Henry Fielding, The History of the adventures of
Joseph Andrews and his friend Mr Abraham Adams
, 1742

Some of the problems

  • Examples are repeated in multiple places
  • Examples give little context of how they work in real TEI files
  • Not enough examples in non-English languages
  • Examples (especially on reference pages) do not reflect all the methods that TEI allows, only the simple ones that are easily conveyed in a short example

My TEI2018 suggestion:

  • external corpus of examples
  • examples included by pointers

This idea is still only a theoretical consideration, and the costs and benefits would need to be carefully weighed up.

Highlighting -- The Problem

Highlighting XML Examples

  • Whenever we encode examples of XML in a TEI document we should be using <egXML> (in the TEI Examples Namespace)
  • There are existing print textbooks and manuals which have examples that are well-formed XML fragments
  • These documents occasionally highlight parts of the XML example (for example print as bold a particular element under discussion, or highlight different bits in successive examples)
  • Also those creating new training materials in a digital context want the ability to highlight various parts for pedagogical reasons  

Pedagogical uses

  • Worked examples: providing multiple instances of a single example where highlighting draws attention to particular aspects discussed in accompanying prose
  • Examples with context: this is what the TEI Guidelines do on the 'Show All' page linked from the example on any element reference page -- we can show more of the context an element might appear in and highlight that specific element

Different Levels of Highlighting

If we accept that highlighting parts of existing or new XML examples is reasonable there are different potential levels:

  • Highlight any arbitrary string range in the example
  • Highlight one (or more) whole lines of an example
  • Highlight a whole element (including attributes and content) 
  • Highlight any arbitrary XML part of the example (attributes, values, element name, just content, etc.)

Highlighting -- The Solutions?

Highlight arbitrary string range

  • This relies on byte-offset coding (here counted in characters): stating how many characters into the example the highlighting should start, and where it should end.
  • An approach used by many projects (because other solutions don't exist)
  • Also similar to the way DocBook example callouts work, which allow positions of the form line1 col1 line2 col2
  • One could specify a string-range() to give positions
  • Easy method, but: 
    • fragile: string-ranges might change in different contexts, whitespace might be added/deleted
    • opaque: hard for encoders to know what characters 300 to 356 contain, which makes it error-prone
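
The fragility described above can be shown with a minimal sketch. It assumes 0-based character offsets with an exclusive end, and the helper name highlight_range and the "**" marker are hypothetical, chosen only for illustration; none of this is a TEI mechanism.

```python
def highlight_range(text: str, start: int, end: int, marker: str = "**") -> str:
    """Wrap the characters text[start:end] in a marker (0-based, end exclusive)."""
    if not 0 <= start <= end <= len(text):
        raise ValueError("range falls outside the example text")
    return text[:start] + marker + text[start:end] + marker + text[end:]


# Offsets computed against the example as originally encoded.
compact = '<persName ref="#JC">James Cummings</persName>'
target = 'ref="#JC"'
start = compact.index(target)
end = start + len(target)

highlighted = highlight_range(compact, start, end)

# The same example after re-indentation: the stored offsets no longer
# cover the attribute, because whitespace was added before it.
reindented = '<persName\n    ref="#JC">James Cummings</persName>'
stale_selection = reindented[start:end]
```

Here highlighted wraps exactly the attribute, while stale_selection covers the added indentation and only part of the attribute, which is the kind of silent error that makes offsets hard to maintain.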

Highlight one (or more) lines

  • This is what many online tools do in adding additional highlighting to syntax highlighting (e.g. slides.com)
  • One could specify either a string-range() or a line number; this is essentially the same approach as arbitrary strings, so it is:
    • still fragile in case of format changes
    • _slightly_ less opaque (to say 'line:30')
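
A line-based variant might look like the following sketch. The function name highlight_lines and the "> " gutter marker are hypothetical, and line numbers are assumed to be 1-based and inclusive.

```python
def highlight_lines(text: str, first: int, last: int, prefix: str = "> ") -> str:
    """Mark lines first..last (1-based, inclusive) with a gutter prefix."""
    lines = text.split("\n")
    if not 1 <= first <= last <= len(lines):
        raise ValueError("line range falls outside the example")
    padding = " " * len(prefix)  # keeps unmarked lines aligned with marked ones
    return "\n".join(
        (prefix if first <= number <= last else padding) + line
        for number, line in enumerate(lines, start=1)
    )


example = "<p>\n  <persName>James Cummings</persName>\n</p>"
marked = highlight_lines(example, 2, 2)
```

Saying "line 2" is easier to check by eye than a pair of character offsets, but the address still breaks as soon as the example is re-wrapped or a line is inserted above it.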

Highlight an element

  • Common use case, it is often an entire element that people want to highlight
  • The downside is the limitation this imposes if you just want to reference an attribute's value
  • If implementing this, it is likely just as easy to enable addressing any part of the XML tree

Highlight arbitrary part of XML tree

  • This is probably suitable for most cases where there are examples of XML  
  • Parts of an XML tree that need to be addressable:
    • element(s) as a whole
    • element names
    • element content
    • Attributes as a whole
    • Attribute names
    • Attribute values
  • If implementing, using XPath as the language to specify the highlighting address seems reasonable
  • What is difficult in such a system is highlighting the characters that make up the XML syntax itself, like <, =, ", etc.
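
As a rough illustration of the addressable parts listed above, the sketch below uses Python's standard xml.etree.ElementTree, which supports only a limited XPath subset, so the element is located by path and its parts are read through the API. The sample content is adapted from the embedding example in these slides; the parts dictionary is purely illustrative.

```python
import xml.etree.ElementTree as ET

EG_NS = "http://www.tei-c.org/ns/Examples"

source = (
    f'<egXML xmlns="{EG_NS}">'
    '<p>Is this idea of <persName ref="#JC">James Cummings</persName> overkill?</p>'
    "</egXML>"
)
root = ET.fromstring(source)

# Locate the first persName anywhere below the egXML root.
pers = root.find(".//eg:persName[1]", {"eg": EG_NS})

# Each part named in the list above is reachable once the element is found.
parts = {
    "element name": pers.tag.split("}", 1)[1],  # strip the {namespace} prefix
    "element content": pers.text,
    "attribute names": list(pers.attrib),
    "attribute values": list(pers.attrib.values()),
}
```

Note that the parsed tree has no node for the delimiters themselves (<, =, "), which is why those characters are the hard case for any tree-based address.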

Ideas on how to encode?

  • Regardless of the solution, we need a way to say what form of formatting/highlighting is being applied
  • The <rendition> element is a good possibility; hypothetical string-range and XPath methods are shown below:
<rendition scheme="css" 
  match="string-range(//egXML[@xml:id='example1'], 56, 68)"> 
   font-weight: bold;
</rendition>
<rendition scheme="css" 
  selector="egXML[xml:id='example1']"
  match=".//persName[1]/@ref/data()"> 
   background-color: #336699;
</rendition>

Embedding elements

  • Another approach would be to embed special namespaced elements and then use @rendition to point to a <rendition>
  • However, this has a major limitation of not being able to highlight parts of the XML markup itself (unless you escape the XML entirely, which defeats the purpose of using an <egXML>)
 <egXML xmlns="http://www.tei-c.org/ns/Examples">
            &lt;p&gt;
               Is this idea of 
               &lt;persName 
                 <hi xmlns="http://www.tei-c.org/ns/1.0" rend="bg-shade">
                   ref=&quot;#JC&quot;
                 </hi>&gt;
                 James Cummings&lt;/persName&gt;
               overkill?
            &lt;/p&gt;
  </egXML>

Implementation worries

  • Implementing the processing of such systems raises a number of issues
  • I think a stand-off approach (either string-ranges or XPaths) is better than trying to embed any form of notation inside <egXML>. Embedding would probably require:
    • escaping the XML
    • thus losing the addressability of the XML (so again why use <egXML> then? ... because it is XML!)
  • String-ranges would have to refer to the egXML as encoded, not any pretty-printed output -- this adds a level of difficulty when processing for display
  • Evaluating XPath has challenges and limitations
    • e.g. you can't highlight from middle of an element name to middle of an attribute value... (but I'm really not sure why you'd want to)

Thinking generally about examples

  • I think we need a way to highlight particular portions of XML examples
  • If not, then TEI <egXML> is not suitable for XML examples of printed or digital works that already do this
  • How we feel about this issue further expounds the theory of (digital) text we create as the TEI community
  • However, there will always be cost vs benefit:
    • Is it worth doing this since implementation is difficult?
    • Is it worth updating examples in the TEI Guidelines to 'work better on the mind'?
  • It would give us a much more pedagogically flexible mechanism, but at a cost of implementation ... because of this I want to ensure I'm not overlooking a different approach before even considering implementation

Highlighting Our Examples: encoding XML examples in pedagogical contexts

TEI2019 paper: The TEI Guidelines use the <egXML> element throughout the prose and reference pages for containing XML examples. However, many TEI users know little about this element, and most don’t realise that it is not in the usual TEI namespace, but instead in a TEI examples namespace (http://www.tei-c.org/ns/Examples). Following on from my paper at TEI2018 (in which I proposed more detailed ways that the TEI Guidelines might handle examples more generally), this paper will look at possible improvements to the <egXML> element, specifically designed for modern pedagogical uses. When creating TEI ODD customisations as local encoding manuals, users sometimes use <egXML> to show how encoders should mark up particular textual phenomena, similar to the use in the Guidelines themselves. Expanding this element’s functionality could benefit not only the TEI Guidelines, but also all those who include snippets of XML markup in encoding manuals, slides, tutorials, exercises, or anything else possibly derived from (or exported to) a TEI source and beyond. Building on the kind of syntax highlighting we are familiar with in XML editors and code snippets online, this paper examines the need to highlight arbitrary portions of XML stored in an <egXML> element. Whether encoding existing resources containing highlighting of XML or wanting to render modern born-digital pedagogical materials, the TEI Guidelines currently recommend no specific way to do this. This paper looks at a number of possible options for enabling the highlighting of markup, including embedding namespaced elements, out-of-line markup, and byte-offset coding. All of these are summarised, with the problems that they each face, not only in processing, but also in providing flexible methods to enable users to express existing or desired output rendition.
