Elisa Beshero-Bondar
Professor of Digital Humanities and Chair of the Digital Media, Arts, and Technology Program at Penn State Erie, The Behrend College.
An XML file starts from "the root", and branches out in complex ways to the "leaves".
Parent
Siblings
XPath is a language that we write to select parts (or nodes) of an XML document.
An XPath expression enables a user to describe, find, and navigate to information inside an XML document.
XML languages that employ XPath to find and manipulate information in XML documents:
XSLT (eXtensible Stylesheet Language Transformations)
XQuery (XML Query language)
A node is a piece of information in the XML tree, such as an element, an attribute, a string of text, or a comment.
element()
attribute()
text()
comment()
@
Component | Purpose |
---|---|
path expression | Describe the location of some nodes in a tree. |
axis | Describe the direction in which one looks in the tree. An axis is part of a path expression. |
predicate | Filter the results of a path expression. |
function | Do something with the information retrieved from the document instead of just returning it as received. |
Expression | Description |
---|---|
nodename | Selects all nodes with the name "nodename" |
/ | Selects from the root node |
// | Selects nodes in the document from the current node that match the selection no matter where they are |
. | Selects the current node |
.. | Selects the parent of the current node |
@ | Selects attributes |

Path expressions describe the location of some nodes in an XML file.

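
The path expressions in the table above can be tried out from Python. This is only a minimal sketch: it assumes the standard-library `xml.etree.ElementTree` module, which supports a limited subset of XPath, and the sample document is invented for illustration. Because ElementTree searches start at an element, paths are written relative to it (`.//p` rather than `//p`), and attribute values are read with `get()` because attribute nodes cannot be returned directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the slides).
SAMPLE = """<text>
  <body>
    <div type="letter">
      <p>Dear <persName ref="#Chaucer">Geoffrey</persName>,</p>
      <p>Yours truly.</p>
    </div>
    <div type="poem">
      <p>A short verse.</p>
    </div>
  </body>
</text>"""

root = ET.fromstring(SAMPLE)  # root is the <text> element

# //p : every <p>, no matter where it sits below the starting element
all_paragraphs = root.findall(".//p")

# /text/body/div : one step at a time, written relative to the root element
top_divs = root.findall("./body/div")

# .. : the parent of each <persName>
parents_of_names = root.findall(".//persName/..")

# @type : read the attribute value from each selected element
div_types = [div.get("type") for div in top_divs]

print(len(all_paragraphs), len(top_divs))
print([parent.tag for parent in parents_of_names], div_types)
```

A full XPath processor would accept the expressions exactly as they appear in the table; the rewriting here is a limitation of the assumed library, not of XPath.
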
Expression | Description |
---|---|
ancestor:: | The ancestor axis sends you to parents and above, all the way up to the root node. |
. or self:: | The self axis designates the current context node, that is, the current location in a path. |
.. or parent:: | The parent axis sends you up a short distance, to the immediate parent of the context node. |
/ or child:: | The child axis (the default) sends you down to the immediate children of the context node. |
// or descendant:: | The descendant axis sends you down to the children, their children, and so on. Strictly, // abbreviates /descendant-or-self::node()/. |
@ or attribute:: | The attribute (@) axis locates attributes and attribute values. |
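
To make the directions of the axes concrete, here is a rough Python sketch that walks a tiny tree by hand. It assumes the standard-library `xml.etree.ElementTree` module, which has no axis syntax of its own, so the parent and ancestor axes are imitated with a child-to-parent dictionary. The sample document and the helper name `ancestors` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the slides).
SAMPLE = (
    "<text><body><div type='letter'>"
    "<p>Dear <persName>Mary</persName></p>"
    "</div></body></text>"
)
root = ET.fromstring(SAMPLE)

# ElementTree elements do not remember their parents, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Yield the parent, grandparent, and so on up to the root (like ancestor::*)."""
    while node in parent_of:
        node = parent_of[node]
        yield node


context = root.find(".//persName")  # the context node

parent_tag = parent_of[context].tag                   # parent::*
ancestor_tags = [a.tag for a in ancestors(context)]   # ancestor::*
child_tags = [c.tag for c in root.find("body")]       # child::* of <body>
# iter() yields the element itself first, so drop it to get descendant::*
descendant_tags = [d.tag for d in root.iter()][1:]

print(parent_tag, ancestor_tags, child_tags, descendant_tags)
```
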
Predicates are always embedded in square brackets.
Predicates filter the results of path expressions.
Predicates are used to find a specific node or a node that contains a specific value.
//div[@type="letter"]
//div/p[1]
//persName[@ref="#Chaucer"]
//div[@type="letter"]/p[1]
//div[@type="letter"]/p[1]/persName[1]
//persName[parent::title]
//persName[ancestor::body]
//persName[descendant::forename]
//place[child::placeName]
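
Several of the predicates above can be tested with a short Python sketch. It assumes the standard-library `xml.etree.ElementTree` module and an invented sample document. That module understands attribute tests, positions, and child-element tests inside square brackets, but not axis names such as `parent::` or `ancestor::`, so `//place[child::placeName]` is written in its abbreviated form `place[placeName]` and the axis-based predicates are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the slides).
SAMPLE = """<body>
  <div type="letter">
    <p>From <persName ref="#Chaucer">Chaucer</persName> in
       <place><placeName>London</placeName></place>.</p>
    <p>Second paragraph.</p>
  </div>
  <div type="note">
    <p>Seen near <place>the river</place>.</p>
  </div>
</body>"""

root = ET.fromstring(SAMPLE)

# //div[@type="letter"] : only the <div> elements whose type is "letter"
letters = root.findall(".//div[@type='letter']")

# //div/p[1] : the first <p> child of every <div>
first_paragraphs = root.findall(".//div/p[1]")

# //persName[@ref="#Chaucer"]
chaucer = root.findall(".//persName[@ref='#Chaucer']")

# //div[@type="letter"]/p[1] : predicates can appear on more than one step
first_in_letter = root.findall(".//div[@type='letter']/p[1]")

# //place[child::placeName] : only <place> elements that have a <placeName> child
named_places = root.findall(".//place[placeName]")

print(len(letters), len(first_paragraphs), len(chaucer),
      len(first_in_letter), len(named_places))
```
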
Wildcard | Description |
---|---|
* | Matches any element node |
@* | Matches any attribute node |
node() | Matches any node of any kind |
XPath Expression | Description |
---|---|
//* | Selects all elements in the document |
//title[@*] | Selects all title elements which have at least one attribute of any kind |
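
As a rough illustration of the wildcards, the sketch below imitates `//*`, `//title[@*]`, and `@*` in Python. It assumes the standard-library `xml.etree.ElementTree` module, which has no `@*` syntax, so attributes are inspected through each element's `attrib` dictionary. The sample document is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the slides).
SAMPLE = '<doc><title type="main">A</title><title>B</title><p n="1">text</p></doc>'
root = ET.fromstring(SAMPLE)

# //* : every element in the document, including the root element.
# (root.findall(".//*") would leave the root itself out, so iter() is used.)
all_elements = list(root.iter())

# //title[@*] : <title> elements that carry at least one attribute
titles_with_attribute = [t for t in root.iter("title") if t.attrib]

# //@* : every attribute in the document, here as (name, value) pairs
all_attributes = [
    (name, value)
    for element in root.iter()
    for name, value in element.attrib.items()
]

print(len(all_elements), len(titles_with_attribute), all_attributes)
```
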
Do something with the information retrieved from the document instead of just returning it as received.
Retrieve all of the <p> elements in a <div> element (that has a type attribute value of "letter"), but instead of returning the actual elements, return just a count of how many there are. This uses the count() function.
count(//div[@type="letter"]/p)
//div[@type="letter"]/p => count()
Retrieve all of the <persName> elements in a <div> element (that has a type attribute value of "letter") and return just a count of how many there are.
count(//div[@type="letter"]//persName)
//div[@type="letter"]//persName => count()
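
The two counts above can be approximated in Python by taking the length of the selected list. This is a sketch under assumptions: it uses the standard-library `xml.etree.ElementTree` module and an invented sample document, and `len()` stands in for XPath's `count()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the slides).
SAMPLE = """<body>
  <div type="letter">
    <p><persName>Mary</persName> wrote to <persName>Anne</persName>.</p>
    <p>Regards, <hi><persName>Mary</persName></hi></p>
  </div>
  <div type="note">
    <p><persName>Editor</persName></p>
  </div>
</body>"""

root = ET.fromstring(SAMPLE)

# count(//div[@type="letter"]/p) : <p> children of letter divs
paragraph_count = len(root.findall(".//div[@type='letter']/p"))

# count(//div[@type="letter"]//persName) : <persName> at any depth inside letter divs
name_count = len(root.findall(".//div[@type='letter']//persName"))

print(paragraph_count, name_count)
```

Note the difference between `/p` (children only) and `//persName` (any depth): the third name is nested inside `<hi>` and is still counted.
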
Function | Description |
---|---|
distinct-values() | Eliminates repetition in a list of results. |
last() | Returns the position of the last item in the current context, so p[last()] selects the last <p> among its siblings. |
lower-case() upper-case() | Change the case of a string. |
not() | Inverts the truth value of the argument. //p[not(q)] returns all <p> elements that do not have any <q> child element. |
normalize-space() | Normalizes the white space (spaces, tabs, new lines, etc.) in text: trims both ends and collapses each internal run to a single space. |
string-length() | Returns the length of text by counting characters. |
//div[@type="letter"]//persName => distinct-values()
//div[@type="letter"]//p[last()]
//div[@type="letter"]//p[1]
//placeName => distinct-values()
//div[@type="letter"]//persName[not(@ref="#Webb_Mary_younger")]
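
For readers who want to see what these functions do to real values, here is a loose Python approximation. It assumes the standard-library `xml.etree.ElementTree` module and an invented sample document, and it uses plain Python in place of the XPath functions: an order-preserving `dict.fromkeys()` for `distinct-values()`, a list comprehension for `not()`, `" ".join(s.split())` for `normalize-space()` (Python's notion of white space is slightly broader than XPath's), `len()` for `string-length()`, and `str.upper()` for `upper-case()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the slides).
SAMPLE = """<body>
  <div type="letter">
    <p><persName ref="#Webb_Mary_younger">Mary  Webb</persName> wrote to
       <persName ref="#Anne">Anne</persName>.</p>
    <p>Regards,
       <persName ref="#Webb_Mary_younger">Mary  Webb</persName></p>
  </div>
</body>"""

root = ET.fromstring(SAMPLE)
names = root.findall(".//div[@type='letter']//persName")


def normalize_space(text):
    """Trim both ends and collapse internal runs of white space."""
    return " ".join(text.split())


# normalize-space() applied to each name
name_texts = [normalize_space(n.text) for n in names]

# distinct-values() : drop repeats, keeping first occurrences
distinct_names = list(dict.fromkeys(name_texts))

# //div[@type="letter"]//p[last()] : the last <p> among its siblings
last_paragraphs = root.findall(".//div[@type='letter']//p[last()]")

# //div[@type="letter"]//persName[not(@ref="#Webb_Mary_younger")]
not_mary = [n.text for n in names if n.get("ref") != "#Webb_Mary_younger"]

# string-length() and upper-case() on one value
anne_length = len(not_mary[0])
anne_upper = not_mary[0].upper()

print(distinct_names, len(last_paragraphs), not_mary, anne_length, anne_upper)
```
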
Operator | Description |
---|---|
! | The simple map operator (!) means do the thing on the right once for each item on the left. |
=> | The arrow operator (=>) means apply the function on the right to the entire sequence (all at once) on the left. |
//addrLine ! count(descendant::persName)
//persName => distinct-values()
//div[@type="letter"] ! count(p)
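
The difference between the two operators may be easier to see side by side in Python. In this sketch, which assumes the standard-library `xml.etree.ElementTree` module and an invented sample document, a list comprehension plays the role of the simple map operator (one result per item), and an ordinary function call on the whole list plays the role of the arrow operator (one result for the entire sequence).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the slides).
SAMPLE = """<body>
  <div type="letter">
    <p><persName>Mary</persName></p>
    <p><persName>Anne</persName></p>
  </div>
  <div type="letter">
    <p><persName>Mary</persName></p>
  </div>
</body>"""

root = ET.fromstring(SAMPLE)
letters = root.findall(".//div[@type='letter']")

# //div[@type="letter"] ! count(p)
# Simple map: run count(p) once for each <div>, giving one number per div.
paragraphs_per_letter = [len(div.findall("p")) for div in letters]

# //div[@type="letter"]/p => count()
# Arrow: hand the whole sequence to count() at once, giving a single number.
total_paragraphs = len(root.findall(".//div[@type='letter']/p"))

# //persName => distinct-values()
# Arrow again: the function sees all the names together, so it can remove repeats.
all_names = [n.text for n in root.findall(".//persName")]
distinct_names = sorted(set(all_names))

print(paragraphs_per_letter, total_paragraphs, distinct_names)
```

Mapping `distinct-values()` over each name separately would not remove any repeats, because each call would see only one name; that is why the slides pair it with the arrow operator.
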
Value comparison | General comparison | Description |
---|---|---|
eq | = | equal to |
ne | != | not equal to |
gt | > | greater than (may also be written &gt;) |
ge | >= | greater than or equal to (not less than; may also be written &gt;=) |
lt | < | less than (may also be written &lt;) |
le | <= | less than or equal to (not greater than; may also be written &lt;=) |
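
The two columns behave differently when a sequence is involved. A value comparison (eq) compares exactly one item with exactly one item, while a general comparison (=) is true if any item on the left matches any item on the right. The Python sketch below models only that distinction; the function names `general_eq` and `value_eq` are hypothetical, sequences are represented as Python lists, and XPath's atomization and type-conversion rules are deliberately left out.

```python
def general_eq(left, right):
    """Model of XPath '=': true if ANY pair of items is equal."""
    return any(l == r for l in left for r in right)


def value_eq(left, right):
    """Model of XPath 'eq': compares single items only.

    An empty operand gives an empty result (None here);
    more than one item on either side is a type error in XPath.
    """
    if not left or not right:
        return None
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq needs at most one item on each side")
    return left[0] == right[0]


refs = ["#Chaucer", "#Gower"]  # e.g. the @ref values of several <persName> elements

print(general_eq(refs, ["#Gower"]))       # some item matches
print(value_eq(["#Gower"], ["#Gower"]))   # one item on each side
try:
    value_eq(refs, ["#Gower"])
except TypeError as error:
    print("error:", error)
```

This is why predicates that test an attribute on many nodes are usually written with the general operators.
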
By Elisa Beshero-Bondar
slides for use in XPath / XSLT workshops