XPath Intro

Patricia O'Connor

XML Tree Structure

An XML file starts from "the root", and branches out in complex ways to the "leaves".

Parent

Siblings
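The tree vocabulary above (root, parent, siblings, leaves) can be made concrete with a small sketch. It assumes Python's standard-library xml.etree.ElementTree and a sample document invented for illustration; neither comes from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<text>
  <body>
    <div type="letter">
      <p>Dear <persName ref="#Chaucer">Geoffrey</persName>,</p>
      <p>Greetings from <placeName>London</placeName>.</p>
    </div>
  </body>
</text>"""

root = ET.fromstring(SAMPLE)
print(root.tag)  # text

# ElementTree elements keep no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

div = root.find("body/div")
siblings = [child.tag for child in div]  # the children of <div> are siblings of each other
print(parent_of[div].tag)  # body
print(siblings)            # ['p', 'p']

# Leaves are elements that contain no child elements.
leaves = [el.tag for el in root.iter() if len(el) == 0]
print(leaves)  # ['persName', 'placeName']
```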

What is XPath?

XPath is a language for selecting parts (nodes) of an XML document.

An XPath expression lets a user describe, find, and navigate to information inside an XML document.

XML languages that employ XPath to find and manipulate information in XML documents:

XSLT (eXtensible Stylesheet Language Transformations)

XQuery (XML Query language)

Nodes

A node is a piece of information in the XML tree, such as an element, an attribute, a string of text, or a comment.

element()

attribute() (abbreviated @)

text()

comment()
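The four node kinds can be seen side by side in a short sketch. It assumes Python 3.8 or later with the standard-library xml.etree.ElementTree, which models text and comments differently from XPath (text lives in .text and .tail, and comments are kept only when the tree builder is asked to keep them). The sample markup is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-paragraph sample containing all four node kinds.
SAMPLE = '<p n="1">Dear <persName ref="#Chaucer">Geoffrey</persName>,<!-- a note --></p>'

# The default parser drops comments; insert_comments=True keeps them in the tree.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
p = ET.fromstring(SAMPLE, parser=parser)

# element(): child elements have a string tag.
element_names = [child.tag for child in p if isinstance(child.tag, str)]
# attribute(): attributes of <p> as a name -> value mapping.
attributes = dict(p.attrib)
# text(): text that sits directly inside <p>, before and after its children.
text_nodes = [p.text] + [child.tail for child in p if child.tail]
# comment(): ElementTree marks comments with the special tag ET.Comment.
comments = [child.text for child in p if child.tag is ET.Comment]

print(element_names, attributes, text_nodes, comments)
```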

XPath Components

Component        Purpose
path expression  Describe the location of some nodes in a tree.
axis             Describe the direction in which one looks in the tree. An axis is part of a path expression.
predicate        Filter the results of a path expression.
function         Do something with the information retrieved from the document instead of just returning it as received.

Path Expressions

A path expression describes the location of some nodes in an XML file.

Expression  Description
nodename    Selects all child elements of the current node named "nodename"
/           Selects from the root node
//          Selects matching nodes at any depth (from the root when it starts a path, otherwise below the current node)
.           Selects the current node
..          Selects the parent of the current node
@           Selects attributes
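The path steps can be tried out with a small sketch. It assumes Python's standard-library xml.etree.ElementTree, which implements only a subset of XPath: paths must be relative (hence the leading "."), and attributes are read from the element rather than returned as nodes. The sample document is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<text>
  <body>
    <div type="letter">
      <p>Dear <persName ref="#Chaucer">Geoffrey</persName>,</p>
    </div>
    <div type="poem">
      <p>A second paragraph.</p>
    </div>
  </body>
</text>"""

root = ET.fromstring(SAMPLE)

# nodename: child elements of the current node named "body".
bodies = root.findall("body")
# //: every <p> at any depth below the current node.
all_p = root.findall(".//p")
# .: the current node itself.
current = root.find(".")
# ..: step up from each <p> to its parent <div>.
p_parents = root.findall(".//p/..")
# @: read the attribute value from each selected element.
types = [div.get("type") for div in root.findall(".//div")]

print(len(bodies), len(all_p), current.tag, types)
```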

Axes

Expression          Description
ancestor::          The ancestor axis sends you to the parent and above, all the way up to the root node.
. or self::         The self axis designates the current context node, the current location in a path.
.. or parent::      The parent axis sends you up a short distance, to the immediate parent of the context node.
/ or child::        The child axis (the default) sends you down to the immediate children of the context node.
// or descendant::  The descendant axis sends you down to the children, their children, and so on (strictly, // abbreviates /descendant-or-self::node()/).
@ or attribute::    The attribute axis (@) locates attributes and attribute values.

Predicates

Predicates are always embedded in square brackets.

Predicates filter the results of path expressions.

Predicates are used to find a specific node or a node that contains a specific value.

//div[@type="letter"]

//div/p[1]

//persName[@ref="#Chaucer"]

//div[@type="letter"]/p[1]

//div[@type="letter"]/p[1]/persName[1]
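The predicate examples above happen to fall inside the XPath subset that Python's standard-library xml.etree.ElementTree supports, so they can be tried with a short sketch. The sample document is invented, and the leading "." is an ElementTree requirement rather than part of the original expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<body>
  <div type="letter">
    <p>Dear <persName ref="#Chaucer">Geoffrey</persName> and <persName ref="#Gower">John</persName>,</p>
    <p>Yours, <persName ref="#Chaucer">G. C.</persName></p>
  </div>
  <div type="poem">
    <p>First line.</p>
  </div>
</body>"""

root = ET.fromstring(SAMPLE)

# Attribute predicate: only <div> elements whose type is "letter".
letters = root.findall(".//div[@type='letter']")
# Position predicate: the first <p> child of every <div>.
first_paragraphs = root.findall(".//div/p[1]")
# Attribute predicate on a different element.
chaucer = root.findall(".//persName[@ref='#Chaucer']")
# Predicates chained along a path.
first_name = root.findall(".//div[@type='letter']/p[1]/persName[1]")

print(len(letters), len(first_paragraphs), len(chaucer), first_name[0].text)
```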

Predicates & Axes

//persName[parent::title]

//persName[ancestor::body]

//persName[descendant::forename]

//place[child::placeName]
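Python's standard-library xml.etree.ElementTree does not accept named axes inside predicates, so the sketch below only approximates the four expressions above with equivalent paths or plain Python filters. The sample document is invented, and the rewritten forms are assumptions about equivalence for this simple, non-nested case.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<doc>
  <header>
    <title>Letters of <persName>Mary Webb</persName></title>
  </header>
  <body>
    <p><persName><forename>Geoffrey</forename> Chaucer</persName> wrote from
      <place><placeName>London</placeName></place>.</p>
  </body>
</doc>"""

root = ET.fromstring(SAMPLE)

# //persName[parent::title]: persName elements whose parent is a <title>.
in_title = root.findall(".//title/persName")
# //persName[ancestor::body]: persName elements somewhere inside a <body>.
in_body = [pn for body in root.iter("body") for pn in body.iter("persName")]
# //persName[descendant::forename]: persName elements containing a <forename> at any depth.
with_forename = [pn for pn in root.iter("persName") if pn.find(".//forename") is not None]
# //place[child::placeName]: ElementTree's [tag] predicate tests for a child element.
places = root.findall(".//place[placeName]")

print(len(in_title), len(in_body), len(with_forename), len(places))
```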

Unknown Nodes

Wildcard  Description
*         Matches any element node
@*        Matches any attribute node
node()    Matches any node of any kind

XPath Expression  Description
//*               Selects all elements in the document
//title[@*]       Selects all title elements which have at least one attribute of any kind
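As a rough illustration of the wildcards, the sketch below uses Python's standard-library xml.etree.ElementTree. It has no @* test, so attribute wildcards are emulated by inspecting each element's attribute dictionary; the sample document is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = '<doc><title type="main" level="a">A</title><title>B</title><p n="1">C</p></doc>'

root = ET.fromstring(SAMPLE)

# //*: every element in the document, including the root element.
all_elements = [el.tag for el in root.iter()]
# //title[@*]: <title> elements carrying at least one attribute.
titles_with_attrs = [t.text for t in root.iter("title") if t.attrib]
# //@*: every attribute in the document, listed as (element, name, value).
all_attrs = [(el.tag, name, value) for el in root.iter() for name, value in el.attrib.items()]

print(all_elements)
print(titles_with_attrs)
print(all_attrs)
```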

Function

Do something with the information retrieved from the document instead of just returning it as received.

Retrieve all of the <p> (paragraph) elements that are children of a <div> element with a type attribute value of "letter", but instead of returning the actual elements, return just a count of how many there are. This uses the count() function.

count(//div[@type="letter"]/p)

//div[@type="letter"]/p => count()

Functions

Retrieve all of the <persName> elements in a <div> element (that has a type attribute value of "letter") and return just a count of how many there are. 

count(//div[@type="letter"]//persName)

//div[@type="letter"]//persName => count()
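Both count() examples can be approximated in Python by selecting the elements and taking the length of the result. This is a sketch assuming the standard-library xml.etree.ElementTree, which has no count() function of its own; the sample document is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<body>
  <div type="letter">
    <p><persName>Mary</persName> wrote to <persName>Geoffrey</persName>.</p>
    <p><persName>Geoffrey</persName> replied.</p>
  </div>
  <div type="poem">
    <p>A poem for <persName>John</persName>.</p>
  </div>
</body>"""

root = ET.fromstring(SAMPLE)

# count(//div[@type="letter"]/p): <p> children of letter divs.
paragraph_count = len(root.findall(".//div[@type='letter']/p"))
# count(//div[@type="letter"]//persName): persName at any depth inside letter divs.
name_count = len(root.findall(".//div[@type='letter']//persName"))

print(paragraph_count, name_count)  # 2 3
```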

Functions

Function                    Description
distinct-values()           Eliminates repetition in a list of results.
last()                      Returns the position of the last item in the current context, so p[last()] selects the last <p> among its siblings.
lower-case(), upper-case()  Change the case of a string.
not()                       Inverts the truth value of the argument. //p[not(q)] returns all <p> elements that do not have any <q> child element.
normalize-space()           Normalizes the white space (spaces, tabs, new lines, etc.) in text: trims both ends and collapses each run to a single space.
string-length()             Returns the length of text by counting characters.
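The string functions in the table have close Python counterparts, shown here as a rough sketch rather than exact equivalents (Python's str.split() treats more characters as white space than XPath does). The sample element is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with untidy white space inside it.
SAMPLE = "<persName>  Mary\n     Webb </persName>"

raw = ET.fromstring(SAMPLE).text

# normalize-space(): trim both ends and collapse runs of white space.
normalized = " ".join(raw.split())
# string-length(): count the characters.
length = len(normalized)
# lower-case() and upper-case().
lower, upper = normalized.lower(), normalized.upper()

print(repr(normalized), length, lower, upper)
```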

Functions

//div[@type="letter"]//persName => distinct-values()

//div[@type="letter"]//p[last()]

//div[@type="letter"]//p[1]

//placeName => distinct-values()

//div[@type="letter"]//persName[not(@ref="#Webb_Mary_younger")]
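Several of these expressions can be approximated with a short sketch. It assumes Python's standard-library xml.etree.ElementTree, which supports [last()] and [1] directly but not distinct-values() or not(), so those two are emulated in plain Python. The sample document is invented, and the order of distinct values here is first-seen order, which an XPath processor does not guarantee.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<body>
  <div type="letter">
    <p><persName ref="#Webb_Mary_younger">Mary</persName> wrote to <persName ref="#Chaucer">Geoffrey</persName>.</p>
    <p><persName ref="#Chaucer">Geoffrey</persName> replied to <persName ref="#Webb_Mary_younger">Mary</persName>.</p>
  </div>
</body>"""

root = ET.fromstring(SAMPLE)
names = root.findall(".//div[@type='letter']//persName")

# distinct-values(): drop repeats while keeping first-seen order.
distinct = list(dict.fromkeys(pn.text for pn in names))
# p[last()] and p[1]: last and first <p> among their siblings.
last_p = root.findall(".//div[@type='letter']//p[last()]")
first_p = root.findall(".//div[@type='letter']//p[1]")
# not(@ref="#Webb_Mary_younger"): keep names whose ref is different or missing.
not_mary = [pn.text for pn in names if pn.get("ref") != "#Webb_Mary_younger"]

print(distinct, last_p[0].find("persName").text, first_p[0].find("persName").text, not_mary)
```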

Simple Map & Arrow

Operator  Description
!         The simple map operator (!) means do the thing on the right once for each item on the left.
=>        The arrow operator (=>) means apply the function on the right to the entire sequence (all at once) on the left.

//addrLine ! count(descendant::persName)

//persName => distinct-values()

//div[@type="letter"] ! count(p)
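The difference between the two operators is loosely analogous to a per-item loop versus a single function call on the whole list. The sketch below shows that analogy in Python with the standard-library xml.etree.ElementTree; it is not an implementation of the operators, and the sample document is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<body>
  <div type="letter"><p>One</p><p>Two</p></div>
  <div type="letter"><p>Three</p></div>
</body>"""

root = ET.fromstring(SAMPLE)
letters = root.findall(".//div[@type='letter']")

# Like //div[@type="letter"] ! count(p): one count per <div>.
per_div = [len(div.findall("p")) for div in letters]
# Like //div[@type="letter"]/p => count(): one count for the whole sequence.
overall = len(root.findall(".//div[@type='letter']/p"))

print(per_div)  # [2, 1]
print(overall)  # 3
```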

Comparison Operators

Value  General  Description
eq     =        equal to
ne     !=       not equal to
gt     >        greater than (may also be written &gt;)
ge     >=       greater than or equal to (not less than; may also be written &gt;=)
lt     <        less than (may also be written &lt;)
le     <=       less than or equal to (not greater than; may also be written &lt;=)
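The two columns behave differently when an operand is a sequence: a general comparison such as = is true if any pair of items matches, while a value comparison such as eq expects single items. The sketch below models that rule in plain Python; the helper names general_eq and value_eq are hypothetical, and type conversion rules are left out.

```python
def general_eq(left, right):
    """Sketch of "=": true if any item on the left equals any item on the right."""
    return any(a == b for a in left for b in right)


def value_eq(left, right):
    """Sketch of "eq": each operand must hold at most one item."""
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq expects single items, not longer sequences")
    if not left or not right:
        return None  # stands in for XPath's empty sequence
    return left[0] == right[0]


# Hypothetical ref values, as if collected from several persName elements.
refs = ["#Chaucer", "#Gower"]

any_match = general_eq(refs, ["#Chaucer"])       # True: one of the refs matches
single_match = value_eq(["#Gower"], ["#Gower"])  # True: one item on each side

try:
    value_eq(refs, ["#Chaucer"])
    eq_error = None
except TypeError as exc:
    eq_error = str(exc)

print(any_match, single_match, eq_error)
```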