PB138: XPath and XSLT
Content:
Using XPath
XSLT transformations
Bonus: OOXML
XPath
used in many languages to work with XML or HTML
you will see it in the web scraping session
Let's use the following XML
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book>
<title lang="en">Dragons of Asgard</title>
<price>5.95</price>
</book>
<book>
<title lang="en">Scholomance</title>
<price>5.95</price>
</book>
</bookstore>
Expression | Description |
---|---|
nodename | Selects all nodes with the name "nodename" |
/ | Selects from the root node |
// | Selects nodes in the document from the current node that match the selection no matter where they are |
. | Selects the current node |
.. | Selects the parent of the current node. Note: CSS selectors cannot do this |
@ | Selects attributes |
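The building blocks from the table can be tried out in Python. This is a minimal sketch, assuming the standard library's xml.etree.ElementTree, which implements only a subset of XPath: paths are relative to an element, and attribute values are read with .get() instead of being selected with @ alone. The data is the bookstore example with both title tags closed.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book>
    <title lang="en">Dragons of Asgard</title>
    <price>5.95</price>
  </book>
  <book>
    <title lang="en">Scholomance</title>
    <price>5.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)  # root is the <bookstore> element

# nodename: direct children named "book"
books = root.findall("book")

# //: every <title> below the current node, no matter how deep
titles = [t.text for t in root.findall(".//title")]

# ..: step from each <title> up to its parent <book>
parents = [p.tag for p in root.findall(".//title/..")]

# @: ElementTree exposes attribute values through .get()
langs = [t.get("lang") for t in root.findall(".//title")]

print(len(books), titles, parents, langs)
```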
/bookstore/book[price<35.00]/title
/bookstore/book[1]
/bookstore/book[last()-1]
//title[@lang='en']
Some examples
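The four expressions above can be approximated in Python. This is a sketch under the assumption that only the standard library is available: xml.etree.ElementTree supports positional and attribute predicates but not comparisons such as price<35.00, so that filter is written in plain Python here. A full XPath engine would evaluate all four expressions directly.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book><title lang="en">Dragons of Asgard</title><price>5.95</price></book>
  <book><title lang="en">Scholomance</title><price>5.95</price></book>
</bookstore>"""

# Paths below are relative to <bookstore>, so the leading /bookstore is dropped.
root = ET.fromstring(BOOKSTORE)

# /bookstore/book[price<35.00]/title (comparison done in Python)
cheap = [book.findtext("title") for book in root.findall("book")
         if float(book.findtext("price")) < 35.00]

# /bookstore/book[1] (XPath positions start at 1, not 0)
first = root.find("book[1]").findtext("title")

# /bookstore/book[last()-1] (the second-to-last book)
second_to_last = root.find("book[last()-1]").findtext("title")

# //title[@lang='en']
english = [t.text for t in root.findall(".//title[@lang='en']")]

print(cheap, first, second_to_last, english)
```

With only two books in the sample, book[1] and book[last()-1] select the same element.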
XSLT
XSL (eXtensible Stylesheet Language) is a styling language for XML; XSLT is the part of it that transforms XML into other documents such as HTML.
Not used much these days.
Typical use cases:
CI output conversion
Government systems
Conversions to Word
Other XML-heavy systems (some UI SDKs)
Usage
To apply it in a browser, just link the stylesheet from the XML document (this does not work with local files)
<?xml-stylesheet href="transformation.xsl" type="text/xsl" ?>
File extension: .xsl
From the command line you can use xsltproc (part of libxslt, available for download)
xsltproc stylesheet.xsl file.xml
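The command-line call can also be scripted, for example in a CI job. This is a minimal sketch, assuming the tool meant here is libxslt's xsltproc and that it is installed and on PATH. The helper names xsltproc_command and transform are made up for this example; -o is xsltproc's option for writing the result to a file.

```python
import shutil
import subprocess


def xsltproc_command(stylesheet, source, output=None):
    """Build the argument list: xsltproc [-o output] stylesheet source."""
    command = ["xsltproc"]
    if output is not None:
        command += ["-o", output]  # write the result to a file instead of stdout
    command += [stylesheet, source]
    return command


def transform(stylesheet, source):
    """Run xsltproc and return the transformed document as text."""
    if shutil.which("xsltproc") is None:
        raise RuntimeError("xsltproc is not installed or not on PATH")
    result = subprocess.run(
        xsltproc_command(stylesheet, source),
        check=True,            # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )
    return result.stdout


print(xsltproc_command("stylesheet.xsl", "file.xml"))
```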
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<tr>
<td>.</td>
<td>.</td>
</tr>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Step 1: Template
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<tr>
<td><xsl:value-of select="catalog/cd/title"/></td>
<td><xsl:value-of select="catalog/cd/artist"/></td>
</tr>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Step 2: Add values
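In XSLT 1.0, xsl:value-of outputs the text of only the first node its select expression matches, so the Step 2 stylesheet fills the table row from the first cd only. A rough Python analogy, using ElementTree and a made-up two-CD catalog (the slides do not show the input file):

```python
import xml.etree.ElementTree as ET

# Hypothetical input, invented for this example.
CATALOG = """<catalog>
  <cd><title>Blue Album</title><artist>Zed</artist><price>12.50</price></cd>
  <cd><title>Red Album</title><artist>Abe</artist><price>8.00</price></cd>
</catalog>"""

catalog = ET.fromstring(CATALOG)

# findtext() returns the first match only, like value-of in XSLT 1.0
first_title = catalog.findtext("cd/title")

# Getting every title needs an explicit loop over all matches
all_titles = [t.text for t in catalog.findall("cd/title")]

print(first_title)
print(all_titles)
```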
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Step 3: For-each loop
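Inside xsl:for-each the current node changes to each cd in turn, which is why the inner paths shrink to just title and artist. The same idea as a Python sketch, with a made-up catalog and ElementTree standing in for the XSLT processor:

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical input, invented for this example.
CATALOG = """<catalog>
  <cd><title>Blue Album</title><artist>Zed</artist></cd>
  <cd><title>Red Album</title><artist>Abe</artist></cd>
</catalog>"""

catalog = ET.fromstring(CATALOG)

rows = []
for cd in catalog.findall("cd"):  # like <xsl:for-each select="catalog/cd">
    # Paths are now relative to the current <cd>, as in the stylesheet
    title = html.escape(cd.findtext("title", default=""))
    artist = html.escape(cd.findtext("artist", default=""))
    rows.append(f"<tr><td>{title}</td><td>{artist}</td></tr>")

table = '<table border="1">' + "".join(rows) + "</table>"
print(table)
```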
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<xsl:sort select="artist"/>
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Step 4: Sorting
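By default xsl:sort compares the selected values as text in ascending order. A Python sketch of the same reordering, with a made-up catalog; note that sorted() compares by code point, which may differ from the language-aware ordering an XSLT processor applies:

```python
import xml.etree.ElementTree as ET

# Hypothetical input, invented for this example.
CATALOG = """<catalog>
  <cd><title>Blue Album</title><artist>Zed</artist></cd>
  <cd><title>Red Album</title><artist>Abe</artist></cd>
</catalog>"""

catalog = ET.fromstring(CATALOG)

# like <xsl:sort select="artist"/> inside the for-each
by_artist = sorted(
    catalog.findall("cd"),
    key=lambda cd: cd.findtext("artist", default=""),
)

ordered = [(cd.findtext("title"), cd.findtext("artist")) for cd in by_artist]
print(ordered)
```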
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
<th>Price</th>
</tr>
<xsl:for-each select="catalog/cd">
<xsl:if test="price > 10">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
<td><xsl:value-of select="price"/></td>
</tr>
</xsl:if>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
IF usage
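The test price > 10 converts the text of price to a number before comparing. xsl:if has no else branch; xsl:choose with xsl:when and xsl:otherwise covers that case. A Python sketch of the same filter, with a made-up catalog:

```python
import xml.etree.ElementTree as ET

# Hypothetical input, invented for this example.
CATALOG = """<catalog>
  <cd><title>Blue Album</title><artist>Zed</artist><price>12.50</price></cd>
  <cd><title>Red Album</title><artist>Abe</artist><price>8.00</price></cd>
</catalog>"""

catalog = ET.fromstring(CATALOG)

expensive = []
for cd in catalog.findall("cd"):
    price = float(cd.findtext("price"))  # XPath does this conversion implicitly
    if price > 10:                       # like <xsl:if test="price > 10">
        expensive.append(cd.findtext("title"))

print(expensive)
```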
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="cd">
<p>
<xsl:apply-templates select="title"/>
<xsl:apply-templates select="artist"/>
</p>
</xsl:template>
<xsl:template match="title">
Title: <span style="color:#ff0000">
<xsl:value-of select="."/></span>
<br />
</xsl:template>
<xsl:template match="artist">
Artist: <span style="color:#00ff00">
<xsl:value-of select="."/></span>
<br />
</xsl:template>
</xsl:stylesheet>
Code splitting
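xsl:apply-templates hands each selected node to whichever template matches it. One way to picture this is a lookup table from element name to function. The sketch below is a simplification with a made-up catalog: it matches by tag name only and, unlike XSLT's built-in rules, does not copy text from elements that have no template. The function names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input, invented for this example.
CATALOG = """<catalog>
  <cd><title>Blue Album</title><artist>Zed</artist><price>12.50</price></cd>
  <cd><title>Red Album</title><artist>Abe</artist><price>8.00</price></cd>
</catalog>"""


def render_title(node):
    return f"Title: {node.text}<br />"


def render_artist(node):
    return f"Artist: {node.text}<br />"


def render_cd(node):
    # like apply-templates select="title", then select="artist"
    parts = [apply_templates(child) for child in node.findall("title")]
    parts += [apply_templates(child) for child in node.findall("artist")]
    return "<p>" + "".join(parts) + "</p>"


TEMPLATES = {"cd": render_cd, "title": render_title, "artist": render_artist}


def apply_templates(node):
    handler = TEMPLATES.get(node.tag)
    if handler is not None:
        return handler(node)
    # No template for this element: descend into its children instead
    return "".join(apply_templates(child) for child in node)


body = apply_templates(ET.fromstring(CATALOG))
print(body)
```

The price elements produce no output because the cd template never selects them.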
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:import href="f1.xslt"/>
<xsl:template match="/root">
A: <xsl:value-of select="a/text()" />
<xsl:call-template name="secondTemplate" />
</xsl:template>
</xsl:stylesheet>
Code splitting in multiple files
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template name="secondTemplate">
B: <xsl:value-of select="b/text()" />
</xsl:template>
</xsl:stylesheet>
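xsl:call-template invokes a template by name and does not change the current node, so b/text() in secondTemplate is still evaluated relative to /root. The second file has to be saved under the name used in xsl:import (f1.xslt here). A Python analogy, where one function stands in for each template and the input document is made up:

```python
import xml.etree.ElementTree as ET


def second_template(context):
    # Stands in for the named template defined in the imported file
    return f"B: {context.findtext('b')}"


def root_template(context):
    # call-template keeps the current node, so the callee also sees <root>
    return f"A: {context.findtext('a')}\n" + second_template(context)


# Hypothetical input, invented for this example.
document = ET.fromstring("<root><a>1</a><b>2</b></root>")

output = root_template(document)
print(output)
```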
OOXML
MS Word documents are XML under the hood: a .docx file is a ZIP archive of XML files.
Simply rename it to .zip and explore.
Note: ODF also has its own XML format.
Folder structure
[Content_Types].xml
_rels
word
document.xml
<w:document>
<w:body>
<w:p/>
</w:body>
</w:document>
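The archive can also be inspected without renaming it. This is a sketch using Python's zipfile module. To stay self-contained it builds a tiny stand-in archive in memory, which has the same layout but is not a valid Word document; the file name report.docx in the comment is hypothetical.

```python
import io
import zipfile

# Build a stand-in archive in memory (NOT a valid Word document).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("_rels/.rels", "<Relationships/>")
    archive.writestr("word/document.xml", "<document/>")

# A real file is opened the same way: zipfile.ZipFile("report.docx")
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    body = archive.read("word/document.xml").decode("utf-8")

print(names)
print(body)
```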
paragraphs
<w:p>
<w:pPr>
<w:pStyle w:val="NormalWeb"/>
<w:spacing w:before="120" w:after="120" />
</w:pPr>
<w:r>
<w:t xml:space="preserve">I feel that ...</w:t>
</w:r>
</w:p>
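Reading such a paragraph programmatically requires the namespace behind the w: prefix, which the snippets here omit but real files declare on w:document. A minimal sketch with ElementTree that pulls out the style name and the text of each paragraph; the sample wraps the paragraph above in a document element:

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the w: prefix in real files
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

DOCUMENT = f"""<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="NormalWeb"/>
        <w:spacing w:before="120" w:after="120"/>
      </w:pPr>
      <w:r><w:t xml:space="preserve">I feel that ...</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""

root = ET.fromstring(DOCUMENT)

paragraphs = []
for p in root.findall(".//w:p", NS):
    style = p.find("w:pPr/w:pStyle", NS)
    # Namespaced attributes are keyed as "{namespace}name" in ElementTree
    style_name = style.get(f"{{{W}}}val") if style is not None else None
    # A paragraph's text is spread over runs (w:r), each holding w:t nodes
    text = "".join(t.text or "" for t in p.findall(".//w:t", NS))
    paragraphs.append((style_name, text))

print(paragraphs)
```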
tables
<w:tbl>
<w:tblPr>
<w:tblStyle w:val="TableGrid" />
<w:tblW w:w="5000" w:type="pct" />
</w:tblPr>
<w:tblGrid>
<w:gridCol w:w="2880" />
<w:gridCol w:w="2880" />
<w:gridCol w:w="2880" />
</w:tblGrid>
<w:tr>
<w:tc>
<w:tcPr>
<w:tcW w:w="2880" w:type="dxa" />
</w:tcPr>
<w:p>
<w:r>
<w:t>AAA</w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:tcPr>
<w:tcW w:w="2880" w:type="dxa" />
</w:tcPr>
<w:p>
<w:r>
<w:t>BBB</w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:tcPr>
<w:tcW w:w="2880" w:type="dxa" />
</w:tcPr>
<w:p>
<w:r>
<w:t>CCC</w:t>
</w:r>
</w:p>
</w:tc>
</w:tr>
</w:tbl>
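Widths with w:type="dxa" are in twentieths of a point (1440 per inch), and w:type="pct" values are in fiftieths of a percent, so 5000 means 100%. A sketch that reads the column widths and cell texts from a trimmed copy of the table above; the helper names are made up for this example:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Trimmed copy of the table above, with the namespace declared.
TABLE = f"""<w:tbl xmlns:w="{W}">
  <w:tblGrid>
    <w:gridCol w:w="2880"/><w:gridCol w:w="2880"/><w:gridCol w:w="2880"/>
  </w:tblGrid>
  <w:tr>
    <w:tc><w:p><w:r><w:t>AAA</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>BBB</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>CCC</w:t></w:r></w:p></w:tc>
  </w:tr>
</w:tbl>"""


def dxa_to_inches(dxa):
    return dxa / 1440  # 20 dxa per point, 72 points per inch


def pct_to_percent(value):
    return value / 50  # pct widths are stored in fiftieths of a percent


tbl = ET.fromstring(TABLE)

widths = [int(col.get(f"{{{W}}}w")) for col in tbl.findall("w:tblGrid/w:gridCol", NS)]

# One list per row (w:tr), one string per cell (w:tc)
rows = [
    ["".join(t.text or "" for t in tc.findall(".//w:t", NS))
     for tc in tr.findall("w:tc", NS)]
    for tr in tbl.findall("w:tr", NS)
]

print(rows, [dxa_to_inches(w) for w in widths], pct_to_percent(5000))
```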
That's it
PB138
By Lukáš Grolig