PB138: XPath and XSLT

Content:

Using XPath

XSLT transformations

Bonus: OOXML

XPath

used in many languages to work with XML or HTML

you will see it again in the web scraping session

 

Let's use the following XML

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>

<book>
  <title lang="en">Dragons of Asgard</title>
  <price>5.95</price>
</book>

<book>
  <title lang="en">Scholomance</title>
  <price>5.95</price>
</book>

</bookstore>
Expression   Description
nodename     Selects all nodes with the name "nodename"
/            Selects from the root node
//           Selects nodes in the document from the current node that match the selection, no matter where they are
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes

Note: CSS selectors cannot select the parent of the current node, which .. does.
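
The expressions above can be tried from almost any host language. A minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module, which implements only a subset of XPath (paths are relative to an element, and @ works inside predicates but does not return attribute values). The inline sample is a shortened copy of the bookstore document:

```python
import xml.etree.ElementTree as ET

# Shortened bookstore sample; root is the <bookstore> element.
root = ET.fromstring(
    "<bookstore>"
    "<book><title lang='en'>Dragons of Asgard</title><price>5.95</price></book>"
    "<book><title lang='en'>Scholomance</title><price>5.95</price></book>"
    "</bookstore>"
)

# nodename: child elements of the current node named "book"
books = root.findall("book")

# //: matching nodes at any depth below the current node
prices = root.findall(".//price")

# ..: the parent of each matched <price>, here the two <book> elements
parents = root.findall(".//price/..")

# @: test for an attribute in a predicate, then read its value with get()
langs = [title.get("lang") for title in root.findall(".//title[@lang]")]

print(len(books), len(prices), [p.tag for p in parents], langs)
```

A full XPath 1.0 engine would also accept absolute paths such as /bookstore/book and return attribute nodes for //@lang; the subset used here is enough to see what each expression selects.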

/bookstore/book[price<35.00]/title

/bookstore/book[1]

/bookstore/book[last()-1]

//title[@lang='en']

Some examples
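
The four example expressions can be checked against the bookstore document. This is a sketch under assumptions: it uses Python's standard-library xml.etree.ElementTree, which supports positional and attribute predicates but not numeric comparisons, so the price filter is done in Python instead of in the path. The helper name title_of is made up for the example.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book>
    <title lang="en">Dragons of Asgard</title>
    <price>5.95</price>
  </book>
  <book>
    <title lang="en">Scholomance</title>
    <price>5.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)  # root is the <bookstore> element


def title_of(book):
    """Return the text of a book's <title> child (hypothetical helper)."""
    return book.findtext("title")


# /bookstore/book[1]: the first book (XPath positions start at 1, not 0)
first_book = root.find("./book[1]")

# /bookstore/book[last()-1]: the second-to-last book
second_to_last = root.find("./book[last()-1]")

# //title[@lang='en']: every title whose lang attribute equals "en"
english = [t.text for t in root.findall(".//title[@lang='en']")]

# /bookstore/book[price<35.00]/title: ElementTree cannot compare numbers
# inside a predicate, so the comparison is written in Python.
cheap = [
    title_of(book)
    for book in root.findall("./book")
    if float(book.findtext("price")) < 35.00
]

print(title_of(first_book), title_of(second_to_last), english, cheap)
```

With only two books in the sample, book[1] and book[last()-1] select the same element.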

XSLT

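XSL (eXtensible Stylesheet Language) is a styling language for XML. XSLT (XSL Transformations) is the part of it that transforms an XML document into another document, for example HTML.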

Not used so much these days.

 

Typical use cases:

CI output conversion

Government systems

Conversions to Word

and other XML heavy systems (some UI SDKs)

Usage

To view the result in a browser, link the stylesheet from the XML document (this does not work when the file is opened locally, so serve it over HTTP):

<?xml-stylesheet href="transformation.xsl" type="text/xsl" ?>

File extension: .xsl

From the command line you can use xsltproc (part of libxslt):

xsltproc stylesheet.xsl file.xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
  <html>
  <body>
  <h2>My CD Collection</h2>
  <table border="1">
    <tr bgcolor="#9acd32">
      <th>Title</th>
      <th>Artist</th>
    </tr>
    <tr>
      <td>.</td>
      <td>.</td>
    </tr>
  </table>
  </body>
  </html>
</xsl:template>

</xsl:stylesheet>

Step 1: Template

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
  <html>
  <body>
  <h2>My CD Collection</h2>
  <table border="1">
    <tr bgcolor="#9acd32">
      <th>Title</th>
      <th>Artist</th>
    </tr>
    <tr>
      <td><xsl:value-of select="catalog/cd/title"/></td>
      <td><xsl:value-of select="catalog/cd/artist"/></td>
    </tr>
  </table>
  </body>
  </html>
</xsl:template>

</xsl:stylesheet>

Step 2: Add values
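
Step 2 prints a single row even when the catalog holds several CDs: in XSLT 1.0, xsl:value-of outputs the string value of only the first node matched by its select expression. The slides do not show the input file, so the catalog below is a hypothetical sample with made-up titles, artists and prices. The sketch uses Python's xml.etree.ElementTree rather than an XSLT processor; its findtext() also returns the text of the first match, which makes the behaviour easy to see.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; only the element names are taken from the stylesheet.
CATALOG = """<catalog>
  <cd><title>Blue Harbor</title><artist>Nora Vale</artist><price>12.50</price></cd>
  <cd><title>Ashes</title><artist>Ben Moor</artist><price>8.90</price></cd>
  <cd><title>Cold Rain</title><artist>Ada Finch</artist><price>10.90</price></cd>
</catalog>"""

catalog = ET.fromstring(CATALOG)  # the <catalog> element

# Comparable to <xsl:value-of select="catalog/cd/title"/>:
# several nodes match, but only the first one is used.
first_title = catalog.findtext("cd/title")
first_artist = catalog.findtext("cd/artist")

# The other matches exist; they are just never visited without a loop.
all_titles = [title.text for title in catalog.findall("cd/title")]

print(first_title, first_artist, all_titles)
```

Getting one row per CD therefore needs an explicit iteration over the cd elements.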

<?xml version="1.0"?>

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
  <html>
  <body>
    <h2>My CD Collection</h2>
    <table border="1">
      <tr bgcolor="#9acd32">
        <th>Title</th>
        <th>Artist</th>
      </tr>
      <xsl:for-each select="catalog/cd">
        <tr>
          <td><xsl:value-of select="title"/></td>
          <td><xsl:value-of select="artist"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>

</xsl:stylesheet>

Step 3: For-each loop

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
  <html>
  <body>
  <h2>My CD Collection</h2>
  <table border="1">
    <tr bgcolor="#9acd32">
      <th>Title</th>
      <th>Artist</th>
    </tr>
    <xsl:for-each select="catalog/cd">
      <xsl:sort select="artist"/>
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="artist"/></td>
      </tr>
    </xsl:for-each>
  </table>
  </body>
  </html>
</xsl:template>

</xsl:stylesheet>

Step 4: Sorting

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
  <html>
  <body>
  <h2>My CD Collection</h2>
  <table border="1">
    <tr bgcolor="#9acd32">
      <th>Title</th>
      <th>Artist</th>
      <th>Price</th>
    </tr>
    <xsl:for-each select="catalog/cd">
      <xsl:if test="price &gt; 10">
        <tr>
          <td><xsl:value-of select="title"/></td>
          <td><xsl:value-of select="artist"/></td>
          <td><xsl:value-of select="price"/></td>
        </tr>
      </xsl:if>
    </xsl:for-each>
  </table>
  </body>
  </html>
</xsl:template>

</xsl:stylesheet>

IF usage
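
For readers who think in a general-purpose language first, xsl:for-each, xsl:sort and xsl:if correspond roughly to a loop, a sort and a filter. The following is an imitation in Python, not XSLT, and assumes a hypothetical catalog with made-up data; the function name rows is also an assumption. Python sorts strings by code point, while xsl:sort may apply language-specific ordering.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; only the element names are taken from the stylesheets.
CATALOG = """<catalog>
  <cd><title>Blue Harbor</title><artist>Nora Vale</artist><price>12.50</price></cd>
  <cd><title>Ashes</title><artist>Ben Moor</artist><price>8.90</price></cd>
  <cd><title>Cold Rain</title><artist>Ada Finch</artist><price>10.90</price></cd>
</catalog>"""

catalog = ET.fromstring(CATALOG)


def rows(cds):
    """Turn <cd> elements into (title, artist, price) tuples, one per table row."""
    return [
        (cd.findtext("title"), cd.findtext("artist"), cd.findtext("price"))
        for cd in cds
    ]


# <xsl:for-each select="catalog/cd">: one row per <cd>, in document order
all_rows = rows(catalog.findall("cd"))

# <xsl:sort select="artist"/>: the same rows, ordered by the artist text
sorted_rows = rows(sorted(catalog.findall("cd"), key=lambda cd: cd.findtext("artist")))

# <xsl:if test="price &gt; 10">: keep only CDs priced above 10
expensive_rows = rows(
    cd for cd in catalog.findall("cd") if float(cd.findtext("price")) > 10
)

for row in expensive_rows:
    print(row)
```

In the stylesheet the comparison is written as &gt; because the test sits inside an XML attribute; escaping > is optional there, but a literal < would not be allowed.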

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
  <html>
  <body>
  <h2>My CD Collection</h2>
  <xsl:apply-templates/>
  </body>
  </html>
</xsl:template>

<xsl:template match="cd">
  <p>
  <xsl:apply-templates select="title"/>
  <xsl:apply-templates select="artist"/>
  </p>
</xsl:template>

<xsl:template match="title">
  Title: <span style="color:#ff0000">
  <xsl:value-of select="."/></span>
  <br />
</xsl:template>

<xsl:template match="artist">
  Artist: <span style="color:#00ff00">
  <xsl:value-of select="."/></span>
  <br />
</xsl:template>

</xsl:stylesheet>

Code splitting
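
xsl:apply-templates hands each selected node to the template whose match pattern fits it. The idea can be sketched as a lookup table from element name to function. This is a simplified model in Python under assumptions, not how an XSLT processor is implemented: all function names are made up, the catalog data is hypothetical, and the fallback only descends into child elements, whereas XSLT's built-in rules also copy text nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with made-up data.
CATALOG = """<catalog>
  <cd><title>Blue Harbor</title><artist>Nora Vale</artist></cd>
  <cd><title>Ashes</title><artist>Ben Moor</artist></cd>
</catalog>"""


def apply_templates(node, templates):
    """Run the template registered for the node's tag, or visit its children."""
    template = templates.get(node.tag)
    if template is not None:
        return template(node, templates)
    # Rough stand-in for the built-in rule: keep descending into child elements.
    return "".join(apply_templates(child, templates) for child in node)


def cd_template(node, templates):
    # Like <xsl:apply-templates select="title"/> followed by select="artist"
    parts = [
        apply_templates(child, templates)
        for tag in ("title", "artist")
        for child in node.findall(tag)
    ]
    return "<p>" + "".join(parts) + "</p>"


def title_template(node, templates):
    return "Title: " + node.text + "<br/>"


def artist_template(node, templates):
    return "Artist: " + node.text + "<br/>"


TEMPLATES = {"cd": cd_template, "title": title_template, "artist": artist_template}

html = apply_templates(ET.fromstring(CATALOG), TEMPLATES)
print(html)
```

No template is registered for catalog, so the fallback walks into its cd children, which is why the stylesheet does not need a template for every element.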

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:import href="f1.xslt"/>

 <xsl:template match="/root">
   A: <xsl:value-of select="a/text()" />
   <xsl:call-template name="secondTemplate" />
 </xsl:template>

</xsl:stylesheet> 

Code splitting in multiple files

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:template name="secondTemplate">
   B: <xsl:value-of select="b/text()" />
 </xsl:template>

</xsl:stylesheet>

OOXML

In MS Word you create XML documents packed into a ZIP archive. They have the .docx extension.

 

Simply rename it to .zip and explore.

 

Note: ODF also has its own XML format.

Folder structure

[Content_Types].xml

_rels

word
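
Renaming is not strictly necessary, because any ZIP library can open the package directly. A minimal sketch with Python's standard zipfile module; so that it runs without a real document, it first builds a tiny stand-in package in memory whose part contents are placeholders, not a valid Word file. The file name report.docx in the comment is hypothetical.

```python
import io
import zipfile

# Build a stand-in package in memory; the XML bodies are placeholders only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("[Content_Types].xml", "<Types/>")
    package.writestr("_rels/.rels", "<Relationships/>")
    package.writestr("word/document.xml", "<document/>")

# Reading a real file works the same way: zipfile.ZipFile("report.docx")
with zipfile.ZipFile(buffer) as package:
    names = package.namelist()
    # Top-level entries correspond to the folder structure listed above.
    top_level = sorted({name.split("/")[0] for name in names})
    document_xml = package.read("word/document.xml").decode("utf-8")

print(names)
print(top_level)
```

A real .docx usually contains more parts (styles, settings, media); the three entries here only mirror the structure listed on the slide.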

document.xml

<w:document>
<w:body>
<w:p/>
</w:body>
</w:document>

paragraphs

<w:p>
    <w:pPr>
        <w:pStyle w:val="NormalWeb"/>
        <w:spacing w:before="120" w:after="120" />
    </w:pPr>
    <w:r>
        <w:t xml:space="preserve">I feel that ...</w:t>
    </w:r>
</w:p>
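
Reading the text back out is a matter of following w:p, w:r and w:t. The slide omits the namespace declaration, so the sketch below adds it: the w prefix is bound to the WordprocessingML main namespace. It uses Python's xml.etree.ElementTree; the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# The namespace that the w: prefix stands for in document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

DOCUMENT = f"""<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="NormalWeb"/>
        <w:spacing w:before="120" w:after="120"/>
      </w:pPr>
      <w:r>
        <w:t xml:space="preserve">I feel that ...</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>"""

root = ET.fromstring(DOCUMENT)

paragraphs = []
for p in root.findall(".//w:p", NS):
    # A paragraph's text is spread over runs (w:r), each holding w:t elements.
    text = "".join(t.text or "" for t in p.findall("./w:r/w:t", NS))
    # The style name is the namespaced w:val attribute of w:pStyle, if present.
    style = p.find("./w:pPr/w:pStyle", NS)
    style_name = style.get(f"{{{W}}}val") if style is not None else None
    paragraphs.append((style_name, text))

print(paragraphs)
```

Attributes such as w:val are namespaced too, which is why the lookup uses the {namespace}val form instead of plain "val".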

tables

<w:tbl>
    <w:tblPr>
        <w:tblStyle w:val="TableGrid" />
        <w:tblW w:w="5000" w:type="pct" />
    </w:tblPr>
    <w:tblGrid>
        <w:gridCol w:w="2880" />
        <w:gridCol w:w="2880" />
        <w:gridCol w:w="2880" />
    </w:tblGrid>
    <w:tr>
        <w:tc>
            <w:tcPr>
                <w:tcW w:w="2880" w:type="dxa" />
            </w:tcPr>
            <w:p>
                <w:r>
                    <w:t>AAA</w:t>
                </w:r>
            </w:p>
        </w:tc>
        <w:tc>
            <w:tcPr>
                <w:tcW w:w="2880" w:type="dxa" />
            </w:tcPr>
            <w:p>
                <w:r>
                    <w:t>BBB</w:t>
                </w:r>
            </w:p>
        </w:tc>
        <w:tc>
            <w:tcPr>
                <w:tcW w:w="2880" w:type="dxa" />
            </w:tcPr>
            <w:p>
                <w:r>
                    <w:t>CCC</w:t>
                </w:r>
            </w:p>
        </w:tc>
    </w:tr>
</w:tbl>
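
A table is rows (w:tr) of cells (w:tc), and each cell holds ordinary paragraphs. Widths of type dxa are in twentieths of a point, so 2880 is 144 pt, or 2 inches; type pct is in fiftieths of a percent, so 5000 means 100 %. The sketch below reads the cell texts into a list of rows with Python's xml.etree.ElementTree. It rebuilds a trimmed version of the table without the property elements, and the helper name cell is made up.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def cell(text):
    """Build the markup of one table cell holding a single paragraph (hypothetical helper)."""
    return f"<w:tc><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:tc>"


# Trimmed version of the table above: one row, three cells, no properties.
TABLE = f'<w:tbl xmlns:w="{W}"><w:tr>{cell("AAA")}{cell("BBB")}{cell("CCC")}</w:tr></w:tbl>'

table = ET.fromstring(TABLE)

rows = []
for tr in table.findall("./w:tr", NS):
    row = []
    for tc in tr.findall("./w:tc", NS):
        # A cell can hold several paragraphs and runs; join all of their text.
        row.append("".join(t.text or "" for t in tc.findall(".//w:t", NS)))
    rows.append(row)

print(rows)
```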

That's it

PB138

By Lukáš Grolig
