Adam Retter

adam@evolvedbinary.com
 

MarkupUK @ Online
2020-06-25


@adamretter

XQuery
is the
Plumber's Toolkit!

About Me

  • Developer / Consultant

    • XQuery / XSLT / Schema / RelaxNG

    • Scala / Java / C++

    • Concurrency and Scalability

  • Creator of FusionDB multi-model database (4 yrs.)

  • Contributor to Facebook's RocksDB (4 yrs.)

  • Core contributor to eXist-db XML Database (14 yrs.)

  • Founder of EXQuery, and creator of RESTXQ

  • W3C XQuery WG Invited expert

  • Me: www.adamretter.org.uk

A long time ago...

  • Before XProc there was XPL

    • ~2005 - https://www.w3.org/Submission/xpl/

  • Before XPL there was XQuery

    • ~2001 - https://www.w3.org/TR/2001/WD-xquery-20010215/

  • Before XQuery there was XPath

    • ~1999 - https://www.w3.org/1999/07/WD-xpath-19990709

  • Before XPath there was XSLT

    • ~1998 - https://www.w3.org/TR/1998/WD-xsl-19980818

Do we need XProc?

  • Back in 2006...

    • XPL looked nice but was Orbeon only

    • Apache Cocoon was ugly... and dying?

    • So... I wrote "pipelines" in XQuery (and XSLT)!

      • Many modules of small functions

      • XQuery -> XQuery -> XSLT -> XQuery -> XSLT -> XHTML / XUpdate

  • Now...

    • Anything XProc can do... I can do in XQuery!

      • Maybe I can even do more!

      • Why do I need XProc?

The power of XQuery

  • Is it's simplicity!

  • Data Model supports XML/JSON/Text/Binary

  • Mixing XML and XQuery in the same file

    • Creates dynamic documents

  • Depends on the Implementation!

    • HTTP Server fits well with the Web

      • Can produce XHTML/HTML5 dynamic websites

  • Much comes from EXPath/EXQuery/Vendor extensions!

Today's XQuery Challenges...

  1. Is there a proportionate relationsip between Covid-19 mortalities and Covid-19 relief funding?

  2. Is more or less medical research conducted in areas with a higher Covid-19 mortality rate?

* I am not a Data Scientist!

  • Today we will aggregate data from...

    • Various Web API using HTTP

    • Relational database using SQL

    • Graph (RDF) using SPARQL

    • JSON

    • XML Database

    • Binary files

  • We will then produce...

    • An HTML Page using XSLT

    • A PDF Report using XSL:FO

XQuery can integrate all things!

  • Medical Research Journal Articles

  • Statistics on Covid-19 Deaths

  • Statistics on Covid-19 Relief Funding

  • Mapping to/from State Abbreviations and State Names

To answer the challenges...

     we need data!

  • National Library of Medicine

  • Our XQuery

    • Use XQuery extension module - EXPath HTTP

    • Call Search API

    • Call Fetch API n-times in pages of 10,000 results

    • Use XQuery extension module - XMLDB

    • Store XML documents into XML Database

    • See: download-pubmed-store-xml.xq

Medical Research Journal Articles

  • Centers for Disease Control and Prevention

  • Our XQuery

    • Use XQuery extension module - EXPath HTTP

    • Call Download API

    • Parse the CSV into XDM Array

    • Cleanup CSV

    • Use XQuery extension module - SQL JDBC

    • Create SQL Table and Insert rows (MariaDB)

    • See: download-cdc-upload-sql.xq

Statistics on Covid-19 Deaths

  • Centers for Disease Control and Prevention

  • Our XQuery

    • Use XQuery extension module - EXPath HTTP

    • Call Download API

    • Use XQuery extension module - XMLDB

    • Store RDF document into XML Database *

    • See: download-hhs-store-rdf.xq

Statistics on Covid-19 Relief Funding

  • World Population Review

  • Our XQuery

    • Use XQuery extension module - EXPath HTTP

    • Call Download API

    • Use XQuery extension module - XMLDB

    • Store JSON document into XML Database

    • See: download-states-store-json.xq

Mapping to/from State Abbreviations and State Names

Now we have the data!

We have to query and join the different (SQL, RDF, JSON, XML)
datasets together...

But, to answer our questions:

  • Data is in SQL (MariaDB - 1,416 rows)

    • 1. Examine the Data

    • 2. Use the XQuery extension module - SQL JDBC

    • 3. Write some XQuery and SQL to get some stats

    • See: query-sql-dsas.xq

    • See: query-sql-deaths-total-percentage.xq

Step 1. Query the Death Statistics

  • Data is in RDF (eXist-db/TDB - 1,471,085 triples)

    • 1. Examine the Data

    • 2. Use the XQuery extension module - SPARQL

    • 3. Write some XQuery and SPARQL to get some stats

    • See: query-sparql-all-states.xq

    • See: query-sparql-state-payment-total.xq

Step 2. Query the Relief Funding Statistics

  • eXist-db's RDF/SPARQL Module is not returning all the data :-(

  • SPARQL Protocol to the rescue!

    • Load data into Jena TDB

    • Use XQuery extension module - EXPath HTTP

    • Query Fuseki over HTTP

    • See: query-fuseki-all-states.xq

    • Write our own XQuery Module for Jena Fuseki

      • For replacing sparql:query with fuseki:query

      • See: fuseki.xqm

    • See: query-fuseki-all-states-2.xq

Step 2/2. Ooops! ...Switch to Jena Fuseki

  • Death stats have State Names / Funding stats have State Abbrevs.

    • We want to join the datasets by state

    • We will map Abbrevs. to Names

  • Data is in JSON (eXist-db - 1 Map with 55 key/value pairs)

    • 1. Examine the Data

    • 2. Use the XQuery function - fn:json-doc

    • 3. Write some XQuery to lookup names from abbreviations

    • See: query-json-state.xq

Step 3. Map State Abbreviations to Names

  • Our XQuery

    • Uses all of the components so far

    • Calculates some percentages of total deaths

    • Calculates some percentages of total funding relief

    • Iterates through deaths by state

      • Joins funding relief payments by state ON state name

    • Produces two reports in XML

      • Each ordered differently for side-by-side comparison

    • See: answer1.xq

Step 4. Join Datasets and Create Reports

  • Our XQuery

    • Builds on the previous slide

    • We write an XSLT to transform the XML into HTML

      • See: answer1-to-html.xslt

    • Use XQuery extension module - Transform

      • If available, use fn:transform instead!

    • Produces HTML

      • Could be run from the database's Web Server :-)

    • See: answer1-html.xq

Step 5. Create HTML Reports

That buttery smooth level of integration
was amazing!!!

So... do we even need XProc?

Do we need XProc?

  • No!

    • We can and have survived without it!

    • XQuery - best to adopt a functional compsosition style

    • Why even use XQuery?

      • I could have written this in: Assembler / Pascal / Java / C++ / blah!

  • Yes!

    • Cleaner Separation of Concerns

    • Better Orchestration of Tasks

    • Exploit Parrallelism (one day?)

    • But... I want it to run inside the database!

      • Avoid Serialization between steps

      • Exploit Indexes

      • Resource Scheduling

fusiondb.com

XQuery is the Plumber's Toolkit

By Adam Retter

XQuery is the Plumber's Toolkit

Webinar given for MarkupUK - 25 June 2020 - Online

  • 410
Loading comments...

More from Adam Retter