eXist-db

XML Prague 2024 Pre-Conference Session

Hello

5min

Juri @_o@chaos.social
Nico nverwer@rakensi.com
Dannes dannes@exist-db.org
Boris boris@daliboris.cz

Talk to us

on Slack

Tools

15min

VSCode

VSCode existdb

exist-db on the command line

xst

elevator pitch

Routing

5min

  • controller.xq

  • restxq

  • roaster

OAD + QR code

Lucene

20min

Lucene

  • defined in the collection.xconf file
    • can be in specific (sub)collection
    • stored in the /db/system/config/db/apps subcollection
    • synchronization in VS Code doesn't work (doesn't store file in the /db/system/config/db/apps collection)
    • xst upload --include "*.xconf" --verbose --apply-xconf ./data/dictionaries/ /db/apps/exist-db-lucene/data/dictionaries --config .existdb.json

Indexes

  • XPath
    • @expression="tei:form[@type=('lemma', 'variant')]/tei:pron"
  • XQuery
    • @expression="nav:get-metadata(., 'pronunciation')"
      • /index.xql
        • module namespace idx="http://teipublisher.com/index";
      • /data/dictionaries/collection.xconf
        • <module uri="http://teipublisher.com/index" prefix="nav" at="../../index.xql"/>
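A minimal sketch of what such an index.xql module might look like. Only the module namespace and the 'pronunciation' expression come from the slides; the TEI paths used for 'domain' and 'sortKey' are illustrative assumptions, not the presenters' actual code. Note that the prefix in collection.xconf (nav) does not have to match the prefix used inside the module (idx), as long as both are bound to the same namespace URI.

```xquery
xquery version "3.1";

(: index.xql: functions called from collection.xconf while indexing :)
module namespace idx = "http://teipublisher.com/index";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(:~
 : Return the value(s) to index for one entry.
 : The paths for "domain" and "sortKey" are hypothetical examples.
 :)
declare function idx:get-metadata($entry as element(), $field as xs:string) as xs:string* {
    switch ($field)
        case "pronunciation" return
            $entry/tei:form[@type = ('lemma', 'variant')]/tei:pron ! normalize-space(.)
        case "domain" return
            (: assumption: the domain is encoded as tei:usg[@type='domain'] :)
            $entry//tei:usg[@type = 'domain'] ! normalize-space(.)
        case "sortKey" return
            (: assumption: sort by the first lemma orthography, lower-cased :)
            lower-case(normalize-space(($entry/tei:form[@type = 'lemma']/tei:orth)[1]))
        default return
            ()
};
```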

Indexes (getting value)

  • automatically while
    • uploading/inserting, removing data
    • updating data (update insert | delete | replace | value | rename)
    • doesn't work if the <text> element contains XPath with a predicate, like //tei:list[@type='index']
  • programmatically
    • xmldb:reindex('/db/apps/app/data/')
    • only the user with admin rights
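The programmatic route can be sketched as a single query that stores a configuration and then reindexes. This is an illustration under assumptions: it must be run as an admin user, the configuration collection under /db/system/config is assumed to exist already, and the single field shown is only an example taken from the later slides.

```xquery
xquery version "3.1";

import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

(: data collection and its mirrored configuration collection :)
let $data := "/db/apps/exist-db-lucene/data/dictionaries"
let $config := "/db/system/config" || $data

(: a deliberately small collection.xconf; real configurations are larger :)
let $xconf :=
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index xmlns:tei="http://www.tei-c.org/ns/1.0">
            <lucene>
                <module uri="http://teipublisher.com/index" prefix="nav" at="../../index.xql"/>
                <text qname="tei:entry">
                    <field name="domain" expression="nav:get-metadata(., 'domain')"/>
                </text>
            </lucene>
        </index>
    </collection>

(: store the configuration first, then rebuild the indexes of the data collection :)
let $stored := xmldb:store($config, "collection.xconf", $xconf)
return
    map {
        "stored": $stored,
        "reindexed": xmldb:reindex($data)
    }
```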

Indexing

  • manually
    • xst execute "xmldb:reindex('/db/apps/exist-db-lucene/data/dictionaries')" --config admin.xstrc
      • (doesn't always work)
    • from eXide (always works)
      • be logged in as an admin
      • save collection.xconf file (with or without modification)
      • click on OK button in dialog

Indexing (eXide)

<field name="domain" expression="nav:get-metadata(., 'domain')" />

  • used for full-text searching
  • like properties of the parent node
  • not only text, but all atomic types (xs:date, xs:dateTime, xs:time, xs:integer, xs:decimal...)
  • can be computed
    • taken from different document
    • taken from different part of the document

ft:field($item, "domain")
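As a rough illustration, a field can be both queried by name and read back from each hit. The search term "biology" and the reliance on @xml:id are assumptions made for the example; the collection path and the field name follow the slides.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
for $item in collection($collection)//tei:entry[
    (: query the "domain" field and ask for its content to be returned :)
    ft:query(., "domain:biology", map { "fields": "domain" })
]
return
    (: ft:field gives back the stored value(s) of the field for this hit :)
    <hit id="{$item/@xml:id}">{ ft:field($item, "domain") }</hit>
```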

Fields

<field name="sortKey" expression="nav:get-metadata(., 'sortKey')" binary="yes" />

  • used for sorting or filtering
  • content can be retrieved, but not queried

ft:binary-field($entry, "sortKey", "xs:string")
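A short sketch of sorting by a binary field, assuming the sortKey field is configured with binary="yes" as shown on this slide. The query term is borrowed from the later search examples; the output element is purely illustrative.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
let $entries := collection($collection)//tei:entry[ft:query(., "reversal:although")]
for $entry in $entries
(: binary fields are retrieved with ft:binary-field, not queried :)
let $key := ft:binary-field($entry, "sortKey", "xs:string")
order by $key
return
    <entry sortKey="{$key}">{ $entry/@xml:id }</entry>
```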

Binary Fields

<facet dimension="domain" expression="nav:get-metadata(., 'domain')" />

  • used for filtering (existing values in the search result)
  • field and facet can use the same expression (and thus the values)
  • can be hierarchical
  • available only if ft:query function is used
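To list the facet values present in a result, something along these lines should work. It assumes the "domain" facet from this slide is configured and that up to 10 values per dimension are enough; the output element is hypothetical.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
(: facets are only available on the result of ft:query :)
let $hits := collection($collection)//tei:entry[ft:query(., "reversal:although")]
(: map of facet label to number of matching entries :)
let $facets := ft:facets($hits, "domain", 10)
return
    map:for-each($facets, function($label, $count) {
        <facet label="{$label}" count="{$count}"/>
    })
```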

Facets

<ref xml:lang="en" type="reversal">many times</ref>

  • whole text is indexed, but separate words are searched
    • //tei:entry[ft:query(., " reversal: many ")]
  • search for combination of words using double quotes ""
    • //tei:entry[ft:query(., ' reversal: "many times" ')]
  • search with regular expression using slashes //
    • //tei:entry[ft:query(., " reversal: /[Bb]iologic.*/ ")]
  • search parts of words using wildcards (* and ?)
    • //tei:entry[ft:query(., " reversal: time* ")]

Searching

  • return fields (for further processing)
    • //tei:entry[ft:query(., (), map { "fields": ("sortKey") })]

declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
let $hits := collection($collection)//tei:entry[
  ft:query(., "reversal:although", map { "fields" : "sortKey" } )
  ]
for $hit in $hits
   order by ft:field($hit, "sortKey")
   return $hit

Using fields

declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare namespace exist = "http://exist.sourceforge.net/NS/exist";
 
let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
let $hits := collection($collection)//tei:entry[ft:query(., "reversal:although")]
for $hit in $hits
  let $expanded := util:expand($hit)
  return $expanded
 
<ref xml:lang="en" type="reversal">
   <exist:match>although</exist:match>
</ref>
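Instead of returning the whole expanded entry, the matches can also be shown in context with eXist-db's KWIC module. This is a sketch only; the context width of 40 characters is an arbitrary choice.

```xquery
xquery version "3.1";

import module namespace kwic = "http://exist-db.org/xquery/kwic";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
for $hit in collection($collection)//tei:entry[ft:query(., "reversal:although")]
return
    (: one summary per match, with surrounding text on both sides :)
    kwic:summarize($hit, <config width="40"/>)
```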

Highlighting

<gram type="pos" expand="Adjective">adj</gram>

  • can be used only on the result of full-text search (ft:query())

 

let $options := map {
    "facets": map {
        "partOfSpeech": ("subst", "v")
    }
}

//tei:entry[ft:query(., (), $options)]

Filtering (Facets)

  • edit collection.xconf or index.xql
  • if collection.xconf is modified
    • use xst upload --include "*.xconf" --verbose --apply-xconf ./data/dictionaries/ /db/apps/exist-db-lucene/data/dictionaries --config .existdb.json
    • close and open collection.xconf in eXide
  • save collection.xconf in eXide and apply indexation
  • watch eXist-db log for errors
  • look in the Monex/report for the content of the field/facet
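Alongside Monex, a quick query can show whether a field was actually populated. The sketch below assumes the "domain" field from the earlier slides and that entries carry an @xml:id; it lists the field content of the first five indexed entries.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
(: an empty query string selects all indexed entries :)
let $entries := collection($collection)//tei:entry[
    ft:query(., (), map { "fields": "domain" })
]
for $entry in subsequence($entries, 1, 5)
return
    (: an empty domain attribute hints at a problem in collection.xconf or index.xql :)
    <check id="{$entry/@xml:id}" domain="{ft:field($entry, 'domain')}"/>
```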

Debugging

indexes in Monex

indexes (report module)

xARs

20min

build tools

 

ant

maven

gulp-exist

testing

 

xqsuite

junit

end-to-end

Ant setup

gulp-exist example

Java

20min

fn:invisible-xml

https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-invisible-xml

fn:invisible-xml(
    $grammar     as item()?     := (),
    $options     as map(*)?     := {}
) as fn(xs:string) as item()
  • Use Markup Blitz parser
  • The invisible-xml() function for eXist
  • Demo!
  • Bonus:
    - Remote debugging
  • To do

https://github.com/nverwer/exist/tree/ixml/exist-core
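Following the signature above, a call might look like the sketch below. It assumes a build in which the function is available in the fn namespace, as in the XPath 4.0 draft; the tiny date grammar and the input string are made up for illustration.

```xquery
xquery version "3.1";

(: hypothetical ixml grammar for ISO-style dates; hyphens and digit wrappers are suppressed :)
let $grammar := "
    date: year, -'-', month, -'-', day.
    year: d, d, d, d.
    month: d, d.
    day: d, d.
    -d: ['0'-'9'].
"
(: fn:invisible-xml returns a parser function for the grammar :)
let $parse := fn:invisible-xml($grammar)
return
    (: applying the parser to a string yields the parsed XML :)
    $parse("2024-06-07")
```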

eXist-db 7

5min

Java 17

eXist-db 4.x.x, 5.x.x, and 6.x.x require JDK 8

Features

  • [5188] Add in-memory cache for RPC query results
  • [5115] Postfix Expressions for Named Function References
    • xs:string#1("hello")
  • [5077] Add XQuery Pragma exist:time
  • [4802] Add mail:get-mail-session#2 with authentication

Improvements

  • [5084] Optimize Path Expressions for BasicExpressionVisitor
  • [4793] Optimisations to the Lucene Index Worker
  • [4708] Ensure that XSLT stylesheets are only compiled once when used from an xsl-pi in the REST Server

BugFixes (1)

  • [5067] XMLRPC: non ".xml" files always stored as binary
  • [5046] Fix a bug in Node Path equality
  • [5018] EXPath package: Fix XQuery transient imports issue
  • [4996] Fixes regarding circular imports
  • [3207] Fix Lucene Index optimisations and Matches (facets)
  • [4980] Correct function signatures that return empty sequences
  • [4973] Fix cardinality issue cast to function: () => xs:string()

BugFixes (2)

  • [4901] Make EXPath (Java) packages portable between filesystems
  • [4900] Fix castable as for untyped atomic values
  • [4609] Allow fn:transform to load stylesheets from the database
  • [4864] Spec compliant: fn:replace, fn:tokenize, fn:analyze-string
  • [4850] Tails of subsequences are off by one
  • [2102] NPE in match highlight with Lucene & NGram
  • [4741] Fixes for CDATA: doc('has-cdata.html')//script/node()/name()
  • [4703] Fix fn:serialize issues

Worth mentioning

  • Docker images available for develop and develop-6.x.x
  • [5215] XSD conf.xml: Add documentation
  • [5113] XSD: Add XSD for EXPath Packaging System and extensions
  • [4941] Migrate from java.xml to jakarta.xml (JAXB et al)
  • [4854] Allow Saxon licensed editions (EE and PE)
  • Fix for memory leak triggers/restore
  • Fix for defragmentation issue XUpdate + XQuery Update
  • Function type checks
  • The KWIC module is rewritten
  • The SPARQL extension will be revived (yes, there is one)

OUTLOOK

time for Discussions

enjoy your lunch

5min