eXist-db
XML Prague 2024 Pre-Conference Session
Hello
5min
| Juri | @_o@chaos.social |
| Nico | nverwer@rakensi.com |
| Dannes | dannes@exist-db.org |
| Boris | boris@daliboris.cz |
Talk to us on Slack

Tools
15min
VSCode
VSCode existdb

eXist-db on the command line
xst
elevator pitch
Routing
5min
- controller.xq
- restxq
- roaster
OAD + QR code
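Below is a minimal controller.xq sketch. It is modelled on the usual eXist-db app template, not taken from the workshop app. It redirects requests for the app root to index.html and lets every other request pass through with caching enabled. The app path in the comment is hypothetical.

```xquery
xquery version "3.1";

declare namespace exist = "http://exist.sourceforge.net/NS/exist";

(: bound by eXist-db's URL rewriting before the controller runs :)
declare variable $exist:path external;

if ($exist:path eq "") then
    (: e.g. /exist/apps/my-app (hypothetical): add the trailing slash :)
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <redirect url="{request:get-uri()}/"/>
    </dispatch>
else if ($exist:path eq "/") then
    (: app root: redirect to the start page :)
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <redirect url="index.html"/>
    </dispatch>
else
    (: everything else: serve the requested resource unchanged :)
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <cache-control cache="yes"/>
    </dispatch>
```

RESTXQ replaces this hand-written dispatch logic with function annotations. Roaster replaces it with routes taken from an OpenAPI description.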

Lucene
20min
- developed by the Apache Software Foundation (Apache Lucene project)
- https://lucene.apache.org
- current version: 9.10.0 (2024-02-20)
- version in eXist-db: 4.10.4
- for indexing (phase 1) and searching (phase 2) in documents
- 1 Lucene document =
  - 1 XML file (root element and its descendants), or
  - 1 element (not necessarily the root) and its descendants
- eXist-db documentation
Lucene
- defined in the collection.xconf file
- can be placed in a specific (sub)collection
- stored under /db/system/config, mirroring the collection path (e.g. in the /db/system/config/db/apps subcollection)
- synchronization in VS Code doesn't work for it (it doesn't store the file in the /db/system/config/db/apps collection)
- upload it with xst instead:
  xst upload --include "*.xconf" --verbose --apply-xconf ./data/dictionaries/ /db/apps/exist-db-lucene/data/dictionaries --config .existdb.json
Indexes
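As an alternative to `xst upload --apply-xconf`, the configuration can also be stored from XQuery. This sketch makes several assumptions. It must run as an admin user, and the configuration collection under /db/system/config must already exist. The single field and facet only illustrate the shape of a Lucene index with computed values; they are not the workshop's actual configuration.

```xquery
xquery version "3.1";

declare variable $data := "/db/apps/exist-db-lucene/data/dictionaries";
(: the configuration mirrors the data path under /db/system/config :)
declare variable $config := "/db/system/config" || $data;

let $xconf :=
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index xmlns:tei="http://www.tei-c.org/ns/1.0">
            <lucene>
                <!-- XQuery module providing the computed values -->
                <module uri="http://teipublisher.com/index" prefix="nav" at="../../index.xql"/>
                <text qname="tei:entry">
                    <field name="domain" expression="nav:get-metadata(., 'domain')"/>
                    <facet dimension="domain" expression="nav:get-metadata(., 'domain')"/>
                </text>
            </lucene>
        </index>
    </collection>
return (
    xmldb:store($config, "collection.xconf", $xconf),
    (: reindex so that existing documents pick up the new configuration :)
    xmldb:reindex($data)
)
```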
- XPath
  @expression="tei:form[@type=('lemma', 'variant')]/tei:pron"
- XQuery
  @expression="nav:get-metadata(., 'pronunciation')"
  - /index.xql
    module namespace idx="http://teipublisher.com/index";
  - /data/dictionaries/collection.xconf
    <module uri="http://teipublisher.com/index" prefix="nav" at="../../index.xql"/>
Indexes (getting values)
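Here is a hypothetical sketch of what /index.xql could look like. Only the module namespace and the two-argument call `get-metadata(., 'field-name')` come from the slides. The `switch` and the XPath for each field are illustrative assumptions, including `tei:usg[@type = "domain"]`. The module declares the prefix `idx` while collection.xconf imports it as `nav`. This works because only the namespace URI has to match.

```xquery
xquery version "3.1";

module namespace idx = "http://teipublisher.com/index";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: $root is the indexed element (here a tei:entry),
   $field selects which value to compute :)
declare function idx:get-metadata($root as element(), $field as xs:string) {
    switch ($field)
        case "pronunciation" return
            $root/tei:form[@type = ("lemma", "variant")]/tei:pron/string()
        case "domain" return
            (: assumption: domains are encoded as tei:usg[@type = "domain"] :)
            $root//tei:usg[@type = "domain"]/string()
        case "sortKey" return
            lower-case(string(($root/tei:form[@type = "lemma"]/tei:orth)[1]))
        default return
            ()
};
```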
- automatically, when
  - uploading/inserting or removing data
  - updating data (update insert | delete | replace | rename)
  - doesn't work if the <text> element contains an XPath with a predicate, like //tei:list[@type='index']
- programmatically
  - xmldb:reindex('/db/apps/app/data/')
  - only a user with admin rights can reindex
Indexing
- manually
  - xst execute "xmldb:reindex('/db/apps/exist-db-lucene/data/dictionaries')" --config admin.xstrc (doesn't always work)
  - from eXide (always works)
    - be logged in as an admin
    - save the collection.xconf file (with or without modifications)
    - click the OK button in the dialog
Indexing
Indexing (eXide)


<field name="domain" expression="nav:get-metadata(., 'domain')" />
- used for full-text searching
- act like properties of the indexed (parent) node
- hold not only text, but any atomic type (xs:date, xs:dateTime, xs:time, xs:integer, xs:decimal...)
- can be computed
- can be taken from a different document
- can be taken from a different part of the same document
ft:field($item, "domain")
Fields
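A field can be searched by name with Lucene's `field:term` syntax, as `reversal:` is searched elsewhere in this deck. Its stored value can then be read back with `ft:field`. This is a sketch assuming the `domain` field from the slide above. The search term "botany" is a made-up example value.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
(: search the domain field itself; "botany" is a hypothetical value :)
let $hits := collection($collection)//tei:entry[
    ft:query(., "domain:botany", map { "fields": "domain" })
]
for $hit in $hits
(: ft:field returns the value(s) stored in the index for this hit :)
return <hit id="{$hit/@xml:id}" domain="{ft:field($hit, 'domain')}"/>
```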
<field name="sortKey" expression="nav:get-metadata(., 'sortKey')" binary="yes" />
- used for sorting or filtering
- content can be retrieved, but not queried
ft:binary-field($entry, "sortKey", "xs:string")
Binary Fields
<facet dimension="domain" expression="nav:get-metadata(., 'domain')" />
- used for filtering (by the values present in the search result)
- field and facet can use the same expression (and thus the same values)
- can be hierarchical
- available only if ft:query function is used
Facets
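Besides filtering, facets can be counted. This sketch uses `ft:facets` to list the values of the `domain` dimension defined above, with the number of hits for each value. Passing `()` as the query selects every indexed entry. The limit of 10 values is an arbitrary choice.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
(: facets are only available on ft:query results :)
let $hits := collection($collection)//tei:entry[ft:query(., ())]
(: map of facet value -> number of hits, at most 10 values :)
let $counts := ft:facets($hits, "domain", 10)
return
    map:for-each($counts, function($value, $count) {
        <facet value="{$value}" count="{$count}"/>
    })
```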
<ref xml:lang="en" type="reversal">many times</ref>
- the whole text is indexed, but individual words are searched
  - //tei:entry[ft:query(., " reversal: many ")]
- search for a combination of words (a phrase) using double quotes ("")
  - //tei:entry[ft:query(., ' reversal: "many times" ')]
- search with a regular expression using slashes (/regex/)
  - //tei:entry[ft:query(., " reversal: /[Bb]iologic.*/ ")]
- search parts of words using wildcards (* and ?), outside of quotes
  - //tei:entry[ft:query(., " reversal: time* ")]
Searching
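The operators above can be combined. This sketch uses Lucene's boolean syntax: both clauses, grouped on the `reversal` field, must match. It then orders the hits by relevance with `ft:score`. The query string only illustrates the syntax.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
(: both terms must occur in the reversal field; time* is a wildcard term :)
let $hits := collection($collection)//tei:entry[
    ft:query(., "reversal:(many AND time*)")
]
for $hit in $hits
(: best matches first :)
order by ft:score($hit) descending
return $hit
```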
- return fields (for further processing)
- //tei:entry[ft:query(., (), map { "fields": ("sortKey") })]
declare namespace tei = "http://www.tei-c.org/ns/1.0";
let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
let $hits := collection($collection)//tei:entry[
ft:query(., "reversal:although", map { "fields" : "sortKey" } )
]
for $hit in $hits
order by ft:field($hit, "sortKey")
return $hit
Using fields
declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare namespace exist = "http://exist.sourceforge.net/NS/exist";
let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
let $hits := collection($collection)//tei:entry[ft:query(., "reversal:although")]
for $hit in $hits
let $expanded := util:expand($hit)
return $expanded
Result (excerpt):
<ref xml:lang="en" type="reversal">
  <exist:match>although</exist:match>
</ref>
Highlighting
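Instead of expanding the whole entry, the KWIC (keyword in context) module can return only the text around each match. This is a sketch. The context width of 40 characters is an arbitrary choice.

```xquery
xquery version "3.1";

import module namespace kwic = "http://exist-db.org/xquery/kwic";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
let $hits := collection($collection)//tei:entry[ft:query(., "reversal:although")]
for $hit in $hits
(: one summary per hit, with about 40 characters of context around each match :)
return kwic:summarize($hit, <config width="40"/>)
```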
<gram type="pos" expand="Adjective">adj</gram>
- can be used only on the result of a full-text search (ft:query())
let $options := map {
"facets": map {
"partOfSpeech": ("subst", "v")
}
}
//tei:entry[ft:query(., (), $options)]
Filtering (Facets)
- edit collection.xconf or index.xql
- if collection.xconf is modified, either
  - use
    xst upload --include "*.xconf" --verbose --apply-xconf ./data/dictionaries/ /db/apps/exist-db-lucene/data/dictionaries --config .existdb.json
    and then close and reopen collection.xconf in eXide
  - or save collection.xconf in eXide and apply the reindexing
- watch the eXist-db log for errors
- look in Monex or the report module for the content of the field/facet
Debugging
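A quick query-based check can complement Monex. This sketch compares what the index function computes now with what is stored in the index for the first ten entries. A mismatch suggests the index is stale, for example after editing index.xql without reindexing. The module location is an assumption derived from `at="../../index.xql"`. The `domain` field is the one from the earlier slides.

```xquery
xquery version "3.1";

(: assumption: index.xql sits in the app root :)
import module namespace idx = "http://teipublisher.com/index"
    at "xmldb:exist:///db/apps/exist-db-lucene/index.xql";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $collection := "/db/apps/exist-db-lucene/data/dictionaries"
(: () as query: every indexed entry, with the domain field loaded :)
let $hits := collection($collection)//tei:entry[
    ft:query(., (), map { "fields": "domain" })
]
for $hit in subsequence($hits, 1, 10)
return
    <entry id="{$hit/@xml:id}"
           computed="{idx:get-metadata($hit, 'domain')}"
           indexed="{ft:field($hit, 'domain')}"/>
```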
indexes in Monex

indexes (report module)

XARs
20min
- build tools
  - Ant
  - Maven
  - gulp-exist
- testing
  - XQSuite
  - JUnit
  - end-to-end
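A minimal XQSuite test module looks like this. The module namespace, prefix and function are hypothetical. Annotations in the XQSuite namespace supply the arguments and the expected result.

```xquery
xquery version "3.1";

(: hypothetical test module, e.g. stored as /db/apps/my-app/test/upper-test.xqm :)
module namespace ut = "http://example.org/ns/upper-test";

declare namespace test = "http://exist-db.org/xquery/xqsuite";

(: XQSuite calls the function with "hello" and compares the result with "HELLO" :)
declare
    %test:args("hello")
    %test:assertEquals("HELLO")
function ut:upper($s as xs:string) as xs:string {
    upper-case($s)
};
```

Such a module is typically run from another query with `test:suite(inspect:module-functions(xs:anyURI("/db/apps/my-app/test/upper-test.xqm")))`. The XQSuite module must be imported there from `resource:org/exist/xquery/lib/xqsuite/xqsuite.xql`.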
Ant setup

gulp-exist example
Java
20min
fn:invisible-xml
https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-invisible-xml

fn:invisible-xml(
    $grammar as item()? := (),
    $options as map(*)? := {}
) as fn(xs:string) as item()
- Use Markup Blitz parser
- The invisible-xml() function for eXist
- Demo!
- Bonus:
  - Remote debugging
- To do

https://github.com/nverwer/exist/tree/ixml/exist-core
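A usage sketch follows. It assumes the eXist fork linked above exposes `fn:invisible-xml` with the QT4 signature shown, so that the call returns a parser function that is then applied to the input string. The date grammar is a made-up example. In it, `-'-'` drops the separators and `-d` suppresses the wrapper element for each digit. The result would be expected to be a `date` element with `year`, `month` and `day` children.

```xquery
(: assumption: fn:invisible-xml is available, as in the QT4 draft :)
let $parse-date := fn:invisible-xml("
    date: year, -'-', month, -'-', day.
    year: d, d, d, d.
    month: d, d.
    day: d, d.
    -d: ['0'-'9'].
")
(: $parse-date is a function: string in, parse tree out :)
return $parse-date("2024-01-31")
```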
eXist-db 7
5min
Java 17
eXist-db 4.x.x, 5.x.x, and 6.x.x require JDK 8
xs:string#1("hello")
BugFixes (1)
- [5067] XMLRPC: non ".xml" files always stored as binary
- [5046] Fix a bug in Node Path equality
- [5018] EXPath package: Fix XQuery transient imports issue
- [4996] Fixes regarding circular imports
- [3207] Fix Lucene Index optimisations and Matches (facets)
- [4980] Correct function signatures that return empty sequences
- [4973] Fix cardinality issue cast to function: () => xs:string()
BugFixes (2)
- [4901] Make EXPath (Java) packages portable between filesystems
- [4900] Fix castable as for untyped atomic values
- [4609] Allow fn:transform to load stylesheets from the database
- [4864] Spec compliant: fn:replace, fn:tokenize, fn:analyze-string
- [4850] Tails of subsequences are off by one
- [2102] NPE in match highlight with Lucene & NGram
- [4741] Fixes for CDATA: doc('has-cdata.html')//script/node()/name()
- [4703] Fix fn:serialize issues
- Fix for memory leak triggers/restore
- Fix for defragmentation issue XUpdate + XQuery Update
- Function type checks
- The KWIC module will be rewritten
- The SPARQL extension will be revived (yes, there is one)
OUTLOOK
time for discussions
enjoy your lunch
5min
exist-db - XMLPrague pre-conf 2024
By Juri Leino