Acumen
Indexing and Advanced Search
This slideshow: http://slides.com/the8bitsquid/acumen
Indexing
[Diagram: Browser, Repo, ???]
Advanced Search
Indexed Fields
Advanced Search
Lucene Syntax
Lucene is the search engine framework on which Solr is built.
More details at http://lucene.apache.org/core/2_9_4/queryparsersyntax.html
- Field
- Wildcards
- Fuzzy
- Proximity
- Ranges
- Booleans
- Grouping
- Field Grouping
- Escaping special characters
Search using
Field Search
field_name:query
// In raw Lucene syntax, spaces matter!
subject:Eli Bowen
Searches the 'subject' field for "Eli" and the default search field for "Bowen".
// Acumen helps a little
subject:Eli Bowen
Searches the 'subject' field for both "Eli" AND "Bowen".
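A minimal Python sketch of the difference, assuming whitespace-separated terms and 'textAll' as the default field (taken from the qf setting under How Acumen Searches). `parse_raw` and `parse_acumen_style` are hypothetical helpers, not Lucene's QueryParser or Acumen's actual code; they only show which field each term lands in, not how the terms are combined.

```python
def parse_raw(query, default_field="textAll"):
    """Raw Lucene behaviour (simplified): a field prefix binds only to the
    term right after the colon; bare terms go to the default field."""
    clauses = []
    for token in query.split():
        field, sep, term = token.partition(":")
        if sep:
            clauses.append((field, term))
        else:
            clauses.append((default_field, token))
    return clauses


def parse_acumen_style(query, default_field="textAll"):
    """Hypothetical 'helpful' variant: bare terms inherit the most recent
    field prefix, so every term after subject: is searched in subject."""
    clauses = []
    current = default_field
    for token in query.split():
        field, sep, term = token.partition(":")
        if sep:
            current = field
            clauses.append((field, term))
        else:
            clauses.append((current, token))
    return clauses


print(parse_raw("subject:Eli Bowen"))           # [('subject', 'Eli'), ('textAll', 'Bowen')]
print(parse_acumen_style("subject:Eli Bowen"))  # [('subject', 'Eli'), ('subject', 'Bowen')]
```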
Wildcards
? = single character
* = 0 or more characters
test*
te*t
te?t
test?
?est //Not allowed
Raw Lucene does not allow leading wildcards
subject:Eli Bowe*
subject:Eli B*n
subject:Eli Bo?en
subject:El? Bowen
subject:?li Bowen //Allowed
Acumen allows leading wildcards plus the standard Lucene wildcards
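To pin down what `?` and `*` match, here is a sketch that turns a wildcard term into a regular expression and tests it against a single indexed term. `wildcard_match` is a hypothetical helper; its `allow_leading` flag only loosely mirrors Lucene QueryParser's leading-wildcard setting to show the raw Lucene vs. Acumen difference described above.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard term: '?' -> exactly one character,
    '*' -> zero or more characters; everything else is literal."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def wildcard_match(pattern, term, allow_leading=False):
    """True if the whole indexed term matches the wildcard pattern.
    Without allow_leading, a leading '?' or '*' is rejected, as raw Lucene does."""
    if not allow_leading and pattern[:1] in ("?", "*"):
        raise ValueError("leading wildcard not allowed: " + pattern)
    return re.fullmatch(wildcard_to_regex(pattern), term, re.DOTALL) is not None


for pattern in ["test*", "te*t", "te?t", "test?"]:
    print(pattern, [t for t in ["test", "tests", "tent", "tet"] if wildcard_match(pattern, t)])
```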
Fuzzy Search
~ = match similar terms
roam~
roam~0.5 // Default similarity (same as roam~)
roam~0.1 // Looser: also matches less similar terms
roam~0.8 // Stricter: only matches very similar terms
Fuzzy searches use Levenshtein (edit) distance to measure the similarity between words.
Adjust the minimum similarity with a value between 0 and 1; higher values require closer matches.
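The similarity threshold can be made concrete with a short Python sketch. It assumes the formula used by Lucene 2.9's FuzzyQuery as we understand it: similarity = 1 - edit distance / (length of the shorter term), with a term matching when its similarity is strictly greater than the threshold. Treat that formula as an assumption; later Lucene versions express fuzziness as a raw edit distance instead. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_similarity(query_term, indexed_term):
    """Assumed Lucene 2.9-style score: 1 - distance / shorter length."""
    shorter = min(len(query_term), len(indexed_term))
    if shorter == 0:
        return 0.0
    return 1.0 - levenshtein(query_term, indexed_term) / shorter


def fuzzy_match(query_term, indexed_term, min_similarity=0.5):
    """roam~ uses the default 0.5; roam~0.8 passes min_similarity=0.8."""
    return fuzzy_similarity(query_term, indexed_term) > min_similarity


print(fuzzy_similarity("roam", "foam"))  # 0.75
```

Under this model roam~ accepts "foam" (0.75) but roam~0.8 rejects it, and "rooms" (distance 2, similarity 0.5) only matches once the threshold is lowered, e.g. roam~0.1.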
Proximity in Phrase Search
~[prox] = the phrase's terms may be up to [prox] words apart
// Terms can be at most
// 20 words apart
"sent the gin"~20
Proximity is measured by how many words not in the search phrase separate the terms of the search phrase.
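A rough Python sketch of how the slop is counted, assuming the model Lucene's sloppy phrase matching uses for phrases without repeated terms: each term's position is shifted back by its offset in the phrase, and the spread between the smallest and largest shifted positions is the slop needed. `phrase_slop_needed`, `phrase_match` and the sample sentence are illustrative only.

```python
from itertools import product


def phrase_slop_needed(tokens, phrase):
    """Smallest slop at which `phrase` matches `tokens`, or None if a phrase
    term is missing. Simplified model; ignores repeated phrase terms."""
    positions = []
    for term in phrase:
        hits = [i for i, tok in enumerate(tokens) if tok == term]
        if not hits:
            return None
        positions.append(hits)
    best = None
    for combo in product(*positions):
        # Shift each position back by the term's offset within the phrase.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        if best is None or spread < best:
            best = spread
    return best


def phrase_match(text, phrase, slop=0):
    """True if "phrase"~slop would match text under the model above."""
    needed = phrase_slop_needed(text.lower().split(), phrase.lower().split())
    return needed is not None and needed <= slop


text = "she sent the bottle of gin home"
print(phrase_slop_needed(text.split(), ["sent", "the", "gin"]))  # 2
```

Here "bottle of" separates "the" and "gin", so the phrase needs a slop of 2 and "sent the gin"~20 matches easily. Swapping two adjacent terms also costs 2 under this model.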
Range Searches
[ ] = inclusive range, { } = exclusive range
// Non-string date fields require
// complete ISO 8601 date syntax
date_tsf:[1970-12-31T00:00:00.00Z TO 1975-03-06T00:00:00Z]
number_field:[0 TO 10]
Ranges between numbers and dates
date:{1960 TO 1970}
name:{Wade TO Woodward}
No integer field types are currently indexed, but you can search ranges between strings.
String ranges compare terms character by character rather than numerically, so ranges on the string 'date' field are unreliable.
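The string-range problem is easy to see in a sketch. This assumes plain ASCII terms, for which Python's string ordering agrees with Lucene's term ordering; `in_range` is a hypothetical helper covering both the inclusive [ ] and exclusive { } forms.

```python
def in_range(value, low, high, inclusive=True):
    """[low TO high] when inclusive, {low TO high} when exclusive.
    Strings compare character by character, like indexed string terms."""
    if inclusive:
        return low <= value <= high
    return low < value < high


# Strings are not numbers: '100' sorts between '10' and '20'.
print(in_range("100", "10", "20"))   # True
print(in_range(100, 10, 20))         # False
# A more detailed date string sorts after the bare year it starts with.
print(in_range("1970-06-01", "1960", "1970"))  # False
print(in_range("Walker", "Wade", "Woodward", inclusive=False))  # True
```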
Boolean Operators
Boolean operators combine terms through logic. Lucene supports AND, "+", OR, NOT and "-" as Boolean operators (note: the word operators must be ALL CAPS).
students AND snow
students OR quad
(students OR quad) AND snow
"AND" and "OR" follow typical Boolean logic.
students +snow
'+' makes only the term it precedes required.
//These two act the same
students -quad
NOT quad students
'-' and 'NOT' behave the same.
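The required/optional/prohibited behavior can be modelled on a set of document terms. This Python sketch assumes the default OR operator and handles only flat queries with `+`, `-`, `NOT` and bare terms; `parse_clauses` and `matches` are hypothetical helpers, and real Lucene also uses the optional terms for scoring, which is not modelled here.

```python
def parse_clauses(query):
    """Split a flat query into (required, optional, prohibited) term sets.
    Handles '+term', '-term', 'NOT term' and bare terms only."""
    required, optional, prohibited = set(), set(), set()
    tokens = query.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "NOT" and i + 1 < len(tokens):
            prohibited.add(tokens[i + 1])  # NOT applies to the next term
            i += 2
            continue
        if tok.startswith("+"):
            required.add(tok[1:])
        elif tok.startswith("-"):
            prohibited.add(tok[1:])
        else:
            optional.add(tok)
        i += 1
    return required, optional, prohibited


def matches(doc_terms, query):
    """All required terms present, no prohibited term present; with no
    required terms, at least one optional term must be present."""
    required, optional, prohibited = parse_clauses(query)
    if prohibited & doc_terms or not required <= doc_terms:
        return False
    if required:
        return True  # optional terms would then only affect ranking
    return bool(optional & doc_terms)


print(matches({"snow"}, "students +snow"))  # True
```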
Field and Term Grouping
Use parentheses to group terms for field matching or Boolean logic
title:(+students quad) AND snow
Field Grouping
(students OR quad) AND snow
Term Grouping
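Field grouping amounts to applying the field prefix to every term inside the parentheses. A hypothetical helper, assuming a flat group with no nested parentheses or quoted phrases:

```python
def expand_field_group(field, group):
    """Rewrite the inside of field:( ... ) as individual field:term clauses,
    keeping +/- modifiers and AND/OR/NOT operators as they are."""
    out = []
    for tok in group.split():
        if tok in ("AND", "OR", "NOT"):
            out.append(tok)
        elif tok[0] in "+-":
            out.append(tok[0] + field + ":" + tok[1:])
        else:
            out.append(field + ":" + tok)
    return " ".join(out)


print(expand_field_group("title", "+students quad"))  # +title:students title:quad
```

So title:(+students quad) AND snow behaves like (+title:students title:quad) AND snow.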
Escaping Special Characters with '\'
Special characters: + - && || ! ( ) { } [ ] ^ " ~ * ? : \
\(1\+1\)\:2
Searches for (1+1):2
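A small Python helper for escaping user input before it is placed in a query, following the character list above. It escapes each `&` and `|` individually (an unneeded backslash is harmless to the query parser) and leaves whitespace alone; `escape_lucene` is a hypothetical name.

```python
# Characters from the slide; && and || are covered by escaping & and |.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape_lucene(text):
    """Prefix every Lucene special character with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


print(escape_lucene("(1+1):2"))  # \(1\+1\)\:2
```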
How Acumen Searches
Find the haystack in the stack of haystacks
'qf' => 'textAll',
'mm' => '100%',
'pf' => 'transcript abstract description',
'ps' => '7'
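Default query field (qf)
Minimum should match (mm): 100% means every term must match
Phrase fields (pf), phrase slop (ps): boost documents where the terms appear as a phrase, within 7 positions, in these fields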
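How many terms a percentage mm actually requires comes down to rounding. Per the Solr edismax documentation, a percentage is computed against the number of optional clauses and rounded down, and negative values say how many may be missing. The Python sketch below covers those simple forms (not conditional specs like `3<90%`); `mm_required` is a hypothetical helper.

```python
def mm_required(optional_clauses, mm):
    """How many optional clauses must match for a simple mm spec:
    '2', '-1', '75%' or '-25%'. Conditional specs ('3<90%') are not handled."""
    n = optional_clauses
    spec = mm.strip()
    if spec.endswith("%"):
        pct = int(spec[:-1])
        if pct >= 0:
            required = n * pct // 100          # rounded down
        else:
            required = n - (n * -pct // 100)   # -pct% of clauses may be missing
    else:
        value = int(spec)
        required = value if value >= 0 else n + value  # -k: k may be missing
    # Clamp to the range 0..n.
    return max(0, min(n, required))


print(mm_required(3, "75%"), mm_required(3, "100%"))  # 2 3
```

With the 75% used for the title and collection boosts, a three-word query needs only two of the words to match, while 100% needs all three.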
{!edismax qf=title^200 mm=75% v=$q bq=}
{!edismax qf=subject^100 mm=100% v=$q bq=}
{!edismax qf=genre^100 mm=100% v=$q bq=}
{!edismax qf=collection^50 mm=75% v=$q bq=}
{!edismax qf=date mm=100% v=$q bq=}
{!edismax qf=physdesc mm=100% v=$q bq=}
{!edismax qf=name^150 mm=100% v=$q bq=}
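Boost queries (bq): each re-runs the user's query (v=$q) against one field with its own boost (^) and mm; the empty bq= keeps the nested query from inheriting the request's own boost queries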
Uses the Extended DisMax (edismax) query parser: https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
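Putting the pieces together, a sketch of the request parameters these settings imply. It assumes the array above is sent as Solr request parameters with `defType=edismax`, that the user's text goes in `q` (which `v=$q` dereferences), and that each nested query is sent as a separate `bq` value; `BOOST_FIELDS`, `boost_query` and `acumen_style_params` are hypothetical names. No request is sent; the code only builds the query string.

```python
from urllib.parse import urlencode

# (field, boost, mm) triples copied from the slide; None = no ^boost.
BOOST_FIELDS = [
    ("title", 200, "75%"),
    ("subject", 100, "100%"),
    ("genre", 100, "100%"),
    ("collection", 50, "75%"),
    ("date", None, "100%"),
    ("physdesc", None, "100%"),
    ("name", 150, "100%"),
]


def boost_query(field, boost, mm):
    """Build one nested edismax query in Solr local-params syntax."""
    qf = field if boost is None else field + "^" + str(boost)
    return "{!edismax qf=" + qf + " mm=" + mm + " v=$q bq=}"


def acumen_style_params(user_query):
    """Request parameters as (name, value) pairs; bq repeats once per field."""
    params = [
        ("defType", "edismax"),
        ("q", user_query),
        ("qf", "textAll"),
        ("mm", "100%"),
        ("pf", "transcript abstract description"),
        ("ps", "7"),
    ]
    params.extend(("bq", boost_query(*spec)) for spec in BOOST_FIELDS)
    return params


print(urlencode(acumen_style_params("eli bowen")))
```

Passing a list of pairs to `urlencode` (rather than a dict) is what lets `bq` appear several times in one request.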
Acumen indexing and search
By the8bitsquid