Acumen

Indexing and Advanced Search

Indexing

Browser

Repo

???

Advanced Search

Indexed Fields

Advanced Search

Lucene Syntax

Lucene is the search engine framework on which Solr is built. 

  • Field
  • Wildcards
  • Fuzzy
  • Proximity
  • Ranges
  • Booleans
  • Grouping
  • Field Grouping
  • Escaping special characters

Search using

Field Search

field_name:query

// In raw Lucene syntax,
// spaces matter!
subject:Eli Bowen

Searches the 'subject' field for "Eli" and the default search field for "Bowen"

// Acumen helps a little
subject:Eli Bowen

Searches the 'subject' field for "Eli" AND "Bowen"
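
The difference between the two behaviours can be sketched in Python. This is not Acumen's actual code: the helper name `group_field_terms` is hypothetical, and the sketch only shows one way a bare `field:term term` query could be rewritten so that every term applies to the named field.

```python
import re

def group_field_terms(query: str) -> str:
    """Rewrite 'field:a b' as 'field:(a AND b)'.

    Hypothetical helper for illustration, not taken from Acumen.
    """
    match = re.fullmatch(r"(\w+):(.+)", query.strip())
    if not match:
        return query  # no field prefix, leave the query untouched
    field, rest = match.groups()
    terms = rest.split()
    if len(terms) == 1:
        return f"{field}:{terms[0]}"
    # Group all terms under the field so none falls back to the default field
    return f"{field}:({' AND '.join(terms)})"

print(group_field_terms("subject:Eli Bowen"))  # subject:(Eli AND Bowen)
```

The grouped form uses the field grouping syntax covered later in this deck.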

Wildcards

? = Single character
* = 0 or more characters

// Raw Lucene
test*
te*t
te?t
test?
?est // Not allowed

Raw Lucene does not allow leading wildcards

// Acumen
subject:Eli Bowe*
subject:Eli B*n
subject:Eli Bo?en
subject:El? Bowen
subject:?li Bowen // Allowed

Acumen allows leading wildcards plus typical Lucene wildcards
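
To make the two wildcard characters concrete, here is a minimal Python sketch that translates a wildcard term into a regular expression. It mimics the matching rules only; it is not how Lucene or Acumen evaluates wildcards internally, and the function names are made up for this example.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate wildcards: '?' is exactly one character, '*' is zero or more."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts))

def matches(pattern: str, term: str) -> bool:
    """True when the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None

print(matches("te?t", "test"))  # True
print(matches("te?t", "tet"))   # False: '?' needs exactly one character
print(matches("te*t", "tet"))   # True: '*' may match nothing
print(matches("?li", "Eli"))    # True: a leading wildcard
```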

Fuzzy Search

~ = similar terms should be matched

roam~

roam~0.5 // Default similarity
roam~0.1 // Matches less similar terms
roam~0.8 // Matches only more similar terms

Fuzzy searches use Levenshtein distance to measure similarity between words

Adjust the "similarity" threshold using values between 0 and 1
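
A small Python sketch of the idea: compute the Levenshtein distance, then turn it into a 0 to 1 score. The normalization used here (one minus the distance divided by the shorter word's length) and the `>=` comparison are assumptions made for illustration; the exact formula Solr applies may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes, or substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the shorter word."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 1.0 if a == b else 0.0
    return 1.0 - levenshtein(a, b) / shorter

def fuzzy_match(term: str, candidate: str, min_similarity: float = 0.5) -> bool:
    """Assumed rule: a candidate matches when its score reaches the threshold."""
    return similarity(term, candidate) >= min_similarity

print(levenshtein("roam", "foam"))       # 1
print(similarity("roam", "foam"))        # 0.75
print(fuzzy_match("roam", "foam", 0.5))  # True
print(fuzzy_match("roam", "foam", 0.8))  # False
```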

Proximity in Phrase Search

~[prox] = maximum distance, in words, between the terms of a phrase

// Terms can be at most
// 20 words apart
"sent the gin"~20

Proximity is measured by the number of words not in the search phrase that separate the terms of the search phrase
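
The measure described above can be sketched in Python as a count of the words that sit between the phrase terms. This is a simplification: it only considers terms in their original order, whereas Lucene's actual slop calculation also allows reordered terms. The function name and sample sentence are made up for this example.

```python
def within_proximity(text: str, phrase: str, slop: int) -> bool:
    """True if the phrase terms occur in order with at most `slop` words between them."""
    words = text.lower().split()
    terms = phrase.lower().split()
    if not terms:
        return False
    for start, word in enumerate(words):
        if word != terms[0]:
            continue
        pos, gaps, found = start, 0, True
        for term in terms[1:]:
            try:
                nxt = words.index(term, pos + 1)  # earliest later occurrence
            except ValueError:
                found = False
                break
            gaps += nxt - pos - 1  # words separating this term from the previous one
            pos = nxt
        if found and gaps <= slop:
            return True
    return False

text = "he sent word that the old gin was broken"
print(within_proximity(text, "sent the gin", 3))  # True: three words intervene
print(within_proximity(text, "sent the gin", 2))  # False
```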

Range Searches

Ranges over integer and date fields

// Non-string date types must use
// complete ISO 8601 date syntax
date_tsf:[1970-12-31T00:00:00.00Z TO 1975-03-06T00:00:00Z]
number_field:[0 TO 10]

Ranges between strings

date:{1960 TO 1970}
name:{Wade TO Woodward}

Square brackets include the endpoints; curly braces exclude them

Currently no integer field types are indexed, but you can search a range between strings

String ranges compare character by character, not numerically, so 'date' string ranges are unreliable
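
Why string ranges are unreliable for dates shows up in a few lines of Python, which compares strings character by character in the same way. The function names and sample values are made up for this example.

```python
def in_string_range(value: str, low: str, high: str) -> bool:
    """Exclusive range with string (character-by-character) comparison, like {low TO high}."""
    return low < value < high

def in_int_range(value: int, low: int, high: int) -> bool:
    """Exclusive range with numeric comparison."""
    return low < value < high

print(in_string_range("1965", "1960", "1970"))          # True, as expected
print(in_string_range("197", "1960", "1970"))           # True, although 197 is not between 1960 and 1970
print(in_int_range(197, 1960, 1970))                    # False
print(in_string_range("Williams", "Wade", "Woodward"))  # True
```

The second line is the problem case: a string such as "197" sorts between "1960" and "1970", so a string-typed 'date' field can return values a numeric range never would.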

Boolean Operators

students AND snow
students OR quad
(students OR quad) AND snow

"AND" and "OR" follow typical boolean logic

Boolean operators combine terms through logic. Lucene supports AND, "+", OR, NOT and "-" as Boolean operators. (Note: Boolean operators must be ALL CAPS.)

students +snow

'+' requires only the term that follows it

// These two act the same
students -quad
NOT quad students

'-' and 'NOT' behave the same
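
The operators map naturally onto set operations. The Python sketch below uses a tiny made-up collection of four documents to show which documents each query would match; it ignores ranking and is not how Lucene evaluates queries internally.

```python
# Hypothetical documents, each reduced to the set of words it contains
docs = {
    1: {"students", "snow", "quad"},
    2: {"students", "quad"},
    3: {"snow"},
    4: {"students"},
}

def having(term: str) -> set:
    """IDs of the documents that contain the term."""
    return {doc_id for doc_id, words in docs.items() if term in words}

students, snow, quad = having("students"), having("snow"), having("quad")

print(sorted(students & snow))           # students AND snow           -> [1]
print(sorted(students | quad))           # students OR quad            -> [1, 2, 4]
print(sorted((students | quad) & snow))  # (students OR quad) AND snow -> [1]
print(sorted(students - quad))           # students -quad              -> [4]
```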

Field and Term Grouping

Use parentheses to group terms with field matching or boolean logic

Field Grouping

title:(+students quad) AND snow

Term Grouping

(students OR quad) AND snow

Escaping Special Characters with '\'

Special characters: + - && || ! ( ) { } [ ] ^ " ~ * ? : \

Search for (1+1):2

\(1\+1\)\:2
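
Escaping by hand is error-prone, so a small helper is handy. The Python sketch below is a minimal version with a hypothetical function name. As a simplification it escapes every single '&' and '|' rather than only the doubled forms.

```python
# Characters with special meaning in Lucene query syntax
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')

def escape_lucene(text: str) -> str:
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

print(escape_lucene("(1+1):2"))  # \(1\+1\)\:2
```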

How Acumen Searches

Find the haystack in the stack of haystacks

'qf' => 'textAll',                          // default 'query field' (qf)
'mm' => '100%',                             // minimum should match (mm)
'pf' => 'transcript abstract description',  // phrase fields (pf)
'ps' => '7'                                 // phrase slop (ps)

Boost queries (bq)

{!edismax qf=title^200 mm=75% v=$q bq=}
{!edismax qf=subject^100 mm=100% v=$q bq=}
{!edismax qf=genre^100 mm=100% v=$q bq=}
{!edismax qf=collection^50 mm=75% v=$q bq=}
{!edismax qf=date mm=100% v=$q bq=}
{!edismax qf=physdesc mm=100% v=$q bq=}
{!edismax qf=name^150 mm=100% v=$q bq=}
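
Put together, the settings above amount to a list of Solr request parameters. The Python sketch below assembles them for a user query. It is an illustration under assumptions, not Acumen's code: the function name is made up, and `defType=edismax` is assumed from the `{!edismax ...}` boost queries rather than stated in the slides.

```python
from urllib.parse import urlencode

# (field, boost suffix, minimum-should-match) taken from the boost queries above
BOOST_FIELDS = [
    ("title", "^200", "75%"),
    ("subject", "^100", "100%"),
    ("genre", "^100", "100%"),
    ("collection", "^50", "75%"),
    ("date", "", "100%"),
    ("physdesc", "", "100%"),
    ("name", "^150", "100%"),
]

def build_params(user_query: str) -> list:
    """Return the request parameters as (name, value) pairs; bq repeats once per field."""
    params = [
        ("defType", "edismax"),  # assumption, not shown in the slides
        ("q", user_query),
        ("qf", "textAll"),
        ("mm", "100%"),
        ("pf", "transcript abstract description"),
        ("ps", "7"),
    ]
    for field, boost, mm in BOOST_FIELDS:
        # v=$q reuses the main query text inside each boost query
        params.append(("bq", f"{{!edismax qf={field}{boost} mm={mm} v=$q bq=}}"))
    return params

print(urlencode(build_params("Eli Bowen")))
```

Each boost query reruns the user's text against a single field, so a match in 'title' (boost 200) raises a document's score more than a match in 'collection' (boost 50).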

Acumen indexing and search

By the8bitsquid
