Power Up your Search

with CBElasticSearch

About Me

  • Software Engineer
  • 10 months at Ortus
  • Central NY
  • Learning Spanish

Michael Born

@michaelborn_me

The State of Search

Expectations

  • Search everything
  • Faster search
  • Accurate results
  • Typo correction
  • Auto-complete

Current Solutions

  • CFSearch
  • Database search

CFSearch

"CFSearch is a 2010 solution to a 2020 problem"

- Michael Born

CFSearch

  • Built into the CF engine
  • Limited query language
  • On-instance only
  • No built-in scaling mechanism

Database Search

"Search via relational database is a 2001 solution to a 2020 problem"

- Michael Born

Database Search

  • Databases fit a completely different use case
  • Querying full text is painfully slow
  • No natural language search
  • No ordering by search relevance

ElasticSearch

A 2020 solution to a 2020 problem

What Is ElasticSearch?

  • An open-source search server
  • built on Lucene
  • distributed
  • API-based
  • JSON-rich

ElasticSearch Benefits

  • Fast
  • Intelligent
  • Built to scale
  • Not limited by DB constraints

What Is CBElasticSearch?

A CFML wrapper for ElasticSearch to facilitate:

  • Connecting to Elasticsearch
  • Managing indices
  • Managing documents
  • Searching the index

Installation

and configuration

Installing ElasticSearch

docker run -d \
    -p 9200:9200 \
    -e "discovery.type=single-node" \
    --name="myApp_ES" \
    elasticsearch:7.5.1

Install CBElasticSearch

box install cbelasticsearch

// config/ColdBox.cfc

moduleSettings = {
   "cbElasticsearch" : {
       "hosts" : [ {
           "serverProtocol" : "http",
           "serverName"     : "127.0.0.1",
           "serverPort"     : "9200"
       } ],
       "defaultIndex" : "reviews"
   }
};

Key Concepts

What Is An Index?

  • Generic: A collection of documents
  • Specific: A mapping of tokens found within those documents
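
The "mapping of tokens" idea is often called an inverted index. The following is a minimal Python sketch of that idea only, not Elasticsearch's or Lucene's actual implementation; the `reviews` sample data and the `build_inverted_index` helper are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


# Hypothetical sample documents, keyed by id.
reviews = {
    1: "burnt toast again",
    2: "perfect toast",
    3: "great coffee",
}

index = build_inverted_index(reviews)
print(sorted(index["toast"]))
```

Looking up a token is then a single dictionary access rather than a scan of every document, which is the main reason full-text lookups can stay fast as the collection grows.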

Text Analysis (Indexing)

  • Breaks a text string into individual terms
  • May involve "stemming"
  • May remove "stopwords"

Text Analysis

"the quick brown fox jumps over the lazy dog"

Terms: the, quick, brown, fox, jumps, over, lazy, dog
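
A rough Python sketch of the tokenizing step, assuming a very simple analyzer that lowercases the text and splits on anything that is not a letter or digit. Real Elasticsearch analyzers are configurable and do more; the `analyze` helper here is hypothetical.

```python
import re


def analyze(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


tokens = analyze("The quick brown fox jumps over the lazy dog")

# "the" appears twice in the sentence but is stored as a single term.
unique_terms = sorted(set(tokens))
print(unique_terms)
```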

Stemming

Reduces keywords to their root form

running, runners → run
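
To make the idea concrete, here is a deliberately naive stemmer in Python. It is a toy written for this example, not the algorithm Elasticsearch uses; production stemmers apply many more rules and may stem some of these words differently.

```python
def toy_stem(word):
    """Strip a few common English suffixes. Naive on purpose, for illustration only."""
    word = word.lower()
    for suffix in ("ers", "ing", "er", "s"):
        # Only strip when at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final consonant, e.g. "runn" becomes "run".
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


stems = [toy_stem(w) for w in ("run", "running", "runners")]
print(stems)
```

Because all three words reduce to the same stem, a search for any one of them can match documents containing the others.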

Stopwords

A list of words within a language that are not relevant to most searches

"in the jungle, the mighty jungle" → mighty, jungle
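
A small Python sketch of stopword removal. The `STOPWORDS` set below is a tiny made-up list for illustration, not the list Elasticsearch ships with.

```python
import re

# A tiny illustrative stopword list (an assumption, not Elasticsearch's list).
STOPWORDS = {"in", "the", "a", "an", "of", "and"}


def remove_stopwords(text):
    """Tokenize the text, then drop any token found in STOPWORDS."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


terms = remove_stopwords("In the jungle, the mighty jungle")
print(terms)
```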

Mapping

A definition of fields and field types for a specific index.

{
  "mappings" : {
    "properties" : {
     "primaryTitle"     : { "type" : "text" },
     "titleType"        : { "type" : "keyword" },
     "runtime_mins"     : { "type" : "integer" },
     "startYear"        : { "type" : "integer" },
     "budget"           : { "type" : "integer" },
     "box_office_gross" : { "type" : "integer" }
    }
  }
}

Document

A single record within an index

{
   "primaryTitle": "The Fellowship Of The Ring",
   "titleType": "movie",
   "runtime_mins": 178,
   "startYear" : 2001,
   "budget" : 93000000,
   "box_office_gross": 887800000
}
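
Source data often stores numbers as formatted strings such as "93,000,000", which will not fit a field mapped as `integer`. One possible way to clean a record before indexing is sketched below in Python; the `to_int` and `normalize` helpers and the `INTEGER_FIELDS` list are hypothetical names chosen for this example.

```python
# Fields mapped as "integer" in the Mapping example above.
INTEGER_FIELDS = ("runtime_mins", "startYear", "budget", "box_office_gross")


def to_int(value):
    """Convert values such as "93,000,000" or 178 to a plain int."""
    if isinstance(value, int):
        return value
    return int(str(value).replace(",", ""))


def normalize(doc):
    """Return a copy of doc with every integer-mapped field converted."""
    cleaned = dict(doc)
    for field in INTEGER_FIELDS:
        if field in cleaned:
            cleaned[field] = to_int(cleaned[field])
    return cleaned


raw = {"primaryTitle": "The Fellowship Of The Ring", "budget": "93,000,000", "startYear": "2001"}
print(normalize(raw))
```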

API Demo

Indices

Creation and management

On Application Load...

component {

    /**
     * Initialize the ElasticSearch index on app load/reinit
     */
    void function afterConfigurationLoad( event, interceptData ){
        // create index...
    }
}

Create an Index

function buildMyIndex() {
  getInstance( "IndexBuilder@cbElasticsearch" )
      .new(
          "reviews",
          {
              "_doc" = {
                  "_all" = { "enabled" = false },
                  "properties" = {
                      "title" = { "type" = "text" },
                      "authorName" = { "type" = "text" },
                      "publishedDate" = { "type" = "date" },
                      "stars" = { "type" = "integer" },
                      "content" = { "type" = "text" }
                  }
              }
          }
      ).save();
}

MappingBuilder

function createBookIndex() {

    getInstance( "IndexBuilder@cbElasticsearch" )
        .new( "books", getInstance( "MappingBuilder@cbElasticsearch" )
            .create( function( mapping ) {
                mapping.text( "title" );
                mapping.object( "author", function( mapping ) {
                    mapping.text( "firstname" );
                    mapping.text( "lastname" );
                } );
            } )
        )
        // persist the new index, as in the previous example
        .save();

}

Creating A Document

function insertReview() {
  
  var bookReview = {
     "title" : "'Phantom': Lost in Hyperspace",
     "authorName" : "Rita Kempley",
     "publishedDate" : dateFormat( "1999-05-19", "yyyy-mm-dd" ),
     "stars" : "1",
     "content" : "The Empire strikes out..."
  };
  
  var document = getInstance( "Document@cbElasticsearch" )
     .new(
         index = "reviews",
         type = "_doc",
         properties = bookReview
     );
  document.save();
  
}

Searching

with CBElasticsearch

SearchBuilder

function searchByTitle( required string title ) {

    return getInstance( "SearchBuilder@cbElasticSearch" )
            .new( "news" )
            .match( "title", arguments.title )
            .execute();

}

Filter Vs Query

  • Two different ways to restrict search
  • All about context
  • Filter: Restrict without scoring
  • Query: Find and score
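
The difference can be sketched in a few lines of Python. This is an in-memory illustration only: the sample `docs` are made up, and the score here is a raw term count, whereas Elasticsearch's actual relevance scoring is considerably more sophisticated.

```python
# Hypothetical sample documents, keyed by id.
docs = {
    "a": "toast with butter",
    "b": "toast toast and more toast",
    "c": "fresh coffee",
}


def filter_match(docs, term):
    """Filter context: a yes/no answer per document, with no score."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}


def query_match(docs, term):
    """Query context: score each matching document, then rank by score."""
    scores = {doc_id: text.split().count(term) for doc_id, text in docs.items()}
    hits = [(doc_id, score) for doc_id, score in scores.items() if score > 0]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(filter_match(docs, "toast"))
print(query_match(docs, "toast"))
```

The filter returns an unordered set of matches, while the query returns the same documents ranked, with the document that mentions the term most often first.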

Filter

var results = getInstance( "SearchBuilder@cbElasticsearch" )
                .new( "books" )
                .filterMatch( "description", "toast" )
                .execute();

Query

var results = getInstance( "SearchBuilder@cbElasticsearch" )
                .new( "books" )
                .match( "description", "toast" )
                .execute();

Term Vs Match

  • Term: Perform an exact value search
  • Match: Perform a search based on text analysis
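
A minimal Python sketch of the distinction, assuming a simple analyzer that lowercases and splits on non-alphanumeric characters. The helper names are hypothetical and this is not how Elasticsearch is implemented internally.

```python
import re


def analyze(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(stored_value, search_value):
    """Term: the stored value must equal the search value exactly."""
    return stored_value == search_value


def match_matches(stored_text, search_text):
    """Match: analyze both sides, then look for at least one shared token."""
    return bool(set(analyze(stored_text)) & set(analyze(search_text)))


print(term_matches("movie", "Movie"))
print(match_matches("The Empire strikes out...", "EMPIRE"))
```

Term suits exact values such as ids, statuses, and keyword fields; match suits free text, where case and punctuation should not matter.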

Term Restriction

var result = getInstance( "SearchBuilder@cbElasticSearch" )
             .new( "reviews")
             .term( "stars", 5 )
             .execute();

Must Vs Should

  • Boolean syntax in Elasticsearch
  • Must: Each result must match all restrictions
  • Should: Each result should match some restriction(s)
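
In Elasticsearch's JSON Query DSL, these map onto the `must` and `should` arrays of a `bool` query. The Python sketch below builds such request bodies as plain dictionaries. The `bool_query` helper is hypothetical, and whether CBElasticsearch emits exactly this JSON has not been verified here.

```python
import json


def bool_query(must=(), should=()):
    """Build a bool query body from (field, text) pairs, using match clauses."""
    clause = {}
    if must:
        clause["must"] = [{"match": {field: text}} for field, text in must]
    if should:
        clause["should"] = [{"match": {field: text}} for field, text in should]
    return {"query": {"bool": clause}}


# AND: both clauses have to match.
and_body = bool_query(must=[("content", "star wars"), ("title", "phantom")])

# OR: either clause may match.
or_body = bool_query(should=[("content", "burnt"), ("content", "toast")])

print(json.dumps(and_body, indent=2))
```

When a `bool` query contains only `should` clauses, at least one of them has to match, which is what makes it behave like OR.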

Match Restriction

var result = getInstance( "SearchBuilder@cbElasticSearch" )
             .new( "reviews" )
             .match( "content", "burnt" )
             .execute();

Must Match: AND Clause

var result = getInstance( "SearchBuilder@cbElasticSearch" )
             .new( "reviews" )
             .mustMatch( "authorName", "Luis Majano" )
             .mustMatch( "content", "star wars" )
             .execute();

Should Match: OR Clause

var result = getInstance( "SearchBuilder@cbElasticSearch" )
             .new( "reviews" )
             .shouldMatch( "authorName", "Luis Majano" )
             .shouldMatch( "authorName", "Jon Clausen" )
             .execute();

Demo

Building a Yelp sample search

When To Use ElasticSearch

  1. Full-text search
  2. Structured data
  3. Future scaling
  4. Any complex search needs

When NOT to use ElasticSearch

  1. If you don't need full-text search
  2. If you don't have many records
  3. If you have no scalability needs

Don't Need It?

Don't Use It!

Questions?

Happy to answer any questions in the CBElasticsearch room

Thank You!
