Elasticsearch: Query Basics

Han Yi

April 30, 2018

RESTful API of Search

Basic Search
- URI based search
- Request body based search

GET products/_doc/_search?q=rating:5

POST products/_doc/_search
{
  "query": {
    "nested":{
      "path":"child",
      "score_mode":"max",
      "query":{
        "simple_query_string":{
          "query":"red hat",
          "fields":[
            "child.color^2.0",
            "child.name^1.0",
          ]
        }
      }
    }
  }
}

Filter and Query Context

Filter Context
- Does this document match this query clause?
- case 1: bool filter/must_not
- case 2: constant_score filter
- case 3: aggregation filter
- case 4: score functions
Query Context
- How well does this document match this query clause?

Filter Context Examples

//1. filter inner bool query
POST context/_doc/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "red hat"
        }      
      },
      "filter": {
        "term": {
          "context": "generic"
        }
      }
    }
  }
}

//2. constant_score
POST products/_doc/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "status": "active"
        }
      }
    }
  }
}

//3. aggregation filter
POST products/_doc/_search
{
  "aggs": {
    "nested_aggs": {
      "nested": {
        "path":"child"
      },
      "aggs": {
        "filtered_aggs": {
          "filter": {
            "bool": {
              "must": [
              {
                "term": {
                  "child.color":"red"
                }
              }
              ]
            }
          },
          "aggs": {
            "lvl": {
              "terms": {
                "field": "child.lvl",
                "order": {
                  "count":"desc"
                }
              },
              "aggs": {
                "count": {
                  "reverse_nested":{}
                }
              }
            }
          }
        }
      }
    }
  }
}

//4. Score functions
POST products/_doc/_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
        "boost": "5", 
        "functions": [
        {
          "filter": { "match": { "test": "bar" } },
          "random_score": {}, 
          "weight": 23
        },
        {
          "filter": { "match": { "test": "cat" } },
          "weight": 42
        }
        ],
        "max_boost": 42,
        "score_mode": "max",
        "boost_mode": "multiply",
        "min_score" : 42
    }
  }
}

Pros and Cons of Filter Context

Performance
Caching
- Frequently used filters will be cached
- Cache is stored at the node level
- Default to be 10% of the heap
- Can be configured by indices.queries.cache.size: 10%
No relevancy

Basic Query Patterns

Term Level Query
- term(s), range, exists, prefix, wildcard, regexp, fuzzy, type, ids
Full Text Query
- (multi) match (phrase), common terms, (simple) query string
Compound Query
- constant score, bool, dis max, function score, boosting

Term Level Query

//1. term vs terms query
POST expmgrrules/doc/_search
{
  "query": {
    "term": {
      "triggers": "26972"
    }
  }
}

POST expmgrrules/doc/_search
{
  "query": {
    "terms": {
      "triggers": ["26972","20676"]
    }
  }
}

//2. range query
POST products/doc/_search
{
  "query": {
    "nested": {
      "path": "child",
      "query": {
        "range": {
          "child.price": {
            "lte": 5
          }
        }
      }
    }
  }
}

//3. range with date query
POST products/doc/_search
{
  "from": 0,
  "size": 24,
  "query": {
    "nested": {
      "path": "path",
      "query": {
        "range": {
          "releaseDate": {
            "gt": "now-1h"
          }
        }
      }
    }
  }
}

//4. exists query
POST products/doc/_search
{
  "query": {
    "nested": {
      "path": "child",
      "query": {
        "exists": {
          "field": "special"
        }
      }
    }
  }
}

//5. wildcard/regex query
//* means any match, ? means single match
POST products/doc/_search
{
  "query": {
    "nested": {
      "path": "child",
      "query": {
        "wildcard": {
          "name": "boys*"
        }
      }
    }
  }
}

Term Level Query vs Full-Text Query

Term vs Match
- Term level query can be used for numbers, booleans, dates, and text type field, but it ignores mapping types and only matches low-level exact terms inverted index
- Match allows to use mapping type for input query and document field, then build complex term level query to look up the inverted index
- Match by default use "should" to build term query, the user can use {"operator": "and"} to generate "must" term query

Full-Text Query: Match

//1. basic match
POST products/_doc/_search
{
  "query": {
    "nested":{
      "path":"child",
      "query":{
        "match": {
          "child.name": "men hat"
        }
      }
    }
  }
}

//2. complete match with the operator
POST products/_doc/_search
{
  "query": {
    "nested":{
      "path":"child",
      "query":{
        "match": {
          "child.name": {
            "query": "men hat",
            "operator": "AND"
          }
        }
      }
    }
  }
}

//3. multi match
POST products/_doc/_search
{
  "query": {
    "nested":{
      "path":"child",
      "query":{
        "multi_match": {
          "query": "red hat",
          "fields": ["child.name^1.0", "child.color^2.0"]
        }
      }
    }
  }
}

//4. match phrase
POST products/doc/_search
{
  "query": {
    "nested": {
      "path": "child",
      "query": {
        "match_phrase": {
          "child.name": {
            "query": "men hat",
            "slop": 2
          }
        }
      }
    }
  }
}

Full-Text Query: (Simple) Query String

//1. query string
POST products/_doc/_search
{
  "query": {
    "nested":{
      "path":"child",
      "score_mode":"max",
      "query":{
        "query_string":{
          "query":"(red hat) OR (blue hat)",
          "fields":[
            "child.color^2.0",
            "child.name^1.0",
          ]
        }
      }
    }
  }
}

//2. simple query string
POST products/_doc/_search
{
  "query": {
    "nested":{
      "path":"child",
      "score_mode":"max",
      "query":{
        "simple_query_string":{
          "query":"(red hat) OR (blue hat)",
          "fields":[
            "child.color^2.0",
            "child.name^1.0",
          ]
        }
      }
    }
  }
}

Full-Text Query: (Simple) Query String

Query String Syntax
- Field names: "rating:(4 OR 5)"
- Wildcards: "g*l ha?"
- Regular expression: "name:/red.*hat/"
- Fuzziness: "synoyms~1"
- Proximity searches: "red hat~5"
- Ranges: "rating:[3 TO 5]"
- Boosting
  - "color^2.0" or "red^2.0 hat"
- Boolean operators: "+red hat -shirt"
- Grouping: "(red OR blue) hat"

Full-Text Query: (Simple) Query String

Query String vs Simple Query String
- The query_string query parses the input and splits text around operators
- Each textual part is analyzed independently of each other
- Query string contains a lot of reserved characters, which could lead to a syntax error that prevents the query from running
- Unlike the regular query_string query, the Simple query string query will never throw an exception and just discards invalid parts of the query

Compound Query

bool query structure
- must, must_not, should are in the scoring mode
- the filter is must match without scoring mode

{
  "query": {
    "bool": {
      "must": [],
      "must_not": [],
      "should": [],
      "filter": [],
    }
  }
}

Manual Intervention Scoring

Multi match / Query string
Function score query
Compound query boost
Rescore
Debugging score calculation
Pros and Cons of Query Context search

Function Score Query

Modify the score of documents that are retrieved by a query
Score function works on a filtered set of documents

GET products/_doc/_search
{
  "query": {
    "nested": {
      "path": "child",
      "query": {
        "function_score": {
          "query": {
            "match": {
              "child.color": "red"
            }
          },
          "boost": "1",
          "random_score": {}, 
          "boost_mode":"multiply"
        }
      }
    }
  }
}

Function Score Query

Score calculation with multiple filter context query

GET products/_doc/_search
{
  "query": {
    "nested": {
      "path": "child",
      "query": {
        "function_score": {
          "query": {
            "match": {
              "child.color": "red"
            }
          },
          "functions": [{
            "filter": {
              "match": { 
                "child.lvl1": "men"
              }
            },
            "weight": 2
          }, {
            "filter": {
              "match": { 
                "child.lvl1": "women"
              }
            },
            "weight": 5
          }],
          "boost": "1",
          "boost_mode":"multiply"
        }
      }
    }
  }
}

Function Score Query

Score calculation based on script score function
- boost_mode: multiply, replace, sum, avg, max, min

GET products/_doc/_search
{
  "query": {
    "nested": {
      "path": "child",
      "query": {
        "function_score": {
          "query": {
            "match": {
              "child.color": "red"
            }
          },
          "script_score" : {
            "script" : {
              "source": "doc['child.price'].value"
            }
          },
          "boost": 0.1,
          "boost_mode":"sum"
        }
      }
    }
  }
}

Compound Query Boost

Score calculation based on script score function
- boost_mode: multiply, replace, sum, avg, max, min

GET products/_doc/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "name": "jacket"
        }
      },
      "should": [
        {
          "range": {
            "price": {
              "lte": 20,
              "boost": 0.5
            }
          }
        },
        {
          "range": {
            "price": {
              "lte": 15,
              "boost": 2
            }
          }
        }
      ]
    }
  }
}

Rescore

Rescoring only the top N documents for precision improving and cost reduction, window_size refers to top N on each shard

POST products/doc/_search
{
  "query": {
    "nested": {
      "type": "child",
      "query": {
        "match_all": {}
      }
    }
  },
  "rescore": {
    "window_size" : 500,
    "query": {
      "rescore_query" : {
        "nested": {
          "type": "child",
          "query": {
            "function_score": {
              "script_score": {
                "script": {
                  "inline": "doc['price'].value"
                }
              }
            }  
          }
        }
      }
    }
  }
}

Debugging relevancy calculation

Explain scoring for specific document
Explain scoring for query

GET /products/_doc/{_id}/_explain

POST products/_doc/_search
{
  "explain": true,
  "query": {
    //......
  }
}

Support manually impact on the order of search result
Caching on shard
- The results of the entire query are cached here
- Only hits count, aggregation, and suggestions are cached
- The result are only cached if size is 0 and no hits/document
- Query json will be used as cache key
- Default to be 1% of the heap
- Can be configured by indices.requests.cache.size: 1%

Pros and Cons of Query Context

Special processing in common query

Source fields selection
Sort
Pagination
Highlight

Selecting the fields in the response

White list in _source
- can use wildcards match "*"

POST products/doc/_search
{
  "from": 0,
  "size": 24,
  "_source": ["pro*"],
  "query": {
    "nested": {
      "path": "child",
      "query": {
        "term": {
          "child.productName": "hat"
        }
      }
    }
  }
}

Selecting the fields in the response

script_fields to format output programatically

POST products/doc/_search
{
  "from": 0,
  "size": 24,
  "_source": [],
  "script_fields": {
    "max_price_including_unit": {
      "script": {
        "inline": "'$' + params['_source']['price']"
      }
    }
  },
  "query": {
    "nested": {
      "path": "child",
      "query": {
        "term": {
          "name": "hat"
        }
      }
    }
  }
}

Sorting

Sort
- either by single field or multiple fields

POST products/doc/_search
{
  "from": 0,
  "size": 24,
  "query": {
    "nested": {
      "path": "child",
      "query": {
        "term": {
          "child.name": "hat"
        }
      }
    }
  },
  "sort": [{
      "child.price": "asc"
    }, {
      "child.no": "asc"
  }]
}

Pagination

Pagination
- Default page size is 10

POST products/doc/_search
{
  "from": 0,
  "size": 24,
  "query": {
    "nested": {
      "path": "child",
      "query": {
        "term": {
          "name": "hat"
        }
      }
    }
  }
}

Highlighting

POST products/doc/_search
{
  "query": {
    "nested" : {
      "type" : "child",
        "score_mode" : "sum",
        "query": {
          "simple_query_string" : {
            "fields" : ["color^2", "size^1"],
            "query": "Red"
          }
        },
        "inner_hits": {
          "size": 1,
          "highlight": {
            "fields" : {
              "color" : {},
              "size" : {},
            }
          }
        }
    }
  }
}

Elasticsearch: Query Basics

RESTful API of Search

Filter and Query Context

Filter Context Examples

Pros and Cons of Filter Context

Basic Query Patterns

Term Level Query

Term Level Query vs Full-Text Query

Full-Text Query: Match

Full-Text Query: (Simple) Query String

Full-Text Query: (Simple) Query String

Full-Text Query: (Simple) Query String

Compound Query

Manual Intervention Scoring

Function Score Query

Function Score Query

Function Score Query

Compound Query Boost

Rescore

Debugging relevancy calculation

Pros and Cons of Query Context

Special processing in common query

Selecting the fields in the response

Selecting the fields in the response

Sorting

Pagination

Highlighting

Thanks

Elasticsearch: Query Basics

Elasticsearch: Query Basics

hanyi8000

Elasticsearch: Query Basics

RESTful API of Search

Filter and Query Context

Filter Context Examples

Pros and Cons of Filter Context

Basic Query Patterns

Term Level Query

Term Level Query vs Full-Text Query

Full-Text Query: Match

Full-Text Query: (Simple) Query String

Full-Text Query: (Simple) Query String

Full-Text Query: (Simple) Query String

Compound Query

Manual Intervention Scoring

Function Score Query

Function Score Query

Function Score Query

Compound Query Boost

Rescore

Debugging relevancy calculation

Pros and Cons of Query Context

Special processing in common query

Selecting the fields in the response

Selecting the fields in the response

Sorting

Pagination

Highlighting

Thanks

Elasticsearch: Query Basics

More from hanyi8000