Elasticsearch: Advanced Query

Han Yi

May 1, 2018

Types of Advanced Query

  • Aggregations
  • Suggesters
  • Scripts
  • Search Templates

Basic Types of Aggregations

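  • Bucketing: group documents into buckets, like SQL GROUP BY
    • A bucketing aggregation can contain sub-aggregations, either bucketing or metric
  • Metric: compute a value over a set of documents, like avg, sum, min, max
  • Matrix: compute numeric statistics over a set of fields
  • Pipeline: aggregate the output of other aggregations instead of documents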
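
To make the bucketing/metric split concrete, here is a minimal plain-Python sketch. It does not call Elasticsearch; the documents, the field names, and the helpers `bucket_by` and `avg` are all hypothetical and only imitate what a bucketing aggregation with a nested metric would compute.

```python
from collections import defaultdict

# Hypothetical in-memory documents standing in for an index.
docs = [
    {"category": "jacket", "price": 120.0},
    {"category": "jacket", "price": 80.0},
    {"category": "shoes", "price": 60.0},
]

def bucket_by(documents, field):
    """Bucketing: group documents by the value of one field."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[field]].append(doc)
    return dict(buckets)

def avg(documents, field):
    """Metric: compute one number over a set of documents."""
    values = [doc[field] for doc in documents]
    return sum(values) / len(values)

# A metric nested inside a bucketing aggregation: average price per category.
avg_price_per_category = {
    key: avg(bucket, "price")
    for key, bucket in bucket_by(docs, "category").items()
}
print(avg_price_per_category)
```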

Aggregations Structure

"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}
  • Meta: custom metadata attached to an individual aggregation at request time and returned unchanged with that aggregation in the response
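
A small sketch of a request body that follows the structure above, built as a Python dictionary. The aggregation name `colors`, the field `color`, and the meta payload are hypothetical; `aggs` is the short form of `aggregations`.

```python
import json

# Hypothetical aggregation name, field, and meta payload.
request_body = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "colors": {
            "terms": {"field": "color"},   # <aggregation_type> and its body
            "meta": {"chart": "pie"},      # echoed back with this aggregation
        }
    },
}
print(json.dumps(request_body, indent=2))
```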

Bucket Aggregations

GET /_search
{
    "aggs": {
        "nested_aggs": {
            "nested": {
                "path":"child"
            },
            "aggs": {
                "filtered_aggs": {
                    "filter": {
                        "bool": {
                            "must": [
                                {
                                    "term": {
                                        "child.color":"Red"
                                    }
                                }
                            ]
                        }
                    },
                    "aggs": {
                        "lvl1": {
                            "terms": {
                                "field": "child.category.lvl1",
                                "order": {
                                    "count":"desc"
                                }
                            },
                            "aggs": {
                                "count": {
                                    "reverse_nested": {}
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
  • Terms/Nested/Reverse Nested aggregation
    • Group by a field of the nested objects, then count the parent documents in each group
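
The nested, filter, terms, and reverse_nested steps can be imitated in plain Python to show what each level counts. This is only a sketch over hypothetical documents, not a call to Elasticsearch: the terms buckets count nested child objects, while reverse_nested counts the parent documents that own them.

```python
from collections import defaultdict

# Hypothetical parent documents, each holding a list of nested "child" objects.
products = [
    {"name": "A", "child": [{"color": "Red", "category": {"lvl1": "coats"}},
                            {"color": "Red", "category": {"lvl1": "coats"}}]},
    {"name": "B", "child": [{"color": "Red", "category": {"lvl1": "coats"}},
                            {"color": "Blue", "category": {"lvl1": "shoes"}}]},
    {"name": "C", "child": [{"color": "Red", "category": {"lvl1": "shoes"}}]},
]

child_counts = defaultdict(int)   # what the terms buckets count (nested objects)
parents = defaultdict(set)        # what reverse_nested counts (parent documents)

for product in products:
    for child in product["child"]:          # "nested": step into child objects
        if child["color"] != "Red":         # "filter": keep red children only
            continue
        key = child["category"]["lvl1"]     # "terms": bucket by category
        child_counts[key] += 1
        parents[key].add(product["name"])   # "reverse_nested": back to the parent

parent_counts = {key: len(names) for key, names in parents.items()}
# Order buckets by parent count, descending, as the "order" clause requests.
ordered = sorted(parent_counts.items(), key=lambda item: item[1], reverse=True)
print(dict(child_counts), ordered)
```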

Bucket Aggregations

GET /_search
{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "ranges" : [
                    { "to" : 100.0 },
                    { "from" : 100.0, "to" : 200.0 },
                    { "from" : 200.0 }
                ]
            }
        }
    }
}
  • Range aggregation
    • Group by range
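
A range bucket includes its `from` value and excludes its `to` value, so a price of exactly 100.0 lands in the second bucket. The sketch below imitates that rule in plain Python; the price list and the helper `in_range` are hypothetical.

```python
# The same three ranges as in the request above.
ranges = [
    {"to": 100.0},
    {"from": 100.0, "to": 200.0},
    {"from": 200.0},
]

def in_range(value, r):
    """A range includes its "from" value and excludes its "to" value."""
    lower_ok = "from" not in r or value >= r["from"]
    upper_ok = "to" not in r or value < r["to"]
    return lower_ok and upper_ok

prices = [20.0, 100.0, 150.0, 200.0, 999.0]  # hypothetical documents
doc_counts = [sum(in_range(p, r) for p in prices) for r in ranges]
print(doc_counts)
```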

Metric Aggregations

  • Top Hits Aggregation
    • Return the top matching documents of each bucket
GET /_search
{
    "aggs": {
        "nested_aggs": {
            "nested": {
                "path":"child"
            },
            "aggs": {
                "filtered_aggs": {
                    "filter": {
                        "bool": {
                            "must": [
                                {
                                    "term": {
                                        "child.color":"Red"
                                    }
                                }
                            ]
                        }
                    },
                    "aggs": {
                        "lvl1": {
                            "terms": {
                                "field": "child.category.lvl1",
                                "order": {
                                    "count":"desc"
                                }
                            },
                            "aggs": {
                                "count": {
                                    "reverse_nested": {},
                                    "aggs": {
                                        "top_hits": {
                                            "top_hits": {
                                                "sort": [{
                                                    "price": {
                                                        "order": "desc"
                                                    }
                                                }],
                                                "_source": {
                                                    "includes": [ "name", "price" ]
                                                },
                                                "size" : 1
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
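
What the `top_hits` part of the request asks for can be sketched in plain Python: sort the documents of one bucket, keep the first `size` of them, and trim each to the listed source fields. The bucket contents and the helper name are hypothetical.

```python
# Hypothetical documents already grouped into one bucket.
bucket = [
    {"name": "parka", "price": 250.0, "color": "Red"},
    {"name": "vest", "price": 90.0, "color": "Red"},
    {"name": "down jacket", "price": 320.0, "color": "Red"},
]

def top_hits(documents, sort_field, size, includes):
    """Sort descending on sort_field, keep `size` docs, trim to `includes`."""
    ranked = sorted(documents, key=lambda d: d[sort_field], reverse=True)
    return [{k: d[k] for k in includes} for d in ranked[:size]]

# Mirrors sort by price desc, "_source" includes name and price, size 1.
print(top_hits(bucket, "price", 1, ["name", "price"]))
```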

Metric Aggregations

  • Max/Min/Avg Aggregation
POST /products/_search
{
    "aggs" : {
        "max_price" : { "max" : { "field" : "price" } }
    }
}

POST /products/_search
{
    "aggs" : {
        "min_price" : { "min" : { "field" : "price" } }
    }
}

POST /products/_search
{
    "aggs" : {
        "avg_price" : { "avg" : { "field" : "price" } }
    }
}
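
Because one request may carry several named aggregations, the three requests above can also be sent as one. A sketch of such a body as a Python dictionary, reusing the aggregation names from the slide:

```python
import json

# One request body carrying all three metrics.
request_body = {
    "size": 0,  # skip the hits, return only aggregation results
    "aggs": {
        "max_price": {"max": {"field": "price"}},
        "min_price": {"min": {"field": "price"}},
        "avg_price": {"avg": {"field": "price"}},
    },
}
print(json.dumps(request_body, indent=2))
```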

Matrix Aggregations

  • Statistics Aggregation
GET /products/_search
{
    "size": 0,
    "aggs": {
        "statistics": {
            "matrix_stats": {
                "fields": ["price"]
            }
        }
    }
}

//sample response
"aggregations": {
  "statistics": {
    "doc_count": 4553,
    "fields": [
      {
        "name": "price",
        "count": 4553,
        "mean": 47.31886243291309,
        "variance": 8779.347529532348,
        "skewness": 19.845533881336312,
        "kurtosis": 537.8599726243962,
        "covariance": {
          "price": 8779.347529532348
        },
        "correlation": {
          "price": 1
        }
      }
    ]
  }
}
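
In the sample response, the covariance of price with itself equals the variance, and the correlation is 1. That holds for any field compared with itself, as this plain-Python sketch shows. The values are hypothetical, and the sample (n - 1) form used here is an assumption, not a statement about which estimator Elasticsearch applies.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

prices = [10.0, 20.0, 30.0, 100.0]     # hypothetical values
variance = covariance(prices, prices)  # covariance with itself is the variance
print(variance, correlation(prices, prices))
```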

Pipeline Aggregations

  • Chain of Aggregation
POST /_search
{
    "aggs": {
        "my_date_histo":{
            "date_histogram":{
                "field":"timestamp",
                "interval":"day"
            },
            "aggs":{
                "the_sum":{
                    "sum":{ "field": "lemmings" } 
                },
                "the_movavg":{
                    "moving_avg":{ "buckets_path": "the_sum" } 
                }
            }
        }
    }
}
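
The pipeline step reads the per-day output of `the_sum` rather than the documents themselves. A rough plain-Python sketch of a simple moving average over such values follows. The daily sums are hypothetical, and the windowing here (a trailing window that includes the current bucket) is an assumption; Elasticsearch's own window handling may differ.

```python
def moving_average(values, window):
    """Simple trailing mean over at most `window` values ending at each position."""
    result = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        result.append(sum(chunk) / len(chunk))
    return result

daily_sums = [10, 20, 30, 40]  # hypothetical "the_sum" value per day bucket
print(moving_average(daily_sums, 2))
```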

Aggregation Summary

  • Traditional aggregation operations include distinct, count, average, group by, etc.
  • Aggregations, not only search, are a major reason for Elasticsearch's popularity
  • Pipeline aggregations and nested sub-aggregations are among the most flexible capabilities in Elasticsearch
  • Aggregations are calendar-aware (date histograms) and location-aware (geo aggregations)
  • The keyword field type is better suited than text for aggregations, sorting, etc.

Suggesters

  • Term and phrase suggester
    • Make suggestions based on the existing documents in case of typos or spelling mistakes
  • Completion suggester
    • Make suggestions that predict the query term before the user finishes typing

Suggesters

  • Term suggester
GET products/doc/_search
{
  "_source": [],
  "suggest": {
    "term_suggester": {
      "text": "jackat",
      "term": {
        "field": "name"
      }
    }
  }
}
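
The term suggester proposes indexed terms that lie within a small edit distance of the input. The sketch below imitates only that idea in plain Python with a Levenshtein distance; the term list and the `suggest` helper are hypothetical, and the real suggester also weighs things such as term frequency.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def suggest(text, indexed_terms, max_edits=2):
    """Return indexed terms within max_edits of text, closest first."""
    scored = [(edit_distance(text, term), term) for term in indexed_terms]
    return [term for dist, term in sorted(scored) if 0 < dist <= max_edits]

print(suggest("jackat", ["jacket", "jack", "packet", "shoes"]))
```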

Suggesters

  • Phrase suggester
GET products/doc/_search
{
  "_source": [],
  "suggest": {
    "phrase_suggester": {
      "text": "donw jackat",
      "phrase": {
        "field": "name",
        "max_errors": 2,
        "collate": {
          "query": {
            "inline": {
              "match_phrase": {
                "{{field_name}}": {
                  "query": "{{suggestion}}",
                  "slop": 1
                }
              }
            }
          },
          "params": {
            "field_name": "name"
          },
          "prune": false
        }
      }
    }
  }
}

Suggesters

  • Completion suggester
    • Requires a dedicated field whose type is "completion"
    • copy_to is often used to populate that field from an existing field
GET products/doc/_search
{
  "_source": [],
  "suggest": {
    "my_suggestion": {
      "prefix": "jack",
      "completion": {
        "field": "name"
      }
    }
  }
}
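
Prefix completion can be pictured as a lookup in a sorted list of terms. This plain-Python sketch is not how Elasticsearch implements the completion suggester; the term list and the `complete` helper are hypothetical and only show the behavior seen by the user.

```python
from bisect import bisect_left

def complete(prefix, sorted_terms, size=5):
    """Return up to `size` terms starting with prefix from a sorted list."""
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break                              # sorted, so no later match
        matches.append(term)
        if len(matches) == size:
            break
    return matches

terms = sorted(["jacket", "jack wolfskin", "jeans", "down jacket"])
print(complete("jack", terms))
```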

Scripts

  • Flexible way to implement features that the query DSL does not support directly
  • Supported script languages:
    • painless
    • expression
    • mustache
    • java
GET products/_doc/_search
{
  "query": {
    "script": {
      "script": {
        "lang": "painless",
        "inline": "doc['color'].value == 'Black'"
      }
    }
  }
}
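
Script values can be passed through `params` instead of being hard-coded in the script text. A sketch of building such a request body in Python; the helper name `color_script_query` is hypothetical.

```python
import json

def color_script_query(color):
    """Build a script query that compares doc['color'] to a parameter."""
    return {
        "query": {
            "script": {
                "script": {
                    "lang": "painless",
                    # "inline" matches the request above; newer releases
                    # name this key "source".
                    "inline": "doc['color'].value == params.color",
                    "params": {"color": color},
                }
            }
        }
    }

print(json.dumps(color_script_query("Black"), indent=2))
```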

Search Templates

  • Search templates are written with the mustache template engine
  • A template is stored on the Elasticsearch server and called by its id
POST _scripts/find_product_by_name
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "name": "{{ product_name }}"
        }
      }
    }
  }
}
GET products/_search/template
{
  "id": "find_product_by_name",
  "params": {
    "product_name": "down jacket"
  }
}
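
How the params fill in the template can be sketched with a simple placeholder substitution. This is not a real mustache engine (no sections, no escaping); the `render` helper is hypothetical and only handles plain `{{ name }}` placeholders.

```python
import json
import re

def render(template, params):
    """Replace each {{ name }} placeholder with the matching parameter."""
    def substitute(match):
        return str(params[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

template = '{"query": {"match": {"name": "{{ product_name }}"}}}'
rendered = render(template, {"product_name": "down jacket"})
print(json.loads(rendered))
```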

Thanks

By hanyi8000