ElasticSearch

Gentle introduction...

REST API

If you have access to the REST API, you can badly screw things up:

curl -X DELETE "http://localhost:9200/*"

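That's right, you just emptied your cluster: every index is gone. One command, no confirmation. (Since Elasticsearch 8.0, the action.destructive_requires_name setting defaults to true and rejects wildcard deletes like this one; on older versions you have to turn it on yourself.)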

Searching

There are two main ways of searching in ES:

  • Via a URL query parameter (q). This uses the Lucene query string syntax, so it will feel familiar if you know Solr, but it doesn't support advanced features like aggregations, nested queries, etc.
GET /<index>/_search?q=person_id:5789758af6c0b483488b4735
  • By sending a JSON body with the request that defines the search using the ES Query DSL. This is the standard way.
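To make the two styles concrete, here is a minimal Python sketch (standard library only, nothing is sent over the network) that builds both forms of the same search. The q parameter is parsed with query string syntax, so its JSON-body equivalent is a query_string query. The cluster URL is the one from the curl example, the index name people_5 is borrowed from the Query DSL example, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumed local cluster, as in the curl example
INDEX = "people_5"                # index name borrowed from the Query DSL example

def url_search(lucene_query):
    """Style 1: the query lives in the URL, URL-encoded into the q parameter."""
    return f"{ES_URL}/{INDEX}/_search?{urlencode({'q': lucene_query})}"

def body_search(lucene_query):
    """Style 2: the same Lucene syntax, wrapped in a Query DSL query_string query."""
    return json.dumps({"query": {"query_string": {"query": lucene_query}}})

q = "person_id:5789758af6c0b483488b4735"
print(url_search(q))
# http://localhost:9200/people_5/_search?q=person_id%3A5789758af6c0b483488b4735
print(body_search(q))
```

The body version is the one that scales: once you need aggregations or nested queries, you simply add keys to the same JSON object.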

Query DSL

Simple example

GET people_5/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "date_updated": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 20
}
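from and size give offset-based paging: from is the number of hits to skip, size is how many to return. A minimal Python sketch that builds the body above for a zero-based page number; page_body is a hypothetical helper. The 10,000 ceiling is the default of the index.max_result_window index setting: Elasticsearch rejects requests where from + size goes past it.

```python
def page_body(page, per_page=20, max_result_window=10_000):
    """Build the search body above for a zero-based page number (hypothetical helper)."""
    offset = page * per_page
    # Elasticsearch rejects requests where from + size exceeds index.max_result_window.
    if offset + per_page > max_result_window:
        raise ValueError("page too deep for from/size paging")
    return {
        "query": {"match_all": {}},
        "sort": [{"date_updated": {"order": "desc"}}],
        "from": offset,
        "size": per_page,
    }

print(page_body(2)["from"])  # 40: pages 0 and 1 (20 hits each) are skipped
```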

Query types

Of course, match_all is not the only query type. Here are the most useful ones:

  • match
  • multi_match
  • query_string
  • term
  • terms
  • nested
  • bool

Match query

GET /<index>/_search
{
  "query": {
    "match": {
      "title": "Article title"
    }
  }
}

This looks for "Article title" in the title field of each document.

OK, I'm lying: it actually searches for Article OR title in the title field of each document, and documents matching more of the terms get a higher score.

The query text is analyzed according to the index configuration (tokenizing, stemming, synonyms, etc.) before the search runs.
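A toy Python model of what the match query does with "Article title": analyze the text into terms, then match documents containing any of them, ranking documents that contain more terms higher. toy_analyze and toy_match are hypothetical and only approximate the default behaviour: the real analyzer comes from the index mapping, and real scoring is BM25, not a term count.

```python
import re

def toy_analyze(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-word chars.
    Real analyzers come from the index mapping and may also stem, add synonyms, etc."""
    return [token for token in re.split(r"\W+", text.lower()) if token]

def toy_match(query, field_value, operator="or"):
    """Return how many query terms hit the field (0 means no match).
    operator="or" is the match query default; "and" requires every term."""
    query_terms = set(toy_analyze(query))
    hits = query_terms & set(toy_analyze(field_value))
    if operator == "and" and hits != query_terms:
        return 0
    return len(hits)

# The real request body that switches the match query to AND semantics.
AND_BODY = {"query": {"match": {"title": {"query": "Article title", "operator": "and"}}}}

print(toy_match("Article title", "My first article"))         # 1: matches on "article" alone
print(toy_match("Article title", "My first article", "and"))  # 0: "title" is missing
print(toy_match("Article title", "Title of an article"))      # 2: both terms, ranks higher
```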

Multi match query

GET /<index>/_search
{
  "query": {
    "multi_match": {
      "query": "Article title",
      "fields": ["title", "body"]
    }
  }
}

This looks for "Article title" in both the title and body fields.
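By default multi_match uses the best_fields type: it runs the match against each listed field and scores the document by its single best field. A toy Python sketch of that idea, using the same crude term-count stand-in for real (BM25) scoring; all function names and the sample document are hypothetical.

```python
import re

def toy_analyze(text):
    # Crude stand-in for the standard analyzer: lowercase and split on non-word chars.
    return [token for token in re.split(r"\W+", text.lower()) if token]

def toy_field_score(query, text):
    # Toy per-field score: number of distinct query terms found in the field.
    return len(set(toy_analyze(query)) & set(toy_analyze(text)))

def toy_multi_match(query, doc, fields):
    # best_fields (the multi_match default): the document takes the score
    # of its single best-matching field rather than the sum over fields.
    return max(toy_field_score(query, doc.get(field, "")) for field in fields)

doc = {"title": "An article", "body": "The article title is in the header"}
print(toy_multi_match("Article title", doc, ["title", "body"]))  # 2, from body
```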

Term query

GET /<index>/_search
{
  "query": {
    "term": {
      "tag": "php"
    }
  }
}

This will look for the exact term "php" in the tag field.

Unlike match and query_string, term queries are not analyzed: whatever you pass has to be exactly what is stored in the inverted index.
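The classic trap is a term query against an analyzed text field. The toy Python sketch below (hypothetical helpers, crude lowercase-and-split analyzer) shows why a term query for "PHP" misses a document whose text contains "PHP" while a match query finds it. In practice, term queries belong on keyword fields, whose values are indexed as-is.

```python
import re

def toy_analyze(text):
    # Crude stand-in for the standard analyzer applied at index time.
    return [token for token in re.split(r"\W+", text.lower()) if token]

# Inverted index for a text field: the stored terms are the *analyzed* tokens.
indexed_terms = set(toy_analyze("PHP and Elasticsearch"))  # {"php", "and", "elasticsearch"}

def toy_term(value):
    # term query: the value is NOT analyzed, it must equal a stored term exactly.
    return value in indexed_terms

def toy_match(text):
    # match query: the query text goes through the same analyzer first.
    return any(token in indexed_terms for token in toy_analyze(text))

print(toy_term("php"))   # True
print(toy_term("PHP"))   # False: "PHP" was never stored, only "php"
print(toy_match("PHP"))  # True: analyzed to "php" before lookup
```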

Terms query

GET /<index>/_search
{
  "query": {
    "terms": {
      "tag": ["php", "elasticsearch"]
    }
  }
}

This will look for any document having either php or elasticsearch as a tag.
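A terms query is an OR over exact terms. A toy in-memory Python sketch over hypothetical documents (tag stored as a list, since any Elasticsearch field can hold an array of values); as_bool_should is a hypothetical helper that builds a bool query selecting the same documents, because a bool query with only should clauses requires at least one of them to match.

```python
docs = [
    {"id": 1, "tag": ["php"]},
    {"id": 2, "tag": ["elasticsearch", "java"]},
    {"id": 3, "tag": ["python"]},
]

def toy_terms(field, values, doc):
    # terms query: the doc matches if ANY of its values for the field is in the list.
    return any(value in values for value in doc.get(field, []))

def as_bool_should(field, values):
    # Same document set expressed as a bool query of OR-ed term clauses.
    return {"bool": {"should": [{"term": {field: v}} for v in values]}}

matching = [d["id"] for d in docs if toy_terms("tag", ["php", "elasticsearch"], d)]
print(matching)  # [1, 2]
```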

Boolean query

GET /<index>/_search
{
  "query": {
    "bool": {
      "must": [{
        "term": {
          "tag": "php"
        }
      }],
      "should": [
        {
          "term": {
            "tag": "elasticsearch"
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "organization": "oracle"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "level": "good"
          }
        }
      ]
    }
  }
}

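  • must: must match, and is part of scoring
  • should: should match, and is part of scoring (optional here, since the query also has must and filter clauses; matching it only raises the score)
  • must_not: must not match, not part of scoring (filter context, so cacheable too)
  • filter: must match, not part of scoring => cacheable!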
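A toy Python evaluator for the bool semantics above, run against in-memory documents (hypothetical data; only term clauses are supported). The "score" is just the number of matching must and should clauses, standing in for real relevance scoring; filter and must_not only include or exclude documents.

```python
def clause_ok(clause, doc):
    # Only supports {"term": {field: value}} clauses; the field may hold a list.
    field, value = next(iter(clause["term"].items()))
    stored = doc.get(field, [])
    stored = stored if isinstance(stored, list) else [stored]
    return value in stored

def toy_bool(query, doc):
    """Return a toy score, or None if the doc is excluded."""
    b = query["bool"]
    must, should = b.get("must", []), b.get("should", [])
    # must and filter: every clause has to match.
    if not all(clause_ok(c, doc) for c in must + b.get("filter", [])):
        return None
    # must_not: no clause may match.
    if any(clause_ok(c, doc) for c in b.get("must_not", [])):
        return None
    should_hits = sum(clause_ok(c, doc) for c in should)
    # Without must/filter clauses, at least one should clause has to match.
    if should and not must and not b.get("filter") and should_hits == 0:
        return None
    # Only must and should contribute to the score.
    return sum(clause_ok(c, doc) for c in must) + should_hits

query = {"bool": {
    "must": [{"term": {"tag": "php"}}],
    "should": [{"term": {"tag": "elasticsearch"}}],
    "must_not": [{"term": {"organization": "oracle"}}],
    "filter": [{"term": {"level": "good"}}],
}}
docs = {
    "a": {"tag": ["php"], "organization": "acme", "level": "good"},
    "b": {"tag": ["php", "elasticsearch"], "organization": "acme", "level": "good"},
    "c": {"tag": ["php"], "organization": "oracle", "level": "good"},
    "d": {"tag": ["php"], "organization": "acme", "level": "bad"},
}
for name, doc in docs.items():
    print(name, toy_bool(query, doc))
# a 1, b 2 (should adds to the score), c None (must_not), d None (filter)
```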

Nested query

GET /<index>/_search
{
  "query": {
    "nested": {
      "path": "profile",
      "query": {
        "bool": {
          "filter": {
            "terms": {
              "profile.source": [
                "public","hcareers","dice","rigzone","efinancialcareers"
              ]
            }
          },
          "must": [
            {"term": {
              "profile.value.given_name": {
                "value": "fabrice"
              }
            }}
          ]
        }
      }
    }
  }
}
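A nested query runs its inner query against each nested object on its own, so both conditions above must hold inside the same profile entry. A toy Python sketch over a hypothetical document shows the difference from a plain object mapping, where array values are flattened per field and each condition could be satisfied by a different entry. Field names mirror the query above; the data is made up.

```python
SOURCES = {"public", "hcareers", "dice", "rigzone", "efinancialcareers"}

# Hypothetical document: "fabrice" comes from a source outside the filter list.
doc = {"profile": [
    {"source": "linkedin", "value": {"given_name": "fabrice"}},
    {"source": "dice", "value": {"given_name": "fab"}},
]}

def inner_query(profile):
    # Both conditions from the nested query, evaluated on ONE profile object.
    return profile["source"] in SOURCES and profile["value"]["given_name"] == "fabrice"

def toy_nested(doc):
    # nested: the document matches if at least one profile object matches by itself.
    return any(inner_query(p) for p in doc["profile"])

def toy_flattened(doc):
    # Without a nested mapping, values are merged per field across the array,
    # so each condition can be satisfied by a different profile object.
    sources = {p["source"] for p in doc["profile"]}
    names = {p["value"]["given_name"] for p in doc["profile"]}
    return bool(sources & SOURCES) and "fabrice" in names

print(toy_nested(doc))     # False: no single profile satisfies both
print(toy_flattened(doc))  # True: "dice" and "fabrice" come from different entries
```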

ElasticSearch - DSL intro

By fguery
