Elasticsearch

2 days

476 slides

54 lab exercises

 

What is Elasticsearch?

You know,

for search...

Documents

{}
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ]
}

So what?

Document store

Search engine

(Full text search)

Suggestions / highlighting

Analytics & Aggregations

Alerting & Classification

(Percolator)

Log data analysis

Packet data / network monitoring

Event data

Data visualization

(Kibana)

Technically speaking...

Horizontally distributed

(more machines)

High Availability

(Near) Real Time

Open Source

Apache License

Proprietary plugins for security, monitoring, messaging, alerting and more...

Simple RESTful API

Setup

Cluster

Node

Index

Shard

Indexing

_index

curl -XPOST 'http://cluster:9200/index/type/' -d @somedata.json

Let's take some data and 'analyze' it...

Analyzer

Input string > character filters > tokenizer > token filters > index

 

=="inverted index"

Analyzer

Standard Analyzer

Simple Analyzer

Whitespace Analyzer
Stop Analyzer
Keyword Analyzer
Pattern Analyzer
Language Analyzers
Snowball Analyzer
Custom Analyzer

Analyzers

Preprocesses the string of characters before it is passed to the tokenizer. 


A character filter may be used to strip out HTML markup,

or to convert "&" characters to the word "and".
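A minimal sketch of the two character filters just described, in plain Python rather than Elasticsearch itself. The function names `html_strip` and `mapping` are made up for illustration; the real filters handle far more edge cases.

```python
import re
from html import unescape


def html_strip(text: str) -> str:
    """Drop anything that looks like an HTML tag, then decode entities."""
    return unescape(re.sub(r"<[^>]+>", "", text))


def mapping(text: str, table: dict) -> str:
    """Replace every key of `table` with its value."""
    for old, new in table.items():
        text = text.replace(old, new)
    return text


raw = "<p>Tom &amp; Jerry</p>"
cleaned = mapping(html_strip(raw), {"&": "and"})
print(cleaned)  # Tom and Jerry
```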

Character filters

Mapping Char Filter
HTML Strip Char Filter
Pattern Replace Char Filter

Character filters

Breaks a string down into a stream of terms or tokens.

 

A simple tokenizer might split the string up into terms wherever it encounters whitespace or punctuation.
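Such a simple tokenizer could look roughly like this in Python. It is only an illustration of the idea (split on whitespace or punctuation), not the Standard Tokenizer; `simple_tokenizer` is a hypothetical name.

```python
import re


def simple_tokenizer(text: str) -> list:
    """Split on runs of non-word characters (whitespace, punctuation).

    Letters, digits and underscores stay inside tokens; empty pieces are dropped.
    """
    return [token for token in re.split(r"\W+", text) if token]


tokens = simple_tokenizer("The quick, brown fox!")
print(tokens)  # ['The', 'quick', 'brown', 'fox']
```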

Tokenizers

Standard Tokenizer
Edge NGram Tokenizer
Keyword Tokenizer
Letter Tokenizer
Lowercase Tokenizer
NGram Tokenizer
Whitespace Tokenizer
Pattern Tokenizer
UAX Email URL Tokenizer
Path Hierarchy Tokenizer
Classic Tokenizer
Thai Tokenizer

Tokenizers

Accepts a stream of tokens from a tokenizer and can:

 

modify tokens (eg lowercasing)

delete tokens (eg remove stopwords)

add tokens (eg synonyms)
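The three operations can be sketched as chained Python generators. This is a toy model: the stopword set and synonym table are invented for the example, and token positions (which real token filters track) are ignored.

```python
def lowercase(tokens):
    """Modify tokens: lowercase each one."""
    for token in tokens:
        yield token.lower()


def remove_stopwords(tokens, stopwords):
    """Delete tokens: skip anything in the stopword set."""
    for token in tokens:
        if token not in stopwords:
            yield token


def add_synonyms(tokens, synonyms):
    """Add tokens: emit each token followed by its synonyms, if any."""
    for token in tokens:
        yield token
        yield from synonyms.get(token, [])


stream = lowercase(["The", "Quick", "Fox"])
stream = remove_stopwords(stream, {"the"})
stream = add_synonyms(stream, {"quick": ["fast"]})
result = list(stream)
print(result)  # ['quick', 'fast', 'fox']
```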

Token filters

Token filters

Standard Token Filter
ASCII Folding Token Filter
Length Token Filter
Lowercase Token Filter
Uppercase Token Filter
NGram Token Filter
Edge NGram Token Filter
Porter Stem Token Filter
Shingle Token Filter
Stop Token Filter
Word Delimiter Token Filter
Stemmer Token Filter
Stemmer Override Token Filter
Keyword Marker Token Filter
Keyword Repeat Token Filter
KStem Token Filter
Snowball Token Filter
Phonetic Token Filter
Synonym Token Filter
Compound Word Token Filter

Reverse Token Filter
Elision Token Filter
Truncate Token Filter
Unique Token Filter
Pattern Capture Token Filter
Pattern Replace Token Filter
Trim Token Filter
Limit Token Count Token Filter
Hunspell Token Filter
Common Grams Token Filter
Normalization Token Filter
CJK Width Token Filter
CJK Bigram Token Filter
Delimited Payload Token Filter
Keep Words Token Filter
Keep Types Token Filter
Classic Token Filter
Apostrophe Token Filter
Decimal Digit Token Filter

Index

Stores:

the original document

+ some metadata

+ terms in inverted index

Analysis

Input: "<div>The Quick Brown Fox Jumps Over The Lazy Dog</div>"

Character Filter (HTMLStripper): The Quick Brown Fox Jumps Over The Lazy Dog

Tokenizer (Standard): ["The", "Quick", "Brown", "Fox", "Jumps", "Over", "The", "Lazy", "Dog"]

TokenFilter (Stopwords): ["Quick", "Brown", "Fox", "Jumps", "Over", "Lazy", "Dog"]

TokenFilter (Lowercase): ["quick", "brown", "fox", "jumps", "over", "lazy", "dog"]

Index

Inverted Index

Inverted Index

Term Document
quick 1,2
brown 1
fox 1,2
jump 2,6,7,8,9
over 1,2,4
lazy 1
dog 1,2
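A table like this can be built with a dictionary that maps each term to the set of documents containing it. The sketch below is plain Python, not Lucene's data structure, and the three documents are made up so that a few rows match the table.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each term to the set of document ids it appears in."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


# Hypothetical, already analyzed documents (doc id -> terms).
docs = {
    1: ["quick", "brown", "fox", "over", "lazy", "dog"],
    2: ["quick", "fox", "jump", "over", "dog"],
    4: ["over"],
}

index = build_inverted_index(docs)
print(sorted(index["over"]))                    # [1, 2, 4]
print(sorted(index["quick"] & index["brown"]))  # [1]
```

Looking up a term is a single dictionary access, and combining terms is a set intersection, which is why analyzing the query into the same terms matters.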

Querying

and filtering

_search

The query DSL (domain-specific language) lets us express complex definitions of how to slice and dice data.

Simple Query

curl -XGET 'http://kibana.tre.se:9200/_all/_search' -d '{
    "query": {
        "match_all": {}
    }
}'

Complex Query

curl -XGET 'http://kibana.tre.se:9200/_all/_search?pretty' -d '{
  "facets": {
    "0": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30m"
      },
      "global": true,
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "query_string": {
                  "query": "duration:<100"
                }
              },
              "filter": {
                "bool": {
                  "must": [
                    {
                      "range": {
                        "@timestamp": {
                          "from": 1457449871001,
                          "to": "now"
                        }
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "application:(\"smart-rest\")"
                          }
                        },
                        "_cache": true
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "logname:(\"messageRestLog\")"
                          }
                        },
                        "_cache": true
                      }
                    }
                  ],
                  "must_not": [
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "client:(\"SMARTFunctionalTests\")"
                          }
                        },
                        "_cache": true
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "application:(\"smart-rest-legacy\")"
                          }
                        },
                        "_cache": true
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    },
    "1": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30m"
      },
      "global": true,
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "query_string": {
                  "query": "duration:[100 TO 500]"
                }
              },
              "filter": {
                "bool": {
                  "must": [
                    {
                      "range": {
                        "@timestamp": {
                          "from": 1457449871002,
                          "to": "now"
                        }
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "application:(\"smart-rest\")"
                          }
                        },
                        "_cache": true
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "logname:(\"messageRestLog\")"
                          }
                        },
                        "_cache": true
                      }
                    }
                  ],
                  "must_not": [
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "client:(\"SMARTFunctionalTests\")"
                          }
                        },
                        "_cache": true
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "application:(\"smart-rest-legacy\")"
                          }
                        },
                        "_cache": true
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    },
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30m"
      },
      "global": true,
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "query_string": {
                  "query": "duration:[500 TO 2000]"
                }
              },
              "filter": {
                "bool": {
                  "must": [
                    {
                      "range": {
                        "@timestamp": {
                          "from": 1457449871002,
                          "to": "now"
                        }
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "application:(\"smart-rest\")"
                          }
                        },
                        "_cache": true
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "logname:(\"messageRestLog\")"
                          }
                        },
                        "_cache": true
                      }
                    }
                  ],
                  "must_not": [
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "client:(\"SMARTFunctionalTests\")"
                          }
                        },
                        "_cache": true
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "application:(\"smart-rest-legacy\")"
                          }
                        },
                        "_cache": true
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    },
    "6": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30m"
      },
      "global": true,
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "query_string": {
                  "query": "duration:>2000"
                }
              },
              "filter": {
                "bool": {
                  "must": [
                    {
                      "range": {
                        "@timestamp": {
                          "from": 1457449871002,
                          "to": "now"
                        }
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "application:(\"smart-rest\")"
                          }
                        },
                        "_cache": true
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "logname:(\"messageRestLog\")"
                          }
                        },
                        "_cache": true
                      }
                    }
                  ],
                  "must_not": [
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "client:(\"SMARTFunctionalTests\")"
                          }
                        },
                        "_cache": true
                      }
                    },
                    {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "application:(\"smart-rest-legacy\")"
                          }
                        },
                        "_cache": true
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}'
    

Queries

Full-text queries are analyzed.

(So we can use the inverted index we created)

Queries

Results are 'scored'

Filter

{
    "query": {
        "bool": {
            "must": {
                "match": {
                    "body": "Java"
                }
            },
            "filter": {
                "term": {
                    "comment_count": 5
                }
            }
        }
    }
}

doc: |1|2|3|4|5|6|7|8|..

match: |1|0|0|1|0|1|0|1|..

Filter

Queries

Evaluates to what degree each document matches and produces a scored result.

Is computed every time we query.

Cannot be cached.

Filters

Evaluates whether a document matches or not and produces a filter bitset.

Does not need to be computed every time.

Can be cached.
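The difference can be sketched in a few lines of Python. This is a toy model under loud assumptions: the four documents are invented, the score is a plain term frequency ratio (not Lucene's formula), and the cache is a dictionary rather than Elasticsearch's own caching logic.

```python
# Hypothetical documents, loosely following the "body" / "comment_count" example.
docs = {
    1: {"body": "java java tips", "comment_count": 5},
    2: {"body": "python tips", "comment_count": 5},
    3: {"body": "java and scala", "comment_count": 2},
    4: {"body": "java", "comment_count": 5},
}

filter_cache = {}  # (field, value) -> {doc_id: bool}


def term_filter(field, value):
    """Yes/no per document; computed once, then served from the cache."""
    key = (field, value)
    if key not in filter_cache:
        filter_cache[key] = {
            doc_id: doc[field] == value for doc_id, doc in docs.items()
        }
    return filter_cache[key]


def match_score(field, term):
    """Toy relevance: occurrences of the term divided by field length."""
    scores = {}
    for doc_id, doc in docs.items():
        words = doc[field].lower().split()
        count = words.count(term.lower())
        if count:
            scores[doc_id] = count / len(words)
    return scores


def search(term, field, filter_field, filter_value):
    """Score with the query, then keep only documents the filter accepts."""
    bitset = term_filter(filter_field, filter_value)
    scores = match_score(field, term)  # recomputed on every call
    hits = [(doc_id, s) for doc_id, s in scores.items() if bitset[doc_id]]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


results = search("Java", "body", "comment_count", 5)
print([doc_id for doc_id, _ in results])  # [4, 1]
```

Document 3 mentions "java" but is dropped by the filter; a second search with the same filter reuses the cached bitset while the scores are computed again.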

Mappings

_mappings

{
  "@version": "1",
  "@timestamp": "2016-03-10T14:43:39.762Z",
  "host": "x13067pzz.omaccess.net",
  "logtimestamp": "2016-03-10 15:43:39,264",
  "callId": "0000015354917041-400150",
  "user": "piippo59@gmail.com",
  "uri": "/smart/api/rest/customers/525056841829790123726121104565/notifications",
  "strippedUri": "/smart/api/rest/customers/NUMBER/notifications",
  "method": "GET",
  "statusCode": 200,
  "startTime": "2016-03-10T15:43:39.206+01:00",
  "duration": 58,
  "client": "3Sverige",
  "accept": "application/vnd.com.hi3gaccess.smart-v2+json",
  "agent": "",
  "isAdmin": "",
  "authMethod": "Interactive",
  "size": 6296,
  "cache": "NONE",
  "applicationId": "se.tre.ios.3sverige",
  "applicationVersion": "3.6",
  "json_responseBody": "{\"notifications\":[{\"my3WebLink\":{\"buttonText\":\"Visa faktura\",\"url\":\"Mitt3/Faktura/#new-invoice\"},\"id\":\"0a877b32-252f-42ce-bd51-0eb48207e5b4\",\"text\":\"Du har en ny faktura med förfallodag 2016-02-26.\",\"status\":\"UNREAD\",\"ownerId\":\"10079876446\",\"shortText\":\"Ny faktura\",\"my3AppLink\":{\"buttonText\":\"Visa faktura\",\"url\":\"my3://Invoice.details/516521775523\"},\"notificationOwner\":\"account\",\"date\":\"2016-02-25\",\"eventTypeId\":\"newInvoice\"},{\"my3WebLink\":{\"buttonText\":\"Visa faktura\",\"url\":\"Mitt3/Faktura/#new-invoice\"},\"id\":\"e6a50bc2-eb7c-4e95-a769-3c57e9761d7d\",\"text\":\"Du har en ny faktura med förfallodag 2016-01-28.\",\"status\":\"UNREAD\",\"ownerId\":\"10079876446\",\"shortText\":\"Ny faktura\",\"my3AppLink\":{\"buttonText\":\"Visa faktura\",\"url\":\"my3://Invoice.details/516424402423\"},\"notificationOwner\":\"account\",\"date\":\"2016-01-19\",\"eventTypeId\":\"newInvoice\"},{\"my3WebLink\":{\"buttonText\":\"Visa faktura\",\"url\":\"Mitt3/Faktura/#new-invoice\"},\"id\":\"46551726-b721-4ac7-aca3-9d69982bc19f\",\"text\":\"Du har en ny faktura med förfallodag 2015-12-28.\",\"status\":\"READ\",\"ownerId\":\"10079876446\",\"shortText\":\"Ny faktura\",\"my3AppLink\":{\"buttonText\":\"Visa faktura\",\"url\":\"my3://Invoice.details/516287234020\"},\"notificationOwner\":\"account\",\"date\":\"2015-12-14\",\"eventTypeId\":\"newInvoice\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=${isid}\"},\"id\":\"5449f210-f8c6-4205-a5aa-d714b157b93f\",\"text\":\"46760171330 har förbrukat de 512 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. 
Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"READ\",\"ownerId\":\"163954130\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760171330\"},\"notificationOwner\":\"subscription\",\"date\":\"2015-12-17\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=163954113\"},\"id\":\"c9869eff-df61-402f-a1b6-017cfad6fd5a\",\"text\":\"46760185635 har förbrukat de 20480 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"UNREAD\",\"ownerId\":\"163954113\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760185635\"},\"notificationOwner\":\"subscription\",\"date\":\"2016-02-23\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=163954113\"},\"id\":\"bf5d8ada-6606-44c4-99b2-f0f69744eff0\",\"text\":\"46760185635 har förbrukat de 20480 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. 
Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"UNREAD\",\"ownerId\":\"163954113\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760185635\"},\"notificationOwner\":\"subscription\",\"date\":\"2016-01-21\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=${isid}\"},\"id\":\"f675e05b-f474-4144-be6f-3552e87c317b\",\"text\":\"46760185635 har förbrukat de 20480 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"UNREAD\",\"ownerId\":\"163954113\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760185635\"},\"notificationOwner\":\"subscription\",\"date\":\"2015-12-16\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=163929804\"},\"id\":\"a427cfa2-1c11-4b28-93b5-b6e6656bfedd\",\"text\":\"46760071705 har förbrukat de 512 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. 
Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"UNREAD\",\"ownerId\":\"163929804\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760071705\"},\"notificationOwner\":\"subscription\",\"date\":\"2016-03-02\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=${isid}\"},\"id\":\"1385dd11-5f83-4b0e-ae00-74b7f5fe7178\",\"text\":\"46760071705 har förbrukat de 512 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"UNREAD\",\"ownerId\":\"163929804\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760071705\"},\"notificationOwner\":\"subscription\",\"date\":\"2016-01-06\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=${isid}\"},\"id\":\"bb65db9c-4ba6-4097-8f39-4ee328403780\",\"text\":\"46760071705 har förbrukat de 512 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. 
Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"READ\",\"ownerId\":\"163929804\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760071705\"},\"notificationOwner\":\"subscription\",\"date\":\"2015-12-09\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=163929802\"},\"id\":\"2f0654ac-66de-4f7e-a5fb-7ce691b001ab\",\"text\":\"46760071704 har förbrukat de 5120 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"UNREAD\",\"ownerId\":\"163929802\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760071704\"},\"notificationOwner\":\"subscription\",\"date\":\"2016-01-27\",\"eventTypeId\":\"outOfFreeUnits\"}]}",
  "resource": "customers",
  "subresource1": "525056841829790123726121104565",
  "subresource2": "notifications",
  "application": "smart-rest",
  "logname": "messageRestLog"
}

How to interpret..

Option 1: Let Elasticsearch figure it out.

Option 2: Specify exactly what each field means.

Mappings live in indexes.
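To get a feel for Option 1, here is a rough Python sketch of type guessing from JSON values. It is an assumption-laden illustration, not Elasticsearch's dynamic mapping code: `guess_type` is a hypothetical helper, the date check is a simple regular expression, and the type names follow the older "string" naming.

```python
import re

# Assumed date shape for this sketch: ISO 8601 with a "T" separator.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def guess_type(value):
    """Guess a field type from a single JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "date" if ISO_DATE.match(value) else "string"
    return "unknown"


# A few fields taken from the log document above.
sample = {
    "@timestamp": "2016-03-10T14:43:39.762Z",
    "logtimestamp": "2016-03-10 15:43:39,264",
    "method": "GET",
    "statusCode": 200,
    "duration": 58,
}

mapping = {field: guess_type(value) for field, value in sample.items()}
print(mapping["@timestamp"], mapping["logtimestamp"], mapping["duration"])
# date string long
```

In this sketch `logtimestamp` stays a plain string because it does not match the assumed date shape, which is the kind of case where Option 2 earns its keep.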

Suggestions

_suggest

we know it and love it...

Can be based on:

term
phrase
completion
context

Creates an FST

Fancy Smart Thing

-or-

Finite State Transducer
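What a completion lookup does can be sketched with a sorted list and binary search. To be clear, this is a stand-in and not an FST: an FST stores shared prefixes and suffixes compactly, while this toy `PrefixSuggester` (a hypothetical class) simply keeps every phrase in a sorted list.

```python
import bisect


class PrefixSuggester:
    """Prefix completion over a sorted list of lowercased phrases."""

    def __init__(self, phrases):
        self._phrases = sorted(phrase.lower() for phrase in phrases)

    def suggest(self, prefix, size=5):
        """Return up to `size` phrases that start with `prefix`."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self._phrases, prefix)
        matches = []
        for phrase in self._phrases[start:]:
            if not phrase.startswith(prefix) or len(matches) == size:
                break
            matches.append(phrase)
        return matches


suggester = PrefixSuggester(["Elasticsearch", "Elastic Stack", "Kibana", "Logstash"])
print(suggester.suggest("ela"))  # ['elastic stack', 'elasticsearch']
```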

Aggregations

_search

...but differently

When we search, we get

Data

Metadata

3 kinds of metadata

Buckets

Metrics

Pipelines

Buckets

Each document is evaluated against the created buckets and each bucket keeps track of what documents “fall” in it. 

Metrics

Typically, metrics aggregations generate numeric stats that are computed over a specific document set.

Avg, Cardinality, Extended Stats, Geo Bounds, Geo Centroid, Max, Min, Percentiles, Percentile Ranks, Scripted Metric, Stats, Sum, Top hits, Value Count

Pipelines

Aggregate over the output of other aggregations
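The three kinds fit together roughly as in this plain-Python sketch. The field names come from the log document shown earlier; the documents themselves and their durations are invented, and real aggregations run distributed across shards rather than over a list.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log documents.
logs = [
    {"application": "smart-rest", "duration": 58},
    {"application": "smart-rest", "duration": 142},
    {"application": "smart-rest-legacy", "duration": 900},
    {"application": "smart-rest-legacy", "duration": 1100},
]

# Bucket: every document falls into the bucket for its application.
buckets = defaultdict(list)
for doc in logs:
    buckets[doc["application"]].append(doc)

# Metric: a numeric stat computed over the documents in each bucket.
avg_duration = {
    key: mean(doc["duration"] for doc in bucket_docs)
    for key, bucket_docs in buckets.items()
}

# Pipeline: works on the metric results, not on the documents.
slowest = max(avg_duration, key=avg_duration.get)

print(avg_duration["smart-rest"], slowest)  # 100 smart-rest-legacy
```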

Percolation

_percolate

Like search, reversed

Store the queries and ask which of them a document matches.

"Send a message when temperature is over x degrees"

"Send a message when number of failures per minute is above x"

Alerting

like triggers, but different...
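The alerting examples above can be sketched as stored queries that each incoming document is checked against. This is a toy model in Python: real percolator queries are written in the query DSL, and the rule names and thresholds here (30 degrees, 10 failures per minute) are made up to stand in for "x".

```python
# Stored "queries": name -> predicate over a document (hypothetical rules).
registered = {
    "too-hot": lambda doc: doc.get("temperature", 0) > 30,
    "too-many-failures": lambda doc: doc.get("failures_per_minute", 0) > 10,
}


def percolate(doc):
    """Return the names of all stored queries that match the document."""
    return sorted(name for name, matches in registered.items() if matches(doc))


alerts = percolate({"temperature": 35, "failures_per_minute": 2})
print(alerts)  # ['too-hot']
```

Each returned name could then trigger a message, or be written back to the document as a tag, which is the classification use case.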

"If document contains X, then add field foo and bar"

Classification

tags for example...

Integration

Get data into Elasticsearch

_index

Get data out of Elasticsearch

_search

programmatically...

My favourite language...

Java API
JavaScript API
Groovy API
.NET API
PHP API
Perl API
Python API
Ruby API

 

+Community Contributed Clients
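Without committing to any of these clients, the two endpoints can also be reached with nothing but an HTTP library. The sketch below uses Python's standard library and only builds the requests; the host `cluster:9200` and the `index`/`type` path are the placeholders from the indexing slide, and the helper names are hypothetical.

```python
import json
from urllib.request import Request

BASE_URL = "http://cluster:9200"  # placeholder host from the indexing example
JSON_HEADERS = {"Content-Type": "application/json"}


def index_request(index, doc_type, document):
    """Build a POST to /index/type/ so Elasticsearch generates the id."""
    body = json.dumps(document).encode("utf-8")
    url = f"{BASE_URL}/{index}/{doc_type}/"
    return Request(url, data=body, method="POST", headers=JSON_HEADERS)


def search_request(index, query):
    """Build a POST to /index/_search with the query wrapped in a body."""
    body = json.dumps({"query": query}).encode("utf-8")
    url = f"{BASE_URL}/{index}/_search"
    return Request(url, data=body, method="POST", headers=JSON_HEADERS)


put_in = index_request("index", "type", {"firstName": "John", "age": 25})
get_out = search_request("_all", {"match_all": {}})
print(put_in.get_method(), put_in.full_url)
print(get_out.get_method(), get_out.full_url)
# Sending is left out on purpose: urllib.request.urlopen(put_in) would do it
# against a reachable cluster.
```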

Kibana

Grafana

2 days

476 slides

54 lab exercises

 

.questions(?)

Elasticsearch

By maderskog
