Elasticsearch
2 days
476 slides
54 lab exercises
What is Elasticsearch?
You know, for search...
Documents
{}
{
"firstName": "John",
"lastName": "Smith",
"isAlive": true,
"age": 25,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021-3100"
},
"phoneNumbers": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "office",
"number": "646 555-4567"
}
]
}
So what?
Document store
Search engine (full-text search)
Suggestions / highlighting
Analytics & aggregations
Alerting & classification (Percolator)
Log data analysis
Packet data / network monitoring
Event data
Data visualization (Kibana)
Technically speaking...
Horizontally distributed (scale out by adding more machines)
High availability
(Near) real time
Open source: Apache License
Proprietary plugins for security, monitoring, messaging, alerting and more
Simple RESTful API
Setup
Cluster
Node
Index
Shard
Indexing
_index
curl -XPOST 'http://cluster:9200/index/type/' -d @somedata.json
Let's take some data and 'analyze' it...
Analyzer
Input string > character filters > tokenizer > token filters > index
(the resulting index is the "inverted index")
Analyzers
Standard Analyzer
Simple Analyzer
Whitespace Analyzer
Stop Analyzer
Keyword Analyzer
Pattern Analyzer
Language Analyzers
Snowball Analyzer
Custom Analyzer
Character filters
Preprocesses the string of characters before it is passed to the tokenizer.
A character filter may be used to strip out HTML markup,
or to convert "&" characters to the word "and".
Mapping Char Filter
HTML Strip Char Filter
Pattern Replace Char Filter
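A minimal Python sketch of two character filters, under simplifying assumptions: tag stripping is a naive regular expression (Elasticsearch's HTML Strip Char Filter also decodes entities such as &amp;), and the mapping table that turns "&" into "and" is hypothetical. The function names are made up for illustration.

```python
import re

# Hypothetical mapping table, in the spirit of the Mapping Char Filter.
CHAR_MAPPINGS = {"&": "and"}


def html_strip(text: str) -> str:
    """Naively drop anything that looks like an HTML tag (no entity decoding)."""
    return re.sub(r"<[^>]*>", "", text)


def mapping_char_filter(text: str, mappings: dict[str, str] = CHAR_MAPPINGS) -> str:
    """Replace every occurrence of each key with its mapped value."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def apply_char_filters(text: str) -> str:
    # Character filters run in sequence on the raw string, before tokenizing.
    return mapping_char_filter(html_strip(text))


if __name__ == "__main__":
    print(apply_char_filters("<p>Salt & Pepper</p>"))  # Salt and Pepper
```

The filters only ever see and return a plain string; no tokens exist yet at this stage.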
Tokenizers
Breaks a string down into a stream of terms or tokens.
A simple tokenizer might split the string up into terms wherever it encounters whitespace or punctuation.
Standard Tokenizer
Edge NGram Tokenizer
Keyword Tokenizer
Letter Tokenizer
Lowercase Tokenizer
NGram Tokenizer
Whitespace Tokenizer
Pattern Tokenizer
UAX Email URL Tokenizer
Path Hierarchy Tokenizer
Classic Tokenizer
Thai Tokenizer
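A short Python sketch contrasting two ways of splitting a string, assuming plain regular expressions are good enough to show the idea. The second function is a rough stand-in for "split on whitespace or punctuation", not the Standard Tokenizer itself; both names are hypothetical.

```python
import re


def whitespace_tokenizer(text: str) -> list[str]:
    """Split on whitespace only; punctuation stays attached to the tokens."""
    return text.split()


def word_char_tokenizer(text: str) -> list[str]:
    """Split on whitespace and punctuation by keeping runs of word characters.

    A rough stand-in, not the Standard Tokenizer, which implements
    Unicode text segmentation rules.
    """
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    sample = "Hello, world! It's 2016."
    print(whitespace_tokenizer(sample))  # ['Hello,', 'world!', "It's", '2016.']
    print(word_char_tokenizer(sample))   # ['Hello', 'world', 'It', 's', '2016']
```

Note the difference on "It's": the Standard Tokenizer would keep it as one token, because its Unicode word-break rules treat an apostrophe between letters as part of the word.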
Token filters
Accepts a stream of tokens from a tokenizer and can:
modify tokens (e.g. lowercasing)
delete tokens (e.g. remove stopwords)
add tokens (e.g. synonyms)
Standard Token Filter
ASCII Folding Token Filter
Length Token Filter
Lowercase Token Filter
Uppercase Token Filter
NGram Token Filter
Edge NGram Token Filter
Porter Stem Token Filter
Shingle Token Filter
Stop Token Filter
Word Delimiter Token Filter
Stemmer Token Filter
Stemmer Override Token Filter
Keyword Marker Token Filter
Keyword Repeat Token Filter
KStem Token Filter
Snowball Token Filter
Phonetic Token Filter
Synonym Token Filter
Compound Word Token Filter
Reverse Token Filter
Elision Token Filter
Truncate Token Filter
Unique Token Filter
Pattern Capture Token Filter
Pattern Replace Token Filter
Trim Token Filter
Limit Token Count Token Filter
Hunspell Token Filter
Common Grams Token Filter
Normalization Token Filter
CJK Width Token Filter
CJK Bigram Token Filter
Delimited Payload Token Filter
Keep Words Token Filter
Keep Types Token Filter
Classic Token Filter
Apostrophe Token Filter
Decimal Digit Token Filter
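A minimal Python sketch of the three things a token filter can do (modify, delete, add), assuming a token stream is just a list of strings. The stop list and synonym table are tiny hypothetical examples; real synonym filters place synonyms at the same token position, which this sketch ignores.

```python
# Hypothetical, deliberately tiny configuration.
STOPWORDS = {"a", "an", "the", "of"}
SYNONYMS = {"quick": ["fast"]}


def lowercase_filter(tokens: list[str]) -> list[str]:
    # modify: rewrite every token
    return [t.lower() for t in tokens]


def stop_filter(tokens: list[str], stopwords: set[str] = STOPWORDS) -> list[str]:
    # delete: drop tokens found in the stop list
    return [t for t in tokens if t not in stopwords]


def synonym_filter(tokens: list[str], synonyms: dict[str, list[str]] = SYNONYMS) -> list[str]:
    # add: keep each token and inject its synonyms right after it
    out: list[str] = []
    for t in tokens:
        out.append(t)
        out.extend(synonyms.get(t, []))
    return out


def apply_token_filters(tokens: list[str]) -> list[str]:
    # Filters form a chain: each one consumes the previous one's output.
    for token_filter in (lowercase_filter, stop_filter, synonym_filter):
        tokens = token_filter(tokens)
    return tokens


if __name__ == "__main__":
    print(apply_token_filters(["The", "Quick", "Fox"]))  # ['quick', 'fast', 'fox']
```

Order matters in the chain: lowercasing first means the lowercase stop list also catches "The".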
Index
Stores:
the original document
+ some metadata
+ terms in inverted index
Analysis
Input: "<div>The Quick Brown Fox Jumps Over The Lazy Dog</div>"
Character filter (HTMLStripper): The Quick Brown Fox Jumps Over The Lazy Dog
Tokenizer (Standard): "The", "Quick", "Brown", "Fox", "Jumps", "Over", "The", "Lazy", "Dog"
Token filter (Stopwords): ["Quick", "Brown", "Fox", "Jumps", "Over", "Lazy", "Dog"]
Token filter (Lowercase): ["quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
> Index
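The walk-through above can be reproduced with a small Python sketch. Assumptions: a naive regex stands in for the HTML stripper, runs of word characters stand in for the Standard tokenizer, and the stop list contains only "the". Because the slide runs the stopword filter before lowercasing, the sketch compares stopwords case-insensitively.

```python
import re

STOPWORDS = {"the"}  # assumption: just enough to reproduce the slide


def analyze(text: str) -> list[str]:
    # 1. Character filter: strip HTML tags (naive regex)
    text = re.sub(r"<[^>]*>", "", text)
    # 2. Tokenizer: runs of word characters (rough stand-in for Standard)
    tokens = re.findall(r"\w+", text)
    # 3. Token filter: stopwords, compared case-insensitively because
    #    this filter runs before lowercasing
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    # 4. Token filter: lowercase
    return [t.lower() for t in tokens]


if __name__ == "__main__":
    print(analyze("<div>The Quick Brown Fox Jumps Over The Lazy Dog</div>"))
    # ['quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog']
```

In Elasticsearch itself the stop token filter is case-sensitive by default (ignore_case is false) and the built-in stop lists are lowercase, so custom analyzers usually place the lowercase filter before the stop filter.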
Inverted Index
Term | Document |
---|---|
quick | 1,2 |
brown | 1 |
fox | 1,2 |
jumps | 1,2,6,7,8,9 |
over | 1,2,4 |
lazy | 1 |
dog | 1,2 |
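A minimal Python sketch of how such a table can be built and used, assuming a deliberately tiny analyzer (lowercase, then split into word characters) and three hypothetical documents. The query string goes through the same analyzer as the documents, which is what lets its terms be looked up directly in the index.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    # Tiny analyzer: lowercase, then keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map every term to the sorted list of document ids that contain it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index: dict[str, list[int]], query: str) -> list[int]:
    """Ids of documents containing every query term (AND semantics)."""
    terms = analyze(query)  # same analyzer as at index time
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    docs = {  # hypothetical documents
        1: "The Quick Brown Fox Jumps Over The Lazy Dog",
        2: "A quick dog and a fox",
        3: "Brown bread",
    }
    index = build_inverted_index(docs)
    print(index["quick"])              # [1, 2]
    print(search(index, "QUICK fox"))  # [1, 2]
```

Lookups touch only the posting lists of the query terms, never the documents themselves.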
Querying and filtering
_search
The query DSL (domain-specific language) lets us express complex definitions of how to slice and dice the data
Simple Query
curl -XGET 'http://kibana.tre.se:9200/_all/_search' -d '{
  "query": {
    "match_all": {}
  }
}'
Complex Query
curl -XGET 'http://kibana.tre.se:9200/_all/_search?pretty' -d '{
"facets": {
"0": {
"date_histogram": {
"field": "@timestamp",
"interval": "30m"
},
"global": true,
"facet_filter": {
"fquery": {
"query": {
"filtered": {
"query": {
"query_string": {
"query": "duration:<100"
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"from": 1457449871001,
"to": "now"
}
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "application:(\"smart-rest\")"
}
},
"_cache": true
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "logname:(\"messageRestLog\")"
}
},
"_cache": true
}
}
],
"must_not": [
{
"fquery": {
"query": {
"query_string": {
"query": "client:(\"SMARTFunctionalTests\")"
}
},
"_cache": true
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "application:(\"smart-rest-legacy\")"
}
},
"_cache": true
}
}
]
}
}
}
}
}
}
},
"1": {
"date_histogram": {
"field": "@timestamp",
"interval": "30m"
},
"global": true,
"facet_filter": {
"fquery": {
"query": {
"filtered": {
"query": {
"query_string": {
"query": "duration:[100 TO 500]"
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"from": 1457449871002,
"to": "now"
}
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "application:(\"smart-rest\")"
}
},
"_cache": true
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "logname:(\"messageRestLog\")"
}
},
"_cache": true
}
}
],
"must_not": [
{
"fquery": {
"query": {
"query_string": {
"query": "client:(\"SMARTFunctionalTests\")"
}
},
"_cache": true
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "application:(\"smart-rest-legacy\")"
}
},
"_cache": true
}
}
]
}
}
}
}
}
}
},
"2": {
"date_histogram": {
"field": "@timestamp",
"interval": "30m"
},
"global": true,
"facet_filter": {
"fquery": {
"query": {
"filtered": {
"query": {
"query_string": {
"query": "duration:[500 TO 2000]"
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"from": 1457449871002,
"to": "now"
}
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "application:(\"smart-rest\")"
}
},
"_cache": true
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "logname:(\"messageRestLog\")"
}
},
"_cache": true
}
}
],
"must_not": [
{
"fquery": {
"query": {
"query_string": {
"query": "client:(\"SMARTFunctionalTests\")"
}
},
"_cache": true
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "application:(\"smart-rest-legacy\")"
}
},
"_cache": true
}
}
]
}
}
}
}
}
}
},
"6": {
"date_histogram": {
"field": "@timestamp",
"interval": "30m"
},
"global": true,
"facet_filter": {
"fquery": {
"query": {
"filtered": {
"query": {
"query_string": {
"query": "duration:>2000"
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"from": 1457449871002,
"to": "now"
}
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "application:(\"smart-rest\")"
}
},
"_cache": true
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "logname:(\"messageRestLog\")"
}
},
"_cache": true
}
}
],
"must_not": [
{
"fquery": {
"query": {
"query_string": {
"query": "client:(\"SMARTFunctionalTests\")"
}
},
"_cache": true
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "application:(\"smart-rest-legacy\")"
}
},
"_cache": true
}
}
]
}
}
}
}
}
}
}
},
"size": 0
}'
Queries
Full-text queries (such as match) are analyzed too,
so their terms line up with the inverted index we created.
Queries
Results are 'scored'
Filter
{
"query": {
"bool": {
"must": {
"match": {
"body": "Java"
}
},
"filter": {
"term": {
"comment_count": 5
}
}
}
}
}
doc: |1|2|3|4|5|6|7|8|..
match: |1|0|0|1|0|1|0|1|..
Filter
Queries
Evaluate to what degree each document matches, producing a scored result.
Are computed every time we query.
Cannot be cached.
Filters
Evaluate whether a document matches or not, producing a bitset.
Do not need to be computed every time.
Can be cached.
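A minimal Python sketch of the query/filter split, mirroring the bool example above (match "Java" in body, filter on comment_count 5). Assumptions: four hypothetical documents, a Python int used as the bitset (bit i set means document i matches), functools.lru_cache standing in for a filter cache, and raw term frequency standing in for relevance scoring (Elasticsearch 2.x used TF/IDF, and BM25 is the default since 5.0).

```python
from functools import lru_cache

# Hypothetical documents: id -> (body, comment_count)
DOCS = {
    1: ("Java and more Java", 5),
    2: ("Python tips", 5),
    3: ("Learning Java", 2),
    4: ("Java streams", 5),
}


@lru_cache(maxsize=None)
def filter_bitset(comment_count: int) -> int:
    """Yes/no per document, packed into an int. No scores, so it is cacheable."""
    bits = 0
    for doc_id, (_, count) in DOCS.items():
        if count == comment_count:
            bits |= 1 << doc_id
    return bits


def query_scores(word: str) -> dict[int, int]:
    """How well each document matches (here: term frequency). Recomputed per call."""
    scores = {}
    for doc_id, (body, _) in DOCS.items():
        tf = body.lower().split().count(word.lower())
        if tf:
            scores[doc_id] = tf
    return scores


def search(word: str, comment_count: int) -> list[tuple[int, int]]:
    bits = filter_bitset(comment_count)  # cached after the first call
    hits = [(doc_id, score) for doc_id, score in query_scores(word).items()
            if (bits >> doc_id) & 1]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    print(search("Java", 5))  # [(1, 2), (4, 1)]
```

Document 3 matches the query but not the filter, so it never gets a place in the result; the filter contributes no score, only a yes/no.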
Mappings
_mappings
{
"@version": "1",
"@timestamp": "2016-03-10T14:43:39.762Z",
"host": "x13067pzz.omaccess.net",
"logtimestamp": "2016-03-10 15:43:39,264",
"callId": "0000015354917041-400150",
"user": "piippo59@gmail.com",
"uri": "/smart/api/rest/customers/525056841829790123726121104565/notifications",
"strippedUri": "/smart/api/rest/customers/NUMBER/notifications",
"method": "GET",
"statusCode": 200,
"startTime": "2016-03-10T15:43:39.206+01:00",
"duration": 58,
"client": "3Sverige",
"accept": "application/vnd.com.hi3gaccess.smart-v2+json",
"agent": "",
"isAdmin": "",
"authMethod": "Interactive",
"size": 6296,
"cache": "NONE",
"applicationId": "se.tre.ios.3sverige",
"applicationVersion": "3.6",
"json_responseBody": "{\"notifications\":[{\"my3WebLink\":{\"buttonText\":\"Visa faktura\",\"url\":\"Mitt3/Faktura/#new-invoice\"},\"id\":\"0a877b32-252f-42ce-bd51-0eb48207e5b4\",\"text\":\"Du har en ny faktura med förfallodag 2016-02-26.\",\"status\":\"UNREAD\",\"ownerId\":\"10079876446\",\"shortText\":\"Ny faktura\",\"my3AppLink\":{\"buttonText\":\"Visa faktura\",\"url\":\"my3://Invoice.details/516521775523\"},\"notificationOwner\":\"account\",\"date\":\"2016-02-25\",\"eventTypeId\":\"newInvoice\"},{\"my3WebLink\":{\"buttonText\":\"Visa faktura\",\"url\":\"Mitt3/Faktura/#new-invoice\"},\"id\":\"e6a50bc2-eb7c-4e95-a769-3c57e9761d7d\",\"text\":\"Du har en ny faktura med förfallodag 2016-01-28.\",\"status\":\"UNREAD\",\"ownerId\":\"10079876446\",\"shortText\":\"Ny faktura\",\"my3AppLink\":{\"buttonText\":\"Visa faktura\",\"url\":\"my3://Invoice.details/516424402423\"},\"notificationOwner\":\"account\",\"date\":\"2016-01-19\",\"eventTypeId\":\"newInvoice\"},{\"my3WebLink\":{\"buttonText\":\"Visa faktura\",\"url\":\"Mitt3/Faktura/#new-invoice\"},\"id\":\"46551726-b721-4ac7-aca3-9d69982bc19f\",\"text\":\"Du har en ny faktura med förfallodag 2015-12-28.\",\"status\":\"READ\",\"ownerId\":\"10079876446\",\"shortText\":\"Ny faktura\",\"my3AppLink\":{\"buttonText\":\"Visa faktura\",\"url\":\"my3://Invoice.details/516287234020\"},\"notificationOwner\":\"account\",\"date\":\"2015-12-14\",\"eventTypeId\":\"newInvoice\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=${isid}\"},\"id\":\"5449f210-f8c6-4205-a5aa-d714b157b93f\",\"text\":\"46760171330 har förbrukat de 512 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. 
Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"READ\",\"ownerId\":\"163954130\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760171330\"},\"notificationOwner\":\"subscription\",\"date\":\"2015-12-17\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=163954113\"},\"id\":\"c9869eff-df61-402f-a1b6-017cfad6fd5a\",\"text\":\"46760185635 har förbrukat de 20480 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"UNREAD\",\"ownerId\":\"163954113\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760185635\"},\"notificationOwner\":\"subscription\",\"date\":\"2016-02-23\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=163954113\"},\"id\":\"bf5d8ada-6606-44c4-99b2-f0f69744eff0\",\"text\":\"46760185635 har förbrukat de 20480 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. 
Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"UNREAD\",\"ownerId\":\"163954113\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760185635\"},\"notificationOwner\":\"subscription\",\"date\":\"2016-01-21\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=${isid}\"},\"id\":\"f675e05b-f474-4144-be6f-3552e87c317b\",\"text\":\"46760185635 har förbrukat de 20480 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"UNREAD\",\"ownerId\":\"163954113\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760185635\"},\"notificationOwner\":\"subscription\",\"date\":\"2015-12-16\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=163929804\"},\"id\":\"a427cfa2-1c11-4b28-93b5-b6e6656bfedd\",\"text\":\"46760071705 har förbrukat de 512 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. 
Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"UNREAD\",\"ownerId\":\"163929804\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760071705\"},\"notificationOwner\":\"subscription\",\"date\":\"2016-03-02\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=${isid}\"},\"id\":\"1385dd11-5f83-4b0e-ae00-74b7f5fe7178\",\"text\":\"46760071705 har förbrukat de 512 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"UNREAD\",\"ownerId\":\"163929804\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760071705\"},\"notificationOwner\":\"subscription\",\"date\":\"2016-01-06\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=${isid}\"},\"id\":\"bb65db9c-4ba6-4097-8f39-4ee328403780\",\"text\":\"46760071705 har förbrukat de 512 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. 
Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"READ\",\"ownerId\":\"163929804\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760071705\"},\"notificationOwner\":\"subscription\",\"date\":\"2015-12-09\",\"eventTypeId\":\"outOfFreeUnits\"},{\"my3WebLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"Mitt3/Abonnemang/Abonnemangshantering?ent=163929802\"},\"id\":\"2f0654ac-66de-4f7e-a5fb-7ce691b001ab\",\"text\":\"46760071704 har förbrukat de 5120 MB data som ingår per månad. Köp extra surf om du vill fortsätta surfa. Den extra surfmängd du köper gäller tills den är förbrukad men som längst till månadsskiftet.\",\"status\":\"UNREAD\",\"ownerId\":\"163929802\",\"shortText\":\"Inkluderad data är slut\",\"my3AppLink\":{\"buttonText\":\"Köp extra surf\",\"url\":\"my3://Subscription.Voucher.National?msisdn=46760071704\"},\"notificationOwner\":\"subscription\",\"date\":\"2016-01-27\",\"eventTypeId\":\"outOfFreeUnits\"}]}",
"resource": "customers",
"subresource1": "525056841829790123726121104565",
"subresource2": "notifications",
"application": "smart-rest",
"logname": "messageRestLog"
}
How should each field be interpreted?
Option 1: Let Elasticsearch figure it out (dynamic mapping).
Option 2: Specify exactly what each field means (explicit mapping).
Mappings are defined per index.
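A rough Python sketch of what "Option 1" amounts to: guessing a type from each JSON value. This is an approximation, not Elasticsearch's implementation. The date regex loosely imitates the default ISO-8601-style date detection, and strings map to "string" as in the 2.x era (5.x and later use text plus a keyword sub-field). Numeric detection is off by default in Elasticsearch, so "@version": "1" stays a string. In this approximation logtimestamp ("2016-03-10 15:43:39,264") is not recognized as a date, which is the kind of case where Option 2 pays off.

```python
import re

# Loose imitation of ISO-8601 date detection (an assumption, not the real parser).
DATE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d{1,9})?)?(Z|[+-]\d{2}:?\d{2})?)?$"
)


def guess_field_type(value) -> str:
    """Guess a field type from a JSON value, in the spirit of dynamic mapping."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "date" if DATE_RE.match(value) else "string"
    raise TypeError(f"unsupported value: {value!r}")


def guess_mapping(doc: dict) -> dict[str, str]:
    return {field: guess_field_type(value) for field, value in doc.items()}


if __name__ == "__main__":
    sample = {  # fields taken from the log document above
        "@version": "1",
        "@timestamp": "2016-03-10T14:43:39.762Z",
        "logtimestamp": "2016-03-10 15:43:39,264",
        "statusCode": 200,
    }
    print(guess_mapping(sample))
```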
Suggestions
_suggest
We know it and love it...
Can be based on:
term
phrase
completion
context
The completion suggester builds an FST:
Fancy Smart Thing
-or-
Finite State Transducer
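A small Python sketch of weighted prefix completion. It deliberately swaps in a simpler technique: a sorted term list searched with bisect, which answers the same plain-prefix lookups as an FST but without its compact automaton layout. The class name and the weighted terms are hypothetical.

```python
from bisect import bisect_left


class PrefixSuggester:
    """Weighted prefix completion over a sorted term list (not a real FST)."""

    def __init__(self, weighted_terms: dict[str, int]):
        self.weights = weighted_terms
        self.terms = sorted(weighted_terms)

    def suggest(self, prefix: str, size: int = 3) -> list[str]:
        # Binary search for the first term >= prefix.
        start = bisect_left(self.terms, prefix)
        matches = []
        for term in self.terms[start:]:
            if not term.startswith(prefix):
                break  # sorted order: no later term can share the prefix
            matches.append(term)
        # Highest weight first, like weighted completion suggestions.
        matches.sort(key=lambda t: self.weights[t], reverse=True)
        return matches[:size]


if __name__ == "__main__":
    suggester = PrefixSuggester(
        {"elasticsearch": 10, "elastic": 7, "elk": 3, "kibana": 5, "elasticity": 1}
    )
    print(suggester.suggest("el"))  # ['elasticsearch', 'elastic', 'elk']
```

All terms sharing a prefix sit next to each other once sorted, so the scan stops at the first non-match instead of visiting every term.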
Aggregations
_search
...but differently
When we search, we get
Data
Metadata
Three kinds of metadata
Buckets
Metrics
Pipelines
Buckets
Each document is evaluated against the bucket criteria, and each bucket keeps track of which documents “fall” into it.
Metrics
Typically, metrics aggregations generate numeric stats that are computed over a specific document set.
Avg, Cardinality, Extended Stats, Geo Bounds, Geo Centroid, Max, Min, Percentiles, Percentile Ranks, Scripted Metric, Stats, Sum, Top hits, Value Count
Pipelines
Aggregate the output of other aggregations instead of documents
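A minimal Python sketch of the three kinds working together, assuming five hypothetical log documents whose field names are borrowed from the mapping example. The functions are named after real Elasticsearch aggregations (terms, avg, max_bucket) but are only illustrations: a bucket step groups documents, a metric step computes a number per bucket, and a pipeline step works on those numbers without touching documents.

```python
from collections import defaultdict

# Hypothetical log documents.
DOCS = [
    {"application": "smart-rest", "duration": 58},
    {"application": "smart-rest", "duration": 142},
    {"application": "smart-rest-legacy", "duration": 900},
    {"application": "smart-rest-legacy", "duration": 300},
    {"application": "billing", "duration": 20},
]


def terms_buckets(docs: list[dict], field: str) -> dict[str, list[dict]]:
    """Bucket: each document falls into the bucket for its field value."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def avg_metric(docs: list[dict], field: str) -> float:
    """Metric: a number computed over one bucket's documents."""
    return sum(doc[field] for doc in docs) / len(docs)


def max_bucket(metric_per_bucket: dict[str, float]) -> tuple[str, float]:
    """Pipeline: operates on another aggregation's output, not on documents."""
    key = max(metric_per_bucket, key=metric_per_bucket.get)
    return key, metric_per_bucket[key]


buckets = terms_buckets(DOCS, "application")
avg_duration = {key: avg_metric(docs, "duration") for key, docs in buckets.items()}

if __name__ == "__main__":
    print(avg_duration)              # per-bucket average durations
    print(max_bucket(avg_duration))  # ('smart-rest-legacy', 600.0)
```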
Percolation
_percolate
Like search, but reversed:
store the queries, then ask which of them a document matches.
"Send a message when temperature is over x degrees"
"Send a message when number of failures per minute is above x"
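A minimal Python sketch of the reversed search, assuming stored queries can be modeled as named predicates (a stand-in for stored query DSL bodies). The field names and the thresholds 30 and 10 are hypothetical values for the "x" in the examples above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoredQuery:
    name: str
    matches: Callable[[dict], bool]  # stand-in for a stored query DSL body


# Hypothetical stored queries echoing the alert examples.
STORED_QUERIES = [
    StoredQuery("temperature-alert", lambda doc: doc.get("temperature", 0) > 30),
    StoredQuery("failure-alert", lambda doc: doc.get("failures_per_minute", 0) > 10),
]


def percolate(doc: dict, queries: list[StoredQuery] = STORED_QUERIES) -> list[str]:
    """Reverse search: names of the stored queries that this one document matches."""
    return [q.name for q in queries if q.matches(doc)]


if __name__ == "__main__":
    print(percolate({"temperature": 35}))  # ['temperature-alert']
```

Normal search runs one query against many documents; here one incoming document runs against many stored queries, and each hit can trigger a message or a tag.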
Alerting
Like triggers, but different...
"If document contains X, then add field foo and bar"
Classification
Tags, for example...
Integration
Get data into Elasticsearch
_index
Get data out of Elasticsearch
_search
Programmatically...
My favourite language...
Java API
JavaScript API
Groovy API
.NET API
PHP API
Perl API
Python API
Ruby API
+ community-contributed clients
Kibana
Grafana
.questions(?)
Elasticsearch
By maderskog