Elasticsearch

Kibana

Elasticsearch

  • is real-time
  • is a distributed search and analytics engine
  • is a document store

Kibana

  • Visualization of data in Elasticsearch
    • Tables and charts
    • Geo data
    • Time series
    • Relationships in graph data

Basics

[Diagram: Cluster 1 consists of three nodes (Node 1, Node 2, Node 3). Index A is split into Shard 1, Shard 2 and Shard 3, with Replica 1, Replica 2 and Replica 3 spread across the nodes. Kibana connects to the cluster.]

Index

  • Index with static size
    • job, employee, candidate
  • Continuously growing index
    • logs, transactions, etc.
      • e.g. serilog-2018.01.01
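
The name serilog-2018.01.01 suggests one index per day. A minimal sketch of how such a name could be built, assuming a daily rollover and a `prefix-yyyy.MM.dd` pattern (the helper `daily_index_name` is hypothetical, not part of any library):

```python
from datetime import date


def daily_index_name(prefix: str, day: date) -> str:
    """Build a per-day index name such as 'serilog-2018.01.01'.

    Assumption: the naming pattern is '<prefix>-<year>.<month>.<day>'.
    """
    return f"{prefix}-{day:%Y.%m.%d}"


print(daily_index_name("serilog", date(2018, 1, 1)))  # serilog-2018.01.01
```

With names like this, old data can be dropped by deleting whole indices instead of individual documents.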

Data input

What's a document?

{
    "name": "John Doe",
    "age": 42,
    "confirmed": true,
    "created": "2018-01-01T12:00:00",
    "adress": {
        "street": "Gatan 1",
        "zip": "12344",
        "city": "Farsta"
    },
    "tags": [
        { "type": "Category", "value": "IT" },
        { "type": "Employment period", "value": "Deltid" }
    ]
}

Document metadata

  • _index: Name of the index the document lives in
  • _type: Name of the document's type (Elasticsearch versions before 6)
  • _id: Unique id of a document
  • Settings
  • Analyzers
  • Mappings

Schemas

Different types of search

Boolean searches

  • Efficient
  • Match or no match
  • Like WHERE in SQL
  • Does the data match?

Full text search

  • Slower than boolean searches (but more efficient than LIKE '%...%' searches in SQL)
  • Returns results ranked by relevance to the search
  • How well does the data match?

Combinations

Inverted index

How the data is stored in Elasticsearch explains how searches work

Given the following documents:

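  1. Den snabba bruna räven hoppar över den lata hunden (The quick brown fox jumps over the lazy dog)
  2. Snabba bruna rävar hoppar över lata hundar på sommaren (Quick brown foxes jump over lazy dogs in the summer)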

Inverted index

         |  1  |  2  |
----------------------
Den      |  x  |     |
---------|-----|-----|
snabba   |  x  |     |
---------|-----|-----|
bruna    |  x  |  x  |
---------|-----|-----|
räven    |  x  |     |
---------|-----|-----|
hoppar   |  x  |  x  |
---------|-----|-----|
över     |  x  |  x  |
---------|-----|-----|
den      |  x  |     |
---------|-----|-----|
lata     |  x  |  x  |
---------|-----|-----|
hunden   |  x  |     |
---------|-----|-----|
Snabba   |     |  x  |
---------|-----|-----|
rävar    |     |  x  |
---------|-----|-----|
hundar   |     |  x  |
---------|-----|-----|
på       |     |  x  |
---------|-----|-----|
sommaren |     |  x  |
----------------------

Query: snabba bruna

Index

Terms  |  1  |  2  |
--------------------
snabba |  x  |     |
-------|-----|-----|
bruna  |  x  |  x  |
--------------------
Total  |  2  |  1  |
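
The lookup above can be sketched in a few lines of Python. This is only a toy model of the idea (whitespace splitting, no normalization, a plain count of matching terms), not how Elasticsearch or Lucene is implemented; the function names are made up for the illustration.

```python
from collections import defaultdict

docs = {
    1: "Den snabba bruna räven hoppar över den lata hunden",
    2: "Snabba bruna rävar hoppar över lata hundar på sommaren",
}


def build_inverted_index(documents):
    """Map each term (split on whitespace, case preserved) to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def count_matches(index, query):
    """Count, per document, how many of the query terms it contains."""
    totals = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, ()):
            totals[doc_id] += 1
    return dict(totals)


index = build_inverted_index(docs)
print(count_matches(index, "snabba bruna"))  # {1: 2, 2: 1}
```

Document 2 only gets one hit because its "Snabba" is a different term from the query's "snabba" until the text is normalized.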

Normalization

         |  1  |  2  |
----------------------
den      |  x  |     |
---------|-----|-----|
snabb    |  x  |  x  |
---------|-----|-----|
brun     |  x  |  x  |
---------|-----|-----|
räv      |  x  |  x  |
---------|-----|-----|
hoppa    |  x  |  x  |
---------|-----|-----|
över     |  x  |  x  |
---------|-----|-----|
lata     |  x  |  x  |
---------|-----|-----|
hund     |  x  |  x  |
---------|-----|-----|
på       |     |  x  |
---------|-----|-----|
sommaren |     |  x  |
----------------------

Query: snabba bruna

Index

Terms  |  1  |  2  |
--------------------
snabb  |  x  |  x  |
-------|-----|-----|
brun   |  x  |  x  |
--------------------
Total  |  2  |  2  |
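
A rough sketch of the same lookup with normalization applied to both the documents and the query. The `STEMS` table is a hand-written assumption that covers only the words in this example; it stands in for a real Swedish stemmer and is not taken from any library.

```python
# Hypothetical, hand-written stem table for this example only.
STEMS = {
    "snabba": "snabb", "bruna": "brun", "räven": "räv", "rävar": "räv",
    "hoppar": "hoppa", "hunden": "hund", "hundar": "hund",
}

docs = {
    1: "Den snabba bruna räven hoppar över den lata hunden",
    2: "Snabba bruna rävar hoppar över lata hundar på sommaren",
}


def normalize(text):
    """Lowercase, split on whitespace, then replace each token with its stem if one is known."""
    return [STEMS.get(token, token) for token in text.lower().split()]


# Build the inverted index from normalized terms.
index = {}
for doc_id, text in docs.items():
    for term in normalize(text):
        index.setdefault(term, set()).add(doc_id)


def count_matches(query):
    """Normalize the query the same way as the documents, then count matching terms per document."""
    totals = {}
    for term in normalize(query):
        for doc_id in index.get(term, ()):
            totals[doc_id] = totals.get(doc_id, 0) + 1
    return totals


print(count_matches("snabba bruna"))  # {1: 2, 2: 2}
```

The important detail is that the query goes through the same `normalize` step as the documents; otherwise "snabba" would never find the indexed term "snabb".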

Analysis

  1. Character filters
    • Per-character transformation
    • e.g. strip HTML, map w -> v
  2. Tokenizer
    • Splits the text into words
  3. Token filters
    • lowercase, synonyms, stemming, etc.
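
The three stages can be pictured as functions applied one after another. The sketch below is a simplified stand-in written for illustration: the regular expressions and function names are assumptions, and the real character filters, tokenizers and token filters in Elasticsearch are configured in the index settings rather than written like this.

```python
import re


def char_filter(text):
    """Stage 1, character filters: strip HTML tags and map w -> v."""
    text = re.sub(r"<[^>]+>", "", text)
    return text.replace("w", "v").replace("W", "V")


def tokenizer(text):
    """Stage 2, tokenizer: split the text into word tokens."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Stage 3, token filters: only lowercasing here; synonyms or stemming would follow the same pattern."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Run the three stages in order."""
    return token_filters(tokenizer(char_filter(text)))


print(analyze("<b>Wiktor</b> Wallin"))  # ['viktor', 'vallin']
```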

Standard analyzers

  1. Standard analyzer
    • Splits on word boundaries as defined by Unicode, removes most punctuation, and lowercases terms (generally the best choice)
  2. Simple analyzer
    • Splits the text on anything that isn't a letter, and lowercases the terms
  3. Whitespace analyzer
    • Splits on whitespace, does not lowercase
  4. Language analyzers
    • Language-specific analyzers
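
To make the difference concrete, here is a small Python imitation of the simple and whitespace analyzers as they are described above. These are approximations for illustration only (the function names are invented and the real analyzers live inside Elasticsearch); the standard and language analyzers are left out because their rules are more involved.

```python
import re


def simple_analyzer(text):
    """Split on anything that is not a letter, then lowercase the terms."""
    # [^\W\d_] matches Unicode letters: word characters minus digits and underscore.
    return [term.lower() for term in re.findall(r"[^\W\d_]+", text)]


def whitespace_analyzer(text):
    """Split on whitespace only; casing and punctuation are kept."""
    return text.split()


text = "Gatan 1, Farsta-Strand"
print(simple_analyzer(text))      # ['gatan', 'farsta', 'strand']
print(whitespace_analyzer(text))  # ['Gatan', '1,', 'Farsta-Strand']
```

Note how the simple analyzer drops the house number entirely, while the whitespace analyzer keeps the trailing comma as part of the term.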

When are analyzers used?

On all full-text fields

They are applied both when a document is indexed and, at search time, to the search string

Mapping

{
    "mappings": {
        "candidate": {
            "properties": {
                "name": { "type": "text" },
                "birth_date": { "type": "date" },
                "adress": {
                    "properties": {
                        "street": { "type": "text" },
                        "zipcode": { "type": "long" },
                        "city": { "type": "keyword" }
                    }
                },
                "contacts": {
                    "properties": {
                        "home_phone": { "type": "keyword" },
                        "mobile_phone": { "type": "keyword" },
                        "email": { "type": "keyword" }
                    }
                },
                "ambition": {
                    "type": "text"
                }
            }
        }
    }
}

Index mapping

Queries

Match_All

POST candidates/_search

POST candidates/_search
{}

POST candidates/_search
{
  "query": {
    "match_all": {}
  }
}

Match

// match on full text
POST candidates/candidate/_search
{
  "query": {
    "match": {
      "name": "Maria"
    }
  }
}

POST candidates/candidate/_search
{
  "query": {
    "match": {
      "adress.city": "Karlstad"
    }
  }
}

Term/Terms

// Ok: "name" is a text field, so "Maria" was indexed as the lowercase term "maria"
POST candidates/candidate/_search
{
  "query": { "term": { "name": "maria" } }
}

// Fails (wrong casing): a term query is not analyzed, and the indexed term is "maria"
POST candidates/candidate/_search
{
  "query": { "term": { "name": "Maria" } }
}

// Ok: "adress.city" is a keyword field, so the value is indexed exactly as given
POST candidates/candidate/_search
{
  "query": { "term": { "adress.city": "Karlstad" } }
}

// Fails (wrong casing): keyword values are matched exactly
POST candidates/candidate/_search
{
  "query": { "term": { "adress.city": "karlstad" } }
}
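
Why the casing matters can be modelled in a few lines of Python. This is a toy model under stated assumptions: a text field is analyzed by lowercasing and splitting on whitespace, a keyword field is stored as one unchanged term, and the sample person "Maria Svensson" is invented. It mimics the behaviour of the four term queries above, not the actual Elasticsearch implementation.

```python
def text_analyzer(value):
    """Stand-in for a text field's analyzer: lowercase and split on whitespace."""
    return value.lower().split()


def keyword_analyzer(value):
    """Keyword fields keep the whole value as a single, unchanged term."""
    return [value]


# Field types follow the mapping shown earlier: name is text, adress.city is keyword.
FIELD_ANALYZERS = {"name": text_analyzer, "adress.city": keyword_analyzer}


def index_document(doc):
    """Store the analyzed terms per field, as an inverted index would."""
    return {field: set(FIELD_ANALYZERS[field](value)) for field, value in doc.items()}


def term_query(indexed, field, value):
    """term: the search value is compared to the indexed terms without any analysis."""
    return value in indexed[field]


def match_query(indexed, field, text):
    """match: the search text is analyzed with the field's analyzer before the lookup."""
    return any(term in indexed[field] for term in FIELD_ANALYZERS[field](text))


indexed = index_document({"name": "Maria Svensson", "adress.city": "Karlstad"})

print(term_query(indexed, "name", "maria"))            # True
print(term_query(indexed, "name", "Maria"))            # False
print(term_query(indexed, "adress.city", "Karlstad"))  # True
print(term_query(indexed, "adress.city", "karlstad"))  # False
print(match_query(indexed, "name", "Maria"))           # True
```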

Range

POST candidates/candidate/_search
{
  "query": { 
    "range": {
      "adress.zipcode": {
        "gte": 13501
      }
    }
  }
}

POST candidates/candidate/_search
{
  "query": { 
    "range": {
      "adress.zipcode": {
        "lte": 13500
      }
    }
  }
}

Aggregations

Types

  1. Metrics
    • Min, max, percentiles etc
  2. Buckets
    • Groupings such as terms, histogram, date histogram, etc.
  3. Pipeline
    • Aggregations computed from the output of other aggregations

Aggregations together with search

  • Faceted search
  • Combination of filters and search
  • Aggregations are used to build the filters
  • The most common aggregation is grouping by term (terms aggregation)
  • Histogram (distribution of numeric values, e.g. age, salary or price)

Terms

GET /_search
{
    "aggs" : {
        "genres" : {
            "terms" : { "field" : "genre" }
        }
    }
}
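
Conceptually, a terms aggregation counts documents per distinct value of a field. A minimal Python sketch of that idea follows; the sample documents are invented, and the `key`/`doc_count` bucket shape is used here only to resemble the aggregation response.

```python
from collections import Counter

# Invented sample documents for the illustration.
docs = [{"genre": "rock"}, {"genre": "jazz"}, {"genre": "rock"}]


def terms_agg(documents, field):
    """Group documents by the value of `field` and count them, largest bucket first."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


print(terms_agg(docs, "genre"))
# [{'key': 'rock', 'doc_count': 2}, {'key': 'jazz', 'doc_count': 1}]
```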

Histogram

POST /sales/_search?size=0
{
    "aggs" : {
        "prices" : {
            "histogram" : {
                "field" : "price",
                "interval" : 50
            }
        }
    }
}
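
With an interval of 50, each price falls into the bucket whose key is the price rounded down to the nearest multiple of 50. The sketch below shows that arithmetic on invented prices; unlike this toy version, Elasticsearch may also return empty buckets between the lowest and highest key, depending on the aggregation settings.

```python
import math
from collections import Counter


def histogram_agg(values, interval):
    """Bucket each value under floor(value / interval) * interval and count per bucket."""
    counts = Counter(math.floor(value / interval) * interval for value in values)
    return [{"key": key, "doc_count": counts[key]} for key in sorted(counts)]


prices = [10, 45, 60, 120, 149, 150]  # invented sample data
for bucket in histogram_agg(prices, 50):
    print(bucket)
# {'key': 0, 'doc_count': 2}
# {'key': 50, 'doc_count': 1}
# {'key': 100, 'doc_count': 2}
# {'key': 150, 'doc_count': 1}
```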

Kibana

Applications

Discover

Visualize

Dashboard

Timelion

Graph

Dev tools

Monitor

Management

Discover

A tool for exploring data through searching and filtering

https://lucene.apache.org/core/2_9_4/queryparsersyntax.html

Visualize

  • Tables
  • Charts
    • Pie
    • Bar
    • Line
    • Area
    • Histogram
    • Date histogram
    • Heatmap
  • Time series charts

Dashboard

A grouping of searches and visualizations

Elasticsearch Kibana

By fhelje
