Distributed, highly available documents storage
Different APIs
• _mappings
• _create
• _update
• _search
• _suggest
• _percolate
• _explain
indices are like databases in relational databases.
Also, an index can have an alias.
Aliases are useful for reindex operations without changing application code,
Also can be used for grouping multiple indices with the same name or for create something like SQL views.
Updating an alias is performed by executing the operation of remove and add the alias, but elastic treats this as an atomic operation.
Sometimes, you need to create index on the flight... so, instead of creating the index before inserting your first document, preferred approach is to create an index template.
Sample use case is for logs, when an index per day is created automatically when 1st log line is inserted.
Mappings are like the relational database tables... more or less ;)
Mappings could be auto-created on the first document insert, but this is highly not recommended since you can't control accurately the type that the engine are going to assign to each field.
Each field type has its own attributes here some examples:
To a string you could set the analyzer (for the language), and you can configure if the field has to be searchable or not... This is a point to take in account, since you can increase the cluster performance by making searchable only the needed fields.
To a date you can specify if you should accept or not malformed dates.
Here you can PUT, or POST documents like any REST api.
If you use POST, an id will be generated, if PUT is used, previous version of document will be replaced if exists
Documents will be saved as inserted, this means that if you insert locations (or dates... or whatever) in different formats, that documents will be retrieved with that inconsistencies.
This operation can be performed with a simple DELETE HTTP request.
Deleting multiple documents by a query was deprecated because of problems with the cache state before that operations...
There is an external plugin for that.
# Looks for exact term
# And yes, a GET request can have body!
curl -XGET "http://localhost:9200/ads/ad/_search" -d '
{
"query": {
"term": {
"zipCode": "08001"
}
}
}
' | python -m json.tool
# Short form for simple queries
curl -XGET "http://localhost:9200/ads/ad/_search?q=zipCode:08001"
# Term search for analyzed field
curl -XGET "http://localhost:9200/ads/ad/_search" -d '
{
"query": {
"term": {
"title": "thing"
}
}
}
' | python -m json.tool
terms search looks for the term in the
# Saved as received, array fits in any mapping.
curl -XPUT "http://localhost:9200/ads/ad/10" -d'
{
"id": 10,
"title": "Anohter one... really?",
"body": "bahhh.",
"price": 2000,
"location": [2.1660139, 41.3791979],
"zipCode": [ "08001", "08002" ]
}' | python -m json.tool
# Will retrieve all ads
curl -XGET "http://localhost:9200/ads/ad/_search?q=zipCode:08001" | python -m json.tool
# Will retrieve only new ad added in this file,
# because is the only one with 08002
curl -XGET "http://localhost:9200/ads/ad/_search?q=zipCode:08002" | python -m json.toolAny field can be an array, no needed mapping option is needed to handle this.
curl -XGET "http://localhost:9200/ads/ad/_search" -d '
{
"query": {
"range": {
"price": {
"gte": 15,
"lte": 20.6
}
}
}
}
' | python -m json.toolRange searches could be done easily like this
curl -XGET "http://localhost:9200/ads/ad/_search" -d '
{
"query": {
"match": {
"title": {
"query": "Another beautiful thing",
"minimum_should_match": "75%"
}
}
}
}
' | python -m json.toolMatch queries are used to gain control over searched text.
Fuzziness could be specified, also operator for various terms, the analyzer...
curl -XGET "http://localhost:9200/ads/ad/_search" -d '
{
"query": {
"match_all": {}
},
"sort": {
"price": "desc",
"id": "desc"
}
}
' | python -m json.toolSort does not hide any secret, you can sort by multiple fields like in any other SQL database
The queries in Elastic search are divided in 2 main parts:
The query contains the filters that are affecting to the score.
The filter part is not affecting to the score, so, filters are faster and also cacheables by the engine.
curl -XGET "http://localhost:9200/ads/ad/_search" -d '
{
"query": {
"match": {
"title": {
"query": "Another beautiful thing",
"operator": "and"
}
}
},
"filter": {
"range": {
"price": {
"gte": 15,
"lte": 20.6
}
}
}
}
' | python -m json.tool
The score is an arbitrary number that elasticsearch assign based on how much a document matches the current query.
There are 3 concepts that ES uses to assign the score of a text search:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.625,
"hits": [
{
"_index": "ads",
"_type": "ad",
"_id": "1",
"_score": 0.625,
"_source": {
"id": 1,
"title": "Beautiful thing",
"body": "It's a lie, it's ugly thing.",
"price": 20.5,
"location": "41.3783325,2.1686425",
"zipCode": "08001"
}
},
{
"_index": "ads",
"_type": "ad",
"_id": "2",
"_score": 0.15342641,
"_source": {
"id": 2,
"title": "Another beautiful thing",
"body": "worst thing ever.",
"price": 15,
"location": [
2.1660139,
41.3791979
],
"zipCode": "08001"
}
}
]
}
}Geoqueries are expensive! you should consider how much accuracy do you really need to make searches fast.
WTF!? this is what happens when you interchange lat/long at some point
Arc: The slowest but most accurate is the arc calculation, which treats the world as a sphere. Accuracy is still limited because the world isn’t really a sphere.
Plane: The plane calculation, which treats the world as if it were flat, is faster but less accurate. It is most accurate at the equator and becomes less accurate toward the poles.
Sloppy arc: So called because it uses the SloppyMath Lucene class to trade accuracy for speed, the sloppy_arccalculation uses the Haversine formula to calculate distance. It is four to five times as fast as arc, and distances are 99.9% accurate. This is the default calculation.
The geohash are a standard to hash locations with the desired preccision in a single value.
Precision goes from 1 to 12
There are many ways to make geolocated queries in elastic search:
curl -XGET "http://localhost:9200/ads/ad/_search" -d '
{
"query": {
"match_all": {}
},
"filter": {
"geo_distance" : {
"location" : [ 2.1677398681640625, 41.37794494628906 ],
"distance" : "2000m",
"distance_type" : "plane",
"optimize_bbox" : "indexed"
}
},
"sort" : [
{
"_geo_distance" : {
"location" : [ {
"lat" : 41.37794494628906,
"lon" : 2.1677398681640625
} ],
"unit" : "m"
}
}
]
}
' | python -m json.tool