CMSC389L
Week 13
Search Engines w/ Elasticsearch
Friday, April 27, 2018
Demo Setup
Search Engines
Local Event App
Let's "build" an app to search for local events.
John Berryman: https://pyohio.org/schedule/presentation/258/
Why Search Engines
- Databases are good at storing and retrieving data
  - but not at full-text search
- Want to find docs with specific terms and phrases?
- Want to score and sort documents by relevance?
- Want to perform complex query operations?
Then you need a search engine.
John Berryman: https://pyohio.org/schedule/presentation/258/
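To make "score and sort documents by relevance" concrete, here is a toy inverted index in Python. It is a sketch under simple assumptions (whitespace tokenization, raw term-frequency scoring), and `ToyIndex` is a hypothetical name. Elasticsearch itself uses analyzers and, by default since 5.0, BM25 scoring.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Toy analyzer: lowercase + whitespace split (real analyzers also stem, strip punctuation, etc.).
    return text.lower().split()


class ToyIndex:
    """Hypothetical inverted index: term -> {doc_id: term frequency}."""

    def __init__(self) -> None:
        self.postings: Dict[str, Dict[str, int]] = defaultdict(dict)

    def add(self, doc_id: str, text: str) -> None:
        # Record how often each term appears in this document.
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query: str) -> List[Tuple[str, int]]:
        # Score each matching doc by summing the frequencies of the query terms it contains.
        scores: Counter = Counter()
        for term in tokenize(query):
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf
        # Highest score first; ties broken by doc_id so results are deterministic.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


events = ToyIndex()
events.add("e1", "Jazz concert in the park")
events.add("e2", "Park cleanup volunteer day")
events.add("e3", "Jazz jazz jam session")
print(events.search("jazz park"))  # [('e1', 2), ('e3', 2), ('e2', 1)]
```

A database filter can tell you which rows contain "jazz", but the postings lists give a search engine both the matches and a way to rank them.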
Search Engine Use Cases
- Search Engines
  - Find all products for "running shoes".
- Log Search/Analysis
  - Return all logs with user ID "12345" in them.
  - How many 500 errors in the past hour for that user?
- Geo Search
  - Return all "Papa Johns" ordered by proximity to (38.989697, -76.937760).
- Autocompletion
  - Autocomplete "maryla..."
- ...
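Each use case above maps onto an Elasticsearch Query DSL request body. The sketch below writes them as Python dicts. The query types (`match`, `bool`/`filter`, `term`, `range`, `_geo_distance` sort, `match_phrase_prefix`) are real Query DSL, but the field names (`name`, `user_id`, `status`, `@timestamp`, `location`) are hypothetical, and the geo sort assumes `location` is mapped as a `geo_point`.

```python
import json

# Field names below are hypothetical; adapt them to your own mapping.

# Product search: full-text match, results ranked by relevance.
product_search = {"query": {"match": {"name": "running shoes"}}}

# Log analysis: filter (no scoring) to one user's 500s in the last hour.
# Send this body to the _count endpoint to get "how many?" instead of the hits.
user_500s_last_hour = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"user_id": "12345"}},
                {"term": {"status": 500}},
                {"range": {"@timestamp": {"gte": "now-1h"}}},
            ]
        }
    }
}

# Geo search: match by name, then sort by distance from a point.
# Assumes "location" is mapped as a geo_point.
nearest_papa_johns = {
    "query": {"match": {"name": "Papa Johns"}},
    "sort": [
        {
            "_geo_distance": {
                "location": {"lat": 38.989697, "lon": -76.937760},
                "order": "asc",
                "unit": "km",
            }
        }
    ],
}

# Autocomplete: treat the last word as a prefix ("maryla" can match "maryland").
# Elasticsearch also offers a dedicated completion suggester for this.
autocomplete = {"query": {"match_phrase_prefix": {"name": "maryla"}}}

for body in (product_search, user_500s_last_hour, nearest_papa_johns, autocomplete):
    print(json.dumps(body))
```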
Elasticsearch
Elasticsearch at a High Level
- Distributed, full-text search engine by Elastic
- Schemaless JSON database
  - "schema-optional"
- Accessed via HTTP API
- Open-sourced: https://github.com/elastic/elasticsearch
How it works: Documents
- Indexable content consists of JSON documents
  - arbitrary structure, no schema required
- Each document consists of fields (key-value pairs)
- Each document has an ID (_id)
{
  "_id": "938hon049j4039f",
  "name": "John Dough",
  "birthday": "1970-07-01T11:50:16-05:00",
  "passions": [
    "water skiing",
    "coffee",
    "wood working"
  ],
  "address": {
    "line_1": "1 Margrove Rd.",
    "line_2": "",
    "city": "College Park",
    "country": "United States",
    "zip": 20742
  }
}
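Inner objects such as `address` are not stored as separate records. Elasticsearch flattens them into dotted field names, so you can query `address.city` directly. Below is a minimal Python sketch of that flattening. The `flatten` helper is hypothetical and for illustration only. It leaves arrays untouched, so an array of scalars like `passions` becomes one multi-valued field.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested objects into dotted field names, e.g. address.city."""
    fields: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into inner objects, extending the dotted path.
            fields.update(flatten(value, path))
        else:
            # Scalars and arrays (multi-valued fields) are kept as-is.
            fields[path] = value
    return fields


person = {
    "name": "John Dough",
    "passions": ["water skiing", "coffee", "wood working"],
    "address": {"city": "College Park", "zip": 20742},
}
print(flatten(person))
```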
How it works: Types
- Types:
  - Each document belongs to a type
  - A type can optionally declare a mapping (type declaration)
    - good for performance
    - specifies field types, analyzers, etc.
{
  "mappings": {
    "people": {
      "properties": {
        "name": {
          "type": "text"
        },
        "address": {
          "type": "text"
        }
      }
    },
    "transactions": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format": "strict_date_optional_time"
        },
        "message": {
          "type": "text"
        }
      }
    }
  }
}
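Mappings are sent when an index is created, with a `PUT` to the index name. The Python sketch below only builds that request with the standard library and does not send it. It assumes an unsecured node at `localhost:9200` and reuses the `data` index and `transactions` type from the URL examples. It keeps one type per index, because indexes created in Elasticsearch 6.x accept only a single type.

```python
import json
import urllib.request

# Assumption: a local node without authentication.
ES_URL = "http://localhost:9200"

transactions_mapping = {
    "mappings": {
        "transactions": {
            "properties": {
                "timestamp": {"type": "date", "format": "strict_date_optional_time"},
                # "text" is the analyzed string type that replaced "string" in 5.x.
                "message": {"type": "text"},
            }
        }
    }
}

create_index = urllib.request.Request(
    f"{ES_URL}/data",
    data=json.dumps(transactions_mapping).encode("utf-8"),
    # Elasticsearch 6.x rejects request bodies without a Content-Type header.
    headers={"Content-Type": "application/json"},
    method="PUT",
)

# To actually create the index against a running cluster:
# with urllib.request.urlopen(create_index) as resp:
#     print(json.load(resp))
```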
How it works: Indexes
- Indexes are just namespaces for your types
- Nothing to do with database indexes!
http://localhost:9200/<index>/<type>/<id>
http://localhost:9200/data/transactions/<id>
http://localhost:9200/data/products/<id>
http://localhost:9200/colink/tweets/<id>
http://localhost:9200/colink/pictures/<id>
http://localhost:9200/johndough/tweets/<id>
http://localhost:9200/johndough/pictures/<id>
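Because every document has a URL of the form `/<index>/<type>/<id>`, indexing and fetching are plain HTTP calls. The stdlib-only Python sketch below builds, but does not send, a `PUT` that stores a document and a `GET` that reads it back. The node address, the tweet ID `1`, the tweet body, and the helper names `doc_url` and `es_request` are all hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional
from urllib.parse import quote

ES_URL = "http://localhost:9200"  # assumption: local, unsecured node


def doc_url(index: str, doc_type: str, doc_id: str) -> str:
    # Percent-encode each path segment so IDs with special characters stay safe.
    segments = (quote(part, safe="") for part in (index, doc_type, doc_id))
    return ES_URL + "/" + "/".join(segments)


def es_request(method: str, url: str,
               body: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    # JSON-encode the body (if any) and label it, as Elasticsearch 6.x requires.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


url = doc_url("johndough", "tweets", "1")
index_tweet = es_request("PUT", url, {"message": "Hello from College Park"})
get_tweet = es_request("GET", url)

# Against a running cluster:
# with urllib.request.urlopen(get_tweet) as resp:
#     print(json.load(resp)["_source"])
```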
Low-Level Architecture
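- An index can be too large to store on 1 node
  - instead, split the index into smaller pieces (shards)
  - spread the shards across multiple nodes in a cluster
- What happens if a node holding a shard crashes?
  - Replicate each shard; a replica never lives on the same node as its primary
- Default (Elasticsearch 6.x): 5 primary shards, 1 replica of each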
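Elasticsearch picks a document's primary shard with `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. This is why the number of primary shards cannot simply be changed on an existing index. The Python sketch below mimics that rule and a simple replica placement. It is illustrative only: Elasticsearch hashes with Murmur3, whereas this sketch uses `zlib.crc32` as a stand-in. The round-robin `place_shards` helper is also hypothetical; the real allocator balances many more factors.

```python
import zlib
from typing import Dict, List

NUM_PRIMARY_SHARDS = 5  # Elasticsearch 6.x default
NUM_REPLICAS = 1        # Elasticsearch 6.x default


def route(doc_id: str, num_primary_shards: int = NUM_PRIMARY_SHARDS) -> int:
    # Stand-in for hash(_routing) % number_of_primary_shards (ES uses Murmur3, not CRC32).
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def place_shards(num_nodes: int,
                 num_primary_shards: int = NUM_PRIMARY_SHARDS,
                 num_replicas: int = NUM_REPLICAS) -> Dict[int, List[int]]:
    """Hypothetical round-robin placement: shard -> [primary node, replica nodes...]."""
    copies = num_replicas + 1
    if num_nodes < copies:
        # Elasticsearch would leave the extra replicas unassigned (yellow health);
        # this sketch simply refuses.
        raise ValueError("need at least one node per shard copy")
    # Shifting each copy by one node (with copies <= num_nodes) keeps all copies of a shard apart.
    return {
        shard: [(shard + copy) % num_nodes for copy in range(copies)]
        for shard in range(num_primary_shards)
    }


print(route("938hon049j4039f"))  # some shard number in 0..4
print(place_shards(num_nodes=3))
```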
Apache Lucene
- Apache Lucene (Java library)
  - High-performance, full-text search engine library
  - Manages a single index on a single node
- So why Elasticsearch?
  - ES provides a management layer on top of Lucene
  - Provides:
    - Replication
    - Traffic distribution
    - Consensus + failover
    - Data sharding
    - Support for multiple indexes
    - HTTP API
    - ...
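One job of that management layer is to fan a search out to every shard and merge the answers. Elasticsearch's default `query_then_fetch` search first asks each shard for its own top hits (IDs and scores). It merges those on the coordinating node and only then fetches the winning documents. The Python sketch below shows just the merge step. The scores are hypothetical precomputed values standing in for Lucene's relevance scoring.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


def query_shard(shard: Dict[str, float], k: int) -> List[Hit]:
    # Query phase on one shard: return only that shard's local top-k hits.
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in shard.items()))


def scatter_gather(shards: List[Dict[str, float]], k: int) -> List[Hit]:
    # Coordinating node: scatter the query, then merge the per-shard top-k lists.
    partial_results = [query_shard(shard, k) for shard in shards]
    return heapq.nlargest(k, (hit for hits in partial_results for hit in hits))


# Hypothetical scores for documents spread over three shards.
shards = [{"a": 0.9, "b": 0.2}, {"c": 0.5, "d": 0.95}, {"e": 0.1}]
print(scatter_gather(shards, k=2))  # [(0.95, 'd'), (0.9, 'a')]
```

Asking each shard for only k hits is enough: any document in the global top k must also be in its own shard's top k.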
AWS Elasticsearch Service
Why AWS ES?
- AWS ES handles cluster management for you
  - Detects and replaces failed nodes
  - Cluster scaling through configuration changes
  - Data durability
  - Node monitoring
- Integrations with other AWS services
That's about it...
Elasticsearch Demo!
Worksheet Tasks:
- What is the title of the movie (from /omdb) with id: kP5uCGMBRZqaOuquTh81
- How many movies in this dataset were released in 2008?
- What is the id of the movie with the title "Ghostbusters"?
Postman collection: ter.ps/389lpostman
Elasticsearch Endpoint: (See Postman)
Submit a .txt file with your answers + queries to the submit server.
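The worksheet questions map onto three Query DSL bodies. Below is a Python sketch of their shape. The `ids`, `match`, and `match_phrase` queries and the `_search` and `_count` endpoints are real. The field names `Year` and `Title` are guesses about the /omdb mapping, so check it with `GET /omdb/_mapping` before relying on them.

```python
import json

# The index name comes from the worksheet; the field names "Year" and "Title" are assumptions.
worksheet_queries = [
    # Q1: fetch a document by _id without needing to know its type.
    ("GET", "/omdb/_search", {"query": {"ids": {"values": ["kP5uCGMBRZqaOuquTh81"]}}}),
    # Q2: _count returns just the number of matching documents.
    ("GET", "/omdb/_count", {"query": {"match": {"Year": "2008"}}}),
    # Q3: phrase match on the title; "_source": False returns only IDs and scores.
    ("GET", "/omdb/_search",
     {"query": {"match_phrase": {"Title": "Ghostbusters"}}, "_source": False}),
]

for method, path, body in worksheet_queries:
    print(method, path, json.dumps(body))
```

A phrase match also hits titles that merely contain the phrase (a sequel, for example), so inspect the returned hits rather than taking the first one.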
Wrapping Up
Codelabs:
- DynamoDB out this weekend
- ECS out next week (last one!)
Final Project:
- Checkpoint #2 this Sunday
Feedback form: ter.ps/feedback13
CMSC389L Week 13
By Colin King