Intro to ElasticSearch
by @vincent_lcy
What are we doing here today?
For Developers
Front end / Back End
Assume you know about
JSON/REST
You don't need to be an expert in
ElasticSearch / Search / NoSQL / HTML5 / Java
I will talk about
Basics, concepts from different perspectives
Show you a demo
Add search feature to your website
Vincent LAU
Javascript, Java
@vincent_lcy
kleineblase.wordpress.com/
http://bit.ly/hkosc2013_es
Powering...
Github: Repo/Every line of Code/Users
IS ES A SEARCH ENGINE?
http://www.elasticsearch.org/overview/
flexible and powerful open source, distributed real-time search and analytics engine for the cloud
IDEA ABOUT SEARCH IN 1 MINUTE
Don't look for a number in a phone book sorted by names
Original data structure is inefficient for look up by value
"INDEX"
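The phone book idea above can be sketched in a few lines. This is only an illustration: the names, numbers, and function names are made up, and a plain dictionary stands in for a real search index.

```python
# Hypothetical phone book, organised by name like the printed book.
phone_book = {
    "Alice": "5550-1234",
    "Bob": "5550-5678",
    "Carol": "5550-9012",
}


def find_name_by_scan(number):
    """Look up by value without an index: check every entry in turn."""
    for name, entry_number in phone_book.items():
        if entry_number == number:
            return name
    return None


# Build the "index" once: a reverse mapping from number to name.
number_index = {number: name for name, number in phone_book.items()}


def find_name_by_index(number):
    """Look up by value with the index: a single dictionary access."""
    return number_index.get(number)


print(find_name_by_scan("5550-5678"))
print(find_name_by_index("5550-5678"))
```

Both calls find the same entry; the difference is that the scan touches every record while the index is built once and then answers each lookup directly.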
GOOGLE AS AN EXAMPLE
Crawlers index the whole Web
Photo Source: http://ianieba.com/how-to-optimize-your-site-architecture/
ELASTICSEARCH...
Levenshtein Automata
Finite State Transducers
Search is Hard
APACHE SEARCH STACK
Lucene = Core Indexing/Search Libraries (Doug Cutting)
Hadoop = Map Reduce (Doug Cutting)
Solr = Search Server w/ Lucene - Parser
Nutch (Doug Cutting)
= Web Crawler / Search Engine = Lucene + (Hadoop) + (Solr)
Elastic
SO WHY SEARCH?
AngularJS said:
90% of applications are CRUD
I will say
Most Apps are good fit for
Search-based Navigation
Most others need search features anyway
Examples
Hotels.com, TripAdvisor
"Faceted Search"
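A rough sketch of what faceted search does on sites like these: count results per field value, then narrow the result set when the user picks one. The hotel records and field names here are invented for illustration, and this is plain Python rather than an ES facet request.

```python
from collections import Counter

# Hypothetical hotel records; field names are made up for illustration.
hotels = [
    {"name": "Harbour View", "city": "Hong Kong", "stars": 4},
    {"name": "Peak Lodge", "city": "Hong Kong", "stars": 3},
    {"name": "Bay Inn", "city": "Macau", "stars": 4},
]


def facet_counts(docs, field):
    """Count how many documents fall under each value of a field."""
    return Counter(doc[field] for doc in docs)


def narrow(docs, field, value):
    """Keep only the documents whose field equals the selected facet value."""
    return [doc for doc in docs if doc[field] == value]


# Show the counts a user would see next to each city, then drill down.
print(facet_counts(hotels, "city"))
hong_kong = narrow(hotels, "city", "Hong Kong")
print(facet_counts(hong_kong, "stars"))
```

Each click recomputes the counts on the narrowed set, which is what makes the facets feel like navigation.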
Look Closer
IS ES A CRAWLER?
RIVER PLUGIN
Many storage services provide a feed of recent changes
/_changes, /_delta
ES polls for changes and indexes them automatically
http://www.elasticsearch.org/guide/en/elasticsearch/rivers/current/river.html
http://guide.couchdb.org/draft/notifications.html
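The polling idea can be sketched as follows. This is not the river plugin's code or API: the feed is an in-memory list, and the "seq", "id", "doc", and "deleted" keys are assumed names chosen for the example.

```python
# Hypothetical change feed: each entry carries an increasing sequence number.
example_feed = [
    {"seq": 1, "id": "a", "doc": {"title": "first"}},
    {"seq": 2, "id": "b", "doc": {"title": "second"}},
    {"seq": 3, "id": "a", "deleted": True},
]


def poll_changes(feed, last_seq):
    """Return the changes newer than last_seq, plus the new checkpoint."""
    new_changes = [change for change in feed if change["seq"] > last_seq]
    if new_changes:
        last_seq = max(change["seq"] for change in new_changes)
    return new_changes, last_seq


def apply_changes(index, changes):
    """Mirror each change into the index (a dict standing in for ES)."""
    for change in changes:
        if change.get("deleted"):
            index.pop(change["id"], None)
        else:
            index[change["id"]] = change["doc"]


index = {}
changes, checkpoint = poll_changes(example_feed, last_seq=0)
apply_changes(index, changes)
print(index, checkpoint)
```

Remembering the checkpoint is the important part: the next poll only asks for what changed since the last one.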
IS ES A DB?
Better question
Is ES good enough as a DB for my app?
Security concerns
Comparison
NO Eventual consistency
each ElasticSearch operation is atomic, durable, and isolated. An operation is hashed to a specific shard, performed on it, and then replicated to all its replicas. When the operation returns, it has already been replicated to all the replicas and it is "safely" there
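A minimal sketch of the "hashed to a specific shard, then replicated" flow described above. CRC32 is only a stand-in for whatever hash function ES uses internally, and the cluster layout (a list of dicts with "primary" and "replicas") is an assumption made for this example.

```python
import zlib


def pick_shard(doc_id, num_shards):
    """Map a document id to a shard number (CRC32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index_document(cluster, doc_id, doc):
    """Write to the chosen shard, then to every replica, before returning."""
    shard = cluster[pick_shard(doc_id, len(cluster))]
    shard["primary"][doc_id] = doc
    for replica in shard["replicas"]:
        replica[doc_id] = doc
    return True  # only reached once all replicas hold the document


# Hypothetical cluster: 3 shards, each with 2 replicas.
cluster = [{"primary": {}, "replicas": [{}, {}]} for _ in range(3)]
index_document(cluster, "doc-1", {"title": "hello"})
print(pick_shard("doc-1", len(cluster)))
```

Because the same id always hashes to the same shard, a later read or update for that id knows where to go without asking every shard.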
CAP:
MOST DB FEATURES
Transaction
High Availability
Sort
Query
Data Types
Storage
In-memory Cache
ELASTICSEARCH MAY NOT BE THE ANSWER FOR EVERY QUERY
Report generation? Archiving?
Graph-based Query -> Graph DB
neo4j
IS IT NoSQL?
Good for small/big Data!
Key Value vs Document-based
Document: Lucene Document
The DB sees its structure
-> field-based query & retrieval & indexing
by Pramod J Sadalage & Martin Fowler
WHAT CAN I INDEX
ANYTHING!
Transform it into input for ES: anything JSON-compatible
Natural fit for a document-oriented database
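As a sketch of "transform to JSON", here is one way to turn an application record into a JSON document. The record and its field names are invented for the example; the only point is that values JSON cannot represent directly (dates, sets) need converting first.

```python
import json
from datetime import date

# Hypothetical record; field names are made up for illustration.
record = {
    "title": "Night sky reading",
    "measured_on": date(2013, 2, 16),
    "tags": {"sky", "night"},
}


def to_json_document(rec):
    """Serialise a record, converting values JSON has no type for."""

    def convert(value):
        if isinstance(value, date):
            return value.isoformat()
        if isinstance(value, set):
            return sorted(value)
        raise TypeError(f"cannot convert {type(value).__name__}")

    return json.dumps(rec, default=convert, sort_keys=True)


print(to_json_document(record))
```

The resulting string is the kind of body that could be sent to ES for indexing.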
IS IT A WEB SERVER?
RESTful HTTP API
Index / Query
Even hosting static files - site plugin
Models
Security Concern!
Nginx to route
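To make the REST API point concrete, the sketch below only builds the HTTP requests for indexing and querying; it does not send them. It assumes a node on localhost at ES's default HTTP port 9200 and the index/type/id URL layout used by ES releases of this era. The index name, type name, and fields are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, default port
JSON_HEADERS = {"Content-Type": "application/json"}


def build_index_request(index, doc_type, doc_id, doc):
    """PUT /{index}/{type}/{id} with the document as the JSON body."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=JSON_HEADERS, method="PUT")


def build_search_request(index, field, text):
    """POST /{index}/_search with a match query on one field."""
    url = f"{BASE_URL}/{index}/_search"
    body = json.dumps({"query": {"match": {field: text}}}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=JSON_HEADERS, method="POST")


# Hypothetical index "sites" with type "site".
req = build_index_request("sites", "site", "1", {"name": "Tai Mo Shan"})
print(req.get_method(), req.full_url)
```

With a running node, passing either request to urllib.request.urlopen would execute it. Since anyone who can reach that port can do the same, this is also why the security concern and the Nginx routing above matter.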
MODELS
ES as Search Service
Pure ES powered app
(ES as web server & DB)
MY TRY
HK Light Pollution Map
lightpollution.hk / www.facebook.com/lightpollutionmap
v1: CouchDB -> Auto River Feed to ES for indexing
v2: Use ES as major DB
+AngularJS for Search-based Navigation
LeafletJS/ExpressJS
Master Project: Search Files
Use ES +Attachment Plugin to
index Filesystems and Cloud Services
+AngularJS for Web-based, Faceted File Search
SOME DEMO
- Basic Search
- Mapping - how the index document should be created
- Faceted Search
- Search Engine for files
Conclusion
Try it!
Change your thoughts on the boundary between Search / Navigation
From Quick Prototype to Boss Level
VERY EASY TO START WITH
20min to add search box to my app
ES x Bootstrap UI
Take time To Understand Everything
Knowledge to Optimize
Real Expert to Optimize the Core (Lucene!)
Building Nice Faceted Search w/ AngularJs and ES
Lots of libraries out there
WHAT MAKES ES REALLY POWERFUL
High Availability
Scalability
The single most important video you should watch
http://www.elasticsearch.org/videos/distributed-diagram/
Thanks
Things not covered
Real-time monitoring
Search all your logs in a cluster (logstash, Hadoop etc)
IDEA ABOUT SEARCH IN 3 MINUTES
Term
Document
Corpus
Stop Words - and, or, is
Stemming - "argue", "argued", "argues", "arguing", and "argus" reduce to the stem "argu"
Tokenize
e.g. CJK 我們是快樂的好兒童 ("We are happy, good children") -> 我們, 是, 快樂, 的, 好, 兒童
Analyzer -> Combine above to get index out of text
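The analyzer chain can be sketched as three small steps. The stop-word list is the one from the slide; the stemmer is a deliberately naive suffix stripper written for this example, not the Porter stemmer or any Lucene analyzer.

```python
import re

STOP_WORDS = {"and", "or", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token):
    """Strip one common English suffix, keeping at least 3 characters."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, drop stop words, then stem what is left."""
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


print(analyze("Argue and argued is arguing"))
```

The terms that come out of analyze are what would be stored in the index, which is why queries need to go through the same chain before matching.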
TF-IDF
a numerical statistic which reflects how important a word is to a document in a collection or corpus
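One common textbook variant of TF-IDF, sketched on a toy corpus. The corpus is invented, and this is not the exact scoring formula Lucene applies; it only shows how term frequency and rarity combine.

```python
import math

# Hypothetical corpus: each document is already a list of terms.
corpus = [
    ["light", "pollution", "map"],
    ["light", "map"],
    ["light", "search", "search"],
]


def tf_idf(term, doc, docs):
    """Term frequency in doc times log inverse document frequency."""
    if not doc:
        return 0.0
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    if df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)


print(tf_idf("light", corpus[0], corpus))
print(tf_idf("search", corpus[2], corpus))
```

A term that appears in every document scores zero, while a term concentrated in one document scores high for that document.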
More Like This VS Fuzzy Like This VS Fuzzy Query
More Like This -> Find a similar Document
Fuzzy Like This -> comparing criteria with multiple fields
Fuzzy Query -> search against combinations generated within Levenshtein edit distance limit
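To show what an edit distance limit means, here is the classic dynamic-programming Levenshtein distance with a brute-force matcher on top. This is only an illustration: it compares the query against every term, whereas Lucene relies on the Levenshtein automata mentioned earlier. The term list and the max_edits parameter name are made up.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete from a
                    current[j - 1] + 1,      # insert into a
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def fuzzy_match(query, terms, max_edits=2):
    """Return the terms within max_edits of the query."""
    return [term for term in terms if levenshtein(query, term) <= max_edits]


print(fuzzy_match("serch", ["search", "merge", "lucene"], max_edits=1))
```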
Lucene
Photo source: http://www.ibm.com/developerworks/library/wa-lucene/
Search Filter
- search within search
- efficient: optimized query instead of discarding results?
- vs Query: no scoring
Examples
Term Range Filter = Term Range Query - scoring
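A sketch of the filter versus query distinction: the filter answers only yes or no per document, while the query attaches a score. The documents are invented, and the score is a raw term count used as a stand-in for real relevance scoring.

```python
# Hypothetical documents keyed by id.
docs = {
    1: {"title": "light pollution map", "year": 2012},
    2: {"title": "light light search", "year": 2013},
    3: {"title": "light", "year": 2010},
}


def range_filter(documents, field, low, high):
    """Filter: a set of matching ids, no scores attached."""
    return {doc_id for doc_id, doc in documents.items() if low <= doc[field] <= high}


def scored_query(documents, field, term):
    """Query: matching ids with a score (here, just the term count)."""
    scores = {}
    for doc_id, doc in documents.items():
        count = doc[field].lower().split().count(term)
        if count:
            scores[doc_id] = count
    return scores


def filtered_search(documents, field, term, allowed_ids):
    """Score with the query, keep only ids the filter allows, best first."""
    scores = scored_query(documents, field, term)
    hits = [(doc_id, score) for doc_id, score in scores.items() if doc_id in allowed_ids]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


recent = range_filter(docs, "year", 2012, 2013)
print(filtered_search(docs, "title", "light", recent))
```

Because the filter result is just a set of ids, it can be computed once and reused across many searches.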
Span Query
- take positions of terms into account
SpanFirstQuery
query for spans within the first specific # of positions of a field
SpanNearQuery
matches spans within a certain number of positions from each other
score higher if two terms closer
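The position-based matching above can be sketched on a list of tokens. The function names and the proximity score are invented for this example; "slop" borrows Lucene's term for the allowed gap, but none of this is Lucene's implementation.

```python
def positions(tokens, term):
    """All positions at which a term occurs."""
    return [i for i, token in enumerate(tokens) if token == term]


def span_first(tokens, term, end):
    """True if the term occurs within the first `end` positions."""
    return any(p < end for p in positions(tokens, term))


def span_near(tokens, first, second, slop):
    """Smallest gap between the two terms if it is within slop, else None."""
    best = None
    for i in positions(tokens, first):
        for j in positions(tokens, second):
            if i == j:
                continue
            gap = abs(i - j) - 1  # number of tokens in between
            if gap <= slop and (best is None or gap < best):
                best = gap
    return best


def proximity_score(gap):
    """Made-up score that grows as the two terms get closer."""
    return 1.0 / (1 + gap)


tokens = "the quick brown fox jumps".split()
print(span_near(tokens, "quick", "fox", slop=1))
```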
REAL TIME? NEAR REAL TIME?
delay:
request w/ heavy load:
faceting, sorting
by:
indexing
disk IO
optimized:
warm up
Elastic Search
By Chun Yin Vincent Lau