New search engine on top of Elasticsearch
Agenda
- What is Elasticsearch
- Legacy search engine v.s. new search engine on top of ES
- The prototype
- Demo
What is Elasticsearch
Distributed search and analytics engine
Core of Elastic Stack

Overview
- Search and analytics engine
- Open source
- Works on multiple platforms: Windows, Linux, Mac.
- Powered by Apache Lucene
- Distributed -> Near-realtime response/Fault tolerance
- Functionalities available via REST APIs
- Extensions: X-pack
Key concepts - Logical
- Index
- Type (deprecated)
- Document
- Field
Relational DataBase Analogy
| Relational DB | Elasticsearch |
|---|---|
| Database | Index |
| Table | Type |
| Row | Document |
| Column | Field |
Key concepts - Physcal
- Cluster
- Node
- Master node
- Coordinating node
- Shards
- Primary shards
- Replica shards
Elasticsearch Cluster

Legacy search engine v.s. new search engine
The architecture of legacy search engine

Share
Independent


Limitation: lock contention
Limitation:
1. Needs synchronization between nodes.
2. Difficult to support cross project search for asymmetric cluster
The architecture of the new search engine

The prototype
What we have done so far
Configuration
- A new setting in server definition
- A JSON string looks like:
{
"network.host": "192.168.0.1",
"http.port": "9200"
}Feature flag
To control whether the server should instantiate the legacy search engine or the Elastic one

Get rid of the multi-process support

Stub out the skeleton of the Elasticsearch engine
class ElasticCrawTool : public SECrawlTool
{
public:
virtual Int32 Crawl(SEIndexMetadata& irMetadata) override;
virtual Int32 Pause(SEIndexMetadata& irMetadata) override;
virtual Int32 Resume(SEIndexMetadata& irMetadata) override;
virtual Int32 Complete(SEIndexMetadata& irMetadata) override;
virtual Int32 Destroy(SEIndexMetadata& irMetadata) override;
virtual Int32 Enable(SEIndexMetadata& irMetadata) override;
virtual Int32 SaveInitialCrawlProgress(SEIndexMetadata& irMetadata) override;
virtual Int32 SaveIncrementalCrawlProgress(SEIndexMetadata& irMetadata) override;
virtual Int32 StartIncrementalCrawl(SEIndexMetadata& irMetadata) override;
virtual Int32 StopIncrementalCrawl(SEIndexMetadata& irMetadata) override;
virtual Int32 Pend(SEIndexMetadata& irMetadata) override;
};
class ElasticWriter : public SEWriter
{
public:
virtual Int32 Update(UpdateCommand::SmartPtr iDocPtr) override;
virtual Int32 Update(std::vector<UpdateCommand::SmartPtr>& irDocs) override;
virtual Int32 UpdateImmediately(std::vector<UpdateCommand::SmartPtr>& irDocs, MBase::String& irProjectID) override;
};
class ElasticSearcher : public SESearcher
{
public:
virtual SearchResult::SmartPtr Search(SearchParameter::SmartPtr iParamPtr, UserInfo& irUser) override;
virtual MBase::String SearchJson(SearchParameter::SmartPtr iParamPtr, UserInfo& irUser) override;
virtual MBase::String SearchWithContext(ContextParameters::SmartPtr iParamPtr) override;
virtual Int32 NumDoc(MBase::String& irProjectID) override;
virtual MBase::String GetModifyTime(MBase::String& irProjectID, MBase::String& irObjectID) override;
};Porting over the java implementation
- Done
- Move the index state to Elasticsearch
- Full index
- Query
- Incremental index
- Destroy index
Demo
Questions
New search engine on top of Elasticsearch
By bawu
New search engine on top of Elasticsearch
- 24