Migrating Search from MySQL to Elasticsearch
Context: Ad Search
- Search for ads via UI
- Search for ads via API
- Enable Cassandra search
Legacy Search
- 500+ sloc PHP function
- query = string concatenation
- multiple joins
- 3 seconds+ response time
New Search
- Clean code implementation
- Extendable schema
- Less than 1 second response time
Goals
- Index 90MM+ objects in Elasticsearch
- Sync 1.5MM changes per day
Solution
- Index 90MM+ objects in Elasticsearch
- Bulk Indexer Application
- Sync 1.5MM changes per day
- Live Indexer Application
Live Indexer
1. API: Emit change notification {CREATED, UPDATED, DELETED}
2. Kafka: stores change notifications
3. Live Indexer: Consume Change notifications
4. Live Indexer: Expands change notification to full object
5. Elasticsearch: Index object
Bulk Indexer
Cronjob that reads objects from MySQL using an unbuffered connection and streams inputs into Elasticsearch.
Indexing rate: ~9k/sec => ~3 hrs
Hardware/Software
- Elasticsearch
- 3 Nodes
- 1 TB HDD
- 60 Gb RAM
- Kafka Topic
- 3 partitions
- 1 day retention
- Live indexer/bulk indexer
- Python, asyncio
Attempted solutions
Attempted solution #1: Logstash
Pros:
- Minimal code set up
- Plugin support
- Builtin Kafka, MySQL, Elasticsearch
Cons:
- HTTP plugin doesn't support token auth
- Requires plugins written in Ruby
- MySQL plugin doesn't support multiple queries
Logstash: Extract, Transform, Load tool
Attempted solution #2: FileBeats
Pros:
- Minimal code set up
- Minimal infrastructure
Cons:
- Requires persisted volume mount on application
File Beat: Application log shipper
Attempted solution #3
Combine Bulk Indexer & Live Indexer
Attempted solution #3
Combine Bulk Indexer & Live Indexer
Attempted solution #3
Combine Bulk Indexer & Live Indexer
150k/sec
11k/sec
4k/sec
Attempted solution #3
Combine Bulk Indexer & Live Indexer
Approach | Indexing Rate | Index build time |
---|---|---|
Logstash | ~1.2k/sec | 21 hours |
Live Indexer (python) | ~4k/sec | 6.25 hours |
Bulk Indexer (unbuffered MySQL connection) | ~9k/sec | ~3 hours |
Final solution
Bulk Indexer:
9k/sec => ~3hrs
Live Indexer:
max: 4k/sec
Index Cutover
Index Cutover
Index Cutover
Index Cutover
End
Search Project
By Simply Ahmazing
Search Project
- 565