SECURE & RELIABLE DATA STORAGE
WITH ELASTICSEARCH
A CASE STUDY BY Tomasz Banasiak
ABOUT ME
-
WORKING AT RST FOR 3 YEARS
-
WORKING AS A DEVELOPER FOR 8 YEARS
-
LOVES TO SHARE KNOWLEDGE AND DISCUSS
THE PROBLEM
Image: http://www.factsandheresies.com
THE PROBLEM
-
NEED TO STORE AND SEARCH 12 BILLION DOCUMENTS
-
2-3 MILLION MESSAGES EACH DAY
-
200+ MESSAGES/SEC DURING THE PEAK
-
MESSAGES CONTAIN IMPORTANT DATA
-
MESSAGES MUST BE SAVED
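To put the numbers above in perspective, here is a quick back-of-the-envelope calculation. It uses only the figures from the slide; treating 200 msg/s as the lower bound of the peak is an assumption based on the "200+" wording.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(messages_per_day: int) -> float:
    """Average messages per second for a given daily volume."""
    return messages_per_day / SECONDS_PER_DAY


low = average_rate(2_000_000)   # lower end of the daily volume
high = average_rate(3_000_000)  # upper end of the daily volume
peak = 200                      # assumed lower bound of the stated peak

print(f"average: {low:.1f}-{high:.1f} msg/s")
# average: 23.1-34.7 msg/s
print(f"peak is at least {peak / high:.1f}x the average")
# peak is at least 5.8x the average
```

The gap between average and peak load is one reason a queue between the producer and the store is attractive: it can absorb bursts.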
INITIAL STATE
Message
Communication Server
Client's APP
Database
...
600 shards
THE OBSTACLES
-
MESSENGER CURRENTLY USES CUSTOM PROTOCOL
-
MESSENGER MIGRATES TO XMPP IN NEAR FUTURE
-
NEED TO MIGRATE WHOLE DATA
-
USERS MUST HAVE CONTINUOUS ACCESS TO ARCHIVE
-
DURING MIGRATION WE NEED TO SUPPORT BOTH DATABASES
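One way to keep the archive continuously readable while two databases coexist is to read from both and merge the results. The slides do not say how reads were actually combined, so the sketch below is only an illustration: `read_archive`, the reader callables, and the `id` field are all hypothetical.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Reader = Callable[[str], List[Message]]


def read_archive(user_id: str, new_store: Reader, old_store: Reader) -> List[Message]:
    """Return messages from both stores, de-duplicated by message id.

    The new store is consulted first, so its copy wins when a message
    has already been migrated.
    """
    seen = set()
    merged: List[Message] = []
    for reader in (new_store, old_store):
        for message in reader(user_id):
            if message["id"] not in seen:
                seen.add(message["id"])
                merged.append(message)
    return merged


# In-memory stand-ins for the two databases (assumption, for illustration only).
NEW_DB = {"alice": [{"id": "2", "body": "migrated"}]}
OLD_DB = {"alice": [{"id": "1", "body": "legacy"}, {"id": "2", "body": "migrated"}]}

result = read_archive(
    "alice",
    new_store=lambda user: NEW_DB.get(user, []),
    old_store=lambda user: OLD_DB.get(user, []),
)
print([message["id"] for message in result])
```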
THE OLD SOLUTION
Image: 123rf.com
OLD SOLUTION
-
MYSQL DATABASE
-
SIMPLE JOIN QUERIES FOR SEARCH (NON-FULLTEXT)
-
PROCEDURAL SHARDING ACROSS A HARD-LIMITED NUMBER OF INSTANCES
-
ALMOST REACHED MAXIMUM STORAGE CAPABILITY (LIMITED BY HARDWARE)
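Procedural sharding over a fixed number of instances can be sketched as a hash-modulo router. The shard count of 600 comes from the "INITIAL STATE" diagram; the shard key (a user id) and the CRC32 hash are assumptions, since the slides do not describe the real routing procedure.

```python
import zlib

SHARD_COUNT = 600  # figure from the "INITIAL STATE" diagram


def shard_for(user_id: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a key to a shard number with a stable hash (hypothetical scheme)."""
    return zlib.crc32(user_id.encode("utf-8")) % shard_count


# Why a hard-limited shard count hurts: adding one shard remaps most keys.
keys = [f"user-{i}" for i in range(10_000)]
moved = sum(1 for key in keys if shard_for(key, 600) != shard_for(key, 601))
print(f"{moved / len(keys):.0%} of keys change shard when going from 600 to 601")
```

With this kind of routing, growing the cluster means moving almost all of the data, which fits the slide's point that the old setup was close to its storage limit.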
THE NEW APPROACH
THE NEW APPROACH GOALS
-
SUPPORT 10K WRITE REQUESTS PER SECOND
-
SERVICE-INDEPENDENT ARCHITECTURE
-
EASY TO SCALE HORIZONTALLY
-
REST API FOR SEARCHING AND BROWSING
-
USE OLD CLIENT DURING MIGRATION
THE DATABASE
CASSANDRA
MYSQL / MYROCKS + SPHINX
ELASTICSEARCH
TECHNOLOGY
-
ELASTICSEARCH AS A DATABASE (YES, REALLY)
-
NODE.JS FOR PROCESSING SERVICES
-
PHP (ZEND's APIGILITY) AS A REST SERVICE
-
REDIS FOR CACHE, RABBITMQ FOR QUEUES
ELASTICSEARCH AS A DATABASE
-
IS USUALLY CONSIDERED JUST A SEARCH ENGINE
-
EASILY SCALED VERTICALLY AND HORIZONTALLY
-
ES 2.4+ SOLVES MOST BACKUP PROBLEMS
-
DOES WELL AT AUTO-HEALING
-
MONITORING TOOLS (MARVEL, KOPF, HQ ETC.)
ELASTICSEARCH AS A DATABASE: CONS
-
IT WAS DESIGNED AS A SEARCH ENGINE
-
MORE SHARDS/REPLICAS INCREASE WRITE TIME
-
MANY KNOBS TO TWEAK
-
ES 2.4+ HAS PROBLEMS WITH INDEX REBUILD
-
UPDATE IS ALMOST IMPOSSIBLE
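A few of the "knobs" mentioned above are ordinary index settings. The sketch below builds a settings body using three real Elasticsearch settings (`number_of_shards`, `number_of_replicas`, `refresh_interval`); the values are purely illustrative and are not the ones used in this project.

```python
import json


def index_settings(shards: int, replicas: int, refresh_interval: str) -> dict:
    """Build an index-creation body with a few write-related settings."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
                "refresh_interval": refresh_interval,
            }
        }
    }


def writes_per_document(replicas: int) -> int:
    """Each document is indexed on the primary and on every replica."""
    return 1 + replicas


body = index_settings(shards=5, replicas=1, refresh_interval="30s")  # example values
print(json.dumps(body, indent=2))
print("index operations per document:", writes_per_document(1))
```

More replicas improve redundancy and read capacity, but every extra copy is one more place each document has to be written.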
ARCHITECTURE
ARCHITECTURE: DATA MIGRATION
Message
Communication Server
Client's APP
Database
...
QUEUE
DAEMON
CONSUMER
DAEMON
Elasticsearch Cluster
Redis 0
Redis 1 Cluster
Ext Services
ARCHITECTURE: READ THE ARCHIVE
User
Messages from January to February
Client's APP
REST API
archive_2017_01
Mobile APP
archive_2017_02
archive_2016_12
ES Cluster
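The diagram suggests one index per month, so a date-range request only needs to touch the matching indices. A minimal sketch of that mapping follows; the `archive_YYYY_MM` pattern is taken from the diagram, while the function name and signature are hypothetical.

```python
from datetime import date
from typing import List


def archive_indices(start: date, end: date, prefix: str = "archive") -> List[str]:
    """List the monthly index names covering the inclusive range start..end."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return names


print(archive_indices(date(2017, 1, 1), date(2017, 2, 28)))
# ['archive_2017_01', 'archive_2017_02']
```

Elasticsearch accepts a comma-separated list of index names in a search request, so the REST API could join these names and query only the months the user asked for.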
ARCHITECTURE: WRITE TO ARCHIVE
Message
XMPP Server
Client's APP
Nginx
CONSUMER
DAEMON
Elasticsearch Cluster
Redis Cluster
Ext Services
Message Queue
Invalid MSG Queue
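The write path above can be sketched as a consumer that validates each message, routes broken ones to the invalid-message queue, and assigns valid ones to a monthly index. In-memory deques stand in for the real queues, and the message schema (`REQUIRED_FIELDS`) is an assumption made for illustration.

```python
from collections import deque
from datetime import datetime
from typing import Any, Deque, Dict, List, Tuple

Message = Dict[str, Any]
REQUIRED_FIELDS = ("id", "sender", "sent_at", "body")  # hypothetical schema


def is_valid(message: Message) -> bool:
    """A message is valid if all fields exist and sent_at is an ISO timestamp."""
    if not all(field in message for field in REQUIRED_FIELDS):
        return False
    try:
        datetime.fromisoformat(message["sent_at"])
    except (TypeError, ValueError):
        return False
    return True


def target_index(message: Message) -> str:
    """Pick the monthly archive index from the message timestamp."""
    sent = datetime.fromisoformat(message["sent_at"])
    return f"archive_{sent.year:04d}_{sent.month:02d}"


def consume(queue: Deque[Message], invalid_queue: Deque[Message]) -> List[Tuple[str, Message]]:
    """Drain the queue; return (index, message) pairs ready to be indexed."""
    accepted = []
    while queue:
        message = queue.popleft()
        if is_valid(message):
            accepted.append((target_index(message), message))
        else:
            invalid_queue.append(message)  # keep it for later inspection
    return accepted


incoming = deque([
    {"id": "1", "sender": "alice", "sent_at": "2017-01-15T10:00:00", "body": "hi"},
    {"id": "2", "sender": "bob", "body": "timestamp is missing"},
])
invalid: Deque[Message] = deque()
batch = consume(incoming, invalid)
print([index for index, _ in batch], len(invalid))
```

Parking invalid messages in their own queue instead of dropping them matches the requirement that messages must be saved.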
ARCHITECTURE: CHAOS RESISTANT
Message
XMPP Server
Client's APP
Nginx
CONSUMER
DAEMON
Elasticsearch Cluster
Redis 1
Ext Services
Message Queue
Invalid MSG Queue
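One common way to survive a node going down is to remove a message from the queue only after it has been stored. The sketch below illustrates that idea with an in-memory queue and a test double; it is not the consumer from the talk, and `drain`, `FlakyStore`, and `StoreUnavailable` are hypothetical names. A real consumer would also wait between attempts.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List

Message = Dict[str, Any]


class StoreUnavailable(Exception):
    """Raised by a store when it cannot accept writes right now."""


def drain(queue: Deque[Message], store: Callable[[Message], None], max_attempts: int = 3) -> int:
    """Store queued messages; a message leaves the queue only after success."""
    stored = 0
    while queue:
        message = queue[0]  # peek, do not remove yet
        for _ in range(max_attempts):
            try:
                store(message)
                break
            except StoreUnavailable:
                continue
        else:
            return stored  # give up for now; the message stays queued
        queue.popleft()  # acknowledge only after a successful write
        stored += 1
    return stored


class FlakyStore:
    """Test double that fails the first `failures` calls, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.saved: List[Message] = []

    def __call__(self, message: Message) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise StoreUnavailable("node is down")
        self.saved.append(message)


pending = deque([{"id": "1"}, {"id": "2"}, {"id": "3"}])
flaky = FlakyStore(failures=2)
print(drain(pending, flaky), len(pending))
```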
PROBLEMS & SOLUTIONS
-
SERVICE INDEPENDENCE REQUIRES A LOT MORE SPACE
-
Hard drives are cheap ;)
-
ES WORKS WELL BY DEFAULT ONLY AS A SEARCH ENGINE
-
STILL WORKING ON MAKING CLUSTER STABLE
-
BACKUPS ARE HARD IF YOU CANNOT CLOSE INDEX
THINGS TO REMEMBER
-
ES IS GREAT AS A READ-ONLY DATABASE
-
POC IS AN IMPORTANT PART OF DEVELOPMENT
-
... BUT PRODUCTION DELIVERS THE VERDICT
-
+ ALWAYS REMEMBER THE CHAOS MONKEY
THINGS TO REMEMBER
EACH FAILURE LEADS TO BETTER EXPERIENCE
QUESTIONS TIME!
THANK YOU!
Tomasz Banasiak
http://banasiak.pro
RST.COM.PL
Icons by: http://www.flaticon.com
PHPers #6 - Secure & Reliable data storage with Elasticsearch
By Tomasz Banasiak