SECURE & RELIABLE DATA STORAGE
WITH ELASTICSEARCH

A CASE STUDY BY Tomasz Banasiak
ABOUT ME

- WORKING AT RST FOR 3 YEARS
- WORKING AS A DEVELOPER FOR 8 YEARS
- LOVES SHARING KNOWLEDGE AND DISCUSSING

THE PROBLEM
Image: http://www.factsandheresies.com

- NEED TO STORE AND SEARCH 14 BILLION DOCUMENTS
- 3M+ MESSAGES EACH DAY
- 250+ MESSAGES/SEC AT PEAK
- MESSAGES CONTAIN IMPORTANT DATA
- MESSAGES MUST BE SAVED
- THE BUSINESS NEEDS FILTERING
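A quick back-of-the-envelope check of what these figures mean for write throughput. This is plain arithmetic on the numbers above (treating "3M+" and "250+" as lower bounds) plus the 10K writes/s goal of the new approach; no measured data is involved.

```python
# Back-of-the-envelope load figures, using only the numbers from the slides.
MESSAGES_PER_DAY = 3_000_000     # "3M+ messages each day" (lower bound)
PEAK_PER_SEC = 250               # "250+ messages/sec at peak" (lower bound)
TARGET_WRITES_PER_SEC = 10_000   # goal of the new approach: 10K write requests/s
SECONDS_PER_DAY = 24 * 60 * 60

def average_rate(messages_per_day: float) -> float:
    """Average messages per second if traffic were spread evenly over the day."""
    return messages_per_day / SECONDS_PER_DAY

avg = average_rate(MESSAGES_PER_DAY)
peak_to_avg = PEAK_PER_SEC / avg
headroom = TARGET_WRITES_PER_SEC / PEAK_PER_SEC

print(f"average rate:   {avg:.1f} msg/s")      # 34.7 msg/s
print(f"peak / average: {peak_to_avg:.1f}x")   # 7.2x
print(f"target / peak:  {headroom:.0f}x")      # 40x
```

The daily average hides a peak roughly 7x higher, and the 10K/s target sits about 40x above the stated peak.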
INITIAL STATE



[Diagram: Client's APP, Message, Communication Server, Database (600 shards)]
THE OBSTACLES

- THE MESSENGER CURRENTLY USES A CUSTOM PROTOCOL
- THE MESSENGER WILL MIGRATE TO XMPP IN THE NEAR FUTURE
- ALL EXISTING DATA MUST BE MIGRATED
- USERS MUST HAVE CONTINUOUS ACCESS TO THE ARCHIVE
- DURING MIGRATION WE NEED TO SUPPORT BOTH DATABASES

THE OLD SOLUTION
Image: 123rf.com

- MYSQL DATABASE
- SIMPLE JOIN QUERIES FOR SEARCH (NON-FULLTEXT)
- PROCEDURAL SHARDING ACROSS HARD-LIMITED INSTANCES
- ALMOST REACHED MAXIMUM STORAGE CAPACITY (LIMITED BY HARDWARE)
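The slides do not say how the procedural sharding worked. A minimal sketch, assuming modulo sharding on a user ID over the 600 shards shown on the initial-state diagram (both the shard key and the modulo scheme are assumptions), shows why such a layout is hard to grow once instances hit their limits: changing the shard count remaps almost every key.

```python
# Hypothetical modulo-based ("procedural") sharding; the real scheme is not shown.
NUM_SHARDS = 600  # shard count from the initial-state diagram

def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID (assumed shard key) to a fixed shard number."""
    return user_id % num_shards

def moved_fraction(num_users: int, old_shards: int, new_shards: int) -> float:
    """Fraction of user IDs 0..num_users-1 whose shard changes with the shard count."""
    moved = sum(
        1
        for uid in range(num_users)
        if shard_for(uid, old_shards) != shard_for(uid, new_shards)
    )
    return moved / num_users

print(shard_for(123_456))                # 456
print(moved_fraction(60_000, 600, 601))  # 0.99
```

Under this assumed scheme, adding a single shard would force 99% of the sample users' data to move, which is why a fixed layout on hard-limited hardware tends to stay fixed.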

THE NEW APPROACH
THE NEW APPROACH GOALS

- SUPPORT 10K WRITE REQUESTS PER SECOND
- SERVICE-INDEPENDENT ARCHITECTURE
- EASY TO SCALE HORIZONTALLY
- REST API FOR SEARCHING AND BROWSING
- KEEP USING THE OLD CLIENT DURING MIGRATION
THE DATABASE

- CASSANDRA
- MYSQL / MYROCKS + SPHINX
- ELASTICSEARCH
TECHNOLOGY

- ELASTICSEARCH AS A DATABASE (YES, REALLY)
- NODE.JS FOR PROCESSING SERVICES
- PHP (ZEND'S APIGILITY) FOR THE REST SERVICE
- REDIS FOR CACHE, RABBITMQ FOR QUEUES
ELASTICSEARCH AS A DATABASE

- IS COMMONLY CONSIDERED JUST A SEARCH ENGINE
- SCALES EASILY, BOTH VERTICALLY AND HORIZONTALLY
- ES 2.4+ SOLVES MOST BACKUP PROBLEMS
- GOOD AT AUTO-HEALING
- MONITORING TOOLS (MARVEL, KOPF, HQ, ETC.)
ELASTICSEARCH AS A DATABASE: CONS

- IT WAS DESIGNED AS A SEARCH ENGINE
- MORE SHARDS/REPLICAS INCREASE WRITE TIME
- MANY KNOBS TO TWEAK
- NOT VIRTUALIZATION-FRIENDLY
- ES 2.4+ HAS PROBLEMS WITH INDEX REBUILDS
- UPDATES ARE ALMOST IMPOSSIBLE
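To make the replica point concrete: Elasticsearch indexes each document on its primary shard and then on every replica copy, so shard-level indexing work is multiplied by (1 + replicas). A small sketch of that arithmetic, using the 10K writes/s goal as input (the replica counts are example values only):

```python
def shard_level_writes(docs_per_sec: float, replicas: int) -> float:
    """Indexing operations per second summed over all shard copies.

    Each document goes to exactly one primary shard and is then
    replicated to `replicas` copies, so the work grows by (1 + replicas).
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return docs_per_sec * (1 + replicas)

for r in (0, 1, 2):
    print(f"replicas={r}: {shard_level_writes(10_000, r):,.0f} shard writes/s")
```

This only captures the replica multiplier; per-shard overhead (refreshes, segment merges) can add further cost as the shard count grows.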

ARCHITECTURE
ARCHITECTURE: DATA MIGRATION



[Diagram components: Client's APP, Message, Communication Server, sharded Database, QUEUE DAEMON, CONSUMER DAEMON, Elasticsearch Cluster, Redis 0, Redis 1 Cluster, Ext Services]


ARCHITECTURE: READ THE ARCHIVE


[Diagram: a User requests "Messages from January to February" through the Client's APP or the Mobile APP; the REST API queries the ES Cluster, which holds monthly indices archive_2016_12, archive_2017_01 and archive_2017_02]
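The diagram suggests one index per month (archive_2016_12, archive_2017_01, ...). Assuming that naming scheme, the REST layer only has to turn the requested date range into the list of month indices to search. A minimal sketch; the function name and the `archive` prefix default are illustrative assumptions:

```python
from datetime import date

def monthly_indices(start: date, end: date, prefix: str = "archive") -> list[str]:
    """Return one index name per calendar month touched by [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month == 13:  # roll over into the next year
            year, month = year + 1, 1
    return names

jan_feb = monthly_indices(date(2017, 1, 1), date(2017, 2, 28))
print(jan_feb)            # ['archive_2017_01', 'archive_2017_02']
print(",".join(jan_feb))  # archive_2017_01,archive_2017_02
```

Elasticsearch accepts a comma-separated list of index names in the search path, so the joined string can go straight into a request such as `GET /archive_2017_01,archive_2017_02/_search`; adding `ignore_unavailable=true` keeps the query working if a month has no index.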
ARCHITECTURE: WRITE TO ARCHIVE



[Diagram components: Client's APP, Message, XMPP Server, Nginx, Message Queue, CONSUMER DAEMON, Invalid MSG Queue, Elasticsearch Cluster, Redis Cluster, Ext Services]
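The write path routes every incoming message either towards Elasticsearch or into the Invalid MSG Queue. The sketch below illustrates that split with in-memory deques standing in for the real queues; the message schema (`REQUIRED_FIELDS`), the epoch-seconds timestamp and the monthly index naming are assumptions, and the actual Elasticsearch write is left out.

```python
import json
from collections import deque
from datetime import datetime, timezone

# Hypothetical message schema; the real fields are not shown on the slides.
REQUIRED_FIELDS = ("id", "sender", "recipient", "timestamp", "body")

def parse_message(raw: str):
    """Return the decoded message dict, or None if it is malformed."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(msg, dict) or any(f not in msg for f in REQUIRED_FIELDS):
        return None
    ts = msg["timestamp"]
    if isinstance(ts, bool) or not isinstance(ts, int):
        return None
    return msg

def index_for(ts: int) -> str:
    """Monthly index name (assumed scheme) for a UTC epoch timestamp in seconds."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"archive_{dt.year:04d}_{dt.month:02d}"

def consume(message_queue: deque, invalid_queue: deque) -> dict:
    """Drain the queue, grouping valid messages by target index.

    Invalid payloads are moved to invalid_queue unchanged instead of being dropped.
    """
    batches = {}
    while message_queue:
        raw = message_queue.popleft()
        msg = parse_message(raw)
        if msg is None:
            invalid_queue.append(raw)
            continue
        batches.setdefault(index_for(msg["timestamp"]), []).append(msg)
    return batches

incoming = deque([
    json.dumps({"id": 1, "sender": "a", "recipient": "b",
                "timestamp": 1485907200, "body": "hi"}),  # 2017-02-01 00:00 UTC
    "not json",
])
invalid = deque()
batches = consume(incoming, invalid)
print(sorted(batches), len(invalid))  # ['archive_2017_02'] 1
```

Parking rejected payloads in their own queue means a malformed message is never silently lost, which fits the requirement that messages must be saved.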



ARCHITECTURE: CHAOS RESISTANT



[Diagram: the write-path components (Client's APP, Message, XMPP Server, Nginx, Message Queue, CONSUMER DAEMON, Invalid MSG Queue, Elasticsearch Cluster, Ext Services), with a single Redis 1 node in place of the Redis Cluster]
PROBLEMS & SOLUTIONS

- SERVICE INDEPENDENCE REQUIRES A LOT MORE SPACE
  - Hard drives are cheap ;)
- ES WORKS WELL BY DEFAULT ONLY AS A SEARCH ENGINE
  - A STABLE CLUSTER REQUIRES SOME TWEAKS
- BACKUPS ARE HARD IF YOU CANNOT CLOSE AN INDEX
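One way to soften the backup problem, assuming monthly indices as on the read-archive diagram, is to treat finished months as read-only: block writes on the index, then snapshot just that index (the snapshot API works on open indices). The sketch only builds the two requests, using the `index.blocks.write` index setting and the snapshot API; the repository name `archive_backup` and the helper names are hypothetical, and the repository must be registered beforehand.

```python
import json
from datetime import date

REPOSITORY = "archive_backup"  # hypothetical; register it first via PUT /_snapshot/<repo>

def previous_month_index(today: date) -> str:
    """Name of last month's index under the assumed archive_YYYY_MM scheme."""
    if today.month > 1:
        year, month = today.year, today.month - 1
    else:
        year, month = today.year - 1, 12
    return f"archive_{year:04d}_{month:02d}"

def freeze_and_snapshot_requests(index: str) -> list[tuple[str, str, dict]]:
    """Return (method, path, body) tuples to send with any HTTP client."""
    return [
        # Block further writes; reads and searches keep working.
        ("PUT", f"/{index}/_settings", {"index.blocks.write": True}),
        # Snapshot only this index and wait so the caller sees the outcome.
        ("PUT", f"/_snapshot/{REPOSITORY}/snap_{index}?wait_for_completion=true",
         {"indices": index, "include_global_state": False}),
    ]

for method, path, body in freeze_and_snapshot_requests(previous_month_index(date(2017, 3, 1))):
    print(method, path, json.dumps(body))
```

Once a month is frozen, its snapshot never goes stale, which plays to the read-only strength noted below.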
THINGS TO REMEMBER

- ES IS GREAT AS A READ-ONLY DATABASE
- A POC IS AN IMPORTANT PART OF DEVELOPMENT
- ... BUT PRODUCTION DELIVERS THE VERDICT
- ALWAYS REMEMBER THE CHAOS MONKEY
THINGS TO REMEMBER

EACH FAILURE BRINGS MORE EXPERIENCE







QUESTION TIME!

THANK YOU!
Tomasz Banasiak
http://banasiak.pro
RST.COM.PL
Icons by: http://www.flaticon.com