SECURE & RELIABLE DATA STORAGE
WITH ELASTICSEARCH

A CASE STUDY BY Tomasz Banasiak

ABOUT ME

  • WORKING IN RST FOR 3 YEARS

  • WORKING AS A DEVELOPER FOR 8 YEARS

  • LOVES TO SHARE KNOWLEDGE AND DISCUSS

THE PROBLEM

Image: http://www.factsandheresies.com

THE PROBLEM

  • NEED TO STORE AND SEARCH 12 BILLION DOCUMENTS

  • 2-3 MILLION MESSAGES EACH DAY

  • 200+ MESSAGES/SEC AT PEAK

  • MESSAGES CONTAIN IMPORTANT DATA

  • MESSAGES MUST BE SAVED
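
A quick back-of-the-envelope check of these figures, as a sketch only. It assumes traffic is spread evenly over a 24-hour day, which the slides do not state, and treats "200+" as a lower bound. All names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(messages_per_day: int) -> float:
    """Average messages per second, assuming an even spread over the day."""
    return messages_per_day / SECONDS_PER_DAY


low = average_rate(2_000_000)   # lower end of "2-3 million messages each day"
high = average_rate(3_000_000)  # upper end
peak = 200                      # "200+ messages/sec", taken as a lower bound

print(f"average: {low:.1f}-{high:.1f} msg/s")
print(f"peak is at least {peak / high:.1f}x-{peak / low:.1f}x the average")
```

Under that assumption the average load is roughly 23 to 35 messages per second, so the stated peak is about six to nine times the average.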

INITIAL STATE

Diagram: Message, Communication Server, Client's APP, Database ... (600 shards)

THE OBSTACLES

  • MESSENGER CURRENTLY USES CUSTOM PROTOCOL

  • MESSENGER MIGRATES TO XMPP IN NEAR FUTURE

  • NEED TO MIGRATE ALL THE DATA

  • USERS MUST HAVE CONTINUOUS ACCESS TO THE ARCHIVE

  • DURING MIGRATION WE NEED TO SUPPORT BOTH DATABASES

THE OLD SOLUTION

Image: 123rf.com

OLD SOLUTION

  • MYSQL DATABASE

  • SIMPLE JOIN QUERIES FOR SEARCH (NON-FULLTEXT)

  • PROCEDURAL SHARDING TO HARD-LIMITED INSTANCES

  • ALMOST REACHED MAXIMUM STORAGE CAPABILITY (LIMITED BY HARDWARE)
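
One possible reading of "procedural sharding to hard-limited instances", sketched under assumptions: the slides do not say which key or function routed rows to shards, so the `user_id` key and the modulo rule below are hypothetical. The sketch only illustrates why a fixed shard count is hard to grow.

```python
NUM_SHARDS = 600  # shard count shown on the "Initial state" slide


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard with a fixed modulo rule (hypothetical routing key)."""
    return user_id % num_shards


def moved_fraction(old: int, new: int, sample: int = 100_000) -> float:
    """Share of sampled keys that land on a different shard after resizing."""
    moved = sum(1 for uid in range(sample) if uid % old != uid % new)
    return moved / sample


print(shard_for(1234))
print(moved_fraction(600, 601))
```

With a rule like this, adding even a single shard relocates almost every key in the sample, so capacity is effectively capped by the instances chosen up front.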

THE NEW APPROACH

THE NEW APPROACH GOALS

  • SUPPORT 10K WRITE REQUESTS PER SECOND

  • SERVICE-INDEPENDENT ARCHITECTURE

  • EASY TO SCALE HORIZONTALLY

  • REST API FOR SEARCHING AND BROWSING

  • USE OLD CLIENT DURING MIGRATION

THE DATABASE

cassandra

MYSQL / MYROCKS + SPHINX

ELASTICSEARCH

TECHNOLOGY

  • ELASTICSEARCH AS A DATABASE (YES, REALLY)

  • NODE.JS FOR PROCESSING SERVICES

  • PHP (ZEND's APIGILITY) AS A REST SERVICE

  • REDIS FOR CACHE, RABBITMQ FOR QUEUES

ELASTICSEARCH AS A DATABASE

  • USUALLY CONSIDERED JUST A SEARCH ENGINE

  • SCALES EASILY VERTICALLY AND HORIZONTALLY

  • ES 2.4+ SOLVES MOST BACKUP PROBLEMS

  • GOOD AT AUTO-HEALING

  • MONITORING TOOLS (MARVEL, KOPF, HQ ETC.)

ELASTICSEARCH AS A DATABASE: CONS

  • IT WAS DESIGNED AS A SEARCH ENGINE

  • MORE SHARDS/REPLICAS INCREASE WRITE TIME

  • MANY KNOBS TO TWEAK

  • ES 2.4+ HAS PROBLEMS WITH INDEX REBUILDS

  • UPDATES ARE ALMOST IMPOSSIBLE

ARCHITECTURE

ARCHITECTURE: DATA MIGRATION

Diagram: Message, Communication Server, Client's APP, Database ..., Queue Daemon, Consumer Daemon, Elasticsearch Cluster, Redis 0, Redis 1 Cluster, Ext Services

ARCHITECTURE: READ THE ARCHIVE

Diagram: User (request: "Messages from January to February"), Client's APP, Mobile APP, REST API, ES Cluster (archive_2016_12, archive_2017_01, archive_2017_02)
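
The index names on this slide suggest one index per month. A minimal sketch of how a date range could be mapped to those names, assuming the `archive_YYYY_MM` pattern holds for every month; the function name and signature are hypothetical, not taken from the talk.

```python
from datetime import date
from typing import List


def archive_indices(start: date, end: date, prefix: str = "archive") -> List[str]:
    """Names of the monthly indices covering start..end, inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return names


indices = archive_indices(date(2017, 1, 1), date(2017, 2, 28))
print(",".join(indices))  # archive_2017_01,archive_2017_02
```

Elasticsearch accepts a comma-separated list of index names in a search request path, so a list like this can be joined and queried in one call, leaving every other month untouched.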

ARCHITECTURE: WRITE TO ARCHIVE

Diagram: Message, XMPP Server, Client's APP, Nginx, Consumer Daemon, Elasticsearch Cluster, Redis Cluster, Ext Services, Message Queue, Invalid MSG Queue
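
A rough sketch of the consumer step in this diagram: take a payload from the message queue, store it if it is valid, and park it on the invalid-message queue otherwise. In-memory deques stand in for the real queues and a plain callable stands in for the Elasticsearch write. The required fields are a hypothetical schema, not the one used in the project.

```python
import json
from collections import deque

REQUIRED_FIELDS = ("sender", "recipient", "body", "sent_at")  # hypothetical schema


def parse_message(raw: str) -> dict:
    """Decode a raw payload; raise ValueError if it cannot be archived."""
    doc = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(doc, dict):
        raise ValueError("payload is not a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in doc]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return doc


def consume(message_queue: deque, invalid_queue: deque, store) -> int:
    """Drain the queue; store valid documents, park invalid payloads."""
    stored = 0
    while message_queue:
        raw = message_queue.popleft()
        try:
            doc = parse_message(raw)
        except ValueError as err:
            # Keep the original payload so it can be inspected or replayed.
            invalid_queue.append({"raw": raw, "error": str(err)})
            continue
        store(doc)
        stored += 1
    return stored


archive = []  # stand-in for the Elasticsearch cluster
pending = deque([
    '{"sender": "a", "recipient": "b", "body": "hi", "sent_at": "2017-01-05T10:00:00Z"}',
    "not json",
])
invalid = deque()
stored_count = consume(pending, invalid, archive.append)
```

Keeping rejected payloads on a separate queue means a malformed message never blocks the main flow and nothing is silently dropped, which fits the requirement that messages must be saved.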

 

ARCHITECTURE: CHAOS RESISTANT

Diagram: Message, XMPP Server, Client's APP, Nginx, Consumer Daemon, Elasticsearch Cluster, Redis 1, Ext Services, Message Queue, Invalid MSG Queue

 

PROBLEMS & SOLUTIONS

PROBLEMS & SOLUTIONS

  • SERVICE INDEPENDENCE REQUIRES A LOT MORE SPACE

    • Hard drives are cheap ;)

  • ES WORKS WELL BY DEFAULT ONLY AS A SEARCH ENGINE

    • STILL WORKING ON MAKING THE CLUSTER STABLE

  • BACKUPS ARE HARD IF YOU CANNOT CLOSE AN INDEX

THINGS TO REMEMBER

  • ES IS GREAT AS A READ-ONLY DATABASE

  • A POC IS AN IMPORTANT PART OF DEVELOPMENT

  • ... BUT PRODUCTION DELIVERS THE VERDICT

  • + ALWAYS REMEMBER THE CHAOS MONKEY

THINGS TO REMEMBER

EACH FAILURE LEADS TO BETTER EXPERIENCE

QUESTION TIME!

THANK YOU!

Tomasz Banasiak

http://banasiak.pro

RST.COM.PL

Icons by: http://www.flaticon.com

PHPers #6 - Secure & Reliable data storage with Elasticsearch

By Tomasz Banasiak
