In-memory database MemSQL.

When and how to use with Drupal.

by Evgenii Nikitin

Expectations

  • Disks aren't used.
  • Data will be lost after server reboot.
  • It works super fast.

What are in-memory databases?

What are in-memory databases (IMDB)?

  • Stores data on the disk (respect ACID).
  • Can keep data in memory.
  • Distributed systems.
  • IMDB leverage SSD (no random writes).
  • IMDB is a new trend - cache is a new RAM, RAM is a new disk, disk is a new tape.
  • Bottlenecks are still here

Reality

MemSQL

MemSQL cluster

Insert operation

Read operation

MemSQL Studio

Durability. Transaction processing

Database types

Operational (OLTP) - operations accounting. Uses row-store tables.

 

Analytical (OLAP) - data analysis. Uses column-store tables.

Row-store and Column-store tables in MemSQL

Row-store (OLTP) Column-store (OLAP)
Data is stored on the disk but then loaded to the RAM where it is read from. Data is stored on the disk. Data is read from the disk.
Has longer recovery times Data is ready right after DB start
Is optimized to updated frequently and at random. Is optimized for batch UPDATE and DELETE queries.
Searches fast arbitrary data. Works effectively with reading of big amount of consistent data.
Supports few indexes. Supports one index per table only.
Data is stored as-is. More space is needed for storage then source data. Data is compressed. Less amount of storage is needed than source data especially with pre-sorting.
By default rowstore tables are created. Column tables are simply defined by key CLUSTERED COLUMNSTORE.

Should I pay for MemSQL?

1 UNIT - 8 vCPU and 32GB RAM

MemSQL - commercial database

4 UNITS - FOR FREE!

MySQL and MemSQL performance comparison

Environment: 6 vCPU, 16 GB RAM with SSD hard drive, Ubuntu 18.04.

 

Databases: MySQL 5.7.29 InnoDB, MemSQL 7.0.

 

Tables: MySQL rowstore, MemSQL rowstore, MemSQL columnstore.

 

Tests: loading, reading, updating, removing.

 

Measure: milliseconds.

Loading data from the file

Import of 11 million lines from 1.5 GB csv file

Reading data

Datasets: 11 000 000, 1 000 000, 50 000, 1 000 rows.

 

Number of run: 3

 

Cases: Rowstore tables with primary key, with indexes and without indexes, Columnstore with primary key only.

Search by primary key.
SELECT * FROM yellow_tripdata_staging WHERE id = 50

Search by parameter.
SELECT * FROM yellow_tripdata_staging WHERE payment_type = 2

Search by range

SELECT * FROM yellow_tripdata_staging WHERE fare_amount >= 10 AND fare_amount <= 20

Calculation of average value

SELECT avg(fare_amount) FROM yellow_tripdata_staging WHERE payment_type = 2

Calculation of amount of items

SELECT count(*) FROM yellow_tripdata_staging WHERE payment_type = 2

Calculation of amount of items with grouping

SELECT COUNT(tip_amount), payment_type FROM yellow_tripdata_staging GROUP BY payment_type;

Search of the string

SELECT * FROM yellow_tripdata_staging WHERE store_and_fwd_flag LIKE 'Y%'

Using a small limit

SELECT payment_type, pickup_longitude, pickup_latitude FROM yellow_tripdata_staging WHERE payment_type > 3 LIMIT 100

Using a big limit

SELECT payment_type, pickup_longitude, pickup_latitude FROM yellow_tripdata_staging WHERE payment_type > 3 LIMIT 100000

Two inner joins

SELECT pickup_longitude, pickup_latitude, r.name, p.name FROM yellow_tripdata_staging as t INNER JOIN rate_code as r ON t.rate_code_id = r.rate_code_id
INNER JOIN payment_type as p ON t.payment_type = p.payment_type

Updating data consistently

UPDATE yellow_tripdata_staging SET tip_amount = 0, tolls_amount = 0 WHERE id = %d

(10 000 rows have been updated)

Updating data using range

UPDATE yellow_tripdata_staging SET tip_amount = 0, tolls_amount = 0

WHERE id >= 1 AND id <= 10000

Adding data

Insert 10 000 rows - one by one

Adding data

Insert 10 000 rows - 10 times by 1000 rows

Conlusion

Performance of MemSQL and MySQL highly depends on the size of tables and usage of primary key in requests.

 

Scope of application of these databases:

  • Tables up to a thousands rows - MySQL.
     
  • Tables with more than a one hundred rows with frequent updates - MemSQL rowstore.
     
  • Tables with more than a dozen million rows with rare updates - MemSQL columnstore.

Common queries in Drupal

 

  • INSERT INTO cache_bootstrap () VALUES () ON DUPLICATE KEY UPDATE cid = VALUES(cid), expire = VALUES(expire), created = VALUES(created), tags = VALUES(tags), checksum = VALUES(checksum), data = VALUES(data), serialized = VALUES(serialized);

     
  • SELECT cid, data, created, expire, serialized, tags, checksum FROM cache_config WHERE cid IN ( :cids__0 ) ORDER BY cid;
  • SELECT session FROM sessions WHERE sid = :sid LIMIT 0, 1
  • SELECT * FROM users_field_data u WHERE u.uid = :uid AND u.default_langcode = 1
  • SELECT node_field_data.sticky AS node_field_data_sticky, node_field_data.created AS node_field_data_created, node_field_data.nid AS nid FROM node_field_data node_field_data WHERE (node_field_data.promote = '') AND (node_field_data.status = '') ORDER BY node_field_data_sticky DESC, node_field_data_created DESC LIMIT 10 OFFSET 0;
  • SELECT t.* FROM node__field_body t WHERE (entity_id IN () AND (langcode IN ()) ORDER BY delta ASC;
  • SELECT base_table.id AS id, base_table.path AS path, base_table.alias AS alias, base_table.langcode AS langcode FROM path_alias base_table WHERE (base_table.status = '') AND (base_table.alias LIKE '' ESCAPE '\\') AND (base_table.langcode IN ()) ORDER BY base_table.langcode ASC, base_table.id DESC

Page load /node

MemSQL (cold cache) : 1.5 - 2.6 s

MySQL (cold cache) : 2.3 - 2.7 s

 

MemSQL (hot cache) : 0.25 - 0.5 s

MySQL (hot cache) : 0.2 - 0.4 s

When MemSQL should be used with Drupal

Real-time analytics platforms
 

Dashboards
 

Search system by parameters (hotels, planes)
 

Systems with frequently updated data

 

MemSQL installation

System requirements:

4 CPU x86_64, 8 GB RAM

 RHEL/CentOS 6, Debian 8+ (Ubuntu) or Docker

 

TCP ports:

3306 - is used by aggregator.

3307 - is used by leaves.

8080 - default port for MemSQL Studio

 

Installation guide:

https://docs.memsql.com/v7.0/guides/deploy-memsql/memsql-tools/deploy-cluster/step-1/

Client software for MemSQL

MemSQL is protocol compatible with MySQL.

 

Database management application:

Any MySQL compatible (Sequel Pro)

 

PHP Drivers:

PDO_MySQL (requirement for Drupal)

MySQLi

MemSQL Drupal driver

composer require drupal/memsql
"post-update-cmd": [
    "cp -r web/modules/contrib/memsql/drivers web/"
]

Installation:

Add to composer.json to update driver folder:

Copy driver to docroot:

cp -r web/modules/contrib/memsql/drivers web/

MemSQL driver is ready

MemSQL driver is ready

$databases['default']['default'] = [
  'database' => 'drupal',
  'username' => 'root',
  'password' => '',
  'prefix' => '',
  'host' => '127.0.0.1',
  'port' => '3306',
  'namespace' => 'Drupal\\Driver\\Database\\memsql',
  'driver' => 'memsql',
];

settings.php

MemSQL/MySQL differences

  • SHARD KEY - MemSQL requires SHARD KEY or PRIMARY KEY is used by default.

 

  • UNIQUE INDEX - PRIMARY or UNIQUE index must be identical to or a superset of the shard key.

 

  • Default order - MySQL sorts by PRIMARY KEY by default. MemSQL - not.

Evgenii Nikitin

Drupal team leader at Smile Open Source Solutions

Made with Slides.com