RDBMS in NoSQL World

Petr Jelínek

28th of Oct 2016

TTÜ Geeky Weekly

Who?

  • Developer for 14 years
  • PostgreSQL Developer & Consultant at 2ndQuadrant
  • Previously 4 years at Skype
  • PostgreSQL Contributor for more than 10 years
  • Lead developer of Postgres-BDR multi-master replication system for PostgreSQL
  • Co-maintainer of PgBouncer lightweight connection pooler for PostgreSQL
  • Long time contributor to various smaller open-source projects

Database Paradigms

RDBMS

        PostgreSQL, Oracle, MySQL, MSSQL

NoSQL

        MongoDB, Cassandra, RocksDB

NewSQL

        Vertica, VoltDB

Other specialized

        Hadoop, Elastic search, etc

Origins of NoSQL

  • First databases were "NoSQL" as there was no SQL yet
  • SQL came from IBM in 1970s and was standardized in 1980s
  • Modern NoSQL databases  were inspired by paper from Google
  • Term NoSQL coined by Johan Oskarsson of Last.fm in 2009 for distributed non-relational databases

ACID

Durability: once data is written it will always be there, across crashes and similar

BASE

Basically Available: "always" available

Soft-state: the state of the system may change even without client activity

Eventual Consistency: the system will become consistent over time

Atomic: all commands in transaction either succeed or fail, there is no in between state preserved

Consistent: all committed data is consisted according to the rules, constraints, etc that are defined in the DB

Isolation: provides control over what data other clients see

Relational

"Unstructured"

Data Structure

user_id city country
1 London UK
2 Oxford UK
id name email
1 Simon simon@email.com

user

address

{
  "id": 1
  "name": "Simon"
  "email":"simon@email.com"
  "address":
  [
    {
      "city": "London"
      "country": "UK"
    },
    {
      "city": "Oxford"
      "country": "UK"
    }
  ]
}

Data Structure

  • Most of the useful data is structured
  • Relational != Normalised
    • JSON document is still relational data
  • Schema can be forced either by db or application
    • Schema-on-write vs schema-on-read
    • Schema-on-write means that all data has the same schema
    • Schema-on-read means that application needs to be aware of all versions of the schema

Retrieving the Data

  • SQL - complex, standardized query language with it's own types and operators
  • Four basic operations
    • SELECT - retrieve data
    • INSERT - add data
    • UPDATE - modify data
    • DELETE - remove data
  • Operate over sets of data

RDBMS

Retrieving the Data

RDBMS

SELECT * FROM users WHERE id = 1;

SELECT department, role, gender, count(*)
 FROM employees
 GROUP BY CUBE (department, role, gender);

WITH RECURSIVE ...

Retrieving the Data

NoSQL

  • Each database has it's own API
  • Usually direct integration of the interface into the programming language by the db driver
  • Capabilities of the "query language" depend on the database

Retrieving the Data

NoSQL

redis.get(1)

db.bios.find( { _id: 5 } )

db.bios.find(
   {
      awards: {
                $elemMatch: {
                     award: "Turing Award",
                     year: { $gt: 1980 }
                }
      }
   }
)

Scaling

Vertical

Scaling

Horizontal

The Many Types of NoSQL

Key-Value Store

  • Stores key-value pairs
  • Value can be anything
  • Value is opaque to the database
  • Access to the data only through the key
  • Easy to scale
  • Sometimes used as storage engine for other types of databases (RocksDB)

The Many Types of NoSQL

Document Store

  • Stores primary key-value pairs
  • Value has some specific format (i.e. JSON, BSON, XML)
  • Database is aware of the structure and contents of the value
    • Parts of the value can be manipulated
  • Usually supports secondary indexes
    • Access to data using part of the value

The Many Types of NoSQL

Graph Store

Specialised type of database optimized for storing graphs

The Many Types of NoSQL

Wide-Column Store

  • Stores records (or rows) which contain arbitrary amount of columns
    • Every record can have different columns
  • Usually query language
  • Originated from the Google BigTable paper

The Power of RDBMS

  • Rich and (mostly) standardised query language
    • Does most of the operations over the data for you
  • Many options of storage patterns
    • Relational (normalized or not)
    • Document (json, xml)
    • Key-value
  • ACID
    • The default behaviour under concurrent access is mostly what you'd expect
  • More mature
    • Over the years picked features from non-relational databases and integrated them

What to Choose and When?

Do you know what you are doing?
No: use RDBMS
Yes: you don't need me to tell you

What to Choose and When?

Key-Value store (Redis)

        Caching

Graph databases (Neo4j)

        Social relations (friend of a friend), network maps

Wide-column store (Cassandra)

        When you need to easily scale writes beyond what
        single server can do

Document store (?)

        Volatile data with dynamic structure (web sessions,
        chat, etc)

Ping me!