RDBMS in NoSQL World

Petr Jelínek

28th of Oct 2016

TTÜ Geeky Weekly

Who?

  • Developer for 14 years
  • PostgreSQL Developer & Consultant at 2ndQuadrant
  • Previously 4 years at Skype
  • PostgreSQL Contributor for more than 10 years
  • Lead developer of Postgres-BDR multi-master replication system for PostgreSQL
  • Co-maintainer of PgBouncer lightweight connection pooler for PostgreSQL
  • Long time contributor to various smaller open-source projects

Database Paradigms

RDBMS

        PostgreSQL, Oracle, MySQL, MSSQL

NoSQL

        MongoDB, Cassandra, RocksDB

NewSQL

        Vertica, VoltDB

Other specialized

        Hadoop, Elastic search, etc

Origins of NoSQL

  • First databases were "NoSQL" as there was no SQL yet
  • SQL came from IBM in 1970s and was standardized in 1980s
  • Modern NoSQL databases  were inspired by paper from Google
  • Term NoSQL coined by Johan Oskarsson of Last.fm in 2009 for distributed non-relational databases

ACID

Durability: once data is written it will always be there, across crashes and similar

BASE

Basically Available: "always" available

Soft-state: the state of the system may change even without client activity

Eventual Consistency: the system will become consistent over time

Atomic: all commands in transaction either succeed or fail, there is no in between state preserved

Consistent: all committed data is consisted according to the rules, constraints, etc that are defined in the DB

Isolation: provides control over what data other clients see

Relational

"Unstructured"

Data Structure

user_id city country
1 London UK
2 Oxford UK
id name email
1 Simon simon@email.com

user

address

{
  "id": 1
  "name": "Simon"
  "email":"simon@email.com"
  "address":
  [
    {
      "city": "London"
      "country": "UK"
    },
    {
      "city": "Oxford"
      "country": "UK"
    }
  ]
}

Data Structure

  • Most of the useful data is structured
  • Relational != Normalised
    • JSON document is still relational data
  • Schema can be forced either by db or application
    • Schema-on-write vs schema-on-read
    • Schema-on-write means that all data has the same schema
    • Schema-on-read means that application needs to be aware of all versions of the schema

Retrieving the Data

  • SQL - complex, standardized query language with it's own types and operators
  • Four basic operations
    • SELECT - retrieve data
    • INSERT - add data
    • UPDATE - modify data
    • DELETE - remove data
  • Operate over sets of data

RDBMS

Retrieving the Data

RDBMS

SELECT * FROM users WHERE id = 1;

SELECT department, role, gender, count(*)
 FROM employees
 GROUP BY CUBE (department, role, gender);

WITH RECURSIVE ...

Retrieving the Data

NoSQL

  • Each database has it's own API
  • Usually direct integration of the interface into the programming language by the db driver
  • Capabilities of the "query language" depend on the database

Retrieving the Data

NoSQL

redis.get(1)

db.bios.find( { _id: 5 } )

db.bios.find(
   {
      awards: {
                $elemMatch: {
                     award: "Turing Award",
                     year: { $gt: 1980 }
                }
      }
   }
)

Scaling

Vertical

Scaling

Horizontal

The Many Types of NoSQL

Key-Value Store

  • Stores key-value pairs
  • Value can be anything
  • Value is opaque to the database
  • Access to the data only through the key
  • Easy to scale
  • Sometimes used as storage engine for other types of databases (RocksDB)

The Many Types of NoSQL

Document Store

  • Stores primary key-value pairs
  • Value has some specific format (i.e. JSON, BSON, XML)
  • Database is aware of the structure and contents of the value
    • Parts of the value can be manipulated
  • Usually supports secondary indexes
    • Access to data using part of the value

The Many Types of NoSQL

Graph Store

Specialised type of database optimized for storing graphs

The Many Types of NoSQL

Wide-Column Store

  • Stores records (or rows) which contain arbitrary amount of columns
    • Every record can have different columns
  • Usually query language
  • Originated from the Google BigTable paper

The Power of RDBMS

  • Rich and (mostly) standardised query language
    • Does most of the operations over the data for you
  • Many options of storage patterns
    • Relational (normalized or not)
    • Document (json, xml)
    • Key-value
  • ACID
    • The default behaviour under concurrent access is mostly what you'd expect
  • More mature
    • Over the years picked features from non-relational databases and integrated them

What to Choose and When?

Do you know what you are doing?
No: use RDBMS
Yes: you don't need me to tell you

What to Choose and When?

Key-Value store (Redis)

        Caching

Graph databases (Neo4j)

        Social relations (friend of a friend), network maps

Wide-column store (Cassandra)

        When you need to easily scale writes beyond what
        single server can do

Document store (?)

        Volatile data with dynamic structure (web sessions,
        chat, etc)

Ping me!

Made with Slides.com