NoSQL DBs

Content:

  1. A.C.I.D.
  2. C.A.P.
  3. Types of NoSQL
  4. MongoDB

 

A.C.I.D.

Atomicity, Consistency, Isolation, Durability

Atomicity: A transaction follows the all-or-nothing rule, i.e., the database treats all operations of a transaction as one indivisible unit (an atom): either all of them are applied or none is.

Consistency: Ensures that only valid data, following all rules and constraints, is written to the database.

Isolation: Ensures that concurrent transactions are processed independently, without interfering with one another; it does not guarantee the order in which they run.

Durability: Ensures that any transaction committed to the database will not be lost. Durability is achieved through database backups and transaction logs, which allow committed transactions to be restored after a subsequent software or hardware failure.
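
A minimal sketch of the all-or-nothing rule, in plain JavaScript: the operations are applied to a copy of the data, and the copy replaces the original only if every operation succeeds. The runTransaction, withdraw and deposit helpers and the account balances are hypothetical, made up for illustration; a real database achieves this with logs and locks rather than by copying the data.

```javascript
// Hypothetical helper: apply every operation or none of them.
function runTransaction(state, operations) {
  // Work on a copy so a failure leaves the original untouched
  // (a shallow copy is enough for this flat object).
  const draft = { ...state };
  try {
    for (const op of operations) op(draft);
  } catch (err) {
    return { committed: false, state }; // roll back: keep the old state
  }
  return { committed: true, state: draft }; // commit: expose the new state
}

// Rule enforced by the operation: a balance may never go negative.
function withdraw(account, amount) {
  return (s) => {
    if (s[account] < amount) throw new Error("insufficient funds");
    s[account] -= amount;
  };
}

function deposit(account, amount) {
  return (s) => { s[account] += amount; };
}

const accounts = { joe: 100, cat: 50 };
const ok = runTransaction(accounts, [withdraw("joe", 30), deposit("cat", 30)]);
const failed = runTransaction(accounts, [deposit("cat", 500), withdraw("joe", 500)]);

console.log(ok.committed, ok.state);         // true { joe: 70, cat: 80 }
console.log(failed.committed, failed.state); // false { joe: 100, cat: 50 }
```

In the second call the deposit succeeds on the draft, but the failed withdrawal discards the whole draft, so the partial deposit is never visible.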

C.A.P. 

The CAP theorem states that, in the presence of a network partition, one has to choose between consistency and availability.

Types of NoSQL DBs

Pros:

  • Schema-free documents.

  • Easy to model and change the database architecture.

  • Often faster than SQL databases for simple key-based access.

  • CRUD operations via REST calls.

  • Easy to scale, replicate and shard the data.

Cons:

  • Schema-free documents.

  • Not A.C.I.D.

  • Eventually consistent.

  • No "universal" query language.

MongoDB

Example documents:

Embedded:

Reference:

Schema validation for documents: MongoDB validates at the collection level using the validator option.

There are two validation levels, strict (the default) and moderate.

Strict validates all inserts and updates.
Moderate validates inserts, and updates only to existing documents that are already valid.
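
The difference between the two levels can be sketched in plain JavaScript. This is a simplified model, not MongoDB's implementation: isValid is a hand-written stand-in for the contacts validator, and acceptsWrite is a hypothetical helper that decides whether a write would be accepted.

```javascript
// Simplified stand-in for the contacts validator (assumption: $type "int"
// is approximated with Number.isInteger).
function isValid(doc) {
  return typeof doc.phone === "string" &&
    Number.isInteger(doc.age) &&
    /@mongodb\.com$/.test(doc.email) &&
    ["Unknown", "Complete"].includes(doc.status);
}

// `before` is the stored document (null for an insert),
// `after` is the document the write would produce.
function acceptsWrite(level, before, after) {
  if (level === "moderate" && before !== null && !isValid(before)) {
    return true; // updates to already-invalid documents are not checked
  }
  return isValid(after); // every other write must satisfy the rules
}

// A legacy document that breaks the rules: phone is a number, not a string.
const legacy = { phone: 999, age: 20, email: "joe@mongodb.com", status: "Complete" };
const patched = { ...legacy, status: "Incomplete" };

console.log(acceptsWrite("strict", legacy, patched));   // false
console.log(acceptsWrite("moderate", legacy, patched)); // true
console.log(acceptsWrite("moderate", null, patched));   // false
```

The last call shows that moderate still checks inserts; only updates to documents that were already invalid slip through.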

 

Query examples

db.contacts.find()

db.contacts.find({"email": "cat@mongodb.com"})

db.contacts.find({"phone":{ $in:["111", "999"]}}, {"age": 0})

db.contacts.find({"age":{ $gt:1, $lt: 999}}, {"age": 1})
.findOne()
.findOneAndReplace()
.findOneAndDelete()
.findOneAndUpdate()
//cursor methods
.count()
.sort({"phone":1})
.limit(1)
.skip(5)
.map( function(u) { return u.status.length; } )
.pretty()

ACID in Mongo

Atomicity/Isolation - Mongo is atomic only at the single-document level; if a write operation modifies multiple documents, other operations may interleave with it. $isolated can be used to acquire an exclusive lock on the collection until the operation finishes, but it does not work on sharded clusters and does not provide all-or-nothing functionality. ($isolated was removed in MongoDB 4.0, which introduced multi-document transactions.)

Consistency - In a replica set, replicas are eventually consistent. This is not a problem if you read only from the primary.

Durability - Durability is provided by a write-ahead journal that is flushed to disk at a regular, configurable interval. If the server crashes, all changes not yet written to the journal will be lost.

Indexes

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure.

Most databases use B-trees to store indexes.
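
To illustrate the trade-off, here is a small sketch in plain JavaScript. It uses a sorted array with binary search rather than a real B-tree, which is enough to show the idea: the index is extra data kept in key order, and a range query can jump to the first match instead of scanning every document. The contacts "ann" and "bob" and the helper names are made up for the example.

```javascript
const contacts = [
  { _id: "joe", age: 20 },
  { _id: "cat", age: 16 },
  { _id: "ann", age: 35 },
  { _id: "bob", age: 27 },
];

// Build the index once: [key, position] pairs kept sorted by key.
// This costs extra storage and has to be maintained on every write.
const ageIndex = contacts
  .map((doc, pos) => [doc.age, pos])
  .sort((a, b) => a[0] - b[0]);

// Binary search for the first index entry whose key is >= min.
function lowerBound(index, min) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < min) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range query: jump to the first match, then walk forward in key order.
function findByAgeRange(min, max) {
  const result = [];
  for (let i = lowerBound(ageIndex, min); i < ageIndex.length && ageIndex[i][0] <= max; i++) {
    result.push(contacts[ageIndex[i][1]]);
  }
  return result;
}

console.log(findByAgeRange(18, 30).map((d) => d._id)); // [ 'joe', 'bob' ]
```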

db.collection.createIndex( <key and index type specification>, <options> )

Types:

  • Single field index
  • Compound index
  • Multikey index
  • Geospatial index
  • Text index
  • Hashed index

Creating an index.

Scaling

Replication

Why use replication?

Replication removes one of the possible single points of failure in the application.

 

What is replication?

Copying (duplicating) data across multiple nodes (in different data centers if possible).

Some common problems:

  • Primary node dies
  • The network is partitioned in 2 or more groups

Consensus

  • Data is replicated on multiple nodes
  • At any given time there is one primary node
  • The primary accepts writes and reads
  • Secondaries accept only reads (or nothing)
  • After receiving a write, the primary applies the change, records it in the oplog, and waits for the secondaries to replicate the oplog entry and acknowledge it

Journal: a low-level log used for durability on a single node.

Oplog: a high-level capped collection containing the write operations, used for replica synchronization.
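
The role of the oplog can be sketched with a toy in-memory model in plain JavaScript. All names here (makeNode, applyEntry, write, sync) are hypothetical, and the sketch leaves out the capped size of the real oplog, elections and acknowledgements; it only shows that a secondary catches up by replaying the entries it has not applied yet.

```javascript
// Hypothetical in-memory node: its data plus the position of the
// last oplog entry it has applied.
function makeNode(name) {
  return { name, data: {}, applied: 0 };
}

// Apply one oplog entry; entries are simple "set key to value" operations.
function applyEntry(node, entry) {
  node.data[entry.key] = entry.value;
  node.applied = entry.ts;
}

const oplog = []; // ordered list of write operations, kept on the primary
const primary = makeNode("primary");
const secondaries = [makeNode("s1"), makeNode("s2")];

// A write is appended to the oplog and applied on the primary.
function write(key, value) {
  const entry = { ts: oplog.length + 1, key, value };
  oplog.push(entry);
  applyEntry(primary, entry);
}

// A secondary catches up by replaying the entries it has not seen.
function sync(node) {
  for (const entry of oplog.slice(node.applied)) applyEntry(node, entry);
}

write("joe", "Complete");
write("cat", "Unknown");
sync(secondaries[0]); // s1 is now up to date, s2 is still lagging

console.log(secondaries[0].data); // { joe: 'Complete', cat: 'Unknown' }
console.log(secondaries[1].data); // {}
```

Reading from s2 at this point returns stale data, which is the eventual consistency described for replica sets.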

Read concern

For each operation we can set one of the following levels:

  • Local

  • Majority

  • Linearizable

Read preference

  • Primary / PrimaryPreferred

  • Secondary / SecondaryPreferred

  • Nearest

Write concern

{ w: <value>, j: <boolean>, wtimeout: <number> }

W          J        WTIMEOUT
number     true     0
majority   false    n
tag set
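
As a rough illustration of the w field of a write concern, the sketch below decides whether a write counts as acknowledged given how many replica set members have confirmed it. The isAcknowledged helper is hypothetical and deliberately simplified: it ignores j, wtimeout and tag sets, and treats every member as a voting, data-bearing node.

```javascript
// Hypothetical helper: has a write been confirmed by enough members?
// `acks` counts the members (primary included) that confirmed the write.
function isAcknowledged(writeConcern, acks, members) {
  const needed = writeConcern.w === "majority"
    ? Math.floor(members / 2) + 1 // more than half of the members
    : writeConcern.w;             // an explicit number of members
  return acks >= needed;
}

console.log(isAcknowledged({ w: 1 }, 1, 3));          // true
console.log(isAcknowledged({ w: "majority" }, 1, 3)); // false
console.log(isAcknowledged({ w: "majority" }, 2, 3)); // true
console.log(isAcknowledged({ w: 3 }, 2, 3));          // false
```

In the mongo shell the write concern is passed per operation, for example db.contacts.insertOne(doc, { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }), where doc and the timeout value are placeholders.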

Two phase commit

  1. The coordinator sends a query-to-commit message to all cohorts and waits until it has received a reply from all of them.
  2. The cohorts execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo log and an entry to their redo log.
  3. Each cohort replies with an agreement message (the cohort votes Yes to commit) if its actions succeeded, or with an abort message (the cohort votes No, not to commit) if it experienced a failure that makes it impossible to commit.
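
The voting steps above can be sketched in plain JavaScript as follows. The makeCohort and votingPhase helpers are hypothetical, and the cohorts are in-memory objects rather than remote nodes. The decision rule at the end (commit only if every cohort voted Yes) is the standard continuation of the protocol, which follows the voting phase listed above.

```javascript
// Hypothetical cohort: prepares a transaction and votes on it.
function makeCohort(name, canCommit) {
  return {
    name,
    log: [],
    prepare(tx) {
      // Execute up to the commit point and record undo/redo entries.
      this.log.push({ tx, kind: "undo" }, { tx, kind: "redo" });
      return canCommit ? "yes" : "no";
    },
  };
}

// The coordinator queries every cohort, collects all votes,
// and commits only if nobody voted "no".
function votingPhase(tx, cohorts) {
  const votes = cohorts.map((cohort) => cohort.prepare(tx));
  return votes.every((vote) => vote === "yes") ? "commit" : "abort";
}

const healthy = [makeCohort("a", true), makeCohort("b", true)];
const degraded = [makeCohort("a", true), makeCohort("b", false)];

console.log(votingPhase("tx1", healthy));  // commit
console.log(votingPhase("tx2", degraded)); // abort
```

A single No vote aborts the transaction on every cohort; the cohorts that voted Yes would then use their undo log to roll back.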

Sharding

Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.

Data is sharded at the collection level.

Data is distributed based on a shard key.

The sharding strategy can be range-based or hash-based.

Each shard should itself be replicated (x3).

Not all collections need to be sharded.
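
The two strategies can be compared with a small sketch in plain JavaScript. The shard names, the range boundaries and the toy hash function are all assumptions made for the example; MongoDB itself routes on chunk ranges and uses its own hash for hashed shard keys.

```javascript
const SHARDS = ["shard0", "shard1", "shard2"];

// Range-based: contiguous key ranges map to shards,
// so keys that sort close together land on the same shard.
function rangeShard(key) {
  const first = key[0].toLowerCase();
  if (first < "i") return SHARDS[0];
  if (first < "r") return SHARDS[1];
  return SHARDS[2];
}

// Hash-based: a hash of the key picks the shard, which spreads keys
// out but gives up locality for range queries.
function hashShard(key) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.codePointAt(0)) >>> 0; // toy hash
  return SHARDS[h % SHARDS.length];
}

console.log(rangeShard("cat"), rangeShard("joe")); // shard0 shard1
console.log(hashShard("cat") === hashShard("cat")); // true: routing is deterministic
```

Range-based sharding keeps range queries on few shards but can create hot spots when keys grow monotonically; hash-based sharding trades that locality for a more even write distribution.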

Example of a MongoDB architecture.

So what is Mongo in CAP?

It depends on the configuration. By default it is CP, but it can be configured to behave as AP by reading from secondaries with a relaxed read concern. Write speed can also be increased by not having the primary wait for acknowledgement (a relaxed write concern).

{
	"_id": "joe",
	"phone": "999",
	"email": "joe@mongodb.com",
	"status": "Complete",
	"age": 20
}

{
	"_id": "cat",
	"phone": "12312312",
	"email": "cat@mongodb.com",
	"status": "Complete",
	"age": 16
}

db.createCollection( "contacts",
   { validator: { $and:
      [
         { phone: { $type: "string" } },
         { age: { $type: "int" } },
         { email: { $regex: /@mongodb\.com$/ } },
         { status: { $in: [ "Unknown", "Complete" ] } }
      ]
   }
} )

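// Expected to be rejected by the validator: "Incomplete" is not an allowed status
db.contacts.update({"_id":"cat"}, {$set: {"status": "Incomplete"}})
// Expected to pass: every updated field satisfies the validator
db.contacts.update({"_id":"cat"}, {$set: {"status": "Unknown" , "phone": "999", "email": "someemail@mongodb.com"}})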

db.contacts.find()
db.contacts.find({"email": "cat@mongodb.com"})
db.contacts.find({"phone":{ $in:["111", "999"]}}, {"age": 0}).sort({"_id":1})
db.contacts.find({"age":{ $gt:1, $lt: 999}}, {"age": 1}).limit(1)

Questions?

Mongo db

By Corneliu Caraman