Intro to MongoDB

By Ronak Raithatha

Gun to the head by Alex.

What we'll cover:

Introduction
CRUD
Aggregation Framework
Replication
Sharding (note: look for 12 yr olds)

Introduction

MongoDB:

is a NoSQL database
stores JSON documents, which are represented in the database as BSON
can be scaled horizontally through sharding

Most Importantly:

MongoDB is Web Scale

JSON

JavaScript Object Notation
- Not tied to JavaScript; usable from any language
Easier to read
Lightweight compared to XML
Stored as BSON internally
- Binary JSON
- bsonspec.org

RDBMS -> MongoDB

Table	Collection
Row	Document
Column	Field
Joins	Embedded documents/Linking
Database	Database

CRUD

Creating a document

SQL:

INSERT INTO seeker (fn, ln)
VALUES ('John', 'Doe')

MongoDB:

db.seeker.insertOne({
    fn: 'John',
    ln: 'Doe'
})

Result:

{
   "_id": ObjectId("5a5d957671775ae0f046e3e6"),
   "fn": "John",
   "ln": "Doe"
}

To insert multiple documents, pass an array:

.insertMany([<document>, <document>, ...])
CRUD

_id

Primary key for every document in a collection
Defaults to type ObjectId (a BSON type)
Indexed automatically
No built-in auto_increment; emulate it with a counter document if needed
Can be set explicitly:

db.seeker.insertOne({_id: 'Stephane', country: 'US'})

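ObjectId

12 bytes long:

4 bytes: seconds since the Unix epoch
3 bytes: machine identifier
2 bytes: process id
3 bytes: counter, starting with a random value

(Newer MongoDB versions replace the machine identifier and process id with a single 5-byte random value.)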
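
The byte layout above can be checked by hand. Here is a minimal sketch in plain JavaScript (no MongoDB driver needed) that splits the ObjectId from the insert example into its fields. The function name parseObjectId is hypothetical, and the middle five bytes are returned as raw hex because their meaning depends on the MongoDB version.

```javascript
// Split a 24-character hex ObjectId string into its byte fields.
// parseObjectId is a hypothetical helper, not part of any MongoDB API.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  // First 4 bytes (8 hex chars): seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    seconds,
    createdAt: new Date(seconds * 1000),
    // Next 5 bytes: machine id + process id (or a random value in newer versions).
    middle: hex.slice(8, 18),
    // Last 3 bytes: counter.
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parsed = parseObjectId("5a5d957671775ae0f046e3e6");
console.log(parsed.createdAt.toISOString());
```

Because the timestamp comes first, sorting by _id roughly sorts documents by creation time.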

CRUD

Updating a document

SQL:

UPDATE seeker SET ln='Smith' WHERE ln='Doe'

MongoDB:

db.seeker.updateMany(
    {ln: 'Doe'},
    {$set:
        {ln: 'Smith'}
    }
)

Adding a new field:

db.seeker.updateMany(
    {ln: 'Doe'},
    {$set:
        {age: 23}
    }
)

Update only the first matching document with:

.updateOne(<filter>,<update>)

Replace a full document with:

.replaceOne(<filter>,<replacement>)

CRUD

Deleting a document

SQL:

DELETE FROM seeker WHERE fn='John'

MongoDB:

db.seeker.deleteMany({fn:'John'})

To delete only the first match, use:

.deleteOne(<filter>)

Or a collection

SQL:

DROP TABLE seeker

MongoDB:

db.seeker.drop()

CRUD

Reading a document

SQL:

SELECT *
FROM seeker

MongoDB:

db.seeker.find()

SELECT id,
       fn,
       ln
FROM seeker

db.seeker.find(
    { },
    { fn: 1, ln: 1 }
)

SELECT fn, ln
FROM seeker
WHERE ln = 'Doe'

db.seeker.find(
    { ln: "Doe" },
    { fn: 1, ln: 1, _id: 0 }
)

SELECT *
FROM seeker
WHERE age > 25
AND   age <= 50

db.seeker.find(
   { age: { $gt: 25, $lte: 50 } }
)
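
To make the operator syntax concrete, here is a small sketch in plain JavaScript of what the filter { age: { $gt: 25, $lte: 50 } } selects. The sample documents are made up, and the array filter only mimics the query's meaning; it is not how MongoDB evaluates it.

```javascript
// Hypothetical sample documents standing in for the seeker collection.
const seekers = [
  { fn: "John", ln: "Doe", age: 23 },
  { fn: "Jane", ln: "Roe", age: 26 },
  { fn: "Sam", ln: "Poe", age: 50 },
  { fn: "Max", ln: "Moe", age: 51 },
];

// Plain-JS reading of { age: { $gt: 25, $lte: 50 } }:
// both conditions on the same field must hold (an implicit AND).
const matches = seekers.filter((s) => s.age > 25 && s.age <= 50);

console.log(matches.map((s) => s.fn));
```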

CRUD

Reading a document

SQL:

SELECT *
FROM seeker
WHERE age < 30
ORDER BY age ASC

MongoDB:

db.seeker.find( { age: { $lt: 30 } } )
  .sort( { age: 1 } )

SELECT *
FROM seeker
WHERE ln LIKE '%Do%'

db.seeker.find( { ln: /Do/ } )

SELECT COUNT(*)
FROM seeker

db.seeker.countDocuments({})
or (deprecated in newer shells)
db.seeker.find().count()

SELECT *
FROM seeker
LIMIT 5
OFFSET 10

db.seeker.find().limit(5).skip(10)

CRUD

Reading a document

There are a few combined methods that read and modify a single document in one atomic operation:

db.collection.findOneAndReplace()
db.collection.findOneAndUpdate()
db.collection.findAndModify()

Aggregation Framework

Provides a way to process the documents in a collection and return computed results, such as grouped totals.

e.g. the equivalents of GROUP BY, ORDER BY, HAVING, etc.

Two most prominent ways to use the aggregation framework are:

Aggregation Pipeline: works much like pipes (|) in the terminal. Documents pass through one stage at a time, and each stage's output feeds the next.
Map-Reduce: a MongoDB operation that runs JavaScript map and reduce functions in the background. Generally slower than the pipeline, and deprecated since MongoDB 5.0.
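
The pipe analogy can be sketched in plain JavaScript. This is only an in-memory illustration of the stage-by-stage idea, not how MongoDB runs a pipeline; the sample data and the helper names groupBySeeker, sortByTotal and runPipeline are hypothetical.

```javascript
// Hypothetical sample documents standing in for the seeker collection.
const docs = [
  { seeker_id: 1, applications: 2 },
  { seeker_id: 2, applications: 5 },
  { seeker_id: 1, applications: 4 },
];

// Stage 1: roughly what $group with $sum does (one output doc per seeker_id).
const groupBySeeker = (input) => {
  const totals = new Map();
  for (const doc of input) {
    totals.set(doc.seeker_id, (totals.get(doc.seeker_id) || 0) + doc.applications);
  }
  return [...totals].map(([id, total]) => ({ _id: id, total }));
};

// Stage 2: roughly what { $sort: { total: 1 } } does (ascending by total).
const sortByTotal = (input) => [...input].sort((a, b) => a.total - b.total);

// Run the stages in order, feeding each stage's output into the next.
const runPipeline = (input, stages) =>
  stages.reduce((data, stage) => stage(data), input);

const result = runPipeline(docs, [groupBySeeker, sortByTotal]);
console.log(result);
```

Each stage only sees what the previous stage produced, which is why the order of stages matters.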

Aggregation Pipeline

Map-Reduce Operation

Aggregation Framework

SQL:

SELECT seeker_id,
       SUM(applications) AS total
FROM seeker
GROUP BY seeker_id
ORDER BY total

MongoDB:

db.seeker.aggregate( [
   {
     $group: {
        _id: "$seeker_id",
        total: { $sum: "$applications" }
     }
   },
   { $sort: { total: 1 } }
] )

SELECT seeker_id,
       count(*)
FROM seeker
GROUP BY seeker_id
HAVING count(*) > 1

db.seeker.aggregate( [
   {
     $group: {
        _id: "$seeker_id",
        count: { $sum: 1 }
     }
   },
   { $match: { count: { $gt: 1 } } }
] )

Aggregation Framework

SQL:

SELECT seeker_id,
       app_date,
       SUM(applications) AS total
FROM seeker
GROUP BY seeker_id,
         app_date
HAVING total > 3

MongoDB:

db.seeker.aggregate( [
   {
     $group: {
        _id: {
           seeker_id: "$seeker_id",
           app_date: {
               month: { $month: "$app_date" },
               day: { $dayOfMonth: "$app_date" },
               year: { $year: "$app_date"}
           }
        },
        total: { $sum: "$applications" }
     }
   },
   { $match: { total: { $gt: 3 } } }
] )

Aggregation Framework

SQL (pseudo-SQL; $lookup stores the matching ethnicity documents as an array in the "as" field):

SELECT *, ethnicity_id
FROM seeker
WHERE ethnicity_id IN (SELECT *
FROM ethnicity
WHERE id = seeker.ethnicity_id);

MongoDB:

db.seeker.aggregate([
   {
     $lookup:
       {
         from: "ethnicity",
         localField: "ethnicity_id",
         foreignField: "id",
         as: "ethnicity_id"
       }
  }
])
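
The shape of the $lookup output is easy to miss, so here is a rough in-memory sketch in plain JavaScript. The sample data is made up, and this only mimics the result: every seeker is kept, and the field named by "as" becomes an array of matching ethnicity documents (empty when nothing matches).

```javascript
// Hypothetical sample documents for the two collections.
const seekers = [
  { fn: "John", ethnicity_id: 1 },
  { fn: "Jane", ethnicity_id: 9 }, // no matching ethnicity document
];
const ethnicities = [
  { id: 1, name: "A" },
  { id: 2, name: "B" },
];

// Mimic $lookup with localField "ethnicity_id", foreignField "id",
// as "ethnicity_id": the original value is overwritten by the array of matches.
const joined = seekers.map((s) => ({
  ...s,
  ethnicity_id: ethnicities.filter((e) => e.id === s.ethnicity_id),
}));

console.log(JSON.stringify(joined, null, 2));
```

Choosing a different name for "as" would keep the original ethnicity_id value alongside the joined array.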

Replication

MongoDB provides built-in asynchronous replication: secondaries read the primary's oplog and apply the same operations to their own copy of the data.

Members of a replica set maintain heartbeat connections with each other to detect when the primary goes down.

If the primary does go down, the remaining members hold an election to choose the next primary. The priority set on each member influences which one wins.
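
As a sketch of where priorities live, here is a replica set configuration document written as a plain JavaScript object. The set name and host names are hypothetical; members[n].priority is the setting that influences elections. In mongosh a document like this would be passed to rs.initiate().

```javascript
// Replica set configuration document; set name and hosts are hypothetical.
const replicaSetConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.net:27017", priority: 2 }, // preferred primary
    { _id: 1, host: "db2.example.net:27017", priority: 1 },
    { _id: 2, host: "db3.example.net:27017", priority: 1 },
  ],
};

// Illustration only: find the member that priority favors in an election.
const preferred = replicaSetConfig.members.reduce((best, m) =>
  m.priority > best.priority ? m : best
);
console.log(preferred.host);

// In mongosh: rs.initiate(replicaSetConfig)
```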

Replication

Sharding

Provides a way to scale horizontally by distributing data across multiple machines (shards), each holding a subset of the overall data.

Sharding

Shard Keys

The shard key consists of an immutable field or fields that exist in every document in a target collection.
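
To show why every document needs the shard key, here is a toy model of range-based routing in plain JavaScript. The chunk boundaries, shard names and the routeToShard helper are all hypothetical, and real MongoDB routing (chunk metadata, balancing, hashed keys) is more involved than this.

```javascript
// Hypothetical chunk ranges over the shard key "seeker_id": [min, max) -> shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

// Pick the shard whose range contains the document's shard key value.
function routeToShard(doc) {
  const key = doc.seeker_id;
  if (typeof key !== "number" || Number.isNaN(key)) {
    throw new Error("document is missing a numeric shard key");
  }
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

console.log(routeToShard({ seeker_id: 42, fn: "John" }));
console.log(routeToShard({ seeker_id: 1500, fn: "Jane" }));
```

A query that includes the shard key can be sent to one shard in the same way; a query without it has to ask every shard.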

Going forward

With enough interest, we can spend a hack-day working on two datasets that are similar between SQL and Mongo.

Divide up into teams and try to translate another team's SQL query into Mongo.

Questions...
