MongoDB

South America road trip

Matias Cascallares

April 2014

Agenda

What is MongoDB?
When should I use MongoDB?
Genius Bar session
What's new in MongoDB 2.6?

Who am I?

Matias Cascallares
Solutions Architect @ MongoDB Inc based in Singapore
Software Engineer, University of Buenos Aires
Experience mostly in web environments
In my toolbox I have Java, Python and Node.js

Where I've been Working?

What is MongoDB?

Open source Database

https://www.flickr.com/photos/dasprid/8148007408/

Written in C++

positioning

FULL FEATUREd

Flexible/Dynamic schema
Ad Hoc queries
Real time aggregation
Rich query capabilities
Strongly consistent
Geospatial features
Support for most programming languages

Object Semantic


var mybeer = { 
    name: "Lagunitas",
    type: "Indian Pale Ale",
    barrels: 106000,
    alcohol: 5.7,
    manufacturer: {
        name: "Lagunitas Brewing Company",
        address: "1280 N McDowell Blvd, Petaluma, CA 94954, US"
    },
    tasting_notes: ["sweet", "fruit"]
};

db.beers.insert(mybeer);

object semantic


// single condition
db.beers.find( { "name": "Lagunitas" } );
    
// AND condition
db.beers.find( { "type": "Indian Pale Ale", alcohol: { "$gte": 5 } } );

// OR condition
db.beers.find( { "$or": [ 
    { "type": "Indian Pale Ale" },
    { "alcohol": { "$gte": 5 } }
]});

Highly available

SCALABLE

When should i use mongoDB?

high volume data feeds

Social Media
Machine to Machine
High Frequency Trading

operational intelligence

Ad Targeting
Monitoring
Ticking Database

content management

Product Catalogue
Mobile apps
Biometric
Data Aggregation

MONGODB 2.6

What's new?

main improvement areas

Operations
Text Search
Query System Improvements
Security
MMS and Automation

Operations

DEVOPS, DEVOPS, DEVOPS!!

new wire protocol

Bulk Writes - ORDERED


var bulk = db.beers.initializeOrderedBulkOp();

// insert three new beers
bulk.insert( { name: "Lagunitas", alcohol: 5.7 } );
bulk.insert( { name: "Leffe", alcohol: 6.5 } );
bulk.insert( { name: "London Pride", alcohol: 4.7 } );

bulk.execute();

BULK WRITES - UNORDERED


var bulk = db.beers.initializeUnorderedBulkOp();

// update one beer
bulk.find( { name: "Corona" } ).update( { $set: { alcohol: 4.3 } } );

// .. and delete another one
bulk.find( { name: "Chimay" } ).remove();

// execute everything as a single bulk operation
bulk.execute();

Max time per query


// expensive query with regex without anchor
db.articles.find( { "description": /August [0-9]+ 1969/ } ).maxTimeMS(30000)

// it also works with aggregation framework!
db.articles.aggregate([ { 
    "$match": { 
        "$text": { 
            "$search": "chien", 
            "$language": "fr" 
        }
    }
}]).maxTimeMS(100);

Building an index

Foreground
Background

building an index on the foreground

Faster
More compact
Blocking operation


// foreground by default
db.beers.ensureIndex( { name: 1} );

Building an index on the background

Slower
Sparser
Non-blocking operation


// background is an optional argument
db.beers.ensureIndex( { name: 1}, { background: true } );

index build in 2.4

FG in primary -> FG in secondary

BG in primary -> FG in secondary

index build in 2.6

FG in primary -> FG in primary

BG in primary -> BG in secondary

Storage allocation

usePowerOf2Sizes will be the default allocation method for new collections

Sharding new commands

mergeChunks
cleanupOrphaned

Text search - GA

...Text Search was there in 2.4

Integrated within find


db.articles.ensureIndex( { body: "text" } );

db.articles.insert(
    { body: "the quick brown fox jumped over the lazy dog" }
);

db.articles.find( { "$text" : { $search : "quickly" } } );

INTEGRATED WITHIN AF


db.articles.aggregate([
    { "$match": { "$text": { $search: "bRoWN"} } }
]);

playing with texts


// search for a single word
db.articles.find( { "$text": { "$search": "coffee" } } );

// search for any of these words
db.articles.find( { "$text": { "$search": "bake coffee cake" } } );

// search for a phrase
db.articles.find( { "$text": { "$search": "\"coffee cake\"" } } );

// excluding some terms
db.articles.find( { "$text": { "$search": "bake coffee -cake" } } );

language support


db.articles.find(
   { "$text": { "$search": "leche", $language: "es" } }
);

language support

Query system

improvements

Aggregation Framework

Returns a cursor
Can output to a collection
New operators
explain()

Aggregation Framework


// using a cursor
db.beers.aggregate([
        { "$match": { barrels: { "$gte": 10000}} }
    ],
    { cursor: { batchSize: 1 } }
);

// output to a collection
db.beers.aggregate([
    { "$match" : { barrels : { "$gte": 10000}} },
    { "$out" : "my_output_collection" }
]);

AGGREGAtion framework - explain


db.beers.aggregate(
    [
        { "$match": { barrels: { "$gte": 500 } } },
        { "$group": { "_id": "$type", count: { "$sum":1 } } }
    ],
    { explain: true }
);

aggregation framework - explain


{"stages" : [ // one entry per pipeline stage
        {
        "$cursor" : {
            "query" : { "barrels" : { "$gte" : 500 } },
            "fields" : { "type" : 1, "_id" : 0 },
            "plan" : {
                "cursor" : "BtreeCursor barrels_1",
                "isMultiKey" : false,
                "scanAndOrder" : false,
                "indexBounds" : { "barrels" : [ [500, Infinity] ] }
            }
        }
    ],
 "ok" : 1
}

update operators - mul


db.beers.insert(
    { _id: 1, name: "Lagunitas", price: 10.99 }
);


// to increase the price by 20%
db.beers.update(
    { _id: 1 },
    { "$mul" : { price: 1.2 } }
);

UPDATE operators - bit


db.beers.update(
    {_id: 1},
    { "$bit": { mask: { and: NumberInt(10) } } }
);

db.beers.update(
    {_id: 1},
    { "$bit": { mask: { or: NumberInt(10) } } }
);

db.beers.update(
    {_id: 1},
    { "$bit": { mask: { xor: NumberInt(10) } } }
);

update operators - min/max


db.scores.insert(
    { _id: 1, low_score: 200, high_score: 400 }
);

db.scores.update(
    { _id: 1 },
    { "$min": { low_score: 250 } }
);

db.scores.update(
    { _id: 1 },
    { "$max": { high_score: 450 } }
);

Index intersection


// index creation
db.beers.ensureIndex( { barrels: 1 } );
db.beers.ensureIndex( { alcohol: 1 } );

// retrieval
db.beers.find( { barrels: { "$gte": 100 }, alcohol: { "$gte": 5.5 } } );

index intersection

Less Index Maintenance
Smaller Working Set
Lower Write Overhead
More Adaptive

query optimizer - new concepts

Query Shape
Plan Cache

what is a query shape?


db.beers.find(
    { barrels: { "$gte": 300 } },
    { _id: -1, name: 1, barrels: 1}
).sort( { alcohol: -1 } );

db.runCommand( { planCacheListQueryShapes: "beers"});
{
    "shapes" : [
        {
            "query" : { barrels: { "$gte": 300 } },
            "sort" : { alcohol: -1 },
            "projection" : { _id: -1, name: 1, barrels: 1}
        }
    ]
}

... let's create a plan cache


db.runCommand({
    "planCacheSetFilter": "beers",
    "query" : { barrels: { "$gte": 300 } },
    "sort" : { alcohol: -1 },
    "projection" : { _id: -1, name: 1, barrels: 1}
    "indexes": [
        { barrels: 1 },
        { alcohol: -1, barrels: 1 }
    ],
});

what do i get?

Better control on which indexes are going to be evaluated
Ability to predefine a set of candidate indexes
... but still is an empiric query optimizer

Geospatial enhancements

Added support for multipart geometries

MultiPoint
MultiLineString
MultiPolygon
GeometryCollection

GEOSPATIAL ENHANCEMENTS

Security

Authentication

LDAP (Enterprise)
x509

Authorization

User defined roles

creating new roles


db.createRole({
    role: "MMSMonitoringRole",
    roles: ["clusterAdmin", "readAnyDatabase"]
});

db.createRole({
    role: "MMSBackupRole",
    roles: ["clusterAdmin", "readAnyDatabase", "userAdminAnyDatabase"]
});

using my new roles


db.addUser({
    "user": "mms-monitoring",
    "pwd": "abcd1234",
    "roles": [
        "MMSMonitoringRole"
    ]
});

db.addUser({
    "user": "mms-backup",
    "pwd": "efgh5678",
    "roles": [
        "MMSBackupRole"
    ]
});

creating custom privileges


db.createRole({
  "role": "appUser",
  "db": "myApp",
  "privileges": [
        {
            "resource": { "db": "myApp" , "collection": "" },
            "actions": [ "find", "dbStats", "collStats" ]
        },
        {
            "resource": { "db": "myApp", "collection": "beers" },
            "actions": [ "insert"]
        }
  ]
};

Auditing

Schema actions
Replica Set actions
Authentication & Authorization actions
Other actions

output


    mongod --dbpath data/db --auditDestination syslog

syslog
console
JSON/BSON file

dropping a collection


var auditEntry = {
    "atype" : "dropCollection",
    "ts" : { "$date" : "2014-04-08T16:48:34.333+1000" },
    "local" : { "ip" : "127.0.0.1", "port" : 27017 },
    "remote" : { "ip" : "127.0.0.1", "port" : 55771 },
    "users" : [
        { "user": "matias", "db": "test" }
    ],
    "param" : { "ns" : "test.beers" },
    "result" : 0
};

shutting down the server


var auditEntry = {
    "atype" : "shutdown",
    "ts" : { "$date" : "2014-04-08T16:54:24.373+1000" },
    "local" : { "ip" : "127.0.0.1", "port" : 27017 },
    "remote" : { "ip" : "127.0.0.1", "port" : 55771 },
    "users" : [
        { "user": "matias", "db": "admin" }
    ],
    "param" : {},
    "result" : 0
};

MMS & automation

MMS

Monitoring and backup service
Cloud-based and on-premise
Easy to setup

cloud numbers

Monitoring: 75K updates/sec
Backup: 100 GB/hr of new data

monitoring

alerts

BACKUP

Backup a replica set or sharded cluster
Initial sync + incremental
Generated snapshots every 6 hs
Restore via HTTPS or SCP
Restore replica sets to point-in-time (last 24hs)
Restore sharded clusters to any 15 minute (last 24hs)

BACKUP

automation

Provision, create, upgrade and maintain MongoDB deployments
Hide complex stuff, just use your browser
Initial supported platforms: AWS and OpenStack
Alpha/Beta stage

what can i do?

Create your deployment (replica set or sharded)
Add/remove shards and replica sets
Resize oplog
Specify users and roles
Provisioning new machines (only in AWS)

provisioning new servers

creating your replica set

creating your cluster

50 shards, one click

See it in action..

https://www.youtube.com/watch?v=nSJiVXNsPHk&feature=youtu.be

THE END

Questions?

http://slid.es/mcascallares/mongodb-sa-road-trip

MongoDB - South America Road Trip - Buenos Aires

By Matias Cascallares

MongoDB - South America Road Trip - Buenos Aires

3,747

Matias Cascallares

mcascallares

MongoDB

South America road trip

Agenda

Who am I?

Where I've been Working?

What is MongoDB?

Open source Database

Written in C++

positioning

FULL FEATUREd

Object Semantic

object semantic

Highly available

SCALABLE

When should i use mongoDB?

high volume data feeds

operational intelligence

content management

MONGODB 2.6

What's new?

main improvement areas

Operations

DEVOPS, DEVOPS, DEVOPS!!

new wire protocol

Bulk Writes - ORDERED

BULK WRITES - UNORDERED

Max time per query

Building an index

building an index on the foreground

Building an index on the background

index build in 2.4

index build in 2.6

Storage allocation

Sharding new commands

Text search - GA

...Text Search was there in 2.4

Integrated within find

INTEGRATED WITHIN AF

playing with texts

Top 3 relevant documents

language support

language support

Query system

improvements

Aggregation Framework

Aggregation Framework

AGGREGAtion framework - explain

aggregation framework - explain

update operators - mul

UPDATE operators - bit

update operators - min/max

Index intersection

index intersection

query optimizer - new concepts

what is a query shape?

... let's create a plan cache

what do i get?

Geospatial enhancements

GEOSPATIAL ENHANCEMENTS

Security

Authentication

Authorization

creating new roles

using my new roles

creating custom privileges

Auditing

output

dropping a collection

shutting down the server

MMS & automation

MMS

cloud numbers

monitoring

monitoring

monitoring

alerts

BACKUP

BACKUP

BACKUP

automation