MongoDB

South America road trip






Matias Cascallares
April 2014

Agenda


  • What is MongoDB?
  • When should I use MongoDB?
  • Genius Bar session
  • What's new in MongoDB 2.6?

Who am I?


  • Matias Cascallares
  • Solutions Architect @ MongoDB Inc based in Singapore
  • Software Engineer, University of Buenos Aires
  • Experience mostly in web environments
  • In my toolbox I have Java, Python and Node.js

    Where I've been Working?




    What is MongoDB?


    Open source Database










    https://www.flickr.com/photos/dasprid/8148007408/

    Written in C++

    positioning


    FULL FEATUREd


    • Flexible/Dynamic schema
    • Ad Hoc queries
    • Real time aggregation
    • Rich query capabilities
    • Strongly consistent
    • Geospatial features
    • Support for most programming languages

    Object Semantic


    
    var mybeer = { 
        name: "Lagunitas",
        type: "Indian Pale Ale",
        barrels: 106000,
        alcohol: 5.7,
        manufacturer: {
            name: "Lagunitas Brewing Company",
            address: "1280 N McDowell Blvd, Petaluma, CA 94954, US"
        },
        tasting_notes: ["sweet", "fruit"]
    };
    
    db.beers.insert(mybeer);
    
    

    object semantic


    
    // single condition
    db.beers.find( { "name": "Lagunitas" } );
        
    // AND condition
    db.beers.find( { "type": "Indian Pale Ale", alcohol: { "$gte": 5 } } );
    
    // OR condition
    db.beers.find( { "$or": [ 
        { "type": "Indian Pale Ale" },
        { "alcohol": { "$gte": 5 } }
    ]});
    
    

    Highly available



    SCALABLE


     



    When should i use mongoDB?

    high volume data feeds


    • Social Media
    • Machine to Machine
    • High Frequency Trading

    operational intelligence


    • Ad Targeting
    • Monitoring
    • Ticking Database

    content management


    • Product Catalogue
    • Mobile apps
    • Biometric
    • Data Aggregation 



    MONGODB 2.6

    What's new?

    main improvement areas


    • Operations
    • Text Search
    • Query System Improvements
    • Security
    • MMS and Automation


    Operations

    DEVOPS, DEVOPS, DEVOPS!!

    new wire protocol



    Bulk Writes - ORDERED


    
    var bulk = db.beers.initializeOrderedBulkOp();
    
    // insert three new beers
    bulk.insert( { name: "Lagunitas", alcohol: 5.7 } );
    bulk.insert( { name: "Leffe", alcohol: 6.5 } );
    bulk.insert( { name: "London Pride", alcohol: 4.7 } );
    
    bulk.execute();
    
    

    BULK WRITES - UNORDERED


    
    var bulk = db.beers.initializeUnorderedBulkOp();
    
    // update one beer
    bulk.find( { name: "Corona" } ).update( { $set: { alcohol: 4.3 } } );
    
    // .. and delete another one
    bulk.find( { name: "Chimay" } ).remove();
    
    // execute everything as a single bulk operation
    bulk.execute();
    
    

    Max time per query


    
    // expensive query with regex without anchor
    db.articles.find( { "description": /August [0-9]+ 1969/ } ).maxTimeMS(30000)
    
    // it also works with aggregation framework!
    db.articles.aggregate([ { 
        "$match": { 
            "$text": { 
                "$search": "chien", 
                "$language": "fr" 
            }
        }
    }]).maxTimeMS(100);
    
    

    Building an index



    • Foreground
    • Background

    building an index on the foreground



    • Faster
    • More compact
    • Blocking operation

    
    // foreground by default
    db.beers.ensureIndex( { name: 1} );
    
    

    Building an index on the background


    • Slower
    • Sparser
    • Non-blocking operation

    
    // background is an optional argument
    db.beers.ensureIndex( { name: 1}, { background: true } );
    
    

    index build in 2.4


    FG in primary -> FG in secondary
    BG in primary -> FG in secondary

    index build in 2.6

    FG in primary -> FG in primary
    BG in primary -> BG in secondary

    Storage allocation


    • usePowerOf2Sizes will be the default allocation method for new collections



    Sharding new commands



    • mergeChunks
    • cleanupOrphaned


    Text search - GA

    ...Text Search was there in 2.4



    Integrated within find


    
    db.articles.ensureIndex( { body: "text" } );
    
    db.articles.insert(
        { body: "the quick brown fox jumped over the lazy dog" }
    );
    
    db.articles.find( { "$text" : { $search : "quickly" } } );
    
    

    INTEGRATED WITHIN AF


    
    db.articles.aggregate([
        { "$match": { "$text": { $search: "bRoWN"} } }
    ]);
    
    

    playing with texts


    
    // search for a single word
    db.articles.find( { "$text": { "$search": "coffee" } } );
    
    // search for any of these words
    db.articles.find( { "$text": { "$search": "bake coffee cake" } } );
    
    // search for a phrase
    db.articles.find( { "$text": { "$search": "\"coffee cake\"" } } );
    
    // excluding some terms
    db.articles.find( { "$text": { "$search": "bake coffee -cake" } } );
    
    

    Top 3 relevant documents


    
    db.articles.find(
       { "$text": { "$search": "cake" } },
       { "score": { "$meta": "textScore" } }
    ).sort( { "score": { "$meta": "textScore" } } ).limit(3)
    
    

    language support


    
    db.articles.find(
       { "$text": { "$search": "leche", $language: "es" } }
    );
    
    

    language support



    Query system 

    improvements

    Aggregation Framework



    • Returns a cursor
    • Can output to a collection
    • New operators
    • explain()

    Aggregation Framework


    
    // using a cursor
    db.beers.aggregate([
            { "$match": { barrels: { "$gte": 10000}} }
        ],
        { cursor: { batchSize: 1 } }
    );
    
    // output to a collection
    db.beers.aggregate([
        { "$match" : { barrels : { "$gte": 10000}} },
        { "$out" : "my_output_collection" }
    ]);
    
    

    AGGREGAtion framework - explain


    
    db.beers.aggregate(
        [
            { "$match": { barrels: { "$gte": 500 } } },
            { "$group": { "_id": "$type", count: { "$sum":1 } } }
        ],
        { explain: true }
    );
    
    

    aggregation framework - explain


    
    {"stages" : [ // one entry per pipeline stage
            {
            "$cursor" : {
                "query" : { "barrels" : { "$gte" : 500 } },
                "fields" : { "type" : 1, "_id" : 0 },
                "plan" : {
                    "cursor" : "BtreeCursor barrels_1",
                    "isMultiKey" : false,
                    "scanAndOrder" : false,
                    "indexBounds" : { "barrels" : [ [500, Infinity] ] }
                }
            }
        ],
     "ok" : 1
    }
    

    update operators - mul


    
    db.beers.insert(
        { _id: 1, name: "Lagunitas", price: 10.99 }
    );
    
    
    // to increase the price by 20%
    db.beers.update(
        { _id: 1 },
        { "$mul" : { price: 1.2 } }
    );
    
    

    UPDATE operators - bit


    
    db.beers.update(
        {_id: 1},
        { "$bit": { mask: { and: NumberInt(10) } } }
    );
    
    db.beers.update(
        {_id: 1},
        { "$bit": { mask: { or: NumberInt(10) } } }
    );
    
    db.beers.update(
        {_id: 1},
        { "$bit": { mask: { xor: NumberInt(10) } } }
    );
    
    

    update operators - min/max


    
    db.scores.insert(
        { _id: 1, low_score: 200, high_score: 400 }
    );
    
    db.scores.update(
        { _id: 1 },
        { "$min": { low_score: 250 } }
    );
    
    db.scores.update(
        { _id: 1 },
        { "$max": { high_score: 450 } }
    );
    
    

    Index intersection


    
    // index creation
    db.beers.ensureIndex( { barrels: 1 } );
    db.beers.ensureIndex( { alcohol: 1 } );
    
    // retrieval
    db.beers.find( { barrels: { "$gte": 100 }, alcohol: { "$gte": 5.5 } } );
    
    

    index intersection



    • Less Index Maintenance 
    • Smaller Working Set
    • Lower Write Overhead
    • More Adaptive

    query optimizer - new concepts


    • Query Shape
    • Plan Cache

    what is a query shape?


    
    db.beers.find(
        { barrels: { "$gte": 300 } },
        { _id: -1, name: 1, barrels: 1}
    ).sort( { alcohol: -1 } );
    
    db.runCommand( { planCacheListQueryShapes: "beers"});
    {
        "shapes" : [
            {
                "query" : { barrels: { "$gte": 300 } },
                "sort" : { alcohol: -1 },
                "projection" : { _id: -1, name: 1, barrels: 1}
            }
        ]
    }
    

    ... let's create a plan cache


    
    db.runCommand({
        "planCacheSetFilter": "beers",
        "query" : { barrels: { "$gte": 300 } },
        "sort" : { alcohol: -1 },
        "projection" : { _id: -1, name: 1, barrels: 1}
        "indexes": [
            { barrels: 1 },
            { alcohol: -1, barrels: 1 }
        ],
    });
    

    what do i get?


    • Better control on which indexes are going to be evaluated
    • Ability to predefine a set of candidate indexes
    • ... but still is an empiric query optimizer

    Geospatial enhancements


    • Added support for multipart geometries
      • MultiPoint
      • MultiLineString
      • MultiPolygon
      • GeometryCollection

    GEOSPATIAL ENHANCEMENTS




    Security

    Authentication



    • LDAP (Enterprise)
    • x509

    Authorization



    • User defined roles

    creating new roles


    
    db.createRole({
        role: "MMSMonitoringRole",
        roles: ["clusterAdmin", "readAnyDatabase"]
    });
    
    db.createRole({
        role: "MMSBackupRole",
        roles: ["clusterAdmin", "readAnyDatabase", "userAdminAnyDatabase"]
    });
    
    

    using my new roles


    
    db.addUser({
        "user": "mms-monitoring",
        "pwd": "abcd1234",
        "roles": [
            "MMSMonitoringRole"
        ]
    });
    
    db.addUser({
        "user": "mms-backup",
        "pwd": "efgh5678",
        "roles": [
            "MMSBackupRole"
        ]
    });
    

     creating custom privileges


    
    db.createRole({
      "role": "appUser",
      "db": "myApp",
      "privileges": [
            {
                "resource": { "db": "myApp" , "collection": "" },
                "actions": [ "find", "dbStats", "collStats" ]
            },
            {
                "resource": { "db": "myApp", "collection": "beers" },
                "actions": [ "insert"]
            }
      ]
    };
    

    Auditing



    • Schema actions
    • Replica Set actions
    • Authentication & Authorization actions
    • Other actions

      output


      
          mongod --dbpath data/db --auditDestination syslog
      
      

      • syslog
      • console
      • JSON/BSON file

      dropping a collection


      
      var auditEntry = {
          "atype" : "dropCollection",
          "ts" : { "$date" : "2014-04-08T16:48:34.333+1000" },
          "local" : { "ip" : "127.0.0.1", "port" : 27017 },
          "remote" : { "ip" : "127.0.0.1", "port" : 55771 },
          "users" : [
              { "user": "matias", "db": "test" }
          ],
          "param" : { "ns" : "test.beers" },
          "result" : 0
      };
      
      
      

      shutting down the server


      
      var auditEntry = {
          "atype" : "shutdown",
          "ts" : { "$date" : "2014-04-08T16:54:24.373+1000" },
          "local" : { "ip" : "127.0.0.1", "port" : 27017 },
          "remote" : { "ip" : "127.0.0.1", "port" : 55771 },
          "users" : [
              { "user": "matias", "db": "admin" }
          ],
          "param" : {},
          "result" : 0
      };
      
      


      MMS & automation

      MMS


      • Monitoring and backup service
      • Cloud-based and on-premise
      • Easy to setup

      cloud numbers


      • Monitoring: 75K updates/sec
      • Backup: 100 GB/hr of new data

      monitoring



      monitoring


      monitoring


      alerts


      BACKUP


      • Backup a replica set or sharded cluster
      • Initial sync + incremental
      • Generated snapshots every 6 hs
      • Restore via HTTPS or SCP
      • Restore replica sets to point-in-time (last 24hs)
      • Restore sharded clusters to any 15 minute (last 24hs)

      BACKUP


      BACKUP


      automation


      • Provision, create, upgrade and maintain MongoDB deployments
      • Hide complex stuff, just use your browser
      • Initial supported platforms: AWS and OpenStack
      • Alpha/Beta stage

      what can i do?


      • Create your deployment (replica set or sharded)
      • Add/remove shards and replica sets
      • Resize oplog
      • Specify users and roles
      • Provisioning new machines (only in AWS)

      provisioning new servers



      creating your replica set


      creating your cluster

       

      50 shards, one click

      See it in action..



      https://www.youtube.com/watch?v=nSJiVXNsPHk&feature=youtu.be


      THE END

      Questions?





      http://slid.es/mcascallares/mongodb-sa-road-trip  
      Made with Slides.com