MongoDB Basics
Banana's edition
Agenda
- Overview
- CRUD Operations
- Aggregation Framework
- Indexes
- Lessons learned
Overview
What is MongoDB?
Why MongoDB?
- Document Oriented Storage
- Index on any attribute
- Replication and high availability
- Auto-sharding
- Rich queries
- Fast in-place updates
- Professional support by MongoDB
Terminology and Concepts
Teorema CAP
ACID vs BASE
CRUD Operations
Create Operations
- db.collection.insertOne()
- db.collection.insertMany()
Read Operations
- db.collection.find()
- db.collection.findOne()
Read Operations - findOne
db.getCollection('users')
.findOne({username: 'kimkardashian'})
############################ VS ############################
db.getCollection('users')
.find({username: 'kimkardashian'}).limit(1)
Read Operations- find
# state = California AND user_type = celebrity
db.users.find(
{state: 'California', user_type: 'celebrity'}
)
# select all - show only username and followers_count
db.users.find(
{},
{username: 1, followers_count: 1}
)
# select all - show all fields except _id and user_id
db.useres.find(
{},
{user_id : 0, _id: 0}
)
Read Operations - Query criteria
########### $gt
# users with more than 1m of followers
db.users.find({followers_count: {$gt: 1000000}})
########### $in
# users related with zara
db.users.find({related_brands: {$in: ['zara']}})
Comparison query operators
$eq, $neq, $gt, $gte, $lt, $lte, $in, $nin
############ exists
# users with the "state" field
db.users.find({state: {$exists: 1}})
############ type
# users with text in the biography
db.users.find({biography: {$type: 2}})
Read Operations - Query criteria
Element query operators (exists, type)
############ nor
# documents that:
# price field not equal to 1.99 and sale field not equal to true
# or price field not equal to 1.99 but do not contain the sale field
# or do not contain the price field but sale field not equal to true
# or do not contain the price field and do not contain the sale field
db.inventory.find(
{
$nor: [
{ price: 1.99 },
{ sale: true } ]
}
)
Read Operations - Query criteria
Logical query operators (or, and, not, nor)
Read Operations - Query criteria
Logical query operators
$and ????
Read Operations - Query criteria
Logical query operators
$and ????
# users with # followers between 1M and 2M
db.users.find(
{
followers_count: {$gt: 1000000},
followers_count: {$lt: 2000000}
})
db.users.find(
{
$and: [
{followers_count: {$gt: 100000}},
{followers_count: {$lt: 200000}}
]
})
Read Operations - Query criteria
Logical query operators
$and ????
# brands with more than 1M of followers
# or celebrities with more than 200k followers
db.getCollection('users_copy').find(
{$or:
[
{$and: [
{user_type: 'brand'},
{followers_count: {$gte: 1000000}}
]},
{$and: [
{user_type: 'celebrity'},
{followers_count: {$gte: 200000}}
]}
]
})
Read Operations - Query criteria
Array query operators
############# all
db.users.find(
{related_brands:
{$all: ['zara', 'mango']}
}
)
############# in
db.users.find(
{related_brands:
{$in: ['zara', 'mango']}
}
)
############# size
db.users.find({related_brands: {$size: 5}})
Read Operations - Cursor modifiers
############# count
db.users.find({country:"USA"}).count()
############# limit
db.users.find({country:"USA"}).limit(5)
############# sort
db.users.find({country:"USA"})
.sort({'username':1, followers_count: -1})
############# pretty
b.users.find({username: 'kimkardashian'}).pretty()
The order of keys matters in MongoDB
Read Operations
db.books.find().pretty()
Read Operations
The order of keys matters in MongoDB
{
"_id" : ObjectId("58b4af99245cbeab4ab25b51"),
"title" : "Woe from Wit",
"meta" : {
"author" : "A. Griboyedov",
"year" : 1823
}
}
The order of keys matters in MongoDB
db.books.find(
{ meta:
{ year: 1823, author: "A. Griboyedov" }
}
).count()
db.books.find(
{ meta:
{ author: "A. Griboyedov", year: 1823 }
}
).count()
0
1
Read Operations
Update Operations
- db.collection.updateOne()
- db.collection.updateMany()
- db.collection.replaceOne()
Update Operations - upsert
db.people.updateMany(
{ name: "Andy" },
{
name: "Andy",
rating: 1,
score: 1
},
{ upsert: true }
)
If set to true, creates a new document when no document matches the query criteria.
Create/Update Operations - save
Updates an existing document or inserts a new document, depending on its document parameter.
db.products.save( { item: "book", qty: 40 } )
> WriteResult({ "nInserted" : 1 })
db.inventory.updateMany(
{ _id: 1 },
{ $addToSet: { tags: ["accessories", "camera"] } }
)
{ _id: 1, item: "polarizing_filter", tags: [ "electronics", "camera" ] }
db.inventory.updateMany(
{ _id: 2 },
{ $addToSet: {
tags: {
$each: [ "camera", "electronics", "accessories" ] }
}
}
)
{ "_id" : 1, "item" : "polarizing_filter",
"tags" : [ "electronics", "camera", [ "accessories", "camera" ] ] }
{ "_id" : 1, "item" : "polarizing_filter",
"tags" : [ "electronics", "camera", "accessories" ] }
Update Operations - arrays
Delete Operations
db.characters.bulkWrite(
[
{ insertOne :
{
"document" :
{
"_id" : 4, "char" : "Dithras", "class" : "barbarian", "lvl" : 4
}
}
},
{ insertOne :
{
"document" :
{
"_id" : 5, "char" : "Taeln", "class" : "fighter", "lvl" : 3
}
}
},
{ updateOne :
{
"filter" : { "char" : "Eldon" },
"update" : { $set : { "status" : "Critical Injury" } }
}
},
{ deleteOne :
{ "filter" : { "char" : "Brisbane"} }
},
{ replaceOne :
{
"filter" : { "char" : "Meldane" },
"replacement" : { "char" : "Tanys", "class" : "oracle", "lvl" : 4 }
}
}
]
);
Bulk Write Operations
{
"acknowledged" : true,
"deletedCount" : 1,
"insertedCount" : 2,
"matchedCount" : 2,
"upsertedCount" : 0,
"insertedIds" : {
"0" : 4,
"1" : 5
},
"upsertedIds" : {
}
}
Bulk Write Operations
Bulk Write Operations
- Ordered list of operations: executes the operations serially.
- Unordered list of operations: can execute the operations in parallel, but is not guaranteed.
- By default, bulkWrite() performs ordered operations.
To specify unordered write operations, setordered : false in the options document.
Aggregation
Aggregation Pipeline
Aggregation Expressions
- Data is processed in RAM
-
You can use indexes to improve performance in the various stages. In addition, it has an internal optimization phase.
-
It runs in native code written in C ++
- "Flexible, functional and easy to learn"
db.people.aggregate(
{
$match: {
isActive: true
}
},
{
$project: {
isActive: 1,
name: 1,
mainAddress: "$address.primary"
}
});
Aggregation Expressions
db.people.aggregate(
{
$group: {
_id: {
gender: "$gender"
},
averageAge: {
$avg: "$age"
},
count: {
$sum: 1
}
}
});
Aggregation Expressions
db.inventory.aggregate( [ { $unwind : "$sizes" } ] )
Aggregation Expressions - $unwind
{ "_id" : 1, "item" : "ABC1", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "L" }
{ "_id" : 1, "item" : "ABC1", sizes: [ "S", "M", "L"] }
{ "_id" : 1, "item" : "abc", "price" : 12, "quantity" : 2 }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1 }
{ "_id" : 3 }
Aggregation Expressions - $lookup
{ "_id" : 1, "sku" : "abc", description: "product 1", "instock" : 120 }
{ "_id" : 2, "sku" : "def", description: "product 2", "instock" : 80 }
{ "_id" : 3, "sku" : "ijk", description: "product 3", "instock" : 60 }
{ "_id" : 4, "sku" : "jkl", description: "product 4", "instock" : 70 }
{ "_id" : 5, "sku": null, description: "Incomplete" }
{ "_id" : 6 }
Orders
Inventory
db.orders.aggregate([
{
$lookup:
{
from: "inventory",
localField: "item",
foreignField: "sku",
as: "inventory_docs"
}
}
])
Aggregation Expressions - $lookup
Aggregation Expressions - $lookup
{
"_id" : 1,
"item" : "abc",
"price" : 12,
"quantity" : 2,
"inventory_docs" : [
{ "_id" : 1, "sku" : "abc", description: "product 1", "instock" : 120 }
]
}
{
"_id" : 2,
"item" : "jkl",
"price" : 20,
"quantity" : 1,
"inventory_docs" : [
{ "_id" : 4, "sku" : "jkl", "description" : "product 4", "instock" : 70 }
]
}
{
"_id" : 3,
"inventory_docs" : [
{ "_id" : 5, "sku" : null, "description" : "Incomplete" },
{ "_id" : 6 }
]
}
Map Reduce
While the custom JavaScript provide great flexibility compared to the aggregation pipeline, in general, map-reduce is less efficient and more complex than the aggregation pipeline.
Map Reduce
Map Reduce - Example
{
_id: ObjectId("50a8240b927d5d8b5891743c"),
cust_id: "abc123",
ord_date: new Date("Oct 04, 2012"),
status: 'A',
price: 25,
items: [ { sku: "mmm", qty: 5, price: 2.5 },
{ sku: "nnn", qty: 5, price: 2.5 } ]
}
Return the Total Price Per Customer
Map Reduce - Example
Return the Total Price Per Customer
# 1- Define the map function to process each input document:
var mapFunction1 = function() {
emit(this.cust_id, this.price);
};
# 2- Define the reduce function:
var reduceFunction1 = function(keyCustId, valuesPrices) {
return Array.sum(valuesPrices);
};
# 3- Perform the map-reduce on all documents
db.orders.mapReduce(
mapFunction1,
reduceFunction1,
{ out: "map_reduce_example" }
)
Indexes
Before creating indexes
-
Understand your data
-
How your queries will be
-
Create as few indexes as possible
-
Put your highly selective fields first in the index
-
Check your current indexes before creating new ones
Create index
B-Tree
Index types
Single Field
{
"_id": ObjectId("570c04a4ad233577f97dc459"),
"score": 1034,
"location": { state: "NY", city: "New York" }
}
db.records.createIndex( { score: 1 } )
db.records.find( { score: 2 } )
db.records.find( { score: { $gt: 10 } } )
The sort order (i.e. ascending or descending) of the index key does not matter
Index types
Compound Index
{
"_id": ObjectId(...),
"item": "Banana",
"category": ["food", "produce", "grocery"],
"location": "4th Street Store",
"stock": 4,
"type": "cases"
}
db.products.createIndex( { "item": 1, "stock": 1 } )
db.products.find( { item: "Banana" } )
db.products.find( { item: "Banana", stock: { $gt: 5 } } )
The order of fields has significance
Index types
Multikey Index: index the content stored in arrays
{ _id: 5, type: "food", item: "aaa", ratings: [ 5, 8, 9 ] }
{ _id: 6, type: "food", item: "bbb", ratings: [ 5, 9 ] }
{ _id: 7, type: "food", item: "ccc", ratings: [ 9, 5, 8 ] }
{ _id: 8, type: "food", item: "ddd", ratings: [ 9, 5 ] }
{ _id: 9, type: "food", item: "eee", ratings: [ 5, 9, 5 ] }
db.inventory.createIndex( { ratings: 1 } )
db.inventory.find( { ratings: [ 5, 9 ] } )
Index types
Text Indexes: supports searching for string content in a collection
- A collection can have at most one text index
- You can index multiple fields for the text index
- Starting in MongoDB 3.2, MongoDB introduces a version 3 of the text index (improved case insensitivity, diacritic insensitivity & additional delimiters for tokenization)
Index types
Text Indexes - example
db.articles.insert(
[
{ _id: 1, subject: "coffee", author: "xyz", views: 50 },
{ _id: 2, subject: "Coffee Shopping", author: "efg", views: 5 },
{ _id: 3, subject: "Baking a cake", author: "abc", views: 90 },
{ _id: 4, subject: "baking", author: "xyz", views: 100 },
{ _id: 5, subject: "Café Con Leche", author: "abc", views: 200 },
{ _id: 6, subject: "Сырники", author: "jkl", views: 80 },
{ _id: 7, subject: "coffee and cream", author: "efg", views: 10 },
{ _id: 8, subject: "Cafe con Leche", author: "xyz", views: 10 }
]
)
db.articles.createIndex( { subject: "text" } )
Index types
Text Indexes - example
# Search string of coffee
db.articles.find( { $text: { $search: "coffee" } } )
# Search string of three terms delimited by space, "bake coffee cake"
db.articles.find( { $text: { $search: "bake coffee cake" } } )
# Searches for the phrase coffee shop
db.articles.find( { $text: { $search: "\"coffee shop\"" } } )
# Searches for documents that contain the words coffee
# but do not contain the term shop
db.articles.find( { $text: { $search: "coffee -shop" } } )
# The following query specifies es, i.e. Spanish, as the language that
# determines the tokenization, stemming, and stop words
db.articles.find(
{ $text: { $search: "leche", $language: "es" } }
)
# Case sensitive search for the phrase Café Con Leche
db.articles.find( {
$text: { $search: "\"Café Con Leche\"", $caseSensitive: true }
} )
# Diacritic sensitive text search on the term CAFÉ,
# or more precisely the stemmed version of the word
db.articles.find( { $text: { $search: "CAFÉ", $diacriticSensitive: true } } )
Index Properties
Sparse Indexes
Only contain entries for documents that have the indexed field, even if the index field contains a null value
db.scores.createIndex(
{ score: 1 } ,
{ sparse: true }
)
Index Properties
Sparse Indexes
# is using the index...
db.scores.find( { score: { $lt: 90 } } )
# is not using the index...
db.scores.find().sort( { score: -1 } )
# you can explicitly specify the index...
db.scores.find().sort( { score: -1 } ).hint( { score: 1 } )
db.scores.createIndex(
{ score: 1 } ,
{ sparse: true }
)
Index Properties
Partial Indexes (New in version 3.2)
Only index the documents in a collection that meet a specified filter expression
db.restaurants.createIndex(
{ cuisine: 1, name: 1 },
{ partialFilterExpression: { rating: { $gt: 5 } } }
)
Index Properties
Partial Indexes (New in version 3.2)
db.restaurants.createIndex(
{ cuisine: 1, name: 1 },
{ partialFilterExpression: { rating: { $gt: 5 } } }
)
# is using the index...
db.restaurants.find( { cuisine: "Italian", rating: { $gte: 8 } } )
# is not using the index...
db.restaurants.find( { cuisine: "Italian", rating: { $lt: 8 } } )
db.restaurants.find( { cuisine: "Italian" } )
Partial indexes offer a superset of the functionality of sparse indexes. If you are using MongoDB 3.2 or later, partial indexes should be preferred over sparse indexes.
Index Properties
Unique Indexes
Reject duplicate values for the indexed field
db.members.createIndex(
{ groupNumber: 1, lastname: 1, firstname: 1 },
{ unique: true }
)
Unique Partial Indexes - New in version 3.2.
If you specify both the partialFilterExpression and a unique constraint, the unique constraint only applies to the documents that meet the filter expression.
Index Properties
TTL Indexes (Time-To-Live)
Automatically remove documents from a collection after a certain amount of time or at a specific clock time
db.eventlog.createIndex(
{ "lastModifiedDate": 1 },
{ expireAfterSeconds: 3600 }
)
Covered Query
An index covers a query when both of the following apply:
- all the fields in the query are part of an index, and
- all the fields returned in the results are in the same index
# index
db.inventory.createIndex( { type: 1, item: 1 } )
# cover query
db.inventory.find(
{ type: "food", item:/^c/ },
{ item: 1, _id: 0 }
)
It doesn't work with multi-key and field-embedded indexes
Explain
db.collection.find().explain()
Verbose options:
- queryPlanner: default
- executionStats: stats for the winner plan
- allPlansExecution: stats for all candidate plans
Explain
# without index...
db.getCollection('posts').find(
{username: 'kimkardashian', timestamp: {$gte: 1483228800}}
).explain('executionStats')
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "instagram.posts",
[...]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 49,
"executionTimeMillis" : 2956,
"totalKeysExamined" : 0,
"totalDocsExamined" : 2644172,
[...]
},
}
Explain
# with 2 simple index...
db.getCollection('posts').createIndex({username: 1})
db.getCollection('posts').createIndex({timestamp: -1})
# query
db.getCollection('posts').find(
{username: 'kimkardashian', timestamp: {$gte: 1483228800}}
).explain('executionStats')
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "instagram.posts",
[...]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 49,
"executionTimeMillis" : 38,
"totalKeysExamined" : 3670,
"totalDocsExamined" : 3670,
[...]
},
}
Explain
# with a compound index...
db.getCollection('posts').createIndex({username: 1, timestamp: -1})
# query
db.getCollection('posts').find(
{username: 'kimkardashian', timestamp: {$gte: 1483228800}}
).explain('executionStats')
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "instagram.posts",
[...]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 49,
"executionTimeMillis" : 124,
"totalKeysExamined" : 49,
"totalDocsExamined" : 49,
[...]
},
}
Explain
# with a compound index...
db.getCollection('posts').createIndex({username: 1, timestamp: -1})
# with a covered query
db.getCollection('posts').find(
{username: 'kimkardashian', timestamp: {$gte: 1483228800}},
{username: 1, timestamp: 1}
).explain('executionStats')
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "instagram.posts",
[...]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 49,
"executionTimeMillis" : 0,
"totalKeysExamined" : 49,
"totalDocsExamined" : 49,
[...]
},
}
Not Indexable Queries
Lessons learned
"sometimes you win,
sometimes you learn"
The document model - very important!!
The document model - very important!!
MongoDB as attribute of PostgreSQL...
MongoDB in production needs maintenance tasks
Thank you!
MongoDB Basics
By aliciapj
MongoDB Basics
- 1,719