From 0 to Blog with Spring Boot and MongoDB
Drone racing hobbyist
A NoSQL database is a non (strictly) SQL database.
It provides a mechanism for the storage and retrieval of data that is not modeled in the tabular form used by traditional RDBMSs. [wiki]
NoSQL databases have existed since the 1960s, but they went by other names back then.
Only in 2009 did they get the name NoSQL, because the organizers needed a short hashtag to tweet about the first meetup in SF. Nobody expected it to become the next big DB movement!
Big Data: a term for data sets so large that traditional methods of storage and processing are inadequate
Massive increase in data volume within the last decade
Social networks, search engines, etc.
Challenges in storage, capture, analysis, transfer, etc.
Stuff was getting slower. Upgrading hardware (scaling up) was not enough anymore
Big players said: «We need to change; I will develop my own data storage!»
Scaling up is expensive, so many companies moved towards scaling out on "many small clusters"
Different NoSQL DBs serve different purposes, but they share common traits:
non relational
open-source
cluster-friendly
schema-less
If designed correctly, handles Big Data successfully
Extremely flexible data models (schema-less) => but you need to handle the consistency of your data in the application layer
Cheap(er) to manage
Easy scaling
Schema migration tools are optional: Flyway, fly away!
Relational Databases are better for relational data! (duh!)
Normalization (e.g. 3NF) => eliminates redundancy, less space required, faster access in some cases
SQL is a well-known language (it's been around for 30 years)
Data integrity => foreign key constraints help reduce inconsistency
ACID transactions (Atomicity, Consistency, Isolation, Durability)
MongoDB is a document-based DB: every model, query and piece of data stored in MongoDB is a BSON document, a binary extension of JSON (JavaScript Object Notation).
Can you tell how many tables a normalized relational database would need to store the same information as the single document below?
« In a nutshell, MongoDB allows developers to design data models that support common data access patterns.»
{
  "title": "Game of Thrones",
  "producers": ["Television 360", "Grok! Television", "Generator Entertainment", ...],
  "createdBy": ["David Benioff", "D. B. Weiss"],
  "firstSeasonReleased": 2011,
  "locations": [{"country": "Northern Ireland", "city": "...", "dateShooted": [{...}, ...]}],
  "website": "https://www.hbo.com/game-of-thrones",
  "characters": ["...", "..."],
  "seasons": [{"ordinal": 1, "newCharacters": [], "plot": ""}, ...]
}
tv series
All documents in MongoDB must have an _id field, which acts as the primary key and must be unique within the collection.
If you don't specify one when inserting, MongoDB creates one for you as an ObjectId(), a 12-byte value generated so that collisions are extremely unlikely.
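To make the ObjectId less magical, here is a minimal sketch of its 12-byte layout, assuming the format described in the MongoDB documentation: a 4-byte timestamp in seconds since the Unix epoch, a 5-byte random value chosen once per process, and a 3-byte counter starting at a random value. The class `ObjectIdSketch` is hypothetical; in practice the driver generates ids for you.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of the ObjectId layout; not a driver class.
final class ObjectIdSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    // 5 random bytes chosen once per process.
    private static final byte[] PROCESS_UNIQUE = new byte[5];
    // Counter starts at a random value; only its low 24 bits are used.
    private static final AtomicInteger COUNTER = new AtomicInteger(RANDOM.nextInt(1 << 24));

    static {
        RANDOM.nextBytes(PROCESS_UNIQUE);
    }

    /** Builds a new id as 12 raw bytes for the given creation time. */
    static byte[] next(long epochSeconds) {
        ByteBuffer buf = ByteBuffer.allocate(12);   // big-endian by default
        buf.putInt((int) epochSeconds);             // bytes 0-3: timestamp
        buf.put(PROCESS_UNIQUE);                    // bytes 4-8: per-process random value
        int count = COUNTER.getAndIncrement() & 0xFFFFFF;
        buf.put((byte) (count >>> 16));             // bytes 9-11: counter, big-endian
        buf.put((byte) (count >>> 8));
        buf.put((byte) count);
        return buf.array();
    }

    /** Hex form, as shown in ObjectId("..."). */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Reads the creation time back out of the first 4 bytes (8 hex digits). */
    static long timestampOf(String hex) {
        return Long.parseLong(hex.substring(0, 8), 16);
    }

    public static void main(String[] args) {
        String id = toHex(next(Instant.now().getEpochSecond()));
        System.out.println(id + " created at " + Instant.ofEpochSecond(timestampOf(id)));
    }
}
```

Applied to the _id of the sample blog post later in this deck ("59e336472687c8737a5c2735"), `timestampOf` returns 1508062791, i.e. 2017-10-15T10:19:51Z, the same second as that post's "date" field.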
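JSON value types are limited to strings, numbers, booleans, null, objects and arrays: e.g. there is no date or binary type.
BSON, on the other hand, is a binary-encoded serialization of JSON-like documents: it is designed to be lightweight, traversable and efficient, and it adds types such as ObjectId, Date, 32- and 64-bit integers and binary data.
See details at bsonspec.org/spec.html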
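To see what "binary-encoded" means in practice, here is a minimal encoder sketch that follows the layout in the BSON spec for just two element types: 0x02 (UTF-8 string) and 0x10 (32-bit integer). A document is an int32 total length, the elements, and a trailing 0x00; every integer is little-endian. The class `MiniBson` is hypothetical and deliberately incomplete; real applications use the driver's codecs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, partial BSON encoder: strings and int32 values only.
final class MiniBson {

    static byte[] encode(Map<String, Object> doc) {
        ByteArrayOutputStream elements = new ByteArrayOutputStream();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object v = e.getValue();
            if (v instanceof String) {
                elements.write(0x02);                          // element type: string
                writeCString(elements, e.getKey());            // field name, 0x00-terminated
                byte[] utf8 = ((String) v).getBytes(StandardCharsets.UTF_8);
                writeInt32(elements, utf8.length + 1);         // string length includes its 0x00
                elements.write(utf8, 0, utf8.length);
                elements.write(0x00);
            } else if (v instanceof Integer) {
                elements.write(0x10);                          // element type: int32
                writeCString(elements, e.getKey());
                writeInt32(elements, (Integer) v);
            } else {
                throw new IllegalArgumentException("unsupported type for field " + e.getKey());
            }
        }
        byte[] body = elements.toByteArray();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt32(out, 4 + body.length + 1);                  // total size incl. prefix and terminator
        out.write(body, 0, body.length);
        out.write(0x00);                                       // end of document
        return out.toByteArray();
    }

    private static void writeCString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
        out.write(0x00);
    }

    private static void writeInt32(ByteArrayOutputStream out, int value) {
        byte[] b = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("hello", "world");
        System.out.println(encode(doc).length + " bytes");
    }
}
```

The length prefixes are what make BSON "traversable": a reader can skip over a field or a whole embedded document without parsing its contents.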
DDL in Mongo is less strict than SQL. Some DML statements in comparison:
Create (DDL):
SQL:
create table users(user_id varchar2(10), name varchar2(10), age number)
Mongo:
db.createCollection("users"); [optional: a collection is also created implicitly on the first insert]
Read:
SQL:
SELECT * FROM users WHERE age != 23
Mongo:
db.users.find({ age: { $ne: 23 } })
Update:
SQL:
UPDATE customers
SET customerId = 1
WHERE customerId = 0
Mongo:
db.customers.update({ customerId: 0 },
{ $set: { customerId: 1 } }, { multi: true })
Delete:
SQL:
DELETE FROM users WHERE id = 6
Mongo:
db.users.remove({ id: 6 })
REPL!
MongoDB's query language uses a query-by-example strategy: let's take a look!
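Query-by-example means the filter is itself a document shaped like the documents you want back. The sketch below is a hypothetical in-memory matcher (`QueryByExample` is not a MongoDB class) that supports plain equality plus the `$ne`, `$gt` and `$lt` operators, so `{ age: { $ne: 23 } }` works as in the shell comparison above. Like MongoDB's `$ne`, it also matches documents that lack the field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory query-by-example: every filter field must match.
final class QueryByExample {

    static boolean matches(Map<String, Object> doc, Map<String, Object> filter) {
        for (Map.Entry<String, Object> clause : filter.entrySet()) {
            Object actual = doc.get(clause.getKey());      // null if the field is missing
            Object expected = clause.getValue();
            if (expected instanceof Map) {
                // Operator form, e.g. { age: { $ne: 23 } }
                @SuppressWarnings("unchecked")
                Map<String, Object> ops = (Map<String, Object>) expected;
                for (Map.Entry<String, Object> op : ops.entrySet()) {
                    if (!applyOperator(op.getKey(), actual, op.getValue())) {
                        return false;
                    }
                }
            } else if (!Objects.equals(actual, expected)) {
                return false;                               // plain equality, e.g. { name: "Ragnar" }
            }
        }
        return true;
    }

    private static boolean applyOperator(String op, Object actual, Object operand) {
        switch (op) {
            case "$ne":
                return !Objects.equals(actual, operand);    // also true when the field is missing
            case "$gt":
                return bothNumbers(actual, operand)
                        && ((Number) actual).doubleValue() > ((Number) operand).doubleValue();
            case "$lt":
                return bothNumbers(actual, operand)
                        && ((Number) actual).doubleValue() < ((Number) operand).doubleValue();
            default:
                throw new IllegalArgumentException("unsupported operator " + op);
        }
    }

    private static boolean bothNumbers(Object a, Object b) {
        return a instanceof Number && b instanceof Number;
    }

    /** Returns the documents matching the filter, in collection order. */
    static List<Map<String, Object>> find(List<Map<String, Object>> collection, Map<String, Object> filter) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> doc : collection) {
            if (matches(doc, filter)) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> users = List.of(
                Map.of("name", "Ragnar", "age", 23),
                Map.of("name", "Tony", "age", 30));
        // Same idea as db.users.find({ age: { $ne: 23 } })
        Map<String, Object> filter = Map.of("age", Map.of("$ne", 23));
        System.out.println(find(users, filter));
    }
}
```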
Besides Document(), there is also a type-safe object called BsonDocument().
It requires the explicit field type when declaring a document => avoids runtime exceptions for type mismatches.
you@yourmachine:~$ git clone https://github.com/tonycatapano/Bloggo.git
(or via SSH: you@yourmachine:~$ git clone git@github.com:tonycatapano/Bloggo.git)
you@yourmachine:~$ cd Bloggo
you@yourmachine:~$ git checkout exercise0
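// Assumed library: MongoJack (org.mongojack), which provides these annotations and DBRef
import java.util.List;
import org.mongojack.DBRef;
import org.mongojack.Id;
import org.mongojack.MongoCollection;
import org.mongojack.ObjectId;

// Comments live in their own "comments" collection
@MongoCollection(name = "comments")
public class Comment {
    @Id
    public String id;
    public String text;
}

// A post links to its comments through DBRefs instead of embedding them
public class BlogPost {
    @Id
    public String id;
    @ObjectId
    public List<DBRef<Comment, String>> comments;
}

// Each fetch() queries the comments collection to load the referenced comment
BlogPost post = coll.findOneById(someId);
for (DBRef<Comment, String> comment : post.comments) {
    System.out.println(comment.fetch().text);
}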
DAO Layer
Demo!
Starting situation:
Goals of normalization:
When designing with Mongo you have to think outside the box: this is not relational.
There is always a tradeoff between linking and embedding.
Decision based on:
{
_id: 20,
name: "Ragnar",
resume: 1
}
{
_id:1,
jobs:[],
education:[],
employee:20
}
Employee
Resume
Considering:
{
_id: 20,
name: "Ragnar",
resume: {
jobs:[],
education:[]
}
}
{
_id:1,
jobs:[],
education:[],
employee: {
name: "Ragnar"
}
}
{
_id: 20,
name: "Ragnar",
...
}
{
_id:1,
name: "NYC",
inhabitants:[1,2,3],
...
}
Person
City
Solution: true linking
{
_id: 20,
name: "Ragnar",
city: {name: "NYC",
zipCode: 123,
population: 8538000},
...
}
{
_id: "1",
name: "Ragnar",
city: "NYC",
...
}
{
_id:"NYC",
zipCode: 123,
population: 8538000
}
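With true linking, each person stores only the city's _id and the application resolves it with a second lookup. A minimal in-memory sketch, assuming plain maps stand in for the cities collection (`TrueLinking`, `cityOf` and `updateCity` are hypothetical names): because the city data is stored once, an update is immediately visible to every person that links to it, and a dangling link simply resolves to nothing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of person -> city linking by _id.
final class TrueLinking {
    // "cities" collection keyed by _id, e.g. { _id: "NYC", zipCode: 123, population: 8538000 }
    private final Map<String, Map<String, Object>> cities = new HashMap<>();

    void insertCity(Map<String, Object> city) {
        // Copy into a mutable map so the city document can be updated later.
        cities.put((String) city.get("_id"), new HashMap<>(city));
    }

    /** Updates one field of a city; every person linking to it sees the new value. */
    void updateCity(String cityId, String field, Object value) {
        cities.get(cityId).put(field, value);
    }

    /** Second lookup: resolves person.city against the cities collection (null if dangling). */
    Map<String, Object> cityOf(Map<String, Object> person) {
        Object cityId = person.get("city");
        return cityId == null ? null : cities.get(cityId);
    }

    public static void main(String[] args) {
        TrueLinking db = new TrueLinking();
        db.insertCity(Map.of("_id", "NYC", "zipCode", 123, "population", 8538000));
        Map<String, Object> ragnar = Map.of("_id", "1", "name", "Ragnar", "city", "NYC");
        db.updateCity("NYC", "population", 8600000); // hypothetical new figure
        System.out.println(db.cityOf(ragnar));
    }
}
```

The price of this design is the extra lookup on every read; embedding the city avoids it but duplicates the city data in every person document.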
when the Many are in the order of "few"
{
_id: 1,
name: "Mark",
surname:"Zuckerberg",
cars:["Ferrari", "Lamborghini"]
}
Person
EMBED!
{
_id: 11,
title: "The Matrix",
director: "The Wachowski 'siblings'",
producer: "Joel Silver",
peopleInvolved: [1,2,3...,n]
}
{
_id: 222,
name: "Nicola Nardone",
role: "bullet-time expert",
workedOnMovies: [1,2,3,...,n]
}
Movies
People
{
idMovie: 11,
idPerson:22
}
PeopleMovies
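With two-way linking (peopleInvolved on movies, workedOnMovies on people) there is no join table and no foreign key, so the application must write both sides of every link. In MongoDB that would be two updates on two different documents (for example with `$addToSet`). The sketch below is a hypothetical in-memory model (`TwoWayLinking`, `link`, `isConsistent` are illustrative names) that keeps both arrays in sync and can detect when they drift apart.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a many-to-many relation stored as arrays on both sides.
final class TwoWayLinking {
    // movie _id -> person _ids (Movies.peopleInvolved)
    final Map<Integer, List<Integer>> peopleInvolved = new HashMap<>();
    // person _id -> movie _ids (People.workedOnMovies)
    final Map<Integer, List<Integer>> workedOnMovies = new HashMap<>();

    /** Records a link on BOTH sides, skipping duplicates (similar in spirit to $addToSet). */
    void link(int movieId, int personId) {
        addOnce(peopleInvolved, movieId, personId);
        addOnce(workedOnMovies, personId, movieId);
    }

    private static void addOnce(Map<Integer, List<Integer>> side, int ownerId, int linkedId) {
        List<Integer> links = side.computeIfAbsent(ownerId, k -> new ArrayList<>());
        if (!links.contains(linkedId)) {
            links.add(linkedId);
        }
    }

    /** True when every link recorded on one side is also recorded on the other. */
    boolean isConsistent() {
        return mirrors(peopleInvolved, workedOnMovies) && mirrors(workedOnMovies, peopleInvolved);
    }

    private static boolean mirrors(Map<Integer, List<Integer>> from, Map<Integer, List<Integer>> to) {
        for (Map.Entry<Integer, List<Integer>> e : from.entrySet()) {
            for (Integer other : e.getValue()) {
                if (!to.getOrDefault(other, List.of()).contains(e.getKey())) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TwoWayLinking db = new TwoWayLinking();
        db.link(11, 222); // The Matrix <-> Nicola Nardone
        System.out.println(db.peopleInvolved + " " + db.workedOnMovies + " " + db.isConsistent());
    }
}
```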
{
_id: 20,
title: "The Sagas of Ragnar Lodbrok",
author:[1,2]
}
{
_id:1,
author_name:"Ben Waggoner",
education:[],
books:[11,20,42]
}
Book
Authors
{
_id: 10,
category: 7,
prod_name: "16 GB USB DRIVE 3.0"
}
{
_id:7,
category_name:"usb drives",
parent:6
}
Product
Category
Problems:
{
_id:7,
category_name:"usb drives",
ancestors:[8,9,10]
}
Category
{
_id:7,
category_name:"usb drives",
children:[8,9,10]
}
Category
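The tree patterns above trade writes for reads. With parent references, finding all ancestors means one lookup per level; with an ancestors array stored in each category, all descendants of a node come from a single membership test (in MongoDB, a query such as `find({ ancestors: 6 })` against an array field). A minimal in-memory sketch, assuming acyclic parent links; `CategoryTree` and the extra category ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical category tree comparing parent references and ancestors arrays.
final class CategoryTree {
    // Parent-reference pattern: category _id -> parent _id (null for the root).
    private final Map<Integer, Integer> parentOf = new HashMap<>();

    void add(int id, Integer parent) {
        parentOf.put(id, parent);
    }

    /** Walks parent links up to the root (root first): one lookup per level. Assumes no cycles. */
    List<Integer> ancestorsOf(int id) {
        List<Integer> ancestors = new ArrayList<>();
        Integer current = parentOf.get(id);
        while (current != null) {
            ancestors.add(0, current);
            current = parentOf.get(current);
        }
        return ancestors;
    }

    /** Materializes the ancestors array that each category document would store. */
    Map<Integer, List<Integer>> ancestorsArrays() {
        Map<Integer, List<Integer>> arrays = new HashMap<>();
        for (Integer id : parentOf.keySet()) {
            arrays.put(id, ancestorsOf(id));
        }
        return arrays;
    }

    /** All descendants in one pass: categories whose ancestors array contains id. */
    static List<Integer> descendantsOf(int id, Map<Integer, List<Integer>> arrays) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> e : arrays.entrySet()) {
            if (e.getValue().contains(id)) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        CategoryTree tree = new CategoryTree();
        tree.add(6, null); // hypothetical parent category
        tree.add(7, 6);    // "usb drives"
        tree.add(8, 7);    // hypothetical sub-category
        System.out.println(descendantsOf(6, tree.ancestorsArrays()));
    }
}
```

The cost moves to writes: moving a subtree means rewriting the ancestors array of every category in it.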
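Restructure code: keep data that must change together in a single document (single-document writes are atomic), single document FTW
Implement locking, semaphores, critical sections... in the application
Tolerate: e.g. Facebook feed (a bit of temporary inconsistency is acceptable)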
After what we just saw, we can design our blog with just 2 (+1) collections:
{
"_id" : ObjectId("59e336472687c8737a5c2735"),
"title" : "A RANDOM BLOG POST",
"author" : "ragnar",
"body" : "first post tahahhahaha",
"tags" : [
"uolo",
"miesta",
"asdoas",
"cocacola"
],
"comments" : [ {
"author" : "Tony",
"email" : "info@toninocatapano.com",
"body" : "this blog in Mongo sucks"
} ],
"date" : ISODate("2017-10-15T10:19:51.691Z"),
"permalink" : "a_random_blog_post"
}
{
"_id" : "ragnar",
"password" : "HScSZiHvv73vv71kY++/ve+/vQrvv73vv71R77+9,-1847318508",
"email" : "therealspartan77@vikings.org"
}
posts
users
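The post document carries a "permalink" derived from its title ("A RANDOM BLOG POST" becomes "a_random_blog_post"). The exact rule is not shown here, so the sketch below is an assumption inferred from that single example: whitespace runs become underscores, other non-word characters are dropped, and the result is lowercased. `Permalinks.permalinkFor` is a hypothetical helper.

```java
import java.util.Locale;

// Hypothetical permalink rule inferred from the sample post.
final class Permalinks {

    static String permalinkFor(String title) {
        return title.trim()
                .replaceAll("\\s+", "_")    // whitespace runs become underscores
                .replaceAll("\\W", "")      // drop anything that is not [A-Za-z0-9_]
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(permalinkFor("A RANDOM BLOG POST"));
    }
}
```

Two posts with the same title would get the same permalink, so if posts are looked up by permalink you would probably want to detect that collision (e.g. with a unique index on the field).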
you@yourmachine:~$ git checkout exercise1
you@yourmachine:~$ git checkout exercise2
Two ways to impact latency & throughput of database queries: storage engines and indexes.
From Mongo 3.0 we have pluggable storage engines:
MMAPv1
vs
WiredTiger (default since 3.2)
[Figure: the videogames collection on disk, with documents such as {name:"Doom III", genre:"FPS", price:40, extras:"Doom 1990 maps"}, and an index on name whose sorted keys (e.g. "Amnesia", "Zelda") point to the documents]
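Conceptually, an index on `name` is a sorted structure mapping each key to the location of its documents, so an equality or range lookup does not have to scan the whole collection. MongoDB uses B-trees; in the hypothetical sketch below a `TreeMap` (a red-black tree) stands in only to show sorted, logarithmic-time lookup. `NameIndex` is an illustrative name, and it assumes every document has a String `name`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a single-field index on "name".
final class NameIndex {
    // Documents in insertion order, standing in for the collection on disk.
    private final List<Map<String, Object>> collection = new ArrayList<>();
    // Sorted index: name -> positions of the documents carrying that name.
    private final TreeMap<String, List<Integer>> index = new TreeMap<>();

    /** Inserts a document and updates the index (assumes a String "name" field). */
    void insert(Map<String, Object> doc) {
        collection.add(doc);
        String name = (String) doc.get("name");
        index.computeIfAbsent(name, k -> new ArrayList<>()).add(collection.size() - 1);
    }

    /** Equality lookup: a tree search instead of scanning every document. */
    List<Map<String, Object>> findByName(String name) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (int position : index.getOrDefault(name, List.of())) {
            result.add(collection.get(position));
        }
        return result;
    }

    /** Range lookup over the sorted keys: from (inclusive) to (exclusive). */
    List<String> namesBetween(String from, String to) {
        return new ArrayList<>(index.subMap(from, true, to, false).keySet());
    }

    public static void main(String[] args) {
        NameIndex games = new NameIndex();
        games.insert(Map.of("name", "Doom III", "genre", "FPS", "price", 40));
        games.insert(Map.of("name", "Zelda", "genre", "Adventure")); // hypothetical sample
        System.out.println(games.findByName("Doom III"));
    }
}
```

Every insert also has to update the index, which is why indexes speed up reads at some cost to writes.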
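Index builds: Foreground (default) vs Background
Foreground: faster, but blocks reads and writes on the database while the index is built
Background: slower, but does not block other operations
alternative: replica set redirection (build the index on one member at a time while it is out of rotation)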
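To benefit from the speed of indexes, make sure your indexes fit into memory; let's see how to assess that.
[Figure: memory should hold the working set and the indexes; whatever does not fit must be read from disk, which is slow]
Covered query: a query where all the queried fields AND the projected fields are part of an index, so MongoDB can answer it from the index alone without reading the documents. As you may expect, this kind of query is the fastest.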
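To see why a covered query is fast, here is a hypothetical in-memory sketch with a compound index on (name, genre) whose entries store both values. A query that filters on `name` and projects only `name` and `genre` (with `_id` excluded, as `{ _id: 0 }` would do) is answered from index entries alone; projecting any other field forces a read of the full document. The `documentReads` counter makes that visible. `CoveredQuery` is an illustrative name and assumes every document has `name` and `genre`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of a compound index on (name, genre) and covered queries.
final class CoveredQuery {
    /** One index entry: the genre value plus the document's position. */
    private static final class IndexEntry {
        final String genre;
        final int position;

        IndexEntry(String genre, int position) {
            this.genre = genre;
            this.position = position;
        }
    }

    private final List<Map<String, Object>> documents = new ArrayList<>();
    private final TreeMap<String, List<IndexEntry>> index = new TreeMap<>();
    /** How many times a full document had to be read. */
    int documentReads = 0;

    void insert(Map<String, Object> doc) {
        documents.add(doc);
        index.computeIfAbsent((String) doc.get("name"), k -> new ArrayList<>())
                .add(new IndexEntry((String) doc.get("genre"), documents.size() - 1));
    }

    /** Finds by name and returns only the projected fields (no _id). */
    List<Map<String, Object>> find(String name, Set<String> projection) {
        // Covered when every projected field is stored in the index entries.
        boolean covered = Set.of("name", "genre").containsAll(projection);
        List<Map<String, Object>> result = new ArrayList<>();
        for (IndexEntry entry : index.getOrDefault(name, List.of())) {
            Map<String, Object> source;
            if (covered) {
                source = Map.of("name", name, "genre", entry.genre); // from the index alone
            } else {
                documentReads++;                                    // must fetch the document
                source = documents.get(entry.position);
            }
            result.add(project(source, projection));
        }
        return result;
    }

    private static Map<String, Object> project(Map<String, Object> source, Set<String> fields) {
        Map<String, Object> out = new TreeMap<>();
        for (String field : fields) {
            if (source.containsKey(field)) {
                out.put(field, source.get(field));
            }
        }
        return out;
    }
}
```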
REPL!
you@yourmachine:~$ git checkout exercise3
MongoDB lets you query documents based on coordinates in a 2D plane (2d index).
There is also a spherical model, the 2dsphere index, which is far more complete: it computes geometries on an Earth-like sphere and lets you describe locations as GeoJSON objects such as points, lines and polygons (areas).
And query by range!
Let's see how to use it.
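A "near, within range" query on a sphere boils down to ranking places by great-circle distance and keeping those within a maximum distance, in the spirit of a 2dsphere `$near` query with `$maxDistance` in meters. The sketch below uses the haversine formula on a perfect sphere with an assumed mean Earth radius of 6,371 km; it illustrates the idea and is not MongoDB's implementation. `NearQuery` and `Place` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical "near within range" query using haversine distances.
final class NearQuery {
    static final double EARTH_RADIUS_M = 6_371_000.0; // assumed mean Earth radius

    static final class Place {
        final String name;
        final double lat;
        final double lon;

        Place(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Great-circle distance in meters between two (latitude, longitude) points in degrees. */
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * EARTH_RADIUS_M * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    /** Places within maxMeters of (lat, lon), nearest first. */
    static List<Place> near(List<Place> places, double lat, double lon, double maxMeters) {
        List<Place> result = new ArrayList<>();
        for (Place p : places) {
            if (distanceMeters(lat, lon, p.lat, p.lon) <= maxMeters) {
                result.add(p);
            }
        }
        result.sort(Comparator.comparingDouble(p -> distanceMeters(lat, lon, p.lat, p.lon)));
        return result;
    }

    public static void main(String[] args) {
        List<Place> places = List.of(new Place("A", 0, 0.001), new Place("B", 0, 1));
        for (Place p : near(places, 0, 0, 1000)) {
            System.out.println(p.name);
        }
    }
}
```

One degree of longitude along the equator comes out at roughly 111 km, which is a handy sanity check for the formula.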
you@yourmachine:~$ git checkout exercise4
Text indexes: easy search on big texts
Multikey indexes aka indexes on arrays
Query profiling
MongoDB Atlas: Database as a Service
Embedded JS interpreter: Store JS functions on Mongo and use them for distributed Map-Reduce aggregations, $eval or $where
Aggregation Framework aka groupBy
Replication and fail-over support
(auto) Sharding