PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE

NoSQL Databases 1

Hui Hu Ph.D.

Department of Epidemiology

College of Public Health and Health Professions & College of Medicine

February 12, 2017

NoSQL Databases and Semi-structured Data Models
 

Introduction to MongoDB

 

Lab: MongoDB 1

NoSQL Databases and Semi-structured Data Model

NoSQL

  • NoSQL: "non SQL", "non relational" or "not only SQL"
     
  • It provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases
     
  • NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages.

Types of NoSQL Databases

  • Column: Accumulo, Cassandra, Druid, HBase, Vertica, SAP HANA
     
  • Document: Apache CouchDB, ArangoDB, Clusterpoint, Couchbase, DocumentDB, HyperDex, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB
     
  • Key-value: Aerospike, ArangoDB, Couchbase, Dynamo, FairCom c-treeACE, FoundationDB, HyperDex, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, Berkeley DB
     
  • Graph: AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso, Stardog
     
  • Multi-model: Alchemy Database, ArangoDB, CortexDB, Couchbase, FoundationDB, MarkLogic, OrientDB

Semi-structured Data

  • A form of structured data that does not conform with the formal structure of data models associated with relational databases or other forms of data tables
     
  • It contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data
     
  •  Also known as self-describing structure
     
  • The data model behind the web

html

  • Semi-structured data are usually tree-structured

body

h1

p

p

li

li

Tree Data Structure

  • Paper
    -  getParent -> document
    -  getChildren -> title, author, source
    -  getSibling -> report
     
  • "June 12, 2000"
    -  root-to-node path: document/report/date/"June 12, 2000"
     
  • Queries need tree navigation
    -  e.g. author of "XML query data model"

xml

  • The entities belonging to the same class may have different attributes even though they are grouped together, and the attributes' order is not important

Extensive Markup Language

  • Allows the querying of both schema and data
    -  What is the name of the element which contains a sub-element whose value is "CellType"?

JSON

JavaScript Object Notation

  • Key-value pairs
  • Tuple
  • Arrays (indicated by square brackets)
[
    {
     _id: 1,
     name: "john",
     age: 21,
     type: 1,
     status: "A",
     favorites: {artist:"Picasso",food:"pizza"},
     finished: [17,3],
     badges: ["blue","black"],
     points: [
         {points: 85, bonus: 10},
         {points: 70, bonus: 5}
     ]
    },
    {
     _id: 2,
     name: "mike",
     age: 20
    }
]

One structure can be embedded in another structure

Introduction to MongoDB

MongoDB

  • MongoDB (from "humongous") is a cross-platform document-oriented database.
  • Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (BSON), making the integration of data in certain types of applications easier and faster
  • SQL to MongoDB Mapping Chart

find()

SELECT

  • MongoDB is a collection of documents
db.collection.find(<query filter>,<projection>).<cursor modifier>

Similar to FROM, specifies the collection to use

Similar to WHERE, specifies which documents to return

Projection variables in SELECT

How many results to return

Examples

  • Query 1:
    -  SQL: SELECT * FROM Student
    -  MongoDB: db.Student.find()
     
  • Query 2:
    -  SQL: SELECT a, b FROM Student
    -  MongoDB:
       db.Student.find(
         {},
         {a:1,b:1,_id:0}
       )
     
  • Query 3:
    -  SQL: SELECT a FROM Student WHERE b="c"
    -  MongoDB:
       db.Student.find( {b:"c"}, {a:1,_id:0} )

Examples (continued)

  • Query 4:
    -  SQL: SELECT DISTINCT a FROM Student WHERE c>10
    -  MongoDB:
       db.Student.find( {c:{$gt:10}},{a:1,_id:0} ).distinct(a)
     
  • Some Operators of MongoDB:
    $eq, $gt, $gte, $lt, $lte, $ne
    $in, $nin
    $or, $and, $not, $nor
     
  • Query 5:
    -  count the number of students whose names have the partial string "joh" in it -  must be case insensitive
    -  db.Student.find(name: {$regex: /joh/i}).count()
     
  • Query 6:
    -  same query, but name starts with "Joh"
    -  db.Student.find(name: {$regex:/^Joh/}).count()
     
  • Query 7:
    -  starts with "Joh" ends with "on"
    -  db.Student.count(name: {$regex:/^Jon.*on$/}
     
  • Regular Expressions

Array Operations

  • Find students who are taking courses "a" and "b"
    -  db.Student.find( {courses: {$in:["a","b"]} } )
     
  • Find students who are not taking courses "a" and "c"
    -  db.Student.find( {courses: {$nin:["a","c"]} } )
     
  • Find the second and third elements of courses
    -  db.Student.find( {}, {courses: {$slice: [1,2]} } )
    -  db.Student.find( {}, {courses: {$slice: -2} } )

     
  • Find a student whose second element in courses is "b"
    -  db.Student.find(courses.1:"b")
{_id:1,
name:"John",
age:20,
courses: ["a","b","c"]}

Compound Statements

  • Find students with age 20 or 21 and who are taking course "a" or "b" as their first course
     
  • SELECT * FROM Students
    WHERE (age=20 OR age=21) AND (course0="a" OR course0="b")
     
  •    db.Student.find({
       $and:[
          {$or: [{age:20},{age:21}] },
          {$or: [{courses.0:"a"},{courses.0:"b"}] }
       ]
       })
{_id:1,
name:"John",
age:20,
courses: ["a","b","c"]}

Queries over Nested Elements

_id:1,
    points: [
        {points: 90, bonus:10}
        {points: 60, bonus:20}
    ]
_id:2,
    points: [
        {points: 53, bonus:20}
        {points: 64, bonus:11}
    ]
_id:1,
    points: [
        {points: 99, bonus:17}
        {points: 10, bonus:24}
    ]
  • db.Points.find( {'points.0.points': {$lte: 80} } )
     
  • db.Points.find( {'points.points': {$lte: 80} } )
     
  • db.Points.find( { 'points.points':{$lte:81}, 'points.bonus': {$lte 20} } )
     

MongoDB 1

git pull

https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/

PHC7065-Spring2018-Lecture5

By Hui Hu

PHC7065-Spring2018-Lecture5

Slides for Lecture 5, Spring 2018, PHC7065 Critical Skills in Data Manipulation for Population Science

  • 500
Loading comments...

More from Hui Hu