NoSQL -> Not Only SQL
Alternative approach to RDBMS (relational model)
Key-value
Document
Graph
Ref software: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Collection of key-values known as:
dictionary, associative array, hashes, maps, etc.
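As a minimal sketch of the idea, a Python dict behaves like a tiny in-memory key-value store. The key naming scheme ("user:1:name") and the values are made up for illustration and are not taken from any particular product.

```python
# A plain dict used as an in-memory key-value store (toy data).
store = {}

# Create / update: each key maps to exactly one value.
store["user:1:name"] = "Ada"
store["user:1:visits"] = 1
store["user:1:visits"] += 1

# Read, with a default for keys that do not exist.
name = store.get("user:1:name")
missing = store.get("user:2:name", "unknown")

# Delete a key.
del store["user:1:name"]

print(name, missing, store)
```

A dedicated key-value store adds persistence, networking and richer value types on top of this basic get/set/delete model.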
Key-value storage.
Nowadays more than a simple key-value store.
Permanent or in-memory
Examples:
With MediaWiki: Wikipedia, AnnoWiki
http://ttltheory.wordpress.com/tag/redis-examples/
http://highscalability.com/blog/2011/7/6/11-common-web-use-cases-solved-in-redis.html
JavaScript Object Notation
Textual way to share objects
In JavaScript, associative arrays are objects.
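The same idea can be sketched from Python, where the standard `json` module turns a dictionary into JSON text and back. The record below is illustrative only.

```python
import json

# Illustrative record; any dict of strings, numbers, lists and dicts works.
gene = {"symbol": "TP53", "taxid": 9606, "aliases": ["p53", "LFS1"]}

text = json.dumps(gene, sort_keys=True)  # object -> JSON text
restored = json.loads(text)              # JSON text -> object

print(text)
print(restored == gene)
```

Because the text form is language-neutral, the same string could be read back by JavaScript or any other language with a JSON parser.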
xsltproc (XSLT)
XML DOM, XPath, etc.: not efficient for big files!
Reference: Pierre Lindenbaum
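One common way around the big-file problem, not mentioned in the slides, is to stream the XML instead of building the full DOM first. The sketch below uses `iterparse` from Python's standard library. The element and attribute names (`hit`, `id`, `score`) are invented for the example, and a small in-memory document stands in for a large file.

```python
import io
import xml.etree.ElementTree as ET

# Toy document standing in for a large file (element names are made up).
xml_data = io.BytesIO(
    b"<hits>"
    b"<hit id='a' score='10'/>"
    b"<hit id='b' score='42'/>"
    b"<hit id='c' score='7'/>"
    b"</hits>"
)

best_id, best_score = None, -1
# iterparse yields each element when its end tag is read, so elements
# can be processed one by one instead of after loading the whole tree.
for event, elem in ET.iterparse(xml_data, events=("end",)):
    if elem.tag == "hit":
        score = int(elem.get("score"))
        if score > best_score:
            best_id, best_score = elem.get("id"), score
        elem.clear()  # drop this element's attributes and children once used

print(best_id, best_score)
```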
Semi-structured model
Schema-free
No separation between data and schema
Document formats:
XML, YAML, JSON, BSON
Popular document store (apart from MongoDB)
Can have different databases
Replication (master-master, master-slave, etc.)
Focus on consistency - ACID
(Atomicity, consistency, isolation, durability)
What is a Document?
JSON!
Everything is web
EVERYTHING, for better and for worse…
Operation | SQL | HTTP |
---|---|---|
Create | INSERT | PUT / POST |
Read (Retrieve) | SELECT | GET |
Update (Modify) | UPDATE | PUT / PATCH |
Delete (Destroy) | DELETE | DELETE |
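The HTTP column of the table can be sketched with Python's standard library by building (but not sending) one request per operation. The sketch assumes a CouchDB-style REST endpoint on a local server; the base URL, the database name `demo_db` and the document ID `gene1` are hypothetical.

```python
import json
from urllib.request import Request

# Assumed local CouchDB-style endpoint; database and document names are made up.
BASE = "http://127.0.0.1:5984/demo_db"

# One HTTP method per CRUD operation, following the table above.
METHODS = {"create": "PUT", "read": "GET", "update": "PUT", "delete": "DELETE"}

def build_request(operation, doc_id, body=None):
    """Return an unsent Request for one CRUD operation on a document."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(f"{BASE}/{doc_id}", data=data, headers=headers,
                   method=METHODS[operation])

crud_requests = [
    build_request("create", "gene1", {"symbol": "TP53"}),
    build_request("read", "gene1"),
    build_request("update", "gene1", {"symbol": "TP53", "taxid": 9606}),
    build_request("delete", "gene1"),
]
for req in crud_requests:
    print(req.get_method(), req.full_url)
```

Nothing is sent over the network here. Against a real CouchDB server, updates and deletes would also need the document's current revision (`_rev`), which this sketch leaves out.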
Design document
JavaScript: Map/reduce
Temporary and Permanent views
Procedure that performs filtering and sorting
Outcome:
key : value (which can be composite)
Procedure that performs an aggregation operation
from the former values
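The two procedures can be emulated in plain Python to show the data flow. This is only a sketch of the concept, not CouchDB's actual JavaScript view API: the documents, field names and function names are invented for the example.

```python
from collections import defaultdict

# Toy documents; field names are made up for illustration.
docs = [
    {"_id": "1", "organism": "human", "length": 393},
    {"_id": "2", "organism": "mouse", "length": 390},
    {"_id": "3", "organism": "human", "length": 1863},
    {"_id": "4", "note": "no organism field"},
]

def map_doc(doc):
    """Map step: filter documents and emit (key, value) pairs."""
    if "organism" in doc:
        yield doc["organism"], doc["length"]

def reduce_values(values):
    """Reduce step: aggregate the values emitted for one key."""
    return sum(values)

# Run the map step over every document, then sort the rows by key.
rows = sorted(pair for doc in docs for pair in map_doc(doc))

# Group the emitted values by key and reduce each group.
grouped = defaultdict(list)
for key, value in rows:
    grouped[key].append(value)
totals = {key: reduce_values(values) for key, values in grouped.items()}

print(rows)
print(totals)
```

The map step alone gives a filtered, sorted key : value listing; adding the reduce step collapses it to one aggregated value per key.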
Some interesting docs:
Map Reduce in CouchDB
http://www.slideshare.net/okurow/couchdb-mapreduce-13321353
View Cookbook for SQL Jockeys
http://guide.couchdb.org/draft/cookbook.html
Writing reduce functions
http://www.bitsbythepound.com/writing-a-reduce-function-in-couchdb-370.html
Thanks to PouchDB
Sync DBs in:
with the same RESTful syntax.
PHP - JS
Python
Blast-Bypass pipeline
Prediction of protein function improving sequence remote alignment search by a fuzzy logic algorithm. Antonio Gómez, Juan Cedano, Jordi Espadaler, Antonio Hermoso, Jaume Piñol, Enrique Querol (2008) The Protein Journal 27 (2) p. 130-139
Vertices (nodes) VS edges (relationships)
Self-explanatory examples:
NCBI Taxonomy - Simple Hierarchy
Gene Ontology (molecular function, biological process, cellular component) - 3 DAGs
Related: NCBI Taxonomy in MySQL
Currently the most popular graph database. Java-based.
One DB is one instance (on one port, 7474 by default)
You can have different data, with different labels
Nodes and relationships are imported as JSON documents
It's very important to properly define indexes (Lucene backend)
Cypher: SQL-like query language
MATCH (s)-[*0..3]->(t:TAXID { rank:"family", scientific_name:"Hominidae" })
WHERE s.rank="genus"
RETURN s.scientific_name AS name, s.rank AS rank LIMIT 50;
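What the query asks for can be sketched without a database: starting from every genus node, follow child-to-parent edges for 0 to 3 hops and keep the genera that reach Hominidae. The small taxonomy below is toy data, and the sketch simplifies in two ways: each node has a single parent, and the target is matched by name only.

```python
# Toy taxonomy: node properties plus one child -> parent edge per node.
nodes = {
    "Homo": {"rank": "genus"},
    "Pan": {"rank": "genus"},
    "Pongo": {"rank": "genus"},
    "Hylobates": {"rank": "genus"},
    "Homininae": {"rank": "subfamily"},
    "Ponginae": {"rank": "subfamily"},
    "Hominidae": {"rank": "family"},
    "Hylobatidae": {"rank": "family"},
}
parent = {
    "Homo": "Homininae", "Pan": "Homininae", "Pongo": "Ponginae",
    "Homininae": "Hominidae", "Ponginae": "Hominidae",
    "Hylobates": "Hylobatidae",
}

def reaches(start, target, max_hops=3):
    """True if target is reachable from start in 0..max_hops parent steps."""
    current = start
    for _ in range(max_hops + 1):
        if current == target:
            return True
        current = parent.get(current)
        if current is None:
            return False
    return False

# Equivalent of WHERE s.rank = "genus" plus the bounded path match.
genera = sorted(
    name for name, props in nodes.items()
    if props["rank"] == "genus" and reaches(name, "Hominidae")
)
print(genera)
```

In a general graph, where a node can have several outgoing edges, the single-parent walk would have to become a breadth-first search with the same hop limit.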
Query
http://127.0.0.1:7474/db/data/index/node/TAXID/id/9606
Upload (in batches)
In Python: py2neo
Jersey - REST-API for Java
Maven (project management)
Nowadays much faster than using Cypher :(
NodeJS Express interface accessing Neo4J and MySQL
PRGdb 2.0: towards a community-based database model for the analysis of R-genes in plants.
Walter Sanseverino, Antonio Hermoso, Raffaella D'Alessandro, Anna Vlasova, Giuseppe Andolfo, Luigi Frusciante, Ernesto Lowy, Guglielmo Roma, Maria Raffaella Ercolano (2013)
Nucleic Acids Research 41 (Database issue) p. D1167-71
or rather, things I'd like to try...
ArangoDB (key-value, document and graph, 3-in-1)
MariaDB (MySQL fork) with JSON support and dynamic columns http://www.slideshare.net/blueskarlsson/using-json-with-mariadb-and-mysql