NoSQL approaches with common bioinformatics examples
NoSQL
NoSQL -> Not Only SQL
NoSQL
Alternative approach to RDBMS (relational model)
NoSQL DB types
Key-value
Document
Graph
Ref software: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Key-value
Collection of key-value pairs, also known as:
dictionary, associative array, hash, map, etc.
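As a minimal sketch of the key-value idea, a plain Python dictionary already behaves like a tiny in-memory store. The keys and sequences here are made up for illustration; a real store such as Redis adds persistence, networking and richer data types on top.

```python
# Toy key-value store: a plain Python dict mapping keys to values.
# The keys and sequences below are invented for illustration only.
store = {}

# SET: associate a value with a key
store["seq:demo1"] = "MKTAYIAKQR"
store["seq:demo2"] = "MSDNELQ"

# GET: look a value up by its key (None if the key is absent)
hit = store.get("seq:demo1")
miss = store.get("seq:unknown")

# DELETE: remove a key and its value
del store["seq:demo2"]

print(hit, miss, sorted(store))
```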
Redis
Key-value storage.
Nowadays more than simple key-value.
Persistent or in-memory
Examples:
- Queues
- Caching
- etc.
With MediaWiki: Wikipedia, AnnoWiki
http://ttltheory.wordpress.com/tag/redis-examples/
http://highscalability.com/blog/2011/7/6/11-common-web-use-cases-solved-in-redis.html
JSON
JavaScript Object Notation
Textual way to share objects
In JavaScript, associative arrays are objects.
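A short sketch of JSON as a textual way to share objects, here from Python rather than JavaScript. The record layout (field names such as `lineage`) is an assumption chosen for illustration.

```python
import json

# A small taxonomy-like record; the field names are illustrative.
record = {
    "taxid": 9606,
    "scientific_name": "Homo sapiens",
    "rank": "species",
    "lineage": ["Hominidae", "Homo"],
}

# Object -> text: this string can be stored, sent over HTTP, etc.
text = json.dumps(record, sort_keys=True)

# Text -> object: the receiver rebuilds an equivalent structure.
restored = json.loads(text)

print(text)
```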
JSON vs XML
Convert XML to JSON
xsltproc (XSLT)
XML DOM, XPath, etc. are not efficient for big files!
Reference: Pierre Lindenbaum
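One possible way around the big-file problem, sketched here in Python instead of the XSLT route named above, is to stream the XML and emit one JSON line per record. The element names (`hits`, `hit`) and the helper `xml_records_to_json_lines` are hypothetical.

```python
import io
import json
import xml.etree.ElementTree as ET

def xml_records_to_json_lines(source, tag):
    """Yield one JSON string per <tag> element without building a full DOM."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # Child elements become fields; attributes are merged in.
            record = {child.tag: child.text for child in elem}
            record.update(elem.attrib)
            yield json.dumps(record, sort_keys=True)
            elem.clear()  # drop the converted element's children to save memory

# Tiny in-memory stand-in for a large file (made-up data).
demo = io.BytesIO(
    b"<hits>"
    b"<hit id='h1'><name>demo</name><score>42</score></hit>"
    b"<hit id='h2'><name>other</name><score>7</score></hit>"
    b"</hits>"
)
lines = list(xml_records_to_json_lines(demo, "hit"))
for line in lines:
    print(line)
```

All values stay strings in this sketch; converting scores to numbers would need schema knowledge that the XML alone does not carry.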
Document stores
Semi-structured model
Schema-free
No separation between data and schema
Document formats:
XML, YAML, JSON, BSON
CouchDB
Popular document store (apart from MongoDB)
Can have different databases
Replication (master-master, master-slave, etc.)
Focus on consistency - ACID
(Atomicity, consistency, isolation, durability)
CouchDB - document
What is a Document?
JSON!
- _id
- _rev
CouchDB - REST API
Everything is web
EVERYTHING, for better and for worse…
Operation | SQL | HTTP |
---|---|---|
Create | INSERT | PUT / POST |
Read (Retrieve) | SELECT | GET |
Update (Modify) | UPDATE | PUT / PATCH |
Delete (Destroy) | DELETE | DELETE |
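The mapping in the table can be sketched with Python's standard library. This only builds the HTTP requests and does not send them; the port 5984 is CouchDB's usual default, while the database name `demo_db`, the document ID and the revision string are placeholders.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:5984"  # assumed: local CouchDB on its default port
DB = "demo_db"                  # hypothetical database name

def build_request(method, doc_id, doc=None, rev=None):
    """Build (but do not send) an HTTP request for one document."""
    url = f"{BASE}/{DB}/{doc_id}"
    if rev is not None:
        url += f"?rev={rev}"  # deletes must name the revision being removed
    data = json.dumps(doc).encode("utf-8") if doc is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)

create = build_request("PUT", "hit1", {"query": "demo", "score": 42})  # INSERT
read = build_request("GET", "hit1")                                    # SELECT
delete = build_request("DELETE", "hit1", rev="1-abc")  # placeholder revision

for req in (create, read, delete):
    print(req.get_method(), req.full_url)
# Passing a request to urllib.request.urlopen() would actually send it.
```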
CouchDB - Views
Design document
JavaScript: Map/reduce
Temporary and Permanent views
Map/Reduce
Map
Procedure that performs filtering and sorting
Outcome:
key : value (which can be composite)
Map/Reduce
Reduce
Procedure that performs an aggregation operation
from the former values
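CouchDB views are written in JavaScript, but the two steps can be imitated in plain Python to show the data flow. This is an analogy under made-up documents, not CouchDB's actual view engine: map filters and emits key/value pairs, the pairs are sorted by key, and reduce aggregates the values per key.

```python
from itertools import groupby

# Made-up documents standing in for stored search hits.
docs = [
    {"_id": "h1", "organism": "Homo sapiens", "score": 50},
    {"_id": "h2", "organism": "Pan troglodytes", "score": 30},
    {"_id": "h3", "organism": "Homo sapiens", "score": 20},
    {"_id": "h4", "organism": "Homo sapiens"},  # no score: filtered out by map
]

def map_doc(doc):
    """Map step: filter documents and emit (key, value) pairs."""
    if "score" in doc:
        yield doc["organism"], doc["score"]

def reduce_values(values):
    """Reduce step: aggregate all values emitted for one key."""
    return sum(values)

# Emitted pairs are kept sorted by key, as in a view index.
emitted = sorted(pair for doc in docs for pair in map_doc(doc))

totals = {
    key: reduce_values([value for _, value in group])
    for key, group in groupby(emitted, key=lambda kv: kv[0])
}
print(totals)
```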
Map/Reduce
Some interesting docs:
Map Reduce in CouchDB
http://www.slideshare.net/okurow/couchdb-mapreduce-13321353
View Cookbook for SQL Jockeys
http://guide.couchdb.org/draft/cookbook.html
Writing reduce functions
http://www.bitsbythepound.com/writing-a-reduce-function-in-couchdb-370.html
CouchDB - world friendly
Thanks to PouchDB
Sync DBs in:
- terminal (e.g. LevelDB)
- browser (e.g. IndexedDB)
- server (CouchDB)
with the same RESTful syntax.
CouchDB - other libraries
PHP - JS
Python
Example application
Blast-Bypass pipeline
Prediction of protein function improving sequence remote alignment search by a fuzzy logic algorithm. Antonio Gómez, Juan Cedano, Jordi Espadaler, Antonio Hermoso, Jaume Piñol, Enrique Querol (2008) The Protein Journal 27 (2) p. 130-139
GraphDB
Vertices (nodes) VS edges (relationships)
Self-explanatory:
Types of graphs
NCBI Taxonomy - Simple Hierarchy
Gene Ontology (molecular function, biological process, cellular component) - 3 DAGs
Related: NCBI Taxonomy in MySQL
Neo4J
Most popular GraphDB nowadays. Java-based.
One DB per instance (one port each, 7474 by default)
Different kinds of data can coexist, distinguished by labels
Nodes and relations are imported as JSON documents
It's very important to properly define indexes (Lucene backend)
Cypher
SQL-like language
MATCH (s)-[*0..3]->(t:TAXID { rank:"family", scientific_name:"Hominidae" })
WHERE s.rank="genus"
RETURN s.scientific_name AS name, s.rank AS rank LIMIT 50;
REST API
Query
http://127.0.0.1:7474/db/data/index/node/TAXID/id/9606
Upload (in batches)
In Python: py2neo
Lowest Common Ancestor
Java extensions
Jersey - REST-API for Java
Maven (project management)
Nowadays much faster than using Cypher :(
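The lowest-common-ancestor idea itself can be sketched in a few lines of plain Python, independently of Neo4j. This is not the Java extension mentioned above; the child-to-parent table is a simplified toy lineage with intermediate ranks omitted.

```python
# child -> parent; simplified toy lineage, intermediate ranks omitted
PARENT = {
    "Homo sapiens": "Homo",
    "Pan troglodytes": "Pan",
    "Homo": "Hominidae",
    "Pan": "Hominidae",
    "Hominidae": "Primates",
    "Macaca mulatta": "Macaca",
    "Macaca": "Cercopithecidae",
    "Cercopithecidae": "Primates",
}

def lineage(node):
    """Return the path from a node up to the root, the node itself included."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lowest_common_ancestor(a, b):
    """Walk up from b and return the first node that is also above (or is) a."""
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):
        if node in ancestors_of_a:
            return node
    return None  # the two nodes share no ancestor in this table

print(lowest_common_ancestor("Homo sapiens", "Pan troglodytes"))
print(lowest_common_ancestor("Homo sapiens", "Macaca mulatta"))
```

In a hierarchy like the NCBI Taxonomy every node has a single parent, so walking up two lineages is enough; a DAG such as the Gene Ontology can have several parents per node and would need a different traversal.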
Example of API implementation
NodeJS Express interface accessing Neo4J and MySQL
PRGdb 2.0: towards a community-based database model for the analysis of R-genes in plants.
Walter Sanseverino, Antonio Hermoso, Raffaella D'Alessandro, Anna Vlasova, Giuseppe Andolfo, Luigi Frusciante, Ernesto Lowy, Guglielmo Roma, Maria Raffaella Ercolano (2013)
Nucleic Acids Research 41 (Database issue) p. D1167-71
New challengers and curiosities
or rather, things I'd like to try...
ArangoDB (key-value, document and graph, 3-in-1)
MariaDB (MySQL fork) with JSON support and dynamic columns http://www.slideshare.net/blueskarlsson/using-json-with-mariadb-and-mysql
There can be only one?
By Similis.cc
Description of some NoSQL usage cases in Bioinformatics.