Weekly Delivery
Transport/SAFER
Data Storage
Apache HBase - a column-oriented datastore
Pros:
+ Fast data load (bulk, progressive or streamed)
+ Fast retrieval of results by row key and/or by column (family, qualifier)
+ Flexible schema, versioned cells
+ Java API and REST API, integration with numerous platforms (Flink, Spark, Hive, Impala)
+ Auto sharding and load balancing in a cluster setup
+ Column-level authorization based on Access Control Lists
Cons:
- On its own it does not offer the features of an RDBMS (e.g. transactions, secondary indexes, query language)
- Complementary tools are needed to obtain these features (e.g. Apache Phoenix)
- Denormalized schema
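
To make the storage points above concrete (lookup by row key and column, flexible schema, versioned cells), here is a minimal in-memory sketch of the data model. It is a toy illustration only and not the HBase API: the class `ToyTable`, its methods, the `max_versions` limit and the sample row key and column names are all hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """In-memory toy imitating the HBase data model (NOT the HBase API)."""

    def __init__(self, max_versions=3):
        # Assumed version limit for this toy; not an HBase default.
        self.max_versions = max_versions
        # row key -> (family, qualifier) -> [(timestamp, value), ...], newest first
        self.rows = defaultdict(dict)

    def put(self, row_key, family, qualifier, value, timestamp):
        # Flexible schema: any (family, qualifier) pair may appear on any row.
        cell = self.rows[row_key].setdefault((family, qualifier), [])
        cell.append((timestamp, value))
        cell.sort(key=lambda tv: tv[0], reverse=True)
        del cell[self.max_versions:]  # keep only the newest versions

    def get(self, row_key, family=None, qualifier=None):
        """Return the newest value per column, optionally filtered by column."""
        result = {}
        for (fam, qual), versions in self.rows.get(row_key, {}).items():
            if family is not None and fam != family:
                continue
            if qualifier is not None and qual != qualifier:
                continue
            result[(fam, qual)] = versions[0][1]
        return result

    def versions(self, row_key, family, qualifier):
        """Return all retained (timestamp, value) pairs, newest first."""
        return list(self.rows.get(row_key, {}).get((family, qualifier), []))


# Hypothetical sample data: one row with two column families.
table = ToyTable(max_versions=2)
table.put("vehicle-001", "pos", "lat", 57.70, timestamp=1)
table.put("vehicle-001", "pos", "lat", 57.71, timestamp=2)
table.put("vehicle-001", "pos", "lat", 57.72, timestamp=3)
table.put("vehicle-001", "meta", "driver", "anna", timestamp=1)

latest_pos = table.get("vehicle-001", family="pos")
lat_history = table.versions("vehicle-001", "pos", "lat")
```

Every read in this sketch goes through the row key. That is why queries on other attributes need a denormalized layout or a complementary tool, as the cons above note.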
Data Processing
Apache Flink - a platform for distributed analytics
Pros:
+ Broad integration (e.g. HDFS, HBase)
+ High throughput, low latency
+ Support for iterative computations (e.g. machine learning and graph analysis)
+ Supports Java and Scala
Cons:
- Less mature than e.g. Spark
- Does not support Python yet
Visualization
Hue - a web interface to Hadoop
Pros:
+ Free and Open Source
+ Includes web applications for integration with Hadoop (e.g. HBase, Spark, Hive, Impala)
+ Data exploration
+ SDK
+ Under active development
Cons:
- Not a pure visualization tool
- Limited visualization and data-handling capabilities
- No support for multimedia
By Oscar Ivarsson