Weekly Delivery

Transport/SAFER

Data Storage

Apache HBase - a column-oriented datastore

Pros:

 + Fast data load (bulk, progressive or streamed)

 + Fast retrieval of results by row key and/or by column (family, qualifier)

 + Flexible schema, versioned cells

 + Java API and REST API, integration with numerous platforms (Flink, Spark, Hive, Impala)

 + Auto sharding and load balancing in a cluster setup

 + Column-level authorization based on Access Control Lists
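
To make the retrieval and versioning points above concrete, here is a minimal in-memory sketch of the logical data model (row key, then column family and qualifier, then timestamped versions). It is a toy stand-in written in plain Python, not the HBase API; the class ToyTable, its methods, and the vehicle data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for a column-oriented table. NOT the HBase API."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row key -> (family, qualifier) -> [(timestamp, value), ...], newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, family, qualifier, value, ts):
        """Store a new version of a cell and drop versions beyond max_versions."""
        cell = self.rows[row][(family, qualifier)]
        cell.append((ts, value))
        cell.sort(key=lambda tv: tv[0], reverse=True)
        del cell[self.max_versions:]

    def get(self, row, family=None, qualifier=None):
        """Return the newest value per column, optionally filtered by family/qualifier."""
        result = {}
        for (fam, qual), versions in self.rows.get(row, {}).items():
            if family is not None and fam != family:
                continue
            if qualifier is not None and qual != qualifier:
                continue
            result[(fam, qual)] = versions[0][1]
        return result

    def versions(self, row, family, qualifier):
        """Return all retained (timestamp, value) versions of one cell."""
        return list(self.rows.get(row, {}).get((family, qualifier), []))


# Hypothetical example data: positions reported by one vehicle.
table = ToyTable(max_versions=2)
table.put("vehicle42", "pos", "lat", 57.70, ts=1)
table.put("vehicle42", "pos", "lat", 57.71, ts=2)
table.put("vehicle42", "pos", "lat", 57.72, ts=3)
table.put("vehicle42", "meta", "type", "truck", ts=1)

latest_pos = table.get("vehicle42", family="pos")
lat_history = table.versions("vehicle42", "pos", "lat")
```

Rows need not share the same columns (the flexible schema), and each cell keeps a bounded history of older values (the versioned cells).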

Cons:

 - On its own, it lacks the features of an RDBMS (e.g. multi-row transactions, secondary indexes, a query language)

 - Complementary tools (e.g. Apache Phoenix) are needed to obtain these features

 - Denormalized schema
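
As a rough illustration of what the missing secondary indexes and the denormalized schema mean in practice, the sketch below keeps a second, hand-maintained lookup table next to the primary one. It uses plain Python dictionaries rather than HBase; the table names, the function names, and the trip data are hypothetical.

```python
# Primary "table": row key (trip id) -> columns.
trips = {}
# Hand-maintained "index table": driver -> set of trip ids.
# With no built-in secondary index, the application must keep this in sync.
trips_by_driver = {}


def put_trip(trip_id, driver, distance_km):
    """Write the row and update the index; these are two separate writes."""
    trips[trip_id] = {"driver": driver, "distance_km": distance_km}
    trips_by_driver.setdefault(driver, set()).add(trip_id)


def trips_for(driver):
    """Look up trips by a non-key column via the index table."""
    return [trips[t] for t in sorted(trips_by_driver.get(driver, ()))]


put_trip("t1", "anna", 12.5)
put_trip("t2", "bo", 3.0)
put_trip("t3", "anna", 7.0)

anna_distances = [t["distance_km"] for t in trips_for("anna")]
```

Because the two writes in put_trip are not covered by a single transaction, a failure between them could leave the index out of date, which is the kind of gap that complementary tools aim to close.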

Data Processing

Apache Flink - a platform for distributed analytics

Pros:

 + Broad integration (e.g. HDFS, HBase)

 + High throughput, low latency

 + Support for iterative computations (e.g. machine learning and graph analysis)

 + Java and Scala APIs

Cons:

 - Less mature than alternatives such as Spark

 - Does not support Python yet

Visualization

Hue - a web interface to Hadoop

Pros:

 + Free and Open Source

 + Includes web applications for integration with Hadoop (e.g. HBase, Spark, Hive, Impala)

 + Data exploration

 + SDK

 + Actively developed

Cons:

 - Not a pure visualization tool

 - Limited visualization and data-handling capabilities

 - No support for multimedia

By Oscar Ivarsson
