Tony Su
MCSE
openSUSE Ambassador
Logstash Parser
Logstash Shipper
Redis
Netcat
Elasticsearch
Kibana
Marvel
Graphite
The industrialized world has been collecting data for a very long time (decades) without a clear understanding of what to do with it
Against human all-time champions Ken Jennings and Brad Rutter
Feb 14, 2011
"Data about Data"
Generates voluminous amounts. Slightly more than "3 degrees" of separation from any telephone conversation potentially covers nearly all persons in North America, even under the current FISA, which restricts phone tapping to calls with at least one participant overseas.
Document based - Content is analyzed
Graph analysis - Relationships between nodes are analyzed
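As a rough illustration of analyzing relationships between nodes, the sketch below counts who is reachable within a fixed number of hops in a small call graph. The `calls` data, the names in it, and the helper `within_hops` are all hypothetical; this is a plain breadth-first search, not the method of any particular product.

```python
from collections import deque

def within_hops(graph, start, max_hops):
    """Return {node: hop_count} for every node reachable from start
    in at most max_hops steps (breadth-first search)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen

# Hypothetical call records: each key has phoned the people in its list.
calls = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol", "erin"],
    "erin": ["dave"],
}

reach = within_hops(calls, "alice", 3)
print(reach)
```

In this toy chain each hop adds one person; in a real call graph each hop multiplies the set, which is why a few degrees cover so many people.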
d3
Raphael
An application stack that competes with Hadoop/Solr/Pig/Hive/
Which was the basis for Yahoo search since the mid-1990s
Still the standard choice in over 90% of solutions
2004 Shay Banon creates the predecessor, called "Compass"
2010 Elasticsearch First Release
2014 Raises $70 million Series C funding
Other big data search apps
Sphinx (thepiratebay)
Xapian
Business Objectives Satisfied by Hadoop/Elasticsearch stacks
Businesses evolve (different purpose, different business objectives)
Data collected by businesses evolve
Similarities
document-based databases (as opposed to graph analysis databases)
NoSQL
data fundamentally stored as key-value pairs
MapReduce (highly parallelized data processing, sorting, reducing)
application clusters
commodity hardware preferred over servers with redundant subsystems
application stack allows deploying parts on different nodes
Based on the Lucene search engine
Collect data from multiple sources
Collect numerous datatypes
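The key-value and MapReduce points above can be sketched in a few lines. This is a single-process toy, assuming a word count over three made-up log lines; the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical and stand in for work a real cluster would spread across many nodes.

```python
from collections import defaultdict

def map_phase(doc):
    """Emit a (key, value) pair for every word in one document."""
    return [(word.lower(), 1) for word in doc.split()]

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)

# Hypothetical input documents.
docs = ["error disk full", "error network down", "disk replaced"]

# Map every document independently (this is the step that parallelizes).
mapped = [pair for doc in docs for pair in map_phase(doc)]

# Sort by key, then reduce each group on its own.
counts = dict(reduce_phase(k, v) for k, v in sorted(shuffle(mapped).items()))
print(counts)
```

Because each map call and each reduce call touches only its own input, the work can be handed to separate commodity machines without coordination beyond the shuffle step.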
Differences
Hadoop stack                   | ELK stack
every app a different language | JSON and typically JavaScript
must create schema             | schema autogenerated if needed
different protocols            | HTTP/JSON protocol
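To make the HTTP/JSON column concrete, here is a minimal sketch that builds (but does not send) a request carrying one JSON document. The host, port, index name `logs`, URL path, and every field in `event` are assumptions for illustration; the exact path layout depends on the Elasticsearch version in use. Note that no schema is declared anywhere in the request.

```python
import json
import urllib.request

# Hypothetical log event; the field names are illustrative only.
event = {"host": "web01", "level": "error", "message": "disk full", "bytes": 512}

# Assumed endpoint: a local node on port 9200 and an index called "logs".
# The path layout varies between Elasticsearch versions.
request = urllib.request.Request(
    url="http://localhost:9200/logs/_doc",
    data=json.dumps(event).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Nothing is sent here; urllib.request.urlopen(request) would submit it.
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The whole exchange is plain JSON over HTTP, so any language with an HTTP client can take part.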
Search
Analysis
Artificial / Machine Intelligence
Search
Similar to web search engines
faceted search
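Faceted search returns the matching documents together with counts of their field values, so the user can narrow the result set. The sketch below imitates that in memory; the documents, field names, and the helper `faceted_search` are hypothetical, and a real engine would compute the counts from its index instead of scanning a list.

```python
from collections import Counter

# Hypothetical documents; field names are illustrative only.
docs = [
    {"title": "disk full on web01", "level": "error", "host": "web01"},
    {"title": "disk replaced on web02", "level": "info", "host": "web02"},
    {"title": "disk latency high on web01", "level": "warn", "host": "web01"},
    {"title": "network down", "level": "error", "host": "web02"},
]

def faceted_search(docs, term, facet_fields):
    """Return matching docs plus per-field value counts over the matches."""
    hits = [d for d in docs if term in d["title"].split()]
    facets = {field: Counter(d[field] for d in hits) for field in facet_fields}
    return hits, facets

hits, facets = faceted_search(docs, "disk", ["level", "host"])
print(len(hits), "hits")
for field, counts in facets.items():
    print(field, dict(counts))
```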
Analysis
Extract meaning from data
Apply modeling to query results
Artificial / Machine Intelligence
Decision-making based on generated inputs
Related: Robotics, providing perfect, near-instantaneous recall, "experiences" to draw from