UCSD nEXT
Sept 10, 2015

Tony Su

MCSE

openSUSE Ambassador

Big Data - Elasticsearch

ELK demo Topology

Diagram components: Logstash Parser, Logstash Shipper, Redis, Netcat, Elasticsearch, Kibana, Marvel, Graphite
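The demo diagram itself is not recoverable. However, a common ELK arrangement works like this: a Logstash shipper pushes raw events onto a Redis list, a separate Logstash instance (the parser) pops and parses them, and the results are indexed into Elasticsearch. The sketch below simulates that flow in a single process. It assumes a `collections.deque` stands in for the Redis list and a dict for the Elasticsearch index. The log format, the regular expression, and the `ship`/`parse_all` helpers are hypothetical and are not the demo's actual configuration.

```python
import json
import re
from collections import deque

# Stand-ins (hypothetical): a deque plays the Redis list used as a broker,
# and a dict plays the Elasticsearch index.
redis_list = deque()
es_index = {}

def ship(raw_line):
    """Shipper: wrap a raw log line in a JSON event and push it to the broker."""
    event = {"message": raw_line.rstrip("\n")}
    redis_list.append(json.dumps(event))  # comparable to RPUSH onto a Redis list

# Hypothetical log format: "<LEVEL> <component>: <text>"
LINE_RE = re.compile(r"^(?P<level>[A-Z]+) (?P<component>\w+): (?P<text>.*)$")

def parse_all():
    """Parser: pop events from the broker, extract fields, and index them."""
    while redis_list:
        event = json.loads(redis_list.popleft())  # comparable to LPOP from the list
        match = LINE_RE.match(event["message"])
        if match:
            event.update(match.groupdict())
        else:
            # Mirrors the tag Logstash's grok filter adds when a pattern fails.
            event["tags"] = ["_grokparsefailure"]
        doc_id = len(es_index) + 1
        es_index[doc_id] = event

# Netcat's role in such a demo: feed raw lines to the shipper.
for line in ["ERROR db: connection refused\n", "INFO web: request served\n", "garbage"]:
    ship(line)
parse_all()

for doc_id, doc in es_index.items():
    print(doc_id, doc)
```

Putting a broker between shipper and parser lets the parser fall behind or restart without the shippers losing events. That is the usual reason Redis appears in this kind of topology.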

Where do you see this kind of app used?

  • Homeland security telephone number metadata search
  • Netflix (recommending selections to users)
  • Personal Assistants (Siri, Cortana, Google Now)
  • Web Search Engines (Google, Yahoo, Bing, etc.)
  • Banks (Searching for fraud)
  • Sports Management
  • Fantasy Sports
  • Political Campaigns, particularly the 2012 Democratic Party campaign
  • ESPN FiveThirtyEight

Where does the data come from?

The industrialized world has been collecting data for decades without a clear understanding of what to do with it.

  • Text
  • Video
  • Audio
  • Biometrics
  • Census data
  • Private company data
  • Financial data (government agencies like the IRS, financial institutions like banks, credit scoring companies)
  • YouTube
  • Google
  • Facebook (500 terabytes per day in 2012, i.e. 500,000 gigabytes)
  • Twitter
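To give a sense of scale, the Facebook figure can be converted to an average ingest rate. The calculation below assumes decimal units (1 TB = 1,000 GB) and a perfectly even spread over 24 hours. Real traffic is bursty, so peak rates would be higher.

```python
# Back-of-the-envelope check of the Facebook figure (decimal units: 1 TB = 1,000 GB).
TB_PER_DAY = 500
GB_PER_TB = 1_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

gb_per_day = TB_PER_DAY * GB_PER_TB           # 500,000 GB per day
gb_per_second = gb_per_day / SECONDS_PER_DAY  # average sustained rate, about 5.8 GB/s

print(f"{gb_per_day:,} GB/day")
print(f"{gb_per_second:.2f} GB/s on average")
```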

Sports Team Management

Player Sensors

Harvard Business Review

Watson on Jeopardy!

Possibly the seminal open source event

Against all-time human champions Ken Jennings and Brad Rutter
Feb 14, 2011

Metadata

"Data about Data"

Metadata is generated in voluminous amounts. Following the contacts of any telephone conversation out slightly more than three "degrees" (hops) potentially covers nearly everyone in North America, even under the current FISA rules, which restrict phone tapping to calls with at least one participant overseas.
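The "degrees" argument amounts to a breadth-first expansion over a call graph, where each hop adds everyone the previous layer has called. The sketch below works on a small, made-up call graph. The names, the edges, and the `reachable_within` helper are all hypothetical. Calls are treated as undirected links.

```python
from collections import deque

# Hypothetical call-metadata graph: who has called whom (treated as undirected).
calls = [
    ("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
    ("dave", "erin"), ("bob", "frank"), ("frank", "grace"),
]

graph = {}
for a, b in calls:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def reachable_within(start, max_hops):
    """Breadth-first search: map each person within max_hops calls of start to their hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand past the hop limit
        for contact in graph.get(person, ()):
            if contact not in hops:
                hops[contact] = hops[person] + 1
                queue.append(contact)
    del hops[start]  # the starting caller is not counted as "reached"
    return hops

for k in (1, 2, 3):
    print(k, sorted(reachable_within("alice", k)))
```

For illustration, if each person had about 100 distinct contacts, three hops could reach on the order of 100 × 100 × 100 = 1,000,000 people before accounting for overlap. This is why only a few "degrees" sweep in so much of a population.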

Big Data Applications

Document based - Content is analyzed

Graph analysis - Relationships between nodes are analyzed
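Document-based engines such as Lucene (which underlies both Solr and Elasticsearch) analyze content by building an inverted index, which maps each term to the documents that contain it. The toy version below assumes a lowercase-and-split analyzer that is far simpler than Lucene's analyzers. It supports only boolean AND queries, with no relevance scoring. The sample documents and helper names are hypothetical.

```python
import re
from collections import defaultdict

docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Solr is also built on Lucene",
    3: "Kibana visualizes Elasticsearch data",
}

def analyze(text):
    """Toy analyzer: lowercase and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

# Inverted index: term -> set of document ids containing that term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in analyze(text):
        index[term].add(doc_id)

def search_all(query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

print(sorted(search_all("built on lucene")))
print(sorted(search_all("elasticsearch")))
```

A query only touches the posting sets for its own terms. It never scans the documents, which is what makes this structure scale to large collections.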


Displaying Big Data Results

d3

Raphael

What is Elasticsearch?

A competing application stack to Hadoop/Solr/Pig/Hive
Hadoop has been the basis for Yahoo search since the mid-2000s
Hadoop is still the standard for over 90% of solutions


2004: Shay Banon creates Compass, the predecessor of Elasticsearch
2010: First release of Elasticsearch
2014: Elasticsearch raises $70 million in Series C funding

Other big data search apps
Sphinx (used by The Pirate Bay)
Xapian

Why NoSQL?

Business Objectives Satisfied by Hadoop/Elasticsearch stacks
Businesses evolve (different purpose, different business objectives)
Data collected by businesses evolves


Comparing Hadoop/Solr with Elasticsearch

Similarities


Document-based databases (as opposed to graph analysis databases)
NoSQL
Data fundamentally stored as key-value pairs
MapReduce (highly parallel data processing: mapping, sorting, reducing)
Application clusters
Commodity hardware rather than servers with redundant subsystems
Application stack allows deploying parts on different nodes
Based on the Lucene search engine
Collect data from multiple sources
Collect numerous data types
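The MapReduce model runs in three phases. The map phase emits key-value pairs, the shuffle/sort phase groups them by key, and the reduce phase combines each group. Word counting is the textbook illustration. The sketch below runs everything in one process, whereas a Hadoop cluster would run the map tasks in parallel on the nodes holding the data. The function names and sample records are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain

def map_phase(line):
    """Map: emit a (key, value) pair for every word in one input record."""
    return [(word, 1) for word in line.lower().split()]

def shuffle(pairs):
    """Shuffle/sort: group all values by key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)

records = ["big data big clusters", "data on commodity hardware"]
# On a cluster each record (or block of records) would be mapped on a different node.
mapped = chain.from_iterable(map_phase(r) for r in records)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call depends only on its own record and each reduce call only on its own key, both phases can be spread across commodity machines with no shared state.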

Comparing Hadoop/Solr with Elasticsearch

Differences



Hadoop stack                        ELK stack
Each app uses a different language  JSON and typically JavaScript
Schema must be created up front     Schema autogenerated if needed
Different protocols                 HTTP/JSON protocol
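Schema autogeneration means that the first time a field appears, Elasticsearch guesses its type from the JSON value (dynamic mapping). The sketch below is a simplified, hypothetical imitation. It assumes the type names used by Elasticsearch 5 and later (`long`, `float`, `boolean`, `text`, `object`). Real dynamic mapping can also detect dates (and, optionally, numeric strings) and handles arrays and type conflicts differently. The sketch also shows how a later document with a new field simply extends the mapping, which is the evolving-data point made under "Why NoSQL?".

```python
import json

def infer_type(value):
    """Guess a field type from a JSON value (simplified, hypothetical rules)."""
    if isinstance(value, bool):  # check bool first: in Python, bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")

def update_mapping(mapping, doc):
    """Add any unseen fields of doc to mapping; existing fields are left unchanged."""
    for field, value in doc.items():
        if field not in mapping:
            mapping[field] = infer_type(value)
    return mapping

mapping = {}
# Version 1 of the data: no schema declared up front.
update_mapping(mapping, json.loads('{"user": "tony", "age": 40, "active": true}'))
# Version 2 adds a field later; no migration step is needed.
update_mapping(mapping, json.loads('{"user": "ann", "score": 9.5}'))
print(mapping)
```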

Advanced Solutions

Search

Analysis

Artificial / Machine Intelligence

Advanced Solutions

Search

Similar to web search engines
faceted search
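Faceted search returns the matching documents together with counts per value of some field (genre, price band, and so on), so users can narrow the result set. In Elasticsearch this is typically written as a `terms` aggregation in the JSON request body, usually over a non-analyzed (keyword) field. The sketch below prints such a body and then computes the same facet counts in memory over a toy result set. The field names, the aggregation name `genres`, and the catalog are hypothetical.

```python
import json
from collections import Counter

# Hypothetical request body: a full-text match plus a terms aggregation on "genre".
request_body = {
    "query": {"match": {"title": "star"}},
    "aggs": {"genres": {"terms": {"field": "genre"}}},
}
print(json.dumps(request_body, indent=2))

# Toy catalog standing in for an index.
catalog = [
    {"title": "Star Wars", "genre": "sci-fi"},
    {"title": "Star Trek", "genre": "sci-fi"},
    {"title": "A Star Is Born", "genre": "drama"},
    {"title": "Jaws", "genre": "thriller"},
]

def faceted_search(term, facet_field):
    """Return matching docs plus per-value counts of facet_field among the hits."""
    hits = [d for d in catalog if term.lower() in d["title"].lower().split()]
    facets = Counter(d[facet_field] for d in hits)
    return hits, facets

hits, facets = faceted_search("star", "genre")
print(len(hits), facets.most_common())
```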

Advanced Solutions

Analysis

Extract meaning from data
Apply modeling to query results
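As one illustration of applying a model to query results, the sketch below takes a set of transaction amounts and flags outliers with a simple z-score rule. The amounts stand in, hypothetically, for what a query might return for a single account. This is the kind of first-pass screen a bank searching for fraud might start from. The data and the threshold of 2 standard deviations are assumptions. Real fraud models are far more involved.

```python
from statistics import mean, pstdev

# Hypothetical query result: transaction amounts for one account.
amounts = [42.0, 38.5, 51.0, 45.25, 39.0, 47.5, 980.0, 44.0]

def flag_outliers(values, threshold=2.0):
    """Return (index, value, z) for values more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [(i, v, (v - mu) / sigma) for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]

for i, v, z in flag_outliers(amounts):
    print(f"transaction {i}: {v:.2f} (z = {z:.1f})")
```

With small samples, a single large outlier inflates the standard deviation. For population z-scores over n values, no value can exceed sqrt(n - 1). This is why robust, median-based scores are often preferred in practice.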

Advanced Solutions

Artificial / Machine Intelligence

Decision-making based on generated inputs
Related: robotics; providing perfect, near-instantaneous recall of "experiences" to draw from

Q & A


nEXT Big Data - Sept 10 2015

By Tony Su
