UCSD nEXT
Sept 10, 2015
Tony Su
MCSE
openSUSE Ambassador
Big Data - Elasticsearch
ELK Demo Topology
Logstash Parser
Logstash Shipper
Redis
Netcat
Elasticsearch
Kibana
Marvel
Graphite
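This sketch assumes the usual arrangement for these components: a Logstash shipper collects raw lines, Redis buffers them, a Logstash parser turns each line into structured JSON, and Elasticsearch indexes the result for Kibana. It shows only the parser step, in plain Python rather than Logstash configuration. The log format, the `parse_line`/`to_bulk_body` names and the `logs` index are hypothetical. The output is newline-delimited JSON in the shape the Elasticsearch `_bulk` endpoint accepts: an action line, then the document. Older releases also expected a document `_type` in the action line.

```python
import json
import re

# Hypothetical log format for illustration: "<timestamp> <LEVEL> <message>"
LINE_PATTERN = re.compile(r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def parse_line(line):
    """Turn one raw log line into a structured event (a dict), roughly what a
    Logstash parser stage does with a grok pattern. Returns None if the line
    does not match the assumed format."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


def to_bulk_body(events, index="logs"):
    """Format events as newline-delimited JSON for the _bulk API:
    one action line followed by the document itself, per event."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    # The _bulk body must end with a newline.
    return "\n".join(lines) + "\n"


raw = [
    "2015-09-10T18:00:00Z INFO user login ok",
    "2015-09-10T18:00:05Z ERROR disk full on /var",
    "not a log line",
]
# Drop lines that fail to parse, as a parser stage might tag or discard them.
events = [e for e in (parse_line(r) for r in raw) if e is not None]
body = to_bulk_body(events)
print(body)
```

In the real pipeline this body would be sent over HTTP to the cluster. Here it is only printed, so the sketch runs without a server.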
Where do you see this kind of app used?
- Homeland Security telephone number metadata search
- Netflix (recommending selections to users)
- Personal Assistants (Siri, Cortana, Google Now)
- Web Search Engines (Google, Yahoo, Bing, etc.)
- Banks (searching for fraud)
- Sports Management
- Fantasy Sports
- Political Campaigns, particularly the Democratic Party in 2012
- ESPN FiveThirtyEight
Where does the data come from?
The industrialized world has been collecting data for decades without a clear understanding of what to do with it:
- Text
- Video
- Audio
- Biometrics
- Census data
- Private company data
- Financial data (government agencies such as the IRS, financial institutions such as banks, credit scoring companies)
- YouTube
- Facebook (500 terabytes per day in 2012, i.e. 500,000 gigabytes)
Sports Team Management
Player Sensors
Harvard Business Review
Watson on Jeopardy!
Possibly the seminal open source event
Against all-time human champions Ken Jennings and Brad Rutter
Feb 14, 2011
Metadata
"Data about Data"
Metadata is generated in voluminous amounts. Following any telephone conversation out slightly more than "3 degrees" (hops) potentially covers nearly everyone in North America, even under the current FISA rules, which restrict phone tapping to calls with at least one participant overseas.
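To see why a few hops reach so many people: if each person has on the order of c distinct contacts, the reachable set can grow roughly like c, then c², then c³ (minus overlaps). The sketch below counts everyone within k hops of a starting number using breadth-first search. The call graph is a small made-up example, and `within_k_hops` is a hypothetical helper, not part of any product shown in this talk.

```python
from collections import deque


def within_k_hops(graph, start, k):
    """Return the set of nodes reachable from `start` in at most `k` hops
    (excluding `start` itself), using breadth-first search."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


# Hypothetical call graph: each number maps to the numbers it exchanged calls with.
calls = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "F"],
    "D": ["B", "G"],
    "E": ["B"],
    "F": ["C", "H"],
    "G": ["D"],
    "H": ["F"],
}

for k in range(1, 4):
    print(k, sorted(within_k_hops(calls, "A", k)))
```

Because BFS discovers each node at its shortest distance first, the depth check is enough to stop exactly at k hops.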
Big Data Applications
Document-based: content is analyzed
Graph analysis: relationships between nodes are analyzed
Displaying Big Data Results
d3
Raphael
What is Elasticsearch?
A competitor to the Hadoop/Solr/Pig/Hive application stack,
which has been the basis for Yahoo search since the mid-2000s
and is still the standard for over 90% of solutions
2004: Shay Banon, the original creator, builds its predecessor, "Compass"
2010: First release of Elasticsearch
2014: Raises $70 million in Series C funding
Other big data search apps
Sphinx (The Pirate Bay)
Xapian
Why NoSQL?
Business Objectives Satisfied by Hadoop/Elasticsearch stacks
Businesses evolve (different purpose, different business objectives)
The data businesses collect evolves
Comparing Hadoop/Solr with Elasticsearch
Similarities
document-based databases (as opposed to graph analysis databases)
NoSQL
data fundamentally stored as key-value pairs
MapReduce (highly parallelized data processing, sorting, reducing)
application clusters
commodity hardware rather than servers with redundant subsystems
application stack allows deploying parts on different nodes
based on the Lucene search engine
collect data from multiple sources
collect numerous data types
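MapReduce, listed above as a shared trait, splits work into a map step that emits key-value pairs, a shuffle that groups pairs by key, and a reduce step that combines each group. The toy word count below runs all three phases in one process so the data flow is visible. It is a sketch of the general pattern, not Hadoop's Java API or Elasticsearch internals; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) key-value pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group all values emitted for the same key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


documents = ["big data search", "Elasticsearch search", "big clusters"]
# On a cluster, each document's map step could run on a different node.
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

The map and reduce functions hold no shared state, which is what lets a cluster of commodity machines run them in parallel.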
Comparing Hadoop/Solr with Elasticsearch
Differences
Hadoop stack vs. ELK stack
- Every app uses a different language vs. JSON and, typically, JavaScript
- Schema must be created vs. schema autogenerated if needed
- Different protocols vs. HTTP/JSON protocol
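The schema autogeneration mentioned in this comparison means Elasticsearch can guess field types from the first JSON document it sees (dynamic mapping). The sketch below imitates that idea in plain Python. It is a simplified assumption, not Elasticsearch's exact rules: the real behavior also detects dates, adds keyword sub-fields to strings, and differs between versions (older releases used `string` instead of `text`, for example). `infer_mapping` is a hypothetical name.

```python
import json


def infer_mapping(document):
    """Guess a field type for every key in a JSON document, loosely modeled
    on Elasticsearch dynamic mapping (simplified; see lead-in)."""
    properties = {}
    for field, value in document.items():
        # Check bool before int: in Python, bool is a subclass of int.
        if isinstance(value, bool):
            properties[field] = {"type": "boolean"}
        elif isinstance(value, int):
            properties[field] = {"type": "long"}
        elif isinstance(value, float):
            properties[field] = {"type": "float"}
        elif isinstance(value, str):
            properties[field] = {"type": "text"}
        elif isinstance(value, dict):
            # Nested objects get their own "properties" block.
            properties[field] = infer_mapping(value)
        # Lists and nulls are skipped in this sketch.
    return {"properties": properties}


doc = json.loads('{"user": "tsu", "age": 42, "score": 9.5, "admin": false,'
                 ' "geo": {"city": "San Diego"}}')
print(json.dumps(infer_mapping(doc), indent=2))
```

With the Hadoop stack this schema would be written by hand up front. Here it falls out of the data, which is convenient but can lock in a wrong guess for a field's type.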
Advanced Solutions
Search
Analysis
Artificial / Machine Intelligence
Advanced Solutions
Search
Similar to web search engines
Faceted search
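Lucene, like web search engines, finds documents through an inverted index that maps each term to the documents containing it. Faceted search then counts the matches per category so users can narrow results. Elasticsearch exposes such counts through its `terms` aggregation. The sketch below is a minimal in-memory version of both ideas, assuming a hypothetical product catalog with a `category` facet; it is not how Lucene stores its index.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in doc["text"].lower().split():
            index[term].add(doc_id)
    return index


def search(index, docs, query, facet_field):
    """Return ids of documents containing every query term (AND semantics),
    plus a count of the matches per value of `facet_field`."""
    terms = query.lower().split()
    if not terms:
        return [], Counter()
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    facets = Counter(docs[d][facet_field] for d in hits)
    return sorted(hits), facets


# Hypothetical product catalog; "category" is the facet.
docs = {
    1: {"text": "red running shoes", "category": "footwear"},
    2: {"text": "red rain jacket", "category": "outerwear"},
    3: {"text": "blue running shoes", "category": "footwear"},
    4: {"text": "red wool socks", "category": "footwear"},
}
index = build_index(docs)
hits, facets = search(index, docs, "red", "category")
print(hits, dict(facets))
```

A query for "red" matches documents 1, 2 and 4, and the facet counts (two footwear, one outerwear) are what a UI would show as filter options.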
Advanced Solutions
Analysis
Extract meaning from data
Apply modeling to query results
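As one simple example of applying a model to query results, and echoing the bank fraud use case from earlier, the sketch below flags transaction amounts whose z-score (distance from the mean in standard deviations) exceeds a threshold. The amounts, the threshold of 2 and the `flag_outliers` name are hypothetical; real fraud detection uses far richer models.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=2.0):
    """Return the values whose z-score (distance from the mean, measured in
    population standard deviations) exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts returned by a query for one account.
amounts = [20, 25, 22, 18, 24, 21, 23, 19, 500]
print(flag_outliers(amounts))
```

One caveat: a large outlier inflates the standard deviation it is measured against, so median-based measures are often preferred when outliers are expected.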
Advanced Solutions
Artificial / Machine Intelligence
Decision-making based on generated inputs
Related: robotics, and providing perfect, near-instantaneous recall of "experiences" to draw from
Q & A