CMPT456 Assignment 4 Tutorial
Presenter: Weiyuan Wu <youngw@sfu.ca>
Instructor: Jian Pei
SFU CMPT 456, Fall 2019
Agenda
- Elasticsearch
- Docker
- Demo
Users are forever searching...
...that makes search functionality one of the most important things
Handling search is difficult
- Data storage
- Heterogeneous data
- Ambiguous queries
- ....
Lucene?
Lucene
Elasticsearch
What is Elasticsearch?
Elasticsearch is a distributed, open-source search and analytics engine for all types of data.
Elasticsearch Internals
[Figure: Elasticsearch holds several indices (Index: Wikipedia, Index: ...., Index: ....); each index is split into shards (Shard1, Shard2, Shard3); each shard is a Lucene index holding documents]
- Index: stores a collection of documents
- Shards: provide parallelism and fault tolerance
- Lucene Index: enables fast searching
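The three layers above can be sketched as a toy model. This is only an illustration, not how Elasticsearch is implemented: the class names `ToyShard` and `ToyIndex`, the shard count of 3, and the use of `zlib.crc32` as the routing hash are all assumptions made for this example.

```python
import re
import zlib
from collections import defaultdict

NUM_SHARDS = 3  # assumption: matches Shard1..Shard3 on the slide


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Route a document id to a shard (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


class ToyShard:
    """One shard: stored documents plus a term -> doc ids inverted index."""

    def __init__(self):
        self.docs = {}
        self.inverted = defaultdict(set)

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        # Split into lowercase words and record which doc contains each word
        for term in re.findall(r"\w+", text.lower()):
            self.inverted[term].add(doc_id)

    def search(self, term):
        # A lookup in the inverted index avoids scanning every document
        return set(self.inverted.get(term.lower(), set()))


class ToyIndex:
    """An index: a collection of documents spread over several shards."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [ToyShard() for _ in range(num_shards)]

    def add(self, doc_id, text):
        self.shards[shard_for(doc_id, len(self.shards))].add(doc_id, text)

    def search(self, term):
        # Ask every shard and merge the results
        hits = set()
        for shard in self.shards:
            hits |= shard.search(term)
        return hits


wikipedia = ToyIndex()
wikipedia.add("1", "Lucene is a search library")
wikipedia.add("2", "Elasticsearch is built on Lucene")
wikipedia.add("3", "Docker runs containers")
print(sorted(wikipedia.search("lucene")))  # ['1', '2']
```

Because each shard answers independently, the per-shard searches could run in parallel on different machines, which is the point of sharding.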
Bigger Picture
Speak JSON
JSON
- JSON - JavaScript Object Notation
- Supported types: string, number, array, object (dict), boolean and null
- Structured
- Easy for machines to read
- Human readable
E.g.
Text: We have 107 students enrolled in CMPT456 in Fall 2019
Structured:
{
"class": "CMPT456",
"number of students": 107,
"year": "2019",
"semester": "Fall"
}
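As a small illustration of "easy for machines to read", the example record can be serialized and parsed back with Python's standard `json` module. The variable names here are chosen for this sketch only.

```python
import json

# The structured record from the example, as a Python dict
record = {
    "class": "CMPT456",
    "number of students": 107,
    "year": "2019",
    "semester": "Fall",
}

# Serialize to JSON text, then parse that text back into a dict
text = json.dumps(record, indent=2)
print(text)

parsed = json.loads(text)
# Types survive the round trip: 107 is still a number, "2019" still a string
print(parsed["number of students"] + 1)
print(type(parsed["year"]).__name__)
```

Note that "year" was written as a string in the example, so it comes back as a string; storing it as a number would be a separate modelling choice.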
Document
- In Elasticsearch terms, a document is a JSON object
- E.g.
{
"name":"Weiyuan",
"email": "youngw@sfu.ca",
"about": "TA for CMPT456",
"interests": ["sports","music"]
}
{
"title":"DeepDB: Learn from Data, not from Queries!",
"abstract": "The typical approach for learned DBMS...",
"body": "Motivation. Deep Neural Networks (DNNs)..."
}
Example: Insert Document
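from elasticsearch import Elasticsearch

# Assumes Elasticsearch is reachable on localhost:9200 (e.g. the standby container)
es = Elasticsearch("http://localhost:9200")

# The document is a plain Python dict; the client sends it as JSON
d1 = {
"name":"Weiyuan",
"about": "TA for CMPT456",
"email": "youngw@sfu.ca",
"interests": ["sports","music"],
}
es.index(index="sfu", id=1, body=d1)  # store d1 under id 1 in the index "sfu"
print(es.get(index="sfu", id=1))  # fetch the stored document back by id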
-------------------------------------
{
'_index': 'sfu',
'_type': '_doc',
'_id': '1',
'_version': 1,
'_seq_no': 0,
'_primary_term': 1,
'found': True,
'_source': {
'name': 'Weiyuan',
'about': 'TA for CMPT456',
'email': 'youngw@sfu.ca',
'interests': ['sports', 'music']
}
}
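The original document sits under the `_source` key of the response; the other keys are metadata. A minimal sketch of pulling it out, using the response shown above as a plain dict (the helper name `extract_source` is hypothetical, not part of the client library):

```python
# The response from the example above, copied as a plain dict
response = {
    "_index": "sfu",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "_seq_no": 0,
    "_primary_term": 1,
    "found": True,
    "_source": {
        "name": "Weiyuan",
        "about": "TA for CMPT456",
        "email": "youngw@sfu.ca",
        "interests": ["sports", "music"],
    },
}


def extract_source(resp):
    """Return the stored document, or None if the response says not found."""
    if not resp.get("found"):
        return None
    return resp["_source"]


doc = extract_source(response)
print(doc["name"], doc["interests"])
```

Depending on the client version, a missing id may raise an exception instead of returning `found: False`, so the check above mainly matters when handling raw responses.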
Docker
- Assignment 2 already covered Docker basics
- This tutorial elaborates on these two commands
docker run -it --rm -p9200:9200 -p9300:9300 wooya/cmpt456a4 standby
docker run -it --rm -v$PWD/src:/workdir/src wooya/cmpt456a4 demo.py
Docker
[Figure: Elasticsearch listening on Port 9200 and Port 9300]
Normally, your program talks to Elasticsearch on port 9200 on your PC.
Docker
[Figure: Elasticsearch inside a Docker container listening on Port 9200 and Port 9300; a program on the PC tries Port 9200]
Communication failure
Docker
[Figure: Port 9200 and Port 9300 on the PC mapped to Port 9200 and Port 9300 inside the Docker container]
docker run -it --rm -p9200:9200 -p9300:9300 wooya/cmpt456a4 standby
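One way to check whether the port mapping works is to try opening a TCP connection to port 9200 from the host. The sketch below uses only the standard library; the helper name `port_is_open` and the one-second timeout are assumptions for this example.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not open"
        return False


# With the standby container running and -p9200:9200 given, this should
# report True; without the -p flag it is expected to report False.
print("9200 reachable:", port_is_open("localhost", 9200))
```

This only shows that something accepts connections on the port, not that it is Elasticsearch.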
Docker
docker run -it --rm -v$PWD/src:/workdir/src wooya/cmpt456a4 demo.py
src/
|-- demo.py
|-- q1.py
|-- q2.py
|-- ...
+-- assignment4.py
/workdir/src
|-- demo.py
+-- ...
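The `-v` flag takes the form `host_path:container_path`, and the host side should be an absolute path, which is why the command uses `$PWD`. As a sketch, the same command can be assembled in Python; the helper `build_run_command` is hypothetical, and the image name and `/workdir/src` are taken from the command above.

```python
from pathlib import Path

IMAGE = "wooya/cmpt456a4"      # image name from the command above
CONTAINER_SRC = "/workdir/src"  # mount point inside the container


def build_run_command(host_src, script):
    """Build the argument list for `docker run` with src/ mounted."""
    # Resolve to an absolute path, like $PWD/src in the shell command
    host_path = Path(host_src).resolve()
    return [
        "docker", "run", "-it", "--rm",
        "-v", f"{host_path}:{CONTAINER_SRC}",
        IMAGE, script,
    ]


cmd = build_run_command("src", "demo.py")
print(" ".join(cmd))
```

The list could be passed to `subprocess.run(cmd)` to start the container. Because the directory is mounted rather than copied, edits to `src/demo.py` on the PC are visible inside the container on the next run.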
Demo
- Download the source code from GitLab
- Start Elasticsearch in standby mode
- Insert documents into Elasticsearch using Python
- Search for the document
- Put everything into demo
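The search step might look like the sketch below. It is not the contents of the actual demo.py: the helper names `build_match_query` and `search_hits` are hypothetical, and a simple `match` query on one field is assumed. The `es` argument is expected to be a client object with a `search(index=..., body=...)` method, such as the one used in the insert example.

```python
def build_match_query(field, text):
    """Build a full-text `match` query body for one field."""
    return {"query": {"match": {field: text}}}


def search_hits(es, index, field, text):
    """Run the query and return the matching documents.

    `es` is any client exposing search(index=..., body=...).
    """
    resp = es.search(index=index, body=build_match_query(field, text))
    # Each hit wraps the stored document under "_source"
    return [hit["_source"] for hit in resp["hits"]["hits"]]


# The query body that would be sent for the document inserted earlier
print(build_match_query("about", "CMPT456"))
```

With Elasticsearch running in standby mode, calling `search_hits(es, "sfu", "about", "CMPT456")` would be expected to return the document inserted earlier, once it has been indexed.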
Q & A
CMPT456A4 Tutorial
By Weiyüen Wu