CMPT456 Assignment 4 Tutorial

Presenter: Weiyuan Wu <youngw@sfu.ca>

Instructor: Jian Pei

SFU CMPT 456, Fall 2019

Agenda

  • Elasticsearch
  • Docker
  • Demo

Users are forever searching...

...which makes search functionality one of the most important features

Handling search is difficult

  • Data storage
  • Heterogeneous data
  • Ambiguous queries
  • ...

Lucene?

Lucene

Elasticsearch

Elasticsearch is a distributed, open-source search and analytics engine for all types of data.

What is Elasticsearch?

Elasticsearch Internals

[Figure: Elasticsearch holds several indices (Index: Wikipedia, Index: ...); each index is split into shards (Shard1, Shard2, Shard3), and each shard stores its documents in a Lucene index]

  • Index: stores a collection of documents
  • Shards: split an index so it can be processed in parallel and, with replica copies, survive node failures
  • Lucene index: makes searching fast
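To make the three roles concrete, here is a minimal, hypothetical in-memory sketch (not Elasticsearch code). Documents are routed to one of three shards by hashing their id, and each shard keeps an inverted index that maps terms to document ids, which is the core data structure behind a Lucene index. Elasticsearch routes roughly by a murmur3 hash of the routing value (the document id by default) modulo the number of primary shards; `zlib.crc32` is swapped in here only for illustration, and `ToyIndex`, `shard_for` and `tokenize` are made-up names.

```python
import re
import zlib
from collections import defaultdict

NUM_SHARDS = 3  # mirrors Shard1..Shard3 in the figure


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard for a document: hash(id) % number of shards.
    Elasticsearch uses a murmur3 hash; crc32 is only a stand-in here."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def tokenize(text):
    """Crude analyzer: lowercase, then keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """Hypothetical in-memory index. Documents are spread over shards, and
    each shard keeps its own inverted index (term -> set of doc ids)."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.docs = [{} for _ in range(num_shards)]
        self.postings = [defaultdict(set) for _ in range(num_shards)]

    def add(self, doc_id, text):
        shard = shard_for(doc_id, len(self.docs))
        self.docs[shard][doc_id] = text
        for term in tokenize(text):
            self.postings[shard][term].add(doc_id)

    def search(self, term):
        # Every shard is searched independently (in parallel, in a real cluster),
        # then the per-shard hits are merged.
        hits = set()
        for postings in self.postings:
            hits |= postings.get(term.lower(), set())
        return hits


idx = ToyIndex()
idx.add("1", "TA for CMPT456")
idx.add("2", "DeepDB: Learn from Data, not from Queries!")
print(sorted(idx.search("CMPT456")))  # ['1']
print(sorted(idx.search("queries")))  # ['2']
```

Looking a term up in the inverted index is a dictionary access per shard, instead of a scan over every document's text, which is why the Lucene index makes search fast.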

Bigger Picture

Speak JSON

JSON

  • JSON - JavaScript Object Notation
  • Supported types: string, number, boolean, null, array, and object (a dict in Python)
  • Structured

E.g.

Text: We have 107 students enrolled in CMPT456 in Fall 2019

{
    "class": "CMPT456",
    "number of students": 107,
    "year": "2019",
    "semester": "Fall"
}
  • Easy for machines to parse
  • Human-readable
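In Python, the standard `json` module converts between JSON text and Python objects: objects become `dict`, arrays `list`, strings `str`, numbers `int` or `float`, `true`/`false` become `bool`, and `null` becomes `None`. A short sketch using the example above:

```python
import json

# The JSON object from the example above, as text
text = """
{
    "class": "CMPT456",
    "number of students": 107,
    "year": "2019",
    "semester": "Fall"
}
"""

record = json.loads(text)                 # JSON object -> Python dict
print(type(record).__name__)              # dict
print(record["number of students"] + 1)   # 108 (a JSON number becomes an int)
print(type(record["year"]).__name__)      # str ("2019" is quoted, so it stays a string)

# The other JSON types: true/false -> bool, null -> None, array -> list
print(json.loads('[true, null, 1.5, "x", {}]'))  # [True, None, 1.5, 'x', {}]

# Python dict -> compact JSON text
print(json.dumps(record))
# {"class": "CMPT456", "number of students": 107, "year": "2019", "semester": "Fall"}
```

Note that `"year"` is a string only because it is quoted in the JSON; the structure, not the field name, decides the type.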

Structured:

Document

  • In Elasticsearch terms, a document is a JSON object
  • E.g. 
{
    "name":"Weiyuan",
    "email": "youngw@sfu.ca",
    "about": "TA for CMPT456",
    "interests": ["sports","music"]
}
{
    "title":"DeepDB: Learn from Data, not from Queries!",
    "abstract": "The typical approach for learned DBMS...",
    "body": "Motivation. Deep Neural Networks (DNNs)..."
}

Example: Insert Document

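from elasticsearch import Elasticsearch

# Client for Elasticsearch on localhost:9200 (the client's default address)
es = Elasticsearch()

d1 = {
    "name": "Weiyuan",
    "about": "TA for CMPT456",
    "email": "youngw@sfu.ca",
    "interests": ["sports", "music"],
}

# Store d1 in the "sfu" index under id 1
es.index(index="sfu", id=1, body=d1)

# Fetch the document back by its id
es.get(index="sfu", id=1)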

-------------------------------------
{
  '_index': 'sfu',
  '_type': '_doc',
  '_id': '1',
  '_version': 1,
  '_seq_no': 0,
  '_primary_term': 1,
  'found': True,
  '_source': {
    'name': 'Weiyuan',
    'about': 'TA for CMPT456',
    'email': 'youngw@sfu.ca',
    'interests': ['sports', 'music']
  }
}
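Under the hood, the Python client speaks JSON over HTTP to port 9200. The sketch below builds (but does not send) the REST requests that roughly correspond to `es.index(...)` and `es.get(...)`, using only the standard library: `PUT /sfu/_doc/1` with the document as a JSON body, and `GET /sfu/_doc/1`. The address `http://localhost:9200` is an assumption (Elasticsearch running on this machine), and the client may set its own headers; this only shows the shape of the equivalent calls.

```python
import json
from urllib.request import Request

ES = "http://localhost:9200"  # assumed address: Elasticsearch on this machine

d1 = {
    "name": "Weiyuan",
    "about": "TA for CMPT456",
    "email": "youngw@sfu.ca",
    "interests": ["sports", "music"],
}

# Roughly what es.index(index="sfu", id=1, body=d1) sends: PUT /sfu/_doc/1 with a JSON body
index_req = Request(
    f"{ES}/sfu/_doc/1",
    data=json.dumps(d1).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)

# Roughly what es.get(index="sfu", id=1) sends: GET /sfu/_doc/1
get_req = Request(f"{ES}/sfu/_doc/1", method="GET")

for req in (index_req, get_req):
    print(req.get_method(), req.full_url)
# PUT http://localhost:9200/sfu/_doc/1
# GET http://localhost:9200/sfu/_doc/1

# To actually send one (needs a running Elasticsearch):
# from urllib.request import urlopen
# print(json.load(urlopen(get_req)))
```

The JSON that comes back from the GET request has the same fields as the dictionary shown above (`_index`, `_id`, `_version`, `found`, `_source`, ...).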

Docker

  • Assignment 2 already covered the Docker basics
  • This tutorial explains these two commands:
docker run -it --rm -p9200:9200 -p9300:9300 wooya/cmpt456a4 standby
docker run -it --rm -v$PWD/src:/workdir/src wooya/cmpt456a4 demo.py
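Every option in these two commands can be read off mechanically. As a minimal sketch (not a general Docker CLI parser), the hypothetical helper `explain` below splits each command the way a shell would and labels the options used in this tutorial: `-it` (`-i` keeps standard input open, `-t` allocates a terminal), `--rm` (delete the container when it exits), `-pHOST:CONTAINER` (publish a container port on the host) and `-vHOST:CONTAINER` (mount a host directory into the container). Whatever follows the image name `wooya/cmpt456a4` is passed on to the container. A real shell expands `$PWD` before Docker sees it, while `shlex.split` leaves it as literal text.

```python
import shlex

COMMANDS = [
    "docker run -it --rm -p9200:9200 -p9300:9300 wooya/cmpt456a4 standby",
    "docker run -it --rm -v$PWD/src:/workdir/src wooya/cmpt456a4 demo.py",
]


def explain(command):
    """Hypothetical helper: label the options used in this tutorial.
    Handles only -it, --rm, -pHOST:CONTAINER and -vHOST:CONTAINER (value attached)."""
    tokens = shlex.split(command)[2:]  # drop "docker run"
    info = {"interactive_tty": False, "remove_on_exit": False,
            "ports": [], "volumes": [], "image": None, "args": []}
    for i, tok in enumerate(tokens):
        if tok == "-it":
            info["interactive_tty"] = True  # -i keeps stdin open, -t allocates a terminal
        elif tok == "--rm":
            info["remove_on_exit"] = True   # delete the container when it exits
        elif tok.startswith("-p"):
            host, container = tok[2:].split(":")
            info["ports"].append((int(host), int(container)))  # host port -> container port
        elif tok.startswith("-v"):
            host, container = tok[2:].split(":")
            info["volumes"].append((host, container))  # host dir -> mount point in container
        else:
            # First non-option token is the image; everything after it goes to the container
            info["image"], info["args"] = tok, tokens[i + 1:]
            break
    return info


for cmd in COMMANDS:
    print(explain(cmd))
```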

Docker

[Figure: Elasticsearch listening on ports 9200 and 9300]

Normally, your program talks to Elasticsearch on port 9200 of your PC.

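Docker

[Figure: Elasticsearch now runs inside a Docker container and listens on the container's ports 9200 and 9300; your program still calls port 9200 on the PC, so the communication fails]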
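A quick way to see whether the failure is at the network level is to try opening a TCP connection to port 9200 from the PC. A minimal sketch with Python's standard `socket` module; `port_is_open` is a hypothetical helper, and a successful connection only shows that *something* is listening on the port, not that it is Elasticsearch.

```python
import socket


def port_is_open(host="localhost", port=9200, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    if port_is_open("localhost", 9200):
        print("Something is listening on localhost:9200.")
    else:
        print("Nothing on localhost:9200; was the container started with -p9200:9200?")
```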

Docker

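[Figure: -p9200:9200 -p9300:9300 publishes the container's ports 9200 and 9300 on the same port numbers of the PC, so port 9200 on the PC now reaches Elasticsearch]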

docker run -it --rm -p9200:9200 -p9300:9300 wooya/cmpt456a4 standby

Docker

docker run -it --rm -v$PWD/src:/workdir/src wooya/cmpt456a4 demo.py

src/ (on your PC)
  |-- demo.py
  |-- q1.py
  |-- q2.py
  |-- ...
  +-- assignment4.py

/workdir/src (inside the container)
  |-- demo.py
  +-- ...
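Because `-v` creates a bind mount, there is only one copy of each file: editing `src/q1.py` on your PC changes `/workdir/src/q1.py` inside the container immediately, and vice versa. The path correspondence is just a prefix swap. A minimal sketch in which `/home/student/a4` is a made-up stand-in for your `$PWD` and `to_container` is a hypothetical helper:

```python
from pathlib import PurePosixPath

HOST_SRC = PurePosixPath("/home/student/a4/src")  # hypothetical value of $PWD/src
CONTAINER_SRC = PurePosixPath("/workdir/src")      # mount point from -v$PWD/src:/workdir/src


def to_container(host_path):
    """Where a file under the mounted host directory appears inside the container."""
    relative = PurePosixPath(host_path).relative_to(HOST_SRC)  # ValueError if not under src/
    return CONTAINER_SRC / relative


print(to_container("/home/student/a4/src/demo.py"))  # /workdir/src/demo.py
print(to_container("/home/student/a4/src/q1.py"))    # /workdir/src/q1.py
```

Files outside `src/` are not visible in the container at all, which is why every script you want to run with `demo.py`-style commands must live in `src/`.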

Demo

  • Download the source code from GitLab
  • Start Elasticsearch in standby mode
  • Insert documents into Elasticsearch using Python
  • Search for the documents
  • Put everything together in demo.py
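The search step could look roughly like the sketch below. It builds the request body of a full-text `match` query on the `about` field; the real call is shown as a comment and assumes an elasticsearch-py client named `es`, as in the insert example earlier. So that the example runs without a cluster, the hypothetical helper `toy_match` imitates only the idea of `match` (split text into terms, match if any query term is shared, since `match` ORs its terms by default); it is not how Elasticsearch analyzes or scores documents. The query text `"cmpt456 tutorial"` is made up.

```python
import re

d1 = {
    "name": "Weiyuan",
    "about": "TA for CMPT456",
    "email": "youngw@sfu.ca",
    "interests": ["sports", "music"],
}

# Request body for a full-text "match" query on the "about" field
query = {"query": {"match": {"about": "cmpt456 tutorial"}}}

# With a running Elasticsearch and the Python client, roughly:
#   res = es.search(index="sfu", body=query)
#   for hit in res["hits"]["hits"]:
#       print(hit["_score"], hit["_source"])


def words(s):
    """Lowercase and split into word tokens (a crude stand-in for an analyzer)."""
    return set(re.findall(r"\w+", s.lower()))


def toy_match(doc, field, text):
    """Rough local imitation of "match": the document matches if its field shares
    at least one word with the query text. Scoring and real analyzers are ignored."""
    return bool(words(str(doc.get(field, ""))) & words(text))


field, text = next(iter(query["query"]["match"].items()))
print(toy_match(d1, field, text))  # True: "cmpt456" appears in d1["about"]
```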

Q & A
