Introduction to Apache Solr

Son Le - IBM Technical Evangelist

E: sonle@sg.ibm.com

T: @thsonvt

Agenda

  • About Apache Solr and Its Key Features

  • Indexing and Data Model in Apache Solr

  • Demo of Apache Solr Web Admin UI

  • Apache Solr in IBM Watson Retrieve and Rank

Big Data Landscape

Macro Trends Driving NoSQL Technology

Introduction

Source: Scaling Apache Solr (safaribooksonline.com)

Apache Solr Architecture

Source: Solr in Action (safaribooksonline.com)

Apache Solr Main Benefits

  • Scalable: Solr scales by distributing work (indexing and query processing) across multiple servers in a cluster.

  • Ready to deploy: Solr is open source, easy to install and configure, and ships with a preconfigured example to help you get started.

  • Optimized for search: Solr is fast and can execute complex queries in well under a second, often in only tens of milliseconds.
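The "Scalable" point can be made concrete with a small simulation. This is a minimal sketch, assuming a fixed number of shards: documents are routed to a shard by hashing their id (the indexing side), and each shard's hits for a query are merged into one global ranking (the query side). It is not SolrCloud code. SolrCloud's default compositeId router hashes ids with MurmurHash3 into per-shard hash ranges; zlib.crc32 modulo the shard count is swapped in here only to keep the example dependency-free. The function names route, distribute, and merge_top_k are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple


def route(doc_id: str, num_shards: int) -> int:
    """Pick the shard for a document id (crc32 stands in for MurmurHash3)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def distribute(doc_ids: List[str], num_shards: int) -> Dict[int, List[str]]:
    """Indexing side: group document ids by the shard that will store them."""
    shards: Dict[int, List[str]] = {i: [] for i in range(num_shards)}
    for doc_id in doc_ids:
        shards[route(doc_id, num_shards)].append(doc_id)
    return shards


def merge_top_k(per_shard_hits: List[List[Tuple[str, float]]], k: int) -> List[Tuple[str, float]]:
    """Query side: gather (doc_id, score) hits from every shard, keep the global top k."""
    all_hits = [hit for hits in per_shard_hits for hit in hits]
    return sorted(all_hits, key=lambda hit: hit[1], reverse=True)[:k]


if __name__ == "__main__":
    print(distribute([f"doc-{n}" for n in range(10)], num_shards=3))
    shard_results = [[("a", 2.5), ("b", 0.7)], [("c", 1.9)], [("d", 3.1), ("e", 0.2)]]
    print(merge_top_k(shard_results, k=3))  # [('d', 3.1), ('a', 2.5), ('c', 1.9)]
```

Because the hash is deterministic, the same id always lands on the same shard, so updates to a document reach the copy that already holds it.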

Apache Solr Main Benefits

  • Large volumes of documents: Solr is designed to handle indexes containing many millions of documents.

  • Text-centric: Solr is optimized for searching natural-language text, such as emails, web pages, resumes, PDF documents, and social content such as tweets or blog posts.

  • Results sorted by relevance: Solr returns documents ranked by how relevant each one is to the user’s query.
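Relevance ranking can be illustrated with BM25, which Lucene (and therefore Solr) has used as its default similarity since version 6. The sketch below implements the classic BM25 formula with Lucene's default parameters (k1 = 1.2, b = 0.75) over a toy in-memory collection. The whitespace tokenizer, the sample documents, and the function name bm25_rank are assumptions for illustration; real Solr computes scores per field from statistics stored in the index.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

K1 = 1.2  # term-frequency saturation (Lucene's default)
B = 0.75  # strength of document-length normalization (Lucene's default)


def tokenize(text: str) -> List[str]:
    # Assumption: naive lowercase whitespace split instead of a Solr analyzer.
    return text.lower().split()


def bm25_rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score every document against the query and sort by descending BM25 score."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores: Dict[str, float] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        length_norm = K1 * (1 - B + B * len(toks) / avg_len)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # Lucene-style IDF, which stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (K1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "1": "solr is a search server",
        "2": "solr solr scales search",
        "3": "databases store rows",
    }
    for doc_id, score in bm25_rank("solr search", docs):
        print(doc_id, round(score, 3))
```

Here document "2" ranks above document "1" because it repeats "solr" and is shorter, and document "3" scores zero because it contains neither query term.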

Apache Solr Key Features

Overview of Indexing Process

Source: Solr in Action (safaribooksonline.com)

Data Model in Apache Solr

Document

The basic and atomic unit of information in Solr. It is a container of fields and values that belong to a given entity of your domain model (for example, a book, car, or person).
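To make the definition concrete, here is one way a book document could be sent to Solr. This is a sketch, assuming a local Solr on its default port 8983 and a hypothetical core named books whose schema accepts id, title, and a multi-valued author field. It relies on Solr's JSON update format, where a JSON array of objects posted to the /update handler adds one document per object. The helper build_update_request is hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Hypothetical documents: the field names must match (or be accepted by)
# the target core's schema. A multi-valued field is sent as a JSON array.
books: List[Dict[str, Any]] = [
    {
        "id": "book-1",
        "title": "Solr in Action",
        "author": ["Trey Grainger", "Timothy Potter"],
    },
]


def build_update_request(base_url: str, core: str, docs: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build, but do not send, a POST of the documents to Solr's /update handler.

    commit=true makes the documents searchable immediately, which is handy for
    a demo but usually too costly to request on every update in production.
    """
    return urllib.request.Request(
        f"{base_url}/solr/{core}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_update_request("http://localhost:8983", "books", books)
    print(req.get_method(), req.full_url)
    # With a running Solr and an existing core named "books", send it with:
    # urllib.request.urlopen(req)
```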

Data Model in Apache Solr

Inverted Index
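An inverted index maps each term to the documents that contain it, which is what lets Solr answer a term query without scanning every document. The sketch below is a deliberately simplified model under stated assumptions: analysis is just lowercasing plus splitting on non-word characters, and each postings list keeps only document ids, whereas Lucene's real postings also record term frequencies, positions, and offsets. The names build_inverted_index and search_all are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map every term to the sorted ids of the documents containing it."""
    postings: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis (assumption): lowercase, then take runs of word characters.
        for term in re.findall(r"\w+", text.lower()):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index: Dict[str, List[int]], *terms: str) -> List[int]:
    """AND query: intersect the postings lists of every query term."""
    result: Optional[Set[int]] = None
    for term in terms:
        ids = set(index.get(term.lower(), []))
        result = ids if result is None else result & ids
    return sorted(result or [])


if __name__ == "__main__":
    docs = {1: "Solr is built on Lucene", 2: "Lucene is a search library", 3: "Solr is a search server"}
    index = build_inverted_index(docs)
    print(index["lucene"])                       # [1, 2]
    print(search_all(index, "Solr", "search"))   # [3]
```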

Data Model in Apache Solr

Field Type

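One of the top-level entities declared in a Solr schema. A field type is declared with the <fieldType> element, which gives the type a name and an implementation class (for example, solr.TextField) and, for text types, configures the analyzers applied to field values.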
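The structure of a <fieldType> declaration, and how a <field> refers to it by name, can be inspected with a few lines of Python. The schema below is a simplified fragment modeled loosely on the text_general type in Solr's default configset (the real one has separate index and query analyzers and more filters); the title field and the helper names describe_field_types and field_to_type are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SCHEMA = """
<schema name="example" version="1.6">
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text_general" indexed="true" stored="true"/>
</schema>
"""


def describe_field_types(schema_xml: str) -> Dict[str, dict]:
    """Map each field type name to its class and its tokenizer/filter chain."""
    root = ET.fromstring(schema_xml.strip())
    types = {}
    for ft in root.iter("fieldType"):
        # Document order keeps the tokenizer before the filters that follow it.
        chain = [el.get("class") for el in ft.iter() if el.tag in ("tokenizer", "filter")]
        types[ft.get("name")] = {"class": ft.get("class"), "analysis": chain}
    return types


def field_to_type(schema_xml: str) -> Dict[str, str]:
    """Map each declared field to the name of the field type it uses."""
    root = ET.fromstring(schema_xml.strip())
    return {f.get("name"): f.get("type") for f in root.iter("field")}


if __name__ == "__main__":
    print(describe_field_types(SCHEMA))
    print(field_to_type(SCHEMA))
```

A string field such as id has no analysis chain, so its value is indexed as a single token, while title is tokenized and lowercased.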

Data Model in Apache Solr

Tokenizer

Breaks an incoming character stream into one or more tokens according to specific criteria (for example, splitting on whitespace).

Example: the input text "I'm writing a simple text"
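Running the example sentence through two different tokenization rules shows why the choice of tokenizer matters. The functions below are rough regular-expression approximations written for this illustration, not Lucene's classes: whitespace_tokenize splits only on whitespace, in the spirit of Solr's WhitespaceTokenizerFactory, while letter_tokenize treats any non-letter as a separator, in the spirit of LetterTokenizerFactory. Each token carries its start and end character offsets, as Lucene tokens do.

```python
import re
from typing import List, Tuple

Token = Tuple[str, int, int]  # (token text, start offset, end offset)


def whitespace_tokenize(text: str) -> List[Token]:
    """A token is any maximal run of non-whitespace characters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def letter_tokenize(text: str) -> List[Token]:
    """A token is any maximal run of letters; punctuation and digits split words."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"[^\W\d_]+", text)]


if __name__ == "__main__":
    text = "I'm writing a simple text"
    print([t for t, _, _ in whitespace_tokenize(text)])  # ["I'm", 'writing', 'a', 'simple', 'text']
    print([t for t, _, _ in letter_tokenize(text)])      # ['I', 'm', 'writing', 'a', 'simple', 'text']
```

The apostrophe is the only difference: the whitespace rule keeps "I'm" whole, while the letter rule splits it into "I" and "m".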

Data Model in Apache Solr

Analyzer

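Examines the text of a field and generates a token stream. An analyzer is usually built as a chain: optional character filters, exactly one tokenizer, and zero or more token filters.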
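The chain idea can be sketched in a few lines: one tokenizer produces tokens, and each filter transforms the token list in turn, so filter order matters. This is a simplified, hypothetical model of an <analyzer> (character filters are omitted and tokens are plain strings), and the three-word stop list is an assumption, not Solr's default stopwords.txt.

```python
from typing import Callable, Iterable, List

STOP_WORDS = {"a", "an", "the"}  # hypothetical stop list for this sketch

Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def stop_filter(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text: str, tokenizer: Tokenizer, filters: Iterable[TokenFilter]) -> List[str]:
    """Run the tokenizer once, then apply each filter in the order given."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


if __name__ == "__main__":
    chain = [lowercase_filter, stop_filter]
    print(analyze("I'm writing a simple text", whitespace_tokenizer, chain))  # ["i'm", 'writing', 'simple', 'text']
```

Swapping the filter order shows why the chain is ordered: with stop_filter first, "The" in "The Solr Admin UI" is still capitalized when checked against the stop list, so it survives and is only lowercased afterwards.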
