Introduction to Apache Solr
Son Le - IBM Technical Evangelist
E: sonle@sg.ibm.com
T: @thsonvt
Agenda
- About Apache Solr and Its Key Features
- Indexing and Data Model in Apache Solr
- Demo of Apache Solr Web Admin UI
- Apache Solr in IBM Watson Retrieve and Rank
Big Data Landscape
Macro Trends Driving NoSQL Technology
Introduction
Source: Scaling Apache Solr (safaribooksonline.com)
Apache Solr Architecture
Source: Solr in Action (safaribooksonline.com)
Apache Solr Main Benefits
- Scalable: Solr scales by distributing work (indexing and query processing) to multiple servers in a cluster.
- Ready to deploy: Solr is open source, is easy to install and configure, and provides a preconfigured example to help you get started.
- Optimized for search: Solr is fast and can execute complex queries at sub-second speed, often in only tens of milliseconds.
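The "Scalable" point can be illustrated with a toy scatter-gather sketch in Python. It assumes a hypothetical in-memory cluster of three shards, routes each document to one shard by hashing its id, scores matches locally on every shard, and merges the best hits. Solr's real document router and relevance scoring are different; all names here are illustrative only.

```python
import zlib
from heapq import nlargest

NUM_SHARDS = 3  # hypothetical cluster size

def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Route a document to a shard by hashing its id (simplified, not Solr's router)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def index_documents(docs, num_shards=NUM_SHARDS):
    """Distribute indexing work: each document is stored on exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc["id"], num_shards)].append(doc)
    return shards

def query_shard(shard, term):
    """Score one shard locally; here the score is just the term's occurrence count."""
    hits = []
    for doc in shard:
        score = doc["text"].lower().split().count(term.lower())
        if score > 0:
            hits.append((score, doc["id"]))
    return hits

def distributed_query(shards, term, rows=2):
    """Scatter the query to every shard, then gather and merge the best hits."""
    all_hits = []
    for shard in shards:  # a real cluster would query shards in parallel
        all_hits.extend(query_shard(shard, term))
    return [doc_id for _, doc_id in nlargest(rows, all_hits)]

docs = [
    {"id": "1", "text": "Solr is a search server"},
    {"id": "2", "text": "search search search"},
    {"id": "3", "text": "Lucene powers Solr search"},
    {"id": "4", "text": "nothing relevant here"},
]
shards = index_documents(docs)
print(distributed_query(shards, "search"))  # → ['2', '3']
```

Because each shard holds only part of the index, each one does a fraction of the work, and only the short per-shard result lists travel back to be merged.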
Apache Solr Main Benefits
- Large volumes of documents: Solr is designed to deal with indexes containing many millions of documents.
- Text-centric: Solr is optimized for searching natural-language text, including emails, web pages, resumes, PDF documents, and social content such as tweets or blog posts.
- Results sorted by relevance: Solr returns documents in ranked order based on how relevant each document is to the user’s query.
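Ranking by relevance means computing a score for each matching document. Below is a minimal Python sketch of BM25, the scoring formula that recent Lucene (and therefore Solr) versions use by default. It assumes plain lowercase whitespace tokens and the common defaults k1 = 1.2 and b = 0.75, and it leaves out Lucene's norm encoding and other implementation details.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Rank documents against a query with BM25 (simplified: lowercase whitespace tokens)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b).
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    # Highest score first, like Solr's default sort by relevance.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {
    "a": "solr search server",
    "b": "relational database server",
    "c": "solr solr search tutorial",
}
ranking = bm25_scores("solr search", docs)
print([doc_id for doc_id, _ in ranking])  # → ['c', 'a', 'b']
```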
Apache Solr Key Features
Overview of Indexing Process
Source: Solr in Action (safaribooksonline.com)
Data Model in Apache Solr
Document
The basic and atomic unit of information in Solr. It is a container of fields and values that belong to a given entity of your domain model (for example, a book, car, or person).
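To make the idea concrete, this Python sketch builds one document as a plain dictionary of fields and values and serializes a batch as a JSON array, the shape Solr's update handler accepts for JSON input. The helper `make_document` and the fields `title` and `author` are hypothetical; the fields a real collection accepts depend on its schema.

```python
import json

def make_document(doc_id, **fields):
    """Build a Solr-style document: a flat container of field names and values."""
    doc = {"id": doc_id}  # "id" is the unique key field in Solr's example schemas
    doc.update(fields)
    return doc

book = make_document(
    "book-001",
    title="Solr in Action",
    author=["Trey Grainger", "Timothy Potter"],  # a multi-valued field
)

# Serialize a batch of documents as a JSON array for indexing.
payload = json.dumps([book], indent=2)
print(payload)
```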
Data Model in Apache Solr
Inverted Index
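An inverted index maps each term to the list of documents that contain it (its postings list), so a query looks up its terms instead of scanning every document. The Python sketch below is a toy in-memory version that assumes simple lowercasing and whitespace splitting; Lucene's on-disk index also stores term frequencies, positions, and more.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_all(index, query):
    """Return ids of documents containing every query term (postings intersection)."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

docs = {
    1: "Solr is a search server",
    2: "Lucene is a search library",
    3: "Solr is built on Lucene",
}
index = build_inverted_index(docs)
print(index["lucene"])                   # → [2, 3]
print(search_all(index, "solr lucene"))  # → [3]
```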
Data Model in Apache Solr
Field Type
One of the top-level entities declared in Solr schemas. A field type is declared using the <fieldType> element.
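As an illustration, the Python snippet below parses a simplified <fieldType> declaration with the standard library's xml.etree.ElementTree and lists its analysis chain. The declaration is a trimmed-down example modeled on the text_general type in Solr's sample schemas; the shipped version is more elaborate (for example, with separate index-time and query-time analyzers), so treat this as a sketch rather than a copy of any real schema.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative field type: a tokenizer followed by one token filter.
FIELD_TYPE_XML = """
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
"""

def describe_field_type(xml_text):
    """Return the field type's name, implementing class, and analysis chain."""
    root = ET.fromstring(xml_text.strip())
    analyzer = root.find("analyzer")
    chain = [analyzer.find("tokenizer").get("class")]
    chain += [f.get("class") for f in analyzer.findall("filter")]
    return root.get("name"), root.get("class"), chain

name, cls, chain = describe_field_type(FIELD_TYPE_XML)
print(name, cls)  # text_general solr.TextField
print(chain)      # ['solr.StandardTokenizerFactory', 'solr.LowerCaseFilterFactory']
```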
Data Model in Apache Solr
Tokenizer
Breaks an incoming character stream into one or more tokens according to a splitting rule, for example on whitespace or on non-letter characters.
Example input text: "I'm writing a simple text"
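The effect of the splitting rule can be sketched in Python on this input. The two functions below are regex approximations of whitespace splitting and letter-only splitting, in the spirit of Solr's WhitespaceTokenizerFactory and LetterTokenizerFactory; they are not the Lucene implementations. Each token keeps its start and end character offsets, as Lucene tokens do.

```python
import re

def whitespace_tokenize(text):
    """Split on whitespace, keeping (token, start_offset, end_offset)."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def letter_tokenize(text):
    """Split on any non-letter character, keeping (token, start_offset, end_offset)."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"[^\W\d_]+", text)]

text = "I'm writing a simple text"
print([t for t, _, _ in whitespace_tokenize(text)])
# → ["I'm", 'writing', 'a', 'simple', 'text']
print([t for t, _, _ in letter_tokenize(text)])
# → ['I', 'm', 'writing', 'a', 'simple', 'text']
```

The choice of tokenizer alone decides whether "I'm" survives as a single token.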
Data Model in Apache Solr
Analyzer
Examines the text of fields and generates a token stream, typically by passing the text through a tokenizer and then a chain of token filters.
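An analyzer can be pictured as a pipeline: one tokenizer followed by an ordered chain of token filters. The Python sketch below assumes a whitespace tokenizer, a lowercase filter, and a tiny hypothetical stop-word list; Solr configures the same kind of pipeline declaratively inside an <analyzer> element rather than in code.

```python
import re

STOP_WORDS = {"a", "an", "the", "is"}  # hypothetical, tiny stop-word list

def tokenizer(text):
    """Tokenizer stage: split the character stream into whitespace-separated tokens."""
    return re.findall(r"\S+", text)

def lowercase_filter(tokens):
    """Filter stage: normalize case so 'Solr' and 'solr' match."""
    return [t.lower() for t in tokens]

def stop_filter(tokens):
    """Filter stage: drop very common words that add little to relevance."""
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text, filters=(lowercase_filter, stop_filter)):
    """An analyzer: one tokenizer followed by an ordered chain of token filters."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

print(analyze("Solr is a Search Server"))  # → ['solr', 'search', 'server']
```

Filter order matters: running the stop filter before the lowercase filter would let "The" slip through, because the stop-word list is lowercase.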
By Son Le Thanh