Solr 101

  • 2004: Created by Yonik Seeley (CNET)
  • 2006: Becomes an Apache Software Foundation project
  • 2008-2009: v1.3 and v1.4 released, usage explodes
  • 2010: Merged with Lucene
  • 2011: Version number jumps from v1.4 to v3.1 to match Lucene
  • 2014: currently at v4.1

History

  • Search engine/server/platform/whatevs
  • Written in Java
  • Open source, Apache 2.0 license
  • Runs as a servlet inside Jetty or Tomcat
  • Provides REST-ish HTTP API (XML or JSON)
  • Highly customizable via configuration, plugins
  • Also embeddable via EmbeddedSolrServer (although not considered a "best practice")

What is it?
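
To make the "REST-ish HTTP API" bullet concrete, here is a minimal Python sketch that reads a JSON response. The response body is a hand-written illustration in the general shape Solr returns when `wt=json` is requested. The document ids and titles are made up for the example, and `summarize` is a hypothetical helper, not part of Solr.

```python
import json

# Hand-written, illustrative response body (not captured from a real server).
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "doc1", "title": "Solr 101"},
      {"id": "doc2", "title": "Lucene basics"}
    ]
  }
}
"""


def summarize(body):
    """Return (total hit count, list of document ids) from a JSON response body."""
    data = json.loads(body)
    response = data["response"]
    return response["numFound"], [doc["id"] for doc in response["docs"]]


if __name__ == "__main__":
    total, ids = summarize(SAMPLE_RESPONSE)
    print(total, ids)
```

Because the API is plain HTTP plus JSON or XML, any language with an HTTP client and a parser can talk to it without a dedicated driver.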

  • fulltext searching
  • result facets
  • term highlighting
  • query & index analysis filters
  • text extraction (from PDF, Word, etc.)
  • nice admin UI
  • easy relevance tuning
  • sharding, replication
  • NoSQL-ish
  • more-like-this, did-you-mean, auto-complete
  • nested documents (as of v4.5)

Major Features
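
Facets and highlighting from the list above are switched on with request parameters. The sketch below builds such a parameter string and converts a facet result into a dictionary. The field names `category` and `title` are assumptions for illustration, and both helper functions are hypothetical. The flat term/count list is the default layout of `facet_fields` in Solr's JSON output.

```python
from urllib.parse import urlencode


def build_params(query, facet_field, highlight_field):
    """Build a query string asking for facets and highlighting."""
    params = [
        ("q", query),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", facet_field),   # field to count values for
        ("hl", "true"),
        ("hl.fl", highlight_field),     # field to highlight matches in
    ]
    return urlencode(params)


def facet_counts(flat):
    """Turn a flat [term, count, term, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    print(build_params("solr", "category", "title"))
    print(facet_counts(["book", 3, "dvd", 1]))
```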

  • Two ways to get content in
    • POST xml or json via HTTP
    • DataImport: import from data source
  • Indexing process
    • field-based
    • Defined in schema.xml
    • fieldType defines how field content is processed
    • analysis phase: tokenize, filter, transform
    • storage options: index only, index + content, term frequency, positions, normalizations, vectors
  • Inverted Index format: terms -> documents
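
The analysis phase and the inverted index can be illustrated with a toy Python sketch. This is not how Lucene implements them. It only shows the idea of tokenize, transform, filter, and then mapping each term to the documents that contain it. The stopword list and sample documents are made up for the example.

```python
import re
from collections import defaultdict

# Tiny, illustrative stopword list (an assumption, not Solr's default list).
STOPWORDS = {"a", "an", "the", "of", "and"}


def analyze(text):
    """Toy analysis chain: tokenize, lowercase, drop stopwords."""
    tokens = re.findall(r"\w+", text)                  # tokenize
    tokens = [t.lower() for t in tokens]               # transform
    return [t for t in tokens if t not in STOPWORDS]   # filter


def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


if __name__ == "__main__":
    docs = {1: "The Art of Search", 2: "Search and Rescue"}
    index = build_index(docs)
    print(sorted(index["search"]))  # [1, 2]
    print(sorted(index["art"]))     # [1]
```

Looking up a term is then a single dictionary access, which is why the index is stored as terms -> documents rather than documents -> terms.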

Indexing
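
As a sketch of the "POST xml or json via HTTP" route, the Python below builds (but does not send) a JSON update request. The base URL `http://localhost:8983/solr/collection1`, the field names, and the helper `build_update_request` are assumptions for illustration. Adjust them to your own core and schema.

```python
import json
from urllib.request import Request


def build_update_request(base_url, docs, commit=True):
    """Build a POST request that sends a list of documents as JSON."""
    url = base_url.rstrip("/") + "/update"
    if commit:
        url += "?commit=true"  # make the documents searchable right away
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumed core URL and fields; both depend on your installation and schema.xml.
    req = build_update_request(
        "http://localhost:8983/solr/collection1",
        [{"id": "doc1", "title": "Solr 101"}],
    )
    print(req.get_method(), req.full_url)
    # To actually send it (requires a running server):
    # from urllib.request import urlopen
    # urlopen(req)
```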

  • via HTTP GET: "?q=term1"
  • default = Lucene query syntax
    • free-form: term1 term2
    • fielded: foo:term1
    • phrases: "term1 term2"
    • wildcards: foo:term*
    • proximity: "term1 term2"~4
    • ranges: foo:[1 TO 1000]
  • ExtendedDisMax
    • search across range of fields with varying "boost" factors
    • title:foo^5 fulltext:bar^0.5
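
Queries in Lucene syntax contain characters such as quotes, colons, and brackets that must be URL-encoded before they go into "?q=". A small Python sketch, assuming a core at `http://localhost:8983/solr/collection1` and a hypothetical helper `select_url`. The wildcard and range queries are written with a field prefix and without surrounding quotes, since quotes would turn them into phrases.

```python
from urllib.parse import urlencode

# Assumed base URL; depends on your installation.
BASE = "http://localhost:8983/solr/collection1"


def select_url(base_url, query, rows=10):
    """Build a GET URL for the select handler with an encoded query."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return base_url.rstrip("/") + "/select?" + params


if __name__ == "__main__":
    examples = [
        "term1 term2",         # free-form
        "foo:term1",           # fielded
        '"term1 term2"',       # phrase
        "foo:term*",           # wildcard
        '"term1 term2"~4',     # proximity
        "foo:[1 TO 1000]",     # range
    ]
    for q in examples:
        print(select_url(BASE, q))
```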

Querying
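
With ExtendedDisMax, the per-field boosts are usually passed once in the `qf` parameter instead of being repeated on every term. A minimal Python sketch follows. The field names `title` and `fulltext` come from the slide, while the helper `edismax_params` is hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def edismax_params(user_query, field_boosts):
    """Build request parameters that search several fields with boosts."""
    # qf looks like "title^5 fulltext^0.5": field name, caret, boost factor.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return urlencode({"q": user_query, "defType": "edismax", "qf": qf})


if __name__ == "__main__":
    qs = edismax_params("foo bar", {"title": 5, "fulltext": 0.5})
    print(qs)
    print(parse_qs(qs)["qf"])  # ['title^5 fulltext^0.5']
```

The user's input stays a plain string such as "foo bar", and the server decides how much a match in each field counts.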

Demo!

  • http://lucene.apache.org/solr
  • http://www.solrtutorial.com/index.html
  • https://wiki.apache.org/solr/FrontPage
  • https://slides.com/jamesluker/solr-101

Links

By James Luker
