SOLR in Rails

SOLR

Full-text search server with Apache Lucene at the backend
Open source, maintained by Apache
It's not an abbreviation. :P
Exposes Lucene's Java API as REST-like APIs that can be called over HTTP from any programming language or platform.
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.
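
Because Solr is reached over HTTP, a query is ultimately just a URL. The sketch below builds a request for Solr's standard `/select` handler using only Ruby's standard library. The host, port, and core name (`default`) are assumptions for illustration, and `solr_select_uri` is a hypothetical helper, not part of Solr or Sunspot.

```ruby
require 'uri'

# Hypothetical helper: builds a URI for Solr's /select request handler.
# The base URL and core name are assumptions; adjust them to your setup.
def solr_select_uri(query, base: 'http://localhost:8983/solr', core: 'default', rows: 10)
  # q is the query string, rows the page size, wt=json asks for a JSON response.
  params = { q: query, rows: rows, wt: 'json' }
  URI("#{base}/#{core}/select?#{URI.encode_www_form(params)}")
end

uri = solr_select_uri('title:pizza')
puts uri
```

With a Solr server running at that address, `Net::HTTP.get(uri)` would return the raw response body.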

Features

Full Text Search
Faceted search
More items like this (recommendation) / related searches
Spell Suggest/Auto-Complete
Custom document ranking/ordering
Snippet generation/highlighting
And a lot more

Why/when to use SOLR?

Greater control over your website's search.
Caching, replication, and distributed search.
Really fast indexing and searching; indexes can be merged and optimized (index compaction).
Great admin interface that can be used over HTTP.
Awesome community support too.
Integrates with various other products, such as the Drupal CMS.
Can be used in e-commerce sites, CMSs, and blog sites.
Heavily used by LinkedIn, Twitter, CNET, Netflix, and Digg.

Sunspot

Ruby library for expressive, powerful interaction with the Solr search engine
Built on top of the RSolr library, which provides a low-level interface for Solr interaction
Provides a simple, intuitive, expressive DSL backed by powerful features for indexing objects and searching for them
Easily plugged into any ORM

Installation

# Add to Gemfile:

gem 'sunspot_rails'
gem 'sunspot_solr' # optional pre-packaged Solr distribution for use in development

# Bundle it!

bundle install

# Generate a default configuration file:

rails generate sunspot_rails:install

# If sunspot_solr was installed, start the packaged Solr distribution with:

bundle exec rake sunspot:solr:start # or sunspot:solr:run to start in foreground

Setting up Objects


    class Post < ActiveRecord::Base
      searchable do
        text :title, :body
        text :comments do
          comments.map { |comment| comment.body }
        end
    
        boolean :featured
        integer :blog_id
        integer :author_id
        integer :category_ids, :multiple => true
        double  :average_rating
        time    :published_at
        time    :expired_at
    
        string  :sort_title do
          title.downcase.gsub(/^(an?|the)\s+/, '')
        end
      end
    end
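
The `sort_title` field stores a normalized key so that leading articles do not affect ordering. The following standalone sketch shows the idea outside of Sunspot. The method here is a hypothetical plain-Ruby version, and requiring whitespace after the article is an assumption of this sketch so that words such as "Theory" are left intact.

```ruby
# Hypothetical standalone sort-key normalizer.
# Lowercases the title and drops one leading article ("a", "an", "the").
# The trailing \s+ is an assumption of this sketch: it keeps words that
# merely start with those letters (e.g. "Theory") unchanged.
def sort_title(title)
  title.downcase.sub(/\A(an?|the)\s+/, '')
end

titles = ['The Big Pizza', 'A Slice of Life', 'Theory of Dough']

# Sort the original titles by their normalized keys.
puts titles.sort_by { |title| sort_title(title) }
```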

Searching Objects


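    Post.search do
      fulltext 'big pizza' do
        fields(:body, :title => 2.0)          # search body and title; title matches count double
        phrase_fields :title => 2.0           # extra boost when the terms appear as a phrase in the title
        phrase_slop   1                       # allow one intervening word in a phrase match
        boost(2.0) { with(:featured, true) }  # boost featured posts
      end

      with :blog_id, 1
      with :category_ids, 5                   # must match the field declared in the searchable block
      with(:published_at).less_than Time.now
      order_by :published_at, :desc
      paginate :page => 2, :per_page => 15
    end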


# Note:
# text fields are full-text searchable.
# Other fields (e.g., integer and string) can be used to scope queries.
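
Solr itself pages results with a zero-based `start` offset and a `rows` page size. As a rough sketch of how a page number and page size map onto those two parameters, consider the helper below. `solr_window` is a hypothetical name used only for illustration and is not part of Sunspot's API.

```ruby
# Hypothetical helper: converts a 1-based page number into Solr's
# `start` (zero-based offset) and `rows` (page size) parameters.
def solr_window(page:, per_page:)
  raise ArgumentError, 'page must be >= 1' if page < 1

  { start: (page - 1) * per_page, rows: per_page }
end

# Page 2 with 15 results per page skips the first 15 documents.
p solr_window(page: 2, per_page: 15)
```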

Configurations

sunspot.yml
solr.xml
solrconfig.xml
schema.xml

    
    <fieldType name="text" class="solr.TextField" omitNorms="false">
      <analyzer>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>
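
To get a feel for what this analyzer chain does to a piece of text, here is a rough plain-Ruby imitation of its steps. It is a conceptual sketch, not Solr's implementation: the regexes are crude stand-ins for the real factories, the stop-word list is an assumed sample rather than the contents of `stopwords.txt`, and stemming is left out entirely.

```ruby
# Conceptual imitation of the analyzer chain (not Solr's actual behavior).
# The default stop-word list is an assumption; real lists live in stopwords.txt.
# Stemming (PorterStemFilterFactory) is deliberately omitted from this sketch.
def analyze(text, stop_words: %w[a an the of and])
  tokens = text.gsub(/<[^>]*>/, ' ')   # crude stand-in for HTMLStripCharFilterFactory
  tokens = tokens.scan(/[[:alnum:]]+/) # crude stand-in for StandardTokenizerFactory
  tokens = tokens.map(&:downcase)      # LowerCaseFilterFactory
  tokens.reject { |token| stop_words.include?(token) } # StopFilterFactory
end

p analyze('<p>The <b>Big</b> Pizza of Naples</p>')
```

The same chain runs at index time and at query time, which is why a search for "pizza" can match a document containing "Pizza".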


    <dynamicField name="*_text" stored="false" type="text" multiValued="true" indexed="true"/>

    
    <solrQueryParser defaultOperator="OR"/>

Lucene Scoring Model

tf - Term Frequency. The frequency with which a term appears in a document. Given a search query, the higher the term frequency, the higher the document score.
idf - Inverse Document Frequency. The rarer a term is across all documents in the index, the higher its contribution to the score.
coord - Coordination Factor. The more query terms that are found in a document, the higher its score.
fieldNorm - Field length. The more words that a field contains, the lower its score. This factor penalizes documents with longer field values.
Boosts - In addition to the scoring factors mentioned above, the primary method of modifying document scores is by boosting.
- Index-time boosts are applied when adding documents, and apply to the entire document or to specific fields.
- Query-time boosts are applied when constructing a search query, and apply to specific fields.
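
The factors above can be made concrete with the formulas used by Lucene's classic TF-IDF similarity (`DefaultSimilarity` in the Lucene 3.x line referenced below). The method names in this sketch are hypothetical; only the arithmetic is meant to mirror Lucene, and boosts are ignored.

```ruby
# Sketch of the classic Lucene TF-IDF factor formulas.
# Method names are hypothetical; boosts are not modeled here.

# Term frequency: grows with the square root of the raw count.
def tf(freq)
  Math.sqrt(freq)
end

# Inverse document frequency: rarer terms (low doc_freq) score higher.
def idf(doc_freq, num_docs)
  1.0 + Math.log(num_docs.to_f / (doc_freq + 1))
end

# Coordination factor: fraction of query terms found in the document.
def coord(overlap, max_overlap)
  overlap.to_f / max_overlap
end

# Field length norm: longer fields are penalized.
def length_norm(num_terms)
  1.0 / Math.sqrt(num_terms)
end

puts tf(4)                        # term appears 4 times in the field
puts idf(1, 1000) > idf(100, 1000) # a rare term outweighs a common one
puts coord(1, 2)                  # 1 of 2 query terms matched
puts length_norm(4)               # field with 4 terms
```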

Lucene Scoring Formula

$$
\mathrm{score}(q,d) = \text{coord-factor}(q,d) \cdot \text{query-boost}(q) \cdot \frac{V(q) \cdot V(d)}{|V(q)|} \cdot \text{doc-len-norm}(d) \cdot \text{doc-boost}(d)
$$
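
As a small worked example of this formula, the sketch below evaluates it for two plain arrays of term weights, where position i in both arrays refers to the same term. `conceptual_score` is a hypothetical helper, and defaulting every other factor to 1.0 is an assumption made to keep the example short.

```ruby
# Hypothetical helper: evaluates the conceptual scoring formula for
# two aligned arrays of term weights. Factors default to 1.0 (an assumption).
def conceptual_score(vq, vd, coord: 1.0, query_boost: 1.0, doc_len_norm: 1.0, doc_boost: 1.0)
  dot = vq.zip(vd).sum { |a, b| a * b }     # V(q) . V(d)
  q_norm = Math.sqrt(vq.sum { |w| w * w })  # |V(q)|
  coord * query_boost * (dot / q_norm) * doc_len_norm * doc_boost
end

# dot = 3*1 + 4*2 = 11, |V(q)| = 5, so the base score is 11/5,
# which the document boost of 2.0 then doubles.
puts conceptual_score([3.0, 4.0], [1.0, 2.0], doc_boost: 2.0)
```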

References

https://github.com/sunspot/sunspot
https://wiki.apache.org/solr/
https://cwiki.apache.org/confluence/display/solr/
http://lucene.apache.org/core/3_5_0/api/core/org/apache/lucene/search/Similarity.html

Questions?

Thank you

Using SOLR in Rails

By Datt Dongare


I have expertise in web development using Ruby on Rails and ReactJS. I help teams write better code and mentor them on organizing projects efficiently.