Scoring, term weighting and the vector space model



Dávid László

Introduction

  • Boolean queries can match a very large number of documents, so the matches need to be scored and ranked



  1. Parametric and zone indexes
  2. Weighting
  3. Vector space scoring
  4. Variants of term weighting

Parametric and zone indexes

  • Metadata
  • Fields - parametric indexes
  • Zones
  • Weighted zone scoring
  • Learning weights
  • The optimal weight g
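Weighted zone scoring and the optimal weight g can be made concrete with a minimal sketch. It assumes each zone's Boolean score s_i is 1 when every query term occurs in that zone (an AND match, chosen here for illustration) and that the zone weights g_i sum to 1. For learning, it assumes just two zones, title and body, scored as g · s_T + (1 − g) · s_B, with training examples judged relevant (r = 1) or non-relevant (r = 0). Minimising the squared error over those examples gives the closed-form g computed below. The function names and example data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def weighted_zone_score(query: Set[str], zones: Dict[str, Set[str]],
                        weights: Dict[str, float]) -> float:
    """Weighted zone score: the sum over zones of g_i * s_i.

    Assumption: s_i = 1 when every query term occurs in zone i
    (a Boolean AND match), else 0. The weights g_i should sum to 1.
    """
    score = 0.0
    for zone, g in weights.items():
        s = 1 if query <= zones.get(zone, set()) else 0
        score += g * s
    return score


def optimal_g(examples: List[Tuple[int, int, bool]]) -> float:
    """Optimal g for score = g * s_T + (1 - g) * s_B (title and body zones).

    Each example is (s_T, s_B, relevant). Minimising the squared error
    against r = 1 (relevant) / r = 0 (non-relevant) gives
        g = (n10r + n01n) / (n10r + n10n + n01r + n01n).
    Examples with s_T == s_B score the same for every g and are ignored.
    """
    n10r = n10n = n01r = n01n = 0
    for s_t, s_b, relevant in examples:
        if (s_t, s_b) == (1, 0):
            if relevant:
                n10r += 1
            else:
                n10n += 1
        elif (s_t, s_b) == (0, 1):
            if relevant:
                n01r += 1
            else:
                n01n += 1
    denominator = n10r + n10n + n01r + n01n
    if denominator == 0:
        raise ValueError("no training example separates the title and body zones")
    return (n10r + n01n) / denominator


# Hypothetical document with two zones and hypothetical zone weights.
doc = {"title": {"shakespeare", "sonnets"}, "body": {"poems", "by", "shakespeare"}}
weights = {"title": 0.25, "body": 0.75}
print(weighted_zone_score({"shakespeare"}, doc, weights))  # 1.0
print(weighted_zone_score({"sonnets"}, doc, weights))      # 0.25

# Hypothetical training examples: (s_T, s_B, relevant).
examples = [(1, 0, True), (1, 0, True), (1, 0, False),
            (0, 1, True), (0, 1, False), (1, 1, True)]
print(optimal_g(examples))  # 0.6
```

Here n10r = 2, n10n = 1, n01r = 1 and n01n = 1, so g = (2 + 1) / 5 = 0.6: the title zone is trusted more because title-only matches were mostly relevant.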

Term frequency and weighting

  • Term frequency
  • Document frequency
  • Inverse document frequency

  • Tf-idf weighting
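The term-weighting quantities above can be sketched in a few lines: term frequency tf_{t,d} counts occurrences of t in d, document frequency df_t counts the documents containing t, idf_t = log(N / df_t), and tf-idf_{t,d} = tf_{t,d} × idf_t. A simple query score then sums the tf-idf weights of the query terms in the document. This sketch assumes log base 10 and a tiny hypothetical collection of already-tokenized documents; the helper names are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List


def document_frequencies(docs: List[List[str]]) -> Counter:
    """df_t: the number of documents that contain term t at least once."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    return df


def idf(term: str, df: Counter, n_docs: int) -> float:
    """idf_t = log10(N / df_t); assumes the term occurs in the collection."""
    return math.log10(n_docs / df[term])


def tf_idf_vector(doc: List[str], df: Counter, n_docs: int) -> Dict[str, float]:
    """tf-idf_{t,d} = tf_{t,d} * idf_t for every term t in document d."""
    tf = Counter(doc)
    return {t: count * idf(t, df, n_docs) for t, count in tf.items()}


def overlap_score(query: List[str], doc_vector: Dict[str, float]) -> float:
    """Score(q, d): the sum of d's tf-idf weights for the distinct query terms."""
    return sum(doc_vector.get(t, 0.0) for t in set(query))


# Hypothetical collection of N = 4 tokenized documents.
docs = [
    ["the", "car", "insurance", "car"],
    ["the", "auto", "insurance"],
    ["the", "best", "car"],
    ["the", "auto", "best"],
]
df = document_frequencies(docs)
vectors = [tf_idf_vector(d, df, len(docs)) for d in docs]

print(idf("the", df, len(docs)))                                # 0.0
print(round(overlap_score(["car", "insurance"], vectors[0]), 3))  # 0.903
```

A term such as "the" that occurs in every document gets idf = log10(4/4) = 0, so it contributes nothing to any score, while "car" (tf = 2) and "insurance" (tf = 1), each with df = 2, contribute 3 × log10(2) together.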

The vector space model for scoring

  • Dot products
  • Cosine similarity

  • Size of the angle θ between the vectors

  • Queries as vectors
  • Computing vector scores
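Cosine similarity and the term-at-a-time computation of vector scores can be sketched as follows. `cosine` computes V(q) · V(d) / (|V(q)| |V(d)|) directly. `cosine_score` follows the accumulator approach: for each query term it walks that term's postings, adds w_{t,d} × w_{t,q} to the document's running score, then divides by the document length and returns the top k. Dividing only by |V(d)| is enough for ranking, since |V(q)| is the same for every document. As assumptions, postings here store precomputed weights w_{t,d} rather than raw tf, and the index-building helper and data are hypothetical.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def cosine(v1: Vector, v2: Vector) -> float:
    """cos(theta) = (v1 . v2) / (|v1| |v2|); 0.0 if either vector is empty."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)


def build_index(docs: Dict[int, Vector]):
    """Hypothetical helper: postings of (doc_id, w_td) and Euclidean lengths."""
    postings: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    lengths: Dict[int, float] = {}
    for doc_id, vec in docs.items():
        for t, w in vec.items():
            postings[t].append((doc_id, w))
        lengths[doc_id] = math.sqrt(sum(w * w for w in vec.values()))
    return postings, lengths


def cosine_score(query: Vector, postings, lengths, k: int) -> List[Tuple[float, int]]:
    """Term-at-a-time scoring with one accumulator per candidate document."""
    scores: Dict[int, float] = defaultdict(float)
    for term, w_tq in query.items():
        for doc_id, w_td in postings.get(term, []):
            scores[doc_id] += w_td * w_tq
    # Length-normalise; |V(q)| is omitted because it does not change the ranking.
    normalised = {d: s / lengths[d] for d, s in scores.items() if lengths[d] > 0}
    return heapq.nlargest(k, ((s, d) for d, s in normalised.items()))


# Hypothetical weighted document vectors.
docs = {1: {"a": 3.0, "b": 4.0}, 2: {"a": 1.0}, 3: {"b": 1.0}}
postings, lengths = build_index(docs)
query = {"a": 1.0}

print(cosine(query, docs[1]))                      # 0.6
print(cosine_score(query, postings, lengths, k=2))  # [(1.0, 2), (0.6, 1)]
```

Document 3 never receives an accumulator because it shares no term with the query, which is the main saving of walking postings lists instead of comparing the query with every document vector.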

Variant tf-idf functions

  • Sublinear tf scaling

  • Maximum tf normalization

  • Document and query weighting schemes
  • Pivoted normalized document length
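The tf-idf variants listed above can be sketched as small functions. Sublinear scaling replaces tf by wf = 1 + log tf (0 when tf = 0); maximum tf normalization uses ntf = a + (1 − a) · tf / tf_max; in SMART notation, lnc.ltc weights documents with log tf, no idf and cosine normalization, and queries with log tf, idf and cosine normalization. For pivoted normalization, one common form (assumed here) replaces the length |V(d)| by a · |V(d)| + (1 − a) · piv. As assumptions: log base 10, a default smoothing a = 0.4 for max tf, and hypothetical function names and data.

```python
import math
from collections import Counter
from typing import Dict, List


def sublinear_tf(tf: int) -> float:
    """wf = 1 + log10(tf) for tf > 0, else 0."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


def max_tf_normalized(tf_counts: Dict[str, int], a: float = 0.4) -> Dict[str, float]:
    """ntf = a + (1 - a) * tf / tf_max; a = 0.4 is an assumed default."""
    tf_max = max(tf_counts.values())
    return {t: a + (1 - a) * tf / tf_max for t, tf in tf_counts.items()}


def cosine_normalize(vec: Dict[str, float]) -> Dict[str, float]:
    """Divide every weight by the vector's Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def lnc_ltc_score(query_terms: List[str], doc_terms: List[str],
                  df: Dict[str, int], n_docs: int) -> float:
    """SMART lnc.ltc: document = log tf, no idf, cosine; query = log tf, idf, cosine."""
    doc = cosine_normalize({t: sublinear_tf(c) for t, c in Counter(doc_terms).items()})
    query = cosine_normalize({
        t: sublinear_tf(c) * math.log10(n_docs / df[t])
        for t, c in Counter(query_terms).items()
        if df.get(t, 0) > 0  # drop query terms unseen in the collection
    })
    return sum(w * doc.get(t, 0.0) for t, w in query.items())


def pivoted_normalization(length: float, pivot: float, slope: float) -> float:
    """Assumed form of the pivoted factor: slope * length + (1 - slope) * pivot."""
    return slope * length + (1 - slope) * pivot


print(sublinear_tf(10))                                          # 2.0
print({t: round(w, 2) for t, w in max_tf_normalized({"a": 4, "b": 2}).items()})
# {'a': 1.0, 'b': 0.7}
print(round(lnc_ltc_score(["car"], ["car", "car"], {"car": 2}, 4), 6))  # 1.0
print(pivoted_normalization(2.0, pivot=1.0, slope=0.5))          # 1.5
```

With a slope below 1, a document longer than the pivot is divided by less than its own length (1.5 instead of 2.0 above), which offsets the tendency of cosine normalization to over-penalize long documents.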