Evergreen Search Tune-Up

Kathy Lussier, MassLNC Coordinator

klussier@masslnc.org

 

2015 Evergreen International Conference

5/15/2015

How I Learned About Evergreen Search

Two Key Places to Look in the Database

  • config.metabib_field
    • Where keyword, browse, and facet indexes are configured
    • Also available in the staff client under Server Administration ->
      MARC Search/Facet Fields
  • metabib schema
    • Several tables storing the index terms for each record in the system
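As a starting point, a query along these lines lists the index definitions and shows whether each one is used for search, facets, or browse. It is a minimal sketch: the column names (search_field, facet_field, browse_field) assume a 2.x-era schema and may differ in your version.

```sql
-- List every index definition with its search class and usage flags.
-- Assumption: these columns exist in your version of config.metabib_field.
SELECT id, field_class, name, label, format,
       search_field, facet_field, browse_field
  FROM config.metabib_field
 ORDER BY field_class, name;
```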

config.metabib_field

Proper Title configuration

Interpreting MODS

marcxml

You don't need to use MODS for your indexes. Using the marcxml format, all of our identifier indexes are based on MARC tags, subfields and/or indicators.
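To see what these MARC-based definitions look like on your own system, something like the following should work. It is a sketch that assumes your identifier indexes are stored with format = 'marcxml', as described above.

```sql
-- Show the XPath behind each identifier index defined against raw MARCXML.
SELECT id, name, label, xpath
  FROM config.metabib_field
 WHERE field_class = 'identifier'
   AND format = 'marcxml'
 ORDER BY name;
```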

Metabib Entries

To see how the title is indexed for record 3183740, run the following SQL:

 

SELECT * FROM metabib.title_field_entry WHERE source = 3183740;
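The raw rows identify the index only by its numeric field ID. A join back to config.metabib_field, sketched below, makes it easier to see which index definition produced each entry. The record ID is the one from the example above.

```sql
-- Label each title entry for the record with the index that produced it.
SELECT f.name AS index_name, e.value
  FROM metabib.title_field_entry e
  JOIN config.metabib_field f ON f.id = e.field
 WHERE e.source = 3183740
 ORDER BY f.name;
```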

Metabib Schema

  • This schema contains several tables that store index terms for each record in the database.
  • Each search class has its own metabib table:
    • metabib.author_field_entry
    • metabib.identifier_field_entry
    • metabib.keyword_field_entry
    • metabib.subject_field_entry
    • metabib.title_field_entry
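For a quick overview of how one record is spread across these tables, the counts can be pulled together in a single query. This is a sketch using the five tables listed above and the same example record ID.

```sql
-- Count the metabib entries for one record in each search class table.
SELECT 'author' AS search_class, count(*) AS entries
  FROM metabib.author_field_entry WHERE source = 3183740
UNION ALL
SELECT 'identifier', count(*)
  FROM metabib.identifier_field_entry WHERE source = 3183740
UNION ALL
SELECT 'keyword', count(*)
  FROM metabib.keyword_field_entry WHERE source = 3183740
UNION ALL
SELECT 'subject', count(*)
  FROM metabib.subject_field_entry WHERE source = 3183740
UNION ALL
SELECT 'title', count(*)
  FROM metabib.title_field_entry WHERE source = 3183740;
```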

Stock Indexes

  • Title class
    • Abbreviated
    • Translated
    • Alternate
    • Uniform
    • Title Proper
  • Author class
    • Corporate
    • Personal
    • Conference
    • Other

Stock Indexes

  • Subject class
    • Geographic
    • Name
    • Topic
    • Temporal
    • All Subjects
  • Series class
    • Series Title
  • Identifier Class
    • An entry for identifiers in the record (e.g. ISBN, ISSN, UPC, TCN, etc.)

Stock Indexes

  • Keyword class: The blob

Keyword blob

The world of the Hunger Games The world of the Hunger Games The world of the Hunger Games Hunger Games Egan, Kate. creator text eng print 192 p. : col. ill. ; 21 cm. A companion guide to Panem, the world in the "Hunger Games," as portrayed in the motion picture based on the novel by Suzanne Collins. Welcome to Panem -- Life in the Districts -- Life in District 12 -- People of District 12 -- Katniss Everdeen -- At home with Katniss Everdeen -- Reaping Day -- Life in the capitol -- People of the capitol -- Tributes in the capitol -- Training for the Hunger Games -- Creatures of Panem -- Perils of the Hunger Games: the Cornucopia ; Fear ; Injuries ; Alliances ; Defiance ; Rule changes ; Love ; Lies ; Last move -- The game of Love -- After the games -- The Hunger Games glossary. adolescent by Kate Egan. Hunger games (Motion picture) Hunger games (Motion picture) Juvenile literature Hunger games (Motion picture) Hunger games (Motion picture) PN1997.2.H865 E345 2012 791.43/72 Hunger Games Collins, Suzanne. 9780545425124 (trade) 0545425123 (trade) 2011945839

Adding almost everything from MODS to the keyword index does not mean we are adding all MARC tags.
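One way to check whether a particular piece of a MARC record made it into the blob is to search the stored keyword value directly. The sketch below uses a hypothetical search term ('scholastic', standing in for a publisher name); substitute any string you expect to find in the MARC record.

```sql
-- Does the keyword entry for this record contain the given text?
-- The search term is a hypothetical example, not taken from the record above.
SELECT e.id,
       e.value ILIKE '%scholastic%' AS term_in_blob
  FROM metabib.keyword_field_entry e
 WHERE e.source = 3183740;
```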

Adding a New Index to Evergreen

Insert a new entry in config.metabib_field

INSERT INTO config.metabib_field (field_class, name, label, xpath, format) VALUES (
    'keyword',
    'kw_isbn',
    'Keyword ISBN',
    $$//marc:datafield[@tag='020']/marc:subfield[@code='a' or @code='z']$$,
    'marcxml'
);
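The new definition is assigned an ID automatically, and that ID is needed for any follow-up configuration. A lookup like this one retrieves it by the class and name used in the insert.

```sql
-- Find the ID assigned to the new keyword ISBN index.
SELECT id, field_class, name, label
  FROM config.metabib_field
 WHERE field_class = 'keyword'
   AND name = 'kw_isbn';
```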

Does the field need normalization?

  • config.index_normalizer contains all of the normalizers used during indexing
  • Wiki contains a good description of each of these normalization functions - http://bit.ly/evgils_normalize
  • In config.metabib_field_index_norm_map, you need to map your new index definition (by ID) to the ID for the normalizer(s) that should be used.
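A useful way to decide which normalizers a new index needs is to look at what a comparable stock index already uses. The query below is a sketch that lists the normalizers mapped to each identifier index, in the order they are applied. Normalizer IDs can differ between installations, so check them locally.

```sql
-- Show which normalizers are mapped to each identifier index, in order.
SELECT m.field,
       f.name AS index_name,
       m.pos,
       n.id   AS norm_id,
       n.name AS normalizer,
       n.func
  FROM config.metabib_field_index_norm_map m
  JOIN config.metabib_field f ON f.id = m.field
  JOIN config.index_normalizer n ON n.id = m.norm
 WHERE f.field_class = 'identifier'
 ORDER BY m.field, m.pos;
```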

Normalization Example

We want to map our new ISBN index to the ISBN10-to-ISBN13 (and vice versa) normalizer, which has an ID of 12.

Assuming the ID of our new keyword ISBN index is 1001, we would use the following SQL to map it:

 

INSERT INTO config.metabib_field_index_norm_map (field, norm, pos) VALUES
(1001, 12, 2);
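If you would rather not hard-code the index ID, the same mapping can be written as a lookup by class and name. This is an alternative to the statement above, not an addition to it; running both would create a duplicate mapping. The normalizer ID of 12 is carried over from the example and should be confirmed against your own config.index_normalizer table.

```sql
-- Alternative form: resolve the index ID by name instead of assuming 1001.
INSERT INTO config.metabib_field_index_norm_map (field, norm, pos)
SELECT f.id, 12, 2
  FROM config.metabib_field f
 WHERE f.field_class = 'keyword'
   AND f.name = 'kw_isbn';
```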

Reingest
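New index definitions only take effect for a record once it has been reingested. One common approach is to call the database's field-entry reingest function, sketched below for a single test record. The function name and its single-argument form are assumptions based on 2.x-era Evergreen; confirm them against your version before running anything across the whole catalog, which can take a long time on a large database.

```sql
-- Reingest the metabib field entries for one record as a test.
-- Assumption: this function exists with this signature in your version.
SELECT metabib.reingest_metabib_field_entries(3183740);

-- A full-catalog run would follow the same pattern, for example:
-- SELECT metabib.reingest_metabib_field_entries(id)
--   FROM biblio.record_entry
--  WHERE NOT deleted AND id > 0;
```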

MARC tags we've added

  • 020a and 020z - keyword class. When mapped correctly, provides 10/13 ISBN conversion in keyword searching

  • 028a Music number - identifier class

  • 086 Gov Doc number - identifier class

  • 222a Key Title - title class

  • 260b Publisher - keyword class

  • 245c Statement of responsibility - keyword class

  • 505t Contents title - title class

  • 505r Contents author - author class

  • 740 ind2 Title analytic - title class

Pitfalls

  • When a user enters a keyword from the newly-added MARC field along with other keywords from the record, the system will not retrieve the record because the search terms are in two different metabib entries.
     
  • If added to the keyword class, the newly-added indexes may unintentionally receive more weight in relevance ranking.

Indexing Alternate Graphic Fields

marc21expand880 format - title

marc21expand880 format - author

Pitfall of marc21expand880

It adds new metabib entries for all of your records, even if they don't contain 880 fields.
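To get a sense of how much a marc21expand880 index adds, you can count the entries each title index holds and compare the numbers before and after a reingest. The sketch below only reports counts per index definition; it does not identify which records lack 880 fields.

```sql
-- Count metabib entries per title index definition.
SELECT f.id, f.name, f.format, count(e.id) AS entries
  FROM config.metabib_field f
  LEFT JOIN metabib.title_field_entry e ON e.field = f.id
 WHERE f.field_class = 'title'
 GROUP BY f.id, f.name, f.format
 ORDER BY entries DESC;
```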

Adjusting Relevance

Cover Density Algorithm

opensrf.xml

 

Adding Weight to an Index

Weight is a field in config.metabib_field

Because the keyword index is one big blob, we need to add indexes with a keyword class if we want some MARC fields to be weightier than others in keyword searching.

Even if we kept the weight of those keyword indexes at 1, the fields in those new indexes would become weightier because of rank_cd (cover density).
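Changing the weight is a simple update to the index definition. The sketch below uses the kw_isbn index from the earlier example and a hypothetical weight of 5; the right value depends on your own relevance testing.

```sql
-- Raise the weight of one keyword-class index.
-- The value 5 is a hypothetical example, not a recommendation.
UPDATE config.metabib_field
   SET weight = 5
 WHERE field_class = 'keyword'
   AND name = 'kw_isbn';

-- Review the weights of all keyword-class indexes.
SELECT id, name, weight
  FROM config.metabib_field
 WHERE field_class = 'keyword'
 ORDER BY weight DESC, name;
```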

Keyword entries for Survival Guide

Keyword entries for Free Spirit

Weighting for Stemmed/Non-Stemmed Terms

  • config.metabib_class_ts_map
  • Server Admin -> MARC Search/Facet Class FTS Maps

config.metabib_class
Server Admin -> MARC Search/Facet Classes
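Both tables can also be inspected directly in the database. The queries below are a sketch: the column names in the first query (ts_config, index_weight, active) are assumptions based on a 2.x-era schema, so fall back to SELECT * if they do not match your version.

```sql
-- Which text search configurations are mapped to each class,
-- and which weight label each one is indexed with?
SELECT field_class, ts_config, index_weight, active
  FROM config.metabib_class_ts_map
 ORDER BY field_class, ts_config;

-- Class-level settings, including any per-class weight values.
SELECT * FROM config.metabib_class ORDER BY name;
```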

On Combined Searches

  • By default, records cannot be retrieved if the user's search terms appear in different entries for a particular search class.
    • Subject searches are the exception.
  • In our Free Spirit publisher example, the user would never have retrieved the record by typing "free spirit survival" as the search terms.

Enabling Combined Searches

  • Enabling Combined Search for a class in config.metabib_class will allow search terms to cross indexes.
     
  • BUT you essentially end up turning your keyword search back into a giant blob again, eliminating the benefit of adding specific fields for weighting.
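In the database, enabling combined search comes down to a flag on the class. The sketch below assumes the flag is a boolean column named combined on config.metabib_class; a reingest is likely needed afterwards before the change affects search results.

```sql
-- Enable combined search for the keyword class.
-- Assumption: the column is named "combined" in your version.
UPDATE config.metabib_class
   SET combined = TRUE
 WHERE name = 'keyword';

-- Confirm which classes have combined search enabled.
SELECT name, combined
  FROM config.metabib_class
 ORDER BY name;
```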

Future Improvements to Relevance Ranking

Further Reading

Questions?
