Kathy Lussier
MassLNC Coordinator
2018 Evergreen International Pre-Conference
-- kmlussier, 11/04/15
From the staff client: Administration -> Server Administration -> MARC Search Facet Fields
From the database: config.metabib_field
"MODS handles the AACR2 interpretation for us; it stitches together the appropriate fields, applies NFI to titles, and the like. Otherwise we would need to invent and implement our own plugin system for all those rules (instead of letting LoC do that work for us via MODS), or hard-code them in the indexing code."
Mike Rylander, 1/29/15, Evergreen general mailing list
//mods32:mods/mods32:titleNonfiling[mods32:title and not (@type)]
//mods32:mods/mods32:titleInfo[mods32:title and (@type='uniform')]
//mods32:mods/mods32:name[mods32:role/mods32:roleTerm[text()='creator']]
//mods32:mods/mods32:name[@type='personal' and not(mods32:role/mods32:roleTerm[text()='creator'])]
You don't need to use MODS for your indexes. Using the marcxml format, all of our identifier indexes are based on MARC tags, subfields and/or indicators.
In this example, we add a field for the MARC 222 tag - key title, which is not covered by MODS. In addition to specifying tags and subfields in the xpath, admins can also specify indicator values.
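A field like this could also be added directly in config.metabib_field. The following is a minimal sketch, not the definition used in the presentation: the field class, name, label and subfield list are assumptions chosen for illustration.

```sql
-- Sketch only: field_class, name, label and subfields are illustrative assumptions.
INSERT INTO config.metabib_field
    (field_class, name, label, format, xpath, search_field, facet_field, browse_field)
VALUES
    ('title', 'key_title', 'Key Title', 'marcxml',
     $$//marc:datafield[@tag='222']/marc:subfield[@code='a' or @code='b']$$,
     TRUE, FALSE, FALSE);

-- To restrict by indicator value, add a predicate on the datafield, for example:
--   //marc:datafield[@tag='222' and @ind2='0']/marc:subfield[@code='a']
```

Existing records only pick up a new index definition once they are reingested.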
Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it.
Text normalization. (n.d.). In Wikipedia. Retrieved April 25, 2018, from https://en.wikipedia.org/wiki/Text_normalization
We want to map our keyword index (id 15) to the NACO normalizer (id 1).
We would use the following SQL to map it:
INSERT INTO config.metabib_field_index_norm_map (field, norm) VALUES (15, 1);
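The ids used above can differ between installations, so it may be worth confirming them first. A sketch of two lookups, assuming the stock config.index_normalizer table and the pos column on the mapping table:

```sql
-- List the available normalizers and their ids.
SELECT id, name, func
  FROM config.index_normalizer
 ORDER BY id;

-- Show the normalizers already mapped to field 15, in the order they are applied.
SELECT m.field, m.norm, n.name, m.pos
  FROM config.metabib_field_index_norm_map m
  JOIN config.index_normalizer n ON n.id = m.norm
 WHERE m.field = 15
 ORDER BY m.pos;
```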
Adding almost everything from MODS does not mean we are adding all MARC tags.
Harry Potter and the half-blood prince Harry Potter and the half-blood prince Harry Potter and the half-blood prince Rowling, J. K. creator Dale, Jim. sound recording-nonmusical fiction eng sound recording sound disc 17 sound discs (ca. 72 min. each) : digital ; 4 3/4 in. Harry Potter and the Half-Blood Prince takes up the story of Harry Potter's sixth year at Hogwarts School of Witchcraft and Wizardry at this point in the midst of the storm of this battle of good and evil. juvenile by J.K. Rowling. Unabridged. Compact disc. Read by Jim Dale. Potter, Harry (Fictitious character) Fiction Wizards Fiction Magic Fiction Schools Fiction Hogwarts School of Witchcraft and Wizardry (Imaginary place) Fiction Potter, Harry (Fictitious character) Juvenile fiction Wizards Juvenile fiction Magic Juvenile fiction Schools Juvenile fiction Hogwarts School of Witchcraft and Wizardry (Imaginary place) Juvenile fiction Audiobooks PZ7.R79835 Harh 2005ab [Fic] 0307283674 YA 760ACD Random House/Listening Library Books on Tape WORP-MAIN WORP-MAIN ROWLING WORP-MAIN WORP-MAIN ROWLING WORP-MAIN WORP-MAIN ROWLING WORP-MAIN WORP-MAIN ROWLING TEFBT 050510 20130418154900.0 412192
To see how the title is indexed for record 412192, run the following SQL:
SELECT * FROM metabib.title_field_entry WHERE source = 412192;
'and':3A,24A,40A,44C,65C,81C 'at':19A,26A,60C,67C 'battl':78C 'battle':37A 'blood':7A,48C 'evil':41A,82C 'good':39A,80C 'half':6A,47C 'half-blood':5A,46C 'harri':42C,55C 'harry':1A,14A 'hogwart':61C 'hogwarts':20A 'in':29A,70C 'midst':31A,72C 'of':13A,22A,32A,35A,38A,54C,63C,73C,76C,79C 'point':28A,69C 'potter':2A,15A,43C,56C 'princ':49C 'prince':8A 's':16A,57C 'school':21A,62C 'sixth':17A,58C 'stori':53C 'storm':34A,75C 'story':12A 'take':50C 'takes':9A 'the':4A,11A,30A,33A,45C,52C,71C,74C 'this':27A,36A,68C,77C 'up':10A,51C 'witchcraft':23A,64C 'wizardri':66C 'wizardry':25A 'year':18A,59C
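The letters after each position are tsvector weight labels. In this output the A entries keep the words as entered, while the C entries are stemmed (wizardry becomes wizardri). That suggests two text search configurations are applied to the title class. A query along these lines should show which ones are in use:

```sql
-- Which text search configurations feed the title class, and at what weight?
SELECT field_class, ts_config, index_weight
  FROM config.metabib_class_ts_map
 WHERE field_class = 'title'
 ORDER BY index_weight;
```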
Query 1: User enters 'midsummer night's dream'
Query 2: User enters 'midsummer nights dream'
Query 1: User enters 'finnegans wake'
Query 2: User enters 'finnegan's wake'
UPDATE config.metabib_field_index_norm_map SET norm = 1 WHERE norm = 17;
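Before running a bulk change like this, it may help to see which indexes currently use normalizer 17 and would therefore be affected. A read-only sketch:

```sql
-- Preview the indexes currently mapped to normalizer 17.
SELECT f.id, f.field_class, f.name, m.norm, m.pos
  FROM config.metabib_field_index_norm_map m
  JOIN config.metabib_field f ON f.id = m.field
 WHERE m.norm = 17
 ORDER BY f.field_class, f.name;
```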
When configuring a search index, there is a weight field where you can specify how important this field is in relation to other fields of the same class.
In the All Searchable Fields Virtual Index, adjust the weights for fields that have been added.
All other fields are set to 1.
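The same weights can be inspected from the database. This sketch assumes the virtual index is field 15 (the keyword index id used earlier) and that the weights live in config.metabib_field_virtual_map, as in stock Evergreen 3.x. Verify both on your own system.

```sql
-- Fields feeding the All Searchable Fields virtual index, heaviest first.
SELECT r.id, r.field_class, r.name, m.weight
  FROM config.metabib_field_virtual_map m
  JOIN config.metabib_field r ON r.id = m.real
 WHERE m.virtual = 15
 ORDER BY m.weight DESC, r.field_class, r.name;
```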
Bib ranking for alexander hamilton keyword search
Popularity/bib relevance ranking for alexander hamilton keyword search
Add the following to your crontab:
30 4 * * * . ~/.bashrc && $EG_BIN_DIR/badge_score_generator.pl $SRF_CORE
Badges will begin calculating overnight!
Local Admin -> Statistical Popularity Badges
If you don’t have the web client running, you can add the badges directly in the database. All activity metric settings are located in the rating schema.
rating.popularity_parameter is where available parameters are stored.
rating.badge is where you configure badges that can be applied to records.
rating.record_badge_score is where the scores for each record are stored after the calculations are run.
A script is available - badge_score_generator.pl - to calculate all badge scores. Add this script to your crontab to run the script nightly.
You can also manually run a calculation for an individual badge by running the recalculate_badge_score(id) database function.
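A manual run might look like the sketch below. The badge id 1 is a placeholder, and the function is assumed to live in the rating schema alongside the tables described above.

```sql
-- Find the id of the badge to recalculate.
SELECT id, name, scope, weight
  FROM rating.badge
 ORDER BY id;

-- Recalculate a single badge (1 is a placeholder id).
SELECT rating.recalculate_badge_score(1);

-- Spot-check the highest-scoring records for that badge.
SELECT record, score
  FROM rating.record_badge_score
 WHERE badge = 1
 ORDER BY score DESC
 LIMIT 10;
```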
Name: This is the name that will display on the record detail page to the public.
Scope: If you select a branch or system here, the badges will only be applied to titles with copies owned by the selected branch or system. This badge will only affect results for searches that are scoped to that system or branch.
Weight: The weight a specific badge carries in relation to other badges that may be applied to the record. It will affect the total badge score that is calculated when two or more badges are earned for the record. It has no impact on records that earn 1 badge.
Fixed rating - used to apply a rating to any material that matches the badge's filters. This field should only be used for popularity parameters that have the require_percentile field set to False.
Discard count - drops the records with the lowest values before the percentile is applied.
Carries the same advantages as the holds over time metrics, but captures things that are currently popular.
You will not capture activity for things that were popular a couple of years ago or those that are consistently popular over time.
May want to use in place of a Holds Requested Over Time metric.
A holds metric is ideal for this starter badge.
If you think specific materials are not going to get a boost from the starter badge, create targeted badges for those collections.
Non-fiction materials (using either record attributes or copy location groups) may be one area that you need to focus on.
In multi-type consortia, for libraries that are a bit different than the majority of your libraries, consider badges for that specific org unit.
In an academic library, for example, you might target a particular high-use collection, like reserves, or use one of the circulation metrics or in-house use as a way to capture the activity in those libraries.
The 'most popular' ranking method sorts the results by badge score. Results that have the same badge score are then sorted by relevance.
A new global flag allows you to set the default sort method used by the catalog. If unset, the default sort will be relevance.
A new global flag allows you to determine how much weight (1.0 to 2.0) should be given to popularity in the popularity-adjusted relevance ranking method.
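Both flags are stored in config.global_flag. The flag names in this sketch are recalled from the Evergreen release notes and should be treated as assumptions. Confirm them against your own table before changing anything.

```sql
-- Assumed flag names; check config.global_flag if this returns no rows.
SELECT name, label, value, enabled
  FROM config.global_flag
 WHERE name IN ('opac.default_sort',
                'search.max_popularity_importance_multiplier');
```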
Dictionaries are used to eliminate words that should not be considered in a search (stop words), and to normalize words so that different derived forms of the same word will match. A successfully normalized word is called a lexeme.
“12.6. Dictionaries.” PostgreSQL Documentation, www.postgresql.org/docs/9.5/static/textsearch-dictionaries.html. Accessed 3/28/18.
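PostgreSQL's ts_lexize function shows what a single dictionary does with a single word. This sketch uses the built-in english_stem dictionary, which may not be the dictionary your Evergreen configuration uses.

```sql
SELECT ts_lexize('english_stem', 'stars');     -- {star}
SELECT ts_lexize('english_stem', 'the');       -- {} (stop word, discarded)
SELECT ts_lexize('english_stem', 'wizardry');  -- compare with 'wizardri' in the title index vector
```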
Stemming can help users find records they otherwise wouldn't have found because they entered the wrong variation of a word.
But it also has its drawbacks...
No longer retrieve false hits as a result of questionable variants
Users may not find what they are looking for because they entered the singular form of a word instead of the plural, or otherwise entered the wrong variant.
Instructions for creating a synonym dictionary, with a sample dictionary, are available at
https://wiki.evergreen-ils.org/doku.php?id=scratchpad:brush_up_search
From the command line:
cd /usr/share/postgresql/9.x/tsearch_data
sudo cp thesaurus_sample.ths thesaurus_masslnc.ths
psql -U evergreen -h localhost
CREATE TEXT SEARCH DICTIONARY public.thesaurus_masslnc (
TEMPLATE = thesaurus,
DictFile = thesaurus_masslnc,
Dictionary = public.english_nostop
);
CREATE TEXT SEARCH CONFIGURATION public.thesaurus_masslnc (copy=DEFAULT);
ALTER TEXT SEARCH CONFIGURATION public.thesaurus_masslnc
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
WITH thesaurus_masslnc;
INSERT INTO config.ts_config_list VALUES ('thesaurus_masslnc', 'MassLNC Thesaurus List');
INSERT INTO config.metabib_class_ts_map (field_class, ts_config, index_weight) VALUES
('keyword', 'thesaurus_masslnc', 'C'),
('title', 'thesaurus_masslnc', 'C'),
('subject', 'thesaurus_masslnc', 'C');
If you search skinnytaste, the system will find any records with skinnytaste or skinny taste.
skinny taste does not need to be a phrase for the record to be retrieved.
Searching skinny taste will find the Skinnytaste title. Searching "skinny taste" as a phrase will not retrieve that record.
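To check the new configuration before reingesting, you can run it against sample text. This is a sketch that assumes thesaurus_masslnc.ths maps skinnytaste to skinny taste; the actual file contents are not shown here. ts_lexize is not much use for testing a thesaurus because it treats its input as a single token, so to_tsvector and plainto_tsquery are used instead.

```sql
-- How the configuration indexes the single-word form.
SELECT to_tsvector('public.thesaurus_masslnc', 'skinnytaste');

-- How the configuration parses each form of the user's query.
SELECT plainto_tsquery('public.thesaurus_masslnc', 'skinnytaste');
SELECT plainto_tsquery('public.thesaurus_masslnc', 'skinny taste');
```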