DLF Forum 2013
METADATA FIRST
Using Structured Data Markup & the Google Custom Search API to Outsource Your Digital Collection Search Index
Jason Clark
@jaclark
Scott Young
@hei_scott
Kenning Arlitsch
Jason Clark
Patrick O'Brien
Scott Young
Today's Talk
Creating Indexable Content
For web search
Reusing Indexed Content
For local search
Creating Indexable Content
The Inside-out Library
"The challenge is not now only to improve local systems, it is to make library resources discoverable in other venues and systems, in the places where their users are having their discovery experiences."
— Lorcan Dempsey, September 2013
Duke University Library
"Discovery Turned Inside-out"
“It is imperative for libraries to ensure access to their content through search engines by engaging in the optimization of their content for higher SERP rankings.”
- Onaifo, D., & Rasmussen, D. (2013). Increasing libraries’ content findability on the web with search engine optimization. Library Hi Tech, 31(1), 87–108.
Digital Collections + Web Discovery
Donors and funding agencies want more accountability and demonstrated value.
Over 80% of students begin their research using internet search engines.
Improving web discovery via search engines leads to increased numbers of visitors & increased downloads.
Foundations of Indexable Content
A software tool that provides:
- Item pages at a stable resolvable URL
- Standards-based HTML(5) markup
- Structured Data Markup
- Navigable architecture with clear design
Traditional SEO
- Title tag & <meta> description
- Sitemaps & robots.txt alignment
- Server responses & error pages
- Google Analytics & Webmaster Tools
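A minimal sketch of the sitemap and robots.txt alignment item, assuming item pages live at stable URLs. The `example.org` URLs and the function names are hypothetical, chosen only for illustration; the sitemap namespace and the `Sitemap:` robots.txt directive come from the public sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML with one <url><loc> entry per item page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_robots_txt(sitemap_url):
    """Return a robots.txt that allows crawling and points bots at the sitemap."""
    return f"User-agent: *\nAllow: /\nSitemap: {sitemap_url}\n"


# Hypothetical item URLs (not the real collection paths)
item_urls = [
    "http://example.org/digital-collections/item/1",
    "http://example.org/digital-collections/item/2",
]
print(build_sitemap(item_urls))
print(build_robots_txt("http://example.org/digital-collections/sitemap.xml"))
```

Keeping both files generated from the same list of item URLs is one way to keep what the sitemap advertises and what robots.txt permits in agreement.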
Beyond Traditional SEO
Structured Data
schema.org
microdata/RDFa Lite
Semantic Components
Linked Data
Social Tags
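To make the structured data idea concrete, here is a rough sketch that renders an item's metadata as schema.org microdata. The record fields, the sample values, and the choice of the `CreativeWork` type are assumptions for illustration, not the markup used in the MSU collections.

```python
from html import escape


def item_to_microdata(record):
    """Render a metadata record as an HTML fragment with schema.org microdata.

    Each key in `record` is assumed to be a valid schema.org property name
    for CreativeWork (for example name, description, dateCreated).
    """
    props = "\n".join(
        f'  <span itemprop="{escape(key)}">{escape(value)}</span>'
        for key, value in record.items()
    )
    return (
        '<div itemscope itemtype="http://schema.org/CreativeWork">\n'
        f"{props}\n"
        "</div>"
    )


# Hypothetical item record
record = {
    "name": "Trout & Salmon of Montana",
    "description": "Photograph from a fisheries collection.",
    "dateCreated": "1952",
}
print(item_to_microdata(record))
```

Values are HTML-escaped so that characters such as `&` in titles do not break the item page markup that crawlers parse.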
“The scope of library discovery services continues to evolve. We might characterize the situation we are in now as full collection discovery.”
— Lorcan Dempsey, September 2013
Full Collection Discovery
Solr/Blacklight Advantages
Faceted Search
Flexible Results
Stable URLs
Contemporary Design
Solr/Blacklight Barriers
Development Time
GCS for Digital Collections
Enables local discovery by reusing web-scale index
Onramp to digital collections discovery layer
Efficient for libraries (both small and large)
GCS Advantages
Leverages an index already optimized for web search
Integration with leading commercial search engine
Faceted Browsing
Flexible Design
Search Analytics
API
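A minimal sketch of reusing the web-scale index locally through the Custom Search JSON API. The endpoint and the `key`, `cx`, `q`, and `start` parameters are part of Google's public API; the function names and the placeholder credentials are assumptions, and this is not the code from the talk.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, start=1):
    """Build a request URL for the Custom Search JSON API."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_results(response):
    """Reduce the API's JSON response to title, link, and snippet per hit."""
    return [
        {
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "snippet": item.get("snippet", ""),
        }
        for item in response.get("items", [])
    ]


def search(query, api_key, engine_id):
    """Run one query against the API (needs a real key and engine ID)."""
    with urlopen(build_search_url(query, api_key, engine_id)) as resp:
        return extract_results(json.load(resp))


# Placeholder credentials: replace with real values before calling search()
print(build_search_url("yellowstone maps", "YOUR_API_KEY", "YOUR_ENGINE_ID"))
```

The local search page then only has to render the list returned by `extract_results`, which is where the flexible design comes from: the index is Google's, the presentation is the library's.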
GCS Barriers
Development Time
API
Cost
Business Case for Outsourcing
GCS Efficiencies
1. Build indexable content for bots and humans
2. Reuse index locally
http://arc.lib.montana.edu/digital-collections/
For MSU Library, we generously estimated 20,000 queries a month against our search index, which works out to a cost of about $1,200 per year.
(Summon layer = 10,000 queries a month)
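The arithmetic behind the $1,200 estimate can be sketched as follows. The rate of $5 per 1,000 queries is an assumption that reproduces the slide's figure; it is not stated in the talk, and any free daily quota is ignored.

```python
def annual_api_cost(queries_per_month, price_per_thousand=5.00):
    """Estimate yearly API cost in dollars.

    price_per_thousand is an assumed rate (USD per 1,000 queries).
    """
    queries_per_year = queries_per_month * 12
    return queries_per_year / 1000 * price_per_thousand


print(annual_api_cost(20_000))  # 240,000 queries a year -> 1200.0
```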
Takeaways
The foundations of digital collections are built on interoperable indexable content (metadata first)
With rich metadata and structured markup, powerful and flexible discovery platforms such as GCS become available
[http://www.lib.montana.edu/~jason/files/digital-collections-custom-search-api.zip]
Questions
Jason Clark
@jaclark
Scott Young
@hei_scott