Visualizing Library Collections


Lessons and Challenges from the Artists' Book Collection,
Frick Fine Arts Library, University of Pittsburgh

Anna-Sophia Zingarelli-Sweet
LIS2975: Digital Scholarship
December 2013

The artists' book collection


  • Started at the Frick Fine Arts Library in the early 1990s
  • Contains 160+ one of a kind, limited edition, altered, and conceptual artists' books
 Jeffrey Morin, Sacred Space, 2002

Why Visualize?


  • The collection was developed under the auspices of the former Head Librarian, Ray Anne Lockard, now retiring
  • There is no formal collection policy
  • No other staff member has worked closely with developing  the collection or has specialized knowledge of artists' books
  • This project proposes that visualizing the existing collection can help identify its strengths and weaknesses, trends and focal points, to inform future collection development and potential establishment of a collection policy

Challenges


-- The only collection-specific inventory is in a word processing file with bibliographic data and a short description of each work. While of some use, it is incomplete, never updated, and riddled with basic errors
  • Inconsistent cataloging:
--Artists' books are cataloged by the ULS's English-language cataloger, who has no specialized knowledge or training. Consequently,  the cataloging varies widely and not all artists' books can be identified in PittCat by subject heading.
  • Bottom line:
There is no single place where the entire collection is identified or described as a coherent whole.

Creation of the Data Set

  • Trial and error!
  • Found no satisfactory way to automate the export of bibliographic data to a spreadsheet from the library's OPAC
  • First try: Exported data as *.csv from OPAC search result table
  • Exported only OCLC numbers, not bibliographic display
 ...not too useful.


  • Ultimately saved bibliographic data from subject heading searches as *.txt files, then compared to existing inventory
  • Imported to Excel
  • Manually reformatted into desired columns



  • Identified 167 artists' books
  • May still be incomplete, due to cataloging inconsistencies
  • Data set includes title, author, publisher, date of publication, place of publication, and up to 9 Library of Congress Subject Headings
  • Approximately 70 books identified using subject search "Artists' book" + location search "Fine Arts"
  • Remaining books identified using existing inventory; subject headings were then added by searching the OPAC, revealing that many books were not classified with LCSH "Artists' book", but with geographically specific and other subject headings

Recommendations

  • As digital tools develop further to add value and meaning to collections, libraries should make bibliographic data more easily exportable to file formats that can be used to visualize and explore (eg *.csv, *.xlsx, *.gexf)
  • Special collections should engage in cataloging practices that more effectively and consistently "tag" items to be searched as a coherent collection (eg use "Artists' book" as primary subject heading for all collection items before geographically subdividing)

Visualization goals

  • Identify tools that would reveal hidden relationships and trends within the collection items

  • mapping publication locations
  • subject/keyword word clouds
  • author relationship networks
  • timeline

  • Seed ideas for a digital exhibition of works from the collection

  • Secondary goal: to demonstrate value of digital tools for traditional library functions (like collection development) to library leadership

GEPHI

  • First turned to Gephi due to positive feedback from peers
  • Downloaded free version 0.8.2-beta
  • Attempted to upload as *.csv (converted from *.xlsx spreadsheet), but Gephi interpreted each word in the document as a separate node, resulting in thousands of decontextualized nodes
  • Not apparent how my data could be transformed into other formats supported by Gephi (GDF, GML, GEXF)
  • May be usable with additional time spent marking up data and watching tutorials, but was not easily accessible for the typical humanities librarian

ManyEyes

  • An IBM Research tool
  • Web based; not compatible with all browsers
  • Offers numerous visualization templates to be used with uploaded data
  • All datasets and visualizations are public, can be reused
  • Data can be cut and pasted from any table format into ManyEyes template - very user friendly
  • Not flexible with all types of data (eg dates are read as quantities: "2,013" instead of 2013)
  • Does not readily handle even moderately large data sets; was difficult to visualize 167 titles, 115 authors
  • Relies on Java plugin that crashes frequently


  • Uploaded 3 datasets with variations of my spreadsheet
  • Used to create 8 visualizations, of varying utility
  • Conveniently accessible in personal online profile


Growth of Collection By Date

  • Scatter graph plotting the number of works by date
  • Each dot is a title; mouse over to view title and date
  • First item is 1967; most recent is 2013
  • Suggests that collection is stronger in recent works

  • Cannot zoom in to view subtle changes in overall trend
  • Dates are treated as quantities, causing confusion

Prominence of Author + Location

  • Size of bubble corresponds to number of author's work in collection (sort of...)

  • Since dates are treated as quantities, the size of the bubble reflects the sum of the dates, rather than the actual number of instances. It still successfully identifies more prominent authors, since those with more items in the collection have larger date sums, but is confusing and over-represents more recent authors
  • Color of bubble corresponds to publication location. This is suggestive of potential geography-based author networks, and reveals when an author has published in more than one place


Treemap hierarchy: Date and Location

  • Each square represents an item
  • Full bibliographic data is revealed by hovering over squares
  • Collocated by place of publication
  • Color represents date of publication; lighter color indicates older, darker color indicates more recent
  • Indirectly displays prominence of publication location
  • Could show trends in location of publishing (eg works from New York are older; works from Portland are more recent)


Treemap Hierarchy: Author and Title

  • Collocated by author
  • Another method of showing author prominence
  • Subdivided by title
  • Full bibliographic data revealed by hovering over each square


Matrix Chart: Location

  • In this iteration, shows prominence of each location and the authors associated with location; can also show the publishers associated with location
  • Particularly unwieldy for a large dataset

MAPS: Place of Publication (Us)

  • Shows in which states collection items have been published
  • Was only able to choose one representative item for each state - all subsequent items from an already-selected state had to be manually "ignored"
  • Time consuming to comb through all location data to identify new states
  • Does not reflect relative prominence of each state or collocate the various titles from each state
  • World maps and maps of some states/provinces by county also available


Word Cloud 1


  • Word cloud compiled from primary subject headings
  • Not manipulated
  • Unable to treat each subject heading as a coherent phrase; split up into 2-word phrases
  • Reveals most common primary subject headings, but reveals almost nothing about topical content of the items
  • A few themes do emerge (eg "September 11")
  • Searchable


Word Cloud 2

  • Drawn from all 9 columns of subject headings
  • Heavily manipulated to remove most common, non-descriptive headings and attempt to reveal topical content
  • "Banned words": and, art, Art, Artists, books, books--New, books--Specimens, books--United, in, of, State, States, States--Specimens, the, to
  • Also treats each word separately, not entire heading
  • Not searchable or zoomable


Click2Map

  • Free service that enables plotting locations from a file onto a Google map
  • Not visually appealing
  • Does not allow for association of other data with location points
  • Was not readily able to find other similar services
  • Easy to upload: cut and pasted location column into a new spreadsheet and saved as *.csv
  • Uploaded directly to map editor; automatically assigned latitude and longitude data
  • Searchable and zoomable
  • Free to create and download, but costly to publish online


SHODOR

  • Simple histogram generator
  • Can be printed, but not published online
  • Shows frequency of publication date:

Conclusions

  • Despite proliferation of free online tools, few are accommodating to the types of data and relationships relevant to analyzing library collections and accessible to average humanities librarians
  • Even these flawed visualizations, however, hint at the possibilities of visualizing this kind of data
  • Existing tools should be made more flexible to accommodate dates, key phrases, and larger data sets
  • Possible market for new specialized tools geared towards librarians and library collections
Made with Slides.com