Web Archiving Update

MIT Libraries Collections Directorate Winter 2018 Meeting

Joe Carrano | Digital Archivist | IASC

2018-12-06

What is Web Archiving anyway?

the process of collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use.

~ International Internet Preservation Consortium

Archive-It

  • Subscription suite from the Internet Archive
  • Good for most websites, especially text-based
  • Can set up automated crawls

Webrecorder

  • Free tool from Rhizome
  • Better for capturing dynamic websites and social media
  • Have to do most of the capture manually at this point

Where we were

  • Pilot project 2016-2017
  • Small number of seeds and individual captures
  • Had not begun systematic collecting

Where to start?

  1. We need to prioritize
  2. We need to know about websites to crawl
  3. We need to appraise

We need to prioritize

  • Focus on collecting the archival records, i.e., the records of the Institute
  • Most of these are found on the mit.edu domain
  • Begin in areas with existing archival collections

We need to know about websites to crawl

Got a list of websites!

There were 300+ of them

We need to appraise

  • Determine what the Internet Archive is doing already
  • Look at list to see which can align with EDISJ values
  • Look at list to determine which sites represent unique information about activities at the Institute
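
One way to approach the first appraisal step (seeing what the Internet Archive already holds) is to query the Wayback Machine availability endpoint for each site on the list. The sketch below is illustrative only and is not the workflow used in this project: the helper names (`build_query`, `closest_capture`, `check_site`) are hypothetical. The endpoint reports only the single closest capture, so it says nothing about how often or how completely a site has been crawled.

```python
import json
import urllib.parse
import urllib.request

# Wayback Machine availability endpoint (returns JSON).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(site_url):
    """Build the availability query URL for one site (hypothetical helper)."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": site_url})


def closest_capture(response):
    """Return the timestamp of the closest capture, or None if there is none.

    `response` is the decoded JSON from the availability endpoint.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("timestamp")
    return None


def check_site(site_url, timeout=10):
    """Ask the Wayback Machine whether it holds any capture of site_url."""
    with urllib.request.urlopen(build_query(site_url), timeout=timeout) as reply:
        return closest_capture(json.load(reply))
```

Running `check_site` over the full list would flag sites with no capture at all. Those are candidates for closer review, though the result is only a rough first pass.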

Where we are

  • 61 active seeds, 926 GB since 2016 (315 GB since August 2018)
  • Finalizing metadata application profile

Where we're going

  • Describing seeds
  • Open public access in late spring
  • Start collecting sites from student groups

Web Archives Update

By jcarrano
