Web Archiving Update
MIT Libraries Collections Directorate Winter 2018 Meeting
Joe Carrano | Digital Archivist | IASC
2018-12-06
What is Web Archiving anyway?
"...the process of collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use."
~ International Internet Preservation Consortium
Archive-It
- Subscription suite from the Internet Archive
- Good for most websites, especially text-based ones
- Can set up automated crawls
Webrecorder
- Free tool from Rhizome
- Better for capturing dynamic websites and social media
- Have to do most of the capture manually at this point
Where we were
- Pilot project 2016-2017
- Small number of seeds and individual captures
- Had not begun systematic collecting
Where to start?
- We need to prioritize
- We need to know about websites to crawl
- We need to appraise
We need to prioritize
- Focus on collecting archival records, i.e., the records of the Institute
- Most of these are found on the mit.edu domain
- Begin in areas with existing archival collections
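As a rough illustration of the domain-based prioritization above, a candidate list could be split into sites on the mit.edu domain and everything else. This is only a sketch, not the workflow actually used: the function names are hypothetical, and it assumes every candidate is a full URL with a scheme (https://...).

```python
from urllib.parse import urlsplit


def is_institute_site(url, domain="mit.edu"):
    """Return True when the URL's host is the domain itself or a subdomain of it."""
    # Assumption: url includes a scheme; without one, urlsplit finds no hostname.
    host = (urlsplit(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def split_candidates(urls, domain="mit.edu"):
    """Split candidate URLs into (on-domain, off-domain) lists, preserving order."""
    on_domain = [u for u in urls if is_institute_site(u, domain)]
    off_domain = [u for u in urls if not is_institute_site(u, domain)]
    return on_domain, off_domain
```

Matching on the parsed hostname rather than on the raw string keeps look-alike hosts such as notmit.edu out of the on-domain list.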
We need to know about websites to crawl
- Got a list of websites!
- There were 300+ of them
We need to appraise
- Determine what the Internet Archive is already capturing
- Review the list to see which sites align with EDISJ values
- Review the list to determine which sites represent unique information about activities at the Institute
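One possible way to get a first signal on the Internet Archive's existing coverage is its public Wayback Machine availability endpoint, which reports the closest capture it holds for a URL. The sketch below is an assumption about how such a check could be scripted, not a description of the appraisal actually performed; the function names are hypothetical. It reports only whether a single closest capture exists, so it says nothing about how deep or how frequent the Internet Archive's crawls are.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(site):
    """Build the availability query URL for one site."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": site})


def closest_snapshot(payload):
    """Return (timestamp, snapshot_url) from a parsed response, or None if no capture."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def check_site(site, timeout=30):
    """Query the endpoint over the network and return the closest capture, if any."""
    with urlopen(availability_url(site), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling check_site on each entry in the list would give a quick captured / not captured column to sit alongside the other appraisal criteria.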
Where we are
- 61 active seeds, 926 GB collected since 2016 (315 GB of that since August 2018)
- Finalizing metadata application profile
Where we're going
- Describing seeds
- Open public access in late spring
- Start collecting sites from student groups