Curation of the End-of-Term Web Archive

Description:

Paper for the 2011 IS&T Archiving Conference. This paper discusses the Classification of the End-of-Term Archive research project at the University of North Texas.

Creator(s):
Creation Date: 2011
Partner(s):
UNT Libraries
Collection(s):
UNT Scholarly Works
Creator (Author):
Murray, Kathleen R.

University of North Texas

Creator (Author):
Ko, Lauren

University of North Texas

Creator (Author):
Phillips, Mark Edward

University of North Texas

Publisher Info:
Place of Publication: [Springfield, Virginia]
Degree:
Department: Libraries
Note:

Abstract: The Classification of the End-of-Term Archive research project at the University of North Texas Libraries is investigating the feasibility of machine-generated classification of websites in the 16-terabyte End-of-Term (EOT) Web Archive. The research is being conducted concurrently in two areas: Archive Classification and Web Archive Metrics. A set of 1,151 URLs within the EOT Archive was analyzed using link analysis methods to identify related groupings or clusters. Investigations into visualization of the underlying relationships among the URLs were also conducted. Subject Matter Experts (SMEs) in the classification of government information manually classified the same set of URLs using the Superintendent of Documents (SuDocs) Classification Numbering System, a hierarchical scheme that groups government publications by federal agency. The SME classification will serve as the criterion for evaluating the effectiveness of the link analysis. In a parallel work area of the project, metrics for Web archives were discussed in a focus group with the SMEs, who identified key criteria libraries would likely employ in acquiring materials from Web archives. Participants also identified two service models libraries will need from Web archive service providers: acquisition and access models. A subsequent survey of Federal Depository Libraries measured the demand for each of these models, as well as libraries' perceived capabilities to support long-term preservation and local hosting of materials from Web archives. It appears that some existing library metrics and, more importantly, standard usage statistics will be essential metrics for Web archives.

Physical Description:

6 p.

Language(s):
Subject(s):
Keyword(s): web archives | digital libraries | metadata | link analyses | government documents
Source: IS & T--the Society for Imaging Science and Technology Archiving Conference, 2011, Salt Lake City, Utah, United States
Identifier:
  • ARK: ark:/67531/metadc36301
Resource Type: Paper
Format: Text
Rights:
Access: Public