Analyzing the Persistence of Referenced Web Resources with Memento Metadata

Metadata describes a digital item, providing (if known) such information as creator, publisher, contents, size, relationship to other resources, and more. Metadata may also contain "preservation" components that help us to maintain the integrity of digital files over time.

Title

  • Main Title: Analyzing the Persistence of Referenced Web Resources with Memento

Creator

  • Author: Sanderson, Robert
    Creator Type: Personal
    Creator Info: Los Alamos National Laboratory
  • Author: Phillips, Mark Edward
    Creator Type: Personal
    Creator Info: University of North Texas
  • Author: Van de Sompel, Herbert
    Creator Type: Personal
    Creator Info: Los Alamos National Laboratory

Date

  • Creation: 2011-06

Language

  • English

Description

  • Content Description: Paper presenting the results of a study into the persistence and availability of web resources referenced from papers in scholarly repositories.
  • Physical Description: 4 p.

Subject

  • Keyword: experimentation
  • Keyword: digital preservation
  • Keyword: repositories
  • Keyword: web persistence

Source

  • Conference: Sixth Annual International Conference on Open Repositories, 2011, Austin, Texas, United States

Relation

  • Is Version Of: Analyzing the Persistence of Referenced Web Resources with Memento [Presentation], ark:/67531/metadc83793

Collection

  • Name: UNT Scholarly Works
    Code: UNTSW

Institution

  • Name: UNT Libraries
    Code: UNT

Rights

  • Rights Access: public

Resource Type

  • Paper

Format

  • Text

Identifier

  • Archival Resource Key: ark:/67531/metadc39318

Degree

  • Academic Department: Digital Projects Unit

Note

  • Display Note: Abstract: In this paper we present the results of a study into the persistence and availability of web resources referenced from papers in scholarly repositories. Two repositories with different characteristics, arXiv and the UNT Digital Library, are studied to determine whether the nature of the repository, or of its content, has a bearing on the results. Memento makes it possible to automate the discovery of archived resources and to consider the time between the publication of the research and the archiving of the referenced URLs. This automation allows us to process more than 160,000 URLs, the largest known such study, and the repository metadata allows the results to be considered by discipline. The results are startling: 45% (66,096) of the URLs referenced from arXiv still exist but are not preserved for future generations, and 28% of the resources referenced by UNT papers have been lost. Moving forward, we provide some initial recommendations, including that repositories should publish URL lists extracted from papers, which could be used as seeds for web archiving systems.