Search Results

open access

Web Archive Profiling Via Sampling Final Report

Description: This report covers the results, deliverables, and ongoing status of the International Internet Preservation Consortium (IIPC) funded project "Web Archive Profiling Via Sampling" with links to code, datasets, presentations, and papers as appropriate.
Date: September 16, 2016
Creator: Alam, Sawood; Nelson, Michael L.; Van de Sompel, Herbert; Balakireva, Lyudmila; Shankar, Harihar; Bornand, Nicolas J. et al.
Partner: International Internet Preservation Consortium

Harvesting Democracy: Archiving Federal Government Web Content at End of Term

Description: This presentation provides an overview of the End of Term Presidential web archive. This overview will help law librarians understand the history and scope of the "End of Term Web Archive" project so they can help select the right sites to harvest, make profitable use of the preserved materials, and more fully appreciate the urgent need for preserving web-based government information.
Date: June 17, 2016
Creator: Bailey, Jefferson; Grotke, Abigail & Phillips, Mark Edward
Partner: UNT Libraries Digital Projects Unit

Building Specialized Collections from Web Archives

Description: Presentation given at the Artificial Intelligence for Data Discovery and Reuse (AIDR) 2019 conference in Pittsburgh, Pennsylvania. This presentation discusses work on creating datasets of high-value publications and documents from web archives that can be used for machine learning research to help classify these large collections of data.
Date: May 2019
Creator: Caragea, Cornelia & Phillips, Mark Edward
Partner: UNT Libraries

How to Fit In? Integrating a web archiving program in your organization

Description: This presentation combines session slides from the 2012 International Internet Preservation Consortium-sponsored workshop "How to fit in? Integrating a web archiving program in your organization." The workshop aims to investigate the challenges and methods involved in implementing web archiving in all mainstream activities of a heritage institution.
Date: November 26, 2012
Creator: Derrot, Sophie; Oury, Clément; Rives, Caroline; Jacquet, Françoise; Lorthios, Annick; Sablonnière, Marguerite et al.
Partner: International Internet Preservation Consortium

Leveraging Machine Learning to Extract Content-Rich Publications from Web Archives

Description: Poster presented at the 2019 Texas Conference on Digital Libraries (TCDL-2019). This poster discusses ways of identifying content-rich documents among the wealth of materials available via web archives. This research attempts to answer the following two research questions: 1. What role do web-published documents and publications play in developing collections in the broad categories of institutional repositories, state government documents, and publications from the federal government? …
Date: May 22, 2019
Creator: Fox, Nathaniel T. & Phillips, Mark Edward
Partner: UNT Libraries
open access

Facing the Challenge of Web Archives Preservation: the Role and Work of the IIPC Preservation Working Group

Description: This paper documents the results of a survey about the current state of preservation in International Internet Preservation Consortium (IIPC) member web archives.
Date: October 2014
Creator: Goethals, Andrea; Oury, Clément; Pearson, David; Sierman, Barbara & Steinke, Tobias
Partner: International Internet Preservation Consortium
open access

2008 Member Profile Survey Results

Description: This report contains information on International Internet Preservation Consortium (IIPC) member institutions' web archiving activities and their contributions to the consortium.
Date: December 16, 2008
Creator: Grotke, Abigail
Partner: International Internet Preservation Consortium

Labeled PDF Dataset from End of Term (EOT) 2008 Web Archive

Description: This dataset contains a random sample of 2000 PDF documents from the usda.gov domain in the End of Term (EOT) 2008 Web Archive. These samples were categorized as being of interest for possible inclusion in the Technical Report Archive and Image Library (TRAIL). Each PDF has been sorted into one of two categories, Technical_Report and Not_Technical_Report.
Date: July 2018
Creator: Kirkwood, Patricia; Phillips, Mark Edward & Caldwell, Christopher
Partner: UNT Libraries
open access

Putting it all together: creating a unified web harvesting workflow at the Bibliothèque nationale de France

Description: This article presents the complete web harvesting workflow at the Bibliothèque nationale de France for the International Internet Preservation Consortium-sponsored workshop "How to fit in? Integrating a web archiving program in your organisation."
Date: November 2012
Creator: Le Follic, Annick; Stirling, Peter & Wendland, Bert
Partner: International Internet Preservation Consortium
open access

A Vision of the Role and Future of Web Archives

Description: This text was presented at the 2012 General Assembly of the International Internet Preservation Consortium, and appears as a three-part blog post in The Signal, a blog hosted by the Library of Congress. This text discusses the role and future of web archives.
Date: 2012
Creator: Leetaru, Kalev H.
Partner: International Internet Preservation Consortium
open access

Web Harvesting Survey

Description: This report contains a survey of the conditions found on web sites that influence the harvesting of content and the quality of an archival crawl.
Date: July 2004
Creator: Marill, Jennifer; Boyko, Andrew; Ashenfelder, Michael & Jones, Gina
Partner: International Internet Preservation Consortium
open access

Harvesting Practices Report

Description: This report summarizes the results of the International Internet Preservation Consortium (IIPC) Harvesting Practices Survey, developed to understand, analyze, and collate the current Internet archiving processes and experiences among IIPC members.
Date: June 10, 2011
Creator: Mayr, Michaela
Partner: International Internet Preservation Consortium
open access

Researchers and the Future(s) of Web Archives

Description: Report draft for the 2011 International Internet Preservation Consortium General Assembly. This report discusses ways in which web archives can be used by researchers in the future.
Date: May 9, 2011
Creator: Meyer, Eric T.; Thomas, Arthur & Schroeder, Ralph
Partner: International Internet Preservation Consortium
open access

Web Archives: The Future(s)

Description: This report aims to stimulate further discussion among web archivists and researchers about the future ways in which web archives can be used by researchers.
Date: June 30, 2011
Creator: Meyer, Eric T.; Thomas, Arthur & Schroeder, Ralph
Partner: International Internet Preservation Consortium

Needs Assessment Toolkit

Description: This presentation discusses the needs assessment toolkit created for the Web-at-Risk project. This presentation outlines the details related to the web archive development process and the activities related to the needs assessment.
Date: May 2005
Creator: Murray, Kathleen R.
Partner: UNT Libraries