Leveraging Machine Learning to Extract Content-Rich Publications from Web Archives

Description

Poster presented at the 2019 Texas Conference on Digital Libraries (TCDL-2019). This poster discusses ways of identifying content-rich documents among the wealth of materials available via web archives. The research attempts to answer the following two research questions: 1. What role do web-published documents and publications play in developing collections in the broad categories of institutional repositories, state government documents, and publications from the federal government? 2. What are the characteristics of web-published documents and publications that help content selectors identify them for inclusion in their local collection?

Physical Description

1 p. : ill.

Creation Information

Fox, Nathaniel T. & Phillips, Mark Edward. May 22, 2019.

Context

This presentation is part of the collection entitled: UNT Scholarly Works and was provided by the UNT Libraries to the UNT Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 98 times. More information about this presentation can be viewed below.

Who

People and organizations associated with either the creation of this presentation or its content.

Provided By

UNT Libraries

The UNT Libraries serve the university and community by providing access to physical and online collections, fostering information literacy, supporting academic research, and much, much more.

What

Descriptive information to help identify this presentation. Follow the links below to find similar items on the Digital Library.

Source

  • Texas Conference on Digital Libraries (TCDL-2019), May 20-23, 2019. Austin, Texas

Language

Item Type

Identifier

Unique identifying numbers for this presentation in the Digital Library or other systems.

Relationships

Collections

This presentation is part of the following collection of related materials.

UNT Scholarly Works

Materials from the UNT community's research, creative, and scholarly activities and UNT's Open Access Repository. Access to some items in this collection may be restricted.

Related Items

Leveraging Machine Learning to Extract Content-Rich Publications from Web Archives (Presentation)

Leveraging Machine Learning to Extract Content-Rich Publications from Web Archives

Presentation for the 2019 International Internet Preservation Consortium General Assembly and Web Archiving Conference. This presentation discusses research into leveraging machine learning to identify PDFs relevant to a collection from archived records.

Leveraging Machine Learning to Extract Content-Rich Publications from Web Archives - ark:/67531/metadc1608972

When

Dates and time periods associated with this presentation.

Creation Date

  • May 22, 2019

Added to The UNT Digital Library

  • Aug. 27, 2019, 2:15 p.m.

Description Last Updated

  • Nov. 13, 2025, 9:48 a.m.

Usage Statistics

When was this presentation last used?

Yesterday: 0
Past 30 days: 2
Total Uses: 98

Fox, Nathaniel T. & Phillips, Mark Edward. Leveraging Machine Learning to Extract Content-Rich Publications from Web Archives, presentation, May 22, 2019; (https://digital.library.unt.edu/ark:/67531/metadc1533639/: accessed April 13, 2026), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu.
