Leveraging Machine Learning to Extract Content-Rich Publications from Web Archives
Description
Poster presented at the 2019 Texas Conference on Digital Libraries (TCDL-2019). This poster discusses ways of identifying content-rich documents among the wealth of materials available via web archives. The research attempts to answer the following two questions: 1. What role do web-published documents and publications play in developing collections in the broad categories of institutional repositories, state government documents, and publications from the federal government? 2. What are the characteristics of web-published documents and publications that help content selectors identify them for inclusion in their local collection?
Leveraging Machine Learning to Extract Content-Rich Publications from Web Archives - ark:/67531/metadc1608972
Collections
This presentation is part of the following collection of related materials.
UNT Scholarly Works
Materials from the UNT community's research, creative, and scholarly activities and UNT's Open Access Repository. Access to some items in this collection may be restricted.
Presentation for the 2019 International Internet Preservation Consortium General Assembly and Web Archiving Conference. This presentation discusses research into leveraging machine learning to identify PDFs relevant to a collection from archived records.