Classification Of The End-Of-Term Archive: Extending Collection Development Practices To Web Archives Page: 2
37 p.
IMLS Award Number LG-06-09-0174-09
Top-Level Domain   # URLs        # Unique Sub-domains
.gov               137,780,023   14,338
.com               7,805,205     57,873
.org               5,107,552     29,798
.mil               3,554,956     1,677
.edu               3,551,845     13,856

Table 1. Number of URLs & subdomains by top-level domain
Table 2. EOT Archive mime-types by number of files
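
Counts like those in Table 1 could be tallied from a list of archived URLs roughly as sketched below. This is a minimal illustration, not the project's actual method. It assumes that a "unique sub-domain" means a distinct full host name, which the report does not define on this page. The function name `tally_by_tld` and the sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def tally_by_tld(urls):
    """Return {tld: (url_count, unique_host_count)} for an iterable of URLs.

    Assumption: each distinct host name counts as one "unique sub-domain".
    """
    url_counts = defaultdict(int)
    hosts = defaultdict(set)
    for url in urls:
        host = urlsplit(url).hostname  # lowercased host, or None if absent
        if not host or "." not in host:
            continue  # skip entries without a usable host name
        tld = "." + host.rsplit(".", 1)[1]
        url_counts[tld] += 1
        hosts[tld].add(host)
    return {tld: (url_counts[tld], len(hosts[tld])) for tld in url_counts}


# Hypothetical sample URLs, for illustration only.
sample = [
    "http://www.example.gov/a.html",
    "http://www.example.gov/b.html",
    "http://data.example.gov/",
    "http://www.example.mil/index.html",
]
print(tally_by_tld(sample))  # {'.gov': (3, 2), '.mil': (1, 1)}
```

Keeping a set of host names per top-level domain makes the unique count a simple set size, at the cost of holding every distinct host in memory.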
As initially planned, this two-year project comprised two work areas: (1) Archive Classification and
(2) Web Archive Metrics. A no-cost extension for the project was granted for the period December 1, 2011
through November 30, 2012. Two additional areas of work were planned for this time period: (3)
Improving Access to the EOT Archive and (4) Researcher Needs Assessment.
The activities of the project were carried out in four areas: Archive Classification, Web Archive Metrics,
Improving Access to the EOT 2008 Archive, and Researcher Needs Assessment. The key activities in each
area are described in the remainder of this section. Further details about the work conducted, as well as
the findings and accomplishments, are described in the sections that follow.
Work Area 1 - Archive Classification
Classification of the EOT 2008 Archive involved structural analysis and human analysis. Link analysis,
cluster analysis, and visualization techniques identified the organizational and relational structure of the
EOT Archive and produced clusters of related websites from a representative set of the Archive's URLs. The
project's subject matter experts (SMEs) classified the same set of URLs according to the SuDocs
Classification Scheme using a Web-based application developed by project staff. The resulting classification

Mime-Type          # Files
text/html          105,590,929
image/jpeg         13,665,196
image/gif          13,031,046
application/pdf    10,320,163
Hartman, Cathy Nelson; Murray, Kathleen R. & Phillips, Mark Edward. Classification Of The End-Of-Term Archive: Extending Collection Development Practices To Web Archives, report, February 2013; (https://digital.library.unt.edu/ark:/67531/metadc152437/m1/4/: accessed April 19, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu.