Date: June 2012
Creator: Murray, Kathleen R. & Hartman, Cathy Nelson
Description: This paper discusses the Classification of the End-of-Term Archive project. Abstract: For users, selecting relevant content from Web archives is often a daunting endeavor. Classification of the End-of-Term Archive, a research project funded by the Institute of Museum and Library Services (IMLS), investigated whether link analysis and cluster analysis were effective techniques for classifying the materials in the End-of-Term (EOT) Archive to improve discovery. Classification of the resulting clusters by subject matter experts in government information indicated that the structural analysis was not effective at creating clusters of related websites when authored by four or fewer federal government parent agencies. The results also suggested that cluster analysis might be effective at identifying topically related websites across agency authors, which would be highly desirable to both system developers and users. To investigate this, subject matter experts applied subject tags to the websites in two sets of machine-generated clusters. The findings indicate that the cluster analysis successfully identified strongly related content in 61% of clusters.
Contributing Partner: UNT Libraries