The UNT College of Engineering strives to educate and train engineers and technologists who have the vision to recognize and solve the problems of society. The college comprises six degree-granting departments of instruction and research.
Abstract: With the exponential increase in the number of documents available online, e.g., news articles, weblogs, scientific documents, effective and efficient classification methods are required in order to deliver the appropriate information to specific users or groups. The performance of document classifiers critically depends, among other things, on the choice of the feature representation. The commonly used "bag of words" representation can result in a large number of features. Feature abstraction helps reduce a classifier input size by learning an abstraction hierarchy over the set of words. A cut through the hierarchy specifies a compressed model, where the nodes on the cut represent abstract features. In this paper, we compare feature abstraction with two other methods for dimensionality reduction, i.e., feature selection and Latent Dirichlet Allocation (LDA). Experimental results on two data sets of scientific publications show that classifiers trained using abstract features significantly outperform those trained using features that have the highest average mutual information with the class, and those trained using the topic distribution and topic words output by LDA. Furthermore, we propose an approach to automatic identification of a cut in order to trade off the complexity of classifiers against their performance. Our results demonstrate the feasibility of the proposed approach.
This paper is part of the following collection of related materials.
UNT Scholarly Works
Materials from the UNT community's research, creative, and scholarly activities and UNT's Open Access Repository. Access to some items in this collection may be restricted.
Caragea, Cornelia; Silvescu, Adrian; Kataria, Saurabh; Caragea, Doina & Mitra, Prasenjit.Classifying Scientific Publications Using Abstract Features,
paper,
2011;
[Palo Alto, California].
(https://digital.library.unt.edu/ark:/67531/metadc181686/:
accessed April 19, 2024),
University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu;
crediting UNT College of Engineering.