Combining Hashing and Abstraction in Sparse High Dimensional Feature Spaces

Description

Article on combining hashing and abstraction in sparse high dimensional feature spaces.

Physical Description

7 p.: ill.

Creation Information

Caragea, Cornelia; Silvescu, Adrian & Mitra, Prasenjit 2012.

Context

This paper is part of the collection entitled UNT Scholarly Works and was provided by the UNT College of Engineering to the Digital Library, a digital repository hosted by the UNT Libraries.

Who

Authors

Caragea, Cornelia; Silvescu, Adrian & Mitra, Prasenjit

Provided By

UNT College of Engineering

The UNT College of Engineering promotes intellectual and scholarly pursuits in the areas of computer science and engineering, preparing innovative leaders in a variety of disciplines. The UNT College of Engineering encourages faculty and students to pursue interdisciplinary research among numerous subjects of study including databases, numerical analysis, game programming, and computer systems architecture.

What

Notes

Copyright © Association for the Advancement of Artificial Intelligence. http://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/5087

Abstract: With the exponential increase in the number of documents available online, e.g., news articles, weblogs, and scientific documents, effective and efficient classification methods are needed. The performance of document classifiers critically depends, among other things, on the choice of feature representation. The commonly used "bag of words" and n-gram representations can result in prohibitively high-dimensional input spaces, and data mining algorithms applied to these input spaces may be intractable because of the large number of dimensions. Thus, dimensionality reduction algorithms that can process data into features fast at runtime, ideally in constant time per feature, are greatly needed in high-throughput applications, where the number of features and data points can be on the order of millions. One promising line of research in dimensionality reduction is feature clustering. We propose to combine two types of feature clustering, namely hashing and abstraction based on hierarchical agglomerative clustering, in order to take advantage of the strengths of both techniques. Experimental results on two text data sets show that the combined approach uses a significantly smaller number of features and gives similar performance when compared with the "bag of words" and n-gram approaches.
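The abstract names the two ingredients without spelling them out. As a rough illustration only, the following minimal Python sketch shows the general idea under stated assumptions. First, tokens are hashed into a fixed number of buckets (the "hashing trick"). Then, the hashed features are grouped into abstractions by bottom-up agglomerative clustering of their per-class count distributions. The merge criterion (a count-weighted Jensen-Shannon divergence), the use of `zlib.crc32` as the hash, and the names `hash_features`, `fit_abstractions`, and `transform` are all hypothetical choices made for illustration. They are not taken from the paper.

```python
import math
import zlib
from collections import Counter


def hash_features(tokens, num_buckets):
    """Hashing trick: map tokens to a sparse {bucket: count} (collisions allowed)."""
    vec = Counter()
    for tok in tokens:
        # crc32 is deterministic across runs, unlike Python's salted str hash()
        vec[zlib.crc32(tok.encode("utf-8")) % num_buckets] += 1
    return vec


def _kl(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with p_i = 0 contribute 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def _weighted_js(ci, cj):
    """Count-weighted Jensen-Shannon divergence between two class-count vectors.

    Assumed (illustrative) merge criterion: clusters with similar class
    distributions, and little total mass, are cheap to merge.
    """
    ni, nj = sum(ci), sum(cj)
    n = ni + nj
    wi, wj = ni / n, nj / n
    p = [c / ni for c in ci]
    q = [c / nj for c in cj]
    m = [wi * a + wj * b for a, b in zip(p, q)]
    return n * (wi * _kl(p, m) + wj * _kl(q, m))


def _agglomerate(bucket_counts, k):
    """Bottom-up clustering of hashed features into k abstractions.

    bucket_counts: {bucket: [count per class]}; returns {bucket: abstraction id}.
    Naive O(n^3) pair search, adequate only for small examples.
    """
    clusters = [([b], list(c)) for b, c in sorted(bucket_counts.items())]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = _weighted_js(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        members = clusters[i][0] + clusters[j][0]
        merged = [a + b for a, b in zip(clusters[i][1], clusters[j][1])]
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = (members, merged)
    return {b: idx for idx, (members, _) in enumerate(clusters) for b in members}


def fit_abstractions(docs, labels, num_buckets, k):
    """Hash every training document, tally class counts per bucket, then cluster."""
    classes = sorted(set(labels))
    counts = {}
    for tokens, y in zip(docs, labels):
        for b, v in hash_features(tokens, num_buckets).items():
            counts.setdefault(b, [0] * len(classes))[classes.index(y)] += v
    return _agglomerate(counts, min(k, len(counts)))


def transform(tokens, num_buckets, mapping):
    """Represent a document as counts over abstractions; unseen buckets are dropped."""
    vec = [0] * len(set(mapping.values()))
    for b, v in hash_features(tokens, num_buckets).items():
        if b in mapping:
            vec[mapping[b]] += v
    return vec
```

For example, `fit_abstractions(docs, labels, 2**20, 2)` on a few labeled token lists returns a bucket-to-abstraction map. `transform` then turns any token list into a 2-dimensional count vector. One plausible reading of why the combination helps is that hashing caps the number of features the clustering step has to handle, while clustering compresses the hashed space further using class information.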

Source

  • Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012, Toronto, Canada

Collections

This paper is part of the following collection of related materials.

UNT Scholarly Works

Materials from the UNT community's research, creative, and scholarly activities and UNT's Open Access Repository. Access to some items in this collection may be restricted.

When

Creation Date

  • 2012

Added to The UNT Digital Library

  • Sept. 13, 2013, 2:58 p.m.

Description Last Updated

  • March 27, 2014, 11:39 a.m.

Citations, Rights, Re-Use

Caragea, Cornelia; Silvescu, Adrian & Mitra, Prasenjit. Combining Hashing and Abstraction in Sparse High Dimensional Feature Spaces, paper, 2012; [Palo Alto, California]. (digital.library.unt.edu/ark:/67531/metadc181674/: accessed December 18, 2017), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT College of Engineering.