Protein sequence classification using feature hashing

PDF Version Also Available for Download.

Description

Article discussing protein sequence classification using feature hashing.

Physical Description

8 p.: ill.

Creation Information

Caragea, Cornelia; Silvescu, Adrian & Mitra, Prasenjit
Creation Date: Unknown.

Context

This article is part of the collection entitled UNT Scholarly Works and was provided by the UNT College of Engineering to the UNT Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 56 times, with 4 in the last month. More information about this article can be viewed below.

Who

People and organizations associated with either the creation of this article or its content.

Authors

  • Caragea, Cornelia
  • Silvescu, Adrian
  • Mitra, Prasenjit

Publisher

  • BioMed Central Ltd., London

Provided By

UNT College of Engineering

The UNT College of Engineering promotes intellectual and scholarly pursuits in the areas of computer science and engineering, preparing innovative leaders in a variety of disciplines. The UNT College of Engineering encourages faculty and students to pursue interdisciplinary research among numerous subjects of study including databases, numerical analysis, game programming, and computer systems architecture.

What

Descriptive information to help identify this article. Follow the links below to find similar items on the Digital Library.

Degree Information

Notes

Abstract: Recent advances in next-generation sequencing technologies have resulted in an exponential increase in the rate at which protein sequence data are being acquired. The k-gram feature representation, commonly used for protein sequence classification, usually results in prohibitively high-dimensional input spaces for large values of k. Applying data mining algorithms to these input spaces may be intractable due to the large number of dimensions. Hence, dimensionality reduction techniques can be crucial for the performance and the complexity of the learning algorithms. In this paper, we study the applicability of feature hashing to protein sequence classification. In feature hashing, the original high-dimensional space is "reduced" by using a hash function to map features into hash keys in a low-dimensional space; multiple features can be mapped (at random) to the same hash key, and their counts are "aggregated". We compare feature hashing with the "bag of k-grams" approach. Our results show that feature hashing is an effective approach to reducing dimensionality on protein sequence classification tasks.
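The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy sequence, the bucket count, and the choice of CRC32 as the hash function are all assumptions made for illustration, since this record does not state which hash function or parameters the article uses.

```python
import zlib
from collections import Counter


def kgrams(sequence, k):
    """Return all overlapping substrings of length k in the sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def bag_of_kgrams(sequence, k):
    """Count each distinct k-gram: one dimension per distinct k-gram."""
    return Counter(kgrams(sequence, k))


def hashed_features(sequence, k, n_buckets):
    """Map k-gram counts into a fixed-length vector of n_buckets entries.

    Each k-gram is hashed to a bucket index (the "hash key"); k-grams that
    collide in the same bucket have their counts added together.
    CRC32 is an assumed stand-in for whatever hash function is used in practice.
    """
    vector = [0] * n_buckets
    for gram in kgrams(sequence, k):
        index = zlib.crc32(gram.encode("ascii")) % n_buckets
        vector[index] += 1
    return vector


# Hypothetical toy sequence, not taken from the article's data sets.
seq = "MKVLAAGIVGLK"
bag = bag_of_kgrams(seq, 3)
hashed = hashed_features(seq, 3, n_buckets=8)

print(len(bag), "distinct 3-grams in the bag representation")
print(len(hashed), "dimensions after hashing:", hashed)
```

With the 20 standard amino acids there are 20^k possible k-grams, so the bag representation can have up to 8,000 dimensions for k = 3 and 160,000 for k = 4, while the hashed vector always has `n_buckets` entries. The total count is preserved by hashing; what is lost is the ability to tell colliding k-grams apart.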

Source

  • Proteome Science, 2012, London: BioMed Central Ltd.

Language

Item Type

Identifier

Unique identifying numbers for this article in the Digital Library or other systems.

Publication Information

  • Publication Title: Proteome Science
  • Volume: 10
  • Issue: S14
  • Peer Reviewed: Yes

Collections

This article is part of the following collection of related materials.

UNT Scholarly Works

Materials from the UNT community's research, creative, and scholarly activities and UNT's Open Access Repository. Access to some items in this collection may be restricted.

When

Dates and time periods associated with this article.

Creation Date

  • Unknown

Accepted Date

  • June 21, 2012

Added to The UNT Digital Library

  • Sept. 6, 2013, 3:22 p.m.

Description Last Updated

  • March 27, 2014, 12:50 p.m.

Usage Statistics

When was this article last used?

Yesterday: 0
Past 30 days: 4
Total Uses: 56

Citations, Rights, Re-Use

Caragea, Cornelia; Silvescu, Adrian & Mitra, Prasenjit. Protein sequence classification using feature hashing, article, Date Unknown; [London, United Kingdom]. (digital.library.unt.edu/ark:/67531/metadc180950/: accessed September 21, 2017), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT College of Engineering.