Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets

Description

This article discusses interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets.

Physical Description

6 p.

Creation Information

Hughes, Adam; Ruan, Yang; Ekanayake, Saliya; Bae, Seung-Hee; Dong, Qunfeng; Rho, Mina et al. March 13, 2012.

Context

This article is part of the collection entitled UNT Scholarly Works and was provided by the UNT College of Arts and Sciences to the Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 69 times. More information about this article can be viewed below.

Who

People and organizations associated with either the creation of this article or its content.

Provided By

UNT College of Arts and Sciences

The UNT College of Arts and Sciences educates students in traditional liberal arts, performing arts, sciences, professional, and technical academic programs. In addition to its departments, the college includes academic centers, institutes, programs, and offices providing diverse courses of study.

What

Descriptive information to help identify this article. Follow the links below to find similar items on the Digital Library.

Notes

Proceedings from the Great Lakes Bioinformatics Conference, 2011, Athens, Ohio, United States

Abstract:

Background: Modern pyrosequencing techniques make it possible to study complex bacterial populations, for example through 16S rRNA gene sequences, directly from environmental or clinical samples without the need for laboratory purification. Alignment of sequences across the resultant large data sets (100,000+ sequences) is of particular interest for the purpose of identifying potential gene clusters and families, but such analysis represents a daunting computational task. The aim of this work is the development of an efficient pipeline for the clustering of large sequence read sets.

Methods: Pairwise alignment techniques are used here to calculate genetic distances between sequence pairs. These methods are pleasingly parallel and have been shown to reflect genetic distances in highly variable regions of rRNA genes more accurately than do traditional multiple sequence alignment (MSA) approaches. By utilizing Needleman-Wunsch (NW) pairwise alignment in conjunction with novel implementations of interpolative multidimensional scaling (MDS), the authors have developed an effective method for visualizing massive biosequence data sets and quickly identifying potential gene clusters.

Results: This study demonstrates the use of interpolative MDS to obtain clustering results that are qualitatively similar to those obtained through full MDS, but with substantial cost savings. In particular, the wall clock time required to cluster a set of 100,000 sequences has been reduced from seven hours to less than one hour through the use of interpolative MDS.

Conclusions: Although work remains to be done in selecting the optimal training set size for interpolative MDS, substantial computational cost savings will allow the authors to cluster much larger sequence sets in the future.
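The interpolation idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the distances come from random 2-D toy points rather than Needleman-Wunsch alignments, classical (Torgerson) MDS stands in for the full MDS step because the abstract does not name the algorithm used, and the function names `classical_mds`, `interpolate_point`, and `demo`, along with the parameters `k` and `iters`, are hypothetical. A small sample is embedded in full, and each remaining point is then placed using only its distances to its nearest sample points.

```python
import numpy as np

def classical_mds(dist, dim=2):
    """Embed points from a full distance matrix (classical MDS, a stand-in)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(gram)                  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]                 # indices of the largest ones
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def interpolate_point(d_to_sample, sample_xy, k=10, iters=300):
    """Place one new point from its distances to the k nearest sample points."""
    nearest = np.argsort(d_to_sample)[:k]
    anchors, target = sample_xy[nearest], d_to_sample[nearest]
    x = anchors.mean(axis=0)                           # start at the anchors' centroid
    for _ in range(iters):
        diff = x - anchors
        cur = np.maximum(np.linalg.norm(diff, axis=1), 1e-12)  # avoid divide-by-zero
        # Majorization step that reduces the stress
        # sum_i (||x - anchor_i|| - target_i)^2 for a single free point.
        x = anchors.mean(axis=0) + (target / cur) @ diff / len(nearest)
    return x

def demo(n_total=200, n_sample=50, seed=0):
    """Embed a sample in full, interpolate the rest, return the median error."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n_total, 2))                     # toy data (assumption)
    dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    sample_xy = classical_mds(dist[:n_sample, :n_sample])
    errors = []
    for i in range(n_sample, n_total):
        x = interpolate_point(dist[i, :n_sample], sample_xy)
        fitted = np.linalg.norm(sample_xy - x, axis=1)
        errors.append(np.abs(fitted - dist[i, :n_sample]).max())
    return float(np.median(errors))
```

In this sketch, only the sample-by-sample distance matrix is embedded in full. Each out-of-sample point needs just its distances to the sample, so the complete all-pairs matrix is never passed to the MDS step.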

Source

  • BMC Bioinformatics, 2012, London: BioMed Central Ltd.

Publication Information

  • Publication Title: BMC Bioinformatics
  • Volume: 13
  • Issue: 2
  • Pages: 6
  • Peer Reviewed: Yes

Collections

This article is part of the following collection of related materials.

UNT Scholarly Works

Materials from the UNT community's research, creative, and scholarly activities and UNT's Open Access Repository. Access to some items in this collection may be restricted.

When

Dates and time periods associated with this article.

Creation Date

  • March 13, 2012

Added to The UNT Digital Library

  • April 2, 2012, 4:46 p.m.

Description Last Updated

  • July 22, 2013, 1:17 p.m.

Citations, Rights, Re-Use

Hughes, Adam; Ruan, Yang; Ekanayake, Saliya; Bae, Seung-Hee; Dong, Qunfeng; Rho, Mina et al. Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets, article, March 13, 2012; [London, United Kingdom]. (digital.library.unt.edu/ark:/67531/metadc78283/: accessed August 20, 2017), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT College of Arts and Sciences.