Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets

Hughes, Adam; Ruan, Yang; Bae, Seung-Hee; Dong, Qunfeng; Rho, Mina; Qiu, Judy; Fox, Geoffrey

doi:10.1186/1471-2105-13-S2-S9

Description

This article discusses interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets.

Physical Description

6 p.

Creation Information

Hughes, Adam; Ruan, Yang; Ekanayake, Saliya; Bae, Seung-Hee; Dong, Qunfeng; Rho, Mina et al. March 13, 2012.

Context

This article is part of the collection entitled: UNT Scholarly Works and was provided by the UNT College of Arts and Sciences to the UNT Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 275 times. More information about this article can be viewed below.

Authors

Hughes, Adam (Indiana University)
Ruan, Yang (Indiana University)
Bae, Seung-Hee (Indiana University)
Dong, Qunfeng (University of North Texas)
Rho, Mina (Indiana University)
Qiu, Judy (Indiana University)
Fox, Geoffrey (Indiana University)

Attributed name

Ekanayake, Saliya (Indiana University)

Publisher

BioMed Central Ltd.
Place of Publication: [London, United Kingdom]

Provided By

UNT College of Arts and Sciences

The UNT College of Arts and Sciences educates students in traditional liberal arts, performing arts, sciences, professional, and technical academic programs. In addition to its departments, the college includes academic centers, institutes, programs, and offices providing diverse courses of study.

Degree Information

Department: Biological Sciences

Notes

Proceedings from the Great Lakes Bioinformatics Conference, 2011, Athens, Ohio, United States

Abstract:

Background: Modern pyrosequencing techniques make it possible to study complex bacterial populations, for example via 16S rRNA sequences, directly from environmental or clinical samples without the need for laboratory purification. Alignment of sequences across the resulting large data sets (100,000+ sequences) is of particular interest for identifying potential gene clusters and families, but such analysis is a daunting computational task. The aim of this work is to develop an efficient pipeline for clustering large sets of sequence reads.

Methods: Pairwise alignment techniques are used here to calculate genetic distances between sequence pairs. These methods are pleasingly parallel and have been shown to reflect genetic distances in highly variable regions of rRNA genes more accurately than traditional multiple sequence alignment (MSA) approaches do. By combining Needleman-Wunsch (NW) pairwise alignment with novel implementations of interpolative multidimensional scaling (MDS), the authors have developed an effective method for visualizing massive biosequence data sets and quickly identifying potential gene clusters.

Results: This study demonstrates the use of interpolative MDS to obtain clustering results that are qualitatively similar to those obtained through full MDS, but at substantially lower cost. In particular, the wall clock time required to cluster a set of 100,000 sequences was reduced from seven hours to less than one hour through the use of interpolative MDS.

Conclusions: Although work remains to be done in selecting the optimal training set size for interpolative MDS, the substantial computational cost savings will allow the authors to cluster much larger sequence sets in the future.
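The two steps named in the Methods paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation, which this record does not reproduce. The scoring values (match, mismatch, gap), the choice of "1 minus fraction of identical alignment columns" as the distance, the neighbor count k, the iteration count, and the function names nw_distance, interpolate_point, and stress are all assumptions made for illustration. The interpolation step assumes a sample of sequences has already been embedded by a full MDS run and places one further sequence by iterative majorization against its k nearest sample points.

```python
import math


def nw_distance(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns 1 - (identical columns / alignment columns) for one optimal
    alignment. The scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting columns and identical columns.
    i, j, same, cols = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        cols += 1
    return 1.0 - same / cols if cols else 0.0


def stress(x, deltas, anchors):
    """Sum of squared differences between embedded and target distances."""
    return sum((math.dist(x, p) - d) ** 2 for p, d in zip(anchors, deltas))


def interpolate_point(deltas, anchors, k=3, iters=50):
    """Place one out-of-sample point in an existing embedding.

    deltas[i] is the original-space distance from the new item to sample
    item i, and anchors[i] is that sample item's embedded position.
    Only the k nearest sample items (by deltas) are used.
    """
    nearest = sorted(range(len(deltas)), key=lambda i: deltas[i])[:k]
    pts = [anchors[i] for i in nearest]
    ds = [deltas[i] for i in nearest]
    dim = len(pts[0])
    centroid = [sum(p[c] for p in pts) / len(pts) for c in range(dim)]
    # Start slightly off the centroid so the point never sits on an anchor.
    x = [c + 1e-3 for c in centroid]
    for _ in range(iters):
        # Majorization update: each step cannot increase the local stress.
        ratios = [d / max(math.dist(x, p), 1e-12) for p, d in zip(pts, ds)]
        x = [centroid[c]
             + sum(r * (x[c] - p[c]) for r, p in zip(ratios, pts)) / len(pts)
             for c in range(dim)]
    return x


# Usage: three sample sequences with assumed 2D positions, one new sequence.
sample = ["ACGTACGT", "ACGTACGA", "TTGTACGT"]
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical embedding
new_deltas = [nw_distance("ACGTACGG", s) for s in sample]
new_position = interpolate_point(new_deltas, positions)
```

Because each out-of-sample point depends only on its own distances to the sample, the interpolation calls are independent of one another and can be distributed across workers, which is consistent with the cost savings the abstract reports.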

Source

BMC Bioinformatics, 13(2), BioMed Central Ltd., March 13, 2012, pp. 1-6

Language

English

Item Type

Article

Identifier

Unique identifying numbers for this article in the Digital Library or other systems.

Digital Object Identifier: https://doi.org/10.1186/1471-2105-13-S2-S9
Archival Resource Key: ark:/67531/metadc78283

Publication Information

Publication Title: BMC Bioinformatics
Volume: 13
Issue: 2
Peer Reviewed: Yes

Collections

This article is part of the following collection of related materials.

UNT Scholarly Works

Materials from the UNT community's research, creative, and scholarly activities and UNT's Open Access Repository. Access to some items in this collection may be restricted.

Creation Date

March 13, 2012

Added to The UNT Digital Library

April 2, 2012, 4:46 p.m.

Description Last Updated

Nov. 17, 2023, 2:49 p.m.

Hughes, Adam; Ruan, Yang; Ekanayake, Saliya; Bae, Seung-Hee; Dong, Qunfeng; Rho, Mina et al. Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets, article, March 13, 2012; [London, United Kingdom]. (https://digital.library.unt.edu/ark:/67531/metadc78283/: accessed April 24, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT College of Arts and Sciences.
