Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets
Hughes et al. BMC Bioinformatics 2012, 13(Suppl 2):S9
Figure 4 100K Metagenomics sequences - Full MDS. Visualization of MDS and clustering results for 100,000 gene sequences from an
environmental sample of 16S rRNA. The many different genes are classified by a clustering algorithm and visualized by MDS dimension
although more significant changes in intra-cluster
arrangement can be seen.
Figure 7 shows the wall-clock time required to run each
complete pipeline discussed above. The full, non-interpolative
calculation required about seven hours, while
the interpolative pipeline consisting of 50,000 in-sample
and 50,000 out-of-sample points required about three and a half
hours. Finally, the interpolative calculation
with 10,000 in-sample and 90,000 out-of-sample
sequences completed in a little under an hour.
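
The relative speedups implied by these timings can be checked with a few lines of arithmetic. The sketch below is illustrative only: the hour values are the approximate figures quoted above, "a little under an hour" is rounded up to 1.0 (so the last speedup is a lower bound), and the `runs` list and `speedup` helper are hypothetical names introduced here, not part of the original pipeline.

```python
# Approximate wall-clock times in hours, as quoted in the text.
# Assumption: "a little under an hour" is rounded up to 1.0 h,
# so the speedup computed for that run is a lower bound.
TOTAL_SEQUENCES = 100_000

runs = [
    # (label, in-sample points, out-of-sample points, approx. hours)
    ("full MDS", 100_000, 0, 7.0),
    ("interpolative 50K/50K", 50_000, 50_000, 3.5),
    ("interpolative 10K/90K", 10_000, 90_000, 1.0),
]


def speedup(baseline_hours: float, hours: float) -> float:
    """Return how many times faster a run is than the baseline."""
    return baseline_hours / hours


baseline_hours = runs[0][3]
for label, n_in, n_out, hours in runs:
    # Every pipeline processes the same 100,000 sequences in total.
    assert n_in + n_out == TOTAL_SEQUENCES
    in_fraction = n_in / TOTAL_SEQUENCES
    print(
        f"{label}: in-sample fraction {in_fraction:.0%}, "
        f"about {hours} h, speedup about {speedup(baseline_hours, hours):.1f}x"
    )
```

Under these rounded figures, halving the in-sample set roughly halves the run time, and shrinking it to 10% of the data gives a speedup of about seven or slightly more.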
This study demonstrates the effectiveness of combining
the Needleman-Wunsch genetic distance algorithm with
Multidimensional Scaling (MDS) to enable visual identification
of sequence clusters in a large sample of raw
reads from the 16S rRNA gene. In addition, the use
of interpolative MDS and the Twister Iterative MapReduce
runtime provides a significant improvement in overall
computational throughput while maintaining the
basic structure of the resultant sequence space. Further
investigation is needed to determine the optimal ratio of
in-sample to out-of-sample data set sizes in order to
Hughes, Adam; Ruan, Yang; Ekanayake, Saliya; Bae, Seung-Hee; Dong, Qunfeng; Rho, Mina et al. Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets, article, March 13, 2012; [London, United Kingdom]. (digital.library.unt.edu/ark:/67531/metadc78283/m1/4/: accessed November 15, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT College of Arts and Sciences.