Markov Model of Segmentation and Clustering: Applications in Deciphering Genomes and Metagenomes

PDF Version Also Available for Download.

Description

Rapidly accumulating genomic data as a result of high-throughput sequencing has necessitated development of efficient computational methods to decode the biological information underlying these data. DNA composition varies across structurally or functionally different regions of a genome as well as those of distinct evolutionary origins. We adapted an integrative framework that combines a top-down, recursive segmentation algorithm with a bottom-up, agglomerative clustering algorithm to decipher compositionally distinct regions in genomes. The recursive segmentation procedure entails fragmenting a genome into compositionally distinct segments within a statistical hypothesis testing framework. This is followed by an agglomerative clustering procedure to group compositionally similar … continued below

Physical Description

xii, 135 pages

Creation Information

Pandey, Ravi Shanker August 2017.

Context

This dissertation is part of the collection entitled: UNT Theses and Dissertations and was provided by the UNT Libraries to the UNT Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 131 times. More information about this dissertation can be viewed below.

Who

People and organizations associated with either the creation of this dissertation or its content.

Publisher

Rights Holder

For guidance see Citations, Rights, Re-Use.

  • Pandey, Ravi Shanker

Provided By

UNT Libraries

The UNT Libraries serve the university and community by providing access to physical and online collections, fostering information literacy, supporting academic research, and much, much more.

Contact Us

What

Descriptive information to help identify this dissertation. Follow the links below to find similar items on the Digital Library.

Degree Information

Description

Rapidly accumulating genomic data as a result of high-throughput sequencing has necessitated development of efficient computational methods to decode the biological information underlying these data. DNA composition varies across structurally or functionally different regions of a genome as well as those of distinct evolutionary origins. We adapted an integrative framework that combines a top-down, recursive segmentation algorithm with a bottom-up, agglomerative clustering algorithm to decipher compositionally distinct regions in genomes. The recursive segmentation procedure entails fragmenting a genome into compositionally distinct segments within a statistical hypothesis testing framework. This is followed by an agglomerative clustering procedure to group compositionally similar segments within the same framework. One of our main objectives was to decipher distinctive evolutionary patterns in sex chromosomes via unraveling the underlying compositional heterogeneity. Application of this approach to the human X-chromosome provided novel insights into the stratification of the X chromosome as a consequence of punctuated recombination suppressions between the X and Y from the distal long arm to the distal short arm. Novel "evolutionary strata" were identified particularly in the X conserved region (XCR) that is not amenable to the X-Y comparative analysis due to massive loss of the Y gametologs following recombination cessation. Our compositional based approach could circumvent the limitations of the current methods that depend on X-Y (or Z-W for ZW sex determination system) comparisons by deciphering the stratification even if only the sequence of sex chromosome in the homogametic sex (i.e. X or Z chromosome) is available. These studies were extended to the plant sex chromosomes which are known to have a number of evolutionary strata that formed at the initial stage of their evolution, presenting an opportunity to examine the onset of stratum formation on the sex chromosomes. Further applications included detection of horizontally acquired DNAs in extremophilic eukaryote, Galdieria sulphuraria, which encode variety of potentially adaptive functions, and in the taxonomic profiling of metagenomic sequences. Finally, we discussed how the Markovian segmentation and clustering method can be made more sensitive and robust for further applications in biological and biomedical sciences in future.

Physical Description

xii, 135 pages

Language

Identifier

Unique identifying numbers for this dissertation in the Digital Library or other systems.

Collections

This dissertation is part of the following collection of related materials.

UNT Theses and Dissertations

Theses and dissertations represent a wealth of scholarly and artistic content created by masters and doctoral students in the degree-seeking process. Some ETDs in this collection are restricted to use by the UNT community.

What responsibilities do I have when using this dissertation?

When

Dates and time periods associated with this dissertation.

Creation Date

  • August 2017

Added to The UNT Digital Library

  • Oct. 9, 2017, 11:44 a.m.

Description Last Updated

  • Feb. 1, 2024, 11:18 a.m.

Usage Statistics

When was this dissertation last used?

Yesterday: 0
Past 30 days: 0
Total Uses: 131

Interact With This Dissertation

Here are some suggestions for what to do next.

Start Reading

PDF Version Also Available for Download.

International Image Interoperability Framework

IIF Logo

We support the IIIF Presentation API

Pandey, Ravi Shanker. Markov Model of Segmentation and Clustering: Applications in Deciphering Genomes and Metagenomes, dissertation, August 2017; Denton, Texas. (https://digital.library.unt.edu/ark:/67531/metadc1011827/: accessed July 17, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; .

Back to Top of Screen