Developing a Phylogeny Based Machine Learning Algorithm for Metagenomics

PDF Version Also Available for Download.

Description

Metagenomics is the study of the totality of the complete genetic elements discovered from a defined environment. Different from traditional microbiology study, which only analyzes a small percent of microbes that could survive in laboratory, metagenomics allows researchers to get entire genetic information from all the samples in the communities. So metagenomics enables understanding of the target environments and the hidden relationships between bacteria and diseases. In order to efficiently analyze the metagenomics data, cutting-edge technologies for analyzing the relationships among microbes and communities are required. To overcome the challenges brought by rapid growth in metagenomics datasets, advances in novel ... continued below

Creation Information

Rong, Ruichen August 2017.

Context

This dissertation is part of the collection entitled: UNT Theses and Dissertations and was provided by UNT Libraries to Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 69 times , with 5 in the last month . More information about this dissertation can be viewed below.

Who

People and organizations associated with either the creation of this dissertation or its content.

Author

Publisher

Rights Holder

For guidance see Citations, Rights, Re-Use.

  • Rong, Ruichen

Provided By

UNT Libraries

The UNT Libraries serve the university and community by providing access to physical and online collections, fostering information literacy, supporting academic research, and much, much more.

Contact Us

What

Descriptive information to help identify this dissertation. Follow the links below to find similar items on the Digital Library.

Degree Information

Description

Metagenomics is the study of the totality of the complete genetic elements discovered from a defined environment. Different from traditional microbiology study, which only analyzes a small percent of microbes that could survive in laboratory, metagenomics allows researchers to get entire genetic information from all the samples in the communities. So metagenomics enables understanding of the target environments and the hidden relationships between bacteria and diseases. In order to efficiently analyze the metagenomics data, cutting-edge technologies for analyzing the relationships among microbes and communities are required. To overcome the challenges brought by rapid growth in metagenomics datasets, advances in novel methodologies for interpreting metagenomics data are clearly needed.
The first two chapters of this dissertation summarize and compare the widely-used methods in metagenomics and integrate these methods into pipelines. Properly analyzing metagenomics data requires a variety of bioinformatcis and statistical approaches to deal with different situations. The raw reads from sequencing centers need to be processed and denoised by several steps and then be further interpreted by ecological and statistical analysis. So understanding these algorithms and combining different approaches could potentially reduce the influence of noises and biases at different steps. And an efficient and accurate pipeline is important to robustly decipher the differences and functionality of bacteria in communities.
Traditional statistical analysis and machine learning algorithms have their limitations on analyzing metagenomics data. Thus, rest three chapters describe a new phylogeny based machine learning and feature selection algorithm to overcome these problems. The new method outperforms traditional algorithms and can provide more robust candidate microbes for further analysis. With the frowing sample size, deep neural network could potentially describe more complicated characteristic of data and thus improve model accuracy. So a deep learning framework is designed on top of the shallow learning algorithm stated above in order to further improve the prediction and selection accuracy.
The present dissertation work provides a powerful tool that utilizes machine learning techniques to identify signature bacteria and key information from huge amount of metagenomics data.

Subjects

Language

Identifier

Unique identifying numbers for this dissertation in the Digital Library or other systems.

Collections

This dissertation is part of the following collection of related materials.

UNT Theses and Dissertations

Theses and dissertations represent a wealth of scholarly and artistic content created by masters and doctoral students in the degree-seeking process. Some ETDs in this collection are restricted to use by the UNT community.

What responsibilities do I have when using this dissertation?

When

Dates and time periods associated with this dissertation.

Creation Date

  • August 2017

Added to The UNT Digital Library

  • Oct. 9, 2017, 11:44 a.m.

Usage Statistics

When was this dissertation last used?

Yesterday: 0
Past 30 days: 5
Total Uses: 69

Interact With This Dissertation

Here are some suggestions for what to do next.

Start Reading

PDF Version Also Available for Download.

International Image Interoperability Framework

IIF Logo

We support the IIIF Presentation API

Rong, Ruichen. Developing a Phylogeny Based Machine Learning Algorithm for Metagenomics, dissertation, August 2017; Denton, Texas. (digital.library.unt.edu/ark:/67531/metadc1011752/: accessed October 16, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; .