38 Matching Results

Search Results

Clustering Algorithms for Time Series Gene Expression in Microarray Data

Description: Clustering techniques are important for gene expression data analysis. However, efficient computational algorithms for clustering time-series data are still lacking. This work documents two improvements to an existing profile-based greedy algorithm for short time-series data: the first applies a scaling method during pre-processing of the raw data to handle some extreme cases; the second modifies the strategy to generate better clusters. Simulation data and real microarray data were used to evaluate these improvements; the improved approach efficiently generated more accurate clusters. A new feature-based algorithm was also developed in which steady-state value, overshoot, rise time, settling time and peak time are generated by a second-order control system for clustering purposes. This feature-based approach is much faster and more accurate than the existing profile-based algorithm for long time-series data.
Date: August 2012
Creator: Zhang, Guilin
Partner: UNT Libraries
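
The five step-response features named in the description above can be illustrated with a minimal sketch. This is not the thesis's algorithm: the function name `step_response_features`, the 10%/90% rise-time convention, the 2% settling band, and the use of the last sample as the steady-state value are all assumptions made for illustration, and the sample profile is toy data.

```python
def step_response_features(times, values, settle_band=0.02):
    """Summarise one time-series profile with five step-response features.

    Assumptions (not from the source): the last sample approximates the
    steady-state value, the steady state is positive, rise time is measured
    from 10% to 90% of steady state, and settling uses a relative band.
    """
    if len(times) != len(values) or len(values) < 2:
        raise ValueError("times and values must have equal length >= 2")
    steady = values[-1]
    if steady <= 0:
        raise ValueError("this sketch assumes a positive steady-state value")

    # Peak time and overshoot relative to the steady-state value.
    peak_index = max(range(len(values)), key=lambda i: values[i])
    overshoot = max(0.0, (values[peak_index] - steady) / steady)

    # Rise time: first crossing of 90% minus first crossing of 10%.
    t10 = next(t for t, v in zip(times, values) if v >= 0.1 * steady)
    t90 = next(t for t, v in zip(times, values) if v >= 0.9 * steady)

    # Settling time: first sample after the last excursion outside the band.
    settling_time = times[0]
    for i in reversed(range(len(values))):
        if abs(values[i] - steady) > settle_band * steady:
            settling_time = times[i + 1]  # safe: the last sample equals steady
            break

    return {
        "steady_state": steady,
        "overshoot": overshoot,
        "rise_time": t90 - t10,
        "settling_time": settling_time,
        "peak_time": times[peak_index],
    }


# Toy profile (hypothetical values, not real expression data).
times = [0, 1, 2, 3, 4, 5, 6]
values = [0.0, 0.5, 1.2, 1.05, 0.99, 1.0, 1.0]
features = step_response_features(times, values)
print(features)
```

Each profile is reduced to a five-number feature vector, which could then be passed to any standard clustering method; this is one plausible reason a feature-based approach scales better to long series than comparing full profiles.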

Investigating Human Gut Microbiome in Obesity with Machine Learning Methods

Description: Obesity is a common disease among all ages that has threatened human health and has become a global concern. Gut microbiota can affect human metabolism and thus may modulate obesity. Certain mixes of gut microbiota can keep the host healthy or predispose the host to obesity. Modern next-generation sequencing techniques give access to huge amounts of genetic information underlying microbiota and thus provide new insights into the functionality of these micro-organisms and their interactions with the host. Multiple previous studies have demonstrated that the microbiome might contribute to obesity by increasing dietary energy harvest, promoting fat deposition and triggering systemic inflammation. However, these studies are based either on lab cultivation or on basic statistical analysis. In order to further explore how gut microbiota affect obesity, this thesis utilizes a series of machine learning methods to analyze large amounts of metagenomics data from the human gut microbiome. The publicly available HMP (Human Microbiome Project) metagenomic sequencing data, which contain microbiome data for healthy adults, including overweight and obese individuals, were used for this study. HMP gut data were organized based on two different feature definitions: taxonomic information and metabolic reconstruction information. Several widely used classification algorithms, namely Naive Bayes, Random Forest, SVM and elastic net logistic regression, were applied to predict the healthy or obese status of the subjects and were compared on cross-validation accuracy. Furthermore, the corresponding feature selection algorithms were used to identify signature features in each dataset that lead to the differences between healthy and obese samples. The results showed that these algorithms perform more poorly on taxonomic data than on metabolic pathway data, though many of the selected taxa are still supported by the literature.
Among all the combinations of algorithms and data, elastic net logistic regression has the best cross-validation performance and thus becomes the best model. In this model, several important ...
Date: August 2017
Creator: Zhong, Yuqing
Partner: UNT Libraries

Elicitation of Protein-Protein Interactions from Biomedical Literature Using Association Rule Discovery

Description: Extracting information from a stack of data is a tedious task and the scenario is no different in proteomics. Volumes of research papers are published about the study of various proteins in several species, their interactions with other proteins and the identification of protein(s) as possible biomarkers of disease. It is a challenging task for biologists to keep track of these developments manually by reading through the literature. Several tools have been developed by computational linguists to assist identification, extraction and hypothesis generation of proteins and protein-protein interactions from biomedical publications and protein databases. However, they are confronted with the challenges of term variation, term ambiguity, access only to abstracts and inconsistencies in time-consuming manual curation of protein and protein-protein interaction repositories. This work attempts to attenuate these challenges by extracting protein-protein interactions in humans and eliciting possible interactions using association rule mining on full text, abstracts and figure captions from publicly available biomedical literature databases. Two such databases are used in our study: Directory of Open Access Journals (DOAJ) and PubMed Central (PMC). A corpus is built using articles based on search terms. A dataset of more than 38,000 protein-protein interactions from the Human Protein Reference Database (HPRD) is cross-referenced to validate discovered interactive pairs. A set of an optimal size of possible binary protein-protein interactions is generated to be made available for clinical or biological validation. A significant change in the number of new associations was found by altering the thresholds for support and confidence metrics. This study reduces the limitations biologists face in keeping pace with the discovery of protein-protein interactions by manually reading the literature, and their need to validate each and every possible interaction.
Date: August 2010
Creator: Samuel, Jarvie John
Partner: UNT Libraries
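
The support and confidence thresholds mentioned in the description above can be illustrated with a minimal sketch of pairwise association rules over co-mentions. This is not the thesis's pipeline: the function name `mine_pair_rules`, the representation of each article as a set of protein names, and the placeholder names `P1` to `P3` are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(documents, min_support, min_confidence):
    """Return rules (lhs, rhs, support, confidence) for co-mentioned pairs.

    support(A, B)     = fraction of documents mentioning both A and B
    confidence(A->B)  = documents mentioning both / documents mentioning A
    """
    n_docs = len(documents)
    item_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        proteins = sorted(set(doc))  # de-duplicate, fix pair ordering
        item_counts.update(proteins)
        pair_counts.update(combinations(proteins, 2))

    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n_docs
        if support < min_support:
            continue
        # Each pair yields two candidate directed rules.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Toy corpus: each set holds the placeholder proteins mentioned in one article.
corpus = [{"P1", "P2"}, {"P1", "P2", "P3"}, {"P1", "P3"}, {"P2"}]

loose = mine_pair_rules(corpus, min_support=0.5, min_confidence=0.6)
strict = mine_pair_rules(corpus, min_support=0.5, min_confidence=0.9)
print(len(loose), len(strict))
```

On this toy corpus, raising the confidence threshold from 0.6 to 0.9 reduces the rule set from four rules to one, which mirrors in miniature how the number of candidate associations depends on the chosen thresholds.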

Exploring the Evolutionary History of North American Prairie Grouse (Genus: Tympanuchus) Using Multi-locus Coalescent Analyses

Description: Conservation biologists are increasingly using phylogenetics as a tool to understand evolutionary relationships and taxonomic classification. The taxonomy of North American prairie grouse (sharp-tailed grouse, T. phasianellus; lesser prairie-chicken, T. pallidicinctus; greater prairie-chicken, T. cupido; including multiple subspecies) has been designated based on physical characteristics, geography, and behavior. However, previous studies have been inconclusive in determining the evolutionary history of prairie grouse based on genetic data. Therefore, additional research investigating the evolutionary history of prairie grouse is warranted. In this study, ten loci (including mitochondrial, autosomal, and Z-linked markers) were sequenced across multiple populations of prairie grouse, and both traditional and coalescent-based phylogenetic analyses were used to address the evolutionary history of this genus. Results from this study indicate that North American prairie grouse diverged in the last 200,000 years, with species-level taxa forming well-supported monophyletic clades in species tree analyses. With these results, managers of the critically endangered Attwater's prairie-chicken (T. c. attwateri) can better evaluate whether outcrossing Attwater's with greater prairie-chickens would be a viable management tool for Attwater's conservation.
Date: May 2013
Creator: Galla, Stephanie J.
Partner: UNT Libraries

Enabling Large Scale Scientific Computations for Expressed Sequence Tag Sequencing over Grid and Cloud Computing Clusters

Description: This paper discusses expressed sequence tag sequencing over grid and cloud computing clusters, specifically for biological applications. In this paper, the authors propose a Web service framework for high-level job scheduling that is developed for scientific applications.
Date: September 2009
Creator: Pallickara, Sangmi Lee; Pierce, Marlon; Dong, Qunfeng & Kong, ChinHua
Partner: UNT College of Arts and Sciences

Investigation of the Effect of Type 2 Diabetes Mellitus on Subgingival Plaque Microbiota by High-Throughput 16S rDNA Pyrosequencing

Description: Article discussing the bacterial composition of subgingival plaque among diabetic and non-diabetic subjects to determine the effect that diabetes mellitus has on dental health.
Date: April 22, 2013
Creator: Zhou, Mi; Rong, Ruichen; Munro, Daniel; Zhu, Chunxia; Gao, Xiang; Zhang, Qi et al.
Partner: UNT College of Arts and Sciences

Developing a Phylogeny Based Machine Learning Algorithm for Metagenomics

Description: Metagenomics is the study of the totality of the complete genetic elements discovered from a defined environment. Unlike traditional microbiology, which analyzes only the small percentage of microbes that can survive in the laboratory, metagenomics allows researchers to obtain the entire genetic information from all the samples in the communities. Metagenomics thus enables understanding of the target environments and the hidden relationships between bacteria and diseases. In order to efficiently analyze metagenomics data, cutting-edge technologies for analyzing the relationships among microbes and communities are required. To overcome the challenges brought by rapid growth in metagenomics datasets, advances in novel methodologies for interpreting metagenomics data are clearly needed. The first two chapters of this dissertation summarize and compare the widely used methods in metagenomics and integrate these methods into pipelines. Properly analyzing metagenomics data requires a variety of bioinformatics and statistical approaches to deal with different situations. The raw reads from sequencing centers need to be processed and denoised in several steps and then further interpreted by ecological and statistical analysis. Understanding these algorithms and combining different approaches could therefore reduce the influence of noise and bias at different steps, and an efficient and accurate pipeline is important to robustly decipher the differences and functionality of bacteria in communities. Traditional statistical analysis and machine learning algorithms have limitations in analyzing metagenomics data. Thus, the remaining three chapters describe a new phylogeny-based machine learning and feature selection algorithm to overcome these problems. The new method outperforms traditional algorithms and can provide more robust candidate microbes for further analysis.
With growing sample sizes, deep neural networks could potentially describe more complicated characteristics of the data and thus improve model accuracy. So a deep learning framework is designed on top of the shallow learning algorithm stated above in order to further ...
Date: August 2017
Creator: Rong, Ruichen
Partner: UNT Libraries

Draft Genome Sequence of Cupriavidus sp. Strain SK-4, a di-ortho-Substituted Biphenyl-Utilizing Bacterium Isolated from Polychlorinated Biphenyl-Contaminated Sludge

Description: Article on the draft genome sequence of Cupriavidus sp. strain SK-4, a di-ortho-substituted biphenyl-utilizing bacterium isolated from polychlorinated biphenyl-contaminated sludge.
Date: May 1, 2014
Creator: Vilo, Claudia A.; Benedik, Michael J.; Ilori, Matthew Olusoji & Dong, Qunfeng
Partner: UNT College of Arts and Sciences

Understanding Microbial Biodegradation of Environmental Contaminants

Description: The accumulation of industrial contaminants in natural environments has rapidly become a serious threat to human and animal life. Fortunately, there are microorganisms capable of degrading or transforming environmental contaminants. The present dissertation work aimed to understand the genomic basis of microbial degradation and resistance. The focus was the genomic study of the following bacteria: a) Pseudomonas fluorescens NCIMB 11764, a unique bacterium with specific enzymes that allow cyanide adaptation. Potential cyanide degradation mechanisms found in this strain included the nit1C cluster and the CNO complex. Potential cyanide tolerance genes found included cyanide-insensitive oxidases, a nitric oxide producing gene, and iron metabolism genes. b) Cupriavidus sp. strain SK-3 and strain SK-4. The genomes of both bacteria contained the bph operon for polychlorinated biphenyl (PCB) degradation, but we found differences in the sequences of the genes. Those differences might indicate their preferences for different PCB substrates. c) Arsenic-resistant bacterial communities observed in the Atacama Desert. Specific bacteria were found to thrive depending on the arsenic concentration. Examples were the Bacteroidetes and Spirochaetes phyla, whose proportions increased in the river with high arsenic concentrations. Also, DNA repair and replication metabolic functions seem to be necessary for resistance to arsenic-contaminated environments. Our research gives us insights into how bacterial communities, not just individual bacteria, can adapt and become resistant to the contaminants. The present dissertation work showed specific genes and mechanisms for degradation of and resistance to contaminants that could contribute to developing new bioremediation strategies.
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: May 2015
Creator: Vilo Muñoz, Claudia Andrea
Partner: UNT Libraries

TableMaker: An ad hoc Query Tool for Relational Databases

Description: This paper discusses an ad hoc query tool for relational databases. Most Web servers hosting biological data limit users to a defined set of search options and output formats that fall short of the full range of options available to users with direct database access. However, to make full use of the wealth of data in the database resource, it is desirable to have an intermediate solution that provides a broad range of flexible query and output options through a Web portal.
Date: July 2008
Creator: Lushbough, Carol; Duvick, Jon; Dong, Qunfeng; Jennewein, Douglas; Reynoldson, Joe & Brendel, Volker
Partner: UNT College of Arts and Sciences

Multiple domains in MtENOD8 protein including the signal peptide target it to the symbiosome

Description: Article presenting evidence from GFP fusion experiments that the MtENOD8 protein contains at least three symbiosome targeting domains, including its N-terminal signal peptide (SP).
Date: May 2012
Creator: Meckfessel, Matthew H.; Blancaflor, Elison B.; Plunkett, Michael; Dong, Qunfeng & Dickstein, Rebecca
Partner: UNT College of Arts and Sciences