27 Matching Results

Search Results

Identification of Novel Genomic Islands in Liverpool Epidemic Strain of Pseudomonas aeruginosa Using Segmentation and Clustering

Description: This article utilizes a recursive segmentation and clustering procedure, presented as a genome-mining tool, GEMINI, to decipher genomic islands and understand their contributions to the evolution of virulence and antibiotic resistance in Pseudomonas aeruginosa.
Date: August 3, 2016
Creator: Jani, Mehul; Mathee, Kalai & Azad, Rajeev K.
Partner: UNT College of Arts and Sciences

Algorithm Optimizations in Genomic Analysis Using Entropic Dissection

Description: In recent years, the collection of genomic data has skyrocketed, and databases of genomic data are growing at a faster rate than ever before. Although many computational methods have been developed to interpret these data, they tend to struggle to process the ever-increasing file sizes that are being produced and fail to take advantage of the advances in multi-core processors by using parallel processing. In some instances, loss of accuracy has been a necessary trade-off to allow faster computation of the data. This thesis discusses one such algorithm that has been developed and how changes were made to allow larger input file sizes and reduce the time required to achieve a result without sacrificing accuracy. An information entropy-based algorithm was used as a basis to demonstrate these techniques. The algorithm dissects the distinctive patterns underlying genomic data efficiently, requiring no a priori knowledge, and thus is applicable in a variety of biological research applications. This research describes how parallel processing and object-oriented programming techniques were used to process larger files in less time and achieve a more accurate result from the algorithm. Through object-oriented techniques, the maximum allowable input file size was significantly increased from 200 MB to 2000 MB. Using parallel processing techniques allowed the program to finish processing data in less than half the time of the sequential version. The accuracy of the algorithm was improved by reducing data loss throughout the algorithm. Finally, adding user-friendly options enabled the program to use requests more effectively and further customize the logic used within the algorithm.
Date: August 2015
Creator: Danks, Jacob R.
Partner: UNT Libraries

Investigating Human Gut Microbiome in Obesity with Machine Learning Methods

Description: Obesity is a common disease among all ages that has threatened human health and has become a global concern. Gut microbiota can affect human metabolism and thus may modulate obesity. Certain mixes of gut microbiota can keep the host healthy or predispose the host to obesity. Modern next-generation sequencing techniques provide access to huge amounts of genetic information underlying microbiota and thus provide new insights into the functionality of these microorganisms and their interactions with the host. Multiple previous studies have demonstrated that the microbiome might contribute to obesity by increasing dietary energy harvest, promoting fat deposition, and triggering systemic inflammation. However, these studies are based on either lab cultivation or basic statistical analysis. In order to further explore how gut microbiota affect obesity, this thesis utilizes a series of machine learning methods to analyze large amounts of metagenomics data from the human gut microbiome. The publicly available HMP (Human Microbiome Project) metagenomic sequencing data, which contain microbiome data for healthy adults, including overweight and obese individuals, were used for this study. HMP gut data were organized based on two different feature definitions: taxonomic information and metabolic reconstruction information. Several widely used classification algorithms, namely Naive Bayes, Random Forest, SVM, and elastic net logistic regression, were applied to predict the healthy or obese status of the subjects and were evaluated by cross-validation accuracy. Furthermore, the corresponding feature selection algorithms were used to identify signature features in each dataset that lead to the differences between healthy and obese samples. The results showed that these algorithms perform more poorly on taxonomic data than on metabolic pathway data, though many of the selected taxa are still supported by the literature. Among all the combinations of algorithms and data, elastic net logistic regression has the best cross-validation performance and thus becomes the best model. In this model, several important ...
Date: August 2017
Creator: Zhong, Yuqing
Partner: UNT Libraries

Isolation and Characterization of Phages Infecting Streptomyces azureus

Description: Isolating novel phages using Streptomyces azureus, which produces the antibiotic thiostrepton, as a host, and characterizing their genomes may help us find new tools that could be used to develop antibiotics, in addition to contributing to the databases of phages and, specifically, Streptomyces phages. Streptomyces phages Alsaber, Omar, Attoomi, Rowa, and ZamZam were isolated during this study. They were isolated from enriched soil and sequenced by the Illumina sequencing method. They were isolated from three different geographical regions. They are Siphoviridae phages that create small clear plaques with a diameter of approximately 0.5-1 mm, except for Rowa, which has cloudy plaques, and their heads and tails vary in size. ZamZam was not characterized at this time. The sequencing shows that they have circular genomes with 3' sticky overhangs and various genome sizes, with a high GC content averaging 66%. Alsaber was classified under sub-cluster BD3, while Omar was categorized under sub-cluster BD2; both belong to Cluster BD. Rowa was placed in Cluster BL, and Attoomi is currently a singleton that does not fit into an established cluster. Alsaber yields 76 putative genes with no tRNA, Omar 81 putative genes with 1 tRNA, Attoomi 53 putative genes with no tRNA, and Rowa 61 ORFs with 7 tRNAs. Rowa was also a putative temperate phage due to its lysogenic activity, and Rowa was not able to reinfect the lysogenic strain, S. azureus (Rowa). All of the isolated phages infected S. indigocolor, while only Attoomi and Rowa were able to infect S. tricolor. Upon completion of this project, we acquired more data and understanding of S. azureus phages and Actinobacteriophage in general, which will expand the scale of future research on Streptomyces bacteriophages.
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: May 2018
Creator: Sulaiman, Ahmad M
Partner: UNT Libraries

Developing a Phylogeny Based Machine Learning Algorithm for Metagenomics

Description: Metagenomics is the study of the totality of the complete genetic elements discovered from a defined environment. Unlike traditional microbiology studies, which analyze only the small percentage of microbes that can survive in the laboratory, metagenomics allows researchers to obtain the entire genetic information from all the samples in the communities. Metagenomics thus enables understanding of the target environments and the hidden relationships between bacteria and diseases. In order to efficiently analyze metagenomics data, cutting-edge technologies for analyzing the relationships among microbes and communities are required. To overcome the challenges brought by rapid growth in metagenomics datasets, advances in novel methodologies for interpreting metagenomics data are clearly needed. The first two chapters of this dissertation summarize and compare the widely used methods in metagenomics and integrate these methods into pipelines. Properly analyzing metagenomics data requires a variety of bioinformatics and statistical approaches to deal with different situations. The raw reads from sequencing centers need to be processed and denoised in several steps and then further interpreted by ecological and statistical analysis. Understanding these algorithms and combining different approaches could therefore reduce the influence of noise and biases at different steps, and an efficient and accurate pipeline is important to robustly decipher the differences and functionality of bacteria in communities. Traditional statistical analysis and machine learning algorithms have limitations in analyzing metagenomics data. Thus, the remaining three chapters describe a new phylogeny-based machine learning and feature selection algorithm to overcome these problems. The new method outperforms traditional algorithms and can provide more robust candidate microbes for further analysis. With the growing sample size, a deep neural network could potentially describe more complicated characteristics of the data and thus improve model accuracy. So a deep learning framework is designed on top of the shallow learning algorithm stated above in order to further ...
Date: August 2017
Creator: Rong, Ruichen
Partner: UNT Libraries

Markov Model of Segmentation and Clustering: Applications in Deciphering Genomes and Metagenomes

Description: Rapidly accumulating genomic data as a result of high-throughput sequencing has necessitated development of efficient computational methods to decode the biological information underlying these data. DNA composition varies across structurally or functionally different regions of a genome as well as those of distinct evolutionary origins. We adapted an integrative framework that combines a top-down, recursive segmentation algorithm with a bottom-up, agglomerative clustering algorithm to decipher compositionally distinct regions in genomes. The recursive segmentation procedure entails fragmenting a genome into compositionally distinct segments within a statistical hypothesis testing framework. This is followed by an agglomerative clustering procedure to group compositionally similar segments within the same framework. One of our main objectives was to decipher distinctive evolutionary patterns in sex chromosomes via unraveling the underlying compositional heterogeneity. Application of this approach to the human X chromosome provided novel insights into the stratification of the X chromosome as a consequence of punctuated recombination suppressions between the X and Y from the distal long arm to the distal short arm. Novel "evolutionary strata" were identified, particularly in the X conserved region (XCR) that is not amenable to the X-Y comparative analysis due to massive loss of the Y gametologs following recombination cessation. Our composition-based approach could circumvent the limitations of the current methods that depend on X-Y (or Z-W for the ZW sex determination system) comparisons by deciphering the stratification even if only the sequence of the sex chromosome in the homogametic sex (i.e., the X or Z chromosome) is available. These studies were extended to the plant sex chromosomes, which are known to have a number of evolutionary strata that formed at the initial stage of their evolution, presenting an opportunity to examine the onset of stratum formation on the sex chromosomes. Further applications included detection of horizontally acquired DNAs in the extremophilic eukaryote Galdieria sulphuraria, which ...
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: August 2017
Creator: Pandey, Ravi Shanker
Partner: UNT Libraries
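
The top-down segmentation step described in this abstract can be illustrated with a small sketch. It is not the dissertation's implementation: it assumes a Jensen-Shannon divergence over single-nucleotide counts as the measure of compositional difference, and it replaces the statistical hypothesis test with a fixed, hypothetical divergence threshold. The function names and the `threshold` and `min_len` parameters are illustrative only, and the agglomerative clustering step is omitted.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (bits) of a symbol count table."""
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values() if n > 0)


def best_split(seq, min_len):
    """Return (position, divergence) for the split that maximizes the
    Jensen-Shannon divergence between the left and right parts."""
    n = len(seq)
    total = Counter(seq)
    h_all = entropy(total, n)
    left = Counter()
    best_pos, best_js = None, 0.0
    for i in range(1, n - min_len + 1):
        left[seq[i - 1]] += 1
        if i < min_len:
            continue  # keep both parts at least min_len long
        right = total - left
        js = (h_all
              - (i / n) * entropy(left, i)
              - ((n - i) / n) * entropy(right, n - i))
        if js > best_js:
            best_pos, best_js = i, js
    return best_pos, best_js


def segment(seq, threshold=0.1, min_len=50, offset=0):
    """Recursively split seq; return a list of (start, end) segments.
    The fixed threshold is an assumption standing in for a significance test."""
    pos, js = best_split(seq, min_len)
    if pos is None or js < threshold:
        return [(offset, offset + len(seq))]
    return (segment(seq[:pos], threshold, min_len, offset)
            + segment(seq[pos:], threshold, min_len, offset + pos))
```

Calling `segment("AT" * 100 + "GC" * 100)` splits the toy sequence once, at the boundary between the AT-rich and GC-rich halves. A bottom-up clustering pass would then merge non-adjacent segments of similar composition.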

A Global Stochastic Modeling Framework to Simulate and Visualize Epidemics

Description: Epidemics have caused major human and monetary losses through the course of human civilization. It is very important that epidemiologists and public health personnel are prepared to handle an impending infectious disease outbreak. The ever-changing demographics, evolving infrastructural resources of geographic regions, and emerging and re-emerging diseases compel the use of simulation to predict disease dynamics. By means of simulation, public health personnel and epidemiologists can predict the disease dynamics, the population groups at risk, and their geographic locations beforehand, so that they are prepared to respond in case of an epidemic outbreak. As a consequence of the large numbers of individuals and inter-personal interactions involved in simulating infectious disease spread in a region such as a county, sizeable amounts of data may be produced that have to be analyzed. Methods to visualize these data would be effective in helping people from diverse disciplines understand and analyze the simulation. This thesis proposes a framework to simulate and visualize the spread of an infectious disease in a population of a region such as a county. As real-world populations have a non-homogeneous demographic and spatial distribution, this framework models the spread of an infectious disease based on the population of and geographic distance between census blocks, and on social behavioral parameters for demographic groups. The population is stratified into demographic groups in individual census blocks using census data. Infection spread is modeled by means of local and global contacts generated between groups of population in census blocks. The strength and likelihood of the contacts are based on population, geographic distance, and social behavioral parameters of the groups involved. The disease dynamics are represented on a geographic map of the region using a heat map representation, where the intensity of infection is mapped to a color scale. This framework provides a tool for public health personnel and ...
Date: May 2012
Creator: Indrakanti, Saratchandra
Partner: UNT Libraries
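
A rough sketch of the kind of block-level stochastic model this abstract describes is given below. It is not the thesis framework: it assumes a simple SIR compartment model per census block, and the inverse-power distance kernel, the `beta` and `gamma` rates, and all function names are hypothetical choices made for illustration. Demographic stratification, social behavioral parameters, and the heat map rendering are left out.

```python
import math
import random


def kernel(distance_km, decay=2.0):
    """Hypothetical distance kernel: contact likelihood falls with distance."""
    return 1.0 / (1.0 + distance_km) ** decay


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def step(blocks, coords, beta, gamma, rng):
    """Advance every census block by one stochastic SIR time step.

    blocks: list of dicts with integer "S", "I", "R" counts (updated in place)
    coords: list of (x, y) block centroids in km
    """
    # Infection pressure on block i: infectious fraction of every block,
    # discounted by its distance from block i (local contact has distance 0).
    pressures = []
    for i in range(len(blocks)):
        pressure = 0.0
        for j, other in enumerate(blocks):
            size = other["S"] + other["I"] + other["R"]
            if size:
                d = math.dist(coords[i], coords[j])
                pressure += kernel(d) * other["I"] / size
        pressures.append(pressure)

    # Apply all updates after the pressures are fixed (synchronous update).
    for block, pressure in zip(blocks, pressures):
        p_infect = 1.0 - math.exp(-beta * pressure)
        infected = binomial(block["S"], p_infect, rng)
        recovered = binomial(block["I"], gamma, rng)
        block["S"] -= infected
        block["I"] += infected - recovered
        block["R"] += recovered
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable. The per-block `I` counts after each step are the values a heat map layer would map to a color scale.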

Understanding Microbial Biodegradation of Environmental Contaminants

Description: The accumulation of industrial contaminants in natural environments has rapidly become a serious threat to human and animal life. Fortunately, there are microorganisms capable of degrading or transforming environmental contaminants. The present dissertation work aimed to understand the genomic basis of microbial degradation and resistance. The focus was the genomic study of the following bacteria: a) Pseudomonas fluorescens NCIMB 11764, a unique bacterium with specific enzymes that allow cyanide adaptation features. Potential cyanide degradation mechanisms found in this strain included the nit1C cluster and the CNO complex. Potential cyanide tolerance genes found included cyanide-insensitive oxidases, a nitric oxide-producing gene, and iron metabolism genes. b) Cupriavidus sp. strain SK-3 and strain SK-4. The genomes of both bacteria presented the bph operon for polychlorinated biphenyl (PCB) degradation, but we found differences in the sequences of the genes. Those differences might indicate their preferences for different PCB substrates. c) Arsenic-resistant bacterial communities observed in the Atacama Desert. Specific bacteria were found to thrive depending on the arsenic concentration. Examples were the Bacteroidetes and Spirochaetes phyla, whose proportions increased in the river with high arsenic concentrations. Also, DNA repair and replication metabolic functions seem to be necessary for resistance to arsenic-contaminated environments. Our research gives us insights into how bacterial communities, not just individual bacteria, can adapt and become resistant to the contaminants. The present dissertation work showed specific genes and mechanisms for degradation of and resistance to contaminants that could contribute to developing new bioremediation strategies.
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: May 2015
Creator: Vilo Muñoz, Claudia Andrea
Partner: UNT Libraries

Phylogenetic and Functional Characterization of Cotton (Gossypium hirsutum) CENTRORADIALIS/TERMINAL FLOWER1/SELF-PRUNING Genes

Description: Plant architecture is an important agronomic trait driven by meristematic activities. Indeterminate meristems set repeating phytomers while determinate meristems produce terminal structures. The centroradialis/terminal flower1/self pruning (CETS) gene family modulates architecture by controlling determinate and indeterminate growth. Cotton (G. hirsutum) is naturally a photoperiodic perennial cultivated as a day-neutral annual. Management of this fiber crop is complicated by continued vegetative growth and asynchronous fruit set. Here, cotton CETS genes are phylogenetically and functionally characterized. We identified eight CETS genes in diploid cotton (G. raimondii and G. arboreum) and sixteen in tetraploid G. hirsutum that grouped within the three generally accepted CETS clades: flowering locus T (FT)-like, terminal flower1/self pruning (TFL1/SP)-like, and mother of FT and TFL1 (MFT)-like. Over-expression of single flower truss (GhSFT), the ortholog to Arabidopsis FT, accelerates the onset of flowering in Arabidopsis Col-0. In mutant rescue analysis, this gene driven by its native promoter rescues the ft-10 late flowering phenotype. GhSFT upstream sequence was used to drive expression of the uidA reporter gene. As anticipated, GUS accumulated in the vasculature of Arabidopsis leaves. Cotton has five TFL1-like genes, all of which delay flowering when ectopically expressed in Arabidopsis; the strongest phenotypes fail to produce functional flowers. Three of these genes, GhSP, GhTFL1-L2, and GhBFT-L2, rescue the early flowering tfl1-14 mutant phenotype. GhSPpro:uidA promoted GUS activity specifically in plant meristems, whereas other GhTFL1-like promoters predominantly drove GUS activities in plant vascular tissues. Finally, analysis of Gossypium CETS promoter sequences predicted that GhSFT, GhSP, GhTFL1-L1, GhTFL1-L2, and GhBFT-L2 are regulated by transcription factors involved in shoot and flowering development. Analysis of cotton's two MFT homologs indicated that neither gene functions to control shoot architecture. Our results emphasize the functional conservation of members of this gene family in flowering plants and also suggest this family as targets during artificial selection ...
Date: December 2017
Creator: Prewitt, Sarah F
Partner: UNT Libraries

Rapid Metabolic Response of Plants Exposed to Light Stress

Description: Environmental stress conditions can drastically affect plant growth and productivity. In contrast to soil moisture or salinity, which can change gradually over a period of days or weeks, changes in light intensity or temperature can occur very rapidly, sometimes over the course of minutes or seconds. In our study, we have therefore taken a metabolomics approach to identify the rapid response of plants to light stress. In the first part we have focused on the ultrafast (0-90 sec) metabolic response of local tissues to light stress, and in the second part we analyzed the metabolic response associated with rapid systemic signaling (0-12 min). Analysis of the rapid response of Arabidopsis to light stress has revealed 111 metabolites whose levels change significantly during the first 90 sec of light stress exposure. We further show that the levels of free and total glutathione accumulate rapidly during light stress in Arabidopsis and that the accumulation of total glutathione during light stress is dependent on an increase in nitric oxide (NO) levels. We further suggest that the increase in precursors for glutathione biosynthesis could be linked to alterations in photorespiration, and that phosphoenolpyruvate could represent a major energy and carbon source for rapid metabolic responses. Taken together, our analysis could be used as an initial road map for the identification of different pathways that could be used to augment the rapid response of plants to abiotic stress. In addition, it highlights the important role of glutathione in the initial stage of the light stress response. Light-induced rapid systemic signaling and systemic acquired acclimation (SAA) are thought to play an important role in the response of plants to different abiotic stresses. Although molecular and metabolic responses to light stress have been extensively studied in local leaves, and to a lesser degree in systemic leaves, very ...
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: May 2018
Creator: Choudhury, Feroza Kaneez
Partner: UNT Libraries

Genomic Island Discovery through Enrichment of Statistical Modeling with Biological Information

Description: Horizontal gene transfer enables acquisition and dissemination of novel traits including antibiotic resistance and virulence among bacteria. Frequently such traits are gained through the acquisition of clusters of functionally related genes, often referred to as genomic islands (GIs). Quantifying horizontal flow of GIs and assessing their contributions to the emergence and evolution of novel metabolic traits in bacterial organisms are central to understanding the evolution of bacteria in general and the evolution of pathogenicity and antibiotic resistance in particular, a focus of this dissertation study. Methods for GI detection have also evolved with advances in sequencing and bioinformatics; however, comprehensive assessment of these methods has been lacking. This motivated us to assess the performance of current methods for identifying islands on broad datasets of well-characterized bacterial genomes and synthetic genomes, and leverage this information to develop a novel approach that circumvents the limitations of the current state-of-the-art in GI detection. The main findings from our assessment studies were 1) the methods have complementary strengths, 2) a gene-clustering method utilizing codon usage bias as the discriminant criterion, namely, JS-CB, is most efficient in localizing genomic islands, specifically the well-studied SCCmec resistance island in methicillin-resistant Staphylococcus aureus (MRSA) genomes, and 3) in general, the bottom-up, gene-by-gene analysis methods are inherently limited in their ability to decipher large structures such as GIs as single entities within bacterial genomes. We adapted a top-down approach based on recursive segmentation and agglomerative clustering and developed a GI prediction tool, GEMINI, which combined compositional features with segment context information to localize GIs in the Liverpool epidemic strain of Pseudomonas aeruginosa. Application of GEMINI to the genome of P. aeruginosa LESB58 demonstrated its ability to delineate experimentally verified GIs in the LESB58 genome. GEMINI identified several novel islands including pathogenicity islands and revealed the ...
Date: August 2018
Creator: Jani, Mehul
Partner: UNT Libraries

Environmental vibrios represent a source of antagonistic compounds that inhibit pathogenic Vibrio cholerae and Vibrio parahaemolyticus strains

Description: This article predicts that marine-derived bacteria should inhibit Vibrio pathogens and may be a source of unique antibiotic compounds.
Date: March 10, 2017
Creator: Burks, David J.; Norris, Stephen; Kauffman, Kathryn M.; Joy, Abigail; Arevalo, Philip; Azad, Rajeev K. et al.
Partner: UNT College of Arts and Sciences

Rapid identification of causative insertions underlying Medicago truncatula Tnt1 mutants defective in symbiotic nitrogen fixation from a forward genetic screen by whole genome sequencing

Description: This article demonstrates that whole genome sequencing is an efficient approach for identification of causative genes underlying symbiotic nitrogen fixation defective phenotypes in Medicago truncatula Tnt1 insertion mutants obtained via forward genetic screens.
Date: February 27, 2016
Creator: Veerappan, Vijaykumar; Jani, Mehul; Kadel, Khem; Troiani, Taylor; Gale, Ronny; Mayes, Tyler et al.
Partner: UNT College of Arts and Sciences

Phylogenetic analysis of eukaryotic NEET proteins uncovers a link between a key gene duplication event and the evolution of vertebrates

Description: This article describes the use of three members of the human NEET protein family (CISD1, mitoNEET; CISD2, NAF-1 or Miner 1; and CISD3, Miner2) as guides to conduct a phylogenetic analysis of eukaryotic NEET proteins and their evolution.
Date: February 16, 2017
Creator: Inupakutika, Madhuri A.; Sengupta, Soham; Nechushtai, Rachel; Jennings, Patricia A.; Onuchic, José N.; Azad, Rajeev K. et al.
Partner: UNT College of Arts and Sciences

Ultra-fast alterations in mRNA levels uncover multiple players in light stress acclimation in plants

Description: This article contains RNA sequencing analysis of Arabidopsis thaliana plants subjected to light stress in order to identify and characterize rapid changes in the steady-state level of different transcripts in response to light stress.
Date: September 26, 2015
Creator: Suzuki, Nobuhiro; Devireddy, Amith R.; Inupakutika, Madhuri A.; Baxter, Aaron; Miller, Gad; Song, Luhua et al.
Partner: UNT College of Arts and Sciences

Glucose or Altered Ceramide Biosynthesis Mediate Oxygen Deprivation Sensitivity Through Novel Pathways Revealed by Transcriptome Analysis in Caenorhabditis elegans

Description: This article discusses how RNA-sequencing analysis was performed to assess how a glucose-supplemented diet and/or a hyl-2 mutation altered the transcriptome.
Date: August 5, 2016
Creator: Ladage, Mary L.; King, Skylar D.; Burks, David J.; Quan, Daniel L.; Garcia, Anastacia M.; Azad, Rajeev K. et al.
Partner: UNT College of Arts and Sciences