9 Matching Results

Search Results

Scientific data analysis on data-parallel platforms.

Description: As scientific computing users migrate to petaflop platforms that promise to generate multi-terabyte datasets, there is a growing need in the community to be able to embed sophisticated analysis algorithms in the computing platforms' storage systems. Data Warehouse Appliances (DWAs) are attractive for this work, due to their ability to store and process massive datasets efficiently. While DWAs have been utilized effectively in data-mining and informatics applications, they remain largely unproven in scientific workloads. In this paper we present our experiences in adapting two mesh analysis algorithms to function on five different DWA architectures: two Netezza database appliances, an XtremeData dbX database, a LexisNexis DAS, and multiple Hadoop MapReduce clusters. The main contribution of this work is insight into the differences between these DWAs from a user's perspective. In addition, we present performance measurements for ten DWA systems to help understand the impact of different architectural trade-offs in these systems.
Date: September 1, 2010
Creator: Ulmer, Craig D.; Bayer, Gregory W.; Choe, Yung Ryn & Roe, Diana C.
Partner: UNT Libraries Government Documents Department

Scalable multi-correlative statistics and principal component analysis with Titan.

Description: This report summarizes existing statistical engines in VTK/Titan and presents the recently parallelized multi-correlative and principal component analysis engines. It is a sequel to [PT08], which studied the parallel descriptive and correlative engines. The ease of use of these parallel engines is illustrated by means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; then, this theoretical property is verified with test runs that demonstrate optimal parallel speed-up with up to 200 processors.
Date: February 1, 2009
Creator: Thompson, David C.; Bennett, Janine C.; Roe, Diana C. & Pebay, Philippe Pierre
Partner: UNT Libraries Government Documents Department
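
Illustrative note on the record above: one reason descriptive statistics scale well in parallel is that per-process summaries can be merged exactly. The following is a minimal sketch of that idea in Python, using the well-known pairwise update formulas for mean and variance. It is not taken from the report and does not use the VTK/Titan API; the function names `summarize`, `merge`, and `parallel_variance` are hypothetical, and each chunk is assumed to be non-empty.

```python
from functools import reduce


def summarize(chunk):
    """Summarize one non-empty chunk as (count, mean, M2).

    M2 is the sum of squared deviations from the chunk mean.
    """
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def merge(a, b):
    """Combine two partial summaries into the summary of their union."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def parallel_variance(chunks):
    """Return (mean, sample variance) over all chunks (hypothetical helper).

    Each chunk stands in for the data held by one process; needs at least
    two values in total so that the sample variance is defined.
    """
    n, mean, m2 = reduce(merge, (summarize(c) for c in chunks))
    return mean, m2 / (n - 1)
```

Because `merge` is associative, the partial summaries can be combined in any order, for example in a reduction tree across processes.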

Enumerating molecules.

Description: This report is a comprehensive review of the field of molecular enumeration from early isomer counting theories to evolutionary algorithms that design molecules in silico. The core of the review is a detailed account of how molecules are counted, enumerated, and sampled. The practical applications of molecular enumeration are also reviewed for chemical information, structure elucidation, molecular design, and combinatorial library design purposes. This review is to appear as a chapter in Reviews in Computational Chemistry volume 21 edited by Kenny B. Lipkowitz.
Date: April 1, 2004
Creator: Visco, Donald Patrick, Jr. (Tennessee Technological University, Cookeville, TN); Faulon, Jean-Loup Michel & Roe, Diana C.
Partner: UNT Libraries Government Documents Department

Developing algorithms for predicting protein-protein interactions of homology modeled proteins.

Description: The goal of this project was to examine the protein-protein docking problem, especially as it relates to homology-based structures, identify the key bottlenecks in current software tools, and evaluate and prototype new algorithms that may be developed to improve these bottlenecks. This report describes the current challenges in the protein-protein docking problem: correctly predicting the binding site for the protein-protein interaction and correctly placing the sidechains. Two different and complementary approaches are taken that can help with the protein-protein docking problem. The first approach is to predict interaction sites prior to docking, and uses bioinformatics studies of protein-protein interactions to predict these interaction sites. The second approach is to improve validation of predicted complexes after docking, and uses an improved scoring function for evaluating proposed docked poses, incorporating a solvation term. This scoring function demonstrates significant improvement over current state-of-the-art functions. Initial studies on both these approaches are promising, and argue for full development of these algorithms.
Date: January 1, 2006
Creator: Martin, Shawn Bryan; Sale, Kenneth L.; Faulon, Jean-Loup Michel & Roe, Diana C.
Partner: UNT Libraries Government Documents Department

OVIS 2.0 user's guide.

Description: This document describes how to obtain, install, use, and enjoy a better life with OVIS version 2.0. The OVIS project targets scalable, real-time analysis of very large data sets. We characterize the behaviors of elements and aggregations of elements (e.g., across space and time) in data sets in order to detect anomalous behaviors. We are particularly interested in determining anomalous behaviors that can be used as advance indicators of significant events of which notification can be made or upon which action can be taken or invoked. The OVIS open source tool (BSD license) is available for download at ovis.ca.sandia.gov. While we intend for it to support a variety of application domains, the OVIS tool was initially developed for, and continues to be primarily tuned for, the investigation of High Performance Compute (HPC) cluster system health. In this application it is intended to be both a system administrator tool for monitoring and a system engineer tool for exploring the system state in depth. OVIS 2.0 provides a variety of statistical tools for examining the behavior of elements in a cluster (e.g., nodes, racks) and associated resources (e.g., storage appliances and network switches). It calculates and reports model values and outliers relative to those models. Additionally, it provides an interactive 3D physical view in which the cluster elements can be colored by raw element values (e.g., temperatures, memory errors) or by the comparison of those values to a given model. The analysis tools and the visual display allow the user to easily determine abnormal or outlier behaviors. The OVIS project envisions the OVIS tool, when applied to compute cluster monitoring, to be used in conjunction with the scheduler or resource manager in order to enable intelligent resource utilization. For example, nodes that are deemed less healthy, that is, nodes that exhibit outlier ...
Date: April 1, 2009
Creator: Mayo, Jackson R.; Gentile, Ann C.; Brandt, James M.; Thompson, David C.; Roe, Diana C.; Wong, Matthew H. et al.
Partner: UNT Libraries Government Documents Department

OVIS 3.2 user's guide.

Description: This document describes how to obtain, install, use, and enjoy a better life with OVIS version 3.2. The OVIS project targets scalable, real-time analysis of very large data sets. We characterize the behaviors of elements and aggregations of elements (e.g., across space and time) in data sets in order to detect meaningful conditions and anomalous behaviors. We are particularly interested in determining anomalous behaviors that can be used as advance indicators of significant events of which notification can be made or upon which action can be taken or invoked. The OVIS open source tool (BSD license) is available for download at ovis.ca.sandia.gov. While we intend for it to support a variety of application domains, the OVIS tool was initially developed for, and continues to be primarily tuned for, the investigation of High Performance Compute (HPC) cluster system health. In this application it is intended to be both a system administrator tool for monitoring and a system engineer tool for exploring the system state in depth. OVIS 3.2 provides a variety of statistical tools for examining the behavior of elements in a cluster (e.g., nodes, racks) and associated resources (e.g., storage appliances and network switches). It provides an interactive 3-D physical view in which the cluster elements can be colored by raw or derived element values (e.g., temperatures, memory errors). The visual display allows the user to easily determine abnormal or outlier behaviors. Additionally, it provides search capabilities for certain scheduler logs. The OVIS capabilities were designed to be highly interactive - for example, the job search may drive an analysis which in turn may drive the user generation of a derived value which would then be examined on the physical display. The OVIS project envisions the capabilities of its tools applied to compute cluster monitoring. In the future, integration with ...
Date: October 1, 2010
Creator: Mayo, Jackson R.; Gentile, Ann C.; Brandt, James M.; Houf, Catherine A.; Thompson, David C.; Roe, Diana C. et al.
Partner: UNT Libraries Government Documents Department

Understanding and engineering enzymes for enhanced biofuel production.

Description: Today, carbon-rich fossil fuels, primarily oil, coal and natural gas, provide 85% of the energy consumed in the United States. The release of greenhouse gases from these fuels has spurred research into alternative, non-fossil energy sources. Lignocellulosic biomass is a renewable resource that is carbon-neutral, and can provide a raw material for alternative transportation fuels. Plant-derived biomass contains cellulose, which is difficult to convert to monomeric sugars for production of fuels. The development of cost-effective and energy-efficient processes to transform the cellulosic content of biomass into fuels is hampered by significant roadblocks, including the lack of specifically developed energy crops, the difficulty in separating biomass components, the high costs of enzymatic deconstruction of biomass, and the inhibitory effect of fuels and processing byproducts on organisms responsible for producing fuels from biomass monomers. One of the main impediments to more widespread utilization of this important resource is the recalcitrance of cellulosic biomass and techniques that can be utilized to deconstruct cellulosic biomass.
Date: January 1, 2009
Creator: Simmons, Blake Alexander; Volponi, Joanne V.; Sapra, Rajat; Faulon, Jean-Loup Michel; Buffleben, George M. & Roe, Diana C.
Partner: UNT Libraries Government Documents Department

The OVIS analysis architecture.

Description: This report summarizes the current statistical analysis capability of OVIS and how it works in conjunction with the OVIS data readers and interpolators. It also documents how to extend these capabilities. OVIS is a tool for parallel statistical analysis of sensor data to improve system reliability. Parallelism is achieved using a distributed data model: many sensors on similar components (metaphorically sheep) insert measurements into a series of databases on computers reserved for analyzing the measurements (metaphorically shepherds). Each shepherd node then processes the sheep data stored locally and the results are aggregated across all shepherds. OVIS uses the Visualization Tool Kit (VTK) statistics algorithm class hierarchy to perform analysis of each process's data but avoids VTK's model aggregation stage which uses the Message Passing Interface (MPI); this is because if a single process in an MPI job fails, the entire job will fail. Instead, OVIS uses asynchronous database replication to aggregate statistical models. OVIS has several additional features beyond those present in VTK that, first, accommodate its particular data format and, second, improve the memory and speed of the statistical analyses. First, because many statistical algorithms are multivariate in nature and sensor data is typically univariate, interpolation of data is required to provide simultaneous observations of metrics. Note that in this report, we will refer to a single value obtained from a sensor as a measurement while a collection of multiple sensor values simultaneously present in the system is an observation. A base class for interpolation is provided that abstracts the operation of converting multiple sensor measurements into simultaneous observations. A concrete implementation is provided that performs piecewise constant temporal interpolation of multiple metrics across a single component. 
Secondly, because calculations may summarize data too large to fit in memory, OVIS analyzes batches of observations at a time and aggregates ...
Date: July 1, 2010
Creator: Mayo, Jackson R.; Gentile, Ann C.; Brandt, James M.; De Sapio, Vincent; Thompson, David C.; Roe, Diana C. et al.
Partner: UNT Libraries Government Documents Department
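
Illustrative note on the record above: the description mentions piecewise constant temporal interpolation, which turns per-sensor measurements into simultaneous observations. The following is a minimal Python sketch of that idea, assuming each metric's measurements are kept as a time-sorted list of (time, value) pairs. It is not the OVIS or VTK interpolator class; the function name `observe` and the data layout are hypothetical.

```python
from bisect import bisect_right


def observe(measurements, t):
    """Build one observation at time t by holding each metric's last value.

    measurements: dict mapping metric name -> list of (time, value) pairs,
                  sorted by time (assumed layout, for illustration only).
    Returns a dict mapping metric name -> value in effect at time t, or
    None if some metric has no measurement at or before t.
    """
    observation = {}
    for metric, series in measurements.items():
        times = [time for time, _ in series]
        # Index of the first measurement strictly after t.
        i = bisect_right(times, t)
        if i == 0:
            return None  # this metric has not reported yet
        observation[metric] = series[i - 1][1]
    return observation
```

For example, with a temperature sensor reporting at times 0 and 10 and a fan sensor reporting at times 3 and 12, an observation at time 11 pairs the temperature from time 10 with the fan speed from time 3.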

A framework for graph-based synthesis, analysis, and visualization of HPC cluster job data.

Description: The monitoring and system analysis of high performance computing (HPC) clusters is of increasing importance to the HPC community. Analysis of HPC job data can be used to characterize system usage and diagnose and examine failure modes and their effects. This analysis is not straightforward, however, due to the complex relationships that exist between jobs. These relationships are based on a number of factors, including shared compute nodes between jobs, proximity of jobs in time, etc. Graph-based techniques represent an approach that is particularly well suited to this problem, and provide an effective technique for discovering important relationships in job queuing and execution data. The efficacy of these techniques is rooted in the use of a semantic graph as a knowledge representation tool. In a semantic graph, job data, represented in a combination of numerical and textual forms, can be flexibly processed into edges, with corresponding weights, expressing relationships between jobs, nodes, users, and other relevant entities. This graph-based representation permits formal manipulation by a number of analysis algorithms. This report presents a methodology and software implementation that leverages semantic graph-based techniques for the system-level monitoring and analysis of HPC clusters based on job queuing and execution data. Ontology development and graph synthesis are discussed with respect to the domain of HPC job data. The framework developed automates the synthesis of graphs from a database of job information. It also provides a front end, enabling visualization of the synthesized graphs. Additionally, an analysis engine is incorporated that provides performance analysis, graph-based clustering, and failure prediction capabilities for HPC systems.
Date: August 1, 2010
Creator: Mayo, Jackson R.; Kegelmeyer, W. Philip, Jr.; Wong, Matthew H.; Pebay, Philippe Pierre; Gentile, Ann C.; Thompson, David C. et al.
Partner: UNT Libraries Government Documents Department
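
Illustrative note on the record above: the description names shared compute nodes as one factor that relates jobs to each other. The following is a minimal Python sketch of synthesizing weighted edges from job data on that single factor, with the weight taken to be the number of shared nodes. It is not the framework described in the report; the function name `build_job_graph`, the input layout, and the choice of weight are assumptions made for illustration.

```python
from itertools import combinations


def build_job_graph(jobs):
    """Link every pair of jobs that ran on at least one common compute node.

    jobs: dict mapping job id -> set of compute node names (assumed layout).
    Returns a dict mapping (job_a, job_b), with job_a < job_b, to the number
    of compute nodes the two jobs share.
    """
    edges = {}
    for a, b in combinations(sorted(jobs), 2):
        shared = len(jobs[a] & jobs[b])
        if shared:
            edges[(a, b)] = shared
    return edges
```

Other relationships from the description, such as proximity of jobs in time, could be added as further edge types in the same way.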