Search Results

Allinea DDT as a Parallel Debugging Alternative to Totalview

Description: Totalview, from the Etnus Corporation, is a sophisticated, feature-rich software debugger for parallel applications. As Totalview has gained popularity and market share, its price has risen to the point where it is often prohibitively expensive for massively parallel supercomputers, and many of Totalview's advanced features go unused by the scientific computing community. For these reasons, supercomputing centers have begun searching for a basic parallel debugging tool that can serve as a viable alternative to Totalview. DDT (Distributed Debugging Tool) from Allinea Software is a relatively new parallel debugging tool which aims to provide much of the same functionality as Totalview. This review outlines the basic features and limitations of DDT to determine whether it can be a reasonable substitute for Totalview. DDT was tested on the NERSC platforms Bassi, Seaborg, Jacquard, and Davinci with Fortran 90, C, and C++ codes using MPI and OpenMP for parallelism.
Date: March 5, 2007
Creator: Antypas, K.B.
Partner: UNT Libraries Government Documents Department

BER Science Network Requirements Workshop -- July 26-27,2007

Description: The Energy Sciences Network (ESnet) is the primary provider of network connectivity for the US Department of Energy Office of Science, the single largest supporter of basic research in the physical sciences in the United States of America. In support of the Office of Science programs, ESnet regularly updates and refreshes its understanding of the networking requirements of the instruments, facilities, scientists, and science programs that it serves. This focus has helped ESnet to be a highly successful enabler of scientific discovery for over 20 years. In July 2007, ESnet and the Biological and Environmental Research (BER) Program Office of the DOE Office of Science organized a workshop to characterize the networking requirements of the science programs funded by the BER Program Office. These included several large programs and facilities: the Atmospheric Radiation Measurement (ARM) Program and the ARM Climate Research Facility (ACRF), the Bioinformatics and Life Sciences programs, the Climate Sciences programs, the Environmental Molecular Sciences Laboratory at PNNL, and the Joint Genome Institute (JGI). The National Center for Atmospheric Research (NCAR) also participated in the workshop and contributed a section to this report because a large distributed data repository for climate data will be established at NERSC, ORNL, and NCAR, which will have an effect on ESnet. Workshop participants were asked to codify their requirements in a 'case study' format, which summarizes the instruments and facilities necessary for the science and the process by which the science is done, with emphasis on the network services needed and the way in which the network is used. Participants were asked to consider three time scales in their case studies: the near term (immediately and up to 12 months in the future), the medium term (3-5 years in the future), and the long term (greater than 5 years in the future). ...
Date: February 1, 2008
Creator: Tierney, Brian L. & Dart, Eli
Partner: UNT Libraries Government Documents Department

RRS: Replica Registration Service for Data Grids

Description: Over the last few years various scientific experiments and Grid projects have developed different catalogs for keeping track of their data files. Some projects use specialized file catalogs, others use distributed replica catalogs to reference files at different locations. Due to this diversity of catalogs, it is very hard to manage files across Grid projects, or to replace one catalog with another. In this paper we introduce a new Grid service called the Replica Registration Service (RRS). It can be thought of as an abstraction of the concepts for registering files and their replicas. In addition to traditional single file registration operations, the RRS supports collective file registration requests and keeps persistent registration queues. This approach is of particular importance for large-scale usage where thousands of files are copied and registered. Moreover, the RRS supports a set of error directives that are triggered in case of registration failures. Our goal is to provide a single uniform interface for various file catalogs to support the registration of files across multiple Grid projects, and to make Grid clients oblivious to the specific catalog used.
Date: July 15, 2005
Creator: Shoshani, Arie; Sim, Alex & Stockinger, Kurt
Partner: UNT Libraries Government Documents Department
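
The abstract describes the RRS at the interface level only. As a rough illustration of the collective-registration and error-directive ideas, a client-side sketch might look like the following; every class, method, and enum name here is invented for illustration and is not the actual RRS API from the paper.

```python
# Hypothetical sketch of an RRS-style interface (names are illustrative).
from dataclasses import dataclass, field
from enum import Enum

class OnError(Enum):          # error directives for registration failures
    ABORT = "abort"           # stop the whole collective request
    SKIP = "skip"             # drop the failed entry and continue
    RETRY = "retry"           # park the entry in the persistent queue

@dataclass
class ReplicaRegistrationService:
    catalog: dict = field(default_factory=dict)  # logical name -> replica URLs
    queue: list = field(default_factory=list)    # persistent registration queue

    def register(self, name, url):
        """Traditional single-file registration."""
        self.catalog.setdefault(name, []).append(url)

    def register_many(self, entries, on_error=OnError.RETRY):
        """Collective registration of (name, url) pairs."""
        for name, url in entries:
            try:
                self.register(name, url)
            except Exception:
                if on_error is OnError.ABORT:
                    raise
                if on_error is OnError.RETRY:
                    self.queue.append((name, url))  # retried later
```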

Securing Resources in Collaborative Environments: A Peer-to-peerApproach

Description: We have developed a security model that facilitates control of resources by autonomous peers who act on behalf of collaborating users. This model allows a gradual build-up of trust: it enables secure interactions among users that do not necessarily know each other and allows them to build trust over the course of their collaboration. This paper describes various aspects of our security model and presents an architecture that implements it to provide security in pure peer-to-peer environments.
Date: September 19, 2005
Creator: Berket, Karlo; Essiari, Abdelilah & Thompson, Mary R.
Partner: UNT Libraries Government Documents Department

Shape Optimization of Swimming Sheets

Description: The swimming behavior of a flexible sheet which moves by propagating deformation waves along its body was first studied by G. I. Taylor in 1951. In addition to being of theoretical interest, this problem serves as a useful model of the locomotion of gastropods and various micro-organisms. Although the mechanics of swimming via wave propagation has been studied extensively, relatively little work has been done to define or describe optimal swimming by this mechanism. We take up that question for a sheet that is separated from a rigid substrate by a thin film of viscous Newtonian fluid. Using a lubrication approximation to model the dynamics, we derive the relevant Euler-Lagrange equations to optimize swimming speed and efficiency. The optimization equations are solved numerically using two different schemes: a limited memory BFGS method that uses cubic splines to represent the wave profile, and a multi-shooting Runge-Kutta approach that uses the Levenberg-Marquardt method to vary the parameters of the equations until the constraints are satisfied. The former approach is less efficient but generalizes nicely to the non-lubrication setting. For each optimization problem we obtain a one-parameter family of solutions that becomes singular in a self-similar fashion as the parameter approaches a critical value. We explore the validity of the lubrication approximation near this singular limit by monitoring higher-order corrections to the zeroth-order theory and by comparing the results with finite element solutions of the full Stokes equations.
Date: March 1, 2005
Creator: Wilkening, J. & Hosoi, A.E.
Partner: UNT Libraries Government Documents Department
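
As a minimal sketch of the first numerical scheme mentioned in the abstract (a cubic-spline wave profile driven by a limited-memory BFGS method), the fragment below optimizes a periodic profile with SciPy. The objective is a toy stand-in with a well-defined minimum, not the paper's lubrication-theory speed or efficiency functional.

```python
# Spline-parameterized profile optimized with L-BFGS-B (toy objective).
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import minimize

n = 8
s_ctrl = np.linspace(0.0, 1.0, n + 1)       # control points; last closes the period
s_fine = np.linspace(0.0, 1.0, 200)         # quadrature grid along the sheet

def objective(h_free):
    h_ctrl = np.append(h_free, h_free[0])   # enforce periodicity of the profile
    h = CubicSpline(s_ctrl, h_ctrl, bc_type='periodic')(s_fine)
    # Toy stand-in functional (NOT the paper's): penalize steep profiles
    # and deviation of the mean gap from 1.
    return np.mean(np.gradient(h, s_fine) ** 2) + (np.mean(h) - 1.0) ** 2

h0 = 1.0 + 0.1 * np.sin(2 * np.pi * np.arange(n) / n)   # initial wave profile
res = minimize(objective, h0, method='L-BFGS-B')
print(res.x)
```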

A survey of codes and algorithms used in NERSC material science allocations

Description: We have carried out a survey of codes and algorithms used on NERSC computers within the science category of material science. This is part of the effort to track the usage of different algorithms in the NERSC community. The survey is based on the data provided in the ERCAP applications of FY06. To estimate the usage of each code within one account, we multiplied the total high performance computer (HPC) time allocation (MPP hours) of that account by the percentage usage of the code as estimated by the users in the ERCAP application. This is not the actual usage time, but it should be a good estimate of it, and it represents the intention of the users.
Date: June 1, 2006
Creator: Wang, Lin-Wang
Partner: UNT Libraries Government Documents Department
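
The estimate described in the abstract is a simple weighted sum; a sketch of the arithmetic, with made-up data values purely for illustration:

```python
# Estimated MPP hours per code = account allocation x user-reported fraction.
allocations = {"acct_a": 500_000, "acct_b": 120_000}   # MPP hours (FY06 ERCAP)
code_share = {                                          # user-reported fractions
    "acct_a": {"VASP": 0.6, "LAMMPS": 0.4},
    "acct_b": {"VASP": 1.0},
}

usage = {}
for acct, hours in allocations.items():
    for code, frac in code_share[acct].items():
        usage[code] = usage.get(code, 0.0) + hours * frac

print(usage)   # estimated intended MPP hours per code
```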

The design and implementation of Berkeley Lab's Linux checkpoint/restart

Description: This paper describes Berkeley Linux Checkpoint/Restart (BLCR), a Linux kernel module that allows system-level checkpoints on a variety of Linux systems. BLCR can be used either as a stand-alone system for checkpointing applications on a single machine, or as a component used by a scheduling system or parallel communication library for checkpointing and restarting parallel jobs running on multiple machines. Integration with Message Passing Interface (MPI) and other parallel systems is described.
Date: April 30, 2005
Creator: Duell, Jason
Partner: UNT Libraries Government Documents Department
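
For the stand-alone single-machine case, a sketch of driving BLCR's command-line utilities (cr_run, cr_checkpoint, cr_restart) from a script; exact paths and flags may vary by BLCR version, so consult your installation's man pages.

```python
# Checkpoint and restart an application under BLCR (assumed utility names).
import subprocess

# Launch the application under checkpoint control.
proc = subprocess.Popen(["cr_run", "./my_app"])

# ... later: write a checkpoint of the running process to a context file.
subprocess.run(["cr_checkpoint", "--file", "ctx.%d" % proc.pid, str(proc.pid)],
               check=True)

# ... after a failure or migration: restart from the saved context file.
subprocess.run(["cr_restart", "ctx.%d" % proc.pid], check=True)
```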

Electronic structure of Calcium hexaborides

Description: We present a theoretical study of crystal and electronic structures of CaB6 within a screened-exchange local density approximation (sX-LDA). Our ab initio total energy calculations show that CaB6 is a semiconductor with a gap of >1.2 eV, in agreement with recent experimental observations. We show a very sensitive band gap dependence on the crystal internal parameter, which might partially explain the scatter of previous theoretical results. Our calculation demonstrates that it is essential to study this system simultaneously for both crystal structures and electronic properties, and that the sX-LDA provides an ideal method for this problem.
Date: June 15, 2005
Creator: Lee, Byounghak & Wang, Lin-Wang
Partner: UNT Libraries Government Documents Department

An embedded boundary method for viscous, conducting compressible flow

Description: The evolution of an Inertial Fusion Energy (IFE) chamber involves a repetition of short, intense depositions of energy (from target ignition) into a reaction chamber, followed by the turbulent relaxation of that energy through shock waves and thermal conduction to the vessel walls. We present an algorithm for 2D simulations of the fluid inside an IFE chamber between fueling repetitions. Our finite-volume discretization for the Navier-Stokes equations incorporates a Cartesian grid treatment for irregularly-shaped domain boundaries. The discrete conservative update is based on a time-explicit Godunov method for advection, and a two-stage Runge-Kutta update for diffusion accommodating state-dependent transport properties. Conservation is enforced on cut cells along the embedded boundary interface using a local redistribution scheme so that the explicit time step for the combined approach is governed by the mesh spacing in the uniform grid. The test problems demonstrate second-order convergence of the algorithm on smooth solution profiles, and the robust treatment of discontinuous initial data in an IFE-relevant vessel geometry.
Date: October 20, 2004
Creator: Dragojlovic, Zoran; Najmabadi, Farrokh & Day, Marcus
Partner: UNT Libraries Government Documents Department
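
A much-simplified 1D analogue of the update structure described in the abstract: explicit first-order Godunov (upwind) advection plus a two-stage Runge-Kutta diffusion update with a state-dependent coefficient. The paper's 2D cut-cell and redistribution machinery is not shown; the problem setup here is invented for illustration.

```python
import numpy as np

nx, L, a = 200, 1.0, 1.0                     # cells, domain length, advection speed
dx = L / nx
x = np.linspace(0.0, L, nx)
u = np.where(np.abs(x - 0.3) < 0.1, 1.0, 0.0)    # discontinuous initial data
nu = lambda u: 1e-3 * (1.0 + u)              # state-dependent diffusivity
dt = 0.4 * min(dx / a, dx * dx / (2 * nu(u).max()))   # explicit stability limit

def rhs_adv(u):                              # first-order Godunov (upwind), periodic
    return -a * (u - np.roll(u, 1)) / dx

def rhs_diff(u):                             # conservative diffusion, face-centered nu
    nu_face = 0.5 * (nu(u) + nu(np.roll(u, 1)))        # nu at left faces
    flux = nu_face * (u - np.roll(u, 1)) / dx          # nu * du/dx at faces
    return (np.roll(flux, -1) - flux) / dx

for _ in range(500):                         # two-stage Runge-Kutta (Heun) update
    f0 = rhs_adv(u) + rhs_diff(u)
    u1 = u + dt * f0
    u = u + 0.5 * dt * (f0 + rhs_adv(u1) + rhs_diff(u1))
```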

FastBit Reference Manual

Description: An index in a database system is a data structure that utilizes redundant information about the base data to speed up common searching and retrieval operations. Most commonly used indexes are variants of B-trees, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations by sacrificing the efficiency of updating the indexes after the modification of an individual record. In addition to the well-known strengths of bitmap indexes, FastBit has a special strength stemming from the bitmap compression scheme used. The compression method is called the Word-Aligned Hybrid (WAH) code. It reduces the bitmap indexes to reasonable sizes and at the same time allows very efficient bitwise logical operations directly on the compressed bitmaps. Compared with well-known compression methods such as LZ77 and Byte-aligned Bitmap code (BBC), WAH sacrifices some space efficiency for a significant improvement in operational efficiency. Since the bitwise logical operations are the most important operations needed to answer queries, using WAH compression has been shown to answer queries significantly faster than using other compression schemes. Theoretical analyses showed that WAH compressed bitmap indexes are optimal for one-dimensional range queries. Only the most efficient indexing schemes such as B+-tree and B*-tree have this optimality property. However, bitmap indexes are superior because they can efficiently answer multi-dimensional range queries by combining the answers to one-dimensional queries.
Date: August 2, 2007
Creator: Wu, Kesheng
Partner: UNT Libraries Government Documents Department
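
A simplified illustration of the WAH idea (31 payload bits per 32-bit word): runs of all-0 or all-1 groups collapse into "fill" words, and everything else is stored as "literal" words. Real FastBit/WAH packs these into actual machine words and performs logical operations directly on the compressed form; this sketch only shows the encoding concept.

```python
# WAH-style encoding of a bit string into fill and literal "words".
def wah_encode(bits):
    words = []
    for i in range(0, len(bits), 31):                 # split into 31-bit groups
        g = bits[i:i + 31].ljust(31, '0')
        if g == '0' * 31 or g == '1' * 31:            # a compressible group
            fill_bit = g[0]
            if (words and words[-1][0] == 'fill'
                    and words[-1][1] == fill_bit):
                words[-1] = ('fill', fill_bit, words[-1][2] + 1)  # extend run
            else:
                words.append(('fill', fill_bit, 1))
        else:
            words.append(('literal', g))              # stored verbatim
    return words

# A sparse bitmap: 93 zeros compress to a single fill word of length 3.
print(wah_encode('0' * 93 + '1' + '0' * 30))
```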

From Self-consistency to SOAR: Solving Large Scale Nonlinear Eigenvalue Problems

Description: What is common among electronic structure calculation, design of MEMS devices, vibrational analysis of high speed railways, and simulation of the electromagnetic field of a particle accelerator? The answer: they all require solving large scale nonlinear eigenvalue problems. In fact, these are just a handful of examples in which solving nonlinear eigenvalue problems accurately and efficiently is becoming increasingly important. Recognizing the importance of this class of problems, an invited minisymposium dedicated to nonlinear eigenvalue problems was held at the 2005 SIAM Annual Meeting. The purpose of the minisymposium was to bring together numerical analysts and application scientists to showcase some of the cutting edge results from both communities and to discuss the challenges they are still facing. The minisymposium consisted of eight talks divided into two sessions. The first three talks focused on a type of nonlinear eigenvalue problem arising from electronic structure calculations. In this type of problem, the matrix Hamiltonian H depends, in a non-trivial way, on the set of eigenvectors X to be computed. The invariant subspace spanned by these eigenvectors also minimizes a total energy function that is highly nonlinear with respect to X on a manifold defined by a set of orthonormality constraints. In other applications, the nonlinearity of the matrix eigenvalue problem is restricted to the dependency of the matrix on the eigenvalues to be computed. These problems are often called polynomial or rational eigenvalue problems. In the second session, Christian Mehl from the Technical University of Berlin described numerical techniques for solving a special type of polynomial eigenvalue problem arising from vibration analysis of rail tracks excited by high-speed trains.
Date: February 1, 2006
Creator: Bai, Zhaojun & Yang, Chao
Partner: UNT Libraries Government Documents Department
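
The electronic-structure problems described above are typically attacked with a self-consistent field (SCF) fixed-point iteration: build the Hamiltonian from the current eigenvectors, rediagonalize, and repeat until the invariant subspace stops changing. A minimal sketch follows; H(X) below is a toy nonlinear dependence, not a real electronic-structure Hamiltonian.

```python
# SCF fixed-point iteration for H(X) X = X Lambda (toy model).
import numpy as np

n, k = 50, 4                                   # matrix size, wanted states
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # fixed part of H

def H(X):
    D = (X ** 2).sum(axis=1)                   # toy "density" built from X
    return A + np.diag(D)                      # toy nonlinear dependence on X

X = np.linalg.eigh(A)[1][:, :k]                # initial guess: linear problem
for it in range(100):
    w, V = np.linalg.eigh(H(X))                # rediagonalize current H
    X_new = V[:, :k]                           # lowest k eigenvectors
    if np.linalg.norm(X_new @ X_new.T - X @ X.T) < 1e-8:
        break                                  # invariant subspace converged
    X = X_new
```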

Minimizing I/O Costs of Multi-Dimensional Queries with Bitmap Indices

Description: Bitmap indices have been widely used in scientific applications and commercial systems for processing complex, multi-dimensional queries where traditional tree-based indices would not work efficiently. A common approach for reducing the size of a bitmap index for high cardinality attributes is to group ranges of values of an attribute into bins and then build a bitmap for each bin rather than a bitmap for each value of the attribute. Binning reduces storage costs; however, queries evaluated against bins often require additional filtering to discard false positives, i.e., records in the result that do not satisfy the query constraints. This additional filtering, also known as ''candidate checking,'' requires access to the base data on disk and involves significant I/O costs. This paper studies strategies for minimizing the I/O costs of ''candidate checking'' for multi-dimensional queries. This is done by determining the number of bins allocated for each dimension and then placing bin boundaries in optimal locations. Our algorithms use knowledge of data distribution and query workload. We derive several analytical results concerning optimal bin allocation for a probabilistic query model. Our experimental evaluation with real life data shows an average I/O cost improvement of at least a factor of 10 for multi-dimensional queries on datasets from two different applications. Our experiments also indicate that the speedup increases with the number of query dimensions.
Date: March 30, 2006
Creator: Rotem, Doron; Stockinger, Kurt & Wu, Kesheng
Partner: UNT Libraries Government Documents Department
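
A sketch of the binned-bitmap evaluation and candidate check described above, for a one-dimensional range query "lo <= x < hi": bins fully inside the range are resolved from the index alone, while records in the two boundary bins are candidates that must be checked against the base data. The data and bin layout are invented for illustration.

```python
import numpy as np

data = np.random.default_rng(1).uniform(0, 100, 10_000)   # base data on "disk"
edges = np.linspace(0, 100, 11)                           # 10 equal-width bins
bin_of = np.digitize(data, edges) - 1
bitmaps = [bin_of == b for b in range(10)]                # one bitmap per bin

lo, hi = 23.0, 61.0
inner = [b for b in range(10) if edges[b] >= lo and edges[b + 1] <= hi]
edge_bins = [b for b in range(10)
             if b not in inner and edges[b + 1] > lo and edges[b] < hi]

hits = np.zeros(len(data), bool)
for b in inner:                      # resolved from the index alone
    hits |= bitmaps[b]
for b in edge_bins:                  # candidate check: must touch base data
    hits |= bitmaps[b] & (data >= lo) & (data < hi)
```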

NERSC Annual Report 2005

Description: The National Energy Research Scientific Computing Center (NERSC) is the premier computational resource for scientific research funded by the DOE Office of Science. The Annual Report includes summaries of recent significant and representative computational science projects conducted on NERSC systems as well as information about NERSC's current and planned systems and services.
Date: July 31, 2006
Creator: Hules (Ed.), John
Partner: UNT Libraries Government Documents Department

Optimizing Candidate Check Costs for Bitmap Indices

Description: In this paper, we propose a new strategy for optimizing the placement of bin boundaries to minimize the cost of query evaluation using bitmap indices with binning. For attributes with a large number of distinct values, often the most efficient index scheme is a bitmap index with binning. However, this type of index may not be able to fully resolve some user queries. To fully resolve these queries, one has to access parts of the original data to check whether certain candidate records actually satisfy the specified conditions. We call this procedure the candidate check, which usually dominates the total query processing time. Given a set of user queries, we seek to minimize the total time required to answer the queries by optimally placing the bin boundaries. We show that our dynamic programming based algorithm can efficiently determine the bin boundaries. We verify our analysis with real user queries from the Sloan Digital Sky Survey. For queries that require a significant amount of time to perform the candidate check, using our optimal bin boundaries reduces the candidate check time by a factor of 2 and the total query processing time by 40 percent.
Date: July 10, 2005
Creator: Rotem, Doron; Stockinger, Kurt & Wu, Kesheng
Partner: UNT Libraries Government Documents Department
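
A generic sketch of the dynamic-programming structure for boundary placement: partition the sorted values into k contiguous bins so as to minimize a per-bin cost. The stand-in cost below charges a bin by its record count times its value spread, a rough proxy for candidate-check work; the paper derives the real cost model from the query workload.

```python
# Interval-partition DP: choose k-1 interior bin boundaries minimizing cost.
import numpy as np

def bin_cost(v, i, j):
    """Stand-in cost of one bin covering sorted values v[i:j]."""
    return (j - i) * (v[j - 1] - v[i])

def optimal_bins(v, k):
    n, INF = len(v), float('inf')
    dp = [[INF] * (k + 1) for _ in range(n + 1)]   # dp[j][b]: first j values, b bins
    cut = [[0] * (k + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for b in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(b - 1, j):              # last bin is v[i:j]
                c = dp[i][b - 1] + bin_cost(v, i, j)
                if c < dp[j][b]:
                    dp[j][b], cut[j][b] = c, i
    bounds, j = [], n                              # walk back to recover cuts
    for b in range(k, 0, -1):
        j = cut[j][b]
        bounds.append(j)
    return sorted(bounds[:-1])                     # k-1 interior boundary indices

v = np.sort(np.random.default_rng(2).normal(size=200))
print(optimal_bins(v, 5))
```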

Optimizing connected component labeling algorithms

Description: This paper presents two new strategies that can be used to greatly improve the speed of connected component labeling algorithms. To assign a label to a new object, most connected component labeling algorithms use a scanning step that examines some of its neighbors. The first strategy exploits the dependencies among them to reduce the number of neighbors examined. When considering 8-connected components in a 2D image, this can reduce the number of neighbors examined from four to one in many cases. The second strategy uses an array to store the equivalence information among the labels. This replaces the pointer based rooted trees used to store the same equivalence information. It reduces the memory required and also produces consecutive final labels. Using an array instead of the pointer based rooted trees speeds up the connected component labeling algorithms by a factor of 5 to 100 in our tests on random binary images.
Date: January 16, 2005
Creator: Wu, Kesheng; Otoo, Ekow & Shoshani, Arie
Partner: UNT Libraries Government Documents Department
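
A sketch of the array-based equivalence idea: provisional labels are merged through a flat integer array rather than pointer-based rooted trees, and a final pass assigns consecutive labels. For brevity this sketch uses 4-connectivity rather than the paper's 8-connectivity.

```python
# Two-pass connected component labeling with an array-based union-find.
import numpy as np

def label(img):
    parent = [0]                              # parent[l]: representative of label l
    def find(l):
        while parent[l] != l:
            l = parent[l]
        return l
    lab = np.zeros(img.shape, int)
    for y, x in np.ndindex(*img.shape):       # first pass: provisional labels
        if not img[y, x]:
            continue
        up = lab[y - 1, x] if y else 0
        left = lab[y, x - 1] if x else 0
        if up and left:
            a, b = find(up), find(left)
            parent[max(a, b)] = min(a, b)     # record equivalence in the array
            lab[y, x] = min(a, b)
        elif up or left:
            lab[y, x] = up or left
        else:
            parent.append(len(parent))        # new provisional label
            lab[y, x] = len(parent) - 1
    final, out = {0: 0}, np.zeros_like(lab)   # second pass: consecutive labels
    for y, x in np.ndindex(*img.shape):
        out[y, x] = final.setdefault(find(lab[y, x]), len(final))
    return out

img = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 0]], bool)
print(label(img))
```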

GridRun: A lightweight packaging and execution environment for compact, multi-architecture binaries

Description: GridRun offers a very simple set of tools for creating and executing multi-platform binary executables. These ''fat binaries'' archive native machine code into compact packages that are typically a fraction of the size of the original binary images they store, enabling efficient staging of executables for heterogeneous parallel jobs. GridRun interoperates with existing distributed job launchers/managers like Condor and the Globus GRAM to greatly simplify the logic required to launch native binary applications in distributed heterogeneous environments.
Date: February 1, 2004
Creator: Shalf, John & Goodale, Tom
Partner: UNT Libraries Government Documents Department
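
An illustrative sketch of the fat-binary idea: several native builds of the same program packed into one compressed archive, with a launcher extracting and executing the member that matches the local platform. The archive layout and helper names here are invented; the actual GridRun tools and format are not shown.

```python
import os, platform, stat, subprocess, tarfile

def pack(builds, out="app.fat.tar.gz"):
    """builds: e.g. {'linux-x86_64': 'build1/app', 'linux-ppc64': 'build2/app'}"""
    with tarfile.open(out, "w:gz") as tar:       # compression keeps the bundle small
        for arch, path in builds.items():
            tar.add(path, arcname=arch + "/app")
    return out

def run_native(archive, args=()):
    arch = "%s-%s" % (platform.system().lower(), platform.machine())
    with tarfile.open(archive) as tar:           # pull out only the local build
        tar.extract(arch + "/app", path=".gridrun")
    exe = os.path.join(".gridrun", arch, "app")
    os.chmod(exe, os.stat(exe).st_mode | stat.S_IXUSR)
    subprocess.run([exe, *args], check=True)
```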

The Inhibiting Bisection Problem

Description: Given a graph where each vertex is assigned a generation orconsumption volume, we try to bisect the graph so that each part has asignificant generation/consumption mismatch, and the cutsize of thebisection is small. Our motivation comes from the vulnerability analysisof distribution systems such as the electric power system. We show thatthe constrained version of the problem, where we place either the cutsizeor the mismatch significance as a constraint and optimize the other, isNP-complete, and provide an integer programming formulation. We alsopropose an alternative relaxed formulation, which can trade-off betweenthe two objectives and show that the alternative formulation of theproblem can be solved in polynomial time by a maximum flow solver. Ourexperiments with benchmark electric power systems validate theeffectiveness of our methods.
Date: December 18, 2006
Creator: Pinar, Ali; Fogel, Yonatan & Lesieutre, Bernard
Partner: UNT Libraries Government Documents Department

Interactive, Internet Delivery of Scientific Visualization via Structured, Prerendered Multiresolution Imagery

Description: We present a novel approach for highly interactive remote delivery of visualization results. Instead of rendering in real time across the internet, our approach, inspired by QuickTime VR's Object Movie concept, delivers pre-rendered images corresponding to different viewpoints and different time steps to provide the experience of 3D and temporal navigation. We use tiled, multiresolution image streaming to consume minimum bandwidth while providing the maximum resolution that a user can perceive from a given viewpoint. Since image data, a viewpoint, and time stamps are the only required inputs, our approach is generally applicable to all visualization and graphics rendering applications capable of generating image files in an ordered fashion. Our design is a form of latency-tolerant remote visualization, where visualization and rendering time is effectively decoupled from interactive exploration. Our approach trades unconstrained exploration for increased interactivity, flexible resolution (for individual clients), reduced server load, and effective reuse of coherent frames among multiple users. A normal web server is the vehicle for providing on-demand images to the remote client application, which uses client-pull to obtain and cache only those images required to fulfill the interaction needs. This paper presents an architectural description of the system along with a performance characterization of the production, delivery, and viewing stages of the pipeline.
Date: April 20, 2005
Creator: Chen, Jerry; Yoon, Ilmi & Bethel, E. Wes
Partner: UNT Libraries Government Documents Department
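
A sketch of the client-pull pattern described above: because tile addresses are pure functions of (viewpoint, timestep, resolution level, tile), an ordinary web server can serve them and the client caches whatever it has already fetched. The URL scheme and server name here are hypothetical.

```python
# On-demand tile fetching with a local cache (hypothetical URL layout).
import urllib.request

BASE = "http://example.org/vis"      # plain web server with prerendered tiles
cache = {}                           # (view, t, level, tile) -> image bytes

def get_tile(view, t, level, tile):
    key = (view, t, level, tile)
    if key not in cache:             # fetch once, then reuse locally
        url = "%s/%03d/%04d/L%d/%d_%d.jpg" % (BASE, view, t, level, *tile)
        with urllib.request.urlopen(url) as r:
            cache[key] = r.read()
    return cache[key]

# Interactive rotation only pulls tiles for new viewpoints; revisited
# viewpoints are served from the cache at zero network cost.
```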

Towards Optimal Multi-Dimensional Query Processing with Bitmap Indices

Description: Bitmap indices have been widely used in scientific applications and commercial systems for processing complex, multi-dimensional queries where traditional tree-based indices would not work efficiently. This paper studies strategies for minimizing the access costs for processing multi-dimensional queries using bitmap indices with binning. Innovative features of our algorithm include (a) optimally placing the bin boundaries and (b) dynamically reordering the evaluation of the query terms. In addition, we derive several analytical results concerning optimal bin allocation for a probabilistic query model. Our experimental evaluation with real life data shows an average I/O cost improvement of at least a factor of 10 for multi-dimensional queries on datasets from two different applications. Our experiments also indicate that the speedup increases with the number of query dimensions.
Date: September 30, 2005
Creator: Rotem, Doron; Stockinger, Kurt & Wu, Kesheng
Partner: UNT Libraries Government Documents Department

Towards Ultra-High Resolution Models of Climate and Weather

Description: We present a speculative extrapolation of the performance aspects of an atmospheric general circulation model to ultra-high resolution and describe alternative technological paths to realize the integration of such a model in the relatively near future. Due to the superlinear scaling of the computational burden dictated by the stability criterion, the solution of the equations of motion dominates the calculation at ultra-high resolutions. From this extrapolation, it is estimated that a credible kilometer-scale atmospheric model would require a computer sustaining at least ten petaflops to provide scientifically useful climate simulations. Our design study suggests an alternative strategy for practical, power-efficient implementations of petaflop-scale systems: embedded processor technology could be exploited to tailor a custom machine to ultra-high resolution climate model specifications at relatively affordable cost and power. The major conceptual changes required by a kilometer-scale climate model are certain to be difficult to implement. Although the hardware, software, and algorithms are all equally critical in conducting ultra-high resolution climate studies, it is likely that the necessary petaflop computing technology will be available in advance of a credible kilometer-scale climate model.
Date: January 1, 2007
Creator: Wehner, Michael; Oliker, Leonid & Shalf, John
Partner: UNT Libraries Government Documents Department
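
The superlinear scaling noted in the abstract has a simple back-of-envelope form: refining the horizontal grid by a factor r multiplies the number of columns by roughly r squared and, via the CFL stability criterion, also shrinks the time step by a factor of r, so cost grows roughly as r cubed. The numbers below are illustrative, not the paper's.

```python
# Rough CFL-driven cost scaling with horizontal refinement factor r.
def relative_cost(r):
    return r ** 2 * r        # more grid columns x proportionally more time steps

for r in (1, 10, 100):       # e.g. 100 km -> 10 km -> 1 km grid spacing
    print("refine %4dx -> cost %12dx" % (r, relative_cost(r)))
```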

Two Strategies to Speed up Connected Component Labeling Algorithms

Description: This paper presents two new strategies to speed up connected component labeling algorithms. The first strategy employs a decision tree to minimize the work performed in the scanning phase of connected component labeling algorithms. The second strategy uses a simplified union-find data structure to represent the equivalence information among the labels. For 8-connected components in a two-dimensional (2D) image, the first strategy reduces the number of neighboring pixels visited from 4 to 7/3 on average. In various tests, using a decision tree decreases the scanning time by a factor of about 2. The second strategy uses a compact representation of the union-find data structure. This strategy significantly speeds up the labeling algorithms. We prove analytically that a labeling algorithm with our simplified union-find structure has the same optimal theoretical time complexity as do the best labeling algorithms. By extensive experimental measurements, we confirm the expected performance characteristics of the new labeling algorithms and demonstrate that they are faster than other optimal labeling algorithms.
Date: November 13, 2005
Creator: Wu, Kesheng; Otoo, Ekow & Suzuki, Kenji
Partner: UNT Libraries Government Documents Department
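
A condensed sketch of the decision-tree scan step for 8-connectivity. The key observation: the NW, NE, and W neighbors are each adjacent to the N neighbor, so if N is labeled it alone decides the case; more neighbors are examined only when N is background. The union and new_label callbacks stand in for the surrounding labeling pass and are assumptions of this sketch, not the paper's exact code.

```python
# Decision-tree label assignment for one pixel; neighbors: a=NW, b=N,
# c=NE, d=W (0 means background/unlabeled).
def scan_step(a, b, c, d, new_label, union):
    if b:                 # one lookup suffices: a, c, d already merged with b
        return b
    if c:                 # c is not adjacent to a or d: a union may be needed
        if a:
            return union(c, a)
        if d:
            return union(c, d)
        return c
    if a:                 # a and d are adjacent, so either one decides alone
        return a
    if d:
        return d
    return new_label()    # no labeled neighbor: create a provisional label
```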

Unsymmetric ordering using a constrained Markowitz scheme

Description: We present a family of ordering algorithms that can be used as a preprocessing step prior to performing sparse LU factorization. The ordering algorithms simultaneously achieve the objectives of selecting numerically good pivots and preserving the sparsity. We describe the algorithmic properties and challenges in their implementation. By mixing the two objectives, we show that we can reduce the amount of fill-in in the factors and reduce the number of numerical problems during factorization. On a set of large unsymmetric real problems, we obtained median reductions of 12% in the factorization time, 13% in the size of the LU factors, 20% in the number of operations performed during the factorization phase, and 11% in the memory needed by the multifrontal solver MA41-UNS. A byproduct of this ordering strategy is an incomplete LU-factored matrix that can be used as a preconditioner in an iterative solver.
Date: January 18, 2005
Creator: Amestoy, Patrick R.; S., Xiaoye & Pralet, Stephane
Partner: UNT Libraries Government Documents Department
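
A sketch of one constrained-Markowitz pivot selection step, combining the two objectives named in the abstract: among entries large enough to pass a numerical threshold test, pick the one with the smallest Markowitz count (r_i - 1)(c_j - 1), which bounds the fill-in the elimination step can create. A real implementation works on sparse data structures; dense numpy is used here only for clarity, and the threshold rule is a common choice, not necessarily the paper's exact constraint.

```python
import numpy as np

def pick_pivot(A, tau=0.1):
    nz = A != 0
    r = nz.sum(axis=1)                       # nonzeros per row
    c = nz.sum(axis=0)                       # nonzeros per column
    best, best_cost = None, np.inf
    for i, j in zip(*np.nonzero(A)):
        if abs(A[i, j]) < tau * abs(A[:, j]).max():
            continue                         # fails the numerical constraint
        cost = (r[i] - 1) * (c[j] - 1)       # Markowitz count
        if cost < best_cost:
            best, best_cost = (i, j), cost
    return best

A = np.array([[4., 0., 1.], [2., 5., 0.], [0., 3., 6.]])
print(pick_pivot(A))                         # (row, column) of chosen pivot
```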

Using IOR to analyze the I/O Performance for HPC Platforms

Description: The HPC community is preparing to deploy petaflop-scale computing platforms that may include hundreds of thousands to millions of computational cores over the next 3 years. Such explosive growth in concurrency creates daunting challenges for the design and implementation of the I/O system. In this work, we first analyzed the I/O practices and requirements of current HPC applications and used them as criteria to select a subset of microbenchmarks that reflect the workload requirements. Our analysis led to the selection of IOR, an I/O benchmark developed by LLNL for the ASCI Purple procurement, as our tool for studying the I/O performance of two HPC platforms. We selected parameterizations for IOR that match the requirements of key I/O-intensive applications to assess its fidelity in reproducing their performance characteristics.
Date: June 8, 2007
Creator: Shan, Hongzhang & Shalf, John
Partner: UNT Libraries Government Documents Department
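
A sketch of parameterizing an IOR run to mimic an application's I/O pattern. Flag meanings follow IOR's commonly documented options, but versions differ, so verify against `ior -h` on your system; the launcher, task count, and file path here are illustrative.

```python
# Compose an IOR run: MPI-IO, one shared file, 4 MiB transfers,
# 64 MiB per task, write then read, 3 repetitions for stable numbers.
import subprocess

cmd = [
    "mpirun", "-np", "64", "ior",
    "-a", "MPIIO",          # I/O API under test (POSIX, MPIIO, HDF5, ...)
    "-b", "64m",            # per-task block size
    "-t", "4m",             # transfer (request) size
    "-i", "3",              # repetitions
    "-w", "-r",             # measure both write and read bandwidth
    "-o", "/scratch/ior.testfile",
]
subprocess.run(cmd, check=True)
```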