Search Results

Computations of Eigenpair Subsets with the MRRR Algorithm

Description: The main advantage of inverse iteration over the QR algorithm and Divide & Conquer for the symmetric tridiagonal eigenproblem is that subsets of eigenpairs can be computed at reduced cost. The MRRR algorithm (MRRR = Multiple Relatively Robust Representations) is a clever variant of inverse iteration without the need for reorthogonalization. STEGR, the current version of MRRR in LAPACK 3.0, does not allow for subset computations. The next release of STEGR is designed to compute a (sub-)set of k eigenpairs with O(kn) operations. Because of the special way in which eigenvectors are computed, MRRR subset computations are more complicated than when using inverse iteration. Unlike the latter, MRRR sometimes cannot ignore the unwanted part of the spectrum. We describe the problems with what we call 'false singletons'. These are eigenvalues that appear to be isolated with respect to the wanted eigenvalues but in fact belong to a tight cluster of unwanted eigenvalues. This paper analyzes these complications and ways to deal with them.
Date: June 6, 2006
Creator: Marques, Osni A.; Parlett, Beresford N. & Vomel, Christof
Partner: UNT Libraries Government Documents Department
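
As a rough illustration of subset computation for the symmetric tridiagonal eigenproblem, SciPy exposes the LAPACK tridiagonal drivers through scipy.linalg.eigh_tridiagonal, which can restrict the computation to an index range. This is a generic sketch of the subset interface, not the STEGR routine discussed in the abstract:

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

# Tridiagonal matrix: diagonal d, off-diagonal e (the discrete 1-D Laplacian).
n = 100
d = 2.0 * np.ones(n)
e = -1.0 * np.ones(n - 1)

# Compute only the k smallest eigenpairs (indices 0..k-1), not the full spectrum.
k = 5
w, v = eigh_tridiagonal(d, e, select='i', select_range=(0, k - 1))

print(w)        # the k smallest eigenvalues, in ascending order
print(v.shape)  # (n, k): one column per requested eigenvector
```

For this matrix the exact eigenvalues are 2 - 2 cos(j*pi/(n+1)), which makes the subset easy to check.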

cctbx news

Description: The 'Computational Crystallography Toolbox' (cctbx, http://cctbx.sourceforge.net/) is the open-source component of the Phenix project (http://www.phenix-online.org/). Most recent cctbx developments are geared towards supporting new features of the phenix.refine application. Thus, the open-source mmtbx (macromolecular toolbox) module is currently the most rapidly developed. In this article we give an overview of some of the recent developments. However, the main theme of this article is the presentation of a light-weight example command-line application that was specifically developed for this newsletter: sequence alignment and superposition of two molecules read from files in PDB format. This involves parameter input based on the Phil module presented in Newsletter No. 5, fast reading of the PDB files with the new iotbx.pdb.input class, simple sequence alignment using the new mmtbx.alignment module, and use of the Kearsley (1989) superposition algorithm to find the least-squares solution for superposing C-alpha positions. The major steps are introduced individually, followed by a presentation of the complete application. The example application is deliberately limited in functionality to make it concise enough for this article. The main goal is to show how the open-source components are typically combined into an application. Even though the example is quite specific to macromolecular crystallography, we believe it will also be useful for a small-molecule audience interested in utilizing the large open-source library of general crystallographic algorithms (see our previous articles in this newsletter series) to build an application. We describe recent developments of the Computational Crystallography Toolbox.
Date: November 22, 2006
Creator: Grosse-Kunstleve, Ralf W.; Zwart, Peter H.; Afonine, Pavel V.; Ioerger, Thomas R. & Adams, Paul D.
Partner: UNT Libraries Government Documents Department
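
The superposition step can be sketched with the SVD-based Kabsch method, a widely used alternative to the Kearsley (1989) quaternion formulation named above; both minimize the same least-squares criterion. This is a generic NumPy sketch, not the cctbx/mmtbx code:

```python
import numpy as np

def superpose(P, Q):
    """Least-squares superposition of point set P onto Q (Kabsch/SVD method).

    P, Q: (n, 3) arrays of corresponding coordinates (e.g. C-alpha positions).
    Returns (R, t, rmsd) such that P @ R.T + t best matches Q.
    """
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    P0, Q0 = P - Pc, Q - Qc                 # center both sets
    H = P0.T @ Q0                           # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = Qc - R @ Pc
    diff = (P @ R.T + t) - Q
    rmsd = np.sqrt((diff ** 2).sum() / len(P))
    return R, t, rmsd

# Example: recover a known rigid motion.
rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3))
theta = 0.7
R0 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ R0.T + np.array([1.0, -2.0, 3.0])
R, t, rmsd = superpose(P, Q)
print(rmsd)  # ~0 for an exact rigid motion
```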

Efficient parallel algorithms and data structures related to trees

Description: The main contribution of this dissertation is a new paradigm, called the parentheses matching paradigm, which we claim is well suited to designing efficient parallel algorithms for a broad class of nonnumeric problems. To demonstrate its applicability, we present three cost-optimal parallel algorithms: for breadth-first traversal of general trees, for sorting a special class of integers, and for coloring an interval graph with the minimum number of colors.
Date: December 1991
Creator: Chen, Calvin Ching-Yuen
Partner: UNT Libraries
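
For reference, the parentheses matching problem itself is the following; the dissertation's contribution is parallel, cost-optimal algorithms for it, while this sequential stack-based sketch only fixes what is being computed:

```python
def match_parentheses(s):
    """Return a dict mapping each '(' index to its matching ')' index."""
    stack, match = [], {}
    for i, ch in enumerate(s):
        if ch == '(':
            stack.append(i)            # remember the open position
        elif ch == ')':
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            match[stack.pop()] = i     # most recent open matches this close
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return match

print(match_parentheses("(()())"))  # {1: 2, 3: 4, 0: 5}
```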

Performance study of concurrent search trees and hash algorithms on multiprocessor systems

Description: This study examines the performance of concurrent algorithms for B-trees and linear hashing. B-trees are widely used as an access method for large, single key, database files, stored in lexicographic order on secondary storage devices. Linear hashing is a fast and reliable hash algorithm, suitable for accessing records stored unordered in buckets.
Date: May 1996
Creator: Demuynck, Marie-Anne
Partner: UNT Libraries
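
A minimal, illustrative linear hashing table follows (a hypothetical Python sketch, not the concurrent implementation studied in the thesis). It shows the incremental one-bucket-at-a-time splitting that distinguishes linear hashing from static hashing:

```python
class LinearHash:
    """Toy linear hashing: buckets split one at a time as the table grows."""

    def __init__(self, n0=4, max_load=0.75):
        self.n0, self.level, self.split = n0, 0, 0
        self.buckets = [[] for _ in range(n0)]
        self.count, self.max_load = 0, max_load

    def _addr(self, key):
        h = hash(key)
        a = h % (self.n0 << self.level)          # hash function h_level
        if a < self.split:                       # bucket already split:
            a = h % (self.n0 << (self.level + 1))  # use h_{level+1}
        return a

    def insert(self, key, val):
        self.buckets[self._addr(key)].append((key, val))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split_next()

    def _split_next(self):
        """Split the bucket at the 'next' pointer, not the one that overflowed."""
        old = self.split
        self.buckets.append([])
        items, self.buckets[old] = self.buckets[old], []
        self.split += 1
        if self.split == (self.n0 << self.level):  # one full round of splits
            self.level, self.split = self.level + 1, 0
        for k, v in items:                         # redistribute with h_{level+1}
            self.buckets[self._addr(k)].append((k, v))

    def get(self, key):
        for k, v in self.buckets[self._addr(key)]:
            if k == key:
                return v
        return None

ht = LinearHash()
for i in range(200):
    ht.insert(i, i * i)
print(ht.get(123))  # 15129
```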

SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems

Description: In this paper, we present the main algorithmic features in the software package SuperLU_DIST, a distributed-memory sparse direct solver for large sets of linear equations. We give in detail our parallelization strategies, with focus on scalability issues, and demonstrate the parallel performance and scalability on current machines. The solver is based on sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by the authors. The main advantage of static pivoting over classical partial pivoting is that it permits a priori determination of data structures and communication pattern for sparse Gaussian elimination, which makes it more scalable on distributed memory machines. Based on this a priori knowledge, we designed highly parallel and scalable algorithms for both LU decomposition and triangular solve and we show that they are suitable for large-scale distributed memory machines.
Date: March 27, 2002
Creator: Li, Xiaoye S. & Demmel, James W.
Partner: UNT Libraries Government Documents Department
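
SuperLU_DIST itself is a C/MPI library, but the sequential SuperLU on which the family is based is accessible from SciPy, which is enough to sketch the factor-then-solve workflow (this is the serial solver, not the distributed one described in the paper):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# A small sparse unsymmetric system A x = b.
A = sp.csc_matrix(np.array([[4.0, 1.0, 0.0],
                            [2.0, 5.0, 1.0],
                            [0.0, 3.0, 6.0]]))
b = np.array([1.0, 2.0, 3.0])

lu = splu(A)      # sparse LU factorization (SuperLU under the hood)
x = lu.solve(b)   # triangular solves reuse the factors, e.g. for many b's
print(np.allclose(A @ x, b))  # True
```

The point of a direct solver is visible here: the expensive factorization is done once, and each additional right-hand side costs only two triangular solves.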

Recent applications of bandpass filtering

Description: Bandpass filtering has been applied recently in two widely different seismic applications: by S.R. Taylor and A.A. Velasco in their source-path amplitude-correction (SPAC) algorithm, and by N.K. Yacoub in his maximum spectral energy algorithm for picking teleseismic P-wave arrival times. Though the displacement spectrum is the intermediate product in both cases, the filters and scaling corrections used to estimate it are entirely different. The author tested both and found that the scaling used by Taylor and Velasco worked in all cases tested whereas Yacoub's did not. The author also found that bandpass filtering as implemented by Taylor and Velasco does not work satisfactorily; however, the Gaussian filter used by Yacoub does work. The bandpass filter of Taylor and Velasco works satisfactorily when the results are centered in the band; however, a comb filter with the same number of poles and zeros as the bandpass used by Taylor and Velasco works better than the bandpass filter.
Date: March 15, 1999
Creator: Denny, M D
Partner: UNT Libraries Government Documents Department
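
A frequency-domain Gaussian filter of the general kind discussed above can be sketched in a few lines: multiply the spectrum by a Gaussian window centered on the frequency of interest. The center frequency and width below are arbitrary illustration values, not parameters from the paper:

```python
import numpy as np

def gaussian_bandpass(x, fs, f0, sigma):
    """Multiply the spectrum of x by a Gaussian centered at f0 (Hz)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    gain = np.exp(-((f - f0) ** 2) / (2.0 * sigma ** 2))
    return np.fft.irfft(X * gain, n=len(x))

fs = 100.0                    # sampling rate, Hz
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 20 * t)  # signal + "noise"
y = gaussian_bandpass(x, fs, f0=5.0, sigma=1.0)
# y is close to the 5 Hz component alone; the 20 Hz tone is suppressed.
```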

Some Software Implementations of the Functions Sine and Cosine

Description: We present several software implementations of the elementary functions sin and cos designed to fit a large class of machines. Implementation details are provided. We also provide a detailed error analysis that bounds the errors of these implementations, over the full range of input arguments, from 0.721 to 0.912 units in the last place. Tests performed on these codes give results that are consistent with the error bounds.
Date: April 1990
Creator: Tang, Ping Tak Peter
Partner: UNT Libraries Government Documents Department
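
The general shape of such an implementation, argument reduction to a small interval followed by polynomial evaluation, can be sketched as follows. This is a didactic double-precision sketch with plain Taylor terms and naive reduction; the paper's implementations use carefully designed polynomials and exact reduction to reach their 0.721-0.912 ulp bounds:

```python
import math

def my_sin(x):
    """sin(x) via reduction to |r| <= pi/4 plus a short polynomial.

    Illustrative only: inaccurate for very large |x|, and nowhere near
    the error bounds achieved by the implementations in the paper.
    """
    k = round(x / (math.pi / 2))     # nearest multiple of pi/2
    r = x - k * (math.pi / 2)        # reduced argument, |r| <= pi/4
    r2 = r * r
    # Taylor series for sin and cos on the reduced interval (nested form).
    s = r * (1 - r2/6 * (1 - r2/20 * (1 - r2/42 * (1 - r2/72 * (1 - r2/110)))))
    c = 1 - r2/2 * (1 - r2/12 * (1 - r2/30 * (1 - r2/56 * (1 - r2/90))))
    return (s, c, -s, -c)[k % 4]     # pick the quadrant

print(my_sin(1.0))  # compare with math.sin(1.0)
```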

Networks and Natural Language Processing

Description: Article discussing networks and natural language processing. The authors present some of the most successful graph-based representations and algorithms used in language processing and try to explain how and why they work.
Date: September 2008
Creator: Radev, Dragomir R. & Mihalcea, Rada, 1974-
Partner: UNT College of Engineering
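
A minimal example of the graph-based approach the authors survey: build a word co-occurrence graph and rank nodes with a PageRank-style iteration, in the spirit of TextRank. The tokens and parameters below are toy values, not the authors' systems:

```python
from collections import defaultdict

def cooccurrence_graph(tokens, window=2):
    """Undirected graph: edge between words appearing within `window` tokens."""
    adj = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:
                adj[w].add(tokens[j])
                adj[tokens[j]].add(w)
    return adj

def pagerank(adj, damping=0.85, iters=50):
    """Plain power-iteration PageRank over an undirected adjacency dict."""
    nodes = list(adj)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        score = {n: (1 - damping) / len(nodes) + damping * sum(
                     score[m] / len(adj[m]) for m in adj[n])
                 for n in nodes}
    return score

tokens = ("graph based ranking algorithms score graph nodes "
          "using graph structure alone").split()
scores = pagerank(cooccurrence_graph(tokens))
print(max(scores, key=scores.get))  # 'graph' dominates this toy graph
```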

Unsymmetric ordering using a constrained Markowitz scheme

Description: We present a family of ordering algorithms that can be used as a preprocessing step prior to performing sparse LU factorization. The ordering algorithms simultaneously achieve the objectives of selecting numerically good pivots and preserving the sparsity. We describe the algorithmic properties and challenges in their implementation. By mixing the two objectives we show that we can reduce the amount of fill-in in the factors and reduce the number of numerical problems during factorization. On a set of large unsymmetric real problems, we obtained the median reductions of 12% in the factorization time, of 13% in the size of the LU factors, of 20% in the number of operations performed during the factorization phase, and of 11% in the memory needed by the multifrontal solver MA41-UNS. A byproduct of this ordering strategy is an incomplete LU-factored matrix that can be used as a preconditioner in an iterative solver.
Date: January 18, 2005
Creator: Amestoy, Patrick R.; Li, Xiaoye S. & Pralet, Stephane
Partner: UNT Libraries Government Documents Department
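
For reference, the classical Markowitz criterion that this family of orderings constrains: among candidate nonzeros passing a numerical threshold, pick the entry minimizing (r_i - 1)(c_j - 1), where r_i and c_j are the nonzero counts of row i and column j. This is a dense toy sketch of one pivot selection, not MA41 or the authors' code:

```python
import numpy as np

def markowitz_pivot(A, tau=0.1):
    """Pick a pivot minimizing (r_i-1)(c_j-1) among entries with
    |a_ij| >= tau * max|a_*j| (a relaxed, threshold-style pivoting test)."""
    nz = A != 0
    r = nz.sum(axis=1)              # nonzeros per row
    c = nz.sum(axis=0)              # nonzeros per column
    best, best_cost = None, None
    for i, j in zip(*np.nonzero(A)):
        if abs(A[i, j]) < tau * np.abs(A[:, j]).max():
            continue                # numerically unacceptable pivot
        cost = (r[i] - 1) * (c[j] - 1)
        if best_cost is None or cost < best_cost:
            best, best_cost = (int(i), int(j)), cost
    return best

A = np.array([[1.0, 0.0, 0.0, 2.0],
              [0.0, 3.0, 0.0, 0.0],
              [0.0, 0.0, 4.0, 5.0],
              [6.0, 0.0, 0.0, 7.0]])
print(markowitz_pivot(A))  # (1, 1): a singleton row and column, zero fill-in
```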

Multithreading for Synchronization Tolerance in Matrix Factorization

Description: Physical constraints such as power, leakage and pin bandwidth are currently driving the HPC industry to produce systems with unprecedented levels of concurrency. In these parallel systems, synchronization and memory operations are becoming considerably more expensive than before. In this work we study parallel matrix factorization codes and conclude that they need to be re-engineered to avoid unnecessary (and expensive) synchronization. We propose the use of multithreading combined with intelligent schedulers and implement representative algorithms in this style. Our results indicate that this strategy can significantly outperform traditional codes.
Date: July 16, 2007
Creator: Buttari, Alfredo; Dongarra, Jack; Husbands, Parry; Kurzak, Jakub & Yelick, Katherine
Partner: UNT Libraries Government Documents Department
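
The dataflow idea behind such schedulers, running each tile operation as soon as its inputs are ready instead of joining at global barriers, can be sketched with a generic dependency-driven thread pool. This is a toy scheduler, not the authors' runtime:

```python
import concurrent.futures as cf

def run_dag(tasks):
    """tasks: name -> (set of dependency names, fn(results) -> value).

    Submits each task as soon as all of its dependencies have completed,
    so independent tasks overlap instead of waiting at a global barrier.
    Assumes the dependency graph is acyclic.
    """
    results, pending, running = {}, dict(tasks), {}
    with cf.ThreadPoolExecutor() as ex:
        while pending or running:
            for name in [n for n, (deps, _) in pending.items()
                         if deps <= set(results)]:
                deps, fn = pending.pop(name)
                running[ex.submit(fn, dict(results))] = name  # snapshot inputs
            done, _ = cf.wait(running, return_when=cf.FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results

# Diamond dependency graph: b and c run concurrently once a is done.
tasks = {
    "a": (set(), lambda r: 1),
    "b": ({"a"}, lambda r: r["a"] + 1),
    "c": ({"a"}, lambda r: r["a"] + 2),
    "d": ({"b", "c"}, lambda r: r["b"] * r["c"]),
}
print(run_dag(tasks)["d"])  # 6
```

In a factorization code the task bodies would be tile kernels (e.g. panel factorization and updates) and the dependencies would follow the data flow between tiles.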

A local construction of the Smith normal form of a matrix polynomial

Description: We present an algorithm for computing a Smith form with multipliers of a regular matrix polynomial over a field. This algorithm differs from previous ones in that it computes a local Smith form for each irreducible factor in the determinant separately and then combines them into a global Smith form, whereas other algorithms apply a sequence of unimodular operations to the original matrix row by row (or column by column). The performance of the algorithm in exact arithmetic is reported for several test cases.
Date: September 1, 2008
Creator: Wilkening, Jon & Yu, Jia
Partner: UNT Libraries Government Documents Department
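
For the integer-matrix analogue of the Smith form, SymPy provides a ready-made routine, convenient for checking small examples. Note this computes a global Smith form directly over ZZ, not the local-to-global construction for matrix polynomials described in the paper:

```python
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form

A = Matrix([[ 2,  4,   4],
            [-6,  6,  12],
            [10, -4, -16]])
S = smith_normal_form(A, domain=ZZ)
print(S)  # diagonal with invariant factors 2, 6, 12 (up to sign),
          # each dividing the next
```

The invariant factors can be checked by hand from the gcds of the k-by-k minors: gcd of entries is 2, gcd of 2x2 minors is 12, and |det A| = 144, giving 2, 12/2 = 6, and 144/12 = 12.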

A fast portable implementation of the Secure Hash Algorithm, III.

Description: In 1992, NIST announced a proposed standard for a collision-free hash function. The algorithm for producing the hash value is known as the Secure Hash Algorithm (SHA), and the standard using the algorithm is known as the Secure Hash Standard (SHS). Later, an announcement was made that a scientist at NSA had discovered a weakness in the original algorithm. A revision to this standard was then announced as FIPS 180-1, and includes a slight change to the algorithm that eliminates the weakness. This new algorithm is called SHA-1. In this report we describe a portable and efficient implementation of SHA-1 in the C language. Performance information is given, as well as tips for porting the code to other architectures. We conclude with some observations on the efficiency of the algorithm, and a discussion of how the efficiency of SHA might be improved.
Date: October 1, 1992
Creator: McCurley, Kevin S.
Partner: UNT Libraries Government Documents Department
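
Today a portable SHA-1 ships with most standard libraries; in Python the algorithm described in this report is available through hashlib, shown here with the standard "abc" test vector. (SHA-1 is no longer collision-resistant and is deprecated for security use.)

```python
import hashlib

digest = hashlib.sha1(b"abc").hexdigest()
print(digest)  # a9993e364706816aba3e25717850c26c9cd0d89d

# Incremental hashing, as a C implementation would expose via init/update/final:
h = hashlib.sha1()
h.update(b"a")
h.update(b"bc")
assert h.hexdigest() == digest
```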

Large neighborhood search for the double traveling salesman problem with multiple stacks

Description: This paper considers a complex real-life short-haul/long-haul pickup and delivery application. The problem can be modeled as a double traveling salesman problem (TSP) in which the pickups and the deliveries happen in the first and second TSPs respectively. Moreover, the application features multiple stacks in which the items must be stored, and the pickups and deliveries must take place in reverse (LIFO) order for each stack. The goal is to minimize the total travel time satisfying these constraints. This paper presents a large neighborhood search (LNS) algorithm which improves the best-known results on 65% of the available instances and is always within 2% of the best-known solutions.
Date: January 1, 2009
Creator: Bent, Russell W & Van Hentenryck, Pascal
Partner: UNT Libraries Government Documents Department
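
The destroy-and-repair loop at the core of LNS can be sketched on a plain TSP, without the pickup/delivery and LIFO-stack side constraints of the paper: remove a few random cities, reinsert each at its cheapest position, and keep the new tour if it is no worse.

```python
import math, random

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def cheapest_insert(tour, city, pts):
    """Insert `city` at the position that increases tour length least."""
    best_pos, best_delta = 0, float("inf")
    for i in range(len(tour)):
        a, b = tour[i], tour[(i + 1) % len(tour)]
        delta = (math.dist(pts[a], pts[city]) + math.dist(pts[city], pts[b])
                 - math.dist(pts[a], pts[b]))
        if delta < best_delta:
            best_pos, best_delta = i + 1, delta
    tour.insert(best_pos, city)

def lns(pts, iters=300, destroy=3, seed=1):
    rng = random.Random(seed)
    tour = list(range(len(pts)))           # start from the identity tour
    best = tour_length(tour, pts)
    for _ in range(iters):
        cand = tour[:]
        removed = rng.sample(cand, destroy)  # destroy: drop a few cities
        for c in removed:
            cand.remove(c)
        for c in removed:                    # repair: greedy cheapest insertion
            cheapest_insert(cand, c, pts)
        length = tour_length(cand, pts)
        if length <= best:                   # accept non-worsening candidates
            tour, best = cand, length
    return tour, best

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(30)]
tour, best = lns(pts)
print(best)  # tour length after LNS, typically well below the initial tour
```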

Stencil Computation Optimization and Auto-tuning on State-of-the-Art Multicore Architectures

Description: Understanding the most efficient design and utilization of emerging multicore systems is one of the most challenging questions faced by the mainstream and scientific computing industries in several decades. Our work explores multicore stencil (nearest-neighbor) computations -- a class of algorithms at the heart of many structured grid codes, including PDE solvers. We develop a number of effective optimization strategies, and build an auto-tuning environment that searches over our optimizations and their parameters to minimize runtime, while maximizing performance portability. To evaluate the effectiveness of these strategies we explore the broadest set of multicore architectures in the current HPC literature, including the Intel Clovertown, AMD Barcelona, Sun Victoria Falls, IBM QS22 PowerXCell 8i, and NVIDIA GTX280. Overall, our auto-tuning optimization methodology results in the fastest multicore stencil performance to date. Finally, we present several key insights into the architectural trade-offs of emerging multicore designs and their implications on scientific algorithm development.
Date: August 22, 2008
Creator: Datta, Kaushik; Murphy, Mark; Volkov, Vasily; Williams, Samuel; Carter, Jonathan; Oliker, Leonid et al.
Partner: UNT Libraries Government Documents Department
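
The core computational pattern, and one of the transformations such an auto-tuner searches over, can be sketched as a 5-point stencil sweep with cache blocking. This is a pure NumPy illustration; the tuned kernels in the paper are written in C/CUDA with many more optimization dimensions:

```python
import numpy as np

def stencil_reference(u):
    """One Jacobi sweep of the 5-point Laplacian stencil on the interior."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
    return v

def stencil_blocked(u, bs=32):
    """The same sweep, traversing the interior in bs x bs cache blocks."""
    v = u.copy()
    n, m = u.shape
    for i0 in range(1, n - 1, bs):
        for j0 in range(1, m - 1, bs):
            i1, j1 = min(i0 + bs, n - 1), min(j0 + bs, m - 1)
            v[i0:i1, j0:j1] = 0.25 * (u[i0-1:i1-1, j0:j1] + u[i0+1:i1+1, j0:j1] +
                                      u[i0:i1, j0-1:j1-1] + u[i0:i1, j0+1:j1+1])
    return v

u = np.random.default_rng(0).random((100, 100))
print(np.allclose(stencil_reference(u), stencil_blocked(u)))  # True
```

An auto-tuner in this style would time the blocked variant over a grid of block sizes (and other parameters) and keep the fastest configuration for the target machine.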

Loops in Reeb Graphs of 2-Manifolds

Description: Given a Morse function f over a 2-manifold with or without boundary, the Reeb graph is obtained by contracting the connected components of the level sets to points. We prove tight upper and lower bounds on the number of loops in the Reeb graph that depend on the genus, the number of boundary components, and whether or not the 2-manifold is orientable. We also give an algorithm that constructs the Reeb graph in time O(n log n), where n is the number of edges in the triangulation used to represent the 2-manifold and the Morse function.
Date: February 11, 2003
Creator: Cole-McLaughlin, K; Edelsbrunner, H; Harer, J; Natarajan, V & Pascucci, V
Partner: UNT Libraries Government Documents Department

Influence of machine organization on algorithms

Description: A comparative analysis of the CDC STAR-100 and the ILLIAC IV for a given code is presented. A large two-dimensional Lagrangian hydrodynamic model was programmed for the STAR-100 computer. Some theoretical considerations concerning execution rates are presented, with examples of where these considerations were important in the design of the algorithms. Finally, several observations and opinions, many of which may be controversial but which should be considered by the programming world at large, are described.
Date: May 1, 1973
Creator: Owens, J.L.
Partner: UNT Libraries Government Documents Department