
Complex band structure under plane-wave nonlocal pseudopotential Hamiltonian of metallic wires and electrodes

Description: We present a practical approach to calculating the complex band structure of an electrode for quantum transport calculations. The method is designed for plane-wave-based Hamiltonians with nonlocal pseudopotentials and for the auxiliary periodic boundary condition approach to transport calculations. At present there is no direct method to calculate all the evanescent states at a given energy for systems with nonlocal pseudopotentials; on the other hand, the auxiliary periodic boundary condition approach does not require all the evanescent states at a given energy. The present method fills this niche. It has been used to study copper and gold nanowires and bulk electrodes.
Date: July 17, 2009
Creator: Yang, Chao
Partner: UNT Libraries Government Documents Department
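A toy illustration of the underlying idea (a generic one-dimensional tight-binding sketch, not the authors' plane-wave method; all parameter names and values are made up): at a fixed energy E, the Bloch factor lambda = e^{ika} of a nearest-neighbor chain satisfies a quadratic equation, and roots with |lambda| != 1 are the evanescent states.

    import numpy as np

    def complex_bands_1d(E, eps=0.0, t=1.0):
        """Toy complex band structure for a 1D tight-binding chain.
        The bulk equation t*psi[n-1] + eps*psi[n] + t*psi[n+1] = E*psi[n]
        with psi[n] = lam**n reduces to t*lam**2 + (eps - E)*lam + t = 0.
        Roots with |lam| == 1 are propagating Bloch states; the others
        are evanescent (complex wavevector)."""
        lams = np.roots([t, eps - E, t])
        ks = -1j * np.log(lams)       # k*a, complex in general
        return lams, ks

    # Inside the band (|E - eps| < 2|t|) both roots propagate;
    # outside, both are evanescent:
    for E in (0.5, 3.0):
        lams, ks = complex_bands_1d(E)
        print(E, np.abs(lams).round(3), ks.round(3))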

Inf-sup estimates for the Stokes problem in a periodic channel

Description: We derive estimates of the Babuska-Brezzi inf-sup constant $\beta$ for two-dimensional incompressible flow in a periodic channel with one flat boundary and the other given by a periodic, Lipschitz continuous function $h$. If $h$ is a constant function (so the domain is rectangular), we show that periodicity in one direction but not the other leads to an interesting connection between $\beta$ and the unitary operator mapping the Fourier sine coefficients of a function to its Fourier cosine coefficients. We exploit this connection to determine the dependence of $\beta$ on the aspect ratio of the rectangle. We then show how to transfer this result to the case that $h$ is $C^{1,1}$ or even $C^{0,1}$ by a change of variables. We avoid non-constructive theorems of functional analysis in order to explicitly exhibit the dependence of $\beta$ on features of the geometry such as the aspect ratio, the maximum slope, and the minimum gap thickness (if $h$ passes near the substrate). We give an example to show that our estimates are optimal in their dependence on the minimum gap thickness in the $C^{1,1}$ case, and nearly optimal in the Lipschitz case.
Date: June 27, 2007
Creator: Wilkening, Jon
Partner: UNT Libraries Government Documents Department
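For reference, the Babuska-Brezzi inf-sup constant for the Stokes problem on a domain Omega is commonly defined as follows (a standard textbook form, not a formula quoted from this report):

    \beta \;=\; \inf_{q \in L^2_0(\Omega)} \;\sup_{\mathbf{v} \in H^1_0(\Omega)^2}
    \frac{\int_\Omega q \,(\nabla \cdot \mathbf{v}) \, dx}
         {\|q\|_{L^2(\Omega)} \, \|\mathbf{v}\|_{H^1(\Omega)}}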

Fingerprinting Communication and Computation on HPC Machines

Description: How do we identify what is actually running on high-performance computing systems? Names of binaries, dynamic libraries loaded, or other elements in a submission to a batch queue can give clues, but binary names can be changed, and libraries provide limited insight into and resolution on the code being run. In this paper, we present a method for "fingerprinting" code running on HPC machines using elements of communication and computation. We then discuss how that fingerprint can be used to determine whether the code is consistent with certain other types of codes, with what a user usually runs, or with what the user requested an allocation to do. In some cases, our techniques enable us to fingerprint HPC codes using runtime MPI data with a high degree of accuracy.
Date: June 2, 2010
Creator: Peisert, Sean
Partner: UNT Libraries Government Documents Department
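A hedged sketch of the general idea (the paper's actual features and matching procedure are not specified here; every name below is hypothetical): reduce a run's MPI behavior to a normalized call-count histogram and compare it against stored fingerprints by cosine similarity.

    import numpy as np

    MPI_CALLS = ["MPI_Send", "MPI_Recv", "MPI_Allreduce", "MPI_Bcast",
                 "MPI_Isend", "MPI_Irecv", "MPI_Wait", "MPI_Barrier"]

    def fingerprint(call_counts):
        """Turn raw MPI call counts into a unit-length feature vector."""
        v = np.array([call_counts.get(c, 0) for c in MPI_CALLS], dtype=float)
        n = np.linalg.norm(v)
        return v / n if n > 0 else v

    def best_match(unknown_counts, known):
        """Name of the known code whose fingerprint is most similar (cosine)."""
        u = fingerprint(unknown_counts)
        return max(known, key=lambda name: float(u @ known[name]))

    known = {"halo_exchange_code": fingerprint({"MPI_Isend": 9000,
                                                "MPI_Irecv": 9000,
                                                "MPI_Wait": 18000}),
             "allreduce_heavy_code": fingerprint({"MPI_Allreduce": 5000})}
    print(best_match({"MPI_Allreduce": 4200, "MPI_Barrier": 100}, known))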

Comparing GPU Implementations of Bilateral and Anisotropic Diffusion Filters for 3D Biomedical Datasets

Description: We compare the performance of hand-tuned CUDA implementations of bilateral and anisotropic diffusion filters for denoising 3D MRI datasets. Our tests sweep comparable parameters for the two filters and measure total runtime, memory bandwidth, computational throughput, and mean squared errors relative to a noiseless reference dataset.
Date: May 6, 2010
Creator: Howison, Mark
Partner: UNT Libraries Government Documents Department
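The error metric named above is simple to state precisely; a minimal sketch, assuming the denoised output and the noiseless reference are equally sized NumPy arrays (names illustrative):

    import numpy as np

    def mse(denoised, reference):
        """Mean squared error of a denoised volume against a noiseless reference."""
        d = np.asarray(denoised, dtype=np.float64)
        r = np.asarray(reference, dtype=np.float64)
        return float(np.mean((d - r) ** 2))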

Review of The SIAM 100-Digit Challenge: A Study in High-Accuracy Numerical Computing

Description: In the January 2002 edition of SIAM News, Nick Trefethen announced the '$100, 100-Digit Challenge'. In this note he presented ten easy-to-state but hard-to-solve problems of numerical analysis, and challenged readers to find each answer to ten-digit accuracy. Trefethen closed with the enticing comment: 'Hint: They're hard! If anyone gets 50 digits in total, I will be impressed.' This challenge obviously struck a chord in hundreds of numerical mathematicians worldwide, as 94 teams from 25 nations later submitted entries. Many of these submissions exceeded the target of 50 correct digits; in fact, 20 teams achieved a perfect score of 100 correct digits. Trefethen had offered $100 for the best submission. Given the overwhelming response, a generous donor (William Browning, founder of Applied Mathematics, Inc.) provided additional funds so that a $100 award could be given to each of the 20 winning teams. Soon after the results were out, four participants, each from a winning team, got together and agreed to write a book about the problems and their solutions. The team is truly international: Bornemann is from Germany, Laurie is from South Africa, Wagon is from the USA, and Waldvogel is from Switzerland. This book provides some mathematical background for each problem, and then shows in detail how each of them can be solved. In fact, multiple solution techniques are mentioned in each case. The book describes how to extend these solutions to much larger problems and much higher numeric precision (accuracy of hundreds or thousands of digits). The authors also show how to compute error bounds for the results, so that one can say with confidence that one's results are accurate to the level stated. Numerous numerical software tools are demonstrated in the process, including the commercial products Mathematica, Maple, and MATLAB. Computer programs that perform many of the algorithms mentioned in the ...
Date: January 25, 2005
Creator: Bailey, David
Partner: UNT Libraries Government Documents Department

High Performance, Three-Dimensional Bilateral Filtering

Description: Image smoothing is a fundamental operation in computer vision and image processing. This work has two main thrusts: (1) implementation of a bilateral filter suitable for use in smoothing, or denoising, 3D volumetric data; (2) implementation of the 3D bilateral filter in three different parallelization models, along with parallel performance studies on two modern HPC architectures. Our bilateral filter formulation is based upon the work of Tomasi [11], but extended to 3D for use on volumetric data. Our three parallel implementations use POSIX threads, the Message Passing Interface (MPI), and Unified Parallel C (UPC), a Partitioned Global Address Space (PGAS) language. Our parallel performance studies, which were conducted on a Cray XT4 supercomputer and a quad-socket, quad-core Opteron workstation, show our algorithm to have near-perfect scalability up to 120 processors. Parallel algorithms such as the one we present here will play an increasingly important role in production visual analysis systems as the underlying computational platforms transition from single- to multi-core architectures.
Date: June 5, 2008
Creator: Bethel, E. Wes
Partner: UNT Libraries Government Documents Department
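A brute-force sketch of the bilateral filter that the record above extends to 3D, using the standard Tomasi-style weighting (spatial Gaussian times intensity-difference Gaussian); this is an illustration, not the paper's tuned parallel implementation, and the parameter names are made up.

    import numpy as np

    def bilateral3d(vol, sigma_s=2.0, sigma_r=0.1, radius=3):
        """Each voxel becomes a normalized average of its neighbors, weighted
        by spatial distance and by intensity difference."""
        out = np.zeros_like(vol, dtype=np.float64)
        pad = np.pad(vol, radius, mode="edge")
        ax = np.arange(-radius, radius + 1)
        dz, dy, dx = np.meshgrid(ax, ax, ax, indexing="ij")
        w_s = np.exp(-(dx**2 + dy**2 + dz**2) / (2 * sigma_s**2))
        for z in range(vol.shape[0]):
            for y in range(vol.shape[1]):
                for x in range(vol.shape[2]):
                    win = pad[z:z + 2*radius + 1,
                              y:y + 2*radius + 1,
                              x:x + 2*radius + 1]
                    w = w_s * np.exp(-(win - vol[z, y, x])**2 / (2 * sigma_r**2))
                    out[z, y, x] = (w * win).sum() / w.sum()
        return out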

Can An Evolutionary Process Create English Text?

Description: Critics of the conventional theory of biological evolution have asserted that while natural processes might result in some limited diversity, nothing fundamentally new can arise from 'random' evolution. In response, biologists such as Richard Dawkins have demonstrated that a computer program can generate a specific short phrase via evolution-like iterations starting with random gibberish. While such demonstrations are intriguing, they are flawed in that they have a fixed, pre-specified future target, whereas in real biological evolution there is no fixed future target, but only a complicated 'fitness landscape'. In this study, a significantly more sophisticated evolutionary scheme is employed to produce text segments reminiscent of a Charles Dickens novel. The aggregate size of these segments is larger than the computer program and the input Dickens text, even when comparing compressed data (as a measure of information content).
Date: October 29, 2008
Creator: Bailey, David H.
Partner: UNT Libraries Government Documents Department
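The Dawkins-style demonstration mentioned above fits in a few lines. This is the classic fixed-target 'weasel' toy, exactly the kind of pre-specified-target scheme the paper improves upon, not the paper's open-ended method:

    import random
    import string

    TARGET = "METHINKS IT IS LIKE A WEASEL"
    ALPHABET = string.ascii_uppercase + " "

    def mutate(parent, rate=0.04):
        """Copy the parent, flipping each character with a small probability."""
        return "".join(random.choice(ALPHABET) if random.random() < rate else c
                       for c in parent)

    def score(s):
        return sum(a == b for a, b in zip(s, TARGET))

    parent = "".join(random.choice(ALPHABET) for _ in TARGET)
    generation = 0
    while score(parent) < len(TARGET):
        offspring = [mutate(parent) for _ in range(100)]
        parent = max(offspring + [parent], key=score)   # keep the fittest
        generation += 1
    print(generation, parent)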

Misleading Performance Claims in Parallel Computations

Description: In a previous humorous note entitled 'Twelve Ways to Fool the Masses,' I outlined twelve common ways in which performance figures for technical computer systems can be distorted. In this paper and the accompanying conference talk, I give a reprise of these twelve 'methods' and present some actual examples that have appeared in the peer-reviewed literature in years past. I then propose guidelines for reporting performance, the adoption of which would raise the level of professionalism and reduce the level of confusion, not only in the world of device simulation but also in the larger arena of technical computing.
Date: May 29, 2009
Creator: Bailey, David H.
Partner: UNT Libraries Government Documents Department

ASCR Science Network Requirements

Description: The Energy Sciences Network (ESnet) is the primary provider of network connectivity for the US Department of Energy Office of Science, the single largest supporter of basic research in the physical sciences in the United States. In support of the Office of Science programs, ESnet regularly updates and refreshes its understanding of the networking requirements of the instruments, facilities, scientists, and science programs that it serves. This focus has helped ESnet to be a highly successful enabler of scientific discovery for over 20 years. In April 2009 ESnet and the Office of Advanced Scientific Computing Research (ASCR), of the DOE Office of Science, organized a workshop to characterize the networking requirements of the programs funded by ASCR. The ASCR facilities anticipate significant increases in wide area bandwidth utilization, driven largely by the increased capabilities of computational resources and the wide scope of collaboration that is a hallmark of modern science. Many scientists move data sets between facilities for analysis, and in some cases (for example the Earth System Grid and the Open Science Grid), data distribution is an essential component of the use of ASCR facilities by scientists. Due to the projected growth in wide area data transfer needs, the ASCR supercomputer centers all expect to deploy and use 100 Gigabit per second networking technology for wide area connectivity as soon as that deployment is financially feasible. In addition to the network connectivity that ESnet provides, the ESnet Collaboration Services (ECS) are critical to several science communities. ESnet identity and trust services, such as the DOEGrids certificate authority, are widely used both by the supercomputer centers and by collaborations such as Open Science Grid (OSG) and the Earth System Grid (ESG). Ease of use is a key determinant of the scientific utility of network-based services. Therefore, a key enabling aspect ...
Date: August 24, 2009
Creator: Dart, Eli & Tierney, Brian
Partner: UNT Libraries Government Documents Department

The BBP Algorithm for Pi

Description: The 'Bailey-Borwein-Plouffe' (BBP) algorithm for $\pi$ is based on the BBP formula for $\pi$, which was discovered in 1995 and published in 1996 [3]:

    \pi \;=\; \sum_{k=0}^{\infty} \frac{1}{16^k}
    \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right).

This formula as it stands permits $\pi$ to be computed fairly rapidly to any given precision (although it is not as efficient for that purpose as some other formulas that are now known [4, pp. 108-112]). But its remarkable property is that it permits one to calculate (after a fairly simple manipulation) hexadecimal or binary digits of $\pi$ beginning at an arbitrary starting position. For example, ten hexadecimal digits of $\pi$ beginning at position one million can be computed in only five seconds on a 2006-era personal computer. The formula itself was found by a computer program, and almost certainly constitutes the first instance of a computer program finding a significant new formula for $\pi$. It turns out that the existence of this formula has implications for the long-standing unsolved question of whether $\pi$ is normal to commonly used number bases (a real number $x$ is said to be $b$-normal if every $m$-long string of digits in the base-$b$ expansion appears, in the limit, with frequency $b^{-m}$). Extending this line of reasoning recently yielded a proof of normality for a class of explicit real numbers (although not yet including $\pi$) [4, pp. 148-156].
Date: September 17, 2006
Creator: Bailey, David H.
Partner: UNT Libraries Government Documents Department
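A compact sketch of the digit-extraction trick described above: compute the fractional part of 16^d * pi term by term, using modular exponentiation for the head of each series. This is the standard BBP recipe in outline, limited to roughly ten digits by double precision, not a tuned implementation:

    def pi_hex_digits(d, n=10):
        """Return n hex digits of pi starting at position d+1 after the
        hexadecimal point, via frac(16**d * pi) and the BBP formula."""
        def frac_series(j):
            s = 0.0
            for k in range(d + 1):       # head: modular exponentiation
                s = (s + pow(16, d - k, 8 * k + j) / (8 * k + j)) % 1.0
            k = d + 1                    # tail: rapidly decaying terms
            while True:
                term = 16.0 ** (d - k) / (8 * k + j)
                if term < 1e-17:
                    break
                s += term
                k += 1
            return s % 1.0

        x = (4 * frac_series(1) - 2 * frac_series(4)
             - frac_series(5) - frac_series(6)) % 1.0
        digits = []
        for _ in range(n):               # peel off hex digits one at a time
            x *= 16
            digits.append("0123456789ABCDEF"[int(x)])
            x %= 1.0
        return "".join(digits)

    print(pi_hex_digits(0))   # pi = 3.243F6A8885... in hex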

The NAS Parallel Benchmarks

Description: The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aerodynamic Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, Leo Dagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan, and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to the parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage over vector supercomputers, and, if ...
Date: November 15, 2009
Creator: Bailey, David H.
Partner: UNT Libraries Government Documents Department

Using wesBench to Study the Rendering Performance of Graphics Processing Units

Description: Graphics rendering consists of two broad classes of operations. The first, which we refer to here as vertex operations, consists of transformation, lighting, primitive assembly, and so forth. The second, which we refer to as pixel or fragment operations, consists of rasterization, texturing, scissoring, blending, and fill. Overall GPU rendering performance is a function of the throughput of both of these interdependent stages: if one stage is slower than the other, the faster stage will be forced to run more slowly, and overall rendering performance will be adversely affected. The relationship runs in both directions: if the later stage has a greater workload than the earlier stage, the earlier stage will be forced to 'slow down.' For example, a large triangle that covers many screen pixels incurs a very small amount of work in the vertex stage while incurring a relatively large amount of work in the fragment stage. Rendering performance for a scene consisting of many large-area triangles will be limited by the throughput of the fragment stage, which has relatively more work than the vertex stage. This document has two main objectives. First, we introduce a new graphics benchmark, wesBench, which is useful for measuring the performance of both stages of the rendering pipeline under varying conditions. Second, we present its measurement methodology and the results of several performance studies aimed at better understanding GPU rendering performance characteristics and limits under varying configurations. In Section 2, we explore the 'crossover' point between geometry and rasterization; in Section 3, we explore additional performance characteristics, some of which are poorly documented or undocumented. Lastly, several appendices provide additional material concerning problems with the gfxbench benchmark and details about the new wesBench graphics benchmark.
Date: January 8, 2010
Creator: Bethel, Edward W
Partner: UNT Libraries Government Documents Department
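The vertex/fragment crossover reasoning above can be made concrete with a back-of-the-envelope calculation; the throughput numbers here are purely hypothetical, not measurements from the report:

    # Hypothetical stage throughputs, for illustration only.
    vertex_rate = 300e6    # vertices per second through the geometry stage
    fill_rate = 4.0e9      # fragments (pixels) per second through the raster stage

    # A triangle carries 3 vertices; covering A pixels costs A fragments:
    #   t_vertex = 3 / vertex_rate,  t_fragment = A / fill_rate.
    # The pipeline flips from vertex-limited to fill-limited near:
    A_crossover = 3 * fill_rate / vertex_rate
    print(f"crossover at roughly {A_crossover:.0f} pixels per triangle")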

Energy Proportionality for Disk Storage Using Replication

Description: Energy saving has become a crucial concern in datacenters, as several reports predict that the anticipated energy costs over a three-year period will exceed hardware acquisition costs. In particular, saving energy for storage is of major importance, as storage devices (and the cooling they require) may contribute over 25 percent of the total energy consumed in a datacenter. Recent work introduced the concept of energy proportionality and argued that it is a more relevant metric than energy saving alone, as it takes into account the tradeoff between energy consumption and performance. In this paper, we present a novel approach, called FREP (Fractional Replication for Energy Proportionality), for energy management in large datacenters. FREP includes a replication strategy and basic functions to enable flexible energy management. Specifically, our method provides performance guarantees by adaptively controlling the power states of a group of disks based on observed and predicted workloads. Our experiments, using a set of real and synthetic traces, show that FREP dramatically reduces energy requirements with a minimal response-time penalty.
Date: September 9, 2010
Creator: Kim, Jinoh & Rotem, Doron
Partner: UNT Libraries Government Documents Department
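A toy sketch of the kind of workload-driven power-state control described above (this is not FREP itself; the policy, names, and numbers are invented for illustration): keep just enough disks active to serve the predicted load plus a spare, relying on replication to keep data on standby disks reachable.

    import math

    def plan_power_states(disks, predicted_iops, iops_per_disk=120, spare=1):
        """Map each disk to 'active' or 'standby' given a load prediction."""
        need = min(len(disks), math.ceil(predicted_iops / iops_per_disk) + spare)
        states = {}
        for i, disk in enumerate(sorted(disks)):
            states[disk] = "active" if i < need else "standby"
        return states

    print(plan_power_states(["d0", "d1", "d2", "d3"], predicted_iops=200))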

Inf-sup estimates for the Stokes problem in a periodic channel

Description: We derive estimates of the Babuska-Brezzi inf-sup constant $\beta$ for two-dimensional incompressible flow in a periodic channel with one flat boundary and the other given by a periodic, Lipschitz continuous function $h$. If $h$ is a constant function (so the domain is rectangular), we show that periodicity in one direction but not the other leads to an interesting connection between $\beta$ and the unitary operator mapping the Fourier sine coefficients of a function to its Fourier cosine coefficients. We exploit this connection to determine the dependence of $\beta$ on the aspect ratio of the rectangle. We then show how to transfer this result to the case that $h$ is $C^{1,1}$ or even $C^{0,1}$ by a change of variables. We avoid non-constructive theorems of functional analysis in order to explicitly exhibit the dependence of $\beta$ on features of the geometry such as the aspect ratio, the maximum slope, and the minimum gap thickness (if $h$ passes near the substrate). We give an example to show that our estimates are optimal in their dependence on the minimum gap thickness in the $C^{1,1}$ case, and nearly optimal in the Lipschitz case.
Date: December 10, 2008
Creator: Wilkening, Jon
Partner: UNT Libraries Government Documents Department

An infinite branching hierarchy of time-periodic solutions of the Benjamin-Ono equation

Description: We present a new representation of solutions of the Benjamin-Ono equation that are periodic in space and time. Up to an additive constant and a Galilean transformation, each of these solutions is a previously known, multi-periodic solution; however, the new representation unifies the subset of such solutions with a fixed spatial period and a continuously varying temporal period into a single network of smooth manifolds connected together by an infinite hierarchy of bifurcations. Our representation explicitly describes the evolution of the Fourier modes of the solution as well as the particle trajectories in a meromorphic representation of these solutions; therefore, we have also solved the problem of finding periodic solutions of the ordinary differential equation governing these particles, including a description of a bifurcation mechanism for adding or removing particles without destroying periodicity. We illustrate the types of bifurcation that occur with several examples, including degenerate bifurcations not predicted by linearization about traveling waves.
Date: July 1, 2008
Creator: Wilkening, Jon
Partner: UNT Libraries Government Documents Department

Practical error estimates for Reynolds' lubrication approximation and its higher order corrections

Description: The Reynolds lubrication approximation is used extensively to study flows between moving machine parts, in narrow channels, and in thin films. The solution of the Reynolds equation may be thought of as the zeroth-order term in an expansion of the solution of the Stokes equations in powers of the aspect ratio $\varepsilon$ of the domain. In this paper, we show how to compute the terms in this expansion to arbitrary order on a two-dimensional, $x$-periodic domain and derive rigorous, a priori error bounds for the difference between the exact solution and the truncated expansion solution. Unlike previous studies of this sort, the constants in our error bounds are either independent of the function $h(x)$ describing the geometry, or depend on $h$ and its derivatives in an explicit, intuitive way. Specifically, if the expansion is truncated at order $2k$, the error is $O(\varepsilon^{2k+2})$ and $h$ enters into the error bound only through its first and third inverse moments $\int_0^1 h(x)^{-m}\,dx$, $m = 1, 3$, and via the max norms $\left\| \frac{1}{\ell!}\, h^{\ell-1} \partial_x^\ell h \right\|_\infty$, $1 \le \ell \le 2k+2$. We validate our estimates by comparing with finite element solutions and present numerical evidence suggesting that even when $h$ is real analytic and periodic, the expansion solution forms an asymptotic series rather than a convergent series.
Date: December 10, 2008
Creator: Wilkening, Jon
Partner: UNT Libraries Government Documents Department
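For orientation, the lowest-order object in the expansion discussed above is the classical steady Reynolds equation; in one dimension, for film thickness h(x), viscosity mu, and a lower wall sliding at speed U, it reads (standard textbook form, not quoted from this report):

    \frac{d}{dx}\!\left( h(x)^3 \, \frac{dp}{dx} \right) \;=\; 6 \mu U \, \frac{dh}{dx}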

A Three-level BDDC algorithm for saddle point problems

Description: BDDC algorithms have previously been extended to the saddle point problems arising from mixed formulations of elliptic and incompressible Stokes problems. In these two-level BDDC algorithms, all iterates are required to lie in a benign space, a subspace in which the preconditioned operators are positive definite. This requirement can lead to large coarse problems, which have to be generated and factored by a direct solver at the beginning of the computation and can ultimately become a bottleneck. An additional level is introduced in this paper to solve the coarse problem approximately and to remove this difficulty. The three-level BDDC algorithm keeps all iterates in the benign space, so conjugate gradient methods can be used to accelerate the convergence. This work extends the three-level BDDC methods for standard finite element discretizations of elliptic problems, and the same rate of convergence is obtained for the mixed formulation of the same problems. An estimate of the condition number of the three-level BDDC method is provided, and numerical experiments are discussed.
Date: December 10, 2008
Creator: Tu, X.
Partner: UNT Libraries Government Documents Department
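For context, the iteration that the BDDC preconditioner accelerates is the standard preconditioned conjugate gradient method; this is the generic textbook loop, with `precond` standing in for the (not shown) BDDC preconditioner:

    import numpy as np

    def pcg(A, b, precond, tol=1e-8, max_iter=500):
        """Preconditioned conjugate gradients for SPD A (textbook form)."""
        x = np.zeros_like(b)
        r = b - A @ x
        z = precond(r)
        p = z.copy()
        rz = r @ z
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) < tol:
                break
            z = precond(r)
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x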

Voro++: a three-dimensional Voronoi cell library in C++

Description: Voro++ is a free software library for the computation of three-dimensional Voronoi cells. It is primarily designed for applications in physics and materials science, where the Voronoi tessellation can be a useful tool in the analysis of densely packed particle systems, such as granular materials or glasses. The software comprises several C++ classes that can be modified and incorporated into other programs. A command-line utility is also provided that exposes most features of the code. Voro++ makes use of a direct cell-by-cell construction, which is particularly suited to handling special boundary conditions and walls. It employs algorithms that are tolerant of numerical precision errors, and it has been successfully employed on very large particle systems.
Date: January 15, 2009
Creator: Rycroft, Chris
Partner: UNT Libraries Government Documents Department
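A two-dimensional toy version of the cell-by-cell construction described above (start from the container and cut by the perpendicular bisector of each neighboring particle); this is an illustrative Python sketch of the technique, not Voro++'s C++ API:

    import numpy as np

    def clip_halfplane(poly, a, b):
        """Clip a convex polygon to the half-plane a.x <= b
        (one Sutherland-Hodgman pass)."""
        out = []
        for i in range(len(poly)):
            p, q = poly[i], poly[(i + 1) % len(poly)]
            inp, inq = np.dot(a, p) <= b, np.dot(a, q) <= b
            if inp:
                out.append(p)
            if inp != inq:                        # edge crosses the boundary
                t = (b - np.dot(a, p)) / np.dot(a, q - p)
                out.append(p + t * (q - p))
        return out

    def voronoi_cell(points, i, box):
        """Cell of points[i]: cut the bounding box by one bisector per
        other particle (direct cell-by-cell construction)."""
        x0, x1, y0, y1 = box
        cell = [np.array(v, float)
                for v in [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]]
        pi = points[i]
        for j, pj in enumerate(points):
            if j == i or not cell:
                continue
            a = pj - pi                           # bisector normal
            b = np.dot(a, (pi + pj) / 2.0)        # keeps pi's side
            cell = clip_halfplane(cell, a, b)
        return cell

    pts = np.random.rand(20, 2)
    print(voronoi_cell(pts, 0, (0.0, 1.0, 0.0, 1.0)))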

LONG TERM FILE MIGRATION - PART II: FILE REPLACEMENT ALGORITHMS

Description: The steady increase in the power and complexity of modern computer systems has encouraged the implementation of automatic file migration systems, which move files dynamically between mass storage devices and disk in response to user reference patterns. Using information describing thirteen months of text editor data set file references (analyzed in detail in the first part of this paper), the authors develop and evaluate algorithms for the selection of files to be moved from disk to mass storage. They find that algorithms based on both the file size and the time since the file was last used work well. The best realizable algorithms tested condition on the empirical distribution of the times between file references. Acceptable results are also obtained by selecting for replacement the file whose size multiplied by the time since its last reference is maximal. Comparisons are made with a number of standard algorithms developed for paging, such as Working Set. Sufficient information (parameter values, fitted equations) is provided that the algorithms may easily be implemented on other systems.
Date: October 1, 1978
Creator: Smith, Alan Jay
Partner: UNT Libraries Government Documents Department
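The 'size times time since last reference' policy described above is easy to state in code; a minimal sketch with invented record fields and units:

    from collections import namedtuple

    File = namedtuple("File", ["name", "size", "last_ref"])

    def select_for_migration(files, now, bytes_needed):
        """Migrate files with the largest size * (time since last reference)
        until enough disk space has been freed."""
        ranked = sorted(files, key=lambda f: f.size * (now - f.last_ref),
                        reverse=True)
        victims, freed = [], 0
        for f in ranked:
            if freed >= bytes_needed:
                break
            victims.append(f)
            freed += f.size
        return victims

    files = [File("a.txt", 10_000, 100), File("b.dat", 500, 5),
             File("c.log", 2_000, 90)]
    print([f.name for f in select_for_migration(files, 110, 2_000)])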

Phase patterns of coupled oscillators with application to wireless communication

Description: Here we study the plausibility of a phase-oscillator dynamical model for TDMA (time-division multiple access) in wireless communication networks. We show that emerging patterns of phase-locked states between oscillators can eventually settle into a round-robin schedule, in a way similar to the pulse-coupled oscillator models designed for this purpose. The results open the door to new communication protocols for continuously interacting networks of wireless communication devices.
Date: January 2, 2008
Creator: Arenas, A.
Partner: UNT Libraries Government Documents Department
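A toy all-to-all phase-oscillator integration in the spirit of the record above (illustrative only; the paper's exact model is not reproduced here): with repulsive sine coupling (k < 0), identical oscillators de-synchronize and their phases spread out, so they cross a reference phase one after another, a round-robin, TDMA-like order.

    import numpy as np

    def order_parameter(theta):
        """Kuramoto order parameter r in [0, 1]; r near 0 means spread phases."""
        return abs(np.exp(1j * theta).mean())

    def relax(n=5, k=-1.0, omega=1.0, dt=0.01, steps=20_000, seed=0):
        rng = np.random.default_rng(seed)
        theta = 2 * np.pi * rng.random(n)
        for _ in range(steps):
            diff = theta[None, :] - theta[:, None]        # theta_j - theta_i
            theta += dt * (omega + (k / n) * np.sin(diff).sum(axis=1))
        return theta % (2 * np.pi)

    theta_start = relax(steps=0)
    theta_end = relax()
    print(order_parameter(theta_start), "->", order_parameter(theta_end))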

Scalable parallel Newton-Krylov solvers for discontinuous Galerkin discretizations

Description: We present techniques for the implicit solution of discontinuous Galerkin discretizations of the Navier-Stokes equations on parallel computers. While a block-Jacobi method is simple and straightforward to parallelize, its convergence properties are poor except for simple problems. Therefore, we consider Newton-GMRES methods preconditioned with block-incomplete LU factorizations, with optimized element orderings based on a minimum discarded fill (MDF) approach. We discuss the difficulties of parallelizing these methods, but also show that with a simple domain decomposition approach, most of the advantages of the block-ILU preconditioner over the block-Jacobi preconditioner are retained. The convergence is further improved by incorporating the matrix connectivities into the mesh partitioning process, which aims at minimizing the error introduced by separating the partitions. We demonstrate the performance of the schemes on realistic two- and three-dimensional flow problems.
Date: December 31, 2008
Creator: Persson, P.-O.
Partner: UNT Libraries Government Documents Department
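A generic Newton-GMRES skeleton with an incomplete-LU-preconditioned linear solve, sketching the structure described above (no DG specifics and no MDF ordering; `residual` and `jacobian` are user-supplied callables returning a vector and a sparse matrix):

    import numpy as np
    import scipy.sparse.linalg as spla

    def newton_gmres(residual, jacobian, u0, tol=1e-10, max_newton=20):
        """Solve residual(u) = 0 by Newton's method with ILU-preconditioned
        GMRES for each linear step."""
        u = u0.copy()
        for _ in range(max_newton):
            r = residual(u)
            if np.linalg.norm(r) < tol:
                break
            J = jacobian(u).tocsc()                # sparse Jacobian at u
            ilu = spla.spilu(J)                    # incomplete LU factorization
            M = spla.LinearOperator(J.shape, ilu.solve)
            du, info = spla.gmres(J, -r, M=M)      # inexact Newton step
            u = u + du
        return u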

Visualization and Analysis-Oriented Reconstruction of Material Interfaces

Description: Reconstructing boundaries along material interfaces from volume fractions is a difficult problem, especially because the under-resolved nature of the input data allows for many correct interpretations. Worse, algorithms widely accepted as appropriate for simulation are inappropriate for visualization. In this paper, we describe a new algorithm that is specifically intended for reconstructing material interfaces for visualization and analysis requirements. The algorithm performs well with respect to memory footprint and execution time, has desirable properties in various accuracy metrics, and also produces smooth surfaces with few artifacts, even when faced with more than two materials per cell.
Date: March 5, 2010
Creator: Childs, Henry R.
Partner: UNT Libraries Government Documents Department

Monte Carlo without chains

Description: A sampling method for spin systems is presented. The spin lattice is written as the union of a nested sequence of sublattices, all but the last with conditionally independent spins, which are sampled in succession using their marginals. The marginals are computed concurrently by a fast algorithm; errors in the evaluation of the marginals are offset by weights. There are no Markov chains and each sample is independent of the previous ones; the cost of a sample is proportional to the number of spins (but the number of samples needed for good statistics may grow with array size). The examples include the Edwards-Anderson spin glass in three dimensions.
Date: December 12, 2007
Creator: Chorin, Alexandre J.
Partner: UNT Libraries Government Documents Department