13 Matching Results

Search Results

DONIO: Distributed Object Network I/O Library

Description: This report describes the use and implementation of DONIO (Distributed Object Network I/O), a library of routines that provide fast file I/O capabilities in the Intel iPSC/860 and Paragon distributed memory parallel environments. DONIO caches a copy of the file in memory distributed across all processors. Disk I/O routines (such as read, write, and lseek) are replaced by calls to DONIO routines, which translate these operations into message communication to update the cached data. Experiments on the Intel Paragon show that the cost of concurrent disk I/O using DONIO for large files can be 15-30 times smaller than using standard disk I/O.
Date: January 1, 1994
Creator: D'Azevedo, E.F.
Partner: UNT Libraries Government Documents Department

Parallelization of a multiregion flow and transport code using software emulated global shared memory and high performance FORTRAN

Description: The objectives of this research are (1) to parallelize a suite of multiregion groundwater flow and solute transport codes that use Galerkin and Lagrangian-Eulerian finite element methods, (2) to test the compatibility of a global shared memory emulation software with a High Performance FORTRAN (HPF) compiler, and (3) to obtain performance characteristics and scalability of the parallel codes. The suite of multiregion flow and transport codes, 3DMURF and 3DMURT, was parallelized using the DOLIB shared memory emulation, in conjunction with the PGI HPF compiler, to run on the Intel Paragons at the Oak Ridge National Laboratory (ORNL) and a network of workstations. The novelty of this effort lies first in the concurrent use of HPF and global shared memory emulation to facilitate the conversion of a serial code to a parallel code, and second in the shared memory library, which enables efficient implementation of Lagrangian particle tracking along flow characteristics. The latter allows long-time-step-size simulation with particle tracking and dynamic particle redistribution for load balancing, thereby reducing the number of time steps needed for most transient problems. The parallel codes were applied to a pumping well problem to test the efficiency of the domain decomposition and particle tracking algorithms. The full problem domain consists of over 200,000 degrees of freedom with highly nonlinear soil property functions. Relatively good scalability was obtained for a preliminary test run on the Intel Paragons at the Center for Computational Sciences (CCS), ORNL. However, due to the difficulties we encountered with the PGI HPF compiler, as of the writing of this manuscript we are able to report results from 3DMURF only.
Date: February 1, 1997
Creator: D'Azevedo, E.F. & Gwo, Jin-Ping
Partner: UNT Libraries Government Documents Department

Packed storage extension for ScaLAPACK

Description: The authors describe a new extension to ScaLAPACK for computing with symmetric (Hermitian) matrices stored in a packed form. The new code is built upon the ScaLAPACK routines for full dense storage for a high degree of software reuse. The original ScaLAPACK stores a symmetric matrix as a full matrix but accesses only the lower or upper triangular part. The new code enables more efficient use of memory by storing only the lower or upper triangular part of a symmetric (Hermitian) matrix. The packed storage scheme distributes the matrix by block column panels. Within each panel, the matrix is stored as a regular ScaLAPACK matrix. This storage arrangement simplifies the subroutine interface and code reuse. Routines PxPPTRF/PxPPTRS implement the Cholesky factorization and solution for symmetric (Hermitian) linear systems in packed storage. Routines PxSPEV/PxSPEVX (PxHPEV/PxHPEVX) implement the computation of eigenvalues and eigenvectors for symmetric (Hermitian) matrices in packed storage. Routines PxSPGVX (PxHPGVX) implement the expert driver for the generalized eigenvalue problem for symmetric (Hermitian) matrices in packed storage. Performance results on the Intel Paragon suggest that the packed storage scheme incurs only a small time overhead over the full storage scheme.
Date: January 1, 1997
Creator: D'Azevedo, E.F. & Dongarra, J.J.
Partner: UNT Libraries Government Documents Department
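
The memory saving behind the packed storage record above can be illustrated with a small serial sketch. It uses the conventional LAPACK-style column-major packed layout for the lower triangle, which is an assumption chosen for illustration; it is not the block-column-panel distribution that the report describes for ScaLAPACK. The function names `packed_index` and `pack_lower` are hypothetical.

```python
def packed_index(i, j, n):
    """Offset of A[i][j] (0-based) in lower-triangular, column-major packed storage.

    Assumes the LAPACK-style packed convention; entries above the diagonal
    are reached through symmetry.
    """
    if i < j:
        i, j = j, i  # A[i][j] == A[j][i] for a symmetric matrix
    return i + j * (2 * n - j - 1) // 2


def pack_lower(a):
    """Copy the lower triangle of a symmetric n x n matrix into a flat list."""
    n = len(a)
    ap = [0.0] * (n * (n + 1) // 2)  # n(n+1)/2 entries instead of n*n
    for j in range(n):
        for i in range(j, n):
            ap[packed_index(i, j, n)] = a[i][j]
    return ap


a = [[4.0, 1.0, 2.0],
     [1.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
ap = pack_lower(a)
```

For a 3 x 3 matrix the packed array holds 6 entries rather than 9; for large n the saving approaches one half, which is the motivation stated in the description.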

A new shared-memory programming paradigm for molecular dynamics simulations on the Intel Paragon

Description: This report describes the use of shared memory emulation with DOLIB (Distributed Object Library) to simplify parallel programming on the Intel Paragon. A molecular dynamics application is used as an example to illustrate the use of the DOLIB shared memory library. SOTON-PAR, a parallel molecular dynamics code with explicit message-passing using a Lennard-Jones 6-12 potential, is rewritten using DOLIB primitives. The resulting code has no explicit message primitives and resembles a serial code. The new code can perform dynamic load balancing and achieves better performance than the original parallel code with explicit message-passing.
Date: December 1, 1994
Creator: D'Azevedo, E.F. & Romine, C.H.
Partner: UNT Libraries Government Documents Department

Coefficient adaptive triangulation for strongly anisotropic problems

Description: Second order elliptic partial differential equations arise in many important applications, including flow through porous media, heat conduction, and the distribution of electrical or magnetic potential. The prototype is the Laplace problem, which in discrete form produces a coefficient matrix that is relatively easy to solve in a regular domain. However, the presence of anisotropy produces a matrix whose condition number is increased, making the resulting linear system more difficult to solve. In this work, we take the anisotropy into account in the discretization by mapping each anisotropic region into a "stretched" coordinate space in which the anisotropy is removed. The region is then uniformly triangulated, and the resulting triangulation mapped back to the original space. The effect is to generate long slender triangles that are oriented in the direction of "preferred flow." Slender triangles are generally regarded as numerically undesirable since they tend to cause poor conditioning; however, our triangulation has the effect of producing effective isotropy, thus improving the condition number of the resulting coefficient matrix.
Date: January 1, 1996
Creator: D'Azevedo, E.F.; Romine, C.H. & Donato, J.M.
Partner: UNT Libraries Government Documents Department
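
The stretch, triangulate, and map-back idea in the record above can be sketched for the simplest case. The sketch assumes a rectangular region with a constant diagonal coefficient tensor diag(kx, ky), for which the substitution x' = x / sqrt(kx), y' = y / sqrt(ky) turns kx u_xx + ky u_yy into the ordinary Laplacian. The function name `stretched_triangulation` and its parameters are hypothetical, and the report's treatment of general regions is not reproduced here.

```python
import math


def stretched_triangulation(lx, ly, kx, ky, h):
    """Triangulate [0, lx] x [0, ly] uniformly in stretched coordinates.

    Assumes a constant diagonal anisotropy diag(kx, ky). Returns node
    coordinates in the ORIGINAL space and triangles as node-index triples.
    """
    sx, sy = math.sqrt(kx), math.sqrt(ky)
    wx, wy = lx / sx, ly / sy          # extent of the region after stretching
    nx = max(1, round(wx / h))         # uniform spacing ~h in stretched space
    ny = max(1, round(wy / h))

    nodes = []
    for jy in range(ny + 1):
        for ix in range(nx + 1):
            xs, ys = wx * ix / nx, wy * jy / ny   # stretched coordinates
            nodes.append((sx * xs, sy * ys))      # mapped back to original space

    triangles = []
    for jy in range(ny):
        for ix in range(nx):
            a = jy * (nx + 1) + ix   # lower-left corner of the grid cell
            b = a + 1                # lower-right
            c = a + nx + 1           # upper-left
            d = c + 1                # upper-right
            triangles.append((a, b, d))
            triangles.append((a, d, c))
    return nodes, triangles


# Unit square with conductivity 16 times larger in x than in y.
nodes, triangles = stretched_triangulation(1.0, 1.0, 16.0, 1.0, 0.25)
```

With kx = 16 and ky = 1, each cell is 1.0 wide and 0.25 tall in the original space, so the triangles are elongated by a factor of sqrt(kx / ky) = 4 along x, the direction of larger coefficient.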

EDONIO: Extended distributed object network I/O library

Description: This report describes EDONIO (Extended Distributed Object Network I/O), an enhanced version of DONIO (Distributed Object Network I/O Library) optimized for the Intel Paragon Systems using the new M-ASYNC access mode. DONIO provided fast file I/O capabilities in the Intel iPSC/860 and Paragon distributed memory parallel environments by caching a copy of the entire file in memory distributed across all processors. EDONIO is more memory efficient by caching only a subset of the disk file at a time. DONIO was restricted by the high memory requirements and use of 32-bit integer indexing to handle files no larger than 2 Gigabytes. EDONIO overcomes this barrier by using the extended integer library routines provided by Intel's NX operating system. For certain applications, EDONIO may show a ten-fold improvement in performance over the native NX I/O routines.
Date: March 1, 1995
Creator: D'Azevedo, E.F. & Romine, C.H.
Partner: UNT Libraries Government Documents Department

A New Shared-Memory Programming Paradigm for Molecular Dynamics Simulations on the Intel Paragon

Description: This report describes the use of shared memory emulation with DOLIB (Distributed Object Library) to simplify parallel programming on the Intel Paragon. A molecular dynamics application is used as an example to illustrate the use of the DOLIB shared memory library. SOTON-PAR, a parallel molecular dynamics code with explicit message-passing using a Lennard-Jones 6-12 potential, is rewritten using DOLIB primitives. The resulting code has no explicit message primitives and resembles a serial code. The new code can perform dynamic load balancing and achieves better performance than the original parallel code with explicit message-passing.
Date: January 1, 1995
Creator: D'Azevedo, E.F.
Partner: UNT Libraries Government Documents Department

DOLIB: Distributed Object Library

Description: This report describes the use and implementation of DOLIB (Distributed Object Library), a library of routines that emulates global or virtual shared memory on Intel multiprocessor systems. Access to a distributed global array is through explicit calls to gather and scatter. Advantages of using DOLIB include: dynamic allocation and freeing of huge (gigabyte) distributed arrays, both C and FORTRAN callable interfaces, and the ability to mix shared-memory and message-passing programming models for ease of use and optimal performance. DOLIB is independent of language and compiler extensions and requires no special operating system support. DOLIB also supports automatic caching of read-only data for high performance. The virtual shared memory support provided in DOLIB is well suited for implementing Lagrangian particle tracking techniques. We have also used DOLIB to create DONIO (Distributed Object Network I/O Library), which obtains over a 10-fold improvement in disk I/O performance on the Intel Paragon.
Date: January 1, 1994
Creator: D'Azevedo, E.F.
Partner: UNT Libraries Government Documents Department

Are Bilinear Quadrilaterals Better Than Linear Triangles?

Description: This paper compares the theoretical effectiveness of bilinear approximation over quadrilaterals with linear approximation over triangles. Anisotropic mesh transformation is used to generate asymptotically optimally efficient meshes for piecewise linear interpolation over triangles and bilinear interpolation over quadrilaterals. For approximating a convex function, although bilinear quadrilaterals are more efficient, linear triangles are more accurate and may be preferred in finite element computations; whereas for saddle-shaped functions, quadrilaterals may offer a higher order approximation on a well-designed mesh. A surprising finding is different grid orientations may yield an order of magnitude improvement in approximation accuracy.
Date: January 1, 1993
Creator: D'Azevedo, E.F.
Partner: UNT Libraries Government Documents Department

Two variants of minimum discarded fill ordering

Description: It is well known that the ordering of the unknowns can have a significant effect on the convergence of Preconditioned Conjugate Gradient (PCG) methods. There has been considerable experimental work on the effects of ordering for regular finite difference problems. In many cases, good results have been obtained with preconditioners based on diagonal, spiral or natural row orderings. However, for finite element problems having unstructured grids or grids generated by a local refinement approach, it is difficult to define many of the orderings for more regular problems. A recently proposed Minimum Discarded Fill (MDF) ordering technique is effective in finding high quality Incomplete LU (ILU) preconditioners, especially for problems arising from unstructured finite element grids. Testing indicates this algorithm can identify a rather complicated physical structure in an anisotropic problem and orders the unknowns in the "preferred" direction. The MDF technique may be viewed as the numerical analogue of the minimum deficiency algorithm in sparse matrix technology. At any stage of the partial elimination, the MDF technique chooses the next pivot node so as to minimize the amount of discarded fill. In this work, two efficient variants of the MDF technique are explored to produce cost-effective high-order ILU preconditioners. The Threshold MDF orderings combine MDF ideas with drop tolerance techniques to identify the sparsity pattern in the ILU preconditioners. These techniques identify an ordering that encourages fast decay of the entries in the ILU factorization. The Minimum Update Matrix (MUM) ordering technique is a simplification of the MDF ordering and is closely related to the minimum degree algorithm. The MUM ordering is especially effective for large problems arising from Navier-Stokes problems. Some interesting pictures of the orderings are presented using a visualization tool. 22 refs., 4 figs., 7 tabs.
Date: January 1, 1991
Creator: D'Azevedo, E.F. (Oak Ridge National Lab., TN (USA)); Forsyth, P.A. & Tang, Wei-Pai (Waterloo Univ., ON (Canada). Dept. of Computer Science)
Partner: UNT Libraries Government Documents Department

Modeling subsurface contaminant reactions and transport at the watershed scale

Description: The objectives of this research are: (1) to numerically examine the multiscale effects of physical and chemical mass transfer processes on watershed-scale, variably saturated subsurface contaminant transport, and (2) to conduct numerical simulations on watershed-scale reactive solute transport and evaluate their implications for uncertainty characterization and cost-benefit analysis. Concurrent physical and chemical nonequilibrium caused by inter-aggregate gradients of pressure head and solute concentration and intra-aggregate geochemical and microbiological processes, respectively, may arise at various scales and flowpaths. To date, experimental investigations of these complex processes at watershed scale remain a challenge, and numerical studies are often needed to guide water resources management and decision making. This research integrates the knowledge bases developed during previous experimental and numerical investigations at a proposed waste disposal site at the Oak Ridge National Laboratory to study the concurrent effects of physical and chemical nonequilibrium. Comparison of numerical results with field data indicates that: (1) multiregion, preferential flow and solute transport exist under partially saturated conditions and can be confirmed theoretically, and that (2) mass transfer between pore regions is an important process influencing contaminant movement in the subsurface. Simulations of watershed-scale, multispecies reactive solute transport suggest that dominance of geochemistry and hydrodynamics may occur simultaneously at different locales and influence the movement of one species relative to another. Execution times on the simulations of the reactive solute transport model also indicate that the model is ready to assist the selection of important parameters for site characterization.
Date: December 1997
Creator: Gwo, J. P.; Jardine, P. M.; D'Azevedo, E. F. & Wilson, G. V.
Partner: UNT Libraries Government Documents Department

The design and implementation of the parallel out-of-core ScaLAPACK LU, QR and Cholesky factorization routines

Description: This paper describes the design and implementation of three core factorization routines--LU, QR and Cholesky--included in the out-of-core extension of ScaLAPACK. These routines allow the factorization and solution of a dense system that is too large to fit entirely in physical memory. An image of the full matrix is maintained on disk and the factorization routines transfer sub-matrices into memory. The left-looking column-oriented variant of the factorization algorithm is implemented to reduce the disk I/O traffic. The routines are implemented using a portable I/O interface and utilize high performance ScaLAPACK factorization routines as in-core computational kernels. The authors present the details of the implementation for the out-of-core ScaLAPACK factorization routines, as well as performance and scalability results on the Intel Paragon.
Date: April 1, 1997
Creator: D'Azevedo, E.F. & Dongarra, J.J.
Partner: UNT Libraries Government Documents Department

ORNL Cray X1 evaluation status report

Description: On August 15, 2002 the Department of Energy (DOE) selected the Center for Computational Sciences (CCS) at Oak Ridge National Laboratory (ORNL) to deploy a new scalable vector supercomputer architecture for solving important scientific problems in climate, fusion, biology, nanoscale materials and astrophysics. "This program is one of the first steps in an initiative designed to provide U.S. scientists with the computational power that is essential to 21st century scientific leadership," said Dr. Raymond L. Orbach, director of the department's Office of Science. In FY03, CCS procured a 256-processor Cray X1 to evaluate the processors, memory subsystem, scalability of the architecture, and software environment, and to predict the expected sustained performance on key DOE applications codes. The results of the micro-benchmarks and kernel benchmarks show the architecture of the Cray X1 to be exceptionally fast for most operations. The best results are shown on large problems, where it is not possible to fit the entire problem into the cache of the processors. These large problems are exactly the types of problems that are important for the DOE and ultra-scale simulation. Application performance is found to be markedly improved by this architecture:
- Large-scale simulations of high-temperature superconductors run 25 times faster than on an IBM Power4 cluster using the same number of processors.
- Best performance of the parallel ocean program (POP v1.4.3) is 50 percent higher than on Japan's Earth Simulator and 5 times higher than on an IBM Power4 cluster.
- A fusion application, global GYRO transport, was found to be 16 times faster on the X1 than on an IBM Power3. The increased performance allowed simulations to fully resolve questions raised by a prior study.
- The transport kernel in the AGILE-BOLTZTRAN astrophysics code runs 15 times faster than on an IBM Power4 cluster using the ...
Date: May 1, 2004
Creator: Agarwal, P.K.; Alexander, R.A.; Apra, E.; Balay, S.; Bland, A.S.; Colgan, J. et al.
Partner: UNT Libraries Government Documents Department