384 Matching Results

Search Results


Adaptive Mesh Refinement Algorithms for Parallel Unstructured Finite Element Codes

Description: This project produced algorithms for and software implementations of adaptive mesh refinement (AMR) methods for solving practical solid and thermal mechanics problems on multiprocessor parallel computers using unstructured finite element meshes. The overall goal is to provide computational solutions that are accurate to some prescribed tolerance, and adaptivity is the correct path toward this goal. These new tools will enable analysts to conduct more reliable simulations at reduced cost, both in terms of analyst and computer time. Previous academic research in the field of adaptive mesh refinement has produced a voluminous literature focused on error estimators and demonstration problems; relatively little progress has been made on producing efficient implementations suitable for large-scale problem solving on state-of-the-art computer systems. Research issues that were considered include: effective error estimators for nonlinear structural mechanics; local meshing at irregular geometric boundaries; and constructing efficient software for parallel computing environments.
Date: February 3, 2006
Creator: Parsons, I D & Solberg, J M
Partner: UNT Libraries Government Documents Department

Using Pin as a Memory Reference Generator for Multiprocessor Simulation

Description: In this paper we describe how we have used Pin to generate a multithreaded reference stream for simulation of a multiprocessor on a uniprocessor. We have taken special care to model as accurately as possible the effects of cache coherence protocol state, and lock and barrier synchronization on the performance of multithreaded applications running on multiprocessor hardware. We first describe a simplified version of the algorithm, which uses semaphores to synchronize instrumented application threads and the simulator on every memory reference. We then describe modifications to that algorithm to model the microarchitectural features of the Itanium2 that affect the timing of memory reference issue. An experimental evaluation determines that while cycle-accurate multithreaded simulation is possible using our approach, the use of semaphores has a negative impact on the performance of the simulator.
Date: October 22, 2005
Creator: McCurdy, C
Partner: UNT Libraries Government Documents Department
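
The simplified scheme summarized in this abstract, in which application threads and a simulator synchronize with semaphores on every memory reference, might be pictured with the rough sketch below. It is not Pin's API and not the authors' algorithm: the function `simulate`, the `traces` input, and the `request`/`resume` semaphores are all hypothetical names chosen for illustration, and the "simulator" merely records the order in which references arrive instead of modeling caches or timing.

```python
import threading


def simulate(traces):
    """Hypothetical lockstep sketch: traces holds one list of addresses per thread.

    Returns the (thread_id, address) pairs in the order the simulator saw them.
    """
    request = threading.Semaphore(0)                    # signalled when a reference is posted
    resume = [threading.Semaphore(0) for _ in traces]   # one per application thread
    pending = []                                        # posted references awaiting the simulator
    pending_lock = threading.Lock()
    observed = []

    def app_thread(tid, addresses):
        for addr in addresses:
            with pending_lock:
                pending.append((tid, addr))
            request.release()        # hand the reference to the simulator
            resume[tid].acquire()    # block until the simulator has processed it

    def simulator():
        total = sum(len(t) for t in traces)
        for _ in range(total):
            request.acquire()        # wait for some thread to post a reference
            with pending_lock:
                tid, addr = pending.pop(0)
            observed.append((tid, addr))   # stand-in for a cache/coherence model
            resume[tid].release()    # let that thread issue its next reference

    workers = [threading.Thread(target=app_thread, args=(i, t))
               for i, t in enumerate(traces)]
    sim = threading.Thread(target=simulator)
    sim.start()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    sim.join()
    return observed
```

Because every reference costs two semaphore operations and a thread handoff, even this toy version hints at why the abstract reports that semaphores hurt simulator performance.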

Dynamic Program Phase Detection in Distributed Shared-Memory Multiprocessors

Description: We present a novel hardware mechanism for dynamic program phase detection in distributed shared-memory (DSM) multiprocessors. We show that successful hardware mechanisms for phase detection in uniprocessors do not necessarily work well in DSM systems, since they lack the ability to incorporate the parallel application's global execution information and memory access behavior based on data distribution. We then propose a hardware extension to a well-known uniprocessor mechanism that significantly improves phase detection in the context of DSM multiprocessors. The resulting mechanism is modest in size and complexity, and is transparent to the parallel application.
Date: March 6, 2006
Creator: Ipek, E; Martinez, J F; de Supinski, B R; McKee, S A & Schulz, M
Partner: UNT Libraries Government Documents Department

Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications

Description: As multiprocessors scale to unprecedented numbers of cores in order to sustain performance growth, it is vital that these gains are not nullified by high energy consumption from inter-core communication. With recent advances in 3D Integration CMOS technology, the possibility for realizing hybrid photonic-electronic networks-on-chip warrants investigating real application traces on functionally comparable photonic and electronic network designs. We present a comparative analysis using both synthetic benchmarks as well as real applications, run through detailed cycle accurate models implemented under the OMNeT++ discrete event simulation environment. Results show that when utilizing standard process-to-processor mapping methods, this hybrid network can achieve 75X improvement in energy efficiency for synthetic benchmarks and up to 37X improvement for real scientific applications, defined as network performance per energy spent, over an electronic mesh for large messages across a variety of communication patterns.
Date: January 31, 2009
Creator: Kamil, Shoaib A.; Hendry, Gilbert; Biberman, Aleksandr; Chan, Johnnie; Lee, Benjamin G.; Mohiyuddin, Marghoob et al.
Partner: UNT Libraries Government Documents Department

Critical Path-Based Thread Placement for NUMA Systems

Description: Multicore multiprocessors use a Non Uniform Memory Architecture (NUMA) to improve their scalability. However, NUMA introduces performance penalties due to remote memory accesses. Without efficiently managing data layout and thread mapping to cores, scientific applications, even if they are optimized for NUMA, may suffer performance loss. In this paper, we present algorithms and a runtime system that optimize the execution of OpenMP applications on NUMA architectures. By collecting information from hardware counters, the runtime system directs thread placement and reduces performance penalties by minimizing the critical path of OpenMP parallel regions. The runtime system uses a scalable algorithm that derives placement decisions with negligible overhead. We evaluate our algorithms and runtime system with four NPB applications implemented in OpenMP. On average the algorithms achieve between 8.13% and 25.68% performance improvement compared to the default Linux thread placement scheme. The algorithms miss the optimal thread placement in only 8.9% of the cases.
Date: November 1, 2011
Creator: Su, C Y; Li, D; Nikolopoulos, D S; Grove, M; Cameron, K & de Supinski, B R
Partner: UNT Libraries Government Documents Department

Use of Monitors in FORTRAN: a Tutorial on the Barrier, Self-Scheduling DO-Loop, and Askfor Monitors

Description: A set of macro libraries has been developed that allows programmers to write portable FORTRAN code for multiprocessors. This document presents, in tutorial form, the macros used to implement three common synchronization patterns: self-scheduling DO-loops, barrier synchronization, and the askfor monitor.
Date: July 1984
Creator: Lusk, Ewing L. & Overbeek, Ross A.
Partner: UNT Libraries Government Documents Department
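
Two of the patterns named in this abstract, the self-scheduling loop and barrier synchronization, can be illustrated with a minimal sketch. It is written in Python rather than with the report's FORTRAN macros, which are not reproduced here. The function `self_scheduled_loop` and its parameters are hypothetical, and a plain lock stands in for the monitor's mutual exclusion.

```python
import threading


def self_scheduled_loop(n_iterations, n_workers, body):
    """Run body(i) for each i in range(n_iterations).

    Workers claim the next unclaimed index from a shared counter, so faster
    workers naturally take more iterations. Returns the indices each worker ran.
    """
    next_index = 0                           # shared loop counter
    lock = threading.Lock()                  # stands in for the monitor's mutual exclusion
    barrier = threading.Barrier(n_workers)   # all workers meet here when the loop is done
    claimed = [[] for _ in range(n_workers)]

    def worker(worker_id):
        nonlocal next_index
        while True:
            with lock:                       # enter the monitor to claim an iteration
                if next_index >= n_iterations:
                    break
                i = next_index
                next_index += 1
            body(i)                          # do the work outside the monitor
            claimed[worker_id].append(i)
        barrier.wait()                       # no worker proceeds until all have finished

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

A call such as `self_scheduled_loop(20, 4, body)` runs each of the 20 iterations exactly once. Which worker runs which iteration varies from run to run.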

Hard chaos, quantum billiards, and quantum dot computers

Description: This is the final report of a three-year, Laboratory-Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). Research was performed in analytic and computational techniques for dealing with hard chaos, especially the powerful tool of cycle expansions. This work has direct application to the understanding of electrons in nanodevices, such as junctions of quantum wires, or in arrays of dots or antidots. We developed a series of techniques for computing the properties of quantum systems with hard chaos, in particular the flow of electrons through nanodevices. These techniques are providing the insight and tools to design computers with nanoscale components. Recent efforts concentrated on understanding the effects of noise and orbit pruning in chaotic dynamical systems. We showed that most complicated chaotic systems (not just those equivalent to a finite shift) will develop branch points in their cycle expansion. Once the singularity is known to exist, it can be removed with a dramatic increase in the speed of convergence of quantities of physical interest.
Date: July 1, 1996
Creator: Mainieri, R.; Cvitanovic, P. & Hasslacher, B.
Partner: UNT Libraries Government Documents Department

PMESH: A parallel mesh generator

Description: The Parallel Mesh Generation (PMESH) Project is a joint LDRD effort by A Division and Engineering to develop a unique mesh generation system that can construct large calculational meshes (of up to 10^9 elements) on massively parallel computers. Such a capability will remove a critical roadblock to unleashing the power of massively parallel processors (MPPs) for physical analysis. PMESH will support a variety of LLNL 3-D physics codes in the areas of electromagnetics, structural mechanics, thermal analysis, and hydrodynamics.
Date: October 21, 1994
Creator: Hardin, D.D.
Partner: UNT Libraries Government Documents Department

Benchmark tests on the Digital Equipment Corporation Alpha AXP 21164-based AlphaServer 8400, including a comparison of optimized vector and superscalar processing

Description: The second generation of the Digital Equipment Corp. (DEC) DECchip Alpha AXP microprocessor is referred to as the 21164. From the viewpoint of numerically-intensive computing, the primary difference between it and its predecessor, the 21064, is that the 21164 has twice the multiply/add throughput per clock period (CP), a maximum of two floating point operations (FLOPS) per CP vs. one for the 21064. The AlphaServer 8400 is a shared-memory multiprocessor server system that can accommodate up to 12 CPUs and up to 14 GB of memory. In this report we will compare single processor performance of the 8400 system with that of the International Business Machines Corp. (IBM) RISC System/6000 POWER-2 microprocessor running at 66 MHz, the Silicon Graphics, Inc. (SGI) MIPS R8000 microprocessor running at 75 MHz, and the Cray Research, Inc. CRAY J90. The performance comparison is based on a set of Fortran benchmark codes that represent a portion of the Los Alamos National Laboratory supercomputer workload. The advantage of using these codes is that they span a wide range of computational characteristics, such as vectorizability, problem size, and memory access pattern. The primary disadvantage of using them is that detailed, quantitative analysis of performance behavior of all codes on all machines is difficult. One important addition to the benchmark set appears for the first time in this report. Whereas the older version was written for a vector processor, the newer version is more optimized for microprocessor architectures. Therefore, we have, for the first time, an opportunity to measure performance on a single application using implementations that expose the respective strengths of vector and superscalar architecture. All results in this report are from single processors. A subsequent article will explore shared-memory multiprocessing performance of the 8400 system.
Date: February 1, 1996
Creator: Wasserman, H.J.
Partner: UNT Libraries Government Documents Department

Wide-area ATM networking for large-scale MPPs

Description: This paper presents early experiences with using high-speed ATM interfaces to connect multiple Intel Paragons on both local and wide area networks. The testbed includes the 1024 and 512 node Paragons running the OSF operating system at Oak Ridge National Laboratory and the 1840 node Paragon running the Puma operating system at Sandia National Laboratories. The experimental OC-12 (622 Mbits/sec) interfaces are built by GigaNet and provide a proprietary API for sending AAL-5 encapsulated packets. PVM is used as the messaging infrastructure and significant modifications have been made to use the GigaNet API, operate in the Puma environment, and attain acceptable performance over local networks. These modifications are described along with a discussion of roadblocks to networking MPPs with high-performance interfaces. Our early prototype utilizes approximately 25 percent of an OC-12 circuit and 80 percent of an OC-3 circuit in send plus acknowledgment ping-pong tests.
Date: April 1, 1997
Creator: Papadopoulos, P.M. & Geist, G.A. II
Partner: UNT Libraries Government Documents Department

Optical melting as a tool for optimizing SBH analysis of DNA

Description: Sequencing by hybridization is a technique that relies on the specific hybridization properties of nucleic acids to derive sequence information. Hybridization properties are highly dependent on the DNA sequence and the solution environment. Identification of the optimal SBH conditions can be obtained by optical melting. By optical melting of 128 octamer pairs that have the appropriate choice of nucleic acid structures, a useful model of stability has been obtained which will aid in the design and implementation of SBH assays.
Date: April 1995
Creator: Doktycz, M. J.; Jacobson, K. B.; Foote, R. S. & Beattie, K. L.
Partner: UNT Libraries Government Documents Department

Concurrency-based approaches to parallel programming

Description: The inevitable transition to parallel programming can be facilitated by appropriate tools, including languages and libraries. After describing the needs of applications developers, this paper presents three specific approaches aimed at development of efficient and reusable parallel software for irregular and dynamic-structured problems. A salient feature of all three approaches is their exploitation of concurrency within a processor. Benefits of individual approaches such as these can be leveraged by an interoperability environment which permits modules written using different approaches to co-exist in single applications.
Date: July 17, 1995
Creator: Kale, L.V.; Chrisochoides, N. & Kohl, J.
Partner: UNT Libraries Government Documents Department

Numerical methods on some structured matrix algebra problems

Description: This proposal concerned the design, analysis, and implementation of serial and parallel algorithms for certain structured matrix algebra problems. It emphasized large order problems and so focused on methods that can be implemented efficiently on distributed-memory MIMD multiprocessors. Such machines supply the computing power and extensive memory demanded by the large order problems. We proposed to examine three classes of matrix algebra problems: the symmetric and nonsymmetric eigenvalue problems (especially the tridiagonal cases) and the solution of linear systems with specially structured coefficient matrices. As all of these are of practical interest, a major goal of this work was to translate our research in linear algebra into useful tools for use by the computational scientists interested in these and related applications. Thus, in addition to software specific to the linear algebra problems, we proposed to produce a programming paradigm and library to aid in the design and implementation of programs for distributed-memory MIMD computers. We now report on our progress on each of the problems and on the programming tools.
Date: June 1, 1996
Creator: Jessup, E.R.
Partner: UNT Libraries Government Documents Department

Expanding symmetric multiprocessor capability through gang scheduling

Description: Symmetric Multiprocessor (SMP) systems normally provide both space-sharing and time-sharing to ensure high system utilization and good responsiveness. However, the prevailing lack of concurrent scheduling for parallel programs precludes SMP use in addressing many large-scale problems. Tightly synchronized communications are impractical and normal time-sharing reduces the benefit of cache memory. Evidence gathered at Lawrence Livermore National Laboratory (LLNL) indicates that gang scheduling can increase the capability of SMP systems and parallel program performance without adverse impact upon system utilization or responsiveness.
Date: March 1, 1998
Creator: Jette, M.A.
Partner: UNT Libraries Government Documents Department

Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

Description: Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.
Date: March 1, 1997
Creator: Lober, R.R.; Tautges, T.J. & Vaughan, C.T.
Partner: UNT Libraries Government Documents Department

Why are PVM and MPI so different?

Description: PVM and MPI are often compared. These comparisons usually start with the unspoken assumption that PVM and MPI represent different solutions to the same problem. In this paper we show that, in fact, the two systems often are solving different problems. In cases where the problems do match but the solutions chosen by PVM and MPI are different, we explain the reasons for the differences. Usually such differences can be traced to explicit differences in the goals of the two systems, their origins, or the relationship between their specifications and their implementations.
Date: September 1, 1997
Creator: Gropp, W. & Lusk, E.
Partner: UNT Libraries Government Documents Department

Experiences implementing the MPI standard on Sandia's lightweight kernels

Description: This technical report describes some lessons learned from implementing the Message Passing Interface (MPI) standard, and some proposed extensions to MPI, at Sandia. The implementations were developed using Sandia-developed lightweight kernels running on the Intel Paragon and Intel TeraFLOPS platforms. The motivations for this research are discussed, and a detailed analysis of several implementation issues is presented.
Date: October 1, 1997
Creator: Brightwell, R. & Greenberg, D.S.
Partner: UNT Libraries Government Documents Department

Users guide to the PGAPack parallel genetic algorithm library

Description: PGAPack is a parallel genetic algorithm library that is intended to provide most capabilities desired in a genetic algorithm package, in an integrated, seamless, and portable manner. Key features of PGAPack are as follows: Ability to be called from Fortran or C. Executable on uniprocessors, multiprocessors, multicomputers, and workstation networks. Binary-, integer-, real-, and character-valued native data types. Object-oriented data structure neutral design. Parameterized population replacement. Multiple choices for selection, crossover, and mutation operators. Easy integration of hill-climbing heuristics. Easy-to-use interface for novice and application users. Multiple levels of access for expert users. Full extensibility to support custom operators and new data types. Extensive debugging facilities. Large set of example problems.
Date: January 1, 1996
Creator: Levine, D.
Partner: UNT Libraries Government Documents Department

Optimal time-critical scheduling via resource augmentation

Description: We consider two fundamental problems in dynamic scheduling: scheduling to meet deadlines in a preemptive multiprocessor setting, and scheduling to provide good response time in a number of scheduling environments. When viewed from the perspective of traditional worst-case analysis, no good on-line algorithms exist for these problems, and for some variants no good off-line algorithms exist unless P = NP. We study these problems using a relaxed notion of competitive analysis, introduced by Kalyanasundaram and Pruhs, in which the on-line algorithm is allowed more resources than the optimal off-line algorithm to which it is compared. Using this approach, we establish that several well-known on-line algorithms that have poor performance from an absolute worst-case perspective are optimal for the problems in question when allowed moderately more resources. For the optimization of average flow time, these are the first results of any sort, for any NP-hard version of the problem, that indicate that it might be possible to design good approximation algorithms.
Date: April 1, 1997
Creator: Phillips, C.A.; Stein, C.; Torng, E. & Wein, J.
Partner: UNT Libraries Government Documents Department

DeepView: A collaborative framework for distributed microscopy

Description: This paper outlines the motivation, requirements, and architecture of a collaborative framework for distributed virtual microscopy. In this context, the requirements are specified in terms of (1) functionality, (2) scalability, (3) interactivity, and (4) safety and security. Functionality refers to what and how an instrument does something. Scalability refers to the number of instruments, vendor-specific desktop workstations, analysis programs, and collaborators that can be accessed. Interactivity refers to how well the system can be steered either for static or dynamic experiments. Safety and security refers to safe operation of an instrument coupled with user authentication, privacy, and integrity of data communication. To meet these requirements, we introduce three types of services in the architecture: Instrument Services (IS), Exchange Services (ES), and Computational Services (CS). These services may reside on any host in the distributed system. The IS provide an abstraction for manipulating different types of microscopes; the ES provide common services that are required between different resources; and the CS provide analytical capabilities for data analysis and simulation. These services are brought together through CORBA and its enabling services, e.g., Event Services, Time Services, Naming Services, and Security Services. Two unique applications have been introduced into the CS for analyzing scientific images either for instrument control or recovery of a model for objects of interest. These include: in-situ electron microscopy and recovery of 3D shape from holographic microscopy. The first application provides a near real-time processing of the video-stream for on-line quantitative analysis and the use of that information for closed-loop servo control. The second application reconstructs a 3D representation of an inclusion (a crystal structure in a matrix) from multiple views through holographic electron microscopy. 
These applications require steering external stimuli or computational parameters for a particular result. In a sense, "computational instruments" (symmetric multiprocessors) interact closely with data ...
Date: August 10, 1998
Creator: Parvin, B.; Taylor, J. & Cong, G.
Partner: UNT Libraries Government Documents Department

Some Parallel Extensions to Optimization Methods in OPT++

Description: OPT++ provides an array of optimization tools for solving scientific and engineering design problems. While these tools are useful, all of the code is serial. With increasingly easy access to multiprocessor machines and clusters of workstations, this results in unnecessarily long times to solution. In order to correct this problem, we have implemented a number of parallel techniques in OPT++. In particular, we have incorporated a speculative gradient algorithm that drastically reduces the time to solution for standard trust-region and line search algorithms. In addition, we have implemented a new version of the Trust-Region Parallel Direct Search (TRPDS) algorithm of Hough and Meza that yields a significant reduction in solution time for problems with expensive function evaluations.
Date: October 1, 2000
Creator: Howle, V. E.; Shont, S. M. & Hough, P. D.
Partner: UNT Libraries Government Documents Department

Advanced array techniques for unattended ground sensor applications

Description: Sensor arrays offer opportunities to beam form, and time-frequency analyses offer additional insights to the wavefield data. Data collected while monitoring three different sources with unattended ground sensors in a 16-element, small-aperture (approximately 5 meters) geophone array are used as examples of model-based seismic signal processing on actual geophone array data. The three sources monitored were: (Source 01) A frequency-modulated chirp of an electromechanical shaker mounted on the floor of an underground bunker. Three 60-second time-windows corresponding to (a) 50 Hz to 55 Hz sweep, (b) 60 Hz to 70 Hz sweep, and (c) 80 Hz to 90 Hz sweep. (Source 02) A single transient impact of a hammer striking the floor of the bunker. Twenty seconds of data (with the transient event approximately mid-point in the time window). (Source 11) The transient event of a diesel generator turning on, including a few seconds before the turn-on time and a few seconds after the generator reaches steady-state conditions. The high-frequency seismic array was positioned at the surface of the ground at a distance of 150 meters (North) of the underground bunker. Four Y-shaped subarrays (each with 2-meter apertures) in a Y-shaped pattern (with a 6-meter aperture) using a total of 16 3-component, high-frequency geophones were deployed. These 48 channels of seismic data were recorded at 6000 and 12000 samples per second on 16-bit data loggers. Representative examples of the data and analyses illustrate the results of this experiment.
Date: May 6, 1997
Creator: Followill, F.E.; Wolford, J.K. & Candy, J.V.
Partner: UNT Libraries Government Documents Department

Scalable Unix tools on parallel processors

Description: The introduction of parallel processors that run a separate copy of Unix on each processor has introduced new problems in managing the user's environment. This paper discusses some generalizations of common Unix commands for managing files (e.g. ls) and processes (e.g. ps) that are convenient and scalable. These basic tools, just like their Unix counterparts, are text-based. We also discuss a way to use these with a graphical user interface (GUI). Some notes on the implementation are provided. Prototypes of these commands are publicly available.
Date: December 31, 1994
Creator: Gropp, W. & Lusk, E.
Partner: UNT Libraries Government Documents Department

Special parallel processing workshop

Description: This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concepts dealing with parallel processing.
Date: December 1, 1994
Partner: UNT Libraries Government Documents Department