40 Matching Results


Using a Transfer Function to Describe the Load-Balancing Problem

Description: The dynamic load-balancing problem for mesh-connected parallel computers can be clearly described by introducing a function that identifies how much work is to be transmitted between neighboring processors. This function is a solution to an elliptic problem for which a wealth of knowledge exists. The non-uniqueness of the solution to the load-balancing problem is made explicit.
Date: November 1993
Creator: Conley, Andrew J.
Partner: UNT Libraries Government Documents Department
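
The elliptic formulation above has a concrete discrete analogue: solve a graph-Laplacian (Poisson-type) system on the processor-connection graph and read the work transfers off the edges. A minimal sketch, with an assumed 4-processor path topology and made-up loads:

```python
import numpy as np

# Hypothetical example: 4 processors on a 1-D mesh (path graph).
# load[i] is the amount of work currently held by processor i.
edges = [(0, 1), (1, 2), (2, 3)]
load = np.array([10.0, 2.0, 4.0, 8.0])

n = len(load)
# Graph Laplacian of the processor-connection topology.
L = np.zeros((n, n))
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

# Elliptic (Poisson-type) problem: L u = load - mean(load).
# L is singular (constants span its null space), so u is unique only up
# to an additive constant -- the non-uniqueness the abstract makes
# explicit. The edge transfers u_i - u_j are unaffected by that constant.
b = load - load.mean()
u, *_ = np.linalg.lstsq(L, b, rcond=None)

for i, j in edges:
    print(f"transfer {u[i] - u[j]:+.2f} units of work from {i} to {j}")
```

With these loads the sketch prescribes sending 4 units from processor 0 to 1 and 2 units from 3 to 2, after which every processor holds the mean load of 6.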

Users manual for the Chameleon Parallel Programming Tools

Description: Message passing is a common method for writing programs for distributed-memory parallel computers. Unfortunately, the lack of a standard for message passing has hampered the construction of portable and efficient parallel programs. In an attempt to remedy this problem, a number of groups have developed their own message-passing systems, each with its own strengths and weaknesses. Chameleon is a second-generation system of this type. Rather than replacing these existing systems, Chameleon is meant to supplement them by providing a uniform way to access many of these systems. Chameleon's goals are to (a) be very lightweight (low overhead), (b) be highly portable, and (c) help standardize program startup and the use of emerging message-passing operations such as collective operations on subsets of processors. Chameleon also provides a way to port programs written using PICL or Intel NX message passing to other systems, including collections of workstations. Chameleon is tracking the Message-Passing Interface (MPI) draft standard and will provide both an MPI implementation and an MPI transport layer. Chameleon provides support for heterogeneous computing by using p4 and PVM. Chameleon's support for homogeneous computing includes the portable libraries p4, PICL, and PVM and vendor-specific implementations for Intel NX, IBM EUI (SP-1), and Thinking Machines CMMD (CM-5). Support for Ncube and PVM 3.x is also under development.
Date: June 1993
Creator: Gropp, William & Smith, Barry
Partner: UNT Libraries Government Documents Department
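
Chameleon's uniform-access goal can be pictured as a thin facade over interchangeable transports. The sketch below is purely illustrative; none of the class or method names are Chameleon's actual API:

```python
# Illustrative only: a uniform send/recv facade over interchangeable
# message-passing backends, in the spirit of Chameleon's design goal.
import collections
import queue

class QueueBackend:
    """Stand-in 'transport' using in-process queues (e.g., for testing)."""
    def __init__(self):
        self._boxes = collections.defaultdict(queue.Queue)
    def send(self, dest, msg):
        self._boxes[dest].put(msg)
    def recv(self, me):
        return self._boxes[me].get()

class Comm:
    """Uniform interface: programs call Comm, never a backend directly,
    so swapping transports (PICL, NX, p4, PVM, ...) needs no code changes."""
    def __init__(self, backend):
        self._b = backend
    def send(self, dest, msg):
        self._b.send(dest, msg)
    def recv(self, me):
        return self._b.recv(me)

comm = Comm(QueueBackend())
comm.send(dest=1, msg="hello")
print(comm.recv(me=1))   # -> hello
```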

Programming in Fortran M

Description: Fortran M is a small set of extensions to Fortran that supports a modular approach to the construction of sequential and parallel programs. Fortran M programs use channels to plug together processes, which may be written in Fortran M or Fortran 77. Processes communicate by sending and receiving messages on channels. Channels and processes can be created dynamically, but programs remain deterministic unless specialized nondeterministic constructs are used. Fortran M programs can execute on a range of sequential, parallel, and networked computers. This report incorporates both a tutorial introduction to Fortran M and a user's guide for the Fortran M compiler developed at Argonne National Laboratory. The Fortran M compiler, supporting software, and documentation are made available free of charge by Argonne National Laboratory but are protected by a copyright which places certain restrictions on how they may be redistributed. See the software for details. The latest version of both the compiler and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/fortran-m at info.mcs.anl.gov.
Date: August 1993
Creator: Foster, Ian; Olson, Robert & Tuecke, Steven
Partner: UNT Libraries Government Documents Department
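
A rough Python analogue of the channel model described above, assuming threads stand in for Fortran M processes and a queue for a channel; the end-of-stream marker is an invented convention:

```python
# Processes (threads here) communicate only by sending and receiving on
# channels. Determinism comes from each channel having exactly one
# sender and one receiver.
import threading
import queue

def producer(out_channel):
    for i in range(3):
        out_channel.put(i * i)   # SEND on the channel
    out_channel.put(None)        # end-of-stream marker (an assumption)

def consumer(in_channel):
    while (x := in_channel.get()) is not None:   # RECEIVE on the channel
        print("received", x)

chan = queue.Queue()             # the channel plugging the two together
t1 = threading.Thread(target=producer, args=(chan,))
t2 = threading.Thread(target=consumer, args=(chan,))
t1.start(); t2.start(); t1.join(); t2.join()
```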

Programming in Fortran M Revision 1

Description: Fortran M is a small set of extensions to Fortran that supports a modular approach to the construction of sequential and parallel programs. Fortran M programs use channels to plug together processes, which may be written in Fortran M or Fortran 77. Processes communicate by sending and receiving messages on channels. Channels and processes can be created dynamically, but programs remain deterministic unless specialized nondeterministic constructs are used. Fortran M programs can execute on a range of sequential, parallel, and networked computers. This report incorporates both a tutorial introduction to Fortran M and a user's guide for the Fortran M compiler developed at Argonne National Laboratory. The Fortran M compiler, supporting software, and documentation are made available free of charge by Argonne National Laboratory but are protected by a copyright which places certain restrictions on how they may be redistributed. See the software for details. The latest version of both the compiler and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/fortran-m at info.mcs.anl.gov.
Date: October 1993
Creator: Foster, Ian; Olson, Robert & Tuecke, Steven
Partner: UNT Libraries Government Documents Department

Efficient Linked List Ranking Algorithms and Parentheses Matching as a New Strategy for Parallel Algorithm Design

Description: The goal of a parallel algorithm is to solve a single problem using multiple processors working together and to do so in an efficient manner. In this regard, there is a need to categorize strategies in order to solve broad classes of problems with similar structures and requirements. In this dissertation, two parallel algorithm design strategies are considered: linked list ranking and parentheses matching.
Date: December 1993
Creator: Halverson, Ranette Hudson
Partner: UNT Libraries
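
List ranking asks for each node's distance to the tail of a linked list; the classic PRAM technique is pointer jumping, in which every node doubles its jump each round so O(log n) synchronous rounds suffice. A sequential simulation of those rounds:

```python
def list_rank(succ):
    """succ[i] = successor of node i; the tail points to itself.
    Returns rank[i] = number of links from i to the tail."""
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    # Each loop body is one synchronous parallel step on a PRAM:
    # all nodes add their successor's rank and then jump over it.
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank

# The list 3 -> 1 -> 0 -> 2 (node 2 is the tail):
print(list_rank([2, 0, 2, 1]))   # -> [1, 2, 0, 3]
```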

Final report LDRD project 105816: model reduction of large dynamic systems with localized nonlinearities.

Description: Advanced computing hardware and software written to exploit massively parallel architectures greatly facilitate the computation of extremely large problems. On the other hand, these tools, though enabling higher-fidelity models, have often resulted in much longer run-times and turn-around times in providing answers to engineering problems. The impediments include smaller elements and consequently smaller time steps, much larger systems of equations to solve, and the inclusion of nonlinearities that had been ignored in days when lower-fidelity models were the norm. The research effort reported here focuses on accelerating the analysis process for structural dynamics through combinations of model reduction and mitigation of some factors that lead to over-meshing.
Date: October 1, 2009
Creator: Lehoucq, Richard B.; Segalman, Daniel Joseph; Hetmaniuk, Ulrich L. (University of Washington, Seattle, WA) & Dohrmann, Clark R.
Partner: UNT Libraries Government Documents Department

A brief parallel I/O tutorial.

Description: This document provides common best practices for the efficient utilization of parallel file systems for analysts and application developers. A multi-program, parallel supercomputer provides effective compute power by aggregating a host of lower-power processors over a network. In general, one either constructs the application to distribute parts of the work to the different nodes and processors available and then collects the results (a parallel application), or one launches a large number of small jobs, each doing similar work on different subsets of the data (a campaign). The I/O system on these machines is usually implemented as a tightly coupled parallel application itself, one that presents the concept of a 'file' to the host applications. The 'file' is an addressable store of bytes, and that address space is global in nature; in essence, the I/O system provides a global address space. Beyond the simple reality that the I/O system is normally built from a smaller, less capable collection of hardware, that global address space will cause problems if not used very carefully. How much of a problem, and the ways in which those problems manifest, will differ, but that it is problem prone has been well established. Worse, the file system is a shared resource on the machine - a system service. What an application does when it uses the file system impacts all users. No portion of the available resource is reserved for a given job; instead, the I/O system responds to requests by scheduling and queuing based on instantaneous demand. Using the system well contributes to the overall throughput of the machine. From a solely self-centered perspective, using it well reduces the time that the application or campaign is subject to impact by others. The developer's goal should be to accomplish I/O in a ...
Date: March 1, 2010
Creator: Ward, H. Lee
Partner: UNT Libraries Government Documents Department
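
One practice implied by this advice is coalescing many small writes into a few large requests before they reach the shared file system. A minimal illustration; the output path and the 4 MiB chunk size are arbitrary assumptions:

```python
# Coalesce many small logical writes into large physical writes so the
# shared parallel file system sees a few big requests, not a storm of
# tiny ones.
class CoalescingWriter:
    def __init__(self, path, chunk=4 * 1024 * 1024):   # 4 MiB buffer
        self._f = open(path, "wb")
        self._chunk = chunk
        self._buf = bytearray()
    def write(self, data: bytes):
        self._buf += data
        if len(self._buf) >= self._chunk:
            self.flush()
    def flush(self):
        if self._buf:
            self._f.write(self._buf)   # one large request to the FS
            self._buf.clear()
    def close(self):
        self.flush()
        self._f.close()

w = CoalescingWriter("/tmp/example.dat")
for i in range(100_000):
    w.write(b"record %08d\n" % i)      # many tiny logical writes
w.close()
```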

Lightweight storage and overlay networks for fault tolerance.

Description: The next generation of capability-class, massively parallel processing (MPP) systems is expected to have hundreds of thousands to millions of processors. In such environments, it is critical to have fault-tolerance mechanisms, including checkpoint/restart, that scale with the size of applications and the percentage of the system on which the applications execute. For application-driven, periodic checkpoint operations, the state of the art does not provide a scalable solution. For example, on today's massive-scale systems that execute applications which consume most of the memory of the employed compute nodes, checkpoint operations generate I/O that consumes nearly 80% of the total I/O usage. Motivated by this observation, this project aims to improve I/O performance for application-directed checkpoints through the use of lightweight storage architectures and overlay networks. Lightweight storage provides direct access to underlying storage devices. Overlay networks provide caching and processing capabilities in the compute-node fabric. The combination has the potential to significantly reduce I/O overhead for large-scale applications. This report describes our combined efforts to model and understand overheads for application-directed checkpoints, as well as the implementation and performance analysis of a checkpoint service that uses available compute nodes as a network cache for checkpoint operations.
Date: January 1, 2010
Creator: Oldfield, Ron A.
Partner: UNT Libraries Government Documents Department
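
The compute-node network cache can be caricatured as a write-behind buffer: the application hands off its state and resumes computing immediately while a background thread drains the copy to storage. A sketch with invented names, not the project's actual service:

```python
# Write-behind checkpointing: checkpoint() returns as soon as the state
# is cached in memory; a background thread performs the slow I/O, so it
# overlaps with computation instead of stalling it.
import threading
import queue
import pickle
import time

class CheckpointCache:
    def __init__(self, path):
        self._q = queue.Queue()
        self._t = threading.Thread(target=self._drain, args=(path,), daemon=True)
        self._t.start()
    def checkpoint(self, state):        # cheap: serialize and enqueue
        self._q.put(pickle.dumps(state))
    def _drain(self, path):
        seq = 0
        while True:
            blob = self._q.get()        # background: write to real storage
            with open(f"{path}.{seq}", "wb") as f:
                f.write(blob)
            seq += 1

cache = CheckpointCache("/tmp/ckpt")
state = {"step": 0, "data": list(range(1000))}
for step in range(3):
    state["step"] = step                # ... compute ...
    cache.checkpoint(state)             # I/O overlaps the next compute phase
time.sleep(0.5)                         # let the drainer finish (demo only)
```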

Parallel Solution of the Time-Dependent Ginzburg-Landau Equations and other Experiences using BlockComm-Chameleon and PCN on the IBM SP, Intel iPSC/860, and Clusters of Workstations

Description: The time-dependent Ginzburg-Landau (TDGL) equations are considered for modeling a thin-film, finite-size superconductor placed in a magnetic field. The problem then leads to the use of so-called natural boundary conditions. The computational domain is partitioned into subdomains, and bond variables are used in obtaining the corresponding discrete system of equations. An efficient time-differencing method based on the Forward Euler method is developed. Finally, a variable-strength magnetic field, resulting in vortex motion in Type II high-critical-temperature superconducting films, is introduced. The authors tackled the problem using two different state-of-the-art parallel computing tools: BlockComm/Chameleon and PCN. They had access to two high-performance distributed-memory supercomputers: the Intel iPSC/860 and the IBM SP1. They also tested the codes using, as a parallel computing environment, a cluster of Sun Sparc workstations.
Date: September 1995
Creator: Coskun, Erhan & Kwong, Man Kam
Partner: UNT Libraries Government Documents Department
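
Forward Euler time-differencing advances the field by dt times the discretized right-hand side. A generic 1-D sketch, with a diffusion-plus-cubic term standing in for the actual Ginzburg-Landau right-hand side (not the authors' discretization or boundary conditions):

```python
# Generic forward Euler stepping of u_t = f(u): explicit, cheap per step,
# but only conditionally stable, so dt is tied to the mesh spacing.
import numpy as np

def step(u, dt, dx):
    # Discrete Laplacian with simple fixed-end boundaries (illustrative,
    # not the "natural" boundary conditions of the report).
    lap = np.zeros_like(u)
    lap[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    f = lap + u - u**3          # stand-in nonlinear reaction term
    return u + dt * f           # forward Euler update

dx, dt = 0.1, 0.004             # dt must satisfy dt <= dx^2/2 for stability
u = np.random.rand(101)
for _ in range(1000):
    u = step(u, dt, dx)
print(u.min(), u.max())
```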

Early Experiences with the IBM SP1 and the High-Performance Switch

Description: The IBM SP1 is IBM's newest parallel distributed-memory computer. As part of a joint project with IBM, Argonne took delivery of an early system in order to evaluate the software environment and to begin porting programming packages and applications to this machine. This report discusses the results of those efforts once the high-performance switch was installed. An earlier report (ANL/MCS-TM-177) emphasized software usability and the initial ports to the SP1. This report contains performance results and discusses some applications and tools not covered in TM 177.
Date: November 1993
Creator: Gropp, William
Partner: UNT Libraries Government Documents Department

Quasi-Automatic Parallelization: A Simplified Approach to Multiprocessing

Description: As multiprocessors become commercially available, a great deal of concern is being focused on the problems involved in writing and debugging software for such machines. Earlier work described the use of monitors implemented by macro processors to attain portable code. This work formulates a general-purpose monitor which simplifies the programming of a wide class of numeric algorithms. We believe that the approach of describing a set of schedulable units of computation advocated by Brown offers a real simplification for the applications programmer. In this paper, we propose a straightforward programming paradigm for describing schedulable units of computation that allows the description of many algorithms with very little effort.
Date: October 1985
Creator: Glickfeld, B. W.
Partner: UNT Libraries Government Documents Department
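
The "schedulable units of computation" paradigm reduces, in its simplest form, to workers pulling the next unit from a monitor-protected counter. A small Python rendering of that idea (illustrative, not the paper's macro-based monitors):

```python
# Monitor-style self-scheduling: the lock-protected counter is the
# "monitor"; each index it hands out is one schedulable unit of work.
import threading

class UnitScheduler:
    def __init__(self, n_units):
        self._next, self._n = 0, n_units
        self._lock = threading.Lock()
    def next_unit(self):
        with self._lock:                 # monitor entry
            if self._next == self._n:
                return None              # nothing left to schedule
            u, self._next = self._next, self._next + 1
            return u

results = [None] * 20
sched = UnitScheduler(len(results))

def worker():
    while (u := sched.next_unit()) is not None:
        results[u] = u * u               # the "unit of computation"

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(results)
```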

Activities and Operations of Argonne's Advanced Computing Research Facility : February 1990 through April 1991

Description: This report reviews the activities and operations of the Advanced Computing Research Facility (ACRF) from February 1990 through April 1991. The ACRF is operated by the Mathematics and Computer Science Division at Argonne National Laboratory. The facility's principal objective is to foster research in parallel computing. Toward this objective, the ACRF operates experimental advanced computers, supports investigations in parallel computing, and sponsors technology transfer efforts to industry and academia.
Date: May 1991
Creator: Pieper, Gail W.
Partner: UNT Libraries Government Documents Department

Research in Mathematics and Computer Science: March 1, 1991 - September 30, 1992

Description: This report discusses the following topics in mathematics and computer science at Argonne National Laboratory: Harnessing the Power; Modeling Piezoelectric Crystals; A Two-Way Street; The Challenge Is On; A True Molecular Engineering Capability; CHAMMPions Attack Climate Issues; Studying Vortex Dynamics; Studying Vortex Structure; Providing Reliable and Fast Derivatives; Automating Reasoning for Scientific Problem Solving; Optimization and Mathematical Programming; Scalable Algorithms for Linear Algebra; Reliable Core Software; Computing Phylogenetic Trees; Managing Life-Critical Systems; Interacting with Data through Visualization; New Tools for New Technologies.
Date: October 1992
Creator: Pieper, Gail W.
Partner: UNT Libraries Government Documents Department

BlockSolve v1.1: Scalable Library Software for the Parallel Solution of Sparse Linear Systems

Description: BlockSolve is a software library for solving large, sparse systems of linear equations on massively parallel computers. The matrices must be symmetric but may have an arbitrary sparsity structure. BlockSolve is a portable package that is compatible with several different message-passing paradigms. This report gives detailed instructions on the use of BlockSolve in application programs.
Date: March 1993
Creator: Jones, Mark T. & Plassmann, Paul E.
Partner: UNT Libraries Government Documents Department
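
For the symmetric systems BlockSolve targets, the canonical iterative kernel is the conjugate gradient method. A minimal dense-storage sketch of that kernel; BlockSolve's actual distributed sparse data structures and API are not shown:

```python
# Conjugate gradients for a symmetric positive definite system Ax = b,
# the kind of kernel a library like BlockSolve parallelizes over a
# distributed sparse matrix. Dense numpy arrays keep the sketch short.
import numpy as np

def cg(A, b, tol=1e-10, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x               # residual
    p = r.copy()                # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)   # exact step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p   # A-conjugate update of direction
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
print(cg(A, b))                           # approx [0.0909, 0.6364]
```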

A Test Implementation of the MPI Draft Message-Passing Standard

Description: Message passing is a common method for programming parallel computers. The lack of a standard has significantly impeded the development of portable software libraries for these machines. Recently, an ad hoc committee was formed to develop a standard for message-passing software for parallel computers. This group first met in April 1992 at a workshop sponsored in part by the Center for Research on Parallel Computation (CRPC). Four of the attendees at that meeting produced a draft standard, henceforth referred to as the MPI (Message-Passing Interface) draft standard. After review by a larger group, and significant changes in the document, a meeting was held in November to discuss the MPI draft standard. This document is a result of those discussions; it describes a running implementation of most of the proposed standard, plus additional routines that were suggested by the discussions at the November meeting.
Date: December 1992
Creator: Gropp, William & Lusk, Ewing L.
Partner: UNT Libraries Government Documents Department
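
The point-to-point style the draft standardized survives in MPI today; for flavor, the same send/receive pattern via mpi4py, the modern Python MPI bindings (not the 1992 implementation described above):

```python
# Modern MPI point-to-point messaging via mpi4py; run with e.g.
#   mpiexec -n 2 python demo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Pickle-based send of an arbitrary Python object, matched by tag.
    comm.send({"payload": [1, 2, 3]}, dest=1, tag=7)
elif rank == 1:
    msg = comm.recv(source=0, tag=7)
    print("rank 1 received:", msg)
```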

Practical Parallel Processing

Description: The physical limitations of uniprocessors and the real-time requirements of numerous practical applications have made parallel processing an essential technology in military, industrial, and scientific research. In this dissertation, we investigate parallelizations of three practical applications using three parallel machine models. Finitely inductive (FI) sequence processing is a pattern recognition technique used in many fields. We first propose four parallel FI algorithms on the EREW PRAM. The time complexity of the parallel factoring and following by bucket packing is O(sk^2 n/p), and they are optimal under some conditions. The parallel factoring and following by hashing requires O(sk^2 n/p) time when uniform hash functions are used, log(p) ≤ kn/p, and pm ≈ n. Their speedup is proportional to the number of processors used. For these results, s is the number of levels, k is the size of the antecedents, n is the length of the input sequence, and p is the number of processors. We also describe algorithms for raster/vector conversion based on the scan model to handle block-like connected components of arbitrary geometrical shapes with multi-level nested doughnuts for the IES (image exploitation system). Both the parallel raster-to-vector algorithm and the parallel vector-to-raster algorithm require O(log(n^2)) or O(log^2(n^2)) time (depending on the sorting algorithm used) for images of size n^2 using p = n^2 processors. Not only is the DWT (discrete wavelet transform) useful in data compression, but it also has potential in signal processing, image processing, and graphics. It is therefore of great importance to investigate efficient parallelizations of the wavelet transforms. The time complexity of the parallel forward DWT on the parallel virtual machine with linear processor organization is O(((s0+s1)mn)/p), where s0 and s1 are the lengths of the filters and p is the number of processors used. The time complexity of the ...
Date: August 1996
Creator: Zhang, Hua, 1954-
Partner: UNT Libraries
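
As a concrete instance of the forward DWT, one level of the Haar transform uses filters of length s0 = s1 = 2; a parallel version distributes slices of the input over the p processors, giving costs of the O(((s0+s1)mn)/p) form quoted above. A sequential sketch:

```python
# One level of the Haar forward DWT: convolve with a length-2 lowpass
# and a length-2 highpass filter, then downsample by 2.
import numpy as np

def haar_level(x):
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # lowpass + downsample
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # highpass + downsample
    return approx, detail

a, d = haar_level([4, 6, 10, 12, 14, 14, 16, 18])
print(a)   # smooth half-rate signal
print(d)   # detail coefficients (zero where neighbors are equal)
```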

Semaphore Solutions for General Mutual Exclusion Problems

Description: Automatic generation of starvation-free semaphore solutions to general mutual exclusion problems is discussed. A reduction approach is introduced for recognizing edge-solvable problems, together with an O(N^2) algorithm for graph reduction, where N is the number of nodes. An algorithm for the automatic generation of starvation-free edge-solvable solutions is presented. The solutions are proved to be very efficient. For general problems, there are two ways to generate efficient solutions: one associates a semaphore with every node, the other with every edge. Both are better than the standard monitor-like solutions. Besides strong semaphores, solutions using weak semaphores, weaker semaphores, and generalized semaphores are also considered. Basic properties of semaphore solutions are also discussed. Tools describing the dynamic behavior of parallel systems, as well as performance criteria for evaluating semaphore solutions, are elaborated.
Date: August 1988
Creator: Yue, Kwok B. (Kwok Bun)
Partner: UNT Libraries
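
In the node-per-semaphore scheme, each critical section is bracketed by P and V operations on its own semaphore. A minimal Python rendering of semaphore-guarded mutual exclusion (illustrative only; the dissertation's starvation-free constructions are more involved):

```python
# One binary semaphore guards one shared resource; acquire/release play
# the roles of P and V. Python's threading.Semaphore behaves like a
# weak semaphore: it makes no fairness guarantee about which waiter wakes.
import threading

sem = threading.Semaphore(1)
counter = 0

def worker(times):
    global counter
    for _ in range(times):
        sem.acquire()             # P operation
        counter += 1              # critical section
        sem.release()             # V operation

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)                    # always 40000 with the semaphore in place
```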

Parallel tetrahedral mesh refinement with MOAB.

Description: This report presents the novel functionality of parallel, edge-based tetrahedral mesh refinement which we have implemented in MOAB. The theoretical basis for this work is contained in [PT04, PT05, TP06], while information on design, performance, and operation specific to MOAB is contained herein. As MOAB is intended mainly for use in pre-processing and simulation (as opposed to the post-processing bent of the previous papers), the primary use case is different: rather than refining elements with non-linear basis functions, the goal is to increase the number of degrees of freedom in some region in order to more accurately represent the solution to some system of equations that cannot be solved analytically. Also, MOAB has a unique mesh representation which impacts the algorithm. The introduction contains a brief review of streaming edge-based tetrahedral refinement. The remainder of the report is broken into three sections: design and implementation, performance, and conclusions. Appendix A contains instructions for end users (simulation authors) on how to employ the refiner.
Date: December 1, 2008
Creator: Thompson, David C. & Pebay, Philippe Pierre
Partner: UNT Libraries Government Documents Department
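
Edge-based refinement inserts a midpoint on each edge and splits each element into children. A 2-D triangle analogue of the idea (the tetrahedral case additionally splits an interior octahedron), with invented helper names and no MOAB specifics:

```python
# 2-D analogue of edge-based "red" refinement: split every edge at its
# midpoint and replace each triangle by 4 children. Sharing the midpoint
# table across triangles keeps the refined mesh conforming.
def refine(vertices, triangles):
    verts = list(vertices)
    midpoint = {}                       # edge -> index of its new vertex
    def mid(a, b):
        key = (min(a, b), max(a, b))    # each edge is refined exactly once
        if key not in midpoint:
            (x1, y1), (x2, y2) = verts[a], verts[b]
            verts.append(((x1 + x2) / 2, (y1 + y2) / 2))
            midpoint[key] = len(verts) - 1
        return midpoint[key]
    out = []
    for a, b, c in triangles:
        ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
        out += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return verts, out

v, t = refine([(0, 0), (1, 0), (0, 1)], [(0, 1, 2)])
print(len(v), "vertices,", len(t), "triangles")   # 6 vertices, 4 triangles
```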

Architectural requirements for the Red Storm computing system.

Description: This report is based on the Statement of Work (SOW) describing the various requirements for delivering a new supercomputer system to Sandia National Laboratories (Sandia) as part of the Department of Energy's (DOE) Accelerated Strategic Computing Initiative (ASCI) program. This system is named Red Storm and will be a distributed-memory, massively parallel processor (MPP) machine built primarily out of commodity parts. The requirements presented here distill extensive architectural and design experience accumulated over a decade and a half of research, development, and production operation of similar machines at Sandia. Red Storm will have an unusually high-bandwidth, low-latency interconnect, specially designed hardware and software reliability features, a lightweight-kernel compute-node operating system, and the ability to rapidly switch major sections of the machine between classified and unclassified computing environments. Particular attention has been paid to architectural balance in the design of Red Storm, and it is therefore expected to achieve an atypically high fraction of its peak speed of 41 TeraOPS on real scientific computing applications. In addition, Red Storm is designed to be upgradeable to many times this initial peak capability while still retaining appropriate balance in key design dimensions. Installation of the Red Storm computer system at Sandia's New Mexico site is planned for 2004, and it is expected that the system will be operated for a minimum of five years following installation.
Date: October 1, 2003
Creator: Camp, William J. & Tomkins, James Lee
Partner: UNT Libraries Government Documents Department

A Parallel Genetic Algorithm for the Set Partitioning Problem

Description: In this dissertation the author reports on his efforts to develop a parallel genetic algorithm and apply it to the solution of the set partitioning problem -- a difficult combinatorial optimization problem used by many airlines as a mathematical model for flight crew scheduling. He developed a distributed steady-state genetic algorithm in conjunction with a specialized local search heuristic for solving the set partitioning problem. The genetic algorithm is based on an island model where multiple independent subpopulations each run a steady-state genetic algorithm on their own subpopulation and occasionally fit strings migrate between the subpopulations. Tests on forty real-world set partitioning problems were carried out on up to 128 nodes of an IBM SP1 parallel computer. The author found that performance, as measured by the quality of the solution found and the iteration on which it was found, improved as additional subpopulations were added to the computation. With larger numbers of subpopulations the genetic algorithm was regularly able to find the optimal solution to problems having up to a few thousand integer variables. In two cases, high-quality integer feasible solutions were found for problems with 36,699 and 43,749 integer variables, respectively. A notable limitation was the difficulty of solving problems with many constraints.
Date: May 1994
Creator: Levine, David
Partner: UNT Libraries Government Documents Department
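
A toy rendering of the island model: independent steady-state subpopulations with occasional ring migration of fit strings. The bit-counting objective and all parameters are stand-ins for the real set-partitioning setup:

```python
# Island-model genetic algorithm: each island runs a steady-state GA
# (breed two good parents, replace the worst member), and every few
# generations the islands exchange their fittest strings in a ring.
import random

GENES, ISLANDS, POP, GENERATIONS, MIGRATE_EVERY = 32, 4, 20, 200, 25
fitness = lambda s: sum(s)            # toy objective: count the 1-bits

islands = [[[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
           for _ in range(ISLANDS)]

for gen in range(GENERATIONS):
    for pop in islands:
        # Steady-state step: tournament-pick two fit parents, crossover,
        # point-mutate, then replace the island's worst string.
        p1, p2 = sorted(random.sample(pop, 4), key=fitness)[-2:]
        cut = random.randrange(GENES)
        child = p1[:cut] + p2[cut:]
        child[random.randrange(GENES)] ^= 1
        pop[min(range(POP), key=lambda k: fitness(pop[k]))] = child
    if gen % MIGRATE_EVERY == 0:      # ring migration of each island's best
        best = [max(pop, key=fitness)[:] for pop in islands]
        for k, pop in enumerate(islands):
            pop[random.randrange(POP)] = best[k - 1]

print(max(fitness(s) for pop in islands for s in pop))
```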

Activities and Operations of the Advanced Computing Research Facility : October 1986-October 1987

Description: This paper contains a description of the work being carried out at the Advanced Computing Research Facility at Argonne National Laboratory. Topics covered are upgrading of computers, networking changes, algorithms, parallel programming, programming languages, and user training.
Date: 1987?
Creator: Pieper, Gail W.
Partner: UNT Libraries Government Documents Department

Activities and Operations of the Advanced Computing Research Facility : January 1989-January 1990

Description: This report reviews the activities and operations of the Advanced Computing Research Facility (ACRF) for the period January 1, 1989, through January 31, 1990. The ACRF is operated by the Mathematics and Computer Science Division at Argonne National Laboratory. The facility's principal objective is to foster research in parallel computing. Toward this objective, the ACRF continues to operate experimental advanced computers and to sponsor new technology transfer efforts and new research projects.
Date: February 1990
Creator: Pieper, Gail W.
Partner: UNT Libraries Government Documents Department