819 Matching Results

Search Results

Practical Parallel Processing

Description: The physical limitations of uniprocessors and the real-time requirements of numerous practical applications have made parallel processing an essential technology in military, industrial, and scientific research. In this dissertation, we investigate parallelizations of three practical applications using three parallel machine models. The applications are as follows. Finitely inductive (FI) sequence processing is a pattern recognition technique used in many fields. We first propose four parallel FI algorithms on the EREW PRAM. The time complexity of the parallel factoring and following by bucket packing is O(sk^2 n/p), and they are optimal under some conditions. The parallel factoring and following by hashing requires O(sk^2 n/p) time when uniform hash functions are used and log(p) ≤ k n/p and pm ≈ n. Their speedup is proportional to the number of processors used. For these results, s is the number of levels, k is the size of the antecedents, n is the length of the input sequence, and p is the number of processors. We also describe algorithms for raster/vector conversion based on the scan model to handle block-like connected components of arbitrary geometrical shapes with multi-level nested doughnuts for the IES (image exploitation system). Both the parallel raster-to-vector algorithm and parallel vector-to-raster algorithm require O(log(n^2)) or O(log^2(n^2)) time (depending on the sorting algorithms used) for images of size n^2 using p = n^2 processors. Not only is the DWT (discrete wavelet transform) useful in data compression, but it also has potential in signal processing, image processing, and graphics. Therefore, it is of great importance to investigate efficient parallelizations of the wavelet transforms. The time complexity of the parallel forward DWT on the parallel virtual machine with linear processor organization is O(((s0+s1)mn)/p), where s0 and s1 are the lengths of the filters and p is the number of processors used. The time complexity of the ...
Date: August 1996
Creator: Zhang, Hua, 1954-
Partner: UNT Libraries

Hybrid Parallelism for Volume Rendering on Large, Multi- and Many-core Systems

Description: With the computing industry trending towards multi- and many-core processors, we study how a standard visualization algorithm, ray-casting volume rendering, can benefit from a hybrid parallelism approach. Hybrid parallelism provides the best of both worlds: using distributed-memory parallelism across a large number of nodes increases available FLOPs and memory, while exploiting shared-memory parallelism among the cores within each node ensures that each node performs its portion of the larger calculation as efficiently as possible. We demonstrate results from weak and strong scaling studies, at levels of concurrency ranging up to 216,000, and with datasets as large as 12.2 trillion cells. The greatest benefit from hybrid parallelism lies in the communication portion of the algorithm, the dominant cost at higher levels of concurrency. We show that reducing the number of participants with a hybrid approach significantly improves performance.
Date: January 1, 2011
Creator: Howison, Mark; Bethel, E. Wes & Childs, Hank
Partner: UNT Libraries Government Documents Department

Final report LDRD project 105816 : model reduction of large dynamic systems with localized nonlinearities.

Description: Advanced computing hardware and software written to exploit massively parallel architectures greatly facilitate the computation of extremely large problems. On the other hand, these tools, though enabling higher fidelity models, have often resulted in much longer run-times and turn-around-times in providing answers to engineering problems. The impediments include smaller elements and consequently smaller time steps, much larger systems of equations to solve, and the inclusion of nonlinearities that had been ignored in days when lower fidelity models were the norm. The research effort reported focuses on accelerating the analysis process for structural dynamics through combinations of model reduction and mitigation of some factors that lead to over-meshing.
Date: October 1, 2009
Creator: Lehoucq, Richard B.; Segalman, Daniel Joseph; Hetmaniuk, Ulrich L. (University of Washington, Seattle, WA) & Dohrmann, Clark R.
Partner: UNT Libraries Government Documents Department

A brief parallel I/O tutorial.

Description: This document provides common best practices for the efficient utilization of parallel file systems for analysts and application developers. A multi-program, parallel supercomputer is able to provide effective compute power by aggregating a host of lower-power processors using a network. The idea, in general, is that one either constructs the application to distribute parts to the different nodes and processors available and then collects the result (a parallel application), or one launches a large number of small jobs, each doing similar work on different subsets (a campaign). The I/O system on these machines is usually implemented as a tightly-coupled, parallel application itself. It is providing the concept of a 'file' to the host applications. The 'file' is an addressable store of bytes and that address space is global in nature. In essence, it is providing a global address space. Beyond the simple reality that the I/O system is normally composed of a small, less capable, collection of hardware, that concept of a global address space will cause problems if not very carefully utilized. How much of a problem and the ways in which those problems manifest will be different, but that it is problem prone has been well established. Worse, the file system is a shared resource on the machine - a system service. What an application does when it uses the file system impacts all users. It is not the case that some portion of the available resource is reserved. Instead, the I/O system responds to requests by scheduling and queuing based on instantaneous demand. Using the system well contributes to the overall throughput on the machine. From a solely self-centered perspective, using it well reduces the time that the application or campaign is subject to impact by others. The developer's goal should be to accomplish I/O in a ...
Date: March 1, 2010
Creator: Ward, H. Lee
Partner: UNT Libraries Government Documents Department

Lightweight storage and overlay networks for fault tolerance.

Description: The next generation of capability-class, massively parallel processing (MPP) systems is expected to have hundreds of thousands to millions of processors. In such environments, it is critical to have fault-tolerance mechanisms, including checkpoint/restart, that scale with the size of applications and the percentage of the system on which the applications execute. For application-driven, periodic checkpoint operations, the state-of-the-art does not provide a scalable solution. For example, on today's massive-scale systems that execute applications which consume most of the memory of the employed compute nodes, checkpoint operations generate I/O that consumes nearly 80% of the total I/O usage. Motivated by this observation, this project aims to improve I/O performance for application-directed checkpoints through the use of lightweight storage architectures and overlay networks. Lightweight storage provides direct access to underlying storage devices. Overlay networks provide caching and processing capabilities in the compute-node fabric. The combination has potential to significantly reduce I/O overhead for large-scale applications. This report describes our combined efforts to model and understand overheads for application-directed checkpoints, as well as implementation and performance analysis of a checkpoint service that uses available compute nodes as a network cache for checkpoint operations.
Date: January 1, 2010
Creator: Oldfield, Ron A.
Partner: UNT Libraries Government Documents Department

Efficient Linked List Ranking Algorithms and Parentheses Matching as a New Strategy for Parallel Algorithm Design

Description: The goal of a parallel algorithm is to solve a single problem using multiple processors working together and to do so in an efficient manner. In this regard, there is a need to categorize strategies in order to solve broad classes of problems with similar structures and requirements. In this dissertation, two parallel algorithm design strategies are considered: linked list ranking and parentheses matching.
Date: December 1993
Creator: Halverson, Ranette Hudson
Partner: UNT Libraries
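
Note on list ranking: the record above names linked list ranking without describing it. A common textbook illustration is pointer jumping, and the sketch below is a sequential Python simulation of that technique. It is not the algorithms developed in the dissertation. The function name `list_rank` and the successor-array representation (the tail points to itself) are assumptions made for illustration only.

```python
def list_rank(succ):
    """Return rank[i], the number of links from node i to the tail.

    succ[i] is the index of the node that follows i; the tail points to
    itself. Hypothetical helper for illustration, not from the dissertation.
    """
    n = len(succ)
    nxt = list(succ)
    # Initially each non-tail node is one link away from its successor.
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    # Each pass simulates one synchronous parallel step: every node adds
    # its successor's rank and then jumps over that successor. Pointer
    # spans double each pass, so about log2(n) passes are needed.
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # Build both arrays from the old values so all nodes update "at once".
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

For the list 0 -> 2 -> 3 -> 1 stored as `succ = [2, 1, 3, 1]`, the call `list_rank(succ)` gives the distance of every node from the tail (node 1).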

Parallelization of an unstructured grid, hydrodynamic-diffusion code

Description: We describe the parallelization of a three-dimensional, unstructured grid, finite element code which solves hyperbolic conservation laws for mass, momentum, and energy, and diffusion equations modeling heat conduction and radiation transport. Explicit temporal differencing advances the cell-based gasdynamic equations. Diffusion equations use fully implicit differencing of nodal variables which leads to large, sparse, symmetric, and positive definite matrices. Because of the unstructured grid, the off-diagonal non-zero elements appear in unpredictable locations. The linear systems are solved using parallelized conjugate gradients. The code is parallelized by domain decomposition of physical space into disjoint subdomains (SDs). Each processor receives its own SD plus a border of ghost cells. Results are presented on a problem coupling hydrodynamics to non-linear heat cond
Date: May 20, 1998
Creator: Milovich, J L & Shestakov, A
Partner: UNT Libraries Government Documents Department
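
Note on conjugate gradients: the record above says its symmetric positive definite systems are solved with parallelized conjugate gradients. The following is a minimal sequential sketch of the textbook conjugate gradient iteration, assuming a small dense matrix stored as a list of lists. It is not the code described in the report, and the name `conjugate_gradient` and its parameters are illustrative assumptions.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A.

    A is a dense list of rows and b is a list; both are illustrative
    stand-ins for the sparse, distributed data of a real code.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    x = [0.0] * n
    r = list(b)          # residual b - A x for the starting guess x = 0
    p = list(r)          # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:      # stop once the residual norm is small
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction is the residual made A-conjugate to the old one.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

In a domain-decomposed setting, the matrix-vector product and the dot products are the steps that typically need communication between subdomains; the vector updates are local.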

Odyssey

Description: We present results obtained with the Odyssey simulation code. Odyssey is a 1-, 2-, and 3-dimensional AMR code using Cartesian, cylindrical, and spherical coordinates. The results provide an interesting snapshot of Odyssey at this point in its development. Results include parallel performance and scaling, Eulerian hydrodynamics algorithm comparisons, ADI-based diffusion solvers on hierarchical meshes, and ECB treatment of material interfaces in diffusion solves.
Date: October 1, 1998
Creator: Braddy, D.; Brown, S.; Cook, G.; Kueny, C.; Lambert, M.; Peters, D. et al.
Partner: UNT Libraries Government Documents Department

Simulations of implosions with a 3D, parallel, unstructured-grid, radiation-hydrodynamics code

Description: An unstructured-grid, radiation-hydrodynamics code is used to simulate implosions. Although most of the problems are spherically symmetric, they are run on 3D, unstructured grids in order to test the code's ability to maintain spherical symmetry of the converging waves. Three problems, of increasing complexity, are presented. In the first, a cold, spherical, ideal gas bubble is imploded by an enclosing high pressure source. For the second, we add non-linear heat conduction and drive the implosion with twelve laser beams centered on the vertices of an icosahedron. In the third problem, a NIF capsule is driven with a Planckian radiation source.
Date: December 28, 1998
Creator: Kaiser, T. B.; Milovich, J. L.; Prasad, M. K.; Rathkopf, J. & Shestakov, A. I.
Partner: UNT Libraries Government Documents Department

Overcoming Scalability Challenges for Tool Daemon Launching

Description: Many tools that target parallel and distributed environments must co-locate a set of daemons with the distributed processes of the target application. However, efficient and portable deployment of these daemons on large scale systems is an unsolved problem. We overcome this gap with LaunchMON, a scalable, robust, portable, secure, and general purpose infrastructure for launching tool daemons. Its API allows tool builders to identify all processes of a target job, launch daemons on the relevant nodes and control daemon interaction. Our results show that LaunchMON scales to very large daemon counts and substantially enhances performance over existing ad hoc mechanisms.
Date: February 15, 2008
Creator: Ahn, D H; Arnold, D C; de Supinski, B R; Lee, G L; Miller, B P & Schulz, M
Partner: UNT Libraries Government Documents Department

Massively Parallel Direct Simulation of Multiphase Flow

Description: The authors' understanding of multiphase physics and the associated predictive capability for multi-phase systems are severely limited by current continuum modeling methods and experimental approaches. This research will deliver an unprecedented modeling capability to directly simulate three-dimensional multi-phase systems at the particle-scale. The model solves the fully coupled equations of motion governing the fluid phase and the individual particles comprising the solid phase using a newly discovered, highly efficient coupled numerical method based on the discrete-element method and the Lattice-Boltzmann method. A massively parallel implementation will enable the solution of large, physically realistic systems.
Date: August 10, 2000
Creator: Cook, Benjamin K.; Preece, Dale S. & Williams, J. R.
Partner: UNT Libraries Government Documents Department

Programming in Fortran M

Description: Fortran M is a small set of extensions to Fortran that supports a modular approach to the construction of sequential and parallel programs. Fortran M programs use channels to plug together processes which may be written in Fortran M or Fortran 77. Processes communicate by sending and receiving messages on channels. Channels and processes can be created dynamically, but programs remain deterministic unless specialized nondeterministic constructs are used. Fortran M programs can execute on a range of sequential, parallel, and networked computers. This report incorporates both a tutorial introduction to Fortran M and a users guide for the Fortran M compiler developed at Argonne National Laboratory. The Fortran M compiler, supporting software, and documentation are made available free of charge by Argonne National Laboratory, but are protected by a copyright which places certain restrictions on how they may be redistributed. See the software for details. The latest version of both the compiler and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/fortran-m at info.mcs.anl.gov.
Date: August 1993
Creator: Foster, Ian; Olson, Robert & Tuecke, Steven
Partner: UNT Libraries Government Documents Department

Programming in Fortran M Revision 1

Description: Fortran M is a small set of extensions to Fortran that supports a modular approach to the construction of sequential and parallel programs. Fortran M programs use channels to plug together processes which may be written in Fortran M or Fortran 77. Processes communicate by sending and receiving messages on channels. Channels and processes can be created dynamically, but programs remain deterministic unless specialized nondeterministic constructs are used. Fortran M programs can execute on a range of sequential, parallel, and networked computers. This report incorporates both a tutorial introduction to Fortran M and a users guide for the Fortran M compiler developed at Argonne National Laboratory. The Fortran M compiler, supporting software, and documentation are made available free of charge by Argonne National Laboratory, but are protected by a copyright which places certain restrictions on how they may be redistributed. See the software for details. The latest version of both the compiler and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/fortran-m at info.mcs.anl.gov.
Date: October 1993
Creator: Foster, Ian; Olson, Robert & Tuecke, Steven
Partner: UNT Libraries Government Documents Department

Using a Transfer Function to Describe the Load-Balancing Problem

Description: The dynamic load-balancing problem for mesh-connected parallel computers can be clearly described by introducing a function that identifies how much work is to be transmitted between neighboring processors. This function is a solution to an elliptic problem for which a wealth of knowledge exists. The non-uniqueness of the solution to the load-balancing problem is made explicit.
Date: November 1993
Creator: Conley, Andrew J.
Partner: UNT Libraries Government Documents Department
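
Note on the transfer function: the record above describes a function giving how much work moves between neighboring processors. The sketch below works this out for the simplest case, assuming a one-dimensional chain of processors rather than the general mesh treated in the report. The helper names `chain_transfers` and `apply_transfers` are hypothetical. On a chain the transfers are fixed by a running sum of each processor's surplus over the mean. On a mesh with cycles, any flow circulating around a loop can be added without changing the final loads, which is one way to see why the solution is not unique.

```python
def chain_transfers(work):
    """Work to send from processor i to processor i + 1 on a 1-D chain.

    A negative value means processor i + 1 sends to processor i.
    Hypothetical helper for illustration, not code from the report.
    """
    mean = sum(work) / len(work)
    transfers = []
    surplus = 0.0
    for w in work[:-1]:
        surplus += w - mean      # total excess held by processors 0..i
        transfers.append(surplus)
    return transfers


def apply_transfers(work, transfers):
    """Return the loads after every neighbor-to-neighbor transfer."""
    balanced = list(work)
    for i, t in enumerate(transfers):
        balanced[i] -= t
        balanced[i + 1] += t
    return balanced
```

For example, with loads `[8, 2, 2, 4]` the mean is 4, and applying the result of `chain_transfers` through `apply_transfers` leaves every processor with a load of 4.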

Users manual for the Chameleon Parallel Programming Tools

Description: Message passing is a common method for writing programs for distributed-memory parallel computers. Unfortunately, the lack of a standard for message passing has hampered the construction of portable and efficient parallel programs. In an attempt to remedy this problem, a number of groups have developed their own message-passing systems, each with its own strengths and weaknesses. Chameleon is a second-generation system of this type. Rather than replacing these existing systems, Chameleon is meant to supplement them by providing a uniform way to access many of these systems. Chameleon's goals are to (a) be very lightweight (low overhead), (b) be highly portable, and (c) help standardize program startup and the use of emerging message-passing operations such as collective operations on subsets of processors. Chameleon also provides a way to port programs written using PICL or Intel NX message passing to other systems, including collections of workstations. Chameleon is tracking the Message-Passing Interface (MPI) draft standard and will provide both an MPI implementation and an MPI transport layer. Chameleon provides support for heterogeneous computing by using p4 and PVM. Chameleon's support for homogeneous computing includes the portable libraries p4, PICL, and PVM and vendor-specific implementations for Intel NX, IBM EUI (SP-1), and Thinking Machines CMMD (CM-5). Support for Ncube and PVM 3.x is also under development.
Date: June 1993
Creator: Gropp, William & Smith, Barry
Partner: UNT Libraries Government Documents Department

A distributed logic memory with two-dimensional access, as applied to a highly parallel processor

Description: Although more sophisticated designs of associative memories are not yet economically practical, with the dynamic advances in integrated circuitry currently taking place, the day appears not long off for an economical sophisticated associative memory to become a reality. This work gives a general outline of a sophisticated distributed logic memory (DLM) and also describes the actual logic involved in building a working model. The design process involves formulating a set of commands sufficient to perform the desired algorithms, developing the logic necessary to implement these commands, and finally constructing a working model to test the logic.
Date: May 1976
Creator: Redwine, William V.
Partner: UNT Libraries