139 Matching Results

A new scheduling algorithm for parallel sparse LU factorization with static pivoting

Description: In this paper we present a static scheduling algorithm for parallel sparse LU factorization with static pivoting. The algorithm is divided into mapping and scheduling phases, using the symmetric pruned graphs of L' and U to represent dependencies. The scheduling algorithm drives the parallel execution of the factorization on a distributed-memory architecture. Experimental results and comparisons with SuperLU_DIST are reported after applying this algorithm to real-world application matrices on an IBM SP RS/6000 distributed-memory machine.
Date: August 20, 2002
Creator: Grigori, Laura & Li, Xiaoye S.
Partner: UNT Libraries Government Documents Department

Expanding symmetric multiprocessor capability through gang scheduling

Description: Symmetric Multiprocessor (SMP) systems normally provide both space-sharing and time-sharing to ensure high system utilization and good responsiveness. However, the prevailing lack of concurrent scheduling for parallel programs precludes SMP use in addressing many large-scale problems. Tightly synchronized communications are impractical, and normal time-sharing reduces the benefit of cache memory. Evidence gathered at Lawrence Livermore National Laboratory (LLNL) indicates that gang scheduling can increase the capability of SMP systems and parallel program performance without adverse impact upon system utilization or responsiveness.
Date: March 1, 1998
Creator: Jette, M.A.
Partner: UNT Libraries Government Documents Department

Feed Materials Production Center. Final phase-in report volume 9 of 15 management control system, October 25, 1985--December 31, 1985

Description: Well-planned work is the key to success in managing a project or facility. Many large or small projects have management systems that are excellent at collecting history, such as monthly costs or shipments, while other systems produce spectacular plans for estimated costs and schedules. However, one of the tools for making management decisions is a control system that describes authorized work, schedules near-term small increments of the work, and identifies all resources needed to accomplish the work. That is the planning phase. Then, as the work is in process, the system periodically reports the status of the schedule and the cost. A comparison of plan to actual is then analyzed, and any significant variances are identified for management action, should it be required. The WMCO/DOE contract has included the requirement to implement a management control system known as Cost and Schedule Control System Criteria (C/SCSC), which is a system defined by DOE Orders. Thus, the intent of Task 9 of the WMCO Transition Plan was to study the management control system in place at FMPC and to prepare a plan that would enhance the system. The objective of the revised management control system would be to provide a usable management tool to measure accomplishments against plans and to identify problem areas where management attention is needed. The system would also satisfy C/SCSC. This document presents the reports on the work tasks formulated to be part of the management system studies.
Date: January 17, 1986
Creator: North, C.L.
Partner: UNT Libraries Government Documents Department

Optimal time-critical scheduling via resource augmentation

Description: We consider two fundamental problems in dynamic scheduling: scheduling to meet deadlines in a preemptive multiprocessor setting, and scheduling to provide good response time in a number of scheduling environments. When viewed from the perspective of traditional worst-case analysis, no good on-line algorithms exist for these problems, and for some variants no good off-line algorithms exist unless P = NP. We study these problems using a relaxed notion of competitive analysis, introduced by Kalyanasundaram and Pruhs, in which the on-line algorithm is allowed more resources than the optimal off-line algorithm to which it is compared. Using this approach, we establish that several well-known on-line algorithms that have poor performance from an absolute worst-case perspective are optimal for the problems in question when allowed moderately more resources. For the optimization of average flow time, these are the first results of any sort, for any NP-hard version of the problem, that indicate that it might be possible to design good approximation algorithms.
Date: April 1, 1997
Creator: Phillips, C.A.; Stein, C.; Torng, E. & Wein, J.
Partner: UNT Libraries Government Documents Department

The interplay between call flow dynamics and the dissemination of QoS routing updates

Description: In this paper, the authors study the interplay between flow dynamics and Quality of Service (QoS) routing by examining its impact on call blocking probability in the context of the ATM PNNI protocol. The PNNI specification consists of a routing protocol, based upon OSPF, and a signaling protocol, based upon the ITU-T's B-ISDN signaling, i.e., Q.2931. In PNNI routing, the routing information exchanged includes link state information as well as ATM QoS state information such as maximum cell transfer delay (maxCTD), cell delay variation (CDV), and available cell rate (ACR). The exchange of routing information is done by controlled flooding. In PNNI, when a flow arrives at the entry of the network, the source switch uses its local view of the network to select a path which meets the flow's QoS requirements. If it cannot find a suitable path, the Generic Call Admission Control (GCAC) of the source switch rejects the flow. If a suitable path is found, the flow set-up procedure is invoked and every switch along the path performs Actual Connection Admission Control (ACAC) to determine whether it has the requested resources. If not, the flow is rejected. Otherwise, the resources are reserved. For very large networks, PNNI also supports recursive hierarchical routing. However, due to the additional complexity of aggregating topology as well as QoS metrics, they consider only non-hierarchical networks in this study. Based on a simplified version of PNNI, they examine the relationship between the frequency of QoS state updates, the QoS-routing related control traffic overhead, and the call blocking probability. For instance, if the source switch uses out-of-date information to select a path, a false blocking situation (i.e., the local routing table's view of the network does not reflect the current increased resource availability), or a false probing situation (i.e., the local routing table's ...
Date: June 1, 1998
Creator: Tsang, Rose P.
Partner: UNT Libraries Government Documents Department
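The two-stage admission sequence this abstract walks through (GCAC at the source switch against its local view, then ACAC hop by hop during setup) can be sketched as below. All function names, the link-state representation, and the bandwidth-only admission test are illustrative assumptions, not the PNNI specification's procedures:

```python
def gcac(local_view, path, required_bw):
    """Generic CAC: source-switch check against its local, possibly stale view."""
    return all(local_view.get(link, 0) >= required_bw for link in path)

def acac(actual_state, path, required_bw):
    """Actual CAC: hop-by-hop check against real link state, then reservation."""
    if any(actual_state.get(link, 0) < required_bw for link in path):
        return False                        # some switch lacks the resources
    for link in path:
        actual_state[link] -= required_bw   # reserve on every link of the path
    return True

def admit_flow(local_view, actual_state, path, required_bw):
    if not gcac(local_view, path, required_bw):
        return "rejected by GCAC"           # false blocking if the view is pessimistic
    if not acac(actual_state, path, required_bw):
        return "rejected during setup"      # false probing: the view was optimistic
    return "admitted"

# A stale local view that is more optimistic than the actual link state
# triggers the false-probing case described in the abstract.
local = {("A", "B"): 10, ("B", "D"): 10}
actual = {("A", "B"): 10, ("B", "D"): 3}
print(admit_flow(local, actual, [("A", "B"), ("B", "D")], 5))  # rejected during setup
```

The split makes the paper's trade-off visible: less frequent QoS state updates mean a staler `local_view`, which shifts rejections from cheap GCAC decisions to expensive mid-setup ACAC failures.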

Cost-effective data-parallel load balancing

Description: Load balancing algorithms improve a program's performance on unbalanced datasets, but can degrade performance on balanced datasets because unnecessary load redistributions occur. This paper presents a cost-effective data-parallel load balancing algorithm which performs load redistributions only when the possible savings outweigh the redistribution costs. Experiments with a data-parallel polygon renderer show a performance improvement of up to a factor of 33 on unbalanced datasets and a maximum performance loss of only 27 percent on balanced datasets when using this algorithm.
Date: May 1, 1995
Creator: Hansen, C.D. & Ahrens, J.P.
Partner: UNT Libraries Government Documents Department
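The cost test at the heart of this abstract (redistribute only when predicted savings exceed redistribution cost) can be sketched as follows. The cost model and all names here are illustrative assumptions, not the paper's actual algorithm:

```python
def should_redistribute(loads, per_item_cost, move_cost_per_item):
    """Decide whether rebalancing pays off for one parallel step.

    loads: number of work items currently assigned to each processor.
    per_item_cost: time to process one item.
    move_cost_per_item: time to ship one item to another processor.
    """
    current_time = max(loads) * per_item_cost        # slowest processor dominates
    balanced = sum(loads) / len(loads)
    balanced_time = balanced * per_item_cost
    # Items that would have to move to reach a balanced assignment.
    items_moved = sum(l - balanced for l in loads if l > balanced)
    redistribution_cost = items_moved * move_cost_per_item
    savings = current_time - balanced_time
    return savings > redistribution_cost

# An unbalanced step: rebalancing is worthwhile.
print(should_redistribute([100, 10, 10, 10], 1.0, 0.5))  # True
# A nearly balanced step: the test declines, avoiding the needless cost.
print(should_redistribute([33, 32, 33, 32], 1.0, 0.5))   # False
```

This is exactly the behavior the reported numbers reflect: large wins on unbalanced datasets, and only a bounded loss on balanced ones because the test suppresses pointless redistributions.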

Components and interfaces of a process management system for parallel programs.

Description: Parallel jobs are different from sequential jobs and require a different type of process management. We present here a process management system for parallel programs such as those written using MPI. A primary goal of the system, which we call MPD (for multipurpose daemon), is to be scalable. By this we mean that startup of interactive parallel jobs comprising thousands of processes is quick, that signals can be quickly delivered to processes, and that stdin, stdout, and stderr are managed intuitively. Our primary target is parallel machines made up of clusters of SMPs, but the system is also useful in more tightly integrated environments. We describe how MPD enables much faster startup and better runtime management of parallel jobs. We show how close control of stdio can support the easy implementation of a number of convenient system utilities, even a parallel debugger. We describe a simple but general interface that can be used to separate any process manager from a parallel library, which we use to keep MPD separate from MPICH.
Date: February 23, 2001
Creator: Butler, R.; Gropp, W. & Lusk, E.
Partner: UNT Libraries Government Documents Department

Scheduling jobs that arrive over time

Description: A natural and basic problem in scheduling theory is to provide good average quality of service to a stream of jobs that arrive over time. In this paper we consider the problem of scheduling n jobs that are released over time in order to minimize the average completion time of the set of jobs. In contrast to the problem of minimizing average completion time when all jobs are available at time 0, all the problems that we consider are NP-hard, and essentially nothing was known about constructing good approximations in polynomial time. We give the first constant-factor approximation algorithms for several variants of the single and parallel machine model. Many of the algorithms are based on interesting algorithmic and structural relationships between preemptive and nonpreemptive schedules and linear programming relaxations of both. Many of the algorithms generalize to the minimization of average weighted completion time as well.
Date: April 6, 1995
Creator: Phillips, C.; Stein, C. & Wein, J.
Partner: UNT Libraries Government Documents Department
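The "relationships between preemptive and nonpreemptive schedules" mentioned above can be illustrated with a conversion technique commonly used in this line of work: run preemptive SRPT (shortest remaining processing time) on one machine, then schedule jobs nonpreemptively in the order of their preemptive completion times. This is a hedged sketch of that general idea, not a reproduction of the paper's algorithms:

```python
import heapq

def srpt_completion_order(jobs):
    """Simulate preemptive SRPT on one machine; jobs are (id, release, size).
    Returns job ids in order of preemptive completion."""
    events = sorted(jobs, key=lambda j: j[1])       # by release time
    heap, order, t, i = [], [], 0, 0
    while len(order) < len(jobs):
        if not heap and events[i][1] > t:
            t = events[i][1]                        # idle until the next release
        while i < len(events) and events[i][1] <= t:
            jid, _, p = events[i]
            heapq.heappush(heap, (p, jid))
            i += 1
        rem, jid = heapq.heappop(heap)
        next_r = events[i][1] if i < len(events) else float("inf")
        if t + rem <= next_r:                       # finishes before next release
            t += rem
            order.append(jid)
        else:                                       # preempt at the next release
            heapq.heappush(heap, (rem - (next_r - t), jid))
            t = next_r
    return order

def nonpreemptive_schedule(jobs, order):
    """List-schedule nonpreemptively in the given order; return completion times."""
    info = {jid: (r, p) for jid, r, p in jobs}
    t, completions = 0, {}
    for jid in order:
        r, p = info[jid]
        t = max(t, r) + p
        completions[jid] = t
    return completions

jobs = [("long", 0, 5), ("short", 1, 1)]
order = srpt_completion_order(jobs)   # the short job completes first under SRPT
print(nonpreemptive_schedule(jobs, order))  # short finishes at 2, long at 7
```

The point of the conversion is that the preemptive schedule, which is easy to compute optimally on one machine, certifies an ordering whose nonpreemptive completion times are provably close to optimal.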

Performance characteristics of gang scheduling in multiprogrammed environments

Description: Gang scheduling provides both space-slicing and time-slicing of computer resources for parallel programs. Each thread of execution from a parallel job is concurrently scheduled on an independent processor in order to achieve an optimal level of program performance. Time-slicing of parallel jobs provides for better overall system responsiveness and utilization than otherwise possible. Lawrence Livermore National Laboratory has deployed three generations of its gang scheduler on a variety of computing platforms. Results indicate the potential benefits of this technology to parallel processing are no less significant than those of time-sharing in the 1960s.
Date: November 1, 1997
Creator: Jette, M.A.
Partner: UNT Libraries Government Documents Department

QoS routing via multiple paths using bandwidth reservation

Description: The authors address the problem of computing a multipath, consisting of possibly overlapping paths, to transmit data from the source node s to the destination node d over a computer network while ensuring deterministic bounds on end-to-end delay or delivery rate. They consider two generic routing problems within a framework wherein bandwidth can be reserved, and guaranteed once reserved, on various links of the communication network. The first problem requires that a message of finite length be transmitted from s to d within τ units of time. The second problem requires that a sequential message of r units be transmitted at a rate of η such that the maximum time difference between two units that are received out of order is no more than q. They propose a polynomial-time algorithm for the first problem based on an adaptation of the classical Ford-Fulkerson method. They present simulation results to illustrate the applicability of the proposed algorithm. They show the second problem to be NP-complete, and propose a polynomial-time approximate solution.
Date: January 1, 1998
Creator: Rao, N.S.V. & Batsell, S.G.
Partner: UNT Libraries Government Documents Department
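One way to see why a Ford-Fulkerson-style method fits the first problem: with reserved bandwidths as link capacities, a max-flow computation bounds the aggregate rate available from s to d over all (possibly overlapping) paths, and a message of finite length can fit within τ time units only if rate × τ covers it. The sketch below is an illustrative simplification under that assumption (it ignores per-path propagation delay), not the authors' algorithm:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow; cap maps node -> {neighbor: reserved bandwidth}."""
    residual = {}
    for u, vs in cap.items():                       # build residual graph
        for v, c in vs.items():
            residual.setdefault(u, {})[v] = residual.get(u, {}).get(v, 0) + c
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:                # BFS for an augmenting path
            u = q.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        aug, v = float("inf"), t                    # bottleneck on the path
        while parent[v] is not None:
            aug = min(aug, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:                # push flow along the path
            residual[parent[v]][v] -= aug
            residual[v][parent[v]] += aug
            v = parent[v]
        flow += aug

def deliverable(cap, s, d, length, tau):
    """Can a message of `length` units reach d within `tau` time units?"""
    return max_flow(cap, s, d) * tau >= length

links = {"s": {"a": 3, "b": 2}, "a": {"d": 2}, "b": {"d": 2}}
print(deliverable(links, "s", "d", 8, 2))   # True: max flow is 4, and 4*2 >= 8
print(deliverable(links, "s", "d", 9, 2))   # False
```

Note how the multipath nature of the abstract falls out for free: the max flow of 4 here uses both the s-a-d and s-b-d paths, more than any single path could reserve.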

Load balancing fictions, falsehoods and fallacies

Description: Effective use of a parallel computer requires that a calculation be carefully divided among the processors. This load balancing problem appears in many guises and has been a fervent area of research for the past decade or more. Although great progress has been made, and useful software tools developed, a number of challenges remain. It is the conviction of the author that these challenges will be easier to address if programmers first come to terms with some significant shortcomings in their current perspectives. This paper tries to identify several areas in which the prevailing point of view is either mistaken or insufficient. The goal is to motivate new ideas and directions for this important field.
Date: May 30, 2000
Partner: UNT Libraries Government Documents Department

Task parallelism and high-performance languages

Description: The definition of High Performance Fortran (HPF) is a significant event in the maturation of parallel computing: it represents the first parallel language that has gained widespread support from vendors and users. The subject of this paper is the incorporation of support for task parallelism. The term task parallelism refers to the explicit creation of multiple threads of control, or tasks, which synchronize and communicate under programmer control. Task and data parallelism are complementary rather than competing programming models. While task parallelism is more general and can be used to implement algorithms that are not amenable to data-parallel solutions, many problems can benefit from a mixed approach, with, for example, a task-parallel coordination layer integrating multiple data-parallel computations. Other problems admit both data- and task-parallel solutions, with the better solution depending on machine characteristics, compiler performance, or personal taste. For these reasons, we believe that a general-purpose high-performance language should integrate both task- and data-parallel constructs. The challenge is to do so in a way that provides the expressivity needed for applications, while preserving the flexibility and portability of a high-level language. In this paper, we examine and illustrate the considerations that motivate the use of task parallelism. We also describe one particular approach to task parallelism in Fortran, namely the Fortran M extensions. Finally, we contrast Fortran M with other proposed approaches and discuss the implications of this work for task parallelism and high-performance languages.
Date: March 1, 1996
Creator: Foster, I.
Partner: UNT Libraries Government Documents Department

Quantum robots plus environments.

Description: A quantum robot is a mobile quantum system, including an on-board quantum computer and needed ancillary systems, that interacts with an environment of quantum systems. Quantum robots carry out tasks whose goals include making specified changes in the state of the environment or carrying out measurements on the environment. The environments considered so far (oracles, databases, and quantum registers) are seen to be special cases of the environments considered here. It is also seen that a quantum robot should include a quantum computer and cannot be simply a multistate head. A model of quantum robots and their interactions is discussed in which each task, as a sequence of alternating computation and action phases, is described by a unitary single-time-step operator T ≈ T_a + T_c (discrete space and time are assumed). The overall system dynamics is described as a sum over paths of completed computation (T_c) and action (T_a) phases. A simple example of a task, measuring the distance between the quantum robot and a particle on a 1D lattice with quantum phase path dispersion present, is analyzed. A decision diagram for the task is presented and analyzed.
Date: July 23, 1998
Creator: Benioff, P.
Partner: UNT Libraries Government Documents Department

The ANL/IBM SP scheduling system

Description: Approximately five years ago scientists discovered that modern UNIX workstations connected with ethernet and fiber networks could provide enough computational performance to compete with supercomputers. As this concept became increasingly popular, the need for distributed queuing and scheduling systems became apparent. Systems such as DQS from Florida State University were developed and worked very well. Today, however, supercomputers such as Argonne National Laboratory's IBM SP system can provide more CPU and networking speed than can be obtained from these networks of workstations. Nevertheless, because modern supercomputers look like clusters of workstations, developers felt that the scheduling systems previously used on clusters of workstations should still apply. After trying to apply some of these scheduling systems to Argonne's SP environment, it became obvious that these two computing environments have very different scheduling needs. Recognizing this need, and realizing that no one had addressed it, we at Argonne developed a new scheduling system. The approach taken in creating this system was unique in that user input and interaction were encouraged throughout the development process. Thus a scheduler was built that actually works the way the users want it to.
Date: February 1, 1995
Creator: Lifka, D.
Partner: UNT Libraries Government Documents Department

The network-enabled optimization system server

Description: Mathematical optimization is a technology under constant change and advancement, drawing upon the most efficient and accurate numerical methods to date. Further, these methods can be tailored for a specific application or generalized to accommodate a wider range of problems. This perpetual change creates an ever-growing field, one that is often difficult to stay abreast of. Hence the impetus behind the Network-Enabled Optimization System (NEOS) server, which aims to provide users, both novice and expert, with a guided tour through the expanding world of optimization. The NEOS server is responsible for bridging the gap between users and the optimization software they seek. More specifically, the NEOS server will accept optimization problems over the Internet and return a solution to the user either interactively or by e-mail. This paper discusses the current implementation of the server.
Date: August 1, 1995
Creator: Mesnier, M.P.
Partner: UNT Libraries Government Documents Department

A study of application sensitivity to variation in message passing latency and bandwidth

Description: This study measures the effects of changes in message latency and bandwidth for production-level codes on a current-generation tightly coupled MPP, the Intel Paragon. Messages are sent multiple times to study the application sensitivity to variations in bandwidth and latency. This method preserves the effects of contention on the interconnection network. Two applications are studied: PCTH, a shock physics code developed at Sandia National Laboratories, and PSTSWM, a spectral shallow water code developed at Oak Ridge National Laboratory and Argonne National Laboratory. These codes are significant in that PCTH is a "full physics" application code in production use, while PSTSWM serves as a parallel algorithm test bed and benchmark for production codes used in atmospheric modeling. They are also significant in that the message-passing behavior differs significantly between the two codes, each representing an important class of scientific message-passing applications.
Date: June 1, 1996
Creator: Worley, P.H.; Mackay, D.R.; Robinson, A.C. & Barragy, E.J.
Partner: UNT Libraries Government Documents Department

Users guide to the Argonne SP scheduling system

Description: During the past five years scientists discovered that modern UNIX workstations connected with ethernet and fiber networks could provide enough computational performance to compete with the supercomputers of the day. As this concept became increasingly popular, the need for distributed queuing and scheduling systems became apparent. Today, supercomputers such as Argonne National Laboratory's IBM SP system can provide more CPU and networking speed than can be obtained from these networks of workstations. These modern supercomputers look like clusters of workstations, however, so developers felt that the scheduling systems that were previously used on clusters of workstations should still apply. After trying to apply some of these scheduling systems to Argonne's SP environment, it became obvious that these two computing environments have very different scheduling needs. Recognizing this need and realizing that no one had addressed it, we developed a new scheduling system. The approach taken in creating this system was unique in that user input and interaction were encouraged throughout the development process. Thus, a scheduler was built that actually worked the way the users wanted it to work. This document serves a dual purpose: it is both a user's guide and an administrator's guide for the ANL SP scheduling system. Watch for revisions to this guide.
Date: May 1, 1995
Creator: Lifka, D.A.; Henderson, M.W. & Rayl, K.
Partner: UNT Libraries Government Documents Department

MPI as a coordination layer for communicating HPF tasks

Description: Data-parallel languages such as High Performance Fortran (HPF) present a simple execution model in which a single thread of control performs high-level operations on distributed arrays. These languages can greatly ease the development of parallel programs. Yet there are large classes of applications for which a mixture of task and data parallelism is most appropriate. Such applications can be structured as collections of data-parallel tasks that communicate by using explicit message passing. Because the Message Passing Interface (MPI) defines standardized, familiar mechanisms for this communication model, the authors propose that HPF tasks communicate by making calls to a coordination library that provides an HPF binding for MPI. The semantics of a communication interface for sequential languages can be ambiguous when the interface is invoked from a parallel language; they show how these ambiguities can be resolved by describing one possible HPF binding for MPI. They then present the design of a library that implements this binding, discuss issues that influenced the design decisions, and evaluate the performance of a prototype HPF/MPI library using a communications microbenchmark and application kernel. Finally, they discuss how MPI features might be incorporated into the design framework.
Date: December 31, 1996
Creator: Foster, I.T.; Kohr, D.R. Jr.; Krishnaiyer, R. & Choudhary, A.
Partner: UNT Libraries Government Documents Department

Optimal randomized scheduling by replacement

Description: In the replacement scheduling problem, a system is composed of n processors drawn from a pool of p. The processors can become faulty while in operation, and faulty processors never recover. A report is issued whenever a fault occurs. This report states only the existence of a fault but does not indicate its location. Based on this report, the scheduler can reconfigure the system and choose another set of n processors. The system operates satisfactorily as long as, upon report of a fault, the scheduler chooses n non-faulty processors. We provide a randomized protocol maximizing the expected number of faults the system can sustain before the occurrence of a crash. The optimality of the protocol is established by considering a closely related dual optimization problem. The game-theoretic technical difficulties that we solve in this paper are very general and are encountered whenever one proves the optimality of a randomized algorithm in parallel and distributed computation.
Date: May 1, 1996
Creator: Saias, I.
Partner: UNT Libraries Government Documents Department

Dependence driven execution for multiprogrammed multiprocessor

Description: Barrier synchronizations can be very expensive in a multiprogramming environment because no process can go past a barrier until all the processes have arrived. If a process participating in a barrier is swapped out by the operating system, the rest of the participating processes end up waiting for the swapped-out process. This paper presents a compile-time/run-time system that uses dependence-driven execution to overlap the execution of computations separated by barriers so that the processes do not spend most of their time idling at the synchronization point.
Date: December 31, 1998
Creator: Vajracharya, S. & Grunwald, D.
Partner: UNT Libraries Government Documents Department

Instruction-level performance modeling and characterization of multimedia applications

Description: One of the challenges in characterizing and modeling realistic multimedia applications is the lack of access to source code. On-chip performance counters effectively resolve this problem by monitoring run-time behavior at the instruction level. This paper presents a novel technique for characterizing and modeling workloads at the instruction level for realistic multimedia applications using hardware performance counters. A variety of instruction counts are collected from multimedia applications such as RealPlayer, GSM Vocoder, MPEG encoder/decoder, and a speech synthesizer. These instruction counts can be used to form a set of abstract characteristic parameters directly related to a processor's architectural features. Based on microprocessor architectural constraints and these calculated abstract parameters, the architectural performance bottleneck for a specific application can be estimated. Meanwhile, the bottleneck estimation can provide suggestions about viable architectural/functional improvements for certain workloads. The biggest advantage of this new characterization technique is a better understanding of processor utilization efficiency and the architectural bottleneck for each application. The technique also provides predictive insight into future architectural enhancements and their effect on current codes. In this paper the authors also attempt to model architectural effects on processor utilization without memory influence. They derive formulas for calculating CPI_0, the CPI without memory effects, and they quantify the utilization of architectural parameters. These equations are architecturally diagnostic and predictive in nature. The results show promise for code characterization and empirical/analytical modeling.
Date: June 1999
Creator: Luo, Y. & Cameron, K. W.
Partner: UNT Libraries Government Documents Department
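The CPI_0 idea in this abstract (CPI with the memory system factored out) can be sketched from counter totals. The counter names and the simple additive stall model below are assumptions for illustration; real counters, and the degree to which memory stalls overlap with useful work, are machine-specific:

```python
def cpi(cycles, instructions):
    """Plain cycles-per-instruction from two counter totals."""
    return cycles / instructions

def cpi_without_memory(cycles, instructions, mem_stall_cycles):
    """Approximate CPI_0 by subtracting an estimated memory-stall component,
    i.e., the CPI the core would achieve with a perfect memory system."""
    return (cycles - mem_stall_cycles) / instructions

# Hypothetical counter readings for one application run.
counters = {"cycles": 2_000_000, "instructions": 1_000_000,
            "mem_stall_cycles": 600_000}
print(cpi(counters["cycles"], counters["instructions"]))              # 2.0
print(cpi_without_memory(counters["cycles"], counters["instructions"],
                         counters["mem_stall_cycles"]))               # 1.4
```

Comparing the two numbers is what makes the technique diagnostic: a large gap points at the memory hierarchy as the bottleneck, while a high CPI_0 alone points at the core's architectural limits.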

Performance measurement and analysis techniques for parallel and distributed programs. Technical progress report, August 1, 1996--May 31, 1997

Description: This report summarizes the authors' technical progress during the first year of their three-year proposal. This research is centered on the Paradyn Parallel Performance Tools. The report includes a summary of research accomplishments, technical transfers, and a list of papers written under this grant. The authors have made good progress on their research goals and have a strong outlook for the coming year.
Date: May 28, 1997
Creator: Miller, B.P. & Hollingsworth, J.K.
Partner: UNT Libraries Government Documents Department

A distributed computing environment with support for constraint-based task scheduling and scientific experimentation

Description: This paper describes a computing environment which supports computer-based scientific research work. Key features include support for automatic distributed scheduling and execution and computer-based scientific experimentation. A new flexible and extensible scheduling technique that is responsive to a user's scheduling constraints, such as the ordering of program results and the specification of task assignments and processor utilization levels, is presented. An easy-to-use constraint language for specifying scheduling constraints, based on the relational database query language SQL, is described along with a search-based algorithm for fulfilling these constraints. A set of performance studies shows that the environment can schedule and execute program graphs on a network of workstations as the user requests. A method for automatically generating computer-based scientific experiments is described. Experiments provide a concise method of specifying a large collection of parameterized program executions. The environment achieved significant speedups when executing experiments; for a large collection of scientific experiments, an average speedup of 3.4 on an average of 5.5 scheduled processors was obtained.
Date: April 1, 1997
Creator: Ahrens, J.P.; Shapiro, L.G. & Tanimoto, S.L.
Partner: UNT Libraries Government Documents Department

PNNI routing support for ad hoc mobile networking: A flat architecture

Description: This contribution extends the Outside Nodal Hierarchy List (ONHL) procedures described in ATM Forum Contribution 97-0766. These extensions allow multiple mobile networks to form either an ad hoc network or an extension of a fixed PNNI infrastructure. This contribution covers the simplest case, where the top-most Logical Group Nodes (LGNs) in those mobile networks all reside at the same level in a PNNI hierarchy. Future contributions will cover the general case, where those top-most LGNs reside at different hierarchy levels. This contribution considers a flat ad hoc network architecture, in the sense that each mobile network always participates in the PNNI hierarchy at the preconfigured level of its top-most LGN.
Date: December 1, 1997
Creator: Martinez, L.; Sholander, P. & Tolendino, L.
Partner: UNT Libraries Government Documents Department