Search Results

open access

Accurately measuring MPI broadcasts in a computational grid

Description: An MPI library's implementation of broadcast communication can significantly affect the performance of applications built with that library. In order to choose between similar implementations or to evaluate available libraries, accurate measurements of broadcast performance are required. As we demonstrate, existing methods for measuring broadcast performance are either inaccurate or inadequate. Fortunately, we have designed an accurate method for measuring broadcast performance, even in a chall… more
Date: May 6, 1999
Creator: Karonis, N T & de Supinski, B R
Partner: UNT Libraries Government Documents Department
open access

An Approach to Performance Prediction for Parallel Applications

Description: Accurately modeling and predicting performance for large-scale applications becomes increasingly difficult as system complexity scales dramatically. Analytic predictive models are useful, but are difficult to construct, usually limited in scope, and often fail to capture subtle interactions between architecture and software. In contrast, we employ multilayer neural networks trained on input data from executions on the target platform. This approach is useful for predicting many aspects of perfo… more
Date: May 17, 2005
Creator: Ipek, E; de Supinski, B R; Schulz, M & McKee, S A
Partner: UNT Libraries Government Documents Department
open access

The ASCI PSE Milepost: Run-Time Systems Performance Tests

Description: The Accelerated Strategic Computing Initiative (ASCI) Problem Solving Environment (PSE) consists of the tools and libraries needed for the development of ASCI simulation codes on ASCI machines. The recently completed ASCI PSE Milepost demonstrated that this software environment is available and functional at the scale used for application mileposts on ASCI White. As part of the PSE Milepost, we performed extensive performance testing of several critical run-time based systems. In this paper, we… more
Date: May 7, 2001
Creator: de Supinski, B R
Partner: UNT Libraries Government Documents Department
open access

Asynchronous Checkpoint Migration with MRNet in the Scalable Checkpoint / Restart Library

Description: Applications running on today's supercomputers tolerate failures by periodically saving their state in checkpoint files on stable storage, such as a parallel file system. Although this approach is simple, the overhead of writing the checkpoints can be prohibitive, especially for large-scale jobs. In this paper, we present initial results of an enhancement to our Scalable Checkpoint/Restart Library (SCR). We employ MRNet, a tree-based overlay network library, to transfer checkpoints from the com… more
Date: March 20, 2012
Creator: Mohror, K.; Moody, A. & de Supinski, B. R.
Partner: UNT Libraries Government Documents Department
open access

AutomaDeD: Automata-Based Debugging for Dissimilar Parallel Tasks

Description: Today's largest systems have over 100,000 cores, with million-core systems expected over the next few years. This growing scale makes debugging the applications that run on them a daunting challenge. Few debugging tools perform well at this scale and most provide an overload of information about the entire job. Developers need tools that quickly direct them to the root cause of the problem. This paper presents AutomaDeD, a tool that identifies which tasks of a large-scale application first mani… more
Date: March 23, 2010
Creator: Bronevetsky, G; Laguna, I; Bagchi, S; de Supinski, B R; Ahn, D & Schulz, M
Partner: UNT Libraries Government Documents Department
open access

Automatic Fault Characterization via Abnormality-Enhanced Classification

Description: Enterprise and high-performance computing systems are growing extremely large and complex, employing hundreds to hundreds of thousands of processors and software/hardware stacks built by many people across many organizations. As the growing scale of these machines increases the frequency of faults, system complexity makes these faults difficult to detect and to diagnose. Current system management techniques, which focus primarily on efficient data access and query mechanisms, require system adm… more
Date: December 20, 2010
Creator: Bronevetsky, G; Laguna, I & de Supinski, B R
Partner: UNT Libraries Government Documents Department
open access

Benchmarking Pthreads performance

Description: The importance of the performance of threads libraries is growing as clusters of shared memory machines become more popular. POSIX threads, or Pthreads, is an industry threads library standard. We have implemented the first Pthreads benchmark suite. In addition to measuring basic thread functions, such as thread creation, we apply the LogP model to standard Pthreads communication mechanisms. We present the results of our tests for several hardware platforms. These results demonstrate that the p… more
Date: April 27, 1999
Creator: May, J M & de Supinski, B R
Partner: UNT Libraries Government Documents Department
open access

BlueGene/L Applications: Parallelism on a Massive Scale

Description: BlueGene/L (BG/L), developed through a partnership between IBM and Lawrence Livermore National Laboratory (LLNL), is currently the world's largest system both in terms of scale with 131,072 processors and absolute performance with a peak rate of 367 TFlop/s. BG/L has led the Top500 list the last four times with a Linpack rate of 280.6 TFlop/s for the full machine installed at LLNL and is expected to remain the fastest computer in the next few editions. However, the real value of a machine like … more
Date: September 8, 2006
Creator: de Supinski, B. R.; Schulz, M.; Bulatov, V. V.; Cabot, W.; Chan, B.; Cook, A. W. et al.
Partner: UNT Libraries Government Documents Department
open access

A C++ Infrastructure for Automatic Introduction and Translation of OpenMP Directives

Description: In this paper we describe a C++ infrastructure for source-to-source translation. We demonstrate the translation of a serial program with high-level abstractions to a lower-level parallel program in two separate phases. In the first phase OpenMP directives are introduced, driven by the semantics of high-level abstractions. Then the OpenMP directives are translated to a C++ program that explicitly creates and manages parallelism according to the specified directives. Both phases are implemented u… more
Date: July 28, 2003
Creator: Quinlan, D J; Schordan, M; Yi, Q & de Supinski, B R
Partner: UNT Libraries Government Documents Department
open access

A Case for Including Transactions in OpenMP

Description: Transactional Memory (TM) has received significant attention recently as a mechanism to reduce the complexity of shared memory programming. We explore the potential of TM to improve OpenMP applications. We combine a software TM (STM) system to support transactions with an OpenMP implementation to start thread teams and provide task and loop-level parallelization. We apply this system to two application scenarios that reflect realistic TM use cases. Our results with this system demonstrate that … more
Date: January 25, 2010
Creator: Wong, M.; Bihari, B. L.; de Supinski, B. R.; Wu, P.; Michael, M.; Liu, Y. et al.
Partner: UNT Libraries Government Documents Department
open access

CLOMP: Accurately Characterizing OpenMP Application Overheads

Description: Despite its ease of use, OpenMP has failed to gain widespread use on large scale systems, largely due to its failure to deliver sufficient performance. Our experience indicates that the cost of initiating OpenMP regions is simply too high for the desired OpenMP usage scenario of many applications. In this paper, we introduce CLOMP, a new benchmark to characterize this aspect of OpenMP implementations accurately. CLOMP complements the existing EPCC benchmark suite to provide simple, easy to unde… more
Date: February 11, 2008
Creator: Bronevetsky, G; Gyllenhaal, J & de Supinski, B
Partner: UNT Libraries Government Documents Department
open access

Detailed Modeling, Design, and Evaluation of a Scalable Multi-level Checkpointing System

Description: High-performance computing (HPC) systems are growing more powerful by utilizing more hardware components. As the system mean-time-before-failure correspondingly drops, applications must checkpoint more frequently to make progress. However, as the system memory sizes grow faster than the bandwidth to the parallel file system, the cost of checkpointing begins to dominate application run times. A potential solution to this problem is to use multi-level checkpointing, which employs multiple types o… more
Date: April 9, 2010
Creator: Moody, A T; Bronevetsky, G; Mohror, K M & de Supinski, B R
Partner: UNT Libraries Government Documents Department
open access

Dynamic Program Phase Detection in Distributed Shared-Memory Multiprocessors

Description: We present a novel hardware mechanism for dynamic program phase detection in distributed shared-memory (DSM) multiprocessors. We show that successful hardware mechanisms for phase detection in uniprocessors do not necessarily work well in DSM systems, since they lack the ability to incorporate the parallel application's global execution information and memory access behavior based on data distribution. We then propose a hardware extension to a well-known uniprocessor mechanism that significantl… more
Date: March 6, 2006
Creator: Ipek, E; Martinez, J F; de Supinski, B R; McKee, S A & Schulz, M
Partner: UNT Libraries Government Documents Department
open access

Dynamic Software Testing of MPI Applications with Umpire

Description: As evidenced by the popularity of MPI (Message Passing Interface), message passing is an effective programming technique for managing coarse-grained concurrency on distributed computers. Unfortunately, debugging message-passing applications can be difficult. Software complexity, data races, and scheduling dependencies can make programming errors challenging to locate with manual, interactive debugging techniques. This article describes Umpire, a new tool for detecting programming errors at runt… more
Date: July 24, 2000
Creator: Vetter, J & de Supinski, B
Partner: UNT Libraries Government Documents Department
open access

Exploiting Data Similarity to Reduce Memory Footprints

Description: Memory size has long limited large-scale applications on high-performance computing (HPC) systems. Since compute nodes frequently do not have swap space, physical memory often limits problem sizes. Increasing core counts per chip and power density constraints, which limit the number of DIMMs per node, have exacerbated this problem. Further, DRAM constitutes a significant portion of overall HPC system cost. Therefore, instead of adding more DRAM to the nodes, mechanisms to manage memory usage mo… more
Date: January 28, 2011
Creator: Biswas, S; de Supinski, B R; Schulz, M; Franklin, D; Sherwood, T & Chong, F T
Partner: UNT Libraries Government Documents Department