77 Matching Results

Search Results

in creator/contributor: "de Supinski, B"

open access

Accurately measuring MPI broadcasts in a computational grid

An MPI library's implementation of broadcast communication can significantly affect the performance of applications built with that library. In order to choose between similar implementations or to evaluate available libraries, accurate measurements of broadcast performance are required. As we demonstrate, existing methods for measuring broadcast performance are either inaccurate or inadequate. Fortunately, we have designed an accurate method for measuring broadcast performance, even in a chall…

Date: May 6, 1999

Creator: Karonis, N T & de Supinski, B R

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

An Approach to Performance Prediction for Parallel Applications

Accurately modeling and predicting performance for large-scale applications becomes increasingly difficult as system complexity scales dramatically. Analytic predictive models are useful, but are difficult to construct, usually limited in scope, and often fail to capture subtle interactions between architecture and software. In contrast, we employ multilayer neural networks trained on input data from executions on the target platform. This approach is useful for predicting many aspects of perfo…

Date: May 17, 2005

Creator: Ipek, E; de Supinski, B R; Schulz, M & McKee, S A

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

The ASCI PSE Milepost: Run-Time Systems Performance Tests

The Accelerated Strategic Computing Initiative (ASCI) Problem Solving Environment (PSE) consists of the tools and libraries needed for the development of ASCI simulation codes on ASCI machines. The recently completed ASCI PSE Milepost demonstrated that this software environment is available and functional at the scale used for application mileposts on ASCI White. As part of the PSE Milepost, we performed extensive performance testing of several critical run-time based systems. In this paper, we…

Date: May 7, 2001

Creator: de Supinski, B R

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Asynchronous Checkpoint Migration with MRNet in the Scalable Checkpoint / Restart Library

Applications running on today's supercomputers tolerate failures by periodically saving their state in checkpoint files on stable storage, such as a parallel file system. Although this approach is simple, the overhead of writing the checkpoints can be prohibitive, especially for large-scale jobs. In this paper, we present initial results of an enhancement to our Scalable Checkpoint/Restart Library (SCR). We employ MRNet, a tree-based overlay network library, to transfer checkpoints from the com…

Date: March 20, 2012

Creator: Mohror, K.; Moody, A. & de Supinski, B. R.

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

AutomaDeD: Automata-Based Debugging for Dissimilar Parallel Tasks

Today's largest systems have over 100,000 cores, with million-core systems expected over the next few years. This growing scale makes debugging the applications that run on them a daunting challenge. Few debugging tools perform well at this scale and most provide an overload of information about the entire job. Developers need tools that quickly direct them to the root cause of the problem. This paper presents AutomaDeD, a tool that identifies which tasks of a large-scale application first mani…

Date: March 23, 2010

Creator: Bronevetsky, G; Laguna, I; Bagchi, S; de Supinski, B R; Ahn, D & Schulz, M

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Automatic Fault Characterization via Abnormality-Enhanced Classification

Enterprise and high-performance computing systems are growing extremely large and complex, employing hundreds to hundreds of thousands of processors and software/hardware stacks built by many people across many organizations. As the growing scale of these machines increases the frequency of faults, system complexity makes these faults difficult to detect and to diagnose. Current system management techniques, which focus primarily on efficient data access and query mechanisms, require system adm…

Date: December 20, 2010

Creator: Bronevetsky, G; Laguna, I & de Supinski, B R

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Benchmarking Pthreads performance

The importance of the performance of threads libraries is growing as clusters of shared memory machines become more popular. POSIX threads, or Pthreads, is an industry threads library standard. We have implemented the first Pthreads benchmark suite. In addition to measuring basic thread functions, such as thread creation, we apply the LogP model to standard Pthreads communication mechanisms. We present the results of our tests for several hardware platforms. These results demonstrate that the p…

Date: April 27, 1999

Creator: May, J M & de Supinski, B R

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Benchmarking the Stack Trace Analysis Tool for BlueGene/L

No Description Available.

Date: October 3, 2007

Creator: Lee, G L; Ahn, D H; Arnold, D C; de Supinski, B R; Miller, B P & Schulz, M W

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power Bound

No Description Available.

Date: March 5, 2012

Creator: Rountree, B. R.; Ahn, D. H.; de Supinski, B. R.; Lowenthal, D. K. & Schulz, M.

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

BlueGene/L Applications: Parallelism on a Massive Scale

BlueGene/L (BG/L), developed through a partnership between IBM and Lawrence Livermore National Laboratory (LLNL), is currently the world's largest system both in terms of scale with 131,072 processors and absolute performance with a peak rate of 367 TFlop/s. BG/L has led the Top500 list the last four times with a Linpack rate of 280.6 TFlop/s for the full machine installed at LLNL and is expected to remain the fastest computer in the next few editions. However, the real value of a machine like …

Date: September 8, 2006

Creator: de Supinski, B. R.; Schulz, M.; Bulatov, V. V.; Cabot, W.; Chan, B.; Cook, A. W. et al.

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

A C++ Infrastructure for Automatic Introduction and Translation of OpenMP Directives

In this paper we describe a C++ infrastructure for source-to-source translation. We demonstrate the translation of a serial program with high-level abstractions to a lower-level parallel program in two separate phases. In the first phase OpenMP directives are introduced, driven by the semantics of high-level abstractions. Then the OpenMP directives are translated to a C++ program that explicitly creates and manages parallelism according to the specified directives. Both phases are implemented u…

Date: July 28, 2003

Creator: Quinlan, D J; Schordan, M; Yi, Q & de Supinski, B R

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

A Case for Including Transactions in OpenMP

Transactional Memory (TM) has received significant attention recently as a mechanism to reduce the complexity of shared memory programming. We explore the potential of TM to improve OpenMP applications. We combine a software TM (STM) system to support transactions with an OpenMP implementation to start thread teams and provide task and loop-level parallelization. We apply this system to two application scenarios that reflect realistic TM use cases. Our results with this system demonstrate that …

Date: January 25, 2010

Creator: Wong, M.; Bihari, B. L.; de Supinski, B. R.; Wu, P.; Michael, M.; Liu, Y. et al.

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

CLOMP: Accurately Characterizing OpenMP Application Overheads

Despite its ease of use, OpenMP has failed to gain widespread use on large scale systems, largely due to its failure to deliver sufficient performance. Our experience indicates that the cost of initiating OpenMP regions is simply too high for the desired OpenMP usage scenario of many applications. In this paper, we introduce CLOMP, a new benchmark to characterize this aspect of OpenMP implementations accurately. CLOMP complements the existing EPCC benchmark suite to provide simple, easy to unde…

Date: February 11, 2008

Creator: Bronevetsky, G; Gyllenhaal, J & de Supinski, B

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

A Comparative Study of High-Performance Computing on the Cloud

No Description Available.

Date: April 9, 2013

Creator: Marathe, A.; Harris, R.; Lowenthal, D.; de Supinski, B.; Rountree, B.; Schulz, M. et al.

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Design and Modeling of a Non-blocking Checkpointing System

No Description Available.

Date: May 2, 2012

Creator: Sato, K; Moody, A; Mohror, K; Maruyama, N; Gamblin, T; de Supinski, B R et al.

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Design and Modeling of an Asynchronous Checkpointing System

No Description Available.

Date: June 25, 2012

Creator: Sato, K; Moody, A; Mohror, K; Gamblin, T; de Supinski, B; Maruyama, N et al.

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Detailed Modeling, Design, and Evaluation of a Scalable Multi-level Checkpointing System

High-performance computing (HPC) systems are growing more powerful by utilizing more hardware components. As the system mean-time-before-failure correspondingly drops, applications must checkpoint more frequently to make progress. However, as the system memory sizes grow faster than the bandwidth to the parallel file system, the cost of checkpointing begins to dominate application run times. A potential solution to this problem is to use multi-level checkpointing, which employs multiple types o…

Date: April 9, 2010

Creator: Moody, A T; Bronevetsky, G; Mohror, K M & de Supinski, B R

Item Type: Report

Partner: UNT Libraries Government Documents Department

open access

Dynamic Program Phase Detection in Distributed Shared-Memory Multiprocessors

We present a novel hardware mechanism for dynamic program phase detection in distributed shared-memory (DSM) multiprocessors. We show that successful hardware mechanisms for phase detection in uniprocessors do not necessarily work well in DSM systems, since they lack the ability to incorporate the parallel application's global execution information and memory access behavior based on data distribution. We then propose a hardware extension to a well-known uniprocessor mechanism that significantl…

Date: March 6, 2006

Creator: Ipek, E; Martinez, J F; de Supinski, B R; McKee, S A & Schulz, M

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Dynamic Software Testing of MPI Applications with Umpire

As evidenced by the popularity of MPI (Message Passing Interface), message passing is an effective programming technique for managing coarse-grained concurrency on distributed computers. Unfortunately, debugging message-passing applications can be difficult. Software complexity, data races, and scheduling dependencies can make programming errors challenging to locate with manual, interactive debugging techniques. This article describes Umpire, a new tool for detecting programming errors at runt…

Date: July 24, 2000

Creator: Vetter, J & de Supinski, B

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Efficient and Scalable Retrieval Techniques for Global File Properties

No Description Available.

Date: April 30, 2012

Creator: Ahn, D H; Brim, M; de Supinski, B R; Gamblin, T; Lee, G L; LeGendre, M P et al.

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Efficient MPI Support for Advanced Hybrid Programming Models

No Description Available.

Date: June 4, 2010

Creator: Hoefler, T; Bronevetsky, G; Barrett, B; de Supinski, B R & Lumsdaine, A

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Exascale Algorithms for Generalized MPI_Comm_split

No Description Available.

Date: May 19, 2011

Creator: Moody, A T; Ahn, D H & de Supinski, B R

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Exploitation of Dynamic Communication Patterns through Static Analysis

No Description Available.

Date: June 27, 2010

Creator: Preissl, R; de Supinski, B; Schulz, M; Quinlan, D; Kranzlmueller, D & Panas, T

Item Type: Article

Partner: UNT Libraries Government Documents Department

open access

Exploiting Data Similarity to Reduce Memory Footprints

Memory size has long limited large-scale applications on high-performance computing (HPC) systems. Since compute nodes frequently do not have swap space, physical memory often limits problem sizes. Increasing core counts per chip and power density constraints, which limit the number of DIMMs per node, have exacerbated this problem. Further, DRAM constitutes a significant portion of overall HPC system cost. Therefore, instead of adding more DRAM to the nodes, mechanisms to manage memory usage mo…

Date: January 28, 2011

Creator: Biswas, S; de Supinski, B R; Schulz, M; Franklin, D; Sherwood, T & Chong, F T

Item Type: Article

Partner: UNT Libraries Government Documents Department