613 Matching Results


Supercomputers: government plans & policies

Description: This background paper was requested by the House Committee on Science and Technology. Within the past 2 years, there has been a notable expansion in Federal supercomputer programs, and this increase prompted the committee's request for a review of issues of resource management, networking, and the role of supercomputers in basic research.
Date: March 1986
Creator: United States. Congress. Office of Technology Assessment.
Partner: UNT Libraries Government Documents Department

Massively-parallel electrical-conductivity imaging of hydrocarbons using the Blue Gene/L supercomputer

Description: Large-scale controlled source electromagnetic (CSEM) three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. To cope with the typically large computational requirements of the 3D CSEM imaging problem, our strategies exploit computational parallelism and optimized finite-difference meshing. We report on an imaging experiment, utilizing 32,768 tasks/processors on the IBM Watson Research Blue Gene/L (BG/L) supercomputer. Over a 24-hour period, we were able to image a large scale marine CSEM field data set that previously required over four months of computing time on distributed clusters utilizing 1024 tasks on an Infiniband fabric. The total initial data misfit could be decreased by 67 percent within 72 completed inversion iterations, indicating an electrically resistive region in the southern survey area below a depth of 1500 m below the seafloor. The major part of the residual misfit stems from transmitter parallel receiver components that have an offset from the transmitter sail line (broadside configuration). Modeling confirms that improved broadside data fits can be achieved by considering anisotropic electrical conductivities. While delivering a satisfactory gross scale image for the depths of interest, the experiment provides important evidence for the necessity of discriminating between horizontal and vertical conductivities for maximally consistent 3D CSEM inversions.
Date: May 16, 2007
Creator: Commer, M.; Newman, G.A.; Carazzone, J.J.; Dickens, T.A.; Green, K.E.; Wahrmund, L.A. et al.
Partner: UNT Libraries Government Documents Department

The case of the missing supercomputer performance: achieving optimal performance on the 8,192 processors of ASCI Q

Description: In this paper we describe how we improved the effective performance of ASCI Q, the world's second-fastest supercomputer, to meet our expectations. Using an arsenal of performance-analysis techniques including analytical models, custom microbenchmarks, full applications, and simulators, we succeeded in observing a serious, but previously undetectable, performance problem. We identified the source of the problem, eliminated it, and 'closed the loop' by demonstrating improved application performance. We present our methodology and provide insight into performance analysis that is immediately applicable to other large-scale cluster-based supercomputers.
Date: January 1, 2003
Creator: Petrini, F. (Fabrizio); Kerbyson, D. J. (Darren J.) & Pakin, S. D. (Scott D.)
Partner: UNT Libraries Government Documents Department

BER Science Network Requirements Workshop -- July 26-27, 2007

Description: The Energy Sciences Network (ESnet) is the primary provider of network connectivity for the US Department of Energy Office of Science, the single largest supporter of basic research in the physical sciences in the United States of America. In support of the Office of Science programs, ESnet regularly updates and refreshes its understanding of the networking requirements of the instruments, facilities, scientists, and science programs that it serves. This focus has helped ESnet to be a highly successful enabler of scientific discovery for over 20 years. In July 2007, ESnet and the Biological and Environmental Research (BER) Program Office of the DOE Office of Science organized a workshop to characterize the networking requirements of the science programs funded by the BER Program Office. These included several large programs and facilities, including the Atmospheric Radiation Measurement (ARM) Program and the ARM Climate Research Facility (ACRF), Bioinformatics and Life Sciences Programs, Climate Sciences Programs, the Environmental Molecular Sciences Laboratory at PNNL, and the Joint Genome Institute (JGI). The National Center for Atmospheric Research (NCAR) also participated in the workshop and contributed a section to this report, because a large distributed data repository for climate data will be established at NERSC, ORNL, and NCAR, and this will have an effect on ESnet. Workshop participants were asked to codify their requirements in a 'case study' format, which summarizes the instruments and facilities necessary for the science and the process by which the science is done, with emphasis on the network services needed and the way in which the network is used. Participants were asked to consider three time scales in their case studies--the near term (immediately and up to 12 months in the future), the medium term (3-5 years in the future), and the long term (greater than 5 years in the future). ...
Date: February 1, 2008
Creator: Tierney, Brian L. & Dart, Eli
Partner: UNT Libraries Government Documents Department

Scientific Application Performance on Candidate PetaScale Platforms

Description: After a decade in which HEC (high-end computing) capability was dominated by the rapid pace of improvements to CPU clock frequency, the performance of next-generation supercomputers is increasingly differentiated by varying interconnect designs and levels of integration. Understanding the tradeoffs of these system designs, in the context of high-end numerical simulations, is a key step towards making effective petascale computing a reality. This work represents one of the most comprehensive performance evaluation studies to date on modern HEC systems, including the IBM Power5, AMD Opteron, IBM BG/L, and Cray X1E. A novel aspect of our study is the emphasis on full applications, with real input data at the scale desired by computational scientists in their unique domain. We examine six candidate ultra-scale applications, representing a broad range of algorithms and computational structures. Our work includes the highest-concurrency experiments to date on five of our six applications, including 32K-processor scalability for two of our codes, and describes several successful optimization strategies on BG/L, as well as improved X1E vectorization. Overall results indicate that our evaluated codes have the potential to effectively utilize petascale resources; however, several applications will require reengineering to incorporate the additional levels of parallelism necessary to achieve the vast concurrency of upcoming ultra-scale systems.
Date: January 1, 2007
Creator: Oliker, Leonid; Canning, Andrew; Carter, Jonathan; Iancu, Costin; Lijewski, Michael; Kamil, Shoaib et al.
Partner: UNT Libraries Government Documents Department

Plasma Science Contribution to the SCaLeS Report

Description: In June of 2003, about 250 computational scientists and mathematicians being funded by the DOE Office of Science met in Arlington, VA, to attend a 2-day workshop on the Science Case for Large-scale Simulation (SCaLeS). This document was the output of the Plasma Science Section of that workshop. The conclusion is that exciting and important progress can be made in the field of Plasma Science if computer power continues to grow and algorithmic development continues to occur at the rate that it has in the past. Full simulations of burning plasma experiments could be possible in the 5-10 year time frame if an aggressive growth program is launched in this area.
Date: October 9, 2003
Creator: Jardin, S.C.
Partner: UNT Libraries Government Documents Department

New approaches for modeling type Ia supernovae

Description: Type Ia supernovae (SNe Ia) are the largest thermonuclear explosions in the Universe. Their light output can be seen across great distances and has led to the discovery that the expansion rate of the Universe is accelerating. Despite the significance of SNe Ia, there are still a large number of uncertainties in current theoretical models. Computational modeling offers the promise to help answer the outstanding questions. However, even with today's supercomputers, such calculations are extremely challenging because of the wide range of length and time scales. In this paper, we discuss several new algorithms for simulations of SNe Ia and demonstrate some of their successes.
Date: June 25, 2007
Creator: Zingale, Michael; Almgren, Ann S.; Bell, John B.; Day, Marcus S.; Rendleman, Charles A. & Woosley, Stan
Partner: UNT Libraries Government Documents Department

NERSC News

Description: This month's issue has the following 3 articles: (1) Kathy Yelick is the new director for the National Energy Research Scientific Computing Center (NERSC); (2) Head of the Class--A Cray XT4 named Franklin passes a rigorous test and becomes an official member of the NERSC supercomputing family; and (3) Model Comparisons--A fusion research group published several recent papers examining the results of two types of turbulence simulations and their impact on tokamak designs.
Date: November 25, 2007
Creator: Wang, Ucilia
Partner: UNT Libraries Government Documents Department

Asynchronous Checkpoint Migration with MRNet in the Scalable Checkpoint/Restart Library

Description: Applications running on today's supercomputers tolerate failures by periodically saving their state in checkpoint files on stable storage, such as a parallel file system. Although this approach is simple, the overhead of writing the checkpoints can be prohibitive, especially for large-scale jobs. In this paper, we present initial results of an enhancement to our Scalable Checkpoint/Restart Library (SCR). We employ MRNet, a tree-based overlay network library, to transfer checkpoints from the compute nodes to the parallel file system asynchronously. This enhancement increases application efficiency by removing the need for an application to block while checkpoints are transferred to the parallel file system. We show that the integration of SCR with MRNet can reduce the time spent in I/O operations by as much as 15x. However, our experiments exposed new scalability issues with our initial implementation. We discuss the sources of the scalability problems and our plans to address them.
Date: March 20, 2012
Creator: Mohror, K.; Moody, A. & de Supinski, B. R.
Partner: UNT Libraries Government Documents Department
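
The SCR record above describes draining checkpoints from node-local storage to the parallel file system asynchronously so the application does not block. The following is a minimal sketch of that idea only, not the actual SCR or MRNet API: a background thread copies a locally written checkpoint to a slower "stable" location while the main loop keeps computing. All file names, sizes, and intervals are illustrative.

/* Hedged sketch of asynchronous checkpoint draining: the application writes a
   checkpoint to fast node-local storage, then a background thread copies it to
   slower "stable" storage while computation continues.  This is not the SCR or
   MRNet API; all names, paths, and intervals are illustrative. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    char src[256];   /* fast, node-local copy of the checkpoint   */
    char dst[256];   /* slower "parallel file system" destination */
} drain_task;

/* Background thread: copy the local checkpoint to stable storage. */
static void *drain_checkpoint(void *arg)
{
    drain_task *t = (drain_task *)arg;
    FILE *in = fopen(t->src, "rb");
    FILE *out = fopen(t->dst, "wb");
    if (in && out) {
        char buf[1 << 16];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, in)) > 0)
            fwrite(buf, 1, n, out);
    }
    if (in)  fclose(in);
    if (out) fclose(out);
    free(t);
    return NULL;
}

int main(void)
{
    double state[1024] = {0};
    pthread_t drainer;
    int have_drainer = 0;

    for (int step = 1; step <= 100; step++) {
        for (int i = 0; i < 1024; i++)            /* stand-in for real computation */
            state[i] += 0.5 * i;

        if (step % 25 == 0) {                     /* checkpoint interval (arbitrary) */
            if (have_drainer)
                pthread_join(drainer, NULL);      /* wait for the previous drain */

            char local[256];
            snprintf(local, sizeof local, "ckpt_%d.local", step);
            FILE *f = fopen(local, "wb");         /* brief blocking local write */
            if (f) { fwrite(state, sizeof state, 1, f); fclose(f); }

            drain_task *t = malloc(sizeof *t);
            snprintf(t->src, sizeof t->src, "%s", local);
            snprintf(t->dst, sizeof t->dst, "ckpt_%d.stable", step);
            pthread_create(&drainer, NULL, drain_checkpoint, t);  /* drain asynchronously */
            have_drainer = 1;
        }
    }
    if (have_drainer)
        pthread_join(drainer, NULL);
    printf("done; state[1] = %f\n", state[1]);
    return 0;
}

Compile with, for example, cc -pthread. The point is only that the expensive transfer overlaps the next compute steps instead of stalling them.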

Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms

Description: We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one of the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
Date: October 16, 2008
Creator: Williams, Samuel; Oliker, Leonid; Vuduc, Richard; Shalf, John; Yelick, Katherine & Demmel, James
Partner: UNT Libraries Government Documents Department
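
The SpMV study above concerns tuning sparse matrix-vector multiply across multicore chips. For reference, the sketch below shows the unoptimized kernel such studies start from, assuming the common compressed sparse row (CSR) layout and a single OpenMP pragma; it does not reproduce any of the paper's platform-specific optimizations.

/* Baseline compressed sparse row (CSR) sparse matrix-vector multiply, y = A*x,
   threaded with an OpenMP pragma.  A reference point only; the multicore-specific
   optimizations the study evaluates (blocking, NUMA-aware allocation, and so on)
   are deliberately omitted. */
#include <stdio.h>

static void spmv_csr(int nrows, const int *rowptr, const int *colidx,
                     const double *vals, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += vals[k] * x[colidx[k]];        /* indirect access into x */
        y[i] = sum;
    }
}

int main(void)
{
    /* 3x3 example:  [ 4 1 0 ]
                     [ 0 3 2 ]
                     [ 1 0 5 ]  stored in CSR form. */
    int    rowptr[] = {0, 2, 4, 6};
    int    colidx[] = {0, 1, 1, 2, 0, 2};
    double vals[]   = {4, 1, 3, 2, 1, 5};
    double x[]      = {1, 2, 3};
    double y[3];

    spmv_csr(3, rowptr, colidx, vals, x, y);
    printf("y = [%g %g %g]\n", y[0], y[1], y[2]); /* expect [6 12 16] */
    return 0;
}

Compile with -fopenmp to enable threading; without it the pragma is ignored and the kernel runs serially.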

Soft Error Vulnerability of Iterative Linear Algebra Methods

Description: Devices are increasingly vulnerable to soft errors as their feature sizes shrink. Previously, soft error rates were significant primarily in space and high-atmospheric computing. Modern architectures now use features so small, at sufficiently low voltages, that soft errors are becoming important even at terrestrial altitudes. Due to their large number of components, supercomputers are particularly susceptible to soft errors. Since many large-scale parallel scientific applications use iterative linear algebra methods, the soft error vulnerability of these methods constitutes a large fraction of the applications' overall vulnerability. Many users consider these methods invulnerable to most soft errors since they converge from an imprecise solution to a precise one. However, we show in this paper that iterative methods are vulnerable to soft errors, exhibiting both silent data corruptions and poor ability to detect errors. Further, we evaluate a variety of soft error detection and tolerance techniques, including checkpointing, linear matrix encodings, and residual tracking techniques.
Date: January 19, 2008
Creator: Bronevetsky, G & de Supinski, B
Partner: UNT Libraries Government Documents Department
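
The soft-error study above evaluates detection techniques such as residual tracking for iterative solvers. The sketch below illustrates that general idea under simplifying assumptions (a small dense system and a plain Richardson iteration rather than the methods studied in the paper): the solver's cheaply recurred residual is periodically compared against a freshly recomputed true residual, and a large mismatch flags a likely silent corruption. The matrix, relaxation factor, and thresholds are illustrative.

/* Sketch of residual tracking as a soft-error check inside an iterative solver.
   A simple Richardson iteration updates the residual recursively; every
   CHECK_EVERY steps the true residual ||b - A x|| is recomputed and compared
   with the recurred one.  A large gap signals likely silent corruption. */
#include <math.h>
#include <stdio.h>

#define N 3
#define CHECK_EVERY 10

static void matvec(const double A[N][N], const double *v, double *out)
{
    for (int i = 0; i < N; i++) {
        out[i] = 0.0;
        for (int j = 0; j < N; j++) out[i] += A[i][j] * v[j];
    }
}

static double norm2(const double *v)
{
    double s = 0.0;
    for (int i = 0; i < N; i++) s += v[i] * v[i];
    return sqrt(s);
}

int main(void)
{
    /* Diagonally dominant SPD system so plain Richardson converges. */
    double A[N][N] = {{4, 1, 0}, {1, 5, 1}, {0, 1, 6}};
    double b[N] = {1, 2, 3}, x[N] = {0, 0, 0};
    double r[N], Ar[N], Ax[N];
    const double omega = 0.15;            /* illustrative relaxation factor */

    matvec(A, x, Ax);
    for (int i = 0; i < N; i++) r[i] = b[i] - Ax[i];        /* initial residual */

    for (int k = 1; k <= 200; k++) {
        for (int i = 0; i < N; i++) x[i] += omega * r[i];
        matvec(A, r, Ar);
        for (int i = 0; i < N; i++) r[i] -= omega * Ar[i];  /* recurred residual */

        if (k % CHECK_EVERY == 0) {
            double true_r[N];
            matvec(A, x, Ax);
            for (int i = 0; i < N; i++) true_r[i] = b[i] - Ax[i];
            double gap = fabs(norm2(true_r) - norm2(r));
            if (gap > 1e-8 * (norm2(true_r) + 1.0))
                printf("step %d: residual mismatch %.3e -> possible soft error\n",
                       k, gap);
        }
        if (norm2(r) < 1e-12) break;
    }
    printf("final recurred residual: %.3e\n", norm2(r));
    return 0;
}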

Compiler-Enhanced Incremental Checkpointing for OpenMP Applications

Description: As modern supercomputing systems reach the peta-flop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety of causes. Checkpointing is a popular technique for tolerating such failures, enabling applications to periodically save their state and restart computation after a failure. Although a variety of automated system-level checkpointing solutions are currently available to HPC users, manual application-level checkpointing remains more popular due to its superior performance. This paper improves performance of automated checkpointing via a compiler analysis for incremental checkpointing. This analysis, which works with both sequential and OpenMP applications, reduces checkpoint sizes by as much as 80% and enables asynchronous checkpointing.
Date: January 21, 2008
Creator: Bronevetsky, G; Marques, D; Pingali, K; Rugina, R & McKee, S A
Partner: UNT Libraries Government Documents Department
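
The paper above uses compiler analysis to shrink checkpoints by saving only data that changed since the previous checkpoint. The sketch below shows the incremental idea with a purely runtime, hash-based dirty-block test instead of compiler support; the block size, hash function, and file names are illustrative and are not the paper's mechanism.

/* Runtime sketch of incremental checkpointing: the state array is split into
   fixed-size blocks, a cheap hash of each block is remembered, and only blocks
   whose hash changed since the last checkpoint are written. */
#include <stdint.h>
#include <stdio.h>

#define STATE_DOUBLES 4096
#define BLOCK_DOUBLES 256
#define NBLOCKS (STATE_DOUBLES / BLOCK_DOUBLES)

static uint64_t hash_block(const double *blk)
{
    const uint8_t *p = (const uint8_t *)blk;
    uint64_t h = 1469598103934665603ULL;          /* FNV-1a */
    for (size_t i = 0; i < BLOCK_DOUBLES * sizeof(double); i++)
        h = (h ^ p[i]) * 1099511628211ULL;
    return h;
}

static void incremental_checkpoint(const double *state, uint64_t *last_hash,
                                   int step)
{
    char name[64];
    snprintf(name, sizeof name, "ckpt_%d.inc", step);   /* illustrative name */
    FILE *f = fopen(name, "wb");
    if (!f) return;
    int written = 0;
    for (int b = 0; b < NBLOCKS; b++) {
        uint64_t h = hash_block(state + b * BLOCK_DOUBLES);
        if (h != last_hash[b]) {                  /* dirty block: write it */
            fwrite(&b, sizeof b, 1, f);
            fwrite(state + b * BLOCK_DOUBLES, sizeof(double), BLOCK_DOUBLES, f);
            last_hash[b] = h;
            written++;
        }
    }
    fclose(f);
    printf("step %d: wrote %d of %d blocks\n", step, written, NBLOCKS);
}

int main(void)
{
    static double state[STATE_DOUBLES];
    static uint64_t last_hash[NBLOCKS];           /* zero: everything starts dirty */

    for (int step = 1; step <= 3; step++) {
        /* Touch only part of the state so later checkpoints shrink. */
        for (int i = 0; i < STATE_DOUBLES / (1 << step); i++)
            state[i] += step;
        incremental_checkpoint(state, last_hash, step);
    }
    return 0;
}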

Non-preconditioned conjugate gradient on Cell and FPGA-based hybrid supercomputer nodes

Description: This work presents a detailed implementation of a double precision, non-preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture™ in conjunction with x86 Opteron™ processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, the SRC Computers, Inc. MAPStation SRC-6 FPGA-enhanced hybrid supercomputer, and an AMD Opteron-only system. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.
Date: January 1, 2009
Creator: Dubois, David H; Dubois, Andrew J; Boorman, Thomas M & Connor, Carolyn M
Partner: UNT Libraries Government Documents Department
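
The record above ports a double-precision, non-preconditioned conjugate gradient solver to Cell- and FPGA-accelerated nodes. For reference, the sketch below is the textbook CG iteration on a small dense symmetric positive definite system; no accelerator offload or data-transfer timing is modeled.

/* Minimal non-preconditioned conjugate gradient (double precision) on a small
   dense SPD system.  Reference for the algorithm only; no hybrid hardware. */
#include <math.h>
#include <stdio.h>

#define N 3

static void matvec(const double A[N][N], const double *v, double *out)
{
    for (int i = 0; i < N; i++) {
        out[i] = 0.0;
        for (int j = 0; j < N; j++) out[i] += A[i][j] * v[j];
    }
}

static double dot(const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < N; i++) s += a[i] * b[i];
    return s;
}

int main(void)
{
    double A[N][N] = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};   /* SPD */
    double b[N] = {1, 2, 3};
    double x[N] = {0, 0, 0}, r[N], p[N], Ap[N];

    matvec(A, x, Ap);
    for (int i = 0; i < N; i++) r[i] = b[i] - Ap[i];       /* r = b - Ax */
    for (int i = 0; i < N; i++) p[i] = r[i];
    double rr = dot(r, r);

    for (int k = 0; k < N && sqrt(rr) > 1e-12; k++) {
        matvec(A, p, Ap);
        double alpha = rr / dot(p, Ap);
        for (int i = 0; i < N; i++) x[i] += alpha * p[i];
        for (int i = 0; i < N; i++) r[i] -= alpha * Ap[i];
        double rr_new = dot(r, r);
        double beta = rr_new / rr;
        for (int i = 0; i < N; i++) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    printf("x = [%f %f %f], ||r|| = %e\n", x[0], x[1], x[2], sqrt(rr));
    return 0;
}

The matrix-vector product and the two dot products inside the loop are exactly the pieces that hybrid implementations offload, which is why transfer overhead dominates their wall-clock comparisons.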

High-performance networking.

Description: Our research in high-performance networking addresses the communication needs of Grand Challenge applications over a wide range of environments - wide-area networks (WAN) in support of grids, and local-area networks (LAN) and system-area networks (SAN) in support of networks of workstations and clusters. While the high-performance computing (HPC) community generally groups clusters and grids together as commodity supercomputing infrastructures, the networking aspects of clusters and grids are fundamentally different. In networks of workstations and clusters, the primary communication bottleneck is the host-interface bottleneck, whereas in grids the bottlenecks are adaptation bottlenecks, in particular flow control and congestion control. To address these problems, we offer a set of solutions specifically tailored to each of the aforementioned environments.
Date: January 1, 2001
Creator: Feng, W. C. (Wu-Chun)
Partner: UNT Libraries Government Documents Department

The NAS Parallel Benchmarks

Description: The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aerodynamic Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, Leo Dagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to the parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage over vector supercomputers, and, if ...
Date: November 15, 2009
Creator: Bailey, David H.
Partner: UNT Libraries Government Documents Department

Scalable Performance Measurement and Analysis

Description: Concurrency levels in large-scale, distributed-memory supercomputers are rising exponentially. Modern machines may contain 100,000 or more microprocessor cores, and the largest of these, IBM's Blue Gene/L, contains over 200,000 cores. Future systems are expected to support millions of concurrent tasks. In this dissertation, we focus on efficient techniques for measuring and analyzing the performance of applications running on very large parallel machines. Tuning the performance of large-scale applications can be a subtle and time-consuming task because application developers must measure and interpret data from many independent processes. While the volume of the raw data scales linearly with the number of tasks in the running system, the number of tasks is growing exponentially, and data for even small systems quickly becomes unmanageable. Transporting performance data from so many processes over a network can perturb application performance and make measurements inaccurate, and storing such data would require a prohibitive amount of space. Moreover, even if it were stored, analyzing the data would be extremely time-consuming. In this dissertation, we present novel methods for reducing performance data volume. The first draws on multi-scale wavelet techniques from signal processing to compress systemwide, time-varying load-balance data. The second uses statistical sampling to select a small subset of running processes to generate low-volume traces. A third approach combines sampling and wavelet compression to stratify performance data adaptively at run-time and to reduce further the cost of sampled tracing. We have integrated these approaches into Libra, a toolset for scalable load-balance analysis. We present Libra and show how it can be used to analyze data from large scientific applications scalably.
Date: October 27, 2009
Creator: Gamblin, T
Partner: UNT Libraries Government Documents Department
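
The dissertation above compresses system-wide, time-varying load-balance data with multi-scale wavelet techniques. The sketch below only illustrates the underlying idea, assuming a one-dimensional Haar transform and simple thresholding of small coefficients on synthetic data; it is not Libra's implementation.

/* Sketch of wavelet-based compression of a load trace: apply a 1-D Haar
   transform, zero out small detail coefficients, and count how many survive.
   Illustration only; not the Libra toolset described above. */
#include <math.h>
#include <stdio.h>

#define N 16   /* length must be a power of two for this simple version */

static void haar_forward(double *a, int n)
{
    double tmp[N];
    for (int len = n; len > 1; len /= 2) {
        for (int i = 0; i < len / 2; i++) {
            tmp[i]           = (a[2 * i] + a[2 * i + 1]) / sqrt(2.0); /* average */
            tmp[len / 2 + i] = (a[2 * i] - a[2 * i + 1]) / sqrt(2.0); /* detail  */
        }
        for (int i = 0; i < len; i++) a[i] = tmp[i];
    }
}

int main(void)
{
    /* Synthetic per-process load measurements (arbitrary numbers). */
    double load[N] = {10, 11, 10, 12, 11, 10, 11, 12,
                      30, 31, 29, 30, 31, 30, 29, 31};

    haar_forward(load, N);

    /* Keep only coefficients above a threshold; the rest compress to zero. */
    const double threshold = 1.0;
    int kept = 0;
    for (int i = 0; i < N; i++) {
        if (fabs(load[i]) < threshold) load[i] = 0.0;
        else kept++;
    }
    printf("kept %d of %d coefficients after thresholding\n", kept, N);
    return 0;
}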

Detecting Distributed Scans Using High-Performance Query-Driven Visualization

Description: Modern forensic analytics applications, like network traffic analysis, perform high-performance hypothesis testing, knowledge discovery and data mining on very large datasets. One essential strategy to reduce the time required for these operations is to select only the most relevant data records for a given computation. In this paper, we present a set of parallel algorithms that demonstrate how an efficient selection mechanism -- bitmap indexing -- significantly speeds up a common analysis task, namely, computing conditional histograms on very large datasets. We present a thorough study of the performance characteristics of the parallel conditional histogram algorithms. As a case study, we compute conditional histograms for detecting distributed scans hidden in a dataset consisting of approximately 2.5 billion network connection records. We show that these conditional histograms can be computed on interactive timescales (i.e., in seconds). We also show how to progressively modify the selection criteria to narrow the analysis and find the sources of the distributed scans.
Date: September 1, 2006
Creator: Stockinger, Kurt; Bethel, E. Wes; Campbell, Scott; Dart, Eli & Wu, Kesheng
Partner: UNT Libraries Government Documents Department
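
The paper above credits bitmap indexing for the speed of its conditional histograms. The sketch below shows the selection idea at toy scale, assuming one uncompressed bitmap per predicate combined with a bitwise AND before the histogram is accumulated; production bitmap indexes compress the bitmaps and handle billions of records, which is not attempted here.

/* Sketch of bitmap-index selection feeding a conditional histogram: each
   predicate gets a bitmap marking which records satisfy it, conditions are
   combined with bitwise AND, and the histogram is built only over matching
   records.  Toy scale and uncompressed bitmaps; illustration only. */
#include <stdint.h>
#include <stdio.h>

#define NREC 64   /* toy record count; one 64-bit word per bitmap */

int main(void)
{
    /* Toy "connection records": destination port and source host id. */
    int port[NREC], src[NREC];
    for (int i = 0; i < NREC; i++) {
        port[i] = (i % 3 == 0) ? 22 : 80;   /* synthetic values */
        src[i]  = i % 8;
    }

    /* Build one bitmap per interesting predicate. */
    uint64_t is_port22 = 0, is_lowsrc = 0;
    for (int i = 0; i < NREC; i++) {
        if (port[i] == 22) is_port22 |= 1ULL << i;
        if (src[i] < 4)    is_lowsrc |= 1ULL << i;
    }

    /* Conditional selection: port == 22 AND src < 4, via one bitwise AND. */
    uint64_t selected = is_port22 & is_lowsrc;

    /* Conditional histogram over source host, restricted to selected records. */
    int hist[8] = {0};
    for (int i = 0; i < NREC; i++)
        if (selected & (1ULL << i)) hist[src[i]]++;

    for (int h = 0; h < 8; h++)
        printf("src %d: %d matching records\n", h, hist[h]);
    return 0;
}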

Comparison of the CRAY X-MP-4, Fujitsu VP-200, and Hitachi S-810/20: an Argonne Perspective

Description: A set of programs, gathered from major Argonne computer users, was run on the current generation of supercomputers: the CRAY X-MP-4, Fujitsu VP-200, and Hitachi S-810/20. The results show that a single processor of a CRAY X-MP-4 is a consistently strong performer over a wide range of problems. The Fujitsu and Hitachi excel on highly vectorized programs and offer an attractive opportunity to sites with IBM-compatible computers.
Date: October 1985
Creator: Dongarra, J. J. & Hinds, Alan
Partner: UNT Libraries Government Documents Department

TOP500 Supercomputers for November 2002

Description: 20th Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; & BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 20th edition of the TOP500 list of the world's fastest supercomputers was released today (November 15, 2002). The Earth Simulator supercomputer, installed earlier this year at the Earth Simulator Center in Yokohama, Japan, retains the number one position with its Linpack benchmark performance of 35.86 Tflop/s (trillions of calculations per second). The No. 2 and No. 3 positions are held by two new, identical ASCI Q systems at Los Alamos National Laboratory (7.73 Tflop/s each). These systems are built by Hewlett-Packard and based on the AlphaServer SC computer system.
Date: November 15, 2002
Creator: Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack & Simon, Horst D.
Partner: UNT Libraries Government Documents Department

TOP500 Sublist for November 2001

Description: 18th Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, GERMANY; KNOXVILLE, TENN.; BERKELEY, CALIF. In what has become a much-anticipated event in the world of high-performance computing, the 18th edition of the TOP500 list of the world's fastest supercomputers was released today (November 9, 2001). The latest edition of the twice-yearly ranking finds IBM as the leader in the field, with 32 percent in terms of installed systems and 37 percent in terms of total performance of all the installed systems. In a surprise move, Hewlett-Packard captured second place with 30 percent of the systems. Most of these systems are smaller in size, and as a consequence HP's share of installed performance is lower, at 15 percent. This is still enough for second place in this category. SGI, Cray and Sun follow in the number of TOP500 systems with 41 (8 percent), 39 (8 percent), and 31 (6 percent) respectively. In the category of installed performance, Cray Inc. keeps the third position with 11 percent, ahead of SGI (8 percent) and Compaq (8 percent).
Date: November 9, 2001
Creator: Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack J. & Simon, Horst D.
Partner: UNT Libraries Government Documents Department

17th Edition of TOP500 List of World's Fastest Supercomputers Released

Description: 17th Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, GERMANY; KNOXVILLE, TENN.; BERKELEY, CALIF. In what has become a much-anticipated event in the world of high-performance computing, the 17th edition of the TOP500 list of the world's fastest supercomputers was released today (June 21). The latest edition of the twice-yearly ranking finds IBM as the leader in the field, with 40 percent in terms of installed systems and 43 percent in terms of total performance of all the installed systems. In second place in terms of installed systems is Sun Microsystems with 16 percent, while Cray Inc. retained second place in terms of performance (13 percent). SGI Inc. was third both with respect to systems with 63 (12.6 percent) and performance (10.2 percent).
Date: June 21, 2001
Creator: Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack J. & Simon, Horst D.
Partner: UNT Libraries Government Documents Department