138 Matching Results

Search Results

The Performance Effect of Multi-core on Scientific Applications

Description: The historical trend of increasing single-CPU performance has given way to a roadmap of increasing core count. The challenge of effectively utilizing these multi-core chips is just starting to be explored by vendors and application developers alike. In this study, we present performance measurements of several complete scientific applications on single- and dual-core Cray XT3 and XT4 systems with a view to characterizing the effects of switching to multi-core chips. We consider effects within a node by using applications run at low concurrencies, and also effects on node-interconnect interaction using higher-concurrency results. Finally, we construct a simple performance model based on the principal on-chip shared resource, memory bandwidth, and use it to predict the performance of the forthcoming quad-core system.
Date: May 14, 2007
Creator: Carter, Jonathan; He, Yun; Shalf, John; Shan, Hongzhang; Strohmaier, Erich & Wasserman, Harvey
Partner: UNT Libraries Government Documents Department

Franklin: User Experiences

Description: The newest workhorse of the National Energy Research Scientific Computing Center is a Cray XT4 with 9,736 dual-core nodes. This paper summarizes Franklin user experiences from the friendly early-user period through the production period. Selected successful user stories are presented, along with the top issues affecting user experiences.
Date: May 7, 2008
Creator: Center, National Energy Research Supercomputing; He, Yun (Helen); Kramer, William T.C.; Carter, Jonathan & Cardo, Nicholas
Partner: UNT Libraries Government Documents Department

Optimizing the Point-In-Box Search Algorithm for the Cray Y-MP(TM) Supercomputer

Description: Determining the subset of points (particles) in a problem domain that are contained within certain spatial regions of interest can be one of the most time-consuming parts of some computer simulations. Examples where this 'point-in-box' search can dominate the computation time include (1) finite element contact problems; (2) molecular dynamics simulations; and (3) interactions between particles in numerical methods, such as discrete particle methods or smoothed particle hydrodynamics. This paper describes methods for optimizing a point-in-box search algorithm developed by Swegle so that it makes optimal use of the architectural features of the Cray Y-MP Supercomputer.
Date: December 23, 1998
Creator: Attaway, S.W.; Davis, M.E.; Heinstein, M.W. & Swegle, J.S.
Partner: UNT Libraries Government Documents Department
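The point-in-box problem described above can be illustrated with a minimal Python sketch. It assumes one common sort-based strategy (sort the points once along each axis, binary-search each query box's bounds per axis, then test only the thinnest resulting slab); it is not claimed to be Swegle's algorithm and does not model the Cray Y-MP vectorization the paper targets. The names `build_axis_orders` and `points_in_box` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (lower corner, upper corner), inclusive bounds
AxisOrder = Tuple[List[int], List[float]]  # (sorted point indices, sorted keys)


def build_axis_orders(points: Sequence[Point]) -> List[AxisOrder]:
    """For each axis, return point indices sorted by that coordinate and the sorted keys."""
    orders = []
    for axis in range(3):
        idx = sorted(range(len(points)), key=lambda i: points[i][axis])
        keys = [points[i][axis] for i in idx]
        orders.append((idx, keys))
    return orders


def points_in_box(points: Sequence[Point], orders: List[AxisOrder], box: Box) -> List[int]:
    """Indices of all points inside the axis-aligned box, in ascending order."""
    lo, hi = box
    # Binary search gives, per axis, the slab of points whose coordinate is in [lo, hi].
    slabs = []
    for axis, (idx, keys) in enumerate(orders):
        start = bisect_left(keys, lo[axis])
        stop = bisect_right(keys, hi[axis])
        slabs.append(idx[start:stop])
    # Only the thinnest slab needs an explicit containment test on all three coordinates.
    thinnest = min(slabs, key=len)
    return sorted(i for i in thinnest
                  if all(lo[a] <= points[i][a] <= hi[a] for a in range(3)))
```

The sort is paid once per set of point positions (O(n log n)); each box query then costs O(log n) per axis plus work proportional to the thinnest slab, rather than a scan of every point.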

Improved utilization and responsiveness with gang scheduling

Description: Most commercial multicomputers use space-slicing schemes in which each scheduling decision has an unknown impact on the future: should a job be scheduled, risking that it will block other larger jobs later, or should the processors be left idle for now in anticipation of future arrivals? This dilemma is solved by using gang scheduling, because then the impact of each decision is limited to its time slice, and future arrivals can be accommodated in other time slices. This added flexibility is shown to improve overall system utilization and responsiveness. Empirical evidence from using gang scheduling on a Cray T3D installed at Lawrence Livermore National Lab corroborates these results, and shows conclusively that gang scheduling can be very effective with current technology. 29 refs., 10 figs., 6 tabs.
Date: October 1, 1996
Creator: Feitelson, D.G. & Jette, M.A.
Partner: UNT Libraries Government Documents Department
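The gang-scheduling argument can be made concrete with a toy Ousterhout-style matrix (rows are time slices, columns are processors), sketched in Python under simplifying assumptions: first-fit placement, equal-length slices, and no job completion, migration, or torus contiguity. `GangMatrix`, `submit`, and `schedule` are hypothetical names, not the interface of the LLNL scheduler.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GangMatrix:
    """Toy Ousterhout matrix: rows are time slices, columns are processors."""
    num_procs: int
    slices: List[List[Optional[str]]] = field(default_factory=list)

    def submit(self, job: str, width: int) -> int:
        """Place all `width` processes of `job` in one slice (first fit); return the slice index."""
        if width > self.num_procs:
            raise ValueError("job is wider than the machine")
        for s, row in enumerate(self.slices):
            free = [p for p, owner in enumerate(row) if owner is None]
            if len(free) >= width:
                # The gang fits in an existing slice; it only affects this slice.
                for p in free[:width]:
                    row[p] = job
                return s
        # No slice has room: open a new time slice instead of making the job wait.
        row: List[Optional[str]] = [None] * self.num_procs
        for p in range(width):
            row[p] = job
        self.slices.append(row)
        return len(self.slices) - 1

    def schedule(self, quanta: int) -> List[List[str]]:
        """Round-robin over slices; each quantum runs every gang in one slice together."""
        timeline: List[List[str]] = []
        for t in range(quanta):
            if not self.slices:
                break
            row = self.slices[t % len(self.slices)]
            timeline.append(sorted({owner for owner in row if owner is not None}))
        return timeline
```

For example, on 8 processors a 6-wide job A takes slice 0, a 4-wide job B opens slice 1 rather than waiting, and a later 2-wide job C back-fills the idle processors of slice 0; `schedule(4)` then alternates `["A", "C"]` and `["B"]`.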

A Cray T3D performance study

Description: We carry out a performance study using the Cray T3D parallel supercomputer to illustrate some important features of this machine. Timing experiments show the speed of various basic operations while more complicated operations give some measure of its parallel performance.
Date: May 1, 1996
Creator: Nallana, A. & Kincaid, D.R.
Partner: UNT Libraries Government Documents Department

Performance of the BLAS-1 and other mathematical kernels on the SGI/Cray Origin 2000 processor

Description: The purpose of this paper is to explore issues related to the computation and communication performance of the Basic Linear Algebra Subroutines (BLAS-1) and related kernels on the SGI/Cray Origin 2000 parallel computer. Experiments are performed both on vendor-supplied mathematical library routines as well as hand-coded loops and array syntax. The goal of this study is to get a better understanding of performance issues pertaining to the Origin 2000 architecture.
Date: August 1, 1997
Creator: Dearholt, W. & Joubert, W.
Partner: UNT Libraries Government Documents Department

Numerical Tokamak Turbulence Calculations on the CRAY T3E

Description: Full cross-section calculations of ion-temperature-gradient-driven turbulence with Landau closure are being carried out as part of the Numerical Tokamak Turbulence Project, one of the U.S. Department of Energy's Phase II Grand Challenges. Including the full cross section of a magnetic fusion device such as the tokamak requires more memory and CPU time than is available on the National Energy Research Scientific Computing Center's (NERSC's) shared-memory vector machines such as the CRAY C90 and J90. Calculations of cylindrical multi-helicity ion-temperature-gradient-driven turbulence were completed on NERSC's 160-processor distributed-memory CRAY T3E parallel computer with 256 Mbytes of memory per processor. This augurs well for yet more memory- and CPU-intensive calculations on the next-generation T3E at NERSC. This paper presents benchmark results on the current T3E at NERSC. Physics results pertaining to plasma confinement at the core of tokamaks subject to ion-temperature-gradient-driven turbulence are also highlighted. Results at this resolution covering this extent of physical time were previously unattainable. Work is in progress to increase the resolution, improve the performance of the parallel code, and include toroidal geometry in these calculations in anticipation of the imminent arrival of a fully configured, 512-processor T3E-900 model.
Date: December 31, 1997
Creator: Lynch, V.E.; Leboeuf, J.N. & Carreras, B.A.
Partner: UNT Libraries Government Documents Department

Development of a dynamic time sharing scheduled environment. Final report

Description: Massively parallel computers, such as the Cray T3D, have historically supported resource sharing solely through space sharing, in which multiple problems are solved by executing them on distinct processors. This project developed a dynamic time- and space-sharing scheduler to achieve greater interactivity and throughput than space sharing alone can provide. CRI and LLNL worked together on the design, testing, and review aspects of this project, with separate software deliverables: CRI implemented a general-purpose scheduling system according to the design specifications, and LLNL ported the local gang scheduler software to the LLNL Cray T3D. In this approach, processes are allocated simultaneously to all components of a parallel program (in a gang). Program execution is preempted as needed to provide interactivity, and programs are relocated to different processors as needed to pack the computer's torus of processors efficiently. In phase one, CRI developed an interface specification, after discussions with LLNL, for system-level software supporting a time- and space-sharing environment on the LLNL T3D. The two parties also discussed interface specifications for external control tools (such as scheduling policy tools and system administration tools) and application programs. CRI assumed responsibility for writing and implementing all the necessary system software in this phase. In phase two, CRI implemented job rolling on the Cray T3D, a mechanism for preempting a program, saving its state to disk, and later restoring its state to memory for continued execution. LLNL ported its gang scheduler to the LLNL T3D using the CRI interface implemented in phases one and two. During phase three, the functionality and effectiveness of the LLNL gang scheduler were assessed to provide input to CRI's time- and space-sharing efforts. CRI will use this information in developing general schedulers suitable for other sites and future architectures. All phases of this project ...
Date: October 1, 1997
Partner: UNT Libraries Government Documents Department

MPICH on the T3D: A case study of high performance message passing

Description: This paper describes the design, implementation, and performance of a port of the Argonne National Laboratory/Mississippi State University MPICH implementation of the Message Passing Interface standard to the Cray T3D massively parallel processing system. The factors influencing the design and the various stages of implementation are described. Performance results are shown that reveal superior bandwidth and comparable latency relative to other full message-passing systems on the T3D. Further planned improvements and optimizations, including an analysis of a port to the T3E, are mentioned.
Date: July 1, 1996
Creator: Brightwell, R. & Skjellum, A.
Partner: UNT Libraries Government Documents Department

Selecting and implementing the PBS scheduler on an SGI Onyx 2/Origin 2000

Description: In the Mathematics and Computer Science Division at Argonne, the demand for resources on the Onyx 2 exceeds the resources available for consumption. To distribute these scarce resources effectively, we need a scheduling and resource management package with multiple capabilities. In particular, it must accept standard interactive user logins, allow batch jobs, backfill the system based on available resources, and permit system activities such as accounting to proceed without interruption. The package must include a mechanism to treat the graphic pipes as a schedulable resource. Also required is the ability to create advance reservations, offer dedicated system modes for large resource runs and benchmarking, and track the resources consumed for each job run. Furthermore, our users want to be able to obtain repeatable timing results on job runs. And, of course, package costs must be carefully considered. We explored several options, including NQE and various third-party products, before settling on the PBS scheduler.
Date: June 28, 1999
Creator: Bittner, S.
Partner: UNT Libraries Government Documents Department

New tools using the hardware performance monitor to help users tune programs on the Cray X-MP

Description: The performance of a Cray system is highly dependent on the tuning techniques used by individuals on their codes. Many of our users were not taking advantage of the tuning tools that allow them to monitor their own programs by using the Hardware Performance Monitor (HPM). We therefore modified UNICOS to collect HPM data for all processes and to report Mflop ratings based on users, programs, and time used. Our tuning efforts are now being focused on the users and programs that have the best potential for performance improvements. These modifications and some of the more striking performance improvements are described.
Date: September 25, 1991
Creator: Engert, D.E.; Rudsinski, L. (Argonne National Lab., IL (United States)) & Doak, J. (Cray Research, Inc., Minneapolis, MN (United States))
Partner: UNT Libraries Government Documents Department

Taking scientific visualization to the masses

Description: The paper offers the premise that scientific visualization capabilities are generally available only to a limited subset of scientists. Several reasons for this are presented. The paper describes a collaborative project between scientists of the Defense Nuclear Agency and computer scientists at Los Alamos National Laboratory. This project's goal is to get visualization capabilities into the hands of many more scientists.
Date: January 1, 1991
Creator: Vigil, M. & Bouchier, S.
Partner: UNT Libraries Government Documents Department

Results of the first UNICOS security survey

Description: At the Santa Fe CUG in September 1991, a brief survey was distributed to attendees to begin developing a database of sites interested and active in using UNICOS security protections and features. Forty-two individuals attended a Security BOF session; their responses accounted for about three-quarters of the forty-six sites (representing 62 installed machines) that completed and returned the survey questionnaire. Although the sample is clearly biased (most of those responding had already shown interest in security by attending the BOF), the broad range of sites represented, industrial and academic as well as government and military, was surprising. Fully 50% of the 62 installed machines were actively running UNICOS Secure Mode. This talk will provide an overview of the results of the survey, which will be repeated at least annually by the new Security MIG. A tabulation of the sites that have some experience with running Secure Mode UNICOS will be made available to all sites, in keeping with the goal of disseminating such hard-won experience with UNICOS security.
Date: January 1, 1992
Creator: Christoph, G.G.
Partner: UNT Libraries Government Documents Department

Xcl: A family of programming-language-based shells

Description: As the three major UNIX shells have emerged, they have shown little inclination to include syntax and semantics from existing programming languages. The first of these shells, sh, contains only a small amount of C-like syntax. Csh provides some C language expression syntax but includes very little other C syntax. The newest of these shells, ksh, also includes some C-like expression syntax, although it contains some significantly un-C-like syntax in such areas as relational and logical operators. Several much less widely used shells have been written that much more closely resemble particular programming languages. However, each of these shells has the disadvantage of not being based on a widely used programming language. In addition, interactive commands tend to be difficult to enter because they must frequently be written using normal programming language constructs such as function calls. Thus, the vast majority of programmers using today's UNIX shells must deal with a shell interface that is not based on any familiar programming language. This paper describes some of the features of the xcl family of shells. Each of these shells is based closely on an existing programming language and provides the user with a familiar and highly programmable shell interface. 7 refs.
Date: January 1, 1989
Creator: Roschke, M.A.
Partner: UNT Libraries Government Documents Department

Implementing ASPEN on the CRAY computer

Description: This paper describes our experience in converting the ASPEN program for use on our CRAY computers at the Los Alamos National Laboratory. The CRAY computer is two-to-five times faster than a CDC-7600 for scalar operations, is equipped with up to two million words of high-speed storage, and has vector processing capability. Thus, the CRAY is a natural candidate for programs that are the size and complexity of ASPEN. Our approach to converting ASPEN and the conversion problems are discussed, including our plans for optimizing the program. Comparisons of run times for test problems between the CRAY and IBM 370 computer versions are presented.
Date: January 1, 1981
Creator: Duerre, K.H. & Bumb, A.C.
Partner: UNT Libraries Government Documents Department

Comparative performance evaluation of two supercomputers: CDC Cyber-205 and CRI Cray-1

Description: This report compares the performance of Control Data Corporation's newest supercomputer, the Cyber-205, with the Cray Research, Inc. Cray-1, currently the Laboratory's largest mainframe. The rationale of our benchmarking effort is discussed. Results are presented of tests to determine the speed of basic arithmetic operations, of runs using our standard benchmark programs, and of runs using three codes that have been optimized for both machines: a linear system solver, a model hydrodynamics code, and parts of a plasma simulation code. It is concluded that the speed of the Cyber-205 for memory-to-memory operations on vectors stored in consecutive locations is considerably faster than that of the Cray-1. However, the overall performance of the machine is not quite equal to that of the Cray for tasks of interest to the Laboratory as represented by our benchmark set.
Date: January 1, 1981
Creator: Bucher, I.Y. & Moore, J.W.
Partner: UNT Libraries Government Documents Department

High Performance Computing Facility Operational Assessment, FY 2011 Oak Ridge Leadership Computing Facility

Description: Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) continues to deliver the most powerful resources in the U.S. for open science. At 2.33 petaflops peak performance, the Cray XT Jaguar delivered more than 1.5 billion core hours in calendar year (CY) 2010 to researchers around the world for computational simulations relevant to national and energy security; advancing the frontiers of knowledge in the physical sciences and in biological, medical, environmental, and computer sciences; and providing world-class research facilities for the nation's science enterprise. Scientific achievements by OLCF users range from collaborating with university experimentalists to produce a working supercapacitor that uses atom-thick sheets of carbon materials to finely determining the resolution requirements for simulations of coal gasifiers and their components, thus laying the foundation for development of commercial-scale gasifiers. OLCF users are pushing the boundaries with software applications sustaining more than one petaflop of performance in the quest to illuminate the fundamental nature of electronic devices. Other teams of researchers are working to resolve predictive capabilities of climate models, to refine and validate genome sequencing, and to explore the most fundamental materials in nature (quarks and gluons) and their unique properties. These scientific endeavors, which would not be possible without access to leadership-class computing resources, are described in Section 4 of this report and in the INCITE in Review. Effective operation of the OLCF plays a key role in the scientific missions and accomplishments of its users. This Operational Assessment Report (OAR) delineates the policies, procedures, and innovations implemented by the OLCF to continue delivering a petaflop-scale resource for cutting-edge research. The 2010 operational assessment of the OLCF yielded recommendations that have been addressed (see Section 1) and, where appropriate, changes in Center metrics were introduced. This report covers CY 2010 and CY 2011 Year to ...
Date: August 1, 2011
Creator: Baker, Ann E; Bland, Arthur S Buddy; Hack, James J; Barker, Ashley D; Boudwin, Kathlyn J.; Kendall, Ricky A et al.
Partner: UNT Libraries Government Documents Department

Measuring and tuning energy efficiency on large scale high performance computing platforms.

Description: Recognition of the importance of power in the field of High Performance Computing, whether it be as an obstacle, expense or design consideration, has never been greater and more pervasive. While research has been conducted on many related aspects, there is a stark absence of work focused on large scale High Performance Computing. Part of the reason is the lack of measurement capability currently available on small or large platforms. Typically, research is conducted using coarse methods of measurement such as inserting a power meter between the power source and the platform, or fine grained measurements using custom instrumented boards (with obvious limitations in scale). To collect the measurements necessary to analyze real scientific computing applications at large scale, an in-situ measurement capability must exist on a large scale capability class platform. In response to this challenge, we exploit the unique power measurement capabilities of the Cray XT architecture to gain an understanding of power use and the effects of tuning. We apply these capabilities at the operating system level by deterministically halting cores when idle. At the application level, we gain an understanding of the power requirements of a range of important DOE/NNSA production scientific computing applications running at large scale (thousands of nodes), while simultaneously collecting current and voltage measurements on the hosting nodes. We examine the effects of both CPU and network bandwidth tuning and demonstrate energy savings opportunities of up to 39% with little or no impact on run-time performance. Capturing scale effects in our experimental results was key. Our results provide strong evidence that next generation large-scale platforms should not only approach CPU frequency scaling differently, but could also benefit from the capability to tune other platform components, such as the network, to achieve energy efficient performance.
Date: August 1, 2011
Creator: Laros, James H., III
Partner: UNT Libraries Government Documents Department

Running InfiniBand on the Cray XT3

Description: In an effort to exploit the performance and cost benefits of the InfiniBand interconnect, this paper discusses what was needed to install and load a single-data-rate InfiniBand host channel adapter in a service node on the Cray XT3. Along with describing how to do it, the paper also provides some performance numbers achieved over this connection to a remote system.
Date: May 1, 2007
Creator: Minich, Makia
Partner: UNT Libraries Government Documents Department

Optimization of the particle pusher in a diode simulation code

Description: The particle pusher in Sandia's particle-in-cell diode simulation code has been rewritten to reduce the required run time of a typical simulation. The resulting new version of the code has been found to run up to three times as fast as the original with comparable accuracy. The cost of this optimization was an increase in storage requirements of about 15%. The new version has also been written to run efficiently on a CRAY-1 computing system. Steps taken to effect this reduced run time are described. Various test cases are detailed.
Date: September 1, 1979
Creator: Theimer, M.M. & Quintenz, J.P.
Partner: UNT Libraries Government Documents Department

Scalability of preconditioners as a strategy for parallel computation of compressible fluid flow

Description: Parallel implementations of a Newton-Krylov-Schwarz algorithm are used to solve a model problem representing low Mach number compressible fluid flow over a backward-facing step. The Mach number is specifically selected to result in a numerically "stiff" matrix problem, based on an implicit finite volume discretization of the compressible 2D Navier-Stokes/energy equations using primitive variables. Newton's method is used to linearize the discrete system, and a preconditioned Krylov projection technique is used to solve the resulting linear system. Domain decomposition enables the development of a global preconditioner via the parallel construction of contributions derived from subdomains. Formation of the global preconditioner is based upon additive and multiplicative Schwarz algorithms, with and without subdomain overlap. The degree of parallelism of this technique is further enhanced with the use of a matrix-free approximation for the Jacobian used in the Krylov technique (in this case, GMRES(k)). Of paramount interest to this study is the implementation and optimization of these techniques on parallel shared-memory hardware, namely the Cray C90 and SGI Challenge architectures. These architectures were chosen as representative and commonly available to researchers interested in the solution of problems of this type. The Newton-Krylov-Schwarz solution technique is increasingly being investigated for computational fluid dynamics (CFD) applications due to the advantages of full coupling of all variables and equations, rapid non-linear convergence, and moderate memory requirements. A parallel version of this method that scales effectively on the above architectures would be extremely attractive to practitioners, resulting in efficient, cost-effective, parallel solutions exhibiting the benefits of the solution technique.
Date: May 1, 1996
Creator: Hansen, G.A.
Partner: UNT Libraries Government Documents Department
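For readers unfamiliar with the method, the Newton-Krylov-Schwarz ingredients named above can be summarized in standard textbook notation. This is a generic sketch, not the paper's formulation: the symbols F, u, v, \varepsilon, R_i, and the preconditioning matrix A (an approximation of the Jacobian used only to build the preconditioner) are assumed notation, and the abstract does not state how the paper chooses \varepsilon, the subdomain overlap, or the side on which the preconditioner is applied.

```latex
% Newton step for the discrete nonlinear residual F(u) = 0
J(u^{k})\,\delta u^{k} = -F(u^{k}), \qquad u^{k+1} = u^{k} + \delta u^{k}

% Matrix-free (finite-difference) Jacobian-vector product used inside GMRES(k):
% only residual evaluations are needed, never the assembled Jacobian
J(u)\,v \approx \frac{F(u + \varepsilon v) - F(u)}{\varepsilon}

% Additive Schwarz preconditioner assembled from N independent subdomain solves
M_{\mathrm{AS}}^{-1} = \sum_{i=1}^{N} R_i^{T} A_i^{-1} R_i, \qquad A_i = R_i A R_i^{T}
```

Here R_i restricts a global vector to subdomain i (including any overlap), so the N subdomain solves are independent and can run in parallel; the multiplicative Schwarz variant applies the same subdomain corrections one after another rather than summing them.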

Computer workstation speeds

Description: This report compares the performance of several computers. Some of the machines are discontinued, and some are anticipated, but most are currently installed at Sandia Laboratories. All the computers are personal workstations or departmental servers, except that, for comparison, one is a Cray C90 mainframe supercomputer (not owned by the Laboratories). A few of the computers have multiple processors, but parallelism is not tested. The time to run three programs is reported for every computer. Unlike many benchmarks, these are complete application programs, written and used at Sandia Laboratories. SPECmarks, the industry-standard performance ratings, are also reported for many computers; they are in general agreement with the speeds of running the Sandia programs. This report concludes with some background material and notes about specific manufacturers.
Date: June 1, 1996
Creator: Grcar, J.F.
Partner: UNT Libraries Government Documents Department

Parallel performance of TORT on the CRAY J90: Model and measurement

Description: A limitation on the parallel performance of TORT on the CRAY J90 is the amount of extra work introduced by the multitasking algorithm itself. The extra work beyond that of the serial version of the code, called overhead, arises from the synchronization of the parallel tasks and the accumulation of results by the master task. The goal of recent updates to TORT was to reduce the time consumed by these activities. To help understand which components of the multitasking algorithm contribute significantly to the overhead, a parallel performance model was constructed and compared to measurements of actual timings of the code.
Date: October 1, 1997
Creator: Barnett, A. & Azmy, Y.Y.
Partner: UNT Libraries Government Documents Department