Search Results

Electron energy loss spectroscopy of CH₃N₂CH₃ adsorbed on Ni(100), Ni(111), Cr(100), Cr(111)

Description: A study of the adsorption of CH₃N₂CH₃ on Ni(100), Ni(111), Cr(100), and Cr(111) using high resolution electron energy loss spectroscopy (EELS) is presented. Under approximately the same conditions of coverage, the vibrational spectra of CH₃N₂CH₃ on these four surfaces are quite distinct from one another, implying that the CH₃N₂CH₃-substrate interaction is very sensitive to the physical and electronic structure of each surface. In addition to the room temperature studies, the evolution of surface species on the Ni(100) surface in the temperature range 300 to 425 K was studied. Analysis of the Ni(100) spectra indicates that molecular adsorption, probably through the N lone pair, occurs at room temperature. Spectra taken after annealing the CH₃N₂CH₃-Ni(100) surfaces indicate that CH and CN bond scission occurred at the elevated temperatures. Decomposition of CH₃N₂CH₃ takes place on the Ni(111), Cr(100), and Cr(111) surfaces at room temperature, as evidenced by the intensity of the carbon-metal stretch in the corresponding spectra. Possible identities of coadsorbed dissociation products are considered. The stable coverage of surface species on all four surfaces at 300 K is less than one monolayer. A general description of an electron energy loss (EEL) spectrometer is given, followed by a more specific discussion of some recent modifications to the EEL monochromator assembly used in this laboratory. Both the previous configuration of our monochromator and the new version are briefly described, as an aid to understanding the motivation for the changes as well as the differences in operation of the two versions. For clarity, the new monochromator design is referred to as variable pass, while the previous design is referred to as double pass. A modified tuning procedure for the new monochromator is also presented. 58 refs., 11 figs.
Date: July 1, 1985
Creator: Schulz, M.A.
Partner: UNT Libraries Government Documents Department

Extracting Critical Path Graphs from MPI Applications

Description: The critical path is one of the fundamental runtime characteristics of a parallel program. It identifies the longest execution sequence without wait delays. In other words, the critical path is the global execution path that inflicts wait operations on other nodes without itself being stalled. Hence, it dictates the overall runtime, and knowing it is essential for understanding an application's runtime and message behavior and for targeting optimizations. We have developed a toolset that identifies the critical path of MPI applications, extracts it, and then produces a graphical representation of the corresponding program execution graph to visualize it. To implement this, we intercept all MPI library calls, use the information to build the relevant subset of the execution graph, and then extract the critical path from there. We have applied our technique to several scientific benchmarks and successfully produced critical path diagrams for applications running on up to 128 processors. (A minimal sketch of the call-interception mechanism follows this record.)
Date: July 27, 2005
Creator: Schulz, M
Partner: UNT Libraries Government Documents Department
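
The abstract does not include code, but MPI call interception of the kind it describes is conventionally built on MPI's standard profiling interface (PMPI), in which a tool defines its own MPI_* entry points and forwards to the PMPI_* versions. The sketch below illustrates only that mechanism; the record_event hook and its logging behavior are hypothetical stand-ins, not part of the paper's toolset.

```c
/* Minimal PMPI interposition sketch: wrap MPI calls, time them, and
 * hand the events to a (hypothetical) recorder that a critical-path
 * tool would use to build its execution graph. */
#include <mpi.h>
#include <stdio.h>

static void record_event(const char *op, int peer, double t0, double t1)
{
    /* A real tool would append a node to the execution graph here;
     * this sketch merely logs the event. */
    printf("%s peer=%d start=%.6f end=%.6f\n", op, peer, t0, t1);
}

int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    double t0 = MPI_Wtime();
    int rc = PMPI_Send(buf, count, type, dest, tag, comm); /* real send */
    record_event("MPI_Send", dest, t0, MPI_Wtime());
    return rc;
}

int MPI_Recv(void *buf, int count, MPI_Datatype type,
             int src, int tag, MPI_Comm comm, MPI_Status *status)
{
    double t0 = MPI_Wtime();
    int rc = PMPI_Recv(buf, count, type, src, tag, comm, status);
    record_event("MPI_Recv", src, t0, MPI_Wtime());
    return rc;
}
```

Linking such a wrapper library ahead of the MPI library captures every send and receive without modifying the application, which is why the PMPI approach is standard for tools of this kind.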

Program Development Tools and Infrastructures

Description: Exascale-class machines will exhibit a new level of complexity: they will feature an unprecedented number of cores and threads, will most likely be heterogeneous and deeply hierarchical, and will offer a range of new hardware techniques (such as speculative threading, transactional memory, programmable prefetching, and programmable accelerators), all of which must be utilized for an application to realize the full potential of the machine. Additionally, users will be faced with less memory per core, fixed total power budgets, and sharply reduced MTBFs. At the same time, the complexity of applications is expected to rise sharply for exascale systems, both to implement new science possible at exascale and to exploit the new hardware features necessary to achieve exascale performance. This is particularly true for many of the NNSA codes, which are large and often highly complex integrated simulation codes that push the limits of everything in the system, including language features. To overcome these limitations and to reach exascale performance, users will expect a new generation of tools that address the bottlenecks of exascale machines, that work seamlessly with the (set of) programming models on the target machines, that scale with the machine, that provide automatic analysis capabilities, and that are flexible and modular enough to overcome the complexities and changing demands of exascale architectures. Further, any tool must be robust enough to handle the complexity of large integrated codes while keeping the user's learning curve low. Within the ASC program, in particular the CSSE (Computational Systems and Software Engineering) and CCE (Common Compute Environment) projects, we are working towards a new generation of tools that fulfill these requirements and that provide our users, as well as the larger HPC community, with the tools, techniques, and methodologies required to make exascale performance a reality.
Date: March 12, 2012
Creator: Schulz, M
Partner: UNT Libraries Government Documents Department

An Application of a State-of-the-Art 3D CAD Modeling and Simulation System for the Decommissioning of Nuclear Capital Equipment at the German Prototype Spent Fuel Reprocessing Plant Karlsruhe

Description: Siempelkamp Nukleartechnik GmbH has been engaged in the optimization of decommissioning processes for several years. Given the complexity of the projects, the time frame, and the budget, it is necessary to find more effective ways to handle such tasks in the near future. The decommissioning and dismantling will be achieved in six steps, taking into account that some processing equipment can be dismantled beforehand and the rest only after the High Active Liquid Waste Concentrate (HAWC) has been vitrified, expected by approximately mid-2005. After the successful start of the remote dismantling of the main process cells in March 2000, the next remote dismantling project at the WAK was initiated in April 2000.
Date: February 25, 2002
Creator: Schulz, M.; Boese, U. & Doering, K.
Partner: UNT Libraries Government Documents Department

Current Trends in Numerical Simulation for Parallel Engineering Environments: New Directions and Work-in-Progress

Description: In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Remarkable progress in CPU architecture, system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges, for both system architects and software developers. These trends are paralleled by progress in parallel algorithms, simulation techniques, and software integration from multiple disciplines. ParSim brings together researchers from both the application disciplines and computer science and aims at fostering closer cooperation between these fields. Since its successful introduction in 2002, ParSim has established itself as an integral part of the EuroPVM/MPI conference series. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a short turn-around time. This offers a unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal setting for ParSim. This combination enables the participants to present and discuss their work within the scope of both the session and the host conference. This year, eleven papers from authors in nine countries were submitted to ParSim, and we selected five of them. They cover a wide range of application fields, including gas flow simulations, thermo-mechanical processes in nuclear waste storage, and cosmological simulations. At the same time, the selected contributions also address the computer science side of their codes and discuss different parallelization strategies, programming models and languages, as well as the use of nonblocking collective operations in MPI. We are confident that this provides an attractive program and that ParSim will be an informal setting for lively discussions and for fostering new collaborations. We hope this session will fulfill its purpose to provide new insights from ...
Date: June 29, 2006
Creator: Trinitis, C & Schulz, M
Partner: UNT Libraries Government Documents Department

Practical Differential Profiling

Description: Comparing performance profiles from two runs is an essential performance analysis step that users routinely perform. In this work we present eGprof, a tool that facilitates these comparisons through differential profiling inside gprof. We chose this approach, rather than designing a new tool, since gprof is one of the few performance analysis tools accepted and used by a large community of users. eGprof allows users to 'subtract' two performance profiles directly. It also includes callgraph visualization to highlight the differences in graphical form. Along with the design of this tool, we present several case studies that show how eGprof can be used to quickly find and study the differences between two application executions, and hence how it can aid the user in this most common step in performance analysis. We achieve this without requiring major changes on the user's side, the most important factor in guaranteeing the adoption of our tool by code teams. (A simplified sketch of the profile-subtraction idea follows this record.)
Date: February 4, 2007
Creator: Schulz, M & De Supinski, B R
Partner: UNT Libraries Government Documents Department
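
eGprof works inside gprof itself; as a rough illustration of the underlying 'profile subtraction' idea only, the sketch below diffs two flat profiles given as plain "function seconds" lines. The input format, size limits, and helper names are assumptions made for this illustration and do not reflect eGprof's implementation.

```c
/* Sketch of profile subtraction: read two flat profiles ("name seconds"
 * per line) and print the per-function time delta between the runs. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXF 4096  /* illustrative limit on profile entries */

struct entry { char name[128]; double secs; };

static int load(const char *path, struct entry *e)
{
    FILE *f = fopen(path, "r");
    int n = 0;
    if (!f) { perror(path); exit(1); }
    while (n < MAXF && fscanf(f, "%127s %lf", e[n].name, &e[n].secs) == 2)
        n++;
    fclose(f);
    return n;
}

int main(int argc, char **argv)
{
    static struct entry a[MAXF], b[MAXF];
    int na, nb, i, j;

    if (argc != 3) {
        fprintf(stderr, "usage: %s run1.txt run2.txt\n", argv[0]);
        return 1;
    }
    na = load(argv[1], a);
    nb = load(argv[2], b);

    /* For each function in run 1, subtract its time in run 2
     * (treated as 0 if the function does not appear there). */
    for (i = 0; i < na; i++) {
        double other = 0.0;
        for (j = 0; j < nb; j++)
            if (strcmp(a[i].name, b[j].name) == 0) { other = b[j].secs; break; }
        printf("%-30s %+8.3f s\n", a[i].name, a[i].secs - other);
    }
    return 0;
}
```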

6th International Special Session on Current Trends in Numerical Simulation for Parallel Engineering Environments

Description: In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Remarkable progress in CPU architecture (multi- and many-core, SMT, transactional memory, virtualization support, etc.), system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges, for both system architects and software developers. These trends are paralleled by progress in parallel algorithms, simulation techniques, and software integration from multiple disciplines. In its sixth year, ParSim continues to build a bridge between computer science and the application disciplines and to help foster cooperation between the different fields. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a short turn-around time. This offers a unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal setting for ParSim. This combination enables the participants to present and discuss their work within the scope of both the session and the host conference. This year, ten papers from authors in ten countries were submitted to ParSim, and after a quick yet thorough review process we decided to accept three of them for publication and presentation during the ParSim session. These three papers show the use of simulation in a range of application fields, including earthquake and turbulence simulation. At the same time, they also address computer science aspects and discuss different parallelization strategies, programming models and environments, as well as scalability. We are confident that this provides an attractive program and that ParSim will yet again be an informal setting for lively discussions and for fostering new collaborations. Several people contributed to this event. Thanks go to Jack ...
Date: July 9, 2007
Creator: Schulz, M & Trinitis, C
Partner: UNT Libraries Government Documents Department

What Scientific Applications Can Benefit from Hardware Transactional Memory?

Description: Achieving efficient and correct synchronization of multiple threads is a difficult and error-prone task at small scale and, as we march towards extreme scale computing, will be even more challenging when the resulting application is supposed to utilize millions of cores efficiently. Transactional Memory (TM) is a promising technique to ease the burden on the programmer, but it has only recently become available on commercial hardware, in the new Blue Gene/Q system; hence its real benefit for realistic applications has not yet been studied. This paper presents the first performance results of TM embedded into OpenMP on a prototype system of BG/Q and characterizes code properties that will likely lead to benefits when augmented with TM primitives. We first study the influence of thread count, environment variables, and memory layout on TM performance and identify code properties that will yield performance gains with TM. Second, we evaluate the combination of OpenMP with multiple synchronization primitives on top of MPI to determine suitable task-to-thread ratios per node. Finally, we condense our findings into a set of best practices. These are applied to a Monte Carlo Benchmark and a Smoothed Particle Hydrodynamics method. In both cases an optimized TM version, executed with 64 threads on one node, outperforms a simple TM implementation; MCB with optimized TM yields a speedup of 27.45 over the baseline. (A generic sketch of a transactional region follows this record.)
Date: June 4, 2012
Creator: Schindewolf, M; Bihari, B; Gyllenhaal, J; Schulz, M; Wang, A & Karl, W
Partner: UNT Libraries Government Documents Department
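
As context for the abstract, a transactional region on BG/Q was expressed through an IBM XL compiler directive; the sketch below shows the general shape of such a region inside an OpenMP loop. The #pragma tm_atomic spelling is given from memory and should be checked against the BG/Q XL documentation, and the histogram example is an invented stand-in, not code from the paper.

```c
/* Sketch: guarding a shared update with a hardware transaction instead
 * of a lock inside an OpenMP loop. On a conflict, the hardware rolls
 * the transaction back and retries it. */
#define N 1000000

void accumulate(double *hist, const int *bin, const double *w)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        /* Lock-based alternative:
         *     #pragma omp critical
         *     hist[bin[i]] += w[i];
         * Transactional version (IBM XL on BG/Q, from memory): */
        #pragma tm_atomic
        {
            hist[bin[i]] += w[i];
        }
    }
}
```

Whether the transaction beats a critical section here depends on the conflict rate, which is exactly the kind of code property the paper characterizes.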

On the Performance of an Algebraic Multigrid Solver on Multicore Clusters

Description: Algebraic multigrid (AMG) solvers have proven to be extremely efficient on distributed-memory architectures. However, when executed on modern multicore cluster architectures, we face new challenges that can significantly harm AMG's performance. We discuss our experiences on such an architecture and present a set of techniques that help users overcome the associated problems, including thread and process pinning and correct memory associations. We have implemented most of the techniques in a MultiCore SUPport library (MCSup), which helps to map OpenMP applications to multicore machines. We present results using both an MPI-only and a hybrid MPI/OpenMP model. (A minimal sketch of the thread-pinning idea follows this record.)
Date: April 29, 2010
Creator: Baker, A H; Schulz, M & Yang, U M
Partner: UNT Libraries Government Documents Department
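
MCSup's interface is not shown in the abstract; the sketch below only illustrates the basic thread-pinning building block on Linux, using the standard sched_setaffinity() call with a simple one-thread-per-core policy. The policy and program structure are illustrative assumptions, not MCSup's API.

```c
/* Sketch: pin each OpenMP thread to its own core on Linux. Pinning,
 * combined with first-touch allocation, also keeps memory associated
 * with the thread that uses it, which is the other half of the
 * problem the paper addresses. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(tid, &set);  /* naive policy: thread i -> core i */
        if (sched_setaffinity(0, sizeof(set), &set) != 0)  /* 0 = calling thread */
            perror("sched_setaffinity");
        #pragma omp critical
        printf("thread %d pinned to core %d\n", tid, tid);
    }
    return 0;
}
```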

Exploiting Data Similarity to Reduce Memory Footprints

Description: Memory size has long limited large-scale applications on high-performance computing (HPC) systems. Since compute nodes frequently do not have swap space, physical memory often limits problem sizes. Increasing core counts per chip and power density constraints, which limit the number of DIMMs per node, have exacerbated this problem. Further, DRAM constitutes a significant portion of overall HPC system cost. Therefore, instead of adding more DRAM to the nodes, mechanisms to manage memory usage more efficiently, preferably transparently, could increase effective DRAM capacity and thus the benefit of multicore nodes for HPC systems. MPI application processes often exhibit significant data similarity. These data regions occupy multiple physical locations across the individual rank processes within a multicore node and thus offer potential savings in memory capacity. These regions, primarily residing in the heap, are dynamic, which makes them difficult to manage statically. Our novel memory allocation library, SBLLmalloc, automatically identifies identical memory blocks and merges them into a single copy. SBLLmalloc does not require application or OS changes since we implement it as a user-level library. Overall, we demonstrate that SBLLmalloc reduces the memory footprint of a range of MPI applications by 32.03% on average and up to 60.87%. Further, SBLLmalloc supports problem sizes for IRS over 21.36% larger than standard memory management techniques allow, thus significantly increasing effective system size. Similarly, SBLLmalloc requires 43.75% fewer nodes than standard memory management techniques to solve an AMG problem. (A simplified sketch of the duplicate-block detection idea follows this record.)
Date: January 28, 2011
Creator: Biswas, S; de Supinski, B R; Schulz, M; Franklin, D; Sherwood, T & Chong, F T
Partner: UNT Libraries Government Documents Department
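
SBLLmalloc itself merges duplicates into a single physical copy inside a user-level allocator; the sketch below illustrates only the detection half, hashing fixed-size blocks and confirming candidates with a byte comparison. The block size, hash function, and quadratic scan are simplifications chosen for this illustration.

```c
/* Sketch: count duplicate fixed-size blocks in a memory region by
 * hashing each block (FNV-1a) and confirming hash matches with
 * memcmp. A real allocator like SBLLmalloc would keep a hash table
 * and remap duplicates to one physical copy instead of counting. */
#include <stdint.h>
#include <string.h>

#define BLOCK 4096  /* illustrative block size (one page) */

static uint64_t block_hash(const unsigned char *p)
{
    uint64_t h = 1469598103934665603ULL;  /* FNV offset basis */
    size_t i;
    for (i = 0; i < BLOCK; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;            /* FNV prime */
    }
    return h;
}

size_t count_duplicate_blocks(const unsigned char *region, size_t nblocks)
{
    size_t i, j, dups = 0;
    for (i = 1; i < nblocks; i++) {       /* O(n^2) scan for brevity */
        uint64_t hi = block_hash(region + i * BLOCK);
        for (j = 0; j < i; j++) {
            if (block_hash(region + j * BLOCK) == hi &&
                memcmp(region + j * BLOCK, region + i * BLOCK, BLOCK) == 0) {
                dups++;
                break;
            }
        }
    }
    return dups;
}
```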

ScalaTrace: Tracing, Analysis and Modeling of HPC Codes at Scale

Description: Characterizing the communication behavior of large-scale applications is a difficult and costly task due to code/system complexity and long execution times. An alternative to running actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. While past approaches lacked lossless, scalable trace collection, we contribute an approach that provides orders-of-magnitude smaller, if not near constant-size, communication traces regardless of the number of nodes, while preserving structural information. We introduce intra- and inter-node compression techniques for MPI events, we develop a scheme to preserve the time and causality of communication events, and we present results of our implementation for BlueGene/L. Given this novel capability, we discuss its impact on communication tuning and on trace extrapolation. To the best of our knowledge, such a concise, scalable representation of MPI traces, combined with time-preserving deterministic MPI call replay, is without precedent. (A drastically simplified sketch of event-stream compression follows this record.)
Date: March 31, 2010
Creator: Mueller, F; Wu, X; Schulz, M; de Supinski, B & Gamblin, T
Partner: UNT Libraries Government Documents Department
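
ScalaTrace's compression recognizes whole loop structures and merges traces across nodes; as a drastically simplified illustration of intra-node compression, the sketch below run-length encodes a stream of MPI event identifiers. The event encoding and names are invented for the example.

```c
/* Sketch: run-length encode a stream of MPI event IDs, so that a loop
 * issuing the same call repeatedly collapses to one entry with a
 * count. ScalaTrace generalizes this far beyond immediate repeats. */
#include <stdio.h>

void rle_encode(const int *events, int n, const char *const *names)
{
    int i = 0;
    while (i < n) {
        int run = 1;
        while (i + run < n && events[i + run] == events[i])
            run++;                       /* extend the current run */
        printf("%s x%d\n", names[events[i]], run);
        i += run;
    }
}

int main(void)
{
    const char *names[] = { "MPI_Send", "MPI_Recv", "MPI_Barrier" };
    int trace[] = { 0, 0, 0, 1, 1, 2, 0, 0 };
    rle_encode(trace, 8, names);   /* prints: MPI_Send x3, MPI_Recv x2, ... */
    return 0;
}
```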