Search Results

Designing a Micro-Mechanical Transistor

Description: This is the final report of a three-year, Laboratory-Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). Micro-mechanical electronic systems are chips with moving parts. They are fabricated with the same techniques that are used to manufacture electronic chips, sharing their low cost. Micro-mechanical chips can also contain electronic components. By combining mechanical parts with electronic parts it becomes possible to process signals mechanically. To achieve designs comparable to those obtained with electronic components it is necessary to have a mechanical device that can change its behavior in response to a small input - a mechanical transistor. The work proposed will develop the design tools for these complex-shaped resonant structures using the geometrical ray technique. To overcome the limitations of geometrical ray chaos, the dynamics of the rays will be studied using the methods developed for the study of nonlinear dynamical systems. This leads to numerical methods that execute well on parallel computer architectures, using a limited amount of memory and no inter-process communication.
Date: June 3, 1999
Creator: Mainieri, R.
Partner: UNT Libraries Government Documents Department
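
The ray-ensemble computation described in the preceding entry is embarrassingly parallel: each ray evolves independently under a nonlinear map, so no inter-process communication is needed. The sketch below illustrates that structure, using the Chirikov standard map as a generic stand-in for the report's actual ray dynamics, which are not specified here.

    # Illustrative only: the standard map stands in for the report's ray
    # dynamics. Each ray depends solely on its own initial condition, so the
    # ensemble splits across processors with no inter-process communication.
    import numpy as np

    def iterate_rays(theta, p, k=0.97, steps=1000):
        """Advance an ensemble of rays under a chaotic area-preserving map."""
        for _ in range(steps):
            p = (p + k * np.sin(theta)) % (2 * np.pi)
            theta = (theta + p) % (2 * np.pi)
        return theta, p

    rng = np.random.default_rng(0)
    theta0 = rng.uniform(0, 2 * np.pi, 10_000)
    p0 = rng.uniform(0, 2 * np.pi, 10_000)
    theta, p = iterate_rays(theta0, p0)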

Creating science-driven computer architecture: A new path to scientific leadership

Description: This document proposes a multi-site strategy for creating a new class of computing capability for the U.S. by undertaking the research and development necessary to build supercomputers optimized for science in partnership with the American computer industry.
Date: October 14, 2002
Creator: McCurdy, C. William; Stevens, Rick; Simon, Horst; Kramer, William; Bailey, David; Johnston, William et al.
Partner: UNT Libraries Government Documents Department

Performance and Accuracy of LAPACK's Symmetric Tridiagonal Eigensolvers

Description: We compare four algorithms from the latest LAPACK 3.1 release for computing eigenpairs of a symmetric tridiagonal matrix: QR iteration, bisection and inverse iteration (BI), the divide-and-conquer method (DC), and the method of Multiple Relatively Robust Representations (MR). Our evaluation considers speed and accuracy when computing all eigenpairs, and additionally subset computations, using carefully selected test problems on a range of today's computer architectures. Our conclusions can be summarized as follows. (1) DC and MR are generally much faster than QR and BI on large matrices. (2) MR almost always does the fewest floating point operations, but at a lower MFlop rate than all the other algorithms. (3) The exact performance of MR and DC strongly depends on the matrix at hand. (4) DC and QR are the most accurate algorithms, with observed accuracy O(√n ε), where ε is machine epsilon. The accuracy of BI and MR is generally O(n ε). (5) MR is preferable to BI for subset computations.
Date: April 19, 2007
Creator: Demmel, Jim W.; Marques, Osni A.; Parlett, Beresford N. & Vomel, Christof
Partner: UNT Libraries Government Documents Department
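
The four drivers compared in the preceding entry can be exercised from SciPy's LAPACK wrappers. A minimal timing sketch, assuming SciPy is available: MRRR maps to the 'stemr' driver, bisection plus inverse iteration to 'stebz', and QR iteration to 'stev'; divide-and-conquer (LAPACK stedc) is not exposed through this particular routine.

    # Rough speed/accuracy comparison of tridiagonal eigensolver drivers.
    import time
    import numpy as np
    from scipy.linalg import eigh_tridiagonal

    n = 1000
    rng = np.random.default_rng(1)
    d = rng.random(n)        # diagonal of the symmetric tridiagonal matrix
    e = rng.random(n - 1)    # off-diagonal

    T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
    for driver in ("stemr", "stebz", "stev"):
        t0 = time.perf_counter()
        w, v = eigh_tridiagonal(d, e, lapack_driver=driver)
        dt = time.perf_counter() - t0
        # Relative residual ||T v - v diag(w)|| / ||T|| as an accuracy proxy.
        resid = np.linalg.norm(T @ v - v * w) / np.linalg.norm(T)
        print(f"{driver}: {dt:.3f} s, residual {resid:.2e}")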

Computational Biology, Advanced Scientific Computing, and Emerging Computational Architectures

Description: This CRADA was established at the start of FY02 with $200K from IBM and matching funds from DOE to support post-doctoral fellows in collaborative research between International Business Machines and Oak Ridge National Laboratory to explore effective use of emerging petascale computational architectures for the solution of computational biology problems. No-cost extensions of the CRADA were negotiated with IBM for FY03 and FY04.
Date: June 27, 2007
Partner: UNT Libraries Government Documents Department

Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture

Description: Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph-based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets of different sizes, from small microbial and fungal genomes to a very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
Date: March 22, 2011
Creator: Sczyrba, Alex; Pratap, Abhishek; Canon, Shane; Han, James; Copeland, Alex; Wang, Zhong et al.
Partner: UNT Libraries Government Documents Department
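
The two-bit nucleotide encoding mentioned in the preceding entry is easy to illustrate: packing k-mers into machine words makes de Bruijn graph nodes cheap to store and compare. The sketch below is a generic illustration, not Convey's hardware implementation.

    # Two bits per base: pack k-mers into integers for de Bruijn graph nodes.
    CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
    BASES = "ACGT"

    def pack_kmer(kmer):
        """Pack a k-mer string into an integer, two bits per base."""
        word = 0
        for base in kmer:
            word = (word << 2) | CODE[base]
        return word

    def unpack_kmer(word, k):
        return "".join(BASES[(word >> (2 * (k - i - 1))) & 3] for i in range(k))

    def kmers(read, k):
        """Yield packed k-mers of a read; consecutive pairs form graph edges."""
        for i in range(len(read) - k + 1):
            yield pack_kmer(read[i : i + k])

    assert unpack_kmer(pack_kmer("GATTACA"), 7) == "GATTACA"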

Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms

Description: We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one of the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
Date: October 16, 2008
Creator: Williams, Samuel; Oliker, Leonid; Vuduc, Richard; Shalf, John; Yelick, Katherine & Demmel, James
Partner: UNT Libraries Government Documents Department
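
For reference, the kernel studied in the preceding entry has the loop structure sketched below for the common compressed sparse row (CSR) format; the paper's optimizations (blocking, prefetch, NUMA-aware threading) all start from this memory-bound loop. This is a plain reference version, not the authors' tuned code.

    # Reference CSR sparse matrix-vector multiply: y = A x.
    import numpy as np

    def spmv_csr(values, col_idx, row_ptr, x):
        n_rows = len(row_ptr) - 1
        y = np.zeros(n_rows)
        for i in range(n_rows):
            for jj in range(row_ptr[i], row_ptr[i + 1]):
                y[i] += values[jj] * x[col_idx[jj]]  # irregular access to x
        return y

    # 2x2 example, A = [[4, 0], [1, 3]]:
    values, col_idx, row_ptr = [4.0, 1.0, 3.0], [0, 0, 1], [0, 1, 3]
    print(spmv_csr(values, col_idx, row_ptr, np.array([1.0, 2.0])))  # [4. 7.]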

Fabric-based systems: model, tools, applications.

Description: A Fabric-Based System is a parameterized cellular architecture in which an array of computing cells communicates with an embedded processor through a global memory. This architecture is customizable to different classes of applications by functional unit, interconnect, and memory parameters, and can be instantiated efficiently on platform FPGAs. In previous work, we demonstrated the advantage of reconfigurable fabrics for image and signal processing applications. Recently, we have built a Fabric Generator (FG), a Java-based toolset that greatly accelerates construction of the fabrics presented in that work. A module-generation library is used to define, instantiate, and interconnect cells' datapaths. FG generates customized sequencers for individual cells or collections of cells. We describe the Fabric-Based System model, the FG toolset, and concrete realizations of fabric architectures generated by FG on the Altera Excalibur ARM that can deliver 4.5 GigaMACs/s (8/16-bit data, Multiply-Accumulate).
Date: January 1, 2003
Creator: Wolinski, C. (Christophe); Gokhale, M. (Maya) & McCabe, K. P. (Kevin P.)
Partner: UNT Libraries Government Documents Department

Hydra: a service oriented architecture for scientific simulation integration

Description: One of the current major challenges in scientific modeling and simulation, in particular in the infrastructure-analysis community, is the development of techniques for efficiently and automatically coupling disparate tools that exist in separate locations on different platforms, are implemented in a variety of languages, and are designed to be standalone. Recent advances in web-based platforms for integrating systems, such as service-oriented architectures (SOA), provide an opportunity to address these challenges in a systematic fashion. This paper describes Hydra, an integrating architecture for infrastructure modeling and simulation that defines geography-based schemas which, when used to wrap existing tools as web services, allow for seamless plug-and-play composability. Existing users of these tools can enhance the value of their analysis by assessing how the simulations of one tool impact the behavior of another tool, and can automate existing ad hoc processes and workflows for integrating tools together.
Date: January 1, 2008
Creator: Bent, Russell; Djidjev, Tatiana; Hayes, Birch P; Holland, Joe V; Khalsa, Hari S; Linger, Steve P et al.
Partner: UNT Libraries Government Documents Department

Non-preconditioned conjugate gradient on cell and FPGA based hybrid supercomputer nodes

Description: This work presents a detailed implementation of a double precision, non-preconditioned Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture™ in conjunction with x86 Opteron™ processors from AMD. We implement a common Conjugate Gradient algorithm on a variety of systems to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, the SRC Computers, Inc. MAPStation SRC-6 FPGA-enhanced hybrid supercomputer, and an Opteron-only configuration. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.
Date: January 1, 2009
Creator: Dubois, David H; Dubois, Andrew J; Boorman, Thomas M & Connor, Carolyn M
Partner: UNT Libraries Government Documents Department
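
The algorithm ported in the preceding entry is the textbook non-preconditioned Conjugate Gradient iteration; a plain numpy version is sketched below for reference (the paper's contribution is the Cell/FPGA implementation, not the algorithm itself).

    # Non-preconditioned Conjugate Gradient for symmetric positive definite A.
    import numpy as np

    def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
        x = np.zeros_like(b)
        r = b - A @ x        # residual
        p = r.copy()         # search direction
        rs = r @ r
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rs / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(conjugate_gradient(A, b))  # approx [0.0909, 0.6364]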

GridRun: A lightweight packaging and execution environment for compact, multi-architecture binaries

Description: GridRun offers a very simple set of tools for creating and executing multi-platform binary executables. These 'fat binaries' archive native machine code into compact packages that are typically a fraction of the size of the original binary images they store, enabling efficient staging of executables for heterogeneous parallel jobs. GridRun interoperates with existing distributed job launchers/managers like Condor and the Globus GRAM to greatly simplify the logic required to launch native binary applications in distributed heterogeneous environments.
Date: February 1, 2004
Creator: Shalf, John & Goodale, Tom
Partner: UNT Libraries Government Documents Department
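
The core idea of the preceding entry, selecting the right native binary from a multi-architecture package at launch time, can be sketched in a few lines. The package layout and file names below are hypothetical illustrations, not GridRun's actual format.

    # Hypothetical fat-binary launcher: pick the member matching this host.
    import os
    import platform
    import subprocess
    import sys

    def run_fat_binary(package_dir, args):
        arch = f"{sys.platform}-{platform.machine()}"  # e.g. 'linux-x86_64'
        candidate = os.path.join(package_dir, arch, "app")
        if not os.path.exists(candidate):
            raise RuntimeError(f"no binary packaged for architecture {arch}")
        return subprocess.run([candidate, *args]).returncode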

A brief comparison between grid based real space algorithms and spectrum algorithms for electronic structure calculations

Description: Quantum mechanical ab initio calculation constitutes the biggest portion of the computer time in materials science and chemical science simulations. For a computer center like NERSC to better serve these communities, it is very useful to have a prediction of the future trends of ab initio calculations in these areas. Such a prediction can help us decide what future computer architectures will be most useful for these communities, and what should be emphasized in future supercomputer procurements. As the size of the computer and the size of the simulated physical systems increase, there is a renewed interest in using the real space grid method in electronic structure calculations. This is fueled by two factors. First, it is generally assumed that the real space grid method is more suitable for parallel computation because of its limited communication requirement, compared with the spectrum method, where a global FFT is required. Second, as the size N of the calculated system increases together with the computer power, O(N) scaling approaches become more favorable than the traditional direct O(N³) scaling methods. These O(N) methods are usually based on localized orbitals in real space, which can be described more naturally by a real space basis. In this report, the author compares the real space methods with the traditional plane wave (PW) spectrum methods, considering their technical pros and cons and possible future trends. For the real space method, the author focuses on the regular grid finite difference (FD) method and the finite element (FE) method. These are the methods used mostly in materials science simulation. As for chemical science, the predominant methods are still the Gaussian basis method, and sometimes the atomic orbital basis method. These two basis sets are localized in real space, and there is no indication that their roles in quantum ...
Date: December 1, 2006
Creator: Wang, Lin-Wang
Partner: UNT Libraries Government Documents Department
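
The communication argument in the preceding entry can be seen in miniature by applying a 1-D Laplacian two ways: a finite-difference stencil touches only neighboring grid points, while the spectral version needs a global FFT over all of the data. A small numpy sketch:

    # Local stencil vs. global FFT for the Laplacian of f(x) = sin(3x),
    # whose exact Laplacian is -9 sin(3x).
    import numpy as np

    n, L = 256, 2 * np.pi
    x = np.linspace(0, L, n, endpoint=False)
    h = L / n
    f = np.sin(3 * x)

    # Finite differences: three-point stencil, nearest-neighbor data only.
    lap_fd = (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / h**2

    # Spectral: multiply by -k^2 in Fourier space; the FFT is global.
    k = 2 * np.pi * np.fft.fftfreq(n, d=h)
    lap_sp = np.fft.ifft(-(k**2) * np.fft.fft(f)).real

    print(np.max(np.abs(lap_fd + 9 * f)))  # O(h^2) stencil error
    print(np.max(np.abs(lap_sp + 9 * f)))  # near machine precision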

Design Space Exploration of Domain Specific CGRAs Using Crowd-sourcing

Description: CGRAs (coarse grained reconfigurable array architectures) try to fill the gap between FPGAs and ASICs. Over three decades, research on CGRA design has produced a number of architectures. Each of these designs lies at a different point on a line drawn between FPGAs and ASICs, depending on the tradeoffs and design choices made during the design of the architecture. Thus, design space exploration (DSE) plays a very important role in the circuit design process. In this work I propose that the design space exploration of CGRAs can be done quickly and efficiently through crowd-sourcing and a game-driven approach based on an interactive mapping game, UNTANGLED, and a design environment called SmartBricks. Both UNTANGLED and SmartBricks have been developed by our research team at the Reconfigurable Computing Lab, UNT. I present the results of design space exploration of domain-specific reconfigurable architectures, comparing stripe versus mesh styles and heterogeneous versus homogeneous fabrics. I also compare the results obtained from different interconnection topologies in the mesh style. These results show that this approach offers quick DSE for designers and also provides low power architectures for a suite of benchmarks. All results were obtained using standard cell ASICs with a 90 nm process.
Date: August 2014
Creator: Sistla, Anil Kumar
Partner: UNT Libraries

Using a Transfer Function to Describe the Load-Balancing Problem

Description: The dynamic load-balancing problem for mesh-connected parallel computers can be clearly described by introducing a function that identifies how much work is to be transmitted between neighboring processors. This function is a solution to an elliptic problem for which a wealth of knowledge exists. The non-uniqueness of the solution to the load-balancing problem is made explicit.
Date: November 1993
Creator: Conley, Andrew J.
Partner: UNT Libraries Government Documents Department
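
A 1-D illustration of the transfer function described in the preceding entry, under the usual diffusion-style formulation (a sketch of the idea, not the report's exact construction): solve a discrete Poisson problem whose right-hand side is the load imbalance, and read the work to transmit between neighbors off the solution's differences. The non-uniqueness appears as the arbitrary additive constant in the potential.

    # Diffusion-style load balancing on a 4-processor chain.
    import numpy as np

    load = np.array([9.0, 5.0, 2.0, 4.0])   # work per processor
    rho = load - load.mean()                 # imbalance, sums to zero

    # Graph Laplacian of the chain; it is singular, so the potential phi is
    # determined only up to an additive constant (the non-uniqueness above).
    Lap = np.array([[ 1., -1.,  0.,  0.],
                    [-1.,  2., -1.,  0.],
                    [ 0., -1.,  2., -1.],
                    [ 0.,  0., -1.,  1.]])
    phi = np.linalg.lstsq(Lap, rho, rcond=None)[0]

    transfer = phi[:-1] - phi[1:]            # work sent from i to i+1
    new_load = load.copy()
    new_load[:-1] -= transfer
    new_load[1:] += transfer
    print(transfer, new_load)                # new_load is uniform: all 5.0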

Decoherence and a simple quantum computer

Description: The authors analyze the effect of decoherence on the operation of part of a simple quantum computer. The results indicate that quantum bit coding techniques may be used to mitigate the effects of two sources of decoherence - amplitude damping and phase randomization.
Date: October 1, 1995
Creator: Chuang, I.L.; Yamamoto, Y. & Laflamme, R.
Partner: UNT Libraries Government Documents Department
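
The two decoherence sources named in the preceding entry have standard Kraus-operator forms, which the sketch below applies to a qubit density matrix (textbook channels, not the authors' specific model).

    # Amplitude damping and phase damping acting on a qubit density matrix.
    import numpy as np

    def apply_channel(rho, kraus_ops):
        return sum(E @ rho @ E.conj().T for E in kraus_ops)

    def amplitude_damping(gamma):
        E0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]])
        E1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
        return [E0, E1]

    def phase_damping(lam):  # randomizes phase, preserves populations
        E0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - lam)]])
        E1 = np.array([[0.0, 0.0], [0.0, np.sqrt(lam)]])
        return [E0, E1]

    plus = np.array([[0.5, 0.5], [0.5, 0.5]])  # |+><+|, maximal coherence
    print(apply_channel(plus, amplitude_damping(0.1)))
    print(apply_channel(plus, phase_damping(0.1)))  # off-diagonals shrink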

Transitive closure on the imagine stream processor

Description: The increasing gap between processor and memory speeds is a well-known problem in modern computer architecture. The Imagine system is designed to address the processor-memory gap through streaming technology. Stream processors are best suited for computationally intensive applications characterized by high data parallelism and producer-consumer locality with minimal data dependencies. This work examines an efficient streaming implementation of the computationally intensive Transitive Closure (TC) algorithm on the Imagine platform. We develop a tiled TC algorithm specifically for the Imagine environment, which efficiently reuses streams to minimize expensive off-chip data transfers. The implementation requires complex stream programming, since the memory hierarchy and cluster organization of the underlying architecture are exposed to the Imagine programmer. Results demonstrate that TC achieves only limited performance, primarily due to the complicated data dependencies of the blocked algorithm. This work is an ongoing effort to identify classes of scientific problems well-suited for streaming processors.
Date: November 11, 2003
Creator: Griem, Gorden & Oliker, Leonid
Partner: UNT Libraries Government Documents Department
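
For reference, the untiled computation that the preceding entry blocks for the Imagine stream processor is the Floyd-Warshall-style boolean closure sketched below; the streaming version tiles this loop nest to keep data on-chip.

    # Transitive closure of a directed graph: reach[i, j] iff j reachable from i.
    import numpy as np

    def transitive_closure(adj):
        reach = adj.copy()
        for k in range(reach.shape[0]):
            # Any path i -> k -> j makes j reachable from i.
            reach |= np.outer(reach[:, k], reach[k, :])
        return reach

    adj = np.array([[0, 1, 0],
                    [0, 0, 1],
                    [0, 0, 0]], dtype=bool)
    print(transitive_closure(adj).astype(int))  # the edge 0 -> 2 appears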

Radiation transport algorithms on trans-petaflops supercomputers of different architectures.

Description: We seek to understand which supercomputer architecture will be best at the Petaflops scale and beyond. The process we use is to predict the cost and performance of several leading architectures at various years in the future. The basis for predicting the future is an expanded version of Moore's Law called the International Technology Roadmap for Semiconductors (ITRS). We abstract leading supercomputer architectures into chips connected by wires, where the chips and wires have electrical parameters predicted by the ITRS. We then compute the cost of a supercomputer system and the run time on a key problem of interest to the DOE (radiation transport). These calculations are parameterized by the time into the future and the technology expected to be available at that point. We find the new advanced architectures have substantial performance advantages, but conventional designs are likely to be less expensive (due to economies of scale). We do not find a universal 'winner'; instead, the right architectural choice is likely to involve non-technical factors such as the availability of capital and how long people are willing to wait for results.
Date: August 1, 2003
Creator: Christopher, Thomas Woods
Partner: UNT Libraries Government Documents Department

GPU COMPUTING FOR PARTICLE TRACKING

Description: This is a feasibility study of using a modern Graphics Processing Unit (GPU) to parallelize an accelerator particle tracking code. To demonstrate the massive parallelization features provided by GPU computing, a simplified TracyGPU program is developed for dynamic aperture calculation. Performance, issues, and challenges from introducing the GPU are also discussed. General Purpose computation on Graphics Processing Units (GPGPU) brings massive parallel computing capabilities to numerical calculation. However, the unique architecture of the GPU requires a comprehensive understanding of the hardware and programming model in order to optimize existing applications well. In the field of accelerator physics, the dynamic aperture calculation of a storage ring, which is often the most time-consuming part of accelerator modeling and simulation, can benefit from the GPU due to its embarrassingly parallel nature, which fits well with the GPU programming model. In this paper, we use the Tesla C2050 GPU, which consists of 14 multiprocessors (MPs) with 32 cores on each MP, for a total of 448 cores, to host thousands of threads dynamically. A thread is a logical execution unit of the program on the GPU. In the GPU programming model, threads are grouped into a collection of blocks. Within each block, multiple threads share the same code and up to 48 KB of shared memory. Multiple thread blocks form a grid, which is executed as a GPU kernel. A simplified code that is a subset of Tracy++ [2] is developed to demonstrate the possibility of using the GPU to speed up the dynamic aperture calculation by having each thread track a particle.
Date: March 25, 2011
Creator: Nishimura, Hiroshi; Song, Kai; Muriki, Krishna; Sun, Changchun; James, Susan & Qin, Yong
Partner: UNT Libraries Government Documents Department
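
The one-thread-per-particle mapping described in the preceding entry can be sketched with Numba's CUDA support (requires an NVIDIA GPU). The one-turn map below is a bare linear rotation standing in for Tracy's lattice tracking; all names and parameters are illustrative.

    # Each CUDA thread tracks one particle through `turns` one-turn maps.
    import math
    import numpy as np
    from numba import cuda

    @cuda.jit
    def track(x, xp, mu, turns):
        i = cuda.grid(1)              # global thread index = particle index
        if i < x.shape[0]:
            c, s = math.cos(mu), math.sin(mu)
            for _ in range(turns):    # illustrative linear one-turn map
                xn = c * x[i] + s * xp[i]
                xp[i] = -s * x[i] + c * xp[i]
                x[i] = xn

    n = 1 << 16
    x = cuda.to_device(np.random.rand(n))
    xp = cuda.to_device(np.zeros(n))
    threads = 256                          # threads per block
    blocks = (n + threads - 1) // threads  # blocks forming the grid
    track[blocks, threads](x, xp, 0.123, 1000)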

SEARCH FOR A RELIABLE STORAGE ARCHITECTURE FOR RHIC.

Description: Software used to operate the Relativistic Heavy Ion Collider (RHIC) resides on one operational RAID storage system. This storage system is also used to store data that reflects the status and recent history of accelerator operations. Failure of this system interrupts the operation of the accelerator as backup systems are brought online. In order to increase the reliability of this critical control system component, the storage system architecture has been upgraded to use Storage Area Network (SAN) technology and to introduce redundant components and redundant storage paths. This paper describes the evolution of the storage system, the contributions to reliability that each additional feature has provided, further improvements that are being considered, and real-life experience with the current system.
Date: October 15, 2007
Creator: Binello, S.; Katz, R. A. & Morris, J. T.
Partner: UNT Libraries Government Documents Department

DE-FG02-04ER25606 Identity Federation and Policy Management Guide: Final Report

Description: The goal of this 3-year project was to facilitate a more productive dynamic matching between resource providers and resource consumers in Grid environments by explicitly specifying policies. Broadly, two problems were addressed by this project. First, there was a lack of an Open Grid Services Architecture (OGSA)-compliant mechanism for expressing, storing, and retrieving user policies and Virtual Organization (VO) policies. Second, there was a lack of tools to resolve and enforce policies in the Open Grid Services Architecture. To address these problems, our overall approach in this project was to make all policies explicit (e.g., virtual organization policies, resource provider policies, resource consumer policies), thereby facilitating policy matching and policy negotiation. Policies defined on a per-user basis were created, held, and updated in MyPolMan, thereby allowing a Grid user to centralize (where appropriate) and manage his or her policies. Organizationally, the corresponding service was VOPolMan, in which the policies of the Virtual Organization are expressed, managed, and dynamically consulted. Overall, we successfully defined, prototyped, and evaluated policy-based resource management and access control for OGSA-based Grids. This DOE project partially supported 17 peer-reviewed publications on a number of different topics: general security for Grids, credential management, Web services/OGSA/OGSI, policy-based grid authorization (for remote execution and for access to information), policy-directed Grid data movement/placement, policies for large-scale virtual organizations, and large-scale policy-aware grid architectures. In addition to supporting the PI, this project partially supported the training of 5 PhD students.
Date: May 25, 2011
Creator: Humphrey, Marty, A
Partner: UNT Libraries Government Documents Department

The Integrated Plasma Simulator: A Flexible Python Framework for Coupled Multiphysics Simulation

Description: High-fidelity coupled multiphysics simulations are an increasingly important aspect of computational science. In many domains, however, there has been very limited experience with simulations of this sort; therefore, research in coupled multiphysics often requires computational frameworks with significant flexibility to respond to the changing directions of the physics and mathematics. This paper presents the Integrated Plasma Simulator (IPS), a framework designed for loosely coupled simulations of fusion plasmas. The IPS provides users with a simple component architecture into which a wide range of existing plasma physics codes can be inserted as components. Simulations can take advantage of multiple levels of parallelism supported in the IPS, and can be controlled by a high-level "driver" component, or by other coordination mechanisms, such as an asynchronous event service. We describe the requirements and design of the framework, and how they were implemented in the Python language. We also illustrate the flexibility of the framework by providing examples of different types of simulations that utilize various features of the IPS.
Date: November 1, 2011
Creator: Foley, Samantha S; Elwasif, Wael R & Bernholdt, David E
Partner: UNT Libraries Government Documents Department
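
The component/driver pattern provided by the framework in the preceding entry can be miniaturized as below. Class and method names are hypothetical stand-ins, not the actual IPS API.

    # Minimal component framework: a driver owns the time loop and calls
    # loosely coupled components in order, much as an IPS driver component does.
    class Component:
        """Physics codes are wrapped as step-wise components."""
        def step(self, t):
            raise NotImplementedError

    class EquilibriumSolver(Component):
        def step(self, t):
            print(f"equilibrium solve at t={t:.1f}")

    class RFHeating(Component):
        def step(self, t):
            print(f"RF heating source at t={t:.1f}")

    class Driver:
        """High-level driver coordinating the coupled simulation."""
        def __init__(self, components):
            self.components = components

        def run(self, t_end, dt):
            t = 0.0
            while t < t_end:
                for comp in self.components:
                    comp.step(t)
                t += dt

    Driver([EquilibriumSolver(), RFHeating()]).run(t_end=0.3, dt=0.1)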

Peridigm summary report : lessons learned in development with agile components.

Description: This report details efforts to deploy Agile Components for rapid development of a peridynamics code, Peridigm. The goal of Agile Components is to enable the efficient development of production-quality software by providing a well-defined, unifying interface to a powerful set of component-based software. Specifically, Agile Components facilitate interoperability among packages within the Trilinos Project, including data management, time integration, uncertainty quantification, and optimization. Development of the Peridigm code served as a testbed for Agile Components and resulted in a number of recommendations for future development. Agile Components successfully enabled rapid integration of Trilinos packages into Peridigm. A cost of this approach, however, was a set of restrictions on Peridigm's architecture which impacted the ability to track history-dependent material data, dynamically modify the model discretization, and interject user-defined routines into the time integration algorithm. These restrictions resulted in modifications to the Agile Components approach, as implemented in Peridigm, and in a set of recommendations for future Agile Components development. Specific recommendations include improved handling of material states, a more flexible flow control model, and improved documentation. A demonstration mini-application, SimpleODE, was developed at the onset of this project and is offered as a potential supplement to Agile Components documentation.
Date: September 1, 2011
Creator: Salinger, Andrew Gerhard; Mitchell, John Anthony; Littlewood, David John & Parks, Michael L.
Partner: UNT Libraries Government Documents Department

Performance Portability for Unstructured Mesh Physics

Description: ASC legacy software must be ported to emerging hardware architectures. This paper notes that many programming models used by DOE applications are similar, and suggests that constructing a common terminology across these models could reveal a performance portable programming model. The paper then highlights how the LULESH mini-app is used to explore new programming models with outside solution providers. Finally, we suggest better tools to identify parallelism in software, and give suggestions for enhancing the co-design process with vendors.
Date: March 23, 2012
Creator: Keasler, J A
Partner: UNT Libraries Government Documents Department