457 Matching Results

Search Results


Neutron beam irradiation study of workload dependence of SER in a microprocessor

Description: Workloads are known to be an important factor in soft error rates (SER), but workloads that differentiate microprocessor SER have proven difficult to find. We have performed neutron beam irradiation studies of a commercial microprocessor under a wide variety of workload conditions, from idle, performing no operations, to very busy workloads resembling real HPC, graphics, and business applications. There is evidence that the mean times to first indication of failure (MTFIF, defined in Section II) may differ for some of the applications.
Date: January 1, 2009
Creator: Michalak, Sarah E.; Graves, Todd L.; Hong, Ted; Ackaret, Jerry; Rao, Sonny; Mitra, Subhasish et al.
Partner: UNT Libraries Government Documents Department

Fault tolerant issues in the BTeV trigger

Description: The BTeV trigger performs sophisticated computations using large ensembles of FPGAs, DSPs, and conventional microprocessors. This system will have between 5,000 and 10,000 computing elements and many networks and data switches. While much attention has been devoted to developing efficient algorithms, the need for fault-tolerant, fault-adaptive, and flexible techniques and software to manage this huge computing platform has been identified as one of the most challenging aspects of this project. We describe the problem and offer an approach to solving it based on a distributed, hierarchical fault management system.
Date: December 3, 2002
Creator: Appel, Jeffrey A. et al.
Partner: UNT Libraries Government Documents Department

Stream Monitoring and Control Team

Description: This presentation discusses research on multiple sensor clusters that monitor environmental health. The goal of the project was to create a series of sensor clusters, arranged into an array of nodes, that feeds key stream health data to an easily accessible database using an ad hoc wireless network.
Date: 2013
Creator: Bunn, Zac; McEver, Mike; Seastrunk, Deliah; Fu, Shengli; Hoeinghaus, David & Gu, Yixing
Partner: UNT College of Engineering

Scalable Performance Measurement and Analysis

Description: Concurrency levels in large-scale, distributed-memory supercomputers are rising exponentially. Modern machines may contain 100,000 or more microprocessor cores, and the largest of these, IBM's Blue Gene/L, contains over 200,000 cores. Future systems are expected to support millions of concurrent tasks. In this dissertation, we focus on efficient techniques for measuring and analyzing the performance of applications running on very large parallel machines. Tuning the performance of large-scale applications can be a subtle and time-consuming task because application developers must measure and interpret data from many independent processes. While the volume of the raw data scales linearly with the number of tasks in the running system, the number of tasks is growing exponentially, and data for even small systems quickly becomes unmanageable. Transporting performance data from so many processes over a network can perturb application performance and make measurements inaccurate, and storing such data would require a prohibitive amount of space. Moreover, even if it were stored, analyzing the data would be extremely time-consuming. In this dissertation, we present novel methods for reducing performance data volume. The first draws on multi-scale wavelet techniques from signal processing to compress systemwide, time-varying load-balance data. The second uses statistical sampling to select a small subset of running processes to generate low-volume traces. A third approach combines sampling and wavelet compression to stratify performance data adaptively at run-time and to reduce further the cost of sampled tracing. We have integrated these approaches into Libra, a toolset for scalable load-balance analysis. We present Libra and show how it can be used to analyze data from large scientific applications scalably.
Date: October 27, 2009
Creator: Gamblin, T
Partner: UNT Libraries Government Documents Department
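
The wavelet-compression idea above can be sketched in a few lines. The following is an illustrative Haar-transform example, not Libra's actual transform or thresholding: load-balance data is often smooth across ranks, so most detail coefficients are tiny and can be dropped.

```python
# Illustrative Haar-wavelet compression of per-rank load data (a sketch of the
# general idea, not Libra's algorithm): smooth regions produce near-zero
# detail coefficients, which a threshold then discards.

def haar_forward(signal):
    """Full Haar decomposition of a length-2^k list of samples."""
    coeffs, details = list(signal), []
    while len(coeffs) > 1:
        avgs = [(coeffs[2*i] + coeffs[2*i+1]) / 2 for i in range(len(coeffs) // 2)]
        diffs = [(coeffs[2*i] - coeffs[2*i+1]) / 2 for i in range(len(coeffs) // 2)]
        details = diffs + details
        coeffs = avgs
    return coeffs + details  # [overall average] + detail coefficients

def compress(signal, threshold):
    """Keep only (index, coefficient) pairs with magnitude >= threshold."""
    return [(i, c) for i, c in enumerate(haar_forward(signal)) if abs(c) >= threshold]

load = [10.0, 10.1, 9.9, 10.0, 30.0, 29.8, 10.2, 10.1]  # one hot spot at ranks 4-5
sparse = compress(load, threshold=0.5)
print(len(sparse), "of", len(load), "coefficients kept")  # → 3 of 8
```

Only the coefficients describing the hot spot survive; the near-uniform background collapses into the single overall average.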

Using reconfigurable functional units in conventional microprocessors.

Description: Scientific applications use highly specialized data structures that require complex, latency-sensitive graphs of integer instructions for memory address calculations. Working with the University of Wisconsin, we have demonstrated significant differences between Sandia's applications and the industry-standard SPEC-FP (Standard Performance Evaluation Corporation floating point) suite. Specifically, integer dataflow performance is critical to overall system performance. To improve this performance, we have developed a configurable functional unit design that is capable of accelerating integer dataflow.
Date: September 1, 2010
Creator: Rodrigues, Arun F.
Partner: UNT Libraries Government Documents Department

Embedded Software for the CEBAF RF Control Module

Description: The CEBAF accelerator control system employs a distributed computer strategy. As part of this strategy, the RF control sub-system uses 342 RF Control Modules, one for each of four warm section beam forming cavities (i.e., choppers, buncher, capture) and 338 superconducting accelerating cavities. Each control module has its own microprocessor, which provides local intelligence to automatically control over 100 parameters, while keeping the user interface simple. The microprocessor controls analog and digital I/O, including the phase and gradient section, high power amplifier (HPA), and interlocks. Presently, the embedded code is used to commission the 14 RF control modules in the injector. This paper describes the operational experience of this complex real-time control system.
Date: May 1, 1991
Creator: West, C.; Lahti, George & Ashkenazi, I.
Partner: UNT Libraries Government Documents Department

Identifying Energy-Efficient Concurrency Levels using Machine Learning

Description: Multicore microprocessors have been largely motivated by the diminishing returns in performance and the increased power consumption of single-threaded ILP microprocessors. With the industry already shifting from multicore to many-core microprocessors, software developers must extract more thread-level parallelism from applications. Unfortunately, low power-efficiency and diminishing returns in performance remain major obstacles with many cores. Poor interaction between software and hardware, and bottlenecks in shared hardware structures often prevent scaling to many cores, even in applications where a high degree of parallelism is potentially available. In some cases, throwing additional cores at a problem may actually harm performance and increase power consumption. Putting cores that would otherwise provide limited benefit to better use in software components such as hypervisors and operating systems can improve system-wide performance and reliability, even in cases where power consumption is not a main concern. In response to these observations, we evaluate an approach to throttle concurrency in parallel programs dynamically. We throttle concurrency to levels with higher predicted efficiency from both performance and energy standpoints, and we do so via machine learning, specifically artificial neural networks (ANNs). One advantage of using ANNs over similar techniques previously explored is that the training phase is greatly simplified, thereby reducing the burden on the end user. Using machine learning in the context of concurrency throttling is novel. We show that ANNs are effective for identifying energy-efficient concurrency levels in multithreaded scientific applications, and we do so using physical experimentation on a state-of-the-art quad-core Xeon platform.
Date: July 23, 2007
Creator: Curtis-Maury, M; Singh, K; Blagojevic, F; Nikolopoulos, D S; de Supinski, B R; Schulz, M et al.
Partner: UNT Libraries Government Documents Department
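
At run time, the throttling decision described above reduces to scoring each candidate concurrency level with a trained model and picking the best. The sketch below uses a tiny feed-forward network whose weights and feature vector are invented for the example; the actual system trains its ANNs on hardware-counter data from real runs.

```python
import math

# Illustrative ANN-based concurrency throttling. The network weights and the
# feature vector below are made up for the example; the real system trains on
# hardware-counter measurements collected on the target machine.

def ann_predict(features, w_hidden, w_out):
    """Forward pass of a one-hidden-layer network with tanh units."""
    hidden = [math.tanh(sum(w * x for w, x in zip(ws, features))) for ws in w_hidden]
    return sum(w * h for w, h in zip(w_out, hidden))

def best_concurrency(candidates, feature_fn, w_hidden, w_out):
    """Pick the thread count with the highest predicted efficiency."""
    return max(candidates, key=lambda t: ann_predict(feature_fn(t), w_hidden, w_out))

# Toy feature vector: normalized thread count, its square, and a bias term.
def features(threads):
    x = threads / 8.0
    return [x, x * x, 1.0]

W_HIDDEN = [[2.0, -2.0, 0.1]]  # one hidden unit; predicted efficiency peaks at 4 threads
W_OUT = [1.0]

print(best_concurrency([1, 2, 4, 8], features, W_HIDDEN, W_OUT))  # → 4
```

With these toy weights the model prefers 4 threads over 8, illustrating the paper's point that adding cores past the efficient concurrency level can hurt.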

Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture

Description: The disparity between microprocessor clock frequencies and memory latency is a primary reason why many demanding applications run well below peak achievable performance. Software controlled scratchpad memories, such as the Cell local store, attempt to ameliorate this discrepancy by enabling precise control over memory movement; however, scratchpad technology confronts the programmer and compiler with an unfamiliar and difficult programming model. In this work, we present the Virtual Vector Architecture (ViVA), which combines the memory semantics of vector computers with a software-controlled scratchpad memory in order to provide a more effective and practical approach to latency hiding. ViVA requires minimal changes to the core design and could thus be easily integrated with conventional processor cores. To validate our approach, we implemented ViVA on the Mambo cycle-accurate full system simulator, which was carefully calibrated to match the performance on our underlying PowerPC Apple G5 architecture. Results show that ViVA is able to deliver significant performance benefits over scalar techniques for a variety of memory access patterns as well as two important memory-bound compact kernels, corner turn and sparse matrix-vector multiplication -- achieving a 2x-13x improvement compared to the scalar version. Overall, our preliminary ViVA exploration points to a promising approach for improving application performance on leading microprocessors with minimal design and complexity costs, in a power-efficient manner.
Date: January 12, 2009
Creator: Gebis, Joseph; Oliker, Leonid; Shalf, John; Williams, Samuel & Yelick, Katherine
Partner: UNT Libraries Government Documents Department

Evaluation of soft-core processors on a Xilinx Virtex-5 field programmable gate array.

Description: Node-based architecture (NBA) designs for future satellite projects hold the promise of decreasing system development time and costs, size, weight, and power and positioning the laboratory to address other emerging mission opportunities quickly. Reconfigurable field programmable gate array (FPGA)-based modules will comprise the core of several of the NBA nodes. Microprocessing capabilities will be necessary with varying degrees of mission-specific performance requirements on these nodes. To enable the flexibility of these reconfigurable nodes, it is advantageous to incorporate the microprocessor into the FPGA itself, either as a hard-core processor built into the FPGA or as a soft-core processor built out of FPGA elements. This document describes the evaluation of three reconfigurable FPGA-based soft-core processors for use in future NBA systems: the MicroBlaze (uB), the open-source Leon3, and the licensed Leon3. Two standard performance benchmark applications were developed for each processor. The first, Dhrystone, is a fixed-point operation metric. The second, Whetstone, is a floating-point operation metric. Several trials were run at varying code locations, loop counts, processor speeds, and cache configurations. FPGA resource utilization was recorded for each configuration.
Date: April 1, 2011
Creator: Learn, Mark Walter
Partner: UNT Libraries Government Documents Department

Scientific Computing Kernels on the Cell Processor

Description: The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the recently-released STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. First, we introduce a performance model for Cell and apply it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs. The difficulty of programming Cell, which requires assembly level intrinsics for the best performance, makes this model useful as an initial step in algorithm design and evaluation. Next, we validate the accuracy of our model by comparing results against published hardware results, as well as our own implementations on a 3.2 GHz Cell blade. Additionally, we compare Cell performance to benchmarks run on leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1E) architectures. Our work also explores several different mappings of the kernels and demonstrates a simple and effective programming model for Cell's unique architecture. Finally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.
Date: April 4, 2007
Creator: Williams, Samuel W.; Shalf, John; Oliker, Leonid; Kamil, Shoaib; Husbands, Parry & Yelick, Katherine
Partner: UNT Libraries Government Documents Department
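
A performance model of the kind introduced above can be as simple as a pair of bounds: a kernel can run no faster than the compute rate allows and no faster than memory traffic allows. The rates below are illustrative placeholders, not the Cell's actual peak specifications.

```python
# Bound-based ("roofline"-style) performance model sketch: execution time is
# at least the compute time and at least the memory-transfer time, so the
# tighter bound determines predicted throughput. Peak rates are illustrative.

def predicted_gflops(flops, bytes_moved, peak_gflops, peak_gbs):
    compute_time = flops / (peak_gflops * 1e9)    # seconds if compute-bound
    memory_time = bytes_moved / (peak_gbs * 1e9)  # seconds if bandwidth-bound
    return flops / max(compute_time, memory_time) / 1e9

# SpMV-like kernel: ~2 flops per 12 bytes moved, so memory-bound on most chips.
print(predicted_gflops(flops=2e9, bytes_moved=12e9, peak_gflops=200.0, peak_gbs=25.0))
```

For a low-arithmetic-intensity kernel like sparse matrix-vector multiply, the model predicts only a few GFLOP/s on a machine whose compute peak is two orders of magnitude higher, which is exactly the kind of gap such models are built to expose.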

Lattice Boltzmann Simulation Optimization on Leading Multicore Platforms

Description: We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to a lattice Boltzmann application (LBMHD) that historically has made poor use of scalar microprocessors due to its complex data structures and memory access patterns. We explore one of the broadest sets of multicore architectures in the HPC literature, including the Intel Clovertown, AMD Opteron X2, Sun Niagara2, STI Cell, as well as the single core Intel Itanium2. Rather than hand-tuning LBMHD for each system, we develop a code generator that allows us to identify a highly optimized version for each platform while amortizing the human programming effort. Results show that our auto-tuned LBMHD application achieves up to a 14x improvement compared with the original code. Additionally, we present a detailed analysis of each optimization, which reveals surprising hardware bottlenecks and software challenges for future multicore systems and applications.
Date: February 1, 2008
Creator: Williams, Samuel; Carter, Jonathan; Oliker, Leonid; Shalf, John & Yelick, Katherine
Partner: UNT Libraries Government Documents Department
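
The essence of the auto-tuning strategy is mechanical: enumerate parameterized variants, time each on the target machine, and keep the fastest. A toy version, with a trivial unrolled-summation kernel standing in for LBMHD's far more complex stencils and transformation space:

```python
import time

# Auto-tuning sketch: time each parameterized kernel variant on the target
# machine and keep the fastest. The unrolled-summation "kernel" is only a
# stand-in for the real application's stencil kernels.

def kernel_variant(data, unroll):
    """Sum a list in chunks of `unroll` elements (a stand-in kernel)."""
    total, n = 0.0, len(data) - len(data) % unroll
    for i in range(0, n, unroll):
        total += sum(data[i:i + unroll])
    return total + sum(data[n:])

def autotune(data, candidates, reps=3):
    """Return the candidate parameter with the best measured time."""
    timings = {}
    for unroll in candidates:
        start = time.perf_counter()
        for _ in range(reps):
            kernel_variant(data, unroll)
        timings[unroll] = time.perf_counter() - start
    return min(timings, key=timings.get)

data = [1.0] * 10000
print("best unroll:", autotune(data, [1, 2, 4, 8]))
```

The winning parameter differs from machine to machine, which is precisely why the search is run on each target rather than chosen once by hand.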

Calibration and Operation Schemes for CEBAF RF Control

Description: The RF control system for the CEBAF accelerator uses calibration tables to calibrate and linearize critical components in the RF control modules. This includes compensation for temperature drifts. Calibration data are stored in nonvolatile RAM on the CPU board in the control module. Algorithms for calibration of components like the vector modulator for the phase reference and the gradient detector are described. The calibration will be performed in a dedicated test stand which will be completely automated. The microprocessor in the control modules allows running of complex algorithms to achieve phase lock and optimize system gains for minimum residual errors for different gradients and beam loading.
Date: September 1, 1990
Creator: Ashkenazi, I.; Hovater, J.; Fugitt, Jock; Mahoney, Kelly & Simrock, Stefan
Partner: UNT Libraries Government Documents Department
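
Table-driven linearization of the kind described can be illustrated with piecewise-linear interpolation over (raw, true) calibration pairs; the table values below are invented for the example and stand in for data measured on the automated test stand.

```python
import bisect

# Illustrative calibration-table linearization: a raw reading from a nonlinear
# component is mapped back to its true value by interpolating between stored
# calibration points. Table values are invented for the example.

CAL_TABLE = [  # (raw reading, true value), sorted by raw reading
    (0.00, 0.0), (0.90, 1.0), (1.70, 2.0), (2.40, 3.0), (3.00, 4.0),
]

def linearize(raw):
    xs = [r for r, _ in CAL_TABLE]
    i = min(max(bisect.bisect_right(xs, raw), 1), len(xs) - 1)
    (x0, y0), (x1, y1) = CAL_TABLE[i - 1], CAL_TABLE[i]
    return y0 + (raw - x0) * (y1 - y0) / (x1 - x0)

print(linearize(1.30))  # midway between the 0.90 and 1.70 entries → 1.5
```

A real control module would store such a table in nonvolatile RAM and may add temperature-dependent corrections on top of it.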

Portable and Transparent Message Compression in MPI Libraries to Improve the Performance and Scalability of Parallel Applications

Description: The goal of this project has been to develop a lossless compression algorithm for message-passing libraries that can accelerate HPC systems by reducing the communication time. Because both compression and decompression have to be performed in software in real time, the algorithm has to be extremely fast while still delivering a good compression ratio. During the first half of this project, they designed a new compression algorithm called FPC for scientific double-precision data, made the source code available on the web, and published two papers describing its operation, the first in the proceedings of the Data Compression Conference and the second in the IEEE Transactions on Computers. At comparable average compression ratios, this algorithm compresses and decompresses 10 to 100 times faster than BZIP2, DFCM, FSD, GZIP, and PLMI on the three architectures tested. With prediction tables that fit into the CPU's L1 data cache, FPC delivers a guaranteed throughput of six gigabits per second on a 1.6 GHz Itanium 2 system. The C source code and documentation of FPC are posted on-line and have already been downloaded hundreds of times. To evaluate FPC, they gathered 13 real-world scientific datasets from around the globe, including satellite data, crash-simulation data, and messages from HPC systems. Based on the large number of requests they received, they also made these datasets available to the community (with permission of the original sources). While FPC represents a great step forward, it soon became clear that its throughput was too slow for the emerging 10 gigabits per second networks. Hence, no speedup can be gained by including this algorithm in an MPI library. They therefore changed the aim of the second half of the project. Instead of implementing FPC in an MPI library, they refocused their efforts to develop a parallel compression algorithm to further boost ...
Date: April 17, 2009
Creator: Albonesi, David & Burtscher, Martin
Partner: UNT Libraries Government Documents Department
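
The core FPC idea, predict each double, XOR the prediction with the actual bits, and encode only the nonzero residual bytes, can be sketched compactly. The version below substitutes a last-value predictor for FPC's fcm/dfcm hash-table predictors and uses a simple byte-granularity encoding, so it is far simpler (and slower) than the real algorithm:

```python
import struct

# Simplified sketch of predictor-based double compression in the spirit of
# FPC: predict each value (here with a last-value predictor; FPC uses fcm and
# dfcm hash-table predictors), XOR prediction with the actual bits, and store
# a one-byte length header plus only the nonzero tail bytes of the residual.

def compress(values):
    out, prev = bytearray(), 0
    for v in values:
        bits = struct.unpack('<Q', struct.pack('<d', v))[0]
        resid = bits ^ prev
        prev = bits
        tail = resid.to_bytes(8, 'big').lstrip(b'\x00')  # drop leading zero bytes
        out.append(len(tail))
        out.extend(tail)
    return bytes(out)

def decompress(blob):
    vals, prev, i = [], 0, 0
    while i < len(blob):
        n = blob[i]
        prev ^= int.from_bytes(blob[i + 1:i + 1 + n], 'big')
        vals.append(struct.unpack('<d', struct.pack('<Q', prev))[0])
        i += 1 + n
    return vals

data = [1.0, 1.0, 1.0000001, 1.0000002]  # slowly varying scientific data
blob = compress(data)
assert decompress(blob) == data
print(len(blob), "bytes vs", 8 * len(data), "uncompressed")
```

Because consecutive scientific values tend to share sign, exponent, and high mantissa bits, the XOR residuals have many leading zero bytes, which is where the compression comes from.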

Demonstration of damage with a wireless sensor network

Description: A damage detection system was developed with commercially available wireless sensors. Statistical process control methods were used to monitor the correlation of vibration data from two accelerometers mounted across a joint. Changes in correlation were used to detect damage to the joint. All data processing was done remotely on a microprocessor integrated with the wireless sensors to allow for the transmission of a simple damaged or undamaged status for each monitored joint. Additionally, a portable demonstration structure was developed to showcase the capabilities of the damage detection system to monitor joint failure in real time.
Date: January 1, 2001
Creator: Tanner, Neal A. & Farrar, C. R. (Charles R.)
Partner: UNT Libraries Government Documents Department
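
The detection scheme above boils down to windowed correlation plus a control limit: compute the correlation coefficient of the two channels per window, set limits from baseline (undamaged) windows, and flag windows that fall below the lower limit. A sketch with synthetic signals, where the invented baseline values and phase-shift damage model are for illustration only:

```python
import statistics

# Sketch of correlation-based damage detection with a statistical control
# limit. The baseline correlations and the phase-shift "damage" model below
# are synthetic stand-ins for real accelerometer data.

def correlation(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def lower_control_limit(baseline_corrs, k=3.0):
    """Mean minus k standard deviations of the baseline correlations."""
    return statistics.fmean(baseline_corrs) - k * statistics.stdev(baseline_corrs)

x = [0.0, 1.0, 0.0, -1.0] * 8          # channel on one side of the joint
y_healthy = [0.9 * v for v in x]       # healthy joint transmits the motion
y_damaged = x[1:] + [0.0]              # damage modeled as a phase shift

limit = lower_control_limit([0.99, 0.98, 0.99, 0.985])
print(correlation(x, y_healthy) >= limit, correlation(x, y_damaged) >= limit)
```

Only the pass/fail bit per joint needs to leave the sensor node, which is what keeps the wireless traffic minimal in the system described.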

LDRD final report : massive multithreading applied to national infrastructure and informatics.

Description: Large relational datasets such as national-scale social networks and power grids present different computational challenges than do physical simulations. Sandia's distributed-memory supercomputers are well suited for solving problems concerning the latter, but not the former. The reason is that problems such as pattern recognition and knowledge discovery on large networks are dominated by memory latency and not by computation. Furthermore, most memory requests in these applications are very small, and when the datasets are large, most requests miss the cache. The result is extremely low utilization. We are unlikely to be able to grow out of this problem with conventional architectures. As the power density of microprocessors has approached that of a nuclear reactor in the past two years, we have seen a leveling of Moore's Law. Building larger and larger microprocessor-based supercomputers is not a solution for informatics and network infrastructure problems since the additional processors are utilized to only a tiny fraction of their capacity. An alternative solution is to use the paradigm of massive multithreading with a large shared memory. There is only one instance of this paradigm today: the Cray MTA-2. The proposal team has unique experience with and access to this machine. The XMT, which is now being delivered, is a Red Storm machine with up to 8192 multithreaded 'Threadstorm' processors and 128 TB of shared memory. For many years, the XMT will be the only way to address very large graph problems efficiently, and future generations of supercomputers will include multithreaded processors. Roughly 10 MTA processors can process a simple short paths problem in the time taken by the Gordon Bell Prize-nominated distributed memory code on 32,000 processors of Blue Gene/Light. We have developed algorithms and open-source software for the XMT, and have modified that software to run some of these algorithms on other ...
Date: September 1, 2009
Creator: Henderson, Bruce A.; Murphy, Richard C.; Wheeler, Kyle; Mackey, Gregory; Berry, Jonathan W.; LaViolette, Randall A. et al.
Partner: UNT Libraries Government Documents Department

Spec-Doc: A User's Guide to Spectrometer Software

Description: SPEC is the name of the operating system designed to control the NMR spectrometers in the lab. SPEC is actually one large program which handles many functions necessary to control each spectrometer. The program handles all I/O with peripheral devices such as the console ('terminal' or 'CRT'). The program carries out its operations by accepting commands which each invoke specific subroutines to perform their function. There are a total of 60 commands in SPEC, each carrying out a different function. Because so many commands make SPEC a very large program, not all of the program is core resident. Rather, each command calls in an overlay handler which loads into memory the appropriate overlay from the disk and begins execution of the command. Thus SPEC is an independent disk-based operating system. The commands in SPEC are capable of operating the microprocessor-based pulse programmer, starting and acquiring data from the spectrometer data acquisition system, storing data on disk and manipulating it mathematically, displaying and plotting data. All arithmetic operations within SPEC are performed on integers. Since the DATA GENERAL computers are 16-bit machines operating in two's complement mode, the integer range is ±32767. Many of the mathematical operations of SPEC are done in double precision integer mode with the final result always scaled to the above range. For many of the commands, integer overflow is detected and reported as an error message. Overflowed points are set to ±32767. SPEC accepts command input from the console or reads a string of commands previously entered on the disk. The latter command structure is called a macro. Macros may be nested and may have constants passed to them at execution time, thus allowing for a powerful supercommand structure. Both forms of commands are discussed in the next section. SPEC is designed to run ...
Date: May 1, 1983
Creator: Sinton, S.
Partner: UNT Libraries Government Documents Department
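
The overflow handling described, wide intermediates scaled and then clamped to the 16-bit range, is easy to mimic. The sketch below clamps symmetrically at ±32767 as the description states (the native two's-complement range actually extends down to -32768):

```python
# Mimicking SPEC-style integer overflow handling: arithmetic is done on wide
# intermediates, then the result is clamped into the 16-bit range with an
# overflow flag. Clamping is symmetric at ±32767, matching the description
# above, even though two's-complement hardware can represent -32768.

SPEC_MAX = 32767

def clamp16(value):
    """Return (clamped value, overflowed?) for a wide intermediate result."""
    if value > SPEC_MAX:
        return SPEC_MAX, True
    if value < -SPEC_MAX:
        return -SPEC_MAX, True
    return value, False

print(clamp16(300 * 200))   # 60000 overflows → (32767, True)
print(clamp16(100 * 200))   # 20000 fits → (20000, False)
```

This is the behavior a user of such a system sees as "overflowed points are set to ±32767" alongside an error message.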

Scientific Application Performance on Leading Scalar and Vector Supercomputing Platforms

Description: The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end computing (HEC) platforms, primarily because of their generality, scalability, and cost effectiveness. However, the growing gap between sustained and peak performance for full-scale scientific applications on conventional supercomputers has become a major concern in high performance computing, requiring significantly larger systems and application scalability than implied by peak performance in order to achieve desired performance. The latest generation of custom-built parallel vector systems have the potential to address this issue for numerical algorithms with sufficient regularity in their computational structure. In this work we explore applications drawn from four areas: magnetic fusion (GTC), plasma physics (LBMHD3D), astrophysics (Cactus), and material science (PARATEC). We compare performance of the vector-based Cray X1, X1E, Earth Simulator, NEC SX-8, with performance of three leading commodity-based superscalar platforms utilizing the IBM Power3, Intel Itanium2, and AMD Opteron processors. Our work makes several significant contributions: a new data-decomposition scheme for GTC that (for the first time) enables a breakthrough of the Teraflop barrier; the introduction of a new three-dimensional Lattice Boltzmann magneto-hydrodynamic implementation used to study the onset evolution of plasma turbulence that achieves over 26Tflop/s on 4800 ES processors; the highest per processor performance (by far) achieved by the full-production version of the Cactus ADM-BSSN; and the largest PARATEC cell size atomistic simulation to date. Overall, results show that the vector architectures attain unprecedented aggregate performance across our application suite, demonstrating the tremendous potential of modern parallel vector systems.
Date: January 1, 2007
Creator: Oliker, Leonid; Canning, Andrew; Carter, Jonathan; Shalf, John & Ethier, Stephane
Partner: UNT Libraries Government Documents Department

Performance of Ultra-Scale Applications on Leading Vector and Scalar HPC Platforms

Description: The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and capacity computers primarily because of their generality, scalability, and cost effectiveness. However, the constant degradation of superscalar sustained performance has become a well-known problem in the scientific computing community. This trend has been widely attributed to the use of superscalar-based commodity components whose architectural designs offer a balance between memory performance, network capability, and execution rate that is poorly matched to the requirements of large-scale numerical computations. The recent development of massively parallel vector systems offers the potential to bridge the performance gap for many important classes of algorithms. In this study we examine four diverse scientific applications with the potential to run at ultrascale, from the areas of plasma physics, material science, astrophysics, and magnetic fusion. We compare performance between the vector-based Earth Simulator (ES) and Cray X1, with leading superscalar-based platforms: the IBM Power3/4 and the SGI Altix. Results demonstrate that the ES vector systems achieve excellent performance on our application suite - the highest of any architecture tested to date.
Date: January 1, 2005
Creator: Oliker, Leonid; Canning, Andrew; Carter, Jonathan; Shalf, John; Simon, Horst; Ethier, Stephane et al.
Partner: UNT Libraries Government Documents Department

An Implementation of the IEEE Standard for Binary Floating-Point Arithmetic for the Motorola 6809 Microprocessor

Description: This thesis describes a software implementation of the IEEE Floating-Point Standard (IEEE Task P754), which is believed to be an effective system for reliable, accurate computer arithmetic. The standard is implemented as a set of procedures written in Motorola 6809 assembly language. Source listings of the procedures are contained in appendices.
Date: August 1983
Creator: Rosenblum, David Samuel
Partner: UNT Libraries
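
The thesis's implementation is in 6809 assembly, but the IEEE single-precision format it manipulates can be illustrated by unpacking a float's bit pattern into its sign, biased-exponent, and fraction fields:

```python
import struct

# Unpacking an IEEE single-precision float into its fields. This only
# illustrates the binary format; the thesis implements the arithmetic itself
# as Motorola 6809 assembly procedures.

def decompose(x):
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # biased by 127 for normal numbers
    fraction = bits & 0x7FFFFF       # 23 fraction bits (implicit leading 1)
    return sign, exponent, fraction

print(decompose(-6.5))  # → (1, 129, 5242880): -1.625 x 2^(129-127)
```

A software floating-point library spends most of its effort on exactly these fields: aligning fractions by exponent difference, normalizing results, and rounding back into 23 bits.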

Benchmark tests on the digital equipment corporation Alpha AXP 21164-based AlphaServer 8400, including a comparison of optimized vector and superscalar processing

Description: The second generation of the Digital Equipment Corp. (DEC) DECchip Alpha AXP microprocessor is referred to as the 21164. From the viewpoint of numerically-intensive computing, the primary difference between it and its predecessor, the 21064, is that the 21164 has twice the multiply/add throughput per clock period (CP), a maximum of two floating point operations (FLOPS) per CP vs. one for the 21064. The AlphaServer 8400 is a shared-memory multiprocessor server system that can accommodate up to 12 CPUs and up to 14 GB of memory. In this report we will compare single processor performance of the 8400 system with that of the International Business Machines Corp. (IBM) RISC System/6000 POWER-2 microprocessor running at 66 MHz, the Silicon Graphics, Inc. (SGI) MIPS R8000 microprocessor running at 75 MHz, and the Cray Research, Inc. CRAY J90. The performance comparison is based on a set of Fortran benchmark codes that represent a portion of the Los Alamos National Laboratory supercomputer workload. The advantage of using these codes is that they span a wide range of computational characteristics, such as vectorizability, problem size, and memory access pattern. The primary disadvantage of using them is that detailed, quantitative analysis of performance behavior of all codes on all machines is difficult. One important addition to the benchmark set appears for the first time in this report. Whereas the older version was written for a vector processor, the newer version is more optimized for microprocessor architectures. Therefore, we have, for the first time, an opportunity to measure performance on a single application using implementations that expose the respective strengths of vector and superscalar architecture. All results in this report are from single processors. A subsequent article will explore shared-memory multiprocessing performance of the 8400 system.
Date: February 1, 1996
Creator: Wasserman, H.J.
Partner: UNT Libraries Government Documents Department

Cost-effective instrumentation and control upgrades for commercial nuclear power plants using surety principles developed at Sandia National Laboratories

Description: Many nuclear power plants use instrument and control systems based on analog electronics. The state of the art in process control and instrumentation has advanced to use digital electronics and incorporate advanced technology. This technology includes distributed microprocessors, fiber optics, intelligent systems (neural networks), and advanced displays. The technology is used to optimize processes and enhance the man-machine interface while maintaining control and safety of the processes. Nuclear power plant operators have been hesitant to install this technology because of the cost and uncertainty in the regulatory process. This technology can be directly applied in an operating nuclear power plant provided a surety principle-based "administrator" hardware system is included in parallel with the upgrade. Sandia National Laboratories has developed a rigorous approach to High Consequence System Surety (HCSS). This approach addresses the key issues of safety, security, and control while satisfying requirements for reliability and quality. HCSS principles can be applied to nuclear power plants in a manner that allows the off-the-shelf use of process control instrumentation while maintaining a high level of safety and enhancing the plant performance. We propose that an HCSS administrator be constructed as a standardized approach to address regulatory issues. Such an administrator would allow a plant control system to be constructed with commercially available, state-of-the-art equipment and be customized to the needs of the individual plant operator.
Date: November 1, 1997
Creator: Rochau, G.E. & Dalton, L.J.
Partner: UNT Libraries Government Documents Department

Control system reliability at Jefferson Lab

Description: At Thomas Jefferson National Accelerator Facility (Jefferson Lab), the availability of the control system is crucial to the operation of the accelerator for experimental programs. Jefferson Lab's control system uses 68040-based microprocessors running VxWorks, Unix workstations, and a variety of VME, CAMAC, GPIB, and serial devices. The software consists of control system toolkit software, commercial packages, and over 200 custom and generic applications, some of which are highly complex. The challenge is to keep this highly diverse and still growing system, with over 162,000 control points, operating reliably, while managing changes and upgrades to both the hardware and software. Downtime attributable to the control system includes the time to troubleshoot and repair problems and the time to restore the machine to operation of the scheduled program. This paper describes the availability of the control system during the last year, the heaviest contributors to downtime and the response to problems. Strategies for improving the robustness of the control system are detailed and include changes in hardware, software, procedures and processes. The improvements range from routine preventive hardware maintenance to improving the ability to detect, predict and prevent problems. This paper also describes the software tools used to assist in control system troubleshooting, maintenance and failure recovery processes.
Date: December 1, 1997
Creator: White, K.S.; Areti, H. & Garza, O.
Partner: UNT Libraries Government Documents Department

Parallel supercomputing with commodity components

Description: We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched Fast Ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10^15 floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.
Date: September 1, 1997
Creator: Warren, M.S.; Goda, M.P. & Becker, D.J.
Partner: UNT Libraries Government Documents Department

Secure coprocessing applications and research issues

Description: The potential of secure coprocessing to address many emerging security challenges and to enable new applications has been a long-standing interest of many members of the Computer Research and Applications Group, including this author. The purpose of this paper is to summarize this thinking, by presenting a taxonomy of some potential applications and by summarizing what we regard as some particularly interesting research questions.
Date: August 1, 1996
Creator: Smith, S.W.
Partner: UNT Libraries Government Documents Department