Search Results

On the Equivalence of Nonnegative Matrix Factorization and K-means - Spectral Clustering

Description: We provide a systematic analysis of nonnegative matrix factorization (NMF) relating to data clustering. We generalize the usual X = FG^T decomposition to the symmetric W = HH^T and W = HSH^T decompositions. We show that (1) W = HH^T is equivalent to Kernel K-means clustering and the Laplacian-based spectral clustering, and (2) X = FG^T is equivalent to simultaneous clustering of the rows and columns of a bipartite graph. We emphasize the importance of orthogonality in NMF and the soft-clustering nature of NMF. These results are verified with experiments on face images and newsgroups.
Date: December 4, 2005
Creator: Ding, Chris; He, Xiaofeng; Simon, Horst D. & Jin, Rong
Partner: UNT Libraries Government Documents Department
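
A minimal sketch of equivalence (1) claimed above, in notation assumed here rather than copied from the report: for a symmetric similarity matrix W and a nonnegative factor H,

    \min_{H \ge 0} \| W - H H^{\mathsf T} \|_F^2
      = \|W\|_F^2 - 2\,\operatorname{tr}\!\big(H^{\mathsf T} W H\big) + \|H^{\mathsf T} H\|_F^2 ,

so once near-orthogonality H^{\mathsf T} H \approx I is imposed, the last term is essentially constant, and minimizing the symmetric NMF objective amounts to maximizing \operatorname{tr}(H^{\mathsf T} W H), which is the kernel K-means (and, for a Laplacian-based W, the spectral clustering) relaxation objective.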

Semantic-Aware Automatic Parallelization of Modern Applications Using High-Level Abstractions

Description: Automatic introduction of OpenMP for sequential applications has attracted significant attention recently because of the proliferation of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has only focused on C and Fortran applications operating on primitive data types. Modern applications using high-level abstractions, such as C++ STL containers and complex user-defined class types, are largely ignored due to the lack of research compilers that are readily able to recognize high-level object-oriented abstractions and leverage their associated semantics. In this paper, we use a source-to-source compiler infrastructure, ROSE, to explore compiler techniques to recognize high-level abstractions and to exploit their semantics for automatic parallelization. Several representative parallelization candidate kernels are used to study semantic-aware parallelization strategies for high-level abstractions, combined with extended compiler analyses. Preliminary results have shown that semantics of abstractions can help extend the applicability of automatic parallelization to modern applications and expose more opportunities to take advantage of multicore processors.
Date: December 21, 2009
Creator: Liao, C; Quinlan, D J; Willcock, J J & Panas, T
Partner: UNT Libraries Government Documents Department
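
As a hedged illustration of the transformation described above, the sketch below shows the kind of OpenMP annotation a semantics-aware parallelizer could emit for an STL loop once it knows that std::vector element accesses are independent; the function, container, and loop are invented for illustration and are not output of the ROSE-based tool.

    // Illustrative only: a loop a semantic-aware parallelizer might annotate,
    // assuming it can prove the std::vector elements are accessed independently.
    #include <vector>
    #include <cstddef>

    void scale(std::vector<double>& v, double a) {
        // Knowing the STL semantics (contiguous storage, side-effect-free
        // operator[]) lets the tool safely emit the pragma below.
        #pragma omp parallel for
        for (std::size_t i = 0; i < v.size(); ++i) {
            v[i] *= a;
        }
    }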

Architecture independent performance characterization and benchmarking for scientific applications

Description: A simple, tunable, synthetic benchmark with a performance directly related to applications would be of great benefit to the scientific computing community. In this paper, we present a novel approach to develop such a benchmark. The initial focus of this project is on the data access performance of scientific applications. First, a hardware-independent characterization of code performance in terms of address streams is developed. The parameters chosen to characterize a single address stream are related to regularity, size, spatial locality, and temporal locality. These parameters are then used to implement a synthetic benchmark program that mimics the performance of a corresponding code. To test the validity of our approach we performed experiments using five test kernels on six different platforms. The performance of most of our test kernels can be approximated by a single synthetic address stream. However, in some cases overlapping two address streams is necessary to achieve a good approximation.
Date: August 31, 2004
Creator: Strohmaier, Erich & Shan, Hongzhang
Partner: UNT Libraries Government Documents Department
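
A minimal sketch, under assumed parameter names, of the kind of tunable synthetic address stream the abstract describes: working-set size stands in for temporal locality and stride for spatial locality; the paper's actual parameterization may differ.

    #include <vector>
    #include <cstddef>

    // Timing this loop for different (working_set, stride) settings probes the
    // memory system with a regular, parameterized address stream.
    double probe(std::size_t working_set, std::size_t stride, std::size_t accesses) {
        std::vector<double> data(working_set);       // assumes working_set > 0
        double sum = 0.0;
        std::size_t idx = 0;
        for (std::size_t i = 0; i < accesses; ++i) {
            sum += data[idx];                        // one load per iteration
            idx = (idx + stride) % working_set;      // regular address stream
        }
        return sum;                                  // prevents dead-code elimination
    }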

TORCH Computational Reference Kernels - A Testbed for Computer Science Research

Description: For decades, computer scientists have sought guidance on how to evolve architectures, languages, and programming models in order to improve application performance, efficiency, and productivity. Unfortunately, without overarching advice about future directions in these areas, individual guidance is inferred from the existing software/hardware ecosystem, and each discipline often conducts its research independently, assuming all other technologies remain fixed. In today's rapidly evolving world of on-chip parallelism, isolated and iterative improvements to performance may miss superior solutions in the same way gradient descent optimization techniques may get stuck in local minima. To combat this, we present TORCH: A Testbed for Optimization ResearCH. These computational reference kernels define the core problems of interest in scientific computing without mandating a specific language, algorithm, programming model, or implementation. To complement the kernel (problem) definitions, we provide a set of algorithmically-expressed verification tests that can be used to verify that a hardware/software co-designed solution produces an acceptable answer. Finally, to provide some illumination as to how researchers have implemented solutions to these problems in the past, we provide a set of reference implementations in C and MATLAB.
Date: December 2, 2010
Creator: Kaiser, Alex; Williams, Samuel Webb; Madduri, Kamesh; Ibrahim, Khaled; Bailey, David H.; Demmel, James W. et al.
Partner: UNT Libraries Government Documents Department

Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms

Description: We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one of the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
Date: October 16, 2008
Creator: Williams, Samuel; Oliker, Leonid; Vuduc, Richard; Shalf, John; Yelick, Katherine & Demmel, James
Partner: UNT Libraries Government Documents Department
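
For orientation, the sketch below is a plain CSR (compressed sparse row) SpMV kernel; the paper's multicore optimizations start from a kernel of roughly this shape, and the specific tuning techniques are not reproduced here.

    #include <vector>

    // y = A * x with A stored in CSR form (row_ptr, col_idx, val).
    void spmv_csr(const std::vector<int>& row_ptr,
                  const std::vector<int>& col_idx,
                  const std::vector<double>& val,
                  const std::vector<double>& x,
                  std::vector<double>& y) {
        const int n = static_cast<int>(row_ptr.size()) - 1;   // number of rows
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
                sum += val[k] * x[col_idx[k]];                 // indirect access to x
            y[i] = sum;
        }
    }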

Prototype Vector Machine for Large Scale Semi-Supervised Learning

Description: Practical data mining rarely falls exactly into the supervised learning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computational intensiveness of graph-based SSL arises largely from the manifold or graph regularization, which in turn leads to large models that are difficult to handle. To alleviate this, we propose the prototype vector machine (PVM), a highly scalable, graph-based algorithm for large-scale SSL. Our key innovation is the use of "prototype vectors" for efficient approximation of both the graph-based regularizer and the model representation. The choice of prototypes is grounded upon two important criteria: they not only perform effective low-rank approximation of the kernel matrix, but also span a model suffering the minimum information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.
Date: April 29, 2009
Creator: Zhang, Kai; Kwok, James T. & Parvin, Bahram
Partner: UNT Libraries Government Documents Department
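
One standard way prototypes can yield the low-rank kernel approximation mentioned above is a Nystrom-style construction, stated here as an assumption for illustration rather than as the PVM's exact formulation:

    K \approx K_{nm}\, K_{mm}^{-1}\, K_{nm}^{\mathsf T},

where K_{nm} \in \mathbb{R}^{n \times m} holds kernel evaluations between the n data points and the m prototype vectors and K_{mm} those among the prototypes; storage and per-iteration cost then scale with m \ll n rather than with n^2.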

Traveling waves and the renormalization group improved Balitsky-Kovchegov equation

Description: I study the incorporation of renormalization group (RG) improved BFKL kernels in the Balitsky-Kovchegov (BK) equation which describes parton saturation. The RG improvement takes into account important parts of the next-to-leading and higher order logarithmic corrections to the kernel. The traveling wave front method for analyzing the BK equation is generalized to deal with RG-resummed kernels, restricting to the interesting case of fixed QCD coupling. The results show that the higher order corrections suppress the rapid increase of the saturation scale with increasing rapidity. I also perform a "diffusive" differential equation approximation, which illustrates that some important qualitative properties of the kernel change when including RG corrections.
Date: December 1, 2006
Creator: Enberg, Rikard
Partner: UNT Libraries Government Documents Department
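
For reference, the leading-order, fixed-coupling BK equation in coordinate space, written from standard conventions rather than copied from the report (the RG-improved kernels studied above modify the integration kernel), reads

    \frac{\partial N(r, Y)}{\partial Y}
      = \bar\alpha_s \int \frac{d^2 r_1}{2\pi}\, \frac{r^2}{r_1^2\, r_2^2}
        \Big[ N(r_1, Y) + N(r_2, Y) - N(r, Y) - N(r_1, Y)\, N(r_2, Y) \Big],
    \qquad r_2 = r - r_1,\quad \bar\alpha_s = \frac{\alpha_s N_c}{\pi},

where N(r, Y) is the dipole scattering amplitude at transverse size r and rapidity Y; the nonlinear term is what produces saturation.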

Oracle inequalities for SVMs that are based on random entropy numbers

Description: In this paper we present a new technique for bounding local Rademacher averages of function classes induced by a loss function and a reproducing kernel Hilbert space (RKHS). At the heart of this technique lies the observation that certain expectations of random entropy numbers can be bounded by the eigenvalues of the integral operator associated to the RKHS. We then work out the details of the new technique by establishing two new oracle inequalities for SVMs, which complement and generalize previous results.
Date: January 1, 2009
Creator: Steinwart, Ingo
Partner: UNT Libraries Government Documents Department

Hierarchical resilience with lightweight threads.

Description: This paper proposes a methodology for providing robustness and resilience in a highly threaded, distributed- and shared-memory environment based on well-defined inputs and outputs to lightweight tasks. These inputs and outputs form a failure 'barrier', allowing tasks to be restarted or duplicated as necessary. These barriers must be expanded based on task behavior, such as communication between tasks, but do not prohibit any given behavior. One trend in high-performance computing codes is toward self-contained functions that mimic functional programming. Software designers are moving toward a model of software design in which core functions are specified in side-effect-free or low-side-effect ways, with well-defined inputs and outputs. This provides the ability to copy the inputs to wherever they need to be - whether that is the other side of the PCI bus or the other side of the network - do work on that input using local memory, and then copy the outputs back (as needed). This design pattern is popular among new distributed threading environment designs. Such designs include the Barcelona STARS system, distributed OpenMP systems, the Habanero-C and Habanero-Java systems from Vivek Sarkar at Rice University, the HPX/ParalleX model from LSU, as well as our own Scalable Parallel Runtime effort (SPR) and the Trilinos stateless kernels. This design pattern is also shared by CUDA and several OpenMP extensions for GPU-type accelerators (e.g. the PGI OpenMP extensions).
Date: October 1, 2011
Creator: Wheeler, Kyle Bruce
Partner: UNT Libraries Government Documents Department
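
A hypothetical sketch of the side-effect-free task pattern described above (not the API of SPR, Qthreads, or any of the systems listed): inputs are copied in, work touches only local state, and outputs are copied back, so a failed task can simply be re-run or duplicated.

    #include <vector>
    #include <functional>

    struct Task {
        std::vector<double> inputs;   // copied in; this is the failure 'barrier'
        std::function<std::vector<double>(const std::vector<double>&)> work;
    };

    std::vector<double> run_with_retry(const Task& t, int max_attempts) {
        for (int attempt = 0; attempt < max_attempts; ++attempt) {
            try {
                return t.work(t.inputs);  // local computation only
            } catch (...) {
                // inputs are untouched, so the task can be restarted here or
                // duplicated on another node without any global rollback
            }
        }
        return {};                        // all attempts failed
    }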

Simple classifiers from data dependent hypothesis classes

Description: In this paper we introduce simple classifiers as an example of how to use the data-dependent hypothesis class framework described by Cannon et al. (2002) to explore the performance/computation trade-off in the classifier design problem. We provide a specific example of a simple classifier and demonstrate that it has many remarkable properties: for example, it possesses computationally efficient learning algorithms with favorable bounds on estimation error, admits kernel mappings, and is particularly well suited to boosting. We present experimental results on synthetic and real data that suggest that this classifier is competitive with powerful alternative methods.
Date: January 1, 2003
Creator: Cannon, A. (Adam); Howse, J. W. (James W.); Hush, D. R. (Donald R.) & Scovel, James C.
Partner: UNT Libraries Government Documents Department

Performance Evaluation of the IBM SP and the Compaq AlphaServer SC

Description: Oak Ridge National Laboratory (ORNL) has recently installed both a Compaq AlphaServer SC and an IBM SP, each with 4-way SMP nodes, allowing a direct comparison of the two architectures. In this paper, we describe our initial evaluation. The evaluation looks at both kernel and application performance for a spectral atmospheric general circulation model, an important application for the ORNL systems.
Date: August 3, 2001
Creator: Worley, P.H.
Partner: UNT Libraries Government Documents Department

Scientific kernels on VIRAM and Imagine media processors

Description: Many high performance applications run well below the peak arithmetic performance of the underlying machine, with inefficiencies often attributed to a lack of memory bandwidth. In this work we examine two emerging media processors designed to address the well-known gap between processor and memory performance, in the context of scientific computing. The VIRAM architecture uses novel PIM technology to combine embedded DRAM with a vector co-processor for exploiting its large bandwidth potential. The Imagine architecture, on the other hand, provides a stream-aware memory hierarchy to support the tremendous processing potential of the SIMD-controlled VLIW clusters. First we develop a scalable synthetic probe that allows us to parameterize key performance attributes of VIRAM and Imagine while capturing the performance crossover point of these architectures. Next we present results for two important scientific kernels, each with a unique set of computational characteristics and memory access patterns. Our experiments isolate the set of application characteristics best suited for each architecture and show a promising direction towards interfacing leading-edge media processor technology with high-end scientific computations.
Date: October 10, 2002
Creator: Narayanan, Manikamdan; Oliker, Leonid; Janin, Adam; Husbands, Parry & Li, Xiaoye S.
Partner: UNT Libraries Government Documents Department

Evaluation of architectural paradigms for addressing theprocessor-memory gap

Description: Many high performance applications run well below the peak arithmetic performance of the underlying machine, with inefficiencies often attributed to poor memory system behavior. In the context of scientific computing we examine three emerging processors designed to address the well-known gap between processor and memory performance through the exploitation of data parallelism. The VIRAM architecture uses novel PIM technology to combine embedded DRAM with a vector co-processor for exploiting its large bandwidth potential. The DIVA architecture incorporates a collection of PIM chips as smart-memory coprocessors to a conventional microprocessor, and relies on superword-level parallelism to make effective use of the available memory bandwidth. The Imagine architecture provides a stream-aware memory hierarchy to support the tremendous processing potential of SIMD-controlled VLIW clusters. First we develop a scalable synthetic probe that allows us to parameterize key performance attributes of VIRAM, DIVA and Imagine while capturing the performance crossover points of these architectures. Next we present results for scientific kernels with different sets of computational characteristics and memory access patterns. Our experiments allow us to evaluate the strategies employed to exploit data parallelism, isolate the set of application characteristics best suited to each architecture, and show a promising direction towards interfacing leading-edge processor technology with high-end scientific computations.
Date: July 4, 2003
Creator: Oliker, Leonid; Gorden, Grime; Husbands, Parry & Chame, Jacqualine
Partner: UNT Libraries Government Documents Department

Results from ORNL Characterization of Nominal 350 µm LEUCO Kernels (LEU03) from the BWXT G73V-20-69303 Composite

Description: Measurements were made using optical microscopy to determine the size and shape of the LEU03 kernels. Hg porosimetry was performed to measure density. The results are summarized in Table 1-1. Values in the table are for the composite and are calculated at 95% confidence from the measured values of a random riffled sample. The LEU03 kernel composite met all the specifications in Table 1-1. The BWXT results for measuring the same kernel properties are given in Table 1-2. BWXT characterization methods were significantly different from ORNL methods, which resulted in slight differences in the reported results. BWXT performed manual microscopy measurements for mean diameter (100 particles measured along 2 axes) and aspect ratio (100 particles measured); ORNL used automated image acquisition and analysis (3847 particles measured along 180 axes). Diameter measurements were in good agreement. The narrower confidence interval in the ORNL results for average mean diameter is due to the greater number of particles measured. The critical limits for mean diameter reported at ORNL and BWXT are similar, because ORNL measured a larger standard deviation (10.46 µm vs. 8.70 µm). Aspect ratio satisfied the specification with greater margin in the ORNL results, mostly because the larger sample size resulted in a lower uncertainty in the binomial distribution statistical calculation. ORNL measured 11 out of 3847 kernels exceeding the control limit (1.05); BWXT measured 1 out of 100 particles exceeding the control limit. BWXT used the aspect ratio of perpendicular diameters in a random image plane, where one diameter was a maximum or a minimum. ORNL used the aspect ratio of the absolute maximum and minimum diameters in a random image plane. The ORNL technique can be expected to yield higher measured aspect ratios. Hand tabling was performed at ORNL prior to characterization by repeatedly pouring a small fraction ...
Date: November 1, 2006
Creator: Kercher, Andrew K. & Hunn, John D.
Partner: UNT Libraries Government Documents Department
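
For context on the confidence intervals discussed above, the usual 95% interval for a mean diameter has the form (stated as the standard formula, not as ORNL's documented procedure)

    \bar{x} \pm t_{0.975,\,n-1}\, \frac{s}{\sqrt{n}},

where \bar{x} is the sample mean, s the sample standard deviation, and n the number of kernels measured; the 1/\sqrt{n} factor is why measuring 3847 kernels gives a narrower interval than measuring 100, even with a somewhat larger standard deviation.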

Modelling long-distance seed dispersal in heterogeneous landscapes.

Description: 1. Long-distance seed dispersal is difficult to measure, yet key to understanding plant population dynamics and community composition. 2. We used a spatially explicit model to predict the distribution of seeds dispersed long distances by birds into habitat patches of different shapes. All patches were the same type of habitat and size, but varied in shape. They occurred in eight experimental landscapes, each with five patches of four different shapes, 150 m apart in a matrix of mature forest. The model was parameterized with small-scale movement data collected from field observations of birds. In a previous study we validated the model by testing its predictions against observed patterns of seed dispersal in real landscapes with the same types and spatial configuration of patches as in the model. 3. Here we apply the model more broadly, examining how patch shape influences the probability of seed deposition by birds into patches, how dispersal kernels (distributions of dispersal distances) vary with patch shape and starting location, and how movement of seeds between patches is affected by patch shape. 4. The model predicts that patches with corridors or other narrow extensions receive higher numbers of seeds than patches without corridors or extensions. This pattern is explained by edge-following behaviour of birds. Dispersal distances are generally shorter in heterogeneous landscapes (containing patchy habitat) than in homogeneous landscapes, suggesting that patches divert the movement of seed dispersers, ‘holding’ them long enough to increase the probability of seed defecation in the patches. Dispersal kernels for seeds in homogeneous landscapes were smooth, whereas those in heterogeneous landscapes were irregular. In both cases, long-distance (> 150 m) dispersal was surprisingly common, usually comprising approximately 50% of all dispersal events. 5. Synthesis. Landscape heterogeneity has a large influence on patterns of long-distance seed dispersal. Our results suggest that long-distance ...
Date: January 1, 2008
Creator: Levey, Douglas, J.; Tewlsbury, Joshua, J. & Bolker, Benjamin, M.
Partner: UNT Libraries Government Documents Department
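
As used above, a dispersal kernel is simply a probability density over dispersal distance d; a common illustrative form, assumed here and not the kernel estimated in this study, is the negative exponential

    k(d) = \frac{1}{\alpha}\, e^{-d/\alpha}, \qquad d \ge 0,

whose mean \alpha and tail weight determine how often seeds travel beyond 150 m.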

QUANTUM MECHANICAL REACTIVE SCATTERING VIA EXCHANGE KERNELS: APPLICATION TO THE COLLINEAR H+H2 REACTION

Description: A formulation of quantum mechanical reactive scattering given by Miller is applied to the collinear H + H{sub 2} reaction. The approach is the direct analog to the Hartree-Fock method of electronic structure theory, and it obviates the need for specialized (e.g., 'natural' collision) coordinates. The rearrangement process takes place via an explicit exchange interaction (cf. electron exchange in Hartree-Fock theory), and closed channels are incorporated via a square-integrable set of correlation functions. Agreement with results obtained by others using other methods is excellent, showing this approach to quantum mechanical reactive scattering to be a viable one.
Date: October 1, 1977
Creator: Garrett, Bruce G. & Miller, William H.
Partner: UNT Libraries Government Documents Department

Deep Burn: Development of Transuranic Fuel for High-Temperature Helium-Cooled Reactors- Monthly Highlights November 2010

Description: During FY 2011 the DB Program will report Highlights on a monthly basis, but will no longer produce Quarterly Progress Reports. Technical details that were previously included in the quarterly reports will be included in the appropriate Milestone Reports that are submitted to FCRD Program Management. These reports will also be uploaded to the Deep Burn website. The Monthly Highlights report for October 2010, ORNL/TM-2010/300, was distributed to program participants on November 29, 2010. This report discusses the following: (1) Thermochemical Data and Model Development; (2) TRU (transuranic elements) TRISO (tri-structural isotropic) Development - (a) TRU Kernel Development, (b) Coating Development; (3) LWR Fully Ceramic Fuel - (a) FCM Fabrication Development, (b) FCM Irradiation Testing.
Date: December 1, 2010
Creator: Snead, Lance Lewis; Bell, Gary L & Besmann, Theodore M
Partner: UNT Libraries Government Documents Department

Fracture and Fragmentation of Simplicial Finite Element Meshes using Graphs

Description: An approach for the topological representation of simplicial finite element meshes as graphs is presented. It is shown that by using a graph, the topological changes induced by fracture reduce to a few, local kernel operations. The performance of the graph representation is demonstrated and analyzed, using as reference the 3D fracture algorithm by Pandolfi and Ortiz [22]. It is shown that the graph representation initializes in O(N_E^1.1) time and fractures in O(N_I^1.0) time, while the reference implementation requires O(N_E^2.1) time to initialize and O(N_I^1.9) time to fracture, where N_E is the number of elements in the mesh and N_I is the number of interfaces to fracture.
Date: October 18, 2006
Creator: Mota, A; Knap, J & Ortiz, M
Partner: UNT Libraries Government Documents Department
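
A hypothetical sketch (not the data structure of the paper) of the graph view described above: elements are graph vertices, shared faces are edges, and fracturing a face is a local edge operation.

    #include <unordered_map>
    #include <unordered_set>

    struct MeshGraph {
        // element id -> ids of the elements that share a face with it
        std::unordered_map<int, std::unordered_set<int>> adj;

        void add_face(int a, int b) {       // O(1) expected
            adj[a].insert(b);
            adj[b].insert(a);
        }

        void fracture_face(int a, int b) {  // the local kernel operation:
            adj[a].erase(b);                // disconnect the two elements;
            adj[b].erase(a);                // new boundary faces are handled
        }                                   // by the caller
    };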

Evaluating the Performance Impact of Xen on MPI and Process Execution For HPC Systems

Description: Virtualization has become increasingly popular for enabling full system isolation, load balancing, and hardware multiplexing for high-end server systems. Virtualizing software has the potential to benefit HPC systems similarly by facilitating efficient cluster management, application isolation, full-system customization, and process migration. However, virtualizing software is not currently employed in HPC environments due to its perceived overhead. In this work, we investigate the overhead imposed by the popular, open-source Xen virtualization system on performance-critical HPC kernels and applications. We empirically evaluate the impact of Xen on both communication and computation and compare its use to that of a customized kernel using HPC cluster resources at Lawrence Livermore National Lab (LLNL). We also employ statistically sound methods to compare the performance of a paravirtualized kernel against three popular Linux operating systems: RedHat Enterprise 4 (RHEL4) for build versions 2.6.9 and 2.6.12 and the LLNL CHAOS kernel, a specialized version of RHEL4. Our results indicate that Xen is very efficient and practical for HPC systems.
Date: December 21, 2006
Creator: Youseff, L; Wolski, R; Gorda, B & Krintz, C
Partner: UNT Libraries Government Documents Department

Evaluating operating system vulnerability to memory errors.

Description: Reliability is of great concern to the scalability of extreme-scale systems. Of particular concern are soft errors in main memory, which are a leading cause of failures on current systems and are predicted to be the leading cause on future systems. While great effort has gone into designing algorithms and applications that can continue to make progress in the presence of these errors without restarting, the most critical software running on a node, the operating system (OS), is currently left relatively unprotected. OS resiliency is of particular importance because, though this software typically represents a small footprint of a compute node's physical memory, recent studies show more memory errors in this region of memory than in the remainder of the system. In this paper, we investigate the soft error vulnerability of two operating systems used in current and future high-performance computing systems: Kitten, the lightweight kernel developed at Sandia National Laboratories, and CLE, a high-performance Linux-based operating system developed by Cray. For each of these platforms, we outline major structures and subsystems that are vulnerable to soft errors and describe methods that could be used to reconstruct damaged state. Our results show the Kitten lightweight operating system may be an easier target to harden against memory errors due to its smaller memory footprint, largely deterministic state, and simpler system structure.
Date: May 1, 2012
Creator: Ferreira, Kurt Brian; Bridges, Patrick G. (University of New Mexico); Pedretti, Kevin Thomas Tauke; Mueller, Frank (North Carolina State University); Fiala, David (North Carolina State University) & Brightwell, Ronald Brian
Partner: UNT Libraries Government Documents Department

Paravirtualization for HPC Systems

Description: Virtualization has become increasingly popular for enabling full system isolation, load balancing, and hardware multiplexing. This widespread use is the result of novel techniques such as paravirtualization that make virtualization systems practical and efficient. Paravirtualizing systems export an interface that is slightly different from the underlying hardware but that significantly streamlines and simplifies the virtualization process. In this work, we investigate the efficacy of using paravirtualizing software for performance-critical HPC kernels and applications. Such systems are not currently employed in HPC environments due to their perceived overhead. However, virtualization systems offer tremendous potential for benefiting HPC systems by facilitating application isolation, portability, operating system customization, and program migration. We present a comprehensive performance evaluation of Xen, a low-overhead, Linux-based virtual machine monitor (VMM), for paravirtualization of HPC cluster systems at Lawrence Livermore National Lab (LLNL). We consider four categories of micro-benchmarks from the HPC Challenge (HPCC) and LLNL ASCI Purple suites to evaluate a wide range of subsystem-specific behaviors. In addition, we employ macro-benchmarks and an HPC application to evaluate overall performance in a real setting. We also employ statistically sound methods to compare the performance of a paravirtualized kernel against three popular Linux operating systems: RedHat Enterprise 4 (RHEL4) for build versions 2.6.9 and 2.6.12 and the LLNL CHAOS kernel, a specialized version of RHEL4. Our results indicate that Xen is very efficient and practical for HPC systems.
Date: October 12, 2006
Creator: Youseff, L; Wolski, R; Gorda, B & Krintz, C
Partner: UNT Libraries Government Documents Department

Investigating methods of supporting dynamically linked executables on high performance computing platforms.

Description: Shared libraries have become ubiquitous and are used to achieve great resource efficiencies on many platforms. The same properties that enable efficiencies on time-shared computers and convenience on small clusters prove to be great obstacles to scalability on large clusters and High Performance Computing platforms. In addition, lightweight operating systems such as Catamount have historically not supported the use of shared libraries, specifically because they hinder scalability. In this report we outline the methods we investigated for supporting shared libraries on High Performance Computing platforms that use lightweight kernels. The considerations necessary to evaluate utility in this area are many and sometimes conflicting. While our initial path forward has been determined based on this evaluation, we consider this effort ongoing and remain prepared to re-evaluate any technology that might provide a scalable solution. This report is an evaluation of a range of possible methods of supporting dynamically linked executables on capability-class High Performance Computing platforms. Efforts are ongoing and extensive testing at scale is necessary to evaluate performance. While performance is a critical driving factor, supporting whatever method is used in a production environment is an equally important and challenging task.
Date: September 1, 2009
Creator: Kelly, Suzanne Marie; Laros, James H., III; Pedretti, Kevin Thomas Tauke & Levenhagen, Michael J.
Partner: UNT Libraries Government Documents Department