282 Matching Results

On the Equivalence of Nonnegative Matrix Factorization and K-means - Spectral Clustering

Description: We provide a systematic analysis of nonnegative matrix factorization (NMF) relating to data clustering. We generalize the usual X = FG^T decomposition to the symmetric W = HH^T and W = HSH^T decompositions. We show that (1) W = HH^T is equivalent to Kernel K-means clustering and Laplacian-based spectral clustering, and (2) X = FG^T is equivalent to simultaneous clustering of the rows and columns of a bipartite graph. We emphasize the importance of orthogonality in NMF and the soft-clustering nature of NMF. These results are verified with experiments on face images and newsgroups.
Date: December 4, 2005
Creator: Ding, Chris; He, Xiaofeng; Simon, Horst D. & Jin, Rong
Partner: UNT Libraries Government Documents Department
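The symmetric W = HH^T factorization described in this abstract can be sketched with damped multiplicative updates. The following is a minimal illustration on a toy similarity matrix using a standard symmetric-NMF update rule, not the authors' exact experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonnegative symmetric similarity matrix W (e.g. a kernel/affinity matrix).
X = rng.random((6, 3))
W = X @ X.T

# Approximate W ~= H H^T with H >= 0 via damped multiplicative updates
# (a common symmetric-NMF rule; a sketch, not the paper's exact algorithm).
k = 2
H = rng.random((6, k)) + 0.1
err_init = np.linalg.norm(W - H @ H.T)
for _ in range(300):
    H *= 0.5 + 0.5 * (W @ H) / (H @ (H.T @ H) + 1e-9)
err_final = np.linalg.norm(W - H @ H.T)

# Rows of H act as soft cluster memberships; argmax gives a hard clustering,
# mirroring the soft-clustering interpretation the abstract emphasizes.
labels = H.argmax(axis=1)
```

The damping (averaging each update with 1) is a standard stabilization for symmetric NMF; the plain undamped rule can oscillate.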

Architecture independent performance characterization andbenchmarking for scientific applications

Description: A simple, tunable, synthetic benchmark with performance directly related to applications would be of great benefit to the scientific computing community. In this paper, we present a novel approach to developing such a benchmark. The initial focus of this project is on the data access performance of scientific applications. First, a hardware-independent characterization of code performance in terms of address streams is developed. The parameters chosen to characterize a single address stream are related to regularity, size, spatial locality, and temporal locality. These parameters are then used to implement a synthetic benchmark program that mimics the performance of a corresponding code. To test the validity of our approach, we performed experiments using five test kernels on six different platforms. The performance of most of our test kernels can be approximated by a single synthetic address stream. However, in some cases overlapping two address streams is necessary to achieve a good approximation.
Date: August 31, 2004
Creator: Strohmaier, Erich & Shan, Hongzhang
Partner: UNT Libraries Government Documents Department

Semantic-Aware Automatic Parallelization of Modern Applications Using High-Level Abstractions

Description: Automatic introduction of OpenMP for sequential applications has attracted significant attention recently because of the proliferation of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has only focused on C and Fortran applications operating on primitive data types. Modern applications using high-level abstractions, such as C++ STL containers and complex user-defined class types, are largely ignored due to the lack of research compilers that are readily able to recognize high-level object-oriented abstractions and leverage their associated semantics. In this paper, we use a source-to-source compiler infrastructure, ROSE, to explore compiler techniques to recognize high-level abstractions and to exploit their semantics for automatic parallelization. Several representative parallelization candidate kernels are used to study semantic-aware parallelization strategies for high-level abstractions, combined with extended compiler analyses. Preliminary results have shown that semantics of abstractions can help extend the applicability of automatic parallelization to modern applications and expose more opportunities to take advantage of multicore processors.
Date: December 21, 2009
Creator: Liao, C; Quinlan, D J; Willcock, J J & Panas, T
Partner: UNT Libraries Government Documents Department

TORCH Computational Reference Kernels - A Testbed for Computer Science Research

Description: For decades, computer scientists have sought guidance on how to evolve architectures, languages, and programming models in order to improve application performance, efficiency, and productivity. Unfortunately, without overarching advice about future directions in these areas, individual guidance is inferred from the existing software/hardware ecosystem, and each discipline often conducts its research independently, assuming all other technologies remain fixed. In today's rapidly evolving world of on-chip parallelism, isolated and iterative improvements to performance may miss superior solutions in the same way gradient descent optimization techniques may get stuck in local minima. To combat this, we present TORCH: A Testbed for Optimization ResearCH. These computational reference kernels define the core problems of interest in scientific computing without mandating a specific language, algorithm, programming model, or implementation. To complement the kernel (problem) definitions, we provide a set of algorithmically-expressed verification tests that can be used to verify that a hardware/software co-designed solution produces an acceptable answer. Finally, to provide some illumination as to how researchers have implemented solutions to these problems in the past, we provide a set of reference implementations in C and MATLAB.
Date: December 2, 2010
Creator: Kaiser, Alex; Williams, Samuel Webb; Madduri, Kamesh; Ibrahim, Khaled; Bailey, David H.; Demmel, James W. et al.
Partner: UNT Libraries Government Documents Department

Performance Evaluation of the IBM SP and the Compaq AlphaServer SC

Description: Oak Ridge National Laboratory (ORNL) has recently installed both a Compaq AlphaServer SC and an IBM SP, each with 4-way SMP nodes, allowing a direct comparison of the two architectures. In this paper, we describe our initial evaluation. The evaluation looks at both kernel and application performance for a spectral atmospheric general circulation model, an important application for the ORNL systems.
Date: August 3, 2001
Creator: Worley, P.H.
Partner: UNT Libraries Government Documents Department

Scientific kernels on VIRAM and Imagine media processors

Description: Many high performance applications run well below the peak arithmetic performance of the underlying machine, with inefficiencies often attributed to a lack of memory bandwidth. In this work we examine two emerging media processors designed to address the well-known gap between processor and memory performance, in the context of scientific computing. The VIRAM architecture uses novel PIM technology to combine embedded DRAM with a vector co-processor for exploiting its large bandwidth potential. The Imagine architecture, on the other hand, provides a stream-aware memory hierarchy to support the tremendous processing potential of its SIMD-controlled VLIW clusters. First, we develop a scalable synthetic probe that allows us to parameterize key performance attributes of VIRAM and Imagine while capturing the performance crossover point of these architectures. Next, we present results for two important scientific kernels, each with a unique set of computational characteristics and memory access patterns. Our experiments isolate the set of application characteristics best suited for each architecture and show a promising direction towards interfacing leading-edge media processor technology with high-end scientific computations.
Date: October 10, 2002
Creator: Narayanan, Manikandan; Oliker, Leonid; Janin, Adam; Husbands, Parry & Li, Xiaoye S.
Partner: UNT Libraries Government Documents Department

Evaluation of architectural paradigms for addressing the processor-memory gap

Description: Many high performance applications run well below the peak arithmetic performance of the underlying machine, with inefficiencies often attributed to poor memory system behavior. In the context of scientific computing, we examine three emerging processors designed to address the well-known gap between processor and memory performance through the exploitation of data parallelism. The VIRAM architecture uses novel PIM technology to combine embedded DRAM with a vector co-processor for exploiting its large bandwidth potential. The DIVA architecture incorporates a collection of PIM chips as smart-memory coprocessors to a conventional microprocessor, and relies on superword-level parallelism to make effective use of the available memory bandwidth. The Imagine architecture provides a stream-aware memory hierarchy to support the tremendous processing potential of SIMD-controlled VLIW clusters. First, we develop a scalable synthetic probe that allows us to parameterize key performance attributes of VIRAM, DIVA, and Imagine while capturing the performance crossover points of these architectures. Next, we present results for scientific kernels with different sets of computational characteristics and memory access patterns. Our experiments allow us to evaluate the strategies employed to exploit data parallelism, isolate the set of application characteristics best suited to each architecture, and show a promising direction towards interfacing leading-edge processor technology with high-end scientific computations.
Date: July 4, 2003
Creator: Oliker, Leonid; Griem, Gorden; Husbands, Parry & Chame, Jacqueline
Partner: UNT Libraries Government Documents Department

Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms

Description: We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one of the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
Date: October 16, 2008
Creator: Williams, Samuel; Oliker, Leonid; Vuduc, Richard; Shalf, John; Yelick, Katherine & Demmel, James
Partner: UNT Libraries Government Documents Department
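The SpMV kernel studied above, in its unoptimized form, is a short loop over compressed sparse row (CSR) data. A minimal reference version might look like the following; the paper's multicore optimizations (blocking, prefetching, thread-level parallelism) are not shown:

```python
import numpy as np

def spmv_csr(vals, col_idx, row_ptr, x):
    """y = A @ x for A stored in compressed sparse row (CSR) form."""
    n = len(row_ptr) - 1
    y = np.zeros(n)
    for i in range(n):
        # Nonzeros of row i live in vals[row_ptr[i]:row_ptr[i+1]].
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[j] * x[col_idx[j]]
    return y

# 3x3 example matrix: [[4, 0, 9], [0, 7, 0], [5, 0, 0]]
vals    = np.array([4.0, 9.0, 7.0, 5.0])
col_idx = np.array([0, 2, 1, 0])
row_ptr = np.array([0, 2, 3, 4])
x = np.array([1.0, 2.0, 3.0])
y = spmv_csr(vals, col_idx, row_ptr, x)   # [31.0, 14.0, 5.0]
```

The irregular, indirect access to `x` through `col_idx` is precisely what makes SpMV memory-bound and a target for the optimizations the paper evaluates.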

Traveling waves and the renormalization group improved Balitsky-Kovchegov equation

Description: I study the incorporation of renormalization group (RG) improved BFKL kernels in the Balitsky-Kovchegov (BK) equation which describes parton saturation. The RG improvement takes into account important parts of the next-to-leading and higher order logarithmic corrections to the kernel. The traveling wave front method for analyzing the BK equation is generalized to deal with RG-resummed kernels, restricting to the interesting case of fixed QCD coupling. The results show that the higher order corrections suppress the rapid increase of the saturation scale with increasing rapidity. I also perform a "diffusive" differential equation approximation, which illustrates that some important qualitative properties of the kernel change when including RG corrections.
Date: December 1, 2006
Creator: Enberg, Rikard
Partner: UNT Libraries Government Documents Department

Oracle inequalities for SVMs that are based on random entropy numbers

Description: In this paper we present a new technique for bounding local Rademacher averages of function classes induced by a loss function and a reproducing kernel Hilbert space (RKHS). At the heart of this technique lies the observation that certain expectations of random entropy numbers can be bounded by the eigenvalues of the integral operator associated with the RKHS. We then work out the details of the new technique by establishing two new oracle inequalities for SVMs, which complement and generalize previous results.
Date: January 1, 2009
Creator: Steinwart, Ingo
Partner: UNT Libraries Government Documents Department

Hierarchical resilience with lightweight threads.

Description: This paper proposes a methodology for providing robustness and resilience for a highly threaded distributed- and shared-memory environment based on well-defined inputs and outputs to lightweight tasks. These inputs and outputs form a failure 'barrier', allowing tasks to be restarted or duplicated as necessary. These barriers must be expanded based on task behavior, such as communication between tasks, but do not prohibit any given behavior. One trend in high-performance computing codes is toward self-contained functions that mimic functional programming: software designers increasingly specify their core functions in side-effect-free or low-side-effect ways, with well-defined inputs and outputs. This provides the ability to copy the inputs to wherever they need to be - whether that's the other side of the PCI bus or the other side of the network - do work on that input using local memory, and then copy the outputs back (as needed). This design pattern is popular among new distributed threading environment designs. Such designs include the Barcelona STARS system, distributed OpenMP systems, the Habanero-C and Habanero-Java systems from Vivek Sarkar at Rice University, the HPX/ParalleX model from LSU, as well as our own Scalable Parallel Runtime (SPR) effort and the Trilinos stateless kernels. This design pattern is also shared by CUDA and several OpenMP extensions for GPU-type accelerators (e.g. the PGI OpenMP extensions).
Date: October 1, 2011
Creator: Wheeler, Kyle Bruce
Partner: UNT Libraries Government Documents Department

Simple classifiers from data dependent hypothesis classes

Description: In this paper we introduce simple classifiers as an example of how to use the data dependent hypothesis class framework described by Cannon et al. (2002) to explore the performance/computation trade-off in the classifier design problem. We provide a specific example of a simple classifier and demonstrate that it has many remarkable properties: for example, it possesses computationally efficient learning algorithms with favorable bounds on estimation error, admits kernel mappings, and is particularly well suited to boosting. We present experimental results on synthetic and real data that suggest that this classifier is competitive with powerful alternative methods.
Date: January 1, 2003
Creator: Cannon, A. (Adam); Howse, J. W. (James W.); Hush, D. R. (Donald R.) & Scovel, James C.
Partner: UNT Libraries Government Documents Department

Prototype Vector Machine for Large Scale Semi-Supervised Learning

Description: Practical data mining rarely falls exactly into the supervised learning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computational intensiveness of graph-based SSL arises largely from the manifold or graph regularization, which in turn leads to large models that are difficult to handle. To alleviate this, we propose the prototype vector machine (PVM), a highly scalable, graph-based algorithm for large-scale SSL. Our key innovation is the use of "prototype vectors" for efficient approximation of both the graph-based regularizer and the model representation. The choice of prototypes is grounded upon two important criteria: they not only perform effective low-rank approximation of the kernel matrix, but also span a model suffering the minimum information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.
Date: April 29, 2009
Creator: Zhang, Kai; Kwok, James T. & Parvin, Bahram
Partner: UNT Libraries Government Documents Department
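The low-rank role of prototypes can be illustrated with a Nyström-style construction on a kernel matrix. This sketch picks prototypes at random, whereas the PVM selects them to minimize information loss; the RBF kernel, its width, and all sizes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 4))

def rbf(A, B, gamma=0.05):
    """RBF (Gaussian) kernel matrix between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# m prototype points, chosen at random here (the PVM chooses them to
# minimize information loss, which this sketch does not reproduce).
m = 20
P = X[rng.choice(100, size=m, replace=False)]

K_nm = rbf(X, P)                                   # n x m cross-kernel
K_mm = rbf(P, P)                                   # m x m prototype kernel
K_approx = K_nm @ np.linalg.pinv(K_mm) @ K_nm.T    # Nystrom low-rank approx

K_full = rbf(X, X)
rel_err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
```

The full n x n kernel is never needed for the approximation itself: only the n x m and m x m blocks are, which is where the scalability comes from.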

Results from ORNL Characterization of Nominal 350 µm LEUCO Kernels (LEU03) from the BWXT G73V-20-69303 Composite

Description: Measurements were made using optical microscopy to determine the size and shape of the LEU03 kernels. Hg porosimetry was performed to measure density. The results are summarized in Table 1-1. Values in the table are for the composite and are calculated at 95% confidence from the measured values of a random riffled sample. The LEU03 kernel composite met all the specifications in Table 1-1. The BWXT results for measuring the same kernel properties are given in Table 1-2. BWXT characterization methods were significantly different from ORNL methods, which resulted in slight differences in the reported results. BWXT performed manual microscopy measurements for mean diameter (100 particles measured along 2 axes) and aspect ratio (100 particles measured); ORNL used automated image acquisition and analysis (3847 particles measured along 180 axes). Diameter measurements were in good agreement. The narrower confidence interval in the ORNL results for average mean diameter is due to the greater number of particles measured. The critical limits for mean diameter reported at ORNL and BWXT are similar, because ORNL measured a larger standard deviation (10.46 µm vs. 8.70 µm). Aspect ratio satisfied the specification with greater margin in the ORNL results, mostly because the larger sample size resulted in a lower uncertainty in the binomial distribution statistical calculation. ORNL measured 11 out of 3847 kernels exceeding the control limit (1.05); BWXT measured 1 out of 100 particles exceeding the control limit. BWXT used the aspect ratio of perpendicular diameters in a random image plane, where one diameter was a maximum or a minimum. ORNL used the aspect ratio of the absolute maximum and minimum diameters in a random image plane. The ORNL technique can be expected to yield higher measured aspect ratios. Hand tabling was performed at ORNL prior to characterization by repeatedly pouring a small fraction ...
Date: November 1, 2006
Creator: Kercher, Andrew K. & Hunn, John D.
Partner: UNT Libraries Government Documents Department

Modelling long-distance seed dispersal in heterogeneous landscapes.

Description: 1. Long-distance seed dispersal is difficult to measure, yet key to understanding plant population dynamics and community composition. 2. We used a spatially explicit model to predict the distribution of seeds dispersed long distances by birds into habitat patches of different shapes. All patches were the same type of habitat and size, but varied in shape. They occurred in eight experimental landscapes, each with five patches of four different shapes, 150 m apart in a matrix of mature forest. The model was parameterized with small-scale movement data collected from field observations of birds. In a previous study we validated the model by testing its predictions against observed patterns of seed dispersal in real landscapes with the same types and spatial configuration of patches as in the model. 3. Here we apply the model more broadly, examining how patch shape influences the probability of seed deposition by birds into patches, how dispersal kernels (distributions of dispersal distances) vary with patch shape and starting location, and how movement of seeds between patches is affected by patch shape. 4. The model predicts that patches with corridors or other narrow extensions receive higher numbers of seeds than patches without corridors or extensions. This pattern is explained by edge-following behaviour of birds. Dispersal distances are generally shorter in heterogeneous landscapes (containing patchy habitat) than in homogeneous landscapes, suggesting that patches divert the movement of seed dispersers, 'holding' them long enough to increase the probability of seed defecation in the patches. Dispersal kernels for seeds in homogeneous landscapes were smooth, whereas those in heterogeneous landscapes were irregular. In both cases, long-distance (> 150 m) dispersal was surprisingly common, usually comprising approximately 50% of all dispersal events. 5. Synthesis. Landscape heterogeneity has a large influence on patterns of long-distance seed dispersal.
Our results suggest that long-distance ...
Date: January 1, 2008
Creator: Levey, Douglas J.; Tewksbury, Joshua J. & Bolker, Benjamin M.
Partner: UNT Libraries Government Documents Department

Edge detection by nonlinear dynamics

Description: We demonstrate how the formulation of a nonlinear scale-space filter can be used for edge detection and junction analysis. By casting edge-preserving filtering in terms of maximizing information content subject to an average cost function, the computed cost at each pixel location becomes a local measure of edgeness. This computation depends on a single scale parameter and the given image data. Unlike previous approaches which require careful tuning of the filter kernels for various types of edges, our scheme is general enough to be able to handle different edges, such as lines, step-edges, corners and junctions. Anisotropy in the data is handled automatically by the nonlinear dynamics.
Date: July 1994
Creator: Wong, Yiu-fai
Partner: UNT Libraries Government Documents Department

Kernel Near Principal Component Analysis

Description: We propose a novel algorithm based on Principal Component Analysis (PCA). First, we present an interesting approximation of PCA using Gram-Schmidt orthonormalization. Next, we combine our approximation with the kernel functions from Support Vector Machines (SVMs) to provide a nonlinear generalization of PCA. After benchmarking our algorithm in the linear case, we explore its use in both the linear and nonlinear cases. We include applications to face data analysis, handwritten digit recognition, and fluid flow.
Date: July 1, 2002
Creator: Martin, Shawn B.
Partner: UNT Libraries Government Documents Department
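The nonlinear generalization of PCA via kernel functions mentioned above can be sketched with the textbook kernel-PCA construction: eigendecompose a double-centered Gram matrix. This is not the paper's Gram-Schmidt approximation; the RBF kernel and its width are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5))

def rbf_kernel(X, gamma=0.5):
    """RBF Gram matrix of the rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

K = rbf_kernel(X)
n = K.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J                       # double-center the Gram matrix

w, V = np.linalg.eigh(Kc)            # eigh returns ascending eigenvalues
w, V = w[::-1], V[:, ::-1]           # reorder to descending
alphas = V[:, :2] / np.sqrt(np.maximum(w[:2], 1e-12))  # normalized coefficients
Z = Kc @ alphas                      # 2-D nonlinear principal components
```

Each column of `Z` is a projection onto a principal direction in the kernel-induced feature space; with a linear kernel this reduces to ordinary PCA.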

The Kernel Polynomial Method for non-orthogonal electronic structure calculations

Description: The Kernel Polynomial Method (KPM) has been successfully applied to tight-binding electronic structure calculations as an O(N) method. Here we extend this method to nonorthogonal basis sets with a sparse overlap matrix S and a sparse Hamiltonian H. Since the KPM utilizes matrix-vector multiplications, it is necessary to apply S^{-1}H to a vector. The multiplication by S^{-1} is performed using a preconditioned conjugate gradient method and does not involve the explicit inversion of S. Hence the method scales the same way as the original KPM, i.e. O(N), although there is an overhead due to the additional conjugate gradient part. We show an application of this method to defects in a titanate/platinum interface and to a large scale electronic structure calculation of amorphous diamond.
Date: October 1, 1996
Creator: Roeder, H.; Silver, R.N.; Kress, J.D. & Landrum, G.A.
Partner: UNT Libraries Government Documents Department
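The O(N) character of the KPM comes from building Chebyshev moments of the spectrum purely from matrix-vector products. Below is a minimal orthogonal-basis sketch; the paper's nonorthogonal extension, which applies S^{-1}H via conjugate gradients, is not shown, and the random symmetric "Hamiltonian" is a stand-in for a real tight-binding matrix:

```python
import numpy as np

rng = np.random.default_rng(2)

# Small symmetric stand-in Hamiltonian, rescaled so its spectrum lies in [-1, 1]
# (required for a Chebyshev expansion).
n = 50
A = rng.standard_normal((n, n))
H = (A + A.T) / 2
H /= np.linalg.norm(H, 2) * 1.05     # spectral norm slightly below 1

# Chebyshev moments mu_m = Tr T_m(H), built from the three-term recurrence
# T_m(H)v = 2 H T_{m-1}(H)v - T_{m-2}(H)v. The trace is summed exactly over
# the full basis here; at scale, KPM uses stochastic trace estimation.
M = 8
mu = np.zeros(M)
for k in range(n):
    v0 = np.zeros(n); v0[k] = 1.0
    t_prev, t_cur = v0, H @ v0
    mu[0] += v0 @ t_prev
    mu[1] += v0 @ t_cur
    for m in range(2, M):
        t_prev, t_cur = t_cur, 2 * (H @ t_cur) - t_prev
        mu[m] += v0 @ t_cur
```

The moments `mu` determine a polynomial approximation to the density of states; only sparse matrix-vector products with H are ever needed, which is the source of the O(N) scaling.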

Distribution-free discriminant analysis

Description: This report describes our experience in implementing a non-parametric (distribution-free) discriminant analysis module for use in a wide range of pattern recognition problems. Issues discussed include performance results on both real and simulated data sets, comparisons to other methods, and the computational environment. In some cases, this module performs better than other existing methods. Nearly all cases can benefit from the application of multiple methods.
Date: May 1, 1997
Creator: Burr, T. & Doak, J.
Partner: UNT Libraries Government Documents Department

Prediction: Design of experiments based on approximating covariance kernels

Description: Using Mercer's expansion to approximate the covariance kernel of an observed random function, the authors transform the prediction problem into a regression problem with random parameters. The latter is considered in the framework of convex design theory. First they formulate results in terms of the regression model with random parameters, then present the same results in terms of the original problem.
Date: November 1, 1998
Creator: Fedorov, V.
Partner: UNT Libraries Government Documents Department

The stability of the spectator, Dirac, and Salpeter equations for mesons

Description: Mesons are made of quark-antiquark pairs held together by the strong force. The one channel spectator, Dirac, and Salpeter equations can each be used to model this pairing. The authors look at cases where the relativistic kernel of these equations corresponds to a time-like vector exchange, a scalar exchange, or a linear combination of the two. Since the model used in this paper describes mesons which cannot decay physically, the equations must describe stable states. They find that this requirement is not always satisfied, and give a complete discussion of the conditions under which the various equations give unphysical, unstable solutions.
Date: August 1, 1998
Creator: Uzzo, Michael & Gross, Franz
Partner: UNT Libraries Government Documents Department

Final Report: A Flexible Sequence Reconstructor for Large-Scale DNA Sequencing Projects, September 1, 1994 - August 31, 1999

Description: Because no current assembly system produces perfect results on all data, some users prefer to use FAKtory in tandem with other assembly systems, and compare the results to help identify false joins or chimeric sequences. Our gryphon tool can aid in this process by displaying a graphical view of a FAKtory assembly alongside any other format assembly. One of our goals has been to allow interfacing between different systems, and as a result of these efforts, the FAKII kernel is the preferred assembly engine of many GAP4 users worldwide. Recent developments of the FAKtory project have focused on enhancing output capabilities to increase compatibility with other tools, improving the performance and robustness of the system, and adding features based on user requests. FAKtory can now output entire assemblies in FASTA format, and can produce PostScript file captures of the Layout, MultiAlignment and Trace views. For compatibility with the GAP4 system, the FAKII Overlap Graph and Assembly objects can now be saved from FAKtory. As an aid to project organization, input and output are automatically directed to a specified project directory which is selectable via a File Selection Box incorporated into the File Directories panel. A File Selection Box is also provided when an output operation is initiated to allow easy redirection of output.
Date: August 31, 1999
Creator: Miller, Susan & Myers, Gene
Partner: UNT Libraries Government Documents Department