252 Matching Results

Search Results

Advanced search parameters have been applied.

On the Equivalence of Nonnegative Matrix Factorization and K-means- Spectral Clustering

Description: We provide a systematic analysis of nonnegative matrix factorization (NMF) relating to data clustering. We generalize the usual X = FG{sup T} decomposition to the symmetric W = HH{sup T} and W = HSH{sup T} decompositions. We show that (1) W = HH{sup T} is equivalent to Kernel K-means clustering and the Laplacian-based spectral clustering. (2) X = FG{sup T} is equivalent to simultaneous clustering of rows and columns of a bipartite graph. We emphasizes the importance of orthogonality in NMF and soft clustering nature of NMF. These results are verified with experiments on face images and newsgroups.
Date: December 4, 2005
Creator: Ding, Chris; He, Xiaofeng; Simon, Horst D. & Jin, Rong
Partner: UNT Libraries Government Documents Department

Unsymmetric ordering using a constrained Markowitz scheme

Description: We present a family of ordering algorithms that can be used as a preprocessing step prior to performing sparse LU factorization. The ordering algorithms simultaneously achieve the objectives of selecting numerically good pivots and preserving the sparsity. We describe the algorithmic properties and challenges in their implementation. By mixing the two objectives we show that we can reduce the amount of fill-in in the factors and reduce the number of numerical problems during factorization. On a set of large unsymmetric real problems, we obtained the median reductions of 12% in the factorization time, of 13% in the size of the LU factors, of 20% in the number of operations performed during the factorization phase, and of 11% in the memory needed by the multifrontal solver MA41-UNS. A byproduct of this ordering strategy is an incomplete LU-factored matrix that can be used as a preconditioner in an iterative solver.
Date: January 18, 2005
Creator: Amestoy, Patrick R.; S., Xiaoye & Pralet, Stephane
Partner: UNT Libraries Government Documents Department

Multithreading for Synchronization Tolerance in MatrixFactorization

Description: Physical constraints such as power, leakage and pin bandwidth are currently driving the HPC industry to produce systems with unprecedented levels of concurrency. In these parallel systems, synchronization and memory operations are becoming considerably more expensive than before. In this work we study parallel matrix factorization codes and conclude that they need to be re-engineered to avoid unnecessary (and expensive) synchronization. We propose the use of multithreading combined with intelligent schedulers and implement representative algorithms in this style. Our results indicate that this strategy can significantly outperform traditional codes.
Date: July 16, 2007
Creator: Buttari, Alfredo; Dongarra, Jack; Husbands, Parry; Kurzak, Jakub & Yelick, Katherine
Partner: UNT Libraries Government Documents Department

A new scheduling algorithm for parallel sparse LU factorization with static pivoting

Description: In this paper we present a static scheduling algorithm for parallel sparse LU factorization with static pivoting. The algorithm is divided into mapping and scheduling phases, using the symmetric pruned graphs of L' and U to represent dependencies. The scheduling algorithm is designed for driving the parallel execution of the factorization on a distributed-memory architecture. Experimental results and comparisons with SuperLU{_}DIST are reported after applying this algorithm on real world application matrices on an IBM SP RS/6000 distributed memory machine.
Date: August 20, 2002
Creator: Grigori, Laura & Li, Xiaoye S.
Partner: UNT Libraries Government Documents Department

Performance analysis of parallel supernodal sparse LU factorization

Description: We investigate performance characteristics for the LU factorization of large matrices with various sparsity patterns. We consider supernodal right-looking parallel factorization on a bi-dimensional grid of processors, making use of static pivoting. We develop a performance model and we validate it using the implementation in SuperLU-DIST, the real matrices and the IBM Power3 machine at NERSC. We use this model to obtain performance bounds on parallel computers, to perform scalability analysis and to identify performance bottlenecks. We also discuss the role of load balance and data distribution in this approach.
Date: February 5, 2004
Creator: Grigori, Laura & Li, Xiaoye S.
Partner: UNT Libraries Government Documents Department

Using Perturbed QR Factorizations To Solve Linear Least-Squares Problems

Description: We propose and analyze a new tool to help solve sparse linear least-squares problems min{sub x} {parallel}Ax-b{parallel}{sub 2}. Our method is based on a sparse QR factorization of a low-rank perturbation {cflx A} of A. More precisely, we show that the R factor of {cflx A} is an effective preconditioner for the least-squares problem min{sub x} {parallel}Ax-b{parallel}{sub 2}, when solved using LSQR. We propose applications for the new technique. When A is rank deficient we can add rows to ensure that the preconditioner is well-conditioned without column pivoting. When A is sparse except for a few dense rows we can drop these dense rows from A to obtain {cflx A}. Another application is solving an updated or downdated problem. If R is a good preconditioner for the original problem A, it is a good preconditioner for the updated/downdated problem {cflx A}. We can also solve what-if scenarios, where we want to find the solution if a column of the original matrix is changed/removed. We present a spectral theory that analyzes the generalized spectrum of the pencil (A*A,R*R) and analyze the applications.
Date: March 21, 2008
Creator: Avron, Haim; Ng, Esmond G. & Toledo, Sivan
Partner: UNT Libraries Government Documents Department

LUsim: A Framework for Simulation-Based Performance Modelingand Prediction of Parallel Sparse LU Factorization

Description: Sparse parallel factorization is among the most complicated and irregular algorithms to analyze and optimize. Performance depends both on system characteristics such as the floating point rate, the memory hierarchy, and the interconnect performance, as well as input matrix characteristics such as such as the number and location of nonzeros. We present LUsim, a simulation framework for modeling the performance of sparse LU factorization. Our framework uses micro-benchmarks to calibrate the parameters of machine characteristics and additional tools to facilitate real-time performance modeling. We are using LUsim to analyze an existing parallel sparse LU factorization code, and to explore a latency tolerant variant. We developed and validated a model of the factorization in SuperLU_DIST, then we modeled and implemented a new variant of slud, replacing a blocking collective communication phase with a non-blocking asynchronous point-to-point one. Our strategy realized a mean improvement of 11percent over a suite of test matrices.
Date: April 15, 2008
Creator: Univ. of California, San Diego
Partner: UNT Libraries Government Documents Department

A Supernodal Approach to Incomplete LU Factorization with Partial Pivoting

Description: We present a new supernode-based incomplete LU factorization method to construct a preconditioner for solving sparse linear systems with iterative methods. The new algorithm is primarily based on the ILUTP approach by Saad, and we incorporate a number of techniques to improve the robustness and performance of the traditional ILUTP method. These include the new dropping strategies that accommodate the use of supernodal structures in the factored matrix. We present numerical experiments to demonstrate that our new method is competitive with the other ILU approaches and is well suited for today's high performance architectures.
Date: June 25, 2009
Creator: Li, Xiaoye Sherry & Shao, Meiyue
Partner: UNT Libraries Government Documents Department

Updating the Symmetric Indefinite Factorization with Applications in a Modified Newton's Method

Description: In recent years the use of quasi-Newton methods in optimization algorithms has inspired much of the research in an area of numerical linear algebra called updating matrix factorizations. Previous research in this area has been concerned with updating the factorization of a symmetric positive definite matrix. Here, a numerical algorithm is presented for updating the Symmetric Indefinite Factorization of Bunch and Parlett. The algorithm requires only O(n/sup 2/) arithmetic operations to update the factorization of an n x n symmetric matrix when modified by a rank-one matrix. An error analysis of this algorithm is given. Computational results are presented that investigate the timing and accuracy of this algorithm. Another algorithm is presented for the unconstrained minimization of a nonlinear functional. The algorithm is a modification of Newton's method. At points where the Hessian is indefinite the search for the next iterate is conducted along a quadratic curve in the plane spanned by a direction of negative curvature and a gradient-related descent direction. The stopping criteria for this search take into account the second-order derivative information. The result is that the iterates are shown to converge globally to a critical point at which the Hessian is positive semi-definite. Computational results are presented which indicate that the method is promising.
Date: June 1977
Creator: Sorensen, Danny C.
Partner: UNT Libraries Government Documents Department

Enhancing Scalability of Sparse Direct Methods

Description: TOPS is providing high-performance, scalable sparse direct solvers, which have had significant impacts on the SciDAC applications, including fusion simulation (CEMM), accelerator modeling (COMPASS), as well as many other mission-critical applications in DOE and elsewhere. Our recent developments have been focusing on new techniques to overcome scalability bottleneck of direct methods, in both time and memory. These include parallelizing symbolic analysis phase and developing linear-complexity sparse factorization methods. The new techniques will make sparse direct methods more widely usable in large 3D simulations on highly-parallel petascale computers.
Date: July 23, 2007
Creator: Li, Xiaoye S.; Demmel, James; Grigori, Laura; Gu, Ming; Xia,Jianlin; Jardin, Steve et al.
Partner: UNT Libraries Government Documents Department

Direction-preserving and Schur-monotonic Semi-separable Approximations of Symmetric Positive Definite Matrices

Description: For a given symmetric positive definite matrix A {element_of} R{sup nxn}, we develop a fast and backward stable algorithm to approximate A by a symmetric positive-definite semi-separable matrix, accurate to any prescribed tolerance. In addition, this algorithm preserves the product, AZ, for a given matrix Z {element_of} R{sup nxd}, where d << n. Our algorithm guarantees the positive-definiteness of the semi-separable matrix by embedding an approximation strategy inside a Cholesky factorization procedure to ensure that the Schur complements during the Cholesky factorization all remain positive definite after approximation. It uses a robust direction-preserving approximation scheme to ensure the preservation of AZ. We present numerical experiments and discuss potential implications of our work.
Date: October 20, 2009
Creator: Gu, Ming; Li, Xiaoye Sherry & Vassilevski, Panayot S.
Partner: UNT Libraries Government Documents Department

NIKE3D a nonlinear, implicit, three-dimensional finite element code for solid and structural mechanics user's manual update summary

Description: This report provides the NIKE3D user's manual update summary for changes made from version 3.0.0 April 24, 1995 to version 3.3.6 March 24,2000. The updates are excerpted directly from the code printed output file (hence the Courier font and formatting), are presented in chronological order and delineated by NIKE3D version number. NIKE3D is a fully implicit three-dimensional finite element code for analyzing the finite strain static and dynamic response of inelastic solids, shells, and beams. Spatial discretization is achieved by the use of 8-node solid elements, 2-node truss and beam elements, and 4-node membrane and shell elements. Thirty constitutive models are available for representing a wide range of elastic, plastic, viscous, and thermally dependent material behavior. Contact-impact algorithms permit gaps, frictional sliding, and mesh discontinuities along material interfaces. Several nonlinear solution strategies are available, including Full-, Modified-, and Quasi-Newton methods. The resulting system of simultaneous linear equations is either solved iteratively by an element-by-element method, or directly by a direct factorization method.
Date: March 24, 2000
Creator: Puso, M; Maker, B N; Ferencz, R M & Hallquist, J O
Partner: UNT Libraries Government Documents Department

Test of weak and strong factorization in nucleus-nucleuscollisions atseveral hundred MeV/nucleon

Description: Total and partial charge-changing cross sections have been measured for argon projectiles at 400 MeV/nucleon in carbon, aluminum, copper, tin and lead targets; cross sections for hydrogen were also obtained, using a polyethylene target. The validity of weak and strong factorization properties has been investigated for partial charge-changing cross sections; preliminary cross section values obtained for carbon, neon and silicon at 290 and 400 MeV/nucleon and iron at 400 MeV/nucleon, in carbon, aluminum, copper, tin and lead targets have been also used for testing these properties. Two different analysis methods were applied and both indicated that these properties are valid, without any significant difference between weak and strong factorization. The factorization parameters have then been calculated and analyzed in order to find some systematic behavior useful for modeling purposes.
Date: June 21, 2006
Creator: La Tessa, Chiara; Sihver, Lembit; Zeitlin, Cary; Miller, Jack; Guetersloh, Stephen; Heilbronn, Lawrence et al.
Partner: UNT Libraries Government Documents Department

Towards an Accurate Performance Modeling of Parallel SparseFactorization

Description: We present a performance model to analyze a parallel sparseLU factorization algorithm on modern cached-based, high-end parallelarchitectures. Our model characterizes the algorithmic behavior bytakingaccount the underlying processor speed, memory system performance, aswell as the interconnect speed. The model is validated using theSuperLU_DIST linear system solver, the sparse matrices from realapplications, and an IBM POWER3 parallel machine. Our modelingmethodology can be easily adapted to study performance of other types ofsparse factorizations, such as Cholesky or QR.
Date: May 26, 2006
Creator: Grigori, Laura & Li, Xiaoye S.
Partner: UNT Libraries Government Documents Department

Source Apportionment Analysis of Measured Volatile Organic Compounds in Corpus Christi, Texas

Description: Corpus Christi among of the largest industrialized coastal urban areas in Texas. The strategic location of the city along the Gulf of Mexico allows for many important industries and an international business to be located. The cluster of industries and businesses in the region contribute to the air pollution from emissions that are harmful to the environment and to the people living in and visiting the area. Volatile organic compounds (VOC) constitute an important class of pollutants measured in the area. The automated gas chromatography (Auto GC) data was collected from Texas Commission of Environmental Quality (TCEQ) and source apportionment analysis was conducted on this data to identify key sources of VOC affecting this study region. EPA PMF 3.0 was employed in this sources apportionment study of measured VOC concentration during 2005 - 2012 in Corpus Christi, Texas. The study identified nine optimal factors (Source) that could explain the concentration of VOC at two urbane monitoring sites in the study region. Natural gas was found to be the largest contributor of VOC in the area, followed by gasoline and vehicular exhaust. Diesel was the third highest contributor with emissions from manufacturing and combustion processes. Refineries gases and evaporative fugitive emissions were other major contributors in the area; Flaring operations, solvents, and petrochemicals also impacted the measured VOC in the urban area. It was noted that he measured VOC concentrations were significantly influenced by the economic downturn in the region and this was highlighted in the annual trends of the apportioned VOC.
Date: May 2014
Creator: Abood, Ahmed T.
Partner: UNT Libraries

Factoring Algebraic Error for Relative Pose Estimation

Description: We address the problem of estimating the relative pose, i.e. translation and rotation, of two calibrated cameras from image point correspondences. Our approach is to factor the nonlinear algebraic pose error functional into translational and rotational components, and to optimize translation and rotation independently. This factorization admits subproblems that can be solved using direct methods with practical guarantees on global optimality. That is, for a given translation, the corresponding optimal rotation can directly be determined, and vice versa. We show that these subproblems are equivalent to computing the least eigenvector of second- and fourth-order symmetric tensors. When neither translation or rotation is known, alternating translation and rotation optimization leads to a simple, efficient, and robust algorithm for pose estimation that improves on the well-known 5- and 8-point methods.
Date: March 9, 2009
Creator: Lindstrom, P & Duchaineau, M
Partner: UNT Libraries Government Documents Department

Making tensor factorizations robust to non-gaussian noise.

Description: Tensors are multi-way arrays, and the CANDECOMP/PARAFAC (CP) tensor factorization has found application in many different domains. The CP model is typically fit using a least squares objective function, which is a maximum likelihood estimate under the assumption of independent and identically distributed (i.i.d.) Gaussian noise. We demonstrate that this loss function can be highly sensitive to non-Gaussian noise. Therefore, we propose a loss function based on the 1-norm because it can accommodate both Gaussian and grossly non-Gaussian perturbations. We also present an alternating majorization-minimization (MM) algorithm for fitting a CP model using our proposed loss function (CPAL1) and compare its performance to the workhorse algorithm for fitting CP models, CP alternating least squares (CPALS).
Date: March 1, 2011
Creator: Chi, Eric C. (Rice University, Houston, TX) & Kolda, Tamara Gibson
Partner: UNT Libraries Government Documents Department

LDRD final report : leveraging multi-way linkages on heterogeneous data.

Description: This report is a summary of the accomplishments of the 'Leveraging Multi-way Linkages on Heterogeneous Data' which ran from FY08 through FY10. The goal was to investigate scalable and robust methods for multi-way data analysis. We developed a new optimization-based method called CPOPT for fitting a particular type of tensor factorization to data; CPOPT was compared against existing methods and found to be more accurate than any faster method and faster than any equally accurate method. We extended this method to computing tensor factorizations for problems with incomplete data; our results show that you can recover scientifically meaningfully factorizations with large amounts of missing data (50% or more). The project has involved 5 members of the technical staff, 2 postdocs, and 1 summer intern. It has resulted in a total of 13 publications, 2 software releases, and over 30 presentations. Several follow-on projects have already begun, with more potential projects in development.
Date: September 1, 2010
Creator: Dunlavy, Daniel M. & Kolda, Tamara Gibson (Sandia National Laboratories, Livermore, CA)
Partner: UNT Libraries Government Documents Department

Benefits of IEEE-754 features in modern symmetric tridiagonaleigensolvers

Description: Bisection is one of the most common methods used to compute the eigenvalues of symmetric tridiagonal matrices. Bisection relies on the Sturm count: For a given shift a, the number of negative pivots in the factorization T - {sigma}I = LDL{sup T} equals the number of eigenvalues of T that are smaller than a. In IEEE-754 arithmetic, the value oo permits the computation to continue past a zero pivot, producing a correct Sturm count when T is unreduced. Demmel and Li showed that using oo rather than testing for zero pivots within the loop could significantly improve performance on certain architectures. When eigenvalues are to be computed to high relative accuracy, it is often preferable to work with LDL{sup T} factorizations instead of the original tridiagonal T. One important example is the MRRR algorithm. When bisection is applied to the factored matrix, the Sturm count is computed from LDL{sup T} which makes differential stationary and progressive qds algorithms the methods of choice. While it seems trivial to replace T by LDL{sup T}, in reality these algorithms are more complicated: In IEEE-754 arithmetic, a zero pivot produces an overflow followed by an invalid exception (NaN, or 'Not a Number') that renders the Sturm count incorrect. We present alternative, safe formulations that are guaranteed to produce the correct result. Benchmarking these algorithms on a variety of platforms shows that the original formulation without tests is always faster provided that no exception occurs. The transforms see speed-ups of up to 2.6x over the careful formulations. Tests on industrial matrices show that encountering exceptions in practice is rare. This leads to the following design: First, compute the Sturm count by the fast but unsafe algorithm. Then, if an exception occurs, recompute the count by a safe, slower alternative. The new Sturm count algorithms improve ...
Date: March 12, 2006
Creator: Marques, Osni; Riedy, Jason E. & Vomel, Christof
Partner: UNT Libraries Government Documents Department

Rate Equation Theory for Island Sizes and Capture Zone Areas in Submonolayer Deposition: Realistic Treatment of Spatial Aspects of Nucleation

Description: Extensive information on the distribution of islands formed during submonolayer deposition is provided by the joint probability distribution (JPD) for island sizes, s, and capture zone areas, A. A key ingredient determining the form of the JPD is the impact of each nucleation event on existing capture zone areas. Combining a realistic characterization of such spatial aspects of nucleation with a factorization ansatz for the JPD, we provide a concise rate equation formulation for the variation with island size of both the capture zone area and the island density.
Date: December 5, 2002
Creator: Evans, J W; Li, M & Bartelt, M C
Partner: UNT Libraries Government Documents Department

Unvail the Mysterious of the Single Spin Asymmetry

Description: Single transverse-spin asymmetry in high energy hadronic reaction has been greatly investigated from both experiment and theory sides in the last few years. In this talk, I will summarize some recent theoretical developments, which, in my opinion, help to unvail the mysterious of the single spin asymmetry.
Date: January 5, 2010
Creator: Yuan, Feng
Partner: UNT Libraries Government Documents Department

Theoretical Studies in Elementary Particle Physics

Description: This final report summarizes work at Penn State University from June 1, 1990 to April 30, 2012. The work was in theoretical elementary particle physics. Many new results in perturbative QCD, in string theory, and in related areas were obtained, with a substantial impact on the experimental program.
Date: April 1, 2013
Creator: Collins, John C. & Roiban, Radu S
Partner: UNT Libraries Government Documents Department