Enhancing Scalability of Sparse Direct Methods
                          matrix181           dds15
                         P=8     P=256      P=8     P=256
  Symbolic   Sequential  6.8     6.8        4.6     4.6
             Parallel    2.6     2.7        1.6     0.5
  Entire     Old         84.7    26.6       64.1    43.2
  solver     New         159.2   26.5       66.3    31.4

Table 3. Parallel runtime (seconds). "Old" or "New" solver means using the serial or parallel
symbolic factorization algorithm, respectively.
3. Linear-complexity sparse factorization
As is well known, sparse Gaussian elimination has super-linear time complexity. For example,
for a Laplacian-type PDE model problem on a 2D n x n mesh or a 3D n x n x n mesh, using
nested dissection ordering, the operation counts are O(n^3) and O(n^6), respectively. This was
shown to be optimal when performing exact elimination. However, we have observed that by
exploiting the numerical low-rankness of the Schur complements, we can design a nearly-linear-time
and sufficiently accurate factorization algorithm.
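The super-linear cost shows up already in the fill-in of a sparse LU factorization of the 2D model problem. The following rough sketch (numpy/scipy assumed; not code from this report) builds the 5-point Laplacian on an n x n mesh and observes how the fill ratio grows with n:

```python
# Sketch: observe the super-linear cost of sparse Gaussian elimination
# on a 2D Laplacian model problem, via growth of the fill-in ratio.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def laplacian_2d(n):
    """5-point Laplacian on an n x n mesh (n^2 unknowns)."""
    T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
    I = sp.identity(n)
    return (sp.kron(I, T) + sp.kron(T, I)).tocsc()

for n in [16, 32, 64]:
    A = laplacian_2d(n)
    lu = spla.splu(A)                       # sparse LU factorization
    fill = (lu.L.nnz + lu.U.nnz) / A.nnz    # fill-in grows with n
    print(n, A.nnz, round(fill, 1))
```

The growing fill ratio is the memory-side symptom of the O(n^3) operation count quoted above.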
The idea comes from structured matrix computations. Specifically, we use semi-separable
matrices in our study. For example, a semi-separable matrix with 4 x 4 blocks has the following
form:

        [ D1                  U1 V2^T        U1 W2 V3^T    U1 W2 W3 V4^T ]
    A = [ V2 U1^T             D2             U2 V3^T       U2 W3 V4^T    ]
        [ V3 W2^T U1^T        V3 U2^T        D3            U3 V4^T       ]
        [ V4 W3^T W2^T U1^T   V4 W3^T U2^T   V4 U3^T       D4            ]

* The first and second upper off-diagonal blocks of A are

      U1 ( V2^T   W2 V3^T   W2 W3 V4^T )   and   [ U1 W2 ] ( V3^T   W3 V4^T ).
                                                 [ U2    ]

* For general i < j, the (i, j) block entry is Ui W_{i+1} W_{i+2} ... W_{j-1} Vj^T.
* Every upper off-diagonal block has the form

      [ U1 W2 W3 ... Wi ]
      [       ...       ] ( V_{i+1}^T   W_{i+1} V_{i+2}^T   ...   W_{i+1} W_{i+2} ... W_{N-1} V_N^T ).
      [ U_{i-1} Wi      ]
      [ Ui              ]

The D, U, W and V matrices have dimensions k x k. A is n x n with n = Nk and uses O(nk) memory,
which is very economical for k << n. Examples of semi-separable matrices include banded
matrices and their inverses. This representation can be constructed numerically.
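As a concreteness check, the block formulas above can be verified by assembling a small dense matrix from randomly chosen generators. This is an illustrative sketch (generators are hypothetical, numpy assumed), not the construction algorithm of the report:

```python
# Sketch: assemble a dense symmetric matrix from semi-separable generators
# D_i, U_i, V_i, W_i and check that the (i, j) block for i < j equals
# U_i W_{i+1} ... W_{j-1} V_j^T.
import numpy as np

rng = np.random.default_rng(0)
N, k = 4, 3                                   # 4 x 4 blocks, each k x k
D = [rng.standard_normal((k, k)) for _ in range(N)]
D = [Di + Di.T for Di in D]                   # symmetric diagonal blocks
U = [rng.standard_normal((k, k)) for _ in range(N)]
V = [rng.standard_normal((k, k)) for _ in range(N)]
W = [rng.standard_normal((k, k)) for _ in range(N)]

def block(i, j):
    """Upper block (i, j), 0-based: U_i W_{i+1} ... W_{j-1} V_j^T."""
    M = U[i]
    for m in range(i + 1, j):
        M = M @ W[m]
    return M @ V[j].T

A = np.zeros((N * k, N * k))
for i in range(N):
    A[i*k:(i+1)*k, i*k:(i+1)*k] = D[i]
    for j in range(i + 1, N):
        A[i*k:(i+1)*k, j*k:(j+1)*k] = block(i, j)
        A[j*k:(j+1)*k, i*k:(i+1)*k] = block(i, j).T   # symmetric lower part

# The (1, 4) block (1-based, as in the text) is U_1 W_2 W_3 V_4^T.
assert np.allclose(A[0:k, 3*k:4*k], U[0] @ W[1] @ W[2] @ V[3].T)
```

Note the O(nk) storage: only 4N small k x k generators are kept, instead of the (Nk)^2 dense entries.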
We can devise many fast algorithms based on this compressed representation. For example,
Figure 3 compares the fast semi-separable Cholesky factorization (SS-Cholesky) to the standard
Cholesky factorization in LAPACK (DPOTRF), with the rank k chosen to be 16 and 64. For large
matrices, we see orders-of-magnitude speedups.
This SS-Cholesky kernel can be used in a sparse factorization. Consider using a nested
dissection ordering for the discretization mesh; then the submatrix corresponding to each
separator is essentially dense during the Schur complement updates. Furthermore, the
maximum off-diagonal rank of those matrices is small and nearly constant. Based on this
observation, we devised a new multifrontal factorization scheme, which compresses the frontal
matrix into semi-separable form and performs SS-Cholesky on it. The superfast multifrontal
method works as follows:
Li, Xiaoye S.; Demmel, James; Grigori, Laura; Gu, Ming; Xia, Jianlin; Jardin, Steve et al. Enhancing Scalability of Sparse Direct Methods, article, July 23, 2007; Berkeley, California. (digital.library.unt.edu/ark:/67531/metadc896087/m1/4/), University of North Texas Libraries, Digital Library; crediting UNT Libraries Government Documents Department.