Soft Error Vulnerability of Iterative Linear Algebra Methods
2 Iterative Linear Methods
The linear system, A x = b, underlies many scientific computing applications. Methods that directly compute an exact solution for x, such as Gaussian elimination, are generally expensive, particularly if A is sparse. Thus, most applications use iterative methods, such as multigrid. These methods start with a sample solution and then iteratively refine it to find an approximate solution with an estimated error below an acceptable threshold.
For example, the Conjugate Gradient (CG) method expresses x as a linear function of n vectors p_1, p_2, ..., p_n, with each pair of vectors conjugate in A (p_i^T A p_j = 0 for i ≠ j). Although the p_i's can be computed directly, in practice only a small subset of the p_i's is needed to achieve accuracy within machine precision. As such, CG approximates the solution x = a_1 p_1 + ... + a_n p_n with only a few vectors. Under CG, the initial approximation is x_0; the residual r_0 = b - A x_0, which is the direction of the error in x_0, serves as the first conjugate vector, p_0. Subsequent iterations compute the residual r_k and use it to compute the next conjugate vector p_k. However, to ensure that p_k is conjugate to the prior p_i's, p_k = r_{k-1} + (r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}) p_{k-1}. The coefficients a_k are computed as a_k = r_{k-1}^T r_{k-1} / (p_k^T A p_k). This process is repeated until r_k is below some threshold. Although other iterative methods compute subsequent approximations differently, all follow a similar pattern.
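The iteration described above can be sketched in a few lines. This is the textbook CG formulation with illustrative names (using NumPy), not SparseLib's implementation:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=1000):
    """Textbook CG sketch: iteratively refine x until the residual
    norm falls below tol. A must be symmetric positive definite."""
    x = x0.copy()
    r = b - A @ x              # r0 = b - A x0: direction of the error in x0
    p = r.copy()               # first conjugate vector p0 = r0
    rs_old = r @ r
    if np.sqrt(rs_old) < tol:  # x0 already good enough
        return x
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)        # step length a_k
        x = x + alpha * p
        r = r - alpha * Ap               # updated residual r_k
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p    # keep p_k conjugate to prior p_i's
        rs_old = rs_new
    return x
```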
Two main properties of iterative linear methods shape the general perception of their soft error vulnerability. First, they begin with an imprecise solution and iterate to within some level of accuracy. As such, soft errors that do not corrupt the data of the matrix A, the vector b, or control state, such as a pointer to a vector, should have little impact. Second, their residual norm, which tracks convergence towards a solution, can be used to detect errors by testing its progress for any abnormalities.
3 Target Iterative Methods
We focus on SparseLib [6], a sparse matrix library that includes several iterative solvers and linear operations on a variety of sparse matrix storage formats. We examine the soft error vulnerability of six iterative methods: Conjugate Gradient (CG); Conjugate Gradient Squared (CGS); Biconjugate Gradient (BiCG); Biconjugate Gradient Stabilized (BiCGSTAB); Preconditioned Richardson (PR); and Chebyshev (Cheby). We evaluate these methods with 39 matrixes, each from a different group of the University of Florida Sparse Matrix Collection [5]. Since CG and Cheby only work on symmetric matrixes, we use each group's largest symmetric matrix; some groups have no symmetric matrixes, in which case we use the largest unsymmetric matrix. The collection provides the right-hand side, b, for many matrixes; for each matrix that does not include b, we use a right-hand side that corresponds to a solution vector x of all ones.
In order to establish a baseline for our soft error experiments, we applied each iterative method to each matrix to identify the smallest residual that the method achieves in under a minute. We used residual thresholds < 1 since SparseLib's initial guess for x produces a residual of 1. We did not consider residuals < 1e-150 since smaller residuals lead to numerical instability in most matrix/method combinations. Different matrix/method combinations have different minimum residuals: some methods only execute for a fraction of a second on some matrixes, while others diverge for all target residuals.
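The baseline search can be sketched as follows; the `solve` callable, its return convention, and the threshold list are illustrative assumptions, not SparseLib's interface:

```python
import time

def smallest_residual_under(solve, thresholds, time_limit=60.0):
    """Sketch of the baseline search: try successively smaller residual
    thresholds and keep the smallest one the solver reaches within
    time_limit seconds. `solve(tol)` is assumed to return the final
    residual norm, or None if the method diverges at that tolerance."""
    best = None
    for tol in thresholds:   # e.g. 1e-1, 1e-2, ..., down to 1e-150
        start = time.time()
        achieved = solve(tol)
        if achieved is None or time.time() - start > time_limit:
            break            # diverged or ran past the one-minute budget
        best = achieved
    return best
```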
4 Fault Injection Methodology
We model the impact of soft errors by flipping a single randomly chosen bit at a randomly chosen time in the target iterative method's data structures. We implement fault injection into any object on the stack or heap through manual instrumentation of SparseLib. We do not inject errors during the reading of the matrix A and vector b since scientific applications use linear solvers as part of larger numerical algorithms that read the input data once but execute the linear solver many times. We also do not inject errors into system-dependent state since we focus on the soft error vulnerability of sparse iterative methods in general, rather than a specific implementation with a specific compiler. In particular, this means we do not flip bits in registers, malloc object headers, function return pointers, or application code, since these all depend on a particular compiler or system library. Our results demonstrate that the soft error vulnerability of the methods is significant; injecting additional errors would only increase that vulnerability.
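The core of this fault model, flipping one randomly chosen bit of a value's in-memory representation, can be sketched for a 64-bit float. The function name is illustrative; the paper's instrumentation operates on SparseLib's C++ objects:

```python
import random
import struct

def flip_random_bit(value, rng=random):
    """Single-bit-flip sketch: flip one randomly chosen bit in the
    64-bit IEEE-754 representation of a float and return the result."""
    # Reinterpret the double's bytes as an unsigned 64-bit integer.
    bits = struct.unpack('<Q', struct.pack('<d', value))[0]
    bits ^= 1 << rng.randrange(64)     # flip one of the 64 bits
    # Reinterpret the corrupted bits as a double again.
    return struct.unpack('<d', struct.pack('<Q', bits))[0]
```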
We perform our experiments on 2.4 GHz dual-core Opterons, with 2 GB RAM each. We evaluate each method's fault vulnerability for the ten largest matrixes for which the method satisfies the constraints discussed in Section 3. We inject faults in the base methods (Section 5) and in methods enhanced with soft error detection (Section 6) and tolerance (Section 7) techniques. For each combination of iterative method, matrix, and error detection and/or correction technique, we performed 500 trials. We analyze our data for the impact of a single bit flip on a single run of each method and on a parallel application that uses iterative methods internally and runs for one day on a thousand processors using 100 FIT/MB memory, consistent with typical 1,000-5,000 FIT/MB DRAM [18] with 90%-98% effective error correction.
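A back-of-the-envelope version of this rate arithmetic, assuming 2 GB of DRAM per processor (matching the experimental machines above) and a post-correction rate of roughly 100 FIT/MB (90% correction of 1,000 FIT/MB, or 98% of 5,000 FIT/MB; FIT = failures per 10^9 device-hours):

```python
# Expected uncorrected soft errors per day for the hypothetical
# 1,000-processor application; the 2 GB/node figure is an assumption.
fit_per_mb = 100           # post-correction FIT rate per MB of DRAM
mb_per_node = 2 * 1024     # assumed 2 GB of DRAM per processor
nodes = 1000
hours = 24                 # one day of execution

fit_total = fit_per_mb * mb_per_node * nodes
expected_errors_per_day = fit_total * hours / 1e9
print(expected_errors_per_day)   # ~4.9 errors per day across the machine
```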
5 Impact of Soft Errors
We first consider the four possible outcomes of a single bit
flip on a single run of an iterative method:
Bronevetsky, G & de Supinski, B. Soft Error Vulnerability of Iterative Linear Algebra Methods, article, January 19, 2008; Livermore, California. (https://digital.library.unt.edu/ark:/67531/metadc900445/m1/4/: accessed April 19, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.