Co-design of software and hardware to implement remote sensing algorithms Page: 4 of 15
Figure 2. The idea of the k-means algorithm. Data points are represented as filled circles, and centers are open
circles. The first step assigns each data point to the center that it is closest to. The second step (re)locates the
centers to be the mean value of all the points in the cluster.
2. CLUSTERING AND SEGMENTATION
Although the first published references to the k-means algorithm can be traced to the mid-1960s,29-32 the basic
idea for the algorithm is quite simple (see Fig. 2). It is sometimes called the "generalized Lloyd algorithm," since it
is a vector version of Lloyd's original scalar quantization algorithm, which was developed in the late 1950s.33 The
fine-grained parallelizability of the algorithm has long been noted,34 and this makes it a particularly attractive
target for hardware implementation. The k-means algorithm can be viewed as a special case of the Expectation-Maximization
(EM) algorithm introduced and analyzed by Dempster et al.35 Each iteration can be proved to produce a better (or at
least no worse) approximation to a local optimum; a global optimum, unfortunately, is not guaranteed. The literature on
clustering algorithms is quite vast, and there are many reviews36-41; partly this is because there is such a wide range
of applications.
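The two alternating steps described above (assign each point to its nearest center, then relocate each center to the mean of its cluster) can be sketched as follows. This is a minimal illustration, not the hardware implementation discussed later; the function name and the simplistic first-k-points initialization are choices made here for brevity.

```python
import numpy as np

def kmeans(points, k, iters=10):
    """Minimal k-means sketch: alternate the two steps shown in Fig. 2."""
    points = np.asarray(points, dtype=float)
    # Initialize centers from the first k points (simplistic; random or
    # farthest-point initialization is more common in practice).
    centers = points[:k].copy()
    for _ in range(iters):
        # Step 1: assign each data point to the center it is closest to
        # (squared Euclidean distance, computed by broadcasting).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 2: relocate each center to the mean of the points in its cluster.
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels
```

Note that the inner distance computation over all points is independent per point, which is exactly the fine-grained parallelism that makes the algorithm attractive for hardware.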
While multi- and hyperspectral imagery provides unprecedented amounts of information about a scene, it also
presents the image analyst with something of a headache. It is not humanly possible to look at all those channels
at the same time. By clustering the data, one provides not only a substantial (albeit lossy) compression, but also a
"picture" in which pixels with similar spectral signature are identified by a unique (and usually false) color.
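The mapping from cluster labels to a false-color "picture" can be sketched as below. The cube dimensions, the palette, and the labels themselves are all made up for illustration; in practice the labels would come from clustering the B-dimensional spectra.

```python
import numpy as np

# Hypothetical sizes: a small "hyperspectral" cube, H x W pixels by B bands,
# clustered into K classes.
H, W, B, K = 4, 5, 16, 3
rng = np.random.default_rng(0)
cube = rng.random((H, W, B))

# Stand-in cluster labels, one per pixel (normally produced by k-means
# applied to the B-dimensional spectrum of each pixel).
labels = rng.integers(0, K, size=(H, W))

# A false-color table: one arbitrary RGB triple per cluster.
palette = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.uint8)

# The "picture": pixels with similar spectral signatures (same cluster)
# receive the same, usually false, color.
false_color = palette[labels]  # shape (H, W, 3)
```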
As well as reducing the data for quicklook views, clustering also provides an organization of the data that can
be useful for other processing downstream. For example, Kelly and White42 employ clustering as a preprocessing
step to speed up interactive/exploratory manipulations of large data sets. Also, Schowengerdt43 suggests the use
of image segmentation for change detection: a change in the segmentation is more likely to indicate an actual
change on the ground, since the segmentation is relatively robust to changes in sensor performance and atmospheric
conditions. Several authors have shown that clustering the data beforehand improves the performance of algorithms
which attempt to "learn" features from a small number of labelled examples.44-49 The use of clustering for image
restoration has also been explored in some detail; see Sheppard et al.50 for a recent review. For remote detection
and characterization of gaseous plumes, the ground scene ceases to be the signal of interest, and becomes instead
the clutter. Clustering provides a way to reduce this background clutter, because the within-class variance of a
segmented image can be made much smaller than the overall variance of the image as a whole.51 The issues of
clustering and pixel mixing are somewhat at odds with each other, but a paper by Stocker and Schaum52 points to
one approach for combining them.
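The clutter-reduction argument can be made concrete with a small numerical sketch: the overall variance of an image contains a between-class term that segmentation removes, leaving only the (much smaller) within-class variance. The two classes and their statistics below are synthetic, chosen only to illustrate the decomposition.

```python
import numpy as np

# Two synthetic "ground cover" classes with different mean radiance.
rng = np.random.default_rng(1)
a = rng.normal(10.0, 1.0, 500)   # class 0 pixels
b = rng.normal(20.0, 1.0, 500)   # class 1 pixels
pixels = np.concatenate([a, b])
labels = np.concatenate([np.zeros(500, int), np.ones(500, int)])

# Variance of the image as a whole: includes the between-class spread.
overall_var = pixels.var()

# Pooled within-class variance: per-class variances weighted by class size.
within_var = sum((labels == j).sum() * pixels[labels == j].var()
                 for j in (0, 1)) / len(pixels)

# Segmentation removes the between-class term, so within_var << overall_var,
# which is why subtracting the class mean suppresses background clutter.
```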
In the subsections that follow, we will describe both previous and ongoing work to modify the traditional k-means
algorithm so as to speed it up with the aid of programmable logic hardware.
2.1. Low-level variants of k-means
One of the first, and most straightforward, ways to alter a floating-point serial-processor algorithm is to truncate the
precision of the data and/or of the intermediate computations, and to do the computation in fixed-point arithmetic.
We found in Leeser et al.53 that good clusterings could be achieved, even with considerable truncation of the data.
We found that it was advantageous to allocate two more bits of precision to the centers than to the data, but this is
relatively cheap since there are few centers and many data points.
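The truncation scheme can be sketched as fixed-point quantization, with the centers carrying two extra bits as described above. The bit widths and the `quantize` helper here are illustrative, not the specific values used in Leeser et al.53

```python
import numpy as np

# Illustrative bit widths: data truncated to DATA_BITS fractional bits,
# centers allowed two more (cheap, since there are few centers).
DATA_BITS = 6
CENTER_BITS = DATA_BITS + 2

def quantize(x, bits):
    """Truncate x (assumed in [0, 1)) to fixed point with `bits` fractional bits."""
    scale = 1 << bits
    return np.floor(x * scale) / scale

rng = np.random.default_rng(0)
data = quantize(rng.random(1000), DATA_BITS)

# A center is a mean of quantized data, itself kept at the higher precision.
center = quantize(data.mean(), CENTER_BITS)
```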
Truncating the precision of the data still keeps the algorithm more or less intact; a less obvious way to alter the
algorithm is to incorporate approximations which are more amenable to hardware. At the core (and in the inner
loop) of the k-means algorithm is the computation of distance between the centers and the data points. By altering
the distance metric itself, we were able to achieve a hardware implementation with a substantially greater speedup.
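A metric frequently substituted for hardware reasons is the Manhattan (city-block) distance, which replaces the per-dimension multiply of the squared Euclidean distance with an absolute difference. The sketch below, with hypothetical function names, illustrates the trade without claiming to be the exact substitution used in the implementation described here.

```python
import numpy as np

def euclidean2(p, q):
    # Squared Euclidean distance: needs one multiplier per dimension.
    return ((p - q) ** 2).sum()

def manhattan(p, q):
    # Manhattan (city-block) distance: only subtract / absolute value / add,
    # which maps onto far cheaper hardware than multipliers.
    return np.abs(p - q).sum()

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])
# The nearest-center decision can differ between the two metrics, so the
# effect of the substitution on clustering quality must be checked empirically.
```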
In one dimension, the distance between two points is simply the absolute difference between
their coordinate values. In higher dimensions, the distance between two points can be expressed in terms of the
Theiler, J. P.; Frigo, J.; Gokhale, M. & Szymanski, J. J. Co-design of software and hardware to implement remote sensing algorithms, article, January 1, 2001; United States. University of North Texas Libraries, Digital Library: digital.library.unt.edu/ark:/67531/metadc926714/m1/4/; crediting UNT Libraries Government Documents Department.