Visualization and Integrated Data Mining of Disparate Information

Saffer, Jeffrey D.; Albright, Cory L.; Calapristi, Augustin J.; Chen, Guang; Crow, Vernon L.; Decker, Scott D.; Groch, Kevin M.; Havre, Susan L.; Malard, Joel; Martin, Tonya J.; Miller, Nancy E.; Monroe, Philip J.; Nowell, Lucy T.; Payne, Deborah A.; Reyes Spindola, Jorge F.; Scarberry, Randall E.; Sofia, Heidi J.; Stillwell, Lisa C.; Thomas, Gregory S.; Thurston, Sarah J.; Williams, Leigh K.; Zabriskie, Sean J.; Hicks, M. G.

Visualization and Integrated Data Mining of Disparate Information Page: 3 of 7

This article is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to UNT Digital Library by the UNT Libraries Government Documents Department.

109
Chemical Data Analysis in the Large, May 22nd - 26th 2000, Bozen, Italy

DATA VISUALIZATION - BASIC CONCEPTS
Exploratory data analysis requires a framework in
which
1. the data can be organized along the lines
of interest to the analyst and
2. a collection of tools is available for
pursuing specific inquiries.
For both, the methods need to handle large volumes
of data, with reasonable speed, and provide linkage
among complementary views and to other tools.
Presenting data in an organized fashion requires
appropriate data overviews, especially those that
allow inference by comparison. For this, we have
adopted visualization methods since they offer
unequalled facility in presenting large volumes of
data. In addition, the structure within a well-
designed visualization can suggest relationships
that might otherwise be overlooked. In that regard,
it should be clear that data visualization methods
assist, but cannot replace, the analyst.
A key component of this approach is to use all the
relevant attributes simultaneously for deriving the
comparisons. With very large data sets, such as
high-throughput screening, it is not possible for the
analyst to examine the behavior of the data records
a few columns at a time and be able to assess the
overall behavior. Selecting a few attributes for
comparison can be useful for testing specific
hypotheses, but does not facilitate discovery of the
unexpected. Hence, cluster-based methods that
utilize all the appropriate data attributes
simultaneously are preferred.
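
The contrast between comparing records a few columns at a time and comparing them on all attributes can be illustrated with a minimal sketch. It is not the authors' method: the records, the choice of Euclidean distance on standardized attributes, and the function names are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

def standardize(records):
    """Scale each attribute (column) to zero mean and unit variance."""
    columns = list(zip(*records))
    means = [mean(c) for c in columns]
    stdevs = [pstdev(c) or 1.0 for c in columns]  # avoid dividing by zero
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)]
            for row in records]

def distance(a, b, attributes=None):
    """Euclidean distance over the chosen attribute indices (all by default)."""
    idx = range(len(a)) if attributes is None else attributes
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in idx))

def nearest_neighbor(records, target, attributes=None):
    """Index of the record closest to records[target], excluding itself."""
    others = [i for i in range(len(records)) if i != target]
    return min(others,
               key=lambda i: distance(records[target], records[i], attributes))

# Hypothetical screening records with four made-up attributes each.
records = [
    [1.0, 2.0, 0.1, 0.2],  # record 0
    [1.0, 2.0, 9.0, 8.5],  # record 1: matches record 0 on two attributes only
    [1.2, 2.1, 0.2, 0.1],  # record 2: close to record 0 on every attribute
    [5.0, 6.0, 9.1, 8.7],  # record 3
]
scaled = standardize(records)
two_column_view = nearest_neighbor(scaled, 0, attributes=[0, 1])
all_attribute_view = nearest_neighbor(scaled, 0)
```

Restricted to the first two attributes, record 1 appears to be the closest match to record 0. With all four attributes, record 2 is closest. A cluster-based overview would build on this kind of all-attribute comparison.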
Even with mathematical methods that use all the
data, no single visualization method can convey all
of the information likely to be needed by the
analyst, so several complementary approaches are
necessary. In that spirit, the visualizations should
not be viewed as stand-alone entities, but linked
together for continuity in data analysis. This becomes
particularly important in an integrated analysis
across different experimental data sets, for example,
where distinct visualizations are used to organize
the data from separate experimental regimens. The
data overviews also need to be backed by
complementary tools that support access to and, in
many cases, visualization of the details of the data.
Easy access to these tools is the foundation for
progressing from visualization to data mining.
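
One way to picture linkage among views is a shared selection that every view listens to, so that picking records in one overview highlights the same records wherever else they appear. The sketch below assumes records carry a common identifier across data sets. The class names, record IDs and values are hypothetical and do not describe the authors' software.

```python
class SharedSelection:
    """Holds the selected record IDs and notifies every registered view."""

    def __init__(self):
        self._listeners = []
        self.selected = frozenset()

    def register(self, listener):
        self._listeners.append(listener)

    def select(self, record_ids):
        self.selected = frozenset(record_ids)
        for listener in self._listeners:
            listener(self.selected)


class View:
    """A stand-in for one visualization of one experimental data set."""

    def __init__(self, name, data, selection):
        self.name = name
        self.data = data          # record ID -> what this view displays
        self.highlighted = {}
        selection.register(self.on_selection)

    def on_selection(self, record_ids):
        # Highlight only the selected records that this view actually holds.
        self.highlighted = {rid: self.data[rid]
                            for rid in record_ids if rid in self.data}


# Two hypothetical data sets from separate experimental regimens.
selection = SharedSelection()
assay_view = View("assay", {"c1": 0.9, "c2": 0.1, "c3": 0.8}, selection)
toxicity_view = View("toxicity", {"c1": "low", "c4": "high"}, selection)

# Selecting in one place updates every linked view.
selection.select({"c1", "c3"})
```

After the call to `select`, `assay_view.highlighted` holds both selected records, while `toxicity_view.highlighted` holds only `c1`, the one record the two data sets share.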
Given that data exploration is necessary in the
first place because the volume of data is too large to
assimilate at once, the key features of the
visualization methods are speed and progressive
disclosure. Speed is essential since iterative
analyses are necessary. Progressive disclosure is a
specific type of iteration that is needed frequently.
It goes beyond simply zooming in; it
needs to allow a finer resolution based on
comparison of a subset of data records. For
example, the relationships uncovered from a subset
may be driven by a very different set of attributes
than in a full data set comparison.
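
A small sketch can show why re-comparing a subset differs from zooming in. It ranks attributes by their variance, first over the full data set and then over a selected subset. Variance is used here only as a simple stand-in for "the attributes that drive the relationships". The attribute names and values are invented, and are assumed to be on comparable scales.

```python
from statistics import pvariance

def attribute_spread(records, names):
    """Variance of each attribute over the given records, largest first."""
    columns = zip(*records)
    spread = {name: pvariance(col) for name, col in zip(names, columns)}
    return sorted(spread.items(), key=lambda item: item[1], reverse=True)

def select(records, predicate):
    """Subset of records chosen by the analyst for a closer look."""
    return [r for r in records if predicate(r)]

# Hypothetical attributes and records.
names = ["potency", "solubility", "logp"]
records = [
    (0.1, 0.50, 0.3),
    (0.2, 0.52, 0.3),
    (0.1, 0.48, 0.4),
    (9.0, 0.10, 0.3),
    (9.1, 0.90, 0.4),
    (9.0, 0.50, 0.3),
]

full_ranking = attribute_spread(records, names)

# Progressive disclosure: re-analyze only the high-potency records.
subset = select(records, lambda r: r[0] > 5.0)
subset_ranking = attribute_spread(subset, names)
```

Over the full set, `potency` separates the records most strongly. Within the high-potency subset it is nearly constant and `solubility` becomes the dominant attribute, so the subset has to be re-compared and cannot simply be magnified.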
Finally, recognizing that no exploratory data
analysis package can do everything, the
visualizations and tools need to provide easy access
to external databases and analytical methods. For
example, in the bioinformatics realm, the collection
of public domain tools is enormous; rather than
attempting to duplicate these, all that is necessary is
easy export of data from a visualization into these
tools and vice versa.
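
As a rough illustration of export and re-import, the sketch below writes selected records to CSV text and reads them back. CSV is an assumption chosen because most external tools accept it, and the source does not name a format. The field names and records are hypothetical.

```python
import csv
import io

def export_records(records, fieldnames):
    """Serialize selected records as CSV text for use in an external tool."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def import_records(text):
    """Read CSV text produced elsewhere back into a list of records."""
    return list(csv.DictReader(io.StringIO(text)))

# Hypothetical records selected in a visualization.
selected = [
    {"id": "c1", "score": 0.9},
    {"id": "c3", "score": 0.8},
]
exported = export_records(selected, fieldnames=["id", "score"])
round_trip = import_records(exported)
```

Note that CSV carries no type information, so numeric values come back as strings and must be converted on import.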
DATA OVERVIEW VISUALIZATIONS
As noted above, complementary data overviews are
needed to address different aspects of a large data
set. We classify these overviews into four types:
• overviews of the data itself,
• overviews of the relationship of each
data record to every other record,
• overviews of the associations within
the data set, and

http://www.beilstein-institut.de/bozen2000/proceedings/saffer/saffer.pdf

Beilstein-Institut


Saffer, Jeffrey D.; Albright, Cory L.; Calapristi, Augustin J.; Chen, Guang; Crow, Vernon L.; Decker, Scott D. et al. Visualization and Integrated Data Mining of Disparate Information, article, May 11, 2001; Richland, Washington. (https://digital.library.unt.edu/ark:/67531/metadc1417445/m1/3/: accessed May 29, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.
