Visualization and Integrated Data Mining of Disparate Information
This article is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to UNT Digital Library by the UNT Libraries Government Documents Department.
Chemical Data Analysis in the Large, May 22nd - 26th 2000, Bozen, Italy
DATA VISUALIZATION - BASIC CONCEPTS
Exploratory data analysis requires a framework in
which
1. the data can be organized along the lines
of interest to the analyst and
2. a collection of tools is available for
pursuing specific inquiries.
For both, the methods need to handle large volumes
of data, with reasonable speed, and provide linkage
among complementary views and to other tools.
Presenting data in an organized fashion requires
appropriate data overviews, especially those that
allow inference by comparison. For this, we have
adopted visualization methods since they offer
unequalled facility in presenting large volumes of
data. In addition, the structure within a well-
designed visualization can suggest relationships
that might otherwise be overlooked. In that regard,
it should be clear that data visualization methods
assist, but cannot replace, the analyst.
A key component of this approach is to use all the
relevant attributes simultaneously for deriving the
comparisons. With very large data sets, such as
those from high-throughput screening, the analyst
cannot assess the overall behavior of the data
records by examining a few columns at a time.
Selecting a subset of attributes for comparison
can be useful for testing specific hypotheses, but
does not facilitate discovery of the unexpected.
Hence, cluster-based methods that
utilize all the appropriate data attributes
simultaneously are preferred.
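
The contrast between comparing on a few columns and comparing on all attributes can be sketched as follows. This is a minimal illustration, not the method used by the authors: the records, the threshold, and the choice of single-linkage grouping on Euclidean distance are all assumptions made for the example, and the attributes are assumed to be on a common scale already.

```python
import math


def distance(a, b, attrs=None):
    """Euclidean distance over the chosen attribute indices (all by default)."""
    idx = range(len(a)) if attrs is None else attrs
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in idx))


def single_linkage(rows, threshold, attrs=None):
    """Merge clusters while any cross-cluster pair is closer than threshold."""
    clusters = [[i] for i in range(len(rows))]
    while True:
        # Find the first pair of clusters that contains a close pair of records.
        pair = next(
            ((x, y)
             for x in range(len(clusters))
             for y in range(x + 1, len(clusters))
             if any(distance(rows[i], rows[j], attrs) < threshold
                    for i in clusters[x] for j in clusters[y])),
            None,
        )
        if pair is None:
            return sorted(sorted(c) for c in clusters)
        x, y = pair
        clusters[x].extend(clusters.pop(y))


# Hypothetical records with four attributes each.
records = [
    [1.0, 1.0, 1.0, 1.0],
    [1.2, 0.8, 1.2, 0.8],
    [0.8, 1.2, 9.0, 9.0],
    [1.0, 1.0, 8.8, 9.2],
    [9.0, 9.0, 1.0, 9.0],
    [9.2, 8.8, 0.8, 9.2],
]

two_columns = single_linkage(records, 2.0, attrs=(0, 1))
all_columns = single_linkage(records, 2.0)
print(two_columns)  # [[0, 1, 2, 3], [4, 5]]
print(all_columns)  # [[0, 1], [2, 3], [4, 5]]
```

Viewed through the first two columns only, records 0 to 3 appear to form one group; using all four attributes separates them into two groups that the partial view hides.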
Even with mathematical methods that use all the
data, no single visualization method can convey all
of the information likely to be needed by the
analyst and several complementary approaches are
necessary. In that spirit, these should not be viewed
as stand-alone entities, but linked together for
continuity in data analysis. This becomes
particularly important in an integrated analysis
across different experimental data sets, for example,
where distinct visualizations are used to organize
the data from separate experimental regimens. The
data overviews also need to be supported by
complementary tools that provide access to and, in
many cases, visualization of the details of the data.
Easy access to these tools is the foundation for
progressing from visualization to data mining.
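
One way to picture linkage among complementary views is a selection that is shared by every view, so that picking records in one overview highlights the same records in the others. The sketch below is illustrative only: the class names `SharedSelection` and `View`, the record identifiers, and the two example data sets are hypothetical and are not taken from the software described here.

```python
class SharedSelection:
    """Holds the selected record IDs and notifies every linked view."""

    def __init__(self):
        self._selected = set()
        self._views = []

    def link(self, view):
        self._views.append(view)

    def select(self, record_ids):
        self._selected = set(record_ids)
        for view in self._views:
            view.on_selection(self._selected)


class View:
    """A view over one data set, keyed by record ID."""

    def __init__(self, name, records):
        self.name = name
        self.records = records      # record ID -> value shown in this view
        self.highlighted = {}

    def on_selection(self, selected):
        # Highlight only the selected records that this view actually holds.
        self.highlighted = {
            rid: value for rid, value in self.records.items() if rid in selected
        }


# Two hypothetical views over separate experimental data sets.
screening = View("screening", {"c1": 0.2, "c2": 0.9, "c3": 0.8})
expression = View("expression", {"c2": 3.1, "c4": 1.0})

selection = SharedSelection()
selection.link(screening)
selection.link(expression)
selection.select({"c2", "c3"})

print(screening.highlighted)   # {'c2': 0.9, 'c3': 0.8}
print(expression.highlighted)  # {'c2': 3.1}
```

Keeping the selection outside any single view is a design choice that lets views over different experimental regimens stay independent while still responding to the same analyst action.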
Since data exploration is necessary in the first
place because the volume of data is too large to
assimilate at once, the key features of the
visualization methods are speed and progressive
disclosure. Speed is essential since iterative
analyses are necessary. Progressive disclosure is a
specific type of iteration that is needed frequently.
It goes beyond simply zooming in: it needs to
allow a finer resolution based on comparison of
only a subset of data records. For
example, the relationships uncovered from a subset
may be driven by a very different set of attributes
than in a full data set comparison.
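
The point that a subset can be driven by different attributes than the full data set can be illustrated with a small sketch. It assumes, purely for illustration, that per-attribute variance is an adequate stand-in for "what drives the comparison"; the data values and the function name `rank_attributes_by_variance` are hypothetical.

```python
from statistics import pvariance


def rank_attributes_by_variance(rows):
    """Return attribute (column) indices, most variable first."""
    columns = list(zip(*rows))
    spread = [pvariance(col) for col in columns]
    return sorted(range(len(columns)), key=lambda i: spread[i], reverse=True)


# Hypothetical data: attribute 0 splits the full set into two groups.
full = [
    [1.0, 5.0, 0.1],
    [1.1, 5.0, 0.9],
    [0.9, 5.0, 0.5],
    [9.0, 5.0, 0.4],
    [9.1, 5.0, 0.6],
    [8.9, 5.0, 0.5],
]

# Progressive disclosure: keep one group and compare again within it.
subset = [row for row in full if row[0] < 5.0]

print(rank_attributes_by_variance(full))    # [0, 2, 1]
print(rank_attributes_by_variance(subset))  # [2, 0, 1]
```

In the full set attribute 0 dominates, but once the analysis is restricted to one group, attribute 2 becomes the main source of variation, which simple zooming on the original view would not reveal.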
Finally, recognizing that no exploratory data
analysis package can do everything, the
visualizations and tools need to provide easy access
to external databases and analytical methods. For
example, in the bioinformatics realm, the collection
of public domain tools is enormous; rather than
attempting to duplicate these, all that is necessary is
easy export of data from a visualization into these
tools and vice versa.
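
A round trip of selected records to and from an external tool might look like the sketch below. It assumes a tab-delimited text exchange format, which is a common but here hypothetical choice; the field names and records are invented for the example, and a real external tool may expect a different format.

```python
import csv
import io


def export_records(records, fieldnames):
    """Serialize the selected records as tab-delimited text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def import_records(text):
    """Read tab-delimited text back into a list of records (values as strings)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Hypothetical records selected in a visualization.
selected = [
    {"id": "rec1", "sequence": "ATGC"},
    {"id": "rec2", "sequence": "GGCA"},
]

exported = export_records(selected, ["id", "sequence"])
restored = import_records(exported)
print(exported)
```

All values come back as strings after the round trip, so numeric attributes would need to be converted again on import.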
DATA OVERVIEW VISUALIZATIONS
As noted above, complementary data overviews are
needed to address different aspects of a large data
set. We classify these overviews into four types:
• overviews of the data itself,
• overviews of the relationship of each
  data record to every other record,
• overviews of the associations within
  the data set, and
http://www.beilstein-institut.de/bozen2000/proceedings/saffer/saffer.pdf
Beilstein-Institut
Saffer, Jeffrey D.; Albright, Cory L.; Calapristi, Augustin J.; Chen, Guang; Crow, Vernon L.; Decker, Scott D. et al. Visualization and Integrated Data Mining of Disparate Information, article, May 11, 2001; Richland, Washington. (https://digital.library.unt.edu/ark:/67531/metadc1417445/m1/3/: accessed May 29, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.