Two-stage Framework for a Topology-Based Projection and Visualization of Classified Document Collections
Description: Over the last decades, electronic textual information has become the world's largest and most important information source available. People have added a variety of daily newspapers, books, scientific and governmental publications, blogs and private messages to this wellspring of endless information and knowledge. Since neither the existing nor the new information can be read in its entirety, computers are used to extract and visualize meaningful or interesting topics and documents from this huge information clutter. In this paper, we extend, improve and combine existing individual approaches into an overall framework that supports topological analysis of high-dimensional document point clouds given by the well-known tf-idf document-term weighting method. We show that traditional distance-based approaches fail in very high-dimensional spaces, and we describe an improved two-stage method for topology-based projections from the original high-dimensional information space to both two-dimensional (2-D) and three-dimensional (3-D) visualizations. To show the accuracy and usability of this framework, we compare it to methods introduced recently and apply it to complex document and patent collections.
Date: July 19, 2010
Creator: Oesterling, Patrick; Scheuermann, Gerik; Teresniak, Sven; Heyer, Gerhard; Koch, Steffen; Ertl, Thomas et al.
Item Type: Article
Partner: UNT Libraries Government Documents Department