Description: Document representation is important for computer-based text processing. Good document representations must include at least the most salient concepts of the document. Documents exist in a multidimensional space that difficult the identification of what concepts to include. A current problem is to measure the effectiveness of the different strategies that have been proposed to accomplish this task. As a contribution towards this goal, this dissertation studied the visual inter-document relationship in a dimensionally reduced space. The same treatment was done on full text and on three document representations. Two of the representations were based on the assumption that the salient features in a document set follow the chi-distribution in the whole document set. The third document representation identified features through a novel method. A Coefficient of Variability was calculated by normalizing the Cartesian distance of the discriminating value in the relevant and the non-relevant document subsets. Also, the local dictionary method was used. Cosine similarity values measured the inter-document distance in the information space and formed a matrix to serve as input to the Multi-Dimensional Scale (MDS) procedure. A Precision-Recall procedure was averaged across all treatments to statistically compare them. Treatments were not found to be statistically the same and the null hypotheses were rejected.
Date: May 2000
Creator: Oyarce, Guillermo Alfredo