Toward a Unified Retrieval Outcome Analysis Framework for Cross-Language Information Retrieval
This paper is part of the collection entitled: UNT Scholarly Works and was provided to UNT Digital Library by the UNT College of Information.
ASIST 2005 Contributed Paper - Jiangping Chen
Table 5. Summary of Query Translation using the LKB and the LDC Dictionary

                                    lkb_tdn         ldc_tdn
Total Terms Evaluated               1538            1610
Number of Correct Translations      1204 (78.3%)    1185 (73.6%)
Number of Incorrect Translations    260 (16.9%)     282 (17.5%)
Number of Missing Translations      74 (4.8%)       143 (8.9%)
Individual Query Analysis
In this component, the researcher was interested in what contributed to the good performance of the LKB on
certain queries, and what caused its failure on others. The first factor considered was translation
effectiveness. The analysis above revealed that EC-CLIR using the LKB achieved better retrieval performance
than EC-CLIR using the LDC dictionary, and that query translation using the LKB was better as well. On the
surface, one would expect a correlation between the difference in EC-CLIR performance and the difference in
the percentages of correct, incorrect, and missing translations. However, this did not hold for the queries
tested in this study. A correlation analysis using Spearman's rho found that the difference in average
precision between lkb_tdn and ldc_tdn had no correlation with the difference in the percentage of correct,
incorrect, or missing translations.³
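The correlation test described above can be sketched in a few lines of Python. The sketch below computes Spearman's rho from ranks using only the standard library; the per-topic numbers are illustrative placeholders, not the study's actual data.

```python
# Sketch of a Spearman's rho test: Pearson correlation of rank vectors,
# using only the standard library. The per-topic data are illustrative
# placeholders, not the values from the study.
from statistics import mean

def ranks(values):
    """Assign 1-based average ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the tied rank positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Illustrative per-topic differences (lkb_tdn minus ldc_tdn):
ap_diff = [0.05, -0.02, 0.10, 0.01, -0.07, 0.03]          # average precision
correct_pct_diff = [4.7, -1.2, 0.5, 6.3, -3.1, 2.0]       # % correct translations

print(round(spearman_rho(ap_diff, correct_pct_diff), 3))  # prints 0.486
```

A rho near zero, as the study found, would indicate no monotone association between the two difference series.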
The researcher then examined two types of topics, Hard & Stable and Hard & Unstable, to explore the reasons
behind the above results and other major factors affecting system performance. Analysis of Hard & Stable
topics may reveal the causes of generally hard topics, and analysis of Hard & Unstable topics may suggest
ways to improve the performance of the CLIR system using the LKB.
Eight topics fell into the Hard & Stable category, with an average precision score lower than 0.17 in all
four runs: topics 1, 5, 6, 13, 14, 18, 34, and 46. These queries were resistant to translation errors; the
query translation results had little effect on their retrieval performance. Among them, topics 1, 5, 14, and
18 had also proved difficult for TREC-5 monolingual participating systems, with median average precision
lower than 0.15. To investigate the reasons, the top 10 documents returned by the best-performing of the
four runs were examined. Table 6 presents characteristics of these topics, including the run that returned
the highest average precision (AP), the value of that AP, the query length, the number of relevant
documents, and the number of relevant documents returned in the top 10 by that run. For most of the Hard &
Stable topics, very few relevant documents appeared in the top 10.
Table 6. Hard & Stable Topics

Topic  Run returning   Highest   Original English query  # of relevant  # of relevant
ID     the highest AP  AP score  length (in words)       documents      documents in top 10
1      mono_tdn        0.1502    56                      13             1
5      ldc_tdn         0.1069    70                      28             3
6      mono_tdn        0.1325    37                      77             4
13     ldc_tdn         0.0787    36                      110            0
14     mono_tdn        0.0558    45                      57             2
18     lkb_tdn         0.1214    93                      102            1
34     ldc_tdn         0.1632    65                      95             5
46     mono_tdn        0.1443    68                      166            6
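Average precision, the metric reported in Table 6, rewards systems that rank relevant documents near the top of the list. A minimal sketch of the standard computation follows; the ranked list and relevance judgments are illustrative, not drawn from the study's data.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k over the ranks k where a relevant document appears,
    divided by the total number of judged-relevant documents.

    `ranked_ids` is the system's ranked list of document IDs;
    `relevant_ids` is the set of IDs judged relevant for the topic.
    """
    relevant_ids = set(relevant_ids)
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / k  # precision at this rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

# Illustrative: relevant documents retrieved at ranks 1 and 4,
# out of three judged relevant in total.
ap = average_precision(["d1", "d7", "d3", "d9"], {"d1", "d9", "d5"})
print(round(ap, 3))  # (1/1 + 2/4) / 3 = 0.5
```

Because each hit is weighted by its rank, a topic with relevant documents buried deep in the ranking, as with most Hard & Stable topics here, yields a low AP even when some relevant documents are eventually retrieved.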
A manual inspection of the top 10 retrieved documents (both relevant and irrelevant) for each topic was
conducted. Table 7 summarizes the observations from comparing the relevant and irrelevant documents in the
top 10 sets. Most of the 8 topics appear to need a retrieval strategy different from the traditional tf-idf
IR model applied by the system. For example, the query from topic 6, "International support of
³ The values of Spearman's rho between the difference in average precision between lkb_tdn and ldc_tdn and the
differences in the percentages of correct, incorrect, and missing translations are 0.144, -0.082, and -0.018,
respectively. None is significant.
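The traditional tf-idf model mentioned above scores a document by term frequency discounted by how common each term is across the collection. A textbook sketch follows; the toy corpus is illustrative, and this is not the exact weighting scheme used by the paper's system.

```python
import math
from collections import Counter

def tfidf_score(query_terms, doc_terms, corpus):
    """Score one tokenized document against a query with basic tf-idf.

    tf  = raw term count in the document;
    idf = log(N / df), where N is the corpus size and df is the number
    of documents containing the term. A textbook sketch, not the exact
    weighting used by the study's retrieval system.
    """
    n_docs = len(corpus)
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue  # a term absent from the corpus contributes nothing
        score += tf[term] * math.log(n_docs / df)
    return score

# Illustrative toy corpus of three tokenized documents.
corpus = [
    ["tigers", "asia", "conservation"],
    ["asia", "economy", "trade"],
    ["tigers", "habitat", "loss"],
]
query = ["tigers", "asia"]
scores = [tfidf_score(query, doc, corpus) for doc in corpus]
best = max(range(len(corpus)), key=lambda i: scores[i])  # doc 0 matches both terms
```

A model like this ranks purely on term overlap, which is why topics whose relevance hinges on concepts not stated literally in the query can defeat it regardless of translation quality.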