ASIST 2005 Contributed Paper - Jiangping Chen
Toward a Unified Retrieval Outcome Analysis Framework for Cross-
Language Information Retrieval
Jiangping Chen
School of Library and Information Sciences, University of North Texas, Denton, P. O. Box 311068, TX
76203, jpchen@unt.edu
Abstract
This paper proposes a Retrieval Outcome Analysis Framework, or ROA Framework, to systematically
evaluate retrieval performance of Cross-Language Information Retrieval systems. The ROA framework
goes beyond TREC-type retrieval evaluation methodology by including procedures focusing on individual
queries, especially difficult queries. The framework comprises four interrelated components: (1)
Overall System Performance Evaluation, (2) Query Categorization, (3) Translation Analysis, and (4)
Individual Query Analysis. An example of applying the framework is discussed in detail. The author
believes the proposed framework would be especially useful for the development of real-world Cross-
Language Information Retrieval systems because the evaluation guided by the framework has the
potential to discover causes behind poor retrieval performance.
Introduction
Cross-Language Information Retrieval (CLIR) is a special case of Information Retrieval (IR). It explores
solutions to finding relevant documents in a collection of documents written in a different language or languages
from users' queries. A CLIR system often behaves quite differently in response to different queries: The system
retrieves relevant documents or web pages as top-ranked ones for some queries, but it fails to find any relevant
documents, or ranks them very low, for some other queries. In the latter case, the users either cannot obtain the
needed information, or they have to study the long list of returned documents to locate what they want.
CLIR evaluation is an essential part of CLIR system design and development. A well-designed evaluation
guided by sound methodology should be able to identify the strengths and the weaknesses of the system,
especially the causes of unsatisfactory retrieval performance in response to certain queries, and to provide
evidence for system improvement. However, current CLIR evaluation focuses more on the average performance
over multiple topics than on individual topics, just like monolingual IR system evaluation, as Hu, Bandhakavi, and Zhai
(2003) have pointed out. Few systems or researchers have performed systematic, in-depth analysis on individual
queries or topics. In particular, researchers have paid little attention to those difficult queries or topics for which
relevant documents or answers are not found or are ranked very low by IR systems or CLIR systems.
Consequently, little is known about why some queries are more difficult than others. Current IR evaluation as
conducted by TREC (http://trec.nist.gov/) may help the system to improve overall performance, but produces a
limited effect on certain difficult queries because current TREC evaluations lack methods for performing in-depth
retrieval analysis.
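The contrast between average and per-query evaluation can be illustrated with a minimal sketch. It assumes average precision as the per-query measure (a standard TREC-style metric; the paper does not prescribe one here), and the names `average_precision`, `per_query_report`, the toy data, and the 0.1 "difficult query" threshold are all hypothetical choices made for illustration only.

```python
from statistics import mean


def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query.

    Sums precision at each rank where a relevant document appears and
    divides by the total number of relevant documents for the query.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def per_query_report(runs, qrels, threshold=0.1):
    """Return (mean score, per-query scores, ids of low-scoring queries).

    The threshold is an illustrative assumption, not a value from the paper.
    """
    scores = {
        qid: average_precision(ranking, qrels.get(qid, ()))
        for qid, ranking in runs.items()
    }
    difficult = sorted(qid for qid, ap in scores.items() if ap < threshold)
    return mean(scores.values()), scores, difficult


# Hypothetical toy data: ranked results per query and relevance judgments.
runs = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d9", "d8", "d4"],
    "q3": ["d7", "d6", "d5"],
}
qrels = {"q1": {"d1", "d2"}, "q2": {"d4"}, "q3": {"d0"}}

map_score, scores, difficult = per_query_report(runs, qrels)
print(f"mean over queries: {map_score:.3f}")
print("per-query scores:", scores)
print("queries needing individual analysis:", difficult)
```

In this toy run the single averaged figure conceals that one query retrieves nothing relevant at all, which is the kind of case that analysis at the individual query level is meant to surface.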
The researcher believes that it is necessary to explore methodological issues of conducting analysis at
individual query level in order to understand the causes behind IR system performance. The investigation would
benefit IR systems, especially real-world information access and retrieval systems, by allowing system designers
to adjust their retrieval and user interaction strategies to provide better service for their users. In this paper, the
author introduces a concept called Retrieval Outcome Analysis (ROA). ROA refers to a series of analytical
procedures which systematically evaluate information retrieval on individual queries. In contrast to the traditional,
TREC-like IR system evaluation paradigm, ROA focuses on exploring the causes behind retrieval performance on
individual queries. A well-designed ROA should provide more evidence to explain why a system performs well on
certain topics and why it does poorly on some others, not just precision and recall scores.
In order to demonstrate the usefulness of the ROA and the procedures involved in it, the author proposes an
ROA framework as a methodology for CLIR system evaluation. The ROA framework that is built upon the ROA
concept will be presented and illustrated in the remaining part of this paper: The next section, "Related Research,"
reviews current IR system evaluation strategies and studies that have contributed to IR or CLIR performance
analysis methodologies. The following section presents the ROA framework for CLIR. The fourth section provides
Chen, Jiangping. Toward a Unified Retrieval Outcome Analysis Framework for Cross-Language Information Retrieval, paper, 2005; (https://digital.library.unt.edu/ark:/67531/metadc132969/m1/1/?rotate=90: accessed April 25, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT College of Information.