UNT at ImageCLEF 2010: CLIR for Wikipedia Images
Data Collection preparation:
For our experiments we translated all the French and German texts in the captions
associated with the images into English using the Google Translation service. This
translation was added to the caption using a new field that was indexed together with
the original English caption (if it was available).
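
As a rough illustration of this preparation step, the sketch below attaches the English translations of non-English captions to an image record as a separate field. The record layout, the field names (`caption_en`, `caption_translated`) and the `translate` callable are assumptions made for the example; the paper does not describe the actual data format or how the Google Translation service was called.

```python
from typing import Callable, Dict


def add_translated_field(captions: Dict[str, str],
                         translate: Callable[[str, str], str]) -> Dict[str, str]:
    """Build the fields to index for one image from its per-language captions.

    `captions` maps a language code ("en", "fr", "de") to the caption text.
    `translate(text, source_lang)` is a hypothetical stand-in for a
    translation service; it is not a real API call.
    """
    fields: Dict[str, str] = {}
    # Keep the original English caption only if it is available.
    if captions.get("en"):
        fields["caption_en"] = captions["en"]
    # Translate each non-English caption and gather the results in one new field.
    translated = [translate(text, lang)
                  for lang, text in sorted(captions.items())
                  if lang != "en" and text]
    if translated:
        fields["caption_translated"] = " ".join(translated)
    return fields
```

Both fields would then be indexed together, so a query in English can match either the original or the translated caption text.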
For our runs we used just one language at a time from the three provided in the
original ImageCLEFwiki topics and built the queries automatically using a simple
strategy that converted all the words of a topic into a "#combine" statement in Lemur. We
used the English topics first as our baseline. A second run, with French topics that were
translated into English, was created to measure the effect of query translation. We also
asked two members of our group to use the Indri Query Language to create
manual queries that could take advantage of the more advanced operators in Lemur.
For this purpose we gave these users access to the
Indri web search engine (which is based on Lemur) and asked them to conduct
searches with the system until they were satisfied with the results retrieved.
Each user learned the syntax of the Indri Query Language and then created queries
that tried to use the capabilities of the query language. For example, the procedure
that our first user followed to build the queries is described below:
All query statements used to perform manual image retrieval were built based on
the Indri Query Language. The user developed all seventy manual query statements
using the following methods:
1. The user tried different combinations of keywords to retrieve images from a
2. Based on the returned images, the user refined the query statements based on the
a. Incorporate those observable objects within images that can match the
question topics into the query keywords using Indri Query Language
operators such as #combine and #filreq.
b. Reject those observable objects within images that cannot match the
question topics using Indri Query Language operators such as #filrej.
c. The user reviewed the first 50 images returned and reiterated the two
steps mentioned above until the precision of the first 50 images
reached at least 80%.
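
The stopping rule in step c can be sketched as a precision check over the first 50 results. This is only an illustration: the relevance judgments were made by the user by looking at the images, and the function names, as well as the choice to divide by the number of results actually inspected when fewer than 50 are returned, are assumptions of this example.

```python
from typing import Sequence


def precision_at_k(relevant_flags: Sequence[bool], k: int = 50) -> float:
    """Fraction of the first k returned images that were judged relevant."""
    top = list(relevant_flags[:k])
    if not top:
        return 0.0
    # Assumption: divide by the number of inspected results, not always by k.
    return sum(top) / len(top)


def should_stop(relevant_flags: Sequence[bool],
                k: int = 50, threshold: float = 0.8) -> bool:
    """True once the refined query reaches the target precision."""
    return precision_at_k(relevant_flags, k) >= threshold
```

With 40 relevant images among the first 50, `should_stop` returns `True`; with 39 the user would refine the query again.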
For example, for topic number seven, in order to find most images representing
"striking lightning in the sky", the user applied the method in 2a to incorporate
all potential keywords that could imply "striking lightning in the sky", such as lightning,
day, night, strike, struck, striking, and sky. The user also rejected the keyword
"fighter" using the method in 2b so that images of the "Lightning" fighter aircraft would
not be selected for this topic. The final query submitted in the official run for this
topic was:
#filrej(fighter #filreq(lightning #combine(day night strike struck striking sky)))
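
The way such query strings nest can be sketched with a few string helpers. The helper names below are hypothetical and only assemble text in Indri Query Language syntax; they do not call Lemur or Indri. Splitting a topic title on whitespace is likewise an assumption about how the automatic "#combine" queries were built.

```python
from typing import Iterable


def combine(words: Iterable[str]) -> str:
    """Wrap a list of terms in a #combine statement."""
    return "#combine(" + " ".join(words) + ")"


def filreq(required: str, query: str) -> str:
    """Score with `query` only the documents that match `required`."""
    return f"#filreq({required} {query})"


def filrej(rejected: str, query: str) -> str:
    """Score with `query` only the documents that do not match `rejected`."""
    return f"#filrej({rejected} {query})"


def automatic_query(topic_title: str) -> str:
    """Turn every word of a topic title into a single #combine statement."""
    return combine(topic_title.split())


# Rebuild the manual query for topic seven from the inside out.
manual_query = filrej(
    "fighter",
    filreq("lightning",
           combine(["day", "night", "strike", "struck", "striking", "sky"])),
)
print(manual_query)
```

Read from the inside out, the #combine statement ranks the images, #filreq restricts the ranking to captions that mention "lightning", and #filrej drops any caption that mentions "fighter".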
Ruiz, Miguel E.; Chen, Jiangping; Pasupathy, Karthikeyan; Chin, Pok & Knudson, Ryan. UNT at ImageCLEF 2010: CLIR for Wikipedia Images, paper, September 2010; (https://digital.library.unt.edu/ark:/67531/metadc96836/m1/3/: accessed May 25, 2019), University of North Texas Libraries, Digital Library, https://digital.library.unt.edu; crediting UNT College of Information.