Improving the search on the Internet by using
WordNet and lexical operators
Dan I. Moldovan and Rada Mihalcea
Department of Computer Science and Engineering
Southern Methodist University
Dallas, Texas, 75275-0122
July 21, 1999
Not for distribution or attribution. For review purposes only.
This paper presents a natural language interface system to an Internet
search engine that provides the following improvements: (1) accepts natural
language (English) questions, (2) expands the query, based on a word sense
disambiguation method, and (3) uses a new lexical operator to post-process
the documents retrieved for extracting only the part of a document that is
relevant to a query. The system was tested on 100 queries, of which 50 were
adopted from the TIPSTER topics collection provided at the 6th Text
Retrieval Conference (TREC-6), and 50 were selected from among the queries
submitted by users to an existing Web search engine. The results obtained
demonstrate a substantial increase in both the precision and the percentage
of queries answered correctly, while the amount of text presented to the user
is reduced in comparison with the current Internet search engine technology.
A vast amount of information is available on the Internet, and naturally, many information
gathering tools have been developed. Search engines with different characteristics, such as
AltaVista, Lycos, Infoseek, and others are available. However, there are inherent difficulties
associated with the task of retrieving information on the Internet: (1) Web information
is diverse and highly unstructured, and (2) the volume of information is large and grows at an
exponential rate. While these two issues are profound and require long-term solutions, it is
still possible to develop software around the existing search engines to improve the quality
of the information retrieved.
A main problem with current search engines is that broad, general queries retrieve a
large volume of documents, while specific, narrow questions often fail to retrieve any
documents [Selberg and Etzioni 1995], [Zorn, Emanoil et al. 1996]. Many of the documents
retrieved for general queries are irrelevant, and many relevant documents are missed
because the query does not contain the keywords that index those documents. Some queries
formulated with restrictive boolean operators lead to the right documents, but more
often such queries are too restrictive and retrieve no documents.
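The trade-off described above can be illustrated with a minimal sketch. It is not the system presented in this paper: the toy corpus, the query, the relevance judgments, and the helper names (search_and, search_or, precision) are all assumptions invented for illustration. The sketch shows a restrictive boolean query (all terms required) retrieving nothing, while a broad query (any term suffices) retrieves an irrelevant document along with the relevant ones.

```python
# Toy corpus, query, and relevance judgments: all invented for illustration.
DOCS = {
    1: "cheap flights to dallas and hotel deals",
    2: "dallas weather forecast for the weekend",
    3: "history of aviation and early flights",
    4: "airline tickets to texas at low prices",
}
QUERY = ["cheap", "airline", "tickets", "dallas"]
RELEVANT = {1, 4}  # assumed judgments: documents about inexpensive air travel


def tokens(text):
    """Split a document into a set of lowercase keywords."""
    return set(text.lower().split())


def search_and(terms):
    """Restrictive boolean query: every term must occur in the document."""
    return sorted(d for d, text in DOCS.items() if set(terms) <= tokens(text))


def search_or(terms):
    """Broad query: a single matching term is enough."""
    return sorted(d for d, text in DOCS.items() if set(terms) & tokens(text))


def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant (0.0 if none retrieved)."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)


if __name__ == "__main__":
    for name, search in (("AND", search_and), ("OR", search_or)):
        hits = search(QUERY)
        print(name, hits, round(precision(hits, RELEVANT), 2))
```

In this toy setting, document 4 is relevant but is missed by the restrictive query because it says "texas" and "low prices" rather than "dallas" and "cheap", which is the keyword-mismatch problem noted above.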
Moldovan, Dan I. & Mihalcea, Rada, 1974-. Improving the Search on the Internet by Using WordNet and Lexical Operators, article, July 21, 1999; [New York, New York]. (digital.library.unt.edu/ark:/67531/metadc83306/m1/1/: accessed June 24, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT College of Engineering.