Automatic generation of a coarse grained WordNet
Date: June 2001
Creator: Mihalcea, Rada & Moldovan, Dan
Description: This paper discusses automatic generation of a coarse grained WordNet. Abstract: Several principles for the automatic transformation of WordNet into a coarser grained dictionary are proposed. A new version of WordNet is derived, leading to a reduction of 26% in the average polysemy of words, while introducing a small error rate of 2.1%, as measured on a sense tagged corpus.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83310/
An Automatic Method for Generating Sense Tagged Corpora
Date: 1999
Creator: Mihalcea, Rada & Moldovan, Dan
Description: This paper discusses an automatic method for generating sense tagged corpora. Abstract: The unavailability of very large corpora with semantically disambiguated words is a major limitation in text processing research. For example, statistical methods for word sense disambiguation of free text are known to achieve high accuracy results when large corpora are available to develop context rules, to train and test them. This article presents a novel approach to automatically generate arbitrarily large corpora for word senses. The method is based on (1) the information provided in WordNet, used to formulate queries consisting of synonyms or definitions of word senses, and (2) the information gathered from the Internet using existing search engines. The method was tested on 120 word senses and a precision of 91% was observed.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83300/
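Note: the query-formulation step named in the abstract above (queries made of synonyms or definitions of a word sense) might look roughly like the following sketch. This is not the authors' implementation. The function name `build_queries`, the phrase-quoting convention, and the toy sense entry are all assumptions made for illustration, and no real WordNet or search engine API is called.

```python
def build_queries(synonyms, gloss):
    """Return candidate search queries for one word sense.

    Hypothetical helper: each synonym becomes a quoted phrase query,
    and the definition (gloss), if present, is added as a final
    phrase query.
    """
    queries = []
    for syn in synonyms:
        # Quote each synonym so a multiword synonym matches as a phrase.
        queries.append('"{}"'.format(syn))
    if gloss:
        # The definition serves as an additional phrase query.
        queries.append('"{}"'.format(gloss))
    return queries


# Toy sense entry, invented for illustration (not taken from WordNet).
toy_sense = {
    "synonyms": ["automobile", "motor car"],
    "gloss": "a motor vehicle with four wheels",
}

queries = build_queries(toy_sense["synonyms"], toy_sense["gloss"])
for q in queries:
    print(q)
```

In a fuller pipeline, each query would be sent to a search engine and the returned text kept as example contexts for that sense; that step is omitted here because the abstract does not specify it.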
Word Sense Disambiguation based on Semantic Density
Date: August 1998
Creator: Mihalcea, Rada & Moldovan, Dan
Description: This paper presents a Word Sense Disambiguation method based on the idea of semantic density between words. The disambiguation is done in the context of WordNet. The Internet is used as a raw corpus to provide statistical information for word associations. A metric is introduced and used to measure the semantic density and to rank all possible combinations of the senses of two words. This method provides a precision of 58% in indicating the correct sense for both words at the same time. The precision increases as we consider more choices: 70% for top two ranked and 73% for top three ranked.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83303/
A WordNet-Based Interface to Internet Search Engines
Date: May 1998
Creator: Moldovan, Dan & Mihalcea, Rada
Description: This paper discusses a WordNet-based interface to Internet search engines. A vast amount of information is available on the Internet, and naturally, many information gathering tools have been developed. Several search engines with different characteristics, such as Alta Vista, Lycos, Infoseek, and others are available. However, web information retrieval technology is still in its infancy, and there is a need for considerable improvement. Some inherent difficulties are: (1) the web information is diverse and highly unstructured, (2) the amount of information is large and it grows at an exponential rate, and (3) the current search engine technology is still rudimentary. While the first two issues are more profound and require long term solutions, it may be possible to develop software around the search engines to improve the quality of the information retrieved. In this paper the authors present a natural language interface system to a search engine and discuss some of the results obtained.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83305/
Semantic Indexing using WordNet Senses
Date: October 2000
Creator: Mihalcea, Rada & Moldovan, Dan
Description: In this paper, the authors describe a boolean Information Retrieval system that adds word semantics to the classic word based indexing. Two of the main tasks of the system, namely the indexing and retrieval components, use a combined word-based and sense-based approach. The key to the system is a methodology for building semantic representations of open text, at word and collocation level. This new technique, called semantic indexing, shows improved effectiveness over the classic word based indexing techniques.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83301/
Document Indexing using Named Entities
Date: January 2001
Creator: Mihalcea, Rada, 1974- & Moldovan, Dan I.
Description: This article discusses document indexing using named entities. Abstract: Current text indexing and retrieval techniques have their roots in the field of Information Retrieval where the task is to extract documents that best match a query. With an ever increasing number of documents available due to the easy access through the Internet, the challenge is to provide users with concise and relevant information. The authors propose here a novel, yet simple approach, which indexes the named entities in the documents, so as to improve the relevance of documents retrieved. Experiments performed in finding information related to a set of 75 input questions, from a large collection of 125,000 documents, show that this new technique reduces the number of retrieved documents by a factor of 2, while still retrieving the relevant documents.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83311/
eXtended WordNet: progress report
Date: June 2001
Creator: Mihalcea, Rada, 1974- & Moldovan, Dan I.
Description: This paper discusses eXtended WordNet. Abstract: eXtended WordNet (XWN), a morphologically and semantically enhanced version of the WordNet dictionary, is currently being built at SMU. There are several phases in the XWN project. This paper focuses on the semantic disambiguation stage of this project, and the preprocessing required by this stage.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83309/
Improving the search on the Internet by using WordNet and lexical operators
Date: July 21, 1999
Creator: Moldovan, Dan I. & Mihalcea, Rada, 1974-
Description: This article discusses improving the search on the internet by using WordNet and lexical operators. Abstract: This paper presents a natural language interface system to an Internet search engine that provides the following improvements: (1) accepts natural language (English) questions, (2) expands the query, based on a word sense disambiguation method, and (3) uses a new lexical operator to post-process the documents retrieved for extracting only the part of a document that is relevant to a query. The system was tested on 100 queries of which 50 were adopted from the TIPSTER topics collection, provided at the 6th Text Retrieval Conference (TREC-6) and 50 were selected from among the queries submitted by users to an existing Web search engine. The results obtained demonstrate a substantial increase in both the precision and the percentage of queries answered correctly, while the amount of text presented to the user is reduced in comparison with the current Internet search engine technology.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83306/
An Iterative Approach to Word Sense Disambiguation
Date: May 2000
Creator: Mihalcea, Rada, 1974- & Moldovan, Dan I.
Description: This paper discusses an iterative approach to Word Sense Disambiguation. Abstract: In this paper, we present an iterative algorithm for Word Sense Disambiguation. It combines two sources of information: WordNet and a semantic tagged corpus, for the purpose of identifying the correct sense of the words in a given text. It differs from other standard approaches in that the disambiguation process is performed in an iterative manner: starting from free text, a set of disambiguated words is built, using various methods; new words are sense tagged based on their relation to the already disambiguated words, and then added to the set. This iterative process allows us to identify, in the original text, a set of words which can be disambiguated with high precision; 55% of the verbs and nouns are disambiguated with an accuracy of 92%.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83304/
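Note: the iterative process outlined in the abstract above (start from a set of already disambiguated words, tag new words by their relation to that set, add them, repeat) can be sketched as a simple fixed-point loop. This is only an illustration under stated assumptions, not the authors' algorithm: the function name, the sense labels, the `related` pairs, and the rule "accept the first candidate sense related to any already tagged sense" are all hypothetical.

```python
def iterative_disambiguate(words, seed_tags, candidate_senses, related):
    """Grow a set of sense-tagged words until no more can be added.

    words: list of words in the text
    seed_tags: dict word -> sense, words disambiguated up front
    candidate_senses: dict word -> list of possible senses
    related: set of (sense, sense) pairs treated as semantically related
    All of these structures are hypothetical stand-ins for WordNet
    relations and a sense-tagged corpus.
    """
    tagged = dict(seed_tags)
    changed = True
    while changed:
        changed = False
        for word in words:
            if word in tagged:
                continue
            for sense in candidate_senses.get(word, []):
                # Accept a sense if it is related to any sense already tagged.
                if any((sense, t) in related or (t, sense) in related
                       for t in tagged.values()):
                    tagged[word] = sense
                    changed = True
                    break
    return tagged


# Toy data, invented for illustration.
words = ["deposit", "bank", "walk"]
seed_tags = {"deposit": "deposit#money"}
candidate_senses = {
    "bank": ["bank#river", "bank#finance"],
    "walk": ["walk#move"],
}
related = {("bank#finance", "deposit#money")}

result = iterative_disambiguate(words, seed_tags, candidate_senses, related)
print(result)
```

Words with no sense related to the tagged set (here "walk") stay untagged, which mirrors the abstract's point that only a subset of the words is disambiguated, in exchange for higher precision on that subset.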
A Method for Word Sense Disambiguation of Unrestricted Text
Date: June 1999
Creator: Mihalcea, Rada, 1974- & Moldovan, Dan I.
Description: This paper discusses a method for word sense disambiguation of unrestricted text. Abstract: Selecting the most appropriate sense for an ambiguous word in a sentence is a central problem in Natural Language Processing. In this paper, the authors present a method that attempts to disambiguate all the nouns, verbs, adverbs and adjectives in a text, using the senses provided in WordNet. The senses are ranked using two sources of information: (1) the Internet for gathering statistics for word-word co-occurrences and (2) WordNet for measuring the semantic density for a pair of words. The authors report an average accuracy of 80% for the first ranked sense, and 91% for the first two ranked senses. Extensions of this method for larger windows of more than two words are considered.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83302/