Partner: UNT College of Engineering
Resource Type: Paper
Language: English
Collection: UNT Scholarly Works
A WordNet-Based Interface to Internet Search Engines
Date: May 1998
Creator: Moldovan, Dan & Mihalcea, Rada
Description: This paper discusses a WordNet-based interface to Internet search engines. A vast amount of information is available on the Internet, and many information-gathering tools have been developed to exploit it. Several search engines with different characteristics are available, such as Alta Vista, Lycos, and Infoseek. However, web information retrieval technology is still in its infancy and needs considerable improvement. Some inherent difficulties are: (1) web information is diverse and highly unstructured, (2) the volume of information is large and grows at an exponential rate, and (3) current search engine technology is still rudimentary. While the first two issues are more profound and require long-term solutions, it may be possible to build software around existing search engines to improve the quality of the information retrieved. In this paper, the authors present a natural language interface to a search engine and discuss some of the results obtained.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83305/
Word Sense Disambiguation based on Semantic Density
Date: August 1998
Creator: Mihalcea, Rada & Moldovan, Dan
Description: This paper presents a Word Sense Disambiguation method based on the idea of semantic density between words. The disambiguation is done in the context of WordNet. The Internet is used as a raw corpus to provide statistical information on word associations. A metric is introduced and used to measure semantic density and to rank all possible combinations of the senses of two words. The method achieves a precision of 58% in identifying the correct sense of both words at the same time. Precision increases as more choices are considered: 70% for the top two ranked combinations and 73% for the top three.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83303/
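To make the idea of ranking every combination of two words' senses concrete, here is a minimal Python sketch. It is not the metric defined in the paper: the sense inventory is a hypothetical toy (not WordNet data), and "semantic density" is approximated, purely for illustration, by the number of words shared between two senses' word bags. The names `SENSES`, `density`, and `rank_sense_pairs` are hypothetical.

```python
from itertools import product

# Hypothetical toy sense inventory: each sense maps to a bag of words that,
# for illustration only, stands in for its gloss and related concepts.
# This is NOT WordNet data.
SENSES = {
    "revise": {
        "revise#1": {"edit", "text", "change", "correct", "draft"},
        "revise#2": {"study", "review", "exam", "material"},
    },
    "law": {
        "law#1": {"rule", "legal", "text", "legislature", "change"},
        "law#2": {"science", "physics", "principle", "nature"},
    },
}


def density(words_a: set[str], words_b: set[str]) -> int:
    """Illustrative stand-in for semantic density: size of the word overlap."""
    return len(words_a & words_b)


def rank_sense_pairs(word_a: str, word_b: str, senses=SENSES):
    """Score every (sense of word_a, sense of word_b) combination, best first."""
    pairs = product(senses[word_a].items(), senses[word_b].items())
    scored = [((sa, sb), density(wa, wb)) for (sa, wa), (sb, wb) in pairs]
    # sorted() is stable, so equal scores keep their generation order
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For this toy data, `rank_sense_pairs("revise", "law")[0]` is `(("revise#1", "law#1"), 2)`. Reading the first one, two, or three entries of the ranked list is analogous to the top-ranked precision figures quoted in the entry.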
An Automatic Method for Generating Sense Tagged Corpora
Date: 1999
Creator: Mihalcea, Rada & Moldovan, Dan
Description: This paper discusses an automatic method for generating sense-tagged corpora. Abstract: The unavailability of very large corpora with semantically disambiguated words is a major limitation in text processing research. For example, statistical methods for word sense disambiguation of free text are known to achieve high accuracy when large corpora are available to develop context rules and to train and test them. This article presents a novel approach to automatically generating arbitrarily large corpora for word senses. The method is based on (1) the information provided in WordNet, used to formulate queries consisting of synonyms or definitions of word senses, and (2) the information gathered from the Internet using existing search engines. The method was tested on 120 word senses, and a precision of 91% was observed.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83300/
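The query-formulation step described in this entry (queries built from the synonyms or the definition of a word sense) can be sketched as follows, under stated assumptions: `SenseEntry` is a hypothetical record rather than a WordNet API type, the stopword list is arbitrary, and the exact query syntax the authors used is not given in this description. The sketch only builds query strings; sending them to a search engine and tagging the returned text with the sense are left out.

```python
from dataclasses import dataclass


# Hypothetical record for one word sense; NOT a WordNet API type.
@dataclass
class SenseEntry:
    word: str
    synonyms: list[str]
    gloss: str


# Arbitrary illustrative stopword list
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "in", "for", "by", "with"}


def synonym_query(sense: SenseEntry) -> str:
    """OR together the quoted synonyms, excluding the target word itself."""
    others = [s for s in sense.synonyms if s != sense.word]
    return " OR ".join(f'"{s}"' for s in others)


def definition_query(sense: SenseEntry, max_terms: int = 4) -> str:
    """AND together the first few content words of the gloss."""
    terms = [w.strip(".,;:") for w in sense.gloss.lower().split()]
    content = [t for t in terms if t and t not in STOPWORDS]
    return " AND ".join(content[:max_terms])


car = SenseEntry("car", ["car", "auto", "automobile", "motorcar"],
                 "a motor vehicle with four wheels")
print(synonym_query(car))     # "auto" OR "automobile" OR "motorcar"
print(definition_query(car))  # motor AND vehicle AND four AND wheels
```

Excluding the target word from the synonym query is a design choice in this sketch: it keeps the query from retrieving text about the word's other senses.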
A Method for Word Sense Disambiguation of Unrestricted Text
Date: June 1999
Creator: Mihalcea, Rada & Moldovan, Dan
Description: This paper discusses a method for word sense disambiguation of unrestricted text. Selecting the most appropriate sense for an ambiguous word in a sentence is a central problem in Natural Language Processing. In this paper, the authors present a method that attempts to disambiguate all the nouns, verbs, adverbs, and adjectives in a text, using the senses provided in WordNet. The senses are ranked using two sources of information: (1) the Internet, for gathering statistics on word-word co-occurrences, and (2) WordNet, for measuring the semantic density of a pair of words. The authors report an average accuracy of 80% for the first ranked sense and 91% for the first two ranked senses. Extensions of the method to windows of more than two words are also considered.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83302/
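One simple way to merge the two sources of evidence named in this entry (Internet co-occurrence statistics and WordNet semantic density) into a single sense ranking is a weighted sum of normalized scores. This is an illustrative assumption, not the combination procedure reported in the paper: the hit counts and density values are made-up inputs, and the weight `alpha` is a hypothetical parameter.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale non-negative scores into [0, 1] by dividing by the maximum.

    An all-zero (or empty) input maps to zeros.
    """
    top = max(scores.values(), default=0)
    return {s: (v / top if top else 0.0) for s, v in scores.items()}


def rank_senses(cooccurrence: dict[str, float],
                density: dict[str, float],
                alpha: float = 0.5) -> list[str]:
    """Rank senses by alpha * co-occurrence + (1 - alpha) * density (hypothetical weighting)."""
    co = normalize(cooccurrence)
    de = normalize(density)
    combined = {s: alpha * co.get(s, 0.0) + (1 - alpha) * de.get(s, 0.0)
                for s in co.keys() | de.keys()}
    # Highest combined score first; ties broken alphabetically for determinism
    return sorted(combined, key=lambda s: (-combined[s], s))


# Made-up inputs: search-engine hit counts and density scores per sense
hits = {"bank#1": 1200, "bank#2": 300}
dens = {"bank#1": 2, "bank#2": 4}
print(rank_senses(hits, dens, alpha=0.5))  # ['bank#1', 'bank#2']
print(rank_senses(hits, dens, alpha=0.2))  # ['bank#2', 'bank#1']
```

Normalizing each source first keeps raw hit counts (in the thousands) from swamping small density values; `alpha` then decides which signal dominates, as the two printed rankings show.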
LASSO: A Tool for Surfing the Answer Net
Date: November 1999
Creator: Moldovan, Dan I.; Harabagiu, Sanda M.; Paşca, Marius, 1974-; Mihalcea, Rada, 1974-; Goodrum, Richard A.; Gîrju, Corina R. et al
Description: This paper discusses LASSO, a tool for surfing the answer net. Abstract: This paper presents the architecture, operation, and results obtained with the LASSO system developed in the Natural Language Processing Laboratory at SMU. The system relies on a combination of syntactic and semantic techniques and on lightweight abductive inference to find answers. The search for the answer is based on a novel form of indexing called paragraph indexing. A score of 55.5% for short answers and 64.5% for long answers was achieved.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83331/
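The core idea of indexing paragraphs rather than whole documents can be sketched with an inverted index whose postings are (document, paragraph) pairs, so that a Boolean AND of keywords must be satisfied within a single paragraph. This is a minimal illustration, not LASSO's implementation; the tokenizer, the rule that paragraphs are separated by blank lines, and the function names are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_paragraph_index(docs: dict[str, str]) -> dict[str, set[tuple[str, int]]]:
    """Map each term to the (doc_id, paragraph_no) units that contain it."""
    index: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: paragraphs are separated by blank lines
        paragraphs = [p for p in text.split("\n\n") if p.strip()]
        for p_no, para in enumerate(paragraphs):
            for term in tokenize(para):
                index[term].add((doc_id, p_no))
    return index


def query_paragraphs(index, keywords: list[str]) -> set[tuple[str, int]]:
    """Boolean AND over keywords, satisfied within one paragraph, not one document."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*postings) if postings else set()


docs = {
    "d1": "Everest is the highest mountain.\n\nIt lies in Nepal.",
    "d2": "Nepal borders India.\n\nThe highest peak, Everest, is in Nepal.",
}
index = build_paragraph_index(docs)
print(query_paragraphs(index, ["highest", "Nepal"]))  # {('d2', 1)}
```

A document-level AND on "highest" and "Nepal" would return both `d1` and `d2`; restricting the match to one paragraph returns only the passage where the keywords actually appear together, which is a much smaller unit to search for an answer.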
A Semi-Complete Disambiguation Algorithm for Open Text
Date: 2000
Creator: Mihalcea, Rada
Description: This paper discusses a semi-complete disambiguation algorithm for open text. Word Sense Disambiguation (WSD) is one of the most difficult areas of Natural Language Processing (NLP); the semantic comprehension of a text, and the ability to expand a text with semantically related information, depend heavily on the availability of a highly accurate WSD algorithm. Solutions proposed so far for the WSD problem make use of machine readable dictionaries (Leacock, Chodorow and Miller 1998) or of information gathered from raw or semantically disambiguated corpora (Yarowsky 1995). These methods are designed either to work with a few preselected words, in which case high accuracy is obtained, or as general methods that disambiguate all the words in a text with lower precision. With the present work, the authors try to achieve a compromise between these two directions. Some NLP applications, such as Information Retrieval, could benefit from a method that performs semi-complete disambiguation (i.e., it disambiguates only a certain percentage of the words in a text) but is highly accurate.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83293/
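The trade-off this entry describes, tagging fewer words in exchange for higher accuracy, can be illustrated with a margin threshold: a word is tagged only when its best-scoring sense clearly beats the runner-up, and the fraction of words tagged is reported as coverage. The per-sense scores and the `min_margin` parameter are hypothetical; the paper's actual selection criteria are not described in this entry.

```python
def semi_complete_tag(scored: dict[str, dict[str, float]], min_margin: float):
    """Tag a word only if its best sense beats the runner-up by >= min_margin.

    For single-candidate words the runner-up score is taken as 0.0.
    Returns (tags, coverage), where coverage is the fraction of words tagged.
    """
    tags: dict[str, str] = {}
    for word, senses in scored.items():
        if not senses:
            continue  # no candidate senses: leave the word untagged
        ranked = sorted(senses.values(), reverse=True)
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        if ranked[0] - runner_up >= min_margin:
            tags[word] = max(senses, key=senses.get)
    coverage = len(tags) / len(scored) if scored else 0.0
    return tags, coverage


# Hypothetical per-sense confidence scores
scores = {
    "bank": {"bank#1": 0.9, "bank#2": 0.1},
    "bass": {"bass#1": 0.5, "bass#2": 0.45},
    "oak": {"oak#1": 1.0},
}
tags, coverage = semi_complete_tag(scores, min_margin=0.3)
print(tags)      # {'bank': 'bank#1', 'oak': 'oak#1'}
print(coverage)  # 0.6666666666666666
```

Raising `min_margin` lowers coverage and leaves only the most clear-cut decisions; lowering it to 0 tags every word that has at least one candidate sense.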
An Iterative Approach to Word Sense Disambiguation
Date: May 2000
Creator: Mihalcea, Rada, 1974- & Moldovan, Dan I.
Description: This paper discusses an iterative approach to Word Sense Disambiguation. Abstract: In this paper, we present an iterative algorithm for Word Sense Disambiguation. It combines two sources of information, WordNet and a semantically tagged corpus, to identify the correct sense of the words in a given text. It differs from standard approaches in that disambiguation is performed iteratively: starting from free text, a set of disambiguated words is built using various methods; new words are then sense-tagged based on their relation to the already disambiguated words and added to the set. This iterative process makes it possible to identify, in the original text, a set of words that can be disambiguated with high precision: 55% of the verbs and nouns are disambiguated with an accuracy of 92%.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83304/
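The iterative loop described in this entry (seed a set of disambiguated words, tag new words through their relation to that set, add them, and repeat) can be sketched as below. Assumptions: the seed set is simply the words with a single candidate sense, and `related` is a hypothetical set of related sense pairs standing in for WordNet relations and corpus evidence. The paper's actual procedures are not reproduced here.

```python
def iterative_wsd(candidates: dict[str, list[str]],
                  related: set[frozenset[str]]) -> dict[str, str]:
    """Grow a set of disambiguated words until no further word can be resolved.

    candidates: word -> candidate sense ids (hypothetical inventory)
    related: unordered pairs of senses treated as semantically related
    Returns word -> chosen sense, only for the words that could be resolved.
    """
    # Seed (assumption): words with exactly one candidate sense
    tagged = {w: s[0] for w, s in candidates.items() if len(s) == 1}
    changed = True
    while changed:
        changed = False
        for word, senses in candidates.items():
            if word in tagged:
                continue
            # Keep senses related to at least one already-chosen sense
            linked = [s for s in senses
                      if any(frozenset((s, t)) in related for t in tagged.values())]
            if len(linked) == 1:  # accept only unambiguous evidence
                tagged[word] = linked[0]
                changed = True
    return tagged


candidates = {
    "pilot": ["pilot#aviator"],
    "plane": ["plane#aircraft", "plane#surface"],
    "wing": ["wing#aircraft_part", "wing#bird_part"],
    "table": ["table#furniture", "table#data"],
}
related = {
    frozenset({"pilot#aviator", "plane#aircraft"}),
    frozenset({"plane#aircraft", "wing#aircraft_part"}),
}
print(iterative_wsd(candidates, related))
# {'pilot': 'pilot#aviator', 'plane': 'plane#aircraft', 'wing': 'wing#aircraft_part'}
```

"wing" is resolved only because "plane" was resolved first, which is the point of iterating; "table" has no link to the tagged set and stays untagged, so the output covers only part of the text.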
Semantic Indexing using WordNet Senses
Date: October 2000
Creator: Mihalcea, Rada & Moldovan, Dan
Description: In this paper, the authors describe a boolean Information Retrieval system that adds word semantics to classic word-based indexing. Two main components of the system, indexing and retrieval, use a combined word-based and sense-based approach. The key to the system is a methodology for building semantic representations of open text at the word and collocation level. This new technique, called semantic indexing, shows improved effectiveness over classic word-based indexing techniques.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83301/
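A minimal sketch of combined word-based and sense-based indexing: each token is indexed under its plain word and, when a sense tag is available, under a `word#sense` key, so a boolean query can mix both kinds of term. The `word#sense` key format and the assumption that tokens arrive already sense-tagged are illustrative choices, not details taken from the paper.

```python
from collections import defaultdict
from typing import Optional


def build_index(docs: dict[str, list[tuple[str, Optional[str]]]]) -> dict[str, set[str]]:
    """Index each token under its word and, if sense-tagged, under 'word#sense'."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, tokens in docs.items():
        for word, sense in tokens:
            index[word].add(doc_id)                   # classic word-based entry
            if sense is not None:
                index[f"{word}#{sense}"].add(doc_id)  # sense-based entry
    return index


def boolean_and(index: dict[str, set[str]], terms: list[str]) -> set[str]:
    """Documents containing every query term; terms may be words or word#sense keys."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


# Toy documents whose tokens are assumed to be sense-tagged already
docs = {
    "d1": [("bank", "1"), ("loan", None)],
    "d2": [("bank", "2"), ("river", None)],
}
index = build_index(docs)
print(sorted(boolean_and(index, ["bank"])))    # ['d1', 'd2']
print(sorted(boolean_and(index, ["bank#1"])))  # ['d1']
```

A plain-word query behaves like classic indexing, while a sense key narrows the match; mixed queries such as `boolean_and(index, ["bank#2", "river"])` return only `d2`.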
The Structure and Performance of an Open-Domain Question Answering System
Date: October 2000
Creator: Moldovan, Dan; Harabagiu, Sanda; Paşca, Marius; Mihalcea, Rada; Gîrju, Roxana; Goodrum, Richard et al
Description: This paper presents the architecture, operation and results obtained with the LASSO Question Answering system developed in the Natural Language Processing Laboratory at SMU. To find answers, the system relies on a combination of syntactic and semantic techniques. The search for the answer is based on a novel form of indexing called paragraph indexing. A score of 55.5% for short answers and 64.5% for long answers was achieved at the TREC-8 competition.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83312/
FALCON: Boosting Knowledge for Answer Engines
Date: November 2000
Creator: Harabagiu, Sanda M.; Moldovan, Dan I.; Paşca, Marius, 1974-; Mihalcea, Rada, 1974-; Surdeanu, Mihai; Bunescu, Răzvan et al
Description: This paper discusses FALCON. Abstract: This paper presents the features of FALCON, an answer engine that integrates different forms of syntactic, semantic, and pragmatic knowledge to achieve better performance. The answer engine handles question reformulations, finds the expected answer type in a large hierarchy that incorporates the WordNet semantic net, and extracts answers after performing unifications on the semantic forms of the question and its candidate answers. To rule out erroneous answers, it provides a justification option, implemented as an abductive proof. In TREC-9, FALCON achieved a score of 58% for short answers and 76% for long answers.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83296/