Building Multilingual Semantic Networks with Non-Expert Contributions over the Web

Date: November 2003
Creator: Ayewah, Nathanial; Mihalcea, Rada, 1974- & Nastase, Vivi
Description: This paper discusses building multilingual semantic networks. Abstract: We present a system that allows non-expert Web users to contribute towards building a multilingual lexical resource. Our study focuses on the Romanian-English language pair, and the target resource is a Romanian WordNet strongly connected to the English WordNet. We use a bilingual dictionary, a monolingual definition dictionary, and documents on the Web to build synsets, attach a gloss to each, and provide some examples. The results of the semi-automatic acquisition system are judged by two human judges, and they are compared to automatic approaches to building a Romanian WordNet.
Contributing Partner: UNT College of Engineering
Creating Large Annotated Data Collections with Web Users' Help

Date: April 2003
Creator: Mihalcea, Rada, 1974- & Chklovski, Timothy A. (Timothy Anatolievich), 1977-
Description: This paper discusses creating annotated data collections. Abstract: Open Mind Word Expert is an implemented active learning system that aims to create large annotated corpora by tapping into the world's vast pool of knowledge. It does this by relying on the vast number of Web users who contribute their knowledge to data annotation. Open Mind Word Expert focuses on building semantically annotated corpora, by collecting word sense tagging from the general public over the Web. During the first nine months of activity, the system yielded 90,000 high quality tagged items at a much lower cost than the traditional method of hiring lexicographers.
Contributing Partner: UNT College of Engineering
An Evaluation Exercise for Word Alignment

Date: May 2003
Creator: Mihalcea, Rada, 1974- & Pedersen, Ted
Description: This paper discusses an evaluation exercise for word alignment. Abstract: This paper presents the task definition, resources, participating systems, and comparative results for the shared task on word alignment, which was organized as part of the HLT/NAACL 2003 Workshop on Building and Using Parallel Texts. The shared task included Romanian-English and English-French sub-tasks, and drew the participation of seven teams from around the world.
Contributing Partner: UNT College of Engineering
Exploiting Agreement and Disagreement of Human Annotators for Word Sense Disambiguation

Date: September 2003
Creator: Chklovski, Timothy A. (Timothy Anatolievich), 1977- & Mihalcea, Rada, 1974-
Description: This paper discusses word sense disambiguation. Abstract: It is generally agreed that the success of a Word Sense Disambiguation (WSD) system depends, in large part, on having enough sense-annotated data available at hand, and a well-motivated sense inventory into which the disambiguations are made. The authors report a Web-based approach to (1) constructing large sense-tagged corpora by exploiting agreement of Web users who contribute word sense annotation, and (2) deriving a coarse-grained sense inventory from a fine-grained inventory by exploiting disagreements of independent contributors about word senses. The authors investigate the quantity and quality of the sense-tagged data collected with this approach over the past year. The authors present and evaluate an automatic clustering algorithm able to derive sense clusters that compare well with manually constructed clusters.
Contributing Partner: UNT College of Engineering
Performance Analysis of a Part of Speech Tagging Task

Date: February 2003
Creator: Mihalcea, Rada
Description: In this paper, the author attempts a formal analysis of performance in automatic part of speech tagging. Lower and upper bounds on tagging precision using existing taggers or their combination are provided. Since the author shows that perfect automatic tagging is not possible with existing taggers, two solutions for applications requiring very high precision are presented: (1) a solution involving minimum human intervention for a precision of over 98.7%, and (2) a combination of taggers using a memory-based learning algorithm that succeeds in reducing the error rate by 11.6% with respect to the best tagger involved.
Contributing Partner: UNT College of Engineering
The Role of Non-Ambiguous Words in Natural Language Disambiguation

Date: September 2003
Creator: Mihalcea, Rada
Description: This paper describes an unsupervised approach for natural language disambiguation, applicable to ambiguity problems where classes of equivalence can be defined over the set of words in a lexicon. Lexical knowledge is induced from non-ambiguous words via classes of equivalence, and enables the automatic generation of annotated corpora. The only requirements are a lexicon and a raw textual corpus. The method was tested on two natural language ambiguity tasks in several languages: part of speech tagging (English, Swedish, Chinese), and word sense disambiguation (English, Romanian). Classifiers trained on automatically constructed corpora were found to have a performance comparable with classifiers that learn from expensive manually annotated data.
Contributing Partner: UNT College of Engineering