An Evaluation Exercise for Romanian Word Sense Disambiguation

Date: July 2004
Creator: Mihalcea, Rada, 1974-; Nastase, Vivi; Chklovski, Timothy A. (Timothy Anatolievich), 1977-; Tatar, Doina; Tufis, Dan & Hristea, Florentina T.
Description: This paper discusses an evaluation exercise for Romanian word sense disambiguation. Abstract: This paper presents the task definition, resources, participating systems, and comparative results for a Romanian Word Sense Disambiguation task, which was organized as part of the SENSEVAL-3 evaluation exercise. Five teams with a total of seven systems were drawn to this task.
Contributing Partner: UNT College of Engineering
Explorations in Automatic Book Summarization

Date: June 2007
Creator: Ceylan, Hakan & Mihalcea, Rada, 1974-
Description: This paper discusses explorations in automatic book summarization. Abstract: Most of the text summarization research carried out to date has been concerned with the summarization of short documents (e.g., news stories, technical reports), and very little work, if any, has been done on the summarization of very long documents. In this paper, we try to address this gap and explore the problem of book summarization. We introduce a new data set specifically designed for the evaluation of systems for book summarization, and describe summarization techniques that explicitly account for the length of the documents.
Contributing Partner: UNT College of Engineering
Exploiting Agreement and Disagreement of Human Annotators for Word Sense Disambiguation

Date: September 2003
Creator: Chklovski, Timothy A. (Timothy Anatolievich), 1977- & Mihalcea, Rada, 1974-
Description: This paper discusses word sense disambiguation. Abstract: It is generally agreed that the success of a Word Sense Disambiguation (WSD) system depends, in large part, on having enough sense-annotated data available at hand, and a well-motivated sense inventory into which the disambiguations are made. The authors report a Web-based approach to (1) constructing large sense-tagged corpora by exploiting agreement of Web users who contribute word sense annotation, and (2) deriving a coarse-grained sense inventory from a fine-grained inventory by exploiting disagreements of independent contributors about word senses. The authors investigate the quantity and quality of the sense-tagged data collected with this approach over the past year. The authors present and evaluate an automatic clustering algorithm able to derive sense clusters that compare well with manually constructed clusters.
Contributing Partner: UNT College of Engineering
A Logic Programming Framework for Semantic Interpretation with WordNet and PageRank

Date: September 2004
Creator: Tarau, Paul; Mihalcea, Rada, 1974- & Figa, Elizabeth
Description: This paper discusses a logic programming framework for semantic interpretation with WordNet and PageRank. Abstract: This paper describes applications of Logic Programming to Natural Language Processing in combination with graph algorithms and statistical methods. Google's PageRank and similar fast-converging recursive graph algorithms have provided practical means to statistically rank vertices of large graphs like the World Wide Web. By combining a fast Java-based PageRank implementation with a Prolog-based inferential layer, running on top of an optimized WordNet graph, the authors describe applications to word sense disambiguation and evaluate their accuracy in comparison with human-annotated corpus data.
Contributing Partner: UNT College of Engineering
Linguistic Ethnography: Identifying Dominant Word Classes in Text

Date: March 2009
Creator: Pulman, Stephen & Mihalcea, Rada, 1974-
Description: This paper discusses linguistic ethnography. Abstract: In this paper, we propose a method for "linguistic ethnography" - a general mechanism for characterizing texts with respect to the dominance of certain classes of words. Using humor as a case study, the authors explore the automatic learning of salient word classes, including semantic classes (e.g., person, animal), psycholinguistic classes (e.g., tentative, cause), and affective load (e.g., anger, happiness). We measure the reliability of the derived word classes and their associated dominance scores by showing significant correlation across different corpora.
Contributing Partner: UNT College of Engineering
Linguistically Motivated Features for Enhanced Back-of-the-Book Indexing

Date: June 2008
Creator: Csomai, Andras & Mihalcea, Rada, 1974-
Description: This paper discusses linguistically motivated features for enhanced back-of-the-book indexing. Abstract: In this paper, we present a supervised method for back-of-the-book index construction. We introduce a novel set of features that goes beyond the typical frequency-based analysis, including features based on discourse comprehension, syntactic patterns, and information drawn from an online encyclopedia. In experiments carried out on a book collection, the method was found to lead to an improvement of roughly 140% as compared to an existing state-of-the-art supervised method.
Contributing Partner: UNT College of Engineering
The Lie Detector: Explorations in the Automatic Recognition of Deceptive Language

Date: 2009
Creator: Mihalcea, Rada, 1974- & Strapparava, Carlo, 1962-
Description: This paper discusses explorations in the automatic recognition of deceptive language. Abstract: In this paper, we present initial experiments in the recognition of deceptive language. We introduce three data sets of true and lying texts collected for this purpose, and show that automatic classification is a viable technique to distinguish between truth and falsehood as expressed in language. We also introduce a method for class-based feature analysis, which sheds some light on the features that are characteristic of deceptive text.
Contributing Partner: UNT College of Engineering
Linking Educational Materials to Encyclopedic Knowledge

Date: July 2007
Creator: Csomai, Andras & Mihalcea, Rada, 1974-
Description: This paper discusses linking educational materials to encyclopedic knowledge. Abstract: This paper describes a system that automatically links study materials to encyclopedic knowledge, and shows how the availability of such knowledge within easy reach of the learner can improve both the quality of the knowledge acquired and the time needed to obtain such knowledge.
Contributing Partner: UNT College of Engineering
Making Computers Laugh: Investigations in Automatic Humor Recognition

Date: October 2005
Creator: Mihalcea, Rada, 1974- & Strapparava, Carlo, 1962-
Description: This paper discusses investigations in automatic humor recognition. Abstract: Humor is one of the most interesting and puzzling aspects of human behavior. Despite the attention it has received in fields such as philosophy, linguistics, and psychology, there have been only a few attempts to create computational models for humor recognition or generation. In this paper, we bring empirical evidence that computational approaches can be successfully applied to the task of humor recognition. Through experiments performed on very large data sets, we show that automatic classification techniques can be effectively used to distinguish between humorous and non-humorous texts, with significant improvements observed over a priori known baselines.
Contributing Partner: UNT College of Engineering
Making Sense Out of the Web

Date: November 2004
Creator: Mihalcea, Rada, 1974-
Description: This paper discusses the main lines of research in deriving efficient Word Sense Disambiguation methods. Abstract: In the past few years, we have witnessed a tremendous growth of the World Wide Web, both in terms of number of Web pages accessible online - resulting in what represents today the largest publicly available corpus, and in terms of number of Web users - who now [...] these two main dimensions - pages and users - has opened the doors to a realm of new approaches to data-hungry and knowledge-hungry language processing applications. Among these, Word Sense Disambiguation is one of the applications that has the potential of benefiting the most from the large amounts of Web-based data and from the availability of inexpensive Web user supervision. In this paper, the author discusses the main lines of research in deriving efficient Word Sense Disambiguation methods that exploit: (1) the Web as a corpus - which represents a view of the Web seen as an enormous collection of Web pages; and (2) the Web as collective mind - where the Web is regarded as a large group of Web users who can contribute their knowledge to the process of identifying word meanings.
Contributing Partner: UNT College of Engineering