Learning to Identify Educational Materials

Date: 2009
Creator: Hassan, Samer & Mihalcea, Rada, 1974-
Description: This paper discusses learning to identify educational materials. Abstract: In this paper, we explore the task of automatically identifying educational materials, by classifying documents with respect to their educative value. Through experiments carried out on a data set of manually annotated documents, we show that the generally accepted notion of a learning object's "educativeness" is indeed a property that can be reliably assigned through automatic classification.
Contributing Partner: UNT College of Engineering
Learning to Identify Emotions in Text

Date: March 2008
Creator: Strapparava, Carlo, 1962- & Mihalcea, Rada, 1974-
Description: This paper discusses learning to identify emotions in text. Abstract: This paper describes experiments concerned with the automatic analysis of emotions in text. We describe the construction of a large data set annotated for six basic emotions: Anger, Disgust, Fear, Joy, Sadness, and Surprise, and we propose and evaluate several knowledge-based and corpus-based methods for the automatic identification of these emotions in text.
Contributing Partner: UNT College of Engineering
Letter Level Learning for Language Independent Diacritics Restoration

Date: September 2002
Creator: Mihalcea, Rada, 1974- & Nastase, Vivi
Description: This paper discusses letter level learning for language independent diacritics restoration. Abstract: This paper presents a method for diacritics restoration based on learning mechanisms that act at letter level. The method requires no additional tagging tools or resources other than raw text, which makes it independent of the language, and particularly appealing for languages for which there are few resources available. The algorithm was evaluated on four different languages, namely Czech, Hungarian, Polish, and Romanian, and an average accuracy of over 98% was observed.
Contributing Partner: UNT College of Engineering
The Lie Detector: Explorations in the Automatic Recognition of Deceptive Language

Date: 2009
Creator: Mihalcea, Rada, 1974- & Strapparava, Carlo, 1962-
Description: This paper discusses explorations in the automatic recognition of deceptive language. Abstract: In this paper, we present initial experiments in the recognition of deceptive language. We introduce three data sets of true and lying texts collected for this purpose, and we show that automatic classification is a viable technique to distinguish between truth and falsehood as expressed in language. We also introduce a method for class-based feature analysis, which sheds some light on the features that are characteristic of deceptive text.
Contributing Partner: UNT College of Engineering
Linguistic Ethnography: Identifying Dominant Word Classes in Text

Date: March 2009
Creator: Pulman, Stephen & Mihalcea, Rada, 1974-
Description: This paper discusses linguistic ethnography. Abstract: In this paper, we propose a method for "linguistic ethnography" - a general mechanism for characterizing texts with respect to the dominance of certain classes of words. Using humor as a case study, we explore the automatic learning of salient word classes, including semantic classes (e.g., person, animal), psycholinguistic classes (e.g., tentative, cause), and affective load (e.g., anger, happiness). We measure the reliability of the derived word classes and their associated dominance scores by showing significant correlation across different corpora.
Contributing Partner: UNT College of Engineering
Linguistically Motivated Features for Enhanced Back-of-the-Book Indexing

Date: June 2008
Creator: Csomai, Andras & Mihalcea, Rada, 1974-
Description: This paper discusses linguistically motivated features for enhanced back-of-the-book indexing. Abstract: In this paper, we present a supervised method for back-of-the-book index construction. We introduce a novel set of features that goes beyond the typical frequency-based analysis, including features based on discourse comprehension, syntactic patterns, and information drawn from an online encyclopedia. In experiments carried out on a book collection, the method was found to lead to an improvement of roughly 140% as compared to an existing state-of-the-art supervised method.
Contributing Partner: UNT College of Engineering
Linking Educational Materials to Encyclopedic Knowledge

Date: July 2007
Creator: Csomai, Andras & Mihalcea, Rada, 1974-
Description: This paper discusses linking educational materials to encyclopedic knowledge. Abstract: This paper describes a system that automatically links study materials to encyclopedic knowledge, and shows how the availability of such knowledge within easy reach of the learner can improve both the quality of the knowledge acquired and the time needed to obtain such knowledge.
Contributing Partner: UNT College of Engineering
A Logic Programming Framework for Semantic Interpretation with WordNet and PageRank

Date: September 2004
Creator: Tarau, Paul; Mihalcea, Rada, 1974- & Figa, Elizabeth
Description: This paper discusses a logic programming framework for semantic interpretation with WordNet and PageRank. Abstract: This paper describes applications of Logic Programming to Natural Language Processing in combination with graph algorithms and statistical methods. Google's PageRank and similar fast-converging recursive graph algorithms have provided practical means to statistically rank vertices of large graphs like the World Wide Web. By combining a fast Java-based PageRank implementation with a Prolog-based inferential layer, running on top of an optimized WordNet graph, the authors describe applications to word sense disambiguation and evaluate their accuracy in comparison with human-annotated corpus data.
Contributing Partner: UNT College of Engineering
Making Computers Laugh: Investigations in Automatic Humor Recognition

Date: October 2005
Creator: Mihalcea, Rada, 1974- & Strapparava, Carlo, 1962-
Description: This paper discusses investigations in automatic humor recognition. Abstract: Humor is one of the most interesting and puzzling aspects of human behavior. Despite the attention it has received in fields such as philosophy, linguistics, and psychology, there have been only a few attempts to create computational models for humor recognition or generation. In this paper, we bring empirical evidence that computational approaches can be successfully applied to the task of humor recognition. Through experiments performed on very large data sets, we show that automatic classification techniques can be effectively used to distinguish between humorous and non-humorous texts, with significant improvements observed over a priori known baselines.
Contributing Partner: UNT College of Engineering
Making Sense Out of the Web

Date: November 2004
Creator: Mihalcea, Rada, 1974-
Description: This paper discusses the main lines of research in deriving efficient Word Sense Disambiguation methods. Abstract: In the past few years, we have witnessed a tremendous growth of the World Wide Web, both in terms of number of Web pages accessible online - resulting in what represents today the largest publicly available corpus, and in terms of number of Web users - who now [...] these two main dimensions - pages and users - has opened the doors to a realm of new approaches to data-hungry and knowledge-hungry language processing applications. Among these, Word Sense Disambiguation is one of the applications that has the potential of benefiting the most from the large amounts of Web-based data and from the availability of inexpensive Web user supervision. In this paper, the author discusses the main lines of research in deriving efficient Word Sense Disambiguation methods that exploit: (1) the Web as a corpus - which represents a view of the Web seen as an enormous collection of Web pages; and (2) the Web as collective mind - where the Web is regarded as a large group of Web users who can contribute their knowledge to the process of identifying word meanings.
Contributing Partner: UNT College of Engineering