Instance Based Learning with Automatic Feature Selection Applied to Word Sense Disambiguation
Date: August 2002
Creator: Mihalcea, Rada, 1974-
Description: This paper discusses instance based learning with automatic feature selection applied to word sense disambiguation. Abstract: We describe an algorithm for Word Sense Disambiguation (WSD) that relies on a lazy learner improved with automatic feature selection. The algorithm was implemented in a system that achieves excellent performance on the set of data released during the SENSEVAL-2 competition. We present the results obtained and discuss the performance of various features in the context of supervised learning algorithms for WSD.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc30943/
Integrating Knowledge for Subjectivity Sense Labeling
Date: May 2009
Creator: Gyamfi, Yaw; Wiebe, Janyce M.; Mihalcea, Rada, 1974- & Akkaya, Cem
Description: This paper discusses integrating knowledge for subjectivity sense labeling. Abstract: This paper introduces an integrative approach to automatic word sense subjectivity annotation. We use features that exploit the hierarchical structure and domain information in lexical resources such as WordNet, as well as other types of features that measure the similarity of glosses and the overlap among sets of semantically related words. Integrated in a machine learning framework, the entire set of features is found to give better results than any individual type of feature.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc31013/
Intra-Class Competitive Assignments in CS2: A One-Year Study
Date: July 2006
Creator: Garlick, Ryan & Akl, Robert G.
Description: This paper discusses intra-class competitive assignments in CS2. Abstract: The widespread goals of student retention, introducing larger programming projects, and fostering collaboration among students in computer science courses have led to the inclusion of group projects in many curricula, with task division and collaboration as motivation for students to complete assignments. This paper presents a study of a first-year programming assignment with similar goals, but with methods adopting the contrarian view - having students directly compete with one another in a tournament of their respective software agents. This paper presents the results of a year-long experiment in an intra-class competitive assignment in the second C++ programming course at the University of North Texas in Denton. Metrics of student performance on the assignment, correlation with course grade, student surveys of the project, and retention statistics are presented. Results demonstrating overwhelmingly positive response and high levels of effort among students are submitted, along with remarks on application to student recruiting, retention, and curriculum design.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc30828/
Investigations in Unsupervised Back-of-the-Book Indexing
Date: May 2007
Creator: Csomai, Andras & Mihalcea, Rada, 1974-
Description: This paper discusses investigations in unsupervised back-of-the-book indexing. Abstract: This paper describes our experiments with unsupervised methods for back-of-the-book index construction. Through comparative evaluations performed on a gold standard data set of 29 books and their corresponding indexes, the authors draw conclusions as to which unsupervised methods are the most accurate for automatic index construction. We show that if the right sequence of methods and heuristics is used, the performance of an unsupervised back-of-the-book index construction system can be raised by up to 250% (relative increase in F-measure) compared to the performance of a system based on the traditional tf*idf weighting scheme.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc30990/
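Note: The abstract above names the traditional tf*idf weighting scheme as its baseline. For readers unfamiliar with it, the following is a minimal sketch of one common tf*idf variant (relative term frequency times the natural log of inverse document frequency). The function name `tf_idf` and the toy documents are illustrative assumptions, not taken from the paper, and the authors' exact weighting may differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights

# Toy, made-up documents (already tokenized).
docs = [
    ["the", "index", "index"],
    ["the", "book"],
    ["the", "graph"],
]
scores = tf_idf(docs)
# Highest-weighted candidate index term for the first document.
best = max(scores[0], key=scores[0].get)
```

A term that occurs in every document (here "the") receives weight zero, which is why tf*idf favors terms that are frequent in one text but rare elsewhere.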
An Iterative Approach to Word Sense Disambiguation
Date: May 2000
Creator: Mihalcea, Rada, 1974- & Moldovan, Dan I.
Description: This paper discusses an iterative approach to Word Sense Disambiguation. Abstract: In this paper, we present an iterative algorithm for Word Sense Disambiguation. It combines two sources of information, WordNet and a semantically tagged corpus, for the purpose of identifying the correct sense of the words in a given text. It differs from other standard approaches in that the disambiguation process is performed in an iterative manner: starting from free text, a set of disambiguated words is built using various methods; new words are sense tagged based on their relation to the already disambiguated words, and then added to the set. This iterative process allows us to identify, in the original text, a set of words which can be disambiguated with high precision; 55% of the verbs and nouns are disambiguated with an accuracy of 92%.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83304/
A Language Independent Algorithm for Single and Multiple Document Summarization
Date: October 2005
Creator: Mihalcea, Rada, 1974- & Tarau, Paul
Description: This paper discusses a language independent algorithm for single and multiple document summarization. Abstract: This paper describes a method for language independent extractive summarization that relies on iterative graph-based ranking algorithms. Through evaluations performed on a single document summarization task for English and Portuguese, we show that the method performs equally well regardless of the language. Moreover, we show how a meta-summarizer relying on a layered application of techniques for single-document summarization can be turned into an effective method for multi-document summarization.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc30965/
Language Independent Extractive Summarization
Date: July 2005
Creator: Mihalcea, Rada, 1974-
Description: This paper discusses language independent extractive summarization. Abstract: We demonstrate TextRank - a system for unsupervised extractive summarization that relies on the application of iterative graph-based ranking algorithms to graphs encoding the cohesive structure of a text. An important characteristic of the system is that it does not rely on any language-specific knowledge resources or any manually constructed training data, and thus it is highly portable to new languages or domains.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc30967/
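Note: The abstract above describes ranking sentences with an iterative graph-based algorithm. The sketch below illustrates the general idea under stated assumptions: sentences are nodes, edge weights are a word-overlap similarity normalized by log sentence lengths, and scores are updated with a weighted PageRank-style iteration. The names `overlap` and `rank_sentences`, the damping value 0.85, the fixed iteration count, and the toy sentences are assumptions for illustration; this is not the demonstrated system itself.

```python
import math

def overlap(a, b):
    """Word overlap between two token lists, normalized by log lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # guard: avoids log(0) and a zero denominator
    return len(set(a) & set(b)) / (math.log(len(a)) + math.log(len(b)))

def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score tokenized sentences with a weighted PageRank-style update."""
    n = len(sentences)
    # Symmetric similarity matrix with no self-loops.
    w = [
        [overlap(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out = [sum(row) for row in w]  # total edge weight leaving each node
    scores = [1.0] * n
    for _ in range(iterations):
        # The comprehension reads the previous scores, so all nodes
        # are updated simultaneously.
        scores = [
            (1 - damping)
            + damping * sum(
                w[j][i] * scores[j] / out[j] for j in range(n) if out[j] > 0
            )
            for i in range(n)
        ]
    return scores

# Toy, made-up sentences (already tokenized).
sentences = [
    ["graph", "ranking", "of", "sentences"],
    ["ranking", "sentences", "by", "graph", "score"],
    ["unrelated", "words", "here"],
]
scores = rank_sentences(sentences)
# Sentence indices ordered from most to least central.
order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```

An extractive summary would then keep the top-ranked sentences. Because the similarity uses only token overlap, nothing in the sketch depends on a particular language, which mirrors the portability claim in the abstract.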
Learning Multilingual Subjective Language via Cross-Lingual Projections
Date: June 2007
Creator: Mihalcea, Rada, 1974-; Banea, Carmen & Wiebe, Janyce M.
Description: This paper discusses learning multilingual subjective language via cross-lingual projections. Abstract: This paper explores methods for generating subjectivity analysis resources in a new language by leveraging the tools and resources available in English. Given a bridge between English and the selected target language (e.g., a bilingual dictionary or a parallel corpus), the methods can be used to rapidly create tools for subjectivity analysis in the new language.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc30991/
Learning to Identify Educational Materials
Date: 2009
Creator: Hassan, Samer & Mihalcea, Rada, 1974-
Description: This paper discusses learning to identify educational materials. Abstract: In this paper, we explore the task of automatically identifying educational materials, by classifying documents with respect to their educative value. Through experiments carried out on a data set of manually annotated documents, we show that the generally accepted notion of a learning object's "educativeness" is indeed a property that can be reliably assigned through automatic classification.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc31014/
Learning to Identify Emotions in Text
Date: March 2008
Creator: Strapparava, Carlo, 1962- & Mihalcea, Rada, 1974-
Description: This paper discusses learning to identify emotions in text. Abstract: This paper describes experiments concerned with the automatic analysis of emotions in text. We describe the construction of a large data set annotated for six basic emotions: Anger, Disgust, Fear, Joy, Sadness, and Surprise, and we propose and evaluate several knowledge-based and corpus-based methods for the automatic identification of these emotions in text.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc31005/