Search Results

Measuring Semantic Relatedness Using Salient Encyclopedic Concepts

Description: While pragmatics, through its integration of situational awareness and relevant real-world knowledge, offers a high level of analysis that is suitable for real interpretation of natural dialogue, semantics, on the other hand, represents a lower yet more tractable and affordable linguistic level of analysis using current technologies. Generally, the understanding of semantic meaning in the literature has revolved around the famous quote "You shall know a word by the company it keeps". In this thesis we investigate the role of context constituents in decoding the semantic meaning of the surrounding context; specifically, we probe the role of salient concepts, defined as content-bearing expressions which afford encyclopedic definitions, as a suitable source of semantic clues to an unambiguous interpretation of context. Furthermore, we integrate this world knowledge in building a new and robust unsupervised semantic model and apply it to measure semantic relatedness between textual pairs, whether they are words, sentences or paragraphs. Moreover, we explore the abstraction of semantics across languages and use our findings to build a novel multi-lingual semantic relatedness model exploiting information acquired from various languages. We demonstrate the effectiveness and the superiority of our mono-lingual and multi-lingual models through a comprehensive set of evaluations on specialized synthetic datasets for semantic relatedness as well as real-world applications such as paraphrase detection and short answer grading. Our work represents a novel approach to integrating world knowledge into current semantic models and a means to cross the language boundary for a better and more robust semantic relatedness representation, thus opening the door for an improved abstraction of meaning that carries the potential of ultimately imparting understanding of natural language to machines.
Date: August 2011
Creator: Hassan, Samer
Partner: UNT Libraries

Random-Walk Term Weighting for Improved Text Classification

Description: This paper describes a new approach for estimating term weights in a document, and shows how the new weighting scheme can be used to improve the accuracy of a text classifier.
Date: September 2007
Creator: Hassan, Samer; Mihalcea, Rada, 1974- & Banea, Carmen
Item Type: Paper
Partner: UNT College of Engineering

Text Mining for Automatic Image Tagging

Description: This paper introduces several extractive approaches for automatic image tagging, relying exclusively on information mined from texts. Through evaluations on two datasets, the authors show that their methods exceed competitive baselines by a large margin, and compare favorably with the state-of-the-art that uses both textual and image features.
Date: August 2010
Creator: Leong, Chee Wee; Mihalcea, Rada, 1974- & Hassan, Samer
Item Type: Paper
Partner: UNT College of Engineering

UNT: SubFinder: Combining Knowledge Sources for Automatic Lexical Substitution

Description: This paper describes the University of North Texas SubFinder system. The system provides the most likely set of substitutes for a word in a given context by combining several techniques and knowledge sources. SubFinder participated in the "best" and "out of ten" (oot) tracks of the SEMEVAL lexical substitution task, consistently ranking in first or second place.
Date: June 2007
Creator: Hassan, Samer; Csomai, Andras; Banea, Carmen; Sinha, Ravi & Mihalcea, Rada, 1974-
Item Type: Paper
Partner: UNT College of Engineering