Topic Identification Using Wikipedia Graph Centrality

Date: May 2009
Creator: Coursey, Kino High & Mihalcea, Rada, 1974-
Description: This paper presents a method for automatic topic identification using a graph-centrality algorithm applied to an encyclopedic graph derived from Wikipedia. When tested on a data set with manually assigned topics, the system is found to significantly improve over a simpler baseline that does not make use of the external encyclopedic knowledge.
Contributing Partner: UNT College of Engineering
Using Wikipedia for Automatic Word Sense Disambiguation

Date: April 2007
Creator: Mihalcea, Rada, 1974-
Description: This paper describes a method for generating sense-tagged data using Wikipedia as a source of sense annotations. Through word sense disambiguation experiments, the authors show that the Wikipedia-based sense annotations are reliable and can be used to construct accurate sense classifiers.
Contributing Partner: UNT College of Engineering
The Value of Everything: Ranking and Association with Encyclopedic Knowledge

Date: December 2009
Creator: Coursey, Kino High
Description: This dissertation describes WikiRank, an unsupervised method of assigning relative values to elements of a broad coverage encyclopedic information source in order to identify those entries that may be relevant to a given piece of text. The valuation given to an entry is based not on textual similarity but instead on the links that associate entries, and an estimation of the expected frequency of visitation that would be given to each entry based on those associations in context. This estimation of relative frequency of visitation is embodied in modifications to the random walk interpretation of the PageRank algorithm. WikiRank is an effective algorithm to support natural language processing applications. First, it is shown to exceed the performance of previous machine learning algorithms for the task of automatic topic identification, providing results comparable to those of human annotators. Second, WikiRank is found useful for the task of recognizing text-based paraphrases on a semantic level, by comparing the distribution of attention generated by two pieces of text using the encyclopedic resource as a common reference. Finally, WikiRank is shown to have the ability to use its base of encyclopedic knowledge to recognize terms from different ontologies as describing the same thing, and thus ...
Contributing Partner: UNT Libraries
Measuring Semantic Relatedness Using Salient Encyclopedic Concepts

Date: August 2011
Creator: Hassan, Samer
Description: While pragmatics, through its integration of situational awareness and real world relevant knowledge, offers a high level of analysis that is suitable for real interpretation of natural dialogue, semantics, on the other hand, represents a lower yet more tractable and affordable linguistic level of analysis using current technologies. Generally, the understanding of semantic meaning in literature has revolved around the famous quote "You shall know a word by the company it keeps". In this thesis we investigate the role of context constituents in decoding the semantic meaning of the engulfing context; specifically we probe the role of salient concepts, defined as content-bearing expressions which afford encyclopedic definitions, as a suitable source of semantic clues to an unambiguous interpretation of context. Furthermore, we integrate this world knowledge in building a new and robust unsupervised semantic model and apply it to measure semantic relatedness between textual pairs, whether they are words, sentences or paragraphs. Moreover, we explore the abstraction of semantics across languages and use our findings to build a novel multi-lingual semantic relatedness model exploiting information acquired from various languages. We demonstrate the effectiveness and the superiority of our mono-lingual and multi-lingual models through a comprehensive set of evaluations on specialized ...
Contributing Partner: UNT Libraries
An Approach Towards Self-Supervised Classification Using Cyc

Date: December 2006
Creator: Coursey, Kino High
Description: Because manual knowledge entry by human knowledge engineers takes a long time, it is desirable to find methods to automatically acquire knowledge about the world by accessing online information. In this work I examine using the Cyc ontology to guide the creation of Naïve Bayes classifiers to provide knowledge about items described in Wikipedia articles. Given an initial set of Wikipedia articles, the system uses the ontology to create positive and negative training sets for the classifiers in each category. The order in which classifiers are generated and used to test articles is also guided by the ontology. The research conducted shows that a system can be created that uses statistical text classification methods to extract information from an ad-hoc generated information source like Wikipedia for use in a formal semantic ontology like Cyc. Benefits and limitations of the system are discussed along with future work.
Contributing Partner: UNT Libraries
Wikify! Linking Documents to Encyclopedic Knowledge

Date: November 2007
Creator: Mihalcea, Rada, 1974- & Csomai, Andras
Description: This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art results on both these tasks.
Contributing Partner: UNT College of Engineering
Cross-lingual Semantic Relatedness Using Encyclopedic Knowledge

Date: August 2009
Creator: Hassan, Samer & Mihalcea, Rada, 1974-
Description: This paper discusses cross-lingual semantic relatedness using encyclopedic knowledge.
Contributing Partner: UNT College of Engineering
Linking Educational Materials to Encyclopedic Knowledge

Date: July 2007
Creator: Csomai, Andras & Mihalcea, Rada, 1974-
Description: This paper discusses linking educational materials to encyclopedic knowledge.
Contributing Partner: UNT College of Engineering