The Value of Everything: Ranking and Association with Encyclopedic Knowledge

Date: December 2009
Creator: Coursey, Kino High
Description: This dissertation describes WikiRank, an unsupervised method of assigning relative values to elements of a broad-coverage encyclopedic information source in order to identify those entries that may be relevant to a given piece of text. The valuation given to an entry is based not on textual similarity but on the links that associate entries, and on an estimate of the expected frequency of visitation that each entry would receive based on those associations in context. This estimate of relative frequency of visitation is embodied in modifications to the random walk interpretation of the PageRank algorithm. WikiRank is an effective algorithm to support natural language processing applications. First, it is shown to exceed the performance of previous machine learning algorithms for the task of automatic topic identification, providing results comparable to those of human annotators. Second, WikiRank is found useful for the task of recognizing text-based paraphrases on a semantic level, by comparing the distribution of attention generated by two pieces of text using the encyclopedic resource as a common reference. Finally, WikiRank is shown to have the ability to use its base of encyclopedic knowledge to recognize terms from different ontologies as describing the same thing, and thus ...
Contributing Partner: UNT Libraries
An Approach Towards Self-Supervised Classification Using Cyc

Date: December 2006
Creator: Coursey, Kino High
Description: Because manual knowledge entry by human knowledge engineers takes a long time, it is desirable to find methods to automatically acquire knowledge about the world by accessing online information. In this work I examine using the Cyc ontology to guide the creation of Naïve Bayes classifiers to provide knowledge about items described in Wikipedia articles. Given an initial set of Wikipedia articles, the system uses the ontology to create positive and negative training sets for the classifiers in each category. The order in which classifiers are generated and used to test articles is also guided by the ontology. The research conducted shows that a system can be created that uses statistical text classification methods to extract information from an ad hoc generated information source like Wikipedia for use in a formal semantic ontology like Cyc. Benefits and limitations of the system are discussed along with future work.
Contributing Partner: UNT Libraries
Topic Identification Using Wikipedia Graph Centrality

Date: May 2009
Creator: Coursey, Kino High & Mihalcea, Rada, 1974-
Description: This paper presents a method for automatic topic identification using a graph-centrality algorithm applied to an encyclopedic graph derived from Wikipedia. When tested on a data set with manually assigned topics, the system is found to significantly improve over a simpler baseline that does not make use of the external encyclopedic knowledge.
Contributing Partner: UNT College of Engineering
Using Encyclopedic Knowledge for Automatic Topic Identification

Date: May 2009
Creator: Coursey, Kino High; Mihalcea, Rada, 1974- & Moen, William E.
Description: This paper presents a method for automatic topic identification using an encyclopedic graph derived from Wikipedia. The system is found to exceed the performance of previously proposed machine learning algorithms for topic identification, with an annotation consistency comparable to human annotations.
Contributing Partner: UNT College of Engineering
Automatic Keyword Extraction for Learning Object Repositories

Date: October 2008
Creator: Coursey, Kino High; Mihalcea, Rada, 1974- & Moen, William E.
Description: This article discusses automatic keyword extraction for learning object repositories.
Contributing Partner: UNT College of Engineering