An Approach Towards Self-Supervised Classification Using Cyc

Description:

Due to the long duration required to perform manual knowledge entry by human knowledge engineers it is desirable to find methods to automatically acquire knowledge about the world by accessing online information. In this work I examine using the Cyc ontology to guide the creation of Naïve Bayes classifiers to provide knowledge about items described in Wikipedia articles. Given an initial set of Wikipedia articles the system uses the ontology to create positive and negative training sets for the classifiers in each category. The order in which classifiers are generated and used to test articles is also guided by the ontology. The research conducted shows that a system can be created that utilizes statistical text classification methods to extract information from an ad-hoc generated information source like Wikipedia for use in a formal semantic ontology like Cyc. Benefits and limitations of the system are discussed along with future work.

Creator(s): Coursey, Kino High
Creation Date: December 2006
Partner(s):
UNT Libraries
Collection(s):
UNT Theses and Dissertations
Usage:
Total Uses: 209
Past 30 days: 18
Yesterday: 0
Creator (Author):
Publisher Info:
Publisher Name: University of North Texas
Place of Publication: Denton, Texas
Date(s):
  • Creation: December 2006
  • Digitized: April 11, 2008
Description:

Due to the long duration required to perform manual knowledge entry by human knowledge engineers it is desirable to find methods to automatically acquire knowledge about the world by accessing online information. In this work I examine using the Cyc ontology to guide the creation of Naïve Bayes classifiers to provide knowledge about items described in Wikipedia articles. Given an initial set of Wikipedia articles the system uses the ontology to create positive and negative training sets for the classifiers in each category. The order in which classifiers are generated and used to test articles is also guided by the ontology. The research conducted shows that a system can be created that utilizes statistical text classification methods to extract information from an ad-hoc generated information source like Wikipedia for use in a formal semantic ontology like Cyc. Benefits and limitations of the system are discussed along with future work.

Degree:
Level: Master's
Discipline: Computer Science
Language(s):
Subject(s):
Keyword(s): text classification | Cyc | Wikipedia | Naive Bayes ontologies | machine learning
Contributor(s):
Partner:
UNT Libraries
Collection:
UNT Theses and Dissertations
Identifier:
  • OCLC: 137260795 |
  • ARK: ark:/67531/metadc5470
Resource Type: Thesis or Dissertation
Format: Text
Rights:
Access: Public
License: Copyright
Holder: Coursey, Kino High
Statement: Copyright is held by the author, unless otherwise noted. All rights reserved.