Measuring the Interestingness of Articles in a Limited User Environment Prospectus

Physical Description

PDF-file: 45 pages; size: 1.2 Mbytes

Creation Information

Pon, R K. April 18, 2007.

Context

This thesis or dissertation is part of the collection entitled Office of Scientific & Technical Information Technical Reports. It was provided by the UNT Libraries Government Documents Department to the UNT Digital Library, a digital repository hosted by the UNT Libraries. More information about this document can be viewed below.

Who

People and organizations associated with either the creation of this thesis or dissertation or its content.

Author

Pon, R K

Provided By

UNT Libraries Government Documents Department

Serving as both a federal and a state depository library, the UNT Libraries Government Documents Department maintains millions of items in a variety of formats. The department is a member of the FDLP Content Partnerships Program and an Affiliated Archive of the National Archives.

What

Descriptive information to help identify this thesis or dissertation.

Description

Search engines, such as Google, assign scores to news articles based on their relevancy to a query. However, not every article relevant to a query is interesting to a user. For example, an article that is old or yields little new information would be uninteresting. Relevancy scores do not take into account what makes an article interesting, which varies from user to user. Although methods such as collaborative filtering have been shown to be effective in recommendation systems, a limited user environment does not have enough users for collaborative filtering to be effective. I present a general framework, called iScore, for defining and measuring the "interestingness" of articles. The framework incorporates user feedback, including tracking multiple topics of interest as well as finding interesting entities or phrases in a complex relationship network.

I propose and have shown the validity of the following:

1. Filtering based only on topic relevancy is insufficient for identifying interesting articles.
2. No single feature can characterize the interestingness of an article for a user. It is the combination of multiple features that yields higher quality results. For each user, these features have different degrees of usefulness for predicting interestingness.
3. Through user feedback, a classifier can combine features to predict interestingness for the user.
4. Current evaluation corpora, such as TREC, do not capture all aspects of personalized news filtering systems necessary for system evaluation.
5. Focusing only on specific evolving user interests, instead of all topics, allows for more efficient resource utilization while yielding high quality recommendation results.
6. Multiple profile vectors yield significantly better results than traditional methods, such as the Rocchio algorithm, for identifying interesting articles. Additionally, tracking multiple topics as a new feature in iScore can improve iScore's classification performance.
7. Multiple topic tracking (MTT) yields better results than the best results from the last TREC adaptive filtering run.

As future work, I will address the following hypothesis: entities and the relationships among them, extracted with current information extraction technology, can be used to identify entities and relationships of interest, using a scheme such as PageRank. I will also address one of the following two hypotheses:

1. By addressing the multiple reading roles that a single user may have, classification results can be improved.
2. By tailoring the operating parameters of MTT, better classification results can be achieved.
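
The contrast in item 6 between a single Rocchio profile and multiple profile vectors can be illustrated with a small sketch. This is not the prospectus's implementation: the sparse dict representation, the `MultiProfile` class, the `new_topic_threshold` parameter, and the `beta`/`gamma` values are all assumptions chosen for illustration, and the update rule is a simplified incremental form of Rocchio feedback.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rocchio_update(profile, doc, interesting, beta=0.75, gamma=0.15):
    """Simplified incremental Rocchio step (beta and gamma are assumed values).

    Moves the profile toward an interesting document and away from an
    uninteresting one. Mutates and returns `profile`.
    """
    step = beta if interesting else -gamma
    for term, weight in doc.items():
        profile[term] = profile.get(term, 0.0) + step * weight
    return profile


class MultiProfile:
    """Hypothetical multi-profile tracker: one vector per topic of interest."""

    def __init__(self, new_topic_threshold=0.2):
        self.profiles = []
        # Assumed cutoff: below this similarity, an interesting article
        # is treated as the start of a new topic.
        self.new_topic_threshold = new_topic_threshold

    def score(self, doc):
        """Score an article by its similarity to the closest profile."""
        return max((cosine(p, doc) for p in self.profiles), default=0.0)

    def feedback(self, doc, interesting):
        """Update the nearest profile, or open a new one for a new topic."""
        if not self.profiles:
            if interesting:
                self.profiles.append(dict(doc))
            return
        best = max(self.profiles, key=lambda p: cosine(p, doc))
        if interesting and cosine(best, doc) < self.new_topic_threshold:
            self.profiles.append(dict(doc))
        else:
            rocchio_update(best, doc, interesting)
```

Under these assumptions, a single Rocchio profile that receives one interesting article from each of two unrelated topics ends up between the two, so it matches either topic only partially. The multi-profile tracker keeps a separate vector per topic and scores a new article against the closest one.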

Identifier

Unique identifying numbers for this document in the Digital Library or other systems.

  • Report No.: UCRL-TH-230629
  • Grant Number: W-7405-ENG-48
  • Office of Scientific & Technical Information Report Number: 908108
  • Archival Resource Key: ark:/67531/metadc885223

Collections

This document is part of the following collection of related materials.

Office of Scientific & Technical Information Technical Reports

Reports, articles and other documents harvested from the Office of Scientific and Technical Information.

The Office of Scientific and Technical Information (OSTI) is the Department of Energy (DOE) office that collects, preserves, and disseminates DOE-sponsored research and development (R&D) results. These results are the outcomes of R&D projects or other funded activities at DOE labs and facilities nationwide and by grantees at universities and other institutions.

When

Dates and time periods associated with this thesis or dissertation.

Creation Date

  • April 18, 2007

Added to The UNT Digital Library

  • Sept. 22, 2016, 2:13 a.m.

Description Last Updated

  • Nov. 29, 2016, 7:55 p.m.

Pon, R K. Measuring the Interestingness of Articles in a Limited User Environment Prospectus, thesis or dissertation, April 18, 2007; Livermore, California. (digital.library.unt.edu/ark:/67531/metadc885223/: accessed October 19, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT Libraries Government Documents Department.