Supervised Keyphrase Extraction as Positive Unlabeled Learning

PDF Version Also Available for Download.

Description

This paper shows that the performance of trained keyphrase extractors approximates that of a classifier trained on articles labeled by multiple annotators, leading to higher average F₁ scores and better rankings of keyphrases.

Physical Description

6 p.

Creation Information

Sterckx, Lucas; Caragea, Cornelia; Demeester, Thomas & Develder, Chris. November 2016.

Context

This article is part of the collection entitled UNT Scholarly Works and was provided by the UNT College of Engineering to the UNT Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 15 times. More information about this article can be viewed below.

Who

People and organizations associated with either the creation of this article or its content.

Authors

  • Sterckx, Lucas
  • Caragea, Cornelia
  • Demeester, Thomas
  • Develder, Chris

Publisher

  • Association for Computational Linguistics (Stroudsburg, Pennsylvania)

Provided By

UNT College of Engineering

The UNT College of Engineering strives to educate and train engineers and technologists who have the vision to recognize and solve the problems of society. The college comprises six degree-granting departments of instruction and research.

What

Descriptive information to help identify this article. Follow the links below to find similar items on the Digital Library.

Notes

Abstract: The problem of noisy and unbalanced training data for supervised keyphrase extraction results from the subjectivity of keyphrase assignment, which we quantify by crowdsourcing keyphrases for news and fashion magazine articles with many annotators per document. We show that annotators exhibit substantial disagreement, meaning that single-annotator data could lead to very different training sets for supervised keyphrase extractors. Thus, annotations from single authors or readers lead to noisy training data and poor extraction performance of the resulting supervised extractor. We provide a simple but effective solution to still work with such data by reweighting the importance of unlabeled candidate phrases in a two-stage Positive Unlabeled Learning setting. We show that the performance of trained keyphrase extractors approximates that of a classifier trained on articles labeled by multiple annotators, leading to higher average F₁ scores and better rankings of keyphrases. We apply this strategy to a variety of test collections from different backgrounds and show improvements over strong baseline models.
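
The reweighting idea in the abstract can be made concrete with a short sketch. The Python snippet below is a generic two-stage Positive Unlabeled Learning setup with per-instance reweighting, not the authors' implementation: the classifier choice (logistic regression), the exact weighting rule, and the candidate-phrase feature matrix are all assumptions made for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_pu_keyphrase_classifier(X, y):
        # X: feature matrix of candidate phrases (one row per candidate).
        # y: numpy array with 1 = annotated keyphrase, 0 = unlabeled
        #    candidate (which may still be a valid keyphrase).

        # Stage 1: naively treat unlabeled candidates as negatives.
        stage1 = LogisticRegression(max_iter=1000)
        stage1.fit(X, y)

        # Reweight: labeled positives keep full weight; unlabeled
        # candidates that stage 1 scores as likely keyphrases are
        # down-weighted as negatives, since they are probable false
        # negatives in the training data.
        p_keyphrase = stage1.predict_proba(X)[:, 1]
        weights = np.where(y == 1, 1.0, 1.0 - p_keyphrase)

        # Stage 2: retrain with the adjusted instance weights.
        stage2 = LogisticRegression(max_iter=1000)
        stage2.fit(X, y, sample_weight=weights)
        return stage2

At test time, the candidate phrases of a new document would be scored with stage2.predict_proba and ranked, with the top-ranked candidates returned as keyphrases.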

Source

  • 2016 Conference on Empirical Methods in Natural Language Processing, November 1-5, 2016, Austin, Texas.

Language

  • English

Item Type

  • Article

Publication Information

  • Publication Title: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
  • Pages: 1924-1929
  • Peer Reviewed: Yes

Collections

This article is part of the following collection of related materials.

UNT Scholarly Works

Materials from the UNT community's research, creative, and scholarly activities and UNT's Open Access Repository. Access to some items in this collection may be restricted.

When

Dates and time periods associated with this article.

Creation Date

  • November 2016

Added to The UNT Digital Library

  • Aug. 29, 2017, 9:38 a.m.

Sterckx, Lucas; Caragea, Cornelia; Demeester, Thomas & Develder, Chris. Supervised Keyphrase Extraction as Positive Unlabeled Learning, article, November 2016; Stroudsburg, Pennsylvania. (digital.library.unt.edu/ark:/67531/metadc991023/: accessed November 21, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT College of Engineering.