Investigating the Extractive Summarization of Literary Novels

Description:

Abstract

Due to the vast amount of information we are faced with, summarization has become a critical necessity of everyday human life. Given that a large fraction of the electronic documents available online and elsewhere consist of short texts such as Web pages, news articles, scientific reports, and others, the focus of natural language processing techniques to date has been on the automation of methods targeting short documents. We are witnessing however a change: an increasingly larger number of books become available in electronic format. This means that the need for language processing techniques able to handle very large documents such as books is becoming increasingly important. This thesis addresses the problem of summarization of novels, which are long and complex literary narratives. While there is a significant body of research that has been carried out on the task of automatic text summarization, most of this work has been concerned with the summarization of short documents, with a particular focus on news stories. However, novels are different in both length and genre, and consequently different summarization techniques are required. This thesis attempts to close this gap by analyzing a new domain for summarization, and by building unsupervised and supervised systems that effectively take into account the properties of long documents, and outperform the traditional extractive summarization systems typically addressing news genre.

Creator(s): Ceylan, Hakan
Creation Date: December 2011
Partner(s):
UNT Libraries
Collection(s):
UNT Theses and Dissertations
Usage:
Total Uses: 149
Past 30 days: 5
Yesterday: 0
Creator (Author):
Publisher Info:
Publisher Name: University of North Texas
Publisher Info: www.unt.edu
Place of Publication: Denton, Texas
Date(s):
  • Creation: December 2011
Description:

Abstract

Due to the vast amount of information we are faced with, summarization has become a critical necessity of everyday human life. Given that a large fraction of the electronic documents available online and elsewhere consist of short texts such as Web pages, news articles, scientific reports, and others, the focus of natural language processing techniques to date has been on the automation of methods targeting short documents. We are witnessing however a change: an increasingly larger number of books become available in electronic format. This means that the need for language processing techniques able to handle very large documents such as books is becoming increasingly important. This thesis addresses the problem of summarization of novels, which are long and complex literary narratives. While there is a significant body of research that has been carried out on the task of automatic text summarization, most of this work has been concerned with the summarization of short documents, with a particular focus on news stories. However, novels are different in both length and genre, and consequently different summarization techniques are required. This thesis attempts to close this gap by analyzing a new domain for summarization, and by building unsupervised and supervised systems that effectively take into account the properties of long documents, and outperform the traditional extractive summarization systems typically addressing news genre.

Degree:
Discipline: Computer Science
Level: Doctoral
PublicationType: Doctoral Dissertation
Language(s):
Subject(s):
Keyword(s): Text summarization | extractive summarization | summarization of literary novels
Contributor(s):
Partner:
UNT Libraries
Collection:
UNT Theses and Dissertations
Identifier:
  • ARK: ark:/67531/metadc103298
Resource Type: Thesis or Dissertation
Format: Text
Rights:
Access: Public
Holder: Ceylan, Hakan
License: Copyright
Statement: Copyright is held by the author, unless otherwise noted. All rights Reserved.