The Decomposition of Human-Written Book Summaries Page: 584
This paper is part of the collection entitled: UNT Scholarly Works and was provided to UNT Digital Library by the UNT College of Engineering.
584 H. Ceylan and R. Mihalcea
Determining whether a phrase in a summary sentence is a result of lexical
paraphrasing or a generalization/specification of phrase(s) of the original text is
a difficult problem, and it is omitted in both our and Jing & McKeown's studies.
Hence, in the decomposition algorithm, we only consider the sentence reduction,
sentence combination, syntactic transformation, and reordering operations.
The sentence reduction operation refers to extracting a sentence from the
original source and then removing certain words or phrases from it. Sentence
combination is the process of combining two or more sentences from the original
source and merging them into one sentence. Note that it is possible to combine
only parts of the sentences, hence this operation is often used together with the
sentence reduction operation. Syntactic transformation refers to the modification of
the syntactic structure of a sentence, such as word reordering or passive
transformations. Finally, the reordering operation is concerned with the position of
the sentence in the summary with respect to the sentences in the original text
that are used to construct it.
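To make the first operations concrete, the sketch below tests them on tokenized sentences. It assumes exact word matches, with no stemming or paraphrase handling (which the text above explicitly sets aside). A summary sentence produced by sentence reduction alone is a subsequence of its source sentence. Once words are also moved by a syntactic transformation, only the weaker condition that every word is drawn from the source still holds. The function names `is_reduction` and `is_reordered_reduction` are hypothetical and are not taken from the authors' implementation.

```python
from collections import Counter


def is_reduction(source: list[str], summary: list[str]) -> bool:
    """True if `summary` can be obtained from `source` by deleting words only,
    i.e. `summary` is a subsequence of `source` (word order preserved)."""
    it = iter(source)
    # `word in it` consumes the iterator up to and including the first match,
    # so each search resumes after the previously matched word.
    return all(word in it for word in summary)


def is_reordered_reduction(source: list[str], summary: list[str]) -> bool:
    """True if every summary word comes from `source` (each used at most as
    often as it occurs there), regardless of order: reduction plus reordering."""
    # Counter subtraction drops non-positive counts, so an empty result means
    # no summary word is used more often than it appears in the source.
    return not (Counter(summary) - Counter(source))
```

For example, with the source sentence "the old man walked slowly to the river", the summary "the man walked to the river" passes both checks, while "to the river the man walked" passes only the second.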
3.2 Problem Formulation
Based on the assumptions discussed in the previous section, the sentence
decomposition problem translates into finding the words of a summary sentence
inside the original text. If the words come from a single sentence, then we can
conclude that either the original sentence is included as-is in the summary, or
that the sentence reduction operation is used to eliminate some of the words.
If the position of the words is changed with respect to the original sentence, then
syntactic transformations are also involved. Further, if some of the words come
from different sentences, then we can conclude that the sentence combination
operation is used.
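The reasoning in the paragraph above can be read as a set of rules over the document positions (P, L) of a summary sentence's words, where P is the source sentence and L the word's position in it. The following is a minimal sketch that assumes a document position has already been chosen for every summary word, that positions are 1-indexed, and that the length of each source sentence is known. `infer_operations` and its `sentence_lengths` parameter are hypothetical names. The summary-level reordering operation is not covered, since it depends on the positions of several summary sentences rather than one.

```python
def infer_operations(positions: list[tuple[int, int]],
                     sentence_lengths: dict[int, int]) -> list[str]:
    """positions: (P, L) for each summary word, in summary order.
    sentence_lengths: hypothetical mapping P -> word count of source sentence P.
    Returns the inferred operations; an empty list means copied as-is."""
    ops = set()
    # Words drawn from more than one source sentence imply combination.
    if len({p for p, _ in positions}) > 1:
        ops.add("sentence combination")
    # Group word positions by source sentence, keeping summary order.
    by_sentence: dict[int, list[int]] = {}
    for p, l in positions:
        by_sentence.setdefault(p, []).append(l)
    for p, ls in by_sentence.items():
        # Positions not strictly increasing: words were moved.
        if any(b <= a for a, b in zip(ls, ls[1:])):
            ops.add("syntactic transformation")
        # Not every word of the source sentence used exactly once: words removed.
        if sorted(ls) != list(range(1, sentence_lengths[p] + 1)):
            ops.add("sentence reduction")
    return sorted(ops)
```

An empty result corresponds to a sentence copied as-is, while partial use of two source sentences yields both combination and reduction, matching the observation that the two operations are often used together.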
Thus the problem is formulated as follows. Each summary sentence is represented
as a sequence of words $(w_1, w_2, \ldots, w_n)$, and each word $w_i$ can be
represented as a set $S_i$ of tuples $(P, L)$, where $P$ is the position of the sentence
in the source document that contains $w_i$, and $L$ is the position of $w_i$ within
the sentence. The tuple $(P, L)$ is also referred to as the document position of
the word. For example, the tuple $(4, 12)$ for a word $w$ means that $w$ appears in
the 12th position of the 4th sentence of the source document. Hence, for each
summary sentence, there are $M = \prod_{i=1}^{n} |S_i|$ possible ways to compose it, where
$|S_i|$ denotes the cardinality of the set $S_i$. We are interested in finding the set
that will select the most likely document position for each word. The next section
describes the algorithm that attempts to do this task in an efficient way.
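The candidate sets $S_i$ and the count $M$ can be computed directly from a tokenized source document. The sketch below is illustrative only: it assumes exact, case-sensitive token matching and 1-indexed positions, and the helper names `build_position_sets` and `count_compositions` are hypothetical. A word that never occurs in the source gets an empty set here, which makes $M = 0$; how the original algorithm treats such words is not stated in this excerpt.

```python
from math import prod


def build_position_sets(source_sentences: list[list[str]],
                        summary: list[str]) -> list[set[tuple[int, int]]]:
    """Return S_i, the set of document positions (P, L), for each summary word."""
    index: dict[str, set[tuple[int, int]]] = {}
    for p, sentence in enumerate(source_sentences, start=1):
        for l, word in enumerate(sentence, start=1):
            index.setdefault(word, set()).add((p, l))
    # Unseen words map to an empty set (an assumption of this sketch).
    return [index.get(word, set()) for word in summary]


def count_compositions(position_sets: list[set[tuple[int, int]]]) -> int:
    """M = product of |S_i| over all words of the summary sentence."""
    return prod(len(s) for s in position_sets)
```

With the source sentences "the cat sat" and "the dog ran" and the summary "the dog sat", the word "the" has two candidate positions, (1, 1) and (2, 1), while "dog" and "sat" have one each, so $M = 2$. Because $M$ is a product over all words, frequent words multiply the number of candidate compositions, which motivates searching the candidates efficiently rather than enumerating them.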
3.3 Algorithm
The relation between the document position of a word and the position of the
preceding words can be used to assign a likelihood probability to the current
document position. This likelihood can be estimated using an N-gram model. In
our study, we follow [4] and use a bigram model. Hence, we specify the probability
$P(w_i = (P, L)_j \mid w_{i-1} = (P, L)_k)$, where $(P, L)_j \in S_i$ and $(P, L)_k \in S_{i-1}$, as the
Ceylan, Hakan & Mihalcea, Rada, 1974-. The Decomposition of Human-Written Book Summaries, paper, March 2009; [Berlin, Germany]. (https://digital.library.unt.edu/ark:/67531/metadc31018/m1/3/: accessed April 23, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT College of Engineering.