One-liners:
    Take my advice; I don't use it anyway.
    I get enough exercise just pushing my luck.
    Beauty is in the eye of the beer holder.
Reuters titles:
    Trocadero expects tripling of revenues.
    Silver fixes at two-month high, but gold lags.
    Oil prices slip as refiners shop for bargains.
BNC sentences:
    They were like spirits, and I loved them.
    I wonder if there is some contradiction here.
    The train arrives three minutes early.
Proverbs:
    Creativity is more important than knowledge.
    Beauty is in the eye of the beholder.
    I believe no tales from an enemy's tongue.

Table 1: Sample examples of one-liners, Reuters titles, BNC sentences, and proverbs.
the humor-recognition task more difficult and thus more realistic. We do not want the automatic classifiers to learn to distinguish between humorous and non-humorous examples based simply on text length or vocabulary differences. Instead, we seek to force the classifiers to identify humor-specific features, by supplying them with negative examples that are similar in most of their aspects to the positive examples, but different in their comic effect.
Structural similarity was enforced by requiring that each example in the non-humorous data set follow the same length restriction as the one-liners: one sentence with an average length of 10-15 words. Composition similarity was sought by trying to identify examples similar to the one-liners with respect to their creativity and intent.
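As a rough illustration of the length constraint above, candidate negative examples could be filtered by word count. The sketch below is not the authors' pipeline; the per-sentence bounds and example sentences are assumptions for illustration only, since the text only specifies an average length of 10-15 words.

    # Minimal sketch (not the authors' actual pipeline): filter candidate
    # negative examples so that they match the one-liners' length profile.
    # The text only states "one sentence with an average length of 10-15 words";
    # the per-sentence bounds used here are an illustrative assumption.

    MIN_WORDS, MAX_WORDS = 10, 15   # assumed per-sentence bounds

    def length_ok(sentence: str) -> bool:
        """Crude whitespace word count; a real setup would use a proper tokenizer."""
        return MIN_WORDS <= len(sentence.split()) <= MAX_WORDS

    candidates = [
        "Short headline about markets.",                                               # 4 words -> rejected
        "The committee agreed to postpone the final decision until early next year.",  # 12 words -> kept
    ]
    negatives = [s for s in candidates if length_ok(s)]
    print(negatives)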
We tested three different sets of negative examples:
1. Reuters titles, extracted from news articles published
in the Reuters newswire over a period of one year
(8/20/1996 - 8/19/1997) [Lewis et al., 2004]. The ti-
tles consist of short sentences with simple syntax, and
are often phrased to catch the readers' attention (an
effect similar to the one rendered by one-liners).
2. Proverbs, manually extracted from an "English proverb collection." Proverbs are sayings that transmit, usually in one short sentence, important facts or experiences that are considered true by many people. Their property of being condensed but memorable sayings makes them very similar to the one-liners. In fact, some one-liners attempt to imitate proverbs, but with a comic effect, e.g. "Beauty is in the eye of the beer holder", derived from "Beauty is in the eye of the beholder".
3. British National Corpus (BNC) sentences, selected at random from the BNC corpus and covering different styles, genres and domains. Unlike the Reuters titles or the proverbs, the BNC sentences typically have no added creativity and no specific intent. However, we decided to add this set of negative examples to our experimental setting, in order to observe the level of difficulty of the humor-recognition task when performed against simple text.
Table 1 shows three examples from each data set, to
illustrate their structure and composition.
The "400HS" and "40000HS" Data Sets
To summarize, two data sets were built and used in the
experiments: (1) a small set that emphasizes the quality
aspect of the data, for which the one-liners were manu-
ally selected; and (2) a very large set automatically ex-
tracted using a Web-based bootstrapping process, em-
phasizing the quantity aspect of the data, including a
small fraction of potentially noisy examples.
* The "400HS" data set. In this set, the positive ex-
amples consist of 200 one-liners that were manually
collected, and thus are guaranteed to be "clean" hu-
morous examples. The set of negative examples con-
sist of one of the following sets: (1) 200 Reuters titles;
(2) 200 sentences randomly selected from BNC; (3)
* The "40000HS" data set. The positive examples in
this set consist of 20,000 one-liners automatically iden-
tified on the Web using the bootstrapping method il-
lustrated earlier. Since the collection process was au-
tomatic, noisy entries are also possible. Manual verifi-
cation of a randomly selected sample of 200 one-liners
resulted into the identification of 18 noisy entries, in-
dicating an average of 9% potential noise in the data
set, which is within reasonable limits. The negative
examples are drawn from: (1) Reuters titles; or (2)
BNC sentences. Since the collection of proverbs that
we could obtain was relatively small, this type of neg-
ative examples was not included in the large data ex-
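A minimal sketch of how one such balanced set could be assembled, assuming the one-liners and the negative examples are available as plain-text files with one sentence per line (the file names below are hypothetical, not the authors' resources):

    # Minimal sketch: assemble a balanced humorous / non-humorous data set in the
    # spirit of the "400HS" set (200 one-liners + 200 negative examples).
    # File names are hypothetical; each file is assumed to hold one sentence per line.
    import random

    def load_sentences(path, n=None):
        with open(path, encoding="utf-8") as f:
            sentences = [line.strip() for line in f if line.strip()]
        random.shuffle(sentences)
        return sentences[:n] if n is not None else sentences

    positives = load_sentences("one_liners.txt", n=200)       # label 1: humorous
    negatives = load_sentences("reuters_titles.txt", n=200)   # label 0: non-humorous

    data = [(s, 1) for s in positives] + [(s, 0) for s in negatives]
    random.shuffle(data)
    texts, labels = zip(*data)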
Algorithms for Text Classification
A large body of algorithms has previously been tested on text classification problems, due in part to the fact that text categorization is one of the testbeds of choice for machine learning. In the classification experiments we present here, we compare results obtained with two frequently used text classifiers, Naive Bayes and Support Vector Machines, selected based on their performance in previously reported work, and for the diversity of their learning methodologies.
Naive Bayes. The basic idea in a Naive Bayes text classifier is to estimate the probability of a category given a document, using the joint probabilities of words and documents, under the simplifying assumption that word occurrences are independent given the category.
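For illustration only, a bag-of-words classifier of this kind can be put together in a few lines with scikit-learn; the toy data below reuses examples from Table 1 and does not reflect the authors' actual implementation.

    # Minimal sketch of a bag-of-words Naive Bayes text classifier, using
    # scikit-learn as an illustration; this is not the authors' implementation.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Tiny toy data drawn from Table 1; in the actual experiments the texts
    # would be the full sets of one-liners and negative examples described above.
    texts = [
        "Take my advice; I don't use it anyway.",
        "Beauty is in the eye of the beer holder.",
        "Oil prices slip as refiners shop for bargains.",
        "The train arrives three minutes early.",
    ]
    labels = [1, 1, 0, 0]   # 1 = humorous, 0 = non-humorous

    classifier = make_pipeline(CountVectorizer(), MultinomialNB())
    classifier.fit(texts, labels)
    print(classifier.predict(["Beauty is in the eye of the beholder."]))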