Predicting Subjectivity Orientation of Online Forum Threads
sentences given by Bruce et al., we say that a thread's topic is subjective
if the thread starter seeks the private states of mind of other people, such as
opinions, evaluations, speculations, etc., and non-subjective if the thread
starter seeks factual and/or verifiable information. We call a thread subjective
if its topic of discussion is subjective and non-subjective if it discusses a
non-subjective topic. We assume that discussions in subjective threads use mainly
subjective language, whereas discussions in non-subjective threads use mainly
factual language. We note that there may be cases where this assumption does not
hold; however, analysis of such exceptional cases is not the focus of this paper
and is left for future work.
Problem statement: Given an online forum thread T, classify it into one of
the two classes: subjective (denoted by +1) or non-subjective (denoted by -1).
In this work, we assume that a thread discusses a single topic, which is
specified by the thread starter in the title and the initial post. Analyzing the
subjectivity of threads with multiple topics is a separate research problem that
is outside the scope of this work.
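As a minimal sketch of the input and output assumed by the problem statement, a thread can be represented as a small record with a title, an initial post, and reply posts, paired with a label of +1 or -1. The names `Thread`, `SUBJECTIVE` and `NON_SUBJECTIVE` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List

# Class labels as denoted in the problem statement.
SUBJECTIVE = 1
NON_SUBJECTIVE = -1


@dataclass
class Thread:
    """Hypothetical container for a single-topic forum thread."""
    title: str                      # written by the thread starter
    initial_post: str               # written by the thread starter
    replies: List[str] = field(default_factory=list)  # reply posts, in order


# A labeled example is simply a (thread, label) pair.
example = (
    Thread(title="Which laptop do you prefer?",
           initial_post="Looking for opinions on lightweight laptops."),
    SUBJECTIVE,
)
```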
3.1 Feature Generation
Intuitively, in online forums, threads discussing subjective topics would contain
more subjective sentences than threads discussing non-subjective topics.
Subjective and non-subjective sentences usually differ in vocabulary and
grammatical structure. To capture this intuition, we used words,
parts-of-speech tags and their combinations as the features for classification.
These features have been shown to perform well in other subjectivity analysis
tasks [17,18,19]. We used the Lingua-en-tagger package from CPAN for
part-of-speech tagging. The following features were extracted for each sentence
in the different structural elements (title, initial post, reply posts) of a thread:
- Bag of Words (BoW): all words of a sentence.
- Unigrams + POS tags (BoW+POS): all words of a sentence and their
parts-of-speech tags.
- Unigrams + bigrams (BoW+Bi): all words and sequences of 2
consecutive words in a sentence.
- Unigrams + bigrams + POS tags (BoW+Bi+POS): all words, their
parts-of-speech tags and sequences of 2 consecutive words in a sentence.
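The four feature sets listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the sentence is assumed to be already tokenized and tagged (the paper used a Perl tagger from CPAN), POS tags are assumed to be added as separate tokens with a `POS=` prefix, and bigrams are joined with an underscore. The function names and the token encoding are hypothetical.

```python
from typing import List, Tuple

# A sentence is assumed to arrive as (word, POS tag) pairs from some tagger.
TaggedSentence = List[Tuple[str, str]]

SCHEMES = ("BoW", "BoW+POS", "BoW+Bi", "BoW+Bi+POS")


def words(sent: TaggedSentence) -> List[str]:
    """All words (unigrams) of the sentence."""
    return [w for w, _ in sent]


def pos_tags(sent: TaggedSentence) -> List[str]:
    """POS tags as separate tokens; the 'POS=' prefix is an assumption."""
    return ["POS=" + tag for _, tag in sent]


def bigrams(sent: TaggedSentence) -> List[str]:
    """Sequences of 2 consecutive words, joined with '_' (an assumption)."""
    ws = words(sent)
    return [a + "_" + b for a, b in zip(ws, ws[1:])]


def extract_features(sent: TaggedSentence, scheme: str) -> List[str]:
    """Return the feature tokens of one sentence for the given scheme."""
    if scheme not in SCHEMES:
        raise ValueError("unknown scheme: " + scheme)
    feats = words(sent)
    if "POS" in scheme.split("+"):
        feats += pos_tags(sent)
    if "Bi" in scheme.split("+"):
        feats += bigrams(sent)
    return feats


sentence = [("which", "WDT"), ("phone", "NN"), ("rocks", "VBZ")]
for scheme in SCHEMES:
    print(scheme, extract_features(sentence, scheme))
```

Keeping POS tags and bigrams as ordinary string tokens means every scheme yields a flat token list, so the same counting and weighting step can be applied to all four.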
Table 1 describes feature generation for a sentence containing three words
$W_i$, $W_{i+1}$ and $W_{i+2}$, where $POS_i$, $POS_{i+1}$ and $POS_{i+2}$ are the
parts-of-speech tags for the words $W_i$, $W_{i+1}$ and $W_{i+2}$, respectively.
For feature representation, we used term frequency as the weighting scheme (we
empirically found it to be more effective than tf-idf and binary weighting) and
set the minimum document frequency for a term to 3 (we experimented with minimum
document frequencies of 3, 5 and 10, and 3 gave the best results).
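The weighting scheme described above can be sketched in a few lines. This is a minimal illustration, assuming each document is already a list of feature tokens; the function names are hypothetical, and sorting the vocabulary is a choice made here only to give the vectors a stable column order.

```python
from collections import Counter
from typing import List


def build_vocabulary(docs: List[List[str]], min_df: int = 3) -> List[str]:
    """Keep terms that occur in at least `min_df` documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    return sorted(t for t, n in doc_freq.items() if n >= min_df)


def term_frequency_vector(doc: List[str], vocab: List[str]) -> List[int]:
    """Raw term counts of `doc`, one entry per vocabulary term."""
    counts = Counter(doc)
    return [counts[t] for t in vocab]


docs = [
    ["a", "b", "a"],
    ["a", "c"],
    ["a", "b"],
    ["b", "d"],
]
vocab = build_vocabulary(docs, min_df=3)
vectors = [term_frequency_vector(d, vocab) for d in docs]
```

Here `c` and `d` each occur in only one document, so they fall below the threshold and are dropped from the vocabulary.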
Biyani, Prakhar; Caragea, Cornelia & Mitra, Prasenjit. Predicting Subjectivity Orientation of Online Forum Threads, chapter, March 2013; [Berlin, Germany]. (digital.library.unt.edu/ark:/67531/metadc725770/m1/4/: accessed March 21, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT College of Engineering.