Date: March 2009
Creator: Mihalcea, Rada & Pulman, Stephen
Description: In this paper, the authors propose a method for "linguistic ethnography": a general mechanism for characterizing texts with respect to the dominance of certain classes of words. Using humor as a case study, they explore the automatic learning of salient word classes, including semantic classes (e.g., person, animal), psycholinguistic classes (e.g., tentative, cause), and affective load (e.g., anger, happiness). They measure the reliability of the derived word classes and their associated dominance scores by showing significant correlation across different corpora.
Contributing Partner: UNT College of Engineering