…tached to noun modifiers, most notably the Location label. For example, given the reference answer fragment The water on the floor had a much larger surface area, one of the facets extracted was Location on(water, floor).
We refer to facets that express relations between higher-level propositions as inter-propositional facets. An example of such a facet is (1e) above, connecting the proposition the brass ring did not stick to the nail to the proposition the ring is not iron. In addition to specifying the headwords of inter-propositional facets (stick and is, in 1e), we also note up to two key facets from each of the propositions that the relation is connecting (b, c, and d in example 1). Reference answer facets that are assumed to be understood by the learner a priori (e.g., because they are part of the question) are also annotated to indicate this.
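For concreteness, the facet representation described above could be encoded roughly as follows; this is a minimal sketch in Python, and the class and field names are our own rather than the paper's:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Facet:
    """One reference answer facet: a labeled relation between two headwords."""
    relation: str                  # e.g., "Location", "Cause", "NMod"
    governor: str                  # headword of the governing term, e.g., "water"
    dependent: str                 # headword of the dependent term, e.g., "floor"
    a_priori: bool = False         # assumed understood a priori (e.g., part of the question)
    inter_propositional: bool = False
    key_facets: List["Facet"] = field(default_factory=list)  # up to two key facets per connected proposition

# The Location facet extracted from the fragment
# "The water on the floor had a much larger surface area":
water_on_floor = Facet(relation="Location", governor="water", dependent="floor")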
There were a total of 2878 reference answer facets, resulting in a mean of 10 facets per answer (median 8). Facets that were assumed to be understood a priori by students accounted for 33% of all facets, and inter-propositional facets accounted for 11%. The results of automated annotation of student answers (section 3) focus on the facets that are not assumed to be understood a priori (67% of all facets); of these, 12% are inter-propositional.
A total of 36 different facet relation types were utilized. The majority, 21, are VerbNet thematic roles. Direction, Manner, and Purpose are PropBank adjunctive argument labels (Palmer et al., 2005). Quantifier, Means, Cause-to-Know, and copulas were added to the preceding roles. Finally, anything that did not fit into the above categories retained its dependency parse type: VMod (Verb Modifier), NMod (Noun Modifier), AMod (Adjective or Adverb Modifier), and Root (Root was used when a single word in the answer, typically yes, no, agree, disagree, A-D, etc., stood alone without a significant relation to the remainder of the reference answer; this occurred only 21 times, accounting for fewer than 1% of the reference answer facets). The seven highest-frequency relations are NMod, Theme, Cause, Be, Patient, AMod, and Location, which together account for 70% of the reference answer facet relations.
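A partial grouping of the labels named above, sketched as a Python mapping; this lists only the labels mentioned in this section, not the full 36-label inventory, and the grouping of "Be" with the added roles is our assumption:

# Partial grouping of facet relation labels named in this section.
FACET_RELATION_GROUPS = {
    "verbnet_thematic_roles": {"Theme", "Patient", "Cause", "Location"},  # 21 roles in total
    "propbank_adjuncts": {"Direction", "Manner", "Purpose"},
    "added_roles": {"Quantifier", "Means", "Cause-to-Know", "Be"},  # "Be" presumably the copula label
    "dependency_fallbacks": {"VMod", "NMod", "AMod", "Root"},
}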
2.2 Student Answer Annotation
For each student answer, we annotated each reference answer facet to indicate whether and how the student addressed that facet. We settled on the five annotation categories in Table 1. These labels and the annotation process are detailed in (Nielsen et al., 2008b).
Understood: Reference answer facets directly expressed or whose understanding is inferred
Contradiction: Reference answer facets contradicted by negation, antonymous expressions, pragmatics, etc.
Self-Contra: Reference answer facets that are both contradicted and implied (self-contradictions)
Diff-Arg: Reference answer facets whose core relation is expressed, but it has a different modifier or argument
Unaddressed: Reference answer facets that are not addressed at all by the student's answer

Table 1. Facet Annotation Labels
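A minimal sketch of these five annotation categories as a Python enumeration; the encoding is ours, purely for illustration:

from enum import Enum

class FacetLabel(Enum):
    """The five facet-level annotation categories from Table 1."""
    UNDERSTOOD = "Understood"        # expressed directly or inferable
    CONTRADICTION = "Contradiction"  # contradicted by negation, antonyms, pragmatics, ...
    SELF_CONTRA = "Self-Contra"      # both contradicted and implied
    DIFF_ARG = "Diff-Arg"            # core relation expressed with a different argument or modifier
    UNADDRESSED = "Unaddressed"      # not addressed by the student's answer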
3 Automated Classification
As partial validation of this knowledge representation, we present results of an automatic assessment of our student answers. We start with the hand-generated reference answer facets. We generate automatic parses for the reference answers and the student answers and automatically modify these parses to match our desired representation. Then, for each reference answer facet, we extract features indicative of the student's understanding of that facet. Finally, we train a machine learning classifier on training data and use it to classify unseen test examples, assigning a Table 1 label to each reference answer facet.
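A schematic sketch of this assessment pipeline, assuming hypothetical parse, representation-rewriting, and feature-extraction helpers; the classifier and scikit-learn machinery are stand-ins, since this section does not name the learning algorithm used:

from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier

def assess_student_answers(train_pairs, test_pairs,
                           parse, to_facet_representation, extract_features):
    """Sketch of the facet-level assessment pipeline described above.

    Each item in train_pairs/test_pairs is (reference_facet, student_answer, label).
    `parse`, `to_facet_representation`, and `extract_features` are hypothetical
    helpers: parse an answer, rewrite the parse into the facet representation,
    and build a feature dict for one (reference facet, student answer) pair.
    """
    def featurize(pairs):
        X, y = [], []
        for facet, student_answer, label in pairs:
            student_repr = to_facet_representation(parse(student_answer))
            X.append(extract_features(facet, student_repr))
            y.append(label)  # one of the five Table 1 labels
        return X, y

    X_train, y_train = featurize(train_pairs)
    X_test, y_test = featurize(test_pairs)

    # DictVectorizer one-hot encodes symbolic features; the classifier is a stand-in.
    model = make_pipeline(DictVectorizer(), RandomForestClassifier())
    model.fit(X_train, y_train)
    return model.predict(X_test), y_test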
We used a variety of linguistic features that assess the facets' similarity via lexical entailment probabilities following (Glickman et al., 2005), part-of-speech tags, and lexical stem matches. They include information extracted from modified dependency parses, such as relevant relation types and path edit distances. Revised dependency parses are used to align the terms and facet-level information for feature extraction. Remaining details can be found in (Nielsen et al., 2008a) and are not central to the semantic representation focus of this paper.
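A rough sketch of the kinds of per-facet features just described, reusing the Facet sketch above; the feature names and the `lexical_entailment_prob` and `path_edit_distance` helpers are illustrative assumptions, not the paper's actual feature set:

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

def facet_features(facet, aligned_student_terms,
                   lexical_entailment_prob, path_edit_distance):
    """Toy feature dict for one reference answer facet.

    `aligned_student_terms` maps each reference headword to the student-answer
    term it was aligned to (or None); the two callables are hypothetical
    helpers in the spirit of Glickman et al. (2005) entailment probabilities
    and dependency-path edit distances.
    """
    feats = {}
    for role in ("governor", "dependent"):
        ref_word = getattr(facet, role)
        stu_word = aligned_student_terms.get(ref_word)
        # lexical stem match between the reference headword and its aligned term
        feats[f"{role}_stem_match"] = int(
            stu_word is not None and stemmer.stem(ref_word) == stemmer.stem(stu_word)
        )
        # lexical entailment probability of the reference word given the student word
        feats[f"{role}_entail_prob"] = (
            lexical_entailment_prob(stu_word, ref_word) if stu_word else 0.0
        )
    feats["relation_type"] = facet.relation
    feats["dep_path_edit_distance"] = path_edit_distance(facet, aligned_student_terms)
    return feats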
Current classification accuracy, assigning a Table 1 label to each reference answer facet to indicate the student's expressed understanding, is 79% within domain (assessing unseen answers to questions associated with the training data) and 69% out of domain (assessing answers to questions regarding entirely different science subjects). These results are 26% and 15% over the majority class baselines, respectively, and 21% and 6% over lexi-