Improving the Search on the Internet by Using WordNet and Lexical Operators Page: 7
5.1 Contextual ranking of word senses
5.1.1 Algorithm 1
Input: semantically untagged word1 - word2 pair (W1 - W2)
Output: ranking the senses of one word
Procedure:
1. Form a similarity list for each sense of one of the words.
Pick one of the words, say W2, and using WordNet, form a similarity list for each sense of
that word. For this, use the words from the synset of each sense of the word. Consider,
for example, that W2 has m senses. This means that W2 appears in m similarity lists:
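$(W_2^{1}, W_2^{1(1)}, W_2^{1(2)}, \ldots, W_2^{1(k_1)})$
$(W_2^{2}, W_2^{2(1)}, W_2^{2(2)}, \ldots, W_2^{2(k_2)})$
$\ldots$
$(W_2^{m}, W_2^{m(1)}, W_2^{m(2)}, \ldots, W_2^{m(k_m)})$
where $W_2^{1}, W_2^{2}, \ldots, W_2^{m}$ are the senses of $W_2$, and $W_2^{i(s)}$ represents synonym number
$s$ of the sense $W_2^{i}$ as defined in WordNet.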
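2. Form $W_1 - W_2^{i(s)}$ pairs. The pairs that may be formed are:
$(W_1 - W_2^{1}, W_1 - W_2^{1(1)}, W_1 - W_2^{1(2)}, \ldots, W_1 - W_2^{1(k_1)})$
$(W_1 - W_2^{2}, W_1 - W_2^{2(1)}, W_1 - W_2^{2(2)}, \ldots, W_1 - W_2^{2(k_2)})$
$\ldots$
$(W_1 - W_2^{m}, W_1 - W_2^{m(1)}, W_1 - W_2^{m(2)}, \ldots, W_1 - W_2^{m(k_m)})$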
3. Search the Internet and rank the senses $W_2^{i(s)}$
A search performed on the Internet for each set of pairs defined above returns a
number of hits indicating how frequently W1 occurs together with that sense
of W2.
Using the operators provided by AltaVista, form a query for each set above. One such
query is:
("$W_1$* $W_2^{i}$*" OR "$W_1$* $W_2^{i(1)}$*" OR "$W_1$* $W_2^{i(2)}$*" OR ... OR "$W_1$* $W_2^{i(k_i)}$*") for all $1 \le i \le m$.
The asterisk (*) is used as a wildcard to increase the number of hits with morphologically
related words. Using such a query, we get the number of hits for each sense i of W2,
and this provides a ranking of the m senses of W2 as they relate to W1.
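
The three steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the synsets are passed in by hand instead of being read from WordNet, the function names (similarity_lists, form_pairs, build_query, rank_senses) are hypothetical, and hit_count is a caller-supplied stand-in for whatever search service returns the number of hits for a query. No real search engine is contacted.

```python
from typing import Callable, List, Tuple


def similarity_lists(word: str, synsets: List[List[str]]) -> List[List[str]]:
    """Step 1: build one similarity list per sense of `word`.

    Each list starts with the word itself, followed by the other
    members of that sense's synset (its synonyms).
    """
    lists = []
    for synset in synsets:
        synonyms = [s for s in synset if s != word]
        lists.append([word] + synonyms)
    return lists


def form_pairs(w1: str, sim_lists: List[List[str]]) -> List[List[Tuple[str, str]]]:
    """Step 2: pair W1 with every entry of every similarity list."""
    return [[(w1, w) for w in sim_list] for sim_list in sim_lists]


def build_query(pairs: List[Tuple[str, str]]) -> str:
    """Step 3 (query): OR together quoted phrases for one sense.

    A trailing * follows each word, mirroring the wildcard usage
    described in the text.
    """
    phrases = ['"{}* {}*"'.format(a, b) for a, b in pairs]
    return "(" + " OR ".join(phrases) + ")"


def rank_senses(
    w1: str,
    w2: str,
    w2_synsets: List[List[str]],
    hit_count: Callable[[str], int],
) -> List[Tuple[int, int]]:
    """Step 3 (ranking): return (sense_number, hits), most hits first.

    `hit_count` is an assumed callable mapping a query string to a
    number of hits. Sense numbers are 1-based; ties keep sense order.
    """
    pair_sets = form_pairs(w1, similarity_lists(w2, w2_synsets))
    scored = [
        (sense, hit_count(build_query(pairs)))
        for sense, pairs in enumerate(pair_sets, start=1)
    ]
    return sorted(scored, key=lambda item: -item[1])
```

For instance, with made-up synsets for a word W2 (not actual WordNet entries), calling rank_senses("read", "report", [["report", "study"], ["report", "account"]], hit_count) issues one query per sense and orders the two senses by the counts that hit_count returns. Passing the hit counter as a parameter keeps the sketch independent of any particular search engine.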
A similar algorithm is used to rank the senses of W1 while keeping W2 constant
(undisambiguated). Since these two procedures are done over a large corpus (the Internet), and
with the help of similarity lists, there is little correlation between the results produced by the
two procedures.
Procedure Evaluation. This method was tested on 384 pairs: 200 verb-noun, 127
adjective-noun, and 57 adverb-verb, extracted from the first text of SemCor 1.6 from
the Brown corpus. Using the query form presented above on AltaVista, we obtained the
results shown in Table 1. The table indicates the percentages of correct senses (as given by
SemCor) ranked by us in the top 1, top 2, top 3, and top 4 of our list. We concluded that by
keeping the top four choices for verbs and nouns and the top two choices for adjectives and
adverbs, we cover all relevant senses with a high percentage (mid and upper 90s). Looking
Moldovan, Dan I. & Mihalcea, Rada, 1974-. Improving the Search on the Internet by Using WordNet and Lexical Operators, article, July 21, 1999; [New York, New York]. (https://digital.library.unt.edu/ark:/67531/metadc83306/m1/7/: accessed April 24, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT College of Engineering.