Automatic Language Identification for Metadata Records: Measuring the Effectiveness of Various Approaches Page: 66
v, 92 pages : illustrations (some color)
Summary of Results
Note that the highest-scoring approach is the modified N-gram frequency profile and distance
measure. Recall that this modification removes replicated blanks from the N-grams and
considers inter-word N-grams of all sizes, i.e., N = 1-5. In most cases, a misclassified text is
assigned to a language closely related to its actual language, consistent with Hornik et al.
(2013), who describe the clustering of related languages such as the Scandinavian languages.
Of note is the identical accuracy of all approaches for the Danish classification. Every
misclassified Danish text, in all approaches, is classified as Swedish, which is, like Danish,
a Scandinavian language.
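
To make the modification described above concrete, here is a minimal Python sketch of an N-gram frequency profile with a rank-order distance. It is an illustration under stated assumptions, not the dissertation's implementation: the rank-order ("out-of-place") distance, the `top_k` cutoff of 300, the padding with a single blank, and the function names `ngram_profile`, `out_of_place_distance`, and `identify` are all assumptions that do not appear on this page. "Removes replicated blanks" is read here as collapsing runs of whitespace to one blank, and "inter-word N-grams" as N-grams that may span a word boundary.

```python
import re
from collections import Counter


def ngram_profile(text, n_max=5, top_k=300):
    """Return up to top_k character N-grams (N = 1..n_max), most frequent first.

    top_k is a hypothetical cutoff chosen for illustration.
    """
    # Collapse runs of whitespace so that no N-gram contains replicated blanks.
    cleaned = re.sub(r"\s+", " ", text.lower()).strip()
    # Assumption: pad with a single blank on each side to mark text boundaries.
    padded = f" {cleaned} "
    counts = Counter()
    # Slide over the whole string, so N-grams may span word boundaries.
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place_distance(doc_profile, lang_profile):
    """Sum of rank differences; N-grams missing from lang_profile get a fixed penalty."""
    lang_rank = {gram: rank for rank, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_rank[gram]) if gram in lang_rank else max_penalty
        for rank, gram in enumerate(doc_profile)
    )


def identify(text, lang_profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(
        lang_profiles,
        key=lambda lang: out_of_place_distance(doc_profile, lang_profiles[lang]),
    )
```

In this sketch, `lang_profiles` would be a dictionary mapping each language label to a profile built with `ngram_profile` from training text in that language. Because closely related languages share many frequent N-grams, their profiles lie near each other under a distance of this kind, which fits the pattern of Danish texts being misclassified as Swedish.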
In the following chapter, I discuss this chapter's results in more detail; explain the problems I
faced along the way, the limitations, and future research plans; and include a conclusion.
Knudson, Ryan Charles. Automatic Language Identification for Metadata Records: Measuring the Effectiveness of Various Approaches, dissertation, May 2015; Denton, Texas. (https://digital.library.unt.edu/ark:/67531/metadc801895/m1/72/: accessed July 16, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu.