Automatic Language Identification for Metadata Records: Measuring the Effectiveness of Various Approaches
v, 92 pages : illustrations (some color)
the Swedish language of Danish influence ("Swedish", 2015). This revolt makes this
misclassification understandable.
Classification accuracy decreases for highly related languages (Trieschnigg et al., 2012). In every case
of misclassification but one, the classifiers in this study assigned a title to a language in the same
family as its actual language; the single exception is an English (Germanic) title classified as
Estonian (Uralic).
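The within-family pattern described above can be checked mechanically by mapping each language code to its family and counting how many errors stay inside a single family. The sketch below assumes ISO 639-1 codes, a small hypothetical family table, and invented (true, predicted) pairs for illustration; none of these are the study's data or code.

```python
from typing import Optional

# Hypothetical, illustrative subset mapping ISO 639-1 codes to families;
# not the language inventory used in this study.
LANGUAGE_FAMILY = {
    "en": "Germanic", "de": "Germanic", "nl": "Germanic",
    "sv": "Germanic", "da": "Germanic",
    "fr": "Romance", "es": "Romance", "it": "Romance",
    "et": "Uralic", "fi": "Uralic", "hu": "Uralic",
}


def within_family_rate(pairs: list[tuple[str, str]]) -> Optional[float]:
    """Share of misclassified (true, predicted) pairs whose predicted
    language belongs to the same family as the true language.

    Returns None when there are no misclassifications to measure."""
    errors = [(true, pred) for true, pred in pairs if true != pred]
    if not errors:
        return None
    same_family = sum(
        1 for true, pred in errors
        if LANGUAGE_FAMILY[true] == LANGUAGE_FAMILY[pred]
    )
    return same_family / len(errors)


# Invented example pairs (not results from this study).
example = [("sv", "da"), ("da", "sv"), ("en", "de"), ("en", "et"), ("fr", "fr")]
print(within_family_rate(example))  # 0.75: three of the four errors stay in-family
```

Correct classifications such as ("fr", "fr") are excluded before the ratio is taken, so the figure describes only the errors.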
Language relatedness is a common obstacle to accurate automatic language identification. Cases
abound in which text in one language is classified as a close relative, such as English text classified
as German. Methods for improving classification between related languages should exploit
characteristics of each language that clearly distinguish it from its relatives.
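One simple way to surface such distinguishing characteristics is to look for character n-grams that occur in one language's text but never in a relative's. The sketch below compares a Swedish and a Danish sentence written for this example; the function names are hypothetical, and this is not the method used in the study. With samples this small, incidental letters such as 'g' and 'o' also appear, so a practical version would compare relative frequencies over large corpora.

```python
def char_ngrams(text: str, n: int = 1) -> set[str]:
    """Set of character n-grams in a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def distinctive_ngrams(text_a: str, text_b: str, n: int = 1) -> set[str]:
    """n-grams present in text_a that never occur in text_b."""
    return char_ngrams(text_a, n) - char_ngrams(text_b, n)


swedish = "Björn är född i Göteborg"
danish = "Bjørn er født i København"

print(sorted(distinctive_ngrams(swedish, danish)))  # ['g', 'o', 'ä', 'ö']
print(sorted(distinctive_ngrams(danish, swedish)))  # ['a', 'h', 'k', 'v', 'ø']
```

The letters 'ä' and 'ö' (Swedish) versus 'ø' (Danish) are the kind of orthographic signal the paragraph above describes; raising `n` to 2 or 3 captures spelling patterns beyond single letters.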
The misclassifications in this study constitute one of its limitations. Other limitations of this
study are discussed below.
Limitations of the investigation
As noted above, misclassifications between related languages are a shortcoming of this
investigation, but they are not the only one. The following sections describe a few more
limitations of this investigation.
Limited domain of training data
The training data used in this investigation is political in nature: it consists of freely available
European Parliamentary Proceedings. While many metadata records may describe digital objects of a
political nature, this is certainly not true of all metadata records, whose subject domains are as
varied as the digital objects they describe. Metadata may describe objects in the musical, literary,
or scientific domains, among others.
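The domain mismatch described above can be made roughly measurable by checking what fraction of a record's character n-grams ever occur in the training text. The sketch below computes trigram coverage against a one-sentence stand-in for parliamentary training text; the sentence, the titles, and the function names are hypothetical, and coverage is only a crude proxy for domain fit, not a measure used in this study.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Set of character n-grams in a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_coverage(training_text: str, title: str, n: int = 3) -> float:
    """Fraction of the title's n-gram types that also occur in training_text."""
    title_grams = char_ngrams(title, n)
    if not title_grams:
        return 0.0  # nothing to measure, e.g. an empty title
    seen = title_grams & char_ngrams(training_text, n)
    return len(seen) / len(title_grams)


# Hypothetical stand-in for political training text (not Europarl data).
training = "The Parliament adopted the resolution on the budget."
political_title = "Resolution on the budget of the Parliament"
music_title = "Symphony no. 5 in C minor"

print(round(ngram_coverage(training, political_title), 2))
print(round(ngram_coverage(training, music_title), 2))
```

The political title shares most of its trigrams with the training sentence while the musical title shares very few, which mirrors the concern that a model trained only on parliamentary text meets many unfamiliar patterns in records from other domains.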
Knudson, Ryan Charles. Automatic Language Identification for Metadata Records: Measuring the Effectiveness of Various Approaches, dissertation, May 2015; Denton, Texas. (https://digital.library.unt.edu/ark:/67531/metadc801895/m1/78/: accessed July 16, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu.