Automatic Language Identification for Metadata Records: Measuring the Effectiveness of Various Approaches

Knudson, Ryan Charles


v, 92 pages : illustrations (some color)

This dissertation is part of the collection entitled: UNT Theses and Dissertations and was provided to UNT Digital Library by the UNT Libraries.


The following text was automatically extracted from the image on this page using optical character recognition software:

language element will be returned by such a search. As mentioned above, the language element
is not always present in metadata records. Suppose that ten documents in the collection are in
fact in Bulgarian, but only three of the ten have a Bulgarian value in the language element.
Recall is a commonly used measure in information retrieval: the number of relevant documents
retrieved by a search divided by the number of relevant documents in the collection, or
R = rdr/rdc, where R is recall, rdr is the number of relevant documents returned by a search,
and rdc is the number of relevant documents in the collection. The recall of the hypothetical
search above for Bulgarian documents would then be 3/10 = 0.30, or 30%, which is quite low by
the standards of state-of-the-art information retrieval. The hypothetical Bulgarian user would
then be unable to retrieve more than three of the ten Bulgarian documents present in the
hypothetical collection, short of manually inspecting each title in the entire collection or
performing an exhaustive search.
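The recall arithmetic in this hypothetical can be checked with a short Python sketch. It is illustrative only and is not taken from the dissertation: the function names `recall` and `search_by_language`, the record layout, and the language code "bul" are all assumptions made for the example.

```python
from typing import Optional


def recall(rdr: int, rdc: int) -> float:
    """Return R = rdr / rdc.

    rdr: relevant documents returned by a search
    rdc: relevant documents in the collection
    """
    if rdc <= 0:
        raise ValueError("rdc must be positive")
    if not 0 <= rdr <= rdc:
        raise ValueError("rdr must lie between 0 and rdc")
    return rdr / rdc


# Hypothetical collection: ten records whose content is Bulgarian,
# but only the first three carry a value in the language element.
records: list[dict[str, Optional[str]]] = [
    {"title": f"Document {i}", "language": "bul" if i < 3 else None}
    for i in range(10)
]


def search_by_language(recs, code):
    """Return only the records whose language element equals `code`."""
    return [r for r in recs if r["language"] == code]


hits = search_by_language(records, "bul")

# Every record in this toy collection is relevant, so rdc = len(records).
r = recall(len(hits), len(records))
print(f"recall = {r:.2f}")  # prints "recall = 0.30"
```

The seven records with an empty language element never match the search, which is why recall stays at 0.30 no matter how the query is phrased.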
This problem presumably occurs more often as the number of documents and digital materials in
various languages from around the globe grows. It demonstrates a gap in effective search and
retrieval in any digital collection whose records lack complete language elements: users cannot
retrieve relevant documents when the query specifies the language of the content of the desired
documents.
Research question
In light of the problem above, I answer the following question: Of the various approaches to
automatic language identification, which is most effective at accurately identifying the
language of metadata records, specifically of their title elements?


Knudson, Ryan Charles. Automatic Language Identification for Metadata Records: Measuring the Effectiveness of Various Approaches, dissertation, May 2015; Denton, Texas. (https://digital.library.unt.edu/ark:/67531/metadc801895/m1/12/?rotate=270: accessed July 16, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu.
