The enhancement of machine translation for low-density languages using Web-gathered parallel texts.

PDF Version Also Available for Download.

Description

The majority of the world's languages are poorly represented in informational media like radio, television, newspapers, and the Internet. Translation into and out of these languages may offer a way for speakers of these languages to interact with the wider world, but current statistical machine translation models are only effective with a large corpus of parallel texts - texts in two languages that are translations of one another - which most languages lack. This thesis describes the Babylon project which attempts to alleviate this shortage by supplementing existing parallel texts with texts gathered automatically from the Web -- specifically targeting … continued below

Creation Information

Mohler, Michael Augustine Gaylord December 2007.

Context

This thesis is part of the collection entitled: UNT Theses and Dissertations and was provided by the UNT Libraries to the UNT Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 452 times. More information about this thesis can be viewed below.

Who

People and organizations associated with either the creation of this thesis or its content.

Chair

Committee Members

Publisher

Rights Holder

For guidance see Citations, Rights, Re-Use.

  • Mohler, Michael Augustine Gaylord

Provided By

UNT Libraries

The UNT Libraries serve the university and community by providing access to physical and online collections, fostering information literacy, supporting academic research, and much, much more.

Contact Us

What

Descriptive information to help identify this thesis. Follow the links below to find similar items on the Digital Library.

Description

The majority of the world's languages are poorly represented in informational media like radio, television, newspapers, and the Internet. Translation into and out of these languages may offer a way for speakers of these languages to interact with the wider world, but current statistical machine translation models are only effective with a large corpus of parallel texts - texts in two languages that are translations of one another - which most languages lack. This thesis describes the Babylon project which attempts to alleviate this shortage by supplementing existing parallel texts with texts gathered automatically from the Web -- specifically targeting pages that contain text in a pair of languages. Results indicate that parallel texts gathered from the Web can be effectively used as a source of training data for machine translation and can significantly improve the translation quality for text in a similar domain. However, the small quantity of high-quality low-density language parallel texts on the Web remains a significant obstacle.

Language

Identifier

Unique identifying numbers for this thesis in the Digital Library or other systems.

Collections

This thesis is part of the following collection of related materials.

UNT Theses and Dissertations

Theses and dissertations represent a wealth of scholarly and artistic content created by masters and doctoral students in the degree-seeking process. Some ETDs in this collection are restricted to use by the UNT community.

What responsibilities do I have when using this thesis?

When

Dates and time periods associated with this thesis.

Creation Date

  • December 2007

Added to The UNT Digital Library

  • May 2, 2008, 3:16 p.m.

Description Last Updated

  • Jan. 21, 2014, 1:44 p.m.

Usage Statistics

When was this thesis last used?

Yesterday: 0
Past 30 days: 0
Total Uses: 452

Interact With This Thesis

Here are some suggestions for what to do next.

Start Reading

PDF Version Also Available for Download.

International Image Interoperability Framework

IIF Logo

We support the IIIF Presentation API

Mohler, Michael Augustine Gaylord. The enhancement of machine translation for low-density languages using Web-gathered parallel texts., thesis, December 2007; Denton, Texas. (https://digital.library.unt.edu/ark:/67531/metadc5140/: accessed July 17, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; .

Back to Top of Screen