The enhancement of machine translation for low-density languages using Web-gathered parallel texts.

PDF Version Also Available for Download.

Description

The majority of the world's languages are poorly represented in informational media like radio, television, newspapers, and the Internet. Translation into and out of these languages may offer a way for speakers of these languages to interact with the wider world, but current statistical machine translation models are only effective with a large corpus of parallel texts - texts in two languages that are translations of one another - which most languages lack. This thesis describes the Babylon project which attempts to alleviate this shortage by supplementing existing parallel texts with texts gathered automatically from the Web -- specifically targeting ... continued below

Creation Information

Mohler, Michael Augustine Gaylord December 2007.

Context

This thesis is part of the collection entitled: UNT Theses and Dissertations and was provided by UNT Libraries to Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 270 times . More information about this thesis can be viewed below.

Who

People and organizations associated with either the creation of this thesis or its content.

Chair

Committee Members

Publisher

Rights Holder

For guidance see Citations, Rights, Re-Use.

  • Mohler, Michael Augustine Gaylord

Provided By

UNT Libraries

With locations on the Denton campus of the University of North Texas and one in Dallas, UNT Libraries serves the school and the community by providing access to physical and online collections; The Portal to Texas History and UNT Digital Libraries; academic research, and much, much more.

Contact Us

What

Descriptive information to help identify this thesis. Follow the links below to find similar items on the Digital Library.

Description

The majority of the world's languages are poorly represented in informational media like radio, television, newspapers, and the Internet. Translation into and out of these languages may offer a way for speakers of these languages to interact with the wider world, but current statistical machine translation models are only effective with a large corpus of parallel texts - texts in two languages that are translations of one another - which most languages lack. This thesis describes the Babylon project which attempts to alleviate this shortage by supplementing existing parallel texts with texts gathered automatically from the Web -- specifically targeting pages that contain text in a pair of languages. Results indicate that parallel texts gathered from the Web can be effectively used as a source of training data for machine translation and can significantly improve the translation quality for text in a similar domain. However, the small quantity of high-quality low-density language parallel texts on the Web remains a significant obstacle.

Language

Identifier

Unique identifying numbers for this thesis in the Digital Library or other systems.

Collections

This thesis is part of the following collection of related materials.

UNT Theses and Dissertations

Theses and dissertations represent a wealth of scholarly and artistic content created by masters and doctoral students in the degree-seeking process. Some ETDs in this collection are restricted to use by the UNT community.

What responsibilities do I have when using this thesis?

When

Dates and time periods associated with this thesis.

Creation Date

  • December 2007

Added to The UNT Digital Library

  • May 2, 2008, 3:16 p.m.

Description Last Updated

  • Jan. 21, 2014, 1:44 p.m.

Usage Statistics

When was this thesis last used?

Yesterday: 0
Past 30 days: 2
Total Uses: 270

Interact With This Thesis

Here are some suggestions for what to do next.

Start Reading

PDF Version Also Available for Download.

Citations, Rights, Re-Use

Mohler, Michael Augustine Gaylord. The enhancement of machine translation for low-density languages using Web-gathered parallel texts., thesis, December 2007; Denton, Texas. (digital.library.unt.edu/ark:/67531/metadc5140/: accessed September 26, 2017), University of North Texas Libraries, Digital Library, digital.library.unt.edu; .