BABYLON Parallel Text Builder: Gathering Parallel Texts for Low-Density Languages

Description:

This paper discusses the BABYLON parallel text builder.

Creation Date: May 2008
Partner(s):
UNT College of Engineering
Collection(s):
UNT Scholarly Works
Creator (Author):
Mohler, Michael

University of North Texas

Creator (Author):
Mihalcea, Rada, 1974-

University of North Texas

Publisher Info:
Place of Publication: [Paris, France]

Note:

Abstract: This paper describes BABYLON, a system that attempts to overcome the shortage of parallel texts in low-density languages by supplementing existing parallel texts with texts gathered automatically from the Web. In addition to the identification of entire Web pages, the authors also propose a new feature specifically designed to find parallel text chunks within a single document. Experiments carried out on the Quechua-Spanish language pair show that the system is successful in automatically identifying a significant amount of parallel texts on the Web. Evaluations of a machine translation system trained on this corpus indicate that the Web-gathered parallel texts can supplement manually compiled parallel texts and perform significantly better than the manually compiled texts when tested on other Web-gathered data.

Physical Description:

4 p.

Subject(s):
Keyword(s): parallel texts | low-density languages | BABYLON
Source: Sixth International Conference on Language Resources and Evaluation, 2008, Marrakech, Morocco
Identifier:
  • ARK: ark:/67531/metadc31004
Resource Type: Paper
Format: Text
Rights:
Access: Public