BABYLON Parallel Text Builder: Gathering Parallel Texts for Low-Density Languages


This paper discusses the BABYLON parallel text builder.

Physical Description

4 p.

Creation Information

Mohler, Michael & Mihalcea, Rada, 1974-. May 2008.


This paper is part of the collection entitled UNT Scholarly Works and was provided by the UNT College of Engineering to the UNT Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 868 times, with 43 views in the last month. More information about this paper can be viewed below.


People and organizations associated with either the creation of this paper or its content.



Provided By

UNT College of Engineering

The UNT College of Engineering promotes intellectual and scholarly pursuits in the areas of computer science and engineering, preparing innovative leaders in a variety of disciplines. The UNT College of Engineering encourages faculty and students to pursue interdisciplinary research among numerous subjects of study including databases, numerical analysis, game programming, and computer systems architecture.


Descriptive information to help identify this paper.


Abstract: This paper describes BABYLON, a system that attempts to overcome the shortage of parallel texts in low-density languages by supplementing existing parallel texts with texts gathered automatically from the Web. In addition to the identification of entire Web pages, the authors also propose a new feature specifically designed to find parallel text chunks within a single document. Experiments carried out on the Quechua-Spanish language pair show that the system is successful in automatically identifying a significant amount of parallel texts on the Web. Evaluations of a machine translation system trained on this corpus indicate that the Web-gathered parallel texts can supplement manually compiled parallel texts and perform significantly better than the manually compiled texts when tested on other Web-gathered data.


  • Sixth International Conference on Language Resources and Evaluation, 2008, Marrakech, Morocco



This paper is part of the following collection of related materials.

UNT Scholarly Works

The Scholarly Works Collection is home to materials from the University of North Texas community's research, creative, and scholarly activities and serves as UNT's Open Access Repository. It brings together articles, papers, artwork, music, research data, reports, presentations, and other scholarly and creative products representing the expertise in our university community. Access to some items in this collection may be restricted.


Dates and time periods associated with this paper.

Creation Date

  • May 2008

Added to The UNT Digital Library

  • Jan. 31, 2011, 2:01 p.m.

Description Last Updated

  • March 27, 2014, 11:26 a.m.

Citations, Rights, Re-Use

Mohler, Michael & Mihalcea, Rada, 1974-. BABYLON Parallel Text Builder: Gathering Parallel Texts for Low-Density Languages, paper, May 2008; [Paris, France]. (accessed May 29, 2017), University of North Texas Libraries, Digital Library; crediting UNT College of Engineering.