
Creating a Testbed for the Evaluation of Automatically Generated Back-of-the-book Indexes


This paper discusses the automatic generation of back-of-the-book indexes.

Creation Date: February 2006
UNT College of Engineering
UNT Scholarly Works
Creator (Author):
Csomai, Andras

University of North Texas

Creator (Author):
Mihalcea, Rada, 1974-

University of North Texas

Publisher Info:
Publisher Name: Springer-Verlag
Place of Publication: [Berlin, Germany]
  • Creation: February 2006


Abstract: The automatic generation of back-of-the-book indexes seems to have received little attention from the Information Retrieval and Natural Language Processing communities, although the increasingly large number of books available in electronic format, as well as recent advances in keyphrase extraction, should motivate greater interest in this topic. In this paper, the authors describe the background relevant to the process of creating back-of-the-book indexes, namely (1) a short overview of the origin and structure of back-of-the-book indexes, and (2) the correspondence that can be established between techniques for automatic index construction and keyphrase extraction. Because developing any automatic system first requires an evaluation testbed, the authors describe their work in building a gold standard collection of books and indexes and present several metrics that can be used to evaluate automatically generated indexes against the gold standard. Finally, the authors investigate the properties of the gold standard index, such as index size, length of index entries, and upper bounds on coverage as indicated by the presence of index entries in the document.
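
Illustrative sketch (not from the paper): the abstract does not name the metrics the authors use, so the example below assumes the common set-based choices of precision, recall, and F-measure over exact matches of normalized entries. It also assumes a naive substring test for the coverage upper bound, that is, the fraction of gold standard entries that appear literally in the book text. The function names evaluate_index and coverage_upper_bound are hypothetical.

```python
def normalize(entry: str) -> str:
    """Lowercase and collapse whitespace so trivially different forms match."""
    return " ".join(entry.lower().split())


def evaluate_index(generated, gold):
    """Return (precision, recall, f_measure) for a generated index.

    Assumption: an entry counts as correct only if its normalized form
    matches a gold standard entry exactly.
    """
    gen = {normalize(e) for e in generated}
    ref = {normalize(e) for e in gold}
    matched = gen & ref
    precision = len(matched) / len(gen) if gen else 0.0
    recall = len(matched) / len(ref) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def coverage_upper_bound(gold, text):
    """Fraction of gold entries that occur verbatim in the text.

    Assumption: a plain substring test. An extraction-only system cannot
    produce entries that never appear in the text, so this fraction bounds
    the recall such a system can reach under exact matching.
    """
    body = normalize(text)
    ref = {normalize(e) for e in gold}
    if not ref:
        return 0.0
    found = sum(1 for e in ref if e in body)
    return found / len(ref)


if __name__ == "__main__":
    gold = ["keyphrase extraction", "Gold Standard", "index size", "recall"]
    generated = ["gold standard", "index size", "tokenization"]
    text = "The gold standard lists index size and keyphrase   extraction."
    print(evaluate_index(generated, gold))
    print(coverage_upper_bound(gold, text))
```

Exact matching is the strictest option; a real evaluation might also credit partial or head-word matches, which this sketch does not attempt.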

Physical Description:

12 p.

Keyword(s): natural language processing | indexes | information retrieval
Source: Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2006, Mexico City, Mexico
  • ARK: ark:/67531/metadc30982
Resource Type: Paper
Format: Text
Access: Public