On the performance of bitmap indices for high cardinality attributes

PDF Version Also Available for Download.

Description

It is well established that bitmap indices are efficient for read-only attributes with a small number of distinct values. For an attribute with a large number of distinct values, the size of the bitmap index can be very large. To over come this size problem, specialized compression schemes are used. Even though there is empirical evidence that some of these compression schemes work well, there has not been any systematic analysis of their effectiveness. In this paper, we analyze the time and space complexities of the two most efficient bitmap compression techniques known, the Byte-aligned Bitmap Code (BBC) and the ... continued below

Physical Description

vp.

Creation Information

Wu, Kesheng; Otoo, Ekow & Shoshani, Arie March 5, 2004.

Context

This article is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided by UNT Libraries Government Documents Department to Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 12 times . More information about this article can be viewed below.

Who

People and organizations associated with either the creation of this article or its content.

Publisher

Provided By

UNT Libraries Government Documents Department

Serving as both a federal and a state depository library, the UNT Libraries Government Documents Department maintains millions of items in a variety of formats. The department is a member of the FDLP Content Partnerships Program and an Affiliated Archive of the National Archives.

Contact Us

What

Descriptive information to help identify this article. Follow the links below to find similar items on the Digital Library.

Description

It is well established that bitmap indices are efficient for read-only attributes with a small number of distinct values. For an attribute with a large number of distinct values, the size of the bitmap index can be very large. To over come this size problem, specialized compression schemes are used. Even though there is empirical evidence that some of these compression schemes work well, there has not been any systematic analysis of their effectiveness. In this paper, we analyze the time and space complexities of the two most efficient bitmap compression techniques known, the Byte-aligned Bitmap Code (BBC) and the Word-Aligned Hybrid (WAH) code, and study their performance on high cardinality attributes. Our analyses indicate that both compression schemes are optimal in time. The time and space required to operate on two compressed bitmaps are proportional to the total size of the two bitmaps. We demonstrate further that an in-place OR algorithm can operate on a large number of sparse bitmaps in time linear in their total size. Our analyses also show that the compressed indices are relatively small compared with commonly used indices such as B-trees. Given these facts, we conclude that bitmap index is efficient on attributes of low cardinalities as well as on those of high cardinalities. We also verify the analytical results with extensive tests, and identify an optimal way to combine different options to achieve the best performance. The test results confirm the linearity in the total size of the compressed bitmaps, and that WAH out performs BBC by about a factor of two.

Physical Description

vp.

Notes

OSTI as DE00822860

Source

  • VLDB 2004: 30th International Conference on Very Large Data Bases, Toronto, Ontario (CA), 08/30/2004--09/03/2004

Language

Item Type

Identifier

Unique identifying numbers for this article in the Digital Library or other systems.

  • Report No.: LBNL--54673
  • Grant Number: AC03-76SF00098
  • Office of Scientific & Technical Information Report Number: 822860
  • Archival Resource Key: ark:/67531/metadc785258

Collections

This article is part of the following collection of related materials.

Office of Scientific & Technical Information Technical Reports

Reports, articles and other documents harvested from the Office of Scientific and Technical Information.

Office of Scientific and Technical Information (OSTI) is the Department of Energy (DOE) office that collects, preserves, and disseminates DOE-sponsored research and development (R&D) results that are the outcomes of R&D projects or other funded activities at DOE labs and facilities nationwide and grantees at universities and other institutions.

What responsibilities do I have when using this article?

When

Dates and time periods associated with this article.

Creation Date

  • March 5, 2004

Added to The UNT Digital Library

  • Dec. 3, 2015, 9:30 a.m.

Description Last Updated

  • April 4, 2016, 1:49 p.m.

Usage Statistics

When was this article last used?

Yesterday: 0
Past 30 days: 0
Total Uses: 12

Interact With This Article

Here are some suggestions for what to do next.

Start Reading

PDF Version Also Available for Download.

Citations, Rights, Re-Use

Wu, Kesheng; Otoo, Ekow & Shoshani, Arie. On the performance of bitmap indices for high cardinality attributes, article, March 5, 2004; Berkeley, California. (digital.library.unt.edu/ark:/67531/metadc785258/: accessed May 21, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT Libraries Government Documents Department.