Grid Collector: Using an event catalog to speed up user analysis in distributed environment

Creation Information

Wu, Kesheng; Shoshani, Arie; Zhang, Wei-Ming; Lauret, Jerome & Perevoztchikov, Victor November 1, 2004.

Context

This article is part of the collection entitled Office of Scientific & Technical Information Technical Reports and was provided by the UNT Libraries Government Documents Department to the UNT Digital Library, a digital repository hosted by the UNT Libraries. More information about this article can be viewed below.

Who

People and organizations associated with either the creation of this article or its content.

Provided By

UNT Libraries Government Documents Department

Serving as both a federal and a state depository library, the UNT Libraries Government Documents Department maintains millions of items in a variety of formats. The department is a member of the FDLP Content Partnerships Program and an Affiliated Archive of the National Archives.

What

Descriptive information to help identify this article. Follow the links below to find similar items on the Digital Library.

Description

Nuclear and High Energy Physics experiments such as STAR at BNL are generating millions of files with petabytes of data each year. In most cases, analysis programs have to read all events in a file in order to find the interesting ones. Since the interesting events may be a small fraction of the events in the file, a significant portion of the computer time is wasted on reading the unwanted events. To address this issue, we developed a software system called Grid Collector. The core of Grid Collector is an Event Catalog. This catalog can be efficiently searched with compressed bitmap indices. Tests show that Grid Collector can index and search STAR event data much faster than database systems. It is fully integrated with an existing analysis framework, so using Grid Collector requires minimal effort. In addition, by taking advantage of existing file catalogs, Storage Resource Managers (SRMs) and GridFTP, Grid Collector automatically downloads the needed files from anywhere on the Grid without user intervention. Grid Collector can significantly improve user productivity. For a user who typically performs computation on 50 percent of the events, using Grid Collector could reduce the turnaround time by 30 percent. The improvement is more significant when searching for rare events, because only a small number of events with appropriate properties are read into memory and the necessary files are automatically located and downloaded through the best available route.
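The idea of searching an event catalog with bitmap indices can be illustrated with a minimal Python sketch. It is not the Grid Collector implementation: the class `EventCatalog`, the attribute names (`n_tracks`, `energy`), the bin edges and the file names are all hypothetical, and the bitmaps here are plain uncompressed Python integers, whereas the description above refers to compressed bitmap indices. Each (attribute, bin) pair gets one bitmap with one bit per event; a range condition is answered by OR-ing bin bitmaps, and several conditions are combined with AND.

```python
from collections import defaultdict


class EventCatalog:
    """Toy event catalog: one uncompressed bitmap (a Python int) per (attribute, bin)."""

    def __init__(self, bin_edges):
        # bin_edges: attribute name -> sorted list of bin boundaries (hypothetical binning).
        self.bin_edges = bin_edges
        self.bitmaps = defaultdict(int)  # (attribute, bin index) -> bitmap
        self.locations = []              # event row -> (file name, offset within file)

    def _bin(self, attr, value):
        # Bin i holds values below edge i and at or above edge i-1.
        for i, edge in enumerate(self.bin_edges[attr]):
            if value < edge:
                return i
        return len(self.bin_edges[attr])

    def add_event(self, file_name, offset, attrs):
        # Assign the next row number and set that bit in one bitmap per attribute.
        row = len(self.locations)
        self.locations.append((file_name, offset))
        for attr, value in attrs.items():
            self.bitmaps[(attr, self._bin(attr, value))] |= 1 << row

    def at_least(self, attr, edge):
        # Bitmap of events with attr >= edge; edge must be one of the bin boundaries.
        first = self.bin_edges[attr].index(edge) + 1
        result = 0
        for b in range(first, len(self.bin_edges[attr]) + 1):
            result |= self.bitmaps.get((attr, b), 0)
        return result

    def files_and_events(self, bitmap):
        # Translate the selected bits into per-file lists of event offsets.
        selected = defaultdict(list)
        row = 0
        while bitmap:
            if bitmap & 1:
                file_name, offset = self.locations[row]
                selected[file_name].append(offset)
            bitmap >>= 1
            row += 1
        return dict(selected)


catalog = EventCatalog({"n_tracks": [100, 500], "energy": [10.0, 50.0]})
catalog.add_event("run1.root", 0, {"n_tracks": 40, "energy": 5.0})
catalog.add_event("run1.root", 1, {"n_tracks": 700, "energy": 60.0})
catalog.add_event("run2.root", 0, {"n_tracks": 650, "energy": 20.0})
catalog.add_event("run3.root", 0, {"n_tracks": 120, "energy": 70.0})

# Events with n_tracks >= 500 AND energy >= 50.0
hits = catalog.at_least("n_tracks", 500) & catalog.at_least("energy", 50.0)
selected = catalog.files_and_events(hits)
print(selected)  # {'run1.root': [1]}
```

In this toy run only one of the three files contains a matching event, so an analysis job driven by the catalog would need to fetch a single file and read a single event from it, which is the effect the description attributes to searching for rare events.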

Source

  • Computing in High Energy and Nuclear Physics (CHEP) 2004, Interlaken, Switzerland, 27th September - 1st October 2004

Identifier

Unique identifying numbers for this article in the Digital Library or other systems.

  • Report No.: LBNL--58092
  • Grant Number: DE-AC02-05CH11231
  • Office of Scientific & Technical Information Report Number: 882078
  • Archival Resource Key: ark:/67531/metadc875242

Collections

This article is part of the following collection of related materials.

Office of Scientific & Technical Information Technical Reports

When

Dates and time periods associated with this article.

Creation Date

  • November 1, 2004

Added to The UNT Digital Library

  • Sept. 21, 2016, 2:29 a.m.

Description Last Updated

  • Sept. 22, 2017, 3:07 p.m.

Citations, Rights, Re-Use

Wu, Kesheng; Shoshani, Arie; Zhang, Wei-Ming; Lauret, Jerome & Perevoztchikov, Victor. Grid Collector: Using an event catalog to speed up user analysis in distributed environment, article, November 1, 2004; Berkeley, California. (digital.library.unt.edu/ark:/67531/metadc875242/: accessed September 23, 2017), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT Libraries Government Documents Department.