Applying chimera virtual data concepts to cluster finding in the Sloan Sky Survey
This article is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to UNT Digital Library by the UNT Libraries Government Documents Department.
- Will the derived data be traceable in the manner expected?
- Will the computations map onto effective DAGs for efficient grid execution?
- When code or data change, can we identify dependent re-derivations?
- Will the virtual data paradigm enhance overall productivity?
More specifically, we demonstrate for the first time, albeit only in prototype form, a general,
discipline-independent mechanism that allows scientists in any field to use an off-the-shelf toolkit to
track their data production and, with relative ease, to harness the power of large-scale grid resources.
The work reported here complements and extends other work by the GriPhyN collaboration [8-10]. Also
related is work on data lineage in database systems [5, 7, 19, 20, 25]. Our work leverages these techniques,
but differs in two respects: first, data is not necessarily stored in databases and the operations used to derive
data items may be arbitrary computations; second, we address issues relating to the automated generation
and scheduling of the computations required to instantiate data products.
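The second point, automated generation of the computations needed to instantiate a data product, can be illustrated with a minimal sketch. It is not Chimera's planner: the recipe table, the file names, and the function plan are hypothetical, and the sketch only assumes that each data product records the program and inputs that produce it. Given a requested product and the set of items already stored, it walks the dependencies backward and returns the missing steps in an order that respects those dependencies.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical recipes: data product -> (program that makes it, its inputs).
DERIVATIONS = {
    "field.cat": ("extract", ["field.fits"]),
    "bcg.cat": ("find_bcg", ["field.cat"]),
    "clusters.cat": ("find_clusters", ["bcg.cat", "field.cat"]),
}


def plan(target, existing):
    """Return (program, product) steps, in runnable order, needed for target.

    Items listed in `existing` are treated as already materialized and are
    not re-derived.
    """
    graph = {}  # product -> inputs that must be derived first
    stack = [target]
    while stack:
        item = stack.pop()
        if item in existing or item in graph:
            continue
        if item not in DERIVATIONS:
            raise KeyError(f"no recipe and no stored copy for {item!r}")
        _, inputs = DERIVATIONS[item]
        missing = [i for i in inputs if i not in existing]
        graph[item] = missing
        stack.extend(missing)
    # static_order() yields every product after the products it depends on.
    order = TopologicalSorter(graph).static_order()
    return [(DERIVATIONS[product][0], product) for product in order]


if __name__ == "__main__":
    for program, product in plan("clusters.cat", {"field.fits"}):
        print(f"run {program} -> {product}")
```

If an intermediate product such as field.cat is already stored, the same call with that item added to the existing set returns a shorter plan, which is the sense in which derived data can be reused rather than recomputed.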
2 GriPhyN Tools for the Virtual Data Grid
The current toolkit (VDT V1.0) includes the Globus Toolkit [13], Condor and Condor-G [16, 21], and the
Grid Data Mirroring Package [18, 22].
We apply here a new tool to be included in VDT: the Chimera virtual data system [15]. Chimera supports
the capture and reuse of information on how data is generated by computations. It comprises a virtual data
catalog, used to record virtual data information, plus a virtual data language interpreter that translates data
definition and query operations expressed in a virtual data language (VDL) into virtual data catalog
operations (Figure 1).
[Figure 1: Schematic of the Chimera architecture. Labeled elements: Virtual Data Applications; Virtual Data Language (definition and query); Chimera, comprising the VDL Interpreter (manipulate derivations and transformations) and, reached via SQL, the Virtual Data Catalog (implements Chimera Virtual Data Schema); Task Graphs (compute and data movement tasks, with dependencies); Data Grid Resources (distributed execution and data management).]
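To make the role of a SQL-backed catalog concrete, the following is a minimal sketch using Python's sqlite3 module. The table layout and the functions record, lineage, and dependents are assumptions made for illustration; they do not reproduce the Chimera virtual data schema or VDL. The sketch records which transformation and inputs produced each data product, then answers two questions: what a product was derived from, and which products would need re-derivation if a transformation changed.

```python
import sqlite3

# Hypothetical schema, not the actual Chimera virtual data schema.
SCHEMA = """
CREATE TABLE transformation (name TEXT PRIMARY KEY, version TEXT);
CREATE TABLE derivation (product TEXT PRIMARY KEY, transformation TEXT);
CREATE TABLE derivation_input (product TEXT, source TEXT);
"""


def open_catalog():
    """Create an empty in-memory catalog."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def record(db, product, transformation, version, sources):
    """Record that `product` was made by `transformation` from `sources`."""
    db.execute("INSERT OR REPLACE INTO transformation VALUES (?, ?)",
               (transformation, version))
    db.execute("INSERT OR REPLACE INTO derivation VALUES (?, ?)",
               (product, transformation))
    db.execute("DELETE FROM derivation_input WHERE product = ?", (product,))
    db.executemany("INSERT INTO derivation_input VALUES (?, ?)",
                   [(product, s) for s in sources])


def lineage(db, product):
    """Return every upstream item that `product` depends on, sorted."""
    rows = db.execute(
        """
        WITH RECURSIVE up(item) AS (
            SELECT source FROM derivation_input WHERE product = ?
            UNION
            SELECT d.source FROM derivation_input d
            JOIN up ON d.product = up.item
        )
        SELECT item FROM up ORDER BY item
        """, (product,)).fetchall()
    return [r[0] for r in rows]


def dependents(db, transformation):
    """Return products to re-derive if `transformation` changes, sorted."""
    rows = db.execute(
        """
        WITH RECURSIVE down(item) AS (
            SELECT product FROM derivation WHERE transformation = ?
            UNION
            SELECT d.product FROM derivation_input d
            JOIN down ON d.source = down.item
        )
        SELECT item FROM down ORDER BY item
        """, (transformation,)).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    db = open_catalog()
    record(db, "field.cat", "extract", "1.0", ["field.fits"])
    record(db, "bcg.cat", "find_bcg", "1.0", ["field.cat"])
    record(db, "clusters.cat", "find_clusters", "1.0", ["bcg.cat", "field.cat"])
    print(lineage(db, "clusters.cat"))
    print(dependents(db, "find_bcg"))
```

The recursive queries are one possible design choice: they keep both the upstream and downstream walks inside the database, so the catalog can answer provenance questions without loading the whole dependency graph into the application.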
The VDC tracks how data is derived, with sufficient precision that one can create and re-create the data from
this knowledge. One can then definitively determine how the data was created, something that is often not
feasible today in the massive data collections maintained by large collaborations. One can also implement a
new class of "virtual data management" operations that, for example, "re-materialize" data products that
were deleted, generate data products that were defined but never created, regenerate data when data
dependencies or transformation programs change, and/or create replicas of data products at remote locations
when re-creation is more efficient than data transfer. This brings the power and discipline that we have so
James Annis et al. Applying chimera virtual data concepts to cluster finding in the Sloan Sky Survey, article, August 13, 2002; Batavia, Illinois. (https://digital.library.unt.edu/ark:/67531/metadc737748/m1/2/: accessed April 17, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.