Chunking of Large Multidimensional Arrays Page: 6 of 19
This report is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to Digital Library by the UNT Libraries Government Documents Department.
The following text was automatically extracted from the image on this page using optical character recognition software:
0 1 2 Vector of Offset-Indices
7.0 0.0 0.0 0.0 12.0 0.0 0,0> <0,2> <1,0> <2,0> <2,1> <2,2>
0 0.0 3.0 0.0 0.0 0.0 -4.0 9 2 3 6 7 8
0.0 -7.0 0.0 0.0 0.0 0.0
2.0 0.0 0.0 0.0 0.0 0.0
2 I5.0 8.0 0.0 10.0 -1.0 9.0 7.0 0.0~ 12.0 0.C.0 -7.0~ 5.0 8.0 1.0 10.C [1.o 9.o0
[0.04.0 1.0 0.0 2.0 -9.0 . 0.0.4.01.00.0 .0 -9.0
(a) Blocked Matrix (b) Block Addressing
Figure 2: Block Compressed Row Storage
Iz = g((i, j)) that takes the coordinate index (i, j) and returns an offset value Iij, relative to
the position of Io,o and also the inverse function (i, j) = g-'(Iz ) that takes an offset value
and returns the coordinate index. Figure 2b illustrates the offset-indices of the blocks when
the mapping function is the row-major linearizing function. Given the coordinates (i, j), one
determines if the element a2 exists by first computing the offset index of the block I2 using
the mapping function go and then determining whether Iz occurs in the offset index vector or
not. Searching the offset index vector can be done with an interpolation search or alternatively,
organizing the pairs of offset index and block pointers as a balanced binary search tree.
The BCRS method generalizes naturally to multi-dimensional arrays. The chunking process
is equivalent to the block partitioning method used for matrices. For extremely large multi-
dimensional arrays the first level chunk organization can be done with B+-tree. In HDF5 , a
popular multi-dimensional array file format used extensively in scientific computing, the chunk
accessing is done through a B+-tree storage scheme.
3 Access Models of Arrays
A k dimensional array, M[N1, N2, ... , Nk], consists of H>1 Nz elements. Each of its elements,
m(i, i2, ..-. , ik), is indexed by k indices where 0 < i2 < N is its index with respect to the jth
dimension. We wish to store M on disk subject to the constraint that each disk block can hold
at most C elements of M. This is done by partitioning M into equal shape rectangular chunks
such that each chunk fits on a disk block, i.e., if each chunk has dimensions (ci, c2, ..., ck) then
Hk= c< 5C.
The system supports queries that retrieve rectangular sub-arrays of M.
A query q = ([li : ui), [12 : u2), ... , [lk : uk)) specifies a lower bound l and an upper bound u,
on each of the k dimensions. The query retrieves all elements m(i1,i2, ...,ik) of M such that
hg < i2 < u for 1 < j < k. The cost of answering this query is directly related to the number
of chunks (disk blocks) that overlap the sub-array defined by the query. In , it was shown
that knowledge of the predicted query access patterns can be efficiently used to select chunk
dimensions that result in a significant reduction in the cost of answering queries. Prediction of
query access patterns is usually based on query statistics that are collected using query history
logs, sampling, or other statistical methods. Next we present two models commonly used for
query access pattern prediction: The Independent Attribute Range model and The Query Shape
Here’s what’s next.
This report can be searched. Note: Results may vary based on the legibility of text within the document.
Tools / Downloads
Get a copy of this page or view the extracted text.
Citing and Sharing
Basic information for referencing this web page. We also provide extended guidance on usage rights, references, copying or embedding.
Reference the current page of this Report.
Rotem, Doron; Otoo, Ekow J. & Seshadri, Sridhar. Chunking of Large Multidimensional Arrays, report, February 28, 2007; Berkeley, California. (https://digital.library.unt.edu/ark:/67531/metadc896932/m1/6/: accessed May 26, 2019), University of North Texas Libraries, Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.