# Chunking of Large Multidimensional Arrays Page: 7 of 19

This
**report**
is part of the collection entitled:
Office of Scientific & Technical Information Technical Reports and
was provided to Digital Library
by the UNT Libraries Government Documents Department.

#### Extracted Text

The following text was automatically extracted from the image on this page using optical character recognition software:

3.1 Independent Attribute Range (IAR)

For a query q = ([i : ui), [12 : u2), ... , [l : uk)), we define its shape to be (A1, A2, ... , AA) where

Ai is the size of its range on the ith dimension, i.e., assuming the ranges are closed on the left

and open on the right, AZ = u2 - l2. In both models a query shape can fall randomly anywhere

within the array. In the IAR model, a probabilistic distribution of the possible range values

is calculated separately for each of the k dimensions. It is assumed that the specifications of

ranges of attributes in queries are independent of each other [1]. This assumption means that the

estimated probability of a query shape is calculated as a product of the estimated probabilities

of its components, i.e., the probability of a shape (ai, a2, ..., a ) is estimated as H 1 p(aj) where

p(aj) is the estimated probability that the value of the range for the ith dimension is a1. More

detailed treatment of this case is provided in Section 4.2.

3.2 Query Shape (QS)

This model is attributed to Sarawagi and Stonebraker [14], As in the IAR model, each query

is associated with a shape (A1, A2, ... , AA). The difference is that in the QS model the query

access pattern is estimated in terms of probability distribution of complete query shapes rather

than distributions of ranges of individual dimensions.

The QS model, groups the queries into a collection of classes L1, L2, L3, ... , Lq such that

the class LZ contains all queries of shape (Ai1, A22, ... A2k). Each class LZ is associated with a

probability P, such that 2 P = 1.0. The access pattern is defined then by the set of pairs

{(Ail, Ai2, ... Aik) : Pi}, 1 < i < q. Our exact analysis of this model is presented in Section 4.3.

3.3 Illustrative Example

Under both models (IAR) and (QS), the actual location of a query shape relative to the array

is assumed to be uniformly distributed. The following small example illustrates the difference

between the two models.

Table 1: Queries

Query number Query shape

1 <1:3,2:5> <2,3>

2 <4:7,6:10> <3,4>

3 <5:9;3:6> <4,3>

4 <6:8,4:7> <2,3>

Example: For a 2-dimensional array, we assume access pattern estimation is based on a

sample of 4 queries given in Table 1. Range distribution for each dimension is given in Table 2

and shape distributions according to the two models are shown in Table 3. Note that under

model (IAR), some shapes that were not observed in the sample are assumed to have non-zero

probability whereas under model (QS) only observed shapes have non- zero probability.

4 Optimizing Array Chunk Shapes

4.1 Analysis of Expected Chunk Overlaps

We will first estimate the number of chunks overlapping a fixed shape and then compute the

expected number of chunks under the probabilistic assumptions for each of the two models.6

## Upcoming Pages

Here’s what’s next.

## Search Inside

This report can be searched. **Note: **Results may vary based on the legibility of text within the document.

## Tools / Downloads

Get a copy of this page or view the extracted text.

## Citing and Sharing

Basic information for referencing this web page. We also provide extended guidance on usage rights, references, copying or embedding.

### Reference the current page of this Report.

Rotem, Doron; Otoo, Ekow J. & Seshadri, Sridhar. Chunking of Large Multidimensional Arrays, report, February 28, 2007; Berkeley, California. (https://digital.library.unt.edu/ark:/67531/metadc896932/m1/7/: accessed May 24, 2019), University of North Texas Libraries, Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.