Scheduling in Heterogeneous Grid Environments: The Effects of Data Migration
three job migration algorithms therefore lie in the timing
of the job transfer request initiations and the destination
choice for those requests.
B. Reference Algorithms
We use two scheduling algorithms as baseline ref-
erences for comparison. The centralized strategy has a
single GS and represents a performance target for our
distributed scheduling approaches. The local algorithm,
on the other hand, performs no job migration and repre-
sents a traditional non-grid scheduling environment.
1) Centralized: In the centralized algorithm, all jobs
are submitted to a single GS which does not have
an affinity to a specific compute server. The GS is
responsible for making global decisions and assigning
each job to a specific machine. It tracks the status of each
job and maintains current information on all available re-
sources, allowing it to compute ATT and RU without any
communication. When a job is submitted, the GS selects
the optimal server (based on ATT and RU) and migrates
the job to that system. Although communication-free
resource awareness is unrealistic, this strategy allows us
to model the potential gain of a centralized architecture.
However, the model is impractical as it constitutes a
single point of failure and thus suffers from a lack of
reliability and fault tolerance. Additionally, this approach
has severe scalability problems that may result in a per-
formance bottleneck for large-scale grid environments.
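To make the centralized dispatch concrete, here is a minimal sketch of how a single GS with full resource knowledge might pick a server for each incoming job. The turnaround estimate and the weighted ATT/RU score are our illustrative assumptions; the paper does not give the exact selection rule (it only says performance is assumed linear in CPU speed).

```python
from dataclasses import dataclass

@dataclass
class Server:
    """A compute server as seen by the central grid scheduler (GS)."""
    name: str
    cpus: int
    speed_mhz: int
    busy_cpus: int = 0
    queued_work: float = 0.0  # remaining CPU-seconds of queued jobs (simplified)

    def estimated_turnaround(self, job_work: float) -> float:
        # Hypothetical estimate: queued work plus this job's work,
        # scaled by clock speed (performance assumed linear in CPU speed).
        return (self.queued_work + job_work) / self.speed_mhz

    def utilization(self) -> float:
        return self.busy_cpus / self.cpus

def dispatch(servers, job_work, att_weight=1.0, ru_weight=1.0):
    """Assign a job to the server with the best combined ATT/RU score.
    The weighting is illustrative, not the paper's exact rule."""
    def score(s):
        return att_weight * s.estimated_turnaround(job_work) + ru_weight * s.utilization()
    best = min(servers, key=score)
    best.queued_work += job_work  # the GS tracks all queues without communication
    return best
```

Because the GS holds current state for every server, `dispatch` needs no messages to evaluate candidates, which is exactly the unrealistic-but-useful property the centralized baseline models.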
2) Local: In the local scheduling algorithm, there are
no GSs. All jobs are submitted to LSs and executed
on the compute server associated with each LS. This
approach represents how scheduling is currently being
performed and we use it to demonstrate the benefits of
grid scheduling algorithms. The local scheduling policy
for all strategies is first-come-first-serve with backfilling.
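The first-come-first-serve-with-backfilling policy used by every LS can be sketched as a single scheduling pass: jobs start in arrival order, but a later job may jump ahead if it does not delay the job at the head of the queue (EASY-style backfilling). This is a simplified illustration under our own data-structure assumptions, not the simulator code used in the paper.

```python
def backfill_pass(free_cpus, now, queue, running):
    """One pass of FCFS with backfilling.
    `queue`: (cpus, runtime) tuples in arrival order (mutated in place).
    `running`: (end_time, cpus) for jobs already executing (mutated in place).
    Returns the (cpus, runtime) tuples started now."""
    started = []
    # FCFS: start jobs strictly in arrival order while they fit.
    while queue and queue[0][0] <= free_cpus:
        cpus, runtime = queue.pop(0)
        running.append((now + runtime, cpus))
        free_cpus -= cpus
        started.append((cpus, runtime))
    if not queue:
        return started
    # The head job is blocked: find its "shadow" start time, i.e. when
    # enough running jobs will have finished to free its CPUs.
    head_cpus = queue[0][0]
    avail, shadow = free_cpus, now
    for end, cpus in sorted(running):
        avail += cpus
        shadow = end
        if avail >= head_cpus:
            break
    extra = avail - head_cpus  # CPUs left over at the shadow time
    # Backfill: a later job may start now if it fits in the free CPUs and
    # either finishes before the shadow time or uses only leftover CPUs,
    # so the head job is never delayed.
    i = 1
    while i < len(queue):
        cpus, runtime = queue[i]
        if cpus <= free_cpus and (now + runtime <= shadow or cpus <= extra):
            queue.pop(i)
            running.append((now + runtime, cpus))
            free_cpus -= cpus
            started.append((cpus, runtime))
            if now + runtime > shadow:
                extra -= cpus  # job outlives the shadow time
        else:
            i += 1
    return started
```

For example, with 4 CPUs free, a blocked 8-CPU head job, and a 6-CPU job finishing at t=5, a short 2-CPU job can be backfilled immediately because it completes before the head job could start anyway.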
We evaluate our grid scheduling algorithms by sim-
ulating resources and jobs. Our environment models
the submission of workloads to grid schedulers, the
operation of grid and local schedulers, the transfer of job
input/output data between compute servers, the network
bandwidth contention, and the execution of jobs on
compute servers. During these simulations, we gather
performance information so that we can compare the var-
ious grid scheduling algorithms. Our specific objective in
this paper is to understand the effects of data migration.
A. Resource Configurations
Our simulations model seven actual compute servers. These systems are located at Lawrence
Berkeley National Laboratory, NASA Ames Research
Center, Lawrence Livermore National Laboratory, and
San Diego Supercomputer Center. All seven machines
are either cache-coherent SMP clusters or NUMA shared
memory systems, interconnected by a fast proprietary
network. Both architectures partition CPUs into nodes
for management purposes, and the current practice is to
allocate each node to a single application so that applica-
tions do not interfere with each other. We therefore used
this allocation approach in our simulation environment.
We wanted to use 12 servers to give us more flexibility
for grouping them into sets; so we duplicated five of
the seven systems. The systems were then split into 3,
6, and 12 sets to simulate machines grouped at 3, 6,
and 12 different sites. Each site has an equal number of
machines with equivalent total computational power. The
characteristics of these systems and the sites to which
they are assigned are shown in Table I.
TABLE I
CONFIGURATIONS OF THE COMPUTE SERVERS AND THEIR ASSIGNMENT TO SITES

Server  # of   CPUs/  Clock        Site Locator
ID      Nodes  Node   (MHz)   3 sites  6 sites  12 sites
S1      184    16     375        0        0        0
S2      305     4     332        1        1        1
S3      144     8     375        2        3        2
S4      256     4     600        1        0        3
S5       32     2     250        2        2        4
S6      128     4     400        2        5        5
S7       64     2     250        2        5        6
S8      144     8     375        1        2        7
S9      256     4     600        0        4        8
S10      32     2     250        0        1        9
S11     128     4     400        0        3       10
S12      64     2     250        1        4       11
We also simulate the networks connecting the servers.
We assume that all systems at a single site share a local
network and that each of these networks is connected
to every other local network (at remote sites) using a
point-to-point connection. When we model the transfer
of a job's input/output data, we simulate the use of
a local network on the sending side, a point-to-point
inter-site network, and a local network on the receiving
side. Any of these three networks can constrain the end-
to-end data migration bandwidth. We assume that all
data transfers share the network bandwidth equally, and
network contention occurs when multiple data transfers
simultaneously utilize a given network path. In our
model, each site network has a peak bandwidth of
800 Mb/s, while 40 Mb/s is available from each point-to-
point network. This represents a gigabit Ethernet LAN
and a relatively high-performance WAN.
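Under this model, the end-to-end bandwidth of a single data migration is the minimum over its three segments (sending site network, point-to-point inter-site link, receiving site network), with each segment's bandwidth split equally among the transfers currently using it. A small sketch of that calculation, using the stated 800 Mb/s and 40 Mb/s figures as defaults:

```python
def transfer_bandwidth(concurrent, site_bw=800.0, pp_bw=40.0):
    """End-to-end bandwidth (Mb/s) for one data migration.
    `concurrent` = (n_send, n_pp, n_recv): the number of simultaneous
    transfers sharing the sending site network, the point-to-point
    inter-site link, and the receiving site network, respectively
    (each count includes this transfer).  A sketch of the stated
    contention model, not the authors' simulator code."""
    n_send, n_pp, n_recv = concurrent
    # Each network is shared equally; the slowest shared segment
    # constrains the whole transfer.
    return min(site_bw / n_send, pp_bw / n_pp, site_bw / n_recv)
```

An uncontended transfer is limited by the 40 Mb/s WAN link; the 800 Mb/s site networks only become the bottleneck when more than 20 transfers share one of them.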
For the experiments reported here, we make two
simplifying assumptions. First, we assume that program
performance is linearly related to CPU speed. Second,
even though the systems we are simulating are not all
binary compatible, we assume that users have compiled
Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang & Smith, Warren. Scheduling in Heterogeneous Grid Environments: The Effects of Data Migration. January 1, 2004; Berkeley, California.