Clusters as large-scale development facilities. Page: 4 of 10
This article is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to UNT Digital Library by the UNT Libraries Government Documents Department.
cases substantially differ. We illustrate this situation in
Table 1.
In essence, the basic job model for both uses is similar.
(Indeed, we support both types of jobs on the system,
often simultaneously.) The specific goals of a testbed
user's job are often quite different, and hence the usage
patterns within that job are different.
3. Characterization of Testbed Usage
In the first days of the system, testbed usage was
largely characterized by users who needed interactive
access to a large cluster for short periods of time. For
example, scientists used the cluster to test systems
software that launched jobs. In this case, which is typical
of much of this type of development, the scientists needed
to use the entire system interactively but only for
moments at a time.
Over time, the usage of the system has changed, both
as the system has become more capable and as the user
community of Chiba City has grown. Recently, we have
had quite a number of different OS and system tool
developers on the system who need to install their own
operating systems on as much of the cluster as possible.
After pushing out and configuring the installation, they
usually run a series of tests that might take hours or days.
We can classify testbed usage based on two metrics:
• Degree of scalability. This describes the degree
  to which the specific project is focused on
  developing and testing at large scale or carrying
  out research into scalability issues.
• Degree of system impact. This describes the
  project's ability to operate in the standard
  environment. We have categorized these as
  "computational usage", "basic development",
  "system development", and "extreme
  development", each of which is described below.
These two issues, scalability and impact, go hand in
hand. While the testbed can support high-impact
development on a single node, most high-impact testbed
users are also interested in testing scalability issues.
Therefore, while we have found it interesting to note
which of our users operate at large scale, we have not
found it particularly useful to differentiate between them
based on scalability because most of the testbed users
eventually want to use the entire system.
In the following sections, we profile these broad
categories of cluster users and describe the augmented
functionality they require in order to use testbed clusters
effectively. They are described in order of increasing
degree of system impact.
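
As a minimal sketch of the classification above, the four impact categories can be modeled as an ordered enumeration. The category names come from the text; the `Project` record, its field names, and the `sort_by_impact` helper are hypothetical and serve only as illustration. Scalability is carried as a simple flag, since the text notes that it is interesting to record but not used to differentiate users.

```python
from dataclasses import dataclass
from enum import IntEnum


class SystemImpact(IntEnum):
    """Impact categories from the text, ordered by increasing system impact."""
    COMPUTATIONAL_USAGE = 1
    BASIC_DEVELOPMENT = 2
    SYSTEM_DEVELOPMENT = 3
    EXTREME_DEVELOPMENT = 4


@dataclass(frozen=True)
class Project:
    # Hypothetical record; field names are illustrative, not from the paper.
    name: str
    impact: SystemImpact
    large_scale: bool  # recorded for interest only, not used for classification


def sort_by_impact(projects):
    """Return the projects in order of increasing system impact."""
    return sorted(projects, key=lambda p: p.impact)
```

Using an `IntEnum` keeps the ordering explicit, so that comparisons such as `SystemImpact.BASIC_DEVELOPMENT < SystemImpact.SYSTEM_DEVELOPMENT` follow the order in which the categories are described.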
3.1. Computational Usage
The first type of user is a standard application user on
a computational cluster. In most of these cases, the user
wants to run a mature application for a considerable
period of time. The operating system running on the
nodes usually doesn't matter as long as the application
can be recompiled for the target system. No enhanced
privileges of any sort are required on any portion of the
system. A computational user may place significant
demand on the I/O system of the cluster.
The intent of a computational application user is
typically to generate a set of numerical results.
3.2. Basic Development
The second type of user, quite common on Chiba City,
is the basic development user. A good example of this
type of user is the developer of a systems library such as a
numerical library or a communications library.
In general, this type of user is interested in code
scalability and performance. The following are typical
requirements, some of which were noted above:
• On-demand Access. Waiting in a queue when
  actively developing can severely limit the
  effectiveness of a development session. These
  users like to be in the "code/compile/test" loop
  that is common for development on unscheduled
  systems. In order to address this issue, we took
  three steps:
    o Thirty-two of the nodes on the system
      were designated "unscheduled"; that is,
      the scheduler does not control access to
      them. They are available to all users at
      all times and are intended to be used as
      an on-demand testing area.
    o The scheduler policy was arranged to
      allow only very short-running jobs
      during certain business hours. In
      theory, this would allow the quick jobs
      that characterize this kind of
      development to migrate to the front of
      the queue. In reality, we discovered
      that this feature was rarely used because
      developers didn't link their
      development schedules to the queue
      policy.
    o We made it possible to easily request
      reserved nodes on the system for
      arbitrary amounts of time.
• Interactive Access. Interactive debugging is
  important for the basic development user. This
  type of access to the nodes was enabled by
  default; when the scheduler allocates a node to a
  user, that user can log in to that node.
• Property Extraction in Job Runs. The
  development user frequently wants information
  from a variety of sources ranging from kernel
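
The on-demand and interactive access measures listed above can be sketched as two small policy checks. This is an illustration under stated assumptions, not the scheduler configuration used on Chiba City: the text gives the number of unscheduled nodes (32) but not which nodes they are, which business hours the short-job policy covers, or what counts as "very short-running", so the node IDs, the 09:00 to 17:00 window, and the 30-minute limit below are placeholders. The function names and the `allocations` mapping are likewise hypothetical.

```python
from datetime import datetime, timedelta

# Assumed values: only the count of 32 unscheduled nodes comes from the text.
UNSCHEDULED_NODES = frozenset(range(32))   # assumed node IDs 0..31
SHORT_JOB_WINDOW = (9, 17)                 # assumed hours, start inclusive, end exclusive
SHORT_JOB_LIMIT = timedelta(minutes=30)    # assumed meaning of "very short-running"


def job_may_start(start: datetime, walltime: timedelta) -> bool:
    """Inside the short-job window, admit only short-running jobs."""
    in_window = SHORT_JOB_WINDOW[0] <= start.hour < SHORT_JOB_WINDOW[1]
    if in_window:
        return walltime <= SHORT_JOB_LIMIT
    return True


def may_log_in(user: str, node: int, allocations: dict) -> bool:
    """Allow login on unscheduled nodes, or on a node currently
    allocated (by the scheduler or a reservation) to this user."""
    if node in UNSCHEDULED_NODES:
        return True
    return allocations.get(node) == user
```

Here `allocations` maps a node ID to the user who currently holds it, so a reservation for an arbitrary amount of time is simply an entry that stays in the mapping until it is released.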
Evard, R.; Desai, N.; Navarro, J. P. & Nurmi, D. Clusters as large-scale development facilities., article, July 1, 2002; Illinois. (https://digital.library.unt.edu/ark:/67531/metadc741653/m1/4/: accessed April 25, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.