Infrastructure for distributed enterprise simulation Page: 42 of 77
This report is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to UNT Digital Library by the UNT Libraries Government Documents Department.
Unfortunately, in a distributed system such as this, where multiple users may be simultaneous
clients of the Batch Server, there are really multiple threads of control.
The problems arise when a server acts as a client of another server, for example, when the
Batch Server creates an object to spawn a job on another machine using that machine's Machine Server.
While the Batch Server is waiting for a reply from the Machine Server it must continue
processing requests. For example, the Machine Server will need to obtain the job and its
characteristics from the Batch Server while it is creating the process. However, suppose a user
requests that the job be canceled while the job is being started. The batch object's data structure
can then be left in an inconsistent state, resulting in an eventual failure. The preferred solution is to
use mutual exclusion locks to protect sensitive areas of code. Since these locks were not
available to us, we pursued the alternative of carefully coding the application so that methods of
remote objects will not be invoked during critical sections of code. Unfortunately, this somewhat
defeats the purpose of an object-oriented approach, since knowledge of the details of each
object's implementation is needed to write correct code to use it.
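
The preferred solution mentioned above, a mutual exclusion lock around the sensitive state changes, can be sketched as follows. This is only an illustration under stated assumptions, not the report's implementation: the class `BatchJob`, its state names, and its methods are hypothetical, and Python's standard `threading.Lock` stands in for whatever locking primitive the original environment would have offered. The lock is held only while the job record is updated, never across the remote call, so the server can keep processing requests while it waits for a reply.

```python
import threading


class BatchJob:
    """Hypothetical job record; all names here are illustrative assumptions."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.state = "queued"  # queued -> starting -> running, or cancelled
        self.pid = None
        self._lock = threading.Lock()  # guards state and pid

    def begin_start(self):
        """Mark the job as starting, just before the remote spawn request."""
        with self._lock:
            if self.state != "queued":
                return False
            self.state = "starting"
            return True

    def finish_start(self, pid):
        """Record the spawned process, unless the job was cancelled meanwhile."""
        with self._lock:
            if self.state != "starting":
                return False  # caller should clean up the orphaned process
            self.state = "running"
            self.pid = pid
            return True

    def cancel(self):
        """Cancel the job; safe to call while a start is in flight."""
        with self._lock:
            if self.state == "cancelled":
                return False
            self.state = "cancelled"
            return True
```

With this arrangement, a cancel request that arrives between `begin_start` and `finish_start` leaves the record in a well-defined `cancelled` state, and the later `finish_start` call reports failure instead of overwriting it.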
4.2 DISTRIBUTED QUEUING SYSTEM
The Distributed Queuing System (DQS) is a publicly available batch queuing system related to
the widely used Network Queuing System (NQS). It is a complete rewrite of NQS and provides
the additional capability of being able to run parallel jobs across some or all of the nodes in a
cluster of machines.
Although some bugs have arisen in DQS on our system, after applying our fixes DQS
seems exceptionally stable. Furthermore, the DQS development team has been quite responsive
in applying our fixes to their code so that the bugs will not affect us in future releases.
The main problem with DQS is its simplistic scheduling system for parallel jobs. When a
parallel job is at the top of the queue, it can run only if the minimum number of processors it
requests is available. Otherwise, the next job is considered, and so on. If one of these jobs
requires no more processors than are available, it is run. Thus, the parallel job may
be delayed indefinitely. We plan to monitor the performance of DQS to
determine if and how the scheduling algorithm needs to be rewritten.
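
The scheduling behavior described above can be illustrated with a short sketch. It is a simplified model written for this discussion, not DQS code: the `Job` tuple, the function names, and the assumption that each job uses exactly its minimum processor request are all hypothetical.

```python
from collections import namedtuple

# Hypothetical job description: a name and a minimum processor request.
Job = namedtuple("Job", ["name", "min_procs"])


def pick_next(queue, free_procs):
    """Return the first queued job whose minimum request fits, else None."""
    for job in queue:
        if job.min_procs <= free_procs:
            return job
    return None


def dispatch(queue, free_procs):
    """Start jobs in queue order until none fits.

    Returns the names of started jobs and the names of jobs left waiting.
    """
    pending = list(queue)
    started = []
    while True:
        job = pick_next(pending, free_procs)
        if job is None:
            break
        pending.remove(job)
        free_procs -= job.min_procs  # assumes the job takes exactly its minimum
        started.append(job.name)
    return started, [j.name for j in pending]


queue = [Job("parallel", 8), Job("serial-a", 2), Job("serial-b", 2)]
started, waiting = dispatch(queue, free_procs=4)
```

Here the parallel job at the head of the queue needs 8 processors while only 4 are free, so the two smaller jobs behind it are started and consume the free processors. If small jobs keep arriving, the parallel job never sees enough free processors at once, which is the indefinite delay the text describes.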
Reference the current page of this Report.
Johnson, M. M.; Yoshimura, A. S. & Goldsby, M. E. Infrastructure for distributed enterprise simulation, report, January 1, 1998; United States. (https://digital.library.unt.edu/ark:/67531/metadc690726/m1/42/: accessed April 19, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.