Description: This research describes Fountain, a suite of programs used to monitor the resources of a cluster. A cluster is a collection of individual computers that are connected via a high speed communication network. They are traditionally used by users who desire more resources, such as processing power and memory, than any single computer can provide. A common drawback to effectively utilizing such a large-scale system is the management infrastructure, which often does not often scale well as the system grows. Large-scale parallel systems provide new research challenges in the area of systems software, the programs or tools that manage the system from boot-up to running a parallel job. The approach presented in this thesis utilizes a collection of separate components that communicate with each other to achieve a common goal. While systems software comprises a broad array of components, this thesis focuses on the design choices for a node monitoring component. We will describe Fountain, an implementation of the Scalable Systems Software (SSS) node monitor specification. It is targeted at aggregate node monitoring for clusters, focusing on both scalability and fault tolerance as its design goals. It leverages widely used technologies such as XML and HTTP to present an interface to other components in the SSS environment.
Date: August 9, 2006
Creator: Miller, Samuel James
Partner: UNT Libraries Government Documents Department