Publisher Info:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (United States)
Place of Publication:
Berkeley, California
Provided By
UNT Libraries Government Documents Department
Serving as both a federal and a state depository library, the UNT Libraries Government Documents Department maintains millions of items in a variety of formats. The department is a member of the FDLP Content Partnerships Program and an Affiliated Archive of the National Archives.
Description
To address the needs of future scientific applications for storing and accessing large amounts of data efficiently, one needs to understand the limitations of current technologies and how they may cause system instability or unavailability. A number of factors can impact system availability, ranging from facility-wide power outages to single points of failure such as network switches or global file systems. In addition, individual component failures in a system can degrade the performance of that system. This paper focuses on analyzing both of these factors and their impacts on the computational and storage systems at NERSC. The component failure data presented in this report primarily cover disk drive failures in one of the computational systems and tape drive failures in HPSS. NERSC collected available component failure data and system-wide outages for its computational and storage systems over a six-year period and made them available to the HPC community through the Petascale Data Storage Institute.
This report is part of the following collection of related materials.
Office of Scientific & Technical Information Technical Reports
Reports, articles and other documents harvested from the Office of Scientific and Technical Information.
The Office of Scientific and Technical Information (OSTI) is the Department of Energy (DOE) office that collects, preserves, and disseminates DOE-sponsored research and development (R&D) results that are the outcomes of R&D projects or other funded activities at DOE labs and facilities nationwide and by grantees at universities and other institutions.