This article is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to Digital Library by the UNT Libraries Government Documents Department.
Beyond the CM-5 : A Case Study in Performance Analysis for
the CM-5, T3D, and High Performance RISC Workstations
David M. Beazley
Department of Computer Science
University of Utah
Salt Lake City, UT 84112

Peter S. Lomdahl
Theoretical Division
Los Alamos National Laboratory
Los Alamos, NM 87545
March 22, 1995
We present a comprehensive performance evaluation of our molecular dynamics code SPaSM on
the CM-5 in order to devise optimization strategies for the CM-5, T3D, and RISC workstations.
In this analysis, we focus on the effective use of the SPARC microprocessor by performing
measurements of instruction set utilization, cache effects, memory access patterns, and pipeline stall
cycles. We then show that we can account for more than 99% of the observed execution time of our
program. Optimization strategies are devised and we show that our highly optimized ANSI C program
running only on the SPARC microprocessor of the CM-5 is only twice as slow as our Gordon-Bell
prize winning code that utilized the CM-5 vector units. On the CM-5E, we show that this optimized
code runs faster than the vector unit version. We then apply these techniques to the Cray T3D and
measure resulting speedups. Finally, we show that simple optimization strategies are effective on a
wide variety of high performance RISC workstations.
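As an illustration of the kind of accounting described above, the following is a minimal sketch of the textbook CPU-time model (execution time as instruction cycles plus stall cycles, divided by the clock rate). It is not taken from SPaSM or from the measurements in this paper; the struct, the function names, and any numbers fed to them are hypothetical and serve only to show how a predicted time can be compared against an observed one.

```c
/* Sketch of the standard CPU-time accounting model.
   All names here are hypothetical; nothing is taken from SPaSM. */

typedef struct {
    double instructions;  /* dynamic instruction count */
    double base_cpi;      /* cycles per instruction assuming no stalls */
    double stall_cycles;  /* cache-miss plus pipeline stall cycles */
    double clock_hz;      /* processor clock rate in Hz */
} cpu_profile;

/* Predicted time = (instructions * base CPI + stall cycles) / clock rate. */
double predicted_seconds(const cpu_profile *p)
{
    return (p->instructions * p->base_cpi + p->stall_cycles) / p->clock_hz;
}

/* Fraction of the measured execution time that the model explains. */
double fraction_accounted(const cpu_profile *p, double measured_seconds)
{
    return predicted_seconds(p) / measured_seconds;
}
```

A result from `fraction_accounted` close to 1.0 would indicate that the counted instructions and stalls explain nearly all of the measured run time, which is the sense in which an analysis can "account for" observed execution time.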
It has often been remarked by history professors that "the only way to know where you're going
is to know where you've been." In this paper, we will apply this simple principle to our molecular
dynamics code developed for the CM-5 by performing a detailed study of processor utilization.
After more than two years of work on the CM-5, it is now time to start thinking about other
architectures. To get high performance on new systems, we feel that we must step back
and make a thorough assessment of how we used the CM-5 in the first place. By performing
this study, we hope to discover performance problems so that we can avoid the other historical
observation that "history tends to repeat itself."
Our performance study will take an unconventional approach. Rather than focusing on
Mflops, communications times, scaling, and other popular measurements, we will explore a very
simple question: How well did our code utilize the processing power of the CM-5 processing nodes?
To answer this question, we will attempt to quantify code behavior in a way similar to that found
in Hennessy and Patterson. We will focus on instruction set utilization, stall behavior, and
cache effects in addition to scientific and algorithmic considerations. From this information we
will suggest optimization strategies and measure resulting speedups. Finally, we will test these
optimizations on the Cray T3D and high performance HP, SGI and IBM workstations. As RISC
Beazley, D.M. & Lomdahl, P.S. Beyond the CM-5: A case study in performance analysis for the CM-5, T3D, and high performance RISC workstations, article, March 22, 1995; New Mexico. (digital.library.unt.edu/ark:/67531/metadc622361/m1/3/: accessed November 18, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT Libraries Government Documents Department.