Performance Modeling and Optimization of a High Energy Colliding Beam Simulation Code Page: 4 of 13
This article is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to UNT Digital Library by the UNT Libraries Government Documents Department.
[Figure 3 plot: "Total Execution Time and Communication Time" for Seaborg, Bassi, Jacquard, Jaguar, and BlueGene at 32, 64, 128, and 256 processors; axes: Total Execution Time, Communication Time, % Communication.]
Figure 3: Total execution time, and absolute and relative communication time, of BeamBeam3D. Times are given in seconds on the left vertical axis and as a percentage of total time on the right vertical axis.

containing a number of slices which are assigned to this column of processors cyclically along the longitudinal direction. This gives a good load balance of slices among the different processor columns. Within each column, the computational grid associated with each slice is decomposed uniformly among all the processors of that column. This allows the solution of the Poisson equation for each slice to be parallelized.
BeamBeam3D is implemented in Fortran 90 using MPI. Table 1 summarizes the most important communication steps and the surrounding loop structures.
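The two-level decomposition described above can be sketched as follows. This is a minimal Python illustration, not BeamBeam3D's Fortran 90/MPI code: the helper names `slices_for_column` and `row_block`, the choice to split each slice's grid along y into contiguous row blocks, and the example sizes are all assumptions made for illustration.

```python
"""Sketch of a two-level slice/grid decomposition (hypothetical helper names)."""


def slices_for_column(col, num_cols, num_slices):
    """Cyclic assignment: slice s is owned by processor column s % num_cols."""
    return list(range(col, num_slices, num_cols))


def row_block(rank_in_col, procs_per_col, grid_rows):
    """Uniform block split of one slice's grid rows within a processor column.

    Assumption: the grid is split along y into contiguous rows.
    Returns the half-open range [start, stop); the first
    grid_rows % procs_per_col processors each get one extra row.
    """
    base, extra = divmod(grid_rows, procs_per_col)
    start = rank_in_col * base + min(rank_in_col, extra)
    stop = start + base + (1 if rank_in_col < extra else 0)
    return start, stop


if __name__ == "__main__":
    num_cols, procs_per_col = 4, 4    # hypothetical 4 x 4 processor grid
    num_slices, grid_rows = 10, 64    # hypothetical problem size
    for col in range(num_cols):
        print("column", col, "slices", slices_for_column(col, num_cols, num_slices))
    for r in range(procs_per_col):
        print("rank", r, "rows", row_block(r, procs_per_col, grid_rows))
```

With 10 slices on 4 columns, the cyclic rule gives columns 0 and 1 three slices each and columns 2 and 3 two slices each, so per-column slice counts differ by at most one, which is the load-balance property the text refers to.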
3 Performance Analysis
BeamBeam3D is used on several parallel computing systems. We selected five major systems for the development of the performance model and as targets for our code optimization. Table 2 summarizes the main features of these five systems. This selection contains systems with single-processor nodes and with two-, eight-, and 16-way SMP nodes, as well as interconnects with fat-tree or 3D torus topology using different technologies. The resulting global system architectures are hierarchical, with substantially different locality structures. The full set represents a large variety of communication hierarchies, with various levels of communication performance and networks with different contention points.

Name       System        Network      T.   SMP    Proc. Peak   BW
                                           size   [GF/s]       [MB/s]
Seaborg    SP Power3     Colony       FT   16     1.5          175
Bassi      SP Power5     Federation   FT   8      7.6          1112
Jacquard   Opteron       InfiniBand   FT   2      4.4          360
Jaguar     XT3           SeaStar      3D   1      4.8          1084
BG/L       BlueGene/L    IBM custom   3D   1*     2.8          142
Table 2: Main architectural features of the systems used in this study. Topology (T.) is fat-tree (FT) or 3D torus (3D); BW is the bidirectional link bandwidth. (*On BG/L, only one processor per node was used.)
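To put the bandwidth and peak-performance columns of Table 2 on a common footing, one can compute a rough bytes-per-flop ratio for each system. The sketch below is an illustrative calculation, not a metric used in this study; it assumes that all processors of an SMP node share the single listed link, that 1 MB = 10^6 bytes, and it reads the Jaguar SMP size as 1, consistent with the single-processor nodes mentioned in the text. The names `SYSTEMS` and `bytes_per_flop` are hypothetical.

```python
"""Rough network-to-compute balance from the Table 2 values (illustrative)."""

# (name, processors per SMP node used, peak GF/s per processor, link BW in MB/s)
SYSTEMS = [
    ("Seaborg", 16, 1.5, 175),
    ("Bassi", 8, 7.6, 1112),
    ("Jacquard", 2, 4.4, 360),
    ("Jaguar", 1, 4.8, 1084),  # SMP size read as 1 (assumption)
    ("BG/L", 1, 2.8, 142),     # only one processor per node was used
]


def bytes_per_flop(procs_per_node, peak_gflops, link_mbs):
    """Link bytes per second available per peak flop/s of one node.

    Assumes all processors of a node share the one listed link and
    that 1 MB = 1e6 bytes.
    """
    node_flops = procs_per_node * peak_gflops * 1e9
    return link_mbs * 1e6 / node_flops


if __name__ == "__main__":
    for name, ppn, peak, bw in SYSTEMS:
        print(f"{name:9s} {bytes_per_flop(ppn, peak, bw):.4f} B/flop")
```

Under these assumptions the ratio spans roughly a factor of 30, from about 0.007 B/flop on Seaborg to about 0.23 B/flop on Jaguar.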
Figure 3 shows the total execution time and communication time on all five systems across increasing concurrency levels for a fixed, typical problem size with 5 million particles, a grid size of 256² × 8 slices, and a process layout with 16 processes in the column (y) direction (for up to 256 processors total). The limited scalability for this case is evident. Increasing the number of simulated particles would improve performance and scalability, but would not represent typical usage of this code. The chosen concurrency levels (32 to 256 processors) range from typical levels up to the maximum levels that are currently usable in practice. While total execution time decreases (though only slightly, and approaching an asymptote), communication time actually increases with increasing concurrency on all systems. This reflects the fact that the volume of communication for each processor is constant, independent of the number of
Shan, Hongzhang; Strohmaier, Erich; Qiang, Ji; Bailey, David H. & Yelick, Kathy. Performance Modeling and Optimization of a High Energy Colliding Beam Simulation Code, article, June 1, 2006; (https://digital.library.unt.edu/ark:/67531/metadc901446/m1/4/: accessed April 19, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.