Figure 1: Time Line of a 16 Task MPICH Broadcast
2. Collective Communications Model
A method to benchmark implementations of collective communications needs to measure several properties. Almost all collective communication benchmarks attempt to measure the time required to complete the communication, from its first send until its last receive. Although this is an important quantity, these methods overlook several other important properties, such as local processing overheads and the potential to overlap computation with communication. In this section, we use a model of collective communications based on the LogP model to characterize the important performance properties of collective communications.
In the LogP model, four parameters capture point-to-point communication. The send overhead, os, is the time during which a processor is sending a message, while the receive overhead, or, is the portion of the time that a processor is receiving a message that cannot be overlapped with the message transmission. The (wire) latency, L, is the time that a message actually spends in transit from its source to its destination; the more conventional definition of message latency is equal to os + L + or. The final parameter, the gap, g, measures the ability to overlap computation and communication while fully utilizing the communication system and is equal to the minimum interval between consecutive message sends or receives.
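A minimal sketch of how the conventional point-to-point latency, os + L + or, and the send overhead, os, can be estimated from a simple MPI ping-pong is shown below. This is an illustration only, not the benchmark described in this paper; the constant REPS, the message size, and the two-rank setup are assumptions made for the example.

/* Illustrative sketch (not the authors' benchmark): estimate the conventional
 * point-to-point latency, os + L + or, from half a ping-pong round trip, and
 * approximate the send overhead, os, by the time spent inside MPI_Send.
 * Assumes at least two ranks and a small fixed message size. */
#include <mpi.h>
#include <stdio.h>

#define REPS 1000   /* number of ping-pong repetitions (assumed value) */

int main(int argc, char **argv) {
    int rank;
    char msg[8] = {0};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t_rtt = 0.0, t_send = 0.0;
    for (int i = 0; i < REPS; i++) {
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0) {
            double t0 = MPI_Wtime();
            MPI_Send(msg, sizeof msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            t_send += MPI_Wtime() - t0;   /* time inside the send call, roughly os */
            MPI_Recv(msg, sizeof msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            t_rtt += MPI_Wtime() - t0;    /* full round-trip time */
        } else if (rank == 1) {
            MPI_Recv(msg, sizeof msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(msg, sizeof msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)
        printf("one-way latency (os + L + or) ~ %g s, os ~ %g s\n",
               t_rtt / REPS / 2.0, t_send / REPS);
    MPI_Finalize();
    return 0;
}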
We extend the LogP model with per processor parameters to capture collective communications more accurately. Our extensions apply to both asymmetrical (collective communications with a root) and symmetrical collective communications. The per processor overhead is the time, oi, spent sending and receiving messages by each processor, i, that participates in the collective communication. The per processor overheads can be measured with a method similar to that used to measure the overhead of point-to-point communications. The minimum interval of time, gi, between consecutive occurrences of the same collective communication at processor i is the per processor gap, which can be measured simply by timing repeated occurrences of the operation at each processor.
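The sketch below shows one simple way to time repeated occurrences of a broadcast at each rank, which gives a per processor gap estimate in the spirit of gi. It is an assumed setup, not the paper's measurement code; REPS and the use of rank 0 as root are choices made for the example.

/* Illustrative sketch (assumed setup): estimate the per processor gap, gi,
 * by timing REPS back-to-back broadcasts at each rank and taking the mean
 * interval between consecutive occurrences. */
#include <mpi.h>
#include <stdio.h>

#define REPS 1000   /* number of repeated broadcasts (assumed value) */

int main(int argc, char **argv) {
    int rank, size;
    double buf = 0.0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Barrier(MPI_COMM_WORLD);              /* rough common starting point */
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++)
        MPI_Bcast(&buf, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    double gi = (MPI_Wtime() - t0) / REPS;    /* mean interval at this rank */

    printf("rank %d of %d: per processor gap gi ~ %g s\n", rank, size, gi);
    MPI_Finalize();
    return 0;
}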
Figure 1 shows the time line of a 16 task broadcast for the binomial tree algorithm used in MPICH, a popular implementation of the MPI standard. In our figures, we assume L and the time spent sending or receiving a message are constant. Our broadcast benchmark method does not rely on these assumptions, which do not hold in general, particularly in grid environments, where latency along different links can vary widely.
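For readers unfamiliar with the binomial tree pattern behind Figure 1, the following simplified sketch shows the communication structure: in each round, every rank that already holds the data sends it one "level" further, so the number of informed ranks doubles per round. This is an illustration of the general pattern, not MPICH's actual source; the function name binomial_bcast and the assumption that rank 0 is the root are ours.

/* Simplified binomial-tree broadcast pattern (illustration only, not MPICH's
 * source). Assumes rank 0 is the root. Each iteration of the loop doubles the
 * set of ranks holding the data, producing the time line shown in Figure 1. */
#include <mpi.h>

void binomial_bcast(void *buf, int count, MPI_Datatype type, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    for (int mask = 1; mask < size; mask <<= 1) {
        if (rank < mask) {
            int dst = rank + mask;            /* ranks that have the data send on */
            if (dst < size)
                MPI_Send(buf, count, type, dst, 0, comm);
        } else if (rank < 2 * mask) {
            int src = rank - mask;            /* ranks one level out receive */
            MPI_Recv(buf, count, type, src, 0, comm, MPI_STATUS_IGNORE);
        }
    }
}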
Most collective communication benchmarks try to measure operation latency, OL, the total time that it takes to complete the communication. In Figure 1, OL = tf - t0, the difference between the time at which the last processor finishes the operation and the time at which the first processor begins the operation. OL is the important latency for collective communications; a method that measures OL without measuring L would be sufficient.
Several factors make measuring OL difficult. The pipelining effect - the potential for overlapping consecutive communications - causes many of the inaccuracies. In addition, the first processor to begin the operation or the last processor to finish the operation is difficult to identify in general, even with algorithmic knowledge. For example, consider a 15 task broadcast in MPICH. Our model predicts that the last processor to finish the operation will be one of tasks 7, 11 and 13. The correct choice varies with each communication due to stochastic delays and overheads. To overcome this difficulty, our method measures the operation latency, OLi, to each destination, i, of the broadcast. The largest of these measurements can be used as a reasonable estimate of OL.
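One way the per-destination idea could be realized is sketched below: for each destination i, the root times the span from the start of the broadcast until a small acknowledgment arrives back from i, subtracts a separately measured return time, and keeps the maximum as an estimate of OL. This is a hedged sketch of the concept, not the paper's exact benchmark; the function name estimate_OL and the pre-measured array ack_time are assumptions for the example.

/* Hedged sketch of per-destination operation latency (not the paper's exact
 * method). The root broadcasts, waits for an acknowledgment from destination
 * i, subtracts a separately measured return time ack_time[i] (assumed to be
 * available), and takes the maximum OLi over all destinations as an estimate
 * of OL. Assumes rank 0 is the root. */
#include <mpi.h>

double estimate_OL(const double *ack_time, MPI_Comm comm) {
    int rank, size;
    double buf = 0.0, OL = 0.0;
    char ack = 0;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    for (int i = 1; i < size; i++) {          /* one destination per trial */
        MPI_Barrier(comm);
        double t0 = MPI_Wtime();
        MPI_Bcast(&buf, 1, MPI_DOUBLE, 0, comm);
        if (rank == 0) {
            MPI_Recv(&ack, 1, MPI_CHAR, i, 1, comm, MPI_STATUS_IGNORE);
            double OLi = (MPI_Wtime() - t0) - ack_time[i];  /* OL to destination i */
            if (OLi > OL) OL = OLi;
        } else if (rank == i) {
            MPI_Send(&ack, 1, MPI_CHAR, 0, 1, comm);
        }
    }
    return OL;                                /* meaningful at the root only */
}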