Center for Programming Models for Scalable Parallel Computing: Future Programming Models Page: 3 of 8
This report is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to UNT Digital Library by the UNT Libraries Government Documents Department.
For example, we adapted the well-known MCS spin lock algorithm to C64 as a new
extension called MCS-SW, which uses the efficient sleep/wakeup support of C64.
Instead of spinning, a thread waiting for a lock goes to sleep after it adds itself to the
linked-list based waiting queue. When the lock owner releases the lock, it wakes up its
successor by sending a wakeup signal. MCS-SW also uses less memory space than the
original implementation. As shown in Figure 1, the overhead of MCS-SW is several
cycles lower than that of the original MCS lock when more than one thread is used.
Additionally, MCS-SW executes far fewer instructions per lock/release pair and
therefore consumes less power. In summary, MCS-SW is a more efficient lock algorithm
for C64 than the original MCS spin lock in all three aspects of interest: time, memory,
and power.
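The queueing and wakeup behavior described above can be sketched in a few lines. This is a minimal illustration only, not the C64 implementation: it assumes Python's threading.Event as a stand-in for the C64 sleep/wakeup signal and emulates the atomic swap and compare-and-swap on the queue tail with an internal lock. The names QueueNode, MCSSleepWakeLock, and run_demo are hypothetical, and the sketch makes no attempt to reproduce the memory savings mentioned in the text.

```python
import threading
import time


class QueueNode:
    """Per-thread entry in the waiting queue (hypothetical name)."""

    def __init__(self):
        self.next = None
        # Assumption: Event stands in for the hardware sleep/wakeup signal.
        self.wakeup = threading.Event()


class MCSSleepWakeLock:
    """MCS-style queue lock whose waiters sleep instead of spinning."""

    def __init__(self):
        self._tail = None
        # Assumption: emulates the atomic swap/CAS that hardware would provide.
        self._atomic = threading.Lock()

    def _swap_tail(self, new):
        with self._atomic:
            old, self._tail = self._tail, new
            return old

    def _cas_tail(self, expected, new):
        with self._atomic:
            if self._tail is expected:
                self._tail = new
                return True
            return False

    def acquire(self, node):
        node.next = None
        node.wakeup.clear()
        pred = self._swap_tail(node)   # add this thread to the end of the queue
        if pred is not None:
            pred.next = node           # link behind the predecessor
            node.wakeup.wait()         # sleep until the predecessor wakes us

    def release(self, node):
        if node.next is None:
            # No visible successor: try to mark the queue empty.
            if self._cas_tail(node, None):
                return
            # A successor has swapped the tail but not linked itself yet.
            while node.next is None:
                time.sleep(0)
        node.next.wakeup.set()         # send the wakeup signal to the successor


def run_demo(num_threads=4, iterations=200):
    """Increment a shared counter under the lock and return its final value."""
    lock = MCSSleepWakeLock()
    counter = [0]

    def worker():
        node = QueueNode()
        for _ in range(iterations):
            lock.acquire(node)
            value = counter[0]         # deliberately non-atomic read-modify-write
            counter[0] = value + 1
            lock.release(node)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(run_demo())
```

Because the Event remembers a signal that arrives before the waiter calls wait(), a wakeup sent early is not lost in this sketch; whether the C64 signal behaves the same way is not stated in the text.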
Together, the three optimizations reduce the overhead of OpenMP language
constructs by 80% (example in Figure 2). We believe that such a significant
reduction in the cost of managing parallelism makes OpenMP better suited to
writing parallel programs on the C64 platform.
Figure 2: Overhead (us) of OpenMP parallel and parallel for (base: direct porting; opt: optimized)
Moreover, to better understand the behavior and performance characteristics of
OpenMP programs on multi-core architectures, we have studied the performance of
basic OpenMP language constructs on C64, a multi-core chip architecture.
Compared with previous work on conventional SMP systems, the overhead of OpenMP
language constructs on C64 is at least one order of magnitude lower. For example,
when 128 threads execute a barrier concurrently with our OpenMP implementation
on C64, the overhead is only 983 cycles (see Figure 3). In comparison, a barrier
performed with tens of threads normally takes tens of thousands of cycles on a
conventional commodity SMP system.
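The "one order of magnitude" claim can be checked with simple arithmetic on the barrier figures quoted above. The sketch below assumes 10,000 cycles as the smallest value that qualifies as "tens of thousands"; that lower bound and the function name overhead_ratio are illustrative assumptions, not measurements from the report.

```python
import math

C64_BARRIER_CYCLES = 983         # from the text: 128 threads on C64
SMP_BARRIER_CYCLES_LOW = 10_000  # assumption: low end of "tens of thousands"


def overhead_ratio(smp_cycles, c64_cycles=C64_BARRIER_CYCLES):
    """Return how many times larger the SMP barrier overhead is."""
    return smp_cycles / c64_cycles


if __name__ == "__main__":
    ratio = overhead_ratio(SMP_BARRIER_CYCLES_LOW)
    print(f"ratio: {ratio:.2f}, orders of magnitude: {math.log10(ratio):.2f}")
```

Even at this assumed lower bound the ratio exceeds 10, and the comparison favors the SMP system, since its figure is for tens of threads rather than 128.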
Gao, Guang, R. Center for Programming Models for Scalable Parallel Computing: Future Programming Models, report, July 24, 2008; United States. (https://digital.library.unt.edu/ark:/67531/metadc896971/m1/3/: accessed March 19, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.