What Scientific Applications can Benefit from Hardware Transactional Memory? - Early experience from a commercially available HTM system. Page: 3 of 12
This article is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to UNT Digital Library by the UNT Libraries Government Documents Department.
What Scientific Applications can Benefit from Hardware
Transactional Memory?
Early experience from a commercially available HTM system.
ABSTRACT
Achieving efficient and correct synchronization of multiple
threads is a difficult and error-prone task at small scale and,
as we march towards extreme scale computing, will be even
more challenging when the resulting application is supposed
to utilize millions of cores efficiently. Transactional
Memory (TM) is a promising technique to ease the burden on
the programmer, but it has only recently become available on
commercial hardware, in the new Blue Gene/Q system, and
hence its real benefit for realistic applications has not
yet been studied.
This paper presents the first performance results of TM
embedded into OpenMP on a prototype system of BG/Q
and characterizes code properties that will likely lead to
benefits when augmented with TM primitives. First, we
study the influence of thread count, environment variables,
and memory layout on TM performance and identify code
properties that will yield performance gains with TM.
Second, we evaluate the combination of OpenMP with multiple
synchronization primitives on top of MPI to determine
suitable task-to-thread ratios per node. Finally, we condense
our findings into a set of best practices and apply them to
a Monte Carlo benchmark that closely represents a real-world
application, to optimize its performance. This optimized
TM version, executed with 64 threads on one node, yields a
speedup of 26.47 over baseline.
General Terms
Performance Analysis, Hardware Transactional Memory
Keywords
Performance Analysis, Hardware Transactional Memory,
Early Experience, BG/Q
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
26th International Conference on Supercomputing '12 San Servolo Island,
Venice, Italy
Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$10.00.
1. INTRODUCTION
Achieving efficient and correct synchronization of multiple
threads is a difficult and error-prone task. In particular,
lock-based synchronization schemes often lead to high
overheads, either due to lock contention, when using
coarse-grained locks, or due to unnecessary lock overhead,
when using fine-grained locks. This not only slows down the
process that uses the locks, but also has a global effect in
large-scale programs: it creates skew between processes as
well as load imbalance, both major factors limiting the
scalability of applications.
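The trade-off between coarse-grained and fine-grained locks can be made concrete with a small sketch. It is written in Python for brevity and is not taken from the paper; the class names `CoarseHistogram` and `FineHistogram` and the helper `fill` are hypothetical. Python's global interpreter lock prevents any real parallel speedup here, so the sketch only shows the structure of the two schemes, not their performance.

```python
import threading


class CoarseHistogram:
    """One lock guards every bin: simple, but all updates serialize."""

    def __init__(self, nbins):
        self.bins = [0] * nbins
        self.lock = threading.Lock()

    def add(self, i):
        with self.lock:  # every thread contends for this single lock
            self.bins[i] += 1


class FineHistogram:
    """One lock per bin: updates to different bins do not contend,
    but every update still pays for a lock acquire and release."""

    def __init__(self, nbins):
        self.bins = [0] * nbins
        self.locks = [threading.Lock() for _ in range(nbins)]

    def add(self, i):
        with self.locks[i]:  # only threads hitting bin i contend
            self.bins[i] += 1


def fill(hist, nthreads=4, updates=1000):
    """Hypothetical driver: each thread spreads its updates over all bins."""
    nbins = len(hist.bins)

    def work(tid):
        for k in range(updates):
            hist.add((tid + k) % nbins)

    threads = [threading.Thread(target=work, args=(t,)) for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return hist.bins
```

Both variants produce the same histogram; they differ only in how much of the data structure a thread locks for each update.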
Transactional Memory (TM) was proposed almost a decade ago
to tackle this issue in shared memory systems [15]. TM
simplifies synchronization by providing a simple construct:
the programmer wraps the critical instructions in a
transaction (also called an atomic block). These
transactions are then executed optimistically in parallel,
and conflicting accesses are resolved by a TM runtime system.
As a consequence, only the effects of complete transactions
are visible to concurrent threads, so intermediate memory
states are never exposed.
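The optimistic execution model described above can be illustrated with a minimal software sketch. This is an analogy only: it is not the BG/Q hardware mechanism, not the OpenMP integration evaluated in the paper, and all names (`VersionedCell`, `atomically`, `run_counter_demo`) are hypothetical. A transaction reads a snapshot, computes its result without holding any lock, and commits only if no other transaction has committed in the meantime; otherwise it aborts and retries.

```python
import threading


class VersionedCell:
    """A single shared value updated through optimistic transactions."""

    def __init__(self, value):
        self.value = value
        self.version = 0   # incremented on every successful commit
        self.aborts = 0    # number of conflicts that forced a retry
        self._guard = threading.Lock()

    def atomically(self, update):
        """Apply update(old) -> new as one transaction, retrying on conflict."""
        while True:
            # Take a consistent snapshot of (version, value).
            with self._guard:
                seen_version, seen_value = self.version, self.value
            # Speculative phase: compute without holding the lock, so other
            # transactions may run and commit concurrently.
            new_value = update(seen_value)
            # Commit phase: validate that nobody committed since the snapshot.
            with self._guard:
                if self.version == seen_version:
                    self.value = new_value
                    self.version += 1
                    return new_value
                self.aborts += 1  # conflict detected: discard and retry


def run_counter_demo(nthreads=4, increments=500):
    """Hypothetical driver: several threads increment one shared counter."""
    cell = VersionedCell(0)

    def work():
        for _ in range(increments):
            cell.atomically(lambda v: v + 1)

    threads = [threading.Thread(target=work) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell
```

Because a result is published only in the commit step, other threads never observe a partially computed value, which mirrors the visibility guarantee stated above. The `aborts` counter hints at why contention matters: every conflict wastes the speculative work of one transaction.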
Except for a few, by now discontinued, prototype
implementations in research processors, TM has mainly been
confined to software solutions and has therefore been
burdened with significant runtime overheads that severely
restrict its applicability in high performance computing.
However, the recently introduced Blue Gene/Q (BG/Q) system
by IBM provides, for the first time, Hardware Transactional
Memory (HTM) in a commercially available platform. BG/Q is
a large-scale platform designed for scientific computing
workloads. The first full machine will be installed at
Lawrence Livermore National Laboratory and will provide
more than 1.6 million compute cores with a total of over
6 million hardware threads, making application scalability
one of the premier challenges on this machine.
This paper presents the first performance evaluation of
the HTM capabilities on BG/Q from the application per-
spective. Not every lock-based application will be able to
benefit from HTM and it is important to understand what
code properties lead to efficient executions and, hence, which
codes can benefit from a port to HTM. In order to help
code developers with this task, we provide a precise evalua-
tion of the strengths and weaknesses of the architecture as
well as what is required to map applications to the
architecture in an efficient way. In particular, we focus on
the synchronization primitives for parallel programming in
shared memory architectures with OpenMP and provide
detailed benchmark results. Our experiments take into account
the application's characteristics (high or low contention), the
influence of environment variables, the effects of enlarging
transaction sizes, and hybrid parallelization with MPI. We
Schindewolf, M; Schulz, M; Bihari, B; Gyllenhaal, J; Wang, A & Karl, W. What Scientific Applications can Benefit from Hardware Transactional Memory? - Early experience from a commercially available HTM system., article, January 19, 2012; Livermore, California. (https://digital.library.unt.edu/ark:/67531/metadc832072/m1/3/: accessed April 23, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.