COMPUTER SCIENCE RESEARCH MELISSES: Liquid Services for Scalable Multithreaded and Multicore Execution on Emerging Supercomputers Page: 3 of 6
This report is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to Digital Library by the UNT Libraries Government Documents Department.
and collaborated with Dr. Bronis de Supinski and Dr. Martin Schulz, of the Center for Applied Scientific
Computing, in this research.
We have ported MELISSES to AMD Opteron Quad-Core Socket-F processors. The AMD Opteron
port enabled us to stress-test MELISSES and its capabilities for simultaneous optimization of performance,
power and temperature on multicore platforms with NUMA architecture, using two power-aware clusters
at Virginia Tech. We have also investigated virtualization of the MELISSES infrastructure using the Xen
hypervisor. A preliminary publication on the integration of the MELISSES hardware event monitor in
virtualized environments appeared in .
3 Earlier contributions
We summarize here the earlier contributions from this research, all of which have been documented
in earlier reports:
We have hardened and released PACMAN (http://www.cs.wm.edu/pacman), an implementation of
our continuous profiler, which provides accurate hardware event counters on a thread-local basis, at
sub-microsecond granularity, on Intel Hyperthreaded processors. PACMAN has been used to implement a
number of performance and power-related optimizations for multithreaded codes running on layered parallel
The first successful demonstration of MELISSES capabilities was a profile-driven parallelization scheme
for multithreaded codes, in which each parallel region was parallelized individually using either
speculative precomputation with helper threads or non-speculative thread-level parallelization (TLP).
Regions that exhibit ample instruction-level parallelism with low memory access rates are parallelized
with conventional TLP methods, whereas regions with limited instruction-level parallelism and high memory
access rates are not parallelized. They are executed instead with speculative precomputation, which
pre-executes long-latency memory accesses. MELISSES assists in locating the critical memory accesses
that are responsible for most of the memory latency and that are offloaded for precomputation on helper
threads. Runtime mechanisms and schemes for combining TLP with speculative precomputation via the use of
MELISSES were presented in . Another relevant publication  addressed the problem of devising effective
speculative precomputation schemes for floating point scientific codes.
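The per-region decision described above can be illustrated with a minimal sketch. It is not the
MELISSES implementation: the profile fields, the use of IPC as a proxy for instruction-level
parallelism, the two threshold values and the "undecided" fallback for mixed cases are all
assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class RegionProfile:
    """Hypothetical per-region profile built from hardware event counters."""
    name: str
    ipc: float                  # instructions per cycle, used here as a proxy for ILP
    mem_refs_per_kinstr: float  # long-latency memory accesses per 1000 instructions


# Illustrative thresholds only; the report does not state any values.
IPC_THRESHOLD = 1.0
MEM_RATE_THRESHOLD = 10.0


def choose_strategy(region: RegionProfile) -> str:
    """Pick an execution strategy for one parallel region."""
    high_ilp = region.ipc >= IPC_THRESHOLD
    high_mem = region.mem_refs_per_kinstr >= MEM_RATE_THRESHOLD
    if high_ilp and not high_mem:
        # Ample ILP, low memory access rate: conventional thread-level parallelization.
        return "tlp"
    if not high_ilp and high_mem:
        # Limited ILP, high memory access rate: helper threads pre-execute memory accesses.
        return "speculative_precomputation"
    # Mixed cases are not covered by the text; this fallback is an assumption.
    return "undecided"


regions = [
    RegionProfile("dense_kernel", ipc=1.8, mem_refs_per_kinstr=2.0),
    RegionProfile("pointer_chase", ipc=0.4, mem_refs_per_kinstr=35.0),
]
plan = {r.name: choose_strategy(r) for r in regions}
```

In this sketch each region is classified independently, which mirrors the statement that every
parallel region is parallelized individually.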
The design and implementation of PACMAN are discussed in . We deployed MELISSES and our continuous
monitoring technology to achieve simultaneous optimization of performance and power on several
layered multicore platforms. The results of this work appeared in [12, 11, 13]. The distinguishing aspect
of this work is that it is the first to demonstrate concurrent improvement of both power and performance
on a high-end computing platform. Using MELISSES and runtime scalability predictors, a technology we
developed from scratch to accurately characterize and predict power and performance in phases of
multithreaded code using non-linear regression, we have been able to improve performance by 22% on average,
while reducing energy consumption by 26% on average, on Intel platforms with up to 8 cores distributed
among 4 processors. More specifically, MELISSES isolates phases of multithreaded execution delimited
by loops and function calls, characterizes each phase in terms of scaling at all layers of a parallel
architecture (including the processor layer, the core layer within processors and the thread layer within
cores), and locates an execution sweet spot for each phase, in which maximum scalability is retained while
the system deactivates threads, cores or entire processors to reduce power consumption.
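The sweet-spot search for a single phase can be sketched as follows. This is an illustration under
stated assumptions, not the MELISSES algorithm: the predicted execution times are taken as given
(standing in for the output of the regression-based predictors), the 5% tolerance is a hypothetical
parameter, and the number of active hardware contexts is used as a crude stand-in for power.

```python
from typing import Dict, Tuple

# (processors, cores per processor, threads per core); a hypothetical encoding
Config = Tuple[int, int, int]


def active_contexts(cfg: Config) -> int:
    """Number of hardware contexts kept active by a configuration."""
    processors, cores, threads = cfg
    return processors * cores * threads


def sweet_spot(predicted_time: Dict[Config, float], tolerance: float = 0.05) -> Config:
    """Return the smallest configuration whose predicted time is near the best one.

    predicted_time maps each candidate configuration to a predicted execution
    time for the phase. The tolerance is an assumed tuning knob.
    """
    best = min(predicted_time.values())
    candidates = [cfg for cfg, t in predicted_time.items()
                  if t <= best * (1.0 + tolerance)]
    # Prefer fewer active contexts, then fewer processors, then fewer cores.
    return min(candidates, key=lambda cfg: (active_contexts(cfg), cfg[0], cfg[1]))


# Made-up predictions for one phase, for illustration only.
phase_predictions = {
    (1, 1, 1): 10.0,
    (2, 2, 1): 4.0,
    (4, 2, 1): 3.9,
    (4, 2, 2): 4.5,
}
chosen = sweet_spot(phase_predictions)
```

With these made-up numbers the sketch selects two processors with two cores each and one thread per
core: that configuration is within the tolerance of the fastest prediction while leaving the
remaining processors and threads free to be deactivated.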
We have used a module of MELISSES that conducts statistical analysis of memory references to
monitor cache access behavior in the SimICS multiprocessor simulator. This monitoring module has been
used to derive dynamic data re-mapping algorithms for large L2 caches . In a continuation of this work,
we used MELISSES on the SimICS full-system simulation platform to implement speculative precomputation
schemes that reduce remote memory access latency on layered parallel architectures with non-uniform
Nikolopoulos, Dimitrios S. COMPUTER SCIENCE RESEARCH MELISSES: Liquid Services for Scalable Multithreaded and Multicore Execution on Emerging Supercomputers, report, August 10, 2008; United States. (digital.library.unt.edu/ark:/67531/metadc933258/m1/3/: accessed December 19, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT Libraries Government Documents Department.