PERI - Auto-tuning Memory Intensive Kernels for Multicore Page: 3 of 15
This article is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to UNT Digital Library by the UNT Libraries Government Documents Department.
Per-core characteristics

Core Architecture      Intel Core2     AMD Barcelona   Sun Niagara2    STI PPE         STI SPE
Type                   superscalar,    superscalar,    MT,             MT,             SIMD,
                       out of order    out of order    dual issue†     dual issue      dual issue
Clock (GHz)            2.33            2.30            1.16            3.20            3.20
DP GFlop/s             9.33            9.20            1.16            6.4             1.8
Local Store            -               -               -               -               256KB
per-core L1 Data Cache 32KB            64KB            8KB             32KB            -
per-core L2 Cache      -               512KB           -               512KB           -
L1 TLB entries         16              32              128             1024            256
Page Size              4KB             4KB             4MB             4KB             4KB

SMP characteristics

System                 Xeon E5345      Opteron 2356    UltraSparc      QS20 Cell Blade
                       (Clovertown)    (Barcelona)     T5140 T2+       PPE             SPE
                                                       (Victoria Falls)
# Sockets              2               2               2               2
Cores/Socket           4               4               8               1               8
shared L2/L3 Cache     4x4MB           2x2MB           2x4MB           -
                       (shared by 2)   (shared by 4)   (shared by 8)
DP GFlop/s             74.66           73.6            18.7            12.8            29
DRAM Bandwidth (GB/s)  21.33 (read)    21.33           42.66 (read)    51.2
                       10.66 (write)                   21.33 (write)
DP Flop:Byte Ratio     2.33            3.45            0.29            0.25            0.57
System Power (Watts)§  330             350             610             285
Threading              Pthreads        Pthreads        Pthreads        Pthreads        libspe2.1
Compiler               icc 10.0        gcc 4.1.2       gcc 4.0.4       xlc 8.2         xlc 8.2

Table 1. Architectural summary of evaluated platforms. Top: per-core characteristics. Bottom: SMP
characteristics. †Each of the two thread groups may issue up to one instruction. §All system power is measured
with a digital power meter while under a full computational load. Cell BladeCenter power running SGEMM
averaged per blade.
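As a quick check on the DP Flop:Byte Ratio row of Table 1, the following Python sketch divides each system's peak DP GFlop/s by its DRAM bandwidth. It assumes that, where read and write bandwidth are listed separately, the ratio uses their sum; that assumption is not stated on this page, but it reproduces the tabulated ratios. The dictionary keys and the function name are illustrative only.

```python
# Peak DP GFlop/s and DRAM bandwidth (GB/s), copied from Table 1.
# Assumption: separate read/write bandwidths are summed before dividing.
systems = {
    "Clovertown":     {"gflops": 74.66, "bandwidth": 21.33 + 10.66},
    "Barcelona":      {"gflops": 73.6,  "bandwidth": 21.33},
    "Victoria Falls": {"gflops": 18.7,  "bandwidth": 42.66 + 21.33},
    "Cell (PPE)":     {"gflops": 12.8,  "bandwidth": 51.2},
    "Cell (SPE)":     {"gflops": 29.0,  "bandwidth": 51.2},
}


def flop_byte_ratio(gflops, bandwidth_gbs):
    """Peak flops available per byte of DRAM traffic."""
    return gflops / bandwidth_gbs


ratios = {
    name: round(flop_byte_ratio(s["gflops"], s["bandwidth"]), 2)
    for name, s in systems.items()
}

for name, ratio in ratios.items():
    print(f"{name:15s} {ratio:.2f}")
```

A kernel whose arithmetic intensity falls below a platform's ratio cannot reach peak flops on that platform, since it will be limited by memory bandwidth first.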
two DDR2-667 memory controllers and a single cache-coherent HyperTransport (HT) link to access the other
socket's cache and memory; thus delivering 10.66 GB/s per socket, for an aggregate non-uniform memory
access (NUMA) memory bandwidth of 21.33 GB/s for the quad-core, dual-socket system examined in our
study. Non-uniformity arises from the fact that DRAM is directly attached to each processor. Thus, access to
DRAM attached to the other socket comes at the price of lower bandwidth and higher latency. The DRAM
capacity of the tested configuration is 16 GB.
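The bandwidth figures quoted above can be reproduced with a short calculation. The Python sketch below assumes each DDR2-667 controller drives a 64-bit (8-byte) channel at roughly 666.67 million transfers per second; the channel width is an assumption based on standard DDR2 DIMMs and is not stated on this page. The constant names are illustrative.

```python
# DDR2-667 performs about 666.67 million transfers per second.
TRANSFERS_PER_SEC = 2000e6 / 3
BYTES_PER_TRANSFER = 8        # assumption: 64-bit channel per controller
CONTROLLERS_PER_SOCKET = 2    # from the text
SOCKETS = 2                   # from the text (dual-socket system)

per_controller = TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e9  # GB/s
per_socket = per_controller * CONTROLLERS_PER_SOCKET           # GB/s
aggregate = per_socket * SOCKETS                               # GB/s

print(f"per socket: {per_socket:.2f} GB/s, aggregate: {aggregate:.2f} GB/s")
```

Under these assumptions the per-socket value comes out at about 10.67 GB/s, which the text reports truncated as 10.66 GB/s, and the aggregate at about 21.33 GB/s. The aggregate is only reachable when each socket mostly reads from its own locally attached DRAM.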
2.3. Sun Victoria Falls
The Sun "UltraSparc T2 Plus" dual-socket 8-core processor, referred to as Victoria Falls, presents an interesting
departure from mainstream multicore chip design. Rather than depending on four-way superscalar execution,
each of the 16 strictly in-order cores supports two groups of four hardware thread contexts (referred to as Chip
MultiThreading or CMT) - providing a total of 64 simultaneous hardware threads per socket. Each core may
issue up to one instruction per thread group assuming there is no resource conflict. The CMT approach is
designed to tolerate instruction, cache, and DRAM latency through fine-grained multithreading.
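The thread counts in the paragraph above follow from simple multiplication, sketched here in Python. All four inputs come from the text; the variable names are illustrative.

```python
SOCKETS = 2
CORES_PER_SOCKET = 8
THREAD_GROUPS_PER_CORE = 2
THREADS_PER_GROUP = 4

cores_total = SOCKETS * CORES_PER_SOCKET                        # 16 cores
threads_per_core = THREAD_GROUPS_PER_CORE * THREADS_PER_GROUP   # 8 threads
threads_per_socket = CORES_PER_SOCKET * threads_per_core        # 64 threads
threads_total = SOCKETS * threads_per_socket                    # 128 threads

# Each thread group may issue at most one instruction per cycle,
# so a core issues at most this many instructions per cycle.
max_issue_per_core = THREAD_GROUPS_PER_CORE

print(cores_total, threads_per_core, threads_per_socket, threads_total)
```

With eight thread contexts competing for at most two issue slots per core, a stalled thread can be covered by the others, which is the latency tolerance that CMT relies on.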
Victoria Falls instantiates one floating-point unit (FPU) per core (shared among 8 threads). Our study
examines the Sun UltraSparc T5140 with two T2+ processors operating at 1.16 GHz, with a per-core and
per-socket peak performance of 1.16 GFlop/s and 9.33 GFlop/s, respectively, and no fused multiply-add (FMA)
functionality. Each core has access to its own private 8KB write-through L1 cache, but is connected to a shared
Bailey, David H.; Williams, Samuel; Datta, Kaushik; Carter, Jonathan; Oliker, Leonid; Shalf, John et al. PERI - Auto-tuning Memory Intensive Kernels for Multicore, article, June 24, 2008; Berkeley, California. (https://digital.library.unt.edu/ark:/67531/metadc898896/m1/3/: accessed April 24, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.