Performance and Performance Engineering of the Community Earth System Model
realization) production run is also relatively small. Parallel
algorithms need to be highly optimized for even a modest
number of computational threads to make best use of the
limited available parallelism.
High resolution exploratory science runs are also vital for
validating model extensions and for preparing for next gen-
eration production problem scenarios and next generation
computing architectures. Thus more traditional parallel scal-
ability out to very large thread counts is equally important.
Finally, CESM is a community code that is evolving con-
tinually to evaluate and include new science. Thus it has
been very important that the CESM be easy to maintain
and port to new systems, and that CESM performance be
easy to optimize for new systems or for changes in problem
specification or processor count.
In this paper we give an overview of the performance en-
gineering aspects of CESM, including description of perfor-
mance optimization options and overall performance opti-
mization methodology. We then describe performance re-
sults for four production-like simulations on the Cray XT5
system sited at Oak Ridge National Laboratory and the IBM
BG/P system sited at Argonne National Laboratory. Finally,
we describe performance results for two experimental high
resolution simulations, one using the current default numer-
ical methods and one using a new, more scalable, numerical
method for the atmosphere that is expected to become the
default method in the near future.
CESM consists of a system of five geophysical component
models: atmosphere, land, ocean, sea ice, and ice sheet.
Two-dimensional boundary data (flux and state informa-
tion) are exchanged periodically through a coupler compo-
nent. The coupler coordinates the interaction and time evo-
lution of the component models, and also serves to remap
the boundary-exchange data in space. The atmosphere
model is CAM, the Community Atmosphere Model [3, 21].
The ocean model is POP, the Parallel Ocean Program [11,
25]. The land model is CLM, the Community Land Model [8,
23]. The sea ice model is CICE, the Community Ice Code [1,
15, 16]. The ice sheet model is CISM, the Community Ice
Sheet Model.
CAM, POP, CLM, and CICE are all parallel models, sup-
porting both distributed memory (MPI) and shared mem-
ory (OpenMP) parallelism. The parallel implementations
are based for the most part on decompositions of the asso-
ciated spatial domains. CISM, the most recent addition to
CESM, is still a serial code and runs as a single process,
but with an interface that allows it to run concurrently with
the other components. The coupler is itself a parallel code,
supporting MPI, but not OpenMP, parallelism.
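As a minimal sketch of what a decomposition of a spatial domain can look like, the following splits a two-dimensional latitude x longitude index space into contiguous blocks, one per task. The function names, the 96 x 144 grid, and the 4 x 3 task layout are hypothetical choices for illustration; the decompositions actually used by CAM, POP, CLM, and CICE are not specified here and may differ.

```python
def block_ranges(n_points, n_tasks):
    """Split range(n_points) into n_tasks contiguous, near-equal blocks.

    Returns a list of half-open (start, stop) index pairs.
    """
    base, extra = divmod(n_points, n_tasks)
    ranges = []
    start = 0
    for task in range(n_tasks):
        # The first `extra` tasks each take one additional point.
        size = base + (1 if task < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def decompose_2d(n_lat, n_lon, tasks_lat, tasks_lon):
    """Map each task id to the (lat_range, lon_range) block it owns."""
    lat_blocks = block_ranges(n_lat, tasks_lat)
    lon_blocks = block_ranges(n_lon, tasks_lon)
    return {
        i * tasks_lon + j: (lat_blocks[i], lon_blocks[j])
        for i in range(tasks_lat)
        for j in range(tasks_lon)
    }


# Hypothetical grid and task counts, chosen only for illustration.
layout = decompose_2d(n_lat=96, n_lon=144, tasks_lat=4, tasks_lon=3)
```

In a hybrid MPI/OpenMP run, each block of this kind would typically belong to one MPI process, with threads sharing the work inside the block.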
The ocean model can run concurrently with all of the other
geophysical component models. The sea ice and land models
can run concurrently with respect to each other, but for sci-
ence reasons they must run sequentially with respect to the
atmosphere model. This limits the fraction of time when all
CESM components can execute simultaneously. The coupler
runs both sequentially between components and concurrently
with components, depending on the work to be done.
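The concurrency constraints above can be turned into a rough cost model. The sketch below assumes that the ocean runs on its own processors, that land and sea ice overlap with each other, that the atmosphere runs before or after that land/ice phase, and that coupler time is negligible. The function name and the timings are hypothetical, and the model is a simplification rather than a description of how CESM is scheduled.

```python
def coupled_interval_time(t_atm, t_lnd, t_ice, t_ocn):
    """Estimate wall-clock time for one coupling interval (simplified model)."""
    # Land and sea ice run concurrently, so the phase costs the slower one.
    surface_phase = max(t_lnd, t_ice)
    # The atmosphere is sequential with respect to the land/ice phase.
    atm_chain = t_atm + surface_phase
    # The ocean overlaps with the whole atmosphere/land/ice chain.
    return max(atm_chain, t_ocn)


# Hypothetical timings in seconds.
print(coupled_interval_time(t_atm=10.0, t_lnd=3.0, t_ice=4.0, t_ocn=12.0))
```

Under these assumptions the ocean is off the critical path whenever its time is below that of the atmosphere chain, which is one reason the ocean is a natural candidate for its own processor set.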
The frequency of atmosphere-land coupling and atmosphere-
sea ice coupling is relatively high, ranging from 48 to 96
times per simulation day for the simulations examined here.
In contrast, the atmosphere and ocean models exchange data
once per simulation day for the example production simula-
tions and 4 times per day for the high resolution simulations.
Currently the ice sheet model coupling is one-way: CISM receives
surface data from the land model once per simulated day.
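To make these coupling frequencies concrete, the short calculation below converts exchanges per simulated day into the simulated time between exchanges. Only the frequencies come from the text; the function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def coupling_interval_minutes(exchanges_per_day):
    """Simulated minutes between successive exchanges."""
    return MINUTES_PER_DAY / exchanges_per_day


frequencies = [
    ("atmosphere-land / atmosphere-sea ice (low end)", 48),
    ("atmosphere-land / atmosphere-sea ice (high end)", 96),
    ("atmosphere-ocean (production)", 1),
    ("atmosphere-ocean (high resolution)", 4),
]
for label, per_day in frequencies:
    print(f"{label}: every {coupling_interval_minutes(per_day):g} simulated minutes")
```

The atmosphere therefore exchanges data with the land and sea ice every 15 to 30 simulated minutes, but with the ocean only every 6 to 24 simulated hours.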
In the first three versions of CCSM each component model
and the coupler were run as separate executables assigned to
nonoverlapping processor sets. As of CESM (and CCSM4),
the entire system is now run as a single executable and there
is greatly increased flexibility to select the component pro-
cessor layout. It is typical for the atmosphere, land, and sea
ice models to run on a common set of processors, while the
ocean model runs concurrently on a disjoint set of proces-
sors. This is not a requirement, however, and CESM can
now run with all components on disjoint processor subsets,
all on the same processors, or any combination in between.
Each component model has its own performance character-
istics, and the coupling itself adds to the complexity of the
performance characterization. The first step in CESM
performance optimization is to determine the optimized per-
formance of each of the component models for a number of
different processor counts for the given platform and prob-
lem specification. This performance information is then used
to determine how to assign processors to components to
maximize CESM throughput.
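The two-step procedure described above (measure each component at several processor counts, then choose an assignment) can be sketched as a small exhaustive search. Everything below is an illustrative assumption: the timing table is made up, the layout rule (land and sea ice share the atmosphere's processors while the ocean uses a disjoint set) is only one typical choice, coupler time is ignored, and the function names are hypothetical.

```python
from itertools import product

# Hypothetical measurements: seconds per simulated day at each processor count.
TIMINGS = {
    "atm": {64: 120.0, 128: 65.0, 192: 48.0},
    "lnd": {16: 14.0, 32: 7.0},
    "ice": {48: 22.0, 96: 13.0, 160: 9.0},
    "ocn": {64: 60.0, 128: 34.0},
}


def day_time(n_atm, n_lnd, n_ice, n_ocn):
    """Simplified cost: atmosphere plus the slower of land/ice, overlapped with ocean."""
    surface = max(TIMINGS["lnd"][n_lnd], TIMINGS["ice"][n_ice])
    return max(TIMINGS["atm"][n_atm] + surface, TIMINGS["ocn"][n_ocn])


def best_layout(total_procs):
    """Try every measured combination that fits and keep the fastest one."""
    best = None
    counts = (TIMINGS[name] for name in ("atm", "lnd", "ice", "ocn"))
    for n_atm, n_lnd, n_ice, n_ocn in product(*counts):
        if n_atm + n_ocn > total_procs:
            continue  # atmosphere and ocean sets must be disjoint
        if n_lnd + n_ice > n_atm:
            continue  # land and ice split the atmosphere's processors
        t = day_time(n_atm, n_lnd, n_ice, n_ocn)
        if best is None or t < best[0]:
            best = (t, {"atm": n_atm, "lnd": n_lnd, "ice": n_ice, "ocn": n_ocn})
    return best


seconds, layout = best_layout(total_procs=256)
print(seconds, layout)
```

Minimizing the time per simulated day is equivalent to maximizing throughput. With real measurements, the table would be rebuilt for each platform and problem specification.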
Each CESM component model is also a parallel application
code in its own right and was developed for the most part
independently from the other component models. In conse-
quence, each has its own approaches to performance engi-
neering. These are discussed in turn.
CAM is characterized by two computational phases: the
dynamics, which advances the evolutionary equations for
the atmospheric flow, and the physics, which approximates
subgrid phenomena such as precipitation processes, clouds,
long- and short-wave radiation, and turbulent mixing. Sep-
arate data structures and parallelization strategies are used
for the dynamics and physics. The dynamics and physics are
executed in turn during each model simulation timestep, re-
quiring that some data be rearranged between the dynamics
and physics data structures each timestep.
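The rearrangement between the two data structures can be illustrated with a serial toy example. The sketch assumes a dynamics layout indexed as field[lat][lon][level] and a physics layout made of fixed-size groups of independent vertical columns. Both layouts, the chunk size, and the function names are assumptions for illustration; in a parallel run this step would also involve interprocess communication, which is not shown.

```python
def dynamics_to_physics(field, chunk_size):
    """Flatten a [lat][lon][level] field into chunks of vertical columns."""
    columns = [column for lat_row in field for column in lat_row]
    return [columns[i:i + chunk_size] for i in range(0, len(columns), chunk_size)]


def physics_to_dynamics(chunks, n_lat, n_lon):
    """Inverse rearrangement: rebuild the [lat][lon][level] field."""
    columns = [column for chunk in chunks for column in chunk]
    return [columns[j * n_lon:(j + 1) * n_lon] for j in range(n_lat)]


# Hypothetical tiny grid: 4 latitudes, 6 longitudes, 3 vertical levels.
n_lat, n_lon, n_lev = 4, 6, 3
field = [
    [[100 * j + 10 * i + k for k in range(n_lev)] for i in range(n_lon)]
    for j in range(n_lat)
]

chunks = dynamics_to_physics(field, chunk_size=5)
restored = physics_to_dynamics(chunks, n_lat, n_lon)
```

Here the physics can process each chunk independently, since every chunk holds complete vertical columns, and the inverse call restores the dynamics layout for the next timestep.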
CAM includes multiple compile-time options for computing
the dynamics, referred to as dynamical cores or dycores. The
default dycore for use with CESM is a finite-volume method
(FV) formulated originally by Lin and Rood that uses
a tensor-product longitude × latitude × vertical-level
computational grid over the sphere. CAM also supports less
structured but more uniform grids such as cubed-sphere and
icosahedral-like tessellations of the sphere, and several
dycores that can use these grids are being evaluated in CAM.
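One way to see why a longitude-latitude grid is less uniform than these alternatives is to compare cell areas at different latitudes. The sketch below uses the standard formula for the area of a latitude-longitude cell on a sphere. The 2-degree spacing and the function name are hypothetical and do not correspond to any particular CAM resolution.

```python
import math


def cell_area(lat_south_deg, lat_north_deg, dlon_deg, radius=1.0):
    """Area of a latitude-longitude cell on a sphere of the given radius."""
    dlon = math.radians(dlon_deg)
    sin_north = math.sin(math.radians(lat_north_deg))
    sin_south = math.sin(math.radians(lat_south_deg))
    return radius ** 2 * dlon * (sin_north - sin_south)


# Hypothetical 2-degree grid: compare a cell at the equator with one at the pole.
equator_cell = cell_area(0.0, 2.0, 2.0)
polar_cell = cell_area(88.0, 90.0, 2.0)
ratio = equator_cell / polar_cell
print(f"equator/pole area ratio: {ratio:.1f}")
```

Cells shrink rapidly toward the poles because the meridians converge there, whereas cubed-sphere and icosahedral-like grids keep cell sizes much closer to uniform.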
Worley, P. H.; Craig, A. P.; Dennis, J. M.; Mirin, A. A.; Taylor, M. A. & Vertenstein, M. Performance and Performance Engineering of the Community Earth System Model, article, April 12, 2011; Livermore, California. (digital.library.unt.edu/ark:/67531/metadc870278/m1/4/: accessed January 20, 2019), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT Libraries Government Documents Department.