Scientific Data Management (SDM) Center for Enabling Technologies
This report is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to UNT Digital Library by the UNT Libraries Government Documents Department.
1 Introduction
Effectively generating, managing, and analyzing scientific data requires a comprehensive,
end-to-end approach that encompasses all stages from the initial data acquisition to the final
analysis of the data. As part of the Scientific Process Automation (SPA) thrust area, we
developed a suite of tools and frameworks that integrate into a robust, auditable system for
automating scientific processes and thereby accelerating scientific discovery. Our
technologies provide run-time management of workflow processes, provenance collection, and
analysis and display of results. They have been used to deploy production workflows that
allow scientists to (a) monitor complex tasks, such as the execution of large simulation
codes, in near real time, and (b) perform complex analyses of the process metadata and of
the simulation results. The result has been significant savings in scientists' time, more
efficient use of resources, and a more cost-effective scientific discovery process overall.
Workflow technologies have a long history in the database and information systems
communities [GHS95]. Similarly, the scientific community has developed a number of
problem-solving environments, most of them as integrated solutions [HRG+00].
Component-based solution support systems are also proliferating [CL02, CCA06]. Scientific
workflows merge advances in all these areas to automate support for sophisticated scientific
problem-solving [LAB+06, LG05, DOE04, ABB+03, BVP00, VS97, SV96]. We use "scientific
workflow" as a blanket term for a series of structured activities and computations (called
workflow components or actors) that arise in scientific problem-solving as part of the
discovery process. This description includes the actions performed (by actors), the
decisions made (control flow), and the underlying coordination, such as data transfers
(dataflow) and scheduling, required to execute the workflow. In the simplest case, a
workflow is a linear sequence of tasks, each implemented by an actor.
A typical scientific workflow might: transfer a configuration file to a large cluster; run a
simulation, passing this file as an input parameter; transfer the simulation results to a
secondary system (e.g., a smaller cluster); select a known variable; and generate a movie
showing how this variable evolves over time. Scientific workflows can exhibit and exploit
data-, task-, and pipeline-parallelism. In science and engineering, process tasks and
computations are often large-scale, complex, and structured with intricate dependencies
[DOE04, DBN+96, EBV95, Elm66].
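To make the actor and dataflow vocabulary concrete, the following is a minimal Python sketch of the example workflow described above as a linear chain of actors. It is not Kepler's API and does not reflect how Kepler schedules actors. The names `Actor`, `LinearWorkflow`, `stage_config`, `run_simulation`, `transfer_results`, `select_variable`, `make_movie`, and the input `"sim.cfg"` are hypothetical stand-ins that only simulate the file transfers, the simulation run, and the movie generation. The `trace` list is a toy illustration of provenance capture, assuming one record per actor invocation is enough for the purpose of the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Actor:
    """One workflow component: a named step mapping an input token to an output token."""
    name: str
    fn: Callable[[Any], Any]


@dataclass
class LinearWorkflow:
    """The simplest workflow shape: actors executed strictly in sequence."""
    actors: list[Actor]
    # Toy provenance log: (actor name, input token, output token) per step.
    trace: list[tuple[str, Any, Any]] = field(default_factory=list)

    def run(self, token: Any) -> Any:
        for actor in self.actors:
            result = actor.fn(token)
            self.trace.append((actor.name, token, result))
            token = result  # dataflow: this actor's output feeds the next actor
        return token


# Hypothetical stand-ins for the real steps; none of them touch a cluster.
def stage_config(path: str) -> dict:
    # Pretend to copy the configuration file to the large cluster.
    return {"config": path, "location": "large-cluster"}


def run_simulation(job: dict) -> dict:
    # Pretend to run the simulation: emit a short time series per variable.
    steps = range(3)
    return {
        "location": job["location"],
        "vars": {
            "temperature": [300 + t for t in steps],
            "pressure": [1.0 + t for t in steps],
        },
    }


def transfer_results(results: dict) -> dict:
    # Pretend to move the results to a secondary (smaller) cluster.
    return {**results, "location": "small-cluster"}


def select_variable(results: dict) -> list:
    # Select one known variable from the simulation output.
    return results["vars"]["temperature"]


def make_movie(series: list) -> list:
    # Stand-in for rendering: one text "frame" per time step.
    return [f"frame {t}: temperature={v}" for t, v in enumerate(series)]


workflow = LinearWorkflow([
    Actor("stage", stage_config),
    Actor("simulate", run_simulation),
    Actor("transfer", transfer_results),
    Actor("select", select_variable),
    Actor("movie", make_movie),
])
frames = workflow.run("sim.cfg")
for frame in frames:
    print(frame)
```

Running the sketch prints three placeholder frames, and `workflow.trace` holds one entry per actor. Because each step consumes only its predecessor's output, this sequential loop cannot express control-flow decisions or the data-, task-, and pipeline-parallelism mentioned above; supporting those would require a richer scheduler than this loop.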
Over the past five years, our activities have both established Kepler as a viable scientific
workflow environment and demonstrated its value across multiple science applications. We
have published numerous peer-reviewed papers on the technologies highlighted in this short
paper and have given Kepler tutorials at SC06, SC07, SC08, and SciDAC 2007. Our outreach
activities have allowed scientists to learn best practices and better utilize Kepler to address
their individual workflow problems.
Our contributions to advancing the state of the art in scientific workflows have focused on
the following areas. Progress in each of these areas is described in subsequent sections.
• Workflow development. The development of a deeper understanding of scientific
workflows "in the wild" and of the requirements for support tools that allow easy
construction of complex scientific workflows;
• Generic workflow components and templates. The development of generic actors
(i.e., workflow components and processes) that can be broadly applied to scientific
problems;
• Provenance collection and analysis. The design of a flexible provenance collection
Ludäscher, Bertram & Altintas, Ilkay. Scientific Data Management (SDM) Center for Enabling Technologies, report, September 6, 2013; United States. (https://digital.library.unt.edu/ark:/67531/metadc838272/m1/2/: accessed March 18, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.