Search Results

High Performance, Three-Dimensional Bilateral Filtering

Description: Image smoothing is a fundamental operation in computer vision and image processing. This work has two main thrusts: (1) implementation of a bilateral filter suitable for smoothing, or denoising, 3D volumetric data; (2) implementation of the 3D bilateral filter in three different parallelization models, along with parallel performance studies on two modern HPC architectures. Our bilateral filter formulation is based upon the work of Tomasi [11], but extended to 3D for use on volumetric data. Our three parallel implementations use POSIX threads, the Message Passing Interface (MPI), and Unified Parallel C (UPC), a Partitioned Global Address Space (PGAS) language. Our parallel performance studies, which were conducted on a Cray XT4 supercomputer and a quad-socket, quad-core Opteron workstation, show our algorithm to have near-perfect scalability up to 120 processors. Parallel algorithms, such as the one we present here, will play an increasingly important role in production visual analysis systems as the underlying computational platforms transition from single- to multi-core architectures. (A brief sketch of the bilateral weighting follows this entry.)
Date: June 5, 2008
Creator: Bethel, E. Wes
Partner: UNT Libraries Government Documents Department
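
The bilateral filter replaces each voxel with a weighted average of its neighbors, where the weights fall off with both spatial distance and intensity difference; the intensity term is what makes the smoothing edge-preserving. Below is a minimal, single-threaded Python/NumPy sketch of that weighting extended to 3D. It follows the general Tomasi-style formulation named in the abstract, but the function name and parameters are illustrative assumptions, not the paper's code, and boundaries are handled by wraparound for brevity.

    import numpy as np

    def bilateral_filter_3d(vol, sigma_s=2.0, sigma_r=0.1, radius=2):
        """Brute-force 3D bilateral filter: each voxel becomes an average
        of its neighborhood, weighted by spatial proximity (sigma_s) and
        intensity similarity (sigma_r)."""
        out = np.zeros_like(vol, dtype=np.float64)
        norm = np.zeros_like(vol, dtype=np.float64)
        offsets = range(-radius, radius + 1)
        for dz in offsets:
            for dy in offsets:
                for dx in offsets:
                    # Spatial weight depends only on the offset ...
                    w_s = np.exp(-(dx*dx + dy*dy + dz*dz) / (2 * sigma_s**2))
                    # ... range weight on the intensity difference (wraps at borders).
                    shifted = np.roll(vol, shift=(dz, dy, dx), axis=(0, 1, 2))
                    w = w_s * np.exp(-((vol - shifted) ** 2) / (2 * sigma_r**2))
                    out += w * shifted
                    norm += w
        return out / norm

    # Example: denoise a small synthetic volume.
    noisy = np.random.rand(32, 32, 32)
    smoothed = bilateral_filter_3d(noisy, sigma_s=2.0, sigma_r=0.2)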

Exploration of Optimization Options for Increasing Performance of a GPU Implementation of a Three-dimensional Bilateral Filter

Description: This report explores using GPUs as a platform for performing high performance medical image data processing, specifically smoothing using a 3D bilateral filter, which performs anisotropic, edge-preserving smoothing. The algorithm consists of running a specialized 3D convolution kernel over a source volume to produce an output volume. Overall, our objective is to understand what algorithmic design choices and configuration options lead to optimal performance of this algorithm on the GPU. We explore the performance impact of using different memory access patterns, of using different types of device/on-chip memories, of using strictly aligned and unaligned memory, and of varying the size/shape of thread blocks. Our results reveal optimal configuration parameters for our algorithm when executed on a sample 3D medical data set, and show performance gains ranging from 30x to over 200x as compared to a single-threaded CPU implementation.
Date: January 6, 2012
Creator: Bethel, E. Wes
Partner: UNT Libraries Government Documents Department

Hybrid Parallelism for Volume Rendering on Large, Multi-core Systems

Description: This work studies the performance and scalability characteristics of "hybrid" parallel programming and execution as applied to raycasting volume rendering -- a staple visualization algorithm -- on a large, multi-core platform. Historically, the Message Passing Interface (MPI) has become the de facto standard for parallel programming and execution on modern parallel systems. As the computing industry trends towards multi-core processors, with four- and six-core chips common today and 128-core chips coming soon, we wish to better understand how algorithmic and parallel programming choices impact performance and scalability on large, distributed-memory multi-core systems. Our findings indicate that the hybrid-parallel implementation, at levels of concurrency ranging from 1,728 to 216,000, performs better, uses a smaller absolute memory footprint, and consumes less communication bandwidth than the traditional, MPI-only implementation. (A structural sketch of the hybrid approach follows this entry.)
Date: July 12, 2010
Creator: Howison, Mark; Bethel, E. Wes & Childs, Hank
Partner: UNT Libraries Government Documents Department
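
The hybrid model described above uses MPI across nodes and shared-memory parallelism among the cores within each node, so only one rank per node participates in communication. The following is a structural sketch in Python, using mpi4py plus a thread pool as stand-ins; the block decomposition and the per-thread "render" work are placeholder assumptions, not the paper's implementation.

    # Hybrid-parallel skeleton: one MPI rank per node, one thread per core.
    from concurrent.futures import ThreadPoolExecutor
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    # Distributed-memory level: each rank owns one block of the volume.
    volume_block = np.random.rand(64, 64, 64)  # stand-in for this rank's data

    def render_slab(slab):
        """Shared-memory level: one thread processes one slab of the block."""
        return slab.sum(axis=0)  # trivial stand-in for per-ray compositing

    # e.g. 8 cores per node -> 8 slabs rendered concurrently.
    slabs = np.array_split(volume_block, 8)
    with ThreadPoolExecutor(max_workers=8) as pool:
        partials = list(pool.map(render_slab, slabs))
    local_image = np.sum(partials, axis=0)

    # Compositing involves only the MPI ranks, not ranks * cores_per_node
    # participants as in an MPI-only configuration -- the source of the
    # communication-bandwidth savings the abstract reports.
    final_image = comm.reduce(local_image, op=MPI.SUM, root=0)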

Hybrid Parallelism for Volume Rendering on Large, Multi-core Systems

Description: This work studies the performance and scalability characteristics of "hybrid" parallel programming and execution as applied to raycasting volume rendering -- a staple visualization algorithm -- on a large, multi-core platform. Historically, the Message Passing Interface (MPI) has become the de facto standard for parallel programming and execution on modern parallel systems. As the computing industry trends towards multi-core processors, with four- and six-core chips common today and 128-core chips coming soon, we wish to better understand how algorithmic and parallel programming choices impact performance and scalability on large, distributed-memory multi-core systems. Our findings indicate that the hybrid-parallel implementation, at levels of concurrency ranging from 1,728 to 216,000, performs better, uses a smaller absolute memory footprint, and consumes less communication bandwidth than the traditional, MPI-only implementation.
Date: June 14, 2010
Creator: Howison, Mark; Bethel, E. Wes & Childs, Hank
Partner: UNT Libraries Government Documents Department

MPI-hybrid Parallelism for Volume Rendering on Large, Multi-core Systems

Description: This work studies the performance and scalability characteristics of "hybrid" parallel programming and execution as applied to raycasting volume rendering -- a staple visualization algorithm -- on a large, multi-core platform. Historically, the Message Passing Interface (MPI) has become the de facto standard for parallel programming and execution on modern parallel systems. As the computing industry trends towards multi-core processors, with four- and six-core chips common today and 128-core chips coming soon, we wish to better understand how algorithmic and parallel programming choices impact performance and scalability on large, distributed-memory multi-core systems. Our findings indicate that the hybrid-parallel implementation, at levels of concurrency ranging from 1,728 to 216,000, performs better, uses a smaller absolute memory footprint, and consumes less communication bandwidth than the traditional, MPI-only implementation.
Date: March 20, 2010
Creator: Howison, Mark; Bethel, E. Wes & Childs, Hank
Partner: UNT Libraries Government Documents Department

Federal Market Information Technology in the Post Flash Crash Era: Roles for Supercomputing

Description: This paper describes collaborative work between active traders, regulators, economists, and supercomputing researchers to replicate and extend investigations of the Flash Crash and other market anomalies in a National Laboratory HPC environment. Our work suggests that supercomputing tools and methods will be valuable to market regulators in achieving the goal of market safety, stability, and security. Research results using high frequency data and analytics are described, and directions for future development are discussed. Currently, the key mechanism for preventing catastrophic market action is the “circuit breaker.” We believe a more graduated approach, similar to the “yellow light” used in motorsports to slow traffic, might be a better way to achieve the same goal. To enable this objective, we study a number of indicators that could foresee hazards in market conditions and explore options to confirm such predictions. Our tests confirm that Volume Synchronized Probability of Informed Trading (VPIN) and a volume-based version of the Herfindahl-Hirschman Index (HHI) for measuring market fragmentation can indeed give strong signals ahead of the Flash Crash event of May 6, 2010. This is a preliminary step toward a full-fledged early-warning system for unusual market conditions. (A minimal sketch of the volume HHI computation follows this entry.)
Date: September 16, 2011
Creator: Bethel, E. Wes; Leinweber, David; Ruebel, Oliver & Wu, Kesheng
Partner: UNT Libraries Government Documents Department
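
The Herfindahl-Hirschman Index is the sum of squared market shares; computed over per-venue trading volume, a value near 1 means volume is concentrated on one venue, while a value near 1/n means it is spread evenly across n venues (high fragmentation). A minimal sketch follows; the venue names and volumes are invented for illustration, and VPIN, which is substantially more involved, is not reproduced here.

    def volume_hhi(venue_volumes):
        """Herfindahl-Hirschman Index over trading-venue volume shares,
        in (0, 1]: lower values indicate a more fragmented market."""
        total = sum(venue_volumes.values())
        return sum((v / total) ** 2 for v in venue_volumes.values())

    # Hypothetical per-venue volumes for one stock over one interval.
    volumes = {"NYSE": 4_200_000, "NASDAQ": 3_100_000,
               "BATS": 1_500_000, "ARCA": 1_200_000}
    print(f"volume HHI = {volume_hhi(volumes):.3f}")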

Data Parallel Bin-Based Indexing for Answering Queries on Multi-Core Architectures

Description: The multi-core trend in CPUs and general purpose graphics processing units (GPUs) offers new opportunities for the database community. The exponential increase in core counts is likely to affect virtually every server and client in the coming decade, and presents database management systems with a huge, compelling disruption that will radically change how processing is done. This paper presents a new parallel indexing data structure for answering queries that takes full advantage of the increasing thread-level parallelism emerging in multi-core architectures. In our approach, our Data Parallel Bin-based Index Strategy (DP-BIS) first bins the base data, and then partitions and stores the values in each bin as a separate, bin-based data cluster. In answering a query, the procedures for examining the bin numbers and the bin-based data clusters offer the maximum possible level of concurrency; each record is evaluated by a single thread, and all threads execute simultaneously in parallel. We implement and demonstrate the effectiveness of DP-BIS on two multi-core architectures: a multi-core CPU and a GPU. The concurrency afforded by DP-BIS allows us to fully utilize the thread-level parallelism provided by each architecture -- for example, our GPU-based DP-BIS implementation simultaneously evaluates over 12,000 records with an equivalent number of concurrently executing threads. In comparing DP-BIS's performance across these architectures, we show that the GPU-based DP-BIS implementation requires significantly less computation time to answer a query than the CPU-based implementation. We also demonstrate in our analysis that DP-BIS provides better overall performance than the commonly utilized CPU- and GPU-based projection index. Finally, due to data encoding, we show that DP-BIS accesses significantly smaller amounts of data than index strategies that operate solely on a column's base data; this smaller data footprint is critical for parallel processors that possess limited memory resources (e.g., GPUs). (A toy sketch of the binning idea follows this entry.)
Date: June 2, 2009
Creator: Gosink, Luke; Wu, Kesheng; Bethel, E. Wes; Owens, John D. & Joy, Kenneth I.
Partner: UNT Libraries Government Documents Department
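
The binning idea the abstract describes can be illustrated in a few lines: bins that fall entirely inside the query range are answered from bin numbers alone, and only the boundary bin requires consulting the base data. This is a toy, sequential sketch of those semantics; the class name and layout are assumptions, not DP-BIS's actual data-parallel structure, in which each record's evaluation is assigned to its own thread.

    import numpy as np

    class BinIndex:
        def __init__(self, values, nbins=64):
            self.values = values
            self.edges = np.linspace(values.min(), values.max(), nbins + 1)
            self.bin_of = np.digitize(values, self.edges[1:-1])  # bin number per record

        def query_less_than(self, threshold):
            """Boolean mask of records with value < threshold."""
            cut = np.searchsorted(self.edges, threshold) - 1
            mask = self.bin_of < cut            # bins entirely below: all match
            boundary = self.bin_of == cut       # boundary bin: check base data
            mask[boundary] = self.values[boundary] < threshold
            return mask

    data = np.random.rand(1_000_000)
    idx = BinIndex(data)
    assert idx.query_less_than(0.25).sum() == (data < 0.25).sum()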

Query-Driven Network Flow Data Analysis and Visualization

Description: This document is the final report for a WFO agreement between LBNL and the National Visualization and Analytics Center at PNNL. The document lists project milestones, their completion dates, and research results and findings. In brief, the project focuses on testing the hypothesis that the duty cycle of scientific discovery can be reduced by combining visual analytics interfaces, novel visualization techniques, and scientific data management technology.
Date: June 14, 2006
Creator: Bethel, E. Wes
Partner: UNT Libraries Government Documents Department

Hybrid Parallelism for Volume Rendering on Large, Multi- and Many-core Systems

Description: With the computing industry trending towards multi- and many-core processors, we study how a standard visualization algorithm, ray-casting volume rendering, can benefit from a hybrid parallelism approach. Hybrid parallelism provides the best of both worlds: using distributed-memory parallelism across a large number of nodes increases available FLOPs and memory, while exploiting shared-memory parallelism among the cores within each node ensures that each node performs its portion of the larger calculation as efficiently as possible. We demonstrate results from weak and strong scaling studies, at levels of concurrency ranging up to 216,000, and with datasets as large as 12.2 trillion cells. The greatest benefit from hybrid parallelism lies in the communication portion of the algorithm, the dominant cost at higher levels of concurrency. We show that reducing the number of participants with a hybrid approach significantly improves performance.
Date: January 1, 2011
Creator: Howison, Mark; Bethel, E. Wes & Childs, Hank
Partner: UNT Libraries Government Documents Department

Cactus and Visapult: An ultra-high performance grid-distributed visualization architecture using connectionless protocols

Description: This past decade has seen rapid growth in the size, resolution, and complexity of Grand Challenge simulation codes. This trend is accompanied by a trend towards multinational, multidisciplinary teams who carry out this research in distributed teams, and the corresponding growth of Grid infrastructure to support these widely distributed Virtual Organizations. As the number and diversity of distributed teams grow, the need for visualization tools to analyze and display multi-terabyte, remote data becomes more pronounced and more urgent. One such tool that has been successfully used to address this problem is Visapult. Visapult is a parallel visualization tool that employs Grid-distributed components, latency-tolerant visualization and graphics algorithms, along with high performance network I/O, in order to achieve effective remote analysis of massive datasets. In this paper we discuss improvements to network bandwidth utilization and responsiveness of the Visapult application that result from using connectionless protocols to move data payload between the distributed Visapult components and a Grid-enabled, high performance physics simulation used to study gravitational waveforms of colliding black holes: the Cactus code. These improvements have boosted Visapult's network efficiency to 88-96 percent of the maximum theoretical available bandwidth on multi-gigabit Wide Area Networks, and greatly enhanced interactivity. Such improvements are critically important for future development of effective interactive Grid applications.
Date: August 31, 2002
Creator: Bethel, E. Wes & Shalf, John
Partner: UNT Libraries Government Documents Department

H5hut: A High-Performance I/O Library for Particle-based Simulations

Description: Particle-based simulations running on large high-performance computing systems over many time steps can generate an enormous amount of particle- and field-based data for post-processing and analysis. Achieving high-performance I/O for this data, effectively managing it on disk, and interfacing it with analysis and visualization tools can be challenging, especially for domain scientists who do not have I/O and data management expertise. We present the H5hut library, an implementation of several data models for particle-based simulations that encapsulates the complexity of HDF5 and is simple to use, yet does not compromise performance. (An illustrative sketch of the per-step data layout follows this entry.)
Date: September 24, 2010
Creator: Howison, Mark; Adelmann, Andreas; Bethel, E. Wes; Gsell, Achim; Oswald, Benedikt & Prabhat,
Partner: UNT Libraries Government Documents Department
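
H5hut's data model stores each time step's particle fields together so that analysis tools can read one step at a time. The sketch below illustrates that layout using plain h5py rather than H5hut itself; the "Step#n" group naming and the field names are assumptions for illustration, not H5hut's API.

    import h5py
    import numpy as np

    nsteps, nparticles = 3, 100_000

    # Write: one group per time step, one dataset per particle field.
    with h5py.File("particles.h5", "w") as f:
        for step in range(nsteps):
            g = f.create_group(f"Step#{step}")
            for field in ("x", "y", "z", "px", "py", "pz"):
                g.create_dataset(field, data=np.random.rand(nparticles))

    # Read back a single step without touching the rest of the run.
    with h5py.File("particles.h5", "r") as f:
        x0 = f["Step#0/x"][...]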

Interactive, Internet Delivery of Scientific Visualization via Structured, Prerendered Multiresolution Imagery

Description: We present a novel approach for highly interactive remote delivery of visualization results. Instead of real-time rendering across the internet, our approach, inspired by QuickTime VR's Object Movie concept, delivers pre-rendered images corresponding to different viewpoints and different time steps to provide the experience of 3D and temporal navigation. We use tiled, multiresolution image streaming to consume minimum bandwidth while providing the maximum resolution that a user can perceive from a given viewpoint. Since image data, a viewpoint, and time stamps are the only required inputs, our approach is generally applicable to all visualization and graphics rendering applications capable of generating image files in an ordered fashion. Our design is a form of latency-tolerant remote visualization, where visualization and rendering time is effectively decoupled from interactive exploration. Our approach trades unconstrained exploration for increased interactivity, flexible resolution (for individual clients), and reduced load and effective reuse of coherent frames between multiple users (from the server's perspective). A normal web server is the vehicle for providing on-demand images to the remote client application, which uses client-pull to obtain and cache only those images required to fulfill the interaction needs. This paper presents an architectural description of the system along with a performance characterization of each stage of the production, delivery, and viewing pipeline. (A sketch of the tile-addressing idea follows this entry.)
Date: April 20, 2005
Creator: Chen, Jerry; Yoon, Ilmi & Bethel, E. Wes
Partner: UNT Libraries Government Documents Department
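
The delivery scheme rests on a deterministic mapping from (time step, viewpoint, resolution level, tile) to a static file that an ordinary web server can serve, so the client can pull and cache exactly the tiles its current view requires. The path template below is an assumption for illustration, not the paper's actual layout.

    from pathlib import Path

    def tile_path(step, theta, phi, level, row, col):
        """Locate one prerendered tile: viewpoints are quantized to a grid
        of (theta, phi) angles, and each image is stored as a
        multiresolution tile pyramid (level 0 = coarsest)."""
        return Path(f"t{step:04d}/v{theta:+04d}_{phi:+04d}/"
                    f"L{level}/tile_{row:02d}_{col:02d}.jpg")

    # After the user rotates to (theta=30, phi=-15) at zoom level 2, the
    # client requests only the 4x4 grid of tiles visible at that view.
    needed = [tile_path(12, 30, -15, 2, r, c)
              for r in range(4) for c in range(4)]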

Interactive, Internet Delivery of Visualization via Structured, Prerendered Multiresolution Imagery

Description: One of the fundamental problems in remote visualization -- where I/O and data intensive visualization activities take place at a centrally located supercomputer center and resulting imagery is delivered to a remotely located user -- is reduced interactivity resulting from the combination of high network latency and relatively low network bandwidth. This research project has produced a novel approach for latency-tolerant delivery of visualization and rendering results where client-side frame rate display performance is independent of source dataset size, image size, visualization technique or rendering complexity. As such, it is a suitable solution for remote visualization image delivery for any visualization or rendering application that can generate image frames in an ordered fashion. This new capability is suitable for use in addressing many of ASCR's remote visualization needs, particularly deployment at open computing facilities to provide remote visualization capabilities to teams of scientific researchers.
Date: October 25, 2007
Creator: Bethel, E. Wes; Yoon, Ilmi & Chen, Jerry
Partner: UNT Libraries Government Documents Department

Interactive stereo electron microscopy enhanced with virtual reality

Description: An analytical system is presented that is used to take measurements of objects perceived in stereo image pairs obtained from a scanning electron microscope (SEM). Our system operates by presenting a single stereo view that contains stereo image data obtained from the SEM, along with geometric representations of two types of virtual measurement instruments, a "protractor" and a "caliper". The measurements obtained from this system are an integral part of a medical study evaluating surfactant, a liquid coating the inner surface of the lung which makes possible the process of breathing. Measurements of the curvature and contact angle of submicron diameter droplets of a fluorocarbon deposited on the surface of airways are performed in order to determine the surface tension of the air/liquid interface. This approach extends the techniques of traditional surface science to a microscopic level by measuring submicrometer rather than millimeter diameter droplets, as well as the lengths and curvature of cilia responsible for movement of the surfactant, the airway's protective liquid blanket. An earlier implementation of this approach, which took angle measurements of objects perceived in stereo image pairs using a virtual protractor, is extended in this paper to include distance measurements and a unified view model. The unified view model is derived from microscope-specific parameters, such as focal length, visible area, and magnification, and it ensures that the underlying view models and resultant binocular parallax cues are consistent between synthetic and acquired imagery. When the view models are consistent, it is possible to take measurements of features that are not constrained to lie within the projection plane. The system is first calibrated using non-clinical data of known size and resolution. Using the SEM, stereo image pairs of grids and spheres of known resolution are ...
Date: December 17, 2001
Creator: Bethel, E. Wes; Bastacky, S. Jacob & Schwartz, Kenneth S.
Partner: UNT Libraries Government Documents Department

Personal Display Wall

Description: The LBNL Visualization Group has created a tiled display wall design that uses components readily available from a local hardware store or from multiple online vendors, and that requires minimal tools and skill to assemble. The result is a low-cost, easy-to-assemble tiled display device that is readily accessible to visualization researchers and domain scientists alike. The LBNL Personal Display (PD) Wall differentiates itself from other LCD-matrix displays because its design minimizes cost and complexity while retaining the functionality of its more expensive tiled display brethren. The PD-Wall occupies the same amount of desktop area as a large flatscreen LCD display panel. LBNL will be publishing and distributing simple plans so that any laboratory or user site can construct its own copy of this device.
Date: January 1, 2004
Creator: Shalf, John; Bethel, E. Wes & Siegerist, Cristina
Partner: UNT Libraries Government Documents Department

VisPortal: Increasing Scientific Productivity by Simplifying Access to and Use of Remote Computational Resources

Description: Our goal is to simplify and streamline the process of using remotely located visual data analysis software tools. This discussion presents an example of an easy-to-use interface that mediates access to and use of diverse and powerful visual data analysis resources. The interface is presented via a standard web browser, which is ubiquitous and part of every researcher's work environment. Through the web interface, a few mouse clicks are all that is needed to take advantage of powerful, remotely located software resources. The VisPortal project is the software that provides these diverse services to remotely located users through their web browser. Using standard Globus grid middleware and off-the-shelf web automation, VisPortal hides the underlying complexity of resource selection and distributed application management. The portal automates complex workflows that would otherwise require a substantial amount of manual effort on the part of the researcher. With a few mouse clicks, a researcher can quickly perform complex tasks like creating MPEG movies, scheduling file transfers, launching components of a distributed application, and accessing specialized resources.
Date: January 1, 2004
Creator: Siegerist, Cristina; Shalf, John & Bethel, E. Wes
Partner: UNT Libraries Government Documents Department

Cactus and Visapult: A case study of ultra-high performance distributed visualization using connectionless protocols

Description: This past decade has seen rapid growth in the size, resolution, and complexity of Grand Challenge simulation codes. Many such problems still require interactive visualization tools to make sense of multi-terabyte data stores. Visapult is a parallel volume rendering tool that employs distributed components, latency-tolerant algorithms, and high performance network I/O for effective remote visualization of massive datasets. In this paper we discuss using connectionless protocols to accelerate Visapult network I/O and interfacing Visapult to the Cactus General Relativity code to enable scalable remote monitoring and steering capabilities. With these modifications, network utilization has moved from 25 percent of line rate using tuned multi-streamed TCP to sustaining 88 percent of line rate using the new UDP-based transport protocol. (A minimal sketch of connectionless payload transport follows this entry.)
Date: May 7, 2002
Creator: Shalf, John & Bethel, E. Wes
Partner: UNT Libraries Government Documents Department
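
The shift from TCP to a connectionless transport means the sender never blocks on acknowledgements, at the cost of tolerating loss and reordering at the receiver. Below is a minimal sketch of sequence-numbered UDP payload transport; the framing is an assumption for illustration, and the actual protocol's pacing and loss handling are not reproduced here.

    import socket
    import struct

    CHUNK = 1024  # payload bytes per datagram

    def send_payload(data, addr=("127.0.0.1", 9999)):
        """Send a byte buffer as sequence-numbered UDP datagrams; nothing
        blocks on acknowledgements, unlike a TCP stream."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for seq, off in enumerate(range(0, len(data), CHUNK)):
            sock.sendto(struct.pack("!I", seq) + data[off:off + CHUNK], addr)
        sock.close()

    def recv_packet(sock):
        """Receive one datagram; the sequence number lets the receiver
        place the payload even when datagrams arrive out of order."""
        packet, _ = sock.recvfrom(4 + CHUNK)
        return struct.unpack("!I", packet[:4])[0], packet[4:]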

Chromium Renderserver: Scalable and Open Source Remote Rendering Infrastructure

Description: Chromium Renderserver (CRRS) is software infrastructure that provides the ability for one or more users to run and view image output from unmodified, interactive OpenGL and X11 applications on a remote, parallel computational platform equipped with graphics hardware accelerators, via industry-standard Layer 7 network protocols and client viewers. The new contributions of this work include a solution to the problem of synchronizing X11 and OpenGL command streams, remote delivery of parallel hardware-accelerated rendering, and a performance analysis of several different optimizations that are generally applicable to a variety of rendering architectures. CRRS is fully operational, Open Source software.
Date: December 1, 2007
Creator: Paul, Brian; Ahern, Sean; Bethel, E. Wes; Brugger, Eric; Cook, Rich; Daniel, Jamison et al.
Partner: UNT Libraries Government Documents Department

Query-Driven Visualization of Time-Varying Adaptive Mesh Refinement Data

Description: The visualization and analysis of AMR-based simulations is integral to the process of obtaining new insight in scientific research. We present a new method for performing query-driven visualization and analysis on AMR data, with specific emphasis on time-varying AMR data. Our method directly addresses the dynamic spatial and temporal properties of AMR grids, which challenge many existing visualization techniques. Further, we present the first implementation of query-driven visualization on the GPU that uses a GPU-based indexing structure to both answer queries and efficiently utilize GPU memory. We apply our method to two different science domains to demonstrate its broad applicability. (A minimal sketch of the query semantics follows this entry.)
Date: August 1, 2008
Creator: Gosink, Luke J.; Anderson, John C.; Bethel, E. Wes & Joy, Kenneth I.
Partner: UNT Libraries Government Documents Department
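
Query-driven visualization reduces, semantically, to evaluating a compound boolean query over cell data and rendering only the matching cells. The brute-force sketch below shows just those semantics; the field names are invented, and the paper's contribution is answering such queries through a GPU-resident index rather than by scanning every cell as done here.

    import numpy as np

    ncells = 500_000
    temperature = np.random.rand(ncells) * 2000.0  # stand-in AMR cell data
    density = np.random.rand(ncells)

    # "Show cells where 800 < temperature < 1200 and density > 0.5."
    mask = (temperature > 800.0) & (temperature < 1200.0) & (density > 0.5)
    selected = np.flatnonzero(mask)  # cell indices handed to the renderer
    print(f"{selected.size} of {ncells} cells satisfy the query")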