Decomposition of Large Scale Semantic Graphs via an Efficient Communities Algorithm: Metadata

Metadata describes a digital item, providing (if known) such information as creator, publisher, contents, size, relationship to other resources, and more. Metadata may also contain "preservation" components that help us to maintain the integrity of digital files over time.

Title

  • Main Title: Decomposition of Large Scale Semantic Graphs via an Efficient Communities Algorithm

Creator

  • Author: Yao, Y
    Creator Type: Personal

Contributor

  • Sponsor: United States. Department of Energy.
    Contributor Type: Organization

Publisher

  • Name: Lawrence Livermore National Laboratory
    Place of Publication: Livermore, California
    Additional Info: Lawrence Livermore National Laboratory (LLNL), Livermore, CA

Date

  • Creation: 2008-02-08

Language

  • English

Description

  • Content Description: Semantic graphs have become key components in analyzing complex systems such as the Internet or biological and social networks. These graphs generally consist of sparsely connected clusters, or "communities", whose nodes are more densely connected to each other than to the rest of the graph. Identifying these communities is invaluable for visualizing, understanding, and analyzing large graphs, since it produces subgraphs of related data whose interrelationships can be readily characterized. Unfortunately, LLNL's ability to effectively analyze the terabytes of multisource data at its disposal has remained elusive, because existing decomposition algorithms become computationally prohibitive for graphs of this size. We have addressed this limitation by developing more efficient community-detection algorithms that can process massive graphs.
    Current algorithms for detecting community structure, such as the high-quality algorithm developed by Girvan and Newman [1], can only process relatively small graphs. The cubic complexity of Girvan and Newman, for example, makes it impractical for graphs with more than approximately 10^4 nodes. Our goal for this project was to develop methodologies and corresponding algorithms capable of effectively processing graphs with up to 10^9 nodes. From a practical standpoint, we expect the resulting scalable algorithms to help resolve a variety of operational issues associated with the productive use of semantic graphs at LLNL.
    During FY07, we completed a graph clustering implementation that leverages a dynamic graph transformation to decompose large graphs more efficiently. In essence, our approach dynamically transforms the graph (or subgraphs) into a tree structure of biconnected components interconnected by bridge links. This isomorphism allows us to compute edge betweenness, the chief source of inefficiency in Girvan and Newman's decomposition algorithm, much more efficiently, leading to significantly reduced computation time. Test runs on a desktop computer have shown reductions of up to 89%.
    Our focus this year has been on implementing parallel graph clustering on one of LLNL's supercomputers. To achieve parallel efficiency, we exploited the fact that large semantic graphs tend to be sparse, comprising loosely connected dense node clusters. When implemented on distributed-memory computers, our approach performed well on several large graphs with up to one billion nodes, as shown in Table 2. The rightmost column of Table 2 contains the associated Newman's modularity [1], a metric that is widely used to assess the quality of community structure. Existing algorithms produce results that merely approximate the optimal solution, i.e., maximum modularity. We have developed a verification tool for decomposition algorithms, based on a novel integer linear programming (ILP) approach, that computes an exact solution. We have used this ILP methodology to find the maximum modularity and corresponding optimal community structure for several well-studied graphs in the literature (e.g., Figure 1) [3].
    The above approaches assume that modularity is the best measure of quality for community structure. In an effort to enhance this quality metric, we have also generalized Newman's modularity based on an insightful random-walk interpretation that allows us to vary the scope of the metric. Generalized modularity has enabled us to develop new, more flexible versions of our algorithms. In developing these methodologies, we have made several contributions to both graph-theoretic algorithms and software engineering. We have written two research papers for refereed publication [3-4] and are working on another [5]. In addition, we have presented our research findings at three academic and professional conferences. (An illustrative sketch of the modularity computation appears below.)
  • Physical Description: 5 p. (0.1 MB)
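
The description above cites Newman's modularity as the metric used to judge the quality of a community structure. As an illustration only, the following Python sketch evaluates that metric from its standard definition, Q = (1/(2m)) * sum over node pairs (i, j) of [A_ij - k_i*k_j/(2m)] restricted to pairs in the same community, on a hypothetical six-node graph and partition. The toy graph, the partition, and the function name are assumptions made for this example, not details taken from the report.

# Minimal sketch (not the LLNL implementation): Newman's modularity for a
# hypothetical toy graph and a candidate community assignment.
edges = [(0, 1), (1, 2), (0, 2),   # community 'A': a triangle
         (3, 4), (4, 5), (3, 5),   # community 'B': a triangle
         (2, 3)]                   # bridge edge between the two communities
community = {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B'}

def modularity(edges, community):
    """Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    edge_set = {frozenset(e) for e in edges}

    q = 0.0
    nodes = list(degree)
    for i in nodes:                # sum over ordered node pairs (i, j)
        for j in nodes:
            a_ij = 1.0 if i != j and frozenset((i, j)) in edge_set else 0.0
            if community[i] == community[j]:
                q += a_ij - degree[i] * degree[j] / (2.0 * m)
    return q / (2.0 * m)

print(round(modularity(edges, community), 3))   # 0.357 for this partition

The description also notes that the graph is transformed into a tree of biconnected components joined by bridge links so that edge betweenness, the bottleneck of the Girvan-Newman algorithm, can be computed more efficiently. The sketch below, which assumes the third-party networkx package is available, only shows that decomposition and the betweenness values on the same toy graph; it does not reproduce the report's algorithm or its measured speedup.

# Decomposition of the toy graph into biconnected components and bridges,
# using networkx (an assumption made for this illustration).
import networkx as nx

G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (0, 2),   # biconnected cluster 1
                  (3, 4), (4, 5), (3, 5),   # biconnected cluster 2
                  (2, 3)])                  # bridge linking the clusters

# Biconnected components: maximal subgraphs containing no cut vertex.
print(list(nx.biconnected_components(G)))   # e.g. [{0, 1, 2}, {2, 3}, {3, 4, 5}]
# Bridge edges connect the components into a tree-like structure.
print(list(nx.bridges(G)))                  # the single bridge: (2, 3)
# Girvan-Newman repeatedly removes the edge of highest betweenness; here the
# bridge (2, 3) scores highest, since every cross-cluster shortest path uses it.
print(max(nx.edge_betweenness_centrality(G).items(), key=lambda kv: kv[1]))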

Subject

  • STI Subject Categories: 99 General And Miscellaneous//Mathematics, Computing, And Information Science
  • Keyword: Computers
  • Keyword: Implementation
  • Keyword: Processing
  • Keyword: Linear Programming
  • Keyword: Supercomputers
  • Keyword: Efficiency
  • Keyword: Verification
  • Keyword: Algorithms
  • Keyword: Internet
  • Keyword: Exact Solutions
  • Keyword: Transformations
  • Keyword: Metrics
  • Keyword: Communities

Collection

  • Name: Office of Scientific & Technical Information Technical Reports
    Code: OSTI

Institution

  • Name: UNT Libraries Government Documents Department
    Code: UNTGD

Resource Type

  • Report

Format

  • Text

Identifier

  • Report No.: LLNL-TR-401193
  • Grant Number: W-7405-ENG-48
  • DOI: 10.2172/926011
  • Office of Scientific & Technical Information Report Number: 926011
  • Archival Resource Key: ark:/67531/metadc894116

Note

  • Display Note: PDF file: 5 pages; size: 0.1 MB