Decomposition of Large Scale Semantic Graphs via an Efficient Communities Algorithm: Metadata

Metadata describes a digital item, providing (if known) such information as creator, publisher, contents, size, relationship to other resources, and more. Metadata may also contain "preservation" components that help us to maintain the integrity of digital files over time.

Title

  • Main Title: Decomposition of Large Scale Semantic Graphs via an Efficient Communities Algorithm

Creator

  • Author: Yao, Y
    Creator Type: Personal

Contributor

  • Sponsor: United States. Department of Energy.
    Contributor Type: Organization

Publisher

  • Name: Lawrence Livermore National Laboratory
    Place of Publication: Livermore, California
    Additional Info: Lawrence Livermore National Laboratory (LLNL), Livermore, CA

Date

  • Creation: 2008-02-08

Language

  • English

Description

  • Content Description: Semantic graphs have become key components in analyzing complex systems such as the Internet or biological and social networks. These graphs generally consist of sparsely connected clusters, or "communities", whose nodes are more densely connected to each other than to the rest of the graph. Identifying these communities is invaluable for visualizing, understanding, and analyzing large graphs, since it produces subgraphs of related data whose interrelationships can be readily characterized. Unfortunately, LLNL's ability to effectively analyze the terabytes of multisource data at its disposal has remained elusive, because existing decomposition algorithms become computationally prohibitive for graphs of this size. We have addressed this limitation by developing more efficient community-detection algorithms that can process massive graphs.
    Current algorithms for detecting community structure, such as the high-quality algorithm developed by Girvan and Newman [1], can only process relatively small graphs. The cubic complexity of Girvan and Newman, for example, makes it impractical for graphs with more than approximately 10^4 nodes. Our goal for this project was to develop methodologies and corresponding algorithms capable of effectively processing graphs with up to 10^9 nodes. From a practical standpoint, we expect the resulting scalable algorithms to help resolve a variety of operational issues associated with the productive use of semantic graphs at LLNL.
    During FY07, we completed a graph clustering implementation that leverages a dynamic graph transformation to decompose large graphs more efficiently. In essence, our approach dynamically transforms the graph (or subgraphs) into a tree structure of biconnected components interconnected by bridge links. This isomorphism allows us to compute edge betweenness, the chief source of inefficiency in Girvan and Newman's decomposition algorithm, much more efficiently, leading to significantly reduced computation time. Test runs on a desktop computer have shown reductions of up to 89%.
    Our focus this year has been on implementing parallel graph clustering on one of LLNL's supercomputers. To achieve parallel efficiency, we exploited the fact that large semantic graphs tend to be sparse, comprising loosely connected dense node clusters. When implemented on distributed-memory computers, our approach performed well on several large graphs with up to one billion nodes, as shown in Table 2. The rightmost column of Table 2 contains the associated Newman's modularity [1], a metric that is widely used to assess the quality of community structure. Existing algorithms produce results that merely approximate the optimal solution, i.e., maximum modularity. We have developed a verification tool for decomposition algorithms, based on a novel integer linear programming (ILP) approach, that computes an exact solution. We have used this ILP methodology to find the maximum modularity and corresponding optimal community structure for several well-studied graphs in the literature (e.g., Figure 1) [3].
    The above approaches assume that modularity is the best measure of quality for community structure. In an effort to enhance this quality metric, we have also generalized Newman's modularity based on an insightful random-walk interpretation that allows us to vary the scope of the metric. Generalized modularity has enabled us to develop new, more flexible versions of our algorithms. In developing these methodologies, we have made several contributions to both graph-theoretic algorithms and software engineering. We have written two research papers for refereed publication [3-4] and are working on another [5]. In addition, we have presented our research findings at three academic and professional conferences. (An illustrative sketch of the modularity computation appears below.)
  • Physical Description: 5 p. (0.1 MB)
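
The description above cites Newman's modularity as the metric used to judge the quality of a community structure. As an illustration only, the following Python sketch evaluates that metric from its standard definition, Q = (1/(2m)) * sum over node pairs (i, j) of [A_ij - k_i*k_j/(2m)] restricted to pairs in the same community, on a hypothetical six-node graph and partition. The toy graph, the partition, and the function name are assumptions made for this example, not details taken from the report.

# Minimal sketch (not the LLNL implementation): Newman's modularity for a
# hypothetical toy graph and a candidate community assignment.
edges = [(0, 1), (1, 2), (0, 2),   # community 'A': a triangle
         (3, 4), (4, 5), (3, 5),   # community 'B': a triangle
         (2, 3)]                   # bridge edge between the two communities
community = {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B'}

def modularity(edges, community):
    """Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    edge_set = {frozenset(e) for e in edges}

    q = 0.0
    nodes = list(degree)
    for i in nodes:                # sum over ordered node pairs (i, j)
        for j in nodes:
            a_ij = 1.0 if i != j and frozenset((i, j)) in edge_set else 0.0
            if community[i] == community[j]:
                q += a_ij - degree[i] * degree[j] / (2.0 * m)
    return q / (2.0 * m)

print(round(modularity(edges, community), 3))   # 0.357 for this partition

The description also notes that the graph is transformed into a tree of biconnected components joined by bridge links so that edge betweenness, the bottleneck of the Girvan-Newman algorithm, can be computed more efficiently. The sketch below, which assumes the third-party networkx package is available, only shows that decomposition and the betweenness values on the same toy graph; it does not reproduce the report's algorithm or its measured speedup.

# Decomposition of the toy graph into biconnected components and bridges,
# using networkx (an assumption made for this illustration).
import networkx as nx

G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (0, 2),   # biconnected cluster 1
                  (3, 4), (4, 5), (3, 5),   # biconnected cluster 2
                  (2, 3)])                  # bridge linking the clusters

# Biconnected components: maximal subgraphs containing no cut vertex.
print(list(nx.biconnected_components(G)))   # e.g. [{0, 1, 2}, {2, 3}, {3, 4, 5}]
# Bridge edges connect the components into a tree-like structure.
print(list(nx.bridges(G)))                  # the single bridge: (2, 3)
# Girvan-Newman repeatedly removes the edge of highest betweenness; here the
# bridge (2, 3) scores highest, since every cross-cluster shortest path uses it.
print(max(nx.edge_betweenness_centrality(G).items(), key=lambda kv: kv[1]))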

Subject

  • STI Subject Categories: 99 General And Miscellaneous//Mathematics, Computing, And Information Science
  • Keyword: Computers
  • Keyword: Implementation
  • Keyword: Processing
  • Keyword: Linear Programming
  • Keyword: Supercomputers
  • Keyword: Efficiency
  • Keyword: Verification
  • Keyword: Algorithms
  • Keyword: Internet
  • Keyword: Exact Solutions
  • Keyword: Transformations
  • Keyword: Metrics
  • Keyword: Communities

Collection

  • Name: Office of Scientific & Technical Information Technical Reports
    Code: OSTI

Institution

  • Name: UNT Libraries Government Documents Department
    Code: UNTGD

Resource Type

  • Report

Format

  • Text

Identifier

  • Report No.: LLNL-TR-401193
  • Grant Number: W-7405-ENG-48
  • DOI: 10.2172/926011
  • Office of Scientific & Technical Information Report Number: 926011
  • Archival Resource Key: ark:/67531/metadc894116

Note

  • Display Note: PDF file: 5 pages; size: 0.1 MB