Decomposition of Large Scale Semantic Graphsvia an Efficient Communities Algorithm

PDF Version Also Available for Download.

Description

Semantic graphs have become key components in analyzing complex systems such as the Internet, or biological and social networks. These types of graphs generally consist of sparsely connected clusters or 'communities' whose nodes are more densely connected to each other than to other nodes in the graph. The identification of these communities is invaluable in facilitating the visualization, understanding, and analysis of large graphs by producing subgraphs of related data whose interrelationships can be readily characterized. Unfortunately, the ability of LLNL to effectively analyze the terabytes of multisource data at its disposal has remained elusive, since existing decomposition algorithms become ... continued below

Physical Description

5 p. (0.1 MB)

Creation Information

Yao, Y February 8, 2008.

Context

This report is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided by UNT Libraries Government Documents Department to Digital Library, a digital repository hosted by the UNT Libraries. More information about this report can be viewed below.

Who

People and organizations associated with either the creation of this report or its content.

Author

Publisher

Provided By

UNT Libraries Government Documents Department

Serving as both a federal and a state depository library, the UNT Libraries Government Documents Department maintains millions of items in a variety of formats. The department is a member of the FDLP Content Partnerships Program and an Affiliated Archive of the National Archives.

Contact Us

What

Descriptive information to help identify this report. Follow the links below to find similar items on the Digital Library.

Description

Semantic graphs have become key components in analyzing complex systems such as the Internet, or biological and social networks. These types of graphs generally consist of sparsely connected clusters or 'communities' whose nodes are more densely connected to each other than to other nodes in the graph. The identification of these communities is invaluable in facilitating the visualization, understanding, and analysis of large graphs by producing subgraphs of related data whose interrelationships can be readily characterized. Unfortunately, the ability of LLNL to effectively analyze the terabytes of multisource data at its disposal has remained elusive, since existing decomposition algorithms become computationally prohibitive for graphs of this size. We have addressed this limitation by developing more efficient algorithms for discerning community structure that can effectively process massive graphs. Current algorithms for detecting community structure, such as the high quality algorithm developed by Girvan and Newman [1], are only capable of processing relatively small graphs. The cubic complexity of Girvan and Newman, for example, makes it impractical for graphs with more than approximately 10{sup 4} nodes. Our goal for this project was to develop methodologies and corresponding algorithms capable of effectively processing graphs with up to 10{sup 9} nodes. From a practical standpoint, we expect the developed scalable algorithms to help resolve a variety of operational issues associated with the productive use of semantic graphs at LLNL. During FY07, we completed a graph clustering implementation that leverages a dynamic graph transformation to more efficiently decompose large graphs. In essence, our approach dynamically transforms the graph (or subgraphs) into a tree structure consisting of biconnected components interconnected by bridge links. This isomorphism allows us to compute edge betweenness, the chief source of inefficiency in Girvan and Newman's decomposition algorithm, much more efficiently, leading to significantly reduced computation time. Test runs on a desktop computer have shown reductions of up to 89%. Our focus this year has been on the implementation of parallel graph clustering on one of LLNL's supercomputers. In order to achieve efficiency in parallel computing, we have exploited the fact that large semantic graphs tend to be sparse, comprising loosely connected dense node clusters. When implemented on distributed memory computers, our approach performed well on several large graphs with up to one billion nodes, as shown in Table 2. The rightmost column of Table 2 contains the associated Newman's modularity [1], a metric that is widely used to assess the quality of community structure. Existing algorithms produce results that merely approximate the optimal solution, i.e., maximum modularity. We have developed a verification tool for decomposition algorithms, based upon a novel integer linear programming (ILP) approach, that computes an exact solution. We have used this ILP methodology to find the maximum modularity and corresponding optimal community structure for several well-studied graphs in the literature (e.g., Figure 1) [3]. The above approaches assume that modularity is the best measure of quality for community structure. In an effort to enhance this quality metric, we have also generalized Newman's modularity based upon an insightful random walk interpretation that allows us to vary the scope of the metric. Generalized modularity has enabled us to develop new, more flexible versions of our algorithms. In developing these methodologies, we have made several contributions to both graph theoretic algorithms and software engineering. We have written two research papers for refereed publication [3-4] and are working on another one [5]. In addition, we have presented our research findings at three academic and professional conferences.

Physical Description

5 p. (0.1 MB)

Notes

PDF-file: 5 pages; size: 0.1 Mbytes

Language

Item Type

Identifier

Unique identifying numbers for this report in the Digital Library or other systems.

  • Report No.: LLNL-TR-401193
  • Grant Number: W-7405-ENG-48
  • DOI: 10.2172/926011 | External Link
  • Office of Scientific & Technical Information Report Number: 926011
  • Archival Resource Key: ark:/67531/metadc894116

Collections

This report is part of the following collection of related materials.

Office of Scientific & Technical Information Technical Reports

Reports, articles and other documents harvested from the Office of Scientific and Technical Information.

Office of Scientific and Technical Information (OSTI) is the Department of Energy (DOE) office that collects, preserves, and disseminates DOE-sponsored research and development (R&D) results that are the outcomes of R&D projects or other funded activities at DOE labs and facilities nationwide and grantees at universities and other institutions.

What responsibilities do I have when using this report?

When

Dates and time periods associated with this report.

Creation Date

  • February 8, 2008

Added to The UNT Digital Library

  • Sept. 27, 2016, 1:39 a.m.

Description Last Updated

  • April 13, 2017, 6:08 p.m.

Usage Statistics

When was this report last used?

Congratulations! It looks like you are the first person to view this item online.

Interact With This Report

Here are some suggestions for what to do next.

Start Reading

PDF Version Also Available for Download.

Citations, Rights, Re-Use

Yao, Y. Decomposition of Large Scale Semantic Graphsvia an Efficient Communities Algorithm, report, February 8, 2008; Livermore, California. (digital.library.unt.edu/ark:/67531/metadc894116/: accessed October 20, 2017), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT Libraries Government Documents Department.