HCDF: A Hybrid Community Discovery Framework Page: 4 of 14
This article is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to Digital Library by the UNT Libraries Government Documents Department.
The following text was automatically extracted from the image on this page using optical character recognition software:
communities or groups).
Coalescing strategies. We explored three options
for incorporating hints into the core Bayesian model:
seed which uses hints only as an initial configuration for
the inference procedure; prior which propagates hints
from one configuration to the next; and attribute which
incorporates hints as additional link-attributes.
To quantitatively measure the effectiveness of the
discovered community structure, we use link prediction
performance and robustness to small perturbations in
the network structure. Why is link prediction a good
measure? A factorization of the graph's connectivity
structure into a number of communities is good, if
it can be used to predict the presence or absence of
links between a pair of nodes based on their respective
communities. Therefore, we evaluate effectiveness by
randomly holding out a number of links, building the
model under evaluation, and then trying to predict
the held-out links. We use area under the ROC curve
(AUC), as an accurate measure of performance.1 Why
is robustness to small perturbations in the network
structure a good measure? As Karrer et al. argue,
community structures that are "significant / believable"
should be able to withstand small perturbations in the
network structure . We utilize their quantification
of robustness, measured by Variation of Information
A: an entropy-based distance metric that measures the
distance between two clusterings one on the original
network and one on the perturbed network.
We describe an extensive evaluation on several
real data sets from a diverse range of domains (see
Section 4). Our methods, HCD-X (which utilizes
hints from Cross-Associations as link-attributes) and
HCD-M (which utilizes hints from Fast Modularity
as link-attributes), achieve significant improvements
in effectiveness across the board, while maintaining
scalability. We observe up to 0.22 improvement in AUC
scores for link prediction. The average AUC scores for
HCD-M and HCD-X (across nine data sets and five trial
runs) are 0.95 and 0.96, respectively. Furthermore, the
AUC of our hybrid methods never drops below 0.91 in
the worst case.
With respect to robustness to small perturbations
in the network structure, both of our hybrid meth-
ods maintain low values for Variation of Information,
A E [0, 1]. For example, with 10% of the links ran-
domly rewired (while maintaining the same degree dis-
tribution), our hybrid methods have on average A of
1AUC is equal to the probability of a model ranking a
randomly chosen positive instance higher than a randomly chosen
negative instance. The default value for AUC is 0.5. In our case,
a positive instance is a nonzero entry in the adjacency matrix
versus a negative instance is a zero entry.
0.19 (indicating that "significant / believable" commu-
nities were found).
Summarizing, the main contributions of the paper
* We propose a generic Bayesian framework, 7-(CDB,
for hybrid community discovery; and present three
ways for coalescing hints from various community
* We propose two novel solutions, HCD-X and HCD-
M, for the task of community discovery that are (1)
effective in terms of link prediction and robustness
to small perturbations in network structure, (2)
scalable, (3) nonparametric, and (4) consistent
across various application domains.
* We present results from an extensive evaluation
of HCD-X and HCD-M on several large real-
world graphs, with millions of links. Our results
demonstrate that HCD-X and HCD-M are highly
effective and consistent across different application
The rest of the paper is organized as follows. Next,
we review the related work. We present our proposed
method in Section 3, followed by experimental results
and discussion. We conclude the paper in Section 5.
2 Related work
Bayesian Approaches to Community Discov-
ery in Graphs. There are only a handful of scalable
Bayesian approaches to community discovery in graphs
[26, 23, 22, 15, 12], most extend Latent Dirichlet Allo-
cation (LDA) . LDA is a latent variable model for
topic modeling. SSN-LDA  and LDA-G  are the
simplest adaptions of LDA for community discovery in
graphs. GWN-LDA  introduces a Gaussian distribu-
tion with inverse-Wishart prior on a LDA-based model
to find communities in social networks with weighted
links. LDA-based models have also been used to find
communities in textual attributes and relations [24, 15].
For simplicity's sake, we chose LDA-G for the Bayesian
constituent of HCD-X and HCD-M. We could have eas-
ily chosen one of the other scalable Bayesian approaches.
The effectiveness and consistency of these non-hybrid
Bayesian models to community discovery in a diverse
collection of real-world graphs is unknown.
In , the authors propose a multi-step approach
involving LDA to cluster large document collections
containing both text and relations. In [16, 10], the
authors propose a two-step approach to find clusters
in data containing attributes and relations (namely,
citation graphs and a gene-interaction network). The
strategies used to incorporate information from one step
to the next is different in 7-C'DF than in these methods.
Here’s what’s next.
This article can be searched. Note: Results may vary based on the legibility of text within the document.
Tools / Downloads
Get a copy of this page or view the extracted text.
Citing and Sharing
Basic information for referencing this web page. We also provide extended guidance on usage rights, references, copying or embedding.
Reference the current page of this Article.
Henderson, K; Eliassi-Rad, T; Papadimitriou, S & Faloutsos, C. HCDF: A Hybrid Community Discovery Framework, article, January 19, 2010; Livermore, California. (digital.library.unt.edu/ark:/67531/metadc868844/m1/4/: accessed May 25, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT Libraries Government Documents Department.