\begin{abstract}
This paper initiates the formal analysis of a simple, distributed algorithm
for community detection on networks. We analyze an algorithm that we call
\textsc{Max-LPA}, both in terms of its convergence time and in terms of the
``quality'' of the communities detected. \textsc{Max-LPA} is an instance
of a class of community detection algorithms called \textit{label propagation}
algorithms. As far as we know, most analysis of label propagation algorithms
thus far has been empirical in nature; in this paper we seek a theoretical
understanding of these algorithms.

In our main result, we define a clustered version of \er random graphs
with clusters $V_1, V_2, \ldots, V_k$ where the probability $p$ of an
edge connecting nodes within a cluster $V_i$ is higher than $p'$, the
probability of an edge connecting nodes in distinct clusters. We show
that even with fairly general restrictions on $p$ and $p'$ ($p =
\Omega\left(\frac{1}{n^{1/4-\epsilon}}\right)$ for any $\epsilon > 0$ and $p' = O(p^2)$, where $n$ is
the number of nodes), \textsc{Max-LPA}
detects the clusters $V_1, V_2, \ldots, V_k$ in just two rounds. Based on this
and on empirical results, we conjecture that \textsc{Max-LPA} can correctly
and quickly identify communities on clustered \er graphs even when the
clusters are much sparser, i.e., with $p = \frac{c\log n}{n}$ for some $c > 1$.

\end{abstract}