1: \begin{abstract}
2: This paper initiates the formal analysis of a simple, distributed algorithm
3: for community detection on networks. We analyze an algorithm that we call
4: \textsc{Max-LPA}, both in terms of its convergence time and in terms of the
5: ``quality'' of the communities detected. \textsc{Max-LPA} is an instance
6: of a class of community detection algorithms called \textit{label propagation}
7: algorithms. As far as we know, most analysis of label propagation algorithms
8: thus far has been empirical in nature; in this paper we seek a theoretical
9: understanding of these algorithms.
10: In our main result, we define a clustered version of \er random graphs
11: with clusters $V_1, V_2, \ldots, V_k$, where $p$, the probability of an
12: edge connecting nodes within a cluster $V_i$, is higher than $p'$, the
13: probability of an edge connecting nodes in distinct clusters. We show
14: that even with fairly general restrictions on $p$ and $p'$ ($p =
15: \Omega\left(\frac{1}{n^{1/4-\epsilon}}\right)$ for any $\epsilon > 0$ and $p' = O(p^2)$, where $n$ is
16: the number of nodes), \textsc{Max-LPA}
17: detects the clusters $V_1, V_2, \ldots, V_k$ in just two rounds. Based on this
18: and on empirical results, we conjecture that \textsc{Max-LPA} can correctly
19: and quickly identify communities on clustered \er graphs even when the
20: clusters are much sparser, i.e., with $p = \frac{c\log n}{n}$ for some $c > 1$.
21: \end{abstract}
22: