\begin{abstract}
% We introduce the generalised random dot product graph, a latent position network model that comprises as special cases the mixed membership and standard stochastic block models. Nodes are represented as vectors in $\R^d$, and the probability of an edge between two nodes functionally depends on the nodes' vector representations via a canonical choice of indefinite inner product. This model provides the only possible representation of nodes in $\R^d$ such that mixed membership is encoded by the corresponding convex combination of vectors. We prove two results, showing strong consistency and a central limit theorem, on latent position estimates using spectral embedding of the adjacency and normalised Laplacian matrices. Estimation of the stochastic block model then reduces to a Gaussian clustering problem, and the mixed membership stochastic blockmodel to a support estimation problem. However, the positions are identifiable only up to transformation in the indefinite orthogonal group $\indefO$, where $p \geq 1$ and $q \geq 0$ are two integers satisfying $p+q=d$. When $q > 0$, indicating the presence of disassortative network behaviour, inter-point distance is not identifiable, with implications for subsequent inference. For example, the $K$-means algorithm, which is commonly recommended for spectral clustering, is not invariant to such transformations, whereas (non-spherical) Gaussian clustering is --- and should be preferred.
% \end{abstract}
