\begin{abstract}
Network representation learning (NRL) techniques have been successfully adopted in various data mining and machine learning applications.
Random-walk-based NRL is a popular paradigm: it uses a set of random walks to capture the structural information of a network,
and then employs word2vec models to learn low-dimensional representations.
However, there is still no framework that unifies existing random-walk-based NRL models
and supports efficient learning on large networks.
The main obstacles are the diversity of random walk models and the inefficiency of the sampling methods used to generate random walks.
In this paper, we first introduce a new, efficient edge sampler
based on the Metropolis-Hastings sampling technique,
and theoretically show that it converges to arbitrary discrete probability distributions.
We then propose a random walk model abstraction,
in which users can easily define different transition probabilities by specifying dynamic edge weights and random walk states.
Our edge sampler supports this abstraction efficiently, since it can draw samples from unnormalized probability distributions in constant time.
Finally, building on the new edge sampler and the random walk model abstraction,
we carefully implement a scalable NRL framework called \sys.
We conduct comprehensive experiments with five random-walk-based NRL models on eleven real-world datasets,
and the results clearly demonstrate the efficiency of \sys on billion-edge networks.
The source code of \sys is available at https://github.com/shaoyx/UniNet.
\end{abstract}
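The following Python sketch illustrates two ideas summarized in the abstract, under stated assumptions. The first is a Metropolis-Hastings edge sampler that needs only ratios of unnormalized edge weights. The second is a walk abstraction in which a dynamic edge weight and a walk state define the transition probability.

The sketch makes three assumptions:
\begin{itemize}
\item the graph is unweighted and stored as adjacency lists;
\item the proposal is uniform over the current node's neighbors;
\item there is one Markov chain per (state, node) pair.
\end{itemize}
The names \texttt{MHEdgeSampler}, \texttt{node2vec\_weight}, and \texttt{random\_walk} are hypothetical and do not reflect the actual \sys code. Each draw proposes a neighbor $u'$ in $O(1)$ time and accepts it over the previous sample $u$ with probability $\min(1, w(u')/w(u))$, so the normalizing constant is never computed.

```python
import random
from typing import Callable, Dict, Hashable, List, Optional, Set, Tuple

Node = Hashable
# Hypothetical interface: weight(state, v, u) -> unnormalized, non-negative weight of edge (v, u).
WeightFn = Callable[[Optional[Node], Node, Node], float]


class MHEdgeSampler:
    """Samples neighbors of v whose long-run distribution is proportional to
    weight(state, v, u), using one Metropolis-Hastings chain per (state, v).

    The proposal is uniform over v's neighbors (symmetric), so each draw costs
    two weight evaluations and never computes sum_u weight(state, v, u).
    """

    def __init__(self, adj: Dict[Node, List[Node]], weight: WeightFn, seed: int = 0):
        self.adj = adj
        self.weight = weight
        self.rng = random.Random(seed)
        # Last accepted neighbor of each chain, keyed by (state, v).
        self.chain: Dict[Tuple[Optional[Node], Node], Node] = {}

    def sample(self, state: Optional[Node], v: Node) -> Node:
        nbrs = self.adj[v]
        key = (state, v)
        cur = self.chain.get(key)
        if cur is None:
            cur = self.rng.choice(nbrs)  # arbitrary initial sample for a new chain
        cand = self.rng.choice(nbrs)  # O(1) uniform proposal
        w_cur = self.weight(state, v, cur)
        w_cand = self.weight(state, v, cand)
        # Accept with probability min(1, w_cand / w_cur); only the ratio is needed.
        if w_cur <= 0 or self.rng.random() * w_cur < w_cand:
            cur = cand
        self.chain[key] = cur
        return cur


def node2vec_weight(adj_sets: Dict[Node, Set[Node]], p: float, q: float) -> WeightFn:
    """Hypothetical node2vec-style dynamic weight; the walk state is the previous node."""

    def weight(prev: Optional[Node], v: Node, u: Node) -> float:
        if prev is None:
            return 1.0  # first step: no history, uniform weight
        if u == prev:
            return 1.0 / p  # return to the previous node
        if u in adj_sets[prev]:
            return 1.0  # stay at distance 1 from the previous node
        return 1.0 / q  # move outward, away from the previous node

    return weight


def random_walk(sampler: MHEdgeSampler, start: Node, length: int) -> List[Node]:
    """Generates one walk of at most `length` nodes; the state is the previous node."""
    path = [start]
    prev: Optional[Node] = None
    while len(path) < length and sampler.adj[path[-1]]:
        v = path[-1]
        nxt = sampler.sample(prev, v)
        prev = v
        path.append(nxt)
    return path


if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
    adj_sets = {v: set(nbrs) for v, nbrs in adj.items()}
    sampler = MHEdgeSampler(adj, node2vec_weight(adj_sets, p=4.0, q=0.25), seed=42)
    print(random_walk(sampler, start=0, length=8))
```

The usage block defines a node2vec-style model with return parameter $p$ and in-out parameter $q$ purely by supplying a weight function, and the walk state is the previous node. Each chain advances only when its (state, node) pair is revisited. As a result, early draws can be biased toward the arbitrary initial sample. Any initialization or burn-in strategy that \sys may use is not modeled here.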
