\begin{abstract}
Our objective is to sample the nodes of a large, unknown graph via crawling in order to accurately estimate a given metric of interest.
We design a random walk on an appropriately defined weighted graph
that achieves high efficiency by preferentially crawling the nodes and edges that convey the most information about the target metric.
Our approach first applies the theory of stratified sampling to find the optimal node weights for a given estimation problem under an independence sampler, i.e., one that draws nodes independently of each other.
Although optimal under independence sampling, these weights may be impractical for graph crawling because of constraints imposed by the graph structure.
The edge weights of our random walk should therefore be chosen to yield an equilibrium distribution that balances two goals: approximating these optimal independence-sampler weights and achieving fast convergence.
We propose a heuristic, the stratified weighted random walk (S-WRW), that achieves this balance while using only limited information about the graph structure and the node properties.
We evaluate our technique both in simulation and experimentally, by collecting a sample of Facebook college users.
We show that S-WRW requires \mbox{13--15} times fewer samples than the simple re-weighted random walk (RW) to achieve the same estimation accuracy for a range of metrics.
\end{abstract}
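As an illustration only, the following Python sketch shows, under simplifying assumptions, the two mechanisms summarized in the abstract. The first is a random walk on a weighted undirected graph, which visits node $v$ with long-run frequency proportional to its total edge weight $w(v)$. The second is re-weighting each sample by $1/w(v)$, which turns walk samples into an estimate of a uniform node average. The toy graph, the fixed weight boost for edges touching a target category, and all function names (\texttt{build\_graph}, \texttt{node\_weight}, \texttt{weighted\_random\_walk}, \texttt{reweighted\_mean}) are hypothetical. The sketch does not implement the S-WRW rule for choosing edge weights.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def build_graph(edges):
    """Build a symmetric adjacency map {u: {v: w(u, v)}} from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def node_weight(adj, v):
    """w(v): total weight of the edges incident to v (the degree if all weights are 1)."""
    return sum(adj[v].values())


def weighted_random_walk(adj, start, steps, rng):
    """Return `steps` nodes visited by a walk that moves from u to neighbor v with
    probability w(u, v) / w(u). On a connected undirected graph the visit
    frequency of v converges to w(v) / sum_u w(u)."""
    # Precompute neighbor lists and cumulative edge weights for O(log d) sampling.
    table = {}
    for u, nbrs in adj.items():
        nodes = list(nbrs)
        table[u] = (nodes, list(accumulate(nbrs[v] for v in nodes)))
    samples, u = [], start
    for _ in range(steps):
        nodes, cum = table[u]
        i = bisect_right(cum, rng.random() * cum[-1])
        u = nodes[min(i, len(nodes) - 1)]  # guard against float rounding at the top end
        samples.append(u)
    return samples


def reweighted_mean(adj, samples, f):
    """Estimate the uniform average of f over all nodes by weighting each sample
    by 1 / w(v), which undoes the walk's bias toward high-weight nodes."""
    num = sum(f(v) / node_weight(adj, v) for v in samples)
    den = sum(1.0 / node_weight(adj, v) for v in samples)
    return num / den


if __name__ == "__main__":
    # Hypothetical toy graph; nodes 4 and 5 form the category of interest.
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (3, 5), (1, 4)]
    target = {4, 5}

    def in_target(v):
        return 1.0 if v in target else 0.0

    rng = random.Random(0)
    for name, boost in [("RW", 1), ("weighted RW", 4)]:
        # Illustrative weighting only (not the S-WRW rule): edges touching the
        # target category get weight `boost`, all other edges weight 1.
        adj = build_graph([(u, v, boost if {u, v} & target else 1) for u, v in edges])
        s = weighted_random_walk(adj, start=0, steps=100_000, rng=rng)
        share = sum(v in target for v in s) / len(s)
        est = reweighted_mean(adj, s, in_target)
        print(f"{name}: share of samples in target {share:.3f}, "
              f"estimated fraction {est:.3f} (true 0.333)")
```

With unit weights the walk reduces to the simple re-weighted random walk (RW). In this toy graph, the boost raises the stationary share of target-category nodes from $5/16$ to $20/40 = 1/2$, while both re-weighted estimates should approach the true fraction $1/3$.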