\begin{abstract}

Many real-world networks are too large to retrieve, store, and analyze
in their entirety. Understanding the structure and dynamics of such
networks entails extracting a smaller, representative sample of the
full graph that preserves its relevant topological properties. In this
report, we show that the graph sampling algorithms currently proposed
in the literature fail to preserve network properties even when the
sample contains as many as 20\% of the nodes of the original graph. We
present a new sampling algorithm, called Tiny Sample Extractor, which
targets samples smaller than 5\% of the original graph while
preserving two key network properties: the degree distribution and the
clustering coefficient. Our approach is based on a new empirical method
for estimating the measurement biases of crawling algorithms and
compensating for them. We present a detailed comparison of the
best-known graph sampling algorithms, focusing in particular on how the
properties of the sampled subgraphs converge to those of the original
graph as the samples grow. The results show that our algorithm extracts
a smaller subgraph than the other algorithms while also converging more
closely to the degree distribution of the original graph, as measured
by the degree exponent. However, the subgraph generated by Tiny Sample
Extractor is not necessarily representative of the full graph with
respect to other properties, such as assortativity. This indicates that
the problem of extracting a truly representative small subgraph from a
large graph remains unsolved.

\end{abstract}