\begin{abstract}
Many real-world networks are too large for the retrieval, storage, and
analysis of all of their nodes and links. Understanding the structure
and dynamics of these networks therefore entails creating a smaller,
representative sample of the full graph that preserves its relevant
topological properties. In this report, we show that graph sampling
algorithms currently proposed in the literature fail to preserve
network properties even when the sample contains as many as 20\% of
the nodes of the original graph. We present a new sampling algorithm,
called Tiny Sample Extractor, whose goal is a sample smaller than 5\%
of the original graph that preserves two key properties of the
network: its degree distribution and its clustering coefficient. Our
approach is based on a new empirical method of estimating measurement
biases in crawling algorithms and compensating for them accordingly.
We present a detailed comparison of the best-known graph sampling
algorithms, focusing in particular on how the properties of the sample
subgraphs converge to those of the original graph as the samples grow.
These results show that our sampling algorithm extracts a smaller
subgraph than other algorithms while also converging more closely to
the degree distribution of the original graph, as measured by the
degree exponent. The subgraph generated by the Tiny Sample Extractor,
however, is not necessarily representative of the full graph with
regard to other properties such as assortativity. This indicates that
the problem of extracting a truly representative small subgraph from a
large graph remains unsolved.
\end{abstract}