\documentclass[11pt,twocolumn]{article}
\usepackage{times}
\usepackage{mathfont}
\usepackage{url}

\def\square{\framebox{ } \smallskip \smallskip}
\long\def\omit#1{}
\def\log{\mathop{{\rm log}}}
\def\Pr{\mathop{{\rm Pr}}}

% magic to make big-O notation use script font
\mathcode`O="724F

\setlength{\textwidth}{6.5in}
\setlength{\textheight}{9.2in}
\setlength{\topmargin}{-.5in}
\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}

\pagenumbering{arabic}
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{observation}[theorem]{Observation}

\begin{document}

\title{Fast Approximation of Centrality}
\author{David Eppstein\thanks{Dept. Inf. \& Comp. Sci.,
UC Irvine, CA 92697-3425, USA,
{\tt\{eppstein,josephw\}@ics.uci.edu}.}
\and Joseph Wang$^*$}
\date{ }
\maketitle

\begin{abstract}
Social studies researchers use graphs to model
group activities in social networks.
An important property in this context is
the {\em centrality} of a vertex: the inverse of the
average distance to each other vertex.  We describe a
randomized approximation algorithm for centrality in weighted
graphs.  For graphs exhibiting the small world phenomenon, our
method estimates the centrality of all vertices
with high probability within a $(1+\epsilon)$ factor in near-linear time.
\end{abstract}


\section{Introduction}
In social network analysis, the vertices of a graph represent
agents in a group and the edges represent relationships, such
as communication or friendship.
The idea of applying graph theory to analyze the
connection between the structural {\em centrality}
and group process was introduced by Bavelas \cite{Bavelas48}.
Various measures of centrality \cite{Bonacich72,Freeman79,Friedkin91}
have been proposed for analyzing
communication activity, control, or independence within
a social network.

We are particularly interested in {\em closeness centrality}
\cite{Bavelas50,Beauchamp65,Sabidussi66}, which is used to
measure the independence and efficiency of an agent
\cite{Freeman79,Friedkin91}. Beauchamp~\cite{Beauchamp65} defined
the closeness centrality of agent $a_j$ as
$${n - 1} \over {\sum_{i = 1}^{n} d(i, j)}$$
where $d(i, j)$ is the
distance between agents $i$ and~$j$.\footnote{This
should be distinguished from another common concept of graph centrality,
in which the most central vertices minimize the maximum
distance to another vertex.}
We are interested in computing centrality values for all agents.
To compute the centrality for each agent,
it is sufficient to solve the all-pairs shortest-paths (APSP)
problem. No faster exact method is known.


The APSP problem can be solved by various algorithms
in time $O(nm + n^2 \log n)$ \cite{FredmanTarjan87,Johnson77}, $O(n^3)$
\cite{Floyd62}, or more quickly using fast matrix multiplication
techniques \cite{AGM97,CoppWin90,Seidel95,Yuval76}.
\omit{Several researchers have developed more efficient algorithms for
special graph classes such as interval graphs
\cite{ACL93,CLSS98,RMPR92} and chordal graphs \cite{BCD94,HSS97}.
The APSP problem
can be solved in average-case in time $O(n^2 \log n)$
for various classes of random graphs
\cite{CFMP97,FriezeGrimmett85,MehlhornPriebe95,MoffatTakaoka85}.}
Because these algorithms are slow
or (with fast matrix multiplication) complicated and impractical,
and because recent applications of social network theory to the internet
may involve graphs with millions of vertices, it is of interest to
consider faster approximations. Aingworth et al. \cite{ACIM99} proposed
an algorithm with an additive error of $2$
for the unweighted APSP problem
that runs in time $O(n^{2.5}\sqrt{\log n})$.
However, this is still slow and does not provide a good approximation
when the distances are small.


In this paper, we consider a method for fast approximation of centrality.
We apply a random sampling technique to approximate the
inverse centrality of all vertices in a weighted graph to within an
additive error of $\epsilon \Delta$ with high probability
in time $O({\log n \over \epsilon^2} (n \log n + m))$, where
$\epsilon$ is any fixed constant and $\Delta$ is the diameter of the
graph.

It has been observed empirically that many social networks exhibit the
{\em small world phenomenon} \cite{Milgram67}: their diameter is bounded
by a constant, or, equivalently, the ratio between the minimum and
maximum distance is bounded.  For such networks, the inverse centrality
at any vertex is $\Omega(\Delta)$ and our method provides a near-linear
time $(1+\epsilon)$-approximation to the centrality of all vertices.
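
To spell out why an additive bound on the inverse centrality yields a
relative bound on the centrality, the following sketch may help. Write
$c_u$ for the centrality of a vertex $u$ and $\hat{c}_u$ for an estimate of it
satisfying $|1/\hat{c}_u - 1/c_u| \le \epsilon\Delta$. The constant
$\rho \ge 1$ is a hypothetical name introduced here only for illustration:
we assume that every distance between distinct vertices is at least
$\Delta/\rho$. Since $1/c_u$ is the average of $n-1$ such distances,
$1/c_u \ge \Delta/\rho$, and hence
$$\Bigl|{1 \over \hat{c}_u} - {1 \over c_u}\Bigr| \le \epsilon\Delta
\le \epsilon\rho \cdot {1 \over c_u},
\qquad\hbox{so}\qquad
{c_u \over 1+\epsilon\rho} \le \hat{c}_u \le {c_u \over 1-\epsilon\rho}$$
whenever $\epsilon\rho < 1$. Under these assumptions, running the method with
$\epsilon/\rho$ in place of $\epsilon$ increases the number of samples only by
the constant factor $\rho^2$.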



\omit{\section{Preliminaries}
We are given a graph $G(V, E)$ with $n$ vertices
and $m$ edges. The distance $d(u, v)$ between
two vertices $u$ and $v$ is the length of the shortest path
between them. The diameter $\Delta$ of a graph $G$
is defined as $max_{u, v \in V} d(u, v)$. For
simplicity, we define centrality $c_v$ for vertex $v$
as ${n - 1} \over {\sum_{u \in V} d(u, v)}$. If $G$ is not
connected, then $c_v = \infty$. Hence we will assume $G$ is connected.}

\omit{Given an optimization problem $P$. Let $value(OPT)$ denote
the optimal solution for a problem instance in $P$.
Let $value(A)$ denote the solution computed by
an approximation algorithm $A$.
$A$ is said to have constant additive approximation
error $c$ if $|value(A) -  value(OPT)| \le c$ for every
problem instance in $P$. }

% too long for two column mode
%\section{Randomized Approximation Algorithm}

\section{The Algorithm}

We now describe a randomized
approximation algorithm RAND for estimating centrality.
RAND randomly chooses $k$ sample vertices and computes
single-source shortest-paths (SSSP) from each sample vertex to all
other vertices. The estimated centrality of a vertex is
defined in terms of the average distance to the sample vertices.


\vfil\eject

\noindent {\bf Algorithm RAND:}
\begin{enumerate}
\item Let $k$ be the number of iterations needed to
obtain the desired error bound.
\item In iteration $i$, pick
vertex $v_i$ uniformly at random from $G$ and solve the SSSP problem
with
$v_i$ as the source.
\item Let
	$$\hat{c}_u = 1/\sum_{i = 1}^{k}\frac{n\,d(v_i, u)}{k(n-1)}$$
be the centrality estimator for vertex $u$.
\end{enumerate}

\smallskip
It is not hard to see that, for any $k$ and $u$,
the expected value of $1/\hat{c}_u$ is equal to $1/c_u$,
where $c_u$ denotes the exact centrality of $u$.
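
For completeness, a short derivation of this identity can be sketched as
follows, assuming the sampling model of Algorithm RAND (each $v_i$ drawn
uniformly from the $n$ vertices) and $d(u, u) = 0$. The index $w$, ranging
over all vertices, is notation introduced here only for illustration. By
linearity of expectation,
\begin{eqnarray*}
E\Bigl[{1 \over \hat{c}_u}\Bigr]
& = & \sum_{i = 1}^{k} {n \over k(n-1)}\, E[d(v_i, u)] \\
& = & \sum_{i = 1}^{k} {n \over k(n-1)} \cdot {1 \over n} \sum_{w} d(w, u) \\
& = & {\sum_{w} d(w, u) \over n - 1}
\;=\; {1 \over c_u}.
\end{eqnarray*}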


\omit{
PROOF NEEDS FIXING TO ACCOUNT FOR N/(N-1) FACTOR!
\begin{theorem}
$E[1/\hat{c}_u] = 1/c_u$.
\end{theorem}

{\bf Proof:}
Each vertex has equal probability of $1/n$ to be picked at each
round. The expected value for $1 \over \hat{c}_u$ is
\begin{eqnarray*}
E[1 \over {\hat{c}_u}] & = & {n \over n - 1} 1/n^{k} \cdot {{kn^{k - 1}
\sum_{i = 1}^{n} d(i, u)} \over k} \\
& = & {n \over n - 1} {{\sum_{i = 1}^{n} d(i, u)} \over n}  \\
& = & {1 \over c_u}.\end{eqnarray*}
\square
\bigskip
}

\omit{In 1963, Hoeffding \cite{Hoeffding63} gave the following theorem
on probability bounds for sums of independent random variables.}

\begin{lemma}[Hoeffding
\cite{Hoeffding63}]
If $x_1, x_2, \ldots, x_{k}$ are independent,
$a_i \le x_i \le b_i$,
and $\mu = E[\sum x_i/k]$ is the expected mean,
then for $\xi > 0$
$$\Pr\Bigl\{ |{\sum_{i = 1}^{k} x_i \over k} - \mu| \ge \xi \Bigr\}
\le 2 e^{-2{k}^2 {\xi}^2/\sum_{i = 1}^{k}(b_i - a_i)^2}.$$
\end{lemma}
\bigskip



We need to bound the probability that the error in estimating
the inverse centrality of any vertex $u$ exceeds $\xi$.
This is done by applying Hoeffding's bound with
$x_i = \frac{n\,d(v_i, u)}{n-1}$,
$\mu = \frac{1}{c_u}$,
$a_i=0$, and $b_i=\frac{n\Delta}{n-1}$.
\omit{
% I just put the factor of two directly into the lemma
We know $E[1/\hat{c}_u] = 1/c_u$.
To take care of the case in
which $\hat{c}_u$ is smaller than $c_u$, we multiply
the above inequality by $2$.
}
Thus the probability that
the estimated inverse centrality
$1/\hat{c}_u$ and the actual inverse centrality $1/c_u$ differ by at least
$\xi$ satisfies
\begin{eqnarray*}
\Pr\left\{ {\textstyle |\frac{1}{\hat{c}_u} - \frac{1}{c_u}|}
\ge \xi \right\}
& \le &
2 \cdot e^{-2{k}^2 {\xi}^2/\sum_{i = 1}^{k}(b_i - a_i)^2} \\
& = & 2 \cdot e^{-2{k}^2 {\xi}^2/({k}(\frac{n\Delta}{n-1})^2)} \\
& = & 2 \cdot e^{-\Omega(k\xi^2/\Delta^2)}
\end{eqnarray*}
For $\xi = \epsilon\Delta$, using $\Theta(\frac{\log n}{\epsilon^2})$
samples will cause the probability of error at any one vertex to be bounded
above by e.g.\ $1/n^2$, giving, by the union bound over all $n$ vertices,
at most $1/n$ probability of
having greater than $\epsilon\Delta$ error anywhere in the graph.
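
As a concrete instance of this calculation, the following sketch makes the
constants explicit. The target failure probability $1/n^2$ per vertex and the
resulting constant are illustrative choices, not values fixed by the analysis
above. With $\xi = \epsilon\Delta$, $a_i = 0$, and $b_i = \frac{n\Delta}{n-1}$,
the exponent in the bound equals $-2k\epsilon^2 ({n-1 \over n})^2$, and
$$2 \cdot e^{-2k\epsilon^2 ({n-1 \over n})^2} \le {1 \over n^2}
\quad\Longleftrightarrow\quad
k \ge \Bigl({n \over n-1}\Bigr)^2 {\ln (2n^2) \over 2\epsilon^2}.$$
For $n \ge 2$ we have ${n \over n-1} \le 2$, so any
$k \ge 2\ln(2n^2)/\epsilon^2 = \Theta({\log n \over \epsilon^2})$ suffices.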


\omit{
Fredman and Tarjan \cite{FredmanTarjan87} gave an
algorithm for solving the $SSSP$ problem in time $O(n \log n + m)$.
Thus }
The total running time of the algorithm is dominated by the $k$ SSSP
computations: it is
$O(k \cdot m)$ for unweighted graphs (using breadth-first search)
and $O(k (n \log n + m))$
for weighted graphs (using Dijkstra's algorithm with Fibonacci heaps
\cite{FredmanTarjan87}).
Thus, for $k = \Theta(\frac{\log n}{\epsilon^2})$,
we have an $O({\log n \over \epsilon^2} (n \log n + m))$ algorithm
for approximating inverse centrality within an additive
error of $\epsilon \Delta$ with high probability.



\omit{\section{Conclusion}
We gave an $O({\log n \over \epsilon^2} (n \log n + m))$
randomized algorithm with additive error of $\epsilon \Delta$
for weighted graphs. Many graph classes such as paths, cycles,
and balanced trees, have centrality proportional to
$\Delta$. More interestingly, Milgram \cite{Milgram67} showed that
many social networks have bounded diameter and centrality.
When the centrality is proportional to $\Delta$,
we have an $(1 + \epsilon)$-approximation algorithm. }

\small
\paragraph{Acknowledgements.}
We thank Dave Goggin for bringing this problem to our attention,
and Lin Freeman for helpful comments on a draft of this paper.

\bibliographystyle{nomonths}
\let\oldbib\thebibliography
\def\thebibliography#1{\oldbib{#1}\itemsep 0pt}
\bibliography{bibdata}
\end{document}