0511:cs0511080/imm.tex

1: \documentclass{article}

2: \usepackage{graphicx}

3:

4: \widowpenalty=10000

5: \clubpenalty=10000

6:

7: \newcommand{\graphHeight}{5cm}

8: \newcommand{\heurHeight}{4.4cm}

9: \newcommand{\graphDraw}{1}

10:

11: \newcommand{\wH}[2]{{h_{#2\mid #1}}}

12: \newcommand{\wP}[2]{p_{#2\mid #1}}

13: \newcommand{\wPs}[0]{P_{\mathrm{s}}}

14: \newcommand{\wPv}[0]{P_{\mathrm{v}}}

15: \newcommand{\qgcc}[1]{{q_{#1}^{V}}}

16: \newcommand{\wgcc}[2]{{w_{#1,#2}^{V}}}

17: \newcommand{\degcc}[1]{{w_{#1}^{V}}}

18:

19: \newcommand{\GCC}[1]{{\mathrm{GCC}_{#1}}}

20: \newcommand{\GOUT}[1]{{\mathrm{GOUT}_{#1}}}

21: \newcommand{\GIN}[1]{{\mathrm{GIN}_{#1}}}

22: \newcommand{\GWCC}[1]{{\mathrm{GWCC}_{#1}}}

23: \newcommand{\GSCC}[1]{{\mathrm{GSCC}_{#1}}}

24: \newcommand{\gcc}[1]{\theta_{#1}}

25: \newcommand{\gin}[1]{{\theta_{#1}^{\mathrm{in}}}}

26: \newcommand{\gout}[1]{{\theta_{#1}^{\mathrm{out}}}}

27: \newcommand{\qin}[1]{{q_{#1}^{\mathrm{in}}}}

28: \newcommand{\qout}[1]{{q_{#1}^{\mathrm{out}}}}

29: \newcommand{\win}[2]{{w_{#1,#2}^{\mathrm{in}}}}

30: \newcommand{\wout}[2]{{w_{#1,#2}^{\mathrm{out}}}}

31: \newcommand{\dein}[1]{{w_{#1}^{\mathrm{in}}}}

32: \newcommand{\deout}[1]{{w_{#1}^{\mathrm{out}}}}

33:

34: \begin{document}

35:

36: \title{A Dissemination Strategy for Immunizing Scale-Free Networks}

37:

38: \author{Alexandre~O.~Stauffer\\

39: Valmir~C.~Barbosa\thanks{Corresponding author (valmir@cos.ufrj.br).}\\

40: \\

41: Universidade Federal do Rio de Janeiro\\

42: Programa de Engenharia de Sistemas e Computa\c c\~ao, COPPE\\

43: Caixa Postal 68511\\

44: 21941-972 Rio de Janeiro - RJ, Brazil}

45:

46: \date{}

47:

48: \maketitle

49:

50: \begin{abstract}

51: We consider the problem of distributing a vaccine for immunizing a scale-free

52: network against a given virus or worm. We introduce a new method, based on

53: vaccine dissemination, that seems to reflect more accurately what is expected to

54: occur in real-world networks. Also, since the dissemination is performed using

55: only local information, the method can be easily employed in practice. Using a

56: random-graph framework, we analyze our method both mathematically and by means

57: of simulations. We demonstrate its efficacy regarding the trade-off between the

58: expected number of nodes that receive the vaccine and the network's resulting

59: vulnerability to an epidemic as the virus or worm attempts to infect one

60: of its nodes. For some scenarios, the new method is seen to render the network

61: practically invulnerable to attacks while requiring only a small fraction of the

62: nodes to receive the vaccine.

63:

64: \bigskip

65: \noindent

66: \textbf{Keywords:} Network immunization, Random networks, Heuristic flooding.

67: \end{abstract}

68:

69: %===============================================================================

70: %===============================================================================

71: %===============================================================================

72: \section{Introduction} \label{sec:intro}

73:

74: The term ``scale-free'' is widely used to designate the class of networks that

75: have node degrees distributed as a power law \cite{barabasi1999,newman2003b},

76: according to which the probability that a randomly chosen node has degree $a$ is

77: proportional to $a^{-\tau}$ for some parameter $\tau>0$. There has been a recent

78: surge of interest in scale-free networks, as a great variety of real-world

79: networks, like the Internet, the WWW, social networks, and

80: scientific-collaboration networks, have been empirically observed to have

81: node-degree distributions that approximately follow a power law

82: \cite{faloutsos1999,albert2002}. In contrast with the classical random-graph

83: model introduced by Erd\H{o}s and R\'enyi, whose node-degree distribution is the

84: Poisson distribution, therefore sharply concentrated around its mean value

85: \cite{erdos1959,bollobas2002}, scale-free networks normally contain nodes with

86: a wide range of degrees, typically with a few nodes of extremely high degrees

87: coexisting with a plethora of low-degree nodes.

88:

89: In this paper we consider the problem of preventing viruses or worms from

90: spreading on scale-free computer networks. The fact that node degrees are in

91: this case distributed according to a power law has profound impact on the way

92: the network operates. In particular, it makes the problem of fighting the

93: proliferation of viruses and other infections much more challenging, since the

94: presence of high-degree nodes dramatically increases the rate at which a virus

95: may propagate \cite{satorras2001,satorras2003}. For this reason, instead of

96: combating the proliferation of a virus in an already infected network, we

97: consider a preventive immunization strategy, which consists of distributing the

98: appropriate vaccine to a small subset of the network's nodes, striving to

99: immunize those nodes that can more efficiently block the spread of a future

100: infection. The goal of this approach is to distribute the vaccine to as few

101: nodes as possible while making the network invulnerable to an epidemic, that is,

102: to the occurrence of a state in which a relatively large number of nodes is

103: infected.

104:

105: We can measure the efficacy of an immunization strategy by two indicators: the

106: expected spread, which is the expected fraction of the network's nodes that

107: receive the vaccine, and the expected vulnerability, which is the expected

108: fraction of the network's nodes that may become infected when the virus attempts

109: to infect a randomly chosen node of the immunized network. Clearly, these two

110: indicators are strongly influenced by how we select the nodes to receive the

111: vaccine. A simple rule for choosing these nodes is to randomly select a given

112: fraction of the network's nodes \cite{albert2000,cohen2000,satorras2003}. When

113: applied to scale-free networks, this rule is known to normally give

114: unsatisfactory results, as it only achieves a reasonably small expected

115: vulnerability for prohibitively high expected spreads. An alternative rule

116: consists of distributing the vaccine to all the nodes that have degrees greater

117: than a given value \cite{albert2000,cohen2001,satorras2003}. Despite being more

118: efficient for scale-free networks than the previous strategy, as it achieves

119: quite a small expected vulnerability with only a modest expected spread,

120: applying this rule to real-world networks is known to be usually difficult

121: \cite{cohen2003}. The use of this rule demands global knowledge regarding the

122: location of the nodes having the highest degrees, while the nodes of many

123: real-world networks may only be assumed to have information that can be directly

124: inferred from their immediate neighborhoods. Yet another alternative is to

125: randomly choose some of the network's nodes and, for each of them, to immunize a

126: randomly chosen fraction of its neighbors \cite{cohen2003}. This rule, however,

127: and in fact the previous two as well, seems hard to implement in practice on

128: computer networks, since apparently it requires that the vaccine be somehow

129: transmitted to a given fraction of the network's nodes by means other than the

130: network's own.

131:

132: In this paper, we assume that the vaccine enters the network at a single node,

133: called the originator. We attribute to this node the responsibility of starting

134: the dissemination of the vaccine by initiating the method called heuristic

135: flooding for disseminating information in networks \cite{stauffer2004}. Let $u$

136: be the originator. For each neighbor $v$ of $u$, this method prescribes that $u$

137: forward the vaccine to $v$ with probability given by a heuristic function

138: $h(a,b)$, where $a$ and $b$ are, respectively, the degrees of $u$ and

139: $v$.\footnote{We assume $h(a,b)=0$ if $a=0$ or $b=0$.} Each of the nodes that

140: receive the vaccine, when receiving it for the first time, proceeds likewise and

141: probabilistically forwards the vaccine to its own neighbors. By not requiring

142: that the nodes of the network have information beyond what can be inferred from

143: their immediate neighborhoods, this strategy can be easily used in practice.

144: Furthermore, it represents more accurately what occurs in real scenarios, since

145: it does not rely on the prior selection of nodes that characterizes all the

146: three immunization strategies mentioned above, but rather assumes that the

147: vaccine spreads out of a single node (say, the very site of its development or

148: the site responsible for its distribution) via a heuristically controlled form

149: of flooding. With this set of characteristics that, in essence, make it

150: independent of any network-wide properties, the new strategy is to our knowledge

151: the first of its kind.

152:

153: We organize the remainder of the paper as follows. In Section~\ref{sec:math}, we

154: use a random-graph framework and the formalism introduced in

155: \cite{molloy1995,molloy1998,newman2001}, whose details are discussed as they are

156: needed, to obtain mathematical results for the aforementioned efficacy

157: indicators. We utilize our analytical results in Section~\ref{sec:heur} to

158: discover the properties that an ideal heuristic function should have to be

159: efficient. We then introduce a heuristic function that seeks to approximate this

160: ideal and therefore can be used to disseminate the vaccine. In

161: Section~\ref{sec:sim} we discuss simulation results on random graphs having node

162: degrees distributed according to a power law. Our results reveal that this

163: heuristic function performs very attractively for the ranges of $\tau$ (the

164: distribution's parameter) that typically are thought to hold for networks like

165: the Internet. They also agree satisfactorily with our analytical predictions. We

166: conclude in Section~\ref{sec:conc}.

167:

168: %===============================================================================

169: %===============================================================================

170: %===============================================================================

171: \section{Mathematical analysis} \label{sec:math}

172:

173: Let $G$ be a random graph having $n$ nodes, whose degrees are distributed

174: independently from one another and identically to a random variable $K_G$. We

175: assume that the nodes of $G$ are interconnected in an independent way given

176: their degrees, which therefore remain independent. We base our mathematical

177: analysis of this section on the formalism introduced in \cite{newman2001} and

178: target the case in which $G$ has a formally infinite number of nodes.

179:

180: Let $P_G(a)$ be the probability that a randomly chosen node of $G$ has degree

181: $a$, i.e., the probability that $K_G=a$. The average degree in $G$, denoted by

182: $Z_G$, is clearly

183: \begin{equation}

184:    Z_G = \sum_{a=0}^{n-1} a P_G(a).

185: \end{equation}

186: Given that the degrees of two adjacent nodes are independent from each other,

187: the probability that some node's neighbor has degree $b$ is identical to the

188: expected fraction of edges incident to degree-$b$ nodes, which is given by

189: \begin{equation}

190:    \frac{b P_G(b)}{\sum_{a=0}^{n-1}a P_G(a)} = \frac{b P_G(b)}{Z_G}.

191:    \label{eq:neigh}

192: \end{equation}
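
As a small illustration, consider a hypothetical toy distribution (chosen only
for ease of calculation, and not a power law) with $P_G(1)=P_G(3)=1/2$. Then
$Z_G = 1\cdot\frac{1}{2} + 3\cdot\frac{1}{2} = 2$, and (\ref{eq:neigh}) gives
\[
   \frac{1\cdot P_G(1)}{Z_G} = \frac{1}{4}, \qquad
   \frac{3\cdot P_G(3)}{Z_G} = \frac{3}{4}.
\]
So, although only half of the nodes have degree $3$, a neighbor reached by
following an edge has degree $3$ with probability $3/4$. This bias toward
high-degree nodes is what makes neighbor-based forwarding rules sensitive to the
tail of the degree distribution.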

193:

194: From \cite{molloy1995,newman2001}, a necessary and sufficient condition for a

195: size-$\Theta(n)$ connected component to almost surely exist in $G$ is that

196: \begin{equation}

197:    \sum_{b=1}^{n-1} (b-1) \frac{b P_G(b)}{Z_G} > 1,

198:    \label{eq:phase}

199: \end{equation}

200: which intuitively means that, given a randomly chosen node $u$ of $G$, a

201: size-$\Theta(n)$ connected component exists almost surely if and only if a

202: neighbor of $u$ is expected to have more than $1$ neighbor besides $u$. When

203: (\ref{eq:phase}) is satisfied, we denote the size-$\Theta(n)$ connected

204: component of $G$ (its giant connected component) by $\GCC{G}$. Also, all the

205: other connected components of $G$ are small with high probability, comprising

206: only $o(n)$ nodes, and $G$ is said to be above the phase transition that gives

207: rise to $\GCC{G}$. On the other hand, when (\ref{eq:phase}) is not satisfied,

208: $G$ is said to be below the phase transition that gives rise to $\GCC{G}$ and

209: all of its connected components are small with high probability, consisting each

210: of $o(n)$ nodes.
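
To see how (\ref{eq:phase}) is applied, take the hypothetical toy distribution
$P_G(1)=P_G(3)=1/2$ (an assumption made only for illustration), for which
$Z_G=2$. The left-hand side of (\ref{eq:phase}) is
\[
   (1-1)\,\frac{1\cdot\frac{1}{2}}{2} + (3-1)\,\frac{3\cdot\frac{1}{2}}{2}
   = 0 + 2\cdot\frac{3}{4} = \frac{3}{2} > 1,
\]
so this graph is above the phase transition. If instead $P_G(1)=9/10$ and
$P_G(3)=1/10$, then $Z_G=6/5$ and the left-hand side becomes
$2\cdot\frac{3/10}{6/5} = \frac{1}{2} < 1$, placing the graph below the phase
transition.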

211:

212: Given a randomly chosen node $u$ of $G$ and a neighbor $v$ of $u$, we define the

213: reach of $u$ through $v$ as the set of nodes that can be reached by a path

214: starting at $u$ and whose first edge is $(u,v)$. A node belongs to $\GCC{G}$ if

215: and only if it has at least one neighbor through which its reach contains a

216: large, size-$\Theta(n)$ number of nodes. Let $q$ be the probability that a node

217: has a small, size-$o(n)$ reach through a given neighbor. The probability that a

218: degree-$a$ node belongs to $\GCC{G}$ is then $1-q^a$, and the probability that a

219: randomly chosen node of $G$ belongs to $\GCC{G}$, which we denote by $\gcc{G}$,

220: is

221: \begin{equation}

222:    \gcc{G} = 1 - \sum_{a=0}^{n-1} q^a P_G(a).

223:    \label{eq:gccG}

224: \end{equation}

225: The probability $q$ that $u$ has a small, size-$o(n)$ reach through $v$ can be

226: obtained from the probability that $v$ itself has a small, size-$o(n)$ reach

227: through each of its other neighbors (i.e., excluding $u$). Since the probability

228: that two neighbors of $u$ have another common neighbor (i.e., besides $u$)

229: varies with $n$ proportionally to $n^{-1}$ \cite{newman2001}, which for large

230: $n$ is negligible, the probability that $v$ has a small, size-$o(n)$ reach

231: through a given neighbor is also $q$, thus leading to

232: \begin{equation}

233:    q = \sum_{b=1}^{n-1} q^{b-1} \frac{b P_G(b)}{Z_G}.

234: \end{equation}

235: This equation can be solved numerically and then used in (\ref{eq:gccG}) to

236: obtain $\gcc{G}$.
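
For a case that can be solved by hand, consider again a hypothetical toy
distribution with $P_G(1)=P_G(3)=1/2$ and $Z_G=2$ (an illustrative assumption,
not a power law). The equation for $q$ becomes
\[
   q = \frac{1}{4} + \frac{3}{4}\,q^2
   \quad\Longleftrightarrow\quad
   (3q-1)(q-1) = 0.
\]
The root $q=1$ is always present and corresponds to the absence of a giant
component; above the phase transition the relevant root is the smaller one,
$q=1/3$. Substituting into (\ref{eq:gccG}) gives
\[
   \gcc{G} = 1 - \left(\frac{1}{2}\cdot\frac{1}{3}
   + \frac{1}{2}\cdot\frac{1}{27}\right) = \frac{22}{27} \approx 0.815.
\]
In a numerical solution, one possible approach is to iterate the right-hand
side starting from $q=0$; since the right-hand side is nondecreasing in $q$ on
$[0,1]$, the iterates increase toward the smallest fixed point.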

237:

238: From now on, we assume that $G$ is above the phase transition and, therefore,

239: $\GCC{G}$ exists with high probability. Furthermore, since $G$ can be

240: disconnected and real-world computer networks are normally connected, we assume

241: that it is the graph induced by $\GCC{G}$, rather than $G$ itself, that models

242: the network, and also condition the remainder of our analysis accordingly.

243:

244: %===============================================================================

245: \subsection{Expected spread} \label{sec:spread}

246:

247: In this section, we calculate the expected spread in $\GCC{G}$, which is denoted

248: by $\wPs$ and consists of the expected fraction of the nodes of $\GCC{G}$ that

249: are immunized when a vaccine is distributed using the heuristic flooding

250: described in Section~\ref{sec:intro}. We resort to the same method of analysis

251: developed in \cite{stauffer2004}. Let $S$ be a directed subgraph of $G$ that

252: spans all the nodes of $G$. For a degree-$a$ node $u$ and a degree-$b$ neighbor

253: $v$ of $u$ in $G$, the probability that the directed edge $(u \to v)$ exists in

254: $S$ is given by $h(a,b)$, the heuristic function employed during the vaccine

255: dissemination. Before proceeding to the calculation of $\wPs$, we pause for a

256: brief study of $S$.

257:

258: The neighbors of a node $u$ in $S$ can be classified into two different types:

259: the in-neighbors, those from which an edge exists directed toward $u$; and the

260: out-neighbors, those toward which an edge exists directed from $u$. If a

261: directed path exists starting at some node $u$ and ending at another node $v$,

262: then we say that $u$ reaches $v$ in $S$ or that $v$ is in the reach of $u$ in

263: $S$. Note that, if $u$ receives the vaccine, then the reach of $u$ in $S$ is

264: part of the set of nodes that become immunized.

265:

266: The connected components of a directed graph can also be of two basic types.

267: First, there are the weakly connected components, which are constituted by the

268: nodes that can reach one another by undirected paths, i.e., paths for which the

269: directions of the edges are disregarded. The other type is that of the strongly

270: connected components, each comprising a maximal set of nodes that can both reach

271: and be reached from one another.

272:

273: Similarly to the case of the undirected graph $G$, there is a criterion for

274: deciding whether $S$ almost surely has a size-$\Theta(n)$ weakly connected

275: component, commonly known as the giant weakly connected component of $S$,

276: denoted by $\GWCC{S}$. Likewise, there is another criterion according to which

277: $S$ almost surely has a size-$\Theta(n)$ strongly connected component, commonly

278: referred to as the giant strongly connected component, denoted by $\GSCC{S}$.

279: Clearly, when both $\GWCC{S}$ and $\GSCC{S}$ exist, as we henceforth assume, all

280: the nodes of $\GSCC{S}$ belong also to $\GWCC{S}$, and all the nodes of

281: $\GWCC{S}$ belong also to $\GCC{G}$.

282:

283: Since $\GSCC{S}$ exists by assumption, we can define two other size-$\Theta(n)$

284: connected components of $S$, which we refer to as the giant in-component

285: ($\GIN{S}$), formed by the nodes that can reach $\GSCC{S}$, and the giant

286: out-component ($\GOUT{S}$), formed by the nodes reachable from $\GSCC{S}$. Note

287: that, by definition, the nodes of $\GSCC{S}$ belong also to both $\GIN{S}$ and

288: $\GOUT{S}$. We denote by $\gin{S}$ and $\gout{S}$ the expected fraction of the

289: nodes of $G$ that belong to, respectively, $\GIN{S}$ and $\GOUT{S}$.

290: Figure~\ref{fig:ggi} illustrates an instance of graph $G$ having a power-law

291: node-degree distribution with $\tau=2.1$ (part~(a)) and a possible instance of

292: its directed subgraph $S$ (part~(b)).

293:

294: \begin{figure*}[!t]

295:    \centering

296:    \begin{tabular}{c}

297:    \includegraphics[scale=\graphDraw]{imgs/g.eps}\\

298:    \\

299:    \includegraphics[scale=\graphDraw]{imgs/gi.eps}

300:    \end{tabular}

301:    \caption{A $G$ instance having a power-law node-degree distribution with

302: $\tau=2.1$ (a) and one possible instance of the directed subgraph $S$ of the $G$

303: instance (b). Part~(b) also shows the nodes belonging to $\GSCC{S}$ (filled

304: circles), $\GIN{S}$ (filled circles and triangles), and $\GOUT{S}$ (filled

305: circles and filled squares).}

306:    \label{fig:ggi}

307: \end{figure*}

308:

309: Assuming that the originator is randomly chosen among the nodes of $\GCC{G}$,

310: the vaccine is guaranteed to be distributed to a size-$\Theta(n)$ set of nodes

311: if the originator belongs to $\GIN{S}$, which happens with probability

312: $\gin{S}/\gcc{G}$. When this is the case, the nodes that receive the vaccine

313: either belong to $\GOUT{S}$, corresponding to a fraction $\gout{S}/\gcc{G}$ of

314: the nodes of $\GCC{G}$, or are not in $\GOUT{S}$ despite being reachable from

315: the originator, and then amount to a small, size-$o(n)$ number of nodes.

316: Neglecting the latter nodes is equivalent to assuming that nodes receive the

317: vaccine only if the originator is in $\GIN{S}$. In this case, only the nodes in

318: $\GOUT{S}$ receive the vaccine and we have

319: \begin{equation}

320:    \wPs = \frac{\gin{S}\gout{S}}{\gcc{G}^2}.

321:    \label{eq:wPi}

322: \end{equation}

323:

324: In order to obtain $\gin{S}$, recall that the nodes of $\GIN{S}$ are the only

325: ones that have a non-negligible reach. Considering a degree-$a$ node $u$ of $G$

326: and a degree-$b$ neighbor $v$ of $u$ in $G$, we say that $v$ is a dead end with

327: respect to $u$ in $S$ if either $(u \to v)$ is not an edge of $S$, or it is but

328: the reach of $u$ through $v$ in $S$ is negligible, consisting of only $o(n)$

329: nodes. Denoting by $\qin{b}$ the conditional probability that the reach of $u$

330: through $v$ in $S$ is negligible given that $u$ is an in-neighbor of $v$ in $S$,

331: we obtain the probability that $v$ is a dead end with respect to $u$ in $S$,

332: which is

333: \begin{equation}

334:    1-h(a,b)+h(a,b)\qin{b}.

335: \end{equation}

336: And since the probability that $v$ has degree $b$ is given by (\ref{eq:neigh}),

337: the probability that a given neighbor of a degree-$a$ node is a dead end with

338: respect to it in $S$, which we denote by $\dein{a}$, is

339: \begin{equation}

340:    \dein{a} = \sum_{b=1}^{n-1} \left(1-h(a,b)+h(a,b)\qin{b}\right) \frac{bP_G(b)}{Z_G}.

341:    \label{eq:dein}

342: \end{equation}

343: Because a node belongs to $\GIN{S}$ if and only if at least one of its neighbors

344: in $G$ is not a dead end with respect to it in $S$, we arrive at

345: \begin{equation}

346:    \gin{S} = 1 - \sum_{a=0}^{n-1} (\dein{a})^a P_G(a).

347:    \label{eq:gin}

348: \end{equation}

349:

350: As a means to calculate $\qin{b}$, let us consider a degree-$b$ node $v$ of $G$

351: reached by following a directed edge $(u \to v)$ of $S$. The reach of $u$

352: through $v$ in $S$ is negligible, which happens with probability $\qin{b}$, if

353: and only if all of the other $b-1$ neighbors of $v$ in $G$ (i.e., excluding $u$)

354: are themselves dead ends with respect to $v$ in $S$. This clearly leads to

355: \begin{equation}

356:    \qin{b} = (\dein{b})^{b-1}.

357:    \label{eq:qin}

358: \end{equation}

359: Equations (\ref{eq:dein}) and (\ref{eq:qin}) can be put together to yield

360: another equation where $\dein{a}$ is a function of all the other $\dein{}$'s.

361: This equation can then be solved numerically to obtain $\gin{S}$ via

362: (\ref{eq:gin}).
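
The following worked case is only a sketch under simplifying assumptions: a
constant heuristic $h(a,b)=p$ (which is not the heuristic function proposed in
this paper) and the hypothetical toy distribution $P_G(1)=P_G(3)=1/2$, with
$Z_G=2$. Because $h$ does not depend on $a$, neither does $\dein{a}$, and
writing $\dein{a}=w$ in (\ref{eq:dein}) and (\ref{eq:qin}) gives
\[
   w = 1 - p + p\left(\frac{1}{4} + \frac{3}{4}\,w^2\right)
   \quad\Longleftrightarrow\quad
   (w-1)\bigl(3pw-(4-3p)\bigr) = 0.
\]
The nontrivial root $w=(4-3p)/(3p)$ is smaller than $1$ exactly when $p>2/3$.
For $p=4/5$ we get $w=2/3$, and (\ref{eq:gin}) yields
\[
   \gin{S} = 1 - \left(\frac{1}{2}\cdot\frac{2}{3}
   + \frac{1}{2}\cdot\frac{8}{27}\right) = \frac{14}{27} \approx 0.519.
\]
For $p=1$ the same root is $w=1/3$, which coincides with the value of $q$
obtained for this toy distribution from the equation for $q$ in
Section~\ref{sec:math}, as expected when every edge of $G$ is kept in both
directions.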

363:

364: We can follow a completely analogous derivation and obtain $\gout{S}$ by noting

365: that a node belongs to $\GOUT{S}$ if and only if it can be reached from a

366: size-$\Theta(n)$ set of nodes. Let $u$ be a degree-$a$ node of $G$ and $v$ a

367: neighbor of $u$ in $G$. We denote by $\deout{a}$ the probability that either

368: $u$ is not an out-neighbor of $v$ in $S$ or it is but the number of nodes that can

369: reach $u$ through $v$ in $S$ is small, consisting of only $o(n)$ nodes. Also, we

370: denote by $\qout{b}$ the conditional probability that the number of nodes that

371: can reach $u$ through $v$ in $S$ is small, given that the degree of $v$ in $G$

372: is $b$ and $u$ is an out-neighbor of $v$. In a way analogous to the one that led

373: to (\ref{eq:dein}), (\ref{eq:gin}), and (\ref{eq:qin}), we obtain

374: \begin{equation}

375:    \deout{a} = \sum_{b=1}^{n-1} \left(1-h(b,a)+h(b,a)\qout{b}\right)

376:    \frac{bP_G(b)}{Z_G}, \label{eq:deout}

377: \end{equation}

378: \begin{equation}

379:    \gout{S} = 1 - \sum_{a=0}^{n-1} (\deout{a})^a P_G(a),

380:    \label{eq:gout}

381: \end{equation}

382: and

383: \begin{equation}

384:    \qout{b} = (\deout{b})^{b-1}.

385:    \label{eq:qout}

386: \end{equation}

387: Also, and identically to the derivation of $\gin{S}$, we can unify

388: (\ref{eq:deout}) and (\ref{eq:qout}) and calculate the value of each

389: $\deout{a}$ numerically to obtain $\gout{S}$ via (\ref{eq:gout}).
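
One special case is worth noting. If the heuristic function is symmetric, that
is, $h(a,b)=h(b,a)$ for all $a$ and $b$, then (\ref{eq:deout}) and
(\ref{eq:qout}) form the same system as (\ref{eq:dein}) and (\ref{eq:qin}), so
$\deout{a}=\dein{a}$ and $\gout{S}=\gin{S}$, and (\ref{eq:wPi}) reduces to
$\wPs=(\gin{S}/\gcc{G})^2$. As a hedged numerical sketch, assume a constant
heuristic $h(a,b)=4/5$ and the hypothetical toy distribution
$P_G(1)=P_G(3)=1/2$ (neither of which is used elsewhere in this paper). One can
verify by substitution that $\dein{a}=\deout{a}=2/3$ solves the fixed-point
equations,
\[
   \frac{1}{5} + \frac{4}{5}\left(\frac{1}{4}
   + \frac{3}{4}\cdot\frac{4}{9}\right) = \frac{2}{3},
\]
which gives $\gin{S}=\gout{S}=14/27$, while $q=1/3$ gives $\gcc{G}=22/27$.
Therefore
\[
   \wPs = \left(\frac{14/27}{22/27}\right)^2 = \frac{49}{121} \approx 0.405.
\]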

390:

391: %===============================================================================

392: \subsection{Expected vulnerability} \label{sec:vulnerability}

393:

394: Consistently with the simplifying assumptions of Section~\ref{sec:spread}, we

395: keep assuming that no node is immunized when the originator does not belong to

396: $\GIN{S}$. When this happens, all nodes of $\GCC{G}$ remain vulnerable to the

397: virus, and if the virus infects a node of $\GCC{G}$ it may propagate until the

398: entire $\GCC{G}$ is infected. Let us analyze the case in which the originator

399: does belong to $\GIN{S}$.

400:

401: As before, we assume that only the nodes of $\GOUT{S}$ receive the vaccine. Let

402: $V$ be an undirected subgraph of $G$ that spans all the nodes of $G$, and let an

403: edge $(u,v)$ of $G$ belong to $V$ if and only if neither $u$ nor $v$ belongs to

404: $\GOUT{S}$. That is, given a certain instance of the subgraph $S$, subgraph $V$

405: contains all the edges of $G$ that are not incident to nodes of $\GOUT{S}$.

406: Clearly, the edges of $V$ represent the edges through which the virus may

407: propagate if it reaches either of an edge's (unimmunized) end nodes.

408: Figure~\ref{fig:ggc} illustrates the subgraph $V$ corresponding to the $G$ and

409: $S$ instances of Figure~\ref{fig:ggi}.

410:

411: \begin{figure*}[!t]

412:    \centering

413:    \includegraphics[scale=\graphDraw]{imgs/gc.eps}

414:    %\hspace{\stretch{1}}

415:    %\includegraphics[scale=\graphDraw]{imgs/gc1.eps} \hspace{\stretch{1}}

416:    %\includegraphics[scale=\graphDraw]{imgs/gc2.eps} \hspace{\stretch{1}}

417:    \caption{The graph $V$ that corresponds to the $G$ and $S$ instances of

418: Figure~\ref{fig:ggi}. Nodes represented by filled circles or filled squares

419: belong to $\GOUT{S}$.}

420:    \label{fig:ggc}

421: \end{figure*}

422:

423: Once again, and similarly to the case of $G$, a criterion exists for deciding

424: whether a size-$\Theta(n)$ connected component almost surely exists in $V$. We

425: denote such a component by $\GCC{V}$. When it does exist, and since all the

426: other connected components of $V$ contain with high probability only $o(n)$

427: nodes (which we again neglect), a virus may only proliferate into a large,

428: size-$\Theta(n)$ set of nodes if it first infects a node of $\GCC{V}$. This, of

429: course, is predicated upon the originator being in $\GIN{S}$ and dissemination

430: taking place exclusively inside $\GOUT{S}$, the assumptions of

431: Section~\ref{sec:spread}.

432:

433: We define the expected vulnerability of $\GCC{G}$, denoted by $\wPv$, as the

434: expected fraction of the nodes of $\GCC{G}$ that may become infected when the virus

435: attempts to infect a randomly chosen node of $\GCC{G}$. Let $\gcc{V}$ be the

436: expected fraction of the nodes of $G$ that belong to $\GCC{V}$. If the originator does

437: not belong to $\GIN{S}$ (which occurs with probability $1-\gin{S}/\gcc{G}$),

438: then $\wPv=1$; if it does belong to $\GIN{S}$ (with probability

439: $\gin{S}/\gcc{G}$), then $\wPv=\gcc{V}/\gcc{G}$ if and only if the virus first

440: infects a node of $\GCC{V}$, which occurs with probability $\gcc{V}/\gcc{G}$. We

441: then have

442: \begin{equation}

443:    \wPv = 1-\frac{\gin{S}}{\gcc{G}} + \frac{\gin{S}}{\gcc{G}}\left(\frac{\gcc{V}}{\gcc{G}}\right)^2.

444:    \label{eq:wPv}

445: \end{equation}
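
The squared factor in (\ref{eq:wPv}) can be made explicit by conditioning first
on the originator and then on the node that the virus first infects:
\[
   \wPv = \left(1-\frac{\gin{S}}{\gcc{G}}\right)\cdot 1
   + \frac{\gin{S}}{\gcc{G}}
   \left[\frac{\gcc{V}}{\gcc{G}}\cdot\frac{\gcc{V}}{\gcc{G}}
   + \left(1-\frac{\gcc{V}}{\gcc{G}}\right)\cdot 0\right],
\]
where one factor $\gcc{V}/\gcc{G}$ is the probability that the virus first
infects a node of $\GCC{V}$ and the other is the fraction of $\GCC{G}$ that is
then infected. As an illustration with hypothetical values
$\gin{S}/\gcc{G}=0.9$ and $\gcc{V}/\gcc{G}=0.1$, we get
$\wPv = 0.1 + 0.9\cdot 0.01 = 0.109$; when $\GCC{V}$ does not exist, the
expression reduces to $\wPv = 1-\gin{S}/\gcc{G}$.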

446:

447: Henceforth in this section we concentrate on calculating $\gcc{V}$ for the case

448: in which $\GCC{V}$ does exist. Clearly, a node of $G$ belongs to $\GCC{V}$ only

449: if it does not belong to $\GOUT{S}$. Through the remainder of the section, let

450: $u$ be a degree-$a$ node of $G$ that does not belong to $\GOUT{S}$ and $v$ a

451: neighbor of $u$ in $G$. Given that $v$ has degree $b$, we define $\wH{a}{b}$ as

452: the probability that the edge $(v \to u)$ exists in $S$. Since $u$ does not

453: belong to $\GOUT{S}$, node $v$ must be such that it satisfies one of the

454: following conditions: either edge $(v \to u)$ does not exist in $S$, which

455: happens with probability $1-h(b,a)$, or $(v \to u)$ exists in $S$ but the number

456: of nodes that can reach $u$ through $v$ is small, which occurs with probability

457: $h(b,a)\qout{b}$. We can then express $\wH{a}{b}$ as the ratio of the

458: probability that the latter condition is satisfied to the probability that

459: either the former or the latter is. This leads to

460: \begin{equation}

461:    \wH{a}{b} = \frac{h(b,a)\qout{b}}{1 - h(b,a) + h(b,a)\qout{b}}.

462:    \label{eq:wh}

463: \end{equation}
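
Expression (\ref{eq:wh}) is an instance of conditioning on the event that $u$
does not belong to $\GOUT{S}$. For example, with the hypothetical values
$h(b,a)=4/5$ and $\qout{b}=1/2$, we have
\[
   \wH{a}{b} = \frac{\frac{4}{5}\cdot\frac{1}{2}}
   {1-\frac{4}{5}+\frac{4}{5}\cdot\frac{1}{2}}
   = \frac{2/5}{3/5} = \frac{2}{3},
\]
which is smaller than the unconditional probability $h(b,a)=4/5$, since
knowing that $u$ is outside $\GOUT{S}$ makes the presence of an incoming edge
less likely. Two limits are also instructive: if $\qout{b}=1$ then
$\wH{a}{b}=h(b,a)$, because the conditioning carries no information, and if
$\qout{b}=0$ then $\wH{a}{b}=0$.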

464:

465: Now let $\wP{a}{b}$ be the probability that $v$ has degree $b$ in $G$. Clearly,

466: $\wP{a}{b}$ is proportional to the joint probability that $v$ satisfies one of

467: the above conditions regarding the existence of edge $(v \to u)$ in $S$ and also

468: that a node's neighbor in $G$ has degree $b$. That is, $\wP{a}{b}$ is

469: proportional to

470: $\left(1-h(b,a)+h(b,a)\qout{b}\right)bP_G(b)/Z_G$. Using (\ref{eq:deout}), we

471: obtain

472: \begin{equation}

473:    \wP{a}{b} = \left(\frac{1-h(b,a)+h(b,a)\qout{b}}{\deout{a}}\right) \frac{bP_G(b)}{Z_G}.

474:    \label{eq:wp}

475: \end{equation}
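
As a consistency check on (\ref{eq:wp}), summing over $b$ and using
(\ref{eq:deout}) gives
\[
   \sum_{b=1}^{n-1}\wP{a}{b}
   = \frac{1}{\deout{a}}\sum_{b=1}^{n-1}
   \left(1-h(b,a)+h(b,a)\qout{b}\right)\frac{bP_G(b)}{Z_G}
   = \frac{\deout{a}}{\deout{a}} = 1,
\]
so $\wP{a}{b}$ is a properly normalized distribution over the degree $b$ of the
neighbor whenever $\deout{a}>0$.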

476:

477: Let $b$ be the degree of $v$ in $G$. Because $u$ does not belong to $\GOUT{S}$,

478: nodes $u$ and $v$ are neighbors in $V$ if and only if $v$ does not belong to

479: $\GOUT{S}$ either. If $(v \to u)$ is an edge of $S$, which occurs with

480: probability $\wH{a}{b}$, then $v$ is obviously not in $\GOUT{S}$, as it would

481: otherwise make $u$ belong to $\GOUT{S}$ along with it. On the other hand, if

482: $(v \to u)$ is not an edge of $S$ (with probability $1-\wH{a}{b}$), then $v$

483: does not belong to $\GOUT{S}$ if and only if the number of nodes that can reach

484: it in $S$ is small, which happens with probability $\qout{b}$. It follows that

485: the probability that $u$ and $v$ are neighbors in $V$ is given by

486: \begin{equation}

487:    \wH{a}{b} + (1 - \wH{a}{b})\qout{b}.

488:    \label{eq:aux30}

489: \end{equation}

490: When $u$ and $v$ are indeed neighbors in $V$, we define $\qgcc{b}$ as the

491: probability that $u$ has a small reach in $V$ through $v$. We say that $v$ is a

492: dead end with respect to $u$ in $V$ if either $v$ is not a neighbor of $u$ in

493: $V$, which occurs with probability

494: $1-\left[\wH{a}{b} + (1 - \wH{a}{b})\qout{b}\right]$, or it is but the reach of

495: $u$ through $v$ in $V$ is small, which occurs with probability

496: $\left[\wH{a}{b} + (1 - \wH{a}{b})\qout{b}\right]\qgcc{b}$. Thus, the

497: probability that $v$ is a dead end with respect to $u$ in $V$ is

498: \begin{eqnarray}

499:    \lefteqn{1-\left[\wH{a}{b} + (1 - \wH{a}{b})\qout{b}\right] +

500:    \left[\wH{a}{b} + (1 - \wH{a}{b})\qout{b}\right]\qgcc{b}}

501:    \hspace{1.25in}\nonumber\\

502:    && = \wH{a}{b}\qgcc{b} + (1-\wH{a}{b})(1 - \qout{b} + \qout{b}\qgcc{b}),

503:    \label{eq:wgccV}

504: \end{eqnarray}

505: so the probability that a neighbor of $u$ is a dead end with respect to $u$ in

506: $V$, which we denote by $\degcc{a}$, is clearly

507: \begin{equation}

508:    \degcc{a} = \sum_{b=1}^{n-1} \left[\wH{a}{b}\qgcc{b} + (1-\wH{a}{b})(1 - \qout{b} + \qout{b}\qgcc{b})\right]\wP{a}{b}.

509:    \label{eq:degccV}

510: \end{equation}

511:

512: In order to calculate $\qgcc{b}$, notice that the reach of $u$ through $v$ in

513: $V$ is small if and only if all other $b-1$ neighbors of $v$ in $G$ are

514: themselves dead ends with respect to $v$ in $V$. Then, assuming that the degrees

515: of a node's neighbors in $G$ remain independent from one another even under the

516: condition that the node does not belong to $\GOUT{S}$, we have

517: \begin{equation}

518:    \qgcc{b} = (\degcc{b})^{b-1}.

519:    \label{eq:qgccV}

520: \end{equation}

521: Putting (\ref{eq:degccV}) and (\ref{eq:qgccV}) together leads to an equation

522: where $\degcc{a}$ is a function of all the other $\degcc{}$'s, which can then be

523: solved numerically for $0 \leq a \leq n-1$.

524:

525: We are, finally, in position to calculate the value of $\gcc{V}$. Let $u$ be a

526: randomly chosen node of $G$ having degree $a$. In order to belong to $\GCC{V}$,

527: node $u$ must not belong to $\GOUT{S}$, which occurs with probability

528: $\left(\deout{a}\right)^a$. Furthermore, $u$ belongs to $\GCC{V}$ only if at

529: least one of its neighbors is not a dead end with respect to it in $V$, which

530: occurs with probability $1-(\degcc{a})^a$. It then follows that

531: \begin{equation}

532:    \gcc{V} = \sum_{a=0}^{n-1} (\deout{a})^a \left[1 - (\degcc{a})^a\right] P_G(a).

533:    \label{eq:gccV}

534: \end{equation}

535:

536: %===============================================================================

537: %===============================================================================

538: %===============================================================================

539: \section{The heuristic function} \label{sec:heur}

540:

541: The efficiency of heuristic flooding as a means of immunizing a network depends

542: heavily on the choice of the heuristic function $h(a,b)$. Before introducing our

543: heuristic function, we elaborate on the properties of subgraph $S$ that we may

544: expect to lead to good results for $\wPs$ and $\wPv$.

545:

546: First of all, it is clear that $S$ must be above the phase transition that gives

547: rise to $\GSCC{S}$, thereby guaranteeing that $\GSCC{S}$, $\GIN{S}$, and

548: $\GOUT{S}$ almost surely exist. When this is the case, the nodes of $\GIN{S}$

549: are the most suitable ones for being the originator, as they can immunize a

550: non-negligible number of nodes. But since we cannot assume any prior information

551: on the originator, $\GIN{S}$ should contain as many nodes as possible in order

552: to make the probability that the originator is chosen from outside it as small

553: as possible. With regard to $\GOUT{S}$, we know that it contains the nodes that

554: receive the vaccine when the originator belongs to $\GIN{S}$. In order to

555: prevent an excessive number of nodes from receiving the vaccine, the size of

556: $\GOUT{S}$ should be kept to modest values. Putting these two observations

557: together, we ideally want $\GIN{S}$ to span all the nodes of the network,

558: $\GSCC{S}$ to contain only the nodes that can more efficiently block the

559: spreading of an infection, and $\GOUT{S}$ to be the same as $\GSCC{S}$.

560:

561: Since we know that immunizing the nodes with the highest degrees is an efficient

562: way to prevent epidemics in scale-free networks

563: \cite{albert2000,cohen2001,satorras2003}, we introduce, in this section, a

564: heuristic function that stimulates the transmission of the vaccine to

565: high-degree nodes. Introducing a parameter $\alpha \geq 0$, and considering a

566: degree-$a$ node $u$ that has the vaccine and a degree-$b$ neighbor $v$ of $u$,

567: our heuristic function $h(a,b)$, which gives the probability that $u$ sends the

568: vaccine to $v$, is defined as follows:

569: \begin{itemize}

570: \item If $b=1$, that is, $v$ has no neighbor besides $u$, then $h(a,b)=0$ and

571: $u$ deterministically decides not to send the vaccine to $v$. In this case,

572: since $u$ is already immune, should $v$ become infected it can transmit the

573: virus to no other node, so we choose not to give $v$ the vaccine.

574: \item If $a \leq 2 \leq b$, that is, $u$ has degree at most $2$ and $v$ has

575: degree at least $2$, then $h(a,b)=1$ and $u$ deterministically decides to send

576: the vaccine to $v$. This is meant to force some low-degree nodes to forward the

577: vaccine, thereby precluding a premature conclusion of heuristic flooding and, as

578: a consequence, leading to a larger $\GIN{S}$.

579: \item For all the other positive values of $a$ and $b$, that is, for $a>2$ and $b\geq 2$, we let

580: \begin{equation}

581:    h(a,b)=\tanh\left(\frac{b-1}{\left(a-2\right)^\alpha}\right).

582:    \label{eq:h}

583: \end{equation}

584: \end{itemize}
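
To illustrate (\ref{eq:h}) with a few values computed directly from the definition, take $b=5$ and $\alpha=1.0$:
\[
   h(3,5)=\tanh(4)\approx 0.999,\quad
   h(10,5)=\tanh(4/8)\approx 0.462,\quad
   h(100,5)=\tanh(4/98)\approx 0.041.
\]
For $\alpha=0.7$ the decay with $a$ is slower: $h(10,5)=\tanh(4/8^{0.7})\approx 0.732$ and $h(100,5)=\tanh(4/98^{0.7})\approx 0.160$. In the limiting case $\alpha=0$, we have $h(a,b)=\tanh(b-1)$, which no longer depends on $a$.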

585:

586: Figure~\ref{fig:h} shows two plots illustrating this heuristic function for

587: $\alpha=0.7$ (part~(a)) and $\alpha=1.0$ (part~(b)). Clearly, for fixed $a>2$,

588: $h(a,b)$ increases with $b$, so the vaccine is more likely to be transmitted to

589: high-degree nodes. For fixed $b>1$, $h(a,b)$ decreases with $a$, thus reflecting

590: the intuition that, when $u$ is a high-degree node, sending the vaccine to $v$

591: may be unnecessary even if $v$ is a high-degree node (there are probably other

592: paths through which the vaccine can be transmitted from $u$ to $v$).

593:

594: \begin{figure*}[!t]

595:    \centering

596:    \begin{tabular}{c}

597:    \includegraphics[height=\heurHeight]{imgs/h07.eps}\\

598:    \\

599:    \includegraphics[height=\heurHeight]{imgs/h10.eps}

600:    \end{tabular}

601:    \caption{Plots of the heuristic function given by (\ref{eq:h}) for

602: $\alpha=0.7$ (a) and $\alpha=1.0$ (b).}

603:    \label{fig:h}

604: \end{figure*}

605:

606: %===============================================================================

607: %===============================================================================

608: %===============================================================================

609: \section{Simulation results} \label{sec:sim}

610:

611: We have conducted extensive simulations on random graphs with node degrees

612: distributed according to a power law. Generating such a graph is achieved in two

613: phases \cite{newman2001}. Let $u_1, u_2, \ldots, u_n$ be the nodes of the random

614: graph we want to generate. In the first phase, for $i=1,\ldots,n$ we sample the

615: degree $d_i$ of each $u_i$ from the power-law distribution, obtaining the

616: so-called degree sequence of the graph. If $\sum_{i=1}^n d_i$ turns out to be

617: odd, then we discard the entire degree sequence and sample a new one, repeating

618: the process until the sum of the degrees comes out even. In the second phase, we

619: consider an imaginary urn having $\sum_{i=1}^n d_i$ labeled balls, the labels of

620: $d_i$ of them being $u_i$. We then successively remove pairs of balls from the

621: urn until it has no more balls. For each pair we remove---say, of labels $u_i$

622: and $u_j$---we add edge $(u_i,u_j)$ to the graph. This method can produce graphs

623: having multiple edges or self-loops, but it has the advantage of generating

624: graphs whose degrees remain independent even after the edges are added, which is

625: a core assumption of our analysis.
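
As a small illustration (the numbers below are made up for the example and are not taken from our simulations), let $n=4$ and suppose the sampled degrees are $(d_1,d_2,d_3,d_4)=(3,2,2,1)$. Their sum is $8$, which is even, so the sequence is kept. The urn then holds three balls labeled $u_1$, two labeled $u_2$, two labeled $u_3$, and one labeled $u_4$. If the four pairs drawn happen to be $(u_1,u_1)$, $(u_1,u_2)$, $(u_2,u_3)$, and $(u_3,u_4)$, then the resulting graph has a self-loop at $u_1$ and every node $u_i$ ends up with exactly $d_i$ edge endpoints, the self-loop contributing two endpoints to $u_1$.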

626:

627: We carried out our simulations for $n=10000$ and $2 \leq \tau \leq 3$. For each

628: value of $\tau$, we generated $500$ $G$ instances. For each $G$ instance, we

629: used the heuristic $h(a,b)$ to both sample $1000$ instances of the subgraph $S$

630: and, in an independent way, conduct $1000$ vaccine disseminations by heuristic

631: flooding from an originator selected randomly among the nodes of the largest

632: connected component of the $G$ instance. For each $S$ instance, we selected the

633: largest strongly connected component and calculated the sizes of the

634: corresponding in-component (counting the nodes that can reach the strongly

635: connected component) and out-component (counting the nodes that can be reached

636: from the strongly connected component). We then obtained the expected sizes of

637: $\GIN{S}$ and $\GOUT{S}$ by averaging these quantities over the $500000$

638: samples. For each vaccine dissemination, we calculated the fraction of nodes

639: that receive the vaccine and the fraction of nodes to which an infection may

640: spread when an attempt at infecting a randomly chosen node inside the largest

641: connected component of $G$ takes place. We then obtained $\wPs$ and $\wPv$ by

642: averaging these quantities over the $500000$ samples.

643:

644: Simulation results are shown in Figure~\ref{fig:sim} for

645: $\alpha=0.1,0.4,0.7,1.0$. We note, in general, a satisfactory agreement between

646: analytic and simulation results, with the exception of part~(d), in which case

647: the deviation may be attributed to the approximations made during the derivation

648: of $\gcc{V}$ in Section~\ref{sec:vulnerability} to yield (\ref{eq:gccV}).

649:

650: \begin{figure*}[!t]

651:    \centering

652:    \begin{tabular}{rr}

653:    \includegraphics[height=\graphHeight]{imgs/graph_powerlaw_worm2_gin_nolookahead.eps} &

654:    \includegraphics[height=\graphHeight]{imgs/graph_powerlaw_worm2_gout_nolookahead.eps} \\

655:    \includegraphics[height=\graphHeight]{imgs/graph_powerlaw_worm2_immuned_nolookahead.eps} &

656:    \includegraphics[height=\graphHeight]{imgs/graph_powerlaw_worm2_new_nolookahead.eps}

657:    \end{tabular}

658:    \caption{Simulation results of vaccine dissemination by heuristic flooding:

659: $\gin{S}/\gcc{G}$ (a), $\gout{S}/\gcc{G}$ (b), $\wPs$ (c), and $\wPv$ (d). Solid lines give the analytic predictions.}

660:    \label{fig:sim}

661: \end{figure*}

662:

663: When $\tau \leq 2.5$, the plots for $\gin{S}/\gcc{G}$ and $\gout{S}/\gcc{G}$

664: (Figure~\ref{fig:sim}(a,b)) reveal that the heuristic function introduced in

665: Section~\ref{sec:heur} results in a $\GIN{S}$ that spans almost all the nodes of

666: $\GCC{G}$, while the size of $\GOUT{S}$ keeps to a relatively modest fraction of

667: $\GCC{G}$. For example, for $\tau \leq 2.5$ and $\alpha=1.0$, the relative size

668: of $\GIN{S}$ is always above $0.97$ and the relative size of $\GOUT{S}$ is

669: always below $0.13$. For $\tau > 2.5$, the relative size of $\GIN{S}$ decreases

670: with $\tau$, thus indicating that heuristic flooding has more difficulty

671: disseminating the vaccine when the graph is sparser.

672:

673: Owing to $\wPs$ being given by $(\gin{S}/\gcc{G})(\gout{S}/\gcc{G})$

674: (cf.\ (\ref{eq:wPi})), and to $\gin{S}/\gcc{G}$ being relatively close to $1$

675: (Figure~\ref{fig:sim}(a)), the plots for $\wPs$ (Figure~\ref{fig:sim}(c)) are of

676: course similar to the plots for $\gout{S}/\gcc{G}$ (Figure~\ref{fig:sim}(b)).

677: Furthermore, given a value of $\alpha$, $\wPs$ decreases with $\tau$, which

678: means that heuristic flooding spreads through a smaller number of nodes when the

679: graph is sparser, as, in this case, there are fewer paths leading to the

680: high-degree nodes.
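
As a rough numerical check that uses only the bounds quoted above for $\tau \leq 2.5$ and $\alpha=1.0$, namely $\gin{S}/\gcc{G} > 0.97$ and $\gout{S}/\gcc{G} < 0.13$, and that assumes $\gin{S}/\gcc{G} \leq 1$ (since $\GIN{S}$ lies within $\GCC{G}$), we get
\[
   0.97\,\frac{\gout{S}}{\gcc{G}} < \wPs \leq \frac{\gout{S}}{\gcc{G}} < 0.13,
\]
so in this range $\wPs$ differs from $\gout{S}/\gcc{G}$ by less than $3\%$ in relative terms.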

681:

682: As for $\wPv$ (Figure~\ref{fig:sim}(d)), we note that, for $\tau \leq 2.5$,

683: $\wPv$ is nearly zero. This result is a natural consequence both of the guiding

684: principle of the heuristic introduced in Section~\ref{sec:heur}, which assigns

685: a higher probability to transmitting the vaccine to nodes having higher degrees,

686: and of the result for $\gin{S}/\gcc{G}$ (Figure~\ref{fig:sim}(a)), which

687: indicates that $\GIN{S}$ spans almost all the nodes of $\GCC{G}$. As $\tau$ is

688: increased to values greater than $2.5$, $\wPv$ moves farther away from zero,

689: since the size of $\GIN{S}$ decreases and, therefore, the probability that

690: heuristic flooding distributes the vaccine to only a small number of nodes

691: increases. Regarding the value of $\alpha$, we note a clear trade-off between

692: $\wPs$ and $\wPv$. If we were to adjust $\alpha$ in such a way as to decrease

693: $\wPs$, we would have an increase in $\wPv$, which shows that the number of

694: immunized nodes has a direct impact on the resulting vulnerability of the

695: network.

696:

697: %===============================================================================

698: %===============================================================================

699: %===============================================================================

700: \section{Conclusion} \label{sec:conc}

701:

702: We have considered in this paper the problem of immunizing a scale-free network

703: against a virus or worm. We introduced a new immunization strategy, one that we

704: believe reflects more accurately what happens in real scenarios. In our

705: strategy, we assume that the vaccine enters the network at exactly one node, in

706: general the site of the vaccine's development or the site in charge of its

707: distribution, for example. This node begins the dissemination of the vaccine by

708: heuristic flooding, aiming at immunizing the nodes that have the highest

709: degrees. With this purpose in mind, we introduced a heuristic function that

710: assigns a higher probability to forwarding the vaccine toward nodes with higher

711: degrees.

712:

713: We obtained analytical and simulation results on random graphs having node

714: degrees distributed according to a power law. Our mathematical analysis has

715: innovative aspects that we expect may shed some light on obtaining analytical

716: results for similar distributed algorithms. Also, we hope our analysis can

717: contribute to the development of new heuristic functions for vaccine

718: dissemination. With regard to our simulation results, they show satisfactory

719: agreement with our mathematical analysis and highlight the expected trade-off

720: between the number of nodes that receive the vaccine and the vulnerability of

721: the network to future infections. Especially for power laws with relatively

722: small values of the parameter $\tau$, our heuristic function achieves very good

723: results, making the network practically invulnerable to an epidemic while

724: requiring the immunization of only roughly $10\%$ of the nodes.

725:

726: We note, finally, that one possible direction in which this paper's research may

727: be extended, in addition to the search for other heuristic functions, is that of

728: allowing for multiple concurrent initiators. While algorithmically (i.e., from

729: the perspective of flooding the network) such an extension is trivial, extending

730: the analysis of Section~\ref{sec:math} is expected to be a significantly more

731: complex endeavor.

732:

733: \subsection*{Acknowledgments}

734:

735: The authors acknowledge partial support from CNPq, CAPES, and a FAPERJ BBP

736: grant.

737:

738: \bibliography{imm}

739: \bibliographystyle{plain}

740:

741: \end{document}

742:

743: