1: \documentclass{article}
2: \usepackage{graphicx}
3:
4: \widowpenalty=10000
5: \clubpenalty=10000
6:
7: \newcommand{\graphHeight}{5cm}
8: \newcommand{\heurHeight}{4.4cm}
9: \newcommand{\graphDraw}{1}
10:
11: \newcommand{\wH}[2]{{h_{#2\mid #1}}}
12: \newcommand{\wP}[2]{p_{#2\mid #1}}
13: \newcommand{\wPs}[0]{P_{\mathrm{s}}}
14: \newcommand{\wPv}[0]{P_{\mathrm{v}}}
15: \newcommand{\qgcc}[1]{{q_{#1}^{V}}}
16: \newcommand{\wgcc}[2]{{w_{#1,#2}^{V}}}
17: \newcommand{\degcc}[1]{{w_{#1}^{V}}}
18:
19: \newcommand{\GCC}[1]{{\mathrm{GCC}_{#1}}}
20: \newcommand{\GOUT}[1]{{\mathrm{GOUT}_{#1}}}
21: \newcommand{\GIN}[1]{{\mathrm{GIN}_{#1}}}
22: \newcommand{\GWCC}[1]{{\mathrm{GWCC}_{#1}}}
23: \newcommand{\GSCC}[1]{{\mathrm{GSCC}_{#1}}}
24: \newcommand{\gcc}[1]{\theta_{#1}}
25: \newcommand{\gin}[1]{{\theta_{#1}^{\mathrm{in}}}}
26: \newcommand{\gout}[1]{{\theta_{#1}^{\mathrm{out}}}}
27: \newcommand{\qin}[1]{{q_{#1}^{\mathrm{in}}}}
28: \newcommand{\qout}[1]{{q_{#1}^{\mathrm{out}}}}
29: \newcommand{\win}[2]{{w_{#1,#2}^{\mathrm{in}}}}
30: \newcommand{\wout}[2]{{w_{#1,#2}^{\mathrm{out}}}}
31: \newcommand{\dein}[1]{{w_{#1}^{\mathrm{in}}}}
32: \newcommand{\deout}[1]{{w_{#1}^{\mathrm{out}}}}
33:
34: \begin{document}
35:
36: \title{A Dissemination Strategy for Immunizing Scale-Free Networks}
37:
38: \author{Alexandre~O.~Stauffer\\
39: Valmir~C.~Barbosa\thanks{Corresponding author (valmir@cos.ufrj.br).}\\
40: \\
41: Universidade Federal do Rio de Janeiro\\
42: Programa de Engenharia de Sistemas e Computa\c c\~ao, COPPE\\
43: Caixa Postal 68511\\
44: 21941-972 Rio de Janeiro - RJ, Brazil}
45:
46: \date{}
47:
48: \maketitle
49:
50: \begin{abstract}
51: We consider the problem of distributing a vaccine for immunizing a scale-free
52: network against a given virus or worm. We introduce a new method, based on
53: vaccine dissemination, that seems to reflect more accurately what is expected to
54: occur in real-world networks. Also, since the dissemination is performed using
55: only local information, the method can be easily employed in practice. Using a
56: random-graph framework, we analyze our method both mathematically and by means
57: of simulations. We demonstrate its efficacy regarding the trade-off between the
58: expected number of nodes that receive the vaccine and the network's resulting
59: vulnerability to develop an epidemic as the virus or worm attempts to infect one
60: of its nodes. For some scenarios, the new method is seen to render the network
61: practically invulnerable to attacks while requiring only a small fraction of the
62: nodes to receive the vaccine.
63:
64: \bigskip
65: \noindent
66: \textbf{Keywords:} Network immunization, Random networks, Heuristic flooding.
67: \end{abstract}
68:
69: %===============================================================================
70: %===============================================================================
71: %===============================================================================
72: \section{Introduction} \label{sec:intro}
73:
74: The term ``scale-free'' is widely used to designate the class of networks that
75: have node degrees distributed as a power law \cite{barabasi1999,newman2003b},
76: according to which the probability that a randomly chosen node has degree $a$ is
77: proportional to $a^{-\tau}$ for some parameter $\tau>0$. There has been a recent
78: surge of interest in scale-free networks, as a great variety of real-world
79: networks, like the Internet, the WWW, social networks, and
80: scientific-collaboration networks, have been empirically observed to have
81: node-degree distributions that approximately follow a power law
82: \cite{faloutsos1999,albert2002}. In contrast with the classical random-graph
model introduced by Erd\H{o}s and R\'enyi, whose node-degree distribution is the
Poisson distribution and is therefore sharply concentrated around its mean value
85: \cite{erdos1959,bollobas2002}, scale-free networks normally contain nodes with
86: a wide range of degrees, typically with a few nodes of extremely high degrees
87: coexisting with a plethora of low-degree nodes.
88:
89: In this paper we consider the problem of preventing viruses or worms from
90: spreading on scale-free computer networks. The fact that node degrees are in
91: this case distributed according to a power law has profound impact on the way
92: the network operates. In particular, it makes the problem of fighting the
93: proliferation of viruses and other infections much more challenging, since the
94: presence of high-degree nodes dramatically increases the rate at which a virus
95: may propagate \cite{satorras2001,satorras2003}. For this reason, instead of
96: combating the proliferation of a virus in an already infected network, we
97: consider a preventive immunization strategy, which consists of distributing the
98: appropriate vaccine to a small subset of the network's nodes, striving to
99: immunize those nodes that can more efficiently block the spread of a future
100: infection. The goal of this approach is to distribute the vaccine to as few
101: nodes as possible while making the network invulnerable to an epidemic, that is,
102: to the occurrence of a state in which a relatively large number of nodes is
103: infected.
104:
105: We can measure the efficacy of an immunization strategy by two indicators: the
106: expected spread, which is the expected fraction of the network's nodes that
107: receive the vaccine, and the expected vulnerability, which is the expected
108: fraction of the network's nodes that may become infected when the virus attempts
109: to infect a randomly chosen node of the immunized network. Clearly, these two
110: indicators are strongly influenced by how we select the nodes to receive the
111: vaccine. A simple rule for choosing these nodes is to randomly select a given
fraction of the network's nodes \cite{albert2000,cohen2000,satorras2003}. When
applied to scale-free networks, this rule is known to give unsatisfactory
results, as it achieves a reasonably small expected vulnerability only for
prohibitively high expected spreads. An alternative rule
116: consists of distributing the vaccine to all the nodes that have degrees greater
than a given value \cite{albert2000,cohen2001,satorras2003}. This rule is more
efficient for scale-free networks than the previous one, as it achieves quite a
small expected vulnerability with only a modest expected spread, but it is known
to be difficult to apply to real-world networks \cite{cohen2003}: it demands
global knowledge regarding the
122: location of the nodes having the highest degrees, while the nodes of many
123: real-world networks may only be assumed to have information that can be directly
124: inferred from their immediate neighborhoods. Yet another alternative is to
125: randomly choose some of the network's nodes and, for each of them, to immunize a
126: randomly chosen fraction of its neighbors \cite{cohen2003}. This rule, however,
127: and in fact the previous two as well, seems hard to implement in practice on
128: computer networks, since apparently it requires that the vaccine be somehow
129: transmitted to a given fraction of the network's nodes by means other than the
130: network's own.
131:
132: In this paper, we assume that the vaccine enters the network at a single node,
133: called the originator. We attribute to this node the responsibility of starting
134: the dissemination of the vaccine by initiating the method called heuristic
135: flooding for disseminating information in networks \cite{stauffer2004}. Let $u$
136: be the originator. For each neighbor $v$ of $u$, this method prescribes that $u$
137: forward the vaccine to $v$ with probability given by a heuristic function
138: $h(a,b)$, where $a$ and $b$ are, respectively, the degrees of $u$ and
139: $v$.\footnote{We assume $h(a,b)=0$ if $a=0$ or $b=0$.} Each of the nodes that
140: receive the vaccine, when receiving it for the first time, proceeds likewise and
141: probabilistically forwards the vaccine to its own neighbors. By not requiring
142: that the nodes of the network have information beyond what can be inferred from
143: their immediate neighborhoods, this strategy can be easily used in practice.
144: Furthermore, it represents more accurately what occurs in real scenarios, since
145: it does not rely on the prior selection of nodes that characterizes all the
146: three immunization strategies mentioned above, but rather assumes that the
147: vaccine spreads out of a single node (say, the very site of its development or
148: the site responsible for its distribution) via a heuristically controlled form
of flooding. With this set of characteristics, which in essence make it
independent of any network-wide properties, the new strategy is, to our
knowledge, the first of its kind.
152:
153: We organize the remainder of the paper as follows. In Section~\ref{sec:math}, we
154: use a random-graph framework and the formalism introduced in
155: \cite{molloy1995,molloy1998,newman2001}, whose details are discussed as they are
156: needed, to obtain mathematical results for the aforementioned efficacy
157: indicators. We utilize our analytical results in Section~\ref{sec:heur} to
158: discover the properties that an ideal heuristic function should have to be
159: efficient. We then introduce a heuristic function that seeks to approximate this
160: ideal and therefore can be used to disseminate the vaccine. In
161: Section~\ref{sec:sim} we discuss simulation results on random graphs having node
162: degrees distributed according to a power law. Our results reveal that this
163: heuristic function performs very attractively for the ranges of $\tau$ (the
164: distribution's parameter) that typically are thought to hold for networks like
165: the Internet. They also agree satisfactorily with our analytical predictions. We
166: conclude in Section~\ref{sec:conc}.
167:
168: %===============================================================================
169: %===============================================================================
170: %===============================================================================
171: \section{Mathematical analysis} \label{sec:math}
172:
173: Let $G$ be a random graph having $n$ nodes, whose degrees are distributed
174: independently from one another and identically to a random variable $K_G$. We
175: assume that the nodes of $G$ are interconnected in an independent way given
176: their degrees, which therefore remain independent. We base our mathematical
177: analysis of this section on the formalism introduced in \cite{newman2001} and
178: target the case in which $G$ has a formally infinite number of nodes.
179:
180: Let $P_G(a)$ be the probability that a randomly chosen node of $G$ has degree
181: $a$, i.e., the probability that $K_G=a$. The average degree in $G$, denoted by
182: $Z_G$, is clearly
183: \begin{equation}
184: Z_G = \sum_{a=0}^{n-1} a P_G(a).
185: \end{equation}
186: Given that the degrees of two adjacent nodes are independent from each other,
187: the probability that some node's neighbor has degree $b$ is identical to the
188: expected fraction of edges incident to degree-$b$ nodes, which is given by
189: \begin{equation}
190: \frac{b P_G(b)}{\sum_{a=0}^{n-1}a P_G(a)} = \frac{b P_G(b)}{Z_G}.
191: \label{eq:neigh}
192: \end{equation}
193:
194: From \cite{molloy1995,newman2001}, a necessary and sufficient condition for a
195: size-$\Theta(n)$ connected component to almost surely exist in $G$ is that
196: \begin{equation}
197: \sum_{b=1}^{n-1} (b-1) \frac{b P_G(b)}{Z_G} > 1,
198: \label{eq:phase}
199: \end{equation}
200: which intuitively means that, given a randomly chosen node $u$ of $G$, a
201: size-$\Theta(n)$ connected component exists almost surely if and only if a
neighbor of $u$ is expected to have more than one neighbor besides $u$. When
203: (\ref{eq:phase}) is satisfied, we denote the size-$\Theta(n)$ connected
204: component of $G$ (its giant connected component) by $\GCC{G}$. Also, all the
205: other connected components of $G$ are small with high probability, comprising
206: only $o(n)$ nodes, and $G$ is said to be above the phase transition that gives
207: rise to $\GCC{G}$. On the other hand, when (\ref{eq:phase}) is not satisfied,
208: $G$ is said to be below the phase transition that gives rise to $\GCC{G}$ and
209: all of its connected components are small with high probability, consisting each
210: of $o(n)$ nodes.
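
The criterion in (\ref{eq:phase}) can be restated in terms of the first two
moments of $K_G$, which is sometimes more convenient to check. The rearrangement
below is only a sketch: it uses nothing beyond (\ref{eq:phase}) and the
definition of $Z_G$, and the Poisson case is included purely as an illustration.
Multiplying both sides of (\ref{eq:phase}) by $Z_G>0$ gives
\[
\sum_{b=1}^{n-1} b(b-1) P_G(b) > \sum_{b=1}^{n-1} b P_G(b)
\quad\Longleftrightarrow\quad
\sum_{b=1}^{n-1} b(b-2) P_G(b) > 0,
\]
that is, $\mathrm{E}[K_G^2] > 2\,\mathrm{E}[K_G]$. As a check, if $K_G$ is
Poisson with mean $z$ (the classical random-graph model in the limit of large
$n$), then $\mathrm{E}[K_G^2]=z^2+z$ and the condition reduces to $z>1$, the
well-known threshold for the emergence of a giant component in that model.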
211:
212: Given a randomly chosen node $u$ of $G$ and a neighbor $v$ of $u$, we define the
213: reach of $u$ through $v$ as the set of nodes that can be reached by a path
214: starting at $u$ and whose first edge is $(u,v)$. A node belongs to $\GCC{G}$ if
215: and only if it has at least one neighbor through which its reach contains a
216: large, size-$\Theta(n)$ number of nodes. Let $q$ be the probability that a node
217: has a small, size-$o(n)$ reach through a given neighbor. The probability that a
218: degree-$a$ node belongs to $\GCC{G}$ is then $1-q^a$, and the probability that a
219: randomly chosen node of $G$ belongs to $\GCC{G}$, which we denote by $\gcc{G}$,
220: is
221: \begin{equation}
222: \gcc{G} = 1 - \sum_{a=0}^{n-1} q^a P_G(a).
223: \label{eq:gccG}
224: \end{equation}
225: The probability $q$ that $u$ has a small, size-$o(n)$ reach through $v$ can be
226: obtained from the probability that $v$ itself has a small, size-$o(n)$ reach
227: through each of its other neighbors (i.e., excluding $u$). Since the probability
228: that two neighbors of $u$ have another common neighbor (i.e., besides $u$)
229: varies with $n$ proportionally to $n^{-1}$ \cite{newman2001}, which for large
230: $n$ is negligible, the probability that $v$ has a small, size-$o(n)$ reach
231: through a given neighbor is also $q$, thus leading to
232: \begin{equation}
233: q = \sum_{b=1}^{n-1} q^{b-1} \frac{b P_G(b)}{Z_G}.
234: \end{equation}
235: This equation can be solved numerically and then used in (\ref{eq:gccG}) to
236: obtain $\gcc{G}$.
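
One simple way to carry out this numerical solution, given here only as a sketch
of a standard option, is fixed-point iteration. Note that $q=1$ always satisfies
the equation, since the probabilities in (\ref{eq:neigh}) sum to $1$ over $b$,
so the solution of interest above the phase transition is the smallest one in
$[0,1]$. Because the right-hand side is nondecreasing in $q$ on $[0,1]$,
iterating
\[
q^{(t+1)} = \sum_{b=1}^{n-1} \left(q^{(t)}\right)^{b-1} \frac{b P_G(b)}{Z_G},
\qquad q^{(0)}=0,
\]
yields a nondecreasing sequence that converges to that smallest solution. As a
small illustrative instance (hypothetical, chosen only because it can be solved
by hand), let $P_G(1)=P_G(3)=1/2$, so that $Z_G=2$ and the left-hand side of
(\ref{eq:phase}) equals $3/2>1$. The equation for $q$ becomes
\[
q = \frac{1}{4} + \frac{3}{4}\,q^2,
\]
whose solutions are $q=1/3$ and $q=1$. Taking $q=1/3$ in (\ref{eq:gccG}) gives
\[
\gcc{G} = 1 - \frac{1}{2}\cdot\frac{1}{3} - \frac{1}{2}\cdot\frac{1}{27}
= \frac{22}{27} \approx 0.815.
\]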
237:
238: From now on, we assume that $G$ is above the phase transition and, therefore,
$\GCC{G}$ exists with high probability. Furthermore, since $G$ can be
disconnected and real-world computer networks are normally connected, we assume
241: that it is the graph induced by $\GCC{G}$, rather than $G$ itself, that models
242: the network, and also condition the remainder of our analysis accordingly.
243:
244: %===============================================================================
245: \subsection{Expected spread} \label{sec:spread}
246:
247: In this section, we calculate the expected spread in $\GCC{G}$, which is denoted
248: by $\wPs$ and consists of the expected fraction of the nodes of $\GCC{G}$ that
249: are immunized when a vaccine is distributed using the heuristic flooding
250: described in Section~\ref{sec:intro}. We resort to the same method of analysis
251: developed in \cite{stauffer2004}. Let $S$ be a directed subgraph of $G$ that
252: spans all the nodes of $G$. For a degree-$a$ node $u$ and a degree-$b$ neighbor
253: $v$ of $u$ in $G$, the probability that the directed edge $(u \to v)$ exists in
254: $S$ is given by $h(a,b)$, the heuristic function employed during the vaccine
255: dissemination. Before proceeding to the calculation of $\wPs$, we pause for a
256: brief study of $S$.
257:
258: The neighbors of a node $u$ in $S$ can be classified into two different types:
259: the in-neighbors, those from which an edge exists directed toward $u$; and the
260: out-neighbors, those toward which an edge exists directed from $u$. If a
261: directed path exists starting at some node $u$ and ending at another node $v$,
262: then we say that $u$ reaches $v$ in $S$ or that $v$ is in the reach of $u$ in
263: $S$. Note that, if $u$ receives the vaccine, then the reach of $u$ in $S$ is
264: part of the set of nodes that become immunized.
265:
266: The connected components of a directed graph can also be of two basic types.
267: First, there are the weakly connected components, which are constituted by the
268: nodes that can reach one another by undirected paths, i.e., paths for which the
269: directions of the edges are disregarded. The other type is that of the strongly
270: connected components, each comprising a maximal set of nodes that can both reach
271: and be reached from one another.
272:
273: Similarly to the case of the undirected graph $G$, there is a criterion for
274: deciding whether $S$ almost surely has a size-$\Theta(n)$ weakly connected
275: component, commonly known as the giant weakly connected component of $S$,
276: denoted by $\GWCC{S}$. Likewise, there is another criterion according to which
277: $S$ almost surely has a size-$\Theta(n)$ strongly connected component, commonly
278: referred to as the giant strongly connected component, denoted by $\GSCC{S}$.
279: Clearly, when both $\GWCC{S}$ and $\GSCC{S}$ exist, as we henceforth assume, all
280: the nodes of $\GSCC{S}$ belong also to $\GWCC{S}$, and all the nodes of
281: $\GWCC{S}$ belong also to $\GCC{G}$.
282:
283: Since $\GSCC{S}$ exists by assumption, we can define two other size-$\Theta(n)$
284: connected components of $S$, which we refer to as the giant in-component
285: ($\GIN{S}$), formed by the nodes that can reach $\GSCC{S}$, and the giant
286: out-component ($\GOUT{S}$), formed by the nodes reachable from $\GSCC{S}$. Note
287: that, by definition, the nodes of $\GSCC{S}$ belong also to both $\GIN{S}$ and
288: $\GOUT{S}$. We denote by $\gin{S}$ and $\gout{S}$ the expected fraction of the
289: nodes of $G$ that belong to, respectively, $\GIN{S}$ and $\GOUT{S}$.
290: Figure~\ref{fig:ggi} illustrates an instance of graph $G$ having a power-law
291: node-degree distribution with $\tau=2.1$ (part~(a)) and a possible instance of
292: its directed subgraph $S$ (part~(b)).
293:
294: \begin{figure*}[!t]
295: \centering
296: \begin{tabular}{c}
297: \includegraphics[scale=\graphDraw]{imgs/g.eps}\\
298: \\
299: \includegraphics[scale=\graphDraw]{imgs/gi.eps}
300: \end{tabular}
301: \caption{A $G$ instance having a power-law node-degree distribution with
302: $\tau=2.1$ (a) and one possible instance of the directed subgraph $S$ of the $G$
303: instance (b). Part~(b) also shows the nodes belonging to $\GSCC{S}$ (filled
304: circles), $\GIN{S}$ (filled circles and triangles), and $\GOUT{S}$ (filled
305: circles and filled squares).}
306: \label{fig:ggi}
307: \end{figure*}
308:
309: Assuming that the originator is randomly chosen among the nodes of $\GCC{G}$,
310: the vaccine is guaranteed to be distributed to a size-$\Theta(n)$ set of nodes
311: if the originator belongs to $\GIN{S}$, which happens with probability
312: $\gin{S}/\gcc{G}$. When this is the case, the nodes that receive the vaccine
313: either belong to $\GOUT{S}$, corresponding to a fraction $\gout{S}/\gcc{G}$ of
314: the nodes of $\GCC{G}$, or are not in $\GOUT{S}$ despite being reachable from
315: the originator, and then amount to a small, size-$o(n)$ number of nodes.
316: Neglecting the latter nodes is equivalent to assuming that nodes receive the
317: vaccine only if the originator is in $\GIN{S}$. In this case, only the nodes in
318: $\GOUT{S}$ receive the vaccine and we have
319: \begin{equation}
320: \wPs = \frac{\gin{S}\gout{S}}{\gcc{G}^2}.
321: \label{eq:wPi}
322: \end{equation}
323:
324: In order to obtain $\gin{S}$, recall that the nodes of $\GIN{S}$ are the only
325: ones that have a non-negligible reach. Considering a degree-$a$ node $u$ of $G$
326: and a degree-$b$ neighbor $v$ of $u$ in $G$, we say that $v$ is a dead end with
327: respect to $u$ in $S$ if either $(u \to v)$ is not an edge of $S$, or it is but
328: the reach of $u$ through $v$ in $S$ is negligible, consisting of only $o(n)$
329: nodes. Denoting by $\qin{b}$ the conditional probability that the reach of $u$
330: through $v$ in $S$ is negligible given that $u$ is an in-neighbor of $v$ in $S$,
331: we obtain the probability that $v$ is a dead end with respect to $u$ in $S$,
332: which is
333: \begin{equation}
334: 1-h(a,b)+h(a,b)\qin{b}.
335: \end{equation}
336: And since the probability that $v$ has degree $b$ is given by (\ref{eq:neigh}),
337: the probability that a given neighbor of a degree-$a$ node is a dead end with
338: respect to it in $S$, which we denote by $\dein{a}$, is
339: \begin{equation}
340: \dein{a} = \sum_{b=1}^{n-1} \left(1-h(a,b)+h(a,b)\qin{b}\right) \frac{bP_G(b)}{Z_G}.
341: \label{eq:dein}
342: \end{equation}
343: Because a node belongs to $\GIN{S}$ if and only if at least one of its neighbors
344: in $G$ is not a dead end with respect to it in $S$, we arrive at
345: \begin{equation}
346: \gin{S} = 1 - \sum_{a=0}^{n-1} (\dein{a})^a P_G(a).
347: \label{eq:gin}
348: \end{equation}
349:
350: As a means to calculate $\qin{b}$, let us consider a degree-$b$ node $v$ of $G$
351: reached by following a directed edge $(u \to v)$ of $S$. The reach of $u$
352: through $v$ in $S$ is negligible, which happens with probability $\qin{b}$, if
353: and only if all of the other $b-1$ neighbors of $v$ in $G$ (i.e., excluding $u$)
354: are themselves dead ends with respect to $v$ in $S$. This clearly leads to
355: \begin{equation}
356: \qin{b} = (\dein{b})^{b-1}.
357: \label{eq:qin}
358: \end{equation}
359: Equations (\ref{eq:dein}) and (\ref{eq:qin}) can be put together to yield
360: another equation where $\dein{a}$ is a function of all the other $\dein{}$'s.
361: This equation can then be solved numerically to obtain $\gin{S}$ via
362: (\ref{eq:gin}).
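
Written out, the combined equation is obtained by substituting (\ref{eq:qin})
into (\ref{eq:dein}). The display below is a direct substitution; the two
limiting cases that follow are added here as sanity checks, under the stated
assumptions on $h$, and should be read as a sketch.
\[
\dein{a} = \sum_{b=1}^{n-1}
\left[1-h(a,b)+h(a,b)\left(\dein{b}\right)^{b-1}\right] \frac{bP_G(b)}{Z_G},
\qquad 1 \leq a \leq n-1.
\]
If $h(a,b)=1$ for all $a,b \geq 1$, the right-hand side no longer depends on
$a$, so every $\dein{a}$ equals a common value that satisfies the equation for
$q$ given in Section~\ref{sec:math}. Provided the same solution is selected,
(\ref{eq:gin}) then reduces to (\ref{eq:gccG}) and $\gin{S}=\gcc{G}$. If instead
$h(a,b)=0$ for all $a,b$, then $\dein{a}=1$ for every $a$ and $\gin{S}=0$, as
expected when the vaccine is never forwarded.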
363:
364: We can follow a completely analogous derivation and obtain $\gout{S}$ by noting
365: that a node belongs to $\GOUT{S}$ if and only if it can be reached from a
366: size-$\Theta(n)$ set of nodes. Let $u$ be a degree-$a$ node of $G$ and $v$ a
neighbor of $u$ in $G$. We denote by $\deout{a}$ the probability that either
$u$ is not an out-neighbor of $v$ in $S$, or it is but the number of nodes that
can reach $u$ through $v$ in $S$ is small, consisting of only $o(n)$ nodes. Also, we
370: denote by $\qout{b}$ the conditional probability that the number of nodes that
371: can reach $u$ through $v$ in $S$ is small, given that the degree of $v$ in $G$
372: is $b$ and $u$ is an out-neighbor of $v$. In a way analogous to the one that led
373: to (\ref{eq:dein}), (\ref{eq:gin}), and (\ref{eq:qin}), we obtain
374: \begin{equation}
375: \deout{a} = \sum_{b=1}^{n-1} \left(1-h(b,a)+h(b,a)\qout{b}\right)
376: \frac{bP_G(b)}{Z_G}, \label{eq:deout}
377: \end{equation}
378: \begin{equation}
379: \gout{S} = 1 - \sum_{a=0}^{n-1} (\deout{a})^a P_G(a),
380: \label{eq:gout}
381: \end{equation}
382: and
383: \begin{equation}
384: \qout{b} = (\deout{b})^{b-1}.
385: \label{eq:qout}
386: \end{equation}
387: Also, and identically to the derivation of $\gin{S}$, we can unify
388: (\ref{eq:deout}) and (\ref{eq:qout}) and calculate the value of each
389: $\deout{a}$ numerically to obtain $\gout{S}$ via (\ref{eq:gout}).
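
For reference, the unified equation obtained from (\ref{eq:deout}) and
(\ref{eq:qout}) reads
\[
\deout{a} = \sum_{b=1}^{n-1}
\left[1-h(b,a)+h(b,a)\left(\deout{b}\right)^{b-1}\right] \frac{bP_G(b)}{Z_G},
\qquad 1 \leq a \leq n-1,
\]
which differs from the system formed by (\ref{eq:dein}) and (\ref{eq:qin}) only
in the order of the arguments of $h$. As a sketch of one consequence, assume a
symmetric heuristic, $h(a,b)=h(b,a)$ for all $a,b$ (an assumption made only for
this remark and not required anywhere in the analysis), and assume that the same
solution is selected in both systems. The two systems then coincide, so
$\dein{a}=\deout{a}$ for every $a$, $\gin{S}=\gout{S}$, and (\ref{eq:wPi})
becomes
\[
\wPs = \left(\frac{\gin{S}}{\gcc{G}}\right)^2.
\]
In particular, with $h(a,b)=1$ for all $a,b \geq 1$ we get
$\gin{S}=\gout{S}=\gcc{G}$ and $\wPs=1$: plain flooding immunizes all of
$\GCC{G}$.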
390:
391: %===============================================================================
392: \subsection{Expected vulnerability} \label{sec:vulnerability}
393:
394: Consistently with the simplifying assumptions of Section~\ref{sec:spread}, we
395: keep assuming that no node is immunized when the originator does not belong to
396: $\GIN{S}$. When this happens, all nodes of $\GCC{G}$ remain vulnerable to the
397: virus, and if the virus infects a node of $\GCC{G}$ it may propagate until the
398: entire $\GCC{G}$ is infected. Let us analyze the case in which the originator
399: does belong to $\GIN{S}$.
400:
401: As before, we assume that only the nodes of $\GOUT{S}$ receive the vaccine. Let
402: $V$ be an undirected subgraph of $G$ that spans all the nodes of $G$, and let an
403: edge $(u,v)$ of $G$ belong to $V$ if and only if neither $u$ nor $v$ belongs to
404: $\GOUT{S}$. That is, given a certain instance of the subgraph $S$, subgraph $V$
405: contains all the edges of $G$ that are not incident to nodes of $\GOUT{S}$.
406: Clearly, the edges of $V$ represent the edges through which the virus may
407: propagate if it reaches either of an edge's (unimmunized) end nodes.
408: Figure~\ref{fig:ggc} illustrates the subgraph $V$ corresponding to the $G$ and
409: $S$ instances of Figure~\ref{fig:ggi}.
410:
411: \begin{figure*}[!t]
412: \centering
413: \includegraphics[scale=\graphDraw]{imgs/gc.eps}
417: \caption{The graph $V$ that corresponds to the $G$ and $S$ instances of
418: Figure~\ref{fig:ggi}. Nodes represented by filled circles or filled squares
419: belong to $\GOUT{S}$.}
420: \label{fig:ggc}
421: \end{figure*}
422:
423: Once again, and similarly to the case of $G$, a criterion exists for deciding
424: whether a size-$\Theta(n)$ connected component almost surely exists in $V$. We
425: denote such a component by $\GCC{V}$. When it does exist, and since all the
426: other connected components of $V$ contain with high probability only $o(n)$
427: nodes (which we again neglect), a virus may only proliferate into a large,
428: size-$\Theta(n)$ set of nodes if it first infects a node of $\GCC{V}$. This, of
429: course, is predicated upon the originator being in $\GIN{S}$ and dissemination
430: taking place exclusively inside $\GOUT{S}$, the assumptions of
431: Section~\ref{sec:spread}.
432:
We define the expected vulnerability of $\GCC{G}$, denoted by $\wPv$, as the
expected fraction of the nodes of $\GCC{G}$ that may become infected when the
virus attempts to infect a randomly chosen node of $\GCC{G}$. Let $\gcc{V}$ be
the expected fraction of the nodes of $G$ that belong to $\GCC{V}$. If the
originator does not belong to $\GIN{S}$ (which occurs with probability
$1-\gin{S}/\gcc{G}$), then the entire $\GCC{G}$ may become infected and the
vulnerable fraction is $1$. If it does belong to $\GIN{S}$ (with probability
$\gin{S}/\gcc{G}$), then the vulnerable fraction is $\gcc{V}/\gcc{G}$ when the
virus first infects a node of $\GCC{V}$, which occurs with probability
$\gcc{V}/\gcc{G}$, and is negligible otherwise. We
then have
442: \begin{equation}
443: \wPv = 1-\frac{\gin{S}}{\gcc{G}} + \frac{\gin{S}}{\gcc{G}}\left(\frac{\gcc{V}}{\gcc{G}}\right)^2.
444: \label{eq:wPv}
445: \end{equation}
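
Two immediate consequences of (\ref{eq:wPv}), noted here as a sketch, may help
in reading it. Since the last term is nonnegative,
\[
\wPv \geq 1 - \frac{\gin{S}}{\gcc{G}},
\]
with equality when $\gcc{V}=0$, that is, when $V$ has no giant connected
component. The expected vulnerability can therefore be made small only if the
originator falls in $\GIN{S}$ with high probability. For a purely hypothetical
set of values, not taken from any experiment, $\gcc{G}=0.8$, $\gin{S}=0.6$, and
$\gcc{V}=0.2$ give
\[
\wPv = 1 - 0.75 + 0.75\,(0.25)^2 = 0.296875.
\]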
446:
447: Henceforth in this section we concentrate on calculating $\gcc{V}$ for the case
448: in which $\GCC{V}$ does exist. Clearly, a node of $G$ belongs to $\GCC{V}$ only
449: if it does not belong to $\GOUT{S}$. Through the remainder of the section, let
450: $u$ be a degree-$a$ node of $G$ that does not belong to $\GOUT{S}$ and $v$ a
451: neighbor of $u$ in $G$. Given that $v$ has degree $b$, we define $\wH{a}{b}$ as
452: the probability that the edge $(v \to u)$ exists in $S$. Since $u$ does not
453: belong to $\GOUT{S}$, node $v$ must be such that it satisfies one of the
454: following conditions: either edge $(v \to u)$ does not exist in $S$, which
455: happens with probability $1-h(b,a)$, or $(v \to u)$ exists in $S$ but the number
456: of nodes that can reach $u$ through $v$ is small, which occurs with probability
457: $h(b,a)\qout{b}$. We can then express $\wH{a}{b}$ as the ratio of the
458: probability that the latter condition is satisfied to the probability that
459: either the former or the latter is. This leads to
460: \begin{equation}
461: \wH{a}{b} = \frac{h(b,a)\qout{b}}{1 - h(b,a) + h(b,a)\qout{b}}.
462: \label{eq:wh}
463: \end{equation}
464:
465: Now let $\wP{a}{b}$ be the probability that $v$ has degree $b$ in $G$. Clearly,
466: $\wP{a}{b}$ is proportional to the joint probability that $v$ satisfies one of
467: the above conditions regarding the existence of edge $(v \to u)$ in $S$ and also
468: that a node's neighbor in $G$ has degree $b$. That is, $\wP{a}{b}$ is
469: proportional to
470: $\left(1-h(b,a)+h(b,a)\qout{b}\right)bP_G(b)/Z_G$. Using (\ref{eq:deout}), we
471: obtain
472: \begin{equation}
473: \wP{a}{b} = \left(\frac{1-h(b,a)+h(b,a)\qout{b}}{\deout{a}}\right) \frac{bP_G(b)}{Z_G}.
474: \label{eq:wp}
475: \end{equation}
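
As a consistency check on (\ref{eq:wp}), sketched here using only
(\ref{eq:deout}) and assuming $\deout{a}>0$, the values $\wP{a}{b}$ form a
probability distribution over $b$:
\[
\sum_{b=1}^{n-1} \wP{a}{b}
= \frac{1}{\deout{a}} \sum_{b=1}^{n-1}
\left(1-h(b,a)+h(b,a)\qout{b}\right) \frac{bP_G(b)}{Z_G}
= \frac{\deout{a}}{\deout{a}} = 1.
\]
When $h(b,a)=0$ for every $b$, we have $\deout{a}=1$ and (\ref{eq:wp}) reduces
to (\ref{eq:neigh}), as expected: no neighbor can forward the vaccine to $u$, so
knowing that $u$ is outside $\GOUT{S}$ carries no information about $b$.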
476:
477: Let $b$ be the degree of $v$ in $G$. Because $u$ does not belong to $\GOUT{S}$,
478: nodes $u$ and $v$ are neighbors in $V$ if and only if $v$ does not belong to
479: $\GOUT{S}$ either. If $(v \to u)$ is an edge of $S$, which occurs with
480: probability $\wH{a}{b}$, then $v$ is obviously not in $\GOUT{S}$, as it would
481: otherwise make $u$ belong to $\GOUT{S}$ along with it. On the other hand, if
482: $(v \to u)$ is not an edge of $S$ (with probability $1-\wH{a}{b}$), then $v$
483: does not belong to $\GOUT{S}$ if and only if the number of nodes that can reach
484: it in $S$ is small, which happens with probability $\qout{b}$. It follows that
485: the probability that $u$ and $v$ are neighbors in $V$ is given by
486: \begin{equation}
487: \wH{a}{b} + (1 - \wH{a}{b})\qout{b}.
488: \label{eq:aux30}
489: \end{equation}
490: When $u$ and $v$ are indeed neighbors in $V$, we define $\qgcc{b}$ as the
491: probability that $u$ has a small reach in $V$ through $v$. We say that $v$ is a
492: dead end with respect to $u$ in $V$ if either $v$ is not a neighbor of $u$ in
493: $V$, which occurs with probability
494: $1-\left[\wH{a}{b} + (1 - \wH{a}{b})\qout{b}\right]$, or it is but the reach of
495: $u$ through $v$ in $V$ is small, which occurs with probability
496: $\left[\wH{a}{b} + (1 - \wH{a}{b})\qout{b}\right]\qgcc{b}$. Thus, the
497: probability that $v$ is a dead end with respect to $u$ in $V$ is
498: \begin{eqnarray}
499: \lefteqn{1-\left[\wH{a}{b} + (1 - \wH{a}{b})\qout{b}\right] +
500: \left[\wH{a}{b} + (1 - \wH{a}{b})\qout{b}\right]\qgcc{b}}
501: \hspace{1.25in}\nonumber\\
502: && = \wH{a}{b}\qgcc{b} + (1-\wH{a}{b})(1 - \qout{b} + \qout{b}\qgcc{b}),
503: \label{eq:wgccV}
504: \end{eqnarray}
505: so the probability that a neighbor of $u$ is a dead end with respect to $u$ in
506: $V$, which we denote by $\degcc{a}$, is clearly
507: \begin{equation}
508: \degcc{a} = \sum_{b=1}^{n-1} \left[\wH{a}{b}\qgcc{b} + (1-\wH{a}{b})(1 - \qout{b} + \qout{b}\qgcc{b})\right]\wP{a}{b}.
509: \label{eq:degccV}
510: \end{equation}
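
The simplification in (\ref{eq:wgccV}) can be checked by direct expansion.
Writing, for brevity and only within this remark,
$A=\wH{a}{b}+(1-\wH{a}{b})\qout{b}$ for the probability in (\ref{eq:aux30}), we
have
\begin{eqnarray*}
1-A+A\qgcc{b}
&=& (1-\wH{a}{b}) - (1-\wH{a}{b})\qout{b} + \wH{a}{b}\qgcc{b}
 + (1-\wH{a}{b})\qout{b}\qgcc{b}\\
&=& \wH{a}{b}\qgcc{b} + (1-\wH{a}{b})\left(1-\qout{b}+\qout{b}\qgcc{b}\right),
\end{eqnarray*}
where the second line groups the terms that carry the factor $1-\wH{a}{b}$.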
511:
512: In order to calculate $\qgcc{b}$, notice that the reach of $u$ through $v$ in
513: $V$ is small if and only if all other $b-1$ neighbors of $v$ in $G$ are
514: themselves dead ends with respect to $v$ in $V$. Then, assuming that the degrees
515: of a node's neighbors in $G$ remain independent from one another even under the
516: condition that the node does not belong to $\GOUT{S}$, we have
517: \begin{equation}
518: \qgcc{b} = (\degcc{b})^{b-1}.
519: \label{eq:qgccV}
520: \end{equation}
521: Putting (\ref{eq:degccV}) and (\ref{eq:qgccV}) together leads to an equation
522: where $\degcc{a}$ is a function of all the other $\degcc{}$'s, which can then be
523: solved numerically for $0 \leq a \leq n-1$.
524:
525: We are, finally, in position to calculate the value of $\gcc{V}$. Let $u$ be a
526: randomly chosen node of $G$ having degree $a$. In order to belong to $\GCC{V}$,
527: node $u$ must not belong to $\GOUT{S}$, which occurs with probability
528: $\left(\deout{a}\right)^a$. Furthermore, $u$ belongs to $\GCC{V}$ only if at
529: least one of its neighbors is not a dead end with respect to it in $V$, which
530: occurs with probability $1-(\degcc{a})^a$. It then follows that
531: \begin{equation}
532: \gcc{V} = \sum_{a=0}^{n-1} (\deout{a})^a \left[1 - (\degcc{a})^a\right] P_G(a).
533: \label{eq:gccV}
534: \end{equation}
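
As a sanity check on the derivation of $\gcc{V}$ alone, sketched here for the
degenerate case in which the vaccine is never forwarded, suppose $h(a,b)=0$ for
all $a,b$. Then (\ref{eq:deout}) and (\ref{eq:qout}) give
$\deout{a}=\qout{b}=1$, (\ref{eq:wh}) gives $\wH{a}{b}=0$, and (\ref{eq:wp})
reduces to (\ref{eq:neigh}). Substituting into (\ref{eq:degccV}) and
(\ref{eq:qgccV}) yields
\[
\degcc{a} = \sum_{b=1}^{n-1} \left(\degcc{b}\right)^{b-1} \frac{bP_G(b)}{Z_G},
\]
whose right-hand side does not depend on $a$. Each $\degcc{a}$ therefore equals
a common value that satisfies the same equation as $q$ in
Section~\ref{sec:math}, and, provided the same solution is selected,
(\ref{eq:gccV}) becomes
\[
\gcc{V} = \sum_{a=0}^{n-1} \left[1-q^a\right] P_G(a) = \gcc{G},
\]
in agreement with the fact that, with no node immunized, $V$ coincides with $G$.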
535:
536: %===============================================================================
537: %===============================================================================
538: %===============================================================================
539: \section{The heuristic function} \label{sec:heur}
540:
541: The efficiency of heuristic flooding as a means of immunizing a network depends
542: heavily on the choice of the heuristic function $h(a,b)$. Before introducing our
543: heuristic function, we elaborate on the properties of subgraph $S$ that we may
544: expect to lead to good results for $\wPs$ and $\wPv$.
545:
546: First of all, it is clear that $S$ must be above the phase transition that gives
547: rise to $\GSCC{S}$, thereby guaranteeing that $\GSCC{S}$, $\GIN{S}$, and
548: $\GOUT{S}$ almost surely exist. When this is the case, the nodes of $\GIN{S}$
549: are the most suitable ones for being the originator, as they can immunize a
550: non-negligible number of nodes. But since we cannot assume any prior information
551: on the originator, $\GIN{S}$ should contain as many nodes as possible in order
552: to make the probability that the originator is chosen from outside it as small
553: as possible. With regard to $\GOUT{S}$, we know that it contains the nodes that
554: receive the vaccine when the originator belongs to $\GIN{S}$. In order to
555: prevent an excessive number of nodes from receiving the vaccine, the size of
556: $\GOUT{S}$ should be kept to modest values. Putting these two observations
557: together, we ideally want $\GIN{S}$ to span all the nodes of the network,
558: $\GSCC{S}$ to contain only the nodes that can more efficiently block the
559: spreading of an infection, and $\GOUT{S}$ to be the same as $\GSCC{S}$.
560:
561: Since we know that immunizing the nodes with the highest degrees is an efficient
562: way to prevent epidemics in scale-free networks
563: \cite{albert2000,cohen2001,satorras2003}, we introduce, in this section, a
564: heuristic function that stimulates the transmission of the vaccine to
565: high-degree nodes. Introducing a parameter $\alpha \geq 0$, and considering a
566: degree-$a$ node $u$ that has the vaccine and a degree-$b$ neighbor $v$ of $u$,
567: our heuristic function $h(a,b)$, which gives the probability that $u$ sends the
568: vaccine to $v$, is defined as follows:
569: \begin{itemize}
570: \item If $b=1$, that is, $v$ has no neighbor besides $u$, then $h(a,b)=0$ and
571: $u$ deterministically decides not to send the vaccine to $v$. In this case,
572: since $u$ is already immune, should $v$ become infected it can transmit the
573: virus to no other node, so we choose not to give $v$ the vaccine.
574: \item If $a \leq 2 \leq b$, that is, $u$ has degree at most $2$ and $v$ has
575: degree at least $2$, then $h(a,b)=1$ and $u$ deterministically decides to send
576: the vaccine to $v$. This is meant to force some low-degree nodes to forward the
577: vaccine, thereby precluding a premature conclusion of heuristic flooding and, as
578: a consequence, leading to a larger $\GIN{S}$.
579: \item For all the other positive values of $a$ and $b$, we let
580: \begin{equation}
581: h(a,b)=\tanh\left(\frac{b-1}{\left(a-2\right)^\alpha}\right).
582: \label{eq:h}
583: \end{equation}
584: \end{itemize}
585:
586: Figure~\ref{fig:h} shows two plots illustrating this heuristic function for
587: $\alpha=0.7$ (part~(a)) and $\alpha=1.0$ (part~(b)). Clearly, for fixed $a>2$,
588: $h(a,b)$ increases with $b$, so the vaccine is more likely to be transmitted to
high-degree nodes. For fixed $b>1$ and $\alpha>0$, $h(a,b)$ decreases with $a$ when $a>2$, thus reflecting
590: the intuition that, when $u$ is a high-degree node, sending the vaccine to $v$
591: may be unnecessary even if $v$ is a high-degree node (there are probably other
592: paths through which the vaccine can be transmitted from $u$ to $v$).
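
To make these trends concrete, the following values are obtained directly from
(\ref{eq:h}) with $\alpha=1.0$. The particular degrees are chosen here for
illustration only, and the results are rounded to four decimal places:
\begin{displaymath}
\begin{array}{rclcl}
h(4,3) & = & \tanh(2/2) & \approx & 0.7616,\\
h(12,3) & = & \tanh(2/10) & \approx & 0.1974,\\
h(12,21) & = & \tanh(20/10) & \approx & 0.9640.
\end{array}
\end{displaymath}
Thus a degree-$12$ node forwards the vaccine to a degree-$3$ neighbor with
probability below $0.2$ but to a degree-$21$ neighbor with probability above
$0.96$, whereas a degree-$4$ node forwards it to a degree-$3$ neighbor with
probability about $0.76$. Lowering $\alpha$ weakens the dependence on $a$: for
$\alpha=0.7$ the same formula gives $h(12,3)=\tanh(2/10^{0.7})\approx 0.3791$.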
593:
594: \begin{figure*}[!t]
595: \centering
596: \begin{tabular}{c}
597: \includegraphics[height=\heurHeight]{imgs/h07.eps}\\
598: \\
599: \includegraphics[height=\heurHeight]{imgs/h10.eps}
600: \end{tabular}
601: \caption{Plots of the heuristic function given by (\ref{eq:h}) for
602: $\alpha=0.7$ (a) and $\alpha=1.0$ (b).}
603: \label{fig:h}
604: \end{figure*}
605:
606: %===============================================================================
607: %===============================================================================
608: %===============================================================================
609: \section{Simulation results} \label{sec:sim}
610:
611: We have conducted extensive simulations on random graphs with node degrees
612: distributed according to a power law. Generating such a graph is achieved in two
613: phases \cite{newman2001}. Let $u_1, u_2, \ldots, u_n$ be the nodes of the random
614: graph we want to generate. In the first phase, for $i=1,\ldots,n$ we sample the
615: degree $d_i$ of each $u_i$ from the power-law distribution, obtaining the
so-called degree sequence of the graph. If $\sum_{i=1}^n d_i$ turns out to be
617: odd, then we discard the entire degree sequence and sample a new one, repeating
618: the process until the sum of the degrees comes out even. In the second phase, we
619: consider an imaginary urn having $\sum_{i=1}^n d_i$ labeled balls, the labels of
620: $d_i$ of them being $u_i$. We then successively remove pairs of balls from the
621: urn until it has no more balls. For each pair we remove---say, of labels $u_i$
622: and $u_j$---we add edge $(u_i,u_j)$ to the graph. This method can produce graphs
623: having multiple edges or self-loops, but it has the advantage of generating
624: graphs whose degrees remain independent even after the edges are added, which is
625: a core assumption of our analysis.
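
As a small illustration of the second phase, take $n=4$ and the degree sequence
$(d_1,d_2,d_3,d_4)=(3,2,2,1)$, which is chosen here arbitrarily and is not taken
from the simulations. The degrees sum to $8$, which is even, so the sequence is
accepted and the urn holds the $8$ balls
\begin{displaymath}
u_1,\ u_1,\ u_1,\ u_2,\ u_2,\ u_3,\ u_3,\ u_4.
\end{displaymath}
One possible outcome of the four draws is the sequence of pairs
$(u_1,u_2)$, $(u_1,u_2)$, $(u_1,u_3)$, $(u_3,u_4)$, which yields a double edge
between $u_1$ and $u_2$. Another is $(u_1,u_1)$, $(u_1,u_2)$, $(u_2,u_3)$,
$(u_3,u_4)$, which yields a self-loop at $u_1$. In both cases every node $u_i$
ends up with exactly $d_i$ edge endpoints, so the sampled degree sequence is
preserved.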
626:
627: We carried out our simulations for $n=10000$ and $2 \leq \tau \leq 3$. For each
628: value of $\tau$, we generated $500$ $G$ instances. For each $G$ instance, we
629: used the heuristic $h(a,b)$ to both sample $1000$ instances of the subgraph $S$
630: and, in an independent way, conduct $1000$ vaccine disseminations by heuristic
631: flooding from an originator selected randomly among the nodes of the largest
632: connected component of the $G$ instance. For each $S$ instance, we selected the
633: largest strongly connected component and calculated the sizes of the
634: corresponding in-component (counting the nodes that can reach the strongly
635: connected component) and out-component (counting the nodes that can be reached
636: from the strongly connected component). We then obtained the expected sizes of
637: $\GIN{S}$ and $\GOUT{S}$ by averaging these quantities over the $500000$
638: samples. For each vaccine dissemination, we calculated the fraction of nodes
639: that receive the vaccine and the fraction of nodes to which an infection may
640: spread when an attempt at infecting a randomly chosen node inside the largest
641: connected component of $G$ takes place. We then obtained $\wPs$ and $\wPv$ by
642: averaging these quantities over the $500000$ samples.
643:
644: Simulation results are shown in Figure~\ref{fig:sim} for
645: $\alpha=0.1,0.4,0.7,1.0$. We note, in general, a satisfactory agreement between
646: analytic and simulation results, with the exception of part~(d), in which case
647: the deviation may be attributed to the approximations made during the derivation
648: of $\gcc{V}$ in Section~\ref{sec:vulnerability} to yield (\ref{eq:gccV}).
649:
650: \begin{figure*}[!t]
651: \centering
652: \begin{tabular}{rr}
653: \includegraphics[height=\graphHeight]{imgs/graph_powerlaw_worm2_gin_nolookahead.eps} &
654: \includegraphics[height=\graphHeight]{imgs/graph_powerlaw_worm2_gout_nolookahead.eps} \\
655: \includegraphics[height=\graphHeight]{imgs/graph_powerlaw_worm2_immuned_nolookahead.eps} &
656: \includegraphics[height=\graphHeight]{imgs/graph_powerlaw_worm2_new_nolookahead.eps}
657: \end{tabular}
\caption{Simulation results of vaccine dissemination by heuristic flooding:
$\gin{S}/\gcc{G}$ (a), $\gout{S}/\gcc{G}$ (b), $\wPs$ (c), and $\wPv$ (d).
Solid lines give the analytic predictions.}
660: \label{fig:sim}
661: \end{figure*}
662:
663: When $\tau \leq 2.5$, the plots for $\gin{S}/\gcc{G}$ and $\gout{S}/\gcc{G}$
664: (Figure~\ref{fig:sim}(a,b)) reveal that the heuristic function introduced in
665: Section~\ref{sec:heur} results in a $\GIN{S}$ that spans almost all the nodes of
666: $\GCC{G}$, while the size of $\GOUT{S}$ keeps to a relatively modest fraction of
667: $\GCC{G}$. For example, for $\tau \leq 2.5$ and $\alpha=1.0$, the relative size
668: of $\GIN{S}$ is always above $0.97$ and the relative size of $\GOUT{S}$ is
always below $0.13$. For $\tau > 2.5$, the relative size of $\GIN{S}$ decreases
with $\tau$, indicating that heuristic flooding has more difficulty
disseminating the vaccine when the graph is sparser.
672:
673: Owing to $\wPs$ being given by $(\gin{S}/\gcc{G})(\gout{S}/\gcc{G})$
674: (cf.\ (\ref{eq:wPi})), and to $\gin{S}/\gcc{G}$ being relatively close to $1$
675: (Figure~\ref{fig:sim}(a)), the plots for $\wPs$ (Figure~\ref{fig:sim}(c)) are of
676: course similar to the plots for $\gout{S}/\gcc{G}$ (Figure~\ref{fig:sim}(b)).
Furthermore, given a value of $\alpha$, $\wPs$ decreases with $\tau$, which
means that heuristic flooding spreads through a smaller number of nodes when the
graph is sparser, as, in this case, there are fewer paths leading to the
high-degree nodes.
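
As a numerical illustration of the product form of $\wPs$, take the bounds
quoted above for $\tau \leq 2.5$ and $\alpha=1.0$, namely
$\gin{S}/\gcc{G} > 0.97$ and $\gout{S}/\gcc{G} < 0.13$, and assume, as the
normalization suggests, that $\gin{S}/\gcc{G} \leq 1$. Then
\begin{displaymath}
\wPs = \frac{\gin{S}}{\gcc{G}}\cdot\frac{\gout{S}}{\gcc{G}} < 1 \times 0.13 = 0.13
\end{displaymath}
and
\begin{displaymath}
0.97\,\frac{\gout{S}}{\gcc{G}} < \wPs \leq \frac{\gout{S}}{\gcc{G}},
\end{displaymath}
so that $\wPs$ and $\gout{S}/\gcc{G}$ differ by less than $3\%$ in relative
terms. These are bounds implied by the figures quoted in the text, not
additional measurements.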
681:
682: As for $\wPv$ (Figure~\ref{fig:sim}(d)), we note that, for $\tau \leq 2.5$,
683: $\wPv$ is nearly zero. This result is a natural consequence both of the guiding
principle of the heuristic introduced in Section~\ref{sec:heur}, which assigns a
higher probability to transmitting the vaccine to nodes having higher degrees,
686: and of the result for $\gin{S}/\gcc{G}$ (Figure~\ref{fig:sim}(a)), which
687: indicates that $\GIN{S}$ spans almost all the nodes of $\GCC{G}$. As $\tau$ is
688: increased to values greater than $2.5$, $\wPv$ moves farther away from zero,
689: since the size of $\GIN{S}$ decreases and, therefore, the probability that
690: heuristic flooding distributes the vaccine to only a small number of nodes
691: increases. Regarding the value of $\alpha$, we note a clear trade-off between
692: $\wPs$ and $\wPv$. If we were to adjust $\alpha$ in such a way as to decrease
693: $\wPs$, we would have an increase in $\wPv$, which shows that the number of
694: immunized nodes has a direct impact on the resulting vulnerability of the
695: network.
696:
697: %===============================================================================
698: %===============================================================================
699: %===============================================================================
700: \section{Conclusion} \label{sec:conc}
701:
702: We have considered in this paper the problem of immunizing a scale-free network
703: against a virus or worm. We introduced a new immunization strategy, one that we
704: believe reflects more accurately what happens in real scenarios. In our
strategy, we assume that the vaccine enters the network at exactly one node,
typically the site of the vaccine's development or the site in charge of its
distribution. This node begins the dissemination of the vaccine by heuristic
flooding, aiming at immunizing the nodes that have the highest degrees. With
this purpose in mind, we introduced a heuristic function that assigns a higher
probability to forwarding the vaccine toward nodes with higher degrees.
712:
713: We obtained analytical and simulation results on random graphs having node
714: degrees distributed according to a power law. Our mathematical analysis has
715: innovative aspects that we expect may shed some light on obtaining analytical
716: results for similar distributed algorithms. Also, we hope our analysis can
717: contribute to the development of new heuristic functions for vaccine
718: dissemination. With regard to our simulation results, they show satisfactory
719: agreement with our mathematical analysis and highlight the expected trade-off
720: between the number of nodes that receive the vaccine and the vulnerability of
the network to future infections. Especially for power laws with a relatively
small value of the parameter $\tau$, our heuristic function achieves very good
723: results, making the network practically invulnerable to an epidemic while
724: requiring the immunization of only roughly $10\%$ of the nodes.
725:
726: We note, finally, that one possible direction in which this paper's research may
727: be extended, in addition to the search for other heuristic functions, is that of
allowing for multiple concurrent originators. While algorithmically (i.e., from
729: the perspective of flooding the network) such an extension is trivial, extending
730: the analysis of Section~\ref{sec:math} is expected to be a significantly more
731: complex endeavor.
732:
733: \subsection*{Acknowledgments}
734:
735: The authors acknowledge partial support from CNPq, CAPES, and a FAPERJ BBP
736: grant.
737:
738: \bibliography{imm}
739: \bibliographystyle{plain}
740:
741: \end{document}
742:
743: