1: %\vspace*{-0.5cm}
2: \section{Assumptions \& Definitions}
3: %\vspace*{-0.25cm}
4: \label{sec:assum}
{\bf Basic Notation.} In what follows, we assume that the reader is
familiar with Chord; we nevertheless introduce the notation used below. We use
${\cal K}$ for the size of the Chord key space and $N$ for the number
of nodes. Let ${\cal M} = \log_2{\cal K}$ be the number of fingers of
a node and ${\cal S}$ the length of the immediate successor list,
usually set to a value of $O(\log N)$. We refer to nodes by their
keys, so a node $n$ means a node with key $n \in \{0, \ldots, {\cal K}-1\}$.
We use $p$ to refer to the predecessor, $s$ to refer to the successor list as a whole,
and $s_i$ for the $i^{th}$ successor. Data
structures of different nodes are distinguished by prefixing them with
a node key, e.g. $n'.s_1$. Let \emph{$fin_i$.start} denote
the start of the $i^{th}$ finger (where, for a node $n$, $\forall i \in 1..{\cal M}$,
$n.fin_i.start = n + 2^{i-1}$) and \emph{$fin_i$.node}
denote the actual node pointed to by that finger.
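
As a quick illustration of the finger notation (the node key is chosen here purely for illustration, and we assume that finger starts are reduced modulo ${\cal K}$ so that they wrap around the ring): with ${\cal K}=2^{20}$ we have ${\cal M}=20$, and for a node $n=10$
\[
n.fin_1.start = 11,\quad n.fin_2.start = 12,\quad n.fin_3.start = 14,\quad \ldots,\quad n.fin_{20}.start = 10 + 2^{19} = 524298 .
\]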
19: %An unqualified $fin_i$  will mean \emph{$fin_i$.node}. 
20:  
21: %{\bf Churn.} The continuous process of node joins and failures can be
22: %expressed in a number of different ways.  In \cite{nowell02analysis},
23: %joins are modeled by a single poisson process and failures by an
24: %exponential lifetime process for every node. Joins and failures can
25: %equivalently be described by the median session time as in
26: %\cite{li03comparing, rhea04handling, rowstron04depend}. As we discuss
27: %later, it is of interest to differentiate between a ``per-node'' rate
28: %and a ``per-network'' rate. For example in \cite{nowell02analysis},
29: %the join rate is per network and the failure rate is per node. In our
30: %simulations and analysis we use per-node rates for both joins and
31: %failures.
32: 
{\bf Steady State Assumption.} $\lambda_j$ is the rate of joins per node, $\lambda_f$ the rate of failures per node and $\lambda_s$ the rate of stabilizations per node. We carry out our analysis
for the general case in which the rate of successor stabilizations, $\alpha\lambda_s$,
is not necessarily the same as the rate of finger stabilizations, $(1-\alpha)\lambda_s$.
In all that follows, we impose the steady state condition
$\lambda_j=\lambda_f$. Further, it is useful to define $r \equiv \frac{\lambda_s}{\lambda_f}$,
the ratio on which all the quantities we are interested in depend.
For example, $r=50$ means that a join/fail event takes place once every
half an hour while a stabilization takes place once every $36$ seconds.
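
To make the split between the two kinds of stabilization concrete, here is a small worked example; the values $r=200$ and $\alpha=0.25$ are picked purely for illustration. Per node, and over one mean failure time $1/\lambda_f$, the expected numbers of stabilizations are
\[
\frac{\alpha\lambda_s}{\lambda_f} = \alpha r = 50 \quad\mbox{(successor)}, \qquad
\frac{(1-\alpha)\lambda_s}{\lambda_f} = (1-\alpha) r = 150 \quad\mbox{(finger)},
\]
which add up to $r=200$.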
41: 
42: %{\bf Communication and Failure Model.} We assume a fail-stop model and reliable communication. More importantly, we make the simplifying assumption that communication delays due to a limited number of hops is much smaller than the average time interval between joins, failures or stabilization events. However, we do not expect that the results will change much even if this were not satisfied.
43: 
44: {\bf Parameters.} The parameters of the problem are hence: ${\cal K}$, $N$, $\alpha$ and $r$. 
45: All relevant measurable quantities should be entirely expressible in terms of these parameters.
46: 
{\bf Chord Simulation.} We use our own discrete event simulation environment implemented in Java, which can be retrieved from \cite{ansary:analysis}. Although we assume that the reader is familiar with Chord, an exact analysis requires a few details. A successor stabilization performed by a node $n$ on $n.s_1$ accomplishes two main goals: $i)$ retrieving the predecessor and successor list of $n.s_1$ and reconciling them with $n$'s state; $ii)$ informing $n.s_1$ that $n$ is alive/newly joined. A finger stabilization picks one finger at random and looks up its start. Lookups do not use the optimization of checking the successor list before using the fingers.
However, the successor list is used as a last resort if the fingers cannot provide progress. Lookups are assumed not to
change the state of a node. For joins, a new node $u$ finds its successor $v$ through some initial random contact and performs successor stabilization on that successor. All fingers of $u$ that have $v$ as an acceptable finger node are set to $v$. The rest of the fingers are computed as best estimates from $v$'s routing table. All failures are ungraceful. We make the simplifying assumption that communication delays due to a limited number of hops are much smaller than the average time interval between joins, failures or stabilization events. However, we do not expect the results to change much even if this assumption is not satisfied.
50: 
51: %{\bf Wrong pointers and lookups.} The churned network is always compared against an artificially-optimal network
52: %constructed from the alive nodes to determine how outdated every node is and
53: %whether the answer obtained from a lookup is the correct answer.
54: 
{\bf Averaging.} Since we are collecting statistics such as the probability that a particular finger pointer is wrong, we need to repeat each experiment $100$ times to obtain well-averaged results.
The total sequential simulation time for obtaining the results of this paper was about $1800$ hours, parallelized on a cluster of $14$ nodes, with $N=1000$, ${\cal K}=2^{20}$, ${\cal S}=6$, $200 \leq r \leq 2000$
and $0.25 \leq \alpha \leq 0.75$.
58: 
59: \section{The Analysis}
60: \vspace*{-0.25cm}
61: \subsection{Distribution of Inter-Node Distances}
62: \vspace*{-0.25cm}
During churn, the inter-node distance (the difference between the keys of two consecutive nodes) is a fluctuating variable. An important quantity used throughout the analysis is the
probability distribution of inter-node distances. We define this quantity below and state a theorem giving its
functional form. We then mention three properties of this distribution
which are needed in the ensuing analysis. Due to space limitations, we omit the proofs of the theorem and the properties here and provide them in \cite{ansary:analysis}.
67: 
68: 
69: \begin{definition} Let $Int(x)$ be the number of intervals of length $x$, i.e. the number of pairs of consecutive nodes which are separated by a distance of $x$ keys on the ring. 
70: %If two nodes immediately follow each other on the ring, the distance between them is equal to $1$.
71: \end{definition}
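
As a small illustration of this definition (a toy ring chosen here, not taken from the simulations), let ${\cal K}=16$ with $N=4$ nodes at keys $1$, $4$, $5$ and $12$. Going around the ring, the inter-node distances are $3$, $1$, $7$ and $5$ (the last one wraps around from $12$ to $1$), so that
\[
Int(1)=Int(3)=Int(5)=Int(7)=1, \qquad \sum_x Int(x) = 4 = N, \qquad \sum_x x\,Int(x) = 16 = {\cal K}.
\]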
72: 
73: %$N$, the number of peers, is also the total numbers of intervals on the ring.
74: 
75: %\end{multicols}
76: 
77: \begin{figure*}
78: 	\centering
79: 		\includegraphics[height=9cm, angle=270]{wdboth-sep}
80: 		\includegraphics[height=9cm, angle=270]{i-sep}
81: %		\includegraphics[height=8cm, angle=270]{f}
82: %		\includegraphics[height=8cm, angle=270]{l}
83: 
84: 		%\begin{table}[t]
85: 	   %\centering
86: 	\caption{Theory and Simulation for $w_1(r,\alpha)$, $d_1(r,\alpha)$, $I(r,\alpha)$}
87: 	\label{fig:wi}
88: \end{figure*}
89: 
90: %\begin{multicols}{2}
91: 
92: \begin{theorem} For a process in which nodes join 
or leave with equal rates (and the number of nodes in the network is almost constant), independently of each other and uniformly on the ring,
the probability ($P(x) \equiv \frac{Int(x)}{N}$) of finding an interval of length $x$ is:

$P(x) = \rho^{x-1}(1-\rho)$ where $\rho = \frac{{\cal K}-N}{\cal K}$ and $1-\rho=\frac{N}{\cal K}$.
97: \end{theorem}
98: The derivation of the distribution $P(x)$ is independent of any details of the Chord implementation and depends solely on the join and leave process. It is hence applicable to any DHT that deploys a ring.
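
As a sanity check on this form, note that $P(x)$ is a geometric distribution, so it is normalized and its mean equals the average inter-node distance:
\[
\sum_{x\geq 1} \rho^{x-1}(1-\rho) = 1, \qquad
\sum_{x\geq 1} x\,\rho^{x-1}(1-\rho) = \frac{1}{1-\rho} = \frac{\cal K}{N}.
\]
Here the sums are extended to infinity, an approximation that we assume is harmless whenever $\rho^{\cal K}$ is negligible. For the simulated values $N=1000$ and ${\cal K}=2^{20}$ this gives $1-\rho \approx 9.54\times 10^{-4}$ and a mean distance of about $1049$ keys.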
99: %Fig {} shows the comparisn of theory and simulations. The slight 
100: %deviations from the theory are due in part to the fact that the number
101: %of nodes is actually a widely fluctuating quantity 
102: %under our implementation of churn(see Section ..).
103: 
104: 
105: %\begin{definition}
106: %$\tilde{P}(x) \equiv \frac{2I(x)}{N}$ is the probability of picking an interval of length $x$ if \emph{nodes} are picked randomly.
107: %\end{definition}
108: 
109: %By definition $\sum{P(x)}=1$ and  $\sum{x~P(x)}={\cal K}/N$. For the total number of peers, the equation for the mean %number of peers is simply $\frac{d\left\langle N \right\rangle}{dt}=\lambda_j-\lambda_f=0$. 
110: %The variance can grow with time even if the rates are equal. {\bf [That is elaborated on in section foo.]}
111: 
112: 
113: %We now write an equation for $\avg{I(x)}$ by considering all the processes which lead to its gain or loss.
114: %We will use $I_x$ and $\avg{I(x)}$ interchangeably to denote the mean number of intervals of size $x$
115: %averaged over many ring configurations.
116: %
117: %A micro instant of time $\Delta t$ is a small interval of time when only one event occurs. This event could be a join, failure  or stabilization event. We only need to consider join and failure events for this computation, since stabilization events do not change the inter-node distances. The quantity $I_x$ is a fluctuating quantity which can either increase or decrease as a join or failure happens. Table \ref{tab:rates} lists the changes that can occur in $I_x$ in an interval of time $\Delta t$ along with their rates.
118: %\begin{table}
119: %	\centering
120: %		\begin{tabular}{|l|l|} \hline
121: %		Change in $I_x$	&  Rate of Change   \\ %\hline 
122: %		$I_x(t+\Delta t) = I_x(t)-1$ & $c_1=(\lambda_f \Delta t) \tilde{P}(x)$
123: %		\\ %\hline
124: %		$I_x(t+\Delta t) = I_x(t)-1$ & $c_2=\frac{(N \lambda_j \Delta t)}{{\cal K}-N}(x-1) P(x)$ \\ %\hline
125: %		$I_x(t+\Delta t) = I_x(t)+1$ & $c_3=\frac{\tilde{P}(x_1)}{N} (\lambda_f \Delta t) P(x-x_1)$ \\ 
126: %  															 & where $1 \leq x_1 \leq x-1$ \\ %\hline
127: %		$I_x(t+\Delta t) = I_x(t)+1$ & $c_4=(\lambda_j \Delta t) \frac{2}{{\cal K}-N} \sum_{x1>x} P(x_1)$\\ %\hline
128: %		$I_x(t+\Delta t) = I_x(t)$ & $1 - (c_1 + c_2 + c_3 + c_4)$\\ \hline
129: %		\end{tabular}
130: %\caption{Changes and their rate for $I(x)$ the number of intervals of length $x$.}
131: %\label{tab:rates}
132: %\end{table}
133: %
134: %First, a failure of either of the boundary nodes of an interval of size $x$ leads to its loss
135: %at rate $c_1$.
136: %Second, An interval of size $x$ can be lost at rate $c_2$ if a joining node splits it. The join can be
137: %initialized by any one of the $N$ nodes in the system, hence the factor of $N$ multiplying $\lambda_s$.
138: %Third, the number of intervals of size $x$ can increase by $1$ at rate $c_3$ if a failure of a boundary node results
139: %in the aggregation of two adjacent intervals. Fourth, an increase can happen at rate $c_4$ if a join event splits a larger
140: %interval into an interval of size $x$. Finally, $I_x$ remains the same if none of the above happens. Therefore
141: %the equation for $I_x$ is:
142: %
143: %\begin{equation}
144: %\label{eqn:i}
145: %\begin{split}
146: %\frac{d I_x}{dt} = &- P(x) \left[ 2\lambda_f + \frac{N\lambda_j(x-1)}{{\cal K}-N} \right] \\
147: %		& + \lambda_f \sum_{x_1=1}^{x-1} P(x)P(x-x_1)  \\
148: %		&+ 2\lambda_j \frac{N}{{\cal K}-N} \sum_{x_1>x} P(x_1) , x \geq 1      \\
149: %\end{split}
150: %\end{equation}
151: %
152: %We can check that :
153: %\begin{equation}
154: %\frac{d}{dt}\sum I_x = \frac{dN}{dt} = \lambda_j - \lambda_f \nonumber
155: %\end{equation}
156: %
157: %Further we can check that :
158: %\begin{equation}
159: %\frac{d}{dt}\sum xI_x = \frac{d{\cal K}}{dt} = 0    \nonumber
160: %\end{equation}
161: 
162: %The set of equations \ref{eqn:i} can be solved leading to the solution:
163: 
164: %\begin{equation}
165: %\label{eqn:ii}
166: %P(x) = \rho^{x-1}(1-\rho)
167: %\end{equation}
168: %where $\rho = \frac{{\cal K}-2N}{{\cal K}-N}$ and $1-\rho=\frac{N}{{\cal K}-N}$ 
169: 
170: %We write and equation for  $\left\langle N$
171: 
172: %We now derive some properties of this distribution which will be used in the ensuing analysis.
173: 
174: 
175: \begin{property}
For any two keys $u$ and $v$, where $v=u+x$, let $b_i$ be the probability
that the first node encountered between these two keys is at $u+i$ (where $0 \leq i \leq x-1$).
Then $b_i \equiv {\rho^{i}(1-\rho)}$.
The probability that there is at least one node between $u$ and $v$ is $a(x)\equiv {1-\rho^x}$.
Hence the conditional probability that the first node is at a distance $i$ {\it given} that
there is at least one node in the interval is $bc(i,x)\equiv b_i/a(x)$.
182: 
183: %
184: %
185: %
186: %The probability that there is definitely atleast one node between any two keys 
187: %a distance $x$ apart is: $a(x)\equiv {1-\rho^x}$. 
188: %%(left included, right excluded)
189: %The probability that the first node is at a distance $x$ from the beginning of the interval is:
190: %$b(x) \equiv {\rho^{x-1}(1-\rho)}$. Hence the conditional
191: %probability that the first node is at a distance $x$ {\it given} that
192: %there is atleast one node in the interval is $ bc(x)\equiv b(x)/a(x)$
193: 
194: \end{property}
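
A short consistency check, written out as a sketch: summing $b_i$ over the $x$ possible positions $i=0,\ldots,x-1$ gives
\[
\sum_{i=0}^{x-1} \rho^{i}(1-\rho) = 1-\rho^{x} = a(x), \qquad\mbox{so that}\qquad \sum_{i=0}^{x-1} bc(i,x) = 1 .
\]
For $x \ll {\cal K}/N$ we have $\rho^{i} \approx 1$ and $1-\rho^{x} \approx x(1-\rho)$, so $bc(i,x) \approx 1/x$, i.e. the position of the first node is then almost uniform over the interval.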
195: 
196: 
197: \begin{property}
198: \label{prop:share}
The probability that a node and at least one of its immediate predecessors
share the same $k^{th}$ finger
is $p_1(k)\equiv \frac{\rho}{1+\rho} (1-\rho^{2^k-2})$. This is $\sim 1/2$ for
${\cal K} \gg 1$ and $N \ll {\cal K}$. Clearly $p_1=0$ for $k=1$.
It is straightforward (though tedious) to
derive similar expressions for $p_2(k)$, the probability that a node and at least {\it two} of its immediate predecessors share the same $k^{th}$ finger,
$p_3(k)$ and so on.
206: \end{property}
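
One way to see where the form of $p_1(k)$ can come from is sketched here; the truncation of the sum is an assumption made for this sketch and is not a restatement of the proof in \cite{ansary:analysis}. If the predecessor lies at distance $x$ (probability $\rho^{x-1}(1-\rho)$), the two $k^{th}$ finger starts are also $x$ keys apart, and the two nodes share the finger when no node lies between these starts (probability $\rho^{x}$). Assuming that $x$ runs up to $X = 2^{k-1}-1$,
\[
\sum_{x=1}^{X} \rho^{x-1}(1-\rho)\,\rho^{x}
= \frac{1-\rho}{\rho}\sum_{x=1}^{X}\rho^{2x}
= \frac{\rho}{1+\rho}\left(1-\rho^{2X}\right),
\]
which agrees with the stated expression since $2X = 2^{k}-2$. For $k=1$ the sum is empty and gives $0$. For $N=1000$ and ${\cal K}=2^{20}$ the prefactor is $\frac{\rho}{1+\rho} \approx 0.4998$.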
207: 
208: \begin{property}
209: \label{prop:copy}
We can similarly assess the probability that the join protocol (see previous section)
results in further replication of the $k^{th}$ pointer. That is, the probability that a newly joined node will choose the $k^{th}$ entry of its successor's finger table
as its own $k^{th}$ entry is
$p_{\mathrm{join}}(k) \sim \rho (1-\rho^{2^{k-2} -2}) + (1-\rho) (1-\rho^{2^{k-2}-2}) -(1-\rho) \rho (2^{k-2} -2) \rho^{2^{k-2}-3}$.
The function $p_{\mathrm{join}}(k)$ is $0$ for small $k$ and $1$ for large $k$.
215: \end{property}
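
The two limits of $p_{\mathrm{join}}(k)$ can be read off after a short simplification. Writing $m \equiv 2^{k-2}-2$ (a shorthand introduced only for this remark), the first two terms combine to $1-\rho^{m}$ and the last term equals $(1-\rho)\,m\,\rho^{m}$, so
\[
p_{\mathrm{join}}(k) \sim \left(1-\rho^{m}\right) - (1-\rho)\,m\,\rho^{m}
= 1 - \rho^{m}\left[1 + m(1-\rho)\right].
\]
This vanishes at $m=0$ (i.e. $k=3$) and tends to $1$ once $m \gg {\cal K}/N$, in line with the limits stated in the property.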
216: %\Proof
217: %If the distance between node $n$ and its predecessor $p$ is $x$, the distance between 
218: %$n.f_k$.\emph{start} and $p.f_k$.\emph{start} is also $x$. If there is no node
219: %between $n.f_k$.\emph{start} and $p.f_k$.\emph{start} then they will share
220: %the same value for that $k^th$ finger. The probability that the distance between $n$ and $p$ is $x$ is $\rho^{x-1}(1-\rho)$ 
221: %as derived in equation \ref{eqn:ii}. The probability that no node exists between $n.f_k$.\emph{start} and $p.f_k$.\emph{start}
222: %is $\rho^x$. The probability that the $n.f_k$.\emph{start} and $p.f_k$.\emph{start} share the same successor is:
223: %\begin{equation}
224: %\label{eqn:iii}
225: %\begin{split}
226: %\sum_{x=1..{\cal K}} \rho^{x-1}(1-\rho)\rho^x  
227: %&= \frac {1-\rho}{\rho}\frac{\rho^2}{1-\rho^2} = \frac {\rho}{1+\rho}												    \\
228: %&= \left[\frac{{\cal K}-2N}{{\cal K}-N}\right]\left[\frac{{\cal K}-2N}{2{\cal K}-3N}\right] \approx 0.5
229: %\end{split}
230: %\end{equation}
231: %\qedSquare
232: %
233: %
234: %Let the probability that in an interval of length $x$, there is at least one node be denoted by $P_{>1}(x)$.
235: %$P_{>1}(x) = 1- \rho^x$ where $\rho^x$ is the probability that there is no node inside the interval.
236: 
237: 
238: %The sum $\sum_{y=1}^{x} \frac{\rho^{y-1}(1-\rho)}{1-\rho^x} =1$ since we are given that there 
239: %definitely is a node inside the interval.
240: %For $x << \frac{{\cal K}}{N}$ (the average inter-node distance), this prob is almost uniform ($\approx \frac{1}{l}$).
241: %For $x >> \frac{{\cal K}}{N}$ the probability becomes vanishingly small beyond $y \approx \frac{2{\cal K}}{N}$. 
242: