0501:cs0501069/analysis-internode.tex

1: %\vspace*{-0.5cm}

2: \section{Assumptions \& Definitions}

3: %\vspace*{-0.25cm}

4: \label{sec:assum}

5: {\bf Basic Notation.} In what follows, we assume that the reader is

6: familiar with Chord; for completeness, we introduce the notation used below. We use

7: ${\cal K}$ to mean the size of the Chord key space and $N$ the number

8: of nodes. Let ${\cal M} = \log_2{\cal K}$ be the number of fingers of

9: a node and ${\cal S}$ the length of the immediate successor list,

10: usually set to $O(\log N)$. We refer to nodes by their

11: keys, so a node $n$ denotes the node with key $n \in \{0, \ldots, {\cal K}-1\}$.

12: We use $p$ to refer to the predecessor, $s$ to refer to the successor list as a whole,

13: and $s_i$ for the $i^{th}$ successor.  Data

14: structures of different nodes are distinguished by prefixing them with

15: a node key e.g. $n'.s_1$, etc. Let \emph{$fin_i$.start} denote

16: the start of the $i^{th}$ finger (where, for a node $n$, $\forall i \in 1..{\cal M}$,

17: $n.fin_i.start = n + 2^{i-1}$) and \emph{$fin_i$.node}

18: denote the actual node pointed to by that finger.
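
As a small illustration of the finger-start formula, consider the following sketch. The values are hypothetical, and we assume, as in standard Chord, that key arithmetic is taken modulo ${\cal K}$, which the formula above leaves implicit. Take ${\cal K}=2^4=16$, so that ${\cal M}=4$, and $n=12$:
\[
n.fin_1.start = 13,\quad
n.fin_2.start = 14,\quad
n.fin_3.start = 16 \bmod 16 = 0,\quad
n.fin_4.start = 20 \bmod 16 = 4 .
\]
Under that assumption the last two fingers wrap around the ring.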

19: %An unqualified $fin_i$  will mean \emph{$fin_i$.node}.

20:

21: %{\bf Churn.} The continuous process of node joins and failures can be

22: %expressed in a number of different ways.  In \cite{nowell02analysis},

23: %joins are modeled by a single poisson process and failures by an

24: %exponential lifetime process for every node. Joins and failures can

25: %equivalently be described by the median session time as in

26: %\cite{li03comparing, rhea04handling, rowstron04depend}. As we discuss

27: %later, it is of interest to differentiate between a ``per-node'' rate

28: %and a ``per-network'' rate. For example in \cite{nowell02analysis},

29: %the join rate is per network and the failure rate is per node. In our

30: %simulations and analysis we use per-node rates for both joins and

31: %failures.

32:

33: {\bf Steady State Assumption.} $\lambda_j$  is the rate of joins per node, $\lambda_f$ the rate of failures per node and $\lambda_s$  the rate of stabilizations per node. We carry out our analysis

34: for the general case in which the rate of successor stabilizations, $\alpha\lambda_s$,

35: is not necessarily the same as the rate of finger stabilizations, $(1-\alpha)\lambda_s$.

36: In all that follows, we impose the steady state condition

37: $\lambda_j=\lambda_f$. Further, it is useful to define $r \equiv \frac{\lambda_s}{\lambda_f}$,

38: which is the relevant ratio on which all the quantities we are interested in will depend.

39: For example, $r=50$ means that a join/fail event takes place every

40: half an hour while a stabilization takes place once every $36$ seconds.
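
The arithmetic behind this example can be checked directly. With one stabilization every $36$ seconds and one join/fail event every $1800$ seconds,
\[
r = \frac{\lambda_s}{\lambda_f} = \frac{1/36}{1/1800} = \frac{1800}{36} = 50 .
\]
As a further illustration, with the value $\alpha = 0.25$ chosen only as an example, the same $\lambda_s$ splits into successor stabilizations at rate $\alpha\lambda_s$ (one every $36/0.25 = 144$ seconds) and finger stabilizations at rate $(1-\alpha)\lambda_s$ (one every $36/0.75 = 48$ seconds).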

41:

42: %{\bf Communication and Failure Model.} We assume a fail-stop model and reliable communication. More importantly, we make the simplifying assumption that communication delays due to a limited number of hops is much smaller than the average time interval between joins, failures or stabilization events. However, we do not expect that the results will change much even if this were not satisfied.

43:

44: {\bf Parameters.} The parameters of the problem are hence: ${\cal K}$, $N$, $\alpha$ and $r$.

45: All relevant measurable quantities should be entirely expressible in terms of these parameters.

46:

47: {\bf Chord Simulation.} We use our own discrete event simulation environment implemented in Java, which can be retrieved from \cite{ansary:analysis}. Although we assume that the reader is familiar with Chord, an exact analysis requires a few details. Successor stabilizations performed by a node $n$ on $n.s_1$ accomplish two main goals: $i)$ retrieving the predecessor and successor list of $n.s_1$ and reconciling them with $n$'s state; $ii)$ informing $n.s_1$ that $n$ is alive/newly joined. A finger stabilization picks one finger at random and looks up its start. Lookups do not use the optimization of checking the successor list before using the fingers.

48: However, the successor list is used as a last resort if the fingers cannot provide progress. Lookups are assumed not to

49: change the state of a node. For joins, a new node $u$ finds its successor $v$ through some initial random contact and performs successor stabilization on that successor. All fingers of $u$ that have $v$ as an acceptable finger node are set to $v$. The rest of the fingers are computed as best estimates from $v$'s routing table. All failures are ungraceful. We make the simplifying assumption that communication delays due to a limited number of hops are much smaller than the average time interval between joins, failures or stabilization events. However, we do not expect the results to change much even if this assumption were not satisfied.

50:

51: %{\bf Wrong pointers and lookups.} The churned network is always compared against an artificially-optimal network

52: %constructed from the alive nodes to determine how outdated every node is and

53: %whether the answer obtained from a lookup is the correct answer.

54:

55: {\bf Averaging.} Since we are collecting statistics such as the probability that a particular finger pointer is wrong, we need to repeat each experiment $100$ times to obtain well-averaged results.

56: The total sequential simulation time for obtaining the results of this paper was about $1800$ hours, parallelized on a cluster of $14$ nodes, with $N=1000$, ${\cal K}=2^{20}$, ${\cal S}=6$, $200 \leq r \leq 2000$

57: and $0.25 \leq \alpha \leq 0.75$.

58:

59: \section{The Analysis}

60: \vspace*{-0.25cm}

61: \subsection{Distribution of Inter-Node Distances}

62: \vspace*{-0.25cm}

63: During churn, the inter-node distance (the difference between the keys of two consecutive nodes) is a fluctuating variable. An important quantity used throughout the analysis is the

64: pdf of inter-node distances. We define this quantity below and state a theorem giving its

65: functional form. We then mention three properties of this distribution

66: which are needed in the ensuing analysis. Due to space limitations, we omit the proofs of the theorem and of the properties here and provide them in \cite{ansary:analysis}.

67:

68:

69: \begin{definition} Let $Int(x)$ be the number of intervals of length $x$, i.e. the number of pairs of consecutive nodes which are separated by a distance of $x$ keys on the ring.

70: %If two nodes immediately follow each other on the ring, the distance between them is equal to $1$.

71: \end{definition}

72:

73: %$N$, the number of peers, is also the total numbers of intervals on the ring.

74:

75: %\end{multicols}

76:

77: \begin{figure*}

78: 	\centering

79: 		\includegraphics[height=9cm, angle=270]{wdboth-sep}

80: 		\includegraphics[height=9cm, angle=270]{i-sep}

81: %		\includegraphics[height=8cm, angle=270]{f}

82: %		\includegraphics[height=8cm, angle=270]{l}

83:

84: 		%\begin{table}[t]

85: 	   %\centering

86: 	\caption{Theory and Simulation for $w_1(r,\alpha)$, $d_1(r,\alpha)$, $I(r,\alpha)$}

87: 	\label{fig:wi}

88: \end{figure*}

89:

90: %\begin{multicols}{2}

91:

92: \begin{theorem} For a process in which nodes join

93: or leave with equal rates (and the number of nodes in the network is almost constant) independently of each other and uniformly on the ring,

94: the probability $P(x) \equiv \frac{Int(x)}{N}$ of finding an interval of length $x$ is:

95:

96: $P(x) = \rho^{x-1}(1-\rho)$ where $\rho = \frac{{\cal K}-N}{\cal K}$ and $1-\rho=\frac{N}{\cal K}$

97: \end{theorem}

98: The derivation of the distribution $P(x)$ is independent of any details of the Chord implementation and depends solely on the join and leave process. It is hence applicable to any DHT that deploys a ring.
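
As a sanity check on this form (a sketch in which the sums are extended to infinity, an approximation that assumes $\rho^{\cal K}$ is negligible), the distribution is normalized and its mean is the average inter-node distance:
\[
\sum_{x \geq 1} \rho^{x-1}(1-\rho) = \frac{1-\rho}{1-\rho} = 1,
\qquad
\sum_{x \geq 1} x\,\rho^{x-1}(1-\rho) = \frac{1-\rho}{(1-\rho)^2} = \frac{1}{1-\rho} = \frac{\cal K}{N} .
\]
For instance, with $N=1000$ and ${\cal K}=2^{20}$ one has $1-\rho = 1000/2^{20} \approx 9.54 \times 10^{-4}$ and a mean inter-node distance of $2^{20}/1000 \approx 1048.6$ keys.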

99: %Fig {} shows the comparisn of theory and simulations. The slight

100: %deviations from the theory are due in part to the fact that the number

101: %of nodes is actually a widely fluctuating quantity

102: %under our implementation of churn(see Section ..).

103:

104:

105: %\begin{definition}

106: %$\tilde{P}(x) \equiv \frac{2I(x)}{N}$ is the probability of picking an interval of length $x$ if \emph{nodes} are picked randomly.

107: %\end{definition}

108:

109: %By definition $\sum{P(x)}=1$ and  $\sum{x~P(x)}={\cal K}/N$. For the total number of peers, the equation for the mean %number of peers is simply $\frac{d\left\langle N \right\rangle}{dt}=\lambda_j-\lambda_f=0$.

110: %The variance can grow with time even if the rates are equal. {\bf [That is elaborated on in section foo.]}

111:

112:

113: %We now write an equation for $\avg{I(x)}$ by considering all the processes which lead to its gain or loss.

114: %We will use $I_x$ and $\avg{I(x)}$ interchangeably to denote the mean number of intervals of size $x$

115: %averaged over many ring configurations.

116: %

117: %A micro instant of time $\Delta t$ is a small interval of time when only one event occurs. This event could be a join, failure  or stabilization event. We only need to consider join and failure events for this computation, since stabilization events do not change the inter-node distances. The quantity $I_x$ is a fluctuating quantity which can either increase or decrease as a join or failure happens. Table \ref{tab:rates} lists the changes that can occur in $I_x$ in an interval of time $\Delta t$ along with their rates.

118: %\begin{table}

119: %	\centering

120: %		\begin{tabular}{|l|l|} \hline

121: %		Change in $I_x$	&  Rate of Change   \\ %\hline

122: %		$I_x(t+\Delta t) = I_x(t)-1$ & $c_1=(\lambda_f \Delta t) \tilde{P}(x)$

123: %		\\ %\hline

124: %		$I_x(t+\Delta t) = I_x(t)-1$ & $c_2=\frac{(N \lambda_j \Delta t)}{{\cal K}-N}(x-1) P(x)$ \\ %\hline

125: %		$I_x(t+\Delta t) = I_x(t)+1$ & $c_3=\frac{\tilde{P}(x_1)}{N} (\lambda_f \Delta t) P(x-x_1)$ \\

126: %  															 & where $1 \leq x_1 \leq x-1$ \\ %\hline

127: %		$I_x(t+\Delta t) = I_x(t)+1$ & $c_4=(\lambda_j \Delta t) \frac{2}{{\cal K}-N} \sum_{x1>x} P(x_1)$\\ %\hline

128: %		$I_x(t+\Delta t) = I_x(t)$ & $1 - (c_1 + c_2 + c_3 + c_4)$\\ \hline

129: %		\end{tabular}

130: %\caption{Changes and their rate for $I(x)$ the number of intervals of length $x$.}

131: %\label{tab:rates}

132: %\end{table}

133: %

134: %First, a failure of either of the boundary nodes of an interval of size $x$ leads to its loss

135: %at rate $c_1$.

136: %Second, An interval of size $x$ can be lost at rate $c_2$ if a joining node splits it. The join can be

137: %initialized by any one of the $N$ nodes in the system, hence the factor of $N$ multiplying $\lambda_s$.

138: %Third, the number of intervals of size $x$ can increase by $1$ at rate $c_3$ if a failure of a boundary node results

139: %in the aggregation of two adjacent intervals. Fourth, an increase can happen at rate $c_4$ if a join event splits a larger

140: %interval into an interval of size $x$. Finally, $I_x$ remains the same if none of the above happens. Therefore

141: %the equation for $I_x$ is:

142: %

143: %\begin{equation}

144: %\label{eqn:i}

145: %\begin{split}

146: %\frac{d I_x}{dt} = &- P(x) \left[ 2\lambda_f + \frac{N\lambda_j(x-1)}{{\cal K}-N} \right] \\

147: %		& + \lambda_f \sum_{x_1=1}^{x-1} P(x)P(x-x_1)  \\

148: %		&+ 2\lambda_j \frac{N}{{\cal K}-N} \sum_{x_1>x} P(x_1) , x \geq 1      \\

149: %\end{split}

150: %\end{equation}

151: %

152: %We can check that :

153: %\begin{equation}

154: %\frac{d}{dt}\sum I_x = \frac{dN}{dt} = \lambda_j - \lambda_f \nonumber

155: %\end{equation}

156: %

157: %Further we can check that :

158: %\begin{equation}

159: %\frac{d}{dt}\sum xI_x = \frac{d{\cal K}}{dt} = 0    \nonumber

160: %\end{equation}

161:

162: %The set of equations \ref{eqn:i} can be solved leading to the solution:

163:

164: %\begin{equation}

165: %\label{eqn:ii}

166: %P(x) = \rho^{x-1}(1-\rho)

167: %\end{equation}

168: %where $\rho = \frac{{\cal K}-2N}{{\cal K}-N}$ and $1-\rho=\frac{N}{{\cal K}-N}$

169:

170: %We write and equation for  $\left\langle N$

171:

172: %We now derive some properties of this distribution which will be used in the ensuing analysis.

173:

174:

175: \begin{property}

176: For any two keys $u$ and $v$, where $v=u+x$, let $b_i$ be the probability

177: that the first node encountered in between these two keys is at $u+i$ (where $0 \leq i < x-1$).

178: Then $b_i \equiv \rho^{i}(1-\rho)$.

179: The probability that there is at least one node between $u$ and $v$ is $a(x)\equiv 1-\rho^x$.

180: Hence the conditional probability that the first node is at a distance $i$ {\it given} that

181: there is at least one node in the interval is $bc(i,x)\equiv b_i/a(x)$.

182:

183: %

184: %

185: %

186: %The probability that there is definitely atleast one node between any two keys

187: %a distance $x$ apart is: $a(x)\equiv {1-\rho^x}$.

188: %%(left included, right excluded)

189: %The probability that the first node is at a distance $x$ from the beginning of the interval is:

190: %$b(x) \equiv {\rho^{x-1}(1-\rho)}$. Hence the conditional

191: %probability that the first node is at a distance $x$ {\it given} that

192: %there is atleast one node in the interval is $ bc(x)\equiv b(x)/a(x)$

193:

194: \end{property}
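
To get a feeling for this conditional probability, the following sketch considers intervals much shorter than the average inter-node distance, that is, $x(1-\rho) \ll 1$; this regime is an illustrative assumption. There $\rho^{i} \approx 1$ and $1-\rho^{x} \approx x(1-\rho)$, so that
\[
bc(i,x) = \frac{\rho^{i}(1-\rho)}{1-\rho^{x}} \approx \frac{1-\rho}{x(1-\rho)} = \frac{1}{x} ,
\]
i.e. the position of the first node is almost uniform over the interval. For example, with $1-\rho = 1000/2^{20}$ and $x=10$, $bc(0,10) \approx 0.100$, close to $1/x = 0.1$.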

195:

196:

197: \begin{property}

198: \label{prop:share}

199: The probability that a node and at least one of its immediate predecessors

200: share the same $k^{th}$ finger

201: is $p_1(k)\equiv \frac{\rho}{1+\rho} (1-\rho^{2^k-2})$. This is $\sim 1/2$ for

202: ${\cal K} \gg 1$ and $N \ll {\cal K}$. Clearly $p_1=0$ for $k=1$.

203: It is straightforward (though tedious) to

204: derive similar expressions for $p_2(k)$, the probability that a node and at least {\it two} of its immediate predecessors share the same $k^{th}$ finger,

205: $p_3(k)$ and so on.

206: \end{property}
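
The two limits quoted in Property \ref{prop:share} can be checked numerically. The following is a sketch using the illustrative values $N=1000$ and ${\cal K}=2^{20}$, so that $\rho = 1 - 1000/2^{20} \approx 0.99905$:
\[
\frac{\rho}{1+\rho} \approx \frac{0.99905}{1.99905} \approx 0.4998,
\qquad
p_1(1) = \frac{\rho}{1+\rho}\left(1-\rho^{2^1-2}\right) = \frac{\rho}{1+\rho}\left(1-\rho^{0}\right) = 0 .
\]
The second factor, $1-\rho^{2^k-2}$, approaches $1$ only when $2^k \gg {\cal K}/N$. For these values $p_1(20)$ is indistinguishable from $0.4998$, whereas for a short finger such as $k=5$ one finds $1-\rho^{30} \approx 0.028$ and hence $p_1(5) \approx 0.014$.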

207:

208: \begin{property}

209: \label{prop:copy}

210: We can similarly assess the probability that the join protocol (see previous section)

211: results in further replication of the $k^{th}$ pointer. That is, the probability that a newly joined node will choose the $k^{th}$  entry of its successor's finger table

212: as its own $k^{th}$ entry is

213: $p_{\mathrm{join}}(k) \sim \rho (1-\rho^{2^{k-2} -2}) + (1-\rho) (1-\rho^{2^{k-2}-2}) -(1-\rho) \rho (2^{k-2} -2) \rho^{2^{k-2}-3} $.

214: The function $p_{\mathrm{join}}(k)=0$ for small $k$ and $1$ for large $k$.

215: \end{property}
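
The limiting behaviour stated in Property \ref{prop:copy} is easier to see after collecting terms. As a sketch, introduce the shorthand $m \equiv 2^{k-2}-2$ (our notation, used only here) and consider $k \geq 3$, so that $m \geq 0$:
\[
p_{\mathrm{join}}(k) \sim \rho\,(1-\rho^{m}) + (1-\rho)(1-\rho^{m}) - (1-\rho)\,\rho\, m\, \rho^{m-1}
= 1 - \rho^{m}\left[\,1 + m(1-\rho)\,\right] .
\]
For $k=3$ we have $m=0$ and the expression vanishes, while for large $k$ both $\rho^{m}$ and $m\rho^{m}$ tend to $0$, so that the expression tends to $1$.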

216: %\Proof

217: %If the distance between node $n$ and its predecessor $p$ is $x$, the distance between

218: %$n.f_k$.\emph{start} and $p.f_k$.\emph{start} is also $x$. If there is no node

219: %between $n.f_k$.\emph{start} and $p.f_k$.\emph{start} then they will share

220: %the same value for that $k^th$ finger. The probability that the distance between $n$ and $p$ is $x$ is $\rho^{x-1}(1-\rho)$

221: %as derived in equation \ref{eqn:ii}. The probability that no node exists between $n.f_k$.\emph{start} and $p.f_k$.\emph{start}

222: %is $\rho^x$. The probability that the $n.f_k$.\emph{start} and $p.f_k$.\emph{start} share the same successor is:

223: %\begin{equation}

224: %\label{eqn:iii}

225: %\begin{split}

226: %\sum_{x=1..{\cal K}} \rho^{x-1}(1-\rho)\rho^x

227: %&= \frac {1-\rho}{\rho}\frac{\rho^2}{1-\rho^2} = \frac {\rho}{1+\rho}												    \\

228: %&= \left[\frac{{\cal K}-2N}{{\cal K}-N}\right]\left[\frac{{\cal K}-2N}{2{\cal K}-3N}\right] \approx 0.5

229: %\end{split}

230: %\end{equation}

231: %\qedSquare

232: %

233: %

234: %Let the probability that in an interval of length $x$, there is at least one node be denoted by $P_{>1}(x)$.

235: %$P_{>1}(x) = 1- \rho^x$ where $\rho^x$ is the probability that there is no node inside the interval.

236:

237:

238: %The sum $\sum_{y=1}^{x} \frac{\rho^{y-1}(1-\rho)}{1-\rho^x} =1$ since we are given that there

239: %definitely is a node inside the interval.

240: %For $x << \frac{{\cal K}}{N}$ (the average inter-node distance), this prob is almost uniform ($\approx \frac{1}{l}$).

241: %For $x >> \frac{{\cal K}}{N}$ the probability becomes vanishingly small beyond $y \approx \frac{2{\cal K}}{N}$.

242: