0611:cs0611098/pr2.tex

\section{One-one correspondences between combinatorial structures}

We first define a path reversal transformation in $T_{n}$ and its {\em cost}
(see \cite{gi}). We then point out some one-to-one correspondences between
combinatorial objects and structures that are relevant to the problem of
computing the average cost of path reversal. These one-to-one correspondences
are used in Section~3 to compute the expected cost and its variance by means
of the corresponding probability generating functions.

\subsection{Path reversal}
Let $T_{n}$ be a rooted $n$-node tree, or an ordered tree with $n$ nodes,
as defined in either \cite{gi} or \cite[page 306]{kn}. A {\em path reversal}
at a node $x$ in $T_{n}$ is performed by traversing the path from $x$ to the
tree root $r$ and making $x$ the parent (or pointer {\em Last}) of each node
on the path other than $x$. Thus $x$ becomes the new tree root.
The {\em cost} of the reversal is the number of edges on the reversed path.
Path reversal is a variant of the standard path compression
algorithm for maintaining disjoint sets under union.
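As an illustration only, the following Python sketch performs a path reversal on a tree stored as a parent map (a representation assumed here, not taken from \cite{gi}; the function name is hypothetical) and returns its cost, i.e. the number of edges on the reversed path.

```python
def path_reversal(parent, x):
    """Reverse the path from node x to the root of a parent-pointer tree.

    `parent` maps every node to its parent, the root being mapped to None.
    Each node on the path other than x is made a child of x, and x becomes
    the new root. The return value is the cost of the reversal, i.e. the
    number of edges on the reversed path.
    """
    # Collect the proper ancestors of x, from its parent up to the old root.
    path = []
    y = parent[x]
    while y is not None:
        path.append(y)
        y = parent[y]
    # Redirect every ancestor to x (the pointer Last in the text), then make x the root.
    for y in path:
        parent[y] = x
    parent[x] = None
    return len(path)


# Initial tree T_5: a root 1 with n - 1 = 4 children.
tree = {1: None, 2: 1, 3: 1, 4: 1, 5: 1}
print(path_reversal(tree, 3))  # 1: node 3 becomes the root
print(path_reversal(tree, 4))  # 2: the reversed path is 4 -> 1 -> 3
```

Starting from a root with $n-1$ children, a first reversal always costs 1; later reversals may traverse longer paths, which is why the average cost over a sequence of requests is the quantity of interest.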

\begin{figure}[t]
    \centering
    \includegraphics[width=0.9\textwidth]{pr.eps}
    \caption{Path reversal $\varphi_{x_{0}}$.
    The $T_{i}$'s denote the (left/right) subtrees of $T_{n}$.}
\end{figure}

The average cost of a path reversal performed on an initial ordered $n$-node
tree $T_{n}$ consisting of a root with $n - 1$ descendants (or
{\em children}, as in \cite{gi}) is the expected number of edges on the paths
reversed in $T_{n}$ (see Figure~1).
In other words, it is the {\em expected height of the reversed trees}
$\varphi(T_{n})$, provided that the height of a tree root is taken to be 1:
{\em viz.} the {\em height} of a node $x$ in $T_{n}$ is defined as the
number of nodes on the path from $x$ to the root $r$ of $T_{n}$.
\par

It turns out that the average number of messages used in $\cal A$ is
precisely the expected cost of a path reversal performed on such initial
ordered $n$-node trees $T_{n}$ consisting of a root with $n - 1$ children.
Indeed, this is the average number of changes of the variable {\em Last},
which builds the dynamic data structure of path reversal used in
algorithm $\cal A$.

\subsection{Priority queues, tournament trees and permutations}
Whenever two finite combinatorial structures are counted by the same number,
there exist one-one mappings between them.
Explicit one-to-one correspondences between combinatorial representations
provide coding and decoding algorithms between the structures. We now need
the following definitions of some combinatorial structures which are closely
connected with path reversal and involved in the computation of its cost.

\subsubsection{Definitions and notations}
\begin{description}
\item
{\sl (i)} Let $[n]$ be the set $\{1, 2, \ldots , n\}$.
A {\em permutation} is a {\em one-one mapping} $\sigma : [n] \rightarrow
[n]$; we write $\sigma \in S_{n}$, where $S_{n}$ is the symmetric group over
$[n]$.

\item
{\sl (ii)} A {\em binary tournament tree} of size $n$ is a binary $n$-node
tree whose internal nodes are labeled with consecutive integers of $[n]$ in
such a way that the root is labeled 1 and the labels decrease
({\em bottom-up}) along each branch. Let ${\cal T}_{n}$ denote the set of
all binary tournament trees of size $n$.
${\cal T}_{n}$ also denotes the set of tournament representations of all
permutations $\sigma~\in~S_{n}$, considered as elements of $[n]^{n}$,
since the correspondence $\tau : S_{n} \rightarrow {\cal T}_{n}$ is
one-one (see \cite{vu} for a detailed proof). Note that this one-to-one
mapping implies that $\left| {\cal T}_{n} \right| = n!$.

\item
{\sl (iii)} A {\em priority queue} of size $n$ is a set $Q_{n}$ of keys;
each key $K \in Q_{n}$ has an associated priority $p(K)$ which is an
arbitrary integer. To avoid cumbersome
notation, we identify $Q_{n}$ with the set of priorities of its keys.
Strictly speaking, this is a set with repetitions, since priorities need not
all be distinct. However, it is convenient to ignore this technicality
and assume {\em distinct priorities}. The simplest representation of a
priority queue of size $n$ is then a sequence
$s~=~(p_{1},p_{2},~\ldots~,p_{n})$ of the priorities of $Q_{n}$, kept in
their order of arrival. Assuming the $n!$ possible orders of arrival of the
$p_{i}$'s to be equally likely, a priority queue $Q_{n}$ ({\em i.e.} a
sequence $s$ of $p_{i}$'s) is said to be
random {\em iff} it is associated with a random order of the $p_{i}$'s.
There is a one-to-one correspondence between the set ${\cal T}_{n}$ of all
$n$-node binary tournament trees and the set of all priority queues
$Q_{n}$ of size $n$.
To each sequence of priorities $s = (p_{1}, \ldots ,p_{n}) \in Q_{n}$,
we associate a binary tournament tree $\gamma(s) = T \in {\cal T}_{n}$
by the following rules: let {\bf m}$ \; = \; \min(s)$ and write $s =
\ell \; \mbox{\bf m} \; r$; the binary tree $T \in {\cal T}_{n}$ has
{\bf m} as root, $\gamma(\ell)$ as left subtree and $\gamma(r)$ as right
subtree. The rules are applied recursively to all the left and right
subsequences of $s$, from the root of $T$ down to its leaves; by
convention, we let $\gamma(\emptyset) = \Lambda$ (where $\Lambda$
denotes the empty binary tree). The correspondence $\gamma$ is clearly
one-one, since reading $T$ in symmetric order (left subtree, root, right
subtree) gives back $s$ (see \cite{fr} for a fully detailed constructive
proof).
\par
We shall thus use binary tournaments ${\cal T}_{n}$ to represent the
permutations of $S_{n}$ as well as the priority queues $Q_{n}$ of size $n$.

\item
{\sl (iv)} If $T \in {\cal T}_{n}$ is a binary tournament, its
{\em right branch} $RB (T)$ is the increasing sequence of
priorities found on the path starting at the root of $T$ and repeatedly
going to the right subtree. The {\em bottom} of $RB (T)$ is the node
having no right son. The {\em left branch} $LB (T)$ of $T$ is defined
symmetrically.
\end{description}
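A minimal Python sketch of the mapping $\gamma$ follows; the tuple encoding of trees and all function names are our own assumptions. A tree is either \texttt{None} (standing for $\Lambda$) or a triple (left, root, right). The sketch also inverts $\gamma$ by a symmetric-order reading, extracts $RB(T)$, and checks by brute force, for small $n$, that $\gamma$ is one-one on $S_{n}$, in accordance with $\left|{\cal T}_{n}\right| = n!$.

```python
from itertools import permutations
from math import factorial


def tournament(s):
    """gamma(s): binary tournament tree of a sequence s of distinct keys.

    A tree is None (the empty tree Lambda) or a triple (left, root, right).
    The root is m = min(s); writing s = l m r, the left and right subtrees
    are gamma(l) and gamma(r), built recursively.
    """
    if not s:
        return None
    i = s.index(min(s))
    return (tournament(s[:i]), s[i], tournament(s[i + 1:]))


def symmetric_order(t):
    """Inverse of gamma: read left subtree, root, right subtree."""
    if t is None:
        return []
    left, m, right = t
    return symmetric_order(left) + [m] + symmetric_order(right)


def right_branch(t):
    """RB(T): the labels met from the root by repeatedly going right."""
    rb = []
    while t is not None:
        rb.append(t[1])
        t = t[2]
    return rb


def gamma_is_one_one(n):
    """Brute-force check on S_n: n! distinct trees, each inverted by symmetric_order."""
    perms = list(permutations(range(1, n + 1)))
    trees = {tournament(p) for p in perms}
    return len(trees) == factorial(n) and all(
        symmetric_order(tournament(p)) == list(p) for p in perms)
```

For instance, $s = (3, 1, 4, 2)$ yields a tree with root 1, left subtree reduced to 3, and right subtree rooted at 2 with left son 4, so that $RB(T) = (1, 2)$.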

\subsection{The one-one correspondence between \protect\bm{Q_{n}} and
\protect\bm{T_{n}}}
We now give a constructive proof of a {\em one-to-one correspondence} mapping
the given combinatorial structure of ordered trees $T_{n}$ (as defined in
the Introduction) onto the priority queues $Q_{n}$.

\begin{theorem}
There is a one-to-one correspondence between the priority queues $Q_{n}$ of
size $n$ and the ordered $n$-node trees $T_{n}$ which consist of a root with
$n - 1$ children.
\end{theorem}

\begin{proof}
There are many representations of priority queues $Q_{n}$; let us
consider the $n$-node {\em binary heap} structure, which is very simple and
perfectly suited to the constructive proof.
\begin{itemize}
\item First, a {\em binary heap} of size $n$ is an {\em essentially complete
binary tree}. A binary tree is {\em essentially complete} if each of its
internal nodes has exactly two children, with the possible exception
of a unique {\em special} node situated on level $(h - 1)$ (where $h$ denotes
the height of the heap), which may have only a left child and no right child.
Moreover, all the leaves are either on level $h$, or else on levels
$h$ and $(h-1)$, and no leaf is found on level $(h-1)$ to the left of an
internal node at the same level. The unique special node, if it exists, is
to the right of all the other internal nodes of level $(h-1)$.

Besides, each tree node in a binary heap contains one item, with the items
arranged in heap order ({\em i.e.} the priority queue ordering): the key of
the item in a parent node is strictly smaller than the key of the item in
any descendant node.
Thus the root is located at position 1 and contains an item of minimum key.
If we number the nodes of such an essentially complete binary tree from 1 to
$n$ in breadth-first, left-to-right order and identify nodes with numbers,
the parent of the node located at position $x > 1$ is located at
$\lfloor x/2 \rfloor$. Similarly, the left son of node $x$ is located at $2x$
and its right son at $2x + 1$, whenever these positions do not exceed $n$.
We can thus represent each node by an integer and the entire binary heap
by a map from $[n]$ onto the items: the binary heap with $n$ nodes fits
exactly into locations $1, \ldots , n$. This forces a breadth-first,
left-to-right filling of the binary tree, {\em i.e.} a heap or priority
queue ordering.

\item Next, it is well known that any ordered tree with $n$ nodes may easily
be transformed into a binary tree by the {\em natural correspondence} between
ordered trees and binary trees. The corresponding binary tree is obtained by
linking together the sibling nodes of the given ordered tree and removing all
vertical links except those from a father to its first (left) son.

Conversely, it is easy to see that any binary tree may be represented as an
ordered tree by reversing the process. The correspondence is thus one-one
(see \cite[Vol.~1, page 333]{kn}).
\end{itemize}
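The natural correspondence can be sketched in Python as follows. This is an illustration under our own encoding, not the construction of \cite{kn} verbatim: an ordered tree is a pair (label, list of subtrees), and a binary tree is either \texttt{None} or a triple (label, left, right), the left link pointing to the first son and the right link to the next sibling.

```python
def to_binary(forest):
    """Natural correspondence: ordered forest (list of trees) -> binary tree.

    An ordered tree is (label, [subtrees]); a binary tree is None or
    (label, left, right), with left = first son, right = next sibling.
    """
    if not forest:
        return None
    (label, children), rest = forest[0], forest[1:]
    return (label, to_binary(children), to_binary(rest))


def to_forest(b):
    """Inverse mapping: follow right links to recover each list of siblings."""
    forest = []
    while b is not None:
        label, left, right = b
        forest.append((label, to_forest(left)))
        b = right
    return forest
```

Under this mapping, a root with $n-1$ children becomes a root with no right son whose left son heads a chain of $n-1$ nodes joined by right links.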

Note that a binary heap of size $n$ can be constructed in linear time,
{\em e.g.} by the classical bottom-up method, which performs
$\lfloor n/2 \rfloor$ sift-down operations and $O(n)$ key comparisons in total.
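The index arithmetic and the bottom-up construction can be sketched as follows, in Python, with a 1-based array whose slot 0 is unused. All names are illustrative assumptions, not taken from the cited sources.

```python
def parent(x):
    """Position of the father of node x > 1."""
    return x // 2


def sons(x, n):
    """Positions 2x and 2x + 1 of the sons of x, kept only when they exist."""
    return [c for c in (2 * x, 2 * x + 1) if c <= n]


def sift_down(a, x, n):
    """Move a[x] down until the subtree rooted at x is heap-ordered."""
    while True:
        smallest = x
        for c in sons(x, n):
            if a[c] < a[smallest]:
                smallest = c
        if smallest == x:
            return
        a[x], a[smallest] = a[smallest], a[x]
        x = smallest


def build_heap(keys):
    """Bottom-up construction: floor(n/2) sift-downs, O(n) comparisons overall."""
    a = [None] + list(keys)  # a[0] is an unused placeholder
    n = len(keys)
    for x in range(n // 2, 0, -1):
        sift_down(a, x, n)
    return a


def is_heap(a):
    """Heap order: every father holds a strictly smaller key than its sons."""
    n = len(a) - 1
    return all(a[parent(x)] < a[x] for x in range(2, n + 1))
```

For example, \texttt{build\_heap([5, 3, 8, 1, 9, 2, 7])} puts the minimum key 1 at position 1 and satisfies the heap order.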

Now, to each sequence of priorities $s = (p_{1}, \ldots ,p_{n}) \in
Q_{n}$, we may associate a unique $n$-node tree $\alpha(s) = T_{n}$ in
the natural breadth-first, left-to-right order; by convention, we also let
$\alpha(\emptyset) = \Lambda$. In such a representation, $T_{n} = \alpha(s)$
is an ordered $n$-node tree whose ordering is the priority queue
(or heap) order, and it is thus built as an essentially complete binary heap
of size $n$. The correspondence $\alpha$ naturally represents the priority
queues $Q_{n}$ of size $n$ as ordered trees $T_{n}$ with $n$ nodes.

Conversely, to any ordered tree $T_{n}$ with $n$ nodes we may associate a
binary tree with heap-ordered nodes, that is, an essentially complete binary
heap. Hence there exists a correspondence $\beta$ mapping any given ordered
$n$-node tree $T_{n}$ onto a unique sequence of priorities $s = \beta(T_{n})
\in Q_{n}$; by convention we again let $\beta(\Lambda) = \emptyset$.

The correspondence is one-one, and it is easily seen that the mappings
$\alpha$ and $\beta$ are inverses of each other.
\end{proof}

Let binary tournament trees represent each of the above structures. Any
operation can thus be performed as if dealing with ordered trees $T_{n}$,
while binary tournament trees or permutations are what is actually
manipulated. More precisely, since $T_{n} \longleftrightarrow Q_{n}
\longleftrightarrow {\cal T}_{n} \longleftrightarrow S_{n}$, the cost of path
reversal performed on initial $n$-node trees $T_{n}$ consisting of a
root with $n-1$ children is {\em transported} from the $T_{n}$'s onto the
tournament trees $T \in {\cal T}_{n}$ and onto the permutations
$\sigma \in S_{n}$. In the following definitions (see Section~3.1 below),
we therefore let $\varphi(\sigma) \in S_{n}$ denote the ``reversed''
permutation which corresponds to the reversed tree $T_{n}$.
From this point, the {\em first moment of the cost of path reversal}
$\varphi : T_{n} \rightarrow T_{n}$ can be derived; a
straightforward proof of this result, distinct from the one in
Section~3 below, is also given in the Appendix.

\section{Expected cost of path reversal, average message complexity of $\bm{\cal A}$}
The Introduction details how the two data structures at hand are involved
in algorithm $\cal A$; the design of the algorithm itself is given
in~\cite{nta,tn}.

\subsection{Analysis}
Eq.~(13), proved in the Appendix, is sufficient to provide the average
cost of path reversal. However, since we also want the second moment
of the cost, we need the probability generating function of the probabilities
$p_{n,k}$ defined as follows.

\medskip Let $h(T_{n})$ denote the height of $T_{n}$, {\em i.e.} the number
of nodes on the path from the deepest node in $T_{n}$ to the root of $T_{n}$,
and let $T \in {\cal T}_{n-1}$. Then
$$p_{n,k} \;= \;\Pr\{\mbox{cost of path reversal for}\ T_{n}\ \mbox{is}\ k\} %
\;=\; \Pr\{h (\varphi(T)) = k\}$$
is the probability that the tournament tree $\varphi(T)$ has height $k$.
We also have
$$p_{n,k} \;=\; \Pr\{k\ \mbox{changes occur in the variable \textit{Last}
of algorithm}\ {\cal A}\}.$$

More precisely, let a {\bf swap} be any interchange of a pair of adjacent
prime cycles (see {\rm \cite[Vol.~3, pages 28--30]{kn}}) in a permutation
$\sigma$ of $[n-1]$ that is needed to obtain the ``reversed'' permutation
$\varphi_{x}(\sigma)$ corresponding to the path reversal performed at a node
$x \in T_{n}$, that is, any interchange which occurs between the relative
order of the elements of $\sigma$ and that of the elements of
$\varphi_{x}(\sigma)$. Let $N$ be the number of these swaps occurring from
$\sigma \in S_{n-1}$ to $\varphi_{x}(\sigma)$. Then
$$p_{n,k} \:=\; \frac{1}{(n-1)!}\, (\mbox{number of}\ \sigma \in S_{n-1}\ %
\mbox{for which}\ N=k),$$
since the cost of a path reversal at the root of an ordered tree such as
$T_{n}$ is zero.

\begin{lemma}
Let $P_{n}(z) = \sum_{k \geq 0} p_{n,k} z^{k}$ be the probability generating
function of the $p_{n,k}$'s. We have the identity
$$P_{n}(z) \:=\; \prod_{j=1}^{n-1} \frac{z + j - 1}{j}\,.$$
\end{lemma}

\begin{proof}
We have $p_{1,0} = 1$ and $p_{1,k} = 0$ for all $k > 0$.
\par
A fundamental point in this derivation is that we are averaging not over
all tournament trees $T \in {\cal T}_{n-1}$, but {\em over all possible
orders} of the elements of $S_{n-1}$.
Thus, for $n \geq 2$, the permutations of $(n - 1)$ elements with $k$ swaps
are obtained from each permutation of $(n - 2)$ elements with $k$ swaps in
$(n - 2)$ ways, and from each permutation of $(n - 2)$ elements with
$(k - 1)$ swaps in exactly one way. This leads directly to the recurrence
$$(n-1)!\, p_{n,k} \;=\; (n-2)\,(n-2)! \,p_{n-1,k} \;+\; (n-2)!\, p_{n-1,k-1},$$
or, dividing by $(n-1)!$,
\begin{equation}
p_{n,k} =\; \left(1-\frac{1}{n-1} \right) p_{n-1,k} \;+\; %
\left(\frac{1}{n-1}\right) p_{n-1,k-1}.
\end{equation}

\par
Consider any permutation $\sigma = \langle \sigma_{1} \ldots \sigma_{n-1}
\rangle$ of $[n-1]$. Formula (1) can also be derived directly from the
argument that the probability of $N$ being equal to $k$ is the probability of
the simultaneous occurrence of $\sigma_{i} = j\; \; (1 \leq i,j \leq n-1)$ and
$N$ being equal to $k-1$ for the remaining elements of $\sigma$, {\em plus}
the probability of the simultaneous occurrence of $\sigma_{i} \neq j \;
(1 \leq i,j \leq n-1)$ and $N$ being equal to $k$ for the remaining elements
of $\sigma$. Therefore,
\begin{eqnarray*}
p_{n,k} & = & \Pr\{\sigma_{i} = j\} \times p_{n-1,k-1} \;+\; %
\Pr\{\sigma_{i} \neq j\} \times p_{n-1,k} \\
& = & \Big(1/(n-1)\Big) p_{n-1,k-1} \;+\; \Big(1-1/(n-1)\Big) p_{n-1,k}.
\end{eqnarray*}

\par
Using now the probability generating function $P_{n}(z) =\: \sum_{k \geq 0} p_{n,k} z^{k}$,
we get, after multiplying~(1) by $z^{k}$ and summing over $k$,
$$(n - 1) P_{n}(z) \;=\; z P_{n-1}(z) \;+\; (n - 2) P_{n-1}(z),$$
which yields
\begin{eqnarray}
P_{n}(z) & = & \frac{z + n - 2}{n - 1} \; P_{n-1}(z) \qquad (n \geq 2), \nonumber \\
P_{1}(z) & = & 1.
\end{eqnarray}

The latter recurrence~(2) telescopes immediately to
$$P_{n}(z) \; = \; \prod_{j=1}^{n-1} \frac{z + j - 1}{j} .$$
\end{proof}
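The lemma can be checked mechanically for small $n$ with the following Python sketch, which uses exact rational arithmetic (function names are hypothetical). It computes the coefficients of $P_{n}(z)$ in three ways: by expanding the product, by iterating recurrence~(1) from $p_{1,0} = 1$, and by counting the cycles of all permutations of $[n-1]$. The last check relies on the classical fact that $\prod_{j=1}^{n-1}(z+j-1)/j$ is also the generating function of the number of cycles of a uniformly random permutation of $[n-1]$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def pgf_product(n):
    """Coefficients [p_{n,0}, ..., p_{n,n-1}] of prod_{j=1}^{n-1} (z + j - 1)/j."""
    coeffs = [Fraction(1)]  # P_1(z) = 1
    for j in range(1, n):
        # Multiply the current polynomial by (j - 1)/j + z/j.
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * Fraction(j - 1, j)
            nxt[k + 1] += c * Fraction(1, j)
        coeffs = nxt
    return coeffs


def pgf_recurrence(n):
    """The same coefficients obtained by iterating recurrence (1)."""
    p = {0: Fraction(1)}  # p_{1,0} = 1
    for m in range(2, n + 1):
        w = Fraction(1, m - 1)
        p = {k: (1 - w) * p.get(k, 0) + w * p.get(k - 1, 0) for k in range(m)}
    return [p.get(k, Fraction(0)) for k in range(n)]


def cycle_count(perm):
    """Number of cycles of a permutation of 0..m-1 given as a tuple."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = perm[x]
    return cycles


def pgf_cycles(n):
    """Distribution of the number of cycles of a random permutation of [n-1]."""
    m = n - 1
    counts = [0] * n
    for perm in permutations(range(m)):
        counts[cycle_count(perm)] += 1
    return [Fraction(c, factorial(m)) for c in counts]
```

For $n = 4$ all three functions give $(p_{4,0}, \ldots, p_{4,3}) = (0, 1/3, 1/2, 1/6)$.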

\begin{remark}
The property proved by Trehel, that the average number of
messages required by $\cal A$ is exactly the number of nodes at
height 2 in the reversed ordered trees $\varphi(T_{n})$ (see \cite{nta}),
is hidden in the definition of the $p_{n,k}$'s. As a matter of fact, the
number of permutations of $[n]$ which contain exactly 2 prime cycles is
$\left[\begin{array}{c} n \\ 2\end{array}\right]
\;=\; (n-1)!\, H_{n-1}$ (see~\cite{kn}), whence the result.
\end{remark}
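The identity $\left[\begin{array}{c} n \\ 2\end{array}\right] = (n-1)!\,H_{n-1}$ used in the remark can be checked numerically from the standard recurrence for the Stirling cycle numbers, $\left[\begin{array}{c} n \\ k\end{array}\right] = (n-1)\left[\begin{array}{c} n-1 \\ k\end{array}\right] + \left[\begin{array}{c} n-1 \\ k-1\end{array}\right]$. A minimal Python sketch (function names are our own):

```python
from fractions import Fraction
from math import factorial


def stirling_cycle(n, k):
    """Unsigned Stirling number of the first kind [n, k], for k >= 1:
    the number of permutations of [n] with exactly k cycles."""
    # table[m][j] = [m, j], filled row by row from [0, 0] = 1.
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for m in range(1, n + 1):
        for j in range(1, k + 1):
            table[m][j] = (m - 1) * table[m - 1][j] + table[m - 1][j - 1]
    return table[n][k]


def harmonic(m):
    """H_m as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, j) for j in range(1, m + 1)), Fraction(0))
```

For example, \texttt{stirling\_cycle(4, 2)} returns 11, which equals $3!\,H_{3} = 6 \cdot 11/6$.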

\begin{theorem}
The expected cost of path reversal and the average message complexity
of algorithm $\cal A$ are both equal to $\E(C_{n})\:=\; \overline{C_{n}} \:=\: H_{n-1}$,
with variance $var(C_{n}) \:=\: H_{n-1} \;-\; H_{n-1}^{(2)}$.
Asymptotically, for large $n$,
$$\overline{C_{n}} \:=\; \ln n \;+\;\gamma \;+\; O(n^{-1})\ \quad \mbox{and}\
\quad var(C_{n}) \;=\; \ln n \;+\; \gamma \;-\; \pi^{2}/6 \;+\; O(n^{-1}).$$
\end{theorem}

\begin{proof}
By Lemma 3.1, the probability generating function $P_{n}(z)$ may be regarded
as the product of a number of very simple probability generating functions
(P.G.F.s), namely,
$$P_{n}(z) =\; \prod_{1 \leq j \leq n-1} \Pi_{j}(z),\ \quad\ \mbox{with}\ %
\ \Pi_{j}(z) \;=\; \frac{j-1}{j} \;+\;\frac{z}{j}\,.$$
Each $\Pi_{j}(z)$ is the P.G.F. of a Bernoulli random variable taking the
value 1 with probability $1/j$, so that $C_{n}$ is distributed as a sum of
$n-1$ independent such variables.
Therefore, we need only compute moments for the P.G.F.s $\Pi_{j}(z)$
and then sum for $j = 1$ to $n - 1$. This is a classical property of P.G.F.s:
multiplying the P.G.F.s of independent random variables amounts to adding
their means and their variances.

\medskip \noindent Now, $\Pi_{j}'(1) \;=\; 1/j$ and $\Pi_{j}''(1) \:=\: 0$,
and hence
$$\E(C_{n}) \;=\: \overline{C_{n}} \:=\; P_{n}'(1) \;=\; \sum_{j=1}^{n-1} %
\Pi_{j}'(1) \;= \: H_{n-1}.$$
Moreover, the variance of $C_{n}$ is
$$var(C_{n}) \:=\; P_{n}''(1) \;+\;P_{n}'(1) \;-\; P_{n}'^{2}(1)
\;=\; \sum_{j=1}^{n-1} \Big(\Pi_{j}''(1) \;+\; \Pi_{j}'(1) \;-\; \Pi_{j}'^{2}(1)\Big),$$
and thus,
$$var(C_{n}) \:= \; \sum_{j=1}^{n-1} \frac{1}{j} \;-\; \sum_{j=1}^{n-1}
\frac{1}{j^{2}} \;=\; H_{n-1} \;-\; H_{n-1}^{(2)}.$$
\par
Since $H_{n-1}^{(2)} \;=\; \pi^{2}/6 \;-\; 1/n \:+\: O(n^{-2})$ and
$H_{n-1} \;=\; \ln n \;+\; \gamma \;+\; O(n^{-1})$ as $n\rightarrow +\infty$,
the asymptotic values of $\overline{C_{n}}$ and of $var(C_{n})$ follow at once.
(Recall that Euler's constant is $\gamma = 0.57721\ldots$, thus
$\gamma - \pi^{2}/6 \;=\: - 1.06772\ldots$)

Hence, $\overline{C_{n}} =\: .693\ldots \lg n \;+\; O(1)$, and
$var(C_{n}) =\: .693\ldots \lg n \;+\; O(1)$.
\end{proof}
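As a sanity check of the theorem, the following Python sketch (names are illustrative assumptions) computes the mean and variance of $C_{n}$ exactly from the distribution given by Lemma 3.1 and compares them with $H_{n-1}$ and $H_{n-1} - H_{n-1}^{(2)}$; it also evaluates, in floating point, the errors of the asymptotic expansions. Euler's constant is hard-coded because the Python \texttt{math} module does not provide it.

```python
from fractions import Fraction
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded


def distribution(n):
    """Exact p_{n,k}, k = 0..n-1: coefficients of prod_{j=1}^{n-1} ((j-1)/j + z/j)."""
    coeffs = [Fraction(1)]
    for j in range(1, n):
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * Fraction(j - 1, j)
            nxt[k + 1] += c * Fraction(1, j)
        coeffs = nxt
    return coeffs


def moments(n):
    """Mean and variance of C_n computed directly from its distribution."""
    p = distribution(n)
    mean = sum(k * pk for k, pk in enumerate(p))
    var = sum(k * k * pk for k, pk in enumerate(p)) - mean ** 2
    return mean, var


def harmonic(m, r=1):
    """Generalized harmonic number H_m^{(r)} as an exact fraction."""
    return sum((Fraction(1, j ** r) for j in range(1, m + 1)), Fraction(0))


def asymptotic_errors(n):
    """Errors of ln n + gamma (mean) and ln n + gamma - pi^2/6 (variance); both O(1/n)."""
    h1 = math.fsum(1.0 / j for j in range(1, n))
    h2 = math.fsum(1.0 / j ** 2 for j in range(1, n))
    return (h1 - (math.log(n) + EULER_GAMMA),
            (h1 - h2) - (math.log(n) + EULER_GAMMA - math.pi ** 2 / 6))
```

For instance, \texttt{moments(3)} returns $(3/2, 1/4)$, matching $H_{2} = 3/2$ and $H_{2} - H_{2}^{(2)} = 3/2 - 5/4$.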

Note also that, by a generalization of the central limit theorem to sums of
independent but nonidentical random variables (the Lindeberg condition holds
here, since the summands are bounded and $var(C_{n}) \rightarrow +\infty$),
$$\frac{C_{n} \:-\: {\overline C_{n}}}{(\ln n \,-\, 1.06772\ldots)^{1/2}}$$
converges in distribution to the standard normal distribution when
$n\rightarrow +\infty$.
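Since $P_{n}(z)$ factors into the Bernoulli P.G.F.s $\Pi_{j}(z)$, $C_{n}$ can be simulated as a sum of independent Bernoulli$(1/j)$ variables, $1 \leq j \leq n-1$. The following Python sketch is only a Monte Carlo illustration under that representation (names, seed and sample sizes are arbitrary assumptions); it draws samples of $C_{n}$ and standardizes them with the exact mean and variance.

```python
import random
import statistics


def sample_cost(n, rng):
    """One draw of C_n as a sum of independent Bernoulli(1/j), 1 <= j <= n-1."""
    return sum(1 for j in range(1, n) if rng.random() < 1.0 / j)


def simulate(n, trials, seed=0):
    """Independent draws of C_n from a seeded generator."""
    rng = random.Random(seed)
    return [sample_cost(n, rng) for _ in range(trials)]


def exact_mean_var(n):
    """H_{n-1} and H_{n-1} - H_{n-1}^{(2)} in floating point."""
    h1 = sum(1.0 / j for j in range(1, n))
    h2 = sum(1.0 / j ** 2 for j in range(1, n))
    return h1, h1 - h2


def summary(samples):
    """Empirical mean and (unbiased) variance of the samples."""
    return statistics.mean(samples), statistics.variance(samples)


def standardized(samples, n):
    """(C_n - mean) / sqrt(var): approximately N(0, 1) for large n."""
    mean, var = exact_mean_var(n)
    return [(c - mean) / var ** 0.5 for c in samples]
```

Because $var(C_{n})$ grows only like $\ln n$, the normal approximation is reached slowly, and the discreteness of $C_{n}$ remains visible for moderate $n$.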

\begin{proposition}
The worst-case message complexity of algorithm $\cal A$ is $O(n)$.
\end{proposition}
\begin{proof}
Let $\Delta$ be the {\em maximum} communication delay time in the
network and let $\Sigma$ be the {\em minimum} delay time for a process
to enter, execute and release the critical section.
Setting $q =\: \left\lceil \Delta/\Sigma \right\rceil$, the number of messages
used in $\cal A$ is at most $(n-1) \:+\: (n-1)q \:=\: (n-1)(q+1)$, which is
$O(n)$ for fixed $\Delta$ and $\Sigma$.
\end{proof}
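The bound in the proof is plain arithmetic; as a sketch, with $\Delta$ and $\Sigma$ expressed in the same (arbitrary) time unit and a hypothetical function name:

```python
import math


def worst_case_messages(n, delta, sigma):
    """Upper bound (n - 1) + (n - 1) q = (n - 1)(q + 1), with q = ceil(delta / sigma).

    delta: maximum communication delay; sigma: minimum time for a process to
    enter, execute and release the critical section (same time unit).
    """
    q = math.ceil(delta / sigma)
    return (n - 1) + (n - 1) * q
```

For example, with $n = 10$, $\Delta = 5$ and $\Sigma = 2$ we get $q = 3$ and a bound of $9 \cdot 4 = 36$ messages.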

\begin{remarks}
\item[1.]\ The one-to-one correspondence between ordered trees with $(n+1)$
nodes and the words of length $2n$ in the Dyck language with one type of
bracket is used in \cite{nta} to compute the average message complexity of
$\cal A$. Several properties and results connecting the depth of a Dyck word
and the height of the ordered $n$-node trees can be derived from the
one-to-one correspondences between combinatorial structures involved
in the proof of Theorem 2.1.

\item[2.]\ In the first variant of algorithm $\cal A$ (see \cite{tn}),
which is analysed here, a node never stores more than one request of some
other node; hence it only requires $O(\log n)$ bits to store the variables,
and the message size is also $O(\log n)$ bits. This is not true of the second
variant of algorithm $\cal A$ (designed in \cite{nt}). Although the constant
factor in the average number of messages is claimed to be slightly improved
(from 1 down to $0.4$), the token now consists of a queue
of processes requesting the critical section.
Since at most $n-1$ processes belong to the requesting queue, the size of the
token is $O(n\log n)$ bits. Therefore, whereas the average message complexity
is slightly improved (up to a constant factor), the message size increases
from $O(\log n)$ bits to $O(n\log n)$ bits. The bit complexity is thus much
larger in the second variant~\cite{nt} of $\cal A$. Moreover, the state
information stored at each node is also $O(n\log n)$ bits in the second
variant, which again is much larger than in the first variant of $\cal A$.
\end{remarks}
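The classical correspondence of Remark~1 between ordered trees and Dyck words can be sketched in Python as follows. The encoding is our own assumption: an ordered tree is a pair (label, list of subtrees), and each non-root node is written as a matched pair of brackets around the word of its subtrees. With the height convention used above (the root has height 1), the maximum nesting depth of the word equals the height of the tree minus one.

```python
def to_dyck(tree):
    """Ordered tree (label, [subtrees]) -> Dyck word over '(' and ')'.

    A tree with n + 1 nodes yields a word of length 2n.
    """
    return "".join("(" + to_dyck(c) + ")" for c in tree[1])


def from_dyck(word):
    """Inverse: rebuild the unlabeled tree as nested lists of subtrees."""
    stack = [[]]
    for ch in word:
        if ch == "(":
            stack.append([])
        else:
            child = stack.pop()
            stack[-1].append(child)
    return stack[0]


def shape(tree):
    """Forget the labels of an ordered tree, keeping only nested lists."""
    return [shape(c) for c in tree[1]]


def depth(word):
    """Maximum nesting depth of a Dyck word."""
    d = best = 0
    for ch in word:
        d += 1 if ch == "(" else -1
        best = max(best, d)
    return best


def height(tree):
    """Number of nodes on a longest root-to-leaf path (the root has height 1)."""
    return 1 + max((height(c) for c in tree[1]), default=0)
```

A root with $n-1$ children, the initial tree of Section~2, is encoded by the word $()()\cdots()$ of depth 1.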
