\section{Interval Analysis}
\label{sec:ia}

We turn now to two problems on flowgraphs.  The first is
\emph{interval analysis}.  Let $G = (V, A, r)$ be a flowgraph, and
let $D$ be a given depth-first search tree rooted at $r$.  Identify
vertices by their preorder number with respect to the DFS: $v < w$
means that $v$ was visited before $w$.
\emph{Reverse preorder} of the vertices
is decreasing order by (preorder) vertex number.
For each vertex $v$, the
\emph{head} of $v$ is
\begin{displaymath}
h(v) = \max \{ u : u \neq v \mbox{ and there is a path
from } v \mbox{ to } u \mbox{ containing only descendants of } u \};
\end{displaymath}
$h(v) = \nul$ if this set is empty. The heads define a forest $H$
called the \emph{interval forest}: $h(v)$ is the parent of $v$ in
$H$. Each subtree $H(v)$ of $H$ induces a strongly connected
subgraph of $G$, containing only vertices in $D(v)$ (the descendants
of $v$ in $D$). See Figure~\ref{fig:iforest}. Tarjan~\cite{st:t}
proposed an algorithm that uses an NCA computation, incremental
backward search, and a DSU data structure to compute $H$ in
$O(m\alpha(m, n))$ time on a pointer machine. We shall add
microtrees, a maximal path partition of the core, and a stack to
Tarjan's algorithm, thereby improving its running time to $O(m)$ on
a pointer machine.
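As a sanity check on the definition of $h(v)$, the following sketch computes heads by brute force on a small flowgraph. It is not part of the algorithm developed in this section; the function name, the dictionary representation of $D$, and the five-vertex example are assumptions made purely for illustration. Vertices are identified with their preorder numbers, and \texttt{None} stands for $\nul$.

```python
def heads_by_definition(n, parent, arcs):
    """Heads h[1..n] computed straight from the definition (quadratic or worse)."""
    # desc[u] = vertices of D(u); children have larger preorder numbers,
    # so scanning v = n..2 sees every subtree complete before it is merged.
    desc = {u: {u} for u in range(1, n + 1)}
    for v in range(n, 1, -1):
        desc[parent[v]] |= desc[v]
    succ = {u: [] for u in range(1, n + 1)}
    for x, y in arcs:
        succ[x].append(y)

    def reaches(v, u):
        # Is there a path from v to u that stays inside D(u)?
        if v not in desc[u]:
            return False
        seen, todo = {v}, [v]
        while todo:
            x = todo.pop()
            if x == u:
                return True
            for y in succ[x]:
                if y in desc[u] and y not in seen:
                    seen.add(y)
                    todo.append(y)
        return False

    h = {}
    for v in range(1, n + 1):
        candidates = [u for u in range(1, n + 1) if u != v and reaches(v, u)]
        h[v] = max(candidates) if candidates else None
    return h


# Hypothetical example: tree arcs 1->2, 2->3, 3->4, 1->5 and
# non-tree arcs 4->2, 5->3, 3->1 (vertices are preorder numbers).
EXAMPLE_PARENT = {2: 1, 3: 2, 4: 3, 5: 1}
EXAMPLE_ARCS = [(1, 2), (2, 3), (3, 4), (1, 5), (4, 2), (5, 3), (3, 1)]
example_heads = heads_by_definition(5, EXAMPLE_PARENT, EXAMPLE_ARCS)
```

On this example the interval forest has root $1$ with children $2$ and $5$, and vertex $2$ has children $3$ and $4$.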

\begin{figure}[t]
\begin{center}
\scalebox{0.825}[0.825]{\input{iforest.pstex_t}}
\end{center}
\caption{(a) A DFS tree $D$
of the input flowgraph $G$;
non-tree arcs are dashed. (b)
The interval forest $H$ of $G$ with respect to $D$;
arrows are parent pointers.
\label{fig:iforest}}
\end{figure}

Tarjan's algorithm proceeds as follows.  Delete all the arcs
from the graph.  For each vertex $u$, form a set of all deleted arcs
$(x, y)$ such that $\nca{x, y} = u$. Process the vertices in any
bottom-up order; reverse preorder will do.  To process a vertex $u$,
add back to the graph arcs corresponding to all the deleted arcs
$(x, y)$ with $\nca{x, y} = u$. Then examine each arc $(v, u)$
entering $u$.  If $v \neq u$, set $h(v) = u$, and contract $v$ into
$u$; for all arcs having $v$ as an end, replace $v$ by $u$. This may
create multiple arcs and loops, which poses no difficulty for the
algorithm. Continue until all arcs into $u$ have been examined,
including those formed by contraction.  When adding arcs back to the
graph, the arc corresponding to an original arc is the one formed by
doing end replacements corresponding to all the contractions done so
far.

To keep track of contractions, Tarjan's algorithm uses a DSU
structure whose elements are the graph vertices.  The algorithm also
uses a reverse adjacency set $R(u)$, initially empty, for each
vertex $u$.  A more detailed description of the algorithm is as
follows. To process $u$, for each arc $(x,y)$ such that $\nca{x, y}
= u$, add $x$ to $R(\find(y))$. (The replacement of $x$ by $\find(x)$
is done later, when $x$ is deleted from the set.) Then, while $R(u)$
is non-empty, delete a vertex $x$ from
$R(u)$; let $v \leftarrow \find(x)$; if $v \neq u$, set $h(v)
\leftarrow u$, set $R(u) \leftarrow R(u) \cup R(v)$, and do
$\unite(u,v)$.
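The detailed description above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pointer-machine implementation: NCAs are found by naively walking up $D$, the DSU uses path compression only (so that $\unite(u,v)$ can keep $u$ as the name of the set), and the sets $R(u)$ are Python lists whose concatenation is not constant time. The function name and the input encoding (preorder numbers $1,\ldots,n$, a \texttt{parent} dictionary for $D$, and a list of all arcs including tree arcs) are hypothetical.

```python
def tarjan_interval_forest(n, parent, arcs):
    """Heads h[1..n] via contraction; None stands for nul."""

    def nca(x, y):
        # Naive NCA: ancestors have smaller preorder numbers, so the larger
        # of the two vertices can always be moved to its parent.
        while x != y:
            if x > y:
                x = parent[x]
            else:
                y = parent[y]
        return x

    dsu = list(range(n + 1))

    def find(x):
        root = x
        while dsu[root] != root:
            root = dsu[root]
        while dsu[x] != root:          # path compression
            dsu[x], x = root, dsu[x]
        return root

    by_nca = {u: [] for u in range(1, n + 1)}
    for x, y in arcs:
        by_nca[nca(x, y)].append((x, y))

    R = {u: [] for u in range(1, n + 1)}
    h = {u: None for u in range(1, n + 1)}
    for u in range(n, 0, -1):          # reverse preorder is bottom-up
        for x, y in by_nca[u]:
            R[find(y)].append(x)       # x is replaced by find(x) later
        while R[u]:
            v = find(R[u].pop())
            if v != u:
                h[v] = u
                R[u].extend(R[v])      # R(u) <- R(u) union R(v)
                R[v] = []
                dsu[v] = u             # unite(u, v): u names the new set
    return h


# Hypothetical five-vertex example (vertices are preorder numbers).
EXAMPLE_PARENT = {2: 1, 3: 2, 4: 3, 5: 1}
EXAMPLE_ARCS = [(1, 2), (2, 3), (3, 4), (1, 5), (4, 2), (5, 3), (3, 1)]
example_heads = tarjan_interval_forest(5, EXAMPLE_PARENT, EXAMPLE_ARCS)
```

Tree arcs are grouped by NCA like any other arc; they end up in $R(u)$ after their source has already been contracted into $u$ and are then skipped by the test $v \neq u$.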

With the sets $R(u)$ represented as singly linked circular lists (so
that set union takes constant time), the running time of this
algorithm on a pointer machine is linear except for the NCA
computations and the DSU operations, which take $O(m \alpha(m, n))$
time in Tarjan's original implementation.  We shall reduce the
running time to linear
by using microtrees to eliminate redundant computation
and by reordering the unites into a bottom-up order.
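To illustrate why circular lists give constant-time union, here is a small sketch of one possible representation of a set $R(u)$: a singly linked cycle addressed by its last node, so that two cycles can be spliced by exchanging two \texttt{next} pointers. The class and method names are assumptions for illustration only.

```python
class Node:
    __slots__ = ("item", "next")

    def __init__(self, item):
        self.item = item
        self.next = self            # a lone node is a cycle of length one


class CircularList:
    """Multiset stored as a singly linked cycle; `last.next` is the first node."""

    def __init__(self):
        self.last = None            # None encodes the empty set

    def is_empty(self):
        return self.last is None

    def add(self, item):
        node = Node(item)
        if self.last is None:
            self.last = node
        else:                       # insert right after the last node
            node.next = self.last.next
            self.last.next = node

    def pop(self):
        """Remove and return some element (the first node of the cycle)."""
        first = self.last.next
        if first is self.last:
            self.last = None
        else:
            self.last.next = first.next
        return first.item

    def absorb(self, other):
        """Move all elements of `other` into self in O(1); `other` becomes empty."""
        if other.last is None:
            return
        if self.last is not None:
            # Exchanging the two `next` pointers merges the cycles into one.
            self.last.next, other.last.next = other.last.next, self.last.next
        self.last = other.last
        other.last = None
```

With this representation, the step $R(u) \leftarrow R(u) \cup R(v)$ corresponds to a single call of \texttt{absorb}, independent of the sizes of the two sets.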

As in Section~\ref{sec:nca}, partition $D$ into a set of
bottom-level microtrees (the fringe), each with fewer than $g =
\log^{1/3}{n}$ vertices, and $D'$, the remainder of $D$ (the core).
Use a topological graph computation to compute $h(v)$ for every
vertex $v$ such that $h(v)$ is in the fringe.
The definition of heads implies that for any such vertex
$v$, $h(v)$ and $v$ are in the same microtree,
and furthermore that
the only information
needed to compute heads in the fringe is, for each microtree, the
subgraph induced by its vertices, with non-tree arcs marked
by a bit.  With
$g = \log^{1/3}{n}$, this computation takes $O(m)$ time by
Theorem~\ref{thm:tgc2}.

It remains to compute heads for vertices whose heads are in the
core. Our approach is to run Tarjan's algorithm starting from the
state it would have reached after processing the fringe.  This
amounts to contracting all the strong components in the fringe and
then running the algorithm.  This approach does not quite work as
stated, because the DSU operations are not restricted enough for
Lemma~\ref{lemma:pc} to apply.  To overcome this difficulty, we
partition the core into maximal paths. Then we run Tarjan's
algorithm path-by-path, keeping track of contractions with a hybrid
structure consisting of a DSU structure that maintains contractions
outside the path being processed and a stack that maintains
contractions inside the path being processed.  The latter structure
functions in the same way as the one Gabow used in his
algorithm~\cite{pathdfs:g00} for finding strong components. We now
give a complete description of our algorithm.

Partition the vertices in $D'$ into a set of maximal paths by
choosing, for each non-leaf vertex $v$ in $D'$, a child $c(v)$ in
$D'$. (Any child will do.)  The arcs $(v, c(v))$ form a set of paths
that partition the vertices in $D'$.  For such a path $P$, we denote
the smallest and largest vertices on $P$ by $\ltop{P}$ and
$\lbottom{P}$, respectively; $\lbottom{P}$ is a leaf of $D'$. Since
$D'$ has at most $n/g$ leaves, the number of paths is at most $n/g$.
Partitioning $D'$ into paths takes $O(n)$ time.
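The path partition is easy to sketch. In the following illustration the core is given as a dictionary of child lists, and $c(v)$ is taken to be the first listed child; both the input encoding and the function name are assumptions, and any other choice of $c(v)$ would serve equally well.

```python
def maximal_path_partition(children, root):
    """Split a rooted tree into paths, each listed from its top to its bottom.

    children[v] is the list of children of v in the core; leaves may be
    missing from the dictionary or map to an empty list.
    """
    paths = []
    tops = [root]                    # vertices that start a new path
    while tops:
        v = tops.pop()
        path = [v]
        while children.get(v):
            chosen, *others = children[v]
            tops.extend(others)      # each non-chosen child tops its own path
            path.append(chosen)      # follow the arc (v, c(v))
            v = chosen
        paths.append(path)
    return paths


# Hypothetical core with leaves 4, 5 and 7.
EXAMPLE_CORE = {1: [2, 6], 2: [3, 5], 3: [4], 6: [7]}
example_paths = maximal_path_partition(EXAMPLE_CORE, 1)
```

Every path ends at a leaf and every leaf ends exactly one path, so the number of paths equals the number of leaves; here there are three paths, with vertex sets $\{1,2,3,4\}$, $\{5\}$, and $\{6,7\}$.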

After constructing a maximal path partition of the core, initialize
a DSU structure containing every vertex (fringe and core) as a
singleton set.  Visit the fringe vertices in bottom-up order, and,
for each fringe vertex $v$ with $h(v)$ also in the fringe, perform
$\unite(h(v), v)$; for such a vertex, $h(v)$ has already been
computed. Initialize $R(u) \leftarrow \emptyset$ for every vertex
$u$.  For every arc $(x, y)$ with $x$ and $y$ in the same microtree,
add $x$ to $R(\find(y))$. For every remaining arc $(x, y)$, compute
$u=\nca{x, y}$ and add $(x, y)$ to the set of arcs associated with
$u$. These NCA computations take $O(m)$ time using the algorithm of
Section~\ref{sec:nca}. Indeed, every NCA query is big, so the AHU
algorithm answers them in linear time.  \ignore{(The NCA computation does
need a separate DSU structure, whose underlying tree is $D$; the DSU
structure for Tarjan's algorithm has $H$ as the underlying tree.)}
This completes the initialization.

Now process each path $P$ in the path partition, in bottom-up order
with respect to $\ltop{P}$.  To process a path $P$, initialize
an empty
stack $S$. Process each vertex $u$ of $P$ in bottom-up
order. To process $u$, for each arc $(x, y)$ such that $\nca{x, y} =
u$, add $x$ to $R(\find(y))$.  Then, while $R(u)$ is non-empty,
delete a vertex $x$ from $R(u)$.  Let $v \leftarrow \find(x)$.  If
$v$ is not on $P$, set $h(v) \leftarrow u$, set $R(u) \leftarrow
R(u) \cup R(v)$, and do $\unite(u, v)$. If, on the other hand, $v$
is on $P$, $v \neq u$, and $v$ is greater than or equal to the top
vertex on
$S$, pop from $S$ each vertex $w$ less than or equal to $v$, set
$h(w) \leftarrow u$, and set $R(u) \leftarrow R(u) \cup R(w)$. Once
$R(u)$ is empty, push $u$ onto $S$.  After processing all vertices
on $P$, visit each vertex $u$ on $P$ again, in bottom-up order, and
if $h(u)$ is now defined, perform $\unite(h(u), u)$.
See Figure~\ref{fig:ia}.

\begin{figure}
%\begin{center}
\scalebox{0.82}[0.82]{\input{ia.pstex_t}}
%\end{center}
\caption{
Idealized execution of the algorithm on the graph in (a),
with circled microtree.
Arcs depict the effects of contractions:
whenever $x\in R(y)$, $(\find(x),\find(y))$ is an arc
in the corresponding graph.
The first vertex in each labeled set is the corresponding original vertex
in (a).
(a$\rightarrow$b) During preprocessing, $h(v)\leftarrow u_1$, and $v$ is inserted
into the set of $u_1$.
(b$\rightarrow$c) When processing $u_2$, $h(u_1)\leftarrow u_2$ via the arc
	$(v,u_2)$.
(c$\rightarrow$d) When processing $u_3$, the stack $S$ is (top-down)
	$(u_2,\lbottom{P})$.  Hence, when processing the arc
	$(\lbottom{P},u_3)$, $S$ is popped so that
	$h(u_2)\leftarrow u_3$ and $h(\lbottom{P})\leftarrow u_3$.
	(d) shows the state
	after doing the $\unite(\cdot)$'s for path $P$.
(d$\rightarrow$e) When processing $u_4$, $S$ is $(w,z,\lbottom{Q})$.
	Arc $(u_2,u_4)$ sets $h(u_3)\leftarrow u_4$ and
	adds $\ltop{P}$ and $z$ to $R(u_4)$.
	Processing $\ltop{P}$ causes
	$h(\ltop{P})\leftarrow u_4$,
	and processing $z$ pops the stack so that
	$h(w)\leftarrow u_4$ and $h(z)\leftarrow u_4$.
(f) After processing path $Q$.
\label{fig:ia}}
\end{figure}

This algorithm delays the unites for vertices on a path until the
entire path is processed, using the stack to keep track of the
corresponding contractions.  Specifically, the algorithm maintains
the following invariant: if vertex $u$ on path $P$ is currently
being processed and $x$ is any original vertex, then the vertex into
which $x$ has been contracted is $v = \find(x)$ if $v$ is not on
$P$, or the largest vertex on $S$ less than or equal to $v$ if $v$
is on $P$ and $S$ contains such a vertex, or $u$ otherwise.  It is
straightforward to verify this invariant by induction on time; the
correctness of this implementation of Tarjan's algorithm follows.
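The following sketch illustrates the hybrid of DSU and stack on small inputs. It makes several simplifying assumptions that are ours and not part of the algorithm as analyzed here: the whole tree $D$ is treated as core (there is no microtree phase), NCAs are computed naively, $c(v)$ is the first child of $v$, and the sets $R(u)$ are Python lists. In addition, when an arc $(x,y)$ is added and $\find(y)$ lies on the current path below $u$, the sketch files $x$ under the stack vertex given by the invariant above, found here by a linear scan of $S$; this routing is our reading of the invariant, and it is not claimed to preserve the linear time bound.

```python
def interval_forest_by_paths(n, parent, arcs):
    """Heads h[1..n], processing a maximal path partition path by path."""
    children = {u: [] for u in range(1, n + 1)}
    for v in range(2, n + 1):
        children[parent[v]].append(v)

    # Path partition; scanning tops in increasing preorder lists the
    # paths in increasing order of their top vertices.
    path_id, paths = {}, []
    for top in range(1, n + 1):
        if top in path_id:
            continue
        path, v = [], top
        while True:
            path_id[v] = len(paths)
            path.append(v)
            if not children[v]:
                break
            v = children[v][0]        # c(v): any child would do
        paths.append(path)

    def nca(x, y):                    # naive walk up D
        while x != y:
            if x > y:
                x = parent[x]
            else:
                y = parent[y]
        return x

    dsu = list(range(n + 1))

    def find(x):
        root = x
        while dsu[root] != root:
            root = dsu[root]
        while dsu[x] != root:         # path compression
            dsu[x], x = root, dsu[x]
        return root

    by_nca = {u: [] for u in range(1, n + 1)}
    for x, y in arcs:
        by_nca[nca(x, y)].append((x, y))
    R = {u: [] for u in range(1, n + 1)}
    h = {u: None for u in range(1, n + 1)}

    for pid in range(len(paths) - 1, -1, -1):   # bottom-up by top vertex
        path, S = paths[pid], []
        for u in reversed(path):                # bottom-up along the path
            for x, y in by_nca[u]:
                t = find(y)
                if path_id[t] == pid and t != u:
                    # t sits below u on this path: use its stack representative
                    t = max(w for w in S if w <= t)
                R[t].append(x)
            while R[u]:
                v = find(R[u].pop())
                if path_id[v] != pid:           # off the path: contract via DSU
                    h[v] = u
                    R[u].extend(R[v])
                    R[v] = []
                    dsu[v] = u
                elif v != u:                    # on the path: contract via stack
                    while S and S[-1] <= v:
                        w = S.pop()
                        h[w] = u
                        R[u].extend(R[w])
                        R[w] = []
            S.append(u)
        for u in reversed(path):                # delayed unites for this path
            if h[u] is not None:
                dsu[u] = h[u]
    return h
```

For instance, with the tree arcs $1\to2$, $2\to3$, $3\to4$, $2\to5$ and the non-tree arcs $4\to3$, $5\to4$, $3\to1$, the arc $(5,4)$ is added while processing $u=2$, at which point vertex $4$ has already been absorbed by the stack vertex $3$; filing $5$ under $3$ is what later yields $h(5)=1$.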

\begin{theorem}
\label{thm:ia} The interval analysis algorithm runs in $O(m)$ time
on a pointer machine.
\end{theorem}

\begin{proof}
The running time is linear except for the find operations: each
vertex gets added to $S$ once and has its head set at most once. To
bound the time for the find operations, we apply
Lemma~\ref{lemma:pc} to the tree built by the parent assignments done by
the unite operations. Mark the tops of all paths.  Since there are
at most $n/g$ paths, there are at most $n/g = n/\log^{1/3}{n}$
marked vertices.  We claim that $k = 4$ satisfies the hypothesis of
the lemma.  We need a property of the interval forest $H$: if $h(v)
= u$, then every vertex $w \neq u$ on the path in $D$ from $u$ to
$v$ is a descendant of $u$ in $H$. This holds because there is a path
containing only vertices in $D(u)$ from $w$ to $v$ (via $D$) to $u$.

The unites occur in batches, one initial batch for all the microtrees
and one batch per path.  Consider any vertex $v$.  We bound the
number of times the set containing $v$ in the DSU structure can
change, as a result of a batch of unites, before $v$ is in a set
with a marked vertex.  Vertex $v$ can change sets once as a result
of the initialization (from a singleton set to a larger set). After
the initialization, $v$ is in some set, whose designated vertex may
be fringe or core.  The first batch of unites that changes the set
containing $v$ puts $v$ in a set with a designated vertex $u$ that
is in the core, specifically on some path $P$. The second batch of
unites that changes the set containing $v$ puts $v$ in the same set
as $\ltop{P}$ (by the property above), and $v$ is now in a set with
a marked vertex. Thus $v$ can change sets at most three times before
it is in a set with a marked vertex.   The parent of $v$ can only change
once, as a result of a compression, without $v$ changing sets.
Therefore, the parent of $v$ can change at most four times before
$v$ is in a set with a marked vertex, so the claim is true.

With $k = 4$ and $\ell \le n/\log^{1/3}{n}$, Lemma~\ref{lemma:pc}
gives a bound of $O(m)$ on the time for the $\find$ operations.
\end{proof}

Interval analysis is an important component of program flow
analysis~\cite{aho:dragon2}. It also has other applications,
including testing flowgraph reducibility~\cite{reducibility:tarjan},
finding a pair of arc-disjoint spanning trees in a directed
graph~\cite{st:t}, and verifying a dominator tree~\cite{domv:gt05}.
Our interval analysis algorithm gives $O(m)$-time
algorithms on a pointer machine for these applications as well.

In the next section we shall need a compressed version $H'$ of the
interval forest, defined with respect to the fringe-core
partition: the parent $h'(v)$ of a vertex $v$ is its nearest core
ancestor in $H$ if it has one, $\nul$ otherwise.  We can easily
compute $H'$ from $H$ in linear time, but if we only want $H'$ and
not $H$, we can avoid the topological graph computation on the
microtrees: First, find the strong components of the graphs induced
by the vertex sets of the microtrees. For each such component, find
its smallest vertex $u$, and perform $\unite(u, v)$ for every other
vertex $v$ in the component.  Then run the algorithm above for the
core.  This computes $h(v) = h'(v)$ for every vertex $v$ with head
in the core. Complete the computation by setting $h'(v) = h'(u)$ for
each vertex $v \neq u$ in a fringe strong component with smallest
vertex $u$.
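The remark that $H'$ is easily obtained from $H$ in linear time can be sketched as follows. The sketch relies only on the fact that $h(v) < v$ in preorder, so a single pass in increasing vertex order sees $h'(h(v))$ before it is needed. The function name, the dictionary encoding of $H$, and the example core are assumptions for illustration; \texttt{None} stands for $\nul$.

```python
def compress_interval_forest(h, core):
    """Map each vertex to its nearest core ancestor in H, or None.

    h maps every vertex (a preorder number) to its head or None;
    core is the set of core vertices.
    """
    h_prime = {}
    for v in sorted(h):              # heads are numbered before their children
        p = h[v]
        if p is None:
            h_prime[v] = None        # v is a root of H
        elif p in core:
            h_prime[v] = p           # the head itself is a core vertex
        else:
            h_prime[v] = h_prime[p]  # skip over a fringe head
    return h_prime


# Hypothetical interval forest with fringe head 3 above vertex 4.
EXAMPLE_H = {1: None, 2: 1, 3: 1, 4: 3, 5: 1}
example_h_prime = compress_interval_forest(EXAMPLE_H, core={1, 2})
```

In this example vertex $4$ has the fringe vertex $3$ as its head, so its parent in $H'$ is $h'(3) = 1$.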

259: