1: \section{Interval Analysis}
2: \label{sec:ia}
3:
4: We turn now to two problems on flowgraphs. The first is
5: \emph{interval analysis}. Let $G =(V, A, r)$ be a flowgraph, and
6: let $D$ be a given depth-first search tree rooted at $r$. Identify
7: vertices by their preorder number with respect to the DFS: $v < w$
8: means that $v$ was visited before $w$.
\emph{Reverse preorder} of the vertices
is decreasing order by (preorder) vertex number.
11: For each vertex $v$, the
12: \emph{head} of $v$ is
13: \begin{eqnarray*}
14: h(v) = \max \{ u : u \neq v \mbox{ and there is a path
15: from } v \mbox{ to } u \mbox{ containing only descendants of } u \};
16: \end{eqnarray*}
17: $h(v) = \nul$ if this set is empty. The heads define a forest $H$
18: called the \emph{interval forest}: $h(v)$ is the parent of $v$ in
19: $H$. Each subtree $H(v)$ of $H$ induces a strongly connected
20: subgraph of $G$, containing only vertices in $D(v)$ (the descendants
21: of $v$ in $D$). See Figure~\ref{fig:iforest}. Tarjan~\cite{st:t}
22: proposed an algorithm that uses an NCA computation, incremental
23: backward search, and a DSU data structure to compute $H$ in
24: $O(m\alpha(m, n))$ time on a pointer machine. We shall add
25: microtrees, a maximal path partition of the core, and a stack to
26: Tarjan's algorithm, thereby improving its running time to $O(m)$ on
27: a pointer machine.
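As a reference point for faster methods, $h(v)$ can be evaluated straight from its definition. The Python sketch below is a hypothetical testing aid, not part of any algorithm discussed here. It assumes that vertices are the preorder numbers $0, \ldots, n-1$ with $r = 0$, that \texttt{parent[v]} is the parent of $v$ in $D$ (with \texttt{parent[0] == -1}), that \texttt{arcs} lists the arcs of $G$, and that \texttt{None} stands for $\nul$. Because the descendants of $u$ are exactly the vertices numbered $u$ through $u + |D(u)| - 1$, membership in $D(u)$ is an interval test. The search is far from linear and is meant only for checking small examples.

```python
def heads_by_definition(n, parent, arcs):
    """h(v) evaluated directly from its definition (reference only, slow)."""
    size = [1] * n
    for v in range(n - 1, 0, -1):  # children follow their parents in preorder
        size[parent[v]] += size[v]
    succ = [[] for _ in range(n)]
    for x, y in arcs:
        succ[x].append(y)

    def in_subtree(w, u):  # D(u) is the interval u .. u + size[u] - 1
        return u <= w < u + size[u]

    def reaches_within(v, u):
        # Depth-first search from v that never leaves D(u).
        seen, stack = {v}, [v]
        while stack:
            x = stack.pop()
            if x == u:
                return True
            for y in succ[x]:
                if in_subtree(y, u) and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False

    h = [None] * n
    for v in range(n):
        # Any valid u is a proper ancestor of v, so u < v; try the largest first.
        for u in range(v - 1, -1, -1):
            if in_subtree(v, u) and reaches_within(v, u):
                h[v] = u
                break
    return h
```

For instance, with \texttt{parent = [-1, 0, 1, 0]} and arcs $(0,1)$, $(1,2)$, $(0,3)$, $(2,1)$, $(1,0)$, $(3,2)$, the function returns \texttt{[None, 0, 1, 0]}.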
28:
29: \begin{figure}[t]
30: \begin{center}
31: \scalebox{0.825}[0.825]{\input{iforest.pstex_t}}
32: \end{center} \caption{(a) A DFS tree $D$
33: of the input flowgraph $G$;
34: non-tree arcs are dashed. (b)
35: The interval forest $H$ of $G$ with respect to $D$;
36: arrows are parent pointers.
37: \label{fig:iforest}}
38: \end{figure}
39:
40: Tarjan's algorithm proceeds as follows. Delete all the arcs
41: from the graph. For each vertex $u$, form a set of all deleted arcs
42: $(x, y)$ such that $\nca{x, y} = u$. Process the vertices in any
43: bottom-up order; reverse preorder will do. To process a vertex $u$,
44: add back to the graph arcs corresponding to all the deleted arcs
45: $(x, y)$ with $\nca{x, y} = u$. Then examine each arc $(v, u)$
46: entering $u$. If $v \neq u$, set $h(v) = u$, and contract $v$ into
47: $u$; for all arcs having $v$ as an end, replace $v$ by $u$. This may
48: create multiple arcs and loops, which poses no difficulty for the
49: algorithm. Continue until all arcs into $u$ have been examined,
50: including those formed by contraction. When adding arcs back to the
51: graph, the arc corresponding to an original arc is the one formed by
52: doing end replacements corresponding to all the contractions done so
53: far.
54:
55:
56: To keep track of contractions, Tarjan's algorithm uses a DSU
57: structure whose elements are the graph vertices. The algorithm also
58: uses a reverse adjacency set $R(u)$, initially empty, for each
59: vertex $u$. A more detailed description of the algorithm is as
60: follows. To process $u$, for each arc $(x,y)$ such that $\nca{x, y}
61: = u$, add $x$ to $R(\find(y))$. (The replacement for $x$ is done
62: later.) Then, while $R(u)$ is non-empty, delete a vertex $x$ from
63: $R(u)$; let $v \leftarrow \find(x)$; if $v \neq u$, set $h(v)
64: \leftarrow u$, set $R(u) \leftarrow R(u) \cup R(v)$, and do
65: $\unite(u,v)$.
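The detailed description translates almost line by line into code. The following Python sketch assumes vertices numbered $0, \ldots, n-1$ in preorder with root $0$, a \texttt{parent} array for $D$ (with $-1$ at the root), and a list \texttt{arcs}; the function name is hypothetical. It deliberately substitutes simple components for the efficient ones: NCAs are found by climbing the tree, the DSU structure uses path compression with naive linking, and Python lists stand in for the sets $R(u)$, so it makes no attempt to meet the time bounds discussed in this section.

```python
def interval_forest_tarjan(n, parent, arcs):
    """Heads h(v) by Tarjan's contraction algorithm (sketch, not linear time)."""
    def nca(x, y):
        # Ancestors have smaller preorder numbers, so climb from the larger one.
        while x != y:
            if x > y:
                x = parent[x]
            else:
                y = parent[y]
        return x

    dsu = list(range(n))  # each set is named by its root

    def find(x):
        root = x
        while dsu[root] != root:
            root = dsu[root]
        while x != root:  # path compression
            nxt = dsu[x]
            dsu[x] = root
            x = nxt
        return root

    by_nca = [[] for _ in range(n)]
    for x, y in arcs:
        by_nca[nca(x, y)].append((x, y))

    h = [None] * n
    R = [[] for _ in range(n)]  # reverse adjacency sets
    for u in range(n - 1, -1, -1):  # reverse preorder is bottom-up
        for x, y in by_nca[u]:
            R[find(y)].append(x)  # x itself is replaced later, via find
        while R[u]:
            v = find(R[u].pop())
            if v != u:  # contract v into u
                h[v] = u
                R[u].extend(R[v])
                R[v].clear()
                dsu[v] = u  # unite(u, v): u names the merged set
    return h
```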
66:
67: With the sets $R(u)$ represented as singly linked circular lists (so
68: that set union takes constant time), the running time of this
69: algorithm on a pointer machine is linear except for the NCA
70: computations and the DSU operations, which take $O(m \alpha(m, n))$
71: time in Tarjan's original implementation. We shall reduce the
72: running time to linear
73: by using microtrees to eliminate redundant computation
74: and by reordering the unites into a bottom-up order.
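A minimal Python sketch of such lists, assuming each set is represented by a pointer to the last node (the tail) of its circular list; the class and function names are hypothetical. Union swaps two \texttt{next} pointers, and deleting an element removes the node after the tail, so both operations take constant time.

```python
class Node:
    """One element of a singly linked circular list."""
    __slots__ = ("item", "next")

    def __init__(self, item):
        self.item = item
        self.next = self  # a one-element circular list


def union(a, b):
    """Splice the lists with tail nodes a and b (None = empty) in O(1)."""
    if a is None:
        return b
    if b is None:
        return a
    a.next, b.next = b.next, a.next  # swap the two head pointers
    return b  # tail of the combined list


def delete_any(tail):
    """Remove the head node (the one after tail); return (item, new tail)."""
    head = tail.next
    if head is tail:
        return head.item, None  # the list is now empty
    tail.next = head.next
    return head.item, tail


def items(tail):
    """List the items from head to tail, for inspection only."""
    out = []
    if tail is None:
        return out
    node = tail.next
    while True:
        out.append(node.item)
        if node is tail:
            return out
        node = node.next
```

In this encoding, $R(u) \leftarrow R(u) \cup R(v)$ corresponds to \texttt{union}, and deleting a vertex $x$ from $R(u)$ corresponds to \texttt{delete\_any}.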
75:
As in Section~\ref{sec:nca}, partition $D$ into a set of
77: bottom-level microtrees (the fringe), each with fewer than $g =
78: \log^{1/3}{n}$ vertices, and $D'$, the remainder of $D$ (the core).
79: Use a topological graph computation to compute $h(v)$ for every
80: vertex $v$ such that $h(v)$ is in the fringe.
81: The definition of heads implies that for any such vertex
82: $v$, $h(v)$ and $v$ are in the same microtree,
83: and furthermore that
84: the only information
85: needed to compute heads in the fringe is, for each microtree, the
subgraph induced by its vertices, with each non-tree arc marked
by a bit. With
88: $g = \log^{1/3}{n}$, this computation takes $O(m)$ time by
89: Theorem~\ref{thm:tgc2}.
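Section~\ref{sec:nca} describes the partition itself. Purely as an illustration, the Python sketch below obtains bottom-level microtrees under the assumption (ours) that $v$ is a fringe vertex exactly when $|D(v)| < g$; the function name and the encoding (\texttt{micro[v]} is the root of the microtree containing $v$, or \texttt{None} for core vertices) are hypothetical. Under this rule every microtree has fewer than $g$ vertices, and the fringe is closed under taking descendants, so the core is a subtree containing $r$.

```python
def fringe_core_partition(parent, g):
    """Bottom-level microtrees and core of a preorder-numbered tree (sketch).

    Assumption for illustration: v is a fringe vertex exactly when |D(v)| < g.
    micro[v] is the root of the microtree containing v, or None if v is core.
    """
    n = len(parent)
    size = [1] * n
    for v in range(n - 1, 0, -1):  # children follow their parents in preorder
        size[parent[v]] += size[v]
    micro = [None] * n
    for v in range(n):  # parents are handled before their children
        if size[v] < g:
            p = parent[v]
            # v roots a microtree unless its parent is itself in the fringe
            micro[v] = micro[p] if p >= 0 and micro[p] is not None else v
    core = [v for v in range(n) if micro[v] is None]
    return micro, core
```

With this rule each leaf of $D'$ has at least $g$ descendants in $D$, and the descendant sets of distinct leaves of $D'$ are disjoint, so $D'$ has at most $n/g$ leaves.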
90:
91: It remains to compute heads for vertices whose heads are in the
92: core. Our approach is to run Tarjan's algorithm starting from the
93: state it would have reached after processing the fringe. This
94: amounts to contracting all the strong components in the fringe and
95: then running the algorithm. This approach does not quite work as
96: stated, because the DSU operations are not restricted enough for
97: Lemma \ref{lemma:pc} to apply. To overcome this difficulty, we
98: partition the core into maximal paths. Then we run Tarjan's
99: algorithm path-by-path, keeping track of contractions with a hybrid
100: structure consisting of a DSU structure that maintains contractions
101: outside the path being processed and a stack that maintains
102: contractions inside the path being processed. The latter structure
103: functions in the same way as the one Gabow used in his
algorithm~\cite{pathdfs:g00} for finding strong components. Now we
105: give the complete description of our algorithm.
106:
107: Partition the vertices in $D'$ into a set of maximal paths by
108: choosing, for each non-leaf vertex $v$ in $D'$, a child $c(v)$ in
109: $D'$. (Any child will do.) The arcs $(v, c(v))$ form a set of paths
110: that partition the vertices in $D'$. For such a path $P$, we denote
111: the smallest and largest vertices on $P$ by $\ltop{P}$ and
112: $\lbottom{P}$, respectively; $\lbottom{P}$ is a leaf of $D'$. Since
113: $D'$ has at most $n/g$ leaves, the number of paths is at most $n/g$.
114: Partitioning $D'$ into paths takes $O(n)$ time.
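A Python sketch of the path partition, assuming the core is given as a Boolean array \texttt{is\_core} over preorder numbers and that $c(v)$ is taken to be the smallest core child of $v$ (any core child would do); the function name is hypothetical. Each path is listed from $\ltop{P}$ down to $\lbottom{P}$, and a core vertex starts a path exactly when it is not $c(w)$ for any $w$.

```python
def core_paths(parent, is_core):
    """Maximal path partition of the core, each path listed from top(P) to bottom(P)."""
    n = len(parent)
    c = [None] * n  # c[v]: the chosen core child of v
    for v in range(n - 1, 0, -1):  # smaller children are seen last and win
        if is_core[v]:  # the core is closed under taking parents
            c[parent[v]] = v
    chosen = {w for w in c if w is not None}
    paths = []
    for v in range(n):
        if is_core[v] and v not in chosen:  # v is the top of a path
            path = [v]
            while c[path[-1]] is not None:
                path.append(c[path[-1]])
            paths.append(path)
    return paths
```

For the tree with \texttt{parent = [-1, 0, 1, 2, 3, 1, 5, 0]} and every vertex in the core, the paths are $(0,1,2,3,4)$, $(5,6)$, and $(7)$.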
115:
116: After constructing a maximal path partition of the core, initialize
117: a DSU structure containing every vertex (fringe and core) as a
118: singleton set. Visit the fringe vertices in bottom-up order, and,
119: for each fringe vertex $v$ with $h(v)$ also in the fringe, perform
120: $\unite(h(v), v)$; for such a vertex, $h(v)$ has already been
121: computed. Initialize $R(u) \leftarrow \emptyset$ for every vertex
122: $u$. For every arc $(x, y)$ with $x$ and $y$ in the same microtree,
123: add $x$ to $R(\find(y))$. For every remaining arc $(x, y)$, compute
124: $u=\nca{x, y}$ and add $(x, y)$ to the set of arcs associated with
125: $u$. These NCA computations take $O(m)$ time using the algorithm of
Section~\ref{sec:nca}. Indeed, every such NCA query is big, so the AHU
127: algorithm answers them in linear time. \ignore{(The NCA computation does
128: need a separate DSU structure, whose underlying tree is $D$; the DSU
129: structure for Tarjan's algorithm has $H$ as the underlying tree.)}
130: This completes the initialization.
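The initialization can be sketched in Python as follows, with hypothetical names and simplified components: \texttt{micro[v]} identifies the microtree containing $v$ (\texttt{None} in the core), \texttt{h} holds heads already computed in the fringe, NCAs are found by naive climbing rather than by the algorithm of Section~\ref{sec:nca}, and Python lists stand in for the sets $R(u)$. The unites are plain links: in bottom-up order neither $v$ nor $h(v) < v$ has been linked yet, so both still name their sets.

```python
def initialize_core_phase(parent, arcs, micro, h):
    """Initialization before the core is processed (sketch, names hypothetical).

    micro[v] identifies the microtree containing v (None for core vertices);
    h[v] is the head of v when it has already been computed in the fringe.
    Returns the DSU parent array, the sets R as lists, and the arcs by NCA.
    """
    n = len(parent)
    dsu = list(range(n))  # every vertex starts as a singleton set

    def find(x):
        root = x
        while dsu[root] != root:
            root = dsu[root]
        while x != root:  # path compression
            nxt = dsu[x]
            dsu[x] = root
            x = nxt
        return root

    def nca(x, y):  # naive: climb from the larger preorder number
        while x != y:
            if x > y:
                x = parent[x]
            else:
                y = parent[y]
        return x

    for v in range(n - 1, -1, -1):  # fringe vertices, bottom-up
        u = h[v]
        if micro[v] is not None and u is not None and micro[u] is not None:
            dsu[v] = u  # unite(h(v), v): neither has been linked yet

    R = [[] for _ in range(n)]
    by_nca = [[] for _ in range(n)]
    for x, y in arcs:
        if micro[x] is not None and micro[x] == micro[y]:
            R[find(y)].append(x)  # arc inside one microtree
        else:
            by_nca[nca(x, y)].append((x, y))
    return dsu, R, by_nca
```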
131:
132: Now process each path $P$ in the path partition, in bottom-up order
133: with respect to $\ltop{P}$. To process a path $P$, initialize
134: an empty
135: stack $S$. Process each vertex $u$ of $P$ in bottom-up
136: order. To process $u$, for each arc $(x, y)$ such that $\nca{x, y} =
137: u$, add $x$ to $R(\find(y))$. Then, while $R(u)$ is non-empty,
138: delete a vertex $x$ from $R(u)$. Let $v \leftarrow \find(x)$. If
139: $v$ is not on $P$, set $h(v) \leftarrow u$, set $R(u) \leftarrow
140: R(u) \cup R(v)$, and do $\unite(u, v)$. If, on the other hand, $v$
is on $P$, $v \neq u$, $S$ is non-empty, and $v$ is greater than or equal to the top
vertex on $S$, pop from $S$ each vertex $w$ less than or equal to $v$, set
143: $h(w) \leftarrow u$, and set $R(u) \leftarrow R(u) \cup R(w)$. Once
144: $R(u)$ is empty, push $u$ onto $S$. After processing all vertices
145: on $P$, visit each vertex $u$ on $P$ again, in bottom-up order, and
146: if $h(u)$ is now defined, perform $\unite(h(u), u)$.
See Figure~\ref{fig:ia}.
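A compact Python sketch of the path-by-path processing follows. To keep it short, it covers only the special case in which every vertex is in the core (no microtrees), takes $c(v)$ to be the first child of $v$, finds NCAs by naive climbing, and uses Python lists for the sets $R(u)$; all names are hypothetical. When adding the arcs $(x, y)$ with $\nca{x, y} = u$, it adds $x$ to $R(w)$, where $w$ is the vertex into which $y$ has currently been contracted according to the invariant stated below (located by a linear scan of $S$), rather than to $R(\find(y))$; the two differ only when $\find(y)$ is a path vertex that has already been popped from $S$. This choice favors clarity over the time bound.

```python
def interval_forest_by_paths(n, parent, arcs):
    """Heads h(v) by path-by-path processing with a stack; every vertex is core."""
    def nca(x, y):
        while x != y:  # climb from the larger preorder number
            if x > y:
                x = parent[x]
            else:
                y = parent[y]
        return x

    dsu = list(range(n))

    def find(x):
        root = x
        while dsu[root] != root:
            root = dsu[root]
        while x != root:  # path compression
            nxt = dsu[x]
            dsu[x] = root
            x = nxt
        return root

    # Path partition with c(v) = first child: v continues the path of v - 1
    # exactly when parent[v] == v - 1, so each path is a run of numbers.
    paths, path_of = [], [0] * n
    for v in range(n):
        if v == 0 or parent[v] != v - 1:
            paths.append([])
        path_of[v] = len(paths) - 1
        paths[-1].append(v)

    by_nca = [[] for _ in range(n)]
    for x, y in arcs:
        by_nca[nca(x, y)].append((x, y))

    def contracted_into(y, u, pid, S):
        # Rule from the invariant; S[-1] is the top (smallest) stack vertex.
        v = find(y)
        if path_of[v] != pid:
            return v
        best = u
        for w in reversed(S):  # stack vertices in increasing order
            if w > v:
                break
            best = w
        return best

    h = [None] * n
    R = [[] for _ in range(n)]
    for pid in range(len(paths) - 1, -1, -1):  # decreasing top(P)
        P, S = paths[pid], []
        for u in reversed(P):  # bottom-up along P
            for x, y in by_nca[u]:
                R[contracted_into(y, u, pid, S)].append(x)
            while R[u]:
                v = find(R[u].pop())
                if path_of[v] != pid:  # off P: contract and unite at once
                    h[v] = u
                    R[u].extend(R[v])
                    R[v].clear()
                    dsu[v] = u
                elif v != u and S and v >= S[-1]:
                    while S and S[-1] <= v:  # contract popped vertices into u
                        w = S.pop()
                        h[w] = u
                        R[u].extend(R[w])
                        R[w].clear()
            S.append(u)
        for u in reversed(P):  # delayed unites, bottom-up
            if h[u] is not None:
                dsu[u] = h[u]
    return h
```

For example, with \texttt{parent = [-1, 0, 1, 0]} and arcs $(0,1)$, $(1,2)$, $(0,3)$, $(2,1)$, $(1,0)$, $(3,2)$, it returns \texttt{[None, 0, 1, 0]}; here the arc $(3,2)$ is added while vertex $2$ has already been contracted into vertex $1$.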
148:
149: \begin{figure}
150: %\begin{center}
151: \scalebox{0.82}[0.82]{\input{ia.pstex_t}}
152: %\end{center}
153: \caption{
154: Idealized execution of the algorithm on the graph in (a),
155: with circled microtree.
156: Arcs depict the effects of contractions:
157: whenever $x\in R(y)$, $(\find(x),\find(y))$ is an arc
158: in the corresponding graph.
159: The first vertex in each labeled set is the corresponding original vertex
160: in (a).
161: (a$\rightarrow$b) During preprocessing, $h(v)\leftarrow u_1$, and $v$ is inserted
162: into the set of $u_1$.
163: (b$\rightarrow$c) When processing $u_2$, $h(u_1)\leftarrow u_2$ via the arc
164: $(v,u_2)$.
165: (c$\rightarrow$d) When processing $u_3$, the stack $S$ is (top-down)
166: $(u_2,\lbottom{P})$. Hence, when processing the arc
167: $(\lbottom{P},u_3)$, $S$ is popped so that
168: $h(u_2)\leftarrow u_3$ and $h(\lbottom{P})\leftarrow u_3$.
169: (d) shows the state
170: after doing the $\unite(\cdot)$'s for path $P$.
171: (d$\rightarrow$e) When processing $u_4$, $S$ is $(w,z,\lbottom{Q})$.
172: Arc $(u_2,u_4)$ sets $h(u_3)\leftarrow u_4$ and
173: adds $\ltop{P}$ and $z$ to $R(u_4)$.
174: Processing $\ltop{P}$ causes
175: $h(\ltop{P})\leftarrow u_4$,
176: and processing $z$ pops the stack so that
177: $h(w)\leftarrow u_4$ and $h(z)\leftarrow u_4$.
178: (f) After processing path $Q$.
179: \label{fig:ia}}
180: \end{figure}
181:
182: This algorithm delays the unites for vertices on a path until the
183: entire path is processed, using the stack to keep track of the
184: corresponding contractions. Specifically, the algorithm maintains
185: the following invariant: if vertex $u$ on path $P$ is currently
186: being processed and $x$ is any original vertex, then the vertex into
187: which $x$ has been contracted is $v = \find(x)$ if $v$ is not on
$P$, or the largest vertex on $S$ less than or equal to $v$ if $v$
is on $P$ and $S$ contains such a vertex, or $u$ otherwise. It is
190: straightforward to verify this invariant by induction on time; the
191: correctness of this implementation of Tarjan's algorithm follows.
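The rule in this invariant can be written as a small function. In the Python sketch below, all names are hypothetical: \texttt{v} is $\find(x)$, \texttt{v\_on\_path} says whether $v$ is on $P$, \texttt{S} lists the stack from bottom to top (so its last element is its smallest vertex), and \texttt{u} is the vertex currently being processed. The linear scan is for clarity only; since $S$ is sorted, a binary search would also work.

```python
def contracted_into(v, v_on_path, S, u):
    """Vertex into which x has been contracted, given v = find(x).

    S lists the stack from bottom to top, so S[-1] is its smallest vertex;
    u is the path vertex currently being processed.
    """
    if not v_on_path:
        return v  # the DSU structure already names the contracted vertex
    best = u  # no stack vertex <= v means x has been contracted into u
    for w in reversed(S):  # stack vertices in increasing order
        if w > v:
            break
        best = w
    return best
```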
192:
193: \begin{theorem}
194: \label{thm:ia} The interval analysis algorithm runs in $O(m)$ time
195: on a pointer machine.
196: \end{theorem}
197:
198: \begin{proof}
199: The running time is linear except for the find operations: each
vertex is pushed onto $S$ at most once and has its head set at most once. To
201: bound the time for the find operations, we apply Lemma
202: \ref{lemma:pc} to the tree built by the parent assignments done by
203: the unite operations. Mark the tops of all paths. Since there are
at most $n/g$ paths, the number $\ell$ of marked vertices satisfies
$\ell \le n/g = n/\log^{1/3}{n}$. We claim that $k = 4$ satisfies the hypothesis of
206: the lemma. We need a property of the interval forest $H$: if $h(v)
207: = u$, then every vertex $w \neq u$ on the path in $D$ from $u$ to
$v$ is a descendant of $u$ in $H$. This holds because there is a path
from $w$ to $u$ containing only vertices in $D(u)$: follow tree arcs of $D$
from $w$ down to $v$, then a path from $v$ to $u$ within $D(u)$, which exists
since $h(v) = u$.
210:
The unites occur in batches, one initial batch for all the microtrees
212: and one batch per path. Consider any vertex $v$. We bound the
213: number of times the set containing $v$ in the DSU structure can
214: change, as a result of a batch of unites, before $v$ is in a set
215: with a marked vertex. Vertex $v$ can change sets once as a result
216: of the initialization (from a singleton set to a larger set). After
217: the initialization, $v$ is in some set, whose designated vertex may
218: be fringe or core. The first batch of unites that changes the set
219: containing $v$ puts $v$ in a set with a designated vertex $u$ that
220: is in the core, specifically on some path $P$. The second batch of
221: unites that changes the set containing $v$ puts $v$ in the same set
222: as $\ltop{P}$ (by the property above), and $v$ is now in a set with
a marked vertex. Thus $v$ can change sets at most three times before it is
224: in a set with a marked vertex. The parent of $v$ can only change
225: once, as a result of a compression, without $v$ changing sets.
226: Therefore, the parent of $v$ can change at most four times before
227: $v$ is in a set with a marked vertex, so the claim is true.
228:
229: With $k = 4$ and $\ell \le n/\log^{1/3}{n}$, Lemma \ref{lemma:pc}
230: gives a bound of $O(m)$ on the time for the $\find$ operations.
231: \end{proof}
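The property of $H$ used in the proof of Theorem~\ref{thm:ia} can be checked mechanically on small examples. The Python sketch below is a hypothetical testing aid, assuming \texttt{parent} encodes $D$ over preorder numbers, \texttt{h} encodes $H$, and \texttt{None} stands for $\nul$. It walks up $D$ from each $v$ to $h(v)$ and, for each vertex met on the way, follows heads upward to see whether it reaches $h(v)$.

```python
def has_head_path_property(parent, h):
    """Check: if h(v) = u, every w != u on the D-path from u to v is below u in H."""
    def below_in_h(w, u):
        # Follow heads upward from w; reaching u means w is a proper descendant.
        while w is not None:
            w = h[w]
            if w == u:
                return True
        return False

    for v, u in enumerate(h):
        if u is None:
            continue
        w = v
        while w != u:  # walk up D from v to u
            if not below_in_h(w, u):
                return False
            w = parent[w]
    return True
```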
232:
233:
234:
235:
236: Interval analysis is an important component of program flow
237: analysis~\cite{aho:dragon2}. It also has other applications,
including testing flowgraph reducibility~\cite{reducibility:tarjan},
finding a pair of arc-disjoint spanning trees in a directed
graph~\cite{st:t}, and verifying a dominator tree~\cite{domv:gt05}.
241: Our interval analysis algorithm gives $O(m)$-time
242: algorithms on a pointer machine for these applications as well.
243:
244:
245: In the next section we shall need a compressed version of the
246: interval forest $H'$ that is defined with respect to the fringe-core
247: partition: the parent $h'(v)$ of a vertex $v$ is its nearest core
248: ancestor in $H$ if it has one, $\nul$ otherwise. We can easily
249: compute $H'$ from $H$ in linear time, but if we only want $H'$ and
250: not $H$, we can avoid the topological graph computation on the
microtrees. First, find the strong components of the graphs induced
252: by the vertex sets of the microtrees. For each such component, find
253: its smallest vertex $u$, and perform $\unite(u, v)$ for every other
254: vertex $v$ in the component. Then run the algorithm above for the
255: core. This computes $h(v) = h'(v)$ for every vertex $v$ with head
256: in the core. Complete the computation by setting $h'(v) = h'(u)$ for
257: each vertex $v \neq u$ in a fringe strong component with smallest
258: vertex $u$.
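Computing $H'$ from $H$ takes one pass in increasing vertex order, because $h(v)$ is a proper ancestor of $v$ in $D$ and hence $h(v) < v$. A Python sketch with hypothetical names, assuming \texttt{h} encodes $H$ over preorder numbers (with \texttt{None} for $\nul$) and \texttt{is\_core} marks the core vertices:

```python
def compressed_interval_forest(h, is_core):
    """h'(v): nearest proper ancestor of v in H that is a core vertex, or None."""
    hc = [None] * len(h)
    for v, u in enumerate(h):  # h(v) < v, so hc[h(v)] is already final
        if u is None:
            hc[v] = None
        elif is_core[u]:
            hc[v] = u
        else:
            hc[v] = hc[u]
    return hc
```

For the heads \texttt{[None, None, 1, 2, 3, 1, 1, None]} with core $\{0, 1, 2\}$, it returns \texttt{[None, None, 1, 2, 2, 1, 1, None]}.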
259: