cs0208004/race.tex
1: \documentclass[11pt]{article}
2: \usepackage{epsfig,color,fullpage,citesort}
3: %\usepackage{pst-node}
4: \def\myendproof{{\ \vbox{\hrule\hbox{%
5:    \vrule height1.3ex\hskip0.8ex\vrule}\hrule }}\par}
6: \newtheorem{theorem}{Theorem}[section]
7: \newtheorem{lemma}[theorem]{Lemma}
8: \newtheorem{corollary}[theorem]{Corollary}
9: \newenvironment{proof}{{\it Proof. }}{\myendproof}
10: %\documentstyle[psfig,times,doublespace]{article}
11: %\setcounter{topnumber}{1}
12: %\setcounter{bottomnumber}{1}
13: %\setcounter{totalnumber}{1}
14: %\newtheorem{mytheorem}{Theorem}
15: %\newtheorem{mylemma}{Lemma} 
16: \newcommand{\qed}{\myendproof}
17: \newcommand{\setof}[1]{\{{#1}\}}
18: \newcommand{\set}[2]{\{{{#1}:{#2}}\}} 
19: \newcommand{\fname}[1]{{\sc #1}}
20: %\setlength{\marginparwidth}{.65in}
21: %\newcommand{\note}[1]{\marginpar{\renewcommand{\baselinestretch}{0.8}
22: %                          \footnotesize\sf #1}}
23: %\reversemarginpar
24: %\newcommand{\note}[1]{\marginpar{#1}}
25: \newcommand{\note}[1]{}
26: \newcommand{\maxof}[1]{\max\setof{#1}}
27: \newcommand{\notchb}{\not\chb}
28: \newcommand{\chb}{\rightarrow}
29: \newcommand{\opt}[1]{h({#1})}
30: \newcommand{\mmcc}{{minimal maximum cumulative cost}}
31: \newcommand{\cost}[1]{{c({#1})}}
32: \newcommand{\rank}[1]{{r({#1})}}
33: \newcommand{\height}[1]{{h({#1})}}
34: \newcommand{\hb}{\mbox{\rm hb}}
35: \newcommand{\fall}[1]{{\tilde{h}(#1)}}
36: \newcommand{\length}[1]{|{#1}|}
37: \newcommand{\myheading}[1]{\noindent {\bf #1}}
38: \newcommand{\almostone}{1+\epsilon}
39: \newcommand{\comment}[1]{}
40: \newcommand{\Xomit}[1]{}
41: \newcommand{\first}[2]{\mbox{\it first}_{#1}(#2)}
42: \newcommand{\init}{\bot}
43: \newcommand{\fini}{\top}
44: \newcommand{\pr}[2]{\mbox{\it pred}_{#1}(#2)}
45: \newcommand{\su}[2]{\mbox{\it succ}_{#1}(#2)}
46: \newcommand{\prr}[1]{\mbox{\it pred}(#1)}
47: \newcommand{\suu}[1]{\mbox{\it succ}(#1)}
48: \newcommand{\prfx}[2]{[-,{#1}]_{#2}}
49: \newcommand{\sufx}[2]{[{#1},-]_{#2}}
50: \newcommand{\prfxx}[1]{[-,{#1}]}
51: \newcommand{\sufxx}[1]{[{#1},-]}
52: \newcommand{\head}[1]{\mbox{\it start}(#1)}
53: \newcommand{\tail}[1]{\mbox{\it end}(#1)}
54: \newcommand{\algo}[1]{{\sc{#1}}}
55: \newcommand{\cut}{\Gamma}
56: \newcommand{\jopt}[1]{h_j(G[{#1}])}
57: \newcommand{\decomp}[1]{\mbox{\algo{Decomp}}(#1)}
58: \newcommand{\merge}[1]{\mbox{\algo{Merge}}(#1)}
59: \newcommand{\minheight}[1]{\mbox{\algo{MinHeight}}(#1)}
60: \newcommand{\best}[1]{\mbox{\algo{Best}}(#1)}
61: \newcommand{\cuttrans}[1]{\mbox{\algo{CutTrans}}(#1)}
62: \newcommand{\chainpair}[1]{\mbox{\algo{ChainPair}}(#1)}
63: \newcommand{\construct}[1]{\mbox{\algo{Construct}}(#1)}
64: \newcommand{\removerange}[1]{\mbox{\algo{RemoveRange}}(#1)}
65: \newcommand{\newcut}[1]{\mbox{\algo{Newcut}}(#1)}
66: \newtheorem{claim}{Claim}
67: 
68: %\title{Detecting Race Conditions in Parallel Programs that Use
69: %Semaphores}
70: %%
71: %\author{Philip N. Klein \and
72: %  Hsueh-I Lu \and Robert H.B. Netzer
73: %     \thanks{Department of Computer Science, Brown University, Providence,
74: %       RI 02912-1910, USA. Email: {\tt\{klein,hil,rn\}@cs.brown.edu}. Fax:
75: %       (401)863-7657.
76: %     }
77: %}
78: 
79: \title{Detecting Race Conditions in Parallel Programs that Use
80: Semaphores\thanks{Preliminary versions of this paper appeared
81: in~\cite{Lu:19xx:DRC,Klein:19xx:RCD}.%
82: %Most of the research was performed while the second author 
83: %was with Department of Computer Science, Brown
84: %University.
85: }}
86: \author{Philip N. Klein\thanks{Department of Computer Science, Brown
87:      University, 
88:      Providence, RI 02912, USA. Email:
89:      klein@cs.brown.edu.}
90: \and
91:   Hsueh-I Lu\thanks{Corresponding author. Institute of Information Science, Academia
92:   Sinica, Taipei 115, Taiwan. Email: hil@iis.sinica.edu.tw. URL: www.iis.sinica.edu.tw/\~{ }hil/ }
93: \and
94:   Robert H.B. Netzer\thanks{Department of Computer Science, Brown University, 
95:      Providence, RI 02912, USA. Email:
96:      rn@cs.brown.edu.}
97: }
98: 
99: \begin{document}
100: \maketitle
101: \begin{abstract}
102: We address the problem of detecting race conditions in programs that
103: use semaphores for synchronization. Netzer and Miller showed that it
104: is NP-complete to detect race conditions in programs that use many
105: semaphores. We show in this paper that it remains NP-complete even if
106: only two semaphores are used in the parallel programs.
107: 
108: For the tractable case of a single semaphore, we give two
109: algorithms for detecting race conditions from the trace of executing a
110: parallel program on $p$ processors, where $n$ semaphore operations are
111: executed.  The first algorithm determines in $O(n)$ time whether a
112: race condition exists between any two given operations. The second
113: algorithm runs in $O(np\log n)$ time and outputs a compact
114: representation from which one can determine in $O(1)$ time whether a
115: race condition exists between any two given operations. The second
116: algorithm is near-optimal in that the running time is only $O(\log n)$
117: times the time required simply to write down the output.
118: %
119: %This paper combines the results in the two preliminary
120: %versions~\cite{Lu:19xx:DRC,Klein:19xx:RCD}.
121: %
122: %{\em Keywords:} Parallel debugging, Synchronization, Race conditions,
123: %Semaphores, NP-completeness, Polynomial-time algorithms, Scheduling
124: \end{abstract}
125: 
126: \section{Introduction}
127: Race detection is crucial in developing and debugging shared-memory
128: parallel
129: programs~\cite{Simmons:1996:DPT,Savage:1997:EDD,Emrath:1992:DNP,Netzer:1992:WRC,Itzkovitz:1999:TID,Ha:2002:SEF}. Explicit
130: synchronization is usually added to such programs to coordinate access
131: to shared data.  For example, when using a semaphore, a $V$-operation
132: increments the semaphore, and a $P$-operation waits until the
133: semaphore is greater than zero and then decrements the
134: semaphore. $P$-operations are typically used to wait (synchronize)
135: until some condition is true (such as a shared buffer becoming
136: non-empty), and $V$-operations typically signal that some condition is
137: now true.  Race conditions result when this synchronization does not
138: force concurrent processes to access data in the expected order.  One
139: way to dynamically detect races in a program is to trace its execution
140: and analyze the traces afterward.  A central part of dynamic race
141: detection is to compute from the trace the order in which
142: shared-memory accesses were guaranteed by the execution's
143: synchronization to have executed. Accesses to the same location not
144: guaranteed to execute in some particular order are considered a race.
145: When programs use semaphore operations for synchronization, some
146: operations (belonging to different processes) could potentially have
147: executed in an order different from the one that was traced.
148: 
149: 
150: In this paper, we address the tractability of detecting race
151: conditions from the traces of parallel programs that use semaphores.
152: Let $p$ be the number of processors used to execute the parallel
153: program, and let $n$ be the total number of semaphore operations
154: performed in the execution. The trace can then be represented by a
155: directed $n$-node graph $G$ consisting of $p$ disjoint chains, each of which
156: represents the sequence of semaphore operations executed by a
157: processor.  A {\em schedule} of $G$ is a linear ordering of all nodes
158: in $G$ consistent with the precedence constraints imposed by the arcs
159: of $G$.  A prefix of a schedule of $G$ is a {\em subschedule} of $G$.
160: A subschedule of $G$ is {\em valid} if at each point in the
161: subschedule, the number of $V$ operations is never exceeded by the
162: number of $P$ operations for each semaphore (i.e., all semaphores are
163: always nonnegative).  Then, if the trace indicates that $v$ preceded
164: $w$ in the actual execution, but a valid subschedule\footnote{We
165: consider subschedules rather than schedules because deadlocks might
166: happen during the execution of parallel programs.}  exists in which
167: $w$ precedes $v$, then $v$ and $w$ could have executed in either
168: order, i.e., there is a {\em race condition} between $v$ and $w$.
169: Netzer and Miller showed that detecting race conditions in parallel
170: programs that use multiple semaphores is
171: NP-complete~\cite{Netzer:1990:CEO}.  Researchers have developed exact
172: algorithms for cases where the problem is efficiently solvable
173: (programs that use types of synchronization weaker than semaphores
174: such as
175: post/wait/clear)~\cite{Netzer:1992:ERC,Helmbold:1993:CSO,Helmbold:1996:TRC},
176: and heuristics for the multiple semaphore
177: case~\cite{Emrath:19xx:ESA,Helmbold:19xx:ATA}.  The complexity of the
178: case with a constant number of semaphores was unknown. In the present
179: paper, we show that the problem remains NP-complete even if only two
180: semaphores are used in the parallel program.
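
The definitions above can be checked mechanically on small traces.
The following Python sketch is only an illustration, not one of the
algorithms of this paper: it searches exhaustively (in exponential time
in the worst case) for a valid subschedule that contains both $v$ and
$w$ and in which $w$ precedes $v$. It assumes a single semaphore with
initial value zero; encoding a chain as a string over {\tt P} and {\tt
V}, naming an operation by a (chain index, position) pair, and the
function name \texttt{has\_race} are our own hypothetical conventions.

\begin{verbatim}
```python
from functools import lru_cache


def has_race(chains, v, w):
    """Brute-force test of the race definition for a single semaphore.

    chains: list of strings over {'P', 'V'}, one per processor; the
            semaphore is assumed to start at 0 (hypothetical encoding).
    v, w:   (chain index, position) pairs; the trace executed v before w.
    Returns True if some valid subschedule contains v and w with w
    preceding v.  Exponential-time reference only.
    """

    @lru_cache(maxsize=None)
    def search(pos, sem):
        # pos[c] = number of operations of chain c already scheduled.
        w_done = pos[w[0]] > w[1]
        for c, k in enumerate(pos):
            if k == len(chains[c]):
                continue  # chain c is exhausted
            op = chains[c][k]
            if op == 'P' and sem == 0:
                continue  # P would make the semaphore negative
            if (c, k) == v:
                if w_done:
                    return True  # v can still run after w: race found
                continue  # never schedule v before w
            nxt = pos[:c] + (k + 1,) + pos[c + 1:]
            if search(nxt, sem + (1 if op == 'V' else -1)):
                return True
        return False

    return search((0,) * len(chains), 0)


print(has_race(["V", "V", "P"], (0, 0), (2, 0)))  # True
print(has_race(["V", "P"], (0, 0), (1, 0)))       # False
```
\end{verbatim}

In the first call the $P$-operation $w$ can run after the second
$V$-operation and before $v$, so a race is reported; in the second call
$w$ would block unless $v$ runs first.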
181: 
182: For the case of using only one semaphore in parallel programs, we give
183: two algorithms.  The first algorithm detects in $O(n)$ time whether a
184: race condition exists between two given operations.  The second
185: algorithm computes in $O(np\log n)$ time a compact representation,
186: from which one can determine whether a race condition exists between
187: any two operations in $O(1)$ time.  Our results are based on
188: reducing the problem of determining whether a valid subschedule exists
189: in which $w$ precedes $v$ to the problem of {\em Sequencing to
190: Minimize Maximum Cumulative Cost (SMMCC)\/}.
191: %We first describe the SMMCC problem and then explain the
192: %equivalence in the next two paragraphs.  
193: Given an acyclic directed graph $G$ with costs on the nodes, the {\em
194: cumulative cost} of the first $i$ nodes in a schedule of $G$ is the
195: sum of the costs of these nodes.  Thus, minimizing the maximum
196: cumulative cost is an attempt to ensure that the cumulative cost stays
197: low throughout the schedule. The SMMCC problem is NP-complete in
198: general even if the node costs are restricted to
199: $\pm1$~\cite{Abdel-Wahab:1976:SAR,Garey:1979:CIG}.  Abdel-Wahab and
200: Kameda~\cite{Abdel-Wahab:1978:SMM} presented an $O(n^2)$-time
201: algorithm for the special case that $G$ is a series-parallel graph.
202: (The time bound was later improved to $O(n \log n)$ by the same
203: authors~\cite{Abdel-Wahab:1980:SOS}.)  As part of this solution, they
204: gave an $O(n\log p)$-time algorithm applicable when $G$ consists of
205: $p$ disjoint chains.  The existence problem of a valid {\em schedule}
206: in which $w$ precedes $v$ can be reduced to the SMMCC problem in a
207: chain graph augmented with one inter-chain edge. We add an edge from
208: $w$ to $v$, assign costs to the nodes ($+1$ if the node is a
209: $P$-operation, $-1$ if a $V$-operation), and compute the minimum
210: maximum cumulative cost.  Clearly, this minimum is non-positive if and
211: only if there is a valid schedule in which $w$ precedes $v$. The augmented chain graph is not
212: series-parallel, so the algorithms of Abdel-Wahab and
213: Kameda~\cite{Abdel-Wahab:1978:SMM,Abdel-Wahab:1980:SOS} are not
214: applicable.  We show that the SMMCC problem can nevertheless be solved
215: in polynomial time.  In fact, for the special case of interest, that
216: in which the costs are $\pm 1$, we give a linear-time algorithm.
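
A brute-force rendering of this reduction may help fix the signs. The
sketch below is our own illustration, not the algorithm of
Section~\ref{sec:single}. It assigns cost $+1$ to each $P$-operation and
$-1$ to each $V$-operation, enforces the added arc from $w$ to $v$, and
computes the minimum over all schedules of the maximum cumulative cost
by a memoized search over the cuts of the chains. The number of such
cuts grows exponentially with $p$. The input encoding and the name
\texttt{min\_max\_cumulative\_cost} are hypothetical.

\begin{verbatim}
```python
import math
from functools import lru_cache


def min_max_cumulative_cost(chains, v, w):
    """Minimum over schedules of the maximum cumulative cost.

    chains: strings over {'P', 'V'}; P costs +1 and V costs -1.
    v, w:   (chain index, position) pairs; the added arc forces w before v.
    Returns math.inf if the added arc makes every schedule impossible.
    """
    cost = {'P': 1, 'V': -1}
    p = len(chains)

    @lru_cache(maxsize=None)
    def best(pos):
        if all(pos[c] == len(chains[c]) for c in range(p)):
            return -math.inf  # no further prefixes to account for
        cum = sum(cost[op] for c in range(p) for op in chains[c][:pos[c]])
        result = math.inf
        for c in range(p):
            k = pos[c]
            if k == len(chains[c]):
                continue
            if (c, k) == v and pos[w[0]] <= w[1]:
                continue  # arc w -> v: v may not run before w
            nxt = pos[:c] + (k + 1,) + pos[c + 1:]
            result = min(result, max(cum + cost[chains[c][k]], best(nxt)))
        return result

    return best((0,) * p)


print(min_max_cumulative_cost(["V", "V", "P"], (0, 0), (2, 0)))  # 0: valid
print(min_max_cumulative_cost(["V", "P"], (0, 0), (1, 0)))       # 1: not valid
```
\end{verbatim}

A result of $\infty$ means that the added arc closes a cycle, so no
schedule exists at all.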
217: 
218: The rest of the paper is organized as follows.
219: Section~\ref{sec:prelim} gives the preliminaries.
220: Section~\ref{sec:single} gives the algorithm for a single pair of
221: nodes. Section~\ref{sec:all} gives the algorithm for all pairs of
222: nodes. Section~\ref{sec:2semaphores} sketches the proof that
223: race-condition detection is NP-complete if two semaphores are used in
224: the parallel program.
225: 
226: \section{Preliminaries}
227: \label{sec:prelim}
228: %\subsection{Definition and Notation}
229: Suppose $G$ is an acyclic graph with node costs.  We introduce some
230: terminology having to do with schedules, mostly adapted
231: from~\cite{Abdel-Wahab:1978:SMM}.  
232: %A {\em schedule} of $G$ is a
233: %sequence of $G$'s nodes which is consistent with the precedence
234: %constraints imposed by the arcs of $G$. 
235: A {\em segment} of a schedule is a consecutive subsequence.  Let $H =
236: v_1v_2\cdots v_m$ be a sequence of nodes.  The {\em cost} of $H$,
237: denoted $\cost{H}$, is the sum of the costs of its nodes.  The {\em
238: height of a node $v_\ell$ in $H$} is defined to be the sum of the
239: costs of the nodes $v_1$ through $v_\ell$.  The {\em height of $H$},
240: denoted $\height{H}$, is the maximum of 0 and the maximum height of
241: the nodes in $H$.
242: %(a) the maximum height of any node in $H$, if
243: %some node of $H$ has non-negative height, or (b) zero, if all nodes in
244: %$H$ have negative heights.  
245: A node of maximum height in $H$ is called a {\em peak}. A node of
246: minimum height in $H$ is called a {\em valley}.  The {\em reverse
247: height} of $H$, denoted $\fall{H}$, is the height of $H$ minus the cost
248: of $H$.  Note that height and reverse height are nonnegative.  A
249: schedule of $G$ is {\em optimal} if its height is minimum over all
250: schedules of $G$.  We use $\opt{G}$ to denote the height of an
251: optimal schedule of $G$.
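
As a quick illustration of these definitions, the following Python
sketch computes the node heights, $\cost{H}$, $\height{H}$,
$\fall{H}$, the peaks, and the valleys of a sequence given as a list of
node costs. The function names are our own, and indices are 0-based,
so index $\ell$ corresponds to node $v_{\ell+1}$.

\begin{verbatim}
```python
def node_heights(costs):
    """Height of each node v_l: the sum of the costs of v_1 .. v_l."""
    heights, total = [], 0
    for c in costs:
        total += c
        heights.append(total)
    return heights


def cost(costs):
    return sum(costs)


def height(costs):
    """h(H): the maximum of 0 and the largest node height."""
    return max([0] + node_heights(costs))


def reverse_height(costs):
    """Reverse height h(H) - c(H); always nonnegative."""
    return height(costs) - cost(costs)


def peaks(costs):
    hs = node_heights(costs)
    return [i for i, x in enumerate(hs) if x == max(hs)]


def valleys(costs):
    hs = node_heights(costs)
    return [i for i, x in enumerate(hs) if x == min(hs)]


H = [1, 1, -1, -2, 1, -1]                      # node heights 1 2 1 -1 0 -1
print(cost(H), height(H), reverse_height(H))  # -1 2 3
print(peaks(H), valleys(H))                   # [1] [3, 5]
```
\end{verbatim}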
252: 
253: A sequence $C=v_1v_2\cdots v_m$ of nodes of $G$ is called a {\em chain}
254: of $G$ if the only edges in $G$ incident on these nodes are $v_0v_1,
255: v_1v_2,\ldots,v_{m-1}v_m, v_mv_{m+1}$, where $v_0$ and $v_{m+1}$ are
256: other nodes, denoted $\prr{C}$ and $\suu{C}$, respectively.  We use
257: $\head{C}$ to denote $v_1$ and $\tail{C}$ to denote $v_m$. Note that $C$
258: could be a single node.
259: 
260: We use $[v,w]_{G}$ to denote the chain of $G$ starting from $v$ and
261: ending at $w$. Let $[v,-]_{G}$ denote the longest chain of $G$
262: starting from $v$, and $[-,v]_{G}$ the longest chain of $G$ ending
263: at $v$.  If it is clear from the context which graph is intended, then we
264: may omit the subscript $G$. Note that the above notation might not be
265: well-defined for an arbitrary acyclic graph $G$, but it is when $G$ is
266: composed of disjoint chains, which is the case of interest in this
267: paper. 
268: 
269: Suppose $H$ is a chain of $G$ containing a peak $v_\ell$ such that
270: (1) every node of $H$ preceding $v_\ell$ has nonnegative height in
271: $H$, and (2) every node of $H$ following $v_\ell$ has height in $H$ at
272: least the cost of $H$. In this case, we call $H$ a {\em hump}, and we say
273: $v_\ell$ is a {\em useful peak} of $H$.  This definition is illustrated in
274: Figure~\ref{hump}\note{Figure~\ref{hump}}.  We say a hump is an {\em $N$-hump} if its
275: cost is negative, a {\em $P$-hump} if its cost is nonnegative.
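
The hump conditions can likewise be tested directly. In the following
Python sketch (hypothetical helper names, 0-based indices),
\texttt{useful\_peaks} returns the peaks that satisfy conditions (1)
and (2), so a nonempty sequence is a hump exactly when the returned list
is nonempty. The first example has two peaks of which only the first is
useful, as in Figure~\ref{hump}; in our example the second peak fails
condition (1).

\begin{verbatim}
```python
def useful_peaks(costs):
    """0-based indices of the useful peaks of the node sequence `costs`.

    A peak v_l is useful if (1) every earlier node has nonnegative height
    and (2) every later node has height at least c(H).
    """
    heights, total = [], 0
    for c in costs:
        total += c
        heights.append(total)
    top = max(heights)
    return [l for l, x in enumerate(heights)
            if x == top
            and all(y >= 0 for y in heights[:l])
            and all(y >= total for y in heights[l + 1:])]


def is_hump(costs):
    return len(costs) > 0 and bool(useful_peaks(costs))


print(useful_peaks([1, -2, 2, -2]))  # [0]
print(is_hump([1, -1, -1]), is_hump([-1, 1]), is_hump([-2, 1, -2]))
# True False False
```
\end{verbatim}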
276: 
277: \begin{figure}%[p]
278: \centerline{\input{fig1}}
279: %\centerline{\psfig{figure=hump.ps,width=4in,silent=1}}
280: \caption[]{A hump $H$ of 12 nodes: $v_1,v_2,\ldots,v_{12}$. The cost
281:   of each node is in the circle. By definition $\cost{H}=-2$,
282:   $\height{H}=2$, and $\fall{H}=4$. Both $v_2$ and $v_8$ are peaks of
283:   $H$, but only $v_2$ is useful.}
284: \label{hump}
285: \end{figure}
286: 
287: We are concerned primarily with graphs $G$ consisting of disjoint
288: chains $C_1,C_2,\ldots,C_p$.  For convenience, we assume that $G$
289: contains an {\em initial pseudonode} ($\init$),
290: preceding all nodes, and a {\em terminal pseudonode} ($\fini$),
291: following all nodes, each of cost zero. Thus, $\prr{v}$ could be
292: $\init$ and $\suu{v}$ could be $\fini$.
293: 
294: In the rest of this section, we describe properties of humps in
295: schedules, mostly adapted from~\cite{Abdel-Wahab:1978:SMM}.
296: 
297: \subsection{Hump Decomposition}
298: \label{property-sect}
299: 
300: As part of their scheduling algorithm for series-parallel graphs,
301: Abdel-Wahab and Kameda~\cite{Abdel-Wahab:1980:SOS} show that in linear
302: time a sequence of nodes can be decomposed into a set of humps by an
303: algorithm $\decomp{}$.
304: %The algorithm $\decomp{}$ is
305: %shown in Figure~\ref{humpdecomp}\note{Figure~\ref{humpdecomp}}. 
306: It
307: takes a chain as input and outputs a set of disjoint subchains such
308: that every subchain is a hump.  
309: %The first Repeat-loop produces
310: %$N$-humps; and the second Repeat-loop produces $P$-humps.  Each loop
311: %alternates between identifying peaks and valleys.  It is not difficult
312: %to see that every sequence of nodes between two consecutive valleys is
313: %a hump.  
314: The output of $\decomp{C}$ is unique, although the output is not
315: necessarily the only hump decomposition of $C$.  An example is shown
316: in Figure~\ref{decompose}\note{Figure~\ref{decompose}}. The chain is
317: decomposed by $\decomp{}$ into two $N$-humps and three $P$-humps.  For
318: a chain $C$, we say $H$ is a {\em hump of $C$} if $H\in\decomp{C}$.
319: It can be proved that $\decomp{}$ has the following properties.
320: 
321: %\begin{figure}%[p]
322: %\begin{center}
323: %\fbox{
324: %\begin{minipage}{5in}
325: %\begin{center}
326: %\begin{tabbing}
327: %\quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\kill
328: %Function $\decomp{C}$ \+\\
329: %$S:=\setof{};$\\
330: %$u:=$ the first valley of $C$;\\
331: %Repeat\+\\
332: %$v$\>$:=$ the first peak of $[\prr{C},u]$;\\
333: %$w$\>$:=$ the first valley of $[\prr{C},v]$;\\
334: %$S$\>$:=$ $S\cup\setof{[\suu{w},u]};$\\
335: %$u$\>$:=$ $w;$\-\\
336: %Until $u=\prr{C}$;\\
337: %$u:=$ the first valley of $C$;\\
338: %Repeat\+\\
339: %$v$\>$:=$ the last peak of $[u,\tail{C}]$;\\
340: %$w$\>$:=$ the last valley of $[v,\tail{C}]$;\\
341: %$S$\>$:=$ $S\cup\setof{[\suu{u},w]}$;\\
342: %$u$\>$:=$ $w;$\-\\
343: %Until $u=\tail{C}$;\\
344: %Return $S$;
345: %\end{tabbing}
346: %\end{center}
347: %\newpage
348: %\end{minipage}
349: %}
350: %\end{center}
351: %\caption[]{The algorithm decomposing a chain into a set of humps.}
352: %\label{humpdecomp}
353: %\end{figure}
354: 
355: 
356: \begin{figure}%[p]
357: %\centerline{\psfig{figure=chain.ps,width=4in,silent=1}} %
358: \centerline{\input{fig3}}
359: \caption[]{A chain decomposed into two $N$-humps and three $P$-humps.}
360: \label{decompose}
361: \end{figure}
362: %\clearpage
363: 
364: \paragraph{Hump-decomposition properties:}
365: \begin{enumerate}
366: \item Suppose $H_1, H_2\in\decomp{C}$ and $H_1$ precedes $H_2$ in $C$.
367:   If $\cost{H_1}\ge0$, then $\cost{H_2}\ge0$ and $\fall{H_1}>\fall{H_2}$.
368:   If $\cost{H_2}<0$, then $\cost{H_1}<0$ and $\height{H_1}<\height{H_2}$.
369: \item If $v$ is the first valley of $[u,w]$, then $\decomp{[u,v]}$
370: (respectively, $\decomp{[\suu{v},w]}$) consists of $N$-humps
371: (respectively, $P$-humps) only.
372: 
373: \item Let $C$ and $C'$ be two disjoint chains, whose humps are
374:   respectively $H_1,H_2,\ldots,H_k$ and $H_{k+1},H_{k+2},\ldots,H_\ell$
375:   in order.  Then, for some $0\le i\le k$ and $k\le j\le \ell$, the humps
376:   of $CC'$ are 
377:   \[
378:      H_1,H_2,\ldots,H_i,(H_{i+1}\cdots H_j),H_{j+1},\ldots,H_\ell
379:   \]
380:   in order.
381: \end{enumerate}
382: The third property implies that
383: %\begin{eqnarray*}
384: \begin{displaymath}
385: \set{\tail{H}}{H\in\decomp{CC'}}\subseteq
386: \set{\tail{H}}{H\in\decomp{C}}\cup\set{\tail{H}}{H\in\decomp{C'}}.
387: \end{displaymath}
388: %\end{eqnarray*}
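
One way to compute a hump decomposition follows the second
hump-decomposition property: the part of a chain up to its first valley
yields $N$-humps, and the rest yields $P$-humps. The Python sketch
below is our own reconstruction, not the code of Abdel-Wahab and
Kameda. It peels off $N$-humps by walking backward from the first
valley and $P$-humps by walking forward from it. Its tie-breaking among
equal heights is our choice, so on some inputs its output could differ
from $\decomp{}$ while still being a decomposition into humps. As
written it may take quadratic time and makes no attempt to match the
linear bound cited above.

\begin{verbatim}
```python
def prefix_heights(costs):
    """H[0] = 0 stands for pred(C); H[i] = total cost of the first i nodes."""
    H = [0]
    for c in costs:
        H.append(H[-1] + c)
    return H


def is_hump(seq):
    """Check the definition of a hump directly (to sanity-check output)."""
    H = prefix_heights(seq)
    top = max(H[1:])
    return any(H[p] == top
               and all(H[i] >= 0 for i in range(1, p))
               and all(H[i] >= H[-1] for i in range(p + 1, len(H)))
               for p in range(1, len(H)))


def decompose(costs):
    """Split a chain (list of node costs) into humps, N-humps first."""
    m = len(costs)
    H = prefix_heights(costs)
    first_valley = min(range(m + 1), key=lambda i: H[i])  # ties -> earliest

    # Backward pass over nodes 1..first_valley: every piece is an N-hump.
    n_humps, u = [], first_valley
    while u > 0:
        v = max(range(1, u + 1), key=lambda i: (H[i], i))  # last highest node
        w = min(range(v), key=lambda i: H[i])  # earliest lowest point before v
        n_humps.append(costs[w:u])             # nodes w+1..u
        u = w
    n_humps.reverse()

    # Forward pass over nodes first_valley+1..m: every piece is a P-hump.
    p_humps, u = [], first_valley
    while u < m:
        v = max(range(u + 1, m + 1), key=lambda i: (H[i], i))  # last highest
        w = max(range(v, m + 1), key=lambda i: (-H[i], i))      # last lowest
        p_humps.append(costs[u:w])             # nodes u+1..w
        u = w
    return n_humps + p_humps


print(decompose([1, -1, -1, 1]))  # [[1, -1, -1], [1]]
print(decompose([-2, 1, -2]))     # [[-2], [1, -2]]
```
\end{verbatim}

The helper \texttt{is\_hump} restates the definition of a hump, so it
can be used to test the output exhaustively on short chains.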
389: 
390: It will turn out that once we decompose a chain into humps, we need
391: not be concerned with the internal structure of these humps. For each
392: hump $H$ we need only store $\cost{H}$ and $\height{H}$.  Thus, a
393: chain consisting of $\ell$ humps can be represented by a length-$\ell$
394: sequence of pairs $(\cost{H},\height{H})$. We call this sequence the
395: {\em hump representation} of the chain. Using the third
396: hump-decomposition property, one could straightforwardly derive the
397: hump representation of $C_1C_2$ from the hump representation of $C_1$
398: and that of $C_2$.  In particular, if we are given $\decomp{C}$ and
399: $\decomp{C'}$, then computing $\decomp{CC'}$ takes
400: $O(|\decomp{C}|+|\decomp{C'}|)$ time.
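
The hump representation suffices because the cost and height of a
concatenation depend only on those of its parts: for sequences $A$ and
$B$, $\cost{AB}=\cost{A}+\cost{B}$ and
$\height{AB}=\maxof{\height{A},\cost{A}+\height{B}}$. The Python sketch
below (hypothetical function names) encodes this rule and folds it over
the hump representation of a chain to recover the cost and height of
the whole chain.

\begin{verbatim}
```python
from functools import reduce


def representation(costs):
    """(cost, height) pair of a node sequence; height = max(0, prefix sums)."""
    total, top = 0, 0
    for c in costs:
        total += c
        top = max(top, total)
    return total, top


def concat(a, b):
    """Representation of AB from those of A and B:
    c(AB) = c(A) + c(B) and h(AB) = max(h(A), c(A) + h(B))."""
    return a[0] + b[0], max(a[1], a[0] + b[1])


humps = [[1, -1, -1], [1]]               # the humps of the chain 1,-1,-1,1
rep = [representation(h) for h in humps]
print(rep)                               # [(-1, 1), (1, 1)]
print(reduce(concat, rep))               # (0, 1)
```
\end{verbatim}

The folded pair equals the representation of the chain $1,-1,-1,1$
computed node by node.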
401: 
402: \begin{figure}%[p]
403: %\centerline{\psfig{figure=useful.ps,width=4in,silent=1}}
404: \centerline{\input{fig4}}
405: \caption[]{The second sequence of nodes is obtained from the first one
406: by clustering the nodes $1$--$5$ to node $3$.}
407: \label{cluster-line}
408: \end{figure}
409: 
410: \subsection{Hump Clustering}
411: %%Two lemmas are useful to our results. They are both generalizations
412: %%of lemmas in [.Kameda 1978.]. 
413: The following lemma concerns an operation on a schedule called {\em
414: clustering} the nodes of a hump.  Suppose $H$ is a hump of $G$, and
415: let $v$ be a useful peak of $H$.  Let $S$ be a schedule of $G$.  If
416: all the nodes of $H$ are consecutive in $S$, then we say $H$ is {\em
417: clustered in $S$}.  If every hump of $G$ is clustered in $S$, then we say
418: the schedule $S$ is {\em clustered}.  If a hump is not clustered in a
419: schedule, then we can modify the schedule to make it so.  To {\em cluster
420: the nodes of $H$ to $v$} is to change the positions of nodes of $H$
421: other than $v$ so that all the nodes of $H$ are consecutive, and the
422: order among nodes of $H$ is unchanged.  An example is shown in
423: Figure~\ref{cluster-line}\note{Figure~\ref{cluster-line}}.
424: 
425: \begin{lemma}[See~\cite{Abdel-Wahab:1978:SMM}]
426: \label{cluster}
427: %{\rm [. Kameda 1978 .]}
428: Let $G$ be an acyclic graph with node costs and $H$ be a hump of
429: $G$. Suppose $S$ is a schedule of $G$. If $T$ is obtained from $S$
430: by clustering all nodes in $H$ to a useful peak of $H$, then $T$ is a
431: schedule of $G$ and $\height{T} \le \height{S}$.
432: \end{lemma}
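
The clustering operation is easy to express on an explicit schedule.
In the following Python sketch, which uses a small hypothetical
two-chain example of our own, \texttt{cluster} removes the nodes of the
hump other than the useful peak and reinserts the whole hump, in its
original order, at the position of the peak. On this example the
heights before and after clustering agree with Lemma~\ref{cluster}.

\begin{verbatim}
```python
def schedule_height(schedule, cost):
    """Height of a schedule: max(0, largest prefix sum of node costs)."""
    total, top = 0, 0
    for x in schedule:
        total += cost[x]
        top = max(top, total)
    return top


def cluster(schedule, hump, peak):
    """Cluster the nodes of `hump` (listed in chain order) to `peak`."""
    members = set(hump)
    # Drop the hump's nodes except the peak, then splice the hump in there.
    rest = [x for x in schedule if x not in members or x == peak]
    i = rest.index(peak)
    return rest[:i] + list(hump) + rest[i + 1:]


# Hypothetical example: chain a1 a2 a3 is a hump with useful peak a1,
# and b1 b2 is a second chain.
cost = {"a1": 1, "a2": -1, "a3": -1, "b1": 1, "b2": 1}
S = ["a1", "b1", "b2", "a2", "a3"]
T = cluster(S, ["a1", "a2", "a3"], "a1")
print(T)                                                  # ['a1', 'a2', 'a3', 'b1', 'b2']
print(schedule_height(S, cost), schedule_height(T, cost))  # 3 1
```
\end{verbatim}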
433: 
434: An example is shown in
435: Figure~\ref{cluster-fig}\note{Figure~\ref{cluster-fig}}. The height of
436: the schedule in Figure~\ref{cluster-fig}(c) is smaller than that of
437: the schedule in Figure~\ref{cluster-fig}(b).  
438: %It follows from
439: %Lemma~\ref{cluster} that there is always a clustered optimal schedule
440: %of $G$. 
441: Two clustered schedules of the graph in Figure~\ref{cluster-fig}(a)
442: are shown in Figures~\ref{cluster-fig}(d) and~\ref{cluster-fig}(e).
443: Applying Lemma~\ref{cluster} to each hump in turn, we see that there is always an optimal
444: schedule of $G$ that is clustered.
445: %We prove the lemma as follows.
446: 
447: 
448: \begin{figure*}%[p]
449: %\centerline{\psfig{figure=cluster.ps,width=4.8in,silent=1}}
450: \centerline{\input{fig5}}
451: \caption[]{(a) A graph $G$ consisting of two chains. The first chain
452: contains an $N$-hump followed by a $P$-hump. The second chain contains
453: two $P$-humps. (b) A schedule for $G$ of height four. (c) The schedule
454: obtained from the previous one by clustering the $N$-hump to its
455: useful peak. (d) A clustered schedule of $G$ of height two. This one
456: is obtained from the previous schedule by clustering every hump. (e) A
457: clustered schedule of $G$ with minimum height.}
458: \label{cluster-fig}
459: \end{figure*}
460: 
461: 
462: %\paragraph{Proof of Lemma~\ref{cluster}}
463: %The original lemma in~\cite{Abdel-Wahab:1978:SMM} restricts $G$ to be a
464: %chain graph. We can prove as follows that the same properties hold
465: %even without the restriction. Suppose $H=v_1\cdots v_p\cdots v_d$,
466: %where $v_p$ is a useful peak of $H$. Suppose $w_1w_2\cdots w_\ell$ is
467: %the segment of $S$ such that $w_{j_i}=v_i$ for all $1\le i\le d$ and
468: %$1=j_1<j_2<\cdots<j_d=\ell$.  The only difference between $S$ and $T$
469: %is that the segment of $W=w_1w_2\cdots w_\ell$ in $S$ is replaced with
470: %$W'=W'_1HW'_2$ in $T$, where
471: %\begin{eqnarray*}
472: %  W'_1&=&w_{j_1+1}\cdots w_{j_2-1}w_{j_2+1}\cdots w_{j_p-1}\\
473: %  W'_2&=&w_{j_p+1}\cdots w_{j_{p+1}-1}w_{j_{p+1}+1}\cdots w_{j_d-1}.
474: %\end{eqnarray*}
475: %Suppose $w_j$ is not in $H$. By definition of chains the precedence
476: %relations between $v_i$ and $w_j$ imposed by $G$ are the same over all
477: %$1\le i\le d$. Note that $w_j$ precedes some node of $H$ in $W$ and
478: %succeeds some other node of $H$ in $W$. It follows that there is no
479: %precedence constraint between $v_i$ and $w_j$. Therefore $T$ is a
480: %schedule of $G$.
481: %%
482: %
483: %We denote the heights of $w_i$ in $S$ and in $T$ by $h_S(w_i)$ and
484: %$h_T(w_i)$, respectively.  Note that $h_S(v_p)=h_T(v_p)$ since the set
485: %of nodes preceding $v_p$ does not change. We show that for every
486: %$1\le\alpha\le\ell$ there exists a $1\le\beta\le\ell$ such that
487: %$h_T(w_\alpha)\le h_S(w_\beta)$.
488: %\begin{itemize}
489: %\item If $\alpha=j_i$ for some $1\le i\le d$, then
490: %$h_T(w_\alpha)=h_T(v_i)\le h_T(v_p)=h_S(v_p)=h_S(w_{j_p})$.
491: %\item If $j_i<\alpha<j_{i+1}$ for some $p\le i<d$, then
492: %$h_T(w_\alpha)=\cost{v_{i+1}}+\cost{v_{i+2}}+\cdots+\cost{v_d}+h_S(w_\alpha)$.
493: %Since $H$ is a hump and $i\ge p$,
494: %$\cost{v_{i+1}}+\cost{v_{i+2}}+\cdots+\cost{v_d}=h_S(v_d)-h_S(v_i)\le
495: %0$.  Thus $h_T(w_\alpha)\le h_S(w_\alpha)$.
496: %\item If $j_i<\alpha<j_{i+1}$ for some $1\le i<p$, then
497: %$h_T(w_\alpha)=-\cost{v_1}-\cost{v_2}-\cdots-\cost{v_i}+h_S(w_\alpha)$.
498: %Since $H$ is a hump and $i<p$,
499: %$-\cost{v_1}-\cost{v_2}-\cdots-\cost{v_i}=h_S(v_0)-h_S(v_i)\le 0$,
500: %where $v_0$ is the node that precedes $v_1$ in $S$. Thus
501: %$h_T(w_\alpha)\le h_S(w_\alpha)$.
502: %\end{itemize}
503: %It follows that $\height{W'}\le\height{W}$. Since nodes other than the
504: %$w_i$'s preserve their heights in $S$ and $T$, the lemma is proved.
505: %\qed
506: 
507: 
508: \subsection{Standard Order}
509: A series $S_1 \cdots S_m$ of subsequences of nodes is in {\em standard
510: order} if it satisfies the following properties.
511: 
512: \paragraph{Standard order properties.}
513: \begin{itemize}
514:    \item The series consists of $S_i$'s with negative costs, followed
515:          by $S_i$'s with nonnegative costs;
516:    \item The $S_i$'s with negative costs are in nondecreasing order of
517:          height; and the $S_i$'s with nonnegative costs are in
518:          nonincreasing order of reverse height.
519: \end{itemize}
520: 
521: If the humps of a chain are $H_1,H_2,\ldots,H_m$ in order, then the
522: series $H_1H_2\cdots H_m$ is in standard order by the first
523: hump-decomposition property.
524: 
525: \begin{lemma}[See~\cite{Abdel-Wahab:1978:SMM}]
526: \label{exchange}
527: Let $A$, $B$, $S_1$ and $S_2$ be subsequences of nodes. Suppose
528: $S=S_1ABS_2$ and $T=S_1BAS_2$. If the series $BA$ is in standard order,
529: then $\height{S}\ge\height{T}$.
530: \end{lemma}
531: 
532: For example, the sequence in Figure~\ref{cluster-fig}(d) is a
533: clustered schedule of the graph in Figure~\ref{cluster-fig}(a). Note
534: that the series of the last two humps in the schedule is not in
535: standard order: the reverse height of the first hump (zero) is less
536: than that of the second hump (one). The schedule in
537: Figure~\ref{cluster-fig}(e), obtained by exchanging those two
538: clustered humps, has height one less than that of the schedule in
539: Figure~\ref{cluster-fig}(d).
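
The next Python sketch checks the standard order properties on
$(\cost{S_i},\height{S_i})$ pairs and computes the height of a series
from those pairs alone. The two hypothetical items $A$ and $B$ below
have reverse heights $0$ and $1$, as in the discussion above. Hence $BA$
is in standard order and, consistent with Lemma~\ref{exchange},
exchanging $AB$ into $BA$ lowers the height by one. The numbers are
illustrative and are not read off Figure~\ref{cluster-fig}.

\begin{verbatim}
```python
def in_standard_order(reps):
    """reps: list of (cost, height) pairs for S_1, ..., S_m."""
    neg = [r for r in reps if r[0] < 0]
    nonneg = [r for r in reps if r[0] >= 0]
    if reps != neg + nonneg:
        return False  # a nonnegative-cost item precedes a negative-cost one
    hs = [h for _, h in neg]              # heights: must be nondecreasing
    rhs = [h - c for c, h in nonneg]      # reverse heights: nonincreasing
    return (all(x <= y for x, y in zip(hs, hs[1:]))
            and all(x >= y for x, y in zip(rhs, rhs[1:])))


def series_height(reps):
    """Height of the concatenation S_1 ... S_m from its (cost, height) pairs."""
    total, top = 0, 0
    for c, h in reps:
        top = max(top, total + h)
        total += c
    return top


A = (1, 1)  # cost 1, height 1, reverse height 0
B = (1, 2)  # cost 1, height 2, reverse height 1
print(in_standard_order([A, B]), series_height([A, B]))  # False 3
print(in_standard_order([B, A]), series_height([B, A]))  # True 2
```
\end{verbatim}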
540: 
541: %\paragraph{Proof of Lemma~\ref{exchange}}
542: %The original version of this lemma in~\cite{Abdel-Wahab:1978:SMM} restricts
543: %$A,B$ to be humps $G$. We can prove as follows that the same property
544: %holds even without these restrictions. Since the heights of nodes in
545: %$S_1$ and $S_2$ are not changed in $S$ and $T$, it suffices to ensure
546: %that
547: %\begin{eqnarray*}
548: % \height{AB}&=&\maxof{\height{A},\cost{A}+\height{B}}\\
549: %            &\ge&\maxof{\height{B},\cost{B}+\height{A}}\\
550: %            &=&\height{BA}.
551: %\end{eqnarray*}
552: %\begin{itemize}
553: %\item If $\cost{A}<0$ and $\cost{B}<0$, since the series $BA$ is in
554: %         standard order, $\height{A}\ge\height{B}$. Since
555: %         $\cost{B}<0$, it follows that $\height{A}>\cost{B}+\height{B}$.
556: %\item If $\cost{A}\ge0$ and $\cost{B}\ge0$, since the series $BA$ is
557: %         in standard order,
558: %         $\height{B}-\cost{B}=\fall{B}\ge\fall{A}=\height{A}-\cost{A}$. 
559: %         Thus $\cost{A}+\height{B}\ge\cost{B}+\height{A}$. Since
560: %         $\cost{A}\ge0$, $\cost{A}+\height{B}\ge\height{B}$. 
561: %\item If $\cost{A}\ge0$ and $\cost{B}<0$, then
562: %    $\height{A}>\cost{B}+\height{A}$ and
563: %    $\cost{A}+\height{B}\ge\height{B}$.
564: %\end{itemize}
565: %Since in all cases each of $\height{B}$ and $\cost{B}+\height{A}$ is
566: %less than or equal to one of $\height{A}$ and $\cost{A}+\height{B}$,
567: %the lemma is proved.
568: %\qed
569: 
570: \subsection{Hump Merging}
571: A schedule of $G$ is in {\em standard form} if it is clustered and its
572: series of humps of $G$ is in standard order.  Let $T$ be any schedule
573: of $G$ in standard form.  Recall that by Lemma~\ref{cluster} there is
574: always an optimal schedule $S$ of $G$ which is clustered.  The humps
575: of $G$, while clustered in both $T$ and $S$, may not be in the same
576: order.  However, any two humps of the same chain of $G$ must be in the
577: same order in $T$ and in $S$, else either $T$ or $S$ is not a
578: schedule.  Take two consecutive humps in $S$ that are from different
579: chains and that are not in the same order as in $T$, and exchange
580: their positions.  By Lemma~\ref{exchange}, the resulting ordering has
581: height no more than that of $S$.  By a series of such exchanges, we eventually
582: obtain $T$ from $S$.  It follows that the height of $T$ is no more
583: than that of $S$, and hence that $T$ is optimal.  This argument shows
584: that every schedule in standard form is an optimal schedule of $G$.
585: 
586: Let $I=\setof{H_1,H_2,\ldots,H_m}$, where the series $H_1H_2\cdots
587: H_m$ is in standard order. Suppose $\merge{I}$ returns a sequence of
588: nodes obtained by concatenating all humps in $I$ into standard order.
589: Namely, $\merge{I}=H_1H_2\cdots H_m$. Assume for uniqueness that
590: $\merge{}$ breaks ties in some arbitrary but fixed way.  By the above
591: argument we have the following lemma.
592: 
593: \begin{lemma}[See~\cite{Abdel-Wahab:1978:SMM}]
594: The output of
595: \[
596:    \merge{\bigcup_{1\le i\le p}\decomp{C_i}}
597: \] 
598: is an optimal schedule of $G$.
599: \label{optsched}
600: \end{lemma}
601: 
602: An example is shown in Figure~\ref{cluster-fig}. Since the schedule in
603: Figure~\ref{cluster-fig}(e) is clustered and its series of humps is in
604: standard order, it is an optimal schedule of the graph in
605: Figure~\ref{cluster-fig}(a).  Abdel-Wahab and
606: Kameda~\cite{Abdel-Wahab:1978:SMM} showed that $\merge{\bigcup_{1\le
607: i\le p}\decomp{C_i}}$ can be obtained in $O(n\log p)$ time.  Since
608: ties in standard order make the output of $\merge{}$ non-unique in general, without loss of
609: generality we fix the tie-breaking rule of $\merge{}$ as
610: follows so that its output is unique for a given $G$. Suppose $G$ is
611: composed of disjoint chains, $C_1, C_2, \ldots, C_p$ and
612: $I=\bigcup_{1\le i\le p}\decomp{C_i}$. Define $\merge{I}=H_1H_2\cdots
613: H_m$, where $\setof{H_1,H_2,\ldots,H_m}=I$ and the series
614: $H_1H_2\cdots H_m$ is in standard order. Furthermore, if $H_iH_j$ and
615: $H_jH_i$ are both in standard order, where $C_{i'}$ contains $H_i$,
616: $C_{j'}$ contains $H_j$, and $i'<j'$, then $H_i$ precedes $H_j$ in
617: $\merge{I}$.
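
The tie-breaking rule can be expressed as a sort key. The Python sketch
below is a simple sorting-based rendering, not the $O(n\log p)$
procedure of Abdel-Wahab and Kameda: $N$-humps come first in
nondecreasing height, $P$-humps follow in nonincreasing reverse height,
and ties go to the chain with the smaller index. It assumes that the
humps of each chain are given in chain order and already in standard
order, as the first hump-decomposition property guarantees for
$\decomp{}$; this is what keeps each chain's humps in their original
order. The function names are hypothetical.

\begin{verbatim}
```python
def cost_height(hump):
    total, top = 0, 0
    for c in hump:
        total += c
        top = max(top, total)
    return total, top


def merge(chain_humps):
    """chain_humps[i] lists the humps of chain C_{i+1} in chain order
    (each hump a list of node costs).  Returns the (chain, hump) order
    and the merged sequence of node costs."""
    keyed = []
    for ci, humps in enumerate(chain_humps):
        for k, hump in enumerate(humps):
            c, h = cost_height(hump)
            # N-humps first by nondecreasing height, then P-humps by
            # nonincreasing reverse height h - c.
            key = (0, h) if c < 0 else (1, c - h)
            # Equal keys: smaller chain index first; k keeps chain order.
            keyed.append((key, ci, k))
    keyed.sort()
    order = [(ci, k) for _, ci, k in keyed]
    schedule = [x for ci, k in order for x in chain_humps[ci][k]]
    return order, schedule


order, schedule = merge([[[1, -1, -1], [1]], [[-1], [1, 1, -1]]])
print(order)     # [(1, 0), (0, 0), (1, 1), (0, 1)]
print(schedule)  # [-1, 1, -1, -1, 1, 1, -1, 1]
```
\end{verbatim}

The merged schedule in this example has height $0$, which is
trivially optimal.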
618: 
619: \section{Algorithm for Single Pair}
620: \label{sec:single}
621: %To detect race conditions between $v$ and $w$, we need to find a valid
622: %{\em subschedule} containing $v$ and $w$ such that its maximum
623: %cumulative cost is minimized.  Note that every valid subschedule of
624: %$G$ is a valid schedule of a prefix subgraph of $G$. A graph $G_0$ is
625: %a {\em prefix subgraph} of $G$ if (i) there is no arc of $G$ from any
626: %node of $G-G_0$ to any node of $G_0$; (ii) every arc of $G$ between
627: %two nodes of $G_0$ is also an arc of $G_0$. Clearly, in the graph of
628: %interest, i.e., $p$ parallel chains with an augmented arc, every
629: %prefix subgraph is determined by a cut comprising $p$ cutpoints.
630: 
631: %Let $G$ be a graph composed of disjoint chains, $C_1, C_2, \ldots, C_p$.
632: %Recall that there are two pseudonodes, $\init$ and $\fini$. The cost of
633: %each node of $G$ is either $+1$ or $-1$. 
634: %A subschedule $S$ of $G$ is {\em valid} if $\height{S}=0$.  Let $v$
635: %and $w$ be two nodes of $G$. In this section we show how to determine
636: %in linear time whether $v$ could precede $w$ in some valid subschedule
637: %of $G$.
638: 
639: %\subsection{Notation}
640: A vector $\cut=(x_1,x_2,\ldots,x_p)$ of $p$ nodes is called a {\em
641: cut} of $G$ if each $x_i$ is either $\init$ or a node in $C_i$. We
642: call $x_i$ the $i$-th {\em cutpoint} of $\cut$.  The {\em prefix
643: subgraph} $G[\cut]$ of $G$ is the subgraph $\bigcup_{1\leq i\leq
644: p}[-,x_i]$.  Therefore, the problem we address can be reduced to
645: finding a cut such that the valid schedule of the prefix subgraph
646: determined by the cut has the minimal maximum cumulative cost.  Let
647: $h$ be the maximum cumulative cost of the optimal subschedule that
648: contains $v$ and $w$.  If $h$ is zero, then a valid subschedule exists
649: (i.e., the optimal valid subschedule.) If $h$ is positive, then there
650: is no valid subschedule because the maximum cumulative cost of any
651: valid subschedule is greater than or equal to $h$ and is thus
652: positive, too.  The rest of the section shows that a best cut can be
653: found in linear time.
654: 
655: Since we will frequently encounter two cuts that differ at only one
656: cutpoint, let $\newcut{\cut, i, u}$ denote a cut $\cut'$ with
657: \begin{displaymath}
658:    \cut'(\ell)=\left\{
659:                  \begin{array}{ll}
660:                     \cut(\ell)&\mbox{if\ $\ell\ne i$};\\
661:                     u&\mbox{if\ $\ell=i$}.
662:                  \end{array}
663:                \right.
664: \end{displaymath}
665: A {\em $j$-schedule} of $G[\cut]$ is a schedule of $G[\cut]$ whose
666: last node is $\cut(j)$. We use $\jopt{\cut}$ to denote the height of
667: an optimal $j$-schedule of $G[\cut]$. Suppose $\cut(j)\ne\init$. One
668: can compute $\jopt{\cut}$ for a given $\cut$ as follows. Let
669: $\cut'=\newcut{\cut,j,\prr{\cut(j)}}$.  Clearly, if $S$ is an optimal
670: schedule of $G[\cut']$, then $S\cut(j)$ is an optimal $j$-schedule of
671: $G[\cut]$.
672: %(However, the other direction is not true.) 
673: It follows that
674: \begin{displaymath}
675:   \jopt{\cut}=\max\setof{\height{G[\cut']},\cost{G[\cut']}+\height{\cut(j)}}. 
676: \end{displaymath}
677: Note that $\opt{G[\cut]}$ and $\jopt{\cut}$ are both nonnegative.  We
678: use $v\chb w$ to signify that there is a valid subschedule of $G$ in
679: which $v$ precedes $w$. Let $v\notchb w$ signify that $v\chb w$ is not
680: true. Note that neither $\chb$ nor $\notchb$ is a partial order.
681: 
682: 
683: \subsection{Basic Idea}
684: Every valid subschedule of $G$ is a valid schedule of a prefix subgraph
685: $G[\cut]$ for some cut $\cut$ of $G$. Therefore, $v\chb w$ if and only if
686: there is a cut $\cut$ of $G$ such that $G[\cut]$ has a valid schedule in
687: which $v$ precedes $w$.  Let $h^*$ be the minimum of
688: $\height{G[\cut]\cup\setof{vw}}$ over all $G[\cut]$'s that contain $v$
689: and $w$. It follows that $v\chb w$ if and only if $h^*=0$. Hence, the
690: problem of determining whether $v\chb w$ is reduced to computing the
691: minimum height of a set of chain graphs each augmented with an
692: interchain arc.  Clearly, two immediate questions arise. 1) How do we
693: compute the height of $G[\cut]\cup\setof{vw}$, which is not even
694: serial-parallel?  2) How do we cope with the fact that there could be
695: exponential number of prefix subgraphs that contain $v$ and $w$?
696: 
697: Let $v$ and $w$ be contained in two disjoint chains $C_i$ and $C_j$,
698: respectively.  The following observation will ease the situation.
699: Suppose $S$ is a subschedule of $G$ containing $w$. Let $S'$ be the
700: subschedule of $G$ obtained from $S$ by discarding all nodes succeeding
701: $w$ in $S$. Clearly, $\height{S'}\le\height{S}$. Therefore, without loss
702: of generality the minimum of $\height{G[\cut]\cup\setof{vw}}$ can be
703: computed over only cuts $\cut$ with $\cut(j)=w$. Moreover, we can
704: let $w$ always be the last node of a subschedule by considering only the
705: minimum-height $j$-schedule of each $G[\cut]$ that contains $v$. The
706: first question above is no longer an issue.
707: 
708: It turns out that the second question is not an issue, either.  We
709: will show that in order to obtain the minimum-height of all those
710: $j$-schedules, it suffices to consider only $O\left(\sqrt{n}\right)$ cuts. In
711: particular each of those $O(\sqrt{n})$ cuts is uniquely determined by
712: its $j$-th cutpoint.
713: 
714: \subsection{The Algorithm}
715: The algorithm takes $v$ and $w$ as inputs. Let $C_i$ contain $v$ and
716: $C_j$ contain $w$. The algorithm proceeds iteratively with different
717: cutpoint $\cut(i)$ such that $\cut(i)$ does not precede $v$. In each
718: iteration the algorithm calls the function $\best{}$ to obtain a
719: minimum-height $j$-schedule for $G[\cut]$ over all cuts $\cut$ with the
720: designated cutpoints in $C_i$ and $C_j$. By comparing the heights of
721: these $j$-schedules with respect to different $\cut(i)$'s, the algorithm
722: outputs the minimum height of $j$-schedules for $G[\cut]$ over all
723: $\cut$ such that $\cut(j)=w$ and $\cut(i)$ does not precede $v$. In
724: Figure~\ref{minheight}\note{Figure~\ref{minheight}} we give the algorithm to
725: compute $\opt{G[\cut^*]\cup\setof{vw}}$, where $\cut^*$ is a best cut of $G$
726: corresponding to $vw$.
727: 
728: %%% The function $\best{}$ is the essential part of the algorithm.  {\bf We
729: %%% need a better explanation than what follows\ldots}.
730: 
731: %%% 
732: %%% The algorithm takes $v$ and $w$ as inputs. Let $C_i$ contain $v$ and
733: %%% $C_j$ contain $w$. After fixing $\cut(j)$ at $w$, the algorithm
734: %%% proceeds iteratively with different $\cut(i)$ in every iteration.
735: %%% Based on the designated $\cut(i)$ and the fixed $\cut(j)$, the
736: %%% algorithm obtains a $j$-schedule of minimum height for $G[\cut]$ over
737: %%% all cuts $\cut'$ such that $\cut'(i)=\cut(i)$ and $\cut'(j)=\cut(j)$.
738: %%% By comparing the heights of these $j$-schedules with respect to
739: %%% different $\cut(i)$'s, the algorithm outputs the minimum height of
740: %%% $j$-schedules for $G[\cut]$ over all cuts $\cut'$ such that
741: %%% $\cut'(j)=\cut(j)$.
742: %%% 
743: %%% If $u$ is a node of $C_k$, by definition $[-,u]=[\head{C_k}, u]$ and
744: %%% $[u,-]=[u,\tail{C_k}]$.  In Figure~\ref{minheight} we give the
745: %%% algorithm to compute $\opt{G[\cut^*]\cup\setof{vw}}$, where $\cut^*$
746: %%% is a best cut of $G$ corresponding to $vw$.  
747: 
748: 
749: Function $\best{}$ is the essential part of the algorithm. Based on the
750: given subset $F$ of $\setof{1,2,\ldots, p}$ and the given cut $\cut$, it
751: looks for a best cut $\cut^*$ corresponding to $vw$ such that
752: $\cut^*(k)=\cut(k)$ for every $k\in F$.  (In the case that we are
753: interested, $F=\setof{i,j}$.)  An optimal $j$-schedule of $G[\cut^*]$ is
754: then returned. Note that for every $k\not\in F$, $\cut^*(k)$ depends on
755: a value $s$, which is the maximum of $s_1$ and $s_2$. Each of $s_1$ and
756: $s_2$ is determined simply by chains with indices in $F$ and their
757: designated cutpoints.  Namely, the choices of $\cut^*(k)$'s for different
758: $k\not\in F$ are mutually independent. This is the key to our efficient
759: algorithm.
760: 
761: In $\best{}$, we do not explicitly specify cutpoints of $\cut^*$.
762: Instead, we work on hump representation of subchains and every
763: cutpoint is implicitly specified by an $\tail{H}$ for some hump $H$.
764: Specifically, Step~1 ensures $\cut^*(k)=\cut(k)$ for every $k\in F,
765: k\ne j$. Steps~3 and~8 ensure $\cut^*(k)=\tail{H}$, where $H$ is the
766: highest $N$-hump of all $C_k$ with $\height{H}<s$ and $k\not\in F$.
767: %that has height less than $s$, for every
768: %$k\not\in F$. 
769: Since we are considering $j$-schedules, $\cut^*(j)$ is specified
770: slightly differently. Although in Step~2 the subchain of $C_j$ is only
771: up to $\prr{\cut(j)}$, $\cut^*(j)$ is still $\cut(j)$, since
772: $j$-schedule $S^*\cut(j)$ is returned in Step~10.
773: 
774: \begin{figure*}%[p]
775: \begin{center}
776: \fbox{
777: %\begin{center}
778: \begin{minipage}[t]{3in}
779: \begin{tabbing}
780: Function $\minheight{v,w}$\\
781: 1\quad\=$C_i$\quad\=$:=$ the chain containing $v;$\\
782: 2\>     $C_j$     \>$:=$ the chain containing $w;$\\
783: 3\>     $\cut(j)$ \>$:=w;$\\
784: 4\>     $h^*$     \>$:=\infty;$\\
785: 5\>     $I_0$     \>$:=\setof{v}\cup\decomp{[\suu{v},-]};$\\
786: 6\>     For every $\cut(i) \in \set{\tail{H}}{H\in I_0}$ do\\
787: 7\>\quad\= $S^*$\=$:=\best{j,\setof{i,j},\cut};$\\
788: 8\>\>      $h^*$\>$:=\min\setof{h^*,\height{S^*}};$\\
789: 9\>     Return $h^*;$
790: \end{tabbing}
791: \end{minipage}
792: }
793: \quad
794: \fbox{
795: \begin{minipage}[t]{3in}
796: \begin{tabbing}%\\ \\
797: Function $\best{j,F,\cut}$\\
798: 1\quad\=$I$\quad\=$:=\bigcup_{k\in F,k\ne j}\decomp{[-,\cut(k)]};$\\
799: 2\>     $J$\>     $:=\decomp{[-,\prr{\cut(j)}]};$\\
800: 3\>     $K$\>     $:=\bigcup_{k\not\in F}\decomp{C_k};$\\
801: 4\>     $s_1$\>   $:=\max\set{\height{H}}{H\in I\cup J, \cost{H}<0};$\\
802: 5\>     $S^+$\>   $:=\merge{\set{H\in I\cup J}{\cost{H}\ge0}};$\\
803: 6\>     $s_2$\>   $:=\height{S^+\cut(j)};$\\
804: 7\>     $s$\>     $:=\maxof{s_1,s_2};$\\
805: 8\>     $K_s$\>   $:=\set{H\in K}{\height{H}<s, \cost{H}<0};$\\
806: 9\>     $S_s$\>   $:=\merge{I\cup J\cup K_s};$\\
807: 10\>    Return    $S_s\cut(j);$
808: \end{tabbing}
809: %\end{center}
810: \end{minipage}
811: }
812: \end{center}
813: \caption[]{The algorithm for computing $\opt{G[\cut^*]\cup\setof{vw}}$
814: for a best cut $\cut^*$ of $G$ corresponding to $vw$.}
815: \label{minheight}
816: \end{figure*}
817: 
818: 
819: \subsection{Correctness}
820: We answer the following two questions in this subsection:
821: \begin{enumerate}
822: \item Why is it sufficient to try for $\cut(i)$ only those nodes in
823:   $\set{\tail{H}}{H\in I_0}$?
824: \item Why does $\best{j, F, \cut}$ return an optimal $j$-schedule of
825:    $G[\cut^*]$ with $\cut^*(k)=\cut(k)$ for every $k \in F$?
826: \end{enumerate}
827: 
828: \begin{lemma}
829: Let $\cut$ be a cut of $G$. Suppose $[x,z]$ is a subchain of $G$
830: containing $\cut(i)$. Let $H$ be the hump of $[x,z]$ containing
831: $\cut(i)$. Let $y$ be the first valley of $[\prr{H},\cut(i)]$.
832: If
833: \[
834: \cut_1(k)=\left\{
835:      \begin{array}{ll}
836:        \cut(k)&\mbox{if $k\ne i$};\\
837:        \prr{H}&\mbox{if $k=i$ and $y=\prr{H}$};\\
838:        \tail{H}&\mbox{if $k=i$ and $y\ne\prr{H}$},
839:      \end{array}
840:           \right.
841: \]
842: then
843: $\jopt{\cut_1}\le\jopt{\cut}$.
844: \label{humpboundary}
845: \end{lemma}
846: \begin{proof}
847: Straightforward.
848: \end{proof}
849: 
850: Note that the $\prr{H}$ in the above lemma is always an $\tail{H'}$ for
851: some hump $H'$ in $I_0$, which is defined in Step~5 of $\minheight{}$.
852: Therefore, Lemma~\ref{humpboundary} answers the first question.
853: 
854: By definitions of $I$, $J$, and $K_s$ it is not difficult to see that
855: the sequence returned by $\best{j,F,\cut}$ is an optimal $j$-schedule of
856: $G[\cut^*]$ for some cut $\cut^*$ such that $\cut^*(k)=\cut(k)$ for
857: every $k\in F$. The correctness of $\minheight{}$ thus relies on the
858: following lemma, which answers the second question.
859: 
860: \begin{lemma}
861: Let $\cut$ be a cut. Let $F$ be a subset of $\setof{1,2,\ldots,p}$
862: containing $j$. If $S^*=\best{j,F,\cut}$, then
863: $\height{S^*}\le\jopt{\cut}$.
864: \label{fix2cutpoints}
865: \end{lemma}
866: The rest of the subsection proves
867: %Lemma~\ref{humpboundary} and
868: Lemma~\ref{fix2cutpoints}.
869: %We need the following lemma to prove Lemma~\ref{humpboundary}.
870: %
871: %\begin{lemma}
872: %Let $\cut$ be a cut of $G$. Suppose $x$ is a node in $C_i$ preceding
873: %$\cut(i)$.
874: %Define $\cut_1$ by 
875: %\[
876: %    \cut_1(k)=\left\{
877: %                  \begin{array}{ll}
878: %                     \cut(k)&\mbox{if $k\ne i$};\\
879: %                     \mbox{the first valley of $[x,\cut(i)]$}&\mbox{if $k=i$}.
880: %                  \end{array}
881: %              \right.
882: %\]
883: %Then $\jopt{\cut_1}\le\jopt{\cut}$.
884: %\label{valley}
885: %\end{lemma}
886: %\begin{proof}
887: %Straightforward.
888: %\end{proof}
889: %
890: %\begin{proof}
891: %By the second hump decomposition property, there exists a series of
892: %nodes in $C_i$, $\cut_1(i)=y_1, y_2,\ldots,y_m=\cut(i)$, such that
893: %every subchain $[\suu{y_k},y_{k+1}]$ is a $P$-hump. Suppose $S$ is an
894: %optimal $j$-schedule of $G[\cut]$. Let $S'$ be obtained from $S$ by
895: %clustering every $P$-hump $[\suu{y_k},y_{k+1}]$ to its useful peak. By
896: %Lemma~\ref{cluster}, $S'$ is an optimal $j$-schedule of $G[\cut]$. Let
897: %$S_1$ be obtained from $S'$ by removing every $P$-hump
898: %$[\suu{y_k},y_{k+1}]$. Clearly, $\height{S_1}\le\height{S'}$. Since
899: %$S_1$ is a $j$-schedule of $G[\cut_1]$,
900: %$\jopt{\cut_1}\le\height{S_1}\le\height{S'}=\jopt{\cut}$.
901: %\end{proof}
902: %
903: %Lemma~\ref{humpboundary} can be proved as follows.
904: %\paragraph{Proof of Lemma~\ref{humpboundary}}
905: %When $y=\prr{H}$, by choice of $y$, $\jopt{\cut_1}\le\jopt{\cut}$ is
906: %immediate from Lemma~\ref{valley}.
907: %
908: %If $H$ is a $P$-hump, by definition of $\decomp{}$, $\prr{H}$ is the
909: %first valley of $[\prr{H},\tail{H}]$, so $y=\prr{H}$. Therefore when
910: %$y\ne\prr{H}$, $H$ must be an $N$-hump. We claim that $[\head{H}, y]$
911: %is a hump. Since $y$ is the first valley of $[\prr{H},\cut(i)]$ and
912: %$y\ne\prr{H}$, $y$ is also the first valley of $[\head{H}, \cut(i)]$.
913: %By definition of humps, $y$ cannot precede any useful peak of $H$.  It
914: %follows that a useful peak of $H$ is also a useful peak of
915: %$[\head{H},y]$. Thus $[\head{H},y]$ is a hump.  Let us use $H'$ to
916: %denote $[\head{H},y]$.  Clearly, $\height{H'}=\height{H}$. Since
917: %$H$ is an $N$-hump, $\cost{H'}\ge\cost{H}$.
918: %
919: %Let us define $\cut'$ by
920: %\[
921: %   \cut'(k)=\left\{
922: %                   \begin{array}{ll}
923: %                      \cut(k)&\mbox{if $k\ne i$}\\
924: %                      y &\mbox{if $k=i$}.
925: %                   \end{array}
926: %                \right.
927: %\]
928: %By Lemma~\ref{valley}, $\jopt{\cut'}\le\jopt{\cut}$.  Suppose $S$ is
929: %an optimal $j$-schedule of $G[\cut']$ in which $H'$ is clustered.
930: %We write $S=S_1H'S_2$. Inserting the sequence $[\suu{y},\tail{H}]$
931: %immediately after $H'$, we obtain a $j$-schedule $S^*=S_1HS_2$ for
932: %$G[\cut_1]$. We show $\height{S^*}\le\height{S}$. 
933: %
934: %Now $\height{S^*}$ is equal to the maximum of $\height{S_1}$,
935: %$\cost{S_1}+\height{H}$, and $\cost{S_1H}+\height{S_2}$.  Clearly,
936: %\begin{equation}
937: %  \height{S_1}\le\height{S}.
938: %  \label{ineq6}
939: %\end{equation}
940: %Since $\height{H}=\height{H'}$,
941: %\begin{equation}
942: %   \cost{S_1}+\height{H}\le\height{S}.
943: %   \label{ineq7}
944: %\end{equation}
945: %Since $\cost{H}\le\cost{H'}$, $\cost{S_1H}\le\cost{S_1H'}$. Hence
946: %\begin{equation}
947: %   \cost{S_1H}+\height{S_2}\le\height{S}.
948: %   \label{ineq8}
949: %\end{equation}
950: %Combining (\ref{ineq6}), (\ref{ineq7}), and (\ref{ineq8}), we obtain
951: %$\height{S^*}\le\height{S}$.
952: %\qed
953: Let $F_\ell=\setof{1,\ldots,\ell-1,\ell+1,\ldots,p}$.  The following
954: lemma is a special case of Lemma~\ref{fix2cutpoints}, in which $F$ is
955: composed of $p-1$ numbers.
956: 
957: \begin{lemma}
958: Let $\cut$ be a cut.  If $S^*=\best{j,F_\ell,\cut}$ for some $\ell\ne
959: j$, then $\height{S^*}\le\jopt{\cut}$.
960: \label{bestcut1}
961: \end{lemma}
962: \begin{proof}
963: Define $\cut_1$ by
964: \[
965:    \cut_1(k)=\left\{
966:                    \begin{array}{ll}
967:                       \cut(k)&\mbox{if $k\ne\ell$};\\
968:                       \mbox{the first valley of $[-,\cut(\ell)]$}
969:                             & \mbox{if $k=\ell$}.
970:                    \end{array}
971:                 \right.
972: \]
973: Then it is not difficult to see $\jopt{\cut_1}\le\jopt{\cut}$.  Let
974: $\cut'$ be the cut with $\height{S^*}=\jopt{\cut'}$, i.e., $S^*$ is a
975: $j$-schedule of $G[\cut']$.  By definition of $\best{}$, $\cut'$ and
976: $\cut_1$ could differ only at the $\ell$-th position. Clearly, it
977: suffices to show $\jopt{\cut'}\le\jopt{\cut_1}$.
978: 
979: Let $w=\cut_1(j)$. Let $L=\decomp{[-,\cut_1(k)]}$. Define
980: \[
981:    S=\merge{I\cup J\cup L},
982: \]
983: where $I$ and $J$ are defined in Steps~1 and~2 of $\best{}$.  Clearly,
984: $Sw$ is an optimal $j$-schedule of $G[\cut_1]$. Thus,
985: $\height{Sw}=\jopt{\cut_1}$.  By choice of $\cut_1(\ell)$, $L$
986: contains no $P$-hump. Hence, by the uniqueness assumption of
987: $\merge{}$, we could write $Sw=S_1S^+w$, where $S^+$ is defined in
988: Step~5 of $\best{}$. We prove $\jopt{\cut'}\le\jopt{\cut_1}$ by 
989: showing that $\cut'(\ell)$ succeeds $\cut(\ell)$ if and only if
990: $\jopt{\cut'}\le\height{Sw}$ as follows.
991: 
992: \paragraph{Case 1: $\cut'(\ell)$ succeeds $\cut(\ell)$.}
993: Since $L$ contains no $P$-hump, each hump of $[-,\cut_1(\ell)]$
994: appears in $S_1$. Therefore, $S_1S'S^+w$ is a $j$-schedule of
995: $G[\cut']$, where $S'=[\suu{\cut_1(\ell)},\cut'(\ell)]$.  We show
996: $\height{S_1S'S^+w}\le\height{S_1S^+w}$.  Now
997: $\height{S_1S'S^+w}=\max\setof{\height{S_1},\cost{S_1}+\height{S'},\cost{S_1S'}+\height{S^+w}}$.
998: Clearly,
999: \begin{equation}
1000:   \height{S_1}\le\height{S_1S^+w}.
1001:   \label{ineq1}
1002: \end{equation}
1003: By definition of $F$, the $K_s$ defined in Step~8 of $\best{}$ is
1004: composed of the $N$-humps of $C_\ell$ that have heights less than $s$.
1005: Therefore, by choice of $\cut'(\ell)$ every hump of $[-,\cut'(\ell)]$
1006: has height less than $s$. It follows from the standard order of humps
1007: in $S'$ that $\height{S'}<s$. By Step~7 of $\best{}$,
1008: $s=\maxof{s_1,s_2}$. If $s=s_2=\height{S^+w}$, as defined in Step~6 of
1009: $\best{}$, then $\cost{S_1}+\height{S'}<\cost{S_1}+\height{S^+w}$. If
1010: $s=s_1=\height{H^*}$, where $H^*$ is a highest $N$-hump in $I\cup J$,
1011: then we could write $S_1=S_2H^*S_3$. It follows that
1012: \begin{eqnarray*}
1013:   \cost{S_1}+\height{S'}&=&\cost{S_2H^*S_3}+\height{S'}\\
1014:                         &<&\cost{S_2}+\height{H^*}\\
1015:                         &\le&\height{S_2H^*}\\
1016:                         &\le&\height{S_1}.
1017: \end{eqnarray*}
1018: Therefore, in either case we have
1019: \begin{equation}
1020:    \cost{S_1}+\height{S'}<\height{S_1S^+w}.
1021:    \label{ineq2}
1022: \end{equation}
1023: By choice of $\cut'(\ell)$, $\cost{S'}<0$. Hence,
1024: \begin{eqnarray}
1025:   \cost{S_1S'}+\height{S^+w}&<&\cost{S_1}+\height{S^+w}\nonumber\\ 
1026:   &\le&\height{S_1S^+w}.  \label{ineq3}
1027: \end{eqnarray}
1028: Combining Equations~(\ref{ineq1}),~(\ref{ineq2}), and~(\ref{ineq3}),
1029: we obtain $\height{S_1S'S^+w}\le\height{Sw}$.
1030: 
1031: \paragraph{Case 2: $\cut'(\ell)$ precedes $\cut_1(\ell)$.}
1032: Let $S'=[\suu{\cut'(\ell)},\cut_1(\ell)]$. By choice of
1033: $\cut'(\ell)$, it is not difficult to see
1034: \begin{displaymath}
1035:  \decomp{[-,\cut_1(\ell)]}=\decomp{[-,\cut'(\ell)]}\cup\decomp{S'}.
1036: \end{displaymath}
1037: By choice of $\cut_1(\ell)$, $\decomp{S'}$ contains only $N$-humps of
1038: heights no less than $s$. Note that every $N$-hump in $I\cup J$ has
1039: height no more than $s$. By standard form of $S$, we know that $S'$ is
1040: a suffix of $S_1$. Therefore, we could write $Sw=S_2S'S^+w$. Removing
1041: $S'$ from $Sw$, we obtain a $j$-schedule $S_2S^+w$ of $G[\cut']$. We
1042: show $\height{S_2S^+w}\le\height{Sw}$.
1043: 
1044: Now
1045: $\height{S_2S^+w}=\max\setof{\height{S_2},\cost{S_2}+\height{S^+w}}$. Clearly,
1046: \begin{equation}
1047:   \height{S_2}\le\height{S_2S'S^+w}=\height{Sw}.
1048:   \label{ineq4}
1049: \end{equation}
1050: Since each hump of $S'$ has height no less than $s$,
1051: $\height{S'}\ge s$. 
1052: Hence, $\height{S'S^+w}\ge\height{S'}\ge s\ge s_2=\height{S^+w}$.
1053: It follows that
1054: \begin{eqnarray}
1055:   \cost{S_1}+\height{S^+w}&\le&\cost{S_1}+\height{S'S^+w}\nonumber\\ 
1056:   &\le&\height{Sw}.  \label{ineq5}
1057: \end{eqnarray}
1058: Combining Equations~(\ref{ineq4}) and~(\ref{ineq5}), we obtain
1059: $\height{S_1S^+w}\le\height{Sw}$. 
1060: \end{proof}
1061: 
1062: Now we are ready to prove Lemma~\ref{fix2cutpoints}.
1063: 
1064: \paragraph{Proof of Lemma~\ref{fix2cutpoints}}
1065: Recall that $S^*=\best{j,F,\cut}$. Let $\cut'$ be the cut such that
1066: $S^*$ is a $j$-schedule of $G[\cut']$. ($S^*$ is certainly an optimal
1067: $j$-schedule of $G[\cut']$.) We use the algorithm in
1068: Figure~\ref{cuttransform}\note{Figure~\ref{cuttransform}} to prove the
1069: lemma. Procedure $\cuttrans{}$ proceeds with iterations, in which the
1070: value of $\ell$ varies among $\setof{1,\ldots,p}$. If $\ell\not\in F$,
1071: then the value of $\cut(\ell)$ is updated. Since $S$ is an optimal
1072: $j$-schedule of $G[\cut']$, it follows from Lemma~\ref{bestcut1} that
1073: $\jopt{\cut'}\le\jopt{\cut}$ always holds during the while-loop. If we
1074: could show that $\cuttrans{}$ always terminates, then the lemma is
1075: proved.
1076: 
1077: Let $s^*_1$, $s^*_2$, and $s^*$ be the $s_1$, $s_2$, and $s$ in the
1078: execution of $\best{j,F,\cut}$. Let $s_1$, $s_2$, and $s$ be those in
1079: the execution of $\best{j, F_\ell, \cut}$. The values of $s_1$, $s_2$,
1080: and $s$ change as the while-loop of $\cuttrans{}$ proceeds. We show
1081: that $\cut$ eventually becomes $\cut'$ by arguing that $s$ eventually
1082: becomes $s^*$.
1083: 
1084: Since $F\subseteq F_\ell$, $s_1\ge s^*_1$ always holds. By definition
1085: of $\best{}$, whenever Step~7 of $\cuttrans{}$ is finished,
1086: $[-,\cut(\ell)]$ contains only $N$-humps. Thus, after the first $p$
1087: iterations of the while-loop, $[-,\cut(\ell)]$ contains no $P$-hump
1088: for every $\ell\not\in F$. Henceforth, $s_2=s^*_2$ and therefore
1089: $s=\maxof{s_1,s_2}\ge\maxof{s^*_1,s^*_2}=s^*$. If $s>s^*$, then
1090: $s=s_1>s^*$. Since $s_1>s^*$, there must be an $N$-hump $H$ in
1091: $\bigcup_{k\not\in F}\decomp{[-,\cut(k)]}$ such that $\height{H}=s_1$.
Since $s=s_1$, in the next iteration in which $C_\ell$ contains $H$,
$\cut(\ell)$ will be moved before $H$ by definition of $\best{}$. It
follows that the value of $s$ is nonincreasing and eventually becomes
$s^*$. Once $s=s^*$, in the following $p$ iterations, $\cut(k)$ will
be moved to $\cut'(k)$ for every $k\not\in F$. The algorithm then
terminates.
1098: \qed
1099: 
1100: \begin{figure}%[p]
1101: \begin{center}
1102: \fbox{
1103: \begin{minipage}{5in}
1104: \begin{center}
1105: \begin{tabbing}
1106: \quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\kill
1107: Procedure $\cuttrans{\cut,\cut^*}$ \\
1108: 1 \> $\ell:=0;$\\
1109: 2 \> While $\cut^*\ne\cut$ do\\
1110: 3 \> \> $\ell:= (\ell \bmod p) + 1;$\\
1111: 4 \> \> If $\ell\not\in F$\\
1112: 5 \> \> \> $S:=\best{j,F_\ell,\cut};$\\
1113: 6 \> \> \> $\cut':=$ the cut such that $S$ is an
1114:            optimal $j$-schedule of $G[\cut']$;\\
1115: 7 \> \> \> $\cut := \cut';$
1116: \end{tabbing}
1117: \end{center}
1118: \end{minipage}
1119: }
1120: \end{center}
1121: \caption[]{The algorithm transforms $\cut$ to $\cut^*$. We prove
1122: Lemma~\ref{fix2cutpoints} by showing that this algorithm always
1123: terminates.}
1124: \label{cuttransform}
1125: \end{figure}
1126: 
1127: \subsection{Implementation}
1128: \label{singleimplementation}
1129: Recall that $\decomp{C}$ runs in time linear in $|C|$, the length of
1130: chain $C$. It follows that the time complexity of Steps~1--5 and
Step~9 of $\minheight{}$ is $O(n)$. Suppose the nodes are assigned to
$\cut(i)$ in the for-loop in the same order as they appear in $C_i$.
In this subsection we focus on implementing $\best{}$ such that the
for-loop runs in time $O(n)$.
1134: the for-loop runs in time $O(n)$.
1135: 
1136: \paragraph{Number of Iterations}
1137: The following lemma ensures that the size of $I_0$ is
1138: $O(\sqrt{|C_i|})$. It follows that the number of iterations is
1139: $O(\sqrt{n})$.
1140: 
1141: \begin{lemma}
1142: Suppose $C$ is a chain with node costs $\pm1$. The number of humps in
1143: $\decomp{C}$ is $O(\sqrt{|C|})$.
1144: \label{rootn}
1145: \end{lemma}
1146: \begin{proof}
1147: Since the costs of nodes are either $+1$ or $-1$, a hump of height
1148: $\ell$ contains at least $\ell$ nodes. For the same reason, a hump of
1149: reverse height $\ell$ contains at least $\ell$ nodes. By the first
1150: hump decomposition property, the heights of the $N$-humps in
1151: $\decomp{C}$ are different, and so are the reverse heights
1152: of the $P$-humps in $\decomp{C}$. If there are $n_1$
1153: $N$-humps and $n_2$ $P$-humps in $\decomp{C}$, then 
1154: $|C|=\Omega(n^2_1+n^2_2)=\Theta((n_1+n_2)^2)$. This proves the lemma.
1155: \end{proof}
1156: 
1157: \paragraph{Compact Representation of Humps}
1158: For the sake of efficiency, we do not deal with the internal structure
1159: of humps in $\best{}$. It suffices to represent each hump $H$ by a
1160: pair $(\cost{H},\height{H})$ and work on the compact representation of
1161: humps. Therefore, each of the $I$, $J$, and $K$ computed in the first
1162: three steps is a set of pairs. Clearly, each of these three steps
1163: takes $O(n)$ time.  However, the contents of $J$ and $K$ do not change
1164: in different iterations. Thus, Steps~2 and~3 need only be executed
1165: once.
1166: 
1167: By $F=\setof{i,j}$, we have $I=\decomp{[-,\cut(i)]}$.  Suppose $I_t$
1168: and $\cut_t$ are the $I$ and $\cut$ in the $t$-th iteration for some
1169: $t\ge 2$. By the order of nodes assigned to $\cut(i)$, we need not
1170: recompute $\decomp{[-,\cut_t(i)]}$ from scratch. In the $t$-th
1171: execution of Step~1, $[-,\cut_t(i)]$ is obtained by appending a hump
1172: $[\suu{\cut_{t-1}(i)},\cut_t(i)]$ to $[-,\cut_{t-1}(i)]$. By the
1173: argument following the hump decomposition properties in
1174: \S\ref{property-sect}, the $t$-th execution of Step~1 takes
1175: $O(|I_{t-1}|)$ time. By Lemma~\ref{rootn}, the time complexity of all
1176: executions of Step~1 is $O(n+\sqrt{n}\times\sqrt{n})=O(n)$.
1177: 
1178: \paragraph{Priority Tree}
1179: To compute $s_1$ efficiently, we resort to a {\em priority tree}, a
1180: complete binary tree with $n+1$ leaves.\footnote{Note that there are
1181: other ways to implement Step 4 to run in linear time. However, the
1182: necessity of priority tree will become clear when we address the
1183: implementation of the all-pairs algorithm.}  Each leaf keeps two
1184: values, {\em count} and {\em maxheight}. The cost of the $(h+1)$-st
1185: leaf is the number of $N$-humps of height $h$ in $I\cup J$. The
1186: maxheight of the $(h+1)$-st leaf is 0 (respectively, $h$), if its
1187: count is zero (respectively, nonzero). The maxheight of an internal
1188: node is the maximum maxheight of its children. It follows that the
1189: maxheight of the root of a priority tree is the correct value of
1190: $s$. The priority tree can be built in time $O(n)$. Whenever a hump is
1191: added to or deleted from $I\cup J$, the priority tree can be updated
1192: in time $O(\log n)$. Since $J$ is fixed, to compute $s_1$ in $t$-th
1193: iteration for every $t\ge 2$, we add humps in $I_t-I_{t-1}$ to $I\cup
1194: J$, remove humps in $I_{t-1}-I_t$ from $I\cup J$, and update the
1195: priority tree. By the third hump decomposition property, we have
1196: \begin{equation}
1197:   \sum_{2\leq t\leq q_i}|I_t-I_{t-1}|+|I_{t-1}-I_t|=O\left(\sqrt{|C_i|}\right),
1198:   \label{changeofI}
1199: \end{equation}
1200: where $q_i$ is the number of humps in $C_i$.
1201: Hence, the time complexity of all executions of Step~4 is
1202: $O(n+\sqrt{n}\times\log n)=O(n)$.
1203: 
1204: \paragraph{Hump Tree}
1205: To obtain the value of $s_2$, it is not necessary to know the value of
1206: $S^+$. We need only to obtain the height of $S^+\cut(j)$. Similarly,
1207: the actual value of $S_s$ is irrelevant. What we compare in Step~8 of
1208: $\minheight{}$ is the height of $S_s\cut(j)$. We need a data structure
1209: to compute these two heights efficiently.
1210: 
1211: Let $L$ be a set of humps such that $\height{H}\le n$ and $\fall{H}\le
1212: n$ for every $H\in L$. A {\em hump tree} $T$ for $L$ is a binary tree
1213: composed of two complete binary subtrees. Each subtree has $n+1$
1214: leaves. Let $T_N$ be the left subtree and $T_P$ be the right subtree.
1215: The $(h+1)$-st leaf of $T_N$ associates with the set of $N$-humps of
1216: height $h$ in $L$.  The $(h+1)$-st leaf of $T_P$ associates with the
1217: set of $P$-humps of reverse height $n-h$ in $L$. Let $T_x$ be the
1218: subtree of $T$ rooted at $x$. Let $L_x$ be the set of humps associated
1219: with leaves of $T_x$. Define $\height{T_x}=\height{\merge{L_x}}$ and
1220: $\cost{T_x}=\cost{\merge{L_x}}$.  Clearly, when $L=I\cup J$,
1221: $\height{T_P}=\height{S^+}$ and $\cost{T_P}=\cost{S^+}$. When $L=I\cup
1222: J\cup K_s$, $\height{T}=\height{S_s}$ and $\cost{T}=\cost{S_s}$. The
1223: heights of $S^+\cut(j)$ and $S_s\cut(j)$ can then be computed by
1224: \begin{eqnarray*}
1225: \height{S^+\cut(j)}&=&\maxof{\height{S^+},\cost{S^+}+\height{\cut(j)}};\\
1226: \height{S_s\cut(j)}&=&\maxof{\height{S_s},\cost{S_s}+\height{\cut(j)}}.
1227: \end{eqnarray*}
1228: 
1229: 
1230: Let us keep $\height{T_x}$ and $\cost{T_x}$ in $x$ for every node $x$ of
1231: $T$. Therefore, the hump tree $T$ takes $O(n)$ space.  We show how to
1232: compute $\height{T_x}$ and $\cost{T_x}$ for every node $x$ from leaves
1233: to root. When $x$ is a leaf of $T$, the humps in $L_x$ have the same
1234: height if $x$ is in $T_N$, and the same reverse height if $x$ is in
1235: $T_P$. It is not difficult to see that $\cost{T_x}=\sum_{H\in
1236:   L_x}\cost{H}$; and
1237: \[
1238:   \height{T_x}=
1239:     \left\{
1240:        \begin{array}{ll}
1241:           0 & \mbox{if $L_x=\emptyset$};\\
1242:           h & \mbox{if $x$ is the $(h+1)$-st leaf of $T_N$};\\
1243:           \cost{T_x}-h & \mbox{if $x$ is the $(n-h+1)$-st leaf of $T_P$}.
1244:         \end{array}
1245:     \right.
1246: \]
1247: 
1248: When $x$ is an internal node of $T$, $\height{T_x}$ and $\cost{T_x}$
1249: can be computed by the information kept in the children of $x$.
1250: Suppose $y$ and $z$ are the left and right children of $x$,
1251: respectively. For any $H$ in $L_y$ and $H'$ in $L_z$, by the way we
1252: associate humps with leaves, the series $HH'$ is in standard order.
1253: Hence,
1254: \begin{eqnarray*}
1255:   \height{T_x}&=&\maxof{\height{T_y},\cost{T_y}+\height{T_z}};\\
1256:   \cost{T_x}&=&\cost{T_y}+\cost{T_z}.
1257: \end{eqnarray*}
1258: It follows that the hump tree $T$ for $L$ can be built in time
1259: $O(n+|L|)$.
1260: 
1261: Once $T$ is built, inserting a hump to $L$ can be done efficiently.
1262: Suppose we insert $H$ to $L$. For the case that $H$ is an $N$-hump, if
1263: $L_x=\emptyset$, then let $h(T_x)=h$; otherwise, add $\cost{H}$ to
1264: $\cost{T_x}$, where $x$ is the $(\height{H}+1)$-st leaf of $T_N$. If
1265: $H$ is a $P$-hump, then we add $\cost{H}$ to both $\cost{T_x}$ and
1266: $\height{T_x}$, where $x$ is the $(n-\fall{H}+1)$-st leaf of $T_P$.
1267: To update $T$, we simply update the internal nodes on the path from
1268: $x$ to the root of $T$. Deleting a hump from $L$ can be done similarly
1269: by replacing every addition with a subtraction. Clearly, both insertion
1270: and deletion take time $O(\log n)$.
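The following Python fragment is a minimal sketch, under our own
assumptions, of how the pairs $(\height{T_x},\cost{T_x})$ can be
combined bottom-up and refreshed along a leaf-to-root path; it is not
the authors' implementation. It assumes that each leaf value has
already been set by the leaf formula above, stores the tree as an
array-based complete binary tree padded with empty leaves, and uses
the hypothetical names \texttt{combine}, \texttt{HumpTree}, and
\texttt{set\_leaf}.

\begin{verbatim}
```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (height, cost) of a series of humps
EMPTY: Pair = (0, 0)    # an empty series: height 0 and cost 0


def combine(left: Pair, right: Pair) -> Pair:
    """Height and cost of the series left followed by right."""
    h_y, c_y = left
    h_z, c_z = right
    return (max(h_y, c_y + h_z), c_y + c_z)


class HumpTree:
    """Array-based binary tree: node i has children 2i and 2i+1."""

    def __init__(self, leaves: List[Pair]) -> None:
        size = 1
        while size < len(leaves):
            size *= 2
        self.size = size
        self.node: List[Pair] = [EMPTY] * (2 * size)
        self.node[size:size + len(leaves)] = leaves
        # Bottom-up pass: one call to combine per internal node.
        for i in range(size - 1, 0, -1):
            self.node[i] = combine(self.node[2 * i], self.node[2 * i + 1])

    def set_leaf(self, index: int, pair: Pair) -> None:
        """Overwrite leaf index (0-based) and recompute its ancestors."""
        i = self.size + index
        self.node[i] = pair
        i //= 2
        while i >= 1:
            self.node[i] = combine(self.node[2 * i], self.node[2 * i + 1])
            i //= 2

    def root(self) -> Pair:
        return self.node[1]
```
\end{verbatim}

For instance, \texttt{HumpTree([(2, -1), (3, 1)]).root()} returns
\texttt{(2, 0)}, since $\max\setof{2,-1+3}=2$ and $-1+1=0$. Padding
with $(0,0)$ is harmless because every series has height at least
$\max\setof{0,\mbox{its cost}}$, so an empty leaf acts as an identity
for \texttt{combine}.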
1271: 
To compute the heights of $S^+\cut(j)$ and $S_s\cut(j)$, we need not
maintain two separate hump trees, one for $I\cup J$ and another for
$I\cup J\cup K_s$. Let $K^-$ be the set of $N$-humps in $K$, i.e.,
$K^-=\set{H\in K}{\cost{H}<0}$. It suffices to maintain a hump tree
$T$ for $I\cup J\cup K^-$. Since there is no $P$-hump in $K^-$, it is
still true that $\height{T_p}=\height{S^+}$ and
$\cost{T_p}=\cost{S^+}$. Although $T$ is not the hump tree for
$I\cup J\cup K_s$, the values of $\height{S_s}$ and $\cost{S_s}$ can be
efficiently obtained by the procedure in
Figure~\ref{remove-range}\note{Figure~\ref{remove-range}}.
Procedure $\removerange{}$ acts as if the $N$-humps of height at least
$s$ were removed from the hump tree for $I\cup J\cup K^-$.
Therefore, the resulting $\height{T}$ and $\cost{T}$ are $\height{S_s}$
and $\cost{S_s}$, respectively. Clearly, $\removerange{}$ takes
$O(\log n)$ time. Since the hump tree for $I\cup J\cup K^-$ must be
kept for the next iteration, we save the values of the $O(\log n)$
nodes modified by $\removerange{}$, using $O(\log n)$ extra space;
after obtaining $\height{S_s}$ and $\cost{S_s}$, we restore the hump
tree for $I\cup J\cup K^-$ in time $O(\log n)$.
1290: 
1291: 
1292: \begin{figure}%[p]
1293: \begin{center}
1294: \fbox{
1295: \begin{minipage}{5in}
1296: \begin{center}
1297: \begin{tabbing}
1298: \quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\kill
1299: Procedure $\removerange{T,s}$ \\
1300: 1 \> $y$ := the $s$-th leaf of $T_N$;\\
1301: 2 \> While $y$ is not the root of $T_N$ do\\
1302: 3 \> \> $x := \mbox{the parent of $y$};$\\
1303: 4 \> \> If $y$ is the left child of $x$ then\\
1304: 5 \> \> \> $(\height{T_x},\cost{T_x}):=(\height{T_y},\cost{T_y})$;\\
1305: 6 \> \> else\\
1306: 7 \> \> \> Recompute $\height{T_x}$ and $\cost{T_x}$;\\
1307: 8 \> \> $y := x;$\\
1308: 9 \> Recompute $\height{T}$ and $\cost{T}$;
1309: \end{tabbing}
1310: \end{center}
1311: \end{minipage}
1312: }
1313: \end{center}
1314: \caption[]{Let $T$ be the hump tree for $I\cup J\cup K^-$. This
procedure acts as if the $N$-humps of height at least $s$ were
removed from the hump tree.}
1317: \label{remove-range}
1318: \end{figure}
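The walk in Procedure $\removerange{}$ can be sketched in Python as
follows, again as a hypothetical illustration rather than the authors'
code. It assumes an array-based tree holding only the leaves of $T_N$
(a power-of-two number of them), and it accumulates the recomputed
pair in a local variable instead of overwriting node values; the
stored tree is therefore left untouched, which plays the role of the
restore step. Combining the result with the pair stored at the root of
$T_P$ would give $(\height{T},\cost{T})$.

\begin{verbatim}
```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (height, cost) of a series of humps


def combine(a: Pair, b: Pair) -> Pair:
    """Height and cost of the series a followed by b."""
    return (max(a[0], a[1] + b[0]), a[1] + b[1])


def build(leaves: List[Pair]) -> List[Pair]:
    """Array tree: node i has children 2i and 2i+1; leaves fill the
    second half.  len(leaves) is assumed to be a power of two."""
    size = len(leaves)
    node: List[Pair] = [(0, 0)] * size + list(leaves)
    for i in range(size - 1, 0, -1):
        node[i] = combine(node[2 * i], node[2 * i + 1])
    return node


def remove_range(node: List[Pair], s: int) -> Pair:
    """(height, cost) of the tree as if every leaf after the s-th
    (1-based) leaf were empty; node itself is not modified."""
    size = len(node) // 2
    y = size + s - 1            # array index of the s-th leaf
    value = node[y]             # pair for the pruned subtree rooted at y
    while y > 1:
        if y % 2 == 1:
            # y is a right child: its left sibling lies before the cut.
            value = combine(node[y - 1], value)
        # If y is a left child, its right sibling lies after the cut
        # and is simply dropped.
        y //= 2
    return value
```
\end{verbatim}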
1319: 
For $t\ge1$, let $I_t$ denote the set $I$ in the $t$-th iteration. To
obtain the hump tree for $I_t\cup J\cup K^-$ from the hump tree for
$I_{t-1}\cup J\cup K^-$, we insert the humps in $I_t-I_{t-1}$ into $T$
and delete the humps in $I_{t-1}-I_{t}$ from $T$. Since each insertion
and deletion takes $O(\log n)$ time, it follows from
Equation~(\ref{changeofI}) that obtaining the hump tree from that of
the previous iteration takes $O(\sqrt{n}\times\log n)$ time. Recall
that building a hump tree for $L$ takes $O(n+|L|)$ time. Since there
are $n$ nodes in $G$, $|I_1\cup J\cup K^-|=O(n)$, and hence the hump
tree for $I_1\cup J\cup K^-$ can be built in $O(n)$ time.
1331: 
1332: 
1333: 
By the above arguments, $\best{}$ can be implemented so that the
overall time complexity of the while-loop in $\minheight{}$ is $O(n)$.
1336: We therefore have the following theorem.
1337: 
1338: \begin{theorem}
1339: \label{singlechb}
1340: Suppose $G$ is a graph consisting of $p$ disjoint chains comprising
1341: $n$ nodes, where each node represents either a $P$-operation or a
1342: $V$-operation.  For any two nodes $v$ and $w$ of $G$, one can
1343: determine in $O(n)$ time whether there is a valid subschedule in which
1344: $v$ precedes $w$.
1345: \end{theorem}
1346: 
1347: 
1348: \section{Algorithm for All Pairs}
1349: \label{sec:all}
1350: %\subsection{All-Pairs Race-Condition Detection}
1351: %For the purpose of debugging parallel programs, it is important to
1352: %exactly detect all races. Hence, we need to determine the above for all
1353: %pairs of nodes $v$ and $w$.  
1354: In this section we show how to determine the $\chb$ relations for all
1355: pairs of nodes in $G$. The linear-time algorithm for a single pair of
1356: nodes, applied to all $O(n^2)$ pairs, takes time $O(n^3)$.
Fortunately, this information has a {\em compact representation}: it
suffices to indicate, for each node $v$ and each chain $C$ not
containing $v$, the first node $w$ in $C$ such that $v$ precedes $w$
in some valid subschedule. This representation has size $O(np)$, where $n$ is
1362: the number of nodes and $p$ is the number of chains.  The
1363: representation can be used to determine in constant time whether there
1364: is a race between two given operations $v$ and $w$, assuming that the
1365: input $p$ chains are schedulable.\footnote{Since the $p$ chains
1366: represent a trace of a parallel program, the assumption holds. For
1367: arbitrary $p$ chains, one can determine whether they are schedulable
1368: using the algorithm in~\cite{Abdel-Wahab:1978:SMM}.}
1369: %A race exists if
1370: %either operation can precede the other.  
To determine whether $v$ can precede $w$, we look up the first node in
$w$'s chain that could be preceded by $v$ in some valid subschedule.
If this first node is $w$ itself or precedes $w$ in the chain, then $v$
can precede $w$; otherwise, $v$ cannot precede $w$. We therefore consider the
1375: complexity of constructing such a representation.  Clearly, it can be
1376: constructed by a sequence of calls to the algorithm of Theorem
1377: \ref{singlechb}. We show how to do much better; in fact the time
1378: required by our algorithm is only $O(\log n)$ times the time required
1379: simply to write down the output.
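A constant-time query on the compact representation might look as
follows in Python. The layout is a hypothetical choice of ours: a node
is a pair (chain, position) with positions $1,2,\ldots$ from head to
tail, and \texttt{first[(v, j)]} holds the position of $\first{j}{v}$,
or \texttt{None} when $\first{j}{v}=\fini$. The test relies on the
fact, used in the correctness proof below, that if $v$ can precede a
node of $C_j$, then it can precede every later node of $C_j$.

\begin{verbatim}
```python
from typing import Dict, Optional, Tuple

Node = Tuple[int, int]  # (chain index, position in chain; head = 1)
# first[(v, j)] = position of first_j(v) in chain j, or None for "top".
FirstTable = Dict[Tuple[Node, int], Optional[int]]


def can_precede(first: FirstTable, v: Node, w: Node) -> bool:
    """True iff v precedes w in some valid subschedule."""
    j, pos_w = w
    f = first[(v, j)]
    # The nodes of C_j that v can precede form a suffix starting at
    # first_j(v), so one comparison suffices.
    return f is not None and f <= pos_w
```
\end{verbatim}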
1380: 
1381: %Recall that $G$ is composed of $p$ disjoint chains,
1382: %$C_1,C_2,\ldots,C_p$, of $n$ nodes. 
1383: 
1384: \subsection{The Algorithm}
1385: 
1386: Let $\first{j}{v}$ denote the first node in $C_j$ that could be
1387: preceded by $v$ in some valid subschedule of $G$. The output of the
1388: all-pairs algorithm is thus the value of $\first{j}{v}$ for every node
1389: $v$ and $1\le j\le p$. Note that $\first{j}{v}$ could be $\fini$,
which means that none of the nodes in $C_j$ can be preceded by $v$ in any
1391: valid subschedule of $G$.
1392: 
We first describe the procedure $\chainpair{i,j}$, which computes
$\first{j}{v}$ for every $v\in C_i$. The all-pairs algorithm simply
calls $\chainpair{i,j}$ for every $1\le i, j\le p$ with $i\ne j$. For
convenience, let $\su{j}{w}=\suu{w}$ for every $w\in C_j$, where the
successor of $\tail{C_j}$ is $\fini$, and let
$\su{j}{\init}=\head{C_j}$. Procedure $\chainpair{i,j}$ is shown in
Figure~\ref{chainpair}\note{Figure~\ref{chainpair}}. The algorithm
starts with $v=\tail{C_i}$ and $w=\tail{C_j}$. As long as
$\minheight{v,w}=0$, the repeat-loop replaces $w$ with $\prr{w}$. Once
$w=\init$ or $\minheight{v,w}$ is nonzero, the algorithm reports
$\su{j}{w}$ as $\first{j}{v}$. It then replaces $v$ with $\prr{v}$ and
continues the same search for the new $\first{j}{v}$, starting from
the current $w$.
1404: 
1405: \begin{figure}%[p]
1406: \begin{center}
1407: \fbox{
1408: \begin{minipage}{5in}
1409: \begin{center}
1410: \begin{tabbing}
1411: \quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\kill
1412: Procedure $\chainpair{i,j}$ \\
1413:  1\>     $(v, w) := (\tail{C_i}, \tail{C_j})$;\\
1414:  2\>     Repeat\\
1415:  3\>        \>If $w=\init$ \=then \=$h := 1$;\\
1416:  4\>        \>             \>else \>$h := \minheight{v,w}$;\\
1417:  5\>        \>If $h>0$ \>then \=$\first{j}{v} :=\su{j}{w}$;\\
1418:  6\>        \>         \>     \>          $v := \prr{v}$;\\
1419:  7\>        \>         \>else\>$w := \prr{w}$;\\
1420:  8\>     Until $v = \init$;
1421: \end{tabbing}
1422: \end{center}
1423: \end{minipage}
1424: }
1425: \end{center}
1426: \caption[]{The algorithm that computes $\first{j}{v}$ for every $v \in C_i$.}
1427: \label{chainpair}
1428: \end{figure}
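For readers who want to trace the repeat-loop, here is a Python sketch
of Procedure $\chainpair{i,j}$. It treats $\minheight{v,w}$ as a
black-box oracle (the hypothetical parameter \texttt{min\_height},
assumed to return $0$ exactly when $v\chb w$), numbers the nodes of
each chain $1,2,\ldots$ from head to tail, uses position $0$ for
$\init$, and represents $\fini$ by \texttt{None}. Because each
iteration moves $v$ or $w$ back by one position, the oracle is called
at most $|C_i|+|C_j|$ times.

\begin{verbatim}
```python
from typing import Callable, List, Optional


def chain_pair(n_i: int, n_j: int,
               min_height: Callable[[int, int], int]) -> List[Optional[int]]:
    """Return first[v] for v = 1..n_i (index 0 unused); None is top."""
    first: List[Optional[int]] = [None] * (n_i + 1)
    v, w = n_i, n_j                    # (tail(C_i), tail(C_j))
    while v != 0:                      # repeat ... until v = bottom
        h = 1 if w == 0 else min_height(v, w)
        if h > 0:
            # succ_j(w): w + 1, where succ_j(bottom) = head = 1 and the
            # successor of the tail is top.
            first[v] = w + 1 if w < n_j else None
            v -= 1                     # v := pred(v)
        else:
            w -= 1                     # w := pred(w)
    return first
```
\end{verbatim}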
1429: 
1430: \subsection{Correctness}
1431: By induction on $v$ we show that $\chainpair{i,j}$ correctly computes
1432: $\first{j}{v}$ for every $v \in C_i$.
1433: 
1434: When $v=\tail{C_i}$, procedure $\chainpair{i,j}$ keeps replacing $w$
1435: with $\pr{j}{w}$ until $w = \init$ or $\minheight{v,w}> 0$.  If
1436: $w=\init$, then $\opt{G\cup\setof{vw'}}=0$ for every $w' \in C_j$.  Thus,
1437: $\first{j}{v}=\su{j}{\init}=\su{j}{w}=\head{C_j}$ is correct. If
1438: $\minheight{v,w}>0$, then $v\notchb w$. It follows that $v\notchb w'$ for
every $w'$ preceding $w$ in $C_j$. Since $\minheight{v,\su{j}{w}}=0$,
1440: $v\chb\su{j}{w}$.  Therefore, $\su{j}{w}$ is the correct value of
1441: $\first{j}{v}$. This confirms the induction basis.
1442: 
1443: Suppose the procedure $\chainpair{i,j}$ correctly reports $\su{j}{w}$ as
1444: the value of $\first{j}{\su{i}{v}}$ in a certain iteration of the
1445: repeat-loop.  We need to show that in the remaining iterations
1446: $\first{j}{v}$ will also be correctly computed. Since
1447: $\su{i}{v}\chb\su{j}{w}$, $v\chb\su{j}{w}$. It follows that $v\chb w'$
1448: (and thus $\minheight{v,w'}=0$) for every $w'$ succeeding $w$ in $C_j$.
1449: In other words, to locate the first node in $C_j$ that could be preceded
1450: by $v$, it suffices to start testing from $w$.  For the same reason as
above, $\chainpair{i,j}$ reports the correct value of $\first{j}{v}$.
1452: The correctness is therefore ensured.
1453: 
1454: \subsection{Implementation}
1455: We show in this subsection how to implement $\chainpair{i,j}$ to run
1456: in time $O((|C_i|+|C_j|)\log n)$. It then follows that the time
1457: complexity of the all-pairs algorithm is $O(np\log n)$.
1458: 
1459: Suppose each time before we call $\chainpair{i,j}$, we have the hump
1460: tree for $I\cup J\cup K^-$, where
1461: \begin{eqnarray*}
1462:  I&=&\decomp{C_i};\\
1463:  J&=&\decomp{[-,\prr{\tail{C_j}}]};\\
1464:  K^-&=&\set{H\in\bigcup_{{\scriptstyle1\le k\le p}\atop{\scriptstyle k\ne i,j}}
1465:             \decomp{C_k}}{\cost{H}<0}.
1466: \end{eqnarray*}
1467: It follows from \S\ref{singleimplementation} that the first
1468: call to $\minheight{v,w}$ can be computed in time $O(\log n)$, since
1469: only one $\cut(i)$ need be considered.  In each of the remaining
1470: iterations of the repeat-loop, we either replace $v$ with $\prr{v}$ or
replace $w$ with $\prr{w}$. The following lemma guarantees that, to
compute each subsequent $\minheight{v,w}$, we need only try $v$ as
the cutpoint of $C_i$.
1474: 
1475: \begin{lemma}
1476: Consider any iteration of the repeat-loop in $\chainpair{i,j}$. When
1477: the algorithm computes $h=\minheight{v,w}$, $v$ is the only cutpoint
1478: of $C_i$ that could make $h$ zero.
1479: \label{onlycutpoint}
1480: \end{lemma}
1481: \begin{proof}
1482: By definition of $\chainpair{}$, when computing $\minheight{v,w}$,
1483: $\first{j}{\su{i}{v}}$ always succeeds $w$ in $C_j$.  Assume for a
1484: contradiction that $u$ is a node succeeding $v$ in $C_i$ such that
1485: there is a cut $\cut$ of $G$ where $\cut(i)=u$, $\cut(j)=w$, and
1486: $\jopt{\cut}=0$. It follows that $u\chb w$ and thus $\su{i}{v}\chb w$.
1487: This contradicts the fact that $\first{j}{\su{i}{v}}$ succeeds $w$ in
1488: $C_j$.
1489: \end{proof}
1490: 
1491: \begin{theorem}
1492:   \label{allchb}
1493:   Suppose $G$ is as in Theorem \ref{singlechb}.  The compact
1494:   representation of the relation ``$v$ precedes $w$ in some valid
  subschedule'' can be constructed in $O(np\log n)$ time and $O(n)$
1496:   space.
1497: \end{theorem}
1498: \begin{proof}
1499: Note that in each iteration of the repeat-loop, either $v$ or $w$ is
1500: moved by one position. Since the costs of $v$ and $w$ are $\pm1$, by
1501: the first hump decomposition property the number of humps updated in
1502: $I\cup J\cup K^-$ between two consecutive iterations is a constant.
1503: Thus, each execution of $\minheight{v,w}$ takes only time $O(\log n)$.
1504: Since the number of iterations of the repeat-loop is $O(|C_i|+|C_j|)$,
1505: each execution of $\chainpair{i,j}$ takes time
1506: \begin{equation}
1507:   O((|C_i|+|C_j|)\times\log n).
1508:   \label{time1}
1509: \end{equation}
1510: It remains to show how to efficiently build the hump tree for each
1511: execution of $\chainpair{i,j}$.
1512: 
1513: The very first hump tree can be constructed in time
1514: \begin{equation}
1515:   O(n).
1516:   \label{time2}
1517: \end{equation}
Consider the moment when $\chainpair{i,j}$ has just finished and the
1519: all-pairs algorithm is about to call $\chainpair{i_1, j_1}$.  Since
1520: all humps in $I\cup J$ have been deleted during the execution of
1521: $\chainpair{i,j}$, the current $T$ is the hump tree for the $N$-humps
1522: in $\bigcup_{1\le k\le p; k\ne i,j}\decomp{C_k}$.  In order to obtain
1523: the hump tree for $\chainpair{i_1, j_1}$, we have to add the $N$-humps
1524: in $\decomp{C_i}\cup\decomp{C_j}$, delete the $N$-humps in
1525: $\decomp{C_{j_1}}$ from $T$, and then insert the humps in
1526: \begin{displaymath}
1527: \set{H\in\decomp{C_{i_1}}}{\cost{H}\ge0}\cup\decomp{[-,\prr{\tail{C_{j_1}}}]}
1528: \end{displaymath}
into $T$. The hump decompositions can be computed in time
1530: \begin{equation}
1531:   O(|C_i|+|C_j|+|C_{i_1}|+|C_{j_1}|).
1532:   \label{time3}
1533: \end{equation}
1534: The insertion and deletion of humps can be done in time
1535: \begin{equation}
1536:   O\left(\left(\sqrt{|C_i|}+\sqrt{|C_j|}+\sqrt{|C_{i_1}|}+\sqrt{|C_{j_1}|}\right)\times\log n\right).
1537:   \label{time4}
1538: \end{equation}
1539: By (\ref{time1}),~(\ref{time2}),~(\ref{time3}), and~(\ref{time4}), the
1540: overall time complexity of the all-pairs algorithm is
1541: \begin{displaymath}
1542: O(n)+\sum_{1\le i,j\le p}\left(O(|C_i|+|C_j|) +
1543: O\left(\sqrt{|C_i|}+\sqrt{|C_j|}\right) \times\log n +
1544: O(|C_i|+|C_j|)\times\log n\right), 
1545: \end{displaymath}
1546: which is $O(np\log n)$.  
1547: %Theorem~\ref{allchb} is proved.
1548: \end{proof}
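To spell out the last step of the proof, recall that
$\sum_{k=1}^{p}|C_k|=n$. Summing the dominant terms over all ordered
pairs $(i,j)$ gives
\begin{eqnarray*}
\sum_{1\le i,j\le p}\left(|C_i|+|C_j|\right)
  &=& p\sum_{i=1}^{p}|C_i|+p\sum_{j=1}^{p}|C_j| \;=\; 2np,\\
\sum_{1\le i,j\le p}\left(\sqrt{|C_i|}+\sqrt{|C_j|}\right)
  &\le& \sum_{1\le i,j\le p}\left(|C_i|+|C_j|\right) \;=\; 2np,
\end{eqnarray*}
where the inequality uses $\sqrt{m}\le m$ for every nonnegative
integer $m$. Hence the bound in the proof is
$O(n)+O(np)+O(np\log n)+O(np\log n)=O(np\log n)$.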
1549: 
1550: \section{NP-completeness}
1551: \label{sec:2semaphores}
1552: 
In this section we sketch the proof of the following theorem.
1554: \begin{theorem}
1555: \label{2semaphores}
1556: The race-condition detection problem for a parallel program that uses
1557: more than one semaphore is NP-complete.
1558: \end{theorem}
1559: %the NP-complete proof for determining
1560: %whether $v\chb w$ for chain graphs of operations on more than one
1561: %semaphore.  
1562: The proof is by reduction from the NP-complete
1563: uniform-cost SMMCC problem, where the node costs are restricted to
$\pm1$~\cite{Garey:1979:CIG}. The reduction has three steps. Given an
instance of the SMMCC problem for a uniform-cost graph $G_0$ of $n$ nodes, we construct
1566: $O(\log n)$ chain graphs with $n+2$ semaphores. The first step of the
1567: reduction shows that the SMMCC problem for $G_0$ can be reduced to
1568: determining whether each of those $O(\log n)$ chain graphs has a valid
1569: schedule.  The second step shows that each of those $O(\log n)$ chain
1570: graphs can be {\em simulated} by a chain graph with only two
1571: semaphores. In other words, the simulated chain graph has a valid
1572: schedule if and only if the simulating chain graph has a valid
1573: schedule.  The last step shows that the simulating chain graph has a
valid schedule if and only if $v\chb w$ for a particular pair of nodes
$v$ and $w$ in the same chain graph. The details of the reduction are
given in the Appendix.
1577: 
1578: \section*{Acknowledgments}
1579: We thank the anonymous referees for their helpful remarks that
1580: significantly improved the presentation of the paper.
1581: 
1582: %
1583: %\section*{References}
1584: %\label{section:refs}\frenchspacing\indent
1585: %.[]
1586: \bibliographystyle{abbrv}
1587: \bibliography{race}
1588: 
1589: \appendix
1590: \section{Appendix}
1591: %\subsection{Definition and Notation}
1592: Let $G$ be a chain graph. Each node of $G$ is an operation on a
1593: semaphore. An operation on semaphore $S$ is either $+S$, incrementing
1594: the value of $S$ by one, or $-S$, decrementing the value of $S$ by
1595: one.  A subschedule of $G$ is {\em valid} if the value of each
1596: semaphore is always nonpositive during the execution of the
subschedule. Let $v$ and $w$ be two nodes of $G$. If there exists a
valid subschedule of $G$ in which $v$ precedes $w$, then we say $v\chb w$.
Clearly, determining whether $v\chb w$ is in NP. For chain graphs that
use more than one semaphore, we prove NP-hardness by a
three-step reduction from the uniform-cost SMMCC problem.
1602: 
1603: \subsection{First Step}
1604: Let $G_0$ be an acyclic directed graph of $n$ nodes, $v_1,v_2,\ldots,
1605: v_n$. The cost of each node is either $+1$ or $-1$.  Suppose we would
1606: like to know whether $\height{G_0}\leq\ell$. We construct a chain graph
1607: $G_1$ composed of $2n+2$ chains of operations on $n+2$ semaphores, and
1608: argue that $G_1$ has a valid schedule if and only if
$\height{G_0}\leq\ell$. Note that $0\leq\height{G_0}\leq n$. Therefore,
$\height{G_0}$ can be obtained by binary search over $\ell$, using
$O(\log n)$ queries of whether a chain graph of $n+2$ semaphores has a
valid schedule.
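The $O(\log n)$ queries can be organized as an ordinary binary search
over $\ell$, since the answer to ``is $\height{G_0}\leq\ell$?'' is
monotone in $\ell$. The Python sketch below assumes a hypothetical
oracle \texttt{has\_valid\_schedule(ell)} that builds $G_1$ for the
given $\ell$ and decides whether it has a valid schedule.

\begin{verbatim}
```python
from typing import Callable


def height_by_queries(n: int, has_valid_schedule: Callable[[int], bool]) -> int:
    """Smallest ell in 0..n for which has_valid_schedule(ell) holds.

    has_valid_schedule(ell) is a hypothetical oracle deciding whether
    the chain graph G_1 built for ell has a valid schedule, i.e.
    whether h(G_0) <= ell.  The answer is monotone in ell and
    h(G_0) <= n, so binary search needs only O(log n) oracle calls.
    """
    lo, hi = 0, n                     # invariant: lo <= h(G_0) <= hi
    while lo < hi:
        mid = (lo + hi) // 2
        if has_valid_schedule(mid):
            hi = mid                  # h(G_0) <= mid
        else:
            lo = mid + 1              # h(G_0) > mid
    return lo
```
\end{verbatim}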
1612: 
1613: Let $n^+$ be the number of nodes with positive costs. Let $n^-$ be the
1614: number of nodes with negative costs. Clearly, $n^+-n^-$ is the sum of
1615: node costs of $G_0$. Let $d_i$ be the number of outgoing arcs of $G_0$
1616: from $v_i$.  The $n+2$ semaphores for $G_1$ are
1617: $S_1,S_2,\ldots,S_n,S_\alpha,S_\beta$.
1618: %To distinguish the last two semaphores, we
1619: %also write $S_{\alpha}=S_{n+1}$ and $S_{\beta}=S_{n+2}$.  
1620: Let the $2n+2$ chains of $G_1$ be $C_1,\ldots, C_{n+1}$, and
1621: $C'_1,\ldots,C'_{n+1}$, all initially empty.  We construct $G_1$ from
1622: $G_0$ by the procedure $\construct{}$ in
1623: Figure~\ref{fig:construct}\note{Figure~\ref{fig:construct}}, which
1624: runs in polynomial time. Without loss of generality we can assume that
1625: $\ell-n^++n^-$, the number in the second-to-last statement of the
1626: procedure \fname{Construct}, is nonnegative, since otherwise
1627: $\height{G_0}>\ell$ is immediately concluded.
1628: 
1629: \begin{figure}%[p]
1630:   \begin{center}
1631:     \fbox{
1632:       \begin{minipage}{4in}
1633:         \begin{tabbing}
1634:           \quad\quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\kill
1635:           $\construct{G_0}$\\
1636:           1\>For $i:=1$ to $n$ do\\
1637:           2\>\>For $j:=1$ to $n$ do\\
1638:           3\>\>\>If $v_jv_i$ is an arc of $G_0$ then\\
1639:           4\>\>\>\>Append a $+S_j$ to $C_i$.\\
1640:           5\>\>If the cost of $v_i$ is $+1$ then\\
1641:           6\>\>\>Append a $+S_{\alpha}$ to $C_i$.\\
1642:           7\>\>else (i.e., the cost of $v_i$ is $-1$)\\
1643:           8\>\>\>Append a $-S_{\alpha}$ to $C_i$.\\
          9\>\>\>Append a $+S_{\alpha}$ and then a $-S_{\alpha}$ to $C'_i$.\\
1645:           10\>\>Append $d_i$ copies of $-S_i$ to $C_i$.\\
1646:           11\>\>Append a $-S_{\beta}$ to $C_i$.\\
1647:           12\>Append $n$ copies of $+S_{\beta}$ to $C_{n+1}$.\\
1648:           13\>Append $\ell-n^++n^-$ copies of $+S_{\alpha}$ to $C_{n+1}$.\\
1649:           14\>Append $\ell$ copies of $-S_{\alpha}$ to $C'_{n+1}$.
1650:         \end{tabbing}
1651:       \end{minipage}
1652:       }
1653:   \end{center}
1654:   \caption{The procedure constructs a chain graph $G_1$ such that $G_1$
1655:     has a valid schedule if and only if $\height{G_0}\leq\ell$.}
1656:   \label{fig:construct}
1657: \end{figure}
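The construction is easy to mechanize. The Python sketch below is our
own illustration of \fname{Construct}, not code from the original
work, and its encoding is hypothetical: nodes are $1,\ldots,n$, an arc
$v_jv_i$ is the pair \texttt{(j, i)}, an operation is a pair such as
\texttt{('+', 'S2')}, the names \texttt{Sa} and \texttt{Sb} stand for
$S_{\alpha}$ and $S_{\beta}$, and chain $C_i$ (resp.\ $C'_i$) is
stored under the key \texttt{C}$i$ (resp.\ \texttt{C'}$i$).

\begin{verbatim}
```python
from typing import Dict, List, Tuple

Op = Tuple[str, str]  # (sign, semaphore); 'Sa' is S_alpha, 'Sb' is S_beta


def construct(n: int, arcs: List[Tuple[int, int]], cost: Dict[int, int],
              ell: int) -> Dict[str, List[Op]]:
    """Build the chains C1..C(n+1) and C'1..C'(n+1) of G_1 from G_0."""
    arc_set = set(arcs)                      # (j, i) encodes arc v_j v_i
    n_pos = sum(1 for i in range(1, n + 1) if cost[i] == 1)
    n_neg = n - n_pos
    extra = ell - n_pos + n_neg
    if extra < 0:
        raise ValueError("ell < n+ - n-, so h(G_0) > ell is immediate")
    chains: Dict[str, List[Op]] = {}
    for i in range(1, n + 2):
        chains[f"C{i}"] = []
        chains[f"C'{i}"] = []
    for i in range(1, n + 1):
        c = chains[f"C{i}"]
        for j in range(1, n + 1):            # statements 2-4
            if (j, i) in arc_set:
                c.append(('+', f"S{j}"))
        if cost[i] == 1:                     # statements 5-6
            c.append(('+', 'Sa'))
        else:                                # statements 7-9
            c.append(('-', 'Sa'))
            chains[f"C'{i}"] += [('+', 'Sa'), ('-', 'Sa')]
        d_i = sum(1 for (a, _) in arcs if a == i)   # out-degree of v_i
        c.extend([('-', f"S{i}")] * d_i)     # statement 10
        c.append(('-', 'Sb'))                # statement 11
    chains[f"C{n + 1}"] = [('+', 'Sb')] * n + [('+', 'Sa')] * extra  # 12-13
    chains[f"C'{n + 1}"] = [('-', 'Sa')] * ell                        # 14
    return chains
```
\end{verbatim}

Applied to the graph of Figure~\ref{fig:np-complete1} with $\ell=2$,
the sketch reproduces the table in that figure; for example,
\texttt{C4} is $+S_2,+S_3,+S_5,-S_{\alpha},-S_{\beta}$, and \texttt{C6}
consists of five copies of $+S_{\beta}$ followed by one $+S_{\alpha}$.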
1658: 
1659: \begin{figure*}%[p]
1660:   
1661: %  \begin{center}
1662: %    \leavevmode
1663: %    \begin{tabular}{c}
1664: %      \ovalnode{a}{$v_1:+1$}\\\\
1665: %      \ovalnode{b}{$v_2:+1$}\quad\ovalnode{c}{$v_3:+1$}\\\\
1666: %      \ovalnode{e}{$v_4:-1$}\quad\ovalnode{d}{$v_5:-1$}
1667: %    \end{tabular}
1668: %    \ncline{->}{a}{b}
1669: %    \ncline{->}{a}{c}
1670: %    \ncline{->}{b}{e}
1671: %    \ncline{->}{c}{d}
1672: %    \ncline{->}{c}{e}
1673: %    \ncline{->}{d}{e}
1674: %  \end{center}
1675: %  \vspace{0.5in}
1676:   \begin{center}
1677:     \input{fig11}\qquad
1678:     \begin{tabular}[b]{|r|r|r|r|r||r||r|r||r|}
1679:       $C_1$ &$C_2$ &$C_3$ &$C_4$ &$C_5$ &$C_6$ &$C'_4$&$C'_5$&$C'_6$\\
1680:       \hline                                                        
1681:             &      &      &$+S_2$&      &$+S_{\beta}$&$+S_{\alpha}$&$+S_{\alpha}$&$-S_{\alpha}$\\
1682:             &      &      &$+S_3$&      &$+S_{\beta}$&$-S_{\alpha}$&$-S_{\alpha}$&$-S_{\alpha}$\\
1683:             &$+S_1$&$+S_1$&$+S_5$&$+S_3$&$+S_{\beta}$&      &      &      \\
1684:             &      &      &      &      &$+S_{\beta}$&      &      &      \\
1685:       $+S_{\alpha}$&$+S_{\alpha}$&$+S_{\alpha}$&$-S_{\alpha}$&$-S_{\alpha}$&$+S_{\beta}$&      &      &      \\
1686:             &      &      &      &      &      &      &      &      \\
1687:       $-S_1$&$-S_2$&$-S_3$&      &$-S_5$&      &      &      &      \\
1688:       $-S_1$&      &$-S_3$&      &      &      &      &      &      \\
1689:             &      &      &      &      &      &      &      &      \\
1690:       $-S_{\beta}$&$-S_{\beta}$&$-S_{\beta}$&$-S_{\beta}$&$-S_{\beta}$&$+S_{\alpha}$&      &      &
1691:     \end{tabular}
1692:     \caption[]{An example for the first step of the reduction. Suppose
1693:       we would like to determine whether $\height{G_0}\leq2$, where
1694:       $G_0$ is the graph on top. We then construct, by
      \fname{Construct}, the chain graph $G_1$ at bottom. Note that there
      is one $+S_{\alpha}$ at the end of $C_6$ and there are two $-S_{\alpha}$'s in
      $C'_6$, according to the last two statements of \fname{Construct}. It
      follows from Lemmas~\ref{lemma:np-1}(1) and~\ref{lemma:np-2} that
      there exists a valid schedule of the chains at bottom if and only if
1700:       the height of the graph on top is at most two.}
1701:     \label{fig:np-complete1}
1702:   \end{center}
1703: \end{figure*}
1704: 
1705: 
1706: An example is shown in
1707: Figure~\ref{fig:np-complete1}\note{Figure~\ref{fig:np-complete1}}.
1708: The intuition is as follows. The (only) operation for $S_{\alpha}$ in
1709: $C_i$ corresponds to $v_i$, where the ``sign'' of $S_{\alpha}$
1710: reflects the cost of $v_i$.  We use the first $n$ semaphores,
1711: $S_1,\ldots,S_n$, to enforce the execution of these $n$ operations for
1712: $S_{\alpha}$ to obey the precedence constraints imposed by $G_0$. In
1713: Figure~\ref{fig:np-complete1}, for instance, in order to reach the
$-S_{\alpha}$ in $C_4$, we first have to execute the $+S_2$ (as well as
$+S_3$ and $+S_5$) in the same chain. A $+S_2$ keeps $S_2$ nonpositive
only if a $-S_2$ has been executed before it, and the only $-S_2$ is
after the $+S_{\alpha}$ in $C_2$. Hence the $+S_{\alpha}$ in $C_2$ must
be executed before the $-S_{\alpha}$ in $C_4$.
1718: 
1719: The $-S_{\beta}$'s at the end of $C_1,\ldots,C_n$ are to ensure that as
1720: long as the last $+S_{\beta}$ in $C_{n+1}$ is executed, all operations
in $C_1,\ldots, C_n$ have already been executed. The role of the $\ell$
copies of $-S_{\alpha}$ in $C'_{n+1}$ is clear: the larger $\ell$ is,
the easier it is for $G_1$ to have a valid schedule. The purpose of the
$+S_{\alpha}, -S_{\alpha}$ pairs in $C'_1,\ldots, C'_n$ and of the
$\ell-n^++n^-$ copies of $+S_{\alpha}$ at the end of $C_{n+1}$ will
become clear as we proceed. Basically, they ensure that $G_1$ has some
kind of ``pairwise'' schedule whenever $G_1$ has a valid schedule. One
can verify that, for each of the $n+2$ semaphores
$S_1,\ldots,S_n,S_{\alpha},S_{\beta}$, $G_1$ contains as many
increments as decrements of that semaphore.
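For $S_{\alpha}$, whose operations are spread over three groups of
chains, the count can be checked directly against \fname{Construct}:
\begin{eqnarray*}
\#(+S_{\alpha}) &=& n^+ + n^- + (\ell-n^++n^-) \;=\; \ell+2n^-,\\
\#(-S_{\alpha}) &=& n^- + n^- + \ell \;=\; \ell+2n^-,
\end{eqnarray*}
where in each line the three terms come from $C_1,\ldots,C_n$, from
$C'_1,\ldots,C'_n$, and from $C_{n+1}$ or $C'_{n+1}$, respectively.
For $1\le i\le n$, both counts for $S_i$ equal $d_i$ (one $+S_i$ for
each arc leaving $v_i$, and $d_i$ copies of $-S_i$ in $C_i$), and both
counts for $S_{\beta}$ equal $n$.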
1730: 
1731: For the rest of the subsection, we prove that $\height{G_0}\leq\ell$
1732: if and only if $G_1$ has a valid schedule. An implication of the
1733: following proofs is that $G_1$ has a valid schedule if and only if it
has a valid schedule that can be executed by the procedure
\fname{Pairwise} given in the proofs.
1736: \begin{lemma}
1737:   \label{lemma:np-1}
1738: \begin{enumerate} 
1739: \item If $G_1$ has a valid subschedule containing the last $+S_{\alpha}$
1740:   of $C_{n+1}$, then $\height{G_0}\leq\ell$.
1741: \item
1742:   If $G_1$ has a valid schedule, then $\height{G_0}\leq\ell$.
1743: \end{enumerate}
1744: 
1745: \end{lemma}
1746: \begin{proof}
  Clearly, it suffices to prove the first statement: a valid schedule
  of $G_1$ contains every operation of $G_1$, including the last
  $+S_{\alpha}$ of $C_{n+1}$, so the second statement follows
  immediately from the first.
1749:   
1750:   Let $X$ be a valid subschedule of $G_1$ as described in the lemma. We
1751:   show $\height{G_0}\leq\ell$.  Let $O_i$ be the operation of
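  $S_{\alpha}$ in $C_i$. Since $X$ contains the last $+S_{\alpha}$ of
  $C_{n+1}$, it also contains all $n$ copies of $+S_{\beta}$ that
  precede it in $C_{n+1}$. Since $X$ is valid, these must be preceded
  by all $n$ copies of $-S_{\beta}$, which are the last operations of
  $C_1,\ldots,C_n$. Hence $X$ contains all the operations in
  $C_1,\ldots,C_n$; in particular, every $O_i$, $1\leq i\leq n$, is in $X$.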
1755: 
1756:   Suppose the order of those $O_i$'s in $X$ is $O_{k_1},O_{k_2},\ldots,
1757:   O_{k_n}$. By the definition of \fname{Construct}, if $v_j$ is
1758:   reachable from $v_i$ in $G_0$, then $O_j$ does not precede $O_i$ in
1759:   $X$. It follows that the sequence $Y=v_{k_1}v_{k_2}\cdots v_{k_n}$ is
1760:   a schedule of $G_0$.  Therefore, it suffices to show 
1761:   $\height{Y}\leq\ell$.
1762: 
1763:   Assume $\height{Y}>\ell$ for a contradiction. If
1764:   we count only those $O_i$'s as the operations for $S_{\alpha}$ in $X$, then
1765:   the maximum value of $S_{\alpha}$ would be greater than $\ell$ during
1766:   the execution of $X$. Note that there are $\ell+n^-$ other
1767:   $-S_{\alpha}$'s in $C'_1,\ldots,C'_{n+1}$, which are the only hope for
1768:   bringing the maximum value of $S_{\alpha}$ down to zero.  By the
1769:   construction of $C'_1,\ldots, C'_n$, however, we know $n^-$ of those
  construction of $C'_1,\ldots, C'_n$, however, $n^-$ of those
  $-S_{\alpha}$'s have to be preceded in $X$ by $n^-$ other
  $+S_{\alpha}$'s, one in each nonempty chain among $C'_1,\ldots,C'_n$.
  It follows that even if we count all operations for
  $S_{\alpha}$ together, the maximum value of $S_{\alpha}$ would be
  greater than zero during the execution of $X$. This contradicts the
  fact that $X$ is a valid subschedule of $G_1$.
1776: 
1777: 
1778: \begin{lemma}
1779:   \label{lemma:np-2}
1780:   If $\height{G_0}\leq\ell$, then $G_1$ has a valid schedule.
1781: \end{lemma}
1782: \begin{proof}
1783:   Let $Y=v_{k_1}v_{k_2}\cdots v_{k_n}$ be a schedule of $G_0$ with
1784:   $\height{Y}\leq\ell$.  Let $m_i$ be the sum of costs of
1785:   $v_{k_1},\ldots,v_{k_i}$. Clearly, $m_n=n^+-n^-$, which is the sum of
1786:   node costs of $G_0$.  Since $\height{Y}\leq\ell$, we know that
1787:   $m_i\leq\ell$ for every $1\leq i\leq n$. We claim that $G_1$ can be
1788:   executed by the procedure \fname{Pairwise} in
1789:   Figure~\ref{fig:pairwise}\note{Figure~\ref{fig:pairwise}}.
1790: \begin{figure}%[p]
1791:   \begin{center}
1792:     \fbox{
1793:     \begin{minipage}{5in}
1794:       \begin{tabbing}
1795:         \qquad\=\quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\kill
1796:         Procedure \fname{Pairwise}\\
1797:         1\>For $k:=k_1,k_2,\ldots,k_n$ do\\
1798:         2\>\>For $j:=1$ to $n$ do\\
1799:         3\>\>\>If $v_j v_k$ is an arc of $G_0$ then\\
        4\>\>\>\>Execute a $-S_j$ in $C_j$.\\
        5\>\>\>\>Execute the $+S_j$ in $C_k$.\\
1802:         6\>\>If $O_k = +S_{\alpha}$ then\\
1803:         7\>\>\>Execute one of the $-S_{\alpha}$'s\\
1804:         8\>\>\>\>in $C'_1,C'_2,\ldots,C'_{n+1}$.\\
1805:         9\>\>\>Execute the $+S_{\alpha}$ in $C_k$.\\
1806:         10\>\>else (i.e., $O_k = -S_{\alpha}$)\\
1807:         11\>\>\>Execute the $-S_{\alpha}$ in $C_k$.\\
1808:         12\>\>\>Execute the $+S_{\alpha}$ in $C'_k$.\\
1809:         13\>For $i:=1$ to $n$ do\\
1810:         14\>\>Execute the $-S_{\beta}$ in $C_i$.\\
1811:         15\>\>Execute a $+S_{\beta}$ in $C_{n+1}$.\\
1812:         16\>For $i:=1$ to $\ell-m_n$ do\\
1813:         17\>\>Execute a $-S_{\alpha}$ in $C'_1,\ldots,C'_{n+1}$.\\
1814:         18\>\>Execute a $+S_{\alpha}$ in $C_{n+1}$.
1815:       \end{tabbing}
1816:     \end{minipage}
1817:     }
1818:   \end{center}
1819:   \caption{Procedure \fname{Pairwise}.}
1820:   \label{fig:pairwise}
1821: \end{figure}
1822: 
1823: Note that in the schedule of $G_1$ executed by \fname{Pairwise}, each
1824: operation $-S_i$ is immediately followed by an operation $+S_i$.  Not
every chain graph has such a ``pairwise'' schedule; however, we show
that $G_1$ does. We first show that the first for-loop of
\fname{Pairwise} can be finished for $G_1$. Specifically, suppose the
following claim holds:
1829: \begin{quote}
1830: \label{lemma:pairwise-1} {\bf Claim}
1831:   For each $1\leq i\leq n$, the $i$-th iteration of the first for-loop
1832:   of \fname{Pairwise} can be executed for $G_1$. Furthermore, after
1833:   executing the $i$-th iteration,
1834:   \begin{itemize}
1835:   \item the remaining operations in $C_{k_i}$ are $d_{k_i}$ copies of
    $-S_{k_i}$'s followed by a $-S_{\beta}$; and
1837:   \item there are $\ell-m_i$ copies of $-S_{\alpha}$'s available in
1838:     $C'_1,\ldots,C'_{n+1}$.
1839:   \end{itemize}
1840: \end{quote}
1841: It is then not hard to see that after the execution of the first
1842: for-loop of \fname{Pairwise}, the remaining operation in each $C_i$ is a
1843: $-S_{\beta}$. Therefore, the second for-loop of \fname{Pairwise} can be
1844: finished, since there are $n$ copies of $+S_{\beta}$'s available in
1845: $C_{n+1}$.  
1846: 
By the claim, we know that after executing the first
for-loop, the number of $-S_{\alpha}$'s in $C'_1,\ldots,C'_{n+1}$ is
1849: $\ell-m_n$, which is equal to the number of $+S_{\alpha}$'s at the end
1850: of $C_{n+1}$. Therefore, the last for-loop of \fname{Pairwise} can be
1851: finished. The lemma is proved.
1852: 
1853: It remains to prove the above claim by induction on $i$. For
1854: convenience we abbreviate $k_i$ to $k$ for the rest of the proof.
1855: When $i=1$, we know $v_k$ does not have any incoming arcs from other
1856: nodes. Therefore, the for-loop with index $j$ in the first iteration does
1857: not execute any operation. We then consider the if-statement.
1858: \begin{itemize}
1859: \item If $O_k=-S_{\alpha}$, then $\cost{v_k}=-1$, and thus $m_1=-1$.
1860:   There is a $+S_{\alpha}$ in $C'_k$ by the definition of
1861:   \fname{Construct}. We can execute the else-part of the if-statement
1862:   without problem. Since the second operation in $C'_k$ is a
1863:   $-S_{\alpha}$, these two steps increase the number of $-S_{\alpha}$'s
1864:   available in $C'_1,\ldots,C'_{n+1}$ by one.
1865: \item If $O_k=+S_{\alpha}$, then $\cost{v_k}=1$, and thus $m_1=1$. Since
1866:   $v_k$ is the first node in $Y$, $\height{Y}$ is at least one, and thus
1867:   $\ell\geq 1$. We can therefore execute the then-part of the
1868:   if-statement without problem.  The number of $-S_{\alpha}$'s available
1869:   in $C'_1,\ldots,C'_{n+1}$ is decreased by one.
1870: \end{itemize}
1871: Clearly, after executing the first iteration, in which the only executed
1872: operation in $C_k$ is $O_k$, the remaining operations in $C_k$ are
exactly as described in the claim. Note that before executing the
1874: first iteration, the number of available $-S_{\alpha}$'s is $\ell$ by
1875: the definition of \fname{Construct}.  Therefore, after executing the
1876: first iteration, the number of available $-S_{\alpha}$'s is exactly
1877: $\ell-m_1$. This confirms the inductive basis.
1878: 
1879: Let $i'$ be an integer with $1<i'\leq n$. Assume that the claim holds
1880: for every $1\leq i<i'$. We show it holds for $i=i'$.  Consider the
1881: $i$-th iteration. Note that for every $j$ such that $v_jv_k$ is an arc
1882: of $G_0$, $O_j$ must have been executed.  By the inductive hypothesis
1883: we know those $d_j$ copies of $-S_j$'s are already available before
1884: executing the $i$-th iteration. Therefore, the for-loop with index $j$
1885: will proceed without problem, since there are exactly $d_j$ copies of
1886: $+S_j$'s in $G_1$ by the definition of \fname{Construct}.  We then
1887: consider the if-statement.
1888: \begin{itemize}
1889: \item If $O_k=-S_{\alpha}$, then $m_i=m_{i-1}-1$. We know there is a
1890: $+S_{\alpha}$ in $C'_k$.  Thus, the else-part can proceed without
1891: problem. Since the second operation in $C'_k$ is a $-S_{\alpha}$,
1892: these two steps increase the number of available $-S_{\alpha}$'s in
1893: $C'_1,\ldots,C'_{n+1}$ by one.
1894: \item If $O_k=+S_{\alpha}$, then $m_i=m_{i-1}+1$. The inductive hypothesis
1895: says that the number of $-S_{\alpha}$'s available in
1896: $C'_1,\ldots,C'_{n+1}$ is $\ell-m_{i-1}$ before executing the $i$-th
1897: iteration.  That number is at least one since
1898: $\ell-m_{i-1}-1=\ell-m_i\geq 0$. Therefore, the then-part of the
1899: if-statement can be executed without problem. The number of available
1900: $-S_{\alpha}$'s in $C'_1,\ldots,C'_{n+1}$ is decreased by one.
1901: \end{itemize}
1902: Therefore, the $i$-th iteration can be executed, and thus the remaining
1903: operations in $C_k$ are as required.
1904: 
1905: It follows from the inductive hypothesis that the number of available
1906: $-S_{\alpha}$ in $C'_1,\ldots,C'_{n+1}$ is $\ell-m_{i-1}$.  By the
1907: above case analysis we see that the number is exactly $\ell-m_i$ after
1908: executing the $i$-th iteration. The claim is proved.
1909: \end{proof}
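The bookkeeping in the claim above can be illustrated by a small Python
sketch. It abstracts away the inner for-loop over the predecessors $v_j$
and tracks only the pool of available $-S_{\alpha}$'s. The hypothetical
function \verb|available_counts| receives the costs
$\cost{v_{k_1}},\cost{v_{k_2}},\ldots$ (that is, $+1$ for a $+S_{\alpha}$
and $-1$ for a $-S_{\alpha}$) together with $\ell$, and asserts after
every iteration that the pool size equals $\ell-m_i$, where $m_i$ is the
running sum of these costs.
\begin{verbatim}
```python
def available_counts(costs, ell):
    """Hypothetical sketch of the pool of available -S_alpha's.

    costs: the costs c(v_k) of the executed operations on S_alpha, in
           order (+1 for a +S_alpha, -1 for a -S_alpha).
    ell:   number of -S_alpha's available before the first iteration.
    Returns the pool size after each iteration, or None as soon as a
    +S_alpha finds no available -S_alpha.
    """
    pool = ell
    m = 0  # running sum m_i of the costs
    history = []
    for c in costs:
        if c == +1:
            # then-part: this +S_alpha consumes one available -S_alpha
            if pool == 0:
                return None
            pool -= 1
        else:
            # else-part: the +S_alpha/-S_alpha pair in C'_k frees one more
            pool += 1
        m += c
        assert pool == ell - m  # the invariant stated in the claim
        history.append(pool)
    return history


print(available_counts([1, 1, -1, 1], 2))  # [1, 0, 1, 0]
print(available_counts([1, 1, 1], 2))      # None
```
\end{verbatim}
With $\ell=2$, the costs $1,1,-1,1$ never exhaust the pool, whereas the
costs $1,1,1$ fail at the third step because $m_3=3>\ell$.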
1910: 
1911: If $G_1$ has a valid schedule, then by Lemma~\ref{lemma:np-1}(2) we know
1912: $\height{G_0}\leq\ell$. It then follows from the proof of
1913: Lemma~\ref{lemma:np-2} that $G_1$ has a valid schedule executable by
1914: \fname{Pairwise}.  Therefore, we have the following lemma.
1915: 
1916: \begin{lemma}
1917: \label{lemma:pairwise-2}
1918: $G_1$ has a valid schedule if and only if $G_1$ has a valid schedule
1919: executable by \fname{Pairwise}.
1920: \end{lemma}
1921: 
1922: \subsection{Second Step}
1923: \label{subsec:second-step}
1924: In this subsection we show that the $G_1$ constructed in the first step
1925: can be simulated by another chain graph $G_2$, which uses only two
1926: semaphores, $T_1$ and $T_2$. $G_2$ has $2n+3$ chains. The first chain,
1927: denoted $C_0$, is composed of two $-T_1$'s and two $-T_2$'s.  The
1928: remaining $2n+2$ chains are obtained from those of $G_1$ as follows.
1929: We replace every operation $-S_i$ (and $+S_i$) by a {\em unit} $-U_i$
1930: (and $+U_i$) for each $1\leq i\leq n+2$.  Each unit, $-U_i$ or $+U_i$,
1931: is a sequence of operations on $T_1$ and $T_2$, as shown in
1932: Figure~\ref{fig:two-semaphores}\note{Figure~\ref{fig:two-semaphores}}.  
1933: We also denote those $2n+2$ chains of
1934: $G_2$ by $C_1,\ldots,C_{n+1}$ and $C'_1,\ldots,C'_{n+1}$. Clearly, $G_2$
1935: can be constructed in polynomial time. 
1936: \begin{figure}%[p]
1937:   \begin{center}
1938:     \leavevmode
1939:     \begin{displaymath}
1940:       \begin{array}[t]{l}
1941:         +T_1\\+T_2\\
1942:         \left.\!\!\!
1943:         \begin{array}[r]{c}
1944:           -T_1\\+T_2\\
1945:           \vdots\\
1946:           -T_1\\+T_2
1947:         \end{array}
1948:         \right\}\mbox{$i+1$ pairs}\\
1949:         -T_2\\-T_2\\-T_2\\-T_2
1950:       \end{array}\qquad
1951:       \begin{array}[t]{l}
1952:         +T_1\\+T_2\\
1953:         \left.\!\!\!
1954:         \begin{array}[r]{c}
1955:           +T_1\\-T_2\\
1956:           \vdots\\
1957:           +T_1\\-T_2
1958:         \end{array}
1959:         \right\}\mbox{$i+1$ pairs}\\
1960:         +T_2\\+T_2\\-T_1\\-T_1
1961:       \end{array}
1962:     \end{displaymath}
1963:     \caption{The sequence of operations for a $-U_i$ is at left and that
1964:       for a $+U_i$ is at right, for any $1\leq i\leq n+2$.}
1965:     \label{fig:two-semaphores}
1966:   \end{center}
1967: \end{figure}
1968: 
1969: Note that the sequence of operations in each unit is arranged such that
1970: only a $-U_i$ and a $+U_i$ can ``unlock'' each other. To be more
1971: specific, suppose each of $T_1$ and $T_2$ has initial value -2, which
1972: will be the case if the four operations in $C_0$ are executed.  Consider
1973: a graph $U_{ij}$ for some $1\leq i,j\leq n+2$ composed of two units,
1974: $-U_i$ and $+U_j$, each forms a single chain. One can easily verify that
1975: $U_{ij}$ has a valid schedule if $i=j$. 
1976: %(In fact it also holds for the
1977: %other direction. We do not emphasize it, however, because it is not that
1978: %relevant to our proof.) 
1979: Moreover, after executing all the operations of $U_{ii}$, the values of
1980: $T_1$ and $T_2$ go back to $-2$.
1981: 
1982: We claim that $G_1$ has a valid schedule if and only if $G_2$ has a
1983: valid schedule. The only-if part is straightforward.  Suppose $G_1$
1984: has a valid schedule.  By Lemma~\ref{lemma:pairwise-2}, $G_1$
1985: has a valid schedule executable by \fname{Pairwise}.  Note that we can
1986: execute the four operations of $C_0$ first, which decrease the value
1987: of both semaphores down to -2. Clearly, the remaining $2n+2$ chains of
1988: units can be completely pairwisely executed by following the sequence
1989: of corresponding operations in $G_1$ executed by \fname{Pairwise}.
1990: Therefore, $G_2$ has a valid schedule.
1991: 
1992: It takes some added work to prove the other direction of the above
1993: claim.  A unit is {\em active} if its third operation is executed.  A
1994: unit is {\em finished} (and thus inactive) if its fifth-to-last
1995: operation is executed.  Suppose $G_2$ has a valid schedule. Consider
1996: the sequence of the units of $G_2$ that become active in the valid
1997: schedule.  It follows from the following lemma that the corresponding
1998: sequence of operations of $G_1$ is a valid schedule of $G_1$. In fact
1999: it is ``pairwise'', since in the schedule each $-S_i$ is immediately
2000: followed by a $+S_i$.
2001: 
2002: \begin{lemma}
2003: \label{lemma:two-semaphores}
2004: Consider the execution of a valid subschedule.  
2005: \begin{enumerate}
2006: \item When there is no
2007: active unit, the next unit that becomes active must be a $-U_i$ for
2008: some $1\leq i\leq n+2$.
2009: \item Before that active $-U_i$ is finished, a
2010: $+U_i$ must become active.
2011: \item No unit will become active unless these
2012: two active units are finished.
2013: \end{enumerate}
2014: \end{lemma}
2015: 
2016: \begin{proof}
2017: At the beginning of the valid schedule, no unit is active. We show the
2018: first statement of the lemma holds.  At this moment there are two
2019: $-T_1$'s and two $-T_2$'s available (in $C_0$). They are our only hope
2020: for activating any unit, since each unit is guarded by two $+T_1$'s
2021: and two $+T_2$'s.  Assume for a contradiction that the first unit
2022: becoming active is a $+U_i$ for some $1\leq i\leq n+2$.  Note that as
2023: soon as the first $+U_i$ becomes active, at least two $+T_1$'s are
2024: already executed. Since at most two $-T_1$'s are executed so far,
2025: there is no way to activate any other unit. The execution thus cannot
2026: proceed.
2027: 
2028: When the first unit $-U_i$ becomes active, one can see that the second
2029: statement of the lemma holds by verifying the following.
2030: \begin{itemize}
2031: \item The active $-U_i$ will not be finished unless another unit
2032: becomes active, since otherwise the execution will be blocked by
2033: some $+T_2$'s.
2034: \item The next active unit must be a $+U_j$ for some $1\leq j\leq
2035: n+2$, since otherwise the execution will be blocked by some
2036: $+T_2$'s.
2037: \item if $i<j$, the execution will be blocked by some $+T_1$'s. If
2038: $i>j$, then the execution will be blocked by some $+T_2$'s. Therefore, the
2039: next active unit must be a $+U_i$.
2040: \end{itemize}
2041: 
2042: When those two units are active, in order to activate other units, we
2043: can only hope for the $-T_1$'s at the end of the active $+U_i$. In
2044: order to reach those $-T_1$'s, the preceding consecutive $+T_2$'s must
2045: be penetrated.  Hence, at least two $-T_2$'s at the end of the active
2046: $-U_i$ must be executed first. Therefore, those two active units $-U_i$
2047: and $+U_i$ must be finished before any other unit becomes active.
2048: This confirms the third statement of the lemma.
2049: 
2050: Note that as soon as the active $+U_i$ is finished (and so must be the
2051: active $-U_i$), the situation is exactly the same as the situation at
2052: the very beginning of the execution. Namely we have two $-T_1$'s and
2053: two $-T_2$'s available, which are again our only hope for activating
2054: any other units.  Therefore, all the above argument follows
2055: inductively.  The lemma is proved.
2056: \end{proof}
2057: 
2058: \subsection{Third Step}
2059: Let $v$ be the first operation of the $C_0$ in $G_2$. Let $w$ be the
2060: last operation of the $C_{n+1}$ in $G_2$.  We claim that $v\chb w$ if
2061: and only if $G_2$ has a valid schedule.  note that $v$ is always the
2062: first node in any valid subschedule of $G_2$. The if-part of the claim
2063: holds trivially. It remains to prove the only-if-part of the claim.
2064: 
2065: Let $X$ be a valid subschedule of $G_2$ in which $v$ precedes $w$. 
2066: Consider the sequence of the units of $G_2$ that become active while
2067: executing $X$.  It follows from Lemma~\ref{lemma:two-semaphores} that
2068: the corresponding sequence of operations of $G_1$ is a valid subschedule
2069: of $G_1$, which definitely contains the last $+S_{\alpha}$ of the
2070: $C_{n+1}$ in $G_1$. Therefore, $G_1$ has a valid schedule by
2071: Lemmas~\ref{lemma:np-1}(2) and~\ref{lemma:np-2}.  Finally it follows
2072: from the claim in \S\ref{subsec:second-step} that $G_2$ has a
2073: valid schedule.
2074: 
2075: \end{document}
2076: