0208:cs0208004/race.tex

1: \documentclass[11pt]{article}

2: \usepackage{epsfig,color,fullpage,citesort}

3: %\usepackage{pst-node}

4: \def\myendproof{{\ \vbox{\hrule\hbox{%

5:    \vrule height1.3ex\hskip0.8ex\vrule}\hrule }}\par}

6: \newtheorem{theorem}{Theorem}[section]

7: \newtheorem{lemma}[theorem]{Lemma}

8: \newtheorem{corollary}[theorem]{Corollary}

9: \newenvironment{proof}{{\it Proof. }}{\myendproof}

10: %\documentstyle[psfig,times,doublespace]{article}

11: %\setcounter{topnumber}{1}

12: %\setcounter{bottomnumber}{1}

13: %\setcounter{totalnumber}{1}

14: %\newtheorem{mytheorem}{Theorem}

15: %\newtheorem{mylemma}{Lemma}

16: \newcommand{\qed}{\myendproof}

17: \newcommand{\setof}[1]{\{{#1}\}}

18: \newcommand{\set}[2]{\{{{#1}:{#2}}\}}

19: \newcommand{\fname}[1]{{\sc #1}}

20: %\setlength{\marginparwidth}{.65in}

21: %\newcommand{\note}[1]{\marginpar{\renewcommand{\baselinestretch}{0.8}

22: %                          \footnotesize\sf #1}}

23: %\reversemarginpar

24: %\newcommand{\note}[1]{\marginpar{#1}}

25: \newcommand{\note}[1]{}

26: \newcommand{\maxof}[1]{\max\setof{#1}}

27: \newcommand{\notchb}{\not\chb}

28: \newcommand{\chb}{\rightarrow}

29: \newcommand{\opt}[1]{h({#1})}

30: \newcommand{\mmcc}{{minimal maximum cumulative cost}}

31: \newcommand{\cost}[1]{{c({#1})}}

32: \newcommand{\rank}[1]{{r({#1})}}

33: \newcommand{\height}[1]{{h({#1})}}

34: \newcommand{\hb}{\mbox{\rm hb}}

35: \newcommand{\fall}[1]{{\tilde{h}(#1)}}

36: \newcommand{\length}[1]{|{#1}|}

37: \newcommand{\myheading}[1]{\noindent {\bf #1}}

38: \newcommand{\almostone}{1+\epsilon}

39: \newcommand{\comment}[1]{}

40: \newcommand{\Xomit}[1]{}

41: \newcommand{\first}[2]{\mbox{\it first}_{#1}(#2)}

42: \newcommand{\init}{\bot}

43: \newcommand{\fini}{\top}

44: \newcommand{\pr}[2]{\mbox{\it pred}_{#1}(#2)}

45: \newcommand{\su}[2]{\mbox{\it succ}_{#1}(#2)}

46: \newcommand{\prr}[1]{\mbox{\it pred}(#1)}

47: \newcommand{\suu}[1]{\mbox{\it succ}(#1)}

48: \newcommand{\prfx}[2]{[-,{#1}]_{#2}}

49: \newcommand{\sufx}[2]{[{#1},-]_{#2}}

50: \newcommand{\prfxx}[1]{[-,{#1}]}

51: \newcommand{\sufxx}[1]{[{#1},-]}

52: \newcommand{\head}[1]{\mbox{\it start}(#1)}

53: \newcommand{\tail}[1]{\mbox{\it end}(#1)}

54: \newcommand{\algo}[1]{{\sc{#1}}}

55: \newcommand{\cut}{\Gamma}

56: \newcommand{\jopt}[1]{h_j(G[{#1}])}

57: \newcommand{\decomp}[1]{\mbox{\algo{Decomp}}(#1)}

58: \newcommand{\merge}[1]{\mbox{\algo{Merge}}(#1)}

59: \newcommand{\minheight}[1]{\mbox{\algo{MinHeight}}(#1)}

60: \newcommand{\best}[1]{\mbox{\algo{Best}}(#1)}

61: \newcommand{\cuttrans}[1]{\mbox{\algo{CutTrans}}(#1)}

62: \newcommand{\chainpair}[1]{\mbox{\algo{ChainPair}}(#1)}

63: \newcommand{\construct}[1]{\mbox{\algo{Construct}}(#1)}

64: \newcommand{\removerange}[1]{\mbox{\algo{RemoveRange}}(#1)}

65: \newcommand{\newcut}[1]{\mbox{\algo{Newcut}}(#1)}

66: \newtheorem{claim}{Claim}

67:

68: %\title{Detecting Race Conditions in Parallel Programs that Use

69: %Semaphores}

70: %%

71: %\author{Philip N. Klein \and

72: %  Hsueh-I Lu \and Robert H.B. Netzer

73: %     \thanks{Department of Computer Science, Brown University, Providence,

74: %       RI 02912-1910, USA. Email: {\tt\{klein,hil,rn\}@cs.brown.edu}. Fax:

75: %       (401)863-7657.

76: %     }

77: %}

78:

79: \title{Detecting Race Conditions in Parallel Programs that Use

80: Semaphores\thanks{Preliminary versions of this paper appeared

81: in~\cite{Lu:19xx:DRC,Klein:19xx:RCD}.%

82: %Most of the research was performed while the second author

83: %was with Department of Computer Science, Brown

84: %University.

85: }}

86: \author{Philip N. Klein\thanks{Department of Computer Science, Brown

87:      University,

88:      Providence, RI 02912, USA. Email:

89:      klein@cs.brown.edu.}

90: \and

91:   Hsueh-I Lu\thanks{Corresponding author. Institute of Information Science, Academia

92:   Sinica, Taipei 115, Taiwan. Email: hil@iis.sinica.edu.tw. URL: www.iis.sinica.edu.tw/\~{ }hil/ }

93: \and

94:   Robert H.B. Netzer\thanks{Department of Computer Science, Brown University,

95:      Providence, RI 02912, USA. Email:

96:      rn@cs.brown.edu.}

97: }

98:

99: \begin{document}

100: \maketitle

101: \begin{abstract}

102: We address the problem of detecting race conditions in programs that

103: use semaphores for synchronization. Netzer and Miller showed that it

104: is NP-complete to detect race conditions in programs that use many

105: semaphores. We show in this paper that it remains NP-complete even if

106: only two semaphores are used in the parallel programs.

107:

108: For the tractable case, i.e., using only one semaphore, we give two

109: algorithms for detecting race conditions from the trace of executing a

110: parallel program on $p$ processors, where $n$ semaphore operations are

111: executed.  The first algorithm determines in $O(n)$ time whether a

112: race condition exists between any two given operations. The second

113: algorithm runs in $O(np\log n)$ time and outputs a compact

114: representation from which one can determine in $O(1)$ time whether a

115: race condition exists between any two given operations. The second

116: algorithm is near-optimal in that the running time is only $O(\log n)$

117: times the time required simply to write down the output.

118: %

119: %This paper combines the results in the two preliminary

120: %versions~\cite{Lu:19xx:DRC,Klein:19xx:RCD}.

121: %

122: %{\em Keywords:} Parallel debugging, Synchronization, Race conditions,

123: %Semaphores, NP-completeness, Polynomial-time algorithms, Scheduling

124: \end{abstract}

125:

126: \section{Introduction}

127: Race detection is crucial in developing and debugging shared-memory

128: parallel

129: programs~\cite{Simmons:1996:DPT,Savage:1997:EDD,Emrath:1992:DNP,Netzer:1992:WRC,Itzkovitz:1999:TID,Ha:2002:SEF}. Explicit

130: synchronization is usually added to such programs to coordinate access

131: to shared data.  For example, when using a semaphore, a $V$-operation

132: increments the semaphore, and a $P$-operation waits until the

133: semaphore is greater than zero and then decrements the

134: semaphore. $P$-operations are typically used to wait (synchronize)

135: until some condition is true (such as a shared buffer becoming

136: non-empty), and $V$-operations typically signal that some condition is

137: now true.  Race conditions result when this synchronization does not

138: force concurrent processes to access data in the expected order.  One

139: way to dynamically detect races in a program is to trace its execution

140: and analyze the traces afterward.  A central part of dynamic race

141: detection is to compute from the trace the order in which

142: shared-memory accesses were guaranteed by the execution's

143: synchronization to have executed. Accesses to the same location not

144: guaranteed to execute in some particular order are considered a race.

145: When programs use semaphore operations for synchronization, some

146: operations (belonging to different processes) could have potentially

147: executed in an order different from the one that was traced.

148:

149:

150: In this paper, we address the tractability of detecting race

151: conditions from the traces of parallel programs that use semaphores.

152: Let $p$ be the number of processors used to execute the parallel

153: program, and let $n$ be the total number of semaphore operations

154: performed in the execution. The trace can then be represented by a

155: directed $n$-node graph $G$ consisting of $p$ disjoint chains, each

156: representing the sequence of semaphore operations executed by a

157: processor.  A {\em schedule} of $G$ is a linear ordering of all nodes

158: in $G$ consistent with the precedence constraints imposed by the arcs

159: of $G$.  A prefix of a schedule of $G$ is a {\em subschedule} of $G$.

160: A subschedule of $G$ is {\em valid} if at each point in the

161: subschedule, the number of $V$ operations is never exceeded by the

162: number of $P$ operations for each semaphore (i.e., all semaphores are

163: always nonnegative).  Then, if the trace indicates that $v$ preceded

164: $w$ in the actual execution, but a valid subschedule\footnote{We

165: consider subschedules rather than schedules because deadlocks might

166: happen during the execution of parallel programs.}  exists in which

167: $w$ precedes $v$, then $v$ and $w$ could have executed in either

168: order, i.e., there is a {\em race condition} between $v$ and $w$.

169: Netzer and Miller showed that detecting race conditions in parallel

170: programs that use multiple semaphores is

171: NP-complete~\cite{Netzer:1990:CEO}.  Researchers have developed exact

172: algorithms for cases where the problem is efficiently solvable

173: (programs that use types of synchronization weaker than semaphores

174: such as

175: post/wait/clear)~\cite{Netzer:1992:ERC,Helmbold:1993:CSO,Helmbold:1996:TRC},

176: and heuristics for the multiple semaphore

177: case~\cite{Emrath:19xx:ESA,Helmbold:19xx:ATA}.  The complexity for the

178: case of a constant number of semaphores was unknown. In the present

179: paper, we show that the problem remains NP-complete even if only two

180: semaphores are used in the parallel program.

181:

182: For the case of using only one semaphore in parallel programs, we give

183: two algorithms.  The first algorithm detects in $O(n)$ time whether a

184: race condition exists between any two operations.  The second

185: algorithm computes in $O(np\log n)$ time a compact representation,

186: from which one can determine whether a race condition exists between

187: any two operations in $O(1)$ time.  Our results are based on

188: reducing the problem of determining whether a valid subschedule exists

189: in which $w$ precedes $v$ to the problem of {\em Sequencing to

190: Minimize Maximum Cumulative Cost (SMMCC)\/}.

191: %We first describe the SMMCC problem and then explain the

192: %equivalence in the next two paragraphs.

193: Given an acyclic directed graph $G$ with costs on the nodes, the {\em

194: cumulative cost} of the first $i$ nodes in a schedule of $G$ is the

195: sum of the cost of these nodes.  Thus, minimizing the maximum

196: cumulative cost is an attempt to ensure that the cumulative cost stays

197: low throughout the schedule. The SMMCC problem is NP-complete in

198: general even if the node costs are restricted to

199: $\pm1$~\cite{Abdel-Wahab:1976:SAR,Garey:1979:CIG}.  Abdel-Wahab and

200: Kameda~\cite{Abdel-Wahab:1978:SMM} presented an $O(n^2)$-time

201: algorithm for the special case that $G$ is a series-parallel graph.

202: (The time bound was later improved to $O(n \log n)$ by the same

203: authors~\cite{Abdel-Wahab:1980:SOS}.)  As part of this solution, they

204: gave an $O(n\log p)$-time algorithm applicable when $G$ consists of

205: $p$ disjoint chains.  The existence problem of a valid {\em schedule}

206: in which $w$ precedes $v$ can be reduced to the SMMCC problem in a

207: chain graph augmented with one inter-chain edge. We add an edge from

208: $w$ to $v$, assign costs to the nodes ($+1$ if the node is a

209: $P$-operation, $-1$ if a $V$-operation), and compute the minimum

210: maximum cumulative cost.  Clearly, the cost is non-positive if and

211: only if there is a valid schedule. The augmented chain graph is not

212: series-parallel, so the algorithms of Abdel-Wahab and

213: Kameda~\cite{Abdel-Wahab:1978:SMM,Abdel-Wahab:1980:SOS} are not

214: applicable.  We show that the SMMCC problem can nevertheless be solved

215: in polynomial time.  In fact, for the special case of interest, that

216: in which the costs are $\pm 1$, we give a linear-time algorithm.
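
As a minimal illustration of this reduction (a toy instance of our own,
assuming the semaphore is initially zero), let $C_1$ consist of a single
$V$-operation $v$ of cost $-1$ and let $C_2$ consist of a single
$P$-operation $w$ of cost $+1$.  After adding the edge from $w$ to $v$,
the only schedule is $wv$, whose cumulative costs are
\[
   +1, \qquad (+1)+(-1)=0 .
\]
The minimum maximum cumulative cost is therefore $1>0$, so no valid
schedule places $w$ before $v$.  Adding the edge from $v$ to $w$ instead
leaves only the schedule $vw$, with cumulative costs $-1$ and $0$; the
maximum is non-positive, so this order is valid.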

217:

218: The rest of the paper is organized as follows.

219: Section~\ref{sec:prelim} gives the preliminaries.

220: Section~\ref{sec:single} gives the algorithm for a single pair of

221: nodes. Section~\ref{sec:all} gives the algorithm for all pairs of

222: nodes. Section~\ref{sec:2semaphores} sketches the proof that

223: race-condition detection is NP-complete if two semaphores are used in

224: the parallel program.

225:

226: \section{Preliminaries}

227: \label{sec:prelim}

228: %\subsection{Definition and Notation}

229: Suppose $G$ is an acyclic graph with node costs.  We introduce some

230: terminology having to do with schedules, mostly adapted

231: from~\cite{Abdel-Wahab:1978:SMM}.

232: %A {\em schedule} of $G$ is a

233: %sequence of $G$'s nodes which is consistent with the precedence

234: %constraints imposed by the arcs of $G$.

235: A {\em segment} of a schedule is a consecutive subsequence.  Let $H =

236: v_1v_2\cdots v_m$ be a sequence of nodes.  The {\em cost} of $H$,

237: denoted $\cost{H}$, is the sum of the costs of its nodes.  The {\em

238: height of a node $v_\ell$ in $H$} is defined to be the sum of the

239: costs of the nodes $v_1$ through $v_\ell$.  The {\em height of $H$},

240: denoted $\height{H}$, is the maximum of 0 and the maximum height of

241: the nodes in $H$.

242: %(a) the maximum height of any node in $H$, if

243: %some node of $H$ has non-negative height, or (b) zero, if all nodes in

244: %$H$ have negative heights.

245: A node of maximum height in $H$ is called a {\em peak}. A node of

246: minimum height in $H$ is called a {\em valley}.  The {\em reverse

247: height} of $H$, denoted $\fall{H}$, is the height of $H$ minus the cost

248: of $H$.  Note that height and reverse height are nonnegative.  A

249: schedule of $G$ is {\em optimal} if its height is minimum over all

250: schedules of $G$.  We use $\opt{G}$ to denote the height of an

251: optimal schedule.
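
For instance (a hypothetical sequence chosen only to illustrate these
definitions), let $H=v_1v_2v_3v_4v_5$ with node costs $+1,+1,-1,-1,-1$.
The heights of $v_1,\ldots,v_5$ in $H$ are $1,2,1,0,-1$, so
\[
   \cost{H}=-1, \qquad
   \height{H}=\maxof{0,2}=2, \qquad
   \fall{H}=\height{H}-\cost{H}=3 .
\]
Here $v_2$ is the unique peak and $v_5$ is the unique valley.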

252:

253: A sequence $C=v_1v_2\cdots v_m$ of nodes of $G$ is called a {\em chain}

254: of $G$ if the only edges in $G$ incident on these nodes are $v_0v_1,

255: v_1v_2,\ldots,v_{m-1}v_m, v_mv_{m+1}$, where $v_0$ and $v_{m+1}$ are

256: other nodes, denoted $\prr{C}$ and $\suu{C}$, respectively.  We use

257: $\head{C}$ to denote $v_1$ and $\tail{C}$ to denote $v_m$. Note that $C$

258: could be a single node.

259:

260: We use $[v,w]_{G}$ to denote the chain of $G$ starting from $v$ and

261: ending at $w$. Let $[v,-]_{G}$ denote the longest chain of $G$

262: starting from $v$, and $[-,v]_{G}$ the longest chain of $G$ ending

263: at $v$.  If it is clear from the context which graph is intended, then we

264: may omit the subscript $G$. Note that the above notation might not be

265: well-defined for an arbitrary acyclic graph $G$, but it is so when $G$ is

266: composed of disjoint chains, which is the case of interest in this

267: paper.

268:

269: Suppose $H$ is a chain of $G$ containing a peak $v_\ell$ such that

270: (1) every node of $H$ preceding $v_\ell$ has nonnegative height in

271: $H$, and (2) every node of $H$ following $v_\ell$ has height in $H$ at

272: least the cost of $H$. In this case, we call $H$ a {\em hump}, and we say

273: $v_\ell$ is a {\em useful peak} of $H$.  This definition is illustrated in

274: Figure~\ref{hump}\note{Figure~\ref{hump}}.  We say a hump is an {\em $N$-hump} if its

275: cost is negative, a {\em $P$-hump} if its cost is nonnegative.
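
To make the definition concrete, consider two hypothetical chains (our
own examples, not those of the figures).  A chain $v_1v_2v_3v_4v_5$ with
node costs $+1,+1,-1,-1,-1$ has heights $1,2,1,0,-1$ and cost $-1$.  Its
peak is $v_2$; the node preceding $v_2$ has height $1\ge 0$, and the
nodes following $v_2$ have heights $1,0,-1$, each at least the cost
$-1$.  Hence this chain is an $N$-hump with useful peak $v_2$.  In
contrast, a chain $u_1u_2u_3u_4$ with node costs $+1,-1,-1,+2$ has
heights $1,0,-1,1$ and cost $1$.  Its peaks are $u_1$ and $u_4$, but
neither is useful: $u_2$ follows $u_1$ with height $0<1$, and $u_3$
precedes $u_4$ with height $-1<0$.  Hence this chain is not a hump.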

276:

277: \begin{figure}%[p]

278: \centerline{\input{fig1}}

279: %\centerline{\psfig{figure=hump.ps,width=4in,silent=1}}

280: \caption[]{A hump $H$ of 12 nodes: $v_1,v_2,\ldots,v_{12}$. The cost

281:   of each node is in the circle. By definition $\cost{H}=-2$,

282:   $\height{H}=2$, and $\fall{H}=4$. Both $v_2$ and $v_8$ are peaks of

283:   $H$, but only $v_2$ is useful.}

284: \label{hump}

285: \end{figure}

286:

287: We are concerned primarily with graphs $G$ consisting of disjoint

288: chains $C_1,C_2,\ldots,C_p$.  For convenience, we assume that $G$

289: contains an {\em initial pseudonode} ($\init$),

290: preceding all nodes, and a {\em terminal pseudonode} ($\fini$),

291: following all nodes, each of cost zero. Thus, $\prr{v}$ could be

292: $\init$ and $\suu{v}$ could be $\fini$.

293:

294: For the rest of the section we describe the properties of humps in

295: schedules, mostly adapted from~\cite{Abdel-Wahab:1978:SMM}.

296:

297: \subsection{Hump Decomposition}

298: \label{property-sect}

299:

300: As part of their scheduling algorithm for series-parallel graphs,

301: Abdel-Wahab and Kameda~\cite{Abdel-Wahab:1980:SOS} show that in linear

302: time a sequence of nodes can be decomposed into a set of humps by an

303: algorithm $\decomp{}$.

304: %The algorithm $\decomp{}$ is

305: %shown in Figure~\ref{humpdecomp}\note{Figure~\ref{humpdecomp}}.

306: It

307: takes a chain as input and outputs a set of disjoint subchains such

308: that every subchain is a hump.

309: %The first Repeat-loop produces

310: %$N$-humps; and the second Repeat-loop produces $P$-humps.  Each loop

311: %alternates between identifying peaks and valleys.  It is not difficult

312: %to see that every sequence of nodes between two consecutive valleys is

313: %a hump.

314: The output of $\decomp{C}$ is unique, although the output is not

315: necessarily the only hump decomposition of $C$.  An example is shown

316: in Figure~\ref{decompose}\note{Figure~\ref{decompose}}. The chain is

317: decomposed by $\decomp{}$ into two $N$-humps and three $P$-humps.  For

318: a chain $C$, we say $H$ is a {\em hump of $C$} if $H\in\decomp{C}$.

319: It can be proved that $\decomp{}$ has the following properties.

320:

321: %\begin{figure}%[p]

322: %\begin{center}

323: %\fbox{

324: %\begin{minipage}{5in}

325: %\begin{center}

326: %\begin{tabbing}

327: %\quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\kill

328: %Function $\decomp{C}$ \+\\

329: %$S:=\setof{};$\\

330: %$u:=$ the first valley of $C$;\\

331: %Repeat\+\\

332: %$v$\>$:=$ the first peak of $[\prr{C},u]$;\\

333: %$w$\>$:=$ the first valley of $[\prr{C},v]$;\\

334: %$S$\>$:=$ $S\cup\setof{[\suu{w},u]};$\\

335: %$u$\>$:=$ $w;$\-\\

336: %Until $u=\prr{C}$;\\

337: %$u:=$ the first valley of $C$;\\

338: %Repeat\+\\

339: %$v$\>$:=$ the last peak of $[u,\tail{C}]$;\\

340: %$w$\>$:=$ the last valley of $[v,\tail{C}]$;\\

341: %$S$\>$:=$ $S\cup\setof{[\suu{u},w]}$;\\

342: %$u$\>$:=$ $w;$\-\\

343: %Until $u=\tail{C}$;\\

344: %Return $S$;

345: %\end{tabbing}

346: %\end{center}

347: %\newpage

348: %\end{minipage}

349: %}

350: %\end{center}

351: %\caption[]{The algorithm decomposing a chain into a set of humps.}

352: %\label{humpdecomp}

353: %\end{figure}

354:

355:

356: \begin{figure}%[p]

357: %\centerline{\psfig{figure=chain.ps,width=4in,silent=1}} %

358: \centerline{\input{fig3}}

359: \caption[]{A chain decomposed into two $N$-humps and three $P$-humps.}

360: \label{decompose}

361: \end{figure}

362: %\clearpage

363:

364: \paragraph{Hump-decomposition properties:}

365: \begin{enumerate}

366: \item Suppose $H_1, H_2\in\decomp{C}$ and $H_1$ precedes $H_2$ in $C$.

367:   If $\cost{H_1}\ge0$, then $\cost{H_2}\ge0$ and $\fall{H_1}>\fall{H_2}$.

368:   If $\cost{H_2}<0$, then $\cost{H_1}<0$ and $\height{H_1}<\height{H_2}$.

369: \item If $v$ is the first valley of $[u,w]$, then $\decomp{[u,v]}$

370: (respectively, $\decomp{[\suu{v},w]}$) consists of $N$-humps

371: (respectively, $P$-humps) only.

372:

373: \item Let $C$ and $C'$ be two disjoint chains, whose humps are

374:   respectively $H_1,H_2,\ldots,H_k$ and $H_{k+1},H_{k+2},\ldots,H_\ell$

375:   in order.  Then, for some $1\le i\le k$ and $k\le j\le \ell$, the humps

376:   of $CC'$ are

377:   \[

378:      H_1,H_2,\ldots,H_i,(H_{i+1}\cdots H_j),H_{j+1},\ldots,H_\ell

379:   \]

380:   in order.

381: \end{enumerate}

382: The third property implies that

383: %\begin{eqnarray*}

384: \begin{displaymath}

385: \set{\tail{H}}{H\in\decomp{CC'}}\subseteq

386: \set{\tail{H}}{H\in\decomp{C}}\cup\set{\tail{H}}{H\in\decomp{C'}}.

387: \end{displaymath}

388: %\end{eqnarray*}

389:

390: It will turn out that once we decompose a chain into humps, we need

391: not be concerned with the internal structure of these humps. For each

392: hump $H$ we need only store $\cost{H}$ and $\height{H}$.  Thus, a

393: chain consisting of $\ell$ humps can be represented by a length-$\ell$

394: sequence of pairs $(\cost{H},\height{H})$. We call this sequence the

395: {\em hump representation} of the chain. Using the third

396: hump-decomposition property, one could straightforwardly derive the

397: hump representation of $C_1C_2$ from the hump representation of $C_1$

398: and that of $C_2$.  In particular, if we are given $\decomp{C}$ and

399: $\decomp{C'}$, then computing $\decomp{CC'}$ takes

400: $O(|\decomp{C}|+|\decomp{C'}|)$ time.
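
One way to see why the pair $(\cost{H},\height{H})$ carries enough
information is the following identity, which follows directly from the
definition of height (it is a sketch of the underlying arithmetic, not a
description of $\decomp{}$).  For any two sequences $A$ and $B$ of nodes,
\[
   \cost{AB}=\cost{A}+\cost{B}, \qquad
   \height{AB}=\maxof{\height{A},\ \cost{A}+\height{B}},
\]
because the height in $AB$ of a node of $B$ equals $\cost{A}$ plus its
height in $B$, and because $\cost{A}\le\height{A}$.  As a hypothetical
example, if $A$ has pair $(-1,2)$ and $B$ has pair $(1,3)$, then
$\cost{AB}=0$ and $\height{AB}=\maxof{2,\ -1+3}=2$.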

401:

402: \begin{figure}%[p]

403: %\centerline{\psfig{figure=useful.ps,width=4in,silent=1}}

404: \centerline{\input{fig4}}

405: \caption[]{The second sequence of nodes is obtained from the first one

406: by clustering the nodes 1--5 to node~3.}

407: \label{cluster-line}

408: \end{figure}

409:

410: \subsection{Hump Clustering}

411: %%Two lemmas are useful to our results. They are both generalizations

412: %%of lemmas in [.Kameda 1978.].

413: The following lemma concerns an operation on a schedule called {\em

414: clustering} the nodes of a hump.  Suppose $H$ is a hump of $G$, and

415: let $v$ be a useful peak of $H$.  Let $S$ be a schedule of $G$.  If

416: all the nodes of $H$ are consecutive in $S$, then we say $H$ is {\em

417: clustered in $S$}.  If every hump of $G$ is clustered in $S$, then we say

418: the schedule $S$ is {\em clustered}.  If a hump is not clustered in a

419: schedule, then we can modify the schedule to make it so.  To {\em cluster

420: the nodes of $H$ to $v$} is to change the positions of nodes of $H$

421: other than $v$ so that all the nodes of $H$ are consecutive, and the

422: order among nodes of $H$ is unchanged.  An example is shown in

423: Figure~\ref{cluster-line}\note{Figure~\ref{cluster-line}}.

424:

425: \begin{lemma}[See~\cite{Abdel-Wahab:1978:SMM}]

426: \label{cluster}

427: %{\rm [. Kameda 1978 .]}

428: Let $G$ be an acyclic graph with node costs and $H$ be a hump of

429: $G$. Suppose $S$ is a schedule of $G$. If $T$ is obtained from $S$

430: by clustering all nodes in $H$ to a useful peak of $H$, then $T$ is a

431: schedule of $G$ and $\height{T} \le \height{S}$.

432: \end{lemma}

433:

434: An example is shown in

435: Figure~\ref{cluster-fig}\note{Figure~\ref{cluster-fig}}. The height of

436: the schedule in Figure~\ref{cluster-fig}(c) is smaller than that of

437: the schedule in Figure~\ref{cluster-fig}(b).

438: %It follows from

439: %Lemma~\ref{cluster} that there is always a clustered optimal schedule

440: %of $G$.

441: Two clustered schedules of the graph in Figure~\ref{cluster-fig}(a)

442: are shown in Figures~\ref{cluster-fig}(d) and~\ref{cluster-fig}(e).

443: It follows from Lemma~\ref{cluster} that there is always an optimal

444: schedule of $G$ which is clustered.

445: %We prove the lemma as follows.

446:

447:

448: \begin{figure*}%[p]

449: %\centerline{\psfig{figure=cluster.ps,width=4.8in,silent=1}}

450: \centerline{\input{fig5}}

451: \caption[]{(a) A graph $G$ consisting of two chains. The first chain

452: contains an $N$-hump followed by a $P$-hump. The second chain contains

453: two $P$-humps. (b) A schedule for $G$ of height four. (c) The schedule

454: obtained from the previous one by clustering the $N$-hump to its

455: useful peak. (d) A clustered schedule of $G$ of height two. This one

456: is obtained from the previous schedule by clustering every hump. (e) A

457: clustered schedule of $G$ with minimum height.}

458: \label{cluster-fig}

459: \end{figure*}

460:

461:

462: %\paragraph{Proof of Lemma~\ref{cluster}}

463: %The original lemma in~\cite{Abdel-Wahab:1978:SMM} restricts $G$ to be a

464: %chain graph. We can prove as follows that the same properties hold

465: %even without the restriction. Suppose $H=v_1\cdots v_p\cdots v_d$,

466: %where $v_p$ is a useful peak of $H$. Suppose $w_1w_2\cdots w_\ell$ is

467: %the segment of $S$ such that $w_{j_i}=v_i$ for all $1\le i\le d$ and

468: %$1=j_1<j_2<\cdots<j_d=\ell$.  The only difference between $S$ and $T$

469: %is that the segment of $W=w_1w_2\cdots w_\ell$ in $S$ is replaced with

470: %$W'=W'_1HW'_2$ in $T$, where

471: %\begin{eqnarray*}

472: %  W'_1&=&w_{j_1+1}\cdots w_{j_2-1}w_{j_2+1}\cdots w_{j_p-1}\\

473: %  W'_2&=&w_{j_p+1}\cdots w_{j_{p+1}-1}w_{j_{p+1}+1}\cdots w_{j_d-1}.

474: %\end{eqnarray*}

475: %Suppose $w_j$ is not in $H$. By definition of chains the precedence

476: %relations between $v_i$ and $w_j$ imposed by $G$ are the same over all

477: %$1\le i\le d$. Note that $w_j$ precedes some node of $H$ in $W$ and

478: %succeeds some other node of $H$ in $W$. It follows that there is no

479: %precedence constraint between $v_i$ and $w_j$. Therefore $T$ is a

480: %schedule of $G$.

481: %%

482: %

483: %We denote the heights of $w_i$ in $S$ and in $T$ by $h_S(w_i)$ and

484: %$h_T(w_i)$, respectively.  Note that $h_S(v_p)=h_T(v_p)$ since the set

485: %of nodes preceding $v_p$ does not change. We show that for every

486: %$1\le\alpha\le\ell$ there exists a $1\le\beta\le\ell$ such that

487: %$h_T(w_\alpha)\le h_S(w_\beta)$.

488: %\begin{itemize}

489: %\item If $\alpha=j_i$ for some $1\le i\le d$, then

490: %$h_T(w_\alpha)=h_T(v_i)\le h_T(v_p)=h_S(v_p)=h_S(w_{j_p})$.

491: %\item If $j_i<\alpha<j_{i+1}$ for some $p\le i<d$, then

492: %$h_T(w_\alpha)=\cost{v_{i+1}}+\cost{v_{i+2}}+\cdots+\cost{v_d}+h_S(w_\alpha)$.

493: %Since $H$ is a hump and $i\ge p$,

494: %$\cost{v_{i+1}}+\cost{v_{i+2}}+\cdots+\cost{v_d}=h_S(v_d)-h_S(v_i)\le

495: %0$.  Thus $h_T(w_\alpha)\le h_S(w_\alpha)$.

496: %\item If $j_i<\alpha<j_{i+1}$ for some $1\le i<p$, then

497: %$h_T(w_\alpha)=-\cost{v_1}-\cost{v_2}-\cdots-\cost{v_i}+h_S(w_\alpha)$.

498: %Since $H$ is a hump and $i<p$,

499: %$-\cost{v_1}-\cost{v_2}-\cdots-\cost{v_i}=h_S(v_0)-h_S(v_i)\le 0$,

500: %where $v_0$ is the node that precedes $v_1$ in $S$. Thus

501: %$h_T(w_\alpha)\le h_S(w_\alpha)$.

502: %\end{itemize}

503: %It follows that $\height{W'}\le\height{W}$. Since nodes other than the

504: %$w_i$'s preserve their heights in $S$ and $T$, the lemma is proved.

505: %\qed

506:

507:

508: \subsection{Standard Order}

509: A series $S_1 \cdots S_m$ of subsequences of nodes is in {\em standard

510: order} if it satisfies the following properties.

511:

512: \paragraph{Standard order properties.}

513: \begin{itemize}

514:    \item The series consists of $S_i$'s with negative costs, followed

515:          by $S_i$'s with nonnegative costs;

516:    \item The $S_i$'s with negative costs are in nondecreasing order of

517:          height; and the $S_i$'s with nonnegative costs are in

518:          nonincreasing order of reverse height.

519: \end{itemize}

520:

521: If the humps of a chain are $H_1,H_2,\ldots,H_m$ in order, then the

522: series $H_1H_2\cdots H_m$ is in standard order by the first

523: hump-decomposition property.

524:

525: \begin{lemma}[See~\cite{Abdel-Wahab:1978:SMM}]

526: \label{exchange}

527: Let $A$, $B$, $S_1$ and $S_2$ be subsequences of nodes. Suppose

528: $S=S_1ABS_2$ and $T=S_1BAS_2$. If the series $BA$ is in standard order,

529: then $\height{S}\ge\height{T}$.

530: \end{lemma}

531:

532: For example, the sequence in Figure~\ref{cluster-fig}(d) is a

533: clustered schedule of the graph in Figure~\ref{cluster-fig}(a). Note

534: that the series of the last two humps in the schedule is not in

535: standard order: the reverse height of the first hump (zero) is less

536: than that of the second hump (one). The schedule in

537: Figure~\ref{cluster-fig}(e) obtained by exchanging those two

538: clustered humps has height one less than that of the schedule in

539: Figure~\ref{cluster-fig}(d).
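
The inequality of Lemma~\ref{exchange} can also be checked by hand on
small instances, using the fact that
$\height{XY}=\maxof{\height{X},\ \cost{X}+\height{Y}}$ for any sequences
$X$ and $Y$, which follows from the definition of height.  As a
hypothetical instance (not necessarily the humps of the figure), let $A$
be a single node of cost $+1$, so that $\cost{A}=1$, $\height{A}=1$, and
$\fall{A}=0$, and let $B$ consist of two nodes of costs $+1,-1$, so that
$\cost{B}=0$, $\height{B}=1$, and $\fall{B}=1$.  The series $BA$ is in
standard order, and
\[
   \height{AB}=\maxof{1,\ 1+1}=2
   \ \ge\
   \height{BA}=\maxof{1,\ 0+1}=1 .
\]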

540:

541: %\paragraph{Proof of Lemma~\ref{exchange}}

542: %The original version of this lemma in~\cite{Abdel-Wahab:1978:SMM} restricts

543: %$A,B$ to be humps $G$. We can prove as follows that the same property

544: %holds even without these restrictions. Since the heights of nodes in

545: %$S_1$ and $S_2$ are not changed in $S$ and $T$, it suffices to ensure

546: %that

547: %\begin{eqnarray*}

548: % \height{AB}&=&\maxof{\height{A},\cost{A}+\height{B}}\\

549: %            &\ge&\maxof{\height{B},\cost{B}+\height{A}}\\

550: %            &=&\height{BA}.

551: %\end{eqnarray*}

552: %\begin{itemize}

553: %\item If $\cost{A}<0$ and $\cost{B}<0$, since the series $BA$ is in

554: %         standard order, $\height{A}\ge\height{B}$. Since

555: %         $\cost{B}<0$, it follows that $\height{A}>\cost{B}+\height{B}$.

556: %\item If $\cost{A}\ge0$ and $\cost{B}\ge0$, since the series $BA$ is

557: %         in standard order,

558: %         $\height{B}-\cost{B}=\fall{B}\ge\fall{A}=\height{A}-\cost{A}$.

559: %         Thus $\cost{A}+\height{B}\ge\cost{B}+\height{A}$. Since

560: %         $\cost{A}\ge0$, $\cost{A}+\height{B}\ge\height{B}$.

561: %\item If $\cost{A}\ge0$ and $\cost{B}<0$, then

562: %    $\height{A}>\cost{B}+\height{A}$ and

563: %    $\cost{A}+\height{B}\ge\height{B}$.

564: %\end{itemize}

565: %Since in all cases each of $\height{B}$ and $\cost{B}+\height{A}$ is

566: %less than or equal to one of $\height{A}$ and $\cost{A}+\height{B}$,

567: %the lemma is proved.

568: %\qed

569:

570: \subsection{Hump Merging}

571: A schedule of $G$ is in {\em standard form} if it is clustered and its

572: series of humps of $G$ is in standard order.  Let $T$ be any schedule

573: of $G$ in standard form.  Recall that by Lemma~\ref{cluster} there is

574: always an optimal schedule $S$ of $G$ which is clustered.  The humps

575: of $G$, while clustered in both $T$ and $S$, may not be in the same

576: order.  However, any two humps of the same chain of $G$ must be in the

577: same order in $T$ and in $S$, else either $T$ or $S$ is not a

578: schedule.  Take two consecutive humps in $S$ that are from different

579: chains and that are not in the same order as in $T$, and exchange

580: their positions.  By Lemma~\ref{exchange}, the resulting ordering has

581: height no more than $S$.  By a series of such exchanges, we eventually

582: obtain $T$ from $S$.  It follows that the height of $T$ is no more

583: than that of $S$, and hence that $T$ is optimal.  This argument shows

584: that every schedule in standard form is an optimal schedule of $G$.

585:

586: Let $I=\setof{H_1,H_2,\ldots,H_m}$, where the series $H_1H_2\cdots

587: H_m$ is in standard order. Suppose $\merge{I}$ returns a sequence of

588: nodes obtained by concatenating all humps in $I$ into standard order.

589: Namely, $\merge{I}=H_1H_2\cdots H_m$. Assume for uniqueness that

590: $\merge{}$ breaks ties in some arbitrary but fixed way.  By the above

591: argument we have the following lemma.

592:

593: \begin{lemma}[See~\cite{Abdel-Wahab:1978:SMM}]

594: The output of

595: \[

596:    \merge{\bigcup_{1\le i\le p}\decomp{C_i}}

597: \]

598: is an optimal schedule of $G$.

599: \label{optsched}

600: \end{lemma}

601:

602: An example is shown in Figure~\ref{cluster-fig}. Since the schedule in

603: Figure~\ref{cluster-fig}(e) is clustered and its series of humps is in

604: standard order, it is an optimal schedule of the graph in

605: Figure~\ref{cluster-fig}(a).  Abdel-Wahab and

606: Kameda~\cite{Abdel-Wahab:1978:SMM} showed that $\merge{\bigcup_{1\le

607: i\le p}\decomp{C_i}}$ can be obtained in $O(n\log p)$ time.  Note that

608: the output of function $\merge{}$ may not be unique.  Without loss of

609: generality, however, we may define $\merge{}$ more restrictively as

610: follows to make its output unique for the same $G$. Suppose $G$ is

611: composed of disjoint chains, $C_1, C_2, \ldots, C_p$ and

612: $I=\bigcup_{1\le i\le p}\decomp{C_i}$. Define $\merge{I}=H_1H_2\cdots

613: H_m$, where $\setof{H_1,H_2,\ldots,H_m}=I$ and the series

614: $H_1H_2\cdots H_m$ is in standard order. Furthermore, if $H_iH_j$ and

615: $H_jH_i$ are both in standard order, where $C_{i'}$ contains $H_i$,

616: $C_{j'}$ contains $H_j$, and $i'<j'$, then $H_i$ precedes $H_j$ in

617: $\merge{I}$.

618:

619: \section{Algorithm for Single Pair}

620: \label{sec:single}

621: %To detect race conditions between $v$ and $w$, we need to find a valid

622: %{\em subschedule} containing $v$ and $w$ such that its maximum

623: %cumulative cost is minimized.  Note that every valid subschedule of

624: %$G$ is a valid schedule of a prefix subgraph of $G$. A graph $G_0$ is

625: %a {\em prefix subgraph} of $G$ if (i) there is no arc of $G$ from any

626: %node of $G-G_0$ to any node of $G_0$; (ii) every arc of $G$ between

627: %two nodes of $G_0$ is also an arc of $G_0$. Clearly, in the graph of

628: %interest, i.e., $p$ parallel chains with an augmented arc, every

629: %prefix subgraph is determined by a cut comprising $p$ cutpoints.

630:

631: %Let $G$ be a graph composed of disjoint chains, $C_1, C_2, \ldots, C_p$.

632: %Recall that there are two pseudonodes, $\init$ and $\fini$. The cost of

633: %each node of $G$ is either $+1$ or $-1$.

634: %A subschedule $S$ of $G$ is {\em valid} if $\height{S}=0$.  Let $v$

635: %and $w$ be two nodes of $G$. In this section we show how to determine

636: %in linear time whether $v$ could precede $w$ in some valid subschedule

637: %of $G$.

638:

639: %\subsection{Notation}

640: A vector $\cut=(x_1,x_2,\ldots,x_p)$ of $p$ nodes is called a {\em

641: cut} of $G$ if each $x_i$ is either $\init$ or a node in $C_i$. We

642: call $x_i$ the $i$-th {\em cutpoint} of $\cut$.  The {\em prefix

643: subgraph} $G[\cut]$ of $G$ is the subgraph $\bigcup_{1\leq i\leq

644: p}[-,x_i]$.  Therefore, the problem we address can be reduced to

645: finding a cut such that the valid schedule of the prefix subgraph

646: determined by the cut has the minimal maximum cumulative cost.  Let

647: $h$ be the maximum cumulative cost of the optimal subschedule that

648: contains $v$ and $w$.  If $h$ is zero, then a valid subschedule exists

649: (namely, the optimal subschedule itself). If $h$ is positive, then there

650: is no valid subschedule because the maximum cumulative cost of any

651: subschedule containing $v$ and $w$ is at least $h$ and is thus

652: positive, too.  The rest of the section shows that a best cut can be

653: found in linear time.

654:

655: Since we will frequently encounter two cuts that differ at only one

656: cutpoint, let $\newcut{\cut, i, u}$ denote a cut $\cut'$ with

657: \begin{displaymath}

658:    \cut'(\ell)=\left\{

659:                  \begin{array}{ll}

660:                     \cut(\ell)&\mbox{if\ $\ell\ne i$};\\

661:                     u&\mbox{if\ $\ell=i$}.

662:                  \end{array}

663:                \right.

664: \end{displaymath}

665: A {\em $j$-schedule} of $G[\cut]$ is a schedule of $G[\cut]$ whose

666: last node is $\cut(j)$. We use $\jopt{\cut}$ to denote the height of

667: an optimal $j$-schedule of $G[\cut]$. Suppose $\cut(j)\ne\init$. One

668: can compute $\jopt{\cut}$ for a given $\cut$ as follows. Let

669: $\cut'=\newcut{\cut,j,\prr{\cut(j)}}$.  Clearly, if $S$ is an optimal

670: schedule of $G[\cut']$, then $S\cut(j)$ is an optimal $j$-schedule of

671: $G[\cut]$.

672: %(However, the other direction is not true.)

673: It follows that

674: \begin{displaymath}

675:   \jopt{\cut}=\max\setof{\height{G[\cut']},\cost{G[\cut']}+\height{\cut(j)}}.

676: \end{displaymath}

677: Note that $\opt{G[\cut]}$ and $\jopt{\cut}$ are both nonnegative.  We

678: use $v\chb w$ to signify that there is a valid subschedule of $G$ in

679: which $v$ precedes $w$. Let $v\notchb w$ signify that $v\chb w$ is not

680: true. Note that neither $\chb$ nor $\notchb$ is a partial order.
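
As a small numerical illustration of the formula for $\jopt{\cut}$ given above, consider the following hypothetical values, which are not taken from any figure in this paper. We assume here that a single node of cost $+1$ has height $1$. Suppose $\height{G[\cut']}=2$, $\cost{G[\cut']}=1$, and $\cut(j)$ has cost $+1$. Then
\begin{displaymath}
  \jopt{\cut}=\max\setof{2,\,1+1}=2,
\end{displaymath}
so appending $\cut(j)$ does not raise the height. If instead $\cost{G[\cut']}=2$ with the same height, then
\begin{displaymath}
  \jopt{\cut}=\max\setof{2,\,2+1}=3,
\end{displaymath}
and the last node alone determines the height of the $j$-schedule.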

681:

682:

683: \subsection{Basic Idea}

684: Every valid subschedule of $G$ is a valid schedule of a prefix subgraph

685: $G[\cut]$ for some cut $\cut$ of $G$. Therefore, $v\chb w$ if and only if

686: there is a cut $\cut$ of $G$ such that $G[\cut]$ has a valid schedule in

687: which $v$ precedes $w$.  Let $h^*$ be the minimum of

688: $\height{G[\cut]\cup\setof{vw}}$ over all $G[\cut]$'s that contain $v$

689: and $w$. It follows that $v\chb w$ if and only if $h^*=0$. Hence, the

690: problem of determining whether $v\chb w$ is reduced to computing the

691: minimum height of a set of chain graphs each augmented with an

692: interchain arc.  Clearly, two immediate questions arise. 1) How do we

693: compute the height of $G[\cut]\cup\setof{vw}$, which is not even

694: series-parallel?  2) How do we cope with the fact that there could be an

695: exponential number of prefix subgraphs that contain $v$ and $w$?

696:

697: Let $v$ and $w$ be contained in two disjoint chains $C_i$ and $C_j$,

698: respectively.  The following observation will ease the situation.

699: Suppose $S$ is a subschedule of $G$ containing $w$. Let $S'$ be the

700: subschedule of $G$ obtained from $S$ by discarding all nodes succeeding

701: $w$ in $S$. Clearly, $\height{S'}\le\height{S}$. Therefore, without loss

702: of generality the minimum of $\height{G[\cut]\cup\setof{vw}}$ can be

703: computed over only cuts $\cut$ with $\cut(j)=w$. Moreover, we can

704: let $w$ always be the last node of a subschedule by considering only the

705: minimum-height $j$-schedule of each $G[\cut]$ that contains $v$. The

706: first question above is no longer an issue.

707:

708: It turns out that the second question is not an issue, either.  We

709: will show that in order to obtain the minimum height over all those

710: $j$-schedules, it suffices to consider only $O\left(\sqrt{n}\right)$ cuts. In

711: particular each of those $O(\sqrt{n})$ cuts is uniquely determined by

712: its $j$-th cutpoint.

713:

714: \subsection{The Algorithm}

715: The algorithm takes $v$ and $w$ as inputs. Let $C_i$ contain $v$ and

716: $C_j$ contain $w$. The algorithm iterates over different choices of the

717: cutpoint $\cut(i)$ such that $\cut(i)$ does not precede $v$. In each

718: iteration the algorithm calls the function $\best{}$ to obtain a

719: minimum-height $j$-schedule for $G[\cut]$ over all cuts $\cut$ with the

720: designated cutpoints in $C_i$ and $C_j$. By comparing the heights of

721: these $j$-schedules with respect to different $\cut(i)$'s, the algorithm

722: outputs the minimum height of $j$-schedules for $G[\cut]$ over all

723: $\cut$ such that $\cut(j)=w$ and $\cut(i)$ does not precede $v$. In

724: Figure~\ref{minheight}\note{Figure~\ref{minheight}} we give the algorithm to

725: compute $\opt{G[\cut^*]\cup\setof{vw}}$, where $\cut^*$ is a best cut of $G$

726: corresponding to $vw$.

727:

728: %%% The function $\best{}$ is the essential part of the algorithm.  {\bf We

729: %%% need a better explanation than what follows\ldots}.

730:

731: %%%

732: %%% The algorithm takes $v$ and $w$ as inputs. Let $C_i$ contain $v$ and

733: %%% $C_j$ contain $w$. After fixing $\cut(j)$ at $w$, the algorithm

734: %%% proceeds iteratively with different $\cut(i)$ in every iteration.

735: %%% Based on the designated $\cut(i)$ and the fixed $\cut(j)$, the

736: %%% algorithm obtains a $j$-schedule of minimum height for $G[\cut]$ over

737: %%% all cuts $\cut'$ such that $\cut'(i)=\cut(i)$ and $\cut'(j)=\cut(j)$.

738: %%% By comparing the heights of these $j$-schedules with respect to

739: %%% different $\cut(i)$'s, the algorithm outputs the minimum height of

740: %%% $j$-schedules for $G[\cut]$ over all cuts $\cut'$ such that

741: %%% $\cut'(j)=\cut(j)$.

742: %%%

743: %%% If $u$ is a node of $C_k$, by definition $[-,u]=[\head{C_k}, u]$ and

744: %%% $[u,-]=[u,\tail{C_k}]$.  In Figure~\ref{minheight} we give the

745: %%% algorithm to compute $\opt{G[\cut^*]\cup\setof{vw}}$, where $\cut^*$

746: %%% is a best cut of $G$ corresponding to $vw$.

747:

748:

749: Function $\best{}$ is the essential part of the algorithm. Based on the

750: given subset $F$ of $\setof{1,2,\ldots, p}$ and the given cut $\cut$, it

751: looks for a best cut $\cut^*$ corresponding to $vw$ such that

752: $\cut^*(k)=\cut(k)$ for every $k\in F$.  (In the case of

753: interest, $F=\setof{i,j}$.)  An optimal $j$-schedule of $G[\cut^*]$ is

754: then returned. Note that for every $k\not\in F$, $\cut^*(k)$ depends on

755: a value $s$, which is the maximum of $s_1$ and $s_2$. Each of $s_1$ and

756: $s_2$ is determined simply by chains with indices in $F$ and their

757: designated cutpoints.  Namely, the choices of $\cut^*(k)$'s for different

758: $k\not\in F$ are mutually independent. This is the key to our efficient

759: algorithm.

760:

761: In $\best{}$, we do not explicitly specify cutpoints of $\cut^*$.

762: Instead, we work on the hump representation of subchains, and every

763: cutpoint is implicitly specified by $\tail{H}$ for some hump $H$.

764: Specifically, Step~1 ensures $\cut^*(k)=\cut(k)$ for every $k\in F,

765: k\ne j$. For every $k\not\in F$, Steps~3 and~8 ensure $\cut^*(k)=\tail{H}$, where $H$ is the

766: highest $N$-hump of $C_k$ with $\height{H}<s$.

767: %that has height less than $s$, for every

768: %$k\not\in F$.

769: Since we are considering $j$-schedules, $\cut^*(j)$ is specified

770: slightly differently. Although in Step~2 the subchain of $C_j$ is only

771: up to $\prr{\cut(j)}$, $\cut^*(j)$ is still $\cut(j)$, since the

772: $j$-schedule $S_s\cut(j)$ is returned in Step~10.

773:

774: \begin{figure*}%[p]

775: \begin{center}

776: \fbox{

777: %\begin{center}

778: \begin{minipage}[t]{3in}

779: \begin{tabbing}

780: Function $\minheight{v,w}$\\

781: 1\quad\=$C_i$\quad\=$:=$ the chain containing $v;$\\

782: 2\>     $C_j$     \>$:=$ the chain containing $w;$\\

783: 3\>     $\cut(j)$ \>$:=w;$\\

784: 4\>     $h^*$     \>$:=\infty;$\\

785: 5\>     $I_0$     \>$:=\setof{v}\cup\decomp{[\suu{v},-]};$\\

786: 6\>     For every $\cut(i) \in \set{\tail{H}}{H\in I_0}$ do\\

787: 7\>\quad\= $S^*$\=$:=\best{j,\setof{i,j},\cut};$\\

788: 8\>\>      $h^*$\>$:=\min\setof{h^*,\height{S^*}};$\\

789: 9\>     Return $h^*;$

790: \end{tabbing}

791: \end{minipage}

792: }

793: \quad

794: \fbox{

795: \begin{minipage}[t]{3in}

796: \begin{tabbing}%\\ \\

797: Function $\best{j,F,\cut}$\\

798: 1\quad\=$I$\quad\=$:=\bigcup_{k\in F,k\ne j}\decomp{[-,\cut(k)]};$\\

799: 2\>     $J$\>     $:=\decomp{[-,\prr{\cut(j)}]};$\\

800: 3\>     $K$\>     $:=\bigcup_{k\not\in F}\decomp{C_k};$\\

801: 4\>     $s_1$\>   $:=\max\set{\height{H}}{H\in I\cup J, \cost{H}<0};$\\

802: 5\>     $S^+$\>   $:=\merge{\set{H\in I\cup J}{\cost{H}\ge0}};$\\

803: 6\>     $s_2$\>   $:=\height{S^+\cut(j)};$\\

804: 7\>     $s$\>     $:=\maxof{s_1,s_2};$\\

805: 8\>     $K_s$\>   $:=\set{H\in K}{\height{H}<s, \cost{H}<0};$\\

806: 9\>     $S_s$\>   $:=\merge{I\cup J\cup K_s};$\\

807: 10\>    Return    $S_s\cut(j);$

808: \end{tabbing}

809: %\end{center}

810: \end{minipage}

811: }

812: \end{center}

813: \caption[]{The algorithm for computing $\opt{G[\cut^*]\cup\setof{vw}}$

814: for a best cut $\cut^*$ of $G$ corresponding to $vw$.}

815: \label{minheight}

816: \end{figure*}

817:

818:

819: \subsection{Correctness}

820: We answer the following two questions in this subsection:

821: \begin{enumerate}

822: \item Why is it sufficient to try for $\cut(i)$ only those nodes in

823:   $\set{\tail{H}}{H\in I_0}$?

824: \item Why does $\best{j, F, \cut}$ return an optimal $j$-schedule of

825:    $G[\cut^*]$ with $\cut^*(k)=\cut(k)$ for every $k \in F$?

826: \end{enumerate}

827:

828: \begin{lemma}

829: Let $\cut$ be a cut of $G$. Suppose $[x,z]$ is a subchain of $G$

830: containing $\cut(i)$. Let $H$ be the hump of $[x,z]$ containing

831: $\cut(i)$. Let $y$ be the first valley of $[\prr{H},\cut(i)]$.

832: If

833: \[

834: \cut_1(k)=\left\{

835:      \begin{array}{ll}

836:        \cut(k)&\mbox{if $k\ne i$};\\

837:        \prr{H}&\mbox{if $k=i$ and $y=\prr{H}$};\\

838:        \tail{H}&\mbox{if $k=i$ and $y\ne\prr{H}$},

839:      \end{array}

840:           \right.

841: \]

842: then

843: $\jopt{\cut_1}\le\jopt{\cut}$.

844: \label{humpboundary}

845: \end{lemma}

846: \begin{proof}

847: Straightforward.

848: \end{proof}

849:

850: Note that the $\prr{H}$ in the above lemma is always an $\tail{H'}$ for

851: some hump $H'$ in $I_0$, which is defined in Step~5 of $\minheight{}$.

852: Therefore, Lemma~\ref{humpboundary} answers the first question.

853:

854: By definitions of $I$, $J$, and $K_s$ it is not difficult to see that

855: the sequence returned by $\best{j,F,\cut}$ is an optimal $j$-schedule of

856: $G[\cut^*]$ for some cut $\cut^*$ such that $\cut^*(k)=\cut(k)$ for

857: every $k\in F$. The correctness of $\minheight{}$ thus relies on the

858: following lemma, which answers the second question.

859:

860: \begin{lemma}

861: Let $\cut$ be a cut. Let $F$ be a subset of $\setof{1,2,\ldots,p}$

862: containing $j$. If $S^*=\best{j,F,\cut}$, then

863: $\height{S^*}\le\jopt{\cut}$.

864: \label{fix2cutpoints}

865: \end{lemma}

866: The rest of the subsection proves

867: %Lemma~\ref{humpboundary} and

868: Lemma~\ref{fix2cutpoints}.

869: %We need the following lemma to prove Lemma~\ref{humpboundary}.

870: %

871: %\begin{lemma}

872: %Let $\cut$ be a cut of $G$. Suppose $x$ is a node in $C_i$ preceding

873: %$\cut(i)$.

874: %Define $\cut_1$ by

875: %\[

876: %    \cut_1(k)=\left\{

877: %                  \begin{array}{ll}

878: %                     \cut(k)&\mbox{if $k\ne i$};\\

879: %                     \mbox{the first valley of $[x,\cut(i)]$}&\mbox{if $k=i$}.

880: %                  \end{array}

881: %              \right.

882: %\]

883: %Then $\jopt{\cut_1}\le\jopt{\cut}$.

884: %\label{valley}

885: %\end{lemma}

886: %\begin{proof}

887: %Straightforward.

888: %\end{proof}

889: %

890: %\begin{proof}

891: %By the second hump decomposition property, there exists a series of

892: %nodes in $C_i$, $\cut_1(i)=y_1, y_2,\ldots,y_m=\cut(i)$, such that

893: %every subchain $[\suu{y_k},y_{k+1}]$ is a $P$-hump. Suppose $S$ is an

894: %optimal $j$-schedule of $G[\cut]$. Let $S'$ be obtained from $S$ by

895: %clustering every $P$-hump $[\suu{y_k},y_{k+1}]$ to its useful peak. By

896: %Lemma~\ref{cluster}, $S'$ is an optimal $j$-schedule of $G[\cut]$. Let

897: %$S_1$ be obtained from $S'$ by removing every $P$-hump

898: %$[\suu{y_k},y_{k+1}]$. Clearly, $\height{S_1}\le\height{S'}$. Since

899: %$S_1$ is a $j$-schedule of $G[\cut_1]$,

900: %$\jopt{\cut_1}\le\height{S_1}\le\height{S'}=\jopt{\cut}$.

901: %\end{proof}

902: %

903: %Lemma~\ref{humpboundary} can be proved as follows.

904: %\paragraph{Proof of Lemma~\ref{humpboundary}}

905: %When $y=\prr{H}$, by choice of $y$, $\jopt{\cut_1}\le\jopt{\cut}$ is

906: %immediate from Lemma~\ref{valley}.

907: %

908: %If $H$ is a $P$-hump, by definition of $\decomp{}$, $\prr{H}$ is the

909: %first valley of $[\prr{H},\tail{H}]$, so $y=\prr{H}$. Therefore when

910: %$y\ne\prr{H}$, $H$ must be an $N$-hump. We claim that $[\head{H}, y]$

911: %is a hump. Since $y$ is the first valley of $[\prr{H},\cut(i)]$ and

912: %$y\ne\prr{H}$, $y$ is also the first valley of $[\head{H}, \cut(i)]$.

913: %By definition of humps, $y$ cannot precede any useful peak of $H$.  It

914: %follows that a useful peak of $H$ is also a useful peak of

915: %$[\head{H},y]$. Thus $[\head{H},y]$ is a hump.  Let us use $H'$ to

916: %denote $[\head{H},y]$.  Clearly, $\height{H'}=\height{H}$. Since

917: %$H$ is an $N$-hump, $\cost{H'}\ge\cost{H}$.

918: %

919: %Let us define $\cut'$ by

920: %\[

921: %   \cut'(k)=\left\{

922: %                   \begin{array}{ll}

923: %                      \cut(k)&\mbox{if $k\ne i$}\\

924: %                      y &\mbox{if $k=i$}.

925: %                   \end{array}

926: %                \right.

927: %\]

928: %By Lemma~\ref{valley}, $\jopt{\cut'}\le\jopt{\cut}$.  Suppose $S$ is

929: %an optimal $j$-schedule of $G[\cut']$ in which $H'$ is clustered.

930: %We write $S=S_1H'S_2$. Inserting the sequence $[\suu{y},\tail{H}]$

931: %immediately after $H'$, we obtain a $j$-schedule $S^*=S_1HS_2$ for

932: %$G[\cut_1]$. We show $\height{S^*}\le\height{S}$.

933: %

934: %Now $\height{S^*}$ is equal to the maximum of $\height{S_1}$,

935: %$\cost{S_1}+\height{H}$, and $\cost{S_1H}+\height{S_2}$.  Clearly,

936: %\begin{equation}

937: %  \height{S_1}\le\height{S}.

938: %  \label{ineq6}

939: %\end{equation}

940: %Since $\height{H}=\height{H'}$,

941: %\begin{equation}

942: %   \cost{S_1}+\height{H}\le\height{S}.

943: %   \label{ineq7}

944: %\end{equation}

945: %Since $\cost{H}\le\cost{H'}$, $\cost{S_1H}\le\cost{S_1H'}$. Hence

946: %\begin{equation}

947: %   \cost{S_1H}+\height{S_2}\le\height{S}.

948: %   \label{ineq8}

949: %\end{equation}

950: %Combining (\ref{ineq6}), (\ref{ineq7}), and (\ref{ineq8}), we obtain

951: %$\height{S^*}\le\height{S}$.

952: %\qed

953: Let $F_\ell=\setof{1,\ldots,\ell-1,\ell+1,\ldots,p}$.  The following

954: lemma is a special case of Lemma~\ref{fix2cutpoints}, in which $F$ is

955: composed of $p-1$ numbers.

956:

957: \begin{lemma}

958: Let $\cut$ be a cut.  If $S^*=\best{j,F_\ell,\cut}$ for some $\ell\ne

959: j$, then $\height{S^*}\le\jopt{\cut}$.

960: \label{bestcut1}

961: \end{lemma}

962: \begin{proof}

963: Define $\cut_1$ by

964: \[

965:    \cut_1(k)=\left\{

966:                    \begin{array}{ll}

967:                       \cut(k)&\mbox{if $k\ne\ell$};\\

968:                       \mbox{the first valley of $[-,\cut(\ell)]$}

969:                             & \mbox{if $k=\ell$}.

970:                    \end{array}

971:                 \right.

972: \]

973: Then it is not difficult to see $\jopt{\cut_1}\le\jopt{\cut}$.  Let

974: $\cut'$ be the cut with $\height{S^*}=\jopt{\cut'}$, i.e., $S^*$ is a

975: $j$-schedule of $G[\cut']$.  By definition of $\best{}$, $\cut'$ and

976: $\cut_1$ could differ only at the $\ell$-th position. Clearly, it

977: suffices to show $\jopt{\cut'}\le\jopt{\cut_1}$.

978:

979: Let $w=\cut_1(j)$. Let $L=\decomp{[-,\cut_1(k)]}$. Define

980: \[

981:    S=\merge{I\cup J\cup L},

982: \]

983: where $I$ and $J$ are defined in Steps~1 and~2 of $\best{}$.  Clearly,

984: $Sw$ is an optimal $j$-schedule of $G[\cut_1]$. Thus,

985: $\height{Sw}=\jopt{\cut_1}$.  By choice of $\cut_1(\ell)$, $L$

986: contains no $P$-hump. Hence, by the uniqueness assumption of

987: $\merge{}$, we could write $Sw=S_1S^+w$, where $S^+$ is defined in

988: Step~5 of $\best{}$. We prove $\jopt{\cut'}\le\jopt{\cut_1}$ by

989: showing that $\cut'(\ell)$ succeeds $\cut(\ell)$ if and only if

990: $\jopt{\cut'}\le\height{Sw}$ as follows.

991:

992: \paragraph{Case 1: $\cut'(\ell)$ succeeds $\cut_1(\ell)$.}

993: Since $L$ contains no $P$-hump, each hump of $[-,\cut_1(\ell)]$

994: appears in $S_1$. Therefore, $S_1S'S^+w$ is a $j$-schedule of

995: $G[\cut']$, where $S'=[\suu{\cut_1(\ell)},\cut'(\ell)]$.  We show

996: $\height{S_1S'S^+w}\le\height{S_1S^+w}$.  Now

997: $\height{S_1S'S^+w}=\max\setof{\height{S_1},\cost{S_1}+\height{S'},\cost{S_1S'}+\height{S^+w}}$.

998: Clearly,

999: \begin{equation}

1000:   \height{S_1}\le\height{S_1S^+w}.

1001:   \label{ineq1}

1002: \end{equation}

1003: By definition of $F_\ell$, the $K_s$ defined in Step~8 of $\best{}$ is

1004: composed of the $N$-humps of $C_\ell$ that have heights less than $s$.

1005: Therefore, by choice of $\cut'(\ell)$ every hump of $[-,\cut'(\ell)]$

1006: has height less than $s$. It follows from the standard order of humps

1007: in $S'$ that $\height{S'}<s$. By Step~7 of $\best{}$,

1008: $s=\maxof{s_1,s_2}$. If $s=s_2=\height{S^+w}$, as defined in Step~6 of

1009: $\best{}$, then $\cost{S_1}+\height{S'}<\cost{S_1}+\height{S^+w}$. If

1010: $s=s_1=\height{H^*}$, where $H^*$ is a highest $N$-hump in $I\cup J$,

1011: then we could write $S_1=S_2H^*S_3$. It follows that

1012: \begin{eqnarray*}

1013:   \cost{S_1}+\height{S'}&=&\cost{S_2H^*S_3}+\height{S'}\\

1014:                         &<&\cost{S_2}+\height{H^*}\\

1015:                         &\le&\height{S_2H^*}\\

1016:                         &\le&\height{S_1}.

1017: \end{eqnarray*}

1018: Therefore, in either case we have

1019: \begin{equation}

1020:    \cost{S_1}+\height{S'}<\height{S_1S^+w}.

1021:    \label{ineq2}

1022: \end{equation}

1023: By choice of $\cut'(\ell)$, $\cost{S'}<0$. Hence,

1024: \begin{eqnarray}

1025:   \cost{S_1S'}+\height{S^+w}&<&\cost{S_1}+\height{S^+w}\nonumber\\

1026:   &\le&\height{S_1S^+w}.  \label{ineq3}

1027: \end{eqnarray}

1028: Combining Equations~(\ref{ineq1}),~(\ref{ineq2}), and~(\ref{ineq3}),

1029: we obtain $\height{S_1S'S^+w}\le\height{Sw}$.

1030:

1031: \paragraph{Case 2: $\cut'(\ell)$ precedes $\cut_1(\ell)$.}

1032: Let $S'=[\suu{\cut'(\ell)},\cut_1(\ell)]$. By choice of

1033: $\cut'(\ell)$, it is not difficult to see

1034: \begin{displaymath}

1035:  \decomp{[-,\cut_1(\ell)]}=\decomp{[-,\cut'(\ell)]}\cup\decomp{S'}.

1036: \end{displaymath}

1037: By choice of $\cut_1(\ell)$, $\decomp{S'}$ contains only $N$-humps of

1038: heights no less than $s$. Note that every $N$-hump in $I\cup J$ has

1039: height no more than $s$. By standard form of $S$, we know that $S'$ is

1040: a suffix of $S_1$. Therefore, we could write $Sw=S_2S'S^+w$. Removing

1041: $S'$ from $Sw$, we obtain a $j$-schedule $S_2S^+w$ of $G[\cut']$. We

1042: show $\height{S_2S^+w}\le\height{Sw}$.

1043:

1044: Now

1045: $\height{S_2S^+w}=\max\setof{\height{S_2},\cost{S_2}+\height{S^+w}}$. Clearly,

1046: \begin{equation}

1047:   \height{S_2}\le\height{S_2S'S^+w}=\height{Sw}.

1048:   \label{ineq4}

1049: \end{equation}

1050: Since each hump of $S'$ has height no less than $s$,

1051: $\height{S'}\ge s$.

1052: Hence, $\height{S'S^+w}\ge\height{S'}\ge s\ge s_2=\height{S^+w}$.

1053: It follows that

1054: \begin{eqnarray}

1055:   \cost{S_2}+\height{S^+w}&\le&\cost{S_2}+\height{S'S^+w}\nonumber\\

1056:   &\le&\height{Sw}.  \label{ineq5}

1057: \end{eqnarray}

1058: Combining Equations~(\ref{ineq4}) and~(\ref{ineq5}), we obtain

1059: $\height{S_2S^+w}\le\height{Sw}$.

1060: \end{proof}

1061:

1062: Now we are ready to prove Lemma~\ref{fix2cutpoints}.

1063:

1064: \paragraph{Proof of Lemma~\ref{fix2cutpoints}}

1065: Recall that $S^*=\best{j,F,\cut}$. Let $\cut'$ be the cut such that

1066: $S^*$ is a $j$-schedule of $G[\cut']$. ($S^*$ is certainly an optimal

1067: $j$-schedule of $G[\cut']$.) We use the algorithm in

1068: Figure~\ref{cuttransform}\note{Figure~\ref{cuttransform}} to prove the

1069: lemma. Procedure $\cuttrans{}$ proceeds in iterations, in which the

1070: value of $\ell$ varies among $\setof{1,\ldots,p}$. If $\ell\not\in F$,

1071: then the value of $\cut(\ell)$ is updated. Since $S$ is an optimal

1072: $j$-schedule of $G[\cut']$, it follows from Lemma~\ref{bestcut1} that

1073: $\jopt{\cut'}\le\jopt{\cut}$ always holds during the while-loop. If we

1074: can show that $\cuttrans{}$ always terminates, then the lemma is

1075: proved.

1076:

1077: Let $s^*_1$, $s^*_2$, and $s^*$ be the $s_1$, $s_2$, and $s$ in the

1078: execution of $\best{j,F,\cut}$. Let $s_1$, $s_2$, and $s$ be those in

1079: the execution of $\best{j, F_\ell, \cut}$. The values of $s_1$, $s_2$,

1080: and $s$ change as the while-loop of $\cuttrans{}$ proceeds. We show

1081: that $\cut$ eventually becomes $\cut'$ by arguing that $s$ eventually

1082: becomes $s^*$.

1083:

1084: Since $F\subseteq F_\ell$, $s_1\ge s^*_1$ always holds. By definition

1085: of $\best{}$, whenever Step~7 of $\cuttrans{}$ is finished,

1086: $[-,\cut(\ell)]$ contains only $N$-humps. Thus, after the first $p$

1087: iterations of the while-loop, $[-,\cut(\ell)]$ contains no $P$-hump

1088: for every $\ell\not\in F$. Henceforth, $s_2=s^*_2$ and therefore

1089: $s=\maxof{s_1,s_2}\ge\maxof{s^*_1,s^*_2}=s^*$. If $s>s^*$, then

1090: $s=s_1>s^*$. Since $s_1>s^*$, there must be an $N$-hump $H$ in

1091: $\bigcup_{k\not\in F}\decomp{[-,\cut(k)]}$ such that $\height{H}=s_1$.

1092: Since $s=s_1$, in the next iteration when $C_\ell$ contains $H$,

1093: $\cut(\ell)$ will be moved before $H$ by definition of $\best{}$. It

1094: follows that the value of $s$ is nonincreasing and $s$ will become

1095: $s^*$. Once $s=s^*$, in the following $p$ iterations, $\cut(k)$ will

1096: be moved to $\cut'(k)$ for every $k\not\in F$. The algorithm then

1097: terminates.

1098: \qed

1099:

1100: \begin{figure}%[p]

1101: \begin{center}

1102: \fbox{

1103: \begin{minipage}{5in}

1104: \begin{center}

1105: \begin{tabbing}

1106: \quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\kill

1107: Procedure $\cuttrans{\cut,\cut^*}$ \\

1108: 1 \> $\ell:=0;$\\

1109: 2 \> While $\cut^*\ne\cut$ do\\

1110: 3 \> \> $\ell:= (\ell \bmod p) + 1;$\\

1111: 4 \> \> If $\ell\not\in F$\\

1112: 5 \> \> \> $S:=\best{j,F_\ell,\cut};$\\

1113: 6 \> \> \> $\cut':=$ the cut such that $S$ is an

1114:            optimal $j$-schedule of $G[\cut']$;\\

1115: 7 \> \> \> $\cut := \cut';$

1116: \end{tabbing}

1117: \end{center}

1118: \end{minipage}

1119: }

1120: \end{center}

1121: \caption[]{The algorithm transforms $\cut$ to $\cut^*$. We prove

1122: Lemma~\ref{fix2cutpoints} by showing that this algorithm always

1123: terminates.}

1124: \label{cuttransform}

1125: \end{figure}

1126:

1127: \subsection{Implementation}

1128: \label{singleimplementation}

1129: Recall that $\decomp{C}$ runs in time linear in $|C|$, the length of

1130: chain $C$. It follows that the time complexity of Steps~1--5 and

1131: Step~9 of $\minheight{}$ is $O(n)$. Suppose the order of nodes

1132: assigned to $\cut(i)$ in the for-loop is the same as their order in

1133: $C_i$.  In this subsection we focus on implementing $\best{}$ such that

1134: the for-loop runs in time $O(n)$.

1135:

1136: \paragraph{Number of Iterations}

1137: The following lemma ensures that the size of $I_0$ is

1138: $O(\sqrt{|C_i|})$. It follows that the number of iterations is

1139: $O(\sqrt{n})$.

1140:

1141: \begin{lemma}

1142: Suppose $C$ is a chain with node costs $\pm1$. The number of humps in

1143: $\decomp{C}$ is $O(\sqrt{|C|})$.

1144: \label{rootn}

1145: \end{lemma}

1146: \begin{proof}

1147: Since the costs of nodes are either $+1$ or $-1$, a hump of height

1148: $\ell$ contains at least $\ell$ nodes. For the same reason, a hump of

1149: reverse height $\ell$ contains at least $\ell$ nodes. By the first

1150: hump decomposition property, the heights of the $N$-humps in

1151: $\decomp{C}$ are different, and so are the reverse heights

1152: of the $P$-humps in $\decomp{C}$. If there are $n_1$

1153: $N$-humps and $n_2$ $P$-humps in $\decomp{C}$, then

1154: $|C|=\Omega(n^2_1+n^2_2)=\Omega((n_1+n_2)^2)$. This proves the lemma.

1155: \end{proof}
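
The counting step in the proof above can be spelled out as follows. This is a sketch under the assumption that heights and reverse heights are nonnegative integers, which is what the $\pm1$ node costs suggest. Since the $n_1$ $N$-humps have pairwise distinct heights, their heights sum to at least $0+1+\cdots+(n_1-1)$, and likewise for the reverse heights of the $n_2$ $P$-humps. Because the humps of $\decomp{C}$ are disjoint subchains of $C$,
\begin{displaymath}
  |C|\;\ge\;\frac{n_1(n_1-1)}{2}+\frac{n_2(n_2-1)}{2}
     \;=\;\frac{n_1^2+n_2^2-(n_1+n_2)}{2}.
\end{displaymath}
Writing $m=n_1+n_2$ and using $n_1^2+n_2^2\ge m^2/2$, we obtain
\begin{displaymath}
  |C|\;\ge\;\frac{m^2}{4}-\frac{m}{2},
\end{displaymath}
which gives $m=O(\sqrt{|C|})$.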

1156:

1157: \paragraph{Compact Representation of Humps}

1158: For the sake of efficiency, we do not deal with the internal structure

1159: of humps in $\best{}$. It suffices to represent each hump $H$ by a

1160: pair $(\cost{H},\height{H})$ and work on the compact representation of

1161: humps. Therefore, each of the $I$, $J$, and $K$ computed in the first

1162: three steps is a set of pairs. Clearly, each of these three steps

1163: takes $O(n)$ time.  However, the contents of $J$ and $K$ do not change

1164: in different iterations. Thus, Steps~2 and~3 need only be executed

1165: once.

1166:

1167: By $F=\setof{i,j}$, we have $I=\decomp{[-,\cut(i)]}$.  Suppose $I_t$

1168: and $\cut_t$ are the $I$ and $\cut$ in the $t$-th iteration for some

1169: $t\ge 2$. By the order of nodes assigned to $\cut(i)$, we need not

1170: recompute $\decomp{[-,\cut_t(i)]}$ from scratch. In the $t$-th

1171: execution of Step~1, $[-,\cut_t(i)]$ is obtained by appending a hump

1172: $[\suu{\cut_{t-1}(i)},\cut_t(i)]$ to $[-,\cut_{t-1}(i)]$. By the

1173: argument following the hump decomposition properties in

1174: \S\ref{property-sect}, the $t$-th execution of Step~1 takes

1175: $O(|I_{t-1}|)$ time. By Lemma~\ref{rootn}, the time complexity of all

1176: executions of Step~1 is $O(n+\sqrt{n}\times\sqrt{n})=O(n)$.
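
The bound above can be unpacked as follows, as a sketch that assumes the first execution of Step~1 computes $\decomp{[-,\cut_1(i)]}$ from scratch in $O(n)$ time. The for-loop has $|I_0|$ iterations, and each $I_t$ is the decomposition of a subchain of $C_i$, so Lemma~\ref{rootn} bounds both $|I_0|$ and every $|I_t|$ by $O(\sqrt{|C_i|})$. Therefore
\begin{displaymath}
  O(n)+\sum_{2\le t\le |I_0|}O(|I_{t-1}|)
  \;=\;O(n)+O\left(\sqrt{|C_i|}\right)\cdot O\left(\sqrt{|C_i|}\right)
  \;=\;O(n+|C_i|)\;=\;O(n).
\end{displaymath}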

1177:

1178: \paragraph{Priority Tree}

1179: To compute $s_1$ efficiently, we resort to a {\em priority tree}, a

1180: complete binary tree with $n+1$ leaves.\footnote{Note that there are

1181: other ways to implement Step 4 to run in linear time. However, the

1182: necessity of the priority tree will become clear when we address the

1183: implementation of the all-pairs algorithm.}  Each leaf keeps two

1184: values, {\em count} and {\em maxheight}. The count of the $(h+1)$-st

1185: leaf is the number of $N$-humps of height $h$ in $I\cup J$. The

1186: maxheight of the $(h+1)$-st leaf is 0 (respectively, $h$), if its

1187: count is zero (respectively, nonzero). The maxheight of an internal

1188: node is the maximum maxheight of its children. It follows that the

1189: maxheight of the root of a priority tree is the correct value of

1190: $s_1$. The priority tree can be built in time $O(n)$. Whenever a hump is

1191: added to or deleted from $I\cup J$, the priority tree can be updated

1192: in time $O(\log n)$. Since $J$ is fixed, to compute $s_1$ in the $t$-th

1193: iteration for every $t\ge 2$, we add humps in $I_t-I_{t-1}$ to $I\cup

1194: J$, remove humps in $I_{t-1}-I_t$ from $I\cup J$, and update the

1195: priority tree. By the third hump decomposition property, we have

1196: \begin{equation}

1197:   \sum_{2\leq t\leq q_i}\left(|I_t-I_{t-1}|+|I_{t-1}-I_t|\right)=O\left(\sqrt{|C_i|}\right),

1198:   \label{changeofI}

1199: \end{equation}

1200: where $q_i$ is the number of humps in $C_i$.

1201: Hence, the time complexity of all executions of Step~4 is

1202: $O(n+\sqrt{n}\times\log n)=O(n)$.
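
A toy instance may help fix the definitions of the priority tree. The values are hypothetical and chosen only for illustration. Take $n=3$, so that the tree has four leaves, and suppose the $N$-humps in $I\cup J$ have heights $1$ and $2$. The leaves then store
\begin{displaymath}
\begin{array}{l|cccc}
h & 0 & 1 & 2 & 3\\ \hline
\mbox{count} & 0 & 1 & 1 & 0\\
\mbox{maxheight} & 0 & 1 & 2 & 0
\end{array}
\end{displaymath}
The two internal nodes above the leaves have maxheight $\maxof{0,1}=1$ and $\maxof{2,0}=2$, and the root has maxheight $\maxof{1,2}=2$, which is the value of $s_1$. If the hump of height $2$ is then deleted, the count and maxheight of its leaf drop to $0$, the internal node above it becomes $\maxof{0,0}=0$, and the root becomes $\maxof{1,0}=1$. Only the nodes on one leaf-to-root path are touched, which is where the $O(\log n)$ update time comes from.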

1203:

1204: \paragraph{Hump Tree}

1205: To obtain the value of $s_2$, it is not necessary to know the value of

1206: $S^+$. We need only obtain the height of $S^+\cut(j)$. Similarly,

1207: the actual value of $S_s$ is irrelevant. What we compare in Step~8 of

1208: $\minheight{}$ is the height of $S_s\cut(j)$. We need a data structure

1209: to compute these two heights efficiently.

1210:

1211: Let $L$ be a set of humps such that $\height{H}\le n$ and $\fall{H}\le

1212: n$ for every $H\in L$. A {\em hump tree} $T$ for $L$ is a binary tree

1213: composed of two complete binary subtrees. Each subtree has $n+1$

1214: leaves. Let $T_N$ be the left subtree and $T_P$ be the right subtree.

1215: The $(h+1)$-st leaf of $T_N$ is associated with the set of $N$-humps of

1216: height $h$ in $L$.  The $(h+1)$-st leaf of $T_P$ is associated with the

1217: set of $P$-humps of reverse height $n-h$ in $L$. Let $T_x$ be the

1218: subtree of $T$ rooted at $x$. Let $L_x$ be the set of humps associated

1219: with leaves of $T_x$. Define $\height{T_x}=\height{\merge{L_x}}$ and

1220: $\cost{T_x}=\cost{\merge{L_x}}$.  Clearly, when $L=I\cup J$,

1221: $\height{T_P}=\height{S^+}$ and $\cost{T_P}=\cost{S^+}$. When $L=I\cup

1222: J\cup K_s$, $\height{T}=\height{S_s}$ and $\cost{T}=\cost{S_s}$. The

1223: heights of $S^+\cut(j)$ and $S_s\cut(j)$ can then be computed by

1224: \begin{eqnarray*}

1225: \height{S^+\cut(j)}&=&\maxof{\height{S^+},\cost{S^+}+\height{\cut(j)}};\\

1226: \height{S_s\cut(j)}&=&\maxof{\height{S_s},\cost{S_s}+\height{\cut(j)}}.

1227: \end{eqnarray*}

1228:

1229:

1230: Let us keep $\height{T_x}$ and $\cost{T_x}$ in $x$ for every node $x$ of

1231: $T$. Therefore, the hump tree $T$ takes $O(n)$ space.  We show how to

1232: compute $\height{T_x}$ and $\cost{T_x}$ for every node $x$ from leaves

1233: to root. When $x$ is a leaf of $T$, the humps in $L_x$ have the same

1234: height if $x$ is in $T_N$, and the same reverse height if $x$ is in

1235: $T_P$. It is not difficult to see that $\cost{T_x}=\sum_{H\in

1236:   L_x}\cost{H}$; and

1237: \[

1238:   \height{T_x}=

1239:     \left\{

1240:        \begin{array}{ll}

1241:           0 & \mbox{if $L_x=\emptyset$};\\

1242:           h & \mbox{if $x$ is the $(h+1)$-st leaf of $T_N$};\\

1243:           \cost{T_x}-h & \mbox{if $x$ is the $(n-h+1)$-st leaf of $T_P$}.

1244:         \end{array}

1245:     \right.

1246: \]

1247:

1248: When $x$ is an internal node of $T$, $\height{T_x}$ and $\cost{T_x}$

1249: can be computed by the information kept in the children of $x$.

1250: Suppose $y$ and $z$ are the left and right children of $x$,

1251: respectively. For any $H$ in $L_y$ and $H'$ in $L_z$, by the way we

1252: associate humps with leaves, the series $HH'$ is in standard order.

1253: Hence,

1254: \begin{eqnarray*}

1255:   \height{T_x}&=&\maxof{\height{T_y},\cost{T_y}+\height{T_z}};\\

1256:   \cost{T_x}&=&\cost{T_y}+\cost{T_z}.

1257: \end{eqnarray*}

1258: It follows that the hump tree $T$ for $L$ can be built in time

1259: $O(n+|L|)$.
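
As a small illustration of the bottom-up rule (the numbers are hypothetical and chosen only for this example), suppose the left child $y$ of $x$ stores $\height{T_y}=3$ and $\cost{T_y}=-2$, and the right child $z$ stores $\height{T_z}=4$ and $\cost{T_z}=1$. Then
\begin{eqnarray*}
  \height{T_x}&=&\maxof{3,\,-2+4}=3;\\
  \cost{T_x}&=&-2+1=-1.
\end{eqnarray*}
The time bound can be read off the same rule: computing $\sum_{H\in L_x}\cost{H}$ over all leaves touches every hump of $L$ once, which costs $O(|L|)$ in total, and each of the $O(n)$ internal nodes is computed from its two children in constant time.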

1260:

1261: Once $T$ is built, inserting a hump into $L$ can be done efficiently.

1262: Suppose we insert $H$ into $L$. If $H$ is an $N$-hump, let $x$ be the

1263: $(\height{H}+1)$-st leaf of $T_N$; if $L_x=\emptyset$, we first let

1264: $\height{T_x}=\height{H}$, and in either case we add $\cost{H}$ to $\cost{T_x}$. If

1265: $H$ is a $P$-hump, then we add $\cost{H}$ to both $\cost{T_x}$ and

1266: $\height{T_x}$, where $x$ is the $(n-\fall{H}+1)$-st leaf of $T_P$.

1267: To update $T$, we simply update the internal nodes on the path from

1268: $x$ to the root of $T$. Deleting a hump from $L$ can be done similarly

1269: by replacing every addition with a subtraction. Clearly, both insertion

1270: and deletion take time $O(\log n)$.
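
The following hypothetical instance with $n=3$, in which $T_N$ has four leaves, may help to see why only one root-to-leaf path is touched. We write each node $x$ as the pair $(\height{T_x},\cost{T_x})$. Suppose the second leaf of $T_N$ holds one $N$-hump of height $1$ and cost $-1$, the fourth leaf holds one $N$-hump of height $3$ and cost $-2$, and the other two leaves are empty. The leaves are then $(0,0)$, $(1,-1)$, $(0,0)$, $(3,-2)$, their parents are $(1,-1)$ and $(3,-2)$, and the root of $T_N$ is $(\maxof{1,-1+3},-3)=(2,-3)$. Inserting an $N$-hump of height $2$ and cost $-1$ changes the third leaf to $(2,-1)$, after which only its ancestors are recomputed:
\begin{eqnarray*}
  \mbox{parent of the third leaf}&:&(\maxof{2,\,-1+3},\;-1-2)=(2,-3);\\
  \mbox{root of $T_N$}&:&(\maxof{1,\,-1+2},\;-1-3)=(1,-4).
\end{eqnarray*}
The root of $T$ is then recomputed in the same way from the roots of $T_N$ and $T_P$. As a check, merging the three $N$-humps directly in the order of their leaves gives cumulative costs $0$, $-1$, $-2$ before the respective humps, hence height $\maxof{0+1,\,-1+2,\,-2+3}=1$, which agrees with the value stored at the root of $T_N$.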

1271:

1272: To compute the heights of $S^+\cut(j)$ and $S_s\cut(j)$, we need not

1273: maintain a hump tree for $I\cup J$ and another hump tree for $I\cup

1274: J\cup K_s$. Suppose $K^-$ is the set of $N$-humps in $K$, i.e.,

1275: $K^-=\set{H\in K}{\cost{H}<0}$. It suffices to maintain a hump tree

1276: $T$ for $I\cup J\cup K^-$. Since there is no $P$-hump in $K^-$, it is

1277: still true that $\height{T_P}=\height{S^+}$ and

1278: $\cost{T_P}=\cost{S^+}$. Although the hump tree is not for $I\cup

1279: J\cup K_s$, the values of $\height{S_s}$ and $\cost{S_s}$ can be

1280: efficiently obtained by the procedure in

1281: Figure~\ref{remove-range}\note{Figure~\ref{remove-range}}.

1282: Procedure $\removerange{}$ acts as if the $N$-humps of heights no less

1283: than $s$ are removed from the hump tree for $I\cup J\cup K^-$.

1284: Therefore, the resulting $\height{T}$ and $\cost{T}$ are $\height{S_s}$

1285: and $\cost{S_s}$, respectively. Clearly, $\removerange{}$ takes $O(\log

1286: n)$ time.  Since we maintain the hump tree for $I\cup J\cup K^-$ in

1287: every iteration, we use $O(\log n)$ space to save the values of $T$

1288: modified by $\removerange{}$. After obtaining the information we need, we

1289: restore the hump tree for $I\cup J\cup K^-$ in time $O(\log n)$.

1290:

1291:

1292: \begin{figure}%[p]

1293: \begin{center}

1294: \fbox{

1295: \begin{minipage}{5in}

1296: \begin{center}

1297: \begin{tabbing}

1298: \quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\kill

1299: Procedure $\removerange{T,s}$ \\

1300: 1 \> $y$ := the $s$-th leaf of $T_N$;\\

1301: 2 \> While $y$ is not the root of $T_N$ do\\

1302: 3 \> \> $x := \mbox{the parent of $y$};$\\

1303: 4 \> \> If $y$ is the left child of $x$ then\\

1304: 5 \> \> \> $(\height{T_x},\cost{T_x}):=(\height{T_y},\cost{T_y})$;\\

1305: 6 \> \> else\\

1306: 7 \> \> \> Recompute $\height{T_x}$ and $\cost{T_x}$;\\

1307: 8 \> \> $y := x;$\\

1308: 9 \> Recompute $\height{T}$ and $\cost{T}$;

1309: \end{tabbing}

1310: \end{center}

1311: \end{minipage}

1312: }

1313: \end{center}

1314: \caption[]{Let $T$ be the hump tree for $I\cup J\cup K^-$. This

1315: procedure acts as if the $N$-humps of heights no less than $s$ are

1316: removed from the hump tree.}

1317: \label{remove-range}

1318: \end{figure}
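
A short trace may clarify the procedure. The instance is hypothetical: $n=3$, each node $x$ is written as the pair $(\height{T_x},\cost{T_x})$, the leaves of $T_N$ are $(0,0)$, $(1,-1)$, $(2,-1)$, $(3,-2)$, their parents are $(1,-1)$ and $(2,-3)$, the root of $T_N$ is $(1,-4)$, and the root of $T_P$ is assumed to be $(4,2)$. Consider the call $\removerange{T,2}$. Step~1 lets $y$ be the second leaf. Its parent has $y$ as the right child, so Step~7 recomputes the parent as $(\maxof{0,0+1},-1)=(1,-1)$, which is unchanged. In the next iteration $y$ is the left child of the root of $T_N$, so Step~5 overwrites the root of $T_N$ with $(1,-1)$, discarding the contribution of the third and fourth leaves. Step~9 then yields
\begin{eqnarray*}
  \height{T}&=&\maxof{1,\,-1+4}=3;\\
  \cost{T}&=&-1+2=1,
\end{eqnarray*}
which are the values obtained when the $N$-humps of heights $2$ and $3$ are absent. Only the two overwritten pairs (the root of $T_N$ and the root of $T$) have to be saved in order to restore the tree afterwards.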

1319:

1320: Let $I_t$ be the $I$ in the $t$-th iteration for any $t\ge1$.  To

1321: obtain the hump tree for $I_t\cup J\cup K^-$ from that for $I_{t-1}\cup J\cup

1322: K^-$, we need to insert the humps in $I_t-I_{t-1}$ into $T$ and remove

1323: the humps in $I_{t-1}-I_{t}$ from $T$.  Since each insertion and

1324: deletion takes $O(\log n)$ time, it follows from

1325: Equation~(\ref{changeofI}) that the overall time complexity for

1326: obtaining the hump tree from that of the previous iteration is

1327: $O(\sqrt{n}\times\log n)$.  Recall that building a hump tree for $L$

1328: takes $O(n+|L|)$ time. Since there are $n$ nodes in $G$, $|I_1\cup

1329: J\cup K^-|=O(n)$. It follows that the time complexity for building a

1330: hump tree for $I_1\cup J\cup K^-$ is $O(n)$.

1331:

1332:

1333:

1334: By the above arguments we implement $\best{}$ such that the overall

1335: time complexity of the while-loop in $\minheight{}$ is $O(n)$.

1336: We therefore have the following theorem.

1337:

1338: \begin{theorem}

1339: \label{singlechb}

1340: Suppose $G$ is a graph consisting of $p$ disjoint chains comprising

1341: $n$ nodes, where each node represents either a $P$-operation or a

1342: $V$-operation.  For any two nodes $v$ and $w$ of $G$, one can

1343: determine in $O(n)$ time whether there is a valid subschedule in which

1344: $v$ precedes $w$.

1345: \end{theorem}

1346:

1347:

1348: \section{Algorithm for All Pairs}

1349: \label{sec:all}

1350: %\subsection{All-Pairs Race-Condition Detection}

1351: %For the purpose of debugging parallel programs, it is important to

1352: %exactly detect all races. Hence, we need to determine the above for all

1353: %pairs of nodes $v$ and $w$.

1354: In this section we show how to determine the $\chb$ relations for all

1355: pairs of nodes in $G$. The linear-time algorithm for a single pair of

1356: nodes, applied to all $O(n^2)$ pairs, takes time $O(n^3)$.

1357: Fortunately, there is a {\em compact representation} of this

1358: information.  To represent this information, it is sufficient that we

1359: indicate, for each node $v$, and for each chain $C$ not containing

1360: $v$, the first node $w$ in $C$ such that $v$ precedes $w$ in some

1361: valid subschedule. This representation has size $O(np)$, where $n$ is

1362: the number of nodes and $p$ is the number of chains.  The

1363: representation can be used to determine in constant time whether there

1364: is a race between two given operations $v$ and $w$, assuming that the

1365: input $p$ chains are schedulable.\footnote{Since the $p$ chains

1366: represent a trace of a parallel program, the assumption holds. For

1367: arbitrary $p$ chains, one can determine whether they are schedulable

1368: using the algorithm in~\cite{Abdel-Wahab:1978:SMM}.}

1369: %A race exists if

1370: %either operation can precede the other.

1371: To determine whether $v$ can precede $w$, we obtain the first node in

1372: $w$'s chain that could be preceded by $v$ in some valid subschedule.

1373: If this first node is $w$ itself or is numbered earlier than $w$, then $v$ can precede

1374: $w$.  Otherwise, $v$ cannot precede $w$.  We therefore consider the

1375: complexity of constructing such a representation.  Clearly, it can be

1376: constructed by a sequence of calls to the algorithm of Theorem

1377: \ref{singlechb}. We show how to do much better; in fact the time

1378: required by our algorithm is only $O(\log n)$ times the time required

1379: simply to write down the output.

1380:

1381: %Recall that $G$ is composed of $p$ disjoint chains,

1382: %$C_1,C_2,\ldots,C_p$, of $n$ nodes.

1383:

1384: \subsection{The Algorithm}

1385:

1386: Let $\first{j}{v}$ denote the first node in $C_j$ that could be

1387: preceded by $v$ in some valid subschedule of $G$. The output of the

1388: all-pairs algorithm is thus the value of $\first{j}{v}$ for every node

1389: $v$ and $1\le j\le p$. Note that $\first{j}{v}$ could be $\fini$,

1390: which means that none of the nodes in $C_j$ can be preceded by $v$ in any

1391: valid subschedule of $G$.
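
As a small illustration (the chain and the value of $\first{j}{v}$ are hypothetical), suppose $C_j$ consists of $w_1,w_2,\ldots,w_5$ in this order and $\first{j}{v}=w_3$. Whenever $v$ can precede a node of $C_j$, it can also precede every later node of $C_j$, so
\[
  v\chb w_k \mbox{ for } k=3,4,5
  \quad\mbox{and}\quad
  v\notchb w_k \mbox{ for } k=1,2.
\]
A query for a given pair thus amounts to one comparison of positions in $C_j$. Storing one entry per node and per chain gives at most $np$ entries; for example, $n=1000$ and $p=10$ give at most $10^4$ entries, whereas there are $n^2=10^6$ ordered pairs of nodes.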

1392:

1393: Let us describe first the procedure $\chainpair{i,j}$ which computes

1394: $\first{j}{v}$ for every $v\in C_i$. The all-pairs algorithm simply

1395: calls $\chainpair{i,j}$ for every $1\le i, j\le p$.  For convenience, let

1396: $\su{j}{w}=\suu{w}$ for every $w\in C_j$ and let

1397: $\su{j}{\init}=\head{C_j}$.  Procedure $\chainpair{i,j}$ is shown in

1398: Figure~\ref{chainpair}\note{Figure~\ref{chainpair}}.  The algorithm starts

1399: by letting $v$ be $\tail{C_i}$ and letting $w$ be $\tail{C_j}$. The

1400: repeat-loop proceeds by replacing $w$ with $\prr{w}$. Once $\minheight{v,w}$

1401: is not zero, the algorithm reports $\su{j}{w}$ as $\first{j}{v}$. After

1402: replacing $v$ with $\prr{v}$, the repeat-loop continues the same procedure

1403: to search for $\first{j}{v}$ for the new $v$.

1404:

1405: \begin{figure}%[p]

1406: \begin{center}

1407: \fbox{

1408: \begin{minipage}{5in}

1409: \begin{center}

1410: \begin{tabbing}

1411: \quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\kill

1412: Procedure $\chainpair{i,j}$ \\

1413:  1\>     $(v, w) := (\tail{C_i}, \tail{C_j})$;\\

1414:  2\>     Repeat\\

1415:  3\>        \>If $w=\init$ \=then \=$h := 1$;\\

1416:  4\>        \>             \>else \>$h := \minheight{v,w}$;\\

1417:  5\>        \>If $h>0$ \>then \=$\first{j}{v} :=\su{j}{w}$;\\

1418:  6\>        \>         \>     \>          $v := \prr{v}$;\\

1419:  7\>        \>         \>else\>$w := \prr{w}$;\\

1420:  8\>     Until $v = \init$;

1421: \end{tabbing}

1422: \end{center}

1423: \end{minipage}

1424: }

1425: \end{center}

1426: \caption[]{The algorithm that computes $\first{j}{v}$ for every $v \in C_i$.}

1427: \label{chainpair}

1428: \end{figure}
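
The following trace sketches how the two pointers move. The instance is hypothetical: $C_i$ consists of $v_1,v_2$, $C_j$ consists of $w_1,w_2,w_3$, and the outcomes of the calls to $\minheight{v,w}$ are assumed for illustration rather than derived from a concrete graph.
\begin{center}
\begin{tabular}{|c|c|c|l|}
\hline
iteration & $(v,w)$ & $h$ & action\\
\hline
1 & $(v_2,w_3)$ & $0$ & $w:=w_2$\\
2 & $(v_2,w_2)$ & $>0$ & $\first{j}{v_2}:=w_3$; $v:=v_1$\\
3 & $(v_1,w_2)$ & $0$ & $w:=w_1$\\
4 & $(v_1,w_1)$ & $>0$ & $\first{j}{v_1}:=w_2$; $v:=\init$\\
\hline
\end{tabular}
\end{center}
The loop stops after four iterations. In general, every iteration moves either $v$ or $w$ one step backward and neither pointer ever moves forward, so the number of iterations is at most $|C_i|+|C_j|$, which is $5$ here.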

1429:

1430: \subsection{Correctness}

1431: By induction on $v$ we show that $\chainpair{i,j}$ correctly computes

1432: $\first{j}{v}$ for every $v \in C_i$.

1433:

1434: When $v=\tail{C_i}$, procedure $\chainpair{i,j}$ keeps replacing $w$

1435: with $\pr{j}{w}$ until $w = \init$ or $\minheight{v,w}> 0$.  If

1436: $w=\init$, then $\opt{G\cup\setof{vw'}}=0$ for every $w' \in C_j$.  Thus,

1437: $\first{j}{v}=\su{j}{\init}=\su{j}{w}=\head{C_j}$ is correct. If

1438: $\minheight{v,w}>0$, then $v\notchb w$. It follows that $v\notchb w'$ for

1439: every $w'$ preceding $w$ in $C_j$. Since $\minheight{v,\su{j}{w}}=0$,

1440: $v\chb\su{j}{w}$.  Therefore, $\su{j}{w}$ is the correct value of

1441: $\first{j}{v}$. This confirms the induction basis.

1442:

1443: Suppose the procedure $\chainpair{i,j}$ correctly reports $\su{j}{w}$ as

1444: the value of $\first{j}{\su{i}{v}}$ in a certain iteration of the

1445: repeat-loop.  We need to show that in the remaining iterations

1446: $\first{j}{v}$ will also be correctly computed. Since

1447: $\su{i}{v}\chb\su{j}{w}$, $v\chb\su{j}{w}$. It follows that $v\chb w'$

1448: (and thus $\minheight{v,w'}=0$) for every $w'$ succeeding $w$ in $C_j$.

1449: In other words, to locate the first node in $C_j$ that could be preceded

1450: by $v$, it suffices to start testing from $w$.  For the same reason as

1451: above, $\chainpair{i,j}$ reports the correct value of $\first{j}{v}$.

1452: The correctness is therefore ensured.

1453:

1454: \subsection{Implementation}

1455: We show in this subsection how to implement $\chainpair{i,j}$ to run

1456: in time $O((|C_i|+|C_j|)\log n)$. It then follows that the time

1457: complexity of the all-pairs algorithm is $O(np\log n)$.

1458:

1459: Suppose each time before we call $\chainpair{i,j}$, we have the hump

1460: tree for $I\cup J\cup K^-$, where

1461: \begin{eqnarray*}

1462:  I&=&\decomp{C_i};\\

1463:  J&=&\decomp{[-,\prr{\tail{C_j}}]};\\

1464:  K^-&=&\set{H\in\bigcup_{{\scriptstyle1\le k\le p}\atop{\scriptstyle k\ne i,j}}

1465:             \decomp{C_k}}{\cost{H}<0}.

1466: \end{eqnarray*}

1467: It follows from \S\ref{singleimplementation} that the first

1468: call to $\minheight{v,w}$ can be computed in time $O(\log n)$, since

1469: only one $\cut(i)$ need be considered.  In each of the remaining

1470: iterations of the repeat-loop, we either replace $v$ with $\prr{v}$ or

1471: replace $w$ with $\prr{w}$. The following lemma guarantees that to

1472: compute each of the following $\minheight{v,w}$, we need only try $v$ as

1473: the cutpoint of $C_i$.

1474:

1475: \begin{lemma}

1476: Consider any iteration of the repeat-loop in $\chainpair{i,j}$. When

1477: the algorithm computes $h=\minheight{v,w}$, $v$ is the only cutpoint

1478: of $C_i$ that could make $h$ zero.

1479: \label{onlycutpoint}

1480: \end{lemma}

1481: \begin{proof}

1482: By definition of $\chainpair{}$, when computing $\minheight{v,w}$,

1483: $\first{j}{\su{i}{v}}$ always succeeds $w$ in $C_j$.  Assume for a

1484: contradiction that $u$ is a node succeeding $v$ in $C_i$ such that

1485: there is a cut $\cut$ of $G$ where $\cut(i)=u$, $\cut(j)=w$, and

1486: $\jopt{\cut}=0$. It follows that $u\chb w$ and thus $\su{i}{v}\chb w$.

1487: This contradicts the fact that $\first{j}{\su{i}{v}}$ succeeds $w$ in

1488: $C_j$.

1489: \end{proof}

1490:

1491: \begin{theorem}

1492:   \label{allchb}

1493:   Suppose $G$ is as in Theorem \ref{singlechb}.  The compact

1494:   representation of the relation ``$v$ precedes $w$ in some valid

1495:   subschedule'' can be constructed in $O(np\log n)$ time and $O(n)$

1496:   space.

1497: \end{theorem}

1498: \begin{proof}

1499: Note that in each iteration of the repeat-loop, either $v$ or $w$ is

1500: moved by one position. Since the costs of $v$ and $w$ are $\pm1$, by

1501: the first hump decomposition property the number of humps updated in

1502: $I\cup J\cup K^-$ between two consecutive iterations is a constant.

1503: Thus, each execution of $\minheight{v,w}$ takes only time $O(\log n)$.

1504: Since the number of iterations of the repeat-loop is $O(|C_i|+|C_j|)$,

1505: each execution of $\chainpair{i,j}$ takes time

1506: \begin{equation}

1507:   O((|C_i|+|C_j|)\times\log n).

1508:   \label{time1}

1509: \end{equation}

1510: It remains to show how to efficiently build the hump tree for each

1511: execution of $\chainpair{i,j}$.

1512:

1513: The very first hump tree can be constructed in time

1514: \begin{equation}

1515:   O(n).

1516:   \label{time2}

1517: \end{equation}

1518: Consider the moment when $\chainpair{i,j}$ is just finished and the

1519: all-pairs algorithm is about to call $\chainpair{i_1, j_1}$.  Since

1520: all humps in $I\cup J$ have been deleted during the execution of

1521: $\chainpair{i,j}$, the current $T$ is the hump tree for the $N$-humps

1522: in $\bigcup_{1\le k\le p; k\ne i,j}\decomp{C_k}$.  In order to obtain

1523: the hump tree for $\chainpair{i_1, j_1}$, we have to add the $N$-humps

1524: in $\decomp{C_i}\cup\decomp{C_j}$, delete the $N$-humps in

1525: $\decomp{C_{j_1}}$ from $T$, and then insert the humps in

1526: \begin{displaymath}

1527: \set{H\in\decomp{C_{i_1}}}{\cost{H}\ge0}\cup\decomp{[-,\prr{\tail{C_{j_1}}}]}

1528: \end{displaymath}

1529: to $T$. The hump decomposition can be done in time

1530: \begin{equation}

1531:   O(|C_i|+|C_j|+|C_{i_1}|+|C_{j_1}|).

1532:   \label{time3}

1533: \end{equation}

1534: The insertion and deletion of humps can be done in time

1535: \begin{equation}

1536:   O\left(\left(\sqrt{|C_i|}+\sqrt{|C_j|}+\sqrt{|C_{i_1}|}+\sqrt{|C_{j_1}|}\right)\times\log n\right).

1537:   \label{time4}

1538: \end{equation}

1539: By (\ref{time1}),~(\ref{time2}),~(\ref{time3}), and~(\ref{time4}), the

1540: overall time complexity of the all-pairs algorithm is

1541: \begin{displaymath}

1542: O(n)+\sum_{1\le i,j\le p}\left(O(|C_i|+|C_j|) +

1543: O\left(\sqrt{|C_i|}+\sqrt{|C_j|}\right) \times\log n +

1544: O(|C_i|+|C_j|)\times\log n\right),

1545: \end{displaymath}

1546: which is $O(np\log n)$.
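
The bound can be checked term by term. Since $\sum_{1\le i\le p}|C_i|=n$, we have
\begin{eqnarray*}
  \sum_{1\le i,j\le p}\left(|C_i|+|C_j|\right)
  &=&p\sum_{1\le i\le p}|C_i|+p\sum_{1\le j\le p}|C_j|\;=\;2np.
\end{eqnarray*}
Moreover, $\sqrt{|C_i|}\le|C_i|$ for every chain, so the term with square roots is dominated by the last term. The whole expression is therefore $O(n)+O(np)+O(np\log n)=O(np\log n)$.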

1547: %Theorem~\ref{allchb} is proved.

1548: \end{proof}

1549:

1550: \section{NP-completeness}

1551: \label{sec:2semaphores}

1552:

1553: In this section we sketch the proof for the following theorem.

1554: \begin{theorem}

1555: \label{2semaphores}

1556: The race-condition detection problem for a parallel program that uses

1557: more than one semaphore is NP-complete.

1558: \end{theorem}

1559: %the NP-complete proof for determining

1560: %whether $v\chb w$ for chain graphs of operations on more than one

1561: %semaphore.

1562: The proof is by reduction from the NP-complete

1563: uniform-cost SMMCC problem, where the node costs are restricted to

1564: $\pm1$~\cite{Garey:1979:CIG}. The reduction has three steps. Given a SMMCC

1565: problem for a uniform-cost graph $G_0$ of $n$ nodes, we construct

1566: $O(\log n)$ chain graphs with $n+2$ semaphores. The first step of the

1567: reduction shows that the SMMCC problem for $G_0$ can be reduced to

1568: determining whether each of those $O(\log n)$ chain graphs has a valid

1569: schedule.  The second step shows that each of those $O(\log n)$ chain

1570: graphs can be {\em simulated} by a chain graph with only two

1571: semaphores. In other words, the simulated chain graph has a valid

1572: schedule if and only if the simulating chain graph has a valid

1573: schedule.  The last step shows that the simulating chain graph has a

1574: valid schedule if and only if $v\chb w$, for some $v$ and $w$, in the

1575: same chain graph.  We elaborate the details of the reduction in the

1576: appendix.

1577:

1578: \section*{Acknowledgments}

1579: We thank the anonymous referees for their helpful remarks that

1580: significantly improved the presentation of the paper.

1581:

1582: %

1583: %\section*{References}

1584: %\label{section:refs}\frenchspacing\indent

1585: %.[]

1586: \bibliographystyle{abbrv}

1587: \bibliography{race}

1588:

1589: \appendix

1590: \section{Appendix}

1591: %\subsection{Definition and Notation}

1592: Let $G$ be a chain graph. Each node of $G$ is an operation on a

1593: semaphore. An operation on semaphore $S$ is either $+S$, incrementing

1594: the value of $S$ by one, or $-S$, decrementing the value of $S$ by

1595: one.  A subschedule of $G$ is {\em valid} if the value of each

1596: semaphore is always nonpositive during the execution of the

1597: subschedule.  Let $v$ and $w$ be two nodes of $G$.  If there exists a

1598: valid subschedule of $G$ in which $v$ precedes $w$, then we say $v\chb w$.

1599: Clearly, determining whether $v\chb w$ is in NP.  If $G$ is allowed to

1600: use more than one semaphore, then we prove the NP-hardness by a

1601: three-step reduction from the uniform-cost SMMCC problem.

1602:

1603: \subsection{First Step}

1604: Let $G_0$ be an acyclic directed graph of $n$ nodes, $v_1,v_2,\ldots,

1605: v_n$. The cost of each node is either $+1$ or $-1$.  Suppose we would

1606: like to know whether $\height{G_0}\leq\ell$. We construct a chain graph

1607: $G_1$ composed of $2n+2$ chains of operations on $n+2$ semaphores, and

1608: argue that $G_1$ has a valid schedule if and only if

1609: $\height{G_0}\leq\ell$.  Note that $0\leq\height{G_0}\leq n$. Therefore,

1610: $\height{G_0}$ can be obtained by $O(\log n)$ queries of whether a chain

1611: graph of $n+2$ semaphores has a valid schedule.
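
The $O(\log n)$ bound comes from binary search: the answer to ``$\height{G_0}\leq\ell$?'' is monotone in $\ell$, and $\height{G_0}$ is one of the $n+1$ values $0,1,\ldots,n$, so $\lceil\log_2(n+1)\rceil$ queries suffice. For instance, when $n=5$ one may first ask for $\ell=2$; a positive answer leaves the candidates $\setof{0,1,2}$ and a negative answer leaves $\setof{3,4,5}$, and in either case two further queries determine $\height{G_0}$, for a total of at most $\lceil\log_2 6\rceil=3$ queries.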

1612:

1613: Let $n^+$ be the number of nodes with positive costs. Let $n^-$ be the

1614: number of nodes with negative costs. Clearly, $n^+-n^-$ is the sum of

1615: node costs of $G_0$. Let $d_i$ be the number of outgoing arcs of $G_0$

1616: from $v_i$.  The $n+2$ semaphores for $G_1$ are

1617: $S_1,S_2,\ldots,S_n,S_\alpha,S_\beta$.

1618: %To distinguish the last two semaphores, we

1619: %also write $S_{\alpha}=S_{n+1}$ and $S_{\beta}=S_{n+2}$.

1620: Let the $2n+2$ chains of $G_1$ be $C_1,\ldots, C_{n+1}$, and

1621: $C'_1,\ldots,C'_{n+1}$, all initially empty.  We construct $G_1$ from

1622: $G_0$ by the procedure $\construct{}$ in

1623: Figure~\ref{fig:construct}\note{Figure~\ref{fig:construct}}, which

1624: runs in polynomial time. Without loss of generality we can assume that

1625: $\ell-n^++n^-$, the number in the second-to-last statement of the

1626: procedure \fname{Construct}, is nonnegative, since otherwise

1627: $\height{G_0}>\ell$ is immediately concluded.

1628:

1629: \begin{figure}%[p]

1630:   \begin{center}

1631:     \fbox{

1632:       \begin{minipage}{4in}

1633:         \begin{tabbing}

1634:           \quad\quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\kill

1635:           $\construct{G_0}$\\

1636:           1\>For $i:=1$ to $n$ do\\

1637:           2\>\>For $j:=1$ to $n$ do\\

1638:           3\>\>\>If $v_jv_i$ is an arc of $G_0$ then\\

1639:           4\>\>\>\>Append a $+S_j$ to $C_i$.\\

1640:           5\>\>If the cost of $v_i$ is $+1$ then\\

1641:           6\>\>\>Append a $+S_{\alpha}$ to $C_i$.\\

1642:           7\>\>else (i.e., the cost of $v_i$ is $-1$)\\

1643:           8\>\>\>Append a $-S_{\alpha}$ to $C_i$.\\

1644:           9\>\>\>Append a $+S_{\alpha}$ and $-S_{\alpha}$ to $C'_i$.\\

1645:           10\>\>Append $d_i$ copies of $-S_i$ to $C_i$.\\

1646:           11\>\>Append a $-S_{\beta}$ to $C_i$.\\

1647:           12\>Append $n$ copies of $+S_{\beta}$ to $C_{n+1}$.\\

1648:           13\>Append $\ell-n^++n^-$ copies of $+S_{\alpha}$ to $C_{n+1}$.\\

1649:           14\>Append $\ell$ copies of $-S_{\alpha}$ to $C'_{n+1}$.

1650:         \end{tabbing}

1651:       \end{minipage}

1652:       }

1653:   \end{center}

1654:   \caption{The procedure constructs a chain graph $G_1$ such that $G_1$

1655:     has a valid schedule if and only if $\height{G_0}\leq\ell$.}

1656:   \label{fig:construct}

1657: \end{figure}

1658:

1659: \begin{figure*}%[p]

1660:

1661: %  \begin{center}

1662: %    \leavevmode

1663: %    \begin{tabular}{c}

1664: %      \ovalnode{a}{$v_1:+1$}\\\\

1665: %      \ovalnode{b}{$v_2:+1$}\quad\ovalnode{c}{$v_3:+1$}\\\\

1666: %      \ovalnode{e}{$v_4:-1$}\quad\ovalnode{d}{$v_5:-1$}

1667: %    \end{tabular}

1668: %    \ncline{->}{a}{b}

1669: %    \ncline{->}{a}{c}

1670: %    \ncline{->}{b}{e}

1671: %    \ncline{->}{c}{d}

1672: %    \ncline{->}{c}{e}

1673: %    \ncline{->}{d}{e}

1674: %  \end{center}

1675: %  \vspace{0.5in}

1676:   \begin{center}

1677:     \input{fig11}\qquad

1678:     \begin{tabular}[b]{|r|r|r|r|r||r||r|r||r|}

1679:       $C_1$ &$C_2$ &$C_3$ &$C_4$ &$C_5$ &$C_6$ &$C'_4$&$C'_5$&$C'_6$\\

1680:       \hline

1681:             &      &      &$+S_2$&      &$+S_{\beta}$&$+S_{\alpha}$&$+S_{\alpha}$&$-S_{\alpha}$\\

1682:             &      &      &$+S_3$&      &$+S_{\beta}$&$-S_{\alpha}$&$-S_{\alpha}$&$-S_{\alpha}$\\

1683:             &$+S_1$&$+S_1$&$+S_5$&$+S_3$&$+S_{\beta}$&      &      &      \\

1684:             &      &      &      &      &$+S_{\beta}$&      &      &      \\

1685:       $+S_{\alpha}$&$+S_{\alpha}$&$+S_{\alpha}$&$-S_{\alpha}$&$-S_{\alpha}$&$+S_{\beta}$&      &      &      \\

1686:             &      &      &      &      &      &      &      &      \\

1687:       $-S_1$&$-S_2$&$-S_3$&      &$-S_5$&      &      &      &      \\

1688:       $-S_1$&      &$-S_3$&      &      &      &      &      &      \\

1689:             &      &      &      &      &      &      &      &      \\

1690:       $-S_{\beta}$&$-S_{\beta}$&$-S_{\beta}$&$-S_{\beta}$&$-S_{\beta}$&$+S_{\alpha}$&      &      &

1691:     \end{tabular}

1692:     \caption[]{An example for the first step of the reduction. Suppose

1693:       we would like to determine whether $\height{G_0}\leq2$, where

1694:       $G_0$ is the graph on top. We then construct, by

1695:       \fname{Construct}, the chain graph $G_1$ at bottom. Note that there

1696:       are one $+S_{\alpha}$ at the end of $C_6$ and two $-S_{\alpha}$ in

1697:       $C'_6$, according to the last two statements of \fname{Construct}. It

1698:       follows from Lemmas~\ref{lemma:np-1}(1) and~\ref{lemma:np-2} that

1699:       there exists a valid schedule of the chains at bottom if and only if

1700:       the height of the graph on top is at most two.}

1701:     \label{fig:np-complete1}

1702:   \end{center}

1703: \end{figure*}

1704:

1705:

1706: An example is shown in

1707: Figure~\ref{fig:np-complete1}\note{Figure~\ref{fig:np-complete1}}.

1708: The intuition is as follows. The (only) operation for $S_{\alpha}$ in

1709: $C_i$ corresponds to $v_i$, where the ``sign'' of $S_{\alpha}$

1710: reflects the cost of $v_i$.  We use the first $n$ semaphores,

1711: $S_1,\ldots,S_n$, to enforce the execution of these $n$ operations for

1712: $S_{\alpha}$ to obey the precedence constraints imposed by $G_0$. In

1713: Figure~\ref{fig:np-complete1}, for instance, in order to reach the

1714: $-S_{\alpha}$ in $C_4$, we have to unlock the $+S_2$ (and $+S_3$,

1715: $+S_5$) in the same chain first.  Since the only $-S_2$ is after the

1716: $+S_{\alpha}$ in $C_2$, we know the $+S_{\alpha}$ in $C_2$ must be

1717: executed before the $-S_{\alpha}$ in $C_4$.
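
For this example the height of $G_0$ can be verified by hand, taking the height of a schedule to be its maximum cumulative cost. Reading the arcs off the $+S_j$ entries of the table gives $v_1v_2$, $v_1v_3$, $v_2v_4$, $v_3v_4$, $v_3v_5$, and $v_5v_4$; the nodes $v_1,v_2,v_3$ have cost $+1$ and the nodes $v_4,v_5$ have cost $-1$. The schedule $v_1v_3v_5v_2v_4$ respects every arc, and its cumulative costs are
\[
  1,\;2,\;1,\;2,\;1,
\]
so $\height{G_0}\leq2$. On the other hand, every schedule of $G_0$ must start with $v_1$ followed by $v_2$ or $v_3$, which already reaches cumulative cost $2$. Hence $\height{G_0}=2$, and the query with $\ell=2$ should be answered positively.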

1718:

1719: The $-S_{\beta}$'s at the end of $C_1,\ldots,C_n$ are to ensure that as

1720: long as the last $+S_{\beta}$ in $C_{n+1}$ is executed, all operations

1721: in $C_1,\ldots, C_n$ are already executed. The function of those $\ell$

1722: copies of $-S_{\alpha}$ in $C'_{n+1}$ is clear: The larger $\ell$, the

1723: easier for $G_1$ to have a valid schedule.  The purpose of the

1724: $+S_{\alpha}, -S_{\alpha}$ pairs in $C'_1,\ldots, C'_n$ and those

1725: $\ell-n^++n^-$ copies of $+S_{\alpha}$ at the end of $C_{n+1}$ will

1726: become clear as we proceed.  Basically they are used to ensure that

1727: $G_1$ has some kind of ``pairwise'' schedule, as long as $G_1$ has a

1728: valid schedule. One can verify that there are the same number of

1729: $+S_i$'s and $-S_i$'s in $G_1$, for each $1\leq i\leq n+2$.
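
The count can be checked directly from \fname{Construct}, reading $S_{n+1}$ and $S_{n+2}$ as $S_{\alpha}$ and $S_{\beta}$. For $1\leq i\leq n$, statement~4 appends one $+S_i$ for each arc leaving $v_i$, that is, $d_i$ copies in total, and statement~10 appends $d_i$ copies of $-S_i$. For $S_{\beta}$, statements~11 and~12 give $n$ operations of each sign. For $S_{\alpha}$,
\begin{eqnarray*}
  \#(+S_{\alpha})&=&n^+ + n^- + (\ell-n^++n^-)\;=\;\ell+2n^-;\\
  \#(-S_{\alpha})&=&n^- + n^- + \ell\;=\;\ell+2n^-,
\end{eqnarray*}
where the three terms count the operations in $C_1,\ldots,C_n$, in $C'_1,\ldots,C'_n$, and in $C_{n+1}$ (respectively $C'_{n+1}$). In the example of Figure~\ref{fig:np-complete1}, $n^+=3$, $n^-=2$, and $\ell=2$ give six operations of each sign on $S_{\alpha}$.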

1730:

1731: For the rest of the subsection, we prove that $\height{G_0}\leq\ell$

1732: if and only if $G_1$ has a valid schedule. An implication of the

1733: following proofs is that $G_1$ has a valid schedule if and only if it

1734: has a valid schedule executable by some procedure \fname{Pairwise},

1735: which will be given in the proofs.

1736: \begin{lemma}

1737:   \label{lemma:np-1}

1738: \begin{enumerate}

1739: \item If $G_1$ has a valid subschedule containing the last $+S_{\alpha}$

1740:   of $C_{n+1}$, then $\height{G_0}\leq\ell$.

1741: \item

1742:   If $G_1$ has a valid schedule, then $\height{G_0}\leq\ell$.

1743: \end{enumerate}

1744:

1745: \end{lemma}

1746: \begin{proof}

1747:   Clearly, it suffices to prove the first statement, since the second

1748:   statement follows immediately from the first statement.

1749:

1750:   Let $X$ be a valid subschedule of $G_1$ as described in the lemma. We

1751:   show $\height{G_0}\leq\ell$.  Let $O_i$ be the operation of

1752:   $S_{\alpha}$ in $C_i$. Since $X$ is valid and contains the last

1753:   $+S_{\alpha}$ of $C_{n+1}$, $X$ must contain all the operations in

1754:   $C_1,\ldots,C_n$.  Therefore, every $O_i$, $1\leq i\leq n$, is in $X$.

1755:

1756:   Suppose the order of those $O_i$'s in $X$ is $O_{k_1},O_{k_2},\ldots,

1757:   O_{k_n}$. By the definition of \fname{Construct}, if $v_j$ is

1758:   reachable from $v_i$ in $G_0$, then $O_j$ does not precede $O_i$ in

1759:   $X$. It follows that the sequence $Y=v_{k_1}v_{k_2}\cdots v_{k_n}$ is

1760:   a schedule of $G_0$.  Therefore, it suffices to show

1761:   $\height{Y}\leq\ell$.

1762:

1763:   Assume $\height{Y}>\ell$ for a contradiction. If

1764:   we count only those $O_i$'s as the operations for $S_{\alpha}$ in $X$, then

1765:   the maximum value of $S_{\alpha}$ would be greater than $\ell$ during

1766:   the execution of $X$. Note that there are $\ell+n^-$ other

1767:   $-S_{\alpha}$'s in $C'_1,\ldots,C'_{n+1}$, which are the only hope for

1768:   bringing the maximum value of $S_{\alpha}$ down to zero.  By the

1769:   construction of $C'_1,\ldots, C'_n$, however, we know $n^-$ of those

1770:   $-S_{\alpha}$'s have to be preceded in $X$ by $n^-$ other

1771:   $+S_{\alpha}$'s. It follows that even if we count all operations for

1772:   $S_{\alpha}$ together, the maximum value of $S_{\alpha}$ would be

1773:   greater than zero during the execution of $X$.  This contradicts the

1774:   fact that $X$ is a valid subschedule of $G_1$.

1775: \end{proof}

1776:

1777:

1778: \begin{lemma}

1779:   \label{lemma:np-2}

1780:   If $\height{G_0}\leq\ell$, then $G_1$ has a valid schedule.

1781: \end{lemma}

1782: \begin{proof}

1783:   Let $Y=v_{k_1}v_{k_2}\cdots v_{k_n}$ be a schedule of $G_0$ with

1784:   $\height{Y}\leq\ell$.  Let $m_i$ be the sum of costs of

1785:   $v_{k_1},\ldots,v_{k_i}$. Clearly, $m_n=n^+-n^-$, which is the sum of

1786:   node costs of $G_0$.  Since $\height{Y}\leq\ell$, we know that

1787:   $m_i\leq\ell$ for every $1\leq i\leq n$. We claim that $G_1$ can be

1788:   executed by the procedure \fname{Pairwise} in

1789:   Figure~\ref{fig:pairwise}\note{Figure~\ref{fig:pairwise}}.

1790: \begin{figure}%[p]

1791:   \begin{center}

1792:     \fbox{

1793:     \begin{minipage}{5in}

1794:       \begin{tabbing}

1795:         \qquad\=\quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\kill

1796:         Procedure \fname{Pairwise}\\

1797:         1\>For $k:=k_1,k_2,\ldots,k_n$ do\\

1798:         2\>\>For $j:=1$ to $n$ do\\

1799:         3\>\>\>If $v_j v_k$ is an arc of $G_0$ then\\

1800:         4\>\>\>\>Execute a $-S_j$ in $C_j$.\\

1801:         5\>\>\>\>Execute the $+S_j$ in $C_k$.\\

1802:         6\>\>If $O_k = +S_{\alpha}$ then\\

1803:         7\>\>\>Execute one of the $-S_{\alpha}$'s\\

1804:         8\>\>\>\>in $C'_1,C'_2,\ldots,C'_{n+1}$.\\

1805:         9\>\>\>Execute the $+S_{\alpha}$ in $C_k$.\\

1806:         10\>\>else (i.e., $O_k = -S_{\alpha}$)\\

1807:         11\>\>\>Execute the $-S_{\alpha}$ in $C_k$.\\

1808:         12\>\>\>Execute the $+S_{\alpha}$ in $C'_k$.\\

1809:         13\>For $i:=1$ to $n$ do\\

1810:         14\>\>Execute the $-S_{\beta}$ in $C_i$.\\

1811:         15\>\>Execute a $+S_{\beta}$ in $C_{n+1}$.\\

1812:         16\>For $i:=1$ to $\ell-m_n$ do\\

1813:         17\>\>Execute a $-S_{\alpha}$ in $C'_1,\ldots,C'_{n+1}$.\\

1814:         18\>\>Execute a $+S_{\alpha}$ in $C_{n+1}$.

1815:       \end{tabbing}

1816:     \end{minipage}

1817:     }

1818:   \end{center}

1819:   \caption{Procedure \fname{Pairwise}.}

1820:   \label{fig:pairwise}

1821: \end{figure}

1822:

1823: Note that in the schedule of $G_1$ executed by \fname{Pairwise}, each

1824: operation $-S_i$ is immediately followed by an operation $+S_i$.  Not

1825: every chain graph has such a ``pairwise'' schedule; however, we show

1826: that $G_1$ does.  We first show that the first for-loop of

1827: \fname{Pairwise} can be finished for $G_1$. Specifically, suppose the

1828: following claim holds:

1829: \begin{quote}

1830: \label{lemma:pairwise-1} {\bf Claim}

1831:   For each $1\leq i\leq n$, the $i$-th iteration of the first for-loop

1832:   of \fname{Pairwise} can be executed for $G_1$. Furthermore, after

1833:   executing the $i$-th iteration,

1834:   \begin{itemize}

1835:   \item the remaining operations in $C_{k_i}$ are $d_{k_i}$ copies of

1836:     $-S_{k_i}$ followed by a $-S_{\beta}$; and

1837:   \item there are $\ell-m_i$ copies of $-S_{\alpha}$'s available in

1838:     $C'_1,\ldots,C'_{n+1}$.

1839:   \end{itemize}

1840: \end{quote}

1841: It is then not hard to see that after the execution of the first

1842: for-loop of \fname{Pairwise}, the remaining operation in each $C_i$ is a

1843: $-S_{\beta}$. Therefore, the second for-loop of \fname{Pairwise} can be

1844: finished, since there are $n$ copies of $+S_{\beta}$'s available in

1845: $C_{n+1}$.

1846:

1847: By the above claim, we know that after executing the first

1848: for-loop, the number of available $-S_{\alpha}$'s in $C'_1,\ldots,C'_{n+1}$ is

1849: $\ell-m_n$, which is equal to the number of $+S_{\alpha}$'s at the end

1850: of $C_{n+1}$. Therefore, the last for-loop of \fname{Pairwise} can be

1851: finished. The lemma is proved.
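
To illustrate the second item of the claim, take the example of Figure~\ref{fig:np-complete1} with $\ell=2$ and the schedule $Y=v_1v_3v_5v_2v_4$, which respects the arcs of $G_0$ and has height $2$. Initially the only available $-S_{\alpha}$'s in $C'_1,\ldots,C'_{n+1}$ are the two copies in $C'_6$. The table below tracks, for each iteration of the first for-loop, the number of $-S_{\alpha}$'s available afterwards.
\begin{center}
\begin{tabular}{|c|c|c|c|c|}
\hline
$i$ & $k_i$ & $O_{k_i}$ & $m_i$ & available $-S_{\alpha}$'s\\
\hline
1 & 1 & $+S_{\alpha}$ & 1 & 1\\
2 & 3 & $+S_{\alpha}$ & 2 & 0\\
3 & 5 & $-S_{\alpha}$ & 1 & 1\\
4 & 2 & $+S_{\alpha}$ & 2 & 0\\
5 & 4 & $-S_{\alpha}$ & 1 & 1\\
\hline
\end{tabular}
\end{center}
In every row the last column equals $\ell-m_i$. Each $+S_{\alpha}$ in some $C_k$ consumes one available $-S_{\alpha}$, and each $-S_{\alpha}$ in some $C_k$ is followed by the $+S_{\alpha}$ of $C'_k$, which makes the $-S_{\alpha}$ of $C'_k$ available. After the fifth iteration one $-S_{\alpha}$ is left, which matches the $\ell-n^++n^-=1$ copy of $+S_{\alpha}$ at the end of $C_6$.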

1852:

1853: It remains to prove the above claim by induction on $i$. For

1854: convenience we abbreviate $k_i$ to $k$ for the rest of the proof.

1855: When $i=1$, we know $v_k$ does not have any incoming arcs from other

1856: nodes. Therefore, the for-loop with index $j$ in the first iteration does

1857: not execute any operation. We then consider the if-statement.

1858: \begin{itemize}

1859: \item If $O_k=-S_{\alpha}$, then $\cost{v_k}=-1$, and thus $m_1=-1$.

1860:   There is a $+S_{\alpha}$ in $C'_k$ by the definition of

1861:   \fname{Construct}. We can execute the else-part of the if-statement

1862:   without problem. Since the second operation in $C'_k$ is a

1863:   $-S_{\alpha}$, these two steps increase the number of $-S_{\alpha}$'s

1864:   available in $C'_1,\ldots,C'_{n+1}$ by one.

1865: \item If $O_k=+S_{\alpha}$, then $\cost{v_k}=1$, and thus $m_1=1$. Since

1866:   $v_k$ is the first node in $Y$, $\height{Y}$ is at least one, and thus

1867:   $\ell\geq 1$. We can therefore execute the then-part of the

1868:   if-statement without problem.  The number of $-S_{\alpha}$'s available

1869:   in $C'_1,\ldots,C'_{n+1}$ is decreased by one.

1870: \end{itemize}

1871: Clearly, after executing the first iteration, in which the only executed

1872: operation in $C_k$ is $O_k$, the remaining operations in $C_k$ are

1873: exactly as described in the claim.  Note that before executing the

1874: first iteration, the number of available $-S_{\alpha}$'s is $\ell$ by

1875: the definition of \fname{Construct}.  Therefore, after executing the

1876: first iteration, the number of available $-S_{\alpha}$'s is exactly

1877: $\ell-m_1$. This confirms the inductive basis.

1878:

1879: Let $i'$ be an integer with $1<i'\leq n$. Assume that the claim holds

1880: for every $1\leq i<i'$. We show it holds for $i=i'$.  Consider the

1881: $i$-th iteration. Note that for every $j$ such that $v_jv_k$ is an arc

1882: of $G_0$, $O_j$ must have been executed.  By the inductive hypothesis

1883: we know those $d_j$ copies of $-S_j$'s are already available before

1884: executing the $i$-th iteration. Therefore, the for-loop with index $j$

1885: will proceed without problem, since there are exactly $d_j$ copies of

1886: $+S_j$'s in $G_1$ by the definition of \fname{Construct}.  We then

1887: consider the if-statement.

1888: \begin{itemize}

1889: \item If $O_k=-S_{\alpha}$, then $m_i=m_{i-1}-1$. We know there is a

1890: $+S_{\alpha}$ in $C'_k$.  Thus, the else-part can proceed without

1891: problem. Since the second operation in $C'_k$ is a $-S_{\alpha}$,

1892: these two steps increase the number of available $-S_{\alpha}$'s in

1893: $C'_1,\ldots,C'_{n+1}$ by one.

1894: \item If $O_k=+S_{\alpha}$, then $m_i=m_{i-1}+1$. The inductive hypothesis

1895: says that the number of $-S_{\alpha}$'s available in

1896: $C'_1,\ldots,C'_{n+1}$ is $\ell-m_{i-1}$ before executing the $i$-th

1897: iteration.  That number is at least one since

1898: $\ell-m_{i-1}-1=\ell-m_i\geq 0$. Therefore, the then-part of the

1899: if-statement can be executed without problem. The number of available

1900: $-S_{\alpha}$'s in $C'_1,\ldots,C'_{n+1}$ is decreased by one.

1901: \end{itemize}

1902: Therefore, the $i$-th iteration can be executed, and thus the remaining

1903: operations in $C_k$ are as required.

1904:

1905: It follows from the inductive hypothesis that the number of available

1906: $-S_{\alpha}$'s in $C'_1,\ldots,C'_{n+1}$ is $\ell-m_{i-1}$ before the $i$-th iteration.  By the

1907: above case analysis we see that the number is exactly $\ell-m_i$ after

1908: executing the $i$-th iteration. The claim is proved.
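
The bookkeeping in the case analysis can be condensed into a single
recurrence. As a sketch, write $a_i$ (our notation, introduced only for
this remark) for the number of $-S_{\alpha}$'s available in
$C'_1,\ldots,C'_{n+1}$ after the $i$-th iteration, and adopt the
convention $m_0=0$, which is an assumption consistent with
$m_1=\cost{v_{k_1}}$; then $a_0=\ell$ by the definition of
\fname{Construct}. Reading the two cases of the if-statement as
$m_i=m_{i-1}+\cost{v_{k_i}}$ with $\cost{v_{k_i}}\in\setof{-1,+1}$, we get
\begin{displaymath}
  a_i \;=\; a_{i-1}-\cost{v_{k_i}}
      \;=\; (\ell-m_{i-1})-(m_i-m_{i-1})
      \;=\; \ell-m_i .
\end{displaymath}
In the case $O_{k_i}=+S_{\alpha}$, the then-part needs $a_{i-1}\geq 1$,
which is the same condition as $m_i\leq\ell$.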

1909: \end{proof}

1910:

1911: If $G_1$ has a valid schedule, then by Lemma~\ref{lemma:np-1}(2) we know

1912: $\height{G_0}\leq\ell$. It then follows from the proof of

1913: Lemma~\ref{lemma:np-2} that $G_1$ has a valid schedule executable by

1914: \fname{Pairwise}.  Therefore, we have the following lemma.

1915:

1916: \begin{lemma}

1917: \label{lemma:pairwise-2}

1918: $G_1$ has a valid schedule if and only if $G_1$ has a valid schedule

1919: executable by \fname{Pairwise}.

1920: \end{lemma}

1921:

1922: \subsection{Second Step}

1923: \label{subsec:second-step}

1924: In this subsection we show that the $G_1$ constructed in the first step

1925: can be simulated by another chain graph $G_2$, which uses only two

1926: semaphores, $T_1$ and $T_2$. $G_2$ has $2n+3$ chains. The first chain,

1927: denoted $C_0$, is composed of two $-T_1$'s and two $-T_2$'s.  The

1928: remaining $2n+2$ chains are obtained from those of $G_1$ as follows.

1929: We replace every operation $-S_i$ (and $+S_i$) by a {\em unit} $-U_i$

1930: (and $+U_i$) for each $1\leq i\leq n+2$.  Each unit, $-U_i$ or $+U_i$,

1931: is a sequence of operations on $T_1$ and $T_2$, as shown in

1932: Figure~\ref{fig:two-semaphores}\note{Figure~\ref{fig:two-semaphores}}.

1933: We also denote those $2n+2$ chains of

1934: $G_2$ by $C_1,\ldots,C_{n+1}$ and $C'_1,\ldots,C'_{n+1}$. Clearly, $G_2$

1935: can be constructed in polynomial time.

1936: \begin{figure}%[p]

1937:   \begin{center}

1938:     \leavevmode

1939:     \begin{displaymath}

1940:       \begin{array}[t]{l}

1941:         +T_1\\+T_2\\

1942:         \left.\!\!\!

1943:         \begin{array}[r]{c}

1944:           -T_1\\+T_2\\

1945:           \vdots\\

1946:           -T_1\\+T_2

1947:         \end{array}

1948:         \right\}\mbox{$i+1$ pairs}\\

1949:         -T_2\\-T_2\\-T_2\\-T_2

1950:       \end{array}\qquad

1951:       \begin{array}[t]{l}

1952:         +T_1\\+T_2\\

1953:         \left.\!\!\!

1954:         \begin{array}[r]{c}

1955:           +T_1\\-T_2\\

1956:           \vdots\\

1957:           +T_1\\-T_2

1958:         \end{array}

1959:         \right\}\mbox{$i+1$ pairs}\\

1960:         +T_2\\+T_2\\-T_1\\-T_1

1961:       \end{array}

1962:     \end{displaymath}

1963:     \caption{The sequence of operations for a $-U_i$ is at left and that

1964:       for a $+U_i$ is at right, for any $1\leq i\leq n+2$.}

1965:     \label{fig:two-semaphores}

1966:   \end{center}

1967: \end{figure}

1968:

1969: Note that the sequence of operations in each unit is arranged such that

1970: only a $-U_i$ and a $+U_i$ can ``unlock'' each other. To be more

1971: specific, suppose each of $T_1$ and $T_2$ has initial value $-2$, which

1972: will be the case if the four operations in $C_0$ are executed.  Consider

1973: a graph $U_{ij}$ for some $1\leq i,j\leq n+2$ composed of two units,

1974: $-U_i$ and $+U_j$, each forming a single chain. One can easily verify that

1975: $U_{ij}$ has a valid schedule if $i=j$.

1976: %(In fact it also holds for the

1977: %other direction. We do not emphasize it, however, because it is not that

1978: %relevant to our proof.)

1979: Moreover, after executing all the operations of $U_{ii}$, the values of

1980: $T_1$ and $T_2$ go back to $-2$.
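
As a sanity check, the net effect of each unit can be read off
Figure~\ref{fig:two-semaphores} by counting signed operations:
\begin{displaymath}
  \begin{array}{l|c|c}
    \mbox{unit} & \mbox{net change of } T_1 & \mbox{net change of } T_2\\
    \hline
    -U_i & 1-(i+1) = -i & 1+(i+1)-4 = i-2\\
    +U_i & 1+(i+1)-2 = i & 1-(i+1)+2 = 2-i
  \end{array}
\end{displaymath}
The two rows cancel, which is why both semaphores return to $-2$ after
all operations of $U_{ii}$ are executed. The following trace is one
possible schedule of $U_{11}$; it is only an illustration and assumes
that a $+T$ operation is executable exactly when the value of its
semaphore is at most $-1$ (so that no value ever exceeds $0$). Each
entry lists the unit, the operation, and the resulting pair $(T_1,T_2)$,
starting from $(-2,-2)$.
\begin{displaymath}
  \begin{array}{rlll@{\qquad}rlll}
    1 & -U_1 & +T_1 & (-1,-2) & 9 & -U_1 & -T_1 & (-1,0)\\
    2 & -U_1 & +T_2 & (-1,-1) & 10 & +U_1 & +T_1 & (0,0)\\
    3 & -U_1 & -T_1 & (-2,-1) & 11 & +U_1 & -T_2 & (0,-1)\\
    4 & +U_1 & +T_1 & (-1,-1) & 12 & -U_1 & +T_2 & (0,0)\\
    5 & +U_1 & +T_2 & (-1,0) & 13\mbox{--}16 & -U_1 & 4\times(-T_2) & (0,-4)\\
    6 & +U_1 & +T_1 & (0,0) & 17\mbox{--}18 & +U_1 & 2\times(+T_2) & (0,-2)\\
    7 & +U_1 & -T_2 & (0,-1) & 19\mbox{--}20 & +U_1 & 2\times(-T_1) & (-2,-2)\\
    8 & -U_1 & +T_2 & (0,0) & & & &
  \end{array}
\end{displaymath}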

1981:

1982: We claim that $G_1$ has a valid schedule if and only if $G_2$ has a

1983: valid schedule. The only-if part is straightforward.  Suppose $G_1$

1984: has a valid schedule.  By Lemma~\ref{lemma:pairwise-2}, $G_1$

1985: has a valid schedule executable by \fname{Pairwise}.  Note that we can

1986: execute the four operations of $C_0$ first, which decreases the values

1987: of both semaphores to $-2$. Clearly, the remaining $2n+2$ chains of

1988: units can be executed completely, pair by pair, by following the sequence

1989: of corresponding operations in $G_1$ executed by \fname{Pairwise}.

1990: Therefore, $G_2$ has a valid schedule.

1991:

1992: It takes some added work to prove the other direction of the above

1993: claim.  A unit is {\em active} if its third operation is executed.  A

1994: unit is {\em finished} (and thus inactive) if its fifth-to-last

1995: operation is executed.  Suppose $G_2$ has a valid schedule. Consider

1996: the sequence of the units of $G_2$ that become active in the valid

1997: schedule.  It follows from the following lemma that the corresponding

1998: sequence of operations of $G_1$ is a valid schedule of $G_1$. In fact

1999: it is ``pairwise'', since in the schedule each $-S_i$ is immediately

2000: followed by a $+S_i$.

2001:

2002: \begin{lemma}

2003: \label{lemma:two-semaphores}

2004: Consider the execution of a valid subschedule.

2005: \begin{enumerate}

2006: \item When there is no

2007: active unit, the next unit that becomes active must be a $-U_i$ for

2008: some $1\leq i\leq n+2$.

2009: \item Before that active $-U_i$ is finished, a

2010: $+U_i$ must become active.

2011: \item No other unit will become active until these

2012: two active units are finished.

2013: \end{enumerate}

2014: \end{lemma}

2015:

2016: \begin{proof}

2017: At the beginning of the valid schedule, no unit is active. We show the

2018: first statement of the lemma holds.  At this moment there are two

2019: $-T_1$'s and two $-T_2$'s available (in $C_0$). They are our only hope

2020: for activating any unit, since each unit is guarded by two $+T_1$'s

2021: and two $+T_2$'s.  Assume for a contradiction that the first unit

2022: becoming active is a $+U_i$ for some $1\leq i\leq n+2$.  Note that as

2023: soon as the first $+U_i$ becomes active, at least two $+T_1$'s are

2024: already executed. Since at most two $-T_1$'s are executed so far,

2025: there is no way to activate any other unit. The execution thus cannot

2026: proceed.

2027:

2028: When the first unit $-U_i$ becomes active, one can see that the second

2029: statement of the lemma holds by verifying the following.

2030: \begin{itemize}

2031: \item The active $-U_i$ will not be finished unless another unit

2032: becomes active, since otherwise the execution will be blocked by

2033: some $+T_2$'s.

2034: \item The next active unit must be a $+U_j$ for some $1\leq j\leq

2035: n+2$, since otherwise the execution will be blocked by some

2036: $+T_2$'s.

2037: \item If $i<j$, the execution will be blocked by some $+T_1$'s. If

2038: $i>j$, then the execution will be blocked by some $+T_2$'s. Therefore, the

2039: next active unit must be a $+U_i$.

2040: \end{itemize}
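
The third item can be checked by a counting argument on the two-unit
graph $U_{ij}$ taken in isolation. This is only a sketch under stated
assumptions: both semaphores start at $-2$, a $+T$ operation is blocked
whenever it would raise its semaphore above $0$, and operations
contributed by other chains of $G_2$ are ignored. Before $+U_j$ reaches
its two trailing $-T_1$'s, it must execute $1+(j+1)$ operations $+T_1$;
since $j\geq 1$, this requires some $-T_1$ of $-U_i$, all of which come
after the leading $+T_1$ of $-U_i$. Symmetrically, before $-U_i$ reaches
its trailing $-T_2$'s, it must execute $1+(i+1)$ operations $+T_2$,
which requires some $-T_2$ of $+U_j$ and hence the leading $+T_2$ of
$+U_j$. Comparing the increments with the available decrements gives
\begin{displaymath}
  \begin{array}{lcl}
    \underbrace{1}_{-U_i}+\underbrace{1+(j+1)}_{+U_j} \;\leq\; 2+(i+1)
      & \Longrightarrow & j\leq i \qquad (T_1),\\[2ex]
    \underbrace{1+(i+1)}_{-U_i}+\underbrace{1}_{+U_j} \;\leq\; 2+(j+1)
      & \Longrightarrow & i\leq j \qquad (T_2).
  \end{array}
\end{displaymath}
The first inequality fails when $i<j$ (blocked on $T_1$) and the second
fails when $i>j$ (blocked on $T_2$), leaving $i=j$ as the only
possibility.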

2041:

2042: When those two units are active, in order to activate other units, we

2043: can only hope for the $-T_1$'s at the end of the active $+U_i$. In

2044: order to reach those $-T_1$'s, the preceding consecutive $+T_2$'s must

2045: be penetrated.  Hence, at least two $-T_2$'s at the end of the active

2046: $-U_i$ must be executed first. Therefore, those two active units $-U_i$

2047: and $+U_i$ must be finished before any other unit becomes active.

2048: This confirms the third statement of the lemma.

2049:

2050: Note that as soon as the active $+U_i$ is finished (and so must be the

2051: active $-U_i$), the situation is exactly the same as the situation at

2052: the very beginning of the execution. Namely we have two $-T_1$'s and

2053: two $-T_2$'s available, which are again our only hope for activating

2054: any other unit.  Therefore, the above argument applies

2055: inductively.  The lemma is proved.

2056: \end{proof}

2057:

2058: \subsection{Third Step}

2059: Let $v$ be the first operation of $C_0$ in $G_2$. Let $w$ be the

2060: last operation of $C_{n+1}$ in $G_2$.  We claim that $v\chb w$ if

2061: and only if $G_2$ has a valid schedule.  Note that $v$ is always the

2062: first node in any valid subschedule of $G_2$. The if-part of the claim

2063: holds trivially. It remains to prove the only-if-part of the claim.

2064:

2065: Let $X$ be a valid subschedule of $G_2$ in which $v$ precedes $w$.

2066: Consider the sequence of the units of $G_2$ that become active while

2067: executing $X$.  It follows from Lemma~\ref{lemma:two-semaphores} that

2068: the corresponding sequence of operations of $G_1$ is a valid subschedule

2069: of $G_1$, which necessarily contains the last $+S_{\alpha}$ of

2070: $C_{n+1}$ in $G_1$. Therefore, $G_1$ has a valid schedule by

2071: Lemmas~\ref{lemma:np-1}(2) and~\ref{lemma:np-2}.  Finally it follows

2072: from the claim in \S\ref{subsec:second-step} that $G_2$ has a

2073: valid schedule.

2074:

2075: \end{document}

2076: