1: \documentclass[matprg]{mcsreport}

2:

3: %

4: %

5: %

6: \usepackage{subeqn}

7: %

8: \usepackage{extra}

9: %

10: \usepackage{rotating}

11: \usepackage{epsfig}

12:

13: %

14: %

15: %

16: %

17: \def\labtag#1{\label{#1}}

18:

19: \newcommand{\epstol}{\epsilon_{\rm tol}}

20:

21: \begin{document}

22: \author{Jeff Linderoth \and Stephen Wright}

23:

24: \title{Decomposition Algorithms for Stochastic

25:   Programming on a Computational Grid}

26:

27: \titlerunning{Stochastic Programming on a Computational Grid}

28:

29: \institute{Jeff Linderoth\at

30: Axioma Inc., 501-F Johnson Ferry Road, Suite 450,

31: Marietta, GA 30068;

32: {\tt jlinderoth@axiomainc.com}

33: \and

34: Stephen Wright\at

35: Mathematics and Computer Science Division,

36: Argonne National Laboratory, 9700 South Cass Avenue,

37: Argonne, IL 60439; {\tt wright@mcs.anl.gov}}

38: %

39: %

40: \date{\today}

41: %

42: \subclass{90C15, 65K05, 68W10}

43: \reportnumber{P875--0401, April, 2001}

44: \maketitle

45:

46: \begin{abstract}

47:   We describe algorithms for two-stage stochastic linear programming

48:   with recourse and their implementation on a grid computing platform.

49:   In particular, we examine serial and asynchronous versions of the

50:   L-shaped method and a trust-region method. The parallel platform of

51:   choice is the dynamic, heterogeneous, opportunistic platform

52:   provided by the Condor system. The algorithms are of master-worker

53:   type (with the workers being used to solve second-stage problems),

54:   and the MW runtime support library (which supports master-worker

55:   computations) is key to the implementation.  Computational results

56:   are presented on large sample average approximations of problems

57:   from the literature.

58: \end{abstract}

59:

60: \section{Introduction} \labtag{introduction}

61:

62: Consider the following stochastic optimization problem:

63: \beq \labtag{gen.sp}

64: \min_{x \in S} \, F(x) \defeq \sum_{i=1}^N p_i f(x,\omega_i),

65: \eeq

66: %

67: where $S \subseteq \R^n$ is a constraint set, $\Omega = \{ \omega_1,

68: \omega_2, \dots, \omega_N \}$ is the set of outcomes (consisting of

69: $N$ distinct scenarios), and $p_i$ is the probability associated with

70: each scenario. Problems of the form \eqnok{gen.sp} can arise directly

71: (in many applications, the number of scenarios is naturally finite),

72: or as discretizations of problems over continuous probability spaces,

73: obtained by approximation or sampling. In this paper, we discuss the

74: {\em two-stage stochastic linear programming problem with fixed

75:   recourse}, which is a special case of \eqnok{gen.sp} defined as follows:

76: \begin{subequations} \labtag{2stage.lp}

77: \beqa

78: \labtag{2stage.lp.obj}

79: & \min \, c^T x + \sum_{i=1}^N p_i q(\omega_i)^T y(\omega_i), \sgap

80: \mbox{subject to} \\

81: \labtag{2stage.lp.x}

82: & Ax=b, \;\; x \ge 0, \\

83: \labtag{2stage.lp.y}

84: & W y(\omega_i) = h(\omega_i) - T(\omega_i) x, \;\;

85: y(\omega_i) \ge 0,

86: \sgap i=1,2,\dots,N.

87: \eeqa

88: \end{subequations}

89: The unknowns in this formulation are $x$ and $y(\omega_1),

90: y(\omega_2), \dots, y(\omega_N)$, where $x$ contains the ``first-stage

91: variables'' and each $y(\omega_i)$ contains the ``second-stage

92: variables'' associated with the $i$th scenario. The $i$th scenario is

93: characterized by the probability $p_i$ and the data objects

94: $(q(\omega_i), T(\omega_i), h(\omega_i))$.

95:

96: The formulation \eqnok{2stage.lp} is sometimes known as the

97: ``deterministic equivalent'' because it lists the unknowns for all

98: scenarios explicitly and poses the problem as a (potentially very

99: large) structured linear program. An alternative formulation is

100: obtained by recognizing that each term in the second-stage summation

101: in \eqnok{2stage.lp.obj} is  a piecewise linear convex function

102: of $x$. Defining the $i$th second-stage problem as a linear program (LP)

103: parametrized by the first-stage variables $x$, that is,

104: \begin{subequations}

105: \labtag{second-stage-lp}

106: \beqa

107: \labtag{second-stage-lp.1}

108: & \cQ_i(x) \defeq \min_{y(\omega_i)} \,  q(\omega_i)^T y(\omega_i) \;\;

109: \mbox{subject to} \\

110: \labtag{second-stage-lp.2}

111: &  W y(\omega_i) = h(\omega_i) - T(\omega_i) x,

112: \;\;  y(\omega_i) \ge 0,

113: \eeqa

114: \end{subequations}

115: and defining the objective in \eqnok{2stage.lp.obj} as

116: \beq \labtag{def.Q}

117: \cQ(x) \defeq c^Tx + \sum_{i=1}^N p_i \cQ_i(x),

118: \eeq

119: we can restate \eqnok{2stage.lp} as

120: \beq \labtag{2stage.pl}

121: \min_x \, \cQ(x), \;\; \mbox{subject to} \; Ax=b, \; x \ge 0.

122: \eeq

128:

129: We note several features about the problem \eqnok{2stage.pl}.  First, it

130: is clear from \eqnok{def.Q} and \eqnok{second-stage-lp} that $\cQ(x)$

131: can be evaluated for a given $x$ by solving the $N$ linear programs

132: \eqnok{second-stage-lp} separately. Second, we can derive subgradient

133: information for $\cQ_i(x)$ by considering dual solutions of

134: \eqnok{second-stage-lp}. If we fix $x=\hat{x}$ in

135: \eqnok{second-stage-lp}, the primal solution $y(\omega_i)$ and dual

136: solution $\pi(\omega_i)$ satisfy the following optimality conditions:

137: \beqas

138: q(\omega_i) - W^T \pi(\omega_i) \ge 0 & \perp &  y(\omega_i) \ge 0, \\

139: W y(\omega_i) & = & h(\omega_i) - T(\omega_i) \hat{x}.

140: \eeqas

141: From these two conditions we obtain that

142: \beq \labtag{theta.2}

143: \cQ_i(\hat{x}) = q(\omega_i)^T y(\omega_i) =

144: \pi(\omega_i)^T W y(\omega_i) =

145: \pi(\omega_i)^T [ h(\omega_i) - T(\omega_i) \hat{x} ].

146: \eeq

147: Moreover, since $\cQ_i$ is piecewise linear and convex, we have for

148: any $x$ that

149: \beq \labtag{subg.property}

150: \cQ_i(x) - \cQ_i(\hat{x}) \ge

151: \pi(\omega_i)^T [ -T(\omega_i) x  + T(\omega_i) \hat{x} ] =

152: \left( - T(\omega_i)^T \pi(\omega_i) \right)^T (x-\hat{x}),

153: \eeq

154: which implies that

155: \beq \labtag{subg.Qi}

156: -T(\omega_i)^T \pi(\omega_i) \in \partial \cQ_i(\hat{x}),

157: \eeq

158: where $\partial \cQ_i(\hat{x})$ denotes the subdifferential of $\cQ_i$ at

159: $\hat{x}$. By Rockafellar~\cite[Theorem~23.8]{Roc70}, using

160: polyhedrality of each $\cQ_i$, we have from \eqnok{def.Q} that

161: \beq \labtag{subg.Q}

162: \partial \cQ(\hat{x}) = c + \sum_{i=1}^N p_i \partial \cQ_i(\hat{x}),

163: \eeq

164: for every $\hat{x}$ that lies in the domain of each $\cQ_i$,

165: $i=1,2,\dots,N$.
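The inequality in \eqnok{subg.property} rests on weak duality; the following sketch spells out that step and checks \eqnok{subg.Qi} on a small instance. The scalar data used in the instance ($W=[1 \;\; {-1}]$, $q=(1,1)^T$, $T=1$, and a scalar $h$) are hypothetical and chosen purely for illustration. Since the dual feasible set $\{ \pi \, | \, W^T \pi \le q(\omega_i) \}$ does not depend on $x$, the vector $\pi(\omega_i)$ that is dual optimal at $\hat{x}$ remains dual feasible at any other $x$, so that
\[
\cQ_i(x) \ge \pi(\omega_i)^T [ h(\omega_i) - T(\omega_i) x ].
\]
Subtracting \eqnok{theta.2} from this inequality gives \eqnok{subg.property}. For the illustrative data, the second-stage problem and its dual are
\[
\cQ_i(x) = \min_{y_1, y_2 \ge 0} \, \{ y_1 + y_2 \, | \, y_1 - y_2 = h - x \}
= \max_{-1 \le \pi \le 1} \, \pi (h-x) = |h-x|.
\]
At a point $\hat{x} < h$ the dual solution is $\pi = 1$, and \eqnok{subg.Qi} yields the subgradient $-T^T \pi = -1$, which is indeed the slope of $|h-x|$ for $x<h$. At $\hat{x}=h$ every $\pi \in [-1,1]$ is dual optimal, and the formula recovers the whole subdifferential $[-1,1]$.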

166:

167: Let $\cS$ denote the solution set for \eqnok{2stage.pl}; we assume for

168: most of the paper that $\cS$ is nonempty. Since \eqnok{2stage.pl} is a

169: convex program, $\cS$ is closed and convex, and the projection

170: operator $P(\cdot)$ onto $\cS$ is well defined.  Because the objective

171: function in \eqnok{2stage.pl} is piecewise linear and the constraints

172: are linear, the problem has a {\em weak sharp minimum} (Burke and

173: Ferris~\cite{BurF93}); that is, there exists $\hat{\epsilon}>0$ such

174: that

175: %

176: \beq \labtag{weak.sharp}

177: \cQ(x) - \cQ^* \ge \hat{\epsilon} \| x- P(x) \|_{\infty},

178: \;\; \mbox{for all $x$ with  $Ax=b$, $x \ge 0$,}

179: \eeq

180: where $\cQ^*$ is the optimal value of the objective.

181:

182: The subgradient information can be used by algorithms in different

183: ways. Successive estimates of the optimal $x$ can be obtained by

184: minimizing over a convex underestimate of $\cQ(x)$ constructed from

185: subgradients obtained at earlier iterations,

186: as in the L-shaped method described in

187: Section~\ref{sec:lshaped}. This method can be stabilized by the use of

188: a quadratic regularization term (Ruszczy{\'n}ski~\cite{Rus86},

189: Kiwiel~\cite{Kiw90}) or by the explicit use of a trust region, as in

190: the $\ell_{\infty}$ trust-region approach described in

191: Section~\ref{sec:tr}.  Alternatively, when an upper bound on the

192: optimal value $\cQ^*$ is available, one can derive each new iterate

193: from an approximate analytic center of an approximate epigraph. The

194: latter approach has been explored by Bahn et al.~\cite{BahDGV95} and

195: applied to a large stochastic programming problem by Fragni{\`e}re,

196: Gondzio, and Vial~\cite{FraGV00}.

197:

198: Because evaluation of $\cQ_i(x)$ and elements of its subdifferential can be

199: carried out independently for each $i=1,2,\dots,N$, and because such

200: evaluations usually constitute the bulk of the computational workload,

201: implementation on parallel computers is possible.  We can partition

202: second-stage scenarios $i=1,2,\dots,N$ into ``chunks'' and define a

203: computational task to be the solution of all the LPs

204: \eqnok{second-stage-lp} in a single chunk. Each such task could be

205: assigned to an available worker processor. Relationships between the

206: solutions of \eqnok{second-stage-lp} for different scenarios can be

207: exploited within each chunk (see Birge and

208: Louveaux~\cite[Section~5.4]{BirL97}).  The number of second-stage LPs

209: in each chunk should be chosen to ensure that the computation does

210: not become communication bound. That is, each chunk should be large

211: enough that its processing time significantly exceeds the time

212: required to send the data to the worker processor and to return the

213: results.

214:

233:

234: In this paper, we describe implementations of decomposition algorithms

235: for stochastic programming on a dynamic, heterogeneous computational

236: grid made up of workstations, PCs (from clusters), and supercomputer

237: nodes.  Specifically, we use the environment provided by the Condor

238: system~\cite{condor}.  We also discuss the MW runtime library (Goux et

239: al.~\cite{GouLY00,GouKLY00}), a software layer that significantly

240: simplifies the process of implementing parallel algorithms in Condor.

241:

290:

291: For the dimensions of problems and parallel platforms considered in

292: this paper, evaluation of the functions $\cQ_i(x)$ and their

293: subgradients at a single $x$ is often insufficient to make

294: effective use of the available processors. Moreover, ``synchronous''

295: algorithms---those that depend for efficiency on all tasks completing

296: in a timely fashion---run the risk of poor performance in an

297: environment such as ours, in which failure or suspension of worker

298: processors while they are processing a task is not an infrequent

299: event.  We are led therefore to ``asynchronous'' approaches that

300: consider different points $x$ simultaneously.  Asynchronous variants

301: of the L-shaped and $\ell_{\infty}$ trust-region methods are described

302: in Sections~\ref{sec:lshaped:async} and \ref{sec:atr}, respectively.

303:

310:

311: Other parallel algorithms for stochastic programming have been devised

312: by Birge et al.~\cite{BirDHS98}, Birge and Qi~\cite{BirQ88}, and

313: Fragni{\`e}re, Gondzio, and Vial~\cite{FraGV00}.  In \cite{BirDHS98}, the

314: focus is on multistage problems in which the scenario tree is

315: decomposed into subtrees, which are processed independently and in

316: parallel on worker processors. Dual solutions from each subtree are

317: used to construct a model of the first-stage objective (using an

318: L-shaped approach like that described in Section~\ref{sec:lshaped}),

319: which is periodically solved by a master process to obtain a new

320: candidate first-stage solution $x$.  Parallelization of the linear

321: algebra operations in interior-point algorithms is considered in

322: \cite{BirQ88}, but this approach involves significant data movement

323: and does not scale particularly well.  In \cite{FraGV00}, the

324: second-stage problems \eqnok{second-stage-lp} are solved concurrently

325: and inexactly by using an interior-point code. The master process

326: maintains an upper bound on the optimal objective, and this bound

327: along with the subgradients obtained from the second-stage problems

328: yields a polygon whose (approximate) analytic center is calculated

329: periodically to obtain a new candidate $x$. The approach is based in

330: part on an algorithm described by Gondzio and Vial~\cite{GonV00}. The

331: numerical results in \cite{FraGV00} report solution of a two-stage

332: stochastic linear program with $2.6$ million variables and $1.2$

333: million constraints in three hours on a cluster of 10 Linux PCs.

334:

335:

336: \section{L-Shaped Methods} \labtag{sec:lshaped}

337:

338: We now describe the L-shaped method, a fundamental algorithm for

339: solving \eqnok{2stage.pl}, and an asynchronous variant.

340:

341: \subsection{The Multicut L-Shaped Method} \labtag{sec:lshaped:multicut}

342:

343: The L-shaped method of Van Slyke and Wets~\cite{VanW69} for solving

344: \eqnok{2stage.pl} proceeds by finding subgradients of partial sums of

345: the terms that make up $\cQ$ \eqnok{def.Q}, together with linear

346: inequalities that define the domain of $\cQ$.  The method is

347: essentially Benders decomposition~\cite{Ben62}, enhanced to deal with

348: infeasible iterates.  A full description is given in Chapter 5 of

349: Birge and Louveaux~\cite{BirL97}. We sketch the approach here and

350: show how it can be implemented in an asynchronous fashion.

351:

352: We suppose that the second-stage scenarios indexed by $1,2,\dots, N$

353: are partitioned into $T$ clusters denoted by $\cN_1, \cN_2, \dots,

354: \cN_T$.  Let $\cQ_{[j]}$ represent the partial sum

355: from \eqnok{def.Q} corresponding to the cluster $\cN_j$:

356: \beq \labtag{thetaj}

357: \cQ_{[j]}(x) = \sum_{i \in \cN_j} p_i \cQ_i(x).

358: \eeq

359: %

360: The algorithm maintains a model function $m^k_{[j]}$, which is a

361: piecewise linear lower bound on $\cQ_{[j]}$ for each $j$. We define

362: this function at iteration $k$ by

363: \beq \labtag{Qjk}

364: m_{[j]}^k(x) = \inf \{ \theta_j \, | \,

365:  \theta_j e \ge F_{[j]}^k x + f_{[j]}^k \},

366: \eeq

367: %

368: where $F_{[j]}^k$ is a matrix whose rows are subgradients of

369: $\cQ_{[j]}$ at previous iterates of the algorithm, and

370: $e=(1,1,\dots,1)^T$.  The rows of $\theta_j e \ge F_{[j]}^k x +

371: f_{[j]}^k$ are referred to as {\em optimality cuts}. Upon evaluating

372: $\cQ_{[j]}$ at the new iterate $x^k$ by solving

373: \eqnok{second-stage-lp} for each $i \in \cN_j$, a subgradient

374: $g_j \in \partial \cQ_{[j]}$ can be obtained from a formula

375: derived from \eqnok{subg.Qi} and \eqnok{subg.Q}, namely,

376: \beq \labtag{subg.Qj}

377: g_j = - \sum_{i \in \cN_j} p_i T(\omega_i)^T \pi(\omega_i),

378: \eeq

379: %

380: where each $\pi(\omega_i)$ is an optimal dual solution of

381: \eqnok{second-stage-lp}.

390: Since by the subgradient property we have

391: \[

392: \cQ_{[j]}(x) \ge g_j^T x + (\cQ_{[j]}(x^k) - g_j^T x^k),

393: \]

394: we can obtain $F_{[j]}^{k+1}$ from $F_{[j]}^k$ by appending the row

395: $g_j^T$, and $f_{[j]}^{k+1}$ from $f_{[j]}^k$ by appending the element

396: $(\cQ_{[j]}(x^k) - g_j^T x^k)$. In order to keep the number of cuts reasonable,

397: the cut is not added if $\cQ_{[j]}(x^k)$ is not greater than the value

398: $m^k_{[j]}(x^k)$ predicted by the lower bounding approximation (see \eqnok{master}

399: below).  In this case, the current set of cuts in $F_{[j]}^k$,

400: $f_{[j]}^k$ adequately models $\cQ_{[j]}$. In addition, we may also

401: wish to delete some rows from $F_{[j]}^{k+1}$, $f_{[j]}^{k+1}$

402: corresponding to facets of the epigraph of \eqnok{Qjk} that we do not

403: expect to be active in later iterations.

404:

405: The algorithm also maintains a collection of {\em feasibility cuts}

406: of the form

407: \beq \labtag{feas.cuts}

408: D^k x \ge d^k,

409: \eeq

410: %

411: which have the effect of excluding values of $x$ that were found to be

412: infeasible, in the sense that some of the second-stage linear programs

413: \eqnok{second-stage-lp} are infeasible for these values of $x$.  By

414: Farkas's theorem (see Mangasarian~\cite[p.~31]{Man69}), if the

415: constraints \eqnok{second-stage-lp.2} are infeasible, there exists

416: $\pi(\omega_i)$ with the following properties:

417: \[

418: W^T \pi(\omega_i) \le 0, \sgap

419: \left[ h(\omega_i) - T(\omega_i) x \right]^T \pi(\omega_i) > 0.

420: \]

421: (In fact, such a $\pi(\omega_i)$ can be obtained from the dual simplex

422: method for the feasibility problem \eqnok{second-stage-lp.2}.) To

423: exclude this $x$ from further consideration, we simply add the

424: inequality $[h(\omega_i) - T(\omega_i) x]^T \pi(\omega_i) \le 0$ to

425: the constraint set, by appending the row vector $\pi(\omega_i)^T

426: T(\omega_i)$ to $D^k$ and the element $\pi(\omega_i)^T h(\omega_i)$ to

427: $d^k$ in \eqnok{feas.cuts}.
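The following scalar sketch illustrates the construction of a feasibility cut; the data $W=1$, $T(\omega_i)=1$, $h(\omega_i)=1$ are hypothetical. The constraints \eqnok{second-stage-lp.2} read $y = 1-x$, $y \ge 0$, which are infeasible at $x=2$. The choice $\pi(\omega_i) = -1$ satisfies the two Farkas conditions, since
\[
W^T \pi(\omega_i) = -1 \le 0, \sgap
\left[ h(\omega_i) - T(\omega_i) x \right]^T \pi(\omega_i) = (1-2)(-1) = 1 > 0.
\]
Appending the row $\pi(\omega_i)^T T(\omega_i) = -1$ to $D^k$ and the element $\pi(\omega_i)^T h(\omega_i) = -1$ to $d^k$ gives the cut $-x \ge -1$, that is, $x \le 1$. In this instance the single cut describes exactly the set of $x$ for which the second-stage problem is feasible; in general several cuts may be needed.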

428:

429: The iterate $x^k$ of the multicut L-shaped method is obtained by solving

430: the following approximation to \eqnok{2stage.pl}:

431: \beq \labtag{2stage.pl.L}

432: \min_x \, m_k(x),

433: \;\; \mbox{subject to} \; D^k x \ge d^k, \; Ax=b, \; x \ge 0,

434: \eeq

435: where

436: \beq \labtag{def.mk}

437: m_k(x) \defeq c^Tx + \sum_{j=1}^T m_{[j]}^k(x).

438: \eeq

439: In practice, we substitute from

440: \eqnok{Qjk} to obtain the following linear program:

441:  \begin{subequations} \labtag{master}

442: \beqa

443: \labtag{master.1}

444: \min_{x, \theta_1, \dots, \theta_T} \, c^Tx + \sum_{j=1}^T \theta_j, &&

445: \mbox{subject to} \\

446: \labtag{master.4}

447: \theta_j e & \ge & F_{[j]}^k x + f_{[j]}^k, \sgap j=1,2,\dots,T, \\

448: \labtag{master.3}

449: D^k x & \ge & d^k, \\

450: \labtag{master.2}

451: Ax=b, \;\; x & \ge & 0.

452: \eeqa

453: \end{subequations}

454:

455: The L-shaped method proceeds by solving \eqnok{master} to generate a

456: new candidate $x$, then evaluating the partial sums \eqnok{thetaj} and

457: adding optimality and feasibility cuts as described above. The process

458: is repeated, terminating when the improvement in objective promised by

459: the subproblem \eqnok{2stage.pl.L} becomes small.

460:

461: For simplicity we make the following assumption for the remainder of

462: the paper.

463: %

464: \begin{assumption} \labtag{ass:S}

465: \mbox{}

466: \begin{itemize}

467: \item[(i)] The problem has complete recourse; that is, the feasible

468:   set of \eqnok{second-stage-lp} is nonempty for all $i=1,2,\dots,N$

469:   and all $x$, so that the domain of $\cQ(x)$ in \eqnok{def.Q} is $\R^n$.

470: \item[(ii)] The solution set $\cS$ is nonempty.

471: \end{itemize}

472: \end{assumption}

473: %

474: Under this assumption, feasibility cuts of the form \eqnok{feas.cuts},

475: \eqnok{master.3} do not appear during the course of the algorithm. Our

476: algorithms and their analysis can be generalized to handle situations

477: in which Assumption~\ref{ass:S} does not hold, but since our

478: development is complex enough already, we postpone discussion of these

479: generalizations to a future report.

480:

481:

482: Using Assumption~\ref{ass:S}, we can specify the L-shaped algorithm

483: formally as follows:

484: %

485: \btab

486: \> {\bf Algorithm LS} \\

487: \> choose tolerance $\epstol$; \\

488: \> choose starting point $x^0$; \\

489: \> define initial model $m_0$ to be a piecewise linear

490: underestimate of $\cQ(x)$ \\

491: \>\>  such that $m_0(x^0) = \cQ(x^0)$ and $m_0$ is bounded below; \\

492: \> $\cQ_{\rm min} \leftarrow \cQ(x^0)$; \\

493: \> {\bf for} $k=0,1,2,\dots$ \\

494: \>\> obtain $x^{k+1}$ by solving \eqnok{2stage.pl.L}; \\

495: \>\> {\bf if}

496:       $\cQ_{\rm min}  - m_k(x^{k+1}) \le \epstol (1+|\cQ_{\rm min}|) $ \\

497: \>\>\> STOP; \\

498: \>\> evaluate function and subgradient information at $x^{k+1}$; \\

499: \>\> $\cQ_{\rm min} \leftarrow  \min(\cQ_{\rm min}, \cQ(x^{k+1}))$; \\

500: \>\> obtain $m_{k+1}$ by adding optimality cuts to $m_k$; \\

501: \> {\bf end(for).}

502: \etab
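
The stopping test in Algorithm LS can be read as a bound on the optimality gap; we sketch the reasoning here. Because $m_k$ is an underestimate of $\cQ$ and $x^{k+1}$ minimizes $m_k$ over the feasible set of \eqnok{2stage.pl}, we have for any $x^* \in \cS$ that
\[
m_k(x^{k+1}) \le m_k(x^*) \le \cQ(x^*) = \cQ^*,
\sgap \mbox{and therefore} \sgap
\cQ_{\rm min} - \cQ^* \le \cQ_{\rm min} - m_k(x^{k+1}).
\]
When the test is satisfied, the best objective value found so far is thus within $\epstol (1+|\cQ_{\rm min}|)$ of the optimal value. For instance, with the hypothetical values $\cQ_{\rm min} = 250$ and $\epstol = 10^{-5}$, the algorithm stops once the gap $\cQ_{\rm min} - m_k(x^{k+1})$ falls to $10^{-5} \times 251 = 2.51 \times 10^{-3}$ or below.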

503:

504: \subsection{An Asynchronous Parallel Variant of the L-Shaped Method}

505: \labtag{sec:lshaped:async}

506:

507: The L-shaped approach lends itself naturally to implementation in a

508: master-worker framework. The problem \eqnok{master} is

509: solved by the master process, while solution of each cluster

510: $\cN_j$ of second-stage problems, and generation of the associated

511: cuts, can be carried out by the worker processes running in parallel.

512: This approach can be adapted for an asynchronous, unreliable

513: environment in which the results from some second-stage clusters are

514: not returned in a timely fashion. Rather than having all the worker

515: processors sit idle while waiting for the tardy results, we can

516: proceed without them, re-solving the master by using the additional cuts

517: that were generated by the other second-stage clusters.

518:

519: We denote the model function simply by $m$ for the asynchronous

520: algorithm, rather than appending a subscript. Whenever the time comes

521: to generate a new iterate, the current model is used. In practice, we

522: would expect the algorithm to give different results each time it is

523: executed, because of the unpredictable speed and order in which the

524: functions are evaluated and subgradients generated. Because of

525: Assumption~\ref{ass:S}, we can write the subproblem

526: \beq \labtag{als.subprob}

527: \min_x \, m(x),

528: \;\; \mbox{subject to} \; Ax=b, \; x \ge 0.

529: \eeq

530:

531: Algorithm ALS, the asynchronous variant of the L-shaped method that we

532: describe here, is made up of four key operations, three of which

533: execute on the master processor and one of which runs on the

534: workers. These operations are as follows:

535: %

536: \bi

537: \item {\tt partial\_evaluate}. This is the routine for evaluating

538:   $\cQ_{[j]}(x)$ defined by \eqnok{thetaj} for a given $x$ and $j$,

539:   in the process generating a subgradient $g_j$ of $\cQ_{[j]}(x)$. It runs on a

540:   worker processor and returns its results to the master by

541:   activating the routine {\tt act\_on\_completed\_task} on the master

542:   processor.

543:

544: \item {\tt evaluate}. This routine, which runs on the master, simply

545:   places $T$ tasks of the type {\tt partial\_evaluate} for a given $x$ into the task

546:   pool for distribution to the worker processors as they become

547:   available.  The completion of these $T$ tasks is equivalent to evaluating $\cQ(x)$.

548:

549: \item {\tt initialize}. This routine runs on the master processor

550:   and performs initial bookkeeping, culminating in a call to {\tt

551:     evaluate} for the initial point $x^0$.

552:

553: \item {\tt act\_on\_completed\_task}. This routine, which runs on the

554: master, is activated whenever the results become available from a {\tt

555: partial\_evaluate} task. It updates the model and increments a counter

556: to keep track of the number of clusters that have been evaluated at

557: each candidate point. When appropriate, it solves the master problem

558: with the latest model to obtain a new candidate iterate and calls {\tt evaluate} on it.

559:

560: \ei

561:

562: In our implementation of both this algorithm and its more

563: sophisticated cousin Algorithm ATR of Section~\ref{sec:atr}, we may

564: define a single task to consist of the evaluation of more than one

565: cluster $\cN_j$. We may bundle, say, $5$ or $10$ clusters into a

566: single task, in the interests of making the task large enough to

567: justify the master's effort in packing its data and unpacking its

568: results, and to maintain the ratio of compute time to communication

569: cost at a high level. For purposes of simplicity, however, we assume

570: in the descriptions both of this algorithm and of ATR that each task

571: consists of a single cluster.

572:

573: The implementation depends on a ``synchronicity'' parameter $\sigma$

574: which is the proportion of clusters that must be evaluated at a point

575: to trigger the generation of a new candidate iterate. Typical values

576: of $\sigma$ are in the range $0.25$ to $0.9$. A logical variable

577: ${\tt speceval}_k$ keeps track of whether $x^k$ has yet triggered a

578: new candidate. Initially, ${\tt speceval}_k$ is  set to ${\tt false}$,

579: then set to ${\tt true}$ when the proportion of evaluated clusters

580: passes the threshold $\sigma$.
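
To make the role of $\sigma$ concrete, take the hypothetical values $T=50$ and $\sigma=0.7$ (chosen only for illustration). The threshold is then
\[
\sigma T = 0.7 \times 50 = 35,
\]
so a new candidate iterate is generated as soon as $35$ of the $50$ {\tt partial\_evaluate} tasks for a given point have returned. The cuts from the remaining $15$ tasks are still added to the model as they arrive, and the full value of $\cQ$ at that point becomes available only once all $50$ tasks have completed. Smaller values of $\sigma$ generate candidates earlier from less complete information, whereas $\sigma$ close to $1$ approaches synchronous behavior.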

581:

582: We now specify all the methods making up Algorithm ALS.

583:

584: \btab

585: \>{\bf ALS:} \ \ {\tt  partial\_evaluate}$(x^q,q,j,\cQ_{[j]}(x^q),g_j)$ \\

586: \> Given $x^q$, index  $q$, and  partition number $j$,

587: evaluate $\cQ_{[j]}(x^q)$ from \eqnok{thetaj} \\

588: \>\> together with a partial subgradient $g_j$ from \eqnok{subg.Qj};

589: \\

590: \> Activate {\tt act\_on\_completed\_task}$(x^q,q,j,\cQ_{[j]}(x^q),g_j)$

591: on the master processor.

592: \etab

593:

594: \medskip

595:

596: \btab

597: \> {\bf ALS:} \ \ {\tt  evaluate}$(x^q,q)$ \\

598: \> {\bf for} $j=1,2,\dots, T$ (possibly concurrently) \\

599: \>\> {\tt partial\_evaluate}$(x^q,q,j,\cQ_{[j]}(x^q), g_j)$; \\

600: \> {\bf end (for)}

601: \etab

602:

603: \medskip

604:

605: \btab

606: \> {\bf ALS:} \ \ {\tt initialize} \\

607: \> choose tolerance $\epstol$; \\

608: \> choose starting point $x^0$; \\

609: \> choose threshold $\sigma \in (0,1]$; \\

610: \> $\cQ_{\rm min} \leftarrow \infty$; \\

611: \> $k \leftarrow 0$, ${\tt speceval}_0 \leftarrow {\tt false}$, $t_0 \leftarrow 0$; \\

612: \> {\tt evaluate}$(x^0,0)$.

613: \etab

614:

615: \medskip

616:

617: \btab

618: \> {\bf ALS:} \ \

619: {\tt act\_on\_completed\_task}$(x^q,q,j,\cQ_{[j]}(x^q),g_j)$ \\

620: \> $t_q \leftarrow t_q+1$; \\

621: \> add $\cQ_{[j]}(x^q)$ and cut $g_j$ to the model $m$; \\

622: \> {\bf if}  $t_q = T$ \\

623: \>\> $\cQ_{\rm min} \leftarrow \min ( \cQ_{\rm min}, \cQ(x^q))$; \\

624: \> {\bf else if}  $t_q \ge \sigma T$ {\bf and} not ${\tt speceval}_q$ \\

625: \>\> ${\tt speceval}_q \leftarrow ${\tt true}; \\

626: \>\> $k \leftarrow k+1$, ${\tt speceval}_k \leftarrow {\tt false}$, $t_k \leftarrow 0$; \\

627: \>\> solve  current model problem \eqnok{als.subprob} to obtain $x^{k}$; \\

628: \>\> {\bf if} $\cQ_{\rm min}  - m(x^{k}) \le \epstol (1+|\cQ_{\rm min}|) $ \\

629: \>\>\> STOP; \\

630: \>\> {\tt evaluate}$(x^k,k)$; \\

631: \> {\bf end (if)}

632:

633: \etab

634:

635: We present results for Algorithm ALS in Section~\ref{sec:results}.

636: While the algorithm is able to use a large number of worker processors

637: on our opportunistic platform, it suffers from the usual drawbacks of

638: the L-shaped method, namely, that cuts, once generated, must be

639: retained for the remainder of the computation to ensure convergence

640: and that large steps are typically taken on early iterations before a

641: sufficiently good model approximation to $\cQ(x)$ is created, making

642: it impossible to exploit prior knowledge about the location of the

643: solution.

644:

645: \section{A Bundle-Trust-Region Method} \labtag{sec:tr}

646:

647: Trust-region approaches can be implemented by making only minor

648: modifications to implementations of the L-shaped method, and they

649: possess several practical advantages along with stronger convergence

650: properties. The trust-region methods we describe here are related to

651: the regularized decomposition method of Ruszczy{\'n}ski~\cite{Rus86}

652: and the bundle-trust-region approaches of Kiwiel~\cite{Kiw90} and

653: Hiriart-Urruty and Lemar\'echal~\cite[Chapter~XV]{HirL93}. The main

654: differences are that we use box-shaped trust regions yielding linear

655: programming subproblems (rather than quadratic programs) and that our

656: methods manipulate the size of the trust region directly rather than

657: indirectly via a regularization parameter.

658:

659: When requesting a subgradient of $\cQ$ at some

660: point $x$, our algorithms do not require particular (e.g., extreme)

661: elements of the subdifferential to be supplied. Nor do they require

662: the subdifferential $\partial \cQ(x)$ to be representable as a convex

663: combination of a finite number of vectors. In this respect, our

664: algorithms contrast with that of Ruszczy{\'n}ski~\cite{Rus86}, for

665: instance, which exploits the piecewise-linear nature of the objectives

666: $\cQ_i$ in \eqnok{second-stage-lp}. Because of our weaker conditions

667: on the subgradient information, we cannot prove a finite termination

668: result of the type presented in \cite[Section~3]{Rus86}.  However,

669: these conditions potentially allow our algorithms to be extended to a

670: more general class of convex nondifferentiable functions. We hope to

671: explore these generalizations in future work.

672:

673: \subsection{A Method Based on $\ell_{\infty}$ Trust Regions}

674: \labtag{sec:tr:tr}

675:

676: A key difference between the trust-region approach of this section and

677: the L-shaped method of the preceding section is that we impose an

678: $\ell_{\infty}$ norm bound on the size of the step. It is implemented

679: by simply adding bound constraints to the linear programming

680: subproblem \eqnok{master} as follows:

681: \beq \labtag{master.tr.bounds}

682: -\Delta e \le x-x^k \le \Delta e,

683: \eeq

684: %

685: where $e=(1,1,\dots,1)^T$, $\Delta$ is the trust-region radius, and

686: $x^k$ is the current iterate. During the $k$th iteration, it may be

687: necessary to solve several problems with trust regions of the form

688: \eqnok{master.tr.bounds}, with different model functions $m$ and

689: possibly different values of $\Delta$, before a satisfactory new

690: iterate $x^{k+1}$ is identified. We refer to $x^k$ and $x^{k+1}$ as

691: {\em major iterates} and the points $x^{k,\ell}$, $\ell=0,1,2,\dots$

692: obtained by minimizing the current model function subject to the

693: constraints and trust-region bounds of the form

694: \eqnok{master.tr.bounds} as {\em minor iterates}. Another key

695: difference between the trust-region approach and the L-shaped approach

696: is that a minor iterate $x^{k,\ell}$ is accepted as the new major

697: iterate $x^{k+1}$ only if it yields a substantial reduction in the

698: objective function $\cQ$ over the previous iterate $x^k$, in a sense

699: to be defined below.  A further important difference is that one can

700: delete optimality cuts from the model functions, between minor and

701: major iterations, without compromising the convergence properties of

702: the algorithm.

703:

704: To specify the method, we need to augment the notation established in

705: the previous section.  We define $m_{k,\ell}(x)$ to be the model

706: function after $\ell$ minor iterations have been performed at

707: iteration $k$, and $\Delta_{k,\ell}>0$ to be the trust-region radius

708: at the same stage.  Under Assumption~\ref{ass:S}, there are no

709: feasibility cuts, so that the problem to be solved to obtain the minor

710: iterate $x^{k,\ell}$ is as follows:

711: \beq \labtag{trsub.kl}

712: \min_x \, m_{k,\ell}(x) \;\; \mbox{subject to} \;Ax=b, \; x \ge 0, \;

713: \| x-x^k \|_{\infty} \le \Delta_{k,\ell}

714: \eeq

715: (cf. \eqnok{2stage.pl.L}). By expanding this problem in a similar

716: fashion to \eqnok{master}, we obtain

717: \begin{subequations} \labtag{master.kl}

718: \beqa

719: \labtag{master.kl.1}

720: \min_{x, \theta_1, \dots, \theta_T} \, c^Tx + \sum_{j=1}^T \theta_j, &&

721: \mbox{subject to} \\

722: \labtag{master.kl.4}

723: \theta_j e & \ge & F_{[j]}^{k,\ell} x + f_{[j]}^{k,\ell}, \sgap j=1,2,\dots,T, \\

724: \labtag{master.kl.2}

725: Ax=b, \;\; x & \ge & 0, \\

726: \labtag{master.kl.tr}

727: -\Delta_{k,\ell} e \le  x-x^k & \le & \Delta_{k,\ell} e.

728: \eeqa

729: \end{subequations}

730:

731: We assume the initial model $m_{k,0}$ at major iteration $k$ to

732: satisfy the following two properties:

733: \begin{subequations} \labtag{mkprop}

734: \beqa \labtag{mkprop.1}

735: & m_{k,0}(x^k) = \cQ(x^k), \\

736: \labtag{mkprop.2}

737: & \mbox{$m_{k,0}$ is a piecewise linear underestimate of $\cQ$}.

738: \eeqa

739: \end{subequations}

740:

741: %

742: %

743: %

744: %

745: %

746: %

747:

748: Denoting the solution of the subproblem \eqnok{master.kl} by

749: $x^{k,\ell}$, we accept this point as the new iterate $x^{k+1}$ if the

750: decrease in the actual objective $\cQ$ (see \eqnok{2stage.pl}) is at

751: least some fraction of the decrease predicted by the model function

752: $m_{k,\ell}$. That is, for some constant $\xi \in (0,1/2)$, the

753: acceptance test is

754: \beq \labtag{tr.accept}

755: \cQ(x^{k,\ell}) \le \cQ(x^k) - \xi

756: \left( \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \right).

757: \eeq

758: %

759: (A typical value for $\xi$ is $10^{-4}$.)

760:
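As a numerical illustration of \eqnok{tr.accept} (the values here are
hypothetical, chosen only to show the arithmetic), suppose that
$\cQ(x^k)=100$, $m_{k,\ell}(x^{k,\ell})=90$, and $\xi=10^{-4}$. The
decrease predicted by the model is $10$, so the test reads
\[
\cQ(x^{k,\ell}) \le 100 - 10^{-4} \cdot 10 = 99.999.
\]
A minor iterate with $\cQ(x^{k,\ell})=99.5$ is therefore accepted, even
though it realizes only $5\%$ of the predicted decrease, whereas one
with $\cQ(x^{k,\ell})=100.2$ is rejected.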

761: If the test \eqnok{tr.accept} fails to hold, we obtain a new model

762: function $m_{k,\ell+1}$ by adding and possibly deleting cuts from

763: $m_{k,\ell}(x)$. This process aims to refine the model function, so

764: that it eventually generates a new major iteration, while economizing

765: on storage by allowing deletion of subgradients that no longer seem

766: helpful. Addition and deletion of cuts are implemented by adding and

767: deleting rows from $F_{[j]}^{k,\ell}$ and $f_{[j]}^{k,\ell}$, to

768: obtain $F_{[j]}^{k,\ell+1}$ and $f_{[j]}^{k,\ell+1}$, for

769: $j=1,2,\dots,T$.

770:

771: Given some parameter $\eta \in [0,1)$, we obtain $m_{k,\ell+1}$ from

772: $m_{k,\ell}$ by means of the following procedure:

773: %

774: \btab

775: \> {\bf Procedure Model-Update} $(k,\ell)$ \\

776: \> {\bf for each} optimality cut \\

777: \>\> {\tt possible\_delete}  $\leftarrow$ {\tt true}; \\

778: \>\> {\bf if} the cut was generated at $x^k$ \\

779: \>\>\> {\tt possible\_delete}  $\leftarrow$ {\tt false}; \\

780: \>\> {\bf else if} the cut is active at the solution of \eqnok{master.kl} \\

781: \>\>\> {\tt possible\_delete}  $\leftarrow$ {\tt false}; \\

782: \>\> {\bf else if} the cut was generated at an earlier minor iteration \\

783: \>\>\>

784: $\bar{\ell}=0,1,\dots,\ell-1$ such that

785: \etab

786: \beq \labtag{cut.delete.criterion}

787: \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) > \eta

788: \left[ \cQ(x^k) - m_{k,\bar\ell}(x^{k,\bar\ell}) \right]

789: \eeq

790: \btab

791: \>\>\> {\tt possible\_delete}  $\leftarrow$ {\tt false}; \\

792: \>\> {\bf end (if)} \\

793: %

794: %

795: \>\> {\bf if} {\tt possible\_delete} \\

796: \>\>\> possibly delete the cut; \\

797: \> {\bf end (for each)} \\

798: \> add optimality cuts obtained from each of the component functions \\

799: \>\> $\cQ_{[j]}$ at $x^{k,\ell}$. \\

800: \etab

801: %

802:

803: In our implementation, we delete the cut if ${\tt possible\_delete}$

804: is true at the final conditional statement and, in addition, the cut

805: has not been active during the last 100 solutions of

806: \eqnok{master.kl}. More details are given in

807: Section~\ref{sec:results:parameters}.

808:

809: Because we retain all cuts active at $x^k$ during the course of

810: major iteration $k$, the following extension of \eqnok{mkprop.1} holds:

811: \beq \labtag{mkprop.1a}

812: m_{k,\ell}(x^k) = \cQ(x^k), \;\; \ell=0,1,2,\dots.

813: \eeq

814: Since we add only subgradient information, the following

815: generalization of \eqnok{mkprop.2} also holds uniformly:

816: \beq \labtag{mkprop.2a}

817: \mbox{$m_{k,\ell}$ is a piecewise linear underestimate of $\cQ$, for $\ell=0,1,2,\dots.$}

818: \eeq

819:

820: We may also decrease the trust-region radius $\Delta_{k,\ell}$ between

821: minor iterations (that is, choose $\Delta_{k,\ell+1} <

822: \Delta_{k,\ell}$) when the test \eqnok{tr.accept} fails to hold. We do

823: so if the match between model and objective appears to be particularly

824: poor.  If $\cQ(x^{k,\ell})$ exceeds $\cQ(x^k)$ by more than an

825: estimate of the quantity

826: \beq \labtag{reduce.delta.1}

827: \max_{\| x-x^k\|_{\infty} \le 1} \, \cQ(x^k) - \cQ(x),

828: \eeq

829: we conclude that the ``upside'' variation of the function $\cQ$

830: deviates too much from its ``downside'' variation, and we choose the

831: new radius $\Delta_{k,\ell+1}$ to bring these quantities more nearly

832: into line. Our estimate of \eqnok{reduce.delta.1} is simply

833: \[

834: \frac{1}{\min(1,\Delta_{k,\ell})}

835: \left[ \cQ(x^k)  - m_{k,\ell}(x^{k,\ell}) \right],

836: \]

837: that is, an extrapolation of the model reduction on the current trust

838: region to a trust region of radius $1$.  Our complete strategy for

839: reducing $\Delta$ is therefore as follows. (The {\tt counter} is

840: initialized to zero at the start of each major iteration.)

841: %

842: \btab

843: \> {\bf Procedure Reduce-$\Delta$} \\

844: \> evaluate

845: \etab

846: \beq \labtag{reduce.delta.2}

847: \rho = {\min(1,\Delta_{k,\ell})} \frac{\cQ(x^{k,\ell}) - \cQ(x^k)}{\cQ(x^k)  - m_{k,\ell}(x^{k,\ell})};

848: \eeq

849: \btab

850: \> {\bf if} $\rho>0$ \\

851: \>\> {\tt counter} $\leftarrow$ {\tt counter}$+1$; \\

852: \> {\bf if} $\rho>3$ {\bf or}

853: ({\tt counter} $\ge 3$ {\bf and} $\rho \in (1,3]$) \\

854: \>\> set

855: \etab

856: \[

857: \Delta_{k,\ell+1} = \frac{1}{\min(\rho,4)} \Delta_{k,\ell};

858: \]

859: \btab

860: \>\> reset {\tt counter} $\leftarrow 0$;

861: \etab

862: %

863: This procedure is related to the technique of

864: Kiwiel~\cite[p.~109]{Kiw90} for increasing the coefficient of the

865: quadratic penalty term in his regularized bundle method.

866:
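To illustrate Procedure Reduce-$\Delta$ with hypothetical values,
suppose that $\Delta_{k,\ell}=0.5$, that the model predicts a decrease
of $\cQ(x^k)-m_{k,\ell}(x^{k,\ell})=2$, and that the objective actually
increases by $\cQ(x^{k,\ell})-\cQ(x^k)=20$. The estimate of
\eqnok{reduce.delta.1} is $2/0.5=4$, and \eqnok{reduce.delta.2} gives
\[
\rho = \min(1,0.5) \, \frac{20}{2} = 5 > 3,
\]
so the radius is reduced to $\Delta_{k,\ell+1} =
\Delta_{k,\ell}/\min(5,4) = 0.125$ and {\tt counter} is reset to zero.
Had the increase been $6$ instead, we would have $\rho=1.5 \in (1,3]$,
and the radius would be reduced (by the factor $1.5$) only if this were
at least the third minor iteration with $\rho>0$ since {\tt counter}
was last set to zero.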

867: If the test \eqnok{tr.accept} is passed, so that we have

868: $x^{k+1} = x^{k,\ell}$, we have a great deal of flexibility in

869: defining the new model function $m_{k+1,0}$.  We require only that the

870: properties \eqnok{mkprop} are satisfied, with $k+1$ replacing $k$.

871: Hence, we are free to delete much of the optimality cut information

872: accumulated at iteration $k$ (and previous iterates). In practice, of

873: course, it is wise to delete only those cuts that have been inactive

874: for a substantial number of iterations; otherwise we run the risk that

875: many new function and subgradient evaluations will be required to

876: restore useful model information that was deleted prematurely.

877:

878: If the step to the new major iteration $x^{k+1}$ shows a particularly

879: close match between the true function $\cQ$ and the model function

880: $m_{k,\ell}$ at the last minor iteration of iteration $k$, we consider

881: increasing the trust-region radius. Specifically, if

882: \beq \labtag{tr.incr.1}

883: \cQ(x^{k,\ell}) \le \cQ(x^k) - 0.5

884: \left( \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \right), \sgap

885: \| x^k - x^{k,\ell} \|_{\infty} = \Delta_{k,\ell},

886: \eeq

887: then we set

888: \beq \labtag{tr.incr.3}

889: \Delta_{k+1,0} = \min ( \Delta_{\rm hi}, 2 \Delta_{k,\ell}),

890: \eeq

891: where $\Delta_{\rm hi}$ is a prespecified upper bound on the radius.

892:
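For instance (with hypothetical values), let $\cQ(x^k)=100$,
$m_{k,\ell}(x^{k,\ell})=90$, $\cQ(x^{k,\ell})=94$, $\Delta_{k,\ell}=2$,
and $\Delta_{\rm hi}=3$, and suppose that the step lies on the
trust-region boundary, so that $\| x^k - x^{k,\ell} \|_{\infty} = 2$.
Since $94 \le 100 - 0.5 \cdot 10 = 95$, both conditions in
\eqnok{tr.incr.1} hold, and \eqnok{tr.incr.3} yields
\[
\Delta_{k+1,0} = \min(3, 2 \cdot 2) = 3.
\]
With a larger cap such as $\Delta_{\rm hi}=10$, the radius would simply
double to $4$.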

893: Before specifying the algorithm formally, we define the convergence

894: test. Given a parameter $\epstol>0$, we terminate if

895: \beq \labtag{conv.test}

896: \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \le

897: %

898: \epstol  (1+ |\cQ(x^k)|).

899: \eeq

900:
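The test \eqnok{conv.test} blends absolute and relative accuracy. As a
hypothetical example, for $\epstol=10^{-6}$ and $\cQ(x^k)=-25000$, the
right-hand side is
\[
10^{-6} \, (1+25000) = 0.025001,
\]
so the algorithm stops as soon as the decrease predicted by the model
falls to about $0.025$. When $|\cQ(x^k)|$ is small, the threshold is
essentially the absolute tolerance $\epstol$.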

901: \btab

902: \> {\bf Algorithm TR} \\

903: \> choose $\xi \in (0,1/2)$, maximum trust region $\Delta_{\rm hi}$,

904: tolerance $\epstol$; \\

905: \> choose starting point $x^0$; \\

906: \> define initial model $m_{0,0}$ with the properties \eqnok{mkprop} (for $k=0$); \\

907: \> choose $\Delta_{0,0} \in (0, \Delta_{\rm hi}]$; \\

908: \> {\bf for} $k=0,1,2,\dots$ \\

909: \>\> {\tt finishedMinorIteration} $\leftarrow$ {\tt false}; \\

910: \>\> $\ell \leftarrow 0$; ${\tt counter} \leftarrow 0$; \\

911: \>\> {\bf repeat} \\

912: \>\>\> solve \eqnok{trsub.kl} to obtain $x^{k,\ell}$; \\

913: \>\>\> {\bf if} \eqnok{conv.test} is satisfied \\

914: \>\>\>\> STOP with approximate solution $x^k$; \\

915: \>\>\> evaluate function and subgradient at $x^{k,\ell}$; \\

916: \>\>\> {\bf if} \eqnok{tr.accept} is satisfied \\

917: \>\>\>\> set  $x^{k+1} = x^{k,\ell}$; \\

918: \>\>\>\> obtain $m_{k+1,0}$ by possibly deleting cuts from $m_{k,\ell}$, but \\

919: \>\>\>\>\>

920:  retaining the properties \eqnok{mkprop} (with $k+1$ replacing $k$); \\

921: \>\>\>\> choose $\Delta_{k+1,0} \in [ \Delta_{k,\ell}, \Delta_{\rm hi}]$

922: according to \eqnok{tr.incr.1}, \eqnok{tr.incr.3}; \\

923: \>\>\>\> {\tt finishedMinorIteration} $\leftarrow$ {\tt true}; \\

924: \>\>\> {\bf else} \\

925: \>\>\>\> obtain  $m_{k,\ell+1}$ from  $m_{k,\ell}$

926: via Procedure Model-Update $(k,\ell)$; \\

927: \>\>\>\> obtain $\Delta_{k,\ell+1}$ via Procedure Reduce-$\Delta$; \\

928: \>\>\> $\ell \leftarrow \ell+1$; \\

929: \>\> {\bf until} {\tt finishedMinorIteration} \\

930: \> {\bf end (for)}

931: \etab

932:

933: \subsection{Analysis of the Trust-Region Method}

934: \labtag{sec:tr:analysis}

935:


957:

958: We now describe the convergence properties of Algorithm TR. We show

959: that for $\epstol=0$, the algorithm either terminates at a solution

960: or generates a sequence of major iterates that approaches the

961: solution set $\cS$ (Theorem~\ref{th:tr:conv}). When $\epstol > 0$, the

962: algorithm terminates finitely; that is, it avoids generating infinite

963: sequences either of major or minor iterates (Theorem~\ref{th:fint}).

964:

965: Given some starting point $x^0$ satisfying the constraints

966: $Ax^0=b$, $x^0 \ge 0$, and setting $\cQ_0 = \cQ(x^0)$, we define the

967: following quantities that are useful in describing and analyzing the

968: algorithm:

969: %

970: \beqa

971: \labtag{def.ls}

972: \cL(\cQ_0) &=& \{ x \, | \, Ax=b, x \ge 0, \cQ(x) \le \cQ_0 \}, \\

973: \labtag{def.lsn}

974: \cL(\cQ_0;\Delta) &=& \{ x \, | \, \|x-y \|_{\infty} \le \Delta, \,

975: \mbox{for some  $y \in \cL(\cQ_0)$} \}, \\

976: \labtag{def.beta}

977: \beta &=& \sup \{ \| g \|_1 \, | \, g \in \partial \cQ(x), \,

978: \mbox{for some $x \in \cL(\cQ_0;\Delta_{\rm hi})$} \}.

979: \eeqa

980: %

981: Using Assumption~\ref{ass:S}, we can easily show that $\beta <

982: \infty$.

983:

984: We start by showing that the optimal objective value for

985: \eqnok{trsub.kl} cannot decrease from one minor iteration to the next.

986: \begin{lemma} \labtag{lem:mkl}

987:   Suppose that $x^{k,\ell}$ does not satisfy the acceptance test

988:   \eqnok{tr.accept}. Then we have

989: \[

990: m_{k,\ell}(x^{k,\ell}) \le m_{k,\ell+1}(x^{k,\ell+1}).

991: \]

992: \end{lemma}

993: \begin{proof}

994:   In obtaining $m_{k,\ell+1}$ from $m_{k,\ell}$ in Model-Update, we do

995:   not allow deletion of cuts that were active at the solution

996:   $x^{k,\ell}$ of \eqnok{master.kl}. Using $\bar{F}_{[j]}^{k,\ell}$

997:   and $\bar{f}_{[j]}^{k,\ell}$ to denote the active rows in

998:   $F_{[j]}^{k,\ell}$ and $f_{[j]}^{k,\ell}$, we have that $x^{k,\ell}$

999:   is also the solution of the following linear program (in which the

1000:   inactive cuts are not present):

1001: \begin{subequations} \labtag{master2.kl}

1002: \beqa

1003: \labtag{master2.kl.1}

1004: \min_{x, \theta_1, \dots, \theta_T} \, c^Tx + \sum_{j=1}^T \theta_j, &&

1005: \mbox{subject to} \\

1006: \labtag{master2.kl.4}

1007: \theta_j e & \ge & \bar{F}_{[j]}^{k,\ell} x + \bar{f}_{[j]}^{k,\ell}, \sgap j=1,2,\dots,T, \\

1008: \labtag{master2.kl.2}

1009: Ax=b, \;\; x & \ge & 0, \\

1010: \labtag{master2.kl.tr}

1011: -\Delta_{k,\ell} e \le  x-x^k & \le & \Delta_{k,\ell} e.

1012: \eeqa

1013: \end{subequations}

1014: The subproblem to be solved for $x^{k,\ell+1}$ differs from

1015: \eqnok{master2.kl} in two ways. First, additional rows may be added to

1016: $\bar{F}_{[j]}^{k,\ell}$ and $\bar{f}_{[j]}^{k,\ell}$, consisting of

1017: function values and subgradients obtained at $x^{k,\ell}$ and also

1018: inactive cuts carried over from the previous \eqnok{master.kl}. Second,

1019: the trust-region radius $\Delta_{k,\ell+1}$ may be smaller than

1020: $\Delta_{k,\ell}$. Hence, the feasible region of the problem to be

1021: solved for $x^{k,\ell+1}$ is a subset of the feasible region for

1022: \eqnok{master2.kl}, so the optimal objective value cannot be smaller.

1023: \end{proof}

1024:

1025: Next we have a result about the amount of reduction in the model

1026: function $m_{k,\ell}$.

1027: \begin{lemma} \labtag{lem:tr:1}

1028: For all $k=0,1,2,\ldots$ and $\ell=0,1,2,\ldots$, we have that

1029: \begin{subequations} \labtag{lem:tr:inequalities}

1030: \beqa

1031: \nonumber

1032: m_{k,\ell}(x^k) - m_{k,\ell}(x^{k,\ell}) &=&

1033: \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \\

1034: \labtag{tr.2a}

1035: & \ge  &

1036: \min \left( \Delta_{k,\ell}, \| x^k - P(x^k)\|_{\infty} \right)

1037: \frac{\cQ(x^k) - \cQ^*}{\| x^k - P(x^k) \|_{\infty}} \\

1038: \labtag{tr.2b}

1039: & \ge &

1040: \hat{\epsilon} \min \left( \Delta_{k,\ell}, \| x^k - P(x^k)\|_{\infty} \right),

1041: \eeqa

1042: \end{subequations}

1043: where $\hat{\epsilon}>0$ is defined in \eqnok{weak.sharp}.

1044: \end{lemma}

1045: \begin{proof}

1046:   The first equality follows immediately from \eqnok{mkprop.1a}, while

1047:   the second inequality \eqnok{tr.2b} follows immediately from

1048:   \eqnok{tr.2a} and \eqnok{weak.sharp}. We now prove \eqnok{tr.2a}.

1049:

1050: Consider the following subproblem in the scalar $\tau$:

1051: \beq \labtag{tr.3}

1052: \min_{\tau \in [0,1]} \, m_{k,\ell} \left( x^k + \tau [ P(x^k) - x^k] \right)

1053: \;\; \mbox{subject to} \; \left\| \tau [ P(x^k) - x^k] \right\|_{\infty} \le

1054: \Delta_{k,\ell}.

1055: \eeq

1056: Denoting the solution of this problem by $\tau_{k,\ell}$, we have by

1057: comparison with \eqnok{trsub.kl} that

1058: \beq \labtag{tr.3a}

1059: m_{k,\ell} (x^{k,\ell}) \le

1060: m_{k,\ell} \left( x^k  + \tau_{k,\ell} [ P(x^k) - x^k] \right).

1061: \eeq

1062: If $\tau=1$ is feasible in \eqnok{tr.3}, we have  from \eqnok{tr.3a} and

1063: \eqnok{mkprop.2a} that

1064: \beqas

1065: \lefteqn{m_{k,\ell} (x^{k,\ell}) \le

1066: m_{k,\ell} \left( x^k  + \tau_{k,\ell} [ P(x^k) - x^k] \right) } \\

1067: & \le &

1068: m_{k,\ell} \left( x^k  + [ P(x^k) - x^k] \right)

1069: = m_{k,\ell} (P(x^k)) \le \cQ(P(x^k)) = \cQ^*.

1070: \eeqas

1071: Therefore, when $\tau=1$ is feasible for \eqnok{tr.3}, we have from

1072: \eqnok{mkprop.1a} that

1073: \[

1074: m_{k,\ell}(x^k) - m_{k, \ell}(x^{k,\ell}) \ge \cQ(x^k) - \cQ^*,

1075: \]

1076: so that  \eqnok{tr.2a} holds in this case.

1077:

1078: When $\tau=1$ is infeasible for \eqnok{tr.3}, consider setting $\tau =

1079: \Delta_{k,\ell} / \| x^k-P(x^k) \|_{\infty}$ (which is certainly feasible for

1080: \eqnok{tr.3}). We have from \eqnok{tr.3a}, the definition of

1081: $\tau_{k,\ell}$, the fact \eqnok{mkprop.2a} that $m_{k,\ell}$

1082: underestimates $\cQ$, and convexity of $\cQ$ that

1083: \beqas

1084: m_{k,\ell}(x^{k,\ell})

1085: %

1086: & \le & m_{k,\ell}

1087: \left( x^k + \Delta_{k,\ell} \frac{P(x^k)-x^k}{\|P(x^k)-x^k\|_{\infty}} \right) \\

1088: & \le & \cQ

1089: \left( x^k + \Delta_{k,\ell} \frac{P(x^k)-x^k}{\|P(x^k)-x^k\|_{\infty}} \right) \\

1090: & \le & \cQ(x^k) +

1091: \frac{\Delta_{k,\ell}}{\|P(x^k)-x^k\|_{\infty}} (\cQ^* - \cQ(x^k)).

1092: \eeqas

1093: Therefore, using \eqnok{mkprop.1a}, we have

1094: \[

1095: m_{k,\ell}(x^k) - m_{k,\ell}(x^{k,\ell}) \ge

1096: \frac{\Delta_{k,\ell}}{\|P(x^k)-x^k\|_{\infty}} [ \cQ(x^k) - \cQ^* ],

1097: \]

1098: verifying \eqnok{tr.2a} in this case as well.

1099: \end{proof}

1100:
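The bound \eqnok{tr.2a} can be checked by hand on a one-dimensional
instance. (This instance is hypothetical and is not one of the test
problems; it has no equality constraints, and its model consists of a
single cut.) Take the feasible set $\{ x \, | \, x \ge 0 \}$ and $\cQ(x)
= |x-1|$, so that $\cS=\{1\}$ and $\cQ^*=0$, and let $x^k=4$, so that
$P(x^k)=1$, $\cQ(x^k)=3$, and $\| x^k - P(x^k) \|_{\infty}=3$. The cut
generated at $x^k$ gives the model $m_{k,\ell}(x) = x-1$, which
underestimates $\cQ$ and agrees with it at $x^k$. For
$\Delta_{k,\ell}=1$, the subproblem \eqnok{trsub.kl} has solution
$x^{k,\ell}=3$, and
\[
m_{k,\ell}(x^k) - m_{k,\ell}(x^{k,\ell}) = 3 - 2 = 1 =
\min(1,3) \, \frac{3-0}{3},
\]
so \eqnok{tr.2a} holds with equality. For $\Delta_{k,\ell}=5$, the point
$P(x^k)$ lies inside the trust region, the subproblem solution is
$x^{k,\ell}=0$, and the model decrease $3-(-1)=4$ exceeds the bound
$\cQ(x^k)-\cQ^*=3$, reflecting the fact that the model underestimates
$\cQ$.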

1101: Our next result finds a lower bound on the trust-region radii

1102: $\Delta_{k,\ell}$. For purposes of this result we define a quantity

1103: $E_k$ to measure the closest approach to the solution set for all

1104: iterates up to and including $x^k$, that is,

1105: \beq \labtag{def:Ek}

1106: E_k \defeq \min_{\bar{k}=0,1,\dots,k}

1107: \| x^{\bar{k}} - P(x^{\bar{k}}) \|_{\infty}.

1108: \eeq

1109: Note that $E_k$ is nonincreasing in $k$. We also define

1110: $\Delta_{\rm init}$ to be the initial value of the trust region.

1111: %

1112: \begin{lemma} \labtag{lem:trbounds}

1113:   There is a constant $\Delta_{\rm lo} >0$ such that for all trust

1114:   regions $\Delta_{k,\ell}$ used in the course of Algorithm TR, we

1115:   have

1116: \[

1117: \Delta_{k,\ell} \ge \min( \Delta_{\rm lo}, E_k/4).

1118: \]

1119: \end{lemma}

1120: \begin{proof}

1121:   We prove the result by showing that the value $\Delta_{\rm lo} =

1122:   (1/4) \min(1, \Delta_{\rm init}, \hat{\epsilon}/\beta)$ has the

1123:   desired property, where $\hat{\epsilon}$ is from \eqnok{weak.sharp}

1124:   and $\beta$ is from \eqnok{def.beta}.

1125:

1126:   Suppose for contradiction that there are indices $k$ and $\ell$ such

1127:   that

1128: \[

1129: \Delta_{k, \ell} < \frac14 \min

1130: \left( 1, \frac{\hat{\epsilon}}{\beta}, \Delta_{\rm init}, E_k \right).

1131: \]

1132: Since the trust region can be reduced by at most a factor of $4$ by

1133: Procedure Reduce-$\Delta$, there must be an earlier trust region

1134: radius $\Delta_{\bar{k}, \bar{\ell}}$ (with $\bar{k} \le k$) such that

1135: \beq \labtag{poo.3}

1136: \Delta_{\bar{k},\bar{\ell}} <

1137: \min \left( 1, \frac{\hat{\epsilon}}{\beta}, E_{k} \right),

1138: \eeq

1139: and $\rho>1$ in \eqnok{reduce.delta.2}, that is,

1140: \beqa

1141: \nonumber

1142: \cQ(x^{\bar{k},\bar{\ell}}) - \cQ(x^{\bar{k}}) & > &

1143: \frac{1}{\min(1,\Delta_{\bar{k},\bar{\ell}})}

1144: \left( \cQ(x^{\bar{k}}) -

1145: m_{\bar{k},\bar{\ell}}(x^{\bar{k},\bar{\ell}}) \right) \\

1146: \labtag{poo.4}

1147: & = & \frac{1}{\Delta_{\bar{k},\bar{\ell}}}

1148: \left(

1149: \cQ(x^{\bar{k}}) - m_{\bar{k},\bar{\ell}}(x^{\bar{k},\bar{\ell}})

1150: \right).

1151: \eeqa

1152: By applying Lemma~\ref{lem:tr:1}, and using  \eqnok{poo.3}, we have

1153: \beq \labtag{poo.4a}

1154: \cQ(x^{\bar{k}}) - m_{\bar{k},\bar{\ell}}(x^{\bar{k},\bar{\ell}}) \ge

1155: \hat{\epsilon} \min \left( \Delta_{\bar{k},\bar{\ell}},

1156: \| x^{\bar{k}} - P(x^{\bar{k}}) \|_{\infty} \right) =

1157: \hat{\epsilon} \Delta_{\bar{k},\bar{\ell}}

1158: \eeq

1159: where the last equality follows from

1160: $\| x^{\bar{k}} - P(x^{\bar{k}}) \|_{\infty} \ge E_{\bar{k}} \ge E_k$ and

1161: \eqnok{poo.3}.

1162: By combining  \eqnok{poo.4a} with  \eqnok{poo.4}, we have that

1163: \beq \labtag{poo.5}

1164: \cQ(x^{\bar{k},\bar{\ell}}) - \cQ(x^{\bar{k}}) > \hat{\epsilon}.

1165: \eeq

1166: By using standard properties of subgradients, we have

1167: \beqa

1168: \nonumber

1169: \lefteqn{\cQ(x^{\bar{k},\bar{\ell}}) - \cQ(x^{\bar{k}}) \le

1170: g_{\bar{\ell}}^T(x^{\bar{k},\bar{\ell}} - x^{\bar{k}})} \\

1171: \labtag{subd.5}

1172: & \le &

1173: \| g_{\bar{\ell}} \|_1 \| x^{\bar{k}} - x^{\bar{k},\bar{\ell}} \|_{\infty}

1174: \le \| g_{\bar{\ell}} \|_1 \Delta_{\bar{k},\bar{\ell}}, \;\;

1175: \mbox{for all} \; g_{\bar{\ell}} \in \partial \cQ(x^{\bar{k},\bar{\ell}}).

1176: \eeqa

1177: By combining this expression with \eqnok{poo.5}, and using

1178: \eqnok{poo.3} again, we obtain that

1179: \[

1180: \| g_{\bar{\ell}} \|_1 \ge

1181: \frac{\hat{\epsilon}}{\Delta_{\bar{k},\bar{\ell}}} > \beta.

1182: \]

1183: However, since $x^{\bar{k},\bar{\ell}} \in \cL(\cQ_0;\Delta_{\rm hi})$, we have

1184: from \eqnok{def.beta} that $\| g_{\bar{\ell}} \|_1 \le \beta$, giving a

1185: contradiction.

1186: \end{proof}

1187:

1188: Finite termination of the inner iterations is proved in the following

1189: two results. Recall that the parameters $\xi$ and $\eta$ are defined

1190: in \eqnok{tr.accept} and \eqnok{cut.delete.criterion}, respectively.

1191: \begin{lemma} \labtag{lem:tr:ft}

1192:   Let $\epstol=0$ in Algorithm TR, and let $\bar{\eta}$ be

1193:   any constant satisfying $0<\bar{\eta}<1$, $\bar{\eta}>\xi$,

1194:   $\bar{\eta} \ge \eta$. Let $\ell_1$ be any index such that

1195:   $x^{k,\ell_1}$ fails to satisfy the test \eqnok{tr.accept}.  Then

1196:   either the sequence of inner iterations eventually yields a point

1197:   $x^{k,\ell_2}$ satisfying the acceptance test \eqnok{tr.accept}, or

1198:   there is an index $\ell_2>\ell_1$ such that

1199: \beq \labtag{tr.6}

1200: \cQ(x^k) - m_{k,\ell_2}(x^{k,\ell_2}) \le \bar{\eta} \left[

1201: \cQ(x^k) - m_{k,\ell_1}(x^{k,\ell_1}) \right].

1202: \eeq

1203: \end{lemma}

1204: \begin{proof}

1205:   Suppose for contradiction that none of the minor iterations

1206:   following $\ell_1$ satisfies either \eqnok{tr.accept} or the

1207:   criterion \eqnok{tr.6}; that is,

1208: \beqa \nonumber

1209: \cQ(x^k) - m_{k,q}(x^{k,q}) & > & \bar{\eta} \left[

1210: \cQ(x^k) - m_{k,\ell_1}(x^{k,\ell_1}) \right],  \\

1211: \labtag{contra}

1212: & \ge & \eta \left[ \cQ(x^k) - m_{k,\ell_1}(x^{k,\ell_1}) \right],

1213: \;\; \mbox{\rm for all $q > \ell_1$}.

1214: \eeqa

1215: It follows from this bound, together with Lemma~\ref{lem:mkl} and

1216: Procedure Model-Update, that none of the cuts generated at minor

1217: iterations $q \ge \ell_1$ is deleted.

1218:

1219: We assume in the remainder of the proof that $q$ and $\ell$ are

1220: generic minor iteration indices that satisfy

1221: \[

1222: q > \ell \ge \ell_1.

1223: \]

1224:

1225: Because the function values and subgradients from minor iterations

1226: $x^{k,\ell}$, $\ell=\ell_1,\ell_1+1, \dots$ are retained throughout the major

1227: iteration $k$, we have

1228: \beq \labtag{matchQ}

1229: m_{k,q}(x^{k,\ell}) = \cQ(x^{k,\ell}).

1230: \eeq

1231: By definition of the subgradient, we have

1232: \beq \labtag{subgrad.mkq}

1233: m_{k,q}(x) - m_{k,q}(x^{k,\ell}) \ge g^T (x-x^{k,\ell}), \;\;

1234: \mbox{for all} \; g \in \partial m_{k,q}(x^{k,\ell}).

1235: \eeq

1236: Therefore, from \eqnok{mkprop.2a} and \eqnok{matchQ}, it follows that

1237: \[

1238: \cQ(x)-\cQ(x^{k,\ell}) \ge g^T (x-x^{k,\ell}), \;\; \mbox{for all} \;

1239: g \in \partial m_{k,q}(x^{k,\ell}),

1240: \]

1241: so that

1242: \beq \labtag{mkqQ}

1243: \partial m_{k,q}(x^{k,\ell}) \subset \partial \cQ(x^{k,\ell}).

1244: \eeq

1245:

1246: Since $\cQ(x^k) \le \cQ(x^0) = \cQ_0$, we have from \eqnok{def.ls} that

1247: $x^k \in \cL(\cQ_0)$. Therefore, from the definition \eqnok{def.lsn}

1248: and the fact that $\| x^{k,\ell} - x^k \|_{\infty} \le \Delta_{k,\ell} \le

1249: \Delta_{\rm hi}$, we have that $x^{k,\ell} \in \cL(\cQ_0;\Delta_{\rm

1250: hi})$. It follows from \eqnok{def.beta} and \eqnok{mkqQ} that

1251: \beq \labtag{gbound}

1252: \| g \|_1 \le \beta, \;\; \mbox{for all} \; g \in \partial m_{k,q}(x^{k,\ell}).

1253: \eeq

1254:

1255: Since $x^{k,\ell}$ is rejected by the test \eqnok{tr.accept}, we

1256: have from \eqnok{matchQ} and Lemma~\ref{lem:mkl} that the following

1257: inequalities hold:

1258: \beqas

1259: m_{k,q}(x^{k,\ell}) = \cQ(x^{k,\ell})

1260: & \ge &\cQ(x^k) - \xi \left[ \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \right] \\

1261: & \ge & \cQ(x^k) - \xi \left[ \cQ(x^k) - m_{k,\ell_1}(x^{k,\ell_1}) \right].

1262: \eeqas

1263: By rearranging this expression, we obtain

1264: \beq \labtag{tr.8}

1265: \cQ(x^k) - m_{k,q}(x^{k,\ell}) \le

1266: \xi \left[ \cQ(x^k) - m_{k,\ell_1}(x^{k,\ell_1}) \right].

1267: \eeq

1268:

1269: Consider now all points $x$ satisfying

1270: \beq \labtag{xkl.nbd}

1271: \| x-x^{k,\ell} \|_{\infty} \le

1272: \frac{\bar{\eta}-\xi}{\beta}

1273: \left[ \cQ(x^k)-m_{k,\ell_1}(x^{k,\ell_1}) \right]

1274: \defeq \zeta>0.

1275: \eeq

1276: Using this bound together with \eqnok{subgrad.mkq} and \eqnok{gbound},

1277: we obtain

1278: \beqas

1279: \lefteqn{ m_{k,q}(x^{k,\ell}) - m_{k,q}(x) \le g^T(x^{k,\ell} - x ) } \\

1280: & \le & \beta \| x^{k,\ell}-x \|_{\infty}

1281: \le (\bar{\eta} - \xi) \left[ \cQ(x^k)-m_{k,\ell_1}(x^{k,\ell_1}) \right].

1282: \eeqas

1283: By combining this bound with  \eqnok{tr.8}, we find that the following

1284: bound is satisfied for all $x$ in the neighborhood \eqnok{xkl.nbd}:

1285: \beqas

1286: \cQ(x^k) - m_{k,q}(x) &=&

1287: \left[ \cQ(x^k) - m_{k,q}(x^{k,\ell}) \right] +

1288: \left[ m_{k,q}(x^{k,\ell}) - m_{k,q}(x) \right] \\

1289:  & \le & \bar{\eta} \left[ \cQ(x^k)-m_{k,\ell_1}(x^{k,\ell_1}) \right].

1290: \eeqas

1291: It follows from this bound, in conjunction with \eqnok{contra}, that

1292: $x^{k,q}$ (the solution of the trust-region problem with model

1293: function $m_{k,q}$) cannot lie in the neighborhood \eqnok{xkl.nbd}.

1294: Therefore, we have

1295: \beq \labtag{meshprop}

1296: \| x^{k,q} - x^{k,\ell} \|_{\infty} > \zeta.

1297: \eeq

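1298: But since $\| x^{k,\ell} - x^k \|_{\infty} \le \Delta_{k,\ell} \le

1299: \Delta_{\rm hi}$ for all $\ell \ge \ell_1$, the minor iterates lie in a bounded box, which can contain only finitely many points that are pairwise separated by more than $\zeta$. Hence it is impossible for an

1300: infinite sequence $\{ x^{k,\ell} \}_{\ell \ge \ell_1}$ to satisfy

1301: \eqnok{meshprop} for all $q > \ell \ge \ell_1$. We conclude that either \eqnok{tr.accept} or \eqnok{tr.6} must hold for some

1302: $\ell_2 > \ell_1$, as claimed.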

1303: \end{proof}

1304:

1305: We now show that the minor iteration sequence terminates at a point

1306: $x^{k,\ell}$ satisfying the acceptance test, provided that $x^k$ is

1307: not a solution.

1308: \begin{theorem} \labtag{th:tr:ft}

1309:   Suppose that $\epstol =0$.

1310: \begin{itemize}

1311: \item[(i)]   If $x^k \notin \cS$, there is an $\ell \ge 0$ such that

1312:   $x^{k,\ell}$ satisfies \eqnok{tr.accept}.

1313: \item[(ii)] If $x^k \in \cS$, then either Algorithm TR terminates (and verifies that $x^k \in \cS$), or

1314: $\cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \downarrow 0$.

1315: \end{itemize}

1316: \end{theorem}

1317: \begin{proof}

1318:   Suppose for the moment that the inner iteration sequence is

1319:   infinite, that is, the test \eqnok{tr.accept} always fails. By

1320:   applying Lemma~\ref{lem:tr:ft} recursively, with any constant

1321:   $\bar{\eta}$ satisfying the properties stated in

1322:   Lemma~\ref{lem:tr:ft}, we can identify a sequence of indices $0 <

1323:   \ell_1 < \ell_2 < \dots$ such that

1324: \beqa

1325: \nonumber

1326: \cQ(x^k) - m_{k,\ell_j}(x^{k,\ell_j}) & \le &

1327: \bar{\eta} \left[ \cQ(x^k) - m_{k,\ell_{j-1}}(x^{k,\ell_{j-1}}) \right] \\

1328: \nonumber

1329: & \le &

1330: \bar{\eta}^2  \left[ \cQ(x^k) - m_{k,\ell_{j-2}}(x^{k,\ell_{j-2}}) \right] \\

1331: \nonumber

1332: & \vdots & \\

1333: \labtag{minortozero}

1334: & \le &

1335: \bar{\eta}^j \left[ \cQ(x^k) - m_{k,0}(x^{k,0}) \right].

1336: \eeqa

1337: When $x^k \notin \cS$, we have from Lemma~\ref{lem:trbounds} that

1338: \[

1339: \Delta_{k,\ell} \ge \min( \Delta_{\rm lo}, E_k/4)

1340: \defeq \bar{\Delta}_{\rm lo} >0, \;\; \mbox{for all $\ell=0,1,2,\dots$},

1341: \]

1342: so the right-hand side of \eqnok{tr.2a} is strictly positive.  Hence

1343: for $j$ sufficiently large, we have that

1344: \[

1345: \cQ(x^k) - m_{k,\ell_j}(x^{k,\ell_j}) \le

1346: 0.5 \min \left( \bar{\Delta}_{\rm lo}, \| x^k-P(x^k) \|_{\infty} \right)

1347: \frac{\cQ(x^k) - \cQ^*}{\| x^k - P(x^k) \|_{\infty}}.

1348: \]

1349: But this inequality contradicts \eqnok{lem:tr:inequalities}, proving (i).

1350:

1351: For the case of $x^k \in \cS$, there are two possibilities. If

1352: the inner iteration sequence terminates finitely at some $x^{k,\ell}$,

1353: we have $\cQ(x^k) - m_{k,\ell}(x^{k,\ell}) = 0$ and indeed that

1354: \[

1355: m_{k,\ell}(x) \ge \cQ(x^k) = \cQ^*, \;\;

1356: \mbox{for all $x$ with $\| x-x^k \|_{\infty} \le \Delta_{k,\ell}$}.

1357: \]

1358: Because of \eqnok{mkprop.2a}, we have that $\cQ(x) \ge \cQ(x^k)$ for

1359: all $x$ in a neighborhood of $x^k$, implying that $0 \in \partial

1360: \cQ(x^k)$. Therefore, termination under these circumstances yields a

1361: guarantee that $x^k \in \cS$. When the algorithm does not terminate,

1362: it follows from \eqnok{minortozero} that $\cQ(x^k) -

1363: m_{k,\ell}(x^{k,\ell}) \to 0$. By applying Lemma~\ref{lem:mkl}, we

1364: verify our claim (ii) of monotonic convergence.

1365: \end{proof}

1366:

1367: We now prove convergence of Algorithm TR to $\cS$.

1368: \begin{theorem} \labtag{th:tr:conv}

1369:   Suppose that $\epstol=0$. The sequence of major

1370:   iterations $\{ x^k \}$ is either finite, terminating at some $x^k

1371:   \in \cS$, or  is infinite, with the property that $\| x^k - P(x^k)

1372:   \|_{\infty} \to 0$.

1373: \end{theorem}

1374: \begin{proof}

1375:   If the claim does not hold, there are two possibilities. The first

1376:   is that the sequence of major iterations terminates finitely at some

1377:   $x^k \notin \cS$. Theorem~\ref{th:tr:ft} ensures, however, that the

1378:   minor iteration sequence will terminate at some new major iteration

1379:   $x^{k+1}$ under these circumstances, so we can rule out this

1380:   possibility. The second possibility is that the sequence $\{x^k\}$

1381:   is infinite but that there is some $\epsilon >0$ and an infinite

1382:   subsequence of indices $\{ k_j \}_{j=1,2,\dots}$ such that

1383: \[

1384: \| x^{k_j} - P(x^{k_j}) \|_{\infty}  \ge \epsilon, \;\; j=1,2,\dots.

1385: \]

1386: Since the sequence $\{ \cQ(x^{k_j}) \}_{j=1,2,\dots}$ is infinite,

1387: decreasing, and bounded below, it converges to some value $\bar{\cQ} >

1388: \cQ^*$.  Moreover, since the entire sequence $\{ \cQ(x^k) \}$ is

1389: monotone decreasing, it follows that $\cQ(x^k) > \bar{\cQ}$ and

1390: therefore

1391: \[

1392: \cQ(x^k) - \cQ^* > \bar{\cQ} - \cQ^* > 0, \;\; k=0,1,2,\dots.

1393: \]

1394: Hence, by boundedness of the subgradients (see \eqnok{def.beta}), we

1395: can identify a constant $\bar{\epsilon}>0$ such that

1396: \[

1397: \| x^k - P(x^k) \|_{\infty} \ge \bar{\epsilon}, \;\; k=0,1,2,\dots.

1398: \]

1399: It follows from \eqnok{def:Ek} that

1400: \beq \labtag{Ekbb}

1401: E_k \ge \bar{\epsilon}, \;\; k=0,1,2,\dots.

1402: \eeq

1403:

1404: For each major iteration index $k$, let $\ell(k)$ be the minor

1405: iteration index that passes the acceptance test \eqnok{tr.accept}. By combining \eqnok{tr.accept} with Lemma~\ref{lem:tr:1}, we have that

1406: \[

1407: \cQ(x^k) - \cQ(x^{k+1}) \ge \xi \hat{\epsilon} \min

1408: \left( \Delta_{k, \ell(k)}, \|x^k - P(x^k) \|_{\infty} \right)

1409: \ge \xi \hat{\epsilon} \min

1410: \left( \Delta_{k, \ell(k)}, \bar{\epsilon} \right).

1411: \]

1412: Since  $\cQ(x^k) - \cQ(x^{k+1}) \to 0$, we deduce that

1413: \beq \labtag{poo.8}

1414: \lim_{k \to \infty} \Delta_{k, \ell(k)} = 0.

1415: \eeq

1416: By Lemma~\ref{lem:trbounds} and \eqnok{Ekbb}, we have

1417: \[

1418:  \Delta_{k, \ell(k)} \ge \min (\Delta_{\rm lo}, \bar{\epsilon}/4) >0, \;\;

1419: k=0,1,2,\dots,

1420: \]

1421: which contradicts \eqnok{poo.8}.  We conclude that the second

1422: possibility (an infinite sequence $\{ x^k \}$ not converging to $\cS$)

1423: cannot occur either, so the proof is complete.

1424: \end{proof}

1425:

1426: Finally, we show that the algorithm terminates when $\epstol>0$.

1427: %

1428: \begin{theorem} \labtag{th:fint}

1429: When $\epstol>0$, Algorithm TR terminates finitely.

1430: \end{theorem}

1431: \begin{proof}

1432:   We show first that the algorithm cannot ``get stuck'' at a

1433:   particular $x^k$, generating an infinite sequence of minor

1434:   iterations at $x^k$ without eventually satisfying either

1435:   \eqnok{conv.test} or the acceptance test \eqnok{tr.accept}.  We see

1436:   from the reasoning in the proof of Theorem~\ref{th:tr:ft} together

1437:   with the monotonicity property of Lemma~\ref{lem:mkl} that an

1438:   infinite sequence of minor iterations must satisfy that

1439: \beq \labtag{fint.1}

1440: \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \downarrow 0.

1441: \eeq

1442: Since the right-hand side of \eqnok{conv.test} is bounded below by

1443: $\epstol$, the test \eqnok{conv.test} must be

1444: satisfied for some $\ell$. Therefore, the minor iteration

1445: sequence cannot be infinite.

1446:

1447: Now consider the other possibility of an infinite sequence of major

1448: iterations $\{ x^k \}_{k=1,2,\dots}$. Since we have

1449: \[

1450: \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) > \epstol

1451: \]

1452: for all $k$ and $\ell$, and since the acceptance test

1453: \eqnok{tr.accept} is satisfied at all $k$, we have

1454: \[

1455: \cQ(x^k) - \cQ(x^{k+1}) \ge \xi \epstol >0, \;\;

1456: \mbox{for all $k=0,1,2,\dots$}.

1457: \]

1458: But this relation is inconsistent with the fact that $\{ \cQ(x^k) \}$

1459: is bounded below (by $\cQ^*$), so this possibility can also be ruled

1460: out, and the proof is complete.

1461: \end{proof}
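The contradiction in the second part of the proof can be made quantitative. As a sketch (the counter $n$ is introduced here only for illustration), suppose that $n$ major iterations have been completed without termination. Summing the decrease inequality from the proof over $k=0,1,\dots,n-1$ gives
\[
\cQ(x^0) - \cQ^* \ge \cQ(x^0) - \cQ(x^n) =
\sum_{k=0}^{n-1} \left[ \cQ(x^k) - \cQ(x^{k+1}) \right] \ge n \, \xi \, \epstol,
\]
so that
\[
n \le \frac{\cQ(x^0) - \cQ^*}{\xi \, \epstol}.
\]
This bound is likely to be pessimistic in practice, but it shows explicitly why the number of major iterations must be finite when $\epstol>0$.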

1462:

1463:

1464:


1520:

1521: \subsection{Discussion} \labtag{sec:tr:discussion}

1522:

1523: The algorithm can be modified in various ways without

1524: changing its properties greatly.  For instance, we could replace the

1525: step norm bound in \eqnok{trsub.kl} by a scaled bound of the form

1526: \[

1527: \| S (x-x^k) \|_{\infty} \le \Delta_k,

1528: \]

1529: where $S$ is a diagonal positive definite matrix.  After

1530: this modification, \eqnok{master.kl} remains a linear program.  We

1531: could also use a $1$-norm trust region, at the cost of introducing an

1532: additional variable vector $s$ of the same dimension as $x$.

1533: Specifically, we enforce the constraint $\|x-x^k \|_1 \le \Delta_k$ by

1534: enforcing the following linear constraints:

1535: \[

1536: x-x^k \le s, \sgap x^k-x \le s, \sgap e^Ts \le \Delta_k.

1537: \]

1538: Once again, we obtain a linear programming subproblem, albeit one that

1539: involves more variables than \eqnok{master.kl}.
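Both modifications keep the subproblem linear, which can be seen by writing the constraints out explicitly. The following Python sketch is illustrative only: the function names are hypothetical, and vectors are plain Python lists rather than the data structures of any particular LP solver. It converts the scaled $\infty$-norm bound into simple bounds on $x$ and assembles the rows of the $1$-norm formulation in the stacked variable $(x,s)$.

```python
def scaled_box_bounds(xk, s_diag, delta):
    """Bounds on x equivalent to ||S (x - xk)||_inf <= delta, S = diag(s_diag) > 0."""
    if delta < 0 or any(s <= 0 for s in s_diag):
        raise ValueError("need delta >= 0 and positive diagonal entries")
    return [(xi - delta / si, xi + delta / si) for xi, si in zip(xk, s_diag)]


def one_norm_rows(xk, delta):
    """Rows (coeffs, rhs) of G z <= h in z = (x, s), encoding
    x - xk <= s,  xk - x <= s,  e^T s <= delta."""
    n = len(xk)
    rows = []
    for i in range(n):
        upper = [0.0] * (2 * n)
        upper[i], upper[n + i] = 1.0, -1.0      # x_i - s_i <= xk_i
        rows.append((upper, xk[i]))
        lower = [0.0] * (2 * n)
        lower[i], lower[n + i] = -1.0, -1.0     # -x_i - s_i <= -xk_i
        rows.append((lower, -xk[i]))
    rows.append(([0.0] * n + [1.0] * n, delta))  # e^T s <= delta
    return rows


def satisfies(rows, z, tol=1e-12):
    """Check G z <= h row by row (tol is a floating-point allowance)."""
    return all(sum(c * v for c, v in zip(coeffs, z)) <= rhs + tol
               for coeffs, rhs in rows)
```

A point $x$ satisfies $\|x-x^k\|_1 \le \Delta_k$ exactly when the rows are feasible for the choice $s_i = |x_i - x^k_i|$, which is how the check above can be used.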

1540:

1541: If a $2$-norm trust region is used, we can show by comparing the

1542: optimality conditions for the respective problems that the solution of

1543: the subproblem

1544: \[

1545: \min_x \, m_{k,\ell}(x) \;\; \mbox{subject to} \;Ax=b, \; x \ge 0, \;

1546: \| x-x^k \|_2 \le \Delta_k

1547: \]

1548: is identical to the solution of

1549: \beq \labtag{trsub.2norm}

1550: \min_x \, m_{k,\ell}(x) + \lambda \| x-x^k \|_2^2 \;\;

1551: \mbox{subject to} \;Ax=b, \; x \ge 0,

1552: \eeq

1553: for some $\lambda \ge 0$.

1554: %

1555: %

1556: We can transform \eqnok{trsub.2norm} to a quadratic program in the

1557: same fashion as the transformation of \eqnok{trsub.kl} to

1558: \eqnok{master.kl}. The bundle-trust-region approaches described in

1559: Kiwiel~\cite{Kiw90}, Hiriart-Urruty and

1560: Lemar\'echal~\cite[Chapter~XV]{HirL93}, and

1561: Ruszczy{\'n}ski~\cite{Rus86,Rus93} also lead to problems of the form

1562: \eqnok{trsub.2norm}. These approaches manipulate the parameter

1563: $\lambda$ rather than adjusting the trust-region radius, more in the

1564: spirit of the Levenberg-Marquardt method for least-squares problems

1565: than of a true trust-region method. Hence, their analysis differs

1566: somewhat from that of the preceding section. Moreover, although

1567: quadratic programming solvers that exploit the special structure of

1568: the quadratic term in \eqnok{trsub.2norm} have been designed and

1569: implemented (see \cite{Rus86}), we believe that the linear programming

1570: subproblem \eqnok{master.kl} is more appealing from a practical point

1571: of view. Improvements in the efficiency and ease of use of linear

1572: programming software have continued to occur at a rapid pace, and

1573: availability of high-quality software has made it much easier to

1574: implement an efficient algorithm based on \eqnok{master.kl} than would

1575: have been the case if the subproblems had the form

1576: \eqnok{trsub.2norm}.
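The equivalence between the $2$-norm trust-region subproblem and \eqnok{trsub.2norm} can be sketched as follows. The notation $X$, $N_X$, and $\mu$ is introduced here for illustration only, and we assume that $\Delta_k>0$ and that $x^k$ is feasible, so that a Slater-type constraint qualification holds. Let $X \defeq \{ x \, | \, Ax=b, \, x \ge 0 \}$, let $N_X(x)$ denote the normal cone to $X$ at $x$, and write the trust-region constraint as $\frac12 \| x-x^k \|_2^2 \le \frac12 \Delta_k^2$ with multiplier $\mu \ge 0$. Since $m_{k,\ell}$ is convex, a feasible $x$ solves the trust-region subproblem if and only if
\[
0 \in \partial m_{k,\ell}(x) + \mu (x-x^k) + N_X(x), \sgap
\mu \left( \| x-x^k \|_2^2 - \Delta_k^2 \right) = 0,
\]
for some $\mu \ge 0$. The inclusion is exactly the optimality condition for the convex problem \eqnok{trsub.2norm} with $\lambda = \mu/2$, because the gradient of $\lambda \| x-x^k \|_2^2$ is $2 \lambda (x-x^k)$. When the trust-region constraint is inactive, complementarity forces $\mu=0$ and hence $\lambda=0$.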

1577:

1578:

1579: %

1580: %

1581: %

1582:

1583:

1584: %

1585: %

1586: %

1587: %

1588: %

1589: %

1590:

1591: %

1592: %

1593: %

1594: %

1595: %

1596:

1597:

1598:

1599: \section{An Asynchronous Bundle-Trust-Region Method}

1600: \labtag{sec:atr}

1601:

1602: In this section we present an asynchronous, parallel version of the

1603: trust-region algorithm of the preceding section and analyze its

1604: convergence properties.

1605:

1606: \subsection{Algorithm ATR} \labtag{sec:atr:atr}

1607:

1608: We now define a variant of the method of Section~\ref{sec:tr} that

1609: allows the partial sums $\cQ_{[j]}, j=1,2,\dots,T$ \eqnok{thetaj} and

1610: their associated cuts to be evaluated simultaneously for different

1611: values of $x$. We generate candidate iterates by solving trust-region

1612: subproblems centered on an ``incumbent'' iterate, which (after a

1613: startup phase) is the point $x^I$ that, roughly speaking, is the best

1614: among those visited by the algorithm whose function value $\cQ(x)$ is

1615: fully known.

1616:

1617: By performing evaluations of $\cQ$ at different points concurrently,

1618: we relax the strict synchronicity requirements of Algorithm TR, which

1619: requires $\cQ(x^k)$ to be evaluated fully before the next candidate

1620: $x^{k+1}$ is generated.  The resulting approach, which we call

1621: Algorithm ATR (for ``asynchronous TR''), is more suitable for

1622: implementation on computational grids of the type we consider here.

1623: Besides the obvious increase in parallelism that goes with evaluating

1624: several points at once, there is no longer a risk of the entire

1625: computation being held up by the slow evaluation of one of the partial

1626: sums $\cQ_{[j]}$ on a recalcitrant worker. Algorithm ATR has similar

1627: theoretical properties to Algorithm TR, since the mechanisms for

1628: accepting a point as the new incumbent, adjusting the size of the

1629: trust region, and adding and deleting cuts are all similar to the

1630: corresponding mechanisms in Algorithm TR.

1631:

1632: Algorithm ATR maintains a ``basket'' $\cB$ of at most $K$ points for

1633: which the value of $\cQ$ and associated subgradient information is

1634: partially known. When the evaluation of $\cQ(x^q)$ is completed for a

1635: particular point $x^q$ in the basket, it is installed as the new

1636: incumbent if (i) its objective value is smaller than that of the

1637: current incumbent $x^I$; and (ii) it passes a trust-region acceptance

1638: test like \eqnok{tr.accept}, with the incumbent {\em at the time $x^q$

1639:   was generated} playing the role of the previous major iteration in

1640: Algorithm TR.  Whether $x^q$ becomes the incumbent or not, it is

1641: removed from the basket.

1642:

1643: When a vacancy arises in the basket, we may generate a new point by

1644: solving a trust-region subproblem similar to \eqnok{trsub.kl},

1645: centering the trust region at the current incumbent $x^I$.  During the

1646: startup phase, while the basket is being populated, we wait until the

1647: evaluation of some other point in the basket has reached a certain

1648: level of completion (that is, until a proportion $\sigma \in (0,1]$ of

1649: the partial sums \eqnok{thetaj} and their subgradients have been

1650: evaluated) before generating a new point. We use a logical variable

1651: ${\tt speceval}_q$ to indicate when the evaluation of $x^q$ passes the

1652: specified threshold and to ensure that $x^q$ does not trigger the

1653: evaluation of more than one new iterate. (Both $\sigma$ and ${\tt

1654:   speceval}_q$ play a similar role in Algorithm ALS.) After the

1655: startup phase is complete (that is, after the basket has been filled),

1656: vacancies arise only after evaluation of an iterate $x^q$ is

1657: completed.

1658:

1659: %

1660: We use $m(\cdot)$ (without

1661: subscripts) to denote the model function for $\cQ(\cdot)$. When

1662: generating a new iterate, we use whatever cuts are stored at the

1663: time to define $m$.  When solved around the incumbent $x^I$

1664: with trust-region radius $\Delta$, the subproblem is as follows:

1665: \beq

1666: \labtag{trsub.atr1} \mbox{\tt trsub$(x^I, \Delta)$:} \;\; \min_x \,

1667: m(x) \;\; \mbox{subject to} \;Ax=b, \; x \ge 0, \; \| x- x^I

1668: \|_{\infty} \le \Delta.

1669: \eeq

1670: We refer to $x^I$ as the {\em parent incumbent} of the solution of

1671: \eqnok{trsub.atr1}.
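Because the trust region in \eqnok{trsub.atr1} is defined by the $\infty$-norm, it contributes only simple bounds on the components of $x$. The following Python sketch (with a hypothetical function name, and with the model $m$ and the constraints $Ax=b$ left to whatever LP solver is used) shows how these bounds might be merged with the nonnegativity constraint.

```python
def trsub_bounds(x_inc, delta):
    """Per-component bounds combining x >= 0 with ||x - x_inc||_inf <= delta.

    x_inc plays the role of the incumbent x^I and delta the role of Delta.
    """
    if delta <= 0:
        raise ValueError("trust-region radius must be positive")
    return [(max(0.0, xi - delta), xi + delta) for xi in x_inc]
```

For example, with incumbent $(0.5, 3)$ and radius $1$, the first component is confined to $[0, 1.5]$ and the second to $[2, 4]$.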

1672: %

1673: %

1674: %

1675: %

1676: %

1677: %

1678: %

1679: %

1680: %

1681: %

1682: %

1683: %

1684: %

1685:

1686: %

1687: %

1688: %

1689:

1690: In the following description, we use $k$ to index the successive

1691: points $x^k$ that are explored by the algorithm, $I$ to denote the

1692: index of the incumbent, and $\cB$ to denote the basket.  We use $t_k$

1693: to count the number of partial sums $\cQ_{[j]}(x^k)$, $j=1,2,\dots,T$

1694: that have been evaluated so far.

1695:

1696: %

1697: %

1698: %

1699: %

1700: %

1701: %

1702: %

1703: %

1704:

1705: %

1706: %

1707: %

1708: %

1709: %

1710: %

1711: %

1712: %

1713:

1714: Given a starting guess $x^0$, we initialize the algorithm by setting

1715: the dummy point $x^{-1}$ to $x^0$, setting the incumbent index $I$ to

1716: $-1$, and setting the initial incumbent value $\cQ^I =\cQ^{-1}$ to

1717: $\infty$. The iterate at which the first evaluation is completed

1718: becomes the first ``serious'' incumbent.

1719:

1720: We now outline some other notation used in specifying Algorithm ATR:

1721: %

1722: \bi

1723:

1724: \item[$\cQ^I$:] The objective value of the incumbent $x^I$, except in

1725: the case of $I=-1$, in which case $\cQ^{-1} = \infty$.

1726:

1727: \item[$I_q$:] The index of the parent incumbent of $x^q$, that is, the

1728:   incumbent index $I$ at the time that $x^q$ was generated from

1729:   \eqnok{trsub.atr1}. Hence, $\cQ^{I_q} = \cQ(x^{I_q})$ (except when

1730:   $I_q=-1$; see previous item).

1731:

1732: \item[$\Delta_q$:] The value of the trust-region radius $\Delta$ used

1733: when solving for  $x^q$.

1734:

1735: \item[$\Delta_{\rm curr}$:] Current value of the trust-region

1736: radius. When it comes time to solve \eqnok{trsub.atr1} to obtain a new

1737: iterate $x^q$, we set $\Delta_q \leftarrow \Delta_{\rm curr}$.

1738:

1739: \item[$m^q$:] The optimal value of the objective function $m$ in the

1740: subproblem {\tt trsub}$(x^{I_q}, \Delta_q)$ \eqnok{trsub.atr1}.

1741:

1742: %

1743: %

1744:

1745: \ei
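The bookkeeping implied by this notation can be pictured as two small records, one per iterate and one for the master state. The Python sketch below is only one possible layout; all class and field names are hypothetical, and it assumes (as the term ``partial sums'' suggests) that $\cQ(x^q)$ is accumulated by adding the values $\cQ_{[j]}(x^q)$ as they arrive.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Iterate:
    x: list                 # the point x^q
    parent: int             # I_q, index of the parent incumbent (-1 for the dummy)
    delta: float            # Delta_q, radius used when solving for x^q
    model_value: float      # m^q, optimal value of trsub(x^{I_q}, Delta_q)
    tasks_done: int = 0     # t_q, number of partial sums evaluated so far
    speceval: bool = False  # has x^q already triggered a new candidate?
    value: float = 0.0      # running sum of the Q_[j](x^q) received so far

    def record_partial(self, partial_value):
        """Account for one completed task at this iterate."""
        self.tasks_done += 1
        self.value += partial_value

    def is_complete(self, T):
        """True once all T partial sums have been evaluated."""
        return self.tasks_done == T


@dataclass
class MasterState:
    delta_curr: float                   # Delta_curr
    incumbent: int = -1                 # I, starts at the dummy index
    incumbent_value: float = math.inf   # Q^I, infinite while I = -1
    counter: int = 0                    # counter used by Reduce-Delta
    basket: set = field(default_factory=set)  # indices currently in the basket
```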

1746: %

1747: %

1748: %

1749: %

1750: %

1751: %

1752: %

1753: %

1754: %

1755:

1756: Our strategy for maintaining the model closely follows that of

1757: Algorithm TR. Whenever the incumbent changes, we have a fairly free

1758: hand in deleting the cuts that define $m$, just as we do after

1759: accepting a new major iterate in Algorithm TR. If the incumbent does

1760: not change for a long sequence of iterations (corresponding to a long

1761: sequence of minor iterations in Algorithm TR), we can still delete

1762: ``stale'' cuts that represent information in $m$ that has likely been

1763: superseded (as quantified by a parameter $\eta \in [0,1)$). The

1764: following version of Procedure Model-Update, which applies to

1765: Algorithm ATR, takes as an argument the index $k$ of the latest

1766: iterate generated by the algorithm. It is called after the evaluation

1767: of $\cQ$ at an earlier iterate $x^q$ has just been completed, but

1768: $x^q$ does {\em not} meet the conditions needed to become the new

1769: incumbent.

1770: %

1771: \btab

1772: \> {\bf Procedure Model-Update} $(k)$ \\

1773: \> {\bf for each} optimality cut defining $m$\\

1774: \>\> {\tt possible\_delete}  $\leftarrow$ {\tt true}; \\

1775: \>\> {\bf if} the cut was generated at the parent incumbent $I_k$ of $k$\\

1776: \>\>\> {\tt possible\_delete}  $\leftarrow$ {\tt false}; \\

1777: \>\> {\bf else if} the cut was active at the solution $x^k$ of

1778: {\tt trsub}$(x^{I_k},\Delta_k)$ \\

1779: \>\>\> {\tt possible\_delete}  $\leftarrow$ {\tt false}; \\

1780: \>\> {\bf else if} the cut was generated at an earlier

1781: iteration $\bar{\ell}$ \\

1782: \>\>\>\> such that $I_{\bar{\ell}} = I_k \neq -1$ and

1783: \etab

1784: \beq \labtag{atr.cut.delete.criterion}

1785: \cQ^{I_k} - m^k > \eta [ \cQ^{I_k} - m^{\bar{\ell}} ]

1786: \eeq

1787: \btab

1788: \>\>\> {\tt possible\_delete}  $\leftarrow$ {\tt false}; \\

1789: \>\> {\bf end (if)} \\

1790: %

1791: %

1792: \>\> {\bf if} {\tt possible\_delete} \\

1793: \>\>\> possibly delete the cut; \\

1794: \> {\bf end (for each)}

1795: \etab
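The deletion rule of Procedure Model-Update can be read as a predicate on a single cut. The Python sketch below is a hedged illustration: the function name and the dictionary-based bookkeeping (`parent`, `q_inc`, `m_val`) are assumptions made for this example and do not describe the authors' implementation.

```python
def may_delete_cut(cut_origin, cut_active_at_xk, k, parent, q_inc, m_val, eta):
    """Return True if Model-Update(k) leaves the cut eligible for deletion.

    cut_origin        index of the iterate at which the cut was generated
    cut_active_at_xk  True if the cut was active at the solution x^k
    parent            dict: iterate index q -> parent incumbent index I_q
    q_inc             dict: incumbent index -> objective value Q^I
    m_val             dict: iterate index q -> subproblem optimal value m^q
    eta               parameter in [0, 1)
    """
    i_k = parent[k]
    if cut_origin == i_k:
        return False                      # generated at the parent incumbent
    if cut_active_at_xk:
        return False                      # active at the latest solution
    if cut_origin < k and i_k != -1 and parent[cut_origin] == i_k:
        gap_now = q_inc[i_k] - m_val[k]
        gap_then = q_inc[i_k] - m_val[cut_origin]
        if gap_now > eta * gap_then:
            return False                  # gap has not yet shrunk enough
    return True
```

A cut that survives all three tests is only a candidate for deletion; whether it is actually removed is left to the implementation, as in the procedure itself.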

1796: %

1797:

1798: Our strategy for adjusting the trust-region radius $\Delta_{\rm curr}$

1799: also follows that of Algorithm TR. The differences arise from the fact

1800: that between the time an iterate $x^q$ is generated and its function

1801: value $\cQ(x^q)$ becomes known, other adjustments of $\Delta_{\rm curr}$

1802: may have occurred, as the evaluation of intervening iterates

1803: is completed. The version of Procedure Reduce-$\Delta$  for

1804: Algorithm ATR is as follows.

1805: %

1806: \btab

1807: \> {\bf Procedure Reduce-$\Delta(q)$} \\

1808: \> {\bf if} $I_q = -1$ \\

1809: \>\> return; \\

1810: \> evaluate

1811: \etab

1812: \beq \labtag{atr.reduce.delta.2}

1813: \rho = {\min(1,\Delta_q)}

1814: \frac{\cQ(x^q) - \cQ^{I_q}}{\cQ^{I_q}  - m^q};

1815: \eeq

1816: \btab

1817: \> {\bf if} $\rho>0$ \\

1818: \>\> {\tt counter} $\leftarrow$ {\tt counter}$+1$; \\

1819: \> {\bf if} $\rho>3$ {\bf or}

1820: ({\tt counter} $\ge 3$ {\bf and} $\rho \in (1,3]$) \\

1821: \>\> set $\Delta_q^+ \leftarrow \Delta_q / \min(\rho,4)$; \\

1822: \>\> set

1823: $\Delta_{\rm curr} \leftarrow \min(\Delta_{\rm curr}, \Delta_q^+)$; \\

1824: \>\> reset {\tt counter} $\leftarrow 0$; \\

1825: \> return.

1826: \etab
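Procedure Reduce-$\Delta$ translates almost line by line into code. The following Python sketch uses a hypothetical function signature and returns the updated pair ($\Delta_{\rm curr}$, {\tt counter}) instead of modifying global state; it assumes that $\cQ^{I_q} - m^q > 0$, which should hold whenever the convergence test has not been satisfied.

```python
def reduce_delta(parent_q, q_xq, q_parent, m_q, delta_q, delta_curr, counter):
    """Sketch of Reduce-Delta(q); returns (delta_curr, counter).

    parent_q  I_q, index of the parent incumbent of x^q
    q_xq      Q(x^q)
    q_parent  Q^{I_q}
    m_q       m^q
    """
    if parent_q == -1:
        return delta_curr, counter        # no finite reference value yet
    rho = min(1.0, delta_q) * (q_xq - q_parent) / (q_parent - m_q)
    if rho > 0:
        counter += 1
    if rho > 3 or (counter >= 3 and 1 < rho <= 3):
        delta_plus = delta_q / min(rho, 4.0)
        delta_curr = min(delta_curr, delta_plus)  # never increases the radius
        counter = 0
    return delta_curr, counter
```

Taking the minimum with the current radius reflects the fact that other adjustments may have occurred since $x^q$ was generated.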

1827: %

1828:

1829: The protocol for increasing the trust region after a successful step

1830: is based on \eqnok{tr.incr.1}, \eqnok{tr.incr.3}. If on completion of

1831: evaluation of $\cQ(x^q)$, the iterate $x^q$ becomes the new incumbent,

1832: then we test the following condition:

1833: \beq \labtag{atr.incr.1}

1834: \cQ(x^q) \le \cQ^{I_q} - 0.5 (\cQ^{I_q} - m^q) \;\; \mbox{and} \;\;

1835: \| x^q - x^{I_q} \|_{\infty} = \Delta_q.

1836: \eeq

1837: If this condition is satisfied, we set

1838: \beq \labtag{atr.incr.3}

1839: \Delta_{\rm curr} \leftarrow \max(\Delta_{\rm curr},

1840:  \min (\Delta_{\rm hi}, 2 \Delta_q) ).

1841: \eeq
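The increase rule \eqnok{atr.incr.1}, \eqnok{atr.incr.3} can be sketched in Python as follows. The function name and the tolerance `tol` used to test whether the step lies on the trust-region boundary are assumptions added for this illustration; the test in \eqnok{atr.incr.1} is stated as an exact equality.

```python
def maybe_increase_delta(q_xq, q_parent, m_q, step_inf_norm,
                         delta_q, delta_curr, delta_hi, tol=0.0):
    """Return the updated Delta_curr after x^q has become the incumbent.

    step_inf_norm is ||x^q - x^{I_q}||_inf. If q_parent is infinite
    (I_q = -1), the decrease test evaluates to False and the radius
    is left unchanged.
    """
    good_decrease = q_xq <= q_parent - 0.5 * (q_parent - m_q)
    on_boundary = abs(step_inf_norm - delta_q) <= tol
    if good_decrease and on_boundary:
        return max(delta_curr, min(delta_hi, 2.0 * delta_q))
    return delta_curr
```

The outer maximum ensures that a radius already enlarged by another iterate is not reduced by this step.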

1842:

1843: The convergence test is also similar to the test \eqnok{conv.test}

1844: used for Algorithm TR. We terminate if, on generation of a new iterate

1845: $x^k$, we find that

1846: \beq \labtag{conv.test.atr}

1847: \cQ^I - m^k \le \epstol (1+|\cQ^I|).

1848: \eeq
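In floating-point arithmetic the test \eqnok{conv.test.atr} needs some care while the incumbent value is still $\infty$, because both sides of the inequality are then infinite. The Python sketch below (hypothetical function name) guards against this case explicitly; the guard is an assumption of this example and is not part of the test as stated.

```python
import math


def converged(q_inc, m_k, eps_tol):
    """Relative convergence test Q^I - m^k <= eps_tol * (1 + |Q^I|)."""
    if math.isinf(q_inc):
        return False  # no "serious" incumbent yet, so never stop here
    return q_inc - m_k <= eps_tol * (1.0 + abs(q_inc))
```

For instance, with $\cQ^I = 100$ and $\epstol = 10^{-3}$, the algorithm would stop once the model value $m^k$ is within about $0.101$ of $\cQ^I$.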

1849:

1850:

1851: We now specify the four key routines of Algorithm ATR, which serve

1852: a similar function to the four main routines of Algorithm ALS. As in

1853: the earlier case, we assume for simplicity of description that each

1854: task consists of evaluation of the function and a subgradient for

1855: a single cluster (although in practice we may bundle more than one

1856: cluster into a single task). The routine {\tt partial\_evaluate}

1857: executes on worker processors, while the other three routines execute

1858: on the master processor.

1859:


1879:

1880: \btab

1881: \>{\bf ATR:} \ \ {\tt  partial\_evaluate}$(x^q,q,j,\cQ_{[j]}(x^q),g_j)$ \\

1882: \> Given $x^q$, index  $q$, and  partition number $j$,

1883: evaluate $\cQ_{[j]}(x^q)$ from \eqnok{thetaj} \\

1884: \>\> together with a partial subgradient $g_j$ from \eqnok{subg.Qj}; \\

1885: \> Activate {\tt act\_on\_completed\_task}$(x^q,q,j,\cQ_{[j]}(x^q),g_j)$

1886: on the master processor.

1887: \etab

1888:

1889: \medskip

1890:

1891: \btab

1892: \> {\bf ATR:} \ \ {\tt  evaluate}$(x^q,q)$ \\

1893: \> {\bf for} $j=1,2,\dots, T$ (possibly concurrently) \\

1894: \>\> {\tt partial\_evaluate}$(x^q,q,j,\cQ_{[j]}(x^q), g_j)$; \\

1895: \> {\bf end (for)}

1896: \etab

1897:

1898: \medskip

1899:

1900: \btab

1901: \> {\bf ATR:} \ \ {\tt initialization}$(x^0)$ \\

1902: \> choose $\xi \in (0,1/2)$, trust region upper bound

1903: $\Delta_{\rm hi}>0$; \\

1904: \> choose synchronicity parameter $\sigma \in (0,1]$; \\

1905: \> choose maximum basket size $K>0$; \\

1906: \> choose $\Delta_{\rm curr} \in (0, \Delta_{\rm hi}]$,

1907: {\tt counter} $\leftarrow 0$; $\cB \leftarrow \emptyset$; \\

1908: \> $I \leftarrow -1$; $x^{-1} \leftarrow x^0$; $\cQ^{-1} \leftarrow \infty$;

1909: $I_0 \leftarrow -1$; \\

1910: \> $k \leftarrow 0$;

1911: ${\tt speceval}_0 \leftarrow {\tt  false}$;

1912: $t_0 \leftarrow 0$; \\

1913: \> {\tt evaluate}$(x^0,0)$.

1914: \etab

1915:

1916: \medskip

1917:

1918: \btab

1919: \> {\bf ATR:} \ \

1920: {\tt act\_on\_completed\_task}$(x^q,q,j,\cQ_{[j]}(x^q),g_j)$ \\

1921: \> $t_q \leftarrow t_q+1$; \\

1922: \> add $\cQ_{[j]}(x^q)$ and cut $g_j$ to the model $m$; \\

1923: \> {\tt basketFill} $\leftarrow$ {\tt  false};

1924: {\tt basketUpdate} $\leftarrow$ {\tt  false}; \\

1925: \> {\bf if} $t_q=T$ (* evaluation of $\cQ(x^q)$ is complete *) \\

1926: \>\> {\bf if} $\cQ(x^q) < \cQ^I$ {\bf and} (${I_q}=-1$ {\bf or}

1927: $\cQ(x^q) \le \cQ^{I_q} - \xi (\cQ^{I_q} - m^q)$) \\

1928: \>\>\> (* make $x^q$ the new incumbent *) \\

1929: \>\>\>  $I \leftarrow q$;  $\cQ^I \leftarrow \cQ(x^I)$; \\

1930: \>\>\> possibly increase  $\Delta_{\rm curr}$ according to

1931: \eqnok{atr.incr.1} and \eqnok{atr.incr.3}; \\

1932: \>\>\> modify the model function by possibly deleting cuts not arising \\

1933: \>\>\>\> from the evaluation of $\cQ(x^q)$; \\

1934: \>\> {\bf else} \\

1935: \>\>\> call Model-Update$(k)$; \\

1936: \>\>\> call Reduce-$\Delta(q)$ to update $\Delta_{\rm curr}$; \\

1937: \>\> {\bf end (if)} \\

1938: \>\> $\cB \leftarrow \cB \backslash \{ q \}$; \\

1939: \>\> {\tt basketUpdate} $\leftarrow$ {\tt true}; \\

1940:

1941: \> {\bf else if }

1942:  $t_q \ge \sigma T$ {\bf and} $| \cB| <K$ {\bf and} not ${\tt speceval}_q$ \\

1943: \>\> (* basket-filling phase: enough partial sums have been evaluated at $x^q$

1944:  \\

1945: \>\>\> to trigger calculation of a new candidate iterate *) \\

1946: \>\> ${\tt speceval}_q \leftarrow ${\tt true};

1947: {\tt basketFill} $\leftarrow$ {\tt true}; \\

1948: \> {\bf end (if)} \\

1949:

1950: \> {\bf if } {\tt basketFill} {\bf or}

1951: {\tt basketUpdate} \\

1952: \>\> $k \leftarrow k+1$;

1953: set $\Delta_k \leftarrow \Delta_{\rm curr}$; set $I_k \leftarrow I$; \\

1954: \>\> solve {\tt trsub}$(x^I,\Delta_k)$ to obtain $x^k$; \\

1955: \>\> $m^k \leftarrow m(x^k)$; \\

1956: \>\> {\bf if} \eqnok{conv.test.atr} holds \\

1957: \>\>\> STOP;  \\

1958: \>\> $\cB \leftarrow \cB \cup \{ k \}$; \\

1959: \>\> ${\tt speceval}_k \leftarrow${\tt false}; $t_k \leftarrow 0$; \\

1960: \>\> {\tt evaluate}$(x^k,k)$; \\

1961: \> {\bf end (if)}

1962:

1963: \etab
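The two decisions made inside {\tt act\_on\_completed\_task} (whether a fully evaluated point becomes the incumbent, and whether a completed task triggers generation of a new candidate) can be isolated as small predicates. The Python sketch below is illustrative; the function names and return labels are hypothetical, and `t_q` denotes the task count after it has been incremented.

```python
def accepts_as_incumbent(q_xq, q_inc, parent_q, q_parent, m_q, xi):
    """Incumbent test: Q(x^q) < Q^I and
    (I_q = -1 or Q(x^q) <= Q^{I_q} - xi * (Q^{I_q} - m^q))."""
    if not q_xq < q_inc:
        return False
    if parent_q == -1:
        return True
    return q_xq <= q_parent - xi * (q_parent - m_q)


def next_action(t_q, T, sigma, basket_size, K, speceval_q):
    """Classify a completed task at x^q.

    "complete": all T partial sums are known; x^q leaves the basket and a
                replacement candidate is generated (basket update).
    "fill":     enough partial sums are known to trigger one extra
                candidate during the basket-filling phase.
    "none":     no new candidate is generated.
    """
    if t_q == T:
        return "complete"
    if t_q >= sigma * T and basket_size < K and not speceval_q:
        return "fill"
    return "none"
```

In the "fill" case the caller would also set ${\tt speceval}_q$ to true, so that $x^q$ cannot trigger a second candidate.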

1964:

1965: It is not generally true that the first $K$ iterates $x^0, x^1, \dots,

1966: x^{K-1}$ generated by the algorithm are all basket-filling

1967: iterates. Often, an evaluation of some iterate is completed before the

1968: basket has filled completely, so a ``basket-update'' iterate is used

1969: to generate a replacement for this point. Since each basket-update

1970: iterate does not change the size of the basket, however, the number of

1971: basket-filling iterates that are generated in the course of the

1972: algorithm is exactly $K$.

1973:

1974: \subsection{Analysis of Algorithm ATR} \labtag{sec:atr:analysis}

1975:

1976: We now analyze Algorithm ATR, showing that its convergence properties

1977: are similar to those of Algorithm TR. Throughout, we make the

1978: following assumption:

1979: %

1980: \beq \labtag{all.tasks.completed}

1981: \mbox{Every task is completed after a finite  time}.

1982: \eeq

1983: %

1984: %

1985: %

1986: %

1987:

1988: The analysis follows closely that of Algorithm TR presented in

1989: Section~\ref{sec:tr:analysis}. We state the analogues of all the

1990: lemmas and theorems from the earlier section, incorporating the

1991: changes and redefinitions needed to handle Algorithm ATR. Most of the

1992: details of the proofs are omitted, however, since they are similar to

1993: those of the earlier results.

1994:

1995: We start by defining the level set within which the points and

1996: incumbents generated by ATR lie.

1997: \begin{lemma} \labtag{lem:atr1.1}

1998: All incumbents $x^I$ generated by ATR lie in $\cL(\cQ_{\rm max})$,

1999: whereas all points $x^k$ considered by the algorithm lie in

2000: $\cL(\cQ_{\rm max}; \Delta_{\rm hi})$, where $\cL(\cdot)$ and

2001: $\cL(\cdot;\cdot)$ are defined by \eqnok{def.ls} and \eqnok{def.lsn},

2002: respectively, and $\cQ_{\rm max}$ is defined by

2003: \[

2004: \cQ_{\rm max} \defeq \sup \{ \cQ(x) \, | \,

2005: \| x-x^0 \|_{\infty} \le \Delta_{\rm hi} \}.

2006: \]

2007: \end{lemma}

2008: \begin{proof}

2009:   Consider first what happens in ATR before the  first function

2010:   evaluation is complete. Up to this point, all the iterates $x^k$ in

2011:   the basket are generated in the basket-filling part and therefore

2012:   satisfy $\| x^k-x^0 \|_{\infty} \le \Delta_k \le \Delta_{\rm hi}$, with

2013:   $\cQ^{I_k} = \cQ^{-1} = \infty$.

2014:

2015:   When the first evaluation is completed (by $x^k$, say), it trivially

2016:   passes the test to be accepted as the new incumbent. Hence, the

2017:   first noninfinite incumbent value becomes $\cQ^I = \cQ(x^k)$, and

2018:   by definition we have $\cQ^I \le \cQ_{\rm max}$.  Since all later

2019:   incumbents must have objective values smaller than this first

2020:   $\cQ^I$, they all must lie in the level set $\cL(\cQ_{\rm max})$,

2021:   proving our first statement.

2022:

2023: All points $x^k$ generated within {\tt act\_on\_completed\_task} lie

2024: within a distance $\Delta_k \le \Delta_{\rm hi}$ either of $x^0$ or of

2025: one of the later incumbents $x^I$. Since all the incumbents, including

2026: $x^0$, lie in $\cL(\cQ_{\rm max})$, we conclude that the second claim

2027: in the lemma is also true.

2028: %

2029:

2030: \end{proof}

2031:

2032: Analogously with $\beta$ \eqnok{def.beta}, we define a bound on the

2033: subgradients over the set $\cL(\cQ_{\rm max}; \Delta_{\rm hi})$ as

2034: follows:

2035: \beq \labtag{def.barbeta}

2036: \bar{\beta} = \sup \{ \| g \|_1 \, | \, g \in \partial \cQ(x), \,

2037: \mbox{for some $x \in \cL(\cQ_{\rm max};\Delta_{\rm hi})$} \}.

2038: \eeq

2039:

2040: The next result is analogous to Lemma~\ref{lem:mkl}. It shows that for

2041: any sequence of iterates $x^k$ for which the parent incumbent $x^{I_k}$

2042: is the same, the optimal objective value in {\tt trsub}$(x^{I_k},

2043: \Delta_k)$ is monotonically increasing.

2044: \begin{lemma} \labtag{lem:mkl.atr}

2045: Consider any contiguous subsequence of iterates $x^{k}$,

2046: $k=k_1,k_1+1,\dots, k_2$ for which the parent incumbent is identical;

2047: that is, $I_{k_1}=I_{k_1+1}= \cdots = I_{k_2}$. Then we have

2048: \[

2049: m^{k_1} \le m^{k_1+1} \le \cdots \le m^{k_2}.

2050: \]

2051: \end{lemma}

2052: \begin{proof}

2053: We select any $k=k_1, k_1+1, \dots, k_2-1$ and

2054: prove that $m^k \le m^{k+1}$.

2055: Since $x^k$ and $x^{k+1}$ have the same parent incumbent ($x^I$, say),

2056: no new incumbent has been accepted between the generation of these two

2057: iterates, so the wholesale cut deletion that may occur with the

2058: adoption of a new incumbent cannot have occurred.  There may, however,

2059: have been a call to {\tt Model-Update}$(k)$. The

2060: first ``else if'' clause in {\tt Model-Update} would have ensured that

2061: cuts active at the solution of {\tt trsub}$(x^I, \Delta_k)$ were still

2062: present in the model when we solved {\tt trsub}$(x^I, \Delta_{k+1})$ to

2063: obtain $x^{k+1}$. Moreover, since no new incumbent was accepted,

2064: $\Delta_{\rm curr}$ cannot have been increased, and we have

2065: $\Delta_{k+1} \le \Delta_k$. We now use the same argument as in the

2066: proof of Lemma~\ref{lem:mkl} to deduce that $m^{k} \le m^{k+1}$.

2067: \end{proof}

2068:

2069: The following result is analogous to Lemma~\ref{lem:tr:1}. We omit the

2070: proof, which, modulo the change in notation, is identical to that of the earlier

2071: result.

2072: \begin{lemma} \labtag{lem:atr:1}

2073: For all $k=0,1,2,\ldots$ such that $I_k \neq -1$,  we have that

2074: \begin{subequations} \labtag{lem:atr:inequalities}

2075: \beqa

2076: \labtag{atr.2a}

2077: \cQ^{I_k} - m^k & \ge  &

2078: \min \left( \Delta_{k}, \| x^{I_k} - P(x^{I_k})\|_{\infty} \right)

2079: \frac{\cQ^{I_k} - \cQ^*}{\| x^{I_k} - P(x^{I_k}) \|_{\infty}} \\

2080: \labtag{atr.2b}

2081: & \ge &

2082: \hat{\epsilon} \min

2083: \left( \Delta_{k}, \| x^{I_k} - P(x^{I_k})\|_{\infty} \right),

2084: \eeqa

2085: \end{subequations}

2086: where $\hat{\epsilon}>0$ is defined in \eqnok{weak.sharp}.

2087: \end{lemma}

2088:

2089: The following analogue of Lemma~\ref{lem:trbounds} requires a slight

2090: redefinition of the quantity $E_k$ from \eqnok{def:Ek}. We now

2091: define it to be the closest approach by an {\em incumbent} to the

2092: solution set, up to and including iteration $k$; that is,

2093: \beq \labtag{def:Ek:atr}

2094: E_k \defeq \min_{\bar{k} = 0,1,\dots, k; I_{\bar{k}} \neq -1}

2095: \| x^{I_{\bar{k}}} - P(x^{I_{\bar{k}}}) \|_{\infty}.

2096: \eeq

2097: %

2098: We also omit the proof of the following result, which, allowing for

2099: the change of notation, is almost identical to that of

2100: Lemma~\ref{lem:trbounds}.

2101: %

2102: \begin{lemma} \labtag{lem:trbounds:atr}

2103:   There is a constant $\Delta_{\rm lo} >0$ such that for all trust

2104:   regions $\Delta_{k}$ used in the course of Algorithm ATR, we

2105:   have

2106: \[

2107: \Delta_{k} \ge \min( \Delta_{\rm lo}, E_k/4).

2108: \]

2109: \end{lemma}

2110: The value of $\Delta_{\rm lo}$ that works in this case is $\Delta_{\rm

2111:   lo} = (1/4) \min(1, \hat{\epsilon}/\bar{\beta}, \Delta_{\rm hi})$,

2112: where $\bar{\beta}$ comes from \eqnok{def.barbeta}.
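To make the bound of Lemma~\ref{lem:trbounds:atr} concrete, the following Python sketch evaluates $\Delta_{\rm lo}$ and the resulting lower bound on the trust-region radius. The function names are hypothetical, and the numerical values in the usage note are invented purely for illustration.

```python
def delta_lo(eps_hat, beta_bar, delta_hi):
    """Delta_lo = (1/4) * min(1, eps_hat / beta_bar, Delta_hi)."""
    if eps_hat <= 0 or beta_bar <= 0 or delta_hi <= 0:
        raise ValueError("all three constants must be positive")
    return 0.25 * min(1.0, eps_hat / beta_bar, delta_hi)


def radius_lower_bound(eps_hat, beta_bar, delta_hi, e_k):
    """Lower bound min(Delta_lo, E_k / 4) on the radius Delta_k."""
    return min(delta_lo(eps_hat, beta_bar, delta_hi), e_k / 4.0)
```

With the made-up values $\hat{\epsilon}=0.5$, $\bar{\beta}=2$, and $\Delta_{\rm hi}=10$, we obtain $\Delta_{\rm lo} = 0.0625$; if in addition $E_k = 0.1$, the bound on $\Delta_k$ is $0.025$.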

2113:

2114: There is also an analogue of Lemma~\ref{lem:tr:ft} that shows that if

2115: the incumbent remains the same for a number of consecutive iterations,

2116: the gap between incumbent objective value and model function decreases

2117: significantly as the iterations proceed.

2118: %

2119: \begin{lemma} \labtag{lem:atr:ft}

2120:   Let $\epstol=0$ in Algorithm ATR, and let $\bar{\eta}$ be

2121:   any constant satisfying $0<\bar{\eta}<1$, $\bar{\eta}>\xi$,

2122:   $\bar{\eta} \ge \eta$. Choosing any index $k_1$ with $I_{k_1} \neq

2123:   -1$, we have either that the incumbent $I_{k_1}=I$ is eventually

2124:   replaced by a new incumbent or that there is an iteration

2125:   $k_2>k_1$ such that

2126: \beq \labtag{atr.6}

2127: \cQ^{I} - m^{k_2} \le \bar{\eta} \left[

2128: \cQ^{I} - m^{k_1} \right].

2129: \eeq

2130: \end{lemma}

2131: The proof of this result follows closely that of its antecedent

2132: Lemma~\ref{lem:tr:ft}. The key is in the construction of the

2133: Model-Update procedure. As long as

2134: \beq \labtag{atr.7}

2135: \cQ^I - m^k > \eta [\cQ^I - m^{k_1}], \;\; \mbox{for $k \ge k_1$, where

2136: $I=I_{k_1} = I_k$},

2137: \eeq

2138: none of the cuts generated during the evaluation of $\cQ(x^q)$ for any

2139: $q=k_1, k_1+1, \dots, k$ can be deleted. The proof technique of

2140: Lemma~\ref{lem:tr:ft} can then be used to show that the successive

2141: iterates $x^{k_1}, x^{k_1+1}, \dots$ cannot be too closely spaced if

2142: the condition \eqnok{atr.7} is to hold and if all of them fail to

2143: satisfy the test to become a new incumbent. Since they all belong

2144: to a box of finite size centered on $x^I$, there can be only finitely

2145: many of these iterates. Hence, either a new incumbent is adopted

2146: at some iteration $k \ge k_1$ or  condition \eqnok{atr.6} is

2147: eventually satisfied.
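If the incumbent $x^I$ is never replaced, Lemma~\ref{lem:atr:ft} can be applied repeatedly, each time starting from the index produced by the previous application. As a sketch, this yields indices $k_1 < k_2 < k_3 < \cdots$ with
\[
\cQ^I - m^{k_{j+1}} \le \bar{\eta} \left[ \cQ^I - m^{k_j} \right]
\le \cdots \le \bar{\eta}^{\, j} \left[ \cQ^I - m^{k_1} \right],
\;\; j=1,2,\dots.
\]
Since $0 < \bar{\eta} < 1$, the right-hand side tends to zero as $j \to \infty$. Together with the monotonicity of the values $m^k$ established in Lemma~\ref{lem:mkl.atr}, this suggests that $\cQ^I - m^k \downarrow 0$ along the whole sequence whenever the incumbent remains fixed.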

2148:

2149: We now show that the algorithm cannot ``get stuck'' at a nonoptimal

2150: incumbent. The following result is analogous to

2151: Theorem~\ref{th:tr:ft}, and its proof relies on the earlier results in

2152: exactly the same way.

2153: \begin{theorem} \labtag{th:atr:ft}

2154:   Suppose that $\epstol =0$.

2155: \begin{itemize}

2156: \item[(i)] If $x^I \notin \cS$, then this incumbent is replaced by a

2157: new incumbent after a finite time.

2158: \item[(ii)] If $x^I \in \cS$, then either Algorithm ATR terminates

2159: (and verifies that $x^I \in \cS$), or $\cQ^I - m^k \downarrow 0$

2160: as $k \to \infty$.

2161: \end{itemize}

2162: \end{theorem}

2163:

2164: We conclude with the result that shows convergence of the sequence of

2165: incumbents to $\cS$. Once again, the logic of the proof follows that of

2166: the synchronous analogue Theorem~\ref{th:tr:conv}.

2167: %

2168: \begin{theorem} \labtag{th:atr:conv}

2169:   Suppose that $\epstol=0$. The sequence of incumbents

2170:   $\{ x^{I_k} \}_{k=0,1,2,\dots}$ is either finite,

2171: terminating at some $x^I \in \cS$, or is infinite with

2172:  the property that $\| x^{I_k} - P(x^{I_k})

2173:   \|_{\infty} \to 0$.

2174: \end{theorem}

2175:

2176: \section{Implementation on Computational Grids} \labtag{sec:grids}

2177:

2178: We now describe some salient properties of the computational

2179: environment in which we implemented the algorithms, namely, a

2180: computational grid running the Condor system and the MW runtime

2181: support library.

2182:

2183: \subsection{Properties of Grids} \labtag{sec:grids:intro}

2184:

2185: The term ``grid computing'' (synonymously ``metacomputing'') is

2186: generally used to describe parallel computations on a geographically

2187: distributed, heterogeneous computing platform. Within this framework

2188: there are several variants of the concept. The one of interest here is

2189: a parallel platform made up of shared workstations, nodes of PC

2190: clusters, and supercomputers.  Although such platforms are potentially

2191: powerful and inexpensive, they are difficult to harness for productive

2192: use, for the following reasons:

2193: %

2194: \bi

2195: \item Poor communications properties. Latencies between the processors

2196:   may be high, variable, and unpredictable.

2197:

2198: \item Unreliability. Resources may disappear without notice. A

2199:   workstation performing part of our computation may be reclaimed by

2200:   its owner and our job terminated.

2201:

2202: \item Dynamic availability. The pool of available processors grows and

2203: shrinks during the computation, according to the claims of other users

2204: and scheduling considerations at some of the nodes.

2205:

2206: \item Heterogeneity. Resources may vary in their operational

2207: characteristics (memory, swap space, processor speed, operating

2208: system).

2209:

2210: \ei

2211: %

2212: In all these respects, our target platform differs from conventional

2213: multiprocessor platforms (such as IBM SP or SGI Origin machines) and

2214: from Linux clusters.

2215:

2216: \subsection{Condor} \labtag{sec:grids:condor}

2217:

2218: Our particular interest is in grid computing platforms based on the

2219: Condor system~\cite{condor}, which manages distributively owned

2220: collections (``pools'') of processors of different types, including

2221: workstations, nodes from PC clusters, and nodes from conventional

2222: multiprocessor platforms. When a user submits a job, the Condor system

2223: discovers a suitable processor for the job in the pool, transfers the

2224: executable and starts the

2225: job on that processor. It traps system calls (such as input/output

2226: operations), referring them back to the submitting workstation,

2227: and checkpoints the state of the job periodically. It also migrates the

2228: job to a different processor in the pool if the current host becomes

2229: unavailable for any reason (for example, if the workstation is

2230: reclaimed by its owner).  Condor-managed

2231: processes can communicate through a Condor-enabled version of PVM

2232: \cite{PVMbook} or by using Condor's I/O trapping to write into and

2233: read from a series of shared files.

2234:


2288: \subsection{Implementation in MW} \labtag{sec:grids:mw}

2289:

2290: MW (see Goux, Linderoth, and Yoder~\cite{GouLY00} and Goux et

2291: al.~\cite{GouKLY00}) is a runtime support library that facilitates

2292: implementation of parallel master-worker applications on computational

2293: grids. To implement MW on a particular computational grid, a grid

2294: programmer must reimplement a small number of functions to perform

2295: basic operations for communications between processors and management

2296: of computational resources. These functions are encapsulated in the

2297: MWRMComm class. Of more relevance to the current paper is the other

2298: side of MW, the application programming interface presented to the

2299: application programmer. This interface takes the form of a set of

2300: three C$++$ abstract classes that must be reimplemented in a way that

2301: describes the particular application. These classes, named MWDriver,

2302: MWTask, and MWWorker, contain a total of ten methods for which the

2303: user must supply implementations. We describe these methods briefly,

2304: indicating how they are implemented for the particular case of the ATR

2305: and ALS algorithms.

2306:

2307: \paragraph{MWDriver.}

2308:

2309: This class is made up of methods that execute on the submitting

2310: workstation, which acts as the master processor. It contains the

2311: following four C$++$ pure virtual functions. (Naturally, other methods

2312: can be defined as needed to implement parts of the algorithm.)

2313: %

2314: \begin{itemize}

2315: %

2316: \item {\tt get\_userinfo}: Processes command-line arguments and does

2317:   basic setup. In our applications this function reads a command file

2318:   to set various parameters, including convergence tolerances, number

2319:   of scenarios, number of partial sums to be evaluated in each task,

2320:   maximum number of worker processors to be requested, initial trust

2321:   region radius, and so on. It calls the routines that read and store

2322:   the problem data files, and it reads the initial point, if one is

2323:   supplied.  It also performs the operations specified in the {\tt

2324:     initialization} routine of Algorithms ALS and ATR, except for the

2325:   final {\tt evaluate} operation, which is handled by the next

2326:   function.

2327:

2328: %

2329: \item {\tt setup\_initial\_tasks}: Defines the initial pool of tasks.

2330:   In the case of Algorithms ALS and ATR, this function corresponds to

2331:   a call to {\tt evaluate} at $x^0$.

2332:

2333: %

2334: \item {\tt pack\_worker\_init\_data}: Packs the initial data to be

2335:   sent to each worker processor when it joins the pool. In our case,

2336:   the information contained in the input files for the stochastic

2337:   programming problem is sent to each worker.  When the worker

2338:   subsequently receives a task requiring it to solve a number of

2339:   second-stage scenarios, it can use the original input data to

2340:   generate the particular data for its assigned set of scenarios.

2341: %

2342: %

2343: %

2344:   By loading each new worker with the problem data, we avoid having to

2345:   subsequently pass a complete set of data for every scenario in every

2346:   task.

2347:

2348: %

2349: \item {\tt act\_on\_completed\_task}: Is called every time

2350:   a task finishes, to process the results of the task and to take any

2351:   actions arising from these results.  See Algorithms ALS and ATR for

2352:   our definition of this function in our applications.

2353: %

2354: %

2355:

2356: \end{itemize}

2357:

2358: %

2359: %

2360: %

2361:

2362: The MWDriver base class performs many other operations associated with

2363: handling worker processes that join and leave the computation,

2364: assigning tasks to appropriate workers, rescheduling tasks when their

2365: host workers disappear without warning, and keeping track of

2366: performance data for the run. All this complexity is hidden from the

2367: application programmer.

2368:

2369: \paragraph{MWTask.}

2370:

2371: The MWTask is the abstraction of a single task. It holds both the data

2372: describing that task and the results obtained by executing the task.

2373: The user must implement four functions that pack and unpack the task

2374: data and the results into simple data

2375: structures that can be communicated between master and workers using

2376: the appropriate primitives for the particular computational grid

2377: platform on which MW is implemented. In most of the results reported

2378: in Section~\ref{sec:results}, the message-passing facilities of

2379: Condor-PVM were used to perform the communication.  By simply changing

2380: compiler directives, the same algorithmic code can also be implemented

2381: on an alternative communication protocol that uses shared files to

2382: pass messages between master and workers. The large run reported in

2383: the next section used this version of the code.

2384:

2385: %

2386: %

2387:

2388: In our applications, each task evaluates the partial sum

2389: $\cQ_{[j]}(x)$ and a subgradient for a given number of clusters. The

2390: task is described by a range of scenario indices for each cluster in

2391: the task and by a value of the first-stage variables $x$. The results

2392: consist of the function and subgradient for each of the clusters

2393: in the task.

2394:

2395: \paragraph{MWWorker.}

2396:

2397: The MWWorker class is the core of the executable that runs on each

2398: worker. The user must implement two pure virtual functions:

2399:

2400: \begin{itemize}

2401: \item {\tt unpack\_init\_data}: Unpacks the initial information passed

2402:   to the worker by the MWDriver function {\tt

2403:     pack\_worker\_init\_data()} when the worker joins the pool. (See

2404:   the discussion of {\tt pack\_worker\_init\_data} in the MWDriver class.)

2405:

2406: \item {\tt execute\_task}: Executes a single task.

2407: \end{itemize}

2408:

2409: After initializing itself, using the information passed to it by the

2410: master, the worker process sits in a loop, waiting for tasks to be

2411: sent to it. When it detects a new task, it calls {\tt execute\_task}

2412: to compute the results. It passes the results back to the master by

2413: using the appropriate function from the MWTask class, and then returns

2414: to its wait loop. The wait loop terminates when the master sends a

2415: termination message. In our applications, the {\tt execute\_task()}

2416: function formulates the second-stage linear programs in its clusters

2417: by using the information in the task definition and the data passed to

2418: the worker on initialization. It then calls the linear programming

2419: solvers SOPLEX or CPLEX

2420:  to solve these linear programs, and

2421: uses the dual solutions to calculate the subgradient for each cluster.
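
The role of the dual solutions can be sketched as follows, in generic
two-stage notation that is an assumption of this illustration and may
differ in detail from the symbols used in earlier sections. Let $W$,
$T_i$, $h_i$, and $q_i$ denote the recourse matrix, technology matrix,
right-hand side, and cost vector for scenario $i$ (the matrix $T_i$ is
not to be confused with the number of clusters $T$), and let
$\mathcal{N}_j$ denote the index set of the scenarios in cluster $j$.
If the second-stage problem for scenario $i$ is
\[
\min_{y \ge 0} \; q_i^T y \quad \mbox{subject to} \quad W y = h_i - T_i x,
\]
and $\pi_i$ is an optimal dual vector for its equality constraints,
then linear programming duality gives $-T_i^T \pi_i$ as a subgradient
of the optimal value with respect to $x$. Assuming that the partial
sum for cluster $j$ is the probability-weighted sum of these optimal
values, a subgradient for the cluster is
\[
g_{[j]} = - \sum_{i \in \mathcal{N}_j} p_i T_i^T \pi_i .
\]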

2422:

2423:

2424: \section{Computational Results} \labtag{sec:results}

2425:


2447: We now report on computational experiments performed with

2448: implementations of the ALS, TR, and ATR algorithms using MW on the

2449: Condor system. After describing some further details of the

2450: implementations and the experiments, we discuss our choices for the

2451: various algorithmic parameters and how these were varied between runs.

2452: We then tabulate and discuss the results.

2453:

2454: \subsection{Implementations and Experiments}

2455: \label{sec:results:details}

2456:

2457: As noted earlier, we used the Condor-PVM implementation of MW for most

2458: of the runs reported here.


2466: Most of the computational time is taken up with solving linear

2467: programming problems, both by the master process (in the {\tt

2468:   act\_on\_completed\_task} function) and in the tasks, which solve

2469: clusters of second-stage linear programs. We used the CPLEX simplex

2470: solver on the master processor and the SOPLEX public-domain simplex

2471: code (see Wunderling~\cite{soplex}) on the workers. SOPLEX is somewhat

2472: slower in general, but since most of the machines in the Condor pool

2473: do not have CPLEX licenses, there was little alternative but

2474: to use a public-domain code.

2475:

2476: We ran most of our experiments on the Condor pool at the University of

2477: Wisconsin, sometimes using Condor's flocking mechanism to augment this

2478: pool with processors from other sites. The other sites included the

2479: University of New Mexico, Columbia University, and the Linux cluster

2480: Chiba City at Argonne National Laboratory. The architectures included

2481: PCs running Linux, and PCs and Sun workstations running different

2482: versions of Solaris. The number of workers available for our use

2483: varied dramatically between and during each set of trials, because of

2484: the differing priorities of the two accounts we used, the variation of

2485: our priority during each run, the number and priorities of other users

2486: of the Condor pool at the time, and the varying number of machines

2487: available to the pool.  The latter number tends to be larger during

2488: the night, when owners of the individual workstations are less likely

2489: to be using them.  The master process was run on a Linux machine in

2490: some experiments and on an Intel Solaris machine in others.

2491:

2492:

2493: The input files for the problems reported here were in SMPS format

2494: (see Birge et al.~\cite{BirDGGKW87} and Gassmann and

2495: Schweitzer~\cite{GasS97}). We considered two-stage stochastic linear

2496: programs in which the number of scenarios is finite but extremely

2497: large. We used Monte Carlo sampling to obtain approximate problems

2498: with a specified number $N$ of second-stage scenarios. Brief

2499: descriptions of the test problems can be found at \cite{Hol97}.


2509: In each experiment, we supplied a starting point to the code, obtained

2510: from the solution of a different sampled instance of the same problem.

2511: The function value of the starting point was therefore quite close to

2512: the optimal objective value.

2513:

2514:

2515: \subsection{Critical Parameters}

2516: \label{sec:results:parameters}

2517:

2518: As part of the initialization procedure (implemented by the {\tt

2519:   get\_userinfo} function in the MWDriver class), the code reads an

2520: input file in which various parameters are specified. Several

2521: parameters, such as those associated with modifying the size of the

2522: trust region, have fixed values that we have discussed already in the

2523: text. Others are assigned the same values for all algorithms and all

2524: experiments, namely,

2525: \[

2526: \epsilon_{\rm tol} = 10^{-5}, \sgap

2527: \Delta_{\rm hi} = 10^3, \sgap

2528: \Delta_{0,0} = \Delta_0 = 1, \sgap

2529: \xi = 10^{-4}.

2530: \]

2531: We also set $\eta= 0$ in the Model-Update functions in both TR and

2532: ATR. In TR, this choice has the effect of not allowing deletion of

2533: cuts generated during any major iterations, until a new major iterate

2534: is accepted. In ATR, the effect is to not allow deletion of cuts that

2535: are generated at points whose parent incumbent is still the incumbent.

2536: Even among cuts for which {\tt possible\_delete} is still true at the

2537: final conditional statement of the Model-Update procedures, we do not

2538: actually delete the cuts until they have been inactive at the solution

2539: of the trust-region subproblem for a specified number of consecutive

2540: iterations. For TR, we delete the cut if it has been inactive for more

2541: than 100 consecutive minor iterations, while in ATR we delete the cut

2542: if it was last active at subproblem $\ell$, where $\ell < k-100$ and

2543: $k$ is the current iteration index. Our cut deletion strategy is

2544: therefore not at all parsimonious; it tends to lead to subproblems

2545: \eqnok{trsub.kl} and \eqnok{trsub.atr1} with fairly large numbers of

2546: cuts. In most cases, however, the storage required for these cuts and

2547: the time required to solve the subproblems remain reasonable. We

2548: discuss the exceptions below.

2549:

2550: The synchronicity parameter $\sigma$, which arises in Algorithms ALS

2551: and ATR and which specifies the proportion of clusters from a

2552: particular point that must be evaluated in order to trigger evaluation

2553: of a new candidate solution, is varied between $.5$ and $1.0$ in our

2554: experiments.  The size $K$ of the basket $\cB$ is varied between $1$

2555: and $14$. For each problem, the number $T$ of clusters is also varied

2556: in a manner described in the tables, as is the number of tasks into

2557: which the second-stage calculations are divided, which we denote by

2558: $C$. Note that the number of second-stage LPs per chunk is therefore

2559: $N/C$ while the number per cluster is $N/T$.
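
As a concrete illustration, for the sampled problems with
$N = 10{,}000$ scenarios considered below, and for the values of $C$
and $T$ used in the tables, these ratios are
\[
\frac{N}{C} = 1000, \; 400, \; 200 \quad \mbox{for} \quad C = 10, \; 25, \; 50,
\qquad
\frac{N}{T} = 200, \; 100 \quad \mbox{for} \quad T = 50, \; 100.
\]
If the clusters are distributed evenly among the tasks (an assumption
of this illustration), each task contains $T/C$ clusters; for example,
$T=50$ and $C=10$ gives five clusters of $200$ LPs in each task.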

2560:

2561: The MW library allows us to specify an upper bound on the number of

2562: workers we request from the Condor pool, so that we can avoid claiming

2563: more workers than we can utilize effectively. We calculate a rough

2564: estimate of this number based on the number of tasks $C$ per

2565: evaluation of $\cQ(x)$ and the basket size $K$.  For instance, the

2566: synchronous TR and LS algorithms can never use more than $C$ worker

2567: processors, since they evaluate $\cQ$ at just one $x$ at a time. In

2568: the case of TR and ATR, we request $\mbox{mid} (25, 200, \lfloor

2569: (K+1)C/2 \rfloor)$

2570: workers.  For ALS, we request $\mbox{mid}(25,200,2C)$ workers.
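
The following worked values illustrate these formulas, under the
assumption that $\mbox{mid}(a,b,c)$ denotes the middle value (median)
of its three arguments, so that the request is the third argument
clipped to the interval $[25,200]$. For ATR with $K=6$ and $C=25$,
\[
\left\lfloor \frac{(K+1)C}{2} \right\rfloor
= \left\lfloor \frac{7 \cdot 25}{2} \right\rfloor = 87,
\qquad
\mbox{mid}(25, 200, 87) = 87,
\]
while for ATR with $K=9$ and $C=50$,
\[
\left\lfloor \frac{10 \cdot 50}{2} \right\rfloor = 250,
\qquad
\mbox{mid}(25, 200, 250) = 200.
\]
For ALS with $C=50$, we have $2C = 100$ and
$\mbox{mid}(25,200,100) = 100$.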

2571:

2572: We have a single code that implements all four algorithms LS, ALS, TR,

2573: and ATR, using logical branches within the code to distinguish between

2574: the L-shaped and trust-region variants.  There is no distinction in

2575: the code between the two synchronous variants and their asynchronous

2576: counterparts. Instead, by setting $\sigma=1.0$, we force synchronicity

2577: by ensuring that the algorithm considers only one value of $x$ at a

2578: time.

2579:

2580: Whenever a worker processor joins the computation, MW sends it a

2581: benchmark task that typifies the type of task it will receive during

2582: the run. In our case, we define the benchmark task to be the solution

2583: of $N/C$ second-stage LPs. The time required for the processor to

2584: solve this task is logged, and we set the ordering policy so as to

2585: ensure that when more than one worker is available to process a

2586: particular task, the task is sent to the worker that logged the

2587: fastest time on the benchmark task.

2588:

2589: \subsection{Results: Varying Parameter Choices} \label{sec:results:numbers}

2590:

2591: In this section we describe a series of experiments on the same

2592: problem, using different parameter settings, and run under different

2593: conditions on the Condor pool. For these trials, we use the problem

2594: SSN, which arises from a network design application described by Sen,

2595: Doverspike, and Cosares~\cite{SenDC94}. This problem is based on a

2596: graph with 89 arcs, each representing a telecommunications link

2597: between two cities. The first-stage variables represent the

2598: (nonnegative) extra capacity to be added to each of these 89 arcs to

2599: meet an uncertain demand pattern. There is a constraint on the total

2600: added capacity. The demands consist of requests for service between

2601: pairs of nodes in the graph. For each set of requests, a route through

2602: the network of sufficient capacity to meet the requests must be found;

2603: otherwise, a penalty term for each request that cannot be satisfied is

2604: added to the objective. The second-stage problems are network flow

2605: problems for calculating the routing for a given set of demand flows.

2606: Each such problem is nontrivial: 706 variables, 175 constraints, and

2607: 2284 nonzeros in the constraint matrix. The uncertainty lies in the

2608: fact that the demand for service on each of the 86 pairs is not known

2609: exactly. Rather, there are three to seven possible scenarios for

2610: these demands, all independent of each other, giving a total of about

2611: $10^{70}$ possible scenarios. We use Monte Carlo sampling to obtain a

2612: sampled approximation with $N=10,000$ scenarios. The deterministic

2613: equivalent for this sampled approximation has approximately $1.75

2614: \times 10^6$ constraints and $7.06 \times 10^6$ variables. In all the

2615: runs, we used as starting point the computed solution for a different

2616: sampled approximation---one with $20,000$ scenarios and a different

2617: random seed. The starting point had a function value of approximately

2618: $9.868860$, whereas the optimal objective was approximately

2619: $9.832544$.
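
The sizes quoted above can be checked by simple arithmetic. Each of
the $N = 10^4$ second-stage problems contributes $175$ constraints and
$706$ variables, so that, neglecting the comparatively few first-stage
variables and constraints,
\[
10^4 \times 175 = 1.75 \times 10^6,
\qquad
10^4 \times 706 = 7.06 \times 10^6.
\]
Similarly, with $86$ independent demands each taking between three and
seven values, a crude bound that uses only this range places the
number of scenarios between $3^{86} \approx 10^{41}$ and
$7^{86} \approx 10^{72.7}$, which is consistent with the estimate of
about $10^{70}$. Finally, the starting point exceeds the optimal
objective by
\[
9.868860 - 9.832544 = 0.036316,
\]
which is a relative difference of roughly $3.7 \times 10^{-3}$.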

2620:

2621: In the tables below we list the following information.

2622: %

2623: \begin{itemize}

2624: \item {\bf points evaluated}. The number of distinct values of the

2625: first-stage variables $x$ generated by solving the master

2626: subproblem---the problem \eqnok{als.subprob} for Algorithm ALS,

2627: \eqnok{trsub.kl} for Algorithm TR, and \eqnok{trsub.atr1} for

2628: Algorithm ATR.

2629: %

2630: %

2631: %

2632: %

2633: %

2634:

2635: \item {\bf $| \cB |$}. Maximum size of the basket, also denoted above by $K$.

2636:

2637: \item {\bf number of tasks (chunks)}. Denoted above by $C$.

2638:

2639: \item {\bf number of clusters}. Denoted above by $T$, the number of

2640: partial sums \eqnok{thetaj} into which the second-stage problems are

2641: divided.

2642:

2643: \item {\bf max processors}. The number of workers requested.

2644:

2645: \item {\bf average processors}. The average of the number of active

2646: (nonsuspended) worker processors available for use by our problem

2647: during the run.  Because of the dynamic nature of the Condor system,

2648: the actual number of available processors fluctuates continually

2649: during the run.

2650:

2651: \item {\bf parallel efficiency}. The proportion of time for which

2652:   worker processors were kept busy solving second-stage problems

2653:   while they were owned by this run.

2654:

2655: \item {\bf maximum number of cuts in the model}. The maximum number of

2656: (partial) subgradients that are used to define the model function

2657: during the course of the algorithm.

2658:

2659: \item {\bf masterproblem solve time}. The total time spent solving the

2660: master subproblem to generate new candidate iterates during the course of the

2661: algorithm.

2662:

2663: \item {\bf wall clock}. The total time (in minutes) between submission

2664: of the job and termination.

2665:

2666: \end{itemize}
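
One way to make the parallel efficiency measure precise is sketched
here. The symbols $b_w$ and $o_w$ are introduced only for this
illustration and do not appear elsewhere in the paper: $b_w$ is the
total time that worker $w$ spends solving second-stage problems, and
$o_w$ is the total time for which worker $w$ is owned by the run.
Under this reading,
\[
\mbox{parallel efficiency} = \frac{\sum_w b_w}{\sum_w o_w},
\]
where the sums are taken over all workers that participate in the run
at any time. A value near $1$ indicates that the workers were rarely
idle while claimed.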

2667:

2668: %

2669:

2670: \begin{table}

2671: \vspace*{1.0in}

2672: \centering

2673: \begin{tabular}{|c|r|rrr|rrr|rr|r|}

2674: \begin{rotate}{-45} run \end{rotate} &

2675: \begin{rotate}{-45} points evaluated \end{rotate} &

2676: \begin{rotate}{-45} $\sigma$  \end{rotate} &

2677: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &

2678: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &

2679: \begin{rotate}{-45} max. processors allowed \end{rotate} &

2680: \begin{rotate}{-45} av. processors \end{rotate} &

2681: %

2682: \begin{rotate}{-45} parallel efficiency \end{rotate} &

2683: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &

2684: \begin{rotate}{-45} masterproblem solve time (min) \end{rotate} &

2685: \begin{rotate}{-45} wall clock time (min) \end{rotate} \\ \hline

2686:

2687: ALS & 269 & $.5$ & 10 & 50 & 20 & 15 & %

2688: .74 & 5491 & 26 & 368 \\

2689: ALS & 275 & $.5$ & 25 & 50 & 50 & 21 & %

2690: .90 & 5536 & 25 & 270 \\

2691: ALS & 293 & $.5$ & 50 & 50 & 100 & 20 & %

2692: .83 & 5639 & 27 & 329 \\

2693: ALS & 270 & $.7$ & 10 & 50 & 20 & 12 & %

2694: .79 & 5522 & 27 & 509 \\

2695: ALS & 274 & $.7$ & 25 & 50 & 50 & 25 & %

2696: .73 & 5550 & 25 & 281 \\

2697: ALS & 282 & $.7$ & 50 & 50 & 100 & 26 & %

2698: .81 & 5562 & 24 & 254 \\

2699: ALS & 254 & $.85$ & 10 & 50 & 20 & 12 & %

2700: .58 & 5496 & 22 & 575 \\

2701: ALS & 276 & $.85$ & 25 & 50 & 50 & 19 & %

2702: .57 & 5575 & 23 & 516 \\

2703: ALS & 278 & $.85$ & 50 & 50 & 100 & 35 & %

2704: .49 & 5498 & 25 & 260 \\

2705: \hline

2706:

2707:

2708: \end{tabular}

2709: \caption{SSN, with $N=10,000$ scenarios, Algorithm ALS.\label{tab.ssn.10k.exp2}}

2710: \end{table}

2711:

2712: Table~\ref{tab.ssn.10k.exp2} shows the results of a series of trials

2713: of Algorithm ALS with three different values of $\sigma$ ($.5$, $.7$,

2714: and $.85$) and three different choices for the number of chunks $C$

2715: into which the second-stage solutions were divided (10, 25, and 50).

2716: The number of clusters $T$ was fixed at 50, so that up to 50

2717: cuts were generated at each iteration.  For $\sigma=.5$, the number of

2718: values of $x$ for which second-stage evaluations are occurring at any

2719: point in time ranged from 2 to 4 during the runs, while for

2720: $\sigma=.85$, there were never more than 2 points being evaluated

2721: simultaneously.

2722:

2723: When these runs were performed, we were not able to obtain anything

2724: approaching the requested number $2C$ of workers from the Condor pool.

2725: As general trends, we see that the less synchronous variants (with

2726: $\sigma = .5$ and $\sigma=.7$) tend to be faster than the more

2727: synchronous variant (with $\sigma=.85$), except for the final run,

2728: during which more processors were available.  Moreover, larger values

2729: of $C$ also tend to produce faster runs.  We also note that the number

2730: of iterations does not depend strongly on $\sigma$. We would not, of

2731: course, expect $C$ to affect strongly the number of iterations, but

2732: since it affects the manner in which the second-stage evaluation work

2733: is distributed, we {\em would} expect it to affect the run time. Since

2734: the number of workers available to us during this run was limited,

2735: however, we did not see the full benefit of a finer-grained work

2736: distribution ($C=50$), though the relatively low parallel efficiency

2737: of the final run ($\sigma=.85$, $C=50$) indicates that the benefits of

2738: more processors may not have been great in any case.

2739:

2740: A note on typical task sizes: For $C=10$, a typical task required

2741: about $50$--$280$ seconds on a typical worker machine available to us,

2742: while for $C=50$, about $9$--$60$ seconds were required. The large

2743: variation reflects the wide range in processing ability of the

2744: machines available in a pool during a typical run. These numbers also

2745: generally hold for the results in Tables~\ref{tab.ssn.10k.exp4.2} and

2746: \ref{tab.ssn.10k.exp4.1}.
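
These task times can be converted to rough times per second-stage LP
by dividing by the number $N/C$ of LPs in each task, which is $1000$
for $C=10$ and $200$ for $C=50$ when $N=10{,}000$:
\[
\frac{50}{1000} = 0.05 \;\; \mbox{to} \;\; \frac{280}{1000} = 0.28
\;\; \mbox{seconds per LP for} \;\; C=10,
\qquad
\frac{9}{200} = 0.045 \;\; \mbox{to} \;\; \frac{60}{200} = 0.30
\;\; \mbox{seconds per LP for} \;\; C=50.
\]
The two ranges are nearly identical, which suggests that the time for
a task is dominated by the LP solves rather than by any fixed overhead
for each task.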

2747:

2748: By comparing the results from Table~\ref{tab.ssn.10k.exp2} with those

2749: reported in Tables~\ref{tab.ssn.10k.exp4.2} and

2750: \ref{tab.ssn.10k.exp4.1}, we  verified that Algorithm

2751: ALS was not as efficient on this problem as Algorithm TR and certain

2752: variants of Algorithm ATR. One advantage, however, was that the

2753: asymptotic convergence of ALS was quite fast. Having taken many

2754: iterations to build up a model and return to a neighborhood of the

2755: solution after having strayed far from it in early iterations, the

2756: last three to four iterations home in rapidly from a relatively crude

2757: approximate solution (a relative accuracy $(\cQ_{\rm min} -

2758: m(x^{k+1})) / (1 + | \cQ_{\rm min}|)$ of between $.0006$ and $.0026$)

2759: to a solution of high accuracy.

2760: %

2761: %

2762: %

2763: %

2764:

2765: %

2766: %

2767: \begin{table}

2768: \vspace*{1.0in}

2769: \centering

2770: \begin{tabular}{|c|r|rrr|rrr|rr|r|}

2771: \begin{rotate}{-45} run \end{rotate} &

2772: \begin{rotate}{-45} points evaluated \end{rotate} &

2773: \begin{rotate}{-45} $|\cB|$ ($K$) \end{rotate} &

2774: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &

2775: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &

2776: \begin{rotate}{-45} max. processors allowed \end{rotate} &

2777: \begin{rotate}{-45} av. processors \end{rotate} &

2778: \begin{rotate}{-45} parallel efficiency \end{rotate} &

2779: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &

2780: \begin{rotate}{-45} masterproblem solve time (min) \end{rotate} &

2781: \begin{rotate}{-45} wall clock time (min) \end{rotate} \\ \hline

2782:

2783: TR & 48 & - & 10 & 100 & 20 & 19 & .21 & 4284 & 3 & 131 \\

2784: TR & 72 & - & 10 & 50 & 20 & 19 & .26 & 3520 & 3 & 150  \\

2785: %

2786: TR & 39 & - & 25 & 100 & 25 & 22 & .49 & 3126 & 2 & 59 \\

2787: %

2788: TR & 75 & - & 25 & 50 & 25 & 23 & .48 & 3519 & 3 & 114  \\

2789: TR & 43 & - & 50 & 100 & 50 & 42 & .52 & 3860 & 3 & 35  \\

2790: TR & 61 & - & 50 & 50 & 50 & 44 & .53 & 3011 & 3 & 40  \\

2791: \hline

2792:

2793: ATR & 109 & 3 & 10 & 100 & 20 & 18 & .74 & 7680 & 9 & 107  \\

2794: ATR & 121 & 3 & 10 & 50 & 20 & 19 & .66 & 4825 & 6 & 111  \\

2795: ATR & 105 & 3 & 25 & 100 & 50 & 37 & .73 & 7367 & 8 & 49  \\

2796: ATR & 113 & 3 & 25 & 50 & 50 & 41 & .60 & 4997 & 6 & 48  \\

2797: ATR & 103 & 3 & 50 & 100 & 100 & 66 & .55 & 7032 & 9 & 29  \\

2798: ATR & 129 & 3 & 50 & 50 & 100 & 66 & .59 & 5183 & 7 & 32  \\

2799: \hline

2800:

2801: ATR & 167 & 6 & 10 & 100 & 35 & 24 & .93 & 7848 & 13 & 99  \\

2802: ATR & 209 & 6 & 10 & 50 & 35 & 22 & .89 & 5730 & 15 & 92  \\

2803: ATR & 186 & 6 & 25 & 100 & 87 & 49 & .77 & 8220 & 14 & 53  \\

2804: %

2805: %

2806: ATR & 172 & 6 & 25 & 50 & 87 & 49 & .80 & 5945 & 7 & 49 \\

2807: %

2808: ATR & 159 & 6 & 50 & 100 & 175 & 31 & .89 & 7092 & 11 & 65  \\

2809: ATR & 213 & 6 & 50 & 50 & 175 & 40 & .88 & 6299 & 12 & 70  \\

2810: \hline

2811:

2812: ATR & 260 & 9 & 10 & 100 & 50 & 12 & .95 & 14431 & 35 & 267  \\

2813: ATR & 286 & 9 & 10 & 50 & 50 & 23 & .90 & 6528 & 19 & 160  \\

2814: ATR & 293 & 9 & 25 & 100 & 125 & 17 & .93 & 9911 & 30 & 232  \\

2815: ATR & 377 & 9 & 25 & 50 & 125 & 15 & .96 & 7080 & 24 & 321  \\

2816: ATR & 218 & 9 & 50 & 100 & 200 & 28 & .82 & 10075 & 25 & 101  \\

2817: ATR & 356 & 9 & 50 & 50 & 200 & 23 & .93 & 6132 & 23 & 194  \\

2818: \hline

2819:

2820: ATR & 378 & 14 & 10 & 100 & 75 & 18 & .88 & 15213 & 77 & 302  \\

2821: ATR & 683 & 14 & 10 & 50 & 75 & 14 & .98 & 8850 & 48 & 648  \\

2822: ATR & 441 & 14 & 25 & 100 & 187 & 22 & .89 & 14597 & 61 & 312  \\

2823: ATR & 480 & 14 & 25 & 50 & 187 & 20 & .94 & 8379 & 36 & 347  \\

2824: ATR & 446 & 14 & 50 & 100 & 200 & 20 & .83 & 13956 & 64 & 331  \\

2825: ATR & 498 & 14 & 50 & 50 & 200 & 22 & .94 & 7892 & 35 & 329   \\

2826: \hline

2827:

2828: \end{tabular}

2829: \caption{SSN, with $N=10,000$ scenarios, first trial, Algorithms TR and ATR.\label{tab.ssn.10k.exp4.2}}

2830: \end{table}

2831:

2832: %

2833:

2834: \begin{table}

2835: \vspace*{1.0in}

2836: \centering

2837: \begin{tabular}{|c|r|rrr|rrr|rr|r|}

2838: \begin{rotate}{-45} run \end{rotate} &

2839: \begin{rotate}{-45} points evaluated \end{rotate} &

2840: \begin{rotate}{-45} $|\cB|$ ($K$) \end{rotate} &

2841: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &

2842: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &

2843: \begin{rotate}{-45} max. processors allowed \end{rotate} &

2844: \begin{rotate}{-45} av. processors \end{rotate} &

2845: \begin{rotate}{-45} parallel efficiency \end{rotate} &

2846: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &

2847: \begin{rotate}{-45} masterproblem solve time (min) \end{rotate} &

2848: \begin{rotate}{-45} wall clock time (min) \end{rotate} \\ \hline

2849:

2850: TR & 47 & - & 10 & 100 & 20 & 17 & .24 & 3849 & 4 & 192  \\

2851: TR & 67 & - & 10 & 50 & 20 & 13 & .34 & 3355 & 3 & 256 \\

2852: TR & 47 & - & 25 & 100 & 25 & 18 & .49 & 3876 & 4 & 97 \\

2853: TR & 57 & - & 25 & 50 & 25 & 18 & .40 & 2835 & 3 & 119 \\

2854: TR & 42 & - & 50 & 100 & 50 & 30 & .22 & 3732 & 3 & 122 \\

2855: TR & 65 & - & 50 & 50 & 50 & 31 & .25 & 3128 & 4 & 151 \\

2856: \hline

2857:

2858: ATR & 92 & 3 & 10 & 100 & 20 & 11 & .89 & 7828 & 9 & 125 \\

2859: ATR & 98 & 3 & 10 & 50 & 20 & 11 & .84 & 4893 & 5 & 173 \\

2860: ATR & 86 & 3 & 25 & 100 & 50 & 34 & .38 & 6145 & 5 & 70 \\

2861: ATR & 95 & 3 & 25 & 50 & 50 & 32 & .41 & 4469 & 4 & 77 \\

2862: ATR & 80 & 3 & 50 & 100 & 100 & 52 & .23 & 5411 & 5 & 80 \\

2863: ATR & 131 & 3 & 50 & 50 & 100 & 59 & .47 & 4717 & 6 & 55 \\

2864: \hline

2865:

2866: ATR & 137 & 6 & 10 & 100 & 35 & 30 & .57 & 8338 & 12 & 84 \\

2867: ATR & 200 & 6 & 10 & 50 & 35 & 26 & .60 & 5211 & 9 & 130 \\

2868: ATR & 119 & 6 & 25 & 100 & 87 & 52 & .55 & 7181 & 7 & 44 \\

2869: ATR & 199 & 6 & 25 & 50 & 87 & 58 & .48 & 5298 & 9 & 81 \\

2870: ATR & 178 & 6 & 50 & 100 & 175 & 50 & .47 & 9776 & 15 & 77 \\

2871: ATR & 240 & 6 & 50 & 50 & 175 & 61 & .64 & 5910 & 11 & 74 \\

2872: \hline

2873:

2874: ATR & 181 & 9 & 10 & 100 & 50 & 37 & .56 & 8737 & 15 & 96 \\

2875: ATR & 289 & 9 & 10 & 50 & 50 & 19 & .93 & 7491 & 25 & 238 \\

2876: ATR & 212 & 9 & 25 & 100 & 125 & 90 & .66 & 11017 & 21 & 45 \\

2877: ATR & 272 & 9 & 25 & 50 & 125 & 65 & .45 & 6365 & 15 & 105 \\

2878: ATR & 281 & 9 & 50 & 100 & 200 & 51 & .72 & 11216 & 34 & 88 \\

2879: ATR & 299 & 9 & 50 & 50 & 200 & 26 & .83 & 7438 & 27 & 225 \\

2880: \hline

2881:

2882: ATR & 304 & 14 & 10 & 100 & 75 & 38 & .89 & 13608 & 43 & 129 \\

2883: ATR & 432 & 14 & 10 & 50 & 75 & 42 & .95 & 7844 & 28 & 132 \\

2884: ATR & 356 & 14 & 25 & 100 & 187 & 71 & .78 & 13332 & 48 & 111 \\

2885: ATR & 444 & 14 & 25 & 50 & 187 & 45 & .89 & 7435 & 36 & 163 \\

2886: ATR & 388 & 14 & 50 & 100 & 200 & 42 & .79 & 12302 & 52 & 192 \\

2887: ATR & 626 & 14 & 50 & 50 & 200 & 48 & .81 & 7273 & 46 & 254  \\

2888: \hline

2889: \end{tabular}

2890: \caption{SSN, with $N=10,000$ scenarios, second trial, Algorithms TR and ATR.\label{tab.ssn.10k.exp4.1}}

2891: \end{table}

2892:

2893: We now turn to Tables~\ref{tab.ssn.10k.exp4.2} and

2894: \ref{tab.ssn.10k.exp4.1}, which report on two sets of trials on the

2895: same problem as in Table~\ref{tab.ssn.10k.exp2}. In these trials we

2896: varied the following parameters:

2897: \bi

2898: \item {\bf basket size:}

2899: $K=1$ (synchronous TR) as well as $K=3,6,9,14$;

2900:

2901: \item {\bf number of tasks:}

2902: $C=10,25,50$, as in Table~\ref{tab.ssn.10k.exp2};

2903:

2904: \item {\bf number of clusters:} $T=50,100$.

2905: \ei

2906: %

2907: The parameter $\sigma$ was fixed at $.7$ in all these runs.

2908:

2909: The results in Table~\ref{tab.ssn.10k.exp4.2} were obtained with the

2910: master processor running on an Intel Solaris machine, while

2911: Table~\ref{tab.ssn.10k.exp4.1} was obtained with a Linux master.  In

2912: both cases, the Condor pool that we tapped for worker processors was

2913: identical. Therefore, it is possible to do a meaningful comparison

2914: between each line of Table~\ref{tab.ssn.10k.exp4.1} and its

2915: counterpart in Table~\ref{tab.ssn.10k.exp4.2}.  Conditions on the

2916: Condor pool varied between and during each trial. This fact, combined

2917: with the properties of the algorithm, resulted in large variability of

2918: runtime from one trial to the next, as we discuss below.

2919:

2920: The nondeterministic nature of the algorithms is evident in doing a

2921: side-by-side comparison of the two tables. Even for synchronous TR,

2922: the slightly different numerical values of the function and subgradient

2923: returned by different workers in different runs result in

2924: slight variations in the iteration sequence and therefore slight

2925: differences in the number of iterations. For the asynchronous

2926: Algorithm ATR, the nondeterminism is even more marked.  During the

2927: basket-filling phase of the algorithm, computation of a new $x$ is

2928: triggered when a certain proportion of tasks from a current value of

2929: $x$ has been returned. On different runs, the tasks will be returned

2930: in different orders, so the information used by the trust-region

2931: subproblem \eqnok{trsub.atr1} in generating the new point will vary

2932: from run to run, and the resulting iteration sequences will generally

2933: show substantial differences.

2934:

2935: The synchronous TR algorithm is clearly better than the ATR variants

2936: with $K>1$ in terms of total computation, which is roughly

2937: proportional to the number of iterations. In fact, the total amount of

2938: work increases steadily with basket size.  Because of the decreased

2939: synchronicity requirements and the greater parallelism obtained for

2940: $K>1$, the wall clock times (last columns) do not follow quite the

2941: same trend. The wall clock times for basket sizes $K=3$ and $K=6$ are

2942: at least competitive with the results obtained for the synchronous TR

2943: algorithm. The choice $K=6$ gave few of the fastest runs but did yield

2944: consistent performance over all the different choices for the other

2945: parameters, and under different Condor pool conditions.

2946:

2947: %

2948: %

2949: %

2950: %

2951: %

2952: %

2953:

2954: The deleterious effects of synchronicity in Algorithm TR can be seen in

2955: its poor performance on several instances, particularly during the

2956: second trial. Let us compare, for instance, the entries in the two

2957: tables for the variant of TR with $C=50$ and $T=100$. In the first

2958: trial, this run used 42 worker processors on average and took 35

2959: minutes, while in the second trial it used 30 workers on average and

2960: required 122 minutes. The difference in runtime is too large to be

2961: accounted for by the number of workers. Because this is a synchronous

2962: algorithm, the time required for each iteration is determined by the

2963: time required for the slowest worker to return the results of its

2964: task. In the first trial, almost all tasks required between 6 and 35

2965: seconds, except for a few iterations that contained tasks that took up

2966: to 62 seconds. In the second trial, the slowest worker at each

2967: iteration almost always required more than 60 seconds to complete its

2968: task. We return to this point in discussing

2969: Table~\ref{tab.ssn.10k.exp5} below.
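As a rough illustration of this effect (a simplified model whose notation is introduced only for this sketch, not a formula taken from the implementation), let $t_{i,j}$ denote the time taken by task $j$ at iteration $i$ of a synchronous run with $n$ iterations. Ignoring the time spent on the master and in communication, the wall clock time $W$ satisfies
\[
W \ge \sum_{i=1}^{n} \max_{j} \, t_{i,j}.
\]
If we assume one synchronous evaluation per point evaluated, the second-trial run has $n=42$ (Table~\ref{tab.ssn.10k.exp4.1}), and a slowest-task time of about 60 seconds at nearly every iteration gives a lower bound of roughly
\[
42 \times 60 \ \mbox{seconds} = 2520 \ \mbox{seconds} = 42 \ \mbox{minutes},
\]
which already exceeds the 35 minutes required by the entire first-trial run.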

2970:

2971: Other general observations we can make are that 100 clusters give

2972: almost uniformly better results in terms of wall clock time than 50

2973: clusters, although the higher number results in a larger number of

2974: cuts in the trust-region subproblems and an increased amount of time

2975: on the master processor in solving these problems. The latter factor

2976: is critical for $K=9$ and $K=14$, which do not compare

2977: favorably with the smaller values of $K$ on this problem, even if many

2978: more worker processors are available.  For the large basket sizes, the

2979: loss of control induced by the increase in asynchronicity leads to a

2980: significantly larger number of points that are evaluated.

2981:

2982: %

2983: %

2984: %

2985: %

2986: %

2987:

2988: In all cases, it takes some time for the model $m$ to become a good

2989: enough approximation to $\cQ$ that it generates a step that meets the

2990: trust-region acceptance criteria. The six TR runs in

2991: Table~\ref{tab.ssn.10k.exp4.1}, for instance, required 18, 27, 16, 22,

2992: 16, and 26 trust-region subproblems to be solved, respectively, before

2993: they stepped away from the initial point. (Note that, as expected, the

2994: runs with $T=100$ required fewer such iterations than those with

2995: $T=50$.) After the first step is taken, most steps are successful;

2996: that is, the first minor iterate usually is accepted as the next major

2997: iterate. Occasionally, two to four minor iterations are required

2998: before the next major iteration is identified.  Similar behavior is

2999: observed for the runs of ATR, except that successful iterations are

3000: more widely spaced. For the first run with $K=6$ in

3001: Table~\ref{tab.ssn.10k.exp4.1}, for instance, the $37$th solution of

3002: \eqnok{trsub.atr1} yields the first successful step; then 36 of the

3003: following 99 solutions of the subproblem yield successful steps.

3004:

3005:

3006: %

3007: %

3008: \begin{table}

3009: \vspace*{1.0in}

3010: \centering

3011: \begin{tabular}{|c|r|rrr|rrr|rr|r|}

3012: \begin{rotate}{-45} run \end{rotate} &

3013: \begin{rotate}{-45} points evaluated \end{rotate} &

3014: \begin{rotate}{-45} $|\cB|$ ($K$) \end{rotate} &

3015: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &

3016: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &

3017: \begin{rotate}{-45} max. processors allowed \end{rotate} &

3018: \begin{rotate}{-45} av. processors \end{rotate} &

3019: \begin{rotate}{-45} parallel efficiency \end{rotate} &

3020: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &

3021: \begin{rotate}{-45} masterproblem solve time (min) \end{rotate} &

3022: \begin{rotate}{-45} wall clock time (min) \end{rotate} \\ \hline

3023:

3024: TR & 47 & - & 25 & 100 & 25 & 23 & .49 & 4040 & 3 & 58 \\

3025: TR & 44 & - & 25 & 100 & 25 & 21 & .31 & 3220 & 3 & 97 \\

3026: TR & 45 & - & 25 & 100 & 25 & 20 & .23 & 3966 & 4 & 158 \\ \hline

3027:

3028: TR & 51 & - & 50 & 100 & 50 & 37 & .33 & 4428 & 3 & 48 \\

3029: TR & 51 & - & 50 & 100 & 50 & 45 & .14 & 4806 & 3 & 135 \\

3030: TR & 46 & - & 50 & 100 & 50 & 41 & .15 & 3847 & 4 & 135 \\ \hline

3031:

3032: ATR & 81 & 3 & 25 & 100 & 50 & 43 & .38 & 7451 & 6 & 64 \\

3033: ATR & 81 & 3 & 25 & 100 & 50 & 39 & .41 & 6461 & 5 & 64 \\

3034: ATR & 87 & 3 & 25 & 100 & 50 & 36 & .44 & 6055 & 8 & 66 \\ \hline

3035:

3036: ATR & 106 & 3 & 50 & 100 & 100 & 84 & .28 & 8222 & 9 & 53 \\

3037: ATR & 95  & 3 & 50 & 100 & 100 & 65 & .26 & 6786 & 7 & 64 \\

3038: ATR & 94  & 3 & 50 & 100 & 100 & 23 & .44 & 6593 & 8 & 105 \\ \hline

3039:

3040: ATR & 171 & 6 & 25 & 100 & 87 & 70 & .45 & 9173 & 19 & 61 \\

3041: ATR & 135 & 6 & 25 & 100 & 87 & 61 & .39 & 7354 & 12 & 75 \\

3042: ATR & 145 & 6 & 25 & 100 & 87 & 38 & .35 & 8919 & 16 & 146 \\ \hline

3043:

3044: ATR & 177 & 6 & 50 & 100 & 175 & 87 & .41 & 9263 & 22 & 54 \\

3045: ATR & 162 & 6 & 50 & 100 & 175 & 93 & .34 & 7832 & 18 & 66 \\

3046: ATR & 159 & 6 & 50 & 100 & 175 & 39 & .27 & 8215 & 22 & 199 \\ \hline

3047:

3048: \end{tabular}

3049: \caption{SSN final trial with best parameter combinations, $N=10,000$ scenarios, Algorithms TR and ATR.\label{tab.ssn.10k.exp5}}

3050: \end{table}

3051:

3052:

3053: In Table~\ref{tab.ssn.10k.exp5}, we took the most promising parameter

3054: combinations from Tables~\ref{tab.ssn.10k.exp4.1} and

3055: \ref{tab.ssn.10k.exp4.2} and ran three trials with each combination.

3056: The Condor pool conditions varied widely during this trial, as can be

3057: seen by the way that the average number of workers varies within each

3058: group of three runs. For the asynchronous ATR runs, the differences in

3059: wall clock times within each set of three runs usually can be

3060: explained in terms of the varying number of workers available. (A

3061: possible exception is the last line of the table, the third run of ATR

3062: with $K=6$, $C=50$ and $T=100$, which took almost four times as long

3063: as the first run while having only slightly fewer than half as many

3064: processors. While the speed of machines available was roughly similar

3065: between these runs, the third run was plagued with numerous

3066: suspensions as the workers were reclaimed by their owners. Total time

3067: that workers were suspended was over 23,000 seconds on the third run

3068: and less than 2,800 seconds during the first run.)  On the other hand,

3069: the variability in wall clock time between the six runs of the

3070: synchronous TR algorithm was due not to the number of available

3071: workers but rather to the synchronicity effect described above. In the

3072: run reported in the first line of the table, for instance, the slowest

3073: worker on any iteration typically took less than 65 seconds. In the

3074: run reported on the third line, the time required by the slowest

3075: worker varied significantly but was typically much longer, 150 seconds

3076: and more.

3077:


3120:

3121: \subsection{Larger Instances} \label{sec:results:large}

3122:

3123: We also performed runs on several larger instances of SSN (with

3124: %

3125: $N=100,000$ scenarios) and on some very large instances

3126: of the stormG2 problem, a cargo flight scheduling application described

3127: by Mulvey and Ruszczy{\'n}ski~\cite{MulR95}.  Our interest

3128: in this section is more in the sheer size of the problems that can be

3129: solved using the algorithms developed for the computational grid

3130: than with the relative performance of the algorithms with

3131: different parameter settings.

3132:


3189:

3190: %

3191: %

3192: \begin{table}

3193: \vspace*{1.0in}

3194: \centering

3195: \begin{tabular}{|c|r|rrr|rrr|rr|r|}

3196: \begin{rotate}{-45} run \end{rotate} &

3197: \begin{rotate}{-45} points evaluated \end{rotate} &

3198: \begin{rotate}{-45} $|\cB|$ ($K$) \end{rotate} &

3199: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &

3200: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &

3201: \begin{rotate}{-45} max. processors allowed \end{rotate} &

3202: \begin{rotate}{-45} av. processors \end{rotate} &

3203: \begin{rotate}{-45} parallel efficiency \end{rotate} &

3204: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &

3205: \begin{rotate}{-45} masterproblem solve time (min) \end{rotate} &

3206: \begin{rotate}{-45} wall clock time (min) \end{rotate} \\ \hline

3207: ATR & 177 & 3 & 100 & 100 & 200 & 38 & .52 & 10558 & 47 & 1357 \\

3208: \hline

3209: \end{tabular}

3210: \caption{SSN, with $N=100,000$ scenarios.\label{tab.ssn.100k}}

3211: \end{table}

3212:

3213: Table~\ref{tab.ssn.100k} shows results for a sampled instance of SSN

3214: with $N=100,000$ scenarios, which is a linear program with

3215: approximately $1.75 \times 10^7$ constraints and $7.06 \times 10^7$

3216: variables. This run was performed at a time when not many machines

3217: were available, and many suspensions occurred during the run. We chose

3218: $T=100$ chunks per evaluation and found that most tasks required

3219: between 41 and 300 seconds on the workers, with a few task times of

3220: more than 500 seconds. (The benchmarks indicated that the worker speed

3221: varied over a factor of 7.)  A total of 77 different workers were used

3222: during the run, though the average number of nonsuspended workers

3223: available at any time was only 39. In fact, at any given point in the

3224: computation there were an average of 7 workers assigned to this task

3225: that were suspended. Still, a result was obtained in about 22 hours.
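As a quick consistency check on these figures (simple arithmetic on the numbers quoted above; the per-scenario counts are inferred here rather than stated independently, and the first-stage contribution is assumed to be negligible at this scale), dividing the problem dimensions by $N$ gives
\[
\frac{1.75 \times 10^7}{10^5} = 175, \qquad \frac{7.06 \times 10^7}{10^5} = 706,
\]
that is, roughly 175 constraints and 706 variables per scenario. Similarly, the wall clock time of 1357 minutes in Table~\ref{tab.ssn.100k} corresponds to $1357/60 \approx 22.6$ hours.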

3226:

3227: \begin{table}

3228: \vspace*{1.0in}

3229: \centering

3230: \begin{tabular}{|c|r|rrr|rrr|rr|r|}

3231: \begin{rotate}{-45} run \end{rotate} &

3232: \begin{rotate}{-45} points evaluated \end{rotate} &

3233: \begin{rotate}{-45} $|\cB|$ ($K$) \end{rotate} &

3234: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &

3235: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &

3236: \begin{rotate}{-45} max. processors allowed \end{rotate} &

3237: \begin{rotate}{-45} av. processors \end{rotate} &

3238: \begin{rotate}{-45} parallel efficiency \end{rotate} &

3239: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &

3240: \begin{rotate}{-45} masterproblem solve time (min) \end{rotate} &

3241: \begin{rotate}{-45} wall clock time (min) \end{rotate} \\ \hline

3242: TR  & 17 & -   & 125 & 125  & 250 & 106 & .55 & 2310 & 0.5 & 146  \\ %

3243: ATR & 25 & 3  & 125 & 125 & 250 & 106 & .90 & 3292 & 0.5 & 116 \\ \hline %

3244: \end{tabular}

3245: \caption{stormG2, with $N=250000$ scenarios. \label{tab.storm.250k}}

3246: \end{table}

3247:

3248: In the stormG2 problem of Mulvey and Ruszczy{\'n}ski~\cite{MulR95}, the

3249: first-stage problem contained 121 variables, while each second-stage

3250: problem contained 1259 variables.  We considered first a sampled

3251: approximation of this problem with 250000 scenarios, which resulted

3252: in a linear program with $1.32 \times 10^8$ constraints and $3.15 \times 10^8$

3253: unknowns.  Results are shown in Table~\ref{tab.storm.250k}. The

3254: algorithm was started at a solution of a sampled instance with fewer

3255: scenarios and was quite close to optimal. The objective function at

3256: the initial point was approximately $15499595.1$, compared with an

3257: optimal value of $15499591.9$ achieved by Algorithm TR. In fact, the

3258: TR algorithm takes only one major iteration---it accepts the 16th

3259: minor iteration as the first major iterate $x^1$. The ATR variant does

3260: not take even one step---it terminates after determining that the

3261: initial point $x^0$ is optimal to within the given convergence

3262: tolerance.  Although we requested 250 processors, an average of only

3263: 106 were available during the time that we performed these two test

3264: runs. The second run is able to utilize these to high efficiency, as

3265: the second-stage workload can be divided into a large number of chunks

3266: and very little time is spent in solving the trust-region subproblem.
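The variable count can be checked directly from the dimensions given above (a worked calculation, assuming the usual deterministic-equivalent structure of one copy of the second-stage variables per scenario plus the first-stage variables):
\[
121 + 250000 \times 1259 = 314750121 \approx 3.15 \times 10^8 .
\]
Likewise, $1.32 \times 10^8$ constraints over $250000$ scenarios corresponds to about $528$ constraints per scenario.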

3267:

3268: \begin{table}

3269: \vspace*{1.0in}

3270: \centering

3271: \begin{tabular}{|c|r|rrr|rrr|rr|r|}

3272: \begin{rotate}{-45} run \end{rotate} &

3273: \begin{rotate}{-45} points evaluated \end{rotate} &

3274: \begin{rotate}{-45} $|\cB|$ ($K$) \end{rotate} &

3275: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &

3276: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &

3277: \begin{rotate}{-45} max. processors allowed \end{rotate} &

3278: \begin{rotate}{-45} av. processors \end{rotate} &

3279: \begin{rotate}{-45} parallel efficiency \end{rotate} &

3280: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &

3281: \begin{rotate}{-45} masterproblem solve time (hr) \end{rotate} &

3282: \begin{rotate}{-45} wall clock time (hr) \end{rotate} \\ \hline

3283: ATR & 28 & 4 & 1024 & 1024 & 800 & 433 & .668 & 39647 & 1.9 & 31.9 \\ \hline

3284: \end{tabular}

3285: \caption{stormG2, with $N=10^7$ scenarios.\label{tab.storm.1e7}}

3286: \end{table}

3287:

3288: Finally, we report on a very large sampled instance of stormG2 with

3289: $N=10^7$ scenarios, an instance whose deterministic equivalent is a

3290: linear program with $9.85 \times 10^8$ constraints and $1.26 \times

3291: 10^{10}$ variables.  Performance is profiled in

3292: Table~\ref{tab.storm.1e7}.

3293:

3294: We used the tighter convergence tolerance $\epstol = 10^{-6}$ for this

3295: run. The algorithm took successful steps at iterations 28, 34, 37, and

3296: 38, the last of these being the final iteration. The first evaluated

3297: point had a function value of

3298: %

3299: $15526740$, compared with a value of

3300: %

3301: $15498842$ at the final iteration.

3302: %

3303: %

3304: %

3305: %

3306:

3307: For this run, we augmented the Wisconsin Computer Science Condor pool with

3308: machines from Georgia Tech, the University of New Mexico, the Italian

3309: National Institute for Nuclear Physics (INFN), the NCSA at the University of Illinois,

3310: the IEOR Department at Columbia, the Albu, and the Wisconsin

3311: Engineering Department.  Table~\ref{bigstorm.tab} shows

3312: the number and type of processors available at each of these

3313: locations.

3314: %

3315: %

3316: %

3317: In contrast to the other runs

3318: reported here, we used the ``MW-files'' implementation of MW, the

3319: variant that uses shared files to perform communication between master

3320: and workers rather than Condor-PVM.

3321:

3322: \begin{table}

3323: \centering

3324: \begin{tabular}{|c|c|c|} \hline

3325: Number & Type & Location \\ \hline

3326: 184 & Intel/Linux & Argonne \\ \hline

3327: 254  & Intel/Linux & New Mexico \\ \hline

3328: 36  & Intel/Linux & NCSA \\ \hline

3329: 265 & Intel/Linux & Wisconsin \\

3330: 88 & Intel/Solaris & Wisconsin \\

3331: 239 & Sun/Solaris & Wisconsin \\ \hline

3332: 124 & Intel/Linux & Georgia Tech  \\

3333: 90  & Intel/Solaris & Georgia Tech  \\

3334: 13 & Sun/Solaris & Georgia Tech \\ \hline

3335: 9   & Intel/Linux & Columbia U.  \\

3336: 10  & Sun/Solaris & Columbia U.  \\ \hline

3337:  33  & Intel/Linux & Italy (INFN)  \\ \hline \hline

3338: 1345 & & \\ \hline

3339: \end{tabular}

3340: \caption{Machines available for stormG2, with $N=10^7$

3341: scenarios.\label{bigstorm.tab}}

3342: \end{table}

3343:

3344: The job ran for a total of almost 32 hours.  The number of workers

3345: being used during the course of the run is shown in

3346: Figure~\ref{bigstorm-workers.fig}.  The job was stopped after

3347: approximately 8 hours and was restarted manually from a checkpoint

3348: about 2 hours later.  It then ran for approximately 24 hours to

3349: completion.  The number of workers dropped off significantly on two

3350: occasions.  The drops were due to the master processor ``blocking'' to

3351: solve a difficult master problem and to checkpoint the state of the

3352: computation.  During this time the worker processors were idle, and

3353: MW decided to release a number of the processors rather than have them

3354: sit idle.

3355:

3356: \begin{figure}

3357: \centering

3358: \epsfig{figure=storm1e7workers.ps,angle=270,width=\linewidth}

3359: \caption{Number of workers used for stormG2, with $N=10^7$ scenarios.\label{bigstorm-workers.fig}}

3360: \end{figure}

3361:

3362: As noted in Table~\ref{tab.storm.1e7}, an average of 433 workers were

3363: present at any given point in the run. The computation used a maximum

3364: of 556 workers, and the fastest machine was 12 times as fast as the

3365: slowest, as determined by the benchmarks. A total

3366: of 40837 tasks were generated during the run, representing $3.99

3367: \times 10^8$ second-stage linear programs. (At this rate, an average

3368: of 3472 second-stage linear programs were being solved per second

3369: during the run.) The average time to solve a task was 774 seconds.

3370: The total cumulative CPU time spent by the worker pool was 9014 hours,

3371: or just over one year of computation.
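The rates quoted above follow from simple arithmetic on the reported totals (a worked check; small differences from the quoted figures are due to rounding of the inputs). With a wall clock time of $31.9$ hours from Table~\ref{tab.storm.1e7},
\[
\frac{3.99 \times 10^8}{31.9 \times 3600 \ \mbox{seconds}} \approx 3.47 \times 10^3 \ \mbox{linear programs per second},
\]
and the number of linear programs per task is
\[
\frac{3.99 \times 10^8}{40837} \approx 9.77 \times 10^3,
\]
which agrees with $N/C = 10^7/1024 \approx 9.77 \times 10^3$ scenarios per task. The cumulative CPU time converts as $9014/24 \approx 376$ days.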

3372:


3398:

3399:

3400: \section{Conclusions}

3401:

3402: We have described L-shaped and trust-region algorithms for solving the

3403: two-stage stochastic linear programming problem with recourse, and

3404: derived asynchronous variants suitable for parallel implementation on

3405: distributed heterogeneous computational grids. We prove convergence

3406: results for the trust-region algorithms. Implementations based on the

3407: MW library and the Condor system are described, and we report on

3408: computational studies using different algorithmic parameters under

3409: different pool conditions.  Because of the dynamic nature of the

3410: computational pool, it is impossible to arrive at a ``best''

3411: configuration or set of algorithmic parameters for all instances.

3412: Instead, it may be important to adjust the algorithm parameters

3413: dynamically; we suggest this as a line of future research.  Finally,

3414: we report on the solution of some large sampled instances of problems

3415: from the literature, including an instance of the stormG2 problem

3416: whose deterministic equivalent has more than $10^{10}$ unknowns.

3417: Since the use of the computational grid has the greatest benefit on

3418: problems that require large amounts of computation, the algorithms

3419: developed here are best suited to larger (multistage) problems or to

3420: incorporation into a sample average approximation approach (see Shapiro and Homem-de-Mello~\cite{ShaH01}).

3421:

3422: \section*{Acknowledgments}

3423:

3424: This research was supported by the Mathematics, Information, and

3425: Computational Sciences Division subprogram of the Office of Advanced

3426: Scientific Computing Research, U.S. Department of Energy, under

3427: Contract W-31-109-Eng-38.  We also acknowledge the support of the

3428: National Science Foundation, under Grant CDA-9726385.  We would also

3429: like to acknowledge the IHPCL at Georgia Tech, which is supported by a

3430: grant from Intel; the National Computational Science Alliance under

3431: grant number MCA00N015N for providing resources at the University of

3432: Wisconsin, the NCSA SGI/CRAY Origin2000, and the University of New

3433: Mexico/Albuquerque High Performance Computing Center AltaCluster; and

3434: the Italian Istituto Nazionale di Fisica Nucleare (INFN) and Columbia

3435: University for allowing us access to their Condor pools.

3436:

3437: We are grateful to Alexander Shapiro and Sven Leyffer for discussions

3438: about the algorithms presented here.

3439:

3440: \bibliographystyle{plain}

3441: \bibliography{refs}

3442:

3443: \end{document}

3444:

3445: This was an earlier proof of finite termination. It applied to a

3446: version of the termination test in which $\Delta_{k,\ell}$ was present

3447: on the right-hand side. Moreover, it was wrong in the last step, where

3448: we incorrectly used $\Delta_{k,\ell} > \Delta_{\rm lo}$. In fact, as

3449: the new version of Lemma~\ref{lem:trbounds} shows, we have only that

3450: \[

3451: \Delta_{k,\ell} \ge \min( \Delta_{\rm lo}, \| x^k-P(x^k)\|_{\infty}/4).

3452: \]

3453: Still, elements of the proof might be useful if we ever want to devise

3454: a termination test that guarantees some sort of near-optimality.

3455:

3456: \begin{theorem} \labtag{th:fint}

3457: When $\epstol>0$, Algorithm TR terminates finitely.

3458: \end{theorem}

3459: \begin{proof}

3460: In the first part of the proof, we show that the algorithm cannot

3461: ``get stuck'' at a particular $x^k$, generating an infinite sequence

3462: of minor iterations at $x^k$ without eventually satisfying either the

3463: termination test or the acceptance test \eqnok{tr.accept}.

3464:

3465: Consider first the case of $x^k \notin \cS$. From

3466: Lemma~\ref{lem:trbounds}, we have that the right-hand side of the

3467: termination test is bounded below by a positive constant as follows:

3468: \beq \labtag{fint.0}

3469: \epstol \Delta_{k,\ell} (1+| \cQ(x^k)|) \ge \epstol \Delta_{\rm lo} >0.

3470: \eeq

3471: By using the reasoning in the proof of Theorem~\ref{th:tr:ft},

3472: together with the monotonicity property of Lemma~\ref{lem:mkl}, we see

3473: that an infinite sequence of minor iterations would have the property

3474: that

3475: \beq \labtag{fint.1}

3476: \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \downarrow 0.

3477: \eeq

3478: Therefore, the minor iteration sequence must terminate finitely,

3479: either by satisfying the termination test or the trust-region

3480: acceptance test \eqnok{tr.accept}.

3481:

3482: Now consider $x^k \in \cS$, and consider first the situation in which

3483: trust-region radii $\Delta_{k,\ell}$, $\ell=1,2,\dots$ are bounded

3484: below, that is, $\Delta_{k,\ell} \ge \bar{\Delta}$ for some

3485: $\bar{\Delta}>0$ and all $\ell=1,2,\dots$. Then the right-hand side of

3486: \eqnok{conv.test} is strictly positive, that is,

3487: \[

3488: \epstol \Delta_{k,\ell} (1+| \cQ(x^k)|) \ge \epstol \bar{\Delta} >0.

3489: \]

3490: The logic leading to \eqnok{fint.1} again holds for this case, so the

3491: minor iteration sequence must eventually satisfy the convergence test

3492: and terminate.

3493:

3494: For the other case, we have that $x^k \in \cS$ and $\Delta_{k,\ell}

3495: \downarrow 0$ as $\ell \to \infty$. Because of our assumption that the

3496: convergence test \eqnok{conv.test} is not satisfied, we have for all $\ell=1,2,\dots$

3497: that

3498: \beq \labtag{fint.2}

3499: \frac{\cQ(x^k) - m_{k,\ell}(x^{k,\ell})}{\Delta_{k,\ell}} >

3500: \epstol (1+| \cQ(x^k) |) \ge \epstol, \;\;

3501: \ell=1,2,\dots.

3502: \eeq

3503: Because $\Delta_{k,\ell} \to 0$, it follows

3504: from the Reduce-$\Delta$ routine that there are

3505: infinitely many minor iterations $\ell_j$, $j=1,2,\dots$, such that

3506: $\rho>1$, that is,

3507: \beq \labtag{fint.3}

3508: \Delta_{k,\ell_j} \frac{\cQ(x^{k,\ell_j}) - \cQ(x^k)}{\cQ(x^k)-m_{k,\ell_j}(x^{k,\ell_j})} >1.

3509: \eeq

3510: By combining  \eqnok{fint.2} (at $\ell=\ell_j$) with \eqnok{fint.3}, we

3511: obtain

3512: \beq \labtag{fint.4}

3513: \cQ(x^{k,\ell_j}) - \cQ(x^k) > \epstol, \;\; j=1,2,\dots.

3514: \eeq
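To spell out this step (a short derivation using only \eqnok{fint.2} and \eqnok{fint.3}): multiplying \eqnok{fint.2} by $\Delta_{k,\ell_j}>0$ shows that the denominator in \eqnok{fint.3} is positive, so that
\[
\Delta_{k,\ell_j} \left( \cQ(x^{k,\ell_j}) - \cQ(x^k) \right)
> \cQ(x^k) - m_{k,\ell_j}(x^{k,\ell_j})
> \epstol \, \Delta_{k,\ell_j},
\]
and dividing by $\Delta_{k,\ell_j}$ yields \eqnok{fint.4}.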

3515: Using \eqnok{subd.5}, together with $\| g_j \|_1 \le \beta$ for all $g_j \in

3516: \partial \cQ(x^{k,\ell_j})$, we have

3517: \beq \labtag{fint.5}

3518: \cQ(x^{k,\ell_j}) - \cQ(x^k)  \le \beta \| x^k - x^{k,\ell_j} \|_{\infty}

3519: \le \beta \Delta_{k,\ell_j}, \;\; j=1,2,\dots.

3520: \eeq

3521: Since $\Delta_{k,\ell_j} \downarrow 0$ by assumption, \eqnok{fint.5}

3522: contradicts \eqnok{fint.4}, so we conclude that the minor iteration

3523: sequence terminates finitely in this case as well.

3524:

3525: Having shown that no major iterate $x^k$ can give rise to a

3526: non-terminating sequence of minor iterations, we show now that the

3527: sequence of major iterations itself must terminate. Consider first the

3528: case in which $x^k \in \cS$ for some $k$. Since $\cQ(x^{k,\ell}) \ge

3529: \cQ(x^k) = \cQ^*$ for all $\ell=1,2,\dots$, the trust-region

3530: acceptance test \eqnok{tr.accept} can be satisfied only if $\cQ(x^k) -

3531: m_{k,\ell}(x^{k,\ell}) =0$. But if this were the case, the left-hand

3532: side of \eqnok{conv.test} would have been satisfied before

3533: \eqnok{tr.accept} was even tested, and the algorithm would have

3534: stopped. Therefore, the algorithm fails to terminate at $x^k$ only if

3535: an infinite sequence of minor iterations is generated at this

3536: point---a case that we have already ruled out.

3537:

3538: We are left with the case of an infinite sequence of major iterations

3539: $\{ x^k \}_{k=1,2,\dots}$ for which $x^k \notin \cS$ for all

3540: $k=1,2,\dots$.  If \eqnok{conv.test} is never satisfied, we have from

3541: Lemma~\ref{lem:trbounds} that \eqnok{fint.0} holds at all $k$ and

3542: $\ell$. Because the acceptance test \eqnok{tr.accept} is eventually

3543: satisfied by some minor iteration $\ell$ for each $k$, we have from

3544: \eqnok{tr.accept} and \eqnok{conv.test} that

3545: \[

3546: \cQ(x^k) - \cQ(x^{k+1}) \ge

3547: \xi \left( \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \right) \ge

3548: \xi \epstol \Delta_{\rm lo} >0.

3549: \]

3550: This bound implies that $\cQ(x^k) \downarrow -\infty$, contradicting

3551: Assumption~\ref{ass:S}.

3552: \end{proof}

3553:

3554: {\bf The following stuff was the earlier analysis of Algorithm ATR,

3555: much of it now wrong and in any case superseded.}

3556:

3557: \begin{proof}

3558:   Suppose for contradiction that $x^I \notin \cS$ is an incumbent that

3559:   is never replaced by a later trial point $x^k$.  Clearly we must

3560:   have $\cQ^I = \cQ(x^I)$. (The alternative $\cQ^I=\infty$ can happen

3561:   only if no evaluation of $\cQ(\cdot)$ is ever completed; this is

3562:   excluded by \eqnok{all.tasks.completed}.)  In fact, because of

3563:   \eqnok{all.tasks.completed}, the sequence $\{ x^k \}$ is infinite.

3564:   Moreover, since at most $K$ of these points are generated in the

3565:   basket-filling part of {\tt act\_on\_completed\_task}, we have that

3566:   infinitely many of them are obtained by solving a trust-region

3567:   subproblem ${\tt trsub}(x^I, \Delta_k)$ centered on $x^I$. Each time

3568:   one of these points is generated, it eventually contributes cuts to

3569:   the model function $m$ that are never deleted, since these cuts are

3570:   all labeled with the index pair $(I,k)$, and we have by assumption

3571:   that $I \in \cB$ forever. Moreover, by Lemma~\ref{lem:atr1.1}, all

3572:   these $x^k$ lie in $\cL(\cQ_{\rm max}; \Delta_{\rm hi})$, so we can

3573:   define a uniform bound $\bar{\beta}$ on the $1$-norm of the

3574:   subgradients of $\cQ(x)$ for all $x \in \cL(\cQ_{\rm max};

3575:   \Delta_{\rm hi})$, analogously to \eqnok{def.beta}. Equipped with

3576:   $\bar{\beta}$, we can now apply logic very similar to that of

3577:   Lemma~\ref{lem:tr:ft} and Theorem~\ref{th:tr:ft}, with the $x^k$

3578:   obtained by solving ${\tt trsub}(x^I, \Delta_k)$ playing the role of

3579:   the minor iterates of Algorithm TR, to deduce that one of the

3580:   $x^k$'s in question must eventually satisfy the test

3581: \[

3582: \cQ(x^k) \le {\tt target}_k = \cQ(x^I) - \xi \left( \cQ(x^I) - m(x^k) \right).

3583: \]

3584: The $x^k$ that passes this test also trivially passes the test

3585: $\cQ(x^k) < \cQ^I$, so that it replaces $x^I$ as the incumbent,

3586: giving a contradiction.

3587: \end{proof}

3588:

3589: We conclude that unless some incumbent satisfies $x^I \in \cS$, the

3590: sequence of incumbents $\{x^{I_i}\}_{i=0,1,2, \dots}$ must be

3591: infinite. From the conditional test in the basket-update part of {\tt

3592:   act\_on\_completed\_task}, we know that the sequence $\{

3593: \cQ(x^{I_i}) \}_{i=0,1,2, \dots}$ is monotonically decreasing, and

3594: that $\cQ(x^{I_i}) \le {\tt target}_{I_i}$. At most a finite number of these

3595: quantities ${\tt target}_{I_i}$ satisfy ${\tt target}_{I_i} = \infty$ (since the

3596: basket-filling part of {\tt act\_on\_completed\_task} is executed at

3597: most $K$ times), so for infinitely many $I_i$, we have that ${\tt target}_{I_i}$

3598: is defined as in \eqnok{target.k}, and so

3599: \beq \labtag{atr1.chain1}

3600: \cQ(x^{I_i}) \le {\tt target}_{I_i} = \cQ(x^{I_{i_-}}) - \xi  \left(

3601: \cQ(x^{I_{i_-}}) - m(x^{I_i})

3602: \right),

3603: \eeq

3604: %

3605: for some previous incumbent indexed by $I_{i_-}$, with $i_- < i$. It

3606: follows that we can choose at least one infinite chain of incumbents

3607: such that  each point in the chain satisfies the trust-region

3608: acceptance test at the previous point in the chain. That is, we have

3609: a sequence $\{ i_j \}_{j=0,1,2,\dots}$ such that

3610: \beq \labtag{atr1.chain2}

3611: \cQ(x^{I_{i_j}}) \le \cQ(x^{I_{i_{j-1}}}) - \xi  \left(

3612: \cQ(x^{I_{i_{j-1}}}) - m(x^{I_{i_j}})

3613: \right), \sgap j=1,2,\dots.

3614: \eeq

3615: Since at every point in Algorithm ATR, $m(\cdot)$ is a piecewise-linear

3616: underestimate of $\cQ(\cdot)$, and since $m(x^{I_{i_{j-1}}}) =

3617: \cQ(x^{I_{i_{j-1}}})$ at the moment when the right-hand side

3618: ${\tt target}_{I_{i_j}}$ is evaluated, we can use the proof technique of

3619: Lemma~\ref{lem:tr:1} to deduce that

3620: \beqas

3621: m(x^{I_{i_{j-1}}}) -  m(x^{I_{i_j}}) & \ge & \hat{\epsilon}

3622: \min  \left( \Delta_{I_{i_j}},

3623: \| x^{I_{i_{j-1}}} - P(x^{I_{i_{j-1}}})\|_{\infty} \right)  \\

3624: & \ge & \hat{\epsilon}

3625: \min  \left( \Delta_{\rm lo},

3626: \| x^{I_{i_{j-1}}} - P(x^{I_{i_{j-1}}}) \|_{\infty} \right),

3627: \eeqas

3628: at the moment at which ${\tt target}_{I_{i_j}}$ is evaluated. By substituting into

3629: \eqnok{atr1.chain2}, we deduce that

3630: \beq \labtag{atr1.chain3}

3631: \cQ(x^{I_{i_{j-1}}}) - \cQ(x^{I_{i_j}}) \ge

3632: \xi \hat{\epsilon} \min  \left( \Delta_{\rm lo},

3633: \| x^{I_{i_{j-1}}} - P(x^{I_{i_{j-1}}}) \|_{\infty} \right),

3634:  \sgap j=1,2,\dots.

3635: \eeq
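
Written out step by step, the substitution that yields
\eqnok{atr1.chain3} is sketched below. It assumes, as stated above,
that $m(x^{I_{i_{j-1}}}) = \cQ(x^{I_{i_{j-1}}})$ at the moment when
${\tt target}_{I_{i_j}}$ is evaluated, and it introduces no new
quantities:
\beqas
\cQ(x^{I_{i_{j-1}}}) - \cQ(x^{I_{i_j}})
& \ge & \xi \left( \cQ(x^{I_{i_{j-1}}}) - m(x^{I_{i_j}}) \right) \\
& = & \xi \left( m(x^{I_{i_{j-1}}}) - m(x^{I_{i_j}}) \right) \\
& \ge & \xi \hat{\epsilon} \min \left( \Delta_{\rm lo},
\| x^{I_{i_{j-1}}} - P(x^{I_{i_{j-1}}}) \|_{\infty} \right).
\eeqas
The first line rearranges \eqnok{atr1.chain2}, the second uses the
equality of the model and the objective at the previous point in the
chain, and the third applies the model-decrease bound derived above.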

3636:

3637: \begin{theorem} \labtag{th:atr1.3}

3638: Suppose that none of the incumbents $x^I$ lies in the solution

3639: set. Then  $ \lim_{i \to \infty} \| x^{I_i} - P(x^{I_i}) \| = 0$.

3640: \end{theorem}

3641: \begin{proof}

3642: Consider the sequence $\{ \cQ(x^{I_i}) \}$ of objective values of

3643: incumbents. This sequence is monotonically decreasing and is bounded

3644: below by $\cQ^*$, so it has a limit, say $\bar{\cQ}$. Assume that the

3645: strict inequality $\bar{\cQ}> \cQ^*$ is satisfied. We then have for

3646: all $x^{I_i}$ that $\cQ(x^{I_i}) > \bar{\cQ} > \cQ^*$, so by continuity of

3647: $\cQ$ and boundedness of the subdifferential $\partial \cQ$,

3648:  there is $\delta>0$ such that

3649: \beq \labtag{atr1.away}

3650: \| x^{I_i} - P(x^{I_i}) \| \ge

3651: \delta, \sgap \mbox{for all $i=0,1,2,\dots$}.

3652: \eeq

3653: Consider now the infinite chain of incumbents discussed above.

3654: We have from \eqnok{atr1.chain3} and \eqnok{atr1.away} that

3655: \beq

3656: \cQ(x^{I_{i_{j-1}}}) - \cQ(x^{I_{i_j}}) \ge

3657: \xi \hat{\epsilon} \min  \left( \Delta_{\rm lo}, \delta \right) >0,

3658:  \sgap j=1,2,\dots,

3659: \eeq

3660: which implies that $\cQ(x^{I_{i_j}}) \downarrow -\infty$ as $j \to

3661: \infty$, giving a contradiction.

3662: We therefore have that $\cQ(x^{I_i})$ converges monotonically to $\cQ^*$.

3663: The result now follows immediately from \eqnok{weak.sharp}.

3664: \end{proof}
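
The contradiction step in the proof above can be made explicit by a
telescoping sum. As a sketch, write $c \defeq \xi \hat{\epsilon} \min
\left( \Delta_{\rm lo}, \delta \right) > 0$ (the symbol $c$ is
introduced here only for illustration) and suppose that
$\cQ(x^{I_{i_0}})$ is finite. Summing the per-step decrease along the
chain over $j=1,2,\dots,J$ gives
\[
\cQ(x^{I_{i_0}}) - \cQ(x^{I_{i_J}}) =
\sum_{j=1}^J \left( \cQ(x^{I_{i_{j-1}}}) - \cQ(x^{I_{i_j}}) \right)
\ge J c,
\]
so that $\cQ(x^{I_{i_J}}) \le \cQ(x^{I_{i_0}}) - J c$, which tends to
$-\infty$ as $J \to \infty$. This is incompatible with the lower bound
$\cQ(x^{I_{i_J}}) \ge \cQ^*$.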

3665:

3666: