1: \documentclass[matprg]{mcsreport}
2:
3: %
4: %
5: %
6: \usepackage{subeqn}
7: %
8: \usepackage{extra}
9: %
10: \usepackage{rotating}
11: \usepackage{epsfig}
12:
13: %
14: %
15: %
16: %
17: \def\labtag#1{\label{#1}}
18:
19: \newcommand{\epstol}{\epsilon_{\rm tol}}
20:
21: \begin{document}
22: \author{Jeff Linderoth \and Stephen Wright}
23:
24: \title{Decomposition Algorithms for Stochastic
25: Programming on a Computational Grid}
26:
27: \titlerunning{Stochastic Programming on a Computational Grid}
28:
29: \institute{Jeff Linderoth\at
30: Axioma Inc., 501-F Johnson Ferry Road, Suite 450,
31: Marietta, GA 30068;
32: {\tt jlinderoth@axiomainc.com}
33: \and
34: Stephen Wright\at
35: Mathematics and Computer Science Division,
36: Argonne National Laboratory, 9700 South Cass Avenue,
37: Argonne, IL 60439; {\tt wright@mcs.anl.gov}}
38: %
39: %
40: \date{\today}
41: %
42: \subclass{90C15, 65K05, 68W10}
43: \reportnumber{P875--0401, April, 2001}
44: \maketitle
45:
46: \begin{abstract}
47: We describe algorithms for two-stage stochastic linear programming
48: with recourse and their implementation on a grid computing platform.
49: In particular, we examine serial and asynchronous versions of the
50: L-shaped method and a trust-region method. The parallel platform of
51: choice is the dynamic, heterogeneous, opportunistic platform
52: provided by the Condor system. The algorithms are of master-worker
53: type (with the workers being used to solve second-stage problems),
54: and the MW runtime support library (which supports master-worker
55: computations) is key to the implementation. Computational results
56: are presented on large sample average approximations of problems
57: from the literature.
58: \end{abstract}
59:
60: \section{Introduction} \labtag{introduction}
61:
62: Consider the following stochastic optimization problem:
63: \beq \labtag{gen.sp}
64: \min_{x \in S} \, F(x) \defeq \sum_{i=1}^N p_i f(x,\omega_i),
65: \eeq
66: %
67: where $S \subseteq \R^n$ is a constraint set, $\Omega = \{ \omega_1,
68: \omega_2, \dots, \omega_N \}$ is the set of outcomes (consisting of
69: $N$ distinct scenarios), and $p_i$ is the probability associated with
70: each scenario. Problems of the form \eqnok{gen.sp} can arise directly
71: (in many applications, the number of scenarios is naturally finite),
72: or as discretizations of problems over continuous probability spaces,
73: obtained by approximation or sampling. In this paper, we discuss the
74: {\em two-stage stochastic linear programming problem with fixed
75: recourse}, which is a special case of \eqnok{gen.sp} defined as follows:
76: \begin{subequations} \labtag{2stage.lp}
77: \beqa
78: \labtag{2stage.lp.obj}
79: & \min \, c^T x + \sum_{i=1}^N p_i q(\omega_i)^T y(\omega_i), \sgap
80: \mbox{subject to} \\
81: \labtag{2stage.lp.x}
82: & Ax=b, \;\; x \ge 0, \\
83: \labtag{2stage.lp.y}
84: & W y(\omega_i) = h(\omega_i) - T(\omega_i) x, \;\;
85: y(\omega_i) \ge 0,
86: \sgap i=1,2,\dots,N.
87: \eeqa
88: \end{subequations}
89: The unknowns in this formulation are $x$ and $y(\omega_1),
90: y(\omega_2), \dots, y(\omega_N)$, where $x$ contains the ``first-stage
91: variables'' and each $y(\omega_i)$ contains the ``second-stage
92: variables'' associated with the $i$th scenario. The $i$th scenario is
93: characterized by the probability $p_i$ and the data objects
94: $(q(\omega_i), T(\omega_i), h(\omega_i))$.
95:
96: The formulation \eqnok{2stage.lp} is sometimes known as the
97: ``deterministic equivalent'' because it lists the unknowns for all
98: scenarios explicitly and poses the problem as a (potentially very
99: large) structured linear program. An alternative formulation is
100: obtained by recognizing that each term in the second-stage summation
101: in \eqnok{2stage.lp.obj} is a piecewise linear convex function
102: of $x$. Defining the $i$th second-stage problem as a linear program (LP)
103: parametrized by the first-stage variables $x$, that is,
104: \begin{subequations}
105: \labtag{second-stage-lp}
106: \beqa
107: \labtag{second-stage-lp.1}
108: & \cQ_i(x) \defeq \min_{y(\omega_i)} \, q(\omega_i)^T y(\omega_i) \;\;
109: \mbox{subject to} \\
110: \labtag{second-stage-lp.2}
111: & W y(\omega_i) = h(\omega_i) - T(\omega_i) x,
112: \;\; y(\omega_i) \ge 0,
113: \eeqa
114: \end{subequations}
115: and defining the objective in \eqnok{2stage.lp.obj} as
116: \beq \labtag{def.Q}
117: \cQ(x) \defeq c^Tx + \sum_{i=1}^N p_i \cQ_i(x),
118: \eeq
119: we can restate \eqnok{2stage.lp} as
120: \beq \labtag{2stage.pl}
121: \min_x \, \cQ(x), \;\; \mbox{subject to} \; Ax=b, \; x \ge 0.
122: \eeq
123: %
124: %
125: %
126: %
127: %
128:
129: We note several features of the problem \eqnok{2stage.pl}. First, it
130: is clear from \eqnok{def.Q} and \eqnok{second-stage-lp} that $\cQ(x)$
131: can be evaluated for a given $x$ by solving the $N$ linear programs
132: \eqnok{second-stage-lp} separately. Second, we can derive subgradient
133: information for $\cQ_i(x)$ by considering dual solutions of
134: \eqnok{second-stage-lp}. If we fix $x=\hat{x}$ in
135: \eqnok{second-stage-lp}, the primal solution $y(\omega_i)$ and dual
136: solution $\pi(\omega_i)$ satisfy the following optimality conditions:
137: \beqas
138: q(\omega_i) - W^T \pi(\omega_i) \ge 0 & \perp & y(\omega_i) \ge 0, \\
139: W y(\omega_i) & = & h(\omega_i) - T(\omega_i) \hat{x}.
140: \eeqas
141: From these two conditions we obtain that
142: \beq \labtag{theta.2}
143: \cQ_i(\hat{x}) = q(\omega_i)^T y(\omega_i) =
144: \pi(\omega_i)^T W y(\omega_i) =
145: \pi(\omega_i)^T [ h(\omega_i) - T(\omega_i) \hat{x} ].
146: \eeq
147: Moreover, since $\cQ_i$ is piecewise linear and convex, we have for
148: any $x$ that
149: \beq \labtag{subg.property}
150: \cQ_i(x) - \cQ_i(\hat{x}) \ge
151: \pi(\omega_i)^T [ -T(\omega_i) x + T(\omega_i) \hat{x} ] =
152: \left( - T(\omega_i)^T \pi(\omega_i) \right)^T (x-\hat{x}),
153: \eeq
154: which implies that
155: \beq \labtag{subg.Qi}
156: -T(\omega_i)^T \pi(\omega_i) \in \partial \cQ_i(\hat{x}),
157: \eeq
158: where $\partial \cQ_i(\hat{x})$ denotes the subdifferential of $\cQ_i$ at
159: $\hat{x}$. By Rockafellar~\cite[Theorem~23.8]{Roc70}, using
160: polyhedrality of each $\cQ_i$, we have from \eqnok{def.Q} that
161: \beq \labtag{subg.Q}
162: \partial \cQ(\hat{x}) = c + \sum_{i=1}^N p_i \partial \cQ_i(\hat{x}),
163: \eeq
164: for every $\hat{x}$ that lies in the domain of each $\cQ_i$,
165: $i=1,2,\dots,N$.
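
As a concrete illustration of \eqnok{subg.property} and
\eqnok{subg.Qi} (a sketch only; the scalar data $t$, $h$, $q^+$, $q^-$
below are hypothetical and are not taken from any test problem), note
first that \eqnok{subg.property} can be seen from weak duality. The
vector $\pi(\omega_i)$ satisfies $W^T \pi(\omega_i) \le q(\omega_i)$
regardless of $x$, so it is dual feasible for \eqnok{second-stage-lp}
at every $x$, and therefore
\[
\cQ_i(x) \ge \pi(\omega_i)^T [ h(\omega_i) - T(\omega_i) x ]
\;\; \mbox{for all $x$.}
\]
Subtracting \eqnok{theta.2} from this inequality gives
\eqnok{subg.property}. Now consider a simple-recourse scenario with
$n=1$, $W = [\, 1 \;\; -1 \,]$, $T(\omega_i) = t$, $h(\omega_i) = h$,
and $q(\omega_i) = (q^+, q^-)^T$ with $q^+, q^- > 0$. Then
\[
\cQ_i(x) = q^+ \max(h - tx, 0) + q^- \max(tx - h, 0),
\]
and dual feasibility reads $-q^- \le \pi(\omega_i) \le q^+$. When
$h - t\hat{x} > 0$, the optimal dual solution is $\pi(\omega_i) = q^+$,
so \eqnok{subg.Qi} yields the subgradient $-t q^+$, which is the slope
of $\cQ_i$ at $\hat{x}$. When $h - t\hat{x} < 0$, we obtain
$\pi(\omega_i) = -q^-$ and the subgradient $t q^-$. At the kink
$h = t\hat{x}$, every $\pi(\omega_i) \in [-q^-, q^+]$ is dual optimal,
and the values $-t \pi(\omega_i)$ fill out the whole subdifferential.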
166:
167: Let $\cS$ denote the solution set for \eqnok{2stage.pl}; we assume for
168: most of the paper that $\cS$ is nonempty. Since \eqnok{2stage.pl} is a
169: convex program, $\cS$ is closed and convex, and the projection
170: operator $P(\cdot)$ onto $\cS$ is well defined. Because the objective
171: function in \eqnok{2stage.pl} is piecewise linear and the constraints
172: are linear, the problem has a {\em weak sharp minimum} (Burke and
173: Ferris~\cite{BurF93}); that is, there exists $\hat{\epsilon}>0$ such
174: that
175: %
176: \beq \labtag{weak.sharp}
177: \cQ(x) - \cQ^* \ge \hat{\epsilon} \| x- P(x) \|_{\infty},
178: \;\; \mbox{for all $x$ with $Ax=b$, $x \ge 0$,}
179: \eeq
180: where $\cQ^*$ is the optimal value of the objective.
181:
182: The subgradient information can be used by algorithms in different
183: ways. Successive estimates of the optimal $x$ can be obtained by
184: minimizing over a convex underestimate of $\cQ(x)$ constructed from
185: subgradients obtained at earlier iterations,
186: as in the L-shaped method described in
187: Section~\ref{sec:lshaped}. This method can be stabilized by the use of
188: a quadratic regularization term (Ruszczy{\'n}ski~\cite{Rus86},
189: Kiwiel~\cite{Kiw90}) or by the explicit use of a trust region, as in
190: the $\ell_{\infty}$ trust-region approach described in
191: Section~\ref{sec:tr}. Alternatively, when an upper bound on the
192: optimal value $\cQ^*$ is available, one can derive each new iterate
193: from an approximate analytic center of an approximate epigraph. The
194: latter approach has been explored by Bahn et al.~\cite{BahDGV95} and
195: applied to a large stochastic programming problem by Fragni{\`e}re,
196: Gondzio, and Vial~\cite{FraGV00}.
197:
198: Because evaluation of $\cQ_i(x)$ and elements of its subdifferential can be
199: carried out independently for each $i=1,2,\dots,N$, and because such
200: evaluations usually constitute the bulk of the computational workload,
201: implementation on parallel computers is possible. We can partition
202: second-stage scenarios $i=1,2,\dots,N$ into ``chunks'' and define a
203: computational task to be the solution of all the LPs
204: \eqnok{second-stage-lp} in a single chunk. Each such task could be
205: assigned to an available worker processor. Relationships between the
206: solutions of \eqnok{second-stage-lp} for different scenarios can be
207: exploited within each chunk (see Birge and
208: Louveaux~\cite[Section~5.4]{BirL97}). The number of second-stage LPs
209: in each chunk should be chosen to ensure that the computation does
210: not become communication bound. That is, each chunk should be large
211: enough that its processing time significantly exceeds the time
212: required to send the data to the worker processor and to return the
213: results.
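
As a rough illustration of this trade-off (with hypothetical timings
that are not measurements from any experiment), suppose that each
second-stage LP takes about $\tau$ seconds to solve on a worker and
that sending a task and returning its results costs about $\gamma$
seconds in total, where we assume for simplicity that $\gamma$ does
not depend on the chunk size. A chunk of $s$ scenarios then spends a
fraction
\[
\frac{s \tau}{s \tau + \gamma}
\]
of its turnaround time on useful computation. With $\tau = 0.01$ and
$\gamma = 1$, for instance, a chunk of $s = 100$ scenarios achieves a
ratio of only $1/2$, whereas $s = 2000$ raises it to
$20/21 \approx 0.95$.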
214:
234: In this paper, we describe implementations of decomposition algorithms
235: for stochastic programming on a dynamic, heterogeneous computational
236: grid made up of workstations, PCs (from clusters), and supercomputer
237: nodes. Specifically, we use the environment provided by the Condor
238: system~\cite{condor}. We also discuss the MW runtime library (Goux et
239: al.~\cite{GouLY00,GouKLY00}), a software layer that significantly
240: simplifies the process of implementing parallel algorithms in Condor.
241:
290:
291: For the dimensions of problems and parallel platforms considered in
292: this paper, evaluation of the functions $\cQ_i(x)$ and their
293: subgradients at a single $x$ often is insufficient to make
294: effective use of the available processors. Moreover, ``synchronous''
295: algorithms---those that depend for efficiency on all tasks completing
296: in a timely fashion---run the risk of poor performance in an
297: environment such as ours, in which failure or suspension of worker
298: processors while they are processing a task is not an infrequent
299: event. We are led therefore to ``asynchronous'' approaches that
300: consider different points $x$ simultaneously. Asynchronous variants
301: of the L-shaped and $\ell_{\infty}$ trust-region methods are described
302: in Sections~\ref{sec:lshaped:async} and \ref{sec:atr}, respectively.
303:
311: Other parallel algorithms for stochastic programming have been devised
312: by Birge et al.~\cite{BirDHS98}, Birge and Qi~\cite{BirQ88}, and
313: Fragni{\`e}re, Gondzio, and Vial~\cite{FraGV00}. In \cite{BirDHS98}, the
314: focus is on multistage problems in which the scenario tree is
315: decomposed into subtrees, which are processed independently and in
316: parallel on worker processors. Dual solutions from each subtree are
317: used to construct a model of the first-stage objective (using an
318: L-shaped approach like that described in Section~\ref{sec:lshaped}),
319: which is periodically solved by a master process to obtain a new
320: candidate first-stage solution $x$. Parallelization of the linear
321: algebra operations in interior-point algorithms is considered in
322: \cite{BirQ88}, but this approach involves significant data movement
323: and does not scale particularly well. In \cite{FraGV00}, the
324: second-stage problems \eqnok{second-stage-lp} are solved concurrently
325: and inexactly by using an interior-point code. The master process
326: maintains an upper bound on the optimal objective, and this bound
327: along with the subgradients obtained from the second-stage problems
328: yields a polyhedron whose (approximate) analytic center is calculated
329: periodically to obtain a new candidate $x$. The approach is based in
330: part on an algorithm described by Gondzio and Vial~\cite{GonV00}. The
331: numerical results in \cite{FraGV00} report solution of a two-stage
332: stochastic linear program with $2.6$ million variables and $1.2$
333: million constraints in three hours on a cluster of 10 Linux PCs.
334:
335:
336: \section{L-Shaped Methods} \labtag{sec:lshaped}
337:
338: We now describe the L-shaped method, a fundamental algorithm for
339: solving \eqnok{2stage.pl}, and an asynchronous variant.
340:
341: \subsection{The Multicut L-Shaped Method} \labtag{sec:lshaped:multicut}
342:
343: The L-shaped method of Van Slyke and Wets~\cite{VanW69} for solving
344: \eqnok{2stage.pl} proceeds by finding subgradients of partial sums of
345: the terms that make up $\cQ$ \eqnok{def.Q}, together with linear
346: inequalities that define the domain of $\cQ$. The method is
347: essentially Benders decomposition~\cite{Ben62}, enhanced to deal with
348: infeasible iterates. A full description is given in Chapter 5 of
349: Birge and Louveaux~\cite{BirL97}. We sketch the approach here and
350: show how it can be implemented in an asynchronous fashion.
351:
352: We suppose that the second-stage scenarios indexed by $1,2,\dots, N$
353: are partitioned into $T$ clusters denoted by $\cN_1, \cN_2, \dots,
354: \cN_T$. Let $\cQ_{[j]}$ represent the partial sum
355: from \eqnok{def.Q} corresponding to the cluster $\cN_j$:
356: \beq \labtag{thetaj}
357: \cQ_{[j]}(x) = \sum_{i \in \cN_j} p_i \cQ_i(x).
358: \eeq
359: %
360: The algorithm maintains a model function $m^k_{[j]}$, which is a
361: piecewise linear lower bound on $\cQ_{[j]}$ for each $j$. We define
362: this function at iteration $k$ by
363: \beq \labtag{Qjk}
364: m_{[j]}^k(x) = \inf \{ \theta_j \, | \,
365: \theta_j e \ge F_{[j]}^k x + f_{[j]}^k \},
366: \eeq
367: %
368: where $F_{[j]}^k$ is a matrix whose rows are subgradients of
369: $\cQ_{[j]}$ at previous iterates of the algorithm, and
370: $e=(1,1,\dots,1)^T$. The rows of $\theta_j e \ge F_{[j]}^k x +
371: f_{[j]}^k$ are referred to as {\em optimality cuts}. Upon evaluating
372: $\cQ_{[j]}$ at the new iterate $x^k$ by solving
373: \eqnok{second-stage-lp} for each $i \in \cN_j$, a subgradient
374: $g_j \in \partial \cQ_{[j]}(x^k)$ can be obtained from a formula
375: derived from \eqnok{subg.Qi} and \eqnok{subg.Q}, namely,
376: \beq \labtag{subg.Qj}
377: g_j = - \sum_{i \in \cN_j} p_i T(\omega_i)^T \pi(\omega_i),
378: \eeq
379: %
380: where each $\pi(\omega_i)$ is an optimal dual solution of
381: \eqnok{second-stage-lp}.
382: %
383: %
384: %
385: %
386: %
387: %
388: %
389: %
390: Since by the subgradient property we have
391: \[
392: \cQ_{[j]}(x) \ge g_j^T x + (\cQ_{[j]}(x^k) - g_j^T x^k),
393: \]
394: we can obtain $F_{[j]}^{k+1}$ from $F_{[j]}^k$ by appending the row
395: $g_j^T$, and $f_{[j]}^{k+1}$ from $f_{[j]}^k$ by appending the element
396: $(\cQ_{[j]}(x^k) - g_j^T x^k)$. In order to keep the number of cuts reasonable,
397: the cut is not added if $\cQ_{[j]}(x^k)$ is not greater than the value
398: $m^k_{[j]}(x^k)$ predicted by the lower bounding approximation (see \eqnok{master}
399: below). In this case, the current set of cuts in $F_{[j]}^k$,
400: $f_{[j]}^k$ adequately models $\cQ_{[j]}$. In addition, we may also
401: wish to delete some rows from $F_{[j]}^{k+1}$, $f_{[j]}^{k+1}$
402: corresponding to facets of the epigraph of \eqnok{Qjk} that we do not
403: expect to be active in later iterations.
404:
405: The algorithm also maintains a collection of {\em feasibility cuts}
406: of the form
407: \beq \labtag{feas.cuts}
408: D^k x \ge d^k,
409: \eeq
410: %
411: which have the effect of excluding values of $x$ that were found to be
412: infeasible, in the sense that some of the second-stage linear programs
413: \eqnok{second-stage-lp} are infeasible for these values of $x$. By
414: Farkas's theorem (see Mangasarian~\cite[p.~31]{Man69}), if the
415: constraints \eqnok{second-stage-lp.2} are infeasible, there exists
416: $\pi(\omega_i)$ with the following properties:
417: \[
418: W^T \pi(\omega_i) \le 0, \sgap
419: \left[ h(\omega_i) - T(\omega_i) x \right]^T \pi(\omega_i) > 0.
420: \]
421: (In fact, such a $\pi(\omega_i)$ can be obtained from the dual simplex
422: method for the feasibility problem \eqnok{second-stage-lp.2}.) To
423: exclude this $x$ from further consideration, we simply add the
424: inequality $[h(\omega_i) - T(\omega_i) x]^T \pi(\omega_i) \le 0$ to
425: the constraint set, by appending the row vector $\pi(\omega_i)^T
426: T(\omega_i)$ to $D^k$ and the element $\pi(\omega_i)^T h(\omega_i)$ to
427: $d^k$ in \eqnok{feas.cuts}.
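
For a small illustration (with hypothetical scalar data), take
$W = [\, 1 \,]$, $T(\omega_i) = t$, and $h(\omega_i) = h$, so that
\eqnok{second-stage-lp.2} requires $y(\omega_i) = h - tx \ge 0$. If
$h - tx < 0$ at the current $x$, the choice $\pi(\omega_i) = -1$
satisfies both conditions above, since
\[
W^T \pi(\omega_i) = -1 \le 0, \;\;
[ h - t x ] \, \pi(\omega_i) = tx - h > 0.
\]
The resulting feasibility cut is $-t x \ge -h$, that is, $t x \le h$,
which is exactly the condition under which this second-stage problem
is feasible.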
428:
429: The iterate $x^k$ of the multicut L-shaped method is obtained by solving
430: the following approximation to \eqnok{2stage.pl}:
431: \beq \labtag{2stage.pl.L}
432: \min_x \, m_k(x),
433: \;\; \mbox{subject to} \; D^k x \ge d^k, \; Ax=b, \; x \ge 0,
434: \eeq
435: where
436: \beq \labtag{def.mk}
437: m_k(x) \defeq c^Tx + \sum_{j=1}^T m_{[j]}^k(x).
438: \eeq
439: In practice, we substitute from
440: \eqnok{Qjk} to obtain the following linear program:
441: \begin{subequations} \labtag{master}
442: \beqa
443: \labtag{master.1}
444: \min_{x, \theta_1, \dots, \theta_T} \, c^Tx + \sum_{j=1}^T \theta_j, &&
445: \mbox{subject to} \\
446: \labtag{master.4}
447: \theta_j e & \ge & F_{[j]}^k x + f_{[j]}^k, \sgap j=1,2,\dots,T, \\
448: \labtag{master.3}
449: D^k x & \ge & d^k, \\
450: \labtag{master.2}
451: Ax=b, \;\; x & \ge & 0.
452: \eeqa
453: \end{subequations}
454:
455: The L-shaped method proceeds by solving \eqnok{master} to generate a
456: new candidate $x$, then evaluating the partial sums \eqnok{thetaj} and
457: adding optimality and feasibility cuts as described above. The process
458: is repeated, terminating when the improvement in objective promised by
459: the subproblem \eqnok{2stage.pl.L} becomes small.
460:
461: For simplicity we make the following assumption for the remainder of
462: the paper.
463: %
464: \begin{assumption} \labtag{ass:S}
465: \mbox{}
466: \begin{itemize}
467: \item[(i)] The problem has complete recourse; that is, the feasible
468: set of \eqnok{second-stage-lp} is nonempty for all $i=1,2,\dots,N$
469: and all $x$, so that the domain of $\cQ(x)$ in \eqnok{def.Q} is $\R^n$.
470: \item[(ii)] The solution set $\cS$ is nonempty.
471: \end{itemize}
472: \end{assumption}
473: %
474: Under this assumption, feasibility cuts of the form \eqnok{feas.cuts},
475: \eqnok{master.3} do not appear during the course of the algorithm. Our
476: algorithms and their analysis can be generalized to handle situations
477: in which Assumption~\ref{ass:S} does not hold, but since our
478: development is complex enough already, we postpone discussion of these
479: generalizations to a future report.
480:
481:
482: Using Assumption~\ref{ass:S}, we can specify the L-shaped algorithm
483: formally as follows:
484: %
485: \btab
486: \> {\bf Algorithm LS} \\
487: \> choose tolerance $\epstol$; \\
488: \> choose starting point $x^0$; \\
489: \> define initial model $m_0$ to be a piecewise linear
490: underestimate of $\cQ(x)$ \\
491: \>\> such that $m_0(x^0) = \cQ(x^0)$ and $m_0$ is bounded below; \\
492: \> $\cQ_{\rm min} \leftarrow \cQ(x^0)$; \\
493: \> {\bf for} $k=0,1,2,\dots$ \\
494: \>\> obtain $x^{k+1}$ by solving \eqnok{2stage.pl.L}; \\
495: \>\> {\bf if}
496: $\cQ_{\rm min} - m_k(x^{k+1}) \le \epstol (1+|\cQ_{\rm min}|) $ \\
497: \>\>\> STOP; \\
498: \>\> evaluate function and subgradient information at $x^{k+1}$; \\
499: \>\> $\cQ_{\rm min} \leftarrow \min(\cQ_{\rm min}, \cQ(x^{k+1}))$; \\
500: \>\> obtain $m_{k+1}$ by adding optimality cuts to $m_k$; \\
501: \> {\bf end(for).}
502: \etab
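
The stopping test in Algorithm LS yields a bound on the optimality
gap, as the following short argument shows (it assumes that the
starting point $x^0$ is feasible for \eqnok{2stage.pl}). Since $m_k$
underestimates $\cQ$ and $x^{k+1}$ minimizes $m_k$ over the feasible
set of \eqnok{2stage.pl}, we have
\[
m_k(x^{k+1}) \le m_k(x) \le \cQ(x)
\;\; \mbox{for all $x$ with $Ax=b$, $x \ge 0$,}
\]
and therefore $m_k(x^{k+1}) \le \cQ^* \le \cQ_{\rm min}$. At
termination we thus have
\[
\cQ_{\rm min} - \cQ^* \le \cQ_{\rm min} - m_k(x^{k+1})
\le \epstol (1 + |\cQ_{\rm min}|).
\]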
503:
504: \subsection{An Asynchronous Parallel Variant of the L-Shaped Method}
505: \labtag{sec:lshaped:async}
506:
507: The L-shaped approach lends itself naturally to implementation in a
508: master-worker framework. The problem \eqnok{master} is
509: solved by the master process, while solution of each cluster
510: $\cN_j$ of second-stage problems, and generation of the associated
511: cuts, can be carried out by the worker processes running in parallel.
512: This approach can be adapted for an asynchronous, unreliable
513: environment in which the results from some second-stage clusters are
514: not returned in a timely fashion. Rather than having all the worker
515: processors sit idle while waiting for the tardy results, we can
516: proceed without them, re-solving the master by using the additional cuts
517: that were generated by the other second-stage clusters.
518:
519: We denote the model function simply by $m$ for the asynchronous
520: algorithm, rather than appending a subscript. Whenever the time comes
521: to generate a new iterate, the current model is used. In practice, we
522: would expect the algorithm to give different results each time it is
523: executed, because of the unpredictable speed and order in which the
524: functions are evaluated and subgradients generated. Because of
525: Assumption~\ref{ass:S}, we can write the subproblem
526: \beq \labtag{als.subprob}
527: \min_x \, m(x),
528: \;\; \mbox{subject to} \; Ax=b, \; x \ge 0.
529: \eeq
530:
531: Algorithm ALS, the asynchronous variant of the L-shaped method that we
532: describe here, is made up of four key operations, three of which
533: execute on the master processor and one of which runs on the
534: workers. These operations are as follows:
535: %
536: \bi
537: \item {\tt partial\_evaluate}. This is the routine for evaluating
538: $\cQ_{[j]}(x)$ defined by \eqnok{thetaj} for a given $x$ and $j$,
539: in the process generating a subgradient $g_j$ of $\cQ_{[j]}(x)$. It runs on a
540: worker processor and returns its results to the master by
541: activating the routine {\tt act\_on\_completed\_task} on the master
542: processor.
543:
544: \item {\tt evaluate}. This routine, which runs on the master, simply
545: places $T$ tasks of the type {\tt partial\_evaluate} for a given $x$ into the task
546: pool for distribution to the worker processors as they become
547: available. The completion of these $T$ tasks is equivalent to evaluating $\cQ(x)$.
548:
549: \item {\tt initialize}. This routine runs on the master processor
550: and performs initial bookkeeping, culminating in a call to {\tt
551: evaluate} for the initial point $x^0$.
552:
553: \item {\tt act\_on\_completed\_task}. This routine, which runs on the
554: master, is activated whenever the results become available from a {\tt
555: partial\_evaluate} task. It updates the model and increments a counter
556: to keep track of the number of clusters that have been evaluated at
557: each candidate point. When appropriate, it solves the master problem
558: with the latest model to obtain a new candidate iterate and calls {\tt evaluate} on it.
559:
560: \ei
561:
562: In our implementation of both this algorithm and its more
563: sophisticated cousin Algorithm ATR of Section~\ref{sec:atr}, we may
564: define a single task to consist of the evaluation of more than one
565: cluster $\cN_j$. We may bundle, say, $5$ or $10$ clusters into a
566: single task, in the interests of making the task large enough to
567: justify the master's effort in packing its data and unpacking its
568: results, and to maintain the ratio of compute time to communication
569: cost at a high level. For purposes of simplicity, however, we assume
570: in the descriptions both of this algorithm and of ATR that each task
571: consists of a single cluster.
572:
573: The implementation depends on a ``synchronicity'' parameter $\sigma$
574: which is the proportion of clusters that must be evaluated at a point
575: to trigger the generation of a new candidate iterate. Typical values
576: of $\sigma$ are in the range $0.25$ to $0.9$. A logical variable
577: ${\tt speceval}_k$ keeps track of whether $x^k$ has yet triggered a
578: new candidate. Initially, ${\tt speceval}_k$ is set to ${\tt false}$,
579: then set to ${\tt true}$ when the proportion of evaluated clusters
580: passes the threshold $\sigma$.
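
For example, with the hypothetical values $T = 20$ and $\sigma = 0.7$,
a point $x^q$ triggers the generation of a new candidate as soon as
results from $\sigma T = 14$ of its $20$ clusters have been returned.
The remaining $6$ tasks continue to be processed, and their cuts are
added to the model as they arrive.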
581:
582: We now specify all the methods making up Algorithm ALS.
583:
584: \btab
585: \>{\bf ALS:} \ \ {\tt partial\_evaluate}$(x^q,q,j,\cQ_{[j]}(x^q),g_j)$ \\
586: \> Given $x^q$, index $q$, and partition number $j$,
587: evaluate $\cQ_{[j]}(x^q)$ from \eqnok{thetaj} \\
588: \>\> together with a partial subgradient $g_j$ from \eqnok{subg.Qj};
589: \\
590: \> Activate {\tt act\_on\_completed\_task}$(x^q,q,j,\cQ_{[j]}(x^q),g_j)$
591: on the master processor.
592: \etab
593:
594: \medskip
595:
596: \btab
597: \> {\bf ALS:} \ \ {\tt evaluate}$(x^q,q)$ \\
598: \> {\bf for} $j=1,2,\dots, T$ (possibly concurrently) \\
599: \>\> {\tt partial\_evaluate}$(x^q,q,j,\cQ_{[j]}(x^q), g_j)$; \\
600: \> {\bf end (for)}
601: \etab
602:
603: \medskip
604:
605: \btab
606: \> {\bf ALS:} \ \ {\tt initialize} \\
607: \> choose tolerance $\epstol$; \\
608: \> choose starting point $x^0$; \\
609: \> choose threshold $\sigma \in (0,1]$; \\
610: \> $\cQ_{\rm min} \leftarrow \infty$; \\
611: \> $k \leftarrow 0$, ${\tt speceval}_0 \leftarrow {\tt false}$, $t_0 \leftarrow 0$; \\
612: \> {\tt evaluate}$(x^0,0)$.
613: \etab
614:
615: \medskip
616:
617: \btab
618: \> {\bf ALS:} \ \
619: {\tt act\_on\_completed\_task}$(x^q,q,j,\cQ_{[j]}(x^q),g_j)$ \\
620: \> $t_q \leftarrow t_q+1$; \\
621: \> add $\cQ_{[j]}(x^q)$ and cut $g_j$ to the model $m$; \\
622: \> {\bf if} $t_q = T$ \\
623: \>\> $\cQ_{\rm min} \leftarrow \min ( \cQ_{\rm min}, \cQ(x^q))$; \\
624: \> {\bf else if} $t_q \ge \sigma T$ {\bf and} not ${\tt speceval}_q$ \\
625: \>\> ${\tt speceval}_q \leftarrow ${\tt true}; \\
626: \>\> $k \leftarrow k+1$, ${\tt speceval}_k \leftarrow {\tt false}$, $t_k \leftarrow 0$; \\
627: \>\> solve current model problem \eqnok{als.subprob} to obtain $x^k$; \\
628: \>\> {\bf if} $\cQ_{\rm min} - m(x^k) \le \epstol (1+|\cQ_{\rm min}|) $ \\
629: \>\>\> STOP; \\
630: \>\> {\tt evaluate}$(x^k,k)$; \\
631: \> {\bf end (if)}
632:
633: \etab
634:
635: We present results for Algorithm ALS in Section~\ref{sec:results}.
636: While the algorithm is able to use a large number of worker processors
637: on our opportunistic platform, it suffers from the usual drawbacks of
638: the L-shaped method, namely, that cuts, once generated, must be
639: retained for the remainder of the computation to ensure convergence
640: and that large steps are typically taken on early iterations before a
641: sufficiently good model approximation to $\cQ(x)$ is created, making
642: it impossible to exploit prior knowledge about the location of the
643: solution.
644:
645: \section{A Bundle-Trust-Region Method} \labtag{sec:tr}
646:
647: Trust-region approaches can be implemented by making only minor
648: modifications to implementations of the L-shaped method, and they
649: possess several practical advantages along with stronger convergence
650: properties. The trust-region methods we describe here are related to
651: the regularized decomposition method of Ruszczy{\'n}ski~\cite{Rus86}
652: and the bundle-trust-region approaches of Kiwiel~\cite{Kiw90} and
653: Hiriart-Urruty and Lemar\'echal~\cite[Chapter~XV]{HirL93}. The main
654: differences are that we use box-shaped trust regions yielding linear
655: programming subproblems (rather than quadratic programs) and that our
656: methods manipulate the size of the trust region directly rather than
657: indirectly via a regularization parameter.
658:
659: When requesting a subgradient of $\cQ$ at some
660: point $x$, our algorithms do not require particular (e.g., extreme)
661: elements of the subdifferential to be supplied. Nor do they require
662: the subdifferential $\partial \cQ(x)$ to be representable as a convex
663: combination of a finite number of vectors. In this respect, our
664: algorithms contrast with that of Ruszczy{\'n}ski~\cite{Rus86}, for
665: instance, which exploits the piecewise-linear nature of the objectives
666: $\cQ_i$ in \eqnok{second-stage-lp}. Because of our weaker conditions
667: on the subgradient information, we cannot prove a finite termination
668: result of the type presented in \cite[Section~3]{Rus86}. However,
669: these conditions potentially allow our algorithms to be extended to a
670: more general class of convex nondifferentiable functions. We hope to
671: explore these generalizations in future work.
672:
673: \subsection{A Method Based on $\ell_{\infty}$ Trust Regions}
674: \labtag{sec:tr:tr}
675:
676: A key difference between the trust-region approach of this section and
677: the L-shaped method of the preceding section is that we impose an
678: $\ell_{\infty}$ norm bound on the size of the step. It is implemented
679: by simply adding bound constraints to the linear programming
680: subproblem \eqnok{master} as follows:
681: \beq \labtag{master.tr.bounds}
682: -\Delta e \le x-x^k \le \Delta e,
683: \eeq
684: %
685: where $e=(1,1,\dots,1)^T$, $\Delta$ is the trust-region radius, and
686: $x^k$ is the current iterate. During the $k$th iteration, it may be
687: necessary to solve several problems with trust regions of the form
688: \eqnok{master.tr.bounds}, with different model functions $m$ and
689: possibly different values of $\Delta$, before a satisfactory new
690: iterate $x^{k+1}$ is identified. We refer to $x^k$ and $x^{k+1}$ as
691: {\em major iterates} and the points $x^{k,\ell}$, $\ell=0,1,2,\dots$
692: obtained by minimizing the current model function subject to the
693: constraints and trust-region bounds of the form
694: \eqnok{master.tr.bounds} as {\em minor iterates}. Another key
695: difference between the trust-region approach and the L-shaped approach
696: is that a minor iterate $x^{k,\ell}$ is accepted as the new major
697: iterate $x^{k+1}$ only if it yields a substantial reduction in the
698: objective function $\cQ$ over the previous iterate $x^k$, in a sense
699: to be defined below. A further important difference is that one can
700: delete optimality cuts from the model functions, between minor and
701: major iterations, without compromising the convergence properties of
702: the algorithm.
703:
704: To specify the method, we need to augment the notation established in
705: the previous section. We define $m_{k,\ell}(x)$ to be the model
706: function after $\ell$ minor iterations have been performed at
707: iteration $k$, and $\Delta_{k,\ell}>0$ to be the trust-region radius
708: at the same stage. Under Assumption~\ref{ass:S}, there are no
709: feasibility cuts, so that the problem to be solved to obtain the minor
710: iterate $x^{k,\ell}$ is as follows:
711: \beq \labtag{trsub.kl}
712: \min_x \, m_{k,\ell}(x) \;\; \mbox{subject to} \;Ax=b, \; x \ge 0, \;
713: \| x-x^k \|_{\infty} \le \Delta_{k,\ell}
714: \eeq
715: (cf. \eqnok{2stage.pl.L}). By expanding this problem in a similar
716: fashion to \eqnok{master}, we obtain
717: \begin{subequations} \labtag{master.kl}
718: \beqa
719: \labtag{master.kl.1}
720: \min_{x, \theta_1, \dots, \theta_T} \, c^Tx + \sum_{j=1}^T \theta_j, &&
721: \mbox{subject to} \\
722: \labtag{master.kl.4}
723: \theta_j e & \ge & F_{[j]}^{k,\ell} x + f_{[j]}^{k,\ell}, \sgap j=1,2,\dots,T, \\
724: \labtag{master.kl.2}
725: Ax=b, \;\; x & \ge & 0, \\
726: \labtag{master.kl.tr}
727: -\Delta_{k,\ell} e \le x-x^k & \le & \Delta_{k,\ell} e.
728: \eeqa
729: \end{subequations}
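
Note that the trust-region constraint \eqnok{master.kl.tr} and the
nonnegativity constraint in \eqnok{master.kl.2} are both simple bounds
on $x$, so they can be merged componentwise. Writing $(x)_r$ for the
$r$th component of $x$, we obtain
\[
\max \left( 0, (x^k)_r - \Delta_{k,\ell} \right) \le (x)_r \le
(x^k)_r + \Delta_{k,\ell}, \;\; r=1,2,\dots,n.
\]
For instance (hypothetical numbers), with $x^k = (3, 0.5)^T$ and
$\Delta_{k,\ell} = 1$, the bounds become $2 \le (x)_1 \le 4$ and
$0 \le (x)_2 \le 1.5$. Changing the trust-region radius therefore
alters only the variable bounds of the linear program, not its
constraint rows.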
730:
731: We assume the initial model $m_{k,0}$ at major iteration $k$ to
732: satisfy the following two properties:
733: \begin{subequations} \labtag{mkprop}
734: \beqa \labtag{mkprop.1}
735: & m_{k,0}(x^k) = \cQ(x^k), \\
736: \labtag{mkprop.2}
737: & \mbox{$m_{k,0}$ is a piecewise linear underestimate of $\cQ$}.
738: \eeqa
739: \end{subequations}
740:
747:
748: Denoting the solution of the subproblem \eqnok{master.kl} by
749: $x^{k,\ell}$, we accept this point as the new iterate $x^{k+1}$ if the
750: decrease in the actual objective $\cQ$ (see \eqnok{2stage.pl}) is at
751: least some fraction of the decrease predicted by the model function
752: $m_{k,\ell}$. That is, for some constant $\xi \in (0,1/2)$, the
753: acceptance test is
754: \beq \labtag{tr.accept}
755: \cQ(x^{k,\ell}) \le \cQ(x^k) - \xi
756: \left( \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \right).
757: \eeq
758: %
759: (A typical value for $\xi$ is $10^{-4}$.)
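The test \eqnok{tr.accept} can be illustrated by the following minimal Python sketch, which compares the actual decrease in $\cQ$ with the decrease predicted by the model. The function name {\tt accept\_step} and its argument names are hypothetical, introduced only for this illustration; they are not taken from the implementation described in this paper.

```python
def accept_step(q_xk, q_trial, m_trial, xi=1e-4):
    """Return True if the trial point passes the acceptance test.

    q_xk    : Q(x^k), objective at the current major iterate
    q_trial : Q(x^{k,l}), objective at the trial (minor) iterate
    m_trial : m_{k,l}(x^{k,l}), model value at the trial iterate
    xi      : acceptance fraction, assumed to lie in (0, 1/2)
    """
    predicted = q_xk - m_trial  # decrease predicted by the model
    return q_trial <= q_xk - xi * predicted
```

Because $\xi$ is small, nearly any trial point that reduces $\cQ$ is accepted; a point that increases $\cQ$ is always rejected, since the predicted decrease is nonnegative.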
760:
761: If the test \eqnok{tr.accept} fails to hold, we obtain a new model
762: function $m_{k,\ell+1}$ by adding and possibly deleting cuts from
763: $m_{k,\ell}(x)$. This process aims to refine the model function, so
764: that it eventually generates a new major iteration, while economizing
765: on storage by allowing deletion of subgradients that no longer seem
766: helpful. Addition and deletion of cuts are implemented by adding and
767: deleting rows from $F_{[j]}^{k,\ell}$ and $f_{[j]}^{k,\ell}$, to
768: obtain $F_{[j]}^{k,\ell+1}$ and $f_{[j]}^{k,\ell+1}$, for
769: $j=1,2,\dots,T$.
770:
771: Given some parameter $\eta \in [0,1)$, we obtain $m_{k,\ell+1}$ from
772: $m_{k,\ell}$ by means of the following procedure:
773: %
774: \btab
775: \> {\bf Procedure Model-Update} $(k,\ell)$ \\
776: \> {\bf for each} optimality cut \\
777: \>\> {\tt possible\_delete} $\leftarrow$ {\tt true}; \\
778: \>\> {\bf if} the cut was generated at $x^k$ \\
779: \>\>\> {\tt possible\_delete} $\leftarrow$ {\tt false}; \\
780: \>\> {\bf else if} the cut is active at the solution of \eqnok{master.kl} \\
781: \>\>\> {\tt possible\_delete} $\leftarrow$ {\tt false}; \\
782: \>\> {\bf else if} the cut was generated at an earlier minor iteration \\
783: \>\>\>
784: $\bar{\ell}=0,1,\dots,\ell-1$ such that
785: \etab
786: \beq \labtag{cut.delete.criterion}
787: \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) > \eta
788: \left[ \cQ(x^k) - m_{k,\bar\ell}(x^{k,\bar\ell}) \right]
789: \eeq
790: \btab
791: \>\>\> {\tt possible\_delete} $\leftarrow$ {\tt false}; \\
792: \>\> {\bf end (if)} \\
793: %
794: %
795: \>\> {\bf if} {\tt possible\_delete} \\
796: \>\>\> possibly delete the cut; \\
797: \> {\bf end (for each)} \\
798: \> add optimality cuts obtained from each of the component functions \\
799: \>\> $\cQ_{[j]}$ at $x^{k,\ell}$. \\
800: \etab
801: %
802:
803: In our implementation, we delete the cut if ${\tt possible\_delete}$
804: is true at the final conditional statement and, in addition, the cut
805: has not been active during the last 100 solutions of
806: \eqnok{master.kl}. More details are given in
807: Section~\ref{sec:results:parameters}.
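The conditional logic of Procedure Model-Update can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in our experiments: the dictionary keys {\tt at\_major}, {\tt active}, and {\tt born}, and the list {\tt gaps}, are hypothetical bookkeeping choices, and the additional rule based on the last 100 solutions of \eqnok{master.kl} is not included.

```python
def possible_delete(cut, ell, gap_now, gaps, eta):
    """Decide whether a cut may be deleted at minor iteration ell.

    cut     : dict with keys 'at_major' (cut was generated at x^k),
              'active' (cut is active at the current master solution),
              'born' (minor iteration index at which the cut was
              generated during this major iteration, or None)
    gap_now : Q(x^k) - m_{k,ell}(x^{k,ell})
    gaps    : list with gaps[l] = Q(x^k) - m_{k,l}(x^{k,l}) for l < ell
    eta     : deletion parameter, assumed to lie in [0, 1)
    """
    if cut["at_major"]:      # generated at x^k: always kept
        return False
    if cut["active"]:        # active at the solution of the master problem
        return False
    born = cut["born"]
    if born is not None and born < ell and gap_now > eta * gaps[born]:
        return False         # protected by the deletion criterion
    return True
```

A cut inherited from an earlier major iteration has {\tt born} equal to {\tt None} in this sketch, so it is a candidate for deletion whenever it is inactive.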
808:
Because we retain all cuts generated at $x^k$ during the course of
major iteration $k$, the following extension of \eqnok{mkprop.1} holds:
811: \beq \labtag{mkprop.1a}
812: m_{k,\ell}(x^k) = \cQ(x^k), \;\; \ell=0,1,2,\dots.
813: \eeq
814: Since we add only subgradient information, the following
815: generalization of \eqnok{mkprop.2} also holds uniformly:
816: \beq \labtag{mkprop.2a}
817: \mbox{$m_{k,\ell}$ is a piecewise linear underestimate of $\cQ$, for $\ell=0,1,2,\dots.$}
818: \eeq
819:
820: We may also decrease the trust-region radius $\Delta_{k,\ell}$ between
821: minor iterations (that is, choose $\Delta_{k,\ell+1} <
822: \Delta_{k,\ell}$) when the test \eqnok{tr.accept} fails to hold. We do
823: so if the match between model and objective appears to be particularly
824: poor. If $\cQ(x^{k,\ell})$ exceeds $\cQ(x^k)$ by more than an
825: estimate of the quantity
826: \beq \labtag{reduce.delta.1}
827: \max_{\| x-x^k\|_{\infty} \le 1} \, \cQ(x^k) - \cQ(x),
828: \eeq
829: we conclude that the ``upside'' variation of the function $\cQ$
830: deviates too much from its ``downside'' variation, and we choose the
831: new radius $\Delta_{k,\ell+1}$ to bring these quantities more nearly
832: into line. Our estimate of \eqnok{reduce.delta.1} is simply
833: \[
834: \frac{1}{\min(1,\Delta_{k,\ell})}
835: \left[ \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \right],
836: \]
837: that is, an extrapolation of the model reduction on the current trust
838: region to a trust region of radius $1$. Our complete strategy for
839: reducing $\Delta$ is therefore as follows. (The {\tt counter} is
840: initialized to zero at the start of each major iteration.)
841: %
842: \btab
843: \> {\bf Procedure Reduce-$\Delta$} \\
844: \> evaluate
845: \etab
846: \beq \labtag{reduce.delta.2}
847: \rho = {\min(1,\Delta_{k,\ell})} \frac{\cQ(x^{k,\ell}) - \cQ(x^k)}{\cQ(x^k) - m_{k,\ell}(x^{k,\ell})};
848: \eeq
849: \btab
850: \> {\bf if} $\rho>0$ \\
851: \>\> {\tt counter} $\leftarrow$ {\tt counter}$+1$; \\
852: \> {\bf if} $\rho>3$ {\bf or}
853: ({\tt counter} $\ge 3$ {\bf and} $\rho \in (1,3]$) \\
854: \>\> set
855: \etab
856: \[
857: \Delta_{k,\ell+1} = \frac{1}{\min(\rho,4)} \Delta_{k,\ell};
858: \]
859: \btab
860: \>\> reset {\tt counter} $\leftarrow 0$;
861: \etab
862: %
863: This procedure is related to the technique of
864: Kiwiel~\cite[p.~109]{Kiw90} for increasing the coefficient of the
865: quadratic penalty term in his regularized bundle method.
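A single pass of Procedure Reduce-$\Delta$ can be written as the following Python sketch. The function name {\tt reduce\_delta} and its signature are hypothetical, and the sketch assumes that $\cQ(x^k) - m_{k,\ell}(x^{k,\ell}) > 0$, which holds whenever the convergence test has not been satisfied.

```python
def reduce_delta(delta, counter, q_xk, q_trial, m_trial):
    """One pass of Procedure Reduce-Delta after a rejected step.

    Returns the pair (new_delta, new_counter).
    Assumes q_xk - m_trial > 0.
    """
    # Ratio of the observed increase to the extrapolated model reduction.
    rho = min(1.0, delta) * (q_trial - q_xk) / (q_xk - m_trial)
    if rho > 0:
        counter += 1
    if rho > 3 or (counter >= 3 and 1 < rho <= 3):
        delta = delta / min(rho, 4.0)  # shrink by at most a factor of 4
        counter = 0
    return delta, counter
```

For example, with $\Delta_{k,\ell}=2$, $\cQ(x^k)=10$, $\cQ(x^{k,\ell})=30$, and $m_{k,\ell}(x^{k,\ell})=8$, we have $\rho=10$, and the radius is reduced by the maximum factor of $4$.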
866:
867: If the test \eqnok{tr.accept} is passed, so that we have
868: $x^{k+1} = x^{k,\ell}$, we have a great deal of flexibility in
869: defining the new model function $m_{k+1,0}$. We require only that the
870: properties \eqnok{mkprop} are satisfied, with $k+1$ replacing $k$.
871: Hence, we are free to delete much of the optimality cut information
accumulated at iteration $k$ (and at previous iterations). In practice, of
873: course, it is wise to delete only those cuts that have been inactive
874: for a substantial number of iterations; otherwise we run the risk that
875: many new function and subgradient evaluations will be required to
876: restore useful model information that was deleted prematurely.
877:
878: If the step to the new major iteration $x^{k+1}$ shows a particularly
879: close match between the true function $\cQ$ and the model function
880: $m_{k,\ell}$ at the last minor iteration of iteration $k$, we consider
881: increasing the trust-region radius. Specifically, if
882: \beq \labtag{tr.incr.1}
883: \cQ(x^{k,\ell}) \le \cQ(x^k) - 0.5
884: \left( \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \right), \sgap
885: \| x^k - x^{k,\ell} \|_{\infty} = \Delta_{k,\ell},
886: \eeq
887: then we set
888: \beq \labtag{tr.incr.3}
889: \Delta_{k+1,0} = \min ( \Delta_{\rm hi}, 2 \Delta_{k,\ell}),
890: \eeq
891: where $\Delta_{\rm hi}$ is a prespecified upper bound on the radius.
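The rule \eqnok{tr.incr.1}, \eqnok{tr.incr.3} can be sketched in Python as follows. The function name {\tt next\_major\_radius} and the tolerance {\tt rtol}, which is used to test the equality in \eqnok{tr.incr.1} in floating-point arithmetic, are assumptions of this illustration.

```python
def next_major_radius(delta, delta_hi, q_xk, q_trial, m_trial,
                      step_inf, rtol=1e-12):
    """Radius for the next major iteration after an accepted step.

    step_inf : infinity norm of the accepted step x^{k,l} - x^k
    rtol     : tolerance for deciding that the step lies on the
               trust-region boundary (an assumption of this sketch)
    """
    predicted = q_xk - m_trial
    good_match = q_trial <= q_xk - 0.5 * predicted
    on_boundary = abs(step_inf - delta) <= rtol * max(1.0, delta)
    if good_match and on_boundary:
        return min(delta_hi, 2.0 * delta)
    return delta
```

The radius is doubled only when both conditions hold: the model predicted the decrease well, and the trust-region bound was actually binding on the step.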
892:
893: Before specifying the algorithm formally, we define the convergence
894: test. Given a parameter $\epstol>0$, we terminate if
895: \beq \labtag{conv.test}
896: \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \le
897: %
898: \epstol (1+ |\cQ(x^k)|).
899: \eeq
900:
901: \btab
902: \> {\bf Algorithm TR} \\
903: \> choose $\xi \in (0,1/2)$, maximum trust region $\Delta_{\rm hi}$,
904: tolerance $\epstol$; \\
905: \> choose starting point $x^0$; \\
906: \> define initial model $m_{0,0}$ with the properties \eqnok{mkprop} (for $k=0$); \\
907: \> choose $\Delta_{0,0} \in (0, \Delta_{\rm hi}]$; \\
908: \> {\bf for} $k=0,1,2,\dots$ \\
909: \>\> {\tt finishedMinorIteration} $\leftarrow$ {\tt false}; \\
910: \>\> $\ell \leftarrow 0$; ${\tt counter} \leftarrow 0$; \\
911: \>\> {\bf repeat} \\
912: \>\>\> solve \eqnok{trsub.kl} to obtain $x^{k,\ell}$; \\
913: \>\>\> {\bf if} \eqnok{conv.test} is satisfied \\
914: \>\>\>\> STOP with approximate solution $x^k$; \\
915: \>\>\> evaluate function and subgradient at $x^{k,\ell}$; \\
916: \>\>\> {\bf if} \eqnok{tr.accept} is satisfied \\
917: \>\>\>\> set $x^{k+1} = x^{k,\ell}$; \\
918: \>\>\>\> obtain $m_{k+1,0}$ by possibly deleting cuts from $m_{k,\ell}$, but \\
919: \>\>\>\>\>
920: retaining the properties \eqnok{mkprop} (with $k+1$ replacing $k$); \\
921: \>\>\>\> choose $\Delta_{k+1,0} \in [ \Delta_{k,\ell}, \Delta_{\rm hi}]$
922: according to \eqnok{tr.incr.1}, \eqnok{tr.incr.3}; \\
923: \>\>\>\> {\tt finishedMinorIteration} $\leftarrow$ {\tt true}; \\
924: \>\>\> {\bf else} \\
925: \>\>\>\> obtain $m_{k,\ell+1}$ from $m_{k,\ell}$
926: via Procedure Model-Update $(k,\ell)$; \\
927: \>\>\>\> obtain $\Delta_{k,\ell+1}$ via Procedure Reduce-$\Delta$; \\
928: \>\>\> $\ell \leftarrow \ell+1$; \\
929: \>\> {\bf until} {\tt finishedMinorIteration} \\
930: \> {\bf end (for)}
931: \etab
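To show how the pieces of Algorithm TR fit together, the following Python sketch applies the method to a convex piecewise-linear function of a single variable. It is a simplified illustration rather than our implementation: the constraints $Ax=b$, $x \ge 0$ are replaced by simple bounds, we take $T=1$, no cuts are deleted, and the one-dimensional subproblem is solved by enumerating breakpoints of the model instead of calling a linear programming solver. All names ({\tt solve\_tr\_1d}, {\tt subgrad}, and so on) and the test function are hypothetical.

```python
def solve_tr_1d(Q, subgrad, x0, lo, hi, delta0=1.0, delta_hi=8.0,
                xi=1e-4, eps_tol=1e-8, max_iter=200):
    """Sketch of Algorithm TR for a convex function on [lo, hi]."""
    cuts = []  # each cut is (slope, intercept) of a supporting line

    def add_cut(x):
        g = subgrad(x)
        cuts.append((g, Q(x) - g * x))

    def model(x):
        return max(a * x + b for a, b in cuts)

    def solve_subproblem(xk, delta):
        left, right = max(lo, xk - delta), min(hi, xk + delta)
        cand = [left, right]
        for i, (a1, b1) in enumerate(cuts):  # breakpoints of the model
            for a2, b2 in cuts[i + 1:]:
                if a1 != a2:
                    x = (b2 - b1) / (a1 - a2)
                    if left <= x <= right:
                        cand.append(x)
        return min(cand, key=model)

    xk, delta, counter = x0, delta0, 0
    add_cut(xk)  # ensures m_{0,0}(x^0) = Q(x^0)
    for _ in range(max_iter):
        trial = solve_subproblem(xk, delta)
        predicted = Q(xk) - model(trial)
        if predicted <= eps_tol * (1.0 + abs(Q(xk))):
            return xk  # convergence test satisfied
        add_cut(trial)  # function value and subgradient at the trial point
        if Q(trial) <= Q(xk) - xi * predicted:  # acceptance test
            if (Q(trial) <= Q(xk) - 0.5 * predicted
                    and abs(abs(trial - xk) - delta) <= 1e-12):
                delta = min(delta_hi, 2.0 * delta)
            xk, counter = trial, 0  # new major iterate
        else:  # Procedure Reduce-Delta
            rho = min(1.0, delta) * (Q(trial) - Q(xk)) / predicted
            if rho > 0:
                counter += 1
            if rho > 3 or (counter >= 3 and 1 < rho <= 3):
                delta, counter = delta / min(rho, 4.0), 0
    return xk


def Q(x):
    # Hypothetical test function with minimizer x = 2 and Q(2) = 1.5.
    return abs(x - 2.0) + 0.5 * abs(x + 1.0)


def subgrad(x):
    # One valid subgradient of Q at x.
    return (1.0 if x >= 2.0 else -1.0) + (0.5 if x >= -1.0 else -0.5)


x_star = solve_tr_1d(Q, subgrad, x0=10.0, lo=-5.0, hi=20.0)
```

On this example the radius doubles from $1$ to $8$ over the first three accepted steps, one step that overshoots the minimizer is rejected, and the method then stops at $x=2$ once the model reproduces $\cQ$ near the minimizer.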
932:
933: \subsection{Analysis of the Trust-Region Method}
934: \labtag{sec:tr:analysis}
935:
957:
958: We now describe the convergence properties of Algorithm TR. We show
959: that for $\epstol=0$, the algorithm either terminates at a solution
960: or generates a sequence of major iterates that approaches the
961: solution set $\cS$ (Theorem~\ref{th:tr:conv}). When $\epstol > 0$, the
962: algorithm terminates finitely; that is, it avoids generating infinite
963: sequences either of major or minor iterates (Theorem~\ref{th:fint}).
964:
965: Given some starting point $x^0$ satisfying the constraints
966: $Ax^0=b$, $x^0 \ge 0$, and setting $\cQ_0 = \cQ(x^0)$, we define the
967: following quantities that are useful in describing and analyzing the
968: algorithm:
969: %
970: \beqa
971: \labtag{def.ls}
972: \cL(\cQ_0) &=& \{ x \, | \, Ax=b, x \ge 0, \cQ(x) \le \cQ_0 \}, \\
973: \labtag{def.lsn}
\cL(\cQ_0;\Delta) &=& \{ x \, | \, \|x-y \|_{\infty} \le \Delta, \,
975: \mbox{for some $y \in \cL(\cQ_0)$} \}, \\
976: \labtag{def.beta}
977: \beta &=& \sup \{ \| g \|_1 \, | \, g \in \partial \cQ(x), \,
978: \mbox{for some $x \in \cL(\cQ_0;\Delta_{\rm hi})$} \}.
979: \eeqa
980: %
981: Using Assumption~\ref{ass:S}, we can easily show that $\beta <
982: \infty$.
983:
984: We start by showing that the optimal objective value for
985: \eqnok{trsub.kl} cannot decrease from one minor iteration to the next.
986: \begin{lemma} \labtag{lem:mkl}
987: Suppose that $x^{k,\ell}$ does not satisfy the acceptance test
988: \eqnok{tr.accept}. Then we have
989: \[
990: m_{k,\ell}(x^{k,\ell}) \le m_{k,\ell+1}(x^{k,\ell+1}).
991: \]
992: \end{lemma}
993: \begin{proof}
994: In obtaining $m_{k,\ell+1}$ from $m_{k,\ell}$ in Model-Update, we do
995: not allow deletion of cuts that were active at the solution
996: $x^{k,\ell}$ of \eqnok{master.kl}. Using $\bar{F}_{[j]}^{k,\ell}$
997: and $\bar{f}_{[j]}^{k,\ell}$ to denote the active rows in
998: $F_{[j]}^{k,\ell}$ and $f_{[j]}^{k,\ell}$, we have that $x^{k,\ell}$
999: is also the solution of the following linear program (in which the
1000: inactive cuts are not present):
1001: \begin{subequations} \labtag{master2.kl}
1002: \beqa
1003: \labtag{master2.kl.1}
1004: \min_{x, \theta_1, \dots, \theta_T} \, c^Tx + \sum_{j=1}^T \theta_j, &&
1005: \mbox{subject to} \\
1006: \labtag{master2.kl.4}
1007: \theta_j e & \ge & \bar{F}_{[j]}^{k,\ell} x + \bar{f}_{[j]}^{k,\ell}, \sgap j=1,2,\dots,T, \\
1008: \labtag{master2.kl.2}
1009: Ax=b, \;\; x & \ge & 0, \\
1010: \labtag{master2.kl.tr}
1011: -\Delta_{k,\ell} e \le x-x^k & \le & \Delta_{k,\ell} e.
1012: \eeqa
1013: \end{subequations}
1014: The subproblem to be solved for $x^{k,\ell+1}$ differs from
1015: \eqnok{master2.kl} in two ways. First, additional rows may be added to
1016: $\bar{F}_{[j]}^{k,\ell}$ and $\bar{f}_{[j]}^{k,\ell}$, consisting of
1017: function values and subgradients obtained at $x^{k,\ell}$ and also
1018: inactive cuts carried over from the previous \eqnok{master.kl}. Second,
1019: the trust-region radius $\Delta_{k,\ell+1}$ may be smaller than
1020: $\Delta_{k,\ell}$. Hence, the feasible region of the problem to be
1021: solved for $x^{k,\ell+1}$ is a subset of the feasible region for
1022: \eqnok{master2.kl}, so the optimal objective value cannot be smaller.
1023: \end{proof}
1024:
1025: Next we have a result about the amount of reduction in the model
1026: function $m_{k,\ell}$.
1027: \begin{lemma} \labtag{lem:tr:1}
1028: For all $k=0,1,2,\ldots$ and $\ell=0,1,2,\ldots$, we have that
1029: \begin{subequations} \labtag{lem:tr:inequalities}
1030: \beqa
1031: \nonumber
1032: m_{k,\ell}(x^k) - m_{k,\ell}(x^{k,\ell}) &=&
1033: \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \\
1034: \labtag{tr.2a}
1035: & \ge &
1036: \min \left( \Delta_{k,\ell}, \| x^k - P(x^k)\|_{\infty} \right)
1037: \frac{\cQ(x^k) - \cQ^*}{\| x^k - P(x^k) \|_{\infty}} \\
1038: \labtag{tr.2b}
1039: & \ge &
1040: \hat{\epsilon} \min \left( \Delta_{k,\ell}, \| x^k - P(x^k)\|_{\infty} \right),
1041: \eeqa
1042: \end{subequations}
1043: where $\hat{\epsilon}>0$ is defined in \eqnok{weak.sharp}.
1044: \end{lemma}
1045: \begin{proof}
1046: The first equality follows immediately from \eqnok{mkprop.1a}, while
1047: the second inequality \eqnok{tr.2b} follows immediately from
1048: \eqnok{tr.2a} and \eqnok{weak.sharp}. We now prove \eqnok{tr.2a}.
1049:
1050: Consider the following subproblem in the scalar $\tau$:
1051: \beq \labtag{tr.3}
1052: \min_{\tau \in [0,1]} \, m_{k,\ell} \left( x^k + \tau [ P(x^k) - x^k] \right)
1053: \;\; \mbox{subject to} \; \left\| \tau [ P(x^k) - x^k] \right\|_{\infty} \le
1054: \Delta_{k,\ell}.
1055: \eeq
1056: Denoting the solution of this problem by $\tau_{k,\ell}$, we have by
1057: comparison with \eqnok{trsub.kl} that
1058: \beq \labtag{tr.3a}
1059: m_{k,\ell} (x^{k,\ell}) \le
1060: m_{k,\ell} \left( x^k + \tau_{k,\ell} [ P(x^k) - x^k] \right).
1061: \eeq
1062: If $\tau=1$ is feasible in \eqnok{tr.3}, we have from \eqnok{tr.3a} and
1063: \eqnok{mkprop.2a} that
1064: \beqas
1065: \lefteqn{m_{k,\ell} (x^{k,\ell}) \le
1066: m_{k,\ell} \left( x^k + \tau_{k,\ell} [ P(x^k) - x^k] \right) } \\
1067: & \le &
1068: m_{k,\ell} \left( x^k + [ P(x^k) - x^k] \right)
1069: = m_{k,\ell} (P(x^k)) \le \cQ(P(x^k)) = \cQ^*.
1070: \eeqas
1071: Therefore, when $\tau=1$ is feasible for \eqnok{tr.3}, we have from
1072: \eqnok{mkprop.1a} that
1073: \[
1074: m_{k,\ell}(x^k) - m_{k, \ell}(x^{k,\ell}) \ge \cQ(x^k) - \cQ^*,
1075: \]
1076: so that \eqnok{tr.2a} holds in this case.
1077:
1078: When $\tau=1$ is infeasible for \eqnok{tr.3}, consider setting $\tau =
1079: \Delta_{k,\ell} / \| x^k-P(x^k) \|_{\infty}$ (which is certainly feasible for
1080: \eqnok{tr.3}). We have from \eqnok{tr.3a}, the definition of
1081: $\tau_{k,\ell}$, the fact \eqnok{mkprop.2a} that $m_{k,\ell}$
1082: underestimates $\cQ$, and convexity of $\cQ$ that
1083: \beqas
1084: m_{k,\ell}(x^{k,\ell})
1085: %
1086: & \le & m_{k,\ell}
1087: \left( x^k + \Delta_{k,\ell} \frac{P(x^k)-x^k}{\|P(x^k)-x^k\|_{\infty}} \right) \\
1088: & \le & \cQ
1089: \left( x^k + \Delta_{k,\ell} \frac{P(x^k)-x^k}{\|P(x^k)-x^k\|_{\infty}} \right) \\
1090: & \le & \cQ(x^k) +
1091: \frac{\Delta_{k,\ell}}{\|P(x^k)-x^k\|_{\infty}} (\cQ^* - \cQ(x^k)).
1092: \eeqas
1093: Therefore, using \eqnok{mkprop.1a}, we have
1094: \[
1095: m_{k,\ell}(x^k) - m_{k,\ell}(x^{k,\ell}) \ge
1096: \frac{\Delta_{k,\ell}}{\|P(x^k)-x^k\|_{\infty}} [ \cQ(x^k) - \cQ^* ],
1097: \]
1098: verifying \eqnok{tr.2a} in this case as well.
1099: \end{proof}
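
To make the bound \eqnok{tr.2a} concrete, consider the following purely hypothetical values, chosen for illustration only and not drawn from any test problem: $\cQ(x^k) - \cQ^* = 6$ and $\| x^k - P(x^k) \|_{\infty} = 3$. With $\Delta_{k,\ell} = 1$, the bound gives
\[
\cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \ge \min(1,3) \, \frac{6}{3} = 2,
\]
so the model predicts a decrease of at least one third of the optimality gap. With $\Delta_{k,\ell} = 5$, the trust region contains $P(x^k)$, and the bound becomes
\[
\cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \ge \min(5,3) \, \frac{6}{3} = 6 = \cQ(x^k) - \cQ^*,
\]
that is, the predicted decrease is at least the full optimality gap.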
1100:
1101: Our next result finds a lower bound on the trust-region radii
1102: $\Delta_{k,\ell}$. For purposes of this result we define a quantity
1103: $E_k$ to measure the closest approach to the solution set for all
1104: iterates up to and including $x^k$, that is,
1105: \beq \labtag{def:Ek}
1106: E_k \defeq \min_{\bar{k}=0,1,\dots,k}
1107: \| x^{\bar{k}} - P(x^{\bar{k}}) \|_{\infty}.
1108: \eeq
Note that $E_k$ is monotonically nonincreasing in $k$. We also define
1110: $\Delta_{\rm init}$ to be the initial value of the trust region.
1111: %
1112: \begin{lemma} \labtag{lem:trbounds}
1113: There is a constant $\Delta_{\rm lo} >0$ such that for all trust
1114: regions $\Delta_{k,\ell}$ used in the course of Algorithm TR, we
1115: have
1116: \[
1117: \Delta_{k,\ell} \ge \min( \Delta_{\rm lo}, E_k/4).
1118: \]
1119: \end{lemma}
1120: \begin{proof}
1121: We prove the result by showing that the value $\Delta_{\rm lo} =
1122: (1/4) \min(1, \Delta_{\rm init}, \hat{\epsilon}/\beta)$ has the
1123: desired property, where $\hat{\epsilon}$ is from \eqnok{weak.sharp}
1124: and $\beta$ is from \eqnok{def.beta}.
1125:
1126: Suppose for contradiction that there are indices $k$ and $\ell$ such
1127: that
1128: \[
1129: \Delta_{k, \ell} < \frac14 \min
1130: \left( 1, \frac{\hat{\epsilon}}{\beta}, \Delta_{\rm init}, E_k \right).
1131: \]
1132: Since the trust region can be reduced by at most a factor of $4$ by
1133: Procedure Reduce-$\Delta$, there must be an earlier trust region
1134: radius $\Delta_{\bar{k}, \bar{\ell}}$ (with $\bar{k} \le k$) such that
1135: \beq \labtag{poo.3}
1136: \Delta_{\bar{k},\bar{\ell}} <
1137: \min \left( 1, \frac{\hat{\epsilon}}{\beta}, E_{k} \right),
1138: \eeq
1139: and $\rho>1$ in \eqnok{reduce.delta.2}, that is,
1140: \beqa
1141: \nonumber
1142: \cQ(x^{\bar{k},\bar{\ell}}) - \cQ(x^{\bar{k}}) & > &
1143: \frac{1}{\min(1,\Delta_{\bar{k},\bar{\ell}})}
1144: \left( \cQ(x^{\bar{k}}) -
1145: m_{\bar{k},\bar{\ell}}(x^{\bar{k},\bar{\ell}}) \right) \\
1146: \labtag{poo.4}
1147: & = & \frac{1}{\Delta_{\bar{k},\bar{\ell}}}
1148: \left(
1149: \cQ(x^{\bar{k}}) - m_{\bar{k},\bar{\ell}}(x^{\bar{k},\bar{\ell}})
1150: \right).
1151: \eeqa
1152: By applying Lemma~\ref{lem:tr:1}, and using \eqnok{poo.3}, we have
1153: \beq \labtag{poo.4a}
1154: \cQ(x^{\bar{k}}) - m_{\bar{k},\bar{\ell}}(x^{\bar{k},\bar{\ell}}) \ge
1155: \hat{\epsilon} \min \left( \Delta_{\bar{k},\bar{\ell}},
1156: \| x^{\bar{k}} - P(x^{\bar{k}}) \|_{\infty} \right) =
1157: \hat{\epsilon} \Delta_{\bar{k},\bar{\ell}}
1158: \eeq
1159: where the last equality follows from
1160: $\| x^{\bar{k}} - P(x^{\bar{k}}) \|_{\infty} \ge E_{\bar{k}} \ge E_k$ and
1161: \eqnok{poo.3}.
1162: By combining \eqnok{poo.4a} with \eqnok{poo.4}, we have that
1163: \beq \labtag{poo.5}
1164: \cQ(x^{\bar{k},\bar{\ell}}) - \cQ(x^{\bar{k}}) > \hat{\epsilon}.
1165: \eeq
1166: By using standard properties of subgradients, we have
1167: \beqa
1168: \nonumber
1169: \lefteqn{\cQ(x^{\bar{k},\bar{\ell}}) - \cQ(x^{\bar{k}}) \le
1170: g_{\bar{\ell}}^T(x^{\bar{k},\bar{\ell}} - x^{\bar{k}})} \\
1171: \labtag{subd.5}
1172: & \le &
1173: \| g_{\bar{\ell}} \|_1 \| x^{\bar{k}} - x^{\bar{k},\bar{\ell}} \|_{\infty}
1174: \le \| g_{\bar{\ell}} \|_1 \Delta_{\bar{k},\bar{\ell}}, \;\;
1175: \mbox{for all} \; g_{\bar{\ell}} \in \partial \cQ(x^{\bar{k},\bar{\ell}}).
1176: \eeqa
1177: By combining this expression with \eqnok{poo.5}, and using
1178: \eqnok{poo.3} again, we obtain that
1179: \[
1180: \| g_{\bar{\ell}} \|_1 \ge
1181: \frac{\hat{\epsilon}}{\Delta_{\bar{k},\bar{\ell}}} > \beta.
1182: \]
1183: However, since $x^{\bar{k},\bar{\ell}} \in \cL(\cQ_0;\Delta_{\rm hi})$, we have
1184: from \eqnok{def.beta} that $\| g_{\bar{\ell}} \|_1 \le \beta$, giving a
1185: contradiction.
1186: \end{proof}
1187:
1188: Finite termination of the inner iterations is proved in the following
1189: two results. Recall that the parameters $\xi$ and $\eta$ are defined
1190: in \eqnok{tr.accept} and \eqnok{cut.delete.criterion}, respectively.
1191: \begin{lemma} \labtag{lem:tr:ft}
1192: Let $\epstol=0$ in Algorithm TR, and let $\bar{\eta}$ be
1193: any constant satisfying $0<\bar{\eta}<1$, $\bar{\eta}>\xi$,
1194: $\bar{\eta} \ge \eta$. Let $\ell_1$ be any index such that
1195: $x^{k,\ell_1}$ fails to satisfy the test \eqnok{tr.accept}. Then
1196: either the sequence of inner iterations eventually yields a point
1197: $x^{k,\ell_2}$ satisfying the acceptance test \eqnok{tr.accept}, or
1198: there is an index $\ell_2>\ell_1$ such that
1199: \beq \labtag{tr.6}
1200: \cQ(x^k) - m_{k,\ell_2}(x^{k,\ell_2}) \le \bar{\eta} \left[
1201: \cQ(x^k) - m_{k,\ell_1}(x^{k,\ell_1}) \right].
1202: \eeq
1203: \end{lemma}
1204: \begin{proof}
Suppose for contradiction that none of the minor iterations
1206: following $\ell_1$ satisfies either \eqnok{tr.accept} or the
1207: criterion \eqnok{tr.6}; that is,
1208: \beqa \nonumber
1209: \cQ(x^k) - m_{k,q}(x^{k,q}) & > & \bar{\eta} \left[
1210: \cQ(x^k) - m_{k,\ell_1}(x^{k,\ell_1}) \right], \\
1211: \labtag{contra}
1212: & \ge & \eta \left[ \cQ(x^k) - m_{k,\ell_1}(x^{k,\ell_1}) \right],
1213: \;\; \mbox{\rm for all $q > \ell_1$}.
1214: \eeqa
1215: It follows from this bound, together with Lemma~\ref{lem:mkl} and
1216: Procedure Model-Update, that none of the cuts generated at minor
1217: iterations $q \ge \ell_1$ is deleted.
1218:
1219: We assume in the remainder of the proof that $q$ and $\ell$ are
1220: generic minor iteration indices that satisfy
1221: \[
1222: q > \ell \ge \ell_1.
1223: \]
1224:
Because the function values and subgradients from minor iterations
$x^{k,\ell}$, $\ell=\ell_1,\ell_1+1, \dots$ are retained throughout the major
iteration $k$, we have
1228: \beq \labtag{matchQ}
1229: m_{k,q}(x^{k,\ell}) = \cQ(x^{k,\ell}).
1230: \eeq
1231: By definition of the subgradient, we have
1232: \beq \labtag{subgrad.mkq}
1233: m_{k,q}(x) - m_{k,q}(x^{k,\ell}) \ge g^T (x-x^{k,\ell}), \;\;
1234: \mbox{for all} \; g \in \partial m_{k,q}(x^{k,\ell}).
1235: \eeq
1236: Therefore, from \eqnok{mkprop.2a} and \eqnok{matchQ}, it follows that
1237: \[
1238: \cQ(x)-\cQ(x^{k,\ell}) \ge g^T (x-x^{k,\ell}), \;\; \mbox{for all} \;
1239: g \in \partial m_{k,q}(x^{k,\ell}),
1240: \]
1241: so that
1242: \beq \labtag{mkqQ}
1243: \partial m_{k,q}(x^{k,\ell}) \subset \partial \cQ(x^{k,\ell}).
1244: \eeq
1245:
Since $\cQ(x^k) \le \cQ(x^0) = \cQ_0$, we have from \eqnok{def.ls} that
$x^k \in \cL(\cQ_0)$. Therefore, from the definition \eqnok{def.lsn}
and the fact that $\| x^{k,\ell} - x^k \|_{\infty} \le \Delta_{k,\ell} \le
\Delta_{\rm hi}$, we have that $x^{k,\ell} \in \cL(\cQ_0;\Delta_{\rm
hi})$. It follows from \eqnok{def.beta} and \eqnok{mkqQ} that
1251: \beq \labtag{gbound}
1252: \| g \|_1 \le \beta, \;\; \mbox{for all} \; g \in \partial m_{k,q}(x^{k,\ell}).
1253: \eeq
1254:
1255: Since $x^{k,\ell}$ is rejected by the test \eqnok{tr.accept}, we
1256: have from \eqnok{matchQ} and Lemma~\ref{lem:mkl} that the following
1257: inequalities hold:
1258: \beqas
1259: m_{k,q}(x^{k,\ell}) = \cQ(x^{k,\ell})
1260: & \ge &\cQ(x^k) - \xi \left[ \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \right] \\
1261: & \ge & \cQ(x^k) - \xi \left[ \cQ(x^k) - m_{k,\ell_1}(x^{k,\ell_1}) \right].
1262: \eeqas
1263: By rearranging this expression, we obtain
1264: \beq \labtag{tr.8}
1265: \cQ(x^k) - m_{k,q}(x^{k,\ell}) \le
1266: \xi \left[ \cQ(x^k) - m_{k,\ell_1}(x^{k,\ell_1}) \right].
1267: \eeq
1268:
1269: Consider now all points $x$ satisfying
1270: \beq \labtag{xkl.nbd}
1271: \| x-x^{k,\ell} \|_{\infty} \le
1272: \frac{\bar{\eta}-\xi}{\beta}
1273: \left[ \cQ(x^k)-m_{k,\ell_1}(x^{k,\ell_1}) \right]
1274: \defeq \zeta>0.
1275: \eeq
1276: Using this bound together with \eqnok{subgrad.mkq} and \eqnok{gbound},
1277: we obtain
1278: \beqas
1279: \lefteqn{ m_{k,q}(x^{k,\ell}) - m_{k,q}(x) \le g^T(x^{k,\ell} - x ) } \\
1280: & \le & \beta \| x^{k,\ell}-x \|_{\infty}
1281: \le (\bar{\eta} - \xi) \left[ \cQ(x^k)-m_{k,\ell_1}(x^{k,\ell_1}) \right].
1282: \eeqas
1283: By combining this bound with \eqnok{tr.8}, we find that the following
1284: bound is satisfied for all $x$ in the neighborhood \eqnok{xkl.nbd}:
1285: \beqas
1286: \cQ(x^k) - m_{k,q}(x) &=&
1287: \left[ \cQ(x^k) - m_{k,q}(x^{k,\ell}) \right] +
1288: \left[ m_{k,q}(x^{k,\ell}) - m_{k,q}(x) \right] \\
1289: & \le & \bar{\eta} \left[ \cQ(x^k)-m_{k,\ell_1}(x^{k,\ell_1}) \right].
1290: \eeqas
1291: It follows from this bound, in conjunction with \eqnok{contra}, that
1292: $x^{k,q}$ (the solution of the trust-region problem with model
1293: function $m_{k,q}$) cannot lie in the neighborhood \eqnok{xkl.nbd}.
1294: Therefore, we have
1295: \beq \labtag{meshprop}
1296: \| x^{k,q} - x^{k,\ell} \|_{\infty} > \zeta.
1297: \eeq
But since $\| x^{k,\ell} - x^k \|_{\infty} \le \Delta_{k,\ell} \le
\Delta_{\rm hi}$ for all $\ell \ge \ell_1$, the minor iterates lie in a
bounded set, so it is impossible for an
infinite sequence $\{ x^{k,\ell} \}_{\ell \ge \ell_1}$ to satisfy
\eqnok{meshprop} for every pair $q > \ell \ge \ell_1$. We conclude that
either \eqnok{tr.accept} or \eqnok{tr.6} must hold for some
$\ell_2 > \ell_1$, as claimed.
1303: \end{proof}
1304:
1305: We now show that the minor iteration sequence terminates at a point
1306: $x^{k,\ell}$ satisfying the acceptance test, provided that $x^k$ is
1307: not a solution.
1308: \begin{theorem} \labtag{th:tr:ft}
1309: Suppose that $\epstol =0$.
1310: \begin{itemize}
1311: \item[(i)] If $x^k \notin \cS$, there is an $\ell \ge 0$ such that
1312: $x^{k,\ell}$ satisfies \eqnok{tr.accept}.
1313: \item[(ii)] If $x^k \in \cS$, then either Algorithm TR terminates (and verifies that $x^k \in \cS$), or
1314: $\cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \downarrow 0$.
1315: \end{itemize}
1316: \end{theorem}
1317: \begin{proof}
1318: Suppose for the moment that the inner iteration sequence is
1319: infinite, that is, the test \eqnok{tr.accept} always fails. By
1320: applying Lemma~\ref{lem:tr:ft} recursively, with any constant
1321: $\bar{\eta}$ satisfying the properties stated in
Lemma~\ref{lem:tr:ft}, we can identify a sequence of indices $0 = \ell_0 <
\ell_1 < \ell_2 < \dots$ such that
1324: \beqa
1325: \nonumber
1326: \cQ(x^k) - m_{k,\ell_j}(x^{k,\ell_j}) & \le &
1327: \bar{\eta} \left[ \cQ(x^k) - m_{k,\ell_{j-1}}(x^{k,\ell_{j-1}}) \right] \\
1328: \nonumber
1329: & \le &
1330: \bar{\eta}^2 \left[ \cQ(x^k) - m_{k,\ell_{j-2}}(x^{k,\ell_{j-2}}) \right] \\
1331: \nonumber
1332: & \vdots & \\
1333: \labtag{minortozero}
1334: & \le &
1335: \bar{\eta}^j \left[ \cQ(x^k) - m_{k,0}(x^{k,0}) \right].
1336: \eeqa
1337: When $x^k \notin \cS$, we have from Lemma~\ref{lem:trbounds} that
1338: \[
1339: \Delta_{k,\ell} \ge \min( \Delta_{\rm lo}, E_k/4)
1340: \defeq \bar{\Delta}_{\rm lo} >0, \;\; \mbox{for all $\ell=0,1,2,\dots$},
1341: \]
1342: so the right-hand side of \eqnok{tr.2a} is strictly positive. Hence
1343: for $j$ sufficiently large, we have that
1344: \[
1345: \cQ(x^k) - m_{k,\ell_j}(x^{k,\ell_j}) \le
1346: 0.5 \min \left( \bar{\Delta}_{\rm lo}, \| x^k-P(x^k) \|_{\infty} \right)
1347: \frac{\cQ(x^k) - \cQ^*}{\| x^k - P(x^k) \|_{\infty}}.
1348: \]
1349: But this inequality contradicts \eqnok{lem:tr:inequalities}, proving (i).
1350:
1351: For the case of $x^k \in \cS$, there are two possibilities. If
1352: the inner iteration sequence terminates finitely at some $x^{k,\ell}$,
1353: we have $\cQ(x^k) - m_{k,\ell}(x^{k,\ell}) = 0$ and indeed that
1354: \[
1355: m_{k,\ell}(x) \ge \cQ(x^k) = \cQ^*, \;\;
1356: \mbox{for all $x$ with $\| x-x^k \|_{\infty} \le \Delta_{k,\ell}$}.
1357: \]
1358: Because of \eqnok{mkprop.2a}, we have that $\cQ(x) \ge \cQ(x^k)$ for
1359: all $x$ in a neighborhood of $x^k$, implying that $0 \in \partial
1360: \cQ(x^k)$. Therefore, termination under these circumstances yields a
1361: guarantee that $x^k \in \cS$. When the algorithm does not terminate,
1362: it follows from \eqnok{minortozero} that $\cQ(x^k) -
1363: m_{k,\ell}(x^{k,\ell}) \to 0$. By applying Lemma~\ref{lem:mkl}, we
1364: verify our claim (ii) of monotonic convergence.
1365: \end{proof}
1366:
1367: We now prove convergence of Algorithm TR to $\cS$.
1368: \begin{theorem} \labtag{th:tr:conv}
1369: Suppose that $\epstol=0$. The sequence of major
1370: iterations $\{ x^k \}$ is either finite, terminating at some $x^k
1371: \in \cS$, or is infinite, with the property that $\| x^k - P(x^k)
1372: \|_{\infty} \to 0$.
1373: \end{theorem}
1374: \begin{proof}
1375: If the claim does not hold, there are two possibilities. The first
1376: is that the sequence of major iterations terminates finitely at some
$x^k \notin \cS$. Theorem~\ref{th:tr:ft} ensures, however, that the
1378: minor iteration sequence will terminate at some new major iteration
1379: $x^{k+1}$ under these circumstances, so we can rule out this
1380: possibility. The second possibility is that the sequence $\{x^k\}$
1381: is infinite but that there is some $\epsilon >0$ and an infinite
1382: subsequence of indices $\{ k_j \}_{j=1,2,\dots}$ such that
1383: \[
\| x^{k_j} - P(x^{k_j}) \|_{\infty} \ge \epsilon, \;\; j=1,2,\dots.
1385: \]
1386: Since the sequence $\{ \cQ(x^{k_j}) \}_{j=1,2,\dots}$ is infinite,
1387: decreasing, and bounded below, it converges to some value $\bar{\cQ} >
1388: \cQ^*$. Moreover, since the entire sequence $\{ \cQ(x^k) \}$ is
1389: monotone decreasing, it follows that $\cQ(x^k) > \bar{\cQ}$ and
1390: therefore
1391: \[
1392: \cQ(x^k) - \cQ^* > \bar{\cQ} - \cQ^* > 0, \;\; k=0,1,2,\dots.
1393: \]
1394: Hence, by boundedness of the subgradients (see \eqnok{def.beta}), we
1395: can identify a constant $\bar{\epsilon}>0$ such that
1396: \[
1397: \| x^k - P(x^k) \|_{\infty} \ge \bar{\epsilon}, \;\; k=0,1,2,\dots.
1398: \]
1399: It follows from \eqnok{def:Ek} that
1400: \beq \labtag{Ekbb}
1401: E_k \ge \bar{\epsilon}, \;\; k=0,1,2,\dots.
1402: \eeq
1403:
1404: For each major iteration index $k$, let $\ell(k)$ be the minor
1405: iteration index that passes the acceptance test \eqnok{tr.accept}. By combining \eqnok{tr.accept} with Lemma~\ref{lem:tr:1}, we have that
1406: \[
1407: \cQ(x^k) - \cQ(x^{k+1}) \ge \xi \hat{\epsilon} \min
1408: \left( \Delta_{k, \ell(k)}, \|x^k - P(x^k) \|_{\infty} \right)
1409: \ge \xi \hat{\epsilon} \min
1410: \left( \Delta_{k, \ell(k)}, \bar{\epsilon} \right).
1411: \]
1412: Since $\cQ(x^k) - \cQ(x^{k+1}) \to 0$, we deduce that
1413: \beq \labtag{poo.8}
1414: \lim_{k \to \infty} \Delta_{k, \ell(k)} = 0.
1415: \eeq
1416: By Lemma~\ref{lem:trbounds} and \eqnok{Ekbb}, we have
1417: \[
1418: \Delta_{k, \ell(k)} \ge \min (\Delta_{\rm lo}, \bar{\epsilon}/4) >0, \;\;
1419: k=0,1,2,\dots,
1420: \]
1421: which contradicts \eqnok{poo.8}. We conclude that the second
1422: possibility (an infinite sequence $\{ x^k \}$ not converging to $\cS$)
1423: cannot occur either, so the proof is complete.
1424: \end{proof}
1425:
1426: Finally, we show that the algorithm terminates when $\epstol>0$.
1427: %
1428: \begin{theorem} \labtag{th:fint}
1429: When $\epstol>0$, Algorithm TR terminates finitely.
1430: \end{theorem}
1431: \begin{proof}
1432: We show first that the algorithm cannot ``get stuck'' at a
1433: particular $x^k$, generating an infinite sequence of minor
1434: iterations at $x^k$ without eventually satisfying either
1435: \eqnok{conv.test} or the acceptance test \eqnok{tr.accept}. We see
1436: from the reasoning in the proof of Theorem~\ref{th:tr:ft} together
1437: with the monotonicity property of Lemma~\ref{lem:mkl} that an
1438: infinite sequence of minor iterations must satisfy that
1439: \beq \labtag{fint.1}
1440: \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \downarrow 0.
1441: \eeq
1442: Since the right-hand side of \eqnok{conv.test} is bounded below by
1443: $\epstol$, the test \eqnok{conv.test} must be
1444: satisfied for some $\ell$. Therefore, the minor iteration
1445: sequence cannot be infinite.
1446:
1447: Now consider the other possibility of an infinite sequence of major
1448: iterations $\{ x^k \}_{k=1,2,\dots}$. Since we have
1449: \[
1450: \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) > \epstol
1451: \]
1452: for all $k$ and $\ell$, and since the acceptance test
1453: \eqnok{tr.accept} is satisfied at all $k$, we have
1454: \[
1455: \cQ(x^k) - \cQ(x^{k+1}) \ge \xi \epstol >0, \;\;
\mbox{for all $k=0,1,2,\dots$}.
1457: \]
1458: But this relation is inconsistent with the fact that $\{ \cQ(x^k) \}$
1459: is bounded below (by $\cQ^*$), so this possibility can also be ruled
1460: out, and the proof is complete.
1461: \end{proof}
1462:
1463:
1464:
1520:
1521: \subsection{Discussion} \labtag{sec:tr:discussion}
1522:
1523: The algorithm can be modified in various ways without
1524: changing its properties greatly. For instance, we could replace the
1525: step norm bound in \eqnok{trsub.kl} by a scaled bound of the form
1526: \[
1527: \| S (x-x^k) \|_{\infty} \le \Delta_k,
1528: \]
1529: where $S$ is a diagonal positive definite matrix. After
1530: this modification, \eqnok{master.kl} remains a linear program. We
1531: could also use a $1$-norm trust region, at the cost of introducing an
1532: additional variable vector $s$ of the same dimension as $x$.
1533: Specifically, we enforce the constraint $\|x-x^k \|_1 \le \Delta_k$ by
1534: enforcing the following linear constraints:
1535: \[
1536: x-x^k \le s, \sgap x^k-x \le s, \sgap e^Ts \le \Delta_k.
1537: \]
1538: Once again, we obtain a linear programming subproblem, albeit one that
involves more variables than \eqnok{master.kl}.
1540:
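To make the two reformulations above concrete, the following sketch
spells out why each keeps the subproblem linear. The component
notation ($S_{ii}$ for the $i$th diagonal entry of $S$, and $x_i$,
$x^k_i$, $s_i$ for components of $x$, $x^k$, $s$) is introduced here
only for illustration. The scaled $\infty$-norm bound is equivalent to
the simple bounds
\[
x^k_i - \Delta_k / S_{ii} \le x_i \le x^k_i + \Delta_k / S_{ii},
\;\; \mbox{for all $i$}.
\]
For the $1$-norm constraints, if $x-x^k \le s$, $x^k-x \le s$, and
$e^Ts \le \Delta_k$, then $|x_i - x^k_i| \le s_i$ for all $i$ and so
\[
\| x - x^k \|_1 = \sum_i |x_i - x^k_i| \le e^T s \le \Delta_k.
\]
Conversely, given any $x$ with $\| x-x^k \|_1 \le \Delta_k$, the choice
$s_i = |x_i - x^k_i|$ satisfies all three sets of linear constraints.
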
1541: If a $2$-norm trust region is used, we can show by comparing the
1542: optimality conditions for the respective problems that the solution of
1543: the subproblem
1544: \[
1545: \min_x \, m_{k,\ell}(x) \;\; \mbox{subject to} \;Ax=b, \; x \ge 0, \;
1546: \| x-x^k \|_2 \le \Delta_k
1547: \]
1548: is identical to the solution of
1549: \beq \labtag{trsub.2norm}
1550: \min_x \, m_{k,\ell}(x) + \lambda \| x-x^k \|^2 \;\;
1551: \mbox{subject to} \;Ax=b, \; x \ge 0,
1552: \eeq
1553: for some $\lambda \ge 0$.
1554: %
1555: %
1556: We can transform \eqnok{trsub.2norm} to a quadratic program in the
1557: same fashion as the transformation of \eqnok{trsub.kl} to
1558: \eqnok{master.kl}. The bundle-trust-region approaches described in
Kiwiel~\cite{Kiw90}, Hiriart-Urruty and
1560: Lemar\'echal~\cite[Chapter~XV]{HirL93}, and
1561: Ruszczy{\'n}ski~\cite{Rus86,Rus93} also lead to problems of the form
1562: \eqnok{trsub.2norm}. These approaches manipulate the parameter
1563: $\lambda$ rather than adjusting the trust-region radius, more in the
1564: spirit of the Levenberg-Marquardt method for least-squares problems
1565: than of a true trust-region method. Hence, their analysis differs
1566: somewhat from that of the preceding section. Moreover, although
1567: quadratic programming solvers that exploit the special structure of
1568: the quadratic term in \eqnok{trsub.2norm} have been designed and
1569: implemented (see \cite{Rus86}), we believe that the linear programming
1570: subproblem \eqnok{master.kl} is more appealing from a practical point
1571: of view. Improvements in the efficiency and ease of use of linear
1572: programming software have continued to occur at a rapid pace, and
1573: availability of high-quality software has made it much easier to
1574: implement an efficient algorithm based on \eqnok{master.kl} than would
1575: have been the case if the subproblems had the form
1576: \eqnok{trsub.2norm}.
1577:
1578:
1579: %
1580: %
1581: %
1582:
1583:
1584: %
1585: %
1586: %
1587: %
1588: %
1589: %
1590:
1591: %
1592: %
1593: %
1594: %
1595: %
1596:
1597:
1598:
1599: \section{An Asynchronous Bundle-Trust-Region Method}
1600: \labtag{sec:atr}
1601:
1602: In this section we present an asynchronous, parallel version of the
1603: trust-region algorithm of the preceding section and analyze its
1604: convergence properties.
1605:
1606: \subsection{Algorithm ATR} \labtag{sec:atr:atr}
1607:
1608: We now define a variant of the method of Section~\ref{sec:tr} that
1609: allows the partial sums $\cQ_{[j]}, j=1,2,\dots,T$ \eqnok{thetaj} and
1610: their associated cuts to be evaluated simultaneously for different
1611: values of $x$. We generate candidate iterates by solving trust-region
1612: subproblems centered on an ``incumbent'' iterate, which (after a
1613: startup phase) is the point $x^I$ that, roughly speaking, is the best
1614: among those visited by the algorithm whose function value $\cQ(x)$ is
1615: fully known.
1616:
1617: By performing evaluations of $\cQ$ at different points concurrently,
1618: we relax the strict synchronicity requirements of Algorithm TR, which
1619: requires $\cQ(x^k)$ to be evaluated fully before the next candidate
1620: $x^{k+1}$ is generated. The resulting approach, which we call
1621: Algorithm ATR (for ``asynchronous TR''), is more suitable for
1622: implementation on computational grids of the type we consider here.
1623: Besides the obvious increase in parallelism that goes with evaluating
1624: several points at once, there is no longer a risk of the entire
computation being held up by the slow evaluation of one of the partial
1626: sums $\cQ_{[j]}$ on a recalcitrant worker. Algorithm ATR has similar
1627: theoretical properties to Algorithm TR, since the mechanisms for
1628: accepting a point as the new incumbent, adjusting the size of the
1629: trust region, and adding and deleting cuts are all similar to the
1630: corresponding mechanisms in Algorithm TR.
1631:
1632: Algorithm ATR maintains a ``basket'' $\cB$ of at most $K$ points for
1633: which the value of $\cQ$ and associated subgradient information is
1634: partially known. When the evaluation of $\cQ(x^q)$ is completed for a
1635: particular point $x^q$ in the basket, it is installed as the new
1636: incumbent if (i) its objective value is smaller than that of the
1637: current incumbent $x^I$; and (ii) it passes a trust-region acceptance
1638: test like \eqnok{tr.accept}, with the incumbent {\em at the time $x^q$
1639: was generated} playing the role of the previous major iteration in
1640: Algorithm TR. Whether $x^q$ becomes the incumbent or not, it is
1641: removed from the basket.
1642:
1643: When a vacancy arises in the basket, we may generate a new point by
1644: solving a trust-region subproblem similar to \eqnok{trsub.kl},
1645: centering the trust region at the current incumbent $x^I$. During the
1646: startup phase, while the basket is being populated, we wait until the
1647: evaluation of some other point in the basket has reached a certain
1648: level of completion (that is, until a proportion $\sigma \in (0,1]$ of
1649: the partial sums \eqnok{thetaj} and their subgradients have been
1650: evaluated) before generating a new point. We use a logical variable
1651: ${\tt speceval}_q$ to indicate when the evaluation of $x^q$ passes the
1652: specified threshold and to ensure that $x^q$ does not trigger the
1653: evaluation of more than one new iterate. (Both $\sigma$ and ${\tt
1654: speceval}_q$ play a similar role in Algorithm ALS.) After the
1655: startup phase is complete (that is, after the basket has been filled),
1656: vacancies arise only after evaluation of an iterate $x^q$ is
1657: completed.
1658:
1659: %
1660: We use $m(\cdot)$ (without
1661: subscripts) to denote the model function for $\cQ(\cdot)$. When
1662: generating a new iterate, we use whatever cuts are stored at the
1663: time to define $m$. When solved around the incumbent $x^I$
1664: with trust-region radius $\Delta$, the subproblem is as follows:
1665: \beq
1666: \labtag{trsub.atr1} \mbox{\tt trsub$(x^I, \Delta)$:} \;\; \min_x \,
1667: m(x) \;\; \mbox{subject to} \;Ax=b, \; x \ge 0, \; \| x- x^I
1668: \|_{\infty} \le \Delta.
1669: \eeq
1670: We refer to $x^I$ as the {\em parent incumbent} of the solution of
1671: \eqnok{trsub.atr1}.
1672: %
1673: %
1674: %
1675: %
1676: %
1677: %
1678: %
1679: %
1680: %
1681: %
1682: %
1683: %
1684: %
1685:
1686: %
1687: %
1688: %
1689:
1690: In the following description, we use $k$ to index the successive
1691: points $x^k$ that are explored by the algorithm, $I$ to denote the
1692: index of the incumbent, and $\cB$ to denote the basket. We use $t_k$
1693: to count the number of partial sums $\cQ_{[j]}(x^k)$, $j=1,2,\dots,T$
1694: that have been evaluated so far.
1695:
1696: %
1697: %
1698: %
1699: %
1700: %
1701: %
1702: %
1703: %
1704:
1705: %
1706: %
1707: %
1708: %
1709: %
1710: %
1711: %
1712: %
1713:
1714: Given a starting guess $x^0$, we initialize the algorithm by setting
1715: the dummy point $x^{-1}$ to $x^0$, setting the incumbent index $I$ to
1716: $-1$, and setting the initial incumbent value $\cQ^I =\cQ^{-1}$ to
1717: $\infty$. The iterate at which the first evaluation is completed
1718: becomes the first ``serious'' incumbent.
1719:
1720: We now outline some other notation used in specifying Algorithm ATR:
1721: %
1722: \bi
1723:
1724: \item[$\cQ^I$:] The objective value of the incumbent $x^I$, except in
1725: the case of $I=-1$, in which case $\cQ^{-1} = \infty$.
1726:
1727: \item[$I_q$:] The index of the parent incumbent of $x^q$, that is, the
1728: incumbent index $I$ at the time that $x^q$ was generated from
1729: \eqnok{trsub.atr1}. Hence, $\cQ^{I_q} = \cQ(x^{I_q})$ (except when
1730: $I_q=-1$; see previous item).
1731:
1732: \item[$\Delta_q$:] The value of the trust-region radius $\Delta$ used
1733: when solving for $x^q$.
1734:
1735: \item[$\Delta_{\rm curr}$:] Current value of the trust-region
1736: radius. When it comes time to solve \eqnok{trsub.atr1} to obtain a new
1737: iterate $x^q$, we set $\Delta_q \leftarrow \Delta_{\rm curr}$.
1738:
1739: \item[$m^q$:] The optimal value of the objective function $m$ in the
1740: subproblem {\tt trsub}$(x^{I_q}, \Delta_q)$ \eqnok{trsub.atr1}.
1741:
1742: %
1743: %
1744:
1745: \ei
1746: %
1747: %
1748: %
1749: %
1750: %
1751: %
1752: %
1753: %
1754: %
1755:
1756: Our strategy for maintaining the model closely follows that of
1757: Algorithm TR. Whenever the incumbent changes, we have a fairly free
1758: hand in deleting the cuts that define $m$, just as we do after
1759: accepting a new major iterate in Algorithm TR. If the incumbent does
1760: not change for a long sequence of iterations (corresponding to a long
1761: sequence of minor iterations in Algorithm TR), we can still delete
1762: ``stale'' cuts that represent information in $m$ that has likely been
1763: superseded (as quantified by a parameter $\eta \in [0,1)$). The
1764: following version of Procedure Model-Update, which applies to
1765: Algorithm ATR, takes as an argument the index $k$ of the latest
1766: iterate generated by the algorithm. It is called after the evaluation
1767: of $\cQ$ at an earlier iterate $x^q$ has just been completed, but
1768: $x^q$ does {\em not} meet the conditions needed to become the new
1769: incumbent.
1770: %
1771: \btab
1772: \> {\bf Procedure Model-Update} $(k)$ \\
1773: \> {\bf for each} optimality cut defining $m$\\
1774: \>\> {\tt possible\_delete} $\leftarrow$ {\tt true}; \\
1775: \>\> {\bf if} the cut was generated at the parent incumbent $I_k$ of $k$\\
1776: \>\>\> {\tt possible\_delete} $\leftarrow$ {\tt false}; \\
1777: \>\> {\bf else if} the cut was active at the solution $x^k$ of
1778: {\tt trsub}$(x^{I_k},\Delta_k)$ \\
1779: \>\>\> {\tt possible\_delete} $\leftarrow$ {\tt false}; \\
1780: \>\> {\bf else if} the cut was generated at an earlier
1781: iteration $\bar{\ell}$ \\
1782: \>\>\>\> such that $I_{\bar{\ell}} = I_k \neq -1$ and
1783: \etab
1784: \beq \labtag{atr.cut.delete.criterion}
1785: \cQ^{I_k} - m^k > \eta [ \cQ^{I_k} - m^{\bar{\ell}} ]
1786: \eeq
1787: \btab
1788: \>\>\> {\tt possible\_delete} $\leftarrow$ {\tt false}; \\
1789: \>\> {\bf end (if)} \\
1790: %
1791: %
1792: \>\> {\bf if} {\tt possible\_delete} \\
1793: \>\>\> possibly delete the cut; \\
1794: \> {\bf end (for each)}
1795: \etab
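
The deletion logic of Procedure Model-Update can be sketched in Python
as follows. This is an illustration under stated assumptions, not the
authors' implementation: the record {\tt Cut}, its field names, the
function {\tt may\_delete}, and the dictionary arguments are all
hypothetical. The sketch reads ``earlier iteration'' literally as
$\bar{\ell} < k$ and assumes that each cut records the index of the
iterate at which it was generated.

\begin{verbatim}
from dataclasses import dataclass


@dataclass
class Cut:
    generated_at: int   # index of the iterate whose evaluation produced the cut
    active_at_k: bool   # True if the cut is active at the solution x^k of trsub


def may_delete(cut, k, parent, Q_inc, m_opt, eta):
    """Return True if Model-Update(k) would leave possible_delete = true.

    parent[i] : parent-incumbent index I_i (-1 for the dummy incumbent)
    Q_inc[i]  : objective value Q(x^i) of an incumbent x^i
    m_opt[i]  : optimal subproblem value m^i
    eta       : parameter in [0, 1)
    (All container names are assumptions made for this sketch.)
    """
    I_k = parent[k]
    if cut.generated_at == I_k:
        return False            # cut generated at the parent incumbent
    if cut.active_at_k:
        return False            # cut active at x^k
    ell = cut.generated_at
    if I_k != -1 and ell < k and parent[ell] == I_k:
        # criterion (atr.cut.delete.criterion): keep the cut while the gap
        # has not yet dropped below the fraction eta of its value at ell
        if Q_inc[I_k] - m_opt[k] > eta * (Q_inc[I_k] - m_opt[ell]):
            return False
    return True
\end{verbatim}

A caller would loop over the stored optimality cuts and treat a
{\tt True} result only as permission to delete, since the procedure
says ``possibly delete the cut.''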
1796: %
1797:
1798: Our strategy for adjusting the trust region $\Delta_{\rm curr}$
1799: also follows that of Algorithm TR. The differences arise from the fact
1800: that between the time an iterate $x^q$ is generated and its function
value $\cQ(x^q)$ becomes known, other adjustments of $\Delta_{\rm
curr}$ may have occurred, as the evaluation of intervening iterates
1803: is completed. The version of Procedure Reduce-$\Delta$ for
1804: Algorithm ATR is as follows.
1805: %
1806: \btab
1807: \> {\bf Procedure Reduce-$\Delta(q)$} \\
1808: \> {\bf if} $I_q = -1$ \\
1809: \>\> return; \\
1810: \> evaluate
1811: \etab
1812: \beq \labtag{atr.reduce.delta.2}
1813: \rho = {\min(1,\Delta_q)}
1814: \frac{\cQ(x^q) - \cQ^{I_q}}{\cQ^{I_q} - m^q};
1815: \eeq
1816: \btab
1817: \> {\bf if} $\rho>0$ \\
1818: \>\> {\tt counter} $\leftarrow$ {\tt counter}$+1$; \\
1819: \> {\bf if} $\rho>3$ {\bf or}
1820: ({\tt counter} $\ge 3$ {\bf and} $\rho \in (1,3]$) \\
1821: \>\> set $\Delta_q^+ \leftarrow \Delta_q / \min(\rho,4)$; \\
1822: \>\> set
1823: $\Delta_{\rm curr} \leftarrow \min(\Delta_{\rm curr}, \Delta_q^+)$; \\
1824: \>\> reset {\tt counter} $\leftarrow 0$; \\
1825: \> return.
1826: \etab
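
As an illustration, Procedure Reduce-$\Delta$ can be transcribed into
Python as below. The class {\tt TrustRegionState} and the argument
names are hypothetical, chosen only for this sketch. The sketch also
assumes $\cQ^{I_q} - m^q > 0$, which should hold for any iterate that
did not trigger the convergence test when it was generated.

\begin{verbatim}
from dataclasses import dataclass


@dataclass
class TrustRegionState:
    delta_curr: float   # current trust-region radius Delta_curr
    counter: int = 0    # the "counter" variable of the procedure


def reduce_delta(state, Q_q, Q_parent, m_q, delta_q, parent_index):
    """Apply Reduce-Delta(q) to `state` in place.

    Q_q = Q(x^q), Q_parent = Q^{I_q}, m_q = m^q, delta_q = Delta_q,
    parent_index = I_q.  Assumes Q_parent - m_q > 0.
    """
    if parent_index == -1:
        return                          # dummy parent incumbent: nothing to do
    rho = min(1.0, delta_q) * (Q_q - Q_parent) / (Q_parent - m_q)
    if rho > 0:
        state.counter += 1
    if rho > 3 or (state.counter >= 3 and 1 < rho <= 3):
        delta_plus = delta_q / min(rho, 4.0)
        # Delta_curr may already have been changed by other iterates,
        # so it is only ever lowered here, never raised.
        state.delta_curr = min(state.delta_curr, delta_plus)
        state.counter = 0
\end{verbatim}

For example, with $\Delta_q = 1$, $\cQ^{I_q}=2$, $m^q=0$, and
$\cQ(x^q)=10$ we obtain $\rho=4$, so the radius is cut to at most
$1/4$ in a single call.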
1827: %
1828:
1829: The protocol for increasing the trust region after a successful step
1830: is based on \eqnok{tr.incr.1}, \eqnok{tr.incr.3}. If on completion of
1831: evaluation of $\cQ(x^q)$, the iterate $x^q$ becomes the new incumbent,
1832: then we test the following condition:
1833: \beq \labtag{atr.incr.1}
1834: \cQ(x^q) \le \cQ^{I_q} - 0.5 (\cQ^{I_q} - m^q) \;\; \mbox{and} \;\;
1835: \| x^q - x^{I_q} \|_{\infty} = \Delta_q.
1836: \eeq
1837: If this condition is satisfied, we set
1838: \beq \labtag{atr.incr.3}
1839: \Delta_{\rm curr} \leftarrow \max(\Delta_{\rm curr},
1840: \min (\Delta_{\rm hi}, 2 \Delta_q) ).
1841: \eeq
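
The increase rule can be sketched in Python as follows. The function
name, argument names, and the tolerance {\tt rel\_tol} are assumptions
of this sketch: the condition \eqnok{atr.incr.1} asks for exact
equality of the step norm and $\Delta_q$, which in floating-point
arithmetic would normally be tested only up to a small tolerance.

\begin{verbatim}
def increase_delta(delta_curr, delta_q, delta_hi, Q_q, Q_parent, m_q,
                   step_norm, rel_tol=1e-9):
    """Return the new Delta_curr after x^q has become the incumbent.

    step_norm is ||x^q - x^{I_q}||_inf; Q_parent is Q^{I_q}; m_q is m^q.
    """
    # Second part of (atr.incr.1): the step reached the trust-region boundary.
    on_boundary = abs(step_norm - delta_q) <= rel_tol * max(1.0, delta_q)
    # First part of (atr.incr.1): at least half the predicted decrease.
    # If Q_parent is infinite the right-hand side is NaN, the comparison
    # is False, and the radius is left unchanged.
    enough_decrease = Q_q <= Q_parent - 0.5 * (Q_parent - m_q)
    if enough_decrease and on_boundary:
        return max(delta_curr, min(delta_hi, 2.0 * delta_q))   # (atr.incr.3)
    return delta_curr
\end{verbatim}

Note that the update uses $\Delta_q$, the radius in force when $x^q$
was generated, rather than the possibly different current value
$\Delta_{\rm curr}$.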
1842:
1843: The convergence test is also similar to the test \eqnok{conv.test}
1844: used for Algorithm TR. We terminate if, on generation of a new iterate
1845: $x^k$, we find that
1846: \beq \labtag{conv.test.atr}
1847: \cQ^I - m^k \le \epstol (1+|\cQ^I|).
1848: \eeq
1849:
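A direct transcription of \eqnok{conv.test.atr} needs some care while
the dummy incumbent is in place, since then $\cQ^I = \infty$. The
Python sketch below guards this case explicitly. The function name and
the guard are assumptions of the sketch, not a description of the
authors' code. In IEEE arithmetic both sides of the test would
evaluate to infinity when $\epstol>0$, and the comparison would then
succeed.

\begin{verbatim}
import math


def atr_converged(Q_I, m_k, eps_tol):
    """Test (conv.test.atr): Q^I - m^k <= eps_tol * (1 + |Q^I|)."""
    if math.isinf(Q_I):
        # No "serious" incumbent yet (I = -1): never declare convergence.
        return False
    return Q_I - m_k <= eps_tol * (1.0 + abs(Q_I))
\end{verbatim}
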
1850:
We now specify the four key routines of Algorithm ATR, which serve
1852: a similar function to the four main routines of Algorithm ALS. As in
1853: the earlier case, we assume for simplicity of description that each
1854: task consists of evaluation of the function and a subgradient for
1855: a single cluster (although in practice we may bundle more than one
1856: cluster into a single task). The routine {\tt partial\_evaluate}
1857: executes on worker processors, while the other three routines execute
1858: on the master processor.
1859:
1860: %
1861: %
1862: %
1863: %
1864: %
1865: %
1866: %
1867: %
1868: %
1869: %
1870: %
1871: %
1872: %
1873: %
1874: %
1875: %
1876: %
1877: %
1878: %
1879:
1880: \btab
1881: \>{\bf ATR:} \ \ {\tt partial\_evaluate}$(x^q,q,j,\cQ_{[j]}(x^q),g_j)$ \\
1882: \> Given $x^q$, index $q$, and partition number $j$,
1883: evaluate $\cQ_{[j]}(x^q)$ from \eqnok{thetaj} \\
1884: \>\> together with a partial subgradient $g_j$ from \eqnok{subg.Qj}; \\
1885: \> Activate {\tt act\_on\_completed\_task}$(x^q,q,j,\cQ_{[j]}(x^q),g_j)$
1886: on the master processor.
1887: \etab
1888:
1889: \medskip
1890:
1891: \btab
1892: \> {\bf ATR:} \ \ {\tt evaluate}$(x^q,q)$ \\
1893: \> {\bf for} $j=1,2,\dots, T$ (possibly concurrently) \\
1894: \>\> {\tt partial\_evaluate}$(x^q,q,j,\cQ_{[j]}(x^q), g_j)$; \\
1895: \> {\bf end (for)}
1896: \etab
1897:
1898: \medskip
1899:
1900: \btab
1901: \> {\bf ATR:} \ \ {\tt initialization}$(x^0)$ \\
1902: \> choose $\xi \in (0,1/2)$, trust region upper bound
1903: $\Delta_{\rm hi}>0$; \\
1904: \> choose synchronicity parameter $\sigma \in (0,1]$; \\
1905: \> choose maximum basket size $K>0$; \\
1906: \> choose $\Delta_{\rm curr} \in (0, \Delta_{\rm hi}]$,
1907: {\tt counter} $\leftarrow 0$; $\cB \leftarrow \emptyset$; \\
1908: \> $I \leftarrow -1$; $x^{-1} \leftarrow x^0$; $\cQ^{-1} \leftarrow \infty$;
1909: $I_0 \leftarrow -1$; \\
1910: \> $k \leftarrow 0$;
1911: ${\tt speceval}_0 \leftarrow {\tt false}$;
1912: $t_0 \leftarrow 0$; \\
1913: \> {\tt evaluate}$(x^0,0)$.
1914: \etab
1915:
1916: \medskip
1917:
1918: \btab
1919: \> {\bf ATR:} \ \
{\tt act\_on\_completed\_task}$(x^q,q,j,\cQ_{[j]}(x^q),g_j)$ \\
1921: \> $t_q \leftarrow t_q+1$; \\
1922: \> add $\cQ_{[j]}(x^q)$ and cut $g_j$ to the model $m$; \\
1923: \> {\tt basketFill} $\leftarrow$ {\tt false};
1924: {\tt basketUpdate} $\leftarrow$ {\tt false}; \\
1925: \> {\bf if} $t_q=T$ (* evaluation of $\cQ(x^q)$ is complete *) \\
1926: \>\> {\bf if} $\cQ(x^q) < \cQ^I$ and (${I_q}=-1$ or
1927: $\cQ(x^q) \le \cQ^{I_q} - \xi (\cQ^{I_q} - m^q)$) \\
1928: \>\>\> (* make $x^q$ the new incumbent *) \\
1929: \>\>\> $I \leftarrow q$; $\cQ^I \leftarrow \cQ(x^I)$; \\
1930: \>\>\> possibly increase $\Delta_{\rm curr}$ according to
1931: \eqnok{atr.incr.1} and \eqnok{atr.incr.3}; \\
1932: \>\>\> modify the model function by possibly deleting cuts not arising \\
1933: \>\>\>\> from the evaluation of $\cQ(x^q)$; \\
1934: \>\> {\bf else} \\
1935: \>\>\> call Model-Update$(k)$; \\
1936: \>\>\> call Reduce-$\Delta(q)$ to update $\Delta_{\rm curr}$; \\
1937: \>\> {\bf end (if)} \\
1938: \>\> $\cB \leftarrow \cB \backslash \{ q \}$; \\
1939: \>\> {\tt basketUpdate} $\leftarrow$ {\tt true}; \\
1940:
1941: \> {\bf else if }
1942: $t_q \ge \sigma T$ {\bf and} $| \cB| <K$ {\bf and} not ${\tt speceval}_q$ \\
1943: \>\> (* basket-filling phase: enough partial sums have been evaluated at $x^q$
1944: \\
1945: \>\>\> to trigger calculation of a new candidate iterate *) \\
1946: \>\> ${\tt speceval}_q \leftarrow ${\tt true};
1947: {\tt basketFill} $\leftarrow$ {\tt true}; \\
1948: \> {\bf end (if)} \\
1949:
\> {\bf if } {\tt basketFill} {\bf or}
1951: {\tt basketUpdate} \\
1952: \>\> $k \leftarrow k+1$;
1953: set $\Delta_k \leftarrow \Delta_{\rm curr}$; set $I_k \leftarrow I$; \\
1954: \>\> solve {\tt trsub}$(x^I,\Delta_k)$ to obtain $x^k$; \\
1955: \>\> $m^k \leftarrow m(x^k)$; \\
1956: \>\> {\bf if} \eqnok{conv.test.atr} holds \\
1957: \>\>\> STOP; \\
1958: \>\> $\cB \leftarrow \cB \cup \{ k \}$; \\
1959: \>\> ${\tt speceval}_k \leftarrow${\tt false}; $t_k \leftarrow 0$; \\
1960: \>\> {\tt evaluate}$(x^k,k)$; \\
1961: \> {\bf end (if)}
1962:
1963: \etab
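
The branching at the top of {\tt act\_on\_completed\_task} can be
isolated in a small Python function, sketched below. The function name
{\tt classify\_event}, its arguments, and its return convention are
hypothetical. It reports only which branch is taken and whether $x^q$
passes the incumbent test, leaving the bookkeeping (updating $\cB$,
${\tt speceval}_q$, the model, and $\Delta_{\rm curr}$) to the caller.

\begin{verbatim}
def classify_event(t_q, T, sigma, basket_size, K, speceval_q,
                   Q_q=None, Q_I=None, Q_parent=None, m_q=None,
                   parent_index=None, xi=0.25):
    """Classify a completed task, after t_q has been incremented.

    Returns (action, new_incumbent), where action is
      "update" : evaluation of Q(x^q) is complete (basketUpdate),
      "fill"   : x^q triggers a basket-filling iterate (basketFill),
      None     : no new iterate is generated.
    """
    if t_q == T:
        accept = Q_q < Q_I and (
            parent_index == -1
            or Q_q <= Q_parent - xi * (Q_parent - m_q))
        return "update", accept
    if t_q >= sigma * T and basket_size < K and not speceval_q:
        # The caller must also set speceval_q to true at this point.
        return "fill", False
    return None, False
\end{verbatim}

In either of the first two cases the caller would go on to solve
{\tt trsub}$(x^I,\Delta_{\rm curr})$ for a new iterate, as in the
final block of the routine.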
1964:
1965: It is not generally true that the first $K$ iterates $x^0, x^1, \dots,
1966: x^{K-1}$ generated by the algorithm are all basket-filling
1967: iterates. Often, an evaluation of some iterate is completed before the
1968: basket has filled completely, so a ``basket-update'' iterate is used
1969: to generate a replacement for this point. Since each basket-update
1970: iterate does not change the size of the basket, however, the number of
1971: basket-filling iterates that are generated in the course of the
1972: algorithm is exactly $K$.
1973:
1974: \subsection{Analysis of Algorithm ATR} \labtag{sec:atr:analysis}
1975:
1976: We now analyze Algorithm ATR, showing that its convergence properties
1977: are similar to those of Algorithm TR. Throughout, we make the
1978: following assumption:
1979: %
1980: \beq \labtag{all.tasks.completed}
1981: \mbox{Every task is completed after a finite time}.
1982: \eeq
1983: %
1984: %
1985: %
1986: %
1987:
1988: The analysis follows closely that of Algorithm TR presented in
1989: Section~\ref{sec:tr:analysis}. We state the analogues of all the
1990: lemmas and theorems from the earlier section, incorporating the
1991: changes and redefinitions needed to handle Algorithm ATR. Most of the
1992: details of the proofs are omitted, however, since they are similar to
1993: those of the earlier results.
1994:
1995: We start by defining the level set within which the points and
1996: incumbents generated by ATR lie.
1997: \begin{lemma} \labtag{lem:atr1.1}
1998: All incumbents $x^I$ generated by ATR lie in $\cL(\cQ_{\rm max})$,
1999: whereas all points $x^k$ considered by the algorithm lie in
2000: $\cL(\cQ_{\rm max}; \Delta_{\rm hi})$, where $\cL(\cdot)$ and
2001: $\cL(\cdot;\cdot)$ are defined by \eqnok{def.ls} and \eqnok{def.lsn},
2002: respectively, and $\cQ_{\rm max}$ is defined by
2003: \[
2004: \cQ_{\rm max} \defeq \sup \{ \cQ(x) \, | \,
2005: \| x-x^0 \| \le \Delta_{\rm hi} \}.
2006: \]
2007: \end{lemma}
2008: \begin{proof}
2009: Consider first what happens in ATR before the first function
2010: evaluation is complete. Up to this point, all the iterates $x^k$ in
2011: the basket are generated in the basket-filling part and therefore
2012: satisfy $\| x^k-x^0 \| \le \Delta_k \le \Delta_{\rm hi}$, with
2013: $\cQ^{I_k} = \cQ^{-1} = \infty$.
2014:
2015: When the first evaluation is completed (by $x^k$, say), it trivially
2016: passes the test to be accepted as the new incumbent. Hence, the
2017: first noninfinite incumbent value becomes $\cQ^I = \cQ(x^k)$, and
2018: by definition we have $\cQ^I \le \cQ_{\rm max}$. Since all later
2019: incumbents must have objective values smaller than this first
2020: $\cQ^I$, they all must lie in the level set $\cL(\cQ_{\rm max})$,
2021: proving our first statement.
2022:
2023: All points $x^k$ generated within {\tt act\_on\_completed\_task} lie
2024: within a distance $\Delta_k \le \Delta_{\rm hi}$ either of $x^0$ or of
2025: one of the later incumbents $x^I$. Since all the incumbents, including
2026: $x^0$, lie in $\cL(\cQ_{\rm max})$, we conclude that the second claim
in the lemma is also true.
2028: %
2029:
2030: \end{proof}
2031:
2032: Analogously with $\beta$ \eqnok{def.beta}, we define a bound on the
2033: subgradients over the set $\cL(\cQ_{\rm max}; \Delta_{\rm hi})$ as
2034: follows:
2035: \beq \labtag{def.barbeta}
2036: \bar{\beta} = \sup \{ \| g \|_1 \, | \, g \in \partial \cQ(x), \,
2037: \mbox{for some $x \in \cL(\cQ_{\rm max};\Delta_{\rm hi})$} \}.
2038: \eeq
2039:
2040: The next result is analogous to Lemma~\ref{lem:mkl}. It shows that for
any sequence of iterates $x^k$ for which the parent incumbent $x^{I_k}$
2042: is the same, the optimal objective value in {\tt trsub}$(x^{I_k},
2043: \Delta_k)$ is monotonically increasing.
2044: \begin{lemma} \labtag{lem:mkl.atr}
2045: Consider any contiguous subsequence of iterates $x^{k}$,
2046: $k=k_1,k_1+1,\dots, k_2$ for which the parent incumbent is identical;
2047: that is, $I_{k_1}=I_{k_1+1}= \cdots = I_{k_2}$. Then we have
2048: \[
2049: m^{k_1} \le m^{k_1+1} \le \cdots \le m^{k_2}.
2050: \]
2051: \end{lemma}
2052: \begin{proof}
2053: We select any $k=k_1, k_1+1, \dots, k_2-1$ and
2054: prove that $m^k \le m^{k+1}$.
2055: Since $x^k$ and $x^{k+1}$ have the same parent incumbent ($x^I$, say),
2056: no new incumbent has been accepted between the generation of these two
2057: iterates, so the wholesale cut deletion that may occur with the
2058: adoption of a new incumbent cannot have occurred. There may, however,
2059: have been a call to {\tt Model-Update}$(k)$. The
2060: first ``else if'' clause in {\tt Model-Update} would have ensured that
2061: cuts active at the solution of {\tt trsub}$(x^I, \Delta_k)$ were still
2062: present in the model when we solved {\tt trsub}$(x^I, \Delta_{k+1})$ to
2063: obtain $x^{k+1}$. Moreover, since no new incumbent was accepted,
2064: $\Delta_{\rm curr}$ cannot have been increased, and we have
2065: $\Delta_{k+1} \le \Delta_k$. We now use the same argument as in the
2066: proof of Lemma~\ref{lem:mkl} to deduce that $m^{k} \le m^{k+1}$.
2067: \end{proof}
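
One way to spell out the final step of this proof is sketched below.
The symbols $m_k$, $m_{k+1}$, $\hat{m}$, and $F_k$ are introduced
here only for illustration. Let $m_k$ and $m_{k+1}$ denote the model
functions in force when $x^k$ and $x^{k+1}$ were generated, let
$\hat{m}$ be the model defined by only those cuts of $m_k$ that are
active at $x^k$, and let
$F_k \defeq \{ x \, | \, Ax=b, \; x \ge 0, \; \| x-x^I \|_{\infty} \le
\Delta_k \}$. By convexity, discarding cuts that are inactive at the
solution does not change the optimal value, so
$\min_{x \in F_k} \hat{m}(x) = m^k$. Since $m_{k+1}$ retains every cut
in $\hat{m}$, we have $m_{k+1}(x) \ge \hat{m}(x)$ for all $x$, and
since $\Delta_{k+1} \le \Delta_k$, we have $F_{k+1} \subseteq F_k$.
Hence
\[
m^{k+1} = \min_{x \in F_{k+1}} m_{k+1}(x)
\ge \min_{x \in F_{k+1}} \hat{m}(x)
\ge \min_{x \in F_k} \hat{m}(x) = m^k.
\]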
2068:
The following result is analogous to Lemma~\ref{lem:tr:1}. We omit the
proof, which, modulo the change in notation, is identical to that of the
earlier result.
2072: \begin{lemma} \labtag{lem:atr:1}
2073: For all $k=0,1,2,\ldots$ such that $I_k \neq -1$, we have that
2074: \begin{subequations} \labtag{lem:atr:inequalities}
2075: \beqa
2076: \labtag{atr.2a}
2077: \cQ^{I_k} - m^k & \ge &
2078: \min \left( \Delta_{k}, \| x^{I_k} - P(x^{I_k})\|_{\infty} \right)
2079: \frac{\cQ^{I_k} - \cQ^*}{\| x^{I_k} - P(x^{I_k}) \|_{\infty}} \\
2080: \labtag{atr.2b}
2081: & \ge &
2082: \hat{\epsilon} \min
2083: \left( \Delta_{k}, \| x^{I_k} - P(x^{I_k})\|_{\infty} \right),
2084: \eeqa
2085: \end{subequations}
2086: where $\hat{\epsilon}>0$ is defined in \eqnok{weak.sharp}.
2087: \end{lemma}
2088:
2089: The following analogue of Lemma~\ref{lem:trbounds} requires a slight
2090: redefinition of the quantity $E_k$ from \eqnok{def:Ek}. We now
2091: define it to be the closest approach by an {\em incumbent} to the
2092: solution set, up to and including iteration $k$; that is,
2093: \beq \labtag{def:Ek:atr}
2094: E_k \defeq \min_{\bar{k} = 0,1,\dots, k; I_{\bar{k}} \neq -1}
2095: \| x^{I_{\bar{k}}} - P(x^{I_{\bar{k}}}) \|_{\infty}.
2096: \eeq
2097: %
2098: We also omit the proof of the following result, which, allowing for
2099: the change of notation, is almost identical to that of
2100: Lemma~\ref{lem:trbounds}.
2101: %
2102: \begin{lemma} \labtag{lem:trbounds:atr}
2103: There is a constant $\Delta_{\rm lo} >0$ such that for all trust
2104: regions $\Delta_{k}$ used in the course of Algorithm ATR, we
2105: have
2106: \[
2107: \Delta_{k} \ge \min( \Delta_{\rm lo}, E_k/4).
2108: \]
2109: \end{lemma}
2110: The value of $\Delta_{\rm lo}$ that works in this case is $\Delta_{\rm
2111: lo} = (1/4) \min(1, \hat{\epsilon}/\bar{\beta}, \Delta_{\rm hi})$,
2112: where $\bar{\beta}$ comes from \eqnok{def.barbeta}.
2113:
2114: There is also an analogue of Lemma~\ref{lem:tr:ft} that shows that if
2115: the incumbent remains the same for a number of consecutive iterations,
2116: the gap between incumbent objective value and model function decreases
2117: significantly as the iterations proceed.
2118: %
2119: \begin{lemma} \labtag{lem:atr:ft}
2120: Let $\epstol=0$ in Algorithm ATR, and let $\bar{\eta}$ be
2121: any constant satisfying $0<\bar{\eta}<1$, $\bar{\eta}>\xi$,
2122: $\bar{\eta} \ge \eta$. Choosing any index $k_1$ with $I_{k_1} \neq
2123: -1$, we have either that the incumbent $I_{k_1}=I$ is eventually
2124: replaced by a new incumbent or that there is an iteration
2125: $k_2>k_1$ such that
2126: \beq \labtag{atr.6}
2127: \cQ^{I} - m^{k_2} \le \bar{\eta} \left[
2128: \cQ^{I} - m^{k_1} \right].
2129: \eeq
2130: \end{lemma}
2131: The proof of this result follows closely that of its antecedent
2132: Lemma~\ref{lem:tr:ft}. The key is in the construction of the
2133: Model-Update procedure. As long as
2134: \beq \labtag{atr.7}
2135: \cQ^I - m^k > \eta [\cQ^I - m^{k_1}], \;\; \mbox{for $k \ge k_1$, where
2136: $I=I_{k_1} = I_k$},
2137: \eeq
2138: none of the cuts generated during the evaluation of $\cQ(x^q)$ for any
2139: $q=k_1, k_1+1, \dots, k$ can be deleted. The proof technique of
2140: Lemma~\ref{lem:tr:ft} can then be used to show that the successive
2141: iterates $x^{k_1}, x^{k_1+1}, \dots$ cannot be too closely spaced if
2142: the condition \eqnok{atr.7} is to hold and if all of them fail to
2143: satisfy the test to become a new incumbent. Since they all belong
2144: to a box of finite size centered on $x^I$, there can be only finitely
2145: many of these iterates. Hence, either a new incumbent is adopted
2146: at some iteration $k \ge k_1$ or condition \eqnok{atr.6} is
2147: eventually satisfied.
2148:
2149: We now show that the algorithm cannot ``get stuck'' at a nonoptimal
2150: incumbent. The following result is analogous to
2151: Theorem~\ref{th:tr:ft}, and its proof relies on the earlier results in
2152: exactly the same way.
2153: \begin{theorem} \labtag{th:atr:ft}
2154: Suppose that $\epstol =0$.
2155: \begin{itemize}
2156: \item[(i)] If $x^I \notin \cS$, then this incumbent is replaced by a
2157: new incumbent after a finite time.
2158: \item[(ii)] If $x^I \in \cS$, then either Algorithm ATR terminates
2159: (and verifies that $x^I \in \cS$), or $\cQ^I - m^k \downarrow 0$
2160: as $k \to \infty$.
2161: \end{itemize}
2162: \end{theorem}
2163:
2164: We conclude with the result that shows convergence of the sequence of
incumbents to $\cS$. Once again, the logic of the proof follows that of
2166: the synchronous analogue Theorem~\ref{th:tr:conv}.
2167: %
2168: \begin{theorem} \labtag{th:atr:conv}
2169: Suppose that $\epstol=0$. The sequence of incumbents
$\{ x^{I_k} \}_{k=0,1,2,\dots}$ is either finite,
terminating at some $x^I \in \cS$, or is infinite with
2172: the property that $\| x^{I_k} - P(x^{I_k})
2173: \|_{\infty} \to 0$.
2174: \end{theorem}
2175:
2176: \section{Implementation on Computational Grids} \labtag{sec:grids}
2177:
2178: We now describe some salient properties of the computational
2179: environment in which we implemented the algorithms, namely, a
2180: computational grid running the Condor system and the MW runtime
2181: support library.
2182:
2183: \subsection{Properties of Grids} \labtag{sec:grids:intro}
2184:
2185: The term ``grid computing'' (synonymously ``metacomputing'') is
2186: generally used to describe parallel computations on a geographically
2187: distributed, heterogeneous computing platform. Within this framework
2188: there are several variants of the concept. The one of interest here is
2189: a parallel platform made up of shared workstations, nodes of PC
2190: clusters, and supercomputers. Although such platforms are potentially
2191: powerful and inexpensive, they are difficult to harness for productive
2192: use, for the following reasons:
2193: %
2194: \bi
2195: \item Poor communications properties. Latencies between the processors
2196: may be high, variable, and unpredictable.
2197:
2198: \item Unreliability. Resources may disappear without notice. A
2199: workstation performing part of our computation may be reclaimed by
2200: its owner and our job terminated.
2201:
2202: \item Dynamic availability. The pool of available processors grows and
2203: shrinks during the computation, according to the claims of other users
2204: and scheduling considerations at some of the nodes.
2205:
2206: \item Heterogeneity. Resources may vary in their operational
2207: characteristics (memory, swap space, processor speed, operating
2208: system).
2209:
2210: \ei
2211: %
2212: In all these respects, our target platform differs from conventional
2213: multiprocessor platforms (such as IBM SP or SGI Origin machines) and
2214: from Linux clusters.
2215:
2216: \subsection{Condor} \labtag{sec:grids:condor}
2217:
2218: Our particular interest is in grid computing platforms based on the
2219: Condor system~\cite{condor}, which manages distributively owned
2220: collections (``pools'') of processors of different types, including
2221: workstations, nodes from PC clusters, and nodes from conventional
2222: multiprocessor platforms. When a user submits a job, the Condor system
discovers a suitable processor for the job in the pool, transfers the
executable, and starts the
job on that processor. It traps system calls (such as input/output
2226: operations), referring them back to the submitting workstation,
2227: and checkpoints the state of the job periodically. It also migrates the
2228: job to a different processor in the pool if the current host becomes
2229: unavailable for any reason (for example, if the workstation is
reclaimed by its owner). Condor-managed
2231: processes can communicate through a Condor-enabled version of PVM
2232: \cite{PVMbook} or by using Condor's I/O trapping to write into and
2233: read from a series of shared files.
2234:
2286:
2287:
2288: \subsection{Implementation in MW} \labtag{sec:grids:mw}
2289:
2290: MW (see Goux, Linderoth, and Yoder~\cite{GouLY00} and Goux et
2291: al.~\cite{GouKLY00}) is a runtime support library that facilitates
2292: implementation of parallel master-worker applications on computational
2293: grids. To implement MW on a particular computational grid, a grid
2294: programmer must reimplement a small number of functions to perform
2295: basic operations for communications between processors and management
2296: of computational resources. These functions are encapsulated in the
2297: MWRMComm class. Of more relevance to the current paper is the other
2298: side of MW, the application programming interface presented to the
2299: application programmer. This interface takes the form of a set of
2300: three C$++$ abstract classes that must be reimplemented in a way that
2301: describes the particular application. These classes, named MWDriver,
2302: MWTask, and MWWorker, contain a total of ten methods for which the
2303: user must supply implementations. We describe these methods briefly,
2304: indicating how they are implemented for the particular case of the ATR
2305: and ALS algorithms.
2306:
2307: \paragraph{MWDriver.}
2308:
2309: This class is made up of methods that execute on the submitting
2310: workstation, which acts as the master processor. It contains the
2311: following four C$++$ pure virtual functions. (Naturally, other methods
2312: can be defined as needed to implement parts of the algorithm.)
2313: %
2314: \begin{itemize}
2315: %
2316: \item {\tt get\_userinfo}: Processes command-line arguments and does
2317: basic setup. In our applications this function reads a command file
2318: to set various parameters, including convergence tolerances, number
2319: of scenarios, number of partial sums to be evaluated in each task,
2320: maximum number of worker processors to be requested, initial trust
2321: region radius, and so on. It calls the routines that read and store
2322: the problem data files, and it reads the initial point, if one is
2323: supplied. It also performs the operations specified in the {\tt
2324: initialization} routine of Algorithms ALS and ATR, except for the
2325: final {\tt evaluate} operation, which is handled by the next
2326: function.
2327:
2328: %
2329: \item {\tt setup\_initial\_tasks}: Defines the initial pool of tasks.
2330: In the case of Algorithms ALS and ATR, this function corresponds to
2331: a call to {\tt evaluate} at $x^0$.
2332:
2333: %
2334: \item {\tt pack\_worker\_init\_data}: Packs the initial data to be
2335: sent to each worker processor when it joins the pool. In our case,
2336: the information contained in the input files for the stochastic
2337: programming problem is sent to each worker. When the worker
2338: subsequently receives a task requiring it to solve a number of
2339: second-stage scenarios, it can use the original input data to
2340: generate the particular data for its assigned set of scenarios.
2341: %
2342: %
2343: %
2344: By loading each new worker with the problem data, we avoid having to
2345: subsequently pass a complete set of data for every scenario in every
2346: task.
2347:
2348: %
2349: \item {\tt act\_on\_completed\_task}: Is called every time
2350: a task finishes, to process the results of the task and to take any
2351: actions arising from these results. See Algorithms ALS and ATR for
2352: our definition of this function in our applications.
2353: %
2354: %
2355:
2356: \end{itemize}
2357:
2358: %
2359: %
2360: %
2361:
2362: The MWDriver base class performs many other operations associated with
2363: handling worker processes that join and leave the computation,
2364: assigning tasks to appropriate workers, rescheduling tasks when their
2365: host workers disappear without warning, and keeping track of
2366: performance data for the run. All this complexity is hidden from the
2367: application programmer.
2368:
2369: \paragraph{MWTask.}
2370:
2371: The MWTask is the abstraction of a single task. It holds both the data
2372: describing that task and the results obtained by executing the task.
The user must implement four functions that pack and unpack the task
data and the results into simple data structures, which can then be
communicated between master and workers by using the appropriate
primitives for the particular computational grid platform on which MW
is implemented. In most of the results reported
2378: in Section~\ref{sec:results}, the message-passing facilities of
2379: Condor-PVM were used to perform the communication. By simply changing
2380: compiler directives, the same algorithmic code can also be implemented
2381: on an alternative communication protocol that uses shared files to
2382: pass messages between master and workers. The large run reported in
2383: the next section used this version of the code.
2384:
2385: %
2386: %
2387:
2388: In our applications, each task evaluates the partial sum
2389: $\cQ_{[j]}(x)$ and a subgradient for a given number of clusters. The
2390: task is described by a range of scenario indices for each cluster in
2391: the task and by a value of the first-stage variables $x$. The results
2392: consist of the function and subgradient for each of the clusters
2393: in the task.
2394:
2395: \paragraph{MWWorker.}
2396:
2397: The MWWorker class is the core of the executable that runs on each
2398: worker. The user must implement two pure virtual functions:
2399:
2400: \begin{itemize}
2401: \item {\tt unpack\_init\_data}: Unpacks the initial information passed
2402: to the worker by the MWDriver function {\tt
2403: pack\_worker\_init\_data()} when the worker joins the pool. (See
2404: the discussion of {\tt pack\_worker\_init\_data} in the MWDriver class.)
2405:
2406: \item {\tt execute\_task}: Executes a single task.
2407: \end{itemize}
2408:
2409: After initializing itself, using the information passed to it by the
2410: master, the worker process sits in a loop, waiting for tasks to be
2411: sent to it. When it detects a new task, it calls {\tt execute\_task}
to compute the results. It passes the results back to the master by
2413: using the appropriate function from the MWTask class, and then returns
2414: to its wait loop. The wait loop terminates when the master sends a
2415: termination message. In our applications, the {\tt execute\_task()}
2416: function formulates the second-stage linear programs in its clusters
2417: by using the information in the task definition and the data passed to
2418: the worker on initialization. It then calls the linear programming
2419: solvers SOPLEX or CPLEX
2420: to solve these linear programs, and
2421: uses the dual solutions to calculate the subgradient for each cluster.
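
To make the last step concrete, consider the following sketch, written
in generic notation that is assumed here for illustration and may
differ from the notation used elsewhere in the paper. Suppose that the
second-stage linear program for scenario $i$ is feasible and bounded
and has the form
\[
Q_i(x) = \min_{y} \; \{ q_i^T y \; : \; W y = h_i - B_i x, \; y \ge 0 \},
\]
where $B_i$ denotes the matrix that couples the first-stage variables
to the second stage. If $\pi_i$ is an optimal dual solution
corresponding to the equality constraints, then by linear programming
duality $Q_i(x) = \pi_i^T (h_i - B_i x)$, and
\[
-B_i^T \pi_i \in \partial Q_i(x).
\]
If, in addition, the partial sum for a cluster is taken to be the
probability-weighted sum of $Q_i$ over a (hypothetical) index set
${\cal N}_j$ of scenarios, then the quantities returned for that
cluster are
\[
\sum_{i \in {\cal N}_j} p_i Q_i(x)
\quad \mbox{and} \quad
- \sum_{i \in {\cal N}_j} p_i B_i^T \pi_i.
\]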
2422:
2423:
2424: \section{Computational Results} \labtag{sec:results}
2425:
2446:
We now report on computational experiments performed with
2448: implementations of the ALS, TR, and ATR algorithms using MW on the
2449: Condor system. After describing some further details of the
2450: implementations and the experiments, we discuss our choices for the
2451: various algorithmic parameters and how these were varied between runs.
2452: We then tabulate and discuss the results.
2453:
2454: \subsection{Implementations and Experiments}
2455: \label{sec:results:details}
2456:
2457: As noted earlier, we used the Condor-PVM implementation of MW for most
of the runs reported here.
2459: %
2460: %
2461: %
2462: %
2463: %
2464: %
2465: %
2466: Most of the computational time is taken up with solving linear
2467: programming problems, both by the master process (in the {\tt
2468: act\_on\_completed\_task} function) and in the tasks, which solve
2469: clusters of second-stage linear programs. We used the CPLEX simplex
2470: solver on the master processor and the SOPLEX public-domain simplex
2471: code (see Wunderling~\cite{soplex}) on the workers. SOPLEX is somewhat
2472: slower in general, but since most of the machines in the Condor pool
2473: do not have CPLEX licenses, there was little alternative but
2474: to use a public-domain code.
2475:
2476: We ran most of our experiments on the Condor pool at the University of
2477: Wisconsin, sometimes using Condor's flocking mechanism to augment this
2478: pool with processors from other sites. The other sites included the
2479: University of New Mexico, Columbia University, and the Linux cluster
2480: Chiba City at Argonne National Laboratory. The architectures included
2481: PCs running Linux, and PCs and Sun workstations running different
2482: versions of Solaris. The number of workers available for our use
2483: varied dramatically between and during each set of trials, because of
2484: the differing priorities of the two accounts we used, the variation of
2485: our priority during each run, the number and priorities of other users
2486: of the Condor pool at the time, and the varying number of machines
2487: available to the pool. The latter number tends to be larger during
2488: the night, when owners of the individual workstations are less likely
2489: to be using them. The master process was run on a Linux machine in
2490: some experiments and an Intel Solaris machine in other cases.
2491:
2492:
2493: The input files for the problems reported here were in SMPS format
2494: (see Birge et al.~\cite{BirDGGKW87} and Gassmann and
2495: Schweitzer~\cite{GasS97}). We considered two-stage stochastic linear
2496: programs in which the number of scenarios is finite but extremely
2497: large. We used Monte Carlo sampling to obtain approximate problems
2498: with a specified number $N$ of second-stage scenarios. Brief
2499: descriptions of the test problems can be found at \cite{Hol97}.
2500: %
2501: %
2502: %
2503: %
2504: %
2505: %
2506: %
2507: %
2508: %
2509: In each experiment, we supplied a starting point to the code, obtained
2510: from the solution of a different sampled instance of the same problem.
2511: The function value of the starting point was therefore quite close to
2512: the optimal objective value.
2513:
2514:
2515: \subsection{Critical Parameters}
2516: \label{sec:results:parameters}
2517:
2518: As part of the initialization procedure (implemented by the {\tt
2519: get\_userinfo} function in the MWDriver class), the code reads an
2520: input file in which various parameters are specified. Several
2521: parameters, such as those associated with modifying the size of the
2522: trust region, have fixed values that we have discussed already in the
2523: text. Others are assigned the same values for all algorithms and all
2524: experiments, namely,
2525: \[
2526: \epsilon_{\rm tol} = 10^{-5}, \sgap
2527: \Delta_{\rm hi} = 10^3, \sgap
2528: \Delta_{0,0} = \Delta_0 = 1, \sgap
2529: \xi = 10^{-4}.
2530: \]
2531: We also set $\eta= 0$ in the Model-Update functions in both TR and
2532: ATR. In TR, this choice has the effect of not allowing deletion of
2533: cuts generated during any major iterations, until a new major iterate
2534: is accepted. In ATR, the effect is to not allow deletion of cuts that
2535: are generated at points whose parent incumbent is still the incumbent.
2536: Even among cuts for which {\tt possible\_delete} is still true at the
2537: final conditional statement of the Model-Update procedures, we do not
2538: actually delete the cuts until they have been inactive at the solution
2539: of the trust-region subproblem for a specified number of consecutive
2540: iterations. For TR, we delete the cut if it has been inactive for more
2541: than 100 consecutive minor iterations, while in ATR we delete the cut
2542: if it was last active at subproblem $\ell$, where $\ell < k-100$ and
2543: $k$ is the current iteration index. Our cut deletion strategy is
2544: therefore not at all parsimonious; it tends to lead to subproblems
2545: \eqnok{trsub.kl} and \eqnok{trsub.atr1} with fairly large numbers of
2546: cuts. In most cases, however, the storage required for these cuts and
2547: the time required to solve the subproblems remain reasonable. We
2548: discuss the exceptions below.
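
As a worked instance of the ATR deletion rule just described, consider
a cut for which {\tt possible\_delete} remains true and which was last
active at subproblem $\ell = 150$ (the index $150$ is chosen purely
for illustration). The test $\ell < k - 100$ fails at $k = 250$, since
$150 \not< 150$, and first holds at
\[
k = \ell + 101 = 251.
\]
The cut is therefore retained through iteration $250$ and becomes
eligible for deletion at iteration $251$, after it has been inactive
in $100$ consecutive subproblems.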
2549:
2550: The synchronicity parameter $\sigma$, which arises in Algorithms ALS
2551: and ATR and which specifies the proportion of clusters from a
2552: particular point that must be evaluated in order to trigger evaluation
2553: of a new candidate solution, is varied between $.5$ and $1.0$ in our
2554: experiments. The size $K$ of the basket $\cB$ is varied between $1$
2555: and $14$. For each problem, the number $T$ of clusters is also varied
2556: in a manner described in the tables, as is the number of tasks into
2557: which the second-stage calculations are divided, which we denote by
2558: $C$. Note that the number of second-stage LPs per chunk is therefore
2559: $N/C$ while the number per cluster is $N/T$.
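
For illustration, take $N = 10{,}000$ sampled scenarios, the value
used in the experiments of Section~\ref{sec:results:numbers}, together
with $C = 25$ and $T = 100$. Then
\[
\frac{N}{C} = \frac{10{,}000}{25} = 400, \qquad
\frac{N}{T} = \frac{10{,}000}{100} = 100, \qquad
\frac{T}{C} = \frac{100}{25} = 4,
\]
so that each task solves $400$ second-stage LPs, each cluster
aggregates $100$ of them, and (assuming that clusters are divided
evenly among tasks) each task returns function values and
subgradients for $4$ clusters. With $C = T = 50$, each task instead
consists of a single cluster of $200$ LPs.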
2560:
2561: The MW library allows us to specify an upper bound on the number of
2562: workers we request from the Condor pool, so that we can avoid claiming
2563: more workers than we can utilize effectively. We calculate a rough
2564: estimate of this number based on the number of tasks $C$ per
2565: evaluation of $\cQ(x)$ and the basket size $K$. For instance, the
2566: synchronous TR and LS algorithms can never use more than $C$ worker
2567: processors, since they evaluate $\cQ$ at just one $x$ at a time. In
2568: the case of TR and ATR, we request $\mbox{mid} (25, 200, \lfloor
2569: (K+1)C/2 \rfloor)$
2570: workers. For ALS, we request $\mbox{mid}(25,200,2C)$ workers.
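
To illustrate these formulas, we read $\mbox{mid}(a,b,c)$ as the
median of its three arguments (an interpretation assumed here), so
that the third argument is clamped to the interval $[25,200]$. For ATR
with $K=6$ and $C=25$, we have
\[
\left\lfloor \frac{(K+1)C}{2} \right\rfloor =
\left\lfloor \frac{7 \cdot 25}{2} \right\rfloor =
\lfloor 87.5 \rfloor = 87, \qquad
\mbox{mid}(25,200,87) = 87,
\]
while for $K=9$ and $C=50$ the same expression gives $250$, which is
cut back to $\mbox{mid}(25,200,250) = 200$. For ALS with $C=50$, the
request is $\mbox{mid}(25,200,100) = 100$ workers.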
2571:
2572: We have a single code that implements all four algorithms LS, ALS, TR,
2573: and ATR, using logical branches within the code to distinguish between
2574: the L-shaped and trust-region variants. There is no distinction in
2575: the code between the two synchronous variants and their asynchronous
2576: counterparts. Instead, by setting $\sigma=1.0$, we force synchronicity
2577: by ensuring that the algorithm considers only one value of $x$ at a
2578: time.
2579:
2580: Whenever a worker processor joins the computation, MW sends it a
2581: benchmark task that typifies the type of task it will receive during
2582: the run. In our case, we define the benchmark task to be the solution
2583: of $N/C$ second-stage LPs. The time required for the processor to
2584: solve this task is logged, and we set the ordering policy so as to
2585: ensure that when more than one worker is available to process a
2586: particular task, the task is sent to the worker that logged the
2587: fastest time on the benchmark task.
2588:
2589: \subsection{Results: Varying Parameter Choices} \label{sec:results:numbers}
2590:
2591: In this section we describe a series of experiments on the same
2592: problem, using different parameter settings, and run under different
2593: conditions on the Condor pool. For these trials, we use the problem
2594: SSN, which arises from a network design application described by Sen,
2595: Doverspike, and Cosares~\cite{SenDC94}. This problem is based on a
2596: graph with 89 arcs, each representing a telecommunications link
2597: between two cities. The first-stage variables represent the
2598: (nonnegative) extra capacity to be added to each of these 89 arcs to
2599: meet an uncertain demand pattern. There is a constraint on the total
2600: added capacity. The demands consist of requests for service between
2601: pairs of nodes in the graph. For each set of requests, a route through
the network of sufficient capacity to meet the requests must be found;
otherwise, a penalty term for each request that cannot be satisfied is
2604: added to the objective. The second-stage problems are network flow
2605: problems for calculating the routing for a given set of demand flows.
2606: Each such problem is nontrivial: 706 variables, 175 constraints, and
2607: 2284 nonzeros in the constraint matrix. The uncertainty lies in the
2608: fact that the demand for service on each of the 86 pairs is not known
2609: exactly. Rather, there are three to seven possible scenarios for
2610: these demands, all independent of each other, giving a total of about
2611: $10^{70}$ possible scenarios. We use Monte Carlo sampling to obtain a
sampled approximation with $N=10{,}000$ scenarios. The deterministic
2613: equivalent for this sampled approximation has approximately $1.75
2614: \times 10^6$ constraints and $7.06 \times 10^6$ variables. In all the
2615: runs, we used as starting point the computed solution for a different
sampled approximation---one with $20{,}000$ scenarios and a different
2617: random seed. The starting point had a function value of approximately
2618: $9.868860$, whereas the optimal objective was approximately
2619: $9.832544$.
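
These sizes follow directly from the dimensions of a single
second-stage problem, since the deterministic equivalent contains one
copy of the second-stage constraints and variables for each sampled
scenario:
\[
10^4 \times 175 = 1.75 \times 10^6 \;\; \mbox{constraints}, \qquad
10^4 \times 706 = 7.06 \times 10^6 \;\; \mbox{variables}.
\]
The $89$ first-stage variables and the constraint on total added
capacity do not affect these figures at the stated precision.
Similarly, the starting point exceeds the optimal objective by
\[
9.868860 - 9.832544 = 0.036316,
\]
which is about $0.37\%$ of the optimal value.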
2620:
2621: In the tables below we list the following information.
2622: %
2623: \begin{itemize}
2624: \item {\bf points evaluated}. The number of distinct values of the
2625: first-stage variables $x$ generated by solving the master
2626: subproblem---the problem \eqnok{als.subprob} for Algorithm ALS,
2627: \eqnok{trsub.kl} for Algorithm TR, and \eqnok{trsub.atr1} for
2628: Algorithm ATR.
2629: %
2630: %
2631: %
2632: %
2633: %
2634:
2635: \item {\bf $| \cB |$}. Maximum size of the basket, also denoted above by $K$.
2636:
2637: \item {\bf number of tasks (chunks)}. Denoted above by $C$.
2638:
2639: \item {\bf number of clusters}. Denoted above by $T$, the number of
2640: partial sums \eqnok{thetaj} into which the second-stage problems are
2641: divided.
2642:
2643: \item {\bf max processors}. The number of workers requested.
2644:
2645: \item {\bf average processors}. The average of the number of active
2646: (nonsuspended) worker processors available for use by our problem
2647: during the run. Because of the dynamic nature of the Condor system,
2648: the actual number of available processors fluctuates continually
2649: during the run.
2650:
2651: \item {\bf parallel efficiency}. The proportion of time for which
2652: worker processors were kept busy solving second-stage problems
2653: while they were owned by this run.
2654:
2655: \item {\bf maximum number of cuts in the model}. The maximum number of
2656: (partial) subgradients that are used to define the model function
2657: during the course of the algorithm.
2658:
2659: \item {\bf masterproblem solve time}. The total time spent solving the
2660: master subproblem to generate new candidate iterates during the course of the
2661: algorithm.
2662:
2663: \item {\bf wall clock}. The total time (in minutes) between submission
2664: of the job and termination.
2665:
2666: \end{itemize}
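
The parallel efficiency entry can be written compactly as follows. Let
$u_w$ denote the total time that worker $w$ spends solving
second-stage problems and $o_w$ the total time for which it is owned
by the run (both symbols are introduced here only for illustration).
Then
\[
\mbox{parallel efficiency} = \frac{\sum_w u_w}{\sum_w o_w},
\]
where the sums range over all workers that participate in the run. A
value of $.50$, for example, indicates that the workers were idle, or
occupied with communication and other overhead, for half of the time
during which they were allocated to the run.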
2667:
2668: %
2669:
2670: \begin{table}
2671: \vspace*{1.0in}
2672: \centering
2673: \begin{tabular}{|c|r|rrr|rrr|rr|r|}
2674: \begin{rotate}{-45} run \end{rotate} &
2675: \begin{rotate}{-45} points evaluated \end{rotate} &
2676: \begin{rotate}{-45} $\sigma$ \end{rotate} &
2677: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &
2678: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &
2679: \begin{rotate}{-45} max. processors allowed \end{rotate} &
2680: \begin{rotate}{-45} av. processors \end{rotate} &
2681: %
2682: \begin{rotate}{-45} parallel efficiency \end{rotate} &
2683: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &
2684: \begin{rotate}{-45} masterproblem solve time (min) \end{rotate} &
2685: \begin{rotate}{-45} wall clock time (min) \end{rotate} \\ \hline
2686:
2687: ALS & 269 & $.5$ & 10 & 50 & 20 & 15 & %
2688: .74 & 5491 & 26 & 368 \\
2689: ALS & 275 & $.5$ & 25 & 50 & 50 & 21 & %
2690: .90 & 5536 & 25 & 270 \\
2691: ALS & 293 & $.5$ & 50 & 50 & 100 & 20 & %
2692: .83 & 5639 & 27 & 329 \\
2693: ALS & 270 & $.7$ & 10 & 50 & 20 & 12 & %
2694: .79 & 5522 & 27 & 509 \\
2695: ALS & 274 & $.7$ & 25 & 50 & 50 & 25 & %
2696: .73 & 5550 & 25 & 281 \\
2697: ALS & 282 & $.7$ & 50 & 50 & 100 & 26 & %
2698: .81 & 5562 & 24 & 254 \\
2699: ALS & 254 & $.85$ & 10 & 50 & 20 & 12 & %
2700: .58 & 5496 & 22 & 575 \\
2701: ALS & 276 & $.85$ & 25 & 50 & 50 & 19 & %
2702: .57 & 5575 & 23 & 516 \\
2703: ALS & 278 & $.85$ & 50 & 50 & 100 & 35 & %
2704: .49 & 5498 & 25 & 260 \\
2705: \hline
2706:
2707:
2708: \end{tabular}
\caption{SSN, with $N=10{,}000$ scenarios, Algorithm ALS.\label{tab.ssn.10k.exp2}}
2710: \end{table}
2711:
2712: Table~\ref{tab.ssn.10k.exp2} shows the results of a series of trials
2713: of Algorithm ALS with three different values of $\sigma$ ($.5$, $.7$,
2714: and $.85$) and three different choices for the number of chunks $C$
2715: into which the second-stage solutions were divided (10, 25, and 50).
2716: The number of clusters $T$ was fixed at 50, so that up to 50
2717: cuts were generated at each iteration. For $\sigma=.5$, the number of
values of $x$ for which second-stage evaluations were occurring at any
2719: point in time ranged from 2 to 4 during the runs, while for
2720: $\sigma=.85$, there were never more than 2 points being evaluated
2721: simultaneously.
2722:
2723: When these runs were performed, we were not able to obtain anything
2724: approaching the requested number $2C$ of workers from the Condor pool.
2725: As general trends, we see that the less synchronous variants (with
2726: $\sigma = .5$ and $\sigma=.7$) tend to be faster than the more
2727: synchronous variant (with $\sigma=.85$), except for the final run,
2728: during which more processors were available. Moreover, larger values
2729: of $C$ also tend to produce faster runs. We also note that the number
2730: of iterations does not depend strongly on $\sigma$. We would not, of
2731: course, expect $C$ to affect strongly the number of iterations, but
2732: since it affects the manner in which the second-stage evaluation work
2733: is distributed, we {\em would} expect it to affect the run time. Since
the number of workers available to us during these runs was limited,
2735: however, we did not see the full benefit of a finer-grained work
2736: distribution ($C=50$), though the relatively low parallel efficiency
2737: of the final run ($\sigma=.85$, $C=50$) indicates that the benefits of
2738: more processors may not have been great in any case.
2739:
2740: A note on typical task sizes: For $C=10$, a typical task required
about $50$--$280$ seconds on a typical worker machine available to us,
while for $C=50$, about $9$--$60$ seconds were required. The large
2743: variation reflects the wide range in processing ability of the
2744: machines available in a pool during a typical run. These numbers also
2745: generally hold for the results in Tables~\ref{tab.ssn.10k.exp4.2} and
2746: \ref{tab.ssn.10k.exp4.1}.
2747:
2748: By comparing the results from Table~\ref{tab.ssn.10k.exp2} with those
2749: reported in Tables~\ref{tab.ssn.10k.exp4.2} and
2750: \ref{tab.ssn.10k.exp4.1}, we verified that Algorithm
2751: ALS was not as efficient on this problem as Algorithm TR and certain
2752: variants of Algorithm ATR. One advantage, however, was that the
asymptotic convergence of ALS was quite fast. After the algorithm has
taken many iterations to build up a model and to return to a
neighborhood of the solution, having strayed far from it in early
iterations, the last three to four iterations home in rapidly from a
relatively crude
2757: approximate solution (a relative accuracy $(\cQ_{\rm min} -
2758: m(x^{k+1})) / (1 + | \cQ_{\rm min}|)$ of between $.0006$ and $.0026$)
2759: to a solution of high accuracy.
2760: %
2761: %
2762: %
2763: %
2764:
2765: %
2766: %
2767: \begin{table}
2768: \vspace*{1.0in}
2769: \centering
2770: \begin{tabular}{|c|r|rrr|rrr|rr|r|}
2771: \begin{rotate}{-45} run \end{rotate} &
2772: \begin{rotate}{-45} points evaluated \end{rotate} &
2773: \begin{rotate}{-45} $|\cB|$ ($K$) \end{rotate} &
2774: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &
2775: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &
2776: \begin{rotate}{-45} max. processors allowed \end{rotate} &
2777: \begin{rotate}{-45} av. processors \end{rotate} &
2778: \begin{rotate}{-45} parallel efficiency \end{rotate} &
2779: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &
2780: \begin{rotate}{-45} masterproblem solve time (min) \end{rotate} &
2781: \begin{rotate}{-45} wall clock time (min) \end{rotate} \\ \hline
2782:
2783: TR & 48 & - & 10 & 100 & 20 & 19 & .21 & 4284 & 3 & 131 \\
2784: TR & 72 & - & 10 & 50 & 20 & 19 & .26 & 3520 & 3 & 150 \\
2785: %
2786: TR & 39 & - & 25 & 100 & 25 & 22 & .49 & 3126 & 2 & 59 \\
2787: %
2788: TR & 75 & - & 25 & 50 & 25 & 23 & .48 & 3519 & 3 & 114 \\
2789: TR & 43 & - & 50 & 100 & 50 & 42 & .52 & 3860 & 3 & 35 \\
2790: TR & 61 & - & 50 & 50 & 50 & 44 & .53 & 3011 & 3 & 40 \\
2791: \hline
2792:
2793: ATR & 109 & 3 & 10 & 100 & 20 & 18 & .74 & 7680 & 9 & 107 \\
2794: ATR & 121 & 3 & 10 & 50 & 20 & 19 & .66 & 4825 & 6 & 111 \\
2795: ATR & 105 & 3 & 25 & 100 & 50 & 37 & .73 & 7367 & 8 & 49 \\
2796: ATR & 113 & 3 & 25 & 50 & 50 & 41 & .60 & 4997 & 6 & 48 \\
2797: ATR & 103 & 3 & 50 & 100 & 100 & 66 & .55 & 7032 & 9 & 29 \\
2798: ATR & 129 & 3 & 50 & 50 & 100 & 66 & .59 & 5183 & 7 & 32 \\
2799: \hline
2800:
2801: ATR & 167 & 6 & 10 & 100 & 35 & 24 & .93 & 7848 & 13 & 99 \\
2802: ATR & 209 & 6 & 10 & 50 & 35 & 22 & .89 & 5730 & 15 & 92 \\
2803: ATR & 186 & 6 & 25 & 100 & 87 & 49 & .77 & 8220 & 14 & 53 \\
2804: %
2805: %
2806: ATR & 172 & 6 & 25 & 50 & 87 & 49 & .80 & 5945 & 7 & 49 \\
2807: %
2808: ATR & 159 & 6 & 50 & 100 & 175 & 31 & .89 & 7092 & 11 & 65 \\
2809: ATR & 213 & 6 & 50 & 50 & 175 & 40 & .88 & 6299 & 12 & 70 \\
2810: \hline
2811:
2812: ATR & 260 & 9 & 10 & 100 & 50 & 12 & .95 & 14431 & 35 & 267 \\
2813: ATR & 286 & 9 & 10 & 50 & 50 & 23 & .90 & 6528 & 19 & 160 \\
2814: ATR & 293 & 9 & 25 & 100 & 125 & 17 & .93 & 9911 & 30 & 232 \\
2815: ATR & 377 & 9 & 25 & 50 & 125 & 15 & .96 & 7080 & 24 & 321 \\
2816: ATR & 218 & 9 & 50 & 100 & 200 & 28 & .82 & 10075 & 25 & 101 \\
2817: ATR & 356 & 9 & 50 & 50 & 200 & 23 & .93 & 6132 & 23 & 194 \\
2818: \hline
2819:
2820: ATR & 378 & 14 & 10 & 100 & 75 & 18 & .88 & 15213 & 77 & 302 \\
2821: ATR & 683 & 14 & 10 & 50 & 75 & 14 & .98 & 8850 & 48 & 648 \\
2822: ATR & 441 & 14 & 25 & 100 & 187 & 22 & .89 & 14597 & 61 & 312 \\
2823: ATR & 480 & 14 & 25 & 50 & 187 & 20 & .94 & 8379 & 36 & 347 \\
2824: ATR & 446 & 14 & 50 & 100 & 200 & 20 & .83 & 13956 & 64 & 331 \\
2825: ATR & 498 & 14 & 50 & 50 & 200 & 22 & .94 & 7892 & 35 & 329 \\
2826: \hline
2827:
2828: \end{tabular}
\caption{SSN, with $N=10{,}000$ scenarios, first trial, Algorithms TR and ATR.\label{tab.ssn.10k.exp4.2}}
2830: \end{table}
2831:
2832: %
2833:
2834: \begin{table}
2835: \vspace*{1.0in}
2836: \centering
2837: \begin{tabular}{|c|r|rrr|rrr|rr|r|}
2838: \begin{rotate}{-45} run \end{rotate} &
2839: \begin{rotate}{-45} points evaluated \end{rotate} &
2840: \begin{rotate}{-45} $|\cB|$ ($K$) \end{rotate} &
2841: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &
2842: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &
2843: \begin{rotate}{-45} max. processors allowed \end{rotate} &
2844: \begin{rotate}{-45} av. processors \end{rotate} &
2845: \begin{rotate}{-45} parallel efficiency \end{rotate} &
2846: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &
2847: \begin{rotate}{-45} masterproblem solve time (min) \end{rotate} &
2848: \begin{rotate}{-45} wall clock time (min) \end{rotate} \\ \hline
2849:
2850: TR & 47 & - & 10 & 100 & 20 & 17 & .24 & 3849 & 4 & 192 \\
2851: TR & 67 & - & 10 & 50 & 20 & 13 & .34 & 3355 & 3 & 256 \\
2852: TR & 47 & - & 25 & 100 & 25 & 18 & .49 & 3876 & 4 & 97 \\
2853: TR & 57 & - & 25 & 50 & 25 & 18 & .40 & 2835 & 3 & 119 \\
2854: TR & 42 & - & 50 & 100 & 50 & 30 & .22 & 3732 & 3 & 122 \\
2855: TR & 65 & - & 50 & 50 & 50 & 31 & .25 & 3128 & 4 & 151 \\
2856: \hline
2857:
2858: ATR & 92 & 3 & 10 & 100 & 20 & 11 & .89 & 7828 & 9 & 125 \\
2859: ATR & 98 & 3 & 10 & 50 & 20 & 11 & .84 & 4893 & 5 & 173 \\
2860: ATR & 86 & 3 & 25 & 100 & 50 & 34 & .38 & 6145 & 5 & 70 \\
2861: ATR & 95 & 3 & 25 & 50 & 50 & 32 & .41 & 4469 & 4 & 77 \\
2862: ATR & 80 & 3 & 50 & 100 & 100 & 52 & .23 & 5411 & 5 & 80 \\
2863: ATR & 131 & 3 & 50 & 50 & 100 & 59 & .47 & 4717 & 6 & 55 \\
2864: \hline
2865:
2866: ATR & 137 & 6 & 10 & 100 & 35 & 30 & .57 & 8338 & 12 & 84 \\
2867: ATR & 200 & 6 & 10 & 50 & 35 & 26 & .60 & 5211 & 9 & 130 \\
2868: ATR & 119 & 6 & 25 & 100 & 87 & 52 & .55 & 7181 & 7 & 44 \\
2869: ATR & 199 & 6 & 25 & 50 & 87 & 58 & .48 & 5298 & 9 & 81 \\
2870: ATR & 178 & 6 & 50 & 100 & 175 & 50 & .47 & 9776 & 15 & 77 \\
2871: ATR & 240 & 6 & 50 & 50 & 175 & 61 & .64 & 5910 & 11 & 74 \\
2872: \hline
2873:
2874: ATR & 181 & 9 & 10 & 100 & 50 & 37 & .56 & 8737 & 15 & 96 \\
2875: ATR & 289 & 9 & 10 & 50 & 50 & 19 & .93 & 7491 & 25 & 238 \\
2876: ATR & 212 & 9 & 25 & 100 & 125 & 90 & .66 & 11017 & 21 & 45 \\
2877: ATR & 272 & 9 & 25 & 50 & 125 & 65 & .45 & 6365 & 15 & 105 \\
2878: ATR & 281 & 9 & 50 & 100 & 200 & 51 & .72 & 11216 & 34 & 88 \\
2879: ATR & 299 & 9 & 50 & 50 & 200 & 26 & .83 & 7438 & 27 & 225 \\
2880: \hline
2881:
2882: ATR & 304 & 14 & 10 & 100 & 75 & 38 & .89 & 13608 & 43 & 129 \\
2883: ATR & 432 & 14 & 10 & 50 & 75 & 42 & .95 & 7844 & 28 & 132 \\
2884: ATR & 356 & 14 & 25 & 100 & 187 & 71 & .78 & 13332 & 48 & 111 \\
2885: ATR & 444 & 14 & 25 & 50 & 187 & 45 & .89 & 7435 & 36 & 163 \\
2886: ATR & 388 & 14 & 50 & 100 & 200 & 42 & .79 & 12302 & 52 & 192 \\
2887: ATR & 626 & 14 & 50 & 50 & 200 & 48 & .81 & 7273 & 46 & 254 \\
2888: \hline
2889: \end{tabular}
\caption{SSN, with $N=10{,}000$ scenarios, second trial, Algorithms TR and ATR.\label{tab.ssn.10k.exp4.1}}
2891: \end{table}
2892:
2893: We now turn to Tables~\ref{tab.ssn.10k.exp4.2} and
2894: \ref{tab.ssn.10k.exp4.1}, which report on two sets of trials on the
2895: same problem as in Table~\ref{tab.ssn.10k.exp2}. In these trials we
2896: varied the following parameters:
2897: \bi
2898: \item {\bf basket size:}
2899: $K=1$ (synchronous TR) as well as $K=3,6,9,14$;
2900:
2901: \item {\bf number of tasks:}
2902: $C=10,25,50$, as in Table~\ref{tab.ssn.10k.exp2};
2903:
2904: \item {\bf number of clusters:} $T=50,100$.
2905: \ei
2906: %
2907: The parameter $\sigma$ was fixed at $.7$ in all these runs.
2908:
2909: The results in Table~\ref{tab.ssn.10k.exp4.2} were obtained with the
2910: master processor running on an Intel Solaris machine, while
2911: Table~\ref{tab.ssn.10k.exp4.1} was obtained with a Linux master. In
2912: both cases, the Condor pool that we tapped for worker processors was
2913: identical. Therefore, it is possible to do a meaningful comparison
2914: between each line of Table~\ref{tab.ssn.10k.exp4.1} and its
2915: counterpart in Table~\ref{tab.ssn.10k.exp4.2}. Conditions on the
2916: Condor pool varied between and during each trial. This fact, combined
2917: with the properties of the algorithm, resulted in large variability of
2918: runtime from one trial to the next, as we discuss below.
2919:
2920: The nondeterministic nature of the algorithms is evident in doing a
2921: side-by-side comparison of the two tables. Even for synchronous TR,
the slightly different numerical values of the function and subgradient
returned by different workers in different runs result in
2924: slight variations in the iteration sequence and therefore slight
2925: differences in the number of iterations. For the asynchronous
2926: Algorithm ATR, the nondeterminism is even more marked. During the
2927: basket-filling phase of the algorithm, computation of a new $x$ is
2928: triggered when a certain proportion of tasks from a current value of
2929: $x$ has been returned. On different runs, the tasks will be returned
2930: in different orders, so the information used by the trust-region
2931: subproblem \eqnok{trsub.atr1} in generating the new point will vary
2932: from run to run, and the resulting iteration sequences will generally
2933: show substantial differences.
2934:
2935: The synchronous TR algorithm is clearly better than the ATR variants
2936: with $K>1$ in terms of total computation, which is roughly
2937: proportional to the number of iterations. In fact, the total amount of
2938: work increases steadily with basket size. Because of the decreased
2939: synchronicity requirements and the greater parallelism obtained for
2940: $K>1$, the wall clock times (last columns) do not follow quite the
2941: same trend. The wall clock times for basket sizes $K=3$ and $K=6$ are
2942: at least competitive with the results obtained for the synchronous TR
2943: algorithm. The choice $K=6$ gave few of the fastest runs but did yield
2944: consistent performance over all the different choices for the other
2945: parameters, and under different Condor pool conditions.
2946:
2947: %
2948: %
2949: %
2950: %
2951: %
2952: %
2953:
2954: The deleterious effects of synchronicity in Algorithm TR can be seen in
2955: its poor performance on several instances, particularly during the
2956: second trial. Let us compare, for instance, the entries in the two
2957: tables for the variant of TR with $C=50$ and $T=100$. In the first
2958: trial, this run used 42 worker processors on average and took 35
2959: minutes, while in the second trial it used 30 workers on average and
2960: required 122 minutes. The difference in runtime is too large to be
2961: accounted for by the number of workers. Because this is a synchronous
2962: algorithm, the time required for each iteration is determined by the
2963: time required for the slowest worker to return the results of its
2964: task. In the first trial, almost all tasks required between 6 and 35
2965: seconds, except for a few iterations that contained tasks that took up
2966: to 62 seconds. In the second trial, the slowest worker at each
2967: iteration almost always required more than 60 seconds to complete its
2968: task. We return to this point in discussing
2969: Table~\ref{tab.ssn.10k.exp5} below.
2970:
2971: Other general observations we can make are that 100 clusters give
2972: almost uniformly better results in terms of wall clock time than 50
2973: clusters, although the higher number results in a larger number of
2974: cuts in the trust-region subproblems and an increased amount of time
2975: on the master processor in solving these problems. The latter factor
2976: is critical for $K=9$ and $K=14$, which do not compare
2977: favorably with the smaller values of $K$ on this problem, even if many
more worker processors are available. For the large basket sizes, the
loss of control induced by the increase in asynchronicity leads to a
significantly larger number of points that are evaluated.
2981:
2982: %
2983: %
2984: %
2985: %
2986: %
2987:
2988: In all cases, it takes some time for the model $m$ to become a good
2989: enough approximation to $\cQ$ that it generates a step that meets the
2990: trust-region acceptance criteria. The six TR runs in
2991: Table~\ref{tab.ssn.10k.exp4.1}, for instance, required 18, 27, 16, 22,
2992: 16, and 26 trust-region subproblems to be solved, respectively, before
2993: they stepped away from the initial point. (Note that, as expected, the
2994: runs with $T=100$ required fewer such iterations than those with
2995: $T=50$.) After the first step is taken, most steps are successful;
2996: that is, the first minor iterate usually is accepted as the next major
2997: iterate. Occasionally, two to four minor iterations are required
2998: before the next major iteration is identified. Similar behavior is
2999: observed for the runs of ATR, except that successful iterations are
3000: more widely spaced. For the first run with $K=6$ in
3001: Table~\ref{tab.ssn.10k.exp4.1}, for instance, the $37$th solution of
3002: \eqnok{trsub.atr1} yields the first successful step; then 36 of the
3003: following 99 solutions of the subproblem yield successful steps.
3004:
3005:
3006: %
3007: %
3008: \begin{table}
3009: \vspace*{1.0in}
3010: \centering
3011: \begin{tabular}{|c|r|rrr|rrr|rr|r|}
3012: \begin{rotate}{-45} run \end{rotate} &
3013: \begin{rotate}{-45} points evaluated \end{rotate} &
3014: \begin{rotate}{-45} $|\cB|$ ($K$) \end{rotate} &
3015: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &
3016: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &
3017: \begin{rotate}{-45} max. processors allowed \end{rotate} &
3018: \begin{rotate}{-45} av. processors \end{rotate} &
3019: \begin{rotate}{-45} parallel efficiency \end{rotate} &
3020: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &
3021: \begin{rotate}{-45} masterproblem solve time (min) \end{rotate} &
3022: \begin{rotate}{-45} wall clock time (min) \end{rotate} \\ \hline
3023:
3024: TR & 47 & - & 25 & 100 & 25 & 23 & .49 & 4040 & 3 & 58 \\
3025: TR & 44 & - & 25 & 100 & 25 & 21 & .31 & 3220 & 3 & 97 \\
3026: TR & 45 & - & 25 & 100 & 25 & 20 & .23 & 3966 & 4 & 158 \\ \hline
3027:
3028: TR & 51 & - & 50 & 100 & 50 & 37 & .33 & 4428 & 3 & 48 \\
3029: TR & 51 & - & 50 & 100 & 50 & 45 & .14 & 4806 & 3 & 135 \\
3030: TR & 46 & - & 50 & 100 & 50 & 41 & .15 & 3847 & 4 & 135 \\ \hline
3031:
3032: ATR & 81 & 3 & 25 & 100 & 50 & 43 & .38 & 7451 & 6 & 64 \\
3033: ATR & 81 & 3 & 25 & 100 & 50 & 39 & .41 & 6461 & 5 & 64 \\
3034: ATR & 87 & 3 & 25 & 100 & 50 & 36 & .44 & 6055 & 8 & 66 \\ \hline
3035:
3036: ATR & 106 & 3 & 50 & 100 & 100 & 84 & .28 & 8222 & 9 & 53 \\
3037: ATR & 95 & 3 & 50 & 100 & 100 & 65 & .26 & 6786 & 7 & 64 \\
3038: ATR & 94 & 3 & 50 & 100 & 100 & 23 & .44 & 6593 & 8 & 105 \\ \hline
3039:
3040: ATR & 171 & 6 & 25 & 100 & 87 & 70 & .45 & 9173 & 19 & 61 \\
3041: ATR & 135 & 6 & 25 & 100 & 87 & 61 & .39 & 7354 & 12 & 75 \\
3042: ATR & 145 & 6 & 25 & 100 & 87 & 38 & .35 & 8919 & 16 & 146 \\ \hline
3043:
3044: ATR & 177 & 6 & 50 & 100 & 175 & 87 & .41 & 9263 & 22 & 54 \\
3045: ATR & 162 & 6 & 50 & 100 & 175 & 93 & .34 & 7832 & 18 & 66 \\
3046: ATR & 159 & 6 & 50 & 100 & 175 & 39 & .27 & 8215 & 22 & 199 \\ \hline
3047:
3048: \end{tabular}
\caption{SSN final trial with best parameter combinations, $N=10{,}000$ scenarios, Algorithms TR and ATR.\label{tab.ssn.10k.exp5}}
3050: \end{table}
3051:
3052:
3053: In Table~\ref{tab.ssn.10k.exp5}, we took the most promising parameter
3054: combinations from Tables~\ref{tab.ssn.10k.exp4.1} and
3055: \ref{tab.ssn.10k.exp4.2} and ran three trials with each combination.
3056: The Condor pool conditions varied widely during this trial, as can be
3057: seen by the way that the average number of workers varies within each
3058: group of three runs. For the asynchronous ATR runs, the differences in
3059: wall clock times within each set of three runs usually can be
3060: explained in terms of the varying number of workers available. (A
3061: possible exception is the last line of the table, the third run of ATR
3062: with $K=6$, $C=50$ and $T=100$, which took almost four times as long
3063: as the first run while having only slightly fewer than half as many
3064: processors. While the speed of machines available was roughly similar
3065: between these runs, the third run was plagued with numerous
3066: suspensions as the workers were reclaimed by their owners. Total time
3067: that workers were suspended was over 23,000 seconds on the third run
3068: and less than 2,800 seconds during the first run.) On the other hand,
3069: the variability in wall clock time between the six runs of the
3070: synchronous TR algorithm was due not to the number of available
3071: workers but rather to the synchronicity effect described above. In the
3072: run reported in the first line of the table, for instance, the slowest
3073: worker on any iteration typically took less than 65 seconds. In the
3074: run reported on the third line, the time required by the slowest
3075: worker varied significantly but was typically much longer, 150 seconds
3076: and more.
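
As a rough consistency check on this explanation (a back-of-the-envelope
sketch, under the assumption that each point evaluated by synchronous TR
costs about as much wall clock time as its slowest task), the worker time
for a run with $n$ evaluated points and typical slowest-task time
$t_{\max}$ is about $n \, t_{\max}$. Taking $n$ from
Table~\ref{tab.ssn.10k.exp5} and the task times quoted above gives
\[
47 \times 65 \mbox{ s} \approx 51 \mbox{ min}
\quad \mbox{and} \quad
45 \times 150 \mbox{ s} = 112.5 \mbox{ min}
\]
for the runs on the first and third lines, against observed wall clock
times of 58 and 158 minutes. These estimates ignore master-problem solve
time, communication, and suspensions, and the quoted task times are only
typical values, so they indicate no more than that the slowest-worker
times are of the right magnitude to account for the difference between
the two runs.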
3077:
3120:
3121: \subsection{Larger Instances} \label{sec:results:large}
3122:
We also performed runs on several larger instances of SSN (with
$N=100{,}000$ scenarios) and on some very large instances
3126: of the stormG2 problem, a cargo flight scheduling application described
3127: by Mulvey and Ruszczy{\'n}ski~\cite{MulR95}. Our interest
3128: in this section is more in the sheer size of the problems that can be
3129: solved using the algorithms developed for the computational grid
3130: than with the relative performance of the algorithms with
3131: different parameter settings.
3132:
3189:
3190: %
3191: %
3192: \begin{table}
3193: \vspace*{1.0in}
3194: \centering
3195: \begin{tabular}{|c|r|rrr|rrr|rr|r|}
3196: \begin{rotate}{-45} run \end{rotate} &
3197: \begin{rotate}{-45} points evaluated \end{rotate} &
3198: \begin{rotate}{-45} $|\cB|$ ($K$) \end{rotate} &
3199: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &
3200: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &
3201: \begin{rotate}{-45} max. processors allowed \end{rotate} &
3202: \begin{rotate}{-45} av. processors \end{rotate} &
3203: \begin{rotate}{-45} parallel efficiency \end{rotate} &
3204: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &
3205: \begin{rotate}{-45} masterproblem solve time (min) \end{rotate} &
3206: \begin{rotate}{-45} wall clock time (min) \end{rotate} \\ \hline
3207: ATR & 177 & 3 & 100 & 100 & 200 & 38 & .52 & 10558 & 47 & 1357 \\
3208: \hline
3209: \end{tabular}
\caption{SSN, with $N=100{,}000$ scenarios.\label{tab.ssn.100k}}
3211: \end{table}
3212:
3213: Table~\ref{tab.ssn.100k} shows results for a sampled instance of SSN
with $N=100{,}000$ scenarios, which is a linear program with
3215: approximately $1.75 \times 10^7$ constraints and $7.06 \times 10^7$
3216: variables. This run was performed at a time when not many machines
3217: were available, and many suspensions occurred during the run. We chose
3218: $T=100$ chunks per evaluation and found that most tasks required
3219: between 41 and 300 seconds on the workers, with a few task times of
more than 500 seconds. (The benchmarks indicated that worker speeds
varied by a factor of 7.) A total of 77 different workers were used
3222: during the run, though the average number of nonsuspended workers
3223: available at any time was only 39. In fact, at any given point in the
3224: computation there were an average of 7 workers assigned to this task
3225: that were suspended. Still, a result was obtained in about 22 hours.
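
The problem dimensions and run time quoted above can be related to the
per-scenario problem size by simple arithmetic. Ignoring the first-stage
variables and constraints (an assumption made here for illustration; they
are negligible at this scale), dividing the totals by $N = 10^5$ gives
\[
\frac{1.75 \times 10^7}{10^5} = 175 \mbox{ constraints}, \qquad
\frac{7.06 \times 10^7}{10^5} = 706 \mbox{ variables}
\]
per second-stage problem, and the wall clock time reported in
Table~\ref{tab.ssn.100k} is $1357/60 \approx 22.6$ hours.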
3226:
3227: \begin{table}
3228: \vspace*{1.0in}
3229: \centering
3230: \begin{tabular}{|c|r|rrr|rrr|rr|r|}
3231: \begin{rotate}{-45} run \end{rotate} &
3232: \begin{rotate}{-45} points evaluated \end{rotate} &
3233: \begin{rotate}{-45} $|\cB|$ ($K$) \end{rotate} &
3234: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &
3235: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &
3236: \begin{rotate}{-45} max. processors allowed \end{rotate} &
3237: \begin{rotate}{-45} av. processors \end{rotate} &
3238: \begin{rotate}{-45} parallel efficiency \end{rotate} &
3239: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &
3240: \begin{rotate}{-45} masterproblem solve time (min) \end{rotate} &
3241: \begin{rotate}{-45} wall clock time (min) \end{rotate} \\ \hline
3242: TR & 17 & - & 125 & 125 & 250 & 106 & .55 & 2310 & 0.5 & 146 \\ %
3243: ATR & 25 & 3 & 125 & 125 & 250 & 106 & .90 & 3292 & 0.5 & 116 \\ \hline %
3244: \end{tabular}
3245: \caption{stormG2, with $N=250000$ scenarios. \label{tab.storm.250k}}
3246: \end{table}
3247:
3248: In the stormG2 problem of Mulvey and Ruszczy{\'n}ski~\cite{MulR95}, the
3249: first-stage problem contained 121 variables, while each second-stage
3250: problem contained 1259 variables. We considered first a sampled
3251: approximation of this problem with 250000 scenarios, which resulted
in a linear program with $1.32 \times 10^8$ constraints and $3.15 \times 10^8$
3253: unknowns. Results are shown in Table~\ref{tab.storm.250k}. The
3254: algorithm was started at a solution of a sampled instance with fewer
3255: scenarios and was quite close to optimal. The objective function at
3256: the initial point was approximately $15499595.1$, compared with an
3257: optimal value of $15499591.9$ achieved by Algorithm TR. In fact, the
3258: TR algorithm takes only one major iteration---it accepts the 16th
3259: minor iteration as the first major iterate $x^1$. The ATR variant does
3260: not take even one step---it terminates after determining that the
3261: initial point $x^0$ is optimal to within the given convergence
3262: tolerance. Although we requested 250 processors, an average of only
3263: 106 were available during the time that we performed these two test
3264: runs. The second run is able to utilize these to high efficiency, as
3265: the second-stage workload can be divided into a large number of chunks
3266: and very little time is spent in solving the trust-region subproblem.
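
Two quick calculations, using only the figures quoted above, illustrate
the scale of this instance and how close the starting point was to
optimal. Counting the first-stage variables once and the second-stage
variables once per scenario, the number of unknowns is
\[
121 + 250000 \times 1259 = 314750121 \approx 3.15 \times 10^8 ,
\]
and the relative difference between the initial and final objective
values is
\[
\frac{15499595.1 - 15499591.9}{15499591.9}
= \frac{3.2}{15499591.9} \approx 2.1 \times 10^{-7} .
\]
This relative difference is shown only for illustration; it is not
necessarily the quantity used in the convergence test of the algorithms.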
3267:
3268: \begin{table}
3269: \vspace*{1.0in}
3270: \centering
3271: \begin{tabular}{|c|r|rrr|rrr|rr|r|}
3272: \begin{rotate}{-45} run \end{rotate} &
3273: \begin{rotate}{-45} points evaluated \end{rotate} &
3274: \begin{rotate}{-45} $|\cB|$ ($K$) \end{rotate} &
3275: \begin{rotate}{-45} \# tasks ($C$) \end{rotate} &
3276: \begin{rotate}{-45} \# clusters ($T$) \end{rotate} &
3277: \begin{rotate}{-45} max. processors allowed \end{rotate} &
3278: \begin{rotate}{-45} av. processors \end{rotate} &
3279: \begin{rotate}{-45} parallel efficiency \end{rotate} &
3280: \begin{rotate}{-45} max. \# cuts in model \end{rotate} &
3281: \begin{rotate}{-45} masterproblem solve time (hr) \end{rotate} &
3282: \begin{rotate}{-45} wall clock time (hr) \end{rotate} \\ \hline
3283: ATR & 28 & 4 & 1024 & 1024 & 800 & 433 & .668 & 39647 & 1.9 & 31.9 \\ \hline
3284: \end{tabular}
3285: \caption{stormG2, with $N=10^7$ scenarios.\label{tab.storm.1e7}}
3286: \end{table}
3287:
3288: Finally, we report on a very large sampled instance of stormG2 with
3289: $N=10^7$ scenarios, an instance whose deterministic equivalent is a
3290: linear program with $9.85 \times 10^8$ constraints and $1.26 \times
3291: 10^{10}$ variables. Performance is profiled in
3292: Table~\ref{tab.storm.1e7}.
3293:
3294: We used the tighter convergence tolerance $\epstol = 10^{-6}$ for this
3295: run. The algorithm took successful steps at iterations 28, 34, 37, and
38, the last of these being the final iteration. The first evaluated
point had a function value of $15526740$, compared with a value of
$15498842$ at the final iteration.
3306:
For this run, we augmented the Wisconsin Computer Science Condor pool with
machines from Georgia Tech, the University of New Mexico, the Italian
National Institute for Nuclear Physics (INFN), the NCSA at the University of
Illinois, the IEOR Department at Columbia, the Albu, and the Wisconsin
Engineering Department. Table~\ref{bigstorm.tab} shows
the number and type of processors available at each of these
locations.
3317: In contrast to the other runs
3318: reported here, we used the ``MW-files'' implementation of MW, the
3319: variant that uses shared files to perform communication between master
3320: and workers rather than Condor-PVM.
3321:
3322: \begin{table}
3323: \centering
3324: \begin{tabular}{|c|c|c|} \hline
3325: Number & Type & Location \\ \hline
3326: 184 & Intel/Linux & Argonne \\ \hline
3327: 254 & Intel/Linux & New Mexico \\ \hline
3328: 36 & Intel/Linux & NCSA \\ \hline
3329: 265 & Intel/Linux & Wisconsin \\
3330: 88 & Intel/Solaris & Wisconsin \\
3331: 239 & Sun/Solaris & Wisconsin \\ \hline
3332: 124 & Intel/Linux & Georgia Tech \\
3333: 90 & Intel/Solaris & Georgia Tech \\
3334: 13 & Sun/Solaris & Georgia Tech \\ \hline
3335: 9 & Intel/Linux & Columbia U. \\
3336: 10 & Sun/Solaris & Columbia U. \\ \hline
3337: 33 & Intel/Linux & Italy (INFN) \\ \hline \hline
3338: 1345 & & \\ \hline
3339: \end{tabular}
3340: \caption{Machines available for stormG2, with $N=10^7$
3341: scenarios.\label{bigstorm.tab}}
3342: \end{table}
3343:
3344: The job ran for a total of almost 32 hours. The number of workers
3345: being used during the course of the run is shown in
3346: Figure~\ref{bigstorm-workers.fig}. The job was stopped after
3347: approximately 8 hours and was restarted manually from a checkpoint
3348: about 2 hours later. It then ran for approximately 24 hours to
completion. The number of workers dropped off significantly on two
3350: occasions. The drops were due to the master processor ``blocking'' to
3351: solve a difficult master problem and to checkpoint the state of the
3352: computation. During this time the worker processors were idle, and
3353: MW decided to release a number of the processors rather than have them
3354: sit idle.
3355:
3356: \begin{figure}
3357: \centering
3358: \epsfig{figure=storm1e7workers.ps,angle=270,width=\linewidth}
3359: \caption{Number of workers used for stormG2, with $N=10^7$ scenarios.\label{bigstorm-workers.fig}}
3360: \end{figure}
3361:
3362: As noted in Table~\ref{tab.storm.1e7}, an average of 433 workers were
present at any given point in the run. The computation used a maximum
of 556 workers, and the benchmarks indicated a factor of 12 between the
speeds of the slowest and fastest machines. A total
3366: of 40837 tasks were generated during the run, representing $3.99
3367: \times 10^8$ second-stage linear programs. (At this rate, an average
3368: of 3472 second-stage linear programs were being solved per second
3369: during the run.) The average time to solve a task was 774 seconds.
3370: The total cumulative CPU time spent by the worker pool was 9014 hours,
3371: or just over one year of computation.
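
These statistics can be cross-checked against one another. The sketch
below assumes that every evaluated point is split into 1024 tasks of
equal size, as the task count in Table~\ref{tab.storm.1e7} suggests, so
that each task contains about $10^7/1024 \approx 9766$ second-stage
linear programs; the wall clock time of 31.9 hours is also taken from
that table.
\begin{eqnarray*}
40837 \times \frac{10^7}{1024} & \approx & 3.99 \times 10^8
\mbox{ second-stage linear programs}, \\
\frac{3.99 \times 10^8}{31.9 \times 3600 \mbox{ s}} & \approx &
3.47 \times 10^3 \mbox{ linear programs per second}, \\
\frac{9014 \mbox{ hours}}{24 \mbox{ hours per day}} & \approx &
376 \mbox{ days}.
\end{eqnarray*}
The small difference between the second figure and the rate quoted above
comes from the rounded inputs used here.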
3372:
3398:
3399:
3400: \section{Conclusions}
3401:
3402: We have described L-shaped and trust-region algorithms for solving the
3403: two-stage stochastic linear programming problem with recourse, and
3404: derived asynchronous variants suitable for parallel implementation on
3405: distributed heterogeneous computational grids. We prove convergence
3406: results for the trust-region algorithms. Implementations based on the
3407: MW library and the Condor system are described, and we report on
3408: computational studies using different algorithmic parameters under
different pool conditions. Because of the dynamic nature of the
3410: computational pool, it is impossible to arrive at a ``best''
3411: configuration or set of algorithmic parameters for all instances.
3412: Instead, it may be important to adjust the algorithm parameters
3413: dynamically; we suggest this as a line of future research. Finally,
3414: we report on the solution of some large sampled instances of problems
3415: from the literature, including an instance of the stormG2 problem
3416: whose deterministic equivalent has more than $10^{10}$ unknowns.
Since the use of the computational grid has the greatest benefit on
problems that require large amounts of computation, the algorithms
developed here are best suited to larger (multistage) problems or
to incorporation into a sample average approximation approach (see Shapiro and Homem-de-Mello~\cite{ShaH01}).
3421:
3422: \section*{Acknowledgments}
3423:
3424: This research was supported by the Mathematics, Information, and
3425: Computational Sciences Division subprogram of the Office of Advanced
3426: Scientific Computing Research, U.S. Department of Energy, under
3427: Contract W-31-109-Eng-38. We also acknowledge the support of the
3428: National Science Foundation, under Grant CDA-9726385. We would also
3429: like to acknowledge the IHPCL at Georgia Tech, which is supported by a
3430: grant from Intel; the National Computational Science Alliance under
3431: grant number MCA00N015N for providing resources at the University of
3432: Wisconsin, the NCSA SGI/CRAY Origin2000, and the University of New
3433: Mexico/Albuquerque High Performance Computing Center AltaCluster; and
3434: the Italian Istituto Nazionale di Fisica Nucleare (INFN) and Columbia
3435: University for allowing us access to their Condor pools.
3436:
3437: We are grateful to Alexander Shapiro and Sven Leyffer for discussions
3438: about the algorithms presented here.
3439:
3440: \bibliographystyle{plain}
3441: \bibliography{refs}
3442:
3443: \end{document}
3444:
3445: This was an earlier proof of finite termination. It applied to a
3446: version of the termination test in which $\Delta_{k,\ell}$ was present
3447: on the right-hand side. Moreover, it was wrong in the last step, where
we incorrectly used $\Delta_{k,\ell} > \Delta_{\rm lo}$. In fact, as
3449: the new version of Lemma~\ref{lem:trbounds} shows, we have only that
3450: \[
3451: \Delta_{k,\ell} \ge \min( \Delta_{\rm lo}, \| x^k-P(x^k)\|_{\infty}/4).
3452: \]
3453: Still, elements of the proof might be useful if we ever want to devise
3454: a termination test that guarantees some sort of near-optimality.
3455:
3456: \begin{theorem} \labtag{th:fint}
3457: When $\epstol>0$, Algorithm TR terminates finitely.
3458: \end{theorem}
3459: \begin{proof}
3460: In the first part of the proof, we show that the algorithm cannot
3461: ``get stuck'' at a particular $x^k$, generating an infinite sequence
3462: of minor iterations at $x^k$ without eventually satisfying either the
3463: termination test or the acceptance test \eqnok{tr.accept}.
3464:
3465: Consider first the case of $x^k \notin \cS$. From
3466: Lemma~\ref{lem:trbounds}, we have that the right-hand side of the
3467: termination test is bounded below by a positive constant as follows:
3468: \beq \labtag{fint.0}
3469: \epstol \Delta_{k,\ell} (1+| \cQ(x^k)|) \ge \epstol \Delta_{\rm lo} >0.
3470: \eeq
3471: By using the reasoning in the proof of Theorem~\ref{th:tr:ft},
3472: together with the monotonicity property of Lemma~\ref{lem:mkl}, we see
3473: that an infinite sequence of minor iterations would have the property
3474: that
3475: \beq \labtag{fint.1}
3476: \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \downarrow 0.
3477: \eeq
3478: Therefore, the minor iteration sequence must terminate finitely,
3479: either by satisfying the termination test or the trust-region
3480: acceptance test \eqnok{tr.accept}.
3481:
3482: Now consider $x^k \in \cS$, and consider first the situation in which
3483: trust-region radii $\Delta_{k,\ell}$, $\ell=1,2,\dots$ are bounded
3484: below, that is, $\Delta_{k,\ell} \ge \bar{\Delta}$ for some
3485: $\bar{\Delta}>0$ and all $\ell=1,2,\dots$. Then the right-hand side of
3486: \eqnok{conv.test} is strictly positive, that is,
3487: \[
3488: \epstol \Delta_{k,\ell} (1+| \cQ(x^k)|) \ge \epstol \bar{\Delta} >0.
3489: \]
3490: The logic leading to \eqnok{fint.1} again holds for this case, so the
3491: minor iteration sequence must eventually satisfy the convergence test
3492: and terminate.
3493:
For the other case, we have that $x^k \in \cS$ and $\Delta_{k,\ell}
\downarrow 0$ as $\ell \to \infty$. Because of our assumption that the
test \eqnok{conv.test} is not satisfied, we have
that
3498: \beq \labtag{fint.2}
3499: \frac{\cQ(x^k) - m_{k,\ell}(x^{k,\ell})}{\Delta_{k,\ell}} >
3500: \epstol (1+| \cQ(x^k) |) \ge \epstol, \;\;
3501: \ell=1,2,\dots.
3502: \eeq
Because $\Delta_{k,\ell} \to 0$, it follows
from the Reduce-$\Delta$ routine that there are
infinitely many minor iterations $\ell_j$, $j=1,2,\dots$, such that
$\rho>1$, that is,
3507: \beq \labtag{fint.3}
3508: \Delta_{k,\ell_j} \frac{\cQ(x^{k,\ell_j}) - \cQ(x^k)}{\cQ(x^k)-m_{k,\ell_j}(x^{k,\ell_j})} >1.
3509: \eeq
3510: By combining \eqnok{fint.2} (at $\ell=\ell_j$) with \eqnok{fint.3}, we
3511: obtain
3512: \beq \labtag{fint.4}
3513: \cQ(x^{k,\ell_j}) - \cQ(x^k) > \epstol, \;\; j=1,2,\dots.
3514: \eeq
3515: Using \eqnok{subd.5}, together with $\| g_j \|_1 \le \beta$ for all $g_j \in
3516: \partial \cQ(x^{k,\ell_j})$, we have
3517: \beq \labtag{fint.5}
3518: \cQ(x^{k,\ell_j}) - \cQ(x^k) \le \beta \| x^k - x^{k,\ell_j} \|_{\infty}
3519: \le \beta \Delta_{k,\ell_j}, \;\; j=1,2,\dots.
3520: \eeq
3521: Since $\Delta_{k,\ell_j} \downarrow 0$ by assumption, \eqnok{fint.5}
3522: contradicts \eqnok{fint.4}, so we conclude that the minor iteration
3523: sequence terminates finitely in this case as well.
3524:
3525: Having shown that no major iterate $x^k$ can give rise to a
3526: non-terminating sequence of minor iterations, we show now that the
3527: sequence of major iterations itself must terminate. Consider first the
3528: case in which $x^k \in \cS$ for some $k$. Since $\cQ(x^{k,\ell}) \ge
3529: \cQ(x^k) = \cQ^*$ for all $\ell=1,2,\dots$, the trust-region
3530: acceptance test \eqnok{tr.accept} can be satisfied only if $\cQ(x^k) -
m_{k,\ell}(x^{k,\ell}) =0$. But if this were the case, the test
\eqnok{conv.test} would have been satisfied before
3533: \eqnok{tr.accept} was even tested, and the algorithm would have
3534: stopped. Therefore, the algorithm fails to terminate at $x^k$ only if
3535: an infinite sequence of minor iterations is generated at this
3536: point---a case that we have already ruled out.
3537:
3538: We are left with the case of an infinite sequence of major iterations
3539: $\{ x^k \}_{k=1,2,\dots}$ for which $x^k \notin \cS$ for all
3540: $k=1,2,\dots$. If \eqnok{conv.test} is never satisfied, we have from
3541: Lemma~\ref{lem:trbounds} that \eqnok{fint.0} holds at all $k$ and
3542: $\ell$. Because the acceptance test \eqnok{tr.accept} is eventually
3543: satisfied by some minor iteration $\ell$ for each $k$, we have from
3544: \eqnok{tr.accept} and \eqnok{conv.test} that
3545: \[
3546: \cQ(x^k) - \cQ(x^{k+1}) \ge
3547: \xi \left( \cQ(x^k) - m_{k,\ell}(x^{k,\ell}) \right) \ge
3548: \xi \epstol \Delta_{\rm lo} >0.
3549: \]
3550: This bound implies that $\cQ(x^k) \downarrow -\infty$, contradicting
3551: Assumption~\ref{ass:S}.
3552: \end{proof}
3553:
3554: {\bf The following stuff was the earlier analysis of Algorithm ATR,
3555: much of it now wrong and in any case superseded.}
3556:
3557: \begin{proof}
3558: Suppose for contradiction that $x^I \notin \cS$ is an incumbent that
3559: is never replaced by a later trial point $x^k$. Clearly we must
3560: have $\cQ^I = \cQ(x^I)$. (The alternative $\cQ^I=\infty$ can happen
3561: only if no evaluation of $\cQ(\cdot)$ is ever completed; this is
3562: excluded by \eqnok{all.tasks.completed}.) In fact, because of
3563: \eqnok{all.tasks.completed}, the sequence $\{ x^k \}$ is infinite.
3564: Moreover, since at most $K$ of these points are generated in the
3565: basket-filling part of {\tt act\_on\_completed\_task}, we have that
3566: infinitely many of them are obtained by solving a trust-region
3567: subproblem ${\tt trsub}(x^I, \Delta_k)$ centered on $x^I$. Each time
3568: one of these points is generated, it eventually contributes cuts to
3569: the model function $m$ that are never deleted, since these cuts are
3570: all labeled with the index pair $(I,k)$, and we have by assumption
3571: that $I \in \cB$ forever. Moreover, by Lemma~\ref{lem:atr1.1}, all
3572: these $x^k$ lie in $\cL(\cQ_{\rm max}; \Delta_{\rm hi})$, so we can
3573: define a uniform bound $\bar{\beta}$ on the $1$-norm of the
3574: subgradients of $\cQ(x)$ for all $x \in \cL(\cQ_{\rm max};
3575: \Delta_{\rm hi})$, analogously to \eqnok{def.beta}. Equipped with
3576: $\bar{\beta}$, we can now apply logic very similar to that of
3577: Lemma~\ref{lem:tr:ft} and Theorem~\ref{th:tr:ft}, with the $x^k$
3578: obtained by solving ${\tt trsub}(x^I, \Delta_k)$ playing the role of
3579: the minor iterates of Algorithm TR, to deduce that one of the
3580: $x^k$'s in question must eventually satisfy the test
3581: \[
3582: \cQ(x^k) \le {\tt target}_k = \cQ(x^I) - \xi \left( \cQ(x^I) - m(x^k) \right).
3583: \]
3584: The $x^k$ that passes this test also trivially passes the test
3585: $\cQ(x^k) < \cQ^I$, so that it replaces $x^I$ as the incumbent,
3586: giving a contradiction.
3587: \end{proof}
3588:
3589: We conclude that unless some incumbent satisfies $x^I \in \cS$, the
3590: sequence of incumbents $\{x^{I_i}\}_{i=0,1,2, \dots}$ must be
3591: infinite. From the conditional test in the basket-update part of {\tt
3592: act\_on\_completed\_task}, we know that the sequence $\{
3593: \cQ(x^{I_i}) \}_{i=0,1,2, \dots}$ is monotonically decreasing, and
3594: that $\cQ(x^{I_i}) \le {\tt target}_{I_i}$. At most a finite number of these
3595: quantities ${\tt target}_{I_i}$ satisfy ${\tt target}_{I_i} = \infty$ (since the
3596: basket-filling part of {\tt act\_on\_completed\_task} is executed at
3597: most $K$ times), so for infinitely many $I_i$, we have that ${\tt target}_{I_i}$
3598: is defined as in \eqnok{target.k}, and so
3599: \beq \labtag{atr1.chain1}
3600: \cQ(x^{I_i}) \le {\tt target}_{I_i} = \cQ(x^{I_{i_-}}) - \xi \left(
3601: \cQ(x^{I_{i_-}}) - m(x^{I_i})
3602: \right),
3603: \eeq
3604: %
3605: for some previous incumbent indexed by $I_{i_-}$, with $i_- < i$. It
3606: follows that we can choose at least one infinite chain of incumbents
3607: such that each point in the chain satisfies the trust-region
3608: acceptance test at the previous point in the chain. That is, we have
3609: a sequence $\{ i_j \}_{j=0,1,2,\dots}$ such that
3610: \beq \labtag{atr1.chain2}
3611: \cQ(x^{I_{i_j}}) \le \cQ(x^{I_{i_{j-1}}}) - \xi \left(
3612: \cQ(x^{I_{i_{j-1}}}) - m(x^{I_{i_j}})
3613: \right), \sgap j=1,2,\dots.
3614: \eeq
3615: Since at every point in Algorithm ATR, $m(\cdot)$ is a linear
3616: underestimate of $\cQ(\cdot)$, and since $m(x^{I_{i_{j-1}}}) =
3617: \cQ(x^{I_{i_{j-1}}})$ at the moment when the right-hand side
3618: ${\tt target}_{I_{i_j}}$ is evaluated, we can use the proof technique of
3619: Lemma~\ref{lem:tr:1} to deduce that
3620: \beqas
3621: m(x^{I_{i_{j-1}}}) - m(x^{I_{i_j}}) & \ge & \hat{\epsilon}
3622: \min \left( \Delta_{I_{i_j}},
3623: \| x^{I_{i_{j-1}}} - P(x^{I_{i_{j-1}}})\|_{\infty} \right) \\
3624: & \ge & \hat{\epsilon}
3625: \min \left( \Delta_{\rm lo},
3626: \| x^{I_{i_{j-1}}} - P(x^{I_{i_{j-1}}}) \|_{\infty} \right),
3627: \eeqas
3628: at the moment at which ${\tt target}_{I_{i_j}}$ is evaluated. By substituting into
3629: \eqnok{atr1.chain2}, we deduce that
3630: \beq \labtag{atr1.chain3}
3631: \cQ(x^{I_{i_{j-1}}}) - \cQ(x^{I_{i_j}}) \ge
3632: \xi \hat{\epsilon} \min \left( \Delta_{\rm lo},
3633: \| x^{I_{i_{j-1}}} - P(x^{I_{i_{j-1}}}) \|_{\infty} \right),
3634: \sgap j=1,2,\dots.
3635: \eeq
3636:
3637: \begin{theorem} \labtag{th:atr1.3}
Suppose that none of the incumbents $x^I$ lies in the solution
set. Then $ \lim_{i \to \infty} \| x^{I_i} - P(x^{I_i}) \| = 0$.
3640: \end{theorem}
3641: \begin{proof}
3642: Consider the sequence $\{ \cQ(x^{I_i}) \}$ of objective values of
3643: incumbents. This sequence is monotonically decreasing and is bounded
3644: below by $\cQ^*$, so it has a limit, say $\bar{\cQ}$. Assume that the
3645: strict inequality $\bar{\cQ}> \cQ^*$ is satisfied. We then have for
3646: all $x^{I_i}$ that $\cQ(x^{I_i}) > \bar{\cQ} > \cQ^*$ so by continuity of
3647: $\cQ$ and boundedness of the subdifferential $\partial \cQ$,
3648: there is $\delta>0$ such that
3649: \beq \labtag{atr1.away}
3650: \| x^{I_i} - P(x^{I_i}) \| \ge
3651: \delta, \sgap \mbox{for all $i=0,1,2,\dots$}.
3652: \eeq
Consider now the infinite chain of incumbents discussed above.
We have from \eqnok{atr1.chain3} and \eqnok{atr1.away} that
\beq
\cQ(x^{I_{i_{j-1}}}) - \cQ(x^{I_{i_j}}) \ge
\xi \hat{\epsilon} \min \left( \Delta_{\rm lo}, \delta \right) >0,
\sgap j=1,2,\dots,
\eeq
which implies that $\cQ(x^{I_{i_j}}) \downarrow -\infty$ as $j \to
\infty$, giving a contradiction.
3662: We therefore have that $\cQ(x^{I_i})$ converges monotonically to $\cQ^*$.
3663: The result now follows immediately from \eqnok{weak.sharp}.
3664: \end{proof}
3665:
3666: