1: \section{Introduction}\label{sec:intro}
2: The Analysis of Algorithms community has been challenged by the
3: existence of remarkable algorithms that are known by scientists and
4: engineers to work well in practice, but whose theoretical analyses
5: are negative or inconclusive.
6: The root of this problem is that algorithms
7: are usually analyzed in one of two ways: by worst-case or average-case
8: analysis.
9: Worst-case analysis can improperly suggest that an
10: algorithm will perform poorly by examining its performance under
11: the most contrived circumstances.
12: Average-case analysis was introduced to
13: provide a less pessimistic measure of the performance of algorithms,
14: and many practical algorithms perform well on the random
15: inputs considered in average-case analysis.
16: However, average-case analysis may be unconvincing as
17: the inputs encountered in many application domains
18: may bear little resemblance to the random inputs
19: that dominate the analysis.
20:
We propose an analysis, which we call smoothed analysis, that
can help explain the
23: success of algorithms that have poor worst-case complexity
24: and whose inputs look sufficiently different from random that
25: average-case analysis cannot be convincingly applied.
26: In smoothed analysis, we measure the
27: performance of an algorithm under slight random perturbations of
28: arbitrary inputs.
29: In particular, we consider
30: Gaussian perturbations of inputs to algorithms that take real
31: inputs, and we measure the running times of algorithms in terms
32: of their input size and the standard deviation of the Gaussian perturbations.
33:
34: We show that the simplex method has polynomial smoothed
35: complexity.
36: The simplex method is the classic example of an
37: algorithm that is known to perform well in practice but which takes
38: exponential time in the worst case
39: \cite{KleeMinty,Murty,GoldfarbSit,Goldfarb,AvisChvatal,Jeroslow,AmentaZiegler}.
In the late 1970s and early 1980s, the simplex method was shown
41: to converge in expected polynomial time on various distributions of
42: random inputs by researchers including Borgwardt, Smale, Haimovich, Adler,
43: Karp, Shamir, Megiddo, and Todd
44: \cite{Borg82,Borg77,SmaleRand,Haimovich,AdlerKarpShamir,AdlerMegiddo,ToddRand}.
45: These works introduced novel probabilistic tools to the analysis
46: of algorithms, and provided some intuition as to why the
47: simplex method runs so quickly.
48: However, these analyses are dominated by
49: ``random looking'' inputs: even if one were to prove
50: very strong bounds on the higher moments of the distributions
51: of running times on random inputs,
52: one could not prove that an algorithm performs well
53: in any particular small neighborhood of inputs.
54:
55: To bound expected running times on small neighborhoods of inputs,
56: we consider linear programming problems in the form
57: \begin{eqnarray}\label{prg:A}
58: & \mbox{maximize} & \zz ^{T} \xx \nonumber \\
59: & \mbox{subject to} & \AA \xx \leq \yy,
60: \end{eqnarray}
and prove that for every vector $\zz$
and every matrix $\AAo$ and vector $\orig{\yy}$,
the expected time taken by a two-phase shadow-vertex simplex method
to solve such a linear program
is polynomial in $1/\sigma$ and the dimensions of $\AA$,
where the expectation is taken over Gaussian perturbations
$\AA$ and $\yy$ of $\AAo $ and $\orig{\yy}$
of standard deviation
$\sigma \left(\max_{i}\norm{(\orig{y}_{i}, \aao_{i})} \right)$.
70:
71:
72: \subsection{Linear Programming and the Simplex Method}\label{ssec:lp}
73: It is difficult to overstate the importance of linear programming
74: to optimization.
75: Linear programming problems arise in innumerable industrial contexts.
76: Moreover, linear programming is often used as a fundamental step
77: in other optimization algorithms.
78: In a linear programming problem, one is asked to maximize or
79: minimize a linear function over a polyhedral region.
80:
81: Perhaps one reason we see so many linear programs is that we
82: can solve them efficiently.
83: In 1947, Dantzig~\cite{Dantzig} introduced the simplex method,
84: which was the first practical approach to solving linear programs
85: and which remains widely used today.
86: To state it roughly, the simplex method proceeds by walking from
87: one vertex to another of the polyhedron defined by the inequalities
88: in \eqref{prg:A}.
89: At each step, it walks to a vertex that is better with respect to
90: the objective function.
91: The algorithm will either determine that
92: the constraints are unsatisfiable, determine that the objective function is
93: unbounded, or reach a vertex from which it cannot make
94: progress, which necessarily optimizes the objective function.
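
As a small illustration (the numbers are our own and play no role in the
analysis that follows), consider the program in form \eqref{prg:A}
with two variables and four constraints given by
\[
\zz = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
\AA = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ -1 & 0 \\ 0 & -1 \end{pmatrix}, \quad
\yy = \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix},
\]
whose feasible region is the unit square $0 \leq x_{1}, x_{2} \leq 1$.
Starting from the vertex $(0,0)$, with objective value $0$,
one possible simplex walk moves to $(1,0)$, with value $1$,
and then to $(1,1)$, with value $2$.
Both neighbors of $(1,1)$ have objective value $1$, so no further
progress is possible and $(1,1)$ is optimal.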
95:
96: Because of its great importance, other algorithms for
97: linear programming have been invented.
98: In 1979, Khachiyan~\cite{Khachiyan} applied the
99: ellipsoid algorithm to linear programming and proved that
it always converged in time polynomial in
$d$, $n$, and $L$, where $d$ is the number of variables,
$n$ is the number of constraints, and $L$ is the number of
bits needed to represent the linear program.
103: However, the ellipsoid algorithm has not been competitive
104: with the simplex method in practice.
105: In contrast, the interior-point method introduced in 1984
106: by Karmarkar~\cite{Karmarkar}, which also runs in time polynomial
107: in $d$, $n$, and $L$, has performed very well:
variations of the interior-point method are competitive with
109: and occasionally superior to the simplex method in practice.
110:
111: In spite of half a century of attempts to unseat it,
112: the simplex method remains the most popular method
113: for solving linear programs.
114: However, there has been no satisfactory theoretical
115: explanation of its excellent performance.
116: A fascinating approach to understanding the performance of the
117: simplex method has been the attempt to
118: prove that there always exists a short
119: walk from each vertex to the optimal vertex.
120: The Hirsch conjecture states that there should
121: always be a walk of length at most $n - d$.
122: Significant progress on this conjecture was
123: made by Kalai and Kleitman~\cite{KalaiKleitman}, who proved that
124: there always exists a walk of length
125: at most $n ^{\log_{2}d + 2}$.
126: However, the existence of such a short walk does not imply
127: that the simplex method will find it.
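
To give a sense of the gap between these two bounds, take the
four-dimensional cube $[0,1]^{4}$ as an illustrative example of our
own choosing: it has dimension $d = 4$ and $n = 8$ facets, so
\[
n - d = 4
\qquad \mbox{and} \qquad
n^{\log_{2} d + 2} = 8^{4} = 4096.
\]
Any two vertices of this cube are joined by a walk of at most $4$ edges,
so the conjectured bound is attained here,
whereas the proven bound is larger by three orders of magnitude.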
128:
129: A simplex method is not completely defined until one
130: specifies its \textit{pivot rule}---the method by which
131: it decides which vertex to walk to
132: when it has many to choose from.
133: There is no deterministic pivot rule under which the
134: simplex method is known to take a sub-exponential
135: number of steps.
136: In fact, for almost every deterministic
137: pivot rule there is a family of polytopes
138: on which it is known to take an exponential number of
139: steps
140: \cite{KleeMinty,Murty,GoldfarbSit,Goldfarb,AvisChvatal,Jeroslow}.
(See~\cite{AmentaZiegler} for a survey and a
unified construction of these polytopes.)
143: The best present analysis of randomized pivot rules shows
144: that they take expected time $n^{O (\sqrt{d})}$%
145: \cite{KalaiSubexp,Matousek},
146: which is quite far from the polynomial complexity
147: observed in practice.
148: This inconsistency between the exponential worst-case behavior of the
simplex method and its everyday practicality leaves us wanting
150: a more reasonable theoretical analysis.
151:
152: %% from STOC version
153:
154: Various average-case analyses of the simplex method
155: have been performed.
156: Most relevant to this paper is the analysis of
157: Borgwardt~\cite{Borg77,Borg82}, who
158: proved that the simplex method with the shadow
159: vertex pivot rule runs in expected polynomial time
160: for polytopes whose constraints are drawn independently from
161: spherically symmetric distributions
(\textit{e.g.}, Gaussian distributions centered at the origin).
163: Independently,
164: Smale~\cite{SmaleRand,SmaleRand2} proved bounds on the
165: expected running time of Lemke's self-dual parametric simplex algorithm
166: on linear programming problems
chosen from a spherically symmetric distribution.
168: Smale's analysis was substantially improved by Megiddo~\cite{Megiddo}.
169:
170: While these average-case analyses are significant
171: accomplishments, it is not clear whether they
172: actually provide intuition for what happens
173: on typical inputs.
174: Edelman~\cite{EdelmanRoulette} writes on this point:
175: \begin{quotation}
176: What is a mistake is to psychologically link a random
177: matrix with the intuitive notion of a ``typical'' matrix
178: or the vague concept of ``any old matrix.''
179: \end{quotation}
180:
181: Another model of random linear programs was studied in
182: a line of research initiated independently
183: by Haimovich~\cite{Haimovich} and Adler~\cite{Adler}.
184: Their works
185: considered the maximum over matrices, $\AA$,
186: of the expected time taken by parametric simplex
187: methods to solve linear programs over these matrices
188: in which the directions of the
189: inequalities are chosen at random.
190: As this framework considers the maximum of an average,
191: it may be viewed as a precursor to smoothed
192: analysis---the distinction being that
193: the random choice of
194: inequalities cannot be viewed as a perturbation,
195: as different choices yield radically different linear programs.
196: Haimovich and Adler both proved that
197: parametric simplex methods
198: would take an expected linear number of steps
199: to go from the vertex minimizing the objective function
200: to the vertex maximizing the objective function,
201: even conditioned on the program being feasible.
202: While their theorems confirmed the intuitions of many practitioners,
203: they were geometric rather than algorithmic%
204: \footnote{Our results in Section~\ref{sec:shadow} are analogous to
205: these results.}
206: as it
207: was not clear how an algorithm would locate either vertex.
208: Building on these analyses, Todd~\cite{ToddRand},
209: Adler and Megiddo~\cite{AdlerMegiddo},
210: and Adler, Karp and Shamir~\cite{AdlerKarpShamir}
211: analyzed parametric algorithms for linear programming under this model
212: and proved quadratic
213: bounds on their expected running time.
214: While the random inputs considered in these analyses are
215: not as special as the random inputs obtained from spherically
216: symmetric distributions,
217: the model of randomly flipped inequalities provokes some
218: similar objections.
219:
220: \subsection{Smoothed Analysis of Algorithms
221: and Related Work}\label{ssec:smooth}
222: We introduce the \textit{smoothed analysis of algorithms} in the hope that
it will help explain the good practical performance of many
algorithms that worst-case analysis does not explain and for which
average-case analysis is unconvincing.
226: Our first application of the smoothed analysis of algorithms will be to
227: the simplex method.
228: We will consider the maximum over $\AAo$
229: and $\orig{\yy}$ of the expected running time
230: of the simplex method on inputs of the form
231: \begin{eqnarray}
232: & \mbox{maximize} & \zz ^{T} \xx \nonumber \\
233: & \mbox{subject to} & (\AAo + \GG) \xx \leq (\orig{\yy} + \hh), \label{prg:AG}
234: \end{eqnarray}
235: where we let $\AAo$ and $\orig{\yy}$ be arbitrary
236: and $\GG$ and $\hh$ be a matrix and a vector of independently chosen
237: Gaussian random variables of mean $0$ and
238: standard deviation $\sigma \left(\max_{i}\norm{(\orig{y}_{i}, \aao_{i})} \right)$.
239: If we let $\sigma $ go to $0$, then we obtain the worst-case
240: complexity of the simplex method; whereas, if we let $\sigma $
be so large that $\GG$ swamps out $\AAo$, we obtain the
average case analyzed by Borgwardt.
243: By choosing polynomially small $\sigma $, this analysis combines
244: advantages of worst-case and average-case analysis, and roughly
245: corresponds to the notion of imprecision in low-order digits.
246:
247: In a smoothed analysis of an algorithm, we assume that the inputs
248: to the algorithm are subject to slight random perturbations,
249: and we measure the complexity of the algorithm in terms of the input
250: size and the standard deviation of the perturbations.
251: If an algorithm has low smoothed complexity, then one should expect it to
252: work well in practice since most real-world problems are generated
253: from data that is inherently noisy.
254: Another way of thinking about smoothed complexity is to observe that if an
255: algorithm has low smoothed complexity, then one must be unlucky
256: to choose an input instance on which it performs poorly.
257:
258:
259: We now provide some definitions for the smoothed analysis of algorithms
260: that take real or complex inputs.
261: For an algorithm $A$ and input $\xx $, let
262: \[
263: \calC_{A} (\xx )
264: \]
265: be a complexity measure of $A$ on input $\xx$.
266: Let $X$ be the domain of inputs to $A$, and let
267: $X_{n}$ be the set of inputs of size $n$.
268: The size of an input can be measured in various ways.
269: Standard measures are the number of real variables
270: contained in the input and the sums of the bit-lengths
271: of the variables.
272: Using this notation, one can say that $A$ has worst-case
273: $\calC$-complexity $f (n)$ if
274: \[
275: \max _{\xx \in X_{n}} (\calC_{A} (\xx )) = f (n).
276: \]
277: Given a family of distributions $\mu_{n} $ on $X_{n}$, we say that $A$
278: has average-case $\calC$-complexity $f (n)$ under $\mu $ if
279: \[
280: \expec{\xx \from{\mu _{n}}{X_{n}}}{\calC_{A} (\xx )} = f (n).
281: \]
282: Similarly, we say that $A$ has \textit{smoothed $\calC$-complexity}
283: $f (n, \sigma )$ if
284: \begin{equation}\label{eqn:smoothedcomplexity}
285: \max _{\xx \in X_{n}}
286: \expec{\gg }{\calC_{A} (\xx + \left(\sigma \norm{\xx}_{?} \right) \gg )} = f (n, \sigma ),
287: \end{equation}
288: \index{smoothed-complexity}%
289: where $\left( \sigma \norm{\xx}_{?} \right) \gg$ is a vector of Gaussian random variables of mean $0$ and
290: standard deviation $\sigma \norm{\xx}_{?}$ and $\norm{\xx}_{?}$ is a measure of the magnitude
291: of $\xx$, such as the largest element or the norm.
292: We say that an algorithm has \textit{polynomial smoothed complexity}
293: if its smoothed complexity is polynomial in $n$ and $1/\sigma $.
294: \index{polynomial smoothed complexity}
295: In Section~\ref{sec:conclusions}, we present some
296: generalizations of the definition of smoothed complexity that
297: might prove useful.
298: To further contrast smoothed analysis with average-case analysis,
299: we note that the probability mass in \eqref{eqn:smoothedcomplexity} is
300: concentrated in a region of radius $O (\sigma \sqrt{n})$ and
301: volume at most $O (\sigma \sqrt{n})^{n}$,
302: and so, when $\sigma$ is small, this region contains an exponentially small fraction
303: of the probability mass in an average-case analysis.
304: Thus, even an extension of average-case analysis to higher moments
305: will not imply meaningful bounds on smoothed complexity.
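
The volume estimate above can be checked with a crude bound.
As a sketch, assume the inputs are normalized so that $\norm{\xx}_{?} = 1$
and write $r = c \sigma \sqrt{n}$ for the radius, where $c$ is a
constant introduced here only for illustration.
A ball of radius $r$ in $n$ dimensions is contained in a cube of side $2r$,
so its volume is at most
\[
(2 r)^{n} = (2 c \sigma \sqrt{n})^{n} = O (\sigma \sqrt{n})^{n}.
\]
In particular, this is at most $2^{-n}$ as soon as
$\sigma \leq 1/ (4 c \sqrt{n})$.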
306:
307: A discrete analog of smoothed analysis has been studied in a collection
308: of works inspired by Santha and Vazirani's \textit{semi-random source}
309: model~\cite{SanthaVazirani}.
310: In this model, an adversary generates an input, and each bit of this input
311: has some probability of being flipped.
312: Blum and Spencer~\cite{BlumSpencer} design a polynomial-time
313: algorithm that $k$-colors
314: $k$-colorable graphs generated by this model.
315: Feige and Krauthgamer~\cite{FeigeKrauthgamer} analyze a model
316: in which the adversary is more powerful,
317: and use it to show that Turner's algorithm~\cite{Turner}
318: for approximating the bandwidth performs well
319: on semi-random inputs.
320: They also improve Turner's analysis.
321: Feige and Kilian~\cite{FeigeKilian}
322: present polynomial-time algorithms that
323: recover large independent sets,
324: $k$-colorings, and optimal bisections
325: in semi-random graphs.
326: They also demonstrate that significantly better
327: results would lead to surprising
328: collapses of complexity classes.
329:
330: \subsection{Our Results}\label{ssec:results}
331:
332: We consider
333: the maximum over $\zz$, $\orig{\yy}$,
334: and $\vs{\aao}{1}{n}$ of the expected time taken
335: by a two-phase shadow vertex simplex method to solve
336: linear programming problems of the form
337: \begin{eqnarray}
338: & \mbox{maximize} & \zz^{T} \xx \nonumber \\
339: & \mbox{subject to} & \form{\aa _{i}}{\xx} \leq y _{i},
340: \mbox{ for $1 \leq i \leq n$,} \label{eqn:lpEnumerated2}
341: \end{eqnarray} \index{zz@$\zz $}%
342: where each $\aa _{i}$ is a Gaussian random vector of standard deviation
343: $\sigma \max_{i} \norm{(\orig{y}_{i}, \aao_{i})}$ centered at $\aao _{i}$,
344: and each $y_{i}$ is a Gaussian random variable of standard deviation
345: $\sigma \max_{i} \norm{(\orig{y}_{i}, \aao_{i})}$ centered at $\orig{y} _{i}$.
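
Equivalently, writing $M = \max_{i} \norm{(\orig{y}_{i}, \aao_{i})}$
(a shorthand introduced only for this remark), the perturbation model
can be sketched as
\[
\aa_{i} = \aao_{i} + \sigma M \gamma_{i},
\qquad
y_{i} = \orig{y}_{i} + \sigma M \eta_{i},
\qquad 1 \leq i \leq n,
\]
where each $\gamma_{i}$ is a vector of $d$ independent Gaussian random
variables of mean $0$ and standard deviation $1$, and each $\eta_{i}$ is a
single such variable.
Here we assume, consistently with \eqref{prg:AG}, that all of these
variables are independent and that a Gaussian random vector of standard
deviation $s$ has independent coordinates, each of standard deviation $s$;
the symbols $\gamma_{i}$ and $\eta_{i}$ are ours.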
346:
347: We begin by considering the case in which
348: $\yy = \oone $, $\norm{\aao _{i}} \leq 1$,
and $\sigma < 1/ (3 \sqrt{d \ln n})$.
350: In this case, our first result, Theorem~\ref{thm:shadow}, says that
351: for every vector
352: $\tt $ the expected size of the {\em shadow} of the polytope---the
353: projection of the polytope defined
354: by the equations (\ref{eqn:lpEnumerated2}) onto the plane
355: spanned by $\tt $ and $\zz $---is polynomial in $n$, the dimension,
356: and $1/\sigma $.
357: This result is the geometric foundation of our work, but
358: it does not directly bound the running time of an algorithm,
359: as the shadow relevant to the analysis of an algorithm
360: depends on the perturbed program and cannot be specified
361: beforehand as the vector $\tt$ must be.
362: In Section~\ref{sec:introSVM2phase}, we describe a two-phase
363: shadow-vertex simplex algorithm,
364: and in Section~\ref{sec:phaseI} we
365: use Theorem~\ref{thm:shadow} as a black box to show
366: that it takes expected time polynomial in $n$, $d$,
367: and $1/\sigma $ in the case described above.
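
For concreteness, one way to write the shadow, given here as a sketch
in our own notation rather than as the formal definition used later,
is as the image of the polytope under the linear map
$\xx \mapsto (\form{\tt}{\xx}, \form{\zz}{\xx})$,
reading $\form{\cdot}{\cdot}$ as the inner product
as in \eqref{eqn:lpEnumerated2}:
\[
\left\{ (\form{\tt}{\xx}, \form{\zz}{\xx}) :
\form{\aa_{i}}{\xx} \leq y_{i} \mbox{ for $1 \leq i \leq n$} \right\}.
\]
When $\tt$ and $\zz$ are linearly independent, this planar set differs
from the orthogonal projection onto the plane they span only by an
invertible linear change of coordinates, so the two polygons have the
same number of vertices and edges.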
368:
369: Efforts have been made to analyze how much the solution of a linear
370: program can change as its data is perturbed.
371: For an introduction to such analyses,
372: and an analysis of the complexity of interior point
373: methods in terms of the resulting condition number,
374: we refer the reader to
375: the work of Renegar~\cite{RenegarFunc,RenegarCond,RenegarPert}.
376:
377:
378: \subsection{Intuition Through Condition Numbers}\label{sec:intuition}
379: For those already familiar with the simplex method and condition numbers,
380: we include this section to provide some intuition for why our
381: results should be true.
382:
383: Our analysis will exploit geometric properties
384: of the condition number of a matrix, rather than of a
385: linear program.
386: We start with the observation that if a corner of a polytope
387: is specified by the equation $A_{I} \xx = \yy_{I}$,
388: where $I$ is a $d$-set, then the condition number of
389: the matrix $A_{I}$ provides a good measure of how far the corner
390: is from being flat.
391: Moreover, it is relatively easy to show that if
392: $A$ is subject to perturbation, then it is unlikely that
393: $A_{I}$ has poor condition number.
394: So, it seems intuitive that if $A$ is perturbed, then most
395: corners of the polytope should have angles bounded away
396: from being flat.
397: This already provides some intuition as to why the simplex method
398: should run quickly: one should make reasonable progress as
399: one rounds a corner if it is not too flat.
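
To make the first observation concrete, here is a small example of our
own. It uses the common definition of the condition number of an
invertible matrix, $\kappa (A_{I}) = \norm{A_{I}} \norm{A_{I}^{-1}}$
with the operator norm; this is an assumption, and the precise quantity
used in the analysis may differ.
For $d = 2$ and a small $\epsilon > 0$, take
\[
A_{I} = \begin{pmatrix} 1 & 0 \\ 1 & \epsilon \end{pmatrix},
\]
whose rows are constraint normals that differ in direction by an angle of
about $\epsilon$, so that the corner they define is nearly flat.
The product of the two singular values $s_{\max} \geq s_{\min}$ of $A_{I}$
is $|\det A_{I}| = \epsilon$, and $s_{\max}$ tends to $\sqrt{2}$ as
$\epsilon \to 0$, so
\[
\kappa (A_{I}) = \frac{s_{\max}}{s_{\min}}
= \frac{s_{\max}^{2}}{\epsilon} \approx \frac{2}{\epsilon}.
\]
A flatter corner thus corresponds to a larger condition number.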
400:
401: There are two difficulties in making the above intuition rigorous:
402: the first is that even if $A_{I}$ is well-conditioned for most
403: sets $I$, it is not clear that $A_{I}$ will be well-conditioned
404: for most sets $I$ that are bases of corners of the polytope.
405: The second difficulty is that even if most corners of the polytope
406: have reasonable condition number, it is not clear that a simplex
407: method will actually encounter many of these corners.
408: By analyzing the shadow vertex pivot rule, it is possible to resolve
409: both of these difficulties.
410:
411: The first advantage of studying the shadow vertex pivot rule is
412: that its analysis comes down to studying the expected sizes
413: of shadows of the polytope.
414: From the specification of the plane onto which the polytope will be projected,
415: one obtains a characterization of all the corners that will be in
416: the shadow, thereby avoiding the complication of an iterative
417: characterization.
418: The second advantage is that these corners are specified by the
419: property that they optimize a particular objective function,
420: and using this property one can actually bound the probability
421: that they are ill-conditioned.
422: While the results of Section~\ref{sec:shadow} are not stated in
423: these terms, this is the intuition behind them.
424:
425: Condition numbers also play a fundamental role in our
426: analysis of the shadow-vertex algorithm.
427: The analysis of the algorithm differs from the mere analysis
428: of the sizes of shadows in that, in the study of an algorithm,
429: the plane onto which the polytope is projected depends upon
430: the polytope itself.
431: This correlation of the plane with the polytope complicates
432: the analysis, but is also resolved through the help
433: of condition numbers.
434: In our analysis, we view the perturbation as the composition
435: of two perturbations, where the second is small relative to the first.
436: We show that our choice of the plane onto which we
437: project the shadow is well-conditioned with high
438: probability after the first perturbation.
439: That is, we show that the second perturbation is unlikely
440: to substantially change the plane onto which we project,
441: and therefore unlikely to substantially change the shadow.
442: Thus, it suffices to measure the expected size of the
443: shadow obtained after the second perturbation onto the
444: plane that would have been chosen after just the first
445: perturbation.
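
The decomposition rests on a standard fact about Gaussians, which we
record as a sketch; the symbols $s$, $s_{1}$, $s_{2}$, $\GG_{1}$, and
$\GG_{2}$ are introduced here for illustration, and the split used in
the analysis may be chosen differently.
If $\GG_{1}$ and $\GG_{2}$ are independent matrices of independent
Gaussian random variables of mean $0$ and standard deviations $s_{1}$ and
$s_{2}$, then the entries of $\GG_{1} + \GG_{2}$ are independent Gaussian
random variables of mean $0$ and standard deviation
$\sqrt{s_{1}^{2} + s_{2}^{2}}$.
Hence a perturbation $\GG$ of standard deviation $s$ can be realized as
\[
\AAo + \GG = (\AAo + \GG_{1}) + \GG_{2},
\qquad s_{1}^{2} + s_{2}^{2} = s^{2},
\]
and choosing $s_{2}$ much smaller than $s_{1}$ makes the second
perturbation small relative to the first.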
446:
447: The technical lemma that enables this analysis, Lemma~\ref{lem:MGC},
448: is a concentration result that proves that it is highly
449: unlikely that almost all of the minors of a random
450: matrix have poor condition number.
451: This analysis also enables us to show that it is highly
452: unlikely that we will need a large ``big-$M$''
453: in phase I of our algorithm.
454:
455: We note that the condition numbers of the $A_{I}$s
456: have been studied before in the complexity of
457: linear programming algorithms.
458: The condition number $\bar{\chi}_{A}$
459: of Vavasis and Ye~\cite{VavasisYe} measures
460: the condition number of the worst sub-matrix $A_{I}$,
461: and their algorithm runs in time proportional
462: to $\ln (\bar{\chi }_{A})$.
463: Todd, Tun{\c{c}}el, and Ye~\cite{ToddTuncelYe} have shown
464: that for a Gaussian random matrix the expectation
465: of $\ln (\bar{\chi }_{A})$ is $O (\min (d \ln n, n))$.
466: That is, they show that it is unlikely that any $A_{I}$
467: is exponentially ill-conditioned.
468: It is relatively simple to apply the techniques of
469: Section~\ref{sec:phaseIManyGood} to obtain a similar
470: result in the smoothed case.
471: We wonder whether our concentration result that it
472: is exponentially unlikely that many $A_{I}$
473: are even polynomially ill-conditioned could
474: be used to obtain a better smoothed analysis
475: of the Vavasis-Ye algorithm.
476:
477: \subsection{Discussion}\label{sec:introDiscussion}
478:
479: One can debate whether the definition of
480: \textit{polynomial smoothed complexity}
481: should be that an algorithm have complexity polynomial in $1/\sigma $
482: or $\log (1/\sigma )$.
483: We believe that the choice of being polynomial in $1/\sigma $
484: will prove more useful as the other definition is too strong
485: and quite similar
486: to the notion of being polynomial in the worst case.
487: In particular, one can convert any algorithm for linear programming
488: whose smoothed complexity
489: is polynomial in $d$, $n$ and $\log (1/\sigma) $
490: into an algorithm whose worst-case complexity is polynomial in $d$,
491: $n$, and $L$.
492: That said, one should certainly prefer complexity bounds that are
493: lower as a function of $1/\sigma$, $d$ and $n$.
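
The gap between the two candidate definitions is easy to see numerically.
As an illustration of our own, take a perturbation at the scale of the
last bit of an $L$-bit input, $\sigma = 2^{-L}$.
Then
\[
\log_{2} (1/\sigma ) = L
\qquad \mbox{while} \qquad
1/\sigma = 2^{L},
\]
so a bound polynomial in $\log (1/\sigma )$ remains polynomial in $L$
at this scale, whereas a bound polynomial in $1/\sigma $ may be
exponential in $L$.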
494:
495:
496: We also remark that a simple examination of the
497: constructions that provide exponential lower bounds
498: for various pivot
499: rules~\cite{KleeMinty,Murty,GoldfarbSit,Goldfarb,AvisChvatal,Jeroslow}
500: reveals that none of these pivot rules
501: have smoothed complexity polynomial in $n$ and
502: sub-polynomial in $1/\sigma $.
503: That is, these constructions are unaffected by exponentially
504: small perturbations.
505:
506:
507:
508:
509:
510:
511:
512: % Local Variables: ***
513: % TeX-master:"shadow.tex" ***
514: % End: ***
515:
516: