cs0304018/qaba.tex
1: %% Quasiconvex Analysis of Backtracking Algorithms
2: %% Preprint, February 2003
3: 
4: %% Index of notation:
5: %%
6: %% MIS example:
7: %%     G: input graph
8: %%     u,v: vertices of input graph
9: %%     n: number of vertices of input graph
10: %%     k: size constraint for independent set
11: %%     T(n,k): recurrence for algorithm analysis
12: %%     Delta: triangle in G
13: %%
14: %% Recurrences:
15: %%     d: dimension of vector space
16: %%     x: vector in Z^d
17: %%     F(x): recurrence on F(x)
18: %%     i: index of term of recurrence
19: %%     tau: number of terms
20: %%     j: index of summand of term in recurrence
21: %%     delta_i,j: vector in definition of summand
22: %%     t: target vector in Z^d
23: %%     n: size parameter of vector
24: %%     f(n) = F(n t) = univariate function, the asymptotics of which are desired
25: %%
26: %% Upper bounds:
27: %%     w: weight vector
28: %%     F_w: univariate recurrence defined from weight vector
29: %%     y: real variable bounded by or related to w.x
30: %%     c_w: base of exponent of solution to asymptotics of F_w
31: %%     c: real variable viewed as possible solution to c_w
32: %%     r_w(c): monotonic function arising in analysis
33: %%     c_w,i: base of solution to single term of recurrence
34: %%     r_w,i: monotonic function arising in analysis of single term
35: %%
36: %% Quasiconvex programming:
37: %%     d: dimension of domain of function (related but unequal d from recurrences)
38: %%     q(w): quasiconvex function
39: %%     lambda: bounding value of level set, or value of qcf
40: %%     q^{\le\lambda}: level set of q
41: %%     i: index of set of qcf's (will be same as index of term in recurrence application)
42: %%     q_i: qcf in indexed set, then later qcf derived from term of recurrence
43: %%     S: set of possible indexes or set of nested convex families
44: %%     q_S: max_{i in S} q_i
45: %%     kappa(lambda), kappa_i(lambda): nested convex family
46: %%     q_kappa: qcf derived from ncf
47: %%     lambda*(S): optimal value of qcp
48: %%     ell: index of sequence converging to infimum of qcp
49: %%
50: %% Implementation, smooth QCP:
51: %%     k: index of coordinate in vector space
52: %%     w[k]: kth coordinate of w (do not use subscripts due to w_ell above)
53: %%     rho: real number
54: %%     b: integer number of bits
55: %%     x: another vector argument to qcf
56: %%     q_i^*: inward pointing normal to tangent of level set
57: %%     v: vector pointing inwards to all level sets
58: %%     epsilon: small improvement amount
59: %%
60: %% Lower bounds:
61: %%     i*(x): index of term giving maximum value for F(x)
62: %%     mu(x): arbitrary function mapping vectors to indices of terms
63: %%     G_mu: infinite graph with paths representing recurrence expansions
64: %%     pi_mu: number of paths in the graph
65: %%     B: basis of qcp
66: %%     p_{i,j}: probability of using summand j for a vertex v of G_mu with mu(v)=i
67: %%     D_i: expected change to position of v by following probabilities
68: %%     phi(x) = r_x(c)
69: %%     b_i: probability of choosing a member of the basis
70: 
71: \documentclass[11pt]{llncs}
72: \usepackage{graphicx}
73: \usepackage{cite}
74: \usepackage{url}
75: \urlstyle{same}
76: 
77: \usepackage{times}
78: \usepackage{mathfont}
79: \def\min{\mathop{\rm min}}
80: \def\max{\mathop{\rm max}}
81: \def\sup{\mathop{\rm sup}}
82: \def\inf{\mathop{\rm inf}}
83: \def\log{\mathop{{\rm log}}}
84: \def\argmin{\mathop{{\rm arg\,min}}}
85: \def\argmax{\mathop{{\rm arg\,max}}}
86: 
87: \setlength{\textwidth}{6.5in}
88: \setlength{\textheight}{9in}
89: \setlength{\evensidemargin}{0in}
90: \setlength{\oddsidemargin}{0in}
91: \setlength{\topmargin}{-.5in}
92: 
93: % magic to make big-O notation use script font
94: \mathcode`O="724F
95: 
96: \DeclareSymbolFont{AMSb}{U}{msb}{m}{n}
97: \DeclareSymbolFontAlphabet{\Bbb}{AMSb}
98: \def\R{\ensuremath{\Bbb R}}
99: \def\Z{\ensuremath{\Bbb Z}}
100: \def\E{\ensuremath{\Bbb E}}
101: \def\H{\ensuremath{\Bbb H}}
102: \def\S{\ensuremath{\Bbb S}}
103: \DeclareMathSymbol{\subsetneq}{\mathrel}{AMSb}{"28}
104: 
105: % fix broken caption in llncs
106: \makeatletter
107: \def\hb@xt@{\hbox to }
108: \makeatother
109: 
110: % MARK ENDS OF PROOFS!
111: \let\oldendproof\endproof
112: \def\endproof{\qed\oldendproof}
113: 
114: 
115: \begin{document}
116: 
117: \title{Quasiconvex Analysis of Backtracking Algorithms}
118: \author{David Eppstein}
119: \authorrunning{Eppstein}
120: \institute{School of Information \& Computer Science\\
121: University of California, Irvine\\
122: Irvine, CA 92697-3425, USA\\
123: \email{eppstein@ics.uci.edu}}
124: 
125: \date{ }
126: 
127: 
128: \maketitle
129: \begin{abstract}
130: We consider a class of multivariate recurrences frequently arising in the worst case analysis of Davis-Putnam-style exponential time backtracking algorithms for NP-hard problems.  We describe a technique for proving asymptotic upper bounds on these recurrences, by using a suitable weight function to reduce the problem to that of solving univariate linear recurrences; show how to use quasiconvex programming to determine the weight function yielding the smallest upper bound; and prove that the resulting upper bounds are within a polynomial factor of the true asymptotics of the recurrence.  We develop and implement a multiple-gradient descent algorithm for the resulting quasiconvex programs, using a real-number arithmetic package for guaranteed accuracy of the computed worst case time bounds.
131: \end{abstract}
132: 
133: \section{Introduction}
134: 
135: The topic of exponential-time exact algorithms for hard problems has led to much research in recent years \cite{Bei-SODA-99,Bys-SODA-03,DanHir-TR-00,Epp-SODA-01,Epp-WADS-01,csds0302030,GraHirNie-SAT-00,PatPudSak-FOCS-98,Sch-FOCS-99}.  In contrast to the situation with polynomial time algorithms, one can not significantly increase the size of a solvable instance by waiting for Moore's law to provide faster computers, so algorithmic improvement is especially important in this area.  Several design principles are known for such algorithms, including dynamic programming~\cite{Rob-Algs-86} and randomized hill climbing~\cite{Sch-FOCS-99}, but the most common approach is a simple form of branch-and-bound in which one repeatedly performs some case analysis to find an appropriate structure in the problem instance, and then uses that structure to split the problem into several smaller subproblems which are solved by recursive calls to the algorithm.
136: 
137: As an example of this approach, a graph coloring algorithm of the author~\cite{Epp-WADS-01} uses a subroutine for listing each maximal independent set of at most $k$ vertices ($k$-MIS) in an $n$-vertex graph.  This subroutine (slightly simplified for this example)
138: repeatedly selects and applies one of the following cases:
139: \begin{itemize}
140: \item If the input graph $G$ contains a vertex $v$ of degree zero,
141: recursively list each $(k-1)$-MIS in $G\setminus\{v\}$
142: and append $v$ to each listed set.
143: \item If the input graph $G$ contains a vertex $v$ of degree one, 
144: with neighbor $u$, recursively list each $(k-1)$-MIS in $G\setminus N(u)$ and append $u$ to each listed set.  Then, recursively list each $(k-1)$-MIS in $G\setminus \{u,v\}$ and append $v$ to each listed set.
145: \item If the input graph $G$ contains a path $v_1$-$v_2$-$v_3$ of degree-two vertices,
146: then, first, recursively list each $(k-1)$-MIS in $G\setminus N(v_1)$ and append $v_1$ to each listed set.  Second, list each $(k-1)$-MIS in $G\setminus N(v_2)$ and append $v_2$ to each listed set.  Finally, list each $(k-1)$-MIS in $G\setminus(\{v_1\}\cup N(v_3))$ and append $v_3$ to each listed set.  Note that, in the last recursive call, $v_1$ may belong to $N(v_3)$ in which case
147: the number of vertices is only reduced by three.
148: \item If the input graph $G$ contains a vertex $v$ of degree three or more,
149: recursively list each $k$-MIS in $G\setminus\{v\}$.
150: Then, recursively list each $(k-1)$-MIS in
151: $G\setminus N(v)$ and append $v$ to each listed set.
152: \end{itemize}
153: It is not hard to see that any graph contains at least one of these cases.
154: We can bound the worst-case number of output sets,
155: as a recurrence in the variables $n$ and $k$:
156: $$
157: T(n,k)=\max\left\{
158: \begin{array}{l}
159: T(n-1,k-1)\\
160: 2T(n-2,k-1)\\
161: 3T(n-3,k-1)\\
162: T(n-1,k)+T(n-4,k-1)
163: \end{array}
164: \right.
165: $$
166: As base cases, $T(0,0)=1$, $T(n,-1)=0$, and $T(n,k)=0$ for $k>n$.
167: Each term in the overall maximization of the recurrence comes from a case in the case analysis; the recurrence uses the maximum of these terms because, in a worst-case analysis, the algorithm has no control over which case will arise.  Each summand in each term comes from a recursive subproblem called for that case.  It turns out that, for the range of parameters of interest $n/4\le k\le n/3$, the recurrence above is dominated by its last two terms, and has the solution $T(n,k)=(4/3)^n(3^4/4^3)^k$.  We can also find graphs having this many $k$-MISs, so the analysis given by the recurrence is tight.
168: Similar but somewhat more complicated multivariate recurrences have arisen in our algorithm for
169: 3-coloring~\cite{Epp-SODA-01} with variables counting 3- and 4-value variables in a constraint satisfaction instance, and in our algorithm for the traveling salesman problem in cubic graphs~\cite{csds0302030} with variables counting vertices, unforced edges, forced edges, and 4-cycles of unforced edges.
170: 
171: These examples of recurrences have all had few enough terms that they can be solved by hand, but Table~\ref{tbl:bigrec} depicts a recurrence, arising from unpublished work with J. Byskov on graph coloring algorithms, that is complex enough that hand solution seems unlikely.
172: This recurrence was derived through an iterative process, starting from a simple case analysis of the problem, in which the worst cases for the algorithm were repeatedly identified and replaced by a larger number of better cases.
173: 
174: \begin{table}
175: {\tiny $$
176: \input bigrecurrence.tex
177: $$}
178: \medskip
179: \caption{A recurrence arising from unpublished work with J. Byskov on graph coloring algorithms.}
180: \label{tbl:bigrec}
181: \end{table}
182: 
183: Our interest in this paper is in performing this type of analysis algorithmically: if we are given as input a recurrence such as the ones discussed above, can we efficiently determine its asymptotic solution, and determine which of the cases in the analysis are the critical ones for the performance of the backtracking algorithm that generated the recurrence?  We show that the answer is yes, by expressing the problem as a {\em quasiconvex program}, a type of generalized linear program studied previously by the author and others with applications including finite element mesh smoothing~\cite{AmeBerEpp-Algs-99}, brain flat mapping, hyperbolic graph drawing, and conformal meshing~\cite{BerEpp-WADS-01-omt}, and multi-projector tiled display color gamut equalization~\cite{BerEpp-SCG-03}.  This quasiconvex programming formulation allows us to solve a $\tau$-term recurrence in $O(\tau)$ steps, each of which involves the solution of a constant number of algebraic equations; alternatively, we can apply any of several numerical hill-climbing techniques, which are guaranteed to converge to the global optimum of a quasiconvex program.  We describe one such technique and provide two proof-of-concept implementations of it, one for exploratory analysis using floating point, and another using the {\tt XR} exact real-number computation package for Python~\cite{Python,XR}.  The algorithms we describe are able to analyze recurrences such as the one in Table~\ref{tbl:bigrec}, producing asymptotic analysis of their behavior (here, we are interested in the asymptotics of $T(n,0)$) and identifying the cases that are bottlenecks for that analysis,
184: typically within one or two seconds for a floating point evaluation sufficiently accurate for exploratory analysis of algorithms.
185: 
186: We first began using a weighting technique for upper bounding recurrences of this type in our previous work on graph coloring~\cite{Epp-SODA-01}, without much regard to its completeness or generality until this paper.
187: Despite some searching we have been unable to find relevant prior research on asymptotic analysis of similar multidimensional recurrences.
188: 
189: \section{Formalization and Statement of Results}
190: 
191: We assume that the input to our problem consists of the following items:
192: \begin{itemize}
193: \item An integer dimension $d$.
194: \item A recurrence
195: $$F(x) = \max_i \sum_j F(x - \delta_{i,j}),$$
196: where $x$ and $\delta_{i,j}$ are vectors in $\Z^d$, and $i$ and $j$ are indices ranging over the cases and subproblems of the algorithm to be analyzed.  We assume that, as base cases for the recurrence, $F(0)=1$, and $F(y)=0$ when $y$ can not reach the zero vector by any expansion of the recurrence.
197: \item A target vector $t$ in $\Z^d$.
198: \end{itemize}
199: The desired output is a description of the asymptotic behavior of the function
200: $f(n)=F(n\,t)$.  
201: We call the expressions $\sum_j F(x-\delta_{i,j})$ {\em terms} of the recurrence,
202: and their subexpressions $F(x-\delta_{i,j})$  {\em summands} of the term.
203: Much of what we describe here would generalize without difficulty to non-integer
204: values of $x$ and $\delta_{i,j}$,  and to non-integer multipliers on the summands of each term.
205: 
206: For our Python implementation, we represent the recurrence as a dictionary mapping case names to terms, with each term represented as a list of summands and each summand represented as a $d$-tuple of integers.  For instance, the cardinality-bounded maximal independent set recurrence discussed in the introduction has the representation shown in Table~\ref{tbl:repn}.
207: This Python representation also allows the inclusion of comments in the file containing the recurrence, describing in further detail the case analysis from which the recurrence arises.
208: 
209: \begin{table}[t]
210: \begin{verbatim}smallmis = {
    "deg0": [(1,1)],
    "deg1": 2*[(2,1)],
    "deg2": 3*[(3,1)],
    "deg3": [(4,1), (1,0)],
}\end{verbatim}
211: \caption{Python representation of recurrence input.}
212: \label{tbl:repn}
213: \end{table}
214: 
215: We obtain the following results:
216: 
217: \begin{itemize}
218: \item We show that our recurrences can be upper bounded by linear univariate recurrences formed by weighting the variables, and that the optimal set of weights for this upper bound technique can be found efficiently using quasiconvex programming.
219: The optimal basis for the quasiconvex program consists of the terms forming the worst cases for the recurrence. 
220: \item We describe a numerical improvement technique for quasiconvex programming, based on multi-gradient descent, and provide two implementations of this technique: one based on floating point arithmetic and capable of running at interactive speeds for exploratory algorithm analysis, and one using an exact arithmetic package for publishable guaranteed worst case bounds.
221: \item We prove lower bounds showing that the upper bounds from our optimal weighting technique are tight to within a polynomial factor.
222: \end{itemize}
223: 
224: \section{Upper Bounds}
225: 
226: In this section we describe a weighting technique that can be used to obtain asymptotic upper bounds for our recurrences, and formulate the problem of optimizing the weights in order to obtain the best such upper bound.
227: 
228: Fix a vector $w\in\R^d$, such that, for each summand $F(x-\delta_{i,j})$
229: of the input recurrence, $w\cdot\delta_{i,j}$ is positive, let $y$ be a real variable,
230: and define
231: $$F_w(y)=\max_{w\cdot x\le y} F(x).$$
232: Then it is not hard to see, by using the recurrence for $F(x)$ to expand the right hand side of this definition, and  interchanging the order in the maximization, that
233: $$F_w(y)\le\max_i\sum_j F_w(y-w\cdot\delta_{i,j}).$$
234: As base cases set $F_w(y)=1$ for $0\le y<\min_{i,j} w\cdot\delta_{i,j}$ and $F_w(y)=0$ for $y<0$.
235: This resembles the recurrence defining $F$, but with a real instead of vector-valued argument.   For linear univariate recurrences similar to this one,
236: standard solution techniques such as generating functions and characteristic polynomials are commonly taught in freshman combinatorics courses.
237: The recurrence for $F_w$ is nonlinear because of the maximization, and involves non-integer variables, but the same techniques apply:
238: 
239: \begin{lemma}
240: $F_w(y)=O(c_w^y)$, where $c_w$ is the unique positive root of the monotonic
241: function
242: $$r_w(c)=1-\max_i\sum_j c^{-w\cdot\delta_{i,j}}.$$
243: \end{lemma}
244: 
245: It will be convenient later to separate out this analysis into the different terms of the recurrence:
246: 
247: \begin{lemma}
248: Let $c_{w,i}$ denote the unique positive root of the monotonic function
249: $r_{w,i}(c)=1-\sum_j c^{-w\cdot\delta_{i,j}}$.
250: Then $c_w=\max_i c_{w,i}$.
251: \end{lemma}
252: 
253: Since $F_w$ was defined as a maximum of values of $F$, this technique immediately leads to upper bounds for the target function
254: $f(n)=F(n\, t)$:
255: 
256: \begin{lemma}\label{lem:wup}
257: Let $w\in\R^d$ be such that, for each summand $F(x-\delta_{i,j})$
258: of the input recurrence, $w\cdot\delta_{i,j}$ is positive, and let $w\cdot t=1$.
259: Then $f(n)\le F_w(n)=O(c_w^n)$.
260: \end{lemma}
261: 
262: We call $w$ a {\em weight vector}, because if one interprets the coordinates of $w$ as weights of the different backtracking algorithm instance features counted by $x$, then $w\cdot x$ is the total weight of the instance.  The recurrence for $F_w$ and its solution $O(c_w^n)$ describes the time for the backtracking algorithm as a function of instance weight.  However, different choices of the weight vector will give different upper bounds  $O(c_w^n)$ on the time used by the backtracking algorithm.  Our task now is to select the best weight vector, that is, the one yielding the tightest upper bound.  For convenience, we define $c_{w}=c_{w,i}=+\infty$ when
263: $w\cdot\delta_{i,j}$ is non-positive for some~$j$.
264: 
265: \section{Quasiconvex Programming}
266: 
267: In this section we find efficient algorithmic solutions for the optimal weighting problem  formulated in the previous section.
268: 
269: A function $q(w):\R^d\mapsto\R$ is called {\em quasiconvex} when its level sets
270: $q^{\le\lambda}=\{w\in R^d\mid q(x)\le\lambda\}$ are all convex.  In particular, the points $w$ where $q$ achieves its minimum value (if a minimum exists) form a convex set, and an approximation to a value achieving the global minimum can be found numerically by local improvement techniques.  If the functions $q_i$ for $i$ in some finite index set $S$ are all quasiconvex, then the function $q_S(w)=\max_{i\in S} q_i(w)$ is also quasiconvex, and it becomes of interest to find a point where $q_S$ achieves its minimum value.  Amenta et al.~\cite{AmeBerEpp-Algs-99} define {\em quasiconvex programming} as a formalization of this search for the minimum of $q_S$.  Linear programming can be seen as a special case of quasiconvex programming in which all the functions $q_i$ are linear.
271: 
272: More formally, Amenta et al. define a {\em nested convex family} to be a map
273: $\kappa(\lambda)$ from the nonnegative real numbers to compact convex sets in
274: $\R^d$ such that if
275: $\lambda_1<\lambda_2$ then
276: $\kappa(\lambda_1)\subset\kappa(\lambda_2)$, and such that
277: for all $\lambda$, $\kappa(\lambda)=\bigcap_{\lambda'>\lambda}\kappa(\lambda')$.
278: Any nested convex family $\kappa$ determines
279: a quasiconvex function $q_\kappa(w) = \inf\,\{\,\lambda \mathrel{|} w \in \kappa(\lambda)\,\}$
280: on $\R^d$, with level sets $q_\kappa^{\le\lambda}=\kappa(\lambda)$.
281: Conversely, when $q$ is quasiconvex, the closures of the level sets $q^{\le\lambda}$, restricted to a compact convex subdomain of $\R^d$, form a nested convex family.
282: 
283: Amenta et al. define a {\em quasiconvex program} to be formed by
284: a set of nested convex families
285: $S=\{\kappa_1,\kappa_2,\ldots \kappa_n\}$; the task to be solved is
286: finding the value
287: $$\lambda^*(S)=
288: \inf\Big\{\,
289:                 (\lambda,w) \mathrel{\big|}
290:                                 w\in \mathop{\textstyle\bigcap}\limits_{\kappa_i\in S}\kappa_i(\lambda)
291: \Big\}
292: $$
293: where the infimum is taken in the lexicographic ordering,
294: first by $\lambda$ and then by the coordinates of~$w$.
295: 
296: Quasiconvex programs can themselves be seen as a special case of
297: {\em generalized linear programs}.
298: These are optimization problems based on an objective function that maps sets to totally ordered values and that satisfies certain axioms~\cite{Ame-DCG-94,Gar-SJC-95,MatShaWel-TR-92};
299: quasiconvex programs are generalized linear programs because the function $\lambda^*(S)$ defined above satisfies these axioms~\cite{AmeBerEpp-Algs-99}.  One consequence of this generalized linear programming formulation is that the value of any quasiconvex program is determined by a {\em basis}, a subset of the nested convex families with cardinality $O(d)$.  Another consequence is that, when the dimension $d$ is bounded, quasiconvex programs can be solved by dual-simplex based randomized algorithms that perform a number of steps linear in the number of nested convex families in the input, where each step consists of solving a constant-sized subproblem.
300: 
301: As we now show, our problem of finding an optimal weight vector for our input recurrence can be expressed in this quasiconvex programming framework.
302: 
303: \begin{lemma}\label{lem:qc}
304: Let $c_{w,i}$ be as defined in the previous section.
305: Then the function $q_i(w)=c_{w,i}$ is quasiconvex.
306: \end{lemma}
307: 
308: \begin{proof}
309: We must show that $q_i^{\le\lambda}=\{w\mid c_{w,i}\le\lambda\}$ is convex.
310: Equivalently, expanding the definition of $c_{w,i}$ and using the monotonicity
311: of the function $r_{w,i}$, we must show convexity of the set
312: $$q_i^{\le\lambda}=\{w\mid r_{w,i}(\lambda)\ge 0\}
313: =\{w\mid\sum_j c^{-w\cdot\delta_{i,j}}\le 1\}.$$
314: But each summand of the sum in the right expression is an exponential of a linear function of $w$,
315: hence convex.  A sum of convex functions is convex, and its level set is a convex set.
316: \end{proof}
317: 
318: \begin{corollary}
319: We can find a pair $(c,w)$, where $c=\inf_w c_w$, $w\cdot t=1$, and
320: $w$ is a limit point of a sequence $w_\ell$ with $c_{w_\ell}$ converging to $c$,
321: as a solution to a quasiconvex program.
322: \end{corollary}
323: 
324: \begin{proof}
325: We form nested convex families from the closures of the level sets
326: of the functions $q_i(w)=c_{w,i}$.
327: The result follows from the definition of $c_w=\max c_{w,i}$.
328: \end{proof}
329: 
330: \begin{theorem}
331: If $(c,w)$ is found as the optimal solution to the quasiconvex program
332: defined as above for a recurrence for $F(x)$ with target vector $t$,
333: then $f(n)=F(n\,t)=O(c^n)$.
334: \end{theorem}
335: 
336: \begin{proof}
337: If $c_{w,i}=c<+\infty$ for all terms $i$, the result follows from Lemma~\ref{lem:wup}.
338: And if some term has $w\cdot\delta_{i,j}<0$, or has more than
339: one summands, then $c_{w,i}=+\infty$ but any sequence $w_\ell$ converging to $w$
340: has $c_{w_\ell}$ converging to $+\infty$, so the the upper bound is infinite and the result is vacuously true.  The only remaining case is that some terms have a single summand
341: $\delta_i$ with $w\cdot\delta_i=0$.  Such terms can not contribute to asymptotic growth of $F(n\,t)$ and the result follows by applying  Lemma~\ref{lem:wup} to the recurrence formed by omitting them from the definition of $F$.
342: \end{proof}
343: 
344: Therefore, we can use quasiconvex programming to find the weight vector $w$ yielding the tightest possible upper bound $O(c^n)$ on the asymptotic behavior of our recurrences.  Further, by finding a basis for the quasiconvex program, we can determine the recurrence terms that are critical for its asymptotics. The restriction $w\cdot t=1$ does not affect quasiconvexity, and in fact aids in the solution of our problem by reducing the effective dimension of the quasiconvex program to $d-1$.
345: 
346: \section{Implementation}
347: 
348: In this section we discuss two implementations of our optimization algorithms.  Our implementations use a numerical improvement technique which we call {\em smooth quasiconvex programming} and which will also feature in our later lower bound proof.
349: 
350: As stated earlier, generalized linear programming techniques can be used to solve our quasiconvex programs in $O(\tau)$ steps, where $\tau$ denotes the number of terms and each step consists of solving a constant sized subproblem.  Such a subproblem has only a constant number of potential bases, and we can test a basis by solving an algebraic system of equations
351: in the variables $c^{w[k]}$ where $w[k]$ denotes the $k$th coordinate of $w$.
352: However this approach appears likely to be cumbersome in practice.
353: Instead, we implemented a hill-climbing scheme for finding numerically the optimal value $(c,w)$ for our quasiconvex program and for certain other quasiconvex programs.
354: 
355: We have in mind two different purposes for an implementation of our analysis algorithm.
356: First, such an implementation could be used for exploratory analysis: computing a rough estimate of the running time of an algorithm, with enough accuracy to determine whether it improves other similar algorithms for the same problem, and determining the worst cases in an algorithm's case analysis, in order to refine that analysis to produce a better algorithm.  For this sort of application, it is important that the running time be fast enough to be usable at interactive or near-interactive speeds; an implementation using floating point arithmetic is appropriate.  Second, we would like to be able to publish guaranteed worst-case bounds on algorithms, for which approximate numeric schemes such as floating point that lack error bounds are inappropriate; in this setting the longer running times associated with exact or interval arithmetic may be acceptable.
357: 
358: With these considerations in mind, we implemented our algorithm twice, once using floating point and a second time using {\tt XR}~\cite{XR}, an exact real-number computation package for the Python programming language~\cite{Python}.  The results of the second implementation are guaranteed to be valid upper bounds for the recurrence in question.  In {\tt XR}, a real number $\rho$ is represented as a data structure that is capable of
359: constructing a multiprecision integer representing the rounded value of $2^b\rho$, for any integer $b$; this rounded value is required to be within unit distance of the true value.
360: We extended this package so that it could evaluate as exact real numbers the values $c_w$ given an input vector $w$ the coordinates of which are also real numbers, by performing an appropriate binary search using the values of the function $r_w(c)$.  We then implemented a quasiconvex programming algorithm to search for a sequence of vectors $w_\ell$ converging to the infimum $(c,w)$ of the quasiconvex program.  We describe in more detail below our exact arithmetic implementation; our floating point implementation is similar.
361: 
362: \begin{figure}[t]
363: \centering\includegraphics[width=4in]{circumradius}
364: \caption{Example showing the difficulty of applying standard gradient descent methods to quasiconvex programming.  The function to be minimized is the maximum distance to any point;
365: only points within the narrow shaded intersection of circles have function values smaller than the value at point~$w$.}
366: \label{fig:cr}
367: \end{figure}
368: 
369: If all functions $q_i(w)$ are quasiconvex, the function $q(w)=\max_i q_i(w)$ is itself quasiconvex, so we can apply hill-climbing procedures to find its infimum.  However, although in our application the individual functions $q_i$ are smooth, their maximum $q$ may not be smooth, so it is difficult to apply standard gradient descent techniques.  The difficulty may be seen, for instance, in a simpler quasiconvex programming problem: determining the circumradius of a planar point set (Figure~\ref{fig:cr}).  A basis for the circumradius problem may consist of either two or three points; the quasiconvex functions used to solve the problem are simply the distances (or squared distances) from each input point, and the function $q(w)$ measures the distance from $w$ to the farthest point.  But if a point set has only two points in its basis, and our hill climbing procedure for circumradius has reached a point $w$ equidistant from these two points and near but not on their midpoint, then improvements to the function value $q(w)$ may be found only by moving $w$ in a narrow range of directions towards the midpoint.  Standard gradient descent algorithms may have a difficult time finding such an improvement direction.  Similar behavior arises naturally in our recurrence analysis problem. For instance in a recurrence in~\cite{csds0302030} for the time bound of a TSP algorithm on cubic graphs, there are only two critical terms despite the problem being defined over a three-dimensional vector space, and these two terms lead to behavior very similar to the circumradius example.
370: 
371: To avoid these difficulties, we use the following algorithm, which we call {\em smooth quasiconvex programming}.  We assume we are given as input a set of quasiconvex real functions $q_i(w)$.
372: Further, we assume that for each $i$ we also can compute a vector-valued function $q_i^*(w)$,
373: satisfying the following properties:
374: \begin{enumerate}
375: \item If $q_i(x)<q_i(w)$, then $(x-w)\cdot q_i^*(w)>0$, and
376: \item If $q_i^*(w)\cdot v>0$, then for all sufficiently small $\epsilon>0$, $q_i(w + \epsilon v)< q_i(w)$.
377: \end{enumerate}
378: For the circumradius example, for instance, $q_i^*$ should be a vector pointing from $w$ towards the $i$th input point.
379: The requirements on $q_i^*$ can be described geometrically, as follows: we assume that the level set $q_i^{\le\lambda}$
380: is a {\em smooth} convex set, one that has at each of its boundary points a unique tangent plane.
381: The vector $q_i^*(x)$ is then an inward-pointing normal vector to the tangent plane
382: to $q_i^{\le q(x)}$ at $x$ (essentially it is just the negation of the gradient of $q$). The functions $q_i(w)$ arising in our recurrence analysis problem have this smoothness property,
383: and (as we discuss in more detail in the next section) the vectors $q_i^*(w)$ can be constructed by evaluating the partial derivatives for each of the coordinates of $x$ in the expression
384: $r_{x,i}(c_w)$ at $x=w$.
385: 
386: Our smooth quasiconvex programming algorithm then consists of selecting an initial value for $w$, and a desired output tolerance,
387: and repeating the following steps:
388: \begin{enumerate}
389: \item Compute the set of vectors $q_i^*(w)$,
390: for each $i$ such that $q_i(w)$ is within the desired tolerance of $\max_i q_i(w)$.
391: \item Find a vector $v$ such that $v\cdot q_i^*(w)>0$ for each vector $q_i^*(w)$ in the computed set.
392: If no such vector exists, $q(w)$ is within the tolerance of its optimal value and the algorithm terminates.
393: \item Search for a value $\epsilon$ for which $q(w+\epsilon v)\le q(w)$,
394: and replace $w$ by $w+\epsilon v$.
395: \end{enumerate}
396: 
397: Our actual implementation augments this procedure by an outer scaling loop that gradually decreases the tolerance,
398: so that multiple terms of the recurrence can influence the computation in step 1 even when the current value of $w$ is only a rough approximation to the true optimum.  We also terminate the loop when the improvement to $q(w)$ becomes much smaller than the tolerance, even when the termination condition of step 2 is not met, in order to handle situations in which the optimal basis is less than full-dimensional.
399: 
400: The search for a vector $v$ in step 2 can be expressed as a linear program.
401: However, when the dimension of the quasiconvex program is at most two
402: (equivalently, the number of variables in the recurrence to be solved is at most three)
403: it can be solved more simply by sorting the vectors $q_i^*(w)$ radially around the origin
404: and choosing $v$ to be the average of two extreme vectors.
405: 
406: In our implementation of step 3, we perform a doubling search for the largest $\epsilon$ leading to a smaller value of $q(w+\epsilon v)$, and then reduce the resulting $\epsilon$ by a factor of two before replacing $w$,
407: to attempt to control situations where the value $w$ oscillates around the true optimal value.
408: 
409: Both due to our use of exact real arithmetic, and due to the implementation in Python, a relatively slow interpreted language, our exact arithmetic implementation is not fast, taking several hours on a laptop computer to solve moderately sized 3-variable recurrences to 64 bits of precision.  However our floating point implementation is able to run at interactive speeds, taking roughly one or two seconds on a recent-model laptop to solve recurrences such as the one in Table~\ref{tbl:bigrec}.
410: We believe that significant improvements in runtime of our exact arithmetic implementation would be possible both by tuning the implementation and by using a faster software base.  However it is encouraging that, in the trials we attempted, the algorithm appears to exhibit linear convergence to the correct function value as well as to the optimal weight vector coordinates: the number of iterations of the algorithm appears to be proportional to the number of bits of precision desired.
411: 
412: \section{Lower Bounds}
413: 
414: In this section we prove that the upper bounds found by our optimal weighting technique are tight to within a polynomial factor.
415: 
416: In order to find lower bounds for the asymptotic behavior of our recurrences, it is useful to have the following combinatorial interpretation of their values.  For any $x\in\Z^d$,
417: let $i^*(x)=\argmax_i \sum_j F(x - \delta_{i,j})$.  That is, $i^*$ is the index of the term in the recurrence that determines the value of $F(x)$.  If $\mu(x):\Z^d\mapsto\Z$
418: is any function mapping vectors to recurrence term indices, define an infinite graph $G_\mu$,
419: the vertices of which are vectors in $\Z^d$, with an edge from $x$ to $x-\delta_{i,j}$
420: whenever $i=\mu(x)$ and $\delta_{i,j}$ is a summand in term~$i$ of the recurrence.
421: Let $\pi_\mu(x)$ denote the number of paths from $x$ to zero in $G_\mu$.
422: It is not hard to show by induction that $\pi_{i^*}(x)=F(x)$, so $F$ can be interpreted as counting paths in a graph.  Moreover, for any $\mu$, $\pi_\mu(x)\le F(x)$.
423: An example of the graph $G_{i^*}$, for the recurrence discussed in the introduction, is shown in Figure~\ref{fig:gis}; each vertex $x$ in the figure is labeled with the number $\pi_{i^*}(x)=F(x)$.
424: We will find a lower bound for $F(x)$ by counting paths in a graph $G_\mu$,
425: where $\mu(x)$ will not necessarily equal $i^*(x)$.
426: 
427: \begin{figure}[t]
428: \centering
429: \includegraphics[width=5.5in]{gistar}
430: \caption{A portion of the infinite graph $G_{i^*}$ for the recurrence $T(n,k)$ for listing
431: maximal independent sets of cardinality $k$ in an $n$-vertex graph, described in the introduction.
432: The horizontal position of a vertex indicates its $n$ coordinate and the vertical position indicates its $k$ coordinate.  To simplify the drawing, edges are shown in a confluent style~\cite{cs.CG/0212046} in which
433: multiple edges are allowed to merge before reaching their destination.
434: Each vertex is labeled with the number of paths in the graph from that vertex to the origin.}
435: \label{fig:gis}
436: \end{figure}
437: 
438: Let $\lambda^*(S)=(c,w)$ be the optimal solution to the quasiconvex program formulated earlier for the upper bounds for our recurrence, and let $B$ be a {\em basis} of the quasiconvex program; that is, a minimal subset of terms of the recurrence such that the quasiconvex program formed from the subset has the same value $\lambda^*(B)=\lambda^*(S)$ as that for the whole recurrence.
439: For any term $i$ in $B$, and any summand $j$ in term $i$,
440: let $p_{i,j}=c^{-w\cdot\delta_{i,j}}$, and let $D_i=\sum p_{i,j}\delta_{i,j}$.
441: 
442: \begin{lemma}\label{lem:edgeprob}
443: For any term $i$ in $B$, $\sum_j p_{i,j}=1$.
444: \end{lemma}
445: 
446: \begin{proof}
447: If term $i$ has only one summand, then it belongs to $B$ exactly when $w\cdot\delta_{i,j}=0$ and $p_{i,j}=1$.  Otherwise, $q_i(w)$ is continuous and term $i$ can belong to $B$ only
448: when $q_i(w)=c$.  But then $r_w(c)=1-\sum p_{i,j}=0$ since $q_i$ is defined as having the value that makes this equation true, so $\sum p_{i,j}=0$.
449: \end{proof}
450: 
451: \begin{lemma}
452: The vector $D_i$ defined above is a scalar multiple of the vector $q_i^*$
453: defined for the smooth quasiconvex programming algorithm of the previous section.
454: \end{lemma}
455: 
456: \begin{proof}
457: The vector $q_i^*(w)$ is the vector such that $q_i(w+\epsilon v)<w_i(w)$ (for sufficiently small $\epsilon$) if and only if $v\cdot q_i^*>0$.  Expanding the definition of $q_i$, as the value $c$ such that $r_{w,i}(c)=0$, we see that $q_i^*(w)$ can equivalently be defined as a vector $v$ such that $r_{w+\epsilon v,i}(c)>0$ (for sufficiently small $\epsilon$) if and only if $v\cdot q_i^*>0$.Therefore, $q_i^*$ can be computed as the gradient of the vector function $\phi(x)=r_{x,i}(c)$, evaluated at $x=w$.
458: Expanding this gradient by computing partial derivatives for each component of its vector argument, we arrive at the definition of $D_i$.
459: \end{proof}
460: 
461: \begin{lemma}\label{lem:unbiased}
462: There exist values $b_i$, $0\le b_i\le 1$, so that $\sum b_i=1$
463: and so that $\sum b_i D_i$ is a scalar multiple of the target vector $t$.
464: \end{lemma}
465: 
466: \begin{proof}
467: The smooth quasiconvex programming algorithm of the previous section will find an improvement to solution $(c,w)$ whenever there exists $v$, perpendicular to $t$, having positive dot product with all the vectors $q_i^*(w)$.  The pair $(c,w)$ used to define $b_i$ and $D_i$ is assumed optimal, so can not be improved.  Therefore, $v$ does not exist and the origin must be contained in the convex hull of projections perpendicular to $t$ of the vectors $q_i^*$.  Any vector in the convex hull of a set of vectors can be expressed as a convex combination of those vectors, and the same convex combination (when viewed in $\R^d$ rather than its projection perpendicular to $t$) has the property stated in the lemma.
468: \end{proof}
469: 
470: We are now ready to describe the graph $G_\mu$ used in our lower bound construction.
471: For each $x\in Z^d$, we choose $\mu(x)$ to be one of the terms in the basis $B$,
472: independently at random, with probability $b_i$ for choosing term $i$.
473: We then let $G_\mu$ be the infinite graph formed from $\mu$ as described at the start of the section.  For each choice of $\mu$ the number of paths $\pi_\mu(n\,t)$ from $n\,t$ to the origin in $G_\mu$ forms a valid lower bound for the quantity $f(n)=F(n\,t)$ that we are trying to estimate,
474: so the expected number of paths (averaged over the choice of $\mu$) also forms a valid lower bound.
475: 
476: To count the expected number of paths in $G_\mu$, we use the following random walk process.
477: Start from the vertex $n\,t$, and then from any vertex $x$ choose randomly among the
478: edges leading away from $x$, independently for each $x$.  The set of edges at $x$ is in one-to-one correspondence with the summands of term $\mu(x)$, and we choose summand $j$ with probability $p_{\mu(x),j}$.  As shown in Lemma~\ref{lem:edgeprob} these probabilities add to one at each vertex.  We continue this random walk process until we reach a vertex $x$ with
479: $w\cdot x=0$.
480: 
481: \begin{lemma}\label{lem:equiprob}
482: If a path from $n\,t$ to the origin can be formed by the random walk described above, it has probability
483: $c^{-n}$ of being chosen.
484: \end{lemma}
485: 
486: \begin{proof}
487: More generally it is easy to show by induction on the length of the path that a path from $x$ to $y$ has probability $c^{w\cdot(y-x)}$ of being chosen.  The result follows from the choice of starting point $x=n\,t$ and the constraint that $w\cdot t=1$.
488: \end{proof}
489: 
490: \begin{lemma}\label{lem:origprob}
491: The random walk described above reaches the origin with
492: probability $\Omega(n^{(1-d)/2})$.
493: \end{lemma}
494: 
495: \begin{proof}
496: By Lemma~\ref{lem:unbiased}, the projections perpendicular to $t$ of the vertices of the path
497: form an unbiased random walk in $\R^{d-1}$, with $O(n)$ steps of constant size, and the result follows from the standard theory of such walks.
498: \end{proof}
499: 
500: \begin{theorem}
501: \label{thm:lb}
502: $f(n)=F(n\,t)=\Omega(c^n n^{(1-d)/2})$.
503: \end{theorem}
504: 
505: \begin{proof}
506: By Lemmas \ref{lem:equiprob} and~\ref{lem:origprob},
507: there is the given expected number of paths in $G_\mu$ from $n\,t$ to the origin.
508: The result follows from the facts that this number of paths is less than the
509: number of paths between the same two vertices in $G_{i^*}$ (since $i^*$ is defined to maximize the number of paths) and that the number of paths in $G_{i^*}$ is exactly $F(n\,t)$.
510: \end{proof}
511: 
512: \section{Conclusions and Open Problems}
513: 
514: We have shown that the solution to the recurrence
515: $$F(x) = \max_i \sum_j F(x - \delta_{i,j}),$$
516: for $x=n\,t$
517: may be upper and lower bounded within a polynomial factor of $c^n$,
518: where the number $c$ can be computed as the solution to a quasiconvex program
519: defined from the recurrence and the target vector~$t$.
520: 
521: It would be of interest to tighten these bounds: under what conditions can we determine the correct polynomial adjustment to the bound $c^n$?  It is consistent with our observations so far that
522: $F(n\,t)=\Theta(c^n n^{(|B|-d)/2})$ where $|B|$ is the cardinality of a basis for the quasiconvex program.  For instance this would fit the central binomial coefficients, with $|B|=1$ and $d=2$, as well as the recurrence used as an example at the start of this paper with $|B|=d=2$.  However there is too little evidence yet to state such a formula as a conjecture.
523: 
524: More generally, the work here is only a first step towards the automation of backtracking algorithm design and analysis.  Perhaps it would also be possible to automatically perform some of the case analysis used to design backtracking algorithms, and to determine the appropriate variables to use in setting up the recurrences used to analyze those algorithms, before automatically solving those recurrences, at least for simple constraint satisfaction type problems.  It would also be of interest to find ways of specifying algorithms of this type in such a way that their correctness can be proven automatically, especially in situations where repeated refinement based on our analysis tools has led to highly complex case analysis such as that appearing in Table~\ref{tbl:bigrec}.
525: Also, while we can find tight worst-case bounds on the solution of the recurrence derived from an algorithm, it may not always
526: be possible to construct an instance causing the algorithm itself to have that worst case time bound; it would be useful to determine conditions under which this recurrence-based analysis is tight.
527: 
528: In another direction, the proof of Theorem~\ref{thm:lb} hints at a theory of duality for quasiconvex programs that it would be of interest to explore.
529: 
530: \section*{Acknowledgements}
531: 
532: This research was supported in part by NSF grant CCR-9912338.
533: I would like to thank Jesper Byskov and George Lueker for helpful discussions and comments on drafts of this paper, and Keith Briggs for help with programming using~{\tt XR}.
534:  
535: \raggedright
536: \bibliographystyle{abuser}
537: \bibliography{qaba}
538:  \end{document}
539: