3: \section{A Streaming $(1+\epsilon)$ Approximation for Haar Wavelets}
4: \label{apxschemes}
In this section we will provide an FPTAS for the Haar system.  The
algorithm will be bottom up, which is convenient from a streaming
point of view.  Observe that in the case of a general $\ell_p$ norm error,
we cannot rule out that the optimum solution has irrational
values, which is detrimental from a computational point of view.  In a
sense we will seek to narrow down our search space, but we will need
to preserve near optimality.  We will show that {\em there exist}
sets $R_i$ such that if each solution coefficient $z_i$ is drawn from
$R_i$, then {\em there exists} a solution which is close to the
optimum unrestricted solution (where we search over all reals).  In a
15: sense the sets $R_i$ ``rescue'' us from the search. Alternately we can
16: view those sets as a ``rounding'' of the optimal solution.  Obviously
17: such sets exist if we did not care about the error, e.g. take the all
18: zero solution. We would expect a dependence between the sets $R_i$ and
19: the error bound we seek.  We will use a type of ``dual'' wavelet
20: bases; i.e., where we use one basis to construct the coefficients and
21: another to reconstruct the function. Our bases will differ by scaling
22: factors.  We will solve the problem in the scaled bases and translate
23: the solution to the original basis.  This overall approach is similar
to that in \cite{GH05}; however, it differs in several details
25: critical to the proofs of running time, space complexity and
26: approximation guarantee.
27: 
28: \begin{Definition}\label{def:psi-ab}
29: Define $\psia_{j,s}=2^{-j/2}\psi_{j,s}$ and
30: $\psib_{j,s}=2^{j/2}\psi_{j,s}$. 
31: Likewise define $\phia_{j,s} = 2^{-j/2}\phi_{j,s}$.
32: \end{Definition}
33: 
34: \begin{proposition}
35: The Cascade algorithm used with $\frac1{\sqrt{2}}h[]$ computes 
36: $\langle f, \psia_i \rangle$ and $\langle f,\phia_i\rangle$.
37: \end{proposition}
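
\noindent As a minimal sketch of the proposition above in the Haar case, the following Python code (an illustration, not part of the formal description) runs the cascade with $\frac{1}{\sqrt{2}}h[]$, which for Haar amounts to pairwise averages and half-differences.  The function names \texttt{scaled\_haar\_cascade} and \texttt{reconstruct} are hypothetical, and the input length is assumed to be a power of two.

\begin{verbatim}
```python
def scaled_haar_cascade(f):
    """Return (top_average, details) for a signal of power-of-two length.

    details[j-1][s] is the inner product of f with the scaled wavelet
    psi^a_{j,s}; top_average is the inner product with the coarsest
    scaled scaling function phi^a.
    """
    n = len(f)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = [float(x) for x in f]
    details = []
    while len(a) > 1:
        pairs = list(zip(a[0::2], a[1::2]))
        details.append([(l - r) / 2 for l, r in pairs])  # half-differences
        a = [(l + r) / 2 for l, r in pairs]              # averages
    return a[0], details


def reconstruct(top_average, details, n):
    """Rebuild f from the coefficients using the +1/-1 vectors psi^b_{j,s}."""
    out = [top_average] * n
    for j, level in enumerate(details, start=1):
        width = 2 ** j
        for s, d in enumerate(level):
            start = s * width
            for k in range(start, start + width // 2):
                out[k] += d          # left half of the support gets +d
            for k in range(start + width // 2, start + width):
                out[k] -= d          # right half of the support gets -d
    return out
```
\end{verbatim}

\noindent For $f = \langle 2,4,6,0\rangle$ the sketch yields the top average $3$ and the details $[-1,3]$ and $[0]$; reconstructing with the $\pm 1$ vectors recovers $f$ exactly, which illustrates that $\{\psia_i\}$ and $\{\psib_i\}$ act as dual bases.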
38: 
39: \noindent We now use the change of basis. The next proposition is
40: clear from the definition of $\{\psi^b_i\}$.
41: 
42: \begin{proposition}
43: The problem of finding a representation $\hat{f}$ with $\{z_i\}$ and
44: basis $\{\psi_i\}$ is equivalent to finding the same representation
45: $\hat{f}$ using the coefficients $\{y_i\}$ and the basis $\{\psib_i\}$.  
46: The correspondence is $y_i = y_{j,s} = 2^{-j/2}z_{j,s}$.
47: \hide{and there are no more than $B$ non-zero $y_i$'s if and only if
48: there are no more than $B$ non-zero $z_i$.}
49: \end{proposition}
50: 
51: \begin{lemma}
52: \label{changebase1}
53: Let $\{ y^*_i\}$ be the optimal solution using the basis set
54: $\{\psib_i\}$ for the reconstruction, i.e., $\hat{f} = \sum_i
55: y^*_i\psib_i$ and $\| f - \hat{f}\|_p = \E$. Let $\{y^\rho_i\}$ be the
56: set where each $y^*_i$ is rounded to the nearest multiple of
57: $\rho$. If $f^\rho = \sum_i y^\rho_i\psib_i$ then $\|f -
58: f^\rho\|_p \leq \E + O(qn^{1/p}\rho\log n)$.
59: \end{lemma}
60: \begin{proof}
61: Let $\rho_i = y^*_i - y^\rho_i$.  By the triangle inequality, 
62: \[ \|f - f^\rho\|_p \leq \E + \norm{\sum\nolimits_i \rho_i\psib_i}_p \enspace .\]
63: Proposition~\ref{prop:qlogn-basis} and the fact that $\abs{\rho_i} \le \rho$
imply $\abs{\sum_i\rho_i\psib_i(k)} \le c\rho q\log n \max_i\abs{\psib_i(k)}$ 
65: for a small constant $c$.  This bound gives 
66: $\|f - f^\rho\|_p \leq \E + O(qn^{1/p}\rho\log  n \max_i \|\psib_i\|_\infty)$.
67: Now $\psib_i = \psib_{j,s} = 2^{j/2}\psi_{j,s}$, and from the proof of
68: Lemma~\ref{second} we know that for large $j$, $\|\psi_{j,s}\|_\infty$ 
69: is at most $2^{-j/2}$ times a constant. 
70: For smaller $j$, $\|\psib_{j,s}\|_\infty$ is a constant.
71: \end{proof}
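
\noindent As an illustration (a worked special case, assuming $n$ is a power of two), the bound can be checked directly for Haar, where $q=1$.  Each index $k$ lies in the support of exactly one $\psib_{j,s}$ per scale, plus the root vector, and every non-zero entry of $\psib_i$ is $\pm 1$.  Hence
\[ \abs{\sum\nolimits_i \rho_i\psib_i(k)} \ \le \sum_{i\,:\,\psib_i(k)\ne 0}\abs{\rho_i} \ \le\ (1+\log n)\rho \enspace , \]
and summing the $p^\text{th}$ powers over the $n$ indices $k$ gives
\[ \norm{\sum\nolimits_i \rho_i\psib_i}_p \ \le\ n^{1/p}(1+\log n)\rho \enspace , \]
which matches the $O(qn^{1/p}\rho\log n)$ term of the lemma.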
72: 
73: We will provide a dynamic programming formulation using the new
74: basis. But we still need to show two results; the first concerning the
75: $y^*_i$'s and the second concerning the $a_j[]$'s. The next lemma is
76: very similar to Lemma~\ref{lb} and follows from the fact that
77: $\|\psia_{j,s}\|_1 = 2^{-j/2}\|\psi_{j,s}\|_1 \le \sqrt{2q}$.
78: \begin{lemma}
79: \label{psilemma}
80: $ - C_0\sqrt{q}\E \leq \langle f, \psia_i \rangle - y^*_i \leq C_0\sqrt{q}\E$ 
81: for some constant $C_0$.
82: \end{lemma}
83: \hide{ %%% Proof is very similar to Lemma \ref{lb}.
84: \begin{proof}
85: We can follow the proof of Lemma~\ref{lb} and use the fact that if
86: $i=(j,s)$ we have $\langle \psia_{i}, \psi_{k} \rangle =
87: 2^{-j/2}\delta_{ik}$. The only other thing we need to show is that
88: $\|\psia_i\|_1$ is a constant. This
89: follows from the proof of Lemma~\ref{second}, where we show that
90: $\|\psi_i\|_1$ is $O(2^{j/2})$ if $i$ is of scale $j$. Since
91: $\psia_i=2^{-j/2}\psi_i$ the lemma follows.
92: \end{proof}
93: }
94: %
95: Now suppose we know the optimal solution $\hat{f}$, and suppose we are
96: computing the coefficients $a_j[]$ and $d_j[]$ for both $f$ and
97: $\hat{f}$ at each step $j$ of the Cascade algorithm.  We wish to know
98: by how much their coefficients differ since bounding this gap would
99: shed more light on the solution $\hat{f}$.
100: 
101: \begin{proposition}
  Let $a_j[s](F)$ be $a_j[s]$ computed from $a_0[s]=F(s)$. Then
103:   $a_j[s](f)-a_j[s](\hat{f})=a_j[s](f-\hat{f})$.
104: \end{proposition}
105: 
106: \begin{lemma}
107: \label{philemma}
108:   If $\|f -\hat{f}\|_p \leq\E$ then $|a_j[s](f-\hat{f})|\leq C_1\sqrt{q}\E$
109:   for some constant $C_1$. (We are using $\frac{1}{\sqrt2} h[]$.)  
110: \end{lemma}
111: \begin{proof}
112: The proof is similar to that of Lemma~\ref{lb}.
113: Let $F=f-\hat{f}$. We know $-\E \leq F(i) \leq
114: \E$. Multiplying by $|\phia_{j,s}(i)|$ and summing over all $i$ we get
115: $ -\E \|\phia_{j,s}\|_1 \leq \langle F, \phia_{j,s} \rangle =
116: a_j[s](F) \leq \E \|\phia_{j,s}\|_1$.  By definition,
$\phia_{j,s}=2^{-j/2}\phi_{j,s}$. Further, $\|\phi_{j,s}\|_2=1$ and
$\phi_{j,s}$ has at most $(2q)2^j$ non-zero values. 
Hence, by the Cauchy--Schwarz inequality, $\|\phia_{j,s}\|_1 \leq 2^{-j/2}\sqrt{(2q)2^j} = \sqrt{2q}$.  The lemma follows.
120: \end{proof}
121: %
122: At this point we have all the pieces. Summarizing:
123: \begin{lemma}\label{lemma:summary}
124: Let $\{z_i\}$ be a solution with $B$ non-zero coefficients and with
125: representation $\hat{f}=\sum_i z_i \psi_i$.  
126: If $\|f-\hat{f}\|_p \leq \E$, then there is a solution $\{y_i\}$ with
127: $B$ non-zero coefficients and representation $f'=\sum_i y_i \psib_i$
128: such that for all $i$ we have,
129: \begin{enumerate}
130: \item[(i)] $y_i$ is a multiple of $\rho$;
131: \item[(ii)] $|y_i - \langle f,\psia_{i} \rangle | \leq C_0\sqrt{q}\E + \rho$; and,
132: \item[(iii)] $| \langle f,\phia_i \rangle - \langle f',\phia_i\rangle| \leq C_1\sqrt{q}\E +O(q\rho \log n)$,
133: \end{enumerate}
134: and $\|f -f'\|_p \leq \E + O(qn^{1/p}\rho \log n)$.
135: \end{lemma}
136: \begin{proof}
137: Rewrite $\hat{f}=\sum_i z_i \psi_i = \sum_i z_i^*\psib_i$ where 
138: $z_i^* = z_{j,s}^* = 2^{-j/2} z_{j,s}$. Let $\{y_i\}$ be the
139: solution where each $y_i$ equals $z^*_i$ rounded to the nearest multiple of
140: $\rho$. Lemmas~\ref{psilemma} and~\ref{philemma} bound the $z_i^*$'s thus
141: providing properties (ii) and (iii). Finally, Lemma~\ref{changebase1}
142: gives the approximation guarantee of $\{y_i\}$.
143: \end{proof}
144: 
145: The above lemma ensures the existence of a solution $\{y_i\}$ that is
146: $O(qn^{1/p}\rho \log n)$ away from the optimal solution and that
147: possesses some useful properties which we shall exploit for designing
148: our algorithms.  Each coefficient $y_i$ in this solution is a multiple
149: of a parameter $\rho$ that we are free to choose, and it is a constant
150: multiple of $\E$ away from the $i^\text{th}$ wavelet coefficient of
151: $f$.  Further, without knowing the values of those coefficients
152: $y_{j,s}$ contributing to the reconstruction of a certain point
153: $f'(i)$, we are guaranteed that during the incremental reconstruction
154: of $f'(i)$ using the cascade algorithm, every $a_j[s](f')$ in the
155: support of $f'(i)$ is a constant multiple of $\E$ away from $a_j[s](f)
156: = \langle f, \phia_{j,s}\rangle$.  This last property allows us to
157: design our algorithms in a bottom-up fashion making them suitable for
158: data streams.  Finally, since we may choose $\rho$, setting it
159: appropriately results in true factor approximation algorithms. Details
160: of our algorithms follow.
161: 
162: \subsection{The Algorithm: A Simple Version}\label{sec:HaarAlgo}
163: We will assume here that we know the optimal error $\E$.  This
164: assumption can be circumvented by running $O(\log n)$ instances of the
165: algorithm presented below `in parallel', each with a different guess
166: of the error.  This will increase the time and space requirements of
the algorithm by an $O(\log n)$ factor, which is accounted for in
168: Theorem~\ref{mainthm} (and also in Theorem~\ref{mainthm2}). We detail
169: the guessing procedure in Section~\ref{sec:guesses}.  Our algorithm
170: will be given $\E$ and the desired approximation parameter $\epsilon$
171: as inputs (see Fig.~\ref{fig:apx}). 
172: \medskip
173: 
The Haar wavelet basis vectors naturally form a complete binary tree, termed
175: the \emph{coefficient tree}, since their support sets are nested and
176: are of size powers of $2$ (with one additional node as a parent of the
177: tree). The data elements correspond to the leaves, and the
178: coefficients correspond to the non-leaf nodes of the tree. Assigning a
179: value $y$ to the coefficient corresponds to assigning $+y$ to all the
180: leaves that are {\em left descendants} (descendants of the left child)
181: and $-y$ to all right descendants (recall the definition of
182: $\{\psib_i\}$).  The leaves that are descendants of a node in the
183: coefficient tree are termed the {\em support} of the coefficient.
184: 
185: \begin{Definition}
186: Let $E[i,v,b]$ be the minimum possible contribution to the overall
187: error from all descendants of node $i$ using exactly $b$ coefficients,
188: under the assumption that ancestor coefficients of $i$ will add up to
189: the value $v$ at $i$ (taking account of the signs) in the final
190: solution.
191: \end{Definition}
192: 
193: The value $v$ will be set later for a subtree as more data
194: arrive. Note that the definition is bottom up and after we compute the
195: table, we do not need to remember the data items in the subtree. As
196: the reader would have guessed, this second property will be
197: significant for streaming.
198: 
199: The overall answer is $\min_b E[root,0,b]$---by the time we are at the
200: root, we have looked at all the data and no ancestors exist to set a
201: non-zero $v$. A natural dynamic program arises whose idea is as
202: follows: Let $i_L$ and $i_R$ be node $i$'s left and right children
203: respectively.  In order to compute $E[i,v,b]$, we guess the
204: coefficient of node $i$ and minimize over the error produced by $i_L$
205: and $i_R$ that results from our choice.  Specifically, the computation
206: is:
207: 
208: \begin{enumerate}
209: \item A non-root node computes $E[i,v,b]$ as follows:
210: \vspace{-0.05in}
211: \[ \min \left \{ \begin{array}{l}
212: \min_{r,b'} E[i_L,v+r,b'] + E[i_R,v-r,b-b'-1] \\
213: \min_{b'} E[i_L,v,b'] + E[i_R,v,b-b']
214: \end{array} \right.
215: \]
216: where the upper term computes the error if the $i^{th}$ coefficient is
chosen and its value is $r\in R_i$ where $R_i$ is the set of
218: multiples of $\rho$ between $\langle f, \psia_i\rangle -
219: C_0\sqrt{q}\E$ and $\langle f, \psia_i\rangle + C_0\sqrt{q}\E$; and
220: the lower term computes the error if the $i^{th}$ coefficient is not
221: chosen.
222: 
223: \item  Then the root node computes: 
224: \[ \min \left \{
225: \begin{array}{ll}
226: \min_{r,b'} E[i_C,r,b'-1] & \mbox{root coefficient is $r$}\\
227: \min_{b'} E[i_C,0,b'] & \mbox{root not chosen}
228: \end{array} \right.
229: \]
230: where $i_C$ is the root's only child.
231: \end{enumerate}
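
\noindent The recurrence above can be sketched in Python as follows.  This is an offline, recursive illustration for $\ell_\infty$ (so the errors of the two children are combined with $\max$ rather than a sum), not the streaming, table-based algorithm.  For brevity it uses ``at most $b$'' rather than ``exactly $b$'' coefficients.  The names \texttt{candidates}, \texttt{subtree\_error} and \texttt{best\_error}, and the parameter \texttt{c0} standing in for $C_0$, are assumptions made for the example.

\begin{verbatim}
```python
import math


def candidates(center, half_width, rho):
    """Multiples of rho in [center - half_width, center + half_width]."""
    lo = math.ceil((center - half_width) / rho)
    hi = math.floor((center + half_width) / rho)
    return [k * rho for k in range(lo, hi + 1)]


def subtree_error(data, v, b, err, rho, c0=1.0):
    """Min l_inf error on `data` using at most b detail coefficients,
    given that the ancestors contribute the value v to every leaf."""
    if len(data) == 1:
        return abs(v - data[0])
    half = len(data) // 2
    left, right = data[:half], data[half:]
    # Case 1: the coefficient at this node is not chosen.
    best = min(
        max(subtree_error(left, v, bl, err, rho, c0),
            subtree_error(right, v, b - bl, err, rho, c0))
        for bl in range(b + 1)
    )
    # Case 2: the coefficient is chosen with value r; +r goes left, -r right.
    if b >= 1:
        d = (sum(left) - sum(right)) / len(data)   # <f, psi^a_i>
        for r in candidates(d, c0 * err, rho):
            for bl in range(b):
                best = min(best, max(
                    subtree_error(left, v + r, bl, err, rho, c0),
                    subtree_error(right, v - r, b - 1 - bl, err, rho, c0)))
    return best


def best_error(data, B, err, rho, c0=1.0):
    """Root step: either skip the root coefficient or guess its value r."""
    best = subtree_error(data, 0.0, B, err, rho, c0)
    if B >= 1:
        avg = sum(data) / len(data)
        for r in candidates(avg, c0 * err, rho):
            best = min(best, subtree_error(data, r, B - 1, err, rho, c0))
    return best
```
\end{verbatim}

\noindent For example, with data $\langle 1,1,5,5\rangle$, error guess $1$ and $\rho = 0.5$, the sketch reports error $5$ for $B=0$, error $2$ for $B=1$ (the root coefficient set to $3$), and error $0$ for $B=2$.  The recursion takes exponential time in general; the tables $E[i,\cdot,\cdot]$ exist precisely to avoid recomputing subproblems.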
232: 
233: The streaming algorithm will
234: borrow from the paradigm of reduce-merge. The high level idea
235: will be to construct and maintain a small table of possibilities
236: for each resolution of the data. On seeing each item $f(i)$, we
237: will first find out the best choices of the wavelets of length one
238: (over all future inputs) and then, if appropriate,
239: construct/update a table for wavelets of length $2,4,\ldots$ etc.
240: 
The idea of subdividing the data, computing some information and
merging results from adjacent divisions was used in \cite{GMMO00}
for stream clustering. The stream computation of wavelets in
\cite{GKMS01} can be viewed as a similar idea, where the
divisions correspond to the support of the wavelet basis vectors.
246: 
247: 
248: Our streaming algorithm will compute the error arrays
249: $E[i,\cdot,\cdot]$ associated with the internal nodes of the coefficient
250: tree in a post-order fashion. Recall that the wavelet basis
251: vectors, which are described in Section~\ref{sec:prelim}, form a
252: complete binary tree. For example, the scaled basis vectors for nodes $4,
253: 3, 1$ and $2$ in the tree of Fig.~\ref{fig:salg123} are
254: $[1,1,1,1]$, $[1,1,-1,-1]$, $[1,-1,0,0]$ and $[0,0,1,-1]$
255: respectively. The data elements correspond to the leaves of the
256: tree and the coefficients of the synopsis correspond to its
257: internal nodes. 
258: \eat{
259: Hence, assigning the value $c$ to node $2$ (equivalently, setting
260: $z_2=c$) for example corresponds to adding $c$ to $\wai(Z)_1$ and
261: $\wai(Z)_2$, and adding $-c$ to $\wai(Z)_3$ and $\wai(Z)_4$.
262: }
263: 
264: We need not store the error array for every internal node since, in
265: order to compute $E[i,v,b]$ our algorithm only requires that
266: $E[i_L,\cdot,\cdot ]$ and $E[i_R,\cdot,\cdot ]$ be known.  Therefore,
267: it is natural to perform the computation of the error arrays in a
268: post-order fashion. An example best illustrates the procedure. Suppose
269: $f = \langle x_1,x_2,x_3,x_4\rangle$. In Fig.~\ref{fig:salg123} when
270: element $x_1$ arrives, the algorithm computes the error array
271: associated with $x_1$, call it $E_{x_1}$.  When element $x_2$ arrives
272: $E_{x_2}$ is computed.  The array $E[1,\cdot,\cdot ]$ is then computed
273: and $E_{x_1}$ and $E_{x_2}$ are discarded. Array $E_{x_3}$ is computed
274: when $x_3$ arrives.  Finally the arrival of $x_4$ triggers the
275: computations of the rest of the arrays as in Fig.~\ref{fig:salg456}.
276: %
277: \begin{figure}
278: \centering
279: \subfigure[The arrival of the first $3$ elements.]{\label{fig:salg123}
280: \begin{minipage}[t]{1.2in}
281: \centering \includegraphics[width=1in]{salg1}
282: \end{minipage}
283: \begin{minipage}[t]{1.2in}
284: \centering \includegraphics[width=1in]{salg2}
285: \end{minipage}
286: }  \subfigure[The arrival of $x_4$]{\label{fig:salg456}
287: \begin{minipage}[t]{1.2in}
288: \centering \includegraphics[width=1in]{salg4}
289: \end{minipage}
290: \begin{minipage}[t]{1.2in}
291: \centering \includegraphics[width=1in]{salg5}
292: \end{minipage}}
293: \caption{Upon seeing $x_2$ node $1$ computes
294: $\mbox{$E[1,\cdot,\cdot]$}$ and the two error arrays associated with
295: $x_1$ and $x_2$ are discarded.  Element $x_4$ triggers the computation
296: of $\mbox{$E[2, \cdot, \cdot ]$}$ and the two error arrays associated
297: with $x_3$ and $x_4$ are discarded. Subsequently, $\mbox{$E[3,\cdot,
298: \cdot ]$}$ is computed from $\mbox{$E[1,\cdot,\cdot]$}$ and
299: $\mbox{$E[2,\cdot,\cdot ]$}$ and both the latter arrays are
discarded. If $x_4$ is the last element on the stream, the root's
error array, $\mbox{$E[4,\cdot,\cdot ]$}$, is computed from
$\mbox{$E[3,\cdot,\cdot]$}$.}
303: \end{figure}
304: %
305: Note that at any point in time, there is only one error array stored
306: at each \emph{level} of the tree.  In fact, the computation of the
307: error arrays resembles a binary counter.  We start with an empty queue
308: $Q$ of error arrays. When $x_1$ arrives, $E_{q_0}$ is added to $Q$ and
309: the error associated with $x_1$ is stored in it.  When $x_2$ arrives,
310: a temporary node is created to store the error array associated with
311: $x_2$.  It is immediately used to compute an error array that is added
312: to $Q$ as $E_{q_1}$. Node $E_{q_0}$ is emptied, and it is filled again
313: upon the arrival of $x_3$. When $x_4$ arrives: (1) a temporary
314: $E_{t_1}$ is created to store the error associated with $x_4$; (2)
315: $E_{t_1}$ and $E_{q_0}$ are used to create $E_{t_2}$; $E_{t_1}$ is
316: discarded and $E_{q_0}$ is emptied; (3) $E_{t_2}$ and $E_{q_1}$ are
317: used to create $E_{q_2}$ which in turn is added to the queue;
318: $E_{t_2}$ is discarded and $E_{q_1}$ is emptied.  
319: The algorithm  for $\ell_\infty$ is shown in Fig.~\ref{fig:apx}.
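
\noindent The binary-counter behaviour described above can be sketched in Python as follows.  The sketch is generic: \texttt{merge} is a placeholder for the step that builds a parent's error array from those of its two children, and the function name \texttt{stream\_merge} is hypothetical.  Level $i$ holds the summary of a completed block of $2^i$ items, so at most one summary is stored per level.

\begin{verbatim}
```python
def stream_merge(stream, merge):
    """Process items one at a time, keeping at most one summary per level.

    levels[i] holds the summary of a completed block of 2**i items, or
    None if that level is currently empty.  After each item the current
    state of all levels is yielded (as a copy) for inspection.
    """
    levels = []
    for item in stream:
        carry = item                 # summary of a block of size 1
        i = 0
        # Propagate like a carry in a binary counter.
        while i < len(levels) and levels[i] is not None:
            carry = merge(levels[i], carry)   # stored block is the left child
            levels[i] = None                  # the child summary is discarded
            i += 1
        if i == len(levels):
            levels.append(None)               # grow the queue by one level
        levels[i] = carry
        yield list(levels)
```
\end{verbatim}

\noindent With \texttt{merge = lambda l, r: (l, r)} and the stream \texttt{"abcd"}, the state after the third item is \texttt{['c', ('a', 'b')]}, and after the fourth item only the top level is occupied, mirroring the arrival of $x_3$ and $x_4$ in the example.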
320: 
321: %\begin{figure*}[htb]
322: \clearpage
323: \begin{figure}
324: \framebox[6.7in]{\parbox{6.5in}{
325: \begin{algorithm}{HaarPTAS}[B,\E,\epsilon]{\label{alg:apx}}
326: Let $\rho = \epsilon\E/(c q \log n)$ for some suitably
327: large constant $c$.  Note that $q=1$ in the Haar case.\\
328: Initialize a queue $Q$ with one node $q_0$ \qcomment{Each $q_i$
329: contains an array $E_{q_i}$ of size at most
330: $R\min\{B, 2^i\}$ and a flag {\tt isEmpty}}\\
331: {\bf repeat} Until there are no elements in the stream\\
332: Get the next element from the stream, call it $e$\\
333: \qif $q_0$ is empty \\
334: \qthen Set $q_0.a = e$. For all values $r$ s.t.~$|r -e| \leq c_1 \E$
335:   where $c_1$ is a large enough constant and $r$ is a multiple of
336:   $\rho$, initialize the table $E_{q_0}[r, 0] =
337:   |r-e|$\label{step:baseE} \\
338: \qelse Create $t_1$ and Initialize $E_{t_1}[r, 0] =|r-e|$ \emph{as in
339: Step \ref{step:baseE}}.\\
340: \qfor $i=1$ until the $1^\text{st}$ empty $q_i$ or end of $Q$ \\
341: \qdo Create a temporary node $t_2$.\\
342: Compute $t_2.a = \langle f,\phia_i\rangle$ and the wavelet coefficient
343: $t_2.o=\langle f, \psia_i\rangle$. This involves using the $a$ values
344: of $t_{1}$ and $q_{i-1}$ ($t_2$'s two children in the coefficient
tree) and taking their average to compute $t_2.a$ and their difference
346: divided by $2$ to compute $t_2.o$. (Recall that we are using
347: $\frac{1}{\sqrt{2}}h[]$).\\
348: For all values $r$ that are multiples of $\rho$ with $|r -t_2.a| \leq
349:   c_1(\E + \rho\log n)$, compute the table $E_{t_2}[r, b]$ for all $0\leq b \leq
350:   B$. This uses the tables of the two children $t_{1}$ and
351:   $q_{i-1}$. The size of the table is $O(\epsilon^{-1}Bn^{1/p}\log
352:   n)$. (Note that the value of a chosen coefficient at node $t_2$ is at
353:   most a constant multiple of $\E$ away from $t_2.o$. Keeping track of
354:   the chosen coefficients (the answer) costs $O(B)$ factor space
355:   more.)\label{step:generalE}\\
356: Set $t_1 \leftarrow t_2$ and Discard $t_2$\\
Set $q_i.\mathtt{isEmpty} = \mbox{true}$
358: \qrof \\
359: \qif we reached the end of $Q$ \\
360: \qthen Create the node $q_i$ \qfi \\
361: Compute $E_{q_i}[r, b\in B]$ from $t_{1}$ and $q_{i-1}$ \emph{as in
362: Step \ref{step:generalE}}.\\
363: Set $q_i.\mathtt{isEmpty} = \mbox{false}$ and Discard $t_{1}$ \qfi
364: \end{algorithm}
365: }}
366: \caption{The Haar streaming FPTAS for $\ell_\infty$.}
367: \label{fig:apx}
368: \end{figure}
369: \clearpage
370: 
371: %If at any point of time the number of coefficients larger than $\E$
372: %exceeds $B$ then we know our guess of $\E$ is wrong and we abort that
373: %thread.
374:   \subsubsection{Guessing the Optimal Error}\label{sec:guesses}
375: We have so far assumed that we know the optimal error $\E$. As
376: mentioned at the beginning of Section~\ref{sec:HaarAlgo}, we will
377: avoid this assumption by running multiple instances of our algorithm
378: and supplying each instance a different guess $G_k$ of the error.  We
379: will also provide every instance $A_k$ of the algorithm with
380: $\epsilon' = \frac{\sqrt{1+4\epsilon}-1}{2}$ as the approximation
381: parameter.  The reason for this will be apparent shortly.  Our final
382: answer will be that of the instance with the minimum representation
383: error.
384: 
385: Theorem~\ref{mainthm} shows that the running time and space
386: requirements of our algorithm do not depend on the supplied error
387: parameter.  However, the algorithm's search ranges {\it do} depend on
388: the given error. Hence, as long as $G_k\ge\E$ the ranges searched by
389: the $k^\text{th}$ instance will include the ranges specified by
390: Lemma~\ref{lemma:summary}.  Lemma~\ref{lemma:summary} also tells us
391: that if we search these ranges in multiples of $\rho$, then we will
392: find a solution whose approximation guarantee is $\E+ c q
393: n^{1/p}\rho\log n$.  Our algorithm chooses $\rho$ so that its running
394: time does not depend on the supplied error parameter.  Hence, given
395: $G_k$ and $\epsilon'$, algorithm $A_k$ sets $\rho = \epsilon'G_k/(c q
396: n^{1/p}\log n)$.  Consequently, its approximation guarantee is $\E +
397: \epsilon' G_k$.
398: 
399: Now if guess $G_k$ is much larger than the optimal error $\E$, then
400: instance $A_k$ will not provide a good approximation of the optimal
401: representation.  However, if $G_k \le (1+\epsilon')\E$, then $A_k$'s
402: guarantee will be $\E+ \epsilon'(1+\epsilon')\E = (1+\epsilon)\E$
403: because of our choice of $\epsilon'$.  To summarize, in order to
404: obtain the desired $(1+\epsilon)$ approximation, we simply need to
405: ensure that one of our guesses (call it $G_{k^*}$) satisfies
406: \begin{equation*}\label{eq:guess}
407: \E \le\ G_{k^*} \le\ (1+\epsilon')\E
408: \end{equation*}
409: Setting $G_k = (1+\epsilon')^k$, the above bounds will be satisfied
410: when 
411: $k = k^* \in [\log_{1+\epsilon'}(\E),\ \log_{1+\epsilon'}(\E) +1]$.  
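
\noindent The choice of $\epsilon'$ and of the geometric grid of guesses can be sketched in Python as follows.  The function names and the bounds \texttt{lo} and \texttt{hi} on the non-zero optimal error are assumptions made for the example.

\begin{verbatim}
```python
import math


def inner_epsilon(eps):
    """Positive root eps' of eps' * (1 + eps') = eps."""
    return (math.sqrt(1 + 4 * eps) - 1) / 2


def guesses(eps, lo, hi):
    """Geometric grid (1 + eps')**k covering the interval [lo, hi].

    lo and hi are assumed lower and upper bounds on the optimal error.
    """
    e = inner_epsilon(eps)
    k_lo = math.floor(math.log(lo, 1 + e))
    k_hi = math.ceil(math.log(hi, 1 + e))
    return [(1 + e) ** k for k in range(k_lo, k_hi + 1)]
```
\end{verbatim}

\noindent Because consecutive guesses differ by a factor of $1+\epsilon'$, the smallest guess that is at least $\E$ is also at most $(1+\epsilon')\E$, which is the condition required above.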
412: 
413: \paragraph*{Number of guesses}
414: Note that the optimal error $\E = 0$ if and only if $f$ has at 
415: most $B$ non-zero expansion coefficients $\langle f, \psi_i\rangle$. 
416: We can find these coefficients easily in a streaming fashion.
417: 
418: Since we assume that the entries in the given $f$ are polynomially
419: bounded, by the system of equations~\eqref{sys} we know that the
420: optimum error is at least as much as the $(B+1)^{\text{st}}$ largest
421: coefficient. Now any coefficient ($\langle f, \psia_k\rangle$) is the
422: sum of the left half minus the sum of the right half of the $f_i$'s
423: that are in the support of the basis and the total is divided by the
424: length of the support. Thus if the smallest non-zero number in the
425: input is $n^{-c}$ then the smallest non-zero wavelet coefficient is at
least $n^{-(c+1)}$. By the same logic the largest non-zero coefficient
is at most $n^c$.  Hence, it suffices to make $O(\log n)$ guesses.
428: 
429: 
430: \medskip
431: \subsection{Analysis of the Simple Algorithm}
432: \label{sec:algspacetime}
433: The size of the error table at node $i$, $E[i,\cdot,\cdot]$, is
434: $R_\phi \min\{B, 2^{t_i}\}$ where $R_\phi = 2C_1\E/\rho+\log n$ and $t_i$ is
435: the height of node $i$ in the Haar coefficient tree (the leaves have
436: height $0$). Note that $q=1$ in the Haar case.  Computing each entry
437: of $E[i,\cdot,\cdot]$ takes $O(R_\psi\min\{B, 2^{t_i}\})$ time where
438: $R_\psi = 2C_0\E/\rho+2$. Hence, letting $R = \max\{R_\phi, R_\psi\}$,
439: the total running time is $O(R^2B^2)$ for computing the root table
440: plus $O(\sum_{i=1}^n \left(R\min \{ 2^{t_i},B\}\right)^2)$ for
441: computing all the other error tables. Now,
442: \begin{eqnarray*}
443: \sum_{i=1}^n \left(R \min \{ 2^{t_i},B \}\right)^2
444: & = & R^2 \sum_{t=1}^{\log n} \frac{n}{2^t} \min \{ 2^{2t},B^2\} \\
445: & = & nR^2\left(\sum_{t=1}^{\log B}2^t + \sum_{t=\log B +1}^{\log n} \frac{B^2}{2^t}\right) \\
446: %&=& n|R|^2\left((2B-2) + \sum_{u=1}^{\log (n/B)}\frac{B}{2^{u}}\right)\\
447: & = & O(R^2nB) \enspace ,
448: \end{eqnarray*}
449: where the first equality follows from the fact that the number of
450: nodes at level $t$ is $\frac{n}{2^t}$. For $\ell_\infty$, when
451: computing $E[i,v,b]$ we do not need to range over all values of
$b'$. For a specific $r\in R_i$, we can find the value of $b'$ that
453: minimizes $\max\{E[i_L,v+r,b'], E[i_R,v-r,b-b'-1]\}$ using binary
454: search. The running time thus becomes,
455: \[
456: \sum_{t} R^2 \frac{n}{2^t} \min \{t2^{t},B \log B \} = O(nR^2\log^2 B) \enspace .
457: \]
458: The bottom up dynamic programming will require us to store the error tables 
459: along at most two leaf to root paths. Thus the required space is,
460: \[ 2 \sum_{t} R \min \{2^{t},B \} = O(RB(1+\log \frac{n}{B})) \enspace .\]
461: %
462: Since we set $\rho=\epsilon\E/(c n^{1/p}\log n)$, we have 
463: $\mbox{$R = O((n^{1/p}\log n)/\epsilon)$}$.
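
\noindent For completeness, the geometric sums used in this analysis can be evaluated explicitly.  As a sketch, assume $B$ and $n$ are powers of two with $B \le n$.  For the running time,
\[ \sum_{t=1}^{\log B}2^t = 2B-2 \qquad\mbox{and}\qquad
\sum_{t=\log B +1}^{\log n} \frac{B^2}{2^t} = \sum_{u=1}^{\log (n/B)}\frac{B}{2^{u}} \le B \enspace , \]
so the total is at most $nR^2(3B-2) = O(R^2nB)$.  For the space, counting the leaves as level $t=0$,
\[ 2\sum_{t=0}^{\log n} R \min \{2^{t},B \} = 2R\left(\sum_{t=0}^{\log B} 2^t + B\log \frac{n}{B}\right)
\le 2RB\left(2 + \log \frac{n}{B}\right) \enspace . \]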
464: \medskip
465: 
466: \begin{theorem}
467: \label{mainthm}
Algorithm~\ref{alg:apx} is an $O(\epsilon^{-1}B^2n^{1/p}\log^3 n)$ space
469: algorithm that computes a $(1+\epsilon)$ approximation to the best
470: $B$-term unrestricted representation of a signal in the Haar
471: system. Under the $\ell_p$ norm, the algorithm runs in time
472: $O(\epsilon^{-2}n^{1+2/p}B\log^3 n)$.  Under $\ell_\infty$ the running
473: time becomes $O(\epsilon^{-2}n\log^2 B\log^3 n)$.
474: \end{theorem}
475: \medskip
476: 
477: The extra $B$ factor in the space required by the algorithm accounts
478: for keeping track of the chosen coefficients.
479: \smallskip
480: 
481: \subsection{An Improved Algorithm and Analysis}
482: For large $n$ (compared to $B$), we gain in running time if we change the
483: rounding scheme given by Lemma~\ref{changebase1}.  The granularity at
484: which we search for the value of a coefficient will be fine if the
485: coefficient lies toward the top of the tree, and it will be coarse if
486: the coefficient lies toward the bottom. The idea is that, for small
487: $\ell_p$ norms, a mistake in a coefficient high in the tree affects
488: everyone, whereas mistakes at the bottom are more localized.  This
489: idea utilizes the strong locality property of the Haar basis.  We
490: start with the lemma analogous to Lemma~\ref{changebase1}.
491: 
492: \begin{lemma}
493: \label{changebase3}
494: Let $\{ y^*_i\}$, $i = (t_i,s)$ be the optimal solution using the
495: basis set $\{\psib_i\}$ for the reconstruction, i.e., $\hat{f} =
496: \sum_i y^*_i\psib_i$ and $\| f - \hat{f}\|_p = \E$. Here $t_i$ is the
497: height of node $i$ in the Haar coefficient tree.  Let $\{y^\rho_i\}$
498: be the set where each $y^*_i$ is first rounded to the nearest multiple
499: of $\rho_{t_i} = \epsilon\E / (2B 2^{t_i/p})$ then the resulting value
500: is rounded to the nearest multiple of $\rho_{t_\text{root}} =
501: \epsilon\E/(2Bn^{1/p})$. If $f^\rho = \sum_i y^\rho_i\psib_i$ then
502: $\|f - f^\rho\|_p \leq (1+\epsilon)\E$.
503: \end{lemma}
504: \begin{proof}
505: As in Lemma~\ref{changebase1}, we need to estimate
506: $\norm{\sum\nolimits_i (y_i^\rho-y_i^*)\psib_i}_p$ but using the new
507: rounding scheme.  Let $\mathcal{S}$ be the set of indices $i$ such
that $y^*_i \ne 0$; coefficients with $y^*_i = 0$ remain zero after rounding.
509: \begin{IEEEeqnarray*}{rCl}
510: \norm{\sum\nolimits_{i\in\mathcal{S}} (y_i^\rho-y_i^*)\psib_i}_p 
511: & \ \le\  & \sum\nolimits_{i\in\mathcal{S}}\norm{(y_i^\rho-y_i^*)\psib_i}_p \\
512: &\ \le \ & \sum\nolimits_{i\in\mathcal{S}}(\rho_{t_i} + \rho_{t_\text{root}})\norm{\psib_i}_p \\
513: &\ \le \ & 2\sum\nolimits_{i\in\mathcal{S}}\rho_{t_i} 2^{t_i/p} \enspace .
514: \end{IEEEeqnarray*}
515: The last inequality follows from the fact that $2^{t_i}$ components of
$\psib_i$ have absolute value one and the rest are zero, together with $\rho_{t_\text{root}} \le \rho_{t_i}$. The approximation
517: hence follows from $|\mathcal{S}| \le B$ and our choices of
518: $\rho_{t_i}$.
519: \end{proof}
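
\noindent The final step of the proof can be spelled out as follows (a worked substitution of the stated choice $\rho_{t_i} = \epsilon\E / (2B 2^{t_i/p})$):
\[ 2\sum\nolimits_{i\in\mathcal{S}}\rho_{t_i} 2^{t_i/p}
\ =\ 2\sum\nolimits_{i\in\mathcal{S}} \frac{\epsilon\E}{2B}
\ =\ \frac{|\mathcal{S}|}{B}\,\epsilon\E \ \le\ \epsilon\E \enspace , \]
so that $\|f - f^\rho\|_p \le \E + \epsilon\E = (1+\epsilon)\E$.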
520: 
521: The granularity of the dynamic programming tables $E[i,\cdot,\cdot]$
522: is set according to the smallest $\rho_{t_i}$ which is
523: $\rho_{t_\text{root}} = \epsilon\E/(2Bn^{1/p})$. This allows their
524: values to align correctly.  More specifically, when a coefficient is
525: not chosen we compute (see Section~\ref{sec:HaarAlgo})
526: \[ E[i,v,b] = \min_{b'} E[i_L, v, b'] + E[i_R, v, b-b']\enspace . \]
A value $v$ that is not outside the range of $E[i_L,\cdot,\cdot]$
528: and $E[i_R,\cdot,\cdot]$ will be a correct index into these two
529: arrays.  We gain from this rounding scheme, however, when we are
530: searching for a value to assign to node $i$.  If $i$ is chosen, we can
531: search for its value in the range 
$\langle f, \psia_i\rangle \pm C_0\E$ in multiples of $\rho_{t_i}$.  
533: Hence, as mentioned earlier, the granularity of our search will be
534: fine for nodes at top levels and coarse for nodes at lower levels.
535: More formally, if $i$ is chosen, we compute
536: \[ E[i,v,b] = \min_{r,b'} E[i_L, v+r, b'] + E[i_R, v-r, b-b'-1]\enspace ,\]
537: where we search for the best $r$ in multiples of $\rho_{t_i}$.
538: The value $v+r$ (resp.~$v-r$) may not index correctly into
539: $E[i_L,\cdot,\cdot]$ (resp.~$E[i_R,\cdot,\cdot]$) since 
540: $\rho_{t_{i}} = 2^{d/p}\rho_{t_{\text{root}}}$ where 
$d = t_{\text{root}} - t_i$. Hence, we need to round each value of $r$ we
542: wish to check to the nearest multiple of
543: $\rho_{t_{\text{root}}}$. This extra rounding is accounted for in
544: Lemma~\ref{changebase3}.
545: 
546: Letting $R$ be the number of values each table holds and 
547: $R_{t_i} = 2C_0\E/\rho_{t_i} + 2$ be the number of entries we search
548: at node $i$, and using an analysis similar to that of
549: Section~\ref{sec:algspacetime}, the running time (ignoring constant
550: factors) becomes,
551: \begin{align*}
O(\sum_{i=1}^n RR_{t_i}\min\{2^{2t_i}, B^2\})
553: &\ =\ O(R \sum_{t=1}^{\log n} \frac{n}{2^t}\frac{B2^{t/p}}{\epsilon} \min\{2^{2t}, B^2\}) \\
554: &\ =\ O(\frac{nRB}{\epsilon}\left(\sum_{t=1}^{\log B} 2^{t/p+t} + B^2\sum_{t=\log B +1}^{\log n}2^{t/p-t}\right)) \\
555: &\ =\ O(\frac{nRB}{\epsilon}B^{1+1/p})
556: \end{align*}
557: Hence, since $R = O(n^{1/p}B/\epsilon)$ based on the granularity
558: $\rho_{t_\text{root}}$, the running time for each instance of the
559: algorithm is $O((nB)^{1+1/p}B^2/\epsilon^2)$.  The space requirement
560: is the same as that of the simpler algorithm; namely, $O(RB\log n)$.
561: \smallskip
562: 
563: \begin{theorem}\label{mainthm2}
The above algorithm (with the new rounding scheme) is
an $O(\epsilon^{-1}B^3n^{1/p}\log^2 n)$ space algorithm that computes a
566: $(1+\epsilon)$ approximation to the best $B$-term unrestricted
567: representation of a signal in the Haar system under the $\ell_p$ norm.
568: The algorithm runs in time $O(\epsilon^{-2}(nB)^{1+1/p}B^2\log n)$.
569: \end{theorem}
570: \medskip
571: 
572: Again, and as in Theorem~\ref{mainthm}, the extra $B$ factor in the
573: space requirement accounts for keeping track of the chosen
574: coefficients, and the extra $\log n$ factor in both the space and time
575: requirements accounts for the guessing of the error. 
576: 
577: We choose the better of the two algorithms (or rounding schemes) whose
578: approximation and time and space requirements are guaranteed by
579: Theorems~\ref{mainthm} and~\ref{mainthm2}.