0604:cs0604097/ptas.tex

1:

2:

3: \section{A Streaming $(1+\epsilon)$ Approximation for Haar Wavelets}

4: \label{apxschemes}

5: In this section we will provide an FPTAS for the Haar system.  The

6: algorithm will be bottom up, which is convenient from a streaming

7: point of view.  Observe that in the case of a general $\ell_p$ norm error,

8: we cannot rule out that the optimum solution has an irrational

9: value, which is detrimental from a computational point of view.  In a

10: sense we will seek to narrow down our search space, but we will need

11: to preserve near optimality.  We will show that {\em there exist}

12: sets $R_i$ such that if each solution coefficient $z_i$ is drawn from

13: $R_i$, then {\em there exists} one solution which is close to the

14: optimum unrestricted solution (where we search over all reals).  In a

15: sense the sets $R_i$ ``rescue'' us from the search. Alternatively, we can

16: view those sets as a ``rounding'' of the optimal solution.  Obviously

17: such sets exist if we did not care about the error, e.g. take the all

18: zero solution. We would expect a dependence between the sets $R_i$ and

19: the error bound we seek.  We will use a type of ``dual'' wavelet

20: bases; i.e., where we use one basis to construct the coefficients and

21: another to reconstruct the function. Our bases will differ by scaling

22: factors.  We will solve the problem in the scaled bases and translate

23: the solution to the original basis.  This overall approach is similar

24: to that in \cite{GH05}; however, it differs in several details

25: critical to the proofs of running time, space complexity and

26: approximation guarantee.

27:

28: \begin{Definition}\label{def:psi-ab}

29: Define $\psia_{j,s}=2^{-j/2}\psi_{j,s}$ and

30: $\psib_{j,s}=2^{j/2}\psi_{j,s}$.

31: Likewise define $\phia_{j,s} = 2^{-j/2}\phi_{j,s}$.

32: \end{Definition}

33:

34: \begin{proposition}

35: The Cascade algorithm used with $\frac1{\sqrt{2}}h[]$ computes

36: $\langle f, \psia_i \rangle$ and $\langle f,\phia_i\rangle$.

37: \end{proposition}

38:

39: \noindent We now use the change of basis. The next proposition is

40: clear from the definition of $\{\psi^b_i\}$.

41:

42: \begin{proposition}

43: The problem of finding a representation $\hat{f}$ with $\{z_i\}$ and

44: basis $\{\psi_i\}$ is equivalent to finding the same representation

45: $\hat{f}$ using the coefficients $\{y_i\}$ and the basis $\{\psib_i\}$.

46: The correspondence is $y_i = y_{j,s} = 2^{-j/2}z_{j,s}$.

47: \hide{and there are no more than $B$ non-zero $y_i$'s if and only if

48: there are no more than $B$ non-zero $z_i$.}

49: \end{proposition}
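The two propositions above can be checked numerically in the Haar case. The following is a minimal sketch, assuming $n$ is a power of two and $q=1$; the function names \texttt{scaled\_haar} and \texttt{reconstruct} are hypothetical and not part of the original algorithm. Analysis uses averages and half-differences (the cascade with $\frac1{\sqrt2}h[]$), and synthesis uses the $\pm 1$ valued vectors $\psib_{j,s}$.

```python
def scaled_haar(f):
    """Return (overall average, details), where details[j-1][s] = <f, psi^a_{j,s}>.

    Assumes len(f) is a power of two (Haar case, q = 1).
    """
    a = [float(x) for x in f]
    details = []
    while len(a) > 1:
        half = len(a) // 2
        # Half-difference of the two children gives <f, psi^a>.
        d = [(a[2 * s] - a[2 * s + 1]) / 2 for s in range(half)]
        # Average of the two children gives <f, phi^a>.
        a = [(a[2 * s] + a[2 * s + 1]) / 2 for s in range(half)]
        details.append(d)
    return a[0], details


def reconstruct(avg, details, n):
    """Synthesize with the psi^b vectors: +y on the left half, -y on the right half."""
    out = [avg] * n
    for j, d in enumerate(details, start=1):
        width = 2 ** j
        half = width // 2
        for s, y in enumerate(d):
            for k in range(s * width, s * width + half):
                out[k] += y
            for k in range(s * width + half, (s + 1) * width):
                out[k] -= y
    return out
```

For $f = \langle 2,4,6,0\rangle$ the sketch returns the average $3$ and the details $(-1, 3)$ at scale $1$ and $(0)$ at scale $2$; synthesis with the $\pm1$ vectors recovers $f$ exactly.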

50:

51: \begin{lemma}

52: \label{changebase1}

53: Let $\{ y^*_i\}$ be the optimal solution using the basis set

54: $\{\psib_i\}$ for the reconstruction, i.e., $\hat{f} = \sum_i

55: y^*_i\psib_i$ and $\| f - \hat{f}\|_p = \E$. Let $\{y^\rho_i\}$ be the

56: set where each $y^*_i$ is rounded to the nearest multiple of

57: $\rho$. If $f^\rho = \sum_i y^\rho_i\psib_i$ then $\|f -

58: f^\rho\|_p \leq \E + O(qn^{1/p}\rho\log n)$.

59: \end{lemma}

60: \begin{proof}

61: Let $\rho_i = y^*_i - y^\rho_i$.  By the triangle inequality,

62: \[ \|f - f^\rho\|_p \leq \E + \norm{\sum\nolimits_i \rho_i\psib_i}_p \enspace .\]

63: Proposition~\ref{prop:qlogn-basis} and the fact that $\abs{\rho_i} \le \rho$

64: imply $\abs{\sum_i\rho_i\psib_i(k)} \le c\rho q\log n \max_i\abs{\psib_i(k)}$

65: for a small constant $c$.  This bound gives

66: $\|f - f^\rho\|_p \leq \E + O(qn^{1/p}\rho\log  n \max_i \|\psib_i\|_\infty)$.

67: Now $\psib_i = \psib_{j,s} = 2^{j/2}\psi_{j,s}$, and from the proof of

68: Lemma~\ref{second} we know that for large $j$, $\|\psi_{j,s}\|_\infty$

69: is at most $2^{-j/2}$ times a constant.

70: For smaller $j$, $\|\psib_{j,s}\|_\infty$ is a constant.

71: \end{proof}
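\noindent For intuition, the bound can be traced by hand in the Haar case ($q=1$), assuming $n$ is a power of two. Each $\psib_{j,s}$ takes values in $\{-1,0,+1\}$, and every point $k$ lies in the support of exactly one $\psib_{j,s}$ per scale $j=1,\ldots,\log n$, plus the all-ones vector at the root. Since $\abs{\rho_i}\le\rho$,
\[
\abs{\sum\nolimits_i \rho_i\psib_i(k)} \le (\log n + 1)\rho \quad\text{for every } k,
\qquad\text{hence}\qquad
\norm{\sum\nolimits_i \rho_i\psib_i}_p \le n^{1/p}(\log n+1)\rho \enspace .
\]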

72:

73: We will provide a dynamic programming formulation using the new

74: basis. But we still need to show two results; the first concerning the

75: $y^*_i$'s and the second concerning the $a_j[]$'s. The next lemma is

76: very similar to Lemma~\ref{lb} and follows from the fact that

77: $\|\psia_{j,s}\|_1 = 2^{-j/2}\|\psi_{j,s}\|_1 \le \sqrt{2q}$.

78: \begin{lemma}

79: \label{psilemma}

80: $ - C_0\sqrt{q}\E \leq \langle f, \psia_i \rangle - y^*_i \leq C_0\sqrt{q}\E$

81: for some constant $C_0$.

82: \end{lemma}

83: \hide{ %%% Proof is very similar to Lemma \ref{lb}.

84: \begin{proof}

85: We can follow the proof of Lemma~\ref{lb} and use the fact that if

86: $i=(j,s)$ we have $\langle \psia_{i}, \psi_{k} \rangle =

87: 2^{-j/2}\delta_{ik}$. The only other thing we need to show is that

88: $\|\psia_i\|_1$ is a constant. This

89: follows from the proof of Lemma~\ref{second}, where we show that

90: $\|\psi_i\|_1$ is $O(2^{j/2})$ if $i$ is of scale $j$. Since

91: $\psia_i=2^{-j/2}\psi_i$ the lemma follows.

92: \end{proof}

93: }

94: %

95: Now suppose we know the optimal solution $\hat{f}$, and suppose we are

96: computing the coefficients $a_j[]$ and $d_j[]$ for both $f$ and

97: $\hat{f}$ at each step $j$ of the Cascade algorithm.  We wish to know

98: by how much their coefficients differ since bounding this gap would

99: shed more light on the solution $\hat{f}$.

100:

101: \begin{proposition}

102:   Let $a_j[s](F)$ be $a_j[s]$ computed from $a_0[s]=F(s)$; then

103:   $a_j[s](f)-a_j[s](\hat{f})=a_j[s](f-\hat{f})$.

104: \end{proposition}

105:

106: \begin{lemma}

107: \label{philemma}

108:   If $\|f -\hat{f}\|_p \leq\E$ then $|a_j[s](f-\hat{f})|\leq C_1\sqrt{q}\E$

109:   for some constant $C_1$. (We are using $\frac{1}{\sqrt2} h[]$.)

110: \end{lemma}

111: \begin{proof}

112: The proof is similar to that of Lemma~\ref{lb}.

113: Let $F=f-\hat{f}$. Since $\|F\|_\infty \leq \|F\|_p \leq \E$, we know $-\E \leq F(i) \leq

114: \E$. Multiplying by $|\phia_{j,s}(i)|$ and summing over all $i$ we get

115: $ -\E \|\phia_{j,s}\|_1 \leq \langle F, \phia_{j,s} \rangle =

116: a_j[s](F) \leq \E \|\phia_{j,s}\|_1$.  By definition,

117: $\phia_{j,s}=2^{-j/2}\phi_{j,s}$. Further, $\|\phi_{j,s}\|_2=1$ and

118: $\phi_{j,s}$ has at most $(2q)2^j$ non-zero values.

119: Hence, by the Cauchy--Schwarz inequality, $\|\phia_{j,s}\|_1 \leq \sqrt{2q}$.  The lemma follows.

120: \end{proof}

121: %

122: At this point we have all the pieces. Summarizing:

123: \begin{lemma}\label{lemma:summary}

124: Let $\{z_i\}$ be a solution with $B$ non-zero coefficients and with

125: representation $\hat{f}=\sum_i z_i \psi_i$.

126: If $\|f-\hat{f}\|_p \leq \E$, then there is a solution $\{y_i\}$ with

127: $B$ non-zero coefficients and representation $f'=\sum_i y_i \psib_i$

128: such that for all $i$ we have,

129: \begin{enumerate}

130: \item[(i)] $y_i$ is a multiple of $\rho$;

131: \item[(ii)] $|y_i - \langle f,\psia_{i} \rangle | \leq C_0\sqrt{q}\E + \rho$; and,

132: \item[(iii)] $| \langle f,\phia_i \rangle - \langle f',\phia_i\rangle| \leq C_1\sqrt{q}\E +O(q\rho \log n)$,

133: \end{enumerate}

134: and $\|f -f'\|_p \leq \E + O(qn^{1/p}\rho \log n)$.

135: \end{lemma}

136: \begin{proof}

137: Rewrite $\hat{f}=\sum_i z_i \psi_i = \sum_i z_i^*\psib_i$ where

138: $z_i^* = z_{j,s}^* = 2^{-j/2} z_{j,s}$. Let $\{y_i\}$ be the

139: solution where each $y_i$ equals $z^*_i$ rounded to the nearest multiple of

140: $\rho$. Lemmas~\ref{psilemma} and~\ref{philemma} bound the $z_i^*$'s thus

141: providing properties (ii) and (iii). Finally, Lemma~\ref{changebase1}

142: gives the approximation guarantee of $\{y_i\}$.

143: \end{proof}

144:

145: The above lemma ensures the existence of a solution $\{y_i\}$ that is

146: $O(qn^{1/p}\rho \log n)$ away from the optimal solution and that

147: possesses some useful properties which we shall exploit for designing

148: our algorithms.  Each coefficient $y_i$ in this solution is a multiple

149: of a parameter $\rho$ that we are free to choose, and it is a constant

150: multiple of $\E$ away from the $i^\text{th}$ wavelet coefficient of

151: $f$.  Further, without knowing the values of those coefficients

152: $y_{j,s}$ contributing to the reconstruction of a certain point

153: $f'(i)$, we are guaranteed that during the incremental reconstruction

154: of $f'(i)$ using the cascade algorithm, every $a_j[s](f')$ in the

155: support of $f'(i)$ is a constant multiple of $\E$ away from $a_j[s](f)

156: = \langle f, \phia_{j,s}\rangle$.  This last property allows us to

157: design our algorithms in a bottom-up fashion making them suitable for

158: data streams.  Finally, since we may choose $\rho$, setting it

159: appropriately results in true factor approximation algorithms. Details

160: of our algorithms follow.

161:

162: \subsection{The Algorithm: A Simple Version}\label{sec:HaarAlgo}

163: We will assume here that we know the optimal error $\E$.  This

164: assumption can be circumvented by running $O(\log n)$ instances of the

165: algorithm presented below `in parallel', each with a different guess

166: of the error.  This will increase the time and space requirements of

167: the algorithm by an $O(\log n)$ factor, which is accounted for in

168: Theorem~\ref{mainthm} (and also in Theorem~\ref{mainthm2}). We detail

169: the guessing procedure in Section~\ref{sec:guesses}.  Our algorithm

170: will be given $\E$ and the desired approximation parameter $\epsilon$

171: as inputs (see Fig.~\ref{fig:apx}).

172: \medskip

173:

174: The Haar wavelet basis vectors naturally form a complete binary tree, termed

175: the \emph{coefficient tree}, since their support sets are nested and

176: are of size powers of $2$ (with one additional node as a parent of the

177: tree). The data elements correspond to the leaves, and the

178: coefficients correspond to the non-leaf nodes of the tree. Assigning a

179: value $y$ to the coefficient corresponds to assigning $+y$ to all the

180: leaves that are {\em left descendants} (descendants of the left child)

181: and $-y$ to all right descendants (recall the definition of

182: $\{\psib_i\}$).  The leaves that are descendants of a node in the

183: coefficient tree are termed the {\em support} of the coefficient.

184:

185: \begin{Definition}

186: Let $E[i,v,b]$ be the minimum possible contribution to the overall

187: error from all descendants of node $i$ using exactly $b$ coefficients,

188: under the assumption that ancestor coefficients of $i$ will add up to

189: the value $v$ at $i$ (taking account of the signs) in the final

190: solution.

191: \end{Definition}

192:

193: The value $v$ will be set later for a subtree as more data

194: arrive. Note that the definition is bottom up and after we compute the

195: table, we do not need to remember the data items in the subtree. As

196: the reader would have guessed, this second property will be

197: significant for streaming.

198:

199: The overall answer is $\min_b E[root,0,b]$---by the time we are at the

200: root, we have looked at all the data and no ancestors exist to set a

201: non-zero $v$. A natural dynamic program arises whose idea is as

202: follows: Let $i_L$ and $i_R$ be node $i$'s left and right children

203: respectively.  In order to compute $E[i,v,b]$, we guess the

204: coefficient of node $i$ and minimize over the error produced by $i_L$

205: and $i_R$ that results from our choice.  Specifically, the computation

206: is:

207:

208: \begin{enumerate}

209: \item A non-root node computes $E[i,v,b]$ as follows:

210: \vspace{-0.05in}

211: \[ \min \left \{ \begin{array}{l}

212: \min_{r,b'} E[i_L,v+r,b'] + E[i_R,v-r,b-b'-1] \\

213: \min_{b'} E[i_L,v,b'] + E[i_R,v,b-b']

214: \end{array} \right.

215: \]

216: where the upper term computes the error if the $i^{th}$ coefficient is

217: chosen and its value is $r\in R_i$ where $R_i$ is the set of

218: multiples of $\rho$ between $\langle f, \psia_i\rangle -

219: C_0\sqrt{q}\E$ and $\langle f, \psia_i\rangle + C_0\sqrt{q}\E$; and

220: the lower term computes the error if the $i^{th}$ coefficient is not

221: chosen.

222:

223: \item  Then the root node computes:

224: \[ \min \left \{

225: \begin{array}{ll}

226: \min_{r,b'} E[i_C,r,b'-1] & \mbox{root coefficient is $r$}\\

227: \min_{b'} E[i_C,0,b'] & \mbox{root not chosen}

228: \end{array} \right.

229: \]

230: where $i_C$ is the root's only child.

231: \end{enumerate}
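The recurrence above can be sketched as a small offline program. This is only an illustration under stated assumptions, not the streaming algorithm of Fig.~\ref{fig:apx}: it targets $\ell_\infty$ (so the errors of the two children are combined with $\max$ rather than a sum), it uses ``at most $b$'' instead of ``exactly $b$'' coefficients, and the name \texttt{haar\_dp\_linf} and the parameter \texttt{window} (standing in for $C_0\E$) are hypothetical.

```python
import math
from functools import lru_cache


def haar_dp_linf(f, B, rho, window):
    """Min l_inf error of a B-term Haar representation with coefficients that are
    multiples of rho within `window` of the exact scaled coefficient.

    Assumes len(f) is a power of two.
    """
    n = len(f)

    def avg(lo, hi):
        return sum(f[lo:hi]) / (hi - lo)

    def candidates(center):
        # Integers k such that k * rho lies in [center - window, center + window].
        lo = math.ceil((center - window) / rho)
        hi = math.floor((center + window) / rho)
        return range(lo, hi + 1)

    @lru_cache(maxsize=None)
    def E(lo, hi, k, b):
        # Error on f[lo:hi] with at most b coefficients in this subtree,
        # given that the ancestors contribute the value k * rho.
        if hi - lo == 1:
            return abs(f[lo] - k * rho)
        mid = (lo + hi) // 2
        # Lower term: the coefficient of this node is not chosen.
        best = min(max(E(lo, mid, k, bl), E(mid, hi, k, b - bl))
                   for bl in range(b + 1))
        # Upper term: the coefficient is chosen with value r * rho.
        if b >= 1:
            center = (avg(lo, mid) - avg(mid, hi)) / 2
            for r in candidates(center):
                for bl in range(b):
                    best = min(best, max(E(lo, mid, k + r, bl),
                                         E(mid, hi, k - r, b - 1 - bl)))
        return best

    # Root: either skip the overall average or choose it as r * rho.
    best = E(0, n, 0, B)
    if B >= 1:
        for r in candidates(avg(0, n)):
            best = min(best, E(0, n, r, B - 1))
    return best
```

With $f=\langle 2,4,6,0\rangle$, $\rho=1$ and a window of $1$, the sketch reports errors $6, 3, 1, 0$ for $B=0,1,2,3$ respectively.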

232:

233: The streaming algorithm will

234: borrow from the paradigm of reduce-merge. The high level idea

235: will be to construct and maintain a small table of possibilities

236: for each resolution of the data. On seeing each item $f(i)$, we

237: will first find out the best choices of the wavelets of length one

238: (over all future inputs) and then, if appropriate,

239: construct/update a table for wavelets of length $2,4,\ldots$ etc.

240:

241: The idea of subdividing the data, computing some information and

242: merging results from adjacent divisions was used in \cite{GMMO00}

243: for stream clustering. The stream computation of wavelets in

244: \cite{GKMS01} can be viewed as a similar idea---where the

245: divisions correspond to the supports of the wavelet basis vectors.

246:

247:

248: Our streaming algorithm will compute the error arrays

249: $E[i,\cdot,\cdot]$ associated with the internal nodes of the coefficient

250: tree in a post-order fashion. Recall that the wavelet basis

251: vectors, which are described in Section~\ref{sec:prelim}, form a

252: complete binary tree. For example, the scaled basis vectors for nodes $4,

253: 3, 1$ and $2$ in the tree of Fig.~\ref{fig:salg123} are

254: $[1,1,1,1]$, $[1,1,-1,-1]$, $[1,-1,0,0]$ and $[0,0,1,-1]$

255: respectively. The data elements correspond to the leaves of the

256: tree and the coefficients of the synopsis correspond to its

257: internal nodes.

258: \eat{

259: Hence, assigning the value $c$ to node $2$ (equivalently, setting

260: $z_2=c$) for example corresponds to adding $c$ to $\wai(Z)_1$ and

261: $\wai(Z)_2$, and adding $-c$ to $\wai(Z)_3$ and $\wai(Z)_4$.

262: }

263:

264: We need not store the error array for every internal node since, in

265: order to compute $E[i,v,b]$ our algorithm only requires that

266: $E[i_L,\cdot,\cdot ]$ and $E[i_R,\cdot,\cdot ]$ be known.  Therefore,

267: it is natural to perform the computation of the error arrays in a

268: post-order fashion. An example best illustrates the procedure. Suppose

269: $f = \langle x_1,x_2,x_3,x_4\rangle$. In Fig.~\ref{fig:salg123} when

270: element $x_1$ arrives, the algorithm computes the error array

271: associated with $x_1$, call it $E_{x_1}$.  When element $x_2$ arrives

272: $E_{x_2}$ is computed.  The array $E[1,\cdot,\cdot ]$ is then computed

273: and $E_{x_1}$ and $E_{x_2}$ are discarded. Array $E_{x_3}$ is computed

274: when $x_3$ arrives.  Finally the arrival of $x_4$ triggers the

275: computations of the rest of the arrays as in Fig.~\ref{fig:salg456}.

276: %

277: \begin{figure}

278: \centering

279: \subfigure[The arrival of the first $3$ elements.]{\label{fig:salg123}

280: \begin{minipage}[t]{1.2in}

281: \centering \includegraphics[width=1in]{salg1}

282: \end{minipage}

283: \begin{minipage}[t]{1.2in}

284: \centering \includegraphics[width=1in]{salg2}

285: \end{minipage}

286: }  \subfigure[The arrival of $x_4$]{\label{fig:salg456}

287: \begin{minipage}[t]{1.2in}

288: \centering \includegraphics[width=1in]{salg4}

289: \end{minipage}

290: \begin{minipage}[t]{1.2in}

291: \centering \includegraphics[width=1in]{salg5}

292: \end{minipage}}

293: \caption{Upon seeing $x_2$ node $1$ computes

294: $\mbox{$E[1,\cdot,\cdot]$}$ and the two error arrays associated with

295: $x_1$ and $x_2$ are discarded.  Element $x_4$ triggers the computation

296: of $\mbox{$E[2, \cdot, \cdot ]$}$ and the two error arrays associated

297: with $x_3$ and $x_4$ are discarded. Subsequently, $\mbox{$E[3,\cdot,

298: \cdot ]$}$ is computed from $\mbox{$E[1,\cdot,\cdot]$}$ and

299: $\mbox{$E[2,\cdot,\cdot ]$}$ and both the latter arrays are

300: discarded. If $x_4$ is the last element on the stream, the root's

301: error array, $\mbox{$E[4,\cdot,\cdot ]$}$, is computed from

302: $\mbox{$E[3,\cdot,\cdot]$}$.}

303: \end{figure}

304: %

305: Note that at any point in time, there is at most one error array stored

306: at each \emph{level} of the tree.  In fact, the computation of the

307: error arrays resembles a binary counter.  We start with an empty queue

308: $Q$ of error arrays. When $x_1$ arrives, $E_{q_0}$ is added to $Q$ and

309: the error associated with $x_1$ is stored in it.  When $x_2$ arrives,

310: a temporary node is created to store the error array associated with

311: $x_2$.  It is immediately used to compute an error array that is added

312: to $Q$ as $E_{q_1}$. Node $E_{q_0}$ is emptied, and it is filled again

313: upon the arrival of $x_3$. When $x_4$ arrives: (1) a temporary

314: $E_{t_1}$ is created to store the error associated with $x_4$; (2)

315: $E_{t_1}$ and $E_{q_0}$ are used to create $E_{t_2}$; $E_{t_1}$ is

316: discarded and $E_{q_0}$ is emptied; (3) $E_{t_2}$ and $E_{q_1}$ are

317: used to create $E_{q_2}$ which in turn is added to the queue;

318: $E_{t_2}$ is discarded and $E_{q_1}$ is emptied.

319: The algorithm  for $\ell_\infty$ is shown in Fig.~\ref{fig:apx}.
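The binary-counter behaviour described above can be sketched independently of the error tables. In this minimal sketch the function \texttt{stream\_levels} and its \texttt{merge} argument are hypothetical: \texttt{merge} stands in for the table computation of the parent from its two children, and slot $i$ plays the role of $q_i$.

```python
def stream_levels(stream, merge):
    """Binary-counter reduce-merge.

    slots[i] holds the summary of a completed block of 2**i items,
    or None if level i is currently empty.
    """
    slots = []
    for x in stream:
        carry = x  # summary of a block of size one (the temporary node)
        i = 0
        # Propagate the carry upward while the level is occupied.
        while i < len(slots) and slots[i] is not None:
            carry = merge(slots[i], carry)  # the stored block is the left child
            slots[i] = None                 # the level is emptied
            i += 1
        if i == len(slots):
            slots.append(None)              # reached the end of the queue
        slots[i] = carry
    return slots
```

Using averaging as the merge, i.e. \texttt{merge = lambda l, r: (l + r) / 2}, the stream $\langle 2,4,6,0\rangle$ leaves a single summary at level $2$, while the prefix $\langle 2,4,6\rangle$ leaves one summary at level $0$ and one at level $1$.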

320:

321: %\begin{figure*}[htb]

322: \clearpage

323: \begin{figure}

324: \framebox[6.7in]{\parbox{6.5in}{

325: \begin{algorithm}{HaarPTAS}[B,\E,\epsilon]{\label{alg:apx}}

326: Let $\rho = \epsilon\E/(c q \log n)$ for some suitably

327: large constant $c$.  Note that $q=1$ in the Haar case.\\

328: Initialize a queue $Q$ with one node $q_0$ \qcomment{Each $q_i$

329: contains an array $E_{q_i}$ of size at most

330: $R\min\{B, 2^i\}$ and a flag {\tt isEmpty}}\\

331: {\bf repeat} Until there are no elements in the stream\\

332: Get the next element from the stream, call it $e$\\

333: \qif $q_0$ is empty \\

334: \qthen Set $q_0.a = e$. For all values $r$ s.t.~$|r -e| \leq c_1 \E$

335:   where $c_1$ is a large enough constant and $r$ is a multiple of

336:   $\rho$, initialize the table $E_{q_0}[r, 0] =

337:   |r-e|$\label{step:baseE} \\

338: \qelse Create $t_1$ and Initialize $E_{t_1}[r, 0] =|r-e|$ \emph{as in

339: Step \ref{step:baseE}}.\\

340: \qfor $i=1$ until the $1^\text{st}$ empty $q_i$ or end of $Q$ \\

341: \qdo Create a temporary node $t_2$.\\

342: Compute $t_2.a = \langle f,\phia_i\rangle$ and the wavelet coefficient

343: $t_2.o=\langle f, \psia_i\rangle$. This involves using the $a$ values

344: of $t_{1}$ and $q_{i-1}$ ($t_2$'s two children in the coefficient

345: tree) and taking their average to compute $t_2.a$ and their difference

346: divided by $2$ to compute $t_2.o$. (Recall that we are using

347: $\frac{1}{\sqrt{2}}h[]$).\\

348: For all values $r$ that are multiples of $\rho$ with $|r -t_2.a| \leq

349:   c_1(\E + \rho\log n)$, compute the table $E_{t_2}[r, b]$ for all $0\leq b \leq

350:   B$. This uses the tables of the two children $t_{1}$ and

351:   $q_{i-1}$. The size of the table is $O(\epsilon^{-1}Bn^{1/p}\log

352:   n)$. (Note that the value of a chosen coefficient at node $t_2$ is at

353:   most a constant multiple of $\E$ away from $t_2.o$. Keeping track of

354:   the chosen coefficients (the answer) costs $O(B)$ factor space

355:   more.)\label{step:generalE}\\

356: Set $t_1 \leftarrow t_2$ and Discard $t_2$\\

357: Set $q_i.\mathtt{isEmpty} = \mbox{true}$

358: \qrof \\

359: \qif we reached the end of $Q$ \\

360: \qthen Create the node $q_i$ \qfi \\

361: Compute $E_{q_i}[r, b\in B]$ from $t_{1}$ and $q_{i-1}$ \emph{as in

362: Step \ref{step:generalE}}.\\

363: Set $q_i.\mathtt{isEmpty} = \mbox{false}$ and Discard $t_{1}$ \qfi

364: \end{algorithm}

365: }}

366: \caption{The Haar streaming FPTAS for $\ell_\infty$.}

367: \label{fig:apx}

368: \end{figure}

369: \clearpage

370:

371: %If at any point of time the number of coefficients larger than $\E$

372: %exceeds $B$ then we know our guess of $\E$ is wrong and we abort that

373: %thread.

374:   \subsubsection{Guessing the Optimal Error}\label{sec:guesses}

375: We have so far assumed that we know the optimal error $\E$. As

376: mentioned at the beginning of Section~\ref{sec:HaarAlgo}, we will

377: avoid this assumption by running multiple instances of our algorithm

378: and supplying each instance a different guess $G_k$ of the error.  We

379: will also provide every instance $A_k$ of the algorithm with

380: $\epsilon' = \frac{\sqrt{1+4\epsilon}-1}{2}$ as the approximation

381: parameter.  The reason for this will be apparent shortly.  Our final

382: answer will be that of the instance with the minimum representation

383: error.

384:

385: Theorem~\ref{mainthm} shows that the running time and space

386: requirements of our algorithm do not depend on the supplied error

387: parameter.  However, the algorithm's search ranges {\it do} depend on

388: the given error. Hence, as long as $G_k\ge\E$ the ranges searched by

389: the $k^\text{th}$ instance will include the ranges specified by

390: Lemma~\ref{lemma:summary}.  Lemma~\ref{lemma:summary} also tells us

391: that if we search these ranges in multiples of $\rho$, then we will

392: find a solution whose approximation guarantee is $\E+ c q

393: n^{1/p}\rho\log n$.  Our algorithm chooses $\rho$ so that its running

394: time does not depend on the supplied error parameter.  Hence, given

395: $G_k$ and $\epsilon'$, algorithm $A_k$ sets $\rho = \epsilon'G_k/(c q

396: n^{1/p}\log n)$.  Consequently, its approximation guarantee is $\E +

397: \epsilon' G_k$.

398:

399: Now if guess $G_k$ is much larger than the optimal error $\E$, then

400: instance $A_k$ will not provide a good approximation of the optimal

401: representation.  However, if $G_k \le (1+\epsilon')\E$, then $A_k$'s

402: guarantee will be $\E+ \epsilon'(1+\epsilon')\E = (1+\epsilon)\E$

403: because of our choice of $\epsilon'$.  To summarize, in order to

404: obtain the desired $(1+\epsilon)$ approximation, we simply need to

405: ensure that one of our guesses (call it $G_{k^*}$) satisfies

406: \begin{equation*}\label{eq:guess}

407: \E \le\ G_{k^*} \le\ (1+\epsilon')\E

408: \end{equation*}

409: Setting $G_k = (1+\epsilon')^k$, the above bounds will be satisfied

410: when

411: $k = k^* \in [\log_{1+\epsilon'}(\E),\ \log_{1+\epsilon'}(\E) +1]$.

412:

413: \paragraph*{Number of guesses}

414: Note that the optimal error $\E = 0$ if and only if $f$ has at

415: most $B$ non-zero expansion coefficients $\langle f, \psi_i\rangle$.

416: We can find these coefficients easily in a streaming fashion.

417:

418: Since we assume that the entries in the given $f$ are polynomially

419: bounded, by the system of equations~\eqref{sys} we know that the

420: optimum error is at least as much as the $(B+1)^{\text{st}}$ largest

421: coefficient. Now any coefficient ($\langle f, \psia_k\rangle$) is the

422: sum of the left half minus the sum of the right half of the $f_i$'s

423: that are in the support of the basis and the total is divided by the

424: length of the support. Thus if the smallest non-zero number in the

425: input is $n^{-c}$ then the smallest non-zero wavelet coefficient is at

426: least $n^{-(c+1)}$. By the same logic the largest non-zero coefficient

427: is at most $n^c$.  Hence, it suffices to make $O(\log n)$ guesses.
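The guessing scheme can be sketched as follows. This is an illustration only; the names \texttt{eps\_prime} and \texttt{guesses}, and the explicit bounds \texttt{err\_lo} and \texttt{err\_hi} on the non-zero optimal error (playing the roles of $n^{-(c+1)}$ and $n^c$), are assumptions of the sketch.

```python
import math


def eps_prime(eps):
    """Solve eps' * (1 + eps') = eps for the positive root."""
    return (math.sqrt(1.0 + 4.0 * eps) - 1.0) / 2.0


def guesses(err_lo, err_hi, eps):
    """Geometric guesses G_k = (1 + eps')**k covering [err_lo, err_hi]."""
    base = 1.0 + eps_prime(eps)
    k_lo = math.floor(math.log(err_lo, base))
    k_hi = math.ceil(math.log(err_hi, base))
    return [base ** k for k in range(k_lo, k_hi + 1)]
```

For any error value in the covered interval, the smallest guess that is at least that value is within a factor $(1+\epsilon')$ of it, which is the condition required of $G_{k^*}$.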

428:

429:

430: \medskip

431: \subsection{Analysis of the Simple Algorithm}

432: \label{sec:algspacetime}

433: The size of the error table at node $i$, $E[i,\cdot,\cdot]$, is

434: $R_\phi \min\{B, 2^{t_i}\}$ where $R_\phi = 2C_1\E/\rho+\log n$ and $t_i$ is

435: the height of node $i$ in the Haar coefficient tree (the leaves have

436: height $0$). Note that $q=1$ in the Haar case.  Computing each entry

437: of $E[i,\cdot,\cdot]$ takes $O(R_\psi\min\{B, 2^{t_i}\})$ time where

438: $R_\psi = 2C_0\E/\rho+2$. Hence, letting $R = \max\{R_\phi, R_\psi\}$,

439: the total running time is $O(R^2B^2)$ for computing the root table

440: plus $O(\sum_{i=1}^n \left(R\min \{ 2^{t_i},B\}\right)^2)$ for

441: computing all the other error tables. Now,

442: \begin{eqnarray*}

443: \sum_{i=1}^n \left(R \min \{ 2^{t_i},B \}\right)^2

444: & = & R^2 \sum_{t=1}^{\log n} \frac{n}{2^t} \min \{ 2^{2t},B^2\} \\

445: & = & nR^2\left(\sum_{t=1}^{\log B}2^t + \sum_{t=\log B +1}^{\log n} \frac{B^2}{2^t}\right) \\

446: %&=& n|R|^2\left((2B-2) + \sum_{u=1}^{\log (n/B)}\frac{B}{2^{u}}\right)\\

447: & = & O(R^2nB) \enspace ,

448: \end{eqnarray*}

449: where the first equality follows from the fact that the number of

450: nodes at level $t$ is $\frac{n}{2^t}$. For $\ell_\infty$, when

451: computing $E[i,v,b]$ we do not need to range over all values of

452: $b'$. For a specific $r\in R_i$, we can find the value of $b'$ that

453: minimizes $\max\{E[i_L,v+r,b'], E[i_R,v-r,b-b'-1]\}$ using binary

454: search. The running time thus becomes,

455: \[

456: \sum_{t} R^2 \frac{n}{2^t} \min \{t2^{t},B \log B \} = O(nR^2\log^2 B) \enspace .

457: \]

458: The bottom up dynamic programming will require us to store the error tables

459: along at most two leaf to root paths. Thus the required space is,

460: \[ 2 \sum_{t} R \min \{2^{t},B \} = O(RB(1+\log \frac{n}{B})) \enspace .\]

461: %

462: Since we set $\rho=\epsilon\E/(c n^{1/p}\log n)$, we have

463: $\mbox{$R = O((n^{1/p}\log n)/\epsilon)$}$.
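\noindent The closed forms behind the $O(R^2nB)$ bound can be spelled out as follows, assuming for simplicity that $B$ and $n$ are powers of two with $B\le n$:
\[
\sum_{t=1}^{\log B} 2^t = 2B-2,
\qquad
\sum_{t=\log B+1}^{\log n}\frac{B^2}{2^t} = B^2\left(\frac{1}{B}-\frac{1}{n}\right) \le B \enspace ,
\]
so the bracketed sum is at most $3B-2$ and the total is $O(R^2nB)$.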

464: \medskip

465:

466: \begin{theorem}

467: \label{mainthm}

468: Algorithm~\ref{alg:apx} is an $O(\epsilon^{-1}B^2n^{1/p}\log^3 n)$ space

469: algorithm that computes a $(1+\epsilon)$ approximation to the best

470: $B$-term unrestricted representation of a signal in the Haar

471: system. Under the $\ell_p$ norm, the algorithm runs in time

472: $O(\epsilon^{-2}n^{1+2/p}B\log^3 n)$.  Under $\ell_\infty$ the running

473: time becomes $O(\epsilon^{-2}n\log^2 B\log^3 n)$.

474: \end{theorem}

475: \medskip

476:

477: The extra $B$ factor in the space required by the algorithm accounts

478: for keeping track of the chosen coefficients.

479: \smallskip

480:

481: \subsection{An Improved Algorithm and Analysis}

482: For large $n$ (compared to $B$), we gain in running time if we change the

483: rounding scheme given by Lemma~\ref{changebase1}.  The granularity at

484: which we search for the value of a coefficient will be fine if the

485: coefficient lies toward the top of the tree, and it will be coarse if

486: the coefficient lies toward the bottom. The idea is that, for small

487: $\ell_p$ norms, a mistake in a coefficient high in the tree affects

488: everyone, whereas mistakes at the bottom are more localized.  This

489: idea utilizes the strong locality property of the Haar basis.  We

490: start with the lemma analogous to Lemma~\ref{changebase1}.

491:

492: \begin{lemma}

493: \label{changebase3}

494: Let $\{ y^*_i\}$, $i = (t_i,s)$ be the optimal solution using the

495: basis set $\{\psib_i\}$ for the reconstruction, i.e., $\hat{f} =

496: \sum_i y^*_i\psib_i$ and $\| f - \hat{f}\|_p = \E$. Here $t_i$ is the

497: height of node $i$ in the Haar coefficient tree.  Let $\{y^\rho_i\}$

498: be the set where each $y^*_i$ is first rounded to the nearest multiple

499: of $\rho_{t_i} = \epsilon\E / (2B 2^{t_i/p})$ then the resulting value

500: is rounded to the nearest multiple of $\rho_{t_\text{root}} =

501: \epsilon\E/(2Bn^{1/p})$. If $f^\rho = \sum_i y^\rho_i\psib_i$ then

502: $\|f - f^\rho\|_p \leq (1+\epsilon)\E$.

503: \end{lemma}

504: \begin{proof}

505: As in Lemma~\ref{changebase1}, we need to estimate

506: $\norm{\sum\nolimits_i (y_i^\rho-y_i^*)\psib_i}_p$ but using the new

507: rounding scheme.  Let $\mathcal{S}$ be the set of indices $i$ such

508: that $y^*_i \ne 0$.

509: \begin{IEEEeqnarray*}{rCl}

510: \norm{\sum\nolimits_{i\in\mathcal{S}} (y_i^\rho-y_i^*)\psib_i}_p

511: & \ \le\  & \sum\nolimits_{i\in\mathcal{S}}\norm{(y_i^\rho-y_i^*)\psib_i}_p \\

512: &\ \le \ & \sum\nolimits_{i\in\mathcal{S}}(\rho_{t_i} + \rho_{t_\text{root}})\norm{\psib_i}_p \\

513: &\ \le \ & 2\sum\nolimits_{i\in\mathcal{S}}\rho_{t_i} 2^{t_i/p} \enspace .

514: \end{IEEEeqnarray*}

515: The last inequality follows from $\rho_{t_\text{root}} \le \rho_{t_i}$ and the fact that $2^{t_i}$ components of

516: $\psib_i$ are equal to $\pm 1$ and the rest are zero. The approximation

517: hence follows from $|\mathcal{S}| \le B$ and our choices of

518: $\rho_{t_i}$.

519: \end{proof}
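\noindent Spelling out the last step of the proof: substituting $\rho_{t_i} = \epsilon\E/(2B2^{t_i/p})$ gives
\[
2\sum\nolimits_{i\in\mathcal{S}}\rho_{t_i}2^{t_i/p}
= 2\sum\nolimits_{i\in\mathcal{S}}\frac{\epsilon\E}{2B}
= \frac{|\mathcal{S}|}{B}\,\epsilon\E
\le \epsilon\E \enspace ,
\]
and the triangle inequality then yields $\|f-f^\rho\|_p \le \E + \epsilon\E$.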

520:

521: The granularity of the dynamic programming tables $E[i,\cdot,\cdot]$

522: is set according to the smallest $\rho_{t_i}$ which is

523: $\rho_{t_\text{root}} = \epsilon\E/(2Bn^{1/p})$. This allows their

524: values to align correctly.  More specifically, when a coefficient is

525: not chosen we compute (see Section~\ref{sec:HaarAlgo})

526: \[ E[i,v,b] = \min_{b'} E[i_L, v, b'] + E[i_R, v, b-b']\enspace . \]

527: A value $v$ that is not outside the range of $E[i_L,\cdot,\cdot]$

528: and $E[i_R,\cdot,\cdot]$ will be a correct index into these two

529: arrays.  We gain from this rounding scheme, however, when we are

530: searching for a value to assign to node $i$.  If $i$ is chosen, we can

531: search for its value in the range

532: $\langle f, \psia_i\rangle \pm C_0\E$ in multiples of $\rho_{t_i}$.

533: Hence, as mentioned earlier, the granularity of our search will be

534: fine for nodes at top levels and coarse for nodes at lower levels.

535: More formally, if $i$ is chosen, we compute

536: \[ E[i,v,b] = \min_{r,b'} E[i_L, v+r, b'] + E[i_R, v-r, b-b'-1]\enspace ,\]

537: where we search for the best $r$ in multiples of $\rho_{t_i}$.

538: The value $v+r$ (resp.~$v-r$) may not index correctly into

539: $E[i_L,\cdot,\cdot]$ (resp.~$E[i_R,\cdot,\cdot]$) since

540: $\rho_{t_{i}} = 2^{d/p}\rho_{t_{\text{root}}}$ where

541: $d = t_{\text{root}} - t_i$. Hence, we need to round each value of $r$ we

542: wish to check to the nearest multiple of

543: $\rho_{t_{\text{root}}}$. This extra rounding is accounted for in

544: Lemma~\ref{changebase3}.
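The two-stage rounding can be sketched as follows. The helper names \texttt{rho\_at\_height}, \texttt{round\_to\_multiple} and \texttt{two\_stage\_round} are hypothetical, and the argument \texttt{err} stands for the error value $\E$ supplied to the algorithm.

```python
def rho_at_height(eps, err, B, t, p):
    """Granularity rho_t = eps * err / (2 * B * 2**(t / p)) at tree height t."""
    return eps * err / (2.0 * B * 2.0 ** (t / p))


def round_to_multiple(x, step):
    """Round x to the nearest multiple of step."""
    return round(x / step) * step


def two_stage_round(y, eps, err, B, t, t_root, p):
    """Round to a multiple of rho_t, then to a multiple of rho_root so that
    the result indexes correctly into the children's tables."""
    coarse = round_to_multiple(y, rho_at_height(eps, err, B, t, p))
    return round_to_multiple(coarse, rho_at_height(eps, err, B, t_root, p))
```

Each stage moves the value by at most half of its step, so the total displacement is at most $(\rho_{t_i}+\rho_{t_\text{root}})/2$, which is within the $\rho_{t_i}+\rho_{t_\text{root}}$ allowance used in the proof of Lemma~\ref{changebase3}.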

545:

546: Letting $R$ be the number of values each table holds and

547: $R_{t_i} = 2C_0\E/\rho_{t_i} + 2$ be the number of entries we search

548: at node $i$, and using an analysis similar to that of

549: Section~\ref{sec:algspacetime}, the running time (ignoring constant

550: factors) becomes,

551: \begin{align*}

552: O(\sum_{i=1}^n RR_{t_i}\min\{2^{2t_i}, B^2\})

553: &\ =\ O(R \sum_{t=1}^{\log n} \frac{n}{2^t}\frac{B2^{t/p}}{\epsilon} \min\{2^{2t}, B^2\}) \\

554: &\ =\ O(\frac{nRB}{\epsilon}\left(\sum_{t=1}^{\log B} 2^{t/p+t} + B^2\sum_{t=\log B +1}^{\log n}2^{t/p-t}\right)) \\

555: &\ =\ O(\frac{nRB}{\epsilon}B^{1+1/p})

556: \end{align*}

557: Hence, since $R = O(n^{1/p}B/\epsilon)$ based on the granularity

558: $\rho_{t_\text{root}}$, the running time for each instance of the

559: algorithm is $O((nB)^{1+1/p}B^2/\epsilon^2)$.  The space requirement

560: is the same as that of the simpler algorithm; namely, $O(RB\log n)$.
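\noindent For reference, substituting $R = O(n^{1/p}B/\epsilon)$ into the running-time bound reads
\[
\frac{nRB}{\epsilon}\,B^{1+1/p}
= \frac{nB^{2+1/p}}{\epsilon}\cdot\frac{n^{1/p}B}{\epsilon}
= \frac{n^{1+1/p}B^{3+1/p}}{\epsilon^2}
= \frac{(nB)^{1+1/p}B^2}{\epsilon^2} \enspace .
\]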

561: \smallskip

562:

563: \begin{theorem}\label{mainthm2}

564: The above algorithm (with the new rounding scheme) is an

565: $O(\epsilon^{-1}B^3n^{1/p}\log^2 n)$ space algorithm that computes a

566: $(1+\epsilon)$ approximation to the best $B$-term unrestricted

567: representation of a signal in the Haar system under the $\ell_p$ norm.

568: The algorithm runs in time $O(\epsilon^{-2}(nB)^{1+1/p}B^2\log n)$.

569: \end{theorem}

570: \medskip

571:

572: Again, and as in Theorem~\ref{mainthm}, the extra $B$ factor in the

573: space requirement accounts for keeping track of the chosen

574: coefficients, and the extra $\log n$ factor in both the space and time

575: requirements accounts for the guessing of the error.

576:

577: We choose the better of the two algorithms (or rounding schemes) whose

578: approximation and time and space requirements are guaranteed by

579: Theorems~\ref{mainthm} and~\ref{mainthm2}.