0307:math0307152/nov2.tex

1: \documentclass[10pt]{article}

2: \usepackage{epsfig}

3: \title{An iterative thresholding algorithm for linear inverse problems

4: with a sparsity constraint}

5: \author{Ingrid Daubechies \and Michel Defrise \and

6: Christine De Mol}

7: \usepackage{amssymb,amsmath,amsfonts}

8:

9: \newtheorem{theorem}{Theorem}[section]

10: \newtheorem{proposition}[theorem]{Proposition}

11: \newtheorem{lemma}[theorem]{Lemma}

12: \newtheorem{remark}[theorem]{Remark}

13: \newtheorem{corollary}[theorem]{Corollary}

14: \newtheorem{example}[theorem]{Example}

15:

16: \voffset=-3.5cm

17: \hoffset=-2.5cm

18: \textwidth=18cm

19: \textheight=24.5cm

20: %\voffset=-2cm

21: %\hoffset=-1.5cm

22: %\textwidth=16.5cm

23: %\textheight=22 cm

24:

25: \newcommand{\QED}{\hfill \raisebox{-2pt}{\rule{5.6pt}{8pt}\rule{4pt}{0pt}}%

26:   \smallskip\par}

27:

28: \def\R{\mathbb{R}}

29: \def\C{\mathbb{C}}

30: \def\N{\mathbb{N}}

31: \def\Z{\mathbb{Z}}

32: \def\S{\mathbf{S}}

33: \def\T{\mathbf{T}}

34: \def\I{\sl {I}}

35: \def\A{\mathbf{A}}

36: \def\cB{\mathcal{B}}

37: \def\cH{\mathcal{H}}

38: \def\cV{\mathcal{V}}

39:

40: \def\mR{\mbox{R}}

41: \def\mN{\mbox{N}}

42: \def\K{K^*}

43: \def\wf{\widetilde{f}}

44: \def\fs{f^{\dagger}}

45: \def\OBJ{\mbox{\tiny\texttt{OBJECT}}}

46: \def\IM{\mbox{\tiny\texttt{IMAGE}}}

47: \def\SUR{\mbox{\tiny{\it{SUR}}}}

48: \def\fg{f_{\gamma}}

49: \def\ag{w_{\gamma}}

50: \def\vpg{\varphi_{\gamma}}

51: \def\g{\gamma}

52: \newcommand\eref[1]{(\ref{#1})}

53: \def\Vvert{\vert\!\vert\!\vert}

54: \def\VVert{[\!|\!|\!]}

55: \begin{document}

56: \maketitle

57:

58: \begin{abstract}

59:

60:

61: We consider linear inverse problems where the solution is assumed to have

62: a sparse expansion on an arbitrary pre-assigned orthonormal basis.

63: We prove that replacing the usual

64: quadratic regularizing penalties by weighted $\ell^p$-penalties

65: on the coefficients of such expansions, with $1 \leq p \leq 2$,

66: still regularizes the problem. If $p < 2$, regularized solutions of such

67: $\ell^p$-penalized problems will have sparser expansions, with respect to the

68: basis under consideration. To compute the corresponding regularized solutions

69: we propose an iterative algorithm which amounts to a Landweber iteration

70: with thresholding (or nonlinear shrinkage) applied at each iteration step.

71: We prove that this algorithm converges in norm.  We

72: also review some potential applications of this method.

73:

74: \end{abstract}

75:

76: \section{Introduction}

77: \subsection{Linear inverse problems}

78: In many practical problems in the sciences and applied sciences, the features

79: of most interest cannot be observed directly, but have to be inferred

80: from other, observable quantities. In the simplest

81: approximation, which works surprisingly well in a wide range of cases

82: and often suffices,

83: there is a {\em linear} relationship between the features of interest

84: and the observed quantities. If we model the {\em object} (the traditional

85: shorthand

86: for the features of interest) by a function $f$, and the derived quantities or

87: {\em image} by another function $h$, we can cast the problem of inferring

88: $f$ from $h$

89: as a {\em linear inverse problem}, the task of which is to

90: solve the equation

91: \begin{equation*}

92: %\label{eqn1}

93: Kf=h~.

94: \end{equation*}

95: This equation and the task of solving it make sense only when placed in an

96: appropriate

97: framework.

98: In this paper, we shall assume that $f$ and $h$ belong to appropriate function

99: spaces, typically Banach or Hilbert spaces,

100: $f \in \cB_{\OBJ}$,

101: $h \in \cB_{\IM}$, and that $K$ is a bounded  operator from

102: the space  $\cB_{\OBJ}$ to

103: $ \cB_{\IM}$. The choice of the spaces must be

104: appropriate for describing real-life situations.

105:

106: The observations or {\em data}, which we shall model by yet another

107: function, $g$, are typically not

108: exactly equal to the image $h=Kf$, but rather to a distortion of $h$. This

109: distortion

110: is often modeled by an {\em additive noise} or {\em error term} $e$, i.e.

111: \begin{equation*}

112: %\label{eqn2}

113: g=h+e=Kf+e ~.

114: \end{equation*}

115: Moreover, one typically assumes that the ``size'' of the noise can be measured

116: by its $L^2$-norm, $\|e\|=\left( \int_{\Omega} |e|^2\right)^{1/2}$ if $e$

117: is a function on $\Omega$. (In a finite-dimensional situation, one uses

118: $\|e\|=

119: \left(\sum_{n=1}^N |e_n|^2 \right)^{1/2}$ instead.)

120: Our only ``handle'' on the image $h$ is thus via the observed $g$, and

121: we typically have little information on $e=g-h$ beyond an upper bound on

122: its

123: $L^2$-norm $\|e\|$.

124: (We have here implicitly placed ourselves

125: in the ``deterministic setting'' customary to the

126: Inverse Problems community. In the

127: stochastic setting more familiar to statisticians, one assumes instead

128: a bound on the variance

129: of the components of $e$.)

130: Therefore it is customary to take

131: $\cB_{\IM}=L^2(\Omega)$; even if the ``true images'' $h$ (i.e. the images

132: $Kf$ of the

133: possible objects $f$) lie in a much smaller space, we can only know them up

134: to some (hopefully) small $L^2$-distance.

135:

136: We shall consider in this paper

137: a whole family of possible choices for $\cB_{\OBJ}$, but we shall

138: always assume that these spaces are subspaces of a basic Hilbert space

139: $\cH$ (often

140: an $L^2$-space as well), and that $K$ is a bounded operator from $\cH$ to

141: $L^2(\Omega)$.

142: In many applications, $K$ is an integral operator with a kernel

143: representing the response of the imaging device; in the special case where

144: this linear device is

145: translation-invariant, $K$ reduces to a convolution operator.

146:

147: To find an estimate of $f$ from the observed $g$, one can minimize

148: the {\it discrepancy} $\Delta(f)$,

149: \begin{equation*}

150: \Delta(f)=\| Kf-g \|^2\ ;

151: %\label{res}

152: \end{equation*}

153: functions that minimize $\Delta(f)$ are called {\em pseudo-solutions}

154: of the inverse problem. If the operator $K$ has a trivial null-space,

155: i.e. if $\mN(K)=\{f \in \cH ; Kf=0\}=\{0\}$, there is a unique minimizer, given

156: by

157: $\widetilde{f}=(K^*K)^{-1}K^*g$, where $K^*$ is the adjoint operator. If

158: $\mN(K)\neq

159: \{0\}$ it is customary to choose, among the set of pseudo-solutions, the unique

160: element

161: $f^\dagger$ of minimal norm,

162: i.e. $f^\dagger = \mbox{arg-min}\{\|f\|; f \mbox{ minimizes }

163: \Delta(f)\}$.

164: This function belongs to $\mN(K)^\perp$ and is called the {\it generalized

165: solution}

166:  of

167: the inverse problem. In this case the map $K^\dagger:

168: g \mapsto f^\dagger$

169: is called the {\it

170: generalized inverse } of $K$. Even when $K^*K$ is not invertible,

171: $K^\dagger g $ is

172: well-defined for all $g$ such that $K^*g \in \mR(K^*K)$.

173:  However, the generalized inverse operator may be

174: unbounded (for so-called {\it ill-posed problems}) or have a very large norm

175: (for {\it ill-conditioned problems}). In such instances, it has to be replaced

176: by bounded approximants or approximants with smaller norm, so

177: that numerically stable solutions can be defined and used as meaningful

178: approximations of the true solution corresponding to the exact data.

179: This is the issue of {\it regularization}.
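The formulas above for the pseudo-solutions rest on a normal equation that can be checked directly. As a short sketch, assuming for simplicity that $\cH$ is a real Hilbert space, one has for any $f, h \in \cH$ and $t \in \R$
\begin{equation*}
\Delta(f+th) = \Delta(f) + 2t \left< K^*(Kf-g), h \right> + t^2 \|Kh\|^2 ~,
\end{equation*}
so that $f$ minimizes $\Delta$ if and only if $K^*Kf = K^*g$. When $K^*K$ is invertible, this gives $\widetilde{f}=(K^*K)^{-1}K^*g$; in general, any two pseudo-solutions differ by an element of $\mN(K)$, and $f^\dagger$ is the one that is orthogonal to $\mN(K)$.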

180:

181: \subsection{Regularization by imposing additional quadratic constraints}

182:

183: The definition of a pseudo-solution (or even, if one considers equivalence

184: classes modulo $\mN(K)$, of a generalized solution) makes use of

185: the inverse of the operator $K^*K$; this inverse is well defined on

186: the range $\mR(K^*)$ of $\K $ when $K^*K$ is a strictly

187: positive operator, i.e. when its spectrum is bounded below away from zero.

188: When the spectrum of $K^*K$ is not bounded below by a strictly positive

189: constant, $\mR(K^*K)$ is not closed, and not all elements of $\mR(\K )$ lie

190: in $\mR(\K K)$. In this case there is no guarantee

191: that $K^*g

192: \in \mR(\K K)$; even if $K^*g$ belongs to $\mR(\K K)$, the unboundedness of

193: $(\K K)^{-1}$ can cause severe numerical instabilities unless

194: additional measures are taken.

195:

196: This blowup or these numerical instabilities are ``unphysical'', in the sense

197: that we know a priori that the true object would not have had a huge norm in

198: $\cH$, or other characteristics exhibited by the unconstrained ``solutions''.

199: A standard procedure to avoid these instabilities or to {\em regularize}

200: the inverse

201: problem is to modify the functional to be minimized, so that it

202: incorporates not

203: only  the discrepancy, but also the a priori knowledge one may have about the

204: solution.  For instance, if it is known that the object is of limited ``size''

205: in $\cH$, i.e. if

206: $\|f\|_{_{\cH}} \leq \rho$ , then the functional to be minimized can be chosen

207: as

208: \begin{equation*}

209: \Delta(f) + \mu \|f\|_{_{\cH}}^2 = \|Kf-g\|_{L^2(\Omega)}^2 +\mu

210: \|f\|_{_{\cH}}^2

211: \end{equation*}

212: where $\mu$ is some positive constant called the {\it regularization

213: parameter}. The minimizer is given by

214: \begin{equation}

215: \label{tikh}

216: f_\mu=(\K K + \mu I)^{-1} K^* g ~,

217: \end{equation}

218: where $I$ denotes the identity operator.

219: The constant $\mu$ can then be chosen appropriately,

220: depending on the application.

221: If $K$ is a

222: compact operator, with singular value decomposition given by

223: $Kf =\sum_{k=1}^{\infty} \sigma_k \left<f,v_k\right> u_k ~$, where $(u_k)_{k

224: \in \N}$ and $(v_k)_{k \in \N}$ are the orthonormal bases of eigenvectors of

225: $KK^*$ and $\K K$, respectively, with corresponding eigenvalues $\sigma_k^2$,

226: then \eref{tikh} can be rewritten as

227: \begin{equation}

228: \label{tikh-b}

229: f_\mu= \sum_{k=1}^{\infty} \frac{\sigma_k}{\sigma_k^2 + \mu}

230: \left<g, u_k\right> v_k ~.

231: \end{equation}

232: This formula shows explicitly how this regularization method reduces the

233: importance

234: of the eigenmodes of $\K K$ with small eigenvalues, which otherwise (if

235: $\mu =0$)

236: lead to instabilities. If an estimate of the ``noise'' is known, i.e. if

237: we know a priori that $g=Kf+e$ with $\|e\| \leq \epsilon$, then one finds

238: from \eref{tikh-b} that

239: \begin{equation*}

240: %\label{svd}

241: \| f - f_\mu \| \leq \left\Vert \sum_{k=1}^{\infty}

242: \frac{ \mu \left< f,v_k\right> }{\sigma_k^2+\mu}\; v_k \right\Vert +

243: \left\Vert \sum_{k=1}^{\infty} \frac{\sigma_k}{\sigma_k^2+\mu}

244: \left< e,u_k\right> v_k \right\Vert

245: \leq \Gamma(\mu) + \frac{\epsilon}{\sqrt\mu} ~,

246: \end{equation*}

247: where $\Gamma(\mu) \rightarrow 0$ as $\mu \rightarrow 0$. This means that

248: $\mu$ can be chosen appropriately, in an $\epsilon$-dependent way,

249: so that the error in estimation

250: $\|f - f_\mu \|$ converges to zero when $\epsilon$ (the estimation of

251: the noise level) shrinks to zero. This feature of the method, usually called

252: {\em stability}, is one that is required for any regularization method.

253: It is similar to requiring that a statistical estimator is consistent.
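To make the last inequality explicit, here is a short sketch of the estimate of the noise term; the particular choice $\mu=\epsilon$ made below is only an illustration and is not prescribed by the method. Since $\sigma_k^2 + \mu \geq 2 \sigma_k \sqrt{\mu}$ for every $k$,
\begin{equation*}
\frac{\sigma_k}{\sigma_k^2+\mu} \leq \frac{1}{2\sqrt{\mu}}
\quad \mbox{and hence} \quad
\left\Vert \sum_{k=1}^{\infty} \frac{\sigma_k}{\sigma_k^2+\mu}
\left< e,u_k\right> v_k \right\Vert
\leq \frac{\|e\|}{2\sqrt{\mu}} \leq \frac{\epsilon}{\sqrt{\mu}} ~.
\end{equation*}
Any rule $\mu=\mu(\epsilon)$ with $\mu(\epsilon) \rightarrow 0$ and $\epsilon^2/\mu(\epsilon) \rightarrow 0$ as $\epsilon \rightarrow 0$ then makes both terms of the bound vanish; with $\mu=\epsilon$, for instance, the bound reads $\Gamma(\epsilon)+\sqrt{\epsilon}$.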

254:

255: Note that the ``regularized estimate'' $f_\mu$ of \eref{tikh-b} is

256: linear

257: in $g$. This means that we have effectively defined a linear regularized

258: estimation operator that

259: is especially well adapted to the properties of the operator $K$; however,

260: it proceeds with a ``one method fits all'' strategy, independent of the data.

261: This may not

262: always be the best approach. For instance, if $\cH$ is an $L^2$-space

263: itself, and $K$

264: is an integral operator, the functions $u_k$ and $v_k$ are typically fairly

265: smooth;

266: if on the other hand the objects $f$ are

267: likely to have local singularities or discontinuities,

268: an approximation of type \eref{tikh-b}

269: (effectively limiting the estimates $f_\mu$

270: to expansions in the first $N ~v_k$, with $N$ determined by, say, $\sigma_k^2<

271: \mu/100$ for $k>N$) will of necessity be a smoothed version of $f$, without

272: sharp features.

273:

274: Other classical regularization methods with

275: quadratic constraints may use quadratic Sobolev

276: norms, involving a few derivatives, as the ``penalty'' term added to the

277: discrepancy. This introduces a penalization of the

278: highly oscillating components, which are often the

279: most sensitive to noise. This method is especially

280: easy to use in the case where $K$ is a

281: convolution operator, diagonal in the Fourier domain.

282: In this case the regularization

283: produces a smooth cut-off on the highest Fourier components,

284: independently of the data.

285:  This works well

286: for recovering smooth objects which have their relevant structure contained in

287: the lower part of the spectrum and which have spectral content homogeneously

288: distributed across the space or time domain.

289: However, the Fourier domain is clearly not the appropriate representation

290: for expressing smoothness properties

291: of objects that are spatially inhomogeneous, with

292: varying ``local frequency'' content, and/or present some

293: discontinuities, because

294: the frequency

295: cut-off implies that the resolution with which the fine details of the

296: solution can be stably retrieved is necessarily limited; it also

297: implies that the achievable

298: resolution is essentially the same at all points (see e.g. the book

299: \cite{Ber98}

300: for an extensive discussion of these topics).

301:

302: \subsection{Regularization by non-quadratic constraints that promote sparsity}

303:

304: The problems with the standard regularization methods described above

305: are well known and several approaches have been proposed for dealing

306: with them.

307: We propose in this paper a regularization method that, like the classical

308: methods just discussed, minimizes a functional obtained by adding a

309: penalization term

310: to the discrepancy; typically this penalization term will {\em not} be

311: quadratic,

312: but rather a weighted $\ell^p$-norm of the coefficients of $f$ with respect to

313: a particular orthonormal basis in $\cH$, with $1\leq p\leq 2$. More precisely,

314: given an orthonormal basis $\left(\vpg\right)_{\gamma\in \Gamma}$

315: of $\cH$, and given a sequence of strictly positive weights

316: ${\mathbf w}= (\ag )_{\gamma\in \Gamma}$,

317: we define the functional $\Phi_{\mathbf{w},p}$ by

318: \begin{equation}

319: \label{funct-gen}

320: \Phi_{\mathbf{w},p}(f)= \Delta(f)

321: +\sum_{\gamma \in \Gamma} \ag\; |\left<f,\vpg\right>|^p

322: = \|Kf-g\|^2 +\sum_{\gamma \in \Gamma} \ag |\left<f,\vpg\right>|^p ~.

323: \end{equation}

324:

325: For the special case $p=2$ and $\ag=\mu$ for all $\gamma \in \Gamma$

326: (we shall write this as $\mathbf{w}= \mu \mathbf{w}_{_0}$, where

327: $\mathbf{w}_{_0}$ is the sequence with all entries equal to 1),

328: this reduces

329: to the functional whose minimizer is given by \eref{tikh}.  If we consider the family of functionals

330: $\Phi_{\mu\mathbf{w}_{_0},p}(f)$, keeping the weights

331: fixed at $\mu$, but decreasing $p$ from 2 to 1, we gradually

332: {\em increase} the penalization

333: on ``small'' coefficients (those with $|\left<f,\vpg\right>| <1$)

334: while simultaneously {\em decreasing} the

335: penalization on ``large coefficients'' (for which

336: $|\left<f,\vpg\right>|>1$).  As far as the

337: penalization term is concerned,  we are thus putting a lesser penalty on

338: functions $f$

339: with large but few components with respect to the basis

340: $\left(\vpg\right)_{\gamma\in \Gamma}$, and a higher penalty on

341: sums of many small components, when compared to the classical method of

342: \eref{tikh}.

343: This effect is the more pronounced the

344: smaller $p$ is. By taking $p <2$, and especially

345: for the limit value $p=1$,

346: the proposed minimization procedure thus promotes

347: sparsity of the expansion of $f$ with respect

348: to the $\vpg$.

349: (We shall not consider $p < 1$ here,

350: because then the functional

351: ceases to be convex.)
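A simple example, with values chosen purely for illustration, makes this trade-off concrete. Let $f_{a}$ have a single nonzero coefficient, equal to 1, and let $f_{b}$ have $N$ nonzero coefficients, each equal to $N^{-1/2}$, so that $\|f_a\|=\|f_b\|=1$. With $\mathbf{w}=\mu\mathbf{w}_{_0}$ the penalization terms are
\begin{equation*}
\mu \sum_{\gamma \in \Gamma} |\left<f_a,\vpg\right>|^p = \mu
\quad \mbox{and} \quad
\mu \sum_{\gamma \in \Gamma} |\left<f_b,\vpg\right>|^p
= \mu\, N \cdot N^{-p/2} = \mu\, N^{1-p/2} ~.
\end{equation*}
For $p=2$ the two penalties coincide, whereas for $p=1$ the spread-out $f_b$ is penalized $\sqrt{N}$ times more heavily than the sparse $f_a$.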

352:

353: The bulk of this paper deals with  algorithms to obtain minimizers $f^*$

354: for the

355: functional

356: \eref{funct-gen}, for general operators $K$. In the special case where $K$

357: happens

358: to be diagonal in the $\vpg$--basis,

359: $K \vpg= \kappa_{\g} \vpg$,

360: the analysis is easy and straightforward.

361: Introducing the shorthand notation $\fg$ for $\left<f,\vpg\right>$ and

362: $g_{\gamma}$ for

363: $\left<g,\vpg\right>$, we have then

364: $$

365: \Phi_{\mathbf{w},p}(f)=

366: \sum_{\gamma \in \Gamma} \left[ |\kappa_{\g} \fg-g_{\gamma}|^2

367: + \ag |\fg|^p\right] ~.

368: $$

369: The minimization problem thus uncouples into a family of 1-dimensional

370: minimizations, and is

371: easily solved. Of particular interest is the almost trivial case where

372: (i) $K$ is the identity operator,

373: (ii) $\mathbf{w}=\mu \mathbf{w}_{_0}$ and (iii) $p=1$,

374: which corresponds to the practical situation where

375: the data $g$ are equal to a noisy version of $f$ itself, and we want to remove

376: the noise

377: (as much as possible), i.e. we wish to {\em denoise} $g$. In this case the

378: minimizing $f^\star$ is given by

379: \begin{equation}

380: \label{simple}

381: f^\star = \sum_{\gamma \in \Gamma} f^\star_{\g} \vpg

382: = \sum_{\gamma \in \Gamma} S_{\mu}(g_{\gamma}) \vpg~,

383: \end{equation}

384: where $S_{\mu}$ is the (nonlinear) thresholding operator from $\R$ to $\R$

385: defined by

386: \begin{equation}

387: \label{stau}

388: S_{\mu}(x)= \left\{

389:  \begin{array}{ccl} x +\mu/2 ~&~ \mbox{if} ~& x \leq - \mu/2  \\

390: 0 ~&~ \mbox{if} ~& |x| < \mu/2

391: \\ x- \mu/2 ~&~ \mbox{if} ~& x \geq  \mu/2.

392: \end{array} \right.

393: \end{equation}

394: (We shall revisit the derivation of \eref{simple} below. For simplicity,

395: we are assuming that all functions are real-valued. If the $\fg$ are complex,

396: a derivation similar

397: to that of \eref{simple} then leads to a complex thresholding operator, which

398: is defined as $S_{\mu}(r e^{i\theta})= S_{\mu}(r) e^{i\theta}$;

399: see Remark \ref{2-5} below.)
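The formula \eref{stau} can be checked by a one-dimensional computation; the following is only a sketch for real $x$, and the numerical values are chosen for illustration. For fixed $g_{\gamma}$, the function $x \mapsto (x-g_{\gamma})^2 + \mu |x|$ is strictly convex, and for $x \neq 0$ its derivative vanishes when
\begin{equation*}
2(x-g_{\gamma}) + \mu\, \mbox{sign}(x) = 0 ~, \quad \mbox{i.e.} \quad
x = g_{\gamma} - \frac{\mu}{2}\, \mbox{sign}(x) ~.
\end{equation*}
This equation has a solution with the required sign only if $|g_{\gamma}| > \mu/2$; otherwise the minimum is attained at $x=0$. For example, with $\mu=1$ the data coefficients $(2,\ 0.3,\ -1)$ are mapped to $(1.5,\ 0,\ -0.5)$.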

400:

401: In more general cases, especially when $K$ is not diagonal with respect

402: to the $\vpg$--basis, it is not as straightforward to minimize

403: \eref{funct-gen}.

404:

405: An approach that promotes sparsity with respect to a particular basis

406: makes sense only if we know that the objects $f$ that we want

407: to reconstruct do indeed have a sparse expansion with respect to this basis.

408: In the next subsection we list some situations in which this is the

409: case and to which the algorithm that we propose in this paper

410: could be applied.

411:

412: \subsection{Possible applications for sparsity-promoting constraints}

413: \subsection*{1.4.1 Sparse wavelet expansions}

414:

415: This is the application that was the primary motivation for this paper.

416: Wavelets provide orthonormal bases of $L^2(\R^d)$ with localization

417: in space and in scale; this makes them more suitable than e.g.

418:  Fourier expansions for an efficient

419: representation of functions that have space-varying smoothness properties.

420: Appendix \ref{WavBes} gives a very succinct overview of wavelets and their

421: link

422: with

423: a particular family of smoothness spaces, the Besov spaces. Essentially,

424: the Besov space $B^s_{p,q}(\R^d)$ is a space of functions on

425: $\R^d$ that ``have $s$ derivatives in $L^p(\R^d)$''; the

426: index $q$ provides some extra fine-tuning. The

427: precise definition involves the moduli of continuity of the function,

428: defined by finite differencing, instead of derivatives, and combines

429: the behavior of these moduli at different scales.

430: The Besov space $B^s_{p,q}(\R^d)$ is well-defined as

431: a complete metric space even if the indices $p,~q \in (0,\infty)$ are

432: $<1$, although it is no longer a Banach space in this case.

433: Functions that are mostly smooth, but that have a few local

434: ``irregularities'', nevertheless can still belong to a Besov space with

435: high smoothness index. For instance, the 1-dimensional function

436: $F(x)= \mbox{sign} (x)\; e^{-x^2}$ can belong to $B^s_{p,q}(\R)$ for

437: arbitrarily large $s$, provided $0<p<\left(s+\frac{1}{2}\right)^{-1}$. (Note that

438: this same example does not belong to any of the Sobolev spaces

439: $W^s_p(\R)$ with $s>0$, mainly because these can be defined only for

440: $p\geq 1$.) Wavelets provide unconditional bases for the Besov spaces,

441: and one can express whether or not a function $f$ on $\R^d$ belongs

442: to a Besov space by a fairly simple and completely explicit requirement

443: on the absolute values of the wavelet coefficients of $f$.

444: This expression becomes particularly simple when $p=q$; as

445: reviewed in Appendix \ref{WavBes},

446: $f \in B^s_{p,p}(\R^d)$ if and only if

447: \begin{equation*}

448: %\label{triple-simple}

449: \VVert f \VVert_{ _{s,p}} = \left( \sum_{\lambda \in \Lambda} 2^{\sigma p

450: |\lambda|} |\left<f, \Psi_{\lambda} \right> |^p \right) ^{1/p} < \infty~,

451: \end{equation*}

452: where $\sigma$ depends on $s,p$ and is defined by $\sigma =s +d \left(

453: \frac{1}{2}-\frac{1}{p} \right)$,

454: and where $|\lambda |$ stands for the scale of the wavelet

455: $\Psi_{\lambda}$. (The $\frac{1}{2}$ in the formula for $\sigma$ is

456: due to the choice of normalization

457: of the $\Psi_{\lambda}$, $\|\Psi_{\lambda}\|_{_{L^2}} =1$.)

458: For $p=q\geq 1$, $\VVert ~ \VVert_{ _{s,p}}$ is an equivalent norm to the

459: standard Besov norm on $B^s_{p,q}(\R^d)$; we shall restrict ourselves to

460: this case in this paper.

461:

462:

463: It follows that minimizing

464: the variational functional for an inverse

465: problem with a Besov space prior falls exactly within the category of

466: problems studied in this paper: for such an inverse problem,

467: with operator $K$ and with

468: the a priori knowledge that the object lies in some $B^s_{p,p}$, it

469: is natural to define the

470: variational functional to be minimized

471: by

472: $$

473: \Delta(f)+ \VVert f \VVert _{ _{s,p}}^p = \|Kf-g\|^2 + \sum_{\lambda \in

474: \Lambda}

475: 2^{\sigma p |\lambda|} |\left< f, \Psi_{\lambda} \right> |^p ~~,

476: $$

477: which is exactly of the type

478: $\Phi_{\mathbf{w},p}(f)$, as defined in \eref{funct-gen}.

479: For the case where $K$ is the identity operator, it was

480: noted already in \cite{Cha98}

481: that the wavelet-based algorithm for denoising

482: of data with a Besov prior, derived earlier in \cite{Don94},

483: amounts exactly to the minimization of

484: $\Phi_{\mu\mathbf{w}_{ _0},1}(f)$, where

485: $K$ is the identity operator and the $\vpg$--basis is a wavelet basis; the

486: denoised approximant given in \cite{Don94} then coincides

487: exactly with (\ref{simple}, \ref{stau}).

488:

489: It should be noted that if $d > 1$, and if we are interested in functions that

490: are mostly smooth, with possible jump discontinuities

491: (or other ``irregularities'')

492: on smooth manifolds of dimension 1 or higher (i.e. not point irregularities),

493: then the Besov spaces do not constitute the optimal smoothness

494: space hierarchy. For

495: $d=2$, for instance, functions $f$ that are $C^{\infty}$ on the

496: square $[0,1]^2$, except on a finite set of smooth curves, belong to

497: $B^1_{1,1}([0,1]^2)$, but not to $B^s_{1,1}([0,1]^2)$ for $s>1$.

498: In order to obtain

499: more efficient (sparser) expansions of functions of this type, other expansions

500: have to be used, using e.g. ridgelets or curvelets (\cite{Don00},

501: \cite{Can00}).

502: One can then again use the approach in this paper, with respect to these

503: more adapted bases.

504:

505:

506: \subsection*{1.4.2 Other orthogonal expansions}

507:

508: The framework of this paper applies to enforcing sparsity of the expansion of

509: the solution on any orthonormal basis. We provide here three examples which are

510: particularly relevant for applications, but the list is of course not exhaustive.

511:

512: The first example is the case where it is known a priori

513: that the object to be recovered is sparse in the Fourier domain, i.e. $f$

514: has only

515: a few nonzero Fourier components. It then makes sense to choose a standard

516:  Fourier

517: basis for the $\vpg$, and to apply the algorithms explained later in this

518: paper.

519: (They would have to be adapted to deal with complex functions, but

520: this is easily done; see Remark \ref{2-5} below.)

521: In the case where $K$ is the identity operator,

522: this is a classical problem, sometimes referred to as ``tracking sinusoids

523: drowned

524: in noise'', for which many other algorithms have been

525: developed.

526:

527: For other applications, objects are naturally sparse in the original

528: (space or time) domain. Then our framework can be used again if we expand such

529: objects in a basis formed by the characteristic functions of pixels or voxels.

530: Once the inverse problem is discretized in pixel space, it is regularized by

531: penalizing the $\ell^p$-norm of the object with $1 \le p \le 2$. Possible

532: applications include the restoration of astronomical images with scattered

533: stars

534: on a flat background. Objects formed by a few spikes are also typical of some

535: inverse

536: problems arising in spectroscopy or in particle sizing. In medical

537: imaging, $\ell^p$-norm

538: penalization with $p$ larger than but close to $1$ has been used e.g.

539: for the restoration of tiny blood vessels \cite{Li02}.

540:

541: The third example refers to the case where $K$ is compact and the use of SVD

542: expansions is a viable  computational approach, e.g. for the solution of

543: relatively  small-scale problems or

544: for operators that can be diagonalized in an analytic way. As already stressed

545: above, the linear regularization methods as given e.g. by \eref{tikh-b}

546: have the drawback that the penalization or cut-off eliminates the components

547: corresponding to the

548: smallest singular values, independently of the type of data. In some

549: instances,

550: the most

551: significant coefficients of the object may not correspond to the largest

552: singular

553: values; it may then happen that the object possesses

554: significant coefficients beyond

555: the cut-off imposed by linear methods. In order to avoid the elimination of

556: such

557: coefficients, it is preferable to use instead a

558: nonlinear regularization analogous to (\ref{simple}, \ref{stau}), with basis

559: functions $\vpg$ replaced

560: by the singular vectors $v_k$.

561: The theorems in this paper show that the {\it thresholded SVD expansion}

562: $$

563: f^* = \sum_{k=1}^{+\infty} S_{\mu/\sigma_k^2}

564: \left(\frac{\left< g,u_k\right>}{\sigma_k}

565: \right) v_k

566: ~=~ \sum_{k=1}^{+\infty}\frac{1}{\sigma_k^2} ~ S_{\mu}

567: \left( \sigma_k \left< g,u_k\right>

568: \right) v_k~,

569: $$

570: which is the minimizer of the functional \eref{funct-gen}  with

571: $\mathbf{w}=\mu \mathbf{w}_{_0}$ and $p=1$,

572: provides a regularized solution that is better adapted to these

573: situations.
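The equality of the two expressions for $f^*$ follows from a scaling property of the thresholding function, sketched here for real arguments. For any $c>0$, \eref{stau} gives $S_{c\mu}(cx)= c\, S_{\mu}(x)$; taking $c=\sigma_k^{-2}$ and $x=\sigma_k \left<g,u_k\right>$ yields
\begin{equation*}
S_{\mu/\sigma_k^2}\left(\frac{\left< g,u_k\right>}{\sigma_k}\right)
= \frac{1}{\sigma_k^2}\, S_{\mu}\left( \sigma_k \left< g,u_k\right> \right) ~.
\end{equation*}
In particular, the $k$-th coefficient of $f^*$ vanishes if and only if $|\left<g,u_k\right>| \leq \mu/(2\sigma_k)$, so that the threshold applied to the data grows as $\sigma_k$ decreases.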

574:

575: \subsection*{1.4.3 Frame expansions}

576:

577: In a Hilbert space $\cH$, a frame $\{\psi_n\}_{n \in \N}$ is a

578: set of vectors for which there exist

579: constants $A, B >0$ so that, for all $v \in \cH$,

580: $$

581: B^{-1} \sum_{n\in \N} |\left<v,\psi_n\right>|^2 \leq \|v\|^2 \leq

582: A^{-1} \sum_{n \in \N}

583: |\left< v, \psi_n\right>|^2  ~~.

584: $$

585: Frames always span the whole space $\cH$, but the frame vectors $\psi_n$

586: are

587: typically not linearly independent. Frames were first proposed

588: by Duffin and Schaeffer in \cite{DuSh52}; they are now used in a wide

589: range of

590: applications.

591: For particular choices of the frame vectors, the two frame bounds $A$

592: and

593: $B$ are equal; one has then, for all $ v \in \cH$,

594: \begin{equation}

595: \label{frame-1}

596: v = A^{-1} \sum_{n \in \N} \left<v, \psi_n \right> \psi_n ~.

597: \end{equation}

598: In this case, the frame is called {\em tight}.

599: An easy example of a frame is given by taking the union of two (or more)

600: different

601: orthonormal bases in $\cH$; these unions always constitute tight frames,

602: with $A=B$ equal to the number of orthonormal bases used in the union.
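This can be verified in one line. As a sketch, take two orthonormal bases $(e_n)_{n\in\N}$ and $(e'_n)_{n \in \N}$ of $\cH$ (names introduced here only for illustration). Parseval's identity applied to each basis gives, for all $v \in \cH$,
\begin{equation*}
\sum_{n \in \N} |\left<v,e_n\right>|^2 + \sum_{n \in \N} |\left<v,e'_n\right>|^2
= 2 \|v\|^2 ~,
\end{equation*}
so that the frame inequalities hold with $A=B=2$, and \eref{frame-1} reduces to the average of the two orthonormal expansions of $v$.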

603:

604: Frames are typically ``overcomplete'', i.e. they still span all of

605: $\cH$ even if some frame vectors are removed. It follows that, given

606: a vector $v$ in $\cH$, one can find many different sequences of

607: coefficients

608: such that

609: \begin{equation}

610: \label{frame-v}

611: v = \sum_{n \in \N} z_n \psi_n ~~.

612: \end{equation}

613: Among these sequences, some have special properties for which they

614: are preferred. There is, for instance,

615: a standard procedure to find the unique sequence

616: with minimal $\ell^2$-norm; if the frame is tight, then this

617: sequence is given by $z_n=A^{-1} \left<v,\psi_n\right>$,

618: as in \eref{frame-1}.

619:

620: The problem of finding sequences

621: $\mathbf{z}=(z_n)_{n \in \N}$ that satisfy \eref{frame-v} can be

622: considered as an inverse problem. Let us define the operator $K$

623: from $\ell^2 (\N)$ to $\cH$ that maps a sequence

624: $\mbox{{\bf z}}= (z_n)_{n \in \N}$ to the element $K \mbox{{\bf z}}$ of $\cH$

625: by

626: \begin{equation*}

627: %\label{frame-2}

628: K \mbox{{\bf z}} = \sum_{n \in \N} z_n \psi_n ~.

629: \end{equation*}

630: Then solving \eref{frame-v} amounts to solving $K\mathbf{z}=v$. Note

631: that

632: this operator $K$ is associated with, but not identical to

633: what is often called the ``frame operator''. One has, for

634: $v \in \cH$,

635: $$

636: K K^* v = \sum_{n \in \N} \left<v, \psi_n \right> \psi_n~ ;

637: $$

638: for $\mathbf{z} \in \ell^2$, the sequence $K^*K\mathbf{z}$ is given by

639: $$

640: (K^*K \mathbf{z})_k = \sum_{l \in \N} z_l \left< \psi_l, \psi_k \right> ~.

641: $$

642: In this framework, the sequence $\mathbf{z}$ of minimum

643: $\ell^2$-norm that satisfies \eref{frame-v} is given simply by

644: $\mathbf{z}^{\dagger}= K^{\dagger}v$. The standard procedure in frame

645: lore

646: for the construction of $\mathbf{z}^{\dagger}$ can be rewritten as

647: $\mathbf{z}^{\dagger}=K^*(KK^*)^{-1}v$, so that

648: $K^{\dagger}=K^*(KK^*)^{-1}$

649: in this case. This last formula holds because this inverse problem is

650: in fact well-posed: even though $\mN(K) \neq \{0\}$, there is a gap

651: in the spectrum of $K^*K$ between the eigenvalue 0 and the remainder

652: of the spectrum, which is contained in the interval $[A,B]$; the

653: operator

654: $KK^*$ has its spectrum completely within $[A,B]$. In practice, one

655: always

656: works with frames for which the ratio $B/A$ is reasonably close to 1, so

657: that the problem is not only well-posed but also well-conditioned.
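As a sketch of why this formula gives the minimum-norm sequence, note that $\mathbf{z}^{\dagger}=K^*(KK^*)^{-1}v$ satisfies
\begin{equation*}
K \mathbf{z}^{\dagger} = KK^*(KK^*)^{-1} v = v
\quad \mbox{and} \quad
\mathbf{z}^{\dagger} \in \mR(K^*) \subset \mN(K)^{\perp} ~;
\end{equation*}
any other solution of \eref{frame-v} differs from $\mathbf{z}^{\dagger}$ by an element of $\mN(K)$, which is orthogonal to $\mathbf{z}^{\dagger}$ and can therefore only increase the $\ell^2$-norm. For a tight frame, \eref{frame-1} shows that $KK^* = A\, I$, so that $\mathbf{z}^{\dagger} = A^{-1} K^* v$, i.e. $z_n = A^{-1}\left<v,\psi_n\right>$.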

658:

659: It is often of interest, however,

660: to find sequences

661: that are sparser than $\mathbf{z}^{\dagger}$.

662: For instance, one may know a priori that $v$ is a ``noisy'' version

663: of a linear combination of $\psi_n$ with a coefficient sequence of small

664: $\ell^1$-norm. In this case, it makes sense to determine a sequence

665: $\mathbf{z}_{\mu}$ that minimizes

666: \begin{equation}

667: \label{frame-3}

668: \|K\mathbf{z}-v\|^2_{\cH} + \mu \|{\mathbf{z}}\|_{\ell^1} ~,

669: \end{equation}

670: a problem that

671: falls exactly in the category of problems described in subsection 1.3.

672: Note that although the inverse problem for $K$ from $\ell^2(\N)$ to $\cH$

673: is well-defined, this need not be the case with the restriction

674: $K \big|_{\ell^1}$ from $\ell^1(\N)$ to $\cH$. One can indeed find

675: tight frames for which

676: $\sup \{ \|\mathbf{z}\|_{_{\ell^1}}~;~ \mathbf{z} \in \ell^1 \mbox{ and }

677: \|K\mathbf{z}\| \leq 1 \} = \infty$,

678: so that for arbitrarily large $R$ and arbitrarily small $\epsilon$,

679: one can find $\tilde{v} \in \cH$, $\tilde{\mathbf{z}} \in \ell^1$

680: with $\|\tilde{v}-K\tilde{\mathbf{z}}\| = \epsilon$, yet

681: $\inf \{ \|\mathbf{z}\|_{_{\ell^1}} ~; ~ \|\tilde{v}-K{\mathbf{z}} \|

682: \leq \epsilon/2 \} \geq R \|\tilde{\mathbf{z}}\|_{\ell^1}$.

683: In a noisy situation, it therefore may not make sense to search for the

684: sequence

685: with minimal $\ell^1$--norm that is ``closest'' to $v$; to find

686: an estimate of the $\ell^1$--sequences of which a given $v$ is known

687: to be a small perturbation, a better strategy is to compute the minimizer

688: $\mathbf{z}_{\mu}$ of \eref{frame-3}.
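
To illustrate how such a minimizer can differ from the minimum $\ell^2$-norm solution, here is a hypothetical two-element example (our own, not taken from the references). Take $\cH=\R$ and $\psi_1=\psi_2=1$, so that $K\mathbf{z}=z_1+z_2$. For a fixed value $s=z_1+z_2$, the smallest possible $\|\mathbf{z}\|_{\ell^1}$ equals $|s|$, and it is attained by every splitting with $z_1 z_2 \geq 0$. Minimizing \eref{frame-3} therefore reduces to minimizing $(s-v)^2+\mu|s|$ over $s\in\R$, which for $v>\mu/2$ gives $s=v-\mu/2$. With $v=1$ and $\mu=1$, every $\mathbf{z}$ with $z_1,z_2\geq 0$ and $z_1+z_2=1/2$ is a minimizer, including the sparse choices $(1/2,0)$ and $(0,1/2)$; the minimum $\ell^2$-norm solution of $K\mathbf{z}=v$, by contrast, is $(1/2,1/2)$, which has no vanishing entry. The non-uniqueness of the minimizer reflects the fact that $\mN(K)\neq\{0\}$ in this example.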

689:

690: Minimizing the functional \eref{frame-3} as an approach to obtain

691: sequences that provide sparse approximations $K\mathbf{z}$

692: to $v$ was

693: proposed and applied to various problems by Chen, Donoho and Saunders

694: \cite{CDS01};

695: in the statistical literature, least-squares regression with

696: $\ell^1$-penalty is known as the ``lasso'' \cite{Tib96}.

697: The algorithm in this paper thus provides an alternative to linear and

698: quadratic programming techniques for these problems,

699: which all amount to minimizing  \eref{frame-3}.

700:

701: % \newpage

702:

703: \subsection{A summary of our approach}

704:

705: Given an operator $K$ from $\cH$ to itself (or, more generally, from

706: $\cH$ to $\cH'$), and an orthonormal basis

707: $(\vpg)_{\gamma \in \Gamma}$, our goal is to find minimizers

708: $f^\star$ of the functionals $\Phi_{\mathbf{w},p}$ defined in section

709: 1.3. The corresponding variational equations are

710: \begin{equation}

711: \label{variational}

712: \forall \gamma \in \Gamma ~ : ~

713: \left< \K Kf, \vpg \right> - \left< \K g , \vpg \right>

714: + \frac{ \ag p}{2}~ | \left< f, \vpg\right> |^{p-1}

715: \mbox{sign}(\left< f, \vpg\right>) = 0 ~~.

716: \end{equation}

717: When $p \neq 2$ and $K$ is not diagonal in the

718: $\vpg$-basis, this gives a coupled system of

719: nonlinear equations for the $\left<f, \vpg \right>$,

720: and it is not immediately clear how to solve this system.

721: To bypass this problem, we

722: shall use a sequence of ``surrogate'' functionals that are each

723: easy to minimize,

724: and for which we expect, by a heuristic argument, that the successive

725: minimizers have our desired $f^\star$ as a limit.

726: These surrogate

727: functionals  are introduced in section 2 below. In section 3 we then show that

728: their successive minimizers do indeed converge to $f^\star$; we first

729: establish weak convergence, but conclude the section by proving that the

730: convergence also holds in norm. In section 4 we show that our proposed

731: iterative method is {\em stable}, in the sense given in subsection 1.2: if we

732: apply the algorithm to data that are a small perturbation of a ``true image''

733: $K f_o$,

734: then the algorithm will produce $f^\star$ that converge

735: to $f_o$ as the norm of the perturbation tends to zero.

736:

737: \subsection{Related work}

738:

739:

740: % version revisee par Ingrid

741: Exploiting the sparsity of the expansion on a given basis

742: of an unknown signal, in order to assist in the estimation

743: or approximation of the signal from noisy data, is not a new

744: idea. The key role played by sparsity to achieve superresolution

745: in diffraction-limited imaging was already emphasized by Donoho

746: \cite{Don92} more than a decade ago. Since the seminal paper by Donoho

747: and Johnstone \cite{Don94}, the use of thresholding techniques for

748: sparsifying the wavelet expansions of noisy signals in order to remove the

749: noise (the so-called ``denoising'' problem) has been abundantly discussed in

750: the literature, mainly in a statistical framework (see e.g.

751: the book \cite{Mal98}).

752: Of particular importance for the background of this paper is the article by

753: Chambolle et al. \cite{Cha98}, which provides a variational formulation

754: for denoising, through the use of penalties on a Besov-norm of

755: the signal; this is the perspective adopted in the present paper.

756:

757: Several attempts have been made to generalize the denoising framework

758: to solve inverse problems. To overcome the coupling problem stated in the

759: preceding subsection, a first approach is to construct wavelet- or

760: ``wavelet-inspired'' bases that are in some sense adapted to the operator to

761: be inverted. The so-called Wavelet-Vaguelette

762: Decomposition (WVD) proposed by Donoho \cite{Don95}, as well as the twin

763: Vaguelette-Wavelet Decomposition method \cite{Abr98}, and also the

764: deconvolution in mirror wavelet bases \cite{KMR03, Mal98} can all be

765: viewed as examples of this strategy. For the inversion of the Radon transform,

766: Lee and Lucier \cite{Lee01} formulated a

767: generalization of the WVD that uses a variational

768: approach to set thresholding levels. A drawback

769: of these methods is that they are limited to special types of operators $K$

770: (essentially convolution-type operators under some additional

771: technical assumptions).

772:

773: Other papers have explored the application of Galerkin-type methods to inverse

774: problems, using an appropriate but fixed wavelet basis \cite{Dic96, Lou97,

775: Coh02}. The underlying intuition is again that if the operator lends itself

776: to a

777: fairly  sparse representation in wavelets, e.g. if it is an operator of the

778: type

779: considered in \cite{Bey91}, and if the object is mostly smooth with some

780: singularities, then the inversion of the truncated operator will not be too

781: onerous, and the approximate representation of the object will do a good

782: job of

783: capturing the singularities. In \cite{Coh02}, the method is made

784: adaptive, so that

785: the finer-scale wavelets are used where lower scales indicate the presence of

786: singularities.

787:

788: The mathematical framework in this paper has the advantage of not

789: pre-supposing any

790: particular properties for the operator $K$ (other than boundedness) or the

791: basis

792: $(\vpg)_{\gamma\in \Gamma}$ (other than its orthonormality). We prove,

793: in complete

794: generality, that generalizing Tikhonov's regularization method from the

795: $\ell^2$-penalty case to an $\ell^1$-penalty (or, more generally, a weighted

796: $\ell^p$-penalty with $1\leq p\leq 2$) provides a proper regularization

797: method for

798: ill-posed problems in a Hilbert space $\cH$, with estimates that are

799: independent

800: of the dimension of $\cH$ (and are thus valid for infinite-dimensional

801: separable $\cH$). To our knowledge, this is the first proof of this fact.

802: Moreover, we derive a Landweber-type iterative algorithm that involves a

803: denoising procedure at each iteration step and provides a sequence of

804: approximations

805: converging in norm to the variational minimizer, with estimates of the rate of

806: convergence in particular cases. This algorithm was derived previously

807: in \cite{DeM02}, using,

808: as in this paper, a construction based on ``surrogate

809: functionals''. During the final editing of the present paper, our attention

810: was

811: drawn to the independent work by Figueiredo and Nowak \cite{Fig03, Nov01},

812: who, working in the different (finite-dimensional

813: and stochastic) framework of Maximum Penalized

814: Likelihood Estimation

815: for inverting a

816: convolution operator acting on objects that are sparse in the wavelet domain,

817: derive essentially the same iterative algorithm as in \cite{DeM02} and

818: this paper.

819:

820:

821: \section{An iterative algorithm through

822: surrogate functionals}

823: It is the combined presence of $\K Kf$ (which couples all the equations)

824: and the nonlinearity of the equations that makes the system

825: \eref{variational} unpleasant. For this reason, we borrow a technique

826: of optimization transfer (see e.g. \cite{Lan00} and \cite{DeP95}) and

827: construct surrogate functionals that effectively remove the term $\K Kf$. We

828: first pick a constant $C$ so that $ \|\K K \| < C$, and then

829: we define the functional $\Xi(f;a)=

830: C\|f-a\|^2-\|Kf-Ka\|^2$ which depends on an auxiliary element $a$ of

831: $\cH$.

832: Because $C\mbox{\I} - \K K$ is a strictly positive operator, $\Xi(f;a)$ is

833: strictly convex in $f$ for any choice of $a$. If $\|K\|<1$,

834: we are allowed to set $C=1$; for simplicity, we will restrict ourselves

835: to this case, without loss of generality since

836: $K$ can always be renormalized.

837: We then add  $\Xi(f ; a)$ to $\Phi_{\mathbf{w},p}(f)$ to form the following

838: ``surrogate functional''

839: \begin{eqnarray}

840: \label{sur}

841: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)&=& \Phi_{\mathbf{w},p}(f)

842: - \|Kf-Ka\|^2 +  \|f-a\|^2 \nonumber \\

843: &=& \|Kf-g\|^2 + \sum_{\gamma \in \Gamma} \ag |\left< f,

844: \vpg \right>|^p -  \|Kf-Ka\|^2 +  \|f-a\|^2 \nonumber \\

845: &=& \|f\|^2 - 2\left<f, a+\K g  - \K K a \right> + \sum_{\gamma}

846: \ag | \left< f, \vpg \right>|^p + \|g\|^2 + \|a\|^2 - \| K a

847: \|^2\nonumber \\ &=& \sum_{\gamma} \left[ \fg^2 -2 \fg

848: \left(a +\K g  - \K K a \right)_{\gamma} + \ag |\fg|^p

849: \right] + \|g\|^2 + \|a\|^2 - \| K a \|^2

850: \end{eqnarray}

851: where we have again used the shorthand $v_{\gamma}$ for $\left<v,\varphi_

852: {\gamma} \right>$, and implicitly assumed

853: that we are dealing with real functions only.
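
The passage from the second to the third line of \eref{sur} can be checked term by term (we spell it out here for the real case, as a reading aid):
\begin{eqnarray*}
\|Kf-g\|^2 - \|Kf-Ka\|^2 &=& -2\left<Kf,g\right> + \|g\|^2 + 2\left<Kf,Ka\right> - \|Ka\|^2 ~, \\
\|f-a\|^2 &=& \|f\|^2 - 2\left<f,a\right> + \|a\|^2 ~.
\end{eqnarray*}
The two $\|Kf\|^2$ terms cancel in the first line. Writing $\left<Kf,g\right>=\left<f,\K g\right>$ and $\left<Kf,Ka\right>=\left<f,\K Ka\right>$ and adding the two lines gives
$\|f\|^2 - 2\left<f, a+\K g-\K Ka\right> + \|g\|^2+\|a\|^2-\|Ka\|^2$,
in which $f$ enters quadratically only through $\|f\|^2=\sum_{\gamma} \fg^2$.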

854: Since $\Xi(f;a)$ is strictly convex in $f$,

855: ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)$ is also strictly convex

856: in $f$, and has a unique minimizer for any choice of $a$. The advantage of

857: minimizing \eref{sur} in place of \eref{variational} is that the variational

858: equations for the $\fg$ decouple. We can then try to approach the minimizer

859: of $\Phi_{\mathbf{w},p}(f)$ by an iterative process which goes as follows:

860: starting from an arbitrarily chosen $f^0$, we determine the minimizer

861: $f^1$ of \eref{sur} for $a = f^0$; each successive iterate $f^n$ is then

862: the minimizer for $f$ of

863: the surrogate functional \eref{sur} anchored at the previous iterate, i.e. for

864: $a= f^{n-1}$. The iterative algorithm thus goes as follows

865: \begin{equation}

866: \label{iter}

867: f^0 {\mbox\ {\rm arbitrary}}\ ; f^n= \mbox{{\rm arg--min}}

868: \left({\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; f^{n-1})\right)\ \ n=1, 2,\dots

869: \end{equation}

870: To gain some insight into this iteration,

871: let us first focus on two special cases.

872:

873: In the case where $\mathbf{w}=\mathbf{0}$ (i.e. the functional

874: $\Phi_{\mathbf{w},p}$ reduces to the discrepancy only), one needs to

875: minimize

876: $$

877: {\Phi}^{^{\SUR}}_{\mathbf{0},p}(f ; f^{n-1})=\|f\|^2-2\left<f,

878: f^{n-1} + \K (g - K f^{n-1}) \right> +\|g\|^2 + \|f^{n-1}\|^2 - \|Kf^{n-1}\|^2

879: ~;

880: $$

881: this leads to

882: \begin{equation*}

883: %\label{BPRes}

884: f^{n} = f^{n-1} + \K (g - Kf^{n-1})~~.

885: \end{equation*}

886:  This is

887: nothing other than the so-called Landweber iterative method, the convergence of

888: which  to the (generalized) solution of

889: $Kf=g$ is well-known (\cite{Lad51}; see also \cite{Ber98},

890: \cite{Eng96}).
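
A one-dimensional illustration (our own, with $\cH=\cH'=\R$ and $K$ given by multiplication by a number $k$ with $0<|k|<1$) shows the mechanism. The iteration reads $f^n=f^{n-1}+k(g-kf^{n-1})$, so that
\begin{equation*}
f^n-\frac{g}{k} = (1-k^2)\left(f^{n-1}-\frac{g}{k}\right) = (1-k^2)^n\left(f^0-\frac{g}{k}\right) ~,
\end{equation*}
and $f^n$ converges geometrically to the solution $g/k$ of $kf=g$. The rate $1-k^2$ approaches $1$ as $k \to 0$, which is the scalar counterpart of the slow convergence of the Landweber method along directions associated with small singular values of $K$.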

891:

892: In the case where $\mathbf{w}=\mu \mathbf{w_0}$ and $p=2$,

893: the $n$-th surrogate functional reduces to

894: $$

895: {\Phi}^{^{\SUR}}_{\mathbf{w},2}(f ; f^{n-1})=(1+\mu)\;

896: \|f\|^2-2\left<f,f^{n-1} + \K (g - K f^{n-1}) \right> +\|g\|^2 +

897: \|f^{n-1}\|^2 -

898: \|Kf^{n-1}\|^2 ~;

899: $$

900: the minimizer is now

901: \begin{equation}

902: \label{DamLan}

903: f^n = \frac{1}{1+\mu} \left[ f^{n-1} +\K (g -K f^{n-1}) \right] ~,

904: \end{equation}

905: i.e. we obtain a damped or regularized Landweber iteration

906: (see e.g. \cite{Ber98}). The convergence of the sequence $f^n$ defined by

907: (\ref{DamLan}) follows immediately from the estimate

908: $\|f^{n+1}-f^n\| = (1+\mu)^{-1} \|(\mbox{\I}-\K K)(f^n-f^{n-1})\|

909: \leq (1+\mu)^{-1} \|f^n -f^{n-1}\|$, showing that we have a contractive

910: mapping,

911: even if $\mN(K) \neq \{0\}$.
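
The limit can be identified explicitly (a short verification added here): a fixed point $f$ of \eref{DamLan} satisfies $(1+\mu)f = f+\K(g-Kf)$, i.e.
\begin{equation*}
(\K K+\mu I)\,f=\K g ~,
\end{equation*}
which is the Euler equation of the quadratic functional $\|Kf-g\|^2+\mu\|f\|^2$; for $\mu>0$ the operator $\K K+\mu I$ is invertible, so the fixed point is unique. In the scalar case $K=k$ (an illustration of ours), the iteration contracts with factor $(1-k^2)/(1+\mu)$ and converges to $kg/(k^2+\mu)$.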

912:

913: In these two special cases we thus find that the $f^n$ converge as

914: $n \rightarrow \infty$. This permits one to hope that the $f^n$ will

915: converge for general $\mathbf{w}, p$ as well; whenever this is the

916: case the difference

917: $\|f^{n}-f^{n-1}\|^2 - \|K(f^{n}-f^{n-1})\|^2$ between

918: ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^n ; f^{n-1})$ and

919: $\Phi_{\mathbf{w},p}(f^n)$ tends to zero as $n \rightarrow \infty$,

920: suggesting that the minimizer

921: $f^n$ for the first functional could well tend to a minimizer

922: $f^\star$ of the second.

923: In section 3 we shall see that all this is more than a pipe-dream; i.e. we

924: shall prove that the $f^n$ do indeed converge to a minimizer  of

925: $\Phi_{\mathbf{w},p}$.

926:

927: In the remainder of this section, we derive an explicit formula

928: for the computation of the successive

929: $f^n$. We first discuss the minimization of the functional

930: \eref{sur} for a generic $a \in \cH$.

931: As already noticed, the variational equations for the $\fg$

932: decouple. For $p>1$, the summand in \eref{sur} is differentiable in $\fg$, and

933: the minimization reduces to solving the variational equation

934: \begin{equation*}

935: %\label{vareq-pneq1}

936: 2 \fg + p \, \ag \, \mbox{sign}(\fg) |\fg|^{p-1} = 2( a_{\gamma} + [\K (g-K

937: a)]_{\gamma}) ~;

938: \end{equation*}

939: since for any  $w \geq 0$ and any $p>1$,

940: the real function $F_{w,p}(x)=x+ {\frac{w p}{2}} ~ \mbox{sign}(x)|x|^{p-1}$ is

941: a one-to-one map

942: from $\mathbb{R}$ to itself,

943: we thus find that the minimizer of \eref{sur} satisfies

944: \begin{equation}

945: \label{solcomp-pneq1}

946: \fg= S_{\ag,p}( a_{\gamma} + [\K (g-K a)]_{\gamma}) ~,

947: \end{equation}

948: where $S_{w,p}$ is defined by

949: \begin{equation}

950: \label{S-pneq1}

951: S_{w,p}= \left( F_{w,p} \right)^{-1} ~, ~ \mbox{for } p>1.

952: \end{equation}
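
Two instances in which $S_{w,p}$ can be written in closed form may help fix ideas (these worked cases are added here for illustration). For $p=2$, $F_{w,2}(x)=(1+w)x$, so that $S_{w,2}(y)=y/(1+w)$, in agreement with the damping factor in \eref{DamLan}. For $p=3/2$, $F_{w,3/2}(x)=x+\frac{3w}{4}\,\mbox{sign}(x)|x|^{1/2}$; setting $t=|x|^{1/2}$, the equation $F_{w,3/2}(x)=y$ becomes $t^2+\frac{3w}{4}\,t-|y|=0$, whence
\begin{equation*}
S_{w,3/2}(y)=\mbox{sign}(y)\left[\frac{-\frac{3w}{4}+\sqrt{\frac{9w^2}{16}+4|y|}}{2}\right]^2 ~.
\end{equation*}
For other values of $p$ one would presumably evaluate $S_{w,p}(y)$ numerically, for instance by bisection, which is applicable because $F_{w,p}$ is continuous and strictly increasing.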

953:

954: When $p=1$, the summand of  \eref{sur} is differentiable in $\fg$

955: only if $\fg \neq 0$; except at the point of non-differentiability,

956: the variational equation now reduces to

957: \begin{equation*}

958: %\label{vareq-peq1}

959: 2 \fg + \ag \,  \mbox{sign}(\fg) = 2 ( a_{\gamma} + [\K (g-K a)]_{\gamma})

960: ~.

961: \end{equation*}

962: For $\fg>0$, this leads to $\fg= a_{\gamma} + [\K (g-K

963: a)]_{\gamma} -

964: \ag/2$; for consistency we must impose

965: in this case that $a_{\gamma} + [\K (g-K

966: a)]_{\gamma}

967: > \ag/2$. For $\fg <0$, we obtain

968: $\fg=a_{\gamma} + [\K (g-K

969: a)]_{\gamma}+\ag/2$, valid only when

970: $a_{\gamma} + [\K (g-K

971: a)]_{\gamma} < -\ag/2$. When

972: $a_{\gamma} + [\K (g-K

973: a)]_{\gamma}$ does not satisfy either of the two

974: conditions, i.e. when $|a_{\gamma} + [\K (g-K

975: a)]_{\gamma}| \leq \ag/2$,

976: we put $\fg =0$. Summarizing,

977: \begin{equation}

978: \label{solcomp-peq1}

979: \fg = S_{\ag,1}(a_{\gamma}+[ \K( g - K a)]_{\gamma}) ~,

980: \end{equation}

981: where the function $S_{w,1}$ from $\mathbb{R}$ to itself is defined by

982: \begin{equation}

983: \label{S-peq1}

984: S_{w,1}(x)=\left\{ \begin{array}{ccl} {x-w/2} & {\mbox{if}} & {x \geq w/2} \\

985: {0} & {\mbox{if}} & {|x| < w/2} \\ {x+w/2 }& {\mbox{if}} & {x \leq -w/2  ~~.}

986: \end{array} \right.

987: \end{equation}

988: (Note that this is the same nonlinear function as encountered

989: earlier in section 1.3, in definition \eref{stau}.)
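
For instance, with $w=1$ (so that the threshold is $w/2=1/2$), one finds $S_{1,1}(2)=3/2$, $S_{1,1}(0.3)=0$ and $S_{1,1}(-0.8)=-0.3$: entries of modulus at most $w/2$ are set to zero, and all others are moved towards zero by $w/2$. That $\fg=0$ is the right choice when the argument lies below the threshold can also be seen directly. Writing $y=a_{\gamma}+[\K(g-Ka)]_{\gamma}$, the $\gamma$-th summand in \eref{sur}, viewed as a function of $x=\fg$, satisfies
\begin{equation*}
x^2-2xy+\ag|x|\ \geq\ x^2+|x|\,(\ag-2|y|) ~,
\end{equation*}
and whenever $|y|\leq \ag/2$ the right-hand side is nonnegative and vanishes only at $x=0$.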

990:

991: The following proposition summarizes our findings, and proves (the case

992: $p=1$ is not conclusively proved by the variational equations above)

993: that we have indeed found the minimizer of

994: ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)$:

995: \begin{proposition}

996: \label{prop-2-1} Suppose the operator $K$ maps a Hilbert space $\cH$

997: to another Hilbert space $\cH'$, with $\|\K K\| < 1$,

998: and suppose $g$ is an element of $\cH'$.

999: Let $(\vpg)_{\gamma \in \Gamma}$

1000: be an orthonormal basis for $\cH$, and

1001: let $\mathbf{w}=(\ag)_{\g \in \Gamma}$ be a sequence

1002: of strictly positive numbers. Pick

1003: arbitrary $p \geq 1$ and $a \in \cH$. Define the

1004: functional ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)$ on $\cH$ by

1005: $$

1006: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)=\|Kf-g\|^2 + \sum_{\g \in

1007: \Gamma} \ag |\fg|^p

1008: +\|f-a\|^2-\|K(f-a)\|^2 ~.

1009: $$

1010: Then ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)$ has a unique minimizer

1011: in $\cH$. \\

1012: This minimizer

1013: is given by $f=\S_{\mathbf{w},p}\left(a +\K (g-Ka) \right)$, where

1014: the operators $\S_{\mathbf{w},p}$ are defined by

1015: \begin{equation}

1016: \label{def-SS}

1017: \S_{\mathbf{w},p}(h)= \sum_{\gamma} S_{\ag,p}(h_{\gamma}) \vpg ~~,

1018: \end{equation}

1019: with the functions $S_{w,p}$ from $\R$ to itself given by

1020: {\rm (\ref{S-pneq1}, \ref{S-peq1})}. For all $h \in \cH$, one has

1021: $$

1022: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f+h ; a)

1023: \geq {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ;a)+\|h\|^2~.

1024: $$

1025: \end{proposition}

1026: {\em Proof:}

1027: The cases $p>1$ and $p=1$ should be treated slightly differently. We discuss

1028: here only the case $p=1$; the simpler case $p>1$ is left to the reader.

1029:

1030: Take $f'=f+h$, where $f$ is as defined in the Proposition, and $

1031: h \in \cH$

1032: is arbitrary. Then

1033: $$

1034: {\Phi}^{^{\SUR}}_{\mathbf{w},1}(f+h ; a) =

1035: {\Phi}^{^{\SUR}}_{\mathbf{w},1}(f ; a)+ 2 \left<h,f-a -\K (g

1036: -Ka) \right>

1037: +\sum_{\g \in \Gamma} \ag (|\fg +h_{\g}| -|\fg|) + \|h\|^2  ~.

1038: $$

1039: Define now $\Gamma_{_{\!0}}=\{\g \in \Gamma; \fg=0\}$, and

1040: $\Gamma_{_{\!1}}=\Gamma \setminus

1041: \Gamma_{_{\!0}}$.

1042: Substituting the explicit expression \eref{solcomp-peq1} for the $\fg$, we

1043: have then

1044: \begin{eqnarray*}

1045: {\Phi}^{^{\SUR}}_{\mathbf{w},1}(f+h ; a)-

1046: {\Phi}^{^{\SUR}}_{\mathbf{w},1}(f; a) &= &\|h\|^2

1047: + \sum_{\g \in \Gamma_{_{\!0}}}

1048: \left[ \ag |h_{\g}| - 2 h_{\g} (a_{\g} +[ \K (g -K a)]_{\g} ) \right] \\

1049: &&~~~~~~~~~~~~

1050: +\sum_{\g \in \Gamma_{_{\!1}}} \left( \ag |\fg + h_{\g}| - \ag |\fg| + h_{\g}

1051: [-\ag  ~ \mbox{sign}(\fg) ]\right) ~.

1052: \end{eqnarray*}

1053: For $\g \in \Gamma_{_{\!0}}$, $~2|a_{\g} +[ \K (g -K a)]_{\g}| \leq

1054: \ag$, so

1055: that

1056: $\ag |h_{\g}| - 2 h_{\g}\, (a_{\g} +[ \K (g -K a)]_{\g}) \geq  0$.\\

1057: If $\g \in \Gamma_{_{\!1}}$, we distinguish two cases, according to the sign

1058: of $\fg$\, . If $\fg >0$, then\\

1059: $\ag |\fg + h_{\g}| - \ag |\fg| + h_{\g}

1060: [-\ag  ~ \mbox{sign}(\fg) ]= \ag [ |\fg +h_{\g}| - (\fg + h_{\g}) ] \geq 0$.

1061: If $\fg <0$, then $\ag |\fg + h_{\g}| - \ag |\fg| + h_{\g}

1062: [-\ag  ~ \mbox{sign}(\fg) ]= \ag [ |\fg +h_{\g}| + (\fg + h_{\g}) ] \geq 0$.\\

1063: It follows that ${\Phi}^{^{\SUR}}_{\mathbf{w},1}(f+h;a)-

1064: {\Phi}^{^{\SUR}}_{\mathbf{w},1}(f;a) \geq \|h\|^2 $, which

1065: proves the Proposition.

1066:  \hfill \QED

1067:

1068: \bigskip

1069:

1070: For later reference it is useful to point out that

1071: \begin{lemma}

1072: \label{SS-non-exp}

1073: The operators $\S_{\mathbf{w},p}$ are non-expansive, i.e.

1074: $\forall v, ~ v' \in \cH$, $\| \S_{\mathbf{w},p} v - \S_{\mathbf{w},p} v'

1075: \| \leq

1076: \|v-v'\|~.$

1077: \end{lemma}

1078: {\em Proof:}

1079: As shown by \eref{def-SS},

1080: $$

1081: \|\S_{\mathbf{w},p} v - \S_{\mathbf{w},p} v' \|^2 =

1082: \sum_{\g \in \Gamma} |S_{\ag,p}( v_{\g}) - S_{\ag,p}( v'_{\g})|^2 ~,

1083: $$

1084: which means that it suffices to show that, $\forall x , x' \in \R$, and all

1085: $w\geq0$,

1086: \begin{equation}

1087: \label{S-non-exp}

1088: | S_{w,p}(x)-S_{w,p}(x')| \leq |x-x'|~.

1089: \end{equation}

1090: If $p >1$, then $S_{w,p}$ is the inverse of the function $F_{w,p}$; since

1091: $F_{w,p}$ is differentiable with derivative uniformly bounded below by 1,

1092: \eref{S-non-exp} follows immediately in this case.\\

1093: If $p=1$, then $S_{w,1}$ is not differentiable at $x=w/2$ or $x=-w/2$, and

1094: another

1095: argument must be used. For the sake of definiteness, let us assume $x \geq x'$.

1096: We will just check all the possible cases.

1097: If $x$ and $x'$ have the same sign and $|x|,~|x'|\geq w/2$, then

1098: $| S_{w,1}(x)-S_{w,1}(x')| =|x-x'|$. If $x'\leq -w/2$ and $x \geq w/2$, then

1099: $| S_{w,1}(x)-S_{w,1}(x')| = x +|x'|-w \leq |x-x'|$. If

1100: $x \geq w/2$ and $|x'| < w/2$, then

1101: $| S_{w,1}(x)-S_{w,1}(x')| =x-w/2 < |x-x'|$. A symmetric argument applies

1102: to the

1103: case $|x|<w/2$ and $x' \leq -w/2$. Finally, if both $|x|$ and $|x'|$ are

1104: less than

1105: $w/2$, we have

1106: $| S_{w,1}(x)-S_{w,1}(x')|=0 \leq |x-x'|$. This establishes \eref{S-non-exp}

1107: in all cases. \hfill \QED

1108:

1109: \bigskip

1110:

1111: Having found the minimizer of a generic

1112: ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)$, we can

1113: apply this to the

1114: iteration \eref{iter}, leading to

1115:

1116: \begin{corollary}

1117: \label{cor-2-2}

1118: Let $\cH$, $\cH'$, $K$, $g$, $\mathbf{w}$ and $(\vpg)_{\g \in \Gamma}$ be

1119: as in Proposition {\rm \ref{prop-2-1}}. Pick $f^0$ in $\cH$, and define

1120: the functions $f^n$ recursively by the algorithm {\rm \eref{iter}}.

1121: Then

1122: \begin{equation}

1123: \label{f-n}

1124: f^n= \S_{\mathbf{w},p}\left(f^{n-1}+ \K (g-Kf^{n-1}) \right) ~.

1125: \end{equation}

1126: \end{corollary}

1127: {\em Proof:} This follows immediately from Proposition \ref{prop-2-1}.

1128: $~~~~~~~~~~~$ \hfill \QED
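
To see \eref{f-n} at work in the simplest setting, consider the following scalar example (the numbers are ours and purely illustrative): $\cH=\cH'=\R$, $K=k=1/2$, $g=2$, $p=1$, $w=1/2$ and $f^0=0$. Then $f^{n-1}+\K(g-Kf^{n-1})=\frac34 f^{n-1}+1$, and since this quantity stays above the threshold $w/2=1/4$ for all $f^{n-1}\geq 0$, the thresholding simply subtracts $1/4$:
\begin{equation*}
f^n=\tfrac34 f^{n-1}+\tfrac34~,\qquad f^1=0.75~,\quad f^2=1.3125~,\quad f^3=1.734375~,\ \ldots~,\qquad
f^n=3\left(1-(3/4)^n\right)\xrightarrow[n\to\infty]{~}3~.
\end{equation*}
The limit agrees with the minimizer of $(kf-g)^2+w|f|$ obtained by setting the derivative $2k(kf-g)+w$ to zero for $f>0$, namely $f^\star=(kg-w/2)/k^2=3$.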

1129:

1130: \begin{remark}

1131: \label{op-D}

1132: {\rm In the argument above, we used essentially only two ingredients: the

1133: (strict)

1134: convexity of $\|f-a\|^2-\|K(f-a)\|^2$, and the presence of the negative

1135: $-\|Kf\|^2$

1136: term in this expression, canceling the $\|Kf\|^2$ in the original

1137: functional. We can use this observation to present a slight

1138: generalization, in which the identity operator used to upper bound $ \K K

1139: $ is replaced by a more general operator $D$ that is diagonal in the

1140: $\vpg$--basis,}

1141: $$

1142: D\, \vpg = d_{\gamma} \vpg~,

1143: $$

1144: {\rm and that still gives a strict upper bound for $ \K K$, i.e. satisfies}

1145: $$

1146: D \geq K^*K + \eta I\ \  \mbox{\rm for some } \eta >0\ .

1147: $$

1148: {\rm In this case, the whole construction still carries through, with slight

1149: modifications; the successive $f^n$ are now given by }

1150: \begin{equation*}

1151: %\label{generalization-comp}

1152: \fg ^n = S_{\ag/d_{\gamma},p}\left(\fg ^{n-1} +

1153: \frac {[\K (g-Kf^{n-1})]_{\gamma}}{d_{\gamma}} \right) ~~.

1154: \end{equation*}

1155: {\rm Introducing the notation $\mathbf{w/d}$ for the sequence

1156: $(\ag/d_{\gamma})_{\gamma}$, we can rewrite this as }

1157: \begin{equation*}

1158: %\label{generalization}

1159: f^n = \S_{\mathbf{w/d},p}\left( f^{n-1} +D^{-1}[\K (g-Kf^{n-1})] \right) ~.

1160: \end{equation*}

1161: {\rm

1162: For the sake of simplicity of notation,

1163: we shall restrict ourselves

1164: to the case $D = \mbox{\I}$.}

1165: \end{remark}
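
A brief sketch of why the iteration in Remark \ref{op-D} takes this form may be useful (this is our reconstruction of the ``slight modifications'', written for the real case). Replacing $\|f-a\|^2$ by $\left<D(f-a),f-a\right>$ in the surrogate functional, the $\gamma$-th summand becomes, up to terms independent of $f$,
\begin{equation*}
d_{\gamma} \fg^2-2\fg\left(d_{\gamma} a_{\gamma}+[\K(g-Ka)]_{\gamma}\right)+\ag|\fg|^p
= d_{\gamma}\left[\fg^2-2\fg\left(a_{\gamma}+\frac{[\K(g-Ka)]_{\gamma}}{d_{\gamma}}\right)
+\frac{\ag}{d_{\gamma}}|\fg|^p\right]~.
\end{equation*}
The bracket is the summand in \eref{sur} with $\ag$ replaced by $\ag/d_{\gamma}$ and $[\K(g-Ka)]_{\gamma}$ replaced by $[\K(g-Ka)]_{\gamma}/d_{\gamma}$, so the componentwise minimization proceeds as before. The division is legitimate because $d_{\gamma}=\left<D\vpg,\vpg\right>\geq \|K\vpg\|^2+\eta\geq\eta>0$.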

1166:

1167: \begin{remark}

1168: \label{2-5}

1169: {\rm If we deal with complex rather than real functions, and the $f_{\gamma}$,

1170: $(K^*g)_{\gamma}, \ldots$ are complex quantities, then the derivation

1171: of the minimizer of ${\Phi}^{^{\SUR}}_{\mathbf{w},1}(f; a)$

1172: has to be adapted somewhat. Writing

1173: $f_{\gamma}= r_{\gamma} e^{i \theta_{\gamma}}$, with

1174: $ r_{\gamma} \geq 0$, $\theta_{\gamma} \in [0,2 \pi )$,

1175: and likewise $(a +K^*g -K^*Ka)_{\gamma} = R_{\gamma} e^{i \Theta_{\gamma}}$,

1176: we find, instead of \eref{sur},

1177: $$

1178: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f; a)

1179: = \sum_{\gamma} [ r_{\gamma}^2+ w_{\gamma}r_{\gamma}^p

1180: - 2r_{\gamma}R_{\gamma}\cos(\theta_{\gamma}-\Theta_{\gamma})]

1181: +\|g\|^2+\|a\|^2-\|Ka\|^2~.

1182: $$

1183: Minimizing over $r_{\gamma} \in [0,\infty)$ and

1184: $\theta_{\gamma} \in [0,2 \pi)$ leads to

1185: $\theta_{\gamma}=\Theta_{\gamma}$ and

1186: $r_{\gamma}=S_{w_{\gamma},p}(R_{\gamma})$.

1187: If we extend the definition

1188: of $S_{\mu,p}$ to complex arguments by setting

1189: $S_{\mu,p}(r e^{i \theta}) = S_{\mu,p}(r) e^{i \theta}$,

1190: then this still leads to

1191: $\fg= S_{w_{\gamma},p}\left(a_{\gamma} +[K^*(g -Ka)]_{\gamma} \right)$,

1192: as in (\ref{solcomp-pneq1}, \ref{solcomp-peq1}).

1193: The arguments of the different proofs still

1194: hold for this complex version, after minor and straightforward modifications.}

1195: \end{remark}

1196: %\newpage

1197:

1198:

1199: \section{Convergence of the iterative algorithm}

1200:

1201: In this section we discuss the convergence of the sequence $(f^n)_{n \in \N}$

1202: defined by {\rm \eref{f-n}}. The main result of this section is the

1203: following theorem:

1204: \begin{theorem}

1205: \label{th-3-1}

1206: Let $K$ be a bounded linear operator from

1207: $\cH$ to $\cH'$, with norm strictly bounded by $1$. Take $p \in [1,2]$, and

1208: let $\S_{\mathbf{w},p}$ be the shrinkage operator defined by {\rm

1209: \eref{def-SS}},

1210: where the sequence $\mathbf{w}=(\ag)_{\g \in \Gamma}$ is uniformly

1211: bounded

1212: below away from zero, i.e. there exists a constant $c>0$ such that $\forall \g

1213: \in

1214: \Gamma:$

1215: $\ag \geq c$.

1216: Then the sequence of iterates

1217: \begin{equation*}

1218: f^n=\S_{\mathbf{w},p}\left( f^{n-1} + \K (g- Kf^{n-1})\right)\ ,\quad

1219: n=1,2,\dots\;,

1220: \end{equation*}

1221: with $f^0$ arbitrarily chosen in $\cH$, converges strongly to a minimizer

1222: of the functional

1223: \begin{equation*}

1224: \Phi_{\mathbf{w},p}(f) = \| Kf-g\|^2 +  \Vvert f\Vvert_{\mathbf{w},p}^p\ ,

1225: \end{equation*}

1226: where $\Vvert f\Vvert_{\mathbf{w},p}$ denotes the norm

1227: \begin{equation}

1228: \label{triple-norm}

1229: \Vvert f\Vvert_{\mathbf{w},p} = \left[ \sum_{\g \in \Gamma} \ag

1230: |\left<f,\vpg \right>|^p

1231: \right]^{1/p} ~,~ 1 \leq p \leq 2~.

1232: \end{equation}

1233: If either $p>1$ or {\rm{N}}$(K)=\{0\}$, then the minimizer $f^\star$ of

1234: $\Phi_{\mathbf{w},p}$ is unique, and every sequence of iterates

1235: $f^n$ converges strongly to $f^\star$, regardless of the choice of $f^0$.

1236: \end{theorem}

1237:

1238: By ``strong convergence'' we mean convergence in the norm of $\cH$, as

1239: opposed to

1240: weak convergence.

1241: This theorem will be proved in several stages. To start, we prove weak

1242: convergence,

1243: and we establish that the weak limit is indeed a minimizer of

1244: $\Phi_{\mathbf{w},p}$.

1245: Next, we prove that the convergence holds in norm, and not only in the weak

1246: topology.

1247: To lighten our formulas, we introduce the shorthand notation

1248: $$

1249: \T f = \S_{\mathbf{w},p}\left( f + \K (g-Kf)\right)~;

1250: $$

1251: with this new notation we have $f^n= \T^n f^0$.

1252:

1253: \subsection{Weak convergence of the $f^n$}

1254:

1255: To prove weak convergence of the $f^n=\T^n f^0$, we apply the following

1256: theorem, due to

1257: Opial \cite{Opi67}:

1258: \begin{theorem}

1259: \label{thm_opial}

1260: Let

1261: the mapping $\A $ from $\cH$ to $\cH$ satisfy the following

1262: conditions:

1263: \begin{enumerate}

1264: \item[{\rm (i)}] $\A $ is non-expansive: $\forall v, v' \in \cH$,

1265: $\| \A  v -  \A  v'\| \leq \| v - v'\|$,

1266: \item[{\rm (ii)}] $\A $ is asymptotically regular: $\forall v \in \cH$,

1267: $\| \A ^{n+1}v -\A ^n v\|

1268: \xrightarrow[n \to \infty ]{~}  0$ ,

1269: \item[{\rm (iii)}] the set ${\cal F}$ of the fixed points of $\A $ in

1270: ${\cH}$ is

1271: not empty.

1272: \end{enumerate}

1273: Then, $\forall v \in \cH$, the sequence $(\A ^n v)_{n \in \N}$

1274: converges weakly to a fixed point in ${\cal F}$.

1275: \end{theorem}

1276:

1277: Opial's original proof can be simplified;

1278: we provide the simplified proof (still mainly

1279: following Opial's approach) in Appendix \ref{Opial}.

1280: (The theorem is slightly more general than what is stated

1281: in Theorem \ref{thm_opial} in that the mapping $\A $ need not be

1282: defined on all of $\cH$; it suffices that it map a closed convex subset of

1283: $\cH$ to itself -- see Appendix \ref{Opial}.

1284: \cite{Opi67} also contains additional refinements,

1285: which we shall not need here.)

1286: One of the Lemmas stated and proved in the Appendix

1287: will be invoked in its own right, further below in this section;

1288: for the reader's convenience, we state

1289: it here in full as well:

1290:

1291: \begin{lemma}

1292: \label{lem-3-2}

1293: Suppose the mapping $\A$ from $\cH$

1294: to $\cH$ satisfies the conditions {\rm (i)} and {\rm (ii)}

1295: in Theorem {\rm\ref{thm_opial}}. If

1296: a subsequence of $(\A ^n v)_{n\in \mathbb{N}}$

1297: converges weakly in $\cH$, then its limit is a fixed point of $\A$.

1298: \end{lemma}

1299:

1300: In order to apply Opial's Theorem to our nonlinear operator

1301: $\T$, we need to verify that it satisfies the three conditions in Theorem

1302: \ref{thm_opial}. We do this in the following series of lemmas. We first have

1303:

1304: \begin{lemma}

1305: \label{nonexp}

1306: The mapping $\T$ is non-expansive, i.e. $\forall v, v' \in \cH$

1307: \begin{equation*}

1308: \|\T  v -  \T  v' \| \leq \| v - v' \| \ .

1309: \end{equation*}

1310: \end{lemma}

1311: {\em Proof:}

1312: It follows from Lemma \ref{SS-non-exp}

1313: that the shrinkage operator ${\S_{\mathbf{w},p}}$ is

1314: non-expansive. Hence we have

1315: \begin{eqnarray*}

1316: \|{\T} v - {\T} v' \| &\leq& \| (I-K^*K) v - (I-K^*K)

1317: v' \|\\

1318: &\leq& \| I-K^*K  \|\ \| v - v' \|  \leq \| v - v' \|

1319: \end{eqnarray*}

1320: because we assumed $\| K \| < 1$.

1321: \hfill\QED\bigskip
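
The last inequality in this proof uses only that $\K K$ is a positive operator of norm $\|K\|^2<1$ (a standard step, spelled out here for completeness): for every $v\in\cH$,
\begin{equation*}
0\leq \left<\K K v,v\right>=\|Kv\|^2\leq\|K\|^2\|v\|^2
\quad\Longrightarrow\quad
(1-\|K\|^2)\,\|v\|^2\leq\left<(I-\K K)v,v\right>\leq\|v\|^2~,
\end{equation*}
so that the self-adjoint operator $I-\K K$ has its spectrum in $[1-\|K\|^2,1]$, and hence $\|I-\K K\|\leq 1$. Equality holds, for instance, whenever $\mN(K)\neq\{0\}$, which is why this argument yields non-expansiveness of $\T$ but not strict contractivity.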

1322:

1323: This verifies that $\T$ satisfies the first condition (i) in Theorem

1324: \ref{thm_opial}.

1325: To verify the second condition, we first prove some auxiliary lemmas.

1326: \begin{lemma}

1327: \label{cost2}

1328: Both $\left({\Phi}_{\mathbf{w},p}(f^n)\right)_{n \in \N}$ and

1329: $\left({\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^{n+1} ; f^n)\right)_{n

1330: \in

1331: \N}$ are  non-increasing sequences.

1332: \end{lemma}

1333: {\em Proof:} For the sake of convenience, we introduce the operator

1334: $L = \sqrt{I -\K K }$, so that $\|h\|^2-\|Kh\|^2= \|Lh\|^2$. Because

1335: $f^{n+1}$ is the

1336: minimizer of the functional

1337: ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; f^n)$, we have

1338: \begin{equation*}

1339: \Phi_{\mathbf{w},p}(f^{n+1})+\| L(f^{n+1}-f^n)\|^2 =

1340: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^{n+1} ; f^n)

1341: \leq

1342: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^n ; f^n)=\Phi_{\mathbf{w},p}(f^n)\

1343: ,

1344: \end{equation*}

1345: and therefore

1346: \begin{equation*}

1347: \Phi_{\mathbf{w},p}(f^{n+1})\leq \Phi_{\mathbf{w},p}(f^n) \ .

1348: \end{equation*}

1349: On the other hand

1350: \begin{equation*}

1351: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^{n+2} ; f^{n+1})\leq

1352: \Phi_{\mathbf{w},p}(f^{n+1})

1353: \leq \Phi_{\mathbf{w},p}(f^{n+1})+

1354: \|L(f^{n+1}-f^n)\|^2={\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^{n+1} ; f^n)\

1355: .

1356: \end{equation*}

1357: \hfill\QED\bigskip
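\noindent The identity behind both displays in this proof can be made explicit.
We assume here that the surrogate functional has the form used elsewhere in this
section, namely ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a) =
\Phi_{\mathbf{w},p}(f) + \|f-a\|^2 - \|K(f-a)\|^2$. Since $L$ is self-adjoint
and $L^2 = I - \K K$, we have, for every $h \in \cH$,
\begin{equation*}
\|h\|^2-\|Kh\|^2 = \left< (I-\K K) h, h \right> = \left< L^2 h, h \right>
= \|Lh\|^2 \ .
\end{equation*}
Taking $h = f^{n+1}-f^n$ gives
${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^{n+1} ; f^n) =
\Phi_{\mathbf{w},p}(f^{n+1}) + \| L(f^{n+1}-f^n)\|^2$, and taking $h=0$ gives
${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^n ; f^n) = \Phi_{\mathbf{w},p}(f^n)$.
\bigskip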

1358:

1359: \begin{lemma}

1360: \label{unifbddness}

1361: Suppose the sequence $\mathbf{w}=(\ag)_{\g \in \Gamma}$ is uniformly bounded

1362: below by a strictly positive number.

1363: Then the $\|f^n\|$ are bounded uniformly in $n$.

1364: \end{lemma}

1365: {\em Proof:} Since $\ag \geq c$, uniformly in $\g$, for some $c>0$, we have

1366: \begin{equation*}

1367: \Vvert f^n\Vvert_{\mathbf{w},p}^p

1368: \leq  \Phi_{\mathbf{w},p}(f^n) \leq

1369: \Phi_{\mathbf{w},p}(f^0)~,

1370: \end{equation*}

1371: by Lemma \ref{cost2}. Hence the $f^n$ are bounded uniformly

1372: in the $\Vvert ~\Vvert_{\mathbf{w},p}$-norm.

1373: Since

1374: \begin{equation}

1375: \label{bdL2Ban}

1376: \| f \|^2 \leq c^{-2/p} \ {\mathop{\rm max}_{\g \in \Gamma}}

1377: [\ag^{(2-p)/p} |f_{\g}|^{2-p}]\ \Vvert f \Vvert_{\mathbf{w},p}^p

1378: \leq c^{-2/p} \Vvert f\Vvert_{\mathbf{w},p}^{2-p}\ \Vvert f

1379: \Vvert_{\mathbf{w},p}^p = c^{-2/p} \Vvert f

1380: \Vvert_{\mathbf{w},p}^2\ ,

1381: \end{equation}

1382: we also have a uniform bound on the $\|f^n\|$.

1383: \hfill\QED\bigskip
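\noindent The chain of inequalities \eref{bdL2Ban} can be unpacked as follows.
This is only a sketch; it assumes the definition
$\Vvert f \Vvert_{\mathbf{w},p} = \left( \sum_{\g \in \Gamma} \ag |f_\g|^p
\right)^{1/p}$ with $f_\g = \left< f, \vpg \right>$, and it uses $p \leq 2$.
Since $\ag \geq c$ for all $\g$,
\begin{equation*}
\| f \|^2 = \sum_{\g \in \Gamma} |f_\g|^{2-p} \, |f_\g|^p
\leq \left( {\mathop{\rm max}_{\g \in \Gamma}} |f_\g|^{2-p} \right)
\frac{1}{c} \sum_{\g \in \Gamma} \ag |f_\g|^p
\quad \mbox{and} \quad
|f_\g|^{2-p} \leq c^{-(2-p)/p} \, \ag^{(2-p)/p} |f_\g|^{2-p} \ .
\end{equation*}
Because $1 + (2-p)/p = 2/p$, this gives the first inequality in
\eref{bdL2Ban}. The second one follows from
$\ag |f_\g|^p \leq \Vvert f \Vvert_{\mathbf{w},p}^p$ for each single $\g$,
which, raised to the power $(2-p)/p$, yields
$\ag^{(2-p)/p} |f_\g|^{2-p} \leq \Vvert f \Vvert_{\mathbf{w},p}^{2-p}$.
\bigskip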

1384:

1385: \begin{lemma}

1386: \label{series}

1387: The

1388: series

1389: $\sum_{n=0}^\infty \| f^{n+1}-f^n\|^2 $ is convergent.

1390: \end{lemma}

1391: {\em Proof:} This is a consequence of the strict positive-definiteness of $L$,

1392: which holds because $\|K\| <1$. We have, for any $N \in \N$,

1393: \begin{equation*}

1394: \sum_{n=0}^N \| f^{n+1}-f^n\|^2 \leq \frac{1}{A} \sum_{n=0}^N

1395: \| L(f^{n+1}-f^n)\|^2

1396: \end{equation*}

1397: where $A$ is a strictly positive lower bound for the spectrum of $L^*L$.

1398: By Lemma \ref{cost2},

1399: \begin{equation*}

1400: \sum_{n=0}^{N}

1401: \| L(f^{n+1}-f^n)\|^2 \leq

1402: \sum_{n=0}^N [\Phi_{\mathbf{w},p}(f^n)-\Phi_{\mathbf{w},p}(f^{n+1})]

1403: = \Phi_{\mathbf{w},p}(f^{0})-\Phi_{\mathbf{w},p}(f^{N+1}) \leq

1404: \Phi_{\mathbf{w},p}(f^{0})~,

1405: \end{equation*}

1406: where we have used that

1407: $(\Phi_{\mathbf{w},p}(f^n))_{n \in \N}$ is a non-increasing sequence. \\

1408: It follows

1409: that $\sum_{n=0}^N \| f^{n+1}-f^n\|^2 $ is bounded uniformly in $N$, so that

1410: the infinite series converges.\hfill \QED
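\noindent One admissible choice of the constant $A$ in this proof, given here
only as an illustration, is $A = 1 - \|K\|^2$. Indeed, for every $h \in \cH$,
\begin{equation*}
\|Lh\|^2 = \|h\|^2 - \|Kh\|^2 \geq \left( 1 - \|K\|^2 \right) \|h\|^2 \ ,
\end{equation*}
and $1 - \|K\|^2 > 0$ because $\|K\| < 1$. With this choice, the two displays
in the proof combine into the explicit bound
\begin{equation*}
\sum_{n=0}^{\infty} \| f^{n+1}-f^n\|^2 \leq
\frac{\Phi_{\mathbf{w},p}(f^{0})}{1 - \|K\|^2} \ .
\end{equation*}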

1411:

1412: \bigskip

1413:

1414: As an immediate consequence, we have that

1415: \begin{lemma}

1416: \label{asyreg}

1417: The mapping ${\T}$ is asymptotically regular, i.e.

1418: \begin{equation*}

1419: \|{\T}^{n+1}f^0 -{\T}^n f^0 \| =

1420: \| f^{n+1} - f^n \| \to 0 \quad {\rm for} \quad n \to \infty\ .

1421: \end{equation*}

1422: \end{lemma}

1423: We can now establish the following

1424: \begin{proposition}

1425: The sequence $f^n={\T}^nf^0$,

1426: $n=1, 2, \cdots$ converges weakly, and its limit is a fixed point for $\T$.

1427: \end{proposition}

1428: {\em Proof:} Since, by Lemma \ref{unifbddness}, the $f^n={\T}^n f^0$

1429: are uniformly bounded in $n$, it follows from the Banach-Alaoglu

1430: theorem that they have a weak accumulation point. By Lemma \ref{lem-3-2},

1431: this weak accumulation point is a fixed point for $\T$.

1432: It follows that the set of fixed points of $\T$ is not empty.

1433: Since $\T$ is also non-expansive (by Lemma \ref{nonexp}) and

1434: asymptotically regular (by Lemma \ref{asyreg}),

we can apply Opial's theorem (Theorem \ref{thm_opial} above), and the

1436: conclusion of the Proposition follows.

1437: \hfill\QED

1438:

1439: \bigskip

1440:

1441: By the following proposition this fixed point is also a minimizer

1442: for the functional $\Phi_{\mathbf{w},p}$.

1443: \begin{proposition}

1444: \label{fix-min}

1445: A fixed point for ${\T}$ is a minimizer for the functional

1446: $\Phi_{\mathbf{w},p}$.

1447: \end{proposition}

1448: {\em Proof:}

1449: If $f^\star = {\T} f^\star$, then by Proposition

1450: \ref{prop-2-1}, we know that $f^\star$ is a minimizer for the surrogate

1451: functional ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; f^\star)$, and

1452: that, $\forall h \in \cH$,

1453: \begin{equation*}

1454: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^\star + h ; f^\star) \geq

1455: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^\star ;f^\star) + \| h \|^2

1456: ~.

1457: \end{equation*}

1458: Observing that ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^\star ; f^\star) =

1459: \Phi_{\mathbf{w},p}(f^\star)$, and

1460: \begin{equation*}

1461: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^\star + h ; f^\star) =

1462: \Phi_{\mathbf{w},p}(f^\star + h) + \| h \|^2 - \| Kh\|^2 \ ,

1463: \end{equation*}

1464: we conclude that, $\forall h \in \cH$,

$ \Phi_{\mathbf{w},p}(f^\star + h) \geq \Phi_{\mathbf{w},p}(f^\star) + \|
Kh\|^2$, which shows that $f^\star$ is a minimizer for $\Phi_{\mathbf{w},p}$.

1467: \hfill\QED

1468:

1469: \bigskip

1470:

1471: The following proposition summarizes this subsection.

1472: \begin{proposition}

1473: \label{prop-wk-conv}

1474: {\rm (Weak convergence)} Make the same assumptions as in the statement of

1475: {\rm Theorem

1476: \ref{th-3-1}}. Then, for

1477: any choice of the initial $f^0$, the sequence $f^n={\T}^n f^0, \

1478: n=1,2,\cdots$ converges weakly to a minimizer for $\Phi_{\mathbf{w},p}$.

1479: If either {\rm N}$ (K)=\{0\}$ or $p>1$, then $\Phi_{\mathbf{w},p}$ has a unique

1480: minimizer $f^\star$, and all the sequences $(f^n)_{n \in \N}$ converge

1481: weakly to $f^\star$, regardless of the choice of $f^0$.

1482: \end{proposition}

1483: {\em Proof:}

1484: The only thing that hasn't been proved yet above is the

1485: uniqueness of the minimizer if $\mN (K)=\{0\}$ or $p>1$. This uniqueness

follows from the observation that $\Vvert f \Vvert^p_{\mathbf{w},p}$ is strictly

1487: convex in $f$ if $p>1$, and that $\|Kf-g\|^2$ is strictly convex in

1488: $f$ if $\mN (K)=\{0\}$. In both these cases $\Phi_{\mathbf{w},p}$ is thus

1489: strictly convex, so that it has a unique minimizer. \hfill \QED

1490:

1491: \bigskip

1492:

1493: \begin{remark}

1494: {\rm If one has the additional prior information that the object lies in

1495: some closed convex subset ${\cal C}$ of the Hilbert space $\cH$, then the

1496: iterative procedure can be adapted to take this into account, by replacing

1497: the shrinkage operator

1498: ${\S}$  by ${\mathbf P}_{\!\cal C} {\S}$,

1499: where ${\mathbf P}_{\!\cal C}$ is the projector on ${\cal C}$. For example,

1500: if $\cH=L^2$, then ${\cal C}$ could be the cone of functions that are positive

1501: almost everywhere. The  results in this subsection can be extended to this

1502: case;

1503: a more general version of Theorem \ref{thm_opial}

1504: can be applied, in which $\A $ need not be defined

1505: on all of $\cH$, but only on ${\cal C} \subset \cH$; see Appendix \ref{Opial}.

1506: We would however need to either use other tools

1507: to ensure, or assume outright

1508: that the set of fixed points of $\T={\mathbf P}_{\!\cal C} {\S}$ is not empty

1509: (see also  \cite{Eic92})}.

1510: \end{remark}

1511:

1512: \bigskip

1513:

1514: \begin{remark}

1515: {\rm If $\Phi_{\mathbf{w},p}$ is strictly convex, then one can prove the weak

1516: convergence more directly, as follows. By the boundedness of the $f^n$

1517: (Lemma \ref{unifbddness}), we must have a weakly convergent subsequence

1518: $(f^{n_k})_{k \in \N}$. By Lemma \ref{asyreg}, the sequence

1519: $(f^{n_k+1})_{k \in \N}$ must then also be weakly convergent, with the same

1520: weak limit $\wf$. It then follows from the equation

1521: $$

1522: f^{n_k+1}_\g= S_{\ag,p}\left(f^{n_k}_\g +[\K (g-K f^{n_k})]_\g \right)~,

1523: $$

1524: together with $\lim_{k \to \infty}f^{n_k}_\g = \lim_{k \to \infty}f^{n_k+1}_\g

=\wf _\g$, that $\wf$ must be the fixed point $f^\star$ of $\T$. Since this

1526: holds for

1527: any weak accumulation point of $(f^n)_{n \in \N}$, the weak convergence

1528: of $(f^n)_{n \in \N}$ to $f^\star$ follows. }

1529: \end{remark}

1530:

1531: \bigskip

1532:

1533: \begin{remark}

1534: {\rm The proof of Lemma \ref{unifbddness} is the only place, so far, where we

1535: have explicitly used $p \leq 2$. If it were possible to establish a uniform

1536: bound on the $\|f^n\|$ by some other means (e.g. by showing that the

1537: $\|\T^n f^0\|$

1538: are bounded uniformly in $n$), then we could dispense with the restriction

1539: $p \leq 2$, and Proposition \ref{prop-wk-conv} would hold for all $p \geq 1$. }

1540: \end{remark}

1541:

1542: \subsection{Strong convergence of the $f^n$}

1543:

1544: In this subsection we shall prove that the convergence of the successive

1545: iterates

1546: $\{f^n\}$ holds not only in the weak topology, but also in the Hilbert

1547: space norm.

1548: Again, we break up the proof into several lemmas. For the sake of convenience,

1549: we introduce the following notations

1550: \begin{eqnarray}

1551: f^\star&=& \mbox{{\em w}\! --\!}\lim_{n \to \infty} f^n \nonumber \\

1552: u^n &=& f^n - f^\star \nonumber \\ %label{redef1}\\

1553: h\ &=& f^\star + \K (g-Kf^\star)\ . \label{redef2}

1554: \end{eqnarray}

1555: Here and below, we use the notation {\em w}$\,$--$\lim$ as a shorthand

1556: for {\em weak limit}.

1557: \begin{lemma}

1558: \label{Ku}

1559: $\| Ku^n \| \to 0$ for $n \to \infty$\ .

1560: \end{lemma}

1561: {\em Proof:}

1562: Since

1563: \begin{equation*}

1564: u^{n+1} - u^n = {\S}_{\mathbf{w},p}\left( h+(I-K^*K)u^n\right) -

1565: {\S}_{\mathbf{w},p}(h) - u^n

1566: \end{equation*}

1567: and $\|u^{n+1} - u^n \| = \| f^{n+1} - f^n \| \to 0\ {\rm for}\ n

1568: \to \infty$ by Lemma \ref{asyreg}, we have

1569: \begin{equation}

1570: \|\ {\S}_{\mathbf{w},p}\left( h+(I-K^*K)u^n\right) - {\S}_{\mathbf{w},p}(h) -

1571: u^n \| \to 0 \  {\rm for}\ n \to \infty ~,

1572: \label{cvaux}

1573: \end{equation}

1574: and hence also

1575: \begin{equation}

1576:  \max\left(0,\| u^n \|- \|{\S}_{\mathbf{w},p}\left( h+(I-K^*K)u^n\right) -

1577: {\S}_{\mathbf{w},p}(h)\|\ \right) \to 0 \ {\rm for}\  n \to \infty\ .

1578: \label{cvtozero}

1579: \end{equation}

1580: Since ${\S}_{\mathbf{w},p}$ is non-expansive

1581: (Lemma \ref{SS-non-exp}), we have

1582: \begin{equation*}

1583: \| \ {\S_{\mathbf{w},p}}\left( h+(I-K^*K)u^n\right) - {\S}_{\mathbf{w},p}(h) \|

1584: \ \leq

1585: \| (I-K^*K)u^n \| \leq \| u^n \|~;

1586: \end{equation*}

1587: therefore the ``max'' in  (\ref{cvtozero}) can be dropped, and it follows that

1588: \begin{equation}

1589: \| u^n \| - \| (I-K^*K)u^n \|  \to 0 \ {\rm for}\  n \to \infty \ .

1590: \label{cv2}

1591: \end{equation}

1592: Because

1593: \begin{eqnarray*}

1594: \| u^n \| + \| (I-K^*K)u^n \| &\leq& 2\| u^n\| = 2\|f^n-f^\star\|\\

1595: &\leq& 2(\| f^\star\|+ {\mathop{\rm sup}_{k}} \| f^k\|) = C

1596: \end{eqnarray*}

1597: where $C$ is a finite constant (by Lemma \ref{unifbddness}), we obtain

1598: \begin{equation*}

1599: 0 \leq \| u^n \|^2 - \| (I-K^*K)u^n \|^2 \leq

1600: C (\| u^n \| - \| (I-K^*K)u^n \|)\ ,

1601: \end{equation*}

1602: which tends to zero by (\ref{cv2}).

1603: The inequality

1604: \begin{equation*}

1605: \| u^n \|^2 - \| (I-K^*K)u^n \|^2 =

1606: 2\| Ku^n\|^2-\|\K Ku^n\|^2 \geq \| Ku^n\|^2

1607: \end{equation*}

1608: then implies that $\| Ku^n\|^2 \to 0 \ {\rm for}\  n \to \infty\ $.

1609: \hfill\QED
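\bigskip

\noindent The identity and the inequality in the last display of this proof
follow from a direct expansion, spelled out here for convenience (recall that
$\|K^*\| = \|K\| < 1$):
\begin{eqnarray*}
\| (I-K^*K)u^n \|^2 &=& \| u^n \|^2 - 2 \left< K^*K u^n, u^n \right>
+ \| K^*K u^n \|^2 \\
&=& \| u^n \|^2 - 2 \| K u^n \|^2 + \| K^*K u^n \|^2 \ ,
\end{eqnarray*}
and $\| K^*K u^n \| \leq \|K^*\| \, \| K u^n \| \leq \| K u^n \|$, so that
$2\| Ku^n\|^2-\| K^*K u^n\|^2 \geq \| Ku^n\|^2$.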

1610: \begin{remark}

1611: \label{ifKcomp}

{\rm Note that if $K$ is a compact operator, the weak convergence
to $0$ of the $u^n$ automatically implies that $\|K u^n\|$ tends
to $0$ as $n$ tends to $\infty$, so that we don't need
Lemma \ref{Ku} in this case.}

1616: \end{remark}

1617:

1618: \bigskip

1619:

If $K$ had a bounded inverse, we could conclude from $\|K u^n\| \to 0$ that
$\| u^n\| \to 0$ for $n \to \infty$. If this is not the case, however,
and thus for all
ill-posed linear inverse problems, we need some extra work to show the norm
convergence of $f^n$ to $f^\star$.

1627: \begin{lemma}

1628: For $h$ given by {\rm (\ref{redef2})},

1629: $\| {\S}_{\mathbf{w},p}(h+u^n) - {\S}_{\mathbf{w},p}(h) - u^n \| \to 0$ for

1630: $n \to

1631: \infty$.

1632: \end{lemma}

1633: {\em Proof:}

1634: We have

1635: \begin{eqnarray*}

1636: \| {\S}_{\mathbf{w},p}(h+u^n) - {\S}_{\mathbf{w},p}(h) - u^n \|

1637: &\leq& \| {\S}_{\mathbf{w},p}(h+u^n-K^*Ku^n) - {\S}_{\mathbf{w},p}(h) - u^n

1638: \|\\ &&~~~~+

1639: \|{\S}_{\mathbf{w},p}(h+u^n) - {\S}_{\mathbf{w},p}(h+u^n-K^*Ku^n) \| \\

1640: &\leq& \| {\S}_{\mathbf{w},p}(h+u^n-K^*Ku^n) - {\S}_{\mathbf{w},p}(h) - u^n

1641: \|\\ &&

1642: ~~~~+\|\K Ku^n \|~,

1643: \end{eqnarray*}

1644: where we used the non-expansivity of ${\S}_{\mathbf{w},p}$ (Lemma

1645: \ref{SS-non-exp}).

1646: The result follows since both terms in this last bound tend to zero for $n \to

1647: \infty$ because of Lemma

1648: \ref{Ku} and (\ref{cvaux}).

1649: \hfill\QED

1650:

1651: \bigskip

1652:

1653: \begin{lemma}

1654: \label{lm-3-16}

1655: If for some $a \in \cH$, and some sequence $(v^n)_{n \in \N}$,

1656: w--$\lim_{n \to \infty}v^n=0$  and

1657: $\lim_{n \to \infty}\| { \S}_{\mathbf{w},p}(a+v^n) -

1658: {\S}_{\mathbf{w},p}(a) - v^n \|=0$

1659: then $\| v^n \| \to 0$ for $n \to \infty$.

1660: \end{lemma}

1661: {\em Proof:}

1662: The argument of the proof is slightly different for the cases $p=1$

1663: and $p>1$, and we treat the two cases separately. \\

1664: We start with $p>1$.

1665: Since the sequence $\{v^n\}$ is weakly convergent, it has to be bounded: there

1666: is a constant $B$ such that $\forall n$, $\| v^n \| \leq B$, and

1667: hence also $\forall n, \forall \g \in \Gamma$, $\vert v^n_\g \vert \leq B$.

1668: Next, we define the set $\Gamma_{_{\!0}}

1669: = \{ \g \in \Gamma; |a_{\g}| \geq B\}$; since $a \in \cH$, this is a finite

1670: set. We then have $\forall \g \in \Gamma_{_{\!1}}=\Gamma \setminus

1671: \Gamma_{_{\!0}}$,

1672: that $|a_{\g}|$ and $|a_{\g}+v^n_{\g}|$ are bounded above by $2B$.

1673: Recalling the definition of $S_{\ag,p}=\left(F_{\ag,p}\right)^{-1}$,

1674: and observing that, because $p \leq 2$,

1675: $F'_{\ag,p}(x)=1+\ag p(p-1)|x|^{p-2}/2 \geq

1676: 1+ \ag \, p(p-1) /[2(2B)^{2-p}]$ if $|x| \leq 2B$ , we have

1677: \begin{eqnarray*}

1678: |S_{\ag,p}(a_{\g}+v^n_{\g}) - S_{\ag,p}(a_{\g})|

1679: &\leq &\left( {\mathop {\max}_{|x|\leq 2B}} |S'_{\ag,p}(x)| \right) |v^n_{\g}|

1680: \\ & \leq & \left( 1+ \ag\, p(p-1) /[2 (2B)^{2-p}] \right)^{-1} |v^n_{\g}| \\

1681: & \leq & \left( 1+ c \, p(p-1) / [2(2B)^{2-p}] \right)^{-1} |v^n_{\g}|~;

1682: \end{eqnarray*}

1683: in the second inequality,

1684: we have used that $|S_{\ag,p}(x)| \leq |x|$, a consequence of the

1685: non-expansivity

1686:  of $S_{\ag,p}$ (see

1687: Lemma

1688: \ref{SS-non-exp}) to upper bound the derivative

1689: $S'_{\ag,p}$ on the interval $[-2B,2B]$ by the inverse of the lower bound for

1690: $F'_{\ag ,p}$ on the same interval;

1691: in the last inequality we used the uniform lower bound on the $\ag$, i.e.

1692: $\forall \g,

1693: ~ \ag \geq c >0$.

1694: Rewriting $\left( 1+ c \, p(p-1) / [2(2B)^{2-p}] \right)^{-1}= C'<1$, we

1695: have thus,

1696: $\forall \g \in \Gamma_{_{\!1}}$, $C'|v^n_{\g}| \geq

1697: |S_{\ag,p}(a_{\g}+v^n_{\g}) - S_{\ag,p}(a_{\g})|$, which implies

1698: $$

1699: \sum_{\g \in \Gamma_{_{\!1}}} |v^n_{\g}|^2 \leq

1700: \frac{1}{(1-C')^2} \sum_{\g \in \Gamma_{_{\!1}}} |v^n_{\g}

1701: -S_{\ag,p}(a_{\g}+v^n_{\g})

1702: + S_{\ag,p}(a_{\g})|^2  \to 0 \mbox{ as } n \to \infty ~.

1703: $$

1704: On the other hand, since $\Gamma_{_{\!0}}$ is a finite set, and the $v^n$

1705: tend to

1706: zero weakly as $n$ tends to $\infty$, we also have

1707: $$

\sum_{\g \in \Gamma_{_{\!0}}} |v^n_{\g}|^2 \to 0 \mbox{ as } n \to \infty ~.

1710: $$

This proves the lemma for the case $p>1$. \\

1712: For $p=1$,

1713: we define  a finite set $\Gamma_{_{\!0}} \subset \Gamma$ so

1714: that $\sum_{ \g \in \Gamma \setminus \Gamma_{_{\!0}}}

1715: |a_{\g}|^2 \leq (c/4 )^2$,

1716: where $c$ is again the uniform lower bound on the $\ag$.

1717: Because this is a finite set, the weak convergence of the $v^n$

1718: implies that $\sum_{\g \in \Gamma_{_{\!0}}} |v^n_{\g}|^2

1719: \xrightarrow[n \to \infty]{~}  0$,

1720: so that we can concentrate on

1721: $\sum_{\g \in \Gamma \setminus \Gamma_{_{\!0}}} |v^n_{\g}|^2$ only. \\

1722: For each $n$, we split $\Gamma_{_{\!1}}=\Gamma \setminus \Gamma_{_{\!0}}$ into

1723: two subsets:

1724: $\Gamma_{_{\!1,n}} = \{\g \in \Gamma_{_{\!1}};

1725: |v^n_{\g}+a_{\g}| <\ag/2\}$ and $\widetilde{\Gamma}_{_{\!1,n}}=

1726: \Gamma_{_{\!1}} \setminus \Gamma_{_{\!1,n}}$. If $\g \in \Gamma_{_{\!1,n}}$,

1727: then $S_{\ag,1}(a_{\g}+v^n_{\g})=

1728: S_{\ag,1}(a_{\g}) =0$ (since $|a_{\g}|\leq c/4 \leq \ag/2$),

1729: so that $|v^n_{\g} -

1730: S_{\ag,1}(a_{\g}+v^n_{\g}) + S_{\ag,1}(a_{\g})|=|v^n_{\g}|$.

1731: It follows that

1732: $$

1733: \sum_{\g \in \Gamma_{_{\!1,n}}} |v^n_{\g}|^2

1734: \leq \sum_{\g \in \Gamma} |v^n_{\g} -

1735: S_{\ag,1}(a_{\g}+v^n_{\g}) + S_{\ag,1}(a_{\g})|^2 \to 0 \mbox{ as }

1736: n \to \infty ~.

1737: $$

1738: It remains to prove only that

1739: the remaining sum, $\sum_{\g \in \widetilde{\Gamma}_{_{\!1,n}}}

1740: |v^n_{\g}|^2 $ also tends

1741: to $0$ as $n \to \infty$.  \\

1742: If $\g \in \Gamma_{_{\!1}}$ and $|v^n_{\g}+a_{\g}| \geq \ag/2$, then

1743: $|v^n_{\g}|\geq |v^n_{\g}+a_{\g}| - |a_{\g}| \geq \ag/2 -c/4

1744: \geq c/4 \geq |a_{\g}|$, so that $ v^n_{\g}+a_{\g}$ and

1745: $v^n_{\g}$ have the same sign; it then follows that

1746: \begin{eqnarray*}

1747: && |v^n_{\g} -

1748: S_{\ag,1}(a_{\g}+v^n_{\g}) + S_{\ag,1}(a_{\g})|=

1749: |v^n_{\g} -

1750: S_{\ag,1}(a_{\g}+v^n_{\g})| \\

1751: &&~~~~~~~~

1752: =|v^n_{\g}- (a_{\g}+v^n_{\g})+ \frac{\ag}{2} \mbox{sign}(v^n_{\g})|

1753: \geq  \frac{\ag}{2} -|a_{\g}| \geq \frac{c}{4} ~.

1754: \end{eqnarray*}

1755: This implies that

1756: $$

1757: \sum_{\g \in \widetilde{\Gamma}_{_{\!1,n}}}|v^n_{\g} -

1758: S_{\ag,1}(a_{\g}+v^n_{\g}) + S_{\ag,1}(a_{\g})|^2

1759: \geq \left(\frac{c}{4}\right)^2 \# \widetilde{\Gamma}_{_{\!1,n}} ~;

1760: $$

1761: since $\|v^n-\S_{\mathbf{w},1}(a+v^n)+\S_{\mathbf{w},1}(a)\|

1762: \xrightarrow[n \to \infty]{~} 0$, we know on the other hand

1763: that

1764: $$

1765: \sum_{\g \in \widetilde{\Gamma}_{_{\!1,n}}}|v^n_{\g} -

1766: S_{\ag,1}(a_{\g}+v^n_{\g}) + S_{\ag,1}(a_{\g})|^2 <

1767: \left(\frac{c}{4}\right)^2

1768: $$

1769: when $n$ exceeds some threshold, $N$, which implies that

1770: $\widetilde{\Gamma}_{_{\!1,n}}$ is empty when $n > N$. Consequently

1771: $\sum_{\g \in \widetilde{\Gamma}_{_{\!1,n}}}|v^n_{\g}|^2 =0$

1772: for $n > N$. This completes the proof for the case $p=1$.\hfill \QED
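\bigskip

\noindent In the case $p>1$ of the proof above, the passage from the
componentwise estimate
$C'|v^n_{\g}| \geq |S_{\ag,p}(a_{\g}+v^n_{\g}) - S_{\ag,p}(a_{\g})|$ to the
bound on $\sum_{\g \in \Gamma_{_{\!1}}} |v^n_{\g}|^2$ is an application of the
reverse triangle inequality; we spell it out as a reading aid. For every
$\g \in \Gamma_{_{\!1}}$,
\begin{equation*}
|v^n_{\g} - S_{\ag,p}(a_{\g}+v^n_{\g}) + S_{\ag,p}(a_{\g})|
\geq |v^n_{\g}| - |S_{\ag,p}(a_{\g}+v^n_{\g}) - S_{\ag,p}(a_{\g})|
\geq (1-C')\, |v^n_{\g}| \ .
\end{equation*}
Squaring, summing over $\g \in \Gamma_{_{\!1}}$ and dividing by
$(1-C')^2 > 0$ gives the stated bound, whose right hand side is at most
$(1-C')^{-2} \, \| v^n - \S_{\mathbf{w},p}(a+v^n) + \S_{\mathbf{w},p}(a) \|^2$.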

1773:

1774: \bigskip

1775:

1776: Combining the Lemmas in this subsection with the results of the previous

1777: subsection gives a complete proof of Theorem \ref{th-3-1} as stated at the

1778: start of this section.

1779:

1780:

1781:

1782:

1783: \section{Regularization properties and stability estimates}

1784:

1785: In the preceding section, we devised

1786: an iterative algorithm that

1787: converges towards a minimizer of the functional

1788: \begin{equation}

1789: \Phi_{\mathbf{w},p}(f) = \| Kf-g\|^2 +  \Vvert f\Vvert^p_{\mathbf{w},p}~.

1790: \label{phimu2}

1791: \end{equation}

1792: For simplicity, let us assume, until further notice,

1793: that either $p>1$ or $\mN(K)=\{0\}$, so that there is a unique minimizer.

1794:

1795: In this section, we shall discuss to what extent this minimizer is

1796: acceptable as a {\em regularized solution} of the (possibly ill-posed)

1797: inverse problem $Kf=g$. Of particular interest to us is the {\em stability}

1798: of the estimate. For instance, if $\mN(K)=\{0\}$, we would like to know

1799:  to what extent the proposed solution, in this

1800: case the minimizer of $\Phi_{\mathbf{w},p}$, deviates from the ideal

1801: solution $f_o$ if the data are a (small) perturbation of the image

1802: $Kf_o$ of $f_o$. (If $\mN(K) \neq \{0\}$, then there exist other $f$ that

1803: have the same image as $f_o$, and the algorithm might choose one of those

1804: --  see below.) In this discussion both the ``size'' of the perturbation

1805: and the weight of the penalty term in the variational functional, given

1806: by the coefficients $(\ag)_{\g \in \Gamma}$, play a role. We argued earlier

1807: that we need

1808: $\mathbf{w} \neq \mathbf{0}$ in order to provide a meaningful estimate

1809: if e.g. $K$ is a compact operator; on the other hand,

1810: if $g = Kf_o$, then the presence of the penalty term will cause the

1811: minimizer of $\Phi_{\mathbf{w},p}$ to be different from $f_o$. We therefore

1812: need to strike a balance between the respective weights of the

1813: perturbation $g - Kf_o$ and

1814: the penalty term. Let us first define a framework in which we can make this

1815: statement more precise.

1816:

1817: Because we shall deal in this section with data functions $g$ that are not

1818: fixed, we

1819: adjust our notation for the variational functional to make the dependence

1820: on $g$

1821: explicit

1822: where appropriate: with this more elaborate notation, the right hand side

1823: of, for instance, \eref{phimu2} is now $\Phi_{\mathbf{w},p; g}(f )$.

1824: (Because we work with one fixed

1825: operator $K$,  the dependence of the functional on $K$ remains ``silent''.)

1826: In order to make it possible to vary the weight of the penalty term in the

1827: functional, we introduce an extra parameter $\mu$. We shall thus consider

1828: the functional

1829: \begin{equation}

1830: \label{phimu}

1831: \Phi_{\mu,\mathbf{w},p; g}(f)=

1832: \| Kf-g\|^2 +\mu \Vvert f\Vvert^p_{\mathbf{w},p}~.

1833: \end{equation}

1834: Its minimizer will likewise depend on all these parameters. In its full

1835: glory, we denote it by $f^{\star}_{\mu ,\mathbf{w},p;g}$; when  confusion

1836: is impossible we  abbreviate this notation. In particular, since $\mathbf{w}$

1837: and $p$ typically will not vary in the limit procedure that defines stability,

1838: we may omit them in the heat of the discussion. Notice that the dependence on

1839: $\mathbf{w}$ and $\mu$ arises only through the product $\mu \mathbf{w}$.

1840:

1841: As mentioned above, if the ``error'' $e =g - Kf_o$ tends to zero, we would like

1842: to see our estimate for the solution of the inverse problem tend to $f_o$;

1843: since the minimizer of $\Phi_{\mu ,\mathbf{w},p; g}(f)$ differs from $f_o$ if

1844: $\mu \ne 0$, this means

1845: that we shall have to consider simultaneously a limit for $\mu \rightarrow 0$.

1846: More precisely, we want to find a functional dependence of $\mu$

1847: on the noise level $\epsilon$, $\mu=\mu(\epsilon)$

1848: such that

1849: \begin{equation}

1850: \label{desired-res}

1851: \mu(\epsilon)  \xrightarrow[\epsilon \rightarrow 0]{~}  0  ~~~

1852: \mbox {and } ~~~ \sup_{\|g-Kf_o\|\leq \epsilon}

1853: \|f^{\star}_{\mu(\epsilon) ,\mathbf{w},p;g}-f_o \|

\xrightarrow[\epsilon \rightarrow 0]{~}  0

1855: \end{equation}

1856: for each $f_o$ in a certain class of functions.

1857: If we can achieve this, then the ill-posed inverse problem will be {\em

1858: regularized}

1859: (in norm or ``strongly'') by our iterative method,

1860: and $f^\star_{\mu,\mathbf{w},p;g}$ will be

1861: called a {\em regularized solution}.

1862: One  also says in this case

1863: that the minimization of the penalized least-squares functional

1864: (\ref{phimu2}) provides us with a {\em regularizing algorithm} or

1865: {\em regularization method}.

1866:

1867: \subsection {A general regularization theorem}

1868: If the $\ag$ tend to $\infty$, or more precisely, if

1869: \begin{equation}

1870: \label{compact-emb}

1871: \forall C >0 ~: ~ \# \{ \g \in \Gamma ; \ag \leq C \} < \infty ~,

1872: \end{equation}

1873: then the embedding of $\mathcal{B}_{\mathbf{w},p}=

1874: \{ f \in \cH;\sum_{\g \in \Gamma} \ag |f_\g|^p < \infty \}$ in $\cH$ is

1875: compact. (This is because the identity operator from

1876: $\mathcal{B}_{\mathbf{w},p}$ to

1877: $\cH$ is then the norm--limit in $\mathcal{L}(\mathcal{B}_{\mathbf{w},p},

1878: \cH)$,

1879: as $C \to \infty$,

1880: of the finite rank operators $P_C$ defined by

$P_C f=\sum_{\g \in \Gamma_C} \left<f,\varphi_\g \right> \varphi_\g$,

1882: where $\Gamma_C= \{ \g \in \Gamma ; \ag \leq C \}$.) In this case,

1883: general compactness arguments can be used to show that \eref{desired-res}

1884: can be achieved. (See also further below.)

1885: We are, however, also interested in the general case, where

1886: the $\ag$ need not grow unboundedly.

1887: The following theorem proves that we can then nevertheless

1888: choose the dependence $\mu(\epsilon)$

1889: so that \eref{desired-res} holds:

1890: \begin{theorem}

1891: \label{regthm}

1892: Assume that $K$ is a bounded operator from $\cH$ to $\cH'$ with $\|K\|<1$, that

1893: $1 \leq p \leq 2$ and that the entries in the sequence

1894: $\mathbf{w}=(\ag)_{\g \in \Gamma}$

1895: are bounded below uniformly by a strictly positive number $c$.

1896: Assume that either $p>1$ or $\mbox{{\rm N}}(K)=\{0\}$.

1897: For any $g \in \cH'$

1898: and any $\mu >0$, define

1899: $f^{\star}_{\mu, \mathbf{w},p;g}$ to be the minimizer of $\Phi_{\mu,

1900: \mathbf{w},p ; g}(f)$.

1901: If $\mu=\mu(\epsilon)$ satisfies the requirements

1902: \begin{equation}

1903: \label{mu-req}

1904: \lim_{\epsilon \rightarrow 0} \mu(\epsilon)=0 ~~~~~ \mbox{{\rm and}}

1905: ~~~~~ \lim_{\epsilon \rightarrow 0} \epsilon^2/\mu(\epsilon) =0 ~,

1906: \end{equation}

1907: then we have, for any $f_o \in \cH$,

1908: $$

1909: \lim_{\epsilon \rightarrow 0} \left[ \sup_{\|g-Kf_o\|\leq \epsilon}

1910: \|f^{\star}_{\mu(\epsilon), \mathbf{w},p;g}-\fs \|\right] =0 ~,

1911: $$

1912: where $\fs$ is the unique element of minimum

1913: $\Vvert ~ \Vvert _{\mathbf{w},p}$--norm in $\mathcal{S}

1914: = \mbox{{\rm N}}(K)+f_o =

1915: \{f; Kf = Kf_o\}$.

1916: \end{theorem}

1917:

1918:

1919: Note that under the conditions of Theorem \ref{regthm}, $\fs$ must indeed

1920: be unique:

1921: if $p>1$, then the $\Vvert ~ \Vvert _{\mathbf{w},p}$--norm is strictly

1922: convex, so

that there is a unique minimizer for this norm in the affine subspace $\mN(K)+f_o$;

1924: if $p=1$, our assumptions require $\mN(K)=\{0\}$. Note also that

1925: if $\mN(K)=\{0\}$ (whether or not $p=1$), then necessarily $\fs = f_o$.
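\par\medskip\noindent
As an illustration of the requirements \eref{mu-req} (this particular choice is
ours and is not singled out by the theorem), any power law
$\mu(\epsilon) = \epsilon^{\alpha}$ with $0 < \alpha < 2$ is admissible, since
\begin{equation*}
\mu(\epsilon) = \epsilon^{\alpha} \xrightarrow[\epsilon \rightarrow 0]{~} 0
~~~ \mbox{and} ~~~
\epsilon^2/\mu(\epsilon) = \epsilon^{2-\alpha}
\xrightarrow[\epsilon \rightarrow 0]{~} 0 ~.
\end{equation*}
By contrast, $\mu(\epsilon) = \epsilon^2$ fails the second requirement, because
then $\epsilon^2/\mu(\epsilon) = 1$ for all $\epsilon$.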

1926:

1927:

1928: To prove Theorem \ref{regthm}, we will need the following two lemmas:

1929:

1930: \begin{lemma}

1931: \label{lm-4-1}

1932: The functions $S_{w,p}$ from $\R$ to itself, defined by {\rm (\ref{S-pneq1},

1933: \ref{S-peq1})} for $p>1$, $p=1$, respectively, satisfy

1934: $$

1935: |S_{w,p}(x)-x| \leq {\frac{wp}{2}}\ |x|^{p-1}~.

1936: $$

1937: \end{lemma}

1938: {\em Proof:}

For $p=1$, the definition \eref{S-peq1} implies immediately that
$|x-S_{w,1}(x)|= \min(w/2,|x|) \leq w/2$, so that the stated bound holds
for $x \neq 0$. For $x=0$, $S_{w,1}(x)=0$, so that the left-hand side vanishes.

1942:

1943: For $p>1$, $S_{w,p}= \left( F_{w,p} \right)^{-1}$, where

1944: $F_{w,p}(y)=y+{\frac{wp}{2}}|y|^{p-1} \mbox{sign}(y)$ satisfies $|F_{w,p}(y)|

1945: \geq |y|$, and $|F_{w,p}(y)-y| \leq {\frac{wp}{2}} |y|^{p-1}$.

1946: It follows that $\, |S_{w,p}(x)| \leq |x|\,$, and

1947: $\, |x-S_{w,p}(x)| $ $\leq {\frac{wp}{2}} \; |S_{w,p}(x)|^{p-1} $

1948: $ \leq {\frac{wp}{2}}\; |x|^{p-1} ~$.

1949: $~~~~~~~~~~~~~~~~~~$ \hfill \QED
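\bigskip

\noindent As a quick sanity check of the lemma, consider the simplest case
$p=2$, where everything is explicit. Then
$F_{w,2}(y) = y + w\,|y|\, \mbox{sign}(y) = (1+w)\, y$, so that
$S_{w,2}(x) = x/(1+w)$ and, for $w > 0$,
\begin{equation*}
|S_{w,2}(x)-x| = \frac{w}{1+w}\, |x| \leq w \, |x|
= {\frac{wp}{2}}\ |x|^{p-1} \quad \mbox{with } p=2 ~.
\end{equation*}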

1950:

1951: \bigskip

1952:

1953: \begin{lemma}

1954: \label{lm-conv}

1955: If the sequence of vectors $\left(v_k\right)_{ _{k \in \N}}$ converges weakly

1956: in $\cH$ to $v$, and $\lim_{k \to \infty} \Vvert v_k \Vvert_{\mathbf{w},p}$

1957: $ = \Vvert v \Vvert_{\mathbf{w},p}$,

1958: then $\left(v_k\right)_{ _{k \in \N}}$ converges

1959: to $v$ in the $\cH$--norm, i.e. $\lim_{k \to \infty} \|v-v_k\|=0~$.

1960: \end{lemma}

1961: {\em Proof:}

1962: It is a standard result that if {\em w}$\,$--$\lim_{k \to \infty} v_k = v$,

1963: and

1964: $\lim_{k \to \infty} \|v_k\|=\|v\|$, then $\lim_{k \to \infty} \|v- v_k\|^2

1965: = \lim_{k \to \infty} \left( \|v\|^2 + \|v_k\|^2 - 2 \left< v, v_k \right>

1966: \right)

1967: =  \|v\|^2 + \|v\|^2 - 2 \left< v, v \right>

1968:  = 0$.

1969: We thus need to prove only that $\lim_{k \to \infty} \|v_k\|=\|v\|$.

1970:

1971: Since the $v_k$ converge weakly, they are uniformly bounded. It follows that

1972: the $|v_{k,\g}| = |\left<v_k, \vpg \right>|$ are bounded uniformly in $k$

1973: and $\g$

1974: by some finite number $C$. Define $r=2/p$. Since, for $x, y > 0$,

1975: $|x^r - y^r| \leq r |x-y| \max(x,y)^{r-1}$, it follows that

1976: $ \left|~|v_{k,\g}|^2 -|v_{\g}|^2 \right|

1977: \leq r \, C^{p(r-1)}~  \left| ~ |v_{k,\g}|^p -|v_{\g}|^p \right|~.$

1978: Because the $\ag$ are uniformly bounded below by $c>0$, we obtain

1979: $$

1980: \left| \|v_k\|^2 -\|v\|^2 \right| \leq

1981: \sum_{\g \in \Gamma} \left| |v_{k,\g}|^2 - |v_{\g}|^2 \right|

1982: \leq \frac{2}{c \, p}\, C^{2-p} \sum_{\g \in \Gamma}

1983: \ag \left| ~ |v_{k,\g}|^p -|v_{\g}|^p \right| ~,

1984: $$

1985: so that it suffices to prove that this last expression tends to $0$

1986: as $k$ tends to $\infty$.

1987: Define now $u_{k,\g}=\min \left( |v_{k,\g}|,|v_{\g}| \right)$. Clearly

1988: $ \forall \g \in \Gamma ~:~ \lim_{k \to \infty} u_{k,\g}= |v_{\g}|~$; since

1989: $\sum_{\g \in \Gamma}  \ag |v_{\g}|^p < \infty$, it follows by the dominated

1990: convergence theorem that $\lim_{k \to \infty}

1991: \sum_{\g \in \Gamma}  \ag u_{k,\g}^p =

1992: \sum_{\g \in \Gamma}  \ag |v_{\g}|^p $. Since

1993: $$

1994: \sum_{\g \in \Gamma}

1995: \ag \left| ~ |v_{k,\g}|^p -|v_{\g}|^p \right| =

1996: \sum_{\g \in \Gamma}  \ag \left( |v_{\g}|^p + |v_{k,\g}|^p

1997: - 2 u_{k,\g}^p \right) \xrightarrow[k \to \infty]{~} 0 ~,

1998: $$

1999: the Lemma follows. \hfill \QED
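\bigskip

\noindent Two elementary facts are used without comment in the proof above; we
record them here as a reading aid. First, for $x, y > 0$ and $r \geq 1$, the
mean value theorem gives $x^r - y^r = r \, \xi^{r-1} (x-y)$ for some $\xi$
between $x$ and $y$, hence $|x^r - y^r| \leq r |x-y| \max(x,y)^{r-1}$. It is
applied with $x=|v_{k,\g}|^p$, $y=|v_{\g}|^p$ and $r=2/p$, so that
$\max(x,y)^{r-1} \leq C^{p(r-1)} = C^{2-p}$. Second, for $a, b \geq 0$,
\begin{equation*}
|a - b| = a + b - 2 \min(a,b) ~,
\end{equation*}
which, with $a = |v_{k,\g}|^p$, $b = |v_{\g}|^p$ and
$\min(a,b) = u_{k,\g}^p$, gives the last identity in the proof. The limit is
then
$\Vvert v \Vvert_{\mathbf{w},p}^p + \Vvert v \Vvert_{\mathbf{w},p}^p
- 2 \Vvert v \Vvert_{\mathbf{w},p}^p = 0$, by the hypothesis
$\lim_{k \to \infty} \Vvert v_k \Vvert_{\mathbf{w},p}
= \Vvert v \Vvert_{\mathbf{w},p}$ and the dominated convergence argument for
the $u_{k,\g}$.
\bigskip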

2000:

2001: We are now ready to proceed to the

2002:

{\em Proof of} Theorem \ref{regthm}:

2004: \newline

2005: Let's assume that $\mu(\epsilon)$ satisfies the requirements \eref{mu-req}.

2006: \newline

2007: We first establish weak convergence. For this it is sufficient to prove

2008: that if $(g_n)_{n \in \N}$ is a sequence in $\cH'$ such that

2009: $\|g_n-Kf_o\| \leq \epsilon_n$, where

2010: $(\epsilon_n)_{n \in \N}$ is a sequence of strictly positive numbers

2011: that converges to zero

2012: as $n \to \infty$, then {\em w}$\,$--$\lim_{n \to \infty}

2013: f^{\star}_{\mu(\epsilon_n);g_n}= f^\dagger$, where $f^{\star}_{\mu;g}$

is the unique minimizer of $\Phi_{\mu ,\mathbf{w},p;g}(f)$.
(As announced, we have dropped here the

2016: explicit indication of the dependence of $f^{\star}$ on $\mathbf{w}$ and

2017: $p$; these

2018: parameters will keep fixed values throughout this proof. We will take the

2019: liberty

2020: to drop them in our notation for $\Phi$ as well, when this is convenient.)

2021: For the sake of convenience,

2022: we abbreviate $\mu(\epsilon_n)$ as $\mu_n$. \\

2023: Then the $f^{\star}_{\mu_n;g_n}$ are uniformly bounded in $\cH$

2024: by the following argument:

2025: \begin{eqnarray}

2026: \|  f^\star_{\mu_n;g_n} \|^p &\leq &  \frac{1}{c}

2027: \Vvert f^\star_{\mu_n;g_n} \Vvert_{\mathbf{w},p}^p

2028: \leq \frac{1}{\mu_n \, c}\;\Phi_{\mu_n; g_n}(f^\star_{\mu_n;g_n})

2029: \leq \frac{1}{\mu_n \, c}\; \Phi_{\mu_n;g_n}(\fs )\nonumber \\

2030: &=&\frac{1}{\mu_n \, c}\left[\|Kf_o-g_n \|^2

2031: + \mu_n \Vvert \fs \Vvert^p_{\mathbf{w},p} \right]

2032: \leq  \frac{1}{c} \left( \frac{\epsilon_n^2}{\mu_n}+\Vvert f^\dagger

2033: \Vvert_{\mathbf{w},p}^p \right) ~,

2034: \label{fmuubd}

2035: \end{eqnarray}

2036: where we have used, respectively,

2037: the bound (\ref{bdL2Ban}), the fact that $f^\star_{\mu_n;g_n}$

2038: minimizes $\Phi_{\mu_n;g_n}(f)$, $K\fs = Kf_o$

2039: and the bound $\|Kf_o-g_n\|^2 \leq \epsilon_n^2$.

2040: By the assumption \eref{mu-req}, $\epsilon_n^2 /\mu_n$ tends to zero for

2041: $n\to\infty$ and hence can be bounded by a constant independent of $n$.

2042: \newline

2043: It

2044: follows that the sequence $(f^\star_{\mu_n;g_n})_{_{n\in \N}}$ has at least

2045: one weak

2046: accumulation point, i.e. there exists a subsequence

2047:

2048: $(f^\star_{\mu_{n_l};g_{n_l}})_{_{l \in {\N}}}$ that has a weak limit.

2049: Because this sequence is bounded in the $\Vvert \; \Vvert_{\mathbf{w},p}$-norm,

2050: by passing to a subsequence

2051:  $\left( f^\star_{\mu_{n_{l(k)}};g_{n_{l(k)}}}\right)_{k \in \N}$, we can

2052: ensure that the $\Vvert f^\star_{\mu_{n_{l(k)}};g_{n_{l(k)}}}

2053: \Vvert_{\mathbf{w},p}$

2054: constitute a converging sequence.

2055: To simplify notation, we define $\widetilde\mu_k = \mu_{n_{l(k)}}$ and

2056: ${\widetilde f}_k =

2057: f^\star_{\mu_{n_{l(k)}};g_{n_{l(k)}}}$; the $\widetilde{f}_k$ have the

2058: same weak limit $\widetilde{f}$ as the $f^\star_{\mu_{n_l};g_{n_l}}$.

2059: We also define

2060: $\widetilde{g}_k = g_{n_{l(k)}}$,

2061: ${\widetilde e}_k = \widetilde{g}_k-Kf_o$ and $\widetilde\epsilon_k =

2062: \epsilon_{n_{l(k)}}$.

2063: We shall show that $\widetilde{f}=\fs$.

2064: \newline

2065: Since each $\widetilde{f}_k $ is the minimizer of

2066: $\Phi_{\widetilde\mu_k; \widetilde{g}_k}(f) $, by Proposition \ref{fix-min}, it

2067: is a fixed point of the corresponding operator $\T$. Therefore, for any $\g \in

2068: \Gamma$,

2069: ${\wf}_\g  =

2070: \left<{\wf},\vpg \right>$ satisfies

2071: \begin{eqnarray*}

2072: {\wf}_\g & = &\lim_{k\to\infty}({\wf}_k)_\g  =

2073: \lim_{k\to\infty} S_{\widetilde\mu_k \ag,p}[({\widetilde h}_k)_\g] \\

2074: \hbox{\ \ with\ \ } ~~~  {\widetilde h}_k & = &

2075: {\wf}_k + K^*(\widetilde g_k-K{\wf}_k)=

2076: \wf_k + \K K (f_o - \wf_k)+ K^*{\widetilde e}_k ~.

2077: \end{eqnarray*}

2078: We now rewrite this as

2079: \begin{equation}

2080: \label{2terms}

2081: {\wf}_\g=\lim_{k\to\infty}

2082: \left(S_{\widetilde\mu_k \ag,p}[({\widetilde h}_k)_\g] -

2083: ({\widetilde h}_k)_\g \right) +

2084: \lim_{k\to\infty} ({\widetilde h}_k)_\g ~ .

2085: \end{equation}

2086: By Lemma \ref{lm-4-1} the first limit in the right hand side is zero, since

2087: $$

2088: \left|S_{\widetilde\mu_k \ag,p}[({\widetilde h}_k)_\g] -

2089: ({\widetilde h}_k)_\g \right|

2090: \leq  p\ \ag\,   \widetilde{\mu}_k

2091: ~| (\widetilde h_k)_\g |^{p-1} /2\leq p\ C\  \widetilde{\mu}_k

2092: [ 3C + \widetilde\epsilon_k ] /2 \xrightarrow[k \to \infty]{~} 0~,

2093: $$

2094: where we have used $\|K\|<1$ ($C$ is some constant depending on $\ag$). Because

2095: $\lim_{k

2096: \to

2097: \infty}\Vert\widetilde{e}_k\Vert=0$, and {\em w}$\,$--$\lim_{k \to \infty}

2098: \wf_k =

2099: \wf$,  it then follows from \eref{2terms} that

2100: \begin{equation*}

2101: {\wf}_\g = \lim_{k\to\infty} ({\widetilde h}_k)_\g = {\wf}_\g +

2102: [\K K(\fs -{\wf})]_\g \ .

2103: \end{equation*}

2104: Since this holds for all $\g$, it follows that

2105: $\K K(\fs-{\wf})=0$. If $\mN(K)=\{0\}$, then this allows us

2106: immediately to conclude that $\wf=\fs$. When $\mN(K) \neq \{0\}$,

2107: we can only conclude that $\fs -\wf \in \mN(K)$. Because $\fs$

2108: has the smallest $\Vvert ~ \Vvert _{\mathbf{w},p}$--norm among all

2109: $f \in \mathcal{S}=\{f; Kf=Kf_o\}$, it follows that

2110: $\Vvert \wf \Vvert _{\mathbf{w},p} \geq \Vvert \fs \Vvert _{\mathbf{w},p}$.

2111: On the other hand, because

2112: the ${\wf}_k$ weakly converge to ${\wf}$, and therefore, for all $\g$,

2113: $({\wf}_k)_\g \to {\wf}_\g $ as $k\to\infty$,  we

2114: can use Fatou's lemma to obtain

2115: \begin{equation}

2116: \Vvert {\wf} \Vvert^p_{\mathbf{w},p} =

2117: {\sum_\g} \ag|{\wf}_\g |^p  \leq

2118: \limsup_{k \to\infty} {\sum_\g} \ag |({\wf}_k)_\g|^p

2119: = \limsup_{k \to\infty}\Vvert {\wf}_k\Vvert_{\mathbf{w},p}^p

2120: = \lim_{k \to\infty}\Vvert {\wf}_k\Vvert_{\mathbf{w},p}^p \ .

2121: \label{now123}

2122: \end{equation}

2123: It then follows from (\ref{fmuubd}) that

2124: \begin{equation}

2125: \lim_{k \to\infty}\Vvert {\wf}_k\Vvert_{\mathbf{w},p}^p

2126: \leq \lim_{k \to\infty}\left[\frac{\widetilde\epsilon_k^2}{ \widetilde\mu_k}

2127: +\Vvert \fs \Vvert_{\mathbf{w},p}^p\right] = \Vvert \fs

2128: \Vvert_{\mathbf{w},p}^p

2129: \leq \Vvert {\wf}\Vvert_{\mathbf{w},p}^p \ .

2130: \label{now124}

2131: \end{equation}

2132: Together, the inequalities (\ref{now123}) and (\ref{now124}) imply that

2133: \begin{equation}

2134: \lim_{k \to\infty}\Vvert {\wf}_k\Vvert_{\mathbf{w},p}

2135: = \Vvert \fs \Vvert_{\mathbf{w},p}= \Vvert {\wf}\Vvert_{\mathbf{w},p} \ .

2136: \label{now125}

2137: \end{equation}

2138: Since $\fs$ is the unique element in $\mathcal{S}$ of minimal

2139: $\Vvert ~ \Vvert_{\mathbf{w},p}$--norm, it follows

2140: that

2141: ${\wf}=\fs$.

2142: The same argument holds for any other weakly converging subsequence of

2143: $(f^\star_{\mu_n;g_n})_{n \in {\N}}$; it follows that the sequence

2144: itself converges weakly to $\fs$.

2145: Similarly we conclude from \eref{now125} that $\lim_{n \to \infty}

2146: \Vvert f^\star_{\mu_n;g_n} \Vvert_{\mathbf{w},p} =

2147: \Vvert \fs \Vvert_{\mathbf{w},p}~$.

2148: It then follows from Lemma \ref{lm-conv} that the $f^\star_{\mu_n;g_n}$

2149: converge to $\fs$ in the $\cH$-norm. \hfill \QED

2150:

2151: \bigskip

2152:

2153: \begin{remark} {\rm Even when $p=1$ and $\mN(K) \neq \{0\}$, it may still be the

2154: case that, for any $f_o \in \cH$, there is a unique element $\fs$ of minimal

2155: norm in $\mathcal{S}=\{f \in \cH; Kf=Kf_o \}$. (For instance, if

2156: $K$ is diagonal in the $\varphi_\g$--basis, with some zero eigenvalues,

2157: then the unique minimizer $\fs$ in $\mathcal{S}$ is given by setting to zero

2158: all the components of $f_o$ corresponding to $\g$ for which $K \vpg = 0$.)

2159: In this case the proof still applies, and we still have norm--convergence

2160: of the $f^\star_{\mu(\epsilon),\mathbf{w},p;g}$ to $\fs$ if $\mu(\epsilon)$

2161: satisfies \eref{mu-req} and $\|g -Kf_o\| \leq \epsilon \to 0$.}

2162: \end{remark}

2163:

2164: \subsection{Stability estimates}

2165:

2166: The regularization theorem of the previous subsection

2167: gives no information on the rate at

2168: which the regularized solution approaches the exact solution when the noise

2169: (as measured by $\epsilon$) decreases to

2170: zero. Such rates are not available in the general case, but can be derived

2171: under additional assumptions, discussed below.

2172: For the remainder of this section we shall assume

2173: that the operator $K$ is invertible on its range, i.e. that

2174: $\mN(K)=\{0\}$. Suppose that the

2175: unknown exact solution of the problem, $f_o$, satisfies the constraint

2176: $\Vvert f_o \Vvert_{\mathbf{w},p} \leq \rho$, where $\rho>0$ is given; in

2177: other words, we know a priori

2178: that the unknown solution lies in the ball around the origin with radius $\rho$

2179: in the Banach space $\cB_{\mathbf{w},p}$; we shall denote this ball

2180: by $\mathrm{B}_{\mathbf{w},p}(0,\rho)$. If we also know that $g$ lies within

2181: a distance $\epsilon$ of $Kf_o$ in $\cH'$, then

2182: we can localize the exact

2183: solution within the set

2184: \begin{equation*}

2185: {\cal F}(\epsilon,\rho) =

2186: \{ f\in \cH ; \, \| Kf-g \| \leq \epsilon\, ,

2187: \, \Vvert f\Vvert_{\mathbf{w},p} \leq \rho \}\ .

2188: %\label{solset}

2189: \end{equation*}

2190: The diameter of this set is a measure of the uncertainty of the

2191: solution for a given prior and a given noise level $\epsilon$. The

2192: diameter of ${\cal F}$, namely $\mathrm{diam}({\cal F})=\sup\{ \| f-f'\|;\

2193: f,f' \in {\cal F}\}$, is bounded by $2M(\epsilon,\rho)$, where

2194: $M(\epsilon,\rho)$, defined by

2195: \begin{equation}

2196: M(\epsilon,\rho)=\sup\{\| h\|;\, \| Kh \|

2197: \leq \epsilon \, ,\, \Vvert h\Vvert_{\mathbf{w},p} \leq \rho\} \ ,

2198: \label{MC}

2199: \end{equation}

2200: is called the {\it modulus of continuity} of $K^{-1}$ under the

2201: prior. (We have once more dropped the explicit reference in our

2202: notation to the dependence on $\mathbf{w}$ and $p$.)

2203: If \eref{compact-emb} is satisfied, then

2204: the ball $\mathrm{B}_{\mathbf{w},p}(0,\rho)$ is compact in $\cH$, and it

2205: follows from a general topological lemma

2206: (see e.g. \cite{Eng96}) that $M(\epsilon,\rho) \to 0$

2207: when $\epsilon \to 0~$; the uncertainty on the solution

2208: thus vanishes in this limit. However,

2209: this topological argument, which holds for any regularization

2210: method enforcing the prior $\Vvert f_o \Vvert_{\mathbf{w},p} \leq \rho$,

2211: does not tell us anything about the rate of convergence

2212: of any specific method.

2213:

2214: In what follows, we shall systematically assume that \eref{compact-emb}

2215: is satisfied. We shall also make additional assumptions that will make it

2216: possible to derive more precise convergence results.

2217: Our specific regularization method consists in taking the

2218: minimizer $f^*_{\mu;g}$ of the functional

2219: $\Phi_{\mu; g}(f )$ given by

2220: (\ref{phimu}) as

2221: an estimate of the exact solution $f_o$, where we

2222: leave any links between $\mu$ and $\epsilon$ unspecified for

2223: the moment. (Because of the compactness argument above, we could

2224: conceivably dispense with \eref{mu-req}; see below.) An upper

2225: bound on the reconstruction error $\| f^*_{\mu;g} - f_o \|$ ,

2226: valid for all $g$ such that $\|g-Kf_o\| \le \epsilon$, as well as uniformly

2227: in $f_o$, is

2228: given by the following {\it modulus of convergence}:

2229: \begin{equation}

2230: M_\mu(\epsilon,\rho) = \sup\{ \| f^*_{\mu;g} -f\|;\ f \in \cH,\, g \in \cH'

2231: \, ,

2232: \| Kf - g \| \leq \epsilon\, ,\, \Vvert f \Vvert_{\mathbf{w},p} \leq \rho \}\ .

2233: \label{modcv}

2234: \end{equation}

2235: The decay of this modulus of convergence as $\epsilon \to 0$ is governed by the

2236: decay of the modulus of continuity \eref{MC}, as shown by the following

2237: proposition.

2238: \begin{proposition}

2239: \label{bestcv}

2240: The modulus of convergence {\rm \eref{modcv}} satisfies

2241: \begin{equation}

2242: M(\epsilon,\rho) \leq M_\mu(\epsilon,\rho) \leq M(\epsilon

2243: +\epsilon',\rho + \rho')\ ,

2244: \label{stability}

2245: \end{equation}

2246: where

2247: \begin{equation}

2248: \epsilon' = \left(\epsilon^2 +\mu \rho^p\right)^{\frac{1}{2}}

2249: ~~~~\mbox{and}~~~~

2250: \rho' = \left(\rho^p + \frac{\epsilon^2}{\mu}\right)^{\frac{1}{p}}\ ,

2251: \label{primes}

2252: \end{equation}

2253: and $M(\epsilon,\rho)$ is defined by {\rm \eref{MC}}\ .

2254: \end{proposition}

2255: {\em Proof:} We first note

2256: that $\Phi_{\mu ; g}(f^*_{\mu;g}) \leq\Phi_{\mu;g}(f_o) \leq \epsilon^2 + \mu

2257: \rho^p$ because $f^*_{\mu;g}$ is the minimizer of $\Phi_{\mu ; g}(f)$

2258: and $f_o \in {\cal F}(\epsilon, \rho)$.

2259: It follows that

2260: $$

2261: \| Kf^*_{\mu;g} - g\|^2 \leq \Phi_{\mu; g}(f^*_{\mu;g}) \le

2262: \epsilon^2 + \mu \rho^p  ~~\mbox{and}~~

2263: \mu  \Vvert f^*_{\mu;g}\Vvert_{\mathbf{w},p}^p \leq \Phi_{\mu; g}(f^*_{\mu;g} )

2264:  \le \epsilon^2 + \mu \rho^p

2265: $$

2266: or, equivalently, $f^*_{\mu;g} \in {\cal F}(\epsilon',\rho')$ with

2267: $\epsilon'$ and $\rho'$ given by \eref{primes}.

2268: The modulus of convergence (\ref{modcv}) can then be bounded as follows,

2269: using the

2270: triangle inequality. Indeed, for any $f \in {\cal F}(\epsilon,\rho)$ and

2271: $f' \in {\cal F}(\epsilon',\rho')$, we have

2272: $

2273: \| K(f - f') \|

2274: \leq \epsilon+\epsilon'

2275: $

2276: and

2277: $

2278: \Vvert f - f' \Vvert_{\mathbf{w},p} \leq \rho+\rho' ~,

2279: $

2280: and we immediately obtain from the definition of (\ref{MC}) the

2281: upper bound in (\ref{stability}). To derive the lower bound, observe that

2282: for the

2283: particular choice $g=0$,

2284: the minimizer $f^*_{\mu;g}$ of the functional (\ref{phimu}) is

2285: $f^*_{\mu;0}=0$. The desired lower bound then follows immediately upon

2286: inspection

2287: of the two definitions \eref{MC} and \eref{modcv}. \hfill\QED
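
For the reader's convenience, the lower bound can be spelled out as follows; this is only a short sketch that uses the definitions \eref{MC} and \eref{modcv} and nothing else. For $g=0$ we have $\Phi_{\mu;0}(f) = \|Kf\|^2 + \mu \Vvert f \Vvert_{\mathbf{w},p}^p \geq 0 = \Phi_{\mu;0}(0)$, so that $f^*_{\mu;0}=0$. Let $h$ be any element with $\|Kh\| \leq \epsilon$ and $\Vvert h \Vvert_{\mathbf{w},p} \leq \rho$. The pair $(f,g)=(h,0)$ is then admissible in \eref{modcv}, and therefore
\begin{equation*}
M_\mu(\epsilon,\rho) \geq \| f^*_{\mu;0} - h \| = \| h \| ~.
\end{equation*}
Taking the supremum over all such $h$ gives $M_\mu(\epsilon,\rho) \geq M(\epsilon,\rho)$.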

2288:

2289: \bigskip

2290:

2291: Let us briefly discuss the meaning of the previous proposition.

2292: The modulus of continuity $M(\epsilon,\rho)$ yields the best possible

2293: convergence rate for any

2294: regularization method that enforces the error bound and the prior

2295: constraint defined by \eref{MC}.

2296: Proposition \ref{bestcv} provides a relation between the modulus of

2297: continuity and  the convergence rate $M_\mu(\epsilon,\rho)$ of the

2298: specific regularization method considered in this paper, which

2299: is defined by the minimization of the functional \eref{phimu}. Optimizing

2300: the upper bound

2301: in  \eref{stability} suggests the choice

2302: $\mu=\epsilon^2/\rho^p$, yielding

2303: $\epsilon'=\sqrt{2}\;\epsilon\ $ and $\rho'=2^{1/p}\rho\ $.  With these

2304: choices,

2305: we ensure that $f^*_{\mu;g} \rightarrow f_o$ when $\epsilon

2306: \rightarrow 0$, i.e.

2307: that the problem is {\it regularized}, provided we can show

2308: that the modulus of continuity

2309: tends to zero with $\epsilon$.

2310:  Moreover, once we establish its rate of

2311: decay (see below), we know that our regularization method is (nearly)

2312: optimal in the sense that the modulus of convergence (\ref{modcv}) will decay

2313: {\em at the same rate as the optimal rate} given by the modulus of continuity

2314: $M(\epsilon,\rho)$. (We call it {\em nearly} optimal because, although the

2315: rate of

2316: decay is optimal, the constant multiplier probably is not.)

2317: Note that because of the assumption of compactness of

2318: the ball $\mathrm{B}_{\mathbf{w},p}(0,\rho)$ (which amounts to assuming that

2319: \eref{compact-emb} is satisfied), we achieve regularization

2320: even in some cases where $\epsilon^2 / \mu$

2321: does not tend to zero for $\epsilon \to 0$, which is a case not covered by

2322: Theorem \ref{regthm}.
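
As a quick check of the values quoted above (a worked substitution into \eref{primes}, nothing more), the choice $\mu = \epsilon^2/\rho^p$ gives $\mu \rho^p = \epsilon^2$ and $\epsilon^2/\mu = \rho^p$, hence
\begin{equation*}
\epsilon' = \left(\epsilon^2 + \mu \rho^p\right)^{\frac{1}{2}}
= \left(2 \epsilon^2\right)^{\frac{1}{2}} = \sqrt{2}\;\epsilon
~~~~\mbox{and}~~~~
\rho' = \left(\rho^p + \frac{\epsilon^2}{\mu}\right)^{\frac{1}{p}}
= \left(2 \rho^p\right)^{\frac{1}{p}} = 2^{1/p} \rho ~,
\end{equation*}
so that, for this particular choice of $\mu$, the estimate \eref{stability} reads
\begin{equation*}
M(\epsilon,\rho) \leq M_\mu(\epsilon,\rho) \leq
M\left( (1+\sqrt{2})\,\epsilon \, , \, (1+2^{1/p})\, \rho \right) ~.
\end{equation*}
Both arguments on the right are thus fixed multiples of $\epsilon$ and $\rho$, which is why the two sides can be expected to decay at the same rate as $\epsilon \to 0$.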

2323:

2324: In order to derive upper or lower bounds on $M(\epsilon,\rho)$, we must

2325: know more information about the operator $K$. The following proposition

2326: illustrates how such information can be used.

2327:

2328: \begin{proposition}

2329: \label{convrate-gen}

2330: Suppose that there exist sequences $\mathbf{b}=(b_\g)_{\g \in \Gamma}$

2331: and $\mathbf{B}=(B_\g)_{\g \in \Gamma}$ satisfying

2332: $\forall \,\g \in \Gamma ~:~ 0 < b_\g,

2333: \, B_\g < \infty\,$ and such that, for all $h$ in $\cH$,

2334: \begin{equation}

2335: \label{extraK}

2336: \sum_{\g \in \Gamma} b_\g |h_\g|^2 \leq \|K h \|^2

2337: \leq \sum_{\g \in \Gamma} B_\g |h_\g|^2 ~.

2338: \end{equation}

2339: Then the following upper and lower bounds hold for $M(\epsilon,\rho)$:

2340: \begin{eqnarray}

2341: \label{lowerbound}

2342: M(\epsilon,\rho) &\geq & \max_{\g \in \Gamma}

2343: \left[ \min \left(\rho \ag^{-1/p},\epsilon B_\g^{-1/2} \right) \right]~, \\

2344: \label{upperbound}

2345: M(\epsilon,\rho) &\leq & \min_{\Gamma = \Gamma_{_{\!1}} \cup \Gamma_{_{\!2}}}

2346: \sqrt{ \frac{\epsilon^2}{\min_{\g \in \Gamma_{_{\!1}}}b_\g} +

2347: \frac{\rho^2}{\min_{\g \in \Gamma_{_{\!2}}}\ag^{2/p}} } ~.

2348: \end{eqnarray}

2349: \end{proposition}

2350: {\em Proof:}

2351: To prove the lower bound, we need only exhibit one particular $h$ such that

2352: $\|Kh\| \le \epsilon$ and $\Vvert h \Vvert_{\mathbf{w},p}\le \rho$, for which

2353: $\|h\|$ is given by the right hand side of

2354: \eref{lowerbound}. For this we need only identify the index ${\g_{_m}}$ such

2355: that, $\forall \g \in \Gamma \,$,

2356: $$

2357: \nu = \min \left(\rho w_{\g_{_m}}^{-1/p},\epsilon B_{\g_{_m}}^{-1/2} \right)

2358: \ge  \min \left(\rho \ag^{-1/p},\epsilon B_\g^{-1/2} \right) ~,

2359: $$

2360: and choose $h = \nu \varphi_{\g_{_m}}$. Then $\Vvert h

2361: \Vvert_{\mathbf{w},p} =

2362: \nu \, w_{\g_{_m}}^{1/p} \le \rho$ and $\|K h \| \le  \nu B_{\g_{_m}}^{1/2}

2363: \le \epsilon ~$; on the other hand, $\nu$ equals the right hand side of

2364: \eref{lowerbound}.

2365:

2366: To prove the upper bound, note that for any partition of $\Gamma$ into two subsets,

2367: $\Gamma = \Gamma_{_{\!1}} \cup \Gamma_{_{\!2}}$,

2368: and any $h \in \{u; \|Ku\|\le \epsilon, ~\Vvert u \Vvert_{\mathbf{w},p} \le

2369: \rho \}$, we have

2370: \begin{eqnarray*}

2371: \sum_{\g \in \Gamma}|h_\g|^2 &=& \sum_{\g \in \Gamma_{_{\!1}}} |h_\g|^2 +

2372: \sum_{\g \in \Gamma_{_{\!2}}} |h_\g|^2 \\

2373: &\le & \max_{\g' \in \Gamma_{_{\!1}}} [b_{\g'}^{-1} ]

2374: \sum_{\g \in \Gamma_{_{\!1}}} b_\g |h_\g|^2 +

2375: \max_{\g' \in \Gamma_{_{\!2}}}[ w_{\g'}^{-2/p}][\max_{\g'' \in

2376: \Gamma_{_{\!2}}} w_{\g''}

2377: |h_{\g''}|^p]^{\frac{2}{p}-1 } \sum_{\g \in \Gamma_{_{\!2}}} \ag |h_\g|^p \\

2378: & \le & \max_{\g' \in \Gamma_{_{\!1}}} [b_{\g'}^{-1}] \epsilon^2

2379: + \max_{\g' \in \Gamma_{_{\!2}}}[ w_{\g'}^{-2/p}] \rho^2 ~.

2380: \end{eqnarray*}

2381: Since this is true for any partition $\Gamma = \Gamma_{_{\!1}} \cup

2382: \Gamma_{_{\!2}}$, we

2383: still have an upper bound, uniformly valid for all $h \in

2384: \{u; \|Ku\|\le \epsilon, ~\Vvert u \Vvert_{\mathbf{w},p} \le \rho \}$, if we

2385: take the minimum over all such partitions. The upper bound on

2386: $M(\epsilon,\rho)$ then follows upon taking the square root.

2387: \hfill \QED
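
The two estimates used in the chain of inequalities above can be made explicit as follows (a sketch of the intermediate steps, using only \eref{extraK}, the constraints $\|Kh\| \le \epsilon$ and $\Vvert h \Vvert_{\mathbf{w},p} \le \rho$, and $1 \le p \le 2$). On $\Gamma_{_{\!1}}$ we use the lower bound in \eref{extraK}:
\begin{equation*}
\sum_{\g \in \Gamma_{_{\!1}}} b_\g |h_\g|^2 \le \sum_{\g \in \Gamma} b_\g |h_\g|^2
\le \|Kh\|^2 \le \epsilon^2 ~.
\end{equation*}
On $\Gamma_{_{\!2}}$ we write each term as
\begin{equation*}
|h_\g|^2 = \ag^{-2/p} \left( \ag |h_\g|^p \right)^{\frac{2}{p}-1}
\left( \ag |h_\g|^p \right) ~,
\end{equation*}
and note that $\ag |h_\g|^p \le \sum_{\g'} w_{\g'} |h_{\g'}|^p \le \rho^p$ for every $\g$; since $\frac{2}{p}-1 \ge 0$, this gives
\begin{equation*}
\left[ \max_{\g'' \in \Gamma_{_{\!2}}} w_{\g''} |h_{\g''}|^p \right]^{\frac{2}{p}-1}
\sum_{\g \in \Gamma_{_{\!2}}} \ag |h_\g|^p
\le \rho^{\,2-p} \, \rho^{\,p} = \rho^2 ~.
\end{equation*}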

2388:

2389: \bigskip

2390:

2391: To illustrate how Proposition \ref{convrate-gen} could be used, let us apply

2392: it to one particular example, in which we choose

2393: the $(\vpg)$--basis with respect to which

2394: the $\Vvert~\Vvert_{\mathbf{w},p}$--norm is defined to be a wavelet basis

2395: $(\Psi_\lambda)_{\lambda \in \Lambda}$. As

2396: already pointed out in subsection 1.4.1,

2397: the Besov spaces $B^s_{p,p}(\R^d)$ can then be identified with the Banach

2398: spaces

2399: $\mathcal{B}_{\mathbf{w},p}$ for the particular choice $w_\lambda =

2400: 2^{\sigma  p

2401: |\lambda|}$, where $\sigma = s + d\left( \frac{1}{2}-\frac{1}{p} \right)$

2402: is assumed to be non-negative. For $f \in B^s_{p,p}(\R^d)$, the Banach norm

2403: $\Vvert f \Vvert_{\mathbf{w},p}$ then coincides with the Besov norm

2404: $\VVert f \VVert_{s,p} = \left[ \sum_{\lambda \in \Lambda} w_{\lambda}

2405: |\left<f,\Psi_\lambda \right>|^p \right]^{1/p} ~.$

2406: Let us now consider an inverse problem for the operator $K$ with such a

2407: Besov prior.

2408: If we assume that the operator

2409: $K$ has particular smoothing properties, then we can

2410: use these to derive bounds on the corresponding modulus of continuity, and thus

2411: also on the rate of convergence for our regularization algorithm.

2412: In particular, let us assume that the operator $K$ is a smoothing

2413: operator of order

2414: $\alpha$, a property which can be formulated as an equivalence between the norm

2415: $\| Kh\|$ and the norm of $h$ in a Sobolev space of

2416: negative order $H^{-\alpha}$, i.e. in a Besov space $B^{-\alpha}_{2,2}$

2417: (see e.g.

2418: \cite{Eng96}, \cite{Lou97}, \cite{DeV98}, \cite{Coh02}). In other words,

2419: we assume that

2420: for some $\alpha >0$, there exist constants  $A_{\ell}$ and $A_u$,  such that,

2421: for all $h \in L^2(\R^d)$,

2422: \begin{equation}

2423: \label{ell}

2424: A_{\ell}^2 \sum_\lambda 2^{-2\vert\lambda\vert\alpha}\; \vert

2425: h_\lambda\vert^2 \leq \| K h\|^2 \leq A_u^2\sum_\lambda

2426: 2^{-2\vert\lambda\vert\alpha}\;

2427: \vert h_\lambda\vert^2\ .

2428: \end{equation}

2429: The decay rate of the modulus of continuity is then characterized as follows.

2430: \begin{proposition}

2431: If the operator $K$ satisfies the smoothness condition {\rm(\ref{ell})},

2432: then the

2433: modulus of continuity $M(\epsilon,\rho)$, defined by

2434: $$

2435: M(\epsilon,\rho)= \max\{ \|h\|; \|K h\| \leq \epsilon,

2436: \VVert h \VVert_{s,p} \le \rho \} ~,

2437: $$

2438: satisfies

2439: \begin{equation}

2440: \label{M}

2441: c \left( \frac{\epsilon}{A_u} \right)^{\frac{\sigma}{\sigma+\alpha}}

2442: \rho^{\frac{\alpha}{\sigma+\alpha}} \leq M(\epsilon,\rho) \leq C

2443: \left( \frac{\epsilon}{A_\ell}\right)^{\frac{\sigma}{\sigma+\alpha}}

2444: \rho^{\frac{\alpha}{\sigma+\alpha}}

2445: \end{equation}

2446: where $\sigma = s + d\left( \frac{1}{2}-\frac{1}{p} \right)\geq0$,

2447: and $c$ and $C$ are constants depending on $\sigma$

2448: and $\alpha$ only.

2449: \end{proposition}

2450: {\em Proof:}

2451: By \eref{ell}, the operator $K$ satisfies

2452: \eref{extraK} with $b_\lambda=A_{\ell}^2\, 2^{-2\vert\lambda\vert\alpha}$

2453: and $B_\lambda=A_u^2\, 2^{-2\vert\lambda\vert\alpha} \,$.

2454: \newline

2455: It then follows from \eref{lowerbound} that

2456: $$

2457: M(\epsilon, \rho) \geq \max_\lambda \left[ \min \left(

2458: \rho \, 2^{- \sigma |\lambda|} , \frac{\epsilon}{A_u}\, 2^{\alpha |\lambda|}

2459: \right) \right] ~;

2460: $$

2461: if $x=|\lambda|$ could take on all positive real values, then one easily

2462: computes

2463: that this max-min would be attained at

2464: $x= - [\log_2(\epsilon/\rho A_u)]/(\alpha + \sigma)$, and would be equal to

2465: $(\epsilon/A_u)^{\sigma/(\alpha + \sigma)} \rho^{\alpha/(\alpha + \sigma)}$.

2466: Because $|\lambda|$ is constrained to take only the values in $\N$, the max-min

2467: is guaranteed only to be within a constant of this bound (corresponding

2468: to an integer neighbor of the optimal $x$), which leads

2469: to the lower bound in \eref{M}.
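
To make the constant in the lower bound explicit, here is a sketch under two assumptions that are not stated above: $\epsilon \le \rho A_u$, so that the optimal $x$ is non-negative, and $|\lambda|$ ranges over all non-negative integers. Write $V=(\epsilon/A_u)^{\sigma/(\alpha + \sigma)} \rho^{\alpha/(\alpha + \sigma)}$, so that $\rho \, 2^{-\sigma x} = (\epsilon/A_u)\, 2^{\alpha x} = V$ at the optimal $x$. For the integer $j = \lfloor x \rfloor$ we have $0 \le x-j < 1$, hence
\begin{equation*}
\rho \, 2^{-\sigma j} = V \, 2^{\sigma (x-j)} \geq V
~~~~\mbox{and}~~~~
\frac{\epsilon}{A_u}\, 2^{\alpha j} = V \, 2^{-\alpha (x-j)} \geq 2^{-\alpha}\, V ~,
\end{equation*}
so that $M(\epsilon,\rho) \geq 2^{-\alpha}\, V$. The same computation with $j = \lceil x \rceil$ gives $M(\epsilon,\rho) \geq 2^{-\sigma}\, V$. Under these assumptions one may therefore take $c = 2^{-\min(\alpha,\sigma)}$ in \eref{M}.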

2470:

2471: For the upper bound \eref{upperbound}, we must partition the index set.

2472: Splitting $\Lambda$ into $\Lambda_{_1}=\{\lambda; |\lambda| < J\}$ and

2473: $\Lambda_{_2}=\{\lambda; |\lambda| \ge J\}$, we find that

2474: $$

2475: \frac{\epsilon^2}{\min_{\lambda \in \Lambda_{_{1}}}b_\lambda} +

2476: \frac{\rho^2}{\min_{\lambda \in \Lambda_{_{2}}}w_\lambda^{2/p}}

2477: =\frac{\epsilon^2}{A_\ell^2}\, 2^{2\alpha(J-1)} +  \rho^2\, 2^{-2 \sigma J} ~.

2478: $$

2479: The minimizing partition for $\Lambda$ thus corresponds with the minimizing

2480: $J$ for the right hand side of this expression. This value

2481: for  $J$ is an integer neighbor of

2482: $y= - [\log_2(\epsilon/\rho A_\ell)]/(\alpha + \sigma)$, which leads to the

2483: upper bound in \eref{M}.

2484: \hfill \QED
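
The constant in the upper bound can be made explicit in the same way; the following is a sketch under the additional assumption, not stated above, that $\epsilon < \rho A_\ell$, so that $y>0$. Write $W=(\epsilon/A_\ell)^{\sigma/(\alpha + \sigma)} \rho^{\alpha/(\alpha + \sigma)}$, so that $(\epsilon/A_\ell)\, 2^{\alpha y} = \rho \, 2^{-\sigma y} = W$. Any $J$ yields an upper bound; choosing $J = \lceil y \rceil \ge 1$, we have $J-1 \le y \le J$ and therefore
\begin{equation*}
\frac{\epsilon^2}{A_\ell^2}\, 2^{2\alpha(J-1)} \le \frac{\epsilon^2}{A_\ell^2}\, 2^{2\alpha y} = W^2
~~~~\mbox{and}~~~~
\rho^2\, 2^{-2 \sigma J} \le \rho^2\, 2^{-2 \sigma y} = W^2 ~.
\end{equation*}
It follows that $M(\epsilon,\rho) \le \sqrt{2}\; W$, i.e. under this assumption $C=\sqrt{2}$ is an admissible constant in \eref{M}.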

2485:

2486: \bigskip

2487:

2488: The stability estimates we have derived are standard in regularization

2489: theory for the special case $p=2$; they were first extended

2490: to the case $1\leq p < 2$

2491: in \cite{DeM02}. They show the interplay between the smoothing order of the

2492: operator

2493: characterized by $\alpha$ and the assumed smoothness of

2494: the solutions characterized by $\sigma=s+d(\frac{1}{2}-\frac{1}{p})$

2495: (for Besov spaces, we recall that this amounts to

2496: solutions

2497: having $s$ derivatives in $L^p$). For

2498: $\sigma/(\sigma+\alpha)$ close to one, the problem is mildly ill-posed,

2499: whereas the stability degrades for large $\alpha$. Note that if

2500: the bound (\ref{ell}) were replaced by another one,

2501: in which the decay of the $b_\lambda$ and

2502: $B_\lambda$ was given by an exponential decay in $D=2^{|\lambda|}$ (instead

2503: of the much slower decaying negative power $D^{-2\alpha}$ of \eref{ell}),

2504: then the modulus of continuity would tend to zero only as an

2505: inverse power

2506: of $| \log\epsilon|$. This is the so-called {\it logarithmic continuity}

2507: which has been extensively discussed in the case $p=2$, and which extends,

2508: as shown

2509: by an easy application of Proposition \ref{convrate-gen}, to $1 \le p <2$.

2510:

2511: \section{An illustration}

2512: We provide a simple illustration of the behavior of the algorithms based on

2513: minimizing

2514: $\Phi_{\mu \mathbf{w}_0,1}$ and

2515: $\Phi_{\mu \mathbf{w}_0,2}$ for a two-dimensional deconvolution

2516: problem,

2517: considering a class of objects consisting of small bright sources on a dark

2518: background. The image is discretized on a $256 \times 256$ array,

2519: denoted by  $f$. The convolution operator $K$ is

2520: implemented by multiplying the discrete Fourier transform (DFT) of $f$

2521: by a low-pass, radially symmetrical filter and then

2522: taking the inverse DFT to obtain the data $g=Kf$ (data were zero padded on

2523: a larger

2524: $512 \times 512$ array when taking DFTs). The filter was equal

2525: (in the Fourier domain) to the convolution with itself

2526: of the characteristic function of a disk with radius equal to $0.1$ times the

2527: maximum frequency determined by the image grid sampling;

2528: this filter provides a discrete model of a diffraction-limited imaging

2529: system with

2530: incoherent light. Pseudo-random Poisson noise was

2531: added to the data array $g$, for a total number of $10000$ photons,

2532: corresponding to about $25$ photons for the data pixel with the maximum

2533: intensity.

2534:

2535: \begin{figure}[h!tb]

2536: \begin{center}

2537: \epsfig{figure=figure1.ps, width= 5 in}

2538: \caption{The object $f$ (top left), the image $g$ after convolving

2539: with a radially symmetrical low-pass filter and adding pseudo-random

2540: Poisson noise (top right), and the minimizers of $\Phi_{\mu_1 \mathbf{w}_0,1}$

2541: (bottom left) and $\Phi_{\mu_2 \mathbf{w}_0,2}$ (bottom right). The values

2542: $\mu_1=0.001$ and $\mu_2=0.0001$ have been selected separately for the

2543: $\ell^1$- and $\ell^2$-cases, to obtain a balance between

2544: sharpness and ringing and noise. }

2545: \end{center}

2546: \end{figure}

2547:

2548: The top of Figure 1 shows the object $f$ (four ellipses with axes of

2549: $7.5$ or $5.0$ pixels, slightly smoothed to avoid blocking effects)

2550: and the data $g=Kf$. The figure also shows intensity

2551: distributions along two lines in the object and data arrays; along the

2552: horizontal line we see

2553: how two close sources in $f$ give rise to a joint blur in $g$.

2554: The bottom of Figure 1 shows the reconstructions obtained after

2555: 2000 steps of the iterative thresholded Landweber algorithm, which

2556: accurately approximate the minimizers of $\Phi_{\mu_1 \mathbf{w}_0,1}$ and

2557: $\Phi_{\mu_2 \mathbf{w}_0,2}$. The parameters $\mu_1$ and $\mu_2$

2558: are selected separately for each case, in order to achieve a good balance

2559: between

2560: sharpness and ringing and noise.

2561:

2562: \begin{figure}[h!tb]

2563: \begin{center}

2564: \epsfig{figure=figure2.ps, width= 5 in}

2565: \caption{A comparison to illustrate the impact of the positivity constraint,

2566: imposed at every iteration step. On the left are the fixed points for

2567: $P_{\cal C} \mathbf{S}_{\mu_1 \mathbf{w}_0,1}$ (top) and

2568: $\mathbf{S}_{\mu_1 \mathbf{w}_0,1}$ (bottom); on the right

2569: those of $P_{\cal C} \mathbf{S}_{\mu_2 \mathbf{w}_0,2}$ (top) and

2570: $\mathbf{S}_{\mu_2 \mathbf{w}_0,2}$ (bottom). The data

2571: and the values of $\mu_1,~\mu_2$ are the same as in Figure 1.}

2572: \end{center}

2573: \end{figure}

2574:

2575: As expected for an example of this type, the minimizer of

2576: $\Phi_{\mu_1 \mathbf{w}_0,1}$ does a better job at

2577: resolving the two close sources on the horizontal line; it also gives a better

2578: concentration of the source lying along the vertical line.

2579: Because the object $f$ is positive, we can

2580: apply Remark 3.12  and use $P_{\cal C} S_{\mu \mathbf{w}_0,p}$ instead

2581: of $S_{\mu \mathbf{w}_0,p}$, where $P_{\cal C}$ is the projection

2582: onto the  convex cone of $256 \times 256$ arrays that take only non-negative

2583: values.

2584:

2585: The results are

2586: shown in Figure 2; on the top are the $2000$-th iterates obtained with

2587: $P_{\cal C}$,  for $p=1$

2588: (left) and $p=2$ (right). In each case we used the same values of

2589: $\mu_1$ and $\mu_2$ as for

2590: the reconstructions without positivity constraint, which are shown for

2591: comparison on the bottom of

2592: Figure 2. Exploiting the positivity constraint leads to better resolution and

2593: less ringing for this example, where the background is zero.

2594:

2595: \begin{figure}[h!tb]

2596: \begin{center}

2597: \epsfig{figure=figure3.ps, width= 5 in}

2598: \caption

2599: {A different view of the four solutions in Figure 2, with a different

2600: dynamic range for the image intensity gray scale, to highlight ringing

2601: and other artifacts.}

2602: \end{center}

2603: \end{figure}

2604:

2605: Figure 3 gives a different view of the same solutions, with

2606: a compressed gray scale ranging from $- 2 \%$ (darker) to $+ 2 \%$ (lighter)

2607: of the maximum intensity in the original object. This has the effect of

2608: highlighting

2609: the ringing effects and the noise. Both

2610: ringing and noise are seen to be less pronounced for the minimizer of

2611: $\Phi_{\mu_1 \mathbf{w}_0,1}$ (top left)

2612: than that of $\Phi_{\mu_2 \mathbf{w}_0,2}$. Although the introduction of

2613: the positivity constraint removes the

2614: ringing phenomenon (top of Figure 3), we nevertheless see that noise is better

2615: suppressed with $p=1$.

2616:

2617: To produce Figures 1 to 3, the same program was used in every case; the only

2618: change was the choice

2619: of the nonlinear operator applied at the $n$-th iteration step to

2620: $f^{n-1} + K^* (g - K f^{n-1})$.

2621: For realistic applications on data of this type, more sophisticated algorithms

2622: exist. With the

2623: $\ell^2$--penalty, for instance, the reconstructions in our simple example

2624: can be

2625: obtained directly by a regularized Fourier deconvolution.

2626: These examples are included to illustrate the differences

2627: that can be achieved by the choice of $p$, and do not constitute a claim

2628: that the

2629: iterative algorithms discussed

2630: in this paper are optimal. The ``data'' in this example are also only

2631: simple-minded

2632: caricatures of quasi

2633: point-sources data sets. While similar examples may have applications in

2634: astronomy,

2635: most natural

2636: images have a much richer structure. However, as is abundantly documented, the

2637: wavelet transforms of

2638: natural images tend to have distributions that are sparse. A similar

2639: improvement in

2640: accuracy can be expected by applying $\ell^1$ rather than $\ell^2$

2641: penalizations on the

2642: wavelet coefficients

2643: in inverse problems involving natural images, similar to the gain achieved in

2644: denoising with

2645: a soft thresholding rather than with a quadratic penalty.

2646:

2647: \section{Generalizations and additional comments}

2648:

2649: The algorithm proposed in this paper can be generalized in several directions,

2650: some of which we list here, with brief comments.

2651:

2652: The penalization functionals $\Vvert f \Vvert_{\mathbf{w},p}$ we have used are

2653: symmetric, i.e. they are invariant under the exchange of $f$ for $-f$. We can

2654: equally well consider penalization functionals that treat positive and

2655: negative

2656: values of the $\fg$ differently. If $(w^+_\gamma)_{\gamma \in \Gamma}$ and

2657: $(w^-_\gamma)_{\gamma \in \Gamma}$ are two sequences of

2658: strictly positive numbers,

2659: then we can consider the problem of minimizing the functional

2660: \begin{equation}

2661: \label{asfunct}

2662: \Phi_{\mathbf{w}^+, \mathbf{w}^-,p} (f)= \Vert Kf-g\Vert^2 +

2663: \sum_{\gamma \in \Gamma}

2664: ((w^+_\gamma) [\fg]_+^p + (w^-_\gamma) [\fg]_-^p)

2665: \end{equation}

2666: where, for $r \in \mathbb{R},~ [r]_+ = \max(0,r),~ [r]_- = \max(0,-r)$. One easily

2667: checks that all the arguments in this paper can be applied equally well (after

2668: some straightforward modifications) to the general functional \eref{asfunct},

2669: provided we replace the thresholding functions $S_{w_\gamma,p}$ in the

2670: iterative

2671: algorithm by $S_{w^+_\gamma, w^-_\gamma, p}$, where, for $p>1$,

2672: \begin{equation*}

2673: S_{w^+, w^-, p} = \left( F_{w^+, w^-, p}\right)^{-1}\hbox{\ \ with\ \ }

2674: F_{w^+, w^-, p}(x) = x +\frac{p}{2}\, w^+ [x]_+^{p-1} -

2675: \frac{p}{2}\, w^- [x]_-^{p-1} ~,

2676: \end{equation*}

2677: and for $p=1$,

2678: \begin{equation*}

2679: S_{w^+, w^-, 1}(x) = \left\{

2680:  \begin{array}{ccl} x + w^-/2 ~&~ \mbox{if} ~& x \leq - w^-/2 \\

2681: 0 ~&~ \mbox{if} ~& - w^-/2 < x < w^+/2

2682: \\ x- w^+/2 ~&~ \mbox{if} ~& x \geq  w^+/2.

2683: \end{array} \right.

2684: \end{equation*}
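
As a consistency check on these formulas, here is a short sketch. It assumes that, as in the symmetric case, $S_{w^+, w^-, p}(x)$ is the minimizer over $y \in \R$ of the scalar function $(y-x)^2 + w^+ [y]_+^p + w^- [y]_-^p$; the numerical values below are illustrative only. For $p=1$, setting the derivative to zero on each half-line gives
\begin{equation*}
y>0:~ 2(y-x) + w^+ = 0 ~\Rightarrow~ y = x - w^+/2 ~, \qquad
y<0:~ 2(y-x) - w^- = 0 ~\Rightarrow~ y = x + w^-/2 ~.
\end{equation*}
These candidates have the required sign only if $x > w^+/2$, respectively $x < -w^-/2$; for all other $x$ the minimum is attained at $y=0$, which reproduces the three cases above. With, say, $w^+ = 2$ and $w^- = 1$, one finds $S_{w^+, w^-, 1}(3) = 2$, $S_{w^+, w^-, 1}(0.4) = 0$ and $S_{w^+, w^-, 1}(-0.7) = -0.2$. For $p=2$ the function $F_{w^+, w^-, 2}$ is piecewise linear, $F_{w^+, w^-, 2}(x) = (1+w^+)\,x$ for $x \geq 0$ and $(1+w^-)\,x$ for $x<0$, so that
\begin{equation*}
S_{w^+, w^-, 2}(x) = \left\{
\begin{array}{ccl} x/(1+w^+) ~&~ \mbox{if} ~& x \geq 0 \\
x/(1+w^-) ~&~ \mbox{if} ~& x < 0 ~,
\end{array} \right.
\end{equation*}
i.e. a linear damping whose strength depends on the sign of $x$.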

2685:

2686: The above applies when the $\fg$ are all real; a generalization to complex

2687: $\fg$

2688: is not straightforward. When dealing with complex functions, one could

2689: generalize

2690: the penalization $\sum_{\gamma \in \Gamma} w_\gamma |\fg|^p$ to

2691: $\sum_{\gamma \in \Gamma, |\fg| \neq 0} w_\gamma({\rm arg} \fg) |\fg|^p$,

2692: where the weight coefficients have been replaced by strictly positive

2693: $2\pi$--periodic $C^1$--functions on the $1$--torus $\mathbb{T} =

2694: \{x \in \mathbb{C}, |x|=1\}$.

2695: It turns out, however, that the variational

2696: equation for $e^{i \arg\fg}=\fg |\fg|^{-1}$ then no longer uncouples

2697: from that for $|\fg|$ (as it does in the case where $w_\gamma$ is a constant),

2698: leading to a more complicated ``generalized thresholding'' operation in

2699: which the

2700: absolute value and phase of the complex number $S_{w,p}(\fg)$ are given

2701: by a system of

2702: coupled nonlinear equations.

2703:

2704: When the $(\varphi_\gamma)_{\gamma \in \Gamma}$--basis is chosen to be a

2705: wavelet basis,

2706: then we saw in subsection 1.4.1 that it is possible to make the

2707: $\Vvert ~ \Vvert_{\mathbf{w},p}$--norm equivalent to the Besov-norm

2708: $\VVert ~ \VVert_{s,p}$, by choosing the weight for

2709: $| \langle f, \Psi_\lambda \rangle |^p$ to be given by

2710: $w_\lambda = 2^{|\lambda| \sigma p}$, where $|\lambda |$ is the scale of

2711: wavelet

2712: $\Psi_\lambda$. The label $\lambda$ contains much more information than just

2713: the scale, however, since it also indicates the location of the wavelet,

2714: as well as

2715: its ``species'' (i.e. exactly which combination of $1$-dimensional scaling

2716: functions

2717: and wavelets is used to construct the product function $\Psi_\lambda$).

2718: One could

2719: choose the $w_\lambda$ so that certain regions in space are given

2720: extra weight, or

2721: on the contrary de-emphasized, depending on prior information.

2722: In pixel space, prior information on the support of the object to be

2723: reconstructed can be easily enforced by simply

2724: setting the

2725: corresponding weights to very small values,

2726: or by choosing very large weights outside

2727: the object support. This type of constraint is of utmost importance

2728: to achieve superresolution in inverse problems in optics and imaging

2729: (see e.g. \cite{Ber96}).

2730:  When thresholding in the wavelet domain,

2731: a constraint on the object support can be enforced in a similar way due to the

2732: good spatial localization of the wavelets.

2733: If no a priori information is known,

2734: one could

2735: even imagine repeating the wavelet thresholding

2736: algorithm several times, adapting

2737: the weights $w_\lambda$

2738: after each pass, depending on the results of the previous pass;

2739: this could be used,

2740: e.g., to emphasize certain locations at fine scales if coarser scale

2741: coefficients

2742: indicate the possible existence of an edge. The results of this paper

2743: guarantee

2744: that each pass will converge.

2745:

2746: In this paper we have restricted ourselves to penalty functions that are

2747: weighted $\ell^p$--norms of the $\fg = \left<f, \varphi_{\gamma} \right>$. The

2748: approach can be extended naturally to include penalty functions that

2749: can be written as sums, over $\gamma \in \Gamma$, of more general

2750: functions of $\fg$, so that the functional to be minimized is then written

2751: as

2752: $$

2753: \widetilde{\Phi}_{_{\mbox{\scriptsize{\bf{W}}}}}(f) = \|Kf-g\|^2 +

2754: \sum_{\gamma \in \Gamma}

2755: W_{\gamma} (|\fg|) ~.

2756: $$

2757: The arguments in this paper will still be applicable to this more general case

2758: if the functions $W_{\gamma}: \R_+ \rightarrow \R_+$ are convex, and satisfy

2759: some extra technical conditions, which ensure that the corresponding

2760: generalized component--shrinkage functions $\widetilde{S}_{\gamma}$ are still

2761: non-expansive (used in several places), and that, for some $c > 0$,

2762: $$

2763: \inf_{\|v\| \leq 1} ~ \inf_{\|a\| \leq c} \|v\|^{-2}

2764: \sum_{\gamma \in \Gamma} \left|v_{\gamma} + \widetilde{S}_{\gamma}(a_{\gamma})

2765: -\widetilde{S}_{\gamma}(v_{\gamma}+a_{\gamma}) \right|^2 > 0 ~

2766: $$

2767: (used in Lemma \ref{lm-3-16}).

2768: To ensure that both conditions are satisfied, it is sufficient to choose

2769: functions $W_{\gamma}$ that are convex, with a minimum at $0$ and e.g.

2770: twice differentiable, except possibly at $0$ (where they should nevertheless

2771: still be left and right differentiable), and for which $W_{\gamma}'' >1$

2772: on $V \setminus \{0\}$, where $V$ is a neighborhood of $0$.

2773:

2774: We conclude this paper with some comments

2775: concerning the numerical complexity of the algorithm.

2776:

2777: At each iteration step,

2778: we must compute the action of the operator $K^*K$ on the current

2779: object estimate, expressed in the $\varphi_{\gamma}$--basis.

2780: In a finite-dimensional setting where the solution is

2781: represented by a vector of length $N$, this necessitates in principle a

2782: matrix multiplication

2783: of complexity $O(N^2)$,

2784: if we neglect the cost of the shrinkage operation in each iteration step.

2785: After sufficient accuracy is attained and the iterations are stopped, the

2786: resulting $(f^n)_{\gamma}$ must be transformed back into the standard

2787: representation domain of the object function, except in the special case

2788: where the $\varphi_{\g}$ are already the basis for the standard representation

2789: (e.g., if the $\varphi_{\g}$ correspond to the pixel representation

2790: for images). This adds one final $O(N^2)$--matrix multiplication. In this

2791: scenario, the total cost equals that

2792: of the classical Landweber algorithm

2793: for a comparable number of iterations.

2794: Since Landweber's algorithm typically requires a substantial number of

2795: iterations, it follows that this method can become

2796: computationally competitive with the $O(N^3)$ SVD algorithms only

2797: when $N$ is large compared to the number of iterations necessary.
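
To make this comparison concrete, here is a rough operation count; the numbers are illustrative choices, not taken from any experiment, and all constants are ignored. With $n$ iterations at a cost of order $N^2$ each, the iterative scheme requires about $n N^2$ operations, against about $N^3$ for an SVD, so that the ratio of the two costs is
\begin{equation*}
\frac{n N^2}{N^3} = \frac{n}{N} ~.
\end{equation*}
For instance, for $N = 10^4$ and $n = 10^2$ this ratio is $10^{-2}$ (roughly $10^{10}$ versus $10^{12}$ operations), whereas for $N = 10^2$ and the same $n$ the two costs are of the same order.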

2798:

2799: Several methods have been proposed in the literature to accelerate the

2800: convergence

2801: of Landweber's iteration, which could be used for the present algorithm as

2802: well.

2803: For instance, one could use some form of preconditioning (using the

2804: operator $D$

2805: of Remark \ref{op-D}) or group together $k$ Landweber iteration steps and

2806: apply thresholding only every $k$ steps (see e.g. the book \cite{Eng96}).

2807:

2808: Much more substantial gains can be obtained when the operator $\K K$ can be

2809: implemented via fast algorithms. In a first important class of applications,

2810: the matrix \\

2811: $\left( \left<\K K\varphi_{\gamma},\varphi_{\gamma'}\right> \right)_

2812: {\gamma , \gamma' \in \Gamma}$ is sparse; if, for instance, there are only

2813: $O(N)$ nonvanishing entries in this matrix, then standard techniques to deal

2814: with the action of sparse matrices will reduce the cost of each iteration step

2815: to $O(N)$ instead of $O(N^2)$. If the $\vpg$--basis

2816: is a wavelet basis, this is the case for a large class

2817: of integro-differential operators of interest (see e.g. \cite{Bey91}).

2818: Even if $\K K$ is sparse in the $\vpg$--basis, but has an even simpler

2819: expression in another basis, and if the transforms back and forth between the

2820: two bases can be carried out via fast algorithms, then it may be useful to

2821: implement the action of $\K K$ via these back--and--forth transformations.

2822: For instance, if the object is of a type that will have a sparse representation

2823: in a wavelet basis, and the operator $\K K$ is a convolution operator, then

2824: we can pick the $\vpg$--basis to be a wavelet basis, and implement the

2825: operator $\K K$ by doing, successively, a fast reconstruction from wavelet

2826: coefficients, followed by a FFT, a multiplication in the Fourier domain, an

2827: inverse FFT, and a wavelet transform, for a total complexity of $O(N \log N)$.

2828: One can obtain similar complexity estimates if the algorithm is modified

2829: to not only take the nonlinear thresholding into account, but also additional

2830: projections $P_{\cal C}$ on a convex set, such as the cone of functions that

2831: are a.e. positive; in this case, after the thresholding operation, one needs

2832: to carry out an additional fast reconstruction from, say, the wavelet domain,

2833: take the positive part, and then perform the fast transform back, without

2834: affecting the $O(N \log N)$ complexity estimate.
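
The operation count behind this estimate can be sketched as follows. The notation is introduced here only for illustration: $W$ denotes a fast (orthogonal) wavelet transform mapping a sampled function to its $N$ wavelet coefficients, $\mathcal{F}$ the discrete Fourier transform, and $\hat{k}$ the Fourier multiplier of the convolution operator $K$, which we assume to be periodic so that it is diagonalized exactly by $\mathcal{F}$. In the $\vpg$--basis, one application of $\K K$ then takes the form
\begin{equation*}
W ~ \mathcal{F}^{-1} ~ |\hat{k}|^2 ~ \mathcal{F} ~ W^{-1} ~,
\end{equation*}
where, reading from right to left, the successive costs are $O(N)$ for the wavelet reconstruction (for wavelets with finite filters), $O(N \log N)$ for the FFT, $O(N)$ for the pointwise multiplication, $O(N \log N)$ for the inverse FFT, and $O(N)$ for the wavelet transform. The component-wise thresholding adds only $O(N)$, so that each iteration is dominated by the two FFTs.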

2835:

2836: The situations described above cover several applications of

2837: great practical relevance, in which we expect this algorithm will prove itself

2838: to be an attractive competitor to other fast techniques for large-scale

2839: inverse problems with sparsity constraints.

2840:

2841:

2842:

2843:

2844: \section*{Acknowledgments}

2845:

2846: We thank Albert Cohen, Rich Baraniuk, Mario Bertero, Brad Lucier,

2847: St\'ephane Mallat

2848: and especially David Donoho for interesting and stimulating discussions. We

2849: also

2850: would like to thank Rich Baraniuk for drawing our attention to \cite{Fig03}.

2851:

2852: Ingrid Daubechies gratefully acknowledges partial support by NSF grants

2853: DMS-0070689

2854: and DMS-0219233, as well as by AFOSR grant F49620-01-1-0099, whereas research

2855: by Christine De Mol is supported by the ``Action de Recherche Concert\'ee'' Nb

2856: 02/07-281 and IAP-network in Statistics P5/24.

2857:

2858:

2859:

2860: \begin{thebibliography}{A}

2861:

2862: \bibitem [1]{Abr98} F. Abramovich and B. W. Silverman, \textit{Wavelet

2863: Decomposition Approaches to Statistical Inverse Problems.} Biometrika

2864: \textbf{85} (1998), 115--129.

2865:

2866: \bibitem [2]{Ber98} M. Bertero and P. Boccacci, \textit{Introduction to

2867: Inverse Problems in Imaging}, Institute of Physics, Bristol, 1998.

2868:

2869: \bibitem [3]{Ber96} M. Bertero and C. De Mol, \textit{Super-resolution by

2870: data inversion}, in: Progress in Optics (Vol. XXXVI), E. Wolf, ed.,

2871: Elsevier, Amsterdam, 1996, pp. 129--178.

2872:

2873: \bibitem [4]{Bey91} G. Beylkin, R. Coifman and V. Rokhlin, \textit{Fast

2874: Wavelet Transforms and Numerical Algorithms I.} Comm. Pure Appl. Math.

2875: \textbf{44} (1991), 141--183.

2876:

2877: \bibitem [5]{Can00} E. J. Cand\`es and D. L. Donoho, \textit{Recovering Edges

2878: in Ill-Posed Inverse Problems: Optimality of Curvelet Frames.} Ann. Statist.

2879: \textbf{30} (2002), 784--842.

2880:

2881: \bibitem [6]{Cha98} A. Chambolle, R. A. DeVore, N.-Y. Lee and B. J. Lucier,

2882: \textit{Nonlinear Wavelet Image Processing: Variational Problems, Compression,

2883: and Noise Removal through Wavelet Shrinkage.} IEEE Trans. Image Processing

2884: \textbf{7} (1998), 319--335.

2885:

2886: \bibitem [7]{CDS01} S. Chen, D. Donoho and M. Saunders, \textit{Atomic

2887: Decomposition by Basis Pursuit.} SIAM Review \textbf{43} (2001), 129--159.

2888:

2889: \bibitem [8]{Coh00} A. Cohen, \textit{Wavelet methods in numerical

2890: analysis}, Handbook of Numerical Analysis, vol. VII, P. G. Ciarlet and J. L.

2891: Lions eds., Elsevier, Amsterdam, 2000.

2892:

2893: \bibitem [9]{Coh02} A. Cohen, M. Hoffmann and M. Reiss, \textit{Adaptive

2894: wavelet Galerkin methods for linear inverse problems.} preprint, 2002.

2895:

2896:

2897: \bibitem [10]{DeM02} C. De Mol and M. Defrise, \textit{A note on

2898: wavelet-based inversion methods}, in: \textit{Inverse Problems, Image Analysis

2899: and Medical Imaging},  M. Z. Nashed and O. Scherzer eds,

2900: Series ``Contemporary Mathematics''

2901: vol. 313, pp. 85--96, American Mathematical Society, 2002.

2902:

2903: \bibitem [11]{DeP95} A. R. De Pierro, \textit{A modified expectation

2904: maximization algorithm for penalized likelihood estimation in emission

2905: tomography.} IEEE Trans. Med. Imag. \textbf{14} (1995), 132--137.

2906:

2907: \bibitem [12]{DeV98} R. DeVore, \textit{Nonlinear Approximation.} Acta

2908: Numerica (1998), 1--99.

2909:

2910: \bibitem [13]{Dic96} V. Dicken and P. Maass, \textit{Wavelet-Galerkin

2911: methods for ill-posed problems.} J. Inv. Ill-Posed Problems  \textbf{4} (1996),

2912: 203--222.

2913:

2914: \bibitem [14]{Don92} D. Donoho, \textit{Superresolution via sparsity

2915: constraints.}

2916: SIAM J. Math. Anal. \textbf{23} (1992), 1309--1331.

2917:

2918: \bibitem [15]{Don95} D. Donoho, \textit{Nonlinear solution of Linear

2919: Inverse Problems by Wavelet-Vaguelette Decomposition.} Appl. Comp. Harmonic

2920: Anal. \textbf{2} (1995), 101--126.

2921:

2922: \bibitem [16]{Don00} D. Donoho, \textit{Orthonormal ridgelets and linear

2923: singularities.} SIAM J. Math. Anal.

2924: \textbf{31} (2000), 1062--1099.

2925:

2926: \bibitem [17]{Don94} D. Donoho and I. Johnstone, \textit{Ideal

2927: spatial adaptation via wavelet shrinkage.} Biometrika

2928: \textbf{81} (1994), 425--455.

2929:

2930: \bibitem [18]{DuSh52} R. J. Duffin and A. C. Schaeffer, \textit{A class of

2931: nonharmonic Fourier series.} Trans. Am. Math. Soc. \textbf{72} (1952),

2932: 341--366.

2933:

2934: \bibitem [19]{Eic92} B. Eicke, \textit{Iteration methods for convexly

2935: constrained ill-posed problems in Hilbert space.} Numer. Funct. Anal. Optim.

2936: \textbf{13} (1992), 413--429.

2937:

2938: \bibitem [20]{Eng96} H. W. Engl, M. Hanke and A. Neubauer,

2939: \textit{Regularization of Inverse Problems}, Kluwer, Dordrecht, 1996.

2940:

2941: \bibitem [21]{Fig03} M. Figueiredo and R. Nowak, \textit{An EM Algorithm

2942: for Wavelet-Based Image Restoration}, IEEE Transactions on Image Processing.

2943: To appear in July 2003.

2944:

2945: \bibitem [22]{KMR03} J. Kalifa, S. Mallat and B. Roug\'e,

2946: \textit{Deconvolution by thresholding in mirror wavelet bases.} IEEE Trans. on

2947: Image Processing \textbf{12} (2003), 446--457.

2948:

2949: \bibitem [23]{Lad51} L. Landweber, \textit{An iterative formula for Fredholm

2950: integral equations of the first kind.} Am. J. Math.

2951: \textbf{73} (1951), 615--624.

2952:

2953: \bibitem [24]{Lan00}  K. Lange, D. R. Hunter and I. Yang,

2954: \textit{Optimization Transfer algorithms using surrogate objective functions.}

2955: J. Comp. Graph. Stat. \textbf{9} (2000), 1--59.

2956:

2957: \bibitem [25]{Lee01}  N.-Y. Lee and B. J. Lucier,

2958: \textit{Wavelet Methods for Inverting the Radon Transform with Noisy Data.}

2959: IEEE Trans. Image Processing \textbf{10} (2001), 79--94.

2960:

2961: \bibitem [26]{Li02} M. Li, H. Yang and H. Kudo, \textit{An accurate iterative

2962: reconstruction algorithm for sparse objects: application to 3-D blood-vessel

2963: reconstruction from a limited number of projections.} Phys. Med. Biol.

2964: \textbf{47}

2965: (2002), 2599--2609.

2966:

2967: \bibitem [27]{Lou97} A. K. Louis, P. Maass and A. Rieder, \textit{Wavelets:

2968: Theory and Applications}, Wiley, Chichester, 1997.

2969:

2970: \bibitem [28]{Mal98} S. Mallat, \textit{A Wavelet Tour of Signal

2971: Processing}, 2nd edition, Academic Press, San Diego, 1999.

2972:

2973: \bibitem [29]{Mey92} Y. Meyer, \textit{Wavelets and Operators}, Cambridge

2974: University Press, 1992.

2975:

2976: \bibitem [30]{Nov01} R. Nowak and M. Figueiredo, \textit{Fast wavelet-based

2977: image deconvolution using the EM algorithm.} Conference Record of the

2978: Thirty-Fifth

2979: Asilomar Conference on Signals, Systems and Computers, Vol. 1,

2980: pp. 371--375, 2001.

2981:

2982: \bibitem [31]{Opi67} Z. Opial, \textit{Weak convergence of the sequence of

2983: successive approximations for nonexpansive mappings.} Bull. Amer. Math. Soc.

2984: \textbf{73} (1967), 591--597.

2985:

2986: \bibitem [32]{Tib96} R. Tibshirani, \textit{Regression shrinkage and

2987: selection via the lasso.} J. Royal Statist. Soc. B \textbf{58} (1996),

2988: 267--288.

2989:

2990:

2991: \end{thebibliography}

2992:

2993:

2994:

2995:

2996:

2997: \section*{Appendices}

2998: \appendix

2999:

3000: \section{Wavelets and Besov spaces}

3001: \label{WavBes}

3002:

3003: We give a brief review of basic definitions of wavelets and their

3004: connection with Besov spaces.

3005: This will be a sketch only; for details, we direct the reader to e.g.

3006: \cite{Mey92, DeV98, Coh00, Mal98}.

3007:

3008: For simplicity we start with dimension 1.  Starting from a (very special)

3009: function

3010: $\psi$ we define\begin{equation*}\psi_{j,k}(x)= 2^{j/2}\ \psi(2^j x-k) ~,

3011: j,k \in \Z~,\end{equation*}

3012: and we assume that the collection $\{\psi_{j,k}; j,~k \in \Z\}$

3013: constitutes an orthonormal basis of $L^2(\mathbb R)$.

3014: For all wavelet bases used in practical applications, there also exists an

3015: associated

3016: {\em scaling function} $\phi$, which is orthogonal to its

3017: translates by integers, and such

3018: that, for all $j \in \Z$,

3019: \begin{equation}

3020: \label{MRA}

3021: \overline{\mbox{Span}\{\phi_{j,k}; k \in \Z\}}~\mbox{\small{$\bigoplus$}}

3022: ~\overline

3023: {\mbox{Span}\{\psi_{j,k}; k \in \Z\}}

3024: = \overline{\mbox{Span}\{\phi_{j+1,k}; k \in \Z\}} ~,

3025: \end{equation}

3026: where the $\phi_{j,k}$ are defined analogously to the $\psi_{j,k}$.

3027: Typically, the functions $\phi$ and $\psi$ are very well localized, in the

3028: sense

3029: that $\forall N \in \N$, $\int_{\R} (1+|x|)^N(|\phi(x)|+|\psi(x)|) dx <

3030: \infty$;

3031: one can even

3032: choose $\phi$ and $\psi$ such that they are supported on a finite interval.

3033: This can be

3034: achieved with arbitrary finite smoothness, i.e. for any preassigned $L \in

3035: \N$, one can

3036: find such $\phi$ and $\psi$ that are moreover in $C^L(\R)$. Because of

3037: \eref{MRA},

3038: one can consider (inhomogeneous) wavelet expansions, in which not all

3039: scales $j$

3040: are used,

3041: but a cut-off is introduced at some coarsest scale, often set at $j=0$.

3042: More precisely,

3043: we shall use the following wavelet expansion

3044: of $f

3045: \in L^2$,

3046: \begin{equation}

3047: \label{inhMRA}

3048:  f= \sum_{k=-\infty}^{+\infty}  \left<f,\phi_{0,k}\right> \phi_{0,k} +

3049: \sum_{j=0}^{+\infty}

3050: \sum_{k=-\infty}^{+\infty} \left<f,\psi_{j,k}\right> \psi_{j,k}~.

3051: \end{equation}

3052: Wavelet bases in

3053: higher dimensions can be built by taking appropriate products of

3054: one-dimensional

3055: wavelet and scaling functions. Such $d$-dimensional bases can be viewed as the

3056: result of translating (by elements $k$ of $\Z^d$) and

3057: dilating (by integer powers $j$ of 2) of

3058: not just one, but several

3059: (finite in number) ``mother wavelets'', typically numbered from

3060: 1 to $2^d-1$.

3061: It will be convenient to abbreviate the full label (including $j$,  $k$ and the

3062: number of the mother wavelet) to just $\lambda$, with the convention that

3063: $|\lambda|=j$.

3064: We shall again cut off at some coarsest scale, and we shall follow the

3065: convenient

3066: slight abuse

3067: of notation used in \cite{Coh00} that sweeps up the coarsest-$j$

3068: scaling functions

3069: (as in \eref{inhMRA}) into the $\Psi_{\lambda}$ as well. We thus denote the

3070: complete

3071: $d$-dimensional, inhomogeneous wavelet basis by

3072: $\{\Psi_{\lambda}; \lambda \in \Lambda\}$.

3073:

3074: It turns out that $\{\Psi_{\lambda}; \lambda \in \Lambda\}$

3075: is not only an orthonormal basis

3076: for $L^2(\R^d)$, but also an unconditional basis for a variety of other

3077: useful

3078: Banach spaces of functions, such

3079: as H\"older spaces, Sobolev spaces and, more generally, Besov spaces.

3080: Again, we review only some basic facts; a full study can be found in e.g.

3081: \cite{Mey92, DeV98, Coh00}.  The Besov spaces $B^s_{p,q}(\R^d)$

3082: consist, basically, of functions that ``have $s$ derivatives in $L^p$'';

3083: the parameter $q$ provides

3084: some additional fine-tuning to the definition of these spaces. The norm

3085: $\|f\|_{_{B^s_{p,q}}}$ in a Besov space $B^s_{p,q}$ is traditionally

3086: defined via the

3087: {\em modulus of continuity} of $f$ in $L^p(\R)$, of which an additional

3088: weighted $L^q$-norm

3089: is then taken, in which the integral is over different scales.

3090: We shall not give its details here; for our purposes it suffices that

3091: this traditional Besov norm is equivalent with a norm that can be computed from

3092: wavelet coefficients. More precisely, let us assume that the original

3093: 1-dimensional $\phi$ and

3094: $\psi$ are in $C^L(\R)$, with $L>s$, that

3095: $\sigma=s+d(\frac{1}{2}-\frac{1}{p}) \geq 0$,

3096: and define the norm $\VVert \cdot \VVert_{_{s;p,q}}$ by

3097: \begin{equation}

3098: \label{triple}

3099: \VVert f \VVert  _{_{s;p,q} }= \left( \sum_{j=0}^{\infty} \left(2^{j \sigma

3100: p} \sum_{

3101: \lambda \in \Lambda ,

3102: |\lambda |=j }|\left<f,\Psi_{\lambda}\right>|^p\right)^{q/p}\right)^{1/q} ~~.

3103: \end{equation}

3104: Then this norm is equivalent to the traditional Besov norm,

3105: $\VVert f \VVert _{_{s;p,q}}\sim\| f \|_{_{B^s_{p,q}}}$, that is, there exist

3106: strictly positive constants $A$ and $B$ such that

3107: \begin{equation}

3108: \label{Besnor}

3109: A \VVert f \VVert_{_{s;p,q}} \leq \| f \| _{_{B^s_{p,q}}}

3110: \leq B \VVert f \VVert_{_{s;p,q}} ~.

3111: \end{equation}

3112: The condition that $\sigma \geq 0$ is imposed to ensure

3113: that $B^s_{p,q}(\R^d)$ is a subspace

3114: of $L^2(\R^d)$; we shall restrict ourselves to this case in this paper.

3115: From \eref{triple} one can gauge the fine-tuning role played by the

3116: parameter $q$

3117: in the definition of the Besov spaces. A particularly convenient choice, to

3118: which we

3119: shall stick in the remainder of this paper, is $q=p$,

3120: for which the expression simplifies

3121: to

3122: \begin{equation*}

3123: %\label{triple-simple}

3124: \VVert f \VVert_{_{s,p}} = \left( \sum_{\lambda \in \Lambda} 2^{\sigma

3125: p|\lambda|} ~

3126: | \left< f, \Psi_{\lambda} \right> |^p \right)^{1/p} ~~;

3127: \end{equation*}

3128: to alleviate notation, we shall drop the extra index $q$ wherever it

3129: normally occurs,

3130: on the understanding that $q=p$ when we do so.
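
For the reader's convenience, here is the one-line verification of this simplification, followed by two illustrative parameter choices. With $q=p$ in \eref{triple}, the exponent $q/p$ equals $1$, so that
\begin{equation*}
\VVert f \VVert_{_{s;p,p}}^p = \sum_{j=0}^{\infty} 2^{j \sigma p}
\sum_{\lambda \in \Lambda , |\lambda |=j } |\left<f,\Psi_{\lambda}\right>|^p
= \sum_{\lambda \in \Lambda} 2^{\sigma p|\lambda|} ~
| \left< f, \Psi_{\lambda} \right> |^p ~.
\end{equation*}
For $p=2$ one has $\sigma = s$ in any dimension $d$, and the weights $2^{2s|\lambda|}$ give the familiar wavelet characterization of the Sobolev space $H^s = B^s_{2,2}$. For $d=2$, $s=1$ and $p=1$ one finds $\sigma = 1 + 2\,(\frac{1}{2}-1) = 0$, so that all weights equal $1$ and the norm reduces to the unweighted $\ell^1$-norm of the wavelet coefficients.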

3131:

3132: When $0<p,~q<1$, the Besov spaces can still be defined as complete metric spaces,

3133: although they are no longer Banach spaces (because \eref{triple} is no longer a norm).

3134: This allows for more local variability in local smoothness than

3135: is typical for functions

3136: in  the usual H\"older

3137: or Sobolev spaces. For instance, a real function $f$ on $\R$ that is piecewise

3138: continuous, but for which each piece is locally in $C^s$, can be an element of

3139: $B^s_p(\R)$, despite the possibility of discontinuities at the

3140: transition from one piece to

3141: the next, provided $p>0$ is sufficiently small, and

3142: some technical conditions are met on the number and

3143: size of the discontinuities, and on the decay at $\infty$ of $f$.

3144: % Moreover,

3145: %this is

3146: %the case for any finite value of $s>0$.

3147:

3148: Wavelet bases are thus closely linked to a rich class

3149: of smoothness spaces; they also provide a good tool for high accuracy

3150: nonlinear approximation of a wide variety of functions.

3151: For instance, if the bounded function $f$

3152: on $[0,1]$ has only finitely many discontinuities, and is $C^s$ elsewhere,

3153: then one can find a way of renumbering (dependent on $f$ itself)

3154: the wavelets in the standard wavelet

3155: expansion of $f$, so that the distance in, say, $L^2([0,1])$ between $f$

3156: and the first $N$ terms of this reordered wavelet expansion, decreases

3157: proportionally to $N^{-s}$.

3158: If $s$ is large, it follows that  a very accurate approximation to $f$ can be

3159: obtained with relatively few wavelets; this

3160: is possible because

3161: the smooth patches of the piecewise continuous

3162: $f$  will be well approximated by coarse scale wavelets, which are

3163: few in number; to capture the behavior of $f$ near

3164: the discontinuities  much more localized

3165: finer scale wavelets are required, but only

3166: those wavelets located exactly near

3167: the discontinuities will be needed, which amounts

3168: again to a small number.

3169:

3170: In higher dimensions, $d > 1$,  the suitability of wavelets

3171: is influenced by the

3172: dimension of the manifolds on which singularities occur. If the

3173: singularities in the

3174: functions of interest are solely point singularities, then expansions

3175: using $N$ wavelets can still approximate such functions with distances that

3176: decrease like $N^{-s}$,

3177: depending on their behavior away

3178: from the

3179: singularities. If, however, we are interested in $f$ that may have, e.g.

3180: discontinuities

3181: along manifolds of dimension higher than 0, then such wavelet

3182: approximations  are not optimal.

3183: For instance, if $f:\R^2 \rightarrow \R$ is piecewise $C^L$, with

3184: possible jumps across the

3185: boundaries of the smoothness domains, which are themselves smooth

3186: (say, $C^L$ again) curves,

3187: then $N$-term wavelet approximations  to $f$ cannot achieve an error

3188: rate decay faster than $N^{-1/2}$,

3189: regardless of the

3190: value of $L>1$.

3191:

3192: It follows that whenever we are faced with an inverse problem

3193: that needs regularization,

3194: in which the objects to be restored are expected to be mostly smooth,

3195: with very localized

3196: lower dimensional areas of singularities,

3197: we can expect that their expansions into wavelets

3198: will be sparse. This sparsity can be expressed by requiring that

3199: the wavelet coefficients (possibly with some scale-dependent weight)

3200: have a finite (or small) $\ell^p$-norm,

3201: with $1\leq p \leq 2$, or equivalently that  the Besov-equivalent norm $\VVert f

3202: \VVert_{_{s,p}}$ is finite (or small), where $\VVert f \VVert_{_{s,p}}$

3203: is exactly of the form

3204: $\Phi_{\mathbf w,p}$ defined in \eref{funct-gen}.

3205:

3206: \section{A fixed-point theorem}

3207: \label{Opial}

3208:

3209: We provide here the proof of the theorem needed to establish the weak

3210: convergence of the iterative algorithm. The theorem is given in \cite{Opi67};

3211: we give a simplified proof here (see the remark at the end),

3212: which nevertheless still follows the

3213: main lines of Opial's paper.

3214:

3215: \begin{theorem}

3216: \label{FPThm}

3217: Let ${\cal C}$ be a closed convex subset of the Hilbert space $\cH$ and let

3218: the mapping $\A : {\cal C} \to {\cal C}$ satisfy the following conditions:

3219: \begin{enumerate}

3220: \item[{\rm (i)}] $\A $ is non-expansive: $\| \A  v -  \A

3221:  v' \| \leq \| v - v'\|,\ \forall v,v' \in {\cal C}$~,

3222: \item[\rm{ (ii)}] $\A $ is asymptotically regular: $\| \A ^{n+1}

3223: v -\A ^n v\| \xrightarrow[n \to \infty]{~} 0,\ \forall v \in

3224: {\cal C}$~,

3225: \item[\rm{ (iii)}] the set ${\cal F}$ of the fixed points of $\A $ in

3226: ${\cal C}$ is not empty~.

3227: \end{enumerate}

3228: Then, $\forall v \in \cal C$, the sequence $(\A ^n v)_{n \in \mathbb{N}}$

3229: converges weakly to a fixed point in ${\cal F}$.

3230: \end{theorem}

3231: The proof of the main theorem will follow from a series of lemmas.

3232: As before, we use the notation {\em w}$\,$--$\lim$ to indicate a {\em weak}

3233: limit.

3234: \begin{lemma}

3235: \label{FP1}

3236: If $u,v \in\cH$, and if $(v_n)_{n \in \N}$ is a sequence in $\cH$ such that

3237: w--$\lim_{n \to \infty}v_n = v$, and $u \neq

3238: v$, then

3239: $\lim\inf_{n \to \infty} \| v_n - u\| > \lim\inf_{n \to \infty}\| v_n - v\|~$.

3240: \end{lemma}

3241: {\em Proof:}

3242: We have $\lim\inf_{n \to \infty}\| v_n - u\|^2 $

3243: $= \lim\inf_{n \to \infty}\| v_n - v\|^2  +

3244: \| v - u\|^2 +  2 \lim_{n \to \infty} \mathrm{Re} \left< v_n-v,v-u \right>$

3245: $= \lim\inf_{n \to \infty}\| v_n - v\|^2  +\| v - u\|^2~$,

3246: whence the result.

3247: \hfill\QED
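
In more detail, the identity used in this proof comes from expanding, for each $n$,
\begin{equation*}
\| v_n - u\|^2 = \| v_n - v\|^2 + \| v - u\|^2 +
2 \, \mathrm{Re} \left< v_n - v, v - u \right> ~.
\end{equation*}
Since $v_n - v$ converges weakly to $0$ and $v-u$ is a fixed element of $\cH$, the last term tends to $0$ as $n \to \infty$. Taking the lim inf on both sides therefore gives
$\lim\inf_{n \to \infty}\| v_n - u\|^2 = \lim\inf_{n \to \infty}\| v_n - v\|^2 + \| v - u\|^2$,
where the lim inf is finite because weakly convergent sequences are bounded. The strict inequality of the lemma then follows from $\| v - u\|^2 > 0$, which holds because $u \neq v$.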

3248:

3249: \bigskip

3250:

3251: \begin{lemma}

3252: \label{FP2}

3253: Suppose that $\A:\cal C \rightarrow \cal C$ satisfies condition

3254: {\rm(i)} in Theorem \ref{FPThm}.\\

3255: If  w--$\lim_{n \to \infty}u_n= u$, and

3256: $\lim_{n \to \infty} \|u_n -\A u_n -h\| =0~$, then

3257: $h = u -\A u$\ .

3258: \end{lemma}

3259: {\em Proof:}

3260: Because of the non-expansivity of $\A $ (assumption (i)),

3261: we have

3262: $\| u_n - (h + \A u)\| $

3263: $\leq \| u_n - h - \A  u_n\|$ $+\|  \A  u_n - \A  u\| $

3264: $\leq \| u_n - h - \A  u_n\|$ $ +\|   u_n -  u \|~$.

3265: Hence,

3266: \begin{eqnarray*}

3267: {\mathop{\rm lim\ inf}_{n \to \infty}}\ \| u_n - (h + \A  u)\|

3268: &\leq&

3269: {\mathop{\rm lim}_{n \to \infty}}\ \| h - (u_n - \A  u_n)\| +

3270: {\mathop{\rm lim\ inf}_{n \to \infty}}\ \| u_n -  u\| \\

3271: &=&

3272: {\mathop{\rm lim\ inf}_{n \to \infty}}\ \| u_n -  u\|

3273: \end{eqnarray*}

3274: It then follows from Lemma \ref{FP1} that $u = h + \A  u$ or $h = u -

3275: \A  u$.

3276: \hfill\QED

3277:

3278: \bigskip

3279:

3280: \begin{lemma}

3281: \label{FP3}

3282: Suppose that $\A :\,\cal C \rightarrow \cal C$ satisfies conditions

3283: {\rm(i)} and {\rm(ii)} in Theorem \ref{FPThm}.

3284: If a subsequence of $(\A ^n v)_{n\in \mathbb{N}}$, with $v \in {\cal C}$,

3285: converges weakly in ${\cal C}$, then its limit is in $\cal F$.

3286: \end{lemma}

3287: {\em Proof:}

3288: Suppose {\em w}$\,$--$\lim_{k \to \infty}\A ^{n_k}v= u~$.

3289: Since, by the assumption (ii) of asymptotic

3290: regularity,

3291: $\lim_{n \to \infty}\|\A ^{n} v - \A  \A ^{n} v \|=0~$,

3292: we have

3293: $\lim_{k \to \infty}\|\A ^{n_k} v - \A  \A ^{n_k} v \|=0~$.

3294: By Lemma \ref{FP2}, it

3295: follows that $u - \A u = 0$ , i.e. that $u$ is in $\cal F$.

3296: \hfill\QED

3297:

3298: \bigskip

3299:

3300: \begin{lemma}

3301: \label{FP4}

3302: Suppose that $\A :\,\cal C \rightarrow \cal C$ satisfies conditions

3303: {\rm(i)} and {\rm(iii)} in Theorem \ref{FPThm}. Then,

3304: for all $h \in {\cal F}$, and all $v \in {\cal C}$, the sequence

3305: $(\|\A ^n v -

3306: h\|)_{n\in \N}\ $ is non-increasing and thus has a limit.

3307: \end{lemma}

3308: {\em Proof:}

3309: Since $\A $ is non-expansive, we have indeed

3310: $\| \A ^{n+1} v - h \|$

3311: $= \| \A \A ^n v -\A   h\| $

3312: $\leq \| \A ^{n} v - h \|~.$

3313: $~~~~~~~~~~~~~~~~~~~~~~~~$\hfill\QED

3314:

3315: \bigskip

3316:

3317: We can now proceed to the

3318:

3319: \bigskip

3320:

3321: \noindent

3322: {\bf Proof of Theorem \ref{FPThm}}

3323:

3324: \noindent

3325: Let $v$ be any element in $\cal{C}$. Take an arbitrary $h \in {\cal F}$.

3326: By Lemma \ref{FP4}, we then have \\

3327: $\lim\sup_{n \to \infty}\| \A ^n v\|$

3328: $ \leq

3329: \lim\sup_{n \to \infty} \|\A ^n v - h\|$

3330: $+\| h\|$

3331: $ = \| h\| $  $+ \lim_{n \to \infty}\ \|\A ^n v - h\|$

3332: $< \infty~$.

3333:

3334: \noindent

3335: Since the $\|\A ^n v\|$ are thus uniformly bounded,

3336: it follows from the Banach-Alaoglu theorem that the sequence $(\A ^n v)_{n\in \N}$ must have at least

3337: one weak accumulation point.

3338:

3339: \noindent

3340: The following argument shows that this

3341: accumulation point is unique.

3342: Suppose we have two different accumulation points:

3343: {\em w}$\,$--$\lim_{k \to \infty}\A ^{n_k} v =u$, and

3344: {\em w}$\,$--$\lim_{\ell \to \infty}\A ^{{\tilde n}_\ell} v =\tilde{u}~$,

3345: with $u \neq {\tilde u}$.

3346:

3347: \noindent

3348: By Lemma \ref{FP3}, $u$ and $\tilde u$ must both lie in $\cal F$,

3349: and by Lemma \ref{FP4},

3350: the limits $\lim_{n \to \infty} \|\A ^n v -

3351: u\|$ and $\lim_{n \to \infty} \|\A ^n v -

3352: {\tilde u} \|$ both exist.

3353:

3354: \noindent

3355: Since $\tilde{u} \neq u$, we obtain from Lemma \ref{FP1} that

3356: $\lim\inf_{k \to \infty} \|\A ^{n_k} v - {\tilde u}\| $

3357: $ > {\lim\inf}_{k \to \infty} \|\A ^{n_k} v - u\|\ .$

3358: On the other hand,

3359: because $(\|\A ^{n_k} v-{\tilde u}\|)_{k\in \mathbb{N}}$ and

3360: $(\|\A ^{n_k} v-u\|)_{k\in \mathbb{N}}$

3361: are each a

3362: subsequence of a convergent sequence,

3363: $\lim\inf_{k \to \infty} \|\A ^{n_k} v - {\tilde u}\|$ =

3364: $\lim_{n \to \infty} \|\A ^n v - {\tilde u}\|$ and

3365: $\lim\inf_{k \to \infty} \|\A ^{n_k} v - u\|$ =

3366: $\lim_{n \to \infty} \|\A ^{n} v - u\|~$.

3367: It follows that

3368: $\lim_{n \to \infty} \|\A ^{n} v - {\tilde

3369: u}\|$ $> \lim_{n \to \infty} \|\A ^n v -u\| ~$.

3370: In a completely analogous way (working with the subsequence

3371: $\A ^{{\tilde n}_\ell} v$ instead of $\A ^{n_k} v$) one derives

3372: the opposite strict inequality. Since both cannot be valid

3373: simultaneously, the assumption of the existence of two different

3374: weak accumulation points for $(\A ^n v)_{n\in \N}$ is false.

3375:

3376: \noindent

3377: It thus follows that $\A ^n v$ converges weakly to this unique

3378: weak accumulation point.

3379: \hfill\QED

3380:

3381: \bigskip

3382:

3383: \begin{remark}

3384: {\rm It is essential to require that the set $\cal F$ is not empty since

3385: there are

3386: asymptotically regular, non-expansive maps that possess no fixed point.

3387: However, the only place where we used this assumption was in showing that the

3388: $\|\A ^n v \|$ were bounded. If one can prove this boundedness by

3389: some other means (e.g. by a variational principle as we did in the iterative

3390: algorithm), then we automatically have a weakly convergent subsequence

3391: $(\A ^{n_k} v)_{k\in \mathbb{N}}$, and thus, by Lemma

3392: \ref{FP3}, an element of $\cal F$.}

3393: \end{remark}
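
\begin{remark}
{\rm A simple illustration of the first statement in the preceding remark
(our own example, not taken from \cite{Opi67}) is the map
$\A :\, \R \rightarrow \R$, $\A x = \sqrt{x^2+1}$.
It is non-expansive, since $|\A'(x)| = |x|/\sqrt{x^2+1} < 1$, and it has no
fixed point, since $\sqrt{x^2+1} > |x| \geq x$ for all $x \in \R$.
Its iterates are explicit:
\begin{equation*}
\A ^n x = \sqrt{x^2+n}~, \qquad
|\A ^{n+1} x - \A ^n x| = \frac{1}{\sqrt{x^2+n+1}+\sqrt{x^2+n}}
\leq \frac{1}{2\sqrt{n}} \quad (n \geq 1)~.
\end{equation*}
Hence $\A$ is asymptotically regular, while $|\A ^n x| \rightarrow \infty$:
the iterates are unbounded, as they must be when $\cal F$ is empty.}
\end{remark}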

3394:

3395: \begin{remark}

3396: {\rm The simplification of the original argument of \cite{Opi67}

3397: (obtained through

3398: deriving the contradiction in the proof of Theorem \ref{FPThm}) avoids having

3399: to appeal to the convexity of $\cal F$ (which is true but not immediately

3400: obvious) and having to introduce the auxiliary sets $\cal F_\delta$ used in

3401: \cite{Opi67}.}

3402: \end{remark}

3403:

3404: \end{document}

3405:

3406: -----------------------

3407:

3408: \subsection*{1.4.1 Sparse wavelet expansions.}

3409:

3410: Wavelets provide orthonormal bases of $L^2(\R^d)$ with localization

3411: in space and in scale; this makes them more suitable than e.g.

3412: the Fourier representation for an efficient

3413: representation of functions that have space-varying smoothness properties.

3414: Appendix \ref{WavBes} gives a very succinct overview of wavelets and their

3415: link with

3416: a particular family of smoothness spaces, the Besov spaces. Essentially,

3417: the Besov space $B^s_{p,q}(\R^d)$ is a space of functions on

3418: $\R^d$ that ``have $s$ derivatives in $L^p(\R^d)$''; the

3419: index $q$ provides some extra fine-tuning. The

3420: precise definition involves the moduli of continuity of the function,

3421: defined by finite differencing, instead of derivatives, and combines

3422: the behavior of these moduli at different scales. The result is that

3423: functions that are mostly smooth, but have a few local

3424: ``irregularities'', nevertheless can still belong to a Besov space with

3425: high smoothness index. For instance, the 1-dimensional function

3426: $F(x)= \mbox{sign} (x) e^{-x^2}$ belongs to $B^s_{p,q}(\R)$ for

3427: arbitrarily large $s$, and all $p,q \in [1,\infty)$. (Note that

3428: this same example does not belong to any of the Sobolev spaces

3429: $W^s_p(\R)$ with $s>0$.) As reviewed in Appendix \ref{WavBes},

3430: wavelet expansions provide an equivalent norm for the Besov spaces,

3431: which is particularly simple in the case $p=q$, to which we shall restrict

3432: ourselves here. We denote this norm by $\VVert ~ \VVert_{ _{s,p}}$;

3433: it is defined by

3434: (see Appendix \ref{WavBes})

3435: \begin{equation}

3436: %\label{triple-simple}

3437: \VVert f \VVert_{ _{s,p}} = \left( \sum_{\lambda \in \Lambda} 2^{\sigma p

3438: |\lambda|} |\left<f, \Psi_{\lambda} \right> |^p \right) ^{1/p}~,

3439: \end{equation}

3440: where $\sigma$ depends on $s,p$ and is defined by $\sigma =s +d \left(

3441: \frac{1}{2}-\frac{1}{p} \right)$,

3442: and where $|\lambda|$ stands for the scale of the wavelet

3443: $\Psi_{\lambda}$. (The $\frac{1}{2}$ in the formula for $\sigma$ is

3444: due to the choice of normalization

3445: of the $\Psi_{\lambda}$, $\|\Psi_{\lambda}\|_{_{L^2}} =1$.)

3446:

3447:

3448: It follows that minimizing

3449: the variational functional for an inverse

3450: problem with a Besov space prior falls exactly within the category of

3451: problems studied in this paper: for such an inverse problem,

3452: with operator $K$ and with

3453: the a priori knowledge that the object lies in some $B^s_{p,p}$, it

3454: is natural to define the

3455: variational functional to be minimized

3456: by

3457: $$

3458: \Delta(f)+ \mu \VVert f \VVert _{ _{s,p}}^p = \|Kf-g\|^2 +\mu \sum_{\lambda \in

3459: \Lambda} 2^{\sigma p |\lambda|} |\left< f, \Psi_{\lambda} \right> |^p ~~,

3460: $$

3461: which is exactly of the type

3462: $\Phi_{\mathbf{w},p}(f)$, as defined in \eref{funct-gen}.

3463: For the case where $K$ is the identity operator {\bf and $\sigma =0$ ??}, it

3464: was  noted already in \cite{Cha98}

3465: that the wavelet-based algorithm for denoising

3466: of data with a Besov prior, derived earlier in \cite{Don94},

3467: amounts exactly to the minimization of

3468: $\Phi_{\mu\mathbf{w}_{ _0},1}(f)$, where

3469: $K$ is the identity operator and the $\vpg$-basis is a wavelet basis; the

3470: denoised approximant given in \cite{Don94} then coincides

3471:  with (\ref{simple},\ref{stau}) with $\tau=\mu$.

3472:

3473: It should be noted that if $d > 1$, and if we are interested in functions that

3474: are mostly smooth, with possible jump discontinuities

3475: (or other ``irregularities'')

3476: on smooth manifolds of dimension 1 or higher (i.e. not point irregularities),

3477: then the Besov spaces do not constitute the optimal setting. For instance, functions that are smooth on $[0,1]^2$, except on a finite set of smooth curves,

3478: belong to

3479: $B^1_{1,1}([0,1]^2)$, but not to $B^s_{1,1}([0,1]^2)$ for $s>1$.

3480: In order to obtain

3481: more efficient (sparser) expansions of this type of functions, other bases

3482: have to be used, such as ridgelets or curvelets.

3483: {\bf References!}

3484: One can then again use the approach in this paper, with respect to these

3485: more adapted bases.

3486:

3487: ********

3488:

3489: \section{Wavelets and Besov spaces}

3490: \label{WavBes}

3491:

3492: We give a brief review of basic definitions of wavelets and their

3493: connection with Besov spaces.

3494: This will be a sketch only; for details, we direct the reader to e.g.

3495: \cite{Mal98, Meyer, Lou97, Coh00, DeVore}.

3496:

3497: For simplicity we start with dimension 1.  Starting from a (very special)

3498: function

3499: $\psi$ we define \begin{equation*}\psi_{j,k}(x)= 2^{j/2}\ \psi(2^j x-k) ~, \qquad

3500: j,k \in \Z~,\end{equation*}

3501: and we assume that the collection $\{\psi_{j,k}; j,~k \in \Z\}$

3502: constitutes an orthonormal basis of $L^2(\mathbb R)$.

3503: For all wavelet bases used in practical applications, there also exists an

3504: associated

3505: {\em scaling function} $a$, which is orthogonal to its

3506: translates by integers, and such

3507: that, for all $j \in \Z$,

3508: \begin{equation}

3509: \label{MRA}

3510: \overline{\mbox{Span}\{a_{j,k}; k \in \Z\}}~\mbox{\small{$\bigoplus$}}

3511: ~\overline

3512: {\mbox{Span}\{\psi_{j,k}; k \in \Z\}}

3513: = \overline{\mbox{Span}\{a_{j+1,k}; k \in \Z\}} ~,

3514: \end{equation}

3515: where the $a_{j,k}$ are defined analogously to the $\psi_{j,k}$.

3516: Typically, the functions $a$ and $\psi$ are very well localized, in the sense

3517: that $\forall N \in \N$, $\int_{\R} (1+|x|)^N(|a(x)|+|\psi(x)|) dx <

3518: \infty$;

3519: one can even

3520: choose $a$ and $\psi$ such that they are supported on a finite interval.

3521: This can be

3522: achieved with arbitrary finite smoothness, i.e. for any preassigned $L \in

3523: \N$, one can

3524: find such $a$ and $\psi$ that are moreover in $C^L(\R)$. Because of

3525: \eref{MRA},

3526: one can consider (inhomogeneous) wavelet expansions, in which not all

3527: scales $j$

3528: are used,

3529: but a cut-off is introduced at some coarsest scale, often set at $j=0$.

3530: More precisely,

3531: we shall use the following wavelet expansion

3532: of $f

3533: \in L^2$,

3534: \begin{equation}

3535: \label{inhMRA}

3536:  f= \sum_{k=-\infty}^{+\infty}  \left<f,a_{0,k}\right>\ a_{0,k} +

3537: \sum_{j=0}^{+\infty}

3538: \sum_{k=-\infty}^{+\infty} \left<f,\psi_{j,k}\right>\ \psi_{j,k}~.

3539: \end{equation}

3540: Wavelet bases in

3541: higher dimensions can be built by taking appropriate products of

3542: one-dimensional

3543: wavelet and scaling functions. Such $d$-dimensional bases can be viewed as the

3544: result of translating (by elements $k$ of $\Z^d$) and

3545: dilating (by integer powers $j$ of 2) of

3546: not just one, but several

3547: (finite in number) ``mother wavelets'', typically numbered from

3548: 1 to $2^d-1$.

3549: It will be convenient to abbreviate the full label (including $j$,  $k$ and the

3550: number of the mother wavelet) to just $\lambda$, with the convention that

3551: $|\lambda|=j$.

3552: We shall again cut off at some coarsest scale, and we shall follow the

3553: convenient

3554: slight abuse

3555: of notation used in \cite{CohDahmDev} that sweeps up the coarsest-$j$

3556: scaling functions

3557: (as in \eref{inhMRA}) into the $\Psi_{\lambda}$ as well. We thus denote the

3558: complete

3559: $d$-dimensional, inhomogeneous wavelet basis by

3560: $\{\Psi_{\lambda}; \lambda \in \Lambda\}$.

3561:

3562: It turns out that $\{\Psi_{\lambda}; \lambda \in \Lambda\}$

3563: is not only an orthonormal basis

3564: for $L^2(\R^d)$, but also an unconditional basis for  a variety of other

3565: useful

3566: Banach spaces of functions, such

3567: as H\"older spaces, Sobolev spaces and, more generally, Besov spaces.

3568: Again, we review only some basic facts; a full study can be found in e.g.

3569: \cite{Meyer, Coh00,

3570: DeVore}.  The Besov spaces $B^s_{p,q}(\R^d)$

3571: consist, basically, of functions that ``have $s$ derivatives in $L^p$'';

3572: the parameter $q$ provides

3573: some additional fine-tuning to the definition of these spaces. The norm

3574: $\|f\|_{_{B^s_{p,q}}}$ in a Besov space $B^s_{p,q}$ is traditionally

3575: defined via the

3576: {\em modulus of continuity} of $f$ in $L^p(\R)$, of which an additional

3577: weighted $L^q$-norm

3578: is then taken, in which the integral is over different scales.

3579: We shall not give its details here; for our purposes it suffices that

3580: this traditional Besov norm is equivalent with a norm that can be computed from

3581: wavelet coefficients. More precisely, let us assume that the original

3582: 1-dimensional $a$ and

3583: $\psi$ are in $C^L(\R)$, with $L>s$, that

3584: $\sigma=s+d(\frac{1}{2}-\frac{1}{p}) \geq 0$,

3585: and define the norm $\VVert \cdot \VVert_{_{s;p,q}}$ by

3586: \begin{equation}

3587: \label{triple}

3588: \VVert f \VVert  _{_{s;p,q} }= \left( \sum_{j=0}^{\infty} \left(2^{j \sigma

3589: p} \sum_{

3590: \lambda \in \Lambda ,

3591: |\lambda |=j }|\left<f,\Psi_{\lambda}\right>|^p\right)^{q/p}\right)^{1/q} ~~.

3592: \end{equation}

3593: Then this norm is equivalent to the traditional Besov norm,

3594: $\VVert f \VVert _{_{s;p,q}}\sim\| f \|_{_{B^s_{p,q}}}$, that is, there exist

3595: strictly positive constants $A$ and $B$ such that

3596: \begin{equation}

3597: \label{Besnor}

3598: A \VVert f \VVert_{_{s;p,q}} \leq \| f \| _{_{B^s_{p,q}}}

3599: \leq B \VVert f \VVert_{_{s;p,q}} ~.

3600: \end{equation}

3601: The condition that $\sigma \geq 0$ is imposed to ensure

3602: that $B^s_{p,q}(\R^d)$ is a subspace

3603: of $L^2(\R^d)$; we shall restrict ourselves to this case in this paper.

3604: From \eref{triple} one can gauge the fine-tuning role played by the

3605: parameter $q$

3606: in the definition of the Besov spaces. A particularly convenient choice, to

3607: which we

3608: shall stick in the remainder of this paper, is $q=p$,

3609: for which the expression simplifies

3610: to

3611: \begin{equation}

3612: \label{triple-simple}

3613: \VVert f \VVert_{_{s,p}} = \left( \sum_{\lambda \in \Lambda} 2^{\sigma p

3614: |\lambda|} ~

3615: | \left< f, \Psi_{\lambda} \right> |^p \right)^{1/p} ~~;

3616: \end{equation}

3617: to alleviate notation, we shall drop the extra index $q$ wherever it

3618: normally occurs,

3619: on the understanding that $q=p$ when we do so.
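
As a quick check of the formula for $\sigma$ (a worked illustration under the
conventions above, not an additional result), consider three choices of
$(d,p)$, always with $q=p$:
\begin{equation*}
\begin{array}{lll}
p=2,\ \mbox{any } d: & \sigma = s~, &
\VVert f \VVert_{_{s,2}}^2 = \sum_{\lambda \in \Lambda} 2^{2s|\lambda|}\,
|\left< f, \Psi_{\lambda} \right>|^2~; \\[4pt]
p=1,\ d=1: & \sigma = s-\frac{1}{2}~, &
\VVert f \VVert_{_{s,1}} = \sum_{\lambda \in \Lambda} 2^{(s-1/2)|\lambda|}\,
|\left< f, \Psi_{\lambda} \right>|~; \\[4pt]
p=1,\ d=2: & \sigma = s-1~, &
\VVert f \VVert_{_{s,1}} = \sum_{\lambda \in \Lambda} 2^{(s-1)|\lambda|}\,
|\left< f, \Psi_{\lambda} \right>|~.
\end{array}
\end{equation*}
The first line is the familiar weighting for the $L^2$-based Sobolev space of
order $s$. In the last line, the choice $s=1$ gives $\sigma=0$, so that the
norm equivalent to that of $B^1_{1,1}(\R^2)$ is simply the unweighted
$\ell^1$-norm of the wavelet coefficients. In general, the requirement
$\sigma \geq 0$ reads $s \geq d\left(\frac{1}{p}-\frac{1}{2}\right)$.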

3620:

3621: Besov spaces allow more local variability in local smoothness than

3622: is typical for functions

3623: in  the usual H\"older

3624: or Sobolev spaces. For instance, a real function $f$ on $\R$ that is piecewise

3625: continuous, but for which each piece is locally in $C^s$, can be an element of

3626: $B^s_1(\R)$, despite the possibility of discontinuities at the

3627: transition from one piece to

3628: the next, provided some technical conditions are met on the number and

3629: size of the discontinuities, and on the decay at $\infty$ of $f$. Moreover,

3630: this is

3631: the case for any finite value of $s>0$.

3632: This observation makes the Besov spaces particularly suitable when we are

3633: interested

3634: in functions that have such locally variable smoothness characteristics.

3635: In higher dimensions, $d > 1$,  the suitability of the Besov spaces

3636: is influenced by the

3637: dimension of the manifolds on which singularities occur. If the

3638: singularities in the

3639: functions of interest are solely point singularities, then these

3640: functions can still lie

3641: in Besov spaces with high values for $s$, depending on their behavior away

3642: from the

3643: singularities. If, however, we are interested in $f$ that may have, e.g.

3644: discontinuities

3645: along manifolds of dimension higher than $0$, then Besov spaces are not

3646: optimal.

3647: For instance, if $f:\R^2 \rightarrow \R$ is piecewise $C^L$, with

3648: possible jumps across the

3649: boundaries of the smoothness domains, which are themselves smooth

3650: (say, $C^L$ again) curves,

3651: then $f$ cannot lie in a Besov space of smoothness larger than 1,

3652: regardless of the

3653: value of $L>1$. Nevertheless, even an index $s=1$ is a gain over what would

3654: have been

3655: obtained by a H\"older or $L^2$-based Sobolev characterization.

3656:

3657: Since wavelet bases allow particularly easy characterizations of Besov spaces,

3658: as shown by (\ref{triple}, \ref{Besnor}), it follows that they are

3659: also well suited (and in

3660: 1 dimension, optimally suited) for the  functions with local smoothness

3661: variability

3662: described above. Moreover, the smooth patches of these piecewise continuous

3663: functions

3664: will be well approximated by coarse scale wavelets, of which there are much

3665: fewer, while

3666: the discontinuities will be well captured by the much more localized

3667: finer scale wavelets,

3668: of which only select subfamilies will be needed, located exactly near

3669: the discontinuities.

3670:

3671: It follows that whenever we are faced with an inverse problem

3672: that needs regularization,

3673: in which the objects to be restored are expected to be mostly smooth,

3674: with very localized

3675: lower dimensional areas of singularities,  we have a ``natural''  framework

3676: in the

3677: family of Besov spaces, and we can expect that their expansions into wavelets

3678: will be sparse.

3679: Moreover, the $p$-th power of the Besov-equivalent norm $\VVert f

3680: \VVert_{_{s,p}}$

3681: is exactly of the form

3682: $\Phi_{\mathbf w,p}$ defined in \eref{triple-norm}.

3683:

3684: ********

3685:

3686: Recall that the iterative algorithm proceeds as follows:

3687: \begin{itemize}

3688: \item Choose $f^0$ arbitrarily.

3689: \item Define the $(n+1)^{\rm th}$ iterate as

3690: \begin{equation} f^{n+1}={\mathbb T}\ f^n\ .

3691: \end{equation}

3692: \end{itemize}

3693: where the mapping $\mathbb T$ is defined by

3694: \begin{equation} {\mathbb T}\ a ={\mathop{\rm arg\ min}_{f}}\ \Psi_{a}(f)\ .

3695: \label{defT}

3696: \end{equation}

3697: Using an expansion of $a$ on a basis of $L^2$ which diagonalizes the prior

3698: $\Omega(f)$, e.g. on the wavelet basis $\{\psi_\lambda\}$, the variational

3699: problem (\ref{defT}) can be solved through one-dimensional minimization

3700: problems, by using

3701: the minimizer $S(y)$

3702: of the following function of the real variable $x$

3703: \begin{equation}

3704: \varphi(x) = (x-y)^2 + 2\tau \vert x \vert^p \label{1dfunc}

3705: \end{equation}

3706: for a given $y \in {\mathbb R}$ and $\tau > 0$.

3707: The properties of $S(y)$ are investigated in Appendix B. When $p=1$, $S(y)$ has

3708: an explicit expression in terms of thresholding (with a threshold depending

3709: on $\mu$ and $\sigma$) whereas for $1 < p < 2$ it has to be determined

3710: numerically but also acts as a nonlinear shrinkage of the data.
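
For orientation, the two endpoint cases of \eref{1dfunc} can be worked out by
hand. This is only a sketch; the subscript $\tau$ on $S$ is notation introduced
here to display the dependence on $\tau$. For $p=1$, minimizing separately over
$x>0$, $x<0$ and $x=0$ gives soft-thresholding,
\begin{equation*}
S_{\tau}(y) = \begin{cases}
y - \tau & \mbox{if } y > \tau~, \\
0 & \mbox{if } |y| \leq \tau~, \\
y + \tau & \mbox{if } y < -\tau~,
\end{cases}
\end{equation*}
whereas for $p=2$ the function $\varphi$ is quadratic, and
$\varphi'(x) = 2(x-y) + 4 \tau x = 0$ gives the linear damping
\begin{equation*}
S_{\tau}(y) = \frac{y}{1+2\tau}~.
\end{equation*}
In both cases $|S_{\tau}(y)| \leq |y|$; only for $p=1$ are the small values
of $y$ mapped exactly to zero.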

3711: We define the corresponding shrinkage operator ${\S}$ as

3712: \begin{equation}

3713: {\S}\ a=\sum_\lambda S(a_\lambda) \ \psi_\lambda\ ;

3714: \ a_\lambda=(a,\psi_\lambda)\ . \label{Sbb}

3715: \end{equation}

3716: \begin{proposition} \label{mapT}

3717: The vector defined by

3718: \begin{equation}

3719: {\mathbb T}\; a= {\S}\; [a + (K^*g-K^*Ka)] \label{Ta}

3720: \end{equation}

3721: where the shrinkage operator is defined by (\ref{Sbb}), provides a

3722: minimizer for

3723: the  functional

3724: $\Psi_a(f)$, i.e.   we have

3725: \begin{equation}

3726: \Psi_a({\mathbb T}\; a +h) \geq \Psi_a({\mathbb T}\; a) + \| h \|^2\

3727: ,\quad

3728: \forall h \in L^2\ .\label{Parmin}

3729: \end{equation}

3730: \end{proposition}

3731: {\em Proof:}

3732: We notice that $\Psi_a$ can be rewritten

3733: as follows

3734: \begin{equation}

3735: \Psi_a(f) = \| f\|^2 - 2(f,c)+\mu \Vvert f \Vvert^p

3736: +\| g\|^2  +\| a\|^2 - \| Ka\|^2\ ,

3737: \end{equation}

3738: where $c$ is defined by (\ref{def c}), or else, using expansions on the basis

3739: $\{\psi_\lambda\}$,

3740: \begin{equation}

3741: \Psi_a(f) = \sum_\lambda [\vert f_\lambda - c_\lambda \vert^2 + \mu\

3742: 2^{|\lambda|\sigma p} \vert f_\lambda \vert^p] + {\rm  terms\

3743: independent\ of\ } f.

3744: \end{equation}

3745: We see that

3746: each term in the sum is minimized separately by the minimizer $S(c_\lambda)$ of

3747: the function (\ref{1dfunc}) for $x=f_\lambda$,  $y=c_\lambda$ and $2\tau =

3748: \mu\ 2^{|\lambda|\sigma p}$. The properties of this minimizer are

3749: studied in detail in Appendix B. The property (\ref{Parmin}) is an immediate

3750: consequence of Lemma \ref{1dparmin}.

3751: \hfill\QED\bigskip
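
The rewriting of $\Psi_a$ used in the proof can be verified by a direct
expansion. The following sketch assumes, since neither definition is restated
in this section, that
$\Psi_a(f) = \|Kf-g\|^2 + \mu \Vvert f \Vvert^p - \|Kf-Ka\|^2 + \|f-a\|^2$
and that $c = a + K^*g - K^*Ka$ (the argument of ${\S}$ in \eref{Ta}), with
real inner products:
\begin{eqnarray*}
\|Kf-g\|^2 - \|Kf-Ka\|^2 &=& -2(f, K^*g - K^*Ka) + \|g\|^2 - \|Ka\|^2~, \\
\|f-a\|^2 &=& \|f\|^2 - 2(f,a) + \|a\|^2~.
\end{eqnarray*}
Adding these two lines and the penalty term gives the first expression for
$\Psi_a(f)$. Completing the square,
$\|f\|^2 - 2(f,c) = \|f-c\|^2 - \|c\|^2$, then gives the coefficient-wise
form, with $\|c\|^2$ absorbed into the terms independent of $f$.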

3752:

3753: ---------------------

3754:

3755: \newline

3756: When the null-space $\mN(K)$ of $K$ is non-trivial, let us denote

3757: by ${\cal S}=\{f\ |\ Kf=g_o\}$ the set of exact solutions corresponding

3758: to the noise-free data $g_o$. Since ${\cal S}$ is a closed and convex set

3759: with respect to both the $L^2$ and Banach norm $\Vvert \cdot \Vvert$, it

3760: contains an element $f^{\maltese}$ which has minimal Banach norm:

3761: \begin{equation}

3762: f^{\maltese} = {\mathop{\rm arg\ min}_{f \in {\cal S}}}\ \Vvert f \Vvert\ .

3763: \label{defminBnorm}

3764: \end{equation} The uniqueness of such a minimizer is guaranteed when $p>1$,

3765: whereas

3766: there might be more than one element satisfying (\ref{defminBnorm}) when $p=1$.

3767: Notice that in the classical case where the Banach norm coincides with

3768: the $L^2$-norm,

3769: $f^{\maltese}$ is the usual generalized solution $f^\dagger$.

3770: When excluding the case where both $p=1$ and $\mN(K)\neq\{0\}$, we can now

3771: prove that the minimization of (\ref{phimu}) is indeed a regularization

3772: method, in the

3773: sense that the requirement (\ref{regprop}), with $f^{\maltese}$ replacing

3774: $f_o$ if $\mN(K)\neq\{0\}$, can be met by an appropriate choice of the

3775: regularization

3776: parameter. Let us remark that in the case where $\sigma$ is strictly positive,

3777: i.e. $s > \frac{d}{2p} (2-p)$, $B^s_{p,p}$ is compactly embedded in

3778: $L^2(\mathbb R^d)$, and regularization follows from general compactness

3779: results, as will be recalled and shown subsequently. However, we want first

3780:

3781:

3782: to prove such a result in the general setting which includes the case $\sigma

3783: = 0$, and

3784: therefore we need some more work to show that we have defined a

3785: regularization method.

3786:

3787:

3788: ___________

3789:

3790: RESERVE MATERIAL ON FRAMES

3791:

3792:

3793: \end{document}

3794:

3795:

3796: A simple example, the setting

3797: of which we borrow from several presentations by Donoho, is given by

3798: $\cH = \C^N$, in which we consider two different orthonormal bases

3799: $\{u_1, \cdots, u_N \}$ and $\{u_{N+1}, \cdots , u_{2N} \}$ defined by

3800: \begin{eqnarray*}

3801: (u_l)_{ _{k}} &=& \delta_{l,k}~~~~~~~~~~~~~~~~~~\mbox{if} ~ l \in \{1,

3802: \cdots,N\}

3803: \\

3804: (u_l)_{ _{k}} &=& \frac{1}{\sqrt{N}} e^{-2 \pi i (l-1)(k-1)/N} ~~~~~

3805: \mbox{if } l \in \{N+1, \cdots,2N\} ~;

3806: \end{eqnarray*}

3807: these are the ``pulse'' and ``FFT'' bases, respectively. Note that

3808: $|\left<u_l,u_k \right>| = \frac{1}{\sqrt{N}}$ if $|l-k| \geq N$, i.e.

3809: each of

3810: these basis vectors has a very non-sparse expansion in the other basis.

3811: Define a tight frame by setting $\psi_k= \frac{1}{2} u_k$, $k=1, \cdots,

3812: 2N$;

3813: one has then, for all $v \in \cH$,

3814: $$

3815: KK^*v = \frac{1}{4} \sum_{k=1}^{N} \left<v,u_k\right> u_k

3816: + \frac{1}{4} \sum_{k=N+1}^{2N} \left<v,u_k\right> u_k = \frac{1}{2}~ v

3817: ~.

3818: $$

3819: The Gram operator $K^*K$ has 4 blocks:

3820: \begin{eqnarray*}

3821: (K^*K)_{k,l} &=& \frac{1}{4} \delta_{k,l} ~~~~~~~~\mbox{if } 1 \leq k \leq

3822: N, ~

3823: 1 \leq l \leq N ~, \\

3824: && \frac{1}{4\sqrt{N}} e^{2 \pi i (k-1)(l-1)/N} ~~~~ \mbox{if }

3825: 1 \leq k \leq N, ~

3826: N+1 \leq l \leq 2N ~, \\

3827: && \frac{1}{4\sqrt{N}} e^{-2 \pi i (k-1)(l-1)/N} ~~~~ \mbox{if }

3828: N+1 \leq k \leq 2N, ~

3829: 1 \leq l \leq N ~, \\

3830: && \frac{1}{4} \delta_{k,l} ~~~~~~~~\mbox{if } N+1 \leq k \leq 2N, ~

3831: N+1 \leq l \leq 2N ~.

3832: \end{eqnarray*}

3833: Then $v=K \tilde{\mathbf{z}}$ for the very sparse sequence

3834: defined by $\tilde{z}_n=\delta_{n,1}+ \delta_{n,N+1}$,

3835: with $\|\tilde{\mathbf{z}}\|_{\ell^1} =2$,

3836: $\|\tilde{\mathbf{z}}\|_{\ell^2}=1$;

3837: the sequence of minimal $\ell^2$-norm

3838: $\mathbf{z}^{\dagger}=K^{\dagger}v$ is given by

3839: $(z^{\dagger})_n=\frac{1}{2} \delta_{n,1}+\frac{1}{2} \delta_{n,N+1}

3840: +\frac{1}{2\sqrt{N}}$, for which

3841: $\|z^{\dagger}\|_{\ell^1}=1 + \sqrt{N}$,

3842: $\|z^{\dagger}\|_{\ell^2}=1$.

3843: For $\mu= \frac{1}{2\sqrt{N}}$, the minimizer $\tilde{\mathbf{z}}_{\mu}$

3844:

3845: of \eref{frame-3}, given by a straightforward

3846: application of the algorithm in this paper, is

3847: defined by

3848: $$

3849: (\tilde{z}_{\mu})_n = \left( 1- \frac{2}{\sqrt{N}+1} \right)

3850: \delta_{n,1}

3851: +

3852: \left( 1- \frac{2}{\sqrt{N}+1} \right) \delta_{n,N+1} ~;

3853: $$

3854: it has $\|\tilde{\mathbf{z}}_{\mu}\|_{\ell^1}=2-\frac{4}{\sqrt{N}+1}$,

3855: and

3856: $\|K\tilde{\mathbf{z}}_{\mu}-v\|_{\ell^2}=\frac{1}{\sqrt{N}}$.

3857:

3858:

3859: \end{document}

3860:

3861:

3862: