1: \documentclass[10pt]{article}
2: \usepackage{epsfig}
3: \title{An iterative thresholding algorithm for linear inverse problems
4: with a sparsity constraint}
5: \author{Ingrid Daubechies \and Michel Defrise \and
6: Christine De Mol}
7: \usepackage{amssymb,amsmath,amsfonts}
8:
9: \newtheorem{theorem}{Theorem}[section]
10: \newtheorem{proposition}[theorem]{Proposition}
11: \newtheorem{lemma}[theorem]{Lemma}
12: \newtheorem{remark}[theorem]{Remark}
13: \newtheorem{corollary}[theorem]{Corollary}
14: \newtheorem{example}[theorem]{Example}
15:
16: \voffset=-3.5cm
17: \hoffset=-2.5cm
18: \textwidth=18cm
19: \textheight=24.5cm
20: %\voffset=-2cm
21: %\hoffset=-1.5cm
22: %\textwidth=16.5cm
23: %\textheight=22 cm
24:
25: \newcommand{\QED}{\hfill \raisebox{-2pt}{\rule{5.6pt}{8pt}\rule{4pt}{0pt}}%
26: \smallskip\par}
27:
28: \def\R{\mathbb{R}}
29: \def\C{\mathbb{C}}
30: \def\N{\mathbb{N}}
31: \def\Z{\mathbb{Z}}
32: \def\S{\mathbf{S}}
33: \def\T{\mathbf{T}}
34: \def\I{\sl {I}}
35: \def\A{\mathbf{A}}
36: \def\cB{\mathcal{B}}
37: \def\cH{\mathcal{H}}
38: \def\cV{\mathcal{V}}
39:
40: \def\mR{\mbox{R}}
41: \def\mN{\mbox{N}}
42: \def\K{K^*}
43: \def\wf{\widetilde{f}}
44: \def\fs{f^{\dagger}}
45: \def\OBJ{\mbox{\tiny\texttt{OBJECT}}}
46: \def\IM{\mbox{\tiny\texttt{IMAGE}}}
47: \def\SUR{\mbox{\tiny{\it{SUR}}}}
48: \def\fg{f_{\gamma}}
49: \def\ag{w_{\gamma}}
50: \def\vpg{\varphi_{\gamma}}
51: \def\g{\gamma}
52: \newcommand\eref[1]{(\ref{#1})}
53: \def\Vvert{\vert\!\vert\!\vert}
54: \def\VVert{[\!|\!|\!]}
55: \begin{document}
56: \maketitle
57:
58: \begin{abstract}
59:
60:
61: We consider linear inverse problems where the solution is assumed to have
a sparse expansion on an arbitrary pre-assigned orthonormal basis.
We prove that replacing the usual
quadratic regularizing penalties by weighted $\ell^p$-penalties
on the coefficients of such expansions, with $1 \leq p \leq 2$,
still regularizes the problem. If $p < 2$, regularized solutions of such
$\ell^p$-penalized problems will have sparser expansions, with respect to the
68: basis under consideration. To compute the corresponding regularized solutions
69: we propose an iterative algorithm which amounts to a Landweber iteration
70: with thresholding (or nonlinear shrinkage) applied at each iteration step.
71: We prove that this algorithm converges in norm. We
72: also review some potential applications of this method.
73:
74: \end{abstract}
75:
76: \section{Introduction}
77: \subsection{Linear inverse problems}
78: In many practical problems in the sciences and applied sciences, the features
79: of most interest cannot be observed directly, but have to be inferred
80: from other, observable quantities. In the simplest
81: approximation, which works surprisingly well in a wide range of cases
82: and often suffices,
83: there is a {\em linear} relationship between the features of interest
84: and the observed quantities. If we model the {\em object} (the traditional
85: shorthand
86: for the features of interest) by a function $f$, and the derived quantities or
87: {\em image} by another function $h$, we can cast the problem of inferring
88: $f$ from $h$
89: as a {\em linear inverse problem}, the task of which is to
90: solve the equation
91: \begin{equation*}
92: %\label{eqn1}
93: Kf=h~.
94: \end{equation*}
95: This equation and the task of solving it make sense only when placed in an
96: appropriate
97: framework.
98: In this paper, we shall assume that $f$ and $h$ belong to appropriate function
99: spaces, typically Banach or Hilbert spaces,
100: $f \in \cB_{\OBJ}$,
101: $h \in \cB_{\IM}$, and that $K$ is a bounded operator from
102: the space $\cB_{\OBJ}$ to
103: $ \cB_{\IM}$. The choice of the spaces must be
104: appropriate for describing real-life situations.
105:
106: The observations or {\em data}, which we shall model by yet another
107: function, $g$, are typically not
108: exactly equal to the image $h=Kf$, but rather to a distortion of $h$. This
109: distortion
110: is often modeled by an {\em additive noise} or {\em error term} $e$, i.e.
111: \begin{equation*}
112: %\label{eqn2}
113: g=h+e=Kf+e ~.
114: \end{equation*}
115: Moreover, one typically assumes that the ``size'' of the noise can be measured
116: by its $L^2$-norm, $\|e\|=\left( \int_{\Omega} |e|^2\right)^{1/2}$ if $e$
117: is a function on $\Omega$. (In a finite-dimensional situation, one uses
118: $\|e\|=
119: \left(\sum_{n=1}^N |e_n|^2 \right)^{1/2}$ instead.)
120: Our only ``handle'' on the image $h$ is thus via the observed $g$, and
121: we typically have little information on $e=g-h$ beyond an upper bound on
122: its
123: $L^2$-norm $\|e\|$.
124: (We have here implicitly placed ourselves
125: in the ``deterministic setting'' customary to the
126: Inverse Problems community. In the
127: stochastic setting more familiar to statisticians, one assumes instead
128: a bound on the variance
129: of the components of $e$.)
130: Therefore it is customary to take
131: $\cB_{\IM}=L^2(\Omega)$; even if the ``true images'' $h$ (i.e. the images
132: $Kf$ of the
133: possible objects $f$) lie in a much smaller space, we can only know them up
134: to some (hopefully) small $L^2$-distance.
135:
136: We shall consider in this paper
137: a whole family of possible choices for $\cB_{\OBJ}$, but we shall
138: always assume that these spaces are subspaces of a basic Hilbert space
139: $\cH$ (often
140: an $L^2$-space as well), and that $K$ is a bounded operator from $\cH$ to
141: $L^2(\Omega)$.
142: In many applications, $K$ is an integral operator with a kernel
143: representing the response of the imaging device; in the special case where
144: this linear device is
145: translation-invariant, $K$ reduces to a convolution operator.
146:
147: To find an estimate of $f$ from the observed $g$, one can minimize
148: the {\it discrepancy} $\Delta(f)$,
149: \begin{equation*}
150: \Delta(f)=\| Kf-g \|^2\ ;
151: %\label{res}
152: \end{equation*}
153: functions that minimize $\Delta(f)$ are called {\em pseudo-solutions}
154: of the inverse problem. If the operator $K$ has a trivial null-space,
155: i.e. if $\mN(K)=\{f \in \cH ; Kf=0\}=\{0\}$, there is a unique minimizer, given
156: by
157: $\widetilde{f}=(K^*K)^{-1}K^*g$, where $K^*$ is the adjoint operator. If
158: $\mN(K)\neq
159: \{0\}$ it is customary to choose, among the set of pseudo-solutions, the unique
160: element
161: $f^\dagger$ of minimal norm,
162: i.e. $f^\dagger = \mbox{arg-min}\{\|f\|; f \mbox{ minimizes }
163: \Delta(f)\}$.
164: This function belongs to $\mN(K)^\perp$ and is called the {\it generalized
165: solution}
166: of
167: the inverse problem. In this case the map $K^\dagger:
168: g \mapsto f^\dagger$
169: is called the {\it
170: generalized inverse } of $K$. Even when $K^*K$ is not invertible,
171: $K^\dagger g $ is
172: well-defined for all $g$ such that $K^*g \in \mR(K^*K)$.
173: However, the generalized inverse operator may be
174: unbounded (for so-called {\it ill-posed problems}) or have a very large norm
175: (for {\it ill-conditioned problems}). In such instances, it has to be replaced
176: by bounded approximants or approximants with smaller norm, so
177: that numerically stable solutions can be defined and used as meaningful
178: approximations of the true solution corresponding to the exact data.
179: This is the issue of {\it regularization}.
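
To illustrate the distinction with a minimal finite-dimensional sketch
(the matrix below is an assumption chosen only for illustration, not an
example taken from the applications discussed later), take $\cH=\R^2$ and,
for some $0<\delta<1$,
\begin{equation*}
K=\begin{pmatrix} 1 & 0 \\ 0 & \delta \end{pmatrix}~, \qquad
K^\dagger=(K^*K)^{-1}K^*=\begin{pmatrix} 1 & 0 \\ 0 & \delta^{-1} \end{pmatrix}~, \qquad
\|K^\dagger\|=\delta^{-1}~.
\end{equation*}
A data error $e=(0,\epsilon)$ then produces an error
$K^\dagger e=(0,\epsilon/\delta)$ in the reconstruction, which is much larger
than $\epsilon$ when $\delta \ll 1$: the problem is ill-conditioned.
Loosely speaking, an ill-posed problem corresponds to the
infinite-dimensional situation in which such amplification factors are
unbounded.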
180:
181: \subsection{Regularization by imposing additional quadratic constraints}
182:
183: The definition of a pseudo-solution (or even, if one considers equivalence
184: classes modulo $\mN(K)$, of a generalized solution) makes use of
185: the inverse of the operator $K^*K$; this inverse is well defined on
186: the range $\mR(K^*)$ of $\K $ when $K^*K$ is a strictly
187: positive operator, i.e. when its spectrum is bounded below away from zero.
188: When the spectrum of $K^*K$ is not bounded below by a strictly positive
189: constant, $\mR(K^*K)$ is not closed, and not all elements of $\mR(\K )$ lie
190: in $\mR(\K K)$. In this case there is no guarantee
191: that $K^*g
192: \in \mR(\K K)$; even if $K^*g$ belongs to $\mR(\K K)$, the unboundedness of
193: $(\K K)^{-1}$ can cause severe numerical instabilities unless
194: additional measures are taken.
195:
196: This blowup or these numerical instabilities are ``unphysical'', in the sense
197: that we know a priori that the true object would not have had a huge norm in
198: $\cH$, or other characteristics exhibited by the unconstrained ``solutions''.
199: A standard procedure to avoid these instabilities or to {\em regularize}
200: the inverse
201: problem is to modify the functional to be minimized, so that it
202: incorporates not
203: only the discrepancy, but also the a priori knowledge one may have about the
204: solution. For instance, if it is known that the object is of limited ``size''
205: in $\cH$, i.e. if
$\|f\|_{_{\cH}} \leq \rho$, then the functional to be minimized can be chosen
207: as
208: \begin{equation*}
209: \Delta(f) + \mu \|f\|_{_{\cH}}^2 = \|Kf-g\|_{L^2(\Omega)}^2 +\mu
210: \|f\|_{_{\cH}}^2
211: \end{equation*}
212: where $\mu$ is some positive constant called the {\it regularization
213: parameter}. The minimizer is given by
214: \begin{equation}
215: \label{tikh}
f_\mu=(\K K + \mu I)^{-1} K^* g ~,
217: \end{equation}
218: where $I$ denotes the identity operator.
219: The constant $\mu$ can then be chosen appropriately,
220: depending on the application.
221: If $K$ is a
222: compact operator, with singular value decomposition given by
223: $Kf =\sum_{k=1}^{\infty} \sigma_k \left<f,v_k\right> u_k ~$, where $(u_k)_{k
224: \in \N}$ and $(v_k)_{k \in \N}$ are the orthonormal bases of eigenvectors of
225: $KK^*$ and $\K K$, respectively, with corresponding eigenvalues $\sigma_k^2$,
226: then \eref{tikh} can be rewritten as
227: \begin{equation}
228: \label{tikh-b}
229: f_\mu= \sum_{k=1}^{\infty} \frac{\sigma_k}{\sigma_k^2 + \mu}
230: \left<g, u_k\right> v_k ~.
231: \end{equation}
232: This formula shows explicitly how this regularization method reduces the
233: importance
234: of the eigenmodes of $\K K$ with small eigenvalues, which otherwise (if
235: $\mu =0$)
236: lead to instabilities. If an estimate of the ``noise'' is known, i.e. if
237: we know a priori that $g=Kf+e$ with $\|e\| \leq \epsilon$, then one finds
238: from \eref{tikh-b} that
239: \begin{equation*}
240: %\label{svd}
241: \| f - f_\mu \| \leq \left\Vert \sum_{k=1}^{\infty}
242: \frac{ \mu \left< f,v_k\right> }{\sigma_k^2+\mu}\; v_k \right\Vert +
243: \left\Vert \sum_{k=1}^{\infty} \frac{\sigma_k}{\sigma_k^2+\mu}
244: \left< e,u_k\right> v_k \right\Vert
245: \leq \Gamma(\mu) + \frac{\epsilon}{\sqrt\mu} ~,
246: \end{equation*}
247: where $\Gamma(\mu) \rightarrow 0$ as $\mu \rightarrow 0$. This means that
248: $\mu$ can be chosen appropriately, in an $\epsilon$-dependent way,
so that the estimation error
$\|f - f_\mu \|$ converges to zero when $\epsilon$ (the estimate of
the noise level) shrinks to zero. This feature of the method, usually called
252: {\em stability}, is one that is required for any regularization method.
253: It is similar to requiring that a statistical estimator is consistent.
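
For completeness, here is a sketch of where the two terms in this bound
come from, assuming (as in the derivation of \eref{tikh-b}) that $K$ is
compact with all $\sigma_k>0$. Inserting $g=Kf+e$ in \eref{tikh-b} and
using $\left<Kf,u_k\right>=\sigma_k \left<f,v_k\right>$ gives
\begin{equation*}
f-f_\mu = \sum_{k=1}^{\infty} \frac{\mu \left<f,v_k\right>}{\sigma_k^2+\mu}\; v_k
- \sum_{k=1}^{\infty} \frac{\sigma_k}{\sigma_k^2+\mu}\left<e,u_k\right> v_k ~.
\end{equation*}
Since $\sigma_k^2+\mu \geq 2\sigma_k\sqrt{\mu}$, each factor
$\sigma_k/(\sigma_k^2+\mu)$ is at most $1/(2\sqrt{\mu})$, so that the second
sum has norm at most $\|e\|/(2\sqrt{\mu}) \leq \epsilon/\sqrt{\mu}$; the norm
of the first sum is what is bounded by $\Gamma(\mu)$. One admissible (though
not necessarily optimal) parameter choice is $\mu=\epsilon$, for which the
bound becomes $\Gamma(\epsilon)+\sqrt{\epsilon}$, which tends to zero
with $\epsilon$.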
254:
255: Note that the ``regularized estimate'' $f_\mu$ of \eref{tikh-b} is
256: linear
257: in $g$. This means that we have effectively defined a linear regularized
258: estimation operator that
259: is especially well adapted to the properties of the operator $K$; however,
260: it proceeds with a ``one method fits all'' strategy, independent of the data.
261: This may not
262: always be the best approach. For instance, if $\cH$ is an $L^2$-space
263: itself, and $K$
264: is an integral operator, the functions $u_k$ and $v_k$ are typically fairly
265: smooth;
266: if on the other hand the objects $f$ are
267: likely to have local singularities or discontinuities,
an approximation of type \eref{tikh-b}
(effectively limiting the estimates $f_\mu$
to expansions in the first $N$ singular vectors $v_k$, with $N$ determined by, say, $\sigma_k^2<
\mu/100$ for $k>N$) will of necessity be a smoothed version of $f$, without
sharp features.
273:
274: Other classical regularization methods with
275: quadratic constraints may use quadratic Sobolev
276: norms, involving a few derivatives, as the ``penalty'' term added to the
277: discrepancy. This introduces a penalization of the
278: highly oscillating components, which are often the
279: most sensitive to noise. This method is especially
280: easy to use in the case where $K$ is a
281: convolution operator, diagonal in the Fourier domain.
282: In this case the regularization
283: produces a smooth cut-off on the highest Fourier components,
284: independently of the data.
285: This works well
286: for recovering smooth objects which have their relevant structure contained in
287: the lower part of the spectrum and which have spectral content homogeneously
288: distributed across the space or time domain.
289: However, the Fourier domain is clearly not the appropriate representation
290: for expressing smoothness properties
of objects that are spatially inhomogeneous, with
varying ``local frequency'' content, or that present some
discontinuities, because
the frequency
295: cut-off implies that the resolution with which the fine details of the
296: solution can be stably retrieved is necessarily limited; it also
297: implies that the achievable
298: resolution is essentially the same at all points (see e.g. the book
299: \cite{Ber98}
300: for an extensive discussion of these topics).
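
As a sketch of this mechanism, under assumptions that are not spelled out
above (namely that $K$ is a convolution on $\R^d$ with kernel $k$, that the
Fourier transform is normalized so that $\widehat{Kf}=\hat{k}\,\hat{f}$ and
$\|f\|^2=\int |\hat{f}(\xi)|^2 d\xi$, and that the penalty is the Sobolev-type
term $\mu \int (1+|\xi|^2)^{s}\,|\hat{f}(\xi)|^2 d\xi$ for some $s \geq 0$),
the minimization decouples frequency by frequency, and the regularized
solution is
\begin{equation*}
\hat{f}_\mu(\xi) = \frac{\overline{\hat{k}(\xi)}\;\hat{g}(\xi)}
{|\hat{k}(\xi)|^2 + \mu\,(1+|\xi|^2)^{s}} ~.
\end{equation*}
The factor multiplying $\hat{g}(\xi)$ depends only on $\xi$, $k$ and $\mu$,
not on $g$: high frequencies are damped by the same amount everywhere in
space, which is the data-independent, spatially uniform cut-off
described above.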
301:
302: \subsection{Regularization by non-quadratic constraints that promote sparsity}
303:
304: The problems with the standard regularization methods described above
305: are well known and several approaches have been proposed for dealing
306: with them.
307: We propose in this paper a regularization method that, like the classical
308: methods just discussed, minimizes a functional obtained by adding a
309: penalization term
310: to the discrepancy; typically this penalization term will {\em not} be
311: quadratic,
312: but rather a weighted $\ell^p$-norm of the coefficients of $f$ with respect to
313: a particular orthonormal basis in $\cH$, with $1\leq p\leq 2$. More precisely,
314: given an orthonormal basis $\left(\vpg\right)_{\gamma\in \Gamma}$
315: of $\cH$, and given a sequence of strictly positive weights
316: ${\mathbf w}= (\ag )_{\gamma\in \Gamma}$,
317: we define the functional $\Phi_{\mathbf{w},p}$ by
318: \begin{equation}
319: \label{funct-gen}
320: \Phi_{\mathbf{w},p}(f)= \Delta(f)
321: +\sum_{\gamma \in \Gamma} \ag\; |\left<f,\vpg\right>|^p
322: = \|Kf-g\|^2 +\sum_{\gamma \in \Gamma} \ag |\left<f,\vpg\right>|^p ~.
323: \end{equation}
324:
325: For the special case $p=2$ and $\ag=\mu$ for all $\gamma \in \Gamma$
326: (we shall write this as $\mathbf{w}= \mu \mathbf{w}_{_0}$, where
327: $\mathbf{w}_{_0}$ is the sequence with all entries equal to 1),
328: this reduces
to the quadratic functional whose minimizer is \eref{tikh}. If we consider the family of functionals
330: $\Phi_{\mu\mathbf{w}_{_0},p}(f)$, keeping the weights
331: fixed at $\mu$, but decreasing $p$ from 2 to 1, we gradually
332: {\em increase} the penalization
333: on ``small'' coefficients (those with $|\left<f,\vpg\right>| <1$)
334: while simultaneously {\em decreasing} the
335: penalization on ``large coefficients'' (for which
336: $|\left<f,\vpg\right>|>1$). As far as the
337: penalization term is concerned, we are thus putting a lesser penalty on
338: functions $f$
339: with large but few components with respect to the basis
340: $\left(\vpg\right)_{\gamma\in \Gamma}$, and a higher penalty on
341: sums of many small components, when compared to the classical method of
342: \eref{tikh}.
343: This effect is the more pronounced the
344: smaller $p$ is. By taking $p <2$, and especially
345: for the limit value $p=1$,
346: the proposed minimization procedure thus promotes
347: sparsity of the expansion of $f$ with respect
348: to the $\vpg$.
349: (We shall not consider $p < 1$ here,
350: because then the functional
351: ceases to be convex.)
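
A small numerical illustration may help (the two coefficient sequences are
hypothetical, chosen only to make the arithmetic transparent). Compare a
sparse sequence $a=(2,0,0,0)$ with a spread-out sequence $b=(1,1,1,1)$, and
take $\ag=\mu$ for all $\g$. Then
\begin{equation*}
\mu\sum_{\g}|a_{\g}|^2 = \mu\sum_{\g}|b_{\g}|^2 = 4\mu~, \qquad
\mu\sum_{\g}|a_{\g}| = 2\mu ~<~ \mu\sum_{\g}|b_{\g}| = 4\mu~.
\end{equation*}
The quadratic penalty ($p=2$) does not distinguish between the two
sequences, whereas the $\ell^1$-penalty ($p=1$) charges the sparse sequence
only half as much. For intermediate $p$ the ratio of the two penalties is
$2^p/4$, which decreases from $1$ to $1/2$ as $p$ decreases from $2$ to $1$.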
352:
The bulk of this paper deals with algorithms to obtain minimizers $f^\star$
354: for the
355: functional
356: \eref{funct-gen}, for general operators $K$. In the special case where $K$
357: happens
to be diagonal in the $\vpg$-basis,
359: $K \vpg= \kappa_{\g} \vpg$,
360: the analysis is easy and straightforward.
361: Introducing the shorthand notation $\fg$ for $\left<f,\vpg\right>$ and
362: $g_{\gamma}$ for
363: $\left<g,\vpg\right>$, we have then
364: $$
365: \Phi_{\mathbf{w},p}(f)=
366: \sum_{\gamma \in \Gamma} \left[ |\kappa_{\g} \fg-g_{\gamma}|^2
367: + \ag |\fg|^p\right] ~.
368: $$
369: The minimization problem thus uncouples into a family of 1-dimensional
370: minimizations, and is
371: easily solved. Of particular interest is the almost trivial case where
372: (i) $K$ is the identity operator,
373: (ii) $\mathbf{w}=\mu \mathbf{w}_{_0}$ and (iii) $p=1$,
374: which corresponds to the practical situation where
375: the data $g$ are equal to a noisy version of $f$ itself, and we want to remove
376: the noise
377: (as much as possible), i.e. we wish to {\em denoise} $g$. In this case the
378: minimizing $f^\star$ is given by
379: \begin{equation}
380: \label{simple}
381: f^\star = \sum_{\gamma \in \Gamma} f^\star_{\g} \vpg
382: = \sum_{\gamma \in \Gamma} S_{\mu}(g_{\gamma}) \vpg~,
383: \end{equation}
384: where $S_{\mu}$ is the (nonlinear) thresholding operator from $\R$ to $\R$
385: defined by
386: \begin{equation}
387: \label{stau}
388: S_{\mu}(x)= \left\{
389: \begin{array}{ccl} x +\mu/2 ~&~ \mbox{if} ~& x \leq - \mu/2 \\
390: 0 ~&~ \mbox{if} ~& |x| < \mu/2
391: \\ x- \mu/2 ~&~ \mbox{if} ~& x \geq \mu/2.
392: \end{array} \right.
393: \end{equation}
394: (We shall revisit the derivation of \eref{simple} below. For simplicity,
395: we are assuming that all functions are real-valued. If the $\fg$ are complex,
396: a derivation similar
397: to that of \eref{simple} then leads to a complex thresholding operator, which
398: is defined as $S_{\mu}(r e^{i\theta})= S_{\mu}(r) e^{i\theta}$;
399: see Remark \ref{2-5} below.)
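
The one-dimensional computation behind \eref{stau} can be outlined as
follows (a brief sketch only, under the real-valued assumption made above).
For fixed $\g$ one minimizes $\phi(x)=(x-g_{\g})^2+\mu|x|$ over $x\in\R$.
For $x \neq 0$, stationarity requires
$2(x-g_{\g})+\mu\,\mbox{sign}(x)=0$, i.e.
$x=g_{\g}-\frac{\mu}{2}\,\mbox{sign}(x)$, which is consistent with the sign
of $x$ only if $|g_{\g}|>\mu/2$; otherwise the minimum of the convex function
$\phi$ is attained at the kink $x=0$. In both cases the minimizer is
$S_{\mu}(g_{\g})$. For instance, with $\mu=1$ (so that the threshold is
$\mu/2=0.5$),
\begin{equation*}
S_{1}(2)=1.5~, \qquad S_{1}(0.3)=0~, \qquad S_{1}(-0.8)=-0.3~.
\end{equation*}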
400:
401: In more general cases, especially when $K$ is not diagonal with respect
to the $\vpg$-basis, it is not as straightforward to minimize
403: \eref{funct-gen}.
404:
405: An approach that promotes sparsity with respect to a particular basis
406: makes sense only if we know that the objects $f$ that we want
407: to reconstruct do indeed have a sparse expansion with respect to this basis.
408: In the next subsection we list some situations in which this is the
409: case and to which the algorithm that we propose in this paper
410: could be applied.
411:
412: \subsection{Possible applications for sparsity-promoting constraints}
413: \subsection*{1.4.1 Sparse wavelet expansions}
414:
415: This is the application that was the primary motivation for this paper.
416: Wavelets provide orthonormal bases of $L^2(\R^d)$ with localization
417: in space and in scale; this makes them more suitable than e.g.
418: Fourier expansions for an efficient
419: representation of functions that have space-varying smoothness properties.
420: Appendix \ref{WavBes} gives a very succinct overview of wavelets and their
421: link
422: with
423: a particular family of smoothness spaces, the Besov spaces. Essentially,
424: the Besov space $B^s_{p,q}(\R^d)$ is a space of functions on
$\R^d$ that ``have $s$ derivatives in $L^p(\R^d)$''; the
426: index $q$ provides some extra fine-tuning. The
427: precise definition involves the moduli of continuity of the function,
428: defined by finite differencing, instead of derivatives, and combines
429: the behavior of these moduli at different scales.
430: The Besov space $B^s_{p,q}(\R^d)$ is well-defined as
431: a complete metric space even if the indices $p,~q \in (0,\infty)$ are
432: $<1$, although it is no longer a Banach space in this case.
433: Functions that are mostly smooth, but that have a few local
434: ``irregularities'', nevertheless can still belong to a Besov space with
435: high smoothness index. For instance, the 1-dimensional function
436: $F(x)= \mbox{sign} (x)\; e^{-x^2}$ can belong to $B^s_{p,q}(\R)$ for
437: arbitrarily large $s$, provided $0<p<\left(s+\frac{1}{2}\right)^{-1}$. (Note that
438: this same example does not belong to any of the Sobolev spaces
439: $W^s_p(\R)$ with $s>0$, mainly because these can be defined only for
440: $p\geq 1$.) Wavelets provide unconditional bases for the Besov spaces,
441: and one can express whether or not a function $f$ on $\R^d$ belongs
442: to a Besov space by a fairly simple and completely explicit requirement
443: on the absolute values of the wavelet coefficients of $f$.
This expression becomes particularly simple when $p=q$; as
reviewed in Appendix \ref{WavBes},
$f \in B^s_{p,p}(\R^d)$ if and only if
447: \begin{equation*}
448: %\label{triple-simple}
449: \VVert f \VVert_{ _{s,p}} = \left( \sum_{\lambda \in \Lambda} 2^{\sigma p
450: |\lambda|} |\left<f, \Psi_{\lambda} \right> |^p \right) ^{1/p} < \infty~,
451: \end{equation*}
452: where $\sigma$ depends on $s,p$ and is defined by $\sigma =s +d \left(
453: \frac{1}{2}-\frac{1}{p} \right)$,
454: and where $|\lambda |$ stands for the scale of the wavelet
455: $\Psi_{\lambda}$. (The $\frac{1}{2}$ in the formula for $\sigma$ is
456: due to the choice of normalization
457: of the $\Psi_{\lambda}$, $\|\Psi_{\lambda}\|_{_{L^2}} =1$.)
458: For $p=q\geq 1$, $\VVert ~ \VVert_{ _{s,p}}$ is an equivalent norm to the
459: standard Besov norm on $B^s_{p,q}(\R^d)$; we shall restrict ourselves to
460: this case in this paper.
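
Two instances of the formula for $\sigma$ may help fix ideas (the parameter
values are chosen here purely for illustration). For $p=1$ and $s=1$,
\begin{equation*}
d=1:~~ \sigma = 1+\left(\tfrac{1}{2}-1\right)=\tfrac{1}{2}~, \qquad \qquad
d=2:~~ \sigma = 1+2\left(\tfrac{1}{2}-1\right)=0~.
\end{equation*}
In the second case the weights $2^{\sigma p|\lambda|}$ are all equal to 1,
so that $\VVert f \VVert_{ _{1,1}}$ is simply the unweighted $\ell^1$-norm of
the wavelet coefficients of $f$. In the first case the coefficients at scale
$|\lambda|$ are weighted by $2^{|\lambda|/2}$; assuming, as is customary,
that larger $|\lambda|$ corresponds to finer scales, fine-scale coefficients
are then penalized more heavily.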
461:
462:
463: It follows that minimizing
464: the variational functional for an inverse
465: problem with a Besov space prior falls exactly within the category of
466: problems studied in this paper: for such an inverse problem,
467: with operator $K$ and with
468: the a priori knowledge that the object lies in some $B^s_{p,p}$, it
469: is natural to define the
470: variational functional to be minimized
471: by
472: $$
473: \Delta(f)+ \VVert f \VVert _{ _{s,p}}^p = \|Kf-g\|^2 + \sum_{\lambda \in
474: \Lambda}
475: 2^{\sigma p |\lambda|} |\left< f, \Psi_{\lambda} \right> |^p ~~,
476: $$
477: which is exactly of the type
478: $\Phi_{\mathbf{w},p}(f)$, as defined in \eref{funct-gen}.
479: For the case where $K$ is the identity operator, it was
480: noted already in \cite{Cha98}
481: that the wavelet-based algorithm for denoising
482: of data with a Besov prior, derived earlier in \cite{Don94},
483: amounts exactly to the minimization of
$\Phi_{\mu\mathbf{w}_{ _0},1}(f)$, with
the $\vpg$-basis taken to be a wavelet basis; the
486: denoised approximant given in \cite{Don94} then coincides
487: exactly with (\ref{simple}, \ref{stau}).
488:
489: It should be noted that if $d > 1$, and if we are interested in functions that
490: are mostly smooth, with possible jump discontinuities
491: (or other ``irregularities'')
492: on smooth manifolds of dimension 1 or higher (i.e. not point irregularities),
493: then the Besov spaces do not constitute the optimal smoothness
494: space hierarchy. For
495: $d=2$, for instance, functions $f$ that are $C^{\infty}$ on the
496: square $[0,1]^2$, except on a finite set of smooth curves, belong to
497: $B^1_{1,1}([0,1]^2)$, but not to $B^s_{1,1}([0,1]^2)$ for $s>1$.
498: In order to obtain
499: more efficient (sparser) expansions of this type of functions, other expansions
500: have to be used, using e.g. ridgelets or curvelets (\cite{Don00},
501: \cite{Can00}).
502: One can then again use the approach in this paper, with respect to these
503: more adapted bases.
504:
505:
506: \subsection*{1.4.2 Other orthogonal expansions}
507:
508: The framework of this paper applies to enforcing sparsity of the expansion of
the solution on any orthonormal basis. We provide here three examples that are
particularly relevant for applications, but the list is of course not exhaustive.
511:
512: The first example is the case where it is known a priori
513: that the object to be recovered is sparse in the Fourier domain, i.e. $f$
514: has only
a few nonzero Fourier components. It then makes sense to choose a standard
516: Fourier
517: basis for the $\vpg$, and to apply the algorithms explained later in this
518: paper.
519: (They would have to be adapted to deal with complex functions, but
520: this is easily done; see Remark \ref{2-5} below.)
521: In the case where $K$ is the identity operator,
522: this is a classical problem, sometimes referred to as ``tracking sinusoids
523: drowned
524: in noise'', for which many other algorithms have been
525: developed.
526:
527: For other applications, objects are naturally sparse in the original
528: (space or time) domain. Then our framework can be used again if we expand such
529: objects in a basis formed by the characteristic functions of pixels or voxels.
530: Once the inverse problem is discretized in pixel space, it is regularized by
penalizing the $\ell^p$-norm of the object with $1 \le p \le 2$. Possible
532: applications include the restoration of astronomical images with scattered
533: stars
534: on a flat background. Objects formed by a few spikes are also typical of some
535: inverse
536: problems arising in spectroscopy or in particle sizing. In medical
imaging, $\ell^p$-norm
538: penalization with $p$ larger than but close to $1$ has been used e.g.
539: for the restoration of tiny blood vessels \cite{Li02}.
540:
541: The third example refers to the case where $K$ is compact and the use of SVD
542: expansions is a viable computational approach, e.g. for the solution of
543: relatively small-scale problems or
544: for operators that can be diagonalized in an analytic way. As already stressed
545: above, the linear regularization methods as given e.g. by \eref{tikh-b}
546: have the drawback that the penalization or cut-off eliminates the components
547: corresponding to the
548: smallest singular values, independently of the type of data. In some
549: instances,
550: the most
551: significant coefficients of the object may not correspond to the largest
552: singular
553: values; it may then happen that the object possesses
significant coefficients beyond
555: the cut-off imposed by linear methods. In order to avoid the elimination of
556: such
557: coefficients, it is preferable to use instead a
558: nonlinear regularization analogous to (\ref{simple}, \ref{stau}), with basis
559: functions $\vpg$ replaced
560: by the singular vectors $v_k$.
561: The theorems in this paper show that the {\it thresholded SVD expansion}
562: $$
f^\star = \sum_{k=1}^{+\infty} S_{\mu/\sigma_k^2}
564: \left(\frac{\left< g,u_k\right>}{\sigma_k}
565: \right) v_k
566: ~=~ \sum_{k=1}^{+\infty}\frac{1}{\sigma_k^2} ~ S_{\mu}
567: \left( \sigma_k \left< g,u_k\right>
568: \right) v_k~,
569: $$
570: which is the minimizer of the functional \eref{funct-gen} with
571: $\mathbf{w}=\mu \mathbf{w}_{_0}$ and $p=1$,
572: provides a regularized solution that is better adapted to these
573: situations.
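
The equality of the two expressions, and the fact that they give the
minimizer, can be checked by a short computation (sketched here for real
coefficients, and assuming $\sigma_k>0$ for all $k$). Writing
$f_k=\left<f,v_k\right>$ and $g_k=\left<g,u_k\right>$ (shorthand introduced
only for this sketch), and using that the $u_k$ form an orthonormal basis,
the functional decouples as
\begin{equation*}
\sum_{k=1}^{\infty}\left[ (\sigma_k f_k - g_k)^2 + \mu\, |f_k| \right]
= \sum_{k=1}^{\infty} \sigma_k^2 \left[ \left(f_k - \frac{g_k}{\sigma_k}\right)^2
+ \frac{\mu}{\sigma_k^2}\,|f_k| \right]~.
\end{equation*}
By the scalar result (\ref{simple}, \ref{stau}), each term is minimized by
$f_k=S_{\mu/\sigma_k^2}(g_k/\sigma_k)$. The second expression then follows
from the scaling identity $S_{c\mu}(cx)=c\,S_{\mu}(x)$, valid for $c>0$,
applied with $c=\sigma_k^{-2}$ and $x=\sigma_k g_k$.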
574:
575: \subsection*{1.4.3 Frame expansions}
576:
577: In a Hilbert space $\cH$, a frame $\{\psi_n\}_{n \in \N}$ is a
578: set of vectors for which there exist
579: constants $A, B >0$ so that, for all $v \in \cH$,
580: $$
581: B^{-1} \sum_{n\in \N} |\left<v,\psi_n\right>|^2 \leq \|v\|^2 \leq
582: A^{-1} \sum_{n \in \N}
|\left< v, \psi_n\right>|^2 ~~.
584: $$
585: Frames always span the whole space $\cH$, but the frame vectors $\psi_n$
586: are
587: typically not linearly independent. Frames were first proposed
588: by Duffin and Schaeffer in \cite{DuSh52}; they are now used in a wide
589: range of
590: applications.
591: For particular choices of the frame vectors, the two frame bounds $A$
592: and
593: $B$ are equal; one has then, for all $ v \in \cH$,
594: \begin{equation}
595: \label{frame-1}
596: v = A^{-1} \sum_{n \in \N} \left<v, \psi_n \right> \psi_n ~.
597: \end{equation}
598: In this case, the frame is called {\em tight}.
599: An easy example of a frame is given by taking the union of two (or more)
600: different
601: orthonormal bases in $\cH$; these unions always constitute tight frames,
602: with $A=B$ equal to the number of orthonormal bases used in the union.
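
To verify this in the simplest case (a sketch for the union of two
orthonormal bases $(e_n)_{n\in\N}$ and $(e'_n)_{n\in\N}$ of $\cH$, a notation
introduced only for this illustration), apply Parseval's identity and the
basis expansion to each of the two bases:
\begin{equation*}
\sum_{n\in\N}|\left<v,e_n\right>|^2+\sum_{n\in\N}|\left<v,e'_n\right>|^2
= 2\|v\|^2~, \qquad
\sum_{n\in\N}\left<v,e_n\right>e_n+\sum_{n\in\N}\left<v,e'_n\right>e'_n = 2v~.
\end{equation*}
Both inequalities in the frame definition thus hold with equality for
$A=B=2$, and \eref{frame-1} holds with $A^{-1}=1/2$.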
603:
604: Frames are typically ``overcomplete'', i.e. they still span all of
605: $\cH$ even if some frame vectors are removed. It follows that, given
606: a vector $v$ in $\cH$, one can find many different sequences of
607: coefficients
608: such that
609: \begin{equation}
610: \label{frame-v}
611: v = \sum_{n \in \N} z_n \psi_n ~~.
612: \end{equation}
613: Among these sequences, some have special properties for which they
614: are preferred. There is, for instance,
615: a standard procedure to find the unique sequence
616: with minimal $\ell^2$-norm; if the frame is tight, then this
617: sequence is given by $z_n=A^{-1} \left<v,\psi_n\right>$,
618: as in \eref{frame-1}.
619:
620: The problem of finding sequences
621: $\mathbf{z}=(z_n)_{n \in \N}$ that satisfy \eref{frame-v} can be
622: considered as an inverse problem. Let us define the operator $K$
623: from $\ell^2 (\N)$ to $\cH$ that maps a sequence
624: $\mbox{{\bf z}}= (z_n)_{n \in \N}$ to the element $K \mbox{{\bf z}}$ of $\cH$
625: by
626: \begin{equation*}
627: %\label{frame-2}
628: K \mbox{{\bf z}} = \sum_{n \in \N} z_n \psi_n ~.
629: \end{equation*}
630: Then solving \eref{frame-v} amounts to solving $K\mathbf{z}=v$. Note
631: that
632: this operator $K$ is associated with, but not identical to
633: what is often called the ``frame operator''. One has, for
634: $v \in \cH$,
635: $$
636: K K^* v = \sum_{n \in \N} \left<v, \psi_n \right> \psi_n~ ;
637: $$
638: for $\mathbf{z} \in \ell^2$, the sequence $K^*K\mathbf{z}$ is given by
639: $$
640: (K^*K \mathbf{z})_k = \sum_{l \in \N} z_l \left< \psi_l, \psi_k \right> ~.
641: $$
642: In this framework, the sequence $\mathbf{z}$ of minimum
643: $\ell^2$-norm that satisfies \eref{frame-v} is given simply by
644: $\mathbf{z}^{\dagger}= K^{\dagger}v$. The standard procedure in frame
645: lore
646: for the construction of $\mathbf{z}^{\dagger}$ can be rewritten as
647: $\mathbf{z}^{\dagger}=K^*(KK^*)^{-1}v$, so that
648: $K^{\dagger}=K^*(KK^*)^{-1}$
649: in this case. This last formula holds because this inverse problem is
650: in fact well-posed: even though $\mN(K) \neq \{0\}$, there is a gap
651: in the spectrum of $K^*K$ between the eigenvalue 0 and the remainder
652: of the spectrum, which is contained in the interval $[A,B]$; the
653: operator
654: $KK^*$ has its spectrum completely within $[A,B]$. In practice, one
655: always
656: works with frames for which the ratio $B/A$ is reasonably close to 1, so
657: that the problem is not only well-posed but also well-conditioned.
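
As a consistency check (a short sketch using only the formulas above),
consider a tight frame. Then \eref{frame-1} says that $KK^* = A\,I$ on $\cH$,
so that
\begin{equation*}
\mathbf{z}^{\dagger}=K^*(KK^*)^{-1}v = A^{-1}K^*v~, \qquad \mbox{i.e.} \qquad
z^{\dagger}_n = A^{-1}\left<v,\psi_n\right>~,
\end{equation*}
which is the minimal $\ell^2$-norm sequence quoted after \eref{frame-v}.
Here we used that $K^*v=(\left<v,\psi_n\right>)_{n\in\N}$, as can be read
off from the expression for $KK^*v$.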
658:
659: It is often of interest, however,
660: to find sequences
661: that are sparser than $\mathbf{z}^{\dagger}$.
662: For instance, one may know a priori that $v$ is a ``noisy'' version
663: of a linear combination of $\psi_n$ with a coefficient sequence of small
664: $\ell^1$-norm. In this case, it makes sense to determine a sequence
665: $\mathbf{z}_{\mu}$ that minimizes
666: \begin{equation}
667: \label{frame-3}
668: \|K\mathbf{z}-v\|^2_{\cH} + \mu \|{\mathbf{z}}\|_{\ell^1} ~,
669: \end{equation}
670: a problem that
671: falls exactly in the category of problems described in subsection 1.3.
672: Note that although the inverse problem for $K$ from $\ell^2(\N)$ to $\cH$
is well-posed, this need not be the case with the restriction
674: $K \big|_{\ell^1}$ from $\ell^1(\N)$ to $\cH$. One can indeed find
675: tight frames for which
676: $\sup \{ \|\mathbf{z}\|_{_{\ell^1}}~;~ \mathbf{z} \in \ell^1 \mbox{ and }
677: \|K\mathbf{z}\| \leq 1 \} = \infty$,
678: so that for arbitrarily large $R$ and arbitrarily small $\epsilon$,
679: one can find $\tilde{v} \in \cH$, $\tilde{\mathbf{z}} \in \ell^1$
680: with $\|\tilde{v}-K\tilde{\mathbf{z}}\| = \epsilon$, yet
681: $\inf \{ \|\mathbf{z}\|_{_{\ell^1}} ~; ~ \|\tilde{v}-K{\mathbf{z}} \|
682: \leq \epsilon/2 \} \geq R \|\tilde{\mathbf{z}}\|_{\ell^1}$.
683: In a noisy situation, it therefore may not make sense to search for the
684: sequence
685: with minimal $\ell^1$--norm that is ``closest'' to $v$; to find
686: an estimate of the $\ell^1$--sequences of which a given $v$ is known
687: to be a small perturbation, a better strategy is to compute the minimizer
$\mathbf{z}_{\mu}$ of \eref{frame-3}.
689:
690: Minimizing the functional \eref{frame-3} as an approach to obtain
691: sequences that provide sparse approximations $K\mathbf{z}$
692: to $v$ was
693: proposed and applied to various problems by Chen, Donoho and Saunders
694: \cite{CDS01};
695: in the statistical literature, least-squares regression with
696: $\ell^1$-penalty is known as the ``lasso'' \cite{Tib96}.
The algorithm in this paper thus provides an alternative to the linear and
quadratic programming techniques used for these problems,
all of which amount to minimizing \eref{frame-3}.
700:
701: % \newpage
702:
703: \subsection{A summary of our approach}
704:
705: Given an operator $K$ from $\cH$ to itself (or, more generally, from
706: $\cH$ to $\cH'$), and an orthonormal basis
$(\vpg)_{\gamma \in \Gamma}$, our goal is to find minimizers
$f^\star$ of the functionals $\Phi_{\mathbf{w},p}$ defined in section
1.3. The corresponding variational equations are
710: \begin{equation}
711: \label{variational}
712: \forall \gamma \in \Gamma ~ : ~
713: \left< \K Kf, \vpg \right> - \left< \K g , \vpg \right>
714: + \frac{ \ag p}{2}~ | \left< f, \vpg\right> |^{p-1}
715: \mbox{sign}(\left< f, \vpg\right>) = 0 ~~.
716: \end{equation}
When $p \neq 2$ and $K$ is not diagonal in the
$\vpg$-basis, this gives a coupled system of
nonlinear equations for the $\left<f, \vpg \right>$,
and it is not immediately clear how to solve such a system.
721: To bypass this problem, we
722: shall use a sequence of ``surrogate'' functionals that are each
723: easy to minimize,
724: and for which we expect, by a heuristic argument, that the successive
725: minimizers have our desired $f^\star$ as a limit.
726: These surrogate
727: functionals are introduced in section 2 below. In section 3 we then show that
728: their successive minimizers do indeed converge to $f^\star$; we first
729: establish weak convergence, but conclude the section by proving that the
730: convergence also holds in norm. In section 4 we show that our proposed
731: iterative method is {\em stable}, in the sense given in subsection 1.2: if we
732: apply the algorithm to data that are a small perturbation of a ``true image''
733: $K f_o$,
734: then the algorithm will produce $f^\star$ that converge
735: to $f_o$ as the norm of the perturbation tends to zero.
736:
737: \subsection{Related work}
738:
739:
% version revised by Ingrid
741: Exploiting the sparsity of the expansion on a given basis
742: of an unknown signal, in order to assist in the estimation
743: or approximation of the signal from noisy data, is not a new
744: idea. The key role played by sparsity to achieve superresolution
745: in diffraction-limited imaging was already emphasized by Donoho
746: \cite{Don92} more than a decade ago. Since the seminal paper by Donoho
747: and Johnstone \cite{Don94}, the use of thresholding techniques for
748: sparsifying the wavelet expansions of noisy signals in order to remove the
749: noise (the so-called ``denoising'' problem) has been abundantly discussed in
750: the literature, mainly in a statistical framework (see e.g.
751: the book \cite{Mal98}).
752: Of particular importance for the background of this paper is the article by
753: Chambolle et al. \cite{Cha98}, which provides a variational formulation
754: for denoising, through the use of penalties on a Besov-norm of
755: the signal; this is the perspective adopted in the present paper.
756:
757: Several attempts have been made to generalize the denoising framework
758: to solve inverse problems. To overcome the coupling problem stated in the
759: preceding subsection, a first approach is to construct wavelet- or
760: ``wavelet-inspired'' bases that are in some sense adapted to the operator to
761: be inverted. The so-called Wavelet-Vaguelette
762: Decomposition (WVD) proposed by Donoho \cite{Don95}, as well as the twin
763: Vaguelette-Wavelet Decomposition method \cite{Abr98}, and also the
764: deconvolution in mirror wavelet bases \cite{KMR03, Mal98} can all be
765: viewed as examples of this strategy. For the inversion of the Radon transform,
766: Lee and Lucier \cite{Lee01} formulated a
generalization of the WVD that uses a variational
768: approach to set thresholding levels. A drawback
769: of these methods is that they are limited to special types of operators $K$
770: (essentially convolution-type operators under some additional
771: technical assumptions).
772:
773: Other papers have explored the application of Galerkin-type methods to inverse
774: problems, using an appropriate but fixed wavelet basis \cite{Dic96, Lou97,
775: Coh02}. The underlying intuition is again that if the operator lends itself
776: to a
777: fairly sparse representation in wavelets, e.g. if it is an operator of the
778: type
779: considered in \cite{Bey91}, and if the object is mostly smooth with some
780: singularities, then the inversion of the truncated operator will not be too
781: onerous, and the approximate representation of the object will do a good
782: job of
783: capturing the singularities. In \cite{Coh02}, the method is made
784: adaptive, so that
785: the finer-scale wavelets are used where lower scales indicate the presence of
786: singularities.
787:
788: The mathematical framework in this paper has the advantage of not
789: pre-supposing any
790: particular properties for the operator $K$ (other than boundedness) or the
791: basis
792: $(\vpg)_{\gamma\in \Gamma}$ (other than its orthonormality). We prove,
793: in complete
794: generality, that generalizing Tikhonov's regularization method from the
795: $\ell^2$-penalty case to a $\ell^1$-penalty (or, more generally, a weighted
796: $\ell^p$-penalty with $1\leq p\leq 2$) provides a proper regularization
797: method for
ill-posed problems in a Hilbert space $\cH$, with estimates that are
independent
of the dimension of $\cH$ (and are thus valid for infinite-dimensional
separable $\cH$). To our knowledge, this is the first proof of this fact.
802: Moreover, we derive a Landweber-type iterative algorithm that involves a
803: denoising procedure at each iteration step and provides a sequence of
804: approximations
805: converging in norm to the variational minimizer, with estimates of the rate of
806: convergence in particular cases. This algorithm was derived previously
807: in \cite{DeM02}, using,
808: as in this paper, a construction based on ``surrogate
809: functionals''. During the final editing of the present paper, our attention
810: was
811: drawn to the independent work by Figueiredo and Nowak \cite{Fig03, Nov01},
812: who, working in the different (finite-dimensional
813: and stochastic) framework of Maximum Penalized
814: Likelihood Estimation
815: for inverting a
816: convolution operator acting on objects that are sparse in the wavelet domain,
817: derive essentially the same iterative algorithm as in \cite{DeM02} and
818: this paper.
819:
820:
821: \section{An iterative algorithm through
822: surrogate functionals}
823: It is the combined presence of $\K Kf$ (which couples all the equations)
824: and the nonlinearity of the equations that makes the system
825: \eref{variational} unpleasant. For this reason, we borrow a technique
826: of optimization transfer (see e.g. \cite{Lan00} and \cite{DeP95}) and
827: construct surrogate functionals that effectively remove the term $\K Kf$. We
828: first pick a constant $C$ so that $ \|\K K \| < C$, and then
829: we define the functional $\Xi(f;a)=
830: C\|f-a\|^2-\|Kf-Ka\|^2$ which depends on an auxiliary element $a$ of
831: $\cH$.
832: Because $C\mbox{\I} - \K K$ is a strictly positive operator, $\Xi(f;a)$ is
833: strictly convex in $f$ for any choice of $a$. If $\|K\|<1$,
834: we are allowed to set $C=1$; for simplicity, we will restrict ourselves
835: to this case, without loss of generality since
836: $K$ can always be renormalized.
837: We then add $\Xi(f ; a)$ to $\Phi_{\mathbf{w},p}(f)$ to form the following
838: ``surrogate functional''
839: \begin{eqnarray}
840: \label{sur}
841: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)&=& \Phi_{\mathbf{w},p}(f)
842: - \|Kf-Ka\|^2 + \|f-a\|^2 \nonumber \\
843: &=& \|Kf-g\|^2 + \sum_{\gamma \in \Gamma} \ag |\left< f,
844: \vpg \right>|^p - \|Kf-Ka\|^2 + \|f-a\|^2 \nonumber \\
845: &=& \|f\|^2 - 2\left<f, a+\K g - \K K a \right> + \sum_{\gamma}
846: \ag | \left< f, \vpg \right>|^p + \|g\|^2 + \|a\|^2 - \| K a
847: \|^2\nonumber \\ &=& \sum_{\gamma} \left[ \fg^2 -2 \fg
848: \left(a +\K g - \K K a \right)_{\gamma} + \ag |\fg|^p
849: \right] + \|g\|^2 + \|a\|^2 - \| K a \|^2
850: \end{eqnarray}
where we have again used the shorthand $v_{\gamma}$ for
$\left<v,\vpg \right>$, and implicitly assumed
853: that we are dealing with real functions only.
854: Since $\Xi(f;a)$ is strictly convex in $f$,
855: ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)$ is also strictly convex
in $f$, and has a unique minimizer for any choice of $a$. The advantage of
minimizing \eref{sur}, rather than solving \eref{variational} directly, is that
the variational equations for the $\fg$ decouple. We can then try to approach the minimizer
859: of $\Phi_{\mathbf{w},p}(f)$ by an iterative process which goes as follows:
860: starting from an arbitrarily chosen $f^0$, we determine the minimizer
861: $f^1$ of \eref{sur} for $a = f^0$; each successive iterate $f^n$ is then
862: the minimizer for $f$ of
863: the surrogate functional \eref{sur} anchored at the previous iterate, i.e. for
864: $a= f^{n-1}$. The iterative algorithm thus goes as follows
865: \begin{equation}
866: \label{iter}
f^0 \mbox{ arbitrary}\ ; \quad f^n= \mbox{{\rm arg--min}}
\left({\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; f^{n-1})\right)\ , \quad n=1, 2,\dots
869: \end{equation}
870: To gain some insight into this iteration,
871: let us first focus on two special cases.
872:
873: In the case where $\mathbf{w}=\mathbf{0}$ (i.e. the functional
874: $\Phi_{\mathbf{w},p}$ reduces to the discrepancy only), one needs to
875: minimize
876: $$
877: {\Phi}^{^{\SUR}}_{\mathbf{0},p}(f ; f^{n-1})=\|f\|^2-2\left<f,
878: f^{n-1} + \K (g - K f^{n-1}) \right> +\|g\|^2 + \|f^{n-1}\|^2 - \|Kf^{n-1}\|^2
879: ~;
880: $$
881: this leads to
882: \begin{equation*}
883: %\label{BPRes}
884: f^{n} = f^{n-1} + \K (g - Kf^{n-1})~~.
885: \end{equation*}
886: This is
887: nothing else than the so-called Landweber iterative method, the convergence of
888: which to the (generalized) solution of
889: $Kf=g$ is well-known (\cite{Lad51}; see also \cite{Ber98},
890: \cite{Eng96}).
891:
892: In the case where $\mathbf{w}=\mu \mathbf{w_0}$ and $p=2$,
893: the $n$-th surrogate functional reduces to
894: $$
895: {\Phi}^{^{\SUR}}_{\mathbf{w},2}(f ; f^{n-1})=(1+\mu)\;
896: \|f\|^2-2\left<f,f^{n-1} + \K (g - K f^{n-1}) \right> +\|g\|^2 +
897: \|f^{n-1}\|^2 -
898: \|Kf^{n-1}\|^2 ~;
899: $$
900: the minimizer is now
901: \begin{equation}
902: \label{DamLan}
903: f^n = \frac{1}{1+\mu} \left[ f^{n-1} +\K (g -K f^{n-1}) \right] ~,
904: \end{equation}
905: i.e. we obtain a damped or regularized Landweber iteration
(see e.g. \cite{Ber98}). The convergence of the functions $f^n$ defined by
907: (\ref{DamLan}) follows immediately from the estimate
908: $\|f^{n+1}-f^n\| = (1+\mu)^{-1} \|(\mbox{\I}-\K K)(f^n-f^{n-1})\|
909: \leq (1+\mu)^{-1} \|f^n -f^{n-1}\|$, showing that we have a contractive
910: mapping,
911: even if $\mN(K) \neq \{0\}$.
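
As a consistency check (added here for illustration, and assuming, as the
factor $(1+\mu)$ above indicates, that $\mathbf{w_0}$ is the sequence with all
entries equal to $1$), one can identify the limit of (\ref{DamLan}). A fixed
point $f$ of the iteration satisfies
$$
(1+\mu) f = f + \K (g - Kf) \quad \Longleftrightarrow \quad
(\K K + \mu\, \mbox{\I}) f = \K g ~,
$$
so that $f = (\K K + \mu\, \mbox{\I})^{-1} \K g$, which is the familiar
Tikhonov-regularized solution, i.e. the minimizer of
$\|Kf-g\|^2 + \mu \|f\|^2$. Iterating the contraction estimate moreover gives
$\|f^{n+1}-f^n\| \leq (1+\mu)^{-n} \|f^1-f^0\|$, so that the convergence is
geometric in this case.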
912:
913: In these two special cases we thus find that the $f^n$ converge as
914: $n \rightarrow \infty$. This permits one to hope that the $f^n$ will
915: converge for general $\mathbf{w}, p$ as well; whenever this is the
916: case the difference
917: $\|f^{n}-f^{n-1}\|^2 - \|K(f^{n}-f^{n-1})\|^2$ between
918: ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^n ; f^{n-1})$ and
919: $\Phi_{\mathbf{w},p}(f^n)$ tends to zero as $n \rightarrow \infty$,
920: suggesting that the minimizer
921: $f^n$ for the first functional could well tend to a minimizer
922: $f^\star$ of the second.
923: In section 3 we shall see that all this is more than a pipe-dream; i.e. we
924: shall prove that the $f^n$ do indeed converge to a minimizer of
925: $\Phi_{\mathbf{w},p}$.
926:
927: In the remainder of this section, we derive an explicit formula
928: for the computation of the successive
929: $f^n$. We first discuss the minimization of the functional
930: \eref{sur} for a generic $a \in \cH$.
931: As already noticed, the variational equations for the $\fg$
932: decouple. For $p>1$, the summand in \eref{sur} is differentiable in $\fg$, and
933: the minimization reduces to solving the variational equation
934: \begin{equation*}
935: %\label{vareq-pneq1}
936: 2 \fg + p \, \ag \, \mbox{sign}(\fg) |\fg|^{p-1} = 2( a_{\gamma} + [\K (g-K
937: a)]_{\gamma}) ~;
938: \end{equation*}
939: since for any $w \geq 0$ and any $p>1$,
940: the real function $F_{w,p}(x)=x+ {\frac{w p}{2}} ~ \mbox{sign}(x)|x|^{p-1}$ is
941: a one-to-one map
942: from $\mathbb{R}$ to itself,
943: we thus find that the minimizer of \eref{sur} satisfies
944: \begin{equation}
945: \label{solcomp-pneq1}
946: \fg= S_{\ag,p}( a_{\gamma} + [\K (g-K a)]_{\gamma}) ~,
947: \end{equation}
948: where $S_{w,p}$ is defined by
949: \begin{equation}
950: \label{S-pneq1}
951: S_{w,p}= \left( F_{w,p} \right)^{-1} ~, ~ \mbox{for } p>1.
952: \end{equation}
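
To illustrate \eref{S-pneq1}, here are two instances in which the inverse can
be written in closed form (a worked example, not needed in the sequel). For
$p=2$ we have $F_{w,2}(x) = (1+w)\,x$, so that
$$
S_{w,2}(x) = \frac{x}{1+w} ~,
$$
in agreement with the damping factor in (\ref{DamLan}). For $p=3/2$ we have
$F_{w,3/2}(x)= x + \frac{3w}{4}\, \mbox{sign}(x) |x|^{1/2}$; writing
$y=S_{w,3/2}(x)$ and $t = |y|^{1/2}$, the equation $F_{w,3/2}(y)=x$ reduces to
the quadratic equation $t^2 + \frac{3w}{4}\, t - |x| = 0$, whose non-negative
root yields
$$
S_{w,3/2}(x) = \mbox{sign}(x)\, \frac{1}{4}
\left( \sqrt{\frac{9 w^2}{16} + 4 |x|} - \frac{3w}{4} \right)^2 ~.
$$
For other values of $p \in (1,2)$ such a formula is in general not available,
and $S_{w,p}(x)$ would have to be evaluated numerically, for instance by
bisection, which is possible because $F_{w,p}$ is strictly increasing.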
953:
954: When $p=1$, the summand of \eref{sur} is differentiable in $\fg$
955: only if $\fg \neq 0$; except at the point of non-differentiability,
956: the variational equation now reduces to
957: \begin{equation*}
958: %\label{vareq-peq1}
959: 2 \fg + \ag \, \mbox{sign}(\fg) = 2 ( a_{\gamma} + [\K (g-K a)]_{\gamma})
960: ~.
961: \end{equation*}
962: For $\fg>0$, this leads to $\fg= a_{\gamma} + [\K (g-K
963: a)]_{\gamma} -
964: \ag/2$; for consistency we must impose
965: in this case that $a_{\gamma} + [\K (g-K
966: a)]_{\gamma}
967: > \ag/2$. For $\fg <0$, we obtain
968: $\fg=a_{\gamma} + [\K (g-K
969: a)]_{\gamma}+\ag/2$, valid only when
970: $a_{\gamma} + [\K (g-K
971: a)]_{\gamma} < -\ag/2$. When
972: $a_{\gamma} + [\K (g-K
973: a)]_{\gamma}$ does not satisfy either of the two
conditions, i.e. when $|a_{\gamma} + [\K (g-K
975: a)]_{\gamma}| \leq \ag/2$,
976: we put $\fg =0$. Summarizing,
977: \begin{equation}
978: \label{solcomp-peq1}
979: \fg = S_{\ag,1}(a_{\gamma}+[ \K( g - K a)]_{\gamma}) ~,
980: \end{equation}
981: where the function $S_{w,1}$ from $\mathbb{R}$ to itself is defined by
982: \begin{equation}
983: \label{S-peq1}
984: S_{w,1}(x)=\left\{ \begin{array}{ccl} {x-w/2} & {\mbox{if}} & {x \geq w/2} \\
985: {0} & {\mbox{if}} & {|x| < w/2} \\ {x+w/2 }& {\mbox{if}} & {x \leq -w/2 ~~.}
986: \end{array} \right.
987: \end{equation}
988: (Note that this is the same nonlinear function as encountered
989: earlier in section 1.3, in definition \eref{stau}.)
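
Equivalently, \eref{S-peq1} can be written in the compact form
$S_{w,1}(x) = \mbox{sign}(x)\, \max(|x| - w/2,\, 0)$. As a small numerical
illustration (the values are chosen here only as an example), take $w=1$, so
that the threshold equals $w/2 = 0.5$:
$$
S_{1,1}(2) = 1.5~, \quad S_{1,1}(0.3) = 0~, \quad
S_{1,1}(-0.8) = -0.3~, \quad S_{1,1}(-0.5)=0~.
$$
Coefficients that lie below the threshold in absolute value are set to zero,
and all others are moved towards zero by the amount $w/2$.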
990:
991: The following proposition summarizes our findings, and proves (the case
992: $p=1$ is not conclusively proved by the variational equations above)
993: that we have indeed found the minimizer of
994: ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)$:
995: \begin{proposition}
996: \label{prop-2-1} Suppose the operator $K$ maps a Hilbert space $\cH$
997: to another Hilbert space $\cH'$, with $\|\K K\| < 1$,
998: and suppose $g$ is an element of $\cH'$.
999: Let $(\vpg)_{\gamma \in \Gamma}$
1000: be an orthonormal basis for $\cH$, and
1001: let $\mathbf{w}=(\ag)_{\g \in \Gamma}$ be a sequence
1002: of strictly positive numbers. Pick
1003: arbitrary $p \geq 1$ and $a \in \cH$. Define the
1004: functional ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)$ on $\cH$ by
1005: $$
1006: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)=\|Kf-g\|^2 + \sum_{\g \in
1007: \Gamma} \ag |\fg|^p
1008: +\|f-a\|^2-\|K(f-a)\|^2 ~.
1009: $$
1010: Then ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)$ has a unique minimizer
1011: in $\cH$. \\
1012: This minimizer
1013: is given by $f=\S_{\mathbf{w},p}\left(a +\K (g-Ka) \right)$, where
1014: the operators $\S_{\mathbf{w},p}$ are defined by
1015: \begin{equation}
1016: \label{def-SS}
1017: \S_{\mathbf{w},p}(h)= \sum_{\gamma} S_{\ag,p}(h_{\gamma}) \vpg ~~,
1018: \end{equation}
1019: with the functions $S_{w,p}$ from $\R$ to itself given by
1020: {\rm (\ref{S-pneq1}, \ref{S-peq1})}. For all $h \in \cH$, one has
1021: $$
1022: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f+h ; a)
1023: \geq {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ;a)+\|h\|^2~.
1024: $$
1025: \end{proposition}
1026: {\em Proof:}
1027: The cases $p>1$ and $p=1$ should be treated slightly differently. We discuss
1028: here only the case $p=1$; the simpler case $p>1$ is left to the reader.
1029:
1030: Take $f'=f+h$, where $f$ is as defined in the Proposition, and $
1031: h \in \cH$
1032: is arbitrary. Then
1033: $$
1034: {\Phi}^{^{\SUR}}_{\mathbf{w},1}(f+h ; a) =
1035: {\Phi}^{^{\SUR}}_{\mathbf{w},1}(f ; a)+ 2 \left<h,f-a -\K (g
1036: -Ka) \right>
1037: +\sum_{\g \in \Gamma} \ag (|\fg +h_{\g}| -|\fg|) + \|h\|^2 ~.
1038: $$
1039: Define now $\Gamma_{_{\!0}}=\{\g \in \Gamma; \fg=0\}$, and
1040: $\Gamma_{_{\!1}}=\Gamma \setminus
1041: \Gamma_{_{\!0}}$.
1042: Substituting the explicit expression \eref{solcomp-peq1} for the $\fg$, we
1043: have then
1044: \begin{eqnarray*}
1045: {\Phi}^{^{\SUR}}_{\mathbf{w},1}(f+h ; a)-
1046: {\Phi}^{^{\SUR}}_{\mathbf{w},1}(f; a) &= &\|h\|^2
1047: + \sum_{\g \in \Gamma_{_{\!0}}}
1048: \left[ \ag |h_{\g}| - 2 h_{\g} (a_{\g} +[ \K (g -K a)]_{\g} ) \right] \\
1049: &&~~~~~~~~~~~~
1050: +\sum_{\g \in \Gamma_{_{\!1}}} \left( \ag |\fg + h_{\g}| - \ag |\fg| + h_{\g}
1051: [-\ag ~ \mbox{sign}(\fg) ]\right) ~.
1052: \end{eqnarray*}
1053: For $\g \in \Gamma_{_{\!0}}$, $~2|a_{\g} +[ \K (g -K a)]_{\g}| \leq
1054: \ag$, so
1055: that
1056: $\ag |h_{\g}| - 2 h_{\g}\, (a_{\g} +[ \K (g -K a)]_{\g}) \geq 0$.\\
1057: If $\g \in \Gamma_{_{\!1}}$, we distinguish two cases, according to the sign
1058: of $\fg$\, . If $\fg >0$, then\\
1059: $\ag |\fg + h_{\g}| - \ag |\fg| + h_{\g}
1060: [-\ag ~ \mbox{sign}(\fg) ]= \ag [ |\fg +h_{\g}| - (\fg + h_{\g}) ] \geq 0$.
1061: If $\fg <0$, then $\ag |\fg + h_{\g}| - \ag |\fg| + h_{\g}
1062: [-\ag ~ \mbox{sign}(\fg) ]= \ag [ |\fg +h_{\g}| + (\fg + h_{\g}) ] \geq 0$.\\
1063: It follows that ${\Phi}^{^{\SUR}}_{\mathbf{w},1}(f+h;a)-
1064: {\Phi}^{^{\SUR}}_{\mathbf{w},1}(f;a) \geq \|h\|^2 $, which
1065: proves the Proposition.
1066: \hfill \QED
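
For completeness, here is a sketch of one way in which the argument could be
adapted to the case $p>1$ (this is only an outline of the step left to the
reader, under the same assumptions as in the Proposition). The same expansion
as above gives
$$
{\Phi}^{^{\SUR}}_{\mathbf{w},p}(f+h ; a) -
{\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)
= \|h\|^2 + \sum_{\g \in \Gamma} \left[ \ag \left( |\fg + h_{\g}|^p - |\fg|^p
\right) + 2 h_{\g} \left( \fg - a_{\g} - [\K (g-Ka)]_{\g} \right) \right] ~.
$$
By the variational equation preceding \eref{solcomp-pneq1},
$2 \left( \fg - a_{\g} - [\K (g-Ka)]_{\g} \right) =
- p \, \ag \, \mbox{sign}(\fg) |\fg|^{p-1}$. On the other hand, the function
$x \mapsto |x|^p$ is convex and, for $p>1$, differentiable, so that
$|\fg + h_{\g}|^p - |\fg|^p \geq p \, \mbox{sign}(\fg) |\fg|^{p-1} h_{\g}$.
Each summand is therefore non-negative, and the difference is bounded below
by $\|h\|^2$, as in the case $p=1$.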
1067:
1068: \bigskip
1069:
1070: For later reference it is useful to point out that
1071: \begin{lemma}
1072: \label{SS-non-exp}
1073: The operators $\S_{\mathbf{w},p}$ are non-expansive, i.e.
1074: $\forall v, ~ v' \in \cH$, $\| \S_{\mathbf{w},p} v - \S_{\mathbf{w},p} v'
1075: \| \leq
1076: \|v-v'\|~.$
1077: \end{lemma}
1078: {\em Proof:}
1079: As shown by \eref{def-SS},
1080: $$
1081: \|\S_{\mathbf{w},p} v - \S_{\mathbf{w},p} v' \|^2 =
1082: \sum_{\g \in \Gamma} |S_{\ag,p}( v_{\g}) - S_{\ag,p}( v'_{\g})|^2 ~,
1083: $$
1084: which means that it suffices to show that, $\forall x , x' \in \R$, and all
1085: $w\geq0$,
1086: \begin{equation}
1087: \label{S-non-exp}
1088: | S_{w,p}(x)-S_{w,p}(x')| \leq |x-x'|~.
1089: \end{equation}
1090: If $p >1$, then $S_{w,p}$ is the inverse of the function $F_{w,p}$; since
1091: $F_{w,p}$ is differentiable with derivative uniformly bounded below by 1,
1092: \eref{S-non-exp} follows immediately in this case.\\
If $p=1$, then $S_{w,1}$ is not differentiable at $x=w/2$ or $x=-w/2$, and
1094: another
1095: argument must be used. For the sake of definiteness, let us assume $x \geq x'$.
1096: We will just check all the possible cases.
1097: If $x$ and $x'$ have the same sign and $|x|,~|x'|\geq w/2$, then
1098: $| S_{w,p}(x)-S_{w,p}(x')| =|x-x'|$. If $x'\leq -w/2$ and $x \geq w/2$, then
$| S_{w,p}(x)-S_{w,p}(x')| = x +|x'|-w \leq |x-x'|$. If
1100: $x \geq w/2$ and $|x'| < w/2$, then
1101: $| S_{w,p}(x)-S_{w,p}(x')| =x-w/2 < |x-x'|$. A symmetric argument applies
1102: to the
1103: case $|x|<w/2$ and $x' \leq -w/2$. Finally, if both $|x|$ and $|x'|$ are
1104: less than
1105: $w/2$, we have
1106: $| S_{w,p}(x)-S_{w,p}(x')|=0 \leq |x-x'|$. This establishes \eref{S-non-exp}
1107: in all cases. \hfill \QED
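
To make the argument for $p>1$ slightly more explicit (a worked computation
added for illustration): for $x \neq 0$,
$$
F_{w,p}'(x) = 1 + \frac{w \, p \, (p-1)}{2} \, |x|^{p-2} \geq 1 ~,
$$
so that, wherever $S_{w,p}$ is differentiable,
$0 \leq S_{w,p}'(y) = 1/F_{w,p}'(S_{w,p}(y)) \leq 1$. For $p=2$ the slope of
$S_{w,2}$ is the constant $1/(1+w)$, so that $S_{w,2}$ is a strict contraction
as soon as $w>0$. For $p=1$, on the other hand, the slope of $S_{w,1}$ equals
$1$ outside the interval $[-w/2,w/2]$, so that the constant $1$ in
\eref{S-non-exp} cannot be improved in general.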
1108:
1109: \bigskip
1110:
1111: Having found the minimizer of a generic
1112: ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; a)$, we can
1113: apply this to the
1114: iteration \eref{iter}, leading to
1115:
1116: \begin{corollary}
1117: \label{cor-2-2}
1118: Let $\cH$, $\cH'$, $K$, $g$, $\mathbf{w}$ and $(\vpg)_{\g \in \Gamma}$ be
1119: as in Proposition {\rm \ref{prop-2-1}}. Pick $f^0$ in $\cH$, and define
1120: the functions $f^n$ recursively by the algorithm {\rm \eref{iter}}.
1121: Then
1122: \begin{equation}
1123: \label{f-n}
1124: f^n= \S_{\mathbf{w},p}\left(f^{n-1}+ \K (g-Kf^{n-1}) \right) ~.
1125: \end{equation}
1126: \end{corollary}
1127: {\em Proof:} this follows immediately from Proposition \ref{prop-2-1}.
1128: $~~~~~~~~~~~$ \hfill \QED
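
For $p=1$, one step of \eref{f-n} can be spelled out as follows (the auxiliary
names $r^{n-1}$ and $u^{n-1}$ are introduced here only for the purpose of
illustration):
\begin{eqnarray*}
r^{n-1} &=& g - K f^{n-1} ~, \\
u^{n-1} &=& f^{n-1} + \K r^{n-1} ~, \\
f^n_{\g} &=& \mbox{sign}(u^{n-1}_{\g}) \,
\max \left( |u^{n-1}_{\g}| - \ag/2 , \, 0 \right) ~, \quad \g \in \Gamma ~.
\end{eqnarray*}
Each iteration thus requires one application of $K$, one application of $\K$,
the computation of the coefficients of $u^{n-1}$ with respect to the basis
$(\vpg)_{\g \in \Gamma}$, and a coefficient-by-coefficient thresholding.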
1129:
1130: \begin{remark}
1131: \label{op-D}
1132: {\rm In the argument above, we used essentially only two ingredients: the
1133: (strict)
1134: convexity of $\|f-a\|^2-\|K(f-a)\|^2$, and the presence of the negative
1135: $-\|Kf\|^2$
1136: term in this expression, canceling the $\|Kf\|^2$ in the original
1137: functional. We can use this observation to present a slight
1138: generalization, in which the identity operator used to upper bound $ \K K
1139: $ is replaced by a more general operator $D$ that is diagonal in the
1140: $\vpg$--basis,}
1141: $$
1142: D\, \vpg = d_{\gamma} \vpg~,
1143: $$
1144: {\rm and that still gives a strict upper bound for $ \K K$, i.e. satisfies}
1145: $$
1146: D \geq K^*K + \eta I\ \ \mbox{\rm for some } \eta >0\ .
1147: $$
1148: {\rm In this case, the whole construction still carries through, with slight
1149: modifications; the successive $f^n$ are now given by }
1150: \begin{equation*}
1151: %\label{generalization-comp}
1152: \fg ^n = S_{\ag/d_{\gamma},p}\left(\fg ^{n-1} +
1153: \frac {[\K (g-Kf^{n-1})]_{\gamma}}{d_{\gamma}} \right) ~~.
1154: \end{equation*}
1155: {\rm Introducing the notation $\mathbf{w/d}$ for the sequence
1156: $(\ag/d_{\gamma})_{\gamma}$, we can rewrite this as }
1157: \begin{equation*}
1158: %\label{generalization}
1159: f^n = \S_{\mathbf{w/d},p}\left( f^{n-1} +D^{-1}[\K (g-Kf^{n-1})] \right) ~.
1160: \end{equation*}
1161: {\rm
1162: For the sake of simplicity of notation,
1163: we shall restrict ourselves
1164: to the case $D = \mbox{\I}$.}
1165: \end{remark}
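
As a simple instance of Remark \ref{op-D} (a sketch, under the assumptions of
that remark), take $D = C\, \mbox{\I}$ with a constant $C > \|\K K\|$. Then
$d_{\gamma} = C$ for all $\gamma$, the bound $D \geq \K K + \eta\, \mbox{\I}$
holds with $\eta = C - \|\K K\| > 0$, and the iteration reads
$$
f^n = \S_{\mathbf{w}/C,p}\left( f^{n-1} + C^{-1}\, \K (g-Kf^{n-1}) \right) ~.
$$
This is the form the algorithm would take if $K$ were not renormalized: the
Landweber step is scaled by $1/C$, and for $p=1$ the thresholds become
$\ag/(2C)$.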
1166:
1167: \begin{remark}
1168: \label{2-5}
1169: {\rm If we deal with complex rather than real functions, and the $f_{\gamma}$,
1170: $(K^*g)_{\gamma}, \cdots$ are complex quantities, then the derivation
1171: of the minimizer of ${\Phi}^{^{\SUR}}_{\mathbf{w},1}(f; a)$
1172: has to be adapted somewhat. Writing
1173: $f_{\gamma}= r_{\gamma} e^{i \theta_{\gamma}}$, with
1174: $ r_{\gamma} \geq 0$, $\theta_{\gamma} \in [0,2 \pi )$,
1175: and likewise $(a +K^*g -K^*Ka)_{\gamma} = R_{\gamma} e^{i \Theta_{\gamma}}$,
1176: we find, instead of \eref{sur},
1177: $$
1178: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f; a)
1179: = \sum_{\gamma} [ r_{\gamma}^2+ w_{\gamma}r_{\gamma}^p
1180: - 2r_{\gamma}R_{\gamma}\cos(\theta_{\gamma}-\Theta_{\gamma})]
1181: +\|g\|^2+\|a\|^2-\|Ka\|^2~.
1182: $$
1183: Minimizing over $r_{\gamma} \in [0,\infty)$ and
1184: $\theta_{\gamma} \in [0,2 \pi)$ leads to
1185: $\theta_{\gamma}=\Theta_{\gamma}$ and
1186: $r_{\gamma}=S_{w_{\gamma},p}(R_{\gamma})$.
1187: If we extend the definition
1188: of $S_{\mu,p}$ to complex arguments by setting
1189: $S_{\mu,p}(r e^{i \theta}) = S_{\mu,p}(r) e^{i \theta}$,
1190: then this still leads to
1191: $\fg= S_{w_{\gamma},p}\left(a_{\gamma} +[K^*(g -Ka)]_{\gamma} \right)$,
1192: as in (\ref{solcomp-pneq1}, \ref{solcomp-peq1}).
1193: The arguments of the different proofs still
1194: hold for this complex version, after minor and straightforward modifications.}
1195: \end{remark}
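
For $p=1$, the complex extension described in Remark \ref{2-5} takes the
explicit form (with $S_{w,1}(0)=0$)
$$
S_{w,1}(z) = \frac{z}{|z|} \, \max \left( |z| - w/2 , \, 0 \right) ~,
\quad z \in \C \setminus \{0\} ~,
$$
i.e. the modulus is thresholded and the phase is left unchanged. As a
numerical illustration (values chosen here only as an example), for $w=1$ and
$z = 3+4i$ one has $|z|=5$ and
$S_{1,1}(z) = \frac{4.5}{5}\,(3+4i) = 2.7 + 3.6\,i$.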
1196: %\newpage
1197:
1198:
1199: \section{Convergence of the iterative algorithm}
1200:
1201: In this section we discuss the convergence of the sequence $(f^n)_{n \in \N}$
1202: defined by {\rm \eref{f-n}}. The main result of this section is the
1203: following theorem:
1204: \begin{theorem}
1205: \label{th-3-1}
1206: Let $K$ be a bounded linear operator from
1207: $\cH$ to $\cH'$, with norm strictly bounded by $1$. Take $p \in [1,2]$, and
1208: let $\S_{\mathbf{w},p}$ be the shrinkage operator defined by {\rm
1209: \eref{def-SS}},
1210: where the sequence $\mathbf{w}=(\ag)_{\g \in \Gamma}$ is uniformly
1211: bounded
1212: below away from zero, i.e. there exists a constant $c>0$ such that $\forall \g
1213: \in
1214: \Gamma:$
1215: $\ag \geq c$.
1216: Then the sequence of iterates
1217: \begin{equation*}
1218: f^n=\S_{\mathbf{w},p}\left( f^{n-1} + \K (g- Kf^{n-1})\right)\ ,\quad
1219: n=1,2,\dots\;,
1220: \end{equation*}
1221: with $f^0$ arbitrarily chosen in $\cH$, converges strongly to a minimizer
1222: of the functional
1223: \begin{equation*}
1224: \Phi_{\mathbf{w},p}(f) = \| Kf-g\|^2 + \Vvert f\Vvert_{\mathbf{w},p}^p\ ,
1225: \end{equation*}
1226: where $\Vvert f\Vvert_{\mathbf{w},p}$ denotes the norm
1227: \begin{equation}
1228: \label{triple-norm}
1229: \Vvert f\Vvert_{\mathbf{w},p} = \left[ \sum_{\g \in \Gamma} \ag
1230: |\left<f,\vpg \right>|^p
1231: \right]^{1/p} ~,~ 1 \leq p \leq 2~.
1232: \end{equation}
1233: If either $p>1$ or {\rm{N}}$(K)=\{0\}$, then the minimizer $f^\star$ of
1234: $\Phi_{\mathbf{w},p}$ is unique, and every sequence of iterates
1235: $f^n$ converges strongly to $f^\star$, regardless of the choice of $f^0$.
1236: \end{theorem}
1237:
1238: By ``strong convergence'' we mean convergence in the norm of $\cH$, as
1239: opposed to
1240: weak convergence.
1241: This theorem will be proved in several stages. To start, we prove weak
1242: convergence,
1243: and we establish that the weak limit is indeed a minimizer of
1244: $\Phi_{\mathbf{w},p}$.
1245: Next, we prove that the convergence holds in norm, and not only in the weak
1246: topology.
1247: To lighten our formulas, we introduce the shorthand notation
1248: $$
1249: \T f = \S_{\mathbf{w},p}\left( f + \K (g-Kf)\right)~;
1250: $$
1251: with this new notation we have $f^n= \T^n f^0$.
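
As a consistency check (a sketch for the case $p>1$, added for illustration),
note that $f = \T f$ means
$\fg = S_{\ag,p}\left( \fg + [\K (g-Kf)]_{\g} \right)$ for all
$\g \in \Gamma$. Applying $F_{\ag,p} = (S_{\ag,p})^{-1}$ to both sides gives
$$
\fg + \frac{\ag p}{2}\, \mbox{sign}(\fg) |\fg|^{p-1} = \fg + [\K (g-Kf)]_{\g}
\quad \Longleftrightarrow \quad
\left< \K K f, \vpg \right> - \left< \K g, \vpg \right>
+ \frac{\ag p}{2}\, |\fg|^{p-1} \mbox{sign}(\fg) = 0 ~,
$$
which is exactly \eref{variational}. For $p>1$, the fixed points of $\T$ thus
coincide with the solutions of the variational equations.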
1252:
1253: \subsection{Weak convergence of the $f^n$}
1254:
1255: To prove weak convergence of the $f^n=\T^n f^0$, we apply the following
1256: theorem, due to
1257: Opial \cite{Opi67}:
1258: \begin{theorem}
1259: \label{thm_opial}
1260: Let
1261: the mapping $\A $ from $\cH$ to $\cH$ satisfy the following
1262: conditions:
1263: \begin{enumerate}
1264: \item[{\rm (i)}] $\A $ is non-expansive: $\forall v, v' \in \cH$,
1265: $\| \A v - \A v'\| \leq \| v - v'\|$,
1266: \item[{\rm (ii)}] $\A $ is asymptotically regular: $\forall v \in \cH$,
1267: $\| \A ^{n+1}v -\A ^n v\|
1268: \xrightarrow[n \to \infty ]{~} 0$ ,
1269: \item[{\rm (iii)}] the set ${\cal F}$ of the fixed points of $\A $ in
1270: ${\cH}$ is
1271: not empty.
1272: \end{enumerate}
1273: Then, $\forall v \in \cH$, the sequence $(\A ^n v)_{n \in \N}$
1274: converges weakly to a fixed point in ${\cal F}$.
1275: \end{theorem}
1276:
1277: Opial's original proof can be simplified;
1278: we provide the simplified proof (still mainly
1279: following Opial's approach) in Appendix \ref{Opial}.
1280: (The theorem is slightly more general than what is stated
1281: in Theorem \ref{thm_opial} in that the mapping $\A $ need not be
1282: defined on all of space; it suffices that it map a closed convex subset of
1283: $\cH$ to itself -- see Appendix \ref{Opial}.
1284: \cite{Opi67} also contains additional refinements,
1285: which we shall not need here.)
1286: One of the Lemmas stated and proved in the Appendix
1287: will be invoked in its own right, further below in this section;
1288: for the reader's convenience, we state
1289: it here in full as well:
1290:
1291: \begin{lemma}
1292: \label{lem-3-2}
1293: Suppose the mapping $\A$ from $\cH$
1294: to $\cH$ satisfies the conditions {\rm (i)} and {\rm (ii)}
1295: in Theorem {\rm\ref{thm_opial}}. Then, if
1296: a subsequence of $(\A ^n v)_{n\in \mathbb{N}}$
1297: converges weakly in $\cH$, then its limit is a fixed point of $\A$.
1298: \end{lemma}
1299:
1300: In order to apply Opial's Theorem to our nonlinear operator
1301: $\T$, we need to verify that it satisfies the three conditions in Theorem
1302: \ref{thm_opial}. We do this in the following series of lemmas. We first have
1303:
1304: \begin{lemma}
1305: \label{nonexp}
The mapping $\T$ is non-expansive, i.e. $\forall v, v' \in \cH$,
1307: \begin{equation*}
1308: \|\T v - \T v' \| \leq \| v - v' \| \ .
1309: \end{equation*}
1310: \end{lemma}
1311: {\em Proof:}
1312: It follows from Lemma \ref{SS-non-exp}
1313: that the shrinkage operator ${\S_{\mathbf{w},p}}$ is
1314: non-expansive. Hence we have
1315: \begin{eqnarray*}
1316: \|{\T} v - {\T} v' \| &\leq& \| (I-K^*K) v - (I-K^*K)
1317: v' \|\\
1318: &\leq& \| I-K^*K \|\ \| v - v' \| \leq \| v - v' \|
1319: \end{eqnarray*}
1320: because we assumed $\| K \| < 1$.
1321: \hfill\QED\bigskip
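
To spell out the two steps of this estimate (a remark added for
illustration): the arguments of $\S_{\mathbf{w},p}$ in $\T v$ and $\T v'$
differ by
$$
\left[ v + \K (g-Kv) \right] - \left[ v' + \K (g-Kv') \right]
= (I-K^*K)(v-v') ~,
$$
since the terms $\K g$ cancel. Moreover, $K^*K$ is self-adjoint and
non-negative, with spectrum contained in $[0, \|K\|^2]$, so that the spectrum
of $I-K^*K$ is contained in $[1-\|K\|^2, 1]$ and $\|I-K^*K\| \leq 1$.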
1322:
1323: This verifies that $\T$ satisfies the first condition (i) in Theorem
1324: \ref{thm_opial}.
1325: To verify the second condition, we first prove some auxiliary lemmas.
1326: \begin{lemma}
1327: \label{cost2}
1328: Both $\left({\Phi}_{\mathbf{w},p}(f^n)\right)_{n \in \N}$ and
1329: $\left({\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^{n+1} ; f^n)\right)_{n
1330: \in
1331: \N}$ are non-increasing sequences.
1332: \end{lemma}
1333: {\em Proof:} For the sake of convenience, we introduce the operator
1334: $L = \sqrt{I -\K K }$, so that $\|h\|^2-\|Kh\|^2= \|Lh\|^2$. Because
1335: $f^{n+1}$ is the
1336: minimizer of the functional
${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; f^n)$, we have
\begin{equation*}
\Phi_{\mathbf{w},p}(f^{n+1})+\| L(f^{n+1}-f^n)\|^2 =
{\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^{n+1} ; f^n)
\leq
{\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^n ; f^n)=\Phi_{\mathbf{w},p}(f^n)\ ,
\end{equation*}
from which we obtain
1346: \begin{equation*}
1347: \Phi_{\mathbf{w},p}(f^{n+1})\leq \Phi_{\mathbf{w},p}(f^n) \ .
1348: \end{equation*}
1349: On the other hand
1350: \begin{equation*}
1351: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^{n+2} ; f^{n+1})\leq
1352: \Phi_{\mathbf{w},p}(f^{n+1})
1353: \leq \Phi_{\mathbf{w},p}(f^{n+1})+
1354: \|L(f^{n+1}-f^n)\|^2={\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^{n+1} ; f^n)\
1355: .
1356: \end{equation*}
1357: \hfill\QED\bigskip
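
The two inequalities in this proof can be combined into a single interlaced
chain (a summary added for illustration):
$$
\Phi_{\mathbf{w},p}(f^0) \geq
{\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^{1} ; f^0) \geq
\Phi_{\mathbf{w},p}(f^1) \geq
{\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^{2} ; f^1) \geq
\Phi_{\mathbf{w},p}(f^2) \geq \cdots \geq 0 ~.
$$
Since this chain is non-increasing and bounded below, both sequences in
Lemma \ref{cost2} converge, and they have the same limit.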
1358:
1359: \begin{lemma}
1360: \label{unifbddness}
1361: Suppose the sequence $\mathbf{w}=(\ag)_{\g \in \Gamma}$ is uniformly bounded
1362: below by a strictly positive number.
1363: Then the $\|f^n\|$ are bounded uniformly in $n$.
1364: \end{lemma}
1365: {\em Proof:} Since $\ag \geq c$, uniformly in $\g$, for some $c>0$, we have
1366: \begin{equation*}
1367: \Vvert f^n\Vvert_{\mathbf{w},p}^p
1368: \leq \Phi_{\mathbf{w},p}(f^n) \leq
1369: \Phi_{\mathbf{w},p}(f^0)~,
1370: \end{equation*}
1371: by Lemma \ref{cost2}. Hence the $f^n$ are bounded uniformly
1372: in the $\Vvert ~\Vvert_{\mathbf{w},p}$-norm.
1373: Since
1374: \begin{equation}
1375: \label{bdL2Ban}
1376: \| f \|^2 \leq c^{-2/p} \ {\mathop{\rm max}_{\g \in \Gamma}}
1377: [\ag^{(2-p)/p} |f_{\g}|^{2-p}]\ \Vvert f \Vvert_{\mathbf{w},p}^p
1378: \leq c^{-2/p} \Vvert f\Vvert_{\mathbf{w},p}^{2-p}\ \Vvert f
1379: \Vvert_{\mathbf{w},p}^p = c^{-2/p} \Vvert f
1380: \Vvert_{\mathbf{w},p}^2\ ,
1381: \end{equation}
1382: we also have a uniform bound on the $\|f^n\|$.
1383: \hfill\QED\bigskip
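
Taking square roots in \eref{bdL2Ban} and combining with the first estimate of
the proof yields the explicit bound (stated here for illustration)
$$
\|f^n\| \leq c^{-1/p}\, \Vvert f^n \Vvert_{\mathbf{w},p}
\leq c^{-1/p} \left[ \Phi_{\mathbf{w},p}(f^0) \right]^{1/p} ~.
$$
For instance, with the starting point $f^0=0$ one has
$\Phi_{\mathbf{w},p}(f^0)=\|g\|^2$, and therefore
$\|f^n\| \leq c^{-1/p}\, \|g\|^{2/p}$ for all $n$.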
1384:
1385: \begin{lemma}
1386: \label{series}
1387: The
1388: series
1389: $\sum_{n=0}^\infty \| f^{n+1}-f^n\|^2 $ is convergent.
1390: \end{lemma}
1391: {\em Proof:} This is a consequence of the strict positive-definiteness of $L$,
1392: which holds because $\|K\| <1$. We have, for any $N \in \N$,
1393: \begin{equation*}
1394: \sum_{n=0}^N \| f^{n+1}-f^n\|^2 \leq \frac{1}{A} \sum_{n=0}^N
1395: \| L(f^{n+1}-f^n)\|^2
1396: \end{equation*}
1397: where $A$ is a strictly positive lower bound for the spectrum of $L^*L$.
1398: By Lemma \ref{cost2},
1399: \begin{equation*}
1400: \sum_{n=0}^{N}
1401: \| L(f^{n+1}-f^n)\|^2 \leq
1402: \sum_{n=0}^N [\Phi_{\mathbf{w},p}(f^n)-\Phi_{\mathbf{w},p}(f^{n+1})]
1403: = \Phi_{\mathbf{w},p}(f^{0})-\Phi_{\mathbf{w},p}(f^{N+1}) \leq
1404: \Phi_{\mathbf{w},p}(f^{0})~,
1405: \end{equation*}
1406: where we have used that
1407: $(\Phi_{\mathbf{w},p}(f^n))_{n \in \N}$ is a non-increasing sequence. \\
1408: It follows
1409: that $\sum_{n=0}^N \| f^{n+1}-f^n\|^2 $ is bounded uniformly in $N$, so that
1410: the infinite series converges.\hfill \QED
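
One admissible choice of the constant $A$, given here as an illustration
rather than as the only possibility, is $A = 1-\|K\|^2$: for any
$h \in \cH$,
\begin{equation*}
\|Lh\|^2 = \|h\|^2 - \|Kh\|^2 \geq \left(1-\|K\|^2\right) \|h\|^2 \ ,
\end{equation*}
and $1-\|K\|^2>0$ because $\|K\|<1$. With this choice the two estimates in
the proof combine into the explicit bound
\begin{equation*}
\sum_{n=0}^\infty \| f^{n+1}-f^n\|^2 \leq
\frac{\Phi_{\mathbf{w},p}(f^{0})}{1-\|K\|^2} \ .
\end{equation*}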
1411:
1412: \bigskip
1413:
1414: As an immediate consequence, we have that
1415: \begin{lemma}
1416: \label{asyreg}
1417: The mapping ${\T}$ is asymptotically regular, i.e.
1418: \begin{equation*}
1419: \|{\T}^{n+1}f^0 -{\T}^n f^0 \| =
1420: \| f^{n+1} - f^n \| \to 0 \quad {\rm for} \quad n \to \infty\ .
1421: \end{equation*}
1422: \end{lemma}
1423: We can now establish the following
1424: \begin{proposition}
1425: The sequence $f^n={\T}^nf^0$,
1426: $n=1, 2, \cdots$ converges weakly, and its limit is a fixed point for $\T$.
1427: \end{proposition}
1428: {\em Proof:} Since, by Lemma \ref{unifbddness}, the $f^n={\T}^n f^0$
1429: are uniformly bounded in $n$, it follows from the Banach-Alaoglu
1430: theorem that they have a weak accumulation point. By Lemma \ref{lem-3-2},
1431: this weak accumulation point is a fixed point for $\T$.
1432: It follows that the set of fixed points of $\T$ is not empty.
1433: Since $\T$ is also non-expansive (by Lemma \ref{nonexp}) and
1434: asymptotically regular (by Lemma \ref{asyreg}),
we can apply Opial's theorem (Theorem \ref{thm_opial} above), and the
1436: conclusion of the Proposition follows.
1437: \hfill\QED
1438:
1439: \bigskip
1440:
1441: By the following proposition this fixed point is also a minimizer
1442: for the functional $\Phi_{\mathbf{w},p}$.
1443: \begin{proposition}
1444: \label{fix-min}
1445: A fixed point for ${\T}$ is a minimizer for the functional
1446: $\Phi_{\mathbf{w},p}$.
1447: \end{proposition}
1448: {\em Proof:}
1449: If $f^\star = {\T} f^\star$, then by Proposition
1450: \ref{prop-2-1}, we know that $f^\star$ is a minimizer for the surrogate
1451: functional ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f ; f^\star)$, and
1452: that, $\forall h \in \cH$,
1453: \begin{equation*}
1454: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^\star + h ; f^\star) \geq
1455: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^\star ;f^\star) + \| h \|^2
1456: ~.
1457: \end{equation*}
1458: Observing that ${\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^\star ; f^\star) =
1459: \Phi_{\mathbf{w},p}(f^\star)$, and
1460: \begin{equation*}
1461: {\Phi}^{^{\SUR}}_{\mathbf{w},p}(f^\star + h ; f^\star) =
1462: \Phi_{\mathbf{w},p}(f^\star + h) + \| h \|^2 - \| Kh\|^2 \ ,
1463: \end{equation*}
1464: we conclude that, $\forall h \in \cH$,
1465: $ \Phi_{\mathbf{w},p}(f^\star + h) \geq \Phi_{\mathbf{w},p}(f^\star) + \|
Kh\|^2$, which shows that $f^\star$ is a minimizer for $\Phi_{\mathbf{w},p}$.
1467: \hfill\QED
1468:
1469: \bigskip
1470:
1471: The following proposition summarizes this subsection.
1472: \begin{proposition}
1473: \label{prop-wk-conv}
1474: {\rm (Weak convergence)} Make the same assumptions as in the statement of
1475: {\rm Theorem
1476: \ref{th-3-1}}. Then, for
1477: any choice of the initial $f^0$, the sequence $f^n={\T}^n f^0, \
1478: n=1,2,\cdots$ converges weakly to a minimizer for $\Phi_{\mathbf{w},p}$.
1479: If either {\rm N}$ (K)=\{0\}$ or $p>1$, then $\Phi_{\mathbf{w},p}$ has a unique
1480: minimizer $f^\star$, and all the sequences $(f^n)_{n \in \N}$ converge
1481: weakly to $f^\star$, regardless of the choice of $f^0$.
1482: \end{proposition}
1483: {\em Proof:}
The only thing that has not yet been proved is the
uniqueness of the minimizer if $\mN (K)=\{0\}$ or $p>1$. This uniqueness
follows from the observation that $\Vvert f \Vvert_{\mathbf{w},p}^p$ is strictly
convex in $f$ if $p>1$, and that $\|Kf-g\|^2$ is strictly convex in
$f$ if $\mN (K)=\{0\}$. In both these cases $\Phi_{\mathbf{w},p}$ is thus
strictly convex, so that it has a unique minimizer. \hfill \QED
1490:
1491: \bigskip
1492:
1493: \begin{remark}
1494: {\rm If one has the additional prior information that the object lies in
1495: some closed convex subset ${\cal C}$ of the Hilbert space $\cH$, then the
1496: iterative procedure can be adapted to take this into account, by replacing
1497: the shrinkage operator
1498: ${\S}$ by ${\mathbf P}_{\!\cal C} {\S}$,
1499: where ${\mathbf P}_{\!\cal C}$ is the projector on ${\cal C}$. For example,
1500: if $\cH=L^2$, then ${\cal C}$ could be the cone of functions that are positive
1501: almost everywhere. The results in this subsection can be extended to this
1502: case;
1503: a more general version of Theorem \ref{thm_opial}
1504: can be applied, in which $\A $ need not be defined
1505: on all of $\cH$, but only on ${\cal C} \subset \cH$; see Appendix \ref{Opial}.
1506: We would however need to either use other tools
1507: to ensure, or assume outright
1508: that the set of fixed points of $\T={\mathbf P}_{\!\cal C} {\S}$ is not empty
1509: (see also \cite{Eic92})}.
1510: \end{remark}
1511:
1512: \bigskip
1513:
1514: \begin{remark}
1515: {\rm If $\Phi_{\mathbf{w},p}$ is strictly convex, then one can prove the weak
1516: convergence more directly, as follows. By the boundedness of the $f^n$
1517: (Lemma \ref{unifbddness}), we must have a weakly convergent subsequence
1518: $(f^{n_k})_{k \in \N}$. By Lemma \ref{asyreg}, the sequence
1519: $(f^{n_k+1})_{k \in \N}$ must then also be weakly convergent, with the same
1520: weak limit $\wf$. It then follows from the equation
1521: $$
1522: f^{n_k+1}_\g= S_{\ag,p}\left(f^{n_k}_\g +[\K (g-K f^{n_k})]_\g \right)~,
1523: $$
1524: together with $\lim_{k \to \infty}f^{n_k}_\g = \lim_{k \to \infty}f^{n_k+1}_\g
=\wf _\g$, that $\wf$ must be the fixed point $f^\star$ of $\T$. Since this
1526: holds for
1527: any weak accumulation point of $(f^n)_{n \in \N}$, the weak convergence
1528: of $(f^n)_{n \in \N}$ to $f^\star$ follows. }
1529: \end{remark}
1530:
1531: \bigskip
1532:
1533: \begin{remark}
1534: {\rm The proof of Lemma \ref{unifbddness} is the only place, so far, where we
1535: have explicitly used $p \leq 2$. If it were possible to establish a uniform
1536: bound on the $\|f^n\|$ by some other means (e.g. by showing that the
1537: $\|\T^n f^0\|$
1538: are bounded uniformly in $n$), then we could dispense with the restriction
1539: $p \leq 2$, and Proposition \ref{prop-wk-conv} would hold for all $p \geq 1$. }
1540: \end{remark}
1541:
1542: \subsection{Strong convergence of the $f^n$}
1543:
1544: In this subsection we shall prove that the convergence of the successive
1545: iterates
1546: $\{f^n\}$ holds not only in the weak topology, but also in the Hilbert
1547: space norm.
1548: Again, we break up the proof into several lemmas. For the sake of convenience,
1549: we introduce the following notations
1550: \begin{eqnarray}
1551: f^\star&=& \mbox{{\em w}\! --\!}\lim_{n \to \infty} f^n \nonumber \\
1552: u^n &=& f^n - f^\star \nonumber \\ %label{redef1}\\
1553: h\ &=& f^\star + \K (g-Kf^\star)\ . \label{redef2}
1554: \end{eqnarray}
1555: Here and below, we use the notation {\em w}$\,$--$\lim$ as a shorthand
1556: for {\em weak limit}.
1557: \begin{lemma}
1558: \label{Ku}
1559: $\| Ku^n \| \to 0$ for $n \to \infty$\ .
1560: \end{lemma}
1561: {\em Proof:}
1562: Since
1563: \begin{equation*}
1564: u^{n+1} - u^n = {\S}_{\mathbf{w},p}\left( h+(I-K^*K)u^n\right) -
1565: {\S}_{\mathbf{w},p}(h) - u^n
1566: \end{equation*}
1567: and $\|u^{n+1} - u^n \| = \| f^{n+1} - f^n \| \to 0\ {\rm for}\ n
1568: \to \infty$ by Lemma \ref{asyreg}, we have
1569: \begin{equation}
1570: \|\ {\S}_{\mathbf{w},p}\left( h+(I-K^*K)u^n\right) - {\S}_{\mathbf{w},p}(h) -
1571: u^n \| \to 0 \ {\rm for}\ n \to \infty ~,
1572: \label{cvaux}
1573: \end{equation}
1574: and hence also
1575: \begin{equation}
1576: \max\left(0,\| u^n \|- \|{\S}_{\mathbf{w},p}\left( h+(I-K^*K)u^n\right) -
1577: {\S}_{\mathbf{w},p}(h)\|\ \right) \to 0 \ {\rm for}\ n \to \infty\ .
1578: \label{cvtozero}
1579: \end{equation}
1580: Since ${\S}_{\mathbf{w},p}$ is non-expansive
1581: (Lemma \ref{SS-non-exp}), we have
1582: \begin{equation*}
1583: \| \ {\S_{\mathbf{w},p}}\left( h+(I-K^*K)u^n\right) - {\S}_{\mathbf{w},p}(h) \|
1584: \ \leq
1585: \| (I-K^*K)u^n \| \leq \| u^n \|~;
1586: \end{equation*}
1587: therefore the ``max'' in (\ref{cvtozero}) can be dropped, and it follows that
1588: \begin{equation}
1589: \| u^n \| - \| (I-K^*K)u^n \| \to 0 \ {\rm for}\ n \to \infty \ .
1590: \label{cv2}
1591: \end{equation}
1592: Because
1593: \begin{eqnarray*}
1594: \| u^n \| + \| (I-K^*K)u^n \| &\leq& 2\| u^n\| = 2\|f^n-f^\star\|\\
1595: &\leq& 2(\| f^\star\|+ {\mathop{\rm sup}_{k}} \| f^k\|) = C
1596: \end{eqnarray*}
1597: where $C$ is a finite constant (by Lemma \ref{unifbddness}), we obtain
1598: \begin{equation*}
1599: 0 \leq \| u^n \|^2 - \| (I-K^*K)u^n \|^2 \leq
1600: C (\| u^n \| - \| (I-K^*K)u^n \|)\ ,
1601: \end{equation*}
1602: which tends to zero by (\ref{cv2}).
1603: The inequality
1604: \begin{equation*}
1605: \| u^n \|^2 - \| (I-K^*K)u^n \|^2 =
1606: 2\| Ku^n\|^2-\|\K Ku^n\|^2 \geq \| Ku^n\|^2
1607: \end{equation*}
1608: then implies that $\| Ku^n\|^2 \to 0 \ {\rm for}\ n \to \infty\ $.
1609: \hfill\QED
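
The identity in the last display of the proof follows by expanding the
square. We record the computation as a sketch, using only that $\K K$ is
self-adjoint and that $\|\K\| = \|K\| <1$:
\begin{eqnarray*}
\| (I-\K K)u^n \|^2 &=& \|u^n\|^2 - 2 \left< u^n, \K K u^n \right>
+ \|\K K u^n\|^2 \\
&=& \|u^n\|^2 - 2 \|K u^n\|^2 + \|\K K u^n\|^2 \ ,
\end{eqnarray*}
while $\|\K K u^n\| \leq \|\K\|\, \|K u^n\| \leq \|K u^n\|$.
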
1610: \begin{remark}
1611: \label{ifKcomp}
{\rm Note that if $K$ is a compact operator, the weak convergence
to $0$ of the $u^n$ automatically implies that $\|K u^n\|$ tends
to $0$ as $n$ tends to $\infty$, so that we don't need
Lemma \ref{Ku} in this case.}
1616: \end{remark}
1617:
1618: \bigskip
1619:
If $K$ had a bounded inverse, we could conclude from $\|K u^n\| \to 0$ that
$\| u^n\|
\to 0
\ {\rm for}\ n \to \infty\ $. If this is not the case, however,
as happens for all
ill-posed linear inverse problems, we need some extra work to show the norm
convergence of $f^n$ to $f^\star$.
1627: \begin{lemma}
1628: For $h$ given by {\rm (\ref{redef2})},
1629: $\| {\S}_{\mathbf{w},p}(h+u^n) - {\S}_{\mathbf{w},p}(h) - u^n \| \to 0$ for
1630: $n \to
1631: \infty$.
1632: \end{lemma}
1633: {\em Proof:}
1634: We have
1635: \begin{eqnarray*}
1636: \| {\S}_{\mathbf{w},p}(h+u^n) - {\S}_{\mathbf{w},p}(h) - u^n \|
1637: &\leq& \| {\S}_{\mathbf{w},p}(h+u^n-K^*Ku^n) - {\S}_{\mathbf{w},p}(h) - u^n
1638: \|\\ &&~~~~+
1639: \|{\S}_{\mathbf{w},p}(h+u^n) - {\S}_{\mathbf{w},p}(h+u^n-K^*Ku^n) \| \\
1640: &\leq& \| {\S}_{\mathbf{w},p}(h+u^n-K^*Ku^n) - {\S}_{\mathbf{w},p}(h) - u^n
1641: \|\\ &&
1642: ~~~~+\|\K Ku^n \|~,
1643: \end{eqnarray*}
1644: where we used the non-expansivity of ${\S}_{\mathbf{w},p}$ (Lemma
1645: \ref{SS-non-exp}).
1646: The result follows since both terms in this last bound tend to zero for $n \to
1647: \infty$ because of Lemma
1648: \ref{Ku} and (\ref{cvaux}).
1649: \hfill\QED
1650:
1651: \bigskip
1652:
1653: \begin{lemma}
1654: \label{lm-3-16}
1655: If for some $a \in \cH$, and some sequence $(v^n)_{n \in \N}$,
1656: w--$\lim_{n \to \infty}v^n=0$ and
1657: $\lim_{n \to \infty}\| { \S}_{\mathbf{w},p}(a+v^n) -
1658: {\S}_{\mathbf{w},p}(a) - v^n \|=0$
1659: then $\| v^n \| \to 0$ for $n \to \infty$.
1660: \end{lemma}
1661: {\em Proof:}
1662: The argument of the proof is slightly different for the cases $p=1$
1663: and $p>1$, and we treat the two cases separately. \\
1664: We start with $p>1$.
1665: Since the sequence $\{v^n\}$ is weakly convergent, it has to be bounded: there
1666: is a constant $B$ such that $\forall n$, $\| v^n \| \leq B$, and
1667: hence also $\forall n, \forall \g \in \Gamma$, $\vert v^n_\g \vert \leq B$.
1668: Next, we define the set $\Gamma_{_{\!0}}
1669: = \{ \g \in \Gamma; |a_{\g}| \geq B\}$; since $a \in \cH$, this is a finite
1670: set. We then have $\forall \g \in \Gamma_{_{\!1}}=\Gamma \setminus
1671: \Gamma_{_{\!0}}$,
1672: that $|a_{\g}|$ and $|a_{\g}+v^n_{\g}|$ are bounded above by $2B$.
1673: Recalling the definition of $S_{\ag,p}=\left(F_{\ag,p}\right)^{-1}$,
1674: and observing that, because $p \leq 2$,
1675: $F'_{\ag,p}(x)=1+\ag p(p-1)|x|^{p-2}/2 \geq
1676: 1+ \ag \, p(p-1) /[2(2B)^{2-p}]$ if $|x| \leq 2B$ , we have
1677: \begin{eqnarray*}
1678: |S_{\ag,p}(a_{\g}+v^n_{\g}) - S_{\ag,p}(a_{\g})|
1679: &\leq &\left( {\mathop {\max}_{|x|\leq 2B}} |S'_{\ag,p}(x)| \right) |v^n_{\g}|
1680: \\ & \leq & \left( 1+ \ag\, p(p-1) /[2 (2B)^{2-p}] \right)^{-1} |v^n_{\g}| \\
1681: & \leq & \left( 1+ c \, p(p-1) / [2(2B)^{2-p}] \right)^{-1} |v^n_{\g}|~;
1682: \end{eqnarray*}
1683: in the second inequality,
1684: we have used that $|S_{\ag,p}(x)| \leq |x|$, a consequence of the
1685: non-expansivity
1686: of $S_{\ag,p}$ (see
1687: Lemma
1688: \ref{SS-non-exp}) to upper bound the derivative
1689: $S'_{\ag,p}$ on the interval $[-2B,2B]$ by the inverse of the lower bound for
1690: $F'_{\ag ,p}$ on the same interval;
1691: in the last inequality we used the uniform lower bound on the $\ag$, i.e.
1692: $\forall \g,
1693: ~ \ag \geq c >0$.
1694: Rewriting $\left( 1+ c \, p(p-1) / [2(2B)^{2-p}] \right)^{-1}= C'<1$, we
1695: have thus,
1696: $\forall \g \in \Gamma_{_{\!1}}$, $C'|v^n_{\g}| \geq
1697: |S_{\ag,p}(a_{\g}+v^n_{\g}) - S_{\ag,p}(a_{\g})|$, which implies
1698: $$
1699: \sum_{\g \in \Gamma_{_{\!1}}} |v^n_{\g}|^2 \leq
1700: \frac{1}{(1-C')^2} \sum_{\g \in \Gamma_{_{\!1}}} |v^n_{\g}
1701: -S_{\ag,p}(a_{\g}+v^n_{\g})
1702: + S_{\ag,p}(a_{\g})|^2 \to 0 \mbox{ as } n \to \infty ~.
1703: $$
1704: On the other hand, since $\Gamma_{_{\!0}}$ is a finite set, and the $v^n$
1705: tend to
1706: zero weakly as $n$ tends to $\infty$, we also have
1707: $$
\sum_{\g \in \Gamma_{_{\!0}}} |v^n_{\g}|^2 \to 0 \mbox{ as } n \to
\infty ~.
$$
This proves the lemma for the case $p>1$. \\
1712: For $p=1$,
1713: we define a finite set $\Gamma_{_{\!0}} \subset \Gamma$ so
1714: that $\sum_{ \g \in \Gamma \setminus \Gamma_{_{\!0}}}
1715: |a_{\g}|^2 \leq (c/4 )^2$,
1716: where $c$ is again the uniform lower bound on the $\ag$.
1717: Because this is a finite set, the weak convergence of the $v^n$
1718: implies that $\sum_{\g \in \Gamma_{_{\!0}}} |v^n_{\g}|^2
1719: \xrightarrow[n \to \infty]{~} 0$,
1720: so that we can concentrate on
1721: $\sum_{\g \in \Gamma \setminus \Gamma_{_{\!0}}} |v^n_{\g}|^2$ only. \\
1722: For each $n$, we split $\Gamma_{_{\!1}}=\Gamma \setminus \Gamma_{_{\!0}}$ into
1723: two subsets:
1724: $\Gamma_{_{\!1,n}} = \{\g \in \Gamma_{_{\!1}};
1725: |v^n_{\g}+a_{\g}| <\ag/2\}$ and $\widetilde{\Gamma}_{_{\!1,n}}=
1726: \Gamma_{_{\!1}} \setminus \Gamma_{_{\!1,n}}$. If $\g \in \Gamma_{_{\!1,n}}$,
1727: then $S_{\ag,1}(a_{\g}+v^n_{\g})=
1728: S_{\ag,1}(a_{\g}) =0$ (since $|a_{\g}|\leq c/4 \leq \ag/2$),
1729: so that $|v^n_{\g} -
1730: S_{\ag,1}(a_{\g}+v^n_{\g}) + S_{\ag,1}(a_{\g})|=|v^n_{\g}|$.
1731: It follows that
1732: $$
1733: \sum_{\g \in \Gamma_{_{\!1,n}}} |v^n_{\g}|^2
1734: \leq \sum_{\g \in \Gamma} |v^n_{\g} -
1735: S_{\ag,1}(a_{\g}+v^n_{\g}) + S_{\ag,1}(a_{\g})|^2 \to 0 \mbox{ as }
1736: n \to \infty ~.
1737: $$
1738: It remains to prove only that
1739: the remaining sum, $\sum_{\g \in \widetilde{\Gamma}_{_{\!1,n}}}
1740: |v^n_{\g}|^2 $ also tends
1741: to $0$ as $n \to \infty$. \\
1742: If $\g \in \Gamma_{_{\!1}}$ and $|v^n_{\g}+a_{\g}| \geq \ag/2$, then
1743: $|v^n_{\g}|\geq |v^n_{\g}+a_{\g}| - |a_{\g}| \geq \ag/2 -c/4
1744: \geq c/4 \geq |a_{\g}|$, so that $ v^n_{\g}+a_{\g}$ and
1745: $v^n_{\g}$ have the same sign; it then follows that
1746: \begin{eqnarray*}
1747: && |v^n_{\g} -
1748: S_{\ag,1}(a_{\g}+v^n_{\g}) + S_{\ag,1}(a_{\g})|=
1749: |v^n_{\g} -
1750: S_{\ag,1}(a_{\g}+v^n_{\g})| \\
1751: &&~~~~~~~~
1752: =|v^n_{\g}- (a_{\g}+v^n_{\g})+ \frac{\ag}{2} \mbox{sign}(v^n_{\g})|
1753: \geq \frac{\ag}{2} -|a_{\g}| \geq \frac{c}{4} ~.
1754: \end{eqnarray*}
1755: This implies that
1756: $$
1757: \sum_{\g \in \widetilde{\Gamma}_{_{\!1,n}}}|v^n_{\g} -
1758: S_{\ag,1}(a_{\g}+v^n_{\g}) + S_{\ag,1}(a_{\g})|^2
1759: \geq \left(\frac{c}{4}\right)^2 \# \widetilde{\Gamma}_{_{\!1,n}} ~;
1760: $$
1761: since $\|v^n-\S_{\mathbf{w},1}(a+v^n)+\S_{\mathbf{w},1}(a)\|
1762: \xrightarrow[n \to \infty]{~} 0$, we know on the other hand
1763: that
1764: $$
1765: \sum_{\g \in \widetilde{\Gamma}_{_{\!1,n}}}|v^n_{\g} -
1766: S_{\ag,1}(a_{\g}+v^n_{\g}) + S_{\ag,1}(a_{\g})|^2 <
1767: \left(\frac{c}{4}\right)^2
1768: $$
1769: when $n$ exceeds some threshold, $N$, which implies that
1770: $\widetilde{\Gamma}_{_{\!1,n}}$ is empty when $n > N$. Consequently
1771: $\sum_{\g \in \widetilde{\Gamma}_{_{\!1,n}}}|v^n_{\g}|^2 =0$
1772: for $n > N$. This completes the proof for the case $p=1$.\hfill \QED
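
In the case $p>1$ of the proof above, the passage from the contraction
estimate on $\Gamma_{_{\!1}}$ to the bound on
$\sum_{\g \in \Gamma_{_{\!1}}} |v^n_{\g}|^2$ is a reverse triangle
inequality. Spelled out as a sketch, for every $\g \in \Gamma_{_{\!1}}$,
\begin{equation*}
|v^n_{\g} -S_{\ag,p}(a_{\g}+v^n_{\g}) + S_{\ag,p}(a_{\g})|
\geq |v^n_{\g}| - |S_{\ag,p}(a_{\g}+v^n_{\g}) - S_{\ag,p}(a_{\g})|
\geq (1-C')\, |v^n_{\g}| \ ,
\end{equation*}
and squaring and summing over $\Gamma_{_{\!1}}$ gives the stated factor
$(1-C')^{-2}$.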
1773:
1774: \bigskip
1775:
1776: Combining the Lemmas in this subsection with the results of the previous
1777: subsection gives a complete proof of Theorem \ref{th-3-1} as stated at the
1778: start of this section.
1779:
1780:
1781:
1782:
1783: \section{Regularization properties and stability estimates}
1784:
1785: In the preceding section, we devised
1786: an iterative algorithm that
1787: converges towards a minimizer of the functional
1788: \begin{equation}
1789: \Phi_{\mathbf{w},p}(f) = \| Kf-g\|^2 + \Vvert f\Vvert^p_{\mathbf{w},p}~.
1790: \label{phimu2}
1791: \end{equation}
1792: For simplicity, let us assume, until further notice,
1793: that either $p>1$ or $\mN(K)=\{0\}$, so that there is a unique minimizer.
1794:
1795: In this section, we shall discuss to what extent this minimizer is
1796: acceptable as a {\em regularized solution} of the (possibly ill-posed)
1797: inverse problem $Kf=g$. Of particular interest to us is the {\em stability}
1798: of the estimate. For instance, if $\mN(K)=\{0\}$, we would like to know
1799: to what extent the proposed solution, in this
1800: case the minimizer of $\Phi_{\mathbf{w},p}$, deviates from the ideal
1801: solution $f_o$ if the data are a (small) perturbation of the image
1802: $Kf_o$ of $f_o$. (If $\mN(K) \neq \{0\}$, then there exist other $f$ that
1803: have the same image as $f_o$, and the algorithm might choose one of those
1804: -- see below.) In this discussion both the ``size'' of the perturbation
1805: and the weight of the penalty term in the variational functional, given
1806: by the coefficients $(\ag)_{\g \in \Gamma}$, play a role. We argued earlier
1807: that we need
1808: $\mathbf{w} \neq \mathbf{0}$ in order to provide a meaningful estimate
1809: if e.g. $K$ is a compact operator; on the other hand,
1810: if $g = Kf_o$, then the presence of the penalty term will cause the
1811: minimizer of $\Phi_{\mathbf{w},p}$ to be different from $f_o$. We therefore
1812: need to strike a balance between the respective weights of the
1813: perturbation $g - Kf_o$ and
1814: the penalty term. Let us first define a framework in which we can make this
1815: statement more precise.
1816:
1817: Because we shall deal in this section with data functions $g$ that are not
1818: fixed, we
1819: adjust our notation for the variational functional to make the dependence
1820: on $g$
1821: explicit
where appropriate: with this more elaborate notation, the left-hand side
of, for instance, \eref{phimu2} is now $\Phi_{\mathbf{w},p; g}(f )$.
1824: (Because we work with one fixed
1825: operator $K$, the dependence of the functional on $K$ remains ``silent''.)
1826: In order to make it possible to vary the weight of the penalty term in the
1827: functional, we introduce an extra parameter $\mu$. We shall thus consider
1828: the functional
1829: \begin{equation}
1830: \label{phimu}
1831: \Phi_{\mu,\mathbf{w},p; g}(f)=
1832: \| Kf-g\|^2 +\mu \Vvert f\Vvert^p_{\mathbf{w},p}~.
1833: \end{equation}
1834: Its minimizer will likewise depend on all these parameters. In its full
1835: glory, we denote it by $f^{\star}_{\mu ,\mathbf{w},p;g}$; when confusion
1836: is impossible we abbreviate this notation. In particular, since $\mathbf{w}$
1837: and $p$ typically will not vary in the limit procedure that defines stability,
1838: we may omit them in the heat of the discussion. Notice that the dependence on
1839: $\mathbf{w}$ and $\mu$ arises only through the product $\mu \mathbf{w}$.
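This can be read off directly from the definition of the weighted norm.
As a one-line check, writing $f_\g = \left< f, \vpg \right>$ and using
$\mu\mathbf{w}$ as our shorthand for the sequence
$(\mu \ag)_{\g \in \Gamma}$,
\begin{equation*}
\mu \Vvert f\Vvert^p_{\mathbf{w},p}
= \sum_{\g \in \Gamma} (\mu \ag)\, |f_\g|^p
= \Vvert f\Vvert^p_{\mu\mathbf{w},p} \ ,
\quad \mbox{so that} \quad
\Phi_{\mu,\mathbf{w},p; g} = \Phi_{\mu\mathbf{w},p; g} \ .
\end{equation*}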
1840:
1841: As mentioned above, if the ``error'' $e =g - Kf_o$ tends to zero, we would like
1842: to see our estimate for the solution of the inverse problem tend to $f_o$;
1843: since the minimizer of $\Phi_{\mu ,\mathbf{w},p; g}(f)$ differs from $f_o$ if
1844: $\mu \ne 0$, this means
1845: that we shall have to consider simultaneously a limit for $\mu \rightarrow 0$.
1846: More precisely, we want to find a functional dependence of $\mu$
1847: on the noise level $\epsilon$, $\mu=\mu(\epsilon)$
1848: such that
1849: \begin{equation}
1850: \label{desired-res}
1851: \mu(\epsilon) \xrightarrow[\epsilon \rightarrow 0]{~} 0 ~~~
1852: \mbox {and } ~~~ \sup_{\|g-Kf_o\|\leq \epsilon}
1853: \|f^{\star}_{\mu(\epsilon) ,\mathbf{w},p;g}-f_o \|
\xrightarrow[\epsilon \rightarrow 0]{~} 0
1855: \end{equation}
1856: for each $f_o$ in a certain class of functions.
1857: If we can achieve this, then the ill-posed inverse problem will be {\em
1858: regularized}
1859: (in norm or ``strongly'') by our iterative method,
1860: and $f^\star_{\mu,\mathbf{w},p;g}$ will be
1861: called a {\em regularized solution}.
1862: One also says in this case
1863: that the minimization of the penalized least-squares functional
(\ref{phimu}) provides us with a {\em regularizing algorithm} or
1865: {\em regularization method}.
1866:
1867: \subsection {A general regularization theorem}
1868: If the $\ag$ tend to $\infty$, or more precisely, if
1869: \begin{equation}
1870: \label{compact-emb}
1871: \forall C >0 ~: ~ \# \{ \g \in \Gamma ; \ag \leq C \} < \infty ~,
1872: \end{equation}
1873: then the embedding of $\mathcal{B}_{\mathbf{w},p}=
1874: \{ f \in \cH;\sum_{\g \in \Gamma} \ag |f_\g|^p < \infty \}$ in $\cH$ is
1875: compact. (This is because the identity operator from
1876: $\mathcal{B}_{\mathbf{w},p}$ to
1877: $\cH$ is then the norm--limit in $\mathcal{L}(\mathcal{B}_{\mathbf{w},p},
1878: \cH)$,
1879: as $C \to \infty$,
1880: of the finite rank operators $P_C$ defined by
$P_C f=\sum_{\g \in \Gamma_C} \left<f,\varphi_\g \right> \varphi_\g$,
1882: where $\Gamma_C= \{ \g \in \Gamma ; \ag \leq C \}$.) In this case,
1883: general compactness arguments can be used to show that \eref{desired-res}
1884: can be achieved. (See also further below.)
1885: We are, however, also interested in the general case, where
1886: the $\ag$ need not grow unboundedly.
1887: The following theorem proves that we can then nevertheless
1888: choose the dependence $\mu(\epsilon)$
1889: so that \eref{desired-res} holds:
1890: \begin{theorem}
1891: \label{regthm}
1892: Assume that $K$ is a bounded operator from $\cH$ to $\cH'$ with $\|K\|<1$, that
1893: $1 \leq p \leq 2$ and that the entries in the sequence
1894: $\mathbf{w}=(\ag)_{\g \in \Gamma}$
1895: are bounded below uniformly by a strictly positive number $c$.
1896: Assume that either $p>1$ or $\mbox{{\rm N}}(K)=\{0\}$.
1897: For any $g \in \cH'$
1898: and any $\mu >0$, define
1899: $f^{\star}_{\mu, \mathbf{w},p;g}$ to be the minimizer of $\Phi_{\mu,
1900: \mathbf{w},p ; g}(f)$.
1901: If $\mu=\mu(\epsilon)$ satisfies the requirements
1902: \begin{equation}
1903: \label{mu-req}
1904: \lim_{\epsilon \rightarrow 0} \mu(\epsilon)=0 ~~~~~ \mbox{{\rm and}}
1905: ~~~~~ \lim_{\epsilon \rightarrow 0} \epsilon^2/\mu(\epsilon) =0 ~,
1906: \end{equation}
1907: then we have, for any $f_o \in \cH$,
1908: $$
1909: \lim_{\epsilon \rightarrow 0} \left[ \sup_{\|g-Kf_o\|\leq \epsilon}
1910: \|f^{\star}_{\mu(\epsilon), \mathbf{w},p;g}-\fs \|\right] =0 ~,
1911: $$
1912: where $\fs$ is the unique element of minimum
1913: $\Vvert ~ \Vvert _{\mathbf{w},p}$--norm in $\mathcal{S}
1914: = \mbox{{\rm N}}(K)+f_o =
1915: \{f; Kf = Kf_o\}$.
1916: \end{theorem}
1917:
1918:
1919: Note that under the conditions of Theorem \ref{regthm}, $\fs$ must indeed
1920: be unique:
1921: if $p>1$, then the $\Vvert ~ \Vvert _{\mathbf{w},p}$--norm is strictly
1922: convex, so
that there is a unique minimizer for this norm in the affine subspace $\mN(K)+f_o$;
1924: if $p=1$, our assumptions require $\mN(K)=\{0\}$. Note also that
1925: if $\mN(K)=\{0\}$ (whether or not $p=1$), then necessarily $\fs = f_o$.
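
As an illustration of the requirements \eref{mu-req} (the specific family
below is our example, not a recommendation), any power law
$\mu(\epsilon)=\epsilon^{\alpha}$ with $0<\alpha<2$ qualifies, since
\begin{equation*}
\lim_{\epsilon \rightarrow 0} \epsilon^{\alpha} = 0
\quad \mbox{and} \quad
\lim_{\epsilon \rightarrow 0} \frac{\epsilon^2}{\epsilon^{\alpha}}
= \lim_{\epsilon \rightarrow 0} \epsilon^{2-\alpha} = 0 \ .
\end{equation*}
The borderline choice $\mu(\epsilon)=\epsilon^2$ does not qualify, because
then $\epsilon^2/\mu(\epsilon)=1$ for all $\epsilon$; the theorem requires
$\mu$ to tend to zero strictly more slowly than $\epsilon^2$.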
1926:
1927:
1928: To prove Theorem \ref{regthm}, we will need the following two lemmas:
1929:
1930: \begin{lemma}
1931: \label{lm-4-1}
1932: The functions $S_{w,p}$ from $\R$ to itself, defined by {\rm (\ref{S-pneq1},
1933: \ref{S-peq1})} for $p>1$, $p=1$, respectively, satisfy
1934: $$
1935: |S_{w,p}(x)-x| \leq {\frac{wp}{2}}\ |x|^{p-1}~.
1936: $$
1937: \end{lemma}
1938: {\em Proof:}
1939: For $p=1$, the definition \eref{S-peq1} implies immediately that
$|x-S_{w,1}(x)|= \min(w/2,|x|) \leq w/2$, so that the claimed inequality holds
for $x \neq 0$. For $x=0$, $S_{w,1}(x)=0$, so that the left hand side vanishes.
1942:
1943: For $p>1$, $S_{w,p}= \left( F_{w,p} \right)^{-1}$, where
1944: $F_{w,p}(y)=y+{\frac{wp}{2}}|y|^{p-1} \mbox{sign}(y)$ satisfies $|F_{w,p}(y)|
1945: \geq |y|$, and $|F_{w,p}(y)-y| \leq {\frac{wp}{2}} |y|^{p-1}$.
1946: It follows that $\, |S_{w,p}(x)| \leq |x|\,$, and
1947: $\, |x-S_{w,p}(x)| $ $\leq {\frac{wp}{2}} \; |S_{w,p}(x)|^{p-1} $
1948: $ \leq {\frac{wp}{2}}\; |x|^{p-1} ~$.
1949: $~~~~~~~~~~~~~~~~~~$ \hfill \QED
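
As a sanity check of Lemma \ref{lm-4-1} in a case with a closed-form
expression, consider $p=2$ (chosen here only for illustration). Then
$F_{w,2}(y)=(1+w)y$, so that $S_{w,2}(x)=x/(1+w)$ and
\begin{equation*}
|S_{w,2}(x)-x| = \frac{w}{1+w}\, |x| \leq w\, |x|
= \frac{wp}{2}\, |x|^{p-1} \quad \mbox{for } p=2 \ ;
\end{equation*}
the two sides agree to first order in $w$ as $w \to 0$.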
1950:
1951: \bigskip
1952:
1953: \begin{lemma}
1954: \label{lm-conv}
1955: If the sequence of vectors $\left(v_k\right)_{ _{k \in \N}}$ converges weakly
1956: in $\cH$ to $v$, and $\lim_{k \to \infty} \Vvert v_k \Vvert_{\mathbf{w},p}$
1957: $ = \Vvert v \Vvert_{\mathbf{w},p}$,
1958: then $\left(v_k\right)_{ _{k \in \N}}$ converges
1959: to $v$ in the $\cH$--norm, i.e. $\lim_{k \to \infty} \|v-v_k\|=0~$.
1960: \end{lemma}
1961: {\em Proof:}
1962: It is a standard result that if {\em w}$\,$--$\lim_{k \to \infty} v_k = v$,
1963: and
1964: $\lim_{k \to \infty} \|v_k\|=\|v\|$, then $\lim_{k \to \infty} \|v- v_k\|^2
1965: = \lim_{k \to \infty} \left( \|v\|^2 + \|v_k\|^2 - 2 \left< v, v_k \right>
1966: \right)
1967: = \|v\|^2 + \|v\|^2 - 2 \left< v, v \right>
1968: = 0$.
1969: We thus need to prove only that $\lim_{k \to \infty} \|v_k\|=\|v\|$.
1970:
1971: Since the $v_k$ converge weakly, they are uniformly bounded. It follows that
1972: the $|v_{k,\g}| = |\left<v_k, \vpg \right>|$ are bounded uniformly in $k$
1973: and $\g$
1974: by some finite number $C$. Define $r=2/p$. Since, for $x, y > 0$,
1975: $|x^r - y^r| \leq r |x-y| \max(x,y)^{r-1}$, it follows that
1976: $ \left|~|v_{k,\g}|^2 -|v_{\g}|^2 \right|
1977: \leq r \, C^{p(r-1)}~ \left| ~ |v_{k,\g}|^p -|v_{\g}|^p \right|~.$
1978: Because the $\ag$ are uniformly bounded below by $c>0$, we obtain
1979: $$
1980: \left| \|v_k\|^2 -\|v\|^2 \right| \leq
1981: \sum_{\g \in \Gamma} \left| |v_{k,\g}|^2 - |v_{\g}|^2 \right|
1982: \leq \frac{2}{c \, p}\, C^{2-p} \sum_{\g \in \Gamma}
1983: \ag \left| ~ |v_{k,\g}|^p -|v_{\g}|^p \right| ~,
1984: $$
1985: so that it suffices to prove that this last expression tends to $0$
1986: as $k$ tends to $\infty$.
1987: Define now $u_{k,\g}=\min \left( |v_{k,\g}|,|v_{\g}| \right)$. Clearly
1988: $ \forall \g \in \Gamma ~:~ \lim_{k \to \infty} u_{k,\g}= |v_{\g}|~$; since
1989: $\sum_{\g \in \Gamma} \ag |v_{\g}|^p < \infty$, it follows by the dominated
1990: convergence theorem that $\lim_{k \to \infty}
1991: \sum_{\g \in \Gamma} \ag u_{k,\g}^p =
1992: \sum_{\g \in \Gamma} \ag |v_{\g}|^p $. Since
1993: $$
1994: \sum_{\g \in \Gamma}
1995: \ag \left| ~ |v_{k,\g}|^p -|v_{\g}|^p \right| =
1996: \sum_{\g \in \Gamma} \ag \left( |v_{\g}|^p + |v_{k,\g}|^p
1997: - 2 u_{k,\g}^p \right) \xrightarrow[k \to \infty]{~} 0 ~,
1998: $$
1999: the Lemma follows. \hfill \QED
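
The elementary inequality used in the proof follows from the mean value
theorem; we include the short argument as a sketch. For $r \geq 1$ (here
$r=2/p \geq 1$ because $p \leq 2$) and $x, y >0$, there is a $\xi$ between
$x$ and $y$ such that
\begin{equation*}
|x^r - y^r| = r\, \xi^{r-1}\, |x-y| \leq r\, \max(x,y)^{r-1}\, |x-y| \ ,
\end{equation*}
since $t \mapsto t^{r-1}$ is non-decreasing on $(0,\infty)$. Applied with
$x=|v_{k,\g}|^p \leq C^p$ and $y=|v_{\g}|^p \leq C^p$, this gives the
factor $C^{p(r-1)}=C^{2-p}$.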
2000:
2001: We are now ready to proceed to the
2002:
{\em Proof of }Theorem \ref{regthm}:
2004: \newline
2005: Let's assume that $\mu(\epsilon)$ satisfies the requirements \eref{mu-req}.
2006: \newline
2007: We first establish weak convergence. For this it is sufficient to prove
2008: that if $(g_n)_{n \in \N}$ is a sequence in $\cH'$ such that
2009: $\|g_n-Kf_o\| \leq \epsilon_n$, where
2010: $(\epsilon_n)_{n \in \N}$ is a sequence of strictly positive numbers
2011: that converges to zero
2012: as $n \to \infty$, then {\em w}$\,$--$\lim_{n \to \infty}
2013: f^{\star}_{\mu(\epsilon_n);g_n}= f^\dagger$, where $f^{\star}_{\mu;g}$
is the unique minimizer of $\Phi_{\mu ,\mathbf{w},p;g}(f)$.
(As predicted, we have dropped here the
2016: explicit indication of the dependence of $f^{\star}$ on $\mathbf{w}$ and
2017: $p$; these
2018: parameters will keep fixed values throughout this proof. We will take the
2019: liberty
2020: to drop them in our notation for $\Phi$ as well, when this is convenient.)
2021: For the sake of convenience,
2022: we abbreviate $\mu(\epsilon_n)$ as $\mu_n$. \\
2023: Then the $f^{\star}_{\mu_n;g_n}$ are uniformly bounded in $\cH$
2024: by the following argument:
2025: \begin{eqnarray}
2026: \| f^\star_{\mu_n;g_n} \|^p &\leq & \frac{1}{c}
2027: \Vvert f^\star_{\mu_n;g_n} \Vvert_{\mathbf{w},p}^p
2028: \leq \frac{1}{\mu_n \, c}\;\Phi_{\mu_n; g_n}(f^\star_{\mu_n;g_n})
2029: \leq \frac{1}{\mu_n \, c}\; \Phi_{\mu_n;g_n}(\fs )\nonumber \\
2030: &=&\frac{1}{\mu_n \, c}\left[\|Kf_o-g_n \|^2
2031: + \mu_n \Vvert \fs \Vvert^p_{\mathbf{w},p} \right]
2032: \leq \frac{1}{c} \left( \frac{\epsilon_n^2}{\mu_n}+\Vvert f^\dagger
2033: \Vvert_{\mathbf{w},p}^p \right) ~,
2034: \label{fmuubd}
2035: \end{eqnarray}
2036: where we have used, respectively,
2037: the bound (\ref{bdL2Ban}), the fact that $f^\star_{\mu_n;g_n}$
2038: minimizes $\Phi_{\mu_n;g_n}(f)$, $K\fs = Kf_o$
2039: and the bound $\|Kf_o-g_n\|^2 \leq \epsilon_n^2$.
2040: By the assumption \eref{mu-req}, $\epsilon_n^2 /\mu_n$ tends to zero for
2041: $n\to\infty$ and hence can be bounded by a constant independent of $n$.
2042: \newline
2043: It
2044: follows that the sequence $(f^\star_{\mu_n;g_n})_{_{n\in \N}}$ has at least
2045: one weak
accumulation point, i.e. there exists a subsequence
$(f^\star_{\mu_{n_l};g_{n_l}})_{_{l \in {\N}}}$ that has a weak limit.
Because this sequence is bounded in the $\Vvert \; \Vvert_{\mathbf{w},p}$-norm,
by passing to a subsequence
$\left( f^\star_{\mu_{n_{l(k)}};g_{n_{l(k)}}}\right)_{k \in \N}$, we can
ensure that the $\Vvert f^\star_{\mu_{n_{l(k)}};g_{n_{l(k)}}}
\Vvert_{\mathbf{w},p}$
constitute a converging sequence.
To simplify notation, we define $\widetilde\mu_k = \mu_{n_{l(k)}}$ and
${\widetilde f}_k =
f^\star_{\mu_{n_{l(k)}};g_{n_{l(k)}}}$; the $\widetilde{f}_k$ have the
same weak limit $\widetilde{f}$ as the $f^\star_{\mu_{n_l};g_{n_l}}$.
2059: We also define
2060: $\widetilde{g}_k = g_{n_{l(k)}}$,
2061: ${\widetilde e}_k = \widetilde{g}_k-Kf_o$ and $\widetilde\epsilon_k =
2062: \epsilon_{n_{l(k)}}$.
2063: We shall show that $\widetilde{f}=\fs$.
2064: \newline
Since each $\widetilde{f}_k $ is the unique minimizer of
$\Phi_{\widetilde\mu_k; \widetilde{g}_k}(f) $, it
is a fixed point of the corresponding operator $\T$: the iteration converges
weakly to a fixed point of $\T$, which is a minimizer by
Proposition \ref{fix-min} and hence coincides with $\widetilde{f}_k$.
Therefore, for any $\g \in
\Gamma$,
2069: ${\wf}_\g =
2070: \left<{\wf},\vpg \right>$ satisfies
2071: \begin{eqnarray*}
2072: {\wf}_\g & = &\lim_{k\to\infty}({\wf}_k)_\g =
2073: \lim_{k\to\infty} S_{\widetilde\mu_k \ag,p}[({\widetilde h}_k)_\g] \\
2074: \hbox{\ \ with\ \ } ~~~ {\widetilde h}_k & = &
2075: {\wf}_k + K^*(\widetilde g_k-K{\wf}_k)=
2076: \wf_k + \K K (f_o - \wf_k)+ K^*{\widetilde e}_k ~.
2077: \end{eqnarray*}
2078: We now rewrite this as
2079: \begin{equation}
2080: \label{2terms}
2081: {\wf}_\g=\lim_{k\to\infty}
2082: \left(S_{\widetilde\mu_k \ag,p}[({\widetilde h}_k)_\g] -
2083: ({\widetilde h}_k)_\g \right) +
2084: \lim_{k\to\infty} ({\widetilde h}_k)_\g ~ .
2085: \end{equation}
2086: By Lemma \ref{lm-4-1} the first limit in the right hand side is zero, since
2087: $$
2088: \left|S_{\widetilde\mu_k \ag,p}[({\widetilde h}_k)_\g] -
2089: ({\widetilde h}_k)_\g \right|
2090: \leq p\ \ag\, \widetilde{\mu}_k
2091: ~| (\widetilde h_k)_\g |^{p-1} /2\leq p\ C\ \widetilde{\mu}_k
2092: [ 3C + \widetilde\epsilon_k ] /2 \xrightarrow[k \to \infty]{~} 0~,
2093: $$
2094: where we have used $\|K\|<1$ ($C$ is some constant depending on $\ag$). Because
2095: $\lim_{k
2096: \to
2097: \infty}\Vert\widetilde{e}_k\Vert=0$, and {\em w}$\,$--$\lim_{k \to \infty}
2098: \wf_k =
2099: \wf$, it then follows from \eref{2terms} that
2100: \begin{equation*}
2101: {\wf}_\g = \lim_{k\to\infty} ({\widetilde h}_k)_\g = {\wf}_\g +
2102: [\K K(\fs -{\wf})]_\g \ .
2103: \end{equation*}
2104: Since this holds for all $\g$, it follows that
2105: $\K K(\fs-{\wf})=0$. If $\mN(K)=\{0\}$, then this allows us
2106: immediately to conclude that $\wf=\fs$. When $\mN(K) \neq \{0\}$,
2107: we can only conclude that $\fs -\wf \in \mN(K)$. Because $\fs$
2108: has the smallest $\Vvert ~ \Vvert _{\mathbf{w},p}$--norm among all
2109: $f \in \mathcal{S}=\{f; Kf=Kf_o\}$, it follows that
2110: $\Vvert \wf \Vvert _{\mathbf{w},p} \geq \Vvert \fs \Vvert _{\mathbf{w},p}$.
2111: On the other hand, because
2112: the ${\wf}_k$ weakly converge to ${\wf}$, and therefore, for all $\g$,
2113: $({\wf}_k)_\g \to {\wf}_\g $ as $k\to\infty$, we
2114: can use Fatou's lemma to obtain
2115: \begin{equation}
2116: \Vvert {\wf} \Vvert^p_{\mathbf{w},p} =
2117: {\sum_\g} \ag|{\wf}_\g |^p \leq
2118: \limsup_{k \to\infty} {\sum_\g} \ag |({\wf}_k)_\g|^p
2119: = \limsup_{k \to\infty}\Vvert {\wf}_k\Vvert_{\mathbf{w},p}^p
2120: = \lim_{k \to\infty}\Vvert {\wf}_k\Vvert_{\mathbf{w},p}^p \ .
2121: \label{now123}
2122: \end{equation}
2123: It then follows from (\ref{fmuubd}) that
2124: \begin{equation}
2125: \lim_{k \to\infty}\Vvert {\wf}_k\Vvert_{\mathbf{w},p}^p
2126: \leq \lim_{k \to\infty}\left[\frac{\widetilde\epsilon_k^2}{ \widetilde\mu_k}
2127: +\Vvert \fs \Vvert_{\mathbf{w},p}^p\right] = \Vvert \fs
2128: \Vvert_{\mathbf{w},p}^p
2129: \leq \Vvert {\wf}\Vvert_{\mathbf{w},p}^p \ .
2130: \label{now124}
2131: \end{equation}
2132: Together, the inequalities (\ref{now123}) and (\ref{now124}) imply that
2133: \begin{equation}
2134: \lim_{k \to\infty}\Vvert {\wf}_k\Vvert_{\mathbf{w},p}
2135: = \Vvert \fs \Vvert_{\mathbf{w},p}= \Vvert {\wf}\Vvert_{\mathbf{w},p} \ .
2136: \label{now125}
2137: \end{equation}
2138: Since $\fs$ is the unique element in $\mathcal{S}$ of minimal
2139: $\Vvert ~ \Vvert_{\mathbf{w},p}$--norm, it follows
2140: that
2141: ${\wf}=\fs$.
2142: The same argument holds for any other weakly converging subsequence of
2143: $(f^\star_{\mu_n;g_n})_{n \in {\N}}$; it follows that the sequence
2144: itself converges weakly to $\fs$.
Similarly we conclude from \eref{now125} that $\lim_{n \to \infty}
\Vvert f^\star_{\mu_n;g_n} \Vvert_{\mathbf{w},p} =
\Vvert \fs \Vvert_{\mathbf{w},p}~$.
2148: It then follows from Lemma \ref{lm-conv} that the $f^\star_{\mu_n;g_n}$
2149: converge to $\fs$ in the $\cH$-norm. \hfill \QED
2150:
2151: \bigskip
2152:
\begin{remark} {\rm Even when $p=1$ and $\mN(K) \neq \{0\}$, it may still be the
2154: case that, for any $f_o \in \cH$, there is a unique element $\fs$ of minimal
2155: norm in $\mathcal{S}=\{f \in \cH; Kf=Kf_o \}$. (For instance, if
2156: $K$ is diagonal in the $\varphi_\g$--basis, with some zero eigenvalues,
2157: then the unique minimizer $\fs$ in $\mathcal{S}$ is given by setting to zero
2158: all the components of $f_o$ corresponding to $\g$ for which $K \vpg = 0$.)
2159: In this case the proof still applies, and we still have norm--convergence
2160: of the $f^\star_{\mu(\epsilon),\mathbf{w},p;g}$ to $\fs$ if $\mu(\epsilon)$
2161: satisfies \eref{mu-req} and $\|g -Kf_o\| \leq \epsilon \to 0$.}
2162: \end{remark}
2163:
2164: \subsection{Stability estimates}
2165:
2166: The regularization theorem of the previous subsection
2167: gives no information on the rate at
2168: which the regularized solution approaches the exact solution when the noise
2169: (as measured by $\epsilon$) decreases to
2170: zero. Such rates are not available in the general case, but can be derived
2171: under additional assumptions, discussed below.
2172: For the remainder of this section we shall assume
2173: that the operator $K$ is invertible on its range, i.e. that
2174: $\mN(K)=\{0\}$. Suppose that the
2175: unknown exact solution of the problem, $f_o$, satisfies the constraint
2176: $\Vvert f_o \Vvert_{\mathbf{w},p} \leq \rho$, where $\rho>0$ is given; in
2177: other words, we know a priori
2178: that the unknown solution lies in the ball around the origin with radius $\rho$
2179: in the Banach space $\cB_{\mathbf{w},p}$; we shall denote this ball
by $\mathrm{B}_{\mathbf{w},p}(0,\rho)$. If we also know that $g$ lies within
2181: a distance $\epsilon$ of $Kf_o$ in $\cH'$, then
2182: we can localize the exact
2183: solution within the set
2184: \begin{equation*}
2185: {\cal F}(\epsilon,\rho) =
2186: \{ f\in \cH ; \, \| Kf-g \| \leq \epsilon\, ,
2187: \, \Vvert f\Vvert_{\mathbf{w},p} \leq \rho \}\ .
2188: %\label{solset}
2189: \end{equation*}
2190: The diameter of this set is a measure of the uncertainty of the
solution for a given prior and a given noise level $\epsilon$. The
diameter of ${\cal F}$, namely diam(${\cal F}$)=$\sup\{ \| f-f'\|;\
f,f' \in {\cal F}\}$, is bounded by $2M(\epsilon,\rho)$, where
2194: $M(\epsilon,\rho)$, defined by
2195: \begin{equation}
2196: M(\epsilon,\rho)=\sup\{\| h\|;\, \| Kh \|
2197: \leq \epsilon \, ,\, \Vvert h\Vvert_{\mathbf{w},p} \leq \rho\} \ ,
2198: \label{MC}
2199: \end{equation}
2200: is called the {\it modulus of continuity} of $K^{-1}$ under the
2201: prior. (We have once more dropped the explicit reference in our
2202: notation to the dependence on $\mathbf{w}$ and $p$.)
2203: If \eref{compact-emb} is satisfied, then
the ball $\mathrm{B}_{\mathbf{w},p}(0,\rho)$ is compact in $\cH$, and it
2205: follows from a general topological lemma
2206: (see e.g. \cite{Eng96}) that $M(\epsilon,\rho) \to 0$
2207: when $\epsilon \to 0~$; the uncertainty on the solution
2208: thus vanishes in this limit. However,
2209: this topological argument, which holds for any regularization
2210: method enforcing the prior $\Vvert f_o \Vvert_{\mathbf{w},p} \leq \rho$,
2211: does not tell us anything about the rate of convergence
2212: of any specific method.
2213:
2214: In what follows, we shall systematically assume that \eref{compact-emb}
2215: is satisfied. We shall also make additional assumptions that will make it
2216: possible to derive more precise convergence results.
2217: Our specific regularization method consists in taking the
2218: minimizer $f^*_{\mu;g}$ of the functional
2219: $\Phi_{\mu; g}(f )$ given by
2220: (\ref{phimu}) as
2221: an estimate of the exact solution $f_o$, where we
2222: leave any links between $\mu$ and $\epsilon$ unspecified for
2223: the moment. (Because of the compactness argument above, we could
2224: conceivably dispense with \eref{mu-req}; see below.) An upper
2225: bound on the reconstruction error $\| f^*_{\mu;g} - f_o \|$ ,
2226: valid for all $g$ such that $\|g-Kf_o\| \le \epsilon$, as well as uniformly
2227: in $f_o$, is
2228: given by the following {\it modulus of convergence}:
2229: \begin{equation}
2230: M_\mu(\epsilon,\rho) = \sup\{ \| f^*_{\mu;g} -f\|;\ f \in \cH,\, g \in \cH'
2231: \, ,
2232: \| Kf - g \| \leq \epsilon\, ,\, \Vvert f \Vvert_{\mathbf{w},p} \leq \rho \}\ .
2233: \label{modcv}
2234: \end{equation}
2235: The decay of this modulus of convergence as $\epsilon \to 0$ is governed by the
2236: decay of the modulus of continuity \eref{MC}, as shown by the following
2237: proposition.
2238: \begin{proposition}
2239: \label{bestcv}
2240: The modulus of convergence {\rm \eref{modcv}} satisfies
2241: \begin{equation}
2242: M(\epsilon,\rho) \leq M_\mu(\epsilon,\rho) \leq M(\epsilon
+\epsilon',\rho + \rho')\ ,
2244: \label{stability}
2245: \end{equation}
2246: where
2247: \begin{equation}
2248: \epsilon' = \left(\epsilon^2 +\mu \rho^p\right)^{\frac{1}{2}}
2249: ~~~~\mbox{and}~~~~
\rho' = \left(\rho^p + \frac{\epsilon^2}{\mu}\right)^{\frac{1}{p}}\ ,
2251: \label{primes}
2252: \end{equation}
2253: and $M(\epsilon,\rho)$ is defined by {\rm \eref{MC}}\ .
2254: \end{proposition}
2255: {\em Proof:} We first note
2256: that $\Phi_{\mu ; g}(f^*_{\mu;g}) \leq\Phi_{\mu;g}(f_o) \leq \epsilon^2 + \mu
2257: \rho^p$ because $f^*_{\mu;g}$ is the minimizer of $\Phi_{\mu ; g}(f)$
2258: and $f_o \in {\cal F}(\epsilon, \rho)$.
2259: It follows that
2260: $$
2261: \| Kf^*_{\mu;g} - g\|^2 \leq \Phi_{\mu; g}(f^*_{\mu;g}) \le
2262: \epsilon^2 + \mu \rho^p ~~\mbox{and}~~
2263: \mu \Vvert f^*_{\mu;g}\Vvert_{\mathbf{w},p}^p \leq \Phi_{\mu; g}(f^*_{\mu;g} )
2264: \le \epsilon^2 + \mu \rho^p
2265: $$
2266: or, equivalently, $f^*_{\mu;g} \in {\cal F}(\epsilon',\rho')$ with
2267: $\epsilon'$ and $\rho'$ given by \eref{primes}.
2268: The modulus of convergence (\ref{modcv}) can then be bounded as follows,
2269: using the
2270: triangle inequality. Indeed, for any $f \in {\cal F}(\epsilon,\rho)$ and
2271: $f' \in {\cal F}(\epsilon',\rho')$, we have
2272: $
2273: \| K(f - f') \|
2274: \leq \epsilon+\epsilon'
2275: $
and
$
\Vvert f - f' \Vvert_{\mathbf{w},p} \leq \rho+\rho'\ ,
$
and we immediately obtain from the definition (\ref{MC}) the
2281: upper bound in (\ref{stability}). To derive the lower bound, observe that
2282: for the
2283: particular choice $g=0$,
2284: the minimizer $f^*_{\mu;g}$ of the functional (\ref{phimu}) is
2285: $f^*_{\mu;0}=0$. The desired lower bound then follows immediately upon
2286: inspection
2287: of the two definitions \eref{MC} and \eref{modcv}. \hfill\QED
2288:
2289: \bigskip
2290:
2291: Let us briefly discuss the meaning of the previous proposition.
2292: The modulus of continuity $M(\epsilon,\rho)$ yields the best possible
2293: convergence rate for any
2294: regularization method that enforces the error bound and the prior
2295: constraint defined by \eref{MC}.
2296: Proposition \ref{bestcv} provides a relation between the modulus of
2297: continuity and the convergence rate $M_\mu(\epsilon,\rho)$ of the
2298: specific regularization method considered in this paper, which
2299: is defined by the minimization of the functional \eref{phimu}. Optimizing
2300: the upper bound
2301: in \eref{stability} suggests the choice
2302: $\mu=\epsilon^2/\rho^p$, yielding
2303: $\epsilon'=\sqrt{2}\;\epsilon\ $ and $\rho'=2^{1/p}\rho\ $. With these
2304: choices,
2305: we ensure that $f^*_{\mu;g} \rightarrow f_o$ when $\epsilon
2306: \rightarrow 0$, i.e.
2307: that the problem is {\it regularized}, provided we can show
2308: that the modulus of continuity
2309: tends to zero with $\epsilon$.
2310: Moreover, once we establish its rate of
2311: decay (see below), we know that our regularization method is (nearly)
2312: optimal in the sense that the modulus of convergence (\ref{modcv}) will decay
{\em at the same rate as the optimal rate} given by the modulus of continuity
$M(\epsilon,\rho)$. (We call it {\em nearly} optimal because, although the
rate of
decay is optimal, the constant multiplier probably is not.)
Note that because of the assumption of compactness of
the ball $\mathrm{B}_{\mathbf{w},p}(0,\rho)$ (which amounts to assuming that
2319: \eref{compact-emb} is satisfied), we achieve regularization
2320: even in some cases where $\epsilon^2 / \mu$
2321: does not tend to zero for $\epsilon \to 0$, which is a case not covered by
2322: Theorem \ref{regthm}.
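
As a quick check of the values quoted above for the choice
$\mu=\epsilon^2/\rho^p$ (a worked substitution, not an additional result),
inserting this $\mu$ into \eref{primes} gives
\begin{equation*}
\epsilon' = \left(\epsilon^2 + \frac{\epsilon^2}{\rho^p}\,\rho^p
\right)^{\frac{1}{2}} = \sqrt{2}\;\epsilon
~~~~\mbox{and}~~~~
\rho' = \left(\rho^p + \epsilon^2\,\frac{\rho^p}{\epsilon^2}
\right)^{\frac{1}{p}} = 2^{1/p}\rho ~,
\end{equation*}
so that \eref{stability} takes the form
\begin{equation*}
M(\epsilon,\rho) \leq M_\mu(\epsilon,\rho) \leq
M\left( (1+\sqrt{2})\,\epsilon\,,\, (1+2^{1/p})\,\rho \right) ~.
\end{equation*}
Since $\epsilon'$ increases with $\mu$ whereas $\rho'$ decreases with $\mu$,
this choice can be viewed as balancing the two contributions $\epsilon^2$
and $\mu \rho^p$ in $\Phi_{\mu;g}(f_o)$; we do not claim that it minimizes
the upper bound exactly.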
2323:
2324: In order to derive upper or lower bounds on $M(\epsilon,\rho)$, we must
2325: know more information about the operator $K$. The following proposition
2326: illustrates how such information can be used.
2327:
2328: \begin{proposition}
2329: \label{convrate-gen}
2330: Suppose that there exist sequences $\mathbf{b}=(b_\g)_{\g \in \Gamma}$
2331: and $\mathbf{B}=(B_\g)_{\g \in \Gamma}$ satisfying
2332: $\forall \,\g \in \Gamma ~:~ 0 < b_\g,
2333: \, B_\g < \infty\,$ and such that, for all $h$ in $\cH$,
2334: \begin{equation}
2335: \label{extraK}
2336: \sum_{\g \in \Gamma} b_\g |h_\g|^2 \leq \|K h \|^2
2337: \leq \sum_{\g \in \Gamma} B_\g |h_\g|^2 ~.
2338: \end{equation}
2339: Then the following upper and lower bounds hold for $M(\epsilon,\rho)$:
2340: \begin{eqnarray}
2341: \label{lowerbound}
2342: M(\epsilon,\rho) &\geq & \max_{\g \in \Gamma}
2343: \left[ \min \left(\rho \ag^{-1/p},\epsilon B_\g^{-1/2} \right) \right]~, \\
2344: \label{upperbound}
2345: M(\epsilon,\rho) &\leq & \min_{\Gamma = \Gamma_{_{\!1}} \cup \Gamma_{_{\!2}}}
2346: \sqrt{ \frac{\epsilon^2}{\min_{\g \in \Gamma_{_{\!1}}}b_\g} +
2347: \frac{\rho^2}{\min_{\g \in \Gamma_{_{\!2}}}\ag^{2/p}} } ~.
2348: \end{eqnarray}
2349: \end{proposition}
2350: {\em Proof:}
2351: To prove the lower bound, we need only exhibit one particular $h$ such that
2352: $\|Kh\| \le \epsilon$ and $\Vvert h \Vvert_{\mathbf{w},p}\le \rho$, for which
2353: $\|h\|$ is given by the right hand side of
2354: \eref{lowerbound}. For this we need only identify the index ${\g_{_m}}$ such
2355: that, $\forall \g \in \Gamma \,$,
2356: $$
2357: \nu = \min \left(\rho w_{\g_{_m}}^{-1/p},\epsilon B_{\g_{_m}}^{-1/2} \right)
2358: \ge \min \left(\rho \ag^{-1/p},\epsilon B_\g^{-1/2} \right) ~,
2359: $$
2360: and choose $h = \nu \varphi_{\g_{_m}}$. Then $\Vvert h
2361: \Vvert_{\mathbf{w},p} =
2362: \nu \, w_{\g_{_m}}^{1/p} \le \rho$ and $\|K h \| \le \nu B_{\g_{_m}}^{1/2}
\le \epsilon ~$; moreover, $\|h\| = \nu$ equals the right hand side of
\eref{lowerbound}.
2365:
2366: On the other hand, for any partition of $\Gamma$ into two subsets,
2367: $\Gamma = \Gamma_{_{\!1}} \cup \Gamma_{_{\!2}}$,
and any $h \in \{u; \|Ku\|\le \epsilon, ~\Vvert u \Vvert_{\mathbf{w},p} \le
\rho \}$, we have
2370: \begin{eqnarray*}
2371: \sum_{\g \in \Gamma}|h_\g|^2 &=& \sum_{\g \in \Gamma_{_{\!1}}} |h_\g|^2 +
2372: \sum_{\g \in \Gamma_{_{\!2}}} |h_\g|^2 \\
2373: &\le & \max_{\g' \in \Gamma_{_{\!1}}} [b_{\g'}^{-1} ]
2374: \sum_{\g \in \Gamma_{_{\!1}}} b_\g |h_\g|^2 +
2375: \max_{\g' \in \Gamma_{_{\!2}}}[ w_{\g'}^{-2/p}][\max_{\g'' \in
2376: \Gamma_{_{\!2}}} w_{\g''}
2377: |h_{\g''}|^p]^{\frac{2}{p}-1 } \sum_{\g \in \Gamma_{_{\!2}}} \ag |h_\g|^p \\
2378: & \le & \max_{\g' \in \Gamma_{_{\!1}}} [b_{\g'}^{-1}] \epsilon^2
2379: + \max_{\g' \in \Gamma_{_{\!2}}}[ w_{\g'}^{-2/p}] \rho^2 ~.
2380: \end{eqnarray*}
2381: Since this is true for any partition $\Gamma = \Gamma_{_{\!1}} \cup
2382: \Gamma_{_{\!2}}$, we
2383: still have an upper bound, uniformly valid for all $h \in
2384: \{u; \|Ku\|\le \epsilon, ~\Vvert u \Vvert_{\mathbf{w},p} \le \rho \}$, if we
2385: take the minimum over all such partitions. The upper bound on
2386: $M(\epsilon,\rho)$ then follows upon taking the square root.
2387: \hfill \QED
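
\bigskip

The estimate of the sum over $\Gamma_{_{\!2}}$ in the proof above relies on
$p \le 2$, so that the exponent $\frac{2}{p}-1$ is non-negative. Spelled out
(as a sketch of the intermediate step only), for each
$\g \in \Gamma_{_{\!2}}$ we have
\begin{equation*}
|h_\g|^2 = \ag^{-2/p} \left( \ag |h_\g|^p \right)^{\frac{2}{p}-1}
\ag |h_\g|^p
\le \max_{\g' \in \Gamma_{_{\!2}}}[ w_{\g'}^{-2/p}]\,
[\max_{\g'' \in \Gamma_{_{\!2}}} w_{\g''} |h_{\g''}|^p]^{\frac{2}{p}-1 }
\, \ag |h_\g|^p ~.
\end{equation*}
Every single term $\ag |h_\g|^p$, as well as the full sum
$\sum_{\g} \ag |h_\g|^p = \Vvert h \Vvert_{\mathbf{w},p}^p$, is bounded by
$\rho^p$, so that summing over $\Gamma_{_{\!2}}$ yields
\begin{equation*}
\sum_{\g \in \Gamma_{_{\!2}}} |h_\g|^2 \le
\max_{\g' \in \Gamma_{_{\!2}}}[ w_{\g'}^{-2/p}]\;
\rho^{\,p\left(\frac{2}{p}-1\right)}\, \rho^p
= \max_{\g' \in \Gamma_{_{\!2}}}[ w_{\g'}^{-2/p}]\; \rho^2 ~.
\end{equation*}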
2388:
2389: \bigskip
2390:
2391: To illustrate how Proposition \ref{convrate-gen} could be used, let us apply
2392: it to one particular example, in which we choose
2393: the $(\vpg)$--basis with respect to which
2394: the $\Vvert~\Vvert_{\mathbf{w},p}$--norm is defined to be a wavelet basis
2395: $(\Psi_\lambda)_{\lambda \in \Lambda}$. As
already pointed out in subsection 1.4.1,
2397: the Besov spaces $B^s_{p,p}(\R^d)$ can then be identified with the Banach
2398: spaces
2399: $\mathcal{B}_{\mathbf{w},p}$ for the particular choice $w_\lambda =
2400: 2^{\sigma p
2401: |\lambda|}$, where $\sigma = s + d\left( \frac{1}{2}-\frac{1}{p} \right)$
2402: is assumed to be non-negative. For $f \in B^s_{p,p}(\R^d)$, the Banach norm
2403: $\Vvert f \Vvert_{\mathbf{w},p}$ then coincides with the Besov norm
2404: $\VVert f \VVert_{s,p} = \left[ \sum_{\lambda \in \Lambda} w_{\lambda}
2405: |\left<f,\Psi_\lambda \right>|^p \right]^{1/p} ~.$
2406: Let us now consider an inverse problem for the operator $K$ with such a
2407: Besov prior.
2408: If we assume that the operator
2409: $K$ has particular smoothing properties, then we can
2410: use these to derive bounds on the corresponding modulus of continuity, and thus
2411: also on the rate of convergence for our regularization algorithm.
2412: In particular, let us assume that the operator $K$ is a smoothing
2413: operator of order
2414: $\alpha$, a property which can be formulated as an equivalence between the norm
2415: $\| Kh\|$ and the norm of $h$ in a Sobolev space of
2416: negative order $H^{-\alpha}$, i.e. in a Besov space $B^{-\alpha}_{2,2}$
2417: (see e.g.
2418: \cite{Eng96}, \cite{Lou97}, \cite{DeV98}, \cite{Coh02}). In other words,
2419: we assume that
2420: for some $\alpha >0$, there exist constants $A_{\ell}$ and $A_u$, such that,
2421: for all $h \in L^2(\R^d)$,
2422: \begin{equation}
2423: \label{ell}
2424: A_{\ell}^2 \sum_\lambda 2^{-2\vert\lambda\vert\alpha}\; \vert
2425: h_\lambda\vert^2 \leq \| K h\|^2 \leq A_u^2\sum_\lambda
2426: 2^{-2\vert\lambda\vert\alpha}\;
2427: \vert h_\lambda\vert^2\ .
2428: \end{equation}
2429: The decay rate of the modulus of continuity is then characterized as follows.
2430: \begin{proposition}
2431: If the operator $K$ satisfies the smoothness condition {\rm(\ref{ell})},
2432: then the
2433: modulus of continuity $M(\epsilon,\rho)$, defined by
2434: $$
M(\epsilon,\rho)= \sup\{ \|h\|; \|K h\| \leq \epsilon,
2436: \VVert h \VVert_{s,p} \le \rho \} ~,
2437: $$
2438: satisfies
2439: \begin{equation}
2440: \label{M}
2441: c \left( \frac{\epsilon}{A_u} \right)^{\frac{\sigma}{\sigma+\alpha}}
2442: \rho^{\frac{\alpha}{\sigma+\alpha}} \leq M(\epsilon,\rho) \leq C
2443: \left( \frac{\epsilon}{A_\ell}\right)^{\frac{\sigma}{\sigma+\alpha}}
2444: \rho^{\frac{\alpha}{\sigma+\alpha}}
2445: \end{equation}
2446: where $\sigma = s + d\left( \frac{1}{2}-\frac{1}{p} \right)\geq0$,
2447: and $c$ and $C$ are constants depending on $\sigma$
2448: and $\alpha$ only.
2449: \end{proposition}
2450: {\em Proof:}
2451: By \eref{ell}, the operator $K$ satisfies
2452: \eref{extraK} with $b_\lambda=A_{\ell}^2\, 2^{-2\vert\lambda\vert\alpha}$
2453: and $B_\lambda=A_u^2\, 2^{-2\vert\lambda\vert\alpha} \,$.
2454: \newline
2455: It then follows from \eref{lowerbound} that
2456: $$
2457: M(\epsilon, \rho) \geq \max_\lambda \left[ \min \left(
2458: \rho \, 2^{- \sigma |\lambda|} , \frac{\epsilon}{A_u}\, 2^{\alpha |\lambda|}
2459: \right) \right] ~;
2460: $$
2461: if $x=|\lambda|$ could take on all positive real values, then one easily
2462: computes
2463: that this max-min would be given for
2464: $x= - [\log_2(\epsilon/\rho A_u)]/(\alpha + \sigma)$, and would be equal to
2465: $(\epsilon/A_u)^{\sigma/(\alpha + \sigma)} \rho^{\alpha/(\alpha + \sigma)}$.
2466: Because $|\lambda|$ is constrained to take only the values in $\N$, the max-min
2467: is guaranteed only to be within a constant of this bound (corresponding
2468: to an integer neighbor of the optimal $x$), which leads
2469: to the lower bound in \eref{M}.
2470:
2471: For the upper bound \eref{upperbound}, we must partition the index set.
2472: Splitting $\Lambda$ into $\Lambda_{_1}=\{\lambda; |\lambda| < J\}$ and
2473: $\Lambda_{_2}=\{\lambda; |\lambda| \ge J\}$, we find that
2474: $$
2475: \frac{\epsilon^2}{\min_{\lambda \in \Lambda_{_{1}}}b_\lambda} +
2476: \frac{\rho^2}{\min_{\lambda \in \Lambda_{_{2}}}w_\lambda^{2/p}}
2477: =\frac{\epsilon^2}{A_\ell^2}\, 2^{2\alpha(J-1)} + \rho^2\, 2^{-2 \sigma J} ~.
2478: $$
2479: The minimizing partition for $\Lambda$ thus corresponds with the minimizing
2480: $J$ for the right hand side of this expression. This value
2481: for $J$ is an integer neighbor of
2482: $y= - [\log_2(\epsilon/\rho A_\ell)]/(\alpha + \sigma)$, which leads to the
2483: upper bound in \eref{M}.
2484: \hfill \QED
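
\bigskip

The last step of the proof can be made explicit as follows. This is only a
sketch, under the extra assumptions (not stated in the proposition) that
$\epsilon < \rho A_\ell$, so that $y>0$, and that both $\Lambda_{_1}$ and
$\Lambda_{_2}$ are non-empty for the integer $J$ chosen below. Since
$2^{(\alpha+\sigma)y} = \rho A_\ell / \epsilon$, the two terms balance
at $y$:
\begin{equation*}
\frac{\epsilon^2}{A_\ell^2}\, 2^{2\alpha y} = \rho^2\, 2^{-2 \sigma y}
= \left( \frac{\epsilon}{A_\ell}\right)^{\frac{2\sigma}{\sigma+\alpha}}
\rho^{\frac{2\alpha}{\sigma+\alpha}} ~.
\end{equation*}
Taking for $J$ the smallest integer with $J \ge y$, we have
$J-1 < y \le J$; because $\alpha>0$ and $\sigma \ge 0$, it follows that
\begin{equation*}
\frac{\epsilon^2}{A_\ell^2}\, 2^{2\alpha(J-1)} + \rho^2\, 2^{-2 \sigma J}
\le \frac{\epsilon^2}{A_\ell^2}\, 2^{2\alpha y} + \rho^2\, 2^{-2 \sigma y}
= 2 \left( \frac{\epsilon}{A_\ell}\right)^{\frac{2\sigma}{\sigma+\alpha}}
\rho^{\frac{2\alpha}{\sigma+\alpha}} ~.
\end{equation*}
By \eref{upperbound}, this gives the upper bound in \eref{M} with the
admissible (and not necessarily best) constant $C=\sqrt{2}$ under these
assumptions.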
2485:
2486: \bigskip
2487:
2488: The stability estimates we have derived are standard in regularization
2489: theory for the special case $p=2$; they were first extended
2490: to the case $1\leq p < 2$
2491: in \cite{DeM02}. They show the interplay between the smoothing order of the
2492: operator
2493: characterized by $\alpha$ and the assumed smoothness of
2494: the solutions characterized by $\sigma=s+d(\frac{1}{2}-\frac{1}{p})$
2495: (for Besov spaces, we recall that this amounts to
2496: solutions
2497: having $s$ derivatives in $L^p$). For
2498: $\sigma/(\sigma+\alpha)$ close to one, the problem is mildly ill-posed,
2499: whereas the stability degrades for large $\alpha$. Note that if
2500: the bound (\ref{ell}) were replaced by another one,
in which the $b_\lambda$ and
$B_\lambda$ decayed exponentially in $D=2^{|\lambda|}$ (instead
of following the much slower decaying negative power $D^{-2\alpha}$ of \eref{ell}),
2504: then the modulus of continuity would tend to zero only as an
2505: inverse power
2506: of $| \log\epsilon|$. This is the so-called {\it logarithmic continuity}
2507: which has been extensively discussed in the case $p=2$, and which extends,
2508: as shown
2509: by an easy application of Proposition \ref{convrate-gen}, to $1 \le p <2$.
2510:
2511: \section{An illustration}
2512: We provide a simple illustration of the behavior of the algorithms based on
2513: minimizing
2514: $\Phi_{\mu \mathbf{w}_0,1}$ and
2515: $\Phi_{\mu \mathbf{w}_0,2}$ for a two-dimensional deconvolution
2516: problem,
2517: considering a class of objects consisting of small bright sources on a dark
2518: background. The image is discretized on a $256 \times 256$ array,
2519: denoted by $f$. The convolution operator $K$ is
2520: implemented by multiplying the discrete Fourier transform (DFT) of $f$
2521: by a low-pass, radially symmetrical filter and then
2522: taking the inverse DFT to obtain the data $g=Kf$ (data were zero padded on
2523: a larger
2524: $512 \times 512$ array when taking DFTs). The filter was equal
2525: (in the Fourier domain) to the convolution with itself
2526: of the characteristic function of a disk with radius equal to $0.1$ times the
2527: maximum frequency determined by the image grid sampling;
2528: this filter provides a discrete model of a diffraction-limited imaging
2529: system with
2530: incoherent light. Pseudo-random Poisson noise was
2531: added to the data array $g$, for a total number of $10000$ photons,
2532: corresponding to about $25$ photons for the data pixel with the maximum
2533: intensity.
2534:
2535: \begin{figure}[h!tb]
2536: \begin{center}
2537: \epsfig{figure=figure1.ps, width= 5 in}
2538: \caption{The object $f$ (top left), the image $g$ after convolving
2539: with a radially symmetrical low-pass filter and adding pseudo-random
2540: Poisson noise (top right), and the minimizers of $\Phi_{\mu_1 \mathbf{w}_0,1}$
2541: (bottom left) and $\Phi_{\mu_2 \mathbf{w}_0,2}$ (bottom right). The values
2542: $\mu_1=0.001$ and $\mu_2=0.0001$ have been selected separately for the
2543: $\ell^1$- and $\ell^2$-cases, to obtain a balance between
2544: sharpness and ringing and noise. }
2545: \end{center}
2546: \end{figure}
2547:
The top of Figure 1 shows the object $f$ (four ellipses with axes of
2549: $7.5$ or $5.0$ pixels, slightly smoothed to avoid blocking effects)
2550: and the data $g=Kf$. The figure also shows intensity
2551: distributions along two lines in the object and data arrays; along the
2552: horizontal line we see
2553: how two close sources in $f$ give rise to a joint blur in $g$.
2554: The bottom of Figure 1 shows the reconstructions obtained after
2555: 2000 steps of the iterative thresholded Landweber algorithm, which
2556: accurately approximate the minimizers of $\Phi_{\mu_1 \mathbf{w}_0,1}$ and
2557: $\Phi_{\mu_2 \mathbf{w}_0,2}$. The parameters $\mu_1$ and $\mu_2$
2558: are selected separately for each case, in order to achieve a good balance
2559: between
2560: sharpness and ringing and noise.
2561:
2562: \begin{figure}[h!tb]
2563: \begin{center}
2564: \epsfig{figure=figure2.ps, width= 5 in}
2565: \caption{A comparison to illustrate the impact of the positivity constraint,
2566: imposed at every iteration step. On the left are the fixed points for
2567: $P_{\cal C} \mathbf{S}_{\mu_1 \mathbf{w}_0,1}$ (top) and
2568: $\mathbf{S}_{\mu_1 \mathbf{w}_0,1}$ (bottom); on the right
2569: those of $P_{\cal C} \mathbf{S}_{\mu_2 \mathbf{w}_0,2}$ (top) and
2570: $\mathbf{S}_{\mu_2 \mathbf{w}_0,2}$ (bottom). The data
2571: and the values of $\mu_1,~\mu_2$ are the same as in Figure 1.}
2572: \end{center}
2573: \end{figure}
2574:
2575: As expected for an example of this type, the minimizer of
2576: $\Phi_{\mu_1 \mathbf{w}_0,1}$ does a better job at
2577: resolving the two close sources on the horizontal line; it also gives a better
2578: concentration of the source lying along the vertical line.
2579: Because the object $f$ is positive, we can
2580: apply Remark 3.12 and use $P_{\cal C} S_{\mu \mathbf{w}_0,p}$ instead
2581: of $S_{\mu \mathbf{w}_0,p}$, where $P_{\cal C}$ is the projection
2582: onto the convex cone of $256 \times 256$ arrays that take only non-negative
2583: values.
2584:
2585: The results are
2586: shown in Figure 2; on the top is the $2000$-th iterate for the case with
2587: $P_{\cal C}$, with the case $p=1$
2588: (left) and $p=2$ (right). In each case we used the same values of
2589: $\mu_1$ and $\mu_2$ as for
2590: the reconstructions without positivity constraint, which are shown for
2591: comparison on the bottom of
2592: Figure 2. Exploiting the positivity constraint leads to better resolution and
2593: less ringing for this example, where the background is zero.
2594:
2595: \begin{figure}[h!tb]
2596: \begin{center}
2597: \epsfig{figure=figure3.ps, width= 5 in}
2598: \caption
2599: {A different view of the four solutions in Figure 2, with a different
2600: dynamic range for the image intensity gray scale, to highlight ringing
2601: and other artifacts.}
2602: \end{center}
2603: \end{figure}
2604:
2605: Figure 3 gives a different view of the same solutions, with
2606: a compressed gray scale ranging from $- 2 \%$ (darker) to $+ 2 \%$ (lighter)
2607: of the maximum intensity in the original object. This has the effect of
2608: highlighting
2609: the ringing effects and the noise. Both
2610: ringing and noise are seen to be less pronounced for the minimizer of
2611: $\Phi_{\mu_1 \mathbf{w}_0,1}$ (top left)
than for that of $\Phi_{\mu_2 \mathbf{w}_0,2}$. Although the introduction of
2613: the positivity constraint removes the
2614: ringing phenomenon (top of Figure 3), we nevertheless see that noise is better
2615: suppressed with $p=1$.
2616:
2617: To produce Figures 1 to 3, the same program was used in every case; the only
2618: change was the choice
2619: of the nonlinear operator applied at the $n$-th iteration step to
2620: $f^{n-1} + K^* (g - K f^{n-1})$.
2621: For realistic applications on data of this type, more sophisticated algorithms
2622: exist. With the
2623: $\ell^2$--penalty, for instance, the reconstructions in our simple example
2624: can be
2625: obtained directly by a regularized Fourier deconvolution.
2626: These examples are included to illustrate the differences
2627: that can be achieved by the choice of $p$, and do not constitute a claim
2628: that the
2629: iterative algorithms discussed
2630: in this paper are optimal. The ``data'' in this example are also only
2631: simple-minded
2632: caricatures of quasi
2633: point-sources data sets. While similar examples may have applications in
2634: astronomy,
2635: most natural
2636: images have a much richer structure. However, as is abundantly documented, the
2637: wavelet transforms of
2638: natural images tend to have distributions that are sparse. A similar
2639: improvement in
2640: accuracy can be expected by applying $\ell^1$ rather than $\ell^2$
2641: penalizations on the
2642: wavelet coefficients
2643: in inverse problems involving natural images, similar to the gain achieved in
2644: denoising with
2645: a soft thresholding rather than with a quadratic penalty.
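
For concreteness, the iteration step mentioned above, in which only the
nonlinear operator changes from one experiment to the next, can be written
out as follows. This is a summary sketch in the notation of this paper; the
symbol $w$ is used here only as shorthand for the relevant entry of
$\mu \mathbf{w}_0$. The $n$-th iterate is
\begin{equation*}
f^{n} = \mathbf{S}_{\mu \mathbf{w}_0,p}
\left( f^{n-1} + K^* (g - K f^{n-1}) \right) ~,
\end{equation*}
with $\mathbf{S}_{\mu \mathbf{w}_0,p}$ replaced by
$P_{\cal C}\, \mathbf{S}_{\mu \mathbf{w}_0,p}$ when the positivity constraint
is imposed. The operator $\mathbf{S}_{\mu \mathbf{w}_0,p}$ acts
componentwise on the coefficients through the thresholding functions
$S_{w,p}$, the inverses of
$F_{w,p}(x) = x + \frac{p}{2}\, w\, {\rm sign}(x) |x|^{p-1}$. For the
two values of $p$ used in the figures, these are explicit:
\begin{equation*}
S_{w,1}(x) = \left\{
\begin{array}{ccl} x + w/2 ~&~ \mbox{if} ~& x \leq - w/2 \\
0 ~&~ \mbox{if} ~& |x| < w/2
\\ x- w/2 ~&~ \mbox{if} ~& x \geq w/2
\end{array} \right.
~~~~\mbox{and}~~~~
S_{w,2}(x) = \frac{x}{1+w} ~.
\end{equation*}
The case $p=1$ sets small coefficients exactly to zero, whereas the case
$p=2$ merely rescales all coefficients by the same factor.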
2646:
2647: \section{Generalizations and additional comments}
2648:
2649: The algorithm proposed in this paper can be generalized in several directions,
2650: some of which we list here, with brief comments.
2651:
2652: The penalization functionals $\Vvert f \Vvert_{\mathbf{w},p}$ we have used are
2653: symmetric, i.e. they are invariant under the exchange of $f$ for $-f$. We can
2654: equally well consider penalization functionals that treat positive and
2655: negative
2656: values of the $\fg$ differently. If $(w^+_\gamma)_{\gamma \in \Gamma}$ and
2657: $(w^-_\gamma)_{\gamma \in \Gamma}$ are two sequences of
2658: strictly positive numbers,
2659: then we can consider the problem of minimizing the functional
2660: \begin{equation}
2661: \label{asfunct}
2662: \Phi_{\mathbf{w}^+, \mathbf{w}^-,p} (f)= \Vert Kf-g\Vert^2 +
2663: \sum_{\gamma \in \Gamma}
2664: ((w^+_\gamma) [\fg]_+^p + (w^-_\gamma) [\fg]_-^p)
2665: \end{equation}
2666: where, for $r \in \mathbb{R},~ r_+ = \max(0,r),~ r_- = \max(0,-r)$. One easily
2667: checks that all the arguments in this paper can be applied equally well (after
2668: some straightforward modifications) to the general functional \eref{asfunct},
2669: provided we replace the thresholding functions $S_{w_\gamma,p}$ in the
2670: iterative
2671: algorithm by $S_{w^+_\gamma, w^-_\gamma, p}$, where, for $p>1$,
2672: \begin{equation*}
2673: S_{w^+, w^-, p} = \left( F_{w^+, w^-, p}\right)^{-1}\hbox{\ \ with\ \ }
2674: F_{w^+, w^-, p}(x) = x +\frac{p}{2}\, w^+ [x]_+^{p-1} -
2675: \frac{p}{2}\, w^- [x]_-^{p-1} ~,
2676: \end{equation*}
2677: and for $p=1$,
2678: \begin{equation*}
S_{w^+, w^-, 1}(x) = \left\{
2680: \begin{array}{ccl} x + w^-/2 ~&~ \mbox{if} ~& x \leq - w^-/2 \\
2681: 0 ~&~ \mbox{if} ~& - w^-/2 < x < w^+/2
2682: \\ x- w^+/2 ~&~ \mbox{if} ~& x \geq w^+/2.
2683: \end{array} \right.
2684: \end{equation*}
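
As a consistency check on these formulas (a worked special case, not an
extension), take $p=2$. Then
$F_{w^+, w^-, 2}(x) = (1+w^+)\,x$ for $x \geq 0$ and
$F_{w^+, w^-, 2}(x) = (1+w^-)\,x$ for $x < 0$, so that the inverse is
explicit:
\begin{equation*}
S_{w^+, w^-, 2}(x) = \left\{
\begin{array}{ccl} x/(1+w^+) ~&~ \mbox{if} ~& x \geq 0 \\
x/(1+w^-) ~&~ \mbox{if} ~& x < 0 ~.
\end{array} \right.
\end{equation*}
When $w^+ = w^-$, the two branches coincide and the symmetric
thresholding functions used in the rest of the paper are recovered, for
$p=2$ as well as for $p=1$.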
2685:
2686: The above applies when the $\fg$ are all real; a generalization to complex
2687: $\fg$
2688: is not straightforward. When dealing with complex functions, one could
2689: generalize
2690: the penalization $\sum_{\gamma \in \Gamma} w_\gamma |\fg|^p$ to
2691: $\sum_{\gamma \in \Gamma, |\fg| \neq 0} w_\gamma({\rm arg} \fg) |\fg|^p$,
2692: where the weight coefficients have been replaced by strictly positive
2693: $2\pi$--periodic $C^1$--functions on the $1$--torus $\mathbb{T} =
2694: \{x \in \mathbb{C}, |x|=1\}$.
2695: It turns out, however, that the variational
2696: equation for $e^{i \arg\fg}=\fg |\fg|^{-1}$ then no longer uncouples
2697: from that for $|\fg|$ (as it does in the case where $w_\gamma$ is a constant),
2698: leading to a more complicated ``generalized thresholding'' operation in
2699: which the
2700: absolute value and phase of the complex number $S_{w,p}(\fg)$ are given
2701: by a system of
2702: coupled nonlinear equations.
2703:
2704: When the $(\varphi_\gamma)_{\gamma \in \Gamma}$--basis is chosen to be a
2705: wavelet basis,
then we saw in subsection 1.4.1 that it is possible to make the
2707: $\Vvert ~ \Vvert_{\mathbf{w},p}$--norm equivalent to the Besov-norm
2708: $\VVert ~ \VVert_{s,p}$, by choosing the weight for
2709: $| \langle f, \Psi_\lambda \rangle |^p$ to be given by
2710: $w_\lambda = 2^{|\lambda| \sigma p}$, where $|\lambda |$ is the scale of
2711: wavelet
2712: $\Psi_\lambda$. The label $\lambda$ contains much more information than just
2713: the scale, however, since it also indicates the location of the wavelet,
2714: as well as
2715: its ``species'' (i.e. exactly which combination of $1$-dimensional scaling
2716: functions
2717: and wavelets is used to construct the product function $\Psi_\lambda$).
2718: One could
2719: choose the $w_\lambda$ so that certain regions in space are given
2720: extra weight, or
2721: on the contrary de-emphasized, depending on prior information.
2722: In pixel space, prior information on the support of the object to be
2723: reconstructed can be easily enforced by simply
2724: setting the
2725: corresponding weights to very small values,
2726: or by choosing very large weights outside
the object support. This type of constraint is of utmost importance
2728: to achieve superresolution in inverse problems in optics and imaging
2729: (see e.g. \cite{Ber96}).
2730: When thresholding in the wavelet domain,
2731: a constraint on the object support can be enforced in a similar way due to the
2732: good spatial localization of the wavelets.
2733: If no a priori information is known,
2734: one could
2735: even imagine repeating the wavelet thresholding
2736: algorithm several times, adapting
2737: the weights $w_\lambda$
2738: after each pass, depending on the results of the previous pass;
2739: this could be used,
2740: e.g., to emphasize certain locations at fine scales if coarser scale
2741: coefficients
2742: indicate the possible existence of an edge. The results of this paper
2743: guarantee
2744: that each pass will converge.
2745:
2746: In this paper we have restricted ourselves to penalty functions that are
2747: weighted $\ell^p$--norms of the $\fg = \left<f, \varphi_{\gamma} \right>$. The
2748: approach can be extended naturally to include penalty functions that
2749: can be written as sums, over $\gamma \in \Gamma$, of more general
2750: functions of $\fg$, so that the functional to be minimized is then written
2751: as
2752: $$
2753: \widetilde{\Phi}_{_{\mbox{\scriptsize{\bf{W}}}}}(f) = \|Kf-g\|^2 +
2754: \sum_{\gamma \in \Gamma}
2755: W_{\gamma} (|\fg|) ~.
2756: $$
2757: The arguments in this paper will still be applicable to this more general case
2758: if the functions $W_{\gamma}: \R_+ \rightarrow \R_+$ are convex, and satisfy
2759: some extra technical conditions, which ensure that the corresponding
2760: generalized component--shrinkage functions $\widetilde{S}_{\gamma}$ are still
2761: non-expansive (used in several places), and that, for some $c > 0$,
2762: $$
2763: \inf_{\|v\| \leq 1} ~ \inf_{\|a\| \leq c} \|v\|^{-2}
2764: \sum_{\gamma \in \Gamma} \left|v_{\gamma} + \widetilde{S}_{\gamma}(a_{\gamma})
2765: -\widetilde{S}_{\gamma}(v_{\gamma}+a_{\gamma}) \right|^2 > 0 ~
2766: $$
2767: (used in Lemma \ref{lm-3-16}).
2768: To ensure that both conditions are satisfied, it is sufficient to choose
2769: functions $W_{\gamma}$ that are convex, with a minimum at $0$ and e.g.
2770: twice differentiable, except possibly at $0$ (where they should nevertheless
2771: still be left and right differentiable), and for which $W_{\gamma}'' >1$
2772: on $V \setminus \{0\}$, where $V$ is a neighborhood of $0$.
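
As a simple illustration (a hypothetical choice, not one analyzed in this paper), take $W_{\gamma}(t) = w\, t + \mu\, t^2$ with $w \geq 0$ and $\mu > 1/2$. This function is convex on $\R_+$, has its minimum at $0$, and satisfies $W_{\gamma}'' = 2\mu > 1$ away from $0$. Assuming, as for $S_{w,p}$, that the component--shrinkage function is defined as the minimizer over real $x$ of $(x-a)^2 + W_{\gamma}(|x|)$, setting the derivative to zero for $x>0$ gives $2(x-a) + w + 2 \mu x = 0$, and by symmetry
\begin{equation*}
\widetilde{S}_{\gamma}(a) = \frac{\mbox{sign}(a)}{1+\mu} \, \max\left(|a| - \frac{w}{2}, 0\right) ~.
\end{equation*}
This is soft thresholding followed by a contraction by the factor $1/(1+\mu)$, so that $|\widetilde{S}_{\gamma}(a) - \widetilde{S}_{\gamma}(a')| \leq (1+\mu)^{-1} |a-a'|$; in particular $\widetilde{S}_{\gamma}$ is non-expansive.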
2773:
2774: We conclude this paper with some comments
2775: concerning the numerical complexity of the algorithm.
2776:
2777: At each iteration step,
2778: we must compute the action of the operator $K^*K$ on the current
2779: object estimate, expressed in the $\varphi_{\gamma}$--basis.
2780: In a finite-dimensional setting where the solution is
2781: represented by a vector of length $N$, this necessitates in principle a
2782: matrix multiplication
2783: of complexity $O(N^2)$,
2784: if we neglect the cost of the shrinkage operation in each iteration step.
2785: After sufficient accuracy is attained and the iterations are stopped, the
2786: resulting $(f^n)_{\gamma}$ must be transformed back into the standard
2787: representation domain of the object function, except in the special case
2788: where the $\varphi_{\g}$ are already the basis for the standard representation
2789: (e.g., if the $\varphi_{\g}$ correspond to the pixel representation
2790: for images). This adds one final $O(N^2)$--matrix multiplication. In this
2791: scenario, the total cost equals that
2792: of the classical Landweber algorithm
2793: on the basis of a comparable number of iterations.
2794: Since Landweber's algorithm typically requires a substantial number of
2795: iterations, it follows that this method can become
2796: computationally competitive with the $O(N^3)$ SVD algorithms only
2797: when $N$ is large compared to the number of iterations necessary.
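
As a rough order-of-magnitude illustration (constants are ignored and the numbers are hypothetical), $n$ iterations cost about $n N^2$ operations, to be compared with about $N^3$ for an SVD, so that the ratio of the two costs is of order $n/N$. For instance, with $N = 10^4$ and $n = 10^2$,
\begin{equation*}
n N^2 = 10^{2} \cdot 10^{8} = 10^{10} ~, \qquad N^3 = 10^{12} ~,
\end{equation*}
whereas for $N = 10^2$ and the same number of iterations both costs are of order $10^6$ and the iterative method offers no advantage.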
2798:
2799: Several methods have been proposed in the literature to accelerate the
2800: convergence
2801: of Landweber's iteration, which could be used for the present algorithm as
2802: well.
2803: For instance, one could use some form of preconditioning (using the
2804: operator $D$
of Remark \ref{op-D}) or group together $k$ Landweber iteration steps and
2806: apply thresholding only every $k$ steps (see e.g. the book \cite{Eng96}).
2807:
2808: Much more substantial gains can be obtained when the operator $\K K$ can be
2809: implemented via fast algorithms. In a first important class of applications,
2810: the matrix \\
2811: $\left( \left<\K K\varphi_{\gamma},\varphi_{\gamma'}\right> \right)_
2812: {\gamma , \gamma' \in \Gamma}$ is sparse; if, for instance, there are only
$O(N)$ nonvanishing entries in this matrix, then standard techniques to deal
2814: with the action of sparse matrices will reduce the cost of each iteration step
2815: to $O(N)$ instead of $O(N^2)$. If the $\vpg$--basis
2816: is a wavelet basis, this is the case for a large class
2817: of integro-differential operators of interest (see e.g. \cite{Bey91}).
2818: Even if $\K K$ is sparse in the $\vpg$--basis, but has an even simpler
2819: expression in another basis, and if the transforms back and forth between the
2820: two bases can be carried out via fast algorithms, then it may be useful to
2821: implement the action of $\K K$ via these back--and--forth transformations.
2822: For instance, if the object is of a type that will have a sparse representation
2823: in a wavelet basis, and the operator $\K K$ is a convolution operator, then
2824: we can pick the $\vpg$--basis to be a wavelet basis, and implement the
2825: operator $\K K$ by doing, successively, a fast reconstruction from wavelet
coefficients, followed by an FFT, a multiplication in the Fourier domain, an
2827: inverse FFT, and a wavelet transform, for a total complexity of $O(N \log N)$.
2828: One can obtain similar complexity estimates if the algorithm is modified
2829: to not only take the nonlinear thresholding into account, but also additional
2830: projections $P_{\cal C}$ on a convex set, such as the cone of functions that
2831: are a.e. positive; in this case, after the thresholding operation, one needs
2832: to carry out an additional fast reconstruction from, say, the wavelet domain,
2833: take the positive part, and then perform the fast transform back, without
2834: affecting the $O(N \log N)$ complexity estimate.
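
The operation count behind this estimate can be sketched as follows, where $C_W$ and $C_F$ are hypothetical constants for the fast wavelet transform (which costs $O(N)$ operations) and for the FFT, respectively. One application of $\K K$ costs
\begin{equation*}
\underbrace{C_W N}_{\mbox{\scriptsize reconstruction}} +
\underbrace{C_F N \log N}_{\mbox{\scriptsize FFT}} +
\underbrace{N}_{\mbox{\scriptsize multiplication}} +
\underbrace{C_F N \log N}_{\mbox{\scriptsize inverse FFT}} +
\underbrace{C_W N}_{\mbox{\scriptsize wavelet transform}}
= O(N \log N) ~.
\end{equation*}
The thresholding itself adds $O(N)$ operations, and an additional positivity projection adds about $2 C_W N + N$ operations per iteration, so that the $N \log N$ term of the two FFTs remains dominant.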
2835:
2836: The situations described above cover several applications of
2837: great practical relevance, in which we expect this algorithm will prove itself
2838: to be an attractive competitor to other fast techniques for large-scale
2839: inverse problems with sparsity constraints.
2840:
2841:
2842:
2843:
2844: \section*{Acknowledgments}
2845:
2846: We thank Albert Cohen, Rich Baraniuk, Mario Bertero, Brad Lucier,
2847: St\'ephane Mallat
2848: and especially David Donoho for interesting and stimulating discussions. We
2849: also
2850: would like to thank Rich Baraniuk for drawing our attention to \cite{Fig03}.
2851:
2852: Ingrid Daubechies gratefully acknowledges partial support by NSF grants
2853: DMS-0070689
2854: and DMS-0219233, as well as by AFOSR grant F49620-01-1-0099, whereas research
2855: by Christine De Mol is supported by the ``Action de Recherche Concert\'ee'' Nb
2856: 02/07-281 and IAP-network in Statistics P5/24.
2857:
2858:
2859:
2860: \begin{thebibliography}{A}
2861:
2862: \bibitem [1]{Abr98} F. Abramovich and B. W. Silverman, \textit{Wavelet
2863: Decomposition Approaches to Statistical Inverse Problems.} Biometrika
2864: \textbf{85} (1998), 115--129.
2865:
2866: \bibitem [2]{Ber98} M. Bertero and P. Boccacci, \textit{Introduction to
2867: Inverse Problems in Imaging}, Institute of Physics, Bristol, 1998.
2868:
2869: \bibitem [3]{Ber96} M. Bertero and C. De Mol, \textit{Super-resolution by
2870: data inversion}, in: Progress in Optics (Vol. XXXVI), E. Wolf, ed.,
2871: Elsevier, Amsterdam, 1996, pp. 129--178.
2872:
2873: \bibitem [4]{Bey91} G. Beylkin, R. Coifman and V. Rokhlin, \textit{Fast
2874: Wavelet Transforms and Numerical Algorithms I.} Comm. Pure Appl. Math.
2875: \textbf{44} (1991), 141--183.
2876:
2877: \bibitem [5]{Can00} E. J. Cand\`es and D. L. Donoho, \textit{Recovering Edges
2878: in Ill-Posed Inverse Problems: Optimality of Curvelet Frames.} Ann. Statist.
\textbf{30} (2002), 784--842.
2880:
2881: \bibitem [6]{Cha98} A. Chambolle, R. A. DeVore, N.-Y. Lee and B. J. Lucier,
2882: \textit{Nonlinear Wavelet Image Processing: Variational Problems, Compression,
2883: and Noise Removal through Wavelet Shrinkage.} IEEE Trans. Image Processing
2884: \textbf{7} (1998), 319--335.
2885:
2886: \bibitem [7]{CDS01} S. Chen, D. Donoho and M. Saunders, \textit{Atomic
Decomposition by Basis Pursuit.} SIAM Review \textbf{43} (2001), 129--159.
2888:
2889: \bibitem [8]{Coh00} A. Cohen, \textit{Wavelet methods in numerical
analysis}, Handbook of Numerical Analysis, vol. VII, P. G. Ciarlet and J. L.
2891: Lions eds., Elsevier, Amsterdam, 2000.
2892:
2893: \bibitem [9]{Coh02} A. Cohen, M. Hoffmann and M. Reiss, \textit{Adaptive
2894: wavelet Galerkin methods for linear inverse problems.} preprint, 2002.
2895:
2896:
2897: \bibitem [10]{DeM02} C. De Mol and M. Defrise, \textit{A note on
2898: wavelet-based inversion methods}, in: \textit{Inverse Problems, Image Analysis
2899: and Medical Imaging}, M. Z. Nashed and O. Scherzer eds,
2900: Series ``Contemporary Mathematics''
2901: vol. 313, pp. 85--96, American Mathematical Society, 2002.
2902:
2903: \bibitem [11]{DeP95} A. R. De Pierro, \textit{A modified expectation
2904: maximization algorithm for penalized likelihood estimation in emission
2905: tomography.} IEEE Trans. Med. Imag. \textbf{14} (1995), 132--137.
2906:
2907: \bibitem [12]{DeV98} R. DeVore, \textit{Nonlinear Approximation.} Acta
2908: Numerica (1998), 1--99.
2909:
2910: \bibitem [13]{Dic96} V. Dicken and P. Maass, \textit{Wavelet-Galerkin
methods for ill-posed problems.} J. Inv. Ill-Posed Problems \textbf{4} (1996),
2912: 203--222.
2913:
2914: \bibitem [14]{Don92} D. Donoho, \textit{Superresolution via sparsity
2915: constraints.}
2916: SIAM J. Math. Anal. \textbf{23} (1992), 1309--1331.
2917:
2918: \bibitem [15]{Don95} D. Donoho, \textit{Nonlinear solution of Linear
2919: Inverse Problems by Wavelet-Vaguelette Decomposition.} Appl. Comp. Harmonic
2920: Anal. \textbf{2} (1995), 101--126.
2921:
2922: \bibitem [16]{Don00} D. Donoho, \textit{Orthonormal ridgelets and linear
2923: singularities.} SIAM J. Math. Anal.
2924: \textbf{31} (2000), 1062--1099.
2925:
2926: \bibitem [17]{Don94} D. Donoho and I. Johnstone, \textit{Ideal
2927: spatial adaptation via wavelet shrinkage.} Biometrika
2928: \textbf{81} (1994), 425--455.
2929:
2930: \bibitem [18]{DuSh52} R. J. Duffin and A. C. Schaeffer, \textit{A class of
2931: nonharmonic Fourier series.} Trans. Am. Math. Soc. \textbf{72} (1952),
2932: 341--366.
2933:
2934: \bibitem [19]{Eic92} B. Eicke, \textit{Iteration methods for convexly
2935: constrained ill-posed problems in Hilbert space.} Numer. Funct. Anal. Optim.
\textbf{13} (1992), 413--429.
2937:
2938: \bibitem [20]{Eng96} H. W. Engl, M. Hanke and A. Neubauer,
2939: \textit{Regularization of Inverse Problems}, Kluwer, Dordrecht, 1996.
2940:
2941: \bibitem [21]{Fig03} M. Figueiredo and R. Nowak, \textit{An EM Algorithm
2942: for Wavelet-Based Image Restoration}, IEEE Transactions on Image Processing.
2943: To appear in July 2003.
2944:
2945: \bibitem [22]{KMR03} J. Kalifa, S. Mallat and B. Roug\'e,
2946: \textit{Deconvolution by thresholding in mirror wavelet bases.} IEEE Trans. on
2947: Image Processing \textbf{12} (2003), 446--457.
2948:
2949: \bibitem [23]{Lad51} L. Landweber, \textit{An iterative formula for Fredholm
2950: integral equations of the first kind.} Am. J. Math.
2951: \textbf{73} (1951), 615--624.
2952:
2953: \bibitem [24]{Lan00} K. Lange, D. R. Hunter and I. Yang,
2954: \textit{Optimization Transfer algorithms using surrogate objective functions.}
2955: J. Comp. Graph. Stat. \textbf{9} (2000), 1--59.
2956:
2957: \bibitem [25]{Lee01} N.-Y. Lee and B. J. Lucier,
2958: \textit{Wavelet Methods for Inverting the Radon Transform with Noisy Data.}
2959: IEEE Trans. Image Processing \textbf{10} (2001), 79--94.
2960:
2961: \bibitem [26]{Li02} M. Li, H. Yang and H. Kudo, \textit{An accurate iterative
2962: reconstruction algorithm for sparse objects: application to 3-D blood-vessel
reconstruction from a limited number of projections.} Phys. Med. Biol.
2964: \textbf{47}
2965: (2002), 2599--2609.
2966:
2967: \bibitem [27]{Lou97} A. K. Louis, P. Maass and A. Rieder, \textit{Wavelets:
2968: Theory and Applications}, Wiley, Chichester, 1997.
2969:
2970: \bibitem [28]{Mal98} S. Mallat, \textit{A Wavelet Tour of Signal
2971: Processing}, 2nd edition, Academic Press, San Diego, 1999.
2972:
2973: \bibitem [29]{Mey92} Y. Meyer, \textit{Wavelets and Operators}, Cambridge
2974: University Press, 1992.
2975:
2976: \bibitem [30]{Nov01} R. Nowak and M. Figueiredo, \textit{Fast wavelet-based
2977: image deconvolution using the EM algorithm.} Conference Record of the
2978: Thirty-Fifth
Asilomar Conference on Signals, Systems and Computers, Vol. 1,
2980: pp. 371--375, 2001.
2981:
2982: \bibitem [31]{Opi67} Z. Opial, \textit{Weak convergence of the sequence of
2983: successive approximations for nonexpansive mappings.} Bull. Amer. Math. Soc.
2984: \textbf{73} (1967), 591--597.
2985:
2986: \bibitem [32]{Tib96} R. Tibshirani, \textit{Regression shrinkage and
2987: selection via the lasso.} J. Royal Statist. Soc. B \textbf{58} (1996),
2988: 267--288.
2989:
2990:
2991: \end{thebibliography}
2992:
2993:
2994:
2995:
2996:
2997: \section*{Appendices}
2998: \appendix
2999:
3000: \section{Wavelets and Besov spaces}
3001: \label{WavBes}
3002:
3003: We give a brief review of basic definitions of wavelets and their
3004: connection with Besov spaces.
3005: This will be a sketch only; for details, we direct the reader to e.g.
3006: \cite{Mey92, DeV98, Coh00, Mal98}.
3007:
3008: For simplicity we start with dimension 1. Starting from a (very special)
3009: function
$\psi$ we define
\begin{equation*}
\psi_{j,k}(x)= 2^{j/2}\ \psi(2^j x-k) ~, \qquad j,k \in \Z~,
\end{equation*}
3012: and we assume that the collection $\{\psi_{j,k}; j,~k \in \Z\}$
3013: constitutes an orthonormal basis of $L^2(\mathbb R)$.
3014: For all wavelet bases used in practical applications, there also exists an
3015: associated
3016: {\em scaling function} $\phi$, which is orthogonal to its
3017: translates by integers, and such
3018: that, for all $j \in \Z$,
3019: \begin{equation}
3020: \label{MRA}
3021: \overline{\mbox{Span}\{\phi_{j,k}; k \in \Z\}}~\mbox{\small{$\bigoplus$}}
3022: ~\overline
3023: {\mbox{Span}\{\psi_{j,k}; k \in \Z\}}
3024: = \overline{\mbox{Span}\{\phi_{j+1,k}; k \in \Z\}} ~,
3025: \end{equation}
3026: where the $\phi_{j,k}$ are defined analogously to the $\psi_{j,k}$.
3027: Typically, the functions $\phi$ and $\psi$ are very well localized, in the
3028: sense
3029: that $\forall N \in \N$, $\int_{\R} (1+|x|)^N(|\phi(x)|+|\psi(x)|) dx <
3030: \infty$;
3031: one can even
3032: choose $\phi$ and $\psi$ such that they are supported on a finite interval.
3033: This can be
3034: achieved with arbitrary finite smoothness, i.e. for any preassigned $L \in
3035: \N$, one can
3036: find such $\phi$ and $\psi$ that are moreover in $C^L(\R)$. Because of
3037: \eref{MRA},
3038: one can consider (inhomogeneous) wavelet expansions, in which not all
3039: scales $j$
3040: are used,
3041: but a cut-off is introduced at some coarsest scale, often set at $j=0$.
3042: More precisely,
3043: we shall use the following wavelet expansion
3044: of $f
3045: \in L^2$,
3046: \begin{equation}
3047: \label{inhMRA}
3048: f= \sum_{k=-\infty}^{+\infty} \left<f,\phi_{0,k}\right> \phi_{0,k} +
3049: \sum_{j=0}^{+\infty}
3050: \sum_{k=-\infty}^{+\infty} \left<f,\psi_{j,k}\right> \psi_{j,k}~.
3051: \end{equation}
3052: Wavelet bases in
3053: higher dimensions can be built by taking appropriate products of
3054: one-dimensional
3055: wavelet and scaling functions. Such $d$-dimensional bases can be viewed as the
3056: result of translating (by elements $k$ of $\Z^d$) and
dilating (by integer powers $j$ of 2)
not just one, but several
3059: (finite in number) ``mother wavelets'', typically numbered from
3060: 1 to $2^d-1$.
3061: It will be convenient to abbreviate the full label (including $j$, $k$ and the
3062: number of the mother wavelet) to just $\lambda$, with the convention that
3063: $|\lambda|=j$.
3064: We shall again cut off at some coarsest scale, and we shall follow the
3065: convenient
3066: slight abuse
3067: of notation used in \cite{Coh00} that sweeps up the coarsest-$j$
3068: scaling functions
3069: (as in \eref{inhMRA}) into the $\Psi_{\lambda}$ as well. We thus denote the
3070: complete
3071: $d$-dimensional, inhomogeneous wavelet basis by
3072: $\{\Psi_{\lambda}; \lambda \in \Lambda\}$.
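
As a concrete illustration, consider the Haar system; it is used here only as the simplest example, and since it is not continuous it does not have the $C^L$ smoothness discussed above. Take
\begin{equation*}
\phi = \chi_{[0,1)} ~, \qquad \psi = \chi_{[0,1/2)} - \chi_{[1/2,1)} ~,
\end{equation*}
where $\chi_I$ denotes the indicator function of the interval $I$. With $\phi_{j,k}(x) = 2^{j/2} \phi(2^j x - k)$, one checks directly that
\begin{equation*}
\phi_{0,0} = \frac{1}{\sqrt{2}} \left( \phi_{1,0} + \phi_{1,1} \right) ~, \qquad
\psi_{0,0} = \frac{1}{\sqrt{2}} \left( \phi_{1,0} - \phi_{1,1} \right) ~,
\end{equation*}
so that $\phi_{0,0}$ and $\psi_{0,0}$ are orthonormal and span the same two-dimensional space as $\phi_{1,0}$ and $\phi_{1,1}$; this is the relation \eref{MRA} for $j=0$, restricted to functions supported on $[0,1)$. In dimension $d=2$, the corresponding mother wavelets are the products $\phi(x_1)\psi(x_2)$, $\psi(x_1)\phi(x_2)$ and $\psi(x_1)\psi(x_2)$, which gives $2^d-1 = 3$ of them.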
3073:
3074: It turns out that $\{\Psi_{\lambda}; \lambda \in \Lambda\}$
3075: is not only an orthonormal basis
3076: for $L^2(\R^d)$, but also an unconditional basis for a variety of other
3077: useful
3078: Banach spaces of functions, such
3079: as H\"older spaces, Sobolev spaces and, more generally, Besov spaces.
3080: Again, we review only some basic facts; a full study can be found in e.g.
3081: \cite{Mey92, DeV98, Coh00}. The Besov spaces $B^s_{p,q}(\R^d)$
3082: consist, basically, of functions that ``have $s$ derivatives in $L^p$'';
3083: the parameter $q$ provides
3084: some additional fine-tuning to the definition of these spaces. The norm
3085: $\|f\|_{_{B^s_{p,q}}}$ in a Besov space $B^s_{p,q}$ is traditionally
3086: defined via the
{\em modulus of continuity} of $f$ in $L^p(\R^d)$, of which an additional
3088: weighted $L^q$-norm
3089: is then taken, in which the integral is over different scales.
3090: We shall not give its details here; for our purposes it suffices that
this traditional Besov norm is equivalent to a norm that can be computed from
3092: wavelet coefficients. More precisely, let us assume that the original
3093: 1-dimensional $\phi$ and
3094: $\psi$ are in $C^L(\R)$, with $L>s$, that
3095: $\sigma=s+d(\frac{1}{2}-\frac{1}{p}) \geq 0$,
3096: and define the norm $\VVert \cdot \VVert_{_{s;p,q}}$ by
3097: \begin{equation}
3098: \label{triple}
3099: \VVert f \VVert _{_{s;p,q} }= \left( \sum_{j=0}^{\infty} \left(2^{j \sigma
3100: p} \sum_{
3101: \lambda \in \Lambda ,
3102: |\lambda |=j }|\left<f,\Psi_{\lambda}\right>|^p\right)^{q/p}\right)^{1/q} ~~.
3103: \end{equation}
3104: Then this norm is equivalent to the traditional Besov norm,
$\VVert f \VVert _{_{s;p,q}}\sim\| f \|_{_{B^s_{p,q}}}$, that is, there exist
3106: strictly positive constants $A$ and $B$ such that
3107: \begin{equation}
3108: \label{Besnor}
3109: A \VVert f \VVert_{_{s;p,q}} \leq \| f \| _{_{B^s_{p,q}}}
\leq B \VVert f \VVert_{_{s;p,q}} ~.
3111: \end{equation}
3112: The condition that $\sigma \geq 0$ is imposed to ensure
3113: that $B^s_{p,q}(\R^d)$ is a subspace
3114: of $L^2(\R^d)$; we shall restrict ourselves to this case in this paper.
3115: From \eref{triple} one can gauge the fine-tuning role played by the
3116: parameter $q$
3117: in the definition of the Besov spaces. A particularly convenient choice, to
3118: which we
3119: shall stick in the remainder of this paper, is $q=p$,
3120: for which the expression simplifies
3121: to
3122: \begin{equation*}
3123: %\label{triple-simple}
3124: \VVert f \VVert_{_{s,p}} = \left( \sum_{\lambda \in \Lambda} 2^{\sigma
3125: p|\lambda|} ~
3126: | \left< f, \Psi_{\lambda} \right> |^p \right)^{1/p} ~~;
3127: \end{equation*}
3128: to alleviate notation, we shall drop the extra index $q$ wherever it
3129: normally occurs,
3130: on the understanding that $q=p$ when we do so.
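
To make the simplification explicit: when $q=p$, the exponent $q/p$ in \eref{triple} equals $1$, so that the sum over scales and the sum within each scale merge into a single sum over all labels,
\begin{equation*}
\sum_{j=0}^{\infty} 2^{j \sigma p} \sum_{\lambda \in \Lambda , |\lambda |=j }
|\left<f,\Psi_{\lambda}\right>|^p
= \sum_{\lambda \in \Lambda} 2^{\sigma p|\lambda|} ~
| \left< f, \Psi_{\lambda} \right> |^p ~.
\end{equation*}
As a numerical illustration (the parameter values are chosen here only as an example), for $d=2$, $s=1$ and $p=1$ one finds $\sigma = 1 + 2 \left(\frac{1}{2} - 1\right) = 0$, so that all the weights equal $1$ and the norm reduces to the unweighted $\ell^1$-norm of the wavelet coefficients. For $p=2$ one has $\sigma = s$ in every dimension $d$, and the weights are $2^{2 s |\lambda|}$.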
3131:
3132: When $0<p,~q<1$, the Besov spaces can still be defined as complete metric spaces,
although they are no longer Banach spaces (because (\ref{triple}) no longer is a norm).
This allows for more local variability in smoothness than
3135: is typical for functions
3136: in the usual H\"older
3137: or Sobolev spaces. For instance, a real function $f$ on $\R$ that is piecewise
3138: continuous, but for which each piece is locally in $C^s$, can be an element of
3139: $B^s_p(\R)$, despite the possibility of discontinuities at the
3140: transition from one piece to
3141: the next, provided $p>0$ is sufficiently small, and
3142: some technical conditions are met on the number and
3143: size of the discontinuities, and on the decay at $\infty$ of $f$.
3144: % Moreover,
3145: %this is
3146: %the case for any finite value of $s>0$.
3147:
3148: Wavelet bases are thus closely linked to a rich class
3149: of smoothness spaces; they also provide a good tool for high accuracy
3150: nonlinear approximation of a wide variety of functions.
3151: For instance, if the bounded function $f$
3152: on $[0,1]$ has only finitely many discontinuities, and is $C^s$ elsewhere,
3153: then one can find a way of renumbering (dependent on $f$ itself)
3154: the wavelets in the standard wavelet
3155: expansion of $f$, so that the distance in, say, $L^2([0,1])$ between $f$
3156: and the first $N$ terms of this reordered wavelet expansion, decreases
3157: proportionally to $N^{-s}$.
3158: If $s$ is large, it follows that a very accurate approximation to $f$ can be
3159: obtained with relatively few wavelets; this
3160: is possible because
3161: the smooth patches of the piecewise continuous
3162: $f$ will be well approximated by coarse scale wavelets, which are
3163: few in number; to capture the behavior of $f$ near
3164: the discontinuities much more localized
3165: finer scale wavelets are required, but only
those wavelets located near
3167: the discontinuities will be needed, which amounts
3168: again to a small number.
3169:
3170: In higher dimensions, $d > 1$, the suitability of wavelets
3171: is influenced by the
3172: dimension of the manifolds on which singularities occur. If the
3173: singularities in the
3174: functions of interest are solely point singularities, then expansions
3175: using $N$ wavelets can still approximate such functions with distances that
3176: decrease like $N^{-s}$,
3177: depending on their behavior away
3178: from the
3179: singularities. If, however, we are interested in $f$ that may have, e.g.
3180: discontinuities
3181: along manifolds of dimension higher than 0, then such wavelet
3182: approximations are not optimal.
3183: For instance, if $f:\R^2 \rightarrow \R$ is piecewise $C^L$, with
3184: possible jumps across the
3185: boundaries of the smoothness domains, which are themselves smooth
3186: (say, $C^L$ again) curves,
3187: then $N$-term wavelet approximations to $f$ cannot achieve an error
3188: rate decay faster than $N^{-1/2}$,
3189: regardless of the
3190: value of $L>1$.
3191:
3192: It follows that whenever we are faced with an inverse problem
3193: that needs regularization,
3194: in which the objects to be restored are expected to be mostly smooth,
3195: with very localized
3196: lower dimensional areas of singularities,
3197: we can expect that their expansions into wavelets
3198: will be sparse. This sparsity can be expressed by requiring that
3199: the wavelet coefficients (possibly with some scale-dependent weight)
3200: have a finite (or small) $\ell^p$-norm,
3201: with $1\leq p \leq 2$, or equivalently that the Besov-equivalent norm $\VVert f
3202: \VVert_{_{s,p}}$ is finite (or small), where $\VVert f \VVert_{_{s,p}}$
3203: is exactly of the form
3204: $\Phi_{\mathbf w,p}$ defined in \eref{funct-gen}.
3205:
3206: \section{ A fixed-point theorem}
3207: \label{Opial}
3208:
3209: We provide here the proof of the theorem needed to establish the weak
3210: convergence of the iterative algorithm. The theorem is given in \cite{Opi67};
3211: we give a simplified proof here (see the remark at the end),
3212: which nevertheless still follows the
3213: main lines of Opial's paper.
3214:
3215: \begin{theorem}
3216: \label{FPThm}
3217: Let ${\cal C}$ be a closed convex subset of the Hilbert space $\cH$ and let
3218: the mapping $\A : {\cal C} \to {\cal C}$ satisfy the following conditions:
3219: \begin{enumerate}
3220: \item[{\rm (i)}] $\A $ is non-expansive: $\| \A v - \A
3221: v' \| \leq \| v - v'\|,\ \forall v,v' \in {\cal C}$~,
3222: \item[\rm{ (ii)}] $\A $ is asymptotically regular: $\| \A ^{n+1}
3223: v -\A ^n v\| \xrightarrow[n \to \infty]{~} 0,\ \forall v \in
3224: {\cal C}$~,
3225: \item[\rm{ (iii)}] the set ${\cal F}$ of the fixed points of $\A $ in
3226: ${\cal C}$ is not empty~.
3227: \end{enumerate}
3228: Then, $\forall v \in \cal C$, the sequence $(\A ^n v)_{n \in \mathbb{N}}$
3229: converges weakly to a fixed point in ${\cal F}$.
3230: \end{theorem}
3231: The proof of the main theorem will follow from a series of lemmas.
3232: As before, we use the notation {\em w}$\,$--$\lim$ to indicate a {\em weak}
3233: limit.
3234: \begin{lemma}
3235: \label{FP1}
3236: If $u,v \in\cH$, and if $(v_n)_{n \in \N}$ is a sequence in $\cH$ such that
3237: w--$\lim_{n \to \infty}v_n = v$, and $u \neq
3238: v$, then
3239: $\lim\inf_{n \to \infty} \| v_n - u\| > \lim\inf_{n \to \infty}\| v_n - v\|~$.
3240: \end{lemma}
3241: {\em Proof:}
3242: We have $\lim\inf_{n \to \infty}\| v_n - u\|^2 $
3243: $= \lim\inf_{n \to \infty}\| v_n - v\|^2 +
\| v - u\|^2 + 2 \lim_{n \to \infty} \mathrm{Re} \left<v_n-v,v-u\right>$
$= \lim\inf_{n \to \infty}\| v_n - v\|^2 +\| v - u\|^2~$,
where the last limit vanishes because $v_n$ converges weakly to $v$.
Since $\| v - u\| > 0$, the result follows.
3247: \hfill\QED
3248:
3249: \bigskip
3250:
3251: \begin{lemma}
3252: \label{FP2}
3253: Suppose that $\A:\cal C \rightarrow \cal C$ satisfies condition
3254: {\rm(i)} in Theorem \ref{FPThm}.\\
3255: If w--$\lim_{n \to \infty}u_n= u$, and
3256: $\lim_{n \to \infty} \|u_n -\A u_n -h\| =0~$, then
3257: $h = u -\A u$\ .
3258: \end{lemma}
3259: {\em Proof:}
3260: Because of the non-expansivity of $\A $ (assumption (i)),
3261: we have
3262: $\| u_n - (h + \A u)\| $
3263: $\leq \| u_n - h - \A u_n\|$ $+\| \A u_n - \A u\| $
3264: $\leq \| u_n - h - \A u_n\|$ $ +\| u_n - u \|~$.
3265: Hence,
3266: \begin{eqnarray*}
3267: {\mathop{\rm lim\ inf}_{n \to \infty}}\ \| u_n - (h + \A u)\|
3268: &\leq&
3269: {\mathop{\rm lim}_{n \to \infty}}\ \| h - (u_n - \A u_n)\| +
3270: {\mathop{\rm lim\ inf}_{n \to \infty}}\ \| u_n - u\| \\
3271: &=&
3272: {\mathop{\rm lim\ inf}_{n \to \infty}}\ \| u_n - u\|
3273: \end{eqnarray*}
3274: It then follows from Lemma \ref{FP1} that $u = h + \A u$ or $h = u -
3275: \A u$.
3276: \hfill\QED
3277:
3278: \bigskip
3279:
3280: \begin{lemma}
3281: \label{FP3}
3282: Suppose that $\A :\,\cal C \rightarrow \cal C$ satisfies conditions
3283: {\rm(i)} and {\rm(ii)} in Theorem \ref{FPThm}.
3284: If a subsequence of $(\A ^n v)_{n\in \mathbb{N}}$, with $v \in {\cal C}$,
3285: converges weakly in ${\cal C}$, then its limit is in $\cal F$.
3286: \end{lemma}
3287: {\em Proof:}
3288: Suppose {\em w}$\,$--$\lim_{k \to \infty}\A ^{n_k}v= u~$.
3289: Since, by the assumption (ii) of asymptotic
3290: regularity,
3291: $\lim_{n \to \infty}\|\A ^{n} v - \A \A ^{n} v \|=0~$,
3292: we have
3293: $\lim_{k \to \infty}\|\A ^{n_k} v - \A \A ^{n_k} v \|=0~$.
3294: By Lemma \ref{FP2}, it
3295: follows that $u - \A u = 0$ , i.e. that $u$ is in $\cal F$.
3296: \hfill\QED
3297:
3298: \bigskip
3299:
3300: \begin{lemma}
3301: \label{FP4}
3302: Suppose that $\A :\,\cal C \rightarrow \cal C$ satisfies conditions
3303: {\rm(i)} and {\rm(iii)} in Theorem \ref{FPThm}. Then,
3304: for all $h \in {\cal F}$, and all $v \in {\cal C}$, the sequence
3305: $(\|\A ^n v -
3306: h\|)_{n\in \N}\ $ is non-increasing and thus has a limit.
3307: \end{lemma}
3308: {\em Proof:}
3309: Since $\A $ is non-expansive, we have indeed
3310: $\| \A ^{n+1} v - h \|$
3311: $= \| \A \A ^n v -\A h\| $
3312: $\leq \| \A ^{n} v - h \|~.$
3313: $~~~~~~~~~~~~~~~~~~~~~~~~$\hfill\QED
3314:
3315: \bigskip
3316:
3317: We can now proceed to the
3318:
3319: \bigskip
3320:
3321: \noindent
3322: {\bf Proof of Theorem \ref{FPThm}}
3323:
3324: \noindent
3325: Let $v$ be any element in $\cal{C}$. Take an arbitrary $h \in {\cal F}$.
3326: By Lemma \ref{FP4}, we then have \\
3327: $\lim\sup_{n \to \infty}\| \A ^n v\|$
3328: $ \leq
3329: \lim\sup_{n \to \infty} \|\A ^n v - h\|$
3330: $+\| h\|$
3331: $ = \| h\| $ $+ \lim_{n \to \infty}\ \|\A ^n v - h\|$
3332: $< \infty~$.
3333:
3334: \noindent
3335: Since the $\|\A ^n v\|$ are thus uniformly bounded,
it follows from the Banach-Alaoglu theorem that the sequence
$(\A ^n v)_{n \in \N}$ must have at least one weak accumulation point.
3338:
3339: \noindent
3340: The following argument shows that this
3341: accumulation point is unique.
3342: Suppose we have two different accumulation points :
3343: {\em w}$\,$--$\lim_{k \to \infty}\A ^{n_k} v =u$, and
3344: {\em w}$\,$--$\lim_{\ell \to \infty}\A ^{{\tilde n}_\ell} v =\tilde{u}~$,
3345: with $u \neq {\tilde u}$.
3346:
3347: \noindent
3348: By Lemma \ref{FP3}, $u$ and $\tilde u$ must both lie in $\cal F$,
3349: and by Lemma \ref{FP4},
3350: the limits $\lim_{n \to \infty} \|\A ^n v -
3351: u\|$ and $\lim_{n \to \infty} \|\A ^n v -
3352: {\tilde u} \|$ both exist.
3353:
3354: \noindent
3355: Since $\tilde{u} \neq u$, we obtain from Lemma \ref{FP1} that
3356: $\lim\inf_{k \to \infty} \|\A ^{n_k} v - {\tilde u}\| $
3357: $ > {\lim\inf}_{k \to \infty} \|\A ^{n_k} v - u\|\ .$
3358: On the other hand,
3359: because $(\|\A ^{n_k} v-{\tilde u}\|)_{k\in \mathbb{N}}$ and
3360: $(\|\A ^{n_k} v-u\|)_{k\in \mathbb{N}}$
3361: are each a
3362: subsequence of a convergent sequence,
3363: $\lim\inf_{k \to \infty} \|\A ^{n_k} v - {\tilde u}\|$ =
3364: $\lim_{n \to \infty} \|\A ^n v - {\tilde u}\|$ and
3365: $\lim\inf_{k \to \infty} \|\A ^{n_k} v - u\|$ =
$ \lim_{n \to \infty} \|\A ^{n} v - u\|~$.
3367: It follows that
3368: $\lim_{n \to \infty} \|\A ^{n} v - {\tilde
3369: u}\|$ $> \lim_{n \to \infty} \|\A ^n v -u\| ~$.
3370: In a completely analogous way (working with the subsequence
$\A ^{{\tilde n}_\ell} v$ instead of $\A ^{n_k} v$) one derives
3372: the opposite strict inequality. Since both cannot be valid
3373: simultaneously, the assumption of the existence of two different
3374: weak accumulation points for $(\A ^n v)_{n\in \N}$ is false.
3375:
3376: \noindent
3377: It thus follows that $\A ^n v$ converges weakly to this unique
3378: weak accumulation point.
3379: \hfill\QED
3380:
3381: \bigskip
3382:
3383: \begin{remark}
3384: {\rm It is essential to require that the set $\cal F$ is not empty since
3385: there are
3386: asymptotically regular, non-expansive maps that possess no fixed point.
3387: However, the only place where we used this assumption was in showing that the
3388: $\|\A ^n v \|$ were bounded. If one can prove this boundedness by
3389: some other means (e.g. by a variational principle as we did in the iterative
3390: algorithm), then we automatically have a weakly convergent subsequence
$(\A ^{n_k} v)_{k\in \mathbb{N}}$, and thus, by Lemma
3392: \ref{FP3}, an element of $\cal F$.}
3393: \end{remark}
3394:
3395: \begin{remark}
3396: {\rm The simplification of the original argument of \cite{Opi67}
3397: (obtained through
3398: deriving the contradiction in the proof of Theorem \ref{FPThm}) avoids having
3399: to appeal to the convexity of $\cal F$ (which is true but not immediately
3400: obvious) and having to introduce the auxiliary sets $\cal F_\delta$ used in
3401: \cite{Opi67}.}
3402: \end{remark}
3403:
3404: \end{document}
3405:
3406: -----------------------
3407:
3408: \subsection*{1.4.1 Sparse wavelet expansions.}
3409:
3410: Wavelets provide orthonormal bases of $L^2(\R^d)$ with localization
3411: in space and in scale; this makes them more suitable than e.g.
3412: the Fourier representation for an efficient
3413: representation of functions that have space-varying smoothness properties.
3414: Appendix \ref{WavBes} gives a very succinct overview of wavelets and their
3415: link with
3416: a particular family of smoothness spaces, the Besov spaces. Essentially,
3417: the Besov space $B^s_{p,q}(\R^d)$ is a space of functions on
$\R^d$ that ``have $s$ derivatives in $L^p(\R^d)$''; the
3419: index $q$ provides some extra fine-tuning. The
3420: precise definition involves the moduli of continuity of the function,
3421: defined by finite differencing, instead of derivatives, and combines
3422: the behavior of these moduli at different scales. The result is that
3423: functions that are mostly smooth, but have a few local
3424: ``irregularities'', nevertheless can still belong to a Besov space with
3425: high smoothness index. For instance, the 1-dimensional function
3426: $F(x)= \mbox{sign} (x) e^{-x^2}$ belongs to $B^s_{p,q}(\R)$ for
3427: arbitrarily large $s$, and all $p,q \in [1,\infty)$. (Note that
3428: this same example does not belong to any of the Sobolev spaces
3429: $W^s_p(\R)$ with $s>0$.) As reviewed in Appendix \ref{WavBes},
3430: wavelet expansions provide an equivalent norm for the Besov spaces,
3431: which is particularly simple in the case $p=q$, to which we shall restrict
3432: ourselves here. We denote this norm by $\VVert ~ \VVert_{ _{s,p}}$;
3433: it is defined by
3434: (see Appendix \ref{WavBes})
3435: \begin{equation}
3436: %\label{triple-simple}
3437: \VVert f \VVert_{ _{s,p}} = \left( \sum_{\lambda \in \Lambda} 2^{\sigma p
3438: |\lambda|} |\left<f, \Psi_{\lambda} \right> |^p \right) ^{1/p}~,
3439: \end{equation}
3440: where $\sigma$ depends on $s,p$ and is defined by $\sigma =s +d \left(
3441: \frac{1}{2}-\frac{1}{p} \right)$,
3442: and where $|\lambda|$ stands for the scale of the wavelet
3443: $\Psi_{\lambda}$. (The $\frac{1}{2}$ in the formula for $\sigma$ is
3444: due to the choice of normalization
3445: of the $\Psi_{\lambda}$, $\|\Psi_{\lambda}\|_{_{L^2}} =1$.)
3446:
3447:
3448: It follows that minimizing
3449: the variational functional for an inverse
3450: problem with a Besov space prior falls exactly within the category of
3451: problems studied in this paper: for such an inverse problem,
3452: with operator $K$ and with
3453: the a priori knowledge that the object lies in some $B^s_{p,p}$, it
3454: is natural to define the
3455: variational functional to be minimized
3456: by
$$
\Delta(f)+ \mu \VVert f \VVert _{ _{s,p}}^p = \|Kf-g\|^2 +\mu \sum_{\lambda \in
\Lambda} 2^{\sigma p |\lambda|} |\left< f, \Psi_{\lambda} \right> |^p ~~,
$$
which is exactly of the type
$\Phi_{\mathbf{w},p}(f)$, as defined in \eref{funct-gen}.
For the case where $K$ is the identity operator {\bf and $\sigma =0$ ??}, it
was noted already in \cite{Cha98}
that the wavelet-based algorithm for denoising
of data with a Besov prior, derived earlier in \cite{Don94},
amounts exactly to the minimization of
$\Phi_{\mu\mathbf{w}_{ _0},1}(f)$, with the $\vpg$-basis
taken to be a wavelet basis; the
denoised approximant given in \cite{Don94} then coincides
with (\ref{simple},\ref{stau}) with $\tau=\mu$.
3472:
3473: It should be noted that if $d > 1$, and if we are interested in functions that
3474: are mostly smooth, with possible jump discontinuities
3475: (or other ``irregularities'')
3476: on smooth manifolds of dimension 1 or higher (i.e. not point irregularities),
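then the Besov spaces do not constitute the optimal framework.
For $d=2$, for instance, functions on $[0,1]^2$ that are smooth except for
possible jumps across a finite set of smooth curves
belong to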
3479: $B^1_{1,1}([0,1]^2)$, but not to $B^s_{1,1}([0,1]^2)$ for $s>1$.
3480: In order to obtain
3481: more efficient (sparser) expansions of this type of functions, other bases
3482: have to be used, such as ridgelets or curvelets.
3483: {\bf References!}
3484: One can then again use the approach in this paper, with respect to these
3485: more adapted bases.
3486:
3487: ********
3488:
3489: \section{Wavelets and Besov spaces}
3490: \label{WavBes}
3491:
3492: We give a brief review of basic definitions of wavelets and their
3493: connection with Besov spaces.
3494: This will be a sketch only; for details, we direct the reader to e.g.
3495: \cite{Mal98, Meyer, Lou97, Coh00, DeVore}.
3496:
3497: For simplicity we start with dimension 1. Starting from a (very special)
3498: function
$\psi$ we define
\begin{equation*}
\psi_{j,k}(x)= 2^{j/2}\ \psi(2^j x-k) ~, \qquad j,k \in \Z~,
\end{equation*}
3501: and we assume that the collection $\{\psi_{j,k}; j,~k \in \Z\}$
3502: constitutes an orthonormal basis of $L^2(\mathbb R)$.
3503: For all wavelet bases used in practical applications, there also exists an
3504: associated
3505: {\em scaling function} $a$, which is orthogonal to its
3506: translates by integers, and such
3507: that, for all $j \in \Z$,
3508: \begin{equation}
3509: \label{MRA}
\overline{\mbox{Span}\{a_{j,k}; k \in \Z\}}~\mbox{\small{$\bigoplus$}}
3511: ~\overline
3512: {\mbox{Span}\{\psi_{j,k}; k \in \Z\}}
3513: = \overline{\mbox{Span}\{a_{j+1,k}; k \in \Z\}} ~,
3514: \end{equation}
3515: where the $a_{j,k}$ are defined analogously to the $\psi_{j,k}$.
3516: Typically, the functions $a$ and $\psi$ are very well localized, in the sense
3517: that $\forall N \in \N$, $\int_{\R} (1+|x|)^N(|a(x)|+|\psi(x)|) dx <
3518: \infty$;
3519: one can even
3520: choose $a$ and $\psi$ such that they are supported on a finite interval.
3521: This can be
3522: achieved with arbitrary finite smoothness, i.e. for any preassigned $L \in
3523: \N$, one can
3524: find such $a$ and $\psi$ that are moreover in $C^L(\R)$. Because of
3525: \eref{MRA},
3526: one can consider (inhomogeneous) wavelet expansions, in which not all
3527: scales $j$
3528: are used,
3529: but a cut-off is introduced at some coarsest scale, often set at $j=0$.
3530: More precisely,
3531: we shall use the following wavelet expansion
3532: of $f
3533: \in L^2$,
3534: \begin{equation}
3535: \label{inhMRA}
3536: f= \sum_{k=-\infty}^{+\infty} \left<f,a_{0,k}\right>\ a_{0,k} +
3537: \sum_{j=0}^{+\infty}
3538: \sum_{k=-\infty}^{+\infty} \left<f,\psi_{j,k}\right>\ \psi_{j,k}~.
3539: \end{equation}
3540: Wavelet bases in
3541: higher dimensions can be built by taking appropriate products of
3542: one-dimensional
3543: wavelet and scaling functions. Such $d$-dimensional bases can be viewed as the
3544: result of translating (by elements $k$ of $\Z^d$) and
3545: dilating (by integer powers $j$ of 2) of
3546: not just one, but several
3547: (finite in number) ``mother wavelets'', typically numbered from
3548: 1 to $2^d-1$.
3549: It will be convenient to abbreviate the full label (including $j$, $k$ and the
3550: number of the mother wavelet) to just $\lambda$, with the convention that
3551: $|\lambda|=j$.
3552: We shall again cut off at some coarsest scale, and we shall follow the
3553: convenient
3554: slight abuse
3555: of notation used in \cite{CohDahmDev} that sweeps up the coarsest-$j$
3556: scaling functions
3557: (as in \eref{inhMRA}) into the $\Psi_{\lambda}$ as well. We thus denote the
3558: complete
3559: $d$-dimensional, inhomogeneous wavelet basis by
3560: $\{\Psi_{\lambda}; \lambda \in \Lambda\}$.
3561:
3562: It turns out that $\{\Psi_{\lambda}; \lambda \in \Lambda\}$
3563: is not only an orthonormal basis
3564: for $L^2(\R^d)$, but also an unconditional basis for a variety of other
3565: useful
3566: Banach spaces of functions, such
3567: as H\"older spaces, Sobolev spaces and, more generally, Besov spaces.
3568: Again, we review only some basic facts; a full study can be found in e.g.
3569: \cite{Meyer, Coh00,
3570: DeVore}. The Besov spaces $B^s_{p,q}(\R^d)$
3571: consist, basically, of functions that ``have $s$ derivatives in $L^p$'';
3572: the parameter $q$ provides
3573: some additional fine-tuning to the definition of these spaces. The norm
3574: $\|f\|_{_{B^s_{p,q}}}$ in a Besov space $B^s_{p,q}$ is traditionally
3575: defined via the
3576: {\em modulus of continuity} of $f$ in $L^p(\R)$, of which an additional
3577: weighted $L^q$-norm
3578: is then taken, in which the integral is over different scales.
3579: We shall not give its details here; for our purposes it suffices that
3580: this traditional Besov norm is equivalent with a norm that can be computed from
3581: wavelet coefficients. More precisely, let us assume that the original
3582: 1-dimensional $a$ and
3583: $\psi$ are in $C^L(\R)$, with $L>s$, that
3584: $\sigma=s+d(\frac{1}{2}-\frac{1}{p}) \geq 0$,
3585: and define the norm $\VVert \cdot \VVert_{_{s;p,q}}$ by
3586: \begin{equation}
3587: \label{triple}
3588: \VVert f \VVert _{_{s;p,q} }= \left( \sum_{j=0}^{\infty} \left(2^{j \sigma
3589: p} \sum_{
3590: \lambda \in \Lambda ,
3591: |\lambda |=j }|\left<f,\Psi_{\lambda}\right>|^p\right)^{q/p}\right)^{1/q} ~~.
3592: \end{equation}
3593: Then this norm is equivalent to the traditional Besov norm,
$\VVert f \VVert _{_{s;p,q}}\sim\| f \|_{_{B^s_{p,q}}}$, that is, there exist
strictly positive constants $A$ and $B$ such that
\begin{equation}
\label{Besnor}
A \VVert f \VVert_{_{s;p,q}} \leq \| f \| _{_{B^s_{p,q}}}
\leq B \VVert f \VVert_{_{s;p,q}} ~.
3600: \end{equation}
3601: The condition that $\sigma \geq 0$ is imposed to ensure
3602: that $B^s_{p,q}(\R^d)$ is a subspace
3603: of $L^2(\R^d)$; we shall restrict ourselves to this case in this paper.
From \eref{triple} one can gauge the fine-tuning role played by the
3605: parameter $q$
3606: in the definition of the Besov spaces. A particularly convenient choice, to
3607: which we
3608: shall stick in the remainder of this paper, is $q=p$,
3609: for which the expression simplifies
3610: to
3611: \begin{equation}
3612: \label{triple-simple}
\VVert f \VVert_{_{s,p}} = \left( \sum_{\lambda \in \Lambda} 2^{\sigma p
|\lambda|} ~
| \left< f, \Psi_{\lambda} \right> |^p \right)^{1/p} ~~;
3616: \end{equation}
3617: to alleviate notation, we shall drop the extra index $q$ wherever it
3618: normally occurs,
3619: on the understanding that $q=p$ when we do so.
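
As a quick illustration of how the weights in \eref{triple-simple} depend on
the parameters (the particular values are chosen here only as an example),
take $d=2$ and $s=p=1$. Then
\begin{equation*}
\sigma = s + d\left(\frac{1}{2}-\frac{1}{p}\right)
= 1 + 2\left(\frac{1}{2}-1\right) = 0~,
\qquad \mbox{so that} \qquad
\VVert f \VVert_{_{1,1}} = \sum_{\lambda \in \Lambda}
|\left< f, \Psi_{\lambda} \right>|~,
\end{equation*}
i.e. the Besov-equivalent norm reduces in this case to the unweighted
$\ell^1$-norm of the wavelet coefficients. Similarly, in dimension $d=1$
with $p=1$ one finds $\sigma = s-\frac{1}{2}$, so that the requirement
$\sigma \geq 0$ amounts to $s \geq \frac{1}{2}$.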
3620:
3621: Besov spaces allow more local variability in local smoothness than
3622: is typical for functions
3623: in the usual H\"older
3624: or Sobolev spaces. For instance, a real function $f$ on $\R$ that is piecewise
3625: continuous, but for which each piece is locally in $C^s$, can be an element of
3626: $B^s_1(\R)$, despite the possibility of discontinuities at the
3627: transition from one piece to
3628: the next, provided some technical conditions are met on the number and
3629: size of the discontinuities, and on the decay at $\infty$ of $f$. Moreover,
3630: this is
3631: the case for any finite value of $s>0$.
3632: This observation makes the Besov spaces particularly suitable when we are
3633: interested
3634: in functions that have such locally variable smoothness characteristics.
3635: In higher dimensions, $d > 1$, the suitability of the Besov spaces
3636: is influenced by the
3637: dimension of the manifolds on which singularities occur. If the
3638: singularities in the
3639: functions of interest are solely point singularities, then these
3640: functions can still lie
3641: in Besov spaces with high values for $s$, depending on their behavior away
3642: from the
3643: singularities. If, however, we are interested in $f$ that may have, e.g.
3644: discontinuities
3645: along manifolds of dimension higher than $0$, then Besov spaces are not
3646: optimal.
3647: For instance, if $f:\R^2 \rightarrow \R$ is piecewise $C^L$, with
3648: possible jumps across the
3649: boundaries of the smoothness domains, which are themselves smooth
3650: (say, $C^L$ again) curves,
3651: then $f$ cannot lie in a Besov space of smoothness larger than 1,
3652: regardless of the
3653: value of $L>1$. Nevertheless, even an index $s=1$ is a gain over what would
3654: have been
3655: obtained by a H\"older or $L^2$-based Sobolev characterization.
3656:
3657: Since wavelet bases allow particularly easy characterizations of Besov spaces,
3658: as shown by (\ref{triple}, \ref{Besnor}), it follows that they are
3659: also well suited (and in
3660: 1 dimension, optimally suited) for the functions with local smoothness
3661: variability
3662: described above. Moreover, the smooth patches of these piecewise continuous
3663: functions
3664: will be well approximated by coarse scale wavelets, of which there are much
3665: fewer, while
3666: the discontinuities will be well captured by the much more localized
3667: finer scale wavelets,
3668: of which only select subfamilies will be needed, located exactly near
3669: the discontinuities.
3670:
3671: It follows that whenever we are faced with an inverse problem
3672: that needs regularization,
3673: in which the objects to be restored are expected to be mostly smooth,
3674: with very localized
3675: lower dimensional areas of singularities, we have a ``natural'' framework
3676: in the
3677: family of Besov spaces, and we can expect that their expansions into wavelets
3678: will be sparse.
3679: Moreover, the $p$-th power of the Besov-equivalent norm $\VVert f
3680: \VVert_{_{s,p}}$
3681: is exactly of the form
3682: $\Phi_{\mathbf w,p}$ defined in \eref{triple-norm}.
3683:
3684: ********
3685:
3686: Remember that the iterative algorithm goes as follows
3687: \begin{itemize}
3688: \item Choose $f^0$ arbitrarily.
3689: \item Define the $(n+1)^{\rm th}$ iterate as
3690: \begin{equation} f^{n+1}={\mathbb T}\ f^n\ .
3691: \end{equation}
3692: \end{itemize}
3693: where the mapping $\mathbb T$ is defined by
3694: \begin{equation} {\mathbb T}\ a ={\mathop{\rm arg\ min}_{f}}\ \Psi_{a}(f)\ .
3695: \label{defT}
3696: \end{equation}
3697: Using an expansion of $a$ on a basis of $L^2$ which diagonalizes the prior
3698: $\Omega(f)$, e.g. on the wavelet basis $\{\psi_\lambda\}$, the variational
3699: problem (\ref{defT}) can be solved through one-dimensional minimization
3700: problems, by using
3701: the minimizer $S(y)$
3702: of the following function of the real variable $x$
3703: \begin{equation}
3704: \varphi(x) = (x-y)^2 + 2\tau \vert x \vert^p \label{1dfunc}
3705: \end{equation}
3706: for a given $y \in {\mathbb R}$ and $\tau > 0$.
3707: The properties of $S(y)$ are investigated in Appendix B. When $p=1$, $S(y)$ has
3708: an explicit expression in terms of thresholding (with a threshold depending
3709: on $\mu$ and $\sigma$) whereas for $1 < p < 2$ it has to be determined
3710: numerically but also acts as a nonlinear shrinkage of the data.
We define the corresponding shrinkage operator ${\S}$ as
\begin{equation}
{\S}\ a=\sum_\lambda S(a_\lambda) \ \psi_\lambda\ ;
\ a_\lambda=(a,\psi_\lambda)\ . \label{Sbb}
\end{equation}
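
To make the case $p=1$ concrete, the minimizer of \eref{1dfunc} can be
worked out by hand; the following is only a sketch, using nothing beyond
the definition of $\varphi$. For $x>0$ we have
$\varphi'(x)=2(x-y)+2\tau$, which vanishes at $x=y-\tau$; this point is
positive only if $y>\tau$. For $x<0$ we have $\varphi'(x)=2(x-y)-2\tau$,
which vanishes at $x=y+\tau$; this point is negative only if $y<-\tau$.
If $|y| \leq \tau$, then $\varphi$ is decreasing on $x<0$ and increasing on
$x>0$, so that its minimum is attained at $x=0$. Altogether,
\begin{equation*}
S(y) = \begin{cases}
y-\tau & \mbox{if } y > \tau~, \\
0 & \mbox{if } |y| \leq \tau~, \\
y+\tau & \mbox{if } y < -\tau~,
\end{cases}
\end{equation*}
which is the soft-thresholding of $y$ at the threshold $\tau$.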
3716: \begin{proposition} \label{mapT}
3717: The vector defined by
3718: \begin{equation}
3719: {\mathbb T}\; a= {\S}\; [a + (K^*g-K^*Ka)] \label{Ta}
3720: \end{equation}
3721: where the shrinkage operator is defined by (\ref{Sbb}), provides a
3722: minimizer for
3723: the functional
3724: $\Psi_a(f)$, i.e. we have
3725: \begin{equation}
3726: \Psi_a({\mathbb T}\; a +h) \geq \Psi_a({\mathbb T}\; a) + \| h \|^2\
3727: ,\quad
3728: \forall h \in L^2\ .\label{Parmin}
3729: \end{equation}
3730: \end{proposition}
3731: {\em Proof:}
3732: We notice that $\Psi_a$ can be rewritten
3733: as follows
3734: \begin{equation}
3735: \Psi_a(f) = \| f\|^2 - 2(f,c)+\mu \Vvert f \Vvert^p
+\| g\|^2 +\| a\|^2 - \| Ka\|^2\ ,
3737: \end{equation}
3738: where $c$ is defined by (\ref{def c}), or else, using expansions on the basis
3739: $\{\psi_\lambda\}$,
3740: \begin{equation}
3741: \Psi_a(f) = \sum_\lambda [\vert f_\lambda - c_\lambda \vert^2 + \mu\
3742: 2^{|\lambda|\sigma p} \vert f_\lambda \vert^p] + {\rm terms\
3743: independent\ of\ } f.
3744: \end{equation}
3745: We see that
3746: each term in the sum is minimized separately by the minimizer $S(c_\lambda)$ of
3747: the function (\ref{1dfunc}) for $x=f_\lambda$, $y=c_\lambda$ and $2\tau =
3748: \mu\ 2^{|\lambda|\sigma p}$. The properties of this minimizer are
3749: studied in detail in Appendix B. The property (\ref{Parmin}) is an immediate
3750: consequence of Lemma \ref{1dparmin}.
3751: \hfill\QED\bigskip
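
\noindent
The rewriting of $\Psi_a$ used in this proof can be checked by direct
expansion. The computation below is only a sketch, under assumptions that
are not restated in this passage: that the scalars are real, that
$\Psi_a(f)=\|Kf-g\|^2+\mu \Vvert f \Vvert^p-\|Kf-Ka\|^2+\|f-a\|^2$, and
that $c=a+K^*g-K^*Ka$, in agreement with (\ref{Ta}). Under these
assumptions,
\begin{eqnarray*}
\|Kf-g\|^2-\|Kf-Ka\|^2+\|f-a\|^2
&=& \|Kf\|^2-2(Kf,g)+\|g\|^2 -\|Kf\|^2+2(Kf,Ka)-\|Ka\|^2 \\
&& +\ \|f\|^2-2(f,a)+\|a\|^2 \\
&=& \|f\|^2-2(f,a+K^*g-K^*Ka)+\|g\|^2+\|a\|^2-\|Ka\|^2 ~.
\end{eqnarray*}
Adding $\mu \Vvert f \Vvert^p$ gives the first expression for $\Psi_a(f)$;
completing the square, $\|f\|^2-2(f,c)=\|f-c\|^2-\|c\|^2$, and expanding on
the basis $\{\psi_\lambda\}$ gives the second one.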
3752:
3753: ---------------------
3754:
When the null-space $\mN(K)$ of $K$ is non-trivial, let us denote
by ${\cal S}=\{f\ |\ Kf=g_o\}$ the set of exact solutions corresponding
to the noise-free data $g_o$. Since ${\cal S}$ is a closed and convex set
with respect to both the $L^2$-norm and the Banach norm $\Vvert \cdot \Vvert$, it
contains an element $f^{\maltese}$ which has minimal Banach norm:
\begin{equation}
f^{\maltese} = {\mathop{\rm arg\ min}_{f \in {\cal S}}}\ \Vvert f \Vvert\ .
\label{defminBnorm}
\end{equation}
The uniqueness of such a minimizer is guaranteed when $p>1$,
whereas
there might be more than one element satisfying (\ref{defminBnorm}) when $p=1$.
Notice that in the classical case where the Banach norm coincides with
the $L^2$-norm,
$f^{\maltese}$ is the usual generalized solution $f^\dagger$.
When excluding the case where both $p=1$ and $\mN(K)\neq\{0\}$, we can now
prove that the minimization of (\ref{phimu}) is indeed a regularization
method, in the
sense that the requirement (\ref{regprop}), with $f^{\maltese}$ replacing
$f_o$ if $\mN(K)\neq\{0\}$, can be met by an appropriate choice of the
regularization
parameter. Let us remark that in the case where $\sigma$ is strictly positive,
i.e. $s > \frac{d}{2p} (2-p)$, $B^s_{p,p}$ is compactly embedded in
$L^2(\mathbb R^d)$, and regularization follows from general compactness
results, as will be recalled and shown subsequently. However, we want first
to prove such a result in the general setting which includes the case $\sigma
= 0$, and
therefore we need some more work to show that we have defined a
regularization method
3786:
3787:
3788: ___________
3789:
3790: RESERVE MATERIAL ON FRAMES
3791:
3792:
3793: \end{document}
3794:
3795:
3796: A simple example, the setting
3797: of which we borrow from several presentations by Donoho, is given by
3798: $\cH = \C^N$, in which we consider two different orthonormal bases
$\{u_1, \cdots, u_N \}$ and $\{u_{N+1}, \cdots , u_{2N} \}$ defined by
3800: \begin{eqnarray*}
3801: (u_l)_{ _{k}} &=& \delta_{l,k}~~~~~~~~~~~~~~~~~~\mbox{if} ~ l \in \{1,
3802: \cdots,N\}
3803: \\
3804: (u_l)_{ _{k}} &=& \frac{1}{\sqrt{N}} e^{-2 \pi i (l-1)(k-1)/N} ~~~~~
3805: \mbox{if } l \in \{N+1, \cdots,2N\} ~;
3806: \end{eqnarray*}
3807: these are the ``pulse'' and ``FFT'' bases, respectively. Note that
$|\left<u_l,u_k \right>| = \frac{1}{\sqrt{N}}$ if $l \leq N < k$ or $k \leq N < l$, i.e.
3809: each of
3810: these basis vectors has a very non-sparse expansion in the other basis.
3811: Define a tight frame by setting $\psi_k= \frac{1}{2} u_k$, $k=1, \cdots,
3812: 2N$;
3813: one has then, for all $v \in \cH$,
3814: $$
KK^*v = \frac{1}{4} \sum_{k=1}^{N} \left<v,u_k\right> u_k
+ \frac{1}{4} \sum_{k=N+1}^{2N} \left<v,u_k\right> u_k = \frac{1}{2}~ v
3817: ~.
3818: $$
The Gram operator $K^*K$ has 4 blocks:
\begin{eqnarray*}
(K^*K)_{k,l} &=& \frac{1}{4} \delta_{k,l} ~~~~~~~~\mbox{if } 1 \leq k \leq
N, ~
1 \leq l \leq N ~, \\
&=& \frac{1}{4\sqrt{N}} e^{2 \pi i (k-1)(l-1)/N} ~~~~ \mbox{if }
1 \leq k \leq N, ~
N+1 \leq l \leq 2N ~, \\
&=& \frac{1}{4\sqrt{N}} e^{-2 \pi i (k-1)(l-1)/N} ~~~~ \mbox{if }
N+1 \leq k \leq 2N, ~
1 \leq l \leq N ~, \\
&=& \frac{1}{4} \delta_{k,l} ~~~~~~~~\mbox{if } N+1 \leq k \leq 2N, ~
N+1 \leq l \leq 2N ~.
\end{eqnarray*}
Take $v=K \tilde{\mathbf{z}}$ for the very sparse sequence
defined by $\tilde{z}_n=\delta_{n,1}+ \delta_{n,N+1}$,
with $\|\tilde{\mathbf{z}}\|_{\ell^1} =2$,
$\|\tilde{\mathbf{z}}\|_{\ell^2}=\sqrt{2}$;
the sequence of minimal $\ell^2$-norm
$\mathbf{z}^{\dagger}=K^{\dagger}v$ is given by
$(z^{\dagger})_n=\frac{1}{2} \delta_{n,1}+\frac{1}{2} \delta_{n,N+1}
+\frac{1}{2\sqrt{N}}$, for which
$\|\mathbf{z}^{\dagger}\|_{\ell^1}=1 + \sqrt{N}$,
$\|\mathbf{z}^{\dagger}\|_{\ell^2}=\left(1+\frac{1}{\sqrt{N}}\right)^{1/2}$.
For $\mu= \frac{1}{2\sqrt{N}}$, the minimizer $\tilde{\mathbf{z}}_{\mu}$
of \eref{frame-3}, given by a straightforward
application of the algorithm in this paper, is
defined by
$$
(\tilde{z}_{\mu})_n = \left( 1- \frac{2}{\sqrt{N}+1} \right)
\delta_{n,1}
+
\left( 1- \frac{2}{\sqrt{N}+1} \right) \delta_{n,N+1} ~;
$$
it has $\|\tilde{\mathbf{z}}_{\mu}\|_{\ell^1}=2-\frac{4}{\sqrt{N}+1}$,
and
$\|K\tilde{\mathbf{z}}_{\mu}-v\|_{\ell^2}=\frac{1}{\sqrt{N}}$.
3857:
3858:
3859: \end{document}
3860:
3861:
3862: