0502:math0502299/ecc.tex

1: \documentclass[11pt,reqno]{amsart}

2: \usepackage{amsmath, amssymb, amsthm}

3: \usepackage{graphicx}

4:

5: \numberwithin{equation}{section}

6:

7: %\renewcommand{\baselinestretch}{1.5}  %1.5 spacing

8: %\pagestyle{myheadings}

9: %\markright{}

10: %\numberwithin{equation}{section}

11:

12:

13: %\parindent=1em

14: %\baselineskip 15pt

15: \hsize=14cm \textwidth=14cm

16: %\hsize=12.3cm \textwidth=12.3cm

17: %\vsize=18.5cm \textheight=18.5cm

18:

19: \newtheorem{theorem}{Theorem}[section]

20: \newtheorem{definition}[theorem]{Definition}

21: \newtheorem{proposition}[theorem]{Proposition}

22: \newtheorem{corollary}[theorem]{Corollary}

23: \newtheorem{lemma}[theorem]{Lemma}

24: \newtheorem{conjecture}[theorem]{Conjecture}

25: \newtheorem{fact}[theorem]{Fact}

26: \newtheorem*{general Gromov'}{Corollary \ref{general Gromov}$'$}

27:

28: \def \proof {\noindent {\bf Proof.}\ \ }

29: \def \remark {\noindent {\bf Remark.}\ \ }

30: \def \remarks {\noindent {\bf Remarks.}\ \ }

31: \def \example {\vspace{0.5cm} \noindent {\bf  Example.}\ \ }

32: \def \endproof {{\mbox{}\nolinebreak\hfill\rule{2mm}{2mm}\par\medbreak}}

33: \newcommand{\margin}[1]{\marginpar{\scriptsize #1}}

34:

35: \DeclareMathOperator*{\Ave}{Ave}

36:

37: \def \N {\mathbb{N}}

38: \def \R {\mathbb{R}}

39: \def \C {\mathbb{C}}

40: \def \Q {\mathbb{Q}}

41: \def \Z {\mathbb{Z}}

42: \def \E {\mathbb{E}}

43: \def \F {\mathbb{F}}

44: \def \G {\mathbb{G}}

45: \def \P {\mathbb{P}}

46: \def \T {\mathbb{T}}

47: \def \I {\mathbb{I}}

48: \def \one {{\bf 1}}

49: \def \EE {\mathcal{E}}

50: \def \NN {\mathcal{N}}

51: \def \CC {\mathcal{C}}

52: \def \MM {\mathcal{M}}

53: \def \OO {\mathcal{O}}

54: \def \PP {\mathcal{P}}

55: \def \SS {\mathcal{S}}

56: \def \QQ {\mathcal{Q}}

57: \def \a {\alpha}

58: \def \b {\beta}

59: \def \g {\gamma}

60: \def \e {\varepsilon}

61: \def \eps {\varepsilon}

62: \def \d {\delta}

63: \def \D {\Delta}

64: \def \f {\varphi}

65: \def \k {\kappa}

66: \def \l {\lambda}

67: \def \L {\Lambda}

68: \def \s {\sigma}

69: \def \t {\tau}

70: \def \om {\omega}

71: \def \w {\omega}

72: \def \W {\Omega}

73: \def \< {\langle}

74: \def \> {\rangle}

75: \def \absconv {{\rm abs.conv}}

76: \def \sign {{\rm sign}}

77: \def \dist {{\rm dist}}

78: \def \diam {{\rm diam}}

79: \def \Span {{\rm span}}

80: \def \rank {{\rm rank }}

81: \def \range {{\rm range }}

82: \def \trace {{\rm trace}}

83: \def \diag {{\rm diag}}

84: \def \conv {{\rm conv}}

85: \def \lin {{\rm lin}}

86: \def \aff {{\rm aff}}

87: \def \HS {{\rm HS}}

88: \def \Prob {{\rm Prob}}

89: \def \id {{\it id}}

90: \def \im {{\rm Im}}

91: \def \vol {{\rm vol}}

92: \def \Lip {{\rm Lip}}

93: \def \supp {{\rm supp}}

94: \def \bi {B\bigl(L_\infty(\Omega)\bigr)}

95: \def \Ball {{\rm Ball}}

96: \def \const {{\rm const}}

97:

98:

99:

100:

101:

102: \begin{document}

103: \title [Geometric approach to error correcting codes and signal recovery]

104:        {Geometric approach to error correcting codes

105:         and reconstruction of signals}

106:

107: \author{Mark Rudelson}

108: \address{Department of Mathematics, University of Missouri, Columbia, MO 65211, U.S.A.}

109: \email{rudelson@math.missouri.edu}

110:

111: \author{Roman Vershynin}

112: \address{Department of Mathematics, University of California, Davis, CA 95616, U.S.A.}

113: \email{vershynin@math.ucdavis.edu}

114:

115: \thanks{The first author is partially supported by the NSF grant DMS 0245380.

116:   The second author is partially supported by the NSF grant DMS 0401032

117:   and by the Miller Scholarship from the University of

118:   Missouri-Columbia. }

119:

120: \subjclass[2000]{46B07, 94B75, 68P30, 52B05}

121:

122: \begin{abstract}

123: We develop an approach through geometric functional analysis

124: to error correcting codes and to reconstruction of signals

125: from few linear measurements. An error correcting code encodes

126: an $n$-letter word $x$ into an $m$-letter word $y$

127: in such a way that $x$ can be decoded correctly when any $r$ letters

128: of $y$ are corrupted. We prove that most linear orthogonal

129: transformations $Q : \R^n \to \R^m$ form efficient and robust

130: error correcting codes over reals. The decoder (which corrects the corrupted

131: components of $y$) is the metric projection onto the range of $Q$

132: in the $\ell_1$ norm. An equivalent problem arises in signal processing:

133: how to reconstruct a signal that belongs to a small class from few linear measurements?

134: We prove that for most sets of Gaussian measurements, all signals

135: of small support can be exactly reconstructed by the $L_1$ norm

136: minimization. This is a substantial improvement of recent results of Donoho and

137: of Candes and Tao. An equivalent problem in combinatorial geometry

138: is the existence of a polytope with fixed number of facets and maximal

139: number of lower-dimensional facets.

140: We prove that most sections of the cube form such polytopes.

141: \end{abstract}

142:

143: \maketitle

144:

145:

146:

147: \section{Error correcting codes and transform coding}

148: %_______________________________________________________

149:

150: Error correcting codes are used in modern technology to protect

151: information from errors. Information is formed by finite words

152: over some alphabet $\F$.

153: An encoder transforms an $n$-letter word $x$ into an $m$-letter word $y$ with $m > n$.

154: The decoder must be able to recover $x$ correctly when up to $r$ letters of $y$

155: are corrupted in any way. Such an encoder-decoder pair is called an

156: {\em $(n,m,r)$-error correcting code}.

157:

158: Development of algorithmically efficient error correcting codes

159: has been attracting the attention of engineers, computer scientists

160: and applied mathematicians for the past five decades.

161: Known constructions involve deep algebraic and combinatorial methods,

162: see \cite{Handbook}, \cite{Sp1}, \cite{Sp2}.

163: This paper develops a new approach to error correcting codes

164: from the viewpoint of geometric functional analysis (asymptotic convex geometry).

165: Our main focus will be on words over the alphabet $\F = \R$ or $\C$. In applications,

166: these words may be formed of the coefficients of some signal (such as image or audio)

167: with respect to some basis or overcomplete system (Fourier, wavelet, etc.).

168: Finite alphabets will be discussed in Section \ref{s:conclusion}.

169:

170: The simplest and most natural way to encode a vector $x \in \R^n$ into

171: a vector $y \in \R^m$ is of course a linear transform

172: \begin{equation}                \label{Q}

173: y = Qx

174: \end{equation}

175: where $Q$ is given by an $m \times n$ matrix. Elementary linear

176: algebra tells us that if $m \ge n + 2r$ and the range of $Q$ is

177: generic\footnote{that is, in general position with respect to all

178: subspaces $\R^I$, $|I| = r$} then $x$ can be recovered from $y$

179: even if $r$ coordinates of $y$ are corrupted. This gives an

180: $(n,m,r)$-error correcting code. However, the decoder for this

181: code has a huge computational complexity, as it involves a search

182: through all $r$-element subsets of the components of $y$. Then the

183: problem is:

184:

185: \medskip

186:

187: \begin{quote}

188: {\em How to reconstruct a vector $y$ in an $n$-dimensional subspace $Y$

189:   of $\R^m$ from a vector $y' \in \R^m$

190:   that differs from $y$ in at most $r$ coordinates?}

191: \end{quote}

192:

193: \medskip

194:

195: \noindent

196: What complicates this problem is the arbitrary magnitude of errors in each

197: corrupted component of $y'$, in contrast to what happens over finite alphabets

198: such as $\F = \{0,1\}$.

199:

200: A traditional and simple approach to denoising $y'$, used in applications such

201: as signal processing, is the mean least square (MLS) minimization. One hopes

202: that $y$ is well approximated by a solution to the minimization problem

203: \begin{equation*}

204:   \min_{u \in Y} \|u - y'\|_2           \tag{MLS}

205: \end{equation*}

206: where $\|x\|_2^2 = \sum_i |x_i|^2$.

207: The solution to (MLS) is simply the orthogonal projection of $y'$ onto $Y$.

208: This of course can not recover $y$ exactly, and even the approximation is typically

209: poor since we have no control of the magnitude of the errors in the

210: corrupted coordinates.

211: A promising alternative approach is the {\em Basis Pursuit} (BP).

212: We simply replace the $2$-norm by the $1$-norm and expect $y$ to be the {\em exact}

213: and unique solution to the minimization problem

214: \begin{equation*}

215:   \min_{u \in Y} \|u - y'\|_1           \tag{BP}

216: \end{equation*}

217: where $\|x\|_1 = \sum_i |x_i|$.

218: Thus a solution to (BP) is the metric projection of $y'$ onto $Y$

219: with respect to the $1$-norm.  (BP) can be cast as a Linear Programming problem,

220: and can be attacked with a variety of methods, such as the classical simplex method

221: or more recent interior point methods that yield polynomial time algorithms

222: \cite{CDS}.
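
\remark
For concreteness, here is one standard way to write (BP) as a linear program.
This is only a sketch: the auxiliary vector $t \in \R^m$ is our notation and is not taken from \cite{CDS}.
With $Y$ the range of $Q$ as in \eqref{Q}, the problem (BP) has the same optimal $u = Qx$ as
\begin{equation*}
  \min_{x \in \R^n, \; t \in \R^m} \ \sum_{i=1}^m t_i
  \ \ \ \text{subject to} \ \ \
  -t_i \le (Qx)_i - y'_i \le t_i, \ \ \ i = 1, \ldots, m.
\end{equation*}
Indeed, for a fixed $x$ the smallest feasible $t_i$ equals $|(Qx)_i - y'_i|$,
so the optimal value of the program is $\min_{u \in Y} \|u - y'\|_1$.
The program has $n + m$ variables and $2m$ linear inequality constraints.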

223:

224: \begin{center}

225: \raisebox{-1 true in}{\includegraphics[height=2in]{ecc1.eps}}

226: \end{center}

227:

228: The potential of Basis Pursuit for exact reconstruction

229: is illustrated by the following heuristics, essentially due to \cite{DET}.

230: The solution $u$ to (MLS) is the contact point where the smallest Euclidean ball

231: centered at $y'$ meets the subspace $Y$. That contact point is in general

232: different from $y$. The situation is much better in (BP): typically the solution

233: coincides with $y$. The solution $u$ to (BP) is the contact point

234: where the smallest octahedron centered at $y'$ (the ball with respect to the $1$-norm)

235: meets $Y$. Because the vector $y-y'$ lies in a low-dimensional coordinate subspace,

236: the octahedron has a wedge at $y$. Thus, many subspaces $Y$ through $y$

237: will miss the octahedron of radius $\|y-y'\|_1$ (as opposed to the Euclidean ball).

238: This forces the solution $u$ to (BP), which is the contact point of the octahedron,

239: to coincide with $y$.
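
\example
The following toy case is ours and is meant only to illustrate the heuristics
(the subspace is fixed, not random).
Let $m = 3$, $n = 1$, $Y = \{ (t,t,t) : \ t \in \R \}$, $y = 0$ and
$y' = (a, 0, 0)$ with $a > 0$, so that $y'$ differs from $y$ in $r = 1$ coordinate.
The solution to (MLS) is the orthogonal projection of $y'$ onto $Y$,
that is $u = (a/3, a/3, a/3) \ne y$.
For (BP) we minimize
\begin{equation*}
  \varphi(t) = \|(t,t,t) - y'\|_1 = |t - a| + 2|t|, \ \ \ t \in \R.
\end{equation*}
The slope of $\varphi$ equals $-3$ for $t < 0$, $+1$ for $0 < t < a$, and $+3$ for $t > a$.
Hence the unique minimizer is $t = 0$, and the solution to (BP) is $u = 0 = y$
regardless of the magnitude $a$ of the error.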

240:

241: The idea of using the $1$-norm instead of the $2$-norm for better data recovery

242: has been explored since the mid-seventies in various applied areas, in particular

243: geophysics and statistics (early history can be found in \cite{T 04c}).

244: With the subsequent development of fast interior point methods in Linear Programming,

245: (BP) turned into an effectively solvable problem, and was put forward

246: more recently by Donoho and his collaborators, triggering

247: massive experimental and theoretical work \cite{CDS, DH, EB, FN, DE, GN,

248: T 04a, T 04b, T 04c, DET, D 04a, D 04b, DT 04a, DT 04b, CRT, CR, CT}.

249:

250: \medskip

251:

252: The main result of this paper validates the Basis Pursuit method

253: for most subspaces $Y$ under an asymptotically sharp condition on $m,n,r$.

254: We thus prove that {\em the Basis Pursuit yields exact reconstruction for most subspaces $Y$}

255: in the Grassmannian.

256: The randomness is with respect to the normalized Haar

257: measure on the Grassmannian $G_{m,n}$ of $n$-dimensional subspaces of $\R^m$.

258: Positive absolute constants will be denoted throughout the paper

259: by $C, c, C_1, \ldots$.

260:

261: \begin{theorem}                                 \label{ecc}

262:   Let $m$, $n$ and $r < cm$ be positive integers such that

263:   \begin{equation}              \label{mnr'}

264:     m = n+ R, \ \ \ \text{where $R \ge C r \log(m/r)$}.

265:   \end{equation}

266:   Then a random $n$-dimensional subspace $Y$ in $\R^m$ satisfies

267:   the following with probability at least $1 - e^{-c R}$.

268:   Let $y \in Y$ be an unknown vector, and we are given a vector $y'$ in $\R^m$

269:   that differs from $y$ on at most $r$ coordinates.

270:   Then $y$ can be exactly reconstructed from $y'$ as the solution

271:   to the minimization problem (BP).

272: \end{theorem}

273:

274: In an equivalent form, this theorem is a substantial improvement of

275: recent results of Donoho \cite{D 04a} and of Candes and Tao \cite{CT},

276: see Theorem~\ref{reconstruction} below.

277:

278:

279: \subsection{Error correcting codes.}                \label{ss:ecc}

280: Theorem \ref{ecc} implies a natural $(n,m,r)$-error correcting code over $\R$.

281: The encoder \eqref{Q} is given by an $m \times n$

282: random orthogonal matrix\footnote{one can view it as the first $n$ columns of

283: a random matrix from $O(m)$ equipped with the normalized Haar measure.} $Q$.

284: Its range $Y$ is a random $n$-dimensional subspace in $\R^m$.

285: The decoder takes a corrupted vector $y'$, solves (BP) and outputs

286: $Q^T u = Q^{-1} u$. Theorem \ref{ecc} states that under the assumption \eqref{mnr'},

287: this encoder-decoder pair is an $(n,m,r)$-error correcting code with

288: exponentially good probability $\ge 1 - e^{-c R}$.

289:

290: \subsection{Sharpness.}

291: The sufficient condition \eqref{mnr'} is sharp up to an absolute

292: constant $C$ (see Section \ref{s:conclusion}) and is only slightly

293: stronger than the necessary condition $m \ge n + 2r$. The ratio

294: $\e = r/m$ in \eqref{mnr'} is the number of errors per letter in

295: the noisy communication channel that maps $y$ to $y'$. Thus $\e$

296: should be considered as a quality of the channel, which is

297: independent of the message. Thus \eqref{mnr'} is equivalent to

298: $$

299: m \ge \Bigl(1 + C \e \log \frac{1}{\e} \Bigr) n.

300: $$
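
\remark
The computation behind this reformulation can be sketched as follows.
We write $\eta := C \e \log(1/\e)$ (our notation) and assume $\eta \le 1/2$,
which holds once $\e$ is small enough.
Substituting $r = \e m$ into \eqref{mnr'} gives $m - n = R \ge C \e m \log(1/\e) = \eta m$,
that is $m \ge n/(1-\eta)$. Since
\begin{equation*}
  (1 + \eta) \, n \ \le \ \frac{n}{1-\eta} \ \le \ (1 + 2\eta) \, n
  \ \ \ \text{for $0 \le \eta \le 1/2$},
\end{equation*}
the condition \eqref{mnr'} implies $m \ge (1+\eta) n$ and is implied by $m \ge (1 + 2\eta) n$.
Thus the equivalence holds up to the value of the absolute constant $C$.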

301:

302: \subsection{Robustness.}

303: A natural feature of our error correcting code is its {\em robustness}.

304: Simple linear algebra yields that

305: the solution to (BP) is stable with respect to the $1$-norm -- in the same way

306: as the solution to (MLS) is stable with respect to the $2$-norm, see \cite{CT}.

307: Such robustness allows in particular quantization of the messages.

308: This immediately yields error correcting codes for finite alphabets, see

309: Section \ref{s:conclusion}.

310:

311: \subsection {Transform coding.}

312: In signal processing, the linear codes \eqref{Q} are known

313: as {\em transform codes}. The general paradigm about transform codes is

314: that the redundancies in the coefficients of $y$

315: that come from the excess of the dimension $m > n$ should guarantee

316: a stability of the signal with respect to noise, quantization, erasures,

317: etc. This is confirmed by an extensive experimental and some theoretical

318: work, see e.g. \cite{Da,G1,G2,GVT,GKK,KDG,BO,CK}

319: and the bibliography contained therein.

320: Theorem \ref{ecc} states that {\em most orthogonal transform codes

321: are good error-correcting codes}.

322:

323: \subsection* {Acknowledgement.} This work started when the second

324: author was visiting University of Missouri-Columbia as a Miller

325: Visiting Scholar. He is grateful to UMC for the hospitality.

326:

327:

328:

329: \section{Reconstruction of signals from linear measurements.}

330: %_____________________________________________________________

331:

332:

333: The heuristic idea that guides the Statistical Learning Theory is that

334: {\em a function $f$ from a small class should be determined by few linear measurements}.

335: Linear measurements are generally given by some linear functionals $X_k$

336: in the dual space, which are fixed (in particular are independent of $f$).

337: Most common measurements are point evaluation functionals; the

338: problem there is to interpolate $f$ between known values while keeping $f$

339: in the known (small) class.

340: When the evaluation points are chosen at random, this becomes the `proper learning'

341: problem of the Statistical Learning Theory (see \cite{M}).

342:

343: We shall however be interested in general linear measurements.

344: The proposal to learn $f$ from general linear measurements ({\em `sensing'})

345: has been originated recently from a criticism of the current methodology

346: of signal compression. Most of real life signals, such as images and sounds,

347: seem to belong to small classes. This is because they carry much of unwanted information

348: that can be discarded with almost no perceptual loss, which makes such signals

349: easily compressible. Donoho \cite{D 04c} then questions the conventional scheme of

350: signal processing, where the whole signal must be first acquired (together

351: with lots of unwanted information) and only then be compressed

352: (throwing away the unwanted part).

353: Instead, can one {\em directly acquire} (`sense') the essential part of the signal,

354: via few linear measurements? Similar issues are raised in \cite{CT}.

355: We shall operate under the assumption that some technology

356: allows us to take linear measurements in certain fixed `directions' $X_k$.

357:

358: We will assume that our signal $f$ is discrete, so we view it as a vector in $\R^m$.

359: Suppose we can take linear measurements $\< f, X_k \> $ with some fixed

360: vectors $X_1, X_2, \ldots, X_{R}$ in $\R^m$.

361: Assuming that $f$ belongs to a small class,

362: how many measurements $R$ are needed to reconstruct $f$?

363: And even when we prove that $R$ measurements do determine $f$

364: (uniquely or approximately), the algorithmic issue remains unsettled:

365: how can one reconstruct $f$ from these measurements?

366:

367: The previous section suggests reconstructing $f$

368: as a solution to the Basis Pursuit minimization problem

369: \begin{equation*}

370:   \min \|g\|_1

371:   \ \ \text{subject to} \ \

372:   \< g, X_k \> = \< f, X_k \> , \ \ k = 1, \ldots, R.     \tag{BP$'$}

373: \end{equation*}

374: For the Basis Pursuit to work, the vectors $X_k$ must be in a good position

375: with respect to all coordinate subspaces $\R^I$, $|I| \le r$.

376: A typical choice for such vectors would be the independent standard Gaussian

377: vectors\footnote{All the components of $X_k$ are independent

378: standard Gaussian random variables.} $X_k$.

379:

380: \subsection{Functions with small support}

381: In the class of functions with small support, one can hope for exact reconstruction.

382: Candes and Tao \cite{CT} have indeed proved that every {\em fixed} function $f$ with

383: support $|\supp f| \le r$ can be recovered exactly by (BP$'$)

384: with polynomial probability $1 - m^{-\text{const}}$ from

385: $R = C r \log m$ Gaussian measurements.

386: However, polynomial probability is clearly not sufficient

387: to deduce that there is {\em one} set of vectors $X_k$ that can be used to

388: reconstruct all functions $f$ of small support.

389:

390: The following equivalent form of Theorem \ref{ecc} does

391: yield a uniform exact reconstruction.

392: It provides us with {\em one set} of linear measurements from which we

393: can effectively reconstruct {\em every} signal of small support.

394:

395: \begin{theorem} [Uniform Exact Reconstruction]                   \label{reconstruction}

396:   Let $m$, $r < cm$ and $R$ be positive integers satisfying

397:   $R \ge C r \log(m/r)$.

398:   The independent standard Gaussian vectors $X_k$ in $\R^m$

399:   satisfy the following with probability at least $1 - e^{-c R}$.

400:   Let $f \in \R^m$ be an unknown function of small support, $|\supp f| \le r$,

401:   and we are given $R$ measurements $\< f, X_k\> $.

402:   Then $f$ can be exactly reconstructed from these measurements

403:   as a solution to the Basis Pursuit problem (BP$'$).

404: \end{theorem}

405:

406: This theorem gives uniformity in the Candes-Tao result \cite{CT}, improves the polynomial

407: probability to an exponential probability, and improves upon the number $R$

408: of measurements (which was $R \ge C r \log m$ in \cite{CT}).

409: Donoho \cite{D 04c} proved a weaker form of Theorem \ref{reconstruction}

410: with $R/r$ bounded below by some function of $m/r$.

411:

412: \medskip

413:

414: \proof

415: Write $g = f - u$ for some $u \in \R^m$. Then (BP$'$) reads as

416: \begin{equation}                    \label{BP uf}

417: \min \|u - f\|_1

418: \ \ \text{subject to} \ \

419: \< u, X_k \> = 0, \ \ k = 1, \ldots, R.

420: \end{equation}

421: The constraints here define a random $(n = m - R)$-dimensional subspace

422: $Y$ of $\R^m$. Now apply Theorem \ref{ecc} with $y = 0$ and $y' = f$. It states

423: that the unique solution to \eqref{BP uf} is $u = 0$. Therefore, the

424: unique solution to (BP$'$) is $f$.

425: \endproof

426:

427:

428: \subsection{Compressible functions}

429: In a larger class of compressible functions \cite{D 04c}, we can only hope for

430: an approximate reconstruction. This is a class of functions $f$ that are

431: well compressible by a known orthogonal transform, such as Fourier or wavelet.

432: This means that the coefficients of $f$ with respect to a certain known

433: orthogonal basis have a power decay. By applying an appropriate rotation,

434: we can assume that this basis is the canonical basis of $\R^m$, thus

435: $f$ satisfies

436: \begin{equation}                        \label{compressible}

437:   f^*(s) \le s^{-1/p}, \ \ \ s = 1, \ldots, m

438: \end{equation}

439: where $f^*$ denotes a nonincreasing rearrangement of $f$.

440: Many natural signals are compressible for some $0 < p < 1$,

441: such as smooth signals and signals with bounded variations (see \cite{CT}),

442: in particular most photographic images.

443: Theorem \ref{reconstruction} implies, by the argument of \cite{CT},

444: that functions compressible in some basis can be approximately

445: reconstructed from few fixed linear measurements:

446:

447: \begin{corollary}[Uniform Approximate Reconstruction]

448:   Let $m$ and $R$ be positive integers.

449:   The independent standard Gaussian vectors $X_k$ in $\R^m$

450:   satisfy the following with probability at least $1 - e^{-c R}$.

451:   Assume that an unknown function $f \in \R^m$ satisfies either

452:   \eqref{compressible} for some $0 < p < 1$ or $\|f\|_1 \le 1$ for $p=1$.

453:   Suppose that we are given $R$ measurements $\< f, X_k\> $.

454:   Then $f$ can be approximately reconstructed from these measurements:

455:   a unique solution $g$ to the Basis Pursuit problem (BP$'$) satisfies

456:   $$

457:   \|f - g\|_2

458:     \le C_p \Bigl( \frac{\log(m/R)}{R} \Bigr)^{\frac{1}{p} - \frac{1}{2}}

459:   $$

460:   where $C_p$ depends on $p$ only.

461: \end{corollary}

462:

463: This corollary also gives uniformity in another Candes-Tao result from \cite{CT}

464: (see also \cite{D 04b}); it improves the polynomial probability to an

465: exponential probability, and also improves upon the approximation error.

466:

467:

468: \section{Counting low-dimensional facets of polytopes.}

469: %_____________________________________________________________

470:

471: Theorem \ref{ecc} turns out to be equivalent to a problem of counting

472: lower-dimensional facets of polytopes. Let $B_1^m$ denote the unit ball

473: with respect to the $1$-norm; it is sometimes called the unit octahedron.

474: The polar body is the unit cube $B_\infty^m = [-1,1]^m$.

475: The conclusion of Theorem \ref{ecc} is then equivalent to the

476: following statement: the affine subspace $z + Y$ is tangent to the unit

477: octahedron at point $z$, where $z = y' - y$. This should happen

478: for all $z$ from the coordinate subspaces $\R^I$ with $|I| = r$.

479: By the duality, this means that the subspace $Y^\perp$ intersects all

480: $(m-r)$-dimensional facets of the unit cube.  The section of the cube by

481: the subspace $Y^\perp$ forms an origin-symmetric polytope of dimension $R$

482: and with $2m$ facets.

483:

484: Our problem can thus be stated as a problem of counting lower-dimensional facets

485: of polytopes.

486: \begin{quote}

487:   {\em Consider an $R$-dimensional origin symmetric polytope

488:   with $2m$ facets. How many $(R-r)$-dimensional facets can it have?}

489: \end{quote}

490: Clearly\footnote{Any such facet is the intersection of some $r$ facets

491: of the polytope of full dimension $R-1$; there are $m$ facets to choose from,

492: each coming with its opposite by the symmetry.}, no more than

493: $2^r \binom{m}{r}$. Does there exist a polytope with that many facets?

494: Our ability to construct such a polytope

495: is equivalent to the existence of the efficient error

496: correcting code. Indeed, looking at the canonical realization of such a

497: polytope as a section of the unit cube by a subspace $Y^\perp$,

498: we see that $Y^\perp$ intersects all the $(m-r)$-dimensional facets

499: of the cube. Thus $Y$ satisfies the conclusion of Theorem~\ref{ecc}.

500: We can thus state Theorem \ref{ecc} in the following form:

501:

502: \begin{theorem}

503:   There exists an $R$-dimensional symmetric polytope with $2m$ facets

504:   and with the maximal number of $(R-r)$-dimensional facets

505:   (which is $2^r \binom{m}{r}$), provided $R \ge C r \log(m/r)$.

506:   A random section of the cube forms such a polytope with probability

507:   $1 - e^{-cR}$.

508: \end{theorem}
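
\example
As a sanity check of the count $2^r \binom{m}{r}$ (an illustration of ours),
consider the faces of the cube $B_\infty^m$ itself, which the subspace $Y^\perp$ has to intersect.
An $(m-r)$-dimensional face of the cube is obtained by choosing $r$ of the $m$ coordinates
and fixing each of them at $+1$ or $-1$, which gives
\begin{equation*}
  2^r \binom{m}{r} \ \ \ \text{faces of dimension $m-r$}.
\end{equation*}
For $m = 3$ this yields $2 \cdot 3 = 6$ two-dimensional faces ($r = 1$),
$4 \cdot 3 = 12$ edges ($r = 2$) and $8 \cdot 1 = 8$ vertices ($r = 3$).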

509:

510: So, how can we prove that a random subspace $Y^\perp$ indeed intersects all the

511: $(m-r)$-dimensional facets of the cube? It is enough to show that

512: $Y^\perp$ intersects one such fixed facet with exponential probability

513: (bigger than $1 - 2^{-r} \binom{m}{r}^{-1}$).

514: The main difficulty here is that the concentration of measure technique

515: can not be readily applied. This is because the $\infty$-norm defined

516: by the unit cube (more precisely, by its facet) has a bad Lipschitz constant.

517: To improve the Lipschitzness, we first project the facet onto a random

518: subspace (within its affine span); the random subspace parallel to which we

519: project is taken from the random directions that form $Y^\perp$.

520: This creates a big Euclidean ball inside the projected facet;

521: here we shall use the full strength of the estimate

522: of Garnaev and Gluskin \cite{GG} on Euclidean projections of a cube.

523: The existence of the Euclidean ball inside a body creates the needed

524: Lipschitzness, so we can now use the concentration of measure technique.

525:

526: \medskip

527:

528: The rest of the paper is organized as follows.

529: In Section \ref{s:proof} we prove Theorem \ref{ecc}.

530: In Section \ref{s:conclusion} we discuss some optimality and

531: robustness of the Basis Pursuit with applications to error correcting

532: codes over finite alphabets.

533:

534:

535:

536:

537:

538: \section{Proof}                         \label{s:proof}

539: %______________________________________________________________________________

540:

541: We shall use the following standard notations throughout the proof.

542: The $p$-norm ($1 \le p < \infty$) on $\R^m$ is defined by

543: $\|x\|_p^p = \sum_i |x_i|^p$, and for $p = \infty$ it is

544: $\|x\|_\infty = \max_i |x_i|$. The unit ball with respect to the

545: $p$-norm on $\R^m$ is denoted by $B_p^m$. When the $p$-norm is considered

546: on a coordinate subspace $\R^I$, $I \subset \{1,\ldots,m\}$,

547: the corresponding unit ball is denoted by $B_p^I$.

548:

549: The unit Euclidean sphere in a subspace $E$ is denoted by $S(E)$.

550: The normalized rotationally invariant Lebesgue measure on $S(E)$ is denoted

551: by $\sigma_E$.

552: The orthogonal projection onto a subspace $E$ is denoted by $P_E$.

553: The standard Gaussian measure on $E$ (with the identity covariance matrix)

554: is denoted by $\gamma_E$. When $E = \R^d$, we write $\sigma_{d-1}$ for

555: $\sigma_E$ and $\gamma_d$ for $\gamma_E$.

556:

557:

558:

559: \subsection{Duality}

560: We begin the proof of Theorem \ref{ecc} with a typical duality argument,

561: leading to the same reformulation of the problem as in \cite{CT}.

562: We claim that the conclusion of Theorem \ref{ecc} follows from

563: (and is actually equivalent to) the following separation condition:

564: \begin{equation}                        \label{separation}

565:   (z + Y) \cap \;\text{interior}\, (B_1^m) = \emptyset

566:   \ \ \ \text{for all} \ \ z \in \bigcup_{|I| = r} B_1^I.

567: \end{equation}

568: Indeed, suppose \eqref{separation} holds. We apply it for

569: $$

570: z := \frac{y-y'}{\|y-y'\|_1}

571: $$

572: noting that $z \in \bigcup_{|I| = r} B_1^I$ holds, because $y$ and $y'$

573: differ in at most $r$ coordinates.

574: By \eqref{separation},

575: $$

576: z + v \notin \;\text{interior}\, (B_1^m)

577: \ \ \ \text{for all $v \in Y$}

578: $$

579: which implies

580: $$

581: \|z + v\|_1 \ge 1

582: \ \ \ \text{for all $v \in Y$}.

583: $$

584: Let $u \in Y$ be arbitrary. Using the inequality above for

585: $v := \frac{u-y}{\|y-y'\|_1}$, which lies in $Y$, we conclude that

586: $$

587: \|u-y'\|_1 \ge \|y-y'\|_1

588: \ \ \ \text{for all $u \in Y$}.

589: $$

590: This proves that $y$ is indeed a solution to (BP).

591: The solution to (BP) is unique with probability $1$ in the Grassmannian.

592: This follows from a direct dimension argument, see e.g. \cite{CT}.

593:

594: By the Hahn-Banach theorem, the separation condition \eqref{separation}

595: is equivalent to the following:

596: for every $z \in \bigcup_{|I| = r} \;\text{boundary}\, B_1^I$

597: there exists $w = w(z) \in Y^\perp$ such that

598: $$

599: \< w,z \> = \sup_{x \in B_1^m} \< w,x \> = \|w\|_\infty.

600: $$

601: This holds if and only if the components of $w$ satisfy

602: \begin{equation}                        \label{w}

603:   \begin{cases}

604:     w_j = \sign(z_j) \ \ \text{for $j \in I$}, \\

605:     |w_j| \le 1 \ \ \text{for $j \in I^c$}.

606:   \end{cases}

607: \end{equation}

608: The set of vectors $w$ in $\R^m$ that satisfy \eqref{w} forms a

609: $(m-r)$-dimensional facet of the unit cube $B_\infty^m$.

610: Then with $E := Y^\perp$ we can say that the conclusion

611: of Theorem \ref{ecc} is equivalent to the following:

612:

613: \medskip

614:

615: \begin{quote}

616:   {\em A random $R$-dimensional subspace $E$ in $\R^m$ intersects

617:    all the $(m-r)$-dimensional facets of the unit cube

618:    with probability at least $1 - e^{-cR}$.}

619: \end{quote}

620:

621: \medskip

622:

623: It will be enough to show that $E$ intersects {\em one fixed}

624: facet with the probability $1 - e^{-cR}$. Indeed, since the total

625: number of the facets is $N = 2^r \binom{m}{r}$, the probability

626: that $E$ misses some facet would be at most $N e^{-cR} \le e^{-c_1 R}$

627: with an appropriate choice of the absolute constant in \eqref{mnr'}.
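
\remark
The choice of the constant can be made explicit as follows.
This is a sketch with non-optimized numerical constants of our own;
we assume $m/r \ge e$, which follows from $r < cm$ once $c \le 1/e$.
Using the standard bound $\binom{m}{r} \le (em/r)^r$ and
$\log(2e) \le 2 \le 2 \log(m/r)$, we have
\begin{equation*}
  N = 2^r \binom{m}{r}
    \le \Bigl( \frac{2em}{r} \Bigr)^r
    = \exp \Bigl( r \log \frac{2em}{r} \Bigr)
    \le \exp \Bigl( 3 r \log \frac{m}{r} \Bigr).
\end{equation*}
Hence, if $R \ge C r \log(m/r)$ with $C \ge 6/c$, then $3 r \log(m/r) \le cR/2$ and
\begin{equation*}
  N e^{-cR} \le e^{cR/2} \, e^{-cR} = e^{-cR/2},
\end{equation*}
so one can take $c_1 = c/2$.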

628:

629:

630: \subsection{Realizing a random subspace}

631: We are to show that a random $R$-dimensional subspace $E$ intersects one fixed

632: $(m-r)$-dimensional facet of the unit cube $B_\infty^m$ with high probability.

633: Without loss of generality, we can assume that our facet is

634: $$

635: F = \{ (w_1, \ldots, w_{m-r}, 1, \ldots, 1), \ \ \text{all $|w_j| \le 1$} \},

636: $$

637: whose center is

638: $$

639: \theta = (\underbrace{0,\ldots,0}_{m-r}, 1,\ldots,1).

640: $$

641: The probability we are interested in is

642: $$

643: P := \Prob\{ E \cap F \ne \emptyset\}.

644: $$

645: We shall restrict our attention to the linear span of $F$,

646: $$

647: \lin(F) = \{ (w_1, \ldots, w_{m-r}, t, \ldots, t),

648:      \ \ \text{all $w_j \in \R$, $t \in \R$} \},

649: $$

650: and even to the affine span of $F$,

651: $$

652: \aff(F) = \{ (w_1, \ldots, w_{m-r}, 1, \ldots, 1),

653:      \ \ \text{all $w_j \in \R$} \}.

654: $$

655: Only the random affine subspace $E \cap \aff(F)$ matters for us, because

656: $$

657: P =  \Prob\Bigl\{ (E \cap \aff(F)) \cap F \ne \emptyset \Bigr\}.

658: $$

659: The dimension of that affine subspace is almost surely

660: $$

661: l := \dim (E \cap \aff(F)) = R-r.

662: $$
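
\medskip

\remark
The value of $l$ can be checked by a dimension count. The computation below is a
sketch for a subspace $E$ in general position with respect to $\lin(F)$, which
holds almost surely. Since $\dim E = R$ and $\dim \lin(F) = m-r+1$,
$$
\dim (E \cap \lin(F)) = R + (m-r+1) - m = R-r+1.
$$
The affine subspace $\aff(F)$ is the hyperplane $\{t = 1\}$ of $\lin(F)$, and
almost surely $E \cap \lin(F)$ is not contained in the parallel hyperplane
$\{t = 0\}$. Passing from $\lin(F)$ to $\aff(F)$ therefore lowers the dimension
by one:
$$
\dim (E \cap \aff(F)) = (R-r+1) - 1 = R-r.
$$

\medskip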

663:

664: We can realize the random affine subspace $E \cap \aff(F)$

665: (or rather a random subspace with the same law) by the following

666: algorithm:

667:

668: \begin{enumerate}

669:

670:   \item Select a random variable $D$ with the same law as

671:     $\dist(\theta, E \cap \aff(F))$.

672:

  \item Select a random subspace $L_0$ in the Grassmannian $G_{m-r,l}$.

674:     It will realize the ``direction'' of $E \cap \aff(F)$ in $\aff(F)$.

675:

676:   \item Select a random point $z$ on the Euclidean sphere $D \cdot S(L_0^\perp)$

677:     of radius $D$, according to the uniform distribution on the sphere.

678:     Here $L_0^\perp$ is the orthogonal complement of $L_0$ in $\R^{m-r}$.

679:     The vector $z$ will realize the distance from the affine subspace

680:     $E \cap \aff(F)$ to the center $\theta$ of $F$.

681:

682:   \item Set $L = \theta + z + L_0$. Thus the random affine subspace $L$

683:     has the same law as $E \cap \aff(F)$.

684:

685: \end{enumerate}

686:

687: \begin{center}

688: \raisebox{-1 true in}{\includegraphics[height=2in]{ecc2.eps}}

689: \end{center}

690:

691: \noindent Hence

692: $$

693: P = \Prob \{ L \cap F \ne \emptyset \}

694:   = \Prob \{ (z + L_0) \cap B_\infty^{m-r} \ne \emptyset \}

695:   = \Prob \{ z \in P_{L_0^\perp} B_\infty^{m-r} \}.

696: $$

Here $H := L_0^\perp$ is a random subspace in $G_{m-r,m-r-l} = G_{m-r,m-R}$.

698: By the rotational invariance of $z \in D \cdot S(H)$,

699: \begin{equation}                    \label{P=integral}

700: P = \int_{\R^+} \int_{G_{m-r,m-R}} \sigma_H (D^{-1} P_H B_\infty^{m-r})

701:       \; d\nu(H) \; d\mu(D)

702: \end{equation}

703: where $\nu$ is the normalized Haar measure on $G_{m-r,m-R}$

704: and $\mu$ is the law of $D$.

705: We shall bound $P$ in two steps:

706:

707: \begin{enumerate}

708:

709: \item Prove that the distance $D$ is small with high probability;

710:

711: \item Prove that a suitable multiple of the random projection

712:   $P_H B_\infty^{m-r}$ has an almost full Gaussian

713:   (thus also spherical) measure.

714:

715: \end{enumerate}
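
\medskip

\remark
The identities for $P$ preceding \eqref{P=integral}, and the integrand in
\eqref{P=integral}, can be unpacked as follows. This is a sketch in the notation
above; the symbols $K$ and $u$ are introduced only for this remark.
Let $K = B_\infty^{m-r}$. Since $z \in L_0^\perp$, we have
$z + L_0 = \{ x : \ P_{L_0^\perp} x = z \}$, hence
$$
(z + L_0) \cap K \ne \emptyset
\ \ \ \Longleftrightarrow \ \ \
z \in P_{L_0^\perp} K.
$$
Moreover, $z = D u$ with $u$ uniformly distributed on $S(H)$, so that
conditionally on $D$ and $H$,
$$
\Prob \{ z \in P_H K \}
= \Prob \{ u \in D^{-1} P_H K \}
= \sigma_H (D^{-1} P_H K).
$$
Averaging over $H$ and $D$ gives \eqref{P=integral}.

\medskip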

716:

717:

718: \subsection{The distance $D$ from the center of the facet to a random subspace}

719: We shall first relate $D$, the distance to the affine subspace $E \cap \aff(F)$,

720: to the distance to the linear subspace $E \cap \lin(F)$.

721: Equivalently, we compute the length of the projection onto $E \cap \lin(F)$.

722:

723: \begin{lemma}                       \label{linear vs affine}

724: $$

725: \|P_{E \cap \lin(F)} \theta \|_2 = \sqrt{\frac{r}{r+D^2}} \;

726: \|\theta\|_2.

727: $$

728: \end{lemma}

729:

730: \proof

731: Let $f$ be the multiple of the vector $P_{E \cap \lin(F)} \theta$ such that

732: $f-\theta$ is orthogonal to $\theta$. Such a multiple exists and is unique,

733: as this is a two-dimensional problem.

734:

735: \begin{center}

736: \raisebox{-1 true in}{\includegraphics[height=1.25in]{ecc3.eps}}

737: \end{center}

738:

739: Then $f \in E \cap \aff(F)$. Notice that $D= \|f-\theta\|_2$.  By

740: the similarity of the triangles with the vertices $(0, \theta,

741: P_{E \cap \lin(F)} \theta)$ and $(0, f, \theta)$, we conclude that

742: $$

743: \|P_{E \cap \lin(F)} \theta \|_2 = \frac{r}{\sqrt{r+D^2}} =

744: \sqrt{\frac{r}{r+D^2}} \; \|\theta\|_2

745: $$

746: because $\|\theta\|_2 = \sqrt{r}$.

747: This completes the proof.

748: \endproof

749:

750: \medskip

751:

752: The length of the projection of a fixed vector onto a random subspace in

753: Lemma~\ref{linear vs affine} is well known. The asymptotically sharp

754: estimate was computed by S.~Artstein \cite{A}, but we will be satisfied

755: with a much weaker elementary estimate, see e.g. \cite{Ma} 15.2.2.

756:

757: \begin{lemma} \label{l: random projection}

  Let $\theta \in \R^d$ and let $G$ be a random subspace in $G_{d,k}$.

759:   Then

760:   $$

761:   \Prob \Bigl\{ c \sqrt{\frac{k}{d}} \; \|\theta\|_2

762:                 \le \|P_G \theta\|_2

763:                 \le C \sqrt{\frac{k}{d}} \; \|\theta\|_2

764:         \Bigr\}

765:   \ge  1 - 2 e^{-ck}.

766:   $$

767: \end{lemma}

768:

We apply this lemma to $G = E \cap \lin(F)$, which is a random subspace
in the Grassmannian of $(l+1)$-dimensional subspaces of $\lin(F)$.

771: Since $\dim \lin(F) = m-r+1$, we have

772: $$

773: \Prob \Bigl\{ \|P_{E \cap \lin(F)} \theta\|_2

774:         \ge c \sqrt{\frac{l+1}{m-r+1}} \; \|\theta\|_2

775:       \Bigr\}

776: \ge  1 - 2 e^{-cl}.

777: $$

778: Together with Lemma \ref{linear vs affine} this gives

779: \begin{equation}                    \label{D small}

780: \Prob \Bigl\{ D \le c \sqrt{m-r} \sqrt{\frac{r}{l}} \Bigr\}

781: \ge 1 - 2e^{-cl}.

782: \end{equation}

Note that $\sqrt{m-r}$ is the radius of the Euclidean ball circumscribed
about the facet $F$. The statement $D \le \sqrt{m-r}$ would only tell us
that the random subspace $E$ intersects the circumscribed ball, not yet the
facet itself. The ratio $r/l$ in \eqref{D small} will be chosen logarithmically
small, which will force $E$ to intersect the facet $F$ as well.
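
\medskip

\remark
For the reader's convenience we sketch how \eqref{D small} follows. The constants
are tracked loosely, and $c$ denotes the constant of
Lemma~\ref{l: random projection}. By Lemma~\ref{linear vs affine}, the event in
the display preceding \eqref{D small} reads
$$
\frac{r}{r+D^2} \ge c^2 \, \frac{l+1}{m-r+1}.
$$
Solving for $D^2$ and using $\frac{m-r+1}{l+1} \le \frac{m-r}{l}$
(valid because $l \le m-r$), we get
$$
D^2 \le \frac{r (m-r+1)}{c^2 (l+1)} - r
\le \frac{1}{c^2} \, (m-r) \, \frac{r}{l},
$$
that is, $D \le c^{-1} \sqrt{m-r} \sqrt{r/l}$, which is \eqref{D small}
after renaming the absolute constant.

\medskip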

788:

789:

790:

791: \subsection{Gaussian measure of random projections of the cube}

792: By \eqref{P=integral} and \eqref{D small},

793: $$

794: P \ge \int_{G_{m-r,m-R}}

795:       \sigma_H \Bigl( \frac{c}{\sqrt{m-r}} \sqrt{\frac{l}{r}} \,

796:                        P_H B_\infty^{m-r} \Bigr)

797:       \; d\nu(H) -2 e^{-cl}.

798: $$

799: We can replace the spherical measure $\sigma_H$ by the

800: Gaussian measure $\g_H$ via a simple lemma:

801:

802:

803: \begin{lemma}                       \label{spherical vs Gaussian}

804:   Let $K$ be a star-shaped set in $\R^d$. Then

805:   $$

806:   \g_d(c \sqrt{d} \cdot K) - e^{-d}

807:   \le \sigma_{d-1}(K)

808:   \le \g_d(C \sqrt{d} \cdot K)\cdot (1+ e^{-d}).

809:   $$

810: \end{lemma}

811:

812: \proof Passing to polar coordinates, by the rotational invariance

813: of the Gaussian measure we see that there exists a probability

814: measure $\mu$ on $\R^+$ so that the Gaussian measure of every set

815: $A$ can be computed as $\int_{\R^+} \s^t(A) \; d\mu(t)$, where

816: $\s^t$ denotes the normalized Lebesgue measure on the Euclidean

817: sphere of radius $t$ in $\R^d$. Since $K$ is star-shaped,

818: $\s^t(K)$ is a non-increasing function of $t$. Hence

819: \begin{align*}

820:   \gamma_d(K)

821:   & \ge \int_0^{C \sqrt{d}} \s^t(K) \, d\mu(t)

822:     \ge \s^{C \sqrt{d}}(K) \cdot

823:     \gamma_d( C \sqrt{d} B_2^d)

824: \intertext{and}

825:   \gamma_d(K)

826:   & \le \int_0^{c \sqrt{d}} d\mu(t)

827:    + \s^{c \sqrt{d}}(K) \int_{c \sqrt{d}}^\infty d\mu(t)

828:   \le \gamma_d(c \sqrt{d} \cdot B_2^d) + \s^{c \sqrt{d}}(K).

829: \end{align*}

830: The classical large deviation inequalities imply $\gamma_d(c

831: \sqrt{d} \cdot B_2^d) \le e^{-d}$ and $\gamma_d( C \sqrt{d} B_2^d)

832: \ge 1- e^{-d}/2$. Using the above argument for $c \sqrt{d} \cdot

833: K$, we conclude that $\g_d(c \sqrt{d} \cdot K) \le e^{-d} +

834: \sigma_{d-1}(K)$ and $\g_d(C \sqrt{d} \cdot K) \ge \sigma_{d-1}(K)

835: \cdot (1-e^{-d}/2)$.

836: \endproof
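
\medskip

\remark
The two large deviation inequalities used in the proof can be obtained, for
instance, from the standard Chernoff bound for the chi-square distribution.
The numerical values $c = 1/5$ and $C = 3$ below are one admissible choice of
ours, not necessarily the one intended above. If $g$ is a standard Gaussian
vector in $\R^d$, then
\begin{align*}
  \Prob \{ \|g\|_2^2 \le t d \} &\le (t e^{1-t})^{d/2}
    \ \ \ \text{for $0 < t < 1$,} \\
  \Prob \{ \|g\|_2^2 \ge t d \} &\le (t e^{1-t})^{d/2}
    \ \ \ \text{for $t > 1$.}
\end{align*}
With $t = 1/25$ we have $\log t + 1 - t < -2$, hence
$\gamma_d( \frac{1}{5} \sqrt{d} \cdot B_2^d) \le e^{-d}$.
With $t = 9$ we have $\log t + 1 - t < -5$, hence
$\gamma_d( 3 \sqrt{d} \cdot B_2^d) \ge 1 - e^{-5d/2} \ge 1 - e^{-d}/2$
for all $d \ge 1$.

\medskip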

837:

838: \medskip

839:

840:

841: Using Lemma \ref{spherical vs Gaussian}

842: in the space $H$ of dimension $d = m-R$, we obtain

843: $$

844: P \ge \int_{G_{m-r,m-R}}

845:       \gamma_H \Bigl( c \sqrt{\frac{m-R}{m-r}} \sqrt{\frac{l}{r}} \,

846:                        P_H B_\infty^{m-r} \Bigr)

      \; d\nu(H) -2 e^{-cl} - e^{-(m-R)}.

848: $$

849: By choosing the absolute constant $c$ in the assumption $r < cm$

850: appropriately small, we can assume that $2r < R < m/2$.

851: Thus

852: \begin{equation}                        \label{P}

853: P \ge \int_{G_{m-r,m-R}}

854:       \gamma_H \Bigl( c \sqrt{\frac{R}{r}} \,

855:                        P_H B_\infty^{m-r} \Bigr)

856:       \; d\nu(H) -2 e^{-cR}.

857: \end{equation}

858: We now compute the Gaussian measure of random projections of the cube.
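
\medskip

\remark
The passage to \eqref{P} uses only the inequalities $2r < R < m/2$.
We sketch the arithmetic, with constants not optimized.
Since $r < R/2$ and $R < m/2$,
$$
l = R - r > \frac{R}{2},
\qquad
\frac{m-R}{m-r} > \frac{m/2}{m} = \frac{1}{2},
\qquad
m - R > R,
$$
so that
$$
\sqrt{\frac{m-R}{m-r}} \sqrt{\frac{l}{r}} > \frac{1}{2} \sqrt{\frac{R}{r}},
\qquad
2e^{-cl} + e^{-(m-R)} \le 2e^{-cR/2} + e^{-R}.
$$
Because $P_H B_\infty^{m-r}$ is convex and symmetric, its Gaussian measure does
not decrease when the set is dilated by a larger factor. Hence \eqref{P} follows
after adjusting the absolute constants.

\medskip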

859:

860: \begin{proposition}                     \label{proj of cube}

861:   Let $H$ be a random subspace in $G_{n,n-k}$, $k < n/2$.

862:   Then the inequality

863:   $$

864:   \gamma_H \Bigl( C \sqrt{\log \frac{n}{k}} \,

865:                        P_H B_\infty^n \Bigr)

866:   \ge 1 - e^{-ck}

867:   $$

  holds with probability at least $1 - e^{-ck}$ in the Grassmannian.

869: \end{proposition}

870:

871: The proof of this estimate will follow from the concentration of Gaussian measure,

872: combined with the existence of a big Euclidean ball inside a random projection

873: of the cube.

874:

875: \begin{lemma}[Concentration of Gaussian measure]        \label{concentration}

876:   Let $A$ be a measurable set in $\R^n$. Then for $\e > 0$,

877:   $$

878:   \gamma_n(A) \ge e^{-\e^2 n}

879:   \ \ \ \text{implies} \ \ \

880:   \gamma_n(A + C \e \sqrt{n} B_2^n ) \ge 1 - e^{-\e^2 n}.

881:   $$

882: \end{lemma}

883:

884: With the stronger assumption $\gamma(A) \ge 1/2$, this lemma is the classical

885: concentration inequality, see \cite{L} 1.1. The fact that the concentration

886: holds also for exponentially small sets follows formally by a simple extension

887: argument that was first noticed by D.~Amir and V.~Milman in \cite{AM},

888: see \cite{L} Lemma 1.1.

889:

890: The optimal result on random projections of the cube

891: is due to Garnaev and Gluskin \cite{GG}.

892:

893: \begin{theorem}[Euclidean projections of the cube \cite{GG}]            \label{GG lemma}

894:   Let $H$ be a random subspace in $G_{n,n-k}$, where $k = \a n < n/2$.

  Then with probability at least $1 - e^{-ck}$ in the Grassmannian, we have

896:   $$

897:   c(\a) \, P_H(\sqrt{n} B_2^n)

898:   \subseteq P_H(B_\infty^n) \subseteq

899:   P_H(\sqrt{n} B_2^n)

900:   $$

901:   where

902:   $$

903:   c(\a) = c \sqrt{\frac{\a}{\log(1/\a)}}.

904:   $$

905: \end{theorem}

906:

907:

908: \medskip

909:

910: \noindent {\bf Proof of Proposition \ref{proj of cube}. }

911: Let $g_1, g_2, \ldots$ be independent standard Gaussian random variables.

Then for a suitable positive absolute constant $C$ and for every $0 < \e < 1/2$,

913: $$

914: \gamma_n \Bigl( C \sqrt{\log \frac{1}{\e}} \, B_\infty^n \Bigr)

= \Prob \Bigl\{ \max_{1 \le j \le n} |g_j| \le C \sqrt{\log \frac{1}{\e}} \Bigr\}

916: \ge (1 - \e^2/10)^n \ge e^{-\e^2 n}.

917: $$

918: Since for every measurable set $A$ and every subspace $H$ one has

919: $\gamma_H(P_H A) \ge \gamma(A)$, we conclude that

920: $$

921: \gamma_H \Bigl( C \sqrt{\log \frac{1}{\e}} \, P_H B_\infty^n \Bigr)

922: \ge e^{-\e^2 n}

923: \ \ \ \text{for $0 < \e < 1/2$.}

924: $$

925: Then by Lemma \ref{concentration},

926: \begin{equation}                            \label{cube+ball}

927: \gamma_H \Bigl( C \sqrt{\log \frac{1}{\e}} \, P_H B_\infty^n

928:   + C \e \sqrt{n} \, P_H B_2^n \Bigr)

929: \ge 1 - e^{-\e^2 n}

930: \ \ \ \text{for $0 < \e < 1/2$.}

931: \end{equation}

932: Theorem \ref{GG lemma} tells us that for a random subspace $H$,

933: if $\e = c \sqrt{\a} = c \sqrt{k/n}$,

then the Euclidean ball is absorbed by the projection of the cube

935: in \eqref{cube+ball}:

936: $$

937: \e \sqrt{n} \, P_H B_2^n \subset C \sqrt{\log \frac{1}{\e}} \, P_H B_\infty^n.

938: $$

939: Hence for a random subspace $H$ and for $\e$ as above we have

940: $$

941: \gamma_H \Bigl( C \sqrt{\log \frac{1}{\e}} \, P_H B_\infty^n \Bigr)

942: \ge 1 - e^{-\e^2 n},

943: $$

944: which completes the proof.

945: \endproof
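
\medskip

\remark
The first inequality in the proof above can be verified directly. The value
$C = \sqrt{14}$ is merely one convenient choice of ours. By the standard tail
bound $\Prob \{ |g_1| > t \} \le 2 e^{-t^2/2}$ with
$t = C \sqrt{\log (1/\e)}$,
$$
\Prob \Bigl\{ |g_1| > C \sqrt{\log \frac{1}{\e}} \Bigr\}
\le 2 \e^{C^2/2} = 2 \e^{7} \le \frac{\e^2}{10}
\ \ \ \text{for $0 < \e < 1/2$,}
$$
because $\e^5 < 2^{-5} < 1/20$. By independence, the maximum of
$|g_1|, \ldots, |g_n|$ is at most $C \sqrt{\log (1/\e)}$ with probability at
least $(1 - \e^2/10)^n$. Finally,
$$
1 - \frac{x}{10} \ge 1 - x + \frac{x^2}{2} \ge e^{-x}
\ \ \ \text{for $0 \le x \le 1$,}
$$
which we apply with $x = \e^2$.

\medskip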

946:

947: \medskip

948:

Coming back to \eqref{P}, we shall use Proposition \ref{proj of cube}
for a random subspace $H$ in the Grassmannian $G_{m-r,m-R}$.

951: We conclude that if

952: \begin{equation}                            \label{Rr}

953: c \sqrt{\frac{R}{r}} \ge C \sqrt{\log \frac{m-r}{R-r}},

954: \end{equation}

then with probability at least $1 - e^{-cR}$ in the Grassmannian,

956: $$

957: \gamma_H \Bigl( c \sqrt{\frac{R}{r}} \, P_H B_\infty^{m-r} \Bigr)

958: \ge 1 - e^{-cR}.

959: $$

960: Since $\frac{m-r}{R-r} \le \frac{m}{r}$, the choice of $R$ in \eqref{mnr'}

961: satisfies condition \eqref{Rr}. Thus \eqref{P} implies

962: $$

963: P \ge 1 - 3 e^{-cR}.

964: $$

965: This completes the proof.

966: \endproof
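
\medskip

\remark
The last step of the proof combines two small probabilities, and can be sketched
as follows (the set $\mathcal{G}$ is our notation). Let $\mathcal{G}$ be the set
of subspaces $H$ for which
$\gamma_H ( c \sqrt{R/r} \, P_H B_\infty^{m-r} ) \ge 1 - e^{-cR}$,
so that $\nu(\mathcal{G}) \ge 1 - e^{-cR}$. Then
$$
\int_{G_{m-r,m-R}}
      \gamma_H \Bigl( c \sqrt{\frac{R}{r}} \, P_H B_\infty^{m-r} \Bigr)
      \; d\nu(H)
\ge \nu(\mathcal{G}) \, (1 - e^{-cR})
\ge (1 - e^{-cR})^2
\ge 1 - 2 e^{-cR}.
$$
Inserting this into \eqref{P} gives $P \ge 1 - 4e^{-cR}$, which has the same
form as the stated bound up to the values of the absolute constants.

\medskip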

967:

968:

969:

970:

971:

972: \section{Optimality, robustness, finite alphabets}                    \label{s:conclusion}

973: %______________________________________________________________________________

974:

975:

976: \subsection{Optimality}

977: The logarithmic term in Theorems \ref{ecc} and

978: \ref{reconstruction} is necessary, at least in the case of small

979: $r$.  Indeed, combining  formula \eqref{P=integral} and Lemmas

980: \ref{linear vs affine}, \ref{l: random projection}, \ref{spherical

981: vs Gaussian}, we obtain

982: \begin{equation}  \label{upper P}

983:   P \le \int_{G_{m-r,m-R}}

      \gamma_H \Bigl( C \sqrt{\frac{R}{r}} \,

985:                        P_H B_\infty^{m-r} \Bigr)

986:       \; d\nu(H) + 2 e^{-cR}.

987: \end{equation}

To estimate the Gaussian measure we need the following lemma.

989: \begin{lemma}  \label{l: Gaussian measure}

Let $x_1, \ldots, x_s$ be vectors in $\R^s$. Then

991: \[

992:   \g_s \left (\sum_{j=1}^s [-x_j,x_j] \right )

993:   \le \g_s( M \cdot B_{\infty}^s),

994: \]

where $M= \max_{j=1, \ldots, s} \|x_j\|_2$.

996: \end{lemma}

997:

998: The sum in the Lemma is understood as the Minkowski sum of sets of vectors,

999: $A+B = \{a+b \;|\; a \in A, \; b \in B\}$.

1000:

1001: \medskip

1002:

\proof Let $F= \Span (x_1, \ldots, x_{s-1})$ and let $V=F^{\perp}$.

1004: Let $v \in V$ be a unit vector. Set $Z= \sum_{j=1}^{s-1}

1005: [-x_j,x_j]$. Then

1006: \begin{align*}

1007:    \g_s \Bigl(\sum_{j=1}^s [-x_j,x_j] \Bigr)

1008:    &= \int_V \g_F \Bigl( \Bigl( \sum_{j=1}^s [-x_j,x_j]-tv \Bigr) \cap F

1009:                   \Bigr) \, d \g_V(t) \\

1010:    &= \int_{[-P_V x_s, P_V x_s]} \g_F (Z+ t P_F x_s) d \g_V(t).

1011: \end{align*}

1012: By Anderson's Lemma (see \cite{Lif}),

1013: $\g_F (Z+ t P_F x_s) \le \g_F (Z)$. Thus,

1014: \[

1015:   \g_s \Bigl( \sum_{j=1}^s [-x_j,x_j] \Bigr)

1016:   \le \g_V([-P_V x_s, P_V x_s]) \cdot \g_F(Z)

1017:   \le \g_1([-M,M]) \cdot \g_F(Z).

1018: \]

1019: The proof of the Lemma is completed by induction.

1020: \endproof
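
\medskip

\remark
The induction can be spelled out as follows. This is a sketch, in which $Z_i$
(our notation) denotes $\sum_{j=1}^{i} [-x_j,x_j]$, viewed in an
$i$-dimensional space containing it. Applying the last display of the proof
repeatedly, each time splitting off the last segment, gives
$$
\g_s (Z_s)
\le \g_1([-M,M]) \cdot \g_{s-1} (Z_{s-1})
\le \ldots
\le \g_1([-M,M])^s
= \g_s (M \cdot B_\infty^s),
$$
where the final equality holds because the standard Gaussian measure on $\R^s$
is a product measure and $M \cdot B_\infty^s = [-M,M]^s$.

\medskip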

1021:

1022: The Gaussian measure of a projection of the cube can be estimated

1023: as follows.

1024: \begin{proposition}

1025:   Let $H$ be any subspace in $G_{n,n-k}$, $k < n/2$.

1026:   Then

1027:   \begin{equation} \label{measure of proj of cube}

1028:   \gamma_H \Bigl( \frac{c}{\sqrt{k}} \sqrt{\log \frac{n}{k}} \,

1029:                        P_H B_\infty^n \Bigr)

1030:   \le e^{-cn/k}.

1031:   \end{equation}

1032: \end{proposition}

1033:

1034:

\proof Decompose $\{1, \ldots, n\}$ into the disjoint union of the sets $J_1,
\ldots, J_{s+1}$, so that each of the sets $J_1, \ldots, J_s$
contains $k+1$ elements and $(k+1)s<n \le (k+1)(s+1)$. Let $1 \le
j \le s$. Let $U_j = H \cap (P_He_i, \ i \in \{1, \ldots, n\}
\setminus J_j)^{\perp}$, where $e_1, \ldots, e_n$ is the standard
basis of $\R^n$. Then $U_j$ is a one-dimensional subspace of $H$.

1041: Set

1042: \[

1043:   x_j= \sum_{i \in J_j} \e_i P_He_i,

1044: \]

where the signs $\e_i \in \{-1,1\}$ are chosen to maximize
$\|P_{U_j}x_j\|_2$. Let $E= \Span (x_1, \ldots, x_s)$. Since
$P_{U_j} B_{\infty}^n = [-x_j,x_j]$, we get

1048: \[

1049:   P_H B_{\infty}^n \cap E = \sum_{j=1}^s [-x_j,x_j],

1050: \]

1051: where the sum is understood in the sense of Minkowski addition.

Since $\|P_{U_j}\| =1$, $\|x_j\|_2 \le C \sqrt{k}$ and by Lemma

1053: \ref{l: Gaussian measure},

1054: \[

1055:   \gamma_E \left ( \frac{\bar{c}\sqrt{\log s}}{\sqrt{k}}

1056:   \sum_{j=1}^s [-x_j,x_j] \right )

1057:   \le \gamma_E ( c'\sqrt{\log s} \cdot B_{\infty}^E) \le e^{-cs}

1058: \]

1059: for some appropriately chosen constant $\bar{c}$. Finally,

1060: log-concavity of the Gaussian measure implies that for any convex

1061: symmetric body $K \subset H$

1062: \[

1063:   \gamma_H (K) \le \gamma_E(K \cap E).

1064: \]

1065: \endproof

1066:

1067: Combining \eqref{upper P} and \eqref{measure of proj of cube} we

1068: obtain $P \le 2e^{-cR}$, whenever $R \le c \log (m/r)$.

1069:

1070:

1071:

1072:

1073: \subsection{Robustness and codes for finite alphabets}

1074: Robustness is a well known property of the Basis Pursuit method.

1075: It states that the solution to (BP) is stable with respect to the $1$-norm.

1076: Indeed, it is not hard to show that, once Theorem \ref{ecc} holds,

1077: the unknown vector $y$ in Theorem \ref{ecc} can be approximately recovered

1078: from $y'' = y' + h$, where $h \in \R^m$ is any additional

1079: error vector of small $1$-norm (see \cite{CT}).

1080: Namely, the solution $u$ to the Basis Pursuit problem

1081: $$

1082: \min_{u \in Y} \|u - y''\|_1

1083: $$

1084: satisfies

1085: $$

1086: \|u - y\|_1 \le 4 \|h\|_1.

1087: $$

This makes it possible to quantize the coefficients
in the process of encoding, and yields {\em error correcting codes over
alphabets of size polynomial in $n$}.

1091:

1092: The following is the $(m,n,r)$-error correcting code under

1093: assumption \eqref{mnr'}, with input words $x$ over the alphabet

1094: $\{1,\ldots,p\}$ and the encoded words $y$ over the alphabet

$\{1, \ldots, C p n^{3/2}\}$. The construction is the same as
in Section \ref{ss:ecc}; we just introduce quantization.

1097: The encoder takes $x \in \{1,\ldots,p\}^n$, computes

1098: $y = Qx$ and outputs the $\hat{y}$ whose coefficients are the quantized

1099: coefficients of $y$ with step $\frac{1}{10m}$.

1100: Then $\hat{y} \in \frac{1}{10m} \Z^m \cap [-p\sqrt{m}, p\sqrt{m}]^m$,

1101: which by rescaling can be identified with $\{1, \ldots, C p n^{3/2}\}$

1102: because we can assume that $m \le 2n$.

The decoder takes $y' \in \frac{1}{10m} \Z^m$, finds the solution $u$

1104: to (BP) with $Y = \range(Q)$, inverts to $x' = Q^T u$ and

1105: outputs $\hat{x'}$ whose coefficients are the quantized

1106: coefficients of $x'$ with step $1$.

1107:

This is indeed an $(m,n,r)$-error correcting code. If
$y'$ differs from $\hat{y}$ on at most $r$ coordinates, then this and
the condition $\|\hat{y} - y\|_1 \le \frac{1}{10}$ imply,
by the robustness, that $\|u-y\|_1 \le 0.4$. Hence

1112: $\|x'-x\|_2 = \|Q^T (u-y)\|_2 = \|u-y\|_2 \le \|u-y\|_1 \le 0.4$.

1113: Thus $\hat{x'} = x$, so the decoder recovers $x$ from $y'$ correctly.
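
\medskip

\remark
The numerical claims in this construction can be checked as follows. This is a
sketch assuming, as in the computation above, that $Q$ is an isometric embedding
(so that $\|Qx\|_2 = \|x\|_2$), that quantization moves each coordinate by at
most the step size, and that $m \le 2n$.
For $x \in \{1,\ldots,p\}^n$ and $y = Qx$,
$$
\|y\|_\infty \le \|y\|_2 = \|x\|_2 \le p \sqrt{n} \le p \sqrt{m},
\qquad
\|\hat{y} - y\|_1 \le m \cdot \frac{1}{10m} = \frac{1}{10}.
$$
Each coordinate of $\hat{y}$ therefore takes one of at most
$$
2 p \sqrt{m} \cdot 10 m + 1
= 20 \, p \, m^{3/2} + 1
\le 20 \cdot 2^{3/2} \, p \, n^{3/2} + 1
\le 58 \, p \, n^{3/2}
$$
values. Finally, $\|x' - x\|_\infty \le \|x'-x\|_2 \le 0.4 < 1/2$, so rounding
each coordinate of $x'$ to the nearest integer returns $x$.

\medskip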

1114:

1115: The robustness also implies a ``continuity'' of our error correcting

1116: codes. If the number of corrupted coordinates in the received message

1117: $y'$ is bigger than $r$ but is still a small fraction,

1118: then the $(m,n,r)$-error correcting code above can still recover $y$

1119: up to some small fraction of the coordinates.

1120:

1121: We hope to return to consequences of our method, in particular

1122: to robustness and continuity of our codes and generally to codes over

1123: finite alphabets, in a separate publication.

1124:

1125:

1126:

1127:

1128:

1129:

1130:

1131:

1132:

1133:

1134:

1135:

1136: {\small

1137: \begin{thebibliography}{S 99}

1138:

1139: \bibitem {A} S. Artstein,

1140:   {\em Proportional concentration phenomena on the sphere},

1141:   Israel J. Math. 132 (2002), 337--358

1142:

1143: \bibitem {AM}  D. Amir, V. D. Milman,

1144:   {\em Unconditional and symmetric sets in $n$-dimensional normed spaces},

1145:   Israel J. Math. 37 (1980), 3--20

1146:

1147: \bibitem {BO} B. Beferull-Lozano, A. Ortega,

1148:   {\em Efficient quantization for overcomplete expansions in $\R^n$},

1149:   IEEE Trans. Inform. Theory  49  (2003), 129--150

1150:

\bibitem {CDS} S. Chen, D. Donoho, M. Saunders,

1152:   {\em Atomic decomposition by basis pursuit},

1153:   SIAM J. Sci. Comput. 20 (1998), no. 1, 33--61;

1154:   reprinted in: SIAM Rev. 43 (2001), no. 1, 129--159

1155:

1156: \bibitem {CK} P.G.Casazza, J.Kovacevi\'c,

1157:   {\em Equal-norm tight frames with erasures. Frames},

1158:   Adv. Comput. Math.  18  (2003), 387--430

1159:

1160: \bibitem {CR} E. Candes, J. Romberg,

1161:   {\em Quantitative Robust Uncertainty Principles and Optimally Sparse Decompositions},

1162:    preprint

1163:

1164: \bibitem {CRT} E. Candes, J. Romberg, T. Tao,

1165:   {\em Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information},

1166:   preprint

1167:

1168: \bibitem {CT} E. Candes, T. Tao,

1169:   {\em Near Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?},

1170:   preprint

1171:

1172: \bibitem {Da} I.Daubechies,

1173:   {\em Ten lectures on wavelets},

1174:   SIAM, Philadelphia, 1992

1175:

1176: \bibitem {D 04a} D. Donoho,

1177:   {\em For Most Large Underdetermined Systems of Linear Equations,

1178:   the minimal $\ell_1$-norm solution is also the sparsest solution},

1179:   preprint

1180:

1181: \bibitem {D 04b} D. Donoho,

1182:  {\em For Most Large Underdetermined Systems of Linear Equations, the minimal l1-norm

1183:  near-solution approximates the sparsest near-solution},

1184:   preprint

1185:

1186: \bibitem {D 04c} D. Donoho,

1187:   {\em Compressed sensing},

1188:   preprint

1189:

1190: \bibitem {DET} D. Donoho, M. Elad, V. Temlyakov,

1191:   {\em Stable Recovery of Sparse Overcomplete Representations in the Presence of Noise},

1192:   preprint

1193:

1194: \bibitem {DE} D. Donoho, M. Elad,

  {\em Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell_1$

1196:   minimization},

1197:   Proc. Natl. Acad. Sci. USA 100 (2003), 2197--2202

1198:

1199: \bibitem {DT 04a} D. Donoho, Y. Tsaig,

  {\em Extensions of compressed sensing},

1201:   preprint

1202:

1203: \bibitem {DT 04b} D. Donoho, Y. Tsaig,

1204:   {\em Breakdown of Equivalence between the minimal l1-norm Solution and the Sparsest Solution},

1205:   preprint

1206:

1207: \bibitem {DH} D. Donoho, X. Huo,

1208:   {\em Uncertainty principles and ideal atomic decomposition},

1209:   IEEE Trans. Inform. Theory  47  (2001), 2845--2862

1210:

1211: \bibitem {EB} M. Elad, A. Bruckstein,

1212:   {\em A generalized uncertainty principle and sparse representation in pairs of bases},

1213:   IEEE Trans. Inform. Theory  48  (2002), 2558--2567

1214:

1215: \bibitem {FN} A. Feuer, A. Nemirovski,

1216:   {\em On sparse representation in pairs of bases},

1217:   IEEE Trans. Inform. Theory 49 (2003), 1579--1581

1218:

1219: \bibitem {GG} A. Yu. Garnaev, E. D. Gluskin,

1220:   {\em The widths of a Euclidean ball} (Russian),

1221:   Dokl. Akad. Nauk SSSR 277 (1984), 1048--1052.

1222:   English translation: Soviet Math. Dokl. 30 (1984), 200--204

1223:

1224: \bibitem {G1} V.K.Goyal,

1225:   {\em Theoretical Foundations of Transform Coding},

1226:   IEEE Signal Processing Magazine 18 (2001), no. 5, 9--21

1227:

1228: \bibitem {G2} V.K.Goyal,

1229:   {\em Multiple Description Coding: Compression Meets the Network},

1230:   IEEE Signal Processing Magazine 18 (2001), no. 5, 74--93

1231:

1232: \bibitem {GKK} V.K.Goyal, J.Kovacevic, and J.A.Kelner,

1233:   {\em Quantized Frame Expansions with Erasures},

1234:   Applied and Computational Harmonic Analysis 10 (2001), 203--233

1235:

1236: \bibitem {GVT} V.K.Goyal, M.Vetterli, and N.T.Thao,

  {\em Quantized Overcomplete Expansions in $\R^N$: Analysis, Synthesis and Algorithms},

1238:   IEEE Trans. on Information Theory 44 (1998), 16--31

1239:

1240: \bibitem {GN} R. Gribonval, M. Nielsen,

1241:   {\em Sparse representations in unions of bases},

1242:   IEEE Trans. Inform. Theory 49 (2003), 3320--3325

1243:

1244: \bibitem {Handbook}

1245:   {\em Handbook of coding theory. Vol. I, II.}

1246:   Edited by V. S. Pless, W. C. Huffman and R. A. Brualdi.

1247:   North-Holland, Amsterdam, 1998.

1248:

1249: \bibitem {KDG} J.~Kovacevic, P.~Dragotti, and V.~Goyal,

1250:   {\em Filter Bank Frame Expansions with Erasures},

1251:   IEEE Trans. on Information Theory, 48 (2002), 1439--1450

1252:

1253: \bibitem {L} M. Ledoux,

1254:   {\em The concentration of measure phenomenon},

1255:   Mathematical Surveys and Monographs, 89.

1256:   American Mathematical Society, Providence, RI, 2001

1257:

1258: \bibitem {Lif} M. A. Lifshits,

1259:   {\em Gaussian random functions},

1260:   Mathematics and its Applications, 322.

1261:   Kluwer Academic Publishers, Dordrecht, 1995

1262:

1263: \bibitem {Ma} J.~Matousek,

1264:   {\em Lectures on discrete geometry},

1265:   Graduate Texts in Mathematics, 212. Springer-Verlag, New York, 2002.

1266:

1267: \bibitem {M} S. Mendelson,

1268:   {\em Geometric parameters in learning theory},

1269:   Geometric aspects of functional analysis,  193--235,

1270:   Lecture Notes in Mathematics, 1850, Springer, Berlin, 2004

1271:

1272: \bibitem {Sp1} D. Spielman,

1273:   {\em The complexity of error-correcting codes},

1274:   Fundamentals of Computation Theory, Krakow, Poland, 67--84,

1275:   Lecture Notes in Computer Science 1279, Springer, Berlin, 1997

1276:

1277: \bibitem {Sp2} D. Spielman,

1278:   {\em Constructing Error-Correcting Codes from Expander Graphs},

1279:    Emerging applications of number theory (Minneapolis, MN, 1996), 591--600,

1280:   IMA Vol. Math. Appl., 109, Springer, New York, 1999

1281:

1282: \bibitem {T 04a} J. Tropp,

1283:   {\em Recovery of short, complex linear combinations via $\ell_1$ minimization},

1284:   IEEE Trans. Inform. Theory, to appear

1285:

1286: \bibitem {T 04b} J. Tropp,

1287:   {\em Greed is good: Algorithmic results for sparse approximation},

1288:   IEEE Trans. Inform. Theory, Vol. 50, Num. 10, October 2004, pp. 2231-2242

1289:

1290: \bibitem {T 04c} J. Tropp,

1291:   {\em Just relax: Convex programming methods for subset selection and sparse approximation},

1292:   ICES Report 04-04, UT-Austin, February 2004

1293:

1294:

1295:

\end{thebibliography}
}

1297: \end{document}

1298: