
1:

2: \documentclass[12pt,leqno]{amsart}

3: \usepackage[boxed,ruled]{algorithm2e}

4: % \usepackage[notcite,notref]{showkeys}

5: % \usepackage{hyperref}

6: \setlength{\textwidth}{15cm}

7: \setlength{\textheight}{23cm}

8:

9: \hoffset=-1.0cm

10: \voffset=-2.0cm

11:

12: \hfuzz=7pt

13: \vfuzz=2pt

14:

15: \headsep=27pt

16: \parindent=15pt

17:

18: \frenchspacing

19:

20: \newlength{\fixboxwidth}

21: \setlength{\fixboxwidth}{\marginparwidth}

22: \addtolength{\fixboxwidth}{-9pt}

23: \newcommand{\fix}[1]{\marginpar{\fbox{\parbox{\fixboxwidth}

24: {\raggedright\tiny #1}}}}

25:

26: \newcommand{\tabfrac}[2]{%

27:         \setlength{\fboxrule}{0pt}%

28:         \fbox{$\frac{#1}{#2}$}%

29: }

30:

31: \newcommand{\tabmath}[1]{%

32:         \setlength{\fboxrule}{0pt}%

33:         \fbox{${#1}$}%

34: }

35:

36: \newcommand{\R}{{\mathbb R}}

37: \newcommand{\N}{{\mathbb N}}

38: \newcommand{\SP}{{\mathbb S}}

39: \newcommand{\PP}{{\mathbb P}}

40: \newcommand{\Oh}{{\cal O}}

41: \newcommand{\bi}{{\bf{i}}}

42: \newcommand{\bx}{{\bf{x}}}

43: \newcommand{\ba}{{\alpha}}

44: \newcommand{\cost}{\operatorname{cost}}

45: \renewcommand{\rho}{{\varrho}}

46: \def\min{{\rm min}}

47: \def\old{{\rm old}}

48: \def\l{{\lambda }}

49: \def\a{{\alpha }}

50: \def\g{{\gamma }}

51: \def\e{{\varepsilon}}

52: \def\alph{{\theta}}

53: \def\F{\mathcal F}

54: \def\d{{\delta}}

55: \def\phi{{\varphi}}  % huebscher!?

56: \def\om{\omega}

57: \newcommand{\vol}{\operatorname{vol}}

58: %\renewcommand{\rho}{{\varrho}}

59: \newcommand{\rad}{\mathcal R^{\alpha}}

60: \newcommand{\radc}{\mathcal R_{C}}

61: %  \newcommand{\fad}{\mathcal F_\alpha} %   alt

62: \newcommand{\fad}{\mathcal F^\alpha}    %   neu

63: \newcommand{\ball}{{B^d}}

64: \newcommand{\fo}{\mathcal F(\Omega)}

65: \newcommand{\fco}{\mathcal F_C(\Omega)}

66: \newcommand{\wt}{\widetilde }

67: \newcommand{\krd}{K_{\rho,\delta} }

68: \newcommand{\prd}{P_{\rho,\delta} }

69: \newcommand{\muo}{\mu_\Omega}

70: \newcommand{\mur}{\mu_\rho}

71: \newcommand{\dist}{\operatorname{dist}}

72: \newcommand{\vt}{S^{\mathrm{mean}}_n}

73: \newcommand{\vtm}{S^{\mathrm{mh}}_n}

74: \newcommand{\vtn}{S^{\mathrm{simple}}_n}

75: \newcommand{\lr}[1]{\left(#1\right)}

76: \newcommand{\abs}[1]{\left\vert #1 \right\vert}

77: \newcommand{\norm}[2]{\Vert #1  \Vert _{#2}}

78: \newcommand{\set}[1]{\left\{#1\right\}}

79: \newcommand{\expect}{\mathbf E}

80: \newcommand{\Var}{\operatorname{Var}}

81: \newcommand{\scalar}[2]{\langle #1,#2\rangle}

82: \newcommand{\die}{\mathcal E}

83:

84: \theoremstyle{plain}

85: \newtheorem{theorem}{Theorem}

86: \newtheorem{lemma}{Lemma}

87: \newtheorem{proposition}{Proposition}

88: \newtheorem{corollary}{Corollary}

89:

90: \theoremstyle{definition}

91: \newtheorem{rem}{Remark}

92: %   \numberwithin{lemma}{section}  % geaendert

93: %   \numberwithin{equation}{section} % geaendert

94: \begin{document}

95:

96: \title{Simple Monte Carlo and the Metropolis algorithm}

97: \author{Peter Math\'e}

98: \address{Weierstrass Institute for Applied Analysis and

99: Stochastics, Mohrenstrasse 39, D-10117 Berlin, Germany}

100: \email{mathe@wias-berlin.de}

101: \author{Erich Novak}

102: \address{ Friedrich Schiller University Jena,

103: Mathem. Institute,

104: Ernst-Abbe-Platz 2,

105: D-07743 Jena, Germany}

106: \email{novak@math.uni-jena.de}

107: \date{Version: \today}

108: \keywords{Monte Carlo methods, Metropolis algorithm,

109: log-concave density, rapidly mixing Markov chains,

110: optimal algorithms, adaptivity, complexity}

111: \subjclass[2000]{65C05, secondary: 65Y20, 68Q17, 82B80}

112:

113: \maketitle

114: \begin{center}

115: {\sl\large Dedicated to our dear colleague and friend Henryk

116: Wo\'zniakowski on the occasion of his 60th birthday. }

117: \end{center}

118:

119: \begin{abstract}

120: We study the integration of functions with

121: respect to an unknown density.

122: % which is known only up to the normalizing factor.

123: Information is available as

124: oracle calls to the integrand and to the non-normalized density

125: function.

126: We are interested in analyzing the integration error

127: of optimal algorithms

128: (or the complexity of the problem) with emphasis on

129: the variability of the weight function.

130: For a corresponding large

131: class of problem instances we show that the complexity

132: grows linearly in the variability, and the simple Monte Carlo method

133: provides an almost optimal algorithm.

134: Under additional geometric restrictions (mainly log-concavity)

135: for the density

136: functions, we establish that a suitable adaptive

137: local Metropolis algorithm is almost optimal and

138: outperforms any non-adaptive algorithm.

139: \end{abstract}

140:

141: \section{Introduction, Problem description}\label{s1}

142: In many applications one wants to compute an integral of the form

143: \begin{equation}

144: \label{eq:base}

145: \int_\Omega f(x) \cdot c \rho(x) \, \mu(dx)

146: \end{equation}

147: with a density $c \rho(x),\ x\in \Omega$, where $c >0$ is unknown

148: and $\mu$ is a probability measure.

149: Of course we have

150: $

151: {1}/{c} = \int_\Omega  \rho(x) \, \mu(dx),

152: $

153: but the numerical computation of the

154: latter integral is often as hard as the original

155: problem~(\ref{eq:base}).

156: Therefore it is desirable to have algorithms which are able to

157: approximately compute~(\ref{eq:base}) without knowing the normalizing

158: constant, based solely on $n$  function values of $f$ and $\rho$. In other

159: terms, these functions are given by an \emph{oracle}, i.e., we assume

160: that we can compute function values of $f$ and $\rho$.

161:

162: \subsubsection*{Solution operator}

163: \label{solop}

164: Assume that we are given any

165: class $\fo$ of input data $(f,\rho)$ defined

166: on a set $\Omega$.

167: We can rewrite the integral in~(\ref{eq:base}) as

168: \begin{equation}   \label{eq02}

169: S(f, \rho) = \frac{\int f(x) \cdot \rho (x) \, \mu(dx)}{\int \rho (x)

170: \, \mu(dx)},\quad (f,\rho)\in\fo.

171: \end{equation}

172: This \emph{solution operator} is linear in $f$ but not in $\rho$.

173: We discuss algorithms for the (approximate) computation of $S(f, \rho)$.

174: \begin{rem}

175: This solution operator is closely related to systems in statistical

176: mechanics, which obey a Boltzmann

177: (or Maxwell or Gibbs) distribution, i.e., when there is a

178: countable number $j=1,2,\dots$ of microstates with energies, say

179: $E_j$,  and the overall system is distributed according to the

180: Boltzmann distribution, with inverse temperature $\beta$,  as

181: $$

182: P_\beta(j):= \frac{e^{-\beta E_j}}{Z_\beta},\quad j=1,2,\dots.

183: $$

184: In this case the normalizing constant $Z_\beta$ is the \emph{partition

185: function},

186: corresponding to $1/c$ from~(\ref{eq:base}) and $\rho^\beta(j)=

187: e^{-\beta E_j}$ for $j \in \N$.

188:

189: In this setup, if $A$ is any global thermodynamic quantity, then its

190: expected value $\langle A \rangle_\beta$ is given by

191: $$

192: \langle A \rangle_\beta := \frac{1}{Z_\beta} \sum_{j} A_j e^{-\beta E_j},

193: $$

194: which can be written as $S(A,\rho^\beta)$.

195: Observe, however, that we use here slightly

196: different assumptions

197: since we use the counting measure on $\N$, not a probability measure.

198: \end{rem}
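
As a small illustration with hypothetical values (chosen here only for concreteness), consider a system with just two microstates and energies $E_1=0$, $E_2=1$. Then $Z_\beta = 1 + e^{-\beta}$ and
$$
\langle A \rangle_\beta = \frac{A_1 + A_2 e^{-\beta}}{1+e^{-\beta}} ,
$$
so that $\langle A \rangle_\beta \to (A_1+A_2)/2$ as $\beta\to 0$ and $\langle A \rangle_\beta \to A_1$ as $\beta\to\infty$. Only the ratios of the weights $e^{-\beta E_j}$ enter this quotient.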

199:

200: \subsubsection*{Randomized methods}

201: \label{randm}

202:

203: Monte Carlo methods (randomized methods) are

204: important numerical tools for integration and

205: simulation in science and engineering; we refer to the

206: recent special issue~\cite{10.1109/MCSE.2006.27}.

207: The Metropolis method, or more accurately, the class of

208: \emph{Metropolis-Hastings algorithms} ranges among the most important

209: methods in numerical analysis and scientific computation,

210: see~\cite{10.1109/5992.814660,10.1109/MCSE.2006.30}.

211:

212: Here we consider randomized methods $S_n$ that use $n$ function

213: evaluations of $f$ and $\rho$. Hence $S_n$ is of the form as exhibited

214: in Figure~\ref{fig:gene}.

215: \SetKw{KwInit}{Init}

216: \SetKw{KwAvg}{Compute}

217: \SetKw{KwDet}{Step}

218: \SetKw{KwCh}{Choose}

219: \SetKw{KwComp}{Compute}

220: \restylealgo{boxed}

221: \begin{figure}[h]

222:   \centering

223: \begin{algorithm}[H]

224: \SetLine

225: \Titleofalgo{ $S_n(f,\rho)$}

226: \KwData{Functions $f,\rho$, {\tt random numbers} $\omega_{1},\dots,\omega_{n}$\;}

227: \KwResult{approximate value $S_n(f,\rho)$ for $S(f,\rho)$ from Eq.~(\ref{eq02})\;}

228: \Begin{

229: \KwInit{$x_{1} := x_{1}(\omega_{1})$,

230: \KwComp{ $f(x_1)$ and $\rho(x_1)$}\;

231: }

232:

233: \For{$i=2,\dots,n$}

234: {

235: \KwDet{ $x_i := x_{i}(f(x_{1}),\dots,f(x_{i-1}),\rho(x_{1}),\dots,\rho(x_{i-1}),\omega_{i})$}\;

236: \KwComp{$f(x_i)$ and $\rho (x_i)$}\;

237: }

238: \KwAvg{ $S_n(f,\rho)= \phi_n(f(x_{1}),\dots,f(x_{n}),\rho(x_{1}),\dots,

239: \rho(x_{n}))\in \R $}\;

240: }

241: \end{algorithm}

242:   \caption{Generic Monte Carlo algorithm based on $n$ values of

243:  $f$ and $\rho$. The final {\bf Compute} may use any mapping $\phi_n : \R^{2n} \to \R$.

244: %  random numbers

245: %  $\omega_{1},\dots,\omega_{n}$.

246: % Monte Carlo algorithms differ by choosing~\KwDet and~\KwAvg{} in

247: % different ways.

248: }

249:   \label{fig:gene}

250: \end{figure}

251:

252: In all steps, random number generators may be used to determine the

253: next node.

254: If the nodes $x_i$ from \KwDet

255: do not depend on previously computed

256: values of $f(x_1), \dots ,f(x_{i-1})$ and

257: $\rho(x_1), \dots , \rho(x_{i-1})$, then the algorithm is called

258: \emph{non-adaptive}, otherwise it is called \emph{adaptive}.

259: Specifically we analyze the

260: procedures $\vtn$ and $\vtm$, introduced in~(\ref{eq:vtn})

261: and~(\ref{eq:met}) below.

262: \begin{rem}

263: The notion of \emph{adaption} which is used here differs from the

264: one recently used to introduce~\emph{adaptive MCMC}, see

265: e.g.~\cite{MR2260070,MR2172842}.

266: %   By a non-adaptive algorithm we mean an algorithm of the form

267: %   $x_i = x_i (\omega_i)$, i.e., the node $x_i$ does \emph{not}

268: %   depend on the (already computed) values

269: %   $f(x_1), \dots , f(x_{i-1}), \rho(x_1), \dots , \rho (x_{i-1})$.

270: %

271: %   All other algorithms are called adaptive.

272: The Metropolis algorithm which is used in this paper is based

273: on a

274: \emph{homogeneous} Markov chain, in our notation this is still

275: an adaptive algorithm since the used nodes $x_i$ depend on $\rho$.

276: %  but the kernel of which

277: %  may depend on the specific target distribution, as this is the case

278: %  for the Metropolis sampler, see~\S~\ref{sec:metro-loc}.

279: Hence we use the concept of adaptivity from numerical analysis

280: and information-based complexity, see~\cite{MR1408328}.

281: \end{rem}

282:

283: For details on the model of computation we

284: refer to~\cite{NOV,MR1319050,IBC}.

285: Here we only mention the following:

286: We use the real number model and assume that $f$ and $\rho$

287: are given by an oracle for function values.

288: Our lower bounds hold under very general assumptions

289: concerning the available random number generator.\footnote{Observe,

290: however, that we cannot use a random number generator

291: for the ``target distribution''

292: $\mu_\rho=\rho \cdot \mu / \Vert \rho \Vert_1$,

293: since $\rho$ is part of the input.}

294:

295: For the upper bounds we only study two algorithms

296: in this paper, described in~(\ref{eq:vtn}) and (\ref{eq:met}),

297: below. Specifically we shall deal with the (non-adaptive)~\emph{simple Monte Carlo

298:   method} and a specific  (adaptive)~\emph{Metropolis--Hastings method}.

299: The former can only be applied if a random

300: number generator for $\mu$ on $\Omega$ is available.

301: Thus there

302: are natural situations when this method cannot be used.

303: % If applicable, then the subroutine \KwDet~ in Algorithm~\ref{fig:algorithm} chooses a random number

304: % according to $\mu$, independently in each step.

305: The latter will be based on a suitable

306: ball walk. Hence we need a random number generator

307: for the uniform distribution on a (Euclidean) ball.

308: Thus the Metropolis--Hastings method

309: can also be applied when a random

310: number generator for $\mu$ on $\Omega$ is not available.

311: Instead, we need a ``membership

312: oracle'' for $\Omega$: On input $x \in \R^d$ this oracle can

313: decide with cost 1 whether $x \in \Omega$ or not.

314:

315: % The detailed description of the \emph{adaptive} algorithm is postponed to~\S~\ref{sec:metro-loc}.

316: \subsubsection*{Error criterion}

317: \label{sec:error}

318: We are interested in error bounds

319: uniformly for classes~$\fo$ of input data. If $S_{n}$ is any method

320: that uses (at most) $n$ values of $f$ and $\rho$

321: then the (individual) error for the

322: problem instance~$(f, \rho)\in\fo$ is given by

323: \begin{equation*}

324: %\label{eq:mcerr}

325: e(S_n, (f,\rho))= \lr{\expect\abs{S(f,\rho) -

326:       S_n (f,\rho)}^{2}}^{1/2},

327: \end{equation*}

328: where $\expect$ means the expectation.

329: The overall (or worst case)  error on the class $\fo$ is

330: \begin{equation*}

331: %\label{eq:mcerfc}

332: e(S_n, \fo)= \sup_{(f,\rho)\in\fo}

333: e(S_n , (f,\rho)).

334: \end{equation*}

335: The complexity of the problem is given by

336: the error of the best algorithm, hence we let

337: \begin{equation*}

338: %\label{eq05}

339: e_n (\fo) := \inf_{S_n}

340: e(S_n, \fo).

341: \end{equation*}

342: The classes~$\fo$ under consideration will always contain constant

343: densities~$\rho = c > 0$ and all $f$ with $\Vert f \Vert_\infty

344: \le 1$, hence

345: $$

346: \mathcal F_1 (\Omega) :=\set{(f,\rho),\ \abs{f(x)}\leq 1,\

347: x\in\Omega, \text{ and }\ \rho = c} \subset \fo.

348: $$

349: On this class the problem~(\ref{eq02}) reduces to the classical

350: integration problem for uniformly bounded functions, and it is well

351: known that the error of any Monte Carlo method can decrease at a rate

352: $n^{-1/2}$, at most. Precisely, it holds true that

353: $$

354: e_{n}(\mathcal

355: F_{1}(\Omega))= \frac{1}{1 + \sqrt n},

356: $$

357: if the probability~$\mu$ is non-atomic, see~\cite{olm}.

358: On the other hand we will only consider $(f, \rho)$ with

359: $S(f, \rho) \in [-1, 1]$, hence the trivial algorithm

360: $S_0=0$ always has error 1.

361:

362: For the classes $\fco$

363: and $\fad(\Omega)$,  which will be

364: introduced in Section~\ref{sec:m+c},

365: we easily obtain the optimal order

366: $e_{n}(\fo) \asymp n^{-1/2}$.

367: We will analyze how $e_n(\fo)$

368: depends on the parameters

369: $C$ and $\alpha$, in case $\fo:=\fco$ or

370: $\fo:=\fad(\Omega)$, respectively.

371:

372: We discuss some of our subsequent results and provide a short

373: outline.

374: In Section~\ref{sec:m+c} we shall specify the methods and classes of input

375: data to be analyzed.

376: The classes $\fco$,

377: analyzed first in Section~\ref{s2},  contain all densities $\rho$ with

378: $\sup \rho / \inf \rho \le C$. In

379: typical applications we may face $C=10^{20}$.

380: Then  we cannot decrease the error of optimal

381: methods from 1 to $0.7$ even with sample

382: size $n=10^{15}$; see Theorem~\ref{thm:lbfc} for more details.

383: Hence the classes $\fco$ are so large that no

384: algorithm, deterministic or Monte Carlo,

385: adaptive or non-adaptive, can provide an acceptable

386: error. We also prove that the simple (non-adaptive) Monte Carlo method is almost

387: optimal, no sophisticated  Markov chain Monte Carlo method can help.
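
The numbers quoted above can be checked directly against the lower bound~(\ref{eq:2nc}) of Theorem~\ref{thm:lbfc}. For $C=10^{20}$ and $n=10^{15}$ we have $2n < C-1$, so the second case of~(\ref{eq:2nc}) applies and
$$
e(S_n,\fco) \ge \frac{\sqrt 2}{6}\cdot\frac{3C}{C+2n-1}
= \frac{\sqrt 2}{2}\cdot \frac{1}{1 + (2n-1)/C}
\approx \frac{0.70711}{1+2\cdot 10^{-5}} \approx 0.70709 > 0.7 .
$$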

388:

389: Thus we face the question whether adaptive algorithms,

390: such as the Metropolis algorithm,

391: help significantly on ``suitable and interesting'' subclasses of $\fco$.

392: We give a positive answer for the classes

393: $\fad(\Omega)$, analyzed in Section~\ref{s3}.  Here we assume that

394: $\Omega \subset \R^d$ is a convex body, and that $\mu$ is the normalized Lebesgue

395: measure~$\muo$ on $\Omega$.

396: The class~$\fad(\Omega)$ contains log-concave densities,

397: where $\a$ is the Lipschitz constant

398: of $\log \rho$.

399: We shall establish in \S~\ref{sec:non} that

400: all non-adaptive methods (such as the simple Monte

401: Carlo method) suffer from the curse of dimension,

402: i.e., % for non-adaptive methods

403: we get similar lower bounds as for the classes $\fco$.

404:  However, in \S~\ref{sec:metro-loc} we shall design and analyze

405:  specific (adaptive) Metropolis algorithms that are based on some

406:  underlying ball walks, tuned to the class parameters%  as these are the

407: % spacial dimension $d$ and the Lipschitz constant $\a$

408: . Using such algorithms we can

409: break the curse of dimension by adaption. The main error estimate for

410: this algorithm is given in Theorem~\ref{th5}, and we conclude

411: this study with further discussion in the final Section~\ref{sec:sum}.

412:

413: \section{Specific methods and classes of input}

414: \label{sec:m+c}

415: We consider the approximate computation of $S(f,\rho)$

416: for large classes of input data.

417: Since with deterministic algorithms one cannot %E  substantially

418: improve

419: the trivial zero algorithm (with error 1),

420: we study randomized or Monte Carlo algorithms.

421:

422: \subsection*{The methods}

423: The Monte Carlo methods under consideration  fit the schematic view from

424: Figure~\ref{fig:gene}.

425:

426: \subsubsection*{{Simple Monte Carlo}}

427: \label{sec:simp}

428: Here the random numbers

429: $\omega_{1},\dots,\omega_{n}$ are identically and independently

430: distributed according to~$\mu$, and the routine~\KwDet chooses

431: $X_{i}:= \omega_{i}$.

432: The final routine~\KwAvg is the quotient of the sample means of

433: the computed function values

434: \begin{equation}\label{eq:vtn}

435: \vtn(f,\rho):= \frac{\sum_{j=1}^n f(X_j)\rho(X_j)}{\sum_{j=1}^n\rho(X_j)}.

436: \end{equation}
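
To see why $\vtn$ needs no normalizing constant, note that replacing $\rho$ by $c\rho$ with any $c>0$ gives
$$
\vtn(f,c\rho) = \frac{\sum_{j=1}^n f(X_j)\, c\rho(X_j)}{\sum_{j=1}^n c\rho(X_j)} = \vtn(f,\rho),
$$
just as $S(f,c\rho)=S(f,\rho)$ in~(\ref{eq02}). Moreover, provided that $\int \abs{f}\rho\,d\mu$ and $\int\rho\,d\mu$ are finite and $\int\rho\,d\mu>0$, the law of large numbers shows that the sample means $\frac 1 n\sum_{j=1}^n f(X_j)\rho(X_j)$ and $\frac 1 n\sum_{j=1}^n \rho(X_j)$ converge almost surely to $\int f\rho\,d\mu$ and $\int\rho\,d\mu$, respectively, so that their quotient converges to $S(f,\rho)$.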

437: \subsubsection*{{Metropolis-Hastings method}}

438: \label{sec:mh}

439: This describes a class of (adaptive) Monte Carlo  methods which are based

440: on the ingenious idea to construct in \KwDet a Markov chain having

441: \begin{equation}  \label{mur}

442: \mur := \frac{\rho \cdot \mu}{\int\rho(x)\, \mu(dx)}

443: \end{equation}

444: as invariant distribution without knowing the normalization.

445: Thus, if $(X_1,X_2,\dots,X_n)$ is a

446: trajectory of such a Markov chain, then we let \KwAvg be given as

447: \begin{equation}

448:   \label{eq:met}

449:   \vtm(f,\rho):= \frac{1}{n}  \sum_{j=1}^n f(X_j).

450: \end{equation}

451: Hence we use $n$ steps of the Markov chain; the number of needed

452: (different)

453: function values of $\rho$ and $f$ might be smaller.

454: We will further specify the Metropolis-Hastings algorithm for the

455: problem at hand in \S~\ref{sec:metro-loc}, see Figures 2 and 3

456: for a schematic presentation and Theorem~\ref{th5} for the choice of $\delta$.

457: %E  In diesem Bereich ein paar kleine Aenderungen.

458: Both Monte Carlo methods construct Markov chains,  i.e., the point

459: $x_i$ depends on $x_{i-1}$ and $\rho (x_{i-1})$, only. This trivially holds true

460: for simple Monte Carlo, since $x_i$ does not at all depend on

461: earlier computed function values.

462:

463: \begin{rem}

464: Comparisons of different Monte Carlo methods for problems similar

465: to~(\ref{eq02}) are frequently met in the literature. We

466: mention~\cite{B/D06} with a comparison

467: of \emph{Metropolis algorithms} and

468: \emph{importance sampling}, where an error expansion at any instance

469: $(f,\rho)$ is given in terms of certain auto-correlations. The simple

470: Monte Carlo method, as introduced above, is also studied there as

471: $\tilde\mu_{I}$ for $\rho   = 1$.

472: \end{rem}

473: The (point-wise almost sure) convergence of both

474: methods $\vtn% (f,\rho)

475: $ and

476: $\vtm% (f,\rho)

477: $, as $n\to\infty$,  is ensured by corresponding

478: ergodic theorems, see~\cite{MR797411}. But, as outlined above, we are

479: interested in the uniform error on  relatively large~\emph{problem classes}.

480: \subsection*{The classes}

481: Here we formally describe the classes of input under consideration.

482:

483: \subsubsection*{{ The class $\fco$}}

484: \label{sec:classfc}

485:

486: %In Section~3 we assume that

487: Let $\mu$ be an arbitrary probability

488: measure on a set $\Omega$ and consider the set

489: $$

490: \fco = \{ (f, \rho) \mid

491: \Vert f \Vert_\infty \le 1, \

492: \rho >0, \

493: \frac{\rho(x)}{\rho(y)}  \le C,\ x,y\in\Omega \}.

494: $$

495: % $$

496: % \fco = \{ (f, \rho) \mid

497: % \Vert f \Vert_\infty \le 1, \

498: % \rho >0, \

499: % \frac{\sup \rho}{\inf \rho}  \le C \}.

500: % $$

501: Note that necessarily $C\geq 1$. If $C=1$ then $\rho$ is constant and

502: we almost face the ordinary integration problem, since

503: $\rho$ can be recovered with only one function value.

504:

505: In many applications the constant $C$ is huge and we will establish

506: that the complexity of the problem (the cost of an optimal

507: algorithm) is linear in $C$. Therefore, for large $C$, the class is

508:  too large. We have to look for smaller classes that

509: contain many interesting pairs $(f, \rho)$ and have smaller complexity.

510:

511: \subsubsection*{The class $\fad(\Omega)$

512: with log-concave densities}

513: \label{sec:classfad}

514:

515: In many applications, we have a weight~$\rho$ with additional

516: properties and %in Section~4

517: we assume the following:

518: \begin{itemize}

519: \item The set $\Omega\subset \R^d$ is a \emph{convex body}, that is a compact and convex set

520: with nonempty interior. The probability $\mu=\muo$ is the normalized Lebesgue measure

521: on the set~$\Omega$.

522: \item

523: The functions $f$ and $\rho$ are defined on $\Omega$.

524: \item

525: The weight~$\rho >0$ is log-concave, i.e.,

526: $$

527: \rho(\lambda x + (1-\lambda)y) \ge \rho(x)^\lambda \cdot

528: \rho(y)^{1-\lambda},

529: $$

530: where $x,y \in \Omega $ and $0<\lambda <1$.

531: \item

532: The logarithm of $\rho$ is Lipschitz,

533: i.e.,

534: $

535: |\log\rho(x) - \log\rho(y) | \leq \alpha \Vert x-y \Vert_2

536: $.

537: \end{itemize}

538: Thus  we  consider the class of log-concave weights on

539: $\Omega\subset \R^{d}$ given by

540: \begin{equation}

541: \label{eq:dens-class}

542: \rad(\Omega)  = \{  \rho \mid

543: \rho >0, \

544: \log\rho \text{ is concave}, \

545: |\log\rho(x) - \log\rho(y) | \leq \alpha \Vert x-y \Vert_2 \} .

546: \end{equation}

547:

548: We study the following class $\fad(\Omega)$ of problem elements,

549: \begin{equation}

550:   \label{eq:fad}

551:  \fad (\Omega)  = \set{(f, \rho) \mid

552: \rho \in\rad( \Omega),  \ \norm{f}{2,\rho}\le 1 } ,

553: \end{equation}

554: where $\Vert \cdot \Vert_{2,\rho}$ is the

555: $L_2$-norm with respect to the probability measure $\mur$,

556: see~\eqref{mur}.

557: In some places we restrict our study to the (Euclidean) unit ball, i.e.,

558: $\Omega:= \ball \subset \R^d$.

559:

560: \begin{rem}

561: Let $\radc (\Omega)$ be the class of weight functions that

562: belong to $\fco$. Then

563: $\rad(\Omega) \subset \radc (\Omega)$

564: if $C = e^{\alpha

565: D}$, where $D$ is the diameter of $\Omega$.

566: Thus large values of $\a$ correspond to ``exponentially large''

567: values of $C$. However,

568: the densities from the class

569: $\rad(\Omega)$ have some extra (local) properties: they are log-concave

570: and Lipschitz continuous.

571: These properties can be used for the construction of fast

572: adaptive methods, via rapidly mixing Markov chains.

573: \end{rem}
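
The inclusion stated in the preceding remark follows in one line: for $\rho\in\rad(\Omega)$ and $x,y\in\Omega$,
$$
\log\frac{\rho(x)}{\rho(y)} \le \a \Vert x-y\Vert_2 \le \a D,
\quad\text{hence}\quad \frac{\rho(x)}{\rho(y)}\le e^{\a D}.
$$
For instance, on the unit ball $\Omega=\ball$ we have $D=2$, and the (purely illustrative) value $\a=25$ already gives $C=e^{50}\approx 5.2\cdot 10^{21}$.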

574:

575: \section{Analysis for $\fco$} \label{s2}

576:

577: We assume that $\Omega$ is an arbitrary set and $\mu$

578: is a probability measure on $\Omega$,

579: and that the functions~$f$ and $\rho$ are defined on $\Omega$.

580:

581: In the applications, the constant $C$ might be very large,

582: something like $C=10^{20}$ is a realistic assumption.

583: Therefore we want to know how the complexity (the cost of

584: optimal algorithms) depends on $C$.

585: Observe that the problem is correctly normalized or scaled such that

586: $

587: S(\fco) = [-1, 1] ,

588: $

589: for any $C \ge 1$.

590: We will prove that the complexity of the problem % , i.e.,

591: % the cost of optimal algorithms,

592: is linear in $C$, and hence

593: there is no way to solve the problem if $C$ is really huge.

594: % This problem class is simply too large.

595: We start with establishing a lower bound and then show that simple

596: Monte Carlo achieves this error up to a constant.

597:

598: \subsection{Lower Bounds}

599:

600: Here we prove lower bounds for all

601: (adaptive or non-adaptive) methods that use $n$ evaluations

602: of $f$ and $\rho$. We use the technique of Bahvalov, i.e.,

603: we study the average error

604: of deterministic algorithms with respect to certain discrete measures

605: on $\fco$.

606: \begin{theorem} \label{thm:lbfc}

607: Assume that we can partition  $\Omega$ into $2n$ disjoint sets with

608: equal measure (equal to $1/(2n)$).

609: Then for any Monte Carlo method $S_n$ that uses $n$ values of

610: $f$ and $\rho$ we have the lower bound

611: \begin{equation}

612:  \label{eq:2nc}

613: e(S_n,\fco) \ge\frac 1 6 \sqrt 2

614: \begin{cases}

615: \sqrt{\frac{C}{2n}}, &  2n\geq C - 1, \\

616: \frac{3 C}{C+2n-1}, & 2n < C -1.

617: \end{cases}

618: \end{equation}

619: \end{theorem}

620: The lower bound will be obtained in two steps.

621: \begin{enumerate}

622: \item We first reduce the error analysis for Monte Carlo sampling to

623:   the average case error analysis with respect to a certain prior

624:   probability on the class $\fco$.

625:   This approach is due to Bahvalov, see~\cite{Bachvalov}.

626: \item For the chosen prior the average case analysis can be carried

627:   out explicitly and will thus yield a lower bound.

628: \end{enumerate}

629: To construct the prior let $m:=2n$, let $\Omega_{1},\dots,\Omega_{m}$ be

630: the partition into sets of equal probability, and let $\chi_{\Omega_{j}}$ be

631: the corresponding characteristic functions. Furthermore, let

632: $$

633: l:=

634: \begin{cases}

635:  \lceil \frac{m}{C-1}\rceil, &  m\geq C -1,\\

636: 1,&\text{ else.}

637: \end{cases}

638: $$

639: Denote by $J_{l}^{m}$ the set of

640: all subsets of $\set{1,\dots,m}$ of cardinality equal to $l$, and by

641: $\mu_{m,l}$ the equi-distribution on $J_{l}^{m}$, while $\expect_{m,l}$ denotes the expectation with

642: respect to the prior $\mu_{m,l}$. Let

643: $(\e_{1},\dots,\e_{m})$ be independent and identically

644: distributed with $P(\e_{j}=-1)= P(\e_{j}=1)=1/2,\ j=1,\dots,m$.

645: The overall prior is the product probability on $J_{l}^{m}\times

646: \set{\pm 1}^{m}$.

647: For any realization $\om=(I,\e_{1},\dots,\e_{m})$ we assign

648: $$

649: f_{\om}:= \sum_{j\in I} \e_{j}\chi_{\Omega_{j}}\quad \text{and}\quad

650: \rho_{\om}:= C \sum_{j\in I}\chi_{\Omega_{j}} + \sum_{j\not\in I}\chi_{\Omega_{j}} .

651: $$
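
For illustration, take the hypothetical values $n=2$ and $C=3$, so that $m=4\ge C-1$ and $l=\lceil 4/2\rceil = 2$. For the realization $I=\set{1,3}$ we get $f_{\om}= \e_{1}\chi_{\Omega_{1}}+\e_{3}\chi_{\Omega_{3}}$, while $\rho_{\om}$ equals $3$ on $\Omega_{1}\cup\Omega_{3}$ and $1$ on $\Omega_{2}\cup\Omega_{4}$. Since each $\Omega_{j}$ has measure $1/4$,
$$
\int \rho_{\om}\,d\mu = \frac{1}{4}\lr{2\cdot 3 + 2\cdot 1} = 2,
\qquad
S(f_{\om},\rho_{\om}) = \frac{\frac 3 4 (\e_{1}+\e_{3})}{2} = \frac 3 8 (\e_{1}+\e_{3}).
$$
An algorithm that evaluates $f$ and $\rho$ at only $n=2$ points can hit at most two of the four sets, so it may miss part of $I$ and then has no information about the corresponding signs $\e_{j}$.
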

652: The following observation is useful.

653: \begin{lemma}\label{lem:eml}

654: For any subset $N\subset\set{1,\dots,m}$ of cardinality at most $n$ we have

655: $$

656: \expect_{m,l}\#(I\setminus N)\geq \frac l 2.

657: $$

658: \end{lemma}

659: \begin{proof}

660:   Clearly, for any fixed $k\in\set{1,\dots,m}$ we have

661:   $\mu_{m,l}(k\in I)=l/m$, thus

662: $$

663: \expect_{m,l}\#(I \setminus N) = \sum_{r\in N^{c}} \expect_{m,l}\chi_{I}(r) =

664: \#(N^{c})\frac l m\geq \frac l 2,

665: $$

666: where we denoted by $N^{c}$ the complement of $N$.

667:

668: \end{proof}

669: \begin{proof}[Proof of Theorem~\ref{thm:lbfc}]

670: Given the above prior let us denote

671: \begin{equation}

672:   \label{eq:errmfl}

673:   e^{avg}_{n}(\fco):= \inf_{q}\lr{\expect_{m,l}\expect_{\e}\abs{S(f,\rho) - q(f,\rho)}^{2}}^{1/2},

674: \end{equation}

675: where the $\inf$ is taken with respect to any

676: (possibly adaptive) deterministic algorithm

677: which uses at most $n$ values from $f$ and $\rho$.

678:

679:

680: For any Monte Carlo method $S_n$  we have, using Bahvalov's argument~\cite{Bachvalov}, the relation

681: \begin{equation}

682:   \label{eq:mc2avg}

683:  e(S_{n},\fco) \geq e^{avg}_{n}(\fco).

684: \end{equation}

685: We provide a lower bound for $e^{avg}_{n}(\fco)^{2}$.

686: To this end note that for each realization $(f_{\om},\rho_{\om})$ the

687: integral $\int \rho_{\om} \;d\mu$ is constant.

688: In the first case $m\geq C -1$,

689: and we can bound the integral by the choice of $l$ as

690: \begin{equation}

691:   \label{eq:intrho}

692:   c_{m,l}:= \int \rho_{\om}(x)\; \mu(d x)= \frac 1 m \lr{l C +

693:     (m-l)1} \leq 3.

694: \end{equation}

695: In the other case  $m <  C -1$, we obtain~$c_{m,1}= (C - 1 + m)/m$.

696: Now, to analyze the average case error, let $q_{n}$ be any

697: (deterministic) method, and let us assume that it uses the set $N$ of nodes.

698: We have the decomposition

699: $$

700: S(f_{\om},\rho_{\om}) -

701: q_{n}(f_{\om},\rho_{\om})=  \lr{\frac{C}{m c_{m,l}} \sum_{j\in

702:     I\setminus N} \e_{j}}

703:  - \lr{\frac{C}{m c_{m,l}}

704:  \sum_{j\in I\cap N} \e_{j} - q_{n}(f_{\om},\rho_{\om})}.

705: $$

706: Given $I$,

707: the random variables in the brackets

708: are conditionally independent, thus uncorrelated.

709: Hence we conclude that

710: \begin{align*}

711:   \expect_{m,l}\expect_{\e}\abs{S(f_{\om},\rho_{\om}) -

712: q_{n}(f_{\om},\rho_{\om})}^{2}

713: & \geq \expect_{m,l}\expect_{\e}\abs{\frac{C}{m c_{m,l}} \sum_{j\in

714:     I\setminus N} \e_{j} }^{2}\\

715: & = \frac{C^{2}}{m^{2} c_{m,l}^{2}}\expect_{m,l}\#(I\setminus N)\geq

716: \frac{C^{2} l}{2 m^{2} c_{m,l}^{2}},

717: \end{align*}

718: by Lemma~\ref{lem:eml}.

719: % \begin{equation*}

720: %   \expect_{m,l}\abs{ S(f,\rho) - q_{n}(f,\rho)}^{2}= \frac{1}{\binom m

721: %     l}\sum_{I\in J_{l}^{m}}

722: % \expect_{m,l}\lr{\abs{ \frac{C}{m c_{m,l}}

723: % \sum_{j\in I} f^{m}_{j} - q_{n}(f^{m},\rho^{m})}^{2}/I},

724: % \end{equation*}

725: % where the expectation on the right is the conditional expectation,

726: % i.e., when $I$ is fixed.

727: % This depends on the overlap between $N\subset\set{1,\dots,m}$ the

728: % set of nodes which is used by $q_{n}$ and

729: % $I$ which may vary between $0$ and $l$. Further note that,

730: %  such that we can bound ($k$ being the random

731: % cardinality $\#(I\setminus N)$ )

732: % \begin{align*}

733: %   \expect_{m,l}\abs{ S(f,\rho) - q_{n}(f,\rho)}^{2}

734: % &\geq  \frac{1}{\binom m l}

735: % \sum_{I\in J_{l}^{m}} \expect_{m,l}\lr{\abs{ \frac{C}{m c_{m,l}}

736: % \sum_{j\in

737: %       I\setminus N} f^{m}_{j}}^{2}/I}\\

738: % &=  \frac{C^{2}}{m^{2} c_{m,l}^{2}}

739: % \sum_{k=0}^{l} k P(\# (I\setminus N)=k)\\

740: % &= \frac{C^{2}}{m^{2} c_{m,l}^{2}}  \sum_{k=0}^{l} k

741: % \frac{\binom{n}{l-k}\binom{m-n}{k}}{\binom m l}=

742: % \frac{C^{2}}{m^{2}c_{m,l}^{2}} \frac{(m - n) l}{m},

743: % \end{align*}

744: % where we used  the definition of the binomials

745: % to evaluate the sum on the

746: % right. %  as

747: % $$

748: %  \sum_{k=0}^{l} k

749: % \frac{\binom{n}{l-k}\binom{m-n}{k}}{\binom m l} = \frac{(m - n) l}{m}.

750: % $$

751: % Overall we obtain

752: % \begin{equation}

753: %   \label{eq:finbound}

754: %  \expect_{m,l}\abs{ S(f,\rho) - q_{n}(f,\rho)}^{2} \geq

755: %  \frac{C^{2}}{m^{2}c_{m,l}^{2}} \frac{(m - n) l}{m}= \frac{C^{2} l}{2

756: %    c_{m,l}^{2} m^{2}}.

757: % \end{equation}

758: In the case $m\geq C -1 $ we obtain $l\geq m/C$ and  have

759: $c_{m,l}\leq 3$, such that %we finally obtain

760: $$

761: \expect_{m,l}\abs{ S(f,\rho) - q_{n}(f,\rho)}^{2} \geq \frac{C}{36 n},

762: $$

763: which in turn yields the first case bound in~(\ref{eq:2nc}).

764: In the other case~$m <  C -1$ the value of $l=1$ yields the second

765: bound in~(\ref{eq:2nc}).

766: \end{proof}

767: \subsection{The error of the simple Monte Carlo method}

768: \label{sec:simple}

769:

770: The direct approach to evaluate~(\ref{eq:base}) would be to use the

771: method~$\vtn$ from~(\ref{eq:vtn}).

772: We will prove an upper bound for the error of this method, and

773: we start with the following

774: \begin{lemma}\label{lem:rho}

775:   If the function $\rho$ obeys the requirements in~$\fco$, then

776:   \begin{enumerate}

777:   \item $0< \inf_{x\in\Omega}\rho(x)\leq

778:     \sup_{x\in\Omega}\rho(x)<\infty$.

779: \item For every probability measure $\mu$ on $\Omega$ we have

780: $\norm{\rho}{2,\mu}\leq \sqrt C\norm{\rho}{1,\mu} $.

781:   \end{enumerate}

782: \end{lemma}

783: \begin{proof}

784:   To prove the first assertion, fix any $y_{0}\in\Omega$. Then the

785:   assumption on $\rho$ yields $\rho(x)\leq C \rho(y_{0})$, and

786:   reversing the roles of $x$ and $y_{0}$ yields the lower bound as well.

787: Now both, the assumption on $\rho$ as well as the

788: second assertion,  are invariant with respect to multiplication

789: of $\rho$ by a constant. In the light of the first assertion we may

790: and do assume that $1\leq\rho(x)\leq C,\ x\in\Omega$, and  we derive,

791: using $ 1 \leq \int_{\Omega}\rho(x)\; \mu(dx)$, that

792: $$

793: \int_{\Omega}\rho^{2}(x)\; \mu(dx)\leq C \int_{\Omega}\rho(x)\;

794: \mu(dx) \leq C \lr{\int_{\Omega}\rho(x)\; \mu(dx)}^{2},

795: $$

796: completing the proof of the second assertion and of the lemma.

797: \end{proof}

798: We turn to the bound for the simple Monte Carlo method.

799: \begin{theorem}

800: For all $n\in\N$ we have

801: \begin{equation}

802:     \label{eq:thm1}

803: e(\vtn,\fco)\leq 2\, \min\set{1,  \sqrt{\frac{2C}{n}}} .

804:   \end{equation}

805: \end{theorem}

806: \begin{proof}

807: The upper bound~$2$ is trivial; it even holds deterministically.

808:   Fix any pair $(f,\rho)$ of input. For any sample

809:   $\lr{X_1,\dots,X_n}$ and function $g$ we denote the sample

810: mean by $\vt(g):= \frac 1 n\sum_{j=1}^n g(X_j)$.

811: It is well known that $e(\vt,g)\leq \norm{g}{2}/\sqrt n$.

812: With this notation we can

813: bound

814:   \begin{align*}

815:     &\abs{S(f,\rho) - \vtn(f,\rho)}\leq \abs{S(f,\rho) -

816:       \frac{\vt(f\rho)}{\int \rho(x)\mu(dx)}}+

817: \abs{\frac{\vt(f\rho)}{\int \rho(x)\mu(dx)} -

818:   \frac{\vt(f\rho)}{\vt(\rho)}}\\

819: &\leq \frac{1}{\norm{\rho}{1}}\lr{\abs{\int

820:   f(x)\rho(x)\mu(dx)-\vt(f \rho) }

821: + \abs{\frac{\vt(f\rho)}{\vt(\rho)}}

822: \abs{\int \rho(x)\mu(dx) - \vt(\rho)}}\\

823: &\leq  \frac{1}{\norm{\rho}{1}}\lr{\abs{\int

824:   f(x)\rho(x)\mu(dx)-\vt(f \rho) }

825: +\norm{f}{\infty}

826: \abs{\int \rho(x)\mu(dx) - \vt(\rho)}},

827:   \end{align*}

828: where we used

829: $

830: \abs{\vt(f\rho)/{\vt(\rho)}}\leq \norm{f}{\infty},

831: $

832: which holds true since the numerator and

833: denominator use the same sample.

834: This yields the following error bound

835: \begin{align*}

836:   e(\vtn,(f,\rho))&\leq   \frac{\sqrt 2}{\norm{\rho}{1}}

837: \lr{ e(\vt,f\rho) + \norm{f}{\infty}e(\vt,\rho)}\\

838: &\leq \frac{\sqrt 2}{\norm{\rho}{1}\sqrt n}\lr{\norm{f\rho}{2} +

839:   \norm{f}{\infty}\norm{\rho}{2}}\leq \frac{2\sqrt 2 \norm{f}{\infty}}{\sqrt

840:   n}\frac{\norm{\rho}{2}}{\norm{\rho}{1}}

841:   \leq \frac{2\sqrt{2C}}{\sqrt  n},

842: \end{align*}

843: where we used Lemma~\ref{lem:rho}. Taking  the supremum over $(f,\rho)\in\fco$

844: completes the proof.

845: \end{proof}
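For the reader's convenience we sketch where the factor $\sqrt 2$ in the first line of the last display may come from. We assume here, as the bound $e(\vt,g)\leq \norm{g}{2}/\sqrt n$ suggests, that $e$ denotes the root mean square error; the symbols $a$ and $b$ are introduced only for this sketch. With
$$
a:= \abs{\int f(x)\rho(x)\mu(dx)-\vt(f \rho)},\qquad
b:= \norm{f}{\infty}\abs{\int \rho(x)\mu(dx) - \vt(\rho)},
$$
the pointwise estimate from the proof reads
$\abs{S(f,\rho) - \vtn(f,\rho)}\leq (a+b)/\norm{\rho}{1}$, and the
elementary inequality $(a+b)^{2}\leq 2a^{2}+2b^{2}$ gives
$$
\lr{\expect (a+b)^{2}}^{1/2}\leq \sqrt 2\lr{\expect a^{2} + \expect b^{2}}^{1/2}
\leq \sqrt 2\lr{ \lr{\expect a^{2}}^{1/2} + \lr{\expect b^{2}}^{1/2}}.
$$
Under the stated assumption the two terms on the right are $e(\vt,f\rho)$ and $\norm{f}{\infty}e(\vt,\rho)$, respectively. The second line of the display in the proof then uses $\norm{f\rho}{2}\leq \norm{f}{\infty}\norm{\rho}{2}$.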

846:

847: \section{Analysis for $\fad(\Omega)$} \label{s3}

848:

849: In this section we impose restrictions on  the input data, in

850: particular on the density,  in order to improve the complexity. This

851: class is still large enough to contain many important situations.

852: Monte Carlo methods for problems in which the target (invariant)

853: distribution is log-concave have proved to be important in many studies; we

854: refer to~\cite{MR1284987}. One of the main intrinsic features of such

855: classes of distributions is the validity of \emph{isoperimetric inequalities},

856: see~\cite{103439,MR1318794}, which will also be used here in the form

857: as used in~\cite{MR2178341}.

858: Recall that here we always require that $\Omega\subset \R^{d}$ is a

859: convex body, as introduced in Section~\ref{sec:classfad}.

860:

861: % We always assume the following.

862: % The functions $f$ and $\rho$ are defined on

863: % a compact and convex set $\Omega \subset \R^d$

864: % with nonempty interior and

865: % $\mu=\muo$ is the normalized Lebesgue measure

866: % on the set~$\Omega$.

867: % We  consider the class of log-concave weights on

868: % $\Omega$  given by

869: % $$

870: % \rad(\Omega)  = \{  \rho \mid

871: % \rho >0, \

872: % \log\rho \text{ is concave}, \

873: % |\log\rho(x) - \log\rho(y) | \leq \alpha \Vert x-y \Vert_2 \} .

874: % $$

875:

876: % We study the class $\fad(\Omega)$, given by

877: % $$

878: % \fad (\Omega)  = \set{(f, \rho) \mid

879: % \rho \in\rad( \Omega),  \ \norm{f}{2,\rho}\le 1 } ,

880: % $$

881: % where $\Vert \cdot \Vert_{2,\rho}$ is the

882: % $L_2$-norm with respect to the probability measure $\mur$,

883: % see~\eqref{mur}.

884: % In particular, we study the (Euclidean) unit ball, i.e.,

885: % $\Omega:= \ball \subset \R^d$.

886:

887: We start with a lower bound for all non-adaptive algorithms to exhibit

888: that simple Monte Carlo cannot take into account the additional

889: structure of the underlying class of input data and adaptive methods

890: should be used. This bound, together with Theorem~\ref{th5}, will show

891: that adaptive methods can outperform any

892: non-adaptive method, if we consider $S$ on $\fad (\ball)$.

893: Indeed, we also show that specific Metropolis

894: algorithms, based on local underlying Markov chains are suited for

895: this problem class.

896:

897: \subsection{A lower bound for non-adaptive methods}

898: \label{sec:non}

899:

900: Here we prove a lower bound for all non-adaptive methods

901: (hence in particular for the simple Monte Carlo method)

902: for the problem on the classes~$\fad(\Omega)$.

903: Again, this lower bound will use Bahvalov's technique.

904:

905: We start with a result on sphere packings.

906: The Minkowski-Hlawka theorem,  see~\cite{MR0172183},

907: says that the density of the densest sphere packing in $\R^d$

908: is at least $\zeta (d) \cdot 2^{1-d}\ge 2^{1-d}$.

909: It is also known, see \cite{Hlawka}, that the density

910: (by definition of the whole $\R^d$) can be replaced by the density within

911: a convex body $\Omega$, as long as the radius $r$ of the

912: spheres tends to zero. Hence we obtain the following result.

913:

914: \begin{lemma}

915: \label{lem:MHT}

916: There is $n_{\Omega}\in\N$ such that for all $m\geq n_{\Omega}$ there are points

917: $y_{1},\dots,y_{m}\in\Omega$ such that with

918: $$

919: r:=r(\Omega,m):= 2^{-1} m^{-1/d} \left( \frac{\vol (\Omega)}

920: {\vol (\ball)}\right)^{1/d}

921: $$

922: the closed balls $B_{i}:= B(y_{i},r)\subset \Omega$

923: are disjoint.

924: \end{lemma}

925:

926: Our construction will use such points $y_{1},\dots,y_{m}\in\Omega$ and

927: the corresponding balls $B_{1},\dots,B_{m}$ as follows.

928:

929: For $i\in\set{1,\dots,m}$ we assign

930: \begin{align*}

931:  \rho_{i}(y)&:= c_i \exp\lr{-\alpha\norm{y - y_{i}}{2}},\quad

932:  y\in\Omega  \quad\text{and}\\

933: f_{i}(y)&:= \tilde c_i  \chi_{B_{i}}(y),\quad y\in\Omega ,

934: \end{align*}

935: with constants $c_i$ and $\tilde c_i$ chosen such that

936: \begin{alignat*}{2}

937: 1&= \int_{\Omega } \rho_i(y) \, dy &=

938: c_i \int_{\Omega } \exp(- \a \norm{y - y_i}{}) dy\quad \text{and}\\

939: 1&=\norm{f_i}{2,\rho_i}^{2} &= \tilde c_i^2 c_i \int_{B_i} \exp(- \a

940: \norm{y - y_i}{})\, dy.

941: \end{alignat*}

942: The corresponding values of the mapping $S$ are computed as

943: \begin{align}\label{eq:slb}

944:   \begin{split}

945: S(f_i,\rho_i) &= \int_{\Omega } f_i \rho_i\, dy = \tilde c_i c_i

946: \int_{B_i} \exp(- \a \norm{y - y_i}{})\, dy\\

947: & = \lr{ c_i \int_{B_i} \exp(- \a \norm{y - y_i}{}) dy}^{1/2}=

948:  \lr{ c_i \int_{B(0,r)} \exp(- \a \norm{y}{}) \, dy}^{1/2}\\

949: &= \lr{\frac{\int_{B(0,r)} \exp(- \a \norm{y}{}) \, dy}{\int_{\Omega}

950: \exp(- \a \norm{y - y_i}{})\,dy}}^{1/2}.

951:   \end{split}

952: \end{align}

953: Again we turn to the average case setting, this time with

954:  probability measure $\mu^{2n}$ being the equidistribution on the set

955: $$

956: \mathcal F^{2n}:= \set{ \lr{\e_i f_i,\rho_i},\quad i=1,\dots,2n,\

957:   \e_i=\pm 1}\subset \fad(\Omega ).

958: $$

959: Similar to~(\ref{eq:mc2avg}) we have for any non-adaptive Monte Carlo

960: method $S_n(f,\rho)$ the relation

961: $$

962: e(S_n,\fad(\Omega ))\geq

963: \min\set{ e^{avg}(q_n,\mu^{2n}),\quad q_n \text{ is

964: deterministic and non-adaptive}},

965: $$

966: where $e^{avg}(q_n,\mu^{2n})$ denotes the average case error of the

967: deterministic non-adaptive method $q_n$ with respect to the

968: probability $\mu^{2n}$.

969: Thus let~ $q_n$ be any non-adaptive

970: (deterministic) algorithm for $S$ on the

971: class $\fad (\Omega )$ that uses at most $n$ values.

972:

973: The average case error can then be bounded from below as

974: \begin{align*}

975: \expect_{\mu^{2n}}\abs{S(f,\rho) - q_n(f,\rho)}^2&=

976: \frac{1}{2n}\sum_{i=1}^{2n}

977: \expect_{\e}\abs{S(\e_i f_i,\rho_i) - q_n(\e_i

978:   f_i,\rho_i) }^2\\

979: &\geq \frac 1 2 \min_{i=1,\dots,2n}\expect_{\e}\abs{S(\e_i f_i,\rho_i)

980: }^2 \geq  \frac 1 2 \min_{i=1,\dots,2n}S(f_i,\rho_i)^2.

981: \end{align*}

982: Above, $\expect_{\e}$ denotes the expectation with respect to the

983: independent random variables $\e_{i}=\pm 1$.

984: Together with~(\ref{eq:slb}) we obtain

985: $$

986: e(S_n,\fad(\Omega))\geq \frac 1 2 \sqrt 2\,

987: \min_{i=1,\dots,2n}\lr{\frac{\int_{B(0,r)} \exp(- \a \norm{y}{}) \,

988:     dy}{\int_{\Omega} \exp(- \a \norm{y - y_i}{})\,dy}}^{1/2}.

989: $$

990: We bound the numerator from below and the denominator from

991: above.

992: For $\alpha r\leq \log 2$ we can bound

993: $$

994: \int_{B(0,r)} \exp(- \a \norm{y}{}) \, dy\geq \frac 1 2

995:   \vol(B(0,r))= \frac 1 2 r^d \vol(\ball).

996: $$
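The factor $\frac 1 2$ can be checked directly (a one-line verification, added here only as an illustration): for $y\in B(0,r)$ and $\alpha r\leq \log 2$ the integrand satisfies
$$
\exp(-\alpha \norm{y}{})\geq \exp(-\alpha r)\geq \exp(-\log 2) = \frac 1 2,
$$
and integrating this pointwise bound over $B(0,r)$ yields the estimate above.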

997: For the denominator we have  %  \fix{alpha gross klein}

998: %   letting temporarily $\bar\a:=

999: %   \max\set{\a,1}$, that

1000: \begin{align*}

1001: \int_{\Omega} \exp(- \a \norm{y - y_i}{})\,dy &\leq \int_{\R^d} \exp(-

1002: \a \norm{y - y_i}{})\,dy \\

1003: & ={\a}^{-d} \int_{\R^d} \exp(-\norm{y}{})\,dy=

1004: {\a}^{-d} \Gamma(d)\vol(\partial \ball),

1005: \end{align*}

1006: such that we finally obtain, using the well known formula

1007: $\vol(\partial \ball) = d \vol(\ball)$, that

1008: $$

1009: e(S_n,\fad(\Omega))\geq  \frac 1 2 \sqrt 2\, \lr{\frac{{\a}^d

1010:     r^d}{2 d!}}^{1/2} = \frac 1 2 \lr{\frac{{\a}^d

1011:     r^d}{d!}}^{1/2}.

1012: $$
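The two ingredients of the last display can be made explicit as follows; this is only a sketch of the computation, where $\vol(\partial\ball)$ is understood as the $(d-1)$-dimensional surface measure of the unit sphere. Passing to polar coordinates we find
$$
\int_{\R^d} \exp(-\norm{y}{})\,dy = \vol(\partial \ball)\int_{0}^{\infty} s^{d-1} e^{-s}\,ds
= \Gamma(d)\vol(\partial\ball) = (d-1)!\, d \vol(\ball) = d!\vol(\ball),
$$
so that the quotient of the lower bound for the numerator and the upper bound for the denominator becomes
$$
\frac{\frac 1 2 r^{d}\vol(\ball)}{\alpha^{-d}\, d!\vol(\ball)} = \frac{\alpha^{d} r^{d}}{2\, d!}.
$$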

1013: Using the value for $r=r(\Omega ,2n)$ from Lemma~\ref{lem:MHT} we end up

1014: with

1015: \begin{theorem}

1016: Assume that $S_n$ is any non-adaptive Monte Carlo method for

1017: the class $\fad (\Omega )$. Then, with~$ n_\Omega $ from Lemma~\ref{lem:MHT},

1018: we have for all

1019: $$

1020: 2n \ge \max\set{n_\Omega ,\lr{\a/{\log 4}}^d \cdot

1021: \frac{\vol \Omega}{\vol \ball}}

1022: $$

1023: that

1024: \begin{equation}  \label{lo9}

1025: e(S_n, \fad(\Omega )) \ge

1026: 2^{-d/2-3/2} \cdot

1027: \left( \frac{\vol \Omega}{\vol \ball} \right)^{1/2} \cdot

1028: \frac{\alpha^{d/2}}{\sqrt{d!}} \ n^{-1/2} .

1029: \end{equation}

1030: \end{theorem}
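For completeness we sketch how the constants in~(\ref{lo9}) arise from the bound preceding the theorem. With $r=r(\Omega,2n)$ from Lemma~\ref{lem:MHT} we have
$$
r^{d} = 2^{-d}\,(2n)^{-1}\,\frac{\vol (\Omega)}{\vol (\ball)},
\qquad\text{hence}\qquad
\frac 1 2 \lr{\frac{\alpha^{d} r^{d}}{d!}}^{1/2}
= 2^{-d/2-3/2} \lr{\frac{\vol (\Omega)}{\vol (\ball)}}^{1/2}
\frac{\alpha^{d/2}}{\sqrt{d!}}\, n^{-1/2}.
$$
Similarly, the requirement $\alpha r\leq \log 2$ used for the numerator is equivalent to
$$
(2n)^{1/d}\geq \frac{\alpha}{2\log 2}\lr{\frac{\vol (\Omega)}{\vol (\ball)}}^{1/d},
\qquad\text{i.e.,}\qquad
2n\geq \lr{\frac{\alpha}{\log 4}}^{d}\frac{\vol (\Omega)}{\vol (\ball)},
$$
which is the second term in the maximum above.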

1031:

1032: \begin{rem}

1033: For fixed $d$ this is a lower bound of the form

1034: $e(S_n) \ge c_\Omega \, \a^{d/2} \, n^{-1/2}$. It is interesting only

1035: if $\alpha$ is ``large'', otherwise the already mentioned lower bound

1036: $(1+ \sqrt{n})^{-1}$ is better.

1037:

1038: We stress that in the above reasoning we essentially used the

1039: non-adaptivity of the method $S_n$. Indeed, if $S_n$ were adaptive,

1040: then by just one appropriate function

1041: value $\rho(x)$,  we could identify the

1042: index $i$, since the functions $\rho_i$ are

1043: global. Then, knowing $i$,  we could ask for the value of $\e_i$ and

1044: would obtain the exact solution to $S(f,\rho)$ for this small class

1045: $\mathcal F^{2n}$ for all $n \ge 2$.

1046: \end{rem}

1047:

1048: \subsection{Metropolis method with local underlying walk}

1049: \label{sec:metro-loc}

1050:

1051: The Metropolis algorithm we consider here has a specific

1052: routine~\KwDet in Figure~\ref{fig:gene}, whereas the

1053: final step~\KwAvg is exactly as given in~(\ref{eq:met}). It is based on a

1054: specific ball walk and this version is

1055: sometimes called \emph{ball walk with

1056: Metropolis filter}, see~\cite{MR2178341}.

1057: Two concepts from the theory of Markov chains turn out to be

1058: important, reversibility and uniform ergodicity. We recall these

1059: notions briefly, see~\cite{MR1399158} for further details.

1060: A Markov chain  $(K,\pi)$ is \emph{reversible with respect to $\pi$},

1061: if for all measurable subsets $A,B\subset\Omega$ the balance

1062: \begin{equation}\label{eq-rev}

1063: \int_{A}K(x,B)\pi(dx)=\int_{B}K(x,A)\pi(dx)

1064: \end{equation}

1065: holds true. Notice that in this case necessarily $\pi$ is an invariant

1066: distribution.

1067:

1068: A Markov chain is \emph{uniformly ergodic} if there are $n_{0}\in\N$, a

1069: constant $c>0$ and a probability measure $\nu$ on $\Omega$ such that

1070:   \begin{equation}

1071:     \label{eq:ueball}

1072: K^{n_{0}}(x,A) \geq c \nu(A),

1073: \quad \text{ for all } A\subset \Omega\text{ and } x\in\Omega.

1074:   \end{equation}

1075: Markov chains which are  uniformly ergodic have a unique invariant

1076: probability distribution.

1077:

1078: Our analysis will be based on conductance arguments and we

1079: recall the basic notions, see~\cite{MR1025467,MR1238906}.

1080: If $(K,\pi)$ is a Markov chain with transition kernel $K$ and

1081: invariant distribution $\pi$ then we assign the

1082: \begin{enumerate}

1083: \item

1084: \emph{local conductance} at $x\in\Omega$ by $l_K(x):=

1085:   K(x,\Omega\setminus\set{x})$,

1086: \item and the \emph{conductance} as

1087: \begin{equation}

1088:   \label{eq:conductance}

1089:   \phi(K,\pi):= \inf_{0<\pi(A) <  1}\frac{\int_A K(x,A^c)

1090:     \pi(dx)}{\min\set{\pi(A),\pi(A^c)}},

1091: \end{equation}

1092: where $A^c= \Omega \setminus A$.

1093: \end{enumerate}

1094: Below we call $l>0$ a \emph{lower bound for the local conductance}, if

1095: $l_{K}(x)\geq l$ for all $x\in\Omega$.

1096:

1097: \subsubsection*{The ball walk and some of its properties}

1098: \label{sec:ball}

1099:

1100: Here we gather some properties of the ball walk,

1101: see~\cite{MR1238906,MR2178341},  which will serve as

1102: ingredients for the analysis of Metropolis chains using this as the

1103: underlying proposal.

1104: In particular we prove that on convex bodies in $\R^{d}$ the ball walk is

1105: uniformly ergodic and we bound its conductance from below, in terms

1106: of bounds $l>0$ for the local conductance.

1107:

1108: We abbreviate $B(0,\delta) = \delta \ball$.

1109: Let $Q_\delta$ be the transition

1110: kernel of a local random walk

1111: having transitions within $\delta$-balls of its current position,

1112: i.e., we let

1113: \begin{equation}

1114: \label{eq:pxx}

1115: Q_{\delta}(x,\set{x}):= 1 - \frac{\vol(B(x,\delta)

1116: \cap \Omega)}{\vol(\delta \ball )},

1117: \end{equation}

1118: and

1119: \begin{equation}

1120: \label{eq:qloc}

1121: Q_\delta(x,A):=

1122: \begin{cases}

1123: \displaystyle{\frac{\vol(B(x,\delta) \cap A)}{\vol(\delta \ball )}},

1124: &   A

1125: \subset \Omega \text{ and }x \notin A, \\

1126: Q_\delta(x,A\setminus\set{x}) +   Q_{\delta}(x,\set{x}), &

1127: A \subset \Omega \text{ and } x\in A.

1128: \end{cases}

1129: \end{equation}

1130: Schematically, the transition kernel may be viewed

1131: as in Figure~\ref{fig:bbb}.

1132:

1133: \SetKw{KwProp}{Propose:}

1134: \SetKw{KwAcc}{Accept:}

1135: \SetKwInOut{Input}{Input}

1136: \SetKwInOut{Output}{Output}

1137: \restylealgo{ruled}

1138: \begin{figure}[h]

1139:   \centering

1140: \begin{procedure}[H]

1141: \Input{current position $x$; $\delta>0$\;}

1142: \Output{next position\;}

1143: \KwProp{Choose $y\in B(x,\delta)$ uniformly}\;

1144: \KwAcc{}

1145: \eIf{$y\in\Omega$}{\Return{$y$}\;}{\Return{$x$}\;}

1146:   \caption{Ball-walk-step($x,\delta$)}

1147: \end{procedure}

1148:   \caption{Schematic view of ball walk step}

1149:   \label{fig:bbb}

1150: \end{figure}

1151: Clearly we may restrict to $\delta\leq D$, the diameter of $\Omega$.

1152: The following observation is important and explains why we restrict

1153: ourselves to convex bodies.

1154: \begin{lemma}

1155:  If $\Omega\subset \R^{d}$ is a convex body, then the ball walk

1156:  $Q_{\delta}$ has a (non-trivial) lower bound $l>0$ for the local conductance.

1157: \end{lemma}

1158: \begin{proof}

1159: It is well-known that convex bodies satisfy the cone condition

1160: (see % Lemma 3 of Section 3.2 in

1161: \cite[\S~3.2, Lemma~3]{Burenkov}).

1162: Therefore we obtain that for each $\delta>0$ there is $l>0$ such that

1163: for each $x \in \Omega$ we have $l_{Q_\delta} (x) \ge l$.

1164: % $$

1165: % \exists \ {l > 0} \quad

1166: % \exists \ {\delta_0>0} \quad

1167: % \forall \ {0<\delta < \delta_0} \quad

1168: % \forall \ {x \in \Omega} \quad

1169: % l_{Q_\delta} (x) \ge l .

1170: % $$

1171: \end{proof}

1172: \begin{rem}

1173: Observe however, that $l$ might be very small.

1174: For $\Omega=[0,1]^d$, for example, we get $l = 2^{-d}$,

1175: even if $\delta$ is very small. In contrast, we will

1176: see that a large $l$ is possible for $\Omega=B^d$

1177: and $\delta \le 1/\sqrt{d+1}$, see Lemma~\ref{lem:l-bound}.

1178: \end{rem}

1179: Notice that $l_{Q_{\delta}}(x)= {\vol(B(x,\delta) \cap

1180: \Omega)}/{\vol(\delta \ball )}$, hence in the following we use the inequality

1181: \begin{equation}

1182: \label{eq:l-bound}

1183: \vol(B(x,\delta)\cap \Omega)\geq l \vol(\delta\ball),

1184: \end{equation}

1185: where $l>0$ is a lower bound for the local conductance

1186: of the ball walk.
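As an illustration of~(\ref{eq:l-bound}), consider the cube $\Omega=[0,1]^{d}$ and the corner $x=0$ (a worked example, assuming $\delta\leq 1$). Then $B(0,\delta)\cap\Omega = B(0,\delta)\cap [0,\infty)^{d}$ is one of the $2^{d}$ congruent pieces into which the coordinate hyperplanes divide $B(0,\delta)$, such that
$$
l_{Q_{\delta}}(0) = \frac{\vol(B(0,\delta)\cap \Omega)}{\vol(\delta\ball)} = 2^{-d}.
$$
Hence no lower bound $l$ for the local conductance on the cube can exceed $2^{-d}$, regardless of how small $\delta$ is.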

1187:

1188: The following result  is~\emph{folklore}, but for

1189: lack of a reference we sketch a proof.

1190:

1191: \begin{proposition}\label{prop:ueqd}

1192: %  Let $\Omega\subset \R^{d}$ be compact.

1193: The ball walk $Q_{\delta}$ is reversible with respect to the uniform

1194: distribution $\muo$ and

1195: %  If there is a lower bound $l>0$  for the local conductance

1196: %  of $Q_{\delta/2}$ then the ball walk $Q_{\delta}$ is

1197: uniformly ergodic.

1198: %   For each $0<\delta\leq 1/\sqrt{d+1}$  the ball

1199: %   walk $Q_{\delta}$  is uniformly

1200: %   ergodic and reversible. In particular there are $n_{0}\in\N$, a

1201: %   constant $c>0$ and a probability measure $\nu$ on $\ball$ such that

1202: %   \begin{equation}

1203: % %    \label{eq:ueball}

1204: % Q_{\delta}^{n_{0}}(x,A) \geq c \nu(A),

1205: % \quad \text{ for all } A\subset \ball\text{ and } x\in\ball.

1206: %   \end{equation}

1207: \end{proposition}

1208:

1209:  The crucial tool for proving this is provided by the

1210: notion of small and petite sets, where we refer to~\cite[Sect.~5.2 \&

1211: 5.5]{Meyn-book} for details and properties.

1212: To this end we introduce a \emph{sampled} chain, say

1213: $(Q_{\delta})_{a}$, where $a$ is some probability

1214: $a=\lr{a_{0},a_{1},\dots}$ on $\set{0,1,2,\dots}$

1215: and $(Q_{\delta})_{a}$ is defined by $(Q_{\delta})_{a}(x,C):=

1216: \sum_{j=0}^{\infty}a_{j}Q_{\delta}^{j}(x,C)$.

1217: %A set $C\subset \Omega$ is \emph{petite},

1218: We recall that a

1219: (measurable) subset $C\subset \Omega$ is \emph{petite} (for

1220: $Q_{\delta}$), if there are a probability~$a$, a constant $\varepsilon>0$

1221:  and a probability measure $\nu$ on

1222: $\Omega$ such that

1223: \begin{equation}

1224: \label{eq:small}

1225: (Q_{\delta})_{a}(y,A)\geq \varepsilon \nu(A),

1226: \quad A\subset \Omega,\ y \in C.

1227: \end{equation}

1228: A set $C\subset \Omega$ is \emph{small}, if the same property holds

1229: true for some Dirac probability $a:= \delta_{n}$, such that obviously

1230: small sets are petite.

1231: We first show that certain balls are small.

1232:

1233: \begin{lemma}\label{lem:small}

1234: % Let $\delta>0$ and let $l >0$ be a lower bound for

1235: % the local conductance

1236: % of the ball walk $Q_{\delta/2}$.

1237: The sets $ B(x,\delta/2)\cap

1238: % If there is a is

1239: % a lower bound $l>0$ for the local conductance

1240: % of the ball walk $Q_{\delta/2}$ then the sets $ B(x,\delta/2)\cap

1241: \Omega,\ x\in\Omega$ are small for $Q_\delta$.

1242: %  Let $\delta\leq 1/\sqrt{d+1}$ and $x\in\ball$. If $y\in

1243: %   B(x,\delta/2)\cap \ball$ then

1244: %   \begin{equation}

1245: %     \label{eq:smlemma}

1246: %     Q_{\delta}(y,A) \geq 0.3 \cdot 2^{-d} \frac{\vol(A \cap

1247: %       B(x,\delta/2)\cap \ball)}{\vol( B(x,\delta/2)\cap \ball)},\quad

1248: %     A\subset \ball.

1249: % \end{equation}

1250: %Consequently,  each set $B(x,\delta/2)\cap \ball$  is small.

1251: \end{lemma}

1252:

1253: \begin{proof}

1254: First, we note that $y\in B(x,\delta/2)$ implies $B(x,\delta/2) \subset

1255: B(y,\delta)$. Let $l>0$ be a lower bound for the local conductance of

1256: $Q_{\delta/2}$. Using~(\ref{eq:l-bound})

1257: for $Q_{\delta/2}$, we obtain for any set $A\subset \Omega$ that

1258: \begin{align*}

1259:   Q_{\delta}(y,A) &\geq  Q_{\delta}(y,A\setminus\set{y}) =

1260:   \frac{\vol(B(y,\delta)\cap A)}{\vol(B(y,\delta))} \geq 2^{-d}

1261:   \frac{\vol(B(x,\delta/2)\cap A)}{\vol(\delta/2\ball)}\\

1262: &\geq l \cdot 2^{-d} \frac{\vol(A \cap

1263:       B(x,\delta/2)\cap \Omega)}{\vol( B(x,\delta/2)\cap \Omega)}.

1264: \end{align*}

1265: Hence estimate~(\ref{eq:small}) holds true with $n_{0}:=1,\

1266: \varepsilon:= l\cdot 2^{-d}$ and

1267: $$

1268: \nu(A) := \frac{\vol(A \cap

1269:       B(x,\delta/2)\cap \Omega)}{\vol( B(x,\delta/2)\cap \Omega)},\quad

1270:     A\subset \Omega.

1271: $$

1272: This completes the proof.

1273: \end{proof}

1274: \begin{proof}[Proof of Proposition~\ref{prop:ueqd}]

1275: We first prove reversibility with respect to $\muo$.

1276: Notice that it is enough to verify~(\ref{eq-rev})

1277: for disjoint sets $A,B\subset \Omega$.

1278: Furthermore we observe that for any pair $A,B\subset \Omega$

1279: of measurable subsets the characteristic function of the set

1280: $$

1281: \set{(x,y)\in\Omega\times \Omega,\quad x\in A,\ y\in B,\ \norm{x -

1282:     y}{}\leq \delta}

1283: $$

1284: can equivalently be rewritten as

1285: $$

1286: \chi_{B}(y) \chi_{B(y,\delta)\cap A}(x)

1287: \quad \text{or} \quad \chi_{A}(x) \chi_{B(x,\delta)\cap B}(y).

1288: $$

1289: Hence, letting temporarily

1290: $c:={\vol(\Omega)\vol(\delta\ball)}$ we obtain

1291: \begin{align*}

1292:   \int_{A}Q_{\delta}(x,B)\;\muo(dx)&=

1293:  \frac 1 c \int_{A}\vol(B(x,\delta)\cap

1294:   B)\; dx\\

1295: &=  \frac 1 c \int_{\Omega}\int_{\Omega}\chi_{A}(x)

1296: \chi_{B(x,\delta)\cap B}(y)\;

1297: dy\;dx\\

1298: &=  \frac 1 c \int_{\Omega}\int_{\Omega}\chi_{B}(y)

1299: \chi_{B(y,\delta)\cap A}(x)\;

1300: dx\;dy= \int_{B}Q_{\delta}(y,A)\;\muo(dy),

1301: \end{align*}

1302: proving reversibility.

1303:

1304: By Lemma~\ref{lem:small} each set $B(x,\delta/2) \cap \Omega$ is small,

1305: thus also petite. Petiteness is in\-heri\-ted by taking finite

1306: unions. Since $\Omega$, being compact, can be covered by finitely many

1307: sets  $B(x,\delta/2)\cap \Omega$, this implies that $\Omega$ is

1308: petite. By~\cite[Thm.~16.2.2]{Meyn-book} this yields uniform

1309: ergodicity of the ball walk % , and hence that $\Omega$ is small

1310: (see~\cite[Thm.~16.0.2(v)]{Meyn-book}).

1311: \end{proof}

1312: We mention the following conductance bound  of the ball

1313: walk, which is  a slight improvement

1314: of~\cite[Thm.~5.2]{MR2178341}. This will be  a special case of

1315: Theorem~\ref{thm:met-cond}, below, and we omit the proof.

1316:

1317: \begin{proposition}\label{pro:phi}

1318: Let $(Q_{\delta},\muo)$ be the ball walk from above,

1319: and let $\phi(Q_{\delta},\muo)$ be its conductance.

1320: Let~$D$ be the diameter of $\Omega$  and

1321: let $l$ be a lower bound for the local conductance. Then

1322: \begin{equation}

1323: \label{eq:ballconductancelb}

1324: \phi(Q_{\delta},\muo) \geq

1325: \sqrt{\frac \pi 2}\frac{l^{2}\delta}{8 D \sqrt{d +1}}.

1326: %% \frac{l^{2}\delta}{16 D \sqrt d}.

1327: \end{equation}

1328: \end{proposition}

1329:

1330: The local conductance may be arbitrarily small if the domain $\Omega$

1331: has sharp corners.

1332: For specific sets $\Omega$ we can explicitly provide lower bounds for

1333: the local conductance, and this will be used in the later convergence

1334: analysis.

1335: In the following we mainly discuss the case $\Omega = \ball$.

1336:

1337: We start with a  technical result, related to the Gamma function on

1338: $\R^+$. We use the well-known formula

1339: \begin{equation}

1340:   \label{eq:3}

1341: \vol(\ball)= \pi^{d/2}/\Gamma(d/2 +1).

1342: \end{equation}

1343: \begin{lemma}\label{lem:bou}

1344: For any $z>0$ we have

1345: \begin{equation}

1346:   \label{eq:gamma}

1347:   \frac{\Gamma(z+1/2)}{\Gamma(z)}\leq \sqrt z.

1348: \end{equation}

1349: Consequently,

1350: \begin{equation}

1351:   \label{eq:vol-bound}

1352: \frac{ \vol(B^{d-1})}{\vol(\ball)}

1353: %% \frac{\Gamma(d/2 +1 )}{\delta\sqrt\pi\Gamma((d+1)/2) }

1354: \leq \sqrt{\frac{d+1}{2\pi}}.

1355: \end{equation}

1356: \end{lemma}

1357: \begin{proof}

1358:   By~\cite[Chapt.~VII, Eq.~(11)]{MR2013000} we know that the function

1359:   $z\mapsto \log\Gamma(z)$ is convex for $z>0$. Thus we conclude

1360:   \begin{align*}

1361:     \log\Gamma(z + 1/2)

1362: &\leq \frac 1 2 \lr{\log\Gamma(z+1) + \log\Gamma(z)}\\

1363: & = \frac 1 2 \lr{\log z  + 2 \log\Gamma(z)}

1364: = \log\sqrt z + \log\Gamma(z),

1365:   \end{align*}

1366: from which the proof of assertion~(\ref{eq:gamma}) can be completed.

1367: Using the representation for the volume from~(\ref{eq:3}) and applying

1368: the above  bound with $z:= (d+1)/2$ we obtain

1369: $$

1370: \frac{ \vol(B^{d-1})}{\vol(\ball)}\leq

1371: \frac{\Gamma(d/2 +1 )}{\sqrt\pi\Gamma((d+1)/2) }

1372: \leq \sqrt{\frac{d+1}{2\pi}},

1373: $$

1374: and the proof is complete.

1375: \end{proof}
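As a quick sanity check of~(\ref{eq:vol-bound}) in low dimensions (numerical values rounded, added only for illustration): for $d=2$ we have $\vol(B^{1})=2$ and $\vol(B^{2})=\pi$, hence
$$
\frac{\vol(B^{1})}{\vol(B^{2})} = \frac{2}{\pi}\approx 0.637 \leq \sqrt{\frac{3}{2\pi}}\approx 0.691,
$$
while for $d=3$ we have $\vol(B^{2})=\pi$ and $\vol(B^{3})=4\pi/3$, hence
$$
\frac{\vol(B^{2})}{\vol(B^{3})} = \frac{3}{4} \leq \sqrt{\frac{4}{2\pi}}\approx 0.798.
$$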

1376: %P where we can explicitly provide lower bounds

1377: %  for the local conductance.

1378: Using Lemma~\ref{lem:bou},  we can prove the

1379: following lower bound for the local

1380: conductance of the ball walk on $\ball$.

1381:

1382: \begin{lemma}  \label{lem:l-bound}

1383: Let $(Q_\delta,\muo)$ be the local ball walk on $\ball\subset \R^d$.

1384: If $\delta\leq 1/\sqrt{d +1}$, then its

1385: local conductance obeys $l\geq 0.3$.

1386: \end{lemma}

1387:

1388: \begin{proof}

1389: The proof is based on some geometric reasoning. It is clear that the

1390: local conductance~$l(x)$ is minimal for points $x$ at the

1391: boundary of $\ball$, and in this case

1392: its value equals the portion, say $\widetilde V$,

1393: of the volume of $B(x,\delta)$ inside $\ball$. If $H$ is the

1394: tangent hyperplane to $\ball$ at $x$, then $H$ cuts off from $B(x,\delta)$

1395: exactly one half of its volume.

1396: Thus we let  $Z(h)$ be the cylinder with

1397: base being the $(d-1)$-ball around $x$ in the hyperplane $H$ of

1398: radius $\delta$.

1399: Its height~$h$ is the distance of $H$ to the hyperplane

1400:   determined by the intersection of the spheres $\partial\ball\cap \partial B(x,\delta)$. This

1401:   height $h$ is exactly determined from the quotient $h/\delta =

1402:   \delta/2$, by similarity, hence $h:= \delta^2/2$. By

1403:   construction we have $\widetilde V \geq 1/2 -

1404: \vol(Z(h))/\vol(B(x,\delta))$ and we can

1405: lower bound the local conductance $l(x)$ by

1406: $$

1407: l(x)\geq \frac 1 2  - \frac{\vol(Z(h))}{\vol(B(x,\delta))}.

1408: $$

1409: We can evaluate~$\vol(Z(h))$ as

1410: $

1411: \vol(Z(h)) = h \delta^{d-1} \vol(B^{d-1}),

1412: $

1413: and we obtain

1414: $$

1415: l(x)\geq \frac 1 2 - \frac{\delta^{d+1} \vol(B^{d-1})}{2 \delta^d

1416: \vol(\ball)}= \frac 1 2 \lr{ 1 -

1417: \frac{\delta \vol(B^{d-1})}{\vol(\ball)}}.

1418: $$

1419: % We use~(\ref{eq:3})

1420: % \begin{displaymath}

1421: %   l(x)\geq \frac 1 2 \lr{1 - \frac{\delta \Gamma(\frac d 2 + 1)}{

1422: %   \sqrt\pi  \Gamma(\frac d 2 +  \frac 1 2)}}.

1423: % \end{displaymath}

1424: The bound~(\ref{eq:vol-bound}) from Lemma~\ref{lem:bou} implies

1425: $$

1426: l(x) \geq % \frac 1 2 \lr{1 - \frac{\delta \sqrt{{(d+1)}/{2}}}{

1427: %   \sqrt\pi}} =

1428: \frac 1 2 \lr{1 - \frac{\delta \sqrt{{d+1}}}{  \sqrt{2

1429:     \pi}}}.

1430: $$

1431: For $\delta\leq 1/\sqrt{d+1}$ we get

1432: $l(x) \geq \frac 1 2 \lr{ 1 - 1/\sqrt{2\pi}}\geq 0.3$, completing the proof.

1433: \end{proof}
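The numerical constant in the last step can be verified as follows (values rounded): since $\sqrt{2\pi}\approx 2.5066$ we have $1/\sqrt{2\pi}\approx 0.3989$, and therefore
$$
\frac 1 2 \lr{1 - \frac{1}{\sqrt{2\pi}}}\approx \frac 1 2 \cdot 0.6011 \approx 0.3005 \geq 0.3.
$$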

1434:

1435: We close this subsection with the following technical lemma,

1436: which  can be extracted from the unpublished

1437: seminar note~\cite{vempala-lesson}. For the convenience of the

1438: reader we present its proof.

1439: In addition we will slightly improve the statement.

1440: \begin{lemma}%% [{\cite{vempala-lesson}}]

1441:   \label{lem:vempala}

1442: Let $l >  0$ be a lower bound for the local

1443: conductance of the ball walk $(Q_\delta,\muo)$.

1444: For any $0<t< l$ and any set

1445:   $A\subset \Omega$ with related sets

1446:   \begin{align}

1447: A_1&:= \set{x\in A, \quad Q_\delta (x, A^c)< \frac{l -t}{2}}\subset

1448:     A\\

1449:   A_2 &:= \set{y\in A^c,\quad  Q_\delta(y, A)< \frac{l -t}{2}}\subset

1450:   A^c,

1451:   \end{align}

1452: we have $d(A_1,A_2)>t\delta \sqrt{2 \pi/\lr{d+1}}$.

1453: \end{lemma}

1454: For its proof we need the following

1455: \begin{lemma}

1456: Let $\delta>0$.

1457:   If $x,y\in \R^d$ are two points with distance at most~$t\delta \sqrt{2

1458:     \pi/\lr{d+1}}$, then

1459:   \begin{equation}

1460:     \label{eq:1}

1461:     \vol(B(x,\delta)\cap B(y,\delta)) \geq (1 - t) \vol(\delta\ball).

1462:   \end{equation}

1463: \end{lemma}

1464: \begin{proof}

1465: Let $u:= \norm{x - y}{2}$. If $u<\delta$ then

1466:  the volume of the intersection of $B(x,\delta)$ and $B(y,\delta)$  is

1467: exactly the same as the volume of the

1468: ball $\delta\ball$ minus the volume of the

1469:  middle slice with distance~$u$ as thickness. The volume of

1470:  this slice is bounded from above by the volume of the cylinder with

1471:  base $\delta B^{d-1}$ and thickness $u$. Thus we obtain

1472:  \begin{equation*}

1473:   \vol(B(x,\delta)\cap B(y,\delta)) \geq \vol(\delta\ball) - u

1474: \vol(\delta B^{d-1}) =

1475: \vol(\delta\ball) \lr{ 1 - u \frac{

1476: \vol(\delta B^{d-1})}{\vol(\delta\ball)}}.

1477:  \end{equation*}

1478: Applying Lemma~\ref{lem:bou} we obtain

1479: $$

1480: \frac{ \vol(\delta B^{d-1})}{\vol(\delta\ball)}=

1481: \frac{ \vol(B^{d-1})}{\delta\vol(\ball)}

1482: \leq \frac 1 \delta \sqrt{\frac{d+1}{2\pi}},

1483: $$

1484: thus  by the choice of $u\leq \sqrt{2\pi} t\delta/\sqrt{d+1} $

1485: we conclude that

1486: $$

1487: u\frac{ \vol(\delta B^{d-1})}{\vol(\delta\ball)}

1488: \leq  \frac{\sqrt{2\pi}t\delta

1489: \sqrt{d+1}}{\delta\sqrt{2\pi}\sqrt {d+1}}\leq t,

1490: $$

1491: and the proof is complete.

1492: \end{proof}
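As a sanity check, which is not part of the original argument, consider
the case $d=1$, where $\ball=[-1,1]$ and $B(x,\delta)$ is an interval of
length $2\delta$. For $u=\abs{x-y}\leq 2\delta$ the intersection has
length $2\delta - u$, so
\begin{equation*}
\frac{\vol(B(x,\delta)\cap B(y,\delta))}{\vol(\delta\ball)}
= 1 - \frac{u}{2\delta}.
\end{equation*}
Under the assumption $u\leq t\delta\sqrt{2\pi/\lr{d+1}}=t\delta\sqrt{\pi}$
this is at least $1 - t\sqrt{\pi}/2\geq 1-t$, in agreement
with~(\ref{eq:1}), because $\sqrt{\pi}/2\approx 0.886<1$.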

1493: We turn to the

1494: \begin{proof}[Proof of Lemma~\ref{lem:vempala}]

1495: Let $x\in A_1$ and $y\in A_2$ be in $\Omega$, and suppose that their

1496:   distance is at most $t\delta \sqrt{2 \pi/\lr{d+1}}$.

1497:   Simple set theoretic reasoning shows that

1498:   \begin{align*}

1499: \vol(B(x,\delta)\cap B(y,\delta)\cap \Omega)&

1500: \geq \vol(B(x,\delta)\cap

1501: \Omega) - \vol(B(x,\delta)\setminus B(y,\delta)) \\

1502: &\geq  \vol(B(x,\delta)\cap

1503: \Omega) - \vol(B(x,\delta)\setminus (B(x,\delta)\cap  B(y,\delta)))

1504: \\

1505: &= \vol(B(x,\delta)\cap \Omega) - \vol(\delta\ball)

1506: + \vol(B(x,\delta)\cap  B(y,\delta)).

1507:   \end{align*}

Since $l$ is a lower bound for the local conductance $l(x)$, we have

1509: $$

1510: \vol(B(x,\delta)\cap \Omega)\geq l \vol(B(x,\delta))= l

1511: \vol(\delta\ball).

1512: $$

1513: Taking this into account and using~(\ref{eq:1}) we

1514: end up with

1515: \begin{align*}

1516:  \vol(B(x,\delta)\cap B(y,\delta)\cap \Omega)& \geq l

1517:  \vol(\delta\ball) - \vol(\delta\ball) + (1-t) \vol(\delta\ball) \\

1518: & = (l-t)  \vol(\delta\ball).

1519: \end{align*}

In probabilistic terms this reads

1521: $Q_\delta(x, B(x,\delta)\cap B(y,\delta)\cap \Omega) \geq l-t$, and

1522: similarly $Q_\delta(y, B(x,\delta)\cap B(y,\delta)\cap \Omega) \geq

1523: l-t$.

1524: Now, if $A\subset\Omega$ is any measurable subset with complement

1525: $A^c$ then for $x\in A$ and $y\in A^c$ we obtain

1526: $$

1527: B(x,\delta) \cap B(y,\delta)\cap\Omega \subset

1528: \lr{B(x,\delta) \cap A^c \cap \Omega}

1529: \bigcup \lr{B(y,\delta) \cap A \cap \Omega} ,

1530: $$

%E  In the last formula a c has been moved!
which in turn yields $Q_\delta(x,A^c) + Q_\delta(y,A)\geq l-t$. But
%E  A c has also moved in the last formula!
for $x\in A_1$ and $y\in A_2$ the definition of these sets gives
$Q_\delta(x,A^c) + Q_\delta(y,A)< l-t$, a contradiction. Hence any
two points from $A_1$ and $A_2$, respectively,  must have distance
larger than  $t\delta \sqrt{2 \pi/\lr{d+1}}$, and the proof is complete.

1537: \end{proof}
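The passage from the set inclusion to the bound on
$Q_\delta(x,A^c) + Q_\delta(y,A)$ can be spelled out as follows. This is
a sketch which assumes that, away from the current position, the ball
walk $Q_\delta(x,\cdot)$ has the constant density $1/\vol(\delta\ball)$
on $B(x,\delta)\cap\Omega$. Write
$S:= B(x,\delta)\cap B(y,\delta)\cap\Omega$. Since $x\notin A^c$ and
$y\notin A$, we get
\begin{align*}
Q_\delta(x,A^c) + Q_\delta(y,A)
&\geq Q_\delta(x,S\cap A^c) + Q_\delta(y,S\cap A)\\
&= \frac{\vol(S\cap A^c) + \vol(S\cap A)}{\vol(\delta\ball)}
= \frac{\vol(S)}{\vol(\delta\ball)} \geq l-t.
\end{align*}
On the other hand, for $x\in A_1$ and $y\in A_2$ each summand on the
left is smaller than $(l-t)/2$, so their sum is smaller than $l-t$.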

1538:

1539: \subsubsection*{Properties of the related Metropolis method}

1540: \label{sec:metprop}

1541: We analyze  Metropolis Markov chains which are based

1542: on the ball walk, introduced above, for some appropriately chosen

$\delta$. As it turns out, the related Metropolis chains are
\emph{perturbations} of the underlying ball walk, and the properties of
the latter, as established in Propositions~\ref{prop:ueqd}
and~\ref{pro:phi}, extend in a natural way.

1547:

1548: For $\rho \in \rad(\Omega)$ we define the \emph{acceptance

1549:   probabilities} as

1550: \begin{equation}

1551:   \label{eq:alpha}

1552:   \alph(x,y):= \min\set{1,\frac{\rho(y)}{\rho(x)}}.

1553: \end{equation}

1554: The corresponding Metropolis kernel is given by

1555: \begin{equation}

1556:   \label{mk}

1557:   \krd(x,dy):=

1558:   \alph(x,y) Q_\delta(x,dy)

  + \lr{1 - \int_{\Omega}\alph(x,z)\,Q_\delta(x,dz)}\delta_x(dy).

1560: \end{equation}

1561: Note that for $x \notin A$ we obtain

1562: $$

1563: \krd (x,A) =

1564: \int_A \alph (x,y) \, Q_\delta (x, dy) =

1565: \frac 1 {\vol (\delta \ball)} \, \int_{A\cap B(x,\delta)}

1566: \alph (x,y) \, dy .

1567: $$

1568: % For the convenience of the reader

1569: Below we sketch a single Metropolis~\KwDet

1570: from the present position~$x\in\Omega$ with kernel

1571:     $\krd(x,\cdot)$.

1572: The procedure~{\bf Ball-walk-step} was described in

1573: Figure~\ref{fig:bbb}.

1574:

1575: \begin{figure}[h]

1576:   \centering

1577: \begin{procedure}[H]

1578:   \caption{Metropolis-step($x,\rho,\delta$)}

1579: \SetLine

1580: \Input{current position $x$, $\delta>0$, function $\rho$\;}

1581: \Output{next position\;}

1582: \KwProp{$y := \text{\bf Ball-walk-step}(x,\delta)$}\;

1583: \KwAcc{}

1584:

1585: \uIf{$\rho(y)\geq \rho(x)$}{\Return{$y$}}%{\Return{$x$}}

1586: \uElseIf{$\rho(y) \geq {\bf rand()}\cdot \rho(x)$}{\Return{$y$}}

1587: \Else{\Return{$x$}}

1588: \end{procedure}

1589:   \caption{Schematic view of the Metropolis step. Note that the Acceptance step results in an

1590:     acceptance probability of $\alph(x,y)=\min\set{1,\rho(y)/\rho(x)}$.}

1591:   \label{fig:ccc}

1592: \end{figure}
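To see why the acceptance step realizes~(\ref{eq:alpha}), one may argue
as follows. We assume here, which is not stated explicitly, that
{\bf rand()} returns a random number $U$ that is uniformly distributed
on $[0,1]$ and independent of the proposal~$y$, and that $\rho(x)>0$.
If $\rho(y)\geq \rho(x)$ then the proposal is always accepted.
Otherwise $0\leq \rho(y)/\rho(x)<1$ and
\begin{equation*}
\PP\lr{\rho(y)\geq U \rho(x)}
= \PP\lr{U\leq \frac{\rho(y)}{\rho(x)}}
= \frac{\rho(y)}{\rho(x)},
\end{equation*}
so that in both cases the proposal is accepted with probability
$\min\set{1,\rho(y)/\rho(x)}=\alph(x,y)$.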

1593: We start with the following observation.

1594: \begin{lemma}\label{lem:beta}

1595: Let $\alpha$ be the Lipschitz constant in $\rad(\Omega)$ and  $\beta:=

1596: \exp(-\alpha\delta)$.

1597:   Uniformly for $\rho\in\rad(\Omega)$ the following bound for the

1598:   related Metropolis chain holds true:

1599:   \begin{equation}

1600:     \label{eq:alb}

1601: \krd(x,dy) \geq \beta Q_\delta(x, dy).

1602:   \end{equation}

1603: \end{lemma}

1604: \begin{proof}

1605: Let $A\subset\Omega$. If $\dist(x,A)>\delta$  then there is nothing to

1606: prove.

1607: Otherwise, for $y\in A\cap B(x,\delta)$  we find

1608: from~(\ref{eq:dens-class}) and~(\ref{eq:alpha}) that

1609: \begin{equation*}

1610:  \alph(x,y)\geq \exp(-\alpha\norm{x - y}{2})

1611: \geq e^{-\alpha\delta}=\beta.

1612: \end{equation*}

1613: By definition of the transition kernel $\krd$ from~(\ref{mk}) we can

1614: use $\beta$ to bound

1615: $$

1616: \krd(x,A)\geq \min\set{\alph(x,y),\ y\in A\cap B(x,\delta)}

1617: Q_\delta(x, A) \geq \beta Q_\delta(x, A).

1618: $$

1619: The proof is complete.

1620: \end{proof}

1621: The assertion of Proposition~\ref{prop:ueqd} extends to the family of

1622: Metropolis chains as follows% ,

1623: % quantifying similar results from~\cite{MR1399158}

1624: .

1625:

1626: \begin{proposition}[{cf.~\cite[Prop.~1]{MR1738303}}]\label{pro:uue1}

1627: Let $Q_{\delta}$ be the ball walk from~(\ref{eq:qloc}) on

1628: %  a compact set

1629: $\Omega$.

1630: %  with lower bound $l$ for the local conductance of

1631: %  $Q_{\delta/2}$. % with $\delta\leq1/\sqrt{d+1}$.

1632: For each $\rho\in\rad(\Omega)$ and $\delta\leq D$ the corresponding

1633: Metropolis chains from~(\ref{mk}) are

1634: uniformly ergodic and reversible with respect to the related $\mur$.

1635: \end{proposition}

1636:

1637: \begin{proof}

1638: Reversibility with respect to $\mur$ is clear

1639: by the choice of the function~$\alph$. To

1640: prove uniform ergodicity,

1641: let $\beta$ be from Lemma~\ref{lem:beta} and $c$

1642:   from~(\ref{eq:ueball}). %Set $\eta:= 1 - \beta^{n_{0}}c$.

1643:   As established in Lemma~\ref{lem:beta} we have $\krd(x,dy)\geq

1644:   \beta Q_{\delta}(x,dy)$. It is easy to see, and was established

1645:   in~\cite[Proof of Thm.~2]{MR1738303}, that this extends to all

1646:   iterates as

1647: $$

1648: \krd^{n}(x,dy)\geq   \beta^{n} Q^{n}_{\delta}(x,dy).

1649: $$

1650: Recall that under the assumptions made,

1651: the ball walk is uniformly ergodic, and

1652: from Proposition~\ref{prop:ueqd} we obtain  $n_{0}$ such that for all

1653: $x\in\Omega$ we have

1654: \begin{equation}

1655:   \label{eq:unifbound}

1656: \krd^{n_{0}}(x,A)\geq  \beta^{n_{0}}c \nu(A),\quad A\subset \Omega,

1657: \end{equation}

1658: proving uniform ergodicity.

1659: \end{proof}
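The extension to iterates used in this proof can be sketched by
induction; what follows is our reading of the argument and not a
quotation from~\cite{MR1738303}. Suppose that
$\krd^{n}(z,A)\geq \beta^{n}Q^{n}_{\delta}(z,A)$ for all $z\in\Omega$.
Since $\krd(x,\cdot) - \beta Q_\delta(x,\cdot)$ is a nonnegative measure
by~(\ref{eq:alb}), and since $Q^{n}_{\delta}(\cdot,A)\geq 0$, we obtain
\begin{align*}
\krd^{n+1}(x,A) &= \int_\Omega \krd^{n}(z,A)\,\krd(x,dz)
\geq \beta^{n}\int_\Omega Q^{n}_{\delta}(z,A)\,\krd(x,dz)\\
&\geq \beta^{n+1}\int_\Omega Q^{n}_{\delta}(z,A)\,Q_\delta(x,dz)
= \beta^{n+1} Q^{n+1}_{\delta}(x,A).
\end{align*}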

1660:

1661: \begin{rem}\label{rem:unifb}

1662: Notice that~(\ref{eq:unifbound}) is obtained with

1663: right hand side \emph{uniformly} for

1664: all $\rho\in\rad(\Omega)$, a fact which will prove useful later.

1665: \end{rem}

1666:

1667: Finally we prove lower bounds for the conductance of the

1668: Metropolis chains.

1669:

1670: \begin{theorem}\label{thm:met-cond}

1671: Let $(\krd,\mur)$ be the Metropolis chain based

1672: on the local ball walk $(Q_\delta,\muo)$

1673: and let $\phi(\krd,\mur)$ be its conductance, where

1674: $\rho\in\rad(\Omega)$.

1675: Let $l$ be a lower bound for the local conductance of $Q_{\delta}$.

1676: For  $\rho\in\rad(\Omega)$  we have

1677: \begin{equation}

1678: \label{eq:conductancelb}

1679: \phi(\krd,\mur) \geq

1680: \frac{l e^{-\alpha \delta}}{8}

1681: \min\set{\sqrt{\frac \pi 2}\frac{l\delta}{D \sqrt{d +1}},1},

1682: \end{equation}

1683: where $D$ is the diameter of $\Omega$.

1684: \end{theorem}

1685: \begin{rem}

1686:   As mentioned above, Proposition~\ref{pro:phi} is a special case of

1687: Theorem~\ref{thm:met-cond} for $\alpha=0$.

1688: \end{rem}

1689: The proof of Theorem~\ref{thm:met-cond} will be based on

1690: Lemma~\ref{lem:vempala} for the underlying

1691: ball walk, specifying $t:= l/2$.

1692: This extends to the Metropolis walk as follows.

1693: \begin{lemma}\label{cor:vemp}

  Let $\alpha$ be the constant
  from~(\ref{eq:dens-class}), let $l$ be a lower bound for the local
  conductance of the ball walk $Q_\delta$, and let
  $\beta:= \exp(-\alpha\delta)$.

1697: For $A\subset \Omega$ we assign

1698:  \begin{align}

1699: T_1 &:= \set{x\in A,\quad  \krd(x,A^c)< \frac{\beta l}{4}}\subset

1700:     A\\

1701:   T_2 &:= \set{y\in A^c,\quad  \krd(y,A)< \frac{\beta l}{4}}\subset

1702:   A^c.

1703:   \end{align}

1704: Then $d(T_1 ,T_2)>\delta l \sqrt{{\pi}/\lr{2d+2}}$.

1705:  \end{lemma}

1706:  \begin{proof}

With $t:= l/2$ in Lemma~\ref{lem:vempala} it is enough to prove
$T_1\subset A_1$ and $T_2\subset A_2$.
If $x\in T_1$ then $\krd(x,A^c) <\beta {l}/{4}$ by definition, hence
Lemma~\ref{lem:beta} implies
$$
Q_\delta (x, A^c) \leq \frac 1 \beta \krd(x,A^c) <  \frac{l}{4}
= \frac{l-t}{2}.
$$

1713: The other inclusion is proved similarly.

1714:  \end{proof}
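As a quick check of the constants, take $t:= l/2$ in
Lemma~\ref{lem:vempala}. The threshold in the definition of $A_1$ and
$A_2$ becomes $(l-t)/2 = l/4$, and the distance bound becomes
\begin{equation*}
t\delta\sqrt{\frac{2\pi}{d+1}}
= \frac{l\delta}{2}\sqrt{\frac{2\pi}{d+1}}
= l\delta\sqrt{\frac{2\pi}{4(d+1)}}
= \delta l\sqrt{\frac{\pi}{2d+2}}.
\end{equation*}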

1715: We turn to the

1716: \begin{proof}[Proof of Theorem~\ref{thm:met-cond}]

1717: Let $A\subset \Omega$ be the set for which the conductance is

1718: attained. We assign sets $T_1$ and $T_2$ as in

1719: Lemma~\ref{cor:vemp} and distinguish two cases.

1720: If $\mur (T_1)<\mur (A)/2$ \emph{or}

1721: $\mur (T_2)<\mur (A^c)/2$,

1722: then the

1723: estimate~(\ref{eq:conductancelb}) follows easily.

1724: For instance, if  $\mur(T_1)<\mur(A)/2$ then

1725: \begin{multline*}

1726:   \int_A \krd(x,A^c)\mur(dx) \geq  \int_{A\setminus T_1}

1727:   \krd(x,A^c)\mur(dx)\\

1728: \geq \frac{\beta l}{4}\mur(A\setminus T_1)\geq

1729: \frac{ \beta l}{8}\mur(A)\geq

1730:  \frac{\beta l}{8} \min\set{\mur(A),\mur(A^c)},

1731: \end{multline*}

1732: %   The choice of $t:= l/2c$ yields

1733: % $$

1734: %  \int_A \krd(x,A^c)\mu(dx)\geq \frac{\beta l}{8}\mu(A),

1735: % $$

1736:  thus $ \phi(\krd,\mur)\geq \beta l/8$ in this case, which

1737: proves~(\ref{eq:conductancelb}).

1738: %  under condition~(\ref{eq:condition}).

1739:

1740: Otherwise we have $\mur(T_1)\geq \mur(A)/2$ \emph{and}

1741: $\mur(T_2)\geq \mur(A^c)/2$. In this case we

1742: apply an isoperimetric inequality,

1743: see~\cite[Thm.~4.2]{MR2178341} to the triple

1744: $(T_1,T_2,T_3)$ with

1745: $T_3:= \Omega \setminus (T_1 \cup T_2)$ to conclude

1746: that

1747: \begin{equation}

1748:   \label{eq:mu123}

1749: \mur(T_3)\geq \frac{2 d(T_1,T_2)}{D}\min\set{\mur(T_1),

1750: \mur(T_2)},

1751: \end{equation}

1752: hence under the size constraints in this case it holds true that

1753: \begin{equation}

1754:   \label{eq:mu123f}

1755:   \mur(T_3)\geq

1756:   \frac{d(T_1,T_2)}{D}\min\set{\mur(A),\mur(A^c)}.

1757: \end{equation}

1758: Using the reversibility of the Metropolis

1759: chain $(\krd,\mur)$ we have

1760: $$

1761: \int_A \krd(x,A^c)\mur(dx)= \int_{A^c} \krd(y,A)\mur(dy),

1762: $$

1763: which implies

1764: \begin{align*}

1765: \int_A \krd(x,A^c)\mur(dx)&= \frac 1 2 \lr{\int_A \krd(x,

1766:   A^c )\mur(dx)+ \int_{A^c} \krd(y,A)\mur(dy) }  \\

1767: & \ge  \frac 1 2 \lr{ \int_{A\cap T_3} \krd(x,

1768:   A^c )\mur(dx)+ \int_{A^c \cap T_3} \krd(y,A)\mur(dy) }\\

1769: &\geq \frac 1 2 \lr{ \frac{\beta l }{4} \mur(A \cap T_3) +

1770:   \frac{\beta l}{4} \mur(A^c  \cap T_3) }\\

1771: &= \frac{\beta l }{8}\lr{\mur(A \cap T_3)

1772: +\mur(A^c  \cap T_3) }=

1773: \frac{\beta l}{8} \mur(T_3).

1774: \end{align*}

1775: Since by Lemma~\ref{cor:vemp} we can bound

1776: $d(T_1,T_2)\geq \delta l \sqrt{{\pi}/\lr{2d+2}}$

1777: we use~(\ref{eq:mu123f}) to

1778: complete the proof.

1779: \end{proof}
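The concluding step can be written out as follows; this is a sketch
which combines only the estimates displayed in the proof. In the second
case we have
$\min\set{\mur(T_1),\mur(T_2)}\geq \frac 1 2\min\set{\mur(A),\mur(A^c)}$,
which turns~(\ref{eq:mu123}) into~(\ref{eq:mu123f}). Then
\begin{align*}
\int_A \krd(x,A^c)\mur(dx)
&\geq \frac{\beta l}{8}\mur(T_3)
\geq \frac{\beta l}{8}\,\frac{d(T_1,T_2)}{D}\min\set{\mur(A),\mur(A^c)}\\
&\geq \frac{\beta l}{8}\sqrt{\frac \pi 2}\frac{l\delta}{D\sqrt{d+1}}
\min\set{\mur(A),\mur(A^c)},
\end{align*}
where we used $\sqrt{\pi/\lr{2d+2}}=\sqrt{\pi/2}/\sqrt{d+1}$. Together
with the bound $\beta l/8$ from the first case this gives the minimum
in~(\ref{eq:conductancelb}).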

1780:

1781: If we restrict ourselves to Metropolis chains on $\ball$, then

1782: Lemma~\ref{lem:l-bound}  provides a lower bound for

1783: the local conductance which is independent of the

1784: dimension~$d$.

1785: As a simple consequence of Theorem~\ref{thm:met-cond} we

1786: then obtain the following

1787: \begin{corollary}

1788: \label{cor2}

1789: Assume that $\rho \in \rad(\ball)$ and

1790: $\delta \le (d+1)^{-1/2}$.

1791: Then we obtain

1792: $$

1793: \phi(K_{\rho, \delta} , \mur) \ge\sqrt{\frac \pi 2}

1794: \frac{9 \delta}{1600 \sqrt{d + 1}} e^{-\alpha \delta} .

1795: $$

To maximize this lower bound over the admissible $\delta$ we choose

1797: \begin{math} % \label{eq:max}

1798: \delta^* = \min\set{{1}/{\sqrt{d+1}},1 /\a }

1799: \end{math}

1800: and obtain

1801: $$

1802: \phi(K_{\rho, \delta^*}, \mur)

1803: \ge 0.0025 \, \frac{1}{\sqrt{d+1}}

1804: \min\set{\frac{1}{\sqrt{d+1}},\frac 1 \a } .

1805: $$

1806: \end{corollary}
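The constants can be checked as follows. We assume here that
Lemma~\ref{lem:l-bound} provides the value $l=0.3$; this value is not
restated in this section, but it is the one consistent with the
constant $9/1600$. For $\ball$ we have $D=2$. For
$\delta\leq (d+1)^{-1/2}$ the first term in the minimum
in~(\ref{eq:conductancelb}) is at most one, so the bound reads
\begin{equation*}
\frac{l e^{-\alpha\delta}}{8}\sqrt{\frac \pi 2}
\frac{l\delta}{D\sqrt{d+1}}
= \sqrt{\frac \pi 2}\,\frac{0.09}{16}\,
\frac{\delta}{\sqrt{d+1}}\, e^{-\alpha\delta}
= \sqrt{\frac \pi 2}\,\frac{9\delta}{1600\sqrt{d+1}}\,
e^{-\alpha\delta}.
\end{equation*}
For $\delta=\delta^*$ we have $\alpha\delta^*\leq 1$, hence
$e^{-\alpha\delta^*}\geq e^{-1}$, and
$\sqrt{\pi/2}\cdot (9/1600)\cdot e^{-1}\approx 0.00259\geq 0.0025$.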

1807:

1808: \subsubsection*{Error bounds}

1809: \label{sec:er-mc}

1810:

1811: For the class $\fad(\Omega)$

1812: the above lower conductance bound~(\ref{eq:conductancelb})

1813: will yield an error estimate for the problem~(\ref{eq02}).

1814:

1815: Let $S_n^\delta$ be the

1816: estimator based on a sample of the local Metropolis Markov

1817: chain with transition $K_{\rho,\delta}$, starting at zero.

1818: To estimate its error

1819: we combine the estimates of the conductance of $K_{\rho,\delta}$

1820: with two results, partially known from the literature.

1821: To formulate the results we note the following.

1822: The Markov kernel

1823: $K_{\rho, \delta}$ is reversible with respect to

1824: $\mur$ and hence induces a self-adjoint operator

1825: $$

1826: K_{\rho, \delta} : L_2 (\Omega,\mur) \to L_2 (\Omega,\mur) .

1827: $$

1828: The spectrum $\sigma (K_{\rho,\delta})$ is contained

1829: in $[-1, 1]$ and $1 \in \sigma(K_{\rho, \delta})$

1830: and we are interested in the second largest eigenvalue

1831: $$

\beta_{\rho, \delta} := \sup \{ s \in \sigma
(K_{\rho, \delta}) \mid s \not= 1 \}

1834: $$

1835:  of $K_{\rho,\delta}$. This is motivated by the  extension of  a result

1836: from~\cite[Cor.~1]{MR1738303} about the worst case

1837: error of $S_n^\delta$, uniformly for $(f,\rho)\in\fad(\Omega)$.

1838: \begin{lemma}

1839: \label{le:mathescharf}

1840: $$

1841: \lim_{n \to \infty } \sup_{(f,\rho)\in\fad(\Omega)}

1842: e(S_n^\delta, (f,\rho))^2 \cdot n =

1843: \sup_{\rho\in\rad(\Omega)}\frac{1+ \beta_{\rho,

1844: \delta}}{1- \beta_{\rho, \delta}} .

1845: $$

1846: \end{lemma}

1847: The proof is given in the appendix.

1848: For Markov chains which start according to the invariant distribution

1849: $\mur$ the bound is similar, but  more explicit and was given

1850: in~\cite{SOK} and~\cite[Thm.~1.9]{MR1238906}.

1851:

1852: The relation of the second largest

1853: eigenvalue~$\beta_{\rho, \delta}$ to the conductance

1854: is given in

1855:

1856: \begin{lemma}[Cheeger's Inequality,

1857: see~\cite{MR1025467,MR930082,MR1238906}]

1858: \label{le:cheeger}

1859: $$

1860: \lambda_{\rho,\delta} :=  1 - \beta_{\rho, \delta} \geq

1861: \phi^{2}(K_{\rho, \delta}, \mur)/2.

1862: $$

1863: \end{lemma}

1864:

1865: We are ready to state %  and prove

1866: our main result for

1867: the Metropolis algorithm $S_n^\delta$, based on the Markov chain

1868: $K_{\rho, \delta}$, for the class

1869: $\fad (\ball)$, i.e., when $\Omega\subset \R^{d}$ is the Euclidean unit ball.

1870: \begin{theorem}

1871: \label{th5}

1872: Let $S_n^\delta=\frac 1 n \sum_{j=1}^{n}f(X_{j})$ be the

1873: estimator based on a sample~$(X_{1},\dots,X_{n})$ of the local Metropolis Markov

1874: chain with transition $K_{\rho, \delta}$,

1875: where $\delta \le (d+1)^{-1/2}$.

1876: Then

1877: \begin{equation}

1878:   \label{eq:th5}

1879:  \lim_{n \to \infty} \sup_{(f,\rho)\in\fad(\ball)}

1880: e(S_n^\delta, (f,\rho) ) ^2 \cdot n

1881: \le \frac{8\cdot 1600^{2}}{81\pi}(d +1)\cdot \frac{e^{2 \alpha \delta}}

1882: {\delta^{2}} .

1883: \end{equation}

1884:  Again we may choose

1885: $

1886: \delta^* = \min\set{(d+1)^{-1/2},\alpha^{-1}}

1887: $

1888: and obtain

1889: \begin{equation}

1890: \label{tract}

1891: \lim_{n \to \infty} \sup_{(f,\rho)\in\fad(\ball)}

1892: e(S_n^{\delta^*} , (f,\rho) ) ^2 \cdot n

1893: \le 594700 \cdot (d+1)\max\set{d+1,{\alpha^{2}}}.

1894: \end{equation}

1895: \end{theorem}

1896: \begin{proof}

1897: This follows from Corollary~\ref{cor2}, and

1898:  Lemmas~\ref{le:mathescharf} and~\ref{le:cheeger}.

1899: \end{proof}
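In more detail, the three ingredients combine as follows; this is a
sketch of the computation. Since $\beta_{\rho,\delta}\leq 1$,
Lemma~\ref{le:cheeger} gives
\begin{equation*}
\frac{1+\beta_{\rho,\delta}}{1-\beta_{\rho,\delta}}
\leq \frac{2}{\lambda_{\rho,\delta}}
\leq \frac{4}{\phi^{2}(K_{\rho,\delta},\mur)},
\end{equation*}
and inserting the bound from Corollary~\ref{cor2} yields
\begin{equation*}
\frac{4}{\phi^{2}(K_{\rho,\delta},\mur)}
\leq 4\cdot\frac{2}{\pi}\cdot\frac{1600^{2}}{81}\,(d+1)\,
\frac{e^{2\alpha\delta}}{\delta^{2}}
= \frac{8\cdot 1600^{2}}{81\pi}\,(d+1)\,
\frac{e^{2\alpha\delta}}{\delta^{2}}.
\end{equation*}
For $\delta=\delta^*$ we have $e^{2\alpha\delta^*}\leq e^{2}$ and
$(\delta^*)^{-2}=\max\set{d+1,\alpha^{2}}$, and
$8\cdot 1600^{2}e^{2}/(81\pi)\approx 5.947\cdot 10^{5}\leq 594700$.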

1900:

1901: \section{Summary}

1902: \label{sec:sum}

1903:

1904: Let us discuss our findings.  %   in some detail.

The results from Section~\ref{s2} clearly indicate that the
superiority of Metropolis algorithms over
simpler (non-adaptive) Monte Carlo methods
does not hold in general. Specifically, it does not hold
for the large classes $\fco$ of inputs without
additional structure.

1911:

1912: On the other hand, for the class~$\fad(\ball)$, specific Metropolis

1913: algorithms that are based on local underlying walks are superior to

1914: all non-adaptive methods.

Moreover,   %  , as formula~\eqref{tract} indicates,

1916: on~$\ball$  %   the problem is \emph{tractable}

1917: %   in $d$ and $\alpha$:

1918: the cost of the algorithm~$S_n^{\delta^*}$, roughly

1919: given by the number $n$ of evaluations of $\rho$ and $f$,

1920: increases like a polynomial in $d$ and $\alpha$.

1921: More

1922: precisely, according to~\eqref{tract}, the asymptotic constant

1923: $\lim_{n \to \infty} e(S_n^{\delta^*} , \fad(\ball) ) ^2~\cdot~n$

1924: is bounded by a constant times~$\max\set{d^{2}, d\alpha^{2}}$,

1925: i.e., the complexity grows polynomially in $d$ and $\alpha$

1926: and, for fixed $d$, increases (at most) as $\alpha^{2}$.

1927: If we only allow non-adaptive methods then this asymptotic constant,

1928: again for fixed $d$,  increases at least as $\alpha^{d}$,

1929: see~\eqref{lo9}.

1930:

1931: %E  Hier Modifikationen.

1932: We believe that this problem is \emph{tractable} in the sense that

1933: the number of function values to achieve an error $\e$ can be bounded

1934: by

1935: \begin{equation}   \label{tract2}

1936: n(\e , \fad(\ball)   ) \,  \le \,  C \,  \e^{-2} \,  d \,  \max ( d, \alpha^2) .

1937: \end{equation}

We did not prove~\eqref{tract2}, however, since Theorem~\ref{th5} is only a

1939: statement for large $n$.
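To indicate where the form of~\eqref{tract2} comes from, suppose, purely
as a heuristic and not as a proven statement, that the
bound~\eqref{tract} were valid for every $n$ and not only in the limit,
that is,
\begin{equation*}
e(S_n^{\delta^*},\fad(\ball))^{2}
\leq \frac{C'}{n}\,(d+1)\max\set{d+1,\alpha^{2}},
\qquad C' = 594700.
\end{equation*}
Then the error would be at most $\e$ as soon as
\begin{equation*}
n\geq C'\,\e^{-2}\,(d+1)\max\set{d+1,\alpha^{2}},
\end{equation*}
and since $d+1\leq 2d$ the right-hand side is at most
$4C'\,\e^{-2}\,d\max\set{d,\alpha^{2}}$, which has the form
of~\eqref{tract2} with $C=4C'$.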

1940:

Notice
that, according to Theorem~\ref{th5}, the radius~$\delta^{\ast}$ of the
underlying ball walk needs to be adjusted both to

1944: the spatial dimension~$d$ and the

1945: Lipschitz constant~$\alpha$.

1946:

1947: The analysis of the Metropolis algorithm is based on properties of the

1948: underlying ball walk; in particular we establish uniform ergodicity of

1949: the ball walk for convex bodies~$\Omega\subset \R^{d}$. Also, based

1950: on conductance arguments,  we provide lower bounds for the spectral gap

1951: of the ball walk.   % : If $\delta\sim 1/\sqrt{d}$, then

1952: % Proposition~\ref{pro:phi} together

1953: % with Lemma~\ref{le:cheeger} show that this is independent

1954: % of the dimension

1955:

1956: As a consequence,  in the case~$\alpha=0$ the estimate~(\ref{eq:th5})  provides an

1957: error bound for the ball

1958: walk $(Q_{\delta},\mu)$, which is asymptotically of the form

1959: $ e(S_{n}^{\delta},L_{2}(\ball,\mu))\leq C \delta^{-1}

1960: (d/n)^{1/2}$.%  This complements the heuristic considerations

1961: % from~\cite[Example~1]{MR1738303}.

1962:

1963: The results  extend in a similar way to any family

1964: $\Omega_d \subset \R^{d}$ for which

1965: the underlying local ball walk $Q_{\delta}$ has

1966: (for $\delta \le \delta_d$)

1967: a non-trivial lower bound for the

1968: local conductance that is independent of the dimension.

1969:

Finally, from the results of Section~\ref{s2} we can conclude that adaption
does not help much for the classes $\fco$.
In contrast, adaption may help to break the \emph{curse of
dimensionality} for the classes $\fad(\ball)$.
These are new results concerning the \emph{power of
adaption}; see~\cite{MR1408328} for a survey of earlier results.

1976:

1977: \appendix

1978: \section{Proof of Lemma~\ref{le:mathescharf}}

1979: % \label{app}

1980:

1981: Lemma~\ref{le:mathescharf} extends the bound

1982: from~\cite[Thm.~1]{MR1738303}, which deals with a single uniformly

ergodic chain. It was obtained from a contraction

1984: property, as stated in~\cite[Prop.~1]{MR1738303}.

1985: The goal of the

1986: present analysis is to establish this asymptotic result

1987: \emph{uniformly} for all Metropolis chains with density from

1988: $\rad(\Omega)$, by showing that this contractivity holds true uniformly% , which in turn allows to

1989: % extend the proof of Theorem~1 in~\cite{MR1738303}

1990: .

1991:

1992: \subsection*{Contractivity of the Markov operator}

1993: We assign to each transition kernel $K$ on $\Omega$ with corresponding invariant

1994: distribution $\mu$ the bounded linear mapping $P$, given by

1995: \begin{equation}

1996:   \label{eq:prd}

1997: (P f)(x) := \int f(y) K(x,dy).

1998: \end{equation}

1999: Also we let $E$ denote the mapping which assigns any integrable

2000: function its expectation as a constant function

2001: $

2002: E(f)\colon= \int_\Omega f(x) \, \mu(dx).

2003: $

2004: {F}or each $K$ the mapping $P - E$ is bounded in

2005: $L_{\infty}(\Omega,\mu)$, with norm less than or  equal to one and we

2006: shall strengthen this uniformly for kernels $\krd$ with

2007: $\rho\in\rad(\Omega)$.

2008: Within this operator context

2009: \emph{uniform ergodicity} is equivalent to a specific

2010: form of quasi-compactness, namely there are $0<\eta<1$ and $n_{0}\in\N$

2011: for which

2012: \begin{equation}\label{eq-infty-con}

2013: \norm{P^{n} - E\colon L_{\infty}(\Omega)

2014: \to L_{\infty}(\Omega)}{}\leq\eta,\ \text{for $n\geq

2015: n_{0}$.}

2016: \end{equation}

We first show that reversibility allows us to transfer this to

2018: the spaces~$L_{1}(\Omega,\mu_{\rho})$.

2019: \begin{lemma} \label{lem:infty1}

2020: Suppose that the transition kernel $K$ with corresponding

2021:   mapping $P$  is reversible. Then for all $n\in\N$ we have

2022:   \begin{equation}

2023:     \label{eq:1infty}

2024:     \norm{P^{n} - E\colon L_{1}(\Omega,\mu)

2025:     \to L_{1}(\Omega,\mu)}{}

2026:     \leq  \norm{P^{n} - E\colon L_{\infty}(\Omega,\mu)

2027:     \to L_{\infty}(\Omega,\mu)}{}.

2028:   \end{equation}

2029: % Consequently, if $K$ is uniformly ergodic and reversible, then there

2030: % are $n_{0}\in\N$ and $\eta<1$ such that

2031: % \begin{equation}

2032: %   \label{eq:1eta}

2033: % \norm{P^{n_{0}} - E\colon L_{1}(\ball,\mu)

2034: % \to L_{1}(\ball,\mu)}{}\leq \eta.

2035: % \end{equation}

2036:   \end{lemma}

2037:   \begin{proof}

2038:     If $K$ is reversible, then so are all iterates $K^{n}$. Thus for

2039:     arbitrary functions $f\in L_{1}(\Omega,\mu)$ and $h\in

2040:     L_{\infty}(\Omega,\mu)$ we have, using the scalar product on

2041:     $L_{2}(\Omega,\mu)$, that

2042: $$

2043: \scalar{(P^{n}- E)f}{h}= \scalar{f}{(P^{n}- E)h}.

2044: $$

2045: Consequently, for any $f\in L_{1}(\Omega,\mu)$ we have

2046: \begin{align*}

2047:   \norm{(P^{n} - E) f}{1} &= \sup_{\norm{h}{\infty}\leq 1}

2048:   \abs{\scalar{(P^{n}- E)f}{h}} =

2049:   \sup_{\norm{h}{\infty}\leq 1} \abs{\scalar{f}{(P^{n}- E)h}}  \\

2050:   &\leq \norm{f}{1} \sup_{\norm{h}{\infty}\leq 1} \norm{(P^{n}-

2051:     E)h}{\infty},

2052: \end{align*}

2053: from which the proof can be completed.

2054: \end{proof}
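The two facts used in this proof can be made explicit; the following is
a sketch in the notation above. Reversibility of $K^{n}$ means
$\scalar{P^{n}f}{h}=\scalar{f}{P^{n}h}$, and the mapping $E$ is
symmetric because
\begin{equation*}
\scalar{Ef}{h} = \int_\Omega f(x)\,\mu(dx)\int_\Omega h(x)\,\mu(dx)
= \scalar{f}{Eh},
\end{equation*}
which together give the displayed identity. Taking the supremum over
all $f$ with $\norm{f}{1}\leq 1$ in the last estimate yields
\begin{equation*}
\sup_{\norm{f}{1}\leq 1}\norm{(P^{n}-E)f}{1}
\leq \sup_{\norm{h}{\infty}\leq 1}\norm{(P^{n}-E)h}{\infty},
\end{equation*}
which is~(\ref{eq:1infty}).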

2055:

2056: \begin{proposition}\label{pro:unbound}

2057: % Let $\Omega\subset\R^{d}$ be a compact set,

2058: % and suppose that there is  a lower bound $l>0$

2059: % for the local conductance of $Q_{\delta/2}$.

2060: For any convex body $\Omega \subset \R^d$

2061: there are an

2062: integer $n_{0}$ and a constant $0<\eta<1$ such that uniformly for

2063: $\rho\in\rad(\Omega)$ we have

2064: \begin{equation}

2065:   \label{eq:uue1}

2066:   \norm{\prd^{n_{0}} - E\colon L_{1}(\Omega,\mu_{\rho})\to

2067:     L_{1}(\Omega,\mu_{\rho})}{}\leq \eta.

2068: \end{equation}

2069: \end{proposition}

2070:

2071: \begin{proof}

2072: This is  an immediate consequence of the

2073: bound~(\ref{eq:unifbound}). As mentioned in Remark~\ref{rem:unifb}

2074: uniform ergodicity was established uniformly for $\rho\in\rad(\Omega)$.

2075:  It is well known (see~\cite[Thm.~16.2.4]{Meyn-book}) that this

2076: implies % the assertion~(\ref{eq-infty-con}).

2077: % because this yields

2078: that there is an $\eta<1$ such that uniformly for

2079: $\rho\in\rad(\Omega)$ we have

2080: \begin{equation}\label{eq-infty-con2}

2081: \norm{\prd^{n_{0}} - E\colon L_{\infty}(\Omega)

2082: \to L_{\infty}(\Omega)}{}\leq\eta,\ \text{for $n\geq

2083: n_{0}$.}

2084: \end{equation}

2085: In the light of Lemma~\ref{lem:infty1} this yields~(\ref{eq:uue1}).

2086: \end{proof}

2087:

2088: Finally we sketch the

2089: \begin{proof}[Proof of Lemma~\ref{le:mathescharf}]

2090: Using Proposition~\ref{pro:unbound} we can extend the

2091: proof of~\cite[Thm.~1]{MR1738303}.

2092: In particular, the bounds from Eq.~(13)--(15)

2093: in~\cite{MR1738303} tend to zero uniformly for

2094: $\rho\in\rad(\Omega)$. Moreover, starting at zero,

2095: after one step according to the underlying ball walk, the (new)

2096: initial distribution is uniformly bounded  with respect to the uniform

2097: distribution on $\Omega$, hence also with respect to~$\mur$,

2098: such that we establish the asymptotics in Lemma~\ref{le:mathescharf}.

2099: \end{proof}

2100:

2101: \medskip

2102: \noindent

2103: {\bf Acknowledgment:} \

2104: We thank two anonymous referees and Daniel Rudolf for their comments.

2105: %E  Ackno ist neu.

2106:

2107: %\cite{MR2260070,MR2172842}

2108:

2109: \bibliographystyle{plain}

2110: %\bibliography{ref,mybib}

2111:

\def\cprime{$'$}

2113:

2114: \begin{thebibliography}{10}

2115:

2116: \bibitem{MR2260070}

2117: Christophe Andrieu and {\'E}ric Moulines.

2118: \newblock On the ergodicity properties of some adaptive {MCMC} algorithms.

2119: \newblock {\em Ann. Appl. Probab.}, 16(3):1462--1505, 2006.

2120:

2121: \bibitem{103439}

2122: David Applegate and Ravi Kannan.

2123: \newblock Sampling and integration of near log-concave functions.

2124: \newblock In {\em STOC '91: Proceedings of the twenty-third annual ACM

2125:   symposium on Theory of computing}, pages 156--163, New York, NY, USA, 1991.

2126:   ACM Press.

2127:

2128: \bibitem{MR2172842}

2129: Yves~F. Atchad{\'e} and Jeffrey~S. Rosenthal.

2130: \newblock On adaptive {M}arkov chain {M}onte {C}arlo algorithms.

2131: \newblock {\em Bernoulli}, 11(5):815--828, 2005.

2132:

2133: \bibitem{Bachvalov}

2134: N.~S. Bahvalov.

2135: \newblock Approximate computation of multiple integrals.

2136: \newblock {\em Vestnik Moskov. Univ. Ser. Mat. Meh. Astr. Fiz. Him.},

2137:   1959(4):3--18, 1959.

2138:

2139: \bibitem{B/D06}

2140: F.~Bassetti and P.~Diaconis.

2141: \newblock Examples comparing importance sampling and the {M}etropolis

2142:   algorithm.

\newblock {\em Illinois J. Math.}, to appear, 2006.

2144:

2145: \bibitem{10.1109/5992.814660}

2146: Isabel Beichl and Francis Sullivan.

2147: \newblock The {M}etropolis algorithm.

2148: \newblock {\em Computing in Science and Engineering}, 2(1):65--69, 2000.

2149:

2150: \bibitem{10.1109/MCSE.2006.27}

2151: Isabel Beichl and Francis Sullivan.

2152: \newblock Guest editors' introduction: Monte {C}arlo methods.

2153: \newblock {\em Computing in Science and Engineering}, 8(2):7--8, 2006.

2154:

2155: \bibitem{MR2013000}

2156: Nicolas Bourbaki.

2157: \newblock {\em Functions of a real variable}.

2158: \newblock Elements of Mathematics (Berlin). Springer-Verlag, Berlin, 2004.

2159:

2160: \bibitem{Burenkov}

Victor~I. Burenkov.
\newblock {\em Sobolev spaces on domains}, volume 137 of {\em Teubner-Texte zur
  Mathematik}.
\newblock Teubner Verlag, Stuttgart, 1998.

2165:

2166: \bibitem{MR1284987}

2167: Alan Frieze, Ravi Kannan, and Nick Polson.

2168: \newblock Sampling from log-concave distributions.

2169: \newblock {\em Ann. Appl. Probab.}, 4(3):812--837, 1994.

2170:

2171: \bibitem{Hlawka}

2172: E.~Hlawka.

2173: \newblock Ausf\"ullung und \"Uberdeckung konvexer K\"orper durch

2174: konvexe K\"orper.

2175: \newblock {\em Mh. Math. Phys.}, 53:81--131, 1949.

2176:

2177: \bibitem{MR1025467}

2178: Mark Jerrum and Alistair Sinclair.

2179: \newblock Approximating the permanent.

2180: \newblock {\em SIAM J. Comput.}, 18(6):1149--1178, 1989.

2181:

2182: \bibitem{MR1318794}

2183: R.~Kannan, L.~Lov{\'a}sz, and M.~Simonovits.

2184: \newblock Isoperimetric problems for convex bodies and a localization lemma.

2185: \newblock {\em Discrete Comput. Geom.}, 13(3-4):541--559, 1995.

2186:

2187: \bibitem{MR797411}

2188: Ulrich Krengel.

2189: \newblock {\em Ergodic theorems}, volume~6 of {\em de Gruyter Studies in

2190:   Mathematics}.

2191: \newblock Walter de Gruyter \& Co., Berlin, 1985.

2192:

2193: \bibitem{MR930082}

2194: Gregory~F. Lawler and Alan~D. Sokal.

2195: \newblock Bounds on the {$L\sp 2$} spectrum for {M}arkov chains and {M}arkov

2196:   processes: a generalization of {C}heeger's inequality.

2197: \newblock {\em Trans. Amer. Math. Soc.}, 309(2):557--580, 1988.

2198:

2199: \bibitem{MR1238906}

2200: L.~Lov{\'a}sz and M.~Simonovits.

2201: \newblock Random walks in a convex body and an improved volume algorithm.

2202: \newblock {\em Random Structures Algorithms}, 4(4):359--412, 1993.

2203:

2204: \bibitem{olm}

2205: Peter Math{\'e}.

2206: \newblock The optimal error of {M}onte {C}arlo integration.

2207: \newblock {\em J. Complexity}, 11(4):394--415, 1995.

2208:

2209: \bibitem{MR1738303}

2210: Peter Math{\'e}.

2211: \newblock Numerical integration using {M}arkov chains.

2212: \newblock {\em Monte Carlo Methods Appl.}, 5(4):325--343, 1999.

2213:

2214: \bibitem{Meyn-book}

2215: S.~P. Meyn and R.~L. Tweedie.

2216: \newblock {\em Markov chains and stochastic stability}.

2217: \newblock Springer-Verlag London Ltd., London, 1993.

2218:

2219: \bibitem{NOV}

2220: Erich Novak.

2221: \newblock {\em Deterministic and stochastic error bounds in numerical

2222:   analysis}.

2223: \newblock Lect. Notes Math. 1349. Springer-Verlag, Berlin, 1988.

2224:

2225: \bibitem{MR1319050}

2226: Erich Novak.

2227: \newblock The real number model in numerical analysis.

2228: \newblock {\em J. Complexity}, 11(1):57--73, 1995.

2229:

2230: \bibitem{MR1408328}

2231: Erich Novak.

2232: \newblock On the power of adaption.

2233: \newblock {\em J. Complexity}, 12(3):199--237, 1996.

2234:

2235: \bibitem{10.1109/MCSE.2006.30}

2236: Dana Randall.

2237: \newblock Rapidly mixing {M}arkov chains with applications in computer science

2238:   and physics.

2239: \newblock {\em Computing in Science and Engineering}, 8(2):30--41, 2006.

2240:

2241: \bibitem{MR1399158}

2242: G.~O. Roberts and R.~L. Tweedie.

2243: \newblock Geometric convergence and central limit theorems for multidimensional

2244:   {H}astings and {M}etropolis algorithms.

2245: \newblock {\em Biometrika}, 83(1):95--110, 1996.

2246:

2247: \bibitem{MR0172183}

2248: C.~A. Rogers.

2249: \newblock {\em Packing and covering}.

2250: \newblock Cambridge Tracts in Mathematics and Mathematical Physics, No. 54.

2251:   Cambridge University Press, New York, 1964.

2252:

2253: \bibitem{SOK}

2254: A.~Sokal.

2255: \newblock Monte {C}arlo methods in statistical mechanics: foundations and new

2256:   algorithms.

2257: \newblock In {\em Functional integration (Carg\`ese, 1996)}, pages 131--192.

2258:   Plenum, New York, 1997.

2259:

2260: \bibitem{IBC}

2261: J.~F. Traub, G.~W. Wasilkowski, and H.~Wo{\'z}niakowski.

2262: \newblock {\em Information-based complexity}.

2263: \newblock Academic Press Inc., Boston, MA, 1988.

2264: \newblock With contributions by A. G. Werschulz and T. Boult.

2265:

2266: \bibitem{vempala-lesson}

2267: Santosh Vempala.

2268: \newblock Lect.~17, {R}andom walks and polynomial time algorithms.

2269: \newblock {http://www-math.mit.edu/\~{}vempala/random/course.html}, 2002.

2270:

2271: \bibitem{MR2178341}

2272: Santosh Vempala.

2273: \newblock Geometric random walks: a survey.

2274: \newblock In {\em Combinatorial and computational geometry}, volume~52 of {\em

2275:   Math. Sci. Res. Inst. Publ.}, pages 577--616. Cambridge Univ. Press,

2276:   Cambridge, 2005.

2277:

2278: \end{thebibliography}

2279:

2280:

2281: \end{document}

2282:

2283:

2284:

2285: