math0611285/mn40.tex
1: 
2: \documentclass[12pt,leqno]{amsart} 
3: \usepackage[boxed,ruled]{algorithm2e}
4: % \usepackage[notcite,notref]{showkeys} 
5: % \usepackage{hyperref}
6: \setlength{\textwidth}{15cm}
7: \setlength{\textheight}{23cm}
8: 
9: \hoffset=-1.0cm  
10: \voffset=-2.0cm 
11: 
12: \hfuzz=7pt
13: \vfuzz=2pt
14: 
15: \headsep=27pt 
16: \parindent=15pt 
17: 
18: \frenchspacing 
19: 
20: \newlength{\fixboxwidth}
21: \setlength{\fixboxwidth}{\marginparwidth}
22: \addtolength{\fixboxwidth}{-9pt}
23: \newcommand{\fix}[1]{\marginpar{\fbox{\parbox{\fixboxwidth}
24: {\raggedright\tiny #1}}}}
25: 
26: \newcommand{\tabfrac}[2]{%
27:         \setlength{\fboxrule}{0pt}%
28:         \fbox{$\frac{#1}{#2}$}%
29: }
30: 
31: \newcommand{\tabmath}[1]{%
32:         \setlength{\fboxrule}{0pt}%
33:         \fbox{${#1}$}%
34: }
35: 
36: \newcommand{\R}{{\mathbb R}}
37: \newcommand{\N}{{\mathbb N}}
38: \newcommand{\SP}{{\mathbb S}}
39: \newcommand{\PP}{{\mathbb P}}
40: \newcommand{\Oh}{{\cal O}}
41: \newcommand{\bi}{{\bf{i}}}
42: \newcommand{\bx}{{\bf{x}}}
43: \newcommand{\ba}{{\alpha}}
44: \newcommand{\cost}{\operatorname{cost}}
45: \renewcommand{\rho}{{\varrho}}
46: \def\min{{\rm min}}  
47: \def\old{{\rm old}} 
48: \def\l{{\lambda }} 
49: \def\a{{\alpha }} 
50: \def\g{{\gamma }} 
51: \def\e{{\varepsilon}} 
52: \def\alph{{\theta}}  
53: \def\F{\mathcal F} 
54: \def\d{{\delta}} 
55: \def\phi{{\varphi}}  % huebscher!? 
56: \def\om{\omega}
57: \newcommand{\vol}{\operatorname{vol}}
58: %\renewcommand{\rho}{{\varrho}}
59: \newcommand{\rad}{\mathcal R^{\alpha}}
60: \newcommand{\radc}{\mathcal R_{C}}
61: %  \newcommand{\fad}{\mathcal F_\alpha} %   alt  
62: \newcommand{\fad}{\mathcal F^\alpha}    %   neu 
63: \newcommand{\ball}{{B^d}}
64: \newcommand{\fo}{\mathcal F(\Omega)}
65: \newcommand{\fco}{\mathcal F_C(\Omega)}
66: \newcommand{\wt}{\widetilde }
67: \newcommand{\krd}{K_{\rho,\delta} }
68: \newcommand{\prd}{P_{\rho,\delta} }
69: \newcommand{\muo}{\mu_\Omega}
70: \newcommand{\mur}{\mu_\rho}
71: \newcommand{\dist}{\operatorname{dist}}
72: \newcommand{\vt}{S^{\mathrm{mean}}_n}
73: \newcommand{\vtm}{S^{\mathrm{mh}}_n}
74: \newcommand{\vtn}{S^{\mathrm{simple}}_n}
75: \newcommand{\lr}[1]{\left(#1\right)}
76: \newcommand{\abs}[1]{\left\vert #1 \right\vert} 
77: \newcommand{\norm}[2]{\Vert #1  \Vert _{#2}} 
78: \newcommand{\set}[1]{\left\{#1\right\}}
79: \newcommand{\expect}{\mathbf E}
80: \newcommand{\Var}{\operatorname{Var}}
81: \newcommand{\scalar}[2]{\langle #1,#2\rangle}
82: \newcommand{\die}{\mathcal E}
83: 
84: \theoremstyle{plain}
85: \newtheorem{theorem}{Theorem}
86: \newtheorem{lemma}{Lemma}
87: \newtheorem{proposition}{Proposition}
88: \newtheorem{corollary}{Corollary}
89: 
90: \theoremstyle{definition}
91: \newtheorem{rem}{Remark}
92: %   \numberwithin{lemma}{section}  % geaendert 
93: %   \numberwithin{equation}{section} % geaendert 
94: \begin{document}
95: 
96: \title{Simple Monte Carlo and the Metropolis algorithm} 
97: \author{Peter Math\'e} 
98: \address{Weierstrass Institute for Applied Analysis and 
99: Stochastics, Mohrenstrasse 39, D-10117 Berlin, Germany}
100: \email{mathe@wias-berlin.de}
101: \author{Erich Novak}
102: \address{ Friedrich Schiller University Jena, 
103: Mathem. Institute,
104: Ernst-Abbe-Platz 2, 
105: D-07743 Jena, Germany}
106: \email{novak@math.uni-jena.de}
107: \date{Version: \today}
108: \keywords{Monte Carlo methods, Metropolis algorithm, 
109: log-concave density, rapidly mixing Markov chains, 
110: optimal algorithms, adaptivity, complexity}
111: \subjclass[2000]{65C05, secondary: 65Y20, 68Q17, 82B80}
112: 
113: \maketitle
114: \begin{center}
115: {\sl\large Dedicated to our dear colleague and friend Henryk
116: Wo\'zniakowski on the occasion of his 60th birthday. }  
117: \end{center}
118: 
119: \begin{abstract}
120: We study the integration of functions with 
121: respect to an unknown density.
122: % which is known only up to the normalizing factor. 
123: Information is available as
124: oracle calls to the integrand and to the non-normalized density 
125: function.
126: We are interested in analyzing the integration error
127: of optimal algorithms 
128: (or the complexity of the problem) with emphasis on
129: the variability of the weight function. 
130: For a corresponding large
131: class of problem instances we show that the complexity
132: grows linearly in the variability, and the simple Monte Carlo method
133: provides an almost optimal algorithm.
134: Under additional geometric restrictions (mainly log-concavity)
135: for the density
136: functions, we establish that a suitable adaptive
137: local Metropolis algorithm is almost optimal and 
138: outperforms any non-adaptive algorithm. 
139: \end{abstract}
140: 
141: \section{Introduction, Problem description}\label{s1} 
142: In many applications one wants to compute an integral of the form
143: \begin{equation}
144: \label{eq:base}
145: \int_\Omega f(x) \cdot c \rho(x) \, \mu(dx)  
146: \end{equation}
147: with a density $c \rho(x),\ x\in \Omega$, where $c >0$ is unknown
148: and $\mu$ is a probability measure. 
149: Of course we have 
150: $
151: {1}/{c} = \int_\Omega  \rho(x) \, \mu(dx),
152: $
153: but the numerical computation of the 
154: latter integral is often as hard as the original
155: problem~(\ref{eq:base}).     
156: Therefore it is desirable to have algorithms which are able to 
157: approximately compute~(\ref{eq:base}) without knowing the normalizing
158: constant, based solely on $n$  function values of $f$ and $\rho$. In other
159: terms, these functions are given by an \emph{oracle}, i.e., we assume 
160: that we can compute function values of $f$ and $\rho$. 
161: 
162: \subsubsection*{Solution operator}
163: \label{solop}
164: Assume that we are given any 
165: class $\fo$ of input data $(f,\rho)$ defined 
166: on a set $\Omega$.
167: We can rewrite the integral in~(\ref{eq:base}) as
168: \begin{equation}   \label{eq02} 
169: S(f, \rho) = \frac{\int f(x) \cdot \rho (x) \, \mu(dx)}{\int \rho (x)
170: \, \mu(dx)},\quad (f,\rho)\in\fo. 
171: \end{equation} 
172: This \emph{solution operator} is linear in $f$ but not in $\rho$. 
173: We discuss algorithms for the (approximate) computation of $S(f, \rho)$.
174: \begin{rem}
175: This solution operator is closely related to systems in statistical
176: mechanics, which obey a Boltzmann 
177: (or Maxwell or Gibbs) distribution, i.e., when there is a
178: countable number $j=1,2,\dots$ of microstates with energies, say
179: $E_j$,  and the overall system is distributed according to the
180: Boltzmann distribution, with inverse temperature $\beta$,  as
181: $$
182: P_\beta(j):= \frac{e^{-\beta E_j}}{Z_\beta},\quad j=1,2,\dots.
183: $$
184: In this case the normalizing constant $Z_\beta$ is the \emph{partition
185: function}, 
186: corresponding to $1/c$ from~(\ref{eq:base}) and $\rho^\beta(j)=
187: e^{-\beta E_j}$ for $j \in \N$.
188: 
189: In this setup, if $A$ is any global thermodynamic quantity, then its
190: expected value $\langle A \rangle_\beta$ is given by
191: $$
192: \langle A \rangle_\beta := \frac{1}{Z_\beta} \sum_{j} A_j e^{-\beta E_j},
193: $$
194: which can be written as $S(A,\rho^\beta)$.
195: Observe, however, that we use here slightly
196: different assumptions 
197: since we use the counting measure on $\N$, not a probability measure.
198: \end{rem}
199: 
200: \subsubsection*{Randomized methods}
201: \label{randm}
202: 
203: Monte Carlo methods (randomized methods) are 
204: important numerical tools for integration and
205: simulation in science and engineering, we refer to the 
206: recent special issue~\cite{10.1109/MCSE.2006.27}.
207: The Metropolis method, or more accurately, the class of 
208: \emph{Metropolis-Hastings algorithms} ranges among the most important 
209: methods in numerical analysis and scientific computation, 
210: see~\cite{10.1109/5992.814660,10.1109/MCSE.2006.30}.
211: 
212: Here we consider randomized methods $S_n$ that use $n$ function 
213: evaluations of $f$ and $\rho$. Hence $S_n$ is of the form as exhibited
214: in Figure~\ref{fig:gene}. 
215: \SetKw{KwInit}{Init}
216: \SetKw{KwAvg}{Compute}
217: \SetKw{KwDet}{Step}
218: \SetKw{KwCh}{Choose}
219: \SetKw{KwComp}{Compute}
220: \restylealgo{boxed}
221: \begin{figure}[h]
222:   \centering
223: \begin{algorithm}[H]
224: \SetLine
225: \Titleofalgo{ $S_n(f,\rho)$}
226: \KwData{Functions $f,\rho$, {\tt random numbers} $\omega_{1},\dots,\omega_{n}$\;}
227: \KwResult{approximate value $S_n(f,\rho)$ for $S(f,\rho)$ from Eq.~(\ref{eq02})\;}
228: \Begin{
229: \KwInit{$x_{1} := x_{1}(\omega_{1})$, 
230: \KwComp{ $f(x_1)$ and $\rho(x_1)$}\;
231: }
232: 
233: \For{$i=2,\dots,n$}
234: {
235: \KwDet{ $x_i := x_{i}(f(x_{1}),\dots,f(x_{i-1}),\rho(x_{1}),\rho(x_{i-1}),\omega_{i})$}\;
236: \KwComp{$f(x_i)$ and $\rho (x_i)$}\; 
237: }
238: \KwAvg{ $S_n(f,\rho)= \phi_n(f(x_{1}),\dots,f(x_{n}),\rho(x_{1}),\dots,
239: \rho(x_{n}))\in \R $}\;
240: }
241: \end{algorithm}  
242:   \caption{Generic Monte Carlo algorithm based on $n$ values of 
243:  $f$ and $\rho$. The final {\bf Compute} may use any mapping $\phi_n : \R^{2n} \to \R$.
244: %  random numbers
245: %  $\omega_{1},\dots,\omega_{n}$.
246: % Monte Carlo algorithms differ by choosing~\KwDet and~\KwAvg{} in
247: % different ways.
248: }
249:   \label{fig:gene}
250: \end{figure}
251: 
252: In all steps, random number generators may be used to determine the
253: consecutive node.
254: If the nodes $x_i$ from \KwDet
255: do not depend on previously computed
256: values of $f(x_1), \dots ,f(x_{i-1})$ and
257: $\rho(x_1), \dots , \rho(x_{i-1})$, then the algorithm is called 
258: \emph{non-adaptive}, otherwise it is called \emph{adaptive}. 
259: Specifically we analyze the 
260: procedures $\vtn$ and $\vtm$, introduced in~(\ref{eq:vtn})
261: and~(\ref{eq:met}) below.
262: \begin{rem}
263: The notion of \emph{adaption} which is used here differs from the
264: one recently used to introduce~\emph{adaptive MCMC}, see
265: e.g.~\cite{MR2260070,MR2172842}.
266: %   By a non-adaptive algorithm we mean an algorithm of the form 
267: %   $x_i = x_i (\omega_i)$, i.e., the node $x_i$ does \emph{not} 
268: %   depend on the (already computed) values 
269: %   $f(x_1), \dots , f(x_{i-1}), \rho(x_1), \dots , \rho (x_{i-1})$. 
270: %   
271: %   All other algorithms are called adaptive. 
272: The Metropolis algorithm which is used in this paper is based 
273: on a 
274: \emph{homogeneous} Markov chain, in our notation this is still
275: an adaptive algorithm since the used nodes $x_i$ depend on $\rho$. 
276: %  but the kernel of which
277: %  may depend on the specific target distribution, as this is the case
278: %  for the Metropolis sampler, see~\S~\ref{sec:metro-loc}. 
279: Hence we use the concept of adaptivity from numerical analysis 
280: and information-based complexity, see~\cite{MR1408328}. 
281: \end{rem}
282: 
283: For details on the model of computation we 
284: refer to~\cite{NOV,MR1319050,IBC}. 
285: Here we only mention the following: 
286: We use the real number model and assume that $f$ and $\rho$ 
287: are given by an oracle for function values. 
288: Our lower bounds hold under very general assumptions 
289: concerning the available random number generator.\footnote{Observe,
290: however, that we cannot use a random number generator 
291: for the ``target distribution'' 
292: $\mu_\rho=\rho \cdot \mu / \Vert \rho \Vert_1$, 
293: since $\rho$ is part of the input.} 
294: 
295: For the upper bounds we only study two algorithms 
296: in this paper, described in~(\ref{eq:vtn}) and (\ref{eq:met}),
297: below. Specifically we shall deal with the (non-adaptive)~\emph{simple Monte Carlo
298:   method} and a specific  (adaptive)~\emph{Metropolis--Hastings method}.
299: The former can only be applied if a random 
300: number generator for $\mu$ on $\Omega$ is available. 
301: Thus there
302: are natural situations when this method cannot be used.
303: % If applicable, then the subroutine \KwDet~ in Algorithm~\ref{fig:algorithm} chooses a random number
304: % according to $\mu$, independently in each step. 
305: The latter will be based on a suitable
306: ball walk. Hence we need a random number generator 
307: for the uniform distribution on a (Euclidean) ball.
308: Thus the Metropolis Hastings methods 
309: can also be applied when a random 
310: number generator for $\mu$ on $\Omega$ is not available.
311: Instead, we need a ``membership
312: oracle'' for $\Omega$: On input $x \in \R^d$ this oracle can
313: decide with cost 1 whether $x \in \Omega$ or not. 
314: 
315: % The detailed description of the \emph{adaptive} algorithm is postponed to~\S~\ref{sec:metro-loc}.
316: \subsubsection*{Error criterion}
317: \label{sec:error}
318: We are interested in error bounds 
319: uniformly for classes~$\fo$ of input data. If $S_{n}$ is any method
320: that uses (at most) $n$ values of $f$ and $\rho$ 
321: then the (individual) error for the 
322: problem instance~$(f, \rho)\in\fo$ is given by
323: \begin{equation*}
324: %\label{eq:mcerr}
325: e(S_n, (f,\rho))= \lr{\expect\abs{S(f,\rho) -
326:       S_n (f,\rho)}^{2}}^{1/2},
327: \end{equation*}
328: where $\expect$ means the expectation. 
329: The overall (or worst case)  error on the class $\fo$ is
330: \begin{equation*}
331: %\label{eq:mcerfc}
332: e(S_n, \fo)= \sup_{(f,\rho)\in\fo} 
333: e(S_n , (f,\rho)).
334: \end{equation*}
335: The complexity of the problem is given by 
336: the error of the best algorithm, hence we let
337: \begin{equation*}
338: %\label{eq05}
339: e_n (\fo) := \inf_{S_n} 
340: e(S_n, \fo). 
341: \end{equation*}
342: The classes~$\fo$ under consideration will always contain constant
343: densities~$\rho = c > 0$ and all $f$ with $\Vert f \Vert_\infty
344: \le 1$, hence 
345: $$
346: \mathcal F_1 (\Omega) :=\set{(f,\rho),\ \abs{f(x)}\leq 1,\
347: x\in\Omega, \text{ and }\ \rho = c} \subset \fo.
348: $$
349: On this class the problem~(\ref{eq02}) reduces to the classical
350: integration problem for uniformly bounded functions, and it is well
351: known that the error of any Monte Carlo method can decrease at a rate
352: $n^{-1/2}$, at most. Precisely, it holds true that 
353: $$
354: e_{n}(\mathcal
355: F_{1}(\Omega))= \frac{1}{1 + \sqrt n},
356: $$
357: if the probability~$\mu$ is non-atomic, see~\cite{olm}.
358: On the other hand we will only consider $(f, \rho)$ with 
359: $S(f, \rho) \in [-1, 1]$, hence the trivial algorithm 
360: $S_0=0$ always has error 1. 
361: 
362: For the classes $\fco$ 
363: and $\fad(\Omega)$,  which will be 
364: introduced in Section~\ref{sec:m+c},
365: we easily obtain the optimal order 
366: $e_{n}(\fo) \asymp n^{-1/2}$. 
367: We will analyze how $e_n(\fo)$ 
368: depends on the parameters 
369: $C$ and $\alpha$, in case $\fo:=\fco$ or
370: $\fo:=\fad(\Omega)$, respectively. 
371: 
372: We discuss some of our subsequent results and provide a short
373: outline. 
374: In Section~\ref{sec:m+c} we shall specify the methods and classes of input
375: data to be analyzed.
376: The classes $\fco$,
377: analyzed first in Section~\ref{s2},  contain all densities $\rho$ with 
378: $\sup \rho / \inf \rho \le C$. In 
379: typical applications we may face $C=10^{20}$. 
380: Then  we cannot decrease the error of optimal 
381: methods from 1 to $0.7$ even with sample 
382: size $n=10^{15}$, see Theorem 1 for more details. 
383: Hence the classes $\fco$ are so large that no
384: algorithm, deterministic or Monte Carlo, 
385: adaptive or non-adaptive, can provide an acceptable 
386: error. We also prove that the simple (non-adaptive) Monte Carlo method is almost 
387: optimal, no sophisticated  Markov chain Monte Carlo method can help. 
388: 
389: Thus we face the question whether adaptive algorithms, 
390: such as the Metropolis algorithm, 
391: help significantly on ``suitable and interesting'' subclasses of $\fco$. 
392: We give a positive answer for the classes 
393: $\fad(\Omega)$, analyzed in Section~\ref{s3}.  Here we assume that 
394: $\Omega \subset \R^d$ is a convex body, and that $\mu$ is the normalized Lebesgue
395: measure~$\muo$ on $\Omega$.  
396: The class~$\fad(\Omega)$ contains logconcave densities, 
397: where $\a$ is the Lipschitz constant 
398: of $\log \rho$.
399: We shall establish in \S~\ref{sec:non} that 
400: all non-adaptive methods (such as the simple Monte
401: Carlo method) suffer from the curse of dimension,
402: i.e., % for non-adaptive methods
403: we get similar lower bounds as for the classes $\fco$. 
404:  However, in \S~\ref{sec:metro-loc} we shall design and analyze
405:  specific (adaptive) Metropolis algorithms that are based on some
406:  underlying ball walks, tuned to the class parameters%  as these are the
407: % spacial dimension $d$ and the Lipschitz constant $\a$
408: . Using such algorithms we can 
409: break the curse of dimension by adaption. The main error estimate for
410: this algorithm is given in Theorem~\ref{th5}, and we conclude 
411: this study with further discussion in the final Section~\ref{sec:sum}.
412: 
413: \section{Specific methods and classes of input}
414: \label{sec:m+c}
415: We consider the approximate computation of $S(f,\rho)$
416: for large classes of input data. 
417: Since with deterministic algorithms one cannot %E  substantially 
418: improve 
419: the trivial zero algorithm (with error 1), 
420: we study randomized or Monte Carlo algorithms.
421: 
422: \subsection*{The methods}
423: The Monte Carlo methods under consideration  fit the schematic view from
424: Figure~\ref{fig:gene}.
425: 
426: \subsubsection*{{Simple Monte Carlo}}
427: \label{sec:simp}
428: Here the random numbers
429: $\omega_{1},\dots,\omega_{n}$ are identically and independently
430: distributed according to~$\mu$, and the routine~\KwDet chooses
431: $X_{i}:= \omega_{i}$. 
432: The final routine~\KwAvg is the quotient of the sample means of
433: the computed function values
434: \begin{equation}\label{eq:vtn}
435: \vtn(f,\rho):= \frac{\sum_{j=1}^n f(X_j)\rho(X_j)}{\sum_{j=1}^n\rho(X_j)}. 
436: \end{equation}
437: \subsubsection*{{Metropolis-Hastings method}}
438: \label{sec:mh}
439: This describes a class of (adaptive) Monte Carlo  methods which are based
440: on the ingenious idea to construct in \KwDet a Markov chain having
441: \begin{equation}  \label{mur} 
442: \mur := \frac{\rho \cdot \mu}{\int\rho(x)\, \mu(dx)}
443: \end{equation} 
444: as invariant distribution without knowing the normalization. 
445: Thus, if $(X_1,X_2,\dots,X_n)$ is a
446: trajectory of such a Markov chain, then we let \KwAvg be given as
447: \begin{equation}
448:   \label{eq:met}
449:   \vtm(f,\rho):= \frac{1}{n}  \sum_{j=1}^n f(X_j).
450: \end{equation}
451: Hence we use $n$ steps of the Markov chain, the number of needed 
452: (different) 
453: function values of $\rho$ and $f$ might be smaller. 
454: We will further specify the Metropolis-Hastings algorithm for the
455: problem at hand in \S~\ref{sec:metro-loc}, see Figures 2 and 3 
456: for a schematic presentation and Theorem 5 for the choice of $\delta$. 
457: %E  In diesem Bereich ein paar kleine Aenderungen. 
458: Both Monte Carlo methods construct Markov chains,  i.e., the point 
459: $x_i$ depends on $x_{i-1}$ and $\rho (x_{i-1})$, only. This trivially holds true
460: for simple Monte Carlo, since $x_i$ does not at all depend on 
461: earlier computed function values. 
462: 
463: \begin{rem}
464: Comparisons of different Monte Carlo methods for problems similar
465: to~(\ref{eq02}) are frequently met in the literature. We
466: mention~\cite{B/D06} with a comparison 
467: of \emph{Metropolis algorithms} and
468: \emph{importance sampling}, where an error expansion at any instance
469: $(f,\rho)$ is given in terms of certain auto-correlations. The simple
470: Monte Carlo method, as introduced below, is also studied there as
471: $\tilde\mu_{I}$ for $\rho   = 1$.
472: \end{rem}
473: The (point-wise almost sure) convergence of both 
474: methods $\vtn% (f,\rho)
475: $ and
476: $\vtm% (f,\rho)
477: $, as $n\to\infty$,  is ensured by corresponding
478: ergodic theorems, see~\cite{MR797411}. But, as outlined above, we are
479: interested in the uniform error on  relatively large~\emph{problem classes}. 
480: \subsection*{The classes}
481: Here we formally describe the classes of input under consideration.
482: 
483: \subsubsection*{{ The class $\fco$}}
484: \label{sec:classfc}
485: 
486: %In Section~3 we assume that 
487: Let $\mu$ be an arbitrary probability 
488: measure on a set $\Omega$ and consider the set
489: $$
490: \fco = \{ (f, \rho) \mid 
491: \Vert f \Vert_\infty \le 1, \ 
492: \rho >0, \
493: \frac{\rho(x)}{\rho(y)}  \le C,\ x,y\in\Omega \}. 
494: $$
495: % $$
496: % \fco = \{ (f, \rho) \mid 
497: % \Vert f \Vert_\infty \le 1, \ 
498: % \rho >0, \
499: % \frac{\sup \rho}{\inf \rho}  \le C \}. 
500: % $$
501: Note that necessarily $C\geq 1$. If $C=1$ then $\rho$ is constant and
502: we almost face the ordinary integration problem, since 
503: $\rho$ can be recovered with only one function value. 
504: 
505: In many applications the constant $C$ is huge and we will establish
506: that the complexity of the problem (the cost of an optimal
507: algorithm) is linear in $C$. Therefore, for large $C$, the class is
508:  too large. We have to look for smaller classes that 
509: contain many interesting pairs $(f, \rho)$ and have smaller complexity. 
510: 
511: \subsubsection*{The class $\mathcal \fad(\Omega)$ 
512: with log-concave densities}
513: \label{sec:classfad}
514: 
515: In many applications, we have a weight~$\rho$ with additional 
516: properties and %in Section~4 
517: we assume the following:
518: \begin{itemize} 
519: \item The set $\Omega\subset \R^d$ is a \emph{convex body}, that is a compact and convex set
520: with nonempty interior. The probability $\mu=\muo$ is the normalized Lebesgue measure 
521: on the set~$\Omega$. 
522: \item
523: The functions $f$ and $\rho$ are defined on $\Omega$.
524: \item
525: The weight~$\rho >0$ is log-concave, i.e., 
526: $$
527: \rho(\lambda x + (1-\lambda)y) \ge \rho(x)^\lambda \cdot 
528: \rho(y)^{1-\lambda}, 
529: $$
530: where $x,y \in \Omega $ and $0<\lambda <1$. 
531: \item
532: The logarithm of $\rho$ is Lipschitz, 
533: i.e., 
534: $ 
535: |\log\rho(x) - \log\rho(y) | \leq \alpha \Vert x-y \Vert_2
536: $. 
537: \end{itemize} 
538: Thus  we  consider the class of log-concave weights on
539: $\Omega\subset \R^{d}$ given by
540: \begin{equation}
541: \label{eq:dens-class}
542: \rad(\Omega)  = \{  \rho \mid 
543: \rho >0, \
544: \log\rho \text{ is concave}, \
545: |\log\rho(x) - \log\rho(y) | \leq \alpha \Vert x-y \Vert_2 \} . 
546: \end{equation}
547: 
548: We study the following class $\fad(\Omega)$ of problem elements, 
549: \begin{equation}
550:   \label{eq:fad}
551:  \fad (\Omega)  = \set{(f, \rho) \mid 
552: \rho \in\rad( \Omega),  \ \norm{f}{2,\rho}\le 1 } ,
553: \end{equation}
554: where $\Vert \cdot \Vert_{2,\rho}$ is the 
555: $L_2$-norm with respect to the probability measure $\mur$, 
556: see~\eqref{mur}. 
557: In some places we restrict our study to the (Euclidean) unit ball, i.e.,  
558: $\Omega:= \ball \subset \R^d$. 
559: 
560: \begin{rem}
561: Let $\radc (\Omega)$ be the class of weight functions that 
562: belong to $\fco$. Then 
563: $\rad(\Omega) \subset \radc (\Omega)$ 
564: if $C = e^{\alpha
565: D}$, where $D$ is the diameter of $\Omega$. 
566: Thus large $\a$ correspond to ``exponentially large'' 
567: values of $C$. However,
568: the densities from the class
569: $\rad(\Omega)$ have some extra (local) properties: they are log-concave 
570: and Lipschitz continuous. 
571: These properties can be used for the construction of fast 
572: adaptive methods, via rapidly mixing Markov chains. 
573: \end{rem}
574: 
575: \section{Analysis for $\fco$} \label{s2} 
576: 
577: We assume that $\Omega$ is an arbitrary set and $\mu$ 
578: is a probability measure on $\Omega$, 
579: and that the functions~$f$ and $\rho$ are defined on $\Omega$. 
580: 
581: In the applications, the constant $C$ might be very large, 
582: something like $C=10^{20}$ is a realistic assumption. 
583: Therefore we want to know how the complexity (the cost of 
584: optimal algorithms) depends on $C$. 
585: Observe that the problem is correctly normalized or scaled such that 
586: $
587: S(\fco) = [-1, 1] ,
588: $
589: for any $C \ge 1$. 
590: We will prove that the complexity of the problem % , i.e., 
591: % the cost of optimal algorithms,
592: is linear in $C$, and hence
593: there is no way to solve the problem if $C$ is really huge. 
594: % This problem class is simply too large.
595: We start with establishing a lower bound and then show that simple
596: Monte Carlo achieves this error up to a constant. 
597: 
598: \subsection{Lower Bounds} 
599: 
600: Here we prove lower bounds for all
601: (adaptive or non-adaptive) methods that use $n$ evaluations 
602: of $f$ and $\rho$. We use the technique of Bahvalov, i.e., 
603: we study the average error 
604: of deterministic algorithms with respect to certain discrete measures 
605: on $\fco$. 
606: \begin{theorem} \label{thm:lbfc}
607: Assume that we can partition  $\Omega$ into $2n$ disjoint sets with
608: equal measure (equal to $1/2n$). 
609: Then for any Monte Carlo method $S_n$ that uses $n$ values of 
610: $f$ and $\rho$ we have the lower bound
611: \begin{equation}
612:  \label{eq:2nc} 
613: e(S_n,\fco) \ge\frac 1 6 \sqrt 2 
614: \begin{cases}
615: \sqrt{\frac{C}{2n}}, &  2n\geq C - 1, \\
616: \frac{3 C}{C+2n-1}, & 2n < C -1.
617: \end{cases}
618: \end{equation}
619: \end{theorem} 
620: The lower bound will be obtained in two steps.
621: \begin{enumerate}
622: \item We first reduce the error analysis for Monte Carlo sampling to
623:   the average case error analysis with respect to a certain prior
624:   probability on the class $\fco$. 
625:   This approach is due to Bahvalov, see~\cite{Bachvalov}.
626: \item For the chosen prior the average case analysis can be carried
627:   out explicitly and will thus yield a lower bound. 
628: \end{enumerate}
629: To construct the prior let $m:=2n$ and  $\Omega_{1},\dots,\Omega_{m}$
630: the partition into sets of equal probability, and $\chi_{\Omega_{j}}$
631: the corresponding characteristic functions. Furthermore, let 
632: $$
633: l:=
634: \begin{cases}
635:  \lceil \frac{m}{C-1}\rceil, &  m\geq C -1,\\
636: 1,&\text{ else.} 
637: \end{cases}
638: $$ 
639: Denote $J_{l}^{m}$ the set of
640: all subsets of $\set{1,\dots,m}$ of cardinality equal to $l$, and
641: $\mu_{m,l}$ the equi-distribution on $J_{l}^{m}$, while $\expect_{m,l}$ denotes the expectation with 
642: respect to the prior $\mu_{m,l}$. Let
643: $(\e_{1},\dots,\e_{m})$ be independent and identically
644: distributed with $P(\e_{j}=-1)= P(\e_{j}=1)=1/2,\ j=1,\dots,m$.
645: The overall prior is the product probability on $J_{l}^{m}\times
646: \set{\pm 1}^{m}$.
647: For any realization $\om=(I,\e_{1},\dots,\e_{m})$ we assign
648: $$
649: f_{\om}:= \sum_{j\in I} \e_{j}\chi_{\Omega_{j}}\quad \text{and}\quad
650: \rho_{\om}:= C \sum_{j\in I}\chi_{\Omega_{j}} + \sum_{j\not\in I}\chi_{\Omega_{j}} .
651: $$
652: The following observation is useful.
653: \begin{lemma}\label{lem:eml}
654: For any subset $N\subset\set{1,\dots,m}$ of cardinality at most $n$ it holds
655: $$
656: \expect_{m,l}\#(I\setminus N)\geq \frac l 2.
657: $$
658: \end{lemma}
659: \begin{proof}
660:   Clearly, for any fixed $k\in\set{1,\dots,m}$ we have
661:   $\mu_{m,l}(k\in I)=l/m$, thus
662: $$
663: \expect_{m,l}\#(I \setminus N) = \sum_{r\in N^{c}} \expect_{m,l}\chi_{I}(r) =
664: \#(N^{c})\frac l m\geq \frac l 2,
665: $$
666: where we denoted by $N^{c}$ the complement of $N$.
667: 
668: \end{proof}
669: \begin{proof}[Proof of Theorem~\ref{thm:lbfc}]
670: Given the above prior let us denote 
671: \begin{equation}
672:   \label{eq:errmfl}
673:   e^{avg}_{n}(\fco):= \inf_{q}\lr{\expect_{m,l}\expect_{\e}\abs{S(f,\rho) - q(f,\rho)}^{2}}^{1/2},
674: \end{equation}
675: where the $\inf$ is taken with respect to any
676: (possibly adaptive) deterministic algorithm
677: which uses at most $n$ values from $f$ and $\rho$.
678: 
679: 
680: For any Monte Carlo method $S_n$  we have, using Bahvalov's argument~\cite{Bachvalov}, the relation
681: \begin{equation}
682:   \label{eq:mc2avg}
683:  e(S_{n},\fco) \geq e^{avg}_{n}(\fco).
684: \end{equation}
685: We provide a lower bound for $e^{avg}_{n}(\fco)^{2}$. 
686: To this end note that for each realization $(f_{\om},\rho_{\om})$ the
687: integral $\int \rho_{\om} \;d\mu$ is constant.
688: In the first case $m\geq C -1$, 
689: and we can bound the integral by the choice of $l$ as
690: \begin{equation}
691:   \label{eq:intrho}
692:   c_{m,l}:= \int \rho_{\om}(x)\; \mu(d x)= \frac 1 m \lr{l C +
693:     (m-l)1} \leq 3.
694: \end{equation}
695: In the other case  $m <  C -1$, we obtain~$c_{m,1}= (C - 1 + m)/m$.
696: Now, to analyze the average case error, let $q_{n}$ be any
697: (deterministic) method, and let us assume that it uses the set $N$ of nodes. 
698: We have the decomposition
699: $$
700: S(f_{\om},\rho_{\om}) -
701: q_{n}(f_{\om},\rho_{\om})=  \lr{\frac{C}{m c_{m,l}} \sum_{j\in
702:     I\setminus N} \e_{j}}
703:  - \lr{\frac{C}{m c_{m,l}} 
704:  \sum_{j\in I\cap N} \e_{j} - q_{n}(f_{\om},\rho_{\om})}.
705: $$
706: Given $I$, 
707: the random variables in the brackets
708: are conditionally independent, thus uncorrelated.
709: Hence we conclude that
710: \begin{align*}
711:   \expect_{m,l}\expect_{\e}\abs{S(f_{\om},\rho_{\om}) -
712: q_{n}(f_{\om},\rho_{\om})}^{2}
713: & \geq \expect_{m,l}\expect_{\e}\abs{\frac{C}{m c_{m,l}} \sum_{j\in
714:     I\setminus N} \e_{j} }^{2}\\
715: & = \frac{C^{2}}{m^{2} c_{m,l}^{2}}\expect_{m,l}\#(J\setminus N)\geq
716: \frac{C^{2} l}{2 m^{2} c_{m,l}^{2}},
717: \end{align*}
718: by Lemma~\ref{lem:eml}.
719: % \begin{equation*}
720: %   \expect_{m,l}\abs{ S(f,\rho) - q_{n}(f,\rho)}^{2}= \frac{1}{\binom m
721: %     l}\sum_{I\in J_{l}^{m}}
722: % \expect_{m,l}\lr{\abs{ \frac{C}{m c_{m,l}} 
723: % \sum_{j\in I} f^{m}_{j} - q_{n}(f^{m},\rho^{m})}^{2}/I},
724: % \end{equation*}
725: % where the expectation on the right is the conditional expectation,
726: % i.e., when $I$ is fixed. 
727: % This depends on the overlap between $N\subset\set{1,\dots,m}$ the
728: % set of nodes which is used by $q_{n}$ and
729: % $I$ which may vary between $0$ and $l$. Further note that, 
730: %  such that we can bound ($k$ being the random
731: % cardinality $\#(I\setminus N)$ ) 
732: % \begin{align*}
733: %   \expect_{m,l}\abs{ S(f,\rho) - q_{n}(f,\rho)}^{2}
734: % &\geq  \frac{1}{\binom m l}
735: % \sum_{I\in J_{l}^{m}} \expect_{m,l}\lr{\abs{ \frac{C}{m c_{m,l}} 
736: % \sum_{j\in
737: %       I\setminus N} f^{m}_{j}}^{2}/I}\\
738: % &=  \frac{C^{2}}{m^{2} c_{m,l}^{2}} 
739: % \sum_{k=0}^{l} k P(\# (I\setminus N)=k)\\
740: % &= \frac{C^{2}}{m^{2} c_{m,l}^{2}}  \sum_{k=0}^{l} k
741: % \frac{\binom{n}{l-k}\binom{m-n}{k}}{\binom m l}=  
742: % \frac{C^{2}}{m^{2}c_{m,l}^{2}} \frac{(m - n) l}{m},
743: % \end{align*}
744: % where we used  the definition of the binomials 
745: % to evaluate the sum on the
746: % right. %  as 
747: % $$
748: %  \sum_{k=0}^{l} k
749: % \frac{\binom{n}{l-k}\binom{m-n}{k}}{\binom m l} = \frac{(m - n) l}{m}.
750: % $$
751: % Overall we obtain
752: % \begin{equation}
753: %   \label{eq:finbound}
754: %  \expect_{m,l}\abs{ S(f,\rho) - q_{n}(f,\rho)}^{2} \geq
755: %  \frac{C^{2}}{m^{2}c_{m,l}^{2}} \frac{(m - n) l}{m}= \frac{C^{2} l}{2
756: %    c_{m,l}^{2} m^{2}}.  
757: % \end{equation}
758: In the case $m\geq C -1 $ we obtain $l\geq m/C$ and  have
759: $c_{m,l}\leq 3$, such that %we finally obtain
760: $$
761: \expect_{m,l}\abs{ S(f,\rho) - q_{n}(f,\rho)}^{2} \geq \frac{C}{36 n},
762: $$
763: which in turn yields the first case bound in~(\ref{eq:2nc}).
764: In the other case~$m <  C -1$ the value of $l=1$ yields the second
765: bound in~(\ref{eq:2nc}).
766: \end{proof} 
767: \subsection{The error of the simple Monte Carlo method} 
768: \label{sec:simple}
769: 
770: The direct approach to evaluate~(\ref{eq:base}) would be to use the
771: method~$\vtn$ from~(\ref{eq:vtn}).
772: We will prove an upper bound for the error of this method, and 
773: we start with the following 
774: \begin{lemma}\label{lem:rho}
775:   If the function $\rho$ obeys the requirements in~$\fco$, then 
776:   \begin{enumerate}
777:   \item $0< \inf_{x\in\Omega}\rho(x)\leq
778:     \sup_{x\in\Omega}\rho(x)<\infty$.
779: \item For every probability measure $\mu$ on $\Omega$ we have
780: $\norm{\rho}{2,\mu}\leq \sqrt C\norm{\rho}{1,\mu} $.
781:   \end{enumerate}
782: \end{lemma}
783: \begin{proof}
784:   To prove the first assertion, fix any $y_{0}\in\Omega$. Then the
785:   assumption on $\rho$ yields $\rho(x)\leq C \rho(y_{0})$, and
786:   reversing the roles of $x$ and $y$ also the lower bound.
787: Now both, the assumption on $\rho$ as well as the 
788: second assertion,  are invariant with respect to multiplication
789: of $\rho$ by a constant. In the light of the first assertion we may
790: and do assume that $1\leq\rho(x)\leq C,\ x\in\Omega$, and  we derive,
791: using $ 1 \leq \int_{\Omega}\rho(x)\; \mu(dx)$, that
792: $$
793: \int_{\Omega}\rho^{2}(x)\; \mu(dx)\leq C \int_{\Omega}\rho(x)\;
794: \mu(dx) \leq C \lr{\int_{\Omega}\rho(x)\; \mu(dx)}^{2},
795: $$
796: completing the proof of the second assertion and of the lemma.
797: \end{proof}
798: We turn to the bound for the simple Monte Carlo method.
799: \begin{theorem}
800: For all $n\in\N$ we have
801: \begin{equation}
802:     \label{eq:thm1}
803: e(\vtn,\fco)\leq 2\, \min\set{1,  \sqrt{\frac{2C}{n}}} .  
804:   \end{equation}
805: \end{theorem}
806: \begin{proof}
807: The upper bound~$2$ is trivial, it even holds deterministically. 
808:   Fix any pair $(f,\rho)$ of input. For any sample
809:   $\lr{X_1,\dots,X_n}$ and function $g$ we denote the sample 
810: mean by $\vt(g):= 1/n\sum_{j=1}^n g(X_j)$. 
811: It is well known that $e(\vt,g)\leq \norm{g}{2}/\sqrt n$. 
812: With this notation we can
813: bound
814:   \begin{align*}
815:     &\abs{S(f,\rho) - \vtn(f,\rho)}\leq \abs{S(f,\rho) -
816:       \frac{\vt(f\rho)}{\int \rho(x)\mu(dx)}}+  
817: \abs{\frac{\vt(f\rho)}{\int \rho(x)\mu(dx)} -
818:   \frac{\vt(f\rho)}{\vt(\rho)}}\\ 
819: &\leq \frac{1}{\norm{\rho}{1}}\lr{\abs{\int
820:   f(x)\rho(x)\mu(dx)-\vt(f \rho) } 
821: + \abs{\frac{\vt(f\rho)}{\vt(\rho)}}
822: \abs{\int \rho(x)\mu(dx) - \vt(\rho)}}\\
823: &\leq  \frac{1}{\norm{\rho}{1}}\lr{\abs{\int
824:   f(x)\rho(x)\mu(dx)-\vt(f \rho) } 
825: +\norm{f}{\infty}
826: \abs{\int \rho(x)\mu(dx) - \vt(\rho)}},
827:   \end{align*}
828: where we used 
829: $
830: \abs{\vt(f\rho)/{\vt(\rho)}}\leq \norm{f}{\infty},
831: $
832: which holds true since the enumerator and 
833: denominator use the same sample.
834: This yields the following error bound
835: \begin{align*}
836:   e(\vtn,(f,\rho))&\leq   \frac{\sqrt 2}{\norm{\rho}{1}}
837: \lr{ e(\vt,f\rho) + \norm{f}{\infty}e(\vt,\rho)}\\
838: &\leq \frac{\sqrt 2}{\norm{\rho}{1}\sqrt n}\lr{\norm{f\rho}{2} +
839:   \norm{f}{\infty}\norm{\rho}{2}}\leq \frac{2\sqrt 2 \norm{f}{\infty}}{\sqrt
840:   n}\frac{\norm{\rho}{2}}{\norm{\rho}{1}}
841:   \leq \frac{2\sqrt{2C}}{\sqrt  n},
842: \end{align*}
843: where we use Lemma~\ref{lem:rho}. Taking  the supremum over $(f,\rho)\in\fco$
844: allows to complete the proof. 
845: \end{proof}
846: 
847: \section{Analysis for $\fad(\Omega)$} \label{s3} 
848: 
849: In this section we impose restrictions on  the input data, in
850: particular on the density,  in order to improve the complexity. This
851: class is still large enough to contain many important situations.
852: Monte Carlo methods for problems when the target (invariant)
853: distribution is log-concave proved to be important in many studies, we
854: refer to~\cite{MR1284987}. One of the main intrinsic features of such
855: classes of distributions are \emph{isoperimetric inequalities},
856: see~\cite{103439,MR1318794}, which will also be used here in the form
857: as used in~\cite{MR2178341}.
858: Recall that here we always require that $\Omega\subset \R^{d}$ is a
859: convex body, as introduced in Section~\ref{sec:classfad}.
860: 
861: % We always assume the following. 
862: % The functions $f$ and $\rho$ are defined on 
863: % a compact and convex set $\Omega \subset \R^d$ 
864: % with nonempty interior and  
865: % $\mu=\muo$ is the normalized Lebesgue measure 
866: % on the set~$\Omega$. 
867: % We  consider the class of log-concave weights on
868: % $\Omega$  given by
869: % $$ 
870: % \rad(\Omega)  = \{  \rho \mid 
871: % \rho >0, \
872: % \log\rho \text{ is concave}, \
873: % |\log\rho(x) - \log\rho(y) | \leq \alpha \Vert x-y \Vert_2 \} . 
874: % $$
875: 
876: % We study the class $\fad(\Omega)$, given by 
877: % $$
878: % \fad (\Omega)  = \set{(f, \rho) \mid 
879: % \rho \in\rad( \Omega),  \ \norm{f}{2,\rho}\le 1 } ,
880: % $$
881: % where $\Vert \cdot \Vert_{2,\rho}$ is the 
882: % $L_2$-norm with respect to the probability measure $\mur$, 
883: % see~\eqref{mur}. 
884: % In particular, we study the (Euclidean) unit ball, i.e.,  
885: % $\Omega:= \ball \subset \R^d$. 
886: 
887: We start with a lower bound for all non-adaptive algorithms to exhibit
888: that simple Monte Carlo cannot take into account the additional
889: structure of the underlying class of input data and adaptive methods
890: should be used. This bound, together with Theorem~\ref{th5}, will show 
891: that adaptive methods can outperform any 
892: non-adaptive method, if we consider $S$ on $\fad (\ball)$. 
893: Indeed, we also show that specific Metropolis
894: algorithms, based on local underlying Markov chains are suited for
895: this problem class.
896: 
897: \subsection{A lower bound for non-adaptive methods}
898: \label{sec:non}
899: 
900: Here we prove a lower bound for all non-adaptive methods 
901: (hence in particular for the simple Monte Carlo method) 
902: for the problem on the classes~$\fad(\Omega)$. 
903: Again, this lower bound will use Bahvalov's technique.
904: 
905: We start with a result on sphere packings. 
906: The Minkowski-Hlawka theorem,  see~\cite{MR0172183}, 
907: says that the density of the densest sphere packing in $\R^d$ 
908: ist at least $\zeta (d) \cdot 2^{1-d}\ge 2^{1-d}$. 
909: It is also known, see \cite{Hlawka}, that the density 
910: (by definition of the whole $\R^d$) can be replaced by the density within 
911: a convex body $\Omega$, as long as the radius $r$ of the 
912: spheres tends to zero. Hence we obtain the following result. 
913: 
914: \begin{lemma}
915: \label{lem:MHT}
916: There is $n_{\Omega}\in\N$ such that for all $m\geq n_{\Omega}$ there are points
917: $y_{1},\dots,y_{m}\in\Omega$ such that with 
918: $$
919: r:=r(\Omega,m):= 2^{-1} m^{-1/d} \left( \frac{\vol (\Omega)}
920: {\vol (\ball)}\right)^{1/d}
921: $$
922: the closed balls $B_{i}:= B(y_{i},r)\subset \Omega$ 
923: are disjoint.
924: \end{lemma}
925: 
926: Our construction will use such points $y_{1},\dots,y_{m}\in\Omega$ and
927: the corresponding balls $B_{1},\dots,B_{m}$ as follows.
928: 
929: For $i\in\set{1,\dots,m}$ we assign 
930: \begin{align*}
931:  \rho_{i}(y)&:= c_i \exp\lr{-\alpha\norm{y - y_{i}}{2}},\quad
932:  y\in\Omega  \quad\text{and}\\ 
933: f_{i}(y)&:= \tilde c_i  \chi_{B_{i}}(y),\quad y\in\Omega ,
934: \end{align*}
935: with constants $c_i$ and $\tilde c_i$ chosen such that 
936: \begin{alignat*}{2}
937: 1&= \int_{\Omega } \rho_i(y) \, dy &= 
938: c_i \int_{\Omega } \exp(- \a \norm{y - y_i}{}) dy\quad \text{and}\\
939: 1&=\norm{f_i}{2,\rho_i} &= \tilde c_i^2 c_i \int_{B_i} \exp(- \a
940: \norm{y - y_i}{})\, dy.  
941: \end{alignat*}
942: The corresponding values of the mapping $S$ are computed as
943: \begin{align}\label{eq:slb}
944:   \begin{split}
945: S(f_i,\rho_i) &= \int_{\Omega } f_i \rho_i\, dy = \tilde c_i c_i
946: \int_{B_i} \exp(- \a \norm{y - y_i}{})\, dy\\
947: & = \lr{ c_i \int_{B_i} \exp(- \a \norm{y - y_i}{}) dy}^{1/2}=
948:  \lr{ c_i \int_{B(0,r)} \exp(- \a \norm{y}{}) \, dy}^{1/2}\\
949: &= \lr{\frac{\int_{B(0,r)} \exp(- \a \norm{y}{}) \, dy}{\int_{\Omega} 
950: \exp(- \a \norm{y - y_i}{})\,dy}}^{1/2}.
951:   \end{split}
952: \end{align}
953: Again we turn to the average case setting, this time with
954:  probability measure $\mu^{2n}$ being the equidistribution on the set 
955: $$
956: \mathcal F^{2n}:= \set{ \lr{\e_i f_i,\rho_i},\quad i=1,\dots,2n,\
957:   \e_i=\pm 1}\subset \fad(\Omega ).
958: $$
959: Similar to~(\ref{eq:mc2avg}) we have for any non-adaptive Monte Carlo
960: method $S_n(f,\rho)$ the relation 
961: $$
962: e(S_n,\fad(\Omega ))\geq
963: \min\set{ e^{avg}(q_n,\mu^{2n}),\quad q_n \text{ is 
964: deterministic and non-adaptive}},
965: $$
966: where $e^{avg}(q_n,\mu^{2n})$ denotes the average case error of the
967: deterministic non-adaptive method $q_n$ with respect to the
968: probability $\mu^{2n}$.
969: Thus let~ $q_n$ be any non-adaptive 
970: (deterministic) algorithm for $S$ on the 
971: class $\fad (\Omega )$ that uses at most $n$ values.
972: 
973: The average case error can then be bounded from below as
974: \begin{align*}
975: \expect_{\mu^{2n}}\abs{S(f,\rho) - q_n(f,\rho)}^2&=
976: \frac{1}{2n}\sum_{i=1}^{2n} 
977: \expect_{\e}\abs{S(\e_i f_i,\rho_i) - q_n(\e_i
978:   f_i,\rho_i) }^2\\
979: &\geq \frac 1 2 \min_{i=1,\dots,2n}\expect_{\e}\abs{S(\e_i f_i,\rho_i)
980: }^2 \geq  \frac 1 2 \min_{i=1,\dots,2n}S(f_i,\rho_i)^2.
981: \end{align*}
982: Above, $\expect_{\e}$ denotes the expecation with respect to the
983: independent random variables $\e_{i}=\pm 1$.
984: Together with~(\ref{eq:slb}) we obtain
985: $$
986: e(S_n,\fad(\Omega))\geq \frac 1 2 \sqrt 2\,
987: \min_{i=1,\dots,2n}\lr{\frac{\int_{B(0,r)} \exp(- \a \norm{y}{}) \,
988:     dy}{\int_{\Omega} \exp(- \a \norm{y - y_i}{})\,dy}}^{1/2}. 
989: $$
990: We bound the enumerator from below and the denominator from
991: above.
992: For $\alpha r\leq \log 2$ we can bound 
993: $$
994: \int_{B(0,r)} \exp(- \a \norm{y}{}) \, dy\geq \frac 1 2
995:   \vol(B(0,r))= \frac 1 2 r^d \vol(\ball).
996: $$
997: For the denominator we have  %  \fix{alpha gross klein} 
998: %   letting temporarily $\bar\a:=
999: %   \max\set{\a,1}$, that 
1000: \begin{align*}
1001: \int_{\Omega} \exp(- \a \norm{y - y_i}{})\,dy &\leq \int_{\R^d} \exp(-
1002: \a \norm{y - y_i}{})\,dy \\
1003: & ={\a}^{-d} \int_{\R^d} \exp(-\norm{y}{})\,dy=
1004: {\a}^{-d} \Gamma(d)\vol{\partial \ball},
1005: \end{align*}
1006: such that we finally obtain, using the well known formula
1007: $\vol(\partial \ball) = d \vol(\ball)$, that
1008: $$
1009: e(S_n,\fad(\Omega))\geq  \frac 1 2 \sqrt 2\, \lr{\frac{{\a}^d
1010:     r^d}{2 d!}}^{1/2} = \frac 1 2 \lr{\frac{{\a}^d
1011:     r^d}{d!}}^{1/2}.
1012: $$
1013: Using the value for $r=r(\Omega ,2n)$ from Lemma~\ref{lem:MHT} we end up
1014: with
1015: \begin{theorem} 
1016: Assume that $S_n$ is any non-adaptive Monte Carlo method for 
1017: the class $\fad (\Omega )$. Then, with~$ n_\Omega $ from Lemma~\ref{lem:MHT},
1018: we have for all 
1019: $$ 
1020: 2n \ge \max\set{n_\Omega ,\lr{\a/{\log 4}}^d \cdot 
1021: \frac{\vol \Omega}{\vol \ball}}
1022: $$ 
1023: that
1024: \begin{equation}  \label{lo9} 
1025: e(S_n, \fad(\Omega )) \ge 
1026: 2^{-d/2-3/2} \cdot 
1027: \left( \frac{\vol \Omega}{\vol \ball} \right)^{1/2} \cdot 
1028: \frac{\alpha^{d/2}}{\sqrt{d!}} \ n^{-1/2} .
1029: \end{equation} 
1030: \end{theorem} 
1031: 
1032: \begin{rem}
1033: For fixed $d$ this is a lower bound of the form 
1034: $e(S_n) \ge c_\Omega \, \a^{d/2} \, n^{-1/2}$. It is interesting only 
1035: if $\alpha$ is ``large'', otherwise the already mentioned lower bound 
1036: $(1+ \sqrt{n})^{-1}$ is better. 
1037: 
1038: We stress that in the above reasoning we essentially used the
1039: non-adaptivity of the method $S_n$. Indeed, if $S_n$ were adaptive,
1040: then by just one appropriate function 
1041: value $\rho(x)$,  we could identify the
1042: index $i$, since the functions $\rho_i$ are
1043: global. Then, knowing $i$,  we could ask for the value of $\e_i$ and
1044: would obtain the exact solution to $S(f,\rho)$ for this small class
1045: $\mathcal F^{2n}$ for all $n \ge 2$. 
1046: \end{rem}
1047: 
1048: \subsection{Metropolis method with local underlying walk}
1049: \label{sec:metro-loc}
1050: 
1051: The Metropolis algorithm we consider here has a specific
1052: routine~\KwDet in Figure~\ref{fig:gene}, whereas the
1053: final step~\KwAvg is exactly as given in~(\ref{eq:met}). It is based on a
1054: specific ball walk and this version is
1055: sometimes called \emph{ball walk with
1056: Metropolis filter}, see~\cite{MR2178341}.
1057: Two concepts from the theory of Markov chains turn out to be
1058: important, reversibility and uniform ergodicity. We recall these
1059: notions briefly, see~\cite{MR1399158} for further details.
1060: A Markov chain  $(K,\pi)$ is \emph{reversible with respect to $\pi$}, 
1061: if for all measurable subsets $A,B\subset\Omega$ the balance
1062: \begin{equation}\label{eq-rev}
1063: \int_{A}K(x,B)\pi(dx)=\int_{B}K(x,A)\pi(dx)
1064: \end{equation}
1065: holds true. Notice that in this case necessarily $\pi$ is an invariant
1066: distribution.
1067:  
1068: A Markov chain is \emph{uniformly ergodic} if there are $n_{0}\in\N$, a
1069: constant $c>0$ and a probability measure $\nu$ on $\Omega$ such that
1070:   \begin{equation}
1071:     \label{eq:ueball}
1072: K^{n_{0}}(x,A) \geq c \nu(A),
1073: \quad \text{ for all } A\subset \Omega\text{ and } x\in\Omega.
1074:   \end{equation}
1075: Markov chains which are  uniformly ergodic have a unique invariant
1076: probability distribution.
1077: 
1078: Our analysis will be based on conductance arguments and we
1079: recall the basic notions, see~\cite{MR1025467,MR1238906}.
1080: If $(K,\pi)$ is a Markov chain with transition kernel $K$ and
1081: invariant distribution $\pi$ then we assign the 
1082: \begin{enumerate}
1083: \item 
1084: \emph{local conductance} at $x\in\Omega$ by $l_K(x):=
1085:   K(x,\Omega\setminus\set{x})$,
1086: \item and the \emph{conductance} as
1087: \begin{equation}
1088:   \label{eq:conductance}
1089:   \phi(K,\pi):= \inf_{0<\pi(A) <  1}\frac{\int_A K(x,A^c)
1090:     \pi(dx)}{\min\set{\pi(A),\pi(A^c)}}, 
1091: \end{equation}
1092: where $A^c= \Omega \setminus A$. 
1093: \end{enumerate}
1094: Below we call $l>0$ a \emph{lower bound for the local conductance}, if
1095: $l_{K}(x)\geq l$ for all $x\in\Omega$.
1096: 
1097: \subsubsection*{The ball walk and some of its properties}
1098: \label{sec:ball}
1099: 
1100: Here we gather some properties of the ball walk, 
1101: see~\cite{MR1238906,MR2178341},  which will serve as
1102: ingredients for the analysis of Metropolis chains using this as the
1103: underlying proposal. 
1104: In particular we prove that on convex bodies in $\R^{d}$ the ball walk is 
1105: uniformly ergodic and we bound its conductance from below, in terms
1106: of bounds $l>0$ for the local conductance.
1107: 
1108: We abbreviate $B(0,\delta) = \delta \ball$. 
1109: Let $Q_\delta$ be the transition 
1110: kernel of a local random walk
1111: having transitions within $\delta$-balls of its current position,
1112: i.e., we let 
1113: \begin{equation}
1114: \label{eq:pxx}
1115: Q_{\delta}(x,\set{x}):= 1 - \frac{\vol(B(x,\delta) 
1116: \cap \Omega)}{\vol(\delta \ball )},
1117: \end{equation}
1118: and 
1119: \begin{equation}
1120: \label{eq:qloc}
1121: Q_\delta(x,A):= 
1122: \begin{cases}
1123: \displaystyle{\frac{\vol(B(x,\delta) \cap A)}{\vol(\delta \ball )}}, 
1124: &   A
1125: \subset \Omega \text{ and }x \notin A, \\
1126: Q_\delta(x,A\setminus\set{x}) +   Q_{\delta}(x,\set{x}), & 
1127: A \subset \Omega \text{ and } x\in A.
1128: \end{cases}
1129: \end{equation}
1130: Schematically, the transition kernel may be viewed 
1131: as in Figure~\ref{fig:bbb}.
1132: 
1133: \SetKw{KwProp}{Propose:}
1134: \SetKw{KwAcc}{Accept:}
1135: \SetKwInOut{Input}{Input}
1136: \SetKwInOut{Output}{Output}
1137: \restylealgo{ruled}
1138: \begin{figure}[h]
1139:   \centering
1140: \begin{procedure}[H]
1141: \Input{current position $x$; $\delta>0$\;}
1142: \Output{next position\;}
1143: \KwProp{Choose $y\in B(x,\delta)$ uniformly}\;
1144: \KwAcc{}
1145: \eIf{$y\in\Omega$}{\Return{$y$}\;}{\Return{$x$}\;} 
1146:   \caption{Ball-walk-step($x,\delta$)}
1147: \end{procedure}  
1148:   \caption{Schematic view of ball walk step}
1149:   \label{fig:bbb}
1150: \end{figure}
1151: Clearly we may restrict to $\delta\leq D$, the diameter of $\Omega$.
1152: The following observation is important and explains why we restrict
1153: ourselves to convex bodies..
1154: \begin{lemma}
1155:  If $\Omega\subset \R^{d}$ is a convex body, then the ball walk
1156:  $Q_{\delta}$ has a (non-trivial) lower bound $l>0$ for the local conductance.
1157: \end{lemma}
1158: \begin{proof} 
1159: It is well-known that convex bodies satisfy the cone condition 
1160: (see % Lemma 3 of Section 3.2 in
1161: \cite[\S~3.2, Lemma~3]{Burenkov}).  
1162: Therefore we obtain that for each $\delta>0$ there is $l>0$ such that
1163: for each $x \in \Omega$ we have $l_{Q_\delta} (x) \ge l$.
1164: % $$
1165: % \exists \ {l > 0} \quad
1166: % \exists \ {\delta_0>0} \quad 
1167: % \forall \ {0<\delta < \delta_0} \quad
1168: % \forall \ {x \in \Omega} \quad
1169: % l_{Q_\delta} (x) \ge l .
1170: % $$
1171: \end{proof}
1172: \begin{rem}
1173: Observe however, that $l$ might be very small. 
1174: For $\Omega=[0,1]^d$, for example, we get $l = 2^{-d}$, 
1175: even if $\delta$ is very small. In contrast, we will 
1176: see that a large $l$ is possible for $\Omega=B^d$ 
1177: and $\delta \le 1/\sqrt{d+1}$, see Lemma~\ref{lem:l-bound}.   
1178: \end{rem}
1179: Notice that $l_{Q_{\delta}}(x)= {\vol(B(x,\delta) \cap
1180: \Omega)}/{\vol(\delta \ball )}$, hence in the following we use the inequality
1181: \begin{equation}
1182: \label{eq:l-bound}
1183: \vol(B(x,\delta)\cap \Omega)\geq l \vol(\delta\ball),  
1184: \end{equation}
1185: where $l>0$ is a lower bound for the local conductance 
1186: of the ball walk. 
1187: 
1188: The following result  is~\emph{folklore}, but for a 
1189: lack of reference we sketch a proof.
1190: 
1191: \begin{proposition}\label{prop:ueqd}
1192: %  Let $\Omega\subset \R^{d}$ be compact.
1193: The ball walk $Q_{\delta}$ is reversible with respect to the uniform
1194: distribution $\muo$ and 
1195: %  If there is a lower bound $l>0$  for the local conductance
1196: %  of $Q_{\delta/2}$ then the ball walk $Q_{\delta}$ is 
1197: uniformly ergodic.
1198: %   For each $0<\delta\leq 1/\sqrt{d+1}$  the ball 
1199: %   walk $Q_{\delta}$  is uniformly
1200: %   ergodic and reversible. In particular there are $n_{0}\in\N$, a
1201: %   constant $c>0$ and a probability measure $\nu$ on $\ball$ such that
1202: %   \begin{equation}
1203: % %    \label{eq:ueball}
1204: % Q_{\delta}^{n_{0}}(x,A) \geq c \nu(A),
1205: % \quad \text{ for all } A\subset \ball\text{ and } x\in\ball.
1206: %   \end{equation}
1207: \end{proposition}
1208: 
1209:  The crucial tool for proving this is provided by the
1210: notion of small and petite sets, where we refer to~\cite[Sect.~5.2 \&
1211: 5.5]{Meyn-book} for details and properties. 
1212: To this end we introduce a \emph{sampled} chain, say
1213: $(Q_{\delta})_{a}$, where $a$ is some probability
1214: $a=\lr{a_{0},a_{1},\dots}$ on $\set{0,1,2,\dots}$
1215: and $(Q_{\delta})_{a}$ is defined by $(Q_{\delta})_{a}(x,C):=
1216: \sum_{j=0}^{\infty}a_{j}Q_{\delta}^{j}(x,C)$.
1217: %A set $C\subset \Omega$ is \emph{petite}, 
1218: We recall that a
1219: (measurable) subset $C\subset \Omega$ is \emph{petite} (for
1220: $Q_{\delta}$), if there are a probability~$a$
1221:  and a probability measure $\nu$ on
1222: $\Omega$ such that 
1223: \begin{equation}
1224: \label{eq:small}
1225: (Q_{\delta})_{a}(y,A)\geq \varepsilon \nu(A),
1226: \quad A\subset \Omega,\ y \in C.
1227: \end{equation}
1228: A set $C\subset \Omega$ is \emph{small}, if the same property holds
1229: true for some Dirac probability $a:= \delta_{n}$, such that obviously
1230: small sets are petite.
1231: We first show that certain balls are small.
1232: 
1233: \begin{lemma}\label{lem:small}
1234: % Let $\delta>0$ and let $l >0$ be a lower bound for 
1235: % the local conductance
1236: % of the ball walk $Q_{\delta/2}$. 
1237: The sets $ B(x,\delta/2)\cap
1238: % If there is a is  
1239: % a lower bound $l>0$ for the local conductance
1240: % of the ball walk $Q_{\delta/2}$ then the sets $ B(x,\delta/2)\cap
1241: \Omega,\ x\in\Omega$ are small for $Q_\delta$.
1242: %  Let $\delta\leq 1/\sqrt{d+1}$ and $x\in\ball$. If $y\in
1243: %   B(x,\delta/2)\cap \ball$ then
1244: %   \begin{equation}
1245: %     \label{eq:smlemma}
1246: %     Q_{\delta}(y,A) \geq 0.3 \cdot 2^{-d} \frac{\vol(A \cap
1247: %       B(x,\delta/2)\cap \ball)}{\vol( B(x,\delta/2)\cap \ball)},\quad
1248: %     A\subset \ball.
1249: % \end{equation}
1250: %Consequently,  each set $B(x,\delta/2)\cap \ball$  is small.
1251: \end{lemma}
1252: 
1253: \begin{proof}
1254: First, we note that $y\in B(x,\delta/2)$ implies $B(x,\delta/2) \subset
1255: B(y,\delta)$. Let $l>0$ be a lower bound for the local conductance of
1256: $Q_{\delta/2}$. Using~(\ref{eq:l-bound}) 
1257: for $Q_{\delta/2}$, we obtain for any set $A\subset \Omega$ that
1258: \begin{align*}
1259:   Q_{\delta}(y,A) &\geq  Q_{\delta}(y,A\setminus\set{y}) =
1260:   \frac{\vol(B(y,\delta)\cap A)}{\vol(B(y,\delta))} \geq 2^{-d}
1261:   \frac{\vol(B(x,\delta/2)\cap A)}{\vol(\delta/2\ball)}\\
1262: &\geq l \cdot 2^{-d} \frac{\vol(A \cap
1263:       B(x,\delta/2)\cap \Omega)}{\vol( B(x,\delta/2)\cap \Omega)}.
1264: \end{align*}
1265: Hence estimate~(\ref{eq:small}) holds true with $n_{0}:=1,\
1266: \varepsilon:= l\cdot 2^{-d}$ and 
1267: $$
1268: \nu(A) := \frac{\vol(A \cap
1269:       B(x,\delta/2)\cap \Omega)}{\vol( B(x,\delta/2)\cap \Omega)},\quad
1270:     A\subset \Omega.
1271: $$
1272: This completes the proof.
1273: \end{proof}
1274: \begin{proof}[Proof of Proposition~\ref{prop:ueqd}]
1275: We first prove reversibility with respect to $\muo$. 
1276: Notice that it is enough to verify~(\ref{eq-rev}) 
1277: for disjoint sets $A,B\subset \Omega$.
1278: Furthermore we observe that for any pair $A,B\subset \Omega$ 
1279: of measurable subsets the characteristic function of the set 
1280: $$
1281: \set{(x,y)\in\Omega\times \Omega,\quad x\in A,\ y\in B,\ \norm{x -
1282:     y}{}\leq \delta}
1283: $$
1284: can equivalently be rewritten as
1285: $$
1286: \chi_{B}(y) \chi_{B(y,\delta)\cap A}(x)
1287: \quad \text{or} \quad \chi_{A}(x) \chi_{B(x,\delta)\cap B}(y).
1288: $$
1289: Hence, letting temporarily
1290: $c:={\vol(\Omega)\vol(\delta\ball)}$ we obtain  
1291: \begin{align*}
1292:   \int_{A}Q_{\delta}(x,B)\;\muo(dx)&=
1293:  \frac 1 c \int_{A}\vol(B(x,\delta)\cap
1294:   B)\; dx\\
1295: &=  \frac 1 c \int_{\Omega}\int_{\Omega}\chi_{A}(x) 
1296: \chi_{B(x,\delta)\cap B}(y)\;
1297: dy\;dx\\
1298: &=  \frac 1 c \int_{\Omega}\int_{\Omega}\chi_{B}(y) 
1299: \chi_{B(y,\delta)\cap A}(x)\;
1300: dx\;dy= \int_{B}Q_{\delta}(y,A)\;\muo(dy),
1301: \end{align*}
1302: proving reversibility.
1303: 
1304: By Lemma~\ref{lem:small} each set $B(x,\delta/2) \cap \Omega$ is small,
1305: thus also petite. Petiteness is in\-heri\-ted by taking finite
1306: unions. Since $\Omega$, being compact, can be covered by finitely many
1307: sets  $B(x,\delta/2)\cap \Omega$, this implies that $\Omega$ is
1308: petite. By~\cite[Thm.~16.2.2]{Meyn-book} this yields uniform
1309: ergodicity of the ball walk % , and hence that $\Omega$ is small
1310: (see~\cite[Thm.~16.0.2(v)]{Meyn-book}).
1311: \end{proof}
1312: We mention the following conductance bound  of the ball
1313: walk, which is  a slight improvement
1314: of~\cite[Thm.~5.2]{MR2178341}. This will be  a special case of
1315: Theorem~\ref{thm:met-cond}, below, and we omit the proof.
1316: 
1317: \begin{proposition}\label{pro:phi}
1318: Let $(Q_{\delta},\muo)$ be the ball walk from above,
1319: and let $\phi(Q_{\delta},\muo)$ be its conductance. 
1320: Let~$D$ be the diameter of $\Omega$  and 
1321: let $l$ be a lower bound for the local conductance. Then
1322: \begin{equation}
1323: \label{eq:ballconductancelb}
1324: \phi(Q_{\delta},\muo) \geq  
1325: \sqrt{\frac \pi 2}\frac{l^{2}\delta}{8 D \sqrt{d +1}}.
1326: %% \frac{l^{2}\delta}{16 D \sqrt d}.
1327: \end{equation}  
1328: \end{proposition}
1329: 
1330: The local conductance may be arbitrarily small if the domain $\Omega$
1331: has sharp corners. 
1332: For specific sets $\Omega$ we can explicitly provide lower bounds for
1333: the local conductance, and this will be used in the later convergence
1334: analysis.
1335: In the following we mainly discuss the case $\Omega = \ball$. 
1336: 
1337: We start with a  technical result, related to the Gamma function on
1338: $\R^+$. We use the well-known formula
1339: \begin{equation}
1340:   \label{eq:3}
1341: \vol(\ball)= \pi^{d/2}/\Gamma(d/2 +1). 
1342: \end{equation}
1343: \begin{lemma}\label{lem:bou}
1344: For any $z>0$ we have
1345: \begin{equation}
1346:   \label{eq:gamma}
1347:   \frac{\Gamma(z+1/2)}{\Gamma(z)}\leq \sqrt z.
1348: \end{equation}
1349: Consequently,
1350: \begin{equation}
1351:   \label{eq:vol-bound}
1352: \frac{ \vol(B^{d-1})}{\vol(\ball)}
1353: %% \frac{\Gamma(d/2 +1 )}{\delta\sqrt\pi\Gamma((d+1)/2) }
1354: \leq \sqrt{\frac{d+1}{2\pi}}.
1355: \end{equation}
1356: \end{lemma}
1357: \begin{proof}
1358:   By~\cite[Chapt.~VII, Eq.~(11)]{MR2013000} we know that the function
1359:   $z\mapsto \log\Gamma(z)$ is convex for $z>0$. Thus we conclude
1360:   \begin{align*}
1361:     \log\Gamma(z + 1/2) 
1362: &\leq \frac 1 2 \lr{\log\Gamma(z+1) + \log\Gamma(z)}\\
1363: & = \frac 1 2 \lr{\log z  + 2 \log\Gamma(z)} 
1364: = \log\sqrt z + \log\Gamma(z),
1365:   \end{align*}
1366: from which the proof of assertion~(\ref{eq:gamma}) can be completed.
1367: Using the representation for the volume from~(\ref{eq:3}) and applying
1368: the above  bound with $z:= (d+1)/2$ we obtain
1369: $$
1370: \frac{ \vol(B^{d-1})}{\vol(\ball)}\leq
1371: \frac{\Gamma(d/2 +1 )}{\sqrt\pi\Gamma((d+1)/2) }
1372: \leq \sqrt{\frac{d+1}{2\pi}},
1373: $$
1374: and the proof is complete.
1375: \end{proof}
1376: %P where we can explicitly provide lower bounds 
1377: %  for the local conductance. 
1378: Using Lemma~\ref{lem:bou},  we can prove the 
1379: following lower bound for the local
1380: conductance of the ball walk on $\ball$.
1381: 
1382: \begin{lemma}  \label{lem:l-bound}
1383: Let $(Q_\delta,\muo)$ be the local ball walk on $\ball\subset \R^d$.
1384: If $\delta\leq 1/\sqrt{d +1}$, then its 
1385: local conductance obeys $l\geq 0.3$.  
1386: \end{lemma}
1387: 
1388: \begin{proof}
1389: The proof is based on some geometric reasoning. It is clear that the
1390: local conductance~$l(x)$ is minimal for points $x$ at the
1391: boundary of $\ball$, and in this case 
1392: its value equals the portion, say $\widetilde V$,  
1393: of the volume of $B(x,\delta)$ inside $\ball$. If $H$ is the
1394: hyperplane at $x$ to $\ball$, then this cuts off $B(x,\delta)$
1395: exactly one half of its volume. 
1396: Thus we let  $Z(h)$ be the cylinder with
1397: base being the $(d-1)$-ball around $x$ in the hyperplane $H$ of
1398: radius $\delta$. 
1399: Its height~$h$ is the distance of $H$ to the hyperplane
1400:   determined by the intersection of $\ball\cap B(x,\delta)$. This
1401:   height $h$ is exactly determined from the quotient $h/\delta =
1402:   \delta/2$, by similarity, hence $h:= \delta^2/2$. By
1403:   construction we have $\widetilde V \geq 1/2 -
1404: \vol(Z(h))/\vol(B(x,\delta))$ and we can 
1405: lower bound the local conductance $l(x)$ by
1406: $$
1407: l(x)\geq \frac 1 2  - \frac{\vol(Z(h))}{\vol(B(x,\delta))}.
1408: $$
1409: We can evaluate~$\vol(Z(h))$ as
1410: $
1411: \vol(Z(h)) = h \delta^{d-1} \vol(B^{d-1}),
1412: $
1413: and we obtain
1414: $$
1415: l(x)\geq \frac 1 2 - \frac{\delta^{d+1} \vol(B^{d-1})}{2 \delta^d
1416: \vol(\ball)}= \frac 1 2 \lr{ 1 - 
1417: \frac{\delta \vol(B^{d-1})}{\vol(\ball)}}.
1418: $$
1419: % We use~(\ref{eq:3}) 
1420: % \begin{displaymath}
1421: %   l(x)\geq \frac 1 2 \lr{1 - \frac{\delta \Gamma(\frac d 2 + 1)}{
1422: %   \sqrt\pi  \Gamma(\frac d 2 +  \frac 1 2)}}.
1423: % \end{displaymath}
1424: The bound~(\ref{eq:vol-bound}) from Lemma~\ref{lem:bou} implies
1425: $$
1426: l(x) \geq % \frac 1 2 \lr{1 - \frac{\delta \sqrt{{(d+1)}/{2}}}{
1427: %   \sqrt\pi}} =
1428: \frac 1 2 \lr{1 - \frac{\delta \sqrt{{d+1}}}{  \sqrt{2
1429:     \pi}}}.
1430: $$
1431: For $\delta\leq 1/(\sqrt{d+1})$ we get
1432: $l(x) \geq 1/2( 1 - 1/\sqrt{2\pi})\geq 0.3$, completing the proof.
1433: \end{proof}
1434: 
1435: We close this subsection with the following technical lemma, 
1436: which  can be extracted from the unpublished
1437: seminar note~\cite{vempala-lesson}. For the convenience of the
1438: reader we present its proof. 
1439: In addition we will slightly improve the statement.
1440: \begin{lemma}%% [{\cite{vempala-lesson}}]
1441:   \label{lem:vempala}
1442: Let $l >  0$ be a lower bound for the local 
1443: conductance of the ball walk $(Q_\delta,\muo)$.
1444: For any $0<t< l$ and any set
1445:   $A\subset \Omega$ with related sets 
1446:   \begin{align}
1447: A_1&:= \set{x\in A, \quad Q_\delta (x, A^c)< \frac{l -t}{2}}\subset
1448:     A\\
1449:   A_2 &:= \set{y\in A^c,\quad  Q_\delta(y, A)< \frac{l -t}{2}}\subset
1450:   A^c,
1451:   \end{align}
1452: we have $d(A_1,A_2)>t\delta \sqrt{2 \pi/\lr{d+1}}$.
1453: \end{lemma}
1454: For its proof we need the following 
1455: \begin{lemma}
1456: Let $\delta>0$.
1457:   If $x,y\in \R^d$ are two points with distance~$t\delta \sqrt{2
1458:     \pi/\lr{d+1}}$ at most, then 
1459:   \begin{equation}
1460:     \label{eq:1}
1461:     \vol(B(x,\delta)\cap B(y,\delta)) \geq (1 - t) \vol(\delta\ball).
1462:   \end{equation}
1463: \end{lemma}
1464: \begin{proof}
1465: Let $u:= \norm{x - y}{2}$. If $u<\delta$ then 
1466:  the volume of the intersection of $B(x,\delta)$ and $B(y,\delta)$  is
1467: exactly the same as the volume of the 
1468: ball $\delta\ball$ minus the volume of the
1469:  middle slice with distance~$u$ as thickness. The volume of
1470:  this slice is bounded from above by the volume of the cylinder with
1471:  base $\delta B^{d-1}$ and thickness $u$. Thus we obtain
1472:  \begin{equation*}
1473:   \vol(B(x,\delta)\cap B(y,\delta)) \geq \vol(\delta\ball) - u
1474: \vol(\delta B^{d-1}) = 
1475: \vol(\delta\ball) \lr{ 1 - u \frac{ 
1476: \vol(\delta B^{d-1})}{\vol(\delta\ball)}}.   
1477:  \end{equation*}
1478: Applying Lemma~\ref{lem:bou} we obtain
1479: $$
1480: \frac{ \vol(\delta B^{d-1})}{\vol(\delta\ball)}=
1481: \frac{ \vol(B^{d-1})}{\delta\vol(\ball)}
1482: \leq \frac 1 \delta \sqrt{\frac{d+1}{2\pi}},
1483: $$
1484: thus  by the choice of $u\leq \sqrt{2\pi} t\delta/\sqrt{d+1} $ 
1485: we conclude that
1486: $$
1487: u\frac{ \vol(\delta B^{d-1})}{\vol(\delta\ball)}
1488: \leq  \frac{\sqrt{2\pi}t\delta
1489: \sqrt{d+1}}{\delta\sqrt{2\pi}\sqrt {d+1}}\leq t,
1490: $$
1491: and the proof is complete. 
1492: \end{proof}
1493: We turn to the 
1494: \begin{proof}[Proof of Lemma~\ref{lem:vempala}]
1495: Let $x\in A_1$ and $y\in A_2$ be in $\Omega$, and suppose that their
1496:   distance is at most $t\delta \sqrt{2 \pi/\lr{d+1}}$.
1497:   Simple set theoretic reasoning shows that
1498:   \begin{align*}
1499: \vol(B(x,\delta)\cap B(y,\delta)\cap \Omega)& 
1500: \geq \vol(B(x,\delta)\cap
1501: \Omega) - \vol(B(x,\delta)\setminus B(y,\delta)) \\
1502: &\geq  \vol(B(x,\delta)\cap
1503: \Omega) - \vol(B(x,\delta)\setminus (B(x,\delta)\cap  B(y,\delta)))
1504: \\
1505: &= \vol(B(x,\delta)\cap \Omega) - \vol(\delta\ball) 
1506: + \vol(B(x,\delta)\cap  B(y,\delta)).
1507:   \end{align*}
1508: Since $l$ is a lower bound for the conductance $l(x)$ we have that
1509: $$
1510: \vol(B(x,\delta)\cap \Omega)\geq l \vol(B(x,\delta))= l
1511: \vol(\delta\ball).
1512: $$ 
1513: Taking this into account and using~(\ref{eq:1}) we
1514: end up with
1515: \begin{align*}
1516:  \vol(B(x,\delta)\cap B(y,\delta)\cap \Omega)& \geq l
1517:  \vol(\delta\ball) - \vol(\delta\ball) + (1-t) \vol(\delta\ball) \\
1518: & = (l-t)  \vol(\delta\ball). 
1519: \end{align*}
1520: In probabilistic terms this rewrites as
1521: $Q_\delta(x, B(x,\delta)\cap B(y,\delta)\cap \Omega) \geq l-t$, and
1522: similarly $Q_\delta(y, B(x,\delta)\cap B(y,\delta)\cap \Omega) \geq
1523: l-t$.
1524: Now, if $A\subset\Omega$ is any measurable subset with complement
1525: $A^c$ then for $x\in A$ and $y\in A^c$ we obtain 
1526: $$
1527: B(x,\delta) \cap B(y,\delta)\cap\Omega \subset
1528: \lr{B(x,\delta) \cap A^c \cap \Omega} 
1529: \bigcup \lr{B(y,\delta) \cap A \cap \Omega} ,
1530: $$
1531: %E  In der letzten Formel ist ein c verschoben worden! 
1532: which in turn yields $Q_\delta(x,A^c) + Q_\delta(y,A)\geq l-t$, but
1533: %E auch in der letzten Formel ist ein c gewandert! 
1534: this contradicts the definition of the sets $A_1$ and $A_2$. Hence any
1535: two points from $A_1$ and $A_2$, respectively,  must have distance
1536: larger than  $t\delta \sqrt{2 \pi/\lr{d+1}}$, and the proof is complete.
1537: \end{proof}
1538: 
1539: \subsubsection*{Properties of the related Metropolis method}
1540: \label{sec:metprop}
1541: We analyze  Metropolis Markov chains which are based
1542: on the ball walk, introduced above, for some appropriately chosen
1543: $\delta$. As it will turn out, the related Metropolis chains are
1544: \emph{perturbations} of the underlying ball walk, and its properties,
1545: as established in Propositions~\ref{prop:ueqd} and~\ref{pro:phi}
1546: extend in a natural way.
1547: 
1548: For $\rho \in \rad(\Omega)$ we define the \emph{acceptance
1549:   probabilities} as
1550: \begin{equation}
1551:   \label{eq:alpha}
1552:   \alph(x,y):= \min\set{1,\frac{\rho(y)}{\rho(x)}}.
1553: \end{equation}
1554: The corresponding Metropolis kernel is given by
1555: \begin{equation} 
1556:   \label{mk}
1557:   \krd(x,dy):= 
1558:   \alph(x,y) Q_\delta(x,dy) 
1559:   + (1 - \int_{}\alph(x,y)Q_\delta(x,dy))\delta_x(dy).
1560: \end{equation}
1561: Note that for $x \notin A$ we obtain 
1562: $$
1563: \krd (x,A) =
1564: \int_A \alph (x,y) \, Q_\delta (x, dy) = 
1565: \frac 1 {\vol (\delta \ball)} \, \int_{A\cap B(x,\delta)} 
1566: \alph (x,y) \, dy .
1567: $$
1568: % For the convenience of the reader
1569: Below we sketch a single Metropolis~\KwDet 
1570: from the present position~$x\in\Omega$ with kernel
1571:     $\krd(x,\cdot)$. 
1572: The procedure~{\bf Ball-walk-step} was described in
1573: Figure~\ref{fig:bbb}.
1574: 
1575: \begin{figure}[h]
1576:   \centering
1577: \begin{procedure}[H]
1578:   \caption{Metropolis-step($x,\rho,\delta$)}
1579: \SetLine 
1580: \Input{current position $x$, $\delta>0$, function $\rho$\;}
1581: \Output{next position\;}
1582: \KwProp{$y := \text{\bf Ball-walk-step}(x,\delta)$}\;
1583: \KwAcc{}
1584: 
1585: \uIf{$\rho(y)\geq \rho(x)$}{\Return{$y$}}%{\Return{$x$}}
1586: \uElseIf{$\rho(y) \geq {\bf rand()}\cdot \rho(x)$}{\Return{$y$}}
1587: \Else{\Return{$x$}}
1588: \end{procedure}  
1589:   \caption{Schematic view of the Metropolis step. Note that the Acceptance step results in an
1590:     acceptance probability of $\alph(x,y)=\min\set{1,\rho(y)/\rho(x)}$.}
1591:   \label{fig:ccc}
1592: \end{figure}
1593: We start with the following observation.
1594: \begin{lemma}\label{lem:beta}
1595: Let $\alpha$ be the Lipschitz constant in $\rad(\Omega)$ and  $\beta:=
1596: \exp(-\alpha\delta)$. 
1597:   Uniformly for $\rho\in\rad(\Omega)$ the following bound for the
1598:   related Metropolis chain holds true: 
1599:   \begin{equation}
1600:     \label{eq:alb}
1601: \krd(x,dy) \geq \beta Q_\delta(x, dy).
1602:   \end{equation}
1603: \end{lemma}
1604: \begin{proof}
1605: Let $A\subset\Omega$. If $\dist(x,A)>\delta$  then there is nothing to
1606: prove.
1607: Otherwise, for $y\in A\cap B(x,\delta)$  we find
1608: from~(\ref{eq:dens-class}) and~(\ref{eq:alpha}) that 
1609: \begin{equation*}
1610:  \alph(x,y)\geq \exp(-\alpha\norm{x - y}{2})
1611: \geq e^{-\alpha\delta}=\beta.     
1612: \end{equation*}
1613: By definition of the transition kernel $\krd$ from~(\ref{mk}) we can
1614: use $\beta$ to bound
1615: $$
1616: \krd(x,A)\geq \min\set{\alph(x,y),\ y\in A\cap B(x,\delta)}
1617: Q_\delta(x, A) \geq \beta Q_\delta(x, A).  
1618: $$ 
1619: The proof is complete.
1620: \end{proof}
1621: The assertion of Proposition~\ref{prop:ueqd} extends to the family of
1622: Metropolis chains as follows% , 
1623: % quantifying similar results from~\cite{MR1399158}
1624: .
1625: 
1626: \begin{proposition}[{cf.~\cite[Prop.~1]{MR1738303}}]\label{pro:uue1}
1627: Let $Q_{\delta}$ be the ball walk from~(\ref{eq:qloc}) on
1628: %  a compact set 
1629: $\Omega$.
1630: %  with lower bound $l$ for the local conductance of
1631: %  $Q_{\delta/2}$. % with $\delta\leq1/\sqrt{d+1}$. 
1632: For each $\rho\in\rad(\Omega)$ and $\delta\leq D$ the corresponding 
1633: Metropolis chains from~(\ref{mk}) are
1634: uniformly ergodic and reversible with respect to the related $\mur$.
1635: \end{proposition}
1636: 
1637: \begin{proof}
1638: Reversibility with respect to $\mur$ is clear 
1639: by the choice of the function~$\alph$. To
1640: prove uniform ergodicity, 
1641: let $\beta$ be from Lemma~\ref{lem:beta} and $c$
1642:   from~(\ref{eq:ueball}). %Set $\eta:= 1 - \beta^{n_{0}}c$.
1643:   As established in Lemma~\ref{lem:beta} we have $\krd(x,dy)\geq
1644:   \beta Q_{\delta}(x,dy)$. It is easy to see, and was established
1645:   in~\cite[Proof of Thm.~2]{MR1738303}, that this extends to all
1646:   iterates as
1647: $$
1648: \krd^{n}(x,dy)\geq   \beta^{n} Q^{n}_{\delta}(x,dy).
1649: $$
1650: Recall that under the assumptions made, 
1651: the ball walk is uniformly ergodic, and
1652: from Proposition~\ref{prop:ueqd} we obtain  $n_{0}$ such that for all
1653: $x\in\Omega$ we have
1654: \begin{equation}
1655:   \label{eq:unifbound}
1656: \krd^{n_{0}}(x,A)\geq  \beta^{n_{0}}c \nu(A),\quad A\subset \Omega,  
1657: \end{equation}
1658: proving uniform ergodicity.
1659: \end{proof}
1660: 
1661: \begin{rem}\label{rem:unifb}
1662: Notice that~(\ref{eq:unifbound}) is obtained with 
1663: right hand side \emph{uniformly} for
1664: all $\rho\in\rad(\Omega)$, a fact which will prove useful later.
1665: \end{rem}
1666: 
1667: Finally we prove lower bounds for the conductance of the
1668: Metropolis chains. 
1669: 
1670: \begin{theorem}\label{thm:met-cond}
1671: Let $(\krd,\mur)$ be the Metropolis chain based
1672: on the local ball walk $(Q_\delta,\muo)$ 
1673: and let $\phi(\krd,\mur)$ be its conductance, where 
1674: $\rho\in\rad(\Omega)$.
1675: Let $l$ be a lower bound for the local conductance of $Q_{\delta}$.
1676: For  $\rho\in\rad(\Omega)$  we have 
1677: \begin{equation}
1678: \label{eq:conductancelb}
1679: \phi(\krd,\mur) \geq  
1680: \frac{l e^{-\alpha \delta}}{8} 
1681: \min\set{\sqrt{\frac \pi 2}\frac{l\delta}{D \sqrt{d +1}},1},
1682: \end{equation}
1683: where $D$ is the diameter of $\Omega$. 
1684: \end{theorem}
1685: \begin{rem}
1686:   As mentioned above, Proposition~\ref{pro:phi} is a special case of
1687: Theorem~\ref{thm:met-cond} for $\alpha=0$.
1688: \end{rem}
1689: The proof of Theorem~\ref{thm:met-cond} will be based on 
1690: Lemma~\ref{lem:vempala} for the underlying
1691: ball walk, specifying $t:= l/2$.
1692: This extends to the Metropolis walk as follows.
1693: \begin{lemma}\label{cor:vemp}
1694:   Let $\alpha$
1695:   from~(\ref{eq:dens-class}) and $l$ be the local conductance of the
1696:   ball walk. We let $\beta:= \exp(-\alpha\delta)$.
1697: For $A\subset \Omega$ we assign
1698:  \begin{align}
1699: T_1 &:= \set{x\in A,\quad  \krd(x,A^c)< \frac{\beta l}{4}}\subset
1700:     A\\
1701:   T_2 &:= \set{y\in A^c,\quad  \krd(y,A)< \frac{\beta l}{4}}\subset
1702:   A^c.
1703:   \end{align}
1704: Then $d(T_1 ,T_2)>\delta l \sqrt{{\pi}/\lr{2d+2}}$.
1705:  \end{lemma}
1706:  \begin{proof}
1707: It is enough to prove $T_1\subset A_1$ and $T_2\subset A_2$.
1708: If $x\in T_1$ then Lemma~\ref{lem:beta} implies
1709: $\krd(x,A^c) <\beta {l}/{4}$, hence 
1710: $$
1711: Q_\delta (x, A^c) \leq \frac 1 \beta \krd(x,A^c) \leq  \frac{l}{4}.
1712: $$
1713: The other inclusion is proved similarly.
1714:  \end{proof}
1715: We turn to the 
1716: \begin{proof}[Proof of Theorem~\ref{thm:met-cond}]
1717: Let $A\subset \Omega$ be the set for which the conductance is
1718: attained. We assign sets $T_1$ and $T_2$ as in
1719: Lemma~\ref{cor:vemp} and distinguish two cases. 
1720: If $\mur (T_1)<\mur (A)/2$ \emph{or}
1721: $\mur (T_2)<\mur (A^c)/2$, 
1722: then the
1723: estimate~(\ref{eq:conductancelb}) follows easily. 
1724: For instance, if  $\mur(T_1)<\mur(A)/2$ then 
1725: \begin{multline*}
1726:   \int_A \krd(x,A^c)\mur(dx) \geq  \int_{A\setminus T_1}
1727:   \krd(x,A^c)\mur(dx)\\
1728: \geq \frac{\beta l}{4}\mur(A\setminus T_1)\geq 
1729: \frac{ \beta l}{8}\mur(A)\geq
1730:  \frac{\beta l}{8} \min\set{\mur(A),\mur(A^c)},
1731: \end{multline*}
1732: %   The choice of $t:= l/2c$ yields
1733: % $$
1734: %  \int_A \krd(x,A^c)\mu(dx)\geq \frac{\beta l}{8}\mu(A),
1735: % $$
1736:  thus $ \phi(\krd,\mur)\geq \beta l/8$ in this case, which
1737: proves~(\ref{eq:conductancelb}).
1738: %  under condition~(\ref{eq:condition}).
1739: 
1740: Otherwise we have $\mur(T_1)\geq \mur(A)/2$ \emph{and}
1741: $\mur(T_2)\geq \mur(A^c)/2$. In this case we
1742: apply an isoperimetric inequality, 
1743: see~\cite[Thm.~4.2]{MR2178341} to the triple
1744: $(T_1,T_2,T_3)$ with 
1745: $T_3:= \Omega \setminus (T_1 \cup T_2)$ to conclude 
1746: that
1747: \begin{equation}
1748:   \label{eq:mu123}
1749: \mur(T_3)\geq \frac{2 d(T_1,T_2)}{D}\min\set{\mur(T_1),
1750: \mur(T_2)},
1751: \end{equation}
1752: hence under the size constraints in this case it holds true that
1753: \begin{equation}
1754:   \label{eq:mu123f}
1755:   \mur(T_3)\geq
1756:   \frac{d(T_1,T_2)}{D}\min\set{\mur(A),\mur(A^c)}.
1757: \end{equation}
1758: Using the reversibility of the Metropolis 
1759: chain $(\krd,\mur)$ we have
1760: $$
1761: \int_A \krd(x,A^c)\mur(dx)= \int_{A^c} \krd(y,A)\mur(dy),
1762: $$
1763: which implies
1764: \begin{align*}
1765: \int_A \krd(x,A^c)\mur(dx)&= \frac 1 2 \lr{\int_A \krd(x,
1766:   A^c )\mur(dx)+ \int_{A^c} \krd(y,A)\mur(dy) }  \\
1767: & \ge  \frac 1 2 \lr{ \int_{A\cap T_3} \krd(x,
1768:   A^c )\mur(dx)+ \int_{A^c \cap T_3} \krd(y,A)\mur(dy) }\\
1769: &\geq \frac 1 2 \lr{ \frac{\beta l }{4} \mur(A \cap T_3) +
1770:   \frac{\beta l}{4} \mur(A^c  \cap T_3) }\\
1771: &= \frac{\beta l }{8}\lr{\mur(A \cap T_3) 
1772: +\mur(A^c  \cap T_3) }=
1773: \frac{\beta l}{8} \mur(T_3).
1774: \end{align*}
1775: Since by Lemma~\ref{cor:vemp} we can bound
1776: $d(T_1,T_2)\geq \delta l \sqrt{{\pi}/\lr{2d+2}}$ 
1777: we use~(\ref{eq:mu123f}) to
1778: complete the proof.
1779: \end{proof}
1780: 
1781: If we restrict ourselves to Metropolis chains on $\ball$, then
1782: Lemma~\ref{lem:l-bound}  provides a lower bound for 
1783: the local conductance which is independent of the
1784: dimension~$d$. 
1785: As a simple consequence of Theorem~\ref{thm:met-cond} we 
1786: then obtain the following
1787: \begin{corollary}
1788: \label{cor2} 
1789: Assume that $\rho \in \rad(\ball)$ and 
1790: $\delta \le (d+1)^{-1/2}$. 
1791: Then we obtain 
1792: $$
1793: \phi(K_{\rho, \delta} , \mur) \ge\sqrt{\frac \pi 2}
1794: \frac{9 \delta}{1600 \sqrt{d + 1}} e^{-\alpha \delta} .
1795: $$
1796: To maximize $\phi$ we define
1797: \begin{math} % \label{eq:max} 
1798: \delta^* = \min\set{{1}/{\sqrt{d+1}},1 /\a }
1799: \end{math} 
1800: and obtain 
1801: $$
1802: \phi(K_{\rho, \delta^*}, \mur)   
1803: \ge 0.0025 \, \frac{1}{\sqrt{d+1}} 
1804: \min\set{\frac{1}{\sqrt{d+1}},\frac 1 \a } .
1805: $$
1806: \end{corollary}
1807: 
1808: \subsubsection*{Error bounds}
1809: \label{sec:er-mc}
1810: 
1811: For the class $\fad(\Omega)$ 
1812: the above lower conductance bound~(\ref{eq:conductancelb}) 
1813: will yield an error estimate for the problem~(\ref{eq02}).
1814: 
1815: Let $S_n^\delta$ be the 
1816: estimator based on a sample of the local Metropolis Markov
1817: chain with transition $K_{\rho,\delta}$, starting at zero.
1818: To estimate its error
1819: we combine the estimates of the conductance of $K_{\rho,\delta}$
1820: with two results, partially known from the literature.
1821: To formulate the results we note the following. 
1822: The Markov kernel 
1823: $K_{\rho, \delta}$ is reversible with respect to 
1824: $\mur$ and hence induces a self-adjoint operator
1825: $$
1826: K_{\rho, \delta} : L_2 (\Omega,\mur) \to L_2 (\Omega,\mur) .
1827: $$
1828: The spectrum $\sigma (K_{\rho,\delta})$ is contained 
1829: in $[-1, 1]$ and $1 \in \sigma(K_{\rho, \delta})$ 
1830: and we are interested in the second largest eigenvalue 
1831: $$
1832: \beta_{\rho, \delta} := \sup \{ \sigma \in \sigma 
1833: (K_{\rho, \delta}) \mid \sigma \not= 1 \}
1834: $$
1835:  of $K_{\rho,\delta}$. This is motivated by the  extension of  a result 
1836: from~\cite[Cor.~1]{MR1738303} about the worst case
1837: error of $S_n^\delta$, uniformly for $(f,\rho)\in\fad(\Omega)$.  
1838: \begin{lemma}
1839: \label{le:mathescharf} 
1840: $$
1841: \lim_{n \to \infty } \sup_{(f,\rho)\in\fad(\Omega)} 
1842: e(S_n^\delta, (f,\rho))^2 \cdot n =
1843: \sup_{\rho\in\rad(\Omega)}\frac{1+ \beta_{\rho, 
1844: \delta}}{1- \beta_{\rho, \delta}} . 
1845: $$
1846: \end{lemma} 
1847: The proof is given in the appendix. 
1848: For Markov chains which start according to the invariant distribution
1849: $\mur$ the bound is similar, but  more explicit and was given
1850: in~\cite{SOK} and~\cite[Thm.~1.9]{MR1238906}.
1851: 
1852: The relation of the second largest
1853: eigenvalue~$\beta_{\rho, \delta}$ to the conductance 
1854: is given in 
1855: 
1856: \begin{lemma}[Cheeger's Inequality, 
1857: see~\cite{MR1025467,MR930082,MR1238906}]
1858: \label{le:cheeger}
1859: $$
1860: \lambda_{\rho,\delta} :=  1 - \beta_{\rho, \delta} \geq 
1861: \phi^{2}(K_{\rho, \delta}, \mur)/2.
1862: $$
1863: \end{lemma}
1864: 
1865: We are ready to state %  and prove
1866: our main result for 
1867: the Metropolis algorithm $S_n^\delta$, based on the Markov chain 
1868: $K_{\rho, \delta}$, for the class 
1869: $\fad (\ball)$, i.e., when $\Omega\subset \R^{d}$ is the Euclidean unit ball. 
1870: \begin{theorem} 
1871: \label{th5} 
1872: Let $S_n^\delta=\frac 1 n \sum_{j=1}^{n}f(X_{j})$ be the 
1873: estimator based on a sample~$(X_{1},\dots,X_{n})$ of the local Metropolis Markov
1874: chain with transition $K_{\rho, \delta}$, 
1875: where $\delta \le (d+1)^{-1/2}$.
1876: Then
1877: \begin{equation}
1878:   \label{eq:th5}
1879:  \lim_{n \to \infty} \sup_{(f,\rho)\in\fad(\ball)}  
1880: e(S_n^\delta, (f,\rho) ) ^2 \cdot n 
1881: \le \frac{8\cdot 1600^{2}}{81\pi}(d +1)\cdot \frac{e^{2 \alpha \delta}}
1882: {\delta^{2}} . 
1883: \end{equation}
1884:  Again we may choose 
1885: $
1886: \delta^* = \min\set{(d+1)^{-1/2},\alpha^{-1}}
1887: $
1888: and obtain 
1889: \begin{equation} 
1890: \label{tract} 
1891: \lim_{n \to \infty} \sup_{(f,\rho)\in\fad(\ball)} 
1892: e(S_n^{\delta^*} , (f,\rho) ) ^2 \cdot n 
1893: \le 594700 \cdot (d+1)\max\set{d+1,{\alpha^{2}}}. 
1894: \end{equation} 
1895: \end{theorem} 
1896: \begin{proof} 
1897: This follows from Corollary~\ref{cor2}, and 
1898:  Lemmas~\ref{le:mathescharf} and~\ref{le:cheeger}. 
1899: \end{proof} 
1900: 
1901: \section{Summary}
1902: \label{sec:sum}
1903: 
1904: Let us discuss our findings.  %   in some detail.
1905: The results from Section~\ref{s2} clearly indicate that the
1906: superiority of Metropolis algorithms upon 
1907: simpler (non-adaptive) Monte Carlo methods
1908: does not hold in general. Specifically, it does not hold 
1909: for the large classes $\fco$ of input without
1910: additional structure.
1911: 
1912: On the other hand, for the class~$\fad(\ball)$, specific Metropolis
1913: algorithms that are based on local underlying walks are superior to
1914: all non-adaptive methods. 
1915: Even more,   %  , as formula~\eqref{tract} indicates,
1916: on~$\ball$  %   the problem is \emph{tractable} 
1917: %   in $d$ and $\alpha$:
1918: the cost of the algorithm~$S_n^{\delta^*}$, roughly 
1919: given by the number $n$ of evaluations of $\rho$ and $f$, 
1920: increases like a polynomial in $d$ and $\alpha$.
1921: More 
1922: precisely, according to~\eqref{tract}, the asymptotic constant 
1923: $\lim_{n \to \infty} e(S_n^{\delta^*} , \fad(\ball) ) ^2~\cdot~n$
1924: is bounded by a constant times~$\max\set{d^{2}, d\alpha^{2}}$, 
1925: i.e., the complexity grows polynomially in $d$ and $\alpha$
1926: and, for fixed $d$, increases (at most) as $\alpha^{2}$. 
1927: If we only allow non-adaptive methods then this asymptotic constant,
1928: again for fixed $d$,  increases at least as $\alpha^{d}$,
1929: see~\eqref{lo9}. 
1930: 
1931: %E  Hier Modifikationen. 
1932: We believe that this problem is \emph{tractable} in the sense that 
1933: the number of function values to achieve an error $\e$ can be bounded
1934: by 
1935: \begin{equation}   \label{tract2} 
1936: n(\e , \fad(\ball)   ) \,  \le \,  C \,  \e^{-2} \,  d \,  \max ( d, \alpha^2) .
1937: \end{equation} 
1938: We did not prove \eqref{tract2}, however, since Theorem 5 is only a
1939: statement for large $n$. 
1940: 
1941: Notice
1942: that according to Theorem~\ref{th5} the size~$\delta^{\ast}$ of the
1943: underlying balls walk needs to be adjusted both to 
1944: the spatial dimension~$d$ and the
1945: Lipschitz constant~$\alpha$.
1946: 
1947: The analysis of the Metropolis algorithm is based on properties of the
1948: underlying ball walk; in particular we establish uniform ergodicity of
1949: the ball walk for convex bodies~$\Omega\subset \R^{d}$. Also, based
1950: on conductance arguments,  we provide lower bounds for the spectral gap
1951: of the ball walk.   % : If $\delta\sim 1/\sqrt{d}$, then
1952: % Proposition~\ref{pro:phi} together 
1953: % with Lemma~\ref{le:cheeger} show that this is independent
1954: % of the dimension
1955: 
1956: As a consequence,  in the case~$\alpha=0$ the estimate~(\ref{eq:th5})  provides an
1957: error bound for the ball 
1958: walk $(Q_{\delta},\mu)$, which is asymptotically of the form 
1959: $ e(S_{n}^{\delta},L_{2}(\ball,\mu))\leq C \delta^{-1}
1960: (d/n)^{1/2}$.%  This complements the heuristic considerations
1961: % from~\cite[Example~1]{MR1738303}.
1962: 
1963: The results  extend in a similar way to any family 
1964: $\Omega_d \subset \R^{d}$ for which
1965: the underlying local ball walk $Q_{\delta}$ has 
1966: (for $\delta \le \delta_d$) 
1967: a non-trivial lower bound for the
1968: local conductance that is independent of the dimension.
1969: 
1970: Finally, from the results of Section~\ref{s2} we can conclude that adaption
1971: does not help much for the classes $\fco$. 
1972: Hence we have new results concerning the \emph{power of
1973: adaption}, see~\cite{MR1408328} for a survey of earlier results, in
1974: particular that it may help to break the \emph{curse of
1975: dimensionality} for the classes $\fad(\ball)$. 
1976: 
1977: \appendix
1978: \section{Proof of Lemma~\ref{le:mathescharf}}
1979: % \label{app}
1980: 
1981: Lemma~\ref{le:mathescharf} extends the bound 
1982: from~\cite[Thm.~1]{MR1738303}, which deals with a single uniformly
1983: ergodic chain. It was obtained from  on a contraction
1984: property, as stated in~\cite[Prop.~1]{MR1738303}.
1985: The goal of the
1986: present analysis is to establish this asymptotic result 
1987: \emph{uniformly} for all Metropolis chains with density from
1988: $\rad(\Omega)$, by showing that this contractivity holds true uniformly% , which in turn allows to
1989: % extend the proof of Theorem~1 in~\cite{MR1738303}
1990: .
1991: 
1992: \subsection*{Contractivity of the Markov operator}
1993: We assign to each transition kernel $K$ on $\Omega$ with corresponding invariant
1994: distribution $\mu$ the bounded linear mapping $P$, given by
1995: \begin{equation}
1996:   \label{eq:prd}
1997: (P f)(x) := \int f(y) K(x,dy).
1998: \end{equation}
1999: Also we let $E$ denote the mapping which assigns any integrable
2000: function its expectation as a constant function
2001: $
2002: E(f)\colon= \int_\Omega f(x) \, \mu(dx).
2003: $
2004: {F}or each $K$ the mapping $P - E$ is bounded in
2005: $L_{\infty}(\Omega,\mu)$, with norm less than or  equal to one and we
2006: shall strengthen this uniformly for kernels $\krd$ with
2007: $\rho\in\rad(\Omega)$.
2008: Within this operator context 
2009: \emph{uniform ergodicity} is equivalent to a specific
2010: form of quasi-compactness, namely there are $0<\eta<1$ and $n_{0}\in\N$
2011: for which
2012: \begin{equation}\label{eq-infty-con}
2013: \norm{P^{n} - E\colon L_{\infty}(\Omega)
2014: \to L_{\infty}(\Omega)}{}\leq\eta,\ \text{for $n\geq
2015: n_{0}$.}
2016: \end{equation}
2017: We first show that reversibility allows to transfer this to
2018: the spaces~$L_{1}(\Omega,\mu_{\rho})$.
2019: \begin{lemma} \label{lem:infty1}
2020: Suppose that the transition kernel $K$ with corresponding
2021:   mapping $P$  is reversible. Then for all $n\in\N$ we have
2022:   \begin{equation}
2023:     \label{eq:1infty}
2024:     \norm{P^{n} - E\colon L_{1}(\Omega,\mu)
2025:     \to L_{1}(\Omega,\mu)}{}
2026:     \leq  \norm{P^{n} - E\colon L_{\infty}(\Omega,\mu)
2027:     \to L_{\infty}(\Omega,\mu)}{}.
2028:   \end{equation}
2029: % Consequently, if $K$ is uniformly ergodic and reversible, then there
2030: % are $n_{0}\in\N$ and $\eta<1$ such that 
2031: % \begin{equation}
2032: %   \label{eq:1eta}
2033: % \norm{P^{n_{0}} - E\colon L_{1}(\ball,\mu)
2034: % \to L_{1}(\ball,\mu)}{}\leq \eta.
2035: % \end{equation}
2036:   \end{lemma}
2037:   \begin{proof}
2038:     If $K$ is reversible, then so are all iterates $K^{n}$. Thus for
2039:     arbitrary functions $f\in L_{1}(\Omega,\mu)$ and $h\in
2040:     L_{\infty}(\Omega,\mu)$ we have, using the scalar product on
2041:     $L_{2}(\Omega,\mu)$, that
2042: $$
2043: \scalar{(P^{n}- E)f}{h}= \scalar{f}{(P^{n}- E)h}.
2044: $$
2045: Consequently, for any $f\in L_{1}(\Omega,\mu)$ we have
2046: \begin{align*}
2047:   \norm{(P^{n} - E) f}{1} &= \sup_{\norm{h}{\infty}\leq 1}
2048:   \abs{\scalar{(P^{n}- E)f}{h}} =
2049:   \sup_{\norm{h}{\infty}\leq 1} \abs{\scalar{f}{(P^{n}- E)h}}  \\
2050:   &\leq \norm{f}{1} \sup_{\norm{h}{\infty}\leq 1} \norm{(P^{n}-
2051:     E)h}{\infty},
2052: \end{align*}
2053: from which the proof can be completed.
2054: \end{proof}
2055: 
2056: \begin{proposition}\label{pro:unbound}
2057: % Let $\Omega\subset\R^{d}$ be a compact set, 
2058: % and suppose that there is  a lower bound $l>0$ 
2059: % for the local conductance of $Q_{\delta/2}$.
2060: For any convex body $\Omega \subset \R^d$ 
2061: there are an
2062: integer $n_{0}$ and a constant $0<\eta<1$ such that uniformly for
2063: $\rho\in\rad(\Omega)$ we have
2064: \begin{equation}
2065:   \label{eq:uue1}
2066:   \norm{\prd^{n_{0}} - E\colon L_{1}(\Omega,\mu_{\rho})\to
2067:     L_{1}(\Omega,\mu_{\rho})}{}\leq \eta.
2068: \end{equation}
2069: \end{proposition}
2070: 
2071: \begin{proof}
2072: This is  an immediate consequence of the
2073: bound~(\ref{eq:unifbound}). As mentioned in Remark~\ref{rem:unifb}
2074: uniform ergodicity was established uniformly for $\rho\in\rad(\Omega)$.
2075:  It is well known (see~\cite[Thm.~16.2.4]{Meyn-book}) that this
2076: implies % the assertion~(\ref{eq-infty-con}).
2077: % because this yields
2078: that there is an $\eta<1$ such that uniformly for
2079: $\rho\in\rad(\Omega)$ we have
2080: \begin{equation}\label{eq-infty-con2}
2081: \norm{\prd^{n_{0}} - E\colon L_{\infty}(\Omega)
2082: \to L_{\infty}(\Omega)}{}\leq\eta,\ \text{for $n\geq
2083: n_{0}$.}
2084: \end{equation}
2085: In the light of Lemma~\ref{lem:infty1} this yields~(\ref{eq:uue1}).
2086: \end{proof}
2087: 
2088: Finally we sketch the 
2089: \begin{proof}[Proof of Lemma~\ref{le:mathescharf}]
2090: Using Proposition~\ref{pro:unbound} we can extend the
2091: proof of~\cite[Thm.~1]{MR1738303}. 
2092: In particular, the bounds from Eq.~(13)--(15)
2093: in~\cite{MR1738303} tend to zero uniformly for
2094: $\rho\in\rad(\Omega)$. Moreover, starting at zero, 
2095: after one step according to the underlying ball walk, the (new)
2096: initial distribution is uniformly bounded  with respect to the uniform
2097: distribution on $\Omega$, hence also with respect to~$\mur$,
2098: such that we establish the asymptotics in Lemma~\ref{le:mathescharf}.  
2099: \end{proof}
2100: 
2101: \medskip
2102: \noindent
2103: {\bf Acknowledgment:} \
2104: We thank two anonymous referees and Daniel Rudolf for their comments. 
2105: %E  Ackno ist neu. 
2106: 
2107: %\cite{MR2260070,MR2172842}
2108: 
2109: \bibliographystyle{plain}
2110: %\bibliography{ref,mybib}
2111: 
2112: \def\cprime{$'$} \def\cprime{$'$}
2113: 
2114: \begin{thebibliography}{10}
2115: 
2116: \bibitem{MR2260070}
2117: Christophe Andrieu and {\'E}ric Moulines.
2118: \newblock On the ergodicity properties of some adaptive {MCMC} algorithms.
2119: \newblock {\em Ann. Appl. Probab.}, 16(3):1462--1505, 2006.
2120: 
2121: \bibitem{103439}
2122: David Applegate and Ravi Kannan.
2123: \newblock Sampling and integration of near log-concave functions.
2124: \newblock In {\em STOC '91: Proceedings of the twenty-third annual ACM
2125:   symposium on Theory of computing}, pages 156--163, New York, NY, USA, 1991.
2126:   ACM Press.
2127: 
2128: \bibitem{MR2172842}
2129: Yves~F. Atchad{\'e} and Jeffrey~S. Rosenthal.
2130: \newblock On adaptive {M}arkov chain {M}onte {C}arlo algorithms.
2131: \newblock {\em Bernoulli}, 11(5):815--828, 2005.
2132: 
2133: \bibitem{Bachvalov}
2134: N.~S. Bahvalov.
2135: \newblock Approximate computation of multiple integrals.
2136: \newblock {\em Vestnik Moskov. Univ. Ser. Mat. Meh. Astr. Fiz. Him.},
2137:   1959(4):3--18, 1959.
2138: 
2139: \bibitem{B/D06}
2140: F.~Bassetti and P.~Diaconis.
2141: \newblock Examples comparing importance sampling and the {M}etropolis
2142:   algorithm.
2143: \newblock {\em to appear Illinois J. of Math.}, 2006.
2144: 
2145: \bibitem{10.1109/5992.814660}
2146: Isabel Beichl and Francis Sullivan.
2147: \newblock The {M}etropolis algorithm.
2148: \newblock {\em Computing in Science and Engineering}, 2(1):65--69, 2000.
2149: 
2150: \bibitem{10.1109/MCSE.2006.27}
2151: Isabel Beichl and Francis Sullivan.
2152: \newblock Guest editors' introduction: Monte {C}arlo methods.
2153: \newblock {\em Computing in Science and Engineering}, 8(2):7--8, 2006.
2154: 
2155: \bibitem{MR2013000}
2156: Nicolas Bourbaki.
2157: \newblock {\em Functions of a real variable}.
2158: \newblock Elements of Mathematics (Berlin). Springer-Verlag, Berlin, 2004.
2159: 
2160: \bibitem{Burenkov} 
2161: Victor I. Burenkov.
2162: {\em Sobolev Spaces on Domains.} 
2163: Teubner-Texte zur Mathematik 137.
2164: Teubner Verlag Stuttgart, 1998. 
2165: 
2166: \bibitem{MR1284987}
2167: Alan Frieze, Ravi Kannan, and Nick Polson.
2168: \newblock Sampling from log-concave distributions.
2169: \newblock {\em Ann. Appl. Probab.}, 4(3):812--837, 1994.
2170: 
2171: \bibitem{Hlawka}
2172: E.~Hlawka.
2173: \newblock Ausf\"ullung und \"Uberdeckung konvexer K\"orper durch 
2174: konvexe K\"orper.
2175: \newblock {\em Mh. Math. Phys.}, 53:81--131, 1949.
2176: 
2177: \bibitem{MR1025467}
2178: Mark Jerrum and Alistair Sinclair.
2179: \newblock Approximating the permanent.
2180: \newblock {\em SIAM J. Comput.}, 18(6):1149--1178, 1989.
2181: 
2182: \bibitem{MR1318794}
2183: R.~Kannan, L.~Lov{\'a}sz, and M.~Simonovits.
2184: \newblock Isoperimetric problems for convex bodies and a localization lemma.
2185: \newblock {\em Discrete Comput. Geom.}, 13(3-4):541--559, 1995.
2186: 
2187: \bibitem{MR797411}
2188: Ulrich Krengel.
2189: \newblock {\em Ergodic theorems}, volume~6 of {\em de Gruyter Studies in
2190:   Mathematics}.
2191: \newblock Walter de Gruyter \& Co., Berlin, 1985.
2192: 
2193: \bibitem{MR930082}
2194: Gregory~F. Lawler and Alan~D. Sokal.
2195: \newblock Bounds on the {$L\sp 2$} spectrum for {M}arkov chains and {M}arkov
2196:   processes: a generalization of {C}heeger's inequality.
2197: \newblock {\em Trans. Amer. Math. Soc.}, 309(2):557--580, 1988.
2198: 
2199: \bibitem{MR1238906}
2200: L.~Lov{\'a}sz and M.~Simonovits.
2201: \newblock Random walks in a convex body and an improved volume algorithm.
2202: \newblock {\em Random Structures Algorithms}, 4(4):359--412, 1993.
2203: 
2204: \bibitem{olm}
2205: Peter Math{\'e}.
2206: \newblock The optimal error of {M}onte {C}arlo integration.
2207: \newblock {\em J. Complexity}, 11(4):394--415, 1995.
2208: 
2209: \bibitem{MR1738303}
2210: Peter Math{\'e}.
2211: \newblock Numerical integration using {M}arkov chains.
2212: \newblock {\em Monte Carlo Methods Appl.}, 5(4):325--343, 1999.
2213: 
2214: \bibitem{Meyn-book}
2215: S.~P. Meyn and R.~L. Tweedie.
2216: \newblock {\em Markov chains and stochastic stability}.
2217: \newblock Springer-Verlag London Ltd., London, 1993.
2218: 
2219: \bibitem{NOV}
2220: Erich Novak.
2221: \newblock {\em Deterministic and stochastic error bounds in numerical
2222:   analysis}.
2223: \newblock Lect. Notes Math. 1349. Springer-Verlag, Berlin, 1988.
2224: 
2225: \bibitem{MR1319050}
2226: Erich Novak.
2227: \newblock The real number model in numerical analysis.
2228: \newblock {\em J. Complexity}, 11(1):57--73, 1995.
2229: 
2230: \bibitem{MR1408328}
2231: Erich Novak.
2232: \newblock On the power of adaption.
2233: \newblock {\em J. Complexity}, 12(3):199--237, 1996.
2234: 
2235: \bibitem{10.1109/MCSE.2006.30}
2236: Dana Randall.
2237: \newblock Rapidly mixing {M}arkov chains with applications in computer science
2238:   and physics.
2239: \newblock {\em Computing in Science and Engineering}, 8(2):30--41, 2006.
2240: 
2241: \bibitem{MR1399158}
2242: G.~O. Roberts and R.~L. Tweedie.
2243: \newblock Geometric convergence and central limit theorems for multidimensional
2244:   {H}astings and {M}etropolis algorithms.
2245: \newblock {\em Biometrika}, 83(1):95--110, 1996.
2246: 
2247: \bibitem{MR0172183}
2248: C.~A. Rogers.
2249: \newblock {\em Packing and covering}.
2250: \newblock Cambridge Tracts in Mathematics and Mathematical Physics, No. 54.
2251:   Cambridge University Press, New York, 1964.
2252: 
2253: \bibitem{SOK}
2254: A.~Sokal.
2255: \newblock Monte {C}arlo methods in statistical mechanics: foundations and new
2256:   algorithms.
2257: \newblock In {\em Functional integration (Carg\`ese, 1996)}, pages 131--192.
2258:   Plenum, New York, 1997.
2259: 
2260: \bibitem{IBC}
2261: J.~F. Traub, G.~W. Wasilkowski, and H.~Wo{\'z}niakowski.
2262: \newblock {\em Information-based complexity}.
2263: \newblock Academic Press Inc., Boston, MA, 1988.
2264: \newblock With contributions by A. G. Werschulz and T. Boult.
2265: 
2266: \bibitem{vempala-lesson}
2267: Santosh Vempala.
2268: \newblock Lect.~17, {R}andom walks and polynomial time algorithms.
2269: \newblock {http://www-math.mit.edu/\~{}vempala/random/course.html}, 2002.
2270: 
2271: \bibitem{MR2178341}
2272: Santosh Vempala.
2273: \newblock Geometric random walks: a survey.
2274: \newblock In {\em Combinatorial and computational geometry}, volume~52 of {\em
2275:   Math. Sci. Res. Inst. Publ.}, pages 577--616. Cambridge Univ. Press,
2276:   Cambridge, 2005.
2277: 
2278: \end{thebibliography}
2279: 
2280: 
2281: \end{document}
2282: 
2283: 
2284: 
2285: