quant-ph0005122/pp.tex
1: 
2: %
3: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
4: %
5: %   file pp.tex
6: %
7: %   preprint MS-TP-00-4
8: %
9: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
10: %
11: \documentclass[epj]{svjour}
12: %\documentclass[12pt]{article}
13: \usepackage{epsfig}
14: 
15: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
16: \newcommand{\be}{\begin{equation}}
17: \newcommand{\ee}{\end{equation}}
18: \newcommand{\bea}{\begin{eqnarray}}
19: \newcommand{\eea}{\end{eqnarray}}
20: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
21: \newcommand{\melo}[3]{<\!#1\,|\,#2\,|\,#3\!>}
22: \newcommand{\scpo}[2]{\mbox{$<\!#1\,|\,#2\!>$}}
23: \newcommand{\brao}[1]{\mbox{$<\!#1\,|$}}
24: \newcommand{\keto}[1]{\mbox{$|\,#1\!>$}}
25: \newcommand{\sepo}[2]{\mbox{$|\, #1\!><\! #2 \, |$}}
26: \newcommand{\avo}[1]{\mbox{$<\!#1\!>$}}
27: \newcommand{\mel}[3]{\langle #1\,|\,#2\,|\,#3\rangle}
28: \newcommand{\scp}[2]{\mbox{$\langle #1\,|\,#2\rangle$}}
29: \newcommand{\scpBig}[2]{\Big\langle #1\,\Big|\,#2\Big\rangle}
30: \newcommand{\av}[1]{\mbox{$\langle \, #1 \, \rangle$}}
31: \newcommand{\bra}[1]{\mbox{$\langle#1|$}}
32: \newcommand{\ket}[1]{\mbox{$|#1\rangle$}}
33: \newcommand{\sep}[2]{\mbox{$|\, #1\rangle\langle #2 \, |$}}
34: \newcommand{\acom}[2]{\mbox{$[#1,#2]_+$}}
35: \newcommand{\ecom}[2]{\mbox{$[#1,#2]_{\epsilon}$}}
36: \newcommand{\com}[2]{\mbox{$[#1,#2]_-$}}
37: \newcommand{\vmat}[4]{\left(\begin{array}{cc}#1&#2\\#3&#4\end{array}\right)}
38: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
39: % to remove headerbox
40: \renewcommand\makeheadbox{}
41: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
42: 
43: \begin{document}
44: 
45: %\headnote{}
46: \title{Bayesian Reconstruction of Approximately Periodic Potentials
47: at Finite Temperature}
48: \author{J. C.  Lemm\thanks{e-mail: {\tt lemm@uni-muenster.de}}, 
49:         J. Uhlig, A. Weiguny}
50: %\author{J. C.  Lemm\inst{1}, J. Uhlig\inst{1}, A. Weiguny\inst{1}}
51: \authorrunning{J. C. Lemm {\it et al.}}
52: \titlerunning{Bayesian Reconstruction of Approximately Periodic Potentials
53: at Finite Temperature}
54: \institute{
55: Institut f\"ur Theoretische Physik,\\
56: Universit\"at M\"unster, 48149 M\"unster, Germany}
57: %\author{J. C.  Lemm, J. Uhlig, A. Weiguny\\
58: %Institut f\"ur Theoretische Physik I,\\
59: %Universit\"at M\"unster, 48149 M\"unster, Germany}
60: 
61: 
62: %\begin{abstract}
63: \abstract{
64: The paper discusses the reconstruction of potentials 
65: for quantum systems at finite temperatures
66: from observational data.
67: A nonparametric approach is developed, 
68: based on the framework of Bayesian statistics,
69: to solve such inverse problems.
70: Besides the specific model of quantum statistics
71: giving the probability of observational data,
72: a Bayesian approach is essentially based
73: on {\it a priori} information available for the potential.
74: Different possibilities to implement
75: {\it a priori} information
76: are discussed in detail, including hyperparameters,
77: hyperfields, and non--Gaussian auxiliary fields.
78: Special emphasis is put on the reconstruction
79: of potentials with approximate periodicity. 
80: The feasibility of the approach
81: is demonstrated for a numerical model.
82: \PACS{
83: {05.30.-d}{Quantum statistical mechanics}
84: \and
85: {02.50.Rj}{Nonparametric inference}
86: \and
87: {02.50.Wp}{Inference from stochastic processes}
88: }
89: }
90: %\end{abstract}
91: 
92: \date{\today}
93: \maketitle
94: 
95: \tableofcontents
96: %\clearpage
97: 
98: \section{Introduction}
99: 
100: A successful application of quantum mechanics to real world systems
101: relies essentially on an adequate reconstruction of the underlying potential,
102: describing the forces governing the system.
103: %necessary to set up the system Hamiltonian.
104: The reconstruction of potentials or forces
105: from available observational data
106: defines an empirical learning task.
107: It also constitutes a typical example of an inverse problem.
108: Such problems are notoriously ill--defined in the sense of Tikhonov
109: \cite{Tikhonov-Arsenin-1977,Kirsch-1996,Vapnik-1998,Honerkamp-1998}.
110: In that case additional {\it a priori} information
111: is required to yield a unique and stable solution.
112: A Bayesian framework 
113: is especially well suited to include both observational data and 
114: {\it a priori} information in a quite flexible manner.
115: 
116: Inverse scattering theory
117: \cite{Newton-1989,Chadan-Sabatier-1989,Chadan-Colton-Paivarinta-Rundell-1997} 
118: and inverse spectral theory 
119: \cite{Gelfand-Levitan-1951,Kac-1966,Marchenko-1986,Zakhariev-Chabanov-1997}
120: are two classical research fields
121: which deal in particular with the reconstruction of potentials
122: from spectral data.
123: Both theories describe the kind of data 
124: which are necessary, in addition to a given spectrum, 
125: to determine a potential uniquely.
126: In inverse scattering theory these additional data
127: are for example phase shifts, obtained far away from the scatterer.
128: For the bound state problems studied in inverse spectral theory 
129: these additional data may consist of a second spectrum
130: obtained for boundary conditions different from 
131: those for the first spectrum.
132: The approach pursued in the following,
133: referred to as Bayesian Inverse Quantum Mechanics (BIQM),
134: is not exclusively designed for spectral data
135: but is able to work with quite arbitrary observational data
136: \cite{Lemm-IQS-2000}.
137: It can thus be easily adapted to a large variety of
138: different reconstruction scenarios 
139: \cite{Lemm-BFT-1999,Lemm-TDQ-2000,Lemm-IHF-2000}.
140: 
141: The basics of a Bayesian framework are summarized in Section \ref{bayesian}.
142: Setting up a Bayesian approach for a specific application area
143: requires the definition of two basic probabilistic models.
144: First, a {\it likelihood model} is needed
145: giving, for each possible potential,
146: the probability of the observational data.
147: The likelihood model of quantum statistics
148: is discussed in Section \ref{Likelihood-model}.
149: Second, 
150: a {\it prior model} has to be chosen to implement available 
151: {\it a priori} information.
152: Prior models
153: which are useful for inverse quantum statistics
154: are presented in Section \ref{Prior-models}.
155: Technically the most convenient prior models
156: are Gaussian processes, presented in Section \ref{Gaussian-processes}.
157: Section \ref{Covariances-and-approximate-symmetries}
158: shows how covariance and mean of a Gaussian process
159: can be related to {\it a priori} information
160: about approximate symmetries of the potentials to be reconstructed.
161: Section \ref{Approximate-periodicity}
162: concentrates on approximate periodicity,
163: Section \ref{discontinuities} on potentials with discontinuities.
164: Prior models are made more flexible by using {\it hyperparameters}
165: (Section \ref{hyperparameter}), or more general
166: {\it hyperfields},
167: being function hyperparameters (Section \ref{hyperfields}).
168: Related non--Gaussian priors
169: are the topic of Section \ref{Non--Gaussian-priors}.
170: Having defined likelihood and prior models,
171: Section \ref{stationarity-equations}
172: discusses the equations to be solved
173: for reconstructing a potential. 
174: Finally, 
175: Section \ref{numerical}
176: presents numerical applications.
177: 
178: 
179: \section{Bayesian approach}
180: \label{bayesian}
181: 
182: Empirical learning is based on observational data $D$.
183: In particular, we will distinguish ``dependent'' variables $x$,
184: representing measurement results,
185: and ``independent'' variables $O$,
186: characterizing the kind of measurement performed.
187: In the context of inverse quantum mechanics 
188: the latter denotes the {\it observables} which are measured.
189: Such observables may for example be 
190: the position, the momentum, or the energy of a quantum particle.
191: Variables $x$ and $O$ are assumed to be measurable
192: and represent therefore {\it visible} variables.
193: Observational data will be assumed to consist of $n$ pairs
194: $D$ = $\{(x_i,O_i)|1\le i\le n\}$ = $(x_T,O_T)$,
195: where $x_T$ and $O_T$ denote the vectors
196: with components $x_i$ or $O_i$, respectively.
197: Such data will also be called {\it training data}.
198: In empirical learning one tries to extract a 
199: ``general law'' from observations.
200: In this paper the quantum potential $V$ 
201: to be reconstructed will represent this ``general law''.
202: (Similarly, in the Bayesian reconstruction of quantum states
203: the object to be reconstructed is the density operator
204: of an unknown state 
205: \cite{Helstrom:1976,Holevo:1982,Tan:1997,Buzek-Drobny-Derka-Adam-Wiedemann:1998}.)
206: Potentials, considered not to be directly observable,
207: represent in our context the
208: {\it hidden} or {\it latent} variables.
209: We will now use the Bayesian framework to relate
210: unobservable potentials to observational data.
211: 
212: 
213: The Bayesian approach is a general probabilistic framework
214: to deal with empirical learning problems 
215: \cite{Bayes-1763,Berger-1980,Loredo-1990,Bernado-Smith-1994,Gelman-Carlin-Stern-Rubin-1995,Sivia-1996,Carlin-Louis-1996,Lemm-BFT-1999}.
216: Predicting results of future measurements 
217: on the basis of given training data
218: is achieved by means of 
219: the {\it predictive probability} 
220: $p(x|O,D)$
221: (or predictive density for continuous $x$), 
222: which is the probability 
223: of finding the value $x$ when measuring observable $O$
224: under the condition that the training data $D$ are given.
225: To calculate the predictive
226: probability a probabilistic model 
227: is needed
228: which describes the measurement process.
229: Such a model is specified by
230: giving  the probability  $p(x|O,V)$
231: of finding $x$ when measuring observable $O$
232: for each possible potential $V$.
233: As $p(x|O,V)$, considered as function of $V$ for fixed $x$ and $O$,
234: is known as the likelihood of $V$,
235: we will call this the {\it likelihood model}.
236: For inverse quantum problems
237: the likelihood model is given by the axioms of quantum mechanics
238: and will be discussed in Section \ref{Likelihood-model}.
239: 
240: According to the rules of probability theory
241: the predictive probability can now be written as an integral
242: over the space of all possible potentials $V$,
243: \be
244: p(x|O,D)
245: = \int \!dV\, p(x|O,V)\, p(V|D)
246: .
247: \label{predictive}
248: \ee 
249: We note that in Eq.~(\ref{predictive}) we have assumed
250: that the probability of $x$ is completely determined 
251: by giving potential and observable
252: and does not depend on the training data, $p(x|O,V,D)$ = $p(x|O,V)$,
253: and
254: that the probability of the potential given the training data
255: does not depend on the observables selected in the future,
256: $p(V|O,D)$ = $p(V|D)$.
257: If the set of possible potentials is a space of functions, 
258: the integral in (\ref{predictive}) is a functional integral.
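
The step leading to Eq.~(\ref{predictive}) can be spelled out as follows.
Marginalizing over $V$ and applying the product rule of probability theory gives
\[
p(x|O,D)
= \int \!dV\, p(x,V|O,D)
= \int \!dV\, p(x|O,V,D)\, p(V|O,D)
,
\]
and inserting the two assumptions stated above,
$p(x|O,V,D)$ = $p(x|O,V)$ and $p(V|O,D)$ = $p(V|D)$,
reproduces Eq.~(\ref{predictive}).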
259: 
260: As the likelihood model is assumed to be given,
261: learning consists in the determination of $p(V|D)$,
262: known as the {\it posterior} for $V$.
263: To this end, we relate the 
264: posterior for $V$ to the
265: likelihood of $V$ under the training data
266: by applying Bayes' theorem,
267: \be
268: p(V|D)
269: =
270: \frac{p(x_T|O_T,V)\,p(V)}{p(x_T|O_T)}
271: ,
272: \label{bayestheorem}
273: \ee
274: assuming $p(V|O_T)$ = $p(V)$,
275: analogous to Eq.~(\ref{predictive}).
276: In the numerator of Eq.~(\ref{bayestheorem})
277: appears, besides the likelihood, 
278: the so-called prior $p(V)$.
279: This prior gives the probability of $V$ 
280: {\it before} training data have been collected.
281: Hence it has to comprise all 
282: {\it a priori} information
283: available for the potential.
284: The need for a prior model,
285: complementing the likelihood model,
286: is characteristic of a Bayesian approach.
287: The denominator in Eq.~(\ref{bayestheorem})
288: plays the role of a normalization factor
289: and can be obtained from likelihood and prior
290: by integration over $V$ as
291: $p(x_T|O_T)$ 
292: = $\int \!dV\,p(x_T|O_T,V)\,p(V)$.
293: 
294: From a Bayesian perspective learning appears as
295: updating the probability for $V$ caused by the arrival of new data $D$.
296: If more data become available
297: this process can be iterated, 
298: the old posterior becoming the new prior
299: which is then updated yielding a new posterior.
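
As a sketch of this iteration, assume that the training data
are split into two batches,
$D_1$ = $(x_{T_1},O_{T_1})$ and $D_2$ = $(x_{T_2},O_{T_2})$
(a notation introduced here for illustration only),
which are independent for given $V$.
Under the same kind of assumptions as used for Eq.~(\ref{bayestheorem}),
Bayes' theorem then yields
\[
p(V|D_1,D_2)
=
\frac{p(x_{T_2}|O_{T_2},V)\,p(V|D_1)}{p(x_{T_2}|O_{T_2},D_1)}
,
\]
so that the posterior $p(V|D_1)$ obtained from the first batch
takes the place of the prior $p(V)$ when the second batch is processed.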
300: 
301: 
302: In practice, a major difficulty is the calculation of 
303: the integral over all possible $V$
304: to get the predictive probability (\ref{predictive}). 
305: Even if one resorts to a discrete approximation for $x$
306: the integral (\ref{predictive}) is
307: typically still very high dimensional.
308: The key point is thus to find a feasible
309: approximation for  that integral.
310: Two approaches are common in Bayesian statistics.
311: The first one is an evaluation of the integral
312: by Monte Carlo methods 
313: \cite{Gelman-Carlin-Stern-Rubin-1995,Metropolis-Rosenbluth-Rosenbluth-Teller-Teller-1953,Binder-Heermann-1988,Neal-1997}.
314: The second one, which we will pursue in the following, 
315: is the so-called {\it maximum a posteriori approximation} (MAP),
316: being a variant of the saddle point method
317: \cite{Berger-1980,Gelman-Carlin-Stern-Rubin-1995,De-Bruijn-1981,Bleistein-Handelsman-1986,Girosi-Jones-Poggio-1995,Lemm-1996,Lemm-1998}.
318: In MAP one assumes the posterior to be sufficiently peaked
319: around the potential $V^*$ which maximizes the posterior,
320: so that approximately
321: \be
322: p(x|O,D) \approx p(x|O,V^*)
323: ,
324: \ee
325: with 
326: \be
327: V^* 
328: = {\rm argmax}_{V\in{\cal V}} p(V|D)
329: = {\rm argmax}_{V\in{\cal V}} p(x_T|O_T,V) p(V)
330: .
331: \label{map-eq}
332: \ee
333: Maximizing the posterior with respect to $V\in{\cal V}$
334: means,  according to Eq.~(\ref{bayestheorem})
335: with the denominator independent of $V$,
336: maximizing the product of likelihood and prior.
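
Because the logarithm is a monotonic function,
Eq.~(\ref{map-eq}) can equivalently be written as a minimization problem,
\[
V^* 
= {\rm argmin}_{V\in{\cal V}} 
\Big[ -\ln p(x_T|O_T,V) - \ln p(V) \Big]
.
\]
In this form the negative log--likelihood acts as a data term
and the negative log--prior as a regularization term.
The logarithmic form is often more convenient in numerical work,
because a product over many probabilities is replaced by a sum.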
337: 
338: The Bayesian framework discussed
339: so far can analogously be applied to a variety of different contexts,
340: including regression, density estimation and classification
341: problems \cite{Lemm-BFT-1999}.
342: The case of a Gaussian likelihood with fixed variance, for example,
343: is known as a regression problem,
344: while problems with general likelihoods
345: are known as density estimation.
346: %For BIQM we have to choose the specific likelihood model
347: %for quantum systems discussed in the next section.
348: 
349: 
350: 
351: \section{Likelihood model of quantum statistics}
352: \label{Likelihood-model}
353: 
354: The first step in applying the Bayesian framework
355: to inverse problems of quantum mechanics or quantum statistics
356: is the definition of the likelihood model \cite{Lemm-IQS-2000}.
357: This is easily obtained from the axioms of quantum mechanics.
358: Consider a system prepared in a state described by
359: a density operator $\rho$. 
360: As our aim will be to reconstruct potentials $V$
361: from observational data,
362: we have to choose a $\rho$ which depends on the potential.
363: The probability to find value $x$,
364: when measuring an observable represented by the Hermitian operator $O$,
365: is given by
366: \be
367: p(x|O,V) 
368: = {\rm Tr} 
369: \Big(P_O(x) \, \rho(V) \Big)
370: ,
371: \label{qm-likelihood}
372: \ee
373: where 
374: $P_O(x)$ = $\sum_\zeta \sep{x,\zeta}{x,\zeta}$
375: denotes the projector on the space of 
376: (orthonormalized) eigenfunctions $\ket{x,\zeta}$ 
377: of $O$ with eigenvalue $x$ and
378: the variable $\zeta$ distinguishes 
379: eigenfunctions with degenerate eigenvalues.
380: 
381: In particular, for a canonical ensemble
382: at temperature $1/\beta$ 
383: (setting  Boltzmann's constant equal to 1)
384: the density operator reads
385: \be
386: \rho = 
387: \frac{e^{-\beta H}}{{\rm Tr\,} e^{-\beta H}}
388: .
389: \label{canonical}
390: \ee
391: To be specific, we will study in the following Hamiltonians of the form
392: $H$ = $T + V$,
393: with kinetic energy 
394: $T$ = $-(1/2m)\Delta$
395: (with Laplacian $\Delta$, mass $m$, 
396: and setting $\hbar$ = $1$)
397: and a 
398: local potential
399: \be
400: V(x,x^\prime) = 
401: v(x) \delta (x-x^\prime )
402: ,
403: \ee
404: defined by the function $v(x)$.
405: Note that the formalism presented in the following 
406: works with nonlocal potentials as well;
407: numerical calculations, however, would in that case be more
408: demanding.
409: For the likelihood models corresponding to
410: time--dependent quantum systems
411: and to many--body systems 
412: in Hartree--Fock approximation
413: we refer to \cite{Lemm-TDQ-2000,Lemm-IHF-2000}.
414: 
415: 
416: In the following we will study observational data
417: consisting of $n$ position measurements $x_i$.
418: This corresponds to choosing the position operator 
419: for the observables $O_i$ = $\hat x$
420: with 
421: $\hat x \ket{x_i}$ = $x_i\ket{x_i}$.
422: Hence, for a canonical ensemble, 
423: the likelihood (\ref{qm-likelihood})
424: becomes
425: for a single position measurement
426: \be
427: p(x_i|\hat x,v) 
428: =\sum_\alpha p_\alpha |{\phi}_\alpha(x_i)|^2 
429: =\av{|{\phi} (x_i)|^2}
430: \label{pos-likelihood}
431: \ee
432: with (non--degenerate) eigenfunctions ${\phi}_\alpha$ of $H$
433: and energies $E_\alpha$,
434: i.e.,
435: $H\ket{{\phi}_\alpha}$ = $E_\alpha \ket{{\phi}_\alpha}$.
436: Angular brackets $\av{\cdots}$
437: denote a thermal expectation 
438: under the probabilities
439: $p_\alpha$ = 
440: $\exp(-\beta E_\alpha)/Z$ with
441: $Z$ = $\sum_\alpha \exp (-\beta E_\alpha)$
442: according to Eq.~(\ref{canonical}).
443: For independent data $D_i$ = $(x_i,O_i)$,
444: \be
445: p(x_T|O_T,v) 
446: = \prod_{i=1}^n p(x_i|\hat x,v) 
447: = \prod_{i=1}^n \av{|{\phi} (x_i)|^2}
448: .
449: \ee
450: A quantum mechanical measurement changes 
451: the state of the system, i.e., it changes $\rho$.
452: Hence, to obtain independent data under constant $\rho$
453: requires the density operator 
454: to be restored before each measurement.
455: For a canonical ensemble this means 
456: to wait between two consecutive observations
457: until the system is thermalized again.
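
The step from Eq.~(\ref{qm-likelihood}) to Eq.~(\ref{pos-likelihood})
may be spelled out as follows.
For a position measurement the projector is $\sep{x_i}{x_i}$,
and inserting the spectral representation
$e^{-\beta H}$ = $\sum_\alpha e^{-\beta E_\alpha}\sep{\phi_\alpha}{\phi_\alpha}$
into Eq.~(\ref{canonical}) gives
\[
p(x_i|\hat x,v) 
= \frac{1}{Z}\sum_\alpha e^{-\beta E_\alpha}\,
{\rm Tr} \Big(\sep{x_i}{x_i}\;\sep{\phi_\alpha}{\phi_\alpha}\Big)
= \sum_\alpha p_\alpha\, |\phi_\alpha(x_i)|^2
,
\]
with $\phi_\alpha(x_i)$ = $\scp{x_i}{\phi_\alpha}$.
For independent data the log--likelihood is therefore a sum over the data points,
\[
\ln p(x_T|O_T,v) 
= \sum_{i=1}^n \ln \av{|{\phi} (x_i)|^2}
.
\]
As a simple check, in the zero temperature limit $\beta\rightarrow\infty$
only the ground state contributes,
provided it is non--degenerate as assumed above,
so that the likelihood of a single measurement
reduces to $|\phi_0(x_i)|^2$,
where $\phi_0$ denotes the ground state
(a label introduced here for illustration).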
458: 
459: Choosing a parametric family of potentials
460: $v(x;\xi)$
461: one could now
462: maximize the likelihood
463: with respect to the parameters $\xi$,
464: and choose as reconstructed potential
465: \be
466: v^*(x) = v(x;\xi^*)
467: \quad \mbox{with} \quad
468: \xi^* 
469: = \mbox{\rm argmax}_{\xi} \, p(x_T|O_T,v(\xi) )
470: .
471: \label{ml-eq}
472: \ee
473: This is known as the maximum likelihood approximation 
474: and works well 
475: if the number of data is large compared to the 
476: flexibility of the selected parametric family of potentials.
477: This method does not, however, yield a unique optimal potential
478: if the flexibility is too large for the available number of observations.
479: (A possible measure of the ``flexibility'' of a parametric family
480: is given by the Vapnik-Chervonenkis dimension \cite{Vapnik-1998}
481: or variants thereof.)
482: In such cases, the inclusion of additional restrictions on $v$ 
483: in the form of {\it a priori} information is essential.
484: This holds especially for nonparametric approaches,
485: where each number $v(x)$ is treated
486: as an individual degree of freedom.
487: Including {\it a priori} information
488: generalizes the 
489: maximum likelihood approximation 
490: of Eq.~(\ref{ml-eq})
491: to the MAP of Eq.~(\ref{map-eq}).
492: 
493: 
494: 
495: 
496: 
497: 
498: 
499: \section{Prior models}
500: \label{Prior-models}
501: 
502: \subsection{Gaussian processes}
503: \label{Gaussian-processes}
504: 
505: A finite number of observational data cannot
506: completely determine a function $v(x)$.
507: Hence,  besides observational data,
508: additional {\it a priori} information
509: is necessary to reconstruct a potential in BIQM.
510: In nonparametric approaches
511: it is advantageous to formulate
512: {\it a priori} information
513: directly in terms of the function $v(x)$ itself.
514: A convenient choice for a prior is a Gaussian process,
515: \be
516: p(v) 
517: =
518: \left(\det \frac{{\bf K}_0}{2\pi}\right)^\frac{1}{2}
519: e^{-\frac{1}{2} \mel{v-v_0}{{\bf K}_0}{v-v_0}}
520: ,
521: \label{gaussprior}
522: \ee
523: where 
524: \be
525: \mel{v-v_0}{{\bf K}_0}{v-v_0}
526: = 
527: \ee
528: \[
529: \int\! dx \,dx^\prime\, [v(x)-v_0(x)]{\bf K}_0(x,x^\prime)
530: [v(x^\prime)-v_0(x^\prime)].
531: \]
532: The function $v_0$ is the mean or regression function,
533: representing a reference potential or template for $v$.
534: The inverse covariance ${\bf K}_0$
535: is a real symmetric, positive (semi)definite operator 
536: which acts on potentials rather than on wave functions
537: and defines
538: a distance measure on the space of potentials.
539: For technical convenience one may introduce explicitly
540: a factor $\lambda$ multiplying  ${\bf K}_0$
541: to balance the influence of the prior 
542: against the likelihood term.
543: A Gaussian prior as in Eq.~(\ref{gaussprior})
544: is already a quite flexible tool 
545: for implementing {\it a priori} knowledge.
546: A bias towards smooth
547: functions $v(x)$, for instance, 
548: can be implemented by choosing the negative Laplacian as inverse
549: covariance
550: ${\bf K}_0$ = $-\Delta$.
551: Including higher derivatives in ${\bf K}_0$ 
552: would result in even smoother potentials,
553: in the sense that higher derivatives of $v(x)$ become continuous.
554: For example,
555: a common smoothness prior used for regression problems is 
556: the Radial Basis Function prior
557: ${\bf K}_0$ = $\exp{(-{\sigma_{\rm RBF}^2}{\Delta}/2)}$
558: \cite{Girosi-Jones-Poggio-1995}.
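
To illustrate the Laplacian smoothness prior,
consider as a sketch a discretization chosen here for illustration:
a one--dimensional mesh of $N$ points with values $v_j$,
unit lattice spacing, periodic boundary conditions $v_{N+1}$ = $v_1$,
and the standard second difference
$(\Delta v)_j$ = $v_{j+1}-2v_j+v_{j-1}$.
For ${\bf K}_0$ = $-\Delta$ and $v_0\equiv 0$
the quadratic form in the exponent of Eq.~(\ref{gaussprior}) then becomes
\[
\frac{1}{2} \mel{v}{{\bf K}_0}{v}
= -\frac{1}{2} \sum_{j=1}^N v_j\, (v_{j+1}-2v_j+v_{j-1})
= \frac{1}{2} \sum_{j=1}^N (v_{j+1}-v_j)^2
.
\]
Large differences between neighboring values are thus penalized,
while constant potentials are not penalized at all,
in accordance with ${\bf K}_0$ being only positive semidefinite.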
559: 
560: 
561: 
562: \subsection{Covariances and approximate symmetries}
563: \label{Covariances-and-approximate-symmetries}
564: 
565: Prior information on potentials $v$ can often be related to 
566: approximate invariance under specific transformations
567: \cite{Lemm-BFT-1999}.
568: Typical examples of such transformations are symmetry operations
569: like translations or rotations.
570: To be specific, assume that
571: a (not necessarily local) potential $V$ 
572: commutes approximately, but not exactly,
573: with some unitary operator $S$, 
574: \be
575: V \approx S^\dagger V S = {\bf S} V
576: ,
577: \ee
578: which defines an operator ${\bf S}$
579: acting on $V$.
580: In particular, we may choose a prior 
581: $p(V)\propto\exp \{-E_{S}(V)\}$ with
582: a {\it prior energy}
583: \be
584: E_S
585: = \frac{1}{2}\scp{V-{\bf S}V}{V-{\bf S}V}
586: = \frac{1}{2}\mel{V}{{\bf K}_{0}}{V}
587: .
588: \ee
589: This shows that the expectation 
590: of an approximate symmetry of $V$ under $S$
591: can be implemented by choosing a Gaussian prior with
592: inverse covariance operator
593: \be
594: {\bf K}_0 = 
595: ({\bf I}-{\bf S})^\dagger ({\bf I}-{\bf S})
596: ,
597: \ee
598: where ${\bf I}$ denotes the identity operator. 
599: Symmetry operations $S(\theta)$,
600: with corresponding ${\bf S}(\theta)$,
601: may depend on a parameter (vector) $\theta$.
602: Approximate invariance under $S(\theta_i)$ 
603: for several $\theta_i$
604: can be implemented by using the sum 
605: (or integral, for continuous variables)
606: \bea
607: E_S 
608: &=& \frac{1}{2}
609: %\int \! d\theta\, \scp{V-{\bf S}(\theta)V}{V-{\bf S}(\theta)V}
610: \sum_i \scp{V-{\bf S}(\theta_i)V}{V-{\bf S}(\theta_i)V}
611: %= \frac{1}{2}\int \! d\theta\, \mel{V}{{\bf K}_{0}(\theta)}{V}
612: \nonumber\\
613: &=& \frac{1}{2}\sum_i \mel{V}{{\bf K}_{0}(\theta_i)}{V}
614: .
615: \eea
616: Alternatively, one may 
617: require approximate symmetry for only one value of $\theta$,
618: not fixed {\it a priori}.
619: For example, one may expect an approximately periodic potential
620: with unknown periodicity length $\theta$
621: which also has to be determined from the data.
622: Such $\theta$ are known as {\it hyperparameters}
623: and will be discussed in Section \ref{hyperparameter}.
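
As a simple illustration, not taken from the applications discussed below,
consider for a local potential on a symmetric interval
the reflection $({\bf S} v)(x)$ = $v(-x)$.
Since this ${\bf S}$ is symmetric and fulfills ${\bf S}^2$ = ${\bf I}$,
one finds
\[
{\bf K}_0 
= ({\bf I}-{\bf S})^\dagger ({\bf I}-{\bf S})
= 2\,({\bf I}-{\bf S})
,\qquad
E_S = \frac{1}{2}\int \!dx\; |v(x)-v(-x)|^2
.
\]
This prior energy vanishes exactly for even potentials
and grows with the size of the odd part of $v$.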
624: 
625: Lie groups
626: are continuously parameterized transformations
627: \be
628: {\bf S}(\theta)
629: =e^{\sum_i\theta_i {\bf s}_i}
630: ,
631: \ee
632: where $\theta_i$ are the real parameters 
633: and the ${\bf s}_i$ = $-{\bf s}_i^T$ 
634: (the superscript ${}^T$ denoting the transpose)
635: are antisymmetric operators
636: representing the generators 
637: of the infinitesimal transformations 
638: of the Lie group.
639: We can define a prior energy as
640: an error measure with respect to an
641: infinitesimal transformation,
642: \bea
643: E_S &=&
644: \frac{1}{2}
645: \sum_i  
646: \scpBig{\frac{{V} - (1 + \theta_i {\bf s}_i) {V}}{\theta_i}}
647:     {{\frac{{V} - (1 + \theta_i {\bf s}_i){V}}{\theta_i}}} 
648: \nonumber\\
649: &=&
650: \frac{1}{2}
651: \mel{{V}}{\sum_i {\bf s}_i^T {\bf s}_i}{{V}}
652: \label{Lie-error}
653: .
654: \eea
655: For instance, a Laplacian smoothness prior for a local potential $v(x)$
656: can be related to an approximate symmetry 
657: under infinitesimal translations.
658: For the group of 
659: $d$--dimensional translations which is generated
660: by the gradient operator $\nabla$
661: this can be verified by recalling the multidimensional Taylor formula 
662: for expanding ${v}$ around $x$ 
663: \be
664: {\bf S}(\theta) {v}(x) 
665: = e^{ \sum_i \theta_i \nabla_i } {v}(x)
666: = \sum_{k=0}^\infty 
667: \frac{\left(\sum_i \theta_i \nabla_i\right)^{k}}{k!} {v}(x)
668: = {v}(x+\theta).
669: \ee
670: Up to first order 
671: ${\bf S} \approx  1+\sum_i\theta_i \nabla_i$.
672: Hence, for infinitesimal translations,
673: the error measure of Eq.\ (\ref{Lie-error}) becomes
674: \bea
675: E_S 
676: &=&
677: \frac{1}{2}\sum_i  
678: \scpBig{\frac{{v} -(1 + \theta_i {\nabla}_i) {v}}{\theta_i}}
679:        {{\frac{{v} - (1 + \theta_i {\nabla}_i){v}}{\theta_i}}}
680: \nonumber\\
681: &=&
682: -\frac{1}{2}\mel{{v}}{\Delta}{{v}}
683: ,
684: \eea
685: assuming vanishing boundary terms.
686: This is the classical Laplacian smoothness term.
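
The last step can be made explicit:
the quotient in the scalar product reduces to
$[v-(1+\theta_i\nabla_i)v]/\theta_i$ = $-\nabla_i v$,
and an integration by parts gives
\[
\frac{1}{2}\sum_i \int \!dx\; \big(\nabla_i v(x)\big)^2
= -\frac{1}{2}\int \!dx\; v(x) \sum_i \nabla_i^2\, v(x)
= -\frac{1}{2}\mel{{v}}{\Delta}{{v}}
.
\]
The boundary terms dropped here vanish, for instance,
for periodic boundary conditions
or for potentials $v$ which vanish at the boundary.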
687: 
688: 
689: 
690: \subsection{Approximate periodicity}
691: \label{Approximate-periodicity}
692: 
693: In this paper we will in particular be interested
694: in potentials which are approximately periodic.
695: To measure the deviation from exact periodicity
696: for a local potential $v(x)$ 
697: let us define the difference operators
698: \bea
699: \left(\nabla^{R}_\theta v\right)(x)
700: &=&
701: {v}(x+\theta)-{v}(x),
702: \\
703: \left(\nabla^{L}_\theta v\right)(x)
704: &=&
705: {v}(x)-{v}(x-\theta).
706: \eea
707: For periodic boundary conditions
708: $(\nabla^{L}_\theta)^T$
709: =
710: $-\nabla^{R}_\theta$,
711: where $(\nabla^{L}_\theta)^T$ 
712: denotes the transpose of $\nabla^{L}_\theta$.
713: Hence, the operator
714: \be
715: -\Delta_\theta 
716: = -\nabla^L_\theta\nabla^R_\theta
717: = (\nabla^R_\theta)^T\nabla^R_\theta
718: \ee
719: defined in analogy to the negative Laplacian,
720: is positive (semi)\-definite,
721: and a possible prior energy 
722: is an error term 
723: which measures the deviation from exact periodicity
724: for given period $\theta$,
725: \bea
726: E_S 
727: &=&\frac{1}{2}\int \!dx\; |{v}(x)-{v}(x+\theta)|^2
728: \nonumber\\
729: &=&
730:  \frac{1}{2}
731: \scp{\nabla^R_\theta{v}}{\nabla^R_\theta {v}}
732: \nonumber\\
733: &=&
734: -\frac{1}{2}
735: \mel{{v}}{\Delta_\theta}{{v}}
736: .
737: \label{periodic-error}
738: \eea
739: After discretizing $v$,
740: the operator $\nabla^R_\theta$  
741: for periodic boundary conditions
742: becomes, 
743: for example on a mesh with six points and $\theta$ = $2$, 
744: the matrix
745: \be
746: \nabla^R_\theta  = 
747: \left(
748: \begin{tabular}{    c     c     c     c     c    c }
749:                    $-1$&  0  & $1$ &  0  &  0  & 0   \\
750:                     0  & $-1$&  0  & $1$ &  0  & 0   \\
751:                     0  &  0  & $-1$&  0  & $1$ & 0   \\
752:                     0  &  0  &  0  & $-1$&  0  & $1$ \\
753:                    $1$ &  0  &  0  &  0  & $-1$  & 0 \\
754:                     0  & $1$ &  0  &  0  &  0  & $-1$\\
755: \end{tabular}
756: \right)
757: ,
758: \ee
759: so that
760: \be
761: -\Delta_\theta  = 
762: \left(
763: \begin{tabular}{    c     c     c     c     c    c }
764:                     2  &  0  & $-1$&  0  & $-1$& 0     \\
765:                     0  &  2  &  0  & $-1$&  0  & $-1$  \\
766:                    $-1$&  0  &  2  &  0  & $-1$& 0     \\
767:                     0  & $-1$&  0  &  2  &  0  & $-1$  \\
768:                    $-1$&  0  & $-1$&  0  &  2  & 0   \\
769:                     0  & $-1$&  0  & $-1$&  0  & 2  \\
770: \end{tabular}
771: \right)
772: .
773: \ee
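
As a quick check of these matrices
(with example vectors chosen here for illustration),
take a vector with period two,
$v$ = $(a,b,a,b,a,b)^T$.
Every row of $\nabla^R_\theta$ then gives zero,
so that $\nabla^R_\theta v$ = $0$ and the discretized
prior energy (\ref{periodic-error}) vanishes.
For the single peak $v$ = $(1,0,0,0,0,0)^T$, on the other hand,
\[
\nabla^R_\theta v = (-1,0,0,0,1,0)^T
,\qquad
E_S 
= \frac{1}{2}\, (\nabla^R_\theta v)^T (\nabla^R_\theta v)
= 1
,
\]
in agreement with 
$\frac{1}{2}\, v^T (-\Delta_\theta)\, v$ = 
$\frac{1}{2}\,(-\Delta_\theta)_{11}$ = $1$.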
774: 
775: 
776: As every periodic function with ${v}(x)={v}(x+\theta)$
777: is in the null space of $\Delta_\theta$,
778: typically another error term has to be added 
779: to get a unique maximum of the posterior.
780: For example, combining 
781: a prior energy (\ref{periodic-error})
782: with a Laplacian smoothness term yields
783: a Gaussian prior of the form (\ref {gaussprior})
784: with inverse covariance
785: ${\bf K}_0$ = $-\lambda (\Delta+\gamma \Delta_\theta)$
786: and prior energy
787: \be
788: E_S =
789: -\frac{\lambda}{2}
790: \mel{{v}}{\Delta+\gamma \Delta_\theta}{{v}}
791: \label{periodic-cov}
792: ,
793: \ee
794: with weighting factors $\lambda$, $\gamma$.  
795: In case the period $\theta$ is not known, it can be treated
796: as a hyperparameter, as will be discussed in Section \ref{hyperparameter}.
797: Clearly, a nonzero reference potential $v_0$ can be included
798: in Eq.~(\ref{periodic-cov}).
799: In Eq.~(\ref{periodic-error}),
800: one may also sum over several periods
801: \be
802: E_S 
803: = \frac{1}{2} \sum_{k=1}^{k_{\rm max}} 
804: w(k) \int \!dx\; |{v}(x)-{v}(x+k \theta)|^2
805: ,
806: \label{periodic-error2}
807: \ee
808: where $w(k)$ is a weighting function, decreasing for larger $k$.
809: Prior energies as in (\ref{periodic-error2})
810: enforce approximate periodicity
811: over longer distances than
812: a prior energy of the form (\ref{periodic-error}).
813: The latter, on the other hand,
814: is more robust than (\ref{periodic-error2})
815: with respect to local deviations from periodicity,
816: like a locally varying frequency.
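
In operator form, and assuming periodic boundary conditions as above,
Eq.~(\ref{periodic-error2}) follows from Eq.~(\ref{periodic-error})
with $\theta$ replaced by $k\theta$,
\[
E_S 
= -\frac{1}{2}
\mel{{v}}{\sum_{k=1}^{k_{\rm max}} w(k)\, \Delta_{k\theta}}{{v}}
.
\]
It thus corresponds again to a Gaussian prior,
now with inverse covariance
${\bf K}_0$ = $-\sum_{k} w(k)\, \Delta_{k\theta}$,
which is positive (semi)definite for weights $w(k)\ge 0$.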
817: 
818: 
819: Instead of choosing an
820: inverse covariance ${\bf K}_0$ 
821: with symmetric functions in its null space,
822: approximate symmetries can be implemented by using 
823: explicitly a symmetric reference function 
824: $v_0$ = ${\bf S} v_0$ for the Gaussian prior (\ref{gaussprior}).
825: For approximate periodicity, 
826: this would mean to choose
827: a periodic reference potential
828: $v_0(x)$ = $v_0(x+\theta)$ in the prior energy
829: $E_S = \frac{1}{2} \mel{{v} -v_0}{{\bf K}_0}{{v}-v_0}$
830: where ${\bf K}_0$ could be for example
831: the identity or a differential operator.
832: Thus a periodic reference potential 
833: favors a specific form for the reconstructed potential,
834: including a specific frequency and phase.
835: This is different for the covariance implementation
836: (\ref{periodic-error})
837: of approximate periodicity 
838: where only the frequency is relevant
839: and reference potentials can still be chosen arbitrarily.
840: They may, for example, be nonperiodic functions or functions 
841: with even higher symmetry
842: like in Eq.~(\ref{periodic-cov})
843: where $v_0\equiv 0$ is invariant under all translations.
844: Flexible reference potentials will be studied in Section \ref{hyperparameter}.
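
The different role of the phase in the two implementations
can be made explicit for a shifted potential
$v_c(x)$ = $v(x+c)$
(the shift $c$ being introduced here for illustration).
Assuming periodic boundary conditions,
or an integration over the whole real line,
the substitution $x\rightarrow x-c$ in Eq.~(\ref{periodic-error}) gives
\[
E_S(v_c) 
= \frac{1}{2}\int \!dx\; |{v}(x+c)-{v}(x+c+\theta)|^2
= E_S(v)
\]
for every $c$,
so that the covariance implementation does not fix the phase.
In contrast, for a periodic reference potential $v_0$ 
and ${\bf K}_0$ equal to the identity,
the prior energy
$\frac{1}{2}\int \!dx\, |v(x+c)-v_0(x)|^2$
depends in general on the shift $c$.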
845: 
846: 
847: \subsection{Potentials with discontinuities}
848: \label{discontinuities}
849: 
850: Potentials $v(x)$ which are smooth apart from discontinuities can be approximated
851: either by using discontinuous templates $v_0(x;\theta)$ 
852: or by eliminating matrix elements of the inverse covariance
853: which connect the two sides of the discontinuity.
854: For example, consider the discrete version 
855: of a negative Laplacian
856: with unit lattice spacing and periodic boundary conditions,
857: \be
858: {\bf K}_0 = -\Delta =
859: \left(
860: \begin{tabular}{    c     c     c     c     c    c }
861:                     2  & $-1$&  0  &  0  &  0  &$-1$\\
862:                    $-1$&  2  & $-1$&  0  &  0  & 0  \\
863:                     0  & $-1$&  2  & $-1$&  0  & 0  \\
864:                     0  &  0  & $-1$&  2  & $-1$&  0  \\
865:                     0  &  0  &  0  & $-1$&  2  & $-1$ \\
866:                    $-1$&  0  &  0  &  0  & $-1$&  2  \\
867: \end{tabular}
868: \right).
869: \label{discrete1}
870: \ee
871: Decomposing the matrix (\ref{discrete1})
872: into square roots we write ${\bf K}_0$ = ${\bf W}^T {\bf W}$ 
873: (see also Section \ref{hyperfields})
874: where a possible square root is
875: \be
876: {\bf W} = \nabla_1^R =
877: \left(
878: \begin{tabular}{    c     c     c     c     c    c }
879:                   $-1$ & $1$ &  0  &  0  &  0  & 0   \\
880:                     0  & $-1$& $1$ &  0  &  0  & 0   \\
881:                     0  &  0  & $-1$& $1$ &  0  &  0  \\
882:                     0  &  0  &  0  & $-1$& $1$ &  0  \\
883:                     0  &  0  &  0  &  0  & $-1$& $1$ \\
884:                    $1$ &  0  &  0  &  0  &  0  & $-1$\\
885: \end{tabular}
886: \right)
887: \label{discrete2}
888: .
889: \ee
890: Similarly, the derivative operator $\partial/\partial x$
891: represents a square root of the negative Laplacian
892: for periodic boundary conditions.
Two regions can now be disconnected by deleting all rows of ${\bf W}$
which have nonzero matrix elements in both regions.
895: For instance, the first three points in the six--dimensional space
896: of Eq.~(\ref{discrete2})
897: can be disconnected from the last three points
898: by setting 
899: ${\bf W}(3,\cdot )$ and ${\bf W}(6,\cdot )$ to zero,
900: \be
901: \tilde {\bf W} = 
902: \left(
903: \begin{tabular}{    c     c     c  |  c     c    c }
904:                    $-1$& $1$ &  0  &  0  &  0  & 0  \\
905:                     0  & $-1$& $1$ &  0  &  0  & 0  \\
906:                     0  &  0  &  0  &  0  &  0  & 0  \\
907: \hline
908:                     0  &  0  &  0  & $-1$& $1$ &  0  \\
909:                     0  &  0  &  0  &  0  & $-1$& $1$ \\
910:                     0  &  0  &  0  &  0  &  0  &  0  \\
911: \end{tabular}
912: \right)
913: \label{discrete3}
914: .
915: \ee
916: Squaring of $\tilde {\bf W}$ yields a positive semidefinite operator 
917: \be
918: \tilde {\bf K}_0 = {\tilde {\bf W}}^T \tilde {\bf W} =
919: \left(
920: \begin{tabular}{    c     c     c  |  c     c    c }
921:                     1  & $-1$&  0  &  0  &  0  & 0  \\
922:                   $-1$ &  2  & $-1$&  0  &  0  & 0  \\
923:                     0  & $-1$&  1  &  0  &  0  & 0  \\
924: \hline
925:                     0  &  0  &  0  &  1  & $-1$&  0  \\
926:                     0  &  0  &  0  & $-1$&  2  & $-1$ \\
927:                     0  &  0  &  0  &  0  & $-1$&  1  \\
928: \end{tabular}
929: \right)
930: \label{discrete4}
931: \ee
932: resulting in a smoothness prior which is ineffective
933: between points from different regions.
934: In contrast to using discontinuous templates,
the height of the jump at the discontinuity
need not be given in advance
when working with
disconnected Laplacians (or other disconnected inverse covariances).
On the other hand,
940: training data are then required for all separated regions
941: to determine the free constants 
942: which correspond to the zero modes of the local Laplacians.
943: The reconstruction of discontinuous functions 
944: with non--Gaussian priors will be discussed in
945: Section \ref{Non--Gaussian-priors}.
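
As a brief check of Eq.~(\ref{discrete4}),
the prior energy of the disconnected operator
can be written out for $v_0\equiv 0$
in terms of the components $v_i$, $1\le i\le 6$, of $v$
(the index notation is introduced here only for illustration),
\bea
\frac{1}{2}\mel{v}{\tilde {\bf K}_0}{v}
&=&
\frac{1}{2}\Big[
(v_2-v_1)^2+(v_3-v_2)^2
\nonumber\\
&&\quad
+\,(v_5-v_4)^2+(v_6-v_5)^2
\Big]
.
\eea
Compared to the energy of the full operator (\ref{discrete1})
the terms $(v_4-v_3)^2$ and $(v_1-v_6)^2$ are missing,
so that jumps between the two regions are not penalized.
The energy vanishes for all $v$ which are constant
within each region, i.e.,
for the two zero modes $(1,1,1,0,0,0)$ and $(0,0,0,1,1,1)$.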
946: 
947: 
948: 
949: \subsection{Hyperparameters}
950: \label{hyperparameter}
951: 
952: Parameters of the prior are known as {\it hyperparameters}
953: \cite{Lemm-BFT-1999,Carlin-Louis-1996,Bishop-1995b}.
954: Like potentials $v$, hyperparameters $\theta$  
955: are not directly observable and
956: represent hidden variables.
957: In the presence of hyperparameters
958: a prior for $v$ can be decomposed as follows
959: \be
960: p(v)
961: = 
962: \int \!d\theta\, p(v|\theta) \, p(\theta)
963: ,
964: \label{theta-integral}
965: \ee
where $p(\theta)$ is known as the {\it hyperprior}.
The likelihood does not depend on $\theta$;
the predictive probability (\ref{predictive}),
however,
then contains an integral over $\theta$,
971: \be
972: p(x|O,D) =
973: \label{hyper-predictive}
974: \ee
975: \[\frac{1}{p(x_T|O_T)}\int \!dv\,d\theta\,  p(x|O,v)\, 
976: p(x_T|O_T,v)\, p(v|\theta) \, p(\theta)
977: .
978: \]
979: Like the integral over $v$,
980: the integral over $\theta$
981: can be calculated either by Monte Carlo methods
982: or in MAP.
983: We remark that, when a $\theta$--dependent prior
984: is written in terms of a corresponding prior energy
985: $p(v|\theta)\propto e^{-E(v|\theta)}$,
986: the normalization $\int\!dv\, e^{-E(v|\theta)}$
987: is independent of $v$ but does in general depend on $\theta$.
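
A simple example, given here as an illustration
under the assumption of a $d$--dimensional discretization
and a positive definite, $\theta$--independent ${\bf K}$
(neither of which is specified in the text above),
is a scaled inverse covariance ${\bf K}_0(\theta)$ = $\theta {\bf K}$
with $\theta>0$ and $v_0\equiv 0$.
The Gaussian integral then gives
\be
\int\!dv\, e^{-\frac{\theta}{2}\mel{v}{{\bf K}}{v}}
=
(2\pi)^{\frac{d}{2}}\,
\theta^{-\frac{d}{2}}
\left(\det {\bf K}\right)^{-\frac{1}{2}}
,
\ee
so that $-\ln p(v|\theta)$ contains
the $\theta$--dependent term $-(d/2)\ln \theta$,
which must be kept when integrating over $\theta$
or optimizing with respect to $\theta$.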
988: 
989: Hyperparameters $\theta$ can be 
990: single numbers or vectors.
991: They can describe continuous transformations,
992: like translation, rotation or scaling of template functions
993: and scaling of inverse covariance operators.
994: For real $\theta$ and differentiable posterior,
995: stationarity conditions can be found by differentiating
996: the posterior with respect to $\theta$.
997: 
998: 
999: Instead of continuous transformations
1000: of templates or inverse covariances
1001: one can consider
1002: a finite collection of 
1003: alternative reference potentials $v_i$
1004: or alternative inverse covariances ${\bf K}_i$.
1005: For example, a potential to be reconstructed 
1006: may be expected to be similar to one reference potential
1007: out of a small number of possible alternatives $v_i$.
1008: The ``class'' variables $i$ are then
1009: nothing else but hyperparameters 
1010: $\theta$ with integer values.
1011: 
Binary parameters allow one to select,
from two reference functions or two inverse covariances,
the one which fits the data best.
1015: Indeed, writing
1016: \bea
1017: v_0(\theta) &=& (1-\theta) v_1 + \theta v_2,
1018: \label{integer-hyper-t}\\
1019: {\bf K}_0(\theta) &=& (1-\theta) {\bf K}_1 + \theta {\bf K}_2
1020: \label{integer-hyper-K}
1021: ,
1022: \eea
1023: a binary $\theta\in \{0,1\}$ implements
1024: hard switching between alternative templates or inverse covariances,
1025: corresponding to a conditional prior
1026: \be
1027: p(v|\theta) \propto e^{-(1-\theta)E_1(v)-\theta E_2(v)}
1028: \label{mix-prior-bin}
1029: \ee
1030: with
1031: \bea
1032: E_1(v)  &=& \frac{1}{2}\mel{v-v_1}{{\bf K}_1}{v-v_1}
1033: ,
1034: \\
1035: E_2(v) &=& \frac{1}{2}\mel{v-v_2}{{\bf K}_2}{v-v_2}
1036: .
1037: \eea
1038: Similarly, a real $\theta\in [0,1]$ 
1039: in (\ref{integer-hyper-t}) or 
1040: (\ref{integer-hyper-K})
1041: yields soft mixing.
1042: In that case, however,
1043: the mixing of templates in (\ref{integer-hyper-t})
1044: is not equivalent to
1045: a mixing of prior energies 
1046: as in (\ref{mix-prior-bin})
1047: because for real $\theta$ 
1048: Eqs.~(\ref{integer-hyper-t})
1049: and (\ref{integer-hyper-K})
1050: lead to mixed terms, 
1051: like 
1052: $(1-\theta)\theta \mel{v-v_1}{{\bf K}_0}{v-v_2}/2$
1053: for ${\bf K}_1$ = ${\bf K}_2$.
1054: When $\theta$ takes integer values the integral
1055: $\int\! d\theta$ 
1056: becomes a sum $\sum_\theta$ 
1057: so that prior, posterior, and predictive probability
1058: have the form of a {\it finite mixture} 
1059: with components $\theta$ \cite{lemm-mixture-1999}.
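
The mixed term can be made explicit by a short calculation,
sketched here for the special case
${\bf K}_1$ = ${\bf K}_2$ = ${\bf K}_0$ with real symmetric ${\bf K}_0$.
Inserting (\ref{integer-hyper-t}) and using
$v-v_0(\theta)$ = $(1-\theta)(v-v_1)+\theta (v-v_2)$
one finds, after combining the two equal mixed terms,
\bea
\lefteqn{\frac{1}{2}\mel{v-v_0(\theta)}{{\bf K}_0}{v-v_0(\theta)}}
\nonumber\\
&=&
(1-\theta)^2 E_1(v) + \theta^2 E_2(v)
\nonumber\\
&&
+\,(1-\theta)\theta\, \mel{v-v_1}{{\bf K}_0}{v-v_2}
.
\eea
For binary $\theta$ the last term vanishes and,
because of $\theta^2$ = $\theta$ and $(1-\theta)^2$ = $1-\theta$,
the exponent of (\ref{mix-prior-bin}) is recovered.
For real $\theta$ the expression differs from
$(1-\theta)E_1+\theta E_2$ by
$-\frac{1}{2}(1-\theta)\theta\,\mel{v_1-v_2}{{\bf K}_0}{v_1-v_2}$,
which is independent of $v$ but depends on $\theta$.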
1060: 
1061: 
1062: For a moderate number of components
1063: one may be able to include 
1064: all of the mixture components in the calculations.
1065: If the number of mixture components is too large
1066: one must select some of the components,
1067: for example by creating a random sample
1068: using Monte Carlo methods, 
1069: or by solving for the $\theta^*$
1070: with maximal posterior.
1071: In contrast to typical optimization problems for real variables,
1072: the corresponding integer optimization problems
1073: are usually not very smooth with respect to $\theta$
1074: (with smoothness defined in terms of differences instead of derivatives),
1075: and are therefore often much harder to solve.
1076: 
1077: 
1078: There exists
1079: a variety of deterministic and stochastic integer optimization algorithms,
1080: which may be combined with ensemble methods like genetic algorithms
1081: \cite{Holland-1975,Goldberg-1989,Michalewicz-1992,Schwefel-1995,Mitchell-1996},
1082: and with homotopy methods like simulated annealing
1083: \cite{Kirkpatrick-Gelatt-Vecchi-1983,Mezard-Parisi-Virasoro-1987,Aarts-Korts-1989,Gelfand-Mitter-1991,Yuille-Kosowski-1994}.
1084: Annealing methods are similar to (Markov chain) Monte Carlo methods,
1085: which aim at sampling many points 
1086: from a specific distribution
(for example, at fixed temperature).
1088: For Monte Carlo methods it is important to have (nearly) independent samples
1089: and the correct limiting distribution for the Markov chain.
1090: For annealing methods the aim is to find the correct minimum 
1091: by smoothly changing the temperature from a finite value to zero.
1092: For the latter it is thus less important to model the distribution
1093: for nonzero temperatures exactly, but
1094: it is important to use an adequate
1095: cooling scheme for lowering the temperature.
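
In one common formulation, stated here as a generic sketch
and not as the specific scheme used in this paper,
annealing works with a family of distributions
\be
p_{T_{\rm a}}(\theta) \propto e^{-E_{\rm post}(\theta)/T_{\rm a}}
,
\ee
where $E_{\rm post}(\theta)$ denotes the negative logarithm
of the posterior for $\theta$
and $T_{\rm a}$ an annealing temperature
(both symbols are introduced only for this illustration;
$T_{\rm a}$ is not the physical temperature of the system).
For $T_{\rm a}$ = $1$ this is the posterior itself,
which is the target of Monte Carlo sampling,
while for $T_{\rm a}\rightarrow 0$ the distribution concentrates
on the minima of $E_{\rm post}$, i.e., on the MAP solutions $\theta^*$.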
1096: 
1097: 
1098: 
1099: 
1100: \subsection{Hyperfields}
1101: \label{hyperfields}
1102: 
1103: The hyperparameters $\theta$ 
1104: considered so far have been real or integer {\it numbers}, 
1105: or {\it vectors} with real or integer components $\theta_i$.
1106: In this section we will discuss 
1107: priors parameterized by functions,
1108: called {\it hyperfields} \cite{Lemm-BFT-1999},
1109: resulting in a still larger flexibility of the formalism.
In numerical calculations, where functions have to be discretized,
hyperfields stand for high--dimensional hyperparameter vectors.
1112: 
1113: 
1114: Using hyperfields 
1115: one has to keep in mind
1116: that a gain in flexibility at the same time
1117: tends to lower the influence of the prior.
1118: For example,
1119: consider as hyperfield a completely adaptive reference potential 
1120: $\theta(x)$ = $v_0(x)$
1121: within a Gaussian prior (\ref{gaussprior}).
1122: Then, for any $v(x)$
1123: the prior energy vanishes 
1124: for $v_0(x)$ = $v(x)$.
1125: In the absence of additional hyperpriors $p(\theta)$
1126: the corresponding MAP solution for the hyperfield 
1127: $\theta(x)$ = $v_0(x)$
1128: is thus
1129: $\theta^*(x)$ = $v(x)$
1130: for which the Gaussian prior (\ref{gaussprior})
1131: becomes uniform in $v(x)$.
Hence the price to be paid for the additional flexibility
introduced by hyperfields
is a weaker prior
and a large number of additional degrees of freedom.
1136: This can considerably complicate calculations and
1137: requires sufficiently restrictive hyperpriors for the hyperfields.
1138: 
1139: 
1140: Let us define {\it local hyperfields} $\theta(x)$ 
1141: to be  hyperfields depending on the position variable $x$. 
1142: (In general hyperfields can be introduced 
1143: which depend on other real variables or 
1144: on several position variables.) 
1145: Local hyperfields can be used, for example,
1146: to adapt templates or inverse covariances locally.
1147: To this end, 
1148: we express real symmetric, positive (semi)\-definite inverse covariances
1149: by square roots or (real) {\it filter operators} ${\bf W}$,
1150: so that
1151: \be
1152: {\bf K}_0 = {\bf W}^T{\bf W}
1153: .
1154: \ee
1155: In components
1156: \be
1157: {\bf K}_0(x,x^\prime) 
1158: = \int \!dx^{\prime\prime}\; 
1159:    {\bf W}^T(x,x^{\prime\prime}){\bf W}(x^{\prime\prime},x^{\prime})
1160: ,
1161: \ee
1162: and therefore
1163: \bea
1164: \mel{{v}-v_0}{{\bf K}_0}{{v}-v_0}
1165: &=&
1166: \int\! dx\,dx^\prime\, dx^{\prime\prime}\,
1167: [{v}(x)-v_0(x)]
1168: \nonumber\\
1169: &&\times\;
1170: {\bf W}^T(x,x^{\prime}){\bf W}(x^{\prime},x^{\prime\prime})
1171: \nonumber\\
1172: &&\times\;
1173: [{v}(x^{\prime\prime})-v_0(x^{\prime\prime})]
1174: \nonumber\\
1175: &=&
1176: \int \! dx\, |\omega (x)|^2
1177: ,
1178: \eea
1179: where we define the  {\it filtered difference}
1180: \be
1181: \omega (x)
1182: %=\scp{W_x}{{v}-v_0}
1183: =
1184: \int \!dx^\prime \, {\bf W}(x,x^\prime) 
1185: [{v}(x^\prime)-v_0(x^\prime )]
1186: .
1187: \label{filtered-diff}
1188: \ee
1189: For instance,
1190: a square root (\ref{discrete2})
1191: of the discrete negative Laplacian (\ref{discrete1})
1192: corresponds for $v_0\equiv 0$ to a filtered difference
1193: $\omega(x)$ = $v(x+1)-v(x)$.
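
Written out explicitly, this example reads as follows
for unit lattice spacing, periodic boundary conditions,
and $v_0\equiv 0$;
the Kronecker symbols $\delta_{x,x^\prime}$
are used here only for illustration.
The filter (\ref{discrete2}) has components
\be
{\bf W}(x,x^\prime) = \delta_{x+1,x^\prime}-\delta_{x,x^\prime}
,
\ee
so that the integral over $x$ becomes a sum and
\bea
\frac{1}{2}\mel{v}{{\bf K}_0}{v}
&=&
\frac{1}{2}\sum_x |\omega(x)|^2
\nonumber\\
&=&
\frac{1}{2}\sum_x \left[v(x+1)-v(x)\right]^2
,
\eea
which penalizes differences between neighboring function values.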
1194: 
1195: The exponent of a Gaussian prior for a local potential ${v}$
1196: can thus be written as an integral over $x$,
1197: \be
1198: p({v}) \propto e^{-E(v)}
1199: ;\quad
1200: E(v) = \frac{1}{2}\int \!dx \, |\omega(x)|^2
1201: .
1202: \label{Gauss-omega}
1203: \ee
1204: In contrast to 
1205: Eqs.~(\ref{integer-hyper-t}) and (\ref{integer-hyper-K})
1206: the representation (\ref{Gauss-omega}) 
1207: is well suited for introducing local hyperfields.
1208: For instance,  
1209: an adaptive prior 
1210: \be
1211: p({v}|\theta) = e^{-E(v|\theta)}
1212: ,
1213: \label{hyper-prior}
1214: \ee
1215: with a real local hyperfield
1216: $\theta(x)\in [0,1]$
1217: can be obtained by
1218: mixing locally two alternative filtered differences
1219: \be
1220: \omega (x;\theta) 
1221: = [1-\theta(x)] \, \omega_1(x) + \theta(x) \,\omega_2(x)
1222: \label{hyper-function-omega}
1223: ,
1224: \ee
1225: where the two $\omega_i$
1226: may differ in their filters and/or reference potentials.
1227: In that case 
1228: the hyperfield $\theta(x)$ 
1229: can locally select
1230: the best mixture of the filtered differences
1231: $\omega_i$, i.e., 
the one which yields in (\ref{hyper-prior})
the largest probability
or, equivalently, the smallest prior energy
1235: \bea
1236: E(v|\theta) 
1237: &=& \frac{1}{2}
1238: \int \!dx |\omega (x;\theta)|^2
1239: +\ln Z_{\cal V}(\theta)
1240: \label{local-hyper-p-r}
1241: \\
1242: &=&\frac{1}{2} \! \int \!\!dx 
1243:  \Big| [1-\theta(x) ] \omega_1(x)
1244:  +\theta(x) \omega_2(x)
1245:  \Big|^2
1246: \!+\ln Z_{\cal V}(\theta)
1247: .
1248: \nonumber
1249: \eea
1250: Here the normalization factor 
1251: \be
1252: Z_{\cal V}(\theta)
1253: =
1254: \int_{v\in {\cal V}} 
1255: d \!v\, e^{-\frac{1}{2} \int \!dx |\omega (x;\theta)|^2}
1256: ,
1257: \ee
1258: depends in general on $\theta$
1259: if the filters of the $\omega_i$ differ.
1260: Clearly, 
1261: allowing an unbounded $-\infty\le \theta(x)\le \infty$
1262: any function $\omega (x;\theta)$
1263: can be written in the form of Eq.~(\ref{hyper-function-omega}),
1264: provided $\omega_1(x)\ne \omega_2(x)$ for all $x$.
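
Indeed, solving Eq.~(\ref{hyper-function-omega}) for the hyperfield
gives, for a given target function $\omega(x)$
(a symbol introduced here for illustration),
\be
\theta(x) = \frac{\omega(x)-\omega_1(x)}{\omega_2(x)-\omega_1(x)}
,
\ee
which is well defined wherever $\omega_1(x)\ne\omega_2(x)$
but in general lies outside the interval $[0,1]$.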
1265: 
1266: 
1267: 
1268: In contrast to soft mixing with real functions $\theta(x)$
1269: a binary local hyperfield $\theta(x)\in \{0,1\}$
1270: implements hard switching 
1271: between alternative filtered differences.
1272: Since in the binary case
1273: $\theta^2$ = $\theta$,
1274: $(1-\theta)^2$ = $(1-\theta)$,
1275: and 
1276: $\theta(1-\theta)$ = $0$,
1277: Eq.~(\ref{local-hyper-p-r})  
1278: becomes [compare Eq.~(\ref{mix-prior-bin})]
1279: \bea
1280: E({v}|\theta) 
1281: &=&
1282: \frac{1}{2} \int \!dx \, 
1283:  \Big( [1-\theta(x)]|\omega_1(x)|^2
1284: \nonumber\\
1285: &&
1286: \quad +\theta(x) |\omega_2(x)|^2
1287:  \Big)
1288: +\ln Z_{\cal V}(\theta)
1289: \label{local-hyper-p}
1290: ,
1291: \eea
1292: while for real $\theta(x)$
1293: Eq.~(\ref{local-hyper-p-r})  
1294: includes a mixed term in $\omega_1\omega_2$.
1295: It is sometimes helpful to transform
1296: an unrestricted real hyperfield $-\infty\le g(x)\le\infty$
1297: into a bounded real hyperfield 
1298: $\theta(x)\in [0,1]$ by
1299: \be
1300: \theta(x) = \sigma(g(x)-\vartheta)
1301: ,
1302: \label{def-B}
1303: \ee
1304: with threshold $\vartheta$
1305: and sigmoidal transformation
1306: \be
1307: \sigma(x) = \frac{1}{1+e^{-2\nu x}}
1308: = \frac{1}{2} (\tanh(\nu x) + 1)
1309: .
1310: \label{sigmoid-bsp}
1311: \ee
1312: In the limit $\nu\rightarrow\infty$
1313: the transformation $\sigma(x)$ of (\ref{sigmoid-bsp}) 
1314: approaches the step function $\Theta(x)$ 
1315: and (\ref{def-B}) results in a binary 
1316: $\theta(x)$ = $\Theta(g(x)-\vartheta)\in \{0,1\}$. 
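
For gradient based MAP calculations with a real $g(x)$
the derivative of the sigmoid is needed.
As a short worked step (not part of the original derivation),
Eq.~(\ref{sigmoid-bsp}) yields
\bea
\frac{d\sigma(x)}{dx}
&=&
\frac{2\nu\, e^{-2\nu x}}{\left(1+e^{-2\nu x}\right)^2}
\nonumber\\
&=&
2\nu\, \sigma(x)\left[1-\sigma(x)\right]
,
\eea
which takes its maximal value $\nu/2$ at $x$ = $0$.
For large $\nu$ the derivative of $\theta(x)$ with respect to $g(x)$
is therefore concentrated near $g(x)$ = $\vartheta$.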
1317: 
1318:  
1319: Analogous to the global mixing or global switching
1320: in Eq.~(\ref{integer-hyper-t})
1321: and Eq.~(\ref{integer-hyper-K}),
1322: the alternative filtered differences $\omega_i (x)$
1323: at position $x$
1324: in Eq.~(\ref{hyper-function-omega})
1325: can be constructed by local mixing or switching 
1326: between 
1327: template functions 
1328: $v_1(x^\prime)$, $v_2(x^\prime)$
1329: or filters 
1330: ${\bf W}_1(x,x^\prime)$, ${\bf W}_2(x,x^\prime)$
1331: using a local hyperfield $\theta(x)$,
1332: \bea
1333: v_x(x^\prime;\theta) 
1334: &=& 
1335: [1-\theta(x)] \, v_1(x^\prime) + \theta(x)\, v_2(x^\prime),
1336: \label{hyper-function-t}
1337: \\
1338: {\bf W}(x,x^\prime; \theta) &=& 
1339: [1\! -\theta(x)] {\bf W}_{1}(x,x^\prime)
1340: \! + \theta(x) {\bf W}_{2}(x,x^\prime)
1341: \label{hyper-function-W}
1342: .\;\;
1343: \eea
1344: It is important to note that the local templates or 
1345: reference potentials
1346: $v_x(x^\prime; \theta)$ 
1347: are functions 
1348: of $x^\prime$ and $x$.
1349: Indeed, to obtain a filtered difference $\omega(x;\theta)$ at position $x$,
1350: a reference function $v_x$ is needed for all $x^\prime$ for which 
1351: the corresponding ${\bf W}(x,x^\prime)$
1352: is nonzero, since
1353: \be
1354: \omega(x;\theta )
1355: =
1356: \int\!dx^\prime\, 
1357: {\bf W}(x,x^\prime) 
1358: [{v}(x^\prime)-v_x(x^\prime;\theta)]
1359: .
1360: \ee
1361: In this way the whole template function $v_x(x^\prime;\theta)$,
rather than individual function values $v_0(x;\theta)$,
1363: is adapted individually for every local filtered difference.
1364: In particular, the local reference potentials of Eq.~(\ref{hyper-function-t})
1365: have to be distinguished from one global,
1366: locally adapted reference potential
1367: \be
1368: v_0(x^\prime;\theta) 
1369: =
1370: [1-\theta(x^\prime )] \, v_{1}(x^\prime)
1371:  + \theta(x^\prime )\, v_{2}(x^\prime)
1372: \label{mixing-t}
1373: ,
1374: \ee
1375: which at first glance seems to be the natural generalization of
1376: Eq.~(\ref{integer-hyper-t}) to local hyperfields.
Only for Gaussian prior terms
with the identity ${\bf I}$ as covariance
are local template functions $v_x(x^\prime;\theta)$
not required.
1381: In that case $v_{x}(x^\prime;\theta)$ 
1382: is only needed for $x$ = $x^\prime$
1383: and we may directly write
1384: $v_{x}(x^\prime;\theta)$ 
1385: =
1386: $\tilde v_{0}(x^\prime;\theta)$,
1387: skipping the variable $x$, and obtain the prior energy
1388: \be
1389: \frac{1}{2}\int\! dx\; |\omega(x;\theta)|^2
1390: =
1391: \frac{1}{2}\scp{v-\tilde v_0(\theta)}{v-\tilde v_0(\theta)}
1392: .
1393: \label{identity-cov}
1394: \ee
1395: We remark that one can also generalize
1396: Eq.~(\ref{hyper-function-t}), 
1397: which uses the same  
1398: $v_1(x^\prime)$, $v_2(x^\prime)$ for all $x$,
1399: by working with reference potentials 
1400: $v_{1,x}(x^\prime)$, $v_{2,x}(x^\prime)$
1401: which vary with the position $x$ 
1402: at which the filtered difference $\omega(x)$
1403: is required. This yields
1404: \be
1405: v_x(x^\prime;\theta) 
1406: =
1407: [1-\theta(x)] \, v_{1,x}(x^\prime)
1408: + \theta(x)\,  v_{2,x}(x^\prime)
1409: .
1410: \label{hyper-function-t-nonlocal}
1411: \ee
1412: 
1413: For binary $\theta(x)$ 
1414: Eq.~(\ref{hyper-function-W}) 
1415: corresponds
1416: to an inverse covariance
1417: \bea
1418: {\bf K}_0(\theta) 
1419: &=& \int \!dx\; {\bf K}_x(\theta) 
1420: =
1421: \int \!dx \, {W}_{x}(\theta){W}^T_{x}(\theta)
1422: \nonumber\\
1423: &=& \int \!\! dx
1424: \left(
1425: [1-\theta(x)] {W}_{1,x}{W}^T_{1,x}
1426: +  \theta(x)  {W}_{2,x}{W}^T_{2,x}
1427: \right)
1428: \qquad
1429: \label{invcov}
1430: \eea
1431: with
1432: %${\bf K}_x(\theta)$ = ${W}_{x}(\theta){W}^T_{x}(\theta)$
1433: \be
1434: {\bf K}_x(\theta) = {W}_{x}(\theta){W}^T_{x}(\theta)
1435: \ee
1436: written as dyadic product of the vector
1437: $W_{x}(\theta )$ = ${\bf W}(x,\cdot\,;\theta)$
1438: and with analogously defined $W_{i,x}$ = ${\bf W}_i(x,\cdot)$.
1439: For $\theta$--dependent inverse covariances
1440: the normalization factors $Z_{\cal V}(\theta)$ become 
1441: $\theta$--dependent. They have to be included
1442: when integrating over $\theta$ or 
1443: solving for the optimal $\theta$ in MAP.
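
As an illustration in the discrete setting of
Section \ref{discontinuities}
(the specific choice of filters is an assumption made here),
take ${\bf W}_1$ to be the difference operator (\ref{discrete2})
and ${\bf W}_2$ = $0$.
The vectors $W_{1,x}$ then have components
$W_{1,x}(x^\prime)$ = $\delta_{x+1,x^\prime}-\delta_{x,x^\prime}$, and
\be
\mel{v}{W_{1,x}W^T_{1,x}}{v} = \left[v(x+1)-v(x)\right]^2
.
\ee
Hence Eq.~(\ref{invcov}) becomes
\be
\mel{v}{{\bf K}_0(\theta)}{v}
=
\sum_x [1-\theta(x)] \left[v(x+1)-v(x)\right]^2
.
\ee
For $\theta\equiv 0$ this is the negative Laplacian (\ref{discrete1}),
while $\theta(3)$ = $\theta(6)$ = $1$ and $\theta(x)$ = $0$ otherwise
reproduces the disconnected operator (\ref{discrete4}).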
1444: 
1445: In Eqs.~(\ref{hyper-function-t}) and (\ref{hyper-function-W})
1446: it is straightforward to introduce
1447: two binary hyperfields $\theta$, $\theta^\prime$,
1448: one for the reference potential $v_x$ and one for 
1449: the filter ${\bf W}$.
1450: This results in a conditional prior
1451: \bea
1452: p({v}|\theta,\theta^\prime)
1453: &\propto&
1454: e^{-\frac{1}{2}
1455: \int\!dx\, 
1456: \mel{{v} - v_x(\theta)}{{\bf K}_x(\theta^\prime)}{{v}-v_x(\theta)}
1457: }
1458: \nonumber\\
1459: &=&
1460: e^{-\frac{1}{2} \int \!dx\, |\omega(x;\theta,\theta^\prime)|^2}
1461: .
1462: \eea
1463: Here we can write
1464: \bea
1465: \int\! dx \, |\omega(x;\theta,\theta^\prime)|^2 
1466: &=&
1467: \mel{{v}-v_0(\theta,\theta^\prime)}
1468: {{\bf K}_0(\theta^\prime)}{{v}-v_0(\theta,\theta^\prime)}
1469: \nonumber\\
1470: &&+
1471: \int \!dx \,
1472: \mel{v_x(\theta)}{{\bf K}_x(\theta^\prime)}{v_x(\theta)}
1473: \nonumber\\
1474: &&-
1475: \mel{v_0(\theta,\theta^\prime)}
1476:   {{\bf K}_0(\theta^\prime)}{v_0(\theta,\theta^\prime)}
1477: ,
1478: \label{eff-E}
1479: \eea
1480: with an effective template $v_0(\theta,\theta^\prime)$
1481: given by
1482: \be
1483: v_0(\theta ,\theta^\prime)
1484: =
1485: {\bf K}_0(\theta^\prime)^{-1}
1486: \int\!dx\, {\bf K}_x(\theta^\prime ) \,  v_x(\theta)
1487: %{\bf K}_0(\theta^\prime)
1488: ,
1489: \ee
1490: and effective inverse covariance ${\bf K}_0(\theta^\prime)$
1491: =
1492: $\int \! dx\, {\bf K}_x(\theta^\prime)$
1493: as in Eq. (\ref{invcov}).
1494: Since 
1495: the last two terms in Eq.~(\ref{eff-E}) are ${v}$--independent constants 
1496: (only depending on $\theta$, $\theta^\prime$)
1497: we see that for fixed hyperfields
1498: %$\theta$, $\theta^\prime$
the corresponding prior energy is minimized by
1500: $v$ = $v_0(\theta,\theta^\prime)$.
1501: For given hyperparameters $\theta$, $\theta^\prime$
1502: we can write
1503: $p({v}|\theta,\theta^\prime)\propto e^{-E({v}|\theta,\theta^\prime)}$
1504: with a prior energy of the form
1505: $E({v}|\theta,\theta^\prime)$
1506: =
1507: $\frac{1}{2}
1508: \mel{{v}-v_0(\theta,\theta^\prime)}
1509:     {{\bf K}_0(\theta^\prime )}{{v}-v_0(\theta,\theta^\prime)}$.
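
Equation (\ref{eff-E}) follows from completing the square.
As a sketch, assuming real symmetric ${\bf K}_x$ and an invertible
${\bf K}_0$, and suppressing the arguments $\theta$, $\theta^\prime$,
\bea
\lefteqn{\int \!dx \, \mel{v-v_x}{{\bf K}_x}{v-v_x}}
\nonumber\\
&=&
\mel{v}{{\bf K}_0}{v}
-2 \int \!dx \, \mel{v}{{\bf K}_x}{v_x}
\nonumber\\
&&
+ \int \!dx \, \mel{v_x}{{\bf K}_x}{v_x}
.
\eea
Because of $\int\!dx\,{\bf K}_x v_x$ = ${\bf K}_0 v_0$
the first two terms on the right hand side are equal to
$\mel{v-v_0}{{\bf K}_0}{v-v_0}-\mel{v_0}{{\bf K}_0}{v_0}$,
which gives Eq.~(\ref{eff-E}).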
1510: 
As the product of Gaussians is again a Gaussian,
several Gaussian prior factors can easily be combined.
In this way one can implement a nonlocal property like smoothness
and still avoid local template functions $v_x(x^\prime;\theta)$
1515: by combining a Gaussian prior with ${\bf K}_0$ = ${\bf I}$
1516: as in (\ref{identity-cov})
1517: with a Gaussian prior with nondiagonal covariance and 
1518: zero (or fixed) template,
1519: \be
1520: E({v}|\theta) =
1521: \frac{1}{2}
1522: \scp{{v}-\tilde v_0(\theta)}{{v}-\tilde v_0(\theta)}
1523: +\frac{1}{2}
1524: \mel{{v}}{{\bf K}}{{v}}
1525: .
1526: \label{local+laplace}
1527: \ee
1528: Combining both terms yields
1529: \bea
1530: E({v}|\theta) 
1531: &=&
1532: \frac{1}{2} 
1533: \bigg( 
1534: \mel{{v}-v_0(\theta)}{{\bf K}_0}{{v}-v_0(\theta)}
1535: \nonumber\\
1536: &&\quad  + \;\;
1537: \mel{\tilde v_0(\theta)}
1538:     {{\bf I}-{\bf K}_0^{-1}}{\tilde v_0(\theta)}
1539: \bigg)
1540: ,
1541: \label{local+laplace2}
1542: \eea
1543: with the second term 
1544: being independent of $v$
1545: and 
1546: with effective template and effective inverse covariance
1547: \be
1548: v_0(\theta) = {\bf K}_0^{-1} \tilde v_0(\theta)
1549: ,\quad 
1550: {\bf K}_0 = {\bf I}+ {\bf K}
1551: .
1552: \ee
1553: For differential operators ${\bf K}_0$
1554: the effective $v_0(\theta)$ 
1555: is thus a smoothed version of $\tilde v_0(\theta)$.
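
To illustrate the smoothing, consider as an example
(an assumption made here, not a choice taken from the text)
${\bf K}$ = $-\Delta$ in one dimension on the real line.
With $\hat v_0(k;\theta)$ and $\hat{\tilde v}_0(k;\theta)$ denoting
the Fourier transforms of $v_0(\theta)$ and $\tilde v_0(\theta)$,
the operator $-\Delta$ acts in Fourier space
as multiplication by $k^2$, so that
\be
\hat v_0(k;\theta) = \frac{\hat{\tilde v}_0(k;\theta)}{1+k^2}
.
\ee
Components with large wave number $k$ are thus damped,
i.e., ${\bf K}_0^{-1}$ acts as a low--pass filter on $\tilde v_0(\theta)$.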
1556: 
1557: 
The extreme case would be to treat
$v_0$ and ${\bf W}$ themselves as unrestricted hyperfields.
1560: As already discussed,
1561: this just eliminates the corresponding prior term.
1562: Hence, to restrict the flexibility,
1563: typically a smoothness hyperprior may be imposed
1564: to prevent highly oscillating functions $\theta (x)$.
1565: For real $\theta(x)$, for example, a smoothness prior
1566: like a Laplacian prior $\mel{\theta}{\! -\!\Delta}{\theta}/2$ can be used 
1567: in regions where it is defined. 
1568: (The space of functions
1569: for which a smoothness prior 
1570: with discontinuous templates is defined
1571: depends on the locations of the discontinuities.)
1572: An example of a non--Gaussian hyperprior is
1573: \be
1574: p(\theta) \propto 
1575: e^{-\frac{\tau}{2} \int\!dx \, C_\theta(x)}
1576: ,
1577: \label{hyperprior-C}
1578: \ee
1579: where $\tau$ is a constant 
1580: and 
1581: \be
1582: C_\theta(x) = 
1583: \sigma \left( \left(\frac{\partial \theta}{\partial x}\right)^2 
1584:   - \vartheta_\theta\right)
1585: ,
1586: \label{jumps}
1587: \ee
1588: with a sigmoid $\sigma(x)$ as in  (\ref{sigmoid-bsp}).
1589: For $\nu\rightarrow\infty$ the sigmoid approaches a step function
1590: and $C_\theta(x)$ 
1591: becomes zero at locations where the square of the first derivative
1592: is smaller than a certain 
1593: threshold $0\le \vartheta_\theta < \infty$,
1594: and one otherwise.
1595: For discrete $x$ one can analogously 
1596: count the number of jumps 
1597: larger than a given threshold.
1598: One can then penalize the number $N_d(\theta)$ 
1599: of discontinuities
1600: where 
1601: $\left(\partial \theta/\partial x\right)^2$ = $\infty$
1602: and use 
1603: \be
1604: p(\theta) \propto e^{-\frac{\tau}{2} N_d(\theta)}
1605: .
1606: \label{hyperprior-Nd}
1607: \ee
1608: In the case of a binary field
1609: this corresponds 
1610: to counting the number of times the field changes its value.
1611: The expression $C_\theta$
1612: of Eq. (\ref{jumps})
1613: can be generalized to
1614: \be
1615: C_\theta(x)
1616: = \sigma\left( |\omega_\theta(x)|^2-\vartheta_\theta\right)
1617: ,
1618: \label{Cdef}
1619: \ee
1620: where,
1621: analogous to Eq.~(\ref{filtered-diff}), 
1622: \be
1623: \omega_\theta(x)
1624: =
1625: \int \!dx^\prime \, 
1626: {\bf W}_\theta(x,x^\prime) 
1627: [\theta(x^\prime)-t_\theta(x^\prime)]
1628: ,
1629: %\label{filtered-diff-theta}}
1630: \ee
1631: with template
1632: $t_\theta(x^\prime)$
1633: representing the expected form for the hyperfield,
1634: and a filter operator 
1635: ${\bf W}_\theta$
1636: defining a distance measure for hyperfields.
1637: Parameters of the hyperprior like $\tau$ 
1638: in Eq. (\ref{hyperprior-C}) or Eq.~(\ref{hyperprior-Nd})
1639: can be treated as higher level hyperparameters.
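
For a binary hyperfield on a lattice with unit spacing
the counting can be made explicit
(a sketch, with the nearest--neighbor difference chosen here as filter).
Since $\theta(x+1)-\theta(x)\in\{-1,0,1\}$,
\be
N_d(\theta) = \sum_x \left[\theta(x+1)-\theta(x)\right]^2
.
\ee
This corresponds to summing Eq.~(\ref{Cdef}) over $x$
with $t_\theta\equiv 0$,
${\bf W}_\theta$ the difference operator (\ref{discrete2}),
a threshold $0<\vartheta_\theta<1$,
and the limit $\nu\rightarrow\infty$.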
1640: 
1641: 
1642: 
1643: \subsection{Non--Gaussian priors and auxiliary fields}
1644: \label{Non--Gaussian-priors}
1645: 
1646: As an alternative to introducing hyperfields $\theta(x)$
1647: one can work with priors which are
1648: explicitly non--Gaussian with respect to $v$.
1649: This can be done by introducing auxiliary fields
1650: $B(x;v)$ 
1651: whose function values are not considered 
1652: as independent variables
1653: but are directly defined as functionals of $v$.
1654: (For the sake of simplicity
1655: we will for $B(x;v)$ also write
1656: $B(x)$ or $B(v)$, depending on the context.)
1657: Like hyperfields,
1658: auxiliary fields 
1659: can select locally the best adapted filtered difference
1660: from a set of alternative $\omega_i$.
1661: 
1662: For instance, consider the auxiliary field
1663: [compare with Eqs.~(\ref{def-B}) or (\ref{Cdef})]
1664: \be
1665: B(x) = 
1666: \sigma \left(u(x) - \vartheta \right)
1667: ,
1668: \label{jumps2}
1669: \ee
1670: where
1671: \be
1672: u(x) = |\omega_1(x)|^2-|\omega_2(x)|^2
1673: ,
1674: \ee
1675: $\vartheta$ represents a threshold,
1676: $\sigma (x)$ a sigmoidal function as in (\ref{sigmoid-bsp}),
1677: and the $\omega_i$ are filtered differences 
1678: defined in terms of $v$
1679: according to Eq.~(\ref{filtered-diff}).
1680: Again a binary field $B(x)$ is obtained 
1681: by letting the sigmoid approach the step function.
1682: Because the $\omega_i$ depend on $v$,
1683: it is clear from the definition (\ref{jumps2})
that the auxiliary field $B(x)$ is not an independent hyperfield
but has values which are functionals of ${v}$.
1686: Notice that $B(x)$ 
1687: is nonlocal with respect to ${v}(x)$
1688: if $\omega_i(x)$ is nonlocal;
1689: a value $B(x)$ then depends 
1690: on more than one ${v}(x)$--value.
For a negative Laplacian prior in one dimension
Eq.~(\ref{jumps2}) reads
1693: \be
1694: B(x) = 
1695: \sigma \left( 
1696: \left|\frac{\partial ({v}-v_1)}{\partial x}\right|^2
1697: -\left|\frac{\partial ({v}-v_2)}{\partial x}\right|^2
1698:             - \vartheta
1699: \right)
1700: .
1701: \label{jumps2b}
1702: \ee
1703: While auxiliary fields $B(x)$ are directly determined by ${v}$,
1704: hyperfields are indirectly coupled to $v$
1705: through the MAP stationarity equations.
1706: Conversely,
1707: an auxiliary field $B(x)$ can be treated formally 
1708: as independent hyperfield
1709: if a Lagrange multiplier term 
1710: \mbox{$\lambda
1711: \left[
1712: B(x)-\sigma \left(u(x) - \vartheta 
1713: \right)
1714: \right]$}
1715: is added to the prior energy
1716: in the limit $\lambda\rightarrow\infty$.
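
The argument of the sigmoid in Eq.~(\ref{jumps2b})
can be simplified by the elementary identity
$(a-b_1)^2-(a-b_2)^2$ = $(b_2-b_1)(2a-b_1-b_2)$.
Writing $v^\prime$ = $\partial v/\partial x$
(a shorthand used only here) one obtains for real potentials
\be
u(x) = (v_2^\prime-v_1^\prime)\,(2 v^\prime-v_1^\prime-v_2^\prime)
,
\ee
so that $u(x)$ is linear in $v^\prime(x)$
and vanishes where the two reference potentials have equal slope.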
1717: 
1718: 
1719: Like hyperfields $\theta(x)$
1720: auxiliary fields $B(x)$
1721: can be used 
1722: to adapt reference potentials $v_0$ or filters ${\bf W}$.
1723: However,
1724: a prior as in  Eq.~(\ref{gaussprior})
1725: is non--Gaussian with respect to $v$
1726: if $v_0(B)$ and ${\bf K}_0(B)$ 
1727: depend on $B$ and thus also on $v$.
1728: Furthermore, analogous to hyperpriors $p(\theta)$,
1729: additional prior terms $p(B(v))\propto \exp(-E_B(v))$ for $v$ 
1730: can be included,
1731: formulated in terms of an auxiliary field $B(x)$.
1732: As in Eq.~(\ref{local-hyper-p})
1733: a binary $B(x)$ can switch between two filtered differences
1734: \be
1735: |\omega(x;B)|^2
1736: =
1737: [1-B(x)] |\omega_1(x)|^2 
1738: +
1739: B(x) |\omega_2(x)|^2 
1740: ,
1741: \label{binary-B}
1742: \ee
1743: within a (non--Gaussian) prior for ${v}$
1744: \be
1745: p({v}) \propto
1746: e^{-E(v)-E_B(v)}
1747: ,
1748: \label{b-prior}
1749: \ee
1750: where the normalization factor 
1751: $Z$ = $\int \!dv\,e^{-E(v)-E_B(v)}$ 
1752: of (\ref{b-prior})
1753: is by definition independent of $v$. 
Hence it can be omitted in MAP calculations,
even for non--Gaussian $p(v)$.
1756: In Eq.~(\ref{b-prior})
1757: \be
1758: E(v) = \frac{1}{2} \int\!dx\, 
1759: \left(
1760: [1-B(x)] |\omega_1(x)|^2 
1761: +
1762: B(x) |\omega_2(x)|^2 
1763: \right)
1764: ,
1765: \label{omega-B-energy}
1766: \ee
1767: according to Eq.~(\ref{binary-B}),
1768: while $E_B(v)$ depends on $v$
1769: only through $B(v)$.
1770: For example, the number of switchings
1771: can be restricted by taking
1772: \be
1773: E_B(v) = \frac{\tau}{2}N_d(B)
1774: ,
1775: \label{additional-b-prior}
1776: \ee
1777: where
1778: $N_d(B)$ counts the number of discontinuities of $B(x)$.
1779: Other choices, for real $B(x)$, 
1780: are quadratic energies
1781: \be
1782: E_B(v) = \frac{\tau}{2}\int \!dx |\omega_B(x)|^2 
1783: \label{quad-err}
1784: \ee
1785: or non--quadratic energies of the form
1786: \be
1787: E_B(v) = \frac{\tau}{2}\int \!dx \,C_B (x)
1788: \label{non-quad-err}
1789: \ee
1790: where, similar to (\ref{Cdef}),
1791: \be
1792: C_B(x)
1793: = \sigma\left( |\omega_B(x)|^2-\vartheta_B\right)
,
1795: \label{cforb}
1796: \ee
1797: and
1798: \be
1799: \omega_B(x)
1800: =
1801: \int \!dx^\prime \, 
1802: {\bf W}_B(x,x^\prime) 
1803: [B(x^\prime)-t_B(x^\prime)]
1804: ,
1805: \label{filtered-diff-ng}
1806: \ee
1807: is a filtered difference of $B$ 
1808: with filter operator
1809: ${\bf W}_B$
1810: and template
1811: $t_B$.
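
As a minimal sketch of the auxiliary energies above on a discrete mesh, the switching count $N_d(B)$ of Eq.~(\ref{additional-b-prior}) and a quadratic energy of type (\ref{quad-err}) might be evaluated as follows. The sketch assumes unit mesh spacing, a first--difference filter for ${\bf W}_B$, and a zero template $t_B$ by default; these choices and all function names are illustrative and not taken from the calculations reported here.

```python
def count_discontinuities(B):
    """Number of neighbouring mesh points where the binary field B changes value."""
    return sum(1 for left, right in zip(B, B[1:]) if left != right)


def switching_energy(B, tau):
    """E_B = (tau/2) * N_d(B): every jump of B is penalised."""
    return 0.5 * tau * count_discontinuities(B)


def quadratic_energy(B, tau, template=None):
    """E_B = (tau/2) * sum_x |omega_B(x)|^2, where omega_B is the first
    difference of B - t_B (assumed filter, unit mesh spacing)."""
    if template is None:
        template = [0.0] * len(B)
    diff = [b - t for b, t in zip(B, template)]
    omega = [right - left for left, right in zip(diff, diff[1:])]
    return 0.5 * tau * sum(w * w for w in omega)


B = [0, 0, 1, 1, 1, 0, 0, 1]
print(count_discontinuities(B))
print(switching_energy(B, tau=2.0))
print(quadratic_energy(B, tau=2.0))
```

For a binary field and this particular filter the two energies coincide, since every jump contributes a squared difference of one; for real $B(x)$ they differ.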
1812: 
1813: 
1814: 
1815: Let us compare a non--Gaussian prior 
built from prior energies (\ref{omega-B-energy})
1817: and (\ref{additional-b-prior}) 
1818: for a binary auxiliary field (\ref{jumps2})
1819: \be
1820: p(v) \propto
1821: e^{-\frac{1}{2} \int\!dx\, 
1822: \left(
1823: [1-B(x)] |\omega_1(x)|^2 
1824: +
1825: B(x) |\omega_2(x)|^2 
1826: \right)
1827: -\frac{\tau}{2} N_d(B)
1828: }
1829: ,
1830: \label{cmpB}
1831: \ee
1832: with the similar--looking
1833: combination of Gaussian prior (\ref{local-hyper-p})
1834: with hyperprior (\ref{hyperprior-Nd})
1835: for a binary hyperfield,
1836: \be
1837: p({v},\theta) 
1838: =p(v|\theta) p(\theta) 
1839: \propto
1840: \label{omega-theta-energy}
1841: \label{cmpT}
1842: \ee
1843: \[
1844: e^{-\frac{1}{2} \int\!dx\, 
1845: \left[
1846: (1-\theta(x)) |\omega_1(x)|^2 
1847: +
1848: \theta(x) |\omega_2(x)|^2 
1849: \right]
1850: -\frac{\tau}{2} N_d(\theta)
1851: -\ln Z_{\cal V}(\theta)
1852: }
1853: .
1854: \]
1855: Eq.~(\ref{cmpT}) works with conditional probabilities $p(v|\theta)$, 
1856: hence the corresponding 
1857: normalization factors are in general $\theta$--dependent 
1858: and have to be included
1859: for MAP calculations.
Typically, the MAP solutions 
for $B$, $N_d(B)$, and $C_B$,
which are defined directly in terms of the corresponding MAP solution for $v$,
differ from the MAP solutions for $\theta$, 
$N_d(\theta)$, and $C_\theta$,
respectively.
1866: However, if the filtered differences $\omega_i$ 
1867: in Eq.~(\ref{omega-theta-energy})
1868: differ only in their templates,
1869: the normalization term can be skipped.
1870: Then
1871: assuming 
1872: $\vartheta$ = $0$,
1873: $p(\theta) \propto  1$,
1874: $p(B) \propto 1$
1875: the two equations are equivalent
1876: for 
1877: $\theta(x)$ = $\Theta\left(|\omega_1(x)|^2-|\omega_2(x)|^2\right)$.
1878: In the absence of hyperpriors,
1879: it is indeed easily seen 
that this is a self--consistent solution for $\theta$
1881: for every given ${v}$.
1882: In general, however, when 
1883: hyperpriors are included,
1884: another solution for $\theta$ 
1885: may have a larger posterior.
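
The self--consistency statement can be checked by brute force on a small mesh. The sketch below assumes unit mesh spacing, no hyperprior, $\vartheta$ = $0$, and hypothetical values for $|\omega_1(x)|^2$ and $|\omega_2(x)|^2$; it compares the threshold rule with an exhaustive minimization over all binary $\theta$.

```python
from itertools import product


def mixed_energy(theta, w1_sq, w2_sq):
    """(1/2) * sum_x [(1 - theta) |omega_1|^2 + theta |omega_2|^2]."""
    return 0.5 * sum((1 - t) * a + t * b for t, a, b in zip(theta, w1_sq, w2_sq))


def threshold_rule(w1_sq, w2_sq):
    """theta(x) = Theta(|omega_1|^2 - |omega_2|^2), with Theta(0) = 0 chosen here."""
    return tuple(1 if a > b else 0 for a, b in zip(w1_sq, w2_sq))


# Hypothetical squared filtered differences on a five-point mesh.
w1_sq = [0.2, 1.5, 0.1, 2.0, 0.7]
w2_sq = [0.9, 0.3, 0.4, 0.5, 0.6]

theta_rule = threshold_rule(w1_sq, w2_sq)
theta_best = min(product((0, 1), repeat=len(w1_sq)),
                 key=lambda th: mixed_energy(th, w1_sq, w2_sq))
print(theta_rule, theta_best)
```

Because the energy is a sum of independent local terms, the pointwise rule minimizes it for every given $v$. A hyperprior coupling neighbouring values of $\theta(x)$ would break this locality.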
1886: 
1887: 
1888: Hyperpriors $p(\theta)$ 
1889: or additional auxiliary prior terms $p(B)$
1890: can be useful to enforce specific
1891: global constraints for $\theta(x)$ or $B(x)$.
1892: In natural images, for example, discontinuities 
1893: are expected to form closed curves.
1894: Priors or hyperpriors, 
1895: organizing discontinuities along lines or closed curves, 
1896: are thus important for image segmentation or image restoration
1897: \cite{Geman-Geman-1984,Poggio-Torre-Koch-1985,Marroquin-Mitter-Poggio-1987,Geiger-Girosi-1991,Zhu-Yuille-1996}.
1898: A similar method has been used
1899: in the determination of piecewise smooth relaxation time spectra
1900: from rheological data
1901: \cite{Roths-Maier-Friedrich-Marth-Honerkamp-2000}.
1902: 
1903: Another  useful class of
1904: non--Gaussian priors
1905: generalizing (\ref{Gauss-omega})
1906: has the form \cite{Winkler-1995,Zhu-Mumford-1997,Zhu-Wu-Mumford-1997}
1907: \be
1908: p(v)
1909: \propto
1910: e^{-\frac{1}{2} \int\! dx\, \psi[\omega(x)]}
1911: ,
1912: \ee
1913: where $\psi$ is a non--quadratic function.
1914: This function $\psi$
1915: can be fixed in advance for a given problem
1916: or adapted using hyperparameters.
1917: Typical choices to allow discontinuities
1918: are symmetric ``cup'' functions 
1919: with minimum at zero and flat tails 
1920: for which one large step is cheaper than many small ones
1921: (see Fig.~\ref{Zhu-Mum-pic}).
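
The statement that one large step is cheaper than many small ones can be made concrete with the functional form and parameters quoted in the caption of Fig.~\ref{Zhu-Mum-pic}. The following sketch is only an illustration; the comparison with a quadratic energy and the step sizes are assumptions chosen for this example.

```python
def cup(x, a=5.0, b=10.0, gamma=0.7, x0=0.0):
    """psi(x) = a * (1 - 1 / (1 + (|x - x0| / b)**gamma))."""
    return a * (1.0 - 1.0 / (1.0 + (abs(x - x0) / b) ** gamma))


def quadratic(x):
    """Quadratic energy of a Gaussian prior, for comparison."""
    return 0.5 * x * x


# One step of height 10 versus ten steps of height 1.
one_large = cup(10.0)
many_small = 10 * cup(1.0)
print(one_large, many_small)
print(quadratic(10.0), 10 * quadratic(1.0))
```

With the cup function the single large step has the lower energy, whereas the quadratic energy prefers the ten small steps.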
1922: 
1923: Table \ref{collection}
1924: summarizes the basic variants of prior energies 
1925: discussed in the paper.
1926: 
1927: 
1928: \begin{figure}
1929: \vspace{-0.5cm}
1930: \begin{center}
1931: %\epsfig{file=wink1a.eps, width=50mm}\\
1932: \epsfig{file=figure1.eps, width=50mm}\\
1933: \end{center}
1934: \vspace{-0.5cm}
1935: \caption{Example of a non--quadratic
1936: ``cup''--function
1937: $\psi(x)$ = $a( 1.0 - 1/(1+(|x-x_0|/b)^\gamma))$, 
1938: with
$a$ = $5$,
1940: $b$ = $10$,
1941: $\gamma$ = $0.7$,
1942: $x_0$ = $0$.
1943: }
1944: \label{Zhu-Mum-pic}
1945: \end{figure}
1946: 
1947: 
1948: \begin{table}[ht]
1949: \begin{center}
1950: \begin{tabular}{|c|c|}
1951: %\hline\rule[-2mm]{0mm}{6mm}
1952: %prior energy  & Eq. \\
1953: %\hline
1954: \hline
1955: \multicolumn{2}{|c|}{Gaussian prior\rule[-2mm]{0mm}{6mm}}\\
1956: \hline\rule[-2mm]{0mm}{6mm}
1957: $E(v)$ = 
1958: $\frac{1}{2} \mel{v-v_0}{{\bf K}_0}{v-v_0}$ 
1959: & (\ref{gaussprior}) \\
1960: \hline
1961: \multicolumn{2}{|c|}{with hyperparameter $\theta$\rule[-2mm]{0mm}{6mm}}\\
1962: \hline\rule[-2mm]{0mm}{6mm}
1963: $E(v|\theta)$ = 
1964: $\frac{1-\theta}{2}\mel{v-v_1}{{\bf K}_1}{v-v_1}$ &\\
1965: $\quad\qquad +\frac{\theta}{2}\mel{v-v_2}{{\bf K}_2}{v-v_2}$
1966: \rule[-2mm]{0mm}{6mm}
1967: &(\ref{mix-prior-bin})\\
1968: \hline
1969: \multicolumn{2}{|c|}{with local hyperfield $\theta(x)$\rule[-2mm]{0mm}{6mm}}\\
1970: \hline\rule[-2mm]{0mm}{6mm}
1971: $E(v|\theta)$ = 
1972: $\frac{1}{2} \int \!dx \, 
1973:  \Big( [1-\theta(x)]|\omega_1(x)|^2$ &\\
1974: $\qquad\qquad+\theta(x) |\omega_2(x)|^2
1975:  \Big)
1976: +\ln Z_{\cal V}(\theta)$
1977: & (\ref{local-hyper-p})\\
1978: %\hline
1979: $E(v|\theta)$ = 
1980: $\frac{1}{2}
1981: \scp{{v}-\tilde v_0(\theta)}{{v}-\tilde v_0(\theta)}
1982: +\frac{1}{2}
1983: \mel{{v}}{{\bf K}}{{v}}$
1984: \rule[-2mm]{0mm}{6mm}
1985: &(\ref{local+laplace})\\
1986: \hline
1987: \multicolumn{2}{|c|}{
1988: Non--Gaussian prior with auxiliary field $B(x;v)$\rule[-2mm]{0mm}{6mm}}\\
1989: \hline\rule[-2mm]{0mm}{6mm}
1990: $E(v)$ = 
1991: $\frac{1}{2} \int\!dx\, 
1992: \left(
1993: [1-B(x)]|\omega_1(x)|^2 
1994: +
1995: B(x) |\omega_2(x)|^2 
1996: \right)$ 
1997: &(\ref{omega-B-energy})\\
1998: \hline
1999: \end{tabular}
2000: \end{center}
2001: \caption{Summary of basic prior energy variants discussed in this paper.}
2002: \label{collection}
2003: \end{table}
2004: 
2005: 
2006: \section{Stationarity equations}
2007: \label{stationarity-equations}
2008: 
2009: To reconstruct a local potential $v$
2010: in MAP we have to maximize the posterior $p(v|D)$
2011: with respect to $v$.
2012: If the functional derivative of the posterior with respect to $v$ exists,
2013: the reconstructed potential can be found by solving the 
2014: stationarity equation
2015: \be
2016: \delta_{v} \ln p(v|D) = 0
2017: ,
2018: \label{stat-eq}
2019: \ee
2020: where we have chosen the logarithm for technical convenience, and 
2021: $\delta_{v}$ denotes the functional derivative with respect to $v$.
2022: 
2023: For observational data consisting of 
2024: $n$ independent position measurements
2025: the posterior (\ref{bayestheorem}) reads
2026: \be
2027: p(v|D) 
2028: \propto 
2029: p(v) \prod_{i=1}^n p(x_i|\hat x,v)
2030: .
2031: \label{posterior2}
2032: \ee
2033: To formulate the stationarity equation (\ref{stat-eq})
2034: we have to calculate the functional derivatives 
2035: of likelihood and prior.
2036: For inverse quantum statistics \cite{Lemm-IQS-2000} 
2037: the likelihood for position measurements (\ref{pos-likelihood})
2038: on a canonical ensemble (\ref{canonical})
2039: depends on the eigenfunctions and eigenvalues 
2040: of the $v$--dependent Hamiltonian $H(v)$.
2041: We thus have to find the functional derivatives
2042: of the eigenfunctions $\phi_\alpha$ and eigenvalues $E_\alpha$.
2043: Those can be obtained by taking the functional derivative
2044: of the eigenvalue equation 
2045: $H\ket{\phi_\alpha}$ = $E_\alpha \ket{\phi_\alpha}$,
2046: where we will assume the eigenfunctions 
2047: to be orthonormalized.
2048: Choosing
2049: $\scp{\phi_\alpha}{\delta_{v(x)}\phi_\alpha}$ = 0
2050: and
2051: utilizing  
2052: \be
2053: \delta_{v(x)} H (x^\prime,x^{\prime\prime})
2054: = 
2055: \delta_{v(x)} V (x^\prime,x^{\prime\prime})
2056: = 
2057: \delta(x-x^\prime) \delta (x^\prime-x^{\prime\prime})
2058: ,
2059: \ee
2060: we find for nondegenerate eigenfunctions
2061: \bea
2062: \delta_{v(x)} E_\alpha 
2063: &=& \mel{\phi_\alpha}{\delta_{v(x)} H}{\phi_\alpha}
2064: =|\phi_\alpha(x)|^2
2065: ,
2066: \label{deltaE-nonp}
2067: \\
2068: \delta_{v(x)} \phi_\alpha(x^{\prime})
2069: &=& \sum_{\gamma\ne \alpha} \frac{1}{E_\alpha-E_\gamma} 
2070: \phi_\gamma(x^{\prime})\phi^*_\gamma(x) \phi_\alpha (x)
2071: .
2072: \eea
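The intermediate step is standard first--order perturbation theory and can be sketched as follows for nondegenerate levels.
Taking the functional derivative of the eigenvalue equation gives
\be
(\delta_{v(x)} H) \ket{\phi_\alpha} + H \ket{\delta_{v(x)}\phi_\alpha}
=
(\delta_{v(x)} E_\alpha) \ket{\phi_\alpha} + E_\alpha \ket{\delta_{v(x)}\phi_\alpha}
.
\ee
Projecting onto $\bra{\phi_\alpha}$ and using the hermiticity of $H$,
the terms containing $\ket{\delta_{v(x)}\phi_\alpha}$ cancel,
which yields Eq.~(\ref{deltaE-nonp}).
Projecting onto $\bra{\phi_\gamma}$ with $\gamma\ne\alpha$ gives
\be
(E_\alpha-E_\gamma)\,
\scp{\phi_\gamma}{\delta_{v(x)}\phi_\alpha}
=
\mel{\phi_\gamma}{\delta_{v(x)} H}{\phi_\alpha}
=
\phi^*_\gamma(x) \phi_\alpha (x)
,
\ee
so that, together with the choice
$\scp{\phi_\alpha}{\delta_{v(x)}\phi_\alpha}$ = 0
and the completeness of the eigenfunctions,
the expansion of $\delta_{v(x)} \phi_\alpha$ given above follows.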
2073: It follows for the functional derivative of the likelihood
2074: \bea
2075: \delta_{v(x)}p(x_i|\hat x,v)
2076: &=&
2077: \av{\left(\delta_{v(x)}\phi^*(x_i)\right) \phi (x_i)} 
2078: \nonumber\\&&
2079: +\av{\phi^*(x_i)\delta_{v(x)}\phi (x_i)}
2080: \nonumber\\&&
2081: -
2082: \beta \Big(
2083: \av{|\phi (x_i)|^2 |\phi (x)|^2}
2084: \nonumber\\&&
2085: -\av{|\phi (x_i)|^2}\av{|\phi (x)|^2}
2086: \Big)
2087: .
2088: \label{der-like}
2089: \eea
2090: 
2091: 
2092: Having obtained Eq.~(\ref{der-like})
2093: for the likelihood 
2094: we now have to find the functional derivative of the prior.
2095: For the Gaussian prior (\ref{gaussprior})
2096: one gets directly
2097: \be
2098: \delta_{v} \ln p(v) 
2099: = -{\bf K}_0(v-v_0)
2100: .
2101: \label{prior-dev}
2102: \ee
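
For a negative Laplacian inverse covariance ${\bf K}_0$ = $-\lambda \Delta$, Eq.~(\ref{prior-dev}) can be sketched on a mesh as follows. Unit mesh spacing and periodic boundary conditions for the discrete Laplacian are assumptions made only for this illustration, as are the function names.

```python
def neg_laplacian(f):
    """Apply -Delta with periodic boundary conditions (unit mesh spacing)."""
    n = len(f)
    return [2.0 * f[j] - f[(j - 1) % n] - f[(j + 1) % n] for j in range(n)]


def grad_log_gaussian_prior(v, v0, lam):
    """delta_v ln p(v) = -K_0 (v - v0) with K_0 = -lam * Delta."""
    diff = [a - b for a, b in zip(v, v0)]
    return [-lam * k for k in neg_laplacian(diff)]


v0 = [0.0, 1.0, 0.0, -1.0]
v_shifted = [x + 0.3 for x in v0]      # differs from v0 by a constant
v_bumped = [0.0, 1.0, 0.5, -1.0]       # local deviation at one mesh point

print(grad_log_gaussian_prior(v_shifted, v0, lam=1.0))
print(grad_log_gaussian_prior(v_bumped, v0, lam=1.0))
```

A constant offset from $v_0$ produces a vanishing gradient, so a Laplacian prior of this kind constrains the derivatives of $v-v_0$ rather than $v-v_0$ itself.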
2103: 
2104: 
2105: If hyperparameters $\theta$ are included
2106: and treated in MAP
2107: (i.e., not integrated out by Monte Carlo techniques),
2108: the posterior has to be maximized simultaneously
2109: with respect to $v$ and $\theta$.
2110: We have already mentioned that $\theta$--dependent inverse covariances
2111: lead to normalization factors which are independent of $v$
2112: but depend on $\theta$.
2113: Such factors have to be included 
2114: when maximizing with respect to $\theta$.
2115: 
2116: As a non--Gaussian example
2117: consider a prior
2118: where two filtered differences
2119: are mixed by an auxiliary field $B(x)$
2120: and an additional prior factor $p(B)$ is included,
2121: for example to prevent fast oscillations of $B(x)$.
2122: With $B(x)$ = $\sigma(u(x)-\vartheta)$,
2123: threshold $\vartheta$,
2124: sigmoidal function  $\sigma(x)$
2125: as in Eq.~(\ref{sigmoid-bsp}),
2126: and 
2127: $u(x)$ = $|\omega_1(x)|^2-|\omega_2(x)|^2$
2128: this gives
2129: \be
2130: p(v) \propto
2131: e^{-\frac{1}{2} \, \int\! dx \, 
2132: \big| [1-B(x)] \omega_1(x) + B(x) \omega_2(x) 
2133: \big|^2 
2134: -E_B
2135: }
2136: .
2137: \label{non-gauss-prior}
2138: \ee
2139: Analogous to Eq.~(\ref{b-prior}),
2140: the term 
2141: \be
2142: E_B = \int\!dx\, E_B(x)
2143: ,
2144: \ee
2145: represents an auxiliary prior energy
2146: formulated in terms of the mixing function $B(x)$.
2147: Like $\omega(x)$ the function value $E_B(x)$
2148: may depend on the whole function $B$
2149: and not necessarily only on the function value $B(x)$.
2150: Using $\omega_i(x)$ = $\scp{x}{{\bf W}_i (v-v_i)}$
2151: we find
2152: \be
2153: \delta_{v(x)} \omega_i(x^\prime) 
2154: = {\bf W}_i(x^\prime,x)
2155: ,
2156: \ee
2157: and thus
2158: \be
2159: \delta_{v(x)} u(x^\prime)
2160: =
2161: 2\left({\bf W}^T_1(x,x^\prime) \, \omega_1(x^\prime)
2162: -{\bf W}^T_2(x,x^\prime) \, \omega_2(x^\prime)
2163: \right).
2164: \ee
2165: Furthermore, we obtain for the functional derivative of $E_B$
2166: \be
2167: \delta_{v(x)} E_B(x^\prime)
2168: =
2169: \int\!dx^{\prime\prime}\,
2170: \left[
2171: \delta_{v(x)} B(x^{\prime\prime})
2172: \right]
2173: \, 
2174: \left[
2175: \delta_{B(x^{\prime\prime})} E_B(x^\prime)
2176: \right]
2177: ,
2178: \ee
2179: where with Eq.~(\ref{jumps2})
2180: \be
2181: \delta_{v(x)} B(x^{\prime\prime})
2182: =
2183: \sigma^\prime(u(x^{\prime\prime})-\vartheta)
2184: \delta_{v(x)} u(x^{\prime\prime})
2185: ,
2186: \ee
2187: and
2188: $\sigma^\prime(u)$ = $d\sigma(u)/du$.
2189: For a prior energy as in (\ref{quad-err}) which is quadratic in  $B(x)$
2190: \be
2191: E_B(x) = |\omega_B(x)|^2
2192: ,
2193: \ee
2194: $\omega_B(x)$ defined in Eq.~(\ref{filtered-diff-ng}),
2195: the functional derivative with respect to $B(x)$ becomes
2196: \be
2197: \delta_{B(x)}E_B(x^\prime)
2198:  = 
2199: 2 {\bf W}_B^T(x,x^\prime)\, \omega_B(x^\prime)
2200: .
2201: \ee
2202: For a  non--Gaussian prior with energy (\ref{non-quad-err})
2203: an additional derivative of the sigmoid appears.
2204: Now all terms can be collected and inserted into the 
2205: functional derivative 
2206: of the prior (\ref{non-gauss-prior})
2207: \bea
2208: \delta_{v} \ln p(v)
2209: &=&
2210: -\int\! dx \, 
2211: \Big(
2212: \left[[1-B(x)] \omega_1(x) + B(x) \omega_2(x) 
2213: \right] 
2214: \nonumber\\&&
2215: \qquad\times
2216: \big(
2217: [1-B(x)] \delta_v\omega_1(x) 
2218: + B(x) \delta_v\omega_2(x) 
2219: \nonumber\\&&
2220: \qquad\qquad
2221: +\;\delta_v B(x) [\omega_2(x)-\omega_1(x)] 
2222: \big)
2223: \nonumber\\&&
2224: \qquad+\;\delta_v E_B(x) \Big)
2225: .
2226: \eea
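
The collected derivative can be checked numerically in a strongly simplified setting. The sketch below assumes identity filters ${\bf W}_i$ (so that $\omega_i$ = $v-v_i$ and all terms become local), $E_B$ = $0$, a logistic sigmoid, and unit mesh spacing; none of these choices is taken from the calculations of this paper. It compares the analytic gradient of the prior energy, i.e., of $-\ln p(v)$, with central finite differences.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def energy(v, v1, v2, thresh):
    """(1/2) sum_x |[1-B] omega_1 + B omega_2|^2 with B = sigma(u - thresh)."""
    total = 0.0
    for x, a, b in zip(v, v1, v2):
        w1, w2 = x - a, x - b
        B = sigmoid(w1 * w1 - w2 * w2 - thresh)
        mix = (1.0 - B) * w1 + B * w2
        total += 0.5 * mix * mix
    return total


def grad_energy(v, v1, v2, thresh):
    """Analytic gradient: mix * ([1-B] dw1 + B dw2 + dB (w2 - w1)) with dw_i = 1."""
    grad = []
    for x, a, b in zip(v, v1, v2):
        w1, w2 = x - a, x - b
        B = sigmoid(w1 * w1 - w2 * w2 - thresh)
        dB = B * (1.0 - B) * 2.0 * (w1 - w2)   # sigma'(u - thresh) * du/dv
        mix = (1.0 - B) * w1 + B * w2
        grad.append(mix * (1.0 + dB * (w2 - w1)))
    return grad


# Hypothetical potential and reference potentials on a three-point mesh.
v = [0.3, -0.4, 1.1]
v1 = [0.0, 0.5, 1.0]
v2 = [1.0, -0.5, 0.2]
eps = 1e-6
numeric = []
for j in range(len(v)):
    up = v[:j] + [v[j] + eps] + v[j + 1:]
    down = v[:j] + [v[j] - eps] + v[j + 1:]
    numeric.append((energy(up, v1, v2, 0.1) - energy(down, v1, v2, 0.1)) / (2 * eps))
analytic = grad_energy(v, v1, v2, 0.1)
print(analytic)
print(numeric)
```

The functional derivative of $\ln p(v)$ is the negative of the gradient computed here.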
2227: 
2228: The Bayesian approach to inverse quantum mechanics 
2229: is not restricted to position measurements,
but allows one to deal with all kinds of observations
2231: for which the likelihood can be calculated.
2232: To have better information about the depth of a potential
2233: it is useful to include information on the
2234: ground state energy of a system.
2235: For instance,
2236: including a noisy  measurement of the average energy
2237: \be
2238: U 
2239: = \av{E}
2240: =
2241: \sum_\alpha p_\alpha E_\alpha
2242: ,
2243: \ee
2244: yields an additional factor in the posterior of the form
2245: \be
2246: p_U \propto e^{-E_U}
2247: ,\quad
2248: E_U =
2249: \frac{\mu}{2} (U - \kappa)^2
2250: \label{averageE-penal}
2251: .
2252: \ee
2253: In the noise free limit 
2254: $\mu\rightarrow\infty$
2255: this yields $U\rightarrow\kappa$.
2256: 
2257: Calculating 
2258: the functional derivative of $U$ 
2259: with respect to a local potential
2260: \be
2261: \delta_{v(x)} U =
2262: \av{\delta_{v(x)} E}-\beta \av{E\; \delta_{v(x)} E}
2263: + \beta \av{E} \av{\delta_{v(x)} E}
2264: ,
2265: \ee
2266: it is straightforward to obtain
2267: \be
2268: \delta_{v(x)} E_U 
2269: =
2270:   \mu\left(U\!-\!\kappa\right)
2271: \av{|\phi (x)|^2\left[1-\beta \left( E  -U \right) \right]}
2272: .
2273: \ee
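
The derivative of the average energy can be verified for a toy spectrum. In the sketch below the level energies and their shifts $\delta_{v(x)} E_\alpha$ (which equal $|\phi_\alpha(x)|^2$ at one fixed $x$) are hypothetical numbers; the analytic expression is compared with a central finite difference.

```python
import math


def canonical_weights(energies, beta):
    """p_alpha = exp(-beta E_alpha) / Z."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def thermal_average(p, f):
    return sum(pa * fa for pa, fa in zip(p, f))


def average_energy(energies, beta):
    return thermal_average(canonical_weights(energies, beta), energies)


def d_average_energy(energies, d_energies, beta):
    """<dE> - beta <E dE> + beta <E><dE> for given level shifts dE_alpha."""
    p = canonical_weights(energies, beta)
    e_de = [e * d for e, d in zip(energies, d_energies)]
    return (thermal_average(p, d_energies)
            - beta * thermal_average(p, e_de)
            + beta * thermal_average(p, energies) * thermal_average(p, d_energies))


def shifted(energies, d_energies, s):
    return [e + s * d for e, d in zip(energies, d_energies)]


# Hypothetical three-level spectrum and level shifts.
E = [0.0, 0.8, 1.9]
dE = [0.5, 0.2, 0.3]
beta, eps = 1.3, 1e-6
numeric = (average_energy(shifted(E, dE, eps), beta)
           - average_energy(shifted(E, dE, -eps), beta)) / (2 * eps)
analytic = d_average_energy(E, dE, beta)
print(analytic, numeric)
```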
2274: 
2275: 
2276: Stationarity equations are typically nonlinear 
2277: and have to be solved by iteration.
2278: A possible iteration scheme is 
2279: \bea
2280: v^{(r+1)}
2281: &=&
2282: v^{(r)}\! +
2283: \eta {\bf A}^{-1}
2284: \Big[\delta_v \ln p(v^{(r)})
2285: %{\bf K}_0^{(r)} (v_0\!-\!v^{(r)})
2286: \label{iter1}
2287: \nonumber\\&&
2288: \quad +\sum_i \delta_{v}\ln p(x_i|\hat x,v^{(r)})
2289: -\delta_{v} E_U^{(r)}
2290: \Big]
2291: .
2292: \label{iteration}
2293: \eea
2294: Here $\eta$ is a step width which can be optimized
2295: by a line search algorithm
2296: and 
2297: the positive definite operator ${\bf A}$ 
2298: distinguishes different learning algorithms.
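
The role of ${\bf A}$ in Eq.~(\ref{iteration}) can be illustrated with a toy problem. The sketch below replaces the full posterior by a hypothetical quadratic log--posterior, for which the gradient is known in closed form, and compares ${\bf A}$ = ${\bf 1}$ (plain gradient ascent) with ${\bf A}$ equal to the curvature (a Newton--like step). All names and numbers are assumptions for illustration.

```python
def iterate(v, grad_log_posterior, apply_a_inv, eta, steps):
    """v <- v + eta * A^{-1} grad ln p(v|D), repeated `steps` times."""
    for _ in range(steps):
        direction = apply_a_inv(grad_log_posterior(v))
        v = [x + eta * d for x, d in zip(v, direction)]
    return v


# Toy log-posterior -(1/2) sum_j k_j (v_j - t_j)^2 (hypothetical stand-in).
k = [1.0, 4.0, 0.25]
t = [0.5, -1.0, 2.0]


def grad(v):
    return [-kj * (x - tj) for kj, x, tj in zip(k, v, t)]


def identity(g):
    """A = 1: plain gradient ascent."""
    return list(g)


def inverse_curvature(g):
    """A = diag(k): Newton-like step for the quadratic toy problem."""
    return [gj / kj for gj, kj in zip(g, k)]


start = [0.0, 0.0, 0.0]
v_gradient = iterate(start, grad, identity, eta=0.2, steps=200)
v_newton = iterate(start, grad, inverse_curvature, eta=1.0, steps=1)
print(v_gradient)
print(v_newton)
```

In this toy case the Newton--like choice reaches the maximum in a single step, while plain gradient ascent needs many iterations for the weakly curved direction.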
2299: 
2300: 
2301: 
2302: \section{Numerical examples}
2303: \label{numerical}
2304: 
As a numerical application of BIQM
2306: and to test several variants 
2307: of implementing {\it a priori} information
2308: we will study the reconstruction of an approximately
2309: periodic, one--di\-mensional potential.
2310: Such a potential may represent a one--dimensional surface
2311: where a periodic structure, 
e.g., that of a regular crystal, 
2313: is distorted by impurities,
2314: located at unknown positions and of unknown form.
2315: 
2316: To test the quality of reconstruction algorithms,
2317: artificial data will be sampled
2318: from a model with known ``true'' potential $v_{\rm true}$.
2319: Selecting a specific prior model
2320: and applying the corresponding Bayesian reconstruction algorithm
2321: to the sampled data, 
2322: we will be able to compare the reconstructed
2323: potential with the original one.
2324: In particular, we will take as true potential
2325: the following perturbed periodic potential
2326: \be
2327: v_{\rm true}(x) = 
\left\{
\begin{array}{ll}
\sin \left( \frac{2\pi}{6}x\right), & 1\le x\le 12,\;\, 25\le x\le 36,\\[1mm]
\sin \left( \frac{2\pi}{12}x\right), & 13\le x\le 24,
\end{array}
\right.
2339: \ee
2340: using for the numerical calculations a mesh 
2341: of size 36.
Considering a system prepared as a canonical ensemble
2343: the potential $v_{\rm true}$
2344: defines a corresponding canonical density operator $\rho$ 
2345: as given in Eq.~(\ref{canonical}).
2346: Artificial data $D$ can then be sampled
2347: according to the likelihood model 
2348: of quantum mechanics (\ref{qm-likelihood}).
2349: For the following examples, $n$ = 200 data points representing 
2350: position measurements have been sampled
2351: using the transformation method
2352: \cite{Press-Teukolsky-Vetterling-Flannery-1992}.
2353: In all calculations we used periodic boundary conditions 
2354: for quantum mechanical wave functions
2355: while the potential $v$ has been set to zero at the boundaries.
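
The setup can be sketched on the mesh as follows. The true potential is taken from the definition above, and the transformation (inverse cumulative distribution) method is shown for a discrete density. Note that the density used here is only a classical Boltzmann stand--in chosen for illustration; the likelihood (\ref{qm-likelihood}) requires diagonalizing $H(v)$, which this sketch does not do. The seed and all names are hypothetical.

```python
import bisect
import math
import random
from itertools import accumulate


def v_true(x):
    """Perturbed periodic potential on the mesh x = 1, ..., 36."""
    if 13 <= x <= 24:
        return math.sin(2.0 * math.pi * x / 12.0)
    return math.sin(2.0 * math.pi * x / 6.0)


def sample_positions(prob, n, rng):
    """Transformation method for a discrete density on mesh points 1..len(prob)."""
    cdf = list(accumulate(prob))
    samples = []
    for _ in range(n):
        r = rng.random() * cdf[-1]
        # first mesh index whose cumulative probability exceeds r
        idx = min(bisect.bisect_right(cdf, r), len(cdf) - 1)
        samples.append(idx + 1)          # mesh points are numbered from 1
    return samples


mesh = list(range(1, 37))
potential = [v_true(x) for x in mesh]

# Stand-in density only (not the quantum mechanical likelihood of the text).
weights = [math.exp(-v) for v in potential]
norm = sum(weights)
prob = [w / norm for w in weights]

data = sample_positions(prob, 200, random.Random(0))
frequencies = [data.count(x) / len(data) for x in mesh]
print(frequencies)
```

The list of relative frequencies is the discrete analogue of an empirical density of the sampled positions.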
2356: 
2357: 
2358: We will now discuss the results of a Bayesian reconstruction
2359: under varying prior models.
As a first example, consider a simple Gaussian prior
2361: (\ref{gaussprior}) 
2362: with negative Laplacian inverse covariance
2363: ${\bf K}_0$ = $-\lambda \Delta$,
2364: zero reference potential $v_0\equiv 0$,
2365: and an additional prior factor (\ref{averageE-penal})
2366: representing a noisy measurement of the average energy.
2367: The reconstruction results
2368: %ed potential $v_{\rm BIQM}$
2369: %and the corresponding
2370: %reconstructed likelihood $p(x|\hat x, v_{\rm BIQM})$
are shown in Fig.~\ref{p162}.
2372: In particular, the figure on top compares
2373: the reconstructed likelihood 
2374: $p_{\rm BIQM}(x|\hat x,v_{\rm BIQM})$
2375: with the true likelihood
2376: $p_{\rm true}(x|\hat x,v_{\rm true})$
2377: and with the empirical density, i.e.,
2378: the relative frequencies of the sampled data
2379: \be
2380: p_{\rm emp}(x) =
2381: \frac{1}{n} \sum_{i=1}^n \delta(x-x_i)
2382: .
2383: \ee
2384: Similarly, the lower figure compares
2385: the reconstructed  potential $v_{\rm BIQM}$
2386: with the true potential $v_{\rm true}$. 
Since information on the average energy
was available, the depth of the potential
is well approximated at least at one of its minima.
This is sufficient to fulfill the noisy average energy condition.
However, because only smoothness and no
periodicity information is implemented by the prior,
the reconstructed potential is too flat.
The effect is stronger near the maxima 
than near the minima of the potential
because near the maxima only few data points are available,
and hence the reconstructed potential is dominated there by the zero reference
potential in the smoothness prior.
2399: 
2400: To include information on approximate periodicity
2401: we have replaced in the next example
2402: the zero reference potential $v_0\equiv 0$
2403: by the strictly periodic reference potential
2404: \be
2405: v_0(x) = \sin \left( \frac{2\pi}{6}x\right)
2406: ,
2407: \label{per-ref-prior}
2408: \ee
2409: shown as dashed line 
2410: in the following figures of potentials.
2411: A reconstruction 
2412: with the periodic reference potential (\ref{per-ref-prior})
2413: but without average energy information,
2414: and starting the iteration with the reference potential as
2415: initial guess $v^{(0)}$ = $v_0$
2416: is shown in Fig.~\ref{p19}.
2417: Due to missing average energy information
2418: the depth of the potential is not well approximated.
2419: It is also clearly visible in Fig.~\ref{p19}
2420: that the smoothness prior
2421: does not favor solutions which are 
2422: similar to the reference $v_0$ itself 
2423: but solutions
2424: which have derivatives similar to that of $v_0$.
2425: Fig.~\ref{p19} also displays
2426: that the reconstruction of the potential does clearly identify
2427: the impurity.
As the reference potential is not adapted
to the impurity region, the reconstruction is poorer there
than in the regular region.
2431: 
2432: Furthermore, it is worth emphasizing 
2433: that the reconstructed likelihood 
2434: fits the empirical density well,
2435: even slightly better than the true likelihood does.
This is due to the flexibility of a nonparametric approach
which allows one to fit the fluctuations of the empirical density 
caused by the finite sample size. 
The effect is well known in empirical learning and 
leads to so--called ``overfitting''
if the influence of the prior becomes too small.
2442: Since observational data 
2443: influence the reconstruction only through the likelihood,
2444: the reconstruction of potentials is in general 
2445: a more difficult task than the reconstruction of likelihoods.
2446: This indicates the special 
2447: importance of {\it a priori} information 
2448: when reconstructing potentials.
2449: Indeed, even if the complete likelihood is given,
2450: the problem of determining the potential
2451: can still be ill--defined
2452: in regions where the likelihood is small
2453: \cite{Zhu-Rabitz-1999}.
2454: 
2455: A prior model with periodic reference potential 
2456: can be made more flexible
2457: by adapting 
2458: amplitude, frequency, and phase
2459: of the reference potential (\ref{per-ref-prior}).
2460: For this purpose one can introduce a hyperparameter vector
2461: $\theta$ = $(\theta_1,\theta_2,\theta_3)$
2462: parameterizing amplitude, frequency, and phase
2463: and take as reference potential
2464: \be
2465: v_0(x;\theta) 
2466: = \theta_1 \sin \left( \frac{2\pi}{\theta_2}x+\theta_3\right)
2467: .
2468: \ee
2469: The corresponding maximization of the posterior
2470: with respect to $\theta$ is easy in that case
2471: and does not change the results of Fig.~\ref{p19}
2472: where the hyperparameters are already optimally adapted.
2473: 
2474: 
When an additional noisy energy measurement
(\ref{averageE-penal}) is included,
Fig.~\ref{p22} shows that 
the depth of the 
potential is indeed better approximated
than in Fig.~\ref{p19}.
2481: To avoid local maxima of the posterior
2482: the solution of Fig.~\ref{p19} 
2483: has been used as initial guess
2484: and the factor $\mu$ multiplying the average energy term
2485: has been slowly increased to its final value.
Fig.~\ref{p22} still represents only a
local and not a global maximum of the posterior, 
as can be seen by starting 
with a different initial guess $v^{(0)}$.
2490: In Fig.~\ref{p155}
2491: a better solution for the same parameters
2492: is presented 
2493: where the initial guess has been selected
2494: using {\it a priori} information
2495: about the location of the impurity region. 
2496: 
2497: 
2498: Alternatively to a Gaussian prior with periodic reference,
2499: approximate periodicity can be enforced by 
2500: the inverse covariance of a Gaussian prior.
2501: In this case the prior 
2502: favors periodicity but no special form of the potential.
2503: The prior is thus less specific
2504: than a prior with explicit periodic reference function.
2505: Corresponding BIQM results 
2506: for the inverse covariance (\ref{periodic-cov})
2507: are shown in 
2508: Fig.~\ref{p31}. 
Indeed, while the potential is well approximated 
2510: in regions where many observations have been collected,
2511: it is not as well approximated in regions where no or only few data 
2512: are available.
2513: These are the regions where the prior dominates the observational data.
2514: In particular, in the case presented in Fig.~\ref{p31},
2515: the zero reference function $v_0\equiv 0$
2516: of an additional Laplacian smoothness prior
2517: implements a tendency to flat potentials.
2518: 
2519: If impurities are expected,
2520: a prior with one fixed periodic reference potential
for the whole region is not an adequate choice.
2522: Near impurities one would like to 
2523: switch off the standard periodic reference potential 
2524: which in these regions will be misleading.
2525: Because it is usually not known in advance 
2526: where a given reference should be used and where not,
2527: those regions  must be identified
2528: during learning.
As a first example we study a prior energy
2530: similar to Eq.~(\ref{local+laplace}),
2531: \bea
2532: E(v) 
2533: &=&
2534: \frac{\lambda_1}{2}
2535: \int\!dx\,
2536: |v(x)-v_0(x)|^2 [1-B(x)]
2537:  -\frac{\lambda_2}{2}\mel{v}{\Delta}{v},
2538: \nonumber\\&&
2539: \label{eq1}
2540: \eea
which allows one to switch off a given reference locally
2542: by means of a binary switching function
2543: defined as
2544: $B(x)$ = $\Theta\left(|v(x)-v_0(x)|^2-\vartheta\right)$.
2545: (An average energy term 
2546: $E_U$ = $\frac{\mu}{2}(U-\kappa)^2$
2547: could easily be included.)
2548: In the prior energy (\ref{eq1}) the reference $v_0$ is only used
2549: if $|v(x)-v_0(x)|^2$ is smaller than the given threshold $\vartheta$.
2550: Starting with a smoothed version of Eq.~(\ref{eq1}) 
2551: with a real mixing function 
2552: $B(x)$ = $\sigma\left(|v(x)-v_0(x)|^2-\vartheta\right)$,
2553: the results of Fig.~\ref{p102}
2554: have been obtained by changing during iteration
2555: $\sigma(x)$ slowly from a sigmoid to a step function.
2556: Using a step function for $B$ directly from the beginning
2557: leads to nearly indistinguishable results.
2558: Compared to Fig.~\ref{p31}
2559: the reconstruction in Fig.\ \ref{p102}
2560: is improved mainly in the unperturbed region
2561: where the algorithm can now use the correct reference potential.
2562: An additional advantage is 
2563: that the final auxiliary field $B(x)$
2564: directly shows the identified impurity regions. 
2565: One sees in  Fig.~\ref{p102}
2566: that the auxiliary field $B(x)$ is always switched off
2567: if the solution $v(x)$ is similar enough to the template $v_0(x)$.
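
The switching mechanism of Eq.~(\ref{eq1}) might be sketched as follows. The logistic form of $\sigma$ and the gain parameter used to sharpen it towards a step function are assumptions of this illustration, as are the mesh values; only the first (reference) term of the prior energy is evaluated.

```python
import math


def step_switch(v, v0, thresh):
    """B(x) = Theta(|v - v0|^2 - thresh): 1 where the reference is switched off."""
    return [1 if (a - b) ** 2 > thresh else 0 for a, b in zip(v, v0)]


def smooth_switch(v, v0, thresh, gain):
    """B(x) = sigma(gain * (|v - v0|^2 - thresh)) with a logistic sigma (assumed form)."""
    out = []
    for a, b in zip(v, v0):
        z = gain * ((a - b) ** 2 - thresh)
        # numerically stable evaluation of the logistic function
        if z >= 0:
            out.append(1.0 / (1.0 + math.exp(-z)))
        else:
            out.append(math.exp(z) / (1.0 + math.exp(z)))
    return out


def reference_energy(v, v0, thresh, lam1):
    """(lam1/2) sum_x |v - v0|^2 [1 - B(x)] with the binary B (unit mesh spacing)."""
    B = step_switch(v, v0, thresh)
    return 0.5 * lam1 * sum((a - b) ** 2 * (1 - s) for a, b, s in zip(v, v0, B))


# Hypothetical reference and potential on a six-point mesh.
v0 = [0.0, 0.9, 0.9, 0.0, -0.9, -0.9]
v = [0.1, 0.8, 0.2, 0.5, -0.9, -1.0]
for gain in (1.0, 10.0, 1000.0):
    print(gain, [round(b, 3) for b in smooth_switch(v, v0, 0.1, gain)])
print(step_switch(v, v0, 0.1))
print(reference_energy(v, v0, 0.1, lam1=1.0))
```

Increasing the gain during iteration moves the real mixing function towards the binary one, so that only mesh points with $|v(x)-v_0(x)|^2$ below the threshold contribute to the reference term.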
2568: 
2569: The two $v$--dependent terms in
2570: Eq.~(\ref{eq1}) can be combined
2571: [compare Eqs.~(\ref{local+laplace}) and (\ref{local+laplace2})].
2572: Skipping a term which only depends
2573: on $v$ through $B(x)$,
2574: one arrives at another
2575: prior which also implements local switching.
More generally, choosing the prior energy (\ref{omega-B-energy}) 
2577: for switching between
2578: two filtered differences 
2579: with two reference potentials  $v_1$ and $v_2$
2580: leads to
2581: \bea
2582: E(v) 
2583: &=&
2584:  \frac{\lambda_1}{2}
2585: \int\!dx\,
2586: [1-B(x)] |\omega_1(x)|^2 
2587: \nonumber\\&&
2588: +
2589: \frac{\lambda_2}{2}
2590: \int\!dx\,
2591: B(x) |\omega_2(x)|^2 
2592: %+\frac{\mu}{2}(U-\kappa)^2
2593: \label{eq2}
2594: ,
2595: \eea
2596: where the switching is controlled by
2597: the binary function $B(x)$
2598: =
2599: $\Theta\left(|\omega_1(x)|^2-|\omega_2(x)|^2 -\vartheta\right)$
2600: defined in terms of the filtered differences
2601: $\omega_i(x)$ = $(\partial/\partial x) [v(x)-v_i(x)]$. 
2602: A prior energy (\ref{eq2})
2603: with two different nonzero reference potentials
2604: $v_1$ and $v_2$ is obtained, for example,
2605: when a different nonzero reference potential is given
2606: for the unperturbed and the perturbed region.
2607: The number of changes
2608: in the switching function
2609: $B(x)$ = $\Theta\left(|\omega_1(x)|^2-|\omega_2(x)|^2-\vartheta\right)$,
2610: can be controlled
2611: by adding a prior term $p(B)$
2612: penalizing the number of times the function $B(x)$
2613: changes its value.
2614: To avoid local minima for binary $B(x)$,
2615: simulated annealing techniques
2616: are useful.
2617: We have obtained an initial guess for $v$, 
2618: and thus for $B(x)$,
2619: by writing $v(x)$ = $[1-c(x)]v_1(x)+c(x)v_2(x)$
2620: and optimizing the binary function $c(x)$
2621: by simulated annealing
2622: with respect to the likelihood and the additional prior $p(B)$.
2623: In particular, starting from $c(x)$ = $0$,
2624: new trial functions have been generated
2625: by selecting two points $x_1$, $x_2$ randomly
2626: and exchanging the function values zero and one
2627: in between (see Fig.~\ref{trial}).
2628: A new trial function has been accepted or rejected
2629: using the Metropolis rule
$p({\rm accept})$ = $\min[1,\exp({-\beta_{\rm ann} \Delta E_{\rm ann}})]$
2631: with $\Delta E_{\rm ann}$ denoting the difference in the error 
2632: between actual function and new trial function.
2633: In the present case we have
2634: $E_{\rm ann}(v)$ = $\sum_{i} E(x_i|\hat x,v)$ + $E_B(v)$
2635: where
2636: $E(x_i|\hat x,v)$ = $- \ln p(x_i|\hat x,v)$ and
2637: $p(B)\propto \exp(-E_B)$.
2638: %$E_B$ = $-\ln p(B)$.
2639: The annealing temperature $1/\beta_{\rm ann}$
2640: decreases during optimization.
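
The annealing procedure described above can be sketched as follows. The trial move and the Metropolis rule follow the description in the text, whereas the error function is a hypothetical toy stand--in (mismatch with a fixed target plus a penalty on the number of jumps) and does not involve the likelihood; the annealing schedule, the seed, and all names are likewise assumptions.

```python
import math
import random


def flip_between(c, i, j):
    """Exchange the values zero and one of the binary list c between i and j."""
    lo, hi = min(i, j), max(i, j)
    return c[:lo] + [1 - value for value in c[lo:hi]] + c[hi:]


def accept(delta_e, beta_ann, rng):
    """Metropolis rule: accept with probability min[1, exp(-beta_ann * delta_e)]."""
    return delta_e <= 0 or rng.random() < math.exp(-beta_ann * delta_e)


def anneal(c, energy, betas, rng):
    """One trial move per inverse annealing temperature in `betas`."""
    e = energy(c)
    for beta_ann in betas:
        i, j = rng.randrange(len(c) + 1), rng.randrange(len(c) + 1)
        trial = flip_between(c, i, j)
        e_trial = energy(trial)
        if accept(e_trial - e, beta_ann, rng):
            c, e = trial, e_trial
    return c


# Toy error (hypothetical): mismatch with a target plus a penalty on jumps.
target = [0] * 4 + [1] * 4 + [0] * 4


def toy_energy(c):
    mismatch = sum(a != b for a, b in zip(c, target))
    jumps = sum(a != b for a, b in zip(c, c[1:]))
    return mismatch + 0.5 * jumps


rng = random.Random(1)
betas = [0.1 * (r + 1) for r in range(2000)]   # temperature 1/beta decreases
result = anneal([0] * 12, toy_energy, betas, rng)
print(result, toy_energy(result))
```

In the application described in the text, each evaluation of the error for a new $c(x)$ involves the likelihood and is therefore far more expensive than in this toy example.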
2641: 
2642: Fig.~\ref{p75}
2643: shows the reconstruction results
2644: using the following two reference potentials
2645: \bea
2646: v_1(x) &=& \frac{2}{3} \sin \left( \frac{2\pi}{6}x\right)
2647: ,
2648: \label{two-ref-potentialsA}
2649: \\
2650: v_2(x) &=& \sin^2 \left( \frac{2\pi}{6}x\right)
2651: {\rm sign}\left[\sin \left( \frac{2\pi}{6}x\right)\right]
2652: .
2653: \label{two-ref-potentialsB}
2654: \eea
2655: Compared to Fig.\ \ref{p102}
2656: the reconstruction is improved
2657: in the perturbed region,
2658: where the algorithm can now rely on a useful reference potential.
2659: 
2660: Finally, the switching function can be introduced
2661: as local hyperfield.
2662: As an example for a prior with hyperfield,
Fig.~\ref{p120} shows the reconstruction
2664: with the prior energy
2665: \be
2666: E(v,\theta) 
2667: =
2668: \frac{\lambda_1}{2}\scp{v-v_0(\theta)}{v-v_0(\theta)}
2669: -\frac{\lambda_2}{2}\mel{v}{\Delta}{v}
2670: %\nonumber\\&&
2671: %+\frac{\mu}{2}(U-\kappa)^2
2672: -\ln p(\theta)
2673: \label{eq3}
2674: ,
2675: \ee
2676: where 
2677: $v_0(x;\theta)$ = $v_1(x)[1-\theta(x)]+v_2(x)\theta(x)$
2678: with the reference potentials of 
2679: Eq.~(\ref{two-ref-potentialsA})
2680: and Eq.~(\ref{two-ref-potentialsB}).
2681: A hyperprior $p(\theta)$ has been used
2682: penalizing the number of discontinuities of the hyperfield $\theta(x)$,
2683: analogous to $p(B)$ for Fig.~\ref{p75}.
2684: The $E(v|\theta)$ part of the prior energy (\ref{eq3}) 
2685: is of the form (\ref{local-hyper-p}) with $\theta$--independent
2686: covariances. Hence the $\theta$--independent normalization factor
2687: can be skipped.
2688: An initial guess 
2689: for the  local hyperfield $\theta (x)$ has been obtained
2690: by simulated annealing as described for Fig.~\ref{p75}.
2691: As in this case optimization is required only
2692: with respect to the $\theta$--dependent parts of the posterior,
2693: optimizing $\theta(x)$ for given $v$ 
2694: is faster than optimizing $v$ through $c(x)$
which requires diagonalization of the Hamiltonian $H$ 
for every new trial function.
However, as $\theta(x)$ is independent of $v$, the hyperfield 
has to be updated during iteration, which has also been done
2699: by simulated annealing.
2700: As expected, a reconstruction with the non--Gaussian prior
2701: corresponding to the prior energy (\ref{eq2})
2702: is very similar to a reconstruction 
2703: using hyperfields as in  Eq.~(\ref{eq3}).
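
The $\theta$--dependent reference potential entering Eq.~(\ref{eq3}) can be sketched on the mesh as follows, using the two reference potentials (\ref{two-ref-potentialsA}) and (\ref{two-ref-potentialsB}). The particular hyperfield, the unit mesh spacing, and the function names are assumptions made for illustration; only the first term of the prior energy is evaluated.

```python
import math


def v1(x):
    return (2.0 / 3.0) * math.sin(2.0 * math.pi * x / 6.0)


def v2(x):
    s = math.sin(2.0 * math.pi * x / 6.0)
    return s * abs(s)                      # sin^2 * sign(sin)


def mixed_reference(theta, mesh):
    """v_0(x; theta) = v_1(x) [1 - theta(x)] + v_2(x) theta(x)."""
    return [v1(x) * (1 - t) + v2(x) * t for x, t in zip(mesh, theta)]


def reference_energy(v, theta, mesh, lam1):
    """(lam1/2) <v - v_0(theta) | v - v_0(theta)> on a unit-spaced mesh."""
    ref = mixed_reference(theta, mesh)
    return 0.5 * lam1 * sum((a - b) ** 2 for a, b in zip(v, ref))


mesh = list(range(1, 37))
theta = [1 if 13 <= x <= 24 else 0 for x in mesh]    # hypothetical hyperfield
v = mixed_reference(theta, mesh)
print(reference_energy(v, theta, mesh, lam1=1.0))     # zero by construction
print(reference_energy(v, [0] * 36, mesh, lam1=1.0))  # wrong hyperfield costs energy
```

Since this term depends on $\theta$ only through $v_0(\theta)$, evaluating it for a new trial hyperfield needs no diagonalization of $H$.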
2704: 
2705: 
2706: \begin{figure}[ht]
2707: \begin{center}
2708: \setlength{\unitlength}{0.9mm}
2709: \hspace{3cm}
2710: \begin{picture}(90,20)
2711: \thicklines
2712: %lines
2713: \put(0,0){\line(1,0){40}}  
2714: %
2715: \put(50,0){\line(1,0){10}}  
2716: \put(80,0){\line(1,0){10}}  
2717: \put(60,0){\line(0,1){10}} 
2718: \put(80,0){\line(0,1){10}} 
2719: \put(60,10){\line(1,0){20}}   
2720: % points
2721: \put(10,0){\circle*{1.2}}  
2722: \put(30,0){\circle*{1.2}}  
2723: \put(60,0){\circle*{1.2}}  
2724: \put(80,0){\circle*{1.2}}  
2725: % vector
2726: \put(41,5){\vector(1,0){8}} 
2727: \put(49,5){\vector(-1,0){8}} 
2728: \end{picture}
2729: \end{center}
2730: 
2731: 
2732: \begin{center}
2733: \setlength{\unitlength}{0.9mm}
2734: \begin{picture}(90,20)
2735: \thicklines
2736: % lines
2737: \put(0,0){\line(1,0){10}}  
2738: \put(10,0){\line(0,1){10}}  
2739: \put(10,10){\line(1,0){30}}  
2740: %
2741: \put(50,0){\line(1,0){30}}  
2742: \put(80,0){\line(0,1){10}}   
2743: \put(80,10){\line(1,0){10}}   
2744: % points
2745: \put(10,0){\circle*{1.2}}  
2746: \put(30,0){\circle*{1.2}}  
2747: \put(60,0){\circle*{1.2}}  
2748: \put(80,0){\circle*{1.2}}  
2749: % vector
2750: \put(41,5){\vector(1,0){8}} 
2751: \put(49,5){\vector(-1,0){8}} 
2752: \end{picture}
2753: \end{center}
2754: 
2755: 
2756: \begin{center}
2757: \setlength{\unitlength}{0.9mm}
2758: \begin{picture}(90,20)
2759: \thicklines
2760: % lines
2761: \put(0,0){\line(1,0){20}}  
2762: \put(20,0){\line(0,1){10}}  
2763: \put(20,10){\line(1,0){20}}  
2764: %
2765: \put(50,0){\line(1,0){10}}  
2766: \put(60,0){\line(0,1){10}}  
2767: \put(60,10){\line(1,0){10}} 
2768: \put(70,0){\line(0,1){10}} 
2769: \put(70,0){\line(1,0){10}}   
2770: \put(80,0){\line(0,1){10}}   
2771: \put(80,10){\line(1,0){10}}   
2772: % points
2773: \put(10,0){\circle*{1.2}}  
2774: \put(30,0){\circle*{1.2}}  
2775: \put(60,0){\circle*{1.2}}  
2776: \put(80,0){\circle*{1.2}}  
2777: % vector
2778: \put(41,5){\vector(1,0){8}} 
2779: \put(49,5){\vector(-1,0){8}} 
2780: \end{picture}
2781: \end{center}
2782: \caption{Generation of new trial configurations
2783: for simulated annealing
2784: by selecting two points randomly
2785: and exchanging the values zero and one of the binary function in between.
2786: This mechanism has been used to optimize the binary functions 
2787: $c(x)$ and $\theta(x)$.}
2788: \label{trial}
2789: \end{figure}
2790: 
2791: 
2792: \section{Conclusion}
2793: 
2794: A nonparametric Bayesian approach
2795: has been developed and applied 
2796: to the inverse problem of reconstructing potentials
2797: of quantum systems from observational data.
Relying on observational data only,
the problem is typically ill--defined.
2800: It is therefore essential
2801: to include adequate {\it a priori} information.
2802: Since reconstructed potentials
2803: obtained by Bayesian Inverse Quantum Mechanics (BIQM)
2804: depend sensitively on the implemented {\it a priori} information,
2805: flexible prior models are required
2806: which can be adapted to the specific situation under study.
2807: In particular, the use of hyperparameters, hyperfields,
2808: and non--Gaussian priors with auxiliary fields
2809: has been discussed in detail.
2810: In this paper we have focussed on
2811: the implementation of approximate periodicity
2812: for potentials in inverse problems of quantum statistics.
2813: The presented prior models, however, can be useful 
2814: for many empirical learning problems, 
2815: including for example regression or general density estimation.
2816: Several variants of implementing  {\it a priori} information
2817: on approximate periodicity
2818: have been tested and compared numerically.
2819: 
2820: 
2821: 
2822: %\subsection*{Acknowledgments}
2823: 
2824: \begin{thebibliography}{00}
2825: 
2826: \bibitem{Tikhonov-Arsenin-1977}
2827: A.N. Tikhonov, 
2828: V. Arsenin, 
2829: {\it Solution of Ill--posed Problems.}
2830: (New York: Wiley, 1977).
2831: 
2832: \bibitem{Kirsch-1996}
2833: A. Kirsch, 
2834: {\it An Introduction to the Mathematical Theory of Inverse Problems.}
2835: (New York: Springer Verlag, 1996).
2836: 
2837: \bibitem{Vapnik-1998}
2838: V.N. Vapnik, 
2839: {\it Statistical Learning Theory.}
2840: (New York: Wiley, 1998).
2841: 
2842: \bibitem{Honerkamp-1998}
2843: J. Honerkamp,
2844: {\it Statistical Physics.}
(New York: Springer Verlag, 1998).
2846: 
2847: \bibitem{Newton-1989}
2848: R.G. Newton, 
2849: {\it Inverse Schr\"odinger Scattering in Three Dimensions.}
2850: (New York: Springer Verlag, 1989).
2851: 
2852: \bibitem{Chadan-Sabatier-1989}
2853: K. Chadan,
2854: P.C. Sabatier, 
2855: {\it Inverse Problems in Quantum Scattering Theory.}
(Berlin: Springer Verlag, 1989).
2857: 
2858: 
2859: \bibitem{Chadan-Colton-Paivarinta-Rundell-1997}
2860: K. Chadan, 
2861: D. Colton,
2862: L. P\"aiv\"arinta,
2863: W. Rundell,
2864: {\it An Introduction to Inverse Scattering and Inverse Spectral Problems.}
2865: (Philadelphia: SIAM, 1997).
2866: 
2867: \bibitem{Gelfand-Levitan-1951}
2868: I.M. Gel'fand, 
2869: B.M. Levitan, 
2870: %On the determination of a differential equation from its spectral function.
{Amer.\ Math.\ Soc.\ Transl.} {\bf 1}, 253--302 (1951).
2872: 
2873: \bibitem{Kac-1966}
2874: M. Kac, 
2875: Can one hear the shape of a drum?
2876: {Am.\ Math.\ Mon.} {\bf 73}, 1--23 (1966).
2877: 
2878: \bibitem{Marchenko-1986}
2879: V.A. Marchenko, 
2880: {\it Sturm--Liouville Operators and Applications.}
2881: (Basel: Birk\-h\"auser, 1986).
2882: 
2883: \bibitem{Zakhariev-Chabanov-1997}
2884: B.N. Zakhariev, 
2885: V.M. Chabanov, 
2886: %New situation in quantum mechanics.
2887: {Inverse Problems} {\bf 13}, R47--R79 (1997).
2888: 
2889: \bibitem{Lemm-IQS-2000}
2890: J.C. Lemm, 
2891: J. Uhlig, 
2892: A. Weiguny,
2893: %Bayesian Approach to Inverse Quantum Statistics.
2894: %Technical Report, MS-TP1-99-6, M\"unster University,
2895: {Phys. Rev. Lett.} {\bf 84}, 2068 (2000). 
2896: 
2897: \bibitem{Lemm-BFT-1999}
2898: J.C. Lemm, 
2899: {\it Bayesian Field Theory.} 
2900: Technical Report No.~MS-TP1-99-1, Univ.\ of M\"unster, 
2901: {\tt arXiv:physics/9912005}, (1999).
2902: 
2903: \bibitem{Lemm-TDQ-2000}
2904: J.C. Lemm, 
2905: {\it Inverse Time--Dependent Quantum Mechanics.}
2906: Technical Report, MS-TP1-00-1, M\"unster University,\\
2907: {\tt arXiv:quant-ph/0002010}, (2000).
2908: 
2909: \bibitem{Lemm-IHF-2000}
2910: J.C. Lemm, 
2911: J. Uhlig,
{Phys. Rev. Lett.} {\bf 84}, 4517 (2000).
2913: %{\it Hartree-Fock Approximation for Inverse Many-Body Problems.}
2914: %Technical Report, MS-TP1-99-10, M\"unster University, 
2915: %{\tt arXiv:nucl-th/9908056}, (1999).
2916: 
2917: \bibitem{Helstrom:1976}
2918: C.W. Helstrom,
2919: {\it Quantum Detection and Estimation Theory.}
2920: (New York: Academic Press, 1976).
2921: 
2922: \bibitem{Holevo:1982}
2923: A.S. Holevo,
2924: {\it Probabilistic and Statistical Aspects of Quantum Theory.}
2925: (Amsterdam: North--Holland, 1982).
2926: 
2927: \bibitem{Tan:1997}
2928: M. Tan, 
2929: %An inverse problem approach to optical homodyne tomography.
{J. Mod. Opt.} {\bf 44}, 2233 (1997).
2931: 
2932: \bibitem{Buzek-Drobny-Derka-Adam-Wiedemann:1998}
V. Bu\v{z}ek, G. Drobn\'y, R. Derka, G. Adam, H. Wiedemann,
2934: {\tt arXiv:quant-ph/9805020}.
2935: %Bu\~zek, V., Drobn\'y, G., Derka, R., Adam, G., Wiedemann, H. (1998)
2936: %{\tt arXiv:quant-ph/9805020}.
2937: 
2938: \bibitem{Bayes-1763}
2939: T.R. Bayes, 
2940: %(1763)
2941: %An Essay Towards Solving a Problem in the Doctrine of Chances.
2942: {Phil. Trans. Roy. Soc. London} {\bf 53}, 370 (1763), 
2943: reprinted in {\it Biometrika} {\bf 45}, 293 (1958).
2944: 
2945: \bibitem{Berger-1980}
2946: J.O. Berger, 
2947: {\it Statistical Decision Theory and Bayesian Analysis.}
2948: (New York: Springer Verlag, 1980).
2949: 
2950: 
2951: \bibitem{Loredo-1990}
2952: T. Loredo,
2953: {\it From Laplace to Supernova SN 1987A: Bayesian Inference in Astrophysics.}
2954: In Foug\`ere, P.F. (ed.) 
2955: {\it Maximum-Entropy and Bayesian Methods, Dartmouth, 1989},  81--142.
2956: (Dordrecht: Kluwer, 1990),
2957: available at {\tt http://bayes.wustl.edu/gregory/gregory.html}.
2958: 
2959: \bibitem{Bernado-Smith-1994}
J.M. Bernardo,  
A.F.M. Smith, 
2962: {\it Bayesian Theory.}
2963: (New York: John Wiley, 1994).
2964: 
2965: 
2966: \bibitem{Gelman-Carlin-Stern-Rubin-1995}
2967: A. Gelman, 
2968: J.B. Carlin, 
2969: H.S. Stern, 
2970: D.B. Rubin,
2971: {\it Bayesian Data Analysis.}
2972: (New York: Chapman \& Hall, 1995).
2973: 
2974: \bibitem{Sivia-1996}
2975: D.S. Sivia, 
2976: {\it Data Analysis: A Bayesian Tutorial.}
2977: (Oxford: Oxford University Press, 1996).
2978: 
2979: \bibitem{Carlin-Louis-1996}
2980: B.P. Carlin, 
2981: T.A. Louis, 
2982: {\it Bayes and Empirical Bayes Methods for Data Analysis.}
2983: (Boca Raton: Chapman \& Hall/CRC, 1996).
2984: 
2985: 
2986: \bibitem{Metropolis-Rosenbluth-Rosenbluth-Teller-Teller-1953}
2987: N. Metropolis, 
2988: A.W. Rosenbluth, 
2989: M.N. Rosenbluth, 
2990: A.H. Teller, 
2991: E. Teller,
2992: %Equation of state calculations by fast computing machines.
{Journal of Chemical Physics} {\bf 21}, 1087--1092 (1953).
2994: 
2995: \bibitem{Binder-Heermann-1988}
2996: K. Binder, 
2997: D.W. Heermann, 
2998: {\it Monte Carlo simulation in statistical physics: an introduction.}
2999: (Berlin: Springer Verlag, 1988).
3000: 
3001: \bibitem{Neal-1997}
3002: R.M. Neal, 
3003: {\it Monte Carlo Implementation of Gaussian Process Models
3004: for Bayesian Regression and Classification.}
3005: Technical Report No. 9702, Dept.\ of Statistics,
3006: Univ.\ of Toronto, Canada (1997).
3007: 
3008: 
3009: \bibitem{De-Bruijn-1981}
3010: N.G. De Bruijn, 
3011: {\it Asymptotic Methods in Analysis.}
3012: (New York: Dover, 1981),
3013: originally published in 1958
3014: by the North--Holland Publishing Co., Amsterdam.
3015: 
3016: \bibitem{Bleistein-Handelsman-1986}
N. Bleistein, R.A. Handelsman,  
{\it Asymptotic Expansions of Integrals.}
(New York: Dover, 1986),
3020: originally published in 1975
3021: by Holt, Rinehart and Winston, New York.
3022: 
3023: \bibitem{Girosi-Jones-Poggio-1995}
3024: F. Girosi,
3025: M. Jones, 
3026: T. Poggio,
3027: %Regularization Theory and Neural Networks Architectures.
3028: {Neural Computation} {\bf 7} (2), 219--269 (1995).
3029: 
3030: \bibitem{Lemm-1996}
3031: J.C. Lemm, 
3032: {\it Prior Information and Generalized Questions.}  
3033: A.I.Memo No. 1598, C.B.C.L. Paper No. 141, 
3034: Massachusetts Institute of Technology, (1996),
3035: available at {\tt http://pauli.uni-muenster.de/${}^\sim$lemm}.
3036: 
3037: \bibitem{Lemm-1998}
3038: J.C. Lemm, 
3039: {\it How to Implement A Priori Information:
3040: A Statistical Mechanics Approach.}
3041: Technical Report MS-TP1-98-12, M\"unster University,
3042: {\tt arXiv:cond-mat/9808039} (1998).
3043: 
3044: \bibitem{Bishop-1995b}
3045: C.M. Bishop, 
3046: {\it Neural Networks for Pattern Recognition.}
3047: (Oxford: Oxford University Press, 1995).
3048: 
3049: \bibitem{lemm-mixture-1999}
3050: J.C. Lemm, 
3051: {\it Mixtures of Gaussian Process Priors.}
In Proceedings of ICANN 99,
IEEE Conference Publication,
Vol.\ 1, pp.\ 292--297
3055: (London, IEEE, 1999).
3056: 
3057: 
3058: \bibitem{Holland-1975}
3059: J.H. Holland, 
{\it Adaptation in Natural and Artificial Systems.}
3061: (University of Michigan Press, 1975),
3062: 2nd ed. MIT Press, 1992.
3063: 
3064: \bibitem{Goldberg-1989}
3065: D.E. Goldberg, 
3066: {\it Genetic Algorithms in Search, Optimization, and Machine Learning.}
3067: (Redwood City, CA: Addison--Wesley, 1989).
3068: 
3069: \bibitem{Michalewicz-1992}
3070: Z. Michalewicz,
3071: {\it Genetic Algorithms + Data Structures = Evolution Programs.}
3072: (Berlin: Springer Verlag, 1992).
3073: 
3074: 
3075: \bibitem{Schwefel-1995}
3076: H.--P. Schwefel,
3077: {\it Evolution and Optimum Seeking.}
3078: (New York: Wiley, 1995).
3079: 
3080: \bibitem{Mitchell-1996}
3081: M. Mitchell, 
3082: {\it An Introduction to Genetic Algorithms.}
3083: (Cambridge, MA: MIT Press, 1996).
3084: 
3085: \bibitem{Kirkpatrick-Gelatt-Vecchi-1983}
3086: S. Kirkpatrick, 
3087: C.D. Gelatt Jr., 
3088: M.P. Vecchi, 
3089: %Optimization by Simulated Annealing.
3090: {Science} {\bf 220}, 671--680 (1983).
3091: 
3092: \bibitem{Mezard-Parisi-Virasoro-1987}
M. M\'ezard, 
3094: G. Parisi,
3095: M.A. Virasoro,
3096: {\it Spin Glass Theory and Beyond.}
3097: (Singapore: World Scientific, 1987).
3098: 
3099: \bibitem{Aarts-Korts-1989}
E. Aarts, J. Korst,
3101: {\it Simulated Annealing and Boltzmann Machines.}
3102: (New York: Wiley, 1989).
3103: 
3104: \bibitem{Gelfand-Mitter-1991}
3105: S.B. Gelfand,
3106: S.K. Mitter, 
3107: %Simulated Annealing Type Algorithms for Multivariate Optimization. 
3108: %Volume 6, Number 3, 1991
{Algorithmica} {\bf 6} (3), 419--436 (1991). 
3110: %\bibitem{Gelfand-Mitter-1993}
3111: %S.B. Gelfand, S.K. Mitter, 
3112: %On Sampling Methods and Annealing Algorithms.
3113: %{Markov Random Fields -- Theory and Applications.}
3114: %(New York: Academic Press, 1993).
3115: 
3116: \bibitem{Yuille-Kosowski-1994}
3117: A.L. Yuille, 
3118: J.J. Kosowski, 
3119: %Statistical Physics Algorithm That Converge.  
3120: {Neural Computation} {\bf 6} (3), 341--356 (1994).
3121: 
3122: \bibitem{Geman-Geman-1984}
3123: S. Geman,  
3124: D. Geman, 
3125: Stochastic relaxation,
3126: Gibbs distributions and the Bayesian restoration of images.
3127: {IEEE Trans. on Pattern Analysis and Machine Intelligence}
3128: {\bf 6}, 721--741 (1984),
3129: reprinted in Shafer \& Pearl (eds.)  
3130: Readings in Uncertainty Reasoning. 
(San Mateo, CA: Morgan Kaufmann, 1990).
3132: 
3133: \bibitem{Poggio-Torre-Koch-1985}
3134: T. Poggio,
3135: V. Torre, 
3136: C. Koch,
3137: Computational vision and regularization theory.
{Nature} {\bf 317}, 314--319 (1985).
3139: 
3140: \bibitem{Marroquin-Mitter-Poggio-1987}
3141: J.L. Marroquin, 
3142: S. Mitter, 
3143: T. Poggio, 
3144: %Probabilistic solution of ill--posed problems in computational vision.
3145: {J. Am. Stat. Assoc.} {\bf 82}, 76--89 (1987).
3146: 
3147: \bibitem{Geiger-Girosi-1991}
3148: D. Geiger,
3149: F. Girosi,
3150: %Parallel and Deterministic Algortihms for MRFs: Surface Reconstruction.
3151: {IEEE Trans. on Pattern Analysis and Machine Intelligence}
3152: {\bf 13} (5), 401--412 (1991). 
3153: 
3154: \bibitem{Zhu-Yuille-1996}
3155: S.C. Zhu, 
3156: A.L. Yuille, 
3157: %Region Competition: Unifying Snakes, Region Growing, 
3158: %and Bayes/MDL for Multiband Image Segmentation.
3159: {IEEE Trans.\ on Pattern Analysis and Machine Intelligence}
3160: {\bf 18} (9), 884--900 (1996).
3161: 
3162: \bibitem{Roths-Maier-Friedrich-Marth-Honerkamp-2000} 
3163: T. Roths, 
3164: D. Maier, 
3165: Chr. Friedrich, 
3166: M. Marth, 
3167: J. Honerkamp,
3168: %Roths, T., Maier, D., Friedrich, C., Marth, M.,Honerkamp, J.(2000)  
3169: %Determination of the relaxation time spectrum from dynamic moduli 
3170: %using an edge preserving regularization method. 
{Rheol. Acta} {\bf 39} (2), 163--173 (2000).
3172: 
3173: \bibitem{Winkler-1995}
3174: G. Winkler, 
3175: {\it Image Analysis, Random Fields and Dynamic Monte Carlo Methods.}
3176: (Berlin: Springer Verlag, 1995).
3177: 
3178: 
3179: \bibitem{Zhu-Mumford-1997}
3180: S.C. Zhu, 
3181: D. Mumford, 
3182: %Prior Learning and Gibbs Reaction--Diffusion.
3183: {IEEE Trans.\ on Pattern Analysis and Machine Intelligence}
3184: {\bf 19} (11), 1236--1250 (1997).
3185: 
3186: \bibitem{Zhu-Wu-Mumford-1997}
3187: S.C. Zhu, 
3188: Y.N. Wu, 
3189: D. Mumford, 
3190: %Minimax Entropy principle and Its Application to Texture Modeling.
{Neural Computation} {\bf 9} (8), 1627--1660 (1997).
3192: 
3193: 
3194: \bibitem{Press-Teukolsky-Vetterling-Flannery-1992}
3195: W.H. Press, 
3196: S.A. Teukolsky, 
3197: W.T. Vetterling, 
3198: B.P. Flannery, 
3199: {\it Numerical Recipes in C.}
3200: (Cambridge: Cambridge University Press, 1992).
3201: 
3202: \bibitem{Zhu-Rabitz-1999}
3203: W. Zhu, 
3204: H. Rabitz, 
3205: %Potential surfaces from the inversion 
3206: %of time dependent probability density data.
3207: {J. Chem. Phys.} {\bf 111}, 472--480 (1999).
3208: 
3209: \end{thebibliography}
3210: %\clearpage
3211: 
3212: \begin{figure}
3213: \begin{center}
3214: \epsfig{file=figure2.eps, width= 67mm}
3215: %\epsfig{file=ps/FLDframe162k.eps, width= 67mm}
3216: \epsfig{file=figure3.eps, width= 67mm}
3217: %\epsfig{file=ps/OFVframe162k.eps, width= 67mm}
3218: \end{center}
3219: \caption{
3220: Gaussian prior with Laplacian inverse covariance,
3221: zero reference potential,
3222: and additional noisy energy measurement.
3223: Top: %L.h.s:
3224: Empirical density $p_{\rm emp}$ (bars),
3225: true likelihood $p_{\rm true}$ (thin), 
3226: %reference likelihood $p_0$ (dashed),     
reconstructed likelihood $p_{\rm BIQM}$ (thick).
3228: Bottom: %R.h.s.:
3229: Reconstructed potential $v_{\rm BIQM}$ (thick)
3230: and true potential $v_{\rm true}$ (thin).
3231: %and the reference potential $v_0$ (dashed) of Eq.~(\ref{per-ref-prior}). 
3232: (With 200 data points,
3233: $m$ = 0.25 for $\hbar$ = 1, $\beta$ = 4.
3234: Gaussian prior (\ref{gaussprior})
3235: with inverse covariance ${\bf K}_0$ = $-\lambda \Delta$,
3236: $\lambda$ = 0.2, 
3237: zero reference potential $v_0\equiv 0$,
3238: and an additional energy penalty term of the form (\ref{averageE-penal})
3239: with $\mu$ = 1000 
3240: and $\kappa$ = $-0.330$,
3241: equal to the true average energy $U(v_{\rm true})$.
3242: The solution has been obtained by iterating according to 
Eq.~(\ref{iteration}) with ${\bf A}$ = ${\bf K}_0$, starting 
3244: with initial guess $v^{(0)} \equiv 0$.
3245: The optimal step width $\eta$ has 
3246: been determined for each iteration by a line search algorithm.)
3247: }
3248: \label{p162}
3249: \end{figure}
3250: 
3251: 
3252: \begin{figure}
3253: \begin{center}
3254: \epsfig{file=figure4.eps, width= 67mm}
3255: %\epsfig{file=ps/FLDframe159k.eps, width= 67mm}
3256: \epsfig{file=figure5.eps, width= 67mm}
3257: %\epsfig{file=ps/OFVframe159k.eps, width= 67mm}
3258: %\epsfig{file=ps/FLDframe19k.eps, width= 67mm}
3259: %\epsfig{file=ps/OFVframe19k.eps, width= 67mm}
3260: \end{center}
3261: \caption{
3262: Gaussian prior with periodic reference potential
3263: without noisy energy measurement.
3264: Top: %L.h.s:
3265: Empirical density $p_{\rm emp}$ (bars),
3266: true likelihood $p_{\rm true}$ (thin), 
reconstructed likelihood $p_{\rm BIQM}$ (thick).
Bottom: %R.h.s.:
Reconstructed potential $v_{\rm BIQM}$ (thick),
3270: true potential $v_{\rm true}$ (thin),  
3271: and reference potential $v_0$ (dashed) 
3272: of Eq.~(\ref{per-ref-prior}). 
3273: (Number of data points
3274: %200 data $m$ = 0.25 for $\beta$ = 4,
3275: %${\bf K}_0$ = $-\lambda \Delta$,
3276: %$\lambda$ = 0.2, 
3277: and parameters $m$, $\beta$,
3278: ${\bf K}_0$,
3279: and $\lambda$
3280: as for Fig.~\ref{p162} 
3281: but with $\mu$ = 0.
3282: The solution has been obtained by 
3283: iterating according to (\ref{iteration})
as described for Fig.~\ref{p162}
3285: with initial guess
3286: $v^{(0)} = v_0$.)
3287: }
3288: \label{p19}
3289: \end{figure}
3290: 
3291: 
3292: \begin{figure}
3293: \begin{center}
3294: \epsfig{file=figure6.eps, width= 67mm}
3295: %\epsfig{file=ps/FLDframe160k.eps, width= 67mm}
3296: \epsfig{file=figure7.eps, width= 67mm}
3297: %\epsfig{file=ps/OFVframe160k.eps, width= 67mm}
3298: %\epsfig{file=ps/FLDframe22k.eps, width= 67mm}
3299: %\epsfig{file=ps/OFVframe22k.eps, width= 67mm}
3300: \end{center}
3301: \caption{
3302: Gaussian prior with periodic reference potential 
3303: and additional energy measurement,
3304: improving the approximation of the minima.
3305: (Reference potential $v_0$ given in (\ref{per-ref-prior}),
3306: energy penalty term as in (\ref{averageE-penal})
3307: with $\mu$ = 1000
3308: and 
3309: $\kappa$ = $-0.330$.
3310: %equal to the true average energy $U(v_{\rm true})$.
3311: All other parameters as for Fig.~\ref{p19}.
3312: Iterated with
3313: the solution shown in Fig.~\ref{p19} 
3314: as initial guess $v^{(0)}$.)
3315: }
3316: \label{p22}
3317: \end{figure}
3318: 
3319: 
3320: \begin{figure}
3321: \begin{center}
3322: \epsfig{file=figure8.eps, width= 67mm}
3323: %\epsfig{file=ps/FLDframe155k.eps, width= 67mm}
3324: \epsfig{file=figure9.eps, width= 67mm}
3325: %\epsfig{file=ps/OFVframe155k.eps, width= 67mm}
3326: \end{center}
3327: \caption{
3328: Gaussian prior with periodic reference potential 
3329: and additional energy measurement,
3330: with initial guess $v^{(0)}$ different from that of Fig.~\ref{p22}.
3331: (Reference potential $v_0$ given in (\ref{per-ref-prior}),
energy penalty term as in (\ref{averageE-penal}).
3333: All  parameters as for Fig.~\ref{p22}.
3334: Iterated with
3335: initial guess $v^{(0)}(x)$ = $v_0(x)$
3336: for $0<x\le12,\,  25\le x$ and 
3337: $v^{(0)}(x)$ = $0$ for $13\le x\le 24$.)
3338: }
3339: \label{p155}
3340: \end{figure}
3341: 
3342: 
3343: \begin{figure}
3344: \begin{center}
3345: \epsfig{file=figure10.eps, width= 67mm}
3346: %\epsfig{file=ps/FLDframe182k.eps, width= 67mm}
3347: \epsfig{file=figure11.eps, width= 67mm}
3348: %\epsfig{file=ps/OFVframe182k.eps, width= 67mm}
3349: %\epsfig{file=ps/FLDframe163k.eps, width= 67mm}
3350: %\epsfig{file=ps/OFVframe163k.eps, width= 67mm}
3351: %\epsfig{file=ps/FLDframe31k.eps, width= 67mm}
3352: %\epsfig{file=ps/OFVframe31k.eps, width= 67mm}
3353: \end{center}
3354: \caption{
3355: Approximate periodicity implemented by an inverse covariance
3356: ${\bf K}_0$ 
3357: =
3358: $- \lambda (\Delta+\gamma \Delta_\theta)$
3359: as in Eq.~(\ref{periodic-cov}).
3360: % 163: (With $\gamma$ = 4.0, $\lambda$ = 0.05, 
3361: (With $\gamma$ = 1.0, $\lambda$ = 0.2, 
3362: a fixed $\theta$ = 6,
3363: energy penalty term with $\mu$ = 1000,
3364: and zero reference potential $v_0\equiv 0$.
3365: Initial guess $v^{(0)}$ = $v_0\equiv 0$.
3366: All other parameters as for Fig.~\ref{p19}.
3367: % neue Werte:
3368: % $U(v)$ = $-0.348$
3369: % $\lambda$ = 0.7
3370: % $\gamma$ = 0.4/0.7
3371: %
3372: )
3373: }
3374: \label{p31}
3375: \end{figure}
3376: 
3377: 
3378: \begin{figure}
3379: \begin{center}
3380: \epsfig{file=figure12.eps, width= 67mm}
3381: %\epsfig{file=ps/FLDframe167k.eps, width= 67mm}
3382: \epsfig{file=figure13.eps, width= 67mm}
3383: %\epsfig{file=ps/OFVframe167k.eps, width= 67mm}
3384: % 168 like 167 except mu = 1000 starting with solution of 167
3385: %\epsfig{file=ps/FLDframe168k.eps, width= 67mm}
3386: %\epsfig{file=ps/OFVframe168k.eps, width= 67mm}
3387: %\epsfig{file=ps/FLDframe102k.eps, width= 67mm}
3388: %\epsfig{file=ps/OFVframe102k.eps, width= 67mm}
3389: \end{center}
3390: \caption{
3391: Local switching between periodic 
3392: and zero reference potential.
3393: The black bars on top indicate regions where $B(x)$ = 1,
3394: i.e., regions where impurities have been identified.
3395: (Prior of Eq.\ (\ref{eq1}) with
3396: %$\lambda_1$ = 5, $\lambda_2$ = 1, $\mu$ = 10, $\kappa$ = $-0.388$,
3397: $\lambda_1$ = 0.2, $\lambda_2$ = 0.2, 
3398: $\mu$ = 0,
3399: and reference potential as in (\ref{per-ref-prior}).
3400: The $v$--dependent function $B(x)$ was slowly changed 
3401: from a sigmoid to a step function
3402: during iteration, keeping  the threshold $\vartheta$ = $0.15$ fixed.
3403: %$\beta$ = 4, $m$ = 0.25, %$\hbar^2/2m$ = 2,
3404: All other parameters as
3405: in Fig.~\ref{p19}.
Initial guess $v^{(0)}$ as for Fig.~\ref{p155}.)
3407: }
3408: \label{p102}
3409: \end{figure}
3410: 
3411: 
3412: \begin{figure}
3413: \begin{center}
3414: \epsfig{file=figure14.eps, width= 67mm}
3415: %\epsfig{file=ps/FLDframe75k.eps, width= 67mm}
3416: \epsfig{file=figure15.eps, width= 67mm}
3417: %\epsfig{file=ps/OFVframe75k.eps, width= 67mm}
3418: \end{center}
3419: \caption{
3420: Local switching between two nonzero reference potentials.
3421: (Reference potentials $v_1$, $v_2$
3422: given in 
3423: (\ref{two-ref-potentialsA})
3424: and
3425: (\ref{two-ref-potentialsB}).
3426: Prior of Eq.\ (\ref{eq2}),
3427: with 
3428: $\lambda_1$ = $\lambda_2$ = 10, $\mu$ = 0.
3429: Step function for $B(x)$ with $\vartheta$ = 0. 
3430: An additional prior $p(B)$ on $B$ has been included 
3431: with $-\ln p(B)/10$
3432: counting the number of discontinuities of the function $B(x)$.
3433: Other parameters as
3434: in Fig.~\ref{p19}.)
3435: }
3436: \label{p75}
3437: \end{figure}
3438: 
3439: 
3440: \begin{figure}
3441: \begin{center}
3442: \epsfig{file=figure16.eps, width= 67mm}
3443: %\epsfig{file=ps/FLDframe122k.eps, width= 67mm}
3444: \epsfig{file=figure17.eps, width= 67mm}
3445: %\epsfig{file=ps/OFVframe122k.eps, width= 67mm}
3446: \end{center}
3447: \caption{
3448: Prior with local hyperfield.
3449: (Prior of Eq.\ (\ref{eq3}),
3450: with 
3451: $\lambda_1$ = 10, $\lambda_2$ = 1,
3452: $\vartheta$ = 0, $\mu$ = 0,
3453: including a hyperprior $p(\theta)$ with 
3454: %$-\ln p(\theta)/10$
3455: $E_B/10$
3456: counting the number of discontinuities of the hyperfield $\theta(x)$.
3457: Other parameters as
3458: in Fig.~\ref{p19}.)
3459: }
3460: \label{p120}
3461: \end{figure}
3462: 
3463: 
3464: 
3465: \end{document}
3466: 
3467: 
3468: 
3469: 
3470: 
3471: 
3472: 
3473: 
3474: