0005:quant-ph0005122/pp.tex

1:

2: %

3: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

4: %

5: %   file pp.tex

6: %

7: %   preprint MS-TP-00-4

8: %

9: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

10: %

11: \documentclass[epj]{svjour}

12: %\documentclass[12pt]{article}

13: \usepackage{epsfig}

14:

15: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

16: \newcommand{\be}{\begin{equation}}

17: \newcommand{\ee}{\end{equation}}

18: \newcommand{\bea}{\begin{eqnarray}}

19: \newcommand{\eea}{\end{eqnarray}}

20: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

21: \newcommand{\melo}[3]{<\!#1\,|\,#2\,|\,#3\!>}

22: \newcommand{\scpo}[2]{\mbox{$<\!#1\,|\,#2\!>$}}

23: \newcommand{\brao}[1]{\mbox{$<\!#1\,|$}}

24: \newcommand{\keto}[1]{\mbox{$|\,#1\!>$}}

25: \newcommand{\sepo}[2]{\mbox{$|\, #1\!><\! #2 \, |$}}

26: \newcommand{\avo}[1]{\mbox{$<\!#1\!>$}}

27: \newcommand{\mel}[3]{\langle #1\,|\,#2\,|\,#3\rangle}

28: \newcommand{\scp}[2]{\mbox{$\langle #1\,|\,#2\rangle$}}

29: \newcommand{\scpBig}[2]{\Big\langle #1\,\Big|\,#2\Big\rangle}

30: \newcommand{\av}[1]{\mbox{$\langle \, #1 \, \rangle$}}

31: \newcommand{\bra}[1]{\mbox{$\langle#1|$}}

32: \newcommand{\ket}[1]{\mbox{$|#1\rangle$}}

33: \newcommand{\sep}[2]{\mbox{$|\, #1\rangle\langle #2 \, |$}}

34: \newcommand{\acom}[2]{\mbox{$[#1,#2]_+$}}

35: \newcommand{\ecom}[2]{\mbox{$[#1,#2]_{\epsilon}$}}

36: \newcommand{\com}[2]{\mbox{$[#1,#2]_-$}}

37: \newcommand{\vmat}[4]{\left(\begin{array}{cc}#1&#2\\#3&#4\end{array}\right)}

38: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

39: % to remove headerbox

40: \renewcommand\makeheadbox{}

41: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

42:

43: \begin{document}

44:

45: %\headnote{}

46: \title{Bayesian Reconstruction of Approximately Periodic Potentials

47: at Finite Temperature}

48: \author{J. C.  Lemm\thanks{e-mail: {\tt lemm@uni-muenster.de}},

49:         J. Uhlig, A. Weiguny}

50: %\author{J. C.  Lemm\inst{1}, J. Uhlig\inst{1}, A. Weiguny\inst{1}}

51: \authorrunning{J. C. Lemm {\it et al.}}

52: \titlerunning{Bayesian Reconstruction of Approximately Periodic Potentials

53: at Finite Temperature}

54: \institute{

55: Institut f\"ur Theoretische Physik,\\

56: Universit\"at M\"unster, 48149 M\"unster, Germany}

57: %\author{J. C.  Lemm, J. Uhlig, A. Weiguny\\

58: %Institut f\"ur Theoretische Physik I,\\

59: %Universit\"at M\"unster, 48149 M\"unster, Germany}

60:

61:

62: %\begin{abstract}

63: \abstract{

64: The paper discusses the reconstruction of potentials

65: for quantum systems at finite temperatures

66: from observational data.

67: A nonparametric approach is developed,

68: based on the framework of Bayesian statistics,

69: to solve such inverse problems.

70: Besides the specific model of quantum statistics

71: giving the probability of observational data,

72: a Bayesian approach is essentially based

73: on {\it a priori} information available for the potential.

74: Different possibilities to implement

75: {\it a priori} information

76: are discussed in detail, including hyperparameters,

77: hyperfields, and non--Gaussian auxiliary fields.

78: Special emphasis is put on the reconstruction

79: of potentials with approximate periodicity.

80: The feasibility of the approach

81: is demonstrated for a numerical model.

82: \PACS{

83: {05.30.-d}{Quantum statistical mechanics}

84: \and

85: {02.50.Rj}{Nonparametric inference}

86: \and

87: {02.50.Wp}{Inference from stochastic processes}

88: }

89: }

90: %\end{abstract}

91:

92: \date{\today}

93: \maketitle

94:

95: \tableofcontents

96: %\clearpage

97:

98: \section{Introduction}

99:

100: A successful application of quantum mechanics to real world systems

101: relies essentially on an adequate reconstruction of the underlying potential,

102: describing the forces governing the system.

103: %necessary to set up the system Hamiltonian.

104: The reconstruction of potentials or forces

105: from available observational data

106: defines an empirical learning task.

107: It also constitutes a typical example of an inverse problem.

108: Such problems are notoriously ill--posed in the sense of Tikhonov

109: \cite{Tikhonov-Arsenin-1977,Kirsch-1996,Vapnik-1998,Honerkamp-1998}.

110: In that case additional {\it a priori} information

111: is required to yield a unique and stable solution.

112: A Bayesian framework

113: is especially well suited to include both observational data and

114: {\it a priori} information in a quite flexible manner.

115:

116: Inverse scattering theory

117: \cite{Newton-1989,Chadan-Sabatier-1989,Chadan-Colton-Paivarinta-Rundell-1997}

118: and inverse spectral theory

119: \cite{Gelfand-Levitan-1951,Kac-1966,Marchenko-1986,Zakhariev-Chabanov-1997}

120: are two classical research fields

121: which deal in particular with the reconstruction of potentials

122: from spectral data.

123: Both theories describe the kind of data

124: which are necessary, in addition to a given spectrum,

125: to determine a potential uniquely.

126: In inverse scattering theory these additional data

127: are for example phase shifts, obtained far away from the scatterer.

128: For the bound state problems studied in inverse spectral theory

129: these additional data may consist of a second spectrum

130: obtained for boundary conditions different from

131: those for the first spectrum.

132: The approach of Bayesian Inverse Quantum Mechanics (BIQM)

133: we will refer to in the following

134: is not exclusively designed for spectral data

135: but is able to work with quite arbitrary observational data

136: \cite{Lemm-IQS-2000}.

137: It can thus be easily adapted to a large variety of

138: different reconstruction scenarios

139: \cite{Lemm-BFT-1999,Lemm-TDQ-2000,Lemm-IHF-2000}.

140:

141: The basics of a Bayesian framework are summarized in Section \ref{bayesian}.

142: Setting up a Bayesian approach for a specific application area

143: requires the definition of two basic probabilistic models.

144: First, a {\it likelihood model} is needed

145: giving, for each possible potential,

146: the probability of the observational data.

147: The likelihood model of quantum statistics

148: is discussed in Section \ref{Likelihood-model}.

149: Second,

150: a {\it prior model} has to be chosen to implement available

151: {\it a priori} information.

152: Prior models

153: which are useful for inverse quantum statistics

154: are presented in Section \ref{Prior-models}.

155: Technically the most convenient prior models

156: are Gaussian processes, presented in Section \ref{Gaussian-processes}.

157: Section \ref{Covariances-and-approximate-symmetries}

158: shows how covariance and mean of a Gaussian process

159: can be related to {\it a priori} information

160: about approximate symmetries of the potentials to be reconstructed.

161: Section \ref{Approximate-periodicity}

162: concentrates on approximate periodicity,

163: Section \ref{discontinuities} on potentials with discontinuities.

164: Prior models are made more flexible by using {\it hyperparameters}

165: (Section \ref{hyperparameter}), or more general

166: {\it hyperfields},

167: being function hyperparameters (Section \ref{hyperfields}).

168: Related non--Gaussian priors

169: are the topic of Section \ref{Non--Gaussian-priors}.

170: Having defined likelihood and prior models,

171: Section \ref{stationarity-equations}

172: discusses the equations to be solved

173: for reconstructing a potential.

174: Finally,

175: Section \ref{numerical}

176: presents numerical applications.

177:

178:

179: \section{Bayesian approach}

180: \label{bayesian}

181:

182: Empirical learning is based on observational data $D$.

183: In particular, we will distinguish ``dependent'' variables $x$,

184: representing measurement results,

185: and ``independent'' variables $O$,

186: characterizing the kind of measurement performed.

187: In the context of inverse quantum mechanics

188: the latter denotes the {\it observables} which are measured.

189: Such observables may for example be

190: the position, the momentum, or the energy of a quantum particle.

191: Variables $x$ and $O$ are assumed to be measurable

192: and represent therefore {\it visible} variables.

193: Observational data will be assumed to consist of $n$ pairs

194: $D$ = $\{(x_i,O_i)|1\le i\le n\}$ = $(x_T,O_T)$,

195: where $x_T$ and $O_T$ denote the vectors

196: with components $x_i$ or $O_i$, respectively.

197: Such data will also be called {\it training data}.

198: In empirical learning one tries to extract a

199: ``general law'' from observations.

200: In this paper the quantum potential $V$

201: to be reconstructed will represent this ``general law''.

202: (Similarly, in the Bayesian reconstruction of quantum states

203: the object to be reconstructed is the density operator

204: of an unknown state

205: \cite{Helstrom:1976,Holevo:1982,Tan:1997,Buzek-Drobny-Derka-Adam-Wiedemann:1998}.)

206: Potentials, considered not to be directly observable,

207: represent in our context the

208: {\it hidden} or {\it latent} variables.

209: We will now use the Bayesian framework to relate

210: unobservable potentials to observational data.

211:

212:

213: The Bayesian approach is a general probabilistic framework

214: to deal with empirical learning problems

215: \cite{Bayes-1763,Berger-1980,Loredo-1990,Bernado-Smith-1994,Gelman-Carlin-Stern-Rubin-1995,Sivia-1996,Carlin-Louis-1996,Lemm-BFT-1999}.

216: Predicting results of future measurements

217: on the basis of given training data

218: is achieved by means of

219: the {\it predictive probability}

220: $p(x|O,D)$

221: (or predictive density for continuous $x$),

222: which is the probability

223: of finding the value $x$ when measuring observable $O$

224: under the condition that the training data $D$ are given.

225: To calculate the predictive

226: probability a probabilistic model

227: is needed

228: which describes the measurement process.

229: Such a model is specified by

230: giving  the probability  $p(x|O,V)$

231: of finding $x$ when measuring observable $O$

232: for each possible potential $V$.

233: As $p(x|O,V)$, considered as function of $V$ for fixed $x$ and $O$,

234: is known as the likelihood of $V$,

235: we will call this the {\it likelihood model}.

236: For inverse quantum problems

237: the likelihood model is given by the axioms of quantum mechanics

238: and will be discussed in Section \ref{Likelihood-model}.

239:

240: According to the rules of probability theory

241: the predictive probability can now be written as an integral

242: over the space of all possible potentials $V$,

243: \be

244: p(x|O,D)

245: = \int \!dV\, p(x|O,V)\, p(V|D)

246: .

247: \label{predictive}

248: \ee

249: We note that in Eq.~(\ref{predictive}) we have assumed

250: that the probability of $x$ is completely determined

251: by giving potential and observable

252: and does not depend on the training data, $p(x|O,V,D)$ = $p(x|O,V)$,

253: and

254: that the probability of the potential given the training data

255: does not depend on the observables selected in the future,

256: $p(V|O,D)$ = $p(V|D)$.

257: If the set of possible potentials is a space of functions,

258: the integral in (\ref{predictive}) is a functional integral.

259:

260: As the likelihood model is assumed to be given,

261: learning consists in the determination of $p(V|D)$,

262: known as the {\it posterior} for $V$.

263: To this end, we relate the

264: posterior for $V$ to the

265: likelihood of $V$ under the training data

266: by applying Bayes' theorem,

267: \be

268: p(V|D)

269: =

270: \frac{p(x_T|O_T,V)\,p(V)}{p(x_T|O_T)}

271: ,

272: \label{bayestheorem}

273: \ee

274: assuming $p(V|O_T)$ = $p(V)$,

275: analogous to Eq.~(\ref{predictive}).

276: In the numerator of Eq.~(\ref{bayestheorem})

277: appears, besides the likelihood,

278: the so--called prior $p(V)$.

279: This prior gives the probability of $V$

280: {\it before} training data have been collected.

281: Hence it has to comprise all

282: {\it a priori} information

283: available for the potential.

284: The need for a prior model,

285: complementing the likelihood model,

286: is characteristic of a Bayesian approach.

287: The denominator in Eq.~(\ref{bayestheorem})

288: plays the role of a normalization factor

289: and can be obtained from likelihood and prior

290: by integration over $V$ as

291: $p(x_T|O_T)$

292: = $\int \!dV\,p(x_T|O_T,V)\,p(V)$.

293:

294: From a Bayesian perspective learning appears as

295: updating the probability for $V$ caused by the arrival of new data $D$.

296: If more data become available

297: this process can be iterated,

298: the old posterior becoming the new prior

299: which is then updated yielding a new posterior.
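As an illustration of this iteration, consider additional data
$D^\prime$ = $(x_T^\prime,O_T^\prime)$ arriving after $D$.
Assuming, as in Eq.~(\ref{predictive}), that the probability of the new
measurement results is completely determined by potential and observables,
$p(x_T^\prime|O_T^\prime,V,D)$ = $p(x_T^\prime|O_T^\prime,V)$,
a repeated application of Eq.~(\ref{bayestheorem}) gives
\[
p(V|D,D^\prime)
\propto p(x_T^\prime|O_T^\prime,V)\, p(V|D)
\propto p(x_T^\prime|O_T^\prime,V)\, p(x_T|O_T,V)\, p(V)
.
\]
Under this assumption, processing the data sequentially
or in one single step leads to the same posterior.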

300:

301:

302: In practice, a major difficulty is the calculation of

303: the integral over all possible $V$

304: to get the predictive probability (\ref{predictive}).

305: Even if one resorts to a discrete approximation for $x$

306: the integral (\ref{predictive}) is

307: typically still very high dimensional.

308: The key point is thus to find a feasible

309: approximation for  that integral.

310: Two approaches are common in Bayesian statistics.

311: The first one is an evaluation of the integral

312: by Monte Carlo methods

313: \cite{Gelman-Carlin-Stern-Rubin-1995,Metropolis-Rosenbluth-Rosenbluth-Teller-Teller-1953,Binder-Heermann-1988,Neal-1997}.

314: The second one, which we will pursue in the following,

315: is the so--called {\it maximum a posteriori approximation} (MAP),

316: being a variant of the saddle point method

317: \cite{Berger-1980,Gelman-Carlin-Stern-Rubin-1995,De-Bruijn-1981,Bleistein-Handelsman-1986,Girosi-Jones-Poggio-1995,Lemm-1996,Lemm-1998}.

318: In MAP one assumes the posterior to be sufficiently peaked

319: around the potential $V^*$ which maximizes the posterior,

320: so that approximately

321: \be

322: p(x|O,D) \approx p(x|O,V^*)

323: ,

324: \ee

325: with

326: \be

327: V^*

328: = {\rm argmax}_{V\in{\cal V}} p(V|D)

329: = {\rm argmax}_{V\in{\cal V}} p(x_T|O_T,V) p(V)

330: .

331: \label{map-eq}

332: \ee

333: Maximizing the posterior with respect to $V\in{\cal V}$

334: means,  according to Eq.~(\ref{bayestheorem})

335: with the denominator independent of $V$,

336: maximizing the product of likelihood and prior.
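Because the logarithm is strictly monotonic,
the maximization in Eq.~(\ref{map-eq}) may equivalently be written
as a minimization of the negative log--posterior.
This standard reformulation is given here only as an illustration:
\[
V^*
= {\rm argmin}_{V\in{\cal V}}
\Big[ -\ln p(x_T|O_T,V) - \ln p(V) \Big]
.
\]
This form is often more convenient in numerical work,
because products of probabilities become sums.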

337:

338: The Bayesian framework discussed

339: so far can analogously be applied to a variety of different contexts,

340: including regression, density estimation and classification

341: problems \cite{Lemm-BFT-1999}.

342: The case of a Gaussian likelihood with fixed variance, for example,

343: is known as a regression problem,

344: while problems with general likelihoods

345: are known as density estimation.

346: %For BIQM we have to choose the specific likelihood model

347: %for quantum systems discussed in the next section.

348:

349:

350:

351: \section{Likelihood model of quantum statistics}

352: \label{Likelihood-model}

353:

354: The first step in applying the Bayesian framework

355: to inverse problems of quantum mechanics or quantum statistics

356: is the definition of the likelihood model \cite{Lemm-IQS-2000}.

357: This is easily obtained from the axioms of quantum mechanics.

358: Consider a system prepared in a state described by

359: a density operator $\rho$.

360: As our aim will be to reconstruct potentials $V$

361: from observational data,

362: we have to choose a $\rho$ which depends on the potential.

363: The probability to find value $x$,

364: when measuring an observable represented by the Hermitian operator $O$,

365: is given by

366: \be

367: p(x|O,V)

368: = {\rm Tr}

369: \Big(P_O(x) \, \rho(V) \Big)

370: ,

371: \label{qm-likelihood}

372: \ee

373: where

374: $P_O(x)$ = $\sum_\zeta \sep{x,\zeta}{x,\zeta}$

375: denotes the projector on the space of

376: (orthonormalized) eigenfunctions $\ket{x,\zeta}$

377: of $O$ with eigenvalue $x$ and

378: the variable $\zeta$ distinguishes

379: eigenfunctions with degenerate eigenvalues.

380:

381: In particular, for a canonical ensemble

382: at temperature $1/\beta$

383: (setting  Boltzmann's constant equal to 1)

384: the density operator reads

385: \be

386: \rho =

387: \frac{e^{-\beta H}}{{\rm Tr\,} e^{-\beta H}}

388: .

389: \label{canonical}

390: \ee

391: To be specific, we will study in the following Hamiltonians of the form

392: $H$ = $T + V$,

393: with kinetic energy

394: $T$ = $-(1/2m)\Delta$

395: (with Laplacian $\Delta$, mass $m$,

396: and setting $\hbar$ = $1$)

397: and a

398: local potential

399: \be

400: V(x,x^\prime) =

401: v(x) \delta (x-x^\prime )

402: ,

403: \ee

404: defined by the function $v(x)$.

405: Note that the formalism presented in the following

406: works with nonlocal potentials as well;

407: numerical calculations, however, would in that case be more

408: demanding.

409: For the likelihood models corresponding to

410: time--dependent quantum systems

411: and to many--body systems

412: in Hartree--Fock approximation

413: we refer to \cite{Lemm-TDQ-2000,Lemm-IHF-2000}.

414:

415:

416: In the following we will study observational data

417: consisting of $n$ position measurements $x_i$.

418: This corresponds to choosing the position operator

419: for the observables $O_i$ = $\hat x$

420: with

421: $\hat x \ket{x_i}$ = $x_i\ket{x_i}$.

422: Hence, for a canonical ensemble,

423: the likelihood (\ref{qm-likelihood})

424: becomes

425: for a single position measurement

426: \be

427: p(x_i|\hat x,v)

428: =\sum_\alpha p_\alpha |{\phi}_\alpha(x_i)|^2

429: =\av{|{\phi} (x_i)|^2}

430: \label{pos-likelihood}

431: \ee

432: with (non--degenerate) eigenfunctions ${\phi}_\alpha$ of $H$

433: and energies $E_\alpha$,

434: i.e.,

435: $H\ket{{\phi}_\alpha}$ = $E_\alpha \ket{{\phi}_\alpha}$.

436: Angular brackets $\av{\cdots}$

437: denote a thermal expectation

438: under the probabilities

439: $p_\alpha$ =

440: $\exp(-\beta E_\alpha)/Z$ with

441: $Z$ = $\sum_\alpha \exp (-\beta E_\alpha)$

442: according to Eq.~(\ref{canonical}).
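As a short check, sketched here for the non--degenerate case,
Eq.~(\ref{pos-likelihood}) follows from Eq.~(\ref{qm-likelihood})
by inserting the projector $P_{\hat x}(x_i)$ = $\sep{x_i}{x_i}$
and the spectral representation
$\rho$ = $\sum_\alpha p_\alpha \sep{\phi_\alpha}{\phi_\alpha}$
of the canonical density operator (\ref{canonical}):
\[
{\rm Tr}\Big(\sep{x_i}{x_i}\,\rho\Big)
= \sum_\alpha p_\alpha\, \scp{x_i}{\phi_\alpha}\scp{\phi_\alpha}{x_i}
= \sum_\alpha p_\alpha\, |\phi_\alpha(x_i)|^2
,
\]
with $\phi_\alpha(x_i)$ = $\scp{x_i}{\phi_\alpha}$.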

443: For independent data $D_i$ = $(x_i,O_i)$,

444: \be

445: p(x_T|O_T,v)

446: = \prod_{i=1}^n p(x_i|\hat x,v)

447: = \prod_{i=1}^n \av{|{\phi} (x_i)|^2}

448: .

449: \ee

450: A quantum mechanical measurement changes

451: the state of the system, i.e., it changes $\rho$.

452: Hence, to obtain independent data under constant $\rho$

453: requires the density operator

454: to be restored before each measurement.

455: For a canonical ensemble this means

456: waiting between two consecutive observations

457: until the system is thermalized again.

458:

459: Choosing a parametric family of potentials

460: $v(x;\xi)$

461: one could now

462: maximize the likelihood

463: with respect to the parameters $\xi$,

464: and choose as reconstructed potential

465: \be

466: v^*(x) = v(x;\xi^*)

467: \quad \mbox{with} \quad

468: \xi^*

469: = \mbox{\rm argmax}_{\xi} \, p(x_T|O_T,v(\xi) )

470: .

471: \label{ml-eq}

472: \ee

473: This is known as maximum likelihood approximation

474: and works well

475: if the number of data is large compared to the

476: flexibility of the selected parametric family of potentials.

477: This method does not, however, yield a unique optimal potential

478: if the flexibility is too large for the available number of observations.

479: (A possible measure of the ``flexibility'' of a parametric family

480: is given by the Vapnik-Chervonenkis dimension \cite{Vapnik-1998}

481: or variants thereof.)

482: In such cases, the inclusion of additional restrictions on $v$

483: in form of {\it a priori} information is essential.

484: This holds especially for nonparametric approaches,

485: where each number $v(x)$ is treated

486: as an individual degree of freedom.

487: Including {\it a priori} information

488: generalizes the

489: maximum likelihood approximation

490: of Eq.~(\ref{ml-eq})

491: to the MAP of Eq.~(\ref{map-eq}).

492:

493:

494:

495:

496:

497:

498:

499: \section{Prior models}

500: \label{Prior-models}

501:

502: \subsection{Gaussian processes}

503: \label{Gaussian-processes}

504:

505: A finite number of observational data cannot

506: completely determine a function $v(x)$.

507: Hence,  besides observational data,

508: additional {\it a priori} information

509: is necessary to reconstruct a potential in BIQM.

510: In nonparametric approaches

511: it is advantageous to formulate

512: {\it a priori} information

513: directly in terms of the function $v(x)$ itself.

514: A convenient choice for a prior is a Gaussian process,

515: \be

516: p(v)

517: =

518: \left(\det \frac{{\bf K}_0}{2\pi}\right)^\frac{1}{2}

519: e^{-\frac{1}{2} \mel{v-v_0}{{\bf K}_0}{v-v_0}}

520: ,

521: \label{gaussprior}

522: \ee

523: where

524: \be

525: \mel{v-v_0}{{\bf K}_0}{v-v_0}

526: =

527: \ee

528: \[

529: \int\! dx \,dx^\prime\, [v(x)-v_0(x)]{\bf K}_0(x,x^\prime)

530: [v(x^\prime)-v_0(x^\prime)].

531: \]

532: The function $v_0$ is the mean or regression function,

533: representing a reference potential or template for $v$.

534: The inverse covariance ${\bf K}_0$

535: is a real symmetric, positive (semi)definite operator

536: which acts on potentials rather than on wave functions

537: and defines

538: a distance measure on the space of potentials.

539: For technical convenience one may introduce explicitly

540: a factor $\lambda$ multiplying  ${\bf K}_0$

541: to balance the influence of the prior

542: against the likelihood term.

543: A Gaussian prior as in Eq.~(\ref{gaussprior})

544: is already a quite flexible tool

545: for implementing {\it a priori} knowledge.

546: A bias towards smooth

547: functions $v(x)$, for instance,

548: can be implemented by choosing the negative Laplacian as inverse

549: covariance

550: ${\bf K}_0$ = $-\Delta$.

551: Including higher derivatives in ${\bf K}_0$

552: would result in even smoother potentials,

553: in the sense that higher derivatives of $v(x)$ become continuous.

554: For example,

555: a common smoothness prior used for regression problems is

556: the Radial Basis Function prior

557: ${\bf K}_0$ = $\exp{(-{\sigma_{\rm RBF}^2}{\Delta}/2)}$

558: \cite{Girosi-Jones-Poggio-1995}.
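To illustrate how such a prior enters the reconstruction,
one may combine the Gaussian prior (\ref{gaussprior})
with the likelihood for independent position measurements
of Section \ref{Likelihood-model}.
Assuming that ${\bf K}_0$ and $v_0$ are fixed,
so that the determinant in Eq.~(\ref{gaussprior}) does not depend on $v$,
the MAP condition (\ref{map-eq}) then amounts to minimizing
\[
-\sum_{i=1}^n \ln \av{|{\phi} (x_i)|^2}
+\frac{1}{2} \mel{v-v_0}{{\bf K}_0}{v-v_0}
\]
with respect to $v$.
The first term favors potentials under which the observed positions are probable,
while the second term penalizes deviations from the reference potential $v_0$
in the distance measure defined by ${\bf K}_0$.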

559:

560:

561:

562: \subsection{Covariances and approximate symmetries}

563: \label{Covariances-and-approximate-symmetries}

564:

565: Prior information on potentials $v$ can often be related to

566: approximate invariance under specific transformations

567: \cite{Lemm-BFT-1999}.

568: Typical examples of such transformations are symmetry operations

569: like translations or rotations.

570: To be specific, assume that

571: a (not necessarily local) potential $V$

572: commutes approximately, but not exactly,

573: with some unitary operator $S$,

574: \be

575: V \approx S^\dagger V S = {\bf S} V

576: ,

577: \ee

578: which defines an operator ${\bf S}$

579: acting on $V$.

580: In particular, we may choose a prior

581: $p(V)\propto\exp \{-E_{S}(V)\}$ with

582: a {\it prior energy}

583: \be

584: E_S

585: = \frac{1}{2}\scp{V-{\bf S}V}{V-{\bf S}V}

586: = \frac{1}{2}\mel{V}{{\bf K}_{0}}{V}

587: .

588: \ee

589: This shows that the expectation

590: of an approximate symmetry of $V$ under $S$

591: can be implemented by choosing a Gaussian prior with

592: inverse covariance operator

593: \be

594: {\bf K}_0 =

595: ({\bf I}-{\bf S})^\dagger ({\bf I}-{\bf S})

596: ,

597: \ee

598: where ${\bf I}$ denotes the identity operator.

599: Symmetry operations $S(\theta)$,

600: with corresponding ${\bf S}(\theta)$,

601: may depend on a parameter (vector) $\theta$.

602: Approximate invariance under $S(\theta_i)$

603: for several $\theta_i$

604: can be implemented by using the sum

605: (or integral, for continuous variables)

606: \bea

607: E_S

608: &=& \frac{1}{2}

609: %\int \! d\theta\, \scp{V-{\bf S}(\theta)V}{V-{\bf S}(\theta)V}

610: \sum_i \scp{V-{\bf S}(\theta_i)V}{V-{\bf S}(\theta_i)V}

611: %= \frac{1}{2}\int \! d\theta\, \mel{V}{{\bf K}_{0}(\theta)}{V}

612: \nonumber\\

613: &=& \frac{1}{2}\sum_i \mel{V}{{\bf K}_{0}(\theta_i)}{V}

614: .

615: \eea

616: Alternatively, one may

617: require approximate symmetry for only one value of $\theta$,

618: not fixed {\it a priori}.

619: For example, one may expect an approximately periodic potential

620: with unknown periodicity length $\theta$

621: which also has to be determined from the data.

622: Such $\theta$ are known as {\it hyperparameters}

623: and will be discussed in Section \ref{hyperparameter}.

624:

625: Lie groups

626: are continuously parameterized transformations

627: \be

628: {\bf S}(\theta)

629: =e^{\sum_i\theta_i {\bf s}_i}

630: ,

631: \ee

632: where $\theta_i$ are the real parameters

633: and the ${\bf s}_i$ = $-{\bf s}_i^T$

634: (the superscript ${}^T$ denoting the transpose)

635: are antisymmetric operators

636: representing the generators

637: of the infinitesimal transformations

638: of the Lie group.

639: We can define a prior energy as

640: an error measure with respect to an

641: infinitesimal transformation,

642: \bea

643: E_S &=&

644: \frac{1}{2}

645: \sum_i

646: \scpBig{\frac{{V} - (1 + \theta_i {\bf s}_i) {V}}{\theta_i}}

647:     {{\frac{{V} - (1 + \theta_i {\bf s}_i){V}}{\theta_i}}}

648: \nonumber\\

649: &=&

650: \frac{1}{2}

651: \mel{{V}}{\sum_i {\bf s}_i^T {\bf s}_i}{{V}}

652: \label{Lie-error}

653: .

654: \eea

655: For instance, a Laplacian smoothness prior for a local potential $v(x)$

656: can be related to an approximate symmetry

657: under infinitesimal translations.

658: For the group of

659: $d$--dimensional translations which is generated

660: by the gradient operator $\nabla$

661: this can be verified by recalling the multidimensional Taylor formula

662: for expanding ${v}$ around $x$

663: \be

664: {\bf S}(\theta) {v}(x)

665: = e^{ \sum_i \theta_i \nabla_i } {v}(x)

666: = \sum_{k=0}^\infty

667: \frac{\left(\sum_i \theta_i \nabla_i\right)^{k}}{k!} {v}(x)

668: = {v}(x+\theta).

669: \ee

670: Up to first order

671: ${\bf S} \approx  1+\sum_i\theta_i \nabla_i$.

672: Hence, for infinitesimal translations,

673: the error measure of Eq.\ (\ref{Lie-error}) becomes

674: \bea

675: E_S

676: &=&

677: \frac{1}{2}\sum_i

678: \scpBig{\frac{{v} -(1 + \theta_i {\nabla}_i) {v}}{\theta_i}}

679:        {{\frac{{v} - (1 + \theta_i {\nabla}_i){v}}{\theta_i}}}

680: \nonumber\\

681: &=&

682: -\frac{1}{2}\mel{{v}}{\Delta}{{v}}

683: ,

684: \eea

685: assuming vanishing boundary terms.

686: This is the classical Laplacian smoothness term.
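The last step can be made explicit for a real function $v$.
Since
$[{v} -(1 + \theta_i {\nabla}_i) {v}]/\theta_i$ = $-\nabla_i v$,
one finds
\[
E_S
= \frac{1}{2}\sum_i \int \!dx\, \Big(\nabla_i v(x)\Big)^2
= -\frac{1}{2}\int \!dx\; v(x)\, \Delta v(x)
,
\]
where the second equality follows from integration by parts
with $\Delta$ = $\sum_i \nabla_i^2$,
the boundary terms being assumed to vanish.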

687:

688:

689:

690: \subsection{Approximate periodicity}

691: \label{Approximate-periodicity}

692:

693: In this paper we will in particular be interested

694: in potentials which are approximately periodic.

695: To measure the deviation from exact periodicity

696: for a local potential $v(x)$

697: let us define the difference operators

698: \bea

699: \left(\nabla^{R}_\theta v\right)(x)

700: &=&

701: {v}(x+\theta)-{v}(x),

702: \\

703: \left(\nabla^{L}_\theta v\right)(x)

704: &=&

705: {v}(x)-{v}(x-\theta).

706: \eea

707: For periodic boundary conditions

708: $(\nabla^{L}_\theta)^T$

709: =

710: $-\nabla^{R}_\theta$,

711: where $(\nabla^{L}_\theta)^T$

712: denotes the transpose of $\nabla^{L}_\theta$.

713: Hence, the operator

714: \be

715: -\Delta_\theta

716: = -\nabla^L_\theta\nabla^R_\theta

717: = (\nabla^R_\theta)^T\nabla^R_\theta

718: \ee

719: defined in analogy to the negative Laplacian,

720: is positive (semi)\-definite,

721: and a possible prior energy

722: is an error term

723: which measures the deviation from exact periodicity

724: for given period $\theta$,

725: \bea

726: E_S

727: &=&\frac{1}{2}\int \!dx\; |{v}(x)-{v}(x+\theta)|^2

728: \nonumber\\

729: &=&

730:  \frac{1}{2}

731: \scp{\nabla^R_\theta{v}}{\nabla^R_\theta {v}}

732: \nonumber\\

733: &=&

734: -\frac{1}{2}

735: \mel{{v}}{\Delta_\theta}{{v}}

736: .

737: \label{periodic-error}

738: \eea

739: Discretizing $v$

740: the operator $\nabla^R_\theta$

741: for periodic boundary conditions

742: becomes,

743: for example on a mesh with six points and $\theta$ = $2$,

744: the matrix

745: \be

746: \nabla^R_\theta  =

747: \left(

748: \begin{tabular}{    c     c     c     c     c    c }

749:                    $-1$&  0  & $1$ &  0  &  0  & 0   \\

750:                     0  & $-1$&  0  & $1$ &  0  & 0   \\

751:                     0  &  0  & $-1$&  0  & $1$ & 0   \\

752:                     0  &  0  &  0  & $-1$&  0  & $1$ \\

753:                    $1$ &  0  &  0  &  0  & $-1$  & 0 \\

754:                     0  & $1$ &  0  &  0  &  0  & $-1$\\

755: \end{tabular}

756: \right)

757: ,

758: \ee

759: so that

760: \be

761: -\Delta_\theta  =

762: \left(

763: \begin{tabular}{    c     c     c     c     c    c }

764:                     2  &  0  & $-1$&  0  & $-1$& 0     \\

765:                     0  &  2  &  0  & $-1$&  0  & $-1$  \\

766:                    $-1$&  0  &  2  &  0  & $-1$& 0     \\

767:                     0  & $-1$&  0  &  2  &  0  & $-1$  \\

768:                    $-1$&  0  & $-1$&  0  &  2  & 0   \\

769:                     0  & $-1$&  0  & $-1$&  0  & 2  \\

770: \end{tabular}

771: \right)

772: .

773: \ee
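As a quick consistency check of these two matrices,
written out here for the six--point example only
and assuming unit mesh spacing,
the discretized prior energy (\ref{periodic-error}) reads
\[
E_S
= \frac{1}{2}\sum_{j=1}^{6} (v_{j+2}-v_j)^2
= \sum_{j=1}^{6} v_j^2 - \sum_{j=1}^{6} v_j v_{j+2}
,
\]
with indices taken modulo 6.
The diagonal entries $2$ and the entries $-1$
coupling $v_j$ with $v_{j\pm 2}$
reproduce the matrix $-\Delta_\theta$ given above.
A vector with period two, $v$ = $(a,b,a,b,a,b)$,
gives $E_S$ = $0$.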

774:

775:

776: As every periodic function with ${v}(x)={v}(x+\theta)$

777: is in the null space of $\Delta_\theta$,

778: typically another error term has to be added

779: to get a unique maximum of the posterior.

780: For example, combining

781: a prior energy (\ref{periodic-error})

782: with a Laplacian smoothness term yields

783: a Gaussian prior of the form (\ref{gaussprior})

784: with inverse covariance

785: ${\bf K}_0$ = $-\lambda (\Delta+\gamma \Delta_\theta)$

786: and prior energy

787: \be

788: E_S =

789: -\frac{\lambda}{2}

790: \mel{{v}}{\Delta+\gamma \Delta_\theta}{{v}}

791: \label{periodic-cov}

792: ,

793: \ee

794: with weighting factors $\lambda$, $\gamma$.

795: In case the period $\theta$ is not known, it can be treated

796: as a hyperparameter, as will be discussed in Section \ref{hyperparameter}.

797: Clearly, a nonzero reference potential $v_0$ can be included

798: in Eq.~(\ref{periodic-cov}).

799: In Eq.~(\ref{periodic-error}),

800: one may also sum over several periods

801: \be

802: E_S

803: = \frac{1}{2} \sum_{k=1}^{k_{\rm max}}

804: w(k) \int \!dx\; |{v}(x)-{v}(x+k \theta)|^2

805: ,

806: \label{periodic-error2}

807: \ee

808: where $w(k)$ is a weighting function, decreasing for larger $k$.

809: Prior energies as in (\ref{periodic-error2})

810: enforce approximate periodicity

811: over longer distances than

812: a prior energy of the form (\ref{periodic-error}).

813: The latter, on the other hand,

814: is more robust than (\ref{periodic-error2})

815: with respect to local deviations from periodicity,

816: such as a locally varying frequency.

817:

818:

819: Instead of choosing an

820: inverse covariance ${\bf K}_0$

821: with symmetric functions in its null space,

822: approximate symmetries can be implemented by using

823: explicitly a symmetric reference function

824: $v_0$ = ${\bf S} v_0$ for the Gaussian prior (\ref{gaussprior}).

825: For approximate periodicity,

826: this would mean to choose

827: a periodic reference potential

828: $v_0(x)$ = $v_0(x+\theta)$ in the prior energy

829: $E_S = \frac{1}{2} \mel{{v} -v_0}{{\bf K}_0}{{v}-v_0}$

830: where ${\bf K}_0$ could be for example

831: the identity or a differential operator.

832: Thus a periodic reference potential

833: favors a specific form for the reconstructed potential,

834: including a specific frequency and phase.

835: This is different for the covariance implementation

836: (\ref{periodic-error})

837: of approximate periodicity

838: where only the frequency is relevant

839: and reference potentials can still be chosen arbitrarily.

840: They may, for example, be nonperiodic functions or functions

841: with even higher symmetry

842: like in Eq.~(\ref{periodic-cov})

843: where $v_0\equiv 0$ is invariant under all translations.

844: Flexible reference potentials will be studied in Section \ref{hyperparameter}.

845:

846:

847: \subsection{Potentials with discontinuities}

848: \label{discontinuities}

849:

850: Piecewise smooth potentials $v(x)$ with discontinuities can either be approximated

851: by using discontinuous templates $v_0(x;\theta)$

852: or by eliminating matrix elements of the inverse covariance

853: which connect the two sides of the discontinuity.

854: For example, consider the discrete version

855: of a negative Laplacian

856: with unit lattice spacing and periodic boundary conditions,

857: \be

858: {\bf K}_0 = -\Delta =

859: \left(

860: \begin{tabular}{    c     c     c     c     c    c }

861:                     2  & $-1$&  0  &  0  &  0  &$-1$\\

862:                    $-1$&  2  & $-1$&  0  &  0  & 0  \\

863:                     0  & $-1$&  2  & $-1$&  0  & 0  \\

864:                     0  &  0  & $-1$&  2  & $-1$&  0  \\

865:                     0  &  0  &  0  & $-1$&  2  & $-1$ \\

866:                    $-1$&  0  &  0  &  0  & $-1$&  2  \\

867: \end{tabular}

868: \right).

869: \label{discrete1}

870: \ee

871: Decomposing the matrix (\ref{discrete1})

872: into square roots we write ${\bf K}_0$ = ${\bf W}^T {\bf W}$

873: (see also Section \ref{hyperfields})

874: where a possible square root is

875: \be

876: {\bf W} = \nabla_1^R =

877: \left(

878: \begin{tabular}{    c     c     c     c     c    c }

879:                   $-1$ & $1$ &  0  &  0  &  0  & 0   \\

880:                     0  & $-1$& $1$ &  0  &  0  & 0   \\

881:                     0  &  0  & $-1$& $1$ &  0  &  0  \\

882:                     0  &  0  &  0  & $-1$& $1$ &  0  \\

883:                     0  &  0  &  0  &  0  & $-1$& $1$ \\

884:                    $1$ &  0  &  0  &  0  &  0  & $-1$\\

885: \end{tabular}

886: \right)

887: \label{discrete2}

888: .

889: \ee

890: Similarly, the derivative operator $\partial/\partial x$

891: represents a square root of the negative Laplacian

892: for periodic boundary conditions.

893: Two regions can now be disconnected by deleting all rows of ${\bf W}$

894: which have matrix elements in both regions.

895: For instance, the first three points in the six--dimensional space

896: of Eq.~(\ref{discrete2})

897: can be disconnected from the last three points

898: by setting

899: ${\bf W}(3,\cdot )$ and ${\bf W}(6,\cdot )$ to zero,

900: \be

901: \tilde {\bf W} =

902: \left(

903: \begin{tabular}{    c     c     c  |  c     c    c }

904:                    $-1$& $1$ &  0  &  0  &  0  & 0  \\

905:                     0  & $-1$& $1$ &  0  &  0  & 0  \\

906:                     0  &  0  &  0  &  0  &  0  & 0  \\

907: \hline

908:                     0  &  0  &  0  & $-1$& $1$ &  0  \\

909:                     0  &  0  &  0  &  0  & $-1$& $1$ \\

910:                     0  &  0  &  0  &  0  &  0  &  0  \\

911: \end{tabular}

912: \right)

913: \label{discrete3}

914: .

915: \ee

916: Squaring of $\tilde {\bf W}$ yields a positive semidefinite operator

917: \be

918: \tilde {\bf K}_0 = {\tilde {\bf W}}^T \tilde {\bf W} =

919: \left(

920: \begin{tabular}{    c     c     c  |  c     c    c }

921:                     1  & $-1$&  0  &  0  &  0  & 0  \\

922:                   $-1$ &  2  & $-1$&  0  &  0  & 0  \\

923:                     0  & $-1$&  1  &  0  &  0  & 0  \\

924: \hline

925:                     0  &  0  &  0  &  1  & $-1$&  0  \\

926:                     0  &  0  &  0  & $-1$&  2  & $-1$ \\

927:                     0  &  0  &  0  &  0  & $-1$&  1  \\

928: \end{tabular}

929: \right)

930: \label{discrete4}

931: \ee

932: resulting in a smoothness prior which is ineffective

933: between points from different regions.
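To make the effect of the deleted rows explicit, one may write out the quadratic forms
belonging to (\ref{discrete1}) and (\ref{discrete4}).
In this small worked check $v_i$, $i=1,\dots,6$, denotes the components of $v$ on the lattice
(an index notation introduced here only for illustration):
\bea
\mel{v}{{\bf K}_0}{v}
&=& \sum_{i=1}^{6} (v_{i+1}-v_i)^2 , \quad v_7\equiv v_1,
\nonumber\\
\mel{v}{\tilde {\bf K}_0}{v}
&=& (v_2-v_1)^2+(v_3-v_2)^2
\nonumber\\
&& +\,(v_5-v_4)^2+(v_6-v_5)^2
.
\eea
The terms $(v_4-v_3)^2$ and $(v_1-v_6)^2$, which couple the two regions, are absent,
and $\tilde {\bf K}_0$ has the two zero modes
$(1,1,1,0,0,0)^T$ and $(0,0,0,1,1,1)^T$,
i.e., functions which are constant within each region.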

934: In contrast to using discontinuous templates,

935: the height of the jump at the discontinuity

936: need not be given in advance

937: when working with

938: disconnected Laplacians (or other disconnected inverse covariances).

939: On the other hand,

940: training data are then required for all separated regions

941: to determine the free constants

942: which correspond to the zero modes of the local Laplacians.

943: The reconstruction of discontinuous functions

944: with non--Gaussian priors will be discussed in

945: Section \ref{Non--Gaussian-priors}.

946:

947:

948:

949: \subsection{Hyperparameters}

950: \label{hyperparameter}

951:

952: Parameters of the prior are known as {\it hyperparameters}

953: \cite{Lemm-BFT-1999,Carlin-Louis-1996,Bishop-1995b}.

954: Like potentials $v$, hyperparameters $\theta$

955: are not directly observable and

956: represent hidden variables.

957: In the presence of hyperparameters

958: a prior for $v$ can be decomposed as follows

959: \be

960: p(v)

961: =

962: \int \!d\theta\, p(v|\theta) \, p(\theta)

963: ,

964: \label{theta-integral}

965: \ee

966: where $p(\theta)$ is known as {\it hyperprior}.

967: The likelihood does not depend on $\theta$;

968: the predictive probability (\ref{predictive}),

969: however,

970: then contains an integral over $\theta$,

971: \be

972: p(x|O,D) =

973: \label{hyper-predictive}

974: \ee

975: \[\frac{1}{p(x_T|O_T)}\int \!dv\,d\theta\,  p(x|O,v)\,

976: p(x_T|O_T,v)\, p(v|\theta) \, p(\theta)

977: .

978: \]

979: Like the integral over $v$,

980: the integral over $\theta$

981: can be calculated either by Monte Carlo methods

982: or in MAP.

983: We remark that, when a $\theta$--dependent prior

984: is written in terms of a corresponding prior energy

985: $p(v|\theta)\propto e^{-E(v|\theta)}$,

986: the normalization $\int\!dv\, e^{-E(v|\theta)}$

987: is independent of $v$ but does in general depend on $\theta$.
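For instance, consider a Gaussian prior energy
$E(v|\theta)$ = $\frac{1}{2}\mel{v-v_0(\theta)}{{\bf K}_0(\theta)}{v-v_0(\theta)}$.
The following is a sketch which assumes a discretization with $d$ lattice points
and a positive definite ${\bf K}_0(\theta)$;
the symbols $d$ and $Z(\theta)$ are introduced here for illustration.
The normalization is then
\be
Z(\theta) = \int\!dv\, e^{-E(v|\theta)}
= (2\pi)^{\frac{d}{2}} \left(\det {\bf K}_0(\theta)\right)^{-\frac{1}{2}}
,
\ee
so that
$-\ln p(v|\theta)$ = $E(v|\theta) -\frac{1}{2}\ln\det {\bf K}_0(\theta) + \frac{d}{2}\ln 2\pi$.
In this case the normalization depends on $\theta$ through the inverse covariance
but not through the template $v_0(\theta)$.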

988:

989: Hyperparameters $\theta$ can be

990: single numbers or vectors.

991: They can describe continuous transformations,

992: like translation, rotation or scaling of template functions

993: and scaling of inverse covariance operators.

994: For real $\theta$ and differentiable posterior,

995: stationarity conditions can be found by differentiating

996: the posterior with respect to $\theta$.
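As a sketch of such a stationarity condition,
assume a single real $\theta$ which enters only through the template $v_0(\theta)$
of a Gaussian prior with $\theta$--independent, real symmetric ${\bf K}_0$,
so that the normalization does not depend on $\theta$.
Maximizing $p(v|\theta)\,p(\theta)$ with respect to $\theta$ at fixed $v$ then requires
\bea
0 &=& \frac{\partial}{\partial\theta}
\Big[ \frac{1}{2}\mel{v-v_0(\theta)}{{\bf K}_0}{v-v_0(\theta)} - \ln p(\theta) \Big]
\nonumber\\
&=&
-\scpBig{\frac{\partial v_0}{\partial \theta}}{{\bf K}_0\, [v-v_0(\theta)]}
-\frac{\partial \ln p(\theta)}{\partial\theta}
.
\eea
The likelihood does not contribute because it does not depend on $\theta$.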

997:

998:

999: Instead of continuous transformations

1000: of templates or inverse covariances

1001: one can consider

1002: a finite collection of

1003: alternative reference potentials $v_i$

1004: or alternative inverse covariances ${\bf K}_i$.

1005: For example, a potential to be reconstructed

1006: may be expected to be similar to one reference potential

1007: out of a small number of possible alternatives $v_i$.

1008: The ``class'' variables $i$ are then

1009: nothing else but hyperparameters

1010: $\theta$ with integer values.

1011:

1012: Binary parameters allow one to select, from two reference functions

1013: or two inverse covariances,

1014: the one which fits the data best.

1015: Indeed, writing

1016: \bea

1017: v_0(\theta) &=& (1-\theta) v_1 + \theta v_2,

1018: \label{integer-hyper-t}\\

1019: {\bf K}_0(\theta) &=& (1-\theta) {\bf K}_1 + \theta {\bf K}_2

1020: \label{integer-hyper-K}

1021: ,

1022: \eea

1023: a binary $\theta\in \{0,1\}$ implements

1024: hard switching between alternative templates or inverse covariances,

1025: corresponding to a conditional prior

1026: \be

1027: p(v|\theta) \propto e^{-(1-\theta)E_1(v)-\theta E_2(v)}

1028: \label{mix-prior-bin}

1029: \ee

1030: with

1031: \bea

1032: E_1(v)  &=& \frac{1}{2}\mel{v-v_1}{{\bf K}_1}{v-v_1}

1033: ,

1034: \\

1035: E_2(v) &=& \frac{1}{2}\mel{v-v_2}{{\bf K}_2}{v-v_2}

1036: .

1037: \eea

1038: Similarly, a real $\theta\in [0,1]$

1039: in (\ref{integer-hyper-t}) or

1040: (\ref{integer-hyper-K})

1041: yields soft mixing.

1042: In that case, however,

1043: the mixing of templates in (\ref{integer-hyper-t})

1044: is not equivalent to

1045: a mixing of prior energies

1046: as in (\ref{mix-prior-bin})

1047: because for real $\theta$

1048: Eqs.~(\ref{integer-hyper-t})

1049: and (\ref{integer-hyper-K})

1050: lead to mixed terms,

1051: like

1052: $(1-\theta)\theta \mel{v-v_1}{{\bf K}_0}{v-v_2}/2$

1053: for ${\bf K}_1$ = ${\bf K}_2$.
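Explicitly, as a short check assuming ${\bf K}_1$ = ${\bf K}_2$ = ${\bf K}_0$ real symmetric,
inserting $v-v_0(\theta)$ = $(1-\theta)(v-v_1)+\theta (v-v_2)$
into the Gaussian prior energy gives
\bea
\frac{1}{2}\mel{v-v_0(\theta)}{{\bf K}_0}{v-v_0(\theta)}
&=& (1-\theta)^2 E_1(v) + \theta^2 E_2(v)
\nonumber\\
&& +\, (1-\theta)\,\theta\, \mel{v-v_1}{{\bf K}_0}{v-v_2}
,
\eea
which reduces to the exponent of (\ref{mix-prior-bin})
for $\theta\in \{0,1\}$, where $\theta (1-\theta)$ = $0$.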

1054: When $\theta$ takes integer values the integral

1055: $\int\! d\theta$

1056: becomes a sum $\sum_\theta$

1057: so that prior, posterior, and predictive probability

1058: have the form of a {\it finite mixture}

1059: with components $\theta$ \cite{lemm-mixture-1999}.

1060:

1061:

1062: For a moderate number of components

1063: one may be able to include

1064: all of the mixture components in the calculations.

1065: If the number of mixture components is too large

1066: one must select some of the components,

1067: for example by creating a random sample

1068: using Monte Carlo methods,

1069: or by solving for the $\theta^*$

1070: with maximal posterior.

1071: In contrast to typical optimization problems for real variables,

1072: the corresponding integer optimization problems

1073: are usually not very smooth with respect to $\theta$

1074: (with smoothness defined in terms of differences instead of derivatives),

1075: and are therefore often much harder to solve.

1076:

1077:

1078: There exists

1079: a variety of deterministic and stochastic integer optimization algorithms,

1080: which may be combined with ensemble methods like genetic algorithms

1081: \cite{Holland-1975,Goldberg-1989,Michalewicz-1992,Schwefel-1995,Mitchell-1996},

1082: and with homotopy methods like simulated annealing

1083: \cite{Kirkpatrick-Gelatt-Vecchi-1983,Mezard-Parisi-Virasoro-1987,Aarts-Korts-1989,Gelfand-Mitter-1991,Yuille-Kosowski-1994}.

1084: Annealing methods are similar to (Markov chain) Monte Carlo methods,

1085: which aim at sampling many points

1086: from a specific distribution

1087: (for example, at fixed temperature).

1088: For Monte Carlo methods it is important to have (nearly) independent samples

1089: and the correct limiting distribution for the Markov chain.

1090: For annealing methods the aim is to find the correct minimum

1091: by smoothly changing the temperature from a finite value to zero.

1092: For the latter it is thus less important to model the distribution

1093: for nonzero temperatures exactly, but

1094: it is important to use an adequate

1095: cooling scheme for lowering the temperature.

1096:

1097:

1098:

1099:

1100: \subsection{Hyperfields}

1101: \label{hyperfields}

1102:

1103: The hyperparameters $\theta$

1104: considered so far have been real or integer {\it numbers},

1105: or {\it vectors} with real or integer components $\theta_i$.

1106: In this section we will discuss

1107: priors parameterized by functions,

1108: called {\it hyperfields} \cite{Lemm-BFT-1999},

1109: resulting in a still larger flexibility of the formalism.

1110: In numerical calculations, where functions have to be discretized,

1111: hyperfields stand for high dimensional hyperparameter vectors.

1112:

1113:

1114: Using hyperfields

1115: one has to keep in mind

1116: that a gain in flexibility at the same time

1117: tends to lower the influence of the prior.

1118: For example,

1119: consider as hyperfield a completely adaptive reference potential

1120: $\theta(x)$ = $v_0(x)$

1121: within a Gaussian prior (\ref{gaussprior}).

1122: Then, for any $v(x)$

1123: the prior energy vanishes

1124: for $v_0(x)$ = $v(x)$.

1125: In the absence of additional hyperpriors $p(\theta)$

1126: the corresponding MAP solution for the hyperfield

1127: $\theta(x)$ = $v_0(x)$

1128: is thus

1129: $\theta^*(x)$ = $v(x)$

1130: for which the Gaussian prior (\ref{gaussprior})

1131: becomes uniform in $v(x)$.

1132: Hence the price to be paid for the additional flexibility

1133: introduced by hyperfields

1134: is weaker priors

1135: and a large number of additional degrees of freedom.

1136: This can considerably complicate calculations and

1137: requires sufficiently restrictive hyperpriors for the hyperfields.

1138:

1139:

1140: Let us define {\it local hyperfields} $\theta(x)$

1141: to be  hyperfields depending on the position variable $x$.

1142: (In general hyperfields can be introduced

1143: which depend on other real variables or

1144: on several position variables.)

1145: Local hyperfields can be used, for example,

1146: to adapt templates or inverse covariances locally.

1147: To this end,

1148: we express real symmetric, positive (semi)\-definite inverse covariances

1149: by square roots or (real) {\it filter operators} ${\bf W}$,

1150: so that

1151: \be

1152: {\bf K}_0 = {\bf W}^T{\bf W}

1153: .

1154: \ee

1155: In components

1156: \be

1157: {\bf K}_0(x,x^\prime)

1158: = \int \!dx^{\prime\prime}\;

1159:    {\bf W}^T(x,x^{\prime\prime}){\bf W}(x^{\prime\prime},x^{\prime})

1160: ,

1161: \ee

1162: and therefore

1163: \bea

1164: \mel{{v}-v_0}{{\bf K}_0}{{v}-v_0}

1165: &=&

1166: \int\! dx\,dx^\prime\, dx^{\prime\prime}\,

1167: [{v}(x)-v_0(x)]

1168: \nonumber\\

1169: &&\times\;

1170: {\bf W}^T(x,x^{\prime}){\bf W}(x^{\prime},x^{\prime\prime})

1171: \nonumber\\

1172: &&\times\;

1173: [{v}(x^{\prime\prime})-v_0(x^{\prime\prime})]

1174: \nonumber\\

1175: &=&

1176: \int \! dx\, |\omega (x)|^2

1177: ,

1178: \eea

1179: where we define the  {\it filtered difference}

1180: \be

1181: \omega (x)

1182: %=\scp{W_x}{{v}-v_0}

1183: =

1184: \int \!dx^\prime \, {\bf W}(x,x^\prime)

1185: [{v}(x^\prime)-v_0(x^\prime )]

1186: .

1187: \label{filtered-diff}

1188: \ee

1189: For instance,

1190: a square root (\ref{discrete2})

1191: of the discrete negative Laplacian (\ref{discrete1})

1192: corresponds for $v_0\equiv 0$ to a filtered difference

1193: $\omega(x)$ = $v(x+1)-v(x)$.

1194:

1195: The exponent of a Gaussian prior for a local potential ${v}$

1196: can thus be written as an integral over $x$,

1197: \be

1198: p({v}) \propto e^{-E(v)}

1199: ;\quad

1200: E(v) = \frac{1}{2}\int \!dx \, |\omega(x)|^2

1201: .

1202: \label{Gauss-omega}

1203: \ee

1204: In contrast to

1205: Eqs.~(\ref{integer-hyper-t}) and (\ref{integer-hyper-K})

1206: the representation (\ref{Gauss-omega})

1207: is well suited for introducing local hyperfields.

1208: For instance,

1209: an adaptive prior

1210: \be

1211: p({v}|\theta) = e^{-E(v|\theta)}

1212: ,

1213: \label{hyper-prior}

1214: \ee

1215: with a real local hyperfield

1216: $\theta(x)\in [0,1]$

1217: can be obtained by

1218: mixing locally two alternative filtered differences

1219: \be

1220: \omega (x;\theta)

1221: = [1-\theta(x)] \, \omega_1(x) + \theta(x) \,\omega_2(x)

1222: \label{hyper-function-omega}

1223: ,

1224: \ee

1225: where the two $\omega_i$

1226: may differ in their filters and/or reference potentials.

1227: In that case

1228: the hyperfield $\theta(x)$

1229: can locally select

1230: the best mixture of the filtered differences

1231: $\omega_i$, i.e.,

1232: that one which yields in (\ref{hyper-prior})

1233: the largest probability

1234: or smallest prior energy

1235: \bea

1236: E(v|\theta)

1237: &=& \frac{1}{2}

1238: \int \!dx |\omega (x;\theta)|^2

1239: +\ln Z_{\cal V}(\theta)

1240: \label{local-hyper-p-r}

1241: \\

1242: &=&\frac{1}{2} \! \int \!\!dx

1243:  \Big| [1-\theta(x) ] \omega_1(x)

1244:  +\theta(x) \omega_2(x)

1245:  \Big|^2

1246: \!+\ln Z_{\cal V}(\theta)

1247: .

1248: \nonumber

1249: \eea

1250: Here the normalization factor

1251: \be

1252: Z_{\cal V}(\theta)

1253: =

1254: \int_{v\in {\cal V}}

1255: d \!v\, e^{-\frac{1}{2} \int \!dx |\omega (x;\theta)|^2}

1256: ,

1257: \ee

1258: depends in general on $\theta$

1259: if the filters of the $\omega_i$ differ.

1260: Clearly,

1261: allowing an unbounded $-\infty\le \theta(x)\le \infty$

1262: any function $\omega (x;\theta)$

1263: can be written in the form of Eq.~(\ref{hyper-function-omega}),

1264: provided $\omega_1(x)\ne \omega_2(x)$ for all $x$.
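Indeed, as a one--line check under the stated condition,
a prescribed function $\omega(x)$ is reproduced by
solving Eq.~(\ref{hyper-function-omega}) pointwise for the hyperfield,
\be
\theta(x) = \frac{\omega(x)-\omega_1(x)}{\omega_2(x)-\omega_1(x)}
.
\ee
In particular, the choice
$\theta(x)$ = $\omega_1(x)/[\omega_1(x)-\omega_2(x)]$
yields $\omega(x;\theta)\equiv 0$ for any $v$,
which illustrates once more why unbounded hyperfields
require sufficiently restrictive hyperpriors.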

1265:

1266:

1267:

1268: In contrast to soft mixing with real functions $\theta(x)$

1269: a binary local hyperfield $\theta(x)\in \{0,1\}$

1270: implements hard switching

1271: between alternative filtered differences.

1272: Since in the binary case

1273: $\theta^2$ = $\theta$,

1274: $(1-\theta)^2$ = $(1-\theta)$,

1275: and

1276: $\theta(1-\theta)$ = $0$,

1277: Eq.~(\ref{local-hyper-p-r})

1278: becomes [compare Eq.~(\ref{mix-prior-bin})]

1279: \bea

1280: E({v}|\theta)

1281: &=&

1282: \frac{1}{2} \int \!dx \,

1283:  \Big( [1-\theta(x)]|\omega_1(x)|^2

1284: \nonumber\\

1285: &&

1286: \quad +\theta(x) |\omega_2(x)|^2

1287:  \Big)

1288: +\ln Z_{\cal V}(\theta)

1289: \label{local-hyper-p}

1290: ,

1291: \eea

1292: while for real $\theta(x)$

1293: Eq.~(\ref{local-hyper-p-r})

1294: includes a mixed term in $\omega_1\omega_2$.

1295: It is sometimes helpful to transform

1296: an unrestricted real hyperfield $-\infty\le g(x)\le\infty$

1297: into a bounded real hyperfield

1298: $\theta(x)\in [0,1]$ by

1299: \be

1300: \theta(x) = \sigma(g(x)-\vartheta)

1301: ,

1302: \label{def-B}

1303: \ee

1304: with threshold $\vartheta$

1305: and sigmoidal transformation

1306: \be

1307: \sigma(x) = \frac{1}{1+e^{-2\nu x}}

1308: = \frac{1}{2} (\tanh(\nu x) + 1)

1309: .

1310: \label{sigmoid-bsp}

1311: \ee

1312: In the limit $\nu\rightarrow\infty$

1313: the transformation $\sigma(x)$ of (\ref{sigmoid-bsp})

1314: approaches the step function $\Theta(x)$

1315: and (\ref{def-B}) results in a binary

1316: $\theta(x)$ = $\Theta(g(x)-\vartheta)\in \{0,1\}$.
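The equality of the two forms in (\ref{sigmoid-bsp}) follows from
$\tanh(\nu x)$ = $(1-e^{-2\nu x})/(1+e^{-2\nu x})$.
For gradient--based optimization with respect to $g(x)$ at finite $\nu$
(a remark added here for illustration)
it is useful to note that
\be
\frac{d\sigma}{dx} = 2\nu\, \sigma(x)\, [1-\sigma(x)]
,
\ee
which is peaked at $x$ = $0$ with height $\nu/2$.
By (\ref{def-B}) one thus has
$\partial\theta(x)/\partial g(x)$ = $2\nu\,\theta(x)[1-\theta(x)]$,
which vanishes where $\theta(x)$ approaches $0$ or $1$.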

1317:

1318:

1319: Analogous to the global mixing or global switching

1320: in Eq.~(\ref{integer-hyper-t})

1321: and Eq.~(\ref{integer-hyper-K}),

1322: the alternative filtered differences $\omega_i (x)$

1323: at position $x$

1324: in Eq.~(\ref{hyper-function-omega})

1325: can be constructed by local mixing or switching

1326: between

1327: template functions

1328: $v_1(x^\prime)$, $v_2(x^\prime)$

1329: or filters

1330: ${\bf W}_1(x,x^\prime)$, ${\bf W}_2(x,x^\prime)$

1331: using a local hyperfield $\theta(x)$,

1332: \bea

1333: v_x(x^\prime;\theta)

1334: &=&

1335: [1-\theta(x)] \, v_1(x^\prime) + \theta(x)\, v_2(x^\prime),

1336: \label{hyper-function-t}

1337: \\

1338: {\bf W}(x,x^\prime; \theta) &=&

1339: [1\! -\theta(x)] {\bf W}_{1}(x,x^\prime)

1340: \! + \theta(x) {\bf W}_{2}(x,x^\prime)

1341: \label{hyper-function-W}

1342: .\;\;

1343: \eea

1344: It is important to note that the local templates or

1345: reference potentials

1346: $v_x(x^\prime; \theta)$

1347: are functions

1348: of $x^\prime$ and $x$.

1349: Indeed, to obtain a filtered difference $\omega(x;\theta)$ at position $x$,

1350: a reference function $v_x$ is needed for all $x^\prime$ for which

1351: the corresponding ${\bf W}(x,x^\prime)$

1352: is nonzero, since

1353: \be

1354: \omega(x;\theta )

1355: =

1356: \int\!dx^\prime\,

1357: {\bf W}(x,x^\prime)

1358: [{v}(x^\prime)-v_x(x^\prime;\theta)]

1359: .

1360: \ee

1361: In this way the whole template function $v_x(x^\prime;\theta)$,

1362: rather than individual function values $v_0(x;\theta)$,

1363: is adapted individually for every local filtered difference.

1364: In particular, the local reference potentials of Eq.~(\ref{hyper-function-t})

1365: have to be distinguished from one global,

1366: locally adapted reference potential

1367: \be

1368: v_0(x^\prime;\theta)

1369: =

1370: [1-\theta(x^\prime )] \, v_{1}(x^\prime)

1371:  + \theta(x^\prime )\, v_{2}(x^\prime)

1372: \label{mixing-t}

1373: ,

1374: \ee

1375: which at first glance seems to be the natural generalization of

1376: Eq.~(\ref{integer-hyper-t}) to local hyperfields.

1377: Only for Gaussian prior terms

1378: with the identity ${\bf I}$ as covariance

1379: are local template functions $v_x(x^\prime; \theta)$

1380: not required.

1381: In that case $v_{x}(x^\prime;\theta)$

1382: is only needed for $x$ = $x^\prime$

1383: and we may directly write

1384: $v_{x}(x^\prime;\theta)$

1385: =

1386: $\tilde v_{0}(x^\prime;\theta)$,

1387: skipping the variable $x$, and obtain the prior energy

1388: \be

1389: \frac{1}{2}\int\! dx\; |\omega(x;\theta)|^2

1390: =

1391: \frac{1}{2}\scp{v-\tilde v_0(\theta)}{v-\tilde v_0(\theta)}

1392: .

1393: \label{identity-cov}

1394: \ee

1395: We remark that one can also generalize

1396: Eq.~(\ref{hyper-function-t}),

1397: which uses the same

1398: $v_1(x^\prime)$, $v_2(x^\prime)$ for all $x$,

1399: by working with reference potentials

1400: $v_{1,x}(x^\prime)$, $v_{2,x}(x^\prime)$

1401: which vary with the position $x$

1402: at which the filtered difference $\omega(x)$

1403: is required. This yields

1404: \be

1405: v_x(x^\prime;\theta)

1406: =

1407: [1-\theta(x)] \, v_{1,x}(x^\prime)

1408: + \theta(x)\,  v_{2,x}(x^\prime)

1409: .

1410: \label{hyper-function-t-nonlocal}

1411: \ee

1412:

1413: For binary $\theta(x)$

1414: Eq.~(\ref{hyper-function-W})

1415: corresponds

1416: to an inverse covariance

1417: \bea

1418: {\bf K}_0(\theta)

1419: &=& \int \!dx\; {\bf K}_x(\theta)

1420: =

1421: \int \!dx \, {W}_{x}(\theta){W}^T_{x}(\theta)

1422: \nonumber\\

1423: &=& \int \!\! dx

1424: \left(

1425: [1-\theta(x)] {W}_{1,x}{W}^T_{1,x}

1426: +  \theta(x)  {W}_{2,x}{W}^T_{2,x}

1427: \right)

1428: \qquad

1429: \label{invcov}

1430: \eea

1431: with

1432: %${\bf K}_x(\theta)$ = ${W}_{x}(\theta){W}^T_{x}(\theta)$

1433: \be

1434: {\bf K}_x(\theta) = {W}_{x}(\theta){W}^T_{x}(\theta)

1435: \ee

1436: written as dyadic product of the vector

1437: $W_{x}(\theta )$ = ${\bf W}(x,\cdot\,;\theta)$

1438: and with analogously defined $W_{i,x}$ = ${\bf W}_i(x,\cdot)$.

1439: For $\theta$--dependent inverse covariances

1440: the normalization factors $Z_{\cal V}(\theta)$ become

1441: $\theta$--dependent. They have to be included

1442: when integrating over $\theta$ or

1443: solving for the optimal $\theta$ in MAP.

1444:

1445: In Eqs.~(\ref{hyper-function-t}) and (\ref{hyper-function-W})

1446: it is straightforward to introduce

1447: two binary hyperfields $\theta$, $\theta^\prime$,

1448: one for the reference potential $v_x$ and one for

1449: the filter ${\bf W}$.

1450: This results in a conditional prior

1451: \bea

1452: p({v}|\theta,\theta^\prime)

1453: &\propto&

1454: e^{-\frac{1}{2}

1455: \int\!dx\,

1456: \mel{{v} - v_x(\theta)}{{\bf K}_x(\theta^\prime)}{{v}-v_x(\theta)}

1457: }

1458: \nonumber\\

1459: &=&

1460: e^{-\frac{1}{2} \int \!dx\, |\omega(x;\theta,\theta^\prime)|^2}

1461: .

1462: \eea

1463: Here we can write

1464: \bea

1465: \int\! dx \, |\omega(x;\theta,\theta^\prime)|^2

1466: &=&

1467: \mel{{v}-v_0(\theta,\theta^\prime)}

1468: {{\bf K}_0(\theta^\prime)}{{v}-v_0(\theta,\theta^\prime)}

1469: \nonumber\\

1470: &&+

1471: \int \!dx \,

1472: \mel{v_x(\theta)}{{\bf K}_x(\theta^\prime)}{v_x(\theta)}

1473: \nonumber\\

1474: &&-

1475: \mel{v_0(\theta,\theta^\prime)}

1476:   {{\bf K}_0(\theta^\prime)}{v_0(\theta,\theta^\prime)}

1477: ,

1478: \label{eff-E}

1479: \eea

1480: with an effective template $v_0(\theta,\theta^\prime)$

1481: given by

1482: \be

1483: v_0(\theta ,\theta^\prime)

1484: =

1485: {\bf K}_0(\theta^\prime)^{-1}

1486: \int\!dx\, {\bf K}_x(\theta^\prime ) \,  v_x(\theta)

1487: %{\bf K}_0(\theta^\prime)

1488: ,

1489: \ee

1490: and effective inverse covariance ${\bf K}_0(\theta^\prime)$

1491: =

1492: $\int \! dx\, {\bf K}_x(\theta^\prime)$

1493: as in Eq. (\ref{invcov}).

1494: Since

1495: the last two terms in Eq.~(\ref{eff-E}) are ${v}$--independent constants

1496: (only depending on $\theta$, $\theta^\prime$)

1497: we see that for fixed hyperfields

1498: %$\theta$, $\theta^\prime$

1499: the corresponding prior energy is minimized by

1500: $v$ = $v_0(\theta,\theta^\prime)$.

1501: For given hyperparameters $\theta$, $\theta^\prime$

1502: we can write

1503: $p({v}|\theta,\theta^\prime)\propto e^{-E({v}|\theta,\theta^\prime)}$

1504: with a prior energy of the form

1505: $E({v}|\theta,\theta^\prime)$

1506: =

1507: $\frac{1}{2}

1508: \mel{{v}-v_0(\theta,\theta^\prime)}

1509:     {{\bf K}_0(\theta^\prime )}{{v}-v_0(\theta,\theta^\prime)}$.

1510:

1511: As the product of Gaussians is again a Gaussian

1512: several Gaussian prior factors can easily be combined.

1513: In this way one can implement a nonlocal property like smoothness

1514: and still avoid local template functions $v_x(x^\prime, \theta)$

1515: by combining a Gaussian prior with ${\bf K}_0$ = ${\bf I}$

1516: as in (\ref{identity-cov})

1517: with a Gaussian prior with nondiagonal covariance and

1518: zero (or fixed) template,

1519: \be

1520: E({v}|\theta) =

1521: \frac{1}{2}

1522: \scp{{v}-\tilde v_0(\theta)}{{v}-\tilde v_0(\theta)}

1523: +\frac{1}{2}

1524: \mel{{v}}{{\bf K}}{{v}}

1525: .

1526: \label{local+laplace}

1527: \ee

1528: Combining both terms yields

1529: \bea

1530: E({v}|\theta)

1531: &=&

1532: \frac{1}{2}

1533: \bigg(

1534: \mel{{v}-v_0(\theta)}{{\bf K}_0}{{v}-v_0(\theta)}

1535: \nonumber\\

1536: &&\quad  + \;\;

1537: \mel{\tilde v_0(\theta)}

1538:     {{\bf I}-{\bf K}_0^{-1}}{\tilde v_0(\theta)}

1539: \bigg)

1540: ,

1541: \label{local+laplace2}

1542: \eea

1543: with the second term

1544: being independent of $v$

1545: and

1546: with effective template and effective inverse covariance

1547: \be

1548: v_0(\theta) = {\bf K}_0^{-1} \tilde v_0(\theta)

1549: ,\quad

1550: {\bf K}_0 = {\bf I}+ {\bf K}

1551: .

1552: \ee

1553: For differential operators ${\bf K}_0$

1554: the effective $v_0(\theta)$

1555: is thus a smoothed version of $\tilde v_0(\theta)$.
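As an illustration, take ${\bf K}$ = $-\lambda \Delta$
with a constant $\lambda > 0$ on an infinite or periodic domain,
and denote Fourier components with wave number $q$ by a hat;
these choices are assumptions made only for this example.
Then ${\bf K}_0$ = ${\bf I}-\lambda\Delta$ is diagonal in Fourier space and
\be
\hat v_0(q;\theta) = \frac{\hat{\tilde v}_0(q;\theta)}{1+\lambda q^2}
,
\ee
so that Fourier components of $\tilde v_0(\theta)$
with $q^2 \gg 1/\lambda$ are strongly damped.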

1556:

1557:

1558: The extreme case would be to treat

1559: $v_0$ and ${\bf W}$ themselves as unrestricted hyperfields.

1560: As already discussed,

1561: this just eliminates the corresponding prior term.

1562: Hence, to restrict the flexibility,

1563: typically a smoothness hyperprior may be imposed

1564: to prevent highly oscillating functions $\theta (x)$.

1565: For real $\theta(x)$, for example, a smoothness prior

1566: like a Laplacian prior $\mel{\theta}{\! -\!\Delta}{\theta}/2$ can be used

1567: in regions where it is defined.

1568: (The space of functions

1569: for which a smoothness prior

1570: with discontinuous templates is defined

1571: depends on the locations of the discontinuities.)

1572: An example of a non--Gaussian hyperprior is

1573: \be

1574: p(\theta) \propto

1575: e^{-\frac{\tau}{2} \int\!dx \, C_\theta(x)}

1576: ,

1577: \label{hyperprior-C}

1578: \ee

1579: where $\tau$ is a constant

1580: and

1581: \be

1582: C_\theta(x) =

1583: \sigma \left( \left(\frac{\partial \theta}{\partial x}\right)^2

1584:   - \vartheta_\theta\right)

1585: ,

1586: \label{jumps}

1587: \ee

1588: with a sigmoid $\sigma(x)$ as in  (\ref{sigmoid-bsp}).

1589: For $\nu\rightarrow\infty$ the sigmoid approaches a step function

1590: and $C_\theta(x)$

1591: becomes zero at locations where the square of the first derivative

1592: is smaller than a certain

1593: threshold $0\le \vartheta_\theta < \infty$,

1594: and one otherwise.

1595: For discrete $x$ one can analogously

1596: count the number of jumps

1597: larger than a given threshold.

1598: One can then penalize the number $N_d(\theta)$

1599: of discontinuities

1600: where

1601: $\left(\partial \theta/\partial x\right)^2$ = $\infty$

1602: and use

1603: \be

1604: p(\theta) \propto e^{-\frac{\tau}{2} N_d(\theta)}

1605: .

1606: \label{hyperprior-Nd}

1607: \ee

1608: In the case of a binary field

1609: this corresponds

1610: to counting the number of times the field changes its value.
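For a binary field on a lattice with unit spacing
(a sketch with sites $x_i$ and open boundary, a notation introduced here for illustration)
this count can be written as
\be
N_d(\theta)
= \sum_i |\theta(x_{i+1})-\theta(x_i)|
= \sum_i [\theta(x_{i+1})-\theta(x_i)]^2
,
\ee
because each difference takes only the values $0$ or $\pm 1$.
For binary fields the hyperprior (\ref{hyperprior-Nd})
thus coincides with a discrete Laplacian smoothness term
$\mel{\theta}{\! -\!\Delta}{\theta}/2$ weighted by $\tau$.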

1611: The expression $C_\theta$

1612: of Eq. (\ref{jumps})

1613: can be generalized to

1614: \be

1615: C_\theta(x)

1616: = \sigma\left( |\omega_\theta(x)|^2-\vartheta_\theta\right)

1617: ,

1618: \label{Cdef}

1619: \ee

1620: where,

1621: analogous to Eq.~(\ref{filtered-diff}),

1622: \be

1623: \omega_\theta(x)

1624: =

1625: \int \!dx^\prime \,

1626: {\bf W}_\theta(x,x^\prime)

1627: [\theta(x^\prime)-t_\theta(x^\prime)]

1628: ,

1629: %\label{filtered-diff-theta}}

1630: \ee

1631: with template

1632: $t_\theta(x^\prime)$

1633: representing the expected form for the hyperfield,

1634: and a filter operator

1635: ${\bf W}_\theta$

1636: defining a distance measure for hyperfields.

1637: Parameters of the hyperprior like $\tau$

1638: in Eq. (\ref{hyperprior-C}) or Eq.~(\ref{hyperprior-Nd})

1639: can be treated as higher level hyperparameters.

1640:

1641:

1642:

1643: \subsection{Non--Gaussian priors and auxiliary fields}

1644: \label{Non--Gaussian-priors}

1645:

1646: As an alternative to introducing hyperfields $\theta(x)$

1647: one can work with priors which are

1648: explicitly non--Gaussian with respect to $v$.

1649: This can be done by introducing auxiliary fields

1650: $B(x;v)$

1651: whose function values are not considered

1652: as independent variables

1653: but are directly defined as functionals of $v$.

1654: (For the sake of simplicity

1655: we will for $B(x;v)$ also write

1656: $B(x)$ or $B(v)$, depending on the context.)

1657: Like hyperfields,

1658: auxiliary fields

1659: can select locally the best adapted filtered difference

1660: from a set of alternative $\omega_i$.

1661:

1662: For instance, consider the auxiliary field

1663: [compare with Eqs.~(\ref{def-B}) or (\ref{Cdef})]

1664: \be

1665: B(x) =

1666: \sigma \left(u(x) - \vartheta \right)

1667: ,

1668: \label{jumps2}

1669: \ee

1670: where

1671: \be

1672: u(x) = |\omega_1(x)|^2-|\omega_2(x)|^2

1673: ,

1674: \ee

1675: $\vartheta$ represents a threshold,

1676: $\sigma (x)$ a sigmoidal function as in (\ref{sigmoid-bsp}),

1677: and the $\omega_i$ are filtered differences

1678: defined in terms of $v$

1679: according to Eq.~(\ref{filtered-diff}).

1680: Again a binary field $B(x)$ is obtained

1681: by letting the sigmoid approach the step function.

1682: Because the $\omega_i$ depend on $v$,

1683: it is clear from the definition (\ref{jumps2})

1684: that the auxiliary field $B(x)$ is not an independent hyperfield

1685: but has values which are functionals of ${v}$.

1686: Notice that $B(x)$

1687: is nonlocal with respect to ${v}(x)$

1688: if $\omega_i(x)$ is nonlocal;

1689: a value $B(x)$ then depends

1690: on more than one ${v}(x)$--value.

1691: For a negative Laplacian prior in one dimension,

1692: Eq.~(\ref{jumps2}) reads

1693: \be

1694: B(x) =

1695: \sigma \left(

1696: \left|\frac{\partial ({v}-v_1)}{\partial x}\right|^2

1697: -\left|\frac{\partial ({v}-v_2)}{\partial x}\right|^2

1698:             - \vartheta

1699: \right)

1700: .

1701: \label{jumps2b}

1702: \ee

1703: While auxiliary fields $B(x)$ are directly determined by ${v}$,

1704: hyperfields are indirectly coupled to $v$

1705: through the MAP stationarity equations.

1706: Conversely,

1707: an auxiliary field $B(x)$ can be treated formally

1708: as an independent hyperfield

1709: if a Lagrange multiplier term

1710: \mbox{$\lambda

1711: \left[

1712: B(x)-\sigma \left(u(x) - \vartheta

1713: \right)

1714: \right]$}

1715: is added to the prior energy

1716: in the limit $\lambda\rightarrow\infty$.

1717:

1718:

1719: Like hyperfields $\theta(x)$

1720: auxiliary fields $B(x)$

1721: can be used

1722: to adapt reference potentials $v_0$ or filters ${\bf W}$.

1723: However,

1724: a prior as in  Eq.~(\ref{gaussprior})

1725: is non--Gaussian with respect to $v$

1726: if $v_0(B)$ and ${\bf K}_0(B)$

1727: depend on $B$ and thus also on $v$.

1728: Furthermore, analogous to hyperpriors $p(\theta)$,

1729: additional prior terms $p(B(v))\propto \exp(-E_B(v))$ for $v$

1730: can be included,

1731: formulated in terms of an auxiliary field $B(x)$.

1732: As in Eq.~(\ref{local-hyper-p})

1733: a binary $B(x)$ can switch between two filtered differences

1734: \be

1735: |\omega(x;B)|^2

1736: =

1737: [1-B(x)] |\omega_1(x)|^2

1738: +

1739: B(x) |\omega_2(x)|^2

1740: ,

1741: \label{binary-B}

1742: \ee

1743: within a (non--Gaussian) prior for ${v}$

1744: \be

1745: p({v}) \propto

1746: e^{-E(v)-E_B(v)}

1747: ,

1748: \label{b-prior}

1749: \ee

1750: where the normalization factor

1751: $Z$ = $\int \!dv\,e^{-E(v)-E_B(v)}$

1752: of (\ref{b-prior})

1753: is by definition independent of $v$.

1754: Hence it can be skipped for MAP calculations

1755: also for non--Gaussian $p(v)$.

1756: In Eq.~(\ref{b-prior})

1757: \be

1758: E(v) = \frac{1}{2} \int\!dx\,

1759: \left(

1760: [1-B(x)] |\omega_1(x)|^2

1761: +

1762: B(x) |\omega_2(x)|^2

1763: \right)

1764: ,

1765: \label{omega-B-energy}

1766: \ee

1767: according to Eq.~(\ref{binary-B}),

1768: while $E_B(v)$ depends on $v$

1769: only through $B(v)$.

1770: For example, the number of switchings

1771: can be restricted by taking

1772: \be

1773: E_B(v) = \frac{\tau}{2}N_d(B)

1774: ,

1775: \label{additional-b-prior}

1776: \ee

1777: where

1778: $N_d(B)$ counts the number of discontinuities of $B(x)$.

1779: Other choices, for real $B(x)$,

1780: are quadratic energies

1781: \be

1782: E_B(v) = \frac{\tau}{2}\int \!dx |\omega_B(x)|^2

1783: \label{quad-err}

1784: \ee

1785: or non--quadratic energies of the form

1786: \be

1787: E_B(v) = \frac{\tau}{2}\int \!dx \,C_B (x)

1788: \label{non-quad-err}

1789: \ee

1790: where, similar to (\ref{Cdef}),

1791: \be

1792: C_B(x)

1793: = \sigma\left( |\omega_B(x)|^2-\vartheta_B\right)

1794: ,

1795: \label{cforb}

1796: \ee

1797: and

1798: \be

1799: \omega_B(x)

1800: =

1801: \int \!dx^\prime \,

1802: {\bf W}_B(x,x^\prime)

1803: [B(x^\prime)-t_B(x^\prime)]

1804: ,

1805: \label{filtered-diff-ng}

1806: \ee

1807: is a filtered difference of $B$

1808: with filter operator

1809: ${\bf W}_B$

1810: and template

1811: $t_B$.

1812:

1813:

1814:

1815: Let us compare a non--Gaussian prior

1816: built from the prior energies (\ref{omega-B-energy})

1817: and (\ref{additional-b-prior})

1818: for a binary auxiliary field (\ref{jumps2})

1819: \be

1820: p(v) \propto

1821: e^{-\frac{1}{2} \int\!dx\,

1822: \left(

1823: [1-B(x)] |\omega_1(x)|^2

1824: +

1825: B(x) |\omega_2(x)|^2

1826: \right)

1827: -\frac{\tau}{2} N_d(B)

1828: }

1829: ,

1830: \label{cmpB}

1831: \ee

1832: with the similar--looking

1833: combination of Gaussian prior (\ref{local-hyper-p})

1834: with hyperprior (\ref{hyperprior-Nd})

1835: for a binary hyperfield,

1836: \be

1837: p({v},\theta)

1838: =p(v|\theta) p(\theta)

1839: \propto

1840: \label{omega-theta-energy}

1841: \label{cmpT}

1842: \ee

1843: \[

1844: e^{-\frac{1}{2} \int\!dx\,

1845: \left[

1846: (1-\theta(x)) |\omega_1(x)|^2

1847: +

1848: \theta(x) |\omega_2(x)|^2

1849: \right]

1850: -\frac{\tau}{2} N_d(\theta)

1851: -\ln Z_{\cal V}(\theta)

1852: }

1853: .

1854: \]

1855: Eq.~(\ref{cmpT}) works with conditional probabilities $p(v|\theta)$,

1856: hence the corresponding

1857: normalization factors are in general $\theta$--dependent

1858: and have to be included

1859: for MAP calculations.

1860: Typically, the values of

1861: $B$, $N_d(B)$ and $C_B$,

1862: which are defined directly in terms of the corresponding MAP solution for $v$,

1863: differ from the MAP solutions for $\theta$,

1864: $N_d(\theta)$ and $C_\theta$,

1865: respectively.

1866: However, if the filtered differences $\omega_i$

1867: in Eq.~(\ref{omega-theta-energy})

1868: differ only in their templates,

1869: the normalization term can be skipped.

1870: Then

1871: assuming

1872: $\vartheta$ = $0$,

1873: $p(\theta) \propto  1$,

1874: $p(B) \propto 1$

1875: the two equations are equivalent

1876: for

1877: $\theta(x)$ = $\Theta\left(|\omega_1(x)|^2-|\omega_2(x)|^2\right)$.

1878: In the absence of hyperpriors,

1879: it is indeed easily seen

1880: that this is a selfconsistent solution for $\theta$

1881: for every given ${v}$.

1882: In general, however, when

1883: hyperpriors are included,

1884: another solution for $\theta$

1885: may have a larger posterior.
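The selfconsistency in the absence of hyperpriors can be checked directly.
As a short sketch, assuming a binary $\theta(x)\in\{0,1\}$,
$p(\theta)\propto 1$, and a skipped normalization term,
the prior energy appearing in Eq.~(\ref{cmpT})
can be rewritten for fixed $v$ as
\[
\frac{1}{2}\int\!dx\,|\omega_1(x)|^2
-\frac{1}{2}\int\!dx\,\theta(x)
\left(|\omega_1(x)|^2-|\omega_2(x)|^2\right)
.
\]
The first term does not depend on $\theta$,
and the second term is minimized pointwise
by setting $\theta(x)$ = $1$ wherever $|\omega_1(x)|^2>|\omega_2(x)|^2$
and $\theta(x)$ = $0$ otherwise,
which is the step function given above.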

1886:

1887:

1888: Hyperpriors $p(\theta)$

1889: or additional auxiliary prior terms $p(B)$

1890: can be useful to enforce specific

1891: global constraints for $\theta(x)$ or $B(x)$.

1892: In natural images, for example, discontinuities

1893: are expected to form closed curves.

1894: Priors or hyperpriors,

1895: organizing discontinuities along lines or closed curves,

1896: are thus important for image segmentation or image restoration

1897: \cite{Geman-Geman-1984,Poggio-Torre-Koch-1985,Marroquin-Mitter-Poggio-1987,Geiger-Girosi-1991,Zhu-Yuille-1996}.

1898: A similar method has been used

1899: in the determination of piecewise smooth relaxation time spectra

1900: from rheological data

1901: \cite{Roths-Maier-Friedrich-Marth-Honerkamp-2000}.

1902:

1903: Another  useful class of

1904: non--Gaussian priors

1905: generalizing (\ref{Gauss-omega})

1906: has the form \cite{Winkler-1995,Zhu-Mumford-1997,Zhu-Wu-Mumford-1997}

1907: \be

1908: p(v)

1909: \propto

1910: e^{-\frac{1}{2} \int\! dx\, \psi[\omega(x)]}

1911: ,

1912: \ee

1913: where $\psi$ is a non--quadratic function.

1914: This function $\psi$

1915: can be fixed in advance for a given problem

1916: or adapted using hyperparameters.

1917: Typical choices to allow discontinuities

1918: are symmetric ``cup'' functions

1919: with minimum at zero and flat tails

1920: for which one large step is cheaper than many small ones

1921: (see Fig.~\ref{Zhu-Mum-pic}).
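As a quick numerical illustration,
taking the function and the parameter values quoted in the caption
of Fig.~\ref{Zhu-Mum-pic}
($a$ = $5$, $b$ = $10$, $\gamma$ = $0.7$, $x_0$ = $0$),
one finds
\[
\psi(10) = 5\left(1-\frac{1}{2}\right) = 2.5
,\qquad
\psi(20) = 5\left(1-\frac{1}{1+2^{0.7}}\right) \approx 3.09
,
\]
so that a single step of size 20 costs less than two steps of size 10,
$\psi(20) < 2\,\psi(10)$ = $5$.
The values are rounded and only meant to illustrate
the qualitative statement.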

1922:

1923: Table \ref{collection}

1924: summarizes the basic variants of prior energies

1925: discussed in the paper.

1926:

1927:

1928: \begin{figure}

1929: \vspace{-0.5cm}

1930: \begin{center}

1931: %\epsfig{file=wink1a.eps, width=50mm}\\

1932: \epsfig{file=figure1.eps, width=50mm}\\

1933: \end{center}

1934: \vspace{-0.5cm}

1935: \caption{Example of a non--quadratic

1936: ``cup''--function

1937: $\psi(x)$ = $a( 1.0 - 1/(1+(|x-x_0|/b)^\gamma))$,

1938: with

1939: $a$ = $5$,

1940: $b$ = $10$,

1941: $\gamma$ = $0.7$,

1942: $x_0$ = $0$.

1943: }

1944: \label{Zhu-Mum-pic}

1945: \end{figure}

1946:

1947:

1948: \begin{table}[ht]

1949: \begin{center}

1950: \begin{tabular}{|c|c|}

1951: %\hline\rule[-2mm]{0mm}{6mm}

1952: %prior energy  & Eq. \\

1953: %\hline

1954: \hline

1955: \multicolumn{2}{|c|}{Gaussian prior\rule[-2mm]{0mm}{6mm}}\\

1956: \hline\rule[-2mm]{0mm}{6mm}

1957: $E(v)$ =

1958: $\frac{1}{2} \mel{v-v_0}{{\bf K}_0}{v-v_0}$

1959: & (\ref{gaussprior}) \\

1960: \hline

1961: \multicolumn{2}{|c|}{with hyperparameter $\theta$\rule[-2mm]{0mm}{6mm}}\\

1962: \hline\rule[-2mm]{0mm}{6mm}

1963: $E(v|\theta)$ =

1964: $\frac{1-\theta}{2}\mel{v-v_1}{{\bf K}_1}{v-v_1}$ &\\

1965: $\quad\qquad +\frac{\theta}{2}\mel{v-v_2}{{\bf K}_2}{v-v_2}$

1966: \rule[-2mm]{0mm}{6mm}

1967: &(\ref{mix-prior-bin})\\

1968: \hline

1969: \multicolumn{2}{|c|}{with local hyperfield $\theta(x)$\rule[-2mm]{0mm}{6mm}}\\

1970: \hline\rule[-2mm]{0mm}{6mm}

1971: $E(v|\theta)$ =

1972: $\frac{1}{2} \int \!dx \,

1973:  \Big( [1-\theta(x)]|\omega_1(x)|^2$ &\\

1974: $\qquad\qquad+\theta(x) |\omega_2(x)|^2

1975:  \Big)

1976: +\ln Z_{\cal V}(\theta)$

1977: & (\ref{local-hyper-p})\\

1978: %\hline

1979: $E(v|\theta)$ =

1980: $\frac{1}{2}

1981: \scp{{v}-\tilde v_0(\theta)}{{v}-\tilde v_0(\theta)}

1982: +\frac{1}{2}

1983: \mel{{v}}{{\bf K}}{{v}}$

1984: \rule[-2mm]{0mm}{6mm}

1985: &(\ref{local+laplace})\\

1986: \hline

1987: \multicolumn{2}{|c|}{

1988: Non--Gaussian prior with auxiliary field $B(x;v)$\rule[-2mm]{0mm}{6mm}}\\

1989: \hline\rule[-2mm]{0mm}{6mm}

1990: $E(v)$ =

1991: $\frac{1}{2} \int\!dx\,

1992: \left(

1993: [1-B(x)]|\omega_1(x)|^2

1994: +

1995: B(x) |\omega_2(x)|^2

1996: \right)$

1997: &(\ref{omega-B-energy})\\

1998: \hline

1999: \end{tabular}

2000: \end{center}

2001: \caption{Summary of basic prior energy variants discussed in this paper.}

2002: \label{collection}

2003: \end{table}

2004:

2005:

2006: \section{Stationarity equations}

2007: \label{stationarity-equations}

2008:

2009: To reconstruct a local potential $v$

2010: in MAP we have to maximize the posterior $p(v|D)$

2011: with respect to $v$.

2012: If the functional derivative of the posterior with respect to $v$ exists,

2013: the reconstructed potential can be found by solving the

2014: stationarity equation

2015: \be

2016: \delta_{v} \ln p(v|D) = 0

2017: ,

2018: \label{stat-eq}

2019: \ee

2020: where we have chosen the logarithm for technical convenience, and

2021: $\delta_{v}$ denotes the functional derivative with respect to $v$.

2022:

2023: For observational data consisting of

2024: $n$ independent position measurements

2025: the posterior (\ref{bayestheorem}) reads

2026: \be

2027: p(v|D)

2028: \propto

2029: p(v) \prod_{i=1}^n p(x_i|\hat x,v)

2030: .

2031: \label{posterior2}

2032: \ee

2033: To formulate the stationarity equation (\ref{stat-eq})

2034: we have to calculate the functional derivatives

2035: of likelihood and prior.

2036: For inverse quantum statistics \cite{Lemm-IQS-2000}

2037: the likelihood for position measurements (\ref{pos-likelihood})

2038: on a canonical ensemble (\ref{canonical})

2039: depends on the eigenfunctions and eigenvalues

2040: of the $v$--dependent Hamiltonian $H(v)$.

2041: We thus have to find the functional derivatives

2042: of the eigenfunctions $\phi_\alpha$ and eigenvalues $E_\alpha$.

2043: Those can be obtained by taking the functional derivative

2044: of the eigenvalue equation

2045: $H\ket{\phi_\alpha}$ = $E_\alpha \ket{\phi_\alpha}$,

2046: where we will assume the eigenfunctions

2047: to be orthonormalized.

2048: Choosing

2049: $\scp{\phi_\alpha}{\delta_{v(x)}\phi_\alpha}$ = 0

2050: and

2051: utilizing

2052: \be

2053: \delta_{v(x)} H (x^\prime,x^{\prime\prime})

2054: =

2055: \delta_{v(x)} V (x^\prime,x^{\prime\prime})

2056: =

2057: \delta(x-x^\prime) \delta (x^\prime-x^{\prime\prime})

2058: ,

2059: \ee

2060: we find for nondegenerate eigenfunctions

2061: \bea

2062: \delta_{v(x)} E_\alpha

2063: &=& \mel{\phi_\alpha}{\delta_{v(x)} H}{\phi_\alpha}

2064: =|\phi_\alpha(x)|^2

2065: ,

2066: \label{deltaE-nonp}

2067: \\

2068: \delta_{v(x)} \phi_\alpha(x^{\prime})

2069: &=& \sum_{\gamma\ne \alpha} \frac{1}{E_\alpha-E_\gamma}

2070: \phi_\gamma(x^{\prime})\phi^*_\gamma(x) \phi_\alpha (x)

2071: .

2072: \eea
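These expressions follow from first order perturbation theory.
As a brief sketch,
taking the functional derivative of the eigenvalue equation gives
\[
(\delta_{v(x)} H)\,\ket{\phi_\alpha}
+ H\,\ket{\delta_{v(x)}\phi_\alpha}
=
(\delta_{v(x)} E_\alpha)\,\ket{\phi_\alpha}
+ E_\alpha\,\ket{\delta_{v(x)}\phi_\alpha}
.
\]
Projecting onto $\bra{\phi_\alpha}$ and using the hermiticity of $H$
yields Eq.~(\ref{deltaE-nonp}),
while projecting onto $\bra{\phi_\gamma}$ with $\gamma\ne\alpha$ gives
\[
\scp{\phi_\gamma}{\delta_{v(x)}\phi_\alpha}
=
\frac{\mel{\phi_\gamma}{\delta_{v(x)} H}{\phi_\alpha}}{E_\alpha-E_\gamma}
=
\frac{\phi^*_\gamma(x)\,\phi_\alpha(x)}{E_\alpha-E_\gamma}
.
\]
Expanding $\delta_{v(x)}\phi_\alpha$
in the complete set of eigenfunctions,
where the component with $\gamma$ = $\alpha$ vanishes by the choice made above,
reproduces the sum over $\gamma\ne\alpha$.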

2073: It follows for the functional derivative of the likelihood

2074: \bea

2075: \delta_{v(x)}p(x_i|\hat x,v)

2076: &=&

2077: \av{\left(\delta_{v(x)}\phi^*(x_i)\right) \phi (x_i)}

2078: \nonumber\\&&

2079: +\av{\phi^*(x_i)\delta_{v(x)}\phi (x_i)}

2080: \nonumber\\&&

2081: -

2082: \beta \Big(

2083: \av{|\phi (x_i)|^2 |\phi (x)|^2}

2084: \nonumber\\&&

2085: -\av{|\phi (x_i)|^2}\av{|\phi (x)|^2}

2086: \Big)

2087: .

2088: \label{der-like}

2089: \eea
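Equation (\ref{der-like}) can be traced back to the product rule.
As a sketch, assuming that the likelihood is the thermal average
$p(x_i|\hat x,v)$ = $\av{|\phi(x_i)|^2}$
= $\sum_\alpha p_\alpha |\phi_\alpha(x_i)|^2$
with canonical probabilities
$p_\alpha$ = $e^{-\beta E_\alpha}/\sum_\gamma e^{-\beta E_\gamma}$,
the first two terms come from differentiating the eigenfunctions
and the last term from
\[
\delta_{v(x)} p_\alpha
=
-\beta\, p_\alpha
\left( \delta_{v(x)}E_\alpha - \av{\delta_{v(x)}E}\right)
=
-\beta\, p_\alpha
\left( |\phi_\alpha(x)|^2 - \av{|\phi(x)|^2}\right)
,
\]
where Eq.~(\ref{deltaE-nonp}) has been used in the second step.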

2090:

2091:

2092: Having obtained Eq.~(\ref{der-like})

2093: for the likelihood

2094: we now have to find the functional derivative of the prior.

2095: For the Gaussian prior (\ref{gaussprior})

2096: one gets directly

2097: \be

2098: \delta_{v} \ln p(v)

2099: = -{\bf K}_0(v-v_0)

2100: .

2101: \label{prior-dev}

2102: \ee

2103:

2104:

2105: If hyperparameters $\theta$ are included

2106: and treated in MAP

2107: (i.e., not integrated out by Monte Carlo techniques),

2108: the posterior has to be maximized simultaneously

2109: with respect to $v$ and $\theta$.

2110: We have already mentioned that $\theta$--dependent inverse covariances

2111: lead to normalization factors which are independent of $v$

2112: but depend on $\theta$.

2113: Such factors have to be included

2114: when maximizing with respect to $\theta$.

2115:

2116: As a non--Gaussian example

2117: consider a prior

2118: where two filtered differences

2119: are mixed by an auxiliary field $B(x)$

2120: and an additional prior factor $p(B)$ is included,

2121: for example to prevent fast oscillations of $B(x)$.

2122: With $B(x)$ = $\sigma(u(x)-\vartheta)$,

2123: threshold $\vartheta$,

2124: sigmoidal function  $\sigma(x)$

2125: as in Eq.~(\ref{sigmoid-bsp}),

2126: and

2127: $u(x)$ = $|\omega_1(x)|^2-|\omega_2(x)|^2$

2128: this gives

2129: \be

2130: p(v) \propto

2131: e^{-\frac{1}{2} \, \int\! dx \,

2132: \big| [1-B(x)] \omega_1(x) + B(x) \omega_2(x)

2133: \big|^2

2134: -E_B

2135: }

2136: .

2137: \label{non-gauss-prior}

2138: \ee

2139: Analogous to Eq.~(\ref{b-prior}),

2140: the term

2141: \be

2142: E_B = \int\!dx\, E_B(x)

2143: ,

2144: \ee

2145: represents an auxiliary prior energy

2146: formulated in terms of the mixing function $B(x)$.

2147: Like $\omega(x)$ the function value $E_B(x)$

2148: may depend on the whole function $B$

2149: and not necessarily only on the function value $B(x)$.

2150: Using $\omega_i(x)$ = $\scp{x}{{\bf W}_i (v-v_i)}$

2151: we find

2152: \be

2153: \delta_{v(x)} \omega_i(x^\prime)

2154: = {\bf W}_i(x^\prime,x)

2155: ,

2156: \ee

2157: and thus

2158: \be

2159: \delta_{v(x)} u(x^\prime)

2160: =

2161: 2\left({\bf W}^T_1(x,x^\prime) \, \omega_1(x^\prime)

2162: -{\bf W}^T_2(x,x^\prime) \, \omega_2(x^\prime)

2163: \right).

2164: \ee

2165: Furthermore, we obtain for the functional derivative of $E_B$

2166: \be

2167: \delta_{v(x)} E_B(x^\prime)

2168: =

2169: \int\!dx^{\prime\prime}\,

2170: \left[

2171: \delta_{v(x)} B(x^{\prime\prime})

2172: \right]

2173: \,

2174: \left[

2175: \delta_{B(x^{\prime\prime})} E_B(x^\prime)

2176: \right]

2177: ,

2178: \ee

2179: where with Eq.~(\ref{jumps2})

2180: \be

2181: \delta_{v(x)} B(x^{\prime\prime})

2182: =

2183: \sigma^\prime(u(x^{\prime\prime})-\vartheta)

2184: \delta_{v(x)} u(x^{\prime\prime})

2185: ,

2186: \ee

2187: and

2188: $\sigma^\prime(u)$ = $d\sigma(u)/du$.

2189: For a prior energy as in (\ref{quad-err}) which is quadratic in  $B(x)$

2190: \be

2191: E_B(x) = |\omega_B(x)|^2

2192: ,

2193: \ee

2194: $\omega_B(x)$ defined in Eq.~(\ref{filtered-diff-ng}),

2195: the functional derivative with respect to $B(x)$ becomes

2196: \be

2197: \delta_{B(x)}E_B(x^\prime)

2198:  =

2199: 2 {\bf W}_B^T(x,x^\prime)\, \omega_B(x^\prime)

2200: .

2201: \ee

2202: For a  non--Gaussian prior with energy (\ref{non-quad-err})

2203: an additional derivative of the sigmoid appears.
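For illustration, assuming $E_B(x)$ = $C_B(x)$
with $C_B$ from Eq.~(\ref{cforb}),
the chain rule would give
\[
\delta_{B(x)}E_B(x^\prime)
=
2\,\sigma^\prime\!\left(|\omega_B(x^\prime)|^2-\vartheta_B\right)
{\bf W}_B^T(x,x^\prime)\, \omega_B(x^\prime)
,
\]
i.e., the quadratic result multiplied by the derivative of the sigmoid
evaluated at $|\omega_B(x^\prime)|^2-\vartheta_B$.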

2204: Now all terms can be collected and inserted into the

2205: functional derivative

2206: of the prior (\ref{non-gauss-prior})

2207: \bea

2208: \delta_{v} \ln p(v)

2209: &=&

2210: -\int\! dx \,

2211: \Big(

2212: \left[[1-B(x)] \omega_1(x) + B(x) \omega_2(x)

2213: \right]

2214: \nonumber\\&&

2215: \qquad\times

2216: \big(

2217: [1-B(x)] \delta_v\omega_1(x)

2218: + B(x) \delta_v\omega_2(x)

2219: \nonumber\\&&

2220: \qquad\qquad

2221: +\;\delta_v B(x) [\omega_2(x)-\omega_1(x)]

2222: \big)

2223: \nonumber\\&&

2224: \qquad+\;\delta_v E_B(x) \Big)

2225: .

2226: \eea

2227:

2228: The Bayesian approach to inverse quantum mechanics

2229: is not restricted to position measurements,

2230: but can deal with all kinds of observations

2231: for which the likelihood can be calculated.

2232: To obtain better information about the depth of a potential,

2233: it is useful to include information on the

2234: ground state energy of a system.

2235: For instance,

2236: including a noisy  measurement of the average energy

2237: \be

2238: U

2239: = \av{E}

2240: =

2241: \sum_\alpha p_\alpha E_\alpha

2242: ,

2243: \ee

2244: yields an additional factor in the posterior of the form

2245: \be

2246: p_U \propto e^{-E_U}

2247: ,\quad

2248: E_U =

2249: \frac{\mu}{2} (U - \kappa)^2

2250: \label{averageE-penal}

2251: .

2252: \ee

2253: In the noise free limit

2254: $\mu\rightarrow\infty$

2255: this yields $U\rightarrow\kappa$.

2256:

2257: Calculating

2258: the functional derivative of $U$

2259: with respect to a local potential

2260: \be

2261: \delta_{v(x)} U =

2262: \av{\delta_{v(x)} E}-\beta \av{E\; \delta_{v(x)} E}

2263: + \beta \av{E} \av{\delta_{v(x)} E}

2264: ,

2265: \ee

2266: it is straightforward to obtain

2267: \be

2268: \delta_{v(x)} E_U

2269: =

2270:   \mu\left(U\!-\!\kappa\right)

2271: \av{|\phi (x)|^2\left[1-\beta \left( E  -U \right) \right]}

2272: .

2273: \ee
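The intermediate steps are short.
By the chain rule
$\delta_{v(x)}E_U$ = $\mu(U-\kappa)\,\delta_{v(x)}U$,
and inserting
$\delta_{v(x)}E_\alpha$ = $|\phi_\alpha(x)|^2$
from Eq.~(\ref{deltaE-nonp}) gives
\[
\delta_{v(x)} U
=
\av{|\phi(x)|^2}
- \beta\av{E\,|\phi(x)|^2}
+ \beta\, U \av{|\phi(x)|^2}
=
\av{|\phi (x)|^2\left[1-\beta \left( E  -U \right) \right]}
,
\]
where $U$ = $\av{E}$ is a number
and can therefore be moved inside the thermal average.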

2274:

2275:

2276: Stationarity equations are typically nonlinear

2277: and have to be solved by iteration.

2278: A possible iteration scheme is

2279: \bea

2280: v^{(r+1)}

2281: &=&

2282: v^{(r)}\! +

2283: \eta {\bf A}^{-1}

2284: \Big[\delta_v \ln p(v^{(r)})

2285: %{\bf K}_0^{(r)} (v_0\!-\!v^{(r)})

2286: \label{iter1}

2287: \nonumber\\&&

2288: \quad +\sum_i \delta_{v}\ln p(x_i|\hat x,v^{(r)})

2289: -\delta_{v} E_U^{(r)}

2290: \Big]

2291: .

2292: \label{iteration}

2293: \eea

2294: Here $\eta$ is a step width which can be optimized

2295: by a line search algorithm

2296: and

2297: the positive definite operator ${\bf A}$

2298: distinguishes different learning algorithms.
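To illustrate the role of ${\bf A}$,
two simple choices may be considered;
they are meant as examples and not as the specific settings
used for the calculations reported here.
For ${\bf A}$ = ${\bf I}$
the scheme (\ref{iteration}) is a plain gradient ascent
on the log--posterior.
For the Gaussian prior (\ref{gaussprior}) with invertible ${\bf K}_0$,
inserting Eq.~(\ref{prior-dev})
and choosing ${\bf A}$ = ${\bf K}_0$ and $\eta$ = $1$ gives
\[
v^{(r+1)}
=
v_0 + {\bf K}_0^{-1}
\Big[\sum_i \delta_{v}\ln p(x_i|\hat x,v^{(r)})
-\delta_{v} E_U^{(r)}
\Big]
,
\]
so that the prior part of the stationarity equation
is solved exactly in every step.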

2299:

2300:

2301:

2302: \section{Numerical examples}

2303: \label{numerical}

2304:

2305: As a numerical application of BIQM

2306: and to test several variants

2307: of implementing {\it a priori} information

2308: we will study the reconstruction of an approximately

2309: periodic, one--di\-mensional potential.

2310: Such a potential may represent a one--dimensional surface

2311: where a periodic structure,

2312: e.g. that of a regular crystal,

2313: is distorted by impurities,

2314: located at unknown positions and of unknown form.

2315:

2316: To test the quality of reconstruction algorithms,

2317: artificial data will be sampled

2318: from a model with known ``true'' potential $v_{\rm true}$.

2319: Selecting a specific prior model

2320: and applying the corresponding Bayesian reconstruction algorithm

2321: to the sampled data,

2322: we will be able to compare the reconstructed

2323: potential with the original one.

2324: In particular, we will take as true potential

2325: the following perturbed periodic potential

2326: \be

2327: v_{\rm true}(x) =

2328: \left\{

2329: {

2330: \sin \left( \frac{2\pi}{6}x\right); \quad 1\le x\le12,\,\;  25\le x\le 36,

2331: \atop

2332: \!\!\!\!\!\!

2333: \!\!\!\!\!\!\!\!\!\!\!\!

2334: \!\!\!\!\!\!\!\!\!\!\!\!

2335: \!\!\!\!

2336: \sin \left( \frac{2\pi}{12}x\right); \quad 13\le x\le 24,

2337: }

2338: \right.

2339: \ee

2340: using for the numerical calculations a mesh

2341: of size 36.

2342: Considering a system prepared as canonical ensemble

2343: the potential $v_{\rm true}$

2344: defines a corresponding canonical density operator $\rho$

2345: as given in Eq.~(\ref{canonical}).

2346: Artificial data $D$ can then be sampled

2347: according to the likelihood model

2348: of quantum mechanics (\ref{qm-likelihood}).

2349: For the following examples, $n$ = 200 data points representing

2350: position measurements have been sampled

2351: using the transformation method

2352: \cite{Press-Teukolsky-Vetterling-Flannery-1992}.

2353: In all calculations we used periodic boundary conditions

2354: for quantum mechanical wave functions

2355: while the potential $v$ has been set to zero at the boundaries.

2356:

2357:

2358: We will now discuss the results of a Bayesian reconstruction

2359: under varying prior models.

2360: As a first example, consider a simple Gaussian prior

2361: (\ref{gaussprior})

2362: with negative Laplacian inverse covariance

2363: ${\bf K}_0$ = $-\lambda \Delta$,

2364: zero reference potential $v_0\equiv 0$,

2365: and an additional prior factor (\ref{averageE-penal})

2366: representing a noisy measurement of the average energy.

2367: The reconstruction results

2368: %ed potential $v_{\rm BIQM}$

2369: %and the corresponding

2370: %reconstructed likelihood $p(x|\hat x, v_{\rm BIQM})$

2371: are shown in Fig.~\ref{p162}.

2372: In particular, the figure on top compares

2373: the reconstructed likelihood

2374: $p_{\rm BIQM}(x|\hat x,v_{\rm BIQM})$

2375: with the true likelihood

2376: $p_{\rm true}(x|\hat x,v_{\rm true})$

2377: and with the empirical density, i.e.,

2378: the relative frequencies of the sampled data

2379: \be

2380: p_{\rm emp}(x) =

2381: \frac{1}{n} \sum_{i=1}^n \delta(x-x_i)

2382: .

2383: \ee

2384: Similarly, the lower figure compares

2385: the reconstructed  potential $v_{\rm BIQM}$

2386: with the true potential $v_{\rm true}$.

2387: Since information on the average energy

2388: was available, the depth of the potential

2389: is well approximated at least at one of its minima.

2390: This is sufficient to fulfill the noisy average energy condition.

2391: However, because only smoothness and no

2392: periodicity information is implemented by the prior,

2393: the reconstructed potential is too flat.

2394: The effect is stronger near the maxima

2395: than near the minima of the potential

2396: because near the maxima only few data points are available

2397: and hence the reconstructed potential is dominated there by the zero reference

2398: potential in the smoothness prior.

2399:

2400: To include information on approximate periodicity

2401: we have replaced in the next example

2402: the zero reference potential $v_0\equiv 0$

2403: by the strictly periodic reference potential

2404: \be

2405: v_0(x) = \sin \left( \frac{2\pi}{6}x\right)

2406: ,

2407: \label{per-ref-prior}

2408: \ee

2409: shown as dashed line

2410: in the following figures of potentials.

2411: A reconstruction

2412: with the periodic reference potential (\ref{per-ref-prior})

2413: but without average energy information,

2414: and starting the iteration with the reference potential as

2415: initial guess $v^{(0)}$ = $v_0$

2416: is shown in Fig.~\ref{p19}.

2417: Due to missing average energy information

2418: the depth of the potential is not well approximated.

2419: It is also clearly visible in Fig.~\ref{p19}

2420: that the smoothness prior

2421: does not favor solutions which are

2422: similar to the reference $v_0$ itself

2423: but solutions

2424: which have derivatives similar to that of $v_0$.

2425: Fig.~\ref{p19} also displays

2426: that the reconstruction of the potential does clearly identify

2427: the impurity.

2428: As the reference potential is not adapted

2429: to the impurity region, the reconstruction is poorer there

2430: than in the regular region.

2431:

2432: Furthermore, it is worth emphasizing

2433: that the reconstructed likelihood

2434: fits the empirical density well,

2435: even slightly better than the true likelihood does.

2436: This is due to the flexibility of a nonparametric approach

2437: which allows one to fit the fluctuations of the empirical density

2438: caused by the finite sample size.

2439: The effect is well known in empirical learning and

2440: leads to so--called ``overfitting''

2441: if the influence of the prior becomes too small.

2442: Since observational data

2443: influence the reconstruction only through the likelihood,

2444: the reconstruction of potentials is in general

2445: a more difficult task than the reconstruction of likelihoods.

2446: This indicates the special

2447: importance of {\it a priori} information

2448: when reconstructing potentials.

2449: Indeed, even if the complete likelihood is given,

2450: the problem of determining the potential

2451: can still be ill--defined

2452: in regions where the likelihood is small

2453: \cite{Zhu-Rabitz-1999}.

2454:

2455: A prior model with periodic reference potential

2456: can be made more flexible

2457: by adapting

2458: amplitude, frequency, and phase

2459: of the reference potential (\ref{per-ref-prior}).

2460: For this purpose one can introduce a hyperparameter vector

2461: $\theta$ = $(\theta_1,\theta_2,\theta_3)$

2462: parameterizing amplitude, frequency, and phase

2463: and take as reference potential

2464: \be

2465: v_0(x;\theta)

2466: = \theta_1 \sin \left( \frac{2\pi}{\theta_2}x+\theta_3\right)

2467: .

2468: \ee

2469: The corresponding maximization of the posterior

2470: with respect to $\theta$ is easy in that case

2471: and does not change the results of Fig.~\ref{p19}

2472: where the hyperparameters are already optimally adapted.

2473:

2474:

2475: With an additional noisy energy measurement

2476: (\ref{averageE-penal}) included,

2477: Fig.~\ref{p22} shows that

2478: the depth of the

2479: potential is indeed better approximated

2480: than in Fig.~\ref{p19}.

2481: To avoid local maxima of the posterior

2482: the solution of Fig.~\ref{p19}

2483: has been used as initial guess

2484: and the factor $\mu$ multiplying the average energy term

2485: has been slowly increased to its final value.

2486: Fig.~\ref{p22} still only represents a

2487: local and not a global maximum of the posterior,

2488: as can be seen by starting

2489: with a different initial guess $v^{(0)}$.

2490: In Fig.~\ref{p155}

2491: a better solution for the same parameters

2492: is presented

2493: where the initial guess has been selected

2494: using {\it a priori} information

2495: about the location of the impurity region.

2496:

2497:

2498: As an alternative to a Gaussian prior with periodic reference,

2499: approximate periodicity can be enforced by

2500: the inverse covariance of a Gaussian prior.

2501: In this case the prior

2502: favors periodicity but no special form of the potential.

2503: The prior is thus less specific

2504: than a prior with explicit periodic reference function.

2505: Corresponding BIQM results

2506: for the inverse covariance (\ref{periodic-cov})

2507: are shown in

2508: Fig.~\ref{p31}.

2509: Indeed, while the potential is well approximated

2510: in regions where many observations have been collected,

2511: it is not as well approximated in regions where no or only few data

2512: are available.

2513: These are the regions where the prior dominates the observational data.

2514: In particular, in the case presented in Fig.~\ref{p31},

2515: the zero reference function $v_0\equiv 0$

2516: of an additional Laplacian smoothness prior

2517: implements a tendency to flat potentials.

2518:

2519: If impurities are expected,

2520: a prior with one fixed periodic reference potential

2521: for the whole region is not an adequate choice.

2522: Near impurities one would like to

2523: switch off the standard periodic reference potential

2524: which in these regions will be misleading.

2525: Because it is usually not known in advance

2526: where a given reference should be used and where not,

2527: those regions  must be identified

2528: during learning.

2529: As a first example we study a prior energy

2530: similar to Eq.~(\ref{local+laplace}),

2531: \bea

2532: E(v)

2533: &=&

2534: \frac{\lambda_1}{2}

2535: \int\!dx\,

2536: |v(x)-v_0(x)|^2 [1-B(x)]

2537:  -\frac{\lambda_2}{2}\mel{v}{\Delta}{v},

2538: \nonumber\\&&

2539: \label{eq1}

2540: \eea

2541: which allows one to switch off a given reference locally

2542: by means of a binary switching function

2543: defined as

2544: $B(x)$ = $\Theta\left(|v(x)-v_0(x)|^2-\vartheta\right)$.

2545: (An average energy term

2546: $E_U$ = $\frac{\mu}{2}(U-\kappa)^2$

2547: could easily be included.)

2548: In the prior energy (\ref{eq1}) the reference $v_0$ is only used

2549: if $|v(x)-v_0(x)|^2$ is smaller than the given threshold $\vartheta$.

2550: Starting with a smoothed version of Eq.~(\ref{eq1})

2551: with a real mixing function

2552: $B(x)$ = $\sigma\left(|v(x)-v_0(x)|^2-\vartheta\right)$,

2553: the results of Fig.~\ref{p102}

2554: have been obtained by changing during iteration

2555: $\sigma(x)$ slowly from a sigmoid to a step function.

2556: Using a step function for $B$ directly from the beginning

2557: leads to nearly indistinguishable results.

2558: Compared to Fig.~\ref{p31}

2559: the reconstruction in Fig.\ \ref{p102}

2560: is improved mainly in the unperturbed region

2561: where the algorithm can now use the correct reference potential.

2562: An additional advantage is

2563: that the final auxiliary field $B(x)$

2564: directly shows the identified impurity regions.

2565: One sees in  Fig.~\ref{p102}

2566: that the auxiliary field $B(x)$ is always switched off

2567: if the solution $v(x)$ is similar enough to the template $v_0(x)$.

2568:

2569: The two $v$--dependent terms in

2570: Eq.~(\ref{eq1}) can be combined

2571: [compare Eqs.~(\ref{local+laplace}) and (\ref{local+laplace2})].

2572: Skipping a term which only depends

2573: on $v$ through $B(x)$,

2574: one arrives at another

2575: prior which also implements local switching.

2576: More generally, choosing the prior energy (\ref{omega-B-energy})

2577: for switching between

2578: two filtered differences

2579: with two reference potentials  $v_1$ and $v_2$

2580: leads to

2581: \bea

2582: E(v)

2583: &=&

2584:  \frac{\lambda_1}{2}

2585: \int\!dx\,

2586: [1-B(x)] |\omega_1(x)|^2

2587: \nonumber\\&&

2588: +

2589: \frac{\lambda_2}{2}

2590: \int\!dx\,

2591: B(x) |\omega_2(x)|^2

2592: %+\frac{\mu}{2}(U-\kappa)^2

2593: \label{eq2}

2594: ,

2595: \eea

2596: where the switching is controlled by

2597: the binary function $B(x)$

2598: =

2599: $\Theta\left(|\omega_1(x)|^2-|\omega_2(x)|^2 -\vartheta\right)$

2600: defined in terms of the filtered differences

2601: $\omega_i(x)$ = $(\partial/\partial x) [v(x)-v_i(x)]$.

2602: A prior energy (\ref{eq2})

2603: with two different nonzero reference potentials

2604: $v_1$ and $v_2$ is obtained, for example,

2605: when a different nonzero reference potential is given

2606: for the unperturbed and the perturbed region.

2607: The number of changes

2608: in the switching function

2609: $B(x)$ = $\Theta\left(|\omega_1(x)|^2-|\omega_2(x)|^2-\vartheta\right)$,

2610: can be controlled

2611: by adding a prior term $p(B)$

2612: penalizing the number of times the function $B(x)$

2613: changes its value.

2614: To avoid local minima for binary $B(x)$,

2615: simulated annealing techniques

2616: are useful.

2617: We have obtained an initial guess for $v$,

2618: and thus for $B(x)$,

2619: by writing $v(x)$ = $[1-c(x)]v_1(x)+c(x)v_2(x)$

2620: and optimizing the binary function $c(x)$

2621: by simulated annealing

2622: with respect to the likelihood and the additional prior $p(B)$.

2623: In particular, starting from $c(x)$ = $0$,

2624: new trial functions have been generated

2625: by selecting two points $x_1$, $x_2$ randomly

2626: and exchanging the function values zero and one

2627: in between (see Fig.~\ref{trial}).

2628: A new trial function has been accepted or rejected

2629: using the Metropolis rule

2630: $p({\rm accept})$ = $\min[1,\exp({-\beta_{\rm ann} \Delta E_{\rm ann}})]$

2631: with $\Delta E_{\rm ann}$ denoting the difference in the error

2632: between actual function and new trial function.

2633: In the present case we have

2634: $E_{\rm ann}(v)$ = $\sum_{i} E(x_i|\hat x,v)$ + $E_B(v)$

2635: where

2636: $E(x_i|\hat x,v)$ = $- \ln p(x_i|\hat x,v)$ and

2637: $p(B)\propto \exp(-E_B)$.

2638: %$E_B$ = $-\ln p(B)$.

2639: The annealing temperature $1/\beta_{\rm ann}$

2640: decreases during optimization.
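As a purely illustrative example with hypothetical numbers,
consider a trial function which increases the error by
$\Delta E_{\rm ann}$ = $1$.
The Metropolis rule then gives
\[
p({\rm accept}) = e^{-1} \approx 0.37
\quad (\beta_{\rm ann} = 1)
,\qquad
p({\rm accept}) = e^{-5} \approx 0.007
\quad (\beta_{\rm ann} = 5)
,
\]
while any trial function with $\Delta E_{\rm ann}\le 0$ is always accepted.
Lowering the temperature thus gradually turns the random search
into a descent method.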

2641:

2642: Fig.~\ref{p75}

2643: shows the reconstruction results

2644: using the following two reference potentials

2645: \bea

2646: v_1(x) &=& \frac{2}{3} \sin \left( \frac{2\pi}{6}x\right)

2647: ,

2648: \label{two-ref-potentialsA}

2649: \\

2650: v_2(x) &=& \sin^2 \left( \frac{2\pi}{6}x\right)

2651: {\rm sign}\left[\sin \left( \frac{2\pi}{6}x\right)\right]

2652: .

2653: \label{two-ref-potentialsB}

2654: \eea

2655: Compared to Fig.\ \ref{p102}

2656: the reconstruction is improved

2657: in the perturbed region,

2658: where the algorithm can now rely on a useful reference potential.

2659:

2660: Finally, the switching function can be introduced

2661: as a local hyperfield.

2662: As an example for a prior with hyperfield,

2663: Fig.~\ref{p120} shows the reconstruction

2664: with the prior energy

2665: \be

2666: E(v,\theta)

2667: =

2668: \frac{\lambda_1}{2}\scp{v-v_0(\theta)}{v-v_0(\theta)}

2669: -\frac{\lambda_2}{2}\mel{v}{\Delta}{v}

2670: %\nonumber\\&&

2671: %+\frac{\mu}{2}(U-\kappa)^2

2672: -\ln p(\theta)

2673: \label{eq3}

2674: ,

2675: \ee

2676: where

2677: $v_0(x;\theta)$ = $v_1(x)[1-\theta(x)]+v_2(x)\theta(x)$

2678: with the reference potentials of

2679: Eq.~(\ref{two-ref-potentialsA})

2680: and Eq.~(\ref{two-ref-potentialsB}).

2681: A hyperprior $p(\theta)$ has been used

2682: penalizing the number of discontinuities of the hyperfield $\theta(x)$,

2683: analogous to $p(B)$ for Fig.~\ref{p75}.

2684: The $E(v|\theta)$ part of the prior energy (\ref{eq3})

2685: is of the form (\ref{local-hyper-p}) with $\theta$--independent

2686: covariances. Hence the $\theta$--independent normalization factor

2687: can be skipped.

2688: An initial guess

2689: for the  local hyperfield $\theta (x)$ has been obtained

2690: by simulated annealing as described for Fig.~\ref{p75}.

2691: As in this case optimization is required only

2692: with respect to the $\theta$--dependent parts of the posterior,

2693: optimizing $\theta(x)$ for given $v$

2694: is faster than optimizing $v$ through $c(x)$

2695: which requires diagonalization of the Hamiltonian $H$

2696: for every new trial function.

2697: However, as $\theta(x)$ is independent of $v$, the hyperfield

2698: has to be updated during iteration, which has also been done

2699: by simulated annealing.

2700: As expected, a reconstruction with the non--Gaussian prior

2701: corresponding to the prior energy (\ref{eq2})

2702: is very similar to a reconstruction

2703: using hyperfields as in  Eq.~(\ref{eq3}).
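The $\theta$--dependent part of the posterior referred to above
can be made explicit.
As a sketch, assuming that $\scp{f}{g}$ = $\int\!dx\, f(x)g(x)$
is the usual real inner product
and writing $E_\theta$ for the relevant terms (a label introduced here),
only
\be
E_\theta(\theta|v)
=
\frac{\lambda_1}{2}\scp{v-v_0(\theta)}{v-v_0(\theta)}
-\ln p(\theta)
\ee
enters the annealing of $\theta(x)$ for given $v$,
because neither the likelihood nor the Laplacian term
of Eq.~(\ref{eq3}) depends on $\theta$.
Since $v_0(x;\theta)$ depends only on the local value $\theta(x)$,
a trial hyperfield $\theta^\prime$ which differs from $\theta$
only on an interval $I$ changes the first term by
\be
\frac{\lambda_1}{2}\int_I \! dx\,
\Big\{ [v(x)-v_0(x;\theta^\prime)]^2 - [v(x)-v_0(x;\theta)]^2 \Big\}
,
\ee
which can be evaluated without diagonalizing $H$ again.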

2704:

2705:

2706: \begin{figure}[ht]

2707: \begin{center}

2708: \setlength{\unitlength}{0.9mm}

2709: \hspace{3cm}

2710: \begin{picture}(90,20)

2711: \thicklines

2712: %lines

2713: \put(0,0){\line(1,0){40}}

2714: %

2715: \put(50,0){\line(1,0){10}}

2716: \put(80,0){\line(1,0){10}}

2717: \put(60,0){\line(0,1){10}}

2718: \put(80,0){\line(0,1){10}}

2719: \put(60,10){\line(1,0){20}}

2720: % points

2721: \put(10,0){\circle*{1.2}}

2722: \put(30,0){\circle*{1.2}}

2723: \put(60,0){\circle*{1.2}}

2724: \put(80,0){\circle*{1.2}}

2725: % vector

2726: \put(41,5){\vector(1,0){8}}

2727: \put(49,5){\vector(-1,0){8}}

2728: \end{picture}

2729: \end{center}

2730:

2731:

2732: \begin{center}

2733: \setlength{\unitlength}{0.9mm}

2734: \begin{picture}(90,20)

2735: \thicklines

2736: % lines

2737: \put(0,0){\line(1,0){10}}

2738: \put(10,0){\line(0,1){10}}

2739: \put(10,10){\line(1,0){30}}

2740: %

2741: \put(50,0){\line(1,0){30}}

2742: \put(80,0){\line(0,1){10}}

2743: \put(80,10){\line(1,0){10}}

2744: % points

2745: \put(10,0){\circle*{1.2}}

2746: \put(30,0){\circle*{1.2}}

2747: \put(60,0){\circle*{1.2}}

2748: \put(80,0){\circle*{1.2}}

2749: % vector

2750: \put(41,5){\vector(1,0){8}}

2751: \put(49,5){\vector(-1,0){8}}

2752: \end{picture}

2753: \end{center}

2754:

2755:

2756: \begin{center}

2757: \setlength{\unitlength}{0.9mm}

2758: \begin{picture}(90,20)

2759: \thicklines

2760: % lines

2761: \put(0,0){\line(1,0){20}}

2762: \put(20,0){\line(0,1){10}}

2763: \put(20,10){\line(1,0){20}}

2764: %

2765: \put(50,0){\line(1,0){10}}

2766: \put(60,0){\line(0,1){10}}

2767: \put(60,10){\line(1,0){10}}

2768: \put(70,0){\line(0,1){10}}

2769: \put(70,0){\line(1,0){10}}

2770: \put(80,0){\line(0,1){10}}

2771: \put(80,10){\line(1,0){10}}

2772: % points

2773: \put(10,0){\circle*{1.2}}

2774: \put(30,0){\circle*{1.2}}

2775: \put(60,0){\circle*{1.2}}

2776: \put(80,0){\circle*{1.2}}

2777: % vector

2778: \put(41,5){\vector(1,0){8}}

2779: \put(49,5){\vector(-1,0){8}}

2780: \end{picture}

2781: \end{center}

2782: \caption{Generation of new trial configurations

2783: for simulated annealing

2784: by selecting two points randomly

2785: and exchanging the values zero and one of the binary function in between.

2786: This mechanism has been used to optimize the binary functions

2787: $c(x)$ and $\theta(x)$.}

2788: \label{trial}

2789: \end{figure}

2790:

2791:

2792: \section{Conclusion}

2793:

2794: A nonparametric Bayesian approach

2795: has been developed and applied

2796: to the inverse problem of reconstructing potentials

2797: of quantum systems from observational data.

2798: Relying on observational data only,

2799: the problem is typically ill--defined.

2800: It is therefore essential

2801: to include adequate {\it a priori} information.

2802: Since reconstructed potentials

2803: obtained by Bayesian Inverse Quantum Mechanics (BIQM)

2804: depend sensitively on the implemented {\it a priori} information,

2805: flexible prior models are required

2806: which can be adapted to the specific situation under study.

2807: In particular, the use of hyperparameters, hyperfields,

2808: and non--Gaussian priors with auxiliary fields

2809: has been discussed in detail.

2810: In this paper we have focussed on

2811: the implementation of approximate periodicity

2812: for potentials in inverse problems of quantum statistics.

2813: The presented prior models, however, can be useful

2814: for many empirical learning problems,

2815: including for example regression or general density estimation.

2816: Several variants of implementing  {\it a priori} information

2817: on approximate periodicity

2818: have been tested and compared numerically.

2819:

2820:

2821:

2822: %\subsection*{Acknowledgments}

2823:

2824: \begin{thebibliography}{00}

2825:

2826: \bibitem{Tikhonov-Arsenin-1977}

2827: A.N. Tikhonov,

2828: V. Arsenin,

2829: {\it Solution of Ill--posed Problems.}

2830: (New York: Wiley, 1977).

2831:

2832: \bibitem{Kirsch-1996}

2833: A. Kirsch,

2834: {\it An Introduction to the Mathematical Theory of Inverse Problems.}

2835: (New York: Springer Verlag, 1996).

2836:

2837: \bibitem{Vapnik-1998}

2838: V.N. Vapnik,

2839: {\it Statistical Learning Theory.}

2840: (New York: Wiley, 1998).

2841:

2842: \bibitem{Honerkamp-1998}

2843: J. Honerkamp,

2844: {\it Statistical Physics.}

2845: (New York: Springer Verlag, 1998).

2846:

2847: \bibitem{Newton-1989}

2848: R.G. Newton,

2849: {\it Inverse Schr\"odinger Scattering in Three Dimensions.}

2850: (New York: Springer Verlag, 1989).

2851:

2852: \bibitem{Chadan-Sabatier-1989}

2853: K. Chadan,

2854: P.C. Sabatier,

2855: {\it Inverse Problems in Quantum Scattering Theory.}

2856: (Berlin: Springer Verlag, 1989).

2857:

2858:

2859: \bibitem{Chadan-Colton-Paivarinta-Rundell-1997}

2860: K. Chadan,

2861: D. Colton,

2862: L. P\"aiv\"arinta,

2863: W. Rundell,

2864: {\it An Introduction to Inverse Scattering and Inverse Spectral Problems.}

2865: (Philadelphia: SIAM, 1997).

2866:

2867: \bibitem{Gelfand-Levitan-1951}

2868: I.M. Gel'fand,

2869: B.M. Levitan,

2870: %On the determination of a differential equation from its spectral function.

2871: {Trans.\ Amer.\ Soc.\ } {\bf 1}, 253--302 (1951).

2872:

2873: \bibitem{Kac-1966}

2874: M. Kac,

2875: Can one hear the shape of a drum?

2876: {Am.\ Math.\ Mon.} {\bf 73}, 1--23 (1966).

2877:

2878: \bibitem{Marchenko-1986}

2879: V.A. Marchenko,

2880: {\it Sturm--Liouville Operators and Applications.}

2881: (Basel: Birk\-h\"auser, 1986).

2882:

2883: \bibitem{Zakhariev-Chabanov-1997}

2884: B.N. Zakhariev,

2885: V.M. Chabanov,

2886: %New situation in quantum mechanics.

2887: {Inverse Problems} {\bf 13}, R47--R79 (1997).

2888:

2889: \bibitem{Lemm-IQS-2000}

2890: J.C. Lemm,

2891: J. Uhlig,

2892: A. Weiguny,

2893: %Bayesian Approach to Inverse Quantum Statistics.

2894: %Technical Report, MS-TP1-99-6, M\"unster University,

2895: {Phys. Rev. Lett.} {\bf 84}, 2068 (2000).

2896:

2897: \bibitem{Lemm-BFT-1999}

2898: J.C. Lemm,

2899: {\it Bayesian Field Theory.}

2900: Technical Report No.~MS-TP1-99-1, Univ.\ of M\"unster,

2901: {\tt arXiv:physics/9912005}, (1999).

2902:

2903: \bibitem{Lemm-TDQ-2000}

2904: J.C. Lemm,

2905: {\it Inverse Time--Dependent Quantum Mechanics.}

2906: Technical Report, MS-TP1-00-1, M\"unster University,\\

2907: {\tt arXiv:quant-ph/0002010}, (2000).

2908:

2909: \bibitem{Lemm-IHF-2000}

2910: J.C. Lemm,

2911: J. Uhlig,

2912: {Phys. Rev. Lett.} {\bf 84}, 4517 (2000).

2913: %{\it Hartree-Fock Approximation for Inverse Many-Body Problems.}

2914: %Technical Report, MS-TP1-99-10, M\"unster University,

2915: %{\tt arXiv:nucl-th/9908056}, (1999).

2916:

2917: \bibitem{Helstrom:1976}

2918: C.W. Helstrom,

2919: {\it Quantum Detection and Estimation Theory.}

2920: (New York: Academic Press, 1976).

2921:

2922: \bibitem{Holevo:1982}

2923: A.S. Holevo,

2924: {\it Probabilistic and Statistical Aspects of Quantum Theory.}

2925: (Amsterdam: North--Holland, 1982).

2926:

2927: \bibitem{Tan:1997}

2928: M. Tan,

2929: %An inverse problem approach to optical homodyne tomography.

2930: {J. Mod. Opt.} {\bf 44}, 2233 (1997).

2931:

2932: \bibitem{Buzek-Drobny-Derka-Adam-Wiedemann:1998}

2933: V. Bu\v{z}ek, G. Drobn\'y, R. Derka, G. Adam, H. Wiedemann,

2934: {\tt arXiv:quant-ph/9805020}.

2935: %Bu\~zek, V., Drobn\'y, G., Derka, R., Adam, G., Wiedemann, H. (1998)

2936: %{\tt arXiv:quant-ph/9805020}.

2937:

2938: \bibitem{Bayes-1763}

2939: T.R. Bayes,

2940: %(1763)

2941: %An Essay Towards Solving a Problem in the Doctrine of Chances.

2942: {Phil. Trans. Roy. Soc. London} {\bf 53}, 370 (1763),

2943: reprinted in {\it Biometrika} {\bf 45}, 293 (1958).

2944:

2945: \bibitem{Berger-1980}

2946: J.O. Berger,

2947: {\it Statistical Decision Theory and Bayesian Analysis.}

2948: (New York: Springer Verlag, 1980).

2949:

2950:

2951: \bibitem{Loredo-1990}

2952: T. Loredo,

2953: {\it From Laplace to Supernova SN 1987A: Bayesian Inference in Astrophysics.}

2954: In Foug\`ere, P.F. (ed.)

2955: {\it Maximum-Entropy and Bayesian Methods, Dartmouth, 1989},  81--142.

2956: (Dordrecht: Kluwer, 1990),

2957: available at {\tt http://bayes.wustl.edu/gregory/gregory.html}.

2958:

2959: \bibitem{Bernado-Smith-1994}

2960: J.M. Bernardo,

2961: A.F.M. Smith,

2962: {\it Bayesian Theory.}

2963: (New York: John Wiley, 1994).

2964:

2965:

2966: \bibitem{Gelman-Carlin-Stern-Rubin-1995}

2967: A. Gelman,

2968: J.B. Carlin,

2969: H.S. Stern,

2970: D.B. Rubin,

2971: {\it Bayesian Data Analysis.}

2972: (New York: Chapman \& Hall, 1995).

2973:

2974: \bibitem{Sivia-1996}

2975: D.S. Sivia,

2976: {\it Data Analysis: A Bayesian Tutorial.}

2977: (Oxford: Oxford University Press, 1996).

2978:

2979: \bibitem{Carlin-Louis-1996}

2980: B.P. Carlin,

2981: T.A. Louis,

2982: {\it Bayes and Empirical Bayes Methods for Data Analysis.}

2983: (Boca Raton: Chapman \& Hall/CRC, 1996).

2984:

2985:

2986: \bibitem{Metropolis-Rosenbluth-Rosenbluth-Teller-Teller-1953}

2987: N. Metropolis,

2988: A.W. Rosenbluth,

2989: M.N. Rosenbluth,

2990: A.H. Teller,

2991: E. Teller,

2992: %Equation of state calculations by fast computing machines.

2993: {Journal of Chemical Physics} {\bf 21}, 1087--1092, (1953).

2994:

2995: \bibitem{Binder-Heermann-1988}

2996: K. Binder,

2997: D.W. Heermann,

2998: {\it Monte Carlo simulation in statistical physics: an introduction.}

2999: (Berlin: Springer Verlag, 1988).

3000:

3001: \bibitem{Neal-1997}

3002: R.M. Neal,

3003: {\it Monte Carlo Implementation of Gaussian Process Models

3004: for Bayesian Regression and Classification.}

3005: Technical Report No. 9702, Dept.\ of Statistics,

3006: Univ.\ of Toronto, Canada (1997).

3007:

3008:

3009: \bibitem{De-Bruijn-1981}

3010: N.G. De Bruijn,

3011: {\it Asymptotic Methods in Analysis.}

3012: (New York: Dover, 1981),

3013: originally published in 1958

3014: by the North--Holland Publishing Co., Amsterdam.

3015:

3016: \bibitem{Bleistein-Handelsman-1986}

3017: N. Bleistein, R.A. Handelsman,

3018: {\it Asymptotic Expansions of Integrals.}

3019: (New York: Dover 1986),

3020: originally published in 1975

3021: by Holt, Rinehart and Winston, New York.

3022:

3023: \bibitem{Girosi-Jones-Poggio-1995}

3024: F. Girosi,

3025: M. Jones,

3026: T. Poggio,

3027: %Regularization Theory and Neural Networks Architectures.

3028: {Neural Computation} {\bf 7} (2), 219--269 (1995).

3029:

3030: \bibitem{Lemm-1996}

3031: J.C. Lemm,

3032: {\it Prior Information and Generalized Questions.}

3033: A.I.Memo No. 1598, C.B.C.L. Paper No. 141,

3034: Massachusetts Institute of Technology, (1996),

3035: available at {\tt http://pauli.uni-muenster.de/${}^\sim$lemm}.

3036:

3037: \bibitem{Lemm-1998}

3038: J.C. Lemm,

3039: {\it How to Implement A Priori Information:

3040: A Statistical Mechanics Approach.}

3041: Technical Report MS-TP1-98-12, M\"unster University,

3042: {\tt arXiv:cond-mat/9808039} (1998).

3043:

3044: \bibitem{Bishop-1995b}

3045: C.M. Bishop,

3046: {\it Neural Networks for Pattern Recognition.}

3047: (Oxford: Oxford University Press, 1995).

3048:

3049: \bibitem{lemm-mixture-1999}

3050: J.C. Lemm,

3051: {\it Mixtures of Gaussian Process Priors.}

3052: In Proceedings of ICANN 99

3053: IEEE Conference Publication,

3054: Vol. 1, pp 292--297

3055: (London, IEEE, 1999).

3056:

3057:

3058: \bibitem{Holland-1975}

3059: J.H. Holland,

3060: {\it Adaptation in Natural and Artificial Systems.}

3061: (University of Michigan Press, 1975),

3062: 2nd ed. MIT Press, 1992.

3063:

3064: \bibitem{Goldberg-1989}

3065: D.E. Goldberg,

3066: {\it Genetic Algorithms in Search, Optimization, and Machine Learning.}

3067: (Redwood City, CA: Addison--Wesley, 1989).

3068:

3069: \bibitem{Michalewicz-1992}

3070: Z. Michalewicz,

3071: {\it Genetic Algorithms + Data Structures = Evolution Programs.}

3072: (Berlin: Springer Verlag, 1992).

3073:

3074:

3075: \bibitem{Schwefel-1995}

3076: H.--P. Schwefel,

3077: {\it Evolution and Optimum Seeking.}

3078: (New York: Wiley, 1995).

3079:

3080: \bibitem{Mitchell-1996}

3081: M. Mitchell,

3082: {\it An Introduction to Genetic Algorithms.}

3083: (Cambridge, MA: MIT Press, 1996).

3084:

3085: \bibitem{Kirkpatrick-Gelatt-Vecchi-1983}

3086: S. Kirkpatrick,

3087: C.D. Gelatt Jr.,

3088: M.P. Vecchi,

3089: %Optimization by Simulated Annealing.

3090: {Science} {\bf 220}, 671--680 (1983).

3091:

3092: \bibitem{Mezard-Parisi-Virasoro-1987}

3093: M. M\'ezard,

3094: G. Parisi,

3095: M.A. Virasoro,

3096: {\it Spin Glass Theory and Beyond.}

3097: (Singapore: World Scientific, 1987).

3098:

3099: \bibitem{Aarts-Korts-1989}

3100: E. Aarts, J. Korst,

3101: {\it Simulated Annealing and Boltzmann Machines.}

3102: (New York: Wiley, 1989).

3103:

3104: \bibitem{Gelfand-Mitter-1991}

3105: S.B. Gelfand,

3106: S.K. Mitter,

3107: %Simulated Annealing Type Algorithms for Multivariate Optimization.

3108: %Volume 6, Number 3, 1991

3109: {Algorithmica} {\bf 6} (3), 419--436 (1991).

3110: %\bibitem{Gelfand-Mitter-1993}

3111: %S.B. Gelfand, S.K. Mitter,

3112: %On Sampling Methods and Annealing Algorithms.

3113: %{Markov Random Fields -- Theory and Applications.}

3114: %(New York: Academic Press, 1993).

3115:

3116: \bibitem{Yuille-Kosowski-1994}

3117: A.L. Yuille,

3118: J.J. Kosowski,

3119: %Statistical Physics Algorithm That Converge.

3120: {Neural Computation} {\bf 6} (3), 341--356 (1994).

3121:

3122: \bibitem{Geman-Geman-1984}

3123: S. Geman,

3124: D. Geman,

3125: Stochastic relaxation,

3126: Gibbs distributions and the Bayesian restoration of images.

3127: {IEEE Trans. on Pattern Analysis and Machine Intelligence}

3128: {\bf 6}, 721--741 (1984),

3129: reprinted in Shafer \& Pearl (eds.)

3130: Readings in Uncertainty Reasoning.

3131: (San Mateo, CA: Morgan Kaufmann, 1990).

3132:

3133: \bibitem{Poggio-Torre-Koch-1985}

3134: T. Poggio,

3135: V. Torre,

3136: C. Koch,

3137: Computational vision and regularization theory.

3138: {Nature} {\bf 317}, 314--319, (1985).

3139:

3140: \bibitem{Marroquin-Mitter-Poggio-1987}

3141: J.L. Marroquin,

3142: S. Mitter,

3143: T. Poggio,

3144: %Probabilistic solution of ill--posed problems in computational vision.

3145: {J. Am. Stat. Assoc.} {\bf 82}, 76--89 (1987).

3146:

3147: \bibitem{Geiger-Girosi-1991}

3148: D. Geiger,

3149: F. Girosi,

3150: %Parallel and Deterministic Algortihms for MRFs: Surface Reconstruction.

3151: {IEEE Trans. on Pattern Analysis and Machine Intelligence}

3152: {\bf 13} (5), 401--412 (1991).

3153:

3154: \bibitem{Zhu-Yuille-1996}

3155: S.C. Zhu,

3156: A.L. Yuille,

3157: %Region Competition: Unifying Snakes, Region Growing,

3158: %and Bayes/MDL for Multiband Image Segmentation.

3159: {IEEE Trans.\ on Pattern Analysis and Machine Intelligence}

3160: {\bf 18} (9), 884--900 (1996).

3161:

3162: \bibitem{Roths-Maier-Friedrich-Marth-Honerkamp-2000}

3163: T. Roths,

3164: D. Maier,

3165: Chr. Friedrich,

3166: M. Marth,

3167: J. Honerkamp,

3168: %Roths, T., Maier, D., Friedrich, C., Marth, M.,Honerkamp, J.(2000)

3169: %Determination of the relaxation time spectrum from dynamic moduli

3170: %using an edge preserving regularization method.

3171: {Rheol. Acta} {\bf 39} (2), 163--173 (2000).

3172:

3173: \bibitem{Winkler-1995}

3174: G. Winkler,

3175: {\it Image Analysis, Random Fields and Dynamic Monte Carlo Methods.}

3176: (Berlin: Springer Verlag, 1995).

3177:

3178:

3179: \bibitem{Zhu-Mumford-1997}

3180: S.C. Zhu,

3181: D. Mumford,

3182: %Prior Learning and Gibbs Reaction--Diffusion.

3183: {IEEE Trans.\ on Pattern Analysis and Machine Intelligence}

3184: {\bf 19} (11), 1236--1250 (1997).

3185:

3186: \bibitem{Zhu-Wu-Mumford-1997}

3187: S.C. Zhu,

3188: Y.N. Wu,

3189: D. Mumford,

3190: %Minimax Entropy principle and Its Application to Texture Modeling.

3191: {\it Neural Computation}, {\bf 9} (8), 1627--1660 (1997).

3192:

3193:

3194: \bibitem{Press-Teukolsky-Vetterling-Flannery-1992}

3195: W.H. Press,

3196: S.A. Teukolsky,

3197: W.T. Vetterling,

3198: B.P. Flannery,

3199: {\it Numerical Recipes in C.}

3200: (Cambridge: Cambridge University Press, 1992).

3201:

3202: \bibitem{Zhu-Rabitz-1999}

3203: W. Zhu,

3204: H. Rabitz,

3205: %Potential surfaces from the inversion

3206: %of time dependent probability density data.

3207: {J. Chem. Phys.} {\bf 111}, 472--480 (1999).

3208:

3209: \end{thebibliography}

3210: %\clearpage

3211:

3212: \begin{figure}

3213: \begin{center}

3214: \epsfig{file=figure2.eps, width= 67mm}

3215: %\epsfig{file=ps/FLDframe162k.eps, width= 67mm}

3216: \epsfig{file=figure3.eps, width= 67mm}

3217: %\epsfig{file=ps/OFVframe162k.eps, width= 67mm}

3218: \end{center}

3219: \caption{

3220: Gaussian prior with Laplacian inverse covariance,

3221: zero reference potential,

3222: and additional noisy energy measurement.

3223: Top: %L.h.s:

3224: Empirical density $p_{\rm emp}$ (bars),

3225: true likelihood $p_{\rm true}$ (thin),

3226: %reference likelihood $p_0$ (dashed),

3227: reconstructed likelihood $p_{\rm BIQM}$ (thick).

3228: Bottom: %R.h.s.:

3229: Reconstructed potential $v_{\rm BIQM}$ (thick)

3230: and true potential $v_{\rm true}$ (thin).

3231: %and the reference potential $v_0$ (dashed) of Eq.~(\ref{per-ref-prior}).

3232: (With 200 data points,

3233: $m$ = 0.25 for $\hbar$ = 1, $\beta$ = 4.

3234: Gaussian prior (\ref{gaussprior})

3235: with inverse covariance ${\bf K}_0$ = $-\lambda \Delta$,

3236: $\lambda$ = 0.2,

3237: zero reference potential $v_0\equiv 0$,

3238: and an additional energy penalty term of the form (\ref{averageE-penal})

3239: with $\mu$ = 1000

3240: and $\kappa$ = $-0.330$,

3241: equal to the true average energy $U(v_{\rm true})$.

3242: The solution has been obtained by iterating according to

3243: Eq. (\ref{iteration}) with ${\bf A}$ = ${\bf K}_0$, starting

3244: with initial guess $v^{(0)} \equiv 0$.

3245: The optimal step width $\eta$ has

3246: been determined for each iteration by a line search algorithm.)

3247: }

3248: \label{p162}

3249: \end{figure}

3250:

3251:

3252: \begin{figure}

3253: \begin{center}

3254: \epsfig{file=figure4.eps, width= 67mm}

3255: %\epsfig{file=ps/FLDframe159k.eps, width= 67mm}

3256: \epsfig{file=figure5.eps, width= 67mm}

3257: %\epsfig{file=ps/OFVframe159k.eps, width= 67mm}

3258: %\epsfig{file=ps/FLDframe19k.eps, width= 67mm}

3259: %\epsfig{file=ps/OFVframe19k.eps, width= 67mm}

3260: \end{center}

3261: \caption{

3262: Gaussian prior with periodic reference potential

3263: without noisy energy measurement.

3264: Top: %L.h.s:

3265: Empirical density $p_{\rm emp}$ (bars),

3266: true likelihood $p_{\rm true}$ (thin),

3267: reconstructed likelihood $p_{\rm BIQM}$ (thick).

3268: Bottom: %R.h.s.:

3269: Reconstructed potential $v_{\rm BIQM}$ (thick),

3270: true potential $v_{\rm true}$ (thin),

3271: and reference potential $v_0$ (dashed)

3272: of Eq.~(\ref{per-ref-prior}).

3273: (Number of data points

3274: %200 data $m$ = 0.25 for $\beta$ = 4,

3275: %${\bf K}_0$ = $-\lambda \Delta$,

3276: %$\lambda$ = 0.2,

3277: and parameters $m$, $\beta$,

3278: ${\bf K}_0$,

3279: and $\lambda$

3280: as for Fig.~\ref{p162}

3281: but with $\mu$ = 0.

3282: The solution has been obtained by

3283: iterating according to (\ref{iteration})

3284: as described for Fig.~\ref{p162}

3285: with initial guess

3286: $v^{(0)} = v_0$.)

3287: }

3288: \label{p19}

3289: \end{figure}

3290:

3291:

3292: \begin{figure}

3293: \begin{center}

3294: \epsfig{file=figure6.eps, width= 67mm}

3295: %\epsfig{file=ps/FLDframe160k.eps, width= 67mm}

3296: \epsfig{file=figure7.eps, width= 67mm}

3297: %\epsfig{file=ps/OFVframe160k.eps, width= 67mm}

3298: %\epsfig{file=ps/FLDframe22k.eps, width= 67mm}

3299: %\epsfig{file=ps/OFVframe22k.eps, width= 67mm}

3300: \end{center}

3301: \caption{

3302: Gaussian prior with periodic reference potential

3303: and additional energy measurement,

3304: improving the approximation of the minima.

3305: (Reference potential $v_0$ given in (\ref{per-ref-prior}),

3306: energy penalty term as in (\ref{averageE-penal})

3307: with $\mu$ = 1000

3308: and

3309: $\kappa$ = $-0.330$.

3310: %equal to the true average energy $U(v_{\rm true})$.

3311: All other parameters as for Fig.~\ref{p19}.

3312: Iterated with

3313: the solution shown in Fig.~\ref{p19}

3314: as initial guess $v^{(0)}$.)

3315: }

3316: \label{p22}

3317: \end{figure}

3318:

3319:

3320: \begin{figure}

3321: \begin{center}

3322: \epsfig{file=figure8.eps, width= 67mm}

3323: %\epsfig{file=ps/FLDframe155k.eps, width= 67mm}

3324: \epsfig{file=figure9.eps, width= 67mm}

3325: %\epsfig{file=ps/OFVframe155k.eps, width= 67mm}

3326: \end{center}

3327: \caption{

3328: Gaussian prior with periodic reference potential

3329: and additional energy measurement,

3330: with initial guess $v^{(0)}$ different from that of Fig.~\ref{p22}.

3331: (Reference potential $v_0$ given in (\ref{per-ref-prior}),

3332: energy penalty term as in (\ref{averageE-penal}).

3333: All  parameters as for Fig.~\ref{p22}.

3334: Iterated with

3335: initial guess $v^{(0)}(x)$ = $v_0(x)$

3336: for $0<x\le12,\,  25\le x$ and

3337: $v^{(0)}(x)$ = $0$ for $13\le x\le 24$.)

3338: }

3339: \label{p155}

3340: \end{figure}

3341:

3342:

3343: \begin{figure}

3344: \begin{center}

3345: \epsfig{file=figure10.eps, width= 67mm}

3346: %\epsfig{file=ps/FLDframe182k.eps, width= 67mm}

3347: \epsfig{file=figure11.eps, width= 67mm}

3348: %\epsfig{file=ps/OFVframe182k.eps, width= 67mm}

3349: %\epsfig{file=ps/FLDframe163k.eps, width= 67mm}

3350: %\epsfig{file=ps/OFVframe163k.eps, width= 67mm}

3351: %\epsfig{file=ps/FLDframe31k.eps, width= 67mm}

3352: %\epsfig{file=ps/OFVframe31k.eps, width= 67mm}

3353: \end{center}

3354: \caption{

3355: Approximate periodicity implemented by an inverse covariance

3356: ${\bf K}_0$

3357: =

3358: $- \lambda (\Delta+\gamma \Delta_\theta)$

3359: as in Eq.~(\ref{periodic-cov}).

3360: % 163: (With $\gamma$ = 4.0, $\lambda$ = 0.05,

3361: (With $\gamma$ = 1.0, $\lambda$ = 0.2,

3362: a fixed $\theta$ = 6,

3363: energy penalty term with $\mu$ = 1000,

3364: and zero reference potential $v_0\equiv 0$.

3365: Initial guess $v^{(0)}$ = $v_0\equiv 0$.

3366: All other parameters as for Fig.~\ref{p19}.

3367: % neue Werte:

3368: % $U(v)$ = $-0.348$

3369: % $\lambda$ = 0.7

3370: % $\gamma$ = 0.4/0.7

3371: %

3372: )

3373: }

3374: \label{p31}

3375: \end{figure}

3376:

3377:

3378: \begin{figure}

3379: \begin{center}

3380: \epsfig{file=figure12.eps, width= 67mm}

3381: %\epsfig{file=ps/FLDframe167k.eps, width= 67mm}

3382: \epsfig{file=figure13.eps, width= 67mm}

3383: %\epsfig{file=ps/OFVframe167k.eps, width= 67mm}

3384: % 168 like 167 except mu = 1000 starting with solution of 167

3385: %\epsfig{file=ps/FLDframe168k.eps, width= 67mm}

3386: %\epsfig{file=ps/OFVframe168k.eps, width= 67mm}

3387: %\epsfig{file=ps/FLDframe102k.eps, width= 67mm}

3388: %\epsfig{file=ps/OFVframe102k.eps, width= 67mm}

3389: \end{center}

3390: \caption{

3391: Local switching between periodic

3392: and zero reference potential.

3393: The black bars on top indicate regions where $B(x)$ = 1,

3394: i.e., regions where impurities have been identified.

3395: (Prior of Eq.\ (\ref{eq1}) with

3396: %$\lambda_1$ = 5, $\lambda_2$ = 1, $\mu$ = 10, $\kappa$ = $-0.388$,

3397: $\lambda_1$ = 0.2, $\lambda_2$ = 0.2,

3398: $\mu$ = 0,

3399: and reference potential as in (\ref{per-ref-prior}).

3400: The $v$--dependent function $B(x)$ was slowly changed

3401: from a sigmoid to a step function

3402: during iteration, keeping  the threshold $\vartheta$ = $0.15$ fixed.

3403: %$\beta$ = 4, $m$ = 0.25, %$\hbar^2/2m$ = 2,

3404: All other parameters as

3405: in Fig.~\ref{p19}.

3406: Initial guess $v^{(0)}$ as for Fig.~\ref{p155}.)

3407: }

3408: \label{p102}

3409: \end{figure}

3410:

3411:

3412: \begin{figure}

3413: \begin{center}

3414: \epsfig{file=figure14.eps, width= 67mm}

3415: %\epsfig{file=ps/FLDframe75k.eps, width= 67mm}

3416: \epsfig{file=figure15.eps, width= 67mm}

3417: %\epsfig{file=ps/OFVframe75k.eps, width= 67mm}

3418: \end{center}

3419: \caption{

3420: Local switching between two nonzero reference potentials.

3421: (Reference potentials $v_1$, $v_2$

3422: given in

3423: (\ref{two-ref-potentialsA})

3424: and

3425: (\ref{two-ref-potentialsB}).

3426: Prior of Eq.\ (\ref{eq2}),

3427: with

3428: $\lambda_1$ = $\lambda_2$ = 10, $\mu$ = 0.

3429: Step function for $B(x)$ with $\vartheta$ = 0.

3430: An additional prior $p(B)$ on $B$ has been included

3431: with $-\ln p(B)/10$

3432: counting the number of discontinuities of the function $B(x)$.

3433: Other parameters as

3434: in Fig.~\ref{p19}.)

3435: }

3436: \label{p75}

3437: \end{figure}

3438:

3439:

3440: \begin{figure}

3441: \begin{center}

3442: \epsfig{file=figure16.eps, width= 67mm}

3443: %\epsfig{file=ps/FLDframe122k.eps, width= 67mm}

3444: \epsfig{file=figure17.eps, width= 67mm}

3445: %\epsfig{file=ps/OFVframe122k.eps, width= 67mm}

3446: \end{center}

3447: \caption{

3448: Prior with local hyperfield.

3449: (Prior of Eq.\ (\ref{eq3}),

3450: with

3451: $\lambda_1$ = 10, $\lambda_2$ = 1,

3452: $\vartheta$ = 0, $\mu$ = 0,

3453: including a hyperprior $p(\theta)$ with

3454: %$-\ln p(\theta)/10$

3455: $E_B/10$

3456: counting the number of discontinuities of the hyperfield $\theta(x)$.

3457: Other parameters as

3458: in Fig.~\ref{p19}.)

3459: }

3460: \label{p120}

3461: \end{figure}

3462:

3463:

3464:

3465: \end{document}

3466:

3467:

3468:

3469:

3470:

3471:

3472:

3473:

3474: