% 0412:cond-mat0412717/pre.tex
% Date Dec. 27, 2004 verified by AKB
% Date Dec. 24, 2004 by PB (Version 2)
% Date Dec. 15, 2004 by DAD (Revised version 1)
% Date May 15, 2004 by PB (version 1)

\documentclass[prl,aps,twocolumn,showpacs]{revtex4}
\usepackage{graphicx,epsfig}

% Some definitions are here

\def\be{\begin{equation}}
\def\ee{\end{equation}}
\def\bea{\begin{eqnarray}}
\def\eea{\end{eqnarray}}

\begin{document}

\title{Maximum entropy and the problem of moments: A stable algorithm}

\author{K.\,Bandyopadhyay and A.\,K.\,Bhattacharya}
\email{physakb@yahoo.com}
\affiliation{Department of Physics, University of Burdwan, Burdwan, WB 713104, India}

\author{Parthapratim Biswas}
\email{biswas@phy.ohiou.edu}
\author{D.\,A.\,Drabold}
\email{drabold@ohio.edu}
\affiliation{Department of Physics and Astronomy, Ohio University, Athens, OH 45701}

\pacs{71.23.Cq, 71.55.Jv, 02.30.Zz}

\begin{abstract}

We present a technique for entropy optimization to calculate a distribution from
its moments. The technique is based upon maximizing a discretized form of the
Shannon entropy functional by mapping the problem onto a dual space where an
optimal solution can be constructed iteratively. We demonstrate the performance
and stability of our algorithm with several tests on numerically difficult functions.
We then consider an electronic-structure application, the electronic density of
states of amorphous silica, and study the convergence of the Fermi level with an
increasing number of moments.
\end{abstract}

\maketitle

One of the recurring themes of physics is the solution of inverse problems.
A ubiquitous example in theoretical physics is the ``Classical Moment
Problem'' (CMP), in which only a finite set of power moments of a
non-negative distribution function $p$ is known, and the full
distribution is needed~\cite{Shohat}. For a finite set of moments, the solution for $p$ is
in general {\it not unique}. This non-uniqueness calls for a ``best guess'' for $p$, based upon the available
information. With its ultimate roots in nineteenth-century statistical
mechanics and a subsequent strong justification based upon probability
theory, the ``maximum entropy'' (maxent) method has provided an
extremely successful variational principle for this type of inverse
problem~\cite{Jaynes}. Collins and Wragg used the maxent method to solve
the CMP for a modest number of moments~\cite{Collins}. In a
comprehensive paper with seminal applications, Mead and Papanicolaou~\cite{Mead1}
solved the CMP with
maximum entropy techniques and proposed the first practical numerical
scheme to solve the moment problem for up to 15 moments. A
host of subsequent papers has established the utility of the method as an unbiased and
surprisingly efficient (rapidly convergent) solution of the CMP.
The principle has been used extensively in
diverse applications ranging from image reconstruction and spectral analysis to
large-scale electronic structure problems~\cite{Drabold1,Silver0}, series
extrapolation and analytic continuation~\cite{Drabold2}, quantum electronic transport~\cite{Mello},
ligand-binding distribution in polymers~\cite{Poland}, and transport
planning~\cite{Steeb}.

A number of maximum entropy algorithms~\cite{Skilling, Mead1, Turek, Silver0, Brett}
have been developed over the last two decades. Many (but not all) of these algorithms
are limited in the number of moments they can handle,
and become unreliable when the number of constraints exceeds a problem-dependent
upper limit. As the number of moments increases, the calculation of moments
(particularly the power moments) becomes more sensitive to machine
precision and the optimization problem becomes ill-conditioned. It has
been observed that implementing a maxent algorithm with more than 20
power moments is notoriously difficult, even with extended-precision arithmetic,
and that the additional moments rarely give any further information on the nature of the distribution.
The use of orthogonal polynomials as a basis set significantly improves the
accuracy and remedies most of the problems that one encounters with power
moments.

In this paper we present an iterative approach to constructing the maxent
solution of the CMP, based upon a discretization of the Shannon entropy
functional~\cite{Shannon}.
The essential idea is to discretize the Shannon entropy
and map the problem from the primal space onto a dual space, where an
optimal solution can be constructed iteratively without
the need for matrix inversion. We discuss the theoretical ideas and develop
algorithms that can be used with both power and Chebyshev moments. The
stability and the accuracy of the method are discussed with reference to
two numerically non-trivial examples: a uniform distribution and a double-delta function.
We illustrate the usefulness of our technique by computing the
electronic density of states (EDOS) of amorphous silica, with
particular emphasis on the convergence of the Fermi level as a function of the number
of moments.

The starting point of our approach is a discretized
form of the Shannon entropy functional~\cite{Shannon} $S[p]$,
obtained with a quadrature formula:
\be
\label{eq-010}
S = - \int p(x) \ln p(x)\, dx \approx - \sum_{j=1}^n w_j \, p_j \ln p_j .
\ee

Here $w_j$ and $x_j$ are the weights and abscissas of any accurate
quadrature formula, such as Gauss-Legendre quadrature, and $p_j \equiv p(x_j)$.
Without any loss of generality we restrict ourselves to $x \in [0,1]$. We want to maximize
$S$ subject to the discretized moment constraints
\be
\label{eq-020}
\sum_{j=1}^{n} w_j\, x_j^i \, p_j = \sum_{j=1}^n a_{ij} \tilde p_j = \mu_i, \quad i = 1, 2, \ldots, m,
\ee
where we define $\tilde p_j = w_j\, p_j$ and $a_{ij} = x_j^i$.
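As a concrete illustration of Eqs.~(\ref{eq-010}) and (\ref{eq-020}), the Python sketch below maps NumPy's Gauss-Legendre rule (\texttt{numpy.polynomial.legendre.leggauss}) from $[-1,1]$ to $[0,1]$ and evaluates the discretized entropy and the power moments $\mu_i = \sum_j a_{ij}\tilde p_j$. The helper names (\texttt{gauss\_legendre\_01}, \texttt{discrete\_entropy}, \texttt{discrete\_moments}) and the uniform test density are our own assumptions, not part of the method as published.

```python
import numpy as np

def gauss_legendre_01(n):
    """n-point Gauss-Legendre abscissas x_j and weights w_j mapped from [-1, 1] to [0, 1]."""
    y, wy = np.polynomial.legendre.leggauss(n)
    return 0.5 * (y + 1.0), 0.5 * wy

def discrete_entropy(p, w):
    """S ~ -sum_j w_j p_j ln p_j; terms with p_j = 0 are taken as zero."""
    p = np.asarray(p, dtype=float)
    plogp = np.zeros_like(p)
    pos = p > 0
    plogp[pos] = p[pos] * np.log(p[pos])
    return -np.sum(w * plogp)

def power_moment_matrix(x, m):
    """a_ij = x_j**i for i = 1, ..., m (rows) and abscissas x_j (columns)."""
    return np.vstack([x**i for i in range(1, m + 1)])

def discrete_moments(p, x, w, m):
    """mu_i = sum_j a_ij ptilde_j with ptilde_j = w_j p_j."""
    return power_moment_matrix(x, m) @ (w * np.asarray(p, dtype=float))

# Example: the uniform density p(x) = 1 on [0, 1], sampled at 20 abscissas.
x, w = gauss_legendre_01(20)
p = np.ones_like(x)
print(discrete_moments(p, x, w, 3))   # approximately [1/2, 1/3, 1/4]
print(discrete_entropy(p, w))         # zero for the uniform density
```

A 20-point rule integrates polynomials up to degree 39 exactly, so the power moments of a polynomial density are reproduced to machine precision.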

Since $\sum_j \tilde p_j \ln(\tilde p_j / w_j) = \sum_j w_j p_j \ln p_j \approx -S$,
maximizing $S$ is equivalent to minimizing this sum, and the entropy optimization
program (EOP) can now be stated in terms of the Lagrangian function
%
\be
\label{eq-050}
L({\bf \tilde p}, \tilde \eta) \equiv \sum_{j=1}^n \tilde p_j \, \ln \left(\frac{\tilde p_j}{w_j}\right)
- \sum_{i=1}^m \tilde \eta_i \left(\sum_{j=1}^n a_{ij} \tilde p_j - \mu_i \right),
\ee
where $\tilde \eta_i$ are the Lagrange multipliers. Setting $\partial L / \partial \tilde p_j = 0$
gives the solution
\be
\label{eq-060}
\tilde p_j = w_j \exp\left(\sum_{i=1}^m a_{ij} \tilde \eta_i - 1 \right), \quad j = 1, 2, \ldots, n.
\ee

Since ${\bf w} \ge 0$, Eq.~(\ref{eq-060}) implies that ${\bf \tilde p} \ge 0$.
Furthermore, the conditions in Eqs.~(\ref{eq-020}) and (\ref{eq-060}) can be
combined into
\be
\label{eq-070}
h_i(\tilde \eta) \equiv \sum_{j=1}^n a_{ij} \, w_j \, \exp \left(\sum_{k=1}^m a_{kj}
\tilde \eta_k - 1 \right) - \mu_i = 0, \quad \forall \; i.
\ee

The functions $h_i$ in Eq.~(\ref{eq-070}) are exactly the partial derivatives
$\partial d / \partial \tilde \eta_i$ of the function $d$ defined below, which is convex as a
sum of exponentials of linear functions plus a linear term. Hence the original constrained
optimization program reduces to an {\em unconstrained convex optimization program}
involving the dual variables:
\be
\label{eq-080}
\min_{\tilde \eta \in R^m} \: d(\tilde \eta) \equiv \sum_{j=1}^n w_j \exp
\left(\sum_{i=1}^m a_{ij}\tilde \eta_{i} - 1\right) - \sum_{i=1}^m \mu_i \tilde \eta_i .
\ee

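A minimal Python sketch of the dual function $d(\tilde\eta)$ of Eq.~(\ref{eq-080}), its gradient $h_i$ of Eq.~(\ref{eq-070}), and the primal map of Eq.~(\ref{eq-060}) is given below. For the example we assume a 20-point Gauss-Legendre rule on $[0,1]$ and, unlike Eq.~(\ref{eq-020}), we include the row $a_{0j} = x_j^0 = 1$ with $\mu_0 = 1$ so that normalization is imposed explicitly; the function names are hypothetical. A central finite-difference check confirms that $h_i = \partial d/\partial\tilde\eta_i$.

```python
import numpy as np

def primal_from_dual(eta, a, w):
    """ptilde_j = w_j * exp(sum_i a_ij eta_i - 1), the maxent form of the solution."""
    return w * np.exp(a.T @ eta - 1.0)

def dual_objective(eta, a, w, mu):
    """d(eta) = sum_j w_j exp(sum_i a_ij eta_i - 1) - sum_i mu_i eta_i."""
    return np.sum(primal_from_dual(eta, a, w)) - mu @ eta

def dual_gradient(eta, a, w, mu):
    """h_i(eta) = sum_j a_ij ptilde_j(eta) - mu_i, i.e. the moment residuals."""
    return a @ primal_from_dual(eta, a, w) - mu

# Assumed setup: 20-point Gauss-Legendre rule on [0, 1], constraint rows x^0, x^1, x^2,
# and the moments of the uniform density p(x) = 1.
y, wy = np.polynomial.legendre.leggauss(20)
x, w = 0.5 * (y + 1.0), 0.5 * wy
a = np.vstack([x**i for i in range(3)])
mu = np.array([1.0, 1.0 / 2.0, 1.0 / 3.0])

# eta = (1, 0, 0) gives ptilde = w, i.e. p(x) = 1, so every residual vanishes.
print(dual_gradient(np.array([1.0, 0.0, 0.0]), a, w, mu))

# Central finite differences of d agree with h at an arbitrary point.
eta = np.array([0.3, -0.2, 0.5])
step = 1e-6
fd = np.array([(dual_objective(eta + step * e, a, w, mu)
                - dual_objective(eta - step * e, a, w, mu)) / (2.0 * step)
               for e in np.eye(3)])
print(np.max(np.abs(fd - dual_gradient(eta, a, w, mu))))
```

Because the gradient of $d$ is just the vector of moment residuals, any smooth unconstrained minimizer could in principle be applied to $d$; the cyclic scheme described next avoids forming or inverting the Hessian.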

If the dual optimization program stated above has an optimal solution
${\bf \tilde \eta^*}$, the primal solution $\tilde p_j({\bf \tilde \eta^*})$
can be obtained from Eq.~(\ref{eq-060}). Bregman has proposed an iterative
method to minimize the dual objective function $d({\bf \tilde \eta})$
taking {\em only one} dual variable at a time~\cite{Bergman}. The method
starts with an arbitrarily chosen ${\bf \tilde \eta^0} \in R^m$ and
then cyclically updates the dual variables as follows:

Step 1: Start with any ${\bf \tilde \eta^0} \in R^m$ and a sufficiently small
tolerance $\epsilon > 0$. Set $k = 0$ and obtain $\tilde p_j^0$ from Eq.~(\ref{eq-060}).

Step 2: Let $i = (k \bmod m) + 1$. Solve the equation
\bea
\label{eq-105}
\phi_i^k(\lambda^k) = \sum_{j=1}^n a_{ij} \tilde p_j^k \exp(a_{ij}\lambda^k) - \mu_i = 0
\eea
for the scalar $\lambda^k$.

Step 3: Update only the $i$th component of ${\bf \tilde \eta}$:
\be
\label{eq-110}
\tilde \eta_l^{k+1} = \tilde \eta_l^k + \lambda^k \;\; \mbox{if} \; l = i, \qquad
\tilde \eta_l^{k+1} = \tilde \eta_l^k \;\; \mbox{if} \; l \ne i.
\ee

Step 4: If Eq.~(\ref{eq-070}) is satisfied at ${\bf \tilde \eta^{k+1}}$ within the preset
tolerance $\epsilon$, then stop with ${\bf \tilde \eta^*} = {\bf \tilde \eta^{k+1}}$
and obtain the primal solution from Eq.~(\ref{eq-060}). Otherwise,
calculate
%
\be
\tilde p_j^{k+1} = w_j \exp\left(\sum_{i=1}^m a_{ij} \tilde
\eta_i^{k+1} -1\right), \quad j = 1,2,\ldots,n,
\ee
%
set $k \to k+1$, and go to Step 2.

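Steps 1 to 4 above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the one-dimensional equation~(\ref{eq-105}) is solved by Newton's method (our choice), which converges from $\lambda = 0$ when $a_{ij} \ge 0$ because $\phi_i^k$ is then increasing and convex; indices are 0-based; and the test problem (constraints on $1, x, x^2$ for the density $e^x/(e-1)$) and the name \texttt{bregman\_cyclic} are hypothetical. Because that density already has the maxent form (\ref{eq-060}), the exact dual solution is $\tilde\eta = (1-\ln(e-1),\, 1,\, 0)$.

```python
import numpy as np

def bregman_cyclic(a, w, mu, tol=1e-10, max_sweeps=100000):
    """Cyclic one-variable-at-a-time minimization of the dual function d(eta).

    a: (m, n) array a_ij, w: (n,) quadrature weights, mu: (m,) moments.
    Assumes a_ij >= 0 so that Newton's method on phi(lam) converges globally.
    Returns the dual vector eta* and the primal vector ptilde.
    """
    m = a.shape[0]
    eta = np.zeros(m)                                  # Step 1: eta^0 = 0
    p = w * np.exp(a.T @ eta - 1.0)                    # ptilde^0
    for k in range(max_sweeps * m):
        i = k % m                                      # Step 2 (0-based row index)
        lam = 0.0
        for _ in range(100):                           # Newton's method for phi(lam) = 0
            e = p * np.exp(a[i] * lam)
            phi = a[i] @ e - mu[i]
            dphi = (a[i] ** 2) @ e                     # phi'(lam) > 0
            delta = phi / dphi
            lam -= delta
            if abs(delta) < 1e-14 * (1.0 + abs(lam)):
                break
        eta[i] += lam                                  # Step 3: only eta_i changes
        p = w * np.exp(a.T @ eta - 1.0)                # Step 4: ptilde^{k+1}
        if i == m - 1 and np.max(np.abs(a @ p - mu)) < tol:
            return eta, p                              # all residuals h_i below tol
    raise RuntimeError("no convergence within max_sweeps")

# Assumed test problem: constraints on 1, x, x^2 for the density e^x / (e - 1).
y, wy = np.polynomial.legendre.leggauss(20)
x, w = 0.5 * (y + 1.0), 0.5 * wy
a = np.vstack([x**i for i in range(3)])
mu = a @ (w * np.exp(x) / (np.e - 1.0))
eta, p = bregman_cyclic(a, w, mu)
print(eta)   # close to [1 - ln(e - 1), 1, 0]
```

The convergence test is applied once per full sweep over the constraints, which is how the cyclic index $i = (k \bmod m) + 1$ is naturally grouped.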

From a computational point of view, the most problematic part of the above
algorithm is the solution of the nonlinear equation~(\ref{eq-105}) for $\lambda^k$
in Step 2, which must be repeated at every iteration.
In a variant of this scheme, known as the multiplicative algebraic reconstruction
technique (MART)~\cite{Fang, Gordon}, one uses the following closed-form expression
to approximate the correction term $\lambda^k$:
\be
\label{eq-120}
\lambda_i^k = \ln \left(\frac{\mu_i}{\sum_{j=1}^n a_{ij}\tilde p_j^k}\right).
\ee

Step 3 of the algorithm is now modified by substituting the expression
above for $\lambda_i^k$ in Eq.~(\ref{eq-110}). A convergence theorem for the
modified algorithm can be found in Lent~\cite{Lent}. It is, however,
easy to see that the algorithm will fail unless, for every
$i = 1, 2, \ldots, m$, either
\be
\label{eq-130}
\mu_i > 0 \quad \mbox{and} \quad 0 \le a_{ij} \le 1, \quad j = 1, 2, \ldots, n,
\ee
{\hskip 4cm or}
\be
\label{eq-140}
\mu_i < 0 \quad \mbox{and} \quad 0 \ge a_{ij} \ge -1, \quad j = 1, 2, \ldots, n.
\ee
In particular, the logarithm in Eq.~(\ref{eq-120}) requires $\mu_i$ and
$\sum_j a_{ij}\tilde p_j^k$ to have the same sign.

For power moments on $[0,1]$ we have $a_{ij} = x_j^i \in [0,1]$ and $\mu_i > 0$, so
condition~(\ref{eq-130}) holds and convergence of our discretized EOP is assured.

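With the closed-form correction of Eq.~(\ref{eq-120}) in place of Newton iterations, Step 3 reduces to a single logarithm. The sketch below uses a hypothetical function \texttt{mart} and assumed conventions (0-based indices, a normalization row $x^0$, and the test density $e^x/(e-1)$ on a 20-point Gauss-Legendre grid); it checks condition~(\ref{eq-130}) before iterating, since for power moments on $[0,1]$ both $a_{ij} = x_j^i \in [0,1]$ and $\mu_i > 0$ hold.

```python
import numpy as np

def mart(a, w, mu, tol=1e-10, max_sweeps=100000):
    """Cyclic dual update with the closed-form step lam = ln(mu_i / sum_j a_ij ptilde_j).

    Requires mu_i > 0 and 0 <= a_ij <= 1, the first of the two sign conditions.
    Returns the dual vector eta and the primal vector ptilde.
    """
    if np.any(mu <= 0) or np.any(a < 0) or np.any(a > 1):
        raise ValueError("MART step needs mu_i > 0 and 0 <= a_ij <= 1")
    m = a.shape[0]
    eta = np.zeros(m)
    p = w * np.exp(a.T @ eta - 1.0)
    for k in range(max_sweeps * m):
        i = k % m
        eta[i] += np.log(mu[i] / (a[i] @ p))           # closed-form correction lam_i^k
        p = w * np.exp(a.T @ eta - 1.0)                # ptilde^{k+1} from the maxent form
        if i == m - 1 and np.max(np.abs(a @ p - mu)) < tol:
            return eta, p
    raise RuntimeError("no convergence within max_sweeps")

# Assumed test problem: power moments x^0, x^1, x^2 of the density e^x / (e - 1).
y, wy = np.polynomial.legendre.leggauss(20)
x, w = 0.5 * (y + 1.0), 0.5 * wy
a = np.vstack([x**i for i in range(3)])
mu = a @ (w * np.exp(x) / (np.e - 1.0))
eta, p = mart(a, w, mu)
print(eta)   # close to [1 - ln(e - 1), 1, 0]
```

Each MART step is cheaper than a Newton solve, at the price of a smaller effective step for rows with small $a_{ij}$, so more sweeps may be needed.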

The algorithm above can therefore be used only when condition
(\ref{eq-130}) or (\ref{eq-140}) is satisfied. This is the case for power moments, but
neither condition is necessarily true for other polynomial moments; the shifted Chebyshev
polynomials, for example, take both signs on $[0,1]$.
In order to work with Chebyshev polynomials, we first use the averages
of the shifted Chebyshev polynomials~\cite{num-recipe} of the first kind,
$T_n^{*}(x) = T_n(2x-1)$, to recast the entropy optimization program
(EOP) given by Eq.~(\ref{eq-070}). The only change needed for
this purpose is to redefine $a_{ij}$ as $a_{ij} = T_i^{*}(x_j)$.

Our next step is to find a transformation that converts
the EOP into an equivalent problem in which all the program
parameters are non-negative. To construct this transformation, we
define for $i = 1, 2, \ldots, m$
\be
u_i = \left[\max_j (-a_{ij}) \right] + 1. \nonumber
\ee
Obviously, for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$,
\be
u_i + a_{ij} = a_{ij} - \min_k a_{ik} + 1 \ge 1 > 0. \nonumber
\ee

Let us now define for $i = 1, 2, \ldots, m$
\be
M_i \equiv \max_j(u_i + a_{ij}) \: \: ; \: \: t_i \equiv \frac{1}{m(M_i+1)}. \nonumber
\ee
It is easy to see that the following relations hold for $i = 1, 2, \ldots, m$:
\bea
&&M_i > 0, \; \; t_i > 0, \nonumber \\
&&(M_i+1)\,t_i = \frac{1}{m}, \; \; t_i\,(u_i + a_{ij}) \le t_i\,M_i < \frac{1}{m}.
\nonumber
\eea

For $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$, let us define
\be
a_{ij}^{'} \equiv t_i(u_i + a_{ij}).
\ee
Consequently, for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$, we have
\be
\frac{1}{m} > a_{ij}^{'} > 0 \: \: ; \: \: 0 < \sum_{i=1}^m a_{ij}^{'} =
\sum_{i=1}^m t_i(u_i + a_{ij}) < 1.
\ee

It is interesting to note that if ${\bf \tilde p}$ is a feasible solution to
the EOP involving averages of $T_n^{*}(x)$ and is normalized, $\sum_{j=1}^n \tilde p_j = 1$,
then for $i = 1, 2, \ldots, m$
\be
\sum_{j=1}^n a_{ij}^{'}\tilde p_j = \sum_{j=1}^n t_i(u_i + a_{ij}) \tilde p_j
= t_i\Big(u_i \sum_{j=1}^n \tilde p_j + \mu_i\Big) = t_i(u_i + \mu_i).
\ee
We therefore define, for $i = 1, 2, \ldots, m$,
\be
\label{eq-new}
\mu_i^{'} \equiv t_i\,(u_i + \mu_i) = \sum_{j=1}^n a_{ij}^{'} \, \tilde p_j .
\ee

It is easy to verify that $0 < \mu_i^{'} < 1/m$ for $i = 1, 2, \ldots, m$, since $\mu_i^{'}$
is an average of the $a_{ij}^{'} \in (0, 1/m)$ with non-negative weights $\tilde p_j$ summing to one.
The transformed EOP thus has the same form as before, except
that we use Eq.~(\ref{eq-new}) in place of Eq.~(\ref{eq-020}). Since both
$a_{ij}^{'}$ and $\mu_i^{'}$ take only positive values and $a_{ij}^{'} < 1$,
condition~(\ref{eq-130}) is satisfied, and a feasible
solution to the original program can now be obtained by replacing $a_{ij}$
and $\mu_{i}$ in Eq.~(\ref{eq-070}) by $a_{ij}^{'}$ and
$\mu_{i}^{'}$~\cite{note1}.

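The transformation above is straightforward to code. The following Python sketch, with hypothetical function names, builds $a_{ij} = T_i^*(x_j)$ from the recurrence $T_{i+1}(t) = 2t\,T_i(t) - T_{i-1}(t)$ with $t = 2x-1$, computes $u_i$, $M_i$, $t_i$, $a_{ij}^{'}$ and $\mu_i^{'}$, and verifies the bounds derived in the text. The rows run over $i = 0, \ldots, m-1$ so that $T_0^* = 1$ carries the normalization; the test density $p(x) = 2x$, the 40-point Gauss-Legendre grid and $m = 10$ are assumptions for illustration, and $\mu_i^{'} = t_i(u_i + \mu_i)$ presumes $\sum_j \tilde p_j = 1$.

```python
import numpy as np

def shifted_chebyshev_matrix(x, m):
    """a_ij = T_i^*(x_j) = T_i(2 x_j - 1) for i = 0, ..., m - 1 via the three-term recurrence."""
    t = 2.0 * x - 1.0
    rows = [np.ones_like(t), t]
    for _ in range(2, m):
        rows.append(2.0 * t * rows[-1] - rows[-2])
    return np.vstack(rows[:m])

def nonnegative_transform(a, mu):
    """Map (a_ij, mu_i) to (a'_ij, mu'_i) with 0 < a'_ij < 1/m.

    u_i = max_j(-a_ij) + 1, M_i = max_j(u_i + a_ij), t_i = 1 / (m (M_i + 1)),
    a'_ij = t_i (u_i + a_ij), mu'_i = t_i (u_i + mu_i)  (assumes sum_j ptilde_j = 1).
    """
    m = a.shape[0]
    u = np.max(-a, axis=1) + 1.0
    M = np.max(u[:, None] + a, axis=1)
    t = 1.0 / (m * (M + 1.0))
    return t[:, None] * (u[:, None] + a), t * (u + mu)

# Assumed example: normalized density p(x) = 2x on a 40-point Gauss-Legendre grid.
y, wy = np.polynomial.legendre.leggauss(40)
x, w = 0.5 * (y + 1.0), 0.5 * wy
m = 10
a = shifted_chebyshev_matrix(x, m)
pt = w * 2.0 * x                       # ptilde_j, sums to 1
mu = a @ pt
a_new, mu_new = nonnegative_transform(a, mu)
print(a.min() < 0)                     # True: raw Chebyshev entries change sign
print(a_new.min() > 0, a_new.max() < 1.0 / m)
print(np.allclose(a_new @ pt, mu_new)) # True: the constraints are preserved
```

The transformed pair $(a_{ij}^{'}, \mu_i^{'})$ can then be passed to the same MART-type update that is used for power moments.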

We consider two numerically difficult examples, a uniform distribution and a
double-delta function, to study the stability and accuracy of the algorithm.
The Chebyshev moments of these two functions can be calculated exactly. Earlier
efforts to reproduce these distributions have met with limited success because
of the difficulty of matching a sufficient number of moments and because of the singular
(double-delta) or discontinuous (uniform, at the end points) nature of the functions.
It is therefore interesting to see how the algorithm performs
in the cases of (a) the uniform distribution $f(x) = 1$, $x \in [0,1]$, and (b) the
double-delta function $g(x) = \delta (x-\frac{1}{4})+\delta(x-\frac{3}{4})$, $x \in [0,1]$.

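The exact moments referred to above follow from $T_n(\cos\theta) = \cos n\theta$. For the uniform distribution, the substitution $y = 2x-1$ gives $\int_0^1 T_n^*(x)\,dx = \frac{1}{2}\int_{-1}^{1} T_n(y)\,dy = \frac{1+(-1)^n}{2(1-n^2)}$ for $n \ne 1$ (and $0$ for $n = 1$); for the double-delta function, $T_n^*(\frac14) = T_n(-\frac12) = \cos(2\pi n/3)$ and $T_n^*(\frac34) = T_n(\frac12) = \cos(\pi n/3)$. A short Python sketch (hypothetical function names) that tabulates both sets of moments and cross-checks the first against Gauss-Legendre quadrature:

```python
import numpy as np

def uniform_chebyshev_moments(m):
    """Exact mu_n = int_0^1 T_n(2x - 1) dx for f(x) = 1, n = 0, ..., m - 1.

    With y = 2x - 1 this is (1/2) int_{-1}^{1} T_n(y) dy, which equals
    (1 + (-1)^n) / (2 (1 - n^2)) for n != 1 and 0 for n = 1.
    """
    mu = np.zeros(m)
    for n in range(m):
        if n != 1:
            mu[n] = (1 + (-1) ** n) / (2.0 * (1 - n * n))
    return mu

def double_delta_chebyshev_moments(m):
    """Exact mu_n = T_n^*(1/4) + T_n^*(3/4) for g(x) = delta(x - 1/4) + delta(x - 3/4).

    T_n^*(1/4) = T_n(-1/2) = cos(2 pi n / 3) and T_n^*(3/4) = T_n(1/2) = cos(pi n / 3).
    Note that g integrates to 2; divide by 2 for a unit-normalized distribution.
    """
    n = np.arange(m)
    return np.cos(2.0 * np.pi * n / 3.0) + np.cos(np.pi * n / 3.0)

print(uniform_chebyshev_moments(5))        # values 1, 0, -1/3, 0, -1/15
print(double_delta_chebyshev_moments(7))   # approximately 2, 0, -1, 0, -1, 0, 2

# Cross-check the uniform moments with a 50-point Gauss-Legendre rule on [0, 1].
y, wy = np.polynomial.legendre.leggauss(50)
x, w = 0.5 * (y + 1.0), 0.5 * wy
n = np.arange(8)[:, None]
quad = np.cos(n * np.arccos(2.0 * x - 1.0)) @ w
print(np.max(np.abs(quad - uniform_chebyshev_moments(8))))   # near machine precision
```

The double-delta moments are periodic in $n$ with period 6 and never decay, which is one way to see why this distribution is a demanding test for any moment-based reconstruction.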

The algorithm reproduces the uniform distribution correctly to five decimal
places; we found that the first 25 shifted Chebyshev moments are sufficient
for this purpose. The fact that the end points are reproduced so accurately,
without any spurious oscillations, is a distinct strength of this approach
and reflects the stability and accuracy of our algorithm. In Fig.~\ref{fig1}
we have plotted
the result for the double-delta function. The result is equally convincing and
demonstrates the advantage of this method over other existing ones
in the literature.
\begin{figure}
\includegraphics[width=2.25in,height=2.25in,angle=270]{nfig1}
\caption{
\label{fig1}
Reconstruction of a double-delta function $g(x)=\delta(x-\frac{1}{4})
+\delta(x-\frac{3}{4})$ from shifted Chebyshev moments.
}
\end{figure}
In addition to these examples, we have also tested our algorithm by reconstructing a
tent map, a semicircular distribution, a square-root distribution, and a distribution
with a gap in the spectrum. In all these cases, the algorithm correctly reproduces
all the features of the distributions without failing. These results
demonstrate that the algorithm is very stable and accurate, and that it is capable
of reproducing very uncommon distributions (such as the double-delta function) without
any difficulty.

We now consider a practical case where exact moments are not known but approximate
moments are available. An archetypal example is the calculation of the electronic density
of states from its moments. In the context of solid state physics, maxent
has been used profitably to calculate the density of electronic (vibrational) states
from a knowledge of the moments of the Hamiltonian (dynamical) matrix. The computation
of moments is itself an interesting problem in this field, and there are methods
available in the literature that specifically address this
issue~\cite{Drabold1, Skilling}.
\begin{figure}
\includegraphics[width=2.25in,height=2.25in,angle=270]{nfig2}
\caption{
\label{fig2}
Normalized electronic density of states/eV (dotted line) of amorphous silica
obtained from the first 60 shifted Chebyshev moments. The distribution of energy
eigenvalues (points) from direct diagonalization of the Hamiltonian
matrix is also plotted. The normalized Fermi level is at
0.595 eV.
}
\end{figure}
Here one is interested in determining physical quantities such as the Fermi level
and band energy of large systems (e.g.~clusters, biological macromolecules, etc.)
without diagonalizing the Hamiltonian matrix. For amorphous semiconductors
this approach is particularly suitable, because disorder scattering of electrons
washes out the van Hove singularities in the electronic spectrum. A stable and
accurate maxent algorithm would therefore be very useful for calculating
electronic properties of amorphous semiconductors. The two examples discussed
above suggest that we should be able to reproduce complex electronic spectra with
a gap (or gaps) to a high degree of precision, and hence the Fermi level and band
energy. For metallic systems, the determination of the Fermi energy is a non-trivial
problem for $O(N)$ methods. A primary requirement for a maxent algorithm in this
case is that (i) it must reproduce the distribution accurately and (ii) it must
do so in a stable way, using a sufficient number of moments to correctly reproduce
the singularities of the spectrum. Our algorithm
satisfies these requirements and therefore may offer an alternative approach to
computing the Fermi energy of metallic systems.

In Fig.~\ref{fig2} we have plotted the EDOS of amorphous silica obtained from the first 60
moments and compared it with the result obtained by direct diagonalization of the
Hamiltonian matrix. It is clear from the figure that all the features of the EDOS
are correctly reproduced by our maxent algorithm. Finally, in Fig.~\ref{fig3} we
have plotted the variation of the Fermi energy with the number of moments. The Fermi
energy is computed by integrating the normalized density of states up to the energy at which
it accommodates the correct total number of electrons. It is clear from Fig.~\ref{fig3} that
the Fermi energy starts to converge after the first 30 moments and has converged
by 40 moments.
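The Fermi-level step amounts to inverting the integrated density of states $N(E) = \int_{-\infty}^{E} \rho(E')\,dE'$. A minimal Python sketch is shown below, assuming that the maxent EDOS has been tabulated on an increasing energy grid and that the occupied fraction of states (the number of electrons divided by the number of available electron states, including spin) is known. The function \texttt{fermi\_level}, the trapezoidal integration and the linear interpolation are our assumptions; for a spectrum with a gap the sketch returns the lowest energy at which the target occupation is reached, not the mid-gap value.

```python
import numpy as np

def fermi_level(energies, dos, filled_fraction):
    """Energy E_F at which the integrated, normalized DOS reaches filled_fraction.

    energies: increasing 1-D grid; dos: non-negative DOS values on that grid;
    filled_fraction: fraction of states that must be occupied, 0 < filled_fraction <= 1.
    """
    if not 0.0 < filled_fraction <= 1.0:
        raise ValueError("filled_fraction must lie in (0, 1]")
    energies = np.asarray(energies, dtype=float)
    dos = np.asarray(dos, dtype=float)
    # Cumulative trapezoidal integral N(E_k) of the DOS from the lowest grid energy.
    steps = 0.5 * (dos[1:] + dos[:-1]) * np.diff(energies)
    cumulative = np.concatenate(([0.0], np.cumsum(steps)))
    cumulative /= cumulative[-1]                     # normalize so that N(E_max) = 1
    # First grid index k with N(E_k) >= filled_fraction (k >= 1 because N(E_0) = 0).
    k = int(np.searchsorted(cumulative, filled_fraction))
    lo, hi = cumulative[k - 1], cumulative[k]
    # Linear interpolation of N(E) inside [E_{k-1}, E_k].
    return energies[k - 1] + (filled_fraction - lo) / (hi - lo) * (energies[k] - energies[k - 1])

# Assumed toy example: a DOS proportional to E on [0, 1], a quarter of the states filled.
E = np.linspace(0.0, 1.0, 101)
print(fermi_level(E, 2.0 * E, 0.25))   # close to 0.5, since N(E) = E**2
```

In practice one would apply such a routine to the maxent EDOS obtained for each number of moments, which is the kind of sequence summarized in Fig.~\ref{fig3}.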

\begin{figure}
\includegraphics[width=2.25in,height=2.25in,angle=270]{nfig3}
\caption{
\label{fig3}
Fermi level of amorphous silica as a function of the number of shifted
Chebyshev moments. The value obtained from direct diagonalization of
the Hamiltonian matrix is $-5.465$ eV and is plotted as a horizontal
line.
}
\end{figure}

In conclusion, we have presented an algorithm for the maximum entropy construction of a
distribution from its moments. The algorithm is very stable and accurate, and can
handle a large number of moments~\cite{note2} (up to 500). The usefulness of this algorithm
is demonstrated by constructing some numerically difficult distributions and by
applying it to amorphous silica to compute the electronic density of states
and the Fermi level.

We acknowledge the support of the National Science Foundation under Grant
Nos.\,DMR-0205858 and DMR-0310933.

\begin{thebibliography}{*99}

\bibitem{Shohat}
J. A. Shohat and J. D. Tamarkin, {\em The Problem of Moments}
(American Mathematical Society, Providence, Rhode Island, 1963).

\bibitem{Jaynes}
E. T. Jaynes, {\em Probability Theory: The Logic of Science} (Cambridge University Press, 2003).

\bibitem{Collins}
R. Collins and A. Wragg, J. Phys. A: Math. Gen. {\bf 10}, 1441 (1977).

\bibitem{Mead1}
L. R. Mead and N. Papanicolaou, J. Math. Phys. {\bf 25}, 2404 (1984).

\bibitem{Drabold1}
D. A. Drabold and O. F. Sankey, Phys. Rev. Lett. {\bf 70}, 3631 (1993).

\bibitem{Silver0}
R. N. Silver and H. R\"oder, Phys. Rev. E {\bf 56}, 4822 (1997).

\bibitem{Drabold2}
D. A. Drabold and G. L. Jones, J. Phys. A: Math. Gen. {\bf 24}, 4705 (1991).

\bibitem{Mello}
P. A. Mello and J.-L. Pichard, Phys. Rev. B {\bf 40}, R5276 (1989).

\bibitem{Poland}
D. Poland, J. Chem. Phys. {\bf 113}, 4774 (2000).

\bibitem{Steeb}
W.-H. Steeb, F. Solms, and R. Stoop, J. Phys. A: Math. Gen. {\bf 27}, L399 (1994).

\bibitem{Skilling}
J. Skilling, in {\em Maximum Entropy and Bayesian Methods}, edited
by J. Skilling (Kluwer, Dordrecht, 1989).

\bibitem{Turek}
I. Turek, J. Phys. C: Solid State Phys. {\bf 21}, 3251 (1988).

\bibitem{Brett}
G. L. Bretthorst (unpublished).

\bibitem{Shannon}
C. E. Shannon, Bell Syst. Tech. J. {\bf 27}, 379 (1948).

\bibitem{Bergman}
L. M. Bregman, U.S.S.R. Comput. Math. Math. Phys. {\bf 7}, 200 (1967).

\bibitem{Fang}
S.-C. Fang, J. R. Rajasekera, and H.-S. J. Tsao, {\em Entropy Optimization
and Mathematical Programming} (Kluwer Academic Publishers, Dordrecht, 1997).

\bibitem{Gordon}
R. Gordon, R. Bender, and G. T. Herman, J. Theor. Biol. {\bf 29}, 471 (1970).

\bibitem{Lent}
A. Lent, in {\em Image Analysis and Evaluation}, edited by R. Shaw (SPSE,
Washington, D.C., 1953).

\bibitem{num-recipe}
M. Abramowitz and I. A. Stegun, {\em Handbook of Mathematical
Functions} (Dover Publications, New York, 1972).

\bibitem{note1}
The transformed problem in terms of $a_{ij}^{'}$ and $\mu_i^{'}$
has exactly the same solution as the original problem. If the original
problem is infeasible (due to inaccurate values of higher power moments,
etc.), this is reflected in a lack of positivity of
$a_{ij}^{'}$ and $\mu_i^{'}$.

\bibitem{note2}
In principle, the number of moments that can be handled by the method is
limited only by the computational time. In the present work we
have gone up to 500 moments without any difficulty.

\end{thebibliography}

\end{document}