1: % Date Dec. 27, 2004 verified by AKB
2: % Date Dec. 24, 2004 by PB (Version 2)
3: % Date Dec. 15, 2004 by DAD (Revised version 1)
4: % Date May 15, 2004 by PB (version 1)
5:
6: \documentclass[prl,aps,twocolumn,showpacs]{revtex4}
7: \usepackage{graphicx,epsfig}
8:
9: % Some definitions are here
10:
11: \def\be{\begin{equation}}
12: \def\ee{\end{equation}}
13: \def\bea{\begin{eqnarray}}
14: \def\eea{\end{eqnarray}}
15:
16: \begin{document}
17:
18: \title{Maximum entropy and the problem of moments: A stable algorithm}
19:
20: \author{K.\,Bandyopadhyay and A.\,K.\,Bhattacharya}
21: \email{physakb@yahoo.com}
22: \affiliation{Department of Physics, University of Burdwan, Burdwan, WB 713104, India}
23:
24: \author{Parthapratim Biswas}
25: \email{biswas@phy.ohiou.edu}
26: \author{D.\,A.\,Drabold}
27: \email{drabold@ohio.edu}
28: \affiliation{Department of Physics and Astronomy, Ohio University, Athens, OH 45701}
29:
30: \pacs{71.23.Cq, 71.55.Jv, 02.30.Zz}
31:
32: \begin{abstract}
33:
34: We present a technique for entropy optimization to calculate a distribution from
35: its moments. The technique is based upon maximizing a discretized form of the
36: Shannon entropy functional by mapping the problem onto a dual space where an
37: optimal solution can be constructed iteratively. We demonstrate the performance
38: and stability of our algorithm with several tests on numerically difficult functions.
We then consider an electronic structure application, the electronic density of
states of amorphous silica, and study the convergence of the Fermi level with an increasing
number of moments.
42: \end{abstract}
43:
44: \maketitle
45:
46: One of the fixed themes of physics is the solution of inverse problems.
A ubiquitous example in theoretical physics is the ``Classical Moment
Problem'' (CMP), in which only a finite set of power moments of a
non-negative distribution function $p$ is known, and the full
distribution is needed~\cite{Shohat}. The solution for $p$ is clearly {\it
not unique} for a finite set of moments. This non-uniqueness suggests
the need for a ``best guess'' for $p$, based upon the available
information. With its ultimate roots in nineteenth century statistical
mechanics and a subsequent strong justification based upon probability
theory, the ``maximum entropy'' (maxent) method has provided an
56: extremely successful variational principle to address this type of inverse
57: problem~\cite{Jaynes}. Collins and Wragg used the maxent method to solve
58: the CMP for a modest number of moments~\cite{Collins}. In a
59: comprehensive paper with seminal applications, Mead and Papanicolaou~\cite{Mead1}
60: solved the CMP with
61: maximum entropy techniques and proposed the first practical numerical
62: scheme to solve the moment problem for up to 15 moments. In a
63: host of subsequent papers, the utility of the method as an unbiased and
64: surprisingly efficient (rapidly convergent) solution of the CMP has
65: been established. The principle has been used extensively in a number of
66: diverse applications ranging from image construction to spectral analysis,
67: large-scale electronic structure problems~\cite{Drabold1,Silver0}, series
68: extrapolation and analytic continuation~\cite{Drabold2}, quantum electronic transport~\cite{Mello},
69: ligand-binding distribution in polymers~\cite{Poland}, and transport
70: planning~\cite{Steeb}.
71:
72:
A number of maximum entropy algorithms~\cite{Skilling, Mead1, Turek, Silver0, Brett}
have been developed over the last two decades. Many of these algorithms
(but not all) are limited in the number of moments that they can deal with
and become unreliable when the number of constraints exceeds a problem-dependent
upper limit. As the number of moments increases, the calculation of moments
78: (particularly the power moments) becomes more sensitive to machine
79: precision and the optimization problem becomes ill-conditioned. It has
80: been observed that implementation of a maxent algorithm with more than 20
81: power moments is notoriously difficult even with extended precision arithmetic
82: and it rarely gives any further information on the nature of the distribution.
The use of orthogonal polynomials as a basis set significantly improves the
84: accuracy and remedies most of the problems that one encounters with power
85: moments.
86:
In this paper we present an iterative approach to construct the maxent
solution of the CMP, which is based upon discretization of the Shannon entropy
functional~\cite{Shannon}.
The essential idea is to discretize the Shannon entropy
and map the problem from the primal space onto the dual space, where an
optimal solution can be constructed iteratively without
the need for matrix inversion. We discuss theoretical ideas and develop
94: algorithms that can be used with both power and Chebyshev moments. The
95: stability and the accuracy of the method are discussed with reference to
96: two numerically non-trivial examples -- a uniform distribution and a double-delta function.
We illustrate the usefulness of our technique by computing the
electronic density of states (EDOS) of amorphous silica, with
particular emphasis on convergence of the Fermi level as a function of the number
of moments.
101:
102:
The starting point of our approach is a discretized
form of the Shannon entropy functional~\cite{Shannon} $S[p]$,
obtained with a quadrature formula:
\be
\label{eq-010}
S = - \int p(x) \ln p(x) \, dx \approx - \sum_{j=1}^n w_j \, p_j \ln p_j .
\ee

Here $w_j$ and $x_j$ are the weights and abscissas of any accurate
quadrature formula, say Gauss-Legendre, with $p_j \equiv p(x_j)$, and without loss of
generality we restrict ourselves to $x \in [0,1]$. We want to maximize
$S$ subject to the discretized moment constraints
115:
116: \be
117: \label{eq-020}
118: \sum_{j=1}^{n} w_j\, x_j^i \, p_j = \sum_{j=1}^n a_{ij} \tilde p_j = \mu_i, \; i = 1, 2, ..., m
119: \ee
120:
where we define $\tilde p_j = w_j\, p_j$ and $a_{ij} = x_j^i$.
The entropy optimization program (EOP) can now be stated as the optimization of
the Lagrangian function, whose first term is $-S$,
%
\be
\label{eq-050}
L({\bf \tilde p}, {\bf \tilde \eta}) \equiv \sum_{j=1}^n \tilde p_j \, \ln \left(\frac{\tilde p_j}{w_j}\right)
- \sum_{i=1}^m \tilde \eta_i \left(\sum_{j=1}^n a_{ij} \tilde p_j - \mu_i \right),
\ee
130:
131: and the solution can be written as
132:
133: \be
134: \label{eq-060}
135: \tilde p_j = w_j \exp\left(\sum_{i=1}^m a_{ij} \tilde \eta_i - 1 \right), \: \: j = 1, 2,..., n
136: \ee
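As a brief check of Eq.(\ref{eq-060}), the stationarity condition of the
Lagrangian in Eq.(\ref{eq-050}) with respect to each $\tilde p_j$ can be
written out explicitly:
\be
\frac{\partial L}{\partial \tilde p_j} = \ln \left(\frac{\tilde p_j}{w_j}\right) + 1
- \sum_{i=1}^m \tilde \eta_i \, a_{ij} = 0. \nonumber
\ee
Solving for $\tilde p_j$ reproduces the exponential form above; the $-1$ in
the exponent comes from differentiating $\tilde p_j \ln \tilde p_j$.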
137:
138: Since ${\bf w} \ge 0$, Eq.(\ref{eq-060}) implies that ${\bf \tilde p} \ge $ 0.
139: Furthermore, the conditions in Eqs.(\ref{eq-020}) and (\ref{eq-060}) can be
140: lumped together
141:
142: \be
143: \label{eq-070}
144: h_i(\tilde \eta) \equiv \sum_{j=1}^n a_{ij} \, w_j \, \exp \left(\sum_{k=1}^m a_{kj}
145: \tilde \eta_k - 1 \right) - \mu_i = 0, \; \; \forall \; i.
146: \ee
147:
We see from Eq.(\ref{eq-070}) that the original constrained optimization
program is reduced to an {\em unconstrained convex optimization program}
involving the dual variables:

\be
\label{eq-080}
\min_{\tilde \eta \in R^m} \: d(\tilde \eta) \equiv \sum_{j=1}^n w_j \exp
\left(\sum_{i=1}^m a_{ij}\tilde \eta_{i} - 1\right) - \sum_{i=1}^m \mu_i \tilde \eta_i .
\ee
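To make the link between Eqs.(\ref{eq-070}) and (\ref{eq-080}) explicit, the
first and second derivatives of the dual objective can be worked out directly.
In the short check below, ${\bf v}$ is an arbitrary real $m$-vector introduced
only for this purpose and $\tilde p_j(\tilde \eta)$ is the expression in
Eq.(\ref{eq-060}):
\bea
\frac{\partial d}{\partial \tilde \eta_i} &=& \sum_{j=1}^n a_{ij}\, w_j
\exp\left(\sum_{k=1}^m a_{kj}\tilde \eta_k - 1\right) - \mu_i = h_i(\tilde \eta), \nonumber \\
\sum_{i,l=1}^m v_i \, \frac{\partial^2 d}{\partial \tilde \eta_i \, \partial \tilde \eta_l} \, v_l
&=& \sum_{j=1}^n \tilde p_j(\tilde \eta) \left(\sum_{i=1}^m v_i \, a_{ij}\right)^2 \ge 0. \nonumber
\eea
The gradient vanishes exactly when the moment constraints hold, and the
Hessian is positive semidefinite because $\tilde p_j \ge 0$, which is why
the dual program is convex.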
157:
If the dual optimization program stated above has an optimal solution
${\bf \tilde \eta^*}$, the solution $\tilde p_j ({\bf \tilde \eta^*})$
can be obtained from Eq.(\ref{eq-060}). Bregman has proposed an iterative
method to minimize the dual objective function $d({\bf \tilde \eta})$
taking {\em only one} dual variable at a time~\cite{Bergman}. The method
starts with an arbitrarily chosen ${\bf \tilde \eta^0} \in R^m$, and
then cyclically updates all the dual variables as follows:
165:
Step 1: Start with any ${\bf \tilde \eta^0} \in R^m$ and a sufficiently small
tolerance level $\epsilon > 0$. Set $k = 0$ and obtain $\tilde p_j^0$ from Eq.(\ref{eq-060}).

Step 2: Let $i = (k \bmod m) + 1$. Solve the following equation for $\lambda^k$:
170:
171: \bea
172: \label{eq-105}
173: \phi_i^k(\lambda^k) = \sum_{j=1}^n a_{ij} \tilde p_j^k \exp(a_{ij}\lambda^k) - \mu_i = 0
174: \eea
175:
Step 3: Update each component of ${\bf \tilde \eta}$:

\be
\label{eq-110}
\tilde \eta_l^{k+1} = \tilde \eta_l^k + \lambda^k \; \mbox{if}\; l = i, \; \;
\tilde \eta_l^{k+1} = \tilde \eta_l^k \; \mbox{if} \; l \ne i .
\ee
183:
Step 4: If Eq.(\ref{eq-070}) is satisfied within the preset level of
tolerance, then stop with ${\bf \tilde \eta^*} = {\bf \tilde \eta^{k+1}}$,
and obtain the primal solution from Eq.(\ref{eq-060}). Otherwise,
calculate
%
\be
\tilde p_j^{k+1} = w_j \exp\left(\sum_{i=1}^m a_{ij} \tilde
\eta_i^{k+1} -1\right), \; \; \; j = 1,2,\ldots,n,
\ee
%
replace $k$ by $k+1$, and go to Step 2.
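
The form of Eq.(\ref{eq-105}) can be understood from a short calculation.
Because only the component $\tilde \eta_i$ changes in iteration $k$,
Eq.(\ref{eq-060}) implies a multiplicative update of the distribution:
\be
\tilde p_j^{k+1} = w_j \exp\left(\sum_{l=1}^m a_{lj} \tilde \eta_l^{k}
+ a_{ij} \lambda^k - 1\right) = \tilde p_j^{k} \exp(a_{ij}\lambda^k). \nonumber
\ee
Eq.(\ref{eq-105}) is therefore the $i$-th moment constraint imposed on the
updated distribution, $\sum_{j} a_{ij} \tilde p_j^{k+1} = \mu_i$. Each
iteration enforces one constraint exactly and in general perturbs the
others, which is why the dual variables are updated cyclically.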
195:
From a computational point of view, the most problematic part of the above
algorithm is the solution of the nonlinear Eq.(\ref{eq-105}) in Step 2.
In a variant of the above scheme known as the multiplicative algebraic reconstruction
technique (MART)~\cite{Fang, Gordon}, one uses the following closed-form expression
to approximate the correction term $\lambda^k$
201: \be
202: \label{eq-120}
203: \lambda_i^k = \ln \left(\frac{\mu_i}{\sum_{j=1}^n a_{ij}\tilde p_j^k}\right)
204: \ee
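One simple way to motivate Eq.(\ref{eq-120}), offered here as a sketch
rather than as the derivation given in the cited references, is to consider
the special case in which every $a_{ij}$ is either 0 or 1. Terms with
$a_{ij} = 0$ drop out of Eq.(\ref{eq-105}), and the remaining terms share
the common factor $e^{\lambda^k}$:
\be
\sum_{j=1}^n a_{ij} \tilde p_j^k \exp(a_{ij}\lambda^k) =
e^{\lambda^k} \sum_{j=1}^n a_{ij} \tilde p_j^k = \mu_i
\;\;\; \mbox{for} \;\; a_{ij} \in \{0,1\}. \nonumber
\ee
In this case Eq.(\ref{eq-120}) solves Eq.(\ref{eq-105}) exactly; for
$0 \le a_{ij} \le 1$ it serves as an approximation. With illustrative
numbers of our own choosing, $\mu_i = 0.50$ and
$\sum_j a_{ij}\tilde p_j^k = 0.40$ give $\lambda_i^k = \ln 1.25 \approx 0.223$,
so the correction is positive when the current moment is too small. The
logarithm also requires $\mu_i$ and $\sum_j a_{ij}\tilde p_j^k$ to have
the same sign.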
205:
206: Step 3 of the algorithm is now modified by substituting the expression
207: above for $\lambda_i^k$ in Eq.(\ref{eq-110}). A convergence theorem for the
208: modified algorithm can be found in Lent~\cite{Lent}. It is, however,
209: quite easy to see that the algorithm will fail unless for every
$i = 1, 2, \ldots, m,$ either
211:
212: \be
213: \label{eq-130}
214: \mu_i > 0 \;\;\; \; \mbox{and} \;\;\; 0 \le a_{ij} \le 1, \; \; j = 1, 2, ..., n
215: \ee
216: {\hskip 4cm or}
217: \be
218: \label{eq-140}
219: \mu_i < 0 \;\;\; \; \mbox{and} \;\;\; 0 \ge a_{ij} \ge -1, \; \; j = 1, 2, ..., n
220: \ee
221:
For power moments on $[0,1]$ we have $a_{ij} = x_j^i \in [0,1]$ and $\mu_i > 0$,
so condition (\ref{eq-130}) holds and convergence of the solution
of our discretized EOP is assured.

The EOP algorithm above can only be used provided that the condition
stated by the inequality (\ref{eq-130}) or (\ref{eq-140}) is satisfied.
This restricts the algorithm to power moments, since
neither condition necessarily holds for other polynomial moments.
229: In order to work with Chebyshev polynomials, we first employ the averages
230: of shifted Chebyshev polynomials~\cite{num-recipe} of the first kind
231: $T_n^{*}(x) = T_n(2x-1)$ to recast the entropy optimization program
232: (EOP) given by statement (\ref{eq-070}). The only change needed for
233: this purpose is to redefine $a_{ij}$ by $a_{ij} = T_i^{*}(x_j)$.
234:
235: Our next step is to find a transformation that will convert
236: the EOP into an equivalent problem in which all the program
237: parameters are non-negative. For finding the necessary transformation, we
define for $i = 1, 2, 3, \ldots, m,$
\be
u_i = [\max_j (-a_{ij}) ] + 1. \nonumber
\ee
Obviously, for $i = 1, 2, 3, \ldots, m$ and $j = 1, 2, 3, \ldots, n,$
\be
(u_i + a_{ij}) > 0. \nonumber
\ee

Let us now define for $i = 1, 2, 3, \ldots, m,$
\be
M_i \equiv \max_j(u_i + a_{ij}) \: \: ; \: \: t_i \equiv \frac{1}{m(M_i+1)}. \nonumber
\ee

It is easy to see that the following relations hold for $i = 1, 2, 3, \ldots, m$:

\bea
&&M_i > 0, \; \; t_i > 0, \nonumber \\
&&(M_i+1)\,t_i = \frac{1}{m}, \; \; t_i\,(u_i + a_{ij}) \le t_i\,M_i < \frac{1}{m}.
\nonumber
\eea
259:
260:
For $i = 1, 2, 3, \ldots, m$ and $j = 1, 2, 3, \ldots, n,$ let us define
\be
a_{ij}^{'} \equiv t_i(u_i + a_{ij}).
\ee
Evidently, for $i = 1, 2, 3, \ldots, m$ and $j = 1, 2, 3, \ldots, n,$ we have
\be
\frac{1}{m} > a_{ij}^{'} > 0 \: \: ; \: \: 0 < \sum_{i=1}^m a_{ij}^{'} =
\sum_{i=1}^m t_i(u_i + a_{ij}) < 1.
\ee
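
As an illustration of these definitions, consider the shifted Chebyshev
case, where $-1 \le a_{ij} = T_i^{*}(x_j) \le 1$. The following bounds are a
worked consequence of the definitions above and are not taken from the
numerical calculations:
\bea
&& u_i \le 2, \;\;\; u_i + a_{ij} \ge 1, \;\;\; M_i \le 3, \;\;\;
t_i \ge \frac{1}{4m}, \nonumber \\
&& \frac{1}{4m} \le a_{ij}^{'} < \frac{1}{m}. \nonumber
\eea
The transformed coefficients are thus bounded away from zero as well as
from $1/m$, so the condition in Eq.(\ref{eq-130}) on the coefficients is
met with room to spare.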
270:
Note that if ${\bf \tilde p}$ is a feasible solution to
the EOP involving averages of $T_n^{*}(x)$ and is normalized,
$\sum_{j=1}^n \tilde p_j = 1$, then for $i = 1, 2, 3, \ldots, m$
\be
\sum_{j=1}^n a_{ij}^{'}\tilde p_j = \sum_{j=1}^n t_i(u_i + a_{ij}) \tilde p_j = t_i(u_i
+ \mu_i).
\ee
Hence, we define for $i = 1, 2, 3, \ldots, m,$
\be
\label{eq-new}
\mu_i^{'} \equiv t_i\,(u_i + \mu_i) = \sum_{j=1}^n a_{ij}^{'} \, \tilde p_j.
\ee

It is easy to verify that $1/m > \mu_i^{'}$ for $i = 1, 2, 3, \ldots, m.$
The transformed EOP thus has the same form as before, except that
we use Eq.(\ref{eq-new}) in place of Eq.(\ref{eq-020}). Since both
$a_{ij}^{'}$ and $\mu_i^{'}$ can take only positive values, a feasible
solution to the original program can now be obtained by replacing $a_{ij}$
and $\mu_i$ in Eq.(\ref{eq-070}) by $a_{ij}^{'}$ and
$\mu_i^{'}$~\cite{note1}.
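
The bound on $\mu_i^{'}$ can be checked in one line. The sketch below
uses $\tilde p_j \ge 0$ together with the normalization
$\sum_{j} \tilde p_j = 1$, which is an assumption needed for
Eq.(\ref{eq-new}) to hold, and the strict inequalities
$0 < a_{ij}^{'} < 1/m$:
\be
0 < \mu_i^{'} = \sum_{j=1}^n a_{ij}^{'} \, \tilde p_j <
\frac{1}{m} \sum_{j=1}^n \tilde p_j = \frac{1}{m}. \nonumber
\ee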
290:
We consider two numerically difficult examples, a uniform distribution and a
double-delta function, to study the stability and accuracy of the algorithm.
The Chebyshev moments of these two functions can be calculated exactly. Earlier
efforts to reproduce these distributions have met with limited success because
of the difficulty in matching a sufficient number of moments and because of the singular
nature of the functions. We examine how the algorithm performs
for (a) the uniform distribution $f(x) = 1$, $x \in [0,1]$, and (b) the
double-delta function $g(x) = \delta (x-\frac{1}{4})+\delta(x-\frac{3}{4})$, $x \in [0,1]$.
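
For reference, the exact shifted Chebyshev moments of the two test
functions follow from standard identities. In the worked expressions below,
the superscripts $(f)$ and $(g)$ are our own labels, and we use the
substitution $y = 2x - 1$ together with $T_i(\cos\theta) = \cos i\theta$
and $T_i(-y) = (-1)^i \, T_i(y)$:
\bea
\mu_i^{(f)} &=& \frac{1}{2} \int_{-1}^{1} T_i(y) \, dy =
\frac{1}{1 - i^2} \;\; (i \; \mbox{even}), \;\;\; 0 \;\; (i \; \mbox{odd}), \nonumber \\
\mu_i^{(g)} &=& T_i\left(-\frac{1}{2}\right) + T_i\left(\frac{1}{2}\right) =
\left[1 + (-1)^i\right] \cos\left(\frac{i\pi}{3}\right). \nonumber
\eea
For example, $\mu_2^{(f)} = -1/3$ and $\mu_2^{(g)} = -1$. As written, $g(x)$
carries total weight $\mu_0^{(g)} = 2$, so its moments would be halved if
unit normalization is required.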
299:
The algorithm reproduces the uniform distribution correctly to five decimal
places; the first 25 shifted Chebyshev moments are sufficient
for this purpose. The fact that the end points are reproduced so accurately,
without any spurious oscillations, is a definite strength of this approach
and reflects the stability and accuracy of our algorithm. In figure~\ref{fig1},
we have plotted
the result for the double-delta function. The result is equally convincing and
establishes the usefulness of this method relative to the other existing ones
in the literature.
309: \begin{figure}
310: \includegraphics[width=2.25in,height=2.25in,angle=270]{nfig1}
311: \caption{
312: \label{fig1}
Reconstruction of a double-delta function $g(x)=\delta(x-\frac{1}{4})
+\delta(x-\frac{3}{4})$ from shifted Chebyshev moments.
315: }
316: \end{figure}
In addition to these examples, we have also tested our algorithm by reconstructing a
tent map, a semicircular distribution, a square-root distribution and a distribution
with a gap in the spectrum. In all these cases, the algorithm correctly reproduces
all the features of the distributions without failing. These results
demonstrate that the algorithm is stable and accurate, and is capable
of reproducing some very uncommon distributions (such as the double-delta function) without
any difficulty.
324:
325: We now consider a practical case where exact moments are not known but approximate
326: moments are available. An archetypal example is the calculation of electronic density
327: of states from its moments. In the context of solid state physics, maxent
328: has been used profitably to calculate the density of electronic (vibrational) states
329: from a knowledge of the moments of the Hamiltonian (Dynamical) matrix. The computation
330: of moments itself is an interesting problem in this field and there are methods
331: available in the literature that specifically address this
332: issue~\cite{Drabold1, Skilling}.
333: \begin{figure}
334: \includegraphics[width=2.25in,height=2.25in,angle=270]{nfig2}
335: \caption{
336: \label{fig2}
337: Normalized electronic density of states/eV (dotted line) of amorphous silica
338: using the first 60 shifted Chebyshev moments. The distribution of energy
eigenvalues (points) from direct diagonalization of the Hamiltonian
340: matrix is also plotted in the figure. Normalized Fermi level is at
341: 0.595 eV.
342: }
343: \end{figure}
Here one is interested in determining physical quantities such as the Fermi level
and band energy of large systems (e.g.~clusters, biological macromolecules, etc.)
without diagonalizing the Hamiltonian matrix. For amorphous semiconductors,
this is particularly suitable because disorder-induced scattering (of electrons)
washes out the van Hove singularities in the electronic spectrum. A stable and
accurate maxent algorithm would therefore be very useful in calculating
electronic properties of amorphous semiconductors. The two examples discussed
above suggest that we should be able to reproduce a complex electronic spectrum with
a gap (or gaps) to a high degree of precision, and hence the Fermi level and band
energy. For metallic systems, the determination of the Fermi energy is a non-trivial
problem for $O(n)$ methods. The primary requirements for a maxent algorithm in this
case are that 1) it must produce the distribution accurately and 2) it must
do so in a stable way using a sufficient number of moments to correctly reproduce
the singularities of the spectrum. Our algorithm
satisfies these requirements and therefore may offer an alternative approach to
computing the Fermi energy of metallic systems.
360:
In figure~\ref{fig2}, we have plotted the EDOS of amorphous silica using the first 60
moments and compared it to the result obtained by direct diagonalization of the
Hamiltonian matrix. It is clear from the figure that all the features of the EDOS
are correctly reproduced by our maxent algorithm. Finally, in figure~\ref{fig3} we
have plotted the variation of the Fermi energy with the number of moments. The Fermi
energy is computed by integrating the normalized density of states up to
the correct total number of electrons. It is clear from figure~\ref{fig3} that
the Fermi energy starts to converge after the first 30 moments and is converged
beyond 40 moments.
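
The integration condition for the Fermi level can be written schematically
as follows. The symbols $\nu$ (the fraction of occupied states), $x_F$
(the Fermi level on the scaled interval), and $E_{\min}$, $E_{\max}$ (the
bounds of the spectrum) are our notation for this sketch, and a linear
mapping of the spectrum onto $[0,1]$ is assumed:
\bea
\int_0^{x_F} p(x) \, dx &=& \nu, \nonumber \\
E_F &=& E_{\min} + x_F \, (E_{\max} - E_{\min}). \nonumber
\eea
In the discretized setting this amounts, approximately, to accumulating
$\tilde p_j$ over the abscissas in increasing order until the running sum
reaches $\nu$.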
370: \begin{figure}
371: \includegraphics[width=2.25in,height=2.25in,angle=270]{nfig3}
372: \caption{
373: \label{fig3}
Fermi level of amorphous silica as a function of the number of shifted
Chebyshev moments. The value obtained from direct diagonalization of
the Hamiltonian matrix is -5.465 eV and is plotted as a horizontal
line in the figure.
378: }
379: \end{figure}
380:
In conclusion, we have presented an algorithm for maximum entropy construction of a
distribution from its moments. The algorithm is stable and accurate, and can
handle a large number of moments~\cite{note2} (up to 500). The usefulness of this algorithm
is demonstrated by constructing some numerically difficult distributions and
by applying it to amorphous silica to compute the electronic density of states
and the Fermi level.

We acknowledge the support of the National Science Foundation under Grant
Nos.\,DMR-0205858 and DMR-0310933.
390:
391:
392:
393:
394:
395: \begin{thebibliography}{*99}
396:
\bibitem{Shohat}
J. A. Shohat and J. D. Tamarkin, {\em The Problem of Moments}
(American Mathematical Society, Providence, Rhode Island, 1963).

\bibitem{Jaynes}
E. T. Jaynes, {\em Probability Theory: The Logic of Science} (Cambridge University Press, 2003).

\bibitem{Collins}
R. Collins and A. Wragg, J. Phys. A: Math. Gen. {\bf 10}, 1441 (1977).

\bibitem{Mead1}
L. R. Mead and N. Papanicolaou, J. Math. Phys. {\bf 25}, 2404 (1984).

\bibitem{Drabold1}
D. A. Drabold and O. F. Sankey, Phys. Rev. Lett. {\bf 70}, 3631 (1993).

\bibitem{Silver0}
R. N. Silver and H. R\"oder, Phys. Rev. E {\bf 56}, 4822 (1997).

\bibitem{Drabold2}
D. A. Drabold and G. L. Jones, J. Phys. A: Math. Gen. {\bf 24}, 4705 (1991).

\bibitem{Mello}
P. A. Mello and J.-L. Pichard, Phys. Rev. B {\bf 40}, R5276 (1989).

\bibitem{Poland}
D. Poland, J. Chem. Phys. {\bf 113}, 4774 (2000).

\bibitem{Steeb}
W.-H. Steeb, F. Solms, and R. Stoop, J. Phys. A: Math. Gen. {\bf 27}, L399 (1994).

\bibitem{Skilling}
J. Skilling, in {\em Maximum Entropy and Bayesian Methods}, edited
by J. Skilling (Kluwer, Dordrecht, 1989).

\bibitem{Turek}
I. Turek, J. Phys. C: Solid St. Phys. {\bf 21}, 3251 (1988).

\bibitem{Brett}
G. L. Bretthorst (unpublished).

\bibitem{Shannon}
C. Shannon, Bell Syst. Tech. J. {\bf 27}, 379 (1948).

\bibitem{Bergman}
L. M. Bregman, U.S.S.R. Comput. Maths. and Math. Phys. {\bf 7}, 200 (1967).

\bibitem{Fang}
S.-C. Fang, J. R. Rajasekera, and H.-S. J. Tsao, {\em Entropy Optimization
and Mathematical Programming} (Kluwer Academic Publishers, Dordrecht, 1997).

\bibitem{Gordon}
R. Gordon, R. Bender, and G. T. Herman, J. Theoret. Biol. {\bf 29}, 471 (1970).

\bibitem{Lent}
A. Lent, in {\em Image Analysis and Evaluation}, edited by R. Shaw (SPSE,
Washington, D.C., 1953).

\bibitem{num-recipe}
M. Abramowitz and I. A. Stegun, {\em Handbook of Mathematical
Functions} (Dover Publications, New York, 1972).
458:
459: \bibitem{note1}
The transformed problem in terms of $a_{ij}^{'}$ and $\mu_i^{'}$
has exactly the same solution as the original problem. If the original
problem is infeasible (due to inaccurate values of higher power moments,
etc.), this is reflected in a loss of positivity of
$a_{ij}^{'}$ and $\mu_i^{'}$.
465:
466: \bibitem{note2}
467: In principle there is no limit to the number of moments that can be handled
468: by the method at the expense of computational time. In the present context we
469: have gone up to 500 moments without any difficulty.
470:
471: \end{thebibliography}
472:
473: \end{document}
474: