1: %%%%%%%%%%%%%%%%%% file template.tex %%%%%%%%%%%%%%%%%%%%
2: % %
3: % Copyright (c) Optical Society of America, 1992. %
4: % %
5: %%%%%%%%%%%%%%%%%%% November 17, 1992 %%%%%%%%%%%%%%%%%%%
6: %
7: % THIS FILE IS A TEMPLATE TO PRODUCE AN ARTICLE SUBMISSION
8: % TO THE OSA JOURNALS, JOSA-A, JOSA-B, and APPLIED OPTICS.
9: %
10: % THIS TEMPLATE CONTAINS TYPESETTING COMMANDS WHICH BEGIN WITH A
11: % BACKSLASH. THESE COMMANDS WILL BE READ BY LATEX, USING THE
12: % REVTEX 3.0 STANDARD MACROS. PLEASE FILL IN THE REQUIRED DATA
13: % FOR THE MACROS, BUT DO NOT ALTER THE DEFINITIONS.
14: %
15: % EXAMPLE: IN \author{Authors' names} , PLEASE FILL IN THE
16: % AUTHORS' NAME(S).
17: %
18: % COMMENTS BEGIN WITH THE PERCENT (%) SYMBOL. AFTER A %, ANY
19: % DATA ON THE REST OF A LINE WILL NOT PRINT.
20: %
21: \documentstyle[aps,manuscript]{revtex} % DON'T CHANGE
22: %
23: %
24: \newcommand{\MF}{{\large{\manual META}\-{\manual FONT}}}
25: \newcommand{\manual}{rm} % Substitute rm (Roman) font.
26: \newcommand\bs{\char '134 } % add backslash char to \tt font
27: %
28: %
29:
30: \makeatletter
31: \newcounter{subeqncnt}
32: \def\thesubeqncnt{\alph{subeqncnt}}
33: \def\subequations{\begingroup%
34: \stepcounter{equation}\edef\@tempa{\theequation}%
35: \let\c@equation\c@subeqncnt\c@subeqncnt\z@
36: \edef\theequation{\@tempa\noexpand\thesubeqncnt}}
37: \let\endsubequations\endgroup
38: \makeatother
39:
40: \begin{document} % INITIALIZE - DONT CHANGE
41: %
42: %
43: %
44: \title{Efficient Recursion Method for Inverting Overlap Matrix}
45: \author{T. Ozaki}
46: \address{
47: RICS,
48: National Institute of Advanced Industrial Science and Technology (AIST),
49: central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan
50: and
51: JRCAT-ATP,
52: central 4, 1-1-1 Higashi, Tsukuba,
53: Ibaraki 305-0046, Japan
54: }
55: %
56: \maketitle
57: \begin{abstract} % DON'T CHANGE THIS LINE
58: A new O($N$) algorithm based on a recursion method, in which the
59: computational effort is proportional to the number of atoms $N$,
60: is presented for calculating the inverse of an overlap matrix which
is needed in electronic structure calculations with the
62: non-orthogonal localized basis set. This efficient inverting
63: method can be incorporated in several O($N$) methods
64: for diagonalization of a generalized secular equation.
65: By studying convergence properties of the 1-norm of an error matrix
66: for diamond and fcc Al, this method is compared to three other O($N$)
67: methods (the divide method, Taylor expansion method, and Hotelling's
68: method) with regard to computational accuracy and efficiency within
69: the density functional theory.
70: The test calculations show that the new method
71: is about one-hundred times faster than the divide method in
72: computational time to achieve the same convergence for both diamond
73: and fcc Al, while the Taylor expansion method and Hotelling's method
74: suffer from numerical instabilities in most cases.
75: \end{abstract}
76: %
77: \vspace{2cm}
78:
79: The development of O($N$) methods
80: \cite{Pettifor,Ozaki,Goedecker,Stephan,Yang,Galli,Mauri,Daw,Li,Palser}
81: and the revival of localized
82: orbitals as a basis set
83: \cite{Sankey,Kobayashi,Kurita,Kobayashi2,Hierse,Hernandez,Ordejon,Sanchez,Horsfield}
have advanced during the last decade
85: in order to extend the applicability of the first-principles molecular
86: dynamics (FPMD) simulations using the plane wave expansion and the
87: Car-Parrinello method within density functional
88: theories (DFT) \cite{Payne}.
However, only a few applications of these ${\rm O}(N)$ methods to large
90: systems have been reported within the DFT calculations
91: \cite{Sanchez,Bowler,Applications}.
Although methods based on the localized description have a number of
limitations \cite{Applications}, one of them
is that several O($N$) methods require evaluating the inverse
of the overlap matrix $S$, which arises from the non-orthogonality among
the localized orbitals.
97:
In the Fermi operator expansion (FOE) method generalized to
a non-orthogonal basis \cite{Stephan}, we need to calculate the inverse of
the overlap matrix to construct the modified Hamiltonian $H'\equiv S^{-1}H$,
although Stephan et al. have proposed solving a linear equation $SH'=H$
with the cutoff radii of $H$ instead of calculating the inverse of
the overlap matrix.
104: In the density matrix (DM) method \cite{Daw,Li,Palser} which is
105: a promising approach for materials with a wide gap, fortunately,
106: the evaluation of the inverse is not required during the optimization
107: of grand potentials, although we have to evaluate the inverse of the
108: overlap matrix for a good initial guess of the density matrix \cite{Palser}.
109: The block bond-order potential (BOP) method \cite{Ozaki}, which has good
110: convergence properties for both insulators and metals, also
requires the evaluation of the modified Hamiltonian $H'$ as in
the FOE method. Even if the sparseness of the overlap matrix is exploited,
the computational cost of the inverse calculation scales as the second
power of the number of atoms $N$. Therefore, an efficient O($N$) method
for inverting the overlap matrix should be developed.
116:
117: So far, several O($N$) inverting methods have been proposed.
118: Gibson et al. used a simple method in which a linear equation
119: $SH'=H$ constructed for a finite cluster is solved without
120: explicit calculation of $S^{-1}$ \cite{Gibson}.
121: Mauri et al. considered approximating the inverse of
122: overlap matrix by the Taylor expansion \cite{Mauri}. The approach could be
123: an O($N$) inverting method when the matrix elements in the $p$th
124: moment $O^p$ of the overlap matrix $O$ are cut at a finite distance.
125: Palser and Manolopoulos proposed to evaluate the inverse
126: by Hotelling's method which is similar to the iterative
127: purification algorithm of the DM method \cite{Palser}.
128: The iterative calculation can be performed in O($N$) operations,
129: provided that the cutoff of matrix elements at a finite distance is
130: introduced in the product of two matrices.
131: It is worth pointing out that the ideas of these
132: O($N$) inverting methods are analogous to those of the
133: O($N$) methods for the diagonalization.
134: The divide method by Gibson et al. \cite{Gibson}, the Taylor expansion
135: method \cite{Mauri}, and Hotelling's method \cite{Palser} strategically
136: and mathematically correspond to the divide and conquer method \cite{Yang},
137: the FOE method \cite{Goedecker,Stephan}, and the DM method
138: \cite{Daw,Li,Palser}, respectively.
Therefore, one may expect that these O($N$) inverting methods
have convergence properties for realistic materials
similar to those of the O($N$) methods for the diagonalization \cite{Comparison}.
However, it remains to be seen whether this expectation holds
in practice.
144:
145: In this paper we propose a new O($N$) method for calculating
146: the inverse of the overlap matrix which is based on a resolvent and
147: the block Lanczos algorithm. The new method is compared
148: with the other three methods in terms of the computational accuracy
and efficiency. Thus, the aim of this paper is to clarify the
150: applicability of these four O($N$) inverting methods for
151: realistic materials.
152: The paper is organized as follows. In Sec. II we present the theory
153: of a new O($N$) inverting method based on a recursion method,
154: and also summarize the three other O($N$) inverting methods.
155: In Sec. III we discuss the convergence properties of these four
156: O($N$) inverting methods for the diamond and fcc Al within
157: the DFT calculations using the 1-norm of an error matrix
which will be related to the error in the eigenvalues in that section.
159: In Sec. IV we conclude with clear characterization of the
160: four O($N$) inverse methods.
161:
162: \begin{center}
163: {\bf II.~THEORY}
164: \end{center}
165:
166: \begin{center}
167: {\bf A. Recursion method}
168: \end{center}
169:
170: It is assumed that one-particle wave functions are expanded
171: by a localized orbital basis set $(\vert i\alpha\rangle)$, where
172: $i$ is a site index and $\alpha$ is an orbital index.
173: The localized orbitals could be Slater-type
174: \cite{Kobayashi,Kurita,Kobayashi2}, Gaussian-type \cite{Hierse},
175: and numerical orbitals \cite{Sankey,Hernandez} obtained by
176: DFT calculations for atoms.
In most cases, the orbitals are mutually non-orthogonal,
178: leading to an overlap matrix $S$ defined by
179: \begin{eqnarray}
180: S_{i\alpha,j\beta} = \langle i\alpha \vert \hat{S}\vert j\beta \rangle,
181: \end{eqnarray}
182: where $\hat{S}$ is the overlap operator which is introduced as a matter
183: of form in order to emphasize the similarity
184: between the new inverting method and the block BOP method \cite{Ozaki},
185: although the overlap operator generally should be the identity operator I.
186: The overlap integral exponentially decays in real space
187: because of the localized nature of the orbitals, so that
188: the overlap matrix $S$ is sparse. Here we introduce a resolvent
189: $R(Z)$ for the matrix $S$ as follows:
190: \begin{eqnarray}
191: R(Z) = (S-Z{\rm I})^{-1}.
192: \end{eqnarray}
193: It is then easy to verify that
194: \begin{eqnarray}
195: S^{-1} = {\rm Re}R(0).
196: \end{eqnarray}
197: Thus, we see that the real part of the resolvent for $Z=0$
198: gives the inverse $S^{-1}$ of the overlap matrix.
199: If the resolvent for $Z=0$ has a finite value for the imaginary part,
200: the basis set is not linearly independent.
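The role of the imaginary part can be seen from the spectral
representation of the resolvent. As a brief sketch, assuming that $S$ is
real and symmetric with orthonormal eigenvectors $c_k$ and eigenvalues
$s_k$ (symbols introduced here only for illustration), Eq.~(2) reads
\begin{eqnarray*}
  R(Z) = \sum_{k}\frac{c_k\,{}^{t}c_k}{s_k - Z}.
\end{eqnarray*}
For a linearly independent basis all $s_k$ are positive, so that $R(0)$
is real and equals $S^{-1}$. If $Z$ approaches $0$ from the upper half
plane, an imaginary part can survive only when some $s_k$ vanishes,
that is, when $S$ is singular.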
201: The resolvent can be evaluated by adopting the algorithm of the
block BOP method \cite{Ozaki}, which was recently developed to simulate
203: orthogonal tight-binding (TB) models in O($N$) operations.
204: It is noted that the new inverting method is derived just by replacing
205: the Hamiltonian $\hat{H}$ in the block BOP method within the orthogonal
206: TB models with the overlap operator $\hat{S}$.
207: The first step in this algorithm is to block-tridiagonalize
208: the overlap matrix $S$ using the block Lanczos algorithm
209: \cite{Lanczos,Jones,Inoue,Haydock}.
The central equation is
211: \begin{eqnarray}
212: \hat{S}\vert U_{n}) & = & \vert U_{n})\underline{A}_{n}
213: +
214: \vert U_{n-1})\underline{B}_{n}
215: +
216: \vert U_{n+1})\underline{B}_{n+1}
217: \end{eqnarray}
218: with
219: \begin{eqnarray}
220: \vert U_0) =
221: (\vert i1\rangle,\vert i2\rangle,\dots,\vert iM_i\rangle )
222: \end{eqnarray}
223: as the starting state. $\underline{A}_n$ and $\underline{B}_n$ are
224: recursion block coefficients with $M_{i}\times M_{i}$ in size,
225: where $M_{i}$ is the number of localized orbitals on the starting
226: atom $i$, and the underline indicates that the element is a block.
227: In the block Lanczos algorithm, we need to start the recursion with
228: Eq.~(5) to make the recursion method accurate and efficient \cite{Ozaki}.
229: The Lanczos algorithm with a finite recursion transforms the overlap
230: matrix $S$ into the block-tridiagonalized matrix $S^L$ which has
231: the diagonal $A_{n}$ and the sub-diagonal block elements $B_{n}$,
232: where the index $L$ indicates the representation based on the Lanczos
233: basis. Considering the resolvent $R^{L}(Z)\equiv (S^{L}-Z{\rm I})^{-1}$
234: for the block-tridiagonalized overlap matrix,
235: the diagonal $\underline{R}^L_{00}(Z)$ and off-diagonal block elements
236: $\underline{R}^L_{0n}(Z)$ can be easily derived along the same line
237: as that described in the block BOP method \cite{Ozaki}.
238: For $Z=0$, the elements are given by
239: \begin{eqnarray}
240: \underline{R}^L_{00}(0)
241: =[\underline{A}_0-\hspace{0.4mm}^t\hspace{-0.4mm}\underline{B}_1[
242: \underline{A}_1-\hspace{0.4mm}^t\hspace{-0.4mm}\underline{B}_2[
243: \cdots
244: ]^{-1}\underline{B}_2
245: ]^{-1}\underline{B}_1
246: ]^{-1},
247: \end{eqnarray}
248: \begin{eqnarray}
249: \nonumber
250: \lefteqn{
251: \underline{R}^{L}_{0n}(0)
252: =
253: \biggl(
254: \delta_{1n}\underline{\rm I}
255: -\underline{R}^{L}_{0n-1}(0)\underline{A}_{n-1}
256: }\\
257: &&
258: \quad\quad\quad\quad
259: -\underline{R}^{L}_{0n-2}(0)
260: \hspace{0.4mm}^t\hspace{-0.4mm}\underline{B}_{n-1}
261: \biggr)
262: (\underline{B}_{n})^{-1},
263: \end{eqnarray}
264: where $\delta$ is Kronecker's delta, and
265: $R_{0-1}(0)=\hspace{0.4mm}^t\hspace{-0.4mm}B_{0}=0$.
266: Once the block diagonal element is calculated as the multiple
267: inverse Eq.~(6), the off-diagonal elements are evaluated
268: from the recurrence relation Eq.~(7) with $\underline{R}^L_{00}(0)$
269: as the starting element. In order to truncate the multiple inverse
270: in Eq.~(6) without reducing the accuracy significantly, a square root
271: terminator could be used, while there could
272: be an infinite number of levels in the multiple inverse of diagonal
273: Green's function for an infinite system.
In the test calculations of Sec.~III we used the square root
terminator for the truncation at a finite number of levels.
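As a minimal consistency check of Eqs.~(6) and (7), consider the
illustrative case of a single orbital per atom ($M_i=1$) with the
recursion stopped after the first level and no terminator, an assumption
made here only to keep the algebra short. The block coefficients then
reduce to scalars $a_0$, $a_1$, and $b_1$, and
\begin{eqnarray*}
  R^L_{00}(0) & = & \left[a_0 - \frac{b_1^2}{a_1}\right]^{-1}
               = \frac{a_1}{a_0a_1-b_1^2},\\
  R^L_{01}(0) & = & \frac{1 - R^L_{00}(0)a_0}{b_1}
               = \frac{-b_1}{a_0a_1-b_1^2},
\end{eqnarray*}
which coincide with the first row of the inverse of the $2\times 2$
tridiagonal matrix with diagonal elements $a_0$, $a_1$ and off-diagonal
element $b_1$.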
276: The two Eqs.~(6) and (7) provide the resolvent based on the Lanczos
277: basis representation, so that we can obtain the original resolvent
278: through the following inverse transformation:
279: \begin{eqnarray}
280: \underline{R}_{ij}(0) = \sum_{n}
281: \underline{R}^L_{0n}(0)
282: \hspace{0.4mm}^t\hspace{-0.4mm}\underline{U}_{nj},
283: \end{eqnarray}
284: where $\hspace{0.4mm}^t\hspace{-0.4mm}\underline{U}_{nj}$ is defined by
285: $\hspace{0.4mm}^t\hspace{-0.4mm}\underline{U}_{nj} = (U_{n}\vert
286: (\vert j1\rangle,\vert j2\rangle,\dots,\vert jM_j\rangle ).$
287: The inverse transformation Eq.~(8) is significantly simplified
288: because of the orthogonality in the Lanczos bases. Therefore, we only
289: have to evaluate the 0th block line of the resolvent in the Lanczos
290: basis representation.
291: The resolvent exactly satisfies a sum rule $\sum_{ij}
292: {\rm tr\left\{\underline{S}_{ij}\underline{R}_{ji}(0)\right\}}
293: = N_{B}$ which is derived from Eq.~(2), where $N_{B}$ is the
number of bases, and is constructed by up to $(q+1)$th moments
295: $S^{q+1}$ \cite{Ozaki}, where $q$ is a final level for the recursion.
296: Equation (8) gives a good approximation for the inverse of
297: overlap matrix as the number of recursion levels increases.
298: However, the approximated inverse is not strictly
299: a symmetric matrix at a finite recursion.
300: If the approximated inverse is symmetric, eigenvalues of
301: a generalized secular equation with the overlap matrix
302: are real numbers. Therefore, we evaluate the inverse of
303: overlap matrix by symmetrizing the resolvent in terms of
304: simple arithmetic average:
305: \begin{eqnarray}
306: \underline{S}^{-1}_{ij} =
307: \frac{{\rm Re}\underline{R}_{ij}(0)
308: + {\rm Re}\hspace{0.4mm}^t\hspace{-0.4mm}\underline{R}_{ji}(0)}
309: {2}.
310: \end{eqnarray}
311: The symmetrization preserves the above sum rule.
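The preservation of the sum rule can be checked in one line. As a
sketch, assuming that $S$ is symmetric, so that
${}^{t}\underline{S}_{ij}=\underline{S}_{ji}$, and that
$\underline{R}_{ij}(0)$ is real, the transposed term in Eq.~(9)
contributes
\begin{eqnarray*}
  \sum_{ij}{\rm tr}\left\{\underline{S}_{ij}\,
  {}^{t}\underline{R}_{ij}(0)\right\}
  & = & \sum_{ij}{\rm tr}\left\{\underline{R}_{ij}(0)\,
  {}^{t}\underline{S}_{ij}\right\}\\
  & = & \sum_{ij}{\rm tr}\left\{\underline{S}_{ji}
  \underline{R}_{ij}(0)\right\} = N_{B},
\end{eqnarray*}
so that the average of the two terms in Eq.~(9) again gives $N_{B}$.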
All elements of the inverse are evaluated by applying the
313: series of the algorithm repeatedly to each atom.
314: The cluster over which the hops are made in the Lanczos algorithm is
315: determined by the logical truncation method \cite{Ozaki}.
316: Thus, the computational cost of the recursion method is strictly
317: proportional to the number of atoms $N$.
318:
319: \begin{center}
320: {\bf B. Divide method}
321: \end{center}
322:
323: In the case of the block BOP \cite{Ozaki} and FOE methods
324: \cite{Goedecker,Stephan}, it is required to evaluate
325: the modified Hamiltonian $H'=S^{-1}H$ rather than the inverse of
326: overlap matrix. In such cases we have an alternative way
327: where a linear equation
328: \begin{eqnarray}
329: SH'=H
330: \end{eqnarray}
331: is solved instead of calculating the inverse.
332: In conventional ways of solving the linear equation for a total system,
333: the computational cost scales as the third
334: power of the number of atoms $N$, while the scaling could be
335: reduced to ${\rm O}(N^2)$, making use of the sparseness
336: of the overlap matrix. Therefore, Gibson et al. have proposed
337: a solution of Eq.~(10) with the cutoff radii of $H$ and $S$ \cite{Gibson}.
338: The linear equation Eq.~(10) can be decomposed into $N$ subspace
339: linear equations for $N$ finite clusters under this constraint.
340: One solves each of the subspace linear equations for the finite clusters
341: centered on atom $i$ using a conventional method such as the
342: Cholesky factorization, which results in O($N$) operations
343: for the computational effort.
However, the divide method involves redundant work:
one has to evaluate all matrix elements of the modified
Hamiltonian $H'$ for each finite cluster, whereas in the other
${\rm O}(N)$ inverting methods the elements of the inverse
of the overlap matrix are not calculated twice.
349: Thus, the prefactor of the ${\rm O}(N)$ operations could be
350: very large for highly coordinated structures such as fcc.
351: The magnitude of the prefactor will be discussed in Sec.~III.
An iterative scheme such as the Gauss-Seidel method \cite{Jones,Foulkes}
353: which is commonly used for large-scale systems is also available for
354: solving the linear equation Eq.~(10). However, it has been
355: recognized that the iterative scheme is computationally expensive
356: \cite{Gibson}, so that the iterative scheme was
357: not investigated in this study.
358: We used the logical truncation method to construct the subspace
359: linear equation as well as the recursion method in the test calculations
360: discussed in Sec.~III in order to compare the computational performance.
361:
362: \begin{center}
363: {\bf C. Taylor expansion method}
364: \end{center}
365:
366: Mauri et al. have proposed to approximate the inverse of the overlap
367: matrix using the Taylor expansion in their ${\rm O}(N)$ unconstrained
368: minimization method \cite{Mauri}. The overlap matrix $S$ is expressed as a
369: sum of the identity ${\rm I}$ and an $O$-matrix $O$ which is the overlap
370: matrix between the different orbitals:
371: \begin{eqnarray}
372: S = {\rm I} + O,
373: \end{eqnarray}
then we can expand the inverse of $S$ with respect to the $O$-matrix
375: as follows:
376: \begin{eqnarray}
377: \nonumber
378: S^{-1} & = & \sum_{n=0}^{\infty}(-1)^n O^n\\
379: & = & {\rm I} - O + O^2 - O^3 + \dots
380: \end{eqnarray}
381: The computational accuracy and efficiency of the approximation
382: by the Taylor series depend on the convergence for the summation
of Eq.~(12). The summation in Eq.~(12) diverges
when the spectral radius of the $O$-matrix exceeds 1.0.
Even if the $O$-matrix has no eigenvalues at or below $-1.0$,
indicating that the basis set is linearly independent, some eigenvalues
of the $O$-matrix exceed 1.0 in most cases as shown in Sec.~III.
388: In such cases, the Taylor expansion method cannot be applied.
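A simple model illustrates how this situation arises. As a hypothetical
example, not taken from the test systems, consider $n$ normalized
orbitals whose mutual overlaps all equal $s$ with $0<s<1$. The
$O$-matrix then has the eigenvalue $(n-1)s$ once and the eigenvalue $-s$
with multiplicity $n-1$, so that
\begin{eqnarray*}
  {\rm eig}(S) = \{1+(n-1)s,\; 1-s\}, \qquad \rho(O) = (n-1)s,
\end{eqnarray*}
where $\rho$ denotes the spectral radius. The overlap matrix is positive
definite for any $n$, but the series in Eq.~(12) diverges as soon as
$s>1/(n-1)$, for instance for $n=5$ and $s=0.3$. Highly coordinated
orbitals with moderate overlaps are therefore enough to violate the
convergence condition.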
389: The matrix $O^n$ is calculated as the product of the perfect
390: but highly sparse $O$-matrix, and $O^{n-1}$ with the cutoff
391: radii for the elements, so that the summation to a finite order
392: in Eq.~(12) can be performed with ${\rm O}(N)$ operations.
393:
394: \begin{center}
395: {\bf D. Hotelling's method}
396: \end{center}
397:
398: Palser and Manolopoulos \cite{Palser} have suggested evaluating
399: the inverse $S^{-1}$ using Hotelling's method \cite{Recipes,Pan}.
400: The method has an iterative algorithm very similar to the purification
401: algorithm \cite{Palser} in the DM method.
402: The convergence rate in Hotelling's method is also quadratic
403: as with the DM method.
404: The purification of an approximate inverse is achieved using the
405: following iterative relation:
406: \begin{eqnarray}
407: S^{-1}_{n+1} = 2 S^{-1}_{n} - S^{-1}_{n}SS^{-1}_{n}.
408: \end{eqnarray}
In the case of $S^{-1}_0 = {\rm I}$, Hotelling's method is equivalent to
the Taylor expansion method to a finite order described in the previous
subsection (C). It is easy to verify that $S^{-1}_1$ and $S^{-1}_2$ are
the Taylor series to the first and third orders of the $O$-matrix,
respectively:
414: \begin{eqnarray}
415: \nonumber
416: S^{-1}_{1} & = & 2 S^{-1}_{0} - S^{-1}_{0}SS^{-1}_{0}\\
417: & = & {\rm I} - O,
418: \end{eqnarray}
419: \begin{eqnarray}
420: \nonumber
421: S^{-1}_{2} & = & 2 S^{-1}_{1} - S^{-1}_{1}SS^{-1}_{1}\\
422: & = & {\rm I} - O + O^2 - O^3.
423: \end{eqnarray}
424: From Eqs.~(14) and (15), we see that Hotelling's method converges
425: quadratically compared to the linear convergence of Taylor
426: expansion method. Thus, if Eq.~(12) is a convergent series,
Hotelling's method should be more efficient than
428: the Taylor expansion method.
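The quadratic convergence can be made explicit through the residual of
the approximate inverse. As a sketch in exact arithmetic and without any
cutoff of matrix elements, define $D_n\equiv {\rm I}-SS^{-1}_{n}$
(a symbol introduced here for illustration). Equation (13) then gives
\begin{eqnarray*}
  D_{n+1} & = & {\rm I} - 2SS^{-1}_{n} + \left(SS^{-1}_{n}\right)^2\\
          & = & D_n^2,
\end{eqnarray*}
so that $D_n = D_0^{2^n}$, which tends to zero if and only if the
spectral radius of $D_0$ is smaller than 1. For $S^{-1}_0={\rm I}$ one
has $D_0=-O$ and $S^{-1}_n=\sum_{k=0}^{2^n-1}(-1)^kO^k$, in agreement
with Eqs.~(14) and (15).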
When the spectral radius of the $O$-matrix exceeds 1.0,
430: the identity ${\rm I}$ cannot be used as the initial guess for
431: the inverse $S^{-1}$. In such cases, although it is very difficult
432: to estimate a good initial matrix $S_{0}^{-1}$ for the iteration Eq.~(13),
433: in this study, we use the overlap $S$ with a small prefactor $\sigma$
434: derived by Pan and Reif \cite{Pan} as the initial guess:
435: \begin{eqnarray}
436: S^{-1}_{0} = \sigma S
437: \end{eqnarray}
438: with
439: \begin{eqnarray}
440: \sigma = \frac{1}
441: {\left(
442: \displaystyle{\max_{i\alpha}}
443: \displaystyle{\sum_{j\beta}}\vert S_{i\alpha,j\beta}\vert
444: \right)^2}.
445: \end{eqnarray}
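The choice of $\sigma$ in Eq.~(17) can be motivated as follows. As a
sketch, assuming that $S$ is real, symmetric, and non-singular with
eigenvalues $s_k$ (a symbol introduced here for illustration), the
maximum absolute row sum bounds every $\vert s_k\vert$ from above, so
that the eigenvalues of $SS^{-1}_0=\sigma S^2$ satisfy
\begin{eqnarray*}
  0 < \sigma s_k^2 \leq 1.
\end{eqnarray*}
The residual ${\rm I}-SS^{-1}_0$ therefore has a spectral radius smaller
than 1, which is the condition for the iteration Eq.~(13) to converge in
exact arithmetic when no matrix elements are cut.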
446: It is noted that Hotelling's method possesses an advantage
447: that the inverse at the previous MD step could be a good guess
448: of $S_{0}^{-1}$ at the current MD step, while any information
at the previous MD step cannot be made use of in the other methods,
namely the recursion method, the divide method, and the Taylor expansion method.
451: In the iteration Eq.~(13), the elements of $S_{n}^{-1}$ are cut
452: at a finite distance. As a result of this truncation, the computational
453: effort of Hotelling's method scales linearly with the system size.
454: In test calculations of Sec.~III, we used the logical truncation
455: method for the cutoff of the elements as in the other inverting
456: O($N$) methods.
457:
458: \begin{center}
459: {\bf III.~CONVERGENCE PROPERTIES}
460: \end{center}
461:
462: \begin{center}
463: {\bf A. Error analysis}
464: \end{center}
465:
466: In order to compare the four ${\rm O}(N)$ inverse methods presented
in Sec.~II in terms of computational accuracy and efficiency,
468: we first relate the 1-norm of an error matrix $E$ with the error of
469: eigenvalues $\epsilon_{\nu}$ of a secular equation by using
470: an error analysis theory \cite{Golub,Chatelin}.
471: The generalized secular equation with the overlap matrix $S$ is derived
from the variational principle within DFT using a non-orthogonal basis set:
473: \begin{eqnarray}
474: S^{-1}HC_{\nu} = \epsilon_{\nu} C_{\nu},
475: \end{eqnarray}
476: where $H_{i\alpha,j\beta}
477: \equiv \langle i\alpha \vert\hat{H}\vert j\beta\rangle$
478: and
479: $C_{i\alpha,\nu}$ is an expansion coefficient
480: $C_{i\alpha,\nu}\equiv \langle i\alpha\vert \phi_{\nu}\rangle$
481: in a one-particle wave function $\vert \phi_{\nu}\rangle$.
482: Let us consider substituting the exact inverse $S^{-1}$ with
483: an approximate inverse $S'^{-1}$ in Eq.~(18), then the difference
484: between $S^{-1}$ and $S'^{-1}$ is
485: \begin{eqnarray}
486: S'^{-1} - S^{-1} = \Delta S^{-1}.
487: \end{eqnarray}
488: For the approximate inverse $S'^{-1}$
489: the secular equation $S'^{-1}HC'_{\nu} = \epsilon'_{\nu} C'_{\nu}$
490: is satisfied with approximate eigenvalues $\epsilon'_{\nu}$ and
491: eigenvectors $C'_{\nu}$.
492: According to the error analysis theory \cite{Golub,Chatelin},
493: the difference between the exact and the approximate eigenvalues
494: is given by
495: \begin{eqnarray}
496: \vert\epsilon'_{\nu} - \epsilon_{\nu}\vert = {\rm O}(\lambda)
497: \end{eqnarray}
498: with $\lambda$, which is the 1-norm of a matrix $\Delta S^{-1}H$,
499: defined by
500: \begin{eqnarray}
501: \lambda =
502: \max_{j\beta}\sum_{i\alpha}
503: \left\vert \sum_{k\gamma}
504: \Delta S^{-1}_{i\alpha,k\gamma}H_{k\gamma,j\beta}
505: \right\vert.
506: \end{eqnarray}
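One standard bound of this form, quoted here as an illustration rather
than as the specific result of Refs.~\cite{Golub,Chatelin}, is the
Bauer-Fike theorem. Assuming that $S^{-1}H$ is diagonalizable by the
matrix $C$ of its eigenvectors, each approximate eigenvalue
$\epsilon'_{\mu}$ satisfies
\begin{eqnarray*}
  \min_{\nu}\vert \epsilon'_{\mu}-\epsilon_{\nu}\vert
  \leq \Vert C\Vert_1\Vert C^{-1}\Vert_1\,\lambda,
\end{eqnarray*}
where $\Vert\cdot\Vert_1$ denotes the 1-norm, so that the prefactor
hidden in Eq.~(20) is the condition number of the eigenvector matrix.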
507: Therefore, we see that the error in eigenvalue is proportional to
508: the 1-norm of $\Delta S^{-1}H$ for the approximation of the overlap
509: matrix. Equation (20) apparently connects the error of the overlap
510: matrix to that of the eigenvalue. However, it is not possible
511: to calculate the exact inverse for infinite or periodic systems,
512: so that we introduce an error matrix $E$, which is easily evaluated,
513: defined as the difference between a matrix $SS'^{-1}H$ and the original
514: Hamiltonian $H$:
515: \begin{eqnarray}
516: \nonumber
517: E & \equiv &
518: SS'^{-1}H - H\\
519: & = & S\Delta S^{-1}H.
520: \end{eqnarray}
521: The 1-norm $\eta$ of the error matrix $E$ can be related to
522: that $\lambda$ of the matrix $\Delta S^{-1}H$ as follows:
523: \begin{eqnarray}
524: \nonumber
525: \eta & = & \max_{j\beta}\sum_{k'\gamma'}
526: \left\vert \sum_{i\alpha}\sum_{k\gamma}
527: S_{k'\gamma',i\alpha}\Delta S^{-1}_{i\alpha,k\gamma}
528: H_{k\gamma,j\beta}
529: \right\vert\\
530: \nonumber
531: & \leq &
532: \max_{j\beta}\sum_{k'\gamma'}
533: \sum_{i\alpha}
534: \vert S_{k'\gamma',i\alpha} \vert
535: \left\vert
536: \sum_{k\gamma}
537: \Delta S^{-1}_{i\alpha,k\gamma}
538: H_{k\gamma,j\beta}
539: \right\vert\\
540: \nonumber
541: & \leq &
542: N_{av}
543: \left(
544: \max_{j\beta}
545: \sum_{i\alpha}
546: \left\vert
547: \sum_{k\gamma}
548: \Delta S^{-1}_{i\alpha,k\gamma}
549: H_{k\gamma,j\beta}
550: \right\vert
551: \right)\\
552: & = &
553: N_{av}\lambda,
554: \end{eqnarray}
555: where $N_{av}$ is the average number of the non-zero elements
556: in the overlap matrix for an orbital $\vert i\alpha \rangle$.
The third relation in Eq.~(23) is derived by replacing the non-zero
overlap integrals $\vert S_{k'\gamma',i\alpha} \vert$ with 1, with
the variables $i\alpha$ fixed in the second relation.
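The last two relations in Eq.~(23) amount to the submultiplicative
property of the 1-norm. As a sketch, writing $X\equiv\Delta S^{-1}H$
(a symbol introduced here for illustration), so that $E=SX$ and
$X=S^{-1}E$, one has
\begin{eqnarray*}
  \eta = \Vert SX\Vert_1 & \leq & \Vert S\Vert_1\,\lambda,\\
  \lambda = \Vert S^{-1}E\Vert_1 & \leq & \Vert S^{-1}\Vert_1\,\eta,
\end{eqnarray*}
where $\Vert S\Vert_1=\max_{i\alpha}\sum_{k'\gamma'}
\vert S_{k'\gamma',i\alpha}\vert$ is the quantity estimated by $N_{av}$
in Eq.~(23) when the orbitals are normalized so that no overlap integral
exceeds 1 in magnitude. The second line shows that a small $\eta$ also
implies a small $\lambda$ as long as $\Vert S^{-1}\Vert_1$ remains
moderate.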
Considering Eqs.~(20) and (23), we can relate the 1-norm of the
561: error matrix to the error of the eigenvalue:
562: \begin{eqnarray}
563: \vert\epsilon'_{\nu} - \epsilon_{\nu}\vert = {\rm O}(\eta).
564: \end{eqnarray}
565: Therefore, we will compare the four O($N$) inverse methods
566: using the 1-norm $\eta$, which is easily evaluated, instead of
567: $\lambda$.
568:
569: \begin{center}
570: {\bf B. Numerical tests}
571: \end{center}
572:
573: We numerically studied convergence properties of the four inverse
574: ${\rm O}(N)$ methods using 1-norm $\eta$ for diamond and fcc
575: Al within DFT proposed by Sankey and Niklewski \cite{Sankey}.
In these DFT calculations we used numerical localized orbitals,
577: fireball bases by Sankey and Niklewski \cite{Sankey},
578: as a minimal basis set for valence electrons.
579: The radii of the radial-wave function confinement are 2.1 and
580: 3.7~\AA~for carbon and aluminum atoms, respectively.
581: The minimal basis sets give 1.253 (1.244)
582: and 2.515 (2.466)~\AA~ as an equilibrium bond length of dimer for
583: carbon and aluminum, respectively, where the values in the
584: parentheses are experimental results.
585:
In Fig.~1 we show the density of states for the eigenvalues of the
587: $O$-matrix, which is defined by Eq.~(11), in diamond and fcc Al.
In both cases the $O$-matrices have no eigenvalues smaller than
$-1.0$, so that the basis sets are linearly independent for
the structures. However, the densities of states possess finite
values for eigenvalues larger than or equal to 1.0 in both cases.
In other words the spectral radius of the $O$-matrix exceeds 1.0.
593: This means that the summation in Eq.~(12) for the Taylor expansion
594: method diverges for diamond and fcc Al.
In addition to the above cases, we confirmed that the spectral
radii of the $O$-matrix also exceed 1.0 for graphite and
poly(ethylene), so that the applicability of the Taylor expansion
method is severely restricted. Therefore, we do not provide the
599: convergence properties of the Taylor expansion method in this paper.
600:
601: Figure 2 shows the convergence properties of the 1-norm $\eta$ of
602: the error matrix for diamond calculated by the recursion, divide,
603: and Hotelling's methods.
604: In the recursion method the 1-norm exponentially decays for each
605: shell cluster as a function of the number of recursion levels,
606: and finally converges to the value of the 1-norm calculated by
607: the divide method for the corresponding cluster.
608: In the divide method the 1-norm almost exponentially diminishes
609: as a function of number of shells.
610: For the seven-shell cluster the 1-norm is only $3.1\times 10^{-5}$~eV.
611: The identity matrix ${\rm I}$ cannot be used as an initial guess
$S_{0}^{-1}$ in Hotelling's method because the spectral
radius of the $O$-matrix exceeds 1.0. Thus, we gave the initial guess
614: $S_{0}^{-1}$ by Eq.~(16), where $\sigma$ is 0.021 for diamond.
In Hotelling's method the convergence properties are not
monotonic, in contrast to the other two methods.
For three-, five-, and seven-shell clusters, the 1-norm is
gradually reduced for a small number of iterations.
However, the 1-norm increases after reaching the minimum,
and finally a numerical instability sets in, in which the 1-norm diverges
as the iteration proceeds.
622: The smallest 1-norm for each shell-cluster is slightly larger than
623: that calculated by the divide method for the same cluster.
624: Therefore, we see that Hotelling's method cannot reach the perfect
625: convergence for diamond due to the numerical instability.
626: For Hotelling's method we also examined the convergence properties
627: of the 1-norm $\eta$ for carbon in the diamond structure
with a lattice constant of 3.9~\AA, for which the spectral radius
of the $O$-matrix is below 1.0, although the result is not shown in
this paper. In this system the 1-norm very quickly converges to
631: the corresponding value calculated by the divide method for the
same cluster. Thus, we find empirically that Hotelling's method
gives convergent results for systems with spectral radii smaller
than 1.0.
635:
636: As with Fig.~2, the convergence properties of the 1-norm are shown
637: in Fig.~3 for fcc Al.
The magnitude of the 1-norm is one to two orders of magnitude larger
than that for diamond, while the behavior of the 1-norm is very similar to
that for diamond.
In the recursion method the converged values of the 1-norm are
consistent with those of the divide method for the four- and six-shell clusters.
In Hotelling's method we used Eq.~(16) with $\sigma=0.0098$
as $S_{0}^{-1}$, since the spectral radius of the $O$-matrix exceeds 1.0
for fcc Al. The 1-norms for the four- and six-shell clusters
finally diverge without achieving full convergence, as for diamond.
647: Although we tested the convergence properties using several values
648: for $\sigma$ in both diamond and fcc Al, we could not obtain
649: converged results and moreover could not avoid the numerical instability.
650:
651: Figures 4(a) and 4(b) show the relation between the magnitude of
652: the 1-norm $\eta$ of the error matrix and the computational time per
653: atom to evaluate the inverse of the overlap matrix for
654: diamond and fcc Al, respectively.
The comparison clearly indicates that the computational efficiency
increases in the order divide $<$ Hotelling's $<$
recursion for both diamond and fcc Al.
The recursion method is about one hundred times faster than the divide
method in reaching the same level of convergence for diamond
and fcc Al.
661:
662: \begin{center}
663: {\bf IV.~CONCLUSIONS}
664: \end{center}
665:
666: We presented a new O($N$) algorithm for calculating the inverse of the
667: overlap matrix $S$. It is based on the recursion method with the block
668: Lanczos algorithm. The problem of evaluating $S^{-1}$ is mapped to the
669: block BOP method for an orthogonal TB model just by replacing the
670: Hamiltonian with the overlap operator.
In addition, we briefly described other known methods
for calculating the inverse in ${\rm O}(N)$ operations:
the divide, Taylor expansion, and Hotelling's methods.
674: We examined the computational accuracy and efficiency
675: of these ${\rm O}(N)$ inverting methods using the 1-norm of the
676: error matrix for diamond and fcc Al in DFT calculations with
677: the minimal basis set for valence electrons.
The spectrum radius of the $O$-matrix, defined as $(S-{\rm I})$,
exceeds 1.0 for many real materials in DFT calculations
based on localized basis functions, which means that the applicability
of the Taylor expansion method is significantly restricted.
In the recursion method the 1-norm of the error matrix converges
exponentially and stably to the value calculated by the divide method
for the same cluster in both diamond and fcc Al.
On the other hand, Hotelling's method cannot reach
converged results because of numerical instability in both cases.
The comparison of computational time shows that the recursion
method is the most efficient of the O($N$) inverting methods
compared for diamond and fcc Al. The recursion method is
about one hundred times faster than the divide method.
Thus, the new method for evaluating the inverse is
a practical algorithm and can be incorporated
into several O($N$) methods for total energy calculations using
localized orbital basis sets.
695:
696: \begin{center}
{\bf ACKNOWLEDGMENTS}
698: \end{center}
699:
We would like to thank Y. Morikawa and H. Kino for helpful suggestions
about the DFT calculations, and D. R. Bowler for useful suggestions about
${\rm O}(N)$ inverting methods.
Part of the computation in this work was performed using the computational
facilities of the Japan Advanced Institute of Science and Technology (JAIST).
706:
712:
713: \begin{references}
714:
715: % O(N) methods
716:
717: \bibitem{Pettifor}
718: D. G. Pettifor, Phys. Rev. Lett. {\bf 63}, 2480 (1989);
719: M. Aoki, Phys. Rev. Lett. {\bf 71}, 3842 (1993);
720: A. P. Horsfield, A. M. Bratkovsky,
721: D. G. Pettifor, and M. Aoki,
722: Phys. Rev. B {\bf 53},
723: 1656 (1996);
724: A. P. Horsfield, A. M. Bratkovsky,
725: M. Fearn, D. G. Pettifor, and M. Aoki,
726: Phys. Rev. B {\bf 53},
12694 (1996).
728:
729: \bibitem{Ozaki}
730: T. Ozaki, Phys. Rev. B {\bf 59}, 16061 (1999);
731: T. Ozaki, M. Aoki, and D. G. Pettifor, Phys. Rev. B {\bf 61}, 7972 (2000);
732: T. Ozaki and K. Terakura, submitted to Phys. Rev. Lett.
733:
734: \bibitem{Goedecker}S. Goedecker and L. Colombo,
735: Phys. Rev. Lett. {\bf 73}, 122 (1994).
736:
737: \bibitem{Stephan}U. Stephan and D. A. Drabold,
738: Phys. Rev. B {\bf 57}, 6391 (1998).
739:
740: \bibitem{Yang}W. T. Yang, Phys. Rev. Lett. {\bf 66},
741: 1438 (1991).
742:
743: \bibitem{Galli}
744: G. Galli and M. Parrinello, Phys. Rev. Lett. {\bf 69}, 3547 (1992).
745:
746: \bibitem{Mauri}
747: F. Mauri, G. Galli, and R. Car, Phys. Rev. B {\bf 47}, 9973 (1993);
748: F. Mauri and G. Galli, Phys. Rev. B {\bf 50}, 4316 (1994).
749:
750: \bibitem{Daw}
751: M. S. Daw, Phys. Rev. B {\bf 47}, 10895 (1993).
752:
753: \bibitem{Li}
754: X.-P. Li, R. W. Nunes, and D. Vanderbilt,
755: Phys. Rev. B {\bf 47}, 10891 (1993);
756: R. Nunes and D. Vanderbilt,
757: Phys. Rev. B {\bf 50}, 17611 (1994).
758:
759: \bibitem{Palser}
760: A. H. R. Palser and D. Manolopoulos,
761: Phys. Rev. B {\bf 58}, 12704 (1998).
762:
763: % ab initio tight-binding
764:
765: \bibitem{Sankey} O. F. Sankey and D. J. Niklewski,
766: Phys. Rev. B {\bf 40}, 3979 (1989).
767:
\bibitem{Kobayashi} K. Kobayashi, N. Kurita, H. Kumahora, and K. Tago,
Phys. Rev. B {\bf 45}, 11299 (1992).

\bibitem{Kurita} N. Kurita and K. Kobayashi,
Comp. and Chem. {\bf 24}, 351 (2000) and references therein.

\bibitem{Kobayashi2} K. Kobayashi, K. Tago, and N. Kurita,
Phys. Rev. A {\bf 53}, 1903 (1996).
776:
777: \bibitem{Hierse} W. Hierse and E. B. Stechel,
778: Phys. Rev. B {\bf 50}, 17811 (1994).
779:
780: \bibitem{Hernandez} E. Hernandez and M. Gillan,
781: Phys. Rev. B {\bf 51}, 10157 (1995).
782:
783: \bibitem{Ordejon} P. Ordejon, E. Artacho, and J. M. Soler,
784: Phys. Rev. B {\bf 53}, R10441 (1996).
785:
786: \bibitem{Sanchez} D. Sanchez-Portal, P. Ordejon, E. Artacho, and J. M. Soler,
787: Int. J. Quant. Chem. {\bf 65}, 453 (1997).
788:
789: \bibitem{Horsfield} A. P. Horsfield,
Phys. Rev. B {\bf 56}, 6594 (1997).
791:
792: % Car-Parrinello
793:
\bibitem{Payne} M. C. Payne, M. P. Teter, D. C. Allan, T. A. Arias,
and J. D. Joannopoulos, Rev. Mod. Phys. {\bf 64}, 1045 (1992).
796:
797: % Applications
798:
799: \bibitem{Bowler} D. R. Bowler and M. J. Gillan,
800: Mol. Simulat. {\bf 25}, 239 (2000).
801:
802: \bibitem{Applications}
S. Goedecker, Rev. Mod. Phys. {\bf 71}, 1085 (1999)
804: and references therein.
805:
806: % O(N) inverse methods
807:
808: \bibitem{Gibson} A. Gibson, R. Haydock, and J. P. LaFemina,
Phys. Rev. B {\bf 47}, 9229 (1993).
810:
811: % Comparison of O(N) methods
812:
813: \bibitem{Comparison}D. R. Bowler, M. Aoki, C. M. Goringe,
814: A. P. Horsfield, and D. G. Pettifor,
Modelling Simul. Mater. Sci. Eng. {\bf 5}, 199 (1997).
816:
817: % Lanczos
818:
819: \bibitem{Lanczos}C. Lanczos,
820: J. Res. Natl. Bur. Stand. {\bf 45}, 225 (1950).
821:
822: \bibitem{Jones}
R. Jones and M. W. Lewis, Philos. Mag. B {\bf 49}, 95 (1984).
824:
825: \bibitem{Inoue}
826: J. Inoue and Y. Ohta, J. Phys. C {\bf 20}, 1947 (1987).
827:
828: \bibitem{Haydock}
829: R. Haydock, V. Heine, and M. J. Kelly, J. Phys. C {\bf 5},
2845 (1972); {\bf 8}, 2591 (1975);
831: R. Haydock, Solid State Phys. {\bf 35}, 216 (1980).
832:
% Gauss-Seidel
834:
835: \bibitem{Foulkes} M. Foulkes and R. Haydock,
836: J. Phys. C {\bf 19}, 6573 (1986).
837:
838: % Hotelling's method
839:
840: \bibitem{Recipes} W. H. Press, S. A. Teukolsky, W. T. Vetterling,
841: and B. P. Flannery, {\it Numerical Recipes}, 2nd ed.
842: (Cambridge University Press, Cambridge, 1992), p. 49.
843:
\bibitem{Pan} V. Pan and J. Reif,
in {\it Proceedings of the Seventeenth Annual ACM Symposium on
Theory of Computing} (Association for Computing Machinery, New York, 1985).
847:
848: % Error analysis
849:
\bibitem{Golub} G. H. Golub and C. van Loan,
{\it Matrix Computations}, 2nd ed.
(North Oxford Academic, Oxford, 1989).

\bibitem{Chatelin} F. Chatelin,
{\it Valeurs propres de matrices} (Masson, Paris, 1988).
856:
857: \end{references}
858:
859: % Fig.1
860:
861: \begin{figure}[t]
862: \caption{\small
The density of states for eigenvalues of the $O$-matrix for diamond
and fcc Al. Minimal numerical basis sets for the valence electrons,
obtained from DFT calculations for the atomic states, were used for the
carbon and aluminum atoms. The experimental values, 3.57 and 4.05~\AA, were
used as the lattice constants of diamond and fcc Al, respectively.}
868: \end{figure}
869:
870: % Fig.2
871:
872: \begin{figure}[t]
873: \caption{\small
874: The 1-norm of the error matrix for diamond calculated by
875: the (a) recursion, (b) divide, and (c) Hotelling's methods.
In the recursion and Hotelling's methods, the 1-norms were
calculated for three-, five-, and seven-shell clusters as a function
of the number of recursion levels and iterations, respectively.}
879: \end{figure}
880:
881: % Fig.3
882:
883: \begin{figure}[t]
884: \caption{\small
885: The 1-norm of the error matrix for fcc Al calculated by
886: the (a) recursion, (b) divide, and (c) Hotelling's methods.
In the recursion and Hotelling's methods, the 1-norms were
calculated for four- and six-shell clusters as a function
of the number of recursion levels and iterations, respectively.
890: }
891: \end{figure}
892:
893: % Fig.4
894:
895: \begin{figure}[t]
896: \caption{\small
The 1-norm of the error matrix for (a) diamond and (b) fcc Al
plotted against the computational time per atom for the three
O($N$) inverting methods. The calculations were performed using
a single processor of a Compaq ES40 workstation.}
901: \end{figure}
902:
903: \end{document}
904:
905: