1: %%%%%%%%%%%%%%%%%% file template.tex %%%%%%%%%%%%%%%%%%%%
2: %                                                       %
3: %    Copyright (c) Optical Society of America, 1992.    %
4: %                                                       %
5: %%%%%%%%%%%%%%%%%%% November 17, 1992 %%%%%%%%%%%%%%%%%%%
6: %
7: % THIS FILE IS A TEMPLATE TO PRODUCE AN ARTICLE SUBMISSION
8: % TO THE OSA JOURNALS, JOSA-A, JOSA-B, and APPLIED OPTICS.
9: %
10: % THIS TEMPLATE CONTAINS TYPESETTING COMMANDS WHICH BEGIN WITH A
11: % BACKSLASH.  THESE COMMANDS WILL BE READ BY LATEX, USING THE
12: % REVTEX 3.0 STANDARD MACROS.   PLEASE FILL IN THE REQUIRED DATA
13: % FOR THE MACROS, BUT DO NOT ALTER THE DEFINITIONS.
14: %
15: % EXAMPLE: IN \author{Authors' names} , PLEASE FILL IN THE
16: % AUTHORS' NAME(S).
17: %
18: % COMMENTS BEGIN WITH THE PERCENT (%) SYMBOL. AFTER A %, ANY
19: % DATA ON THE REST OF A LINE WILL NOT PRINT.
20: %
21: \documentstyle[aps,manuscript]{revtex}  % DON'T CHANGE
22: %
23: %
24: \newcommand{\MF}{{\large{\manual META}\-{\manual FONT}}}
25: \newcommand{\manual}{rm}        % Substitute rm (Roman) font.
26: \newcommand\bs{\char '134 }     % add backslash char to \tt font
27: %
28: %
29: 
30: \makeatletter
31: \newcounter{subeqncnt}
32: \def\thesubeqncnt{\alph{subeqncnt}}
33: \def\subequations{\begingroup%
34:    \stepcounter{equation}\edef\@tempa{\theequation}%
35:    \let\c@equation\c@subeqncnt\c@subeqncnt\z@
36:    \edef\theequation{\@tempa\noexpand\thesubeqncnt}}
37: \let\endsubequations\endgroup
38: \makeatother
39: 
40: \begin{document}                % INITIALIZE - DONT CHANGE
41: %
42: %
43: %
44: \title{Efficient Recursion Method for Inverting Overlap Matrix}
45: \author{T. Ozaki}
46: \address{
47:      RICS,
48:      National Institute of Advanced Industrial Science and Technology (AIST),
49:      central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan
50:      and
51:      JRCAT-ATP,
52:      central 4, 1-1-1 Higashi, Tsukuba, 
53:      Ibaraki 305-0046, Japan
54: }
55: %
56: \maketitle
57: \begin{abstract}                % DON'T CHANGE THIS LINE
58:      A new O($N$) algorithm based on a recursion method, in which the
59:      computational effort is proportional to the number of atoms $N$,
60:      is presented for calculating the inverse of an overlap matrix which
     is needed in electronic structure calculations with the
62:      non-orthogonal localized basis set. This efficient inverting
63:      method can be incorporated in several O($N$) methods
64:      for diagonalization of a generalized secular equation.
65:      By studying convergence properties of the 1-norm of an error matrix
66:      for diamond and fcc Al, this method is compared to three other O($N$)
67:      methods (the divide method, Taylor expansion method, and Hotelling's
68:      method) with regard to computational accuracy and efficiency within
69:      the density functional theory.
70:      The test calculations show that the new method
71:      is about one-hundred times faster than the divide method in
72:      computational time to achieve the same convergence for both diamond
73:      and fcc Al, while the Taylor expansion method and Hotelling's method
74:      suffer from numerical instabilities in most cases.
75: \end{abstract}
76: %
77: \vspace{2cm}
78: 
79:  The development of O($N$) methods
80:  \cite{Pettifor,Ozaki,Goedecker,Stephan,Yang,Galli,Mauri,Daw,Li,Palser}
81:  and the revival of localized
82:  orbitals as a basis set
83:  \cite{Sankey,Kobayashi,Kurita,Kobayashi2,Hierse,Hernandez,Ordejon,Sanchez,Horsfield}
 have taken place during the last decade
85:  in order to extend the applicability of the first-principles molecular
86:  dynamics (FPMD) simulations using the plane wave expansion and the
87:  Car-Parrinello method within density functional
88:  theories (DFT) \cite{Payne}.
 However, only a few applications of these ${\rm O}(N)$ methods to large
90:  systems have been reported within the DFT calculations
91:  \cite{Sanchez,Bowler,Applications}.
 Among the limitations of methods based on
 the localized description \cite{Applications}, one
 is that several O($N$) methods require evaluating the inverse
 of the overlap matrix $S$, which arises from the non-orthogonality of
 the localized orbitals.
97: 
98:  In the generalized Fermi operator expansion (FOE) method \cite{Stephan}
99:  to the non-orthogonal basis we need to calculate the inverse of
100:  overlap matrix to construct the modified Hamiltonian $H'\equiv S^{-1}H$,
101:  while Stephan et al. have proposed solving a linear equation $SH'=H$
102:  with the cutoff radii of $H$ instead of calculating the inverse of
103:  overlap matrix.
104:  In the density matrix (DM) method \cite{Daw,Li,Palser} which is
105:  a promising approach for materials with a wide gap, fortunately,
106:  the evaluation of the inverse is not required during the optimization
107:  of grand potentials, although we have to evaluate the inverse of the
108:  overlap matrix for a good initial guess of the density matrix \cite{Palser}.
109:  The block bond-order potential (BOP) method \cite{Ozaki}, which has good
110:  convergence properties for both insulators and metals, also
 requires the evaluation of the modified Hamiltonian $H'$, as in
 the FOE method. Even if the sparsity of the overlap matrix is exploited,
 the computational cost of the inverse calculation scales as the second
 power of the number of atoms $N$. Therefore, an efficient O($N$) method
 for inverting the overlap matrix should be developed.
116: 
117:  So far, several O($N$) inverting methods have been proposed.
118:  Gibson et al. used a simple method in which a linear equation
119:  $SH'=H$ constructed for a finite cluster is solved without
120:  explicit calculation of $S^{-1}$ \cite{Gibson}.
121:  Mauri et al. considered approximating the inverse of
122:  overlap matrix by the Taylor expansion \cite{Mauri}. The approach could be 
123:  an O($N$) inverting method when the matrix elements in the $p$th
124:  moment $O^p$ of the overlap matrix $O$ are cut at a finite distance.
125:  Palser and Manolopoulos proposed to evaluate the inverse 
126:  by Hotelling's method which is similar to the iterative
127:  purification algorithm of the DM method \cite{Palser}.
128:  The iterative calculation can be performed in O($N$) operations,
129:  provided that the cutoff of matrix elements at a finite distance is
130:  introduced in the product of two matrices.
131:  It is worth pointing out that the ideas of these
132:  O($N$) inverting methods are analogous to those of the
133:  O($N$) methods for the diagonalization. 
134:  The divide method by Gibson et al. \cite{Gibson}, the Taylor expansion
135:  method \cite{Mauri}, and Hotelling's method \cite{Palser} strategically
136:  and mathematically correspond to the divide and conquer method \cite{Yang},
137:  the FOE method \cite{Goedecker,Stephan}, and the DM method
138:  \cite{Daw,Li,Palser}, respectively.
 Therefore, one may expect these O($N$) inverting methods
 to have convergence properties for realistic materials
 similar to those of the O($N$) methods for the diagonalization \cite{Comparison}.
142:  However, it remains to be seen whether the expectation is meaningful
143:  or not.
144: 
145:  In this paper we propose a new O($N$) method for calculating 
146:  the inverse of the overlap matrix which is based on a resolvent and
147:  the block Lanczos algorithm. The new method is compared 
148:  with the other three methods in terms of the computational accuracy
 and efficiency. Thus, the aim of this paper is to clarify the
150:  applicability of these four O($N$) inverting methods for
151:  realistic materials.
152:  The paper is organized as follows. In Sec. II we present the theory
153:  of a new O($N$) inverting method based on a recursion method,
154:  and also summarize the three other O($N$) inverting methods.
155:  In Sec. III we discuss the convergence properties of these four
156:  O($N$) inverting methods for the diamond and fcc Al within
157:  the DFT calculations using the 1-norm of an error matrix
 which will be related to the error in the eigenvalues in that section.
 In Sec. IV we conclude with a clear characterization of the
 four O($N$) inverting methods.
161: 
162:  \begin{center}
163:    {\bf II.~THEORY}
164:  \end{center}
165: 
166:  \begin{center}
167:    {\bf A. Recursion method}
168:  \end{center}
169: 
170:  It is assumed that one-particle wave functions are expanded 
171:  by a localized orbital basis set $(\vert i\alpha\rangle)$, where 
172:  $i$ is a site index and $\alpha$ is an orbital index.
173:  The localized orbitals could be Slater-type
174:  \cite{Kobayashi,Kurita,Kobayashi2}, Gaussian-type \cite{Hierse},
175:  and numerical orbitals \cite{Sankey,Hernandez} obtained by
176:  DFT calculations for atoms.
 In most cases, the orbitals are mutually non-orthogonal,
178:  leading to an overlap matrix $S$ defined by 
179:  \begin{eqnarray}
180:    S_{i\alpha,j\beta} = \langle i\alpha \vert \hat{S}\vert j\beta \rangle,
181:  \end{eqnarray}
182:  where $\hat{S}$ is the overlap operator which is introduced as a matter
183:  of form in order to emphasize the similarity
184:  between the new inverting method and the block BOP method \cite{Ozaki},
185:  although the overlap operator generally should be the identity operator I.
186:  The overlap integral exponentially decays in real space 
187:  because of the localized nature of the orbitals, so that 
188:  the overlap matrix $S$ is sparse. Here we introduce a resolvent
189:  $R(Z)$ for the matrix $S$ as follows:
190:  \begin{eqnarray}
191:    R(Z) = (S-Z{\rm I})^{-1}.
192:  \end{eqnarray}
193:  It is then easy to verify that  
194:  \begin{eqnarray}
195:    S^{-1} = {\rm Re}R(0).
196:  \end{eqnarray}
197:  Thus, we see that the real part of the resolvent for $Z=0$
198:  gives the inverse $S^{-1}$ of the overlap matrix. 
199:  If the resolvent for $Z=0$ has a finite value for the imaginary part,
200:  the basis set is not linearly independent.
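 As a brief illustration (a sketch that assumes $S$ is real and symmetric,
 with eigenvalues $s_k$ and orthonormal eigenvectors $v_k$; these symbols
 are introduced here only for this remark), the resolvent of Eq.~(2) has
 the spectral representation
 \begin{eqnarray*}
   R(Z) = \sum_{k}\frac{v_k\hspace{0.4mm}^t\hspace{-0.4mm}v_k}{s_k-Z},
   \qquad
   R(0) = \sum_{k}\frac{v_k\hspace{0.4mm}^t\hspace{-0.4mm}v_k}{s_k}.
 \end{eqnarray*}
 Every term of $R(0)$ is real and finite as long as no eigenvalue $s_k$
 vanishes, that is, as long as the basis set is linearly independent,
 which is consistent with Eq.~(3).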
201:  The resolvent can be evaluated by adopting the algorithm of the
 block BOP method \cite{Ozaki}, which was recently developed to simulate
203:  orthogonal tight-binding (TB) models in O($N$) operations.
204:  It is noted that the new inverting method is derived just by replacing
205:  the Hamiltonian $\hat{H}$ in the block BOP method within the orthogonal
206:  TB models with the overlap operator $\hat{S}$.
207:  The first step in this algorithm is to block-tridiagonalize 
208:  the overlap matrix $S$ using the block Lanczos algorithm
209:  \cite{Lanczos,Jones,Inoue,Haydock}.
 The central equation is
211:  \begin{eqnarray}
212:    \hat{S}\vert U_{n}) & = & \vert U_{n})\underline{A}_{n}
213:                        +
214:             \vert U_{n-1})\underline{B}_{n}
215:                        +
216:             \vert U_{n+1})\underline{B}_{n+1}
217:  \end{eqnarray}
218:  with 
219:  \begin{eqnarray}
220:     \vert U_0) =
221:             (\vert i1\rangle,\vert i2\rangle,\dots,\vert iM_i\rangle )
222:  \end{eqnarray}
223:  as the starting state. $\underline{A}_n$ and $\underline{B}_n$ are 
 recursion block coefficients of size $M_{i}\times M_{i}$,
225:  where $M_{i}$ is the number of localized orbitals on the starting 
226:  atom $i$, and the underline indicates that the element is a block.
227:  In the block Lanczos algorithm, we need to start the recursion with
228:  Eq.~(5) to make the recursion method accurate and efficient \cite{Ozaki}.
229:  The Lanczos algorithm with a finite recursion transforms the overlap
230:  matrix $S$ into the block-tridiagonalized matrix $S^L$ which has
231:  the diagonal $A_{n}$ and the sub-diagonal block elements $B_{n}$,
232:  where the index $L$ indicates the representation based on the Lanczos
233:  basis. Considering the resolvent $R^{L}(Z)\equiv (S^{L}-Z{\rm I})^{-1}$
234:  for the block-tridiagonalized overlap matrix,
235:  the diagonal $\underline{R}^L_{00}(Z)$ and off-diagonal block elements 
236:  $\underline{R}^L_{0n}(Z)$ can be easily derived along the same line
237:  as that described in the block BOP method \cite{Ozaki}.
238:  For $Z=0$, the elements are given by 
239:  \begin{eqnarray}
240:    \underline{R}^L_{00}(0)
241:         =[\underline{A}_0-\hspace{0.4mm}^t\hspace{-0.4mm}\underline{B}_1[
242:           \underline{A}_1-\hspace{0.4mm}^t\hspace{-0.4mm}\underline{B}_2[
243:                        \cdots
244:           ]^{-1}\underline{B}_2
245:           ]^{-1}\underline{B}_1
246:           ]^{-1},
247:  \end{eqnarray}
248:  \begin{eqnarray}
249:     \nonumber
250:     \lefteqn{ 
251:       \underline{R}^{L}_{0n}(0)
252:       =
253:     \biggl(
254:       \delta_{1n}\underline{\rm I}
255:       -\underline{R}^{L}_{0n-1}(0)\underline{A}_{n-1}
256:     }\\
257:     &&
258:     \quad\quad\quad\quad
259:        -\underline{R}^{L}_{0n-2}(0)
260:          \hspace{0.4mm}^t\hspace{-0.4mm}\underline{B}_{n-1}
261:     \biggr)
262:          (\underline{B}_{n})^{-1},
263:  \end{eqnarray}
264:  where $\delta$ is Kronecker's delta, and 
265:  $R_{0-1}(0)=\hspace{0.4mm}^t\hspace{-0.4mm}B_{0}=0$.
266:  Once the block diagonal element is calculated as the multiple
267:  inverse Eq.~(6), the off-diagonal elements are evaluated
268:  from the recurrence relation Eq.~(7) with $\underline{R}^L_{00}(0)$
269:  as the starting element. In order to truncate the multiple inverse
270:  in Eq.~(6) without reducing the accuracy significantly, a square root
271:  terminator could be used, while there could
272:  be an infinite number of levels in the multiple inverse of diagonal
273:  Green's function for an infinite system.
274:  In the test calculations of Sec.~III we used the square root
 terminator for the truncation at a finite number of levels.
276:  The two Eqs.~(6) and (7) provide the resolvent based on the Lanczos
277:  basis representation, so that we can obtain the original resolvent
278:  through the following inverse transformation:
279:  \begin{eqnarray}
280:     \underline{R}_{ij}(0) = \sum_{n}
281:                  \underline{R}^L_{0n}(0) 
282:                  \hspace{0.4mm}^t\hspace{-0.4mm}\underline{U}_{nj},
283:  \end{eqnarray}
284:  where $\hspace{0.4mm}^t\hspace{-0.4mm}\underline{U}_{nj}$ is defined by
285:  $\hspace{0.4mm}^t\hspace{-0.4mm}\underline{U}_{nj} = (U_{n}\vert
286:         (\vert j1\rangle,\vert j2\rangle,\dots,\vert jM_j\rangle ).$
287:  The inverse transformation Eq.~(8) is significantly simplified 
288:  because of the orthogonality in the Lanczos bases. Therefore, we only
289:  have to evaluate the 0th block line of the resolvent in the Lanczos
290:  basis representation.
291:  The resolvent exactly satisfies a sum rule $\sum_{ij}
292:   {\rm tr\left\{\underline{S}_{ij}\underline{R}_{ji}(0)\right\}}
  = N_{B}$, which is derived from Eq.~(2), where $N_{B}$ is the
 number of basis functions, and is constructed from the moments of $S$ up to
 the $(q+1)$th, $S^{q+1}$ \cite{Ozaki}, where $q$ is the final level of the recursion.
296:  Equation (8) gives a good approximation for the inverse of
297:  overlap matrix as the number of recursion levels increases.
298:  However, the approximated inverse is not strictly
299:  a symmetric matrix at a finite recursion.
300:  If the approximated inverse is symmetric, eigenvalues of
301:  a generalized secular equation with the overlap matrix
 are real numbers. Therefore, we evaluate the inverse of
 the overlap matrix by symmetrizing the resolvent with a
 simple arithmetic average:
305:  \begin{eqnarray}
306:     \underline{S}^{-1}_{ij} = 
307:     \frac{{\rm Re}\underline{R}_{ij}(0)
308:         + {\rm Re}\hspace{0.4mm}^t\hspace{-0.4mm}\underline{R}_{ji}(0)}
309:          {2}.
310:  \end{eqnarray}
311:  The symmetrization preserves the above sum rule.
 All elements of the inverse are evaluated by applying this
 algorithm repeatedly, once for each atom.
314:  The cluster over which the hops are made in the Lanczos algorithm is
315:  determined by the logical truncation method \cite{Ozaki}.
316:  Thus, the computational cost of the recursion method is strictly
317:  proportional to the number of atoms $N$.
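
 As a minimal worked check of Eqs.~(6) and (7) (a sketch assuming one
 orbital per atom, so that the blocks reduce to scalars $a_n$ and $b_n$,
 and a recursion stopped at the first level without a terminator; these
 lower-case symbols are introduced only for this illustration), the
 Lanczos-basis matrix is
 \begin{eqnarray*}
   S^{L} = \left(\begin{array}{cc} a_0 & b_1\\ b_1 & a_1 \end{array}\right),
 \end{eqnarray*}
 and Eqs.~(6) and (7) give
 \begin{eqnarray*}
   R^{L}_{00}(0) & = & \left[a_0 - b_1 a_1^{-1} b_1\right]^{-1}
                  = \frac{a_1}{a_0a_1-b_1^2},\\
   R^{L}_{01}(0) & = & \left(1 - R^{L}_{00}(0)a_0\right)b_1^{-1}
                  = -\frac{b_1}{a_0a_1-b_1^2},
 \end{eqnarray*}
 which coincide with the first row of the exact inverse of the
 $2\times 2$ matrix $S^{L}$.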
318: 
319:  \begin{center}
320:    {\bf B. Divide method}
321:  \end{center}
322: 
323:  In the case of the block BOP \cite{Ozaki} and FOE methods
324:  \cite{Goedecker,Stephan}, it is required to evaluate
325:  the modified Hamiltonian $H'=S^{-1}H$ rather than the inverse of
326:  overlap matrix. In such cases we have an alternative way
327:  where a linear equation 
328:  \begin{eqnarray}
329:    SH'=H
330:  \end{eqnarray}
331:  is solved instead of calculating the inverse.
332:  In conventional ways of solving the linear equation for a total system,
333:  the computational cost scales as the third
334:  power of the number of atoms $N$, while the scaling could be 
335:  reduced to ${\rm O}(N^2)$, making use of the sparseness
336:  of the overlap matrix. Therefore, Gibson et al. have proposed
337:  a solution of Eq.~(10) with the cutoff radii of $H$ and $S$ \cite{Gibson}.
338:  The linear equation Eq.~(10) can be decomposed into $N$ subspace
339:  linear equations for $N$ finite clusters under this constraint. 
340:  One solves each of the subspace linear equations for the finite clusters
341:  centered on atom $i$ using a conventional method such as the 
342:  Cholesky factorization, which results in O($N$) operations
343:  for the computational effort.
 However, the divide method involves redundant computation:
 one has to evaluate all matrix elements of the modified
 Hamiltonian $H'$ for each finite cluster, whereas in the other
 ${\rm O}(N)$ inverting methods the elements of the inverse
 of the overlap matrix are not calculated more than once.
349:  Thus, the prefactor of the ${\rm O}(N)$ operations could be 
350:  very large for highly coordinated structures such as fcc.
351:  The magnitude of the prefactor will be discussed in Sec.~III.
 An iterative scheme such as the Gauss-Seidel method \cite{Jones,Foulkes}
353:  which is commonly used for large-scale systems is also available for
354:  solving the linear equation Eq.~(10). However, it has been
355:  recognized that the iterative scheme is computationally expensive
356:  \cite{Gibson}, so that the iterative scheme was
357:  not investigated in this study. 
 As in the recursion method, we used the logical truncation method to
 construct the subspace linear equations in the test calculations
 discussed in Sec.~III in order to compare the computational performance.
361: 
362:  \begin{center}
363:    {\bf C. Taylor expansion method}
364:  \end{center}
365: 
366:  Mauri et al. have proposed to approximate the inverse of the overlap
367:  matrix using the Taylor expansion in their ${\rm O}(N)$ unconstrained
368:  minimization method \cite{Mauri}. The overlap matrix $S$ is expressed as a
369:  sum of the identity ${\rm I}$ and an $O$-matrix $O$ which is the overlap
370:  matrix between the different orbitals:
371:  \begin{eqnarray}
372:     S = {\rm I} + O,
373:  \end{eqnarray}
 then we can expand the inverse of $S$ with respect to the $O$-matrix
375:  as follows:
376:  \begin{eqnarray}
377:     \nonumber
378:     S^{-1} & = & \sum_{n=0}^{\infty}(-1)^n O^n\\
379:            & = & {\rm I} - O + O^2 - O^3 +  \dots
380:  \end{eqnarray}
381:  The computational accuracy and efficiency of the approximation 
382:  by the Taylor series depend on the convergence for the summation
 of Eq.~(12). The summation in Eq.~(12) does not converge, but
 diverges, when the spectral radius of the $O$-matrix exceeds 1.0.
 Even if the $O$-matrix has no eigenvalues at or below $-1.0$,
 indicating that the basis set is linearly independent, the eigenvalues
 of the $O$-matrix exceed 1.0 in most cases, as shown in Sec.~III.
388:  In such cases, the Taylor expansion method cannot be applied.
 The matrix $O^n$ is calculated as the product of the untruncated
 but highly sparse $O$-matrix and $O^{n-1}$, with cutoff
 radii applied to the elements, so that the summation to a finite order
 in Eq.~(12) can be performed with ${\rm O}(N)$ operations.
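
 The role of the spectral radius can be made explicit by a short worked
 estimate (a sketch; the truncation order $m$ and the eigenvalue symbol $o$
 are introduced here for illustration, and the cutoff of matrix elements
 is ignored). Summing the geometric series in Eq.~(12) up to order $m$
 gives
 \begin{eqnarray*}
   S^{-1} - \sum_{n=0}^{m}(-1)^n O^n = (-O)^{m+1}S^{-1},
 \end{eqnarray*}
 so that, along an eigenvector of $O$ with eigenvalue $o$, the error of
 the truncated series is $(-o)^{m+1}/(1+o)$. It decays geometrically with
 $m$ for $\vert o\vert<1$ and grows without bound for $\vert o\vert>1$.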
393: 
394:  \begin{center}
395:    {\bf D. Hotelling's method}
396:  \end{center}
397: 
398:  Palser and Manolopoulos \cite{Palser} have suggested evaluating
399:  the inverse $S^{-1}$ using Hotelling's method \cite{Recipes,Pan}.
400:  The method has an iterative algorithm very similar to the purification
401:  algorithm \cite{Palser} in the DM method.
402:  The convergence rate in Hotelling's method is also quadratic
403:  as with the DM method.
404:  The purification of an approximate inverse is achieved using the
405:  following iterative relation:
406:  \begin{eqnarray}
407:     S^{-1}_{n+1} = 2 S^{-1}_{n} - S^{-1}_{n}SS^{-1}_{n}.
408:  \end{eqnarray}
 In the case of $S^{-1}_0 = {\rm I}$, Hotelling's method is equivalent to
 the Taylor expansion method to a finite order described in the previous
 subsection (C). It is easy to verify that $S^{-1}_1$ and $S^{-1}_2$ are
 the Taylor series to the first and third orders of the $O$-matrix,
 respectively:
414:  \begin{eqnarray}
415:     \nonumber
416:     S^{-1}_{1} & = & 2 S^{-1}_{0} - S^{-1}_{0}SS^{-1}_{0}\\
417:                & = & {\rm I} - O,
418:  \end{eqnarray}
419:  \begin{eqnarray}
420:     \nonumber
421:     S^{-1}_{2} & = & 2 S^{-1}_{1} - S^{-1}_{1}SS^{-1}_{1}\\
422:                & = & {\rm I} - O + O^2 - O^3.
423:  \end{eqnarray}
 From Eqs.~(14) and (15), we see that Hotelling's method converges
 quadratically, in contrast to the linear convergence of the Taylor
 expansion method. Thus, if Eq.~(12) is a convergent series,
 Hotelling's method should be more efficient than
 the Taylor expansion method.
 When the spectral radius of the $O$-matrix exceeds 1.0,
430:  the identity ${\rm I}$ cannot be used as the initial guess for
431:  the inverse $S^{-1}$. In such cases, although it is very difficult
432:  to estimate a good initial matrix $S_{0}^{-1}$ for the iteration Eq.~(13),
433:  in this study, we use the overlap $S$ with a small prefactor $\sigma$
434:  derived by Pan and Reif \cite{Pan} as the initial guess:
435:  \begin{eqnarray}
436:    S^{-1}_{0} = \sigma S
437:  \end{eqnarray}
438:  with  
439:  \begin{eqnarray}
440:     \sigma = \frac{1}
441:             {\left(
442:              \displaystyle{\max_{i\alpha}}
443:              \displaystyle{\sum_{j\beta}}\vert S_{i\alpha,j\beta}\vert
444:              \right)^2}.
445:  \end{eqnarray}
 It is noted that Hotelling's method has the advantage
 that the inverse at the previous MD step can serve as a good guess
 of $S_{0}^{-1}$ at the current MD step, whereas information
 from the previous MD step cannot be used in the other methods,
 namely the recursion method, the divide method, and the Taylor expansion method.
451:  In the iteration Eq.~(13), the elements of $S_{n}^{-1}$ are cut 
452:  at a finite distance. As a result of this truncation, the computational
453:  effort of Hotelling's method scales linearly with the system size.
 In the test calculations of Sec.~III, we used the logical truncation
 method for the cutoff of the elements, as in the other
 O($N$) inverting methods.
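
 The quadratic convergence and the role of the initial guess can be seen
 from a short derivation (a sketch that ignores the cutoff of matrix
 elements; the residual $Q_n$ and the eigenvalue symbol $s$ are notation
 introduced here). Defining $Q_n\equiv {\rm I}-SS^{-1}_{n}$, Eq.~(13) gives
 \begin{eqnarray*}
   Q_{n+1} & = & {\rm I} - 2SS^{-1}_{n} + SS^{-1}_{n}SS^{-1}_{n}\\
           & = & Q_n^{2},
 \end{eqnarray*}
 so that $Q_n = Q_0^{2^n}$ and the untruncated iteration converges if and
 only if the spectral radius of $Q_0$ is smaller than 1.
 For $S^{-1}_{0}={\rm I}$ one has $Q_0=-O$, which reproduces the
 convergence condition of the Taylor series.
 For $S^{-1}_{0}=\sigma S$ with Eq.~(17), $Q_0={\rm I}-\sigma S^{2}$ has
 eigenvalues $1-\sigma s^{2}$, where $s$ is an eigenvalue of $S$; since
 $\vert s\vert$ cannot exceed the maximum absolute row sum of $S$, these
 eigenvalues lie in $[0,1)$ for a non-singular $S$.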
457: 
458:  \begin{center}
459:    {\bf III.~CONVERGENCE PROPERTIES}
460:  \end{center}
461: 
462:  \begin{center}
463:    {\bf A. Error analysis}
464:  \end{center}
465: 
 In order to compare the four ${\rm O}(N)$ inverting methods presented
 in Sec.~II in terms of computational accuracy and efficiency,
468:  we first relate the 1-norm of an error matrix $E$ with the error of
469:  eigenvalues $\epsilon_{\nu}$ of a secular equation by using
470:  an error analysis theory \cite{Golub,Chatelin}.
 The generalized secular equation with the overlap matrix $S$ is derived
 from the variational principle within DFT using a non-orthogonal basis set:
473:  \begin{eqnarray}
474:    S^{-1}HC_{\nu} = \epsilon_{\nu} C_{\nu},
475:  \end{eqnarray}
476:  where $H_{i\alpha,j\beta}
477:         \equiv \langle i\alpha \vert\hat{H}\vert j\beta\rangle$
478:  and
479:  $C_{i\alpha,\nu}$ is an expansion coefficient
480:  $C_{i\alpha,\nu}\equiv \langle i\alpha\vert \phi_{\nu}\rangle$
481:  in a one-particle wave function $\vert \phi_{\nu}\rangle$.
482:  Let us consider substituting the exact inverse $S^{-1}$ with
483:  an approximate inverse $S'^{-1}$ in Eq.~(18), then the difference
484:  between $S^{-1}$ and $S'^{-1}$ is 
485:  \begin{eqnarray}
486:    S'^{-1} - S^{-1} = \Delta S^{-1}.
487:  \end{eqnarray}
488:  For the approximate inverse $S'^{-1}$
489:  the secular equation $S'^{-1}HC'_{\nu} = \epsilon'_{\nu} C'_{\nu}$
490:  is satisfied with approximate eigenvalues $\epsilon'_{\nu}$ and
491:  eigenvectors $C'_{\nu}$.
492:  According to the error analysis theory \cite{Golub,Chatelin},
493:  the difference between the exact and the approximate eigenvalues
494:  is given by
495:  \begin{eqnarray}
496:    \vert\epsilon'_{\nu} - \epsilon_{\nu}\vert = {\rm O}(\lambda)
497:  \end{eqnarray}
498:  with $\lambda$, which is the 1-norm of a matrix $\Delta S^{-1}H$,
499:  defined by 
500:  \begin{eqnarray}
501:    \lambda  = 
502:             \max_{j\beta}\sum_{i\alpha}
503:             \left\vert \sum_{k\gamma}
504:             \Delta S^{-1}_{i\alpha,k\gamma}H_{k\gamma,j\beta}
505:             \right\vert.
506:   \end{eqnarray}
 Therefore, we see that the error in the eigenvalues is of the order of
 the 1-norm of $\Delta S^{-1}H$ arising from the approximation of the
 inverse of the overlap matrix. Equation (20) thus connects the error of
 the approximate inverse to that of the eigenvalues. However, it is not possible
511:  to calculate the exact inverse for infinite or periodic systems, 
512:  so that we introduce an error matrix $E$, which is easily evaluated,
513:  defined as the difference between a matrix $SS'^{-1}H$ and the original
514:  Hamiltonian $H$:
515:  \begin{eqnarray}
516:     \nonumber
517:     E & \equiv & 
518:          SS'^{-1}H - H\\
519:       & = &  S\Delta S^{-1}H.
520:  \end{eqnarray}
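 The second line of Eq.~(22) follows by inserting Eq.~(19) into the
 definition of $E$ (spelled out here as a short check):
 \begin{eqnarray*}
   SS'^{-1}H - H & = & S\left(S^{-1}+\Delta S^{-1}\right)H - H\\
                 & = & H + S\Delta S^{-1}H - H\\
                 & = & S\Delta S^{-1}H.
 \end{eqnarray*}
 Evaluating $E$ therefore requires only $S$, $H$, and the approximate
 inverse $S'^{-1}$, not the exact inverse.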
521:  The 1-norm $\eta$ of the error matrix $E$ can be related to
522:  that $\lambda$ of the matrix $\Delta S^{-1}H$ as follows:
523:   \begin{eqnarray}
524:      \nonumber
525:      \eta & = & \max_{j\beta}\sum_{k'\gamma'}
526:                 \left\vert \sum_{i\alpha}\sum_{k\gamma}
527:                 S_{k'\gamma',i\alpha}\Delta S^{-1}_{i\alpha,k\gamma}
528:                 H_{k\gamma,j\beta}
529:                 \right\vert\\
530:      \nonumber
531:           & \leq & 
532:                 \max_{j\beta}\sum_{k'\gamma'}
533:                 \sum_{i\alpha}
534:                 \vert S_{k'\gamma',i\alpha} \vert
535:                 \left\vert
536:                 \sum_{k\gamma}
537:                 \Delta S^{-1}_{i\alpha,k\gamma}
538:                 H_{k\gamma,j\beta}
539:                 \right\vert\\
540:      \nonumber
541:           & \leq &
542:                 N_{av} 
543:                 \left(
544:                 \max_{j\beta} 
545:                 \sum_{i\alpha}
546:                 \left\vert
547:                 \sum_{k\gamma}
548:                 \Delta S^{-1}_{i\alpha,k\gamma}
549:                 H_{k\gamma,j\beta}
550:                 \right\vert
551:                 \right)\\
552:           & = &
553:                 N_{av}\lambda,
554:   \end{eqnarray}
 where $N_{av}$ is the average number of non-zero elements
 in the overlap matrix for an orbital $\vert i\alpha \rangle$.
 The third relation in Eq.~(23) is derived by replacing the non-zero
 overlap integrals $\vert S_{k'\gamma',i\alpha} \vert$ by 1 with
 the indices $i\alpha$ fixed in the second relation.
560:  Considering Eqs.~(21) and (23), we can relate the 1-norm of the
561:  error matrix to the error of the eigenvalue:
562:  \begin{eqnarray}
563:    \vert\epsilon'_{\nu} - \epsilon_{\nu}\vert = {\rm O}(\eta). 
564:  \end{eqnarray}
 Therefore, we will compare the four O($N$) inverting methods
566:  using the 1-norm $\eta$, which is easily evaluated, instead of 
567:  $\lambda$.
568: 
569:  \begin{center}
570:    {\bf B. Numerical tests}
571:  \end{center}
572: 
 We numerically studied the convergence properties of the four
 ${\rm O}(N)$ inverting methods using the 1-norm $\eta$ for diamond and fcc
 Al within the DFT scheme proposed by Sankey and Niklewski \cite{Sankey}.
 In these DFT calculations we used numerical localized orbitals,
577:  fireball bases by Sankey and Niklewski \cite{Sankey},
578:  as a minimal basis set for valence electrons.
579:  The radii of the radial-wave function confinement are 2.1 and 
580:  3.7~\AA~for carbon and aluminum atoms, respectively. 
 The minimal basis sets give 1.253 (1.244)
 and 2.515 (2.466)~\AA~for the equilibrium bond lengths of the
 carbon and aluminum dimers, respectively, where the values in
 parentheses are experimental results.
585: 
 In Fig.~1 we show the density of states for the eigenvalues of the
587:  $O$-matrix, which is defined by Eq.~(11), in diamond and fcc Al.
 In both cases the $O$-matrices have no eigenvalues smaller than
 $-1.0$, so that the basis sets are linearly independent for
 these structures. However, the density of states possesses finite
 values for eigenvalues larger than or equal to 1.0 in both cases.
 In other words, the spectral radius of the $O$-matrix exceeds 1.0.
 This means that the summation in Eq.~(12) for the Taylor expansion
 method diverges for diamond and fcc Al.
 In addition to the above cases, we confirmed that the spectral
 radii of the $O$-matrix also exceed 1.0 for graphite and
 poly(ethylene), so that the applicability of the Taylor expansion
 method is severely restricted. Therefore, we do not provide the
599:  convergence properties of the Taylor expansion method in this paper.
600: 
601:  Figure 2 shows the convergence properties of the 1-norm $\eta$ of 
602:  the error matrix for diamond calculated by the recursion, divide,
603:  and Hotelling's methods.
604:  In the recursion method the 1-norm exponentially decays for each
605:  shell cluster as a function of the number of recursion levels,
606:  and finally converges to the value of the 1-norm calculated by
607:  the divide method for the corresponding cluster.
608:  In the divide method the 1-norm almost exponentially diminishes
609:  as a function of number of shells.
610:  For the seven-shell cluster the 1-norm is only $3.1\times 10^{-5}$~eV.
 The identity matrix ${\rm I}$ cannot be used as an initial guess
 $S_{0}^{-1}$ in Hotelling's method because the spectral
 radius of the $O$-matrix exceeds 1.0. Thus, we gave the initial guess
 $S_{0}^{-1}$ by Eq.~(16), where $\sigma$ is 0.021 for diamond.
 In Hotelling's method the convergence is not
 monotonic, in contrast to the other two methods.
 For three-, five-, and seven-shell clusters, the 1-norm is
 gradually reduced for a small number of iterations.
 However, the 1-norm increases after reaching its minimum,
 and finally a numerical instability sets in, in which the 1-norm diverges
 as the iteration proceeds.
622:  The smallest 1-norm for each shell-cluster is slightly larger than  
623:  that calculated by the divide method for the same cluster.
624:  Therefore, we see that Hotelling's method cannot reach the perfect 
625:  convergence for diamond due to the numerical instability.
 For Hotelling's method we also examined the convergence properties
 of the 1-norm $\eta$ for carbon in the diamond structure
 with a lattice constant of 3.9~\AA, for which the spectral radius
 of the $O$-matrix is within 1.0, although the result is not shown in
 this paper. In this system the 1-norm converges very quickly to
 the corresponding value calculated by the divide method for the
 same cluster. Thus, we find empirically that Hotelling's method
 gives convergent results for systems with spectral radii smaller
 than 1.0.
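
 The role of the spectral radius can be seen from a short worked
 derivation. It assumes the standard form of Hotelling's iteration,
 $S_{n+1}^{-1}=S_{n}^{-1}(2{\rm I}-SS_{n}^{-1})$, and exact arithmetic
 on the full matrix; the residual $R_{n}$ is a symbol introduced only
 for this sketch. Defining $R_{n}\equiv{\rm I}-SS_{n}^{-1}$,
 \[
   R_{n+1} = {\rm I}-SS_{n}^{-1}(2{\rm I}-SS_{n}^{-1})
           = ({\rm I}-SS_{n}^{-1})^{2} = R_{n}^{2} ,
 \]
 and hence $R_{n}=R_{0}^{2^{n}}$. The iteration therefore converges,
 quadratically, if and only if the spectral radius of $R_{0}$ is smaller
 than 1.0. For the initial guess $S_{0}^{-1}={\rm I}$ one has
 $R_{0}=-O$, which is why the identity matrix fails when the spectral
 radius of the $O$-matrix exceeds 1.0. This argument does not include
 the truncation to a finite cluster, so by itself it does not explain
 the instability observed above.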
635: 
 As in Fig.~2, the convergence properties of the 1-norm are shown
 in Fig.~3 for fcc Al.
 The 1-norm is one to two orders of magnitude larger than that for
 diamond, while its behavior is very similar to
 that for diamond.
 In the recursion method the converged values of the 1-norm are
 consistent with those of the divide method for both the four- and
 six-shell clusters. In Hotelling's method we used Eq.~(16) with $\sigma=0.0098$
 as $S_{0}^{-1}$, since the spectral radius of the $O$-matrix exceeds 1.0
 for fcc Al. As for diamond, the 1-norms for the four- and six-shell clusters
 finally diverge without achieving full convergence.
 Although we tested the convergence properties using several values
 of $\sigma$ for both diamond and fcc Al, we could neither obtain
 converged results nor avoid the numerical instability.
650: 
651:  Figures 4(a) and 4(b) show the relation between the magnitude of
652:  the 1-norm $\eta$ of the error matrix and the computational time per
653:  atom to evaluate the inverse of the overlap matrix for
654:  diamond and fcc Al, respectively.
 The comparison clearly indicates that the computational efficiency
 increases in the order divide $<$ Hotelling's $<$ recursion
 for both diamond and fcc Al.
 The recursion method is about one hundred times faster than the divide
 method in reaching the same convergence for diamond
 and fcc Al.
661: 
662:  \begin{center}
663:    {\bf IV.~CONCLUSIONS}
664:  \end{center}
665:  
666:  We presented a new O($N$) algorithm for calculating the inverse of the 
667:  overlap matrix $S$. It is based on the recursion method with the block
668:  Lanczos algorithm. The problem of evaluating $S^{-1}$ is mapped to the
669:  block BOP method for an orthogonal TB model just by replacing the 
670:  Hamiltonian with the overlap operator.
 In addition, we briefly described the other known methods
 for calculating the inverse in ${\rm O}(N)$ operations:
 the divide, the Taylor expansion, and Hotelling's methods.
674:  We examined the computational accuracy and efficiency
675:  of these ${\rm O}(N)$ inverting methods using the 1-norm of the 
676:  error matrix for diamond and fcc Al in DFT calculations with
677:  the minimal basis set for valence electrons.
 The spectral radius of the $O$-matrix given by $(S-{\rm I})$
 exceeds 1.0 for many real materials in DFT calculations
 based on localized bases, which means that the applicability
 of the Taylor expansion method is significantly restricted.
682:  In the recursion method the 1-norm of the error matrix exponentially
683:  converges to the value calculated by the divide method for the same
684:  cluster in both diamond and fcc Al with numerical stability.
 On the other hand, Hotelling's method cannot reach
 converged results in either case because of the numerical instability.
 The comparison of computational time shows that the recursion
 method is the most efficient algorithm among the O($N$)
 inverting methods compared for diamond and fcc Al. The recursion method is
 about one hundred times faster than the divide method.
691:  Thus, the new method for the evaluation of the inverse is
692:  a practical algorithm and can be incorporated
 in several O($N$) methods for total energy calculations using
 localized orbital bases.
695: 
696:  \begin{center}
   {\bf ACKNOWLEDGMENTS}
698:  \end{center}
699: 
 We would like to thank Y. Morikawa and H. Kino for helpful suggestions
 about the DFT calculations, and D. R. Bowler for useful suggestions about
 ${\rm O}(N)$ inverting methods.
 Part of the computation in this work was done using the computational
 facilities of the Japan Advanced Institute of Science and Technology (JAIST).
706: 
712: 
713:  \begin{references}
714: 
715:   % O(N) methods
716: 
717:   \bibitem{Pettifor}
718:   D. G. Pettifor, Phys. Rev. Lett. {\bf 63}, 2480 (1989);
719:   M. Aoki, Phys. Rev. Lett. {\bf 71}, 3842 (1993);
720:   A. P. Horsfield, A. M. Bratkovsky,
721:   D. G. Pettifor, and M. Aoki,
722:   Phys. Rev. B {\bf 53},
723:   1656 (1996);
724:   A. P. Horsfield, A. M. Bratkovsky,
725:   M. Fearn, D. G. Pettifor, and M. Aoki,
726:   Phys. Rev. B {\bf 53},
  12694 (1996).
728: 
729:   \bibitem{Ozaki}
730:   T. Ozaki, Phys. Rev. B {\bf 59}, 16061 (1999); 
731:   T. Ozaki, M. Aoki, and D. G. Pettifor, Phys. Rev. B {\bf 61}, 7972 (2000);
732:   T. Ozaki and K. Terakura, submitted to Phys. Rev. Lett.
733: 
734:   \bibitem{Goedecker}S. Goedecker and L. Colombo,
735:   Phys. Rev. Lett. {\bf 73}, 122 (1994).
736: 
737:   \bibitem{Stephan}U. Stephan and D. A. Drabold,
738:   Phys. Rev. B {\bf 57}, 6391 (1998).
739: 
740:   \bibitem{Yang}W. T. Yang, Phys. Rev. Lett. {\bf 66},
741:   1438 (1991).
742: 
743:   \bibitem{Galli}
744:   G. Galli and M. Parrinello, Phys. Rev. Lett. {\bf 69}, 3547 (1992).
745: 
746:   \bibitem{Mauri}
747:   F. Mauri, G. Galli, and R. Car, Phys. Rev. B {\bf 47}, 9973 (1993);
748:   F. Mauri and G. Galli, Phys. Rev. B {\bf 50}, 4316 (1994).
749: 
750:   \bibitem{Daw}
751:   M. S. Daw, Phys. Rev. B {\bf 47}, 10895 (1993).
752: 
753:   \bibitem{Li}
754:   X.-P. Li, R. W. Nunes, and D. Vanderbilt,
755:   Phys. Rev. B {\bf 47}, 10891 (1993);
756:   R. Nunes and D. Vanderbilt,
757:   Phys. Rev. B {\bf 50}, 17611 (1994). 
758: 
759:   \bibitem{Palser}
760:   A. H. R. Palser and D. Manolopoulos,
761:   Phys. Rev. B {\bf 58}, 12704 (1998).
762:  
763:   % ab initio tight-binding
764: 
765:   \bibitem{Sankey} O. F. Sankey and D. J. Niklewski,
766:   Phys. Rev. B {\bf 40}, 3979 (1989).
767: 
768:   \bibitem{Kobayashi} K. Kobayashi, N. Kurita, H. Kumahora, and K. Tago,
  Phys. Rev. B {\bf 45}, 11299 (1992).
770: 
771:   \bibitem{Kurita} N. Kurita and K. Kobayashi,
  Comp. and Chem. {\bf 24}, 351 (2000) and references therein.
773: 
774:   \bibitem{Kobayashi2} K. Kobayashi, K. Tago, and N. Kurita,
  Phys. Rev. A {\bf 53}, 1903 (1996).
776: 
777:   \bibitem{Hierse} W. Hierse and E. B. Stechel,
778:   Phys. Rev. B {\bf 50}, 17811 (1994).
779: 
780:   \bibitem{Hernandez} E. Hernandez and M. Gillan,
781:   Phys. Rev. B {\bf 51}, 10157 (1995).
782: 
783:   \bibitem{Ordejon} P. Ordejon, E. Artacho, and J. M. Soler, 
784:   Phys. Rev. B {\bf 53}, R10441 (1996).
785: 
786:   \bibitem{Sanchez} D. Sanchez-Portal, P. Ordejon, E. Artacho, and J. M. Soler,
787:   Int. J. Quant. Chem. {\bf 65}, 453 (1997).
788: 
789:   \bibitem{Horsfield} A. P. Horsfield,
  Phys. Rev. B {\bf 56}, 6594 (1997).
791: 
792:   % Car-Parrinello
793: 
  \bibitem{Payne} M. C. Payne, M. P. Teter, D. C. Allan, T. A. Arias,
  and J. D. Joannopoulos, Rev. Mod. Phys. {\bf 64}, 1045 (1992).
796: 
797:   % Applications
798: 
799:   \bibitem{Bowler} D. R. Bowler and M. J. Gillan,
800:   Mol. Simulat. {\bf 25}, 239 (2000).
801: 
802:   \bibitem{Applications}
  S. Goedecker, Rev. Mod. Phys. {\bf 71}, 1085 (1999)
804:   and references therein.
805: 
806:   % O(N) inverse methods
807: 
808:   \bibitem{Gibson} A. Gibson, R. Haydock, and J. P. LaFemina,
  Phys. Rev. B {\bf 47}, 9229 (1993).
810: 
811:   % Comparison of O(N) methods
812: 
813:   \bibitem{Comparison}D. R. Bowler, M. Aoki, C. M. Goringe,
814:     A. P. Horsfield, and D. G. Pettifor,
    Modelling Simul. Mater. Sci. Eng. {\bf 5}, 199 (1997).
816: 
817:   % Lanczos
818: 
819:   \bibitem{Lanczos}C. Lanczos,
820:   J. Res. Natl. Bur. Stand. {\bf 45}, 225 (1950).
821: 
822:   \bibitem{Jones}
  R. Jones and M. W. Lewis, Philos. Mag. B {\bf 49}, 95 (1984).
824: 
825:   \bibitem{Inoue}
826:   J. Inoue and Y. Ohta, J. Phys. C {\bf 20}, 1947 (1987).
827: 
828:   \bibitem{Haydock}
829:   R. Haydock, V. Heine, and M. J. Kelly, J. Phys. C {\bf 5},
  2845 (1972); {\bf 8}, 2591 (1975);
831:   R. Haydock, Solid State Phys. {\bf 35}, 216 (1980).
832: 
 % Gauss-Seidel
834: 
835:   \bibitem{Foulkes} M. Foulkes and R. Haydock,
836:   J. Phys. C {\bf 19}, 6573 (1986).
837: 
838:  % Hotelling's method
839: 
840:   \bibitem{Recipes} W. H. Press, S. A. Teukolsky, W. T. Vetterling,
841:   and B. P. Flannery, {\it Numerical Recipes}, 2nd ed. 
842:   (Cambridge University Press, Cambridge, 1992), p. 49.
843: 
844:   \bibitem{Pan} V. Pan and J. Reif,
845:   in Proceedings of the Seventeenth Annual ACM Symposium on 
846:   Theory of Computing (New York: Association for Computing Machinery).
847: 
848:  % Error analysis
849: 
850:   \bibitem{Golub} G. H. Golub and C. van Loan,
851:   {\it Matrix Computations}, 2nd ed.,
852:   North Oxford Academic, Oxford, 1989.
853: 
854:   \bibitem{Chatelin} F. Chatelin,
855:   {\it Valeurs propres de matrices}, Masson, Paris 1988.
856: 
857:  \end{references}
858: 
859: % Fig.1
860:  
861:  \begin{figure}[t]
862:   \caption{\small
863:    The density of states for eigenvalues of the $O$-matrix for diamond
864:    and fcc Al, where carbon and aluminum atoms have minimal numerical
865:    basis sets for valence electrons which were obtained by DFT calculations
866:    for the atomic states. The experimental values, 3.57 and 4.05~\AA, were
867:    used as the lattice constants of diamond and fcc Al, respectively.}
868:  \end{figure}
869: 
870: % Fig.2
871: 
872:  \begin{figure}[t]
873:   \caption{\small
874:    The 1-norm of the error matrix for diamond calculated by
875:    the (a) recursion, (b) divide, and (c) Hotelling's methods.
   In the recursion and Hotelling's methods, the 1-norms were
   calculated for three-, five-, and seven-shell clusters as functions
   of the number of recursion levels and of iterations, respectively.}
879:  \end{figure}
880: 
881: % Fig.3
882: 
883:  \begin{figure}[t]
884:   \caption{\small
885:    The 1-norm of the error matrix for fcc Al calculated by
886:    the (a) recursion, (b) divide, and (c) Hotelling's methods.
   In the recursion and Hotelling's methods, the 1-norms were
   calculated for four- and six-shell clusters as functions
   of the number of recursion levels and of iterations, respectively.
890:    }
891:  \end{figure}
892: 
893: % Fig.4
894: 
895:  \begin{figure}[t]
896:   \caption{\small
   The 1-norm of the error matrix for (a) diamond and (b) fcc Al
   against the computational time per atom for the three
   O($N$) inverting methods. The calculations were performed using
   a single processor of a Compaq ES40 workstation.}
901:  \end{figure}
902: 
903: \end{document}
904: 
905: