0106:cond-mat0106048/part1

1: %%%%%%%%%%%%%%%%%% file template.tex %%%%%%%%%%%%%%%%%%%%

2: %                                                       %

3: %    Copyright (c) Optical Society of America, 1992.    %

4: %                                                       %

5: %%%%%%%%%%%%%%%%%%% November 17, 1992 %%%%%%%%%%%%%%%%%%%

6: %

7: % THIS FILE IS A TEMPLATE TO PRODUCE AN ARTICLE SUBMISSION

8: % TO THE OSA JOURNALS, JOSA-A, JOSA-B, and APPLIED OPTICS.

9: %

10: % THIS TEMPLATE CONTAINS TYPESETTING COMMANDS WHICH BEGIN WITH A

11: % BACKSLASH.  THESE COMMANDS WILL BE READ BY LATEX, USING THE

12: % REVTEX 3.0 STANDARD MACROS.   PLEASE FILL IN THE REQUIRED DATA

13: % FOR THE MACROS, BUT DO NOT ALTER THE DEFINITIONS.

14: %

15: % EXAMPLE: IN \author{Authors' names} , PLEASE FILL IN THE

16: % AUTHORS' NAME(S).

17: %

18: % COMMENTS BEGIN WITH THE PERCENT (%) SYMBOL. AFTER A %, ANY

19: % DATA ON THE REST OF A LINE WILL NOT PRINT.

20: %

21: \documentstyle[aps,manuscript]{revtex}  % DON'T CHANGE

22: %

23: %

24: \newcommand{\MF}{{\large{\manual META}\-{\manual FONT}}}

25: \newcommand{\manual}{rm}        % Substitute rm (Roman) font.

26: \newcommand\bs{\char '134 }     % add backslash char to \tt font

27: %

28: %

29:

30: \makeatletter

31: \newcounter{subeqncnt}

32: \def\thesubeqncnt{\alph{subeqncnt}}

33: \def\subequations{\begingroup%

34:    \stepcounter{equation}\edef\@tempa{\theequation}%

35:    \let\c@equation\c@subeqncnt\c@subeqncnt\z@

36:    \edef\theequation{\@tempa\noexpand\thesubeqncnt}}

37: \let\endsubequations\endgroup

38: \makeatother

39:

40: \begin{document}                % INITIALIZE - DONT CHANGE

41: %

42: %

43: %

44: \title{Efficient Recursion Method for Inverting Overlap Matrix}

45: \author{T. Ozaki}

46: \address{

47:      RICS,

48:      National Institute of Advanced Industrial Science and Technology (AIST),

49:      central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan

50:      and

51:      JRCAT-ATP,

52:      central 4, 1-1-1 Higashi, Tsukuba,

53:      Ibaraki 305-0046, Japan

54: }

55: %

56: \maketitle

57: \begin{abstract}                % DON'T CHANGE THIS LINE

58:      A new O($N$) algorithm based on a recursion method, in which the

59:      computational effort is proportional to the number of atoms $N$,

60:      is presented for calculating the inverse of an overlap matrix which

61:      is needed in electronic structure calculations with the

62:      non-orthogonal localized basis set. This efficient inverting

63:      method can be incorporated in several O($N$) methods

64:      for diagonalization of a generalized secular equation.

65:      By studying convergence properties of the 1-norm of an error matrix

66:      for diamond and fcc Al, this method is compared to three other O($N$)

67:      methods (the divide method, Taylor expansion method, and Hotelling's

68:      method) with regard to computational accuracy and efficiency within

69:      the density functional theory.

70:      The test calculations show that the new method

71:      is about one-hundred times faster than the divide method in

72:      computational time to achieve the same convergence for both diamond

73:      and fcc Al, while the Taylor expansion method and Hotelling's method

74:      suffer from numerical instabilities in most cases.

75: \end{abstract}

76: %

77: \vspace{2cm}

78:

79:  The development of O($N$) methods

80:  \cite{Pettifor,Ozaki,Goedecker,Stephan,Yang,Galli,Mauri,Daw,Li,Palser}

81:  and the revival of localized

82:  orbitals as a basis set

83:  \cite{Sankey,Kobayashi,Kurita,Kobayashi2,Hierse,Hernandez,Ordejon,Sanchez,Horsfield}

84:  have been pursued over the last decade

85:  in order to extend the applicability of the first-principles molecular

86:  dynamics (FPMD) simulations using the plane wave expansion and the

87:  Car-Parrinello method within density functional

88:  theories (DFT) \cite{Payne}.

89:  However, only a few applications of these ${\rm O}(N)$ methods to large

90:  systems have been reported within the DFT calculations

91:  \cite{Sanchez,Bowler,Applications}.

92:  Methods based on the localized description have a number of

93:  limitations \cite{Applications}; one of them

94:  is that several O($N$) methods require evaluating the inverse

95:  of the overlap matrix $S$, which arises from the non-orthogonality of

96:  the localized orbitals.

97:

98:  In the generalized Fermi operator expansion (FOE) method \cite{Stephan}

99:  for a non-orthogonal basis, we need to calculate the inverse of the

100:  overlap matrix to construct the modified Hamiltonian $H'\equiv S^{-1}H$,

101:  although Stephan et al. have proposed solving a linear equation $SH'=H$

102:  with the cutoff radii of $H$ instead of calculating the inverse of the

103:  overlap matrix.

104:  In the density matrix (DM) method \cite{Daw,Li,Palser} which is

105:  a promising approach for materials with a wide gap, fortunately,

106:  the evaluation of the inverse is not required during the optimization

107:  of grand potentials, although we have to evaluate the inverse of the

108:  overlap matrix for a good initial guess of the density matrix \cite{Palser}.

109:  The block bond-order potential (BOP) method \cite{Ozaki}, which has good

110:  convergence properties for both insulators and metals, also

111:  requires the evaluation of the modified Hamiltonian $H'$ as in

112:  the FOE method. If the overlap matrix is sparse, the computational

113:  cost scales as the second power of the number of atoms $N$ in the

114:  inverse calculation. Therefore, an efficient O($N$) method

115:  for inverting the overlap matrix should be developed.

116:

117:  So far, several O($N$) inverting methods have been proposed.

118:  Gibson et al. used a simple method in which a linear equation

119:  $SH'=H$ constructed for a finite cluster is solved without

120:  explicit calculation of $S^{-1}$ \cite{Gibson}.

121:  Mauri et al. considered approximating the inverse of

122:  overlap matrix by the Taylor expansion \cite{Mauri}. The approach could be

123:  an O($N$) inverting method when the matrix elements in the $p$th

124:  moment $O^p$ of the overlap matrix $O$ are cut at a finite distance.

125:  Palser and Manolopoulos proposed to evaluate the inverse

126:  by Hotelling's method which is similar to the iterative

127:  purification algorithm of the DM method \cite{Palser}.

128:  The iterative calculation can be performed in O($N$) operations,

129:  provided that the cutoff of matrix elements at a finite distance is

130:  introduced in the product of two matrices.

131:  It is worth pointing out that the ideas of these

132:  O($N$) inverting methods are analogous to those of the

133:  O($N$) methods for the diagonalization.

134:  The divide method by Gibson et al. \cite{Gibson}, the Taylor expansion

135:  method \cite{Mauri}, and Hotelling's method \cite{Palser} strategically

136:  and mathematically correspond to the divide and conquer method \cite{Yang},

137:  the FOE method \cite{Goedecker,Stephan}, and the DM method

138:  \cite{Daw,Li,Palser}, respectively.

139:  Therefore, one may expect that these O($N$) inverting methods

140:  may have the convergence properties for realistic materials

141:  similar to the O($N$) methods for the diagonalization \cite{Comparison}.

142:  However, it remains to be seen whether the expectation is meaningful

143:  or not.

144:

145:  In this paper we propose a new O($N$) method for calculating

146:  the inverse of the overlap matrix which is based on a resolvent and

147:  the block Lanczos algorithm. The new method is compared

148:  with the other three methods in terms of the computational accuracy

149:  and efficiency. Thus, the aim of this paper is to clarify the

150:  applicability of these four O($N$) inverting methods for

151:  realistic materials.

152:  The paper is organized as follows. In Sec. II we present the theory

153:  of a new O($N$) inverting method based on a recursion method,

154:  and also summarize the three other O($N$) inverting methods.

155:  In Sec. III we discuss the convergence properties of these four

156:  O($N$) inverting methods for diamond and fcc Al within

157:  the DFT calculations using the 1-norm of an error matrix,

158:  which is related to the error in the eigenvalues in that section.

159:  In Sec. IV we conclude with clear characterization of the

160:  four O($N$) inverse methods.

161:

162:  \begin{center}

163:    {\bf II.~THEORY}

164:  \end{center}

165:

166:  \begin{center}

167:    {\bf A. Recursion method}

168:  \end{center}

169:

170:  It is assumed that one-particle wave functions are expanded

171:  by a localized orbital basis set $(\vert i\alpha\rangle)$, where

172:  $i$ is a site index and $\alpha$ is an orbital index.

173:  The localized orbitals could be Slater-type

174:  \cite{Kobayashi,Kurita,Kobayashi2}, Gaussian-type \cite{Hierse},

175:  and numerical orbitals \cite{Sankey,Hernandez} obtained by

176:  DFT calculations for atoms.

177:  In most cases, the orbitals are mutually non-orthogonal,

178:  leading to an overlap matrix $S$ defined by

179:  \begin{eqnarray}

180:    S_{i\alpha,j\beta} = \langle i\alpha \vert \hat{S}\vert j\beta \rangle,

181:  \end{eqnarray}

182:  where $\hat{S}$ is the overlap operator which is introduced as a matter

183:  of form in order to emphasize the similarity

184:  between the new inverting method and the block BOP method \cite{Ozaki},

185:  although the overlap operator generally should be the identity operator I.

186:  The overlap integral exponentially decays in real space

187:  because of the localized nature of the orbitals, so that

188:  the overlap matrix $S$ is sparse. Here we introduce a resolvent

189:  $R(Z)$ for the matrix $S$ as follows:

190:  \begin{eqnarray}

191:    R(Z) = (S-Z{\rm I})^{-1}.

192:  \end{eqnarray}

193:  It is then easy to verify that

194:  \begin{eqnarray}

195:    S^{-1} = {\rm Re}R(0).

196:  \end{eqnarray}

197:  Thus, we see that the real part of the resolvent for $Z=0$

198:  gives the inverse $S^{-1}$ of the overlap matrix.

199:  If the resolvent for $Z=0$ has a finite value for the imaginary part,

200:  the basis set is not linearly independent.

201:  The resolvent can be evaluated by adopting the algorithm of the

202:  block BOP method \cite{Ozaki}, which was recently developed to simulate

203:  orthogonal tight-binding (TB) models in O($N$) operations.

204:  It is noted that the new inverting method is derived just by replacing

205:  the Hamiltonian $\hat{H}$ in the block BOP method within the orthogonal

206:  TB models with the overlap operator $\hat{S}$.

207:  The first step in this algorithm is to block-tridiagonalize

208:  the overlap matrix $S$ using the block Lanczos algorithm

209:  \cite{Lanczos,Jones,Inoue,Haydock}.

210:  The central equation is

211:  \begin{eqnarray}

212:    \hat{S}\vert U_{n}) & = & \vert U_{n})\underline{A}_{n}

213:                        +

214:             \vert U_{n-1})\underline{B}_{n}

215:                        +

216:             \vert U_{n+1})\underline{B}_{n+1}

217:  \end{eqnarray}

218:  with

219:  \begin{eqnarray}

220:     \vert U_0) =

221:             (\vert i1\rangle,\vert i2\rangle,\dots,\vert iM_i\rangle )

222:  \end{eqnarray}

223:  as the starting state. $\underline{A}_n$ and $\underline{B}_n$ are

224:  recursion block coefficients of size $M_{i}\times M_{i}$,

225:  where $M_{i}$ is the number of localized orbitals on the starting

226:  atom $i$, and the underline indicates that the element is a block.

227:  In the block Lanczos algorithm, we need to start the recursion with

228:  Eq.~(5) to make the recursion method accurate and efficient \cite{Ozaki}.

229:  The Lanczos algorithm with a finite recursion transforms the overlap

230:  matrix $S$ into the block-tridiagonalized matrix $S^L$ which has

231:  the diagonal $A_{n}$ and the sub-diagonal block elements $B_{n}$,

232:  where the index $L$ indicates the representation based on the Lanczos

233:  basis. Considering the resolvent $R^{L}(Z)\equiv (S^{L}-Z{\rm I})^{-1}$

234:  for the block-tridiagonalized overlap matrix,

235:  the diagonal $\underline{R}^L_{00}(Z)$ and off-diagonal block elements

236:  $\underline{R}^L_{0n}(Z)$ can be easily derived along the same line

237:  as that described in the block BOP method \cite{Ozaki}.

238:  For $Z=0$, the elements are given by

239:  \begin{eqnarray}

240:    \underline{R}^L_{00}(0)

241:         =[\underline{A}_0-\hspace{0.4mm}^t\hspace{-0.4mm}\underline{B}_1[

242:           \underline{A}_1-\hspace{0.4mm}^t\hspace{-0.4mm}\underline{B}_2[

243:                        \cdots

244:           ]^{-1}\underline{B}_2

245:           ]^{-1}\underline{B}_1

246:           ]^{-1},

247:  \end{eqnarray}

248:  \begin{eqnarray}

249:     \nonumber

250:     \lefteqn{

251:       \underline{R}^{L}_{0n}(0)

252:       =

253:     \biggl(

254:       \delta_{1n}\underline{\rm I}

255:       -\underline{R}^{L}_{0n-1}(0)\underline{A}_{n-1}

256:     }\\

257:     &&

258:     \quad\quad\quad\quad

259:        -\underline{R}^{L}_{0n-2}(0)

260:          \hspace{0.4mm}^t\hspace{-0.4mm}\underline{B}_{n-1}

261:     \biggr)

262:          (\underline{B}_{n})^{-1},

263:  \end{eqnarray}

264:  where $\delta$ is Kronecker's delta, and

265:  $R_{0-1}(0)=\hspace{0.4mm}^t\hspace{-0.4mm}B_{0}=0$.

266:  Once the block diagonal element is calculated as the multiple

267:  inverse Eq.~(6), the off-diagonal elements are evaluated

268:  from the recurrence relation Eq.~(7) with $\underline{R}^L_{00}(0)$

269:  as the starting element. In order to truncate the multiple inverse

270:  in Eq.~(6) without reducing the accuracy significantly, a square root

271:  terminator could be used, while there could

272:  be an infinite number of levels in the multiple inverse of diagonal

273:  Green's function for an infinite system.

274:  In the test calculations of Sec.~III we used the square root

275:  terminator for the truncation at a finite number of levels.
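 As an illustrative special case (a sketch added for clarity, assuming
 the recursion is truncated after the first level and no terminator is
 applied), the block-tridiagonalized matrix $S^L$ has only two block rows,
 and Eq.~(6) reduces to
 \[
   \underline{R}^L_{00}(0)
     = [\underline{A}_0
        -\hspace{0.4mm}^t\hspace{-0.4mm}\underline{B}_1
         \underline{A}_1^{-1}\underline{B}_1]^{-1},
 \]
 which is the familiar Schur-complement expression for the leading block
 of the inverse of a $2\times 2$ block matrix. Equation (7) with $n=1$
 then gives
 \[
   \underline{R}^L_{01}(0)
     = \bigl(\underline{\rm I}
        -\underline{R}^L_{00}(0)\underline{A}_0\bigr)
       (\underline{B}_1)^{-1},
 \]
 which is just the $(0,0)$ block of the identity
 $R^L(0)S^L={\rm I}$, namely
 $\underline{R}^L_{00}(0)\underline{A}_0
  +\underline{R}^L_{01}(0)\underline{B}_1=\underline{\rm I}$,
 solved for $\underline{R}^L_{01}(0)$.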

276:  The two Eqs.~(6) and (7) provide the resolvent based on the Lanczos

277:  basis representation, so that we can obtain the original resolvent

278:  through the following inverse transformation:

279:  \begin{eqnarray}

280:     \underline{R}_{ij}(0) = \sum_{n}

281:                  \underline{R}^L_{0n}(0)

282:                  \hspace{0.4mm}^t\hspace{-0.4mm}\underline{U}_{nj},

283:  \end{eqnarray}

284:  where $\hspace{0.4mm}^t\hspace{-0.4mm}\underline{U}_{nj}$ is defined by

285:  $\hspace{0.4mm}^t\hspace{-0.4mm}\underline{U}_{nj} = (U_{n}\vert

286:         (\vert j1\rangle,\vert j2\rangle,\dots,\vert jM_j\rangle ).$

287:  The inverse transformation Eq.~(8) is significantly simplified

288:  because of the orthogonality in the Lanczos bases. Therefore, we only

289:  have to evaluate the 0th block line of the resolvent in the Lanczos

290:  basis representation.

291:  The resolvent exactly satisfies a sum rule $\sum_{ij}

292:   {\rm tr\left\{\underline{S}_{ij}\underline{R}_{ji}(0)\right\}}

293:   = N_{B}$ which is derived from Eq.~(2), where $N_{B}$ is the

294:  number of bases, and is constructed from moments up to the $(q+1)$th,

295:  $S^{q+1}$ \cite{Ozaki}, where $q$ is the final level of the recursion.

296:  Equation (8) gives a good approximation for the inverse of

297:  overlap matrix as the number of recursion levels increases.

298:  However, the approximated inverse is not strictly

299:  a symmetric matrix at a finite recursion.

300:  If the approximated inverse is symmetric, eigenvalues of

301:  a generalized secular equation with the overlap matrix

302:  are real numbers. Therefore, we evaluate the inverse of

303:  overlap matrix by symmetrizing the resolvent with a

304:  simple arithmetic average:

305:  \begin{eqnarray}

306:     \underline{S}^{-1}_{ij} =

307:     \frac{{\rm Re}\underline{R}_{ij}(0)

308:         + {\rm Re}\hspace{0.4mm}^t\hspace{-0.4mm}\underline{R}_{ji}(0)}

309:          {2}.

310:  \end{eqnarray}

311:  The symmetrization preserves the above sum rule.
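 A short sketch of why this holds, written in full-matrix rather than
 block notation and assuming only that $S$ is symmetric, that $R\equiv R(0)$
 is real, and that $R$ itself satisfies the sum rule:
 \begin{eqnarray}
    \nonumber
    {\rm tr}\{S\,{}^tR\}
      & = & {\rm tr}\{{}^t(R\,{}^tS)\}
        = {\rm tr}\{RS\}
        = {\rm tr}\{SR\},\\
    \nonumber
    {\rm tr}\Bigl\{S\,\frac{R+{}^tR}{2}\Bigr\}
      & = & \frac{{\rm tr}\{SR\}+{\rm tr}\{S\,{}^tR\}}{2}
        = {\rm tr}\{SR\} = N_{B}.
 \end{eqnarray}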

312:  All elements of the inverse are evaluated by applying this

313:  sequence of steps repeatedly to each atom.

314:  The cluster over which the hops are made in the Lanczos algorithm is

315:  determined by the logical truncation method \cite{Ozaki}.

316:  Thus, the computational cost of the recursion method is strictly

317:  proportional to the number of atoms $N$.

318:

319:  \begin{center}

320:    {\bf B. Divide method}

321:  \end{center}

322:

323:  In the case of the block BOP \cite{Ozaki} and FOE methods

324:  \cite{Goedecker,Stephan}, it is required to evaluate

325:  the modified Hamiltonian $H'=S^{-1}H$ rather than the inverse of

326:  overlap matrix. In such cases we have an alternative way

327:  where a linear equation

328:  \begin{eqnarray}

329:    SH'=H

330:  \end{eqnarray}

331:  is solved instead of calculating the inverse.

332:  In conventional ways of solving the linear equation for a total system,

333:  the computational cost scales as the third

334:  power of the number of atoms $N$, while the scaling could be

335:  reduced to ${\rm O}(N^2)$, making use of the sparseness

336:  of the overlap matrix. Therefore, Gibson et al. have proposed

337:  a solution of Eq.~(10) with the cutoff radii of $H$ and $S$ \cite{Gibson}.

338:  The linear equation Eq.~(10) can be decomposed into $N$ subspace

339:  linear equations for $N$ finite clusters under this constraint.

340:  One solves each of the subspace linear equations for the finite clusters

341:  centered on atom $i$ using a conventional method such as the

342:  Cholesky factorization, which results in O($N$) operations

343:  for the computational effort.

344:  However, the divide method involves redundant calculation:

345:  one has to evaluate all matrix elements of the modified

346:  Hamiltonian $H'$ for each finite cluster, whereas in the other

347:  ${\rm O}(N)$ inverting methods the elements of the inverse

348:  of the overlap matrix are not calculated more than once.

349:  Thus, the prefactor of the ${\rm O}(N)$ operations could be

350:  very large for highly coordinated structures such as fcc.

351:  The magnitude of the prefactor will be discussed in Sec.~III.

352:  An iterative scheme such as the Gauss-Seidel method \cite{Jones,Foulkes}

353:  which is commonly used for large-scale systems is also available for

354:  solving the linear equation Eq.~(10). However, it has been

355:  recognized that the iterative scheme is computationally expensive

356:  \cite{Gibson}, so that the iterative scheme was

357:  not investigated in this study.

358:  We used the logical truncation method to construct the subspace

359:  linear equations, as in the recursion method, in the test calculations

360:  discussed in Sec.~III in order to compare the computational performance.

361:

362:  \begin{center}

363:    {\bf C. Taylor expansion method}

364:  \end{center}

365:

366:  Mauri et al. have proposed to approximate the inverse of the overlap

367:  matrix using the Taylor expansion in their ${\rm O}(N)$ unconstrained

368:  minimization method \cite{Mauri}. The overlap matrix $S$ is expressed as a

369:  sum of the identity ${\rm I}$ and an $O$-matrix $O$ which is the overlap

370:  matrix between the different orbitals:

371:  \begin{eqnarray}

372:     S = {\rm I} + O,

373:  \end{eqnarray}

374:  then we can expand the inverse of $S$ with respect to the $O$-matrix

375:  as follows:

376:  \begin{eqnarray}

377:     \nonumber

378:     S^{-1} & = & \sum_{n=0}^{\infty}(-1)^n O^n\\

379:            & = & {\rm I} - O + O^2 - O^3 +  \dots

380:  \end{eqnarray}

381:  The computational accuracy and efficiency of the approximation

382:  by the Taylor series depend on the convergence for the summation

383:  of Eq.~(12). The summation in Eq.~(12) does not converge, but

384:  diverges, when the spectral radius of the $O$-matrix exceeds 1.0.

385:  Even if the $O$-matrix has no eigenvalues at or below $-1.0$,

386:  indicating that the basis set is linearly independent, the largest eigenvalues

387:  of the $O$-matrix exceed 1.0 in most cases, as shown in Sec.~III.

388:  In such cases, the Taylor expansion method cannot be applied.

389:  The matrix $O^n$ is calculated as the product of the untruncated

390:  but highly sparse $O$-matrix and $O^{n-1}$ with the cutoff

391:  radii for the elements, so that the summation to a finite order

392:  in Eq.~(12) can be performed with ${\rm O}(N)$ operations.
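 To make the convergence condition concrete, consider a single
 eigenvalue $o$ of the $O$-matrix (a scalar illustration; the numerical
 values below are chosen only as examples and are not results of this work).
 In the eigenbasis of $O$, Eq.~(12) reduces to the geometric series
 \[
    \frac{1}{1+o} = \sum_{n=0}^{\infty}(-o)^n,
    \qquad \vert o\vert < 1.
 \]
 For $o=0.5$ the partial sums are $1,\ 0.5,\ 0.75,\ 0.625,\dots$,
 approaching $1/1.5\simeq 0.667$. For $o=1.5$ the partial sums are
 $1,\ -0.5,\ 1.75,\ -1.625,\dots$, which grow without bound even though
 the exact value $1/(1+o)=0.4$ is finite. A single eigenvalue of
 magnitude larger than 1.0 is therefore enough to make the matrix series diverge.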

393:

394:  \begin{center}

395:    {\bf D. Hotelling's method}

396:  \end{center}

397:

398:  Palser and Manolopoulos \cite{Palser} have suggested evaluating

399:  the inverse $S^{-1}$ using Hotelling's method \cite{Recipes,Pan}.

400:  The method has an iterative algorithm very similar to the purification

401:  algorithm \cite{Palser} in the DM method.

402:  The convergence rate in Hotelling's method is also quadratic

403:  as with the DM method.

404:  The purification of an approximate inverse is achieved using the

405:  following iterative relation:

406:  \begin{eqnarray}

407:     S^{-1}_{n+1} = 2 S^{-1}_{n} - S^{-1}_{n}SS^{-1}_{n}.

408:  \end{eqnarray}

409:  In case of $S^{-1}_0 = {\rm I}$, Hotelling's method is equivalent to

410:  the Taylor expansion method to a finite order described in the previous

411:  subsection (C). It is easy to verify that $S^{-1}_1$ and $S^{-1}_2$ are

412:  the Taylor series to the first and third orders of the $O$-matrix,

413:  respectively:

414:  \begin{eqnarray}

415:     \nonumber

416:     S^{-1}_{1} & = & 2 S^{-1}_{0} - S^{-1}_{0}SS^{-1}_{0}\\

417:                & = & {\rm I} - O,

418:  \end{eqnarray}

419:  \begin{eqnarray}

420:     \nonumber

421:     S^{-1}_{2} & = & 2 S^{-1}_{1} - S^{-1}_{1}SS^{-1}_{1}\\

422:                & = & {\rm I} - O + O^2 - O^3.

423:  \end{eqnarray}

424:  From Eqs.~(14) and (15), we see that Hotelling's method converges

425:  quadratically compared to the linear convergence of Taylor

426:  expansion method. Thus, if Eq.~(12) is a convergent series,

427:  Hotelling's method should be more efficient than

428:  the Taylor expansion method.
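 The quadratic convergence can be made explicit by a short calculation
 (a sketch; the residual matrix $D_n$ is a symbol introduced here only for
 illustration, and truncation of matrix elements is ignored).
 Defining $D_n\equiv {\rm I}-SS^{-1}_n$, Eq.~(13) gives
 \begin{eqnarray}
    \nonumber
    D_{n+1} & = & {\rm I} - 2SS^{-1}_n + (SS^{-1}_n)^2\\
    \nonumber
            & = & ({\rm I}-SS^{-1}_n)^2 = D_n^{\,2},
 \end{eqnarray}
 so that $D_n = D_0^{\,2^n}$. For $S^{-1}_0={\rm I}$ one has $D_0=-O$ and
 \[
    S^{-1}_n = \sum_{k=0}^{2^n-1}(-1)^k O^k,
 \]
 which reproduces Eqs.~(14) and (15) for $n=1$ and $n=2$. The residual
 vanishes only if all eigenvalues of $D_0$ are smaller than 1.0 in magnitude.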

429:  When the spectral radius of the $O$-matrix exceeds 1.0,

430:  the identity ${\rm I}$ cannot be used as the initial guess for

431:  the inverse $S^{-1}$. In such cases, although it is very difficult

432:  to estimate a good initial matrix $S_{0}^{-1}$ for the iteration Eq.~(13),

433:  in this study, we use the overlap matrix $S$ scaled by a small prefactor $\sigma$

434:  derived by Pan and Reif \cite{Pan} as the initial guess:

435:  \begin{eqnarray}

436:    S^{-1}_{0} = \sigma S

437:  \end{eqnarray}

438:  with

439:  \begin{eqnarray}

440:     \sigma = \frac{1}

441:             {\left(

442:              \displaystyle{\max_{i\alpha}}

443:              \displaystyle{\sum_{j\beta}}\vert S_{i\alpha,j\beta}\vert

444:              \right)^2}.

445:  \end{eqnarray}
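 A brief sketch of why this choice is safe in exact arithmetic (assuming
 that $S$ is symmetric and positive definite and that no truncation of
 matrix elements is applied): the quantity in parentheses in Eq.~(17) is the
 maximum absolute row sum of $S$, which is an upper bound on the magnitude
 of every eigenvalue $s$ of $S$. With $S^{-1}_0=\sigma S$, the initial
 residual ${\rm I}-SS^{-1}_0={\rm I}-\sigma S^2$ has the eigenvalues
 \[
    1-\sigma s^2, \qquad 0 \leq 1-\sigma s^2 < 1,
 \]
 so that the residual is squared toward zero by the iteration Eq.~(13).
 Small eigenvalues of $S$ give initial residuals close to 1.0, for which
 many iterations are needed before the quadratic convergence sets in.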

446:  It is noted that Hotelling's method possesses an advantage

447:  that the inverse at the previous MD step could be a good guess

448:  of $S_{0}^{-1}$ at the current MD step, while no information

449:  from the previous MD step can be exploited in the other methods:

450:  the recursion method, the divide method, and the Taylor expansion method.

451:  In the iteration Eq.~(13), the elements of $S_{n}^{-1}$ are cut

452:  at a finite distance. As a result of this truncation, the computational

453:  effort of Hotelling's method scales linearly with the system size.

454:  In test calculations of Sec.~III, we used the logical truncation

455:  method for the cutoff of the elements as in the other inverting

456:  O($N$) methods.

457:

458:  \begin{center}

459:    {\bf III.~CONVERGENCE PROPERTIES}

460:  \end{center}

461:

462:  \begin{center}

463:    {\bf A. Error analysis}

464:  \end{center}

465:

466:  In order to compare the four ${\rm O}(N)$ inverse methods presented

467:  in Sec.~II in terms of computational accuracy and efficiency,

468:  we first relate the 1-norm of an error matrix $E$ with the error of

469:  eigenvalues $\epsilon_{\nu}$ of a secular equation by using

470:  an error analysis theory \cite{Golub,Chatelin}.

471:  The generalized secular equation with the overlap matrix $S$ is derived

472:  from the variational principle within DFT using a non-orthogonal basis set:

473:  \begin{eqnarray}

474:    S^{-1}HC_{\nu} = \epsilon_{\nu} C_{\nu},

475:  \end{eqnarray}

476:  where $H_{i\alpha,j\beta}

477:         \equiv \langle i\alpha \vert\hat{H}\vert j\beta\rangle$

478:  and

479:  $C_{i\alpha,\nu}$ is an expansion coefficient

480:  $C_{i\alpha,\nu}\equiv \langle i\alpha\vert \phi_{\nu}\rangle$

481:  in a one-particle wave function $\vert \phi_{\nu}\rangle$.

482:  Let us consider replacing the exact inverse $S^{-1}$ with

483:  an approximate inverse $S'^{-1}$ in Eq.~(18); the difference

484:  between $S^{-1}$ and $S'^{-1}$ is

485:  \begin{eqnarray}

486:    S'^{-1} - S^{-1} = \Delta S^{-1}.

487:  \end{eqnarray}

488:  For the approximate inverse $S'^{-1}$

489:  the secular equation $S'^{-1}HC'_{\nu} = \epsilon'_{\nu} C'_{\nu}$

490:  is satisfied with approximate eigenvalues $\epsilon'_{\nu}$ and

491:  eigenvectors $C'_{\nu}$.

492:  According to the error analysis theory \cite{Golub,Chatelin},

493:  the difference between the exact and the approximate eigenvalues

494:  is given by

495:  \begin{eqnarray}

496:    \vert\epsilon'_{\nu} - \epsilon_{\nu}\vert = {\rm O}(\lambda)

497:  \end{eqnarray}

498:  with $\lambda$, which is the 1-norm of a matrix $\Delta S^{-1}H$,

499:  defined by

500:  \begin{eqnarray}

501:    \lambda  =

502:             \max_{j\beta}\sum_{i\alpha}

503:             \left\vert \sum_{k\gamma}

504:             \Delta S^{-1}_{i\alpha,k\gamma}H_{k\gamma,j\beta}

505:             \right\vert.

506:   \end{eqnarray}

507:  Therefore, we see that the error in the eigenvalue is of the order of

508:  the 1-norm of $\Delta S^{-1}H$ for the approximate inverse of the overlap

509:  matrix. Equation (20) thus connects the error of the inverse overlap

510:  matrix to that of the eigenvalue. However, it is not possible

511:  to calculate the exact inverse for infinite or periodic systems,

512:  so that we introduce an error matrix $E$, which is easily evaluated,

513:  defined as the difference between a matrix $SS'^{-1}H$ and the original

514:  Hamiltonian $H$:

515:  \begin{eqnarray}

516:     \nonumber

517:     E & \equiv &

518:          SS'^{-1}H - H\\

519:       & = &  S\Delta S^{-1}H.

520:  \end{eqnarray}
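 The second equality in Eq.~(22) follows in one step from Eq.~(19);
 it is spelled out here for completeness:
 \begin{eqnarray}
    \nonumber
    SS'^{-1}H - H
      & = & S(S^{-1}+\Delta S^{-1})H - H\\
    \nonumber
      & = & H + S\Delta S^{-1}H - H
        = S\Delta S^{-1}H.
 \end{eqnarray}
 Note that the left-hand side involves only $S$, $S'^{-1}$, and $H$, so
 that $E$ can be evaluated without knowing the exact inverse.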

521:  The 1-norm $\eta$ of the error matrix $E$ can be related to

522:  the 1-norm $\lambda$ of the matrix $\Delta S^{-1}H$ as follows:

523:   \begin{eqnarray}

524:      \nonumber

525:      \eta & = & \max_{j\beta}\sum_{k'\gamma'}

526:                 \left\vert \sum_{i\alpha}\sum_{k\gamma}

527:                 S_{k'\gamma',i\alpha}\Delta S^{-1}_{i\alpha,k\gamma}

528:                 H_{k\gamma,j\beta}

529:                 \right\vert\\

530:      \nonumber

531:           & \leq &

532:                 \max_{j\beta}\sum_{k'\gamma'}

533:                 \sum_{i\alpha}

534:                 \vert S_{k'\gamma',i\alpha} \vert

535:                 \left\vert

536:                 \sum_{k\gamma}

537:                 \Delta S^{-1}_{i\alpha,k\gamma}

538:                 H_{k\gamma,j\beta}

539:                 \right\vert\\

540:      \nonumber

541:           & \leq &

542:                 N_{av}

543:                 \left(

544:                 \max_{j\beta}

545:                 \sum_{i\alpha}

546:                 \left\vert

547:                 \sum_{k\gamma}

548:                 \Delta S^{-1}_{i\alpha,k\gamma}

549:                 H_{k\gamma,j\beta}

550:                 \right\vert

551:                 \right)\\

552:           & = &

553:                 N_{av}\lambda,

554:   \end{eqnarray}

555:  where $N_{av}$ is the average number of the non-zero elements

556:  in the overlap matrix for an orbital $\vert i\alpha \rangle$.

557:  The third relation in Eq.~(23) is derived by replacing the non-zero

558:  overlap integrals $\vert S_{k'\gamma',i\alpha} \vert$ by 1 with

559:  the variables $i\alpha$ fixed in the second relation.

560:  Considering Eqs.~(21) and (23), we can relate the 1-norm of the

561:  error matrix to the error of the eigenvalue:

562:  \begin{eqnarray}

563:    \vert\epsilon'_{\nu} - \epsilon_{\nu}\vert = {\rm O}(\eta).

564:  \end{eqnarray}

565:  Therefore, we will compare the four O($N$) inverse methods

566:  using the 1-norm $\eta$, which is easily evaluated, instead of

567:  $\lambda$.

568:

569:  \begin{center}

570:    {\bf B. Numerical tests}

571:  \end{center}

572:

573:  We numerically studied convergence properties of the four inverse

574:  ${\rm O}(N)$ methods using the 1-norm $\eta$ for diamond and fcc

575:  Al within the DFT scheme proposed by Sankey and Niklewski \cite{Sankey}.

576:  In these DFT calculations we used numerical localized orbitals,

577:  fireball bases by Sankey and Niklewski \cite{Sankey},

578:  as a minimal basis set for valence electrons.

579:  The radii of the radial-wave function confinement are 2.1 and

580:  3.7~\AA~for carbon and aluminum atoms, respectively.

581:  The minimal basis sets give 1.253 (1.244)

582:  and 2.515 (2.466)~\AA~as the equilibrium bond lengths of the dimers of

583:  carbon and aluminum, respectively, where the values in the

584:  parentheses are experimental results.

585:

586:  In Fig.~1 we show the density of states for the eigenvalues of the

587:  $O$-matrix, which is defined by Eq.~(11), in diamond and fcc Al.

588:  In both cases the $O$-matrices have no eigenvalues smaller than

589:  -1.0, so that the basis sets are linearly independent for

590:  the structures. However, the density of states possesses finite

591:  values for eigenvalues larger than or equal to 1.0 in both cases.

592:  In other words, the spectral radius of the $O$-matrix exceeds 1.0.

593:  This means that the summation in Eq.~(12) for the Taylor expansion

594:  method diverges for diamond and fcc Al.

595:  In addition to the above cases, we confirmed that the spectral

596:  radii of the $O$-matrix also exceed 1.0 for graphite and

597:  poly(ethylene), so that the applicability of the Taylor expansion

598:  method is severely restricted. Therefore, we do not provide the

599:  convergence properties of the Taylor expansion method in this paper.

600:

601:  Figure 2 shows the convergence properties of the 1-norm $\eta$ of

602:  the error matrix for diamond calculated by the recursion, divide,

603:  and Hotelling's methods.

604:  In the recursion method the 1-norm exponentially decays for each

605:  shell cluster as a function of the number of recursion levels,

606:  and finally converges to the value of the 1-norm calculated by

607:  the divide method for the corresponding cluster.

608:  In the divide method the 1-norm almost exponentially diminishes

609:  as a function of the number of shells.

610:  For the seven-shell cluster the 1-norm is only $3.1\times 10^{-5}$~eV.

611:  The identity matrix ${\rm I}$ cannot be used as an initial guess

612:  $S_{0}^{-1}$ in Hotelling's method because the spectral

613:  radius of the $O$-matrix exceeds 1.0. Thus, we gave the initial guess

614:  $S_{0}^{-1}$ by Eq.~(16), where $\sigma$ is 0.021 for diamond.

615:  In Hotelling's method the convergence is not

616:  monotonic, in contrast to the other two methods.

617:  For three-, five-, and seven-shell clusters, the 1-norm is

618:  gradually reduced for a small number of iterations.

619:  However, the 1-norm increases after reaching the minimum,

620:  and finally we encounter a numerical instability in which the 1-norm diverges

621:  as the iteration proceeds.

622:  The smallest 1-norm for each shell-cluster is slightly larger than

623:  that calculated by the divide method for the same cluster.

624:  Therefore, we see that Hotelling's method cannot reach the perfect

625:  convergence for diamond due to the numerical instability.

 For Hotelling's method we also examined the convergence properties
 of the 1-norm $\eta$ for carbon in the diamond structure
 with a lattice constant of 3.9~\AA, for which the spectral radius
 of the $O$-matrix is smaller than 1.0, although the result is not
 shown in this paper. In this system the 1-norm converges very quickly to
 the corresponding value calculated by the divide method for the
 same cluster. Thus, we find empirically that Hotelling's method
 gives convergent results for systems with spectral radii smaller
 than 1.0.
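
 The role of the spectral radius in Hotelling's method can be seen
 from a short worked relation.
 As a sketch only, we assume the standard form of the iteration and
 write $X_n$ for the $n$th approximation to $S^{-1}$ (so that $X_0$
 plays the role of $S_{0}^{-1}$) and $R_n$ for the residual; both
 symbols are introduced purely for this illustration:
 \[
   X_{n+1} = X_n \left( 2\,{\rm I} - S X_n \right), \qquad
   R_n \equiv {\rm I} - S X_n .
 \]
 Substituting the first relation into the second gives
 \[
   R_{n+1} = {\rm I} - 2 S X_n + (S X_n)^{2} = R_n^{2},
   \qquad \mbox{so that} \qquad R_n = R_0^{\,2^{n}} .
 \]
 In exact arithmetic the iteration therefore converges, and does so
 quadratically, if and only if the spectral radius of $R_0$ is smaller
 than 1.0.
 For $X_0={\rm I}$ the residual is $R_0={\rm I}-S=-O$, so the condition
 reduces to the spectral radius of the $O$-matrix being smaller than 1.0.
 This argument applies to the exact, untruncated iteration and does not
 by itself account for the numerical instability observed for the clusters.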

635:

636:  As with Fig.~2, the convergence properties of the 1-norm are shown

637:  in Fig.~3 for fcc Al.

 The magnitude of the 1-norm is one to two orders of magnitude larger
 than that of diamond, while the behavior of the 1-norm is very similar to
 that of diamond.
 In the recursion method the converged values of the 1-norm are
 consistent with those of the divide method for the four- and six-shell
 clusters. In Hotelling's method we used Eq.~(16) with $\sigma=0.0098$
 as $S_{0}^{-1}$, since the spectral radius of the $O$-matrix exceeds 1.0
 for fcc Al. As for diamond, the 1-norms for the four- and six-shell
 clusters finally diverge without achieving full convergence.
 Although we tested the convergence properties using several values
 of $\sigma$ for both diamond and fcc Al, we could neither obtain
 converged results nor avoid the numerical instability.

650:

651:  Figures 4(a) and 4(b) show the relation between the magnitude of

652:  the 1-norm $\eta$ of the error matrix and the computational time per

653:  atom to evaluate the inverse of the overlap matrix for

654:  diamond and fcc Al, respectively.

 The comparison clearly indicates that the computational efficiency
 increases in the order divide $<$ Hotelling's $<$
 recursion for both diamond and fcc Al.
 To achieve the same convergence, the recursion method is about
 one hundred times faster than the divide method for diamond
 and fcc Al.

661:

662:  \begin{center}

663:    {\bf IV.~CONCLUSIONS}

664:  \end{center}

665:

666:  We presented a new O($N$) algorithm for calculating the inverse of the

667:  overlap matrix $S$. It is based on the recursion method with the block

668:  Lanczos algorithm. The problem of evaluating $S^{-1}$ is mapped to the

669:  block BOP method for an orthogonal TB model just by replacing the

670:  Hamiltonian with the overlap operator.

 In addition, we briefly described the other known methods
 for calculating the inverse in ${\rm O}(N)$ operations:
 the divide, the Taylor expansion, and Hotelling's methods.
 We examined the computational accuracy and efficiency
 of these ${\rm O}(N)$ inverting methods using the 1-norm of the
 error matrix for diamond and fcc Al in DFT calculations with
 a minimal basis set for valence electrons.
 The spectral radius of the $O$-matrix given by $(S-{\rm I})$
 exceeds 1.0 for many real materials in DFT calculations
 based on localized bases, which means that the applicability
 of the Taylor expansion method is significantly restricted.

 In the recursion method the 1-norm of the error matrix converges
 exponentially and stably to the value calculated by the divide method
 for the same cluster in both diamond and fcc Al.
 On the other hand, Hotelling's method cannot reach
 converged results in either case because of the numerical instability.
 The comparison of computational time shows that the recursion
 method is the most efficient among the O($N$)
 inverting methods compared for diamond and fcc Al. The recursion method is
 about one hundred times faster than the divide method.
 Thus, the new method for evaluating the inverse is
 a practical algorithm and can be incorporated
 into several O($N$) methods for total energy calculations using
 localized orbital basis sets.

695:

696:  \begin{center}

   {\bf ACKNOWLEDGMENTS}

698:  \end{center}

699:

 We would like to thank Y. Morikawa and H. Kino for helpful suggestions
 about the DFT calculations, and D. R. Bowler for useful suggestions about
 ${\rm O}(N)$ inverting methods.
 Part of the computation in this work was done using the computational
 facilities of the Japan Advanced Institute of Science and Technology (JAIST).

706:

707: %


712:

713:  \begin{references}

714:

715:   % O(N) methods

716:

717:   \bibitem{Pettifor}

718:   D. G. Pettifor, Phys. Rev. Lett. {\bf 63}, 2480 (1989);

719:   M. Aoki, Phys. Rev. Lett. {\bf 71}, 3842 (1993);

720:   A. P. Horsfield, A. M. Bratkovsky,

721:   D. G. Pettifor, and M. Aoki,

722:   Phys. Rev. B {\bf 53},

723:   1656 (1996);

724:   A. P. Horsfield, A. M. Bratkovsky,

725:   M. Fearn, D. G. Pettifor, and M. Aoki,

726:   Phys. Rev. B {\bf 53},

  12694 (1996).

728:

729:   \bibitem{Ozaki}

730:   T. Ozaki, Phys. Rev. B {\bf 59}, 16061 (1999);

731:   T. Ozaki, M. Aoki, and D. G. Pettifor, Phys. Rev. B {\bf 61}, 7972 (2000);

732:   T. Ozaki and K. Terakura, submitted to Phys. Rev. Lett.

733:

734:   \bibitem{Goedecker}S. Goedecker and L. Colombo,

735:   Phys. Rev. Lett. {\bf 73}, 122 (1994).

736:

737:   \bibitem{Stephan}U. Stephan and D. A. Drabold,

738:   Phys. Rev. B {\bf 57}, 6391 (1998).

739:

740:   \bibitem{Yang}W. T. Yang, Phys. Rev. Lett. {\bf 66},

741:   1438 (1991).

742:

743:   \bibitem{Galli}

744:   G. Galli and M. Parrinello, Phys. Rev. Lett. {\bf 69}, 3547 (1992).

745:

746:   \bibitem{Mauri}

747:   F. Mauri, G. Galli, and R. Car, Phys. Rev. B {\bf 47}, 9973 (1993);

748:   F. Mauri and G. Galli, Phys. Rev. B {\bf 50}, 4316 (1994).

749:

750:   \bibitem{Daw}

751:   M. S. Daw, Phys. Rev. B {\bf 47}, 10895 (1993).

752:

753:   \bibitem{Li}

754:   X.-P. Li, R. W. Nunes, and D. Vanderbilt,

755:   Phys. Rev. B {\bf 47}, 10891 (1993);

756:   R. Nunes and D. Vanderbilt,

757:   Phys. Rev. B {\bf 50}, 17611 (1994).

758:

759:   \bibitem{Palser}

760:   A. H. R. Palser and D. Manolopoulos,

761:   Phys. Rev. B {\bf 58}, 12704 (1998).

762:

763:   % ab initio tight-binding

764:

765:   \bibitem{Sankey} O. F. Sankey and D. J. Niklewski,

766:   Phys. Rev. B {\bf 40}, 3979 (1989).

767:

  \bibitem{Kobayashi} K. Kobayashi, N. Kurita, H. Kumahora, and K. Tago,
  Phys. Rev. B {\bf 45}, 11299 (1992).

  \bibitem{Kurita} N. Kurita and K. Kobayashi,
  Comp. and Chem. {\bf 24}, 351 (2000), and references therein.

  \bibitem{Kobayashi2} K. Kobayashi, K. Tago, and N. Kurita,
  Phys. Rev. A {\bf 53}, 1903 (1996).

776:

777:   \bibitem{Hierse} W. Hierse and E. B. Stechel,

778:   Phys. Rev. B {\bf 50}, 17811 (1994).

779:

780:   \bibitem{Hernandez} E. Hernandez and M. Gillan,

781:   Phys. Rev. B {\bf 51}, 10157 (1995).

782:

783:   \bibitem{Ordejon} P. Ordejon, E. Artacho, and J. M. Soler,

784:   Phys. Rev. B {\bf 53}, R10441 (1996).

785:

786:   \bibitem{Sanchez} D. Sanchez-Portal, P. Ordejon, E. Artacho, and J. M. Soler,

787:   Int. J. Quant. Chem. {\bf 65}, 453 (1997).

788:

  \bibitem{Horsfield} A. P. Horsfield,
  Phys. Rev. B {\bf 56}, 6594 (1997).

  % Car-Parrinello

  \bibitem{Payne} M. C. Payne, M. P. Teter, D. C. Allan, T. A. Arias,
  and J. D. Joannopoulos, Rev. Mod. Phys. {\bf 64}, 1045 (1992).

796:

797:   % Applications

798:

799:   \bibitem{Bowler} D. R. Bowler and M. J. Gillan,

800:   Mol. Simulat. {\bf 25}, 239 (2000).

801:

  \bibitem{Applications}
  S. Goedecker, Rev. Mod. Phys. {\bf 71}, 1085 (1999),
  and references therein.

  % O(N) inverse methods

  \bibitem{Gibson} A. Gibson, R. Haydock, and J. P. LaFemina,
  Phys. Rev. B {\bf 47}, 9229 (1993).

810:

811:   % Comparison of O(N) methods

812:

  \bibitem{Comparison} D. R. Bowler, M. Aoki, C. M. Goringe,
    A. P. Horsfield, and D. G. Pettifor,
    Modelling Simul. Mater. Sci. Eng. {\bf 5}, 199 (1997).

816:

817:   % Lanczos

818:

819:   \bibitem{Lanczos}C. Lanczos,

820:   J. Res. Natl. Bur. Stand. {\bf 45}, 225 (1950).

821:

  \bibitem{Jones}
  R. Jones and M. W. Lewis, Philos. Mag. B {\bf 49}, 95 (1984).

  \bibitem{Inoue}
  J. Inoue and Y. Ohta, J. Phys. C {\bf 20}, 1947 (1987).

  \bibitem{Haydock}
  R. Haydock, V. Heine, and M. J. Kelly, J. Phys. C {\bf 5},
  2845 (1972); {\bf 8}, 2591 (1975);
  R. Haydock, Solid State Phys. {\bf 35}, 216 (1980).

832:

 % Gauss-Seidel

834:

835:   \bibitem{Foulkes} M. Foulkes and R. Haydock,

836:   J. Phys. C {\bf 19}, 6573 (1986).

837:

838:  % Hotelling's method

839:

840:   \bibitem{Recipes} W. H. Press, S. A. Teukolsky, W. T. Vetterling,

841:   and B. P. Flannery, {\it Numerical Recipes}, 2nd ed.

842:   (Cambridge University Press, Cambridge, 1992), p. 49.

843:

  \bibitem{Pan} V. Pan and J. Reif,
  in {\it Proceedings of the Seventeenth Annual ACM Symposium on
  Theory of Computing} (Association for Computing Machinery, New York, 1985).

847:

848:  % Error analysis

849:

  \bibitem{Golub} G. H. Golub and C. van Loan,
  {\it Matrix Computations}, 2nd ed.
  (North Oxford Academic, Oxford, 1989).

  \bibitem{Chatelin} F. Chatelin,
  {\it Valeurs propres de matrices} (Masson, Paris, 1988).

856:

857:  \end{references}

858:

859: % Fig.1

860:

861:  \begin{figure}[t]

862:   \caption{\small

863:    The density of states for eigenvalues of the $O$-matrix for diamond

864:    and fcc Al, where carbon and aluminum atoms have minimal numerical

865:    basis sets for valence electrons which were obtained by DFT calculations

866:    for the atomic states. The experimental values, 3.57 and 4.05~\AA, were

867:    used as the lattice constants of diamond and fcc Al, respectively.}

868:  \end{figure}

869:

870: % Fig.2

871:

872:  \begin{figure}[t]

873:   \caption{\small

874:    The 1-norm of the error matrix for diamond calculated by

875:    the (a) recursion, (b) divide, and (c) Hotelling's methods.

876:    In both the recursion and Hotelling's methods, the 1-norms were

   calculated for three-, five-, and seven-shell clusters as a function
   of the number of recursion levels and iterations, respectively.}

879:  \end{figure}

880:

881: % Fig.3

882:

883:  \begin{figure}[t]

884:   \caption{\small

885:    The 1-norm of the error matrix for fcc Al calculated by

886:    the (a) recursion, (b) divide, and (c) Hotelling's methods.

887:    In both the recursion and Hotelling's methods, the 1-norms were

   calculated for four- and six-shell clusters as a function
   of the number of recursion levels and iterations, respectively.

890:    }

891:  \end{figure}

892:

893: % Fig.4

894:

895:  \begin{figure}[t]

896:   \caption{\small

   The 1-norm of the error matrix for (a) diamond and (b) fcc Al
   plotted against the computational time per atom for the three
   O($N$) inverting methods. The calculations were performed on a
   single processor of a Compaq ES40 workstation.}

901:  \end{figure}

902:

903: \end{document}

904:

905: