
1: \documentstyle[seceq,preprint]{jpsj}

2: %\documentstyle[12pt]{article}

3: %\documentstyle[twocolumn]{jpsj}

4:

5: \newcommand{\Rvec}{\mbox{\boldmath $R$}}

6: \newcommand{\gvec}{\mbox{\boldmath $g$}}

7: \newcommand{\xvec}{\mbox{\boldmath $x$}}

8: \newcommand{\pvec}{\mbox{\boldmath $p$}}

9: \newcommand{\rvec}{\mbox{\boldmath $r$}}

10: \newcommand{\uvec}{\mbox{\boldmath $u$}}

11: \newcommand{\vvec}{\mbox{\boldmath $v$}}

12: \newcommand{\qvec}{\mbox{\boldmath $q$}}

13: \newcommand{\svec}{\mbox{\boldmath $s$}}

14: \newcommand{\yvec}{\mbox{\boldmath $y$}}

15: \newcommand{\zvec}{\mbox{\boldmath $z$}}

16: \newcommand{\wfvec}{\mbox{\boldmath $\Psi$}}

17: \newcommand{\NB}{N_{\mbox{\tiny B}}}

18: \newcommand{\NG}{N_{\mbox{\tiny G}}}

19: \newcommand{\NENE}{N_{\mbox{\tiny E}}}

20: \newcommand{\NBIT}{N_{\mbox{\tiny bit}}}

21: \newcommand{\NITER}{N_{\mbox{\tiny iter}}}

22: \newcommand{\IMAX}{I_{\mbox{\tiny max}}}

23: \newcommand{\PINT}{\pvec_{\mbox{\tiny I}}}

24:

25:

26: \pagestyle{plain}

27: \setlength{\oddsidemargin}{0cm}

28: \setlength{\textwidth}{16cm}

29: \setlength{\topmargin}{0cm}

30: \setlength{\textheight}{23cm}

31:

32: \begin{document}

33:

34: \title{An efficient algorithm for electronic-structure calculations}

35:

36: \author{E\hspace{0.4mm}i\hspace{0.4mm}j\hspace{0.4mm}i Tsuchida}

37:

38: \inst{Research Institute for Computational Sciences, AIST, \\

39: Tsukuba Central 2, Umezono 1-1-1, Tsukuba, Ibaraki 305-8568, Japan}

40:

41: \abst{

42: We show how to adapt the quasi-Newton method

43: to electronic-structure calculations

44: using systematic basis sets.

45: Our implementation requires fewer iterations than

46: the conjugate gradient method, while the computational

47: cost per iteration is much lower.

48: The memory usage is also quite modest,

49: thanks to the efficient representation of the

50: approximate Hessian.

51: }

52:

53: \kword{density-functional theory, quasi-Newton method,

54: BFGS update, finite-element method,

55: Born-Oppenheimer dynamics}

56:

57: \maketitle

58:

59:

60: \section{Introduction}

61:

62: The importance of first-principles

63: electronic-structure calculations

64: based on the density-functional theory \cite{HK,KS,CP}

65: is increasing year by year \cite{PAY,TREV,MARX}.

66: Since the optimization of the ground-state wavefunctions is

67: the most time-consuming part of these calculations,

68: it is crucial to use an efficient algorithm for this purpose.

69: However, the number of degrees of freedom

70: is so large for systematic basis sets like

71: plane-waves \cite{PAY,TREV,MARX},

72: finite-differences \cite{CTS,BER,RREV},

73: and finite-elements \cite{RREV,WHT,PRB2,JPSJ,PKFS}, that

74: the memory usage of the algorithm being used

75: is severely restricted.

76: Currently, the conjugate gradient method \cite{RCP,GLL,TPA,SCP,BKL,PAY}

77: seems to be the most widely used because of

78: its efficiency and modest memory usage,

79: while the direct inversion in the iterative subspace

80: (DIIS) \cite{PLY,WOZU,MACO,HLP,KRFU} is also sometimes used.

81:

82: On the other hand, the quasi-Newton methods have rarely been used

83: for electronic optimization

84: in combination with systematic basis sets, although their efficiency

85: is well known \cite{SREV,FREV,NREV};

86: to the best of our knowledge, the application of the quasi-Newton

87: methods in this context

88: has been limited to atomic orbitals \cite{HEPO,FIAL} or

89: one-dimensional problems \cite{HSZ}.

90: This is presumably because they require significantly more storage

91: for the elements of the (approximate) Hessian matrix.

92: If an all-band update is used,

93: the dimension of the Hessian (${\cal H}$) is given by

94: ${\cal N} = \NB \NG$, where $\NB$ is the number of

95: orbitals and $\NG$ is the number of basis functions.

96: Therefore, the storage requirement for

97: ${\cal H} \, ({\cal N} \times {\cal N}) $ will be

98: ${\cal N}^2$ in a naive implementation \cite{RCP}, which is

99: prohibitive for large-scale simulations

100: where $\cal N$ can exceed 10$^7$.

101: A more practical implementation of the quasi-Newton method is

102: also found in the literature \cite{NOCE}, in which

103: only the $m$ most recent steps are retained.

104: Since two update vectors of size $\cal N$ are required

105: per step \cite{NOCE},

106: the memory usage amounts to $2 m {\cal N}$ elements,

107: where $m$ is usually less than 10.

108: However, this can be further reduced to $ m {\cal N}$

109: if the initial Hessian

110: is a multiple of the unit matrix \cite{SIEG,GILE}.

111: In this article, we present the implementation of

112: the quasi-Newton method using

113: the BFGS (Broyden-Fletcher-Goldfarb-Shanno) formula \cite{FREV,GILE}

114: along this line.

115: As explained in the next section,

116: we make a number of modifications to adapt the algorithm

117: to electronic-structure calculations.

118: The most important one is the compression of the

119: update vectors by an order of magnitude,

120: which makes this algorithm attractive

121: even for very large systems.

122:

123: \section{Methods}

124:

125: \subsection{Electronic-structure calculations}

126: We first outline the basic problem

127: of electronic-structure

128: calculations within the density-functional theory \cite{HK,KS}.

129: Only real wavefunctions at the $\Gamma$-point of the Brillouin zone

130: are considered for notational simplicity,

131: but generalization to complex wavefunctions is straightforward.

132:

133: The total energy functional for an ionic configuration $\Rvec$

134: is given by \cite{PAY}

135: \begin{eqnarray}

136: E_{\mbox{\scriptsize total}} \, [\wfvec, \Rvec] & = &

137: \sum_i \int \psi_i (\rvec) \left[ -\nabla^2 +

138: V_{\mbox{\scriptsize ps}} [\Rvec] \right] \psi_i (\rvec) \, {\mbox d}\rvec

139: + E_{\mbox{\scriptsize Hxc}} [n (\rvec)]

140: + E_{\mbox{\scriptsize ion}} [\Rvec],

141: \end{eqnarray}

142: where

143: \begin{equation}

144: \wfvec=(\psi_1(\rvec) \;\; \psi_2(\rvec) \;\; \cdots \;\; \psi_{\NB}(\rvec))^T,

145: \end{equation}

146: \begin{equation}

147: n (\rvec) = \sum_i |\psi_i (\rvec)|^2,

148: \end{equation}

149: and $E_{\mbox{\scriptsize Hxc}}$ is the sum of

150: the Hartree and exchange-correlation

151: energy, which is a nonlinear and nonlocal functional of

152: the electron density $ n (\rvec) $.

153: In practice, each $\psi_i(\rvec)$ is discretized by a

154: basis set expansion \cite{PAY,TREV,MARX,CTS,BER,WHT,RREV,PRB2,JPSJ,PKFS},

155: which makes $\wfvec$ a huge vector with ${\cal N} (=\NB \NG)$ elements.

156:

157: In the conventional approach \cite{PAY},

158: the ground-state energy $E_{\mbox{\tiny G}}$ and

159: wavefunctions $\wfvec_{\mbox{\tiny G}}$ for the given $\Rvec$

160: are obtained by minimization of

161: $ E_{\mbox{\scriptsize total}}[\wfvec, \Rvec] $

162: with respect to the wavefunctions $\wfvec$

163: under the orthonormality constraints:

164: \begin{equation}

165: \int \psi_i (\rvec) \, \psi_j (\rvec) \, {\mbox d}\rvec = \delta_{ij}.

166: \end{equation}

167: $\wfvec_{\mbox{\tiny G}}$ calculated in this way is then used to study

168: various properties of the system.

169:

170: In our implementation, on the other hand,

171: the above constraints are eliminated

172: by modifying the total energy functional

173: according to Refs. [\citen{SCP,APJ1,MGC,KV}],

174: in which orthonormality of the wavefunctions is satisfied

175: either implicitly \cite{SCP,APJ1,KV} or automatically \cite{MGC}.

176: Moreover, all the orbitals are updated

177: simultaneously \cite{KV,JPSJ}, and

178: self-consistency of $E_{\mbox{\scriptsize Hxc}}$ is taken into account

179: in the evaluation of its gradient.

180: Then, if the modified total energy functional for the given $\Rvec$

181: is denoted by $E \, [\wfvec]$, $E_{\mbox{\tiny G}}$

182: and $\wfvec_{\mbox{\tiny G}}$ are obtained by

183: minimization of $E \, [\wfvec]$ with respect to $\wfvec$

184: without any constraints.

185: Thanks to this reformulation, we can easily implement

186: the quasi-Newton method, which is one of the most efficient algorithms

187: for the unconstrained optimization of

188: nonlinear functions \cite{SREV,FREV,NREV}.

189: Furthermore, the use of nonorthogonal basis functions is

190: much easier in this case \cite{JPSJ,KV}.

191: The above ground-state calculations are usually performed

192: for a series of slowly varying $\Rvec$, each of which

193: is called an {\it ionic step}.

194:

195: \subsection{BFGS with full Hessian}

196: \label{FBFGS}

197: We illustrate the conventional

198: quasi-Newton method using the BFGS formula \cite{FREV,GILE} here,

199: which will serve as a prototype

200: for the implementation in reduced space.

201: For simplicity, we assume the new total energy

202: (eq. (\ref{ENEW})) is always lower than the previous value. \\

203: \ \\

204: Choose ${\cal H}_0$ and $\wfvec_0$. \\

205: Calculate $ E_0 = E \, [\wfvec_0]$ and

206: $\gvec_0 = \nabla E \, [\wfvec_0]$. \\

207: Set $k=0$. \\

208: Do while ($|\gvec_k| \ge \epsilon$)

209: \begin{equation}

210: \pvec_k = -{\cal H}_k^{-1} \gvec_k

211: \end{equation}

212: \begin{equation}

213: \wfvec_{k+1} = \wfvec_k + \pvec_k

214: \end{equation}

215: \begin{equation}

216: \label{ENEW}

217: E_{k+1} = E \, [\wfvec_{k+1}]

218: \end{equation}

219: \begin{equation}

220: \gvec_{k+1} = \nabla E \, [\wfvec_{k+1}]

221: \end{equation}

222: \begin{equation}

223: \Delta \wfvec_k = \wfvec_{k+1} - \wfvec_k

224: \end{equation}

225: \begin{equation}

226: \Delta \gvec_k = \gvec_{k+1} - \gvec_k

227: \end{equation}

228: \begin{equation}

229: \label{FBFGSEQ}

230: {\cal H}_{k+1} = {\cal H}_k -

231: \frac{{\cal H}_k \Delta \wfvec_k \Delta \wfvec_k^T {\cal H}_k}

232: {\Delta \wfvec_k^T {\cal H}_k \Delta \wfvec_k}

233: +\frac{\Delta \gvec_k \Delta \gvec_k^T}

234: {\Delta \wfvec_k^T \Delta \gvec_k}

235: \end{equation}

236: \begin{equation}

237: k=k+1

238: \end{equation}

239: End do \\

240:

241: While this algorithm is simple and efficient in terms of the

242: convergence rate, its memory usage and computational effort

243: scale as $O({\cal N}^2)$ and $O({\cal N}^3)$, respectively,

244: which are prohibitive.

245: Although the latter can be reduced to $O({\cal N}^2)$ if the

246: updating formula for the inverse Hessian (${\cal H}^{-1}$)

247: is used \cite{RCP}, this is still far from practical.

248: The purpose of this article is to present

249: the improved algorithm \cite{SIEG,GILE}

250: in which both scale as $O({\cal N})$ with modest prefactors.
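
As a minimal sketch of the loop in this subsection (not the implementation used in this work), the following Python/NumPy code applies the full-Hessian update of eq. (\ref{FBFGSEQ}) with unit steps. The function name \verb|bfgs_full|, its arguments, the iteration cap, and the skipping of the update when $\Delta \wfvec_k^T \Delta \gvec_k \le 0$ are illustrative assumptions. The dense matrix makes the $O({\cal N}^2)$ storage and the $O({\cal N}^3)$ linear solve explicit.

```python
import numpy as np

def bfgs_full(gradient, psi0, sigma=1.0, eps=1e-8, max_iter=200):
    """Dense BFGS with unit steps; H approximates the Hessian itself.

    gradient : callable returning the gradient of E at a given psi (assumed).
    The energy is not evaluated here because, as in the text, every step
    is assumed to lower it.
    """
    psi = np.array(psi0, dtype=float)
    H = sigma * np.eye(psi.size)            # H_0 = sigma * I, O(N^2) storage
    g = gradient(psi)
    n_iter = 0
    while np.linalg.norm(g) >= eps and n_iter < max_iter:
        p = -np.linalg.solve(H, g)          # p_k = -H_k^{-1} g_k, O(N^3) work
        psi_new = psi + p
        g_new = gradient(psi_new)
        d_psi = psi_new - psi
        d_g = g_new - g
        curvature = d_psi @ d_g
        if curvature > 0.0:                 # assumed safeguard: skip otherwise
            Hd = H @ d_psi
            H = (H - np.outer(Hd, Hd) / (d_psi @ Hd)
                   + np.outer(d_g, d_g) / curvature)
        psi, g = psi_new, g_new
        n_iter += 1
    return psi, n_iter
```

For a small quadratic test function $E = \frac{1}{2} \xvec^T A \xvec - \xvec^T \mbox{\boldmath $b$}$, one would pass \verb|lambda x: A @ x - b| as the gradient.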

251:

252: \subsection {QR-decomposition}

253: \label{QRD}

254: At this point, we give a brief introduction to the

255: QR-decomposition \cite{RCP},

256: which plays an important role in the algorithm

257: presented in the next section.

258: Let us assume $B ({\cal N} \times r)$ is a set of

259: linearly independent vectors:

260: \begin{equation}

261: B = (\pvec_1 \;\; \pvec_2 \;\; \cdots \;\; \pvec_r),

262: \end{equation}

263: where $1 \le r \ll {\cal N}$.

264: Then the QR-decomposition of $B$ is given by

265: \begin{equation}

266: B = Z \,\, T,

267: \end{equation}

268: where

269: $Z ({\cal N} \times r)$ is a set of orthonormal vectors spanning the

270: same subspace as $B$, i.e.

271: \begin{equation}

272: Z^T Z = I,

273: \end{equation}

274: and $T (r \times r)$ is an invertible upper-triangular matrix.

275: In practice, this decomposition is obtained by applying the

276: addition procedure given below repeatedly, which is

277: (mathematically) equivalent to constructing an orthonormal basis

278: from the left ($\pvec_1$) to the right ($\pvec_r$)

279: by the Gram-Schmidt scheme.

280: Note, however, that only $B$ and $T$ are considered explicitly

281: in the following \cite{SIEG,GILE}.

282:

283: Here we show how to update the above QR-decomposition

284: when $B$ is slightly modified.

285: In the first case where a vector $\gvec$ is added to $B$, i.e.

286: \begin{equation}

287: B_+ = (\pvec_1 \;\; \pvec_2 \;\; \cdots \;\; \pvec_r \;\; \gvec)

288: = (B \;\; \gvec),

289: \end{equation}

290: the new decomposition is given by

291: \begin{equation}

292: B_+ = Z_+ \, T_+,

293: \end{equation}

294: where

295: \begin{equation}

296: T_+ ((r+1) \times (r+1)) = \left(

297: \begin{array}{cc}

298: T &  \uvec \\

299: 0 &  \rho \\

300: \end{array}

301: \right) ,

302: \end{equation}

303: \begin{equation}

304: \uvec = Z^T \gvec = (T^T)^{-1} ( B^T \gvec ),

305: \end{equation}

306: and

307: \begin{equation}

308: \rho = \sqrt{|\gvec|^2 - |\uvec|^2}.

309: \end{equation}

310: If $\rho \ne 0$, $T_+$ is also an invertible upper-triangular

311: matrix.

312:

313: Next, we consider the case of

314: dropping the leftmost vector $\pvec_1$ from $B$, i.e.

315: \begin{equation}

316: B_- = (\pvec_2 \;\; \pvec_3 \;\; \cdots \;\; \pvec_r).

317: \end{equation}

318: The corresponding decomposition is given by

319: \begin{equation}

320: B_- = Z_- T_-,

321: \end{equation}

322: where $T_-$ satisfies

323: \begin{equation}

324: \label{QRM}

325: T^T_- \, T_- = B^T_- \, B_-.

326: \end{equation}

327: The right-hand side of eq. (\ref{QRM}) is the trailing

328: $(r-1) \times (r-1)$ block of $ B^T B$, which is easily calculated from

329: \begin{equation}

330: B^T B = T^T T.

331: \end{equation}

332: Therefore, $T_-$ is obtained by the Cholesky decomposition \cite{RCP}

333: of a small matrix at negligible cost.

334: A more refined approach is introduced

335: in Ref. \citen{GILE}, but

336: the above procedure seems to be sufficient for our present purpose.
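
The two updates of this subsection can be sketched in Python/NumPy as follows, keeping only $B$ and $T$ explicit. The function names \verb|qr_add_column| and \verb|qr_drop_first_column| are hypothetical, and the clamping of $\rho^2$ at zero is an added guard against round-off rather than part of the procedure described above.

```python
import numpy as np

def qr_add_column(B, T, g):
    """Append g to B and extend T, where B = Z T and Z is never formed."""
    u = np.linalg.solve(T.T, B.T @ g)        # u = (T^T)^{-1} (B^T g) = Z^T g
    rho = np.sqrt(max(g @ g - u @ u, 0.0))   # clamp guards against round-off
    r = T.shape[0]
    T_plus = np.zeros((r + 1, r + 1))
    T_plus[:r, :r] = T
    T_plus[:r, r] = u
    T_plus[r, r] = rho
    return np.column_stack([B, g]), T_plus

def qr_drop_first_column(B, T):
    """Drop the leftmost column of B and rebuild T by a small Cholesky step."""
    gram = T.T @ T                           # equals B^T B, only r x r
    gram_minus = gram[1:, 1:]                # equals B_-^T B_-
    L = np.linalg.cholesky(gram_minus)       # gram_minus = L L^T
    return B[:, 1:], L.T                     # T_- = L^T is upper triangular
```

Apart from the product $B^T \gvec$, all operations act on matrices of size $r \times r$.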

337:

338:

339: \subsection{BFGS with reduced Hessian and limited memory}

340: \label{ALG}

341: Here we present the state-of-the-art implementation of the

342: quasi-Newton method \cite{SIEG,GILE}, which is obtained by

343: modifying the conventional algorithm ($\S$ \ref{FBFGS})

344: under two assumptions: (i) ${\cal H}_0 = \sigma I$ ($\sigma > 0$), and

345: (ii) At most $m$ previous steps are stored.

346:

347: In order to fully exploit these conditions, it is more

348: convenient to use a compact representation for the

349: Hessian:

350: \begin{equation}

351: H = Z^T \, {\cal H} \, Z,

352: \end{equation}

353: where $Z \, ({\cal N} \times r)$ is the current (orthonormal) basis,

354: $H \, (r \times r)$ is the reduced Hessian,

355: and $1 \le r \le m+1 \ll {\cal N}$.

356: While $Z$ and ${\cal H}$ also appear in the following algorithm,

357: they are not explicitly calculated.

358: The reduced vectors are defined in a similar way;

359: the reduced gradient $\uvec$, for instance, is given by

360: $\uvec = Z^T \gvec$, where $\gvec = \nabla E$.

361: The correspondence between the full/reduced vectors

362: is shown in Table \ref{TAB0}.

363:

364: \begin{enumerate}

365: \item \label{QNINI} Initialization: \\

366: Set $k=0$ and $r=1$, where $k$ and $r$ denote the loop index

367: and the rank of the reduced space, respectively. \\

368: Choose the initial wavefunction ($\wfvec_0$),

369: the approximate curvature ($\sigma$),

370: the convergence criterion ($\epsilon$), and

371: the maximum rank of the reduced space ($m$). \\

372: Calculate the total energy

373: \begin{equation}

374: E_0=E \, [\wfvec_0]

375: \end{equation}

376: and its gradient

377: \begin{equation}

378: \gvec_0=\nabla E \, [\wfvec_0].

379: \end{equation}

380: {\sf IF} ($|\gvec_0| < \epsilon$) {\sf THEN} quit. \\

381: {\sf ELSE} $ H_0 = (\sigma), \; B_0=(\gvec_0), \;

382: T_0=(|\gvec_0|)$, and $ \vvec_0=(|\gvec_0|) $. Moreover,

383: $Z_0 = (\gvec_0/|\gvec_0|)$ and ${\cal H}_0 = \sigma I$

384: are implicitly assumed. \\

385: If the Hessian of the previous ionic step is

386: taken over, several modifications are

387: required in this step, which are, however, straightforward.

388:

389: \item \label{QNLOOP} Calculate the new search direction in reduced space:

390: \begin{equation}

391: \label{QKEQ}

392: \qvec_k = -H_k^{-1} \vvec_k.

393: \end{equation}

394: \item \label{PFUL} Calculate the new search direction:

395: \begin{equation}

396: \label{FULLP}

397: \pvec_k = Z_k \qvec_k = B_k (T_k^{-1} \qvec_k).

398: \end{equation}

399: \item \label{UPB1} Update the subspace: \

400: $ (B_k = Z_k T_k \rightarrow B_k' = Z_k T_k') $,

401: where

402: \begin{equation}

403: B_k ({\cal N} \times r) =

404: (\pvec_{k-r+1} \;\; \cdots \;\; \pvec_{k-1} \;\; \gvec_k)

405: \end{equation}

406: and

407: \begin{equation}

408: B'_k ({\cal N} \times r) =

409: (\pvec_{k-r+1} \;\; \cdots \;\; \pvec_{k-1} \;\; \pvec_k).

410: \end{equation}

411: $T_k'$ is obtained from $T_k$ and $\qvec_k$,

412: whereas $Z_k$ remains unchanged \cite{GILE}.

413: \item Set $\alpha = 1$ and calculate the derivative of the total energy

414: along $\pvec_k$ as

415: \begin{equation}

416: \label{EDEQ}

417: E' = \left.

418: \frac{\partial E \, [\wfvec_k + \alpha \pvec_k] }{\partial \alpha}

419: \right|_{\alpha=0}

420: = \gvec_k^T \pvec_k = \vvec_k^T \qvec_k.

421: \end{equation}

422: \item \label{QNLINMIN} Calculate the new wavefunction:

423: \begin{equation}

424: \wfvec_{k+1} = \wfvec_k + \alpha \pvec_k.

425: \end{equation}

426: \item \label{QNLM2} Calculate the new total energy:

427: \begin{equation}

428: E_{k+1} = E \, [\wfvec_{k+1}].

429: \end{equation}

430: {\sf IF} $(E_{k+1} \geq E_k)$ {\sf THEN}

431: estimate the optimal $\alpha$ by a parabolic fit

432: with $E_k$, $E'$, and $E_{k+1}$, and go to \ref{QNLINMIN}.

433: \item \label{GFUL} Calculate the new gradient:

434: \begin{equation}

435: \gvec_{k+1}=\nabla E \, [\wfvec_{k+1}].

436: \end{equation}

437: {\sf IF} ($|\gvec_{k+1}| < \epsilon$) {\sf THEN} quit.

438:

439: \item \label{RGITEM} Extend the subspace: \

440: $( B_k'=Z_k T_k' \rightarrow B_k''= Z_k' T_k'') $,

441: where

442: \begin{equation}

443: B'_k({\cal N} \times r) =

444: (\pvec_{k-r+1} \;\; \cdots \;\; \pvec_k)

445: \end{equation}

446: and

447: \begin{equation}

448: B_k'' ({\cal N} \times (r+1)) =

449: (\pvec_{k-r+1} \;\; \cdots \;\; \pvec_{k} \;\; \gvec_{k+1}).

450: \end{equation}

451: As explained in $\S$ \ref{QRD},

452: $T_k''$ is obtained from $T_k'$, $\uvec_k$, and $\rho_{k+1}$, where

453: \begin{equation}

454: \label{REDG}

455: \uvec_{k} = Z_k^T \gvec_{k+1}=(T_{k}^{'T})^{-1} (B_{k}^{'T} \gvec_{k+1})

456: \end{equation}

457: and

458: \begin{equation}

459: \rho_{k+1} = \sqrt{|\gvec_{k+1}|^2 - |\uvec_k|^2}.

460: \end{equation}

461: We assume $\rho_{k+1} \ne 0$ in the following.

462: Then, the new basis $Z_k' ({\cal N} \times (r+1))$

463: is given by \cite{GILE}

464: \begin{equation}

465: Z_k' = ( Z_k \;\;\; \zvec_{k+1}),

466: \end{equation}

467: where $\zvec_{k+1} = (\gvec_{k+1} - Z_k \uvec_k) / \rho_{k+1}$.

468: However, $\zvec_{k+1}$ is not explicitly calculated.

469:

470: \item $ r=r+1 $

471: \item Calculate the reduced gradients as

472: \begin{equation}

473: \vvec_k' = Z_k^{'T} \gvec_k = \left(

474: \begin{array}{c}

475: \vvec_k \\

476: 0 \\

477: \end{array} \right)

478: \end{equation}

479: and

480: \begin{equation}

481: \uvec_k' = Z_k^{'T} \gvec_{k+1} = \left(

482: \begin{array}{c}

483: \uvec_k \\

484: \rho_{k+1}

485: \end{array}

486: \right).

487: \end{equation}

488: There is no loss of information here, since

489: $\gvec_k, \gvec_{k+1} \in Z_k'$.

490:

491: \item \label{QNSY}

492: Update the reduced Hessian using the BFGS formula \cite{FREV,GILE}:

493: \begin{equation}

494: \label{RBFGSEQ}

495: H''_k (r \times r)

496: = Z_k^{'T} {\cal H}_{k}^{+} Z'_k

497: = H'_k

498: - \frac{H'_k \svec_k \svec_k^T H'_k}{\svec_k^T H'_k \svec_k}

499: + \frac{\yvec_k \yvec_k^T}{\svec_k^T \yvec_k},

500: \end{equation}

501: where

502: \begin{equation}

503: \svec_k = Z_{k}^{'T} \Delta \wfvec_k =

504: \alpha \left(

505: \begin{array}{c}

506: \qvec_k \\

507: 0 \\

508: \end{array}

509: \right),

510: \end{equation}

511: \begin{equation}

512: \yvec_k = Z_{k}^{'T} \Delta \gvec_k = \uvec_k' - \vvec_k',

513: \end{equation}

514: and

515: \begin{equation}

516: \label{SIG2}

517: H'_{k}(r \times r) = Z_k^{'T} {\cal H}_{k} Z'_k =

518: \left(

519: \begin{array}{cc}

520: H_{k}  & 0 \\

521: 0    & \sigma \\

522: \end{array}

523: \right).

524: \end{equation}

525: ${\cal H}_k^+$ is defined as the right-hand side of

526: eq. (\ref{FBFGSEQ}), and eq. (\ref{RBFGSEQ}) is derived from

527: this definition.

528: Note that $\svec_k^T \yvec_k$ $ > 0$ is assumed here; otherwise,

529: the Hessian is not updated.

530:

531: \item {\sf IF} $(r = m+1)$ {\sf THEN}

532: reduce the subspace: \ $(B_k'' = Z_k' T_k''

533: \rightarrow B_{k+1} = Z_{k+1} T_{k+1})$, where

534: \begin{equation}

535: B_k'' ({\cal N} \times (m+1)) =

536: (\pvec_{k-m+1} \;\; \pvec_{k-m+2} \;\;

537: \cdots \;\; \pvec_{k} \;\; \gvec_{k+1})

538: \end{equation}

539: and

540: \begin{equation}

541: B_{k+1} ({\cal N} \times m) =

542: (\pvec_{k-m+2} \;\; \cdots \;\; \pvec_{k} \;\; \gvec_{k+1}).

543: \end{equation}

544: $T_{k+1}$ is easily obtained from $T_k''$ according to $\S$ \ref{QRD}.

545: Then, $H_{k+1} (m \times m) = Z_{k+1}^T {\cal H}_k^+ Z_{k+1}$

546: is calculated from $T_k'', T_{k+1}$, and $H_k''$

547: by way of $B_{k+1}^T {\cal H}_k^+ B_{k+1}$.

548: At this point, the new Hessian (${\cal H}_{k+1}$)

549: in the new basis ($Z_{k+1}$ and its orthogonal complement)

550: is defined as a block-diagonal matrix consisting of $H_{k+1}$

551: and $\sigma I (({\cal N}-m) \times ({\cal N}-m))$,

552: which was implicitly used in eq. (\ref{SIG2}).

553: Therefore, part of the information contained in ${\cal H}_k^+$

554: has been discarded here.

555: Similarly, $\vvec_{k+1} = Z_{k+1}^T \gvec_{k+1}$ is calculated

556: from $T_k'', T_{k+1}$, and $\uvec_k'$

557: by way of $B_{k+1}^T \gvec_{k+1}$, but there is no loss of

558: information here.

559: Finally, we set $r=m$. \\

560: {\sf ELSE}

561: $ H_{k+1}=H_k'', B_{k+1}=B_k'', T_{k+1}=T_k''$,

562: and $\vvec_{k+1}= \uvec_k' $.

563: Moreover, $Z_{k+1} = Z_k'$ and ${\cal H}_{k+1} = {\cal H}_k^+$

564: are implicitly assumed.

565: \item $k=k+1$

566: \item Go to \ref{QNLOOP}.

567: \end{enumerate}
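
Two pieces of the loop above can be sketched in Python/NumPy under stated assumptions: the reduced update of eq. (\ref{RBFGSEQ}) with the padded vectors $\svec_k$ and $\yvec_k$, and the parabolic fit of step \ref{QNLM2}. The function names and argument lists are hypothetical. The parabolic formula is the standard minimizer of the quadratic through $E_k$, $E'$, and $E_{k+1}$, which we assume matches the fit intended in the text.

```python
import numpy as np

def reduced_bfgs_update(H, q, v, u, rho, alpha, sigma):
    """Return H''_k from H_k (r x r) after the subspace gained one direction.

    q, v, u are the reduced search direction and the reduced gradients
    (all of length r); rho is the new diagonal entry of T.
    """
    r = H.shape[0]
    H_ext = np.zeros((r + 1, r + 1))         # H'_k = blockdiag(H_k, sigma)
    H_ext[:r, :r] = H
    H_ext[r, r] = sigma
    s = alpha * np.append(q, 0.0)            # s_k = alpha (q_k, 0)^T
    y = np.append(u, rho) - np.append(v, 0.0)  # y_k = u'_k - v'_k
    sy = s @ y
    if sy <= 0.0:                            # the Hessian is not updated
        return H_ext
    Hs = H_ext @ s
    return H_ext - np.outer(Hs, Hs) / (s @ Hs) + np.outer(y, y) / sy

def parabolic_step(e_k, e_prime, e_trial, alpha_trial):
    """Minimizer of the parabola through E(0)=e_k, E'(0)=e_prime,
    and E(alpha_trial)=e_trial."""
    denom = 2.0 * (e_trial - e_k - e_prime * alpha_trial)
    if denom <= 0.0:
        raise ValueError("fitted parabola has no minimum")
    return -e_prime * alpha_trial ** 2 / denom
```

When $E_{k+1} \ge E_k$ and $E' < 0$, the value returned by \verb|parabolic_step| lies between $0$ and $\alpha/2$, so the retried step is always shorter than the rejected one.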

568:

569: \begin{itemize}

570: \item While $0 \le k \le m-1$,

571: this algorithm is identical to the conventional one

572: ($\S$ \ref{FBFGS}) with ${\cal H}_0 = \sigma I $

573: within round-off errors.

574: The two algorithms begin to differ once $k$ reaches $m$,

575: but the deterioration of the convergence rate is minimized

576: by constructing the subspace with

577: the previous search directions rather than the

578: gradients \cite{SIEG,GILE}.

579:

580: \item For simplicity,

581: the above algorithm includes minimal exception handling.

582: Therefore, the original paper \cite{GILE} should be

583: consulted for a more complete treatment.

584: However, such exceptions are observed

585: only in the very early stages of the first

586: ionic step, where the quadratic model is not valid.

587:

588: \item One cycle requires approximately

589: $ 2 r {\cal N}$ multiply-and-add operations,

590: arising from eqs. (\ref{FULLP}) and (\ref{REDG}).

591: For practical values of $m \, (< 10$), these costs will be

592: much lower than those of evaluating

593: the total energy in step \ref{QNLM2} \cite{PAY}.

594:

595: \item The basis functions should be appropriately

596: scaled \cite{TPA,HEPO} in advance,

597: so that their contribution to the total energy is similar.

598:

599: \item The reduced Hessian $H_k$ is diagonalized in each cycle to

600: guarantee its positive definiteness; nonpositive eigenvalues,

601: if any, are modified appropriately.

602: Then, it follows from eqs. (\ref{QKEQ}) and (\ref{EDEQ}) that

603: $ E' = - \vvec_k^T H_k^{-1} \vvec_k < 0$,

604: because $H_k^{-1}$ is also positive definite and

605: $|\vvec_k| = |\gvec_k| \ge \epsilon$.

606: Furthermore,

607: the average eigenvalue of the reduced Hessian,

608: denoted by $\lambda_k$, is also calculated and stored for later use.

609:

610: \item We explain the choice of $\sigma$

611: used in steps \ref{QNINI} and \ref{QNSY} here.

612: Since $\sigma$ is the approximate curvature

613: along the new direction \cite{GILE}, a reasonable

614: estimate is needed to achieve high performance.

615: Therefore, a number of strategies have been proposed

616: to choose optimal $\sigma$ \cite{NOCE,SIEG,GILE},

617: most of which provide dynamical estimates.

618: Nevertheless, we use a constant $\sigma$ during each ionic step

619: unless otherwise noted, which is determined as follows:

620: In the first ionic step, $\sigma$ is estimated from the

621: coarse grid iterations \cite{JPSJ}.

622: At the end of each ionic step, the sequence $\{ \lambda_k \}$

623: is further averaged

624: to give the new $\sigma$ for the next ionic step.

625: $\sigma$ obtained in this way varies only slowly with ionic steps,

626: while providing stable and high performance

627: in the systems we have studied so far.

628: Comparison is also made with the dynamical estimates

629: in $\S$ \ref{RESSEC}.

630: \end{itemize}
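
The positive-definiteness safeguard described in the remarks above might look as follows in Python/NumPy. This is only a sketch. The text does not specify how nonpositive eigenvalues are modified, so replacing them by $\sigma$ is an assumption, as are the function names, the small threshold, and the choice to average the eigenvalues after the modification.

```python
import numpy as np

def safeguard_reduced_hessian(H, sigma, floor=1e-10):
    """Diagonalize the small reduced Hessian and repair bad eigenvalues.

    Returns the repaired matrix and the average eigenvalue (lambda_k).
    """
    H_sym = 0.5 * (H + H.T)                      # remove round-off asymmetry
    w, V = np.linalg.eigh(H_sym)                 # H_sym = V diag(w) V^T
    w_fixed = np.where(w > floor, w, sigma)      # assumption: reset to sigma
    H_pd = (V * w_fixed) @ V.T                   # rebuild from eigenpairs
    return H_pd, w_fixed.mean()

def search_direction(H_pd, v):
    """Reduced search direction q_k = -H_k^{-1} v_k."""
    return -np.linalg.solve(H_pd, v)
```

Since $H_k$ is at most $(m+1) \times (m+1)$, the diagonalization is negligible next to the ${\cal N}$-sized operations.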

631:

632: \subsection{Data compression}

633: \label{ZIP}

634: The memory usage of the algorithm illustrated in the

635: previous section is dominated by the $m$ previous search directions,

636: which amount to $m {\cal N}$ elements.

637: While this is much smaller than

638: the storage of the full Hessian $(={\cal N}^2)$, it is still

639: a serious obstacle in large-scale simulations.

640: In what follows, we present a simple algorithm to compress

641: the previous search directions

642: without sacrificing the efficiency of the original method.

643: In this algorithm, one search direction is compressed

644: in each cycle, by taking advantage of its structure.

645: If $ \pvec \, ({\cal N}) $, which is being compressed,

646: is viewed as a two-dimensional array $ \pvec \, (\NB, \NG) $,

647: the magnitude of $\pvec \, (i, j)$

648: for a given basis function ($j$) is

649: expected to be similar for all orbitals ($i$).

650: Based on this idea, the largest value of $ |\pvec (i, j)|$

651: with respect to $i$ sets the scale factor for each $j$.

652: Moreover, $\NBIT$ is defined as the number of bits

653: assigned to each element of $\pvec$ after compression.

654:

655: Then, the scale factor $\omega \, (\NG)$

656: and the compressed array $\PINT \, (\NB, \NG)$ are given by

657: \begin{equation}

658: \label{SCLF}

659: \mbox{real$\ast$8} \;\;\;\;\; \omega \, (j) =

660: \left( \max_{1 \le i \le \NB} |\pvec \, (i,j)| \right) / \IMAX

661: \end{equation}

662: and

663: \begin{equation}

664: \mbox{integer}  \;\;\;\;\; \PINT \, (i,j)=

665: \mbox{round} \left( \frac{\pvec \, (i,j)}{\omega \, (j)} \right)

666: + \IMAX

667: \end{equation}

668: respectively,

669: where $ \IMAX = 2^{\NBIT-1} -1 $

670: and $ 0 \le \PINT \, (i,j) \le 2 \IMAX = 2^{\NBIT} -2 $.

671: Therefore, each element of $\PINT$ is representable by

672: $\NBIT$ bits.

673: The original values of $\pvec$ are recovered approximately by

674: \begin{equation}

675: \pvec \, (i,j) \approx \omega (j) \, (\PINT (i,j) - \IMAX).

676: \end{equation}

677:

678: In this method, the quality of the compression can

679: be controlled by a single parameter, $\NBIT$.

680: Furthermore, the largest element for each $j$,

681: which is the most important one, remains exact.

682:

683: The total storage for the $m$ search directions

684: after compression is $ m \NBIT {\cal N} / 8 $ bytes,

685: if appropriately packed with bit operations.

686: If $m=\NBIT=8$, for instance, this amounts to only

687: one double-precision array of size $\cal N$.

688: Note also that the storage for the scale factors is minor.

689:

690: In the current implementation,

691: $\pvec_k$ is compressed in step \ref{UPB1}, when added to $B_k'$.

692: At the same time, the last column of $T_k'$ is

693: calculated directly from the compressed $\pvec_k$

694: (rather than using $\qvec_k$)

695: to maintain the consistency of the QR-decomposition.

696: However, the uncompressed $\pvec_k$ is also retained and used

697: in step \ref{QNLINMIN}.

698:

699: Unfortunately, some inconsistency seems inevitable

700: in the update of the reduced Hessian, since

701: $\svec_k$ and $\yvec_k$ no longer belong to $Z_k'$.

702: Nevertheless,

703: $E' < 0$ remains valid as long as the reduced Hessian is positive

704: definite and the latest $\gvec$ and $\pvec$ are uncompressed.

705: Therefore, the stability of the minimization is guaranteed

706: even if the previous search directions are highly compressed.

707:

708: \section{Results}

709: \label{RESSEC}

710: As a test of our implementation under realistic conditions,

711: we performed a series of Born-Oppenheimer dynamics \cite{WM}

712: for bulk diamond at 220 K in a periodic cubic supercell of

713: 64 atoms within the local density approximation \cite{HK,KS}.

714: The wavefunctions were expanded by the adaptive finite-element

715: method \cite{PRB2,JPSJ} with an average cutoff energy of 43 Ry,

716: which corresponds to $ \NG = 8 \times 14^3 = 21,952$.

717: Since $\NB$ is equal to 128,

718: ${\cal N}$ amounts to approximately 2,800,000 in this system.

719: The Brillouin zone was sampled only at the $\Gamma$-point,

720: and the separable pseudopotentials were used \cite{KB,GTH}.

721: The convergence criterion ($\epsilon$) was chosen so that

722: $|E_{k+1} - E_k| \simeq 2 \times 10^{-8}$ Ry/atom

723: when $|\gvec_{k+1}| < \epsilon$ was satisfied.

724: Convergence to the ground state was accelerated by the

725: enhanced extrapolation scheme \cite{EES},

726: which provides accurate

727: initial wavefunctions with the help of population analysis.

728:

729: The equations of motion for the ions were integrated using

730: the velocity-Verlet method \cite{LIQ}

731: with a timestep of 80 a.u. ($\sim 2$ fs).

732: Starting from the same ionic configuration,

733: each run lasted for 57 ionic steps,

734: the last 50 steps of which were used to collect the statistics.

735: Moreover, $B,T$, and $H$ were taken over from

736: previous ionic steps unless otherwise noted.

737: Therefore, these matrices were saturated during this period in all runs.

738:

739: We first show the average number of iterations

740: ($\NITER$) and total energy evaluations ($\NENE$)

741: needed to optimize the electronic structure

742: for the conjugate gradient method

743: using the Polak-Ribiere formula \cite{RCP} and

744: the quasi-Newton method using the BFGS formula

745: in Table \ref{TAB1}.

746: The convergence rate of the quasi-Newton method

747: as measured by $\NITER$ is already comparable to that of

748: the conjugate gradient method for $m = 2$,

749: and becomes better as $m$ is increased.

750: However, there is no point in using $m$ much larger

751: than $\NITER$ (say, 20), because the Hessian

752: is dominated by the contribution from previous ionic steps.

753: In practice,

754: any reasonable choice of $m$, e.g. 5--8, will provide near-optimal

755: performance, since $\NITER$ depends

756: only weakly on $m$ in this range.

757: Note also that the CPU-time is more closely related to

758: $\NENE$ than $\NITER$.

759: Therefore, the quasi-Newton method was much faster

760: than the conjugate gradient method for all $m$ we tried.

761: Specifically, $\NENE = 2 \NITER+1$ in the conjugate gradient method,

762: because at least one line search was forced

763: to maintain the conjugacy of the search directions.

764: In contrast,

765: $ \NENE = \NITER+1$ in the quasi-Newton method, which means

766: that no line search was required in step \ref{QNLM2}.

767:

768: The algorithm presented in $\S$ \ref{ALG} has a number of

769: options which are not uniquely determined.

770: Therefore, we examine some of them here,

771: as shown in Table \ref{TAB2}.

772: (a) is the reference run performed with $m=7$,

773: taken from Table \ref{TAB1}.

774: (b)--(d) were performed under the same conditions as (a)

775: except for the following points:

776: (b) A line search with a parabolic fit was forced in each cycle

777: to see if the convergence rate is improved.

778: However, $\NENE$ was almost doubled without any reduction of $\NITER$.

779: Therefore, it is not justified to perform a line search

780: in the quasi-Newton method,

781: which is consistent with previous findings \cite{NOCE}.

782: (c) The Hessian was discarded at the end of each ionic step.

783: Since the convergence rate deteriorates significantly,

784: the inheritance of the Hessian seems to be profitable.

785: (d) We tried $ \sigma_k=|\yvec_k|^2 / \yvec_k^T \svec_k $,

786: which gave good results in Refs. [\citen{NOCE,GILE}].

787: However, this choice requires more iterations on average,

788: presumably because $\sigma_k$ varies too rapidly with $k$.

789: The norm of the gradient also decays

790: less smoothly in this case.
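The range of values available to the $\sigma_k$ of (d) can be illustrated
by a simple model.
As a sketch, assume that the total energy is exactly quadratic near the
minimum, with a symmetric positive-definite Hessian $A$, so that
$\yvec_k = A \svec_k$
(the matrix $A$ and the auxiliary vector $w$ are introduced here only
for illustration). Then
\[
 \sigma_k = \frac{|\yvec_k|^2}{\yvec_k^T \svec_k}
          = \frac{\svec_k^T A^2 \svec_k}{\svec_k^T A \svec_k}
          = \frac{w^T A w}{w^T w}, \qquad w = A^{1/2} \svec_k ,
\]
so that $\sigma_k$ is a Rayleigh quotient of $A$.
It is therefore bounded by the smallest and largest eigenvalues of $A$,
but may lie anywhere in between, depending on the direction of $\svec_k$.
Within this model, a $\sigma_k$ that changes considerably from one
iteration to the next is thus not unexpected
when the spectrum of $A$ is broad.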

791:

792: So far the previous search directions have been uncompressed,

793: i.e., stored as 64-bit double-precision arrays.

794: The effect of compression is examined here

795: in a series of runs for $m=3$ and $7$,

796: with different $\NBIT$.

797: As shown in Table \ref{TAB3}, the performance

798: is maintained after compression by a factor of 8--16,

799: especially for $m=7$.

800: Moreover, no instability occurred down to $\NBIT=3$.

801: We also show the distribution of the search direction

802: after compression with $\NBIT=8$

803: in Fig. \ref{PDOS}.

804: The distribution function $d(x)$ is defined as the

805: number of elements of $\PINT$ such that

806: $\PINT (i) = x$, where $ 1 \le i \le {\cal N}$ and

807: $ 0 \le x \le 2^8-2 = 254$.

808: Therefore, $\sum_x d(x) = {\cal N}$, and

809: there are two singularities at $x=0$ and 254.

810: The width of the distribution is approximately equal to $\IMAX$,

811: which indicates that our choice of the scale factor

812: (eq.(\ref{SCLF})) is appropriate.
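For reference, the compression factors quoted above follow from the
word length alone.
Relative to 64-bit storage, and counting only the integer array itself
(the storage of the scale factor is neglected, which is an assumption
of this estimate),
\[
 \frac{64}{\NBIT} = 8, \quad 16, \quad \mbox{and} \quad \approx 21
 \qquad \mbox{for} \quad \NBIT = 8, \; 4, \; \mbox{and} \; 3 ,
\]
so that the range of 8--16 corresponds to $\NBIT=8$ and 4
in Table \ref{TAB3}.
For $\NBIT=8$, the range $0 \le x \le 254$ given above amounts to
255 distinct integer values per element.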

813:

814: Finally, in order to examine the generality of

815: our implementation,

816: some of the runs were repeated for

817: an isolated cytosine molecule (C$_4$H$_5$N$_3$O)

818: in a cubic supercell of (16 a.u.)$^3$, with a timestep of 40 a.u.

819: ($\sim$ 1 fs).

820: The average cutoff energy was 39 Ry, which

821: corresponds to $\NB=21, \NG= 8 \times 16^3 = 32,768$, and

822: ${\cal N} \sim 700,000 $.

823: The results shown in Table \ref{TAB4} suggest

824: that the performance of the quasi-Newton method in this system

825: is somewhat more robust against compression,

826: but is qualitatively similar to the previous results in other respects.
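The parameters quoted for the cytosine run can be cross-checked
by simple counting.
Assuming that only the valence electrons are treated explicitly
and that each orbital is doubly occupied
(the usual situation in pseudopotential calculations,
but an assumption of this estimate),
the molecule has
\[
 4 \times 4 + 5 \times 1 + 3 \times 5 + 1 \times 6 = 42
\]
valence electrons, i.e., 21 occupied orbitals,
in agreement with $\NB=21$.
If ${\cal N}$ is further taken to be the number of orbitals
times the number of basis functions (again an assumption of this sketch),
\[
 \NB \times \NG = 21 \times 32{,}768 = 688{,}128 \approx 7 \times 10^5 ,
\]
which is consistent with the value of ${\cal N}$ given above.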

827:

828: \section{Summary}

829: We have shown in this article that

830: the quasi-Newton method using the BFGS formula

831: is the method of choice for

832: large-scale electronic-structure calculations,

833: if combined with efficient memory management.

834: The advantages of the quasi-Newton method over the conjugate

835: gradient method are summarized as follows:

836: (i) The Hessian of the previous ionic step can

837: be carried over to accelerate the convergence.

838: (ii) Practically no line search

839: is required, which reduces the cost of

840: each step significantly.

841:

842: Although there is room for fine-tuning the algorithm and

843: more extensive tests are necessary,

844: the quasi-Newton method is expected to

845: provide significant speedups for first-principles codes,

846: in combination with other techniques such as

847: the enhanced extrapolation scheme \cite{EES} and

848: constrained molecular dynamics \cite{RAT,JPSJ2}.

849:

850: \section*{Acknowledgements}

851: The author would like to thank Dr. K.~Terakura for

852: helpful discussions.

853: The numerical calculations were performed on a Hitachi SR-8000 at the

854: Tsukuba Advanced Computing Center.

855:

856:

857: \begin{thebibliography}{99}

858:   \bibitem{HK} P.~Hohenberg and W.~Kohn: Phys.~Rev. {\bf 136} (1964) B864.

859:

860:   \bibitem{KS} W.~Kohn and L.~J.~Sham: Phys.~Rev. {\bf 140} (1965) A1133.

861:

862:   \bibitem{CP} R.~Car and M.~Parrinello:

863:   Phys.~Rev.~Lett. {\bf 55} (1985) 2471.

864:

865:   \bibitem{PAY} M.~C.~Payne, M.~P.~Teter, D.~C.~Allan, T.~A.~Arias

866:   and J.~D.~Joannopoulos: Rev.~Mod.~Phys. {\bf 64} (1992) 1045.

867:

868:   \bibitem{TREV} K.~Terakura: {\it Computational Physics

869:   as a New Frontier in Condensed Matter Research}, ed. H.~Takayama,

870:   M.~Tsukada, H.~Shiba, F.~Yonezawa, M.~Imada

871:   and Y.~Okabe (The Physical Society of Japan, Tokyo, 1995).

872:

873:   \bibitem{MARX} D.~Marx and J.~Hutter: {\it Modern Methods and

874:   Algorithms of Quantum Chemistry}, ed. J.~Grotendorst (NIC,

875:   Forschungszentrum J\"ulich, 2000), available at

876:   http:// www.theochem.ruhr-uni-bochum.de/

877:   research/ marx/ index.en.html.

878:

879:   \bibitem{CTS} J.~R.~Chelikowsky, N.~Troullier, K.~Wu and Y.~Saad:

880:   Phys.~Rev.~B {\bf 50} (1994) 11355.

881:

882:   \bibitem{BER} E.~L.~Briggs, D.~J.~Sullivan and J.~Bernholc:

883:   Phys.~Rev.~B {\bf 54} (1996) 14362.

884:

885:   \bibitem{RREV} T.~L.~Beck: Rev.~Mod.~Phys. {\bf 72} (2000) 1041.

886:

887:   \bibitem{WHT} S.~R.~White, J.~W.~Wilkins and M.~P.~Teter:

888:   Phys.~Rev.~B {\bf 39} (1989) 5819.

889:

890:   \bibitem{PRB2} E.~Tsuchida and M.~Tsukada:

891:   Phys.~Rev.~B {\bf 54} (1996) 7602.

892:

893:   \bibitem{JPSJ} E.~Tsuchida and M.~Tsukada:

894:   J.~Phys.~Soc.~Jpn. {\bf 67} (1998) 3844,

895:   available at http:// wwwsoc.nii.ac.jp/ jps/ jpsj/.

896:

897:   \bibitem{PKFS} J.~E.~Pask, B.~M.~Klein, C.~Y.~Fong and P.~A.~Sterne:

898:   Phys.~Rev.~B {\bf 59} (1999) 12352.

899:

900:   \bibitem{RCP} W.~H.~Press, S.~A.~Teukolsky, W.~T.~Vetterling

901:   and B.~P.~Flannery: {\it Numerical Recipes in Fortran}

902:   (Cambridge University Press, Cambridge, 1992).

903:

904:   \bibitem{GLL} M.~J.~Gillan:

905:   J.~Phys.:~Condens.~Matter {\bf 1} (1989) 689.

906:

907:   \bibitem{TPA} M.~P.~Teter, M.~C.~Payne and D.~C.~Allan:

908:   Phys.~Rev.~B {\bf 40} (1989) 12255.

909:

910:   \bibitem{SCP} I.~Stich, R.~Car, M.~Parrinello and S.~Baroni:

911:   Phys.~Rev.~B {\bf 39} (1989) 4997.

912:

913:   \bibitem{BKL} D.~M.~Bylander, L.~Kleinman and S.~Lee:

914:   Phys.~Rev.~B {\bf 42} (1990) 1394.

915:

916:   \bibitem{PLY} P.~Pulay: Chem.~Phys.~Lett. {\bf 73} (1980) 393.

917:

918:   \bibitem{WOZU} D.~M.~Wood and A.~Zunger:

919:   J.~Phys.~A {\bf 18} (1985) 1343.

920:

921:   \bibitem{MACO} J.~L.~Martins and M.~L.~Cohen:

922:   Phys.~Rev.~B {\bf 37} (1988) 6134.

923:

924:   \bibitem{HLP} J.~Hutter, H.~P.~L\"uthi and M.~Parrinello:

925:   Comp.~Mat.~Sci. {\bf 2} (1994) 244.

926:

927:   \bibitem{KRFU} G.~Kresse and J.~Furthm\"uller:

928:   Comp.~Mat.~Sci. {\bf 6} (1996) 15.

929:

930:   \bibitem{SREV} T.~Schlick: Rev.~Comp.~Chem. {\bf 3} (1992) 1.

931:

932:   \bibitem{FREV} R.~Fletcher: Report NA/149, Department of

933:    Mathematics and Computer Science, University of Dundee, 1993,

934:    available at

935:    http:// citeseer.nj.nec.com/ fletcher93overview.html.

936:

937:   \bibitem{NREV} J.~Nocedal: {\it The State of the Art in

938:   Numerical Analysis}, ed. I.~S.~Duff and G.~A.~Watson

939:   (Oxford University Press, 1998), available at

940:   http:// www.ece.nwu.edu/\~{}nocedal/ recent\_{}pub.html.

941:

942:   \bibitem{HEPO} M.~Head-Gordon and J.~A.~Pople:

943:   J.~Phys.~Chem. {\bf 92} (1988) 3063.

944:

945:   \bibitem{FIAL} T.~H.~Fischer and J.~Alml\"of:

946:   J.~Phys.~Chem. {\bf 96} (1992) 9768.

947:

948:   \bibitem{HSZ} R.~A.~Hyman, M.~D.~Stiles and A.~Zangwill:

949:   Phys.~Rev.~B {\bf 62} (2000) 15521.

950:

951:   \bibitem{NOCE} D.~C.~Liu and J.~Nocedal:

952:   Math.~Prog. {\bf 45} (1989) 503.

953:

954:   \bibitem{SIEG} D.~Siegel: Report DAMTP 1992/NA12,

955:   University of Cambridge.

956:

957:   \bibitem{GILE} P.~E.~Gill and M.~W.~Leonard:

958:   Report NA 97-1, Department of Mathematics, Santa Clara University,

959:   available at http:// www.math.ucla.edu/

960:   \~{}mwl/ vita/ vita.html.

961:

962:   \bibitem{APJ1} T.~A.~Arias, M.~C.~Payne and J.~D.~Joannopoulos:

963:   Phys.~Rev.~Lett. {\bf 69} (1992) 1077.

964:

965:   \bibitem{MGC} F.~Mauri, G.~Galli and R.~Car:

966:   Phys.~Rev.~B {\bf 47} (1993) 9973.

967:

968:   \bibitem{KV} R.~D.~King-Smith and D.~Vanderbilt:

969:   Phys.~Rev.~B {\bf 49} (1994) 5828.

970:

971:   \bibitem{WM} See, {\it e.g.}, R.~M.~Wentzcovitch and J.~L.~Martins:

972:   Solid State Commun. {\bf 78} (1991) 831.

973:

974:   \bibitem{KB} L.~Kleinman and D.~M.~Bylander:

975:   Phys.~Rev.~Lett. {\bf 48} (1982) 1425.

976:

977:   \bibitem{GTH} S.~Goedecker, M.~Teter and J.~Hutter:

978:   Phys.~Rev.~B {\bf 54} (1996) 1703.

979:

980:   \bibitem{EES} E.~Tsuchida and K.~Terakura:

981:   J.~Phys.~Soc.~Jpn. {\bf 71} (2002), to appear.

982:

983:   \bibitem{LIQ} M.~P.~Allen and D.~J.~Tildesley:

984:   {\it Computer Simulation of Liquids} (Oxford Science Publications,

985:   Clarendon, Oxford, 1987).

986:

987:   \bibitem{RAT} H.~C.~Andersen: J.~Comp.~Phys. {\bf 52} (1983) 24.

988:

989:   \bibitem{JPSJ2} E.~Tsuchida and K.~Terakura:

990:   J.~Phys.~Soc.~Jpn. {\bf 70} (2001) 924.

991:

992: \end{thebibliography}

993:

994: \newpage

995:

996: \begin{table}

997: \caption{Notation for the full/reduced vectors. Note that

998: $\vvec$ and $\uvec$ denote the previous and

999: the current gradients, respectively.}

1000: \label{TAB0}

1001: \begin{tabular}{lcc}

1002: \hspace*{2cm} & \hspace{1.5cm} & \hspace{1.5cm} \\

1003: \hline

1004:          & full     & reduced \\

1005: \hline

1006: gradient         & $\gvec$    & $\vvec, \uvec$ \\

1007: search direction & $\pvec$    & $\qvec$ \\

1008: update vectors   & $\Delta \wfvec$  & $\svec$ \\

1009:                  & $\Delta \gvec$   & $\yvec$ \\

1010: wavefunction     & \wfvec     &  - \\

1011: \hline

1012: \end{tabular}

1013: \end{table}

1014:

1015:

1016: \begin{table}

1017: \caption{

1018: The performance of the conjugate gradient method and the

1019: quasi-Newton method is compared in the molecular-dynamics simulations

1020: of bulk diamond. $\NITER$ and $\NENE$ denote the number of iterations

1021: and total energy evaluations averaged over 50 ionic steps,

1022: respectively. }

1023: \label{TAB1}

1024: \begin{tabular}{rrr}

1025: \hspace*{2.0cm} & \hspace{1.5cm} & \hspace{1.5cm} \\

1026: \hline

1027: method     & $\NITER$ & $\NENE$ \\

1028: \hline

1029: Conjugate gradient  & 14.3  & 29.6 \\

1030: \hline

1031: BFGS, \,\,\, $m$ =  2 & 14.9  & 15.9 \\

1032:                     3 & 13.8  & 14.8 \\

1033:                     4 & 13.8  & 14.8 \\

1034:                     5 & 12.9  & 13.9 \\

1035:                     6 & 12.4  & 13.4 \\

1036:                     7 & 12.1  & 13.1 \\

1037:                     8 & 11.9  & 12.9 \\

1038:                     9 & 11.7  & 12.7 \\

1039:                    10 & 11.6  & 12.6 \\

1040:                    20 & 12.4  & 13.4 \\

1041: \hline

1042: \end{tabular}

1043: \end{table}

1044:

1045:

1046: \begin{table}

1047: \caption{

1048: A number of variants are compared for

1049: the BFGS with $m = 7$. (a) Reference run from Table \ref{TAB1}.

1050: (b) A line search with a parabolic fit was forced in each cycle.

1051: (c) The Hessian was discarded at the end of each ionic step.

1052: (d) $ \sigma_k = |\yvec_k|^2 / \yvec_k^T \svec_k $

1053: was used as the curvature for the new direction. }

1054: \label{TAB2}

1055: \begin{tabular}{ccc}

1056: \hspace*{1.5cm} & \hspace{2cm} & \hspace{1.5cm} \\

1057: \hline

1058: method     & $\NITER$ & $\NENE$ \\

1059: \hline

1060: (a)  & 12.1  & 13.1 \\

1061: (b)  & 12.2  & 25.4 \\

1062: (c)  & 15.1  & 16.1 \\

1063: (d)  & 14.3  & 15.4 \\

1064: \hline

1065: \end{tabular}

1066: \end{table}

1067:

1068:

1069: \begin{table}

1070: \caption{

1071: Effect of compression on the BFGS method with $m=3$ and 7. $\NBIT=64$ denotes uncompressed (double-precision) storage.

1072: }

1073: \label{TAB3}

1074: \begin{tabular}{rrrr}

1075: \hspace*{1cm} & \hspace{1.2cm} & \hspace{1.5cm} & \hspace {1.5cm} \\

1076: \hline

1077: $m$   & $\NBIT$ & $\NITER$ & $\NENE$ \\

1078: \hline

1079:  3    & 64 & 13.8  & 14.8 \\

1080:       &  8 & 14.1  & 15.1 \\

1081:       &  4 & 14.3  & 15.3 \\

1082:       &  3 & 15.4  & 16.4 \\

1083:  7    & 64 & 12.1  & 13.1 \\

1084:       &  8 & 12.2  & 13.2 \\

1085:       &  4 & 12.2  & 13.2 \\

1086:       &  3 & 13.3  & 14.3 \\

1087: \hline

1088: \end{tabular}

1089: \end{table}

1090:

1091:

1092: \begin{table}

1093: \caption{

1094: The results of selected runs from Tables \ref{TAB1}--\ref{TAB3},

1095: repeated for an isolated cytosine molecule (C$_4$H$_5$N$_3$O).}

1096: \label{TAB4}

1097: \begin{tabular}{rrrr}

1098: \hspace*{2.0cm} & \hspace{1.2cm} & \hspace{1.5cm} & \hspace {1.5cm} \\

1099: \hline

1100: method  & $\NBIT$ & $\NITER$ & $\NENE$ \\

1101: \hline

1102: Conjugate gradient  & -- & 13.1  & 27.1 \\

1103: BFGS, \,\,\, $m=3$  & 64 & 13.0  & 14.0 \\

1104:                     &  8 & 13.0  & 14.0 \\

1105:                     &  4 & 13.0  & 14.0 \\

1106:                     &  3 & 13.3  & 14.3 \\

1107: BFGS, \,\,\, $m=7$  & 64 & 10.7  & 11.7 \\

1108: \hline

1109: \end{tabular}

1110: \end{table}

1111:

1112: \begin{figure}

1113:    \caption{The distribution function $d(x)$ of the compressed search

1114:    direction $\PINT$ for $\NBIT = 8$.

1115:    }

1116:    \label{PDOS}

1117: \end{figure}

1118:

1119: \end{document}

1120:

1121: