cond-mat0111199/qn.tex
1: \documentstyle[seceq,preprint]{jpsj}
2: %\documentstyle[12pt]{article}
3: %\documentstyle[twocolumn]{jpsj}
4: 
5: \newcommand{\Rvec}{\mbox{\boldmath $R$}}
6: \newcommand{\gvec}{\mbox{\boldmath $g$}}
7: \newcommand{\xvec}{\mbox{\boldmath $x$}}
8: \newcommand{\pvec}{\mbox{\boldmath $p$}}
9: \newcommand{\rvec}{\mbox{\boldmath $r$}}
10: \newcommand{\uvec}{\mbox{\boldmath $u$}}
11: \newcommand{\vvec}{\mbox{\boldmath $v$}}
12: \newcommand{\qvec}{\mbox{\boldmath $q$}}
13: \newcommand{\svec}{\mbox{\boldmath $s$}}
14: \newcommand{\yvec}{\mbox{\boldmath $y$}}
15: \newcommand{\zvec}{\mbox{\boldmath $z$}}
16: \newcommand{\wfvec}{\mbox{\boldmath $\Psi$}}
17: \newcommand{\NB}{N_{\mbox{\tiny B}}}
18: \newcommand{\NG}{N_{\mbox{\tiny G}}}
19: \newcommand{\NENE}{N_{\mbox{\tiny E}}}
20: \newcommand{\NBIT}{N_{\mbox{\tiny bit}}}
21: \newcommand{\NITER}{N_{\mbox{\tiny iter}}}
22: \newcommand{\IMAX}{I_{\mbox{\tiny max}}}
23: \newcommand{\PINT}{\pvec_{\mbox{\tiny I}}}
24: 
25: 
26: \pagestyle{plain}
27: \setlength{\oddsidemargin}{0cm}
28: \setlength{\textwidth}{16cm}
29: \setlength{\topmargin}{0cm}
30: \setlength{\textheight}{23cm}
31: 
32: \begin{document}
33: 
34: \title{An efficient algorithm for electronic-structure calculations}
35: 
36: \author{E\hspace{0.4mm}i\hspace{0.4mm}j\hspace{0.4mm}i Tsuchida}
37: 
38: \inst{Research Institute for Computational Sciences, AIST, \\
39: Tsukuba Central 2, Umezono 1-1-1, Tsukuba, Ibaraki 305-8568, Japan}
40: 
41: \abst{
42: We show how to adapt the quasi-Newton method
43: to electronic-structure calculations
44: using systematic basis sets.
45: Our implementation requires fewer iterations than
46: the conjugate gradient method, while the computational
47: cost per iteration is much lower.
48: The memory usage is also quite modest,
49: thanks to the efficient representation of the
50: approximate Hessian.
51: }
52: 
53: \kword{density-functional theory, quasi-Newton method,
54: BFGS update, finite-element method,
55: Born-Oppenheimer dynamics}
56: 
57: \maketitle
58: 
59: 
60: \section{Introduction}
61: 
62: First-principles
63: electronic-structure calculations
64: based on density-functional theory \cite{HK,KS,CP}
65: are becoming increasingly important \cite{PAY,TREV,MARX}.
66: Since the optimization of the ground-state wavefunctions is
67: the most time-consuming part of these calculations,
68: it is crucial to use an efficient algorithm for this purpose.
69: However, the number of degrees of freedom
70: is so large for systematic basis sets like
71: plane-waves \cite{PAY,TREV,MARX},
72: finite-differences \cite{CTS,BER,RREV},
73: and finite-elements \cite{RREV,WHT,PRB2,JPSJ,PKFS}, that
74: the memory usage of the algorithm being used
75: is severely restricted.
76: Currently, the conjugate gradient method \cite{RCP,GLL,TPA,SCP,BKL,PAY}
77: seems to be the most widely used because of
78: its efficiency and modest memory usage,
79: while the direct inversion in the iterative subspace
80: (DIIS) \cite{PLY,WOZU,MACO,HLP,KRFU} is also sometimes used.
81: 
82: On the other hand, the quasi-Newton methods have rarely been used
83: for electronic optimization
84: in combination with systematic basis sets, although their efficiency
85: is well known \cite{SREV,FREV,NREV};
86: to the best of our knowledge, the application of the quasi-Newton
87: methods in this context
88: has been limited to atomic orbitals \cite{HEPO,FIAL} or
89: one-dimensional problems \cite{HSZ}.
90: This is presumably because they require significantly more storage
91: for the elements of the (approximate) Hessian matrix.
92: If an all-band update is used,
93: the dimension of the Hessian (${\cal H}$) is given by
94: ${\cal N} = \NB \NG$, where $\NB$ is the number of
95: orbitals and $\NG$ is the number of basis functions.
96: Therefore, the storage requirement for
97: ${\cal H} \, ({\cal N} \times {\cal N}) $ will be
98: ${\cal N}^2$ in a naive implementation \cite{RCP}, which is
99: prohibitive for large-scale simulations
100: where ${\cal N}$ can exceed $10^7$.
101: A more practical implementation of the quasi-Newton method is
102: also found in the literature \cite{NOCE}, in which
103: only the $m$ previous steps are relevant.
104: Since two update vectors of size $\cal N$ are required
105: per step \cite{NOCE},
106: the memory usage amounts to $2 m {\cal N}$ elements,
107: where $m$ is usually less than 10.
108: However, this can be further reduced to $ m {\cal N}$
109: if the initial Hessian
110: is a multiple of the unit matrix \cite{SIEG,GILE}. 
111: In this article, we present the implementation of
112: the quasi-Newton method using
113: the BFGS (Broyden-Fletcher-Goldfarb-Shanno) formula \cite{FREV,GILE}
114: along this line.
115: As explained in the next section,
116: we make a number of modifications to adapt the algorithm
117: to electronic-structure calculations.
118: The most important one is the compression of the
119: update vectors by an order of magnitude,
120: which makes this algorithm attractive
121: even for very large systems.
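
To put these storage requirements in perspective, consider the
following rough estimate, in which ${\cal N} = 10^7$, $m = 8$, and
8 bytes per element (double precision) are assumed purely for
illustration:
\begin{eqnarray}
{\cal N}^2 & = & 10^{14} \;\; \mbox{elements} \;\;
(8 \times 10^{14} \;\; \mbox{bytes}), \nonumber \\
2 m {\cal N} & = & 1.6 \times 10^{8} \;\; \mbox{elements} \;\;
(\sim 1.3 \times 10^{9} \;\; \mbox{bytes}), \nonumber \\
m {\cal N} & = & 8 \times 10^{7} \;\; \mbox{elements} \;\;
(6.4 \times 10^{8} \;\; \mbox{bytes}). \nonumber
\end{eqnarray}
Under these assumptions the full Hessian is out of reach by several
orders of magnitude, whereas the limited-memory variants are feasible
but still occupy as much memory as 8 to 16 copies of the wavefunctions.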
122: 
123: \section{Methods}
124: 
125: \subsection{Electronic-structure calculations}
126: We first outline the basic problem
127: of electronic-structure
128: calculations within density-functional theory \cite{HK,KS}.
129: Only real wavefunctions at the $\Gamma$-point of the Brillouin zone
130: are considered for notational simplicity,
131: but generalization to complex wavefunctions is straightforward.
132: 
133: The total energy functional for an ionic configuration $\Rvec$
134: is given by \cite{PAY}
135: \begin{eqnarray}
136: E_{\mbox{\scriptsize total}} \, [\wfvec, \Rvec] & = & 
137: \sum_i \int \psi_i (\rvec) \left[ -\nabla^2 +
138: V_{\mbox{\scriptsize ps}} [\Rvec] \right] \psi_i (\rvec) \, {\mbox d}\rvec
139: + E_{\mbox{\scriptsize Hxc}} [n (\rvec)]
140: + E_{\mbox{\scriptsize ion}} [\Rvec],
141: \end{eqnarray}
142: where 
143: \begin{equation}
144: \wfvec=(\psi_1(\rvec) \;\; \psi_2(\rvec) \;\;... \;\; \psi_{\NB}(\rvec))^T, 
145: \end{equation}
146: \begin{equation}
147: n (\rvec) = \sum_i |\psi_i (\rvec)|^2,
148: \end{equation}
149: and $E_{\mbox{\scriptsize Hxc}}$ is the sum of
150: the Hartree and exchange-correlation
151: energy, which is a nonlinear and nonlocal functional of
152: the electron density $ n (\rvec) $.
153: In practice, each $\psi_i(\rvec)$ is discretized by a
154: basis set expansion \cite{PAY,TREV,MARX,CTS,BER,WHT,RREV,PRB2,JPSJ,PKFS},
155: which makes $\wfvec$ a huge vector with ${\cal N} (=\NB \NG)$ elements. 
156: 
157: In the conventional approach \cite{PAY},
158: the ground-state energy $E_{\mbox{\tiny G}}$ and
159: wavefunctions $\wfvec_{\mbox{\tiny G}}$ for the given $\Rvec$
160: are obtained by minimization of
161: $ E_{\mbox{\scriptsize total}}[\wfvec, \Rvec] $
162: with respect to the wavefunctions $\wfvec$ 
163: under the orthonormality constraints: 
164: \begin{equation}
165: \int \psi_i (\rvec) \, \psi_j (\rvec) \, {\mbox d}\rvec = \delta_{ij}. 
166: \end{equation}
167: $\wfvec_{\mbox{\tiny G}}$ calculated in this way is then used to study
168: various properties of the system. 
169: 
170: In our implementation, on the other hand,
171: the above constraints are eliminated
172: by modifying the total energy functional
173: according to Refs. [\citen{SCP,APJ1,MGC,KV}],
174: in which orthonormality of the wavefunctions is satisfied
175: either implicitly \cite{SCP,APJ1,KV} or automatically \cite{MGC}.
176: Moreover, all the orbitals are updated
177: simultaneously \cite{KV,JPSJ}, and
178: self-consistency of $E_{\mbox{\scriptsize Hxc}}$ is taken into account
179: in the evaluation of its gradient. 
180: Then, if the modified total energy functional for the given $\Rvec$
181: is denoted by $E \, [\wfvec]$, $E_{\mbox{\tiny G}}$
182: and $\wfvec_{\mbox{\tiny G}}$ are obtained by
183: minimization of $E \, [\wfvec]$ with respect to $\wfvec$
184: without any constraints. 
185: Thanks to this reformulation, we can easily implement
186: the quasi-Newton method, which is one of the most efficient algorithms
187: for the unconstrained optimization of
188: nonlinear functions \cite{SREV,FREV,NREV}.
189: Furthermore, the use of nonorthogonal basis functions is
190: much easier in this case \cite{JPSJ,KV}.
191: The above ground-state calculations are usually performed
192: for a series of slowly varying $\Rvec$, each of which
193: is called an {\it ionic step}.
194: 
195: \subsection{BFGS with full Hessian}
196: \label{FBFGS}
197: We illustrate the conventional 
198: quasi-Newton method using the BFGS formula \cite{FREV,GILE} here,
199: which will serve as a prototype
200: for the implementation in reduced space.
201: For simplicity, we assume the new total energy
202: (eq. (\ref{ENEW})) is always lower than the previous value. \\
203: \ \\
204: Choose ${\cal H}_0$ and $\wfvec_0$. \\
205: Calculate $ E_0 = E \, [\wfvec_0]$ and
206: $\gvec_0 = \nabla E \, [\wfvec_0]$. \\
207: Set $k=0$. \\
208: Do while ($|\gvec_k| \ge \epsilon$)
209: \begin{equation}
210: \pvec_k = -{\cal H}_k^{-1} \gvec_k
211: \end{equation}
212: \begin{equation}
213: \wfvec_{k+1} = \wfvec_k + \pvec_k
214: \end{equation}
215: \begin{equation}
216: \label{ENEW}
217: E_{k+1} = E \, [\wfvec_{k+1}]
218: \end{equation}
219: \begin{equation}
220: \gvec_{k+1} = \nabla E \, [\wfvec_{k+1}]
221: \end{equation}
222: \begin{equation}
223: \Delta \wfvec_k = \wfvec_{k+1} - \wfvec_k
224: \end{equation}
225: \begin{equation}
226: \Delta \gvec_k = \gvec_{k+1} - \gvec_k
227: \end{equation}
228: \begin{equation}
229: \label{FBFGSEQ}
230: {\cal H}_{k+1} = {\cal H}_k -
231: \frac{{\cal H}_k \Delta \wfvec_k \Delta \wfvec_k^T {\cal H}_k}
232: {\Delta \wfvec_k^T {\cal H}_k \Delta \wfvec_k}
233: +\frac{\Delta \gvec_k \Delta \gvec_k^T}
234: {\Delta \wfvec_k^T \Delta \gvec_k}
235: \end{equation}
236: \begin{equation}
237: k=k+1
238: \end{equation}
239: End do \\
240: 
241: While this algorithm is simple and efficient in terms of the
242: convergence rate, its memory usage and computational effort
243: scale as $O({\cal N}^2)$ and $O({\cal N}^3)$ respectively,
244: which are prohibitive.
245: Although the latter can be reduced to $O({\cal N}^2)$, if the
246: updating formula for the inverse Hessian (${\cal H}^{-1}$)
247: is used \cite{RCP}, this is still far from practical.
248: The purpose of this article is to present
249: the improved algorithm \cite{SIEG,GILE}
250: in which both scale as $O({\cal N})$ with modest prefactors.
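
As a consistency check on eq. (\ref{FBFGSEQ}), one may verify that
the updated Hessian reproduces the observed change of the gradient
(the secant condition). Multiplying eq. (\ref{FBFGSEQ}) by
$\Delta \wfvec_k$ from the right gives
\begin{equation}
{\cal H}_{k+1} \Delta \wfvec_k =
{\cal H}_k \Delta \wfvec_k
- {\cal H}_k \Delta \wfvec_k \,
\frac{\Delta \wfvec_k^T {\cal H}_k \Delta \wfvec_k}
{\Delta \wfvec_k^T {\cal H}_k \Delta \wfvec_k}
+ \Delta \gvec_k \,
\frac{\Delta \gvec_k^T \Delta \wfvec_k}
{\Delta \wfvec_k^T \Delta \gvec_k}
= \Delta \gvec_k.
\end{equation}
For reference, the inverse update mentioned above takes the
following standard form in the general BFGS literature, where the
shorthand $\gamma_k = 1 / (\Delta \wfvec_k^T \Delta \gvec_k)$ is
introduced here only for brevity:
\begin{equation}
{\cal H}_{k+1}^{-1} =
(I - \gamma_k \Delta \wfvec_k \Delta \gvec_k^T) \,
{\cal H}_k^{-1} \,
(I - \gamma_k \Delta \gvec_k \Delta \wfvec_k^T)
+ \gamma_k \Delta \wfvec_k \Delta \wfvec_k^T.
\end{equation}
Since this expression involves only matrix-vector and outer products,
its cost is $O({\cal N}^2)$ per iteration.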
251: 
252: \subsection {QR-decomposition}
253: \label{QRD}
254: At this point, we give a brief introduction to the
255: QR-decomposition \cite{RCP},
256: which plays an important role in the algorithm
257: presented in the next section.
258: Let us assume $B ({\cal N} \times r)$ is a set of
259: linearly independent vectors:
260: \begin{equation}
261: B = (\pvec_1 \;\; \pvec_2 \;\; \cdots \;\; \pvec_r), 
262: \end{equation}
263: where $1 \le r \ll {\cal N}$.
264: Then the QR-decomposition of $B$ is given by 
265: \begin{equation}
266: B = Z \,\, T, 
267: \end{equation}
268: where
269: $Z ({\cal N} \times r)$ is a set of orthonormal vectors spanning the
270: same subspace as $B$, i.e. 
271: \begin{equation}
272: Z^T Z = I, 
273: \end{equation}
274: and $T (r \times r)$ is an invertible upper-triangular matrix.
275: In practice, this decomposition is obtained by applying the
276: addition procedure given below repeatedly, which is
277: (mathematically) equivalent to constructing an orthonormal basis
278: from the left ($\pvec_1$) to the right ($\pvec_r$)
279: by the Gram-Schmidt scheme.
280: Note, however, that only $B$ and $T$ are considered explicitly
281: in the following \cite{SIEG,GILE}.
282: 
283: Here we show how to update the above QR-decomposition
284: when $B$ is slightly modified.
285: In the first case where a vector $\gvec$ is added to $B$, i.e.
286: \begin{equation}
287: B_+ = (\pvec_1 \;\; \pvec_2 \;\; \cdots \;\; \pvec_r \;\; \gvec)
288: = (B \;\; \gvec), 
289: \end{equation}
290: the new decomposition is given by 
291: \begin{equation}
292: B_+ = Z_+ \, T_+, 
293: \end{equation}
294: where 
295: \begin{equation}
296: T_+ ((r+1) \times (r+1)) = \left(
297: \begin{array}{cc}
298: T &  \uvec \\
299: 0 &  \rho \\
300: \end{array}
301: \right) ,
302: \end{equation}
303: \begin{equation}
304: \uvec = Z^T \gvec = (T^T)^{-1} ( B^T \gvec ), 
305: \end{equation}
306: and 
307: \begin{equation}
308: \rho = \sqrt{|\gvec|^2 - |\uvec|^2}.
309: \end{equation}
310: If $\rho \ne 0$, $T_+$ is also an invertible upper-triangular
311: matrix.
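
These expressions follow from splitting $\gvec$ into its component
within the span of $Z$ and a remainder orthogonal to it,
\begin{equation}
\gvec = Z \uvec + \rho \, \zvec, \;\;\;
Z^T \zvec = 0, \;\;\; |\zvec| = 1,
\end{equation}
so that $|\gvec|^2 = |\uvec|^2 + \rho^2$ and $Z_+ = (Z \;\; \zvec)$.
As a small numerical illustration (the values are arbitrary, and
${\cal N}=3$, $r=1$ are chosen only for readability), let
$\pvec_1 = (3 \;\; 4 \;\; 0)^T$ and $\gvec = (1 \;\; 2 \;\; 2)^T$.
Then $T = (5)$, $\uvec = (11/5) = (2.2)$,
$\rho = \sqrt{9 - 4.84} = \sqrt{4.16} \approx 2.04$, and
\begin{equation}
T_+^T \, T_+ =
\left(
\begin{array}{cc}
5 & 0 \\
2.2 & \rho \\
\end{array}
\right)
\left(
\begin{array}{cc}
5 & 2.2 \\
0 & \rho \\
\end{array}
\right)
=
\left(
\begin{array}{cc}
25 & 11 \\
11 & 9 \\
\end{array}
\right)
= B_+^T \, B_+,
\end{equation}
as required.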
312: 
313: Next, we consider the case of
314: dropping the leftmost vector $\pvec_1$ from $B$, i.e.
315: \begin{equation}
316: B_- = (\pvec_2 \;\; \pvec_3 \;\; \cdots \;\; \pvec_r).
317: \end{equation}
318: The corresponding decomposition is given by 
319: \begin{equation}
320: B_- = Z_- T_-,
321: \end{equation}
322: where $T_-$ satisfies
323: \begin{equation}
324: \label{QRM}
325: T^T_- \, T_- = B^T_- \, B_-.
326: \end{equation}
327: Obviously, the right-hand side of eq. (\ref{QRM}) is
328: included in $ B^T B$, which is easily calculated from  
329: \begin{equation}
330: B^T B = T^T T.
331: \end{equation}
332: Therefore, $T_-$ is obtained by the Cholesky decomposition \cite{RCP}
333: of a small matrix at negligible cost.
334: A more refined approach is introduced
335: in Ref. \citen{GILE}, but
336: the above procedure seems to be sufficient for our present purpose.
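
More explicitly, since the $(i,j)$ element of $B^T B$ is
$\pvec_i^T \pvec_j$, the matrix $B_-^T B_-$ is obtained by deleting
the first row and the first column of $T^T T$.
For $r=2$, for example, with generic elements $a$, $b$, and $c$
(symbols introduced here for illustration only),
\begin{equation}
T = \left(
\begin{array}{cc}
a & b \\
0 & c \\
\end{array}
\right), \;\;\;
T^T T = \left(
\begin{array}{cc}
a^2 & ab \\
ab & b^2 + c^2 \\
\end{array}
\right), \;\;\;
T_- = \left( \sqrt{b^2 + c^2} \right) = (|\pvec_2|),
\end{equation}
which is indeed the QR-decomposition of the single remaining vector.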
337: 
338: 
339: \subsection{BFGS with reduced Hessian and limited memory}
340: \label{ALG}
341: Here we present the state-of-the-art implementation of the
342: quasi-Newton method \cite{SIEG,GILE}, which is obtained by
343: modifying the conventional algorithm ($\S$ \ref{FBFGS})
344: under two assumptions: (i) ${\cal H}_0 = \sigma I$ ($\sigma > 0$), and
345: (ii) At most $m$ previous steps are stored.
346: 
347: In order to fully exploit these conditions, it is more
348: convenient to use a compact representation for the
349: Hessian: 
350: \begin{equation}
351: H = Z^T \, {\cal H} \, Z,
352: \end{equation}
353: where $Z \, ({\cal N} \times r)$ is the current (orthonormal) basis,
354: $H \, (r \times r)$ is the reduced Hessian,
355: and $1 \le r \le m+1 \ll {\cal N}$.
356: While $Z$ and ${\cal H}$ also appear in the following algorithm,
357: they are not explicitly calculated.
358: The reduced vectors are defined in a similar way;
359: the reduced gradient $\uvec$, for instance, is given by
360: $\uvec = Z^T \gvec$, where $\gvec = \nabla E$. 
361: The correspondence between the full/reduced vectors
362: is shown in Table \ref{TAB0}. 
363: 
364: \begin{enumerate}
365: \item \label{QNINI} Initialization: \\
366: Set $k=0$ and $r=1$, where $k$ and $r$ denote the loop index
367: and the rank of the reduced space, respectively. \\
368: Choose the initial wavefunction ($\wfvec_0$),
369: the approximate curvature ($\sigma$),
370: the convergence criterion ($\epsilon$), and
371: the maximum rank of the reduced space ($m$). \\
372: Calculate the total energy
373: \begin{equation}
374: E_0=E \, [\wfvec_0]
375: \end{equation}
376: and its gradient 
377: \begin{equation}
378: \gvec_0=\nabla E \, [\wfvec_0].
379: \end{equation}
380: {\sf IF} ($|\gvec_0| < \epsilon$) {\sf THEN} quit. \\
381: {\sf ELSE} $ H_0 = (\sigma), \; B_0=(\gvec_0), \;
382: T_0=(|\gvec_0|)$, and $ \vvec_0=(|\gvec_0|) $. Moreover,
383: $Z_0 = (\gvec_0/|\gvec_0|)$ and ${\cal H}_0 = \sigma I$
384: are implicitly assumed. \\
385: If the Hessian of the previous ionic step is
386: taken over, several modifications are
387: required in this step, which are, however, straightforward.
388: 
389: \item \label{QNLOOP} Calculate the new search direction in reduced space:
390: \begin{equation}
391: \label{QKEQ}
392: \qvec_k = -H_k^{-1} \vvec_k.
393: \end{equation}
394: \item \label{PFUL} Calculate the new search direction:
395: \begin{equation}
396: \label{FULLP}
397: \pvec_k = Z_k \qvec_k = B_k (T_k^{-1} \qvec_k).
398: \end{equation}
399: \item \label{UPB1} Update the subspace: \
400: $ (B_k = Z_k T_k \rightarrow B_k' = Z_k T_k') $,
401: where
402: \begin{equation}
403: B_k ({\cal N} \times r) =
404: (\pvec_{k-r+1} \;\; \cdots \;\; \pvec_{k-1} \;\; \gvec_k)
405: \end{equation}
406: and
407: \begin{equation}
408: B'_k ({\cal N} \times r) =
409: (\pvec_{k-r+1} \;\; \cdots \;\; \pvec_{k-1} \;\; \pvec_k).
410: \end{equation}
411: $T_k'$ is obtained from $T_k$ and $\qvec_k$,
412: whereas $Z_k$ remains unchanged \cite{GILE}.
413: \item Set $\alpha = 1$ and calculate the gradient of the total energy
414: along $\pvec_k$ as
415: \begin{equation}
416: \label{EDEQ}
417: E' = \left. 
418: \frac{\partial E \, [\wfvec_k + \alpha \pvec_k] }{\partial \alpha}
419: \right|_{\alpha=0}
420: = \gvec_k^T \pvec_k = \vvec_k^T \qvec_k.
421: \end{equation}
422: \item \label{QNLINMIN} Calculate the new wavefunction: 
423: \begin{equation}
424: \wfvec_{k+1} = \wfvec_k + \alpha \pvec_k.
425: \end{equation}
426: \item \label{QNLM2} Calculate the new total energy: 
427: \begin{equation}
428: E_{k+1} = E \, [\wfvec_{k+1}].
429: \end{equation}
430: {\sf IF} $(E_{k+1} \geq E_k)$ {\sf THEN}
431: estimate the optimal $\alpha$ by a parabolic fit
432: with $E_k$, $E'$, and $E_{k+1}$, and go to \ref{QNLINMIN}.
433: \item \label{GFUL} Calculate the new gradient:
434: \begin{equation}
435: \gvec_{k+1}=\nabla E \, [\wfvec_{k+1}].
436: \end{equation}
437: {\sf IF} ($|\gvec_{k+1}| < \epsilon$) {\sf THEN} quit.
438: 
439: \item \label{RGITEM} Extend the subspace: \
440: $( B_k'=Z_k T_k' \rightarrow B_k''= Z_k' T_k'') $,
441: where 
442: \begin{equation}
443: B'_k({\cal N} \times r) =
444: (\pvec_{k-r+1} \;\; \cdots \;\; \pvec_k)
445: \end{equation}
446: and
447: \begin{equation}
448: B_k'' ({\cal N} \times (r+1)) =
449: (\pvec_{k-r+1} \;\; \cdots \;\; \pvec_{k} \;\; \gvec_{k+1}).
450: \end{equation}
451: As explained in $\S$ \ref{QRD},
452: $T_k''$ is obtained from $T_k'$, $\uvec_k$, and $\rho_{k+1}$, where
453: \begin{equation}
454: \label{REDG}
455: \uvec_{k} = Z_k^T \gvec_{k+1}=(T_{k}^{'T})^{-1} (B_{k}^{'T} \gvec_{k+1})
456: \end{equation}
457: and 
458: \begin{equation}
459: \rho_{k+1} = \sqrt{|\gvec_{k+1}|^2 - |\uvec_k|^2}.
460: \end{equation}
461: We assume $\rho_{k+1} \ne 0$ in the following. 
462: Then, the new basis $Z_k' ({\cal N} \times (r+1))$
463: is given by \cite{GILE}
464: \begin{equation}
465: Z_k' = ( Z_k \;\;\; \zvec_{k+1}), 
466: \end{equation}
467: where $\zvec_{k+1} = (\gvec_{k+1} - Z_k \uvec_k) / \rho_{k+1}$.
468: However, $\zvec_{k+1}$ is not explicitly calculated. 
469: 
470: \item $ r=r+1 $
471: \item Calculate the reduced gradients as 
472: \begin{equation}
473: \vvec_k' = Z_k^{'T} \gvec_k = \left(
474: \begin{array}{c}
475: \vvec_k \\
476: 0 \\
477: \end{array} \right)
478: \end{equation}
479: and 
480: \begin{equation}
481: \uvec_k' = Z_k^{'T} \gvec_{k+1} = \left(
482: \begin{array}{c}
483: \uvec_k \\
484: \rho_{k+1}
485: \end{array}
486: \right). 
487: \end{equation}
488: There is no loss of information here, since both
489: $\gvec_k$ and $\gvec_{k+1}$ lie in the subspace spanned by the columns of $Z_k'$. 
490: 
491: \item \label{QNSY}
492: Update the reduced Hessian using the BFGS formula \cite{FREV,GILE}:
493: \begin{equation}
494: \label{RBFGSEQ}
495: H''_k (r \times r)
496: = Z_k^{'T} {\cal H}_{k}^{+} Z'_k
497: = H'_k
498: - \frac{H'_k \svec_k \svec_k^T H'_k}{\svec_k^T H'_k \svec_k}
499: + \frac{\yvec_k \yvec_k^T}{\svec_k^T \yvec_k},
500: \end{equation}
501: where 
502: \begin{equation}
503: \svec_k = Z_{k}^{'T} \Delta \wfvec_k =
504: \alpha \left(
505: \begin{array}{c}
506: \qvec_k \\
507: 0 \\
508: \end{array}
509: \right),
510: \end{equation}
511: \begin{equation}
512: \yvec_k = Z_{k}^{'T} \Delta \gvec_k = \uvec_k' - \vvec_k',
513: \end{equation}
514: and
515: \begin{equation}
516: \label{SIG2}
517: H'_{k}(r \times r) = Z_k^{'T} {\cal H}_{k} Z'_k = 
518: \left(
519: \begin{array}{cc}
520: H_{k}  & 0 \\
521: 0    & \sigma \\
522: \end{array}
523: \right).
524: \end{equation}
525: ${\cal H}_k^+$ is defined as the right-hand side of
526: eq. (\ref{FBFGSEQ}), and eq. (\ref{RBFGSEQ}) is derived from
527: this definition.
528: Note that $\svec_k^T \yvec_k$ $ > 0$ is assumed here; otherwise,
529: the Hessian is not updated.
530: 
531: \item {\sf IF} $(r = m+1)$ {\sf THEN}
532: reduce the subspace: \ $(B_k'' = Z_k' T_k''
533: \rightarrow B_{k+1} = Z_{k+1} T_{k+1})$, where
534: \begin{equation}
535: B_k'' ({\cal N} \times (m+1)) =
536: (\pvec_{k-m+1} \;\; \pvec_{k-m+2} \;\;
537: \cdots \;\; \pvec_{k} \;\; \gvec_{k+1})
538: \end{equation}
539: and
540: \begin{equation}
541: B_{k+1} ({\cal N} \times m) =
542: (\pvec_{k-m+2} \;\; \cdots \;\; \pvec_{k} \;\; \gvec_{k+1}).
543: \end{equation}
544: $T_{k+1}$ is easily obtained from $T_k''$ according to $\S$ \ref{QRD}. 
545: Then, $H_{k+1} (m \times m) = Z_{k+1}^T {\cal H}_k^+ Z_{k+1}$ 
546: is calculated from $T_k'', T_{k+1}$, and $H_k''$
547: by way of $B_{k+1}^T {\cal H}_k^+ B_{k+1}$.
548: At this point, the new Hessian (${\cal H}_{k+1}$)
549: in the new basis ($Z_{k+1}$ and its orthogonal complement)
550: is defined as a block-diagonal matrix consisting of $H_{k+1}$
551: and $\sigma I (({\cal N}-m) \times ({\cal N}-m))$,
552: which was implicitly used in eq. (\ref{SIG2}).
553: Therefore, part of the information contained in ${\cal H}_k^+$
554: has been discarded here.
555: Similarly, $\vvec_{k+1} = Z_{k+1}^T \gvec_{k+1}$ is calculated
556: from $T_k'', T_{k+1}$, and $\uvec_k'$
557: by way of $B_{k+1}^T \gvec_{k+1}$, but there is no loss of
558: information here.
559: Finally, we set $r=m$. \\
560: {\sf ELSE}
561: $ H_{k+1}=H_k'', B_{k+1}=B_k'', T_{k+1}=T_k''$,
562: and $\vvec_{k+1}= \uvec_k' $.
563: Moreover, $Z_{k+1} = Z_k'$ and ${\cal H}_{k+1} = {\cal H}_k^+$
564: are implicitly assumed.
565: \item $k=k+1$
566: \item Go to \ref{QNLOOP}.
567: \end{enumerate}
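
Two of the steps above can be written out using only the relation
$B = Z \, T$. The following is a sketch derived from the definitions
given so far, and the actual implementation may differ in detail.
In step \ref{UPB1}, $\pvec_k = Z_k \qvec_k$ implies that the last
column of $T_k'$ is $Z_k^T \pvec_k = \qvec_k$, while the other
columns are unchanged. Denoting the columns of $T_k$ by
$\mbox{\boldmath $t$}_i$ (notation introduced here), we have
\begin{equation}
T_k = (\mbox{\boldmath $t$}_1 \;\; \cdots \;\;
\mbox{\boldmath $t$}_{r-1} \;\; \mbox{\boldmath $t$}_r)
\;\; \rightarrow \;\;
T_k' = (\mbox{\boldmath $t$}_1 \;\; \cdots \;\;
\mbox{\boldmath $t$}_{r-1} \;\; \qvec_k),
\end{equation}
which remains upper-triangular.
In the reduction of the subspace, $B_k'' = Z_k' T_k''$ gives
\begin{equation}
B_k^{''T} {\cal H}_k^+ B_k'' = T_k^{''T} H_k'' \, T_k'', \;\;\;
B_k^{''T} \gvec_{k+1} = T_k^{''T} \uvec_k'.
\end{equation}
Since $B_{k+1}$ is $B_k''$ without its first column,
$B_{k+1}^T {\cal H}_k^+ B_{k+1}$ and $B_{k+1}^T \gvec_{k+1}$
are obtained by deleting the first row and column of the former and
the first element of the latter. Then $Z_{k+1} = B_{k+1} T_{k+1}^{-1}$
leads to
\begin{equation}
H_{k+1} = (T_{k+1}^T)^{-1}
(B_{k+1}^T {\cal H}_k^+ B_{k+1}) \, T_{k+1}^{-1}, \;\;\;
\vvec_{k+1} = (T_{k+1}^T)^{-1} (B_{k+1}^T \gvec_{k+1}).
\end{equation}
All of these operations involve only small matrices of
dimension $m+1$ or less.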
568: 
569: \begin{itemize}
570: \item While $0 \le k \le m-1$,
571: this algorithm is identical to the conventional one
572: ($\S$ \ref{FBFGS}) with ${\cal H}_0 = \sigma I $
573: within round-off errors.
574: The two algorithms begin to differ once $k$ reaches $m$,
575: but the deterioration of the convergence rate is minimized
576: by constructing the subspace with
577: the previous search directions rather than the
578: gradients \cite{SIEG,GILE}.
579: 
580: \item For simplicity,
581: the above algorithm includes minimal exception handling.
582: Therefore, the original paper \cite{GILE} should be
583: consulted for a more complete treatment. 
584: However, such exceptions are observed
585: only in the very early stages of the first
586: ionic step, where the quadratic model is not valid.
587: 
588: \item One cycle requires approximately
589: $ 2 r {\cal N}$ multiply-and-add operations,
590: arising from eqs. (\ref{FULLP}) and (\ref{REDG}).
591: For practical values of $m \, (< 10$), these costs will be
592: much lower than those of evaluating
593: the total energy in step \ref{QNLM2} \cite{PAY}.
594: 
595: \item The basis functions should be appropriately
596: scaled \cite{TPA,HEPO} in advance,
597: so that their contribution to the total energy is similar.
598: 
599: \item The reduced Hessian $H_k$ is diagonalized in each cycle to
600: guarantee its positive definiteness; nonpositive eigenvalues,
601: if any, are modified appropriately.
602: Then, it follows from eqs. (\ref{QKEQ}) and (\ref{EDEQ}) that
603: $ E' = - \vvec_k^T H_k^{-1} \vvec_k < 0$,
604: because $H_k^{-1}$ is also positive definite and
605: $|\vvec_k| = |\gvec_k| \ge \epsilon$.
606: Furthermore,
607: the average eigenvalue of the reduced Hessian,
608: denoted by $\lambda_k$, is also calculated and stored for later use.
609: 
610: \item We explain the choice of $\sigma$
611: used in steps \ref{QNINI} and \ref{QNSY} here.
612: Since $\sigma$ is the approximate curvature
613: along the new direction \cite{GILE}, a reasonable
614: estimate is needed to achieve high performance. 
615: Therefore, a number of strategies have been proposed 
616: to choose optimal $\sigma$ \cite{NOCE,SIEG,GILE},
617: most of which provide dynamical estimates.
618: Nevertheless, we use a constant $\sigma$ during each ionic step
619: unless otherwise noted, which is determined as follows:
620: In the first ionic step, $\sigma$ is estimated from the
621: coarse grid iterations \cite{JPSJ}.
622: At the end of each ionic step, the sequence $\{ \lambda_k \}$
623: is further averaged
624: to give the new $\sigma$ for the next ionic step.
625: $\sigma$ obtained in this way varies only slowly with ionic steps,
626: while providing stable and high performance
627: in the systems we have studied so far.
628: Comparison is also made with the dynamical estimates
629: in $\S$ \ref{RESSEC}.
630: \end{itemize}
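
The parabolic fit invoked in step \ref{QNLM2} can be written out as
follows. This is the standard construction, given here only as a
sketch; $\alpha_t$ denotes the trial step that produced $E_{k+1}$
and $c$ the fitted curvature (both symbols are introduced here).
Modeling the energy along $\pvec_k$ as
$E(\alpha) \approx E_k + E' \alpha + c \, \alpha^2$
and requiring $E(\alpha_t) = E_{k+1}$, we obtain
\begin{equation}
c = \frac{E_{k+1} - E_k - E' \alpha_t}{\alpha_t^2}, \;\;\;
\alpha_{\mbox{\tiny opt}} = - \frac{E'}{2c}
= \frac{- E' \alpha_t^2}{2 \, (E_{k+1} - E_k - E' \alpha_t)}.
\end{equation}
Since $E' < 0$ and the fit is used only when $E_{k+1} \ge E_k$,
it follows that $c > 0$ and
$0 < \alpha_{\mbox{\tiny opt}} \le \alpha_t / 2$,
i.e. each fit at least halves the step length.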
631: 
632: \subsection{Data compression}
633: \label{ZIP}
634: The memory usage of the algorithm illustrated in the
635: previous section is dominated by the $m$ previous search directions,
636: which amount to $m {\cal N}$ elements.
637: While this is much smaller than
638: the storage of the full Hessian $(={\cal N}^2)$, it is still
639: a serious obstacle in large-scale simulations.
640: In what follows, we present a simple algorithm to compress
641: the previous search directions
642: without sacrificing the efficiency of the original method.
643: In this algorithm, one search direction is compressed
644: in each cycle, by taking advantage of its structure.
645: If $ \pvec \, ({\cal N}) $, which is being compressed,
646: is viewed as a two-dimensional array $ \pvec \, (\NB, \NG) $,
647: the size of $\pvec \, (i, j)$
648: for a given basis function ($j$) is
649: expected to be similar for all orbitals ($i$).
650: Based on this idea, the largest element of $ |\pvec (i, j)|$
651: with respect to $i$ is chosen as the scale factor. 
652: Moreover, $\NBIT$ is defined as the number of bits
653: assigned to each element of $\pvec$ after compression.  
654: 
655: Then, the scale factor $\omega \, (\NG)$
656: and the compressed array $\PINT \, (\NB, \NG)$ are given by
657: \begin{equation}
658: \label{SCLF}
659: \mbox{real$\ast$8} \;\;\;\;\; \omega \, (j) =
660: \left( \max_{1 \le i \le \NB} |\pvec \, (i,j)| \right) / \IMAX
661: \end{equation}
662: and 
663: \begin{equation}
664: \mbox{integer}  \;\;\;\;\; \PINT \, (i,j)=
665: \mbox{round} \left( \frac{\pvec \, (i,j)}{\omega \, (j)} \right)
666: + \IMAX
667: \end{equation}
668: respectively,
669: where $ \IMAX = 2^{\NBIT-1} -1 $
670: and $ 0 \le \PINT \, (i,j) \le 2 \IMAX = 2^{\NBIT} -2 $.
671: Therefore, each element of $\PINT$ is representable by
672: $\NBIT$ bits. 
673: The original values of $\pvec$ are recovered approximately by 
674: \begin{equation}
675: \pvec \, (i,j) \approx \omega (j) \, (\PINT (i,j) - \IMAX).
676: \end{equation}
677: 
678: In this method, the quality of the compression can
679: be controlled by a single parameter, $\NBIT$.
680: Furthermore, the largest element for each $j$,
681: which is the most important one, remains exact. 
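
Since rounding changes $\pvec \, (i,j) / \omega \, (j)$ by at most
$1/2$, the error of each recovered element is bounded by
\begin{equation}
\left| \pvec \, (i,j) - \omega (j) \, (\PINT (i,j) - \IMAX) \right|
\le \omega (j) / 2,
\end{equation}
provided that $\omega \, (j) \ne 0$.
As a worked example with arbitrary illustrative values, take
$\NBIT = 8$ ($\IMAX = 127$), $\NB = 3$, and
$\pvec \, (1,j) = 0.5$, $\pvec \, (2,j) = -0.2$,
$\pvec \, (3,j) = 0.013$ for some $j$. Then
\begin{eqnarray}
\omega \, (j) & = & 0.5 / 127 \approx 3.937 \times 10^{-3},
\nonumber \\
\PINT \, (1,j) & = & \mbox{round} \, (127) + 127 = 254,
\nonumber \\
\PINT \, (2,j) & = & \mbox{round} \, (-50.8) + 127 = 76,
\nonumber \\
\PINT \, (3,j) & = & \mbox{round} \, (3.302) + 127 = 130,
\nonumber
\end{eqnarray}
and the recovered values are $0.5$, $-0.2008$, and $0.0118$.
The errors of the last two elements, about $7.9 \times 10^{-4}$ and
$1.2 \times 10^{-3}$, are both below
$\omega \, (j) / 2 \approx 1.97 \times 10^{-3}$.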
682: 
683: The total storage for the $m$ search directions 
684: after compression is $ m \NBIT {\cal N} / 8 $ bytes,
685: if appropriately packed with bit operations.
686: If $m=\NBIT=8$, for instance, this amounts to only
687: one double-precision array of size $\cal N$.
688: Note also that the storage for the scale factors is minor.
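
These statements can be quantified as follows, assuming that the
scale factors are stored in double precision as in
eq. (\ref{SCLF}). Relative to double-precision storage, the
compression ratio of the search directions is
\begin{equation}
\frac{8 \, m {\cal N}}{m \NBIT {\cal N} / 8} = \frac{64}{\NBIT},
\end{equation}
i.e. 8 for $\NBIT = 8$ and 16 for $\NBIT = 4$.
The scale factors occupy $8 \, m \NG$ bytes, so their share
relative to the compressed data is
\begin{equation}
\frac{8 \, m \NG}{m \NBIT \NB \NG / 8} = \frac{64}{\NBIT \NB},
\end{equation}
which amounts to $1/16$ for the illustrative values
$\NBIT = 8$ and $\NB = 128$, and decreases further as $\NB$ grows.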
689: 
690: In the current implementation,
691: $\pvec_k$ is compressed in step \ref{UPB1}, when added to $B_k'$. 
692: At the same time, the last column of $T_k'$ is
693: calculated directly from the compressed $\pvec_k$
694: (rather than using $\qvec_k$)
695: to maintain the consistency of the QR-decomposition.
696: However, the uncompressed $\pvec_k$ is also retained and used
697: in step \ref{QNLINMIN}.
698: 
699: Unfortunately, some inconsistency seems inevitable
700: in the update of the reduced Hessian, since
701: $\Delta \wfvec_k$ and $\Delta \gvec_k$, from which $\svec_k$ and $\yvec_k$ are derived, no longer lie in the subspace spanned by $Z_k'$.
702: Nevertheless,
703: $E' < 0$ remains valid as long as the reduced Hessian is positive
704: definite and the latest $\gvec$ and $\pvec$ are uncompressed.
705: Therefore, the stability of the minimization is guaranteed
706: even if the previous search directions are highly compressed.
707: 
708: \section{Results}
709: \label{RESSEC}
710: As a test of our implementation under realistic conditions,
711: we performed a series of Born-Oppenheimer dynamics \cite{WM} 
712: for bulk diamond at 220 K in a periodic cubic supercell of
713: 64 atoms within the local density approximation \cite{HK,KS}.
714: The wavefunctions were expanded by the adaptive finite-element
715: method \cite{PRB2,JPSJ} with an average cutoff energy of 43 Ry,
716: which corresponds to $ \NG = 8 \times 14^3 = 21,952$.
717: Since $\NB$ is equal to 128,
718: ${\cal N}$ amounts to approximately 2,800,000 in this system.
719: The Brillouin zone was sampled only at the $\Gamma$-point,
720: and the separable pseudopotentials were used \cite{KB,GTH}.
721: The convergence criterion ($\epsilon$) was chosen so that
722: $|E_{k+1} - E_k| \simeq 2 \times 10^{-8}$ Ry/atom
723: when $|\gvec_{k+1}| < \epsilon$ was satisfied. 
724: Convergence to the ground state was accelerated by the
725: enhanced extrapolation scheme \cite{EES},
726: which provides accurate
727: initial wavefunctions with the help of population analysis.
728: 
729: The equations of motion for the ions were integrated using
730: the velocity-Verlet method \cite{LIQ}
731: with a timestep of 80 a.u. ($\sim 2$ fs). 
732: Starting from the same ionic configuration,
733: each run lasted for 57 ionic steps,
734: the last 50 steps of which were used to collect the statistics.
735: Moreover, $B,T$, and $H$ were taken over from
736: previous ionic steps unless otherwise noted.
737: Therefore, these matrices were saturated (i.e., $r=m$) during this period in all runs.
738: 
739: We first show the average number of iterations
740: ($\NITER$) and total energy evaluations ($\NENE$)
741: needed to optimize the electronic structure
742: for the conjugate gradient method
743: using the Polak-Ribiere formula \cite{RCP} and
744: the quasi-Newton method using the BFGS formula
745: in Table \ref{TAB1}.
746: The convergence rate of the quasi-Newton method
747: as measured by $\NITER$ is already comparable to that of
748: the conjugate gradient method for $m = 2$,
749: and becomes better as $m$ is increased.
750: However, there is no point in using $m$ much larger
751: than $\NITER$ (say, 20), because the Hessian
752: is dominated by the contribution from previous ionic steps.
753: In practice,
754: any reasonable choice of $m$, e.g. 5-8, will provide near-optimal
755: performance, since $\NITER$ depends
756: only weakly on $m$ in this range.
757: Note also that the CPU-time is more closely related to
758: $\NENE$ than $\NITER$.
759: Therefore, the quasi-Newton method was much faster
760: than the conjugate gradient method for all $m$ we tried.
761: Specifically, $\NENE = 2 \NITER+1$ in the conjugate gradient method,
762: because at least one line search was forced
763: to maintain the conjugacy of the search directions.
764: In contrast,
765: $ \NENE = \NITER+1$ in the quasi-Newton method, which means
766: that no line search was required in step \ref{QNLM2}.
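
The consequence of these two relations can be made explicit.
If both methods required the same number of iterations, the ratio
of the energy evaluations would be
\begin{equation}
\frac{2 \NITER + 1}{\NITER + 1},
\end{equation}
which approaches 2 for large $\NITER$. For a hypothetical
$\NITER = 10$ (a value chosen for illustration, not taken from
Table \ref{TAB1}), the conjugate gradient method would need
21 evaluations and the quasi-Newton method 11, a ratio of about 1.9.
Any reduction of $\NITER$ in the quasi-Newton method
increases this advantage further.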
767: 
768: The algorithm presented in $\S$ \ref{ALG} has a number of
769: options which are not uniquely determined.
770: Therefore, we examine some of them here,
771: as shown in Table \ref{TAB2}.
772: (a) is the reference run performed with $m=7$,
773: taken from Table \ref{TAB1}.
774: (b)-(d) were performed under the same conditions as (a)
775: except for the following points: 
776: (b) A line search with a parabolic fit was forced in each cycle
777: to see if the convergence rate is improved.
778: However, $\NENE$ was almost doubled without any reduction of $\NITER$.
779: Therefore, it is not justified to perform a line search
780: in the quasi-Newton method,
781: which is consistent with previous findings \cite{NOCE}.
782: (c) The Hessian was discarded at the end of each ionic step.
783: Since the convergence rate deteriorates significantly,
784: the inheritance of the Hessian seems to be profitable. 
785: (d) We tried $ \sigma_k=|\yvec_k|^2 / \yvec_k^T \svec_k $,
786: which gave good results in Refs. [\citen{NOCE,GILE}]. 
787: However, this choice requires more iterations on average,
788: presumably because $\sigma_k$ varies too rapidly with $k$. 
789: The norm of the gradient also decays
790: less smoothly in this case.
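
To illustrate what the choice in (d) measures, consider a purely
quadratic energy with a symmetric positive-definite Hessian $A$
(an idealization introduced here, not the actual Kohn-Sham functional),
so that $\yvec_k = A \svec_k$. Then
\[
\sigma_k = \frac{|\yvec_k|^2}{\yvec_k^T \svec_k}
         = \frac{\svec_k^T A^2 \svec_k}{\svec_k^T A \svec_k}
         = \frac{w^T A w}{w^T w}, \;\;\; w = A^{1/2} \svec_k ,
\]
where $w$ is an auxiliary vector used only in this illustration.
This is a Rayleigh quotient of $A$, which lies between the smallest
and the largest eigenvalues of $A$.
In this model, $\sigma_k$ depends on which eigenvectors dominate the
latest step $\svec_k$, so it may change appreciably from one iteration
to the next when the spectrum of $A$ is broad.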
791: 
792: So far the previous search directions have been uncompressed,
793: i.e. stored as 64-bit double-precision arrays.
794: The effect of compression is examined here
795: in a series of runs for $m=3$ and $7$,
796: with different $\NBIT$.
As shown in Table \ref{TAB3}, the performance
is maintained after compression by a factor of 8--16,
especially for $m=7$.
Moreover, no instability occurred even for $\NBIT$ as small as 3.
801: We also show the distribution of the search direction
802: after compression with $\NBIT=8$
803: in Fig. \ref{PDOS}.
804: The distribution function $d(x)$ is defined as the
805: number of elements of $\PINT$ such that
806: $\PINT (i) = x$, where $ 1 \le i \le {\cal N}$ and
807: $ 0 \le x \le 2^8-2 = 254$.
808: Therefore, $\sum_x d(x) = {\cal N}$, and
809: there are two singularities at $x=0$ and 254.
810: The width of the distribution is approximately equal to $\IMAX$,
811: which indicates that our choice of the scale factor
812: (eq.(\ref{SCLF})) is appropriate. 
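
The compression factors quoted above follow directly from the word
length, since each element occupies $\NBIT$ bits instead of 64:
\[
64 / \NBIT = 8, \; 16, \; \mbox{and} \; \simeq 21
\;\;\; \mbox{for} \;\;\; \NBIT = 8, \; 4, \; \mbox{and} \; 3 ,
\]
ignoring any small per-vector overhead such as a stored scale factor
(the precise storage layout is an assumption here).
For $\NBIT=8$, the integer range $0 \le x \le 254$ contains
$2^8-1 = 255$ distinct values.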
813: 
814: Finally, in order to examine the generality of
815: our implementation,
some of the runs were repeated for
817: an isolated cytosine molecule (C$_4$H$_5$N$_3$O)
818: in a cubic supercell of (16 a.u.)$^3$, with a timestep of 40 a.u.
819: ($\sim$ 1 fs).
The average cutoff energy was 39 Ry, which
corresponds to $\NB=21$, $\NG= 8 \times 16^3 = 32{,}768$, and
${\cal N} \sim 700{,}000$.
823: The results shown in Table \ref{TAB4} suggest
824: that the performance of the quasi-Newton method in this system
825: is somewhat more robust against compression,
826: but is qualitatively similar to the previous results in other respects.
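
The quoted problem size is consistent with the product of $\NB$ and
$\NG$, if ${\cal N} = \NB \NG$ is assumed (this relation is not
restated in this section):
\[
\NB \NG = 21 \times 32768 = 688128 \simeq 7 \times 10^5 .
\]
Under the same assumption, a single vector of length ${\cal N}$
occupies $688128 \times 8$ bytes $\simeq 5.5$ MB in 64-bit precision,
and about $0.69$ MB for $\NBIT = 8$.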
827: 
828: \section{Summary}
829: We have shown in this article that
830: the quasi-Newton method using the BFGS formula
831: is the method of choice for
832: large-scale electronic-structure calculations,
833: if combined with efficient memory management.
834: The advantages of the quasi-Newton method over the conjugate
835: gradient method are summarized as follows:
(i) The Hessian of the previous ionic step can
be inherited to accelerate the convergence.
838: (ii) Practically no line search
839: is required, which reduces the cost of
840: each step significantly. 
841: 
842: Although there is room for fine-tuning the algorithm and
843: more extensive tests are necessary,
the quasi-Newton method will
provide significant speedups for first-principles codes,
in combination with other techniques such as
the enhanced extrapolation scheme \cite{EES} and
constrained molecular dynamics \cite{RAT,JPSJ2}.
849: 
850: \section*{Acknowledgements}
851: The author would like to thank Dr. K.~Terakura for
852: helpful discussions.
The numerical calculations were performed on the Hitachi SR-8000 at the
Tsukuba Advanced Computing Center.
855: 
856: 
857: \begin{thebibliography}{99}
858:   \bibitem{HK} P.~Hohenberg and W.~Kohn: Phys.~Rev. {\bf 136} (1964) B864.
859: 
860:   \bibitem{KS} W.~Kohn and L.~J.~Sham: Phys.~Rev. {\bf 140} (1965) A1133.
861: 
862:   \bibitem{CP} R.~Car and M.~Parrinello:
863:   Phys.~Rev.~Lett. {\bf 55} (1985) 2471.
864: 
865:   \bibitem{PAY} M.~C.~Payne, M.~P.~Teter, D.~C.~Allan, T.~A.~Arias
866:   and J.~D.~Joannopoulos: Rev.~Mod.~Phys. {\bf 64} (1992) 1045.
867: 
868:   \bibitem{TREV} K.~Terakura: {\it Computational Physics
869:   as a New Frontier in Condensed Matter Research}, ed. H.~Takayama,
870:   M.~Tsukada, H.~Shiba, F.~Yonezawa, M.~Imada
871:   and Y.~Okabe (The Physical Society of Japan, Tokyo, 1995).
872: 
873:   \bibitem{MARX} D.~Marx and J.~Hutter: {\it Modern Methods and
874:   Algorithms of Quantum Chemistry}, ed. J.~Grotendorst (NIC,
  Forschungszentrum J\"ulich, 2000), available at
876:   http:// www.theochem.ruhr-uni-bochum.de/
877:   research/ marx/ index.en.html.
878: 
879:   \bibitem{CTS} J.~R.~Chelikowsky, N.~Troullier, K.~Wu and Y.~Saad:
880:   Phys.~Rev.~B {\bf 50} (1994) 11355.
881: 
882:   \bibitem{BER} E.~L.~Briggs, D.~J.~Sullivan and J.~Bernholc:
883:   Phys.~Rev.~B {\bf 54} (1996) 14362.
884: 
885:   \bibitem{RREV} T.~L.~Beck: Rev.~Mod.~Phys. {\bf 72} (2000) 1041.
886: 
887:   \bibitem{WHT} S.~R.~White, J.~W.~Wilkins and M.~P.~Teter:
888:   Phys.~Rev.~B {\bf 39} (1989) 5819.
889: 
890:   \bibitem{PRB2} E.~Tsuchida and M.~Tsukada:
891:   Phys.~Rev.~B {\bf 54} (1996) 7602.
892: 
893:   \bibitem{JPSJ} E.~Tsuchida and M.~Tsukada:
894:   J.~Phys.~Soc.~Jpn. {\bf 67} (1998) 3844,
895:   available at http:// wwwsoc.nii.ac.jp/ jps/ jpsj/.
896: 
897:   \bibitem{PKFS} J.~E.~Pask, B.~M.~Klein, C.~Y.~Fong and P.~A.~Sterne:
898:   Phys.~Rev.~B {\bf 59} (1999) 12352.
899: 
900:   \bibitem{RCP} W.~H.~Press, S.~A.~Teukolsky, W.~T.~Vetterling
901:   and B.~P.~Flannery: {\it Numerical Recipes in Fortran}
902:   (Cambridge University Press, Cambridge, 1992).
903: 
904:   \bibitem{GLL} M.~J.~Gillan:
905:   J.~Phys.:~Condens.~Matter {\bf 1} (1989) 689.
906: 
907:   \bibitem{TPA} M.~P.~Teter, M.~C.~Payne and D.~C.~Allan:
908:   Phys.~Rev.~B {\bf 40} (1989) 12255.
909: 
910:   \bibitem{SCP} I.~Stich, R.~Car, M.~Parrinello and S.~Baroni:
911:   Phys.~Rev.~B {\bf 39} (1989) 4997.
912: 
913:   \bibitem{BKL} D.~M.~Bylander, L.~Kleinman and S.~Lee:
914:   Phys.~Rev.~B {\bf 42} (1990) 1394.
915: 
916:   \bibitem{PLY} P.~Pulay: Chem.~Phys.~Lett. {\bf 73} (1980) 393.
917: 
918:   \bibitem{WOZU} D.~M.~Wood and A.~Zunger:
919:   J.~Phys.~A {\bf 18} (1985) 1343.
920: 
  \bibitem{MACO} J.~L.~Martins and M.~L.~Cohen:
922:   Phys.~Rev.~B {\bf 37} (1988) 6134.
923: 
924:   \bibitem{HLP} J.~Hutter, H.~P.~L\"uthi and M.~Parrinello:
925:   Comp.~Mat.~Sci. {\bf 2} (1994) 244.
926: 
927:   \bibitem{KRFU} G.~Kresse and J.~Furthm\"uller:
928:   Comp.~Mat.~Sci. {\bf 6} (1996) 15.
929: 
930:   \bibitem{SREV} T.~Schlick: Rev.~Comp.~Chem. {\bf 3} (1992) 1.
931: 
  \bibitem{FREV} R.~Fletcher: Report NA/149, Department of
   Mathematics and Computer Science, University of Dundee, 1993,
   available at
935:    http:// citeseer.nj.nec.com/ fletcher93overview.html.
936: 
937:   \bibitem{NREV} J.~Nocedal: {\it The State of the Art in
938:   Numerical Analysis}, ed. I.~S.~Duff and G.~A.~Watson
939:   (Oxford University Press, 1998), available at
940:   http:// www.ece.nwu.edu/\~{}nocedal/ recent\_{}pub.html.
941: 
942:   \bibitem{HEPO} M.~Head-Gordon and J.~A.~Pople:
943:   J.~Phys.~Chem. {\bf 92} (1988) 3063.
944: 
945:   \bibitem{FIAL} T.~H.~Fischer and J.~Alml\"of:
946:   J.~Phys.~Chem. {\bf 96} (1992) 9768.
947: 
948:   \bibitem{HSZ} R.~A.~Hyman, M.~D.~Stiles and A.~Zangwill:
949:   Phys.~Rev.~B {\bf 62} (2000) 15521.
950: 
951:   \bibitem{NOCE} D.~C.~Liu and J.~Nocedal:
952:   Math.~Prog. {\bf 45} (1989) 503.
953: 
954:   \bibitem{SIEG} D.~Siegel: Report DAMTP 1992/NA12,
955:   University of Cambridge.
956: 
957:   \bibitem{GILE} P.~E.~Gill and M.~W.~Leonard:
958:   Report NA 97-1, Department of Mathematics, Santa Clara University,
959:   available at http:// www.math.ucla.edu/
960:   \~{}mwl/ vita/ vita.html.
961: 
962:   \bibitem{APJ1} T.~A.~Arias, M.~C.~Payne and J.~D.~Joannopoulos:
963:   Phys.~Rev.~Lett. {\bf 69} (1992) 1077.
964: 
965:   \bibitem{MGC} F.~Mauri, G.~Galli and R.~Car:
966:   Phys.~Rev.~B {\bf 47} (1993) 9973.
967: 
968:   \bibitem{KV} R.~D.~King-Smith and D.~Vanderbilt:
969:   Phys.~Rev.~B {\bf 49} (1994) 5828.
970: 
971:   \bibitem{WM} See, {\it e.g.}, R.~M.~Wentzcovitch and J.~L.~Martins:
972:   Solid State Commun. {\bf 78} (1991) 831.
973: 
974:   \bibitem{KB} L.~Kleinman and D.~M.~Bylander:
975:   Phys.~Rev.~Lett. {\bf 48} (1982) 1425.
976: 
977:   \bibitem{GTH} S.~Goedecker, M.~Teter and J.~Hutter:
978:   Phys.~Rev.~B {\bf 54} (1996) 1703.
979: 
980:   \bibitem{EES} E.~Tsuchida and K.~Terakura:
981:   J.~Phys.~Soc.~Jpn. {\bf 71} (2002), to appear.
982: 
983:   \bibitem{LIQ} M.~P.~Allen and D.~J.~Tildesley:
984:   {\it Computer Simulation of Liquids} (Oxford Science Publications,
985:   Clarendon, Oxford, 1987).
986: 
987:   \bibitem{RAT} H.~C.~Andersen: J.~Comp.~Phys. {\bf 52} (1983) 24.
988: 
989:   \bibitem{JPSJ2} E.~Tsuchida and K.~Terakura:
990:   J.~Phys.~Soc.~Jpn. {\bf 70} (2001) 924.
991: 
992: \end{thebibliography}
993: 
994: \newpage
995: 
996: \begin{table}
997: \caption{Notation for the full/reduced vectors. Note that
998: $\vvec$ and $\uvec$ denote the previous and
999: the current gradients, respectively.}
1000: \label{TAB0}
1001: \begin{tabular}{lcc}
1002: \hspace*{2cm} & \hspace{1.5cm} & \hspace{1.5cm} \\
1003: \hline
1004:          & full     & reduced \\
1005: \hline
1006: gradient         & $\gvec$    & $\vvec, \uvec$ \\
1007: search direction & $\pvec$    & $\qvec$ \\
1008: update vectors   & $\Delta \wfvec$  & $\svec$ \\
1009:                  & $\Delta \gvec$   & $\yvec$ \\
1010: wavefunction     & \wfvec     &  - \\
1011: \hline
1012: \end{tabular}
1013: \end{table}
1014: 
1015: 
1016: \begin{table}
1017: \caption{
1018: The performance of the conjugate gradient method and the
1019: quasi-Newton method is compared in the molecular-dynamics simulations
1020: of bulk diamond. $\NITER$ and $\NENE$ denote the number of iterations
1021: and total energy evaluations averaged over 50 ionic steps,
1022: respectively. }
1023: \label{TAB1}
1024: \begin{tabular}{rrr}
1025: \hspace*{2.0cm} & \hspace{1.5cm} & \hspace{1.5cm} \\
1026: \hline
1027: method     & $\NITER$ & $\NENE$ \\
1028: \hline
1029: Conjugate gradient  & 14.3  & 29.6 \\
1030: \hline
1031: BFGS, \,\,\, $m$ =  2 & 14.9  & 15.9 \\
1032:                     3 & 13.8  & 14.8 \\
1033:                     4 & 13.8  & 14.8 \\
1034:                     5 & 12.9  & 13.9 \\
1035:                     6 & 12.4  & 13.4 \\
1036:                     7 & 12.1  & 13.1 \\
1037:                     8 & 11.9  & 12.9 \\
1038:                     9 & 11.7  & 12.7 \\
1039:                    10 & 11.6  & 12.6 \\    
1040:                    20 & 12.4  & 13.4 \\
1041: \hline
1042: \end{tabular}
1043: \end{table}
1044: 
1045: 
1046: \begin{table}
1047: \caption{
1048: A number of variants are compared for
1049: the BFGS with $m = 7$. (a) Reference run from Table \ref{TAB1}.
1050: (b) A line search with a parabolic fit was forced in each cycle.
1051: (c) The Hessian was discarded at the end of each ionic step.
1052: (d) $ \sigma_k = |\yvec_k|^2 / \yvec_k^T \svec_k $
1053: was used as the curvature for the new direction. }
1054: \label{TAB2}
1055: \begin{tabular}{ccc}
1056: \hspace*{1.5cm} & \hspace{2cm} & \hspace{1.5cm} \\
1057: \hline
1058: method     & $\NITER$ & $\NENE$ \\
1059: \hline
1060: (a)  & 12.1  & 13.1 \\
1061: (b)  & 12.2  & 25.4 \\
1062: (c)  & 15.1  & 16.1 \\
1063: (d)  & 14.3  & 15.4 \\
1064: \hline
1065: \end{tabular}
1066: \end{table}
1067: 
1068: 
1069: \begin{table}
1070: \caption{
The effect of compression on the BFGS method with $m=3$ and 7.
$\NBIT = 64$ corresponds to the uncompressed (double-precision) runs.
1072: }
1073: \label{TAB3}
1074: \begin{tabular}{rrrr}
1075: \hspace*{1cm} & \hspace{1.2cm} & \hspace{1.5cm} & \hspace {1.5cm} \\
1076: \hline
1077: $m$   & $\NBIT$ & $\NITER$ & $\NENE$ \\
1078: \hline
1079:  3    & 64 & 13.8  & 14.8 \\
1080:       &  8 & 14.1  & 15.1 \\
1081:       &  4 & 14.3  & 15.3 \\
1082:       &  3 & 15.4  & 16.4 \\
1083:  7    & 64 & 12.1  & 13.1 \\
1084:       &  8 & 12.2  & 13.2 \\
1085:       &  4 & 12.2  & 13.2 \\
1086:       &  3 & 13.3  & 14.3 \\
1087: \hline
1088: \end{tabular}
1089: \end{table}
1090: 
1091: 
1092: \begin{table}
1093: \caption{
The results of selected runs from Tables \ref{TAB1}--\ref{TAB3},
1095: repeated for an isolated cytosine molecule (C$_4$H$_5$N$_3$O).}
1096: \label{TAB4}
1097: \begin{tabular}{rrrr}
1098: \hspace*{2.0cm} & \hspace{1.2cm} & \hspace{1.5cm} & \hspace {1.5cm} \\
1099: \hline
1100: method  & $\NBIT$ & $\NITER$ & $\NENE$ \\
1101: \hline
1102: Conjugate gradient  & -- & 13.1  & 27.1 \\
1103: BFGS, \,\,\, $m=3$  & 64 & 13.0  & 14.0 \\
1104:                     &  8 & 13.0  & 14.0 \\
1105:                     &  4 & 13.0  & 14.0 \\
1106:                     &  3 & 13.3  & 14.3 \\
1107: BFGS, \,\,\, $m=7$  & 64 & 10.7  & 11.7 \\
1108: \hline
1109: \end{tabular}
1110: \end{table}
1111: 
1112: \begin{figure}
1113:    \caption{The distribution function $d(x)$ of the compressed search
1114:    direction $\PINT$ for $\NBIT = 8$.
1115:    }
1116:    \label{PDOS}
1117: \end{figure}
1118: 
1119: \end{document}
1120: 
1121: